7.4 Discriminant Rules (More Than Two Populations)

Almost all of the discrimination rules given in Section 7.1 can
be extended to more than two populations, although there is
no longer a single linear discriminant function rule.

Example: Wheat kernels (wheat.sas, wheat.R, wheat.dat)
   A researcher at Kansas State University wanted to
   classify wheat kernels into “healthy”, “sprout”, or “scab”
   types. The researcher developed an automated method
   to take the following measurements: kernel density,
   hardness index, size index, weight index, and moisture
   content. The purpose of the discriminant analysis was to
   see if the wheat kernels could accurately be classified
   into the three types using the five measurements.

       Partial listing of the data set (s = single, k = kernel):

 lot   class  kernel  type1    skden1  skden2  skden3  skhard  sksize   skwt   skmst
 5911  hrw       1    Healthy    1.35    1.35    1.36   60.33    2.30   24.65  12.02
 5911  hrw       2    Healthy    1.29    1.29    1.29   56.09    2.73   33.30  12.17
 5911  hrw       3    Healthy    1.24    1.23    1.23   43.99    2.51   31.76  11.88
 5911  hrw       4    Healthy    1.34    1.34    1.34   53.82    2.27   32.71  12.11
 5911  hrw       5    Healthy    1.26    1.26    1.26   44.39    2.35   26.07  12.06
  ...
 srw4  srw     276    Scab       1.03    1.03    1.03   -9.57    2.06   23.82  12.65


- Lot is the batch of the wheat
- Class is hrw = hard red winter wheat and srw = soft red winter wheat; a new variable called hrw is created to be a binary numerical variable denoting the different classes
- Kernel is an identification number
- type1 is the kernel type: Healthy, Sprout, or Scab
- skden1, skden2, and skden3 are the density of the kernel (the measurement was repeated three times); a new variable called skden is created to be the average of the three measurements (a sketch of this data preparation in R appears after this list)
- skhard is the kernel's hardness
- sksize is the kernel's size
- skwt is the kernel's weight
- skmst is the kernel's moisture content
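A minimal sketch of this data preparation in R (assuming wheat.dat is a whitespace-delimited file with the column names shown in the listing above; the course's wheat.R file may do this differently):

    # Read the raw data; column names follow the data listing above
    wheat <- read.table(file = "wheat.dat", header = TRUE)

    # Average the three repeated density measurements
    wheat$skden <- (wheat$skden1 + wheat$skden2 + wheat$skden3) / 3

    # Binary numerical variable for the wheat class
    # (assumes hrw is coded 1 for hard red winter, 0 for soft red winter)
    wheat$hrw <- ifelse(wheat$class == "hrw", 1, 0)

    # Keep the variables used in the rest of the analysis
    wheat2 <- wheat[, c("kernel", "type1", "skden", "skhard",
                        "sksize", "skwt", "skmst", "hrw")]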


Initial R summary measures (see program for code)
[Figure: bubble plot of the first three principal components for the wheat data (PC1 on the horizontal axis, PC2 on the vertical axis), with observations labeled by kernel number]
[Figure: star plot of the wheat data (variables skden, skhard, sksize, skmst, and hrw), with stars labeled by kernel number]
[Figure: parallel coordinate plot for the wheat data (wheat2.kernel, wheat2.skden, wheat2.skhard, wheat2.sksize, wheat2.skwt, wheat2.skmst, wheat2.hrw), with lines colored by kernel type: Healthy, Sprout, Scab]
[Figure: parallel coordinate plot for the wheat data with kernel 31 highlighted]
[Figure: parallel coordinate plot for the wheat data, sorted by type1]
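Plots like the star and parallel coordinate plots above can be produced with base R graphics and the MASS package, reusing the wheat2 data frame from the earlier sketch (a minimal sketch; the options in the course's wheat.R file may differ):

    library(MASS)  # provides parcoord()

    # Star plot: one star per kernel, rays scaled by each variable
    stars(wheat2[, c("skden", "skhard", "sksize", "skmst", "hrw")])

    # Parallel coordinate plot, with line color denoting the kernel type
    wheat.colors <- ifelse(wheat2$type1 == "Healthy", "black",
                    ifelse(wheat2$type1 == "Sprout", "red", "green"))
    parcoord(wheat2[, c("kernel", "skden", "skhard", "sksize",
                        "skwt", "skmst", "hrw")], col = wheat.colors)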
Initial SAS summary measures (see program for code)
                             The PRINCOMP Procedure

                            Observations                    276
                            Variables                         6

                              Simple Statistics
                           skden            skhard                        sksize
     Mean            1.191571450       25.91269460                   2.200355072
     StD             0.140570141       27.91452693                   0.495121266

                              Simple Statistics
                            skwt            skmst                           hrw
     Mean            27.43524638       11.18883322                  0.5217391304
     StD              7.97612238        2.03025511                  0.5004345937

                              Correlation Matrix
                skden     skhard    sksize      skwt                 skmst         hrw
 skden         1.0000     0.0989    0.2073    0.2764                0.0089      -.0151
 skhard        0.0989     1.0000    -.0980    -.3395                -.1883      0.3909
 sksize        0.2073     -.0980    1.0000    0.7566                0.0046      0.0248
 skwt          0.2764     -.3395    0.7566    1.0000                0.1442      -.1410
 skmst         0.0089     -.1883    0.0046    0.1442                1.0000      -.6816
 hrw           -.0151     0.3909    0.0248    -.1410                -.6816      1.0000

                     Eigenvalues of the Correlation Matrix
                 Eigenvalue    Difference    Proportion    Cumulative
          1      2.14089924    0.45206317        0.3568        0.3568
          2      1.68883607    0.68799736        0.2815        0.6383
          3      1.00083871    0.30098689        0.1668        0.8051
          4      0.69985182    0.41680304        0.1166        0.9217
          5      0.28304878    0.09652340        0.0472        0.9689
          6      0.18652538                      0.0311        1.0000

                                    Eigenvectors
                 Prin1      Prin2       Prin3    Prin4                Prin5      Prin6
 skden        0.182219   0.315600    0.734515 -.557323             0.042130   0.123681
 skhard       -.394409   0.195882    0.547652 0.630087             -.198030   -.264431
 sksize       0.419028   0.501916    -.115373 0.384230             -.232840   0.597778
 skwt         0.542541   0.382239    -.126822 0.079601             0.156216   -.716044
 skmst        0.394311   -.461683    0.319975 0.372033             0.601871   0.168304
 hrw          -.431070   0.500849    -.169666 0.001902             0.719821   0.128047


   Three to four principal components appear to be the true
   dimension of the data. Note the possible interpretations
   of the principal components.
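The same PCA can be sketched in R with prcomp(); using scale. = TRUE works from the correlation matrix, matching the PROC PRINCOMP output above:

    # PCA on the six standardized variables (correlation matrix)
    save.pca <- prcomp(wheat2[, c("skden", "skhard", "sksize",
                                  "skwt", "skmst", "hrw")],
                       center = TRUE, scale. = TRUE)
    summary(save.pca)    # proportion of variance for each component
    save.pca$rotation    # eigenvectors (loadings)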


Kernel #31 is the outlier seen in the plots above.




Notice there is some separation between the different
kernel classes. The division is actually between the hard
and soft red winter wheat (hrw variable). It may be of
interest to try separate analyses for the two wheat
classes; I will not do that here and will just use hrw as a
variable for discriminating between the wheat kernel types.




What should be done about kernel #31? Talk to the
researcher to make sure this kernel’s data values are
correct.

Discriminant Analysis:
    title2 'Discriminant analysis on the wheat data set - priors proportional';
    proc discrim data=set1 method=normal crossvalidate
                 out=list_set outcross=cross_set;
      class type1;
      var skden skhard sksize skwt skmst hrw;
      priors proportional;
    run;

    title2 'Misclassifications from crossvalidation';
    proc print data=cross_set;
      where type1 ne _into_;
      var type1 _into_ healthy sprout scab;
    run;

                         Chris Bilder, STAT 873
      Discriminant analysis on the wheat data set - priors proportional

                              The DISCRIM Procedure
          Observations        276          DF Total                       275
          Variables             6          DF Within Classes              273
          Classes               3          DF Between Classes               2

                             Class Level Information
                Variable                                                       Prior
    type1       Name       Frequency           Weight       Proportion   Probability
    Healthy     Healthy           96          96.0000         0.347826      0.347826
    Scab        Scab              84          84.0000         0.304348      0.304348
    Sprout      Sprout            96          96.0000         0.347826      0.347826

                       Pooled Covariance Matrix Information
                                       Natural Log of the
                         Covariance    Determinant of the
                        Matrix Rank     Covariance Matrix
                                  6               2.83517

              Pairwise Generalized Squared Distances Between Groups

                  D²(i|j) = (X̄_i − X̄_j)' COV⁻¹ (X̄_i − X̄_j) − 2 ln PRIOR_j

              Generalized Squared Distance to type1
       From
       type1          Healthy          Scab        Sprout
       Healthy        2.11211       7.28413       2.75637
       Scab           7.01707       2.37917       5.44346
       Sprout         2.75637       5.71052       2.11211

                  Linear Discriminant Function

   Constant = −0.5 X̄_j' COV⁻¹ X̄_j + ln PRIOR_j        Coefficient Vector = COV⁻¹ X̄_j

              Linear Discriminant Function for type1
       Variable        Healthy          Scab        Sprout
       Constant     -108.89245     -88.62721    -101.75140
       skden          92.60313      79.65724      86.94798
       skhard         -0.06784      -0.07288      -0.08135
       sksize         10.47804      10.41204      10.98232
       skwt            0.04965      -0.19559       0.01680
       skmst           5.68324       5.79647       5.67197
       hrw            18.26673      18.63187      18.46743


     Classification Summary for Calibration Data: WORK.SET1
   Resubstitution Summary using Linear Discriminant Function

              Generalized Squared Distance Function

                  D²_j(X) = (X − X̄_j)' COV⁻¹ (X − X̄_j) − 2 ln PRIOR_j

        Posterior Probability of Membership in Each type1

                  Pr(j|X) = exp(−0.5 D²_j(X)) / Σ_k exp(−0.5 D²_k(X))
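To see how the posterior probabilities follow from the generalized squared distances, here is a small numeric sketch in R (the three distance values are hypothetical, not taken from a particular kernel):

    # Hypothetical generalized squared distances from one observation
    # to the three classes (the -2 ln PRIOR term is already included)
    D2 <- c(Healthy = 2.5, Scab = 7.1, Sprout = 3.0)

    # Pr(j|X) = exp(-0.5 * D2[j]) / sum over k of exp(-0.5 * D2[k])
    post <- exp(-0.5 * D2) / sum(exp(-0.5 * D2))
    round(post, 4)  # posteriors sum to 1; smallest distance, largest posterior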

   Number of Observations and Percent Classified into type1
  From
  type1        Healthy         Scab       Sprout        Total
  Healthy           74            7           15           96
                 77.08         7.29        15.63       100.00
  Scab              10           64           10            84
                 11.90        76.19        11.90       100.00
  Sprout            23           19           54           96
                 23.96        19.79        56.25       100.00
  Total            107           90           79          276
                 38.77        32.61        28.62       100.00
Priors         0.34783          0.30435             0.34783

                Error Count Estimates for type1
                   Healthy        Scab      Sprout                Total
Rate                0.2292      0.2381      0.4375               0.3043
Priors              0.3478      0.3043      0.3478


   Classification Summary for Calibration Data: WORK.SET1
Cross-validation Summary using Linear Discriminant Function

             Generalized Squared Distance Function

                 D²_j(X) = (X − X̄_(X)j)' COV_(X)⁻¹ (X − X̄_(X)j) − 2 ln PRIOR_j

             (the (X) subscript indicates that observation X is left out
             when computing the mean vector and covariance matrix)

       Posterior Probability of Membership in Each type1

                 Pr(j|X) = exp(−0.5 D²_j(X)) / Σ_k exp(−0.5 D²_k(X))

 Number of Observations and Percent Classified into type1
From
type1        Healthy         Scab       Sprout        Total
Healthy           67            7           22           96
               69.79         7.29        22.92       100.00
Scab              11           62           11           84
               13.10        73.81        13.10       100.00
Sprout            25           19           52           96
               26.04        19.79        54.17       100.00
Total            103           88           85          276
               37.32        31.88        30.80       100.00
Priors       0.34783      0.30435      0.34783

                Error Count Estimates for type1
                   Healthy        Scab      Sprout                Total
Rate                0.3021      0.2619      0.4583               0.3442
Priors              0.3478      0.3043      0.3478



                     Chris Bilder, STAT 873
            Misclassifications from crossvalidation

 Obs      type1       _INTO_          Healthy          Sprout     Scab
   8      Healthy     Sprout          0.45678         0.48732   0.05591
  11      Healthy     Scab            0.29606         0.25829   0.44564
  14      Sprout      Healthy         0.65448         0.33797   0.00755
  15      Sprout      Scab            0.11832         0.25361   0.62806
  16      Sprout      Healthy         0.54047         0.42140   0.03813
    ...
    271   Scab   Healthy         0.37123        0.28543   0.34334
    273   Scab   Healthy         0.47232        0.34115   0.18653




    title2 'Discriminant analysis on the wheat data set';
    proc discrim data=set1 method=normal crossvalidate
                 outcross=cross_set;
      class type1;
      var skden skhard sksize skwt skmst hrw;
      priors equal;
    run;

    title2 'Misclassifications from crossvalidation';
    proc print data=cross_set;
      where type1 ne _into_;
      var type1 _into_ healthy sprout scab;
    run;

                             The DISCRIM Procedure
          Observations      276           DF Total                      275
          Variables           6          DF Within Classes              273
          Classes             3          DF Between Classes               2

                           Class Level Information
            Variable                                                       Prior
type1       Name         Frequency         Weight        Proportion   Probability
Healthy     Healthy             96        96.0000          0.347826      0.333333
Scab        Scab                84        84.0000          0.304348      0.333333
Sprout      Sprout              96        96.0000          0.347826      0.333333

                   Pooled Covariance Matrix Information
                                   Natural Log of the
                     Covariance    Determinant of the
                    Matrix Rank     Covariance Matrix
                              6               2.83517


          Pairwise Generalized Squared Distances Between Groups

                  D²(i|j) = (X̄_i − X̄_j)' COV⁻¹ (X̄_i − X̄_j)

                   Generalized Squared Distance to type1
            From
            type1          Healthy          Scab        Sprout
            Healthy              0       4.90497       0.64426
            Scab           4.90497             0       3.33136
            Sprout         0.64426       3.33136             0

                         Linear Discriminant Function

   Constant = −0.5 X̄_j' COV⁻¹ X̄_j          Coefficient Vector = COV⁻¹ X̄_j

            Linear Discriminant Function for type1
     Variable        Healthy          Scab        Sprout
     Constant     -107.83640     -87.43762    -100.69535
     skden          92.60313      79.65724      86.94798
     skhard         -0.06784      -0.07288      -0.08135
     sksize         10.47804      10.41204      10.98232
     skwt            0.04965      -0.19559       0.01680
     skmst           5.68324       5.79647       5.67197
     hrw            18.26673      18.63187      18.46743


   Classification Summary for Calibration Data: WORK.SET1
 Resubstitution Summary using Linear Discriminant Function

           Generalized Squared Distance Function

                 D²_j(X) = (X − X̄_j)' COV⁻¹ (X − X̄_j)

     Posterior Probability of Membership in Each type1

                 Pr(j|X) = exp(−0.5 D²_j(X)) / Σ_k exp(−0.5 D²_k(X))

 Number of Observations and Percent Classified into type1
From
type1        Healthy         Scab       Sprout        Total
Healthy           74            7           15            96
               77.08         7.29        15.63       100.00
Scab              10           65            9            84
               11.90        77.38        10.71       100.00
Sprout            23           20           53            96
               23.96        20.83        55.21       100.00
Total            107           92           77          276
               38.77        33.33        27.90       100.00
Priors       0.33333      0.33333      0.33333

               Error Count Estimates for type1
                  Healthy        Scab      Sprout            Total
Rate               0.2292      0.2262      0.4479           0.3011
Priors             0.3333      0.3333      0.3333





       Classification Summary for Calibration Data: WORK.SET1
    Cross-validation Summary using Linear Discriminant Function

                   Generalized Squared Distance Function

                       D²_j(X) = (X − X̄_(X)j)' COV_(X)⁻¹ (X − X̄_(X)j)

             Posterior Probability of Membership in Each type1

                       Pr(j|X) = exp(−0.5 D²_j(X)) / Σ_k exp(−0.5 D²_k(X))

     Number of Observations and Percent Classified into type1
    From
    type1        Healthy         Scab       Sprout        Total
    Healthy           66            9           21            96
                   68.75         9.38        21.88       100.00
    Scab              10           65            9            84
                   11.90        77.38        10.71       100.00
    Sprout            25           20           51            96
                   26.04        20.83        53.13       100.00
    Total            101           94           81          276
                   36.59        34.06        29.35       100.00
    Priors       0.33333      0.33333      0.33333

                       Error Count Estimates for type1
                          Healthy        Scab      Sprout                Total
    Rate                   0.3125      0.2262      0.4688               0.3358
    Priors                 0.3333      0.3333      0.3333

       Obs      type1      _INTO_         Healthy           Sprout     Scab
         8      Healthy    Sprout         0.45316          0.48345   0.06339
        11      Healthy    Scab           0.27834          0.24283   0.47882
        14      Sprout     Healthy        0.65378          0.33760   0.00862
        15      Sprout     Scab           0.10858          0.23273   0.65869
        16      Sprout     Healthy        0.53754          0.41912   0.04334
        17      Sprout     Scab           0.15010          0.12690   0.72299
        18      Sprout     Healthy        0.56431          0.35650   0.07919
        19      Sprout     Scab           0.18301          0.16055   0.65644
        ...
       268      Scab       Healthy      0.52590          0.13806     0.33604
       273      Scab       Healthy      0.46007          0.33229     0.20764


Notes:
- The classification error rates change only a little between the PRIORS=PROPORTIONAL and PRIORS=EQUAL options; note that the proportion in each wheat class is approximately the same.
- Using the linear discriminant rules is a little more complicated when the number of populations is more than 2. For this example, 2 different linear discriminant functions are needed. See wheat.sas for the PROC IML code used to classify the wheat kernels; a rough R analogue appears after these notes.
- Also contained in wheat.sas is the PROC IML code needed to show how the Mahalanobis distance and the posterior probability are found. Examine this on your own.
- No costs of misclassification are used here.
- The covariance matrices for healthy, sprout, and scab were found to be unequal (p-value < 0.0001) using the POOL=TEST option in PROC DISCRIM. When the quadratic discriminant rule is used, the classification error rates are a little smaller. The actual code and output used to find these rates are excluded from the notes.
- Examine the 3D plot of the principal components for justification of why some classification error rates are larger than others.
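As a rough R analogue of that classification step, the linear discriminant function table above (proportional priors) can be applied directly to one kernel's measurements, assigning the kernel to the class with the largest score. Kernel 1's values come from the data listing; coding hrw = 1 for hard red winter is an assumption:

    # Constants and coefficients copied from the SAS linear
    # discriminant function table (priors proportional)
    const <- c(Healthy = -108.89245, Scab = -88.62721, Sprout = -101.75140)
    coef <- cbind(Healthy = c(92.60313, -0.06784, 10.47804,  0.04965, 5.68324, 18.26673),
                  Scab    = c(79.65724, -0.07288, 10.41204, -0.19559, 5.79647, 18.63187),
                  Sprout  = c(86.94798, -0.08135, 10.98232,  0.01680, 5.67197, 18.46743))
    rownames(coef) <- c("skden", "skhard", "sksize", "skwt", "skmst", "hrw")

    # Kernel 1: skden is the average of 1.35, 1.35, 1.36
    x <- c(skden = (1.35 + 1.35 + 1.36)/3, skhard = 60.33, sksize = 2.30,
           skwt = 24.65, skmst = 12.02, hrw = 1)

    # Linear discriminant score for each class; classify to the largest
    scores <- const + as.vector(t(coef) %*% x)
    names(which.max(scores))  # kernel 1 is classified as Healthy here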
   Summary of classification errors:

                                      Classification Error Rates (by actual class)
                                      Healthy     Scab    Sprout    Overall Error
     Σ1 = Σ2 = Σ3,   Resubstitution    22.92%   23.81%    43.75%       30.43%
     priors = prop.  Crossvalidation   30.21%   26.19%    45.83%       34.42%

     Different Σi,   Resubstitution    26.04%   20.24%    33.33%       26.54%
     priors = equal  Crossvalidation   31.25%   22.62%    41.67%       31.85%

     Σ1 = Σ2 = Σ3,   Resubstitution    22.92%   22.62%    44.79%       30.11%
     priors = equal  Crossvalidation   31.25%   22.62%    46.88%       33.58%

     Different Σi,   Resubstitution    25.00%   21.43%    31.25%       26.09%
     priors = prop.  Crossvalidation   31.25%   23.81%    40.63%       32.25%



7.5 Variable Selection Procedures

       In order to find the most parsimonious model that best
       estimates the dependent variable in regression analysis,
       variable selection procedures are used to narrow down
       the number of independent variables. Similar variable
       selection procedures can be used for discriminant
       analysis. This helps to eliminate variables that do not
       help to discriminate between the different populations.

ANCOVA REVIEW (STAT 801)

       One-way ANOVA model: Y_ij = μ + τ_i + ε_ij

          where ε_ij ~ ind. N(0, σ²)
                τ_i is the effect of treatment i
                μ is the grand mean
                Y_ij is the response of the jth object to treatment i

Example: Wheat kernels

   Let Y_ij be the hardness of the jth kernel from the ith
   classification.

   Y_11 = hardness of kernel 1 from the healthy class
   τ_1 = healthy effect, τ_2 = sprout effect, τ_3 = scab effect

   Note that if τ_1 = τ_2 = τ_3, there are no mean differences
   among the kernel types. In this case, would hardness
   be a good discriminator between the kernel types?

One-way ANCOVA model: Y_ij = μ + τ_i + β_i x_ij + ε_ij

   β_i = slope coefficient
   x_ij = covariate

Example: Wheat kernels

   x_ij = variable that has an effect on hardness

   Note that if τ_1 = τ_2 = τ_3, there are no mean differences
   among the kernel types when x_ij is accounted for. In
   this case, would hardness be a good discriminator
   between the kernel types?


Forward selection
  1. Find the variable that is the best discriminator among all
     the variables. This variable produces the largest F
     statistic value in a one-way ANOVA model.

  Example: Wheat kernels (sorry about the notation)

       skden_ij = μ + τ_i + ε_ij
       skhard_ij = μ + τ_i + ε_ij
       sksize_ij = μ + τ_i + ε_ij
       skwt_ij = μ + τ_i + ε_ij
       skmst_ij = μ + τ_i + ε_ij
       hrw_ij = μ + τ_i + ε_ij

       For each model, perform an F test for Ho: τ_1 = τ_2 = τ_3 vs.
       Ha: not all equal. In other words, test for equality of
       the mean density for the 3 kernel types; test for
       equality of the mean hardness for the 3 kernel types;…
       The largest F statistic value somewhat equates to the
       “biggest” difference and thus the variable that will
       discriminate between the 3 kernel types the best.
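A sketch of step 1 in R, reusing the wheat2 data frame from the earlier sketches: fit each one-way ANOVA and pull out the F statistic for the type1 effect:

    vars <- c("skden", "skhard", "sksize", "skwt", "skmst", "hrw")

    # F statistic for Ho: tau1 = tau2 = tau3 in each one-way ANOVA
    f.stat <- sapply(vars, function(v) {
      fit <- lm(as.formula(paste(v, "~ type1")), data = wheat2)
      anova(fit)["type1", "F value"]
    })
    sort(f.stat, decreasing = TRUE)  # best single discriminator first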

  2. Find the next best discriminator among all the remaining
     variables. This variable produces the largest F statistic

  value in a one-way ANCOVA model with the first
  variable as a covariate.

  Example: Wheat kernels (sorry about the notation)

      Suppose hardness had the largest F statistic from
      the previous step.

       skden_ij = μ + τ_i + β_i skhard_ij + ε_ij
       sksize_ij = μ + τ_i + β_i skhard_ij + ε_ij
       skwt_ij = μ + τ_i + β_i skhard_ij + ε_ij
       skmst_ij = μ + τ_i + β_i skhard_ij + ε_ij
       hrw_ij = μ + τ_i + β_i skhard_ij + ε_ij

       For each model, perform an F test for Ho: τ_1 = τ_2 = τ_3
       vs. Ha: not all equal. In other words, test for equality
       of the mean density for the 3 kernel types adjusting
       for the hardness effect; test for equality of the mean
       size for the 3 kernel types adjusting for the hardness
       effect;… The largest F statistic value somewhat
       equates to the “biggest” difference and thus the
       variable that will discriminate between the 3 kernel
       types the best after adjusting for the hardness effect.
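Step 2 can be sketched the same way by adding the first-selected variable as a covariate before testing the type1 effect. The sketch below uses a common slope for simplicity (the models above allow separate slopes β_i), and it assumes skhard was selected first, as in the example:

    remaining <- c("skden", "sksize", "skwt", "skmst", "hrw")

    # F statistic for the type1 effect after adjusting for skhard
    f.stat2 <- sapply(remaining, function(v) {
      fit <- lm(as.formula(paste(v, "~ skhard + type1")), data = wheat2)
      anova(fit)["type1", "F value"]  # sequential F: type1 after skhard
    })
    sort(f.stat2, decreasing = TRUE)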

  3. Find the next best discriminator among all the remaining
     variables. This variable produces the largest F statistic
     value in a one-way ANCOVA model with the first and
     second variables as covariates.
  4. Continue the process.


   The forward selection procedure stops when the F tests
   no longer produce any significant outcomes. If no
   variables are significant in step 1, the variables will
   not be helpful in discriminating between the populations.

Backward elimination
   1. Find the variable that is the worst discriminator among
      all the variables. This variable produces the smallest
      F statistic value in a one-way ANCOVA model using
      the other p-1 variables as covariates. Remove this
      variable from the variable pool.

    Example: Wheat kernels (sorry about the notation)

        Example models:

         skden_ij = μ + τ_i + β_i1 skhard_ij + β_i2 sksize_ij +
                    β_i3 skwt_ij + β_i4 skmst_ij + β_i5 hrw_ij + ε_ij
         sksize_ij = μ + τ_i + β_i1 skden_ij + β_i2 skhard_ij +
                    β_i3 skwt_ij + β_i4 skmst_ij + β_i5 hrw_ij + ε_ij
         ⋮

    2. Find the next worst discriminator among all the
       remaining variables. This variable produces the
       smallest F statistic value in a one-way ANCOVA model
       using the other p-2 variables as covariates. Remove
       this variable from the variable pool.
    3. Continue the process.


   The backward elimination procedure stops when all of
   the F-tests produce a significant outcome.

Stepwise selection
    After each forward selection step, backward elimination
    is used to determine if any variables can be removed
    from the variable pool.
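In R, the klaR package offers comparable selection tools; for example, greedy.wilks() performs forward selection based on Wilks' lambda (a sketch; the interface details are an assumption to verify against the package documentation, and PROC STEPDISC is used for the wheat example below):

    library(klaR)

    # Forward selection of discriminators based on Wilks' lambda;
    # niveau is the significance level required to add a variable
    greedy.wilks(type1 ~ skden + skhard + sksize + skwt + skmst + hrw,
                 data = wheat2, niveau = 0.01)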

Johnson’s recommendations
- Use backward elimination when the number of variables is at most 15.
- Use stepwise selection when the number of variables is more than 15.
- Use α = 0.01 for backward elimination.
- When using stepwise selection, use 0.25 ≤ α ≤ 0.50 for entry and α = 0.15 for elimination of a variable.
- Use variable selection procedures only as a guide to help choose the variables.

Caveats
- These procedures do not measure how well the variables actually discriminate!
- These procedures examine differences between means, not differences between entire populations. Means can be different, but populations can “overlap” – see the example below.


Example: Discrimination between 4 populations using 2
variables

In the plot below, both x1 and x2 would probably be found
to be “significant” by the variable selection procedures.
Which variable will do the better job of discriminating
between the 4 populations?

[Figure: f(x) = ε contours of 4 different populations, where ε is a very small number greater than 0, and each population mean is marked with +. There is a difference in means for X2, but an overlap of the populations; there is a difference in means for X1 and separation of the populations.]
In the plot below, both x1 and x2 would probably be found
to be “significant” by the variable selection procedures.
Which variable do you think would be found to be the
“most significant” by the variable selection procedures?

[Figure: f(x) = ε contours of 4 different populations, where ε is a very small number greater than 0, and each population mean is marked with +. Again there is a difference in means for X2 with an overlap of the populations, and a difference in means for X1 with separation of the populations.]

See Figures 7.5 and 7.6 of Johnson (1998).
Example: Wheat kernels (wheat.sas)

     Reduce the number of variables used for the
     discriminant analysis without creating a larger
     classification error.

     PROC STEPDISC can be used to perform the variable
     selection procedures. Below is part of the SAS code
     and output used to perform backward elimination. See
     Johnson (1998, p.249) for a stepwise selection example.
          title2 'Backward elimination using alpha=0.01';
          proc stepdisc data=set1 method=backward sls=0.01;
            class type1;
            var skden skhard sksize skwt skmst hrw;
          run;
                             The STEPDISC Procedure

            The Method for Selecting Variables is BACKWARD
 Observations       276           Variable(s) in the Analysis        6
 Class Levels         3           Variable(s) will be Included       0
                                  Significance Level to Stay      0.01
                        Class Level Information
                Variable
     type1      Name         Frequency       Weight    Proportion
     Healthy    Healthy             96      96.0000      0.347826
     Scab       Scab                84      84.0000      0.304348
     Sprout     Sprout              96      96.0000      0.347826

                          The STEPDISC Procedure
                        Backward Elimination: Step 0

                   All variables have been entered.

                          Multivariate Statistics
Statistic                          Value F Value              Num DF   Den DF   Pr > F
Wilks' Lambda                   0.491901    19.02                 12      536   <.0001
Pillai's Trace                  0.543834    16.74                 12      538   <.0001
Average Squared                 0.271917
Canonical Correlation


                      Backward Elimination: Step 1
                  Statistics for Removal, DF = 2, 268
                            Partial
              Variable     R-Square    F Value     Pr > F
              skden          0.2115      35.95    <.0001
              skhard         0.0158       2.15    0.1186
              sksize         0.0055       0.74    0.4790
              skwt           0.1123      16.96    <.0001
              skmst          0.0037       0.49     0.6105
              hrw            0.0015       0.20    0.8164
                     Variable hrw will be removed.

                  Variable(s) that have been Removed
                                 hrw

                        Multivariate Statistics
Statistic                        Value F Value              Num DF   Den DF   Pr > F
Wilks' Lambda                 0.492646    22.85                 10      538   <.0001
Pillai's Trace                0.542885    20.12                 10      540   <.0001
Average Squared               0.271442
Canonical Correlation

                         Backward Elimination: Step 2
                 Statistics for Removal, DF = 2, 269
                            Partial
              Variable     R-Square    F Value    Pr > F
              skden          0.2142      36.67    <.0001
              skhard         0.0162       2.21    0.1114
              sksize         0.0055       0.75    0.4741
              skwt           0.1118      16.94    <.0001
              skmst          0.0042       0.57    0.5640
                    Variable skmst will be removed.

                  Variable(s) that have been Removed
                              skmst hrw

                        Multivariate Statistics
Statistic                        Value F Value              Num DF   Den DF   Pr > F
Wilks' Lambda                 0.494749    28.46                  8      540   <.0001
Pillai's Trace                0.539764    25.04                  8      542   <.0001
Average Squared               0.269882
Canonical Correlation

                     Backward Elimination: Step 3
                 Statistics for Removal, DF = 2, 270
                           Partial
              Variable    R-Square    F Value     Pr > F
              skden         0.2143      36.82    <.0001
              skhard        0.0151       2.06    0.1290
                 sksize        0.0068       0.92    0.3983
                 skwt          0.1097      16.63    <.0001
                      Variable sksize will be removed.

                     Variable(s) that have been Removed
                              sksize skmst hrw

                          Multivariate Statistics
Statistic                          Value F Value               Num DF   Den DF   Pr > F
Wilks' Lambda                   0.498133    37.66                   6      542   <.0001
Pillai's Trace                  0.533732    33.00                   6      544   <.0001
Average Squared                 0.266866
Canonical Correlation

                         Backward Elimination: Step 4
                    Statistics for Removal, DF = 2, 271
                               Partial
                 Variable     R-Square    F Value     Pr > F
                 skden          0.2150      37.10    <.0001
                 skhard         0.0112       1.54    0.2160
                 skwt           0.2556      46.52    <.0001
                       Variable skhard will be removed.

                     Variable(s) that have been Removed
                          skhard sksize skmst hrw

                          Multivariate Statistics
Statistic                          Value F Value               Num DF   Den DF   Pr > F
Wilks' Lambda                   0.503800    55.61                   4      544   <.0001
Pillai's Trace                  0.523153    48.35                   4      546   <.0001
Average Squared                 0.261577
Canonical Correlation

                        Backward Elimination: Step 5
                    Statistics for Removal, DF = 2, 272
                              Partial
                 Variable    R-Square    F Value     Pr > F
                 skden         0.2289      40.37    <.0001
                 skwt          0.2926      56.25    <.0001
                        No variables can be removed.
                       No further steps are possible.

                       Backward Elimination Summary
        Number              Partial                                   Wilks'       Pr <
 Step       In    Removed R-Square F Value Pr > F                     Lambda     Lambda
    0        6                .          .     .                  0.49190126     <.0001
    1        5    hrw        0.0015     0.20 0.8164               0.49264634     <.0001
    2        4    skmst      0.0042     0.57 0.5640               0.49474856     <.0001
    3        3    sksize     0.0068     0.92 0.3983               0.49813344     <.0001
    4        2    skhard     0.0112     1.54 0.2160               0.50379958     <.0001

                                          Average
                                          Squared
                 Number                 Canonical           Pr >
          Step       In   Removed     Correlation           ASCC
             0        6                0.27191712         <.0001
             1        5   hrw          0.27144225         <.0001
             2        4   skmst        0.26988206         <.0001
             3        3   sksize       0.26686581         <.0001
             4        2   skhard       0.26157659         <.0001


Notes:
- All variables are used at first.
- The “worst” discriminator is hrw since it has the lowest F statistic value.
- The partial R-square measures the proportional reduction in the sum of squared errors from adding the corresponding variable to the model, given that the other variables of interest are already in the model (see the sketch after these notes). See a regression book such as Neter, Kutner, Nachtsheim, and Wasserman (Chapter 7, 1996). A low partial R-square means that the variable is not helpful, and a large partial R-square means that the variable is helpful.

- The Wilks’ lambda and Pillai’s trace statistics are used in multivariate analysis of variance (MANOVA). In their first appearance in the output, they are testing

      Ho: μ_Healthy = μ_Sprout = μ_Scab
      Ha: Not all equal

  where μ_i = (μ_i,skden, μ_i,skhard, μ_i,sksize, μ_i,skwt, μ_i,skmst, μ_i,hrw)' for i = Healthy, Sprout, and Scab.

  These statistics will be discussed in Chapter 11.

- The average squared canonical correlation will be discussed in Chapter 12. For now, it can be interpreted as a measure of how well the remaining variables discriminate between the wheat classes; the closer to 1, the better the variables discriminate.
- Only skden and skwt remain at the end of the backward elimination procedure.
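In the regression setting referenced above, the partial R-square can be computed directly from the error sums of squares of the reduced and full models. A small sketch (regression flavor only; STEPDISC computes its version from the multivariate analysis, and the variables here are chosen just for illustration):

    # Partial R-square for adding skhard to a model already
    # containing skwt: (SSE(reduced) - SSE(full)) / SSE(reduced)
    sse <- function(fit) sum(residuals(fit)^2)

    fit.reduced <- lm(skden ~ skwt, data = wheat2)
    fit.full    <- lm(skden ~ skwt + skhard, data = wheat2)
    (sse(fit.reduced) - sse(fit.full)) / sse(fit.reduced)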

Next, PROC DISCRIM should be used again to determine
how well the remaining variables discriminate between the
three wheat classes. Hopefully, the classification error rate
has not substantially increased with the removal of the four
variables.
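One quick way to check this in R is MASS::lda() with leave-one-out cross-validation; a sketch paralleling the PROC DISCRIM run with equal priors:

    library(MASS)

    # LDA with only density and weight; CV = TRUE requests
    # leave-one-out (crossvalidation) classifications
    fit <- lda(type1 ~ skden + skwt, data = wheat2,
               prior = c(1, 1, 1)/3, CV = TRUE)

    # Crossvalidation classification table and overall error rate
    tab <- table(actual = wheat2$type1, classified = fit$class)
    tab
    1 - sum(diag(tab)) / sum(tab)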




Previous error rates (using all 6 variables):

                                     Classification Error Rates (by actual class)
                                     Healthy     Scab    Sprout    Overall Error
    Σ1 = Σ2 = Σ3,   Crossvalidation   30.21%   26.19%    45.83%       34.42%
    priors = prop.

    Different Σi,   Crossvalidation   31.25%   22.62%    41.67%       31.85%
    priors = equal

    Σ1 = Σ2 = Σ3,   Crossvalidation   31.25%   22.62%    46.88%       33.58%
    priors = equal

    Different Σi,   Crossvalidation   31.25%   23.81%    40.63%       32.25%
    priors = prop.


New error rates (using only density and weight):

                                     Classification Error Rates (by actual class)
                                     Healthy     Scab    Sprout    Overall Error
    Σ1 = Σ2 = Σ3,   Crossvalidation   29.17%   23.81%    45.83%       33.33%
    priors = prop.

    Different Σi,   Crossvalidation   37.50%   25.00%    37.50%       33.33%
    priors = equal

    Σ1 = Σ2 = Σ3,   Crossvalidation   29.17%   23.81%    45.83%       32.94%
    priors = equal

    Different Σi,   Crossvalidation   37.50%   26.19%    37.50%       34.06%
    priors = prop.


It does not look like much (if anything) is lost by using only 2
of the variables instead of all 6!
Notes:
- The test for equal covariance matrices again rejected equality.
- Since there are only two variables used for the discrimination, we can view the discrimination process in two dimensions. Below are three plots summarizing the discriminant analysis. The plots are done in R, with the data and classifications first exported out of SAS:
    **********************************************************;
    * Export information out into ASCII text file to be used
    * with R;
    data smaller_set;
      set cross_set1;
      keep skden skwt type1 _into_;
    run;

    proc export data=smaller_set outfile =
               "c:\chris\unl\stat873\chapter 7\class_wheat.txt"
               DBMS=DLM Replace;
    run;



    Code is in the classify_plots.R file.
    #I first exported an ASCII text file from SAS which
    # contained the density, weight, class, and _INTO_

    wheat <- read.table(file = "C:\\chris\\UNL\\STAT873\\Chapter 7\\class_wheat.txt", header = TRUE)

    par(pty="s")

    #Original obs.
    plot(x = wheat$skwt, y = wheat$skden, xlab="Weight",
         ylab="Density", type="n", main = "Density vs. Weight
         \n Original observations",
         panel.first=grid(col="gray", lty="dotted"))
    points(x = wheat$skwt[wheat$type1=="Healthy"], y =
           wheat$skden[wheat$type1=="Healthy"], pch = 1, col =
           1, cex = 1)
points(x = wheat$skwt[wheat$type1=="Sprout"], y =
       wheat$skden[wheat$type1=="Sprout"], pch = 2, col =
       2, cex = 1)
points(x = wheat$skwt[wheat$type1=="Scab"], y =
       wheat$skden[wheat$type1=="Scab"], pch = 5, col = 3,
       cex = 1)
legend(locator(1), legend=c("Healthy", "Sprout", "Scab"),
       bg = "white", pch = c(1,2,5), col=c(1,2,3))


#Classified obs.
plot(x = wheat$skwt, y = wheat$skden, xlab="Weight",
     ylab="Density", type="n", main = "Density vs. Weight
     \n Classified observations",
     panel.first=grid(col="gray", lty="dotted"))
points(x = wheat$skwt[wheat$X_INTO_=="Healthy"], y =
       wheat$skden[wheat$X_INTO_=="Healthy"], pch = 1, col
        = 1, cex = 1)
points(x = wheat$skwt[wheat$X_INTO_=="Sprout"], y =
       wheat$skden[wheat$X_INTO_=="Sprout"], pch = 2, col =
       2, cex = 1)
points(x = wheat$skwt[wheat$X_INTO_=="Scab"], y =
       wheat$skden[wheat$X_INTO_=="Scab"], pch = 5, col =
       3, cex = 1)
legend(locator(1), legend=c("Healthy", "Sprout", "Scab"),
       bg = "white", pch = c(1,2,5), col=c(1,2,3))


#Overlaid
plot(x = wheat$skwt, y = wheat$skden, xlab="Weight",
     ylab="Density", type="n", main = "Density vs. Weight
      \n Classified (large points) overlaid on the original
     (small points) observations",
     panel.first=grid(col="gray", lty="dotted"))
points(x = wheat$skwt[wheat$type1=="Healthy"], y =
       wheat$skden[wheat$type1=="Healthy"], pch = 1, col =
       1, cex = 0.75)
points(x = wheat$skwt[wheat$type1=="Sprout"], y =
       wheat$skden[wheat$type1=="Sprout"], pch = 2, col =
       2, cex = 0.75)
points(x = wheat$skwt[wheat$type1=="Scab"], y =
       wheat$skden[wheat$type1=="Scab"], pch = 5, col = 3,
       cex = 0.75)
                    2005 Christopher R. Bilder
                                                       7.76

points(x = wheat$skwt[wheat$X_INTO_=="Healthy"], y =
       wheat$skden[wheat$X_INTO_=="Healthy"], pch = 1, col
       = 1, cex = 1.5)
points(x = wheat$skwt[wheat$X_INTO_=="Sprout"], y =
       wheat$skden[wheat$X_INTO_=="Sprout"], pch = 2, col =
       2, cex = 1.5)
points(x = wheat$skwt[wheat$X_INTO_=="Scab"], y =
       wheat$skden[wheat$X_INTO_=="Scab"], pch = 5, col =
       3, cex = 1.5)

legend(locator(1), legend=c("Healthy", "Sprout", "Scab"),bg
        = "white", pch = c(1,2,5), col=c(1,2,3))




   [Figure: Density vs. Weight, original observations; plotting
    symbols distinguish Healthy, Sprout, and Scab kernels]

   [Figure: Density vs. Weight, classified observations]

   [Figure: Density vs. Weight, classified (large points)
    overlaid on the original (small points) observations]
Question: Suppose more than 3 variables remained after the
variable selection procedure. How could you show the results
of the discriminant analysis?

7.6 Canonical Discriminant Functions

   New variables (called canonical variables) are created
   from an original set in order to reduce the dimensionality
   of the problem. These new variables are used in the
   discriminant analysis.

       - Creating the new variables has goals similar to those
         of PCA and FA, but the computations are different.

       - Hopefully, the canonical variables can be
         interpreted.

       - The objectives are similar to what is shown in the
         plots above: reduce the dimension of the problem so
         that one can graphically view the discriminant
         analysis, as sketched below.
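   As a brief illustration of the idea (a sketch only, not from
   the notes: it uses MASS::lda(), whose linear discriminants
   play the role of the canonical variables, and again assumes a
   data frame named wheat2):

       library(MASS)                     # provides lda()
       fit <- lda(type1 ~ skden + skhard + sksize + skwt + skmst + hrw,
                  data = wheat2)
       scores <- predict(fit)$x          # scores on the canonical variables
       # With 3 populations there are at most 2 canonical variables, so the
       #  6-variable problem can be viewed in a single 2-D scatter plot
       plot(scores[, 1], scores[, 2],
            col = as.numeric(factor(wheat2$type1)), pch = 19,
            xlab = "First canonical variable",
            ylab = "Second canonical variable")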


   This section will not be discussed in class. If you would
   like to know more about it, lecture notes for it are
   available on the schedule web page of the course
   website.




7.7 Nearest Neighbor Discriminant Analysis

    • The previous methods rely on a multivariate normal
      distribution assumption. The methods discussed next do
      not.

    • Consider observation r. Find the closest observation,
      call it observation s. Classify observation r as coming
      from the population corresponding to observation s.


Example: Use Euclidean distance (Mahalanobis distance
with the covariance matrix equal to the identity matrix) to
classify the RED DOT observation.

    • Suppose a bivariate random sample is taken from 3
      populations. The observations are plotted below
      corresponding to their observation number. The goal is
      to classify the RED DOT observation.
   [Figure: Nearest Neighbors using Euclidean distance;
    bivariate samples from populations 1, 2, and 3 plotted
    against x1 and x2, with the closest, 2nd closest, and 3rd
    closest observations to the RED DOT labeled]


    Since the nearest observation corresponds to population
    #1, the observation is classified into population #1.


In general, nearest neighbor discriminant analysis uses the
Mahalanobis distance (with a pooled covariance matrix) to
measure the distance between observations. In addition, the
“k” nearest neighbors are found, and the observation of
interest is classified into the population contributing the
largest proportion of those neighbors. If there is a tie, the
“k+1” nearest neighbors are used. If there is still a tie, the
“k+2” nearest neighbors … A small sketch of the basic rule is
given below.
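To make the rule concrete, here is a minimal R sketch (not
from the notes; the function names pool.cov and
knn.mahalanobis are made up for illustration, and the
tie-breaking step is left out):

    # Pooled covariance matrix from the group sample covariance matrices
    pool.cov <- function(X, pop) {
      groups <- split(as.data.frame(X), pop)
      S <- lapply(groups, cov)              # per-group covariance matrices
      n <- sapply(groups, nrow)             # per-group sample sizes
      Reduce(`+`, Map(`*`, S, n - 1)) / (sum(n) - length(n))
    }

    # Classify x0 by majority vote among its k nearest neighbors, where
    #  distance is the Mahalanobis distance based on the pooled covariance
    knn.mahalanobis <- function(x0, X, pop, k, S.pool) {
      d2 <- mahalanobis(X, center = x0, cov = S.pool)  # squared distances
      nearest <- pop[order(d2)[1:k]]        # populations of the k nearest
      names(which.max(table(nearest)))      # most common population
    }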


Example: Use Euclidean distance to classify the RED DOT
observation.

    If k=3 is used, the RED DOT is classified into population
    #2.


Note that none of this relies on any distributional
assumptions!!! Therefore, this is a “nonparametric”
procedure.


Example: wheat.sas and wheat.dat

     Equal priors are used for now to help demonstrate the
     method.
title2 'k nearest neighbor';
proc discrim data=set1 method=npar k=5 list crosslist crossvalidate;
  class type1;
  var skden skhard sksize skwt skmst hrw;
  priors equal;
run;

title2 'k nearest neighbor';
proc discrim data=set1 method=npar k=5 crossvalidate outcross=cross_set1;
  class type1;
  var skden skwt;
  priors equal;
run;
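
As a rough R analogue of the second run above (an
illustrative sketch, not part of wheat.R: class::knn.cv() uses
Euclidean rather than Mahalanobis distance, so the variables
are scaled first, and wheat2 is again an assumed data frame
name):

    library(class)                    # provides knn.cv()
    X <- scale(wheat2[, c("skden", "skwt")])  # standardize the two variables
    pred <- knn.cv(train = X, cl = wheat2$type1, k = 5)  # leave-one-out 5-NN
    mean(pred != wheat2$type1)        # overall crossvalidation error rate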


                            The DISCRIM Procedure
          Observations      276          DF Total                       275
          Variables           6          DF Within Classes              273
          Classes             3          DF Between Classes               2

                           Class Level Information
            Variable                                                        Prior
type1       Name         Frequency         Weight        Proportion   Probability
Healthy     Healthy             96        96.0000          0.347826      0.333333
Scab        Scab                84        84.0000          0.304348      0.333333
Sprout      Sprout              96        96.0000          0.347826      0.333333


          Classification Results for Calibration Data: WORK.SET1
             Resubstitution Results using 5 Nearest Neighbors

                          Squared Distance Function

                          2                  -1
                         D (X,Y) = (X-Y)' COV (X-Y)

            Posterior Probability of Membership in Each type1

             m (X) = Proportion of obs in group k in 5
              k      nearest neighbors of X

              Pr(j|X) = m (X) PRIOR / SUM ( m (X) PRIOR )
                         j         j   k     k         k

              Posterior Probability of Membership in type1

                From          Classified
       Obs      type1         into type1       Healthy               Scab     Sprout
         1      Healthy       Scab     *        0.3784             0.4324     0.1892
         2      Healthy       Healthy           0.6000             0.0000     0.4000
         3      Healthy       Sprout   *        0.2000             0.0000     0.8000
         4      Healthy       Healthy           1.0000             0.0000     0.0000
         5      Healthy       Scab     *        0.1892             0.4324     0.3784
         6      Healthy       Healthy           0.8000             0.0000     0.2000


                          * Misclassified observation
                          T Tie for largest probability


              Classification Summary for Calibration Data: WORK.SET1
                 Resubstitution Summary using 5 Nearest Neighbors

                            Squared Distance Function
                            2                   -1
                           D (X,Y) = (X-Y)' COV (X-Y)
                Posterior Probability of Membership in Each type1
                 m (X) = Proportion of obs in group k in 5
                  k      nearest neighbors of X
                  Pr(j|X) = m (X) PRIOR / SUM ( m (X) PRIOR )
                              j        j    k      k        k
            Number of Observations and Percent Classified into type1
    From
    type1           Healthy        Scab           Sprout              Other      Total
    Healthy              68           3               22                  3         96
                      70.83        3.13            22.92               3.13     100.00
    Scab                  9          66                8                  1         84
                      10.71       78.57             9.52               1.19     100.00
    Sprout               27           6               57                  6         96
                      28.13        6.25            59.38               6.25     100.00
    Total               104          75               87                 10        276
                      37.68       27.17            31.52               3.62     100.00
    Priors          0.33333     0.33333          0.33333

                          Error Count Estimates for type1
                             Healthy        Scab      Sprout                   Total
           Rate               0.2917      0.2143      0.4063                  0.3041
           Priors             0.3333      0.3333      0.3333


              Classification Results for Calibration Data: WORK.SET1
                Cross-validation Results using 5 Nearest Neighbors
                            Squared Distance Function
                            2                  -1
                           D (X,Y) = (X-Y)' COV (X-Y)
                Posterior Probability of Membership in Each type1
                m (X) = Proportion of obs in group k in 5
                 k      nearest neighbors of X
                 Pr(j|X) = m (X) PRIOR / SUM ( m (X) PRIOR )
                            j         j    k     k        k

                 Posterior    Probability of Membership            in type1
                From          Classified
       Obs      type1         into type1    Healthy                  Scab     Sprout
         1      Healthy       Scab     *     0.3808                0.4307     0.1884
         2      Healthy       Healthy        0.6025                0.0000     0.3975
         3      Healthy       Sprout   *     0.0000                0.0000     1.0000


       276       Scab        Sprout   *     0.0000      0.4354                0.5646
                          * Misclassified observation
                          T Tie for largest probability

              Classification Summary for Calibration Data: WORK.SET1
                Cross-validation Summary using 5 Nearest Neighbors
                            Squared Distance Function
                            2                   -1
                           D (X,Y) = (X-Y)' COV (X-Y)
                Posterior Probability of Membership in Each type1
                 m (X) = Proportion of obs in group k in 5
                  k      nearest neighbors of X
                  Pr(j|X) = m (X) PRIOR / SUM ( m (X) PRIOR )
                              j        j    k      k        k
            Number of Observations and Percent Classified into type1
    From
    type1           Healthy        Scab           Sprout              Other      Total
    Healthy              56           7               33                  0         96
                      58.33        7.29            34.38               0.00     100.00
    Scab                  9          57               14                  4         84
                      10.71       67.86            16.67               4.76     100.00
    Sprout               36          11               49                  0         96
                      37.50       11.46            51.04               0.00     100.00
    Total               101          75               96                  4        276
                      36.59       27.17            34.78               1.45     100.00
    Priors          0.33333     0.33333          0.33333

                          Error Count Estimates for type1

                              Healthy           Scab               Sprout      Total
           Rate                0.4167         0.3214               0.4896     0.4092
           Priors              0.3333         0.3333               0.3333




2nd PROC DISCRIM OUTPUT:
                            The DISCRIM Procedure
          Observations      276          DF Total                         275
          Variables           2          DF Within Classes                273
          Classes             3          DF Between Classes                 2

                           Class Level Information
            Variable                                                          Prior
type1       Name         Frequency         Weight        Proportion     Probability
Healthy     Healthy             96        96.0000          0.347826        0.333333
Scab        Scab                84        84.0000          0.304348        0.333333
Sprout      Sprout              96        96.0000          0.347826        0.333333

          Classification Summary for Calibration Data: WORK.SET1
             Resubstitution Summary using 5 Nearest Neighbors

                          Squared Distance Function
                          2                  -1
                         D (X,Y) = (X-Y)' COV (X-Y)

            Posterior Probability of Membership in Each type1
             m (X) = Proportion of obs in group k in 5
              k      nearest neighbors of X

              Pr(j|X) = m (X) PRIOR / SUM ( m (X) PRIOR )
                         j         j   k     k         k

       Number of Observations and Percent Classified into type1
 From
 type1        Healthy        Scab      Sprout       Other       Total
 Healthy           69           6          16           5          96
                71.88        6.25       16.67        5.21      100.00
 Scab               9          63          11           1          84
                10.71       75.00       13.10        1.19      100.00
 Sprout            13           9          71           3          96
                13.54        9.38       73.96        3.13      100.00
 Total             91          78          98           9         276
                32.97       28.26       35.51        3.26      100.00
 Priors       0.33333     0.33333     0.33333

                       Error Count Estimates for type1

                          Healthy           Scab               Sprout     Total
     Rate                  0.2813         0.2500               0.2604    0.2639
     Priors                0.3333         0.3333               0.3333


                          The DISCRIM Procedure
          Classification Summary for Calibration Data: WORK.SET1
           Cross-validation Summary using 5 Nearest Neighbors

                       Squared Distance Function
                       2                   -1
                      D (X,Y) = (X-Y)' COV (X-Y)
           Posterior Probability of Membership in Each type1
            m (X) = Proportion of obs in group k in 5
             k      nearest neighbors of X
             Pr(j|X) = m (X) PRIOR / SUM ( m (X) PRIOR )
                         j        j    k      k        k

        Number of Observations and Percent Classified into type1
From
type1           Healthy       Scab           Sprout           Other      Total
Healthy              62         10               24               0         96
                  64.58      10.42            25.00            0.00     100.00
Scab                  9         58               16               1         84
                  10.71      69.05            19.05            1.19     100.00
Sprout               22         15               59               0         96
                  22.92      15.63            61.46            0.00     100.00
Total                93         83               99               1        276
                  33.70      30.07            35.87            0.36     100.00
Priors          0.33333    0.33333          0.33333

                      Error Count Estimates for type1
                         Healthy        Scab      Sprout               Total
       Rate               0.3542      0.3095      0.3854              0.3497
       Priors             0.3333      0.3333      0.3333


Notes:
  • The METHOD=NPAR and K=5 options specify that
    nearest neighbor discriminant analysis should be used
    with k=5 nearest neighbors.
  • Why was 5 used? Just for illustrative purposes. In
    practice, try a few different values and see which one
    does the best job with classification.
  • The LIST and CROSSLIST options allow for the printing
    of the posterior probabilities. With equal priors, these are
    calculated as

    $$\Pr(j \mid \mathbf{x}) = \frac{\text{prop. from group } j}{\sum_{i=1}^{3} \text{prop. from group } i}$$


   For example, among observation #1's five nearest neighbors,
   2 come from Healthy (n=96), 2 from Scab (n=84), and 1 comes
   from Sprout (n=96), so the posterior probabilities
   (resubstitution) using PRIORS EQUAL are:

   $$\frac{\text{prop. from Healthy}}{\sum_{i=1}^{3} \text{prop. from group } i} = \frac{2/96}{2/96 + 2/84 + 1/96} = 0.3784$$

   $$\frac{\text{prop. from Scab}}{\sum_{i=1}^{3} \text{prop. from group } i} = \frac{2/84}{2/96 + 2/84 + 1/96} = 0.4324$$

   $$\frac{\text{prop. from Sprout}}{\sum_{i=1}^{3} \text{prop. from group } i} = \frac{1/96}{2/96 + 2/84 + 1/96} = 0.1892$$
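
   These three numbers can be checked quickly in R (just
   arithmetic, nothing from wheat.R):

       m <- c(Healthy = 2/96, Scab = 2/84, Sprout = 1/96)  # prop. from each group
       round(m / sum(m), 4)                                # 0.3784 0.4324 0.1892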


If the PRIORS=PROPORTIONAL option is specified, the
posterior probabilities are calculated as

   $$\Pr(j \mid \mathbf{x}) = \frac{(\text{prop. from group } j)\,p_j}{\sum_{i=1}^{K} (\text{prop. from group } i)\,p_i}$$


Notice that using this option results in posterior
probabilities equal to only 0, 0.2, 0.4, 0.6, 0.8, and 1 for
this data set: with proportional priors p_i = n_i/276, the
group sample sizes cancel, so the posterior reduces to the
number of the 5 nearest neighbors from group j divided by 5.
For example, the posterior probability (resubstitution) that
observation #1 is classified as Healthy is:

   $$\frac{(\text{prop. from Healthy})\,p_{\text{Healthy}}}{\sum_{i=1}^{3} (\text{prop. from group } i)\,p_i} = \frac{(2/96)(96/276)}{(2/96)(96/276) + (2/84)(84/276) + (1/96)(96/276)} = \frac{2}{5}$$

   The corresponding output is not presented here; however,
   this helps to illustrate why proportional priors may be
   more desirable!
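
   The cancellation can be verified numerically as well (again
   just arithmetic):

       p <- c(Healthy = 96, Scab = 84, Sprout = 96) / 276  # proportional priors
       m <- c(Healthy = 2/96, Scab = 2/84, Sprout = 1/96)  # prop. from each group
       m * p / sum(m * p)                                  # 0.4 0.4 0.2 = counts/5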

 • Resubstitution – An observation is classified using the k
   closest observations, including the observation under
   consideration itself.

       Question: Suppose k=1 nearest neighbor
       discriminant analysis is used to classify the
       observations in the “training” data set (i.e., the
       original data set that is used to develop the
       discriminant rule). What is the classification error
       rate?

 • Crossvalidation – An observation is classified using the k
   closest observations, excluding the observation under
   consideration. The error rates for this example appear to
   be worse than the rates from the other discriminant
   analysis methods.
 • Notice the classification error rates are lower when using
   just two of the variables instead of all six!
 • Observations classified as “OTHER” are observations
   that had a tie in the posterior probabilities. I do not
   believe that SAS goes to the next closest observation to
   break the tie.
 • Below are plots of the original data and the
   classifications. Compare the classification plot with those
   obtained using the other discriminant analysis methods.
   Notice the nearest neighbor method does not produce
   absolute distinctions between the classified observations –
   they are intermixed. Also, there is not a simple way to
   overlay these two plots in a SAS graph sheet (the
   OVERLAY option in the PLOT statement will not work).
   One could possibly overlay the plots using PROC
   GREPLAY.




   [Figures: plots of the original data and of the nearest
    neighbor classifications]
Other notes:
 • In order to use k nearest neighbor discriminant analysis on
   a hold out, validation, or new data set, the original or
   training data set needs to be used. With the previous
   discriminant analysis methods, only mean vectors and
   covariance matrices were needed (possibly also priors, …).
 • See the Loess method for fitting nonparametric regression
   models in regression analysis. This method uses some of
   the same “nearest neighbor” ideas to fit the model.
   Section 10.4 of Neter, Kutner, Nachtsheim, and
   Wasserman (1996) has a discussion of it.
 • Johnson (1998) does not give any guidance on choosing k.
   An ad hoc way is to try a few different k values and pick
   the one that gives the lowest classification error rate, as in
   the sketch below.
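
   A minimal sketch of that ad hoc search (illustrative only; it
   reuses class::knn.cv() on the scaled two-variable data, with
   wheat2 again an assumed data frame name):

       library(class)
       X <- scale(wheat2[, c("skden", "skwt")])
       for (k in c(1, 3, 5, 7)) {
         err <- mean(knn.cv(train = X, cl = wheat2$type1, k = k) != wheat2$type1)
         cat("k =", k, " crossvalidation error rate =", round(err, 4), "\n")
       }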
 • When should k nearest neighbor discriminant analysis be
   used?

     When there is reason to believe the data does not come
     from multivariate normal distributions.

 • When does data come from a multivariate normal
   distribution?

     Usually only when it is simulated!

 • What should you do?

     Try all types of discriminant analysis and use the one
     that gives the lowest classification error rates, where the
     error rate is measured by an unbiased or close to
     unbiased classification method. To visually compare the
     classification error rates, trellis dot plots can be used.
     For example, below is a simple plot showing the
     crossvalidation error rates for a number of different
     nearest neighbor discriminant analysis procedures. The
     code is in err_plots.R and the error rates are in
     err_rates_NN.csv.

# Typed all error rates into an Excel file and then saved
#  it as a comma delimited file
nn.err.rates <- read.table(file =
    "C:\\chris\\UNL\\STAT873\\Chapter 7\\err_rates_NN.csv",
    sep = ",", header = TRUE)

# Load package
library(lattice)

# Set color theme for plots (lset() is from older versions of
#  lattice; newer versions use trellis.par.set())
lset(col.whitebg())

dotplot(formula = Method ~ crossval.err | var.used, data = nn.err.rates,
        xlab = "Crossvalidation error rate", ylab = "Method",
        groups = priors, layout = c(1, 2, 1), auto.key = TRUE)




   [Figure: trellis dot plot of the crossvalidation error rates
    (roughly 0.35 to 0.45) for NN with K = 1, 3, 5, and 7, in
    two panels ("Variables = skden and skwt" and
    "Variables = all"), with separate symbols for Priors = equal
    and Priors = proportional]

 • Other types of nearest neighbor methods:

     Instead of using the k nearest neighbors, a region
     around the observation can be examined. This region
     can be specified to be a particular radius around the
     observation of interest. See the R= and KERNEL=
     options for PROC DISCRIM. A rough sketch of the
     idea is below.
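
   A minimal sketch of the fixed-radius idea (not from the
   notes; the function name radius.classify is made up, and
   S.pool is a pooled covariance matrix as before):

       # Classify x0 by majority vote among all observations within radius r
       radius.classify <- function(x0, X, pop, r, S.pool) {
         d2 <- mahalanobis(X, center = x0, cov = S.pool)  # squared distances
         inside <- pop[d2 <= r^2]            # everything within the radius
         if (length(inside) == 0) return(NA) # no neighbors close enough
         names(which.max(table(inside)))     # most common population
       }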
