7.4 Discriminant Rules (More Than Two Populations)

Almost all of the discrimination rules given in Section 7.1 can
be extended to having more than two populations (there is
no longer a single linear discriminant function rule).

Example: Wheat kernels (wheat.sas, wheat.R, wheat.dat)
A researcher at Kansas State University wanted to
classify wheat kernels into “healthy”, “sprout”, or “scab”
types. The researcher developed an automated method
to take the following measurements: kernel density,
hardness index, size index, weight index, and moisture
content. The purpose of the discriminant analysis was to
see if the wheat kernels could accurately be classified
into the three types using the five measurements.

Partial listing of the data set (s=single, k=kernel):
lot    class  kernel  type1    skden1  skden2  skden3  skhard  sksize  skwt   skmst
5911   hrw        1   Healthy    1.35    1.35    1.36   60.33    2.30  24.65  12.02
5911   hrw        2   Healthy    1.29    1.29    1.29   56.09    2.73  33.30  12.17
5911   hrw        3   Healthy    1.24    1.23    1.23   43.99    2.51  31.76  11.88
5911   hrw        4   Healthy    1.34    1.34    1.34   53.82    2.27  32.71  12.11
5911   hrw        5   Healthy    1.26    1.26    1.26   44.39    2.35  26.07  12.06
...
srw4   srw      276   Scab       1.03    1.03    1.03   -9.57    2.06  23.82  12.65

• Lot is the batch of the wheat
• Class is hrw=hard red winter wheat and srw=soft red winter wheat; a new variable called hrw is created to be a binary numerical variable denoting the different classes
 2005 Christopher R. Bilder
7.45
• Kernel is an identification number
• type1 is the kernel type – Healthy, Sprout, Scab
• skden1, skden2, and skden3 are the density of the kernel (the measurement was repeated three times); a new variable called skden is created to be the average of the three measurements
• skhard is the kernel’s hardness
• sksize is the kernel’s size
• skwt is the kernel’s weight
• skmst is the kernel’s moisture content
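The two derived variables described above (skden as the average of the three density readings, and hrw as a 0/1 class indicator) can be sketched as follows. The rows here are made-up stand-ins for wheat.dat, not the real data:

```python
# Sketch of the data-preparation steps described in the bullets above,
# using a few hypothetical rows in place of wheat.dat.
import numpy as np

# columns: skden1, skden2, skden3 (three repeated density measurements)
skden_reps = np.array([[1.35, 1.35, 1.36],
                       [1.29, 1.29, 1.29],
                       [1.03, 1.03, 1.03]])
classes = ["hrw", "hrw", "srw"]

# skden: average of the three repeated density measurements
skden = skden_reps.mean(axis=1)

# hrw: binary numerical variable denoting the wheat class
hrw = np.array([1 if c == "hrw" else 0 for c in classes])

print(skden.round(4).tolist())   # → [1.3533, 1.29, 1.03]
print(hrw)                       # → [1 1 0]
```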

Initial R summary measures (see program for code)
Bubble plot for first three PCs – wheat data

[Figure: scatter of PC2 (vertical) vs. PC1 (horizontal), both axes running roughly from -4 to 4, with kernel numbers as plotting labels.]

 2005 Christopher R. Bilder
Wheat star plot

[Figure: star plot of the wheat kernels, one star per kernel, with rays for hrw, skhard, skden, sksize, and skmst.]

 2005 Christopher R. Bilder
7.47
Parallel coordinate plot for wheat data

[Figure: parallel coordinate plot with one line per kernel, colored by type (Healthy, Sprout, Scab), across the axes wheat2.kernel, wheat2.skden, wheat2.skhard, wheat2.sksize, wheat2.skwt, wheat2.skmst, and wheat2.hrw.]

 2005 Christopher R. Bilder
7.48
Parallel coordinate plot for wheat data – highlight kernel 31

[Figure: the same parallel coordinate plot with kernel 31 drawn in its own color over the Healthy, Sprout, and Scab lines.]

 2005 Christopher R. Bilder
7.49
Parallel coordinate plot for wheat data – sort by type1

[Figure: parallel coordinate plot sorted by kernel type, across the axes wheat.colors2, wheat2.kernel, wheat2.skden, wheat2.skhard, wheat2.sksize, wheat2.skwt, wheat2.skmst, and wheat2.hrw.]

 2005 Christopher R. Bilder
7.50
Initial SAS summary measures (see program for code)
The PRINCOMP Procedure

Observations                    276
Variables                         6

Simple Statistics

          skden          skhard         sksize         skwt           skmst          hrw
Mean      1.191571450    25.91269460    2.200355072    27.43524638    11.18883322    0.5217391304
StD       0.140570141    27.91452693    0.495121266     7.97612238     2.03025511    0.5004345937

Correlation Matrix
skden     skhard    sksize      skwt                 skmst         hrw
skden         1.0000     0.0989    0.2073    0.2764                0.0089      -.0151
skhard        0.0989     1.0000    -.0980    -.3395                -.1883      0.3909
sksize        0.2073     -.0980    1.0000    0.7566                0.0046      0.0248
skwt          0.2764     -.3395    0.7566    1.0000                0.1442      -.1410
skmst         0.0089     -.1883    0.0046    0.1442                1.0000      -.6816
hrw           -.0151     0.3909    0.0248    -.1410                -.6816      1.0000

Eigenvalues of the Correlation Matrix
Eigenvalue    Difference    Proportion    Cumulative
1      2.14089924    0.45206317        0.3568        0.3568
2      1.68883607    0.68799736        0.2815        0.6383
3      1.00083871    0.30098689        0.1668        0.8051
4      0.69985182    0.41680304        0.1166        0.9217
5      0.28304878    0.09652340        0.0472        0.9689
6      0.18652538                      0.0311        1.0000

Eigenvectors
Prin1      Prin2       Prin3    Prin4                Prin5      Prin6
skden        0.182219   0.315600    0.734515 -.557323             0.042130   0.123681
skhard       -.394409   0.195882    0.547652 0.630087             -.198030   -.264431
sksize       0.419028   0.501916    -.115373 0.384230             -.232840   0.597778
skwt         0.542541   0.382239    -.126822 0.079601             0.156216   -.716044
skmst        0.394311   -.461683    0.319975 0.372033             0.601871   0.168304
hrw          -.431070   0.500849    -.169666 0.001902             0.719821   0.128047

Three to four principal components appear to be the true dimension of the data. Note the possible interpretations of the principal components.
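As a check on the output above, the eigenvalues PROC PRINCOMP reports are just the eigenvalues of the correlation matrix. The sketch below feeds the printed (rounded to four decimals) 6×6 correlation matrix to NumPy, so the results agree with the SAS values only to about three decimal places:

```python
# Eigenvalues of the correlation matrix (skden, skhard, sksize, skwt, skmst, hrw),
# entered from the SAS output above; they should be close to the SAS eigenvalues.
import numpy as np

R = np.array([
    [1.0000, 0.0989, 0.2073, 0.2764, 0.0089, -.0151],
    [0.0989, 1.0000, -.0980, -.3395, -.1883, 0.3909],
    [0.2073, -.0980, 1.0000, 0.7566, 0.0046, 0.0248],
    [0.2764, -.3395, 0.7566, 1.0000, 0.1442, -.1410],
    [0.0089, -.1883, 0.0046, 0.1442, 1.0000, -.6816],
    [-.0151, 0.3909, 0.0248, -.1410, -.6816, 1.0000]])

eigvals = np.linalg.eigvalsh(R)[::-1]   # largest first
prop = eigvals / eigvals.sum()          # proportion of total variance
print(eigvals.round(4))                 # close to 2.1409, 1.6888, 1.0008, ...
print(prop.round(4))
```

The eigenvalues sum to 6 (the trace of a 6-variable correlation matrix), so the proportions sum to 1.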
 2005 Christopher R. Bilder
7.51

Kernel #31 is the outlier on the plot.

Notice there is some separation between the different kernel classes. The division is actually between the hard and soft red winter wheat (the hrw variable). It may be of interest to try a separate analysis for each wheat class; I will not do that here and will just use hrw as a variable for discriminating between the wheat kernel types.

 2005 Christopher R. Bilder
7.52
What should be done about kernel #31? Talk to the
researcher to make sure this kernel’s data values are
correct.

Discriminant Analysis:
title2 'Discriminant analysis on the wheat data set - priors proportional';
proc discrim data=set1 method=normal crossvalidate out=list_set outcross=cross_set;
  class type1;
  var skden skhard sksize skwt skmst hrw;
  priors proportional;
run;

title2 'Misclassifications from crossvalidation';
proc print data=cross_set;
  where type1 ne _into_;
  var type1 _into_ healthy sprout scab;
run;

Chris Bilder, STAT 873
Discriminant analysis on the wheat data set - priors proportional

The DISCRIM Procedure
Observations        276          DF Total                       275
Variables             6          DF Within Classes              273
Classes               3          DF Between Classes               2

Class Level Information
Variable                                                       Prior
type1       Name       Frequency           Weight       Proportion   Probability
Healthy     Healthy           96          96.0000         0.347826      0.347826
Scab        Scab              84          84.0000         0.304348      0.304348
Sprout      Sprout            96          96.0000         0.347826      0.347826

Pooled Covariance Matrix Information

Covariance     Natural Log of the Determinant
Matrix Rank    of the Covariance Matrix
     6                  2.83517

Pairwise Generalized Squared Distances Between Groups

D²(i|j) = (X̄i − X̄j)′ COV⁻¹ (X̄i − X̄j) − 2 ln PRIORj

Generalized Squared Distance to type1
From
type1          Healthy          Scab        Sprout
Healthy        2.11211       7.28413       2.75637
Scab           7.01707       2.37917       5.44346
Sprout         2.75637       5.71052       2.11211
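The asymmetry in the table above (e.g., D²(Healthy|Scab) ≠ D²(Scab|Healthy)) comes entirely from the −2 ln PRIORj term. A small sketch of the formula with made-up two-variable means, pooled covariance, and priors (not the wheat estimates):

```python
# D²(i|j) = (x̄i − x̄j)' COV⁻¹ (x̄i − x̄j) − 2 ln(PRIOR_j), with toy numbers.
import numpy as np

xbar = {"Healthy": np.array([1.30, 2.4]),   # hypothetical class means
        "Scab":    np.array([1.05, 2.1])}
cov = np.array([[0.02, 0.00],               # hypothetical pooled covariance
                [0.00, 0.25]])
prior = {"Healthy": 0.6, "Scab": 0.4}       # unequal priors → asymmetric D²

def d2(i, j):
    diff = xbar[i] - xbar[j]
    return diff @ np.linalg.inv(cov) @ diff - 2 * np.log(prior[j])

print(round(d2("Healthy", "Scab"), 4))   # → 5.3176
print(round(d2("Scab", "Healthy"), 4))   # → 4.5067 (same Mahalanobis part,
                                         #   different prior term)
```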

Linear Discriminant Function

Constantj = −.5 X̄j′ COV⁻¹ X̄j + ln PRIORj        Coefficient Vectorj = COV⁻¹ X̄j

Linear Discriminant Function for type1
Variable        Healthy          Scab        Sprout
Constant     -108.89245     -88.62721    -101.75140
skden          92.60313      79.65724      86.94798
skhard         -0.06784      -0.07288      -0.08135
sksize         10.47804      10.41204      10.98232
skwt            0.04965      -0.19559       0.01680
skmst           5.68324       5.79647       5.67197
hrw            18.26673      18.63187      18.46743
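The constants and coefficient vectors in tables like the one above come straight from the formulas Constantj = −.5 X̄j′ COV⁻¹ X̄j + ln PRIORj and Coefficient Vectorj = COV⁻¹ X̄j. A sketch with hypothetical two-variable class means, pooled covariance, and priors (not the wheat estimates); a new observation is assigned to the class with the largest linear score:

```python
# Build the linear discriminant function for each class, then classify one x.
import numpy as np

xbar = {"Healthy": np.array([1.30, 2.4]),
        "Scab":    np.array([1.05, 2.1]),
        "Sprout":  np.array([1.20, 2.3])}
cov_inv = np.linalg.inv(np.array([[0.02, 0.00],
                                  [0.00, 0.25]]))
prior = {"Healthy": 0.35, "Scab": 0.30, "Sprout": 0.35}

consts, coefs = {}, {}
for j, m in xbar.items():
    coefs[j]  = cov_inv @ m                                  # COV⁻¹ x̄j
    consts[j] = -0.5 * m @ cov_inv @ m + np.log(prior[j])    # constant term

# classify a new kernel x by the largest linear score
x = np.array([1.28, 2.38])
scores = {j: consts[j] + coefs[j] @ x for j in xbar}
print(max(scores, key=scores.get))   # → Healthy
```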

Classification Summary for Calibration Data: WORK.SET1
Resubstitution Summary using Linear Discriminant Function

Generalized Squared Distance Function

D²j(X) = (X − X̄j)′ COV⁻¹ (X − X̄j) − 2 ln PRIORj

Posterior Probability of Membership in Each type1

Pr(j|X) = exp(−.5 D²j(X)) / Σk exp(−.5 D²k(X))
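The posterior probabilities are just a softmax of −0.5·D²j(X). A sketch for one observation with made-up squared distances (hypothetical numbers, not a wheat kernel):

```python
# Pr(j|X) = exp(-0.5 D²j) / Σk exp(-0.5 D²k), with toy squared distances.
import numpy as np

d2 = {"Healthy": 2.1, "Scab": 7.3, "Sprout": 2.8}   # D²j(X) for one observation
w = {j: np.exp(-0.5 * v) for j, v in d2.items()}
total = sum(w.values())
post = {j: w[j] / total for j in w}                 # posteriors sum to 1
print({j: round(p, 4) for j, p in post.items()})
# → {'Healthy': 0.5621, 'Scab': 0.0418, 'Sprout': 0.3961}
```

The smallest generalized squared distance always yields the largest posterior, so classifying by minimum D² and by maximum posterior are the same rule.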

Number of Observations and Percent Classified into type1
From
type1        Healthy         Scab       Sprout        Total
Healthy           74            7           15           96
77.08         7.29        15.63       100.00
Scab              10           64           10            84
11.90        76.19        11.90       100.00
Sprout            23           19           54           96
23.96        19.79        56.25       100.00
Total            107           90           79          276
38.77        32.61        28.62       100.00
 2005 Christopher R. Bilder
7.54
Priors         0.34783          0.30435             0.34783

Error Count Estimates for type1
Healthy        Scab      Sprout                Total
Rate                0.2292      0.2381      0.4375               0.3043
Priors              0.3478      0.3043      0.3478

Classification Summary for Calibration Data: WORK.SET1
Cross-validation Summary using Linear Discriminant Function

Generalized Squared Distance Function

D²j(X) = (X − X̄(X)j)′ COV(X)⁻¹ (X − X̄(X)j) − 2 ln PRIORj

(the (X) subscripts indicate that the observation X being classified is left out when the means and covariance matrix are estimated)

Posterior Probability of Membership in Each type1

Pr(j|X) = exp(−.5 D²j(X)) / Σk exp(−.5 D²k(X))
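The (X) subscripts mean the rule is re-estimated without the observation being classified. A generic leave-one-out error sketch, assuming classify(train_X, train_y, x) is any classification rule; a toy one-variable nearest-mean rule on made-up data stands in for the wheat discriminant rule:

```python
# Leave-one-out ("crossvalidation" in PROC DISCRIM's sense) error estimate.
import numpy as np

def loo_error(X, y, classify):
    n = len(y)
    wrong = 0
    for i in range(n):
        keep = np.arange(n) != i                 # drop observation i
        pred = classify(X[keep], y[keep], X[i])  # rule estimated without it
        wrong += (pred != y[i])
    return wrong / n

# toy stand-in classifier: assign x to the class with the nearest mean
def nearest_mean(Xtr, ytr, x):
    labels = sorted(set(ytr))
    means = {c: Xtr[ytr == c].mean() for c in labels}
    return min(labels, key=lambda c: abs(x - means[c]))

X = np.array([1.0, 1.1, 1.2, 2.0, 2.1, 2.2])
y = np.array(["A", "A", "A", "B", "B", "B"])
print(loo_error(X, y, nearest_mean))   # → 0.0 (well-separated toy groups)
```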

Number of Observations and Percent Classified into type1
From
type1        Healthy         Scab       Sprout        Total
Healthy           67            7           22           96
69.79         7.29        22.92       100.00
Scab              11           62           11           84
13.10        73.81        13.10       100.00
Sprout            25           19           52           96
26.04        19.79        54.17       100.00
Total            103           88           85          276
37.32        31.88        30.80       100.00
Priors       0.34783      0.30435      0.34783

Error Count Estimates for type1
Healthy        Scab      Sprout                Total
Rate                0.3021      0.2619      0.4583               0.3442
Priors              0.3478      0.3043      0.3478

Chris Bilder, STAT 873
Misclassifications from crossvalidation

Obs      type1       _INTO_          Healthy          Sprout     Scab
8      Healthy     Sprout          0.45678         0.48732   0.05591
11      Healthy     Scab            0.29606         0.25829   0.44564
14      Sprout      Healthy         0.65448         0.33797   0.00755
15      Sprout      Scab            0.11832         0.25361   0.62806
16      Sprout      Healthy         0.54047         0.42140   0.03813
 2005 Christopher R. Bilder
7.55

271   Scab   Healthy         0.37123        0.28543   0.34334
273   Scab   Healthy         0.47232        0.34115   0.18653

 2005 Christopher R. Bilder
7.56
title2 'Discriminant analysis on the wheat data set';
proc discrim data=set1 method=normal crossvalidate outcross=cross_set;
  class type1;
  var skden skhard sksize skwt skmst hrw;
  priors equal;
run;

title2 'Misclassifications from crossvalidation';
proc print data=cross_set;
  where type1 ne _into_;
  var type1 _into_ healthy sprout scab;
run;

The DISCRIM Procedure
Observations      276           DF Total                      275
Variables           6          DF Within Classes              273
Classes             3          DF Between Classes               2

Class Level Information
Variable                                                       Prior
type1       Name         Frequency         Weight        Proportion   Probability
Healthy     Healthy             96        96.0000          0.347826      0.333333
Scab        Scab                84        84.0000          0.304348      0.333333
Sprout      Sprout              96        96.0000          0.347826      0.333333

Pooled Covariance Matrix Information

Covariance     Natural Log of the Determinant
Matrix Rank    of the Covariance Matrix
     6                  2.83517

Pairwise Generalized Squared Distances Between Groups

D²(i|j) = (X̄i − X̄j)′ COV⁻¹ (X̄i − X̄j)

Generalized Squared Distance to type1
From
type1          Healthy          Scab        Sprout
Healthy              0       4.90497       0.64426
Scab           4.90497             0       3.33136
Sprout         0.64426       3.33136             0

Linear Discriminant Function

Constantj = −.5 X̄j′ COV⁻¹ X̄j        Coefficient Vectorj = COV⁻¹ X̄j

Linear Discriminant Function for type1
Variable        Healthy          Scab        Sprout
Constant     -107.83640     -87.43762    -100.69535
skden          92.60313      79.65724      86.94798
skhard         -0.06784      -0.07288      -0.08135
sksize         10.47804      10.41204      10.98232
skwt            0.04965      -0.19559       0.01680
skmst           5.68324       5.79647       5.67197
hrw            18.26673      18.63187      18.46743

Classification Summary for Calibration Data: WORK.SET1
Resubstitution Summary using Linear Discriminant Function

Generalized Squared Distance Function

D²j(X) = (X − X̄j)′ COV⁻¹ (X − X̄j)

Posterior Probability of Membership in Each type1

Pr(j|X) = exp(−.5 D²j(X)) / Σk exp(−.5 D²k(X))

Number of Observations and Percent Classified into type1
From
type1        Healthy         Scab       Sprout        Total
Healthy           74            7           15            96
77.08         7.29        15.63       100.00
Scab              10           65            9            84
11.90        77.38        10.71       100.00
Sprout            23           20           53            96
23.96        20.83        55.21       100.00
Total            107           92           77          276
38.77        33.33        27.90       100.00
Priors       0.33333      0.33333      0.33333

Error Count Estimates for type1
Healthy        Scab      Sprout            Total
Rate               0.2292      0.2262      0.4479           0.3011
Priors             0.3333      0.3333      0.3333

 2005 Christopher R. Bilder
7.58

Classification Summary for Calibration Data: WORK.SET1
Cross-validation Summary using Linear Discriminant Function

Generalized Squared Distance Function

D²j(X) = (X − X̄(X)j)′ COV(X)⁻¹ (X − X̄(X)j)

Posterior Probability of Membership in Each type1

Pr(j|X) = exp(−.5 D²j(X)) / Σk exp(−.5 D²k(X))

Number of Observations and Percent Classified into type1
From
type1        Healthy         Scab       Sprout        Total
Healthy           66            9           21            96
68.75         9.38        21.88       100.00
Scab              10           65            9            84
11.90        77.38        10.71       100.00
Sprout            25           20           51            96
26.04        20.83        53.13       100.00
Total            101           94           81          276
36.59        34.06        29.35       100.00
Priors       0.33333      0.33333      0.33333

Error Count Estimates for type1
Healthy        Scab      Sprout                Total
Rate                   0.3125      0.2262      0.4688               0.3358
Priors                 0.3333      0.3333      0.3333

Obs      type1      _INTO_         Healthy           Sprout     Scab
8      Healthy    Sprout         0.45316          0.48345   0.06339
11      Healthy    Scab           0.27834          0.24283   0.47882
14      Sprout     Healthy        0.65378          0.33760   0.00862
15      Sprout     Scab           0.10858          0.23273   0.65869
16      Sprout     Healthy        0.53754          0.41912   0.04334
17      Sprout     Scab           0.15010          0.12690   0.72299
18      Sprout     Healthy        0.56431          0.35650   0.07919
19      Sprout     Scab           0.18301          0.16055   0.65644

268      Scab       Healthy      0.52590          0.13806     0.33604
273      Scab       Healthy      0.46007          0.33229     0.20764

Notes:
• The classification error rates are a little better using the PRIORS=PROPORTIONAL option. Note that the proportion in each wheat class is approximately the same.
• Using the linear discriminant rules is a little more complicated when the number of populations is more than 2. For this example, 2 different linear discriminant functions are needed. See wheat.sas for the PROC IML code used to classify the wheat kernels.
• Also contained in wheat.sas is the PROC IML code needed to show how the Mahalanobis distance and the posterior probability are found. Examine this on your own.
• No costs of classification are used here.
• The covariance matrices for healthy, sprout, and scab were found to be unequal (p-value < 0.0001) using the POOL=TEST option in PROC DISCRIM. When the quadratic discriminant rule is used, the classification error rates are a little lower. The actual code and output used to find these rates are excluded from the notes.
• Examine the 3D plot of the principal components for justification of why some classification error rates are larger than others.

 2005 Christopher R. Bilder
7.60
• Summary of classification errors

                                    Classification Error Rates
Actual:                           Healthy     Scab     Sprout   Overall Error
Σ1=Σ2=Σ3,     Resubstitution       22.92%    23.81%    43.75%       30.43%
priors=prop.  Crossvalidation      30.21%    26.19%    45.83%       34.42%

Different Σi, Resubstitution       26.04%    20.24%    33.33%       26.54%
priors=equal  Crossvalidation      31.25%    22.62%    41.67%       31.85%

Σ1=Σ2=Σ3,     Resubstitution       22.92%    22.62%    44.79%       30.11%
priors=equal  Crossvalidation      31.25%    22.62%    46.88%       33.58%

Different Σi, Resubstitution       25.00%    21.43%    31.25%       26.09%
priors=prop.  Crossvalidation      31.25%    23.81%    40.63%       32.25%

7.5 Variable Selection Procedures

In order to find the most parsimonious model that best estimates the dependent variable in regression analysis, variable selection procedures are used to narrow down the number of independent variables. Similar variable selection procedures can be used for discriminant analysis. This helps to eliminate variables that do not help to discriminate between the different populations.

ANCOVA REVIEW (STAT 801)

One-way ANOVA model: Yij = μ + τi + εij

where εij ~ ind. N(0, σ²)
      τi is the effect of treatment i
      μ is the grand mean
      Yij is the response of the jth object to treatment i

Example: Wheat kernels

Let Yij be the hardness of the jth kernel from the ith classification.

Y11 = hardness of kernel 1 from the healthy class
τ1 = healthy effect, τ2 = sprout effect, τ3 = scab effect

Note that if τ1 = τ2 = τ3, there are no mean differences among the kernel types. In this case, would hardness be a good discriminator between the kernel types?

One-way ANCOVA model: Yij = μ + τi + βixij + εij

βi = slope coefficient
xij = covariate

Example: Wheat kernels

xij = variable that has an effect on hardness

Note that if τ1 = τ2 = τ3, there are no mean differences among the kernel types when xij is accounted for. In this case, would hardness be a good discriminator between the kernel types?

Forward selection
1. Find the variable that is the best discriminator among all
the variables. This variable produces the largest F
statistic value in a one-way ANOVA model.

Example: Wheat kernels (sorry about the notation)

skdenij  = μ + τi + εij
skhardij = μ + τi + εij
sksizeij = μ + τi + εij
skwtij   = μ + τi + εij
skmstij  = μ + τi + εij
hrwij    = μ + τi + εij

For each model, perform an F test for Ho: τ1 = τ2 = τ3 vs. Ha: not all equal. In other words, test for equality of the mean density for the 3 kernel types; test for equality of the mean hardness for the 3 kernel types; … The largest F statistic value somewhat equates to the “biggest” difference and thus to the variable that will discriminate between the 3 kernel types the best.
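Step 1 for a single candidate variable is an ordinary one-way ANOVA F statistic. A sketch with made-up density values for the three kernel types (hypothetical numbers, not the wheat data):

```python
# One-way ANOVA F statistic: between-group mean square over within-group
# mean square, computed from scratch for one candidate variable.
import numpy as np

groups = [np.array([1.30, 1.28, 1.33, 1.29]),   # "Healthy" densities
          np.array([1.20, 1.18, 1.22, 1.19]),   # "Sprout"
          np.array([1.05, 1.02, 1.07, 1.04])]   # "Scab"

def anova_f(groups):
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand)**2 for g in groups)
    ss_within = sum(((g - g.mean())**2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

print(round(anova_f(groups), 2))  # a large F: density separates the toy groups
```

In forward selection, this F would be computed for every candidate variable and the largest one chosen.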

2. Find the next best discriminator among all the remaining variables. This variable produces the largest F statistic value in a one-way ANCOVA model with the first variable as a covariate.

Example: Wheat kernels (sorry about the notation)

Suppose hardness had the largest F statistic from the previous step.

skdenij  = μ + τi + βi·skhardij + εij
sksizeij = μ + τi + βi·skhardij + εij
skwtij   = μ + τi + βi·skhardij + εij
skmstij  = μ + τi + βi·skhardij + εij
hrwij    = μ + τi + βi·skhardij + εij

For each model, perform an F test for Ho: τ1 = τ2 = τ3 vs. Ha: not all equal. In other words, test for equality of the mean density for the 3 kernel types adjusting for the hardness effect; test for equality of the mean size for the 3 kernel types adjusting for the hardness effect; … The largest F statistic value somewhat equates to the “biggest” difference and thus to the variable that will discriminate between the 3 kernel types the best after adjusting for the hardness effect.
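The ANCOVA comparison in step 2 can be sketched as two nested least-squares fits: a model with kernel-type effects plus the covariate, against one with the covariate alone. For simplicity a common slope is assumed (the notes write βi, allowing separate slopes), and the data are simulated, not the wheat measurements:

```python
# Partial F test for the group effect τi, adjusting for a covariate x.
import numpy as np

rng = np.random.default_rng(0)
g = np.repeat([0, 1, 2], 10)                  # three kernel types, 10 each
x = rng.normal(size=30)                       # covariate (e.g., hardness)
y = 0.5*x + np.array([0.0, 1.0, 2.0])[g] + rng.normal(scale=0.3, size=30)

def sse(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

dummies = np.eye(3)[g]                        # group indicator columns
sse_full = sse(np.column_stack([dummies, x]), y)    # μ + τi + βx
sse_red  = sse(np.column_stack([np.ones(30), x]), y)  # μ + βx (no τi)
df1, df2 = 2, 30 - 4                          # 2 extra params; n − p(full)
F = ((sse_red - sse_full) / df1) / (sse_full / df2)
partial_r2 = (sse_red - sse_full) / sse_red   # the "partial R-square" of 7.5
print(round(F, 1), round(partial_r2, 3))      # large F → group means differ
```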

3. Find the next best discriminator among all the remaining variables. This variable produces the largest F statistic value in a one-way ANCOVA model with the first and second variables as covariates.
4. Continue the process.
 2005 Christopher R. Bilder
7.64

The forward selection procedure stops when the F-tests
no longer produce any significant outcomes. When there
are no significant variables in step 1, the variables will
not be helpful in discriminating between the populations.

Backward elimination
1. Find the variable that is the worst discriminator among
all the variables. This variable produces the smallest
F statistic value in a one-way ANCOVA model using
the other p-1 variables as covariates. Remove this
variable from the variable pool.

Example: Wheat kernels (sorry about the notation)

Example models:

skdenij  = μ + τi + βi1·skhardij + βi2·sksizeij + βi3·skwtij + βi4·skmstij + βi5·hrwij + εij
sksizeij = μ + τi + βi1·skdenij + βi2·skhardij + βi3·skwtij + βi4·skmstij + βi5·hrwij + εij


2. Find the next worst discriminator among all the
remaining variables. This variable produces the
smallest F statistic value in a one-way ANCOVA model
using the other p-2 variables as covariates. Remove
this variable from the variable pool.
3. Continue the process.
 2005 Christopher R. Bilder
7.65

The backward elimination procedure stops when all of
the F-tests produce a significant outcome.
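The backward-elimination loop can be sketched generically. Here partial_f(v, others) is a hypothetical function that would return the removal F statistic and p-value for variable v given the other retained variables as covariates; the stand-in below just returns fixed illustrative values per variable, ignoring the covariates:

```python
# Backward elimination: repeatedly drop the smallest-F variable until every
# remaining removal test is significant at the stay level (sls).
def backward_eliminate(variables, partial_f, sls=0.01):
    keep = list(variables)
    while len(keep) > 1:
        # removal statistics for each retained variable
        stats = {v: partial_f(v, [u for u in keep if u != v]) for v in keep}
        worst = min(stats, key=lambda v: stats[v][0])   # smallest F
        if stats[worst][1] < sls:     # all tests significant → stop
            break
        keep.remove(worst)
    return keep

# illustrative stand-in: fixed (F, p-value) per variable
table = {"skden": (40.4, 1e-9), "skwt": (56.3, 1e-9), "skhard": (1.5, 0.216),
         "sksize": (0.9, 0.398), "skmst": (0.6, 0.564), "hrw": (0.2, 0.816)}
print(backward_eliminate(table, lambda v, others: table[v]))
# → ['skden', 'skwt']
```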

Stepwise selection
After each forward selection step, backward elimination
is used to determine if any variables can be removed
from the variable pool.

Johnson’s recommendations
• Use backward elimination when the number of variables is at most 15.
• Use stepwise selection when the number of variables is more than 15.
• Use α = 0.01 for backward elimination.
• When using stepwise selection, use 0.25 ≤ α ≤ 0.50 for entry and α = 0.15 for elimination of a variable.
• Use variable selection procedures only as a guide to help choose the variables.

Caveats
• These procedures do not measure how well the variables actually discriminate!
• These procedures are examining differences between means, not differences between entire populations. Means can be different, but populations can “overlap” – see example below.

 2005 Christopher R. Bilder
7.66
Example: Discrimination between 4 populations using 2
variables

In the plot below, both x1 and x2 would probably be found
to be “significant” by the variable selection procedures.
Which variable will do the better job of discriminating
between the 4 populations?

[Figure: f(x) = ε contours of 4 different populations, where ε is a very small number greater than 0, plotted on X1 (horizontal) and X2 (vertical); “+” marks each population mean. There is a difference in means for X2, but an overlap of the populations in that direction; there is a difference in means for X1 and separation of the populations in that direction.]

 2005 Christopher R. Bilder
7.67
In the plot below, both x1 and x2 would probably be found to be “significant” by the variable selection procedures. Which variable do you think would be found to be the “most significant” by the variable selection procedures?

[Figure: a second set of f(x) = ε contours of 4 different populations, where ε is a very small number greater than 0, on X1 and X2 with means marked by “+”. Again there is a difference in means for X2 but an overlap of the populations, while there is a difference in means for X1 and separation of the populations.]

See Figures 7.5 and 7.6 of Johnson (1998).

 2005 Christopher R. Bilder
7.68
Example: Wheat kernels (wheat.sas)

Reduce the number of variables used for the
discriminant analysis without creating a larger
classification error.

PROC STEPDISC can be used to perform the variable
selection procedures. Below is part of the SAS code
and output used to perform backward elimination. See
Johnson (1998, p.249) for a stepwise selection example.
title2 'Backward elimination using alpha=0.01';
proc stepdisc data=set1 method=backward sls=0.01;
  class type1;
  var skden skhard sksize skwt skmst hrw;
run;
The STEPDISC Procedure

The Method for Selecting Variables is BACKWARD
Observations       276           Variable(s) in the Analysis        6
Class Levels         3           Variable(s) will be Included       0
Significance Level to Stay      0.01
Class Level Information
Variable
type1      Name         Frequency       Weight    Proportion
Healthy    Healthy             96      96.0000      0.347826
Scab       Scab                84      84.0000      0.304348
Sprout     Sprout              96      96.0000      0.347826

The STEPDISC Procedure
Backward Elimination: Step 0

All variables have been entered.

Multivariate Statistics
Statistic                          Value F Value              Num DF   Den DF   Pr > F
Wilks' Lambda                   0.491901    19.02                 12      536   <.0001
Pillai's Trace                  0.543834    16.74                 12      538   <.0001
Average Squared                 0.271917
Canonical Correlation

 2005 Christopher R. Bilder
7.69

Backward Elimination: Step 1
Statistics for Removal, DF = 2, 268
Partial
Variable     R-Square    F Value     Pr > F
skden          0.2115      35.95    <.0001
skhard         0.0158       2.15    0.1186
sksize         0.0055       0.74    0.4790
skwt           0.1123      16.96    <.0001
skmst          0.0037       0.49     0.6105
hrw            0.0015       0.20    0.8164
Variable hrw will be removed.

Variable(s) that have been Removed
hrw

Multivariate Statistics
Statistic                        Value F Value              Num DF   Den DF   Pr > F
Wilks' Lambda                 0.492646    22.85                 10      538   <.0001
Pillai's Trace                0.542885    20.12                 10      540   <.0001
Average Squared               0.271442
Canonical Correlation

Backward Elimination: Step 2
Statistics for Removal, DF = 2, 269
Partial
Variable     R-Square    F Value    Pr > F
skden          0.2142      36.67    <.0001
skhard         0.0162       2.21    0.1114
sksize         0.0055       0.75    0.4741
skwt           0.1118      16.94    <.0001
skmst          0.0042       0.57    0.5640
Variable skmst will be removed.

Variable(s) that have been Removed
skmst hrw

Multivariate Statistics
Statistic                        Value F Value              Num DF   Den DF   Pr > F
Wilks' Lambda                 0.494749    28.46                  8      540   <.0001
Pillai's Trace                0.539764    25.04                  8      542   <.0001
Average Squared               0.269882
Canonical Correlation

Backward Elimination: Step 3
Statistics for Removal, DF = 2, 270
Partial
Variable    R-Square    F Value     Pr > F
skden         0.2143      36.82    <.0001
skhard        0.0151       2.06    0.1290
 2005 Christopher R. Bilder
7.70
sksize        0.0068       0.92    0.3983
skwt          0.1097      16.63    <.0001
Variable sksize will be removed.

Variable(s) that have been Removed
sksize skmst hrw

Multivariate Statistics
Statistic                          Value F Value               Num DF   Den DF   Pr > F
Wilks' Lambda                   0.498133    37.66                   6      542   <.0001
Pillai's Trace                  0.533732    33.00                   6      544   <.0001
Average Squared                 0.266866
Canonical Correlation

Backward Elimination: Step 4
Statistics for Removal, DF = 2, 271
Partial
Variable     R-Square    F Value     Pr > F
skden          0.2150      37.10    <.0001
skhard         0.0112       1.54    0.2160
skwt           0.2556      46.52    <.0001
Variable skhard will be removed.

Variable(s) that have been Removed
skhard sksize skmst hrw

Multivariate Statistics
Statistic                          Value F Value               Num DF   Den DF   Pr > F
Wilks' Lambda                   0.503800    55.61                   4      544   <.0001
Pillai's Trace                  0.523153    48.35                   4      546   <.0001
Average Squared                 0.261577
Canonical Correlation

Backward Elimination: Step 5
Statistics for Removal, DF = 2, 272
Partial
Variable    R-Square    F Value     Pr > F
skden         0.2289      40.37    <.0001
skwt          0.2926      56.25    <.0001
No variables can be removed.
No further steps are possible.

Backward Elimination Summary
        Number             Partial                        Wilks'        Pr <
Step        In   Removed  R-Square   F Value   Pr > F     Lambda       Lambda
0            6                   .         .        .     0.49190126   <.0001
1            5   hrw        0.0015      0.20   0.8164     0.49264634   <.0001
2            4   skmst      0.0042      0.57   0.5640     0.49474856   <.0001
3            3   sksize     0.0068      0.92   0.3983     0.49813344   <.0001
4            2   skhard     0.0112      1.54   0.2160     0.50379958   <.0001

                                 Average Squared
        Number                        Canonical      Pr >
Step        In   Removed            Correlation      ASCC
0            6                       0.27191712    <.0001
1            5   hrw                 0.27144225    <.0001
2            4   skmst               0.26988206    <.0001
3            3   sksize              0.26686581    <.0001
4            2   skhard              0.26157659    <.0001

Notes:
 All variables are used at first.
 The “worst” discriminator is hrw since it has the lowest F
statistic value.
 The partial R-square measures the proportionate reduction
in the error sum of squares achieved by adding the
corresponding variable to a model already containing the
other variables of interest
See a regression book such as Neter, Kutner,
Nachtsheim, and Wasserman (1996, Chapter 7)

A low partial R2 means that the variable is not helpful,
and a large partial R2 means that the variable is helpful.
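To make the partial R-square concrete, here is a minimal sketch (in Python, with simulated data; the variables x1 and x2 are hypothetical, not the wheat measurements). The statistic is the proportionate reduction in SSE from adding a variable to a model that already contains the others:

```python
import numpy as np

def sse(X, y):
    # Error sum of squares from an OLS fit of y on X (intercept included)
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return float(resid @ resid)

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 + 0.1 * x2 + rng.normal(size=n)

# Partial R-square for x2 given x1: proportionate reduction in SSE
sse_reduced = sse(x1.reshape(-1, 1), y)          # model without x2
sse_full = sse(np.column_stack([x1, x2]), y)     # model with x2
partial_r2 = (sse_reduced - sse_full) / sse_reduced
print(round(partial_r2, 4))  # small value: x2 adds little, like hrw above
```

A variable with a partial R-square this small would be an early candidate for removal in the backward elimination.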

 The Wilks' lambda and Pillai's trace statistics are used in
multivariate analysis of variance (MANOVA). In their
first appearance in the output, they are testing

Ho: Healthy = Sprout = Scab
Ha: Not all i equal

where

i = (i,skden, i,skhard, i,sksize, i,skwt, i,skmst, i,hrw)′   for i = Healthy, Sprout, and Scab
These statistics will be discussed in Chapter 11.

 The average squared canonical correlation will be
discussed in Chapter 12. For now, it can be interpreted
as a measure of how well the remaining variables
discriminate between the wheat classes. The closer to 1,
the better the variables discriminate.
 Only skden and skwt remain at the end of the backward
elimination procedure.

Next, PROC DISCRIM should be used again to determine
how well the remaining variables discriminate between the
three wheat classes. Hopefully, the classification error rate
has not substantially increased with the removal of the four
variables.

Previous error rates (use all 6 variables):
Classification Error Rates
Method                              Healthy    Scab      Sprout    Overall Error
Crossvalidation, 1=2=3,          30.21%     26.19%    45.83%    34.42%
priors=prop.

Crossvalidation, different i,      31.25%     22.62%    41.67%    31.85%
priors=equal

Crossvalidation, 1=2=3,          31.25%     22.62%    46.88%    33.58%
priors=equal

Crossvalidation, different i,      31.25%     23.81%    40.63%    32.25%
priors=prop.

New error rates (use only density and weight):
Classification Error Rates
Method                              Healthy    Scab      Sprout    Overall Error
Crossvalidation, 1=2=3,          29.17%     23.81%    45.83%    33.33%
priors=prop.

Crossvalidation, different i,      37.50%     25.00%    37.50%    33.33%
priors=equal

Crossvalidation, 1=2=3,          29.17%     23.81%    45.83%    32.94%
priors=equal

Crossvalidation, different i,      37.50%     26.19%    37.50%    34.06%
priors=prop.

It does not look like much (if anything) is lost by using only 2
of the variables instead of all 6!
Notes:
 The test for equal covariances again rejected equality.
 Since there are only two variables used for the
discrimination, we can view the discrimination process in
two dimensions. Below are three plots summarizing the
discriminant analysis. The plots are done in R, with the
data and classifications first exported from SAS:
**********************************************************;
* Export information out into ASCII text file to be used
* with R;
data smaller_set;
set cross_set1;
keep skden skwt type1 _into_;
run;

proc export data=smaller_set outfile =
"c:\chris\unl\stat873\chapter 7\class_wheat.txt"
DBMS=DLM Replace;
run;

Code is in the classify_plots.R file.
#I first exported an ASCII text file from SAS which
#  contained the density, weight, class, and _INTO_

par(pty = "s")

#Original obs.
plot(x = wheat$skwt, y = wheat$skden, xlab = "Weight",
  ylab = "Density", type = "n",
  main = "Density vs. Weight \n Original observations",
  panel.first = grid(col = "gray", lty = "dotted"))
points(x = wheat$skwt[wheat$type1=="Healthy"], y =
  wheat$skden[wheat$type1=="Healthy"], pch = 1, col = 1, cex = 1)
points(x = wheat$skwt[wheat$type1=="Sprout"], y =
  wheat$skden[wheat$type1=="Sprout"], pch = 2, col = 2, cex = 1)
points(x = wheat$skwt[wheat$type1=="Scab"], y =
  wheat$skden[wheat$type1=="Scab"], pch = 5, col = 3, cex = 1)
legend(locator(1), legend = c("Healthy", "Sprout", "Scab"),
  bg = "white", pch = c(1,2,5), col = c(1,2,3))

#Classified obs.
plot(x = wheat$skwt, y = wheat$skden, xlab = "Weight",
  ylab = "Density", type = "n",
  main = "Density vs. Weight \n Classified observations",
  panel.first = grid(col = "gray", lty = "dotted"))
points(x = wheat$skwt[wheat$X_INTO_=="Healthy"], y =
  wheat$skden[wheat$X_INTO_=="Healthy"], pch = 1, col = 1, cex = 1)
points(x = wheat$skwt[wheat$X_INTO_=="Sprout"], y =
  wheat$skden[wheat$X_INTO_=="Sprout"], pch = 2, col = 2, cex = 1)
points(x = wheat$skwt[wheat$X_INTO_=="Scab"], y =
  wheat$skden[wheat$X_INTO_=="Scab"], pch = 5, col = 3, cex = 1)
legend(locator(1), legend = c("Healthy", "Sprout", "Scab"),
  bg = "white", pch = c(1,2,5), col = c(1,2,3))

#Overlaid
plot(x = wheat$skwt, y = wheat$skden, xlab = "Weight",
  ylab = "Density", type = "n",
  main = "Density vs. Weight \n Classified (large points) overlaid
  on the original (small points) observations",
  panel.first = grid(col = "gray", lty = "dotted"))
points(x = wheat$skwt[wheat$type1=="Healthy"], y =
  wheat$skden[wheat$type1=="Healthy"], pch = 1, col = 1, cex = 0.75)
points(x = wheat$skwt[wheat$type1=="Sprout"], y =
  wheat$skden[wheat$type1=="Sprout"], pch = 2, col = 2, cex = 0.75)
points(x = wheat$skwt[wheat$type1=="Scab"], y =
  wheat$skden[wheat$type1=="Scab"], pch = 5, col = 3, cex = 0.75)

points(x = wheat$skwt[wheat$X_INTO_=="Healthy"], y =
  wheat$skden[wheat$X_INTO_=="Healthy"], pch = 1, col = 1, cex = 1.5)
points(x = wheat$skwt[wheat$X_INTO_=="Sprout"], y =
  wheat$skden[wheat$X_INTO_=="Sprout"], pch = 2, col = 2, cex = 1.5)
points(x = wheat$skwt[wheat$X_INTO_=="Scab"], y =
  wheat$skden[wheat$X_INTO_=="Scab"], pch = 5, col = 3, cex = 1.5)

legend(locator(1), legend = c("Healthy", "Sprout", "Scab"),
  bg = "white", pch = c(1,2,5), col = c(1,2,3))

[Plot: "Density vs. Weight - Original observations". Density (0.8 to 2.0) against weight (10 to 40), with separate plotting symbols for Healthy, Sprout, and Scab.]
[Plot: "Density vs. Weight - Classified observations". Same axes, with points coded by the class each kernel was classified into.]


[Plot: "Density vs. Weight - Classified (large points) overlaid on the original (small points) observations".]

Question: Suppose there were more than 3 variables left
after the variable selection procedure, how could you show
the results of the discriminant analysis?

7.6 Canonical Discriminant Functions

New variables (called canonical variables) are created
from an original set in order to reduce the dimensionality
of the problem. These new variables are used in the
discriminant analysis.

Creating the new variables has similar goals as with
PCA and FA, but the computations are different.

Hopefully, the canonical variables can be
interpreted.

The objectives are similar to what is shown on the
graph on the previous page. Reduce the dimension
of the problem so that one can graphically view the
discriminant analysis.

This section will not be discussed in class. If you would
like to know more about it, lecture notes for it are
available on the scheduled web page of the course
website.

7.7 Nearest Neighbor Discriminant Analysis

The previous methods rely on a multivariate normal
distribution assumption. The methods discussed next do
not.

Consider observation r. Find the closest observation,
call it observation s. Classify observation r as coming
from the population corresponding to observation s.

Example: Use Euclidean distance (Mahalanobis distance
with the covariance matrix equal to the identity matrix) to
classify the RED DOT observation.

Suppose a bivariate random sample is taken from 3
populations. The observations are plotted below
corresponding to their observation number. The goal is
to classify the RED DOT observation.
[Plot: "Nearest Neighbors using Euclidean distance" - a scatterplot of x2 vs. x1 with observations labeled by population (1, 2, or 3) and the RED DOT. The closest observation is from population #1; the 2nd and 3rd closest observations are from population #2.]

Since the nearest observation corresponds to population
#1, the observation is classified into population #1.

In general, nearest neighbor discriminant analysis uses the
Mahalanobis distance (with a pooled covariance matrix) to
measure the distance between observations, and the “k”
nearest neighbors are found. The observation of interest is
classified into the population contributing the largest
proportion of these neighbors. If there is a tie, the “k+1”
nearest neighbors are used; if there is still a tie, the “k+2”
nearest neighbors, and so on.

Example: Use Euclidean distance to classify the RED DOT
observation.

If k=3 is used, the RED DOT is classified into population
#2.

Note that none of this relies on any distributional
assumptions!!! Therefore, this is a “nonparametric”
procedure.

Example: wheat.sas and wheat.dat

Equal priors are used for now to help demonstrate the
method.
title2 'k nearest neighbor';
proc discrim data=set1 method=npar k=5 list crosslist crossvalidate;
class type1;
var skden skhard sksize skwt skmst hrw;
priors equal;
run;

title2 'k nearest neighbor';
proc discrim data=set1 method=npar k=5 crossvalidate outcross=cross_set1;
class type1;
var skden skwt;
priors equal;
run;

The DISCRIM Procedure
Observations      276          DF Total                       275
Variables           6          DF Within Classes              273
Classes             3          DF Between Classes               2

Class Level Information
Variable                                                        Prior
type1       Name         Frequency         Weight        Proportion   Probability
Healthy     Healthy             96        96.0000          0.347826      0.333333
Scab        Scab                84        84.0000          0.304348      0.333333
Sprout      Sprout              96        96.0000          0.347826      0.333333

Classification Results for Calibration Data: WORK.SET1
Resubstitution Results using 5 Nearest Neighbors

Squared Distance Function

D²(X,Y) = (X-Y)' COV⁻¹ (X-Y)

Posterior Probability of Membership in Each type1

m_k(X) = proportion of obs in group k among the 5 nearest neighbors of X

Pr(j|X) = m_j(X) PRIOR_j / SUM_k ( m_k(X) PRIOR_k )

Posterior Probability of Membership in type1

From          Classified
Obs      type1         into type1       Healthy               Scab     Sprout
1      Healthy       Scab     *        0.3784             0.4324     0.1892
2      Healthy       Healthy           0.6000             0.0000     0.4000
3      Healthy       Sprout   *        0.2000             0.0000     0.8000
4      Healthy       Healthy           1.0000             0.0000     0.0000
5      Healthy       Scab     *        0.1892             0.4324     0.3784
6      Healthy       Healthy           0.8000             0.0000     0.2000


* Misclassified observation
T Tie for largest probability

Classification Summary for Calibration Data: WORK.SET1
Resubstitution Summary using 5 Nearest Neighbors

Squared Distance Function
D²(X,Y) = (X-Y)' COV⁻¹ (X-Y)
Posterior Probability of Membership in Each type1
m_k(X) = proportion of obs in group k among the 5 nearest neighbors of X
Pr(j|X) = m_j(X) PRIOR_j / SUM_k ( m_k(X) PRIOR_k )
Number of Observations and Percent Classified into type1
From
type1           Healthy        Scab           Sprout              Other      Total
Healthy              68           3               22                  3         96
0.83        3.13            22.92               3.13     100.00
Scab                  9          66                8                  1         84
10.71       78.57             9.52               1.19     100.00
Sprout               27           6               57                  6         96
28.13        6.25            59.38               6.25     100.00
Total               104          75               87                 10        276
37.68       27.17            31.52               3.62     100.00
Priors          0.33333     0.33333          0.33333

Error Count Estimates for type1
Healthy        Scab      Sprout                   Total
Rate               0.2917      0.2143      0.4063                  0.3041
Priors             0.3333      0.3333      0.3333

Classification Results for Calibration Data: WORK.SET1
Cross-validation Results using 5 Nearest Neighbors
Squared Distance Function
D²(X,Y) = (X-Y)' COV⁻¹ (X-Y)
Posterior Probability of Membership in Each type1
m_k(X) = proportion of obs in group k among the 5 nearest neighbors of X
Pr(j|X) = m_j(X) PRIOR_j / SUM_k ( m_k(X) PRIOR_k )

Posterior Probability of Membership in type1
From          Classified
Obs      type1         into type1    Healthy                  Scab     Sprout
1      Healthy       Scab     *     0.3808                0.4307     0.1884
2      Healthy       Healthy        0.6025                0.0000     0.3975
3      Healthy       Sprout   *     0.0000                0.0000     1.0000


276       Scab        Sprout   *     0.0000      0.4354                0.5646
* Misclassified observation
T Tie for largest probability

Classification Summary for Calibration Data: WORK.SET1
Cross-validation Summary using 5 Nearest Neighbors
Squared Distance Function
D²(X,Y) = (X-Y)' COV⁻¹ (X-Y)
Posterior Probability of Membership in Each type1
m_k(X) = proportion of obs in group k among the 5 nearest neighbors of X
Pr(j|X) = m_j(X) PRIOR_j / SUM_k ( m_k(X) PRIOR_k )
Number of Observations and Percent Classified into type1
From
type1           Healthy        Scab           Sprout              Other      Total
Healthy              56           7               33                  0         96
58.33        7.29            34.38               0.00     100.00
Scab                  9          57               14                  4         84
10.71       67.86            16.67               4.76     100.00
Sprout               36          11               49                  0         96
37.50       11.46            51.04               0.00     100.00
Total               101          75               96                  4        276
36.59       27.17            34.78               1.45     100.00
Priors          0.33333     0.33333          0.33333

Error Count Estimates for type1

Healthy           Scab               Sprout      Total
Rate                0.4167         0.3214               0.4896     0.4092
Priors              0.3333         0.3333               0.3333

2nd PROC DISCRIM OUTPUT:
The DISCRIM Procedure
Observations      276          DF Total                         275
Variables           2          DF Within Classes                273
Classes             3          DF Between Classes                 2

Class Level Information
Variable                                                          Prior
type1       Name         Frequency         Weight        Proportion     Probability
Healthy     Healthy             96        96.0000          0.347826        0.333333
Scab        Scab                84        84.0000          0.304348        0.333333
Sprout      Sprout              96        96.0000          0.347826        0.333333

Classification Summary for Calibration Data: WORK.SET1
Resubstitution Summary using 5 Nearest Neighbors

Squared Distance Function

D²(X,Y) = (X-Y)' COV⁻¹ (X-Y)

Posterior Probability of Membership in Each type1
m_k(X) = proportion of obs in group k among the 5 nearest neighbors of X

Pr(j|X) = m_j(X) PRIOR_j / SUM_k ( m_k(X) PRIOR_k )

Number of Observations and Percent Classified into type1
From
type1        Healthy        Scab      Sprout       Other       Total
Healthy           69           6          16           5          96
71.88        6.25       16.67        5.21      100.00
Scab               9          63          11           1          84
10.71       75.00       13.10        1.19      100.00
Sprout            13           9          71           3          96
13.54        9.38       73.96        3.13      100.00
Total             91          78          98           9         276
32.97       28.26       35.51        3.26      100.00
Priors       0.33333     0.33333     0.33333

Error Count Estimates for type1

Healthy           Scab               Sprout     Total
Rate                  0.2813         0.2500               0.2604    0.2639
Priors                0.3333         0.3333               0.3333

The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.SET1
Cross-validation Summary using 5 Nearest Neighbors

Squared Distance Function
D²(X,Y) = (X-Y)' COV⁻¹ (X-Y)
Posterior Probability of Membership in Each type1
m_k(X) = proportion of obs in group k among the 5 nearest neighbors of X
Pr(j|X) = m_j(X) PRIOR_j / SUM_k ( m_k(X) PRIOR_k )

Number of Observations and Percent Classified into type1
From
type1           Healthy       Scab           Sprout           Other      Total
Healthy              62         10               24               0         96
64.58      10.42            25.00            0.00     100.00
Scab                  9         58               16               1         84
10.71      69.05            19.05            1.19     100.00
Sprout               22         15               59               0         96
22.92      15.63            61.46            0.00     100.00
Total                93         83               99               1        276
33.70      30.07            35.87            0.36     100.00
Priors          0.33333    0.33333          0.33333

Error Count Estimates for type1
Healthy        Scab      Sprout               Total
Rate               0.3542      0.3095      0.3854              0.3497
Priors             0.3333      0.3333      0.3333

Notes:
 The METHOD=NPAR and K=5 options specify that
nearest neighbor discriminant analysis should be used
with k=5 nearest neighbors.
 Why was 5 used? Just for illustrative purposes. In
practice, try a few different values and see which one
does the best job with classification.
 The LIST and CROSSLIST options allow for the printing
of the posterior probabilities. These are calculated as

(prop. from group j) / [ SUM(i=1 to 3) (prop. from group i) ]

where "prop. from group j" denotes (# of the 5 nearest
neighbors from group j) / (sample size of group j).

For example, the posterior probabilities (resubstitution)
for observation #1 using PRIORS EQUAL are:

Pr(Healthy|x) = (2/96) / (2/96 + 2/84 + 1/96) = 0.3784
Pr(Scab|x)    = (2/84) / (2/96 + 2/84 + 1/96) = 0.4324
Pr(Sprout|x)  = (1/96) / (2/96 + 2/84 + 1/96) = 0.1892
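This arithmetic can be checked directly in Python (counts of the 5 nearest neighbors from each group divided by the group sample sizes, as in the output for observation #1):

```python
# 2 of the 5 nearest neighbors came from Healthy (n=96),
# 2 from Scab (n=84), and 1 from Sprout (n=96)
props = {"Healthy": 2 / 96, "Scab": 2 / 84, "Sprout": 1 / 96}
total = sum(props.values())

posteriors = {g: p / total for g, p in props.items()}
print({g: round(p, 4) for g, p in posteriors.items()})
# → {'Healthy': 0.3784, 'Scab': 0.4324, 'Sprout': 0.1892}
```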

If the PRIORS=PROPORTIONAL option is specified, the
posterior probabilities are calculated as

(prop. from group j)(p_j) / [ SUM(i=1 to K) (prop. from group i)(p_i) ]

Notice that using this option results in posterior
probabilities equal to 0, 0.2, 0.4, 0.6, 0.8, and 1 only for
this data set. For example, the posterior probability
(resubstitution) that observation #1 is classified as
healthy is:

(prop. from Healthy)(p_Healthy) / [ SUM(i=1 to 3) (prop. from group i)(p_i) ]
  = (2/96)(96/276) / [ (2/96)(96/276) + (2/84)(84/276) + (1/96)(96/276) ]
  = (2/276) / (5/276)
  = 2/5

This corresponding output is not presented here;
however, this helps to illustrate why proportional priors
may be more desirable!
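The same check for proportional priors, using exact fractions in Python to show that the group sizes cancel and the posterior reduces to 2/5:

```python
from fractions import Fraction as F

counts = {"Healthy": F(2, 96), "Scab": F(2, 84), "Sprout": F(1, 96)}
priors = {"Healthy": F(96, 276), "Scab": F(84, 276), "Sprout": F(96, 276)}

# Each term (prop. from group i) * p_i reduces to (neighbor count)/276
den = sum(counts[g] * priors[g] for g in counts)
posterior_healthy = counts["Healthy"] * priors["Healthy"] / den
print(posterior_healthy)  # → 2/5
```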

 Resubstitution – An observation is classified using the k
closest observations including the observation under
consideration itself.

Question: Suppose k=1 nearest neighbor
discriminant analysis is used to classify the
observations in the “training” data set (i.e., the
original data set that is used to develop the
discriminant rule). What is the classification error
rate?
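One way to see the answer: with k=1 under resubstitution, every observation is its own nearest neighbor (distance zero), so the apparent error rate is 0%. A quick sketch in Python (arbitrary simulated data; labels are random):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(30, 2))
labels = rng.integers(0, 3, size=30)

errors = 0
for i in range(len(X)):
    d2 = np.sum((X - X[i]) ** 2, axis=1)  # includes X[i] itself (distance 0)
    nearest = int(np.argmin(d2))          # resubstitution: self is allowed
    errors += int(labels[nearest] != labels[i])

print(errors)  # → 0: every point is its own nearest neighbor
```

This is why resubstitution error rates from nearest neighbor methods can be very optimistic.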

 Crossvalidation – An observation is classified using the k
closest observations excluding the observation under
consideration. The error rates for this example appear to
be worse than the rates from the other discriminant
analysis methods.
 Notice the classification error rates are lower when using
just two of the variables instead of all six!
 Observations classified as “OTHER” are observations
that had a tie in the posterior probabilities. I do not
believe that SAS goes to the next closest observation to
break the tie.
 Below are plots of the original data and the
classifications. Compare the classification plot with those
obtained using the other discriminant analysis methods.
Notice the nearest neighbor method does not have absolute
distinctions between the classified observations – they
are intermixed. Also, there is not a simple way to overlay
these two plots in a SAS graph sheet (the OVERLAY
option in the PLOT statement will not work). One could
possibly overlay the plots using PROC GREPLAY.

[Two plots appeared here: the original observations and the nearest neighbor classifications, density vs. weight.]
Other notes:
 In order to use k nearest neighbor discriminant analysis on
a hold-out, validation, or new data set, the original (training)
data set must be retained. With the previous
discriminant analysis methods, only the mean vectors and
covariance matrices were needed (and possibly the priors).
 See the Loess method for fitting nonparametric regression
models in regression analysis. This method uses some of
the same “nearest neighbor” ideas to fit the model.
Section 10.4 of Neter, Kutner, Nachtsheim, and
Wasserman (1996) has a discussion on it.
 Johnson (1998) does not give any guidance on choosing k.
An ad hoc way is to try a few different k values and pick
the one that gives the lowest classification error rate.
 When should k nearest neighbor discriminant analysis be
used?

When there is reason to believe the data does not come
from multivariate normal distributions.

 When does data come from a multivariate normal
distribution?

Usually only when it is simulated!

 What should you do?

Try all types of discriminant analysis and use the one
that gives the lowest classification error rates where the
error rate is measured by an unbiased or close to
unbiased classification method. To visually see the
classification error rates, trellis dot plots can be used.
For example, below is a simple plot showing the
crossvalidation error rates for a number of different
nearest neighbor discriminant analysis procedures. The
code is in err_plots.R and the error rates are in
err_rates_NN.csv.

#Typed all error rates into an Excel file and then saved
#  it as a comma delimited file
nn.err.rates <- read.csv(file =
  "C:\\chris\\UNL\\STAT873\\Chapter 7\\err_rates_NN.csv")

library(lattice)

lset(col.whitebg()) #Set color theme for plots

dotplot(formula = Method ~ crossval.err | var.used, data =
nn.err.rates, xlab = "Crossvalidation error rate",
ylab = "Method", groups = priors, layout = c(1, 2,
1), auto.key = TRUE)

[Trellis dot plot: crossvalidation error rates (roughly 0.35 to 0.45) for NN with K = 1, 3, 5, and 7; one panel for "Variables = all" and one for "Variables = skden and skwt"; point groups distinguish priors = equal vs. priors = proportional.]

 Other types of nearest neighbor methods:

Instead of using the k nearest neighbors, a region
around the observation can be examined. This region
can be specified to be a particular radius around the
observation of interest. See the R= and KERNEL=
options for PROC DISCRIM.
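The radius idea can be sketched as follows (Python; the function and the radius r below are hypothetical illustrations of a uniform-kernel rule, not PROC DISCRIM's implementation):

```python
import numpy as np
from collections import Counter

def radius_classify(x, X, labels, r):
    # Majority vote among all training points within Euclidean distance r of x
    labels = np.asarray(labels)
    d = np.sqrt(np.sum((X - x) ** 2, axis=1))
    inside = labels[d <= r]
    if len(inside) == 0:
        return None  # no neighbors fall inside the radius
    return Counter(inside).most_common(1)[0][0]

# Hypothetical data from 2 populations
X = np.array([[0.0, 0.0], [0.3, 0.2], [2.0, 2.0], [2.2, 1.8]])
labels = [1, 1, 2, 2]
print(radius_classify(np.array([0.1, 0.1]), X, labels, r=1.0))  # → 1
```

Unlike the k nearest neighbor rule, the number of voting neighbors varies with the local density of the training data, and an observation far from all training points may have no neighbors at all.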

