# Supervised Classification

Posted: 7/30/2012

Bayesian Discriminant Analysis
• This supervised learning technique uses Bayes’ rule but is different in philosophy from the well-known work of Aitken, Taroni, et al.
• Bayes’ rule:

$$\Pr(\text{grp i.d.} \mid \mathbf{x}) = \frac{\Pr(\mathbf{x} \mid \text{grp i.d.})\,\Pr(\text{grp i.d.})}{\Pr(\mathbf{x})}$$

• Pr denotes probability; Pr(grp i.d.) is the prior probability, and choosing it can be a problem!
• The equation answers: “How does the probability of an item being a member of a group change, given evidence x?”
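As a plain-number sketch of the rule (the priors and likelihoods below are made-up illustrative values, not from the slides):

```python
# Sketch: Bayes' rule for a two-group classification problem.
priors = {"grp1": 0.5, "grp2": 0.5}          # Pr(grp i.d.), the prior
likelihoods = {"grp1": 0.20, "grp2": 0.05}   # Pr(x | grp i.d.) for the observed evidence x

# Pr(x) = sum over groups of Pr(x | grp) * Pr(grp)  (law of total probability)
evidence = sum(likelihoods[g] * priors[g] for g in priors)

# Posterior Pr(grp i.d. | x) for each group
posteriors = {g: likelihoods[g] * priors[g] / evidence for g in priors}
print(posteriors)  # grp1: 0.8, grp2: 0.2
```

With equal priors the posterior is driven entirely by the likelihoods, which is why the decision boundaries fall where the likelihood curves cross.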
Bayesian Discriminant Analysis
• Bayes’ rule can be turned into a classification rule:

$$\text{If } \Pr(\mathbf{x} \mid \text{grp 1})\Pr(\text{grp 1}) > \Pr(\mathbf{x} \mid \text{grp 2})\Pr(\text{grp 2}) \;\Rightarrow\; \text{choose group 1}$$

*If the priors are both 0.5, the decision boundaries are where the likelihood curves cross.
Bayes-Gaussian Discriminant Analysis
• If the data are multivariate normal and drawn from populations with the same covariance structure, the decision rule becomes:

$$\text{grp } j = \underset{j=1,\ldots,k}{\arg\min}\; d_j(\mathbf{x}_{unk})$$

with the “distance” defined as:

$$d_j(\mathbf{x}_{unk}) = -\bar{\mathbf{x}}_j^{T}\mathbf{S}_{pl}^{-1}\mathbf{x}_{unk} + \frac{1}{2}\bar{\mathbf{x}}_j^{T}\mathbf{S}_{pl}^{-1}\bar{\mathbf{x}}_j - \ln\Pr(\text{grp } j)$$

where the pooled covariance matrix (like an average of the group covariance matrices) is:

$$\mathbf{S}_{pl} = \frac{1}{n-k}\sum_{i=1}^{k}(n_i - 1)\mathbf{S}_i$$

• Note that if the data is just 1D this is just an equation for a line:

$$d_j(x_{unk}) = \underbrace{-\frac{\bar{x}_j}{s^2}}_{\text{slope}}\,x_{unk} + \underbrace{\frac{\bar{x}_j^2}{2s^2} - \ln\Pr(\text{grp } j)}_{\text{intercept}}$$
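The 1D linear “decision distance” can be sketched in a few lines of Python; the group means, pooled standard deviation, and priors below are made-up illustrative values, not values from the slides:

```python
import math

# Sketch: 1D linear Bayes-Gaussian decision distance (equal group variance s^2).
def d_linear(x_unk, xbar_j, s, prior_j):
    # d_j(x) = -(xbar_j / s^2) * x + xbar_j^2 / (2 s^2) - ln Pr(grp j)
    return -(xbar_j / s**2) * x_unk + xbar_j**2 / (2 * s**2) - math.log(prior_j)

groups = {"grp1": (2.0, 0.5), "grp2": (5.0, 0.5)}  # (group mean, prior)
s = 1.0        # pooled standard deviation shared by both groups
x_unk = 2.4    # the unknown observation

# Assign to the group with the SMALLEST decision distance (the arg-min rule)
label = min(groups, key=lambda g: d_linear(x_unk, groups[g][0], s, groups[g][1]))
print(label)  # x_unk = 2.4 lies nearest the grp1 mean, so grp1 wins
```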
Bayes-Gaussian Discriminant Analysis
• If the data are multivariate normal but drawn from populations with different covariance structures, the decision rule is the same but the “decision distance” becomes:

$$d_j(\mathbf{x}_{unk}) = \frac{1}{2}\mathbf{x}_{unk}^{T}\mathbf{S}_{j}^{-1}\mathbf{x}_{unk} - \bar{\mathbf{x}}_j^{T}\mathbf{S}_{j}^{-1}\mathbf{x}_{unk} + \frac{1}{2}\bar{\mathbf{x}}_j^{T}\mathbf{S}_{j}^{-1}\bar{\mathbf{x}}_j - \ln\Pr(\text{grp } j)$$

• Note that if the data is just 1D this is an equation for a parabola:

$$d_j(x_{unk}) = \underbrace{\frac{1}{2s_j^2}}_{a}\,x_{unk}^2 \;\underbrace{-\,\frac{\bar{x}_j}{s_j^2}}_{b}\,x_{unk} + \underbrace{\left(\frac{\bar{x}_j^2}{2s_j^2} - \ln\Pr(\text{grp } j)\right)}_{c}$$
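The 1D parabola version, following the slide’s formula, can be sketched the same way; the group means, standard deviations, and priors are again made-up illustrative values:

```python
import math

# Sketch: 1D quadratic decision distance with group-specific variance s_j^2.
def d_quadratic(x_unk, xbar_j, s_j, prior_j):
    a = 1.0 / (2 * s_j**2)                                 # parabola coefficient a
    b = -xbar_j / s_j**2                                   # parabola coefficient b
    c = xbar_j**2 / (2 * s_j**2) - math.log(prior_j)       # parabola coefficient c
    return a * x_unk**2 + b * x_unk + c

# A narrow group (grp1) and a wide group (grp2), equal priors: (mean, sd, prior)
groups = {"grp1": (0.0, 1.0, 0.5), "grp2": (4.0, 3.0, 0.5)}
x_unk = 0.5

label = min(groups, key=lambda g: d_quadratic(x_unk, *groups[g]))
print(label)  # x_unk = 0.5 sits well inside the narrow grp1, so grp1 wins
```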
Bayes-Gaussian Discriminant Analysis
• The “quadratic” version is always called quadratic discriminant analysis (QDA)
• The “linear” version is called by a number of names!
• linear discriminant analysis, LDA
• Some combination of the above with the words Gaussian or classification
• A number of techniques use the name LDA!
• It is important to specify the equations used to tell the difference!
Bayes-Gaussian Discriminant Analysis

Groups have similar covariance structure: the linear discriminant rule should work well.
Groups have different covariance structure: the quadratic discriminant rule may work better.
Canonical Variate Analysis
• This supervised technique is called Linear
Discriminant Analysis (LDA) in R
• Also called Fisher linear discriminant analysis
• CVA is closely related to linear Bayes-Gaussian
discriminant analysis
• Works on a principle similar to PCA: look for “interesting” directions in data space
• CVA: find the directions in space which best separate the groups
• Technically: find the directions which maximize the ratio of between-group to within-group variation
Canonical Variate Analysis
Project onto PC1: not necessarily good group separation!
Project onto CV1: good group separation!

Note: there are #groups − 1 or p CVs, whichever is smaller.
Canonical Variate Analysis
• Use the between-group to within-group covariance matrix, $\mathbf{W}^{-1}\mathbf{B}$, to find the directions of best group separation (an eigenvalue problem):

$$\mathbf{W}^{-1}\mathbf{B}\,\mathbf{A}_{cv} = \mathbf{A}_{cv}\boldsymbol{\Lambda}$$
• CVA can be used for dimension reduction.
• Caution! These “dimensions” are not at right
angles (i.e. not orthogonal)
• CVA plots can thus be distorted from reality
• Caution! CVA will not work well with very
correlated data
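For exactly two groups, the W⁻¹B eigenproblem has a closed-form first direction, a proportional to W⁻¹(x̄₁ − x̄₂), which makes the idea easy to sketch without an eigensolver. The 2D data below are made up for illustration:

```python
# Sketch: first canonical variate for TWO groups, using the closed-form
# solution a = W^{-1} (xbar1 - xbar2) of the W^{-1} B eigenproblem.
grp1 = [(1.0, 2.0), (1.5, 2.4), (0.8, 1.9), (1.2, 2.2)]
grp2 = [(3.0, 2.1), (3.4, 2.6), (2.9, 2.0), (3.3, 2.3)]

def mean(pts):
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

def scatter(pts, m):
    # within-group scatter: sums of squares and cross-products of centered points
    sxx = sum((p[0] - m[0])**2 for p in pts)
    syy = sum((p[1] - m[1])**2 for p in pts)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in pts)
    return [[sxx, sxy], [sxy, syy]]

m1, m2 = mean(grp1), mean(grp2)
S1, S2 = scatter(grp1, m1), scatter(grp2, m2)
W = [[S1[i][j] + S2[i][j] for j in range(2)] for i in range(2)]  # pooled within-group scatter

# Invert the 2x2 matrix W and apply it to the mean difference
det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
diff = (m1[0] - m2[0], m1[1] - m2[1])
a = ((W[1][1] * diff[0] - W[0][1] * diff[1]) / det,
     (-W[1][0] * diff[0] + W[0][0] * diff[1]) / det)

# Projecting onto a separates the groups: every grp1 score exceeds every grp2 score
scores1 = [a[0] * p[0] + a[1] * p[1] for p in grp1]
scores2 = [a[0] * p[0] + a[1] * p[1] for p in grp2]
print(min(scores1) > max(scores2))
```

With more than two groups the full eigendecomposition of W⁻¹B is needed, which is what `lda` in R computes internally.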
Canonical Variate Analysis
2D CVA of gasoline data set:   2D PCA of gasoline data set:
Canonical Variate Analysis
• Distance metric used in CVA to assign the group i.d. of an unknown data point:

$$\text{grp } j = \underset{j=1,\ldots,k}{\arg\min}\; d_j(\mathbf{x}_{unk})$$

$$d_j(\mathbf{x}_{unk}) = -\bar{\mathbf{x}}_j^{T}\mathbf{A}_{cv}\mathbf{A}_{cv}^{T}\mathbf{x}_{unk} + \frac{1}{2}\bar{\mathbf{x}}_j^{T}\mathbf{A}_{cv}\mathbf{A}_{cv}^{T}\bar{\mathbf{x}}_j - \ln\Pr(\text{grp } j)$$
• If data is Gaussian and group covariance structures
are the same then CVA classification is the same as
Bayes-Gaussian classification.
*Now Exercise:
Explore some data sets with:
lda_group_explore.R
Try simple supervised classification with:
lda_group_predict.R
lda_group_predict2.R
Partial Least Squares Discriminant Analysis
• PLS-DA is a supervised discrimination technique that is very popular in chemometrics
• Works well with highly correlated variables (like in
spectroscopy)
• Lots of correlation causes CVA to fail!
• Group labels coded into a “response matrix” Y
• PLS searches for directions of maximum covariance in
X and Y.
• Dimension reduction
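The “directions of maximum covariance” idea can be sketched for the first PLS component: with column-centered X and a centered, group-coded response y, the first weight vector is w proportional to Xᵀy, the direction along which the score t = Xw has maximum covariance with y. The data below are made up for illustration:

```python
# Sketch: first PLS weight vector for a two-group problem.
X = [[2.0, 1.0], [2.2, 0.9], [0.5, 3.0], [0.3, 3.1]]   # 4 observations, 2 variables
y = [1.0, 1.0, -1.0, -1.0]                              # group membership coded +/-1

n, p = len(X), len(X[0])
col_means = [sum(row[j] for row in X) / n for j in range(p)]
Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]   # center the columns of X
yc = [yi - sum(y) / n for yi in y]                              # center y

# w proportional to X^T y, normalized to unit length
w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
norm = sum(wj**2 for wj in w) ** 0.5
w = [wj / norm for wj in w]

# Scores on the first PLS direction separate the two groups by sign
t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
print(w, t)
```

Full PLS (e.g. NIPALS) then deflates X and repeats for further components; this sketch shows only the first direction.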
Partial Least Squares Discriminant Analysis

2D PLS of gasoline data set:   2D PCA of gasoline data set:
Partial Least Squares Discriminant Analysis
• Group assignments of observation vectors are read off the Y-scores
• Typically a “soft-max” function is used

(Plot: Y-scores for each observation vector.)

*Now Exercise:
Try plsda.R
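A minimal sketch of the soft-max step: each group’s predicted Y-score is exponentiated and normalized, and the observation is assigned to the group with the largest resulting “probability”. The Y-score values below are made up for illustration:

```python
import math

# Sketch: soft-max conversion of PLS-DA Y-scores to group "probabilities".
y_scores = {"grp1": 0.9, "grp2": 0.2, "grp3": -0.4}

exp_scores = {g: math.exp(v) for g, v in y_scores.items()}
total = sum(exp_scores.values())
probs = {g: e / total for g, e in exp_scores.items()}   # probabilities sum to 1

label = max(probs, key=probs.get)
print(label, probs)  # grp1 has the largest Y-score, hence the largest probability
```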

