supervised classification1

Shared by: HC120730072434. Posted: 7/30/2012. Pages: 14.
							    Bayesian Discriminant Analysis
• This supervised learning technique uses Bayes’
  rule but is different in philosophy from the well-known
  work of Aitken, Taroni, et al.
• Bayes’ rule:

  $$\Pr(\text{grp i.d.} \mid x) = \frac{\Pr(x \mid \text{grp i.d.})\,\Pr(\text{grp i.d.})}{\Pr(x)}$$

   • Pr(grp i.d.) is the prior probability. This can be a problem!
   • Pr is probability
   • Equation means: “How does the probability of an
     item being a member of a group change, given
     evidence x?”
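A minimal numeric sketch of Bayes' rule may help here. The likelihood and prior values are hypothetical, chosen only to show how a lopsided prior can outweigh the evidence; the function name `posterior` is also just an illustration.

```python
def posterior(likelihoods, priors):
    """Return Pr(grp | x) for each group from Pr(x | grp) and Pr(grp)."""
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    evidence = sum(joint)  # Pr(x), by the law of total probability
    return [j / evidence for j in joint]

# Hypothetical numbers: evidence x is 4 times more likely under group 1,
# but group 1 has a small prior probability.
post = posterior(likelihoods=[0.8, 0.2], priors=[0.1, 0.9])
print(post)
```

Even though the evidence favors group 1, the posterior favors group 2 here, which is one way the prior "can be a problem".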
    Bayesian Discriminant Analysis
• Bayes’ rule can be turned into a classification
  rule:
    If: $\Pr(x \mid \text{grp } 1)\,\Pr(\text{grp } 1) > \Pr(x \mid \text{grp } 2)\,\Pr(\text{grp } 2)$
                    => Choose group 1

[Figure: likelihood curves for the two groups]
*If priors are both 0.5, decision boundaries are where curves cross
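The rule can be sketched in a few lines, assuming 1D Gaussian likelihoods for Pr(x | grp). The means, standard deviations and priors below are hypothetical, and `gaussian_pdf` and `choose_group` are illustrative names, not part of the exercise scripts.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a 1D normal distribution, used here as Pr(x | grp)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def choose_group(x, means, sds, priors):
    """Return the 1-based index of the group maximizing Pr(x | grp) * Pr(grp)."""
    scores = [gaussian_pdf(x, m, s) * p for m, s, p in zip(means, sds, priors)]
    return scores.index(max(scores)) + 1

# Hypothetical groups: equal spread, means at 0 and 4.
means, sds = [0.0, 4.0], [1.0, 1.0]

# With priors of 0.5 each, the boundary is where the curves cross (x = 2).
print(choose_group(1.5, means, sds, [0.5, 0.5]))
print(choose_group(2.5, means, sds, [0.5, 0.5]))

# A strong prior for group 1 moves the boundary to the right.
print(choose_group(2.5, means, sds, [0.99, 0.01]))
```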
Bayes-Gaussian Discriminant Analysis
• If the data is multivariate normal with the same
  covariance structure in every group, the decision
  rule becomes:

  $$\underset{j = 1,\ldots,k}{\arg\min}\; d_{\text{grp } j}(x_{unk})$$

  with the “distance” defined as:

  $$d_j(x_{unk}) = -\bar{x}_j^T S_{pl}^{-1} x_{unk} + \frac{1}{2}\bar{x}_j^T S_{pl}^{-1} \bar{x}_j - \ln \Pr(\text{grp } j)$$

  and the pooled covariance matrix (like an average cov mat):

  $$S_{pl} = \frac{1}{n-k}\sum_{i=1}^{k}(n_i - 1)S_i$$

  • Note that if the data is just 1D this is just an
    equation for a line:

    $$d_j(x_{unk}) = \underbrace{-\frac{\bar{x}_j}{s^2}}_{\text{slope}}\, x_{unk} + \underbrace{\frac{\bar{x}_j^2}{2s^2} - \ln \Pr(\text{grp } j)}_{\text{intercept}}$$
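As a rough sketch of the 1D linear rule: the code below pools the group variances and evaluates the line for each group, choosing the smallest value to match the arg min convention. The signs are written under that assumption (slope -mean/s^2, intercept mean^2/(2 s^2) - ln prior). The data and function names are hypothetical.

```python
import math

def sample_variance(xs):
    """Unbiased sample variance s_i^2 of one group."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def pooled_variance(groups):
    """1D pooled variance: sum of (n_i - 1) s_i^2 over groups, divided by n - k."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    return sum((len(g) - 1) * sample_variance(g) for g in groups) / (n - k)

def linear_distance(x_unk, mean_j, s2_pl, prior_j):
    """d_j(x_unk) as a line in x_unk: slope * x_unk + intercept."""
    slope = -mean_j / s2_pl
    intercept = mean_j ** 2 / (2.0 * s2_pl) - math.log(prior_j)
    return slope * x_unk + intercept

def classify(x_unk, groups, priors):
    """Return the 1-based index of the group with the smallest distance."""
    s2_pl = pooled_variance(groups)
    means = [sum(g) / len(g) for g in groups]
    d = [linear_distance(x_unk, m, s2_pl, p) for m, p in zip(means, priors)]
    return d.index(min(d)) + 1

# Hypothetical training data for two groups with equal spread.
groups = [[1.0, 2.0, 3.0], [6.0, 7.0, 8.0]]
print(pooled_variance(groups))
print(classify(3.0, groups, [0.5, 0.5]))
print(classify(6.0, groups, [0.5, 0.5]))
```

With equal priors the two lines cross midway between the group means, so that point is the decision boundary.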
Bayes-Gaussian Discriminant Analysis
• If the data is multivariate normal but the groups
  have different covariance structures, the decision rule is
  the same but the “decision distance” becomes:

  $$d_j(x_{unk}) = \underbrace{\frac{1}{2} x_{unk}^T S_j^{-1} x_{unk}}_{\text{new quadratic term}} - \bar{x}_j^T S_j^{-1} x_{unk} + \frac{1}{2}\bar{x}_j^T S_j^{-1} \bar{x}_j - \ln \Pr(\text{grp } j)$$

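  • Note that if the data is just 1D this is an equation
    for a parabola:

    $$d_j(x_{unk}) = \underbrace{\frac{1}{2s_j^2}}_{a}\, x_{unk}^2 \; \underbrace{-\frac{\bar{x}_j}{s_j^2}}_{b}\, x_{unk} + \underbrace{\frac{\bar{x}_j^2}{2s_j^2} - \ln \Pr(\text{grp } j)}_{c}$$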
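A small sketch of the 1D quadratic distance, assuming each group keeps its own variance s_j^2 and that the smallest value wins (the arg min convention). The optional `include_log_term` flag is an addition not found in the slide formula: the full Gaussian log-density also carries a 0.5 ln s_j^2 term, which differs between groups. All numbers are hypothetical.

```python
import math

def quadratic_distance(x_unk, mean_j, s2_j, prior_j, include_log_term=False):
    """d_j(x_unk) = a x^2 + b x + c with group-specific variance s2_j."""
    a = 1.0 / (2.0 * s2_j)
    b = -mean_j / s2_j
    c = mean_j ** 2 / (2.0 * s2_j) - math.log(prior_j)
    d = a * x_unk ** 2 + b * x_unk + c
    if include_log_term:
        d += 0.5 * math.log(s2_j)  # assumption: term absent from the slide formula
    return d

def classify(x_unk, means, variances, priors):
    """Return the 1-based index of the group with the smallest distance."""
    d = [quadratic_distance(x_unk, m, v, p)
         for m, v, p in zip(means, variances, priors)]
    return d.index(min(d)) + 1

# Hypothetical groups: a narrow one at 0 and a wide one at 3.
means, variances, priors = [0.0, 3.0], [1.0, 4.0], [0.5, 0.5]
print(classify(0.5, means, variances, priors))
print(classify(4.0, means, variances, priors))
print(classify(-4.0, means, variances, priors))
```

The last call illustrates the parabola: far to the left of both means, the wider group 2 is chosen again, so there are two decision boundaries instead of one.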
Bayes-Gaussian Discriminant Analysis
• The “quadratic” version is always called
  quadratic discriminant analysis, QDA
• The “linear” version is called by a number of
  names!
  • linear discriminant analysis, LDA
  • Some combination of the above with the words
    Gaussian or classification
• A number of techniques use the name LDA!
  • Important to specify the equations used to tell the
    difference!
    Bayes-Gaussian Discriminant Analysis




[Figure, left panel] Groups have similar covariance structure: linear discriminant rule should work well
[Figure, right panel] Groups have different covariance structure: quadratic discriminant rule may work better
       Canonical Variate Analysis
• This supervised technique is called Linear
  Discriminant Analysis (LDA) in R
  • Also called Fisher linear discriminant analysis
  • CVA is closely related to linear Bayes-Gaussian
    discriminant analysis
• Works on a principle similar to PCA: Look for
  “interesting directions in data space”
  • CVA: Find directions in space which best separate
    groups.
     • Technically: find directions which maximize the ratio of
       between-group to within-group variation
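To make the ratio concrete, here is a minimal sketch that projects 2D data onto a direction and computes the between-group over within-group sum of squares. The data are hypothetical: both groups are stretched along x (the direction of largest variance) but separated along y.

```python
def project(points, direction):
    """Project 2D points onto a direction vector."""
    return [p[0] * direction[0] + p[1] * direction[1] for p in points]

def separation_ratio(groups, direction):
    """Between-group over within-group sum of squares of the projected data."""
    projected = [project(g, direction) for g in groups]
    all_values = [v for g in projected for v in g]
    grand_mean = sum(all_values) / len(all_values)
    between = 0.0
    within = 0.0
    for g in projected:
        m = sum(g) / len(g)
        between += len(g) * (m - grand_mean) ** 2
        within += sum((v - m) ** 2 for v in g)
    return between / within

# Hypothetical data: large spread along x, group offset along y.
groups = [
    [(-3.0, 0.0), (-1.0, 0.2), (1.0, -0.2), (3.0, 0.0)],
    [(-3.0, 1.0), (-1.0, 1.2), (1.0, 0.8), (3.0, 1.0)],
]
print(separation_ratio(groups, (1.0, 0.0)))  # along the high-variance direction
print(separation_ratio(groups, (0.0, 1.0)))  # along the separating direction
```

The high-variance direction, which PCA would favor, gives no separation here, while the y direction gives a large ratio.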
           Canonical Variate Analysis
[Figure, left panel] Project on PC1: not necessarily good group separation!
[Figure, right panel] Project on CV1: good group separation!

Note: There are (#groups - 1) or p CVs, whichever is smaller (p is the number of variables).
      Canonical Variate Analysis
• Use the between-group to within-group covariance
  matrix, $W^{-1}B$, to find directions of best group
  separation (CVA loadings, $A_{cv}$):

  $$W^{-1} B\, A_{cv} = A_{cv} \Lambda$$

  where $\Lambda$ is the diagonal matrix of eigenvalues.
  • CVA can be used for dimension reduction.
  • Caution! These “dimensions” are not at right
    angles (i.e. not orthogonal)
     • CVA plots can thus be distorted from reality
     • Always check loading angles!
  • Caution! CVA will not work well with highly
    correlated data
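For two groups in 2D there is a single CV, and it can be sketched without an eigen-solver: the direction is proportional to W^{-1} times the difference of the group means. The sketch below uses sum-of-squares (scatter) matrices for W and B, which is an assumption; scaling them into covariances changes the eigenvalue but not the direction. Data and helper names are hypothetical.

```python
def mean_vector(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def outer(d):
    """2x2 outer product d d^T."""
    return [[d[0] * d[0], d[0] * d[1]], [d[1] * d[0], d[1] * d[1]]]

def add(a, b):
    return [[a[r][c] + b[r][c] for c in range(2)] for r in range(2)]

def within_between(groups):
    """Within-group (W) and between-group (B) scatter matrices."""
    grand = mean_vector([p for g in groups for p in g])
    w = [[0.0, 0.0], [0.0, 0.0]]
    b = [[0.0, 0.0], [0.0, 0.0]]
    for g in groups:
        m = mean_vector(g)
        for p in g:
            w = add(w, outer([p[0] - m[0], p[1] - m[1]]))
        d = outer([m[0] - grand[0], m[1] - grand[1]])
        b = add(b, [[len(g) * v for v in row] for row in d])
    return w, b

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

# Hypothetical correlated data; group 2 is group 1 shifted up by 2.
group1 = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0), (3.0, 2.0)]
group2 = [(0.0, 2.0), (1.0, 3.0), (2.0, 3.0), (3.0, 4.0)]

w, b = within_between([group1, group2])
m1, m2 = mean_vector(group1), mean_vector(group2)
a = matvec(inverse_2x2(w), [m1[0] - m2[0], m1[1] - m2[1]])  # CV direction

# Check the eigen-relation W^{-1} B a = eigenvalue * a.
lhs = matvec(inverse_2x2(w), matvec(b, a))
eigenvalue = lhs[0] / a[0]
print(a, eigenvalue)
```

Because the variables are correlated, the CV direction is tilted away from the y axis even though the groups differ only in y.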
        Canonical Variate Analysis
[Figure, left panel] 2D CVA of gasoline data set
[Figure, right panel] 2D PCA of gasoline data set
        Canonical Variate Analysis
• Distance metric used in CVA to assign group
  i.d. of an unknown data point:

  $$\underset{j = 1,\ldots,k}{\arg\min}\; d_{\text{grp } j}(x_{unk})$$

  $$d_j(x_{unk}) = -\bar{x}_j^T A_{cv} A_{cv}^T x_{unk} + \frac{1}{2}\bar{x}_j^T A_{cv} A_{cv}^T \bar{x}_j - \ln \Pr(\text{grp } j)$$

  • If data is Gaussian and group covariance structures
    are the same then CVA classification is the same as
    Bayes-Gaussian classification.
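
One way to see this equivalence, sketched under assumptions not stated in the slides: the loadings are scaled so that $A_{cv}^T S_{pl} A_{cv} = I$, and all p eigenvectors are kept so that $A_{cv}$ is square and invertible. Then:

$$A_{cv}^T S_{pl} A_{cv} = I \;\Rightarrow\; S_{pl} = A_{cv}^{-T} A_{cv}^{-1} \;\Rightarrow\; S_{pl}^{-1} = A_{cv} A_{cv}^T$$

Under those assumptions, $A_{cv} A_{cv}^T$ plays the role of $S_{pl}^{-1}$ in the distance, so the CVA distance matches the linear Bayes-Gaussian distance term by term.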
*Now Exercise:
Explore some data sets with:
    lda_group_explore.R
Try simple supervised classification with:
    lda_group_predict.R
    lda_group_predict2.R
Partial Least Squares Discriminant Analysis
   • PLS-DA is a supervised discrimination
     technique and very popular in chemometrics
     • Works well with highly correlated variables (like in
       spectroscopy)
        • Lots of correlation causes CVA to fail!
     • Group labels coded into a “response matrix” Y
        • PLS searches for directions of maximum covariance in
          X and Y.
     • Loadings for X can be used like PCA loadings
        • Dimension reduction
        • Loading plots
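A simplified sketch of two of these ideas: coding group labels into a response matrix Y, and finding the first direction of maximum covariance between X and one column of Y. For a single response column the unit-length direction is proportional to X^T y on centered data; a full PLS-DA with several Y columns and several components does more than this. The data (two highly correlated variables) and the function names are hypothetical.

```python
import math

def one_hot(labels):
    """Code group labels into a response matrix Y (one column per group)."""
    classes = sorted(set(labels))
    return classes, [[1.0 if lab == c else 0.0 for c in classes] for lab in labels]

def center_columns(rows):
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def first_pls_weight(x_rows, y_col):
    """Unit vector w proportional to X^T y, computed on centered X and y."""
    xc = center_columns(x_rows)
    y_mean = sum(y_col) / len(y_col)
    yc = [v - y_mean for v in y_col]
    w = [sum(row[j] * y for row, y in zip(xc, yc)) for j in range(len(xc[0]))]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]

# Hypothetical data: two nearly identical (highly correlated) variables.
X = [[1.0, 1.1], [2.0, 2.1], [3.0, 3.2], [6.0, 6.1], [7.0, 7.2], [8.0, 8.1]]
labels = ["a", "a", "a", "b", "b", "b"]

classes, Y = one_hot(labels)
y_b = [row[classes.index("b")] for row in Y]  # response column for group "b"
w = first_pls_weight(X, y_b)

# X scores along the first direction, usable like PCA scores.
scores = [sum(v * wj for v, wj in zip(row, w)) for row in center_columns(X)]
print(w)
print(scores)
```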
Partial Least Squares Discriminant Analysis

[Figure, left panel] 2D PLS of gasoline data set
[Figure, right panel] 2D PCA of gasoline data set
Partial Least Squares Discriminant Analysis
   • Group assignments of observation vectors are
     made by interpreting Y scores.
     • Typically a “soft-max” function is used.
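A minimal sketch of a soft-max assignment: the Y scores of one observation are turned into values that sum to 1, and the group with the largest value is chosen. The scores below are hypothetical, and `assign_group` is an illustrative name.

```python
import math

def softmax(scores):
    """Map raw scores to positive values that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def assign_group(y_scores):
    """Return the 1-based group index with the largest soft-max value."""
    probs = softmax(y_scores)
    return probs.index(max(probs)) + 1, probs

# Hypothetical Y scores for one observation across three groups.
group, probs = assign_group([0.1, 0.9, -0.2])
print(group, probs)
```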
[Figure: Y-scores plotted against observation vectors]

*Now Exercise:
Try plsda.R

						