					                                  PCA-Principal Components Analysis
        The mathematics of PCA has already been covered in the material on vectors
and matrices, specifically in the discussion of the SVD of a general rectangular (e.g.,
observations by variables) matrix X, and of the eigenstructure of a square symmetric
matrix (e.g., the positive definite or semidefinite, i.e. Gramian, matrix
$\frac{1}{n-1} X'X$, the covariance or correlation matrix for X, depending on whether X is
in deviation form [mean centered] or standard form [mean centered with standardized
variances; mean = 0 and variance = 1 for all variables]).
        In our discussion of the (reduced) SVD we showed that the n×m matrix X can be
decomposed into:
                                        $$X = Z_s D W'$$
where $Z_s$ (an n×m matrix) is an orthonormal section:
                                        $$Z_s' Z_s = I_m$$

D is an m×m diagonal matrix with nonnegative diagonal entries, ordered from largest to
smallest, and W is an m×m orthogonal matrix, so that:
                                        $$W'W = WW' = I_m$$
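        The properties of the reduced SVD listed above can be checked numerically. The
following is a minimal sketch in Python with NumPy, not part of the original notes; the
data matrix, its dimensions, and the random seed are arbitrary assumptions made only for
illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
n, m = 8, 3                      # hypothetical sizes: n observations, m variables
X = rng.normal(size=(n, m))      # hypothetical data matrix

# Reduced SVD: Zs is n x m, d holds the diagonal of D, Wt is W' (m x m).
Zs, d, Wt = np.linalg.svd(X, full_matrices=False)
D = np.diag(d)
W = Wt.T

reconstructs = np.allclose(X, Zs @ D @ W.T)             # X = Zs D W'
zs_orthonormal = np.allclose(Zs.T @ Zs, np.eye(m))      # Zs' Zs = I_m
w_orthogonal = (np.allclose(W.T @ W, np.eye(m))
                and np.allclose(W @ W.T, np.eye(m)))    # W'W = WW' = I_m
# Diagonal entries are nonnegative and ordered from largest to smallest.
d_ordered = bool(np.all(d >= 0) and np.all(np.diff(d) <= 0))

print(reconstructs, zs_orthonormal, w_orthogonal, d_ordered)
```

Note that $Z_s Z_s'$ is n×n and is not an identity matrix when n > m, which is why $Z_s$
is called an orthonormal section rather than an orthogonal matrix.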
        In PCA, we are effectively decomposing the raw (data) score matrix X into a
product of the same general form, but with slightly different notation and different
scaling of the component matrices; i.e., in a PCA:

                                        $$X_{n \times p} = Z_s D^{1/2} U'$$

                                        PCA                 SVD
                                        p                   m
                                        U                   W
                                        $D^{1/2}$     =     $\frac{1}{\sqrt{n-1}} D$
                                        $Z_s$         =     $\sqrt{n-1}\, Z_s$

        The difference in notation is arbitrary, while the difference in scaling arises
because the matrix whose eigenstructure is solved for in defining the PCA solution is not
$X'X$ but $\frac{1}{n-1} X'X$, the covariance or correlation matrix. [If X is neither mean
centered nor variance standardized, this is simply a matrix of sums of squares and cross
products, usually called just a "cross product" matrix, but in PCA the matrix involved is
generally the covariance or correlation matrix. In Chapter 4 of the AMD book, PCA is
restricted to the correlation matrix only, but a PCA of a covariance matrix is quite often
done as well, especially when the variables are regarded as comparable in scale.] The
eigenstructure analysis of $\frac{1}{n-1} X'X$ (the covariance or correlation matrix, but
henceforth referred to as the correlation matrix unless stated otherwise) is of the form:
                                  $$R = \frac{1}{n-1} X'X = U D U'$$

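where $D = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$ and
$(\lambda_1, \lambda_2, \ldots, \lambda_p)$ are the (ordered) eigenvalues of R, so that
                                  $$X = Z_s D^{1/2} U'$$
with
                                  $$D^{1/2} = \mathrm{diag}(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_p})$$
while
                                  $$Z_s = X U D^{-1/2}$$
is the rescaled version of $Z_s$ in the SVD of X, the scaling being multiplication by the
scalar $\sqrt{n-1}$.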
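        The relation between the PCA quantities and the SVD quantities can be verified
numerically. The sketch below (Python with NumPy) is an illustration only: the data are
randomly generated, the sizes and seed are arbitrary assumptions, and X is put in standard
form so that R is a correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(1)                 # arbitrary seed
n, p = 50, 4                                   # hypothetical sizes
raw = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))   # hypothetical correlated data

# Standard form: mean 0 and variance 1 per column (ddof=1 matches the 1/(n-1) scaling).
X = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

R = X.T @ X / (n - 1)                          # correlation matrix
lam, U = np.linalg.eigh(R)                     # eigh returns eigenvalues in ascending order
order = np.argsort(lam)[::-1]                  # reorder from largest to smallest
lam, U = lam[order], U[:, order]

D_half = np.diag(np.sqrt(lam))                 # D^{1/2}
Zs = X @ U @ np.diag(1.0 / np.sqrt(lam))       # Zs = X U D^{-1/2}

# Reduced SVD of the same X, to compare the scaling of the two solutions.
Z_svd, d_svd, Wt = np.linalg.svd(X, full_matrices=False)

x_recovered = np.allclose(X, Zs @ D_half @ U.T)                 # X = Zs D^{1/2} U'
scores_scaled = np.allclose(Zs.T @ Zs, (n - 1) * np.eye(p))     # Zs'Zs = (n-1) I_p
sv_match = np.allclose(np.sqrt(lam), d_svd / np.sqrt(n - 1))    # D^{1/2} = D_svd / sqrt(n-1)

print(x_recovered, scores_scaled, sv_match)
```

The columns of U and W agree only up to sign, so the sketch compares singular values and
cross products rather than the vectors themselves.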
        Choosing the number of components: Bartlett's Sphericity Test. Given p variables
and n observations,

                          $$\chi^2_{(p^2-p)/2} = -\left[(n-1) - \frac{2p+5}{6}\right] \ln|R|$$

where $|R|$ is the generalized variance (the determinant) associated with the correlation
matrix R. The d.f. for the $\chi^2$ statistic is $(p^2-p)/2 = p(p-1)/2$ (= # of
off-diagonal correlations), and $\ln$ denotes the natural logarithm.
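        As a worked illustration of the formula, the following sketch (Python with NumPy)
computes the statistic and its d.f. for a given correlation matrix. The function name
`bartlett_sphericity` and the example matrix are hypothetical, introduced here only for
illustration. A p-value would then come from the upper tail of the $\chi^2$ distribution
with the returned d.f. (for example via `scipy.stats.chi2.sf`, assuming SciPy is
available).

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Return (chi-square statistic, d.f.) for a p x p correlation matrix R
    computed from n observations."""
    p = R.shape[0]
    _sign, logdet = np.linalg.slogdet(R)           # ln|R|, computed stably
    stat = -((n - 1) - (2 * p + 5) / 6) * logdet   # -[(n-1) - (2p+5)/6] ln|R|
    df = p * (p - 1) // 2                          # number of off-diagonal correlations
    return stat, df

# Perfectly "spherical" case: R = I, so ln|R| = 0 and the statistic is zero.
stat_identity, df_identity = bartlett_sphericity(np.eye(4), n=30)

# Hypothetical correlation matrix for p = 3 variables and n = 100 observations.
R_example = np.array([[1.0, 0.6, 0.3],
                      [0.6, 1.0, 0.5],
                      [0.3, 0.5, 1.0]])
stat_example, df_example = bartlett_sphericity(R_example, n=100)

print(stat_identity, df_identity)
print(stat_example, df_example)
```

The more strongly the variables are correlated, the smaller $|R|$ becomes and the larger
the statistic.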

        Use Bartlett's test on initial R. If H0 is not rejected, then variables are not
significantly different from a "spherical" set of variables (totally uncorrelated), and there
is little point in doing a PCA.
        After extracting the first PC, we can define
                                  $$R^{res}_2 = R - u_1 \lambda_1 u_1' = R - \lambda_1 u_1 u_1'$$
and apply Bartlett's test to that residual matrix. The d.f. need to be adjusted, though,
since the effective number of variables is p* = p - 1 (also adjust the calculation of
$\chi^2$, substituting p* for p, and d.f.res = p*(p* - 1)/2).
        In general, if the kth PC is extracted, compute (with $R^{res}_1 = R$)
                                  $$R^{res}_{k+1} = R^{res}_k - \lambda_k u_k u_k'$$

and compute Bartlett's test with
                                  $$p^*_{k+1} = p - k,$$

and d.f. for $\chi^2$ being d.f. = p*(p* - 1)/2 (dropping any subscript from p*).
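        The deflation step and the d.f. bookkeeping can be sketched as follows (Python
with NumPy). This is an illustration under stated assumptions: the correlation matrix is
hypothetical, and the sketch covers only the residual matrices, p*, and d.f. It does not
attempt the test statistic itself, since the notes do not spell out how the determinant of
a residual matrix is to be evaluated.

```python
import numpy as np

R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])        # hypothetical correlation matrix
p = R.shape[0]

lam, U = np.linalg.eigh(R)             # ascending eigenvalues
order = np.argsort(lam)[::-1]          # reorder from largest to smallest
lam, U = lam[order], U[:, order]

R_res = R.copy()                       # R_res_1 = R
steps = []
for k in range(1, p):                  # extract PCs k = 1, ..., p-1
    u_k = U[:, k - 1]
    R_res = R_res - lam[k - 1] * np.outer(u_k, u_k)   # R_res_{k+1} = R_res_k - lam_k u_k u_k'
    p_star = p - k                                     # effective number of variables
    df = p_star * (p_star - 1) // 2                    # adjusted d.f.
    steps.append((k, p_star, df, R_res.copy()))

for k, p_star, df, _ in steps:
    print(f"after PC {k}: p* = {p_star}, d.f. = {df}")
```

After each extraction the residual matrix keeps the remaining eigenvalues and replaces the
extracted one with zero, which is one way to see why the effective number of variables
drops by one at each step.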
        Bartlett's test rejects H0 too readily, so it is likely to lead to retaining too
many components (PCs). In practice, while it is wise to compute it for the initial R
matrix, to be sure there is some structure in the data, it is not generally a good
criterion for the number of PCs to keep (it keeps too many)!
Other approaches:
        Scree test (see Fig. 4-14)
        Kaiser's rule
        Horn's procedure (see Fig. 4-16 & 4-17)
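        Two of the approaches listed above can be sketched in code (Python with NumPy).
The sketch rests on assumptions not stated in these notes: Kaiser's rule is taken in its
usual form (retain components whose correlation-matrix eigenvalue exceeds 1), and Horn's
procedure is taken to be parallel analysis, with the benchmark being the mean sorted
eigenvalues of simulated uncorrelated normal data of the same size. All function names,
the simulated data, and the number of simulations are hypothetical.

```python
import numpy as np

def sorted_eigenvalues(data):
    """Eigenvalues of the correlation matrix of data (rows = observations), largest first."""
    R = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(R))[::-1]

def kaiser_count(eigvals):
    """Kaiser's rule (as assumed here): count eigenvalues greater than 1."""
    return int(np.sum(eigvals > 1.0))

def horn_benchmark(n, p, n_sims=200, seed=0):
    """Mean sorted eigenvalues over n_sims uncorrelated normal data sets of size n x p."""
    rng = np.random.default_rng(seed)
    sims = [sorted_eigenvalues(rng.normal(size=(n, p))) for _ in range(n_sims)]
    return np.mean(sims, axis=0)

def horn_count(eigvals, benchmark):
    """Number of leading eigenvalues that exceed the benchmark."""
    above = eigvals > benchmark
    return len(eigvals) if above.all() else int(np.argmin(above))

# Hypothetical data: six variables driven by one strong common factor plus noise.
rng = np.random.default_rng(42)
n, p = 200, 6
factor = rng.normal(size=(n, 1))
data = 2.0 * factor + rng.normal(size=(n, p))

eig = sorted_eigenvalues(data)
bench = horn_benchmark(n, p)
print("eigenvalues:", np.round(eig, 3))
print("benchmark:  ", np.round(bench, 3))
print("Kaiser:", kaiser_count(eig), "Horn:", horn_count(eig, bench))
```

Plotting `eig` against its index gives the scree plot, and overlaying `bench` gives the
kind of comparison described in the figure captions for Horn's benchmark.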
[Figure 4.14: Scree plot for GSP share data. Horizontal axis: Index.]

[Figure 4.16: Scree plot for GSP data (solid line) with Horn's benchmark (dashed line)
suggests retaining first three principal components. Horizontal axis: Index.]

[Figure 4.17: Scree plot for Burke's data (solid line) with Horn's benchmark (dashed line)
suggests that observed pattern is consistent with spherical data. Horizontal axis: Index.]
