
PCA: Principal Components Analysis

The mathematics of PCA have already been covered in the material on vectors and matrices. Specifically, they were covered in the discussion of the SVD of a general rectangular (e.g., observations by variables) matrix $X$, and of the eigenstructure of a square symmetric matrix. An example of the latter is the positive definite or semidefinite (i.e., Gramian) matrix $\frac{1}{n-1}X'X$. This is the covariance or correlation matrix for $X$, depending on whether $X$ is in deviation form [mean centered] or standard form [mean centered and standardized: means = 0 and variances = 1 for all variables].

In our discussion of the (reduced) SVD we showed that the $n \times m$ matrix $X$ can be decomposed into

$$X = Z_s D W'$$

where:
- $Z_s$ ($n \times m$) is an orthonormal section, $Z_s'Z_s = I_m$;
- $D$ is an $m \times m$ diagonal matrix with nonnegative diagonal entries, ordered from largest to smallest;
- $W$ is an $m \times m$ orthogonal matrix, so that $W'W = WW' = I_m$.

In PCA, we are effectively decomposing the raw (data) score matrix $X$ into a product of the same general form. The notation is slightly different, and the component matrices are scaled differently. In a PCA:

$$\underset{n \times p}{X} = Z_s D^{1/2} U'$$

where the PCA and SVD quantities correspond as follows:

PCA            SVD
$p$            $m$
$U$            $W$
$D^{1/2}$      $\frac{1}{\sqrt{n-1}}\,D$
$Z_s$          $\sqrt{n-1}\,Z_s$

The difference in notation is arbitrary. The difference in scaling arises because the matrix whose eigenstructure is solved for in defining the PCA solution is not $X'X$ but $\frac{1}{n-1}X'X$, the covariance or correlation matrix. The factors of $\sqrt{n-1}$ cancel in the product. When $X$ is mean centered, the rescaled PCA scores satisfy $Z_s'Z_s = (n-1)I_p$, i.e., each component has unit variance.

[If $X$ is neither mean centered nor variance standardized, this is simply (up to the scalar $\frac{1}{n-1}$) a matrix of sums of squares and cross products, usually called just a "cross product" matrix. In PCA, however, the matrix involved is generally the covariance or correlation matrix. In Chapter 4 of the AMD book, PCA is restricted to the correlation matrix only. A PCA of a covariance matrix is also done quite often, especially when the variables are regarded as comparable in scale.]
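The SVD-to-PCA correspondence above can be checked numerically. The following is a minimal sketch in Python with NumPy; the simulated data matrix is a hypothetical assumption, not data from the text. It rescales the SVD factors by $\sqrt{n-1}$ and confirms four things:
- the product is unchanged;
- the squared PCA diagonal equals the eigenvalues of $R$;
- the rescaled scores equal $XUD^{-1/2}$, with $D$ now the eigenvalue matrix of $R$;
- those scores have unit variance.

```python
import numpy as np

# Hypothetical data (an assumption for illustration): 50 observations on 4 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))
n, p = X.shape

# Standard form: mean 0 and variance 1 per column, so X'X/(n-1) is the correlation matrix R.
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Reduced SVD: X = Z_s D W', with singular values d returned in decreasing order.
Z_svd, d, Wt = np.linalg.svd(X, full_matrices=False)

# Rescale into the PCA form X = Z_s D^{1/2} U'.
U = Wt.T                        # U plays the role of W
D_half = d / np.sqrt(n - 1)     # D^{1/2} = D / sqrt(n - 1)
Z_pca = np.sqrt(n - 1) * Z_svd  # PCA Z_s = sqrt(n - 1) times SVD Z_s

# Eigenvalues of R, reordered from largest to smallest (eigvalsh returns ascending order).
R = X.T @ X / (n - 1)
lam = np.linalg.eigvalsh(R)[::-1]

print(np.allclose(Z_pca @ np.diag(D_half) @ U.T, X))      # same product as the SVD
print(np.allclose(D_half**2, lam))                        # squared D^{1/2} = eigenvalues of R
print(np.allclose(X @ U / D_half, Z_pca))                 # Z_s = X U D^{-1/2}
print(np.allclose(Z_pca.T @ Z_pca / (n - 1), np.eye(p)))  # unit-variance, uncorrelated scores
```

Dividing `X @ U` by the vector `D_half` relies on NumPy broadcasting: it scales column $j$ by $1/\sqrt{\lambda_j}$, which is the same as post-multiplying by $D^{-1/2}$.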
The eigenstructure analysis of $\frac{1}{n-1}X'X$ is of the form below. This matrix may be the covariance or correlation matrix, but it will henceforth be referred to as the correlation matrix unless stated otherwise.

$$R = \frac{1}{n-1}X'X = U D U'$$

Here $D = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$, and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ are the (ordered) eigenvalues of $R$, so that

$$X = Z_s D^{1/2} U'$$

where $D^{1/2} = \mathrm{diag}(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_p})$. The score matrix

$$Z_s = X U D^{-1/2}$$

is the rescaled version of $Z_s$ in the SVD of $X$, obtained by multiplying it by the scalar $\sqrt{n-1}$.

Choosing the number of components

Bartlett's sphericity test: given $p$ variables and $n$ observations,

$$\chi^2_{(p^2-p)/2} = -\left[n - 1 - \frac{2p+5}{6}\right]\ln|R|$$

where $|R|$ is the generalized variance (determinant) associated with the correlation matrix $R$, and $\ln$ denotes the natural logarithm. The degrees of freedom for the $\chi^2$ statistic are $(p^2-p)/2 = p(p-1)/2$, the number of off-diagonal correlations.

Use Bartlett's test on the initial $R$. If $H_0$ is not rejected, the variables are not significantly different from a "spherical" (totally uncorrelated) set of variables, and there is little point in doing a PCA.

After extracting the first PC, we can define

$$R_{\mathrm{res}} = R - \lambda_1 u_1 u_1'$$

and apply Bartlett's test to that residual matrix. The degrees of freedom need adjusting, though, since the effective number of variables is $p^* = p - 1$. Also adjust the calculation of $\chi^2$ by substituting $p^*$ for $p$, and use $d.f._{\mathrm{res}} = p^*(p^*-1)/2$. In general, when the $k$th PC is extracted, compute

$$R_{\mathrm{res}}^{(k)} = R_{\mathrm{res}}^{(k-1)} - \lambda_k u_k u_k', \qquad R_{\mathrm{res}}^{(0)} = R,$$

and compute Bartlett's test with $p^* = p - k$ and d.f. $= p^*(p^*-1)/2$.

Bartlett's test is too sensitive (too likely to reject $H_0$), so it tends to lead to very many components (PCs). In practice, it is wise to compute it for the initial $R$ matrix to be sure there is some structure in the data. However, it is not generally a good criterion for the number of PCs to keep, because it keeps too many. Other approaches are the scree test (see Fig. 4.14), Kaiser's rule, and Horn's procedure (see Figs. 4.16 and 4.17).

[Figure 4.14: Scree plot for GSP share data (eigenvalue vs. index)]

[Figure 4.16: Scree plot for GSP data (solid line) with Horn's benchmark (dashed line); suggests retaining the first three principal components]

[Figure 4.17: Scree plot for Burke's data (solid line) with Horn's benchmark (dashed line); suggests that the observed pattern is consistent with spherical data]
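Bartlett's sphericity test and its residual version, as described above, can be sketched in Python with NumPy. The data are hypothetical, and `chi2_sf` and `bartlett` are illustrative helper names, not functions from the text. The chi-square tail probability is computed in pure Python by the standard incomplete-gamma recursion, so no statistics library is assumed.

One point needs a stated assumption. $R_{\mathrm{res}}^{(k)}$ has $k$ zero eigenvalues, so its ordinary determinant is 0 and $\ln|R_{\mathrm{res}}|$ is undefined. This sketch therefore takes the product of the $p^*$ nonzero eigenvalues ($\lambda_{k+1}, \ldots, \lambda_p$) as the generalized variance. That is one possible reading of the procedure, not necessarily the author's.

```python
import math
import numpy as np


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df d.f. > x), for integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Q(a, y) is the regularized upper incomplete gamma; start at Q(1/2, y) for odd df
    # or Q(1, y) for even df, then step with Q(a + 1, y) = Q(a, y) + y^a e^{-y} / Gamma(a + 1).
    if df % 2:
        q, a = math.erfc(math.sqrt(y)), 0.5
    else:
        q, a = math.exp(-y), 1.0
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def bartlett(log_gen_var, n, p):
    """Bartlett's statistic -[n - 1 - (2p + 5)/6] * ln|R|, its d.f. p(p - 1)/2, and p-value."""
    stat = -(n - 1 - (2 * p + 5) / 6.0) * log_gen_var
    df = p * (p - 1) // 2
    return stat, df, chi2_sf(stat, df)


# Hypothetical data (an assumption): 100 observations on 5 variables, 3 of them correlated.
rng = np.random.default_rng(1)
n, p = 100, 5
X = rng.normal(size=(n, p))
X[:, 1] += X[:, 0]
X[:, 2] += X[:, 0]
R = np.corrcoef(X, rowvar=False)

# Test on the initial R; slogdet gives ln|R| more stably than log(det(R)).
stat0, df0, pval0 = bartlett(np.linalg.slogdet(R)[1], n, p)
print(f"initial R: chi2 = {stat0:.2f}, df = {df0}, p = {pval0:.3g}")

# Residual tests: deflate one PC at a time, R_res <- R_res - lambda_k u_k u_k'.
lam, U = np.linalg.eigh(R)
lam, U = lam[::-1], U[:, ::-1]      # order from largest to smallest eigenvalue
R_res = R.copy()
for k in range(1, p - 1):           # keep p* = p - k >= 2 so that d.f. > 0
    R_res = R_res - lam[k - 1] * np.outer(U[:, k - 1], U[:, k - 1])
    p_star = p - k
    # ASSUMPTION: |R_res| = 0, so use the product of its p* nonzero eigenvalues instead.
    nonzero = np.linalg.eigvalsh(R_res)[::-1][:p_star]
    stat, df, pval = bartlett(np.sum(np.log(nonzero)), n, p_star)
    print(f"after {k} PC(s): p* = {p_star}, chi2 = {stat:.2f}, df = {df}, p = {pval:.3g}")
```

The deflation step removes exactly one eigenpair: the eigenvalues of $R - \lambda_1 u_1 u_1'$ are $0, \lambda_2, \ldots, \lambda_p$, with the same eigenvectors as $R$. This is why $p^* = p - k$ is the effective number of variables after $k$ PCs.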

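Kaiser's rule and Horn's procedure can also be sketched in Python with NumPy. The text names these rules without stating them, so this sketch makes the following assumptions:
- Kaiser's rule takes its usual form: retain the components of a correlation matrix whose eigenvalues exceed 1.
- Horn's benchmark (the dashed lines in Figs. 4.16 and 4.17) is the average ordered eigenvalue of correlation matrices computed from uncorrelated normal data of the same size $n \times p$. This is a common choice; some versions use a percentile instead.

The helper names, simulation count, seed, and two-factor test data are all hypothetical.

```python
import numpy as np


def kaiser_count(eigvals):
    """Kaiser's rule: number of correlation-matrix eigenvalues greater than 1."""
    return int(np.sum(np.asarray(eigvals) > 1.0))


def horn_benchmark(n, p, n_sims=200, seed=0):
    """ASSUMPTION: mean ordered eigenvalues of R for uncorrelated normal n x p data."""
    rng = np.random.default_rng(seed)
    sims = np.empty((n_sims, p))
    for s in range(n_sims):
        Z = rng.normal(size=(n, p))
        sims[s] = np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False))[::-1]
    return sims.mean(axis=0)


def horn_count(eigvals, benchmark):
    """Number of leading components whose eigenvalue exceeds the benchmark."""
    k = 0
    for obs, ref in zip(eigvals, benchmark):
        if obs <= ref:
            break
        k += 1
    return k


# Hypothetical data: 6 variables driven by 2 latent factors (3 variables each) plus noise.
rng = np.random.default_rng(2)
n, p = 200, 6
F = rng.normal(size=(n, 2))
X = np.repeat(F, 3, axis=1) + 0.5 * rng.normal(size=(n, p))

eig = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]  # ordered, largest first
bench = horn_benchmark(n, p)
print("observed: ", eig.round(2))
print("benchmark:", bench.round(2))
print("Kaiser keeps", kaiser_count(eig), "| Horn keeps", horn_count(eig, bench))
```

Plotting `eig` and `bench` against the component index gives a scree plot with Horn's benchmark overlaid, in the style of Figs. 4.16 and 4.17. Components are retained only while the observed (solid) curve lies above the simulated (dashed) one.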
DOCUMENT INFO

posted: | 10/11/2011 |

language: | English |

pages: | 6 |
