# Factor Analysis and Principal Components

Factor analysis, with principal components presented as a subset of factor analysis techniques.
## Principal Components (PC)
Principal components analysis is about explaining the variance-covariance structure of a set of variables through a few linear combinations of these variables.

In general PC is used for either:
1. Data reduction
or
2. Interpretation

If you have $p$ variables $\mathbf{x}' = [x_1, \ldots, x_p]$, you need $p$ components to capture all the variability, but often a smaller number $k$ of principal components can capture most of it. So the original data of $n$ measurements on $p$ variables can be reduced to a data set of $n$ measurements on $k$ principal components. PC tends to be a means to an end, not the end itself; that is, PC is often not the final step. The principal components may be used for multiple regression, cluster analysis, etc.
Let $\mathbf{x}' = [x_1, \ldots, x_p]$ with covariance matrix $\Sigma$, and consider the linear combinations

$$
\begin{aligned}
Y_1 &= \mathbf{a}_1'\mathbf{x} = a_{11}x_1 + a_{12}x_2 + \cdots + a_{1p}x_p \\
Y_2 &= \mathbf{a}_2'\mathbf{x} = \sum_{i=1}^{p} a_{2i}x_i \\
&\;\;\vdots \\
Y_p &= \mathbf{a}_p'\mathbf{x} = \sum_{i=1}^{p} a_{pi}x_i
\end{aligned}
$$

Then

$$\mathrm{Var}(Y_i) = \mathbf{a}_i'\Sigma\mathbf{a}_i, \qquad i = 1, 2, \ldots, p$$

$$\mathrm{Cov}(Y_i, Y_k) = \mathbf{a}_i'\Sigma\mathbf{a}_k, \qquad i, k = 1, 2, \ldots, p$$

1. First principal component = the linear combination $\mathbf{a}_1'\mathbf{x}$ that maximizes $\mathrm{Var}(\mathbf{a}_1'\mathbf{x})$ subject to $\mathbf{a}_1'\mathbf{a}_1 = 1$.
2. Second principal component = the linear combination $\mathbf{a}_2'\mathbf{x}$ that maximizes $\mathrm{Var}(\mathbf{a}_2'\mathbf{x})$ subject to $\mathbf{a}_2'\mathbf{a}_2 = 1$ and $\mathrm{Cov}(\mathbf{a}_1'\mathbf{x}, \mathbf{a}_2'\mathbf{x}) = 0$.
3. $i$th principal component = the linear combination $\mathbf{a}_i'\mathbf{x}$ that maximizes $\mathrm{Var}(\mathbf{a}_i'\mathbf{x})$ subject to $\mathbf{a}_i'\mathbf{a}_i = 1$ and $\mathrm{Cov}(\mathbf{a}_i'\mathbf{x}, \mathbf{a}_k'\mathbf{x}) = 0$ for $k < i$.
Exercise: find the principal components and the proportion of the total population variance explained by each when the covariance matrix is

$$
\Sigma = \begin{bmatrix} \sigma^2 & \sigma^2\rho & 0 \\ \sigma^2\rho & \sigma^2 & \sigma^2\rho \\ 0 & \sigma^2\rho & \sigma^2 \end{bmatrix}, \qquad -\frac{1}{\sqrt{2}} < \rho < \frac{1}{\sqrt{2}}
$$

To solve this you will have to go through your notes, but you can do this even though I didn't give you the formula.

Hint: recall maximization of quadratic forms for points on the unit sphere.
Let $B_{(p\times p)}$ be a positive definite matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ and associated normalized eigenvectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_p$. Then

$$\max_{\mathbf{x} \ne \mathbf{0}} \frac{\mathbf{x}'B\mathbf{x}}{\mathbf{x}'\mathbf{x}} = \lambda_1 \quad (\text{attained when } \mathbf{x} = \mathbf{e}_1)$$

$$\min_{\mathbf{x} \ne \mathbf{0}} \frac{\mathbf{x}'B\mathbf{x}}{\mathbf{x}'\mathbf{x}} = \lambda_p \quad (\text{attained when } \mathbf{x} = \mathbf{e}_p)$$

Moreover,

$$\max_{\mathbf{x} \perp \mathbf{e}_1, \ldots, \mathbf{e}_k} \frac{\mathbf{x}'B\mathbf{x}}{\mathbf{x}'\mathbf{x}} = \lambda_{k+1} \quad (\text{attained when } \mathbf{x} = \mathbf{e}_{k+1},\; k = 1, 2, \ldots, p-1)$$
Solution: solve the characteristic equation

$$|\Sigma - \lambda I| = (\sigma^2 - \lambda)\left[(\sigma^2 - \lambda)^2 - 2\sigma^4\rho^2\right] = 0$$

so $\lambda = \sigma^2$ or $\lambda = \sigma^2(1 \pm \sqrt{2}\,\rho)$.

For $\lambda_1 = \sigma^2$: $\quad \mathbf{e}_1' = \left[\tfrac{1}{\sqrt{2}},\; 0,\; -\tfrac{1}{\sqrt{2}}\right]$

For $\lambda_2 = \sigma^2(1 + \sqrt{2}\,\rho)$: $\quad \mathbf{e}_2' = \left[\tfrac{1}{2},\; \tfrac{1}{\sqrt{2}},\; \tfrac{1}{2}\right]$

For $\lambda_3 = \sigma^2(1 - \sqrt{2}\,\rho)$: $\quad \mathbf{e}_3' = \left[\tfrac{1}{2},\; -\tfrac{1}{\sqrt{2}},\; \tfrac{1}{2}\right]$
| Principal component | Var | % of total var |
|---|---|---|
| $Y_1 = \tfrac{1}{\sqrt{2}}X_1 - \tfrac{1}{\sqrt{2}}X_3$ | $\sigma^2$ | $\tfrac{1}{3}$ |
| $Y_2 = \tfrac{1}{2}X_1 + \tfrac{1}{\sqrt{2}}X_2 + \tfrac{1}{2}X_3$ | $\sigma^2(1+\sqrt{2}\,\rho)$ | $\tfrac{1+\sqrt{2}\,\rho}{3}$ |
| $Y_3 = \tfrac{1}{2}X_1 - \tfrac{1}{\sqrt{2}}X_2 + \tfrac{1}{2}X_3$ | $\sigma^2(1-\sqrt{2}\,\rho)$ | $\tfrac{1-\sqrt{2}\,\rho}{3}$ |

(Total variance $= \mathrm{tr}\,\Sigma = 3\sigma^2$.)
Let  be the covariance matrix associated with the random vector
x   x1 , x 2 ,
                , x p  . Let  have the eigenvalue  eigenvector pairs

 1 ,e1  ,   2 ,e2  ,    ,   p ,e p  where 1   2      p  0.
Then the i th principal component is given by
Yi  ex  ei1x1  ei 2 x 2   eip x p i  1, 2, , p
i

Var  Yi   eei   i
i                   i  1, 2,   ,p
Cov  Yi , Yk   ee k  0 i  k
i

1. 11   22            pp
p                                              p
  Var  x i   1   2              p   Var  Yi 
i 1                                         i 1
Proof :
By definition tr     11   22           pp
we can write  as   PP where  is the diagonal matrix of
eigenvalues and P  e1 ,e 2 ,
                     ,e p 

PP  PP  I
p
tr     tr  PP   tr  PP   tr       i
i 1
p                                    p
 Var  x i   tr     tr       Yi 
i 1                                 i 1
p             p
Thus total population variance   ii    i
i 1         i 1

Thus proportion of total pop.var. due to k th principal component
k
is      p
k  1, 2,   , p.
 i
i 1

Y1  e1x , Y2  e x ,
2                 , Yp  e x are the principal components
p

eik  i
from the covariance matrix , then Yi ,x k                         i, k  1, 2,   ,p
 kk
are the correlation coefficients between Yi and the variables x k .
Here  1 ,e1  ,   2 ,e 2  ,     ,   p ,e p  are the eigenvalue  eigenvector
pair for .
eik  i
Show Yi ,x k 
 kk
Proof : set a   0,0,
k                  ,1,0,    ,0           x k  a x
k

Cov  X k , Yk   Cov  a  x,ex   a  ei
k    i        k              ei   i ei
So Cov  X k , Yi   a   i ei   i eik
k

Var  Yi    i    Show earlier 
Var  X k   kk
Cov  Yi , X k 
Yi ,Xk 
Var  Yi  Var  X k 
 i eik           eik  i
                                i, k  1, 2,      ,p
 i kk              kk
Principal Components from Standardized Variables

$$Z_1 = \frac{X_1 - \mu_1}{\sqrt{\sigma_{11}}}, \qquad Z_2 = \frac{X_2 - \mu_2}{\sqrt{\sigma_{22}}}, \qquad \ldots, \qquad Z_p = \frac{X_p - \mu_p}{\sqrt{\sigma_{pp}}}$$

In matrix notation $\mathbf{Z} = \left(V^{1/2}\right)^{-1}(\mathbf{X} - \boldsymbol{\mu})$, where

$$V^{1/2} = \begin{bmatrix} \sqrt{\sigma_{11}} & 0 & \cdots & 0 \\ 0 & \sqrt{\sigma_{22}} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{\sigma_{pp}} \end{bmatrix}$$

$$E(\mathbf{Z}) = \mathbf{0}, \qquad \mathrm{Cov}(\mathbf{Z}) = \left(V^{1/2}\right)^{-1}\Sigma\left(V^{1/2}\right)^{-1} = \rho$$

The principal components of $\mathbf{Z}$ can be obtained from the eigenvectors of the correlation matrix $\rho$ of $\mathbf{X}$. For notation we shall continue to use $Y_i$ to refer to the $i$th PC and $(\lambda_i, \mathbf{e}_i)$ as the eigenvalue-eigenvector pair from either $\Sigma$ or $\rho$. However, the $(\lambda_i, \mathbf{e}_i)$ derived from $\Sigma$ are, in general, not the same as the ones derived from $\rho$.

The $i$th PC of the standardized variables $\mathbf{Z}' = [Z_1, Z_2, \ldots, Z_p]$ with $\mathrm{Cov}(\mathbf{Z}) = \rho$ is given by

$$Y_i = \mathbf{e}_i'\mathbf{Z} = \mathbf{e}_i'\left(V^{1/2}\right)^{-1}(\mathbf{X} - \boldsymbol{\mu}), \qquad i = 1, \ldots, p$$

$$\sum_{i=1}^{p}\mathrm{Var}(Y_i) = \sum_{i=1}^{p}\mathrm{Var}(Z_i) = p \quad (\text{the number of random variables, not } \rho)$$

$$\rho_{Y_i, Z_k} = e_{ik}\sqrt{\lambda_i}, \qquad i, k = 1, 2, \ldots, p$$

where $(\lambda_i, \mathbf{e}_i)$ are the eigenvalue-eigenvector pairs for $\rho$, with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$.
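The fact that the eigenvalues of a correlation matrix sum to $p$ follows from its unit diagonal; a minimal sketch on simulated correlated data (the data-generating matrix is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))  # correlated data

R = np.corrcoef(X, rowvar=False)      # 5 x 5 correlation matrix
lam = np.linalg.eigvalsh(R)

# For standardized variables the total variance is p, the number of variables
print(lam.sum())                      # 5, up to rounding
```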
If $S = \{s_{ik}\}$ is the $p \times p$ sample covariance matrix with eigenvalue-eigenvector pairs $(\hat\lambda_1, \hat{\mathbf{e}}_1), (\hat\lambda_2, \hat{\mathbf{e}}_2), \ldots, (\hat\lambda_p, \hat{\mathbf{e}}_p)$, the $i$th sample principal component is given by

$$\hat{y}_i = \hat{\mathbf{e}}_i'\mathbf{x} = \hat{e}_{i1}x_1 + \hat{e}_{i2}x_2 + \cdots + \hat{e}_{ip}x_p, \qquad i = 1, 2, \ldots, p$$

where $\hat\lambda_1 \ge \hat\lambda_2 \ge \cdots \ge \hat\lambda_p \ge 0$ and $\mathbf{x}$ is any observation on the variables $X_1, X_2, \ldots, X_p$.

Also: the sample variance of $\hat{y}_k$ is $\hat\lambda_k$, $k = 1, 2, \ldots, p$; the sample covariance of $(\hat{y}_i, \hat{y}_k)$ is $0$ for $i \ne k$; and

$$r_{\hat{y}_i, x_k} = \frac{\hat{e}_{ik}\sqrt{\hat\lambda_i}}{\sqrt{s_{kk}}}, \qquad i, k = 1, 2, \ldots, p$$
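A minimal sketch of sample principal components from a data matrix (the population covariance and sample size are assumptions for illustration); the component scores have sample variances $\hat\lambda_k$ and zero sample covariances, exactly as stated above:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.multivariate_normal([0, 0, 0],
                            [[4.0, 1.5, 0.5],
                             [1.5, 2.0, 0.3],
                             [0.5, 0.3, 1.0]], size=500)

S = np.cov(X, rowvar=False)            # sample covariance matrix
lam, E = np.linalg.eigh(S)
lam, E = lam[::-1], E[:, ::-1]         # descending eigenvalues

scores = (X - X.mean(axis=0)) @ E      # sample PCs: y_i = e_i' x

print(np.cov(scores, rowvar=False).round(6))   # diagonal = lam, off-diagonal ~ 0
```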
## Factor Analysis

The main purpose of factor analysis is to try to describe the covariance relationships among many variables in terms of a few underlying, but unobservable, random quantities called factors.

The Orthogonal Factor Model:
The observable random vector $\mathbf{X}$, with $p$ components, has mean $\boldsymbol\mu$ and covariance matrix $\Sigma$. The factor model proposes that $\mathbf{X}$ is linearly dependent upon a few unobservable random variables $F_1, F_2, \ldots, F_m$, called common factors, and $p$ additional sources of variation $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_p$, called errors, or specific factors.

$$
\begin{aligned}
X_1 - \mu_1 &= l_{11}F_1 + l_{12}F_2 + \cdots + l_{1m}F_m + \varepsilon_1 \\
X_2 - \mu_2 &= l_{21}F_1 + l_{22}F_2 + \cdots + l_{2m}F_m + \varepsilon_2 \\
&\;\;\vdots \\
X_p - \mu_p &= l_{p1}F_1 + l_{p2}F_2 + \cdots + l_{pm}F_m + \varepsilon_p
\end{aligned}
$$

In matrix notation:

$$\underset{(p\times 1)}{\mathbf{X} - \boldsymbol\mu} = \underset{(p\times m)}{L}\;\underset{(m\times 1)}{\mathbf{F}} + \underset{(p\times 1)}{\boldsymbol\varepsilon}$$

The $p$ deviations $X_1 - \mu_1, X_2 - \mu_2, \ldots, X_p - \mu_p$ are expressed in terms of $F_1, \ldots, F_m, \varepsilon_1, \ldots, \varepsilon_p$, that is, $p + m$ random variables.

Assumptions:

$$E(\mathbf{F}) = \underset{(m\times 1)}{\mathbf{0}}, \qquad \mathrm{Cov}(\mathbf{F}) = E(\mathbf{F}\mathbf{F}') = \underset{(m\times m)}{I}$$

$$E(\boldsymbol\varepsilon) = \underset{(p\times 1)}{\mathbf{0}}, \qquad \mathrm{Cov}(\boldsymbol\varepsilon) = E(\boldsymbol\varepsilon\boldsymbol\varepsilon') = \underset{(p\times p)}{\Psi} = \begin{bmatrix}\psi_1 & 0 & \cdots & 0 \\ 0 & \psi_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \psi_p\end{bmatrix}$$

$\mathbf{F}$ and $\boldsymbol\varepsilon$ are independent, thus $\mathrm{Cov}(\boldsymbol\varepsilon, \mathbf{F}) = E(\boldsymbol\varepsilon\mathbf{F}') = \underset{(p\times m)}{\mathbf{0}}$.
Solve for $\Sigma$ in terms of $L$ and $\Psi$:

$$\Sigma = \mathrm{Cov}(\mathbf{X}) = E\left[(\mathbf{X} - \boldsymbol\mu)(\mathbf{X} - \boldsymbol\mu)'\right]$$

$$(\mathbf{X} - \boldsymbol\mu)(\mathbf{X} - \boldsymbol\mu)' = (L\mathbf{F} + \boldsymbol\varepsilon)(L\mathbf{F} + \boldsymbol\varepsilon)' = L\mathbf{F}\mathbf{F}'L' + \boldsymbol\varepsilon\mathbf{F}'L' + L\mathbf{F}\boldsymbol\varepsilon' + \boldsymbol\varepsilon\boldsymbol\varepsilon'$$

$$E\left[(\mathbf{X} - \boldsymbol\mu)(\mathbf{X} - \boldsymbol\mu)'\right] = L\,E(\mathbf{F}\mathbf{F}')\,L' + \underbrace{E(\boldsymbol\varepsilon\mathbf{F}')}_{\text{independence: } \mathbf{0}}L' + L\underbrace{E(\mathbf{F}\boldsymbol\varepsilon')}_{\text{independence: } \mathbf{0}} + E(\boldsymbol\varepsilon\boldsymbol\varepsilon') = LIL' + \Psi = LL' + \Psi$$

Also $(\mathbf{X} - \boldsymbol\mu)\mathbf{F}' = (L\mathbf{F} + \boldsymbol\varepsilon)\mathbf{F}' = L\mathbf{F}\mathbf{F}' + \boldsymbol\varepsilon\mathbf{F}'$, so

$$\mathrm{Cov}(\mathbf{X}, \mathbf{F}) = E\left[(\mathbf{X} - \boldsymbol\mu)\mathbf{F}'\right] = L\,E(\mathbf{F}\mathbf{F}') + E(\boldsymbol\varepsilon\mathbf{F}') = L$$

So:

$$\mathrm{Var}(X_i) = l_{i1}^2 + \cdots + l_{im}^2 + \psi_i$$

$$\mathrm{Cov}(X_i, X_k) = l_{i1}l_{k1} + \cdots + l_{im}l_{km}, \qquad i \ne k$$

$$\mathrm{Cov}(X_i, F_j) = l_{ij}$$
The Principal Component (and principal factor) method (one method for factor analysis)

Recall the spectral decomposition: let $\Sigma$ (a covariance matrix) have eigenvalue-eigenvector pairs $(\lambda_i, \mathbf{e}_i)$ with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$. Then

$$\Sigma = \lambda_1\mathbf{e}_1\mathbf{e}_1' + \lambda_2\mathbf{e}_2\mathbf{e}_2' + \cdots + \lambda_p\mathbf{e}_p\mathbf{e}_p' = \Big[\sqrt{\lambda_1}\,\mathbf{e}_1 \;\Big|\; \sqrt{\lambda_2}\,\mathbf{e}_2 \;\Big|\; \cdots \;\Big|\; \sqrt{\lambda_p}\,\mathbf{e}_p\Big]\begin{bmatrix}\sqrt{\lambda_1}\,\mathbf{e}_1' \\ \sqrt{\lambda_2}\,\mathbf{e}_2' \\ \vdots \\ \sqrt{\lambda_p}\,\mathbf{e}_p'\end{bmatrix}$$

Thus for a factor analysis where $m = p$ (#factors = #variables) and $\psi_i = 0$ for all $i$,

$$\underset{(p\times p)}{\Sigma} = \underset{(p\times p)}{L}\,\underset{(p\times p)}{L'} + \underset{(p\times p)}{0} = LL'$$

Since we almost always want fewer factors than original variables ($m < p$), one approach when the last $p - m$ eigenvalues are small is to neglect the contribution of $\lambda_{m+1}\mathbf{e}_{m+1}\mathbf{e}_{m+1}' + \cdots + \lambda_p\mathbf{e}_p\mathbf{e}_p'$ to $\Sigma$. We use the approximation $\Sigma \approx \underset{(p\times m)}{L}\,\underset{(m\times p)}{L'}$, removing the last $p - m$ components. This assumes the errors $\varepsilon_i$ can still be ignored.

If we wish to allow for $\Psi$ to be included:

$$\Sigma \approx LL' + \Psi = \Big[\sqrt{\lambda_1}\,\mathbf{e}_1 \;\Big|\; \cdots \;\Big|\; \sqrt{\lambda_m}\,\mathbf{e}_m\Big]\begin{bmatrix}\sqrt{\lambda_1}\,\mathbf{e}_1' \\ \vdots \\ \sqrt{\lambda_m}\,\mathbf{e}_m'\end{bmatrix} + \begin{bmatrix}\psi_1 & 0 & \cdots & 0 \\ 0 & \psi_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \psi_p\end{bmatrix}$$

where

$$\psi_i = \sigma_{ii} - \sum_{j=1}^{m} l_{ij}^2, \qquad i = 1, \ldots, p.$$
When applying this approach it is typical to center the observations (subtract $\bar{\mathbf{x}}$).

In the case where the units of the variables are not the same (e.g. kg for weight and cm for height), it is usually desirable to work with the standardized variables

$$\mathbf{z}_j = \begin{bmatrix}(x_{j1} - \bar{x}_1)/\sqrt{s_{11}} \\ (x_{j2} - \bar{x}_2)/\sqrt{s_{22}} \\ \vdots \\ (x_{jp} - \bar{x}_p)/\sqrt{s_{pp}}\end{bmatrix}, \qquad j = 1, 2, \ldots, n$$
Maximum Likelihood Method for Factor Analysis

If the common factors $\mathbf{F}$ and the errors $\boldsymbol\varepsilon$ can be assumed to be normally distributed, maximum likelihood estimates of $L$ and $\Psi$ may be obtained.

When $\mathbf{F}_j$ and $\boldsymbol\varepsilon_j$ are jointly normal, the observations $\mathbf{X}_j - \boldsymbol\mu = L\mathbf{F}_j + \boldsymbol\varepsilon_j$ are then normal. The likelihood is

$$L(\boldsymbol\mu, \Sigma) = (2\pi)^{-np/2}\,|\Sigma|^{-n/2}\exp\!\left\{-\frac{1}{2}\,\mathrm{tr}\!\left[\Sigma^{-1}\!\left(\sum_{j=1}^{n}(\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})' + n(\bar{\mathbf{x}} - \boldsymbol\mu)(\bar{\mathbf{x}} - \boldsymbol\mu)'\right)\right]\right\}$$

which factors as

$$(2\pi)^{-(n-1)p/2}\,|\Sigma|^{-(n-1)/2}\exp\!\left\{-\frac{1}{2}\,\mathrm{tr}\!\left[\Sigma^{-1}\sum_{j=1}^{n}(\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})'\right]\right\}\times(2\pi)^{-p/2}\,|\Sigma|^{-1/2}\exp\!\left\{-\frac{n}{2}(\bar{\mathbf{x}} - \boldsymbol\mu)'\Sigma^{-1}(\bar{\mathbf{x}} - \boldsymbol\mu)\right\}$$

which depends on $L$ and $\Psi$ through $\Sigma = LL' + \Psi$.

To make $L$ well defined (a unique solution), impose the condition that $L'\Psi^{-1}L$ is a diagonal matrix.

$$\text{Proportion of total sample variance due to the } j\text{th factor} = \frac{\hat{l}_{1j}^2 + \hat{l}_{2j}^2 + \cdots + \hat{l}_{pj}^2}{s_{11} + s_{22} + \cdots + s_{pp}}$$
Factor Rotation

Factor loadings obtained from an orthogonal transformation of the initial loadings have the same ability to reproduce the covariance (or correlation) matrix:

$$\hat{L}^* = \hat{L}T \qquad \text{where } TT' = T'T = I$$

so

$$\hat{L}\hat{L}' + \hat\Psi = \hat{L}TT'\hat{L}' + \hat\Psi = \hat{L}^*\hat{L}^{*\prime} + \hat\Psi$$

The $\hat\Psi$ remains unchanged also.

Imagine there were only $m = 2$ factors:

$$\underset{(p\times 2)}{\hat{L}^*} = \underset{(p\times 2)}{\hat{L}}\,\underset{(2\times 2)}{T}$$

where

$$T = \begin{bmatrix}\cos\phi & \sin\phi \\ -\sin\phi & \cos\phi\end{bmatrix} \;(\text{clockwise rotation}) \qquad \text{or} \qquad T = \begin{bmatrix}\cos\phi & -\sin\phi \\ \sin\phi & \cos\phi\end{bmatrix} \;(\text{counterclockwise rotation})$$

$\phi$ is the angle through which the factor loadings are rotated.
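The rotation invariance is easy to confirm numerically; a minimal sketch with a hypothetical $p \times 2$ loading matrix, $\hat\Psi$, and rotation angle (all chosen here for illustration):

```python
import numpy as np

phi = 0.7                                    # arbitrary rotation angle (assumption)
T = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])  # counterclockwise rotation, T'T = I

L = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.1, 0.7],
              [0.2, 0.8]])                   # hypothetical p x 2 loadings
Psi = np.diag([0.18, 0.32, 0.47, 0.32])

L_star = L @ T                               # rotated loadings

# LL' + Psi is unchanged by the rotation: both reproduce the same covariance
print(np.allclose(L @ L.T + Psi, L_star @ L_star.T + Psi))
```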
A Reference: the following 13 slides come from *Multivariate Data Analysis Using SPSS*, by John Zhang, ARL, IUP.
Factor Analysis-1
- The main goal of factor analysis is data reduction. A typical use of factor analysis is in survey research, where a researcher wishes to represent a number of questions with a smaller number of factors.
- Two questions in factor analysis: how many factors are there, and what do they represent (interpretation)?
- Two technical aids:
  - Eigenvalues
  - Percentage of variance accounted for
Factor Analysis-2
- Two types of factor analysis:
  - Exploratory: introduced here
  - Confirmatory: SPSS AMOS
- Theoretical basis: correlations among variables are explained by underlying factors
- An example of a mathematical 1-factor model for two variables:
  - `V1 = L1*F1 + E1`
  - `V2 = L2*F1 + E2`
Factor Analysis-3
- Each variable is composed of a common factor (F1) and a unique error term
- V1 and V2 correlate because they share the common factor
- A given set of correlations can be consistent with different factor solutions
- One should pick the simplest solution

Factor Analysis-4
- A factor solution needs to be confirmed:
  - By a different factor method
  - By a different sample
- That is, the findings should not differ by methodology of analysis nor by sample
- More on terminology:
  - Loading: the correlation between the variable and the factor
  - Communality: the proportion of variability for a given variable that is explained by the factors
  - Extraction: the process by which the factors are determined from a large set of variables
Factor Analysis-5 (Principal components)
- Principal components: one of the extraction methods
- A principal component is a linear combination of observed variables that is independent (orthogonal) of other components
- The first component accounts for the largest amount of variance in the input data; the second component accounts for the largest amount of the remaining variance, and so on
- "Components are orthogonal" means they are uncorrelated
Factor Analysis-6 (Principal components)
- A possible application of principal components: in survey research it is common to have many questions addressing one issue (e.g. customer service). These questions are likely to be highly correlated, and it is problematic to use such variables in some statistical procedures (e.g. regression). One can instead use factor scores computed from the factor solution.
Factor Analysis-7 (Principal components)
- Principal components vs. other extraction methods:
  - Principal components focus on accounting for the maximum amount of variance (the diagonal of a correlation matrix)
  - Other extraction methods (e.g. principal axis factoring) focus more on accounting for the correlations between variables (the off-diagonal correlations)
  - A principal component can be defined as a unique combination of variables, but the other factor methods cannot
  - Principal components are used for data reduction but are more difficult to interpret
Factor Analysis-8
- Number of factors:
  - Eigenvalues are often used to determine how many factors to take: take as many factors as there are eigenvalues greater than 1
  - An eigenvalue represents the amount of standardized variance in the variables accounted for by a factor
  - The amount of standardized variance in a variable is 1
  - The sum of the retained eigenvalues, divided by the number of variables, gives the proportion of variance accounted for
Factor Analysis-9
- Rotation
  - Objective: to facilitate interpretation
  - Orthogonal rotation: done when data reduction is the objective and factors need to be orthogonal
    - Varimax: attempts to simplify interpretation by maximizing the variance of the squared loadings within each factor
    - Quartimax: simplifies the solution by finding a rotation that minimizes the number of factors needed to explain each variable
  - Oblique rotation: use when there is reason to allow factors to be correlated
    - Oblimin and Promax (Promax runs fast)
Factor Analysis-10
- Factor scores: if you are satisfied with a factor solution,
  - You can request that a new set of variables be created representing the scores of each observation on the factors (difficult to interpret), or
  - You can use the lambda coefficients to judge which variables are highly related to each factor, then compute the sum or the mean of these variables for further analysis (easier to interpret)
Factor Analysis-11
- Sample size: the sample size should be about 10 to 15 times the number of variables (as with other multivariate procedures)
- Number of methods: there are 8 factoring methods, including principal components
  - Principal axis: accounts for correlations between the variables
  - Unweighted least squares: minimizes the residual between the observed and the reproduced correlation matrix
Factor Analysis-12
- Generalized least squares: similar to unweighted least squares, but gives more weight to variables with stronger correlations
- Maximum likelihood: generates the solution that is most likely to have produced the correlation matrix
- Alpha factoring: considers the variables as a sample from a universe of potential variables, not the cases
- Image factoring: decomposes the variables into a common part and a unique part, then works with the common part
Factor Analysis-13
- Recommendations:
  - Principal components and principal axis are the most commonly used methods
  - When there is multicollinearity, use principal components
  - Rotations are often done; try Varimax
Reference
- Factor Analysis from SPSS
- Much of the wording comes from the SPSS help and tutorial.
Factor Analysis
- Factor Analysis is primarily used for data reduction or structure detection.
- The purpose of data reduction is to remove redundant (highly correlated) variables from the data file, perhaps replacing the entire data file with a smaller number of uncorrelated variables.
- The purpose of structure detection is to examine the underlying (or latent) relationships between the variables.
Factor Analysis
- The Factor Analysis procedure has several extraction methods for constructing a solution.
- For data reduction: the principal components method of extraction begins by finding a linear combination of variables (a component) that accounts for as much variation in the original variables as possible. It then finds another component that accounts for as much of the remaining variation as possible and is uncorrelated with the previous component, continuing in this way until there are as many components as original variables. Usually, a few components will account for most of the variation, and these components can be used to replace the original variables. This method is most often used to reduce the number of variables in the data file.
- For structure detection: other Factor Analysis extraction methods go one step further by adding the assumption that some of the variability in the data cannot be explained by the components (usually called factors in other extraction methods). As a result, the total variance explained by the solution is smaller; however, the addition of this structure to the factor model makes these methods ideal for examining relationships between the variables.
- With any extraction method, the two questions that a good solution should try to answer are "How many components (factors) are needed to represent the variables?" and "What do these components represent?"
Factor Analysis: Data Reduction
- An industry analyst would like to predict automobile sales from a set of predictors. However, many of the predictors are correlated, and the analyst fears that this will adversely affect the analysis.
- This information is contained in the file car_sales.sav. Use Factor Analysis with principal components extraction to focus the analysis on a manageable subset of the predictors.
Factor Analysis: Structure Detection
- A telecommunications provider wants to better understand service usage patterns in its customer database. If services can be clustered by usage, the company can offer more attractive packages to its customers.
- A random sample from the customer database is contained in telco.sav. Use Factor Analysis to determine the underlying structure in service usage.
- Use: Principal Axis Factoring
Example of Factor Analysis: Structure Detection
- A telecommunications provider wants to better understand service usage patterns in its customer database.
- Goal: selecting service offerings.
Example of Factor Analysis: Descriptives
- Click Descriptives: recommend checking Initial Solution (the default), "Anti-image", and "KMO and …".
Example of Factor Analysis: Extraction
- Click Extraction: select Method "Principal axis factoring". Recommend keeping the defaults but also checking "Scree plot".
Example of Factor Analysis: Rotation
- Click Rotation: select "Varimax" and "Loading plot(s)".
Understanding the Output

The Kaiser-Meyer-Olkin Measure of Sampling Adequacy is a statistic that indicates the proportion of variance in your variables that might be caused by underlying factors. Factor analysis may not be usable if the KMO is below 0.5.

KMO and Bartlett's Test

| | |
|---|---|
| Kaiser-Meyer-Olkin Measure of Sampling Adequacy | |
| Bartlett's Test of Sphericity: Approx. Chi-Square | 6230.901 |
| Bartlett's Test of Sphericity: df | 91 |
| Bartlett's Test of Sphericity: Sig. | .000 |

Bartlett's test of sphericity tests the hypothesis that your correlation matrix is an identity matrix, which would indicate that your variables are unrelated and therefore unsuitable for structure detection. If Sig. < 0.05, then factor analysis may be helpful.
Understanding the Output

Communalities

| Variable | Initial | Extraction |
|---|---|---|
| Long distance last month | .297 | .748 |
| Toll free last month | .510 | .564 |
| Equipment last month | .579 | .697 |
| Calling card last month | .266 | .307 |
| Wireless last month | .660 | .708 |
| Multiple lines | .276 | .340 |
| Voice mail | .471 | .501 |
| Paging service | .527 | .541 |
| Internet | .455 | .525 |
| Caller ID | .552 | .623 |
| Call waiting | .545 | .610 |
| Call forwarding | .532 | .596 |
| 3-way calling | .506 | .561 |
| Electronic billing | .416 | .488 |

Extraction Method: Principal Axis Factoring.

Extraction communalities are estimates of the variance in each variable accounted for by the factors in the factor solution. Small values indicate variables that do not fit well with the factor solution, and should possibly be dropped from the analysis. The lower values of Multiple lines and Calling card show that they don't fit as well as the others.
Understanding the Output (before rotation)

Only three factors in the initial solution have eigenvalues greater than 1. Together, they account for almost 65% of the variability in the original variables. This suggests that three latent influences are associated with service usage, but there remains room for a lot of unexplained variation.
Understanding the Output (after rotation)

After rotation the factors account for approximately 56% of the variation, a loss of about 10% in explained variation. In general, there are a lot of services that have correlations greater than 0.2 with multiple factors, which muddies the picture. The rotated factor matrix should clear this up.

Understanding the Output (before rotation)

Factor Matrix (a)

| Variable | Factor 1 | Factor 2 | Factor 3 |
|---|---|---|---|
| Long distance last month | .146 | -.254 | .814 |
| Toll free last month | .652 | -.373 | .020 |
| Equipment last month | .494 | .671 | .054 |
| Calling card last month | .364 | -.243 | .339 |
| Wireless last month | .799 | .261 | .037 |
| Multiple lines | .257 | .280 | .442 |
| Voice mail | .669 | .228 | -.038 |
| Paging service | .692 | .246 | -.050 |
| Internet | .323 | .648 | -.014 |
| Caller ID | .689 | -.345 | -.172 |
| Call waiting | .678 | -.366 | -.126 |
| Call forwarding | .684 | -.336 | -.128 |
| 3-way calling | .662 | -.338 | -.093 |
| Electronic billing | .250 | .652 | -.035 |

Extraction Method: Principal Axis Factoring.
a. Attempted to extract 3 factors. More than 25 iterations required (Convergence = .002). Extraction was terminated.

The relationships in the unrotated factor matrix are somewhat clear. The third factor is associated with Long distance last month. The second corresponds most strongly to Equipment last month, Internet, and Electronic billing. The first factor is associated with Toll free last month, Wireless last month, Voice mail, Paging service, Caller ID, Call waiting, Call forwarding, and 3-way calling.
Understanding the Output (after rotation)

Rotated Factor Matrix (a)

| Variable | Factor 1 | Factor 2 | Factor 3 |
|---|---|---|---|
| Long distance last month | .062 | -.121 | .854 |
| Toll free last month | .726 | .018 | .191 |
| Equipment last month | .067 | .831 | .049 |
| Calling card last month | .348 | -.012 | .431 |
| Wireless last month | .530 | .637 | .146 |
| Multiple lines | -.025 | .384 | .438 |
| Voice mail | .455 | .539 | .054 |
| Paging service | .468 | .566 | .044 |
| Internet | -.049 | .722 | -.045 |
| Caller ID | .787 | .056 | .008 |
| Call waiting | .779 | .033 | .054 |
| Call forwarding | .768 | .062 | .048 |
| 3-way calling | .743 | .050 | .078 |
| Electronic billing | -.107 | .686 | -.080 |

Extraction Method: Principal Axis Factoring.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 4 iterations.

The first rotated factor is most highly correlated with Toll free last month, Caller ID, Call waiting, Call forwarding, and 3-way calling. These variables are not particularly correlated with the other two factors. The second factor is most highly correlated with Equipment last month, Internet, and Electronic billing. The third factor is largely unaffected by the rotation.

Thus, there are three major groupings of services, as defined by the services that are most highly correlated with the three factors. Given these groupings, you can make the following observations about the remaining services:
Understanding the Output

Referring again to the rotated factor matrix: because of their moderately large correlations with both the first and second factors, Wireless last month, Voice mail, and Paging service bridge the "Extras" and "Tech" groups. Calling card last month is moderately correlated with the first and third factors, thus it bridges the "Extras" and "Long Distance" groups. Multiple lines is moderately correlated with the second and third factors, thus it bridges the "Tech" and "Long Distance" groups. This suggests avenues for cross-selling. For example, customers who subscribe to extra services may be more predisposed to accepting special offers on wireless services than Internet services.
Summary: What Was Learned
- Using a principal axis factoring extraction, you have uncovered three latent factors that describe relationships between your variables. These factors suggest various patterns of service usage, which you can use to more efficiently increase cross-sales.
Using Principal Components
- Principal components can aid in clustering.
- What is principal components analysis? A statistical technique that creates new variables that are linear functions of the old variables. The main goal of principal components is to reduce the number of variables needed for analysis.
Principal Components Analysis (PCA): what it is and when it should be used.
Introduction to PCA
- What does principal components analysis do?
  - It takes a set of correlated variables and creates a smaller set of uncorrelated variables.
  - These newly created variables are called principal components.
- There are two main objectives for using PCA:
  1. Reduce the dimensionality of the data.
     - In simple English: turn $p$ variables into fewer than $p$ variables.
     - While reducing the number of variables we attempt to keep as much information of the original variables as possible.
     - Thus we try to reduce the number of variables without loss of information.
  2. Identify new meaningful underlying variables.
     - This is often not possible.
     - The principal components created are linear combinations of the original variables and often don't lend themselves to any meaning beyond that.
- There are several reasons why, and situations where, PCA is useful.
Introduction to PCA
- There are several reasons why PCA is useful.
  1. PCA is helpful in discovering whether abnormalities exist in a multivariate dataset.
  2. Clustering (which will be covered later):
     - PCA is helpful when it is desirable to classify units into groups with similar attributes. For example, in marketing you may want to classify your customers into groups (or clusters) with similar attributes for marketing purposes.
     - It can also be helpful for verifying the clusters created when clustering.
  3. Discriminant analysis:
     - In some cases there may be more response variables than independent variables. It is not possible to use discriminant analysis in this case.
     - Principal components can help reduce the number of response variables to a number less than that of the independent variables.
  4. Regression:
     - It can help address the issue of multicollinearity in the independent variables.
Introduction to PCA
- Formation of principal components:
  1. They are uncorrelated.
  2. The 1st principal component accounts for as much of the variability in the data as possible.
  3. The 2nd principal component accounts for as much of the remaining variability as possible.
  4. The 3rd, and so on.
Principal Components and Least Squares
   Think of the least squares model

Y = XB + E

Y is an n × p matrix of the centered observed variables.
X is an n × j matrix of the scores on the 1st j principal components.
B is a j × p matrix of the eigenvectors.
E is an n × p matrix of the residuals.
• Eigenvector <mathematics> A vector which, when acted on by a
particular linear transformation, produces a scalar multiple of the
original vector. The scalar in question is called the
eigenvalue corresponding to this eigenvector.
• www.dictionary.com
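The definition can be checked directly. A small sketch (the matrix A below is an arbitrary example, not one from this lecture):

```python
import numpy as np

# A symmetric 2x2 matrix (arbitrary example) and its eigen-decomposition
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
vals, vecs = np.linalg.eigh(A)

v = vecs[:, 1]      # an eigenvector of A
lam = vals[1]       # the eigenvalue corresponding to this eigenvector

# A acting on v produces a scalar multiple of v: A v = lambda v
print(np.allclose(A @ v, lam * v))   # True
```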
Calculation of the PCA
   There are two options:
1.   Correlation matrix.
2.   Covariance matrix.
   Using the covariance matrix will cause
variables with large variances to be more
strongly associated with the components that
have large eigenvalues, and variables with
small variances to be more strongly associated
with the components that have small eigenvalues.
   For this reason you should use the
correlation matrix unless the variables are
comparable or have been standardized.
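The effect of the choice can be demonstrated with two hypothetical variables on very different scales (the data below is made up purely for illustration):

```python
import numpy as np

# Two hypothetical variables on very different scales
rng = np.random.default_rng(1)
x1 = rng.normal(scale=1.0, size=500)     # small variance
x2 = rng.normal(scale=100.0, size=500)   # large variance (different units)
X = np.column_stack([x1, x2])

# Covariance-based PCA: the leading component is dominated by x2
_, cov_vecs = np.linalg.eigh(np.cov(X, rowvar=False))
print(np.abs(cov_vecs[:, -1]))   # loading ~0 on x1, ~1 on x2

# Correlation-based PCA (same as standardizing first): balanced loadings
_, cor_vecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
print(np.abs(cor_vecs[:, -1]))   # both loadings ~ 1/sqrt(2)
```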
Limitations to Principal Components
   PCA converts a set of correlated variables
into a smaller set of uncorrelated
variables.

   If the variables are already uncorrelated,
then PCA has nothing to add.
 
   Often it is difficult or impossible to interpret
a principal component. That is, principal
components often do not lend
themselves to any meaning.
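The first limitation can be seen directly: if the correlation matrix is the identity (perfectly uncorrelated variables), the "components" are just the original variables. A quick sketch:

```python
import numpy as np

# Perfectly uncorrelated standardized variables: correlation matrix = identity
R = np.eye(3)
vals, vecs = np.linalg.eigh(R)

print(vals)             # all equal: no direction explains more variance than another
print(np.abs(vecs))     # coordinate axes: each component is an original variable
```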
SAS Example of PCA
   We will analyze data on crime.
   CRIME RATES PER 100,000 POPULATION BY STATE.
   The variables are:
1.   MURDER
2.   RAPE
3.   ROBBERY
4.   ASSAULT
5.   BURGLARY
6.   LARCENY
7.   AUTO
SAS command for PCA
   SAS CODE:
    PROC PRINCOMP DATA=CRIME OUT=CRIMCOMP;
    run;

The dataset is CRIME and results
will be saved to CRIMCOMP
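For readers without SAS, a rough Python analogue of what PROC PRINCOMP computes (PCA on the correlation matrix) might look like the sketch below. The random data is only a stand-in with the same shape as CRIME, since the dataset itself is not reproduced here:

```python
import numpy as np

def princomp(X):
    """Rough analogue of PROC PRINCOMP: PCA on the correlation matrix.

    Returns eigenvalues (largest first), eigenvectors, and component
    scores (standardized data projected onto the eigenvectors).
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize
    vals, vecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(vals)[::-1]                     # largest first
    return vals[order], vecs[:, order], Z @ vecs[:, order]

# Stand-in data with the same shape as CRIME: 50 states, 7 crime rates
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 7))
vals, vecs, scores = princomp(X)
print(vals.sum())   # ~7: eigenvalues of a 7x7 correlation matrix sum to 7
```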
SAS Output Of Crime Example
Observations     50
Variables            7

Simple Statistics
       MURDER        RAPE         ROBBERY      ASSAULT      BURGLARY     LARCENY      AUTO
Mean   7.444000000   25.73400000  124.0920000  211.3000000  1291.904000  2671.288000  377.5260000
StD    3.866768941   10.75962995  88.3485672   100.2530492  432.455711   725.908707   193.3944175

Correlation Matrix
MURDER RAPE ROBBERY ASSAULT BURGLARY LARCENY AUTO
MURDER         1.0000 0.6012     0.4837      0.6486          0.3858     0.1019 0.0688
RAPE           0.6012 1.0000     0.5919      0.7403          0.7121     0.6140 0.3489
ROBBERY        0.4837 0.5919     1.0000      0.5571          0.6372     0.4467 0.5907
ASSAULT        0.6486 0.7403     0.5571      1.0000          0.6229     0.4044 0.2758
BURGLARY       0.3858 0.7121     0.6372      0.6229          1.0000     0.7921 0.5580
LARCENY        0.1019 0.6140     0.4467      0.4044          0.7921     1.0000 0.4442
AUTO           0.0688 0.3489     0.5907      0.2758          0.5580     0.4442 1.0000
More SAS Output Of Crime Example

Eigenvalues of the Correlation Matrix
     Eigenvalue    Difference    Proportion   Cumulative
1    4.11495951    2.87623768    0.5879       0.5879
2    1.23872183    0.51290521    0.1770       0.7648
3    0.72581663    0.40938458    0.1037       0.8685
4    0.31643205    0.05845759    0.0452       0.9137
5    0.25797446    0.03593499    0.0369       0.9506
6    0.22203947    0.09798342    0.0317       0.9823
7    0.12405606                  0.0177       1.0000

The Difference column is the gap between consecutive eigenvalues;
for example, 0.09798342 = 0.22203947 − 0.12405606.
The Proportion column is the variability explained by each principal
component individually: Eigenvalue/(sum of the Eigenvalues).
The first two principal components capture 76.48% of the variation.
If you include 6 of the 7 principal components you capture 98.23% of
the variability; the 7th component only captures 1.77%.
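The arithmetic behind the Proportion, Cumulative, and Difference columns can be reproduced from the eigenvalues alone:

```python
import numpy as np

# Eigenvalues from the SAS output above
eig = np.array([4.11495951, 1.23872183, 0.72581663, 0.31643205,
                0.25797446, 0.22203947, 0.12405606])

# For a correlation matrix the eigenvalues sum to p (here 7)
prop = eig / eig.sum()        # Proportion column
cum = np.cumsum(prop)         # Cumulative column
diff = -np.diff(eig)          # Difference column (gap to the next eigenvalue)

print(np.round(prop, 4))      # starts 0.5879, 0.1770, ...
print(np.round(cum, 4))       # second entry 0.7648 -> 76.48%
```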
More SAS Output Of Crime Example
Eigenvectors
Prin1   Prin2     Prin3      Prin4        Prin5       Prin6        Prin7
MURDER           0.300279 -.629174 0.178245 -.232114 0.538123 0.259117 0.267593
RAPE             0.431759 -.169435 -.244198 0.062216 0.188471 -.773271 -.296485
ROBBERY          0.396875 0.042247 0.495861 -.557989 -.519977 -.114385 -.003903
ASSAULT          0.396652 -.343528 -.069510 0.629804 -.506651 0.172363 0.191745
BURGLARY         0.440157 0.203341 -.209895 -.057555 0.101033 0.535987 -.648117
LARCENY          0.357360 0.402319 -.539231 -.234890 0.030099 0.039406 0.601690
AUTO             0.295177 0.502421 0.568384 0.419238 0.369753 -.057298 0.147046

Prin1 has all positive values; this variable can be used as a proxy
for the overall crime rate. Prin2 has positive and negative values:
Murder, Rape, and Assault (violent crimes) are all negative, while
Robbery, Burglary, Larceny, and Auto (property crimes) are all positive.
This variable can be used for an understanding of property vs. violent crime.
CRIME RATES PER 100,000 POPULATION BY STATE
STATES LISTED IN ORDER OF OVERALL CRIME
RATE AS DETERMINED BY THE FIRST PRINCIPAL
COMPONENT

Lowest 10 States and Then the Top 10 States
CRIME RATES PER 100,000 POPULATION BY STATE.
STATES LISTED IN ORDER OF PROPERTY VS. VIOLENT CRIME AS
DETERMINED BY THE SECOND PRINCIPAL COMPONENT

Lowest 10 States and Then the Top 10 States
Correlation From SAS: First the Descriptive Statistics
(A part of the output from Correlation)
Correlation Matrix
Correlation Matrix: Just the Variables

Note that there is correlation
among the crime rates.
Correlation Matrix: Just the Principal Components

Note that there is no
correlation among the
principal components.
Correlation Matrix: The Variables with the Principal Components

Note the high correlations of the
variables with the 1st few
principal components; the correlation
decreases as you move closer to the
last principal component.
What If We Told SAS to Produce
Only 2 Principal Components?
Eigenvalues of the Correlation Matrix
Eigenvalue Difference Proportion Cumulative
1   4.11495951   2.87623768       0.5879       0.5879
2   1.23872183                    0.1770       0.7648

The 2 principal components produced when SAS is told to
produce only 2 are exactly the same as when it produced all 7.

Eigenvectors
               Prin1      Prin2
MURDER      0.300279   -.629174
RAPE        0.431759   -.169435
ROBBERY     0.396875   0.042247
ASSAULT     0.396652   -.343528
BURGLARY    0.440157   0.203341
LARCENY     0.357360   0.402319
AUTO        0.295177   0.502421

```