Factor Analysis and Principal Components

Factor Analysis and Principal Components

Factor analysis with principal components is presented here as a
subset of factor analysis techniques, which it is.
Principal Components (PC)

Principal components analysis is about explaining the variance-covariance
structure, \Sigma, of a set of variables through a few linear combinations
of these variables.

In general PC is used for either:
1. Data reduction
    or
2. Interpretation

If you have p variables, x' = (x_1, ..., x_p), you need p components
to capture all the variability, but often a smaller number, k, of
principal components can capture most of the variability.
So the original data set of n measurements on p variables can be
reduced to a data set of n measurements on k principal components.
PC tends to be a means to an end, not the end itself; that is, PC is
often not the final step. The PCs may be used for multiple regression,
cluster analysis, etc.
Let x' = (x_1, ..., x_p) with covariance matrix \Sigma, and consider the
linear combinations

  Y_1 = a_1'x = a_{11}x_1 + a_{12}x_2 + \cdots + a_{1p}x_p
  Y_2 = a_2'x = \sum_{i=1}^{p} a_{2i}x_i
  \vdots
  Y_p = a_p'x = \sum_{i=1}^{p} a_{pi}x_i

Then

  Var(Y_i) = a_i' \Sigma a_i,        i = 1, 2, ..., p
  Cov(Y_i, Y_k) = a_i' \Sigma a_k,   i, k = 1, 2, ..., p
1. First principal component = the linear combination a_1'x
   that maximizes Var(a_1'x) subject to a_1'a_1 = 1.

2. Second principal component = the linear combination a_2'x
   that maximizes Var(a_2'x) subject to a_2'a_2 = 1
   and Cov(a_1'x, a_2'x) = 0.

i-th. The i-th principal component = the linear combination a_i'x
   that maximizes Var(a_i'x) subject to a_i'a_i = 1
   and Cov(a_i'x, a_k'x) = 0 for k < i.
Find the principal components and the proportion of the total
population variance explained by each when the covariance matrix is

  \Sigma = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix},
  \qquad \rho_{12} = \rho_{23} = -\tfrac{1}{2}

To solve this you will have to go through your notes, but you can
do it even though I did not give you the formula.
Hint: Recall the maximization of quadratic forms for points on the
unit sphere. Let B (p x p) be a positive definite matrix with eigenvalues
\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p and associated normalized
eigenvectors e_1, e_2, ..., e_p. Then

  \max_{x \ne 0} \frac{x'Bx}{x'x} = \lambda_1    (attained when x = e_1)

  \min_{x \ne 0} \frac{x'Bx}{x'x} = \lambda_p    (attained when x = e_p)

Moreover,

  \max_{x \perp e_1, ..., e_k} \frac{x'Bx}{x'x} = \lambda_{k+1}
  (attained when x = e_{k+1},  k = 1, 2, ..., p-1)
Answer:

  |\Sigma - \lambda I| = (2-\lambda)^3 - (2-\lambda) - (2-\lambda)
                       = (2-\lambda)(\lambda^2 - 4\lambda + 2) = 0

so \lambda = 2 or \lambda = 2(1 \pm \tfrac{1}{\sqrt{2}}). Ordering the roots,

  \lambda_1 = 2(1 + \tfrac{1}{\sqrt{2}}) = 2 + \sqrt{2}:   e_1' = (\tfrac{1}{2}, -\tfrac{1}{\sqrt{2}}, \tfrac{1}{2})
  \lambda_2 = 2:                                           e_2' = (\tfrac{1}{\sqrt{2}}, 0, -\tfrac{1}{\sqrt{2}})
  \lambda_3 = 2(1 - \tfrac{1}{\sqrt{2}}) = 2 - \sqrt{2}:   e_3' = (\tfrac{1}{2}, \tfrac{1}{\sqrt{2}}, \tfrac{1}{2})
  Principal Component                                                  Var(Y_i)        % Total var

  Y_1 = \tfrac{1}{2}X_1 - \tfrac{1}{\sqrt{2}}X_2 + \tfrac{1}{2}X_3     2 + \sqrt{2}    \tfrac{1}{3}(1 + \tfrac{1}{\sqrt{2}}) \approx 0.569
  Y_2 = \tfrac{1}{\sqrt{2}}X_1 - \tfrac{1}{\sqrt{2}}X_3                2               \tfrac{1}{3} \approx 0.333
  Y_3 = \tfrac{1}{2}X_1 + \tfrac{1}{\sqrt{2}}X_2 + \tfrac{1}{2}X_3     2 - \sqrt{2}    \tfrac{1}{3}(1 - \tfrac{1}{\sqrt{2}}) \approx 0.098
Let \Sigma be the covariance matrix associated with the random vector
x' = (x_1, x_2, ..., x_p). Let \Sigma have the eigenvalue-eigenvector pairs
(\lambda_1, e_1), (\lambda_2, e_2), ..., (\lambda_p, e_p), where
\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0.
Then the i-th principal component is given by

  Y_i = e_i'x = e_{i1}x_1 + e_{i2}x_2 + \cdots + e_{ip}x_p,    i = 1, 2, ..., p

with

  Var(Y_i) = e_i' \Sigma e_i = \lambda_i,    i = 1, 2, ..., p
  Cov(Y_i, Y_k) = e_i' \Sigma e_k = 0,       i \ne k



1. 11   22            pp
       p                                              p
     Var  x i   1   2              p   Var  Yi 
      i 1                                         i 1
Proof :
       By definition tr     11   22           pp
we can write  as   PP where  is the diagonal matrix of
eigenvalues and P  e1 ,e 2 ,
                                         ,e p 
                                               
PP  PP  I
                                                      p
tr     tr  PP   tr  PP   tr       i
                                                     i 1
 p                                    p
 Var  x i   tr     tr       Yi 
i 1                                 i 1
                                              p             p
Thus total population variance   ii    i
                                             i 1         i 1

Thus proportion of total pop.var. due to k th principal component
        k
is      p
               k  1, 2,   , p.
        i
       i 1
If Y_1 = e_1'x, Y_2 = e_2'x, ..., Y_p = e_p'x are the principal components
obtained from the covariance matrix \Sigma, then

  \rho_{Y_i, x_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}},    i, k = 1, 2, ..., p

are the correlation coefficients between Y_i and the variables x_k.
Here (\lambda_1, e_1), (\lambda_2, e_2), ..., (\lambda_p, e_p) are the
eigenvalue-eigenvector pairs for \Sigma.

Show that \rho_{Y_i, x_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}.
Proof: Set a_k' = (0, 0, ..., 1, ..., 0), so that x_k = a_k'x. Then

  Cov(X_k, Y_i) = Cov(a_k'x, e_i'x) = a_k' \Sigma e_i.

Since \Sigma e_i = \lambda_i e_i,

  Cov(X_k, Y_i) = a_k' \lambda_i e_i = \lambda_i e_{ik}.

Also Var(Y_i) = \lambda_i (shown earlier) and Var(X_k) = \sigma_{kk}, so

  \rho_{Y_i, X_k} = \frac{Cov(Y_i, X_k)}{\sqrt{Var(Y_i)}\sqrt{Var(X_k)}}
                  = \frac{\lambda_i e_{ik}}{\sqrt{\lambda_i}\sqrt{\sigma_{kk}}}
                  = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}},    i, k = 1, 2, ..., p.
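For instance, in the 3 x 3 example the correlation between the first
principal component and X_1 is

  \rho_{Y_1, x_1} = \frac{e_{11}\sqrt{\lambda_1}}{\sqrt{\sigma_{11}}}
                  = \frac{(1/2)\sqrt{2+\sqrt{2}}}{\sqrt{2}} \approx 0.653 .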
Principal Components from Standardized Variables

  Z_1 = (X_1 - \mu_1)/\sqrt{\sigma_{11}}
  Z_2 = (X_2 - \mu_2)/\sqrt{\sigma_{22}}
  \vdots
  Z_p = (X_p - \mu_p)/\sqrt{\sigma_{pp}}

In matrix notation, Z = (V^{1/2})^{-1}(X - \mu), where

  V^{1/2} = \begin{pmatrix}
    \sqrt{\sigma_{11}} & 0 & \cdots & 0 \\
    0 & \sqrt{\sigma_{22}} & \cdots & 0 \\
    \vdots & \vdots & \ddots & \vdots \\
    0 & 0 & \cdots & \sqrt{\sigma_{pp}}
  \end{pmatrix}

Then

  E(Z) = 0,    Cov(Z) = (V^{1/2})^{-1} \Sigma (V^{1/2})^{-1} = \rho.
The PCs of Z can be obtained from the eigenvectors of the correlation
matrix \rho of X. For notation we shall continue to use Y_i to refer to
the i-th PC and (\lambda_i, e_i) as the eigenvalue-eigenvector pair from
either \Sigma or \rho. However, the (\lambda_i, e_i) derived from \Sigma are,
in general, not the same as the ones derived from \rho.

The i-th PC of the standardized variables Z' = (Z_1, Z_2, ..., Z_p)
with Cov(Z) = \rho is given by

  Y_i = e_i'Z = e_i'(V^{1/2})^{-1}(X - \mu),    i = 1, ..., p

and

  \sum_{i=1}^{p} Var(Y_i) = \sum_{i=1}^{p} Var(Z_i) = p
  (the number of random variables, not rho)

  \rho_{Y_i, Z_k} = e_{ik}\sqrt{\lambda_i},    i, k = 1, 2, ..., p

where (\lambda_i, e_i) are the eigenvalue-eigenvector pairs for \rho,
with \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0.
If S = \{s_{ik}\} is the p x p sample covariance matrix with
eigenvalue-eigenvector pairs (\hat\lambda_1, \hat e_1), (\hat\lambda_2, \hat e_2),
..., (\hat\lambda_p, \hat e_p), the i-th sample principal component is given by

  \hat y_i = \hat e_i'x = \hat e_{i1}x_1 + \hat e_{i2}x_2 + \cdots + \hat e_{ip}x_p,    i = 1, 2, ..., p,

where \hat\lambda_1 \ge \hat\lambda_2 \ge \cdots \ge \hat\lambda_p \ge 0 and x is
any observation on the variables X_1, X_2, ..., X_p.

Also,

  sample variance of \hat y_k = \hat\lambda_k,      k = 1, 2, ..., p
  sample covariance of (\hat y_i, \hat y_k) = 0,    i \ne k

  r_{\hat y_i, x_k} = \frac{\hat e_{ik}\sqrt{\hat\lambda_i}}{\sqrt{s_{kk}}},    i, k = 1, 2, ..., p.
Factor Analysis

         The main purpose of factor analysis is to try to describe
the covariance relationships among many variables in terms of
a few underlying, but unobservable, random quantities called
factors.

        The Orthogonal Factor Model:
The observable random vector x, with p components, has mean \mu and
covariance matrix \Sigma. The factor model proposes that x is linearly
dependent upon a few unobservable random variables F_1, F_2, ..., F_m,
called common factors, and p additional sources of variation
\varepsilon_1, \varepsilon_2, ..., \varepsilon_p, called errors, or specific factors.
  X_1 - \mu_1 = l_{11}F_1 + l_{12}F_2 + \cdots + l_{1m}F_m + \varepsilon_1
  X_2 - \mu_2 = l_{21}F_1 + l_{22}F_2 + \cdots + l_{2m}F_m + \varepsilon_2
  \vdots
  X_p - \mu_p = l_{p1}F_1 + l_{p2}F_2 + \cdots + l_{pm}F_m + \varepsilon_p

or, in matrix notation,

  X - \mu = L F + \varepsilon
  (p x 1)   (p x m)(m x 1)   (p x 1)

l_{ij} is called the loading of the i-th variable on the j-th factor,
and L is called the matrix of factor loadings.
The p deviations X_1 - \mu_1, X_2 - \mu_2, ..., X_p - \mu_p are expressed
in terms of F_1, ..., F_m, \varepsilon_1, ..., \varepsilon_p, that is,
p + m random variables.
Assumptions:

  E(F) = 0 (m x 1),    Cov(F) = E(FF') = I (m x m)

  E(\varepsilon) = 0 (p x 1),
  Cov(\varepsilon) = E(\varepsilon\varepsilon') = \Psi
    = \begin{pmatrix}
        \psi_1 & 0 & \cdots & 0 \\
        0 & \psi_2 & \cdots & 0 \\
        \vdots & \vdots & \ddots & \vdots \\
        0 & 0 & \cdots & \psi_p
      \end{pmatrix}    (p x p)

F and \varepsilon are independent, thus Cov(\varepsilon, F) = E(\varepsilon F') = 0 (p x m).
Solve for \Sigma in terms of L and \Psi:

  \Sigma = Cov(X) = E[(X - \mu)(X - \mu)']

  (X - \mu)(X - \mu)' = (LF + \varepsilon)(LF + \varepsilon)'
                      = (LF + \varepsilon)((LF)' + \varepsilon')
                      = LF(LF)' + \varepsilon(LF)' + LF\varepsilon' + \varepsilon\varepsilon'

  E[(X - \mu)(X - \mu)'] = E[LFF'L' + \varepsilon F'L' + LF\varepsilon' + \varepsilon\varepsilon']
                         = L E(FF') L' + E(\varepsilon F') L' + L E(F\varepsilon') + E(\varepsilon\varepsilon')
                         = L I L' + 0 + 0 + \Psi      (independence gives the zero terms)
                         = LL' + \Psi

Also (X - \mu)F' = (LF + \varepsilon)F' = LFF' + \varepsilon F', so

  Cov(X, F) = E[(X - \mu)F'] = L E(FF') + E(\varepsilon F') = L.

Hence

  Var(X_i) = l_{i1}^2 + \cdots + l_{im}^2 + \psi_i
  Cov(X_i, X_k) = l_{i1}l_{k1} + \cdots + l_{im}l_{km},    i \ne k
  Cov(X_i, F_j) = l_{ij}
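As a small illustrative check (the numbers below are made up for illustration,
not taken from the notes): with p = 2 variables, m = 1 factor, loadings
l_{11} = 0.9, l_{21} = 0.7 and specific variances \psi_1 = 0.19, \psi_2 = 0.51,

  \Sigma = LL' + \Psi
         = \begin{pmatrix} 0.9 \\ 0.7 \end{pmatrix}\begin{pmatrix} 0.9 & 0.7 \end{pmatrix}
           + \begin{pmatrix} 0.19 & 0 \\ 0 & 0.51 \end{pmatrix}
         = \begin{pmatrix} 1.00 & 0.63 \\ 0.63 & 1.00 \end{pmatrix},

so Var(X_1) = 0.81 + 0.19 = 1 and Cov(X_1, X_2) = 0.9 \times 0.7 = 0.63,
exactly as the formulas above require.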
The Principal Component (and Principal Factor) Method
(one method for factor analysis)

Recall the spectral decomposition:
Let \Sigma (a covariance matrix) have eigenvalue-eigenvector pairs
(\lambda_i, e_i) with \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0. Then

  \Sigma = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \cdots + \lambda_p e_p e_p'
        = \Big( \sqrt{\lambda_1}\,e_1 \;\Big|\; \sqrt{\lambda_2}\,e_2 \;\Big|\; \cdots \;\Big|\; \sqrt{\lambda_p}\,e_p \Big)
          \begin{pmatrix} \sqrt{\lambda_1}\,e_1' \\ \sqrt{\lambda_2}\,e_2' \\ \vdots \\ \sqrt{\lambda_p}\,e_p' \end{pmatrix}

Thus for a factor analysis with m = p (as many factors as variables)
and \psi_i = 0 for all i,

  \Sigma = L L' + 0 = L L'        (all matrices p x p)

But we almost always want fewer factors than original variables (m < p).
One approach, when the last p - m eigenvalues are small, is to neglect
the contribution of \lambda_{m+1} e_{m+1} e_{m+1}' + \cdots + \lambda_p e_p e_p' to \Sigma.
We then use the approximation \Sigma \approx L L', where L is now p x m,
removing the last p - m components. This assumes the error \Psi can
still be ignored.
If we wish to allow for \Psi to be included,

  \Sigma \approx L L' + \Psi
       = \Big( \sqrt{\lambda_1}\,e_1 \;\Big|\; \sqrt{\lambda_2}\,e_2 \;\Big|\; \cdots \;\Big|\; \sqrt{\lambda_m}\,e_m \Big)
         \begin{pmatrix} \sqrt{\lambda_1}\,e_1' \\ \sqrt{\lambda_2}\,e_2' \\ \vdots \\ \sqrt{\lambda_m}\,e_m' \end{pmatrix}
         + \begin{pmatrix}
             \psi_1 & 0 & \cdots & 0 \\
             0 & \psi_2 & \cdots & 0 \\
             \vdots & \vdots & \ddots & \vdots \\
             0 & 0 & \cdots & \psi_p
           \end{pmatrix}

where \psi_i = \sigma_{ii} - \sum_{j=1}^{m} l_{ij}^2 for i = 1, ..., p.
When applying this approach it is typical to center the observations
(subtract \bar{x}).

In the case where the units of the variables are not the same
(e.g. kg for weight and cm for height), it is usually desirable to
work with the standardized variables

  z_j = \begin{pmatrix}
          (x_{j1} - \bar{x}_1)/\sqrt{s_{11}} \\
          (x_{j2} - \bar{x}_2)/\sqrt{s_{22}} \\
          \vdots \\
          (x_{jp} - \bar{x}_p)/\sqrt{s_{pp}}
        \end{pmatrix},    j = 1, 2, ..., n
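In SAS, one way to request this principal component extraction of factor
loadings is through PROC FACTOR; a minimal sketch, assuming a data set named
WORK.MYDATA (the data set and variable names are placeholders):

   proc factor data=work.mydata
               method=principal   /* principal component extraction        */
               priors=one         /* 1's on the diagonal (PC method)       */
               nfactors=2         /* keep m = 2 factors (example choice)   */
               simple corr;       /* print means/std devs and correlations */
      var x1 x2 x3 x4;
   run;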
Maximum Likelihood Method for Factor Analysis

If the common factors F and the specific factors \varepsilon can be assumed
to be normally distributed, then maximum likelihood estimates of the
factor loadings and specific variances may be obtained.

When F_j and \varepsilon_j are jointly normal, the observations
X_j - \mu = L F_j + \varepsilon_j are then normal, and the likelihood is

  L(\mu, \Sigma) = (2\pi)^{-np/2} |\Sigma|^{-n/2}
      \exp\Big( -\tfrac{1}{2}\, tr\Big[ \Sigma^{-1} \Big( \sum_{j=1}^{n} (x_j - \bar{x})(x_j - \bar{x})'
                 + n(\bar{x} - \mu)(\bar{x} - \mu)' \Big) \Big] \Big)

    = (2\pi)^{-(n-1)p/2} |\Sigma|^{-(n-1)/2}
      \exp\Big( -\tfrac{1}{2}\, tr\Big[ \Sigma^{-1} \sum_{j=1}^{n} (x_j - \bar{x})(x_j - \bar{x})' \Big] \Big)
      \times (2\pi)^{-p/2} |\Sigma|^{-1/2}
      \exp\Big( -\tfrac{n}{2} (\bar{x} - \mu)' \Sigma^{-1} (\bar{x} - \mu) \Big),

which depends on L and \Psi through \Sigma = LL' + \Psi.

To make L well defined (a unique solution), impose the condition that
L' \Psi^{-1} L = \Delta, a diagonal matrix.

  Proportion of total sample variance due to the j-th factor
  = \frac{\hat l_{1j}^2 + \hat l_{2j}^2 + \cdots + \hat l_{pj}^2}{s_{11} + s_{22} + \cdots + s_{pp}}
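A hedged SAS sketch of maximum likelihood extraction (again with placeholder
data set and variable names); the HEYWOOD option is one common way to handle
communality estimates that would otherwise exceed 1:

   proc factor data=work.mydata
               method=ml       /* maximum likelihood extraction       */
               nfactors=2      /* number of common factors to extract */
               heywood;        /* cap communalities at 1              */
      var x1 x2 x3 x4;
   run;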
Factor Rotation

All factor loadings obtained from the initial loadings by an
orthogonal transformation have the same ability to reproduce
the covariance (or correlation) matrix:

  \hat L^* = \hat L T,  where T T' = T' T = I,

so

  \hat L \hat L' + \hat\Psi = \hat L T T' \hat L' + \hat\Psi = \hat L^* \hat L^{*\prime} + \hat\Psi.

The \hat\Psi remains unchanged as well.

Imagine there were only m = 2 factors:

  \hat L^* = \hat L T
  (p x 2)   (p x 2)(2 x 2)

where

  T = \begin{pmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{pmatrix}    (clockwise rotation)

or

  T = \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix}    (counterclockwise rotation)

and \phi is the angle through which the factor loadings are rotated.
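In SAS this is just an extra option on PROC FACTOR; a minimal sketch
(placeholder names again) requesting a varimax rotation of the extracted
loadings:

   proc factor data=work.mydata
               method=principal priors=one nfactors=2
               rotate=varimax;   /* orthogonal (varimax) rotation of the loadings */
      var x1 x2 x3 x4;
   run;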
A Reference:
The following 13 slides come from:
  Multivariate Data Analysis Using SPSS
  By John Zhang, ARL, IUP
Factor Analysis-1
   The main goal of factor analysis is data reduction.
    A typical use of factor analysis is in survey
    research, where a researcher wishes to represent
    a number of questions with a smaller number of
    factors
   Two questions in factor analysis:
       How many factors are there, and what do they represent
        (interpretation)?
   Two technical aids:
       Eigenvalues
       Percentage of variance accounted for
Factor Analysis-2
   Two types of factor analysis:
       Exploratory: introduced here
       Confirmatory: SPSS AMOS
   Theoretical basis:
       Correlations among variables are explained by
        underlying factors
        An example of a mathematical one-factor model for two
         variables:

                V1=L1*F1+E1
                V2=L2*F1+E2
Factor Analysis-3
       Each variable is composed of a common factor (F1)
        multiplied by a loading coefficient (L1, L2 – the lambdas
        or factor loadings) plus a random component
       V1 and V2 correlate because of the common factor, and the
        correlation relates to the factor loadings; thus, the factor
        loadings can be estimated from the correlations
       A given set of correlations can yield different factor
        loadings (i.e. the solutions are not unique)
       One should pick the simplest solution
Factor Analysis-4
            A factor solution needs to be confirmed:
               By a different factor method
               By a different sample
            That is, the findings should not differ by methodology of
             analysis nor by sample
   More on terminology
       Factor loading: interpreted as the Pearson
        correlation between the variable and the factor
       Communality: the proportion of variability for
        a given variable that is explained by the factor
       Extraction: the process by which the factors
        are determined from a large set of variables
Factor Analysis-5 (Principal components)
   Principal components: one of the extraction
    methods
       A principal component is a linear combination of
        observed variables that is independent (orthogonal) of
        other components
       The first component accounts for the largest amount of
        variance in the input data; the second component
        accounts for the largest amount of the remaining
        variance…
       Components being orthogonal means they are
        uncorrelated
Factor Analysis-6 (Principal components)
   Possible applications of principal
    components:
       E.g. in survey research, it is common to have
        many questions to address one issue (e.g.
        customer service). It is likely that these
        questions are highly correlated. It is
        problematic to use these variables in some
        statistical procedures (e.g. regression). One
        can instead use factor scores, computed from
        the factor loadings on each orthogonal component
Factor Analysis-7 (Principal components)
   Principal components vs. other extraction methods:
       Principal components focus on accounting for the
        maximum amount of variance (the diagonal of a
        correlation matrix)
       Other extraction methods (e.g. principal axis factoring)
        focus more on accounting for the correlations between
        variables (the off-diagonal correlations)
       Principal components can be defined as unique
        combinations of the variables, but the other factor methods
        cannot
       Principal components are used for data reduction but are
        more difficult to interpret
Factor Analysis-8
   Number of factors:
       Eigenvalues are often used to determine how
        many factors to take
            Take as many factors as there are eigenvalues greater
             than 1
               An eigenvalue represents the amount of standardized
                variance in the variables accounted for by a factor
               The amount of standardized variance in a variable is 1
               The sum of the retained eigenvalues, divided by the number
                of variables, is the proportion of variance accounted for
Factor Analysis-9
   Rotation
       Objective: to facilitate interpretation
       Orthogonal rotation: done when data reduction is the
        objective and factors need to be orthogonal
            Varimax: attempts to simplify interpretation by maximizing
             the variances of the variable loadings on each factor
            Quartimax: simplifies the solution by finding a rotation that
             produces high and low loadings across factors for each
             variable
       Oblique rotation: use when there is reason to allow
        factors to be correlated
            Oblimin and Promax (Promax runs fast)
Factor Analysis-10
   Factor scores: if you are satisfied with a
    factor solution
       You can request that a new set of variables be
        created that represents the scores of each
        observation on the factors (difficult to interpret)
       You can use the lambda coefficients to judge
        which variables are highly related to the factor,
        then compute the sum or the mean of these
        variables for further analysis (easy to interpret)
Factor Analysis-11
   Sample size: the sample size should be about 10
    to 15 times the number of variables (as in other
    multivariate procedures)
   Number of methods: there are 8 factoring
    methods, including principal components
       Principal axis: accounts for the correlations between the
        variables
       Unweighted least-squares: minimizes the residuals
        between the observed and the reproduced correlation
        matrices
Factor Analysis-12
     Generalized least-squares: similar to unweighted least-
      squares but gives more weight to the variables with
      stronger correlations
     Maximum likelihood: generates the solution that is the
      most likely to have produced the correlation matrix
     Alpha factoring: considers the variables as a sample
      from a larger universe of variables
     Image factoring: decomposes the variables into a
      common part and a unique part, then works with the
      common part
Factor Analysis-13
   Recommendations:
       Principal components and principal axis are the
        most commonly used methods
       When there is multicollinearity, use principal
        components
       Rotations are often done. Try to use Varimax
Reference
 Factor Analysis from SPSS
 Much of the wording comes from the SPSS
  help and tutorial.
Factor Analysis
   Factor Analysis is primarily used for data
    reduction or structure detection.
       The purpose of data reduction is to remove
        redundant (highly correlated) variables from
        the data file, perhaps replacing the entire data
        file with a smaller number of uncorrelated
        variables.
       The purpose of structure detection is to
        examine the underlying (or latent)
        relationships between the variables.
Factor Analysis
   The Factor Analysis procedure has several extraction methods for
    constructing a solution.
       For Data Reduction. The principal components method of extraction
        begins by finding a linear combination of variables (a component) that
        accounts for as much variation in the original variables as possible. It
        then finds another component that accounts for as much of the
        remaining variation as possible and is uncorrelated with the previous
        component, continuing in this way until there are as many
        components as original variables. Usually, a few components will
        account for most of the variation, and these components can be used
        to replace the original variables. This method is most often used to
        reduce the number of variables in the data file.
       For Structure Detection. Other Factor Analysis extraction methods go
        one step further by adding the assumption that some of the variability
        in the data cannot be explained by the components (usually called
        factors in other extraction methods). As a result, the total variance
        explained by the solution is smaller; however, the addition of this
        structure to the factor model makes these methods ideal for
        examining relationships between the variables.
       With any extraction method, the two questions that a good solution
        should try to answer are "How many components (factors) are needed
        to represent the variables?" and "What do these components
        represent?"
Factor Analysis: Data Reduction
 An industry analyst would like to predict
  automobile sales from a set of predictors.
  However, many of the predictors are
  correlated, and the analyst fears that this
  might adversely affect her results.
 This information is contained in the file
  car_sales.sav . Use Factor Analysis with
  principal components extraction to focus
  the analysis on a manageable subset of
  the predictors.
Factor Analysis: Structure Detection
 A telecommunications provider wants to
  better understand service usage patterns
  in its customer database. If services can
  be clustered by usage, the company can
  offer more attractive packages to its
  customers.
 A random sample from the customer
  database is contained in telco.sav. Use
  Factor Analysis to determine the underlying
  structure in service usage.
 Use: Principal Axis Factoring
Example of Factor Analysis: Structure Detection

[Screenshot: selecting the service-usage variables]
A telecommunications provider wants to better understand service usage
patterns in its customer database.
Example of Factor Analysis: Descriptives

[Screenshot: Descriptives dialog]
Click Descriptives: recommend checking Initial Solution (the default).
In addition, check “Anti-image” and “KMO and …”.
Example of Factor Analysis: Extraction

[Screenshot: Extraction dialog]
Click Extraction: select Method “Principal axis factoring”.
Recommend keeping the defaults but also checking “Scree plot”.
Example of Factor Analysis: Rotation

[Screenshot: Rotation dialog]
Click Rotation: select “Varimax” and “Loading plot(s)”.
                   Understanding the Output

The Kaiser-Meyer-Olkin Measure of Sampling Adequacy is a statistic that
indicates the proportion of variance in your variables that might be caused
by underlying factors. If it is below 0.5, factor analysis probably is not useful.

  KMO and Bartlett's Test
  Kaiser-Meyer-Olkin Measure of Sampling Adequacy        .888
  Bartlett's Test of Sphericity   Approx. Chi-Square     6230.901
                                  df                     91
                                  Sig.                   .000

Bartlett's test of sphericity tests the hypothesis that your correlation
matrix is an identity matrix, which would indicate that your variables
are unrelated and therefore unsuitable for structure detection. If Sig.
< 0.05, then factor analysis may be helpful.
                   Understanding the Output

  Communalities                  Initial    Extraction
  Long distance last month         .297        .748
  Toll free last month             .510        .564
  Equipment last month             .579        .697
  Calling card last month          .266        .307
  Wireless last month              .660        .708
  Multiple lines                   .276        .340
  Voice mail                       .471        .501
  Paging service                   .527        .541
  Internet                         .455        .525
  Caller ID                        .552        .623
  Call waiting                     .545        .610
  Call forwarding                  .532        .596
  3-way calling                    .506        .561
  Electronic billing               .416        .488
  Extraction Method: Principal Axis Factoring.

Extraction communalities are estimates of the variance in each variable
accounted for by the factors in the factor solution. Small values
indicate variables that do not fit well with the factor solution, and
should possibly be dropped from the analysis. The lower values of
Multiple lines and Calling card show that they don't fit as well as the
others.
Understanding the Output (before rotation)

[Eigenvalue / variance-explained table omitted]
Only three factors in the initial solution have eigenvalues greater
than 1. Together, they account for almost 65% of the variability in the
original variables. This suggests that three latent influences are
associated with service usage, but there remains room for a lot of
unexplained variation.
Understanding the Output (after rotation)

[Eigenvalue / variance-explained table omitted]
After rotation, approximately 56% of the variation is explained, about
a 10% loss in explained variation.

In general, there are a lot of services that have correlations greater
than 0.2 with multiple factors, which muddies the picture. The rotated
factor matrix should clear this up.

                   Understanding the Output (before rotation)

  Factor Matrix(a)                    Factor
                                 1        2        3
  Long distance last month     .146    -.254     .814
  Toll free last month         .652    -.373     .020
  Equipment last month         .494     .671     .054
  Calling card last month      .364    -.243     .339
  Wireless last month          .799     .261     .037
  Multiple lines               .257     .280     .442
  Voice mail                   .669     .228    -.038
  Paging service               .692     .246    -.050
  Internet                     .323     .648    -.014
  Caller ID                    .689    -.345    -.172
  Call waiting                 .678    -.366    -.126
  Call forwarding              .684    -.336    -.128
  3-way calling                .662    -.338    -.093
  Electronic billing           .250     .652    -.035
  Extraction Method: Principal Axis Factoring.
  a. Attempted to extract 3 factors. More than 25 iterations required.
     (Convergence = .002). Extraction was terminated.

The relationships in the unrotated factor matrix are somewhat clear.
The third factor is associated with Long distance last month. The
second corresponds most strongly to Equipment last month, Internet, and
Electronic billing. The first factor is associated with Toll free last
month, Wireless last month, Voice mail, Paging service, Caller ID,
Call waiting, Call forwarding, and 3-way calling.
                   Understanding the Output (after rotation)

  Rotated Factor Matrix(a)            Factor
                                 1        2        3
  Long distance last month     .062    -.121     .854
  Toll free last month         .726     .018     .191
  Equipment last month         .067     .831     .049
  Calling card last month      .348    -.012     .431
  Wireless last month          .530     .637     .146
  Multiple lines              -.025     .384     .438
  Voice mail                   .455     .539     .054
  Paging service               .468     .566     .044
  Internet                    -.049     .722    -.045
  Caller ID                    .787     .056     .008
  Call waiting                 .779     .033     .054
  Call forwarding              .768     .062     .048
  3-way calling                .743     .050     .078
  Electronic billing          -.107     .686    -.080
  Extraction Method: Principal Axis Factoring.
  Rotation Method: Varimax with Kaiser Normalization.
  a. Rotation converged in 4 iterations.

The first rotated factor is most highly correlated with Toll free last
month, Caller ID, Call waiting, Call forwarding, and 3-way calling.
These variables are not particularly correlated with the other two
factors. The second factor is most highly correlated with Equipment
last month, Internet, and Electronic billing. The third factor is
largely unaffected by the rotation.

Thus, there are three major groupings of services, as defined by the
services that are most highly correlated with the three factors. Given
these groupings, you can make the following observations about the
remaining services:
                   Understanding the Output

(Rotated Factor Matrix as on the previous slide.)

Because of their moderately large correlations with both the first and
second factors, Wireless last month, Voice mail, and Paging service
bridge the "Extras" and "Tech" groups. Calling card last month is
moderately correlated with the first and third factors, thus it bridges
the "Extras" and "Long Distance" groups. Multiple lines is moderately
correlated with the second and third factors, thus it bridges the
"Tech" and "Long Distance" groups. This suggests avenues for
cross-selling. For example, customers who subscribe to extra services
may be more predisposed to accepting special offers on wireless
services than Internet services.
Summary: What Was Learned
   Using a principal axis factoring extraction,
    you have uncovered three latent factors
    that describe relationships between your
    variables. These factors suggest various
    patterns of service usage, which you can
    use to more efficiently increase cross-
    selling.
Using Principal Components
   Principal components can aid in
    clustering.

   What are principal components?
       Principal components analysis is a statistical technique that
        creates new variables that are linear functions of the
        old variables. The main goal of principal
        components is to reduce the number of
        variables needed for analysis.
Principal Components
    Analysis (PCA)
 What it is and when it
   should be used.
    Introduction to PCA
   What does principal components analysis do?
        Takes a set of correlated variables and creates a smaller set of
         uncorrelated variables.
        These newly created variables are called principal components.
   There are two main objectives for using PCA
    1.   Reduce the dimensionality of the data.
         –   In simple English: turn p variables into fewer than p variables.
         –   While reducing the number of variables we attempt to keep as much
             information from the original variables as possible.
         –   Thus we try to reduce the number of variables without loss of
             information.
    2.   Identify new meaningful underlying variables.
         –   This is often not possible.
         –   The principal components created are linear combinations of the
             original variables and often don't lend themselves to any meaning
             beyond that.
   There are several reasons why and situations where PCA is
    useful.
    Introduction to PCA
   There are several reasons why PCA is useful.
    1.   PCA is helpful in discovering if abnormalities exist in a multivariate
         dataset.
    2.   Clustering (which will be covered later):
         –   PCA is helpful when it is desirable to classify units into groups with
             similar attributes.
                 For example: In marketing you may want to classify your customers into
                  groups (or clusters) with similar attributes for marketing purposes.
         –   It can also be helpful for verifying the clusters created when clustering.
    3.   Discriminant analysis:
         –   In some cases there may be more response variables than
             independent variables. It is not possible to use discriminant analysis
             in this case.
         –   Principal components can help reduce the number of response
             variables to a number less than that of the independent variables.
    4.   Regression:
         –   It can help address the issue of multicollinearity in the independent
             variables.
Introduction to PCA

   Formation of principal components
    1.   They are uncorrelated
    2.   The 1st principal component accounts for as
         much of the variability in the data as possible.
    3.   The 2nd principal component accounts for as
         much of the remaining variability as possible.
    4.   The 3rd …
    5.   Etc.
Principal Components and Least Squares
      Think of the least squares model
             Y = XB + E
 Y is an n x p matrix of the centered observed variables.
 X is an n x j matrix of the scores on the 1st j principal components.
 B is a j x p matrix of the eigenvectors.
 E is an n x p matrix of the residuals.
• Eigenvector <mathematics> A vector which, when acted on by a
particular linear transformation, produces a scalar multiple of the
original vector. The scalar in question is called the
eigenvalue corresponding to this eigenvector.
   • www.dictionary.com
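A way to read this model (stated as a sketch, under the assumption that the
rows of B are the orthonormal eigenvectors described above): the component
scores can be recovered from the centered data by

  X = Y B', \qquad E = Y - Y B'B, \qquad
  \text{and when } j = p,\; B'B = I \text{ so } E = 0 .

In other words, the residual E is simply the part of the centered data not
captured by the first j principal components.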
Calculation of the PCA
   There are two options:
    1.   Correlation matrix.
    2.   Covariance matrix.
   Using the covariance matrix will cause
    variables with large variances to be more
    strongly associated with components with
    large eigenvalues and the opposite is true
    of variables with small variances.
   For the above reason you should use the
    correlation matrix unless the variables are
    comparable or have been standardized.
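In SAS, PROC PRINCOMP uses the correlation matrix by default; a covariance-
matrix analysis can be requested with the COV option. A sketch using the
CRIME data set from the example that follows (the output data set names are
just placeholders):

   * default: principal components from the correlation matrix;
   proc princomp data=crime out=crimcomp;
   run;

   * COV option: principal components from the covariance matrix instead;
   proc princomp data=crime cov out=crimcomp_cov;
   run;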
Limitations to Principal Components
   PCA converts a set of correlated variables
    into a smaller set of uncorrelated
    variables.

   If the variables are already uncorrelated,
    then PCA has nothing to add.

   Often it is difficult or impossible to explain
    a principal component. That is, principal
    components often do not lend themselves
    to any meaning.
SAS Example of PCA
   We will analyze data on crime.
   CRIME RATES PER 100,000 POPULATION BY STATE.
   The variables are:
    1.   MURDER
    2.   RAPE
    3.   ROBBERY
    4.   ASSAULT
    5.   BURGLARY
    6.   LARCENY
    7.   AUTO
                       SAS command for PCA
   SAS CODE:
        PROC PRINCOMP DATA=CRIME OUT=CRIMCOMP;
        run;

                      The dataset is CRIME and results
                       will be saved to CRIMCOMP
 SAS Output Of Crime Example
                                     Observations     50
                                     Variables            7



                                     Simple Statistics
        MURDER         RAPE     ROBBERY         ASSAULT BURGLARY            LARCENY           AUTO
Mean 7.444000000 25.73400000 124.0920000 211.3000000          1291.904000 2671.288000 377.5260000
StD   3.866768941 10.75962995   88.3485672 100.2530492         432.455711   725.908707 193.3944175

                                     Correlation Matrix
                  MURDER RAPE ROBBERY ASSAULT BURGLARY LARCENY AUTO
      MURDER         1.0000 0.6012     0.4837      0.6486          0.3858     0.1019 0.0688
      RAPE           0.6012 1.0000     0.5919      0.7403          0.7121     0.6140 0.3489
      ROBBERY        0.4837 0.5919     1.0000      0.5571          0.6372     0.4467 0.5907
      ASSAULT        0.6486 0.7403     0.5571      1.0000          0.6229     0.4044 0.2758
      BURGLARY       0.3858 0.7121     0.6372      0.6229          1.0000     0.7921 0.5580
      LARCENY        0.1019 0.6140     0.4467      0.4044          0.7921     1.0000 0.4442
      AUTO           0.0688 0.3489     0.5907      0.2758          0.5580     0.4442 1.0000
    More SAS Output Of Crime Example

  The Difference column is the gap between consecutive eigenvalues,
  e.g. 0.09798342 = 0.22203947 - 0.12405606.

        Eigenvalues of the Correlation Matrix
        Eigenvalue    Difference    Proportion    Cumulative
   1    4.11495951    2.87623768        0.5879        0.5879
   2    1.23872183    0.51290521        0.1770        0.7648
   3    0.72581663    0.40938458        0.1037        0.8685
   4    0.31643205    0.05845759        0.0452        0.9137
   5    0.25797446    0.03593499        0.0369        0.9506
   6    0.22203947    0.09798342        0.0317        0.9823
   7    0.12405606                      0.0177        1.0000

  The first two principal components capture 76.48% of the variation.
  The Proportion column is the proportion of variability explained by
  each principal component individually; this value equals the
  eigenvalue/(sum of the eigenvalues). If you include 6 of the 7
  principal components you capture 98.23% of the variability; the 7th
  component only captures 1.77%.
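Since these are principal components of the correlation matrix, the
eigenvalues sum to p = 7, so each Proportion entry is just the eigenvalue
divided by 7; for example,

  \frac{4.11495951}{7} \approx 0.5879, \qquad
  \frac{4.11495951 + 1.23872183}{7} \approx 0.7648 .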
More SAS Output Of Crime Example
                                     Eigenvectors
                     Prin1   Prin2     Prin3      Prin4        Prin5       Prin6        Prin7
MURDER           0.300279 -.629174 0.178245 -.232114 0.538123 0.259117 0.267593
RAPE             0.431759 -.169435 -.244198 0.062216 0.188471 -.773271 -.296485
ROBBERY          0.396875 0.042247 0.495861 -.557989 -.519977 -.114385 -.003903
ASSAULT          0.396652 -.343528 -.069510 0.629804 -.506651 0.172363 0.191745
BURGLARY         0.440157 0.203341 -.209895 -.057555 0.101033 0.535987 -.648117
LARCENY          0.357360 0.402319 -.539231 -.234890 0.030099 0.039406 0.601690
AUTO             0.295177 0.502421 0.568384 0.419238 0.369753 -.057298 0.147046


  Prin1 has all positive values; this variable can be used as a proxy
  for overall crime rate. Prin2 has positive and negative values:
  Murder, Rape, and Assault are all negative (violent crimes), while
  Robbery, Burglary, Larceny, and Auto are all positive (property
  crimes). This variable can be used for an understanding of property
  vs. violent crime.
CRIME RATES PER 100,000 POPULATION BY STATE
STATES LISTED IN ORDER OF OVERALL CRIME
RATE AS DETERMINED BY THE FIRST PRINCIPAL
COMPONENT

Lowest 10 States and Then the Top 10 States
CRIME RATES PER 100,000 POPULATION BY STATE.
STATES LISTED IN ORDER OF PROPERTY VS. VIOLENT CRIME AS
DETERMINED BY THE SECOND PRINCIPAL COMPONENT

Lowest 10 States and Then the Top 10 States
Correlation From SAS: First the Descriptive Statistics
       (A part of the output from Correlation)
Correlation Matrix
Correlation Matrix: Just the Variables




           Note that there is correlation
             among the crime rates.
Correlation Matrix: Just the Principal Components




                Note that there is no
               correlation among the
               principal components.
Correlation Matrix: Just the Principal Components




                 Note the higher/very high correlations of the
                 variables with the first few principal components;
                 the correlations decrease toward the last
                 principal component.
What If We Told SAS to Produce
Only 2 Principal Components?

           Eigenvalues of the Correlation Matrix
           Eigenvalue    Difference    Proportion    Cumulative
       1   4.11495951    2.87623768        0.5879        0.5879
       2   1.23872183                      0.1770        0.7648

                     Eigenvectors
                        Prin1       Prin2
       MURDER        0.300279    -.629174
       RAPE          0.431759    -.169435
       ROBBERY       0.396875    0.042247
       ASSAULT       0.396652    -.343528
       BURGLARY      0.440157    0.203341
       LARCENY       0.357360    0.402319
       AUTO          0.295177    0.502421

  The 2 principal components produced when SAS is asked to produce
  only 2 principal components are exactly the same as when it
  produced all of them.
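For completeness, a sketch of the call that restricts the output to two
components; the N= option of PROC PRINCOMP sets how many components are
retained (the VAR statement is optional here and just lists the crime-rate
variables explicitly):

   proc princomp data=crime out=crimcomp n=2;
      var murder rape robbery assault burglary larceny auto;
   run;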