Exploratory Factor Analysis by hcj


									Exploratory Factor Analysis
                         Measurement

First term:
    – Used several indicators to measure a single true value
    – Talked about summing responses (and Cronbach’s α)
    – Hope is that the errors balance themselves out
Now:
    – More sophisticated techniques
    – Multiple latent variables
    – Build more complex models

But: This does not mean that summing is wrong. It assumes a certain
  model, which may be accurate.
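
A minimal R sketch of the summing approach, assuming simulated data (the five items, the sample size and the error variance are all made up for illustration). It computes a summed score and Cronbach's α from the usual variance formula, and checks that the sum tracks the true value better than a single item does.

```r
# Simulated example: 5 indicators of one true value (all names are illustrative)
set.seed(1)
n <- 200
true_value <- rnorm(n)

# Each item = true value + independent error
items <- sapply(1:5, function(j) true_value + rnorm(n))
colnames(items) <- paste0("item", 1:5)

# Summed score: the hope is that the errors partly balance out
total <- rowSums(items)

# Cronbach's alpha from the item variances and the total-score variance
cronbach_alpha <- function(x) {
  k <- ncol(x)
  (k / (k - 1)) * (1 - sum(apply(x, 2, var)) / var(rowSums(x)))
}
alpha <- cronbach_alpha(items)

# Correlation with the true value: one item versus the summed score
cor_single <- cor(items[, 1], true_value)
cor_total  <- cor(total, true_value)
```

Summing weights every item equally, which is the "certain model" referred to above.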
         One Observed Variable

[Figure: True Value and Error each point to Response]

         Data = Model + Residual
          Exploratory Factor Analysis (EFA)

• Reducing a large number of manifest (observed) variables to a
  smaller number of hypothesized latent (unobserved) variables
  (called factors).
• Exploratory
   – Data reduction
   – See how many latent variables there are
• Confirmatory
   – Specifying how variables relate and testing models

• Underlying assumption
   – That these latent variables exist and influence the
      responses on the observed variables

               (often used in similar situations to PCA)
    Types of Exploratory Latent Variable Analysis

                                      Manifest Variable
Latent Variable            Metrical             Categorical
Metrical                   Factor Analysis      Latent Trait Analysis
(continuous, interval)                          (item response model)
Categorical                Latent Profile       Latent Class
                           Analysis             Analysis
                           (Taxometrics)

                                          Bartholomew & Knott (1999)
Spearman’s g paper
• Asked and provided an answer for what is still one of the big
  debates among the public about psychology.
• Provided a methodology to investigate which has grown in
  importance.
• Framed psychology in a way that is still influential.
“Most of those hostile to Experimental Psychology are in the habit of
  reproaching its methods with insignificance, and even with triviality ….
  they protest that such means can never shed any real light upon the
  human soul, unlock the eternal antinomy of Free Will, or reveal the inward
  nature of Time and Space” (1904, p. 203).
“The present article, therefore, advocates a ‘Correlational Psychology’ ”
  (1904, p. 205)
                      g theories

[Figure: g drawn above the observed variables Pitch, Light, Weight, Classics, French, English and Maths]

Is g real, or just the commonalities of the observed variables?
   One Manifest Variable and One Latent Variable

x1: Classics

[Figure: latent variable y1 and error e1 each point to manifest variable x1]

  What is y1?

  Each manifest variable is a (simple) linear regression on the latent
  variable.
Thurstone’s The Vectors of Mind (1934)
from Multiple Factor Analysis (1947)
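 Four Manifest Variables and Two Latent Variables

x1:    Classics
x2:    Psychology
x3:    Physics
x4:    Mathematics

[Figure: errors e1 to e4 point to manifest variables x1 to x4; latent variable y1 points to x1 and x2, latent variable y2 points to x2, x3 and x4]

      What are y1 and y2?

      Manifest variables are a linear combination of latent variables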
     Changing picture into an equation

x_{1i} = a_{11} y_{1i}                 + e_{1i}
x_{2i} = a_{21} y_{1i} + a_{22} y_{2i} + e_{2i}
x_{3i} =                 a_{32} y_{2i} + e_{3i}
x_{4i} =                 a_{42} y_{2i} + e_{4i}

  a’s are loadings or weights for the arrows.

Each equation is a regression where latent variables predict a manifest variable.

[Figure: path diagram with arrows a11 and a21 from y1, arrows a22, a32 and a42 from y2, and errors e1 to e4 pointing to x1 to x4]
     X = F α^T + E



From last week on PCA:
       PC = Xα
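
A small R sketch of PC = Xα, assuming simulated data (the four variables and their correlation are invented). It uses `prcomp` from base R, whose `rotation` element plays the role of α here, and checks that the component scores are the centred data times the weights.

```r
set.seed(2)
X <- matrix(rnorm(100 * 4), ncol = 4)
X[, 2] <- X[, 1] + 0.5 * X[, 2]                # induce some correlation

Xc <- scale(X, center = TRUE, scale = FALSE)   # column-centred data

pca   <- prcomp(X)        # centres by default, does not rescale
alpha <- pca$rotation     # each column is one weight vector
PC    <- Xc %*% alpha     # PC = X alpha

# Largest difference between the hand-computed and prcomp scores
max_diff <- max(abs(PC - pca$x))
```

Note the direction: in PCA the components are built from the observed variables, with no error term.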
 x1i = α21 SocAvoid i +                  e1 i
 x2i = α21 SocAvoid i +                  e2 i
 x3i = α31 SocAvoid i +                  e3 i
 x4i = α41 SocAvoid i + α42 Fear i   +   e4 i
 x5i = α51 SocAvoid i + α52 Fear i   +   e5 i
 x6i =                  α62 Fear i   +   e6 i
 x7i =                  α72 Fear i   +   e7 i
 x8i =                  α82 Fear i   +   e8 i
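
The eight equations above can be sketched in R by simulating data from X = Fα^T + E and fitting a two-factor model with `factanal`. The loading values, sample size and seed are assumptions chosen for illustration; only the pattern of blanks (zeros) follows the equations.

```r
set.seed(3)
n <- 2000

# Hypothetical loading matrix (8 variables x 2 factors);
# the zeros match the blanks in the equations above
alpha <- rbind(c(.8, 0), c(.7, 0), c(.6, 0),
               c(.5, .4), c(.4, .5),
               c(0, .6), c(0, .7), c(0, .8))
colnames(alpha) <- c("SocAvoid", "Fear")

Fscores <- matrix(rnorm(n * 2), ncol = 2)     # uncorrelated latent variables
uniq    <- 1 - rowSums(alpha^2)               # error (unique) variances
E       <- sapply(uniq, function(u) rnorm(n, sd = sqrt(u)))

X <- Fscores %*% t(alpha) + E                 # X = F alpha^T + E
colnames(X) <- paste0("x", 1:8)

# Maximum likelihood factor analysis with two factors
fit <- factanal(X, factors = 2, rotation = "varimax")
est <- unclass(fit$loadings)                  # estimated 8 x 2 loadings

# Communalities (row sums of squared loadings) do not depend on rotation
comm_true <- rowSums(alpha^2)
comm_est  <- rowSums(est^2)
```

Printing `fit$loadings` shows how close the estimated pattern is to the one used to generate the data; the factor order and signs may differ.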



Decide the number of factors
1. scree (often with PCA)
2. hypothesis test (and information criteria)
3. compare the model correlations with the observed correlations
4. what makes sense
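
A rough R sketch of the first three criteria, assuming simulated data with two underlying factors (loadings, sample size and seed are made up). The fourth criterion, what makes sense, cannot be coded.

```r
set.seed(4)
n <- 500

# Simulated data: six variables, two factors (illustrative only)
A <- rbind(c(.8, 0), c(.7, 0), c(.6, 0), c(0, .6), c(0, .7), c(0, .8))
X <- matrix(rnorm(n * 2), ncol = 2) %*% t(A) +
     sapply(1 - rowSums(A^2), function(u) rnorm(n, sd = sqrt(u)))
R <- cor(X)

# 1. Scree: eigenvalues of the correlation matrix, as in PCA
eig <- eigen(R)$values
# plot(eig, type = "b", xlab = "Component", ylab = "Eigenvalue")
n_kaiser <- sum(eig > 1)      # Kaiser's eigenvalue-greater-than-one rule

# 2. Hypothesis test: factanal reports a likelihood-ratio test of
#    "k factors are sufficient"; a small p-value says k is too few
p_values <- unname(sapply(1:2, function(k) factanal(X, factors = k)$PVAL))

# 3. Model correlations versus observed correlations
fit2    <- factanal(X, factors = 2)
L       <- unclass(fit2$loadings)
R_model <- L %*% t(L) + diag(fit2$uniquenesses)
max_resid <- max(abs(R - R_model))
```

With only six variables `factanal` has no degrees of freedom left to test three factors, so the loop stops at two.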
Let's see how this might work,
           but first ...
                            Rotation


• The aim of EFA is to help the researcher understand the
  relationships among variables.
• The 2 dimensional map IS the solution (if there are 2 factors).
   – Sometimes the solution is easier to conceptualize if the solution
      is rotated. What does this mean?
• Return to academic achievement example.
Suppose six academic topics: top ones Classics, History, and
Drama, bottom ones Maths, Physics, and Computer Science

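[Figure: the six topics plotted on two factor axes; horizontal axis from 0.0 to 1.0, vertical axis from -.6 to .8]

        Unrotated Solution

[Figure: same axes. Top group of topics = fac1 + fac2 + error; bottom group of topics = fac1 - fac2 + error]

        Rotated Solution

[Figure: same axes after rotation. Upper group of topics = factor 2 + error; lower group of topics = factor 1 + error]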
                         Aim of Rotation

• Easier to interpret the factor structure (and therefore use the factors
  in subsequent analyses).
• Is it easier to name the factors?
• The proof is in the eating.

• Is it cheating? Usually not.
                     Types of Rotation


• Non-orthogonal (correlated factors)
   – Oblimin (delta allows for correlation)
   – Promax
• Orthogonal (uncorrelated factors) (perpendicular)
   – Varimax (simplifies factor interpretation)
       • minimizes number of variables loading on factors
   – Quartimax (simplifies variable interpretation)
       • minimizes number of factors needed for each variable
   – Equamax (combines previous two)

• And many others
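
A hedged R sketch of rotation, assuming simulated scores on six topics (the data are invented; only the topic names come from the example above). It rotates the unrotated loadings by hand through an arbitrary 30˚ and then with base R's `varimax` (orthogonal) and `promax` (non-orthogonal). Oblimin, Quartimax and Equamax are not in base R and are not shown.

```r
set.seed(5)
n <- 1000

# Simulated data: three arts-like and three science-like topics
A <- rbind(c(.8, 0), c(.7, 0), c(.6, 0), c(0, .6), c(0, .7), c(0, .8))
X <- matrix(rnorm(n * 2), ncol = 2) %*% t(A) +
     sapply(1 - rowSums(A^2), function(u) rnorm(n, sd = sqrt(u)))
colnames(X) <- c("Classics", "History", "Drama", "Maths", "Physics", "CompSci")

# Unrotated two-factor solution
unrot <- unclass(factanal(X, factors = 2, rotation = "none")$loadings)

# Rotate by hand through an arbitrary angle (30 degrees is a guess)
theta <- 30 * pi / 180
Tmat  <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
by_hand <- unrot %*% Tmat

# Let the computer choose the rotation
vmax <- varimax(unrot)    # orthogonal
pmax <- promax(unrot)     # non-orthogonal (correlated factors)

# Orthogonal rotation leaves each variable's communality unchanged
comm_unrot <- rowSums(unrot^2)
comm_hand  <- rowSums(by_hand^2)
comm_vmax  <- rowSums(unclass(vmax$loadings)^2)
```

Rotation changes which loadings are large or small, not how well the factors reproduce the correlations.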
Complex: 14 arrows
30˚ was a guess.
Computer has lots of methods
    – orthogonal versus non-orthogonal
varimax most common
Why do this?
Only 8 arrows
        Example: Drugs in California (in Everitt)


• 1634 students in 7th-9th grade in 11 schools in Los Angeles
  (students within schools, so is independence plausible? There are
  multilevel factor analysis methods.)
• Asked about different types of drug use on 1 (never tried) to 5
  (used regularly) scale

      cigarettes       beer             wine
      spirits          cocaine          tranquillizers
      drug-store medication             opiates
      marijuana        hashish          inhalants
      hallucinogens    amphetamines
• Look at the distributions and scatter plots.
   – Should be roughly Normally distributed.
• Then, look at the bivariate correlations
• Then, look at the “among all variables” correlation:
                              Correlation Matrix

              CIGS     BEER     WINE  SPIRITS     COKE    TRANQ FROMSHOP  OPIATES MARIJUAN     HASH     GLUE HALLUCIN    SPEED
CIGS         1.000     .447     .422     .436     .114     .203     .091     .082     .513     .304     .245     .101     .245
BEER          .447    1.000     .619     .604     .068     .146     .103     .063     .445     .318     .203     .088     .199
WINE          .422     .619    1.000     .583     .053     .139     .110     .066     .365     .240     .183     .074    -.184
SPIRITS       .436     .604     .583    1.000     .115     .258     .122     .097     .482     .368     .255     .139     .293
COKE          .114     .068     .053     .115    1.000     .349     .209     .321     .186     .303     .272     .279     .278
TRANQ         .203     .146     .139     .258     .349    1.000     .221     .355     .316     .377     .323     .367     .545
FROMSHOP      .091     .103     .110     .122     .209     .221    1.000     .201     .150     .163     .310     .232     .232
OPIATES       .082     .063     .066     .097     .321     .355     .201    1.000     .154     .219     .288     .320     .314
MARIJUAN      .513     .445     .365     .482     .186     .316     .150     .154    1.000     .530     .301     .204     .394
HASH          .304     .318     .240     .368     .303     .377     .163     .219     .530    1.000     .302     .368     .467
GLUE          .245     .203     .183     .255     .272     .323     .310     .288     .301     .302    1.000     .340     .392
HALLUCIN      .101     .088     .074     .139     .279     .367     .232     .320     .204     .368     .340    1.000     .511
SPEED         .245     .199    -.184     .293     .278     .545     .232     .314     .394     .467     .392     .511    1.000

1. Look at histograms then scatter plots.
2. Then correlation matrix. Wait 3 days.
Write on your output.
Take it for coffee (and dinner?).
                              Output

• In descriptives, can get bivariate correlations printed and an
  overall measure of association
   – Kaiser-Meyer-Olkin measure = .76
       (Greater than .5 considered okay)
   – Bartlett’s test of Sphericity is very significant

• Extraction describes the method of analysis (principal
  components or several factor analysis methods, of which
  maximum likelihood is most used). Will speak about these later.
• Also, tick SCREE plot.
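
For those working in R rather than SPSS, a sketch of the two descriptive measures computed by hand from a correlation matrix. The data are simulated (one-factor loadings, n and seed are assumptions), so the numbers will not match the .76 reported for the drugs example. The formulas are the standard ones: KMO compares squared correlations with squared partial correlations, and Bartlett's statistic is based on the log determinant of the correlation matrix.

```r
set.seed(6)
n <- 300
A <- c(.8, .7, .6, .5, .6, .7)                # hypothetical one-factor loadings
X <- outer(rnorm(n), A) +
     sapply(1 - A^2, function(u) rnorm(n, sd = sqrt(u)))
R <- cor(X)
p <- ncol(R)

# Kaiser-Meyer-Olkin measure
Rinv <- solve(R)
P    <- -Rinv / sqrt(outer(diag(Rinv), diag(Rinv)))   # partial correlations
off  <- row(R) != col(R)                              # off-diagonal cells
kmo  <- sum(R[off]^2) / (sum(R[off]^2) + sum(P[off]^2))

# Bartlett's test of sphericity: H0 is that R is an identity matrix
chi2    <- -((n - 1) - (2 * p + 5) / 6) * log(det(R))
df      <- p * (p - 1) / 2
p_value <- pchisq(chi2, df, lower.tail = FALSE)
```

A KMO near 1 means the partial correlations are small relative to the correlations, which is what a factor model would produce.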
A great metaphor: Screes often made from PCA solution
                          Cattell versus Kaiser
                                  Does the solution make sense
                                  and is the model correlation
                                  close to the observed?




“Routinely our own laboratory has used some two of the scree, the Kaiser,
[lists others]. Most often we have used the two first ...” (Cattell, 1998, p. 164)
   Unrotated Factor Matrix (in options often worth suppressing
   values < .25)




Does factor solution make sense? Important to look at, explore.
Rotated Solution: Drugs
            What can be done with factors?


• Often people save “factor scores” and use these in subsequent
  analyses.
   – Can be easier if factors are uncorrelated because of
     collinearity problems.
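
A sketch in R of saving factor scores and using them in a later analysis, assuming a simulated 12-item questionnaire driven by 3 latent variables and a made-up response variable. `factanal` returns scores when `scores = "regression"` is requested; the item names, loadings and regression weights are all hypothetical.

```r
set.seed(7)
n <- 500

# Hypothetical 12-item questionnaire, 4 items per latent variable
A <- matrix(0, 12, 3)
A[1:4, 1]  <- .7
A[5:8, 2]  <- .7
A[9:12, 3] <- .7
Fs    <- matrix(rnorm(n * 3), ncol = 3)
items <- Fs %*% t(A) + matrix(rnorm(n * 12, sd = sqrt(1 - .49)), ncol = 12)
colnames(items) <- paste0("q", 1:12)

# Made-up response variable that depends on two of the latent variables
response <- 0.5 * Fs[, 1] - 0.3 * Fs[, 2] + rnorm(n)

# Fit the factor model and save the factor scores
fit <- factanal(items, factors = 3, rotation = "varimax",
                scores = "regression")
dat <- as.data.frame(fit$scores)      # columns Factor1, Factor2, Factor3
dat$response <- response

# Use the scores as predictors in a subsequent analysis
model <- lm(response ~ Factor1 + Factor2 + Factor3, data = dat)
r2 <- summary(model)$r.squared

# With an orthogonal rotation the saved scores are only weakly correlated
score_cors <- cor(fit$scores)
```

The estimated factors may come out in a different order from the ones used to simulate the data, so the coefficients should be read with the loadings beside them.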
[Figure: 12 item questionnaire -> 3 latent variables -> response variable]
                          Last Week
             Principal Component Analysis (PCA)


• An alternative to EFA
• A data reduction technique
   – Most textbooks treat it very differently from EFA (e.g.,
     Bartholomew et al.)
   – Some do not differentiate much (e.g., Field). SPSS treats
     PCA as just one method of solving EFA. Others, like
     SYSTAT and R, differentiate more.
• EFA leads onto confirmatory factor analysis (CFA) and structural
  equation modelling (SEM). Model based.
• EFA kind of leads onto item response modelling (IRM or IRT)

• PCA leads onto correspondence analysis (CA). Data reduction.
  Used in face and voice recognition systems.
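            Differences between PCA and EFA
               The Mathematical Differences

[Figure: two path diagrams side by side. EFA: latent variables y1 and y2 and errors e1 to e4 point to x1 to x4. PCA: x1 to x4 and y1 and y2, with no error terms]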
                   Or in Equations

EFA

x_{1i} = a_{11} y_{1i}                 + e_{1i}
x_{2i} = a_{21} y_{1i} + a_{22} y_{2i} + e_{2i}
x_{3i} =                 a_{32} y_{2i} + e_{3i}
x_{4i} =                 a_{42} y_{2i} + e_{4i}

PCA

y_{1i} = a_{11} x_{1i} + a_{21} x_{2i}
y_{2i} =                 a_{22} x_{2i} + a_{32} x_{3i} + a_{42} x_{4i}
                   Pragmatic Approach


• Some methodologists argue for hours about whether EFA or
  PCA should be used.
• I was taught to use whichever works best.
• Both have advanced science (which does not mean they are good)

• Various extensions.
   – For example, ordinal PCA is simpler than ordinal EFA, and
     structural equation modelling requires factors
              Form of the Latent Variable
Meehl and others argue that many psychological constructs are
taxonomic




               There are gophers and there are
           chipmunks, but there are no gophmunks.
                            Journal

Use the data at http://www.fiu.edu/~dwright/qm4psych/desc.dat
and run a factor analysis. Use SPSS or R.

In R do:
desdata <-
  read.table("http://www.fiu.edu/~dwright/qm4psych/desc.dat",
   header=TRUE)

If using SPSS, use syntax (see Field's book or
http://www.ats.ucla.edu/stat/SPSS/modules/input.htm).
                        Summary


• In science, theoretical constructs are often unobservable
  things.
• Even when things are observable, measurement error
  often means there is a need to calculate “summary”
  variables.
• EFA can be used when you have multiple observed
  variables and want to reduce them to a smaller set.
  (Though PCA is designed for this).
• This set can then be used in further analyses.

								