Posted on 2/10/2012. Public Domain.
Exploratory Factor Analysis

Measurement
First term:
  – Used several indicators to measure a single true value
  – Talked about summing responses (and Cronbach’s α)
  – Hope is that the errors balance themselves out
Now:
  – More sophisticated techniques
  – Multiple latent variables
  – Build more complex models
But: This does not mean that summing is wrong. It assumes a certain model, which may be accurate.

One Observed Variable
[Diagram: True Value → Response ← Error]
Response = True Value + Error, that is, Data = Model + Residual

Exploratory Factor Analysis (EFA)
• Reducing a large number of manifest (observed) variables to a smaller number of hypothesized latent (unobserved) variables (called factors).
• Exploratory
  – Data reduction
  – See how many latent variables there are
• Confirmatory
  – Specifying how variables relate and testing models
• Underlying assumption
  – That these latent variables exist and influence the responses on the observed variables (often used in similar situations to PCA)

Types of Exploratory Latent Variable Analysis

                                   Manifest variable
Latent variable                    Metrical                  Categorical
Metrical (continuous, interval)    Factor Analysis           Latent Trait Analysis (item response model)
Categorical                        Latent Profile Analysis   Latent Class Analysis (Taxometrics)

Bartholomew & Knott (1999)

Spearman’s g paper
• Asked and provided an answer for what is still one of the big debates among the public about psychology.
• Provided a methodology to investigate which has grown in importance.
• Framed psychology in a way that is still influential.
“Most of those hostile to Experimental Psychology are in the habit of reproaching its methods with insignificance, and even with triviality …. they protest that such means can never shed any real light upon the human soul, unlock the eternal antinomy of Free Will, or reveal the inward nature of Time and Space” (1904, p. 203).
“The present article, therefore, advocates a ‘Correlational Psychology’ ” (1904, p. 205)

g theories
[Diagram: g with arrows to Pitch, Light, Weight, Classics, French, English, Maths]
Is g real, or just the commonalities of the observed variables?

One Manifest Variable and One Latent Variable
x1: Classics
[Diagram: y1 → x1 ← e1]
What is y1?
Manifest variables are a (simple) linear regression of latent variables

Thurstone’s The Vectors of Mind (1934)
[Figure from Multiple Factor Analysis (1947)]

Four Manifest Variables and Two Latent Variables
x1: Classics, x2: Psychology, x3: Physics, x4: Mathematics
[Diagram: y1 → x1, x2; y2 → x2, x3, x4; errors e1 to e4 point to x1 to x4]
What are y1 and y2?
Manifest variables are a linear combination of latent variables

Changing picture into an equation
\[
\begin{aligned}
x_{1i} &= a_{11} y_{1i} + e_{1i} \\
x_{2i} &= a_{21} y_{1i} + a_{22} y_{2i} + e_{2i} \\
x_{3i} &= a_{32} y_{2i} + e_{3i} \\
x_{4i} &= a_{42} y_{2i} + e_{4i}
\end{aligned}
\]
The a’s are loadings or weights for the arrows (a11, a21, a22, a32, a42 in the diagram).
Each equation is a regression where latent variables predict a manifest variable.

In matrix form: \( X = F\alpha^{T} + E \)
From last week on PCA: \( PC = X\alpha \)

\[
\begin{aligned}
x_{1i} &= \alpha_{11}\,\mathrm{SocAvoid}_i + e_{1i} \\
x_{2i} &= \alpha_{21}\,\mathrm{SocAvoid}_i + e_{2i} \\
x_{3i} &= \alpha_{31}\,\mathrm{SocAvoid}_i + e_{3i} \\
x_{4i} &= \alpha_{41}\,\mathrm{SocAvoid}_i + \alpha_{42}\,\mathrm{Fear}_i + e_{4i} \\
x_{5i} &= \alpha_{51}\,\mathrm{SocAvoid}_i + \alpha_{52}\,\mathrm{Fear}_i + e_{5i} \\
x_{6i} &= \alpha_{62}\,\mathrm{Fear}_i + e_{6i} \\
x_{7i} &= \alpha_{72}\,\mathrm{Fear}_i + e_{7i} \\
x_{8i} &= \alpha_{82}\,\mathrm{Fear}_i + e_{8i}
\end{aligned}
\]

Decide the number of factors
1. scree (often with PCA)
2. hypothesis test (and information criteria)
3. compare the model-implied correlations with the observed correlations
4. what makes sense
Let's see how this might work, but first ...

Rotation
• The aim of EFA is to help the researcher understand the relationships among variables.
• The 2 dimensional map IS the solution (if there are 2 factors).
  – Sometimes the solution is easier to conceptualize if the solution is rotated. What does this mean?
• Return to academic achievement example.
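
As a sketch of what rotation means algebraically, using the notation X = FαT + E from the slides above: the rotation matrix T and the angle θ are symbols introduced here for illustration and do not appear in the original slides. Any orthogonal T leaves the fitted part of the model unchanged, so rotating changes the loadings (and how easy they are to name) but not how well the model fits.

```latex
\[
X = F\alpha^{T} + E = (FT)(\alpha T)^{T} + E
\quad \text{whenever } TT^{T} = I ,
\]
\[
T = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\quad \text{(two factors, rotation by angle } \theta\text{)} .
\]
% Illustration with \theta = 45^\circ: a variable with loadings (a, a) and
% one with loadings (a, -a) become single-factor variables after rotation.
\[
(a,\ a)\,T = (\sqrt{2}\,a,\ 0), \qquad (a,\ -a)\,T = (0,\ -\sqrt{2}\,a).
\]
```

In this illustration each variable loads on only one rotated factor. The sign of a factor is arbitrary, so the negative loading can be flipped by reversing the direction of that factor.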
Suppose six academic topics: top ones Classics, History, and Drama; bottom ones Maths, Physics, and Computer Science.
[Figure: the six topics plotted on two factor axes; horizontal axis 0.0 to 1.0, vertical axis -.6 to .8]

Unrotated Solution
[Figure: same plot. Top topics = fac1 + fac2 + error; bottom topics = fac1 - fac2 + error]

Rotated Solution
[Figure: same plot with rotated axes. Top topics = factor 2 + error; bottom topics = factor 1 + error]

Aim of Rotation
• Easier to interpret the factor structure (and therefore use the factors in subsequent analyses).
• Is it easier to name the factors?
• The proof is in the eating.
• Is it cheating? Usually not.

Types of Rotation
• Non-orthogonal (correlated factors)
  – Oblimin (delta allows for correlation)
  – Promax
• Orthogonal (uncorrelated factors) (perpendicular)
  – Varimax (simplifies factor interpretation)
    • minimizes number of variables loading on factors
  – Quartimax (simplifies variable interpretation)
    • minimizes number of factors needed for each variable
  – Equamax (combines previous two)
• And many others

[Diagram: unrotated solution. Complex: 14 arrows]
30˚ was a guess. Computer has lots of methods
  – orthogonal versus non-orthogonal
  – varimax most common
Why do this?
[Diagram: rotated solution. Only 8 arrows]

Example: Drugs in California (in Everitt)
• 1634 students in 7th-9th grade in 11 schools in Los Angeles (independence! Students are clustered within schools; there are multilevel factor analysis methods.)
• Asked about different types of drug use on a scale from 1 (never tried) to 5 (used regularly): cigarettes, beer, wine, spirits, cocaine, tranquillizers, drug-store medication, opiates, marijuana, hashish, inhalants, hallucinogens, amphetamines
• Look at the distributions and scatter plots.
  – Should be roughly Normally distributed.
• Then, look at the bivariate correlations
• Then, look at the “among all variables” correlation:

Correlation Matrix

              CIGS     BEER     WINE  SPIRITS     COKE    TRANQ FROMSHOP  OPIATES MARIJUAN     HASH     GLUE HALLUCIN    SPEED
CIGS         1.000     .447     .422     .436     .114     .203     .091     .082     .513     .304     .245     .101     .245
BEER          .447    1.000     .619     .604     .068     .146     .103     .063     .445     .318     .203     .088     .199
WINE          .422     .619    1.000     .583     .053     .139     .110     .066     .365     .240     .183     .074    -.184
SPIRITS       .436     .604     .583    1.000     .115     .258     .122     .097     .482     .368     .255     .139     .293
COKE          .114     .068     .053     .115    1.000     .349     .209     .321     .186     .303     .272     .279     .278
TRANQ         .203     .146     .139     .258     .349    1.000     .221     .355     .316     .377     .323     .367     .545
FROMSHOP      .091     .103     .110     .122     .209     .221    1.000     .201     .150     .163     .310     .232     .232
OPIATES       .082     .063     .066     .097     .321     .355     .201    1.000     .154     .219     .288     .320     .314
MARIJUAN      .513     .445     .365     .482     .186     .316     .150     .154    1.000     .530     .301     .204     .394
HASH          .304     .318     .240     .368     .303     .377     .163     .219     .530    1.000     .302     .368     .467
GLUE          .245     .203     .183     .255     .272     .323     .310     .288     .301     .302    1.000     .340     .392
HALLUCIN      .101     .088     .074     .139     .279     .367     .232     .320     .204     .368     .340    1.000     .511
SPEED         .245     .199    -.184     .293     .278     .545     .232     .314     .394     .467     .392     .511    1.000

1. Look at histograms then scatter plots
2. Then correlation matrix. Wait 3 days.
Write on your output.
Take it for coffee (and dinner?).

Output
• In descriptives, can get bivariate correlations printed and an overall measure of association
  – Kaiser-Meyer-Olkin measure = .76 (Greater than .5 considered okay)
  – Bartlett’s test of Sphericity is very significant
• Extraction describes the method of analysis (principal components or several factor analysis methods, of which maximum likelihood is most used). Will speak about these later.
• Also, tick SCREE plot.
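
The checks on the Output slide can be sketched in R directly from the printed correlation matrix. This is a minimal sketch under stated assumptions, not the SPSS procedure itself: the object names (`lower`, `R`, `ev`, `kmo`, `chi2`) are made up for illustration, the values are copied as printed above (including the negative WINE and SPEED entry), and the KMO and Bartlett statistics use their standard textbook formulas, so the results need not match the SPSS output exactly.

```r
# Variable names as printed in the correlation matrix above.
vars <- c("CIGS", "BEER", "WINE", "SPIRITS", "COKE", "TRANQ", "FROMSHOP",
          "OPIATES", "MARIJUAN", "HASH", "GLUE", "HALLUCIN", "SPEED")
p <- length(vars)
n <- 1634  # number of students reported on the slide

# Lower triangle (with diagonal), row by row, copied from the matrix above.
lower <- c(
  1.000,
  .447, 1.000,
  .422, .619, 1.000,
  .436, .604, .583, 1.000,
  .114, .068, .053, .115, 1.000,
  .203, .146, .139, .258, .349, 1.000,
  .091, .103, .110, .122, .209, .221, 1.000,
  .082, .063, .066, .097, .321, .355, .201, 1.000,
  .513, .445, .365, .482, .186, .316, .150, .154, 1.000,
  .304, .318, .240, .368, .303, .377, .163, .219, .530, 1.000,
  .245, .203, .183, .255, .272, .323, .310, .288, .301, .302, 1.000,
  .101, .088, .074, .139, .279, .367, .232, .320, .204, .368, .340, 1.000,
  .245, .199, -.184, .293, .278, .545, .232, .314, .394, .467, .392, .511, 1.000
)

# upper.tri() fills column by column, so column j receives row j of the
# lower triangle; adding the transpose then makes the matrix symmetric.
R <- matrix(0, p, p, dimnames = list(vars, vars))
R[upper.tri(R, diag = TRUE)] <- lower
R <- R + t(R) - diag(diag(R))

# Eigenvalues for the scree plot and Kaiser's "eigenvalue > 1" rule.
ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
n_kaiser <- sum(ev > 1)
if (interactive()) {
  plot(ev, type = "b", xlab = "Component", ylab = "Eigenvalue", main = "Scree plot")
  abline(h = 1, lty = 2)
}

# Kaiser-Meyer-Olkin measure: squared correlations relative to squared
# correlations plus squared partial correlations (off-diagonal only).
Rinv <- solve(R)
P <- -Rinv / sqrt(outer(diag(Rinv), diag(Rinv)))
off <- row(R) != col(R)
kmo <- sum(R[off]^2) / (sum(R[off]^2) + sum(P[off]^2))

# Bartlett's test of sphericity (H0: R is an identity matrix).
chi2 <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
df <- p * (p - 1) / 2
p_value <- pchisq(chi2, df, lower.tail = FALSE)

print(round(ev, 3))
print(c(kaiser_factors = n_kaiser, KMO = kmo, chi2 = chi2, df = df, p = p_value))
```

The scree plot and Kaiser's rule will not always agree on the number of factors, which is why the slides suggest also checking whether the solution makes sense.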
A great metaphor:
Scree plots are often made from the PCA solution

Cattell versus Kaiser
Does the solution make sense, and is the model correlation close to the observed?
“Routinely our own laboratory has used some two of the scree, the Kaiser, [lists others]. Most often we have used the two first ...” (Cattell, 1998, p. 164)

Unrotated Factor Matrix
(in options often worth suppressing values < .25)
Does factor solution make sense? Important to look at, explore.

Rotated Solution: Drugs

What can be done with factors?
• Often people save “factor scores” and use these in subsequent analyses.
  – Can be easier if factors are uncorrelated because of collinearity problems.
[Diagram: 12 item questionnaire → 3 latent variables → response variable]

Last Week: Principal Component Analysis (PCA)
• An alternative to EFA
• A data reduction technique
  – Most textbooks treat it very differently from EFA (e.g., Bartholomew et al.)
  – Some do not differentiate much (e.g., Field). SPSS treats PCA just as a method of solving EFA. Others, like SYSTAT and R, differentiate more.
• EFA leads onto confirmatory factor analysis (CFA) and structural equation modelling (SEM). Model based.
• EFA kind of leads onto item response modelling (IRM or IRT)
• PCA leads onto correspondence analysis (CA). Data reduction. Used in face and voice recognition systems.

Differences between PCA and EFA

The Mathematical Differences
[Diagram, EFA: latent y1, y2 → manifest x1 to x4, each with an error term e1 to e4]
[Diagram, PCA: manifest x1 to x4 → components y1, y2, with no error terms]

Or in Equations
EFA:
\[
\begin{aligned}
x_{1i} &= a_{11} y_{1i} + e_{1i} \\
x_{2i} &= a_{21} y_{1i} + a_{22} y_{2i} + e_{2i} \\
x_{3i} &= a_{32} y_{2i} + e_{3i} \\
x_{4i} &= a_{42} y_{2i} + e_{4i}
\end{aligned}
\]
PCA:
\[
\begin{aligned}
y_{1i} &= a_{11} x_{1i} + a_{21} x_{2i} \\
y_{2i} &= a_{22} x_{2i} + a_{32} x_{3i} + a_{42} x_{4i}
\end{aligned}
\]

Pragmatic Approach
• Some methodologists argue for hours about whether EFA or PCA should be used.
• I was taught to use whichever works best.
• Both have advanced science (does not mean good)
• Various extensions.
  – For example, ordinal PCA simpler than ordinal EFA, and structural equation modelling requires factors

Form of the Latent Variable
Meehl and others argue that many psychological constructs are taxonomic.
There are gophers and there are chipmunks, but there are no gophmunks.

Journal
Use data on http://www.fiu.edu/~dwright/qm4psych/desc.dat and run a factor analysis. Use SPSS or R. In R do:
    desdata <- read.table("http://www.fiu.edu/~dwright/qm4psych/desc.dat", header=TRUE)
If using SPSS, use syntax (see Field's book or http://www.ats.ucla.edu/stat/SPSS/modules/input.htm).

Summary
• In science, theoretical constructs are often unobservable things.
• Even when things are observable, measurement error means often there is a need to calculate “summary” variables.
• EFA can be used when you have multiple observed variables and want to reduce them to a smaller set. (Though PCA is designed for this).
• This set can then be used in further analyses.
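
The summary points can be illustrated with a small simulation in R. This is a sketch under stated assumptions rather than an analysis of the journal data: the data are simulated, and every name and number in it (`y1`, `y2`, the loadings, the sample size, `outcome`) is made up for illustration. It fits a two-factor EFA by maximum likelihood with varimax rotation, saves factor scores, runs a PCA on the same variables for comparison, and uses the factor scores in a later regression.

```r
set.seed(1)
n <- 500

# Two simulated latent variables (hypothetical, for illustration only).
y1 <- rnorm(n)
y2 <- rnorm(n)
noise <- function() rnorm(n, sd = 0.6)

# Six manifest variables: x1-x3 depend on y1, x4-x6 depend on y2.
x <- data.frame(
  x1 = 0.8 * y1 + noise(), x2 = 0.7 * y1 + noise(), x3 = 0.6 * y1 + noise(),
  x4 = 0.8 * y2 + noise(), x5 = 0.7 * y2 + noise(), x6 = 0.6 * y2 + noise()
)

# EFA: maximum likelihood extraction, varimax rotation, regression factor scores.
efa <- factanal(x, factors = 2, rotation = "varimax", scores = "regression")
print(efa$loadings, cutoff = 0.25)  # suppress small loadings, as on the slides

# PCA on the same (standardised) variables for comparison.
pca <- prcomp(x, scale. = TRUE)
print(summary(pca))

# Using the saved factor scores in a subsequent analysis.
outcome <- 0.5 * y1 - 0.3 * y2 + rnorm(n)
fit <- lm(outcome ~ efa$scores)
print(coef(fit))
```

For the journal exercise, the same `factanal()` call could be applied to `desdata` once it has been read in, with the number of factors chosen using the criteria discussed earlier.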