Path Analysis and Structural Equation Modeling: Part I: Path Analysis

David L. Streiner, Ph.D.
Professor, Dep’t of Psychiatry, U of T
Professor, Dep’t of Psychiatry & Behavioural Neurosciences, McMaster University
Senior Editor, Health Reports

A Bit of Philosophy of Science

                        Experimental     Correlational
Control of variables    Yes              No
Subject assignment      Possible         No
Typical design          RCT              Cross-sectional
Statistics              ANOVA            Correlation
Homogeneity             High             Low
Search for ...          Group effects    Relationships
Show causation          Yes              No

The Problems:
– Science does not require control.
– But we cannot draw causation from correlation.
– Can we make any causal statements from non-experimental studies?
– One attempt was path analysis.
– It doesn’t establish causation, but it remains a powerful tool.

The Path to Path Analysis:
Step 1 - Bivariate correlation
– Limited to two variables
– No distinction between DV and IV
Step 2 - Multiple correlation
– Distinguishes between DV and IVs
– Unlimited number of IVs
But:
– Assumes IVs measured without error
– Variable must be either DV or IV (DV in one step can’t be IV in next)
Step 3 - Path analysis
– Can have many DVs
– DV at one step can be IV in next

An Example:
Well-being is a function of:
– Symptoms
– Wealth
– Intelligence

In Regression Terms:
Y_Well-Being = b0 + b1 Symptoms + b2 Wealth + b3 IQ + e

In Path Analysis Terms:
[Figure: path diagram of Symptoms, Wealth, IQ, Well-Being and the error term e]

But Maybe...
[Figure: alternative path diagram of the same variables]

Correlation Matrix:

              Well-Being   Symptoms   Wealth    IQ
Well-Being    1.000        -.608      .807      .677
Symptoms                   1.000      -.505     -.433
Wealth                                1.000     .678
IQ                                              1.000

Adding Correlations:
[Figure: path diagram with the correlations among Symptoms, Wealth and IQ added; values shown: -.433, .677, -.505, .678, (.199); β weights in parentheses]

Relationship Between r and β:
– Correlation between Well-Being and Symptoms is -0.608
– β weight between Well-Being and Symptoms is -0.245
– Is there a relationship between these parameters?
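A minimal sketch of where a β weight such as -0.245 can come from, assuming the β weights are the usual standardized regression weights, i.e. the solution of R_xx β = r_xy built from the correlation matrix on the slide. The helper `solve_linear` and the variable names are hypothetical, introduced only for this illustration.

```python
# Correlations among the IVs, copied from the slide's matrix
# (order: Symptoms, Wealth, IQ).
R_XX = [
    [1.000, -0.505, -0.433],
    [-0.505, 1.000, 0.678],
    [-0.433, 0.678, 1.000],
]
# Correlation of each IV with Well-Being, same order.
R_XY = [-0.608, 0.807, 0.677]


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    Hypothetical helper for this sketch; assumes a is square and non-singular.
    """
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


betas = solve_linear(R_XX, R_XY)
for name, r, beta in zip(["Symptoms", "Wealth", "IQ"], R_XY, betas):
    print(f"{name:9s} r = {r:+.3f}   beta = {beta:+.3f}")
```

Under these assumptions the Symptoms weight comes out close to the slide's -0.245, noticeably smaller in magnitude than its correlation of -0.608.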
Relationship Between r and β:
Using Symptoms and Well-Being:
– Its β weight is -0.245
– Exerts indirect effect through Wealth: -0.505 x 0.548 = -0.277
– Also indirect effect through IQ: -0.433 x 0.199 = -0.086
– So, total effect is: (-0.245) + (-0.277) + (-0.086) = -0.608, which is the correlation

Relationship Between r and β:
So, the correlation, r, is the sum of:
– the direct effect of the IV on the DV, plus
– its indirect effects through its correlations with the other IVs

Relationship Between r and β:
For the correlation between Well-Being and Symptoms:
r_WB-Sx = β_Sx + (r_Sx-Wealth x β_Wealth) + (r_Sx-IQ x β_IQ)

Correlation and Reproduced Matrix:
(actual correlations above the diagonal, reproduced correlations below)

              Well-Being   Symptoms   Wealth    IQ
Well-Being    1.000        -.608      .807      .677
Symptoms      -.608        1.000      -.505     -.433
Wealth        .807         -.505      1.000     .678
IQ            .677         -.433      .678      1.000

The Alternative Model:
[Figure: alternative path diagram of Symptoms, Wealth, IQ and Well-Being; values shown: -.505, (.200), (.678)]

Correlation and Reproduced Matrix:
(actual correlations above the diagonal, reproduced correlations below)

              Well-Being   Symptoms   Wealth    IQ
Well-Being    1.000        -.608      .807      .677
Symptoms      -.593        1.000      -.505     -.433
Wealth        .810         -.505      1.000     .678
IQ            .573         -.342      .678      1.000

Rules for Following Paths:
1 For any single path, you can go through a given variable only once.
2 Once you’ve gone forward along a path using one arrow, you can’t go back on a path using a different arrow.
3 You can’t go through a double-headed curved arrow more than one time.
4 You can’t enter a variable on one arrowhead and leave it on another arrowhead.

Valid Paths For Symptoms:
[Figure: path diagram of Symptoms, Wealth, IQ and Well-Being with valid paths marked]

Valid Paths For Wealth:
[Figure: path diagram of Symptoms, Wealth, IQ and Well-Being with valid paths marked]

Valid Paths For Symptoms:
[Figure: path diagram of Symptoms, Wealth, IQ and Well-Being with valid paths marked]

An Invalid Path For Symptoms:
[Figure: path diagram of Symptoms, Wealth, IQ and Well-Being with an invalid path marked]

Path Analysis Causality:
[Figure: path diagram of Symptoms, Wealth, IQ and Well-Being; values shown: -.505, (.200), (.678)]

Some Terminology:
Exogenous variables:
– Have straight arrows emerging from them and none pointing to them.
Endogenous variables:
– Have at least one straight arrow pointing to them.

Why the Change in Terms?
[Figure: path diagram labeling the variables as independent or dependent; labels shown: "Independent Variable" (Symptoms), "Dependent Variable", "?"]
[Figure labels, continued: Wealth, Well-Being, IQ ("Independent Variable")]

Why the Change in Terms?
[Figure: the same diagram relabeled: Symptoms and IQ are "Exogenous Variable"; Wealth and Well-Being are "Endogenous Variable"]

Types of Path Models:
[Figures: two path diagrams relating X1 and X2 to Y]

Types of Path Models:
[Figure: path diagram of X1, X2, Y1 and Y2]

For Example:
[Figure: path diagram of Mom’s Anxiety, Mom’s Depression, Kid’s Anxiety and Kid’s Depression]

For Example:
[Figure: path diagram of Anxiety (Time 1), Depression (Time 1), Anxiety (Time 2) and Depression (Time 2)]

Types of Path Models:
[Figure: path diagram of X1, X2 and Y]

For Example:
[Figure: path diagram of Medication, Family EE and Symptoms]

Types of Path Models:
[Figure: path diagram of X1, X2 and Y]

For Example:
[Figure: path diagram of Having a child, Social Isolation and Depression]

Types of Path Models:
[Figure: path diagram of X1, X2, Y1 and Y2]

Nonrecursive Models:
[Figure: path diagram of X1, X2, Y1 and Y2]

For Example:
[Figures: two path diagrams of Mom’s Anxiety, Mom’s Depression, Kid’s Anxiety and Kid’s Depression]

Disturbance Terms:
[Figure: path diagram of X1, X2, Y1 and Y2 with disturbance terms D1 and D2]

K.I.S.S.
Number of Parameters ≤ Number of Observations

K.I.S.S.
Number of Observations = [k x (k + 1)] / 2
where k = number of variables

How Many Parameters?
Purpose is to determine what affects the endogenous variables:
– Which paths are important (straight paths)
– How exogenous variables work together (curved paths)
– Variances of exogenous variables
– Disturbances of endogenous variables
Not variances of endogenous variables

Counting Parameters:
[Figure: first path diagram (Symptoms, Wealth, IQ, Well-Being, D1) with each parameter numbered]

Counting Parameters:
– 3 exogenous variables + 1 endogenous variable, so k = 4
– Number of observations = (4 x 5) / 2 = 10
– Number of parameters = Number of observations

Counting Parameters:
[Figure: second path diagram (Symptoms, Wealth, IQ, Well-Being, D1, D2) with each parameter numbered]

Counting Parameters:
– 2 exogenous variables + 2 endogenous variables, so k = 4
– Number of observations = (4 x 5) / 2 = 10
– Number of parameters < Number of observations

Counting Parameters:
– Why not count variance of Well-Being?
– Why is variance of Wealth counted in 1st diagram but not 2nd?
– Why no more parameters than observations?

Why Not Variance of Well-Being?
– Endogenous variable
– Not free to vary; dependent on values of exogenous variables
– Goal of PA is to explain variances of variables and covariances between variables that can vary

Why Count Wealth in 1st Diagram But Not 2nd?
[Figure: the two diagrams of Symptoms, Wealth, IQ and Well-Being side by side, labeled "Exogenous" and "Endogenous"]

Why No More Parameters Than Observations?
a = b + c
If a = 5, what are b and c?
– Infinite number of solutions
– Model is undefined (under-identified)
– There isn’t a unique solution

Why No More Parameters Than Observations?
a = b + c
If a = 5 and b = -3, what is c?
– Only one solution
– Model is defined (just-identified)

Why No More Parameters Than Observations?
a = b + c
If a = 5, b = -3 and c is 8:
– Model is correct
– Nothing to identify (trivial)
– Model is over-defined (over-identified)

As Good As It Gets (Goodness-of-Fit):
– Significance of path coefficients
– Reproduced (implied) correlation matrix
– Model as a whole

Significance of Paths:
– Path coefficients are parameters
– Therefore, estimated with some error
– z = Path Coefficient / SE of Estimate

Reproduced Correlation Matrix:
– In 1st diagram, reproduced correlations = actual correlations
– In 2nd diagram, reproduced correlations ≠ actual correlations (most are smaller in magnitude)
– Model 1 better than Model 2, but:
  – Model 1 too good (10 Pars, 10 Obs)
  – In Model 2, Obs = 10, Parameters = 9

The Model as a Whole:
Goodness-of-Fit Chi-Squared (χ²GoF)
– In most tests, bigger is better
– Here, we want χ²GoF to be as small as possible

Why χ²GoF Should be Small:
– χ² tests difference between observed and expected findings.
– Usually, expected values are determined under H0 of no effect.
– We want findings to be different from this.

Why χ²GoF Should be Small:
– For goodness of fit, we are not testing difference between observed and H0.
– Testing difference between observed and hypothesized models.
– Do not want there to be a difference.
– df = (#Observations - #Parameters)

Interpreting χ²GoF:
– Greatly affected by sample size:
  – If low, SEs large, so hard to find difference
  – If high, every model differs from data
– A good fit does not mean there may not be a better model.
– Does not indicate causality!

Two Different Models:
[Figure: two different path diagrams of Symptoms, Wealth, IQ and Well-Being; each has χ²GoF(1) = 2.044]

A Just-Identified Model:
[Figure: path diagram of Symptoms, Wealth, IQ and Well-Being]
– # Observations = 10
– # Parameters = 10
– df = 0
– Untestable

Assumptions:
– Similar to OLS regression.
– Exogenous variables measured without error.
  – If violated, overestimates direct paths, underestimates indirect paths
– All important variables included.
– Additive model.
– Only moderate correlations among exogenous variables.

Sample Size:
– Affects SEs of path coefficients, variances, and covariances.
– No formulae for calculating N.
– Minimum of 10 subjects per parameter (some argue for 20).
– Minimum of 100 (some say 200).
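The counting rules from the slides can be sketched in a few lines: observations = k(k + 1)/2, df = observations minus parameters, and the sample-size rules of thumb. The function names and the way the two rules of thumb are combined (taking the larger of the two minimums) are assumptions made for this illustration, not part of the slides.

```python
def count_observations(k):
    """Number of observations (variances plus covariances) for k variables."""
    return k * (k + 1) // 2


def identification(n_observations, n_parameters):
    """Return (df, status) following the a = b + c analogy on the slides."""
    df = n_observations - n_parameters
    if df < 0:
        status = "under-identified"
    elif df == 0:
        status = "just-identified"
    else:
        status = "over-identified"
    return df, status


def minimum_sample_size(n_parameters, per_parameter=10, floor=100):
    """Assumed combination of the two rules of thumb: take the larger minimum."""
    return max(per_parameter * n_parameters, floor)


observations = count_observations(4)  # Symptoms, Wealth, IQ, Well-Being

# Parameter counts as given on the slides: 10 for Model 1, 9 for Model 2.
for label, n_parameters in [("Model 1", 10), ("Model 2", 9)]:
    df, status = identification(observations, n_parameters)
    n_lenient = minimum_sample_size(n_parameters)
    n_strict = minimum_sample_size(n_parameters, per_parameter=20, floor=200)
    print(f"{label}: obs={observations}, pars={n_parameters}, df={df} "
          f"({status}), minimum N about {n_lenient} to {n_strict}")
```

With df = 1, Model 2 is the only one of the two whose fit can be tested, which matches the χ²GoF(1) reported for it.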