# Structural Equation Modeling With Mplus-BYU - FHSS Research by zhouwenjuan

```									Structural Equation Modeling
Using Mplus

Chongming Yang
Research Support Center
FHSS College
Structural?

   Structuralism
   Components
   Relations
Objectives

   Introduction to SEM
   The model
   Parameters
   Estimation
   Model evaluation
   Applications
   Estimate simple models with Mplus
Continuous Dependent
Variables

Session I
Information of Variable

   Mean
   Variance
   Skewness
   Kurtosis
Variance & Covariance
V = Σ(i=1..n) (x_i − x̄)² / (n − 1)

Cov = Σ(i=1..n) (x_i − x̄)(y_i − ȳ) / (n − 1)
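Worked Example: Variance & Covariance

   A small check of the two formulas, using five illustrative
   (x, y) pairs: x = 1 2 3 4 5, y = 5 4 3 2 1
   Mean of x = 3, mean of y = 3
   Deviations of x: −2 −1 0 1 2; deviations of y: 2 1 0 −1 −2
   V(x) = (4 + 1 + 0 + 1 + 4) / (5 − 1) = 10 / 4 = 2.5
   V(y) = 2.5
   Cov(x, y) = (−4 − 1 + 0 − 1 − 4) / (5 − 1) = −10 / 4 = −2.5
   Correlation = Cov / √(V(x) × V(y)) = −2.5 / 2.5 = −1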
Covariance Matrix
(S)
x1      x2      x3

x1   V1

x2   Cov21   V2

x3   Cov31   Cov32   V3
Statistical Model

   Probabilistic statement about Relations of
variables
   Imperfect but useful representation of
reality
Structural Equation Modeling

   A system of regression equations for
latent variables to estimate and test direct
and indirect effects without the influence
of measurement errors.
   To estimate and test theories about
interrelations among observed and latent
variables.
Latent Variable
(Construct / Factor / Trait)
   A hypothetical variable
   cannot be measured directly
   No objective measurement unit
   inferred from observable manifestations
   Multiple manifestations (indicators)
   Normally distributed interval dimension
How is Depression
Distributed in?

   BYU students

   Patients for Therapy
Normal Distributions
Levels of Analyses

   Observed

   Latent
Test Theories

   Classical True Score Theory:
Observed Score = True score + Error
   Item Response Theory
   Generalizability (Raykov & Marcoulides, 2006)
Graphic Symbols of SEM

   Rectangle – observed variable
   Oval -- latent variable or error
   Single-headed arrow -- causal relation
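Graphic Measurement Model
of Latent ξ

[Path diagram: latent variable ξ points to indicators X1, X2, X3 through loadings λ1, λ2, λ3; each indicator also receives an error δ1, δ2, δ3]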
Equations
   Specific equations
X1 = λ1ξ + δ1
X2 = λ2ξ + δ2
X3 = λ3ξ + δ3
   Matrix symbols
X = Λξ + δ
   True Score Theory?
Relations of Variances

V(X1) = λ1² × V(ξ) + V(δ1)
V(X2) = λ2² × V(ξ) + V(δ2)
V(X3) = λ3² × V(ξ) + V(δ3)

δ = measurement error / uniqueness
Unknown Parameters

V(X1) = λ1² × V(ξ) + V(δ1)
V(X2) = λ2² × V(ξ) + V(δ2)
V(X3) = λ3² × V(ξ) + V(δ3)
Unknown: λ1, λ2, λ3, V(ξ), V(δ1), V(δ2), V(δ3)
Sample Covariance Matrix
(S)
x1      x2      x3

x1   V1

x2   Cov21   V2

x3   Cov31   Cov32   V3
Variance of ξ

   Variance of ξ = common covariance of X1,
X2 and X3
[Figure: illustration of the variance of ξ; details not recoverable]
Unstandardized Parameterization
(scaling)
   λ1 = 1 (fix the loading of X1 at 1; X1 is called the reference indicator)

   Variance of ξ = common variance of X1, X2
and X3
   λ² × Variance of ξ = explained variance of X
(divided by the total variance, this gives R²)
   Variance of δ = unexplained variance
(error)
   Total variance = λ² × Variance of ξ + Variance of δ
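Just Identified Model

[Path diagram: ξ measured by X1, X2, X3 with loadings λ1, λ2, λ3 and errors δ1, δ2, δ3]
   Three indicators give 3(3+1)/2 = 6 variances and covariances
   With λ1 fixed at 1, the unknowns are λ2, λ3, the variance of ξ,
and three error variances: 6 unknowns, so df = 0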
Reference Indicator
(marker)
   Choose conceptually the best

   Small variance → non-convergence
   Different markers → different parameter
estimates and standard errors
   Affects measurement invariance tests
   Does not affect standardized estimates
Standardized Parameterizations
(scaling)
   Variance of ξ = 1 = common variance of
X1, X2 and X3
   Squared λ = explained variance of X (R²)
   Variance of δ = 1 − λ²
   Mean of ξ = 0
   Mean of δ = 0
Two Kinds of Parameters

   Fixed at 0, 1, or other values
   Freely estimated
[Path diagram: latent variables General Intelligence (indicators Analytic, Reasoning, Verbal; errors d1–d3), Emotional Intelligence (Self Control, Recognize/Assess; d4–d5) and Personality (Agreeableness, Openness; d6–d7), together with Job Satisfaction (Being Appreciated, Social Relations; e1–e2; disturbance z1) and Marital Satisfaction (Perceived Benefit, Perceived Cost; e3–e4; disturbance z2)]
Structural Equation Model
in Matrix Symbols

   X = Λx ξ + δ (exogenous)
   Y = Λy η + ε (endogenous)

   η = Bη + Γξ + ζ (structural model)

Note: The measurement model reflects the true score
theory
Structural Equation Model
in Matrix Symbols

   X = τx + Λx ξ + δ (measurement)
   Y = τy + Λy η + ε (measurement)

   η = α + Bη + Γξ + ζ (structural)

Note: SEM with mean structure.
Model Implied Covariance Matrix
(Σ)

Note: This covariance matrix contains unknown parameters in the equations.
(I-B) = non-singular
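Model Implied Covariance Matrix
(worked form)

   A sketch of the standard result for the model without mean
   structure, with Y = Λy η + ε, X = Λx ξ + δ and η = Bη + Γξ + ζ.
   The labels Φ (covariance matrix of ξ), Ψ (covariance matrix of ζ),
   Θε and Θδ (covariance matrices of ε and δ) are assumed here,
   following the usual LISREL-style notation.

   Cov(Y, Y) = Λy (I−B)⁻¹ (ΓΦΓ′ + Ψ) [(I−B)⁻¹]′ Λy′ + Θε
   Cov(Y, X) = Λy (I−B)⁻¹ ΓΦ Λx′
   Cov(X, X) = Λx Φ Λx′ + Θδ

   Σ stacks these blocks: [Cov(Y,Y) Cov(Y,X); Cov(X,Y) Cov(X,X)]
   The inverse (I−B)⁻¹ is why (I−B) must be non-singular.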
Estimations/Fit Functions

   Hypothesis: Σ = S or Σ − S = 0

   Maximum Likelihood
F = log|Σ| + trace(SΣ⁻¹) − log|S| − (p+q)
Convergence -- Reaching Limit

   Minimize F by adjusting the unknown
parameters through an iterative process
   Convergence value: F difference between
last two iterations
   Default convergence = .0001
   Increase to help convergence (0.001 or 0.01)
e.g.   Analysis: convergence = .01;
No Convergence

   No unique parameter estimates
   Lack of degrees of freedom → under-identification
   Variance of reference indicator too small
   Fixed parameters are left to be freely
estimated
   Misspecified model
Absolute Fit Index

χ² = F(N−1) (N = sample size)
df = p(p+1)/2 – q
p = number of observed variables, so p(p+1)/2 = number of variances and
covariances (add p means when means are modeled)
q = number of unknown parameters to be estimated

prob = ? (Nonsignificant χ² indicates good fit, Why?)
Sample Information
x1          x2        x3     x4      …
x1   v1
x2   cov21   v2
x3   cov31   cov32     v3
x4   cov41   cov42     cov43 v4 …
…
Mean1 Mean2       Mean3     Mean4 …

Total info = P(P+1)/2 + Means
Absolute Fit -- SRMR

   Standardized Root Mean Square Residual
   SRMR = Difference between observed and
implied covariances in standardized metric
   Desirable when < .08, but no consensus
Relative Fit:
Relative to Baseline (Null) Model
   All unknown parameters are fixed at 0
   Variables not related (all covariances fixed at 0)
   Model-implied covariances in Σ = 0
   Fit to sample covariance matrix S
   Obtain χ², df, prob < .0000
Relative Fit Indices
   CFI = 1 − (χ² − df)/(χ²b − dfb)
   b = baseline model
   Comparative Fit Index, desirable >= .95; 95% better than b model

   TLI = (χ²b/dfb − χ²/df) / (χ²b/dfb − 1)
(Tucker-Lewis Index, desirable >= .90)

   RMSEA = √[(χ² − df)/(n × df)]
(Root Mean Square Error of Approximation, desirable <= .06;
penalizes a large model with more unknown parameters)
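Worked Example: Fit Indices

   Hypothetical values, for illustration only:
   χ² = 30, df = 20, χ²b = 500, dfb = 28, n = 200
   CFI = 1 − (30 − 20)/(500 − 28) = 1 − 10/472 ≈ .979
   TLI = (500/28 − 30/20) / (500/28 − 1) = 16.357/16.857 ≈ .970
   RMSEA = √[(30 − 20)/(200 × 20)] = √.0025 = .05
   All three values meet the cutoffs listed above.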
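Special Case A
[Path diagram: observed Sex predicts two latent variables, Verbal Aggression (indicators t4a3, t4a93, t4a94; errors e3, e2, e1; disturbance d1) and Physical Aggression (indicators t4a37, t4a57, t4a90; errors e6, e5, e4; disturbance d2); one loading per factor is fixed at 1]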
Special Cases A

   Assumption: x = ξ

y = τy + Λy η + ε
η = α + Γx + ζ
Special Case B
[Path diagram: latent Verbal Aggression (indicators x1–x3; errors e1–e3) and Physical Aggression (indicators x4–x6; errors e4–e6) predict observed Peer Status (disturbance d)]
Special Cases B

Assumption: y = η

x = τx + Λx ξ + δ
y = α + Γξ + ζ
Other Special Cases of SEM

   Confirmatory Factor Analysis (measurement model only)
   Multiple & Multivariate Regression
   ANOVA / MANOVA (multigroup CFA)
   ANCOVA
   Path Analysis Model (no latent variables)
   Simultaneous Econometric Equations…
   Growth Curve Modeling
   …
EFA vs. CFA
[Path diagram, Exploratory Factor Analysis: indicators x1–x6 with errors e1–e6 and two factors, Factor 1 and Factor 2]
[Path diagram, Confirmatory Factor Analysis: the same indicators, errors and factors]
Multiple Regression

[Path diagram: x1, x2 and x3 predict Y; e is the residual of Y]
ANCOVA

[Path diagram: Group, Pretest1 and Pretest2 with outcomes Posttest1 (residual e1) and Posttest2 (residual e2)]
Multivariate Normality Assumption

Observed data are summarized perfectly by the
covariance matrix S (+ means M); S thus
is an estimator of the population
covariance Σ
Consequences of Violation

   Inflated χ² & deflated CFI and TLI →
reject plausible models
   Inflated standard errors → attenuate
variables (structural parameters)

(Cause: Sample covariances were underestimated)
Accommodating Strategies
   Correcting Fit
   Satorra-Bentler Scaled χ² & Standard Errors
(estimator = mlm; in Mplus)
   Correcting standard errors
   Bootstrapping
   Transforming Nonnormal variables
   Transforming into new normal indicators
(undesirable)
   SEM with Categorical Variables
Satorra-Bentler Scaled χ² & SE
   S-B χ² = d⁻¹ × (ML-based χ²)     (d = scaling factor
that incorporates kurtosis)
   Effect: performs well with continuous data
in terms of χ², CFI, TLI, RMSEA,
parameter estimates and standard errors.
Also works with certain categorical
variables (see next slide)

Analysis: estimator = MLM;
Workable Categorical Data
[Bar chart: frequency distribution over five response categories (1–5); vertical axis 0–7]
Nonworkable Categorical Data
[Bar chart: frequency distribution over three response categories (1–3); vertical axis 0–6]
Bootstrapping
(resampling of data)

   Original   btstrp1   btstrp2 …
   x  y       x  y      x  y
   1  5       5  3      1  3
   2  4       1  1      5  4
   3  3       3  2      4  1
   4  2       4  5      2  2
   5  1       2  4      3  5
   .  .       .  .      .  .
Limitation of Bootstrapping
   Assumption: Sample = Population
   Useful Diagnostic Tool
   Does not compensate for
      small or unrepresentative samples
      severely non-normal data
      absence of independent samples for
cross-validation
   Analysis: Bootstrap = 500 (standard/residual);
   Output: stand cinterval;
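Example Input: Bootstrapped Confidence Intervals

! A minimal sketch of how the two statements above sit in a complete
! input file. The data file name (example.dat), the variable names
! (y, m, x) and the regression model are hypothetical placeholders.
DATA:      FILE IS example.dat;
VARIABLE:  NAMES ARE y m x;
ANALYSIS:  ESTIMATOR = ML;
           BOOTSTRAP = 500;          ! 500 bootstrap draws
MODEL:     y ON m x;
           m ON x;
OUTPUT:    CINTERVAL(BOOTSTRAP);     ! percentile bootstrap intervals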
Mplus

   www.statmodel.com
Multiple Programs Integrated

   SEM of both continuous and categorical
variables
   Multilevel modeling
   Mixture modeling (identify hidden groups)
   Complex survey data modeling
(stratification, clustering, weights)
   Modern missing data treatment
   Monte Carlo Simulations
Types of Mplus Files

   Data (*.dat, *.txt)
   Input (specify a model, <=80
columns/line)
   Output (automatically produced)
   Plot (automatically produced)
Data File Format

   Free
   Delimited by tab, space, or comma
   All missing values must be flagged with
special numbers / symbols
   Default in Mplus
   Computationally slow with large data set
   Fixed
Format = 3F3, 5F3.2, F5.1;
Mplus Input

   DATA: File = ?
   VARIABLE: Names=?; Usevar=?; Categ=?;

   ANALYSIS: Type = ?

   MODEL: (BY, ON, WITH)

   OUTPUT: Stand;
Model Specification in Mplus
   BY → Measured by (F by x1 x2 x3 x4)
   ON → Regressed on (y on x)
   WITH → Correlated with (x with y)
   XWITH → Interact with (inter | F1 xwith F2)
   PON → Pair ON (y1 y2 pon x1 x2 = y1 on x1; y2 on x2)
   PWITH → Pair WITH (x1 x2 pwith y1 y2 = x1 with y1; x2
with y2)
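Example Input: A Complete Input File

! A minimal sketch that fills in the five commands with BY, ON and WITH.
! The file name (example.dat), the variable names (x1-x6, y) and the
! missing-value flag (-999) are hypothetical placeholders.
TITLE:     Two correlated factors predicting an observed outcome
DATA:      FILE IS example.dat;
VARIABLE:  NAMES ARE x1-x6 y;
           USEVARIABLES ARE x1-x6 y;
           MISSING ARE ALL (-999);
ANALYSIS:  ESTIMATOR = ML;
MODEL:     f1 BY x1-x3;              ! f1 measured by x1 x2 x3
           f2 BY x4-x6;              ! f2 measured by x4 x5 x6
           f1 WITH f2;               ! factor covariance
           y ON f1 f2;               ! y regressed on both factors
OUTPUT:    STANDARDIZED;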
Default Specification

   Error or residual (disturbance)
   Covariance of exogenous variables in CFA
   Certain covariances of residuals (z2)

z1           z2
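Graphic Model

[Path diagram: five latent factors with three indicators each: F1 (y1–y3), F2 (y4–y6), F3 (y7–y9), F4 (y10–y12), F5 (y13–y15); disturbances d3, d4 and d5 belong to the endogenous factors]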
Model Specification

   Model:
f1 by y1-y3;
f2 by y4-y6;
f3 by y7-y9;
f4 by y10-y12;
f5 by y13-y15;
f3 on f1 f2;
f4 on f2;
f5 on f2 f3 f4 ;     MeaErrors are au
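Worked Example: Degrees of Freedom

   A sketch of the df count for the five-factor model specified above,
   assuming the first loading of each factor is fixed at 1 and no
   residual covariances are added. Means are left out of the count
   because the 15 indicator means are matched by 15 intercepts.

   Sample information: 15(15+1)/2 = 120 variances and covariances
   Unknown parameters:
   10 factor loadings (2 free per factor)
   15 measurement error variances
    3 variances and covariance of F1 and F2
    3 disturbance variances (F3, F4, F5)
    6 structural paths
   q = 37, so df = 120 − 37 = 83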
Practice
   Prepare two data files for Mplus
   Mediation.sav
   Aggress.sav
   Model Specification
   Single Group CFA
   Examine Mediation Effects in a Full SEM
   Run a MIMIC model of aggressions
   Multigroup CFA to examine measurement
invariance
SPSS Data

   Missing Values?
   Leave as blank to use fixed format
   Recode into special number to use free format
   Save as & choose file type
   Fixed ASCII
   Free *.dat (with or without variable names?)
   Copy & paste variable names into Mplus
input file
Mplus Interface

   Activate Mplus Program
   Language Generator
   Manually Create An Input File
Four Separate Files
(Mplus)
   Data
   best prepared with other programs
   Input
   Need to manually specify a model
   Output
   automatic output window
   Graph
   automatic graph file
Data File
   Individual Case Data (*.dat or *.txt)
   Free Format (default)
      Variables separated by tab, comma, or space
      All missing values must be flagged with special
symbols or numbers
   Fixed Format
      Each variable takes fixed space, e.g. 2F2, 4F6, 5F6.3
      Missing values can be left blank

   Summary Data
      Variance-covariance matrix, means
      Correlation matrix, standard deviations, means
SPSS → Mplus

   Open “Antisocial.sav” with SPSS
   Work in Variable Window
   Option 1: Fixed Format
   Change Format to Simplify
   Save as ? (Type=Fixed ASCII)
   Option 2: Free Format
   Recode missing values
   Save as ? (Tab-delimited)
Fixed Format

   F3 4F3.2 25F1
F3: one variable that takes 3
columns
4F3.2: 4 variables, each taking 3 columns,
the last 2 read as decimals
25F1: 25 variables, each using one
column
Copy SPSS Variable Names
into Mplus
   Variables →
   Highlight to select variables
   Paste →
   Go to Syntax Window →
   Select & Copy →
   Paste under Names Are in Mplus input
file
   Practice now
SAS → Mplus
   Assign flags to missing values (use Array
code for many variables)
   Proc Export Data = Data File
Outfile = "Mplus input file folder\*.dat"
DBMS = dlm Replace;
Run;
   Practice
Fixed Format Out of SAS

   Open with SPSS
   Save as Fixed Format
   Practice
Stata2mplus

   Converting a Stata data file to *.dat

Find out:
http://www.ats.ucla.edu/stat/stata/faq/stata2mplus.htm
Modification Indices

   Lower bound estimate of the expected chi-square
decrease
   from freely estimating a parameter fixed at 0
   Mplus Output: stand Mod(10);
(covariance of errors)
   Caution: justification?
Indirect (Mediation) Effect

   A*B

   Mplus specification:
Model Indirect: DV IND Mediator IV;
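Example Input: Mediation

! A minimal sketch of an A*B indirect effect with observed variables.
! The file name (example.dat) and variable names (x, m, y) are
! hypothetical placeholders: x = IV, m = mediator, y = DV.
! Bootstrapped intervals are an optional choice assumed here.
DATA:      FILE IS example.dat;
VARIABLE:  NAMES ARE x m y;
ANALYSIS:  BOOTSTRAP = 1000;
MODEL:     m ON x;                   ! path A
           y ON m;                   ! path B
           y ON x;                   ! direct effect
MODEL INDIRECT:
           y IND m x;                ! indirect effect of x on y through m
OUTPUT:    CINTERVAL(BCBOOTSTRAP);   ! bias-corrected bootstrap intervals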
Model Comparison
   Model:
   Probabilistic statement about the relations of
variables
   Imperfect but useful

   Models Differ:
   Different Variables and Different Relations
(, , , )
   Same Variables but Different Relations
(, , , )
Nested Model
   A Nested Model (b) comes from general
Model (a) by

   Removing a parameter (e.g. a path)

   Fixing a parameter at a value (e.g. 0)

   Constraining parameter to be equal to another

   Both models have the same variables
Test If A=B
[Path diagram: the five-factor model (F1–F5 measured by y1–y15, disturbances d3–d5), with the path from F1 to F3 labeled A and the path from F2 to F3 labeled B]
Model Comparison via
χ² Difference

χ² =        df =       (Nested model)
χ² =        df =       (Default model)
___________________________________
χ²dif =      dfdif =   p = ? (a single tail)

Find p value at the following website:
http://www.tutor-homework.com/statistics_tables/statistics_tables.html

Conclusion:
If p > .05, the nested model does not fit significantly worse than the
default model; that is, the hypothesis that the constrained parameters
are equal is not rejected.
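Worked Example: χ² Difference

   Hypothetical fit results, for illustration only:
   χ² = 52.3   df = 41   (Nested model)
   χ² = 48.1   df = 40   (Default model)
   χ²dif = 52.3 − 48.1 = 4.2    dfdif = 41 − 40 = 1
   The .05 critical value of χ² with 1 df is 3.84; 4.2 > 3.84, so p < .05
   and the constraint significantly worsens fit.
   Note: simple subtraction applies to ML χ² values; scaled χ² values
   (e.g. from estimator = MLM) need a scaled difference test.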
Practice

   Test if effect A=B
Equality Constraints in Mplus

   Parameter Labels:
   Numbers
   Letters
   Combination of numbers and letters

   Constraint (B=A)
   F3 on F1 (A);
   F3 on F2 (A);
Run CFA with Real Data

[Path diagram: Verbal Aggression measured by a3, a93, a94 (errors e1–e3); Physical Aggression measured by a37, a57, a90 (errors e4–e6)]
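Example Input: Two-Factor CFA

! A minimal sketch of the CFA drawn above. The indicator names come from
! the diagram; the file name (aggress.dat), the factor names (verbal,
! physical) and the missing-value flag (-999) are hypothetical.
DATA:      FILE IS aggress.dat;
VARIABLE:  NAMES ARE a3 a93 a94 a37 a57 a90;
           MISSING ARE ALL (-999);
MODEL:     verbal BY a3 a93 a94;     ! a3 is the reference indicator
           physical BY a37 a57 a90;  ! a37 is the reference indicator
OUTPUT:    STANDARDIZED MODINDICES(10);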
Multigroup Analysis
VARIABLE:
USEVAR = X1 X2 X3 X4;
Grouping IS sex (0=F 1=M);
ANALYSIS: TYPE = MISSING H1;
MODEL:
F1 BY X1 - X4;

MODEL M:
F1 BY X2 - X4;

Note: sex is grouping variable
and is not used in the model.
Why Measurement Invariance
Matters?
   Xg1 = τg1 + Λg1ξg1 + δg1
   Xg2 = τg2 + Λg2ξg2 + δg2

   Xg1 − Xg2 = (τg1 − τg2) + (Λg1ξg1 − Λg2ξg2) + (δg1 − δg2)

   With equal intercepts and loadings across groups (and errors averaging 0):
   Xg1 − Xg2 = Λ(ξg1 − ξg2)
Test Measurement Invariance
Default Model
Model:
F1 By a3
a93 (1)
a94 (2);
F2 By a37
a57 (3)
a90 (4);
Model M:
! no labels here: loadings are freely estimated in group M
F1 By
a93
a94;
F2 By
a57
a90;
Output: stand;
Note: Reference indicators in
the second group are omitted.
Test Measurement Invariance
Constrained Model
Model:
F1 By a3
a93 (1)
a94 (2);
F2 By a37
a57 (3)
a90 (4);
Model M:
F1 By
a93 (1)
a94 (2);
F2 By
a57 (3)
a90 (4);
Output: stand;
Note: Reference indicators in
the second group are omitted.
Estimate with Real Data
[Path diagram: Sex, Race1 and Race2 predict Verbal Aggression (indicators a3, a93, a94; errors e1–e3; disturbance d1) and Physical Aggression (indicators a37, a57, a90; errors e4–e6; disturbance d2)]
SEM with Categorical
Indicators

Session II
Problems of Ordinal Scales

   Not truly interval measure of a latent
dimension, having measurement errors
   Limited range, biased against extreme
scores
   Items are equally weighted (implicitly by
1) when summed up or averaged, losing
item sensitivity
Criticisms on Using Ordinal Scales
as Measures of Latent Constructs
   Stevens (1951): …means should be avoided because
its meaning could be easily interpreted beyond ranks.
   Merbitz (1989): Ordinal scales and foundations of
misinference
   Muthen (1983): Pearson product moment correlations
of ordinal scales will produce distorted results in
structural equation modeling.
   Wright (1998): “…misuses nonlinear raw scores or
Likert scales as though they were linear measures will
produce systematically distorted results. …It’s not only
unfair, it is immoral.”
Assumption of Categorical
Indicators
   A categorical indicator is a coarse
categorization of a normally distributed
underlying dimension
Latent (Polychoric) Correlation
Categorization of Latent Dimension
& Threshold τ

[Figure: a latent dimension Y cut by thresholds (τm-1, τm) into observed categories, e.g. No/Yes, Never/Sometimes/Often, or 1 to 5]
Threshold

   The values of a latent dimension at which
respondents have 50% probability of
responding in the higher of two adjacent categories
   Number of thresholds = response
categories – 1. e.g. a binary variable has
one threshold.
   Mplus specification: [x$1] [y$2];
Normal Cumulative Distributions
Measurement Models of Categorical
Indicators (2P IRT)

Probit: P(μ=1|η) = Φ[(−τ + λη)θ^(−1/2)]
(Estimation = Weighted Least Squares with df adjusted for
Means and Variances)

Logistic: P(μ=1|η) = 1 / (1 + e^(−(−τ + λη)))
(Maximum Likelihood Estimation)
Converting CFA to IRT
Parameters
   Probit Conversion
   a = λθ^(−1/2)
   b = τ/λ
   Logit Conversion
   a = λ/D      (D=1.7)

   b = τ/λ
One Parameter
Item Response Theory Model

   Analysis: Estimator = ML;
   Model:
F by X1@1.7
X2@1.7
…
Xn@1.7;
Sample Information

   Latent correlation matrix:
equivalent to covariance matrix of
continuous indicators
   Threshold matrix Δ:
equivalent to means of continuous
indicators
Stages of Estimation

   Sample information:
Correlations/thresholds/intercepts
(Maximum Likelihood)
   Correlation structure (Weighted Least
Squares)
F = Σ(g=1..G) (s(g) − σ(g))′ W(g)⁻¹ (s(g) − σ(g))
W⁻¹ matrix

   Elements:
S1 intercepts or/and thresholds
S2 slopes
S3 residual variances and correlations
   W⁻¹: divided by sample size
Estimation

   WLSMV:
Weighted Least Squares estimation, with χ² and
degrees of freedom adjusted for Means
and Variances of latent and observed
variables
Baseline Model

   Estimated thresholds of all the categorical
indicators
   df = p² – 3p    (p = # of polychoric correlations)
Data Preparation Tip

   Categorical indicators are required to have
consistent response categories across
groups

   Run Crosstab to identify zero cells

   Recode variables to collapse certain
categories to eliminate zero cells
Inconsistent Categories

          1    2    3    4    5
Male     60   80   43    4    0
Female   57   86   32   16    2

(after collapsing categories 4 and 5)
          1    2    3    4
Male     60   80   43    4
Female   57   86   32   18
Specify Dependent Variables
as Categorical
   Variable:
   Categ = x1-x3;
   Categ = all;
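Example Input: CFA with Categorical Indicators

! A minimal sketch of declaring ordinal indicators. The file name
! (example.dat), the variable names (x1-x3) and the missing-value
! flag (-999) are hypothetical placeholders.
DATA:      FILE IS example.dat;
VARIABLE:  NAMES ARE x1-x3;
           CATEGORICAL ARE x1-x3;    ! long form of Categ = x1-x3;
           MISSING ARE ALL (-999);
ANALYSIS:  ESTIMATOR = WLSMV;
MODEL:     f BY x1-x3;
OUTPUT:    STANDARDIZED;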
Reporting Results
   Guidelines:
   Conceptual Model
   Software + Version
   Data (continuous or categorical?)
   Treatment of Missing Values
   Estimation method
   Model fit indices (χ²(df), p, CFI, TLI, RMSEA)
   Structural parameter estimates (estimate,
significance, 95% confidence intervals)
(β = .23*, CI = .18~.28)
Reliability of Categorical Indicators
(variance approach)

ω = (Σλi)² / [(Σλi)² + Σψ²],              where

Σψ² = sum of residual variances
i = items or indicators
ψ²i = 1 − λi²

McDonald, R. P. (1999). Test theory: A unified treatment (p. 89). Mahwah,
New Jersey: Lawrence Erlbaum Associates.
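Worked Example: Reliability

   Hypothetical standardized loadings for three indicators,
   for illustration only: λ = .7, .8, .6
   Sum of loadings = 2.1, squared = 4.41
   Residual variances = 1 − .49, 1 − .64, 1 − .36 = .51, .36, .64
   Sum of residual variances = 1.51
   Reliability = 4.41 / (4.41 + 1.51) = 4.41 / 5.92 ≈ .745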
Calculator of Reliability
(Categorical Indicators)

   SPSS reliability data
   SPSS reliability syntax
Troubleshooting Strategy