Structural Equation Modeling With Mplus-BYU - FHSS Research

					Structural Equation Modeling
        Using Mplus

        Chongming Yang
     Research Support Center
          FHSS College
                 Structural?

   Structuralism
       Components
       Relations
                    Objectives

   Introduction to SEM
       The model
       Parameters
       Estimation
       Model evaluation
       Applications
   Estimate simple models with Mplus
Continuous Dependent
      Variables

       Session I
       Information of Variable

   Mean
   Variance
   Skewness
   Kurtosis
 Variance & Covariance

$$V = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

$$Cov = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
          Covariance Matrix
                 (S)
     x1      x2      x3

x1   V1

x2   Cov21   V2

x3   Cov31   Cov32   V3
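As a small worked illustration of the variance and covariance formulas, the sketch below uses five made-up paired scores (they are hypothetical, chosen only so the arithmetic is easy to follow):

```latex
% Hypothetical scores: x = 1, 2, 3, 4, 5 and y = 5, 4, 3, 2, 1 (n = 5)
\begin{align*}
\bar{x} &= 3, \qquad \bar{y} = 3 \\
V_x &= \frac{(-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2}{5 - 1} = \frac{10}{4} = 2.5 \\
V_y &= \frac{2^2 + 1^2 + 0^2 + (-1)^2 + (-2)^2}{5 - 1} = \frac{10}{4} = 2.5 \\
Cov_{xy} &= \frac{(-2)(2) + (-1)(1) + (0)(0) + (1)(-1) + (2)(-2)}{5 - 1}
          = \frac{-10}{4} = -2.5
\end{align*}
```

In the layout of S, the two variances would sit on the diagonal and the covariance of -2.5 below it.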
             Statistical Model

   Probabilistic statement about Relations of
    variables
   Imperfect but useful representation of
    reality
    Structural Equation Modeling

   A system of regression equations for
    latent variables to estimate and test direct
    and indirect effects without the influence
    of measurement errors.
   To estimate and test theories about
    interrelations among observed and latent
    variables.
                  Latent Variable
                 (Construct / Factor / Trait)
   A hypothetical variable
       cannot be measured directly
       No objective measurement unit
       inferred from observable manifestations
   Multiple manifestations (indicators)
   Normally distributed interval dimension
          How is Depression
           Distributed in?

   BYU students

   Patients for Therapy
Normal Distributions
             Levels of Analyses


   Observed

   Latent
               Test Theories


   Classical True Score Theory:
    Observed Score = True score + Error
   Item Response Theory
   Generalizability (Raykov & Marcoulides, 2006)
       Graphic Symbols of SEM

   Rectangle – observed variable
   Oval -- latent variable or error
   Single-headed arrow -- causal relation
   Double-headed arrow -- correlation
Graphic Measurement Model
        of Latent ξ

[Figure: latent variable ξ measured by X1, X2, X3 with loadings λ1, λ2, λ3 and errors δ1, δ2, δ3]
               Equations
 Specific equations
  $X_1 = \lambda_1 \xi + \delta_1$
  $X_2 = \lambda_2 \xi + \delta_2$
  $X_3 = \lambda_3 \xi + \delta_3$
 Matrix symbols
  $X = \Lambda \xi + \delta$
 True Score Theory?
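    Relations of Variances

$V_{X_1} = \lambda_1^2 \, \mathrm{Var}(\xi) + \mathrm{Var}(\delta_1)$
$V_{X_2} = \lambda_2^2 \, \mathrm{Var}(\xi) + \mathrm{Var}(\delta_2)$
$V_{X_3} = \lambda_3^2 \, \mathrm{Var}(\xi) + \mathrm{Var}(\delta_3)$

δ = measurement error / uniqueness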
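    Unknown Parameters

$V_{X_1} = \lambda_1^2 \, \mathrm{Var}(\xi) + \mathrm{Var}(\delta_1)$
$V_{X_2} = \lambda_2^2 \, \mathrm{Var}(\xi) + \mathrm{Var}(\delta_2)$
$V_{X_3} = \lambda_3^2 \, \mathrm{Var}(\xi) + \mathrm{Var}(\delta_3)$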
     Sample Covariance Matrix
               (S)
     x1      x2      x3

x1   V1

x2   Cov21   V2

x3   Cov31   Cov32   V3
              Variance of ξ

   Variance of ξ = common covariance of X1,
    X2 and X3

[Figure: the variance of ξ shown together with the unique parts δ1, δ2, δ3]
    Unstandardized Parameterization
               (scaling)
   λ1 = 1 (fix the loading of X1 at 1 so that ξ takes the metric of X1; X1 is called the reference indicator)

   Variance of ξ = common variance of X1, X2
    and X3
   λ² × variance of ξ = explained variance of X (R² when expressed as a proportion of the total variance)
   Variance of δ = unexplained variance (error)
   Total variance = λ² × variance of ξ + variance of δ
Just Identified Model

[Figure: ξ measured by X1, X2, X3 with loadings λ1, λ2, λ3 and errors δ1, δ2, δ3]
             Reference Indicator
                  (marker)
   Choose conceptually the best

       Small variance → non-convergence
       Different markers → different parameter
        estimates and standard errors
       Affects measurement invariance tests
       Does not affect standardized estimates
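Counting pieces of sample information against free parameters shows why the one-factor, three-indicator model is just identified. The sketch below assumes the unstandardized scaling (the loading of the reference indicator fixed at 1) and no mean structure:

```latex
\begin{align*}
\text{information} &= \frac{p(p+1)}{2} = \frac{3 \cdot 4}{2} = 6
   && \text{(3 variances + 3 covariances)} \\
\text{free parameters} &= \underbrace{2}_{\lambda_2,\ \lambda_3}
   + \underbrace{1}_{\mathrm{Var}(\xi)}
   + \underbrace{3}_{\mathrm{Var}(\delta_1),\ \mathrm{Var}(\delta_2),\ \mathrm{Var}(\delta_3)} = 6 \\
df &= 6 - 6 = 0
\end{align*}
```

With zero degrees of freedom the model reproduces the sample covariance matrix exactly, so its fit cannot be tested.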
    Standardized Parameterizations
               (scaling)
   Variance of ξ = 1 = common variance of
    X1, X2 and X3
   λ² = explained variance of X (R²)
   Variance of δ = 1 − λ²
   Mean of ξ = 0
   Mean of δ = 0
       Two Kinds of Parameters

   Fixed at 0, 1, or other values
   Freely estimated
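[Figure: path diagram. General Intelligence (indicators Analytic, Reasoning, Verbal; errors d1-d3), Emotional Intelligence (indicators Self Control, Recognize/Assess; errors d4-d5) and Personality (indicators Agreeableness, Openness; errors d6-d7), together with Job Satisfaction (indicators Being Appreciated, Social Relations; errors e1-e2; disturbance z1) and Marital Satisfaction (indicators Perceived Benefit, Perceived Cost; errors e3-e4; disturbance z2)]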
      Structural Equation Model
          in Matrix Symbols


$X = \Lambda_x \xi + \delta$ (exogenous)
$Y = \Lambda_y \eta + \varepsilon$ (endogenous)

$\eta = B\eta + \Gamma\xi + \zeta$ (structural model)


Note: The measurement model reflects the true score
  theory
       Structural Equation Model
           in Matrix Symbols


$X = \tau_x + \Lambda_x \xi + \delta$ (measurement)
$Y = \tau_y + \Lambda_y \eta + \varepsilon$ (measurement)

$\eta = \alpha + B\eta + \Gamma\xi + \zeta$ (structural)


Note: SEM with mean structure.
   Model Implied Covariance Matrix
                 (Σ)




Note: This covariance matrix contains unknown parameters in the equations.
      (I-B) = non-singular
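For reference, the standard textbook form of the implied covariance matrix for the structural model η = Bη + Γξ + ζ is sketched below. The labels Φ, Ψ, Θε and Θδ are conventional LISREL notation assumed here; they are not taken from the slides.

```latex
% Assumed labels: \Phi = Cov(\xi), \Psi = Cov(\zeta),
% \Theta_\varepsilon = Cov(\varepsilon), \Theta_\delta = Cov(\delta)
\[
\Sigma =
\begin{bmatrix}
\Sigma_{yy} & \Sigma_{yx} \\
\Sigma_{xy} & \Sigma_{xx}
\end{bmatrix},
\qquad \Sigma_{xy} = \Sigma_{yx}'
\]
\begin{align*}
\Sigma_{yy} &= \Lambda_y (I - B)^{-1} (\Gamma \Phi \Gamma' + \Psi)
               \left[(I - B)^{-1}\right]' \Lambda_y' + \Theta_\varepsilon \\
\Sigma_{yx} &= \Lambda_y (I - B)^{-1} \Gamma \Phi \Lambda_x' \\
\Sigma_{xx} &= \Lambda_x \Phi \Lambda_x' + \Theta_\delta
\end{align*}
```

The inverse of (I − B) appears in two of the three blocks, which is why (I − B) must be non-singular.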
       Estimations/Fit Functions

   Hypothesis: Σ = S or Σ − S = 0

   Maximum Likelihood
    $F = \log|\Sigma| + \mathrm{trace}(S\Sigma^{-1}) - \log|S| - (p+q)$
    Convergence -- Reaching Limit

   Minimize F by adjusting the unknown
    parameters through an iterative process
   Convergence value: difference in F between
    the last two iterations
   Default convergence = .0001
   Increase the criterion to help convergence (0.001 or 0.01)
     e.g.   Analysis: convergence = .01;
             No Convergence

   No unique parameter estimates
   Lack of degrees of freedom → under-identification
   Variance of reference indicator too small
   Parameters that should be fixed are left to be
    freely estimated
   Misspecified model
            Absolute Fit Index

χ² = F(N−1) (N = sample size)
df = p(p+1)/2 − q
  p = number of observed variables, so p(p+1)/2 = number of variances and covariances (means are added when a mean structure is modeled)
  q = number of unknown parameters to be estimated

prob = ? (A nonsignificant χ² indicates good fit. Why?)
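A worked count of degrees of freedom may help here. The sketch assumes a hypothetical one-factor model with four indicators, the first loading fixed at 1, and no mean structure:

```latex
\begin{align*}
\frac{p(p+1)}{2} &= \frac{4 \cdot 5}{2} = 10
   && \text{(4 variances + 6 covariances)} \\
q &= \underbrace{3}_{\text{free loadings}}
   + \underbrace{1}_{\text{factor variance}}
   + \underbrace{4}_{\text{error variances}} = 8 \\
df &= 10 - 8 = 2
\end{align*}
```

The tested hypothesis is Σ = S, so a nonsignificant χ² means the model-implied matrix cannot be distinguished from the sample matrix, which is the outcome a well-fitting model should produce.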
             Sample Information
      x1          x2        x3     x4      …
x1   v1
x2   cov21   v2
x3   cov31   cov32     v3
x4   cov41   cov42     cov43 v4 …
…
     Mean1 Mean2       Mean3     Mean4 …



Total info = P(P+1)/2 + Means
          Absolute Fit -- SRMR

   Standardized Root Mean Square Residual
   SRMR = Difference between observed and
    implied covariances in standardized metric
   Desirable when < .08, but no consensus
           Relative Fit:
Relative to Baseline (Null) Model
   All unknown parameters are fixed at 0
   Variables not related (all parameter matrices relating the variables = 0)
   Model implied covariances = 0
   Fit to sample covariance matrix S
   Obtain χ², df, prob (typically printed as .0000)
                Relative Fit Indices
   $\mathrm{CFI} = 1 - \frac{\chi^2 - df}{\chi^2_b - df_b}$
       b = baseline model
       Comparative Fit Index, desirable ≥ .95; 95% better than the baseline model



   $\mathrm{TLI} = \frac{\chi^2_b/df_b - \chi^2/df}{\chi^2_b/df_b - 1}$
    (Tucker-Lewis Index, desirable ≥ .90)



   $\mathrm{RMSEA} = \sqrt{\frac{\chi^2 - df}{n \cdot df}}$
    (Root Mean Square Error of Approximation, desirable ≤ .06;
     penalizes a large model with more unknown parameters)
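To see how the three indices behave, here is a worked example with made-up fit statistics (all five input values are hypothetical): χ² = 30 with df = 20 for the tested model, χ²b = 500 with dfb = 28 for the baseline model, and n = 200.

```latex
\begin{align*}
\mathrm{CFI} &= 1 - \frac{30 - 20}{500 - 28} = 1 - \frac{10}{472} \approx .979 \\
\mathrm{TLI} &= \frac{500/28 - 30/20}{500/28 - 1}
              = \frac{17.857 - 1.5}{17.857 - 1} \approx .970 \\
\mathrm{RMSEA} &= \sqrt{\frac{30 - 20}{200 \cdot 20}} = \sqrt{.0025} = .05
\end{align*}
```

Under the cutoffs listed on this slide, all three values would count as acceptable fit.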
      Special Case A

[Figure: Sex predicting Verbal Aggression (indicators t4a3, t4a93, t4a94; errors e3, e2, e1; disturbance d1) and Physical Aggression (indicators t4a37, t4a57, t4a90; errors e6, e5, e4; disturbance d2)]
             Special Cases A

   Assumption: x = ξ

    $y = \tau_y + \Lambda_y \eta + \varepsilon$
    $\eta = B\eta + \Gamma x + \zeta$
          Special Case B

[Figure: Verbal Aggression (indicators x1-x3; errors e1-e3) and Physical Aggression (indicators x4-x6; errors e4-e6) with Peer Status (disturbance d)]
         Special Cases B

Assumption: y = η

$x = \tau_x + \Lambda_x \xi + \delta$
$y = B\eta + \Gamma\xi + \zeta$
      Other Special Cases of SEM

   Confirmatory Factor Analysis (measurement model only)
   Multiple & Multivariate Regression
   ANOVA / MANOVA (multigroup CFA)
   ANCOVA
   Path Analysis Model (no latent variables)
   Simultaneous Econometric Equations…
   Growth Curve Modeling
   …
                 EFA vs. CFA

[Figure: Exploratory Factor Analysis. Indicators x1-x6 (errors e1-e6) with Factor 1 and Factor 2]

[Figure: Confirmatory Factor Analysis. Indicators x1-x6 (errors e1-e6) with Factor 1 and Factor 2]
Multiple Regression

[Figure: x1, x2 and x3 with outcome Y and residual e]
   ANCOVA

[Figure: Group, Pretest1 and Pretest2 with Posttest1 (residual e1) and Posttest2 (residual e2)]
Multivariate Normality Assumption


Observed data are summed up perfectly by the
covariance matrix S (+ means M); S is thus
an estimator of the population
covariance Σ
Consequences of Violation

   Inflated χ² & deflated CFI and TLI →
    reject plausible models
   Inflated standard errors → attenuate
    factor loadings and relations of latent
    variables (structural parameters)


            (Cause: Sample covariances were underestimated)
     Accommodating Strategies
   Correcting Fit
       Satorra-Bentler Scaled χ² & Standard Errors
        (estimator = mlm; in Mplus)
   Correcting standard errors
       Bootstrapping
   Transforming Nonnormal variables
       Transforming into new normal indicators
      (undesirable)
   SEM with Categorical Variables
    Satorra-Bentler Scaled χ² & SE
   S-B χ² = d⁻¹ (ML-based χ²)     (d = scaling factor
    that incorporates kurtosis)
   Effect: performs well with continuous data
    in terms of χ², CFI, TLI, RMSEA,
    parameter estimates and standard errors.
    Also works with certain categorical
    variables (see next slide)

Analysis: estimator = MLM;
        Workable Categorical Data

[Figure: distribution of a categorical variable across five response categories (1 to 5)]
 Nonworkable Categorical Data

[Figure: distribution of a categorical variable across three response categories (1 to 3)]
              Bootstrapping
              (resampling of data)

   Original btstrp1   btstrp2 …
    x y       x y       x y
    1 5       5 3       1 3
    2 4       1 1       5 4
    3 3       3 2       4 1
    4 2       4 5       2 2
    5 1       2 4       3 5
    . .        . .       . .
Limitation of Bootstrapping
   Assumption: Sample = Population
   Useful diagnostic tool
   Does not compensate for
       small or unrepresentative samples
       severely non-normal data
       absence of independent samples for
        cross-validation
   Analysis: Bootstrap = 500 (standard/residual);
   Output: stand cinterval;
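The two option lines above might sit in a complete input file roughly as follows. This is a minimal sketch: the file name, variable names and the model itself are hypothetical placeholders, not taken from the workshop data.

```
TITLE:    Bootstrapped standard errors (illustrative sketch);
DATA:     FILE IS example.dat;         ! hypothetical file name
VARIABLE: NAMES ARE x1-x6;             ! hypothetical variable names
ANALYSIS: ESTIMATOR = ML;
          BOOTSTRAP = 500;             ! 500 bootstrap draws
MODEL:    f1 BY x1-x3;
          f2 BY x4-x6;
          f2 ON f1;
OUTPUT:   CINTERVAL(BOOTSTRAP);        ! bootstrap confidence intervals
```

The standard errors and confidence intervals in the output are then based on the resampled estimates rather than on normal theory.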
                 Mplus

   www.statmodel.com
    Multiple Programs Integrated

   SEM of both continuous and categorical
    variables
   Multilevel modeling
   Mixture modeling (identify hidden groups)
   Complex survey data modeling
    (stratification, clustering, weights)
   Modern missing data treatment
   Monte Carlo Simulations
          Types of Mplus Files

   Data (*.dat, *.txt)
   Input (specify a model, <=80
    columns/line)
   Output (automatically produced)
   Plot (automatically produced)
               Data File Format

   Free
       Delimited by tab, space, or comma
       All missing values must be flagged with
        special numbers / symbols
       Default in Mplus
       Computationally slow with large data set
   Fixed
     Format = 3F3, 5F3.2, F5.1;
             Mplus Input

 DATA: File = ?
 VARIABLE: Names=?; Usevar=?; Categ=?;

 ANALYSIS: Type = ?

 MODEL: (BY, ON, WITH)

 OUTPUT: Stand;
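Filling in the question marks of this template, a complete input file could look like the sketch below. The file name, variable names and missing-value flag are assumptions made for illustration only.

```
TITLE:    Two-factor CFA (illustrative sketch);
DATA:     FILE IS example.dat;         ! hypothetical free-format file
VARIABLE: NAMES ARE id x1-x6;          ! hypothetical names, in file order
          USEVARIABLES ARE x1-x6;      ! variables used in the model
          MISSING ARE ALL (-999);      ! assumed missing-value flag
MODEL:    f1 BY x1-x3;                 ! f1 measured by x1 x2 x3
          f2 BY x4-x6;                 ! f2 measured by x4 x5 x6
          f1 WITH f2;                  ! factor covariance
OUTPUT:   STANDARDIZED;
```

Every statement ends with a semicolon, and an exclamation mark starts a comment that runs to the end of the line.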
     Model Specification in Mplus
   BY → Measured by (F by x1 x2 x3 x4)
   ON → Regressed on (y on x)
   WITH → Correlated with (x with y)
   XWITH → Interact with (inter | F1 xwith F2)
   PON → Pair ON (y1 y2 pon x1 x2 = y1 on x1; y2 on x2)
   PWITH → Pair WITH (x1 x2 pwith y1 y2 = x1 with y1; x2
    with y2)
          Default Specification

   Error or residual (disturbance)
   Covariance of exogenous variables in CFA
   Certain covariances of residuals (z2)

[Figure: residuals z1 and z2]
               Graphic Model

[Figure: five-factor model. F1 (indicators y1-y3), F2 (y4-y6), F3 (y7-y9; disturbance d3), F4 (y10-y12; disturbance d4), F5 (y13-y15; disturbance d5)]
             Model Specification

   Model:
      f1 by y1-y3;
      f2 by y4-y6;
      f3 by y7-y9;
      f4 by y10-y12;
      f5 by y13-y15;
      f3 on f1 f2;
      f4 on f2;
      f5 on f2 f3 f4 ;     MeaErrors are au
                        Practice
   Prepare two data files for Mplus
       Mediation.sav
       Aggress.sav
   Model Specification
   Single Group CFA
   Examine Mediation Effects in a Full SEM
   Run a MIMIC model of aggressions
   Multigroup CFA to examine measurement
    invariance
                    SPSS Data

   Missing Values?
       Leave as blank to use fixed format
       Recode into special number to use free format
   Save as & choose file type
       Fixed ASCII
       Free *.dat (with or without variable names?)
   Copy & paste variable names into Mplus
    input file
             Mplus Interface

   Activate Mplus Program
   Language Generator
   Manually Create An Input File
             Four Separate Files
                  (Mplus)
   Data
       best prepared with other programs
   Input
       Need to manually specify a model
   Output
       automatic output window
   Graph
       automatic graph file
                        Data File
   Individual Case Data (*.dat or *.txt)
       Free Format (default)
          Variable separated by tab, comma, or space
          All missing values must be flagged with special
           symbols or numbers
       Fixed Format
          Variable takes fixed space, e.g. 2F2, 4F6, 5F6.3
          Missing values can be left blank

   Summary Data
          Variance-Covariance matrix, means
          Correlation matrix, standard deviation, means
                  SPSS → Mplus

   Open “Antisocial.sav” with SPSS
   Work in Variable Window
   Option 1: Fixed Format
       Change Format to Simplify
       Save as ? (Type=Fixed ASCII)
   Option 2: Free Format
       Recode missing values
       Save as ? (Tab-delimited)
               Fixed Format

   F3 4F3.2 25F1
    F3 → one variable that takes 3
    columns
    4F3.2 → 4 variables, each taking 3 columns
               with 2 decimal places
    25F1 → 25 variables, each using one
              column
       Copy SPSS Variable Names
              into Mplus
   Menu: Utilities →
   Variables →
   Highlight to select variables →
   Paste →
   Go to Syntax Window →
   Select & Copy →
   Paste under Names Are in Mplus input
    file
   Practice now
               SAS → Mplus
   Assign flags to missing values (use Array
    code for many variables)
   Proc Export Data = Data File
    Outfile = "Mplus input file folder\*.dat"
    DBMS = dlm Replace;
    Run;
   Practice
       Fixed Format Out of SAS

   Open with SPSS
   Save as Fixed Format
   Practice
                Stata2mplus

   Converting a Stata data file to *.dat

Find out:
http://www.ats.ucla.edu/stat/stata/faq/stata2mplus.htm
          Modification Indices

   Lower bound estimate of the expected chi
    square decrease
   when freely estimating a parameter fixed at 0
   Mplus Output: stand Mod(10);
   Start with least important parameters
    (covariance of errors)
   Caution: justification?
      Indirect (Mediation) Effect

     A*B




   Mplus specification:
    Model Indirect: DV IND Mediator IV;
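A fuller sketch of the MODEL INDIRECT specification is given below for a single mediator. The file name and the variable names x, m and y are hypothetical; the labels (a) and (b) simply mark the two paths whose product is the indirect effect A*B.

```
TITLE:    Indirect effect of x on y through m (illustrative sketch);
DATA:     FILE IS mediation.dat;       ! hypothetical file name
VARIABLE: NAMES ARE x m y;             ! hypothetical variable names
ANALYSIS: BOOTSTRAP = 1000;
MODEL:    m ON x (a);                  ! path A
          y ON m (b);                  ! path B
          y ON x;                      ! direct effect
MODEL INDIRECT:
          y IND m x;                   ! indirect effect = A*B
OUTPUT:   CINTERVAL(BCBOOTSTRAP);      ! bias-corrected bootstrap CIs
```

Bootstrapped intervals are a common choice for indirect effects because the product of two estimates is usually not normally distributed.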
              Model Comparison
   Model:
       Probabilistic statement about the relations of
        variables
       Imperfect but useful


   Models Differ:
       Different Variables and Different Relations
                                   (different parameter matrices)
       Same Variables but Different Relations
                                   (different parameter matrices)
             Nested Model
   A Nested Model (b) comes from general
    Model (a) by

       Removing a parameter (e.g. a path)

       Fixing a parameter at a value (e.g. 0)

       Constraining parameter to be equal to another

   Both models have the same variables
               Test If A=B

[Figure: the five-factor model (F1: y1-y3; F2: y4-y6; F3: y7-y9, d3; F4: y10-y12, d4; F5: y13-y15, d5) with path A from F1 to F3 and path B from F2 to F3]
                 Model Comparison via
                     χ² Difference

  χ² =        df =       (Nested model)
  χ² =        df =       (Default model)
___________________________________
  χ²dif =      dfdif =   p = ? (a single tail)

Find p value at the following website:
http://www.tutor-homework.com/statistics_tables/statistics_tables.html


Conclusion:
  If p > .05, the nested model does not fit significantly worse than the default
   model, so the hypothesis that the constrained parameters are equal is
   retained. If p < .05, the hypothesis of equality is rejected.
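A worked example of the difference test, using made-up fit statistics (both χ² values and both df values are hypothetical):

```latex
\begin{align*}
\text{Nested model:}  \quad & \chi^2 = 85.3, \quad df = 81 \\
\text{Default model:} \quad & \chi^2 = 80.1, \quad df = 80 \\
\chi^2_{dif} &= 85.3 - 80.1 = 5.2, \qquad df_{dif} = 81 - 80 = 1
\end{align*}
```

The critical value of χ² with 1 degree of freedom at the .05 level is 3.84. Because 5.2 exceeds 3.84, p < .05, and in this hypothetical case the equality constraint would be rejected.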
                   Practice

   Test if effect A=B
    Equality Constraints in Mplus

   Parameter Labels:
       Numbers
       Letters
       Combination of numbers and letters


   Constraint (B=A)
       F3 on F1 (A);
       F3 on F2 (A);
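As an alternative to fitting the constrained and unconstrained models separately, Mplus also offers a Wald test through MODEL TEST. The sketch below assumes the five-factor model from the earlier Model Specification slide and gives the two paths different labels so that both stay freely estimated:

```
MODEL:    f1 BY y1-y3;
          f2 BY y4-y6;
          f3 BY y7-y9;
          f4 BY y10-y12;
          f5 BY y13-y15;
          f3 ON f1 (a);                ! path A, freely estimated
          f3 ON f2 (b);                ! path B, freely estimated
          f4 ON f2;
          f5 ON f2 f3 f4;
MODEL TEST:
          0 = a - b;                   ! Wald test of A = B
```

The Wald test appears with the model fit information in the output and usually leads to the same conclusion as the χ² difference test.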
Run CFA with Real Data

[Figure: Verbal Aggression measured by a3, a93, a94 (errors e1-e3); Physical Aggression measured by a37, a57, a90 (errors e4-e6)]
         Multigroup Analysis
VARIABLE:
   USEVAR = X1 X2 X3 X4;
   Grouping IS sex (0=F 1=M);
ANALYSIS: TYPE = MISSING H1;
MODEL:
  F1 BY X1 - X4;

MODEL M:
  F1 BY X2 - X4;

                                Note: sex is grouping variable
                                and is not used in the model.
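      Why Measurement Invariance
              Matters?
   $X_{g1} = \tau_{g1} + \Lambda_{g1}\xi_{g1} + \delta_{g1}$
   $X_{g2} = \tau_{g2} + \Lambda_{g2}\xi_{g2} + \delta_{g2}$

   $X_{g1} - X_{g2} = (\tau_{g1} - \tau_{g2}) + (\Lambda_{g1}\xi_{g1} - \Lambda_{g2}\xi_{g2}) + (\delta_{g1} - \delta_{g2})$

   With invariant τ and Λ, and errors that average to zero, the group difference reduces to
   $X_{g1} - X_{g2} = \Lambda(\xi_{g1} - \xi_{g2})$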
      Test Measurement Invariance
                     Default Model
Model:
  F1 By a3
          a93 (1)
          a94 (2);
  F2 By a37
          a57 (3)
          a90 (4);
Model M:
  F1 By
          a93
          a94;
  F2 By
          a57
          a90;
Output: stand;
                                     Note: Reference indicators in
                                     the second group are omitted.
                                     The loadings listed without labels
                                     are free to differ in the second group.
      Test Measurement Invariance
                     Constrained Model
Model:
  F1 By a3
          a93(1)
          a94 (2);
  F2 By a37
          a57 (3)
          a90 (4);
Model M:
  F1 By
          a93 (1)
          a94 (2);
  F2 By
          a57 (3)
          a90 (4);
Output: stand;
                                    Note: Reference indicators in
                                    the second group are omitted.
        Estimate with Real Data

[Figure: Sex, Race1 and Race2 predicting Verbal Aggression (indicators a3, a93, a94; errors e1-e3; disturbance d1) and Physical Aggression (indicators a37, a57, a90; errors e4-e6; disturbance d2)]
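A possible Mplus specification of this model is sketched below. The indicator and covariate names come from the diagram; the data file name, the column order and the factor names verbal and physical are assumptions.

```
TITLE:    MIMIC model of aggression (illustrative sketch);
DATA:     FILE IS aggress.dat;         ! assumed export of Aggress.sav
VARIABLE: NAMES ARE sex race1 race2
                    a3 a93 a94 a37 a57 a90;   ! assumed column order
MODEL:    verbal BY a3 a93 a94;
          physical BY a37 a57 a90;
          verbal physical ON sex race1 race2; ! covariates predict factors
OUTPUT:   STANDARDIZED;
```

The ON statement regresses both latent factors on the three observed covariates in a single line.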
SEM with Categorical
    Indicators

      Session II
       Problems of Ordinal Scales

   Not truly interval measure of a latent
    dimension, having measurement errors
   Limited range, biased against extreme
    scores
   Items are equally weighted (implicitly by
    1) when summed up or averaged, losing
    item sensitivity
    Criticisms on Using Ordinal Scales
         as Measures of Latent Constructs
   Stevens (1951): …means should be avoided because
    their meaning could be easily interpreted beyond ranks.
   Merbitz (1989): Ordinal scales and foundations of
    misinference
   Muthen (1983): Pearson product moment correlations
    of ordinal scales will produce distorted results in
    structural equation modeling.
   Wright (1998): “…misuses nonlinear raw scores or
    Likert scales as though they were linear measures will
    produce systematically distorted results. …It’s not only
    unfair, it is immoral.”
        Assumption of Categorical
              Indicators
   A categorical indicator is a coarse
    categorization of a normally distributed
    underlying dimension
Latent (Polychoric) Correlation
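Categorization of Latent Dimension
          & Threshold τ

[Figure: a latent dimension Y cut by thresholds into response categories: No / Yes; Never / Sometimes / Often, separated by thresholds τ(m-1) and τ(m); and a 1 to 5 scale]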
                 Threshold

   The values of a latent dimension at which
    respondents have 50% probability of
    responding to two adjacent categories
   Number of thresholds = response
    categories – 1. e.g. a binary variable has
    one threshold.
   Mplus specification [x$1] [y$2];
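A small numeric illustration of a threshold, assuming a hypothetical binary item on which 30% of respondents answer "Yes" and a standard normal underlying dimension:

```latex
% Hypothetical item: P(Yes) = .30, so 70% of the latent distribution
% lies below the threshold.
\begin{align*}
P(\text{Yes}) &= 1 - \Phi(\tau) = .30 \\
\tau &= \Phi^{-1}(.70) \approx 0.52
\end{align*}
```

Respondents located above roughly half a standard deviation over the mean of the underlying dimension are the ones expected to answer "Yes".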
Normal Cumulative Distributions
Measurement Models of Categorical
      Indicators (2P IRT)

Probit: $P(u = 1 \mid \eta) = \Phi[(-\tau + \lambda\eta)\,\theta^{-1/2}]$
        (Estimation = Weighted Least Squares with df adjusted for
        Means and Variances)

Logistic: $P(u = 1 \mid \eta) = \frac{1}{1 + e^{-(-\tau + \lambda\eta)}}$
           (Maximum Likelihood Estimation)
           Converting CFA to IRT
               Parameters
   Probit Conversion
       $a = \lambda\theta^{-1/2}$
       $b = \tau/\lambda$
   Logit Conversion
       $a = \lambda/D$      (D = 1.7)

       $b = \tau/\lambda$
             One Parameter
         Item Response Theory Model

   Analysis: Estimator = ML;
   Model:
           F by X1@1.7
                 X2@1.7
                 …
                 Xn@1.7;
         Sample Information

   Latent correlation matrix:
    equivalent to the covariance matrix of
    continuous indicators
   Threshold matrix Δ:
    equivalent to the means of continuous
    indicators
          Stages of Estimation

   Sample information:
    Correlations/thresholds/intercepts
    (Maximum Likelihood)
   Correlation structure (Weighted Least
    Squares)

     $F = \sum_{g=1}^{G} (s^{(g)} - \sigma^{(g)})' \, {W^{(g)}}^{-1} (s^{(g)} - \sigma^{(g)})$
                 W-1 matrix

   Elements:
    S1 intercepts or/and thresholds
    S2 slopes
    S3 residual variances and correlations
   W-1 : divided by sample size
                Estimation

   WLSMV:
    Weighted Least Squares estimation, χ² with
    degrees of freedom adjusted for Means
    and Variances of latent and observed
    variables
                Baseline Model

   Estimated thresholds of all the categorical
    indicators
   df = p² − 3p    (p = number of polychoric correlations)
          Data Preparation Tip

   Categorical indicators are required to have
    consistent response categories across
    groups

   Run Crosstab to identify zero cells

   Recode variables to collapse certain
    categories to eliminate zero cells
Inconsistent Categories

          1    2    3    4    5
 Male     60   80   43   4    0
 Female   57   86   32   16   2




          1    2    3    4
 Male     60   80   43   4
 Female   57   86   32   18
    Specify Dependent Variables
           as Categorical
   Variable:
       Categ = x1-x3;
       Categ = all;
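In a complete input file, the CATEGORICAL option might be used as in the sketch below. The file name, variable names and model are hypothetical placeholders.

```
TITLE:    CFA with ordinal indicators (illustrative sketch);
DATA:     FILE IS example.dat;         ! hypothetical file name
VARIABLE: NAMES ARE x1-x6;             ! hypothetical variable names
          CATEGORICAL ARE x1-x6;       ! treat indicators as ordinal
ANALYSIS: ESTIMATOR = WLSMV;
MODEL:    f1 BY x1-x3;
          f2 BY x4-x6;
OUTPUT:   STANDARDIZED;
```

With categorical indicators declared, the output reports thresholds in place of indicator intercepts.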
                Reporting Results
   Guidelines:
       Conceptual Model
       Software + Version
       Data (continuous or categorical?)
       Treatment of Missing Values
       Estimation method
       Model fit indices (χ²(df), p, CFI, TLI, RMSEA)
       Measurement properties (factor loadings + reliability)
       Structural parameter estimates (estimate,
        significance, 95% confidence intervals)
        ( = .23*, CI = .18~.28)
Reliability of Categorical Indicators
                         (variance approach)



$\omega = \frac{(\sum \lambda_i)^2}{(\sum \lambda_i)^2 + \sum \psi_i^2}$,              where

     $(\sum \lambda_i)^2$ = square of the sum of standardized factor loadings
      $\sum \psi_i^2$ = sum of residual variances
          i = items or indicators
         $\psi_i^2 = 1 - \lambda_i^2$

 McDonald, R. P. (1999). Test theory: A unified treatment (p. 89). Mahwah,
   New Jersey: Lawrence Erlbaum Associates.
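A worked example of this reliability formula, assuming three indicators with hypothetical standardized loadings of .7, .8 and .6:

```latex
\begin{align*}
\Big(\sum \lambda_i\Big)^2 &= (.7 + .8 + .6)^2 = 2.1^2 = 4.41 \\
\sum (1 - \lambda_i^2) &= (1 - .49) + (1 - .64) + (1 - .36) = .51 + .36 + .64 = 1.51 \\
\text{reliability} &= \frac{4.41}{4.41 + 1.51} = \frac{4.41}{5.92} \approx .745
\end{align*}
```

Higher loadings raise the numerator and shrink the residual variances at the same time, so reliability climbs quickly as indicators improve.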
         Calculator of Reliability
                (Categorical Indicators)


   SPSS reliability data
   SPSS reliability syntax
      Trouble Shooting Strategy

   Start with one part of a big model
   Ensure every part works
   Estimate all parts simultaneously
           Important Resources

   Mplus Website:
    www.statmodel.com
   Papers:
    http://www.statmodel.com/papers.shtml
   Mplus discussions:
    http://www.statmodel.com/cgi-bin/discus/discus.cgi

				