Assessing Fit in SEM

MK 9200 Fall 2007 Class 7



Assessing "Fit" in SEM



In regression, we evaluate a model by comparing Y with $\hat{Y}$.



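In SEM, we compare S with $\hat{\Sigma}$, using a discrepancy function, most commonly either

ML:

$$F = \log|\hat{\Sigma}| + \mathrm{tr}\left(S\hat{\Sigma}^{-1}\right) - \log|S| - p$$

or GLS:

$$F = \frac{1}{2}\,\mathrm{tr}\left[\left(I - S^{-1}\hat{\Sigma}\right)^{2}\right]$$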
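
As a rough illustration of the two discrepancy functions, the sketch below evaluates the ML and GLS forms for a pair of 2x2 matrices. The matrices `S` and `SIGMA_HAT` are hypothetical values chosen for the example, and the 2x2 helper functions are only a convenience so the sketch needs nothing beyond the standard library.

```python
import math

# All matrices are 2x2 nested lists; the numbers are hypothetical.

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d],
            [-m[1][0] / d, m[0][0] / d]]

def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace2(m):
    """Trace of a 2x2 matrix."""
    return m[0][0] + m[1][1]

def f_ml(s, sigma):
    """ML discrepancy: log|Sigma| + tr(S Sigma^-1) - log|S| - p, with p = 2."""
    p = 2
    return (math.log(det2(sigma)) + trace2(matmul2(s, inv2(sigma)))
            - math.log(det2(s)) - p)

def f_gls(s, sigma):
    """GLS discrepancy: 0.5 * tr[(I - S^-1 Sigma)^2]."""
    prod = matmul2(inv2(s), sigma)
    resid = [[(1.0 if i == j else 0.0) - prod[i][j] for j in range(2)]
             for i in range(2)]
    return 0.5 * trace2(matmul2(resid, resid))

S = [[1.0, 0.5], [0.5, 1.0]]          # hypothetical sample covariance matrix
SIGMA_HAT = [[1.0, 0.3], [0.3, 1.0]]  # hypothetical model-implied matrix

print("F_ML  =", f_ml(S, SIGMA_HAT))
print("F_GLS =", f_gls(S, SIGMA_HAT))
```

Both functions return zero when the implied matrix reproduces S exactly, and they generally give different positive values otherwise.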



When assumptions hold, all such discrepancy functions are asymptotically

equivalent to a weighted sum of squared discrepancies:

$$F = (s - \hat{\sigma})'\, W^{-1} (s - \hat{\sigma})$$



Under MVN, the elements of the matrix W should be particular functions of the

elements of S. LISREL computes both a “ML fit function” F and a “normal

theory weighted” F, and highlights the latter.





When assumptions fail, so does the asymptotic equivalence. In

particular, when the model is false, GLS and ML may compensate for one error

by imposing another. GLS may be more prone to this than ML.

Fit Indices: χ²

The original fit statistic was χ²:

under assumptions, (n - 1) * min(F) ~ χ²



H0: proposed model fits exactly in the population

HA: proposed model does not fit exactly



χ² near the model's DF = a large p-value = fail to reject the proposed model
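
A minimal sketch of the exact-fit test described above, assuming the minimized discrepancy value is already in hand. The inputs (`f_min = 0.10`, `n = 201`, `df = 12`) are hypothetical, and the chi-square tail probability is computed from the power series for the incomplete gamma function so that no statistics library is required.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x), via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-17 * total and n < 100000:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def chi2_sf(x, df):
    """Upper-tail probability of a central chi-square with df degrees of freedom."""
    return 1.0 - reg_lower_gamma(df / 2.0, x / 2.0)

def exact_fit_test(f_min, n, df):
    """Return (chi-square statistic, p-value) for H0: the model fits exactly."""
    chi2 = (n - 1) * f_min
    return chi2, chi2_sf(chi2, df)

# Hypothetical inputs: minimized discrepancy, sample size, model DF.
chi2, p_value = exact_fit_test(f_min=0.10, n=201, df=12)
print("chi-square =", chi2, "p-value =", p_value)
```

A large p-value here means failing to reject the proposed model; it says nothing about the free parameters or about equivalent alternative models.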





Note: 2 only tests restrictions in the model. It does not test:



1. free parameters;

2. conceptual identification of factors;

3. existence of alternate models that are as good as or better than yours.

Problems with the χ²



1. supporting assumptions rarely hold



Bentler has long supported adjusting χ² for nonnormality. This is an option in

LISREL, when the needed information is available.



2. interesting models do not fit exactly in the population



Long a consideration in EFA, where TLI and AIC originated, and where the goal

is to assess the “??” of a given model.



If fit is "approximate," the χ² fit statistic follows a "noncentral χ²" (or χ²′)

distribution, with DF as before but with "noncentrality parameter" λ, estimated as

$$\hat{\lambda} = \chi^2 - DF$$



Here λ represents the systematic error part of χ² while DF represents the random

(sampling) error part.





The λ estimate is a direct function of N, unless λ = 0.
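
To see why the λ estimate grows with sample size, the sketch below holds a hypothetical minimized discrepancy value fixed and varies n. It assumes the statistic is computed as (n - 1) * min(F), as on the earlier slide; the values `f_min = 0.2` and `df = 10` are made up for illustration.

```python
def lambda_hat(f_min, n, df):
    """Estimated noncentrality: chi-square minus DF, with chi-square = (n - 1) * F."""
    chi2 = (n - 1) * f_min
    return chi2 - df

# Same hypothetical misfit (f_min) and DF, increasing sample size.
for n in (101, 401, 1601):
    print(n, lambda_hat(f_min=0.2, n=n, df=10))
```

With a fixed amount of misfit per observation, quadrupling n - 1 quadruples the statistic, so the estimated noncentrality rises with N even though the model is no more wrong than before.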



This complicates fit assessment.

Early alternatives to χ²



χ² / DF ratio: a hopelessly flawed index. Inspired by the notion that the

variance of a central χ² = 2 * DF.



Goodness of Fit Index (GFI)

$$GFI = 1 - \frac{(s - \hat{\sigma})'\, W^{-1} (s - \hat{\sigma})}{s'\, W^{-1} s}$$

Adjusted Goodness of Fit Index (AGFI)

$$AGFI = 1 - \frac{p(p + 1)}{2\,DF}\,(1 - GFI)$$

Rule of thumb (R.O.T.) for GFI, AGFI: for a long time, .90 or higher

Chosen to be “soft.”

Known to be biased:

upward with N

downward with DF large relative to n.
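
The GFI and AGFI formulas can be sketched as follows. The vectors hold the nonredundant elements of S and of the implied matrix, and the weight matrix inverse is passed in by the caller. The identity weight used in the example is a placeholder assumption purely to keep the arithmetic visible; it is not the normal-theory weight matrix a program such as LISREL would use.

```python
def quad_form(v, w_inv):
    """Compute v' W^-1 v for a vector v and a square matrix w_inv."""
    k = len(v)
    return sum(v[i] * w_inv[i][j] * v[j] for i in range(k) for j in range(k))

def gfi(s, sigma_hat, w_inv):
    """GFI = 1 - (s - sigma)' W^-1 (s - sigma) / (s' W^-1 s)."""
    diff = [a - b for a, b in zip(s, sigma_hat)]
    return 1.0 - quad_form(diff, w_inv) / quad_form(s, w_inv)

def agfi(gfi_value, p, df):
    """AGFI = 1 - [p(p + 1) / (2 DF)] * (1 - GFI), p = number of observed variables."""
    return 1.0 - (p * (p + 1) / (2.0 * df)) * (1.0 - gfi_value)

# Hypothetical two-variable example: elements s11, s21, s22.
s = [1.0, 0.5, 1.0]
sigma_hat = [1.0, 0.3, 1.0]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # placeholder weight

print("GFI  =", gfi(s, sigma_hat, identity))
print("AGFI =", agfi(0.95, p=6, df=8))  # hypothetical GFI, p and DF
```

The AGFI penalty grows as DF shrinks relative to p(p + 1)/2, which is why heavily parameterized models lose more from the adjustment.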

Root Mean Square Residual

$$RMR = \left[\, 2 \sum_{i=1}^{p} \sum_{j=1}^{i} \frac{(s_{ij} - \hat{\sigma}_{ij})^2}{p(p + 1)} \right]^{1/2}$$

 

If fit is perfect, then residuals reflect only random sampling error and are small.

The value of RMR is not directly affected by n.



The “standardized RMR” or SRMR is this value rescaled to what it would be if

all observed variables had a variance of 1.



R.O.T.: standardized RMR ≤ .05
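
A short sketch of RMR and SRMR for full covariance matrices given as nested lists. The standardization shown divides each residual by the square root of the product of the two observed variances, which is one common convention and is assumed here; software packages differ in the details. The matrices are hypothetical.

```python
import math

def rmr(s, sigma_hat):
    """Root mean square residual over the lower triangle, diagonal included."""
    p = len(s)
    total = sum((s[i][j] - sigma_hat[i][j]) ** 2
                for i in range(p) for j in range(i + 1))
    return math.sqrt(2.0 * total / (p * (p + 1)))

def srmr(s, sigma_hat):
    """RMR after rescaling each residual by sqrt(s_ii * s_jj) (assumed convention)."""
    p = len(s)
    total = sum(((s[i][j] - sigma_hat[i][j]) / math.sqrt(s[i][i] * s[j][j])) ** 2
                for i in range(p) for j in range(i + 1))
    return math.sqrt(2.0 * total / (p * (p + 1)))

# Hypothetical 3-variable example with a single nonzero residual (element 2,1).
S = [[4.0, 1.2, 0.8],
     [1.2, 1.0, 0.3],
     [0.8, 0.3, 1.0]]
SIGMA_HAT = [[4.0, 1.0, 0.8],
             [1.0, 1.0, 0.3],
             [0.8, 0.3, 1.0]]

print("RMR  =", rmr(S, SIGMA_HAT))
print("SRMR =", srmr(S, SIGMA_HAT))
```

Because the first variable has variance 4, the raw residual of 0.2 shrinks to 0.1 after standardization, which is why RMR and SRMR differ in this example.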

Dealing with Indeterminacy in Model Evaluation:

Strategy 1: shift emphasis from absolute fit to relative fit



1A. Relative to a competing model:



χ² difference tests--

if models A and B are nested AND both fit well,

and DF_A > DF_B,

then χ²_A - χ²_B ~ χ² with DF = DF_A - DF_B

Significant χ² difference favors the less restricted model







There is a good analogy to regression, where we start with one model and its R²,

then add predictors and examine the improvement in R². Here, we start with a

model, examine χ², and then add paths and examine the improvement in χ².
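
The nested-model comparison can be sketched as below, assuming model A is the more restricted model (larger DF). The fit statistics are hypothetical, and the tail probability again comes from a series for the incomplete gamma function rather than from a statistics library.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x), via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-17 * total and n < 100000:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def chi2_difference_test(chi2_a, df_a, chi2_b, df_b):
    """Return (difference, DF difference, p-value); A must be nested in B, df_a > df_b."""
    if df_a <= df_b:
        raise ValueError("model A must be the more restricted model (df_a > df_b)")
    diff = chi2_a - chi2_b
    ddf = df_a - df_b
    p_value = 1.0 - reg_lower_gamma(ddf / 2.0, diff / 2.0)
    return diff, ddf, p_value

# Hypothetical results: A is the restricted model, B frees three more paths.
diff, ddf, p_value = chi2_difference_test(chi2_a=45.0, df_a=20, chi2_b=30.0, df_b=17)
print("difference =", diff, "DF =", ddf, "p-value =", p_value)
```

A small p-value indicates that the added paths improve fit by more than chance would explain, so the less restricted model B is favored.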

For non-nested models--





Akaike Information Criterion (AIC)

(one formula)

$$AIC = \chi^2 + 2t$$

where t is the number of free parameters. Favor the model with the smallest AIC.

The statistical distribution of AIC is unknown.





AIC (and related indices) accurately assess the ability of a model to cross-validate

in a new sample of the same size



However . . . different models cross-validate better at different sample sizes



These indices favor more saturated models as N rises, all other things equal.



Haughton, Oud and Jansen (1998) suggested that BIC*:

$$BIC^{*} = \chi^2 + \ln\left(\frac{n}{2\pi}\right) t$$



did a better job of picking the best model, under a range of conditions.
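
A small sketch comparing two non-nested candidates on AIC and BIC*. The BIC* penalty is written here as t * ln(n / 2π), which is an assumption about the form intended on the slide, and the fit results for both models are hypothetical.

```python
import math

def aic(chi2, t):
    """AIC in the chi-square form: chi-square plus twice the free parameters."""
    return chi2 + 2 * t

def bic_star(chi2, t, n):
    """BIC* with an assumed penalty of ln(n / (2 * pi)) per free parameter."""
    return chi2 + t * math.log(n / (2 * math.pi))

# Hypothetical fit results for two models estimated on the same n = 200 cases.
models = {"M1": {"chi2": 40.0, "t": 10}, "M2": {"chi2": 30.0, "t": 13}}
n = 200

for name, fit in models.items():
    print(name,
          "AIC =", aic(fit["chi2"], fit["t"]),
          "BIC* =", bic_star(fit["chi2"], fit["t"], n))
```

In this made-up case AIC prefers the larger model M2 while BIC* slightly prefers M1, because the per-parameter penalty under BIC* exceeds 2 once n is above roughly 47.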



More recently, Levy and Hancock (2007) have made a “supermodel” approach,

due to Vuong and to Golden, more accessible to applied researchers.

Strategy 1B. Evaluate fit relative to a worst case:



The "independence model"--all measures are uncorrelated



An older tool: Tucker-Lewis Index

$$TLI = \rho_2 = \frac{\chi^2_i / df_i - \chi^2_k / df_k}{\chi^2_i / df_i - 1}$$





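Bentler Comparative Fit Index (CFI)

$$CFI = 1 - \frac{\hat{\lambda}_k}{\hat{\lambda}_i}$$

where $\hat{\lambda}_i = \max(\lambda_i, \lambda_k, 0)$

$\hat{\lambda}_k = \max(\lambda_k, 0)$

with each λ estimated as χ² - DF, for the independence model (i) and the proposed model (k).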







R.O.T. (now out of date): .90 or higher**
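
Both relative-fit indices need only the χ² and DF of the independence model (i) and of the proposed model (k). A minimal sketch, with hypothetical fit statistics:

```python
def tli(chi2_i, df_i, chi2_k, df_k):
    """Tucker-Lewis Index from the chi-square / DF ratios of the two models."""
    ratio_i = chi2_i / df_i
    ratio_k = chi2_k / df_k
    return (ratio_i - ratio_k) / (ratio_i - 1.0)

def cfi(chi2_i, df_i, chi2_k, df_k):
    """Comparative Fit Index from truncated noncentrality estimates."""
    lam_k = max(chi2_k - df_k, 0.0)
    lam_i = max(chi2_i - df_i, chi2_k - df_k, 0.0)
    if lam_i == 0.0:
        return 1.0  # neither model shows misfit beyond its DF
    return 1.0 - lam_k / lam_i

# Hypothetical results: independence model, then the proposed model.
print("TLI =", tli(500.0, 15, 20.0, 8))
print("CFI =", cfi(500.0, 15, 20.0, 8))
```

The truncation at zero keeps CFI inside the 0 to 1 range, whereas TLI can fall outside it.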

Strategy 2: evaluate approximate fit, rather than exact fit



Root Mean Square Error of Approximation (RMSEA)

$$RMSEA = \sqrt{\frac{\hat{\lambda}_k}{(n - 1)\,DF}}$$

Assesses misfit per DF



Can test hypotheses / form confidence intervals



.00 -- .05 implies good approximate fit**

.10 or above indicates poor approximate fit



H0: RMSEA ≤ .05 -- "test of close fit"
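
The RMSEA point estimate follows directly from χ², DF and n. A minimal sketch with hypothetical inputs, truncating the noncentrality estimate at zero as in the CFI definition:

```python
import math

def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - DF, 0) / ((n - 1) * DF))."""
    lam_hat = max(chi2 - df, 0.0)
    return math.sqrt(lam_hat / ((n - 1) * df))

# Hypothetical model results.
print("RMSEA =", rmsea(chi2=20.0, df=8, n=201))
print("RMSEA =", rmsea(chi2=6.0, df=8, n=201))  # chi-square below DF gives 0
```

Dividing by DF is what makes this a per-DF measure of misfit; dividing by n - 1 removes the sample-size dependence of the noncentrality estimate.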

Hu and Bentler--currently the most authoritative word





Computed rejection rates for indices under various conditions of:

--type of misspecification (under-specified only)

--nonnormality



Finding: under conventional ROT, all indices rejected too few misspecified

models (Type II error).

Some indices rejected too many correct models (Type I error).



They asserted that SRMR is most sensitive to errors in structure, while CFI and

RMSEA are among the most sensitive to errors in loadings. More recent papers

effectively question this assertion.



Their “finding,” however, suggested a combinatorial rule



Hu and Bentler's recommendations (for ML):



SRMR ≤ .08 AND (CFI ≥ .95 OR RMSEA ≤ .06)



but with specific numbers dependent on n and normality



Also argue that RMSEA and TLI may over-reject correct models at low n.
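
The combinatorial rule can be written as a single boolean check. The sketch assumes the cutoffs as read from the slide (SRMR at most .08, together with either CFI at least .95 or RMSEA at most .06); as noted above, the appropriate numbers depend on n and normality, so the defaults are illustrative only.

```python
def passes_combined_rule(srmr, cfi, rmsea,
                         srmr_max=0.08, cfi_min=0.95, rmsea_max=0.06):
    """SRMR cutoff AND (CFI cutoff OR RMSEA cutoff); default cutoffs are assumptions."""
    return srmr <= srmr_max and (cfi >= cfi_min or rmsea <= rmsea_max)

# Hypothetical index values for three models.
print(passes_combined_rule(srmr=0.05, cfi=0.96, rmsea=0.07))  # CFI carries it
print(passes_combined_rule(srmr=0.05, cfi=0.93, rmsea=0.05))  # RMSEA carries it
print(passes_combined_rule(srmr=0.10, cfi=0.99, rmsea=0.02))  # fails on SRMR
```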

Power Analysis: MacCallum, Browne and Sugawara (1996)

$$RMSEA = \sqrt{\frac{\hat{\lambda}_k}{(N - 1)\,DF}}$$

RMSEA has a known distribution, tied to the χ²′ distribution.



So we can construct hypothesis tests around chosen values of RMSEA—though

those values remain arbitrary.



We can even swap H0 and Ha.



Exact fit (conventional χ²):

H0: RMSEA ≤ 0.00

Ha: RMSEA > 0.00



Close fit:

H0: RMSEA ≤ 0.05

Ha: RMSEA > 0.05



Not-close fit:

H0: RMSEA ≥ 0.05

Ha: RMSEA < 0.05



Now H0 is not our model, and we regain the social science convention of trying

to defeat H0 through precision and power.

Power: the probability of correctly rejecting a false H0.



Power is a function of both H0 (now presumed false) and a presumed true

value—if H0 is RMSEA ≤ 0.05 and the true value is RMSEA = 0.051, your

power will never be very high. So you must choose a reasonable and meaningful

alternative value.



MBS recommend these alternative values:

Test of close fit: RMSEA = 0.08

Test of not-close fit: RMSEA = 0.01



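Either way, power is computed by comparing the χ²′ distribution implied by

RMSEA_alt with χ²′_c, the critical value under H0 for a given significance level α.

Test of close fit:

$$\pi = \Pr\left(\chi^{2\prime}_{alt} > \chi^{2\prime}_{c}\right)$$

Test of not-close fit:

$$\pi = \Pr\left(\chi^{2\prime}_{alt} < \chi^{2\prime}_{c}\right)$$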





Each distribution is defined by its degrees of freedom (DF, which we know) and

its noncentrality parameter, equal to:

$$\lambda = (n - 1) \cdot DF \cdot RMSEA^2$$

So:

1. Plug in the H0 (your model) value for RMSEA and compute λ0.

2. Find the critical value, χ²′_c, corresponding to DF, λ0 and α.

3. Plug in the alternate value for RMSEA and compute λa.

4. Find, as appropriate:

$$\pi = \Pr\left(\chi^{2\prime}_{alt} > \chi^{2\prime}_{c}\right)$$

$$\pi = \Pr\left(\chi^{2\prime}_{alt} < \chi^{2\prime}_{c}\right)$$





Problem: may need to approximate this value, unless you have a tool like

Mathematica that provides values for noncentral χ².
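
The four steps can be sketched in code without special software by writing the noncentral χ² CDF as a Poisson-weighted sum of central χ² CDFs. This is one standard way to compute it, not necessarily the method used by MacCallum, Browne and Sugawara; the function names and the example inputs (DF = 20, α = .05) are assumptions for illustration. A library routine such as `scipy.stats.ncx2` could replace the hand-written CDF where available.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x), via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-17 * total and n < 100000:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def ncx2_cdf(x, df, lam):
    """Noncentral chi-square CDF as a Poisson mixture of central CDFs.

    Adequate for moderate lam; exp(-lam / 2) underflows for very large lam.
    """
    if x <= 0.0:
        return 0.0
    a, h, m = df / 2.0, x / 2.0, lam / 2.0
    p = reg_lower_gamma(a, h)                                 # P(a, h)
    t = math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))  # h^a e^-h / Gamma(a + 1)
    w = math.exp(-m)                                          # Poisson weight for j = 0
    total = w * p
    j = 0
    while j < 100000:
        j += 1
        p = max(p - t, 0.0)   # P(a + j, h) = P(a + j - 1, h) - previous t
        t *= h / (a + j)
        w *= m / j
        total += w * p
        if j > m and w * p < 1e-16:
            break
    return total

def ncx2_quantile(prob, df, lam):
    """Critical value by bisection on the CDF."""
    lo, hi = 0.0, df + lam + 10.0
    while ncx2_cdf(hi, df, lam) < prob:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if ncx2_cdf(mid, df, lam) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rmsea_power(n, df, rmsea0, rmsea_alt, alpha=0.05):
    """Power of the RMSEA-based test of H0 (rmsea0) against rmsea_alt."""
    lam0 = (n - 1) * df * rmsea0 ** 2       # step 1
    lam_a = (n - 1) * df * rmsea_alt ** 2   # step 3
    if rmsea_alt > rmsea0:
        # Exact or close fit: reject H0 for large values (upper tail).
        crit = ncx2_quantile(1.0 - alpha, df, lam0)   # step 2
        return 1.0 - ncx2_cdf(crit, df, lam_a)        # step 4
    # Not-close fit: reject H0 for small values (lower tail).
    crit = ncx2_quantile(alpha, df, lam0)
    return ncx2_cdf(crit, df, lam_a)

for n in (100, 200, 400):
    print(n,
          "close fit:", rmsea_power(n, 20, 0.05, 0.08),
          "not-close fit:", rmsea_power(n, 20, 0.05, 0.01))
```

The branch on whether the alternative lies above or below the H0 value is what distinguishes the upper-tail test (exact or close fit) from the lower-tail test (not-close fit).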

Selecting Necessary Sample Size to Achieve Power



Even given the two values for RMSEA, DF, α, and desired power π, it is not

possible to directly compute n_min, the minimum sample size.



n_min must be found iteratively, by successive approximation.
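
One way to carry out the successive approximation is to bracket the answer by doubling n and then bisect on integers. The sketch below assumes power increases with n and that power at n = 2 is below the target; the noncentral χ² CDF is the same Poisson-mixture approach, and all names and example inputs are illustrative rather than taken from the source.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x), via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-17 * total and n < 100000:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def ncx2_cdf(x, df, lam):
    """Noncentral chi-square CDF as a Poisson mixture (moderate lam only)."""
    if x <= 0.0:
        return 0.0
    a, h, m = df / 2.0, x / 2.0, lam / 2.0
    p = reg_lower_gamma(a, h)
    t = math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
    w = math.exp(-m)
    total = w * p
    j = 0
    while j < 100000:
        j += 1
        p = max(p - t, 0.0)   # step the central CDF from a + j - 1 to a + j
        t *= h / (a + j)
        w *= m / j
        total += w * p
        if j > m and w * p < 1e-16:
            break
    return total

def ncx2_quantile(prob, df, lam):
    """Critical value by bisection on the CDF."""
    lo, hi = 0.0, df + lam + 10.0
    while ncx2_cdf(hi, df, lam) < prob:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if ncx2_cdf(mid, df, lam) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rmsea_power(n, df, rmsea0, rmsea_alt, alpha=0.05):
    """Power of the RMSEA-based test of H0 (rmsea0) against rmsea_alt."""
    lam0 = (n - 1) * df * rmsea0 ** 2
    lam_a = (n - 1) * df * rmsea_alt ** 2
    if rmsea_alt > rmsea0:   # upper-tail test (exact or close fit)
        crit = ncx2_quantile(1.0 - alpha, df, lam0)
        return 1.0 - ncx2_cdf(crit, df, lam_a)
    crit = ncx2_quantile(alpha, df, lam0)   # lower-tail test (not-close fit)
    return ncx2_cdf(crit, df, lam_a)

def min_sample_size(df, rmsea0, rmsea_alt, alpha=0.05, target=0.80, n_cap=100000):
    """Smallest n reaching the target power, assuming power rises with n."""
    lo, hi = 2, 4
    while rmsea_power(hi, df, rmsea0, rmsea_alt, alpha) < target:
        lo, hi = hi, hi * 2
        if hi > n_cap:
            raise ValueError("target power not reached below n_cap")
    while hi - lo > 1:   # invariant: power(lo) < target <= power(hi)
        mid = (lo + hi) // 2
        if rmsea_power(mid, df, rmsea0, rmsea_alt, alpha) >= target:
            hi = mid
        else:
            lo = mid
    return hi

for df in (20, 50):
    print("DF =", df, "n_min (close fit) =", min_sample_size(df, 0.05, 0.08))
```

Running the search for several DF values shows the pattern in the lessons that follow: the required n falls as DF rises.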



Lessons:



1. power is a function of both DF and n.

2. when DF are low, n must be very high to obtain reasonable power.

3. low DF = few indicators per factor, so discarding indicators reduces power.

4. with high DF, very low n might produce sufficient power, but still need n

large enough for asymptotic properties to hold.



Keep in mind, power is also affected by the strength of relationships, with

stronger relations producing more power.

