VIEWS: 5 PAGES: 13 POSTED ON: 5/14/2010
Biometrics 60, 624–636 September 2004 Maximum Likelihood Analysis of a General Latent Variable Model with Hierarchically Mixed Data Sik-Yum Lee1,∗ and Xin-Yuan Song1,2∗∗ 1 Department of Statistics, The Chinese University of Hong Kong, Hong Kong 2 Sun Yat-Sen University, Guangzhou, China ∗ email: sylee@sparc2.sta.cuhk.edu.hk ∗∗ email: xysong@u4000.sta.cuhk.edu.hk Summary. A general two-level latent variable model is developed to provide a comprehensive framework for model comparison of various submodels. Nonlinear relationships among the latent variables in the struc- tural equations at both levels, as well as the eﬀects of ﬁxed covariates in the measurement and structural equations at both levels, can be analyzed within the framework. Moreover, the methodology can be applied to hierarchically mixed continuous, dichotomous, and polytomous data. A Monte Carlo EM algorithm is implemented to produce the maximum likelihood estimate. The E-step is completed by approximating the conditional expectations through observations that are simulated by Markov chain Monte Carlo methods, while the M-step is completed by conditional maximization. A procedure is proposed for computing the complicated observed-data log likelihood and the BIC for model comparison. The methods are illustrated by using a real data set. Key words: Covariates; Dichotomous and polytomous data; MCEM algorithm; Model comparison; Nonlinear structural equations; Path sampling; Two-level latent variable model. 1. Introduction As the previously mentioned software packages are devel- Latent variable methods are widely appreciated tools in med- oped on the basis of standard models and data structures, ical and psychosocial research. The development and analysis they cannot be applied to many complex real situations. of latent variable models for investigating relationships be- Hence, generalizations in diﬀerent directions have recently tween observed variables and latent variables have received been developed. Sammel and Ryan (1996) proposed an ex- a great deal of attention. One approach (see Sammel, Ryan, ploratory factor analysis (EFA) model with ﬁxed covariates and Legler, 1997; Dunson, 2000; among others) focused on and continuous data. Early work on EFA with dichotomous analyzing the contribution of the latent variables to the mean data falls into two directions: those which estimate the corre- of the observed variables, as well as the eﬀects of ﬁxed co- lations from the dichotomous data (e.g., Christoﬀerson, 1975) variates on the observed and/or latent variables, based on an and those which ﬁt latent probit models (e.g., Bock and exponential family of distributions. Structural equation mod- Aitkin, 1981). Recently, Meng and Schilling (1996) developed els (SEMs; Bollen, 1989; among others) focused on exploring a Monte Carlo EM (MCEM; Wei and Tanner, 1990) algo- latent factors that cause the behavior of observed variables, rithm for this model. Shi and Lee (2000) used a similar al- and investigated the relationships of latent variables among gorithm for a conﬁrmatory factor analysis (CFA) model with themselves and with the observed variables. At present, there mixed continuous and polytomous data. Recently, much at- are more than a dozen standard packages for ﬁtting SEMs, tention has been devoted to nonlinear SEMs (see Schumacker such as LISREL VIII (J¨reskog and S¨rbom, 1996) and EQS o o and Marcoulides, 1998; Wall and Amemiya, 2000; Lee and (Bentler, 1995). SEMs have been widely used in medical re- Zhu, 2002; and references therein). For hierarchically struc- search though not to the extent that they have been used in tured data that violate the assumption of independence, Lee social and psychological research. In addition to those cited and Shi (2001) utilized MCEM to ﬁt factor analysis models by Bentler and Stein (1992), SEMs have recently been ap- with mixed continuous and polytomous data. Rabe-Hesketh, plied to investigate the behavior of twins in genetics (Dolan, Skrondal, and Pickles (2003) provided a multilevel SEM with Molenaar, and Boomsma, 1991) to analyze quality-of-life data software, GLLAM, that is more general in several ways than (Wang et al., 2002) and to analyze ecological and evolution- our proposed model; however, they did not consider nonlin- ary biology data (see Pugesek, Tomer, and von Eye, 2003, and ear variable terms. The above-mentioned contributions em- references therein among others). In this article, we concen- phasize estimation; overall few model comparison results have trate on the maximum likelihood (ML) analysis of a general been published. The main reason for this is probably the diﬃ- SEM. culties in evaluating the intractable observed-data likelihood. 624 Latent Variable Model with Hierarchically Mixed Data 625 In this article we investigate a general two-level SEM that are the observed continuous random vectors, while wgi = can accommodate the nonlinearities of the latent variables in (wgi1 , . . . , wgis ) and ygi = (ygi1 , . . . , ygit ) are unobservable the structural equations at both levels of the model, the ef- continuous random vectors such that wgi and ygi correspond fects of ﬁxed covariates on the measurement and structural to the dichotomous and polytomous observations, respec- equations at both levels, and hierarchical, mixed continuous, tively. The observable dichotomous vector d = (d1 , . . . , ds ) dichotomous, and polytomous data. In addition to the ML es- is related to its underlying vector w = (w1 , . . . , ws ) by timation, we propose an approach for computing the compli- cated observed-data log likelihood of the posited SEM. Conse- dj = 1 if wj > 0; and 0 otherwise, j = 1, . . . , s. (1) quently, statistics such as the likelihood ratio or the Bayesian information criterion (BIC) can be evaluated for model com- parison under a comprehensive framework. The observable polytomous vector z = (z 1 , . . . , zt ) is related The present work is motivated by a study (Morisky et al., to y = (y 1 , . . . , yt ) by 1998) of the eﬀects of establishments’ policies, knowledge, and attitudes on condom use among Filipino commercial sex zj = h if αjh < yj ≤ αjh+1 , for h = 0, . . . , kj − 1; workers (CSWs). The primary concern is ﬁnding an AIDS- j = 1, . . . , t, (2) preventative intervention for Filipino CSWs. The data set was collected from female CSWs in 95 establishments (bars, night clubs, Karaoke TV, and massage parlors) in Philippine where {−∞ = αj0 < αj1 < · · · < αjkj = ∞} is the set of thresh- cities. The questionnaire included a large number of items, olds that deﬁnes the categories. In the CSW example out- covering the areas of knowledge, attitudes, beliefs, behavior, lined in the Introduction, the dichotomous variables, dj , cor- self-eﬃcacy for condom use, etc. Latent psychological determi- respond to three (s = 3) yes/no questions regarding condom nants such as CSWs’ risk behavior, knowledge, and attitudes usage while the polytomous variables, zj , relate to six (t = 6) associated with AIDS and condom use are important issues Likert scale questions regarding worry about AIDS and atti- to be assessed, and will be used for modifying the planned tude toward condom use. At the ﬁrst level, we assume that, intervention. The basic goal of our analysis is to establish a conditional on the group mean vg , random observations in well-ﬁtting SEM to study possible causal eﬀects on the im- each group have the following structure: portant latent variable “usage of condom, η.” In particular, taking the latent variables “worry about AIDS, ξ 1 ” and “at- ugi = vg + A1g cugi + Λ1g ω 1gi + ε1gi , titude toward condom use, ξ 2 ” as exogenous variables in the g = 1, . . . , G, i = 1, . . . , Ng , (3) structural equation with η as the endogenous variable, our aim is to investigate whether the incorporation of ﬁxed covariates and/or nonlinear terms of ξ 1 and ξ 2 in the structural equa- where cugi is a vector of ﬁxed covariates, A1g is a matrix of tion will give a better model than that with a linear structural coeﬃcients, Λ1g is a matrix of factor loadings, ω 1gi is a q 1 × 1 equation without covariates. As emphasized by Morisky et al. vector of latent variables, and ε1gi is a p × 1 vector of error (1998), establishments’ policies on condom use exert a strong measurements that is independent of ω 1gi and is distributed inﬂuence on CSWs, and hence it is necessary to develop a as N [0, Ψ1g ], where Ψ1g is a diagonal matrix. At the second two-level nonlinear SEM with ﬁxed covariates to take into ac- level, we assume that vg has the structure count the inﬂuence of establishments. Moreover, as manifest variables are measured in terms of dichotomous and polyto- vg = A2 cvg + Λ2 ω 2g + ε2g , g = 1, . . . , G, (4) mous variables, the discrete nature of the data set cannot be ignored. where cvg is a vector of ﬁxed covariates, A2 is a matrix of The proposed SEM is described in Section 2. Section 3 ad- coeﬃcients, Λ2 is a matrix of factor loadings, ω 2g is a q 2 × 1 dresses the ML estimation with an MCEM algorithm. A pro- vector of latent variables, and ε2g is a p × 1 vector of error cedure for model comparison is developed in Section 4, based measurements that is independent of ω 2g and is distributed on the key idea of path sampling (Gelman and Meng, 1998). as N [0, Ψ2 ], where Ψ2 is a diagonal matrix. In the CSW The newly developed methodology is illustrated with real ex- example, one ﬁxed covariate cvg that relates to the availabil- amples in Section 5. The article concludes with a summary ity of condoms in the establishment is incorporated in (4) discussion. (see Section 5). Moreover, the ﬁrst-level latent vectors are as- 2. A General Structural Equation Model sumed to be independent of the second-level latent vectors. However, as ugi and ugj are correlated, the usual independent The aim of this section is to introduce a two-level SEM with assumption on observations within the groups is avoided. It conditions for identiﬁcation for analyzing hierarchical, mixed follows from (3) and (4) that continuous, dichotomous, and polytomous data. To provide a comprehensive framework for model comparison, the pro- posed model includes the nonlinearities of the latent variables ugi = A2 cvg + Λ2 ω 2g + ε2g + A1g cugi + Λ1g ω 1gi + ε1gi . (5) and ﬁxed covariates at both levels. Suppose that there is a collection of p-variate random vec- The latent vectors ω 2g and ω 1gi are partitioned as ω 1gi = tors ugi , i = 1, . . . , Ng , within groups g = 1, . . . , G. The sample (η 1gi , ξ1gi ) and ω 2g = (η 2g , ξ 2g ) , respectively, where sizes Ng may diﬀer from group to group so that the data set η 1gi (q 11 × 1), ξ 1gi (q 12 × 1), η 2g (q 21 × 1), and ξ 2g (q 22 × 1) could be unbalanced. We suppose without the loss of general- are latent subvectors. To account for the nonlinearity among ity that ugi = (xgi , wgi , ygi ) , where xgi = (xgi1 , . . . , xgir ) latent variables and the eﬀects of ﬁxed covariates on the latent 626 Biometrics, September 2004 variables, the following nonlinear structural equations in the factor analysis model with continuous and polytomous vari- between-groups and within-groups models are considered: ables (Lee and Shi, 2001), but also a generalization of the multivariate probit model (e.g., Chib and Greenberg, 1998). η 1gi = B1g c1gi + Π1g η 1gi + Γ1g F1 (ξ 1gi ) + δ 1gi , (6) In most substantive applications of two-level models, it is and assumed that for g = 1, . . . , G, A1g = A1 , Λ1g = Λ1 , Ψ1g = Ψ1 , B1g = B1 , Π1g = Π1 , Γ1g = Γ1 , Φ1g = Φ1 , and Ψ1δg = Ψ1δ . η 2g = B2 c2g + Π2 η 2g + Γ2 F2 (ξ 2g ) + δ 2g , (7) For brevity, this assumption will be taken in the development where c1gi and c2g are ﬁxed covariates, δ 1gi and δ 2g are ran- of the methods in subsequent sections. However, generaliza- dom vectors of error measurements, B1g , B2 , Π1g , Π2 , Γ1g , tions to situations without this assumption are straightfor- and Γ2 are matrices of coeﬃcients, F1 (ξ 1gi ) = (f 11 (ξ 1gi ), . . . , ward. A summary of the notation is given in the Appendix. f 1a (ξ 1gi )) and F2 (ξ 2g ) = (f 21 (ξ 2g ), . . . , f 2b (ξ 2g )) are vector- valued functions with nonzero diﬀerentiable functions f1k and 3. Maximum Likelihood Estimation f2k , and usually a ≥ q 12 and b ≥ q 22 . In the CSW example, two We will implement an MCEM type algorithm to obtain ﬁxed covariates (c1gi1 , c1gi2 ) in relation to the knowledge of the ML estimates. Let Xg = (xg1 , . . . , xgNg ), and X = AIDS and AIDS test are incorporated in (6) (see Section 5). (X1 , . . . , XG ) be the observed continuous data; let Dg = We assume that I1 − Π1g and I2 − Π2 are nonsingular, and (dg1 , . . . , dgNg ) and D = (D1 , . . . , DG ) be the observed di- that the determinants of these two matrices are independent chotomous data; and let Zg = (zg1 , . . . , zgNg ) and Z = of Π1g and Π2 , respectively. The latter assumption is true for (Z1 , . . . , ZG ) be the observed polytomous data. Let W all nonrecursive models such that Π1g and/or Π2 are upper = (W1 , . . . , WG ) and Y = (Y1 , . . . , YG ) be the matri- (or lower) triangular matrices. It can be relaxed at the ex- ces of latent continuous measurements associated with D pense of more computational eﬀort in completing the M-step and Z, respectively; and let Ω1g = (ω 1g1 , . . . , ω 1gNg ), Ω1 = of the MCEM algorithm (see Lee and Zhu, 2002). Following (Ω11 , . . . , Ω1G ), Ω2 = (ω 21 , . . . , ω 2G ), and V = (v1 , . . . , vg ) be the usual assumptions in structural equation models, ξ 1gi and matrices of latent vectors at the within-groups and between- δ 1gi are assumed to be independently distributed as N [0, Φ1g ] groups levels. Inspired by the key ideas of data augmen- and N [0, Ψ1δg ], respectively, where Ψ1δg is a diagonal matrix. tation and the EM algorithm, these latent quantities are Similarly, it is assumed that ξ 2g and δ 2g are independently dis- augmented to the observed data in the analysis. This aug- tributed as N [0, Φ2 ] and N [0, Ψ2δ ], respectively, where Ψ2δ mentation scheme gives a complete data set (W, Y, V, is a diagonal matrix. However, due to the nonlinear functions Ω1 , Ω2 , X, D, Z) that is easier to deal with. Let α be in F1 and F2 , the distribution of ugi is not assumed to be the parameter vector of unknown thresholds and θ be the normal. parameter vector that contains the unknown distinct ele- The proposed model is not identiﬁed without imposing fur- ments in A1 , Λ1 , Ψ1 , A2 , Λ2 , Ψ2 , B1 , Π1 , Γ1 , Φ1 , Ψ1δ , B2 , ther conditions. For example, in measurement equation (1), Π2 , Γ2 , Φ2 , and Ψ2 , and Lc (W, Y, V, Ω1 , Ω2 , X, D, Z | the variance and thresholds associated with each polytomous α, θ) be the complete-data log-likelihood function with ﬁxed variable are not identiﬁable (Shi and Lee, 2000); the vari- covariates. ML estimates of α and θ are obtained via the EM ance and the corresponding row of Λ1g that are associated algorithm, which consists of the following steps at the lth it- with each dichotomous variable are not identiﬁed (Meng and eration. E-step: Evaluate Q(α, θ | α(l) , θ (l) ) = E{Lc (W, Y, Schilling, 1996); and the between-groups and within-groups V, Ω1 , Ω2 , X, D, Z | α, θ) | X, D, Z, α(l) , θ (l) }, where the ex- covariance structures are not identiﬁed. In addition, we can- pectation is taken with respect to the conditional distribution not allow intercepts to simultaneously exist in the measure- of (W, Y, V, Ω1 , Ω2 ) given (X, D, Z) at (α(l) , θ (l) ). M-step: ment and structural equations of the ﬁrst and second levels Determine (α(l+1) , θ (l+1) ) by maximizing Q(α, θ | α(l) , θ (l) ). of the model (see [5]). In general, there are no necessary and As Lc decomposes into two separable functions each with suﬃcient conditions to guarantee model identiﬁcation. How- distinct sets of parameters, we can write the E-step as ever, the conditions that are suggested in the literature for Q(α, θ | α(l) , θ (l) ) identifying various components of the current model can be (l) adopted. For example, to identify the parameters of a dichoto- = E L1 (W, Y, V, Ω1 , X, D, Z | α, θ 1 ) | X, D, Z, α(l) , θ 1 mous variable, the corresponding error variance is ﬁxed at a (l) preassigned value (Meng and Schilling, 1996; Lee and Song, + E L2 (V, Ω2 , X, D, Z | θ 2 ) | X, D, Z, θ 2 ≡ Q1 + Q2 , 2003). To identify the variance and thresholds of each polyto- (8) mous variable, αj 1 and αjkj−1 are ﬁxed at appropriate preas- signed values (Shi and Lee, 2000). To identify the covariance where structures in the ﬁrst- and second-level models, the common G Ng 1 practice in structural equation modeling of ﬁxing some ap- L1 = − (p + q1 ) log(2π) + log |Ψ1 | propriate elements in Λ1g , Λ2 , Π1g , Γ1g , Π2 , and/or Γ2 at 2 g=1 i=1 preassigned values is adopted. These are necessary conditions for identiﬁcation. Of course, one may use other conditions + log |Φ1 | − 2 log |I − Π1 | + ξ 1gi Φ−1 ξ 1gi 1 for identiﬁcation, though we stress the issue should be ap- + (ugi − vg − A1 cugi − Λ1 ω 1gi ) Ψ−1 1 proached on a problem-by-problem basis. The current SEM deﬁned in (3)–(7), together with the × (ugi − vg − A1 cugi − Λ1 ω 1gi ) mixed data structure, provides a comprehensive framework + [η 1gi − B1 c1gi − Π1 η 1gi − Γ1 F1 (ξ 1gi )] Ψ−1 1δ for models with latent variables. It is not only a combination of the nonlinear SEM (Lee and Zhu, 2002) and the two-level × [η 1gi − B1 c1gi − Π1 η 1gi − Γ1 F1 (ξ 1gi )] Latent Variable Model with Hierarchically Mixed Data 627 G Ng s Then, any αjh that takes a value in the interval [y ∗ jh , j,m + log I(wgij ∈ Aj ) y ∗ jh + 1) satisﬁes (11). A natural choice is the midpoint: j,m ∗ ∗ (l ) g=1 i=1 j=1 ˜ 1 αjh = (yj,mjh +1 + yj,mjh )/2. In the M-step, we evaluate αjh ˜ (1) (T ) t for every Y(l1 ) , and αjh is updated by (αjh + · · · + αjh )/T ˜ ˜ + log I[ygij ∈ (αjzj , αjzj +1 )] using draws {Y(l1 ) , l1 = 1, . . . , T } obtained from the E-step. j=1 Next, we have to solve the following system of equations: ≡ L11 + L12 , (9) ∂Q(α, θ | α(l) , θ (l) )/∂θ = 0. As Lc can be decomposed into L1 and L2 with separable parameter vectors θ 1 and θ 2 , this system of equations simpliﬁes to the following two equations, G ∂Q1 /∂θ 1 = 0 and ∂Q2 /∂θ 2 = 0, which can be solved sep- 1 L2 = − (p + q2 ) log(2π) + log |Ψ2 | arately. Due to the nonlinear structural equations at both 2 g=1 levels, they cannot be solved in closed form as in Shi and Lee (2000) or Lee and Shi (2001). As there are many parame- + log |Φ2 | − 2 log |I − Π2 | + ξ 2g Φ−1 ξ 2g 2 ters involved in Q1 as well as in Q2 , the Newton–Raphson or + (vg − A2 cvg − Λ2 ω 2g ) Ψ−1 2 scoring algorithms are complicated to implement. However, a closed form solution can be achieved with little computational × (vg − A2 cvg − Λ2 ω 2g ) eﬀort by the method of conditional maximizations (Meng and + [η 2g − B2 c2g − Π2 η 2g − Γ2 F2 (ξ 2g )] Ψ−1 2δ Rubin, 1993). The key idea is to solve each of ∂Qi /∂θ i = 0, i = 1, 2 conditional on other parameters; and then the conditional × [η 2g − B2 c2g − Π2 η 2g − Γ2 F2 (ξ 2g )] , (10) expectations involved are approximated by the corresponding sample means of the generated observations that are obtained in which I(y ∈ A) is an indicator function that takes the value from the E-step. The related technical details are presented of one if y fall in A, and zero otherwise; Aj is an appropriate in part (III) of the website cited above. s-dimensional cell with its jth side of the form (−∞, 0) or The selection of the sample size, say T, in the Monte Carlo [0, ∞]; θ 1 and θ 2 are the parameter vectors that contain E-step is an issue in the MCEM algorithm. To decrease the the structural parameters at the within-groups and between- Monte Carlo error at the E-step, T should be large. As it is groups models, respectively. The conditional expectations ˆ (l) ineﬃcient to start with a large T when θ is far from the ML are approximated by a suﬃciently large number of observa- tions simulated from the corresponding conditional distribu- estimate, it has been suggested that T should be increased tions via a hybrid algorithm that combines the Gibbs sam- from one iteration to the next (Wei and Tanner, 1990; Booth pler (Geman and Geman, 1984) and the Metropolis–Hastings and Hobert, 1999). One method is to take T = T 1 + T 2 l, (MH) algorithm (Metropolis et al., 1953; Hastings, 1970). for some positive integers T1 and T2 . As T1 is not related to Here, observations are sampled interactively from the follow- l, its size does not have a signiﬁcant impact. The choice of ing conditional distributions: p(W | Y, V, Ω1 , Ω2 , X, D, Z, T2 depends on the complexity of the underlying problem, and α, θ), p(Y | W, V, Ω1 , Ω2 , X, D, Z, α, θ), p(V | W, Y, Ω1 , should be approached on a problem-by-problem basis. For the Ω2 , X, D, Z, α, θ), p(Ω1 | W, Y, V, Ω2 , X, D, Z, α, θ), and rather complicated SEM that is analyzed in Section 5, we ﬁnd p(Ω2 | W, Y, V, Ω1 , X, D, Z, α, θ). These conditional dis- that the choice of T 2 = 15 is acceptable. Convergence can ei- tributions and the implementation of the MH algorithm for ther be monitored by ratios of the observed-data likelihood simulating observations are given in part (II) of the website: values at consecutive iterations (Meng and Schilling, 1996), http://www.sta.cuhk.edu.hk/sylee/MLLVM-HMD. computed via bridge sampling (Meng and Wong, 1996), or The conditional distributions p(W|·) and p(Y|·) are pro- computed by absolute or relative error of the parameter es- portional to univariate truncated normal distributions, and timates. Alternatively, Shi and Copas (2002) proposed the it is much easier and faster to draw observations from them following scheme without iteratively increasing T. After the than from the multivariate truncated normal distributions. mth iteration, they computed The M-step updates unknown parameters by maximizing 1 the conditional expectations obtained in the E-step. From L1 , ¯ (l) θ = (θ (l−m+1) + · · · + θ (l) ), (12) m we see that the thresholds which correspond to the polyto- mous variables and structural parameters in θ 1 are separable. and monitored convergence via the stopping rule: For given Hence, thresholds can be updated by maximizing the condi- small values δ 1 and δ 2 (e.g., 0.001), the procedure is stopped tional expectation of L12 . Following the idea of Shi and Lee if (2000), or Lee and Shi (2001), for all g, i, and j, we ﬁnd thresh- olds that satisfy ¯ (l) ¯ (l−γ0 ) θ −θ (13) ¯ (l−γ0 ) + δ1 θ I[ygij ∈ (αjzj , αjzj +1 ]] = 1. (11) For the jth component of ygi , let {y ∗ : n = 1, . . . , N } be the j,n is smaller than some predetermined small value δ 2 . To avoid order statistics of {ygij : g = 1, . . . , G; i = 1, . . . , Ng }, where the danger of stopping early, a value of γ 0 larger than 2 is N = N 1 + . . . , NG . Moreover, let mjh be the total number of suggested. Convergence is claimed after the stopping rule is {zgij < h, g = 1, . . . , G; i = 1, . . . , Ng } for h = 0, . . . , kj − 1. ¯ (l) satisﬁed for several consecutive iterations; θ is then taken 628 Biometrics, September 2004 to be the ML estimate. Shi and Copas (2002) argued that above integral over t. Speciﬁcally, we select ﬁxed grids {t(k) ; for a suﬃciently large m, the average of the Monte Carlo er- k = 0, . . . , K} such that t(0) = 0 < t(1) < · · · < t(K+1) = 1, and rors in (12) is negligible. To reduce the bias, one should take compute λ by an appropriate m which can control the Monte Carlo errors K within a bearable limit. In the example given in Section 5, 1 ¯ ¯ λ= (t(k+1) − t(k) )(U(k+1) + U(k) ), (17) we take m = 50. One can use the method in Shi and Copas 2 k=1 (2002), or take T = T 1 + T 2 l, or use a combination of both J where U(k) = J −1 j=1 U (Hobs , Hmis , t(k) , β), in which {Hmis , ¯ (j) (j) as in our example. Finally, standard error estimates are pro- duced by approximating the Louis (1982) formulae at the ML j = 1, . . . , J} are simulated observations from p(Hmis | Hobs , estimates via a suﬃciently large number of observations that t(k) , β). It follows from (16) that log z(1) = λ + log z(0). are obtained at the E-step of the ﬁnal iteration, together with Hence, if we can link M1 with an M0 , chosen to have an newly generated observations, if necessary. easily calculated observed-data likelihood, we can obtain the observed-data likelihood under M1 . 4. Model Comparison Importance sampling and bridge sampling (see Meng and The main goal of this section is to establish a method for Wong, 1996) are alternatives for computing normalizing con- computing the observed-data likelihoods associated with spe- stants. Importance sampling approximates [z(1)/z(0)] directly ciﬁc models of interest, or their ratios, so that the BIC can by two points, namely 0 and 1. Bridge sampling utilizes a sen- be evaluated for model comparison (or hypothesis testing). sible choice of a bridge density at a point t ∗ in [0, 1]. It ﬁrst Our development utilizes the key idea of path sampling estimates [z(t ∗ )/z(0)] and [z(t ∗ )/z(1)], then takes the ratio to (Gelmen and Meng, 1998) for computation (ratios) of nor- cancel z(t ∗ ). Hence, its performance is expected to be better malizing constants of probability models. Using more generic than importance sampling (Meng and Wong, 1996). Gelman terms, let Hobs be the observed data and Hmis be the hypo- and Meng (1998) argued that path sampling may be better thetical missing data that contain the latent quantities. Let than bridge sampling because it is an extension of the bridge M1 and M0 be competitive models, such that M 0 ⊂ M 1 , and sampling from a bridge to a theoretically inﬁnite number of let β be the parameter vector of M1 . As p(Hmis | Hobs , β) = bridges. Another key advantage is that it is on logarithm scale, p(Hmis , Hobs | β)/p(Hobs | β), the observed-data likelihood can which is generally more stable. be treated as the normalizing constant of p(Hmis | Hobs , β) To apply this method to the current model, we let β = with the complete-data likelihood as the unnormalized den- (α, θ), Hobs = (X, D, Z), and Hmis = (W, Y, V, Ω1 , Ω2 ). sity. Consider the following class of densities with a continuous Hence, observations that are simulated in the E-step can be parameter t in [0,1]: ¯ used for computing Uk in (17). We can compare a sequence of hierarchically nested models or nonnested models. To give a more speciﬁc illustration, we apply the method to ﬁnd the z(t) = p(Hobs , Hmis | t, β) dHmis , (14) observed-data likelihood of the following model: M2 : ugi = µ + A2 cvg + Λ2 ω 2g + ε2g + A1 cugi + Λ1 ω 1gi + ε1gi where p(Hobs , Hmis | t, β) is a density function for each t, p(Hobs , Hmis | a, β) = p(Hobs , Hmis | Ma , β) with p(Hobs , η 1gi = B1 c1gi + Π1 η 1gi + Γ1 F1 (ξ 1gi ) + δ 1gi (18) Hmis | Ma , β) denotes the complete-data likelihood under model Ma , a = 0, 1. We have, η 2g = B2 c2g + Π2 η 2g + Γ2 F2 (ξ 2g ) + δ 2g . (19) To facilitate the computation, we make use of the following z(a) = p(Hobs , Hmis | Ma , β)dHmis intermediate model: M 1 : ugi = µ + ε2g + ε1gi . The linked model that corresponds to z(t) is deﬁned as M t21 : ugi = µ + = p(Hobs | Ma , β), a = 0, 1. tA2 cvg + tΛ2 ω 2g + ε2g + tA1 cugi + tΛ1 ω 1gi + ε1gi , together with the structural equations (18) and (19) only for t = 0. Taking the logarithm and then diﬀerentiating both sides of Hence, when t = 1, M t21 becomes M2 with the structural (14) with respect to t, it can be shown by similar reasoning equations for ω 1gi and ω 2g ; and when t = 0, M t21 becomes as in Gelman and Meng (1998) that M1 without the structural equations. An approximated value of λ21 that gives log {p(X, D, d log z(t) d log p(Hobs , Hmis | t, β) Z | M 2 , α, θ)/p(X, D, Z | M 1 , α, θ)} can be obtained with the = E∗ , (15) dt dt proposed method. However, the computation of the observed- data log likelihood that corresponds to M1 is not straightfor- where E∗ denotes the expectation with respect to the sampling ward because the covariance matrix of ugi under M1 is not di- distribution p(Hmis | Hobs , t, β). Integrating (15) from 0 to 1, agonal. Hence, we treat M1 as a speciﬁc latent variable model: we have ugi = µ + I ε2g + ε1gi by treating ε2g as a vector of latent vari- 1 ables with an identity loading matrix I. We then link it with z(1) the following simple model, the observed-data log likelihood λ = log = E∗ [U (Hobs , Hmis , t, β)] dt, (16) z(0) 0 of which can be easily computed: M 0 : ugi = µ + ε1gi . The linked model is deﬁned as M t10 : ugi = µ + tIε2g + ε1gi . An ap- where U (Hobs , Hmis , t, β) = d log p(Hobs , Hmis | t, β)/dt. An proximated value of λ10 that gives log {p(X, D, Z | M 1 , α, θ)/ estimate of λ can be obtained by numerical evaluation of the p(X, D, Z | M 0 , α, θ)} can be similarly obtained. As M1 and Latent Variable Model with Hierarchically Mixed Data 629 M0 are relatively simple models, the approximation of λ10 is 138, 257, and 229, while the cell frequencies of the polyto- expected to be good. Finally, mous variables ranged from 11 to 369. One ﬁxed covariate cvg that is associated with the question “Are condoms avail- log p(X, D, Z | M2 , α, θ) = λ21 + λ10 + log p(X, D, Z | M0 , α, θ). able at your establishment for the workers?” is incorporated (20) in the between-groups measurement equation, while two ﬁxed Observed-data likelihood that corresponds to a simpler covariates, c1gi 1 and c1gi 2 , which are associated with the ques- model can be similarly computed as a special case of the above tions “How much do you think you know the disease called analysis. For example, for the following single-level nonlinear AIDS?” and “Have you ever had an AIDS test?,” are incorpo- SEM in Lee and Zhu (2002), rated in the within-groups structural equation. The explicit link of the observed variables and covariates to their questions M3 : ugi = µ + Λ1 ω 1gi + ε1gi , are given in part (I) of the website mentioned in Section 3. According to the methods given in Shi and Lee (2000) and η 1gi = Π1 η 1gi + Γ1 F1 (ξ 1gi ) + δ 1gi , (21) Meng and Schilling (1996), in identifying the model, the mean we can link it with M0 deﬁned above via M t30 : ugi = µ + vector, the smallest and the largest thresholds that correspond tΛ1 ω 1gi + ε1gi , together with the structural equation only for to the polytomous variables, and the diagonal elements of t = 0 as before. The observed-data likelihood can be obtained Ψ1 that correspond to the dichotomous variables are ﬁxed. In similarly. Likewise, the observed-data likelihood that corre- this example, the mean vector that corresponds to the poly- sponds to the two-level factor analysis model (Lee and Shi, tomous variables is ﬁxed at 0, the thresholds αj 1 and αj4 , j = −1 2001) can be obtained. 1, . . . , 6, are ﬁxed at αjh = Φ∗ (ρjh ), where Φ∗ is the distribu- For any two competitive models M1 and M2 , we can use the tion function of N[0, 1], and ρjh are the observed cumulative above-mentioned method to compute their observed-data log marginal proportions of the categories with zj < h. The di- likelihoods, and then the following BIC12 : agonal elements ψ 11 , ψ 12 , and ψ 13 of Ψ1 are ﬁxed at 0.1. A two-level model with the following measurement equations is BIC12 = −2{log p(X, D, Z | M1 , α(1) , θ (1) ) considered: − log p(X, D, Z | M2 , α(2) , θ (2) )} M0 : ugi = vg + µ + Λ1 ω 1gi + ε1gi , + (d1 − d2 ) log(N1 + · · · + NG ), (22) vg = A2 cvg + Λ2 ω 2g + ε2g , (23) where (α(a) , θ (a) ) is the da × 1 parameter vector of Ma . The where µ = (µ1 , µ2 , µ3 , 0∗ , . . . , 0∗ ) is the mean vector and A2 = following criterion can be used for interpretation of BIC12 (see (a21 , . . . , a29 ). In this example, parameters with an asterisk Kass and Raftery, 1995): are ﬁxed for achieving an identiﬁed model. For the between- groups model, a factor analysis model with no structural equa- tion is considered. The speciﬁcations of this model are BIC12 <0 0 to 2 2 to 6 >6 λ2,11 λ2,21 λ2,31 0∗ 0∗ 0∗ 0∗ 0∗ 0∗ Support No conclusion Support Strongly M1 M2 support M2 Λ2 = 0∗ 0∗ 0∗ λ2,42 λ2,52 λ2,62 0∗ 0∗ 0∗ , 0∗ 0∗ 0∗ 0∗ 0∗ 0∗ λ2,73 λ2,83 λ2,93 1.0∗ sym Φ2 = φ2,21 1.0∗ , and 5. An Illustrative Example: The AIDS Study φ2,31 φ2,32 1.0∗ The proposed methodology is used to analyze the AIDS data set. Nine manifest variables, of which the ﬁrst three are di- Ψ2 = (ψ21 , ψ22 , ψ23 , ψ24 , ψ25 , ψ26 , ψ27 , ψ28 , ψ29 ), chotomous and the last six are polytomous with a ﬁve-point where Φ2 is the covariance matrix of ω 2g . Although other scale are selected as indicators for the latent variables “usage structures for Λ2 can be considered, we choose the above of condom, η,” “worry about AIDS, ξ 1 ,” and “attitude toward common nonoverlapping structure in factor analysis for clear condom use, ξ 2 .” The questions that correspond to the ﬁrst interpretations of the latent variables. For the within-groups three dichotomous questions are as follows. (1) Have you ever model, the factor loading matrix Λ1 = (λ1,kh ) takes the same used a condom? (2) Did you use a condom the last time you form as Λ2 , except that λ1,11 , λ1,42 , and λ1,73 are ﬁxed at 1.0 had sex? (3) Have you ever put a condom on a customer? The for achieving model identiﬁcation. Let (η 1gi , ξ 1gi1 , ξ 1gi2 ) be questions corresponding to the last six polytomous variables a partition of ω 1gi corresponding to the structure of Λ1 . To are as follows. (4) How much of a threat do you think AIDS investigate the eﬀects of the ﬁxed covariates and the interac- is to the health of people? (5) What are the chances that tion of ξ 1gi 1 and ξ 1gi 2 on η 1gi , the structural equation is taken you might get AIDS? (6) How worried are you about getting as AIDS? (7) Do you agree or disagree that condoms make sex less enjoyable? (8) Do you agree or disagree that condoms η1gi = b1 c1gi1 + b2 c1gi2 + γ1 ξ1gi1 + γ2 ξ1gi2 cause a man to lose his erection? (9) Do you agree or disagree + γ12 ξ1gi1 ξ1gi2 + δ1gi . (24) that condoms cause pain or discomfort? The sample size is 618, with individuals in the establishments varying from 1 to The unknown parameters in Φ1 and Ψ1 are {φ1,11 , φ1,12 , φ1,22 } 55. The frequencies of “0” of the dichotomous variables are and {ψ 14 , . . . , ψ 19 }, respectively. 630 Biometrics, September 2004 (a) 15 10 5 0 20 40 60 80 100 120 140 160 180 200 MCEM Iteration (b) 2 1.5 1 0.5 0 −0.5 −1 −1.5 −2 0 20 40 60 80 100 MCEM Iteration Figure 1. (a) Plots of the observed-data log-likelihood ratios computed via bridge sampling. (b) Average estimates of µ1 , a25 , λ1,21 , λ2,21 , b1 , b2 , γ 1 , γ 2 , γ 12 , φ1,12 , φ2,23 , ψ 17 , ψ 1δ , and α12 in M0 . (c) Chains of the ﬁrst 800 iterations of η 1g1 , ξ 1gi 1 , and ξ 1gi 2 at the 165th MCEM iteration. Latent Variable Model with Hierarchically Mixed Data 631 (c) 2 1 0 −1 −2 0 100 200 300 400 500 600 700 800 4 2 0 −2 0 100 200 300 400 500 600 700 800 2 1 0 −1 −2 0 100 200 300 400 500 600 700 800 Hybrid Algorithm Iteration Figure 1. (Continued ) Table 1 ML estimates and standard errors of parameters: AIDS data example First level Second level Par MLE SE Par MLE SE Par MLE SE Par MLE SE λ1,21 0.472 0.038 φ1,11 0.522 0.013 λ2,11 1.072 0.024 φ2,12 0.286 0.077 λ1,31 0.528 0.015 φ1,12 −0.053 0.018 λ2,21 0.495 0.009 φ2,13 0.386 0.064 λ1,52 0.379 0.086 φ1,22 0.095 0.012 λ2,31 0.609 0.041 φ2,23 0.518 0.049 λ1,62 0.277 0.064 µ1 0.478 0.042 λ2,42 0.226 0.024 a21 0.231 0.108 λ1,83 2.135 0.188 µ2 −0.154 0.014 λ2,52 0.189 0.119 a22 0.415 0.066 λ1,93 2.039 0.054 µ3 −0.103 0.015 λ2,62 0.549 0.006 a23 0.459 0.028 γ1 0.086 0.077 α12 −1.007 λ2,73 0.433 0.026 a24 −0.304 0.050 γ2 0.406 0.202 α13 −0.621 λ2,83 0.253 0.009 a25 0.789 0.247 γ 12 −0.430 0.223 α22 −0.190 λ2,93 0.296 0.065 a26 −0.033 0.116 ψ 14 0.392 0.026 α23 0.220 ψ 21 0.005 0.001 a27 0.059 0.071 ψ 15 0.567 0.057 α32 −0.969 ψ 22 0.002 0.001 a28 −0.063 0.048 ψ 16 0.729 0.056 α33 −0.567 ψ 23 0.005 0.003 a29 −0.189 0.132 ψ 17 0.740 0.057 α42 0.527 ψ 24 0.011 0.007 ψ 18 0.520 0.076 α43 0.749 ψ 25 0.384 0.117 ψ 19 0.499 0.048 α52 −0.007 ψ 26 0.008 0.005 ψ 1δ 0.839 0.034 α53 0.246 ψ 27 0.012 0.008 b1 −0.041 0.041 α62 0.109 ψ 28 0.010 0.003 b2 0.197 0.062 α63 0.254 ψ 29 0.042 0.019 632 Biometrics, September 2004 (a) 5 4 3 2 1 Residual 0 −1 −2 −3 −4 −5 0 100 200 300 400 500 600 Case Number (b) 1 0.8 0.6 0.4 0.2 Residual 0 −0.2 −0.4 −0.6 −0.8 −1 0 100 200 300 400 500 600 Case Number Figure 2. ˆ ˆ ˆ (a) Plots of residuals δ1gi . (b) Plots of residuals ε1gi (1). (c) Plots of residuals ε2g (1). Latent Variable Model with Hierarchically Mixed Data 633 (c) 1 0.8 0.6 0.4 0.2 Residual 0 −0.2 −0.4 −0.6 −0.8 −1 0 10 20 30 40 50 60 70 80 90 Group Number Figure 2. (Continued ) The model M0 deﬁned by (23) and (24) is compared with Figure 1a and 1b. We observe that the plots are stabilized at the following models: around 100 MCEM iterations. Based on the above stopping rule, the algorithm stops at the 165th MCEM iteration; and M1 : ugi = vg + µ + Λ1 ω 1gi + ε1gi , ¯ (165) is taken as the ML estimate. We detect that the chains θ vg = A2 cvg + Λ2 ω 2g + ε2g , which are generated by the hybrid algorithm at each E-step mix quickly at around 100 iterations. To reveal the behavior, η1gi = b1 c1gi1 + b2 c1gi2 + γ1 ξ1gi1 + γ2 ξ1gi2 + δ1gi , the ﬁrst 800 iterations in the chains of η 1g1 , ξ 1g11 , and ξ 1g12 M2 : ugi = vg + µ + Λ1 ω 1gi + ε1gi , at the 165th iteration are presented in Figure 1c. In the path sampling procedure, we take K = 20 and J = 2000. The es- vg = A2 cvg + Λ2 ω 2g + ε2g , timated observed-data log likelihoods that correspond to M0 , η1gi = γ1 ξ1gi1 + γ2 ξ1gi2 + γ12 ξ1gi1 ξ1gi2 + δ1gi , M1 , M2 , and M3 are equal to −6172.77, −6185.07, −6196.76, and −6211.29, respectively. It follows from (22) that BIC10 = M3 : ugi = vg + µ + Λ1 ω 1gi + ε1gi , 18.18, BIC20 = 35.13, and BIC30 = 19.20. According to the vg = Λ2 ω 2g + ε2g , interpretation of BIC, these results indicate that M0 is ob- viously better than M1 , M2 , and M3 . The ML estimates and η1gi = b1 c1gi1 + b2 c1gi2 + γ1 ξ1gi1 + γ2 ξ1gi2 their standard error estimates (SEs, which are computed via + γ12 ξ1gi1 ξ1gi2 + δ1gi . 5000 simulated observations) of the selected model M0 are presented in Table 1. Note that as the complete-data log like- In the MCEM algorithm for computing the ML estimates un- lihood is not diﬀerentiable with respect to the thresholds, for der the competing models, we generate T = 50 + 15l obser- simplicity only the SEs of other parameters are presented. vations to approximate the conditional expectations at the Inspired by Guo and Thompson (1994), we use the last 150 E-step of the lth iteration. Convergence can be monitored by MCEM iterations to provide estimates of the Monte Carlo ratios of consecutive observed-data likelihoods computed via (MC) standard errors in the MCEM process. Using these it- ¯ (l) bridge sampling. To crossvalidate the results, θ is computed erations, blocked into 15 batches of size 10, standard errors via (12) with m = 50, and convergence can also be monitored for the average of these 150 values of estimates in θ are ob- via the method of Shi and Copas (2002) by using (13) with tained. We observe that these MC standard errors are very γ 0 = 3, δ 1 = δ 2 = 0.001. Convergence is claimed if the stopping small. For example, the corresponding MC standard errors rule is satisﬁed for ﬁve consecutive iterations. To reveal con- of γ 1 , γ 2 , and γ 12 are 0.012, 0.0005, and 0.002, respectively. vergence in analyzing M0 , plots of likelihood ratios and some ¯ ¯ We also compute the MC standard errors of U(1) , U(10) , and ¯ arbitrary elements of θ (l) against iterations are displayed in U¯(20) in estimating λ10 (see equation [20]) based on 20 batches 634 Biometrics, September 2004 each with a subsample size of 100. The MC standard errors ates is proposed for the CSW level, which involves a nonlinear are equal to 0.704, 0.678, and 0.733, respectively. Compared to structural equation for accounting for the causal eﬀects of their means (171.91, 172.03, and 26.46), these errors are small. the latent variables ξ 1gi 1 and ξ 1gi 2 and the ﬁxed covariates To address the sensitivity of T and the number of MCEM it- c1gi 1 and c1gi 2 on η. Based on the nonoverlapping structures erations to the choices, ML estimates are obtained by using of Λ1 and Λ2 , and the meanings of the questions associated ¯ (215) and by using a ﬁxed T = 200 with estimates taken at θ with the indicators, the latent variables in ω 1gi and ω 2g can the 200th iteration. Almost all absolute diﬀerences of the vari- be roughly interpreted as “usage of condom,” “worry about ous estimates are less than 0.05. Moreover, the observed-data AIDS,” and “attitude toward condom use” of the CSWs in log likelihoods that correspond to M0 , M1 , M2 , and M3 are relation to their individual level and to the policies of the computed with K = 30. The corresponding values are equal ˆ ˆ ˆ establishments. From φ1,11 , φ1,12 , and φ1,22 in Table 1, the cor- to −6176.75, −6193.97, −6202.65, and −6217.35, respectively. relation estimate of “worry about AIDS” and “attitude to- Finally, the observed-data log likelihoods are also computed ward condom use” at the within-groups level is −0.238. It is with K = 20 and J = 3000 with 1000 burn-in iterations. interesting to note that this estimate is completely diﬀerent The corresponding values are −6171.72, −6186.09, −6197.28, from the corresponding correlation estimate at the between- and −6207.16, respectively. Hence, we obtain close estimates ˆ groups level (φ2,23 = 0.518). This may indicate a diﬀerence of observed-data log likelihoods under diﬀerent settings, and between these latent variables at the CSW and establishment the same conclusion as before. levels. From the estimates (and SEs) of the elements in A2 , A standard approach to check the ﬁt of the selected model we know that the availability of condoms in establishments is to evaluate GOF = (ˆk − gk )2 /ˆk , where the summa- e e has a signiﬁcant eﬀect on most indicators for CSWs’ “us- tion is taken over the number of distinct response patterns age of condom” and “worry about AIDS.” This may indicate n0 (≤min{618, 23 × 56 } in this example), gk is the observed that an establishment’s policy has some inﬂuence on promot- frequency, and ek is probability of the response pattern k. ˆ ing safer sexual practice and attitude toward AIDS. From Theoretically, path sampling can be applied to compute ek ˆ the magnitudes of ˆ1 (=−0.041), ˆ2 (=0.197), and γ1 (=0.086), b b ˆ by treating it as an individual likelihood. However, as n 0 is it seems that “usage of condom” is not greatly aﬀected by usually very large (483 in this example), it is impractical CSWs’ knowledge and “worry about AIDS.” Rather, as we to evaluate GOF. Moreover, for the current complex model ˆ can see from γ2 (=0.406), they tend to use condoms more if and data structure, the sampling distribution of GOF cannot they feel condoms have no bad eﬀects on their sexual activi- be routinely derived. Inspired by model checking techniques ˆ ties. From γ12 (=−0.430), “worry about AIDS” and “attitude in regression, we use the following estimated residuals to toward condom use” have a negative interaction on “usage of reveal the goodness-of-ﬁt of M0 to the empirical data with condom.” The basic interpretation is that the additive eﬀect the latent structure: δ1gi = η1gi − ˆ1 c1gi1 − ˆ2 c1gi2 − γ1 ξ1gi1 − ˆ ˆ b b ˆ ˆ of the ﬁxed covariates (c1gi 1 and c1gi 2 ) and the linear random ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ γ2 ξ1gi2 − γ12 ξ1gi1 ξ1gi2 , ε2g = vg − A2 cvg − Λ2 ω 2g , ε1gi = ugi − ˆ ˆ ˆ variables (“worry about AIDS,” ξ 1gi 1 ; “attitude toward con- ˆ ˆ ˆ vg − µ − Λ1 ω 1gi , where estimates of the latent vectors (vg , ˆ dom use,” ξ 1gi 2 ) in the structural equation are inadequate to ω 1gi , and ω 2g ) and latent quantities (wgi and ygi ) are account for their relationships with the latent variable “us- obtained from the observations that are simulated by the age of condom, η,” and a negative interaction term of ξ 1gi 1 hybrid algorithm in the E-step of the ﬁnal MCEM iterations. and ξ 1gi 2 has to be added (see equation [24]). Depending on ˆ Plots of δ1gi , ε1gi (1), and ε2g (1) are displayed in Figure 2. ˆ ˆ diﬀerent situations, this negative interaction term has various The residual plots that correspond to other components of impacts. For example, it indicates that less worry about AIDS ˆ ˆ ε1gi and ε2g are similar. These plots lie within two parallel and regarding condoms as having no bad eﬀects on sexual ac- horizontal lines that are centered at zero, and no linear or tivities would have a positive eﬀect on condom use; and more quadratic trends are detected. This roughly indicates that worry about AIDS and regarding condoms as having bad ef- the proposed measurement and structural equations at both fects on sexual activities would also have a positive eﬀect on levels are adequate. condom use. As “worry about AIDS” and “attitude toward Comparing M0 with M1 , we conclude that a two-level model condom use” are negatively correlated, CSWs generally would with an interaction term of the exogenous latent variables in have the above behaviors. To give one more interpretation of the ﬁrst-level structural equation is signiﬁcantly better than a the negative interaction, it also reveals that more worry about simple model with a linear structural equation. By comparing AIDS and regarding condoms having no bad sexual eﬀects M0 with M2 and M3 , we conclude that the model with ﬁxed co- would have a too strong additive eﬀect on condom use, and variates c1gi 1 and c1gi 2 in the ﬁrst-level structural equation and needs to be reduced by a negative interaction eﬀect (however, the ﬁxed covariate cvg in the second-level measurement equa- as ξ 1gi 1 and ξ 1gi 2 are negatively correlated, this situation is not tion is signiﬁcantly better. To draw these conclusions, it is common). necessary to develop a comprehensive model that can accom- In conclusion, in order to obtain more meaningful and modate the ﬁxed covariates and nonlinearities of the latent valid statistical inferences of the complicated data set, we ap- variables in the analysis. It is not our intention to claim that ply a two-level SEM which includes a between-group factor M0 is the best model for this example; competing models can analysis model with ﬁxed covariates, and a complex within- be considered by applying the proposed method, if necessary. group SEM that involves a nonlinear structural equation with In the selected model, a factor analysis model with three ﬁxed covariates, linear and interaction terms of latent vari- correlated latent variables and a ﬁxed covariate is proposed for ables. Our program is written in C and is available upon the establishment level. A nonlinear SEM with ﬁxed covari- request. Latent Variable Model with Hierarchically Mixed Data 635 6. Discussion e e fets de covariables ﬁx´es dans les ´quations de mesure et Despite its generality, the proposed two-level SEM is deﬁned e les ´quations structurelles aux deux niveaux. De plus la by measurement and structural equations that describe the e e e a e e m´thodologie peut ˆtre appliqu´e ` des donn´es hi´rarchiques e mixtes continues, binaires et polytomiques. On impl´mente relationships among the observed and latent variables at both un algorithme EM Monte Carlo pour obtenir l’estimateur levels by conceptually simple regression models. Compared to e du maximum de vraisemblance. L’´tape E est accomplie par the other approach that links latent variables with a general- e l’approximation des esp´rances conditionnelles au travers des ized linear model, the measurement and structural equations e e e observations simul´es par m´thode MCMC, tandis que l’´tape are directly connected with the Gaussian distribution, and M est accomplie par une maximisation conditionnelle. On hence lead to a more straightforward interpretation of the e propose une m´thode de calcul de la log-vraisemblance des parameters. e e e donn´es observ´es, et du crit`re BIC pour la comparaison de Consider the following four major components of the pro- e e e e mod`les. Les m´thodes sont illustr´es par un jeu de donn´es posed model and the data structure: (1) a two-level model e r´elles. for hierarchically structured data, (2) discrete natures of the data, (3) ﬁxed covariates, and (4) nonlinear structural equa- References tions. The ﬁrst two components are important for achieving correct statistical results. The last two components are es- Bentler, P. M. (1995). EQS: Structural Equation Program sential to analyzing more complicated situations. Nonlinear Manual. Los Angeles: BMDP Statistical Software. terms of latent variables have been found to be useful in busi- Bentler, P. M. and Stein, J. A. (1992). Structural equation ness, education, and the social sciences. As models in biolog- models in medical research. Statistical Methods in Medical ical and medical research are more subtle, we expect nonlin- Research 1, 159–181. ear terms should be useful as well. We admit that in practice Bock, R. D. and Aitkin, M. (1981). Marginal maximum like- situations might not often arise that require all of the above lihood estimation of item parameters: Application of an components to ﬁt the data. Still, our development is useful for EM algorithm. Psychometrika 46, 443–461. providing a general framework for analyzing the large num- Bollen, K. A. (1989). Structural Equation with Latent Vari- bers of its submodels. This is particularly true from a model ables. New York: Wiley. comparison perspective. For example, even if a linear model Booth, J. G. and Hobert, J. P. (1999). Maximum generalized is better than a nonlinear model in ﬁtting a data set, such a linear model likelihoods with an automated Monte Carlo conclusion cannot be reached without the model comparison EM algorithm. Journal of the Royal Statistical Society, statistic under the more general nonlinear model framework. Series B 61, 265–285. We use BIC because it is more common, and it tends to favor Chib, S. and Greenberg, E. (1998). Analysis of multivariate simpler models than those chosen by the Akaike information probit models. Biometrika 85, 347–361. criterion (AIC; see Kass and Raftery, 1995). Christoﬀerson, A. (1975). Factor analysis for dichotomized Similar to most complex statistical models, the proposed variables. Psychometrika 37, 29–51. model is developed under normality assumptions for latent Dolan, C. V., Molenaar, P. C. M., and Boomsma, D. I. (1991). variables. In light of the fact that these assumptions may not Simultaneous genetic analysis of longitudinal means and always be true in practice, the development of techniques that covariance structure in the simplex model using twin are less reliant on these assumptions, such as that given in data. Behavior Genetics 21, 49–65. Wall and Amemiya (2000), would be useful. Another limita- Dunson, D. B. (2000). Bayesian latent variable models for tion of our model is the diﬃculty in developing computation- clustered mixed outcomes. Journal of the Royal Statistical ally feasible statistics for rigorously assessing its goodness-of- Society, Series B 62, 355–366. ﬁt. Residual analyses are presented in this article, but more Gelman, A. and Meng, X. L. (1998). Simulating normalizing research in this direction is necessary. constant: From importance sampling to bridge sampling to path sampling. Statistical Science 13, 163–185. Geman, S. and Geman, D. (1984). Stochastic relaxation, Acknowledgements Gibbs distributions, and the Bayesian restoration of im- ages. IEEE Transactions on Pattern Analysis and Ma- This research was fully supported by a grant from the chine Intelligence 6, 721–741. Research Grants Council of the Hong Kong Special Admin- Guo, S. W. and Thompson, E. A. (1994). Monte Carlo es- istrative Region, China (project CUHK 4243/02H). We are timation of mixed models for large complex pedigrees. thankful to the editor, associate editor, and two reviewers for Biometrics 50, 417–432. valuable comments that improved the article substantially, Hastings, W. K. (1970). Monte Carlo sampling methods using and to D. E. Morisky and J. A. Stein for the use of their Markov chains and their application. Biometrika 57, 97– AIDS data set. 109. o o J¨reskog, K. G. and S¨rbom, D. (1996). LISREL 8: Structural ´ ´ Resume Equation Modeling with the SIMPLIS Command Lan- On d´veloppe un mod`le g´n´ral ` variables latentes ` deux e e e e a a guage. Hove, London: Scientiﬁc Software International. niveaux pour donner un cadre coh´rent ` la comparai- e a Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal son de sous-mod`les vari´s. On peut ainsi, dans ce cadre, e e of the American Statistical Association 90, 773–795. analyser aussi bien des relations non lin´aires entre vari- e Lee, S. Y. and Shi, J. Q. (2001). Maximum likelihood esti- e ables latentes dans les ´quations structurelles, que les ef- mation of two-level latent variable models with mixed 636 Biometrics, September 2004 continuous and polytomous data. Biometrics 57, 787– Evolutionary Biology. New York: Cambridge University 794. Press. Lee, S. Y. and Song, X. Y. (2003). Bayesian analysis of Rabe-Hesketh, S., Skrondal, A., and Pickles, A. (2003). Gen- structural equation models with dichotomous variables. eralized multilevel structural equation modeling. Psy- Statistics in Medicine 22, 3073–3088. chometrika, in press. Lee, S. Y. and Zhu, H. T. (2000). Statistical analysis of non- Sammel, M. D. and Ryan, L. M. (1996). Latent variables with linear structural equation models with continuous and ﬁxed eﬀects. Biometrics 52, 220–243. polytomous data. British Journal of Mathematical and Sammel, M. D., Ryan, L. M., and Legler, J. M. (1997). La- Statistical Psychology 53, 209–232. tent variable models for mixed discrete and continuous Lee, S. Y. and Zhu, H. T. (2002). Maximum likelihood es- outcomes. Journal of the Royal Statistical Society, Series timation of nonlinear structural equation models. Psy- B 59, 667–678. chometrika 67, 189–210. Schumacker, R. E. and Marcoulides, G. A. (1998). Interac- Louis, T. A. (1982). Finding the observed information matrix tion and Nonlinear Eﬀects in Structural Equation Models. when using EM algorithm. Journal of the Royal Statistical Mahwah, NJ: Lawrence Erlbaum Associates. Society, Series B 44, 226–233. Shi, J. Q. and Copas, J. (2002). Publication bias and meta- Meng, X. L. and Rubin, D. B. (1993). Maximum likelihood analysis for 2 × 2 tables: An average Markov chain Monte estimation via the ECM algorithm: A general framework. Carlo EM algorithm. Journal of the Royal Statistical So- Biometrika 80, 267–278. ciety, Series B 64, 221–236. Meng, X. L. and Schilling, S. (1996). Fitting full-information Shi, J. Q. and Lee, S. Y. (2000). Latent variable mod- item factor models and an empirical investigation of els with mixed continuous and polytomous data. Jour- bridge sampling. Journal of American Statistical Asso- nal of the Royal Statistical Society, Series B 62, 77– ciation 91, 1254–1267. 87. Meng, X. L. and Wong, H. W. (1996). Simulating ratios of Wall, M. M. and Amemiya, Y. (2000). Estimation for polyno- normalizing constants via a simple identity: A theoretical mial structural equation models. Journal of the American exploration. Statistica Sinica 6, 831–360. Statistical Association 95, 929–940. Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Wang, C., Douglas, J., and Anderson, S. (2002). Item response Teller, A. H., and Teller, E. (1953). Equations of state models for joint analysis of quality of life and survival. calculations by fast computing machines. Journal of Statistics in Medicine 21, 129–142. Chemical Physics 21, 1087–1092. Wei, G. C. G. and Tanner, M. A. (1990). A Monte Carlo Morisky, D. E., Tiglao, T. V., Sneed, C. D., Tempongko, implementation of the EM algorithm and the poor man’s S. B., Baltazar, J. C., Detels, R., and Stein, J. A. (1998). data augmentation algorithm. Journal of the American The eﬀects of establishment practices, knowledge and at- Statistical Association 85, 699–704. titudes on condom use among Filipino sex workers. AIDS Care 10, 213–220. Pugesek, B. H., Tomer, A., and von Eye, A. (2003). Struc- Received April 2003. Revised December 2003. tural Equation Modeling Applications in Ecological and Accepted January 2004. Appendix Notation Observed continuous obs. Xg = (xg1 , . . . , xgNg ) X = (X1 , . . . , XG ) Observed dichotomous obs. Dg = (dg1 , . . . , dgNg ) D = (D1 , . . . , DG ) Observed polytomous obs. Zg = (zg1 , . . . , zgNg ) Z = (Z1 , . . . , ZG ) Unobserved continuous Wg = (wg1 , . . . , wgNg ) W = (W1 , . . . , WG ) measurements Yg = (yg1 , . . . , ygNg ) Y = (Y1 , . . . , YG ) First level Second level Latent variables ω 1gi = (η 1gi , ξ 1gi ) ω 2g = (η 2g , ξ 2g ) Ω1g = (ω 1g1 , . . . , ω 1gNg ) Ω2 = (ω 21 , . . . , ω 2G ) Ω1 = (Ω11 , . . . , Ω1G ) Measurement errors ε1gi , δ 1gi ε2g , δ 2g Fixed covariates cugi , c1gi cvg , c2g Regression coeﬃcients A1 , Λ1 , B1 , Π1 , Γ1 A2 , Λ2 , B2 , Π2 , Γ2 Covariance matrices Φ1 , Ψ1 , Ψ1δ Φ2 , Ψ2 , Ψ2δ