Maximum Likelihood Analysis of a General Latent Variable Model

Document Sample
Maximum Likelihood Analysis of a General Latent Variable Model Powered By Docstoc
					Biometrics 60, 624–636
September 2004




        Maximum Likelihood Analysis of a General Latent Variable
               Model with Hierarchically Mixed Data

                                      Sik-Yum Lee1,∗ and Xin-Yuan Song1,2∗∗
                    1
                        Department of Statistics, The Chinese University of Hong Kong, Hong Kong
                                      2
                                        Sun Yat-Sen University, Guangzhou, China
                                           ∗
                                             email: sylee@sparc2.sta.cuhk.edu.hk
                                         ∗∗
                                            email: xysong@u4000.sta.cuhk.edu.hk


         Summary. A general two-level latent variable model is developed to provide a comprehensive framework
         for model comparison of various submodels. Nonlinear relationships among the latent variables in the struc-
         tural equations at both levels, as well as the effects of fixed covariates in the measurement and structural
         equations at both levels, can be analyzed within the framework. Moreover, the methodology can be applied
         to hierarchically mixed continuous, dichotomous, and polytomous data. A Monte Carlo EM algorithm is
         implemented to produce the maximum likelihood estimate. The E-step is completed by approximating the
         conditional expectations through observations that are simulated by Markov chain Monte Carlo methods,
         while the M-step is completed by conditional maximization. A procedure is proposed for computing the
         complicated observed-data log likelihood and the BIC for model comparison. The methods are illustrated
         by using a real data set.
         Key words: Covariates; Dichotomous and polytomous data; MCEM algorithm; Model comparison;
         Nonlinear structural equations; Path sampling; Two-level latent variable model.



1. Introduction                                                     As the previously mentioned software packages are devel-
Latent variable methods are widely appreciated tools in med-     oped on the basis of standard models and data structures,
ical and psychosocial research. The development and analysis     they cannot be applied to many complex real situations.
of latent variable models for investigating relationships be-    Hence, generalizations in different directions have recently
tween observed variables and latent variables have received      been developed. Sammel and Ryan (1996) proposed an ex-
a great deal of attention. One approach (see Sammel, Ryan,       ploratory factor analysis (EFA) model with fixed covariates
and Legler, 1997; Dunson, 2000; among others) focused on         and continuous data. Early work on EFA with dichotomous
analyzing the contribution of the latent variables to the mean   data falls into two directions: those which estimate the corre-
of the observed variables, as well as the effects of fixed co-     lations from the dichotomous data (e.g., Christofferson, 1975)
variates on the observed and/or latent variables, based on an    and those which fit latent probit models (e.g., Bock and
exponential family of distributions. Structural equation mod-    Aitkin, 1981). Recently, Meng and Schilling (1996) developed
els (SEMs; Bollen, 1989; among others) focused on exploring      a Monte Carlo EM (MCEM; Wei and Tanner, 1990) algo-
latent factors that cause the behavior of observed variables,    rithm for this model. Shi and Lee (2000) used a similar al-
and investigated the relationships of latent variables among     gorithm for a confirmatory factor analysis (CFA) model with
themselves and with the observed variables. At present, there    mixed continuous and polytomous data. Recently, much at-
are more than a dozen standard packages for fitting SEMs,         tention has been devoted to nonlinear SEMs (see Schumacker
such as LISREL VIII (J¨reskog and S¨rbom, 1996) and EQS
                         o             o                         and Marcoulides, 1998; Wall and Amemiya, 2000; Lee and
(Bentler, 1995). SEMs have been widely used in medical re-       Zhu, 2002; and references therein). For hierarchically struc-
search though not to the extent that they have been used in      tured data that violate the assumption of independence, Lee
social and psychological research. In addition to those cited    and Shi (2001) utilized MCEM to fit factor analysis models
by Bentler and Stein (1992), SEMs have recently been ap-         with mixed continuous and polytomous data. Rabe-Hesketh,
plied to investigate the behavior of twins in genetics (Dolan,   Skrondal, and Pickles (2003) provided a multilevel SEM with
Molenaar, and Boomsma, 1991) to analyze quality-of-life data     software, GLLAM, that is more general in several ways than
(Wang et al., 2002) and to analyze ecological and evolution-     our proposed model; however, they did not consider nonlin-
ary biology data (see Pugesek, Tomer, and von Eye, 2003, and     ear variable terms. The above-mentioned contributions em-
references therein among others). In this article, we concen-    phasize estimation; overall few model comparison results have
trate on the maximum likelihood (ML) analysis of a general       been published. The main reason for this is probably the diffi-
SEM.                                                             culties in evaluating the intractable observed-data likelihood.


                                                             624
                                       Latent Variable Model with Hierarchically Mixed Data                                                          625


   In this article we investigate a general two-level SEM that              are the observed continuous random vectors, while wgi =
can accommodate the nonlinearities of the latent variables in               (wgi1 , . . . , wgis ) and ygi = (ygi1 , . . . , ygit ) are unobservable
the structural equations at both levels of the model, the ef-               continuous random vectors such that wgi and ygi correspond
fects of fixed covariates on the measurement and structural                  to the dichotomous and polytomous observations, respec-
equations at both levels, and hierarchical, mixed continuous,               tively. The observable dichotomous vector d = (d1 , . . . , ds )
dichotomous, and polytomous data. In addition to the ML es-                 is related to its underlying vector w = (w1 , . . . , ws ) by
timation, we propose an approach for computing the compli-
cated observed-data log likelihood of the posited SEM. Conse-                    dj = 1     if wj > 0;     and 0 otherwise,          j = 1, . . . , s. (1)
quently, statistics such as the likelihood ratio or the Bayesian
information criterion (BIC) can be evaluated for model com-
parison under a comprehensive framework.                                    The observable polytomous vector z = (z 1 , . . . , zt ) is related
   The present work is motivated by a study (Morisky et al.,                to y = (y 1 , . . . , yt ) by
1998) of the effects of establishments’ policies, knowledge,
and attitudes on condom use among Filipino commercial sex                     zj = h      if αjh < yj ≤ αjh+1 ,       for h = 0, . . . , kj − 1;
workers (CSWs). The primary concern is finding an AIDS-
                                                                                                                           j = 1, . . . , t,          (2)
preventative intervention for Filipino CSWs. The data set
was collected from female CSWs in 95 establishments (bars,
night clubs, Karaoke TV, and massage parlors) in Philippine                 where {−∞ = αj0 < αj1 < · · · < αjkj = ∞} is the set of thresh-
cities. The questionnaire included a large number of items,                 olds that defines the categories. In the CSW example out-
covering the areas of knowledge, attitudes, beliefs, behavior,              lined in the Introduction, the dichotomous variables, dj , cor-
self-efficacy for condom use, etc. Latent psychological determi-              respond to three (s = 3) yes/no questions regarding condom
nants such as CSWs’ risk behavior, knowledge, and attitudes                 usage while the polytomous variables, zj , relate to six (t = 6)
associated with AIDS and condom use are important issues                    Likert scale questions regarding worry about AIDS and atti-
to be assessed, and will be used for modifying the planned                  tude toward condom use. At the first level, we assume that,
intervention. The basic goal of our analysis is to establish a              conditional on the group mean vg , random observations in
well-fitting SEM to study possible causal effects on the im-                  each group have the following structure:
portant latent variable “usage of condom, η.” In particular,
taking the latent variables “worry about AIDS, ξ 1 ” and “at-                              ugi = vg + A1g cugi + Λ1g ω 1gi + ε1gi ,
titude toward condom use, ξ 2 ” as exogenous variables in the
                                                                                                  g = 1, . . . , G, i = 1, . . . , Ng ,               (3)
structural equation with η as the endogenous variable, our aim
is to investigate whether the incorporation of fixed covariates
and/or nonlinear terms of ξ 1 and ξ 2 in the structural equa-               where cugi is a vector of fixed covariates, A1g is a matrix of
tion will give a better model than that with a linear structural            coefficients, Λ1g is a matrix of factor loadings, ω 1gi is a q 1 × 1
equation without covariates. As emphasized by Morisky et al.                vector of latent variables, and ε1gi is a p × 1 vector of error
(1998), establishments’ policies on condom use exert a strong               measurements that is independent of ω 1gi and is distributed
influence on CSWs, and hence it is necessary to develop a                    as N [0, Ψ1g ], where Ψ1g is a diagonal matrix. At the second
two-level nonlinear SEM with fixed covariates to take into ac-               level, we assume that vg has the structure
count the influence of establishments. Moreover, as manifest
variables are measured in terms of dichotomous and polyto-                             vg = A2 cvg + Λ2 ω 2g + ε2g ,         g = 1, . . . , G,        (4)
mous variables, the discrete nature of the data set cannot be
ignored.
                                                                            where cvg is a vector of fixed covariates, A2 is a matrix of
   The proposed SEM is described in Section 2. Section 3 ad-
                                                                            coefficients, Λ2 is a matrix of factor loadings, ω 2g is a q 2 × 1
dresses the ML estimation with an MCEM algorithm. A pro-
                                                                            vector of latent variables, and ε2g is a p × 1 vector of error
cedure for model comparison is developed in Section 4, based
                                                                            measurements that is independent of ω 2g and is distributed
on the key idea of path sampling (Gelman and Meng, 1998).
                                                                            as N [0, Ψ2 ], where Ψ2 is a diagonal matrix. In the CSW
The newly developed methodology is illustrated with real ex-
                                                                            example, one fixed covariate cvg that relates to the availabil-
amples in Section 5. The article concludes with a summary
                                                                            ity of condoms in the establishment is incorporated in (4)
discussion.
                                                                            (see Section 5). Moreover, the first-level latent vectors are as-
2. A General Structural Equation Model                                      sumed to be independent of the second-level latent vectors.
                                                                            However, as ugi and ugj are correlated, the usual independent
The aim of this section is to introduce a two-level SEM with
                                                                            assumption on observations within the groups is avoided. It
conditions for identification for analyzing hierarchical, mixed
                                                                            follows from (3) and (4) that
continuous, dichotomous, and polytomous data. To provide
a comprehensive framework for model comparison, the pro-
posed model includes the nonlinearities of the latent variables             ugi = A2 cvg + Λ2 ω 2g + ε2g + A1g cugi + Λ1g ω 1gi + ε1gi .              (5)
and fixed covariates at both levels.
   Suppose that there is a collection of p-variate random vec-              The latent vectors ω 2g and ω 1gi are partitioned as ω 1gi =
tors ugi , i = 1, . . . , Ng , within groups g = 1, . . . , G. The sample   (η 1gi , ξ1gi ) and ω 2g = (η 2g , ξ 2g ) , respectively, where
sizes Ng may differ from group to group so that the data set                 η 1gi (q 11 × 1), ξ 1gi (q 12 × 1), η 2g (q 21 × 1), and ξ 2g (q 22 × 1)
could be unbalanced. We suppose without the loss of general-                are latent subvectors. To account for the nonlinearity among
ity that ugi = (xgi , wgi , ygi ) , where xgi = (xgi1 , . . . , xgir )      latent variables and the effects of fixed covariates on the latent
626                                                           Biometrics, September 2004

variables, the following nonlinear structural equations in the                     factor analysis model with continuous and polytomous vari-
between-groups and within-groups models are considered:                            ables (Lee and Shi, 2001), but also a generalization of the
                                                                                   multivariate probit model (e.g., Chib and Greenberg, 1998).
         η 1gi = B1g c1gi + Π1g η 1gi + Γ1g F1 (ξ 1gi ) + δ 1gi ,           (6)
                                                                                      In most substantive applications of two-level models, it is
and                                                                                assumed that for g = 1, . . . , G, A1g = A1 , Λ1g = Λ1 , Ψ1g = Ψ1 ,
                                                                                   B1g = B1 , Π1g = Π1 , Γ1g = Γ1 , Φ1g = Φ1 , and Ψ1δg = Ψ1δ .
          η 2g = B2 c2g + Π2 η 2g + Γ2 F2 (ξ 2g ) + δ 2g ,                  (7)
                                                                                   For brevity, this assumption will be taken in the development
where c1gi and c2g are fixed covariates, δ 1gi and δ 2g are ran-                    of the methods in subsequent sections. However, generaliza-
dom vectors of error measurements, B1g , B2 , Π1g , Π2 , Γ1g ,                     tions to situations without this assumption are straightfor-
and Γ2 are matrices of coefficients, F1 (ξ 1gi ) = (f 11 (ξ 1gi ), . . . ,           ward. A summary of the notation is given in the Appendix.
f 1a (ξ 1gi )) and F2 (ξ 2g ) = (f 21 (ξ 2g ), . . . , f 2b (ξ 2g )) are vector-
valued functions with nonzero differentiable functions f1k and                      3. Maximum Likelihood Estimation
f2k , and usually a ≥ q 12 and b ≥ q 22 . In the CSW example, two                  We will implement an MCEM type algorithm to obtain
fixed covariates (c1gi1 , c1gi2 ) in relation to the knowledge of                   the ML estimates. Let Xg = (xg1 , . . . , xgNg ), and X =
AIDS and AIDS test are incorporated in (6) (see Section 5).                        (X1 , . . . , XG ) be the observed continuous data; let Dg =
We assume that I1 − Π1g and I2 − Π2 are nonsingular, and                           (dg1 , . . . , dgNg ) and D = (D1 , . . . , DG ) be the observed di-
that the determinants of these two matrices are independent                        chotomous data; and let Zg = (zg1 , . . . , zgNg ) and Z =
of Π1g and Π2 , respectively. The latter assumption is true for                    (Z1 , . . . , ZG ) be the observed polytomous data. Let W
all nonrecursive models such that Π1g and/or Π2 are upper                          = (W1 , . . . , WG ) and Y = (Y1 , . . . , YG ) be the matri-
(or lower) triangular matrices. It can be relaxed at the ex-                       ces of latent continuous measurements associated with D
pense of more computational effort in completing the M-step                         and Z, respectively; and let Ω1g = (ω 1g1 , . . . , ω 1gNg ), Ω1 =
of the MCEM algorithm (see Lee and Zhu, 2002). Following                           (Ω11 , . . . , Ω1G ), Ω2 = (ω 21 , . . . , ω 2G ), and V = (v1 , . . . , vg ) be
the usual assumptions in structural equation models, ξ 1gi and                     matrices of latent vectors at the within-groups and between-
δ 1gi are assumed to be independently distributed as N [0, Φ1g ]                   groups levels. Inspired by the key ideas of data augmen-
and N [0, Ψ1δg ], respectively, where Ψ1δg is a diagonal matrix.                   tation and the EM algorithm, these latent quantities are
Similarly, it is assumed that ξ 2g and δ 2g are independently dis-                 augmented to the observed data in the analysis. This aug-
tributed as N [0, Φ2 ] and N [0, Ψ2δ ], respectively, where Ψ2δ                    mentation scheme gives a complete data set (W, Y, V,
is a diagonal matrix. However, due to the nonlinear functions                      Ω1 , Ω2 , X, D, Z) that is easier to deal with. Let α be
in F1 and F2 , the distribution of ugi is not assumed to be                        the parameter vector of unknown thresholds and θ be the
normal.                                                                            parameter vector that contains the unknown distinct ele-
    The proposed model is not identified without imposing fur-                      ments in A1 , Λ1 , Ψ1 , A2 , Λ2 , Ψ2 , B1 , Π1 , Γ1 , Φ1 , Ψ1δ , B2 ,
ther conditions. For example, in measurement equation (1),                         Π2 , Γ2 , Φ2 , and Ψ2 , and Lc (W, Y, V, Ω1 , Ω2 , X, D, Z |
the variance and thresholds associated with each polytomous                        α, θ) be the complete-data log-likelihood function with fixed
variable are not identifiable (Shi and Lee, 2000); the vari-                        covariates. ML estimates of α and θ are obtained via the EM
ance and the corresponding row of Λ1g that are associated                          algorithm, which consists of the following steps at the lth it-
with each dichotomous variable are not identified (Meng and                         eration. E-step: Evaluate Q(α, θ | α(l) , θ (l) ) = E{Lc (W, Y,
Schilling, 1996); and the between-groups and within-groups                         V, Ω1 , Ω2 , X, D, Z | α, θ) | X, D, Z, α(l) , θ (l) }, where the ex-
covariance structures are not identified. In addition, we can-                      pectation is taken with respect to the conditional distribution
not allow intercepts to simultaneously exist in the measure-                       of (W, Y, V, Ω1 , Ω2 ) given (X, D, Z) at (α(l) , θ (l) ). M-step:
ment and structural equations of the first and second levels                        Determine (α(l+1) , θ (l+1) ) by maximizing Q(α, θ | α(l) , θ (l) ).
of the model (see [5]). In general, there are no necessary and                        As Lc decomposes into two separable functions each with
sufficient conditions to guarantee model identification. How-                         distinct sets of parameters, we can write the E-step as
ever, the conditions that are suggested in the literature for                      Q(α, θ | α(l) , θ (l) )
identifying various components of the current model can be
                                                                                                                                                            (l)
adopted. For example, to identify the parameters of a dichoto-                        = E L1 (W, Y, V, Ω1 , X, D, Z | α, θ 1 ) | X, D, Z, α(l) , θ 1
mous variable, the corresponding error variance is fixed at a                                                                               (l)
preassigned value (Meng and Schilling, 1996; Lee and Song,                               + E L2 (V, Ω2 , X, D, Z | θ 2 ) | X, D, Z, θ 2          ≡ Q1 + Q2 ,
2003). To identify the variance and thresholds of each polyto-                                                                                                    (8)
mous variable, αj 1 and αjkj−1 are fixed at appropriate preas-
signed values (Shi and Lee, 2000). To identify the covariance                      where
structures in the first- and second-level models, the common                                     G   Ng
                                                                                            1
practice in structural equation modeling of fixing some ap-                         L1 = −                    (p + q1 ) log(2π) + log |Ψ1 |
propriate elements in Λ1g , Λ2 , Π1g , Γ1g , Π2 , and/or Γ2 at                              2
                                                                                                g=1 i=1
preassigned values is adopted. These are necessary conditions
for identification. Of course, one may use other conditions                                                   + log |Φ1 | − 2 log |I − Π1 | + ξ 1gi Φ−1 ξ 1gi
                                                                                                                                                    1

for identification, though we stress the issue should be ap-                                                  + (ugi − vg − A1 cugi − Λ1 ω 1gi ) Ψ−1
                                                                                                                                                 1
proached on a problem-by-problem basis.
    The current SEM defined in (3)–(7), together with the                                                     × (ugi − vg − A1 cugi − Λ1 ω 1gi )
mixed data structure, provides a comprehensive framework                                                     + [η 1gi − B1 c1gi − Π1 η 1gi − Γ1 F1 (ξ 1gi )] Ψ−1
                                                                                                                                                              1δ
for models with latent variables. It is not only a combination
of the nonlinear SEM (Lee and Zhu, 2002) and the two-level                                                   × [η 1gi − B1 c1gi − Π1 η 1gi − Γ1 F1 (ξ 1gi )]
                                             Latent Variable Model with Hierarchically Mixed Data                                                 627


                G   Ng      s                                                  Then, any αjh that takes a value in the interval [y ∗ jh ,
                                                                                                                                        j,m
           +                     log I(wgij ∈ Aj )                             y ∗ jh + 1) satisfies (11). A natural choice is the midpoint:
                                                                                 j,m
                                                                                        ∗          ∗                                       (l )
               g=1 i=1     j=1                                                                                                            ˜ 1
                                                                               αjh = (yj,mjh +1 + yj,mjh )/2. In the M-step, we evaluate αjh
                                                                               ˜
                                                                                                                              (1)          (T )
                                 t                                             for every Y(l1 ) , and αjh is updated by (αjh + · · · + αjh )/T
                                                                                                                                  ˜        ˜
                          +           log I[ygij ∈ (αjzj , αjzj +1 )]          using draws {Y(l1 ) , l1 = 1, . . . , T } obtained from the E-step.
                                j=1                                               Next, we have to solve the following system of equations:
        ≡ L11 + L12 ,                                                    (9)   ∂Q(α, θ | α(l) , θ (l) )/∂θ = 0. As Lc can be decomposed into
                                                                               L1 and L2 with separable parameter vectors θ 1 and θ 2 , this
                                                                               system of equations simplifies to the following two equations,
               G                                                               ∂Q1 /∂θ 1 = 0 and ∂Q2 /∂θ 2 = 0, which can be solved sep-
          1
L2 = −              (p + q2 ) log(2π) + log |Ψ2 |                              arately. Due to the nonlinear structural equations at both
          2
              g=1                                                              levels, they cannot be solved in closed form as in Shi and Lee
                                                                               (2000) or Lee and Shi (2001). As there are many parame-
                    + log |Φ2 | − 2 log |I − Π2 | + ξ 2g Φ−1 ξ 2g
                                                          2
                                                                               ters involved in Q1 as well as in Q2 , the Newton–Raphson or
                    + (vg − A2 cvg − Λ2 ω 2g ) Ψ−1
                                                2                              scoring algorithms are complicated to implement. However, a
                                                                               closed form solution can be achieved with little computational
                    × (vg − A2 cvg − Λ2 ω 2g )                                 effort by the method of conditional maximizations (Meng and
                    + [η 2g − B2 c2g − Π2 η 2g − Γ2 F2 (ξ 2g )] Ψ−1
                                                                 2δ
                                                                               Rubin, 1993). The key idea is to solve each of ∂Qi /∂θ i = 0, i =
                                                                               1, 2 conditional on other parameters; and then the conditional
                    × [η 2g − B2 c2g − Π2 η 2g − Γ2 F2 (ξ 2g )] ,       (10)   expectations involved are approximated by the corresponding
                                                                               sample means of the generated observations that are obtained
in which I(y ∈ A) is an indicator function that takes the value                from the E-step. The related technical details are presented
of one if y fall in A, and zero otherwise; Aj is an appropriate                in part (III) of the website cited above.
s-dimensional cell with its jth side of the form (−∞, 0) or                       The selection of the sample size, say T, in the Monte Carlo
[0, ∞]; θ 1 and θ 2 are the parameter vectors that contain                     E-step is an issue in the MCEM algorithm. To decrease the
the structural parameters at the within-groups and between-                    Monte Carlo error at the E-step, T should be large. As it is
groups models, respectively. The conditional expectations
                                                                                                                             ˆ (l)
                                                                               inefficient to start with a large T when θ is far from the ML
are approximated by a sufficiently large number of observa-
tions simulated from the corresponding conditional distribu-                   estimate, it has been suggested that T should be increased
tions via a hybrid algorithm that combines the Gibbs sam-                      from one iteration to the next (Wei and Tanner, 1990; Booth
pler (Geman and Geman, 1984) and the Metropolis–Hastings                       and Hobert, 1999). One method is to take T = T 1 + T 2 l,
(MH) algorithm (Metropolis et al., 1953; Hastings, 1970).                      for some positive integers T1 and T2 . As T1 is not related to
Here, observations are sampled interactively from the follow-                  l, its size does not have a significant impact. The choice of
ing conditional distributions: p(W | Y, V, Ω1 , Ω2 , X, D, Z,                  T2 depends on the complexity of the underlying problem, and
α, θ), p(Y | W, V, Ω1 , Ω2 , X, D, Z, α, θ), p(V | W, Y, Ω1 ,                  should be approached on a problem-by-problem basis. For the
Ω2 , X, D, Z, α, θ), p(Ω1 | W, Y, V, Ω2 , X, D, Z, α, θ), and                  rather complicated SEM that is analyzed in Section 5, we find
p(Ω2 | W, Y, V, Ω1 , X, D, Z, α, θ). These conditional dis-                    that the choice of T 2 = 15 is acceptable. Convergence can ei-
tributions and the implementation of the MH algorithm for                      ther be monitored by ratios of the observed-data likelihood
simulating observations are given in part (II) of the website:                 values at consecutive iterations (Meng and Schilling, 1996),
http://www.sta.cuhk.edu.hk/sylee/MLLVM-HMD.                                    computed via bridge sampling (Meng and Wong, 1996), or
   The conditional distributions p(W|·) and p(Y|·) are pro-                    computed by absolute or relative error of the parameter es-
portional to univariate truncated normal distributions, and                    timates. Alternatively, Shi and Copas (2002) proposed the
it is much easier and faster to draw observations from them                    following scheme without iteratively increasing T. After the
than from the multivariate truncated normal distributions.                     mth iteration, they computed
   The M-step updates unknown parameters by maximizing                                               1
the conditional expectations obtained in the E-step. From L1 ,                                 ¯ (l)
                                                                                               θ = (θ (l−m+1) + · · · + θ (l) ),              (12)
                                                                                                     m
we see that the thresholds which correspond to the polyto-
mous variables and structural parameters in θ 1 are separable.                 and monitored convergence via the stopping rule: For given
Hence, thresholds can be updated by maximizing the condi-                      small values δ 1 and δ 2 (e.g., 0.001), the procedure is stopped
tional expectation of L12 . Following the idea of Shi and Lee                  if
(2000), or Lee and Shi (2001), for all g, i, and j, we find thresh-
olds that satisfy                                                                                        ¯ (l) ¯ (l−γ0 )
                                                                                                         θ −θ
                                                                                                                                              (13)
                                                                                                          ¯ (l−γ0 ) + δ1
                                                                                                          θ
                     I[ygij ∈ (αjzj , αjzj +1 ]] = 1.                   (11)

For the jth component of ygi , let {y ∗ : n = 1, . . . , N } be the
                                             j,n
                                                                               is smaller than some predetermined small value δ 2 . To avoid
order statistics of {ygij : g = 1, . . . , G; i = 1, . . . , Ng }, where       the danger of stopping early, a value of γ 0 larger than 2 is
N = N 1 + . . . , NG . Moreover, let mjh be the total number of                suggested. Convergence is claimed after the stopping rule is
{zgij < h, g = 1, . . . , G; i = 1, . . . , Ng } for h = 0, . . . , kj − 1.                                                 ¯ (l)
                                                                               satisfied for several consecutive iterations; θ is then taken
628                                                         Biometrics, September 2004

to be the ML estimate. Shi and Copas (2002) argued that                       above integral over t. Specifically, we select fixed grids {t(k) ;
for a sufficiently large m, the average of the Monte Carlo er-                  k = 0, . . . , K} such that t(0) = 0 < t(1) < · · · < t(K+1) = 1, and
rors in (12) is negligible. To reduce the bias, one should take               compute λ by
an appropriate m which can control the Monte Carlo errors                                           K
within a bearable limit. In the example given in Section 5,                                     1                          ¯        ¯
                                                                                           λ=             (t(k+1) − t(k) )(U(k+1) + U(k) ),   (17)
we take m = 50. One can use the method in Shi and Copas                                         2
                                                                                                    k=1
(2002), or take T = T 1 + T 2 l, or use a combination of both
                                                                                                     J
                                                                              where U(k) = J −1 j=1 U (Hobs , Hmis , t(k) , β), in which {Hmis ,
                                                                                       ¯                                (j)                    (j)
as in our example. Finally, standard error estimates are pro-
duced by approximating the Louis (1982) formulae at the ML                    j = 1, . . . , J} are simulated observations from p(Hmis | Hobs ,
estimates via a sufficiently large number of observations that                  t(k) , β). It follows from (16) that log z(1) = λ + log z(0).
are obtained at the E-step of the final iteration, together with               Hence, if we can link M1 with an M0 , chosen to have an
newly generated observations, if necessary.                                   easily calculated observed-data likelihood, we can obtain the
                                                                              observed-data likelihood under M1 .
4. Model Comparison                                                              Importance sampling and bridge sampling (see Meng and
The main goal of this section is to establish a method for                    Wong, 1996) are alternatives for computing normalizing con-
computing the observed-data likelihoods associated with spe-                  stants. Importance sampling approximates [z(1)/z(0)] directly
cific models of interest, or their ratios, so that the BIC can                 by two points, namely 0 and 1. Bridge sampling utilizes a sen-
be evaluated for model comparison (or hypothesis testing).                    sible choice of a bridge density at a point t ∗ in [0, 1]. It first
   Our development utilizes the key idea of path sampling                     estimates [z(t ∗ )/z(0)] and [z(t ∗ )/z(1)], then takes the ratio to
(Gelmen and Meng, 1998) for computation (ratios) of nor-                      cancel z(t ∗ ). Hence, its performance is expected to be better
malizing constants of probability models. Using more generic                  than importance sampling (Meng and Wong, 1996). Gelman
terms, let Hobs be the observed data and Hmis be the hypo-                    and Meng (1998) argued that path sampling may be better
thetical missing data that contain the latent quantities. Let                 than bridge sampling because it is an extension of the bridge
M1 and M0 be competitive models, such that M 0 ⊂ M 1 , and                    sampling from a bridge to a theoretically infinite number of
let β be the parameter vector of M1 . As p(Hmis | Hobs , β) =                 bridges. Another key advantage is that it is on logarithm scale,
p(Hmis , Hobs | β)/p(Hobs | β), the observed-data likelihood can              which is generally more stable.
be treated as the normalizing constant of p(Hmis | Hobs , β)                     To apply this method to the current model, we let β =
with the complete-data likelihood as the unnormalized den-                    (α, θ), Hobs = (X, D, Z), and Hmis = (W, Y, V, Ω1 , Ω2 ).
sity. Consider the following class of densities with a continuous             Hence, observations that are simulated in the E-step can be
parameter t in [0,1]:                                                                                 ¯
                                                                              used for computing Uk in (17). We can compare a sequence
                                                                              of hierarchically nested models or nonnested models. To give
                                                                              a more specific illustration, we apply the method to find the
              z(t) =     p(Hobs , Hmis | t, β) dHmis ,                 (14)   observed-data likelihood of the following model:
                                                                              M2 : ugi = µ + A2 cvg + Λ2 ω 2g + ε2g + A1 cugi + Λ1 ω 1gi + ε1gi
where p(Hobs , Hmis | t, β) is a density function for each
t, p(Hobs , Hmis | a, β) = p(Hobs , Hmis | Ma , β) with p(Hobs ,                   η 1gi = B1 c1gi + Π1 η 1gi + Γ1 F1 (ξ 1gi ) + δ 1gi        (18)
Hmis | Ma , β) denotes the complete-data likelihood under
model Ma , a = 0, 1. We have,                                                       η 2g = B2 c2g + Π2 η 2g + Γ2 F2 (ξ 2g ) + δ 2g .          (19)
                                                                              To facilitate the computation, we make use of the following
             z(a) =     p(Hobs , Hmis | Ma , β)dHmis                          intermediate model: M 1 : ugi = µ + ε2g + ε1gi . The linked
                                                                              model that corresponds to z(t) is defined as M t21 : ugi = µ +
                  = p(Hobs | Ma , β),           a = 0, 1.                     tA2 cvg + tΛ2 ω 2g + ε2g + tA1 cugi + tΛ1 ω 1gi + ε1gi , together
                                                                              with the structural equations (18) and (19) only for t = 0.
Taking the logarithm and then differentiating both sides of                    Hence, when t = 1, M t21 becomes M2 with the structural
(14) with respect to t, it can be shown by similar reasoning                  equations for ω 1gi and ω 2g ; and when t = 0, M t21 becomes
as in Gelman and Meng (1998) that                                             M1 without the structural equations.
                                                                                 An approximated value of λ21 that gives log {p(X, D,
           d log z(t)      d log p(Hobs , Hmis | t, β)                        Z | M 2 , α, θ)/p(X, D, Z | M 1 , α, θ)} can be obtained with the
                      = E∗                             ,               (15)
               dt                     dt                                      proposed method. However, the computation of the observed-
                                                                              data log likelihood that corresponds to M1 is not straightfor-
where E∗ denotes the expectation with respect to the sampling                 ward because the covariance matrix of ugi under M1 is not di-
distribution p(Hmis | Hobs , t, β). Integrating (15) from 0 to 1,             agonal. Hence, we treat M1 as a specific latent variable model:
we have                                                                       ugi = µ + I ε2g + ε1gi by treating ε2g as a vector of latent vari-
                                 1
                                                                              ables with an identity loading matrix I. We then link it with
                  z(1)                                                        the following simple model, the observed-data log likelihood
       λ = log         =             E∗ [U (Hobs , Hmis , t, β)] dt,   (16)
                  z(0)       0                                                of which can be easily computed: M 0 : ugi = µ + ε1gi . The
                                                                              linked model is defined as M t10 : ugi = µ + tIε2g + ε1gi . An ap-
where U (Hobs , Hmis , t, β) = d log p(Hobs , Hmis | t, β)/dt. An             proximated value of λ10 that gives log {p(X, D, Z | M 1 , α, θ)/
estimate of λ can be obtained by numerical evaluation of the                  p(X, D, Z | M 0 , α, θ)} can be similarly obtained. As M1 and
                                       Latent Variable Model with Hierarchically Mixed Data                                                            629


M0 are relatively simple models, the approximation of λ10 is                 138, 257, and 229, while the cell frequencies of the polyto-
expected to be good. Finally,                                                mous variables ranged from 11 to 369. One fixed covariate
                                                                             cvg that is associated with the question “Are condoms avail-
log p(X, D, Z | M2 , α, θ) = λ21 + λ10 + log p(X, D, Z | M0 , α, θ).
                                                                             able at your establishment for the workers?” is incorporated
                                                                      (20)   in the between-groups measurement equation, while two fixed
  Observed-data likelihood that corresponds to a simpler                     covariates, c1gi 1 and c1gi 2 , which are associated with the ques-
model can be similarly computed as a special case of the above               tions “How much do you think you know the disease called
analysis. For example, for the following single-level nonlinear              AIDS?” and “Have you ever had an AIDS test?,” are incorpo-
SEM in Lee and Zhu (2002),                                                   rated in the within-groups structural equation. The explicit
                                                                             link of the observed variables and covariates to their questions
            M3 :    ugi = µ + Λ1 ω 1gi + ε1gi ,                              are given in part (I) of the website mentioned in Section 3.
                                                                                According to the methods given in Shi and Lee (2000) and
                   η 1gi = Π1 η 1gi + Γ1 F1 (ξ 1gi ) + δ 1gi ,        (21)
                                                                             Meng and Schilling (1996), in identifying the model, the mean
we can link it with M0 defined above via M t30 : ugi = µ +                    vector, the smallest and the largest thresholds that correspond
tΛ1 ω 1gi + ε1gi , together with the structural equation only for            to the polytomous variables, and the diagonal elements of
t = 0 as before. The observed-data likelihood can be obtained                Ψ1 that correspond to the dichotomous variables are fixed. In
similarly. Likewise, the observed-data likelihood that corre-                this example, the mean vector that corresponds to the poly-
sponds to the two-level factor analysis model (Lee and Shi,                  tomous variables is fixed at 0, the thresholds αj 1 and αj4 , j =
                                                                                                                −1
2001) can be obtained.                                                       1, . . . , 6, are fixed at αjh = Φ∗ (ρjh ), where Φ∗ is the distribu-
   For any two competitive models M1 and M2 , we can use the                 tion function of N[0, 1], and ρjh are the observed cumulative
above-mentioned method to compute their observed-data log                    marginal proportions of the categories with zj < h. The di-
likelihoods, and then the following BIC12 :                                  agonal elements ψ 11 , ψ 12 , and ψ 13 of Ψ1 are fixed at 0.1. A
                                                                             two-level model with the following measurement equations is
         BIC12 = −2{log p(X, D, Z | M1 , α(1) , θ (1) )
                                                                             considered:
                        − log p(X, D, Z | M2 , α(2) , θ (2) )}                                M0 :    ugi = vg + µ + Λ1 ω 1gi + ε1gi ,
                   + (d1 − d2 ) log(N1 + · · · + NG ),                (22)                              vg = A2 cvg + Λ2 ω 2g + ε2g ,                  (23)
where (α(a) , θ (a) ) is the da × 1 parameter vector of Ma . The             where µ = (µ1 , µ2 , µ3 , 0∗ , . . . , 0∗ ) is the mean vector and A2 =
following criterion can be used for interpretation of BIC12 (see             (a21 , . . . , a29 ). In this example, parameters with an asterisk
Kass and Raftery, 1995):                                                     are fixed for achieving an identified model. For the between-
                                                                             groups model, a factor analysis model with no structural equa-
                                                                             tion is considered. The specifications of this model are
BIC12        <0             0 to 2            2 to 6             >6
                                                                                                                                                        
                                                                                       λ2,11 λ2,21 λ2,31         0∗    0∗      0∗      0∗   0∗    0∗
         Support       No conclusion        Support        Strongly
           M1                                 M2          support M2         Λ2 =  0∗          0∗      0∗      λ2,42 λ2,52 λ2,62      0∗   0∗    0∗  ,
                                                                                         0∗     0∗      0∗       0∗    0∗      0∗    λ2,73 λ2,83 λ2,93
                                                                                  1.0∗                   sym
                                                                             Φ2 = φ2,21         1.0∗             ,    and
5. An Illustrative Example: The AIDS Study
                                                                                  φ2,31         φ2,32    1.0∗
The proposed methodology is used to analyze the AIDS data
set. Nine manifest variables, of which the first three are di-                Ψ2 = (ψ21 , ψ22 , ψ23 , ψ24 , ψ25 , ψ26 , ψ27 , ψ28 , ψ29 ),
chotomous and the last six are polytomous with a five-point
                                                                             where Φ2 is the covariance matrix of ω 2g . Although other
scale are selected as indicators for the latent variables “usage
                                                                             structures for Λ2 can be considered, we choose the above
of condom, η,” “worry about AIDS, ξ 1 ,” and “attitude toward
                                                                             common nonoverlapping structure in factor analysis for clear
condom use, ξ 2 .” The questions that correspond to the first
                                                                             interpretations of the latent variables. For the within-groups
three dichotomous questions are as follows. (1) Have you ever
                                                                             model, the factor loading matrix Λ1 = (λ1,kh ) takes the same
used a condom? (2) Did you use a condom the last time you
                                                                             form as Λ2 , except that λ1,11 , λ1,42 , and λ1,73 are fixed at 1.0
had sex? (3) Have you ever put a condom on a customer? The
                                                                             for achieving model identification. Let (η 1gi , ξ 1gi1 , ξ 1gi2 ) be
questions corresponding to the last six polytomous variables
                                                                             a partition of ω 1gi corresponding to the structure of Λ1 . To
are as follows. (4) How much of a threat do you think AIDS
                                                                             investigate the effects of the fixed covariates and the interac-
is to the health of people? (5) What are the chances that
                                                                             tion of ξ 1gi 1 and ξ 1gi 2 on η 1gi , the structural equation is taken
you might get AIDS? (6) How worried are you about getting
                                                                             as
AIDS? (7) Do you agree or disagree that condoms make sex
less enjoyable? (8) Do you agree or disagree that condoms                                 η1gi = b1 c1gi1 + b2 c1gi2 + γ1 ξ1gi1 + γ2 ξ1gi2
cause a man to lose his erection? (9) Do you agree or disagree                                       + γ12 ξ1gi1 ξ1gi2 + δ1gi .                        (24)
that condoms cause pain or discomfort? The sample size is
618, with individuals in the establishments varying from 1 to                The unknown parameters in Φ1 and Ψ1 are {φ1,11 , φ1,12 , φ1,22 }
55. The frequencies of “0” of the dichotomous variables are                  and {ψ 14 , . . . , ψ 19 }, respectively.
630                                                           Biometrics, September 2004


                                                                               (a)
                        15




                        10




                          5




                          0


                                      20       40        60          80      100         120      140        160       180        200
                                                                          MCEM Iteration


                                                                                 (b)
                              2



                          1.5



                              1



                          0.5



                              0



                        −0.5



                          −1



                        −1.5



                          −2
                                  0           20                40              60                 80                100
                                                                           MCEM Iteration

Figure 1. (a) Plots of the observed-data log-likelihood ratios computed via bridge sampling. (b) Average estimates of µ1 ,
a25 , λ1,21 , λ2,21 , b1 , b2 , γ 1 , γ 2 , γ 12 , φ1,12 , φ2,23 , ψ 17 , ψ 1δ , and α12 in M0 . (c) Chains of the first 800 iterations of η 1g1 , ξ 1gi 1 , and
ξ 1gi 2 at the 165th MCEM iteration.
                                       Latent Variable Model with Hierarchically Mixed Data                                                  631


                                                                            (c)
                 2

                 1

                 0

              −1

              −2
                     0           100           200            300           400           500           600           700         800

                 4


                 2


                 0


              −2
                     0           100           200            300           400           500           600           700         800

                 2

                 1

                 0

              −1

              −2
                     0           100           200            300         400           500             600           700         800
                                                                Hybrid Algorithm Iteration

                                                           Figure 1.        (Continued )




                                                             Table 1
                                 ML estimates and standard errors of parameters: AIDS data example

                            First level                                                                         Second level

Par     MLE               SE           Par           MLE             SE           Par           MLE            SE           Par     MLE      SE
λ1,21    0.472           0.038         φ1,11          0.522         0.013         λ2,11         1.072         0.024     φ2,12       0.286   0.077
λ1,31    0.528           0.015         φ1,12         −0.053         0.018         λ2,21         0.495         0.009     φ2,13       0.386   0.064
λ1,52    0.379           0.086         φ1,22          0.095         0.012         λ2,31         0.609         0.041     φ2,23       0.518   0.049
λ1,62    0.277           0.064         µ1             0.478         0.042         λ2,42         0.226         0.024     a21         0.231   0.108
λ1,83    2.135           0.188         µ2            −0.154         0.014         λ2,52         0.189         0.119     a22         0.415   0.066
λ1,93    2.039           0.054         µ3            −0.103         0.015         λ2,62         0.549         0.006     a23         0.459   0.028
γ1       0.086           0.077         α12           −1.007                       λ2,73         0.433         0.026     a24        −0.304   0.050
γ2       0.406           0.202         α13           −0.621                       λ2,83         0.253         0.009     a25         0.789   0.247
γ 12    −0.430           0.223         α22           −0.190                       λ2,93         0.296         0.065     a26        −0.033   0.116
ψ 14     0.392           0.026         α23            0.220                       ψ 21          0.005         0.001     a27         0.059   0.071
ψ 15     0.567           0.057         α32           −0.969                       ψ 22          0.002         0.001     a28        −0.063   0.048
ψ 16     0.729           0.056         α33           −0.567                       ψ 23          0.005         0.003     a29        −0.189   0.132
ψ 17     0.740           0.057         α42            0.527                       ψ 24          0.011         0.007
ψ 18     0.520           0.076         α43            0.749                       ψ 25          0.384         0.117
ψ 19     0.499           0.048         α52           −0.007                       ψ 26          0.008         0.005
ψ 1δ     0.839           0.034         α53            0.246                       ψ 27          0.012         0.008
b1      −0.041           0.041         α62            0.109                       ψ 28          0.010         0.003
b2       0.197           0.062         α63            0.254                       ψ 29          0.042         0.019
632                                                      Biometrics, September 2004


                                                                           (a)
                         5


                         4


                         3


                         2


                         1
             Residual




                         0


                        −1


                        −2


                        −3


                        −4


                        −5
                             0              100            200           300            400            500            600
                                                                      Case Number


                                                                            (b)
                         1


                        0.8


                        0.6


                        0.4


                        0.2
          Residual




                         0


                     −0.2


                     −0.4


                     −0.6


                     −0.8


                        −1
                              0              100            200           300            400            500            600
                                                                       Case Number


      Figure 2.                                          ˆ                             ˆ                                ˆ
                                  (a) Plots of residuals δ1gi . (b) Plots of residuals ε1gi (1). (c) Plots of residuals ε2g (1).
                                             Latent Variable Model with Hierarchically Mixed Data                                              633


                                                                         (c)
                               1


                              0.8


                              0.6


                              0.4


                              0.2
                  Residual




                               0


                             −0.2


                             −0.4


                             −0.6


                             −0.8


                              −1
                                    0   10         20    30        40      50           60        70       80        90
                                                                    Group Number

                                                         Figure 2.     (Continued )

  The model M0 defined by (23) and (24) is compared with                        Figure 1a and 1b. We observe that the plots are stabilized at
the following models:                                                          around 100 MCEM iterations. Based on the above stopping
                                                                               rule, the algorithm stops at the 165th MCEM iteration; and
    M1 :   ugi = vg + µ + Λ1 ω 1gi + ε1gi ,                                    ¯ (165) is taken as the ML estimate. We detect that the chains
                                                                               θ
            vg = A2 cvg + Λ2 ω 2g + ε2g ,                                      which are generated by the hybrid algorithm at each E-step
                                                                               mix quickly at around 100 iterations. To reveal the behavior,
           η1gi = b1 c1gi1 + b2 c1gi2 + γ1 ξ1gi1 + γ2 ξ1gi2 + δ1gi ,
                                                                               the first 800 iterations in the chains of η 1g1 , ξ 1g11 , and ξ 1g12
    M2 :   ugi = vg + µ + Λ1 ω 1gi + ε1gi ,                                    at the 165th iteration are presented in Figure 1c. In the path
                                                                               sampling procedure, we take K = 20 and J = 2000. The es-
            vg = A2 cvg + Λ2 ω 2g + ε2g ,                                      timated observed-data log likelihoods that correspond to M0 ,
           η1gi = γ1 ξ1gi1 + γ2 ξ1gi2 + γ12 ξ1gi1 ξ1gi2 + δ1gi ,               M1 , M2 , and M3 are equal to −6172.77, −6185.07, −6196.76,
                                                                               and −6211.29, respectively. It follows from (22) that BIC10 =
    M3 :   ugi = vg + µ + Λ1 ω 1gi + ε1gi ,                                    18.18, BIC20 = 35.13, and BIC30 = 19.20. According to the
            vg = Λ2 ω 2g + ε2g ,                                               interpretation of BIC, these results indicate that M0 is ob-
                                                                               viously better than M1 , M2 , and M3 . The ML estimates and
           η1gi = b1 c1gi1 + b2 c1gi2 + γ1 ξ1gi1 + γ2 ξ1gi2                    their standard error estimates (SEs, which are computed via
                      + γ12 ξ1gi1 ξ1gi2 + δ1gi .                               5000 simulated observations) of the selected model M0 are
                                                                               presented in Table 1. Note that as the complete-data log like-
In the MCEM algorithm for computing the ML estimates un-                       lihood is not differentiable with respect to the thresholds, for
der the competing models, we generate T = 50 + 15l obser-                      simplicity only the SEs of other parameters are presented.
vations to approximate the conditional expectations at the                         Inspired by Guo and Thompson (1994), we use the last 150
E-step of the lth iteration. Convergence can be monitored by                   MCEM iterations to provide estimates of the Monte Carlo
ratios of consecutive observed-data likelihoods computed via                   (MC) standard errors in the MCEM process. Using these it-
                                                 ¯ (l)
bridge sampling. To crossvalidate the results, θ is computed                   erations, blocked into 15 batches of size 10, standard errors
via (12) with m = 50, and convergence can also be monitored                    for the average of these 150 values of estimates in θ are ob-
via the method of Shi and Copas (2002) by using (13) with                      tained. We observe that these MC standard errors are very
γ 0 = 3, δ 1 = δ 2 = 0.001. Convergence is claimed if the stopping             small. For example, the corresponding MC standard errors
rule is satisfied for five consecutive iterations. To reveal con-                of γ 1 , γ 2 , and γ 12 are 0.012, 0.0005, and 0.002, respectively.
vergence in analyzing M0 , plots of likelihood ratios and some                                                                     ¯     ¯
                                                                               We also compute the MC standard errors of U(1) , U(10) , and
                          ¯
arbitrary elements of θ (l) against iterations are displayed in                U¯(20) in estimating λ10 (see equation [20]) based on 20 batches
634                                                    Biometrics, September 2004

each with a subsample size of 100. The MC standard errors                 ates is proposed for the CSW level, which involves a nonlinear
are equal to 0.704, 0.678, and 0.733, respectively. Compared to           structural equation for accounting for the causal effects of
their means (171.91, 172.03, and 26.46), these errors are small.          the latent variables ξ 1gi 1 and ξ 1gi 2 and the fixed covariates
To address the sensitivity of T and the number of MCEM it-                c1gi 1 and c1gi 2 on η. Based on the nonoverlapping structures
erations to the choices, ML estimates are obtained by using               of Λ1 and Λ2 , and the meanings of the questions associated
¯ (215) and by using a fixed T = 200 with estimates taken at
θ                                                                         with the indicators, the latent variables in ω 1gi and ω 2g can
the 200th iteration. Almost all absolute differences of the vari-          be roughly interpreted as “usage of condom,” “worry about
ous estimates are less than 0.05. Moreover, the observed-data             AIDS,” and “attitude toward condom use” of the CSWs in
log likelihoods that correspond to M0 , M1 , M2 , and M3 are              relation to their individual level and to the policies of the
computed with K = 30. The corresponding values are equal                                               ˆ    ˆ          ˆ
                                                                          establishments. From φ1,11 , φ1,12 , and φ1,22 in Table 1, the cor-
to −6176.75, −6193.97, −6202.65, and −6217.35, respectively.              relation estimate of “worry about AIDS” and “attitude to-
Finally, the observed-data log likelihoods are also computed              ward condom use” at the within-groups level is −0.238. It is
with K = 20 and J = 3000 with 1000 burn-in iterations.                    interesting to note that this estimate is completely different
The corresponding values are −6171.72, −6186.09, −6197.28,                from the corresponding correlation estimate at the between-
and −6207.16, respectively. Hence, we obtain close estimates                                  ˆ
                                                                          groups level (φ2,23 = 0.518). This may indicate a difference
of observed-data log likelihoods under different settings, and             between these latent variables at the CSW and establishment
the same conclusion as before.                                            levels. From the estimates (and SEs) of the elements in A2 ,
    A standard approach to check the fit of the selected model             we know that the availability of condoms in establishments
is to evaluate GOF =              (ˆk − gk )2 /ˆk , where the summa-
                                   e           e                          has a significant effect on most indicators for CSWs’ “us-
tion is taken over the number of distinct response patterns               age of condom” and “worry about AIDS.” This may indicate
n0 (≤min{618, 23 × 56 } in this example), gk is the observed              that an establishment’s policy has some influence on promot-
frequency, and ek is probability of the response pattern k.
                     ˆ                                                    ing safer sexual practice and attitude toward AIDS. From
Theoretically, path sampling can be applied to compute ek             ˆ   the magnitudes of ˆ1 (=−0.041), ˆ2 (=0.197), and γ1 (=0.086),
                                                                                                    b             b                  ˆ
by treating it as an individual likelihood. However, as n 0 is            it seems that “usage of condom” is not greatly affected by
usually very large (483 in this example), it is impractical               CSWs’ knowledge and “worry about AIDS.” Rather, as we
to evaluate GOF. Moreover, for the current complex model                                     ˆ
                                                                          can see from γ2 (=0.406), they tend to use condoms more if
and data structure, the sampling distribution of GOF cannot               they feel condoms have no bad effects on their sexual activi-
be routinely derived. Inspired by model checking techniques                             ˆ
                                                                          ties. From γ12 (=−0.430), “worry about AIDS” and “attitude
in regression, we use the following estimated residuals to                toward condom use” have a negative interaction on “usage of
reveal the goodness-of-fit of M0 to the empirical data with                condom.” The basic interpretation is that the additive effect
the latent structure: δ1gi = η1gi − ˆ1 c1gi1 − ˆ2 c1gi2 − γ1 ξ1gi1 −
                              ˆ      ˆ     b          b         ˆ ˆ       of the fixed covariates (c1gi 1 and c1gi 2 ) and the linear random
ˆ ˆ        ˆ ˆ ˆ                        ˆ           ˆ ˆ ˆ
γ2 ξ1gi2 − γ12 ξ1gi1 ξ1gi2 , ε2g = vg − A2 cvg − Λ2 ω 2g , ε1gi = ugi −
                             ˆ     ˆ                               ˆ      variables (“worry about AIDS,” ξ 1gi 1 ; “attitude toward con-
        ˆ ˆ ˆ
vg − µ − Λ1 ω 1gi , where estimates of the latent vectors (vg ,
ˆ                                                                         dom use,” ξ 1gi 2 ) in the structural equation are inadequate to
ω 1gi , and ω 2g ) and latent quantities (wgi and ygi ) are               account for their relationships with the latent variable “us-
obtained from the observations that are simulated by the                  age of condom, η,” and a negative interaction term of ξ 1gi 1
hybrid algorithm in the E-step of the final MCEM iterations.               and ξ 1gi 2 has to be added (see equation [24]). Depending on
           ˆ
Plots of δ1gi , ε1gi (1), and ε2g (1) are displayed in Figure 2.
                 ˆ                 ˆ                                      different situations, this negative interaction term has various
The residual plots that correspond to other components of                 impacts. For example, it indicates that less worry about AIDS
ˆ          ˆ
ε1gi and ε2g are similar. These plots lie within two parallel             and regarding condoms as having no bad effects on sexual ac-
horizontal lines that are centered at zero, and no linear or              tivities would have a positive effect on condom use; and more
quadratic trends are detected. This roughly indicates that                worry about AIDS and regarding condoms as having bad ef-
the proposed measurement and structural equations at both                 fects on sexual activities would also have a positive effect on
levels are adequate.                                                      condom use. As “worry about AIDS” and “attitude toward
    Comparing M0 with M1 , we conclude that a two-level model             condom use” are negatively correlated, CSWs generally would
with an interaction term of the exogenous latent variables in             have the above behaviors. To give one more interpretation of
the first-level structural equation is significantly better than a          the negative interaction, it also reveals that more worry about
simple model with a linear structural equation. By comparing              AIDS and regarding condoms having no bad sexual effects
M0 with M2 and M3 , we conclude that the model with fixed co-              would have a too strong additive effect on condom use, and
variates c1gi 1 and c1gi 2 in the first-level structural equation and      needs to be reduced by a negative interaction effect (however,
the fixed covariate cvg in the second-level measurement equa-              as ξ 1gi 1 and ξ 1gi 2 are negatively correlated, this situation is not
tion is significantly better. To draw these conclusions, it is             common).
necessary to develop a comprehensive model that can accom-                   In conclusion, in order to obtain more meaningful and
modate the fixed covariates and nonlinearities of the latent               valid statistical inferences of the complicated data set, we ap-
variables in the analysis. It is not our intention to claim that          ply a two-level SEM which includes a between-group factor
M0 is the best model for this example; competing models can               analysis model with fixed covariates, and a complex within-
be considered by applying the proposed method, if necessary.              group SEM that involves a nonlinear structural equation with
    In the selected model, a factor analysis model with three             fixed covariates, linear and interaction terms of latent vari-
correlated latent variables and a fixed covariate is proposed for          ables. Our program is written in C and is available upon
the establishment level. A nonlinear SEM with fixed covari-                request.
                                   Latent Variable Model with Hierarchically Mixed Data                                         635


6. Discussion                                                                               e             e
                                                                     fets de covariables fix´es dans les ´quations de mesure et
Despite its generality, the proposed two-level SEM is defined              e
                                                                     les ´quations structurelles aux deux niveaux. De plus la
by measurement and structural equations that describe the               e               e           e a          e    e
                                                                     m´thodologie peut ˆtre appliqu´e ` des donn´es hi´rarchiques
                                                                                                                           e
                                                                     mixtes continues, binaires et polytomiques. On impl´mente
relationships among the observed and latent variables at both
                                                                     un algorithme EM Monte Carlo pour obtenir l’estimateur
levels by conceptually simple regression models. Compared to                                            e
                                                                     du maximum de vraisemblance. L’´tape E est accomplie par
the other approach that links latent variables with a general-                               e
                                                                     l’approximation des esp´rances conditionnelles au travers des
ized linear model, the measurement and structural equations                            e        e                            e
                                                                     observations simul´es par m´thode MCMC, tandis que l’´tape
are directly connected with the Gaussian distribution, and           M est accomplie par une maximisation conditionnelle. On
hence lead to a more straightforward interpretation of the                          e
                                                                     propose une m´thode de calcul de la log-vraisemblance des
parameters.                                                                e        e             e
                                                                     donn´es observ´es, et du crit`re BIC pour la comparaison de
   Consider the following four major components of the pro-               e          e                  e                      e
                                                                     mod`les. Les m´thodes sont illustr´es par un jeu de donn´es
posed model and the data structure: (1) a two-level model             e
                                                                     r´elles.
for hierarchically structured data, (2) discrete natures of the
data, (3) fixed covariates, and (4) nonlinear structural equa-                                References
tions. The first two components are important for achieving
correct statistical results. The last two components are es-         Bentler, P. M. (1995). EQS: Structural Equation Program
sential to analyzing more complicated situations. Nonlinear               Manual. Los Angeles: BMDP Statistical Software.
terms of latent variables have been found to be useful in busi-      Bentler, P. M. and Stein, J. A. (1992). Structural equation
ness, education, and the social sciences. As models in biolog-            models in medical research. Statistical Methods in Medical
ical and medical research are more subtle, we expect nonlin-              Research 1, 159–181.
ear terms should be useful as well. We admit that in practice        Bock, R. D. and Aitkin, M. (1981). Marginal maximum like-
situations might not often arise that require all of the above            lihood estimation of item parameters: Application of an
components to fit the data. Still, our development is useful for           EM algorithm. Psychometrika 46, 443–461.
providing a general framework for analyzing the large num-           Bollen, K. A. (1989). Structural Equation with Latent Vari-
bers of its submodels. This is particularly true from a model             ables. New York: Wiley.
comparison perspective. For example, even if a linear model          Booth, J. G. and Hobert, J. P. (1999). Maximum generalized
is better than a nonlinear model in fitting a data set, such a             linear model likelihoods with an automated Monte Carlo
conclusion cannot be reached without the model comparison                 EM algorithm. Journal of the Royal Statistical Society,
statistic under the more general nonlinear model framework.               Series B 61, 265–285.
We use BIC because it is more common, and it tends to favor          Chib, S. and Greenberg, E. (1998). Analysis of multivariate
simpler models than those chosen by the Akaike information                probit models. Biometrika 85, 347–361.
criterion (AIC; see Kass and Raftery, 1995).                         Christofferson, A. (1975). Factor analysis for dichotomized
   Similar to most complex statistical models, the proposed               variables. Psychometrika 37, 29–51.
model is developed under normality assumptions for latent            Dolan, C. V., Molenaar, P. C. M., and Boomsma, D. I. (1991).
variables. In light of the fact that these assumptions may not            Simultaneous genetic analysis of longitudinal means and
always be true in practice, the development of techniques that            covariance structure in the simplex model using twin
are less reliant on these assumptions, such as that given in              data. Behavior Genetics 21, 49–65.
Wall and Amemiya (2000), would be useful. Another limita-            Dunson, D. B. (2000). Bayesian latent variable models for
tion of our model is the difficulty in developing computation-              clustered mixed outcomes. Journal of the Royal Statistical
ally feasible statistics for rigorously assessing its goodness-of-        Society, Series B 62, 355–366.
fit. Residual analyses are presented in this article, but more        Gelman, A. and Meng, X. L. (1998). Simulating normalizing
research in this direction is necessary.                                  constant: From importance sampling to bridge sampling
                                                                          to path sampling. Statistical Science 13, 163–185.
                                                                     Geman, S. and Geman, D. (1984). Stochastic relaxation,
                   Acknowledgements                                       Gibbs distributions, and the Bayesian restoration of im-
                                                                          ages. IEEE Transactions on Pattern Analysis and Ma-
This research was fully supported by a grant from the
                                                                          chine Intelligence 6, 721–741.
Research Grants Council of the Hong Kong Special Admin-
                                                                     Guo, S. W. and Thompson, E. A. (1994). Monte Carlo es-
istrative Region, China (project CUHK 4243/02H). We are
                                                                          timation of mixed models for large complex pedigrees.
thankful to the editor, associate editor, and two reviewers for
                                                                          Biometrics 50, 417–432.
valuable comments that improved the article substantially,
                                                                     Hastings, W. K. (1970). Monte Carlo sampling methods using
and to D. E. Morisky and J. A. Stein for the use of their
                                                                          Markov chains and their application. Biometrika 57, 97–
AIDS data set.
                                                                          109.
                                                                      o                     o
                                                                     J¨reskog, K. G. and S¨rbom, D. (1996). LISREL 8: Structural
                            ´   ´
                           Resume                                         Equation Modeling with the SIMPLIS Command Lan-
On d´veloppe un mod`le g´n´ral ` variables latentes ` deux
     e                e    e e   a                   a                    guage. Hove, London: Scientific Software International.
niveaux pour donner un cadre coh´rent ` la comparai-
                                     e     a                         Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal
son de sous-mod`les vari´s. On peut ainsi, dans ce cadre,
                 e       e                                                of the American Statistical Association 90, 773–795.
analyser aussi bien des relations non lin´aires entre vari-
                                         e                           Lee, S. Y. and Shi, J. Q. (2001). Maximum likelihood esti-
                        e
ables latentes dans les ´quations structurelles, que les ef-              mation of two-level latent variable models with mixed
636                                               Biometrics, September 2004

     continuous and polytomous data. Biometrics 57, 787–                  Evolutionary Biology. New York: Cambridge University
     794.                                                                 Press.
Lee, S. Y. and Song, X. Y. (2003). Bayesian analysis of              Rabe-Hesketh, S., Skrondal, A., and Pickles, A. (2003). Gen-
     structural equation models with dichotomous variables.               eralized multilevel structural equation modeling. Psy-
     Statistics in Medicine 22, 3073–3088.                                chometrika, in press.
Lee, S. Y. and Zhu, H. T. (2000). Statistical analysis of non-       Sammel, M. D. and Ryan, L. M. (1996). Latent variables with
     linear structural equation models with continuous and                fixed effects. Biometrics 52, 220–243.
     polytomous data. British Journal of Mathematical and            Sammel, M. D., Ryan, L. M., and Legler, J. M. (1997). La-
     Statistical Psychology 53, 209–232.                                  tent variable models for mixed discrete and continuous
Lee, S. Y. and Zhu, H. T. (2002). Maximum likelihood es-                  outcomes. Journal of the Royal Statistical Society, Series
     timation of nonlinear structural equation models. Psy-               B 59, 667–678.
     chometrika 67, 189–210.                                         Schumacker, R. E. and Marcoulides, G. A. (1998). Interac-
Louis, T. A. (1982). Finding the observed information matrix              tion and Nonlinear Effects in Structural Equation Models.
     when using EM algorithm. Journal of the Royal Statistical            Mahwah, NJ: Lawrence Erlbaum Associates.
     Society, Series B 44, 226–233.                                  Shi, J. Q. and Copas, J. (2002). Publication bias and meta-
Meng, X. L. and Rubin, D. B. (1993). Maximum likelihood                   analysis for 2 × 2 tables: An average Markov chain Monte
     estimation via the ECM algorithm: A general framework.               Carlo EM algorithm. Journal of the Royal Statistical So-
     Biometrika 80, 267–278.                                              ciety, Series B 64, 221–236.
Meng, X. L. and Schilling, S. (1996). Fitting full-information       Shi, J. Q. and Lee, S. Y. (2000). Latent variable mod-
     item factor models and an empirical investigation of                 els with mixed continuous and polytomous data. Jour-
     bridge sampling. Journal of American Statistical Asso-               nal of the Royal Statistical Society, Series B 62, 77–
     ciation 91, 1254–1267.                                               87.
Meng, X. L. and Wong, H. W. (1996). Simulating ratios of             Wall, M. M. and Amemiya, Y. (2000). Estimation for polyno-
     normalizing constants via a simple identity: A theoretical           mial structural equation models. Journal of the American
     exploration. Statistica Sinica 6, 831–360.                           Statistical Association 95, 929–940.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N.,                Wang, C., Douglas, J., and Anderson, S. (2002). Item response
     Teller, A. H., and Teller, E. (1953). Equations of state             models for joint analysis of quality of life and survival.
     calculations by fast computing machines. Journal of                  Statistics in Medicine 21, 129–142.
     Chemical Physics 21, 1087–1092.                                 Wei, G. C. G. and Tanner, M. A. (1990). A Monte Carlo
Morisky, D. E., Tiglao, T. V., Sneed, C. D., Tempongko,                   implementation of the EM algorithm and the poor man’s
     S. B., Baltazar, J. C., Detels, R., and Stein, J. A. (1998).         data augmentation algorithm. Journal of the American
     The effects of establishment practices, knowledge and at-             Statistical Association 85, 699–704.
     titudes on condom use among Filipino sex workers. AIDS
     Care 10, 213–220.
Pugesek, B. H., Tomer, A., and von Eye, A. (2003). Struc-                        Received April 2003. Revised December 2003.
     tural Equation Modeling Applications in Ecological and                                Accepted January 2004.



                                                            Appendix
                                                              Notation

                 Observed continuous obs.                Xg = (xg1 , . . . , xgNg )             X = (X1 , . . . , XG )
                 Observed dichotomous obs.               Dg = (dg1 , . . . , dgNg )             D = (D1 , . . . , DG )
                 Observed polytomous obs.                Zg = (zg1 , . . . , zgNg )             Z = (Z1 , . . . , ZG )
                 Unobserved continuous                   Wg = (wg1 , . . . , wgNg )             W = (W1 , . . . , WG )
                  measurements                           Yg = (yg1 , . . . , ygNg )             Y = (Y1 , . . . , YG )
                                                         First level                            Second level
                 Latent variables                        ω 1gi = (η 1gi , ξ 1gi )               ω 2g = (η 2g , ξ 2g )
                                                         Ω1g = (ω 1g1 , . . . , ω 1gNg )        Ω2 = (ω 21 , . . . , ω 2G )
                                                         Ω1 = (Ω11 , . . . , Ω1G )
                 Measurement errors                      ε1gi , δ 1gi                           ε2g , δ 2g
                 Fixed covariates                        cugi , c1gi                            cvg , c2g
                 Regression coefficients                   A1 , Λ1 , B1 , Π1 , Γ1                 A2 , Λ2 , B2 , Π2 , Γ2
                 Covariance matrices                     Φ1 , Ψ1 , Ψ1δ                          Φ2 , Ψ2 , Ψ2δ

				
DOCUMENT INFO