# SUGI 28 An Introduction to the Analysis of Mixed Models

*SUGI 28 — hold on: Statistics and Data Analysis, Paper 253-28*

**Dallas E. Johnson, Kansas State University, Manhattan, KS**
## ABSTRACT

This paper introduces the General Linear Mixed Model (GLMM), compares various alternatives for estimating estimable functions of the model parameters, and provides some methods of estimation. The paper also considers estimating the standard errors of the various estimates of estimable functions and how the estimates and their estimated standard errors can be used for statistical inference. Special cases of a GLMM include all types of split-plot experiments, repeated measures experiments, and various combinations of these. Examples are given to illustrate how these kinds of experiments fit into a GLMM framework.

## INTRODUCTION

The General Linear Mixed Model (GLMM) is defined by

$$ y = X\beta + Zu + \varepsilon, $$

where $y$ is an $n\times 1$ observable data vector, $\beta$ is a $p\times 1$ vector of unknown parameters, $u$ is a $q\times 1$ vector of unobservable random variables, $X$ and $Z$ are design matrices corresponding to the fixed and random effects, respectively, and $\varepsilon$ is a vector of random errors. The basic assumptions are that

$$ \varepsilon \sim N(0,\, R_{n\times n}), \qquad u \sim N(0,\, G_{q\times q}), \qquad \operatorname{COV}(\varepsilon, u) = 0_{n\times q}. $$

Note that

$$ E(y) = X\beta $$

and

$$ \operatorname{Cov}(y) = ZGZ' + R = V \ \text{(say)}. $$

Thus $y \sim N(X\beta,\, V)$.

## ESTIMATORS OF $\ell'\beta$

Four different estimators of $\ell'\beta$ are introduced, where $\ell'\beta$ is an estimable function of the parameters in $\beta$; that is, $\ell$ belongs to the column space of $X'$.

### Generalized Least Squares Estimators

If $G$ and $R$ are both known, then $V$ is also known, and in this special case the uniformly minimum variance unbiased (UMVU) estimate of any estimable function $\ell'\beta$ is $\ell'\hat\beta$, where

$$ \hat\beta = (X'V^{-1}X)^{-}X'V^{-1}y. $$

Furthermore, under the basic assumptions given above,

$$ \ell'\hat\beta \sim N\bigl(\ell'\beta,\; \ell'(X'V^{-1}X)^{-}\ell\bigr). $$

This estimator of $\ell'\beta$ is called the Generalized Least Squares (GLS) estimator.

### Estimated Generalized Least Squares Estimators

In practice, $G$ and $R$ are rarely known! However, suppose $G$ and $R$ can be estimated in some way by $\hat G$ and $\hat R$, respectively. Then $\ell'\beta$ could be estimated by $\ell'\tilde\beta$, where

$$ \tilde\beta = (X'\hat V^{-1}X)^{-}X'\hat V^{-1}y \quad\text{and}\quad \hat V = Z\hat GZ' + \hat R. $$

The estimator $\ell'\tilde\beta$ is called the Estimated Generalized Least Squares (EGLS) estimator of $\ell'\beta$. In most balanced experiments $\ell'\tilde\beta$ is an unbiased estimator of $\ell'\beta$, but in unbalanced experiments $\ell'\tilde\beta$ cannot be shown to be an unbiased estimator of $\ell'\beta$.

### Ordinary Least Squares Estimators

Another estimator of $\ell'\beta$, called the Ordinary Least Squares (OLS) estimator of $\ell'\beta$, is $\ell'\hat\beta_{OLS}$, where

$$ \hat\beta_{OLS} = (X'X)^{-}X'y. $$

This estimator is always an unbiased estimator of $\ell'\beta$. In many balanced experiments $\ell'\hat\beta_{OLS} = \ell'\hat\beta$, the GLS estimator. When this is true, $\ell'\hat\beta_{OLS}$ is an unbiased estimator of $\ell'\beta$. It can also be shown that

$$ \ell'\hat\beta_{OLS} \sim N\bigl(\ell'\beta,\; \ell'(X'X)^{-}(X'VX)(X'X)^{-}\ell\bigr). $$
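To make the estimators above concrete, here is a minimal NumPy sketch on an invented balanced layout (two treatment groups crossed with three random blocks); all sizes and variance values are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented toy layout: 2 treatment groups x 3 random blocks x 2 observations
n = 12
X = np.kron(np.eye(2), np.ones((6, 1)))                             # fixed effects (cell means)
Z = np.kron(np.ones((2, 1)), np.kron(np.eye(3), np.ones((2, 1))))   # random block effects

G = 0.5 * np.eye(3)          # Cov(u), taken as known for the GLS step
R = 1.0 * np.eye(n)          # Cov(eps)
V = Z @ G @ Z.T + R          # Cov(y) = Z G Z' + R

beta_true = np.array([1.0, 2.0])
y = X @ beta_true + Z @ rng.normal(0, np.sqrt(0.5), 3) + rng.normal(0, 1.0, n)

# GLS: beta_hat = (X' V^-1 X)^- X' V^-1 y
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.pinv(X.T @ Vinv @ X) @ X.T @ Vinv @ y

# EGLS: same formula with V replaced by Vhat built from pretend estimates of G, R
Ghat, Rhat = 0.4 * np.eye(3), 1.1 * np.eye(n)
Vhinv = np.linalg.inv(Z @ Ghat @ Z.T + Rhat)
beta_egls = np.linalg.pinv(X.T @ Vhinv @ X) @ X.T @ Vhinv @ y

# OLS: beta_hat = (X'X)^- X' y
beta_ols = np.linalg.pinv(X.T @ X) @ X.T @ y

# Estimable function: the treatment difference, l'beta with l = (1, -1)
l = np.array([1.0, -1.0])
var_gls = l @ np.linalg.pinv(X.T @ Vinv @ X) @ l          # l'(X'V^-1 X)^- l
A = np.linalg.pinv(X.T @ X)
var_ols = l @ A @ (X.T @ V @ X) @ A @ l                   # l'(X'X)^-(X'VX)(X'X)^- l
```

Because this toy design is balanced, the GLS, EGLS, and OLS estimates of the treatment difference coincide, as the paper notes for balanced experiments; in unbalanced data they generally differ.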
Even though $\ell'\tilde\beta$ is not unbiased for $\ell'\beta$ in unbalanced cases, one might expect that it should still be a reasonable estimator of $\ell'\beta$. In most unbalanced cases one cannot determine the variance of $\ell'\tilde\beta$. However, note that

$$ \operatorname{Var}(\ell'\hat\beta) = \ell'(X'V^{-1}X)^{-}\ell, $$

and so it would seem reasonable to approximate the variance of $\ell'\tilde\beta$ by $\ell'(X'\hat V^{-1}X)^{-}\ell$, so that an approximate estimated standard error of $\ell'\tilde\beta$ is

$$ \widehat{\operatorname{se}}(\ell'\tilde\beta) = \sqrt{\ell'(X'\hat V^{-1}X)^{-}\ell}. $$

In all cases, with any of these estimators, in order to make inferences about $\ell'\beta$ it is usually assumed that

$$ \frac{\ell'\tilde\beta - \ell'\beta}{\widehat{\operatorname{se}}(\ell'\tilde\beta)} \sim t(\nu) $$

for some appropriate value of $\nu$, which usually has to be estimated in some way.

### GLM Estimators

Another possible estimator of $\ell'\beta$ is $\ell'\hat\beta_{GLM} + a'\hat u$, where

$$ \begin{bmatrix} \hat\beta_{GLM} \\ \hat u \end{bmatrix} = (W'W)^{-}W'y, \qquad W = [\,X\ \ Z\,], $$

and where $a$ is chosen so that $[\ell'\ \ a']'$ belongs to the column space of $W'$. This estimator is called the GLM estimator, as it is the estimator that one gets from SAS-GLM when the random effects are treated as fixed effects. The GLM estimator can be shown to be an unbiased estimator of $\ell'\beta$. One would like to choose $a$ so that $\operatorname{VAR}(\ell'\hat\beta_{GLM} + a'\hat u)$ is minimized. The variance of $\ell'\hat\beta_{GLM} + a'\hat u$ is equal to

$$ [\ell'\ \ a'](W'W)^{-}W'VW(W'W)^{-}\begin{bmatrix}\ell \\ a\end{bmatrix}. $$

Choosing an appropriate $a$ is easy in balanced cases, and in these cases SAS-GLM does the work for you. Choosing an appropriate $a$ in unbalanced cases is much harder, and SAS-GLM rarely chooses an appropriate $a$ for you. Usually SAS-GLM gives you a message indicating that the LSMEANS are not estimable, or, if you are using the ESTIMATE and/or CONTRAST statements, you often get a message that the contrast is not estimable. In all cases, this happens because the criterion that SAS-GLM uses to select $a$ does not provide an $a$ such that $[\ell'\ \ a']'$ belongs to the column space of $W'$. The GLM estimator of $\ell'\beta$ should also be a reasonable estimator of $\ell'\beta$; like the OLS estimator of $\ell'\beta$, it is unbiased. To estimate the variance of the GLM estimator, one would need to estimate $R$ and $G$; $\operatorname{Var}(\ell'\hat\beta_{GLM} + a'\hat u)$ is then estimated by replacing $V$ with $\hat V$ in the expression above, and the approximate estimated standard error of the GLM estimator is the square root of that estimated variance.
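A sketch of the GLM estimator on the same kind of invented balanced toy layout used earlier: the random block effects are treated as fixed in $W = [X\ Z]$, and $a$ is chosen so that $(\ell',\, a')'$ lies in the column space of $W'$. In this balanced design $a = 0$ works, because the average of the group-1 rows of $W$ minus the average of the group-2 rows has $X$-part $(1, -1)$ and $Z$-part $0$; none of these numbers come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented toy layout: 2 treatment groups x 3 random blocks x 2 observations
n = 12
X = np.kron(np.eye(2), np.ones((6, 1)))
Z = np.kron(np.ones((2, 1)), np.kron(np.eye(3), np.ones((2, 1))))
G, R = 0.5 * np.eye(3), np.eye(n)
V = Z @ G @ Z.T + R
y = X @ np.array([1.0, 2.0]) + Z @ rng.normal(0, np.sqrt(0.5), 3) + rng.normal(0, 1.0, n)

# Treat the random effects as fixed: W = [X  Z]
W = np.hstack([X, Z])
sol = np.linalg.pinv(W) @ y          # one solution of the normal equations
beta_t, u_t = sol[:2], sol[2:]

# Estimable function: treatment difference l'beta, with a = 0 as argued above
l, a = np.array([1.0, -1.0]), np.zeros(3)
c = np.concatenate([l, a])
glm_est = l @ beta_t + a @ u_t

# Variance: c'(W'W)^- W' V W (W'W)^- c
WtW_inv = np.linalg.pinv(W.T @ W)
glm_var = c @ WtW_inv @ W.T @ V @ W @ WtW_inv @ c
```

Since $(\ell',\, a')'$ lies in the column space of $W'$, the estimate does not depend on which solution of the (singular) normal equations is used; here it reduces to the difference of the two group means.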
In order to compute the EGLS estimate of $\ell'\beta$, one must first be able to estimate $R$ and $G$. While general estimates of $R$ and $G$ do not exist, there are many special cases in which one can estimate $R$ and $G$.

### Split-Plot Type Cases

In many cases the GLMM can be written as

$$ y = X\beta + Z_1u_1 + Z_2u_2 + \cdots + Z_ru_r + \varepsilon, \qquad (1) $$

where it can be assumed that $u_i \sim$ independent $N(0, \sigma_i^2 I)$ for $i=1,2,\ldots,r$, that $\varepsilon \sim N(0, \sigma_\varepsilon^2 I)$, and that $\operatorname{COV}(u_i, \varepsilon) = 0$ for all $i$. Each $u_i$ is a $q_i\times 1$ random vector with corresponding design matrix $Z_i$, which is an $n\times q_i$ matrix. In this case,

$$ V = \sigma_1^2 Z_1Z_1' + \sigma_2^2 Z_2Z_2' + \cdots + \sigma_r^2 Z_rZ_r' + \sigma_\varepsilon^2 I. $$

Here estimating $V$ is accomplished by estimating $\sigma_1^2, \sigma_2^2, \ldots, \sigma_r^2$, and $\sigma_\varepsilon^2$, called variance components.

All split-plot type experiments fit into this framework. Repeated measures experiments also fit into this general framework when the repeated measures satisfy compound symmetry assumptions, and all repeated measures experiments can be written in the framework of model (1) simply by placing fewer restrictions on $R = \operatorname{COV}(\varepsilon)$.

**Example 1: A Simple Split-Plot Model.** Suppose

$$ y_{ijk} = \mu + \gamma_i + \tau_j + (\gamma\tau)_{ij} + \delta_{k(i)} + \varepsilon_{ijk} $$

for $i=1,2,\ldots,t$; $j=1,2,\ldots,b$; and $k=1,2,\ldots,n_i$; where $\delta_{k(i)} \sim$ i.i.d. $N(0, \sigma_\delta^2)$, $\varepsilon_{ijk} \sim$ i.i.d. $N(0, \sigma_\varepsilon^2)$, and where all the $\delta_{k(i)}$'s and $\varepsilon_{ijk}$'s are independent. Then $y = X\beta + Z_1u_1 + \varepsilon$, where

$$ y' = [\,y_{111}\ y_{112} \cdots y_{11n}\ \ y_{121}\ y_{122} \cdots y_{12n}\ \cdots\ y_{tb1}\ y_{tb2} \cdots y_{tbn}\,], $$
$$ \beta' = [\,\mu\ \ \gamma_1\ \gamma_2 \cdots \gamma_t\ \ \tau_1\ \tau_2 \cdots \tau_b\ \ (\gamma\tau)_{11}\ (\gamma\tau)_{12} \cdots (\gamma\tau)_{tb}\,], $$
$$ u_1' = [\,\delta_{1(1)}\ \delta_{2(1)} \cdots \delta_{n(1)}\ \cdots\ \delta_{1(t)}\ \delta_{2(t)} \cdots \delta_{n(t)}\,], $$
$$ \varepsilon' = [\,\varepsilon_{111}\ \varepsilon_{112} \cdots \varepsilon_{11n}\ \ \varepsilon_{121}\ \varepsilon_{122} \cdots \varepsilon_{12n}\ \cdots\ \varepsilon_{tb1}\ \varepsilon_{tb2} \cdots \varepsilon_{tbn}\,], $$

and $X$ and $Z_1$ are design matrices determined by the pattern of observed data. The ideal conditions for the errors in the split-plot model can be restated as $u_1 \sim N(0, \sigma_\delta^2 I_N)$, where $N = n_1 + n_2 + \cdots + n_t$, $\varepsilon \sim N(0, \sigma_\varepsilon^2 I_{Nb})$, and $u_1$ and $\varepsilon$ are independent. For the simple split-plot model, one must estimate $\sigma_\delta^2$ and $\sigma_\varepsilon^2$ in order to estimate $V$.
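The design matrices and $V$ for Example 1 can be built directly; a NumPy sketch with invented dimensions ($t=2$, $b=3$, $n=2$) and invented variance components:

```python
import numpy as np

t, b, n = 2, 3, 2            # treatments, subplot levels, whole plots per treatment
s2_delta, s2_eps = 0.8, 0.3  # invented variance components

one = np.ones
# Rows ordered as (i, j, k) with k fastest, matching Example 1
X = np.hstack([
    one((t * b * n, 1)),                                      # mu
    np.kron(np.eye(t), one((b * n, 1))),                      # gamma_i
    np.kron(one((t, 1)), np.kron(np.eye(b), one((n, 1)))),    # tau_j
    np.kron(np.eye(t), np.kron(np.eye(b), one((n, 1)))),      # (gamma tau)_ij
])
Z1 = np.kron(np.eye(t), np.kron(one((b, 1)), np.eye(n)))      # delta_k(i)

# V = sigma_delta^2 Z1 Z1' + sigma_eps^2 I
V = s2_delta * Z1 @ Z1.T + s2_eps * np.eye(t * b * n)
```

Observations on the same whole plot (same $i$ and $k$) share covariance $\sigma_\delta^2$; observations on different whole plots are uncorrelated.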
**Example 2: A Simple Repeated Measures Model.** Suppose

$$ y_{ijk} = \mu + \gamma_i + \tau_j + (\gamma\tau)_{ij} + \varepsilon_{ijk} $$

for $i=1,2,\ldots,t$; $j=1,2,\ldots,b$; and $k=1,2,\ldots,n_i$. Let $\varepsilon_{ik}' = [\,\varepsilon_{i1k}\ \varepsilon_{i2k} \cdots \varepsilon_{ibk}\,]$. Assume that the $\varepsilon_{ik}$'s are i.i.d. $N(0, \Sigma)$ for $i=1,2,\ldots,t$; $k=1,2,\ldots,n_i$. Thus $\varepsilon_{ik}$ represents the $b\times 1$ vector of errors corresponding to the $b$ repeated measures on the $k$th subject in the $i$th treatment group.

For the simple repeated measures experiment $Z_1 = 0$, which implies that $G = 0$ and $V = R$. Here

$$ R = \operatorname{COV}(\varepsilon) = \operatorname{diag}(\Sigma, \Sigma, \ldots, \Sigma) $$

for

$$ y' = [\,y_{111}\ y_{121} \cdots y_{1b1}\ \ y_{112}\ y_{122} \cdots y_{1b2}\ \cdots\ y_{t1n}\ y_{t2n} \cdots y_{tbn}\,]. $$

For the simple repeated measures model, one must estimate $\Sigma$ in order to estimate $V$.

## ESTIMATING G AND R

Methods for estimating $G$ and $R$ are considered in this section.

### Estimates of G and R in Split-Plot Type Experiments

For the general case, one has

$$ y = X\beta + Z_1u_1 + Z_2u_2 + \cdots + Z_ru_r + \varepsilon, $$

where it can be assumed that $u_i \sim$ independent $N(0, \sigma_i^2 I)$, $\varepsilon \sim N(0, \sigma_\varepsilon^2 I)$, and $\operatorname{COV}(u_i, \varepsilon) = 0$ for all $i$. In this case,

$$ V = \sigma_1^2 Z_1Z_1' + \sigma_2^2 Z_2Z_2' + \cdots + \sigma_r^2 Z_rZ_r' + \sigma_\varepsilon^2 I, $$

and estimating $V$ is accomplished by estimating $\sigma_1^2, \sigma_2^2, \ldots, \sigma_r^2$, and $\sigma_\varepsilon^2$.

### Methods of Estimating the Variance Components

**1. ANOVA (Method of Moments) Estimates**

(a) Fit the GLMM by assuming that the random effects in the model are fixed effects.

(b) Next obtain the corresponding ANOVA table, and compute the expected values of the observed mean squares in the ANOVA table under the true assumptions about the $u_i$'s and $\varepsilon$.

(c) Equate the observed mean squares to their expected mean squares, and solve the resulting system of equations for each of the variance components.

(d) Use the resulting solutions as the estimates of the variance components.

Advantages:

(a) The estimators of the variance components are unbiased.

(b) One can often approximate the degrees of freedom corresponding to the estimated standard errors of estimators of estimable functions of the fixed effects by using Satterthwaite's method.

(c) SAS-GLM can produce the necessary information to perform these analyses.

Disadvantages:

(a) There is no unique way in which to form an ANOVA table when the data are not balanced.

(b) The procedure can produce negative estimates of the variance components, which do not make sense.

(c) If some of the expected mean squares of the random effects in the ANOVA table depend on fixed effects, the method cannot be applied. This problem can be avoided by placing all of the fixed effects in the model first, followed by the random effects.

**2. Maximum Likelihood Method**

Find the values of $\beta$, $\sigma_1^2, \sigma_2^2, \ldots, \sigma_r^2$, and $\sigma_\varepsilon^2$ that maximize the likelihood function over the parameter space.

Advantages:

(a) Avoids negative estimates of the variance components.

Disadvantages:

(a) Numerically intensive.

(b) The resulting estimators are not unbiased.

(c) Solving the likelihood equations requires an iterative process which may or may not converge. Even when it converges, it may converge at a local maximum rather than a global maximum.

(d) Tends to underestimate the variance components.

(e) Distributional properties are not known except asymptotically.
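The ANOVA (method-of-moments) recipe can be sketched numerically for the balanced one-way random model $y_{ij} = \mu + u_i + e_{ij}$, for which $E[\mathrm{MSB}] = \sigma_e^2 + n\sigma_u^2$ and $E[\mathrm{MSW}] = \sigma_e^2$; the sizes and variance values below are invented for illustration:

```python
import numpy as np

# Balanced one-way random model: y_ij = mu + u_i + e_ij (mu taken as 0 here)
rng = np.random.default_rng(1)
a, n = 6, 5
s2u_true, s2e_true = 0.5, 1.0

def anova_estimates(y):
    # y is an a x n matrix of observations
    grand = y.mean()
    msb = n * ((y.mean(axis=1) - grand) ** 2).sum() / (a - 1)                # between MS
    msw = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum() / (a * (n - 1))   # within MS
    s2e_hat = msw                 # equate MSW to its expectation
    s2u_hat = (msb - msw) / n     # equate MSB to its expectation and solve
    return s2u_hat, s2e_hat

# Average the estimates over repeated simulations to see the (un)biasedness
reps = [anova_estimates(rng.normal(0, np.sqrt(s2u_true), (a, 1))
                        + rng.normal(0, np.sqrt(s2e_true), (a, n)))
        for _ in range(400)]
s2u_avg = np.mean([r[0] for r in reps])
s2e_avg = np.mean([r[1] for r in reps])
print(round(s2u_avg, 2), round(s2e_avg, 2))
```

Note that nothing prevents `(msb - msw) / n` from being negative in a given sample, which is exactly the negative-estimate drawback of the method.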
**3. Restricted Maximum Likelihood (REML)**

REML estimators of the variance components are found by maximizing that part of the likelihood function that is invariant to the fixed effects in the model. The GLMM is given by

$$ y = X\beta + Zu + \varepsilon. $$

Let $L$ be a full row rank matrix that satisfies $LX = 0$ and such that $\operatorname{rank}(L) = n - \operatorname{rank}(X)$, where $n$ is the dimension of $y$ (one may choose $L$ so that $LL' = I$). Let $y^* = Ly$. Then

$$ y^* \sim N\bigl(0,\; \sigma_1^2 LZ_1Z_1'L' + \sigma_2^2 LZ_2Z_2'L' + \cdots + \sigma_r^2 LZ_rZ_r'L' + \sigma_\varepsilon^2 I\bigr). $$

The likelihood function formed from $y^*$ depends only on the variance components, and the REML estimates of the variance components are those values of $\sigma_1^2, \sigma_2^2, \ldots, \sigma_r^2$, and $\sigma_\varepsilon^2$ that maximize the restricted likelihood function formed from $y^*$.

Advantages:

(a) Less numerically intensive than the Maximum Likelihood Method.

(b) The REML estimates and the ANOVA estimates agree when the data are balanced and all method-of-moments estimates of the variance components are non-negative.

(c) REML estimates tend to be less biased than the Maximum Likelihood estimates.

Disadvantages:

(a) The distributional properties of these estimators are not known, except asymptotically.
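A minimal NumPy sketch of the REML construction, for a balanced one-way random model; the model, sizes, and variance values are invented, and a coarse grid search stands in for a proper optimizer:

```python
import numpy as np

# Invented example: y = X*beta + Z*u + e, with X a single column of ones
rng = np.random.default_rng(2)
a, n = 6, 5
N = a * n
X = np.ones((N, 1))
Z = np.kron(np.eye(a), np.ones((n, 1)))
y = 2.0 + Z @ rng.normal(0, np.sqrt(0.5), a) + rng.normal(0, 1.0, N)

# Build L with LX = 0, LL' = I, rank(L) = N - rank(X): its rows span the
# orthogonal complement of the column space of X.
U, s, _ = np.linalg.svd(X, full_matrices=True)
r = int(np.sum(s > 1e-10))            # rank(X)
L = U[:, r:].T                        # (N - r) x N, orthonormal rows
ystar = L @ y                         # error contrasts: distribution free of beta
M = (L @ Z) @ (L @ Z).T

def neg_restricted_loglik(s2u, s2e):
    # -log restricted likelihood, up to an additive constant
    Vstar = s2u * M + s2e * np.eye(N - r)    # Cov(ystar)
    sign, logdet = np.linalg.slogdet(Vstar)
    return 0.5 * (logdet + ystar @ np.linalg.solve(Vstar, ystar))

# Coarse grid search for the REML estimates of (sigma_u^2, sigma_e^2)
grid_u = np.linspace(0.01, 2.0, 40)
grid_e = np.linspace(0.1, 2.0, 40)
vals = [(neg_restricted_loglik(su, se), su, se) for su in grid_u for se in grid_e]
_, s2u_reml, s2e_reml = min(vals)
print(s2u_reml, s2e_reml)
```

Because $LX = 0$, shifting $y$ by any $X\beta$ leaves $y^*$ unchanged, which is the sense in which the restricted likelihood is invariant to the fixed effects.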
### Repeated Measures (Continued)

In the simple repeated measures experiment, $y = X\beta + \varepsilon$, where

$$ \varepsilon \sim N\bigl(0,\; \operatorname{diag}(\Sigma, \Sigma, \ldots, \Sigma)\bigr). $$

If there are $p$ repeated measures on each subject, then each $\Sigma$ is a $p\times p$ matrix, provided that there are no missing data for any subject. Here, in order to estimate $V$, one must estimate $\Sigma$. The estimate of $\Sigma$ depends on whether one assumes any structure on $\Sigma$.

### Reasonable Structures on $\Sigma$

**1. Unstructured.** Here it is assumed only that $\Sigma$ is an arbitrary covariance matrix:

$$ \Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{12} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1p} & \sigma_{2p} & \cdots & \sigma_{pp} \end{bmatrix}. $$

**2. Compound Symmetry.** Here it is assumed that $\Sigma = \sigma^2 I + \sigma_1^2 J$, where $J$ is the $p\times p$ matrix of ones:

$$ \Sigma = \begin{bmatrix} \sigma^2 + \sigma_1^2 & \sigma_1^2 & \cdots & \sigma_1^2 \\ \sigma_1^2 & \sigma^2 + \sigma_1^2 & \cdots & \sigma_1^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_1^2 & \sigma_1^2 & \cdots & \sigma^2 + \sigma_1^2 \end{bmatrix}. $$

**3. Huynh-Feldt Conditions.** Here it is assumed that $\Sigma = \lambda I + \xi j' + j\xi'$ for some $\lambda$ and some vector $\xi$, where $j$ is the $p\times 1$ vector of ones. This implies

$$ \Sigma = \begin{bmatrix} \lambda + 2\xi_1 & \xi_1 + \xi_2 & \cdots & \xi_1 + \xi_p \\ \xi_1 + \xi_2 & \lambda + 2\xi_2 & \cdots & \xi_2 + \xi_p \\ \vdots & \vdots & \ddots & \vdots \\ \xi_1 + \xi_p & \xi_2 + \xi_p & \cdots & \lambda + 2\xi_p \end{bmatrix}. $$

**4. AR(1).** Here it is assumed that

$$ \Sigma = \sigma^2 \begin{bmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{p-1} \\ \rho & 1 & \rho & \cdots & \rho^{p-2} \\ \vdots & & & \ddots & \vdots \\ \rho^{p-1} & \rho^{p-2} & \cdots & \rho & 1 \end{bmatrix}. $$

In each of the above structures for $\Sigma$, as well as other possible structures, estimating $V$ requires one to estimate the parameters in $\Sigma$. For the unstructured case and the compound symmetry case, $\Sigma$ can be estimated either by method of moments (MoM), maximum likelihood (ML), or restricted maximum likelihood (REML); if there are no missing data on any subject, the estimates will agree. For all other cases, the parameters in $\Sigma$ must be estimated by ML or REML methods.

Finally, in each of these cases, once estimates of the variance parameters are found, $V$ can be estimated, and then $\ell'\beta$ is estimated by $\ell'\tilde\beta$, where $\tilde\beta = (X'\hat V^{-1}X)^{-}X'\hat V^{-1}y$. The standard error of $\ell'\tilde\beta$ is estimated by $\sqrt{\ell'(X'\hat V^{-1}X)^{-}\ell}$.
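The four structures can be generated programmatically; a small NumPy sketch, with $p$ and all parameter values invented for illustration:

```python
import numpy as np

p = 4                      # number of repeated measures (invented)

def compound_symmetry(s2, s2_1):
    # Sigma = s2*I + s2_1*J
    return s2 * np.eye(p) + s2_1 * np.ones((p, p))

def huynh_feldt(lam, xi):
    # Sigma = lam*I + xi*j' + j*xi'
    j = np.ones(p)
    return lam * np.eye(p) + np.outer(xi, j) + np.outer(j, xi)

def ar1(s2, rho):
    # Sigma[i, k] = s2 * rho^|i - k|
    idx = np.arange(p)
    return s2 * rho ** np.abs(idx[:, None] - idx[None, :])

cs = compound_symmetry(1.0, 0.5)
hf = huynh_feldt(1.0, np.array([0.2, 0.3, 0.4, 0.5]))
ar = ar1(1.0, 0.6)

# Compound symmetry is the special case of the Huynh-Feldt form with equal xi
cs_via_hf = huynh_feldt(1.0, np.full(p, 0.25))
```

The last line illustrates why repeated measures satisfying compound symmetry automatically satisfy the Huynh-Feldt conditions.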
## CONTACT INFORMATION

Dallas E. Johnson
Department of Statistics
101 Dickens Hall
Kansas State University
Manhattan, KS 66506-0802
Work Phone: 785-532-0510
Fax: 785-532-7736
Email: dallas@stat.ksu.edu
Web: http://www.ksu.edu/stats/facultypages/johnson.htm