Introduction to longitudinal analysis and repeated measures analysis



Source: Applied Longitudinal Analysis
Fitzmaurice, Laird, Ware
       Fundamental objectives of
          longitudinal studies

• Assess within-individual changes in
  response (individual growth curves)
• Explain differences in growth curves
  between individuals
             Longitudinal data


• A distinctive feature of longitudinal data is
  that the measurements are clustered
  (within people)
• Longitudinal data also have a temporal
  order
                 Clusters

• Measurements within a cluster (e.g., a
  person) are more similar than
  measurements in different clusters
• An individual’s propensity to respond – be
  it high, medium, or low – is shared by all
  repeated measures
• Measurements taken more closely in time
  are more strongly correlated
Normal probability density
Multivariate normal probability density
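
The two densities named above were presumably shown as formulas on the original slides. The standard forms are reproduced here as a reminder; the symbols (mu, sigma squared, Sigma, n) are conventional notation and are not taken from the source.

```latex
% Univariate normal density with mean \mu and variance \sigma^2
f(y) = \frac{1}{\sqrt{2\pi\sigma^{2}}}
       \exp\!\left\{ -\frac{(y-\mu)^{2}}{2\sigma^{2}} \right\}

% Multivariate normal density for an n-vector \mathbf{y}
% with mean vector \boldsymbol{\mu} and covariance matrix \Sigma
f(\mathbf{y}) = (2\pi)^{-n/2}\,|\Sigma|^{-1/2}
       \exp\!\left\{ -\tfrac{1}{2}
       (\mathbf{y}-\boldsymbol{\mu})^{\top}\Sigma^{-1}
       (\mathbf{y}-\boldsymbol{\mu}) \right\}
```

In the repeated measures setting, the covariance matrix Sigma is where the covariance pattern enters.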
  Regression with repeated measures



• Model the mean
• Model the covariance
  (the tendency of measurements within a person
  to vary together)
           Covariance patterns

• A variety of options are available to
  describe the covariance
• Parameters describing the covariance
  must be estimated along with the regression
  coefficients for the explanatory variables
• Some covariance patterns require more
  information than others (i.e., require more
  parameters to be estimated)
Variance component (VC) or simple
• Assumes the repeated measurements are
  uncorrelated
• Usually unrealistic for longitudinal data
• Only have to estimate one parameter (σ²)
           Unstructured covariance

• Assumes a distinct correlation between every
  pair of measurements
• The most complex covariance pattern
Unstructured covariance matrix
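
To make "most complex" concrete, the sketch below counts covariance parameters for a few common patterns. The function name and the pattern labels are hypothetical, and the counts assume a single common variance for every pattern except the unstructured one.

```python
def covariance_parameter_count(pattern: str, n_occasions: int) -> int:
    """Number of covariance parameters for n_occasions repeated measures.

    Assumes homogeneous variances for every pattern except 'unstructured'.
    """
    counts = {
        "vc": 1,                    # one variance, no correlation
        "cs": 2,                    # one variance, one correlation
        "ar1": 2,                   # one variance, one correlation
        "toeplitz": n_occasions,    # one variance, one covariance per lag
        # every variance and every pairwise covariance is distinct
        "unstructured": n_occasions * (n_occasions + 1) // 2,
    }
    return counts[pattern]


if __name__ == "__main__":
    for name in ("vc", "cs", "ar1", "toeplitz", "unstructured"):
        print(name, covariance_parameter_count(name, 4))
```

With four measurement occasions the unstructured pattern needs 10 parameters, and the count grows quadratically as occasions are added.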
             Compound symmetry

• Assumes the same correlations between all pairs of
  data points
• Requires estimating two parameters (σ², ρ)
Compound symmetry matrix
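
The matrix from this slide did not survive extraction. As a stand-in, here is a minimal sketch that builds a compound symmetry covariance matrix from a variance and a correlation; the function name and the example values are illustrative only.

```python
def compound_symmetry(n_occasions, sigma2, rho):
    """Covariance matrix with variance sigma2 on the diagonal and
    the same covariance sigma2 * rho for every pair of occasions."""
    return [
        [sigma2 if i == j else sigma2 * rho for j in range(n_occasions)]
        for i in range(n_occasions)
    ]


if __name__ == "__main__":
    # Illustrative values, not estimates from any study in these slides
    for row in compound_symmetry(4, sigma2=40.0, rho=0.6):
        print(row)
```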
                   Toeplitz

• Assumes the correlation between equally spaced
  points is constant
                    Spatial Power
• Correlations decline with increasing spacing between
  points
• Allows unequal spacing between points
• Requires estimating two parameters (σ², ρ)
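
A minimal sketch of the spatial power pattern, in which the covariance between two measurements is the variance times the correlation raised to the time separation. The function name and parameter values are assumptions for illustration; the measurement times 0, 1, 4 and 6 are borrowed from the blood lead example later in the slides.

```python
def spatial_power(times, sigma2, rho):
    """Covariance matrix with entries sigma2 * rho ** |t_i - t_j|.

    Works for unequally spaced times because the actual time
    separation, not the occasion number, sets the exponent.
    """
    return [[sigma2 * rho ** abs(ti - tj) for tj in times] for ti in times]


if __name__ == "__main__":
    # Illustrative parameter values only
    for row in spatial_power([0, 1, 4, 6], sigma2=40.0, rho=0.8):
        print([round(v, 3) for v in row])
```

When the times are equally spaced one unit apart, this reduces to the AR(1) pattern.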
                   Spatial Gaussian

• Correlations decline with increasing distances between points
• Requires 2 parameters
• Allows unequally spaced points
Autoregressive (AR(1))
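
The AR(1) matrix from this slide was lost in extraction. For four equally spaced occasions the standard form is sketched below, using the same σ² and ρ notation as the other patterns; the choice of four occasions is only for illustration.

```latex
\operatorname{Cov}(\mathbf{y}_i) = \sigma^{2}
\begin{pmatrix}
1        & \rho     & \rho^{2} & \rho^{3} \\
\rho     & 1        & \rho     & \rho^{2} \\
\rho^{2} & \rho     & 1        & \rho     \\
\rho^{3} & \rho^{2} & \rho     & 1
\end{pmatrix}
```

The correlation decays geometrically with the number of occasions separating two measurements.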
   Selecting covariance structures

Structure            Equal     Unequal    Different time points
                     spacing   spacing    across subjects
Compound symmetry    Yes       Yes        Yes
Unstructured         Yes       Yes        No
AR(1)                Yes       No         No
Toeplitz             Yes       No         No
Spatial structures   Yes       Yes        Yes
Modeling covariance versus modeling the mean
   Generalized estimating equations
  (GEE) for repeated measurements


• Estimate regression coefficients as usual
  for the explanatory variables
• Estimate coefficients as well for the
  correlation matrix
How to choose the ‘best’ covariance pattern

 • Fit a complex model for the mean (i.e.,
   overfit)
 • Try nested covariance patterns and select
   the best using likelihood ratio tests
   (differences in -2 log likelihood)
 • For patterns that are not nested, choose the
   covariance pattern with the lowest Akaike
   Information Criterion (AIC)
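
The selection steps above can be sketched as follows. The log-likelihoods and parameter counts are hypothetical placeholders, not results from any study in these slides, and the helper names are made up for the example.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: smaller is better."""
    return -2.0 * log_likelihood + 2.0 * n_params


def likelihood_ratio_statistic(loglik_reduced, loglik_full):
    """Difference in -2 log likelihood between two nested models.

    Compare against a chi-square distribution whose degrees of freedom
    equal the difference in the number of covariance parameters.
    """
    return 2.0 * (loglik_full - loglik_reduced)


# Hypothetical fits: pattern -> (log likelihood, covariance parameters)
fits = {
    "cs": (-1250.0, 2),
    "ar1": (-1245.0, 2),
    "unstructured": (-1238.0, 10),
}

aic_by_pattern = {name: aic(ll, k) for name, (ll, k) in fits.items()}
best_pattern = min(aic_by_pattern, key=aic_by_pattern.get)

# Compound symmetry is nested within unstructured, so a likelihood
# ratio test applies (here on 10 - 2 = 8 degrees of freedom).
lr_cs_vs_un = likelihood_ratio_statistic(fits["cs"][0], fits["unstructured"][0])

if __name__ == "__main__":
    print(aic_by_pattern, best_pattern, lr_cs_vs_un)
```

Compound symmetry and AR(1) are not nested in each other, which is why the AIC is used to compare them.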
 Misspecified covariance patterns
            with GEEs
• GEEs have an adjustment (empirical, or robust,
  standard errors) that corrects inference if the
  covariance pattern is misspecified
• The adjustment requires repeated
  measurements at the various
  combinations of the explanatory variables
• The adjustment works best for similar
  measurement times, and with limited
  missing data
               Sample size


• The number of subjects should be large
  relative to the number of measurements
• If you have 5-12 explanatory variables you
  need at least 100 clusters; to be
  reasonably confident, you probably need
  200 clusters
Oral treatment to reduce blood lead levels



[Figure annotation: A rebound occurs from stored sources of lead]
      Study data

id   group   time   lead

 1     P       0    30.8
 1     P       1    26.9
 1     P       4    25.8
 1     P       6    23.8
 2     A       0    26.5
 2     A       1    14.8
 2     A       4    19.5
 2     A       6    21.0
 3     A       0    25.8
 3     A       1    23.0
            Regression model



Blood lead levels = Group + Week + Group*Week

Unstructured covariance matrix
Unstructured covariance matrix

                Estimated R Matrix

   Row   Col1        Col2       Col3      Col4

    1    25.2257    19.1074     19.6995   22.2016

    2    19.1074    44.3458     35.5351   29.6750

    3    19.6995    35.5351     47.3778   30.6205

    4    22.2016    29.6750     30.6205   58.6510
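
Covariances are hard to read directly, so it can help to convert the estimated R matrix to correlations. The sketch below does this with the values printed above; the function name is illustrative.

```python
import math

# Estimated R matrix (covariances) copied from the output above
R = [
    [25.2257, 19.1074, 19.6995, 22.2016],
    [19.1074, 44.3458, 35.5351, 29.6750],
    [19.6995, 35.5351, 47.3778, 30.6205],
    [22.2016, 29.6750, 30.6205, 58.6510],
]


def cov_to_corr(cov):
    """Divide each covariance by the product of the two standard deviations."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [
        [cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
        for i in range(len(cov))
    ]


corr = cov_to_corr(R)

if __name__ == "__main__":
    for row in corr:
        print([round(v, 3) for v in row])
```

The diagonal of R also shows the variance increasing across the four occasions, something the compound symmetry and AR(1) patterns with a single variance would not capture.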
                  Regression results
                                 Standard                    Pr >
Effect       group time Estimate     Error DF     t Value      |t|

Intercept                26.2720    0.7103   98   36.99     <.0001

group         A          0.2680     1.0045   98   0.27      0.7902

group         P             0         .      .      .         .

time                6    -2.6260    0.8885   98   -2.96     0.0039

time                4    -2.2020    0.8149   98   -2.70     0.0081

time                1    -1.6120    0.7919   98   -2.04     0.0445

time                0       0         .      .      .         .

group*time    A     6    -3.1520    1.2566   98   -2.51     0.0138

group*time    A     4    -8.8240    1.1525   98   -7.66     <.0001

group*time    A     1    -11.4060   1.1199   98   -10.18    <.0001
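
The estimates above combine into a predicted mean for each group and week. The sketch below adds them up. It assumes that group P and week 0 are the reference levels (their estimates are printed as 0) and that the interaction for group A at week 0 is likewise 0, a row that does not appear in the table as extracted.

```python
INTERCEPT = 26.2720
GROUP = {"A": 0.2680, "P": 0.0}
TIME = {0: 0.0, 1: -1.6120, 4: -2.2020, 6: -2.6260}
# Interaction applies to group A only; the week 0 value is assumed to be 0
GROUP_TIME_A = {0: 0.0, 1: -11.4060, 4: -8.8240, 6: -3.1520}


def predicted_mean(group, week):
    """Model-based mean blood lead level for a group at a given week."""
    interaction = GROUP_TIME_A[week] if group == "A" else 0.0
    return INTERCEPT + GROUP[group] + TIME[week] + interaction


if __name__ == "__main__":
    for group in ("P", "A"):
        print(group, [round(predicted_mean(group, w), 3) for w in (0, 1, 4, 6)])
```

The group A means drop sharply at week 1 and then climb back toward the placebo means, which is the rebound noted earlier.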
Trapezoidal differences
               P-value for area differences

 • A p-value for the differences in areas
   under the curves
 • Estimate differences from baseline and
   compare areas
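
A minimal sketch of the area comparison, using the trapezoidal rule on the estimated changes from baseline. The values are the time and group*time estimates from the regression results with the unstructured covariance. The function name is illustrative, and the p-value itself is not computed here because it needs the covariance matrix of the estimates, which the slides do not show.

```python
def trapezoid_area(times, values):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    area = 0.0
    for k in range(1, len(times)):
        width = times[k] - times[k - 1]
        area += width * (values[k] + values[k - 1]) / 2.0
    return area


WEEKS = [0, 1, 4, 6]
# Estimated change from baseline in the placebo group (time effects)
PLACEBO_CHANGE = [0.0, -1.6120, -2.2020, -2.6260]
# Additional change in the active group (group*time effects)
INTERACTION = [0.0, -11.4060, -8.8240, -3.1520]
ACTIVE_CHANGE = [p + d for p, d in zip(PLACEBO_CHANGE, INTERACTION)]

area_difference = (
    trapezoid_area(WEEKS, ACTIVE_CHANGE) - trapezoid_area(WEEKS, PLACEBO_CHANGE)
)

if __name__ == "__main__":
    print(round(area_difference, 3))
```

Because the area is a linear combination of the estimates, the difference equals the trapezoidal area of the interaction terms alone, which is why it can be tested as a single contrast.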

                                       Contrasts

Label                       Num DF  Den DF  Chi-Square  F Value  Pr > ChiSq  Pr > F
3 DF Test of Interaction         3      99      111.96    37.32      <.0001  <.0001
               Linear splines




Connected linear slopes
           Data for fitting splines
     id     succimer   time       y     time_1

       1         0        0      30.8       0
       1         0        1      26.9       0
       1         0        4      25.8       3
       1         0        6      23.8       5
       2         1        0      26.5       0
       2         1        1      14.8       0
       2         1        4      19.5       3
       2         1        6      21.0       5
       3         1        0      25.8       0
       3         1        1      23.0       0



Subtract the value of the break point (the knot, here week 1) from
time, setting negative results to zero, to create a second time
variable (time_1)
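
The second time variable in the table can be generated as below. The knot at week 1 is inferred from the time_1 column (0, 0, 3, 5 for weeks 0, 1, 4, 6); the function name is illustrative.

```python
def spline_term(time, knot=1):
    """Time elapsed since the knot, or 0 before the knot is reached."""
    return max(time - knot, 0)


time_1 = [spline_term(t) for t in (0, 1, 4, 6)]

if __name__ == "__main__":
    print(time_1)
```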
          Regression model


Blood lead levels =
time + time_1 + succimer*time + succimer*time_1
                  Regression results


Effect            Estimate Standard Error DF t Value Pr > |t|

Intercept          26.3422         0.4991   99   52.78    <.0001

time                -1.6296        0.7818   99    -2.08   0.0397

time_1              1.4305         0.8777   99    1.63    0.1063

time*succimer      -11.2500        1.0924   99   -10.30   <.0001

time_1*succimer    12.5822         1.2278   99   10.25    <.0001
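
To read the spline estimates as slopes, add the coefficients that are active on each side of the knot. The sketch below does this and also evaluates the fitted mean; it assumes the knot is at week 1, as implied by the data table, and the helper names are illustrative.

```python
COEF = {
    "intercept": 26.3422,
    "time": -1.6296,
    "time_1": 1.4305,
    "time*succimer": -11.2500,
    "time_1*succimer": 12.5822,
}
KNOT = 1  # assumed from the time_1 column in the data table


def slopes(succimer):
    """Weekly slope before and after the knot for succimer = 0 or 1."""
    before = COEF["time"] + succimer * COEF["time*succimer"]
    after = before + COEF["time_1"] + succimer * COEF["time_1*succimer"]
    return before, after


def predicted_lead(succimer, week):
    """Fitted mean blood lead level from the linear spline model."""
    time_1 = max(week - KNOT, 0)
    return (
        COEF["intercept"]
        + COEF["time"] * week
        + COEF["time_1"] * time_1
        + succimer * (COEF["time*succimer"] * week
                      + COEF["time_1*succimer"] * time_1)
    )


if __name__ == "__main__":
    print("placebo slopes:", slopes(0))
    print("succimer slopes:", slopes(1))
```

In the succimer group the slope changes sign at the knot, from a steep decline to a modest rise, while the placebo group shows a mild decline that flattens.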
Estimated and mean blood lead levels
    Study of the development of obesity


• Data on 4865 boys and girls were collected
  every two years (biennially)
• Five cohorts by age at baseline: 5-7, 7-9, 9-11, 11-13, 13-15
• Outcome was obesity (yes or no)
• Modeled mean age of measurement and
  longitudinal age
                        Study data

id   female   baselineage   occasion   obesity   cage   cage2

 1       0          6           1         1        -6      36
 1       0          6           2         1        -4      16
 1       0          6           3         1        -2       4
 2       0          6           1         1        -6      36
 2       0          6           2         1        -4      16
 2       0          6           3         1        -2       4
 3       0          6           1         1        -6      36
 3       0          6           2         1        -4      16
 3       0          6           3         1        -2       4
 4       0          6           1         1        -6      36
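
The cage and cage2 columns look like age centered at 12 years and its square. The sketch below reproduces the rows shown, assuming a two-year gap between occasions and a centering age of 12; both values are inferred from the table rather than stated in the slides.

```python
def centered_age(baseline_age, occasion, center=12, years_between=2):
    """Return (cage, cage2): centered age and its square.

    center and years_between are assumptions inferred from the data rows.
    """
    age = baseline_age + years_between * (occasion - 1)
    cage = age - center
    return cage, cage ** 2


if __name__ == "__main__":
    for occasion in (1, 2, 3):
        print(occasion, centered_age(6, occasion))
```

Centering keeps the linear and quadratic age terms from being highly correlated and makes the intercept refer to age 12.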
            Regression model


Obesity=
Female + cage + cage2 + female*cage +
 female*cage2

Fit using logistic regression for repeated
  measures
                Regression results
               Analysis Of GEE Parameter Estimates

                Empirical Standard Error Estimates

                                         95%
                        Standard      Confidence
Parameter      Estimate     Error       Limits               Z Pr > |Z|

Intercept        -1.2135    0.0506   -1.3126   -1.1144   -24.00   <.0001

gender            0.1159    0.0711   -0.0235   0.2553     1.63    0.1033

cage              0.0378    0.0133   0.0118    0.0638     2.85    0.0043

cage2            -0.0175    0.0034   -0.0241   -0.0109    -5.19   <.0001

gender*cage       0.0075    0.0182   -0.0282   0.0433     0.41    0.6795

gender*cage2      0.0039    0.0046   -0.0051   0.0130     0.85    0.3949
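
The Z statistics and confidence limits in the table follow from each estimate and its empirical standard error. The sketch below checks the cage row; small discrepancies from the printed values are expected because the inputs are rounded to four decimals. The function name is illustrative.

```python
def wald_summary(estimate, std_error, z_crit=1.96):
    """Return (z, lower, upper) for a Wald test and 95% interval."""
    z = estimate / std_error
    lower = estimate - z_crit * std_error
    upper = estimate + z_crit * std_error
    return z, lower, upper


# cage row: estimate 0.0378, standard error 0.0133
z_cage, lower_cage, upper_cage = wald_summary(0.0378, 0.0133)

if __name__ == "__main__":
    print(round(z_cage, 2), round(lower_cage, 4), round(upper_cage, 4))
```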
Regression curves
           Regression model


Obesity = Female + cage + cage2
                Regression results

            Analysis Of GEE Parameter Estimates

             Empirical Standard Error Estimates

                      Standard 95% Confidence
Parameter Estimate        Error    Limits                Z Pr > |Z|

Intercept   -1.2283     0.0477   -1.3218   -1.1348   -25.75   <.0001

gender       0.1449     0.0627   0.0221    0.2678     2.31    0.0208

cage         0.0418     0.0091   0.0240    0.0596     4.60    <.0001

cage2       -0.0155     0.0023   -0.0200   -0.0110    -6.73   <.0001
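
Because the model is logistic, the estimates are on the log-odds scale. The sketch below converts them to predicted probabilities and an odds ratio. It assumes the gender parameter is coded 1 for girls (the model statement calls it Female) and that cage is age minus 12, as the data rows suggest.

```python
import math

COEF = {"intercept": -1.2283, "gender": 0.1449, "cage": 0.0418, "cage2": -0.0155}


def obesity_probability(female, cage):
    """Predicted probability of obesity from the main-effects model."""
    log_odds = (
        COEF["intercept"]
        + COEF["gender"] * female
        + COEF["cage"] * cage
        + COEF["cage2"] * cage ** 2
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Odds of obesity for girls relative to boys at the same age
odds_ratio_female = math.exp(COEF["gender"])

if __name__ == "__main__":
    for cage in (-6, -4, -2, 0, 2):
        print(cage,
              round(obesity_probability(0, cage), 3),
              round(obesity_probability(1, cage), 3))
```

The negative cage2 coefficient makes the curve concave on the log-odds scale, so predicted obesity rises with age and then levels off.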
  Clinical trial of antibiotics for leprosy

• Participants were randomized to placebo or to
  one of two antibiotics
• Measured the number of leprosy bacilli at six
  sites of the body
• Analyzed using Poisson regression
  (although the results showed overdispersion
  and may have been better modeled with
  negative binomial regression)
     Study data
id    time   A    B

 1      0    1    0
 1      1    1    0
 2      0    0    1
 2      1    0    1
 3      0    0    0
 3      1    0    0
 4      0    1    0
 4      1    1    0
 5      0    0    1
 5      1    0    1
           Regression model


Y = time + A*time + B*time

Note: Treatment main effects were not included
 because treatments were randomized, so the
 difference between groups at baseline should be zero
                 Regression results

             Analysis Of GEE Parameter Estimates

            Model-Based Standard Error Estimates

                      Standard    95% Confidence
Parameter Estimate        Error       Limits             Z Pr > |Z|

Intercept   2.3734     0.1035     2.1704    2.5763    22.92   <.0001

time        -0.0138    0.1111     -0.2315   0.2039    -0.12   0.9010

time*A      -0.5406    0.1818     -0.8969   -0.1843   -2.97   0.0029

time*B      -0.4791    0.1779     -0.8278   -0.1303   -2.69   0.0071
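
Poisson regression uses a log link, so exponentiating the estimates gives expected counts and rate ratios. The sketch below applies this to the table above; the function and variable names are illustrative, and time is taken to be 0 at baseline and 1 at follow-up as in the data listing.

```python
import math

COEF = {"intercept": 2.3734, "time": -0.0138, "time*A": -0.5406, "time*B": -0.4791}


def expected_count(time, a, b):
    """Expected bacilli count for time (0 or 1) and treatment indicators."""
    log_mean = (
        COEF["intercept"]
        + COEF["time"] * time
        + COEF["time*A"] * time * a
        + COEF["time*B"] * time * b
    )
    return math.exp(log_mean)


# Follow-up-to-baseline change in each antibiotic arm relative to placebo
rate_ratio_a = math.exp(COEF["time*A"])
rate_ratio_b = math.exp(COEF["time*B"])

if __name__ == "__main__":
    print(round(expected_count(0, 0, 0), 2))
    print(round(rate_ratio_a, 3), round(rate_ratio_b, 3))
```

A rate ratio below 1 means the count fell more in the antibiotic arm than in the placebo arm over the same period.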
Test that the antibiotics are equally effective


                Contrast Results for GEE Analysis

   Contrast                  DF Chi-Square Pr > ChiSq Type

   Drug x Time Interaction    2        6.99         0.0803   Wald
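
For a chi-square statistic on 2 degrees of freedom the upper-tail probability has a closed form, exp(-x/2), which makes it easy to check a printed p-value by hand. The sketch below applies it to the Wald statistic above. For 6.99 it gives roughly 0.030, so the printed 0.0803 may be a transcription slip; treat that as an observation rather than a correction.

```python
import math


def chi2_upper_tail_2df(statistic):
    """P(X > statistic) for a chi-square variable with exactly 2 df."""
    return math.exp(-statistic / 2.0)


p_value = chi2_upper_tail_2df(6.99)

if __name__ == "__main__":
    print(round(p_value, 4))
```

This shortcut holds only for 2 degrees of freedom; other degrees of freedom need a general chi-square routine.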
              Regression model:
        Y = time + time*Antibiotic (y/n)

                  Analysis Of GEE Parameter Estimates

                  Model-Based Standard Error Estimates

                                             95%
                            Standard      Confidence
Parameter          Estimate     Error       Limits              Z Pr > |Z|

Intercept             2.3734    0.1028   2.1718    2.5749    23.08   <.0001

time                 -0.0108    0.1142   -0.2345   0.2130    -0.09   0.9249

time*Antibiotic      -0.5141    0.1536   -0.8152   -0.2131   -3.35   0.0008