Analyzing Repeated Measures by abo20752


									1     Introduction
Health research often involves taking repeated measurements from the same
study subject. Such longitudinal or repeated-measures study designs are
motivated by the desire to study change. Some changes occur naturally (e.g.
height changes from birth to adulthood; hearing declines with age), while
targeted interventions can also cause change (cholesterol levels may decline
with new medication; test scores may rise after special educational programs;
weight may drop after a new diet and exercise regimen). Research designs for
studying change can be observational or experimental. Data can be collected
prospectively or retrospectively. Time can be measured in various units, such
as hours, days, months, seasons or years. Measurements can be taken at regular,
fixed time points or at irregular time points.

1.1    The two types of questions
The primary goal of a longitudinal study is to characterize the change in
response over time and the factors that influence that change. We can answer
two types of questions from longitudinal data:
    • How does the outcome change over time? (a descriptive question)
    • Can we predict differences in change? (examining the association between
      predictors and patterns of change)
    With repeated measures on individuals, one can capture within-individual
change. Cross-sectional studies, which measure each subject only once, are not
suitable for assessing change.
    The following example from Fitzmaurice illustrates a common but misleading
analysis of change. Suppose investigators are interested in determining the
increase in body fatness in girls after menarche. Using a cross-sectional study
design, investigators might measure body fat in two separate groups of girls: a
group of 10-year-olds (pre-menarche) and a group of 15-year-olds
(post-menarche), and compare the mean percent body fat in the two groups with an
unpaired t-test. This comparison does not estimate the change in body fatness
from pre- to post-menarche in girls. The effect of growth or aging (an
inherently within-individual effect) cannot be estimated from a cross-sectional
study, and the between-group difference is potentially confounded by cohort
effects. A better design measures the same cohort of girls at age 10 and again
at age 15 and compares the two measurements with a paired t-test. This way, each
girl acts as her own control, so changes in percent body fat can be attributed
to aging.
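A paired t-test is a one-sample t-test on the within-subject differences. Its standard error is based on Var(Y1 - Y2) = σ1² + σ2² - 2σ12, which is smaller than the unpaired σ1² + σ2² when the two measurements are positively correlated. The R sketch below illustrates this. No body-fat data are given in this document, so as a stand-in it reuses the Week0 and Week1 blood lead values of the 12 children printed by head(lead) and tail(lead) in Section 2.2 (IDs 1 to 6 and 95 to 100); the variable names are illustrative.

```r
# Stand-in data: Week0 and Week1 blood lead values of the 12 children
# printed by head(lead) and tail(lead) in Section 2.2
week0 <- c(31, 26, 26, 25, 20, 20, 31, 31, 41, 29, 22, 21)
week1 <- c(26.9, 14.8, 23.0, 24.5, 2.8, 5.4, 10.8, 3.9, 15.1, 22.1, 7.6, 8.1)
n <- length(week0)

# Paired t-test: each child acts as his or her own control
paired <- t.test(week0, week1, paired = TRUE)

# The same test written as a one-sample t-test on the differences
diffs <- week0 - week1
one_sample <- t.test(diffs, mu = 0)

# Unpaired (Welch) t-test: ignores the pairing and treats the two
# columns as independent samples
unpaired <- t.test(week0, week1)

# Standard error of the mean difference under each analysis
se_paired <- sd(diffs) / sqrt(n)
se_unpaired <- sqrt(var(week0) / n + var(week1) / n)

print(c(paired_t = unname(paired$statistic),
        one_sample_t = unname(one_sample$statistic),
        unpaired_t = unname(unpaired$statistic)))
print(c(se_paired = se_paired, se_unpaired = se_unpaired))
```

Both analyses estimate the same mean difference; they differ in the standard error. Because Week0 and Week1 are positively correlated in these 12 children, se_paired comes out smaller than se_unpaired.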


2     Analyzing Repeated Measures
The distinctive feature of longitudinal data is that they are clustered. Typically,
observations within a cluster exhibit positive correlation, and this correlation
must be accounted for in the analysis. This correlation invalidates the crucial
assumption of independence that is a cornerstone of many standard statistical
techniques. Before subjecting the data to statistical analyses, it is always
advisable to summarize the data and use exploratory graphs. We illustrate some
of these with the following example:

2.1    Example: Treatment of Lead-Exposed Children (TLC) Trial (Fitzmaurice)
    • Exposure to lead during infancy is associated with substantial deficits in
      tests of cognitive ability
    • Chelation treatment of children with high lead levels usually requires
      injections and hospitalizations
    • A new agent, Succimer, can be given orally
    • Randomized placebo-controlled trial examining changes in blood lead level
      during the course of treatment
    • 100 children randomized to placebo or Succimer
    • Blood lead level measured at baseline and at weeks 1, 4 and 6

2.2    Sample Data
> lead = read.csv("lead.csv", header = TRUE)
> head(lead)

    ID TreatGroup Week0 Week1 Week4 Week6
1    1          P    31 26.9 25.8 23.8
2    2          A    26 14.8 19.5 21.0
3    3          A    26 23.0 19.1 23.2
4    4          P    25 24.5 22.0 22.5
5    5          A    20   2.8   3.2   9.4
6    6          A    20   5.4   4.5 11.9

> tail(lead)

     ID TreatGroup Week0 Week1 Week4 Week6
95   95          A    31 10.8     20 22.2
96   96          A    31   3.9     7 17.8
97   97          A    41 15.1     11 27.1
98   98          A    29 22.1     25   4.1
99   99          A    22   7.6    11 13.0
100 100          A    21   8.1    26 12.3

    The lead data above are in what is called the wide format: one row per
child, with one column per measurement occasion. For analyzing longitudinal
data we usually want the long format instead, with one row per child per
occasion. There are many ways to convert between the two, but the reshape
function is quick and easy:
> lead.long = reshape(lead, idvar = "ID", varying = c("Week0", "Week1",
+      "Week4", "Week6"), v.names = "Lead", timevar = "Timeweek", times = c(0,
+      1, 4, 6), direction = "long")
> head(lead.long)




      ID TreatGroup Timeweek Lead
1.0    1          P        0   31
2.0    2          A        0   26
3.0    3          A        0   26
4.0    4          P        0   25
5.0    5          A        0   20
6.0    6          A        0   20

    We can sort the data by ID and week:
> lead.long2 = lead.long[order(lead.long$ID, lead.long$Timeweek),
+     ]
> head(lead.long2)

      ID TreatGroup Timeweek Lead
1.0    1          P        0   31
1.1    1          P        1   27
1.4    1          P        4   26
1.6    1          P        6   24
2.0    2          A        0   26
2.1    2          A        1   15

    Here are the trajectories of all the study subjects (red: placebo, green:
Succimer):

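>   attach(lead.long2)  # make the columns of lead.long2 accessible by name
>   unique.id = unique(ID)
>   plot(Timeweek, Lead, type = "n")  # empty frame; lines are added below
>   for (i in unique.id) {
+       # one line per child: red (2) for placebo, green (3) for Succimer
+       lines(Timeweek[ID == i], Lead[ID == i], type = "b", col = ifelse(TreatGroup[ID ==
+           i] == "P", 2, 3), pch = 20)
+   }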




[Figure: blood lead level (Lead) versus Timeweek for all 100 children]


   This plot is too crowded to be very informative. Let us instead look at how
a random subsample of 10 individuals performed:
>      unique.id = unique(ID)
>      # 10 children drawn at random; the subsample changes on every run
>      sample.id = sample(unique.id, 10, replace = FALSE)
>      plot(Timeweek, Lead, type = "n")
>      for (i in sample.id) {
+          lines(Timeweek[ID == i], Lead[ID == i], type = "b", col = ifelse(TreatGroup[ID ==
+              i] == "P", 2, 3), pch = 20)
+      }




[Figure: blood lead level (Lead) versus Timeweek for a random subsample of 10 children]


   In the subsample we can see individual trajectories as well as the different
responses to placebo and Succimer.
   Let us summarize the lead levels by treatment group over time, starting with
a graphical summary:

> boxplot(Lead ~ TreatGroup + Timeweek, ylab = "Lead Level",
+     xlab = "Treatment Group and Week of measurement")




[Figure: box plots of Lead Level by treatment group (A, P) and week (0, 1, 4, 6); x-axis: Treatment Group and Week of measurement]


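   Alternatively, we can use other display functions such as plot.design.
We can also summarize the data numerically with the tapply function, here
computing the mean, standard deviation and median of Lead for each treatment
group and week: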

> options(digits = 4)
> tapply(Lead, list(TreatGroup, factor(Timeweek)), mean)

      0     1     4     6
A 26.54 13.52 15.51 20.76
P 26.27 24.66 24.07 23.65

> tapply(Lead, list(TreatGroup, factor(Timeweek)), sd)

      0     1     4     6
A 5.021 7.672 7.852 9.246
P 5.024 5.461 5.753 5.640

> tapply(Lead, list(TreatGroup, factor(Timeweek)), median)

      0     1     4     6
A 26.20 12.25 15.35 18.85
P 25.25 24.10 22.45 22.35
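The same kind of group-by-week table can also be produced with aggregate, which works directly on the wide data and returns a data frame. The sketch below uses only the 12 children printed in Section 2.2 (lead_small is an illustrative name), so its numbers will not match the full-data output above.

```r
# 12-row stand-in for the wide lead data (IDs 1-6 and 95-100)
lead_small <- data.frame(
  TreatGroup = c("P", "A", "A", "P", "A", "A", "A", "A", "A", "A", "A", "A"),
  Week0 = c(31, 26, 26, 25, 20, 20, 31, 31, 41, 29, 22, 21),
  Week1 = c(26.9, 14.8, 23.0, 24.5, 2.8, 5.4, 10.8, 3.9, 15.1, 22.1, 7.6, 8.1),
  Week4 = c(25.8, 19.5, 19.1, 22.0, 3.2, 4.5, 20, 7, 11, 25, 11, 26),
  Week6 = c(23.8, 21.0, 23.2, 22.5, 9.4, 11.9, 22.2, 17.8, 27.1, 4.1, 13.0, 12.3)
)

# Mean of each week column within each treatment group (one row per group)
means_by_group <- aggregate(cbind(Week0, Week1, Week4, Week6) ~ TreatGroup,
                            data = lead_small, FUN = mean)
means_by_group
```

Replacing FUN = mean with sd or median gives the other two summaries.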

             We can display the means graphically as well:
> lead.mean = tapply(Lead, list(TreatGroup, factor(Timeweek)), mean)
> plot(c(0, 1, 4, 6), lead.mean[1, ], type = "b", pch = 20, col = 3,
+     xlab = "Week", ylab = "Lead")
> lines(c(0, 1, 4, 6), lead.mean[2, ], type = "b", pch = 18, col = 2,
+     lty = 3)


[Figure: mean Lead by Week; Succimer (A, green, solid) and placebo (P, red, dotted)]


   Alternatively, we can use interaction.plot:

> interaction.plot(Timeweek, factor(TreatGroup), Lead, lwd = 3, col = c(3,
+     2), lty = c(1, 12), trace.label = "TreatGroup")




[Figure: interaction plot of mean of Lead versus Timeweek, by TreatGroup (A, P)]


   In summary, we observe that:
   • Baseline means are similar in the two groups, as expected under
     randomization
   • There are discernible differences between the groups in the pattern of
     change in mean response over time
   • In the Succimer group there is a large drop in mean blood lead by week 1,
     followed by a gradual rise, attributed to the release of lead stored in
     bone into the blood
   • In the placebo group the trend is relatively flat
    The main objective of this study is to describe changes in mean response over
time and how these changes are related to the covariate of interest (Succimer or
placebo). The question is: does treatment with Succimer reduce blood lead levels
over time relative to any changes observed in the placebo group? Writing
$\mu_j(S)$ and $\mu_j(P)$ for the mean blood lead level at the j-th measurement
occasion (weeks 0, 1, 4 and 6) in the Succimer and placebo groups, the null
hypothesis may be written as

$$H_0:\; \mu_j(S) = \mu_j(P) \quad \text{for all } j = 1, \ldots, 4 \tag{1}$$
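Because the baseline means are nearly equal, the study question can also be looked at descriptively through changes from baseline. The sketch below copies the group means from the tapply output in Section 2.2 into a matrix (lead_mean is an illustrative name) and computes, for weeks 1, 4 and 6, each group's change from week 0 and the Succimer-minus-placebo difference in those changes. It is a descriptive calculation, not a test of (1).

```r
# Group means by week, copied from the tapply(..., mean) output
# (rows: A = Succimer, P = placebo; columns: weeks 0, 1, 4, 6)
lead_mean <- rbind(A = c(26.54, 13.52, 15.51, 20.76),
                   P = c(26.27, 24.66, 24.07, 23.65))
colnames(lead_mean) <- c("0", "1", "4", "6")

# Change from baseline (week 0) in each group; the baseline vector has
# one entry per row, so it is subtracted row by row
change <- lead_mean[, -1] - lead_mean[, 1]

# Succimer-minus-placebo difference in mean change at weeks 1, 4 and 6
diff_in_change <- change["A", ] - change["P", ]
round(diff_in_change, 2)
```

The differences are about -11.4, -8.8 and -3.2 at weeks 1, 4 and 6: the Succimer advantage is largest at week 1 and shrinks as blood lead rebounds, in line with the summary above.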

2.3    Covariance and Correlation
Next we examine the covariance and correlation structure of the repeated
measurements, separately for the placebo and Succimer groups (columns 3 to 6 of
the wide data hold the four measurements):
> cov(lead[lead$TreatGroup == "P", 3:6])
                     Week0   Week1    Week4    Week6
Week0                25.24   22.75    24.26    21.42
Week1                22.75   29.82    27.04    23.38
Week4                24.26   27.04    33.10    28.22
Week6                21.42   23.38    28.22    31.81


> cov(lead[lead$TreatGroup == "A", 3:6])

        Week0   Week1   Week4   Week6
Week0   25.21   15.47   15.14   22.99
Week1   15.47   58.87   44.03   35.97
Week4   15.14   44.03   61.66   33.02
Week6   22.99   35.97   33.02   85.49

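   The diagonals of these matrices show that the variance tends to increase over
time, a common characteristic of longitudinal data. The increase is marked in
the Succimer group (from about 25 at baseline to about 85 at week 6), whereas in
the placebo group the variance rises only modestly (from about 25 to 33).
   Let us examine the correlations as well: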
> cor(lead[lead$TreatGroup == "P", 3:6])

         Week0    Week1    Week4    Week6
Week0   1.0000   0.8291   0.8394   0.7559
Week1   0.8291   1.0000   0.8607   0.7592
Week4   0.8394   0.8607   1.0000   0.8697
Week6   0.7559   0.7592   0.8697   1.0000


In the placebo group, observations from the same individual are strongly
positively correlated (0.76 to 0.87), and the correlation tends to weaken as the
time separation between measurements increases.
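Correlations can also be recovered from the printed covariance matrices, since each correlation is a covariance divided by the product of the two standard deviations. The sketch below copies both covariance matrices from the cov() output above (cov_P and cov_A are illustrative names), extracts the variances over time, and converts the matrices to correlations with the base R function cov2cor. Because the inputs are rounded to two decimals, the results are approximate; for the Succimer group they can be compared with cor(lead[lead$TreatGroup == "A", 3:6]) on the full data.

```r
# Covariance matrices copied from the cov() output (rounded to 2 dp)
weeks <- c("Week0", "Week1", "Week4", "Week6")
cov_P <- matrix(c(25.24, 22.75, 24.26, 21.42,
                  22.75, 29.82, 27.04, 23.38,
                  24.26, 27.04, 33.10, 28.22,
                  21.42, 23.38, 28.22, 31.81),
                nrow = 4, byrow = TRUE, dimnames = list(weeks, weeks))
cov_A <- matrix(c(25.21, 15.47, 15.14, 22.99,
                  15.47, 58.87, 44.03, 35.97,
                  15.14, 44.03, 61.66, 33.02,
                  22.99, 35.97, 33.02, 85.49),
                nrow = 4, byrow = TRUE, dimnames = list(weeks, weeks))

# Variances at each occasion are the diagonal elements
print(rbind(placebo = diag(cov_P), succimer = diag(cov_A)))

# cov2cor() divides each covariance by the product of the standard deviations
print(round(cov2cor(cov_P), 3))
print(round(cov2cor(cov_A), 3))
```

The placebo result reproduces the printed correlations to about three decimals. In the Succimer group the correlations are noticeably weaker, roughly 0.4 to 0.7.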

2.4     Repeated Measures Analysis by ANOVA
Several approaches to the analysis of repeated measures data have been developed
over time. We first look at the early approaches, which fall into the ANOVA
family. In repeated measures (RM) ANOVA, the correlation among repeated measures
is assumed to arise from an individual-specific random effect added to each
measurement on a given individual. Time is treated as a factor variable.
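One common way to write this model makes the source of the correlation explicit. In the illustrative notation below (introduced here, not taken from the source), $Y_{ij}$ is the blood lead level of child $i$ at occasion $j = 1, \ldots, 4$, $g(i)$ is the child's treatment group, $b_i$ is the child-specific random effect, and $\varepsilon_{ij}$ is measurement error, with $b_i$ and $\varepsilon_{ij}$ assumed mutually independent.

```latex
Y_{ij} = \mu + \alpha_{g(i)} + \tau_j + (\alpha\tau)_{g(i)j} + b_i + \varepsilon_{ij},
\qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma_e^2)

% Only b_i is shared by two measurements on the same child, so for j \neq k:
\operatorname{Var}(Y_{ij}) = \sigma_b^2 + \sigma_e^2, \qquad
\operatorname{Cov}(Y_{ij}, Y_{ik}) = \operatorname{Var}(b_i) = \sigma_b^2, \qquad
\operatorname{Corr}(Y_{ij}, Y_{ik}) = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_e^2}
```

Every pair of occasions therefore has the same covariance and the same correlation, whatever the time separation between them; this is the compound symmetry structure.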
    Here is how we fit this model in R:
> summary(aov(Lead ~ TreatGroup * factor(Timeweek) + Error(factor(ID)),
+     data = lead.long2))

Error: factor(ID)
           Df Sum Sq Mean Sq F value Pr(>F)
TreatGroup 1    3111    3111    25.4 2.1e-06 ***
Residuals 98 11987       122
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Error: Within

                             Df Sum Sq Mean Sq F value Pr(>F)
factor(Timeweek)              3   3273    1091    61.4 <2e-16 ***
TreatGroup:factor(Timeweek)   3   2030     677    38.1 <2e-16 ***
Residuals                   294   5222      18
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

   Here is what we observe from the above results:
   • The between-subjects test indicates that TreatGroup is significant
     (F = 25.4, p = 2.1e-06); correspondingly, in the graph the lines for the
     two treatments are rather far apart.
   • The within-subjects test indicates a significant time effect (F = 61.4,
     p < 2e-16); in other words, blood lead levels change over time. In the
     graph the lines are not flat, i.e. their slopes are non-zero.
   • The lines are non-parallel, consistent with the significant interaction
     between time and treatment group (F = 38.1, p < 2e-16).
The tests for main effects are hard to interpret when the interaction is
significant. RM ANOVA assumes a compound symmetry covariance structure (equal
variances at all occasions and equal correlations between every pair of
occasions), which is often unrealistic for longitudinal data: in Section 2.3 the
variances change over time and the correlations tend to weaken with time
separation. There are other ways of modeling the covariance as well.
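The compound symmetry structure implied by the fitted RM ANOVA can be computed from the printed output. The sketch below is a method-of-moments calculation, assuming the balanced random-intercept model of Section 2.4 and using the rounded sums of squares from the aov() output above; the variable names are illustrative. For a balanced design with 4 occasions per child, the within-subject residual mean square estimates σe², and the between-subject residual mean square estimates σe² + 4σb².

```r
# Sums of squares and degrees of freedom copied from the aov() output
ss_subject <- 11987; df_subject <- 98   # Error: factor(ID), Residuals
ss_within  <- 5222;  df_within  <- 294  # Error: Within, Residuals
n_occasions <- 4                         # weeks 0, 1, 4 and 6

ms_subject <- ss_subject / df_subject
ms_within  <- ss_within / df_within

# E[MS_within] = sigma2_e and E[MS_subject] = sigma2_e + n_occasions * sigma2_b
sigma2_e <- ms_within
sigma2_b <- (ms_subject - ms_within) / n_occasions

# Covariance matrix implied by compound symmetry: one common variance on
# the diagonal, one common covariance everywhere off the diagonal
cs_cov <- sigma2_b * matrix(1, n_occasions, n_occasions) +
  sigma2_e * diag(n_occasions)
rho <- sigma2_b / (sigma2_b + sigma2_e)

print(round(cs_cov, 2))
print(round(rho, 3))
```

Under compound symmetry every variance is about 43.9 and every correlation about 0.60, pooled over both groups. The empirical matrices in Section 2.3 instead show Succimer-group variances rising from about 25 to 85 and placebo correlations of 0.76 to 0.87, which is one reason to consider other covariance models.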



