Analyzing Repeated Measures


1 Introduction
Health research often involves taking repeated measurements from the same
study subject. Such longitudinal or repeated-measures study designs are
motivated by the desire to study change. Some changes occur naturally (e.g.
height increases from birth to adulthood; hearing declines in the elderly),
while targeted interventions can also cause change (cholesterol levels may
decline with new medication; test scores might rise after special educational
programs; weight may fall after a new diet and exercise regimen). Research
designs for studying change can be observational or experimental. Data can be
collected prospectively or retrospectively. Time can be measured in various
units: hours, days, months, seasons or years. Measurements can be taken at
regular, fixed time points or at irregular time points.
1.1 The two types of questions
The primary goal of a longitudinal study is to characterize the change in
response over time and the factors that influence that change. Longitudinal
data let us answer two types of questions:
- How does the outcome change over time? (a descriptive question)
- Can we predict differences in change? (examining the association between
  predictors and patterns of change)
With repeated measures on individuals, one can capture within-individual
change. Cross-sectional studies, which measure each individual only once, are
not suitable for assessing change.
Here is an example, from Fitzmaurice, of a common but misleading analysis of
change. Suppose investigators are interested in determining the increase in
body fatness in girls after menarche. Using a cross-sectional study design,
investigators might obtain measurements of body fat on two separate groups of
girls: a group of 10-year-olds (pre-menarche) and a group of 15-year-olds
(post-menarche). The mean percent body fat in the two groups is then compared
using an unpaired t-test. This comparison does not provide an estimate of the
change in body fatness from pre- to post-menarche. The effect of growth or
aging (an inherently within-individual effect) cannot be estimated from a
cross-sectional study, and the effects estimated in this design are potentially
confounded by cohort effects. A better study would measure a single cohort of
girls at age 10 and again at age 15, then use a paired t-test. This way, each
girl acts as her own control, so that changes in percent body fat can more
plausibly be attributed to aging.
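The contrast between the two analyses can be sketched in R. The data below are simulated purely for illustration (the sample size, means and standard deviations are assumptions, not values from any study), and the same simulated girls are analysed both ways, so the sketch shows only the loss of precision from ignoring the pairing, not the cohort confounding described above.

```r
# Hypothetical simulated data: none of these numbers come from a real study
set.seed(1)
n <- 30
subject.effect <- rnorm(n, mean = 0, sd = 4)      # stable differences between girls
fat10 <- 20 + subject.effect + rnorm(n, sd = 1)   # percent body fat at age 10
fat15 <- 23 + subject.effect + rnorm(n, sd = 1)   # the same girls at age 15

# Longitudinal analysis: each girl serves as her own control
paired <- t.test(fat15, fat10, paired = TRUE)

# Analysis that ignores the pairing, as if the groups were unrelated
unpaired <- t.test(fat15, fat10, paired = FALSE)

# Both give the same point estimate of the mean change ...
est.change <- mean(fat15) - mean(fat10)

# ... but the confidence intervals differ in width
width.paired <- diff(paired$conf.int)
width.unpaired <- diff(unpaired$conf.int)
c(paired = width.paired, unpaired = width.unpaired)
```

In this simulation the paired interval is narrower because the stable between-girl differences cancel when each girl's two measurements are subtracted.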
2 Analyzing Repeated Measures
The distinctive feature of longitudinal data is that they are clustered. Typically,
observations within a cluster exhibit positive correlation, and this correlation
must be accounted for in the analysis. This correlation invalidates the crucial
assumption of independence that is a cornerstone of many standard statistical
techniques. Before subjecting the data to statistical analyses, it is always
advisable to summarize the data and use exploratory graphs. We illustrate some
of these with the following example:
2.1 Example: Treatment of Lead-Exposed Children (TLC) Trial (Fitzmaurice)
- Exposure to lead during infancy is associated with substantial deficits in
  tests of cognitive ability
- Chelation treatment of children with high lead levels usually requires
  injections and hospitalizations
- A new agent, Succimer, can be given orally
- Randomized placebo-controlled trial examining changes in blood lead level
  during the course of treatment
- 100 children randomized to placebo or Succimer
- Measures of blood lead level at baseline, 1, 4 and 6 weeks
2.2 Sample Data
> lead = read.csv("lead.csv", header = TRUE)
> head(lead)
ID TreatGroup Week0 Week1 Week4 Week6
1 1 P 31 26.9 25.8 23.8
2 2 A 26 14.8 19.5 21.0
3 3 A 26 23.0 19.1 23.2
4 4 P 25 24.5 22.0 22.5
5 5 A 20 2.8 3.2 9.4
6 6 A 20 5.4 4.5 11.9
> tail(lead)
ID TreatGroup Week0 Week1 Week4 Week6
95 95 A 31 10.8 20 22.2
96 96 A 31 3.9 7 17.8
97 97 A 41 15.1 11 27.1
98 98 A 29 22.1 25 4.1
99 99 A 22 7.6 11 13.0
100 100 A 21 8.1 26 12.3
The lead data above are in what we call the wide format. We may want
to change the data into what is known as the long format, because that is the
format we typically use to analyze longitudinal data. There are many ways to
do this, but the reshape command is quick and easy:
> lead.long = reshape(lead, idvar = "ID", varying = c("Week0", "Week1",
+ "Week4", "Week6"), v.names = "Lead", timevar = "Timeweek", times = c(0,
+ 1, 4, 6), direction = "long")
> head(lead.long)
ID TreatGroup Timeweek Lead
1.0 1 P 0 31
2.0 2 A 0 26
3.0 3 A 0 26
4.0 4 P 0 25
5.0 5 A 0 20
6.0 6 A 0 20
We can have the data sorted by id and week:
> lead.long2 = lead.long[order(lead.long$ID, lead.long$Timeweek),
+ ]
> head(lead.long2)
ID TreatGroup Timeweek Lead
1.0 1 P 0 31
1.1 1 P 1 27
1.4 1 P 4 26
1.6 1 P 6 24
2.0 2 A 0 26
2.1 2 A 1 15
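For readers without access to lead.csv, the reshaping and sorting steps can be tried on a small stand-in. The sketch below rebuilds only the six rows shown by head(lead) above; the values are copied as printed, so they may be rounded, and the names lead.mini and mini.long are introduced here for illustration only.

```r
# First six rows of the wide data, copied from the head(lead) output above
# (values as printed, so possibly rounded)
lead.mini <- data.frame(
  ID = 1:6,
  TreatGroup = c("P", "A", "A", "P", "A", "A"),
  Week0 = c(31, 26, 26, 25, 20, 20),
  Week1 = c(26.9, 14.8, 23.0, 24.5, 2.8, 5.4),
  Week4 = c(25.8, 19.5, 19.1, 22.0, 3.2, 4.5),
  Week6 = c(23.8, 21.0, 23.2, 22.5, 9.4, 11.9)
)

# Wide to long: one row per subject per measurement occasion
mini.long <- reshape(lead.mini, idvar = "ID",
                     varying = c("Week0", "Week1", "Week4", "Week6"),
                     v.names = "Lead", timevar = "Timeweek",
                     times = c(0, 1, 4, 6), direction = "long")

# Sort by subject, then by week within subject
mini.long <- mini.long[order(mini.long$ID, mini.long$Timeweek), ]

# 6 subjects x 4 occasions gives 24 rows; columns are ID, TreatGroup, Timeweek, Lead
dim(mini.long)
```

Each subject contributes four consecutive rows after sorting, which is the layout the plotting and modelling code in the rest of this section expects.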
Here are the trajectories of all the study subjects (placebo in red, Succimer
in green):
> attach(lead.long2)
> unique.id = unique(ID)
> plot(Timeweek, Lead, type = "n")
> for (i in unique.id) {
+ lines(Timeweek[ID == i], Lead[ID == i], type = "b", col = ifelse(TreatGroup[ID ==
+ i] == "P", 2, 3), pch = 20)
+ }
[Figure: blood lead trajectories of all study subjects, one line per child.
x-axis: Timeweek (0 to 6); y-axis: Lead.]
This picture is not very informative because it is too crowded. Let's look at
the trajectories of a random subsample of 10 individuals instead:
> unique.id = unique(ID)
> sample.id = sample(unique.id, 10, replace = FALSE)
> plot(Timeweek, Lead, type = "n")
> for (i in sample.id) {
+ lines(Timeweek[ID == i], Lead[ID == i], type = "b", col = ifelse(TreatGroup[ID ==
+ i] == "P", 2, 3), pch = 20)
+ }
[Figure: blood lead trajectories for a random subsample of 10 subjects.
x-axis: Timeweek (0 to 6); y-axis: Lead.]
In the subsample we can see the individual trajectories as well as the
response to placebo or Succimer.
Let us summarize the lead levels by treatment group over time. We start
with a graphical summary:
> boxplot(Lead ~ TreatGroup + Timeweek, ylab = "Lead Level",
+ xlab = "Treatment Group and Week of measurement")
[Figure: boxplots of Lead Level for each treatment group and week (A.0, P.0,
A.1, P.1, A.4, P.4, A.6, P.6). x-axis: Treatment Group and Week of
measurement; y-axis: Lead Level.]
Alternatively, we can use other display functions such as plot.design.
We can summarize these data numerically with group means, standard deviations
and medians at each week, using the tapply function:
> options(digits = 4)
> tapply(Lead, list(TreatGroup, factor(Timeweek)), mean)
0 1 4 6
A 26.54 13.52 15.51 20.76
P 26.27 24.66 24.07 23.65
> tapply(Lead, list(TreatGroup, factor(Timeweek)), sd)
0 1 4 6
A 5.021 7.672 7.852 9.246
P 5.024 5.461 5.753 5.640
> tapply(Lead, list(TreatGroup, factor(Timeweek)), median)
0 1 4 6
A 26.20 12.25 15.35 18.85
P 25.25 24.10 22.45 22.35
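The table of means is easier to read as change from baseline. The sketch below re-enters the means printed by tapply above (rounded to two decimals, as shown) and subtracts each group's week 0 mean; the names printed.means, change and group.diff are introduced here for illustration.

```r
# Group means by week, copied from the tapply(..., mean) output above
printed.means <- matrix(c(26.54, 13.52, 15.51, 20.76,
                          26.27, 24.66, 24.07, 23.65),
                        nrow = 2, byrow = TRUE,
                        dimnames = list(c("A", "P"), c("0", "1", "4", "6")))

# Mean change from baseline: subtract each row's week 0 mean from that row
change <- sweep(printed.means, 1, printed.means[, "0"])
change

# Difference in mean change at each week, Succimer (A) minus placebo (P)
group.diff <- change["A", ] - change["P", ]
group.diff
```

On these printed means the Succimer group falls by about 13 units at week 1, against under 2 units for placebo, and the gap between the groups narrows by week 6.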
We can display the means graphically as well:
> lead.mean = tapply(Lead, list(TreatGroup, factor(Timeweek)), mean)
> plot(c(0, 1, 4, 6), lead.mean[1, ], type = "b", pch = 20, col = 3,
+ xlab = "Week", ylab = "Lead")
> lines(c(0, 1, 4, 6), lead.mean[2, ], type = "b", pch = 18, col = 2,
+ lty = 3)
[Figure: mean blood lead level by week for each treatment group. x-axis: Week
(0 to 6); y-axis: Lead.]
Alternatively, use interaction.plot:
> interaction.plot(Timeweek, factor(TreatGroup), Lead, lwd = 3, col = c(3,
+ 2), lty = c(1, 12), trace.label = "TreatGroup")
[Figure: interaction plot of the mean of Lead against Timeweek (0, 1, 4, 6),
with one line per TreatGroup (P, A).]
In summary we observe that:
- Baseline means are similar in the two groups, as expected under
  randomization
- There are discernible differences between the groups in the pattern of
  change in mean response over time
- There is a large drop in the treated group at week 1, followed by a gradual
  rise (release of bone lead into blood)
- The trend in the placebo group is relatively flat
The main objective of this study is to describe changes in mean response over
time, and how these changes are related to the covariate of interest (Succimer
or placebo). The question is: does treatment with Succimer reduce blood lead
levels over time relative to any changes observed in the placebo group? With
$\mu_j(S)$ and $\mu_j(P)$ denoting the mean blood lead level at the j-th
measurement occasion in the Succimer and placebo groups, the null hypothesis
may be written as
$$H_0 : \mu_j(S) = \mu_j(P) \quad \text{for all } j = 1, \ldots, 4 \tag{1}$$
2.3 Covariance and Correlation
We next examine the covariance and correlation structure, for the placebo
group and the Succimer group separately:
> cov(lead[lead$TreatGroup == "P", 3:6])
Week0 Week1 Week4 Week6
Week0 25.24 22.75 24.26 21.42
Week1 22.75 29.82 27.04 23.38
Week4 24.26 27.04 33.10 28.22
Week6 21.42 23.38 28.22 31.81
> cov(lead[lead$TreatGroup == "A", 3:6])
Week0 Week1 Week4 Week6
Week0 25.21 15.47 15.14 22.99
Week1 15.47 58.87 44.03 35.97
Week4 15.14 44.03 61.66 33.02
Week6 22.99 35.97 33.02 85.49
By examining the diagonals of these matrices, we observe that the variance
tends to increase over time, a common characteristic of longitudinal data; the
increase is much more pronounced in the Succimer group.
Let's examine the correlations as well:
> cor(lead[lead$TreatGroup == "P", 3:6])
Week0 Week1 Week4 Week6
Week0 1.0000 0.8291 0.8394 0.7559
Week1 0.8291 1.0000 0.8607 0.7592
Week4 0.8394 0.8607 1.0000 0.8697
Week6 0.7559 0.7592 0.8697 1.0000
In the placebo group we observe strong positive correlations between
observations from the same individual. The strength of the correlation tends
to decrease with increasing time separation.
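The correlation output above is for the placebo group only. As a rough check on the Succimer group, its correlations can be derived from the covariance matrix printed earlier with cov2cor. The entries below are copied from that printed output, so the result is accurate only to the rounding shown there; cov.A and cor.A are names introduced here for illustration.

```r
# Succimer-group covariance matrix, copied from the cov() output above
weeks <- c("Week0", "Week1", "Week4", "Week6")
cov.A <- matrix(c(25.21, 15.47, 15.14, 22.99,
                  15.47, 58.87, 44.03, 35.97,
                  15.14, 44.03, 61.66, 33.02,
                  22.99, 35.97, 33.02, 85.49),
                nrow = 4, byrow = TRUE,
                dimnames = list(weeks, weeks))

# Convert covariances to correlations: cov[i, j] / sqrt(var[i] * var[j])
cor.A <- cov2cor(cov.A)
round(cor.A, 2)
```

On these figures the Succimer correlations are weaker than the placebo ones (roughly 0.4 between Week0 and Week1, and about 0.7 between Week1 and Week4), and they do not decline as steadily with time separation.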
2.4 Repeated Measures Analysis by ANOVA
Several approaches to the analysis of repeated measures data have developed
over time. We look first at the early approaches, which fall into the ANOVA
family. In repeated measures (RM) ANOVA, the correlation among repeated
measures is assumed to arise from an individual-specific random effect added
to each measurement on a given individual. Time is treated as a factor
variable.
Here is how we do it in R:
> summary(aov(Lead ~ TreatGroup * factor(Timeweek) + Error(factor(ID)),
+ data = lead.long2))
Error: factor(ID)
Df Sum Sq Mean Sq F value Pr(>F)
TreatGroup 1 3111 3111 25.4 2.1e-06 ***
Residuals 98 11987 122
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
factor(Timeweek) 3 3273 1091 61.4 <2e-16 ***
TreatGroup:factor(Timeweek) 3 2030 677 38.1 <2e-16 ***
Residuals 294 5222 18
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
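To see where the F values in the two error strata come from, they can be recomputed from the printed sums of squares and degrees of freedom. This is a sketch based on the rounded values shown above, so small discrepancies in the last digit are to be expected; all variable names are introduced here for illustration.

```r
# Sums of squares and degrees of freedom, copied from the aov() output above
ss <- c(TreatGroup = 3111, Time = 3273, Interaction = 2030)
df <- c(TreatGroup = 1, Time = 3, Interaction = 3)

# Error mean squares for the two strata
ms.subject <- 11987 / 98    # between-subject residual (stratum factor(ID))
ms.within  <- 5222 / 294    # within-subject residual (stratum Within)

# TreatGroup is tested against between-subject error;
# Time and the interaction are tested against within-subject error
error.ms <- c(ms.subject, ms.within, ms.within)
error.df <- c(98, 294, 294)

f.stat <- (ss / df) / error.ms
p.val <- pf(f.stat, df, error.df, lower.tail = FALSE)
round(f.stat, 1)
```

The denominator degrees of freedom follow from the design: 100 children in 2 groups leave 98 for between-subject error, and 98 x (4 - 1) = 294 for within-subject error.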
Here is what we observe from the above results:
- The between-subjects test indicates that the TreatGroup effect is
  significant; accordingly, in the graph the lines for the two treatments
  are rather far apart.
- The within-subject test indicates that there is a significant time effect;
  in other words, blood lead levels change over time. In the graph the lines
  are not flat, i.e. their slopes are non-zero.
- The lines are not parallel, which is consistent with the significant
  interaction between time and treatment group.
Tests of main effects are hard to interpret when the interaction is
significant, because the treatment effect then differs from week to week.
Repeated measures ANOVA assumes a compound symmetry covariance structure
(equal variances at every occasion and equal covariances between every pair
of occasions), which does not always hold. There are other ways of modeling
the covariance as well.
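One way to judge the compound symmetry assumption is to work out the covariance matrix it implies and set it beside the sample covariance matrices shown earlier. The sketch below uses the standard method-of-moments estimates for a random-intercept model, taking the mean squares from the ANOVA output above; it is an illustration under that assumption, not part of the original analysis, and the variable names are introduced here.

```r
# Error mean squares from the aov() output above
ms.subject <- 11987 / 98    # between-subject residual mean square
ms.within  <- 5222 / 294    # within-subject residual mean square
n.times <- 4                # measurement occasions per child

# Random-intercept model: E[MS.subject] = sigma2.e + n.times * sigma2.b
sigma2.e <- ms.within
sigma2.b <- (ms.subject - ms.within) / n.times

# Implied compound symmetry matrix: the same variance on the diagonal,
# the same covariance everywhere else
cs <- matrix(sigma2.b, n.times, n.times) + diag(sigma2.e, n.times)
rho <- sigma2.b / (sigma2.b + sigma2.e)   # implied common correlation

round(cs, 1)
round(rho, 2)
```

The implied matrix has a single variance of about 44 and a single correlation of about 0.6 at all occasions, whereas the sample variances in the Succimer group range from about 25 to about 85, which suggests the assumption is a poor fit for these data.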