Subgroup Analysis
An irresistible temptation!
Deciding on analysis after looking
at the data is “dangerous, useful,
and often done.” (IJ Good, 1983)
Most trials report subgroup analyses
(median=4 subgroups)
Assmann SF, Lancet 2000; 355:1064-1069
Aims of Subgroup Analysis
• To show consistency of trial findings for major
endpoints for important patient subsets
• To assess whether there are large differences
in the treatment effect among different types of
patients and, if so, identify hypotheses for
future research. (Assess the possibility of
treatment × subgroup or treatment × covariate
interactions.)
Subgroup Analysis by Astrological Birth Sign
ISIS-2: Streptokinase and Aspirin for Acute MI
Birth sign         Percentage Reduction in 5-Week Vascular Mortality
Gemini or Libra    9% (NS)
Other signs        -28% (p < 0.00001)
Overall            -23% (p < 0.00001)
“Lack of evidence of benefit just in one particular
subgroup is not good evidence of lack of benefit.”
Subgrouping Considerations
• Most trials are not designed to look at subgroups (power
is lower for subgroups than overall comparison).
• For subgroup analysis, it is often not clear how to control
for type 1 error (the more subgroups examined, the
greater the risk of a type 1 error).
• Not all subgroups of interest can be pre-specified (we are
not that smart).
• The subgroup may not be what it appears to be (it may be
a marker or label for some other characteristic).
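The type 1 error point can be made concrete with a minimal Python sketch. It assumes the subgroup tests are independent and each is run at the same level alpha, which real subgroup analyses rarely satisfy exactly; the function name is illustrative, not from the slides.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among n independent tests.

    Assumption (not from the slides): tests are independent and every
    null hypothesis is true.
    """
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** n_tests


for k in (1, 4, 10, 20):
    rate = familywise_error_rate(k)
    print(f"{k:2d} subgroup tests: P(at least one false positive) = {rate:.3f}")
```

With 4 subgroup tests, the median number reported per trial, the chance of at least one spurious "significant" finding is about 19% under these assumptions.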
Subgroup Definitions
• Proper subgroup – grouping of patients
according to baseline characteristics
• Improper subgroup – grouping of patients
according to characteristics following
randomization (i.e., factors potentially affected
by treatment)
• Interaction – evidence that treatment effects
differ by subgroup (quantitative versus
qualitative)
Yusuf S, et al., JAMA, 266:93-98, 1991.
A Priori and A Posteriori Subgroups
• A priori: written in the protocol in advance of the
study
• A posteriori (post hoc):
- specified … later
- before unblinding
- after unblinding
Pre- and Post-Stratification
and Subgroup Analysis
• Pre-stratification variables are often, but not
always, subgroups of interest.
• Aim of post-stratified analysis is to obtain a
“better” estimate of overall treatment effect.
• Aim of subgroup analysis is to determine
whether treatment differences are consistent.
• Like post-stratification, plans for subgroup
analysis should be pre-specified; sometimes
there are surprises.
Subgrouping vs. Stratification
Grouping               Purpose
Pre-stratification     “insurance” for balance in randomization
Post-stratification    increase the accuracy of estimates of treatment effect
Subgroups              check the consistency of the treatment effect
Stratified Design for Comparing Treatments
             Treatment
Stratum      A       B       Total
1            m1A     m1B     m1
2            m2A     m2B     m2
3            m3A     m3B     m3
4            m4A     m4B     m4
Total        na      nb
• Typical situation: m1 ≠ m2 ≠ m3 ≠ m4
• Study is designed/powered based on na and nb
• Goal: miA = miB for all i.
Subgrouping Factors Determined Experimentally
2 x 2 Factorial (rows B / No B determined by randomization):
          A       No A
B
No B
versus
Subgroups (rows B / No B are a baseline characteristic):
          A       No A
B
No B
NIH Policy on Subgroups
“When an NIH-defined Phase III clinical trial is
proposed, evidence must be reviewed to show
whether or not clinically important sex/gender
and race/ethnicity differences in the intervention
effect are to be expected.”
“Inclusion of the results of sex/gender,
race/ethnicity and relevant subpopulations
analyses is strongly encouraged in all
publication submissions.”
http://grants.nih.gov/grants/funding/women_min/guidelines_amended_10_2001.htm
ICH Guidelines on Subgroups
• If the size of the study permits, important
demographic or baseline value-defined
subgroups should be examined.
• These analyses are not intended to “salvage”
an otherwise unsupportive study.
• Subgroup analyses may suggest hypotheses
to be examined in other studies
• If there is a prior hypothesis about a subgroup,
this should be part of the statistical analysis
plan.
Controversial Issues
• Appropriate significance level? Bonferroni method
may be too conservative – loss of power in a
situation where power is already low.
• Should subgroup analysis be performed if the overall
result is negative? Much harder sell.
• Should only a priori subgroups be described? Not
always that smart.
• How should subgroup analyses be presented?
Interaction tests important.
• Should analyses be based on post-randomization
measures? No
THEME: Restrain wishful thinking.
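The remark that Bonferroni costs power can be illustrated with a rough calculation. This is a sketch under stated assumptions: a two-sided z-test, a hypothetical test that has a given power before adjustment, and the tiny rejection probability in the wrong tail ignored. The function name and the 80% starting power are illustrative choices, not values from the slides.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_after_bonferroni(power_unadjusted: float, alpha: float, n_tests: int) -> float:
    """Approximate power of a two-sided z-test after splitting alpha over n_tests.

    The effect size is the one that gives `power_unadjusted` at level `alpha`.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # Standardized effect implied by the unadjusted power.
    delta = z_alpha + _STD_NORMAL.inv_cdf(power_unadjusted)
    # Critical value once each test is run at alpha / n_tests.
    z_adjusted = _STD_NORMAL.inv_cdf(1.0 - alpha / (2.0 * n_tests))
    return _STD_NORMAL.cdf(delta - z_adjusted)


for k in (1, 2, 4, 8):
    print(f"{k} tests: power = {power_after_bonferroni(0.80, 0.05, k):.2f}")
```

Under these assumptions, a test with 80% power at alpha = 0.05 falls to a little over 60% power when alpha is split across 4 subgroups, and subgroup comparisons usually start well below 80%.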
A Consumer’s (and Producer’s?)
Guide to Subgroup Analysis
• Document heterogeneity between subgroups
• Argue consistency with biologic phenomena
• Argue consistency with other data from the
trial
• Argue consistency with other studies
THEME: You’d better have a story.
Data from Neonatal Hypocalcemia Trial:
All Calcium Levels in mmol/l
                    Breast-fed                Bottle-fed
                    Supplement   Placebo      Supplement   Placebo
Treatment mean      2.445        2.408        2.300        2.195
No. babies          64           102          169          285
SE                  0.0365       0.0311       0.0211       0.0189
Treatment effect    0.037                     0.105
SE                  0.0480                    0.0283
P-value             0.44                      0.0002
Reference: Cockburn et al, BMJ, 281:11-14; 1980.
See also Pocock, Clinical Trials: A Practical Approach.
Data from Neonatal Hypocalcemia Trial (cont.)
\[ Z = \frac{|0.037 - 0.105|}{\left(0.0365^2 + 0.0311^2 + 0.0211^2 + 0.0189^2\right)^{1/2}} = \frac{0.068}{0.0557} = 1.22 \]
P-value = 0.22
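The interaction test for the neonatal hypocalcemia data can be checked with a short Python sketch. It assumes the four group means are independent and approximately normal, so the two subgroup treatment effects can be compared with a z-test; the variable and function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# (mean, standard error) in mmol/l, taken from the slide.
breast_fed = {"supplement": (2.445, 0.0365), "placebo": (2.408, 0.0311)}
bottle_fed = {"supplement": (2.300, 0.0211), "placebo": (2.195, 0.0189)}


def treatment_effect(group):
    """Supplement minus placebo, with the standard error of the difference."""
    (mean_s, se_s), (mean_p, se_p) = group["supplement"], group["placebo"]
    return mean_s - mean_p, sqrt(se_s**2 + se_p**2)


eff_breast, se_breast = treatment_effect(breast_fed)
eff_bottle, se_bottle = treatment_effect(bottle_fed)

# Interaction: difference between the two subgroup treatment effects.
z_interaction = (eff_bottle - eff_breast) / sqrt(se_breast**2 + se_bottle**2)
p_interaction = two_sided_p(z_interaction)

print(f"breast-fed effect {eff_breast:.3f} (SE {se_breast:.4f})")
print(f"bottle-fed effect {eff_bottle:.3f} (SE {se_bottle:.4f})")
print(f"interaction Z = {z_interaction:.2f}, P = {p_interaction:.2f}")
```

The separate subgroup p-values (0.44 and 0.0002) suggest a difference between feeding groups, but the direct test of that difference gives P of about 0.22.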
HDFP Study
                    Deaths           Percent Difference
Race, Sex, Age      SC      RC       in Mortality
Black men           112     140      -18.5
Black women         70      98       -27.8
White men           109     126      -14.7
White women         58      55       +2.1
30-49               81      82       -5.7
50-59               115     159      -25.3
60-69               153     178      -16.4
Overall             349     419      -16.9
HDFP Subgroups
Black Men (1)                        Black Women (2)
       Dead    Alive                        Dead    Alive
SC     112     952                   SC     70      1274
RC     140     944                   RC     98      1256
\hat{O}_1 = 0.79, w_1 = 55.0         \hat{O}_2 = 0.70, w_2 = 38.3

White Men (3)                        White Women (4)
       Dead    Alive                        Dead    Alive
SC     109     1783                  SC     58      1026
RC     126     1735                  RC     55      1101
\hat{O}_3 = 0.84, w_3 = 54.8         \hat{O}_4 = 1.13, w_4 = 26.8

\[ \sum_{i=1}^{4} w_i = 174.9 \]
\[ \log \hat{O}_p = \frac{(55.0)\log(0.79) + (38.3)\log(0.70) + (54.8)\log(0.84) + (26.8)\log(1.13)}{174.9} = -0.188 \]
\[ \hat{O}_p = 0.83 \]
\[ X^2_{(3)} = \sum_{i=1}^{4} w_i \left(\log \hat{O}_i - \log \hat{O}_p\right)^2 = 0.134 + 1.111 + 0.008 + 2.551 = 3.804; \quad p = 0.28 \]
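The HDFP homogeneity calculation can be reproduced from the four 2 x 2 tables. The sketch below assumes inverse-variance (Woolf) weights on the log odds ratio, which match the weights shown on the slide; names are illustrative. Because it works from unrounded odds ratios, the results can differ slightly from the slide's hand calculation.

```python
from math import erfc, exp, log, pi, sqrt

# ((dead, alive) under SC, (dead, alive) under RC) for each subgroup.
tables = {
    "Black men":   ((112, 952), (140, 944)),
    "Black women": ((70, 1274), (98, 1256)),
    "White men":   ((109, 1783), (126, 1735)),
    "White women": ((58, 1026), (55, 1101)),
}


def log_odds_ratio_and_weight(sc, rc):
    """Log odds ratio (SC vs RC) and its inverse-variance weight."""
    a, b = sc
    c, d = rc
    log_or = log((a * d) / (b * c))
    weight = 1.0 / (1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, weight


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 df (closed form)."""
    return erfc(sqrt(x / 2.0)) + sqrt(2.0 * x / pi) * exp(-x / 2.0)


stats = [log_odds_ratio_and_weight(sc, rc) for sc, rc in tables.values()]
total_weight = sum(w for _, w in stats)
log_pooled = sum(w * lo for lo, w in stats) / total_weight
pooled_or = exp(log_pooled)

# Homogeneity statistic: weighted squared deviations from the pooled log OR.
q_stat = sum(w * (lo - log_pooled) ** 2 for lo, w in stats)
p_value = chi2_sf_3df(q_stat)

print(f"sum of weights = {total_weight:.1f}, pooled OR = {pooled_or:.2f}")
print(f"homogeneity chi-square (3 df) = {q_stat:.2f}, p = {p_value:.2f}")
```

The apparent outlier (white women, odds ratio above 1) is compatible with chance variation around a common odds ratio of about 0.83.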
Subgroup Analyses According to
Follow-up Time
• Heart and Estrogen/progestin
Replacement Study (HERS)
• Adenomatous Polyp Prevention on
Vioxx (APPROVe) Trial
HERS
                      Estrogen-Progestin   Placebo     Hazard Ratio
                      (n=1380)             (n=1383)    (95% CI)
Primary CHD events    172                  176         0.99 (0.80 – 1.22)
Year 1                57                   38          1.52
Year 2                47                   48          1.00
Year 3                35                   41          0.87
Year 4                33                   49          0.67
P=.009 for interaction
APPROVe
                                Rofecoxib    Placebo     Hazard Ratio
                                (n=1287)     (n=1299)    (95% CI)
Confirmed thrombotic events     46           26          1.92 (1.19 – 3.11)
Months 0-18                     22           20          1.18
Months 19-36                    24           6           4.45
P=.01 for test of proportional hazards (interaction), i.e., the proportional hazards assumption was rejected
Barrett-Connor on HERS*
A Fable: Looking for the Pony
A man has 2 sons, one a hopeless pessimist and
the other an unrealistic optimist. Determined to
change their thinking to a less extreme position,
the man buys a room full of toys for the
pessimist and a room full of horse manure for
the optimist.
When he returns, the pessimist is crying because
he has broken all of his toys. In contrast, the
optimist is shoveling through his gift and
proclaims: “With all that manure there must be a
pony in there somewhere.”
*Circulation 2002;105:902-903.
“New Study Reassures Most Users of Hormones.
For Newly Menopausal, There’s No Heart Risk; A
Reversal of Findings.”
“At Issue is something called the P value…”
Wall Street Journal
April 4, 2007
Cardiovascular and Global Index Events by Years
Since Menopause at Baseline (WHI Study)
Group sizes as shown on the slide (hormone therapy, placebo):
<10 years: n=3608, n=3529; 10-19 years: n=4483, n=3529; ≥20 years: n=3608, n=3529
                  Years since   No. of cases                                      P value
Outcome           menopause     Hormone therapy   Placebo   HR (95% CI)           for trend†
CHD‡              <10           39                51        0.76 (0.50-1.16)      .02
                  10-19         113               103       1.10 (0.84-1.45)
                  ≥20           194               158       1.28 (1.03-1.58)
Stroke            <10           41                23        1.77 (1.05-2.98)      .36
                  10-19         100               79        1.23 (0.92-1.66)
                  ≥20           142               113       1.26 (0.98-1.62)
Total Mortality   <10           53                67        0.76 (0.53-1.09)      .51
                  10-19         142               149       0.98 (0.78-1.24)
                  ≥20           267               240       1.14 (0.96-1.36)
Global Index§     <10           222               203       1.05 (0.86-1.27)      .62
                  10-19         482               440       1.12 (0.98-1.27)
                  ≥20           675               632       1.09 (0.98-1.22)
† Test for trend (interaction) using years since menopause as continuous (linear) form of categorical
coded values. Cox regression models stratified according to active vs. placebo and trial, including
terms for years since menopause and the interaction between trials and years since menopause.
JAMA 2007;297:1465-1477
CHD Events by Years Since Menopause at Baseline
                  Years Since Menopause
                  <10            10-19          ≥20            P-value for Trend†
CHD‡ HR           0.76           1.10           1.28           .02
(95% CI)          (0.50-1.16)    (0.84-1.45)    (1.03-1.58)
“These analyses, although not definitive, suggest that the
health consequences of hormone therapy may vary
by distance from menopause…”
AIDS Vaccine Trial
(Science 28 February 2003)
             Infected    Not Infected    Total
Vaccine      191         3,139           3,330
Placebo      98          1,581           1,679
Total        289         4,720           5,009
5.7% vs. 5.8%
\widehat{OR} = 0.98; 95% CI (0.78 to 1.24)
AIDS Vaccine Trial
Subgroup Analysis
White and Hispanic                              Black, Asian, Other
           Infected   Not Infected                         Infected   Not Infected
Vaccine    179        2,824                     Vaccine    12         315
Placebo    81         1,427                     Placebo    17         154
6.0 vs. 5.4%                                    3.7 vs. 9.9%
\widehat{OR}_1 = 1.12 (95% CI: 0.85 to 1.46)    \widehat{OR}_2 = 0.35 (95% CI: 0.16 to 0.74)
\hat{O}_p = 1.02; \chi^2_1 = 8.6 for homogeneity of odds ratio; p = 0.003
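The two subgroup odds ratios and their confidence intervals can be recomputed from the 2 x 2 counts. This sketch assumes a Woolf (log-scale, normal approximation) interval; the slide does not name its method, and the function name is illustrative.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    a, b: infected / not infected on vaccine
    c, d: infected / not infected on placebo
    """
    or_hat = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(or_hat) - z * se_log)
    upper = exp(log(or_hat) + z * se_log)
    return or_hat, lower, upper


white_hispanic = odds_ratio_ci(179, 2824, 81, 1427)
black_asian_other = odds_ratio_ci(12, 315, 17, 154)

for label, (or_hat, lower, upper) in (
    ("White and Hispanic", white_hispanic),
    ("Black, Asian, Other", black_asian_other),
):
    print(f"{label}: OR = {or_hat:.2f} (95% CI: {lower:.2f} to {upper:.2f})")
```

Both intervals agree with the slide to two decimals. The second subgroup rests on only 29 infections, which is why its interval is so wide.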
Example: ACTG 155
Randomization (allocation ratio)
Arms:           AZT          2
                ddC          2
                AZT + ddC    3
Primary outcome: disease progression (AIDS/death)
Secondary outcomes: CD4+ cell count change, toxicities
Sample Size: 991
Subgrouping:    CD4 stratum   Number
                CD4<50        269
                50≤CD4<150    336
                CD4≥150       386
“We found no overall benefits of
zalcitabine used alone or with zidovudine.
However, a trend analysis suggested a
better outcome for combination therapy
compared with zidovudine as the
pretreatment CD4 cell count increased”.
“Our study suggests that combination
therapy may be beneficial in patients with
higher CD4 cell counts”.
Pooled Analysis of AZT + ddX vs. AZT
Treatment Naïve Patients
Baseline CD4+    No. AIDS/Death Events    Hazard Ratio*
< 100            382                      0.66 (0.53 - 0.82)
100 - 199        319                      0.63 (0.50 - 0.81)
200 - 299        186                      0.62 (0.45 - 0.84)
300 - 499        90                       0.63 (0.40 - 0.98)
*AZT + ddX vs. AZT
Some Lessons From ACTG 155 Presentation
1. What does “a priori” mean?
If it is important, amend the protocol.
2. Confusion about stratification and
subgrouping.
Lessons Continued
3. It is easy to develop explanations for possible
subgroup effects.
4. By chance some subgroups will be more
extreme than others.
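Lesson 4 can be illustrated with a small simulation in the spirit of the astrological birth sign example. This is a sketch with made-up parameters (12,000 patients, 12 arbitrary labels, event risks of 12% and 9%), none of which come from any trial; the true treatment effect is identical in every subgroup, so all variation between subgroups is chance.

```python
import random


def simulate_subgroup_effects(n_patients=12000, n_subgroups=12,
                              risk_control=0.12, risk_treated=0.09, seed=1):
    """Simulate one trial with the same true effect in every subgroup.

    Returns (overall, per_subgroup) observed risk differences,
    computed as treated minus control.
    """
    rng = random.Random(seed)
    # counts[subgroup][arm] = [events, patients]; arm 0 = control, 1 = treated
    counts = [[[0, 0], [0, 0]] for _ in range(n_subgroups)]
    for _ in range(n_patients):
        subgroup = rng.randrange(n_subgroups)  # label unrelated to outcome
        arm = rng.randrange(2)                 # 1:1 randomization
        risk = risk_treated if arm else risk_control
        counts[subgroup][arm][0] += rng.random() < risk
        counts[subgroup][arm][1] += 1

    def risk_difference(cells):
        (events_c, n_c), (events_t, n_t) = cells
        return events_t / n_t - events_c / n_c

    pooled = [[sum(c[arm][k] for c in counts) for k in range(2)]
              for arm in range(2)]
    return risk_difference(pooled), [risk_difference(c) for c in counts]


overall, effects = simulate_subgroup_effects()
print(f"overall risk difference: {overall:+.3f}")
print(f"subgroup range: {min(effects):+.3f} to {max(effects):+.3f}")
```

Changing the seed changes which label looks best or worst, which is the point: the most extreme subgroup in a single trial is partly selected by chance.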
Lessons Continued
5. For an ordered/continuous variable, test for
trend is important.
CD4+ categories: < 50, 50 - 149, 150+
4 df test for interaction (3 treatment groups and 3 CD4
categories) or
2 df test (3 treatment groups and continuous CD4)
6. “Subgroup label” may be a marker for
something else.
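The degrees of freedom quoted in lesson 5 follow from counting interaction parameters. As a worked check, with a treatment groups and b CD4 categories (the symbols a and b are introduced here for illustration only):

```latex
\begin{align*}
\text{categorical CD4:} \quad & \mathrm{df} = (a - 1)(b - 1) = (3 - 1)(3 - 1) = 4 \\
\text{continuous (linear) CD4:} \quad & \mathrm{df} = (a - 1) \times 1 = 3 - 1 = 2
\end{align*}
```

The trend test spends fewer degrees of freedom, so it tends to have more power when the treatment effect changes steadily with CD4 count.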
Sometimes, but rarely, a subgroup finding
leads to a new study where the result is
confirmed.
V-HeFT I: Survival in All Patients
[Figure: percent survival versus time (months, 0 to 42) for placebo, prazosin, and I/H; annotation "I/H 36%"]
Placebo vs I/H: P=0.06
Placebo: N (cumulative death) 273 201 (53) 132 (94) 82 (128)
I/H: N (cumulative death) 186 147 (23) 108 (48) 70 (67)
Cohn JN, et al. N Engl J Med. 1986;314:1547-1552.
V-HeFT I: All-cause Mortality in
Black and White Patients
[Figure: two panels of percent survival versus days since randomization date (0 to 1825), by treatment group: H-I (H) and Placebo (P)]
Black patients: HR=0.53, P=0.04
White patients: HR=0.88, P=0.47
N at each axis tick (days 0, 365, 730, 1095, 1460, 1825):
Black patients   P: 79, 61, 44, 29, 14, 14      H: 49, 43, 36, 28, 16, 16
White patients   P: 192, 140, 91, 55, 27, 8     H: 132, 102, 71, 42, 22, 9
Carson P, et al. J Cardiac Fail. 1999;5:178-187.
V-HeFT II: All-cause Mortality in
All Patients
[Figure: survival times for all patients; percent survival versus days since randomization date (0 to 1825), by treatment group: H-I (H) and Enalapril (E)]
Hazard ratio (95% CI): 1.23 (0.97, 1.55); log-rank P-value: 0.083
N at each axis tick (days 0, 365, 730, 1095, 1460, 1825):
E: 403, 346, 265, 169, 89, 1
H: 401, 332, 242, 157, 86, 3
Carson P, et al. J Cardiac Fail. 1999;5:178-187.
V-HeFT II: All-cause Mortality in
Black and White Patients
[Figure: two panels of percent survival versus days since randomization date (0 to 1825), by treatment group: H-I (H) and Enalapril (E)]
Black patients: HR=1.01, P=NS
White patients: HR=1.32, P=0.02
N at each axis tick (days 0, 365, 730, 1095, 1460, 1825), rows labeled P and H as on the slide:
Black patients   P: 106, 93, 69, 47, 24, 1        H: 109, 92, 67, 49, 29, 2
White patients   P: 292, 251, 194, 123, 66, 1     H: 282, 231, 171, 105, 55, 1
Carson P, et al. J Cardiac Fail. 1999;5:178-187.
A-HeFT: 43% Decrease in
Mortality
[Figure: survival (%) versus days since baseline visit (0 to 600) for isosorbide dinitrate/hydralazine and placebo]
HR=0.57, P=0.01
ISDN/HYD 518 463 407 359 313 251 13
Placebo 532 466 401 340 285 232 24
Taylor AL, et al. N Engl J Med. 2004;351:2049-2057.
A-HeFT: Components of
Composite Score
[Figure: three bar charts comparing Placebo and ISDN/HYD]
Component                             Placebo          ISDN/HYD         P
Death                                 10.2% (n=54)     6.2% (n=32)      P=0.02
First heart failure hospitalization   24.4% (n=138)    16.4% (n=85)     P<0.001
Change in QOL                         -3.1             -7.1             P=0.01
Taylor AL, et al. N Engl J Med. 2004;351:2049-2057.
Guidelines to Follow for Interpreting
Subgroup Analysis
• Assess magnitude of interaction before focusing on
separate subgroups and their tests of significance
• Assess consistency with biologic phenomenon
realizing that “human imagination is capable of
developing a rationale for most findings” (Ware,
NEJM, 2003).
• Assess consistency with other data from trial
• Assess consistency with other studies
Guidelines For Reporting Subgroup
Analyses (NEJM 2007;2189-2194)
• Abstract: Only if based on primary outcome and pre-
specified
• Methods: Number pre-specified; any of special
interest; endpoint; methods used to assess
heterogeneity; number performed; potential effect on
type 1 error
• Results: present tests of heterogeneity; forest plot
• Discussion: Cautious in interpretation; state
limitations; cite supporting or contradictory data
Summary
• P-values for individual subgroups are misleading –
report CIs.
• Calculate subgroup by treatment interactions, but be
cognizant of low power
• Keep in mind most trials are designed assuming no
interaction.
• Define key subgroups to be investigated in the
protocol.
• Report subgroup findings very cautiously – ultimately
want validation in another study or meta-analysis.
“Only one thing is worse than doing subgroup
analyses --- believing the results.” Richard Peto