Trial Objectives
Superiority, Non-inferiority,
and Equivalence
Questions of Interest
• Is the new treatment better than the control
treatment that I am using now? (superiority
trial)
• If it is not better, is the new treatment as good
(not unacceptably inferior) as the control
treatment that I am using now? (non-inferiority
trial)
• Can I use the new treatment and the control
treatment interchangeably? (equivalence trial)
Non-inferiority and equivalence trials are usually
considered when there is an active control.
Definitions (ICH Guidelines – E9)
• Superiority trial – a trial with the primary objective of
showing that the response to the investigational
product is superior to a comparative agent (active or
placebo control).
• Equivalence trial – a trial with the primary objective of
showing that the response to two or more treatments
differs by an amount which is clinically unimportant
(active control).
• Non-inferiority trial – a trial with the primary objective
of showing that the response to the investigational
product is not clinically inferior to a comparative agent
(active or placebo control but usually active) – very
common in the regulatory setting.
Examples – Non-Inferiority - 1
• Is a new left ventricular assist device that provides a
“bridge” to heart transplant as effective in keeping
patients alive until a heart becomes available as one of
the FDA-approved devices?
• Is a new vaccine for pertussis (whooping cough) that
has an improved safety profile as effective in
preventing whooping cough as the currently licensed
vaccine?
• Is a single dose of a drug (low dose) equivalent to a
twice a day dose (high dose)?
Examples – Non-Inferiority - 2
• Is a short course of treatment for
latent TB infection (3 months of INH
plus rifapentine) as effective as 9
months of INH in preventing active
TB?
Example - HIV Trial:
Abacavir-Lamivudine-Zidovudine vs
Indinavir-Lamivudine-Zidovudine
JAMA 2001;285:1155-1163.
“The study was powered to assess
treatment equivalence for the primary
endpoint (i.e., a plasma HIV RNA level
0 is not correct because a small, underpowered
study could incorrectly lead to a claim of equivalence –
absence of evidence is not evidence of absence, and if
power is too high, Ho may be rejected when the difference is
not important.
• Since Ho cannot be accepted, either reverse the roles of
type 1 and 2 errors (i.e., rejection of Ho implies equivalence)
or focus on confidence intervals
• Treatment difference must be chosen not only to rule out
smallest clinically meaningful difference, but also to be sure
new treatment is better than no treatment
• Consensus on what equivalence means, especially in a
broad sense, is hard to achieve
1-Sided Hypothesis Testing (Non-inferiority)
A = new treatment; B = standard;
$P_A$ and $P_B$ = event rates (failure rates)
$\Delta = P_A - P_B$; $\Delta > 0$ implies standard is better
$H_0: \Delta \ge \delta_0$ (B better by at least $\delta_0$)
$H_A: \Delta < \delta_0$ (A not worse by as much as $\delta_0$;
A is close to B)
If Ho is rejected, treatments are “equivalent”
Roles of null and alternative hypotheses are reversed. In
practice, this is confusing to people.
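The reversed hypotheses can be made concrete with a small sketch. It tests H0: Δ ≥ δ0 against HA: Δ < δ0, where Δ = PA − PB is the difference in failure rates. The function name, the unpooled (Wald) standard error, and the event counts are illustrative assumptions, not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def noninferiority_z_test(events_a, n_a, events_b, n_b, delta0, alpha=0.025):
    """One-sided test of H0: Delta >= delta0 vs HA: Delta < delta0.

    Delta = P_A - P_B, where A is the new treatment, B the standard,
    and P is a failure rate (so Delta > 0 means the standard is better).
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    delta_hat = p_a - p_b
    # Unpooled (Wald) standard error: an assumption made for this sketch.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = (delta_hat - delta0) / se
    # Small p-values are evidence AGAINST "B better by at least delta0".
    p_value = NormalDist().cdf(z)
    return delta_hat, z, p_value, p_value < alpha


# Hypothetical counts: equal observed failure rates, margin of 5 percentage points.
print(noninferiority_z_test(100, 1000, 100, 1000, delta0=0.05))
# Hypothetical counts: new treatment observed 4 points worse, same margin.
print(noninferiority_z_test(140, 1000, 100, 1000, delta0=0.05))
```

Rejecting H0 here supports the claim that A is not worse than B by as much as δ0, which is the opposite of what rejection means in a superiority test.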
Parallel Group Studies
with Continuous Outcomes: Sample Size
Formula is the Same Except for δ0
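$\Delta = \mu_A - \mu_B$

$$n = n_A = n_B = \frac{2\sigma^2 \left(z_{1-\alpha} + z_{1-\beta}\right)^2}{(\delta_0 - \Delta)^2}$$

$\alpha = 0.025$; $z_{1-\alpha} = 1.96$
$1 - \beta = 0.90$; $z_{1-\beta} = 1.28$

$$n_A = n_B = \frac{2\sigma^2 (10.5)}{(\delta_0 - \Delta)^2}$$

Note: If Δ = 0, then this is equivalent to a superiority trial to detect
δo with 90% power.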
Example
Non-Inferiority Trial for New BP Lowering Drug
δO = 4 mmHg
Δ = 0, -2 (A better) and +2 (B better)
σ2 = 100; α = 0.025 (1-sided); 1-β = 0.90
1:1 allocation
δO    Δ     No. per group
4     0     132
4     +2    525
4     -2    58
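The example can be checked with a few lines of Python. This is a sketch under the assumption that the per-group sample size is n = 2σ²(z₁₋α + z₁₋β)² / (δ0 − Δ)², which is the form that matches the numbers in the example; the function name is illustrative. Exact normal quantiles are used rather than the rounded constant 10.5, so the unrounded results differ slightly from the slide, and the slide's whole numbers also depend on the rounding convention applied.

```python
from statistics import NormalDist


def n_per_group(sigma2, delta0, delta, alpha=0.025, power=0.90):
    """Per-group sample size for a non-inferiority trial, continuous outcome.

    sigma2 : common variance of the outcome
    delta0 : non-inferiority margin
    delta  : assumed true difference (A minus B; positive means B is better)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.96 for alpha = 0.025
    z_beta = NormalDist().inv_cdf(power)       # about 1.28 for 90% power
    return 2 * sigma2 * (z_alpha + z_beta) ** 2 / (delta0 - delta) ** 2


# BP example: margin 4 mmHg, variance 100, three assumed true differences.
for delta in (0, 2, -2):
    print(f"delta = {delta:+d}: n per group = {n_per_group(100, 4, delta):.1f}")
```

The steep rise in n when Δ moves toward δ0 shows why the assumed true difference matters as much as the margin itself.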
Confidence Interval Approach
Example of Type I Error
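[Figure: axis for Δ running from "A (new treatment) better" on one side of 0 to "B (standard treatment) better" on the other, showing the estimate $\hat{\Delta}$ and its $(1 - 2\alpha)$ CI]
Type I error = Prob (incorrectly rejecting null
hypothesis)
In this case: incorrectly claiming "equivalence"
when the treatments are not (reverse of usual situation)
Upper limit of $(1 - 2\alpha)$ CI $< \delta_0$, but $\Delta \ge \delta_0$
We want to reject $H_0$ when $\Delta < \delta_0$, not accept it.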
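A short simulation can illustrate this type I error. The sketch below assumes a continuous outcome with known variance, sets the true difference Δ exactly at the margin δ0 (the boundary of H0), and counts how often the upper limit of the (1 − 2α) confidence interval still falls below δ0. The parameter values reuse the blood pressure example; the function name, seed, and number of simulations are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulated_type1_error(delta0=4.0, sigma2=100.0, n=132, alpha=0.025,
                          n_sims=20000, seed=1):
    """Fraction of simulated trials that wrongly claim non-inferiority."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha)
    se = sqrt(2 * sigma2 / n)  # SE of a difference in means, n per group
    false_claims = 0
    for _ in range(n_sims):
        # True difference sits on the H0 boundary: B really is better by delta0.
        delta_hat = rng.gauss(delta0, se)
        upper_limit = delta_hat + z * se  # upper limit of the (1 - 2*alpha) CI
        if upper_limit < delta0:
            false_claims += 1  # "equivalence" claimed although H0 is true
    return false_claims / n_sims


print(simulated_type1_error())
```

The returned fraction should be close to α, since a two-sided (1 − 2α) interval leaves α in each tail.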
Sample Size for Equivalence
Design Based on CI Limits
A = New Treatment; B = Standard
Prob (upper limit of CI exceeds 0 when -δ, i.e. non-
inferiority demonstrated.
In this case both non-inferiority and superiority have been
demonstrated.
[Figure: confidence interval on an axis marked at -δ and 0 (no difference)]
Non-inferiority and Inferiority
The 95% CI for the difference between the control and the
intervention lies entirely above -δ, i.e. non-inferiority
is demonstrated.
[Figure: two confidence intervals on an axis marked at -δ and 0 (no difference), captioned:]
• In this case both non-inferiority and inferiority
have been demonstrated
• In this case both non-inferiority and superiority
have been demonstrated
CONVINCE Design
• Based on the findings from 17 trials with over
50,000 participants, the CVD risk reduction
associated with BP lowering by diuretics and
beta-blockers was estimated as 24%.
• Equivalence margin was set to ensure that
there would be no more than a 50% loss of
efficacy based on this point estimate.
• Upper bound = 1.16 = 0.88 (12% reduction)/
0.76 (24% reduction).
• Lower bound = 1/1.16 = 0.86.
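The arithmetic behind these bounds can be sketched as follows. The function and argument names are illustrative; the only inputs taken from the slide are the historical relative risk of 0.76 (a 24% reduction) and the requirement that no more than 50% of that effect be lost.

```python
def margin_from_historical_effect(control_rr, fraction_preserved=0.5):
    """Relative-risk margin that preserves a fraction of the control's effect.

    control_rr         : relative risk of active control vs. no treatment
    fraction_preserved : share of the control's risk reduction to be retained
    """
    risk_reduction = 1 - control_rr                         # 0.24 in the example
    retained_rr = 1 - fraction_preserved * risk_reduction   # 0.88 in the example
    upper = retained_rr / control_rr   # new vs. control, expressed as a ratio
    lower = 1 / upper                  # symmetric bound on the ratio scale
    return upper, lower


upper, lower = margin_from_historical_effect(0.76)
print(f"upper bound = {upper:.2f}, lower bound = {lower:.2f}")
```

Working on the ratio scale is what makes the lower bound the reciprocal of the upper bound rather than its mirror image around 1.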
Another Example
Treatment of Acute MI
See Editorial NEJM 337:Oct. 16, 1997
Background
Gusto I Study (N = 41,021)
  Lower 95% CI limit for 30 day mortality difference for accelerated
  infusion of alteplase vs. streptokinase = 0.4%
  (30 day mortality: 6.3 vs 7.3%)
  (need N = 50,000 to rule out difference this big)
Two New Studies
Cobalt Study: Double bolus alteplase vs. accelerated
infusion of alteplase (N = 7,169)
  30 day mortality: 7.98% vs. 7.53%
  1-sided 95% CI: (-∞ to 1.49)
  Conclusion: Not equivalent
Gusto III Trial: Double bolus reteplase vs. accelerated
infusion of alteplase (N = 15,059)
  30 day mortality: 7.47% vs. 7.24%
  95% CI: (-0.66 to 1.10)
  Conclusion: Similar efficacy
Summary - Determining Equivalence
• First step in establishing equivalence -
define ‘limits of equivalence’ (± δ)
• Having conducted the trial, calculate the
95% confidence intervals for the
difference between the control and the
new treatment
• If the confidence interval is entirely
within ± δ then equivalence is
established
Summary - Determining Non-inferiority
• Equivalence requires that the difference
(control - new intervention) is both > -δ and < δ:
the new treatment must be neither worse
nor better than the control by more than a
fixed amount.
• In contrast to equivalence, with non-inferiority
we are only interested in determining
whether the new treatment is no worse by an
amount δ.
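The decision rules in these two summaries can be written as a small helper. This is a sketch that assumes the difference is oriented so that values above -δ favour non-inferiority and values above 0 favour the new treatment, as in the figures; the function name and the example intervals are hypothetical.

```python
def interpret_ci(lower, upper, delta):
    """Classify a confidence interval for the treatment difference.

    lower, upper : confidence limits for the difference
    delta        : pre-specified margin (limits of equivalence are +/- delta)
    """
    return {
        # Equivalence: the whole interval lies inside (-delta, +delta).
        "equivalent": -delta < lower and upper < delta,
        # Non-inferiority: only the lower limit matters.
        "non_inferior": lower > -delta,
        # Superiority / inferiority: the interval excludes zero.
        "superior": lower > 0,
        "inferior": upper < 0,
    }


# Hypothetical intervals with margin delta = 2.
print(interpret_ci(-1.0, 1.0, 2))    # equivalent and non-inferior
print(interpret_ci(0.5, 3.0, 2))     # non-inferior and superior
print(interpret_ci(-1.5, -0.5, 2))   # non-inferior and inferior
print(interpret_ci(-3.0, 1.0, 2))    # non-inferiority not shown
```

The third case shows that non-inferiority and inferiority can both hold: the new treatment is worse, but by less than the margin.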
Analysis of Non-inferiority/Equivalence
Trials
• Superiority trials are analysed by intention-to-
treat (ITT) because it is the most conservative
and least likely to be biased.
• ITT analysis of non-inferiority trials is not
conservative - there is a bias towards no
difference.
• Per protocol analysis is biased since not all
randomised patients are included.
• Recommendation: Analyze by both ITT and
per protocol (need to ensure power for both).
Equivalence/Non-Inferiority Trials
Summary
• Equivalence is “in the eyes of the beholder”
• The absence of a significant difference in a superiority
trial does not imply equivalence
• Need to be sure about the efficacy of the control
treatment based on earlier trials.
• Sloppy trials yield “equivalent” results
• Because of difficulty of interpretation, equivalence and
non-inferiority trials should be used cautiously for
licensure.
• More head to head comparisons of approved treatments
are needed.
Quality of Reporting of Non-inferiority and
Equivalence Trials
(JAMA 2006;295:1147-1151)
• Margin defined in most trials, but rationale for
margin missing in majority of studies
• About 25% of reports did not give sample size
justification in sufficient detail to reproduce
• Less than 50% described both intention to
treat and per protocol analysis
• About 15% of reports did not state confidence
intervals.
Guidelines for Reporting Non-inferiority and
Equivalence Trials+
(JAMA 2006;295:1152-1160)
• Specification of whether the trial is a non-
inferiority study
• Sample size details (specification and rationale
for non-inferiority margin)
• Use of 1- or 2-sided confidence interval
• Nature of analysis: intention to treat, per
protocol or both
• Presentation of results: confidence intervals
+ Builds on CONSORT guidelines for superiority trials.
Checklist for Information
Concerning Sample Size in 81 Trials
Statement on:                         Percent Mentioned
Planned sample size                   30
Type I error rate                     21
Power or Type II error rate           26
1-sided or 2-sided test               7
Hypothesized treatment difference     26
Planned duration of follow-up         75
Sample Size Recommendations
1. Specify in advance in protocol
2. Inflate sample size to account for dropouts and dropins
because analysis is “intent-to-treat”
3. Sample size may also have to be inflated to account for:
– Lag
– Competing events
– Pattern of events in control group
– Medical exclusions, “healthy worker” effect
4. Plot power curve (power vs. ∆) for fixed N to assess
impact of mis-specification of k (Pe)
5. Monitor parameters which influence sample size during
study; modify sample size if necessary
6. Report parameters used for sample size in trial publication
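Recommendation 4 can be illustrated with a text version of a power curve. The sketch assumes the non-inferiority setting with a continuous outcome, known variance, and the one-sided normal approximation, and reuses the blood pressure example values (δ0 = 4, σ² = 100, N = 132 per group). The function name and the grid of Δ values are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def power_noninferiority(n, delta, delta0=4.0, sigma2=100.0, alpha=0.025):
    """Power to show non-inferiority with n per group and true difference delta."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    se = sqrt(2 * sigma2 / n)  # SE of a difference in means, n per group
    # Probability that the upper CI limit falls below the margin delta0.
    return NormalDist().cdf((delta0 - delta) / se - z_alpha)


# Power vs. true difference for a fixed N of 132 per group.
for delta in (-2, -1, 0, 1, 2, 3, 4):
    print(f"delta = {delta:+d}: power = {power_noninferiority(132, delta):.3f}")
```

Power falls quickly as the true difference drifts toward the margin and reaches α when Δ equals δ0, which is why a mis-specified Δ is costly.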