# Equivalence testing

Tests of Equivalence
   As has been mentioned, the typical method of
NHST applied to looking for differences between
groups does not technically allow us to conclude
equivalence just because we do not reject the null
   p is a measure of evidence against the null, not for it
   A small sample makes it easy to retain the null, whatever the true difference
   Often this conclusion is reached anyway
   Stated differently, absence of evidence does not
imply evidence of absence
   Altman & Bland, 1995
   Examples of usage:
   generic drug vs established drug
   efficacy of counselling therapies
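
The point about small samples can be made concrete with a short sketch. The numbers below are hypothetical (they are not from the slides): the same 5-point gap between means is tested at two sample sizes, with critical values taken from a standard t table.

```python
import math

def t_statistic(m1, s1, n1, m2, s2, n2):
    """t for a difference between two independent means (unpooled standard error)."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (m1 - m2) / se

# Hypothetical data: identical means and SDs, only the group size changes.
# Two-tailed .05 critical values come from a t table with df = n1 + n2 - 2.
t_small = t_statistic(80, 5, 3, 75, 5, 3)        # df = 4,   tcv = 2.776
t_large = t_statistic(80, 5, 100, 75, 5, 100)    # df = 198, tcv = 1.972

print("n = 3 per group:  ", round(t_small, 2), "reject null:", abs(t_small) >= 2.776)
print("n = 100 per group:", round(t_large, 2), "reject null:", abs(t_large) >= 1.972)
```

With three cases per group the null is retained even though the observed gap is a full standard deviation, which is why non-rejection alone says little about equivalence.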
Tests of Equivalence

   To conclude there is a substantial
difference you must observe a
difference large enough to conclude it
is not due to sampling error
   To conclude there is not a substantial
difference you must observe a
difference small enough to conclude that the
closeness is not due to sampling error
Tests of Equivalence
Two one-sided tests (TOST)
   One method is to test the joint null
hypothesis that our mean difference is
at least as large as the upper bound of the
specified range of equivalence, or at or below
its lower bound
   H01: μ1 - μ2 ≥ δ OR
   H02: μ1 - μ2 ≤ -δ
   By rejecting both of these hypotheses, we
can conclude that | μ1 - μ2| < δ, or that our
difference falls within the range specified
TOST
Tests of Equivalence

   Specify a range? Isn’t that subjective?

   Base it on:
   Previous research
   Practical considerations
   Your knowledge of the scale of
measurement
TOST

   See if the difference between means is
significantly different from the specified
allowable difference
   Must reject two null hypotheses
   H01: μ1 - μ2 ≥ δ
   H02: μ1 - μ2 ≤ -δ
Example
   Scores from the midterms of two sections of
a stat class
   First specify the range of equivalence δ
   Say, any score within 3 points of another (δ = 3)

   Section 1: M = 75, s = 3.2, N = 20
   Section 2: M = 76, s = 2.4, N = 20
Example

   H01: μ1 - μ2 ≥ 3
   H02: μ1 - μ2 ≤ -3

   By rejecting H01 we conclude the
difference is less than 3
   By rejecting H02 we conclude the
difference is greater than -3
Fuzzy yet?
   Recall that the size of the difference we are
looking for is 3 units.
   This would hold whether the first mean was 3
above the second mean or vice versa
   Hence we are looking for a difference μ1 – μ2
that lies in the interval (-3, 3)
[Figure: top panel shows the traditional null used to search for a significant difference; bottom panel shows the two-null approach for equivalence]
Worked out
$$t_1 = \frac{(76 - 75) - 3}{\sqrt{\frac{3.2^2}{20} + \frac{2.4^2}{20}}} = \frac{-2}{.894} = -2.24$$

$$t_2 = \frac{(76 - 75) - (-3)}{\sqrt{\frac{3.2^2}{20} + \frac{2.4^2}{20}}} = \frac{4}{.894} = 4.47$$

   H01 is rejected if t1 ≤ -tcv, and H02 is rejected if t2 ≥ tcv
   Df = 20+20-2 = 38, so the one-tailed tcv is about 1.69
   Here we reject in both cases (.05 level) and
conclude statistical equivalence
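
The two tests above can be reproduced from the summary statistics alone. The following is a minimal sketch, not a library routine: the function name `tost_summary` is made up for illustration, and the critical value 1.686 (one-tailed, .05, df = 38) is read from a t table rather than computed.

```python
import math

def tost_summary(mean_diff, s1, n1, s2, n2, delta, t_cv):
    """Two one-sided tests from summary statistics.

    t1 tests H01 (difference >= delta); t2 tests H02 (difference <= -delta).
    Equivalence is claimed only if both nulls are rejected.
    """
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # standard error of the difference
    t1 = (mean_diff - delta) / se
    t2 = (mean_diff + delta) / se
    equivalent = (t1 <= -t_cv) and (t2 >= t_cv)
    return t1, t2, equivalent

T_CV_38 = 1.686  # assumption: tabled one-tailed .05 critical t for df = 38

# Midterm example: observed difference 76 - 75, delta = 3
t1, t2, equivalent = tost_summary(76 - 75, 3.2, 20, 2.4, 20, delta=3, t_cv=T_CV_38)
print(f"t1 = {t1:.2f}, t2 = {t2:.2f}, equivalent = {equivalent}")
# prints: t1 = -2.24, t2 = 4.47, equivalent = True
```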
Another way to look at it
   H0: -3 ≤ μ1-μ2 ≤ 3

   In this formulation we reject if either the
lower bound of a CI on the mean difference
exceeds the upper value in the null
hypothesis, or our upper bound of the CI for
the mean difference is lower than the lower
value of the null hypothesis
   In other words, we reject the notion of
equivalence if our CI for the difference
between means falls entirely outside the H0 range.
The CI Approach
   So another (and perhaps easier) method is to
specify a range of values that would constitute
equivalency among groups
   -δ to δ
   Determine the appropriate confidence interval
for the mean difference between the groups
(a 100(1 - 2α)% interval, e.g. 90% for α = .05, corresponds to TOST)
   See if the CI for the difference falls
entirely within the range of equivalency
   If either the lower or upper end falls beyond it, do not
claim equivalence
   This is equivalent to the TOST outcome
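
As a rough sketch of the CI approach on the midterm example: the code below assumes a 90% interval (the usual match for two one-sided tests at α = .05) and a tabled critical value of 1.686 for df = 38. Both choices are assumptions made for this illustration, since the slide only says "appropriate" interval.

```python
import math

def ci_within_equivalence(mean_diff, se, t_cv, delta):
    """Return the CI for the mean difference and whether it lies inside (-delta, delta)."""
    lower = mean_diff - t_cv * se
    upper = mean_diff + t_cv * se
    inside = (-delta < lower) and (upper < delta)
    return lower, upper, inside

# Standard error of the difference for Section 1 (s = 3.2) and Section 2 (s = 2.4), N = 20 each
se = math.sqrt(3.2**2 / 20 + 2.4**2 / 20)

# Assumed: 90% CI, tcv = 1.686 (df = 38), delta = 3
lower, upper, inside = ci_within_equivalence(76 - 75, se, t_cv=1.686, delta=3)
print(f"CI = ({lower:.2f}, {upper:.2f}), inside equivalence range: {inside}")
```

The interval runs from roughly -0.5 to 2.5, entirely inside (-3, 3), so the conclusion agrees with the two one-sided tests.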
Using Inferential Confidence
Intervals
   Decide on a ranged estimate that reflects your
estimation of equivalence (δ)
   In other words, if my ranged estimate is smaller than this, I
will conclude equivalence
   Establish inferential CIs for each variable’s mean
   Create a new range that includes the lower bound
from the smaller mean, and the upper bound from
the larger mean
   Represents the maximum probable difference
   See if this CI range (Rg) is smaller than the
specified maximum amount of difference allowed to
still claim equivalence (δ)
Equivalence Testing
Previous example
   Scores from the midterms of two sections of a stat
class
   First specify the range of equivalence δ
   Say, any score within 3 points of another (δ = 3)

   Section 1: M = 75, s = 3.2, N = 20
   Section 2: M = 76, s = 2.4, N = 20

   ICI95 Section 1 = 73.95 to 76.06
   ICI95 Section 2 = 75.21 to 76.79
   Rg = 76.79 - 73.95 = 2.84
Example

   The range observed by our ICIs (2.84) is not
larger than the equivalence range (δ = 3)
   Conclude the two classes scored
similarly.
Another Example
   Anxiety measures are taken from two groups of
clients who’d been exposed to different types of
therapies (A & B)
   We’ll say the scale goes from 0 to 100
   First establish your range of equivalence
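$$E = \frac{\sqrt{s_{Y_1}^2 + s_{Y_2}^2}}{s_{Y_1} + s_{Y_2}}$$

where $s_{Y_1}$ and $s_{Y_2}$ are the standard errors of the two group means

$$\bar{X}_A = 40,\quad s = 9.29,\quad n = 12$$
$$\bar{X}_B = 47,\quad s = 11.03,\quad n = 12$$

$$ICI_i = \bar{X}_i \pm t_{95}\, E\, s_{Y_i} = \bar{X}_i \pm t_{95}\, \frac{\sqrt{s_{Y_1}^2 + s_{Y_2}^2}}{s_{Y_1} + s_{Y_2}}\, s_{Y_i}$$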
Results
$$s_A = 2.68, \qquad s_B = 3.18$$

$$E = \frac{\sqrt{2.68^2 + 3.18^2}}{2.68 + 3.18} = \frac{4.16}{5.86} = .710$$

$$t_{95}(11) = 2.20 \text{ for both groups}$$

$$ICI_A = 40 \pm 2.20(.71)(2.68) = 40 \pm 4.19 = 35.81 \text{ to } 44.19$$
$$ICI_B = 47 \pm 2.20(.71)(3.18) = 47 \pm 4.97 = 42.03 \text{ to } 51.97$$

$$\text{Range} = 35.81 \text{ to } 51.97 = 16.16$$

   Equivalent?
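
The arithmetic on this slide can be checked with a few lines of code. This is a sketch under stated assumptions: the function name `inferential_cis` is invented for illustration, and the critical value 2.201 (two-tailed .05, df = 11) is taken from a t table. Because the code carries full precision, intermediate values may differ from the slide in the last decimal place.

```python
import math

def inferential_cis(m1, s1, n1, m2, s2, n2, t_cv1, t_cv2):
    """Inferential confidence intervals for two means.

    Each ordinary CI half-width (t * standard error) is shrunk by the factor
    E = sqrt(se1^2 + se2^2) / (se1 + se2).
    """
    se1 = s1 / math.sqrt(n1)
    se2 = s2 / math.sqrt(n2)
    e = math.sqrt(se1**2 + se2**2) / (se1 + se2)
    ici1 = (m1 - t_cv1 * e * se1, m1 + t_cv1 * e * se1)
    ici2 = (m2 - t_cv2 * e * se2, m2 + t_cv2 * e * se2)
    return e, ici1, ici2

# Therapy A vs therapy B; tcv = 2.201 is the tabled value for df = 11
e, ici_a, ici_b = inferential_cis(40, 9.29, 12, 47, 11.03, 12, 2.201, 2.201)

# Range from the lower bound of the smaller mean to the upper bound of the larger mean
rg = ici_b[1] - ici_a[0]
print(f"E = {e:.3f}")
print(f"ICI A = {ici_a[0]:.2f} to {ici_a[1]:.2f}")
print(f"ICI B = {ici_b[0]:.2f} to {ici_b[1]:.2f}")
print(f"Rg = {rg:.2f}")
```

Whether Rg counts as equivalent depends on the range of equivalence chosen beforehand for this scale.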
Which method?
   Tryon’s proposal using ICIs is perhaps
preferable in that:
   NHST is implicit rather than explicit
   Retains respective group information
   Covers both tests of difference and
equivalence
   Provides for a third outcome
   Statistical indeterminacy
   Say what??
Indeterminacy

   Neither statistically different nor equivalent
   Or perhaps both
   Judgment must be suspended as there is
no evidence for or against any hypothesis
   May help in warding off interpretation of
‘marginally significant’ findings as trends
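
One way to picture the three outcomes is a small decision rule over two inferential CIs. The sketch below assumes the usual reading of the ICI approach: non-overlapping ICIs indicate a statistical difference, and a range Rg no larger than δ indicates equivalence. Labelling the "both" case as indeterminate follows the remark on this slide and is an interpretive choice, as is the value δ = 10 used for the anxiety example, which the slides never specify.

```python
def classify(ici_a, ici_b, delta):
    """Return 'different', 'equivalent', or 'indeterminate' for two ICIs (low, high)."""
    low_a, high_a = ici_a
    low_b, high_b = ici_b
    overlap = low_a <= high_b and low_b <= high_a
    rg = max(high_a, high_b) - min(low_a, low_b)   # maximum probable difference
    different = not overlap
    equivalent = rg <= delta
    if different and not equivalent:
        return "different"
    if equivalent and not different:
        return "equivalent"
    return "indeterminate"   # neither, or (per the slide) both

# Midterm example, ICIs as reported on the slide, delta = 3
print(classify((73.95, 76.06), (75.21, 76.79), delta=3))

# Anxiety example, ICIs as reported on the slide, hypothetical delta = 10
print(classify((35.81, 44.19), (42.03, 51.97), delta=10))
```

Under these assumptions the midterm sections come out equivalent, while the two therapies are neither different nor equivalent, so judgment is suspended.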
Figure from Jones et al (BMJ 1996) showing relationship
between equivalence and confidence intervals
Note on sample size
   It was mentioned how we couldn’t conclude
equivalence from a difference test because
small samples could easily be used to show
nonsignificance
   Power is not necessarily the same for tests
of equivalence and difference
   However the idea is the same, in that with
larger samples we will be more likely to
conclude equivalence when the groups truly are equivalent
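
To see how sample size drives the equivalence conclusion, the sketch below holds the midterm example's means and standard deviations fixed and searches for the smallest per-group n at which the interval for the difference fits inside (-δ, δ). It is a simplification: it uses a normal approximation to the critical value instead of the t distribution, assumes equal group sizes, and treats the observed difference as if it would not change with n.

```python
import math
from statistics import NormalDist

def smallest_n_for_equivalence(mean_diff, s1, s2, delta, alpha=0.05, n_max=1000):
    """Smallest equal group size at which the 100(1 - 2*alpha)% CI lies inside (-delta, delta).

    Uses a normal critical value as an approximation; returns None if no n up to n_max works.
    """
    z = NormalDist().inv_cdf(1 - alpha)
    for n in range(2, n_max + 1):
        se = math.sqrt(s1**2 / n + s2**2 / n)
        if -delta < mean_diff - z * se and mean_diff + z * se < delta:
            return n
    return None

# Midterm example: difference of 1 point, s = 3.2 and 2.4, delta = 3
n_needed = smallest_n_for_equivalence(1, 3.2, 2.4, delta=3)
print("smallest n per group:", n_needed)

# If the observed difference sits on the boundary, no sample size is enough
print("with delta = 1:", smallest_n_for_equivalence(1, 3.2, 2.4, delta=1))
```

Using t critical values instead of the normal approximation would push the required n slightly higher.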
Summary

   Confidence intervals are an important
component of statistical analysis and should
always be reported
   Non-significance on a test of difference does
not allow us to assume equivalence
   Methods exist to test group equivalency,
and should be implemented whenever that
is the true goal of the research question
