# Stanford by mikeholy

```
Introduction to choosing the correct statistical test

Tests for Continuous Outcomes I
1.   What is the outcome (dependent) variable?
2.   Is the outcome variable continuous, binary/categorical, or time-to-event?
3.   What is the unit of observation?
   • person (most common)
   • lesion
   • half a face
   • physician
   • clinical center
4.   Are the observations independent or correlated?
   • Independent: observations are unrelated (usually different, unrelated people)
   • Correlated: some observations are related to one another, for example: the same person over time (repeated measures), lesions within a person, halves of a face, hands within a person, controls who have each been matched to a particular case, sibling pairs, husband-wife pairs, mother-infant pairs
Correlated data example
   • Split-face trial:
   • Researchers assigned 56 subjects to apply SPF 85 sunscreen to one side of their faces and SPF 50 to the other prior to engaging in 5 hours of outdoor sports during midday.
   • Sides of the face were randomly assigned; subjects were blinded to SPF strength.
   • Outcome: sunburn

Russak JE et al. JAAD 2010; 62: 348-349.
Results:
Table I -- Dermatologist grading of sunburn after an average of 5 hours of skiing/snowboarding (P = .03; Fisher’s exact test)

Sun protection factor    Sunburned    Not sunburned
85                           1             55
50                           8             48

Fisher’s exact test compares the following proportions: 1/56 versus 8/56. Note that individuals are being counted twice: each subject contributes one observation per side of the face, so the 112 observations are not independent.
Correct analysis of data…
Table 1. Correct presentation of the data from: Russak JE et al. JAAD 2010; 62: 348-349. (P = .016; McNemar’s test).

                          SPF-50 side
SPF-85 side          Sunburned    Not sunburned
Sunburned                1              0
Not sunburned            7             48

McNemar’s test evaluates the probability of the following under the null hypothesis: in all 7 of the 7 cases where the sides of the face were discordant (i.e., one side burned and the other side did not), the SPF 50 side sustained the burn.
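
# Sketch (Python, standard library only): the exact McNemar p-value for the
# split-face table above. Under the null hypothesis each discordant pair is
# equally likely to burn on either side, so the count of one kind of
# discordant pair is Binomial(7, 0.5). The function name and the use of the
# exact (binomial) form of McNemar's test are assumptions made for
# illustration; the slide itself reports only the p-value.
from math import comb

def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the two discordant-pair counts."""
    n = b + c
    k = min(b, c)
    # One-tailed probability of a split at least as lopsided as the observed one.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# b: burned on the SPF-85 side only (0 subjects)
# c: burned on the SPF-50 side only (7 subjects)
p_value = mcnemar_exact_p(0, 7)
print(round(p_value, 3))  # 0.016, in line with the P = .016 quoted above
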
Overview of common statistical tests

Outcome variable         Are the observations correlated?                             Assumptions
                         independent               correlated
Continuous               Ttest                     Paired ttest                       Outcome is normally distributed
(e.g. blood pressure,    ANOVA                     Repeated-measures ANOVA            (important for small samples).
age, pain score)         Linear correlation        Mixed models/GEE modeling          Outcome and predictor have a
                         Linear regression                                            linear relationship.

Binary or categorical    Chi-square test           McNemar’s test                     Chi-square test assumes sufficient
(e.g. breast cancer      Relative risks            Conditional logistic regression    numbers in each cell (>=5)
yes/no)                  Logistic regression       GEE modeling

Time-to-event            Kaplan-Meier statistics   n/a                                Cox regression assumes
(e.g. time-to-death,     Cox regression                                               proportional hazards between
time-to-fracture)                                                                     groups
Continuous outcome (means)
Outcome variable: Continuous (e.g. blood pressure, age, pain score)
Are the observations correlated?

Independent:
   • Ttest: compares means between two independent groups
   • ANOVA: compares means between more than two independent groups
   • Pearson’s correlation coefficient (linear correlation): shows linear correlation between two continuous variables
   • Linear regression: multivariate regression technique when the outcome is continuous

Correlated:
   • Paired ttest: compares means between two related groups (e.g., the same subjects before and after)
   • Repeated-measures ANOVA: compares changes over time in the means of two or more groups (repeated measurements)
   • Mixed models/GEE modeling: multivariate regression techniques to compare changes over time between two or more groups

Alternatives if the normality assumption is violated (and small n): non-parametric statistics
   • Wilcoxon signed-rank test: non-parametric alternative to paired ttest
   • Wilcoxon rank-sum test (=Mann-Whitney U test): non-parametric alternative to the ttest
   • Kruskal-Wallis test: non-parametric alternative to ANOVA
   • Spearman rank correlation coefficient: non-parametric alternative to Pearson’s correlation coefficient
Example: two-sample t-test
   • In 1980, some researchers reported that "men have more mathematical ability than women" as evidenced by the 1979 SATs, where a sample of 30 random males had a mean ± standard deviation of 436±77 and 30 random females had a mean ± standard deviation of 416±81 (the groups were similar in educational backgrounds, socio-economic status, and age). Do you agree with the authors’ conclusions?
Two sample ttest
Statistical question: Is there a difference in SAT math scores between men and women?
   • What is the outcome variable? Math SAT scores
   • What type of variable is it? Continuous
   • Is it normally distributed? Yes
   • Are the observations correlated? No
   • Are groups being compared, and if so, how many? Yes, two
   → two-sample ttest
Two-sample ttest mechanics…
Data Summary
                    n    Sample mean    Sample standard deviation
Group 1: women     30        416                  81
Group 2: men       30        436                  77
Two-sample t-test
1. Define your hypotheses (null, alternative)
H0: ♂-♀ math SAT = 0
Ha: ♂-♀ math SAT ≠ 0 [two-sided]
Two-sample t-test
F and M have approximately equal standard deviations/variances, so make a “pooled” estimate of standard deviation/variance:

   s_p ≈ (81 + 77)/2 = 79,  so  s_p² = 79²

The standard error of a difference of two means is:

   √(s_p²/n + s_p²/m) = √(79²/30 + 79²/30) = 20.4

Differences in means follow a T-distribution…
T distribution
   • A t-distribution is like a Z distribution, except that it has slightly fatter tails to reflect the uncertainty added by estimating the standard deviation.
   • The bigger the sample size (i.e., the bigger the sample size used to estimate σ), the closer t becomes to Z.
   • If n>100, t approaches Z.
Student’s t Distribution
[Figure: density curves centered at 0 for the standard normal (t with df = ∞), t (df = 13), and t (df = 5). t-distributions are bell-shaped and symmetric, but have "fatter" tails than the normal. Note: t → Z as n increases.]
from “Statistics for Managers” Using Microsoft® Excel 4th Edition, Prentice-Hall 2004
Student’s t Table

            Upper Tail Area
df       .25      .10      .05
1       1.000    3.078    6.314
2       0.817    1.886    2.920
3       0.765    1.638    2.353

The body of the table contains t values, not probabilities.
Let: n = 3, so df = n - 1 = 2. With α = .10, α/2 = .05 in each tail, and the upper critical value is t = 2.920.
from “Statistics for Managers” Using Microsoft® Excel 4th Edition, Prentice-Hall 2004
t distribution values
With comparison to the Z value

Confidence Level    t (10 d.f.)    t (20 d.f.)    t (30 d.f.)      Z
.80                   1.372          1.325          1.310        1.28
.90                   1.812          1.725          1.697        1.64
.95                   2.228          2.086          2.042        1.96
.99                   3.169          2.845          2.750        2.58

Note: t → Z as n increases

from “Statistics for Managers” Using Microsoft® Excel 4th Edition, Prentice-Hall 2004
Two-sample t-test
Differences in means follow a T-distribution; here we have a T-distribution with
58 degrees of freedom (60 observations – 2 means)…
Two-sample t-test
3. Observed difference in our experiment = 20 points
Two-sample t-test
4. Calculate the p-value of what you observed

   T58 = (20 − 0)/20.4 = .98

   Critical value for two-tailed p-value of .05 for T58 = 2.000
   0.98 < 2.000, so p > .05
   p = .33

5. Do not reject null! No evidence that men are better in math ;)
Corresponding confidence interval…

   20 ± 2.00 * 20.4 = (−20.8, 60.8)

Note that the 95% confidence interval crosses 0 (the null value).
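
# Sketch (Python, standard library only) of the two-sample t-test arithmetic
# above, computed from the summary statistics on the Data Summary slide.
# Variable names are illustrative assumptions. The pooled variance here is the
# degrees-of-freedom-weighted average of the two sample variances; with equal
# group sizes this lands very close to the slide's shortcut of averaging the
# two standard deviations (79).
from math import sqrt

n_women, mean_women, sd_women = 30, 416, 81
n_men, mean_men, sd_men = 30, 436, 77

df = n_women + n_men - 2
pooled_var = ((n_women - 1) * sd_women**2 + (n_men - 1) * sd_men**2) / df
std_err = sqrt(pooled_var / n_women + pooled_var / n_men)

diff = mean_men - mean_women
t_stat = diff / std_err

# Two-tailed .05 critical value for 58 df, copied from the slide (not computed).
t_crit = 2.000
ci_low = diff - t_crit * std_err
ci_high = diff + t_crit * std_err

print(f"df={df}, SE={std_err:.1f}, T={t_stat:.2f}")
print(f"95% CI: ({ci_low:.1f}, {ci_high:.1f})")
print("reject H0" if abs(t_stat) > t_crit else "do not reject H0")
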
Review Question 1
A t-distribution:

a.   Is approximately a normal distribution if
n>100.
b.   Can be used interchangeably with a normal
distribution as long as the sample size is large
enough.
c.   Reflects the uncertainty introduced when using
the sample, rather than population, standard
deviation.
d.   All of the above.
Review Question 2
In a medical student class, the 6 people born on odd days had a mean height of 64.64 inches; the 10 people born on even days had a mean height of 71.15 inches. Height is roughly normally distributed. Which of the following best represents the correct statistical test for these data?

a.   Z = (71.1 − 64.6)/4.5 = 6.5/4.5 = 1.44; p = ns

b.   Z = (71.1 − 64.6)/(4.5/√16) = 6.5/1.4 = 4.6; p < .0001

c.   T14 = (71.1 − 64.6)/√(4.7²/10 + 4.7²/6) = 6.5/2.4 = 2.7; p < .05

d.   T14 = (71.1 − 64.6)/4.5 = 6.5/4.5 = 1.44; p = ns
Example: paired ttest
TABLE 1. Difference between Means of "Before" and "After" Botulinum Toxin A Treatment

Before BTxnA   After BTxnA   Difference   Significance

Social skills                   5.90           5.84          NS           .293
Academic performance            5.86           5.78          .08          .068**

Date success                    5.17           5.30          .13          .014*
Occupational success            6.08           5.97          .11          .013*
Attractiveness                  4.94           5.07          .13          .030*
Financial success               5.67           5.61          NS           .230
Relationship success            5.68           5.68          NS           .967
Athletic success                5.15           5.38          .23          .000**

*   Significant at 5% level.
**   Significant at 1% level.
Paired ttest
Statistical question: Is there a difference in date success after BoTox?
   • What is the outcome variable? Date success
   • What type of variable is it? Continuous
   • Is it normally distributed? Yes
   • Are the observations correlated? Yes, it’s the same patients before and after
   • How many time points are being compared? Two
   → paired ttest
Paired ttest mechanics
1.   Calculate the change in date success score for each person.
2.   Calculate the average change in date success for the sample. (=.13)
3.   Calculate the standard error of the change in date success. (=.05)
4.   Calculate a T-statistic by dividing the mean change by the standard error (T=.13/.05=2.6).
5.   Look up the corresponding p-value. (T=2.6 corresponds to p=.014).
6.   Significant p-values indicate that the average change is significantly different from 0.
Paired ttest example 2…
Patient   BP Before (diastolic)   BP After

1               100               92

2                89               84

3                83               80

4                98               93

5               108               98

6                95               90
Example problem: paired ttest
Patient   Diastolic BP Before   D. BP After   Change

1              100                92          -8

2               89                84          -5

3               83                80          -3

4               98                93          -5

5              108                98         -10

6               95                90          -5

Null Hypothesis: Average Change = 0
Example problem: paired ttest
Null Hypothesis: Average Change = 0

   Mean change:  X̄ = (−8 − 5 − 3 − 5 − 10 − 5)/6 = −36/6 = −6

   Standard deviation of the changes:
   s_x = √[((−8 + 6)² + (−5 + 6)² + (−3 + 6)² + ...)/5] = √[(4 + 1 + 9 + 1 + 16 + 1)/5] = √(32/5) = 2.5

   Standard error of the mean change:  2.5/√6 = 1.0

   T5 = (−6 − 0)/1.0 = −6

   With 5 df, |T| > 2.571 corresponds to p < .05 (two-sided test)
Example problem: paired ttest

   95% CI: −6 ± 2.571 * (1.0) = (−8.57, −3.43)

Note: does not include 0.
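
# Sketch (Python, standard library only) of the paired ttest steps for the
# diastolic blood pressure example above. Variable names are illustrative
# assumptions. The slide rounds the standard deviation to 2.5 and the standard
# error to 1.0 before dividing, so the unrounded T and confidence limits
# computed here differ slightly from the slide's T = −6 and (−8.57, −3.43).
from math import sqrt
from statistics import mean, stdev

before = [100, 89, 83, 98, 108, 95]
after = [92, 84, 80, 93, 98, 90]

# Step 1: one change score per patient (after minus before).
changes = [a - b for a, b in zip(after, before)]

# Steps 2-3: mean change and its standard error (stdev uses n - 1).
mean_change = mean(changes)
std_err = stdev(changes) / sqrt(len(changes))

# Step 4: T statistic with n - 1 = 5 degrees of freedom.
t_stat = mean_change / std_err

# Two-sided .05 critical value for 5 df, copied from the slide (not computed).
t_crit = 2.571
ci_low = mean_change - t_crit * std_err
ci_high = mean_change + t_crit * std_err

print(changes)
print(f"mean={mean_change}, SE={std_err:.2f}, T={t_stat:.2f}")
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
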
Using our class data…
   • Hypothesis: Students who consider themselves street smart drink more alcohol than students who consider themselves book smart.
   • Null hypothesis: no difference in alcohol drinking between street smart and book smart students.
"Non-normal" class data…alcohol…
Wilcoxon rank-sum test
Statistical question: Is there a difference in alcohol drinking between street smart and book smart students?
   • What is the outcome variable? Weekly alcohol intake (drinks/week)
   • What type of variable is it? Continuous
   • Is it normally distributed? No (and small n)
   • Are the observations correlated? No
   • Are groups being compared, and if so, how many? Two
   → Wilcoxon rank-sum test
Results:
Book smart: mean = 1.6 drinks/week; median = 1.5
Street smart: mean = 2.7 drinks/week; median = 3.0
Wilcoxon rank-sum test mechanics…
   • Book smart values (n=13): 0 0 0 0 1 1 2 2 2 3 3 4 5
   • Street smart values (n=7): 0 0 2 3 3 5 6
   • Combined groups (n=20): 0 0 0 0 0 0 1 1 2 2 2 2 3 3 3 3 4 5 5 6
   • Corresponding ranks: 3.5* 3.5 3.5 3.5 3.5 3.5 7.5 7.5 10.5 10.5 10.5 10.5 14.5 14.5 14.5 14.5 17 18.5 18.5 20

*ties are assigned average ranks; e.g., there are 6 zeros, so zeros get the average of the ranks 1 through 6.
Wilcoxon rank-sum test…
   • Ranks, book smart: 3.5 3.5 3.5 3.5 7.5 7.5 10.5 10.5 10.5 14.5 14.5 17 18.5
   • Ranks, street smart: 3.5 3.5 10.5 14.5 14.5 18.5 20
   • Sum of ranks book smart: 3.5+3.5+3.5+3.5+7.5+7.5+10.5+10.5+10.5+14.5+14.5+17+18.5 = 125
   • Sum of ranks street smart: 3.5+3.5+10.5+14.5+14.5+18.5+20 = 85
   • Wilcoxon rank-sum test compares these numbers, accounting for the differences in sample size in the two groups.
   • Resulting p-value (from computer) = 0.24
   • Not significantly different!
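
# Sketch (Python, standard library only) of the ranking step above: pool the
# two groups, give tied values the average of the ranks they occupy, then sum
# the ranks within each group. The helper name average_ranks is an assumption
# for illustration. The p-value itself is not computed here; the slide takes
# it from statistical software.
def average_ranks(values):
    """Map each distinct value to the average of the 1-based ranks it occupies."""
    positions = {}
    for rank, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(rank)
    return {value: sum(ranks) / len(ranks) for value, ranks in positions.items()}

book_smart = [0, 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 4, 5]
street_smart = [0, 0, 2, 3, 3, 5, 6]

rank_of = average_ranks(book_smart + street_smart)
sum_book = sum(rank_of[v] for v in book_smart)
sum_street = sum(rank_of[v] for v in street_smart)

print(rank_of[0])            # 3.5, the average of ranks 1 through 6
print(sum_book, sum_street)  # 125.0 85.0
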
Example 2, Wilcoxon rank-sum test…
10 dieters following the Atkins diet vs. 10 dieters following Jenny Craig

Hypothetical RESULTS:
Atkins group loses an average of 34.5 lbs.

J. Craig group loses an average of 18.5 lbs.

Conclusion: Atkins is better?
Example: non-parametric tests
BUT, take a closer look at the individual data…

Atkins, change in weight (lbs):
+4, +3, 0, -3, -4, -5, -11, -14, -15, -300

J. Craig, change in weight (lbs):
-8, -10, -12, -16, -18, -20, -21, -24, -26, -30
[Figure: histogram of weight change (lbs) in the Jenny Craig group; x-axis: Weight Change, from -30 to 20; y-axis: Percent, from 0 to 30.]
[Figure: histogram of weight change (lbs) in the Atkins group; x-axis: Weight Change, from -300 to 20; y-axis: Percent, from 0 to 30.]
Wilcoxon Rank-Sum test
   • RANK the values, 1 being the least weight loss and 20 being the most weight loss.
   • Atkins
        change (lbs):  +4,  +3,   0,  -3,  -4,  -5, -11, -14, -15, -300
        rank:           1,   2,   3,   4,   5,   6,   9,  11,  12,   20
   • J. Craig
        change (lbs):  -8, -10, -12, -16, -18, -20, -21, -24, -26,  -30
        rank:           7,   8,  10,  13,  14,  15,  16,  17,  18,   19
Wilcoxon Rank-Sum test
   • Sum of Atkins ranks: 1 + 2 + 3 + 4 + 5 + 6 + 9 + 11 + 12 + 20 = 73
   • Sum of Jenny Craig’s ranks: 7 + 8 + 10 + 13 + 14 + 15 + 16 + 17 + 18 + 19 = 137
   • Jenny Craig clearly ranked higher!
   • P-value (from computer) = .018
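
# Sketch (Python, standard library only) contrasting the means with the
# medians and rank sums for the hypothetical diet data above. It shows how the
# single -300 lb value drives the Atkins mean while barely affecting the
# ranks. Variable names are illustrative assumptions; the p-value is not
# computed here.
from statistics import mean, median

atkins = [4, 3, 0, -3, -4, -5, -11, -14, -15, -300]
jenny_craig = [-8, -10, -12, -16, -18, -20, -21, -24, -26, -30]

# Rank 1 = least weight loss (largest change), rank 20 = most weight loss.
# There are no tied values in these data, so plain positions suffice.
ordered = sorted(atkins + jenny_craig, reverse=True)
rank_of = {value: rank for rank, value in enumerate(ordered, start=1)}

sum_atkins = sum(rank_of[v] for v in atkins)
sum_jenny = sum(rank_of[v] for v in jenny_craig)

print(mean(atkins), mean(jenny_craig))      # -34.5 -18.5
print(median(atkins), median(jenny_craig))  # -4.5 -19.0
print(sum_atkins, sum_jenny)                # 73 137
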
Review Question 3
When you want to compare mean blood
pressure between two groups, you should:

a.   Use a ttest
b.   Use a nonparametric test
c.   Use a ttest if blood pressure is normally
distributed.
d.   Use a two-sample proportions test.
e.   Use a two-sample proportions test only if
blood pressure is normally distributed.
DHA and eczema…
[Figure: P-values from Wilcoxon signed-rank tests.]

Figure 3 from: Koch C, Dölle S, Metzger M, Rasche C, Jungclas H, Rühl R, Renz H, Worm M. Docosahexaenoic acid (DHA) supplementation in atopic eczema: a randomized, double-blind, controlled trial. Br J Dermatol. 2008 Apr;158(4):786-92. Epub 2008 Jan 30.
Wilcoxon signed-rank test
Statistical question: Did patients improve in SCORAD score from baseline to 8 weeks?
   • What is the outcome variable? SCORAD
   • What type of variable is it? Continuous
   • Is it normally distributed? No (and small numbers)
   • Are the observations correlated? Yes, it’s the same people before and after
   • How many time points are being compared? Two
   → Wilcoxon signed-rank test
Wilcoxon signed-rank test mechanics…
   1. Calculate the change in SCORAD score for each participant.
   2. Rank the absolute values of the changes in SCORAD score from smallest to largest.
   3. Add up the ranks from the people who improved and, separately, the ranks from the people who got worse.
   4. The Wilcoxon signed-rank test compares these values to determine whether improvements significantly exceed declines (or vice versa).
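
# Sketch (Python, standard library only) of steps 1-3 above. The SCORAD
# changes are not given in these slides, so this reuses the diastolic blood
# pressure changes from the paired ttest example purely as stand-in data. The
# function name and the convention of dropping zero changes are assumptions.
# The final comparison (step 4, the p-value) is left to statistical software.
def signed_rank_sums(changes):
    """Return (sum of ranks for positive changes, sum for negative changes)."""
    nonzero = [c for c in changes if c != 0]
    # Rank the absolute changes; ties share the average of their ranks.
    positions = {}
    for rank, magnitude in enumerate(sorted(abs(c) for c in nonzero), start=1):
        positions.setdefault(magnitude, []).append(rank)
    avg_rank = {m: sum(r) / len(r) for m, r in positions.items()}
    rank_sum_pos = sum(avg_rank[abs(c)] for c in nonzero if c > 0)
    rank_sum_neg = sum(avg_rank[abs(c)] for c in nonzero if c < 0)
    return rank_sum_pos, rank_sum_neg

bp_changes = [-8, -5, -3, -5, -10, -5]  # stand-in data: after minus before
rank_sum_pos, rank_sum_neg = signed_rank_sums(bp_changes)
print(rank_sum_pos, rank_sum_neg)  # 0 21.0: every patient's pressure fell
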
ANOVA example
Mean micronutrient intake from the school lunch by school

                            S1 (a), n=28    S2 (b), n=25    S3 (c), n=21    P-value (d)
Calcium (mg)     Mean          117.8           158.7           206.5           0.000
                 SD (e)         62.4            70.5            86.2
Iron (mg)        Mean            2.0             2.0             2.0           0.854
                 SD              0.6             0.6             0.6
Folate (μg)      Mean           26.6            38.7            42.6           0.000
                 SD             13.1            14.5            15.1
Zinc (mg)        Mean            1.9             1.5             1.3           0.055
                 SD              1.0             1.2             0.4

a School 1 (most deprived; 40% subsidized lunches).
b School 2 (medium deprived; <10% subsidized).
c School 3 (least deprived; no subsidization, private school).
d ANOVA; significant differences are highlighted in bold (P<0.05).

FROM: Gould R, Russell J, Barker ME. School lunch menus and 11 to 12 year old children's food choice in three secondary schools in England-are the nutritional standards being met? Appetite. 2006 Jan;46(1):86-92.
ANOVA
Statistical question: Does calcium content of school lunches differ by school type (privileged, average, deprived)?
   • What is the outcome variable? Calcium
   • What type of variable is it? Continuous
   • Is it normally distributed? Yes
   • Are the observations correlated? No
   • Are groups being compared and, if so, how many? Yes, three
   → ANOVA
ANOVA
(ANalysis Of VAriance)
   • Idea: For two or more groups, test difference between means, for normally distributed variables.
   • Just an extension of the t-test (an ANOVA with only two groups is mathematically equivalent to a t-test).
One-Way Analysis of Variance
   • Assumptions, same as ttest
      • Normally distributed outcome
      • Equal variances between the groups
      • Groups are independent
Hypotheses of One-Way ANOVA

H_0: \mu_1 = \mu_2 = \mu_3 = \cdots

H_1: Not all of the population means are the same
ANOVA
• It’s like this: if I have three groups to compare:
   • I could do three pairwise t-tests, but this would increase my type I error.
   • So, instead, I want to look at the pairwise differences “all at once.”
   • To do this, I can recognize that variance is a statistic that lets me look at more than one difference at a time…
The “F-test”
Is the difference in the means of the groups more than background noise (= variability within groups)?

F = (variability between groups) / (variability within groups)

• Numerator: summarizes the mean differences between all groups at once.
• Denominator: analogous to pooled variance from a t-test.
The F-distribution
• A ratio of variances follows an F-distribution:

  \frac{\sigma^2_{between}}{\sigma^2_{within}} \sim F_{n,m}

• The F-test tests the hypothesis that two variances are equal.
• F will be close to 1 if sample variances are equal.

  H_0: \sigma^2_{between} = \sigma^2_{within}
  H_a: \sigma^2_{between} \neq \sigma^2_{within}
ANOVA example 2

   Randomize 33 subjects to three groups:
800 mg calcium supplement vs. 1500
mg calcium supplement vs. placebo.
   Compare the spine bone density of all 3
groups after 1 year.
[Figure: Spine bone density vs. treatment. Y-axis: spine bone density, 0.7 to 1.2; x-axis: PLACEBO, 800 mg CALCIUM, 1500 mg CALCIUM. Labels mark the within-group variability of each group and the between-group variation.]
Group means and standard deviations
• Placebo group (n=11):
   • Mean spine BMD = .92 g/cm²
   • Standard deviation = .10 g/cm²
• 800 mg calcium supplement group (n=11):
   • Mean spine BMD = .94 g/cm²
   • Standard deviation = .08 g/cm²
• 1500 mg calcium supplement group (n=11):
   • Mean spine BMD = 1.06 g/cm²
   • Standard deviation = .11 g/cm²
The F-Test

s^2_{between} = n s^2_{\bar{x}} = 11 \times \frac{(.92 - .97)^2 + (.94 - .97)^2 + (1.06 - .97)^2}{3 - 1} = .063

• s^2_{between}: the between-group variation.
• n = 11: the size of the groups.
• Each squared term: the difference of each group’s mean from the overall mean.

s^2_{within} = \text{avg } s^2 = \frac{1}{3} (.10^2 + .08^2 + .11^2) = .0095

• s^2_{within}: the average amount of variation within groups.
• Each squared term: each group’s variance.

F_{2,30} = \frac{s^2_{between}}{s^2_{within}} = \frac{.063}{.0095} = 6.6

A large F value indicates that the between-group variation exceeds the within-group variation (= the background noise).
Review Question 4
Which of the following is an assumption of
ANOVA?

a. The outcome variable is normally
distributed.
b. The variance of the outcome variable is the
same in all groups.
c. The groups are independent.
d. All of the above.
e. None of the above.
Answer: d. All of the above.
ANOVA summary
   A statistically significant ANOVA (F-test)
only tells you that at least two of the
groups differ, but not which ones differ.

   Determining which groups differ (when
it’s unclear) requires more sophisticated
analyses to correct for the problem of
multiple comparisons…
Question: Why not just do 3 pairwise t-tests?

• Answer: because, at an error rate of 5% per test, you have an overall chance of up to 1 - (.95)^3 = 14% of making a type I error (if all 3 comparisons were independent).
• If you wanted to compare 6 groups, you’d have to do 15 pairwise t-tests, which would give you a high chance of finding something significant just by chance.
Multiple comparisons
Correction for
multiple comparisons
How to correct for multiple comparisons post-hoc…
• Bonferroni correction (adjusts by the most conservative amount; assuming all tests are independent, divide the alpha cut-off by the number of tests)

1. Bonferroni
To make a Bonferroni correction, divide your desired alpha cut-off level (usually .05) by the number of comparisons you are making. This treats the comparisons as completely independent, which is usually too conservative.

Obtained P-value   Original alpha   # tests   New alpha   Significant?
.001               .05              5         .010        Yes
.011               .05              4         .013        Yes
.019               .05              3         .017        No
.032               .05              2         .025        No
.048               .05              1         .050        Yes
2/3. Tukey and Scheffé
• Both methods increase your p-values to account for the fact that you’ve done multiple comparisons, but are less conservative than Bonferroni (let the computer calculate for you!).
Review Question 5
I am doing an RCT of 4 treatment regimens for blood
pressure. At the end of the day, I compare blood
pressures in the 4 groups using ANOVA. My p-value is
.03. I conclude:

a. All of the treatment regimens differ.
b. I need to use a Bonferroni correction.
c. One treatment is better than all the rest.
d. At least one treatment is different from the
others.
e. In pairwise comparisons, no treatment will be
Answer: d. At least one treatment is different from the others.
Continuous outcome (means)

Outcome variable: continuous (e.g. blood pressure, age, pain score)

Are the observations correlated?

Independent:
• T-test: compares means between two independent groups
• ANOVA: compares means between more than two independent groups
• Pearson’s correlation coefficient (linear correlation): shows linear correlation between two continuous variables
• Linear regression: multivariate regression technique when the outcome is continuous;

Correlated:
• Paired t-test: compares means between two related groups (e.g., the same subjects before and after)
• Repeated-measures ANOVA: compares changes over time in the means of two or more groups (repeated measurements)
• Mixed models/GEE modeling: multivariate regression techniques to compare changes over time between two or more groups

Alternatives if the normality assumption is violated (and small n): non-parametric statistics
• Wilcoxon signed-rank test: non-parametric alternative to the paired t-test
• Wilcoxon rank-sum test (= Mann-Whitney U test): non-parametric alternative to the t-test
• Kruskal-Wallis test: non-parametric alternative to ANOVA
• Spearman rank correlation coefficient: non-parametric alternative to Pearson’s correlation coefficient
Non-parametric ANOVA (Kruskal-Wallis test)
Statistical question: Do nevi counts differ by training velocity (slow, medium, fast) group in marathon runners?
• What is the outcome variable? Nevi count
• What type of variable is it? Continuous
• Is it normally distributed? No (and small sample size)
• Are the observations correlated? No
• Are groups being compared and, if so, how many? Yes, three
→ non-parametric ANOVA
Example: Nevi counts and
marathon runners

Richtig et al. Melanoma Markers in Marathon Runners: Increase with Sun Exposure and Physical Strain. Dermatology 2008;217:38-44.
Non-parametric ANOVA
Kruskal-Wallis one-way ANOVA
(an extension of the two-group Wilcoxon rank-sum test to three or more groups; based on ranks)
Example: Nevi counts and
marathon runners

By non-parametric ANOVA, the groups
differ significantly in nevi count
(p<.05) overall.
for multiple comparisons), the lowest
velocity group differs significantly
from the highest velocity group
(p<.05)

Review Question 6
I want to compare depression scores between three
groups, but I’m not sure if depression is normally
distributed. What should I do?

a.   Don’t worry about it—run an ANOVA anyway.
b.   Test depression for normality.
c.   Use a Kruskal-Wallis (non-parametric) ANOVA.
d.   Nothing, I can’t do anything with these data.
e.   Run 3 nonparametric ttests.
Answer: b. Test depression for normality.
Review Question 7
If depression score turns out to be very non-normal,
then what should I do?

a.   Don’t worry about it—run an ANOVA anyway.
b.   Test depression for normality.
c.   Use a Kruskal-Wallis (non-parametric) ANOVA.
d.   Nothing, I can’t do anything with these data.
e.   Run 3 nonparametric ttests.
Answer: c. Use a Kruskal-Wallis (non-parametric) ANOVA.
Review Question 8
I measure blood pressure in a cohort of elderly men
yearly for 3 years. To test whether or not their blood
pressure changed over time, I compare the mean blood
pressures in each time period using a one-way ANOVA.
This strategy is:

a.   Correct. I have three means, so I have to use ANOVA.
b.   Wrong. Blood pressure is unlikely to be normally distributed.
c.   Wrong. The variance in BP is likely to greatly differ at the three
time points.
d.   Correct. It would also be OK to use three ttests.
e.   Wrong. The samples are not independent.
Answer: e. Wrong. The samples are not independent.

```
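As a rough check on the school-lunch table (ANOVA example), the calcium F statistic can be recomputed from the reported group sizes, means, and standard deviations alone. This is a minimal sketch, not the authors' analysis: the variable names are made up for illustration, and the p-value line uses a closed form for the F upper tail that only holds when the numerator has 2 degrees of freedom (exactly three groups).

```python
# Summary statistics copied from the calcium row of the school-lunch table.
n = [28, 25, 21]                # pupils per school (S1, S2, S3)
mean = [117.8, 158.7, 206.5]    # mean calcium intake (mg)
sd = [62.4, 70.5, 86.2]         # standard deviations (mg)

k = len(n)
N = sum(n)
grand_mean = sum(ni * mi for ni, mi in zip(n, mean)) / N

# Between-group mean square: size-weighted spread of the group means
# around the grand mean, divided by k - 1.
ms_between = sum(ni * (mi - grand_mean) ** 2 for ni, mi in zip(n, mean)) / (k - 1)

# Within-group mean square: pooled variance across the three schools.
df_within = N - k
ms_within = sum((ni - 1) * si ** 2 for ni, si in zip(n, sd)) / df_within

f_stat = ms_between / ms_within

# Assumption: closed-form upper tail of F, valid only for numerator df = 2.
p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)

print(f"F({k - 1},{df_within}) = {f_stat:.2f}, p = {p_value:.4f}")
```

The F statistic comes out near 9 on 2 and 71 degrees of freedom with p below 0.001, which is consistent with the 0.000 reported in the table.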
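The claim that an ANOVA with only two groups is mathematically equivalent to a t-test can be illustrated numerically: the F statistic equals the square of the pooled-variance t statistic. The sketch below uses only the standard library; the two samples are hypothetical values invented for illustration, and `one_way_f` and `pooled_t` are helper names introduced here, not part of the slides.

```python
from statistics import mean, variance


def one_way_f(groups):
    """F statistic for a one-way ANOVA on raw data (a list of lists)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def pooled_t(a, b):
    """Two-sample t statistic using the pooled (equal-variance) estimate."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5


# Hypothetical measurements, invented for illustration only.
group_a = [0.91, 0.88, 1.02, 0.95, 0.84]
group_b = [1.05, 0.99, 1.12, 1.08, 0.97, 1.01]

f_stat = one_way_f([group_a, group_b])
t_stat = pooled_t(group_a, group_b)
print(f"F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
```

The two printed numbers agree up to floating-point error. The equivalence holds for the equal-variance t-test, which shares ANOVA's equal-variance assumption.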
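The F-test arithmetic for the bone-density example (ANOVA example 2) can be reproduced in a few lines. This sketch follows the slide's equal-group-size shortcut: the between-group variance is n times the variance of the group means, and the within-group variance is the plain average of the group variances. One small difference is assumed here: the overall mean is not rounded to .97, so intermediate values differ slightly from the slide before rounding.

```python
from statistics import mean, variance

n = 11                              # subjects per group
group_means = [0.92, 0.94, 1.06]    # placebo, 800 mg, 1500 mg (g/cm2)
group_sds = [0.10, 0.08, 0.11]      # standard deviations (g/cm2)
k = len(group_means)

# Between-group variance: group size times the sample variance of the means.
s2_between = n * variance(group_means)

# Within-group variance: average of the group variances
# (a plain average is only appropriate because the groups are equal in size).
s2_within = mean([sd ** 2 for sd in group_sds])

f_stat = s2_between / s2_within
df_between, df_within = k - 1, k * (n - 1)

print(f"s2_between = {s2_between:.3f}, s2_within = {s2_within:.4f}")
print(f"F({df_between},{df_within}) = {f_stat:.1f}")
```

With unequal group sizes the shortcut no longer applies; the group means and variances would need to be weighted by group size instead.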
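The argument against running all pairwise t-tests rests on two small calculations: how many pairs there are, and how quickly the chance of at least one type I error grows. A minimal sketch, assuming independent tests as the slide does (`familywise_error` is a name introduced here for illustration):

```python
from math import comb


def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


for n_groups in (3, 6):
    n_pairs = comb(n_groups, 2)     # number of distinct pairwise comparisons
    rate = familywise_error(n_pairs)
    print(f"{n_groups} groups: {n_pairs} pairwise tests, "
          f"familywise error up to {rate:.0%}")
```

For 3 groups this gives the 14% quoted on the slide; for 6 groups the 15 comparisons push the figure above 50%.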
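The Bonferroni table can be reproduced by dividing the original alpha by the number of tests in each row and comparing the obtained p-value to that cut-off. The sketch below takes the rows directly from the table; the function name is introduced here for illustration.

```python
def bonferroni_alpha(alpha, n_tests):
    """Bonferroni-adjusted significance cut-off."""
    return alpha / n_tests


# (obtained p-value, number of tests), copied from the Bonferroni table.
rows = [(0.001, 5), (0.011, 4), (0.019, 3), (0.032, 2), (0.048, 1)]

results = []
for p, n_tests in rows:
    cutoff = bonferroni_alpha(0.05, n_tests)
    significant = p < cutoff
    results.append(significant)
    print(f"p = {p:.3f}, tests = {n_tests}, new alpha = {cutoff:.3f}, "
          f"significant: {'Yes' if significant else 'No'}")
```

An equivalent approach is to multiply each p-value by the number of tests and compare the result to the original alpha.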
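To show what "based on ranks" means for the Kruskal-Wallis test, here is a minimal pure-Python sketch of the H statistic. The nevi counts are hypothetical values invented for illustration, not data from Richtig et al. The tie correction is omitted, and the p-value uses the chi-square approximation with 2 degrees of freedom (whose upper tail is exp(-H/2)), which is only rough for groups this small. In practice a library routine such as `scipy.stats.kruskal` would normally be used.

```python
from math import exp


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1       # average of 1-based positions i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no correction for ties)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical nevi counts by training velocity, invented for illustration.
slow = [12, 15, 9, 20, 14]
medium = [22, 18, 25, 30, 17]
fast = [35, 28, 40, 33, 45]

h_stat = kruskal_wallis_h([slow, medium, fast])
p_value = exp(-h_stat / 2)          # chi-square upper tail, valid for df = 2 only
print(f"H = {h_stat:.2f}, approximate p = {p_value:.4f}")
```

Because only the ranks enter the calculation, extreme values have no more influence than their position in the ordering, which is why the test does not rely on normality.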