Stanford

Introduction to choosing the correct statistical test
+ Tests for Continuous Outcomes I
Questions to ask yourself:
1.   What is the outcome (dependent) variable?
2.   Is the outcome variable continuous, binary/categorical, or time-to-event?
3.   What is the unit of observation?
         person* (most common)
         lesion
         half a face
         physician
         clinical center
4.   Are the observations independent or correlated?
         Independent: observations are unrelated (usually different, unrelated
          people)
         Correlated: some observations are related to one another, for example: the
          same person over time (repeated measures), lesions within a person, halves of
          a face, hands within a person, controls who have each been matched to a
          particular case, sibling pairs, husband-wife pairs, mother-infant pairs
              Correlated data example
                 Split-face trial:
                       Researchers assigned 56 subjects to apply
                        SPF 85 sunscreen to one side of their faces
                        and SPF 50 to the other prior to engaging
                        in 5 hours of outdoor sports during mid-
                        day.
                       Sides of the face were randomly assigned;
                        subjects were blinded to SPF strength.
                       Outcome: sunburn

Russak JE et al. JAAD 2010; 62: 348-349.
       Results:
   Table I -- Dermatologist grading of sunburn after an average of 5 hours of
   skiing/snowboarding (P = .03; Fisher’s exact test)


  Sun protection factor    Sunburned    Not sunburned
  85                       1            55
  50                       8            48



Fisher’s exact test compares the following proportions: 1/56 versus
8/56. Note that individuals are being counted twice!
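
As a rough check on the reported P = .03, the two-sided Fisher's exact test for the table above can be sketched with the Python standard library alone. The function name and the convention of summing every table that is no more probable than the observed one are assumptions of this sketch, not details given in the paper. Note that the calculation treats the 112 face-sides as independent, which is exactly the problem described above.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    p_observed = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum every table that is no more probable than the observed one
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_observed * (1 + 1e-9))

# SPF 85 side: 1 burned, 55 not; SPF 50 side: 8 burned, 48 not
p_fisher = fisher_exact_two_sided(1, 55, 8, 48)
print(round(p_fisher, 3))
```

The result should land close to the published .03.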
       Correct analysis of data…
    Table 1. Correct presentation of the data from: Russak JE et
    al. JAAD 2010; 62: 348-349. (P = .016; McNemar’s test).


                          SPF-50 side
  SPF-85 side        Sunburned    Not sunburned
  Sunburned          1            0
  Not sunburned      7            48


McNemar’s test evaluates the probability of the following: in all 7 of the
7 cases where the two sides of the face were discordant (i.e., one side burned
and the other did not), it was the SPF 50 side that sustained the burn.
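
The exact (binomial) form of McNemar's test uses only the discordant pairs, so the P = .016 above can be reproduced in a few lines. This is a minimal sketch, assuming the exact two-sided version of the test; the function name is illustrative.

```python
from math import comb

def mcnemar_exact_two_sided(b, c):
    """Exact (binomial) McNemar p-value from the two discordant-pair counts."""
    n = b + c
    k = min(b, c)
    # Under the null, each discordant pair is a fair coin flip
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)

# 7 subjects burned only on the SPF 50 side, 0 only on the SPF 85 side
p_mcnemar = mcnemar_exact_two_sided(7, 0)
print(p_mcnemar)  # 2 * (1/2)**7 = 0.015625, i.e. about .016
```

The 49 concordant pairs (1 burned on both sides, 48 on neither) do not enter the calculation at all.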
         Overview of common statistical tests

Continuous outcome (e.g. blood pressure, age, pain score)
   Independent observations: t-test; ANOVA; linear correlation; linear regression
   Correlated observations: paired t-test; repeated-measures ANOVA; mixed models/GEE modeling
   Assumptions: outcome is normally distributed (important for small samples); outcome and predictor have a linear relationship.

Binary or categorical outcome (e.g. breast cancer yes/no)
   Independent observations: chi-square test; relative risks; logistic regression
   Correlated observations: McNemar’s test; conditional logistic regression; GEE modeling
   Assumptions: chi-square test assumes sufficient numbers in each cell (>=5).

Time-to-event outcome (e.g. time-to-death, time-to-fracture)
   Independent observations: Kaplan-Meier statistics; Cox regression
   Correlated observations: n/a
   Assumptions: Cox regression assumes proportional hazards between groups.
          Continuous outcome (means)

Outcome variable: continuous (e.g. blood pressure, age, pain score)

Independent observations:
   T-test: compares means between two independent groups
   ANOVA: compares means between more than two independent groups
   Pearson’s correlation coefficient (linear correlation): shows linear correlation between two continuous variables
   Linear regression: multivariate regression technique when the outcome is continuous; gives slopes or adjusted means

Correlated observations:
   Paired t-test: compares means between two related groups (e.g., the same subjects before and after)
   Repeated-measures ANOVA: compares changes over time in the means of two or more groups (repeated measurements)
   Mixed models/GEE modeling: multivariate regression techniques to compare changes over time between two or more groups

Alternatives if the normality assumption is violated (and small n): non-parametric statistics
   Wilcoxon signed-rank test: non-parametric alternative to the paired t-test
   Wilcoxon rank-sum test (= Mann-Whitney U test): non-parametric alternative to the t-test
   Kruskal-Wallis test: non-parametric alternative to ANOVA
   Spearman rank correlation coefficient: non-parametric alternative to Pearson’s correlation coefficient
Example: two-sample t-test
   In 1980, some researchers reported that
    “men have more mathematical ability than
    women” as evidenced by the 1979 SATs,
    where a random sample of 30 male
    adolescents had a mean score ± 1 standard
    deviation of 436±77 and a random sample of
    30 female adolescents scored lower: 416±81
    (genders were similar in educational backgrounds,
    socio-economic status, and age). Do you
    agree with the authors’ conclusions?
Two sample ttest
Statistical question: Is there a difference in SAT
  math scores between men and women?
 What is the outcome variable? Math SAT
  scores
 What type of variable is it? Continuous

 Is it normally distributed? Yes

 Are the observations correlated? No
 Are groups being compared, and if so, how
  many? Yes, two
 Appropriate test: two-sample t-test
Two-sample ttest mechanics…
Data Summary
                   n     Sample mean    Sample standard deviation
Group 1: women     30    416            81
Group 2: men       30    436            77
Two-sample t-test
1. Define your hypotheses (null,
  alternative)
  H0: ♂-♀ math SAT = 0
  Ha: ♂-♀ math SAT ≠ 0 [two-sided]
  Two-sample t-test
  2. Specify your null distribution:
  F and M have approximately equal standard deviations/variances, so make a “pooled” estimate of
      standard deviation/variance:

  \[ s_p \approx \frac{81 + 77}{2} = 79, \qquad s_p^2 = 79^2 \]

The standard error of a difference of two means is:

  \[ \sqrt{\frac{s_p^2}{n} + \frac{s_p^2}{m}} = \sqrt{\frac{79^2}{30} + \frac{79^2}{30}} = 20.4 \]

Differences in means follow a T-distribution…
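
The arithmetic behind the pooled standard deviation and the standard error can be checked with a short sketch. The slide averages the two standard deviations as a shortcut; the degrees-of-freedom-weighted pooled formula used below is the usual textbook version and is an addition here, not something shown on the slide. Variable names are illustrative.

```python
from math import sqrt

sd_women, sd_men = 81, 77   # sample standard deviations from the slide
n_women, n_men = 30, 30     # group sizes

# The slide's shortcut: average the two SDs
sp_shortcut = (sd_women + sd_men) / 2

# Usual pooled SD: weight each group's variance by its degrees of freedom
sp_pooled = sqrt(((n_women - 1) * sd_women**2 + (n_men - 1) * sd_men**2)
                 / (n_women + n_men - 2))

# Standard error of the difference in means
se_diff = sqrt(sp_pooled**2 / n_women + sp_pooled**2 / n_men)

print(sp_shortcut, round(sp_pooled, 2), round(se_diff, 1))
```

With equal group sizes and similar standard deviations, the shortcut and the weighted formula agree to within rounding, and both give a standard error of about 20.4.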
T distribution
   A t-distribution is like a Z distribution,
    except that it has slightly fatter tails to reflect
    the uncertainty added by estimating the
    standard deviation.
   The bigger the sample size (i.e., the
    bigger the sample used to estimate
    σ), the closer t becomes to Z.
   If n>100, t is approximately Z.
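
One way to see the fatter tails is to compare the two-sided area beyond 1.96 under several t-distributions with the same area under Z. The sketch below integrates the t density numerically with Simpson's rule so that it needs only the standard library; the helper names `t_pdf` and `t_two_sided_p` are hypothetical, and a statistics package would normally do this for you.

```python
from math import lgamma, exp, log, sqrt, pi, erfc

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_sided_p(t, df, steps=2000):
    """P(|T| > |t|): integrate the density from 0 to |t| with Simpson's rule."""
    upper = abs(t)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (h / 3) * total

# Two-sided tail area beyond 1.96 under the standard normal (about .05)
z_tail = erfc(1.96 / sqrt(2))

for df in (5, 13, 30, 100):
    print(df, round(t_two_sided_p(1.96, df), 3))
print("Z", round(z_tail, 3))
```

The tail area shrinks toward the Z value as the degrees of freedom grow, which is the sense in which t approaches Z.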
                   Student’s t Distribution
[Figure: density curves of the standard normal (t with df = ∞), t (df = 13), and t (df = 5), centered at 0; horizontal axis: t.]
     Note: t → Z as n increases.
     t-distributions are bell-shaped and symmetric, but have “fatter” tails than the normal.
from “Statistics for Managers Using Microsoft® Excel”, 4th Edition, Prentice-Hall 2004
              Student’s t Table
           Upper Tail Area
    df      .25      .10      .05
     1     1.000    3.078    6.314
     2     0.817    1.886    2.920
     3     0.765    1.638    2.353

   The body of the table contains t values, not probabilities.
   Example: let n = 3, so df = n - 1 = 2; with α = .10, α/2 = .05, and the table gives t = 2.920.
   [Figure: t density with an upper tail area of α/2 = .05 to the right of t = 2.920.]
                   t distribution values
                      With comparison to the Z value
   Confidence level    t (10 d.f.)    t (20 d.f.)    t (30 d.f.)    Z
   .80                 1.372          1.325          1.310          1.28
   .90                 1.812          1.725          1.697          1.64
   .95                 2.228          2.086          2.042          1.96
   .99                 3.169          2.845          2.750          2.58

   Note: t → Z as n increases
  Two-sample t-test
   2. Specify your null distribution:
   F and M have approximately equal standard deviations/variances, so we use the “pooled” estimate of
       the standard deviation, s_p = 79, which gives a standard error of 20.4 for the difference of the two means.

Differences in means follow a T-distribution; here we have a T-distribution with
58 degrees of freedom (60 observations – 2 means)…
Two-sample t-test
3. Observed difference in our experiment = 20
  points
4. Calculate the p-value of what you observed:

  \[ T_{58} = \frac{20 - 0}{20.4} = 0.98, \qquad p = .33 \]

    The critical value for a two-tailed p-value of .05 for T58 is 2.000;
    0.98 < 2.000, so p > .05.

5. Do not reject null! No evidence that men are better
in math ;)
Corresponding confidence
interval…

  \[ 20 \pm 2.00 \times 20.4 = (-20.8,\ 60.8) \]

 Note that the 95% confidence
 interval crosses 0 (the null value).
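
The test statistic, p-value, and confidence interval for the SAT example can be reproduced from the summary numbers alone. This is a sketch under stated assumptions: the t tail area is obtained by numerical integration (Simpson's rule) rather than from a statistics library, the helper names are hypothetical, and the critical value 2.000 is taken from the slide rather than computed.

```python
from math import lgamma, exp, log, pi

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_sided_p(t, df, steps=2000):
    """P(|T| > |t|): integrate the density from 0 to |t| with Simpson's rule."""
    upper = abs(t)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (h / 3) * total

diff = 436 - 416          # observed difference in means (men - women)
se_diff = 20.4            # standard error of the difference, from the slide
df = 30 + 30 - 2          # 58 degrees of freedom

t_stat = (diff - 0) / se_diff
p_value = t_two_sided_p(t_stat, df)

t_crit = 2.000            # two-tailed .05 critical value for T58, from the slide
ci = (diff - t_crit * se_diff, diff + t_crit * se_diff)

print(round(t_stat, 2), round(p_value, 2), [round(x, 1) for x in ci])
```

The p-value should come out close to the slide's .33, and the interval spans 0, which is the same conclusion as failing to reject the null.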
     Review Question 1
     A t-distribution:

a.   Is approximately a normal distribution if
     n>100.
b.   Can be used interchangeably with a normal
     distribution as long as the sample size is large
     enough.
c.   Reflects the uncertainty introduced when using
     the sample, rather than population, standard
     deviation.
d.   All of the above.
Review Question 2
     In a medical student class, the 6 people born on odd days had a mean height
     of 64.64 inches; the 10 people born on even days had a mean height of
     71.15 inches. Height is roughly normally distributed. Which of the
     following best represents the correct statistical test for these data?

a. \( Z = \frac{71.1 - 64.6}{4.5} = \frac{6.5}{4.5} = 1.44;\ p = \text{ns} \)

b. \( Z = \frac{71.1 - 64.6}{4.5/\sqrt{16}} = \frac{6.5}{1.4} = 4.6;\ p < .0001 \)

c. \( T_{14} = \frac{71.1 - 64.6}{\sqrt{\frac{4.7^2}{10} + \frac{4.7^2}{6}}} = \frac{6.5}{2.4} = 2.7;\ p < .05 \)

d. \( T_{14} = \frac{71.1 - 64.6}{4.5} = \frac{6.5}{4.5} = 1.44;\ p = \text{ns} \)
Example: paired ttest
     TABLE 1. Difference between Means of "Before" and "After" Botulinum Toxin A Treatment

                                Before BTxnA   After BTxnA   Difference   Significance
Social skills                   5.90           5.84          NS           .293
Academic performance            5.86           5.78          .08          .068**
Date success                    5.17           5.30          .13          .014*
Occupational success            6.08           5.97          .11          .013*
Attractiveness                  4.94           5.07          .13          .030*
Financial success               5.67           5.61          NS           .230
Relationship success            5.68           5.68          NS           .967
Athletic success                5.15           5.38          .23          .000**

*   Significant at 5% level.
**   Significant at 1% level.
Paired ttest
Statistical question: Is there a difference in date
  success after BoTox?
 What is the outcome variable? Date success

 What type of variable is it? Continuous

 Is it normally distributed? Yes

 Are the observations correlated? Yes, it’s the
  same patients before and after
 How many time points are being compared?
  Two
 Appropriate test: paired t-test
Paired ttest mechanics
1.   Calculate the change in date success score for each
     person.
2.   Calculate the average change in date success for
     the sample. (=.13)
3.   Calculate the standard error of the change in date
     success. (=.05)
4.   Calculate a T-statistic by dividing the mean change
     by the standard error (T=.13/.05=2.6).
5.   Look up the corresponding p-value. (T=2.6
     corresponds to p=.014).
6.   A significant p-value indicates that the average
     change is significantly different from 0.
Paired ttest example 2…
Patient   BP Before (diastolic)   BP After
  1               100               92
  2                89               84
  3                83               80
  4                98               93
  5               108               98
  6                95               90
   Example problem: paired ttest
    Patient   Diastolic BP Before   Diastolic BP After   Change
      1              100                   92               -8
      2               89                   84               -5
      3               83                   80               -3
      4               98                   93               -5
      5              108                   98              -10
      6               95                   90               -5

Null Hypothesis: Average Change = 0
           Example problem: paired ttest
  Change: -8, -5, -3, -5, -10, -5

  \[ \bar{X} = \frac{-8 - 5 - 3 - 5 - 10 - 5}{6} = \frac{-36}{6} = -6 \]

  \[ s_x = \sqrt{\frac{(-8+6)^2 + (-5+6)^2 + (-3+6)^2 + \dots}{5}} = \sqrt{\frac{4 + 1 + 9 + 1 + 16 + 1}{5}} = \sqrt{\frac{32}{5}} = 2.5 \]

  \[ s_{\bar{x}} = \frac{2.5}{\sqrt{6}} = 1.0 \]

  Null Hypothesis: Average Change = 0

  \[ T_5 = \frac{-6 - 0}{1.0} = -6 \]

  With 5 df, |T| > 2.571 corresponds to p < .05 (two-sided test).
  Example problem: paired ttest
  Change: -8, -5, -3, -5, -10, -5

  95% CI: \( -6 \pm 2.571 \times 1.0 = (-8.57,\ -3.43) \)

   Note: does not include 0.
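
The paired t-test mechanics for the blood pressure example can be sketched with the standard library. The critical value 2.571 is taken from the slide; variable names are illustrative. Because the code does not round the standard deviation and standard error along the way, its T statistic and interval differ slightly from the hand calculation, which rounds to 2.5 and 1.0.

```python
from math import sqrt
from statistics import mean, stdev

before = [100, 89, 83, 98, 108, 95]   # diastolic BP before
after = [92, 84, 80, 93, 98, 90]      # diastolic BP after

# Step 1: change for each person
changes = [a - b for a, b in zip(after, before)]
n = len(changes)

mean_change = mean(changes)           # average change
sd_change = stdev(changes)            # sample SD (divides by n - 1)
se_change = sd_change / sqrt(n)       # standard error of the mean change
t_stat = (mean_change - 0) / se_change

t_crit = 2.571                        # two-sided .05 critical value for 5 df, from the slide
ci = (mean_change - t_crit * se_change, mean_change + t_crit * se_change)

print(changes, mean_change, round(sd_change, 2), round(se_change, 2), round(t_stat, 2))
print([round(x, 2) for x in ci])
```

Either way, |T| is well above 2.571 and the interval excludes 0.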
Using our class data…
   Hypothesis: Students who consider
    themselves street smart drink more
    alcohol than students who consider
    themselves book smart.
   Null hypothesis: no difference in alcohol
    drinking between street smart and book
    smart students.
“Non-normal” class
data…alcohol…
Wilcoxon rank-sum test
Statistical question: Is there a difference in alcohol
  drinking between street smart and book smart
  students?
 What is the outcome variable? Weekly alcohol intake
  (drinks/week)
 What type of variable is it? Continuous

 Is it normally distributed? No (and small n)

 Are the observations correlated? No

 Are groups being compared, and if so, how many?
  two
 Appropriate test: Wilcoxon rank-sum test
    Results:
 Book smart: mean = 1.6 drinks/week; median = 1.5
 Street smart: mean = 2.7 drinks/week; median = 3.0
Wilcoxon rank-sum test
mechanics…
   Book smart values (n=13): 0 0 0 0 1 1 2 2 2 3 3 4 5
   Street smart values (n=7): 0 0 2 3 3 5 6
   Combined groups (n=20): 0 0 0 0 0 0 1 1 2 2 2 2 3 3
    3 3 4 5 5 6
   Corresponding ranks: 3.5* 3.5 3.5 3.5 3.5 3.5 7.5 7.5
    10.5 10.5 10.5 10.5 14.5 14.5 14.5 14.5 17 18.5 18.5
    20

*Ties are assigned average ranks; e.g., there are 6 zeros, so the zeros get the average of the ranks
     1 through 6.
Wilcoxon rank-sum test…
   Ranks, book smart: 3.5 3.5 3.5 3.5 7.5 7.5 10.5 10.5 10.5 14.5
    14.5 17 18.5
   Ranks, street smart: 3.5 3.5 10.5 14.5 14.5 18.5 20
   Sum of ranks book smart:
    3.5+3.5+3.5+3.5+7.5+7.5+10.5+10.5+10.5+
    14.5+14.5+17+18.5= 125
   Sum of ranks street smart: 3.5+3.5+10.5+14.5
    +14.5+18.5+20= 85
   The Wilcoxon rank-sum test compares these numbers, accounting for
    the difference in sample size between the two groups.
   Resulting p-value (from computer) = 0.24
   Not significantly different!
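
The ranking step, including average ranks for ties, can be sketched as follows. The helper `average_ranks` is a hypothetical name; the sketch reproduces only the rank sums, not the computer-generated p-value.

```python
def average_ranks(values):
    """Map each distinct value to its average rank (ties share the mean of their ranks)."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # 0-based positions i..j-1 hold ranks i+1..j; their mean is (i + 1 + j) / 2
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks

book_smart = [0, 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 4, 5]
street_smart = [0, 0, 2, 3, 3, 5, 6]

# Rank the combined sample, then add up the ranks within each group
rank_of = average_ranks(book_smart + street_smart)
sum_book = sum(rank_of[v] for v in book_smart)
sum_street = sum(rank_of[v] for v in street_smart)

print(sum_book, sum_street)  # 125.0 85.0
```

The two sums always add up to n(n+1)/2, here 20 × 21 / 2 = 210, which is a quick check on the ranking.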
        Example 2, Wilcoxon rank-sum
        test…
10 dieters following the Atkins diet vs. 10 dieters following
Jenny Craig

Hypothetical RESULTS:
Atkins group loses an average of 34.5 lbs.

J. Craig group loses an average of 18.5 lbs.

Conclusion: Atkins is better?
        Example: non-parametric tests
BUT, take a closer look at the individual data…

Atkins, change in weight (lbs):
      +4, +3, 0, -3, -4, -5, -11, -14, -15, -300

J. Craig, change in weight (lbs)
       -8, -10, -12, -16, -18, -20, -21, -24, -26, -30
[Histogram: Jenny Craig. Horizontal axis: Weight Change, from -30 to 20; vertical axis: Percent, from 0 to 30.]
[Histogram: Atkins. Horizontal axis: Weight Change, from -300 to 20; vertical axis: Percent, from 0 to 30.]
Wilcoxon Rank-Sum test
   RANK the values, 1 being the least weight
    loss and 20 being the most weight loss.
   Atkins
      Change: +4, +3, 0, -3, -4, -5, -11, -14, -15, -300
      Rank:    1,  2, 3,  4,  5,  6,   9,  11,  12,   20
   J. Craig
      Change: -8, -10, -12, -16, -18, -20, -21, -24, -26, -30
      Rank:    7,   8,  10,  13,  14,  15,  16,  17,  18,  19
Wilcoxon Rank-Sum test
 Sum of Atkins ranks:
 1 + 2 + 3 + 4 + 5 + 6 + 9 + 11 + 12 + 20 = 73
 Sum of Jenny Craig’s ranks:
 7 + 8 + 10 + 13 + 14 + 15 + 16 + 17 + 18 + 19 = 137

   Jenny Craig clearly ranked higher!
   P-value (from computer) = .018
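
The diet example can be worked end to end in a short sketch: the means that mislead, the rank sums, and an approximate p-value. The slides do not say which method the computer used for the p-value; the normal approximation with a continuity correction below is an assumption, chosen because it needs only the standard library.

```python
from math import sqrt, erfc
from statistics import mean, median

atkins = [4, 3, 0, -3, -4, -5, -11, -14, -15, -300]
jenny_craig = [-8, -10, -12, -16, -18, -20, -21, -24, -26, -30]

# The single -300 drags the Atkins mean far from its median
print(mean(atkins), median(atkins), mean(jenny_craig), median(jenny_craig))

# Rank 1 = least weight loss (largest change), rank 20 = most; no ties in these data
ordered = sorted(atkins + jenny_craig, reverse=True)
rank_of = {value: position + 1 for position, value in enumerate(ordered)}
sum_atkins = sum(rank_of[v] for v in atkins)
sum_jenny = sum(rank_of[v] for v in jenny_craig)

# Normal approximation to the rank-sum statistic, with a continuity correction
n1, n2 = len(atkins), len(jenny_craig)
expected = n1 * (n1 + n2 + 1) / 2
sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (abs(sum_atkins - expected) - 0.5) / sd
p_approx = erfc(z / sqrt(2))   # two-sided p-value

print(sum_atkins, sum_jenny, round(p_approx, 3))
```

The approximate p-value should land near the reported .018; an exact test or a different correction would give a slightly different figure.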
Review Question 3
When you want to compare mean blood
  pressure between two groups, you should:

a.   Use a ttest
b.   Use a nonparametric test
c.   Use a ttest if blood pressure is normally
     distributed.
d.   Use a two-sample proportions test.
e.   Use a two-sample proportions test only if
     blood pressure is normally distributed.
          Continuous outcome (means)
Outcome variable: continuous (e.g. blood pressure, age, pain score)

Are the observations correlated?

Independent:
   Ttest: compares means between two independent groups
   ANOVA: compares means between more than two independent groups
   Pearson’s correlation coefficient (linear correlation): shows linear correlation between two continuous variables
   Linear regression: multivariate regression technique when the outcome is continuous; gives slopes or adjusted means

Correlated:
   Paired ttest: compares means between two related groups (e.g., the same subjects before and after)
   Repeated-measures ANOVA: compares changes over time in the means of two or more groups (repeated measurements)
   Mixed models/GEE modeling: multivariate regression techniques to compare changes over time between two or more groups

Alternatives if the normality assumption is violated (and small n): non-parametric statistics
   Wilcoxon sign-rank test: non-parametric alternative to paired ttest
   Wilcoxon sum-rank test (=Mann-Whitney U test): non-parametric alternative to the ttest
   Kruskal-Wallis test: non-parametric alternative to ANOVA
   Spearman rank correlation coefficient: non-parametric alternative to Pearson’s correlation coefficient
          DHA and eczema…
                                          P-values from
                                          Wilcoxon sign-
                                          rank tests




Figure 3 from: Koch C, Dölle S, Metzger M, Rasche C, Jungclas H, Rühl R, Renz H, Worm M. Docosahexaenoic
acid (DHA) supplementation in atopic eczema: a randomized, double-blind, controlled trial. Br J Dermatol. 2008
Apr;158(4):786-92. Epub 2008 Jan 30.
Wilcoxon sign-rank test
Statistical question: Did patients improve in SCORAD
  score from baseline to 8 weeks?
 What is the outcome variable? SCORAD

 What type of variable is it? Continuous

 Is it normally distributed? No (and small numbers)

 Are the observations correlated? Yes, it’s the same
  people before and after
 How many time points are being compared? two

  Wilcoxon sign-rank test
Wilcoxon sign-rank test
mechanics…
   1. Calculate the change in SCORAD score for
    each participant.
   2. Rank the absolute values of the changes in
    SCORAD score from smallest to largest.
   3. Add up the ranks from the people who
    improved and, separately, the ranks from the
    people who got worse.
   4. The Wilcoxon sign-rank compares these
    values to determine whether improvements
    significantly exceed declines (or vice versa).
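The first three steps can be sketched in a few lines. The scores below are hypothetical values made up for illustration (they are not the trial data), and the sketch assumes that a lower SCORAD score means improvement.

```python
# Hypothetical SCORAD scores for 8 patients (illustrative only, not the trial data)
baseline = [42, 35, 50, 28, 61, 39, 47, 55]
week8 = [30, 36, 41, 25, 40, 44, 37, 48]

def average_ranks(values):
    """Rank values from smallest (1) to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# Step 1: change in score for each participant (zero changes are dropped)
changes = [after - before for before, after in zip(baseline, week8)]
changes = [c for c in changes if c != 0]

# Step 2: rank the absolute values of the changes, smallest to largest
ranks = average_ranks([abs(c) for c in changes])

# Step 3: sum the ranks separately for those who improved and those who got worse
# (assumption: a negative change, i.e. a lower score, is an improvement)
rank_sum_improved = sum(r for c, r in zip(changes, ranks) if c < 0)
rank_sum_worse = sum(r for c, r in zip(changes, ranks) if c > 0)

print(rank_sum_improved, rank_sum_worse)
```

Step 4 is left to software: it compares the two rank sums against what would be expected if improvements and declines were equally likely.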
            ANOVA example
    Mean micronutrient intake from the school lunch by school
                                S1 (a), n=28   S2 (b), n=25   S3 (c), n=21     P-value (d)
    Calcium (mg)     Mean       117.8          158.7          206.5            <0.001
                     SD         62.4           70.5           86.2
    Iron (mg)        Mean       2.0            2.0            2.0              0.854
                     SD         0.6            0.6            0.6
    Folate (μg)      Mean       26.6           38.7           42.6             <0.001
                     SD         13.1           14.5           15.1
    Zinc (mg)        Mean       1.9            1.5            1.3              0.055
                     SD         1.0            1.2            0.4
(a) School 1 (most deprived; 40% subsidized lunches).
(b) School 2 (medium deprived; <10% subsidized).
(c) School 3 (least deprived; no subsidization, private school).
(d) ANOVA; significant differences are highlighted in bold (P<0.05).

From: Gould R, Russell J, Barker ME. School lunch menus and 11 to 12 year old children's food choice in three secondary schools in England-are the nutritional standards being met? Appetite. 2006 Jan;46(1):86-92.
ANOVA
Statistical question: Does calcium content of
  school lunches differ by school type
  (privileged, average, deprived)?
 What is the outcome variable? Calcium
 What type of variable is it? Continuous

 Is it normally distributed? Yes

 Are the observations correlated? No
 Are groups being compared and, if so, how
  many? Yes, three
  ANOVA
ANOVA
(ANalysis Of VAriance)
   Idea: For two or more groups, test
    difference between means, for normally
    distributed variables.
   Just an extension of the t-test (an
    ANOVA with only two groups is
    mathematically equivalent to a t-test).
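The equivalence with two groups can be checked numerically: the ANOVA F statistic equals the square of the pooled (equal-variance) t statistic. The data below are hypothetical values chosen only for illustration, and the helper names are my own.

```python
from statistics import mean, variance

# Hypothetical outcome values for two independent groups (illustrative only)
group_a = [5.1, 4.8, 6.0, 5.5, 5.9]
group_b = [6.2, 6.8, 5.9, 7.1, 6.5]

def pooled_t(x, y):
    """Two-sample t statistic using a pooled (equal-variance) estimate."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / (pooled_var * (1 / nx + 1 / ny)) ** 0.5

def one_way_f(*groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (n_total - len(groups))
    return ms_between / ms_within

t = pooled_t(group_a, group_b)
f = one_way_f(group_a, group_b)
print(round(t ** 2, 6), round(f, 6))  # the two numbers agree
```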
    One-Way Analysis of Variance

   Assumptions, same as ttest
     Normally distributed outcome

     Equal variances between the groups

     Groups are independent
   Hypotheses of One-Way
   ANOVA

$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots$$

$$H_1: \text{Not all of the population means are the same}$$
ANOVA
   It’s like this: If I have three groups to
    compare:
       I could do three pair-wise ttests, but this
        would increase my type I error
       So, instead I want to look at the pairwise
        differences “all at once.”
       To do this, I can recognize that variance is
        a statistic that lets me look at more than
        one difference at a time…
   The “F-test”
Is the difference in the means of the groups more
than background noise (=variability within groups)?

$$F = \frac{\text{Variability between groups}}{\text{Variability within groups}}$$

   Numerator: summarizes the mean differences between all groups at once.
   Denominator: analogous to pooled variance from a ttest.
     The F-distribution
    A ratio of variances follows an F-distribution:

$$\frac{\sigma^2_{between}}{\sigma^2_{within}} \sim F_{n,m}$$

The F-test tests the hypothesis that two variances
are equal.
F will be close to 1 if sample variances are equal.

$$H_0: \sigma^2_{between} = \sigma^2_{within}$$

$$H_a: \sigma^2_{between} \neq \sigma^2_{within}$$
ANOVA example 2

   Randomize 33 subjects to three groups:
    800 mg calcium supplement vs. 1500
    mg calcium supplement vs. placebo.
   Compare the spine bone density of all 3
    groups after 1 year.
[Figure: Spine bone density vs. treatment. y-axis: spine bone density (0.7 to 1.2); x-axis: placebo, 800 mg calcium, 1500 mg calcium. Annotations mark the between-group variation and the within-group variability of each group.]
Group means and standard
deviations
   Placebo group (n=11):
       Mean spine BMD = .92 g/cm2
       standard deviation = .10 g/cm2
   800 mg calcium supplement group (n=11)
       Mean spine BMD = .94 g/cm2
       standard deviation = .08 g/cm2
   1500 mg calcium supplement group (n=11)
       Mean spine BMD =1.06 g/cm2
       standard deviation = .11 g/cm2
             The F-Test

$$s^2_{between} = n s^2_{\bar{x}} = 11 \times \left( \frac{(.92 - .97)^2 + (.94 - .97)^2 + (1.06 - .97)^2}{3 - 1} \right) = .063$$

   Between-group variation: 11 is the size of the groups; each squared term is the difference of each group’s mean from the overall mean.

$$s^2_{within} = \text{avg } s^2 = \frac{1}{3} (.10^2 + .08^2 + .11^2) = .0095$$

   The average amount of variation within groups; each squared term is a group’s variance.

$$F_{2,30} = \frac{s^2_{between}}{s^2_{within}} = \frac{.063}{.0095} = 6.6$$

   Large F value indicates that the between group variation exceeds the within group variation (=the background noise).
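The F statistic for the calcium example can be recomputed from the group means and standard deviations alone. This is a sketch that relies on the groups being the same size (n = 11 each); it uses the unrounded overall mean, so intermediate values may differ slightly from the slide, which rounds the overall mean to .97.

```python
# Group summaries from the calcium example (n = 11 per group)
n = 11
means = [0.92, 0.94, 1.06]  # placebo, 800 mg, 1500 mg (g/cm2)
sds = [0.10, 0.08, 0.11]

k = len(means)
grand_mean = sum(means) / k  # equals the overall mean because group sizes are equal

# Between-group variance: group size times the variance of the group means
var_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)

# Within-group variance: the average of the group variances (equal n)
var_within = sum(s ** 2 for s in sds) / k

f_stat = var_between / var_within
df_between, df_within = k - 1, k * (n - 1)

print(round(var_between, 3), round(var_within, 4), round(f_stat, 1))
print(df_between, df_within)
```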
Review Question 4
Which of the following is an assumption of
   ANOVA?

a. The outcome variable is normally
   distributed.
b. The variance of the outcome variable is the
   same in all groups.
c. The groups are independent.
d. All of the above.
e. None of the above.
Answer: d. All of the above.
ANOVA summary
   A statistically significant ANOVA (F-test)
    only tells you that at least two of the
    groups differ, but not which ones differ.

   Determining which groups differ (when
    it’s unclear) requires more sophisticated
    analyses to correct for the problem of
    multiple comparisons…
Question: Why not just do 3
pairwise ttests?

   Answer: because, at an error rate of 5% for each test,
    you have an overall chance of up to 1 - (.95)^3 = 14%
    of making a type-I error (if all 3 comparisons were
    independent).
    If you wanted to compare 6 groups, you’d have to
    do 15 pairwise ttests, which would give you a high
    chance of finding something significant just by
    chance.
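The arithmetic behind this answer can be sketched as follows, assuming (as the slide does) that the pairwise tests are independent. The function name is my own.

```python
from math import comb

def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

for groups in (3, 6):
    n_tests = comb(groups, 2)  # number of pairwise comparisons among the groups
    print(groups, n_tests, round(familywise_error(n_tests), 2))
```

With 6 groups and 15 pairwise tests, the chance of at least one false positive is a little over one half.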
Multiple comparisons
Correction for
multiple comparisons
How to correct for multiple comparisons
  post-hoc…
• Bonferroni correction (adjusts p by the most
  conservative amount: multiply each p by the
  number of tests or, equivalently, divide the
  alpha cut-off by the number of tests)
• Tukey (adjusts p)

• Scheffé (adjusts p)
           1. Bonferroni
For example, to make a Bonferroni correction, divide your desired alpha cut-off
level (usually .05) by the number of comparisons you are making. The correction
holds whether or not the comparisons are independent, but it is often too
conservative, especially when the comparisons are correlated.

     Obtained P-value   Original Alpha   # tests   New Alpha    Significant?
          .001               .05           5         .010           Yes
          .011               .05           4         .013           Yes
          .019               .05           3         .017           No
          .032               .05           2         .025           No
          .048               .05           1         .050           Yes
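The table rows can be checked with a few lines of code. This is a minimal sketch that treats each row as its own example with its own number of tests.

```python
# (obtained p-value, number of tests) for each row of the table
rows = [(0.001, 5), (0.011, 4), (0.019, 3), (0.032, 2), (0.048, 1)]
original_alpha = 0.05

results = []
for p_value, n_tests in rows:
    new_alpha = original_alpha / n_tests  # Bonferroni-adjusted cut-off
    results.append((p_value, n_tests, new_alpha, p_value < new_alpha))

for p_value, n_tests, new_alpha, significant in results:
    print(f"{p_value:.3f}  {n_tests}  {new_alpha:.4f}  {'Yes' if significant else 'No'}")
```

For 4 tests the adjusted cut-off is .05 / 4 = .0125, which the table shows rounded to .013.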
2/3. Tukey and Scheffé
   Both methods increase your p-values to
    account for the fact that you’ve done
    multiple comparisons, but are less
    conservative than Bonferroni (let
    computer calculate for you!).
     Review Question 5
     I am doing an RCT of 4 treatment regimens for blood
     pressure. At the end of the day, I compare blood
     pressures in the 4 groups using ANOVA. My p-value is
     .03. I conclude:


a. All of the treatment regimens differ.
b. I need to use a Bonferroni correction.
c. One treatment is better than all the rest.
d. At least one treatment is different from the
   others.
e. In pairwise comparisons, no treatment will be
Answer: d. At least one treatment is different from the others.
Non-parametric ANOVA
(Kruskal-Wallis test)
Statistical question: Do nevi counts differ by training
  velocity (slow, medium, fast) group in marathon
  runners?
 What is the outcome variable? Nevi count

 What type of variable is it? Continuous

 Is it normally distributed? No (and small sample size)

 Are the observations correlated? No

 Are groups being compared and, if so, how many?
  Yes, three
  non-parametric ANOVA
                Example: Nevi counts and
                marathon runners




Richtig et al. Melanoma Markers in Marathon Runners: Increase with Sun Exposure and Physical Strain. Dermatology 2008;217:38-44.
Non-parametric ANOVA
Kruskal-Wallis one-way ANOVA
  (just an extension of the Wilcoxon Sum-Rank test for
  2 groups; based on ranks)
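The rank-based idea can be sketched by computing the Kruskal-Wallis H statistic by hand. The nevi counts below are hypothetical values made up for illustration (they are not from the study) and contain no ties, so the sketch omits the tie correction that statistical software applies. The p-value uses the usual chi-square approximation, which is rough for groups this small.

```python
import math

# Hypothetical nevi counts for three training-velocity groups (illustrative only, no ties)
slow = [12, 18, 25, 31, 40]
medium = [10, 15, 22, 27, 35]
fast = [3, 5, 8, 9, 14]

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic, assuming no tied values (no tie correction)."""
    pooled = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # valid only when there are no ties
    n_total = len(pooled)
    # Sum over groups of (rank sum squared) / (group size)
    total = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

h = kruskal_wallis_h(slow, medium, fast)

# With 3 groups, H is referred to a chi-square distribution with 2 df,
# whose upper-tail probability is exactly exp(-H / 2).
p_approx = math.exp(-h / 2)

print(round(h, 2), round(p_approx, 3))
```

With only two groups, the same ranking step underlies the Wilcoxon sum-rank test.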
                Example: Nevi counts and
                marathon runners

   By non-parametric ANOVA, the groups differ significantly in nevi count (p<.05) overall.
   By Wilcoxon sum-rank test (adjusted for multiple comparisons), the lowest velocity group differs significantly from the highest velocity group (p<.05).
     Review Question 6
     I want to compare depression scores between three
     groups, but I’m not sure if depression is normally
     distributed. What should I do?


a.   Don’t worry about it—run an ANOVA anyway.
b.   Test depression for normality.
c.   Use a Kruskal-Wallis (non-parametric) ANOVA.
d.   Nothing, I can’t do anything with these data.
e.   Run 3 nonparametric ttests.
Answer: b. Test depression for normality.
     Review Question 7
     If depression score turns out to be very non-normal,
     then what should I do?


a.   Don’t worry about it—run an ANOVA anyway.
b.   Test depression for normality.
c.   Use a Kruskal-Wallis (non-parametric) ANOVA.
d.   Nothing, I can’t do anything with these data.
e.   Run 3 nonparametric ttests.
Answer: c. Use a Kruskal-Wallis (non-parametric) ANOVA.
     Review Question 8
     I measure blood pressure in a cohort of elderly men
     yearly for 3 years. To test whether or not their blood
     pressure changed over time, I compare the mean blood
     pressures in each time period using a one-way ANOVA.
     This strategy is:

a.   Correct. I have three means, so I have to use ANOVA.
b.   Wrong. Blood pressure is unlikely to be normally distributed.
c.   Wrong. The variance in BP is likely to greatly differ at the three
     time points.
d.   Correct. It would also be OK to use three ttests.
e.   Wrong. The samples are not independent.
Answer: e. Wrong. The samples are not independent.

				