# Anova

Posted: 2/8/2012

```
ANOVA: Analysis of Variance

Xuhua Xia
xxia@uottawa.ca
http://dambe.bio.uottawa.ca
Ronald A. Fisher (1890-1962)
Head of the Statistics Division at the Rothamsted Experimental
Station in Hertfordshire. One of the three founders of theoretical
population genetics. Developer of statistical methods, especially
likelihood methods. Published The Genetical Theory of Natural
Selection in 1930, in which he proposed the fundamental theorem of
natural selection.

“To call in a statistician after the experiment is done may be no more
than asking him to perform a postmortem examination; he may be able
to say what the experiment died of.”

Analysis of Variance (ANOVA)
• ANOVA was mainly developed by Ronald A. Fisher
• The F statistic was named after him.
• The essence of ANOVA is to partition the total variation
into its components.
• Assumptions
– Normality
– Equal variance among treatment groups
• Alternative methods

One-way ANOVA Model

x_{ij} = \mu + \alpha_i + \varepsilon_{ij}   vs.   x_{ij} = \mu + \varepsilon_{ij}

Is the effect \alpha_i zero?
This is the same model as for the t-test, except that the subscript i is
1 and 2 in the t-test, but 1, 2, ..., n in one-way ANOVA.

t-test and ANOVA
Data
Male    Female
193     175
188     173
185     168
183     165
180     163
178
170

t-test
                Male        Female
n               7           5
Mean            182.4286    168.8
SS              329.7143    104.8
Pooled Var      43.45143
PooledSE        3.859745
t               3.530951
df              10
P               0.0054
Equal Var.?     P = 0.4939

ANOVA summary
Groups    Count    Sum     Average     Variance
Male      7        1277    182.4286    54.95238
Female    5        844     168.8       26.2

ANOVA
Source            SS          df    MS          F           P-value
Between Groups    541.7357    1     541.7357    12.46762    0.005438
Within Groups     434.5143    10    43.45143
Total             976.25      11

Variance and Sum of Squares

\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}

s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}

Numerator of s^2: Sum of Squared Deviations
Denominator of s^2: Degree of Freedom

Partition of Variance

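[Figure: partition of variance. The group means X1, X2 and X3 lie around the grand mean X. Between-group deviation: group mean minus grand mean. Within-group deviation: observation minus its group mean.]

Numerical Illustration of One-Way ANOVA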

                Treatment
                Low-fat food             Medium-fat food    High-fat food
Weight gain     0    [1]                 4    [5]           8    [9]
                2    [1]                 6    [5]           10   [9]
Mean            1                        5                  9
SSB             2(1-5)^2 = 32            2(5-5)^2 = 0       2(9-5)^2 = 32
SSW             (0-1)^2 + (2-1)^2 = 2    2                  2

(Bracketed values: the numbers in red on the slide.)

Grand Mean = (0 + 2 + 4 + ... + 10) / 6 = 5
SST = (0-5)^2 + (2-5)^2 + ... + (10-5)^2 = 70, with df = 5
SSB = 32 + 0 + 32 = 64, with df = 2
SSW = 2 + 2 + 2 = 6, with df = 3
MSB = 64/2 = 32
MSW = 6/3 = 2
F = MSB/MSW = 16, DFnum = 2, DFdenom = 3, p = 0.0251

Exercise: Now repeat the ANOVA computation [...] the numbers in red. Email me SSB, SSW, DFnum, and DFdenom.

ANOVA Table

Dependent variable: Weight Gain

Source     DF    SS      MS    F   p

Model      2     64.0 32.0 16.0    0.0251

Error      3       6.0   2.0

Total      5     70.0

Empirical F distribution
Mean1    s1^2    Mean2    s2^2    s1^2/s2^2
2.4      3       2.6      3       1
3.0      3       2.9      2       1.5
...      ...     ...      ...     1.4
...      ...     ...      ...     1.6
...      ...     ...      ...     0.6
...

[Figure: F-distribution. x-axis: F (0 to 3.5); y-axis: f (0 to 0.8).]

One-way experimental design

          Low-fat food    Medium-fat food    High-fat food
Weight    0               4                  8
gain      2               6                  10

The null hypothesis H0: X1 = X2 = X3 is rejected. The three kinds of food
differ significantly in their effect on weight gain of rabbits. In particular,
Medium-fat and High-fat foods are significantly better than Low-fat food.
However, Medium-fat and High-fat foods do not differ in their effect on
rabbit weight gain.

Assumptions
Sample 1    Sample 2
75          82
76          80
80          85
77          85
80          78
77          87
73          82
77          82
            200    (extra observation)

              Without the value 200              With the value 200
              Sample 1   Sample 2   Subtotal     Sample 1   Sample 2    Subtotal
n             8          8                       8          9
Mean          76.875     82.625                  76.875     95.667
Var           5.554      8.554                   5.554      1538.250
GrandMean     79.750                             86.824
SST           231.000                            13840.471
SSB           66.125     66.125     132.250      791.786    703.810     1495.596
SSW           38.875     59.875     98.750       38.875     12306.000   12344.875
dfT           15                                 16
dfB           1                                  1
dfW           14                                 15
MSB           132.250                            1495.596
MSW           7.054                              822.992
F             18.749                             1.817
P             0.0007                             0.1976

Paired-sample t-test: 3
Using blocks to reduce confounding environmental factors (everything else
being equal except for the treatment effect) in evaluating the protein content
of two wheat varieties.

[Figure: field layout with four blocks (Block 1 to Block 4); each plot is labeled 1 or 2 for the variety grown in it.]

How should we allocate the two crop varieties to the plots? What comparison would be fair?

Randomized Complete Blocks: Plots
Using blocks to reduce confounding environmental factors (Everything else
being equal except for the treatment effect).

[Figure: field layout with four blocks (Block 1 to Block 4); each plot is labeled with the number of the variety assigned to it.]

The three crop varieties are randomly allocated to the plots within each block.

Randomized complete blocks
Which of the six strains of clover has the highest protein content? The experimenter
divided his field into 5 relatively homogeneous blocks, each with 6 plots, and randomly
assigned his 6 strains to the 6 plots within each block. After harvesting, he determined
the nitrogen content for each strain in each plot.

           Six strains                                            If only two strains:
Block 1    3dok13   3dok4    compo    3dok5    3dok1    3dok5     3dok13   3dok4
Block 2    compo    3dok1    3dok4    3dok13   3dok7    3dok13    3dok13   3dok4
Block 3    compo    compo    3dok1    3dok5    3dok1    3dok5     3dok13   3dok4
Block 4    3dok4    3dok13   3dok4    3dok7    3dok4    3dok7     3dok13   3dok4
Block 5    3dok1    3dok5    3dok13   3dok13   3dok7    compo     3dok13   3dok4

Bartlett’s Test
               Feed 1      Feed 2      Feed 3      Feed 4      SUM
               60.8        68.7        102.6       87.9
               57          67.7        102.1       84.2
               65          74          100.2       83.1
               58.6        66.3        96.5        85.7
               61.7        69.8                    90.3
k              4           <== Number of groups
n              5           5           4           5           19
SS             37.568      34.26       22.97       33.552      128.35
v              4           4           3           4           15
Inversev       0.25        0.25        0.333333    0.25        1.083333
Var            9.392       8.565       7.656667    8.388
lnVar          2.239858    2.147684    2.035577    2.126802
v*lnVar        8.959433    8.590737    6.10673     8.507208    32.16411
PooledVar      8.556667
lnPooledVar    2.146711
B              0.036552    <== More accurate than that in Zar (1996)
C              1.112963
Bc             0.032842
P              0.998433

The null hypothesis for the F-test (or variance ratio test): H0: v1 = v2.
The null hypothesis for Bartlett's or Levene's test: H0: v1 = v2 = ... = vn.
The formulae in this sheet use defined variables in EXCEL: Insert|name|define

Do Six Strains of Clover Differ?

Class    Levels      Values

STRAIN        6      3dok1 3dok13 3dok4 3dok5 3dok7 compos

Number of observations in data set = 30

Analysis of Variance Procedure

Dependent Variable: NITROGEN
Sum of          Mean
Source                   DF         Squares        Square    F Value   Pr > F

Model                     5       847.046667   169.409333      14.37   0.0001
Error                    24       282.928000    11.788667
Corrected Total          29      1129.974667

R-Square             C.V.      Root MSE       NITROGEN Mean
0.749616         17.26515       3.43346             19.8867

Multiple Comparison
Duncan's Multiple Range Test for variable: NITROGEN

NOTE: This test controls the type I comparisonwise error rate, not
the experimentwise error rate

Alpha= 0.05   df= 24     MSE= 11.78867

Difference spanning Number of Means   2     3     4     5     6
Critical Range 4.482 4.707 4.852 4.954 5.031

Means with the same letter are not significantly different.

Duncan Grouping                 Mean          N   STRAIN
A               28.820          5   3dok1
B               23.980          5   3dok5
C       B               19.920          5   3dok7
C       D               18.700          5   compos
E       D               14.640          5   3dok4
E                       13.260          5   3dok13

Means are arranged in descending order.

Comparisonwise & Experimentwise Errors

•      Type I comparisonwise error rate is the probability of a Type I error for an individual test of
       hypothesis, symbolized by \alpha_c.
•      Type I experimentwise error rate is the probability of making at least one Type I error for a set
       of hypothesis tests, symbolized by \alpha_e.
•      If \alpha_c = 0.05, and N hypotheses are tested, then \alpha_e ≈ 1 - (1 - \alpha_c)^N.
•      For 5 treatments, for example, there are a total of 10 pairwise comparisons between means. Thus,
       \alpha_c = 0.05 would imply \alpha_e ≈ 0.40. That is, if all means are in fact equal, there is roughly a
       probability of 0.4 that at least one hypothesis will be incorrectly rejected.
•      If we are to control the experimentwise error rate at 0.05, we can set \alpha_e = 0.05:

               \alpha_e ≈ 1 - (1 - \alpha_c)^N = 1 - (1 - \alpha_c)^10 = 0.05

•      and solve the equation, which yields \alpha_c ≈ 0.005. This of course makes it harder to
       reject a null hypothesis, even if the null hypothesis is false.

SAS output: I
Dependent Variable: nitrogen
Sum of
Source                   DF         Squares    Mean Square     F Value   Pr > F
Model                     9   1045.201333         116.133481     27.40   <.0001
Error                    20        84.773333       4.238667
Corrected Total          29   1129.974667

R-Square    Coeff Var       Root MSE      nitrogen Mean
0.924978    10.35268        2.058802           19.88667

Source                   DF        Anova SS    Mean Square     F Value   Pr > F
strain                    5   847.0466667      169.4093333       39.97   <.0001
Block                     4   198.1546667         49.5386667     11.69   <.0001

Multiple Comparison
Duncan's Multiple Range Test for nitrogen

NOTE: This test controls the Type I comparisonwise error rate, not the
experimentwise error rate.

Alpha = 0.05, DFE = 20 MSE = 4.238667

Number of Means          2          3          4          5          6
Critical Range       2.716      2.851      2.937      2.997      3.041
Means with the same letter are not significantly different.
Duncan Grouping          Mean      N    strain
A        28.820      5    3dok1
B        23.980      5    3dok5
C        19.920      5    3dok7
C        18.700      5    compos
D        14.640      5    3dok4
D        13.260      5    3dok13

Ex. ANOVA with repeated measures

Subjects         Drug 1            Drug 2          Drug 3
1            164               152             178
2            202               181             222
3            143               136             132
4            210               194             216
5            228               219             245
6            173               159             182
7            161               157             165
What is the treatment effect? What is the block?
Analyze the data with SAS. Write a concise 1-page report. Submit at the
beginning of the next class in hardcopy.

Two-way experimental design
Testing the effect of food and sex on rabbit food consumption

Mean food consumed
           Fresh food       Rancid food
Male       695.67           535.33
Female     642.67           517.33

Food consumed (raw data)
           Fresh food       Rancid food
Male       709, 679, 699    592, 538, 476
Female     657, 594, 677    508, 505, 539

What is the interaction effect?
Dependent Variable: CONSUMED
Sum of            Mean
Source              DF         Squares          Square F Value     Pr > F

Model                3    65903.5833     21967.8611       15.06    0.0012

Error                8    11666.6667         1458.3333

Corrected Total     11    77570.2500

R-Square           C.V.        Root MSE       CONSUMED Mean

0.849599     6.388646           38.1881             597.750

Source              DF      Anova SS     Mean Square     F Value   Pr > F
FOOD                 1    61204.0833      61204.0833       41.97   0.0002
SEX                  1     3780.7500       3780.7500        2.59   0.1460
FOOD*SEX             1      918.7500        918.7500        0.63   0.4503

What is Interaction?

When the effect of FOOD is independent of SEX, e.g., when fresh food is
preferred by both males and females to the same extent, then there is no
interaction effect. When the effect of FOOD depends on SEX, e.g., when males
eat more fresh food than rancid food but females eat more rancid food than fresh
food, then there is an interaction effect.

[Figure: two panels, each plotting Consumption (y-axis) against Sex (x-axis: Male, Female). Left panel y-axis: 0 to 1600; right panel y-axis: 500 to 700.]

Interaction Effect: Example

Mean food consumed
           Fresh food       Rancid food
Male       568.67           695.67
Female     642.67           517.33

Food consumed (raw data)
           Fresh food       Rancid food
Male       592, 538, 576    709, 679, 699
Female     657, 594, 677    508, 505, 539

Significant Interaction
Dependent Variable: CONSUMED
Sum of            Mean
Source        DF        Squares          Square   F Value    Pr > F
Model          3     55920.2500      18640.0833     23.06    0.0003
Error          8      6466.6667        808.3333
Total         11     62386.9167

R-Square          C.V.         Root MSE      CONSUMED Mean
0.896346      4.690973          28.4312            606.083

Source       DF          Anova SS   Mean Square   F Value    Pr > F

FOOD          1     47754.0833      47754.0833      59.08    0.0001
SEX           1         2.0833          2.0833       0.00    0.9608
FOOD*SEX      1      8164.0833       8164.0833      10.10    0.0130

Can we conclude that SEX has no effect on food consumption?

SAS Program for two-way ANOVA
proc format;
  value sexLevel 1='male' 2='female';
  value foodLevel 1='fresh' 2='rancid';
data assign63;
  do food=1 to 2;
    do sex=1 to 2;
      do n=1 to 3;
        input Consumed @@;
        output;
      end;
    end;
  end;
  format sex sexLevel. food foodLevel.;
cards;
709 679 699 657 594 677 592 538 476 508 505 539
;
proc anova;
  class food sex;
  model Consumed=food|sex;
  means food / duncan;
run;

Ex.
1. Rewrite the "data" block of the SAS program by using:
     data assign63;
     input food sex consumed;
     cards;
     ......
     ;
2. Run the resulting program to check if the rewriting is correct.

Three-way ANOVA
Mean food consumed
Race         Sex       Fresh       Rancid
Short-ear    Male      647.5       515.5
             Female    611         500.5
Long-ear     Male      706         594.5
             Female    652.5       548

Food consumed (raw data)
Race         Sex       Fresh       Rancid
Short-ear    Male      650, 645    511, 520
             Female    610, 612    500, 501
Long-ear     Male      700, 712    601, 588
             Female    650, 655    550, 546

SAS Program
proc format;
  value sex 1='male' 2='female';
  value food 1='fresh' 2='rancid';
  value race 1='short-ear' 2='long-ear';
data assign71;
  input race sex food Consumed;
  format sex sex. food food. race race.;
cards;
1 1 1 650
1 1 1 645
1 1 2 511
1 1 2 520
1 2 1 610
1 2 1 612
1 2 2 500
1 2 2 501
2 1 1 700
2 1 1 712
2 1 2 601
2 1 2 588
2 2 1 650
2 2 1 655
2 2 2 550
2 2 2 546
;
proc anova;
  class food sex race;
  model Consumed=food|sex|race;
run;

Notes:
- The proc format block and the format statement are optional, but will increase clarity in the output.
- The semicolon that ends the data needs to be on a new line, i.e., not "2 2 2 546;".

ANOVA Table
Dependent Variable: CONSUMED
Sum of           Mean
Source               DF        Squares         Square   F Value   Pr > F

Model                 7      72138.4375   10305.4911     354.60   0.0001
Error                 8        232.5000      29.0625
Corrected Total      15      72370.9375

R-Square           C.V.      Root MSE       CONSUMED Mean
0.996787       0.903104       5.39096             596.938

Source               DF        Anova SS   Mean Square   F Value   Pr > F

FOOD                  1      52555.5625   52555.5625    1808.36   0.0001
SEX                   1       5738.0625    5738.0625     197.44   0.0001
FOOD*SEX              1        203.0625     203.0625       6.99   0.0296
RACE                  1      12825.5625   12825.5625     441.31   0.0001
FOOD*RACE             1        175.5625     175.5625       6.04   0.0395
SEX*RACE              1        588.0625     588.0625      20.23   0.0020
FOOD*SEX*RACE         1         52.5625      52.5625       1.81   0.2156

SAS program listing
Data step, version 1 (factor levels generated by nested do-loops):

data assign71;
  do race=1 to 2;
    do sex=1 to 2;
      do food=1 to 2;
        do n=1 to 2;
          input Consumed @@;
          output;
        end;
      end;
    end;
  end;
cards;
650 645 511 520 610 612 500 501
700 712 601 588 650 655 550 546
;

Data step, version 2 (factor levels read from each data line):

data assign71;
  input race sex food Consumed;
cards;
1 1 1 650
1 1 1 645
1 1 2 511
1 1 2 520
1 2 1 610
1 2 1 612
1 2 2 500
1 2 2 501
2 1 1 700
2 1 1 712
2 1 2 601
2 1 2 588
2 2 1 650
2 2 1 655
2 2 2 550
2 2 2 546
;

Analysis (same for both versions):

proc anova;
  class food sex race;
  model Consumed=food|sex|race;
run;

The Efficacy of Prayer
Class                                              N                    Mean
Members of Royal family                            97                   64.04
Clergy                                             945                  69.49
Lawyers                                            294                  68.14
Medical Profession                                 244                  67.31
English aristocracy                                1179                 67.31
Gentry                                             1632                 70.22
Officers in the Royal Navy                         366                  68.40
English literature and science                     395                  67.55
Officers of the Army                               569                  67.07
Fine arts                                          239                  65.96
Galton’s data could be analyzed by a one-way ANOVA. One
criterion for a good ANOVA design is that everything else is
equal except for the treatment effect. Does the data set above satisfy
this criterion?

Other data collected by Galton (1822-1911):
1. Rate of successful delivery between church-going parents and others
2. Life span of believers and non-believers from insurance companies

Model I and Model II ANOVA
1. Model I ANOVA tests the differential effects of the fixed treatment.
x_{ij} = \mu + \alpha_i + \varepsilon_{ij}
where \alpha_i stands for fixed treatment effects (e.g., between male and female).

2.   Model II ANOVA tests the differential effects of a random variable and estimates its
contribution to total variance relative to that from measurement errors (for facilitating
experimental design).
x_{ij} = \mu + A_i + \varepsilon_{ij}
where A_i stands for random treatment effects (e.g., between randomly sampled rabbits).
Metabolic rate in rabbit liver cells, taken for two samples of liver tissue

Replicate

1                         2                          1                         3
2                         6                          7                         5
How can we optimize the experiment? More rabbits or more replicates?

Determining Calcium Content in Leaves

                 Plant 1             Plant 2             Plant 3             Plant 4
Leaf             1     2     3       1     2     3       1     2     3       1     2     3
Measurement 1    3.28  3.52  2.88    2.46  1.87  2.19    2.77  3.74  2.55    3.78  4.07  3.31
Measurement 2    3.09  3.48  2.80    2.44  1.92  2.19    2.66  3.44  2.55    3.87  4.12  3.31

SAS Program
data turnip;
Input plant leaf calcium @@;
cards;
1 1 3.28 1 1 3.09 1 2 3.52 1 2 3.48
1 3 2.88 1 3 2.80 2 1 2.46 2 1 2.44
2 2 1.87 2 2 1.92 2 3 2.19 2 3 2.19
3 1 2.77 3 1 2.66 3 2 3.74 3 2 3.44
3 3 2.55 3 3 2.55 4 1 3.78 4 1 3.87
4 2 4.07 4 2 4.12 4 3 3.31 4 3 3.31
;
proc nested;
class plant leaf;
var calcium;
run;
proc glm;
class plant leaf;
model calcium=plant leaf(plant);
run;

SAS Output: NESTED
Nested Random Effects Analysis of Variance for Variable CALCIUM

Variance            DF      Sum of                       Error
Source                     Squares    F Value   Pr > F   Term
TOTAL               23   10.270396
PLANT                3    7.560346      7.665   0.0097   LEAF
LEAF                 8    2.630200     49.409   0.0000   ERROR
ERROR               12    0.079850

Variance                        Variance           Percent
Source        Mean Square      Component          of Total

TOTAL         0.446539          0.532938          100.0000
PLANT         2.520115          0.365223           68.5302
LEAF          0.328775          0.161060           30.2212
ERROR         0.006654          0.006654            1.2486

Mean                            3.01208333
Standard error of mean          0.32404445

SAS Output: GLM
Dependent Variable: calcium
Sum of
Source               DF       Squares    Mean Square   F Value   Pr > F
Model                11   10.19054583     0.92641326    139.22   <.0001
Error                12    0.07985000     0.00665417
Corrected Total      23   10.27039583

R-Square    Coeff Var      Root MSE      calcium Mean
0.992225     2.708195      0.081573          3.012083

Source               DF     Type I SS   Mean Square    F Value   Pr > F
plant                 3    7.56034583    2.52011528     378.73   <.0001
leaf(plant)           8    2.63020000    0.32877500      49.41   <.0001

Source               DF   Type III SS   Mean Square    F Value   Pr > F
plant                 3    7.56034583    2.52011528     378.73   <.0001
leaf(plant)           8    2.63020000    0.32877500      49.41   <.0001


```
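The numerical illustration of one-way ANOVA in the slides (low-, medium- and high-fat food) can be checked with a short script. The following is a minimal Python sketch, not part of the original slides; the function names are hypothetical, and the p-value uses a closed form that holds only when the numerator degrees of freedom equal 2, as they do in this example.

```python
def mean(values):
    return sum(values) / len(values)

def one_way_anova(groups):
    """Partition the total sum of squares into between- and within-group parts."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group SS: group size times squared deviation of group mean from grand mean.
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group SS: squared deviations of observations from their own group mean.
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_num = len(groups) - 1
    df_denom = len(all_values) - len(groups)
    f = (ssb / df_num) / (ssw / df_denom)
    return {"SSB": ssb, "SSW": ssw, "df_num": df_num, "df_denom": df_denom, "F": f}

def f_upper_tail_2df(f, df_denom):
    """P(F > f) for an F distribution with exactly 2 numerator df (assumption)."""
    return (1 + 2 * f / df_denom) ** (-df_denom / 2)

# Weight gain under low-, medium- and high-fat food, as in the slide.
weight_gain = [[0, 2], [4, 6], [8, 10]]
result = one_way_anova(weight_gain)
p = f_upper_tail_2df(result["F"], result["df_denom"])
print(result)
print("p =", round(p, 4))
```

For designs with a different number of groups, the closed-form tail probability no longer applies and a statistics library would be needed for the p-value.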
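The Bartlett's test spreadsheet in the slides (four feeds) can be re-derived step by step. The sketch below is an illustration in Python rather than the author's Excel sheet; the function names are hypothetical, and the chi-square tail probability uses a closed form that is valid only for 3 degrees of freedom (k - 1 with k = 4 groups).

```python
import math

feeds = [
    [60.8, 57.0, 65.0, 58.6, 61.7],    # Feed 1
    [68.7, 67.7, 74.0, 66.3, 69.8],    # Feed 2
    [102.6, 102.1, 100.2, 96.5],       # Feed 3
    [87.9, 84.2, 83.1, 85.7, 90.3],    # Feed 4
]

def bartlett(groups):
    """Return B, the correction factor C, and the corrected statistic Bc = B / C."""
    k = len(groups)
    dfs = [len(g) - 1 for g in groups]              # v in the spreadsheet
    sums_sq = []
    for g in groups:
        m = sum(g) / len(g)
        sums_sq.append(sum((x - m) ** 2 for x in g))  # SS in the spreadsheet
    pooled_var = sum(sums_sq) / sum(dfs)
    b = sum(dfs) * math.log(pooled_var) - sum(
        v * math.log(ss / v) for v, ss in zip(dfs, sums_sq)
    )
    c = 1 + (sum(1 / v for v in dfs) - 1 / sum(dfs)) / (3 * (k - 1))
    return b, c, b / c

def chi2_upper_tail_3df(x):
    """P(chi-square > x) for exactly 3 degrees of freedom (assumption)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

b, c, bc = bartlett(feeds)
p_value = chi2_upper_tail_3df(bc)
print(f"B = {b:.6f}, C = {c:.6f}, Bc = {bc:.6f}, P = {p_value:.6f}")
```

The printed values should agree with the B, C, Bc and P rows of the spreadsheet up to rounding.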
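The relation between comparisonwise and experimentwise error rates described in the slides can be worked through numerically. This is a small Python sketch under the assumption that the N tests are independent, which is what the formula 1 - (1 - alpha_c)^N presumes; the function names are illustrative only.

```python
from math import comb

def experimentwise_rate(alpha_c, n_tests):
    """Probability of at least one Type I error across n_tests independent tests."""
    return 1 - (1 - alpha_c) ** n_tests

def comparisonwise_rate(alpha_e, n_tests):
    """Per-test rate that yields the target experimentwise rate (inverse of the above)."""
    return 1 - (1 - alpha_e) ** (1 / n_tests)

n_pairs = comb(5, 2)                             # pairwise comparisons among 5 treatments
alpha_e = experimentwise_rate(0.05, n_pairs)     # experimentwise rate when alpha_c = 0.05
alpha_c = comparisonwise_rate(0.05, n_pairs)     # alpha_c needed for alpha_e = 0.05

print(n_pairs, round(alpha_e, 4), round(alpha_c, 6))
```

With 10 comparisons this gives an experimentwise rate of about 0.40 and a required comparisonwise rate of about 0.005, matching the figures quoted in the slides.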
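The FOOD, SEX and FOOD*SEX rows of the first two-way ANOVA table in the slides (rabbit food consumption) can be reproduced from the raw data. The following Python sketch is an illustration, not the SAS procedure itself; it assumes a balanced design with the same number of replicates in every cell, and the function name is hypothetical. Only sums of squares and F values are computed, not p-values.

```python
cells = {
    ("fresh", "male"): [709, 679, 699],
    ("fresh", "female"): [657, 594, 677],
    ("rancid", "male"): [592, 538, 476],
    ("rancid", "female"): [508, 505, 539],
}

def mean(values):
    return sum(values) / len(values)

def two_way_anova(cells):
    """Balanced two-way ANOVA with replication; keys are (food, sex) pairs."""
    foods = sorted({food for food, _ in cells})
    sexes = sorted({sex for _, sex in cells})
    n = len(next(iter(cells.values())))          # replicates per cell (assumed equal)
    values = [x for obs in cells.values() for x in obs]
    grand = mean(values)

    def marginal_mean(position, level):
        return mean([x for key, obs in cells.items() if key[position] == level for x in obs])

    ss_food = n * len(sexes) * sum((marginal_mean(0, f) - grand) ** 2 for f in foods)
    ss_sex = n * len(foods) * sum((marginal_mean(1, s) - grand) ** 2 for s in sexes)
    ss_model = n * sum((mean(obs) - grand) ** 2 for obs in cells.values())
    ss_inter = ss_model - ss_food - ss_sex       # interaction = model minus main effects
    ss_error = sum((x - mean(obs)) ** 2 for obs in cells.values() for x in obs)

    df = {"FOOD": len(foods) - 1, "SEX": len(sexes) - 1}
    df["FOOD*SEX"] = df["FOOD"] * df["SEX"]
    ms_error = ss_error / (len(values) - len(cells))
    ss = {"FOOD": ss_food, "SEX": ss_sex, "FOOD*SEX": ss_inter}
    # Each entry: (sum of squares, F value)
    return {name: (ss[name], (ss[name] / df[name]) / ms_error) for name in ss}

table = two_way_anova(cells)
for source, (ss_value, f_value) in table.items():
    print(f"{source:10s} SS = {ss_value:12.4f}  F = {f_value:6.2f}")
```

Swapping in the raw data from the interaction example should likewise give a much larger FOOD*SEX sum of squares, which is the point that example makes.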