Analysis of Variance: One-Way Design
Hypotheses involving three or more population means
The Hudson Company is a diversified manufacturer of
electronics products, both for the consumer and
manufacturing segments of the economy. The company
has always tried to develop numerous small plants as
opposed to a few, huge operations. Consequently, it has
plants in North America, Asia, and Europe. The company
has always maintained a decentralized management style,
but does set some corporate goals. A recent goal is to
increase the "local content" of goods and services each
plant purchases. The purpose of this goal is to add to the
stability of the local economy. The North American
division of the Hudson Company is divided into four areas.
At the yearly strategic planning meeting, the four regional
supervisors decided to determine whether any of the
regions were doing better than the others in their efforts to
increase the local content. Since "local content" is a
difficult value to measure, the supervisors decided to form
a working group and sample eight manufacturing sites in
each area. Since the working group would expect variation
to exist among the sites, it formulated the following null
and alternative hypotheses:
H0: μ1 = μ2 = μ3 = μ4
HA: Not all means are equal
Could this be done with six separate t tests, one for each
possible pair of regions?
The problem with using a series of t tests is that
although each test has an α-level of 0.05, the true
α-level for all tests combined is greater than 0.05.
As more tests are performed, the risk of rejecting at
least one true hypothesis increases.
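How quickly the combined risk grows can be illustrated with a short calculation. The sketch below assumes the six tests are independent, which pairwise t tests on shared samples are not exactly, so the result is only a rough guide. The function name is illustrative and not part of the original material.

```python
from math import comb


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false rejection across n_tests tests.

    Assumption: the tests are independent (only approximately true
    for pairwise t tests that reuse the same samples).
    """
    return 1.0 - (1.0 - alpha) ** n_tests


k = 4                     # number of regions being compared
n_pairs = comb(k, 2)      # number of distinct pairs of regions
fwer = familywise_error_rate(0.05, n_pairs)

print(f"pairwise tests: {n_pairs}")
print(f"approximate combined alpha: {fwer:.4f}")
```

Under the independence assumption the combined risk is roughly five times the nominal 0.05, which is the motivation for a single F test.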
Plant Region 1 Region 2 Region 3 Region 4
1 4 7 5 12
2 6 9 11 8
3 10 13 6 7
4 12 8 8 9
5 5 9 4 11
6 8 14 9 10
7 4 7 10 6
8 7 5 11 9
Total: 56 72 64 72
Average: 7 9 8 9
Grand Total: 264
Grand Mean: 8.25
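Total sample variation (TSS) =
Within-sample variation (SSW) + Between-sample variation (SSB)

$$SSB = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$$

$$SSW = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$$

where k is the number of samples, $n_j$ is the size of sample j,
$\bar{x}_j$ is the mean of sample j, and $\bar{\bar{x}}$ is the grand mean.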
SSW:
9 4 9 9
1 0 9 1
9 16 4 4
25 1 0 0
4 0 16 4
1 25 1 1
9 4 4 9
0 16 9 0
58 66 52 28 SUM = 204
SSB:
12.5 4.5 0.5 4.5 SUM = 22
TSS: 204 + 22 = 226
or, computed directly from the grand mean $\bar{\bar{x}}$:

$$TSS = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{\bar{x}})^2$$
18.0625 1.5625 10.5625 14.0625
5.0625 0.5625 7.5625 0.0625
3.0625 22.5625 5.0625 1.5625
14.0625 0.0625 0.0625 0.5625
10.5625 0.5625 18.0625 7.5625
0.0625 33.0625 0.5625 3.0625
18.0625 1.5625 3.0625 5.0625
1.5625 10.5625 7.5625 0.5625 SUM = 226
MSB = SSB / (k - 1) = 22 / (4 - 1) = 7.3333
MSW = SSW / (n - k) = 204 / (32 - 4) = 7.2857
Test Statistic = F = MSB / MSW = 7.3333 / 7.2857 = 1.0065
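The hand calculation above can be reproduced in a few lines. This is a minimal sketch using only the data in the table; the function name `one_way_anova` is illustrative and not taken from any library.

```python
# Local-content scores for the eight sampled plants in each region
regions = {
    "Region 1": [4, 6, 10, 12, 5, 8, 4, 7],
    "Region 2": [7, 9, 13, 8, 9, 14, 7, 5],
    "Region 3": [5, 11, 6, 8, 4, 9, 10, 11],
    "Region 4": [12, 8, 7, 9, 11, 10, 6, 9],
}


def one_way_anova(groups):
    """Return (SSB, SSW, TSS, F) for a list of samples."""
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-sample variation: group means around the grand mean
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-sample variation: observations around their own group mean
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # Total variation: observations around the grand mean
    tss = sum((x - grand_mean) ** 2 for x in all_values)
    f_stat = (ssb / (k - 1)) / (ssw / (n - k))
    return ssb, ssw, tss, f_stat


ssb, ssw, tss, f_stat = one_way_anova(list(regions.values()))
print(f"SSB={ssb:.1f}  SSW={ssw:.1f}  TSS={tss:.1f}  F={f_stat:.4f}")
```

Computing TSS separately rather than as SSB + SSW gives a built-in check that the two components add up.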
Degrees of freedom: D1 = k - 1 = 4 - 1 = 3
D2 = n - k = 32 - 4 = 28
α = 0.05
Fcritical = 2.95
Since the calculated F-value is smaller than the
critical F-value, the working group should not reject
the null hypothesis.
To solve the problem using EXCEL:
1. Click ANALYSIS TOOLS.
2. Click ANOVA: SINGLE-FACTOR.
3. For the INPUT RANGE, select the four columns of data
   (without variable names and observation numbers).
4. For the OUTPUT RANGE, select an empty (suitable) area
   of your spreadsheet.
5. Specify how the data is GROUPED: by column or by row.
6. Choose an ALPHA LEVEL.
7. Click OK.
Here is how my output looks:
Anova: Single-Factor

Summary
Groups      Count   Sum   Average   Variance
Column 1    8       56    7         8.285714
Column 2    8       72    9         9.428571
Column 3    8       64    8         7.428571
Column 4    8       72    9         4

ANOVA
Source of Variation   SS    df   MS       F        P-value   F crit
Between Groups        22    3    7.3333   1.0065   0.40      2.95
Within Groups         204   28   7.2857
Total                 226   31
ONE-WAY ANOVA
Consider a 10-year study in which a sample of 15 people has been
observed while using toothpaste 1, 2, or 3, respectively. Let us
assume that five of the participants have been randomly assigned
to each of the treatments and that the study has provided the
following data:
                 Treatment, j (type of toothpaste used)
Observation, i   1       2       3
1                19      20      18
2                15      25      12
3                22      22      16
4                17      19      17
5                19      23      15
Total            92      109     78

Sample means: Xbar1 = 18.4, Xbar2 = 21.8, Xbar3 = 15.6
Grand Mean = Xdouble-bar = (18.4 + 21.8 + 15.6) / 3 = 18.6
H0: The mean number of cavities for all users of toothpaste 1 is
the same as that for all users of toothpaste 2 or 3;
that is, μ1 = μ2 = μ3
HA: At least one of the population means is different from the
others.
We study the variation in the sample data listed in the r = 5
rows and c = 3 columns. This variation has two components:
Variation among columns: Explained by treatments
Variation within columns: Unexplained variation
Total sample variation =
Between-sample variation + Within-sample variation
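$$TSS = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{\bar{X}})^2$$

$$SSB = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{\bar{X}})^2$$

$$SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$$

where $X_{ij}$ is observation i under treatment j, $\bar{X}_j$ is the
mean of column j, $\bar{\bar{X}}$ is the grand mean (Xdouble-bar), and
$n_j$ is the number of observations in column j.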
Xij Grand mean Diff. Diff.^2
19 18.6 0.4 0.16
15 18.6 -3.6 12.96
22 18.6 3.4 11.56
17 18.6 -1.6 2.56
19 18.6 0.4 0.16
20 18.6 1.4 1.96
25 18.6 6.4 40.96
22 18.6 3.4 11.56
19 18.6 0.4 0.16
23 18.6 4.4 19.36
18 18.6 -0.6 0.36
12 18.6 -6.6 43.56
16 18.6 -2.6 6.76
17 18.6 -1.6 2.56
15 18.6 -3.6 12.96
TSS = 167.6
n mean Grand mean Diff. Diff.^2 n * Diff.^2
5 18.4 18.6 -0.2 0.04 0.2
5 21.8 18.6 3.2 10.24 51.2
5 15.6 18.6 -3 9 45
SSB = 96.4
SSW = TSS - SSB = 71.2
Mean square within = MSW = SSW / (N - c) = 71.2 / (15 - 3) = 5.93
Mean square between = MSB = SSB / (c - 1) = 96.4 / 2 = 48.2
F = MSB / MSW = 48.2 / 5.9333 = 8.12
D1 = c - 1 = 2
D2 = N - c = 12
Critical value 3.89 (α = 0.05)
F = 8.12 exceeds the critical value and H0 is rejected.
How about for F0.01? Critical value 6.93
F = 8.12 again exceeds the critical value and H0 is rejected.
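The toothpaste calculation follows a slightly different route from the first example: TSS and SSB are computed directly and SSW is obtained by subtraction. A minimal sketch of that route, with the two critical values copied from the text rather than computed (variable names are illustrative):

```python
# Observations for each toothpaste (columns of the data table)
cavities = {
    1: [19, 15, 22, 17, 19],
    2: [20, 25, 22, 19, 23],
    3: [18, 12, 16, 17, 15],
}

observations = [x for column in cavities.values() for x in column]
N, c = len(observations), len(cavities)
grand_mean = sum(observations) / N

# Total variation around the grand mean
tss = sum((x - grand_mean) ** 2 for x in observations)
# Between-sample variation from the column means
ssb = sum(
    len(column) * (sum(column) / len(column) - grand_mean) ** 2
    for column in cavities.values()
)
# Within-sample variation by subtraction, as in the text
ssw = tss - ssb

msb = ssb / (c - 1)
msw = ssw / (N - c)
f_stat = msb / msw

# Critical values quoted in the text for D1 = 2, D2 = 12
critical = {0.05: 3.89, 0.01: 6.93}
decisions = {alpha: f_stat > f_crit for alpha, f_crit in critical.items()}

for alpha, reject in decisions.items():
    print(f"alpha={alpha}: F={f_stat:.2f}, reject H0: {reject}")
```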
EXCEL Solution:
Anova: Single Factor
SUMMARY
Groups Count Sum Average Variance
Column 1 5 92 18.4 6.8
Column 2 5 109 21.8 5.7
Column 3 5 78 15.6 5.3
ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 96.4 2 48.2 8.12 0.00588 3.88529
Within Groups 71.2 12 5.93
Total 167.6 14
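When D1 = 2, the upper-tail area of the F distribution has a simple closed form, P(F > f) = (1 + 2f / D2)^(-D2 / 2), so the P-value and F crit columns of this particular output can be checked without tables. The sketch below applies only to the D1 = 2 case; the helper names are illustrative.

```python
def f_upper_tail_d1_2(f_value: float, d2: int) -> float:
    """P(F > f_value) for an F distribution with D1 = 2 and D2 = d2."""
    return (1.0 + 2.0 * f_value / d2) ** (-d2 / 2.0)


def f_critical_d1_2(alpha: float, d2: int) -> float:
    """Value of F whose upper-tail area is alpha (D1 = 2 only)."""
    return (d2 / 2.0) * (alpha ** (-2.0 / d2) - 1.0)


d2 = 12
f_stat = 48.2 / (71.2 / d2)          # MSB / MSW from the ANOVA table

p_value = f_upper_tail_d1_2(f_stat, d2)
crit_05 = f_critical_d1_2(0.05, d2)
crit_01 = f_critical_d1_2(0.01, d2)

print(f"P-value: {p_value:.5f}")
print(f"F crit at 0.05: {crit_05:.5f}")
print(f"F crit at 0.01: {crit_01:.2f}")
```

For other numerator degrees of freedom no such shortcut exists, and a table or statistical software is needed.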
TUKEY'S METHOD OF MULTIPLE COMPARISONS
Once the ANOVA leads to rejecting H0 (that the means are
equal), decision makers need a method to determine which
means are not equal.
When the samples from the populations are the same size,
we can establish a T range (according to Tukey's method):
T range = T √MSW, where T = (1 / √n) q
q = Value from the Studentized range table (Table 10),
given the α level and D1 = K and D2 = N - K degrees of
freedom (d.f.)
n = Common sample size
Two sample means differ significantly when the absolute
value of their difference exceeds the T range.
Example: A security analyst for Oso, Toro, and Oso
evaluated the effect of a sample of five
different stock-trading rules on ten series of simulated
trades. The following rates of return were obtained (%):
Series   Buy &   Sell on     Buy on     Sell on    Buy on
         Hold    Good News   Bad News   Bad News   Good News
1        32      17          -5         15         2
2        -11     23          8          -5         -10
3        14      15          2          -10        -5
4        9       7           12         8          2
5        16      13          10         -2         4
6        9       16          12         2          6
7        14      12          -3         8          11
8        -1      -3          5          -1         -14
9        17      29          16         11         5
10       6       1           -6         5          -8
Anova: Single-Factor

Summary
Groups      Count   Sum    Average   Variance
Column 1    10      105    10.5      130.94
Column 2    10      130    13        91.333
Column 3    10      51     5.1       60.767
Column 4    10      31     3.1       59.656
Column 5    10      -7     -0.7      65.122

ANOVA (α = 0.05)
Source of Variation   SS       df   MS       F        P-value   F crit
Between Groups        1231.6   4    307.9    3.7749   0.0099    2.579
Within Groups         3670.4   45   81.564
Total                 4902     49
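The ANOVA table can be rebuilt from the Summary block alone, because SSW equals the sum of (count - 1) × variance over the groups and SSB depends only on the counts and averages. A minimal sketch using the rounded figures printed above, so the last digit may differ slightly from Excel's:

```python
# (count, average, variance) for each column, copied from the Summary table
summary = [
    (10, 10.5, 130.94),
    (10, 13.0, 91.333),
    (10, 5.1, 60.767),
    (10, 3.1, 59.656),
    (10, -0.7, 65.122),
]

n_total = sum(n for n, _, _ in summary)
k = len(summary)
grand_mean = sum(n * mean for n, mean, _ in summary) / n_total

# Between groups: weighted squared distance of each average from the grand mean
ssb = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in summary)
# Within groups: each sample variance times its degrees of freedom
ssw = sum((n - 1) * var for n, _, var in summary)

msb = ssb / (k - 1)
msw = ssw / (n_total - k)
f_stat = msb / msw

print(f"SSB={ssb:.1f}  SSW={ssw:.1f}  MSB={msb:.1f}  MSW={msw:.3f}  F={f_stat:.3f}")
```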
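Sample means (from the Excel output):
X1 = 10.5, X2 = 13.0, X3 = 5.1, X4 = 3.1, X5 = -0.7

D1 = K = 5
D2 = N - K = 45
q is approx. 4.025 (between 4.04 and 3.98 in Table 10:
k = 5, v's = 40 and 60)
n = Common sample size = 10
T = (1 / √n) q = (1 / 3.1622776)(4.025) = 1.2728
T range = T √MSW = (1.2728)(√81.564) = (1.2728)(9.0313) = 11.495

|X1 - X2| =  2.50 < T range
|X1 - X3| =  5.40 < T range
|X1 - X4| =  7.40 < T range
|X1 - X5| = 11.20 < T range
|X2 - X3| =  7.90 < T range
|X2 - X4| =  9.90 < T range
|X2 - X5| = 13.70 > T range
|X3 - X4| =  2.00 < T range
|X3 - X5| =  5.80 < T range
|X4 - X5| =  3.80 < T range

Only |X2 - X5| exceeds the T range, so at α = 0.05 the only
pair of rules whose mean returns differ significantly is
Sell on Good News (column 2) and Buy on Good News (column 5).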
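The T range and the pairwise comparisons can be scripted as follows. This is a sketch under the values quoted in the text: the two Studentized range table entries are copied rather than computed, q is obtained by linear interpolation between them, and the column numbering follows the Excel output.

```python
from itertools import combinations
from math import sqrt

# Column averages and MSW from the Excel ANOVA output
means = {1: 10.5, 2: 13.0, 3: 5.1, 4: 3.1, 5: -0.7}
msw = 81.564
n = 10                      # common sample size
d2 = 45                     # N - K degrees of freedom

# Table 10 entries quoted in the text for k = 5 (assumed, not computed here)
q_at_40, q_at_60 = 4.04, 3.98
# Linear interpolation for 45 degrees of freedom
q = q_at_40 + (d2 - 40) / (60 - 40) * (q_at_60 - q_at_40)

t_factor = q / sqrt(n)
t_range = t_factor * sqrt(msw)

# Pairs of columns whose absolute difference in means exceeds the T range
differing = [
    (a, b)
    for a, b in combinations(means, 2)
    if abs(means[a] - means[b]) > t_range
]

print(f"q={q:.3f}  T={t_factor:.4f}  T range={t_range:.3f}")
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    flag = ">" if diff > t_range else "<"
    print(f"|X{a} - X{b}| = {diff:5.2f} {flag} T range")
```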
When the samples from the populations are not the
same size, we can use Scheffé's method of multiple
comparisons. We will not cover that method here.