14 – Comparing Several Population Means
14.1 - One-way Analysis of Variance (ANOVA)
Suppose we wish to compare k population means (k ≥ 2). This situation can arise in two
ways. If the study is observational, we are obtaining independently drawn samples from
k distinct populations and we wish to compare the population means for some numerical
response of interest. If the study is experimental, then we are using a completely
randomized design to obtain our data from k distinct treatment groups. In a completely
randomized design the experimental units are randomly assigned to one of k treatments
and the response value from each unit is obtained. The mean of the numerical response
of interest is then compared across the different treatment groups.
There are two main questions of interest:
1) Are there at least two population means that differ?
2) If so, which population means differ and how much do they differ by?
H_o: μ_1 = μ_2 = ... = μ_k, i.e. all the population means are equal and have a common mean (μ)
H_a: μ_i ≠ μ_j for some i ≠ j, i.e. at least two population means differ.
If we reject the null then we use comparative methods to answer question 2 above.
The test procedure compares the variation in observations between samples to the
variation within samples. If the variation between samples is large relative to the
variation within samples we are likely to conclude that the population means are not all
equal. The diagrams below illustrate this idea...
[Diagrams] Left: Between Group Variation >> Within Group Variation (conclude the population means differ). Right: Between Group Variation ≈ Within Group Variation (fail to conclude the population means differ).
Analysis of variance gets its name from the fact that we are using variation to decide
whether the population means differ.
Computational Details for the Equal Sample Size Case (n_1 = n_2 = ... = n_k = n)
Standard statistical notation:
X_ij = the jth observation from population i
X̄_i = the sample mean from population i
X̄ = the grand mean of all N = kn observations = (1/N) Σ_{i=1}^{k} Σ_{j=1}^{n} X_ij
TWO ESTIMATES OF THE COMMON VARIANCE (σ²)
Estimate 1: Using Between Group Variation
If the null hypothesis is true each of the X̄_i 's represents a randomly selected observation
from a normal distribution with mean μ (the common mean) and std. deviation = σ/√n, i.e.
the standard error of the mean.
Thus if we find the sample standard deviation of the X̄_i 's we get an estimate of σ/√n,
or in terms of the sample variance we have,

Sample variance of the X̄_i 's = Σ_{i=1}^{k} (X̄_i − X̄)² / (k − 1), which is an estimate of σ²/n.

Multiplying by n,

n Σ_{i=1}^{k} (X̄_i − X̄)² / (k − 1)

is an estimate of σ², if and only if the null hypothesis is true. However, if the
alternative hypothesis is true, this estimate of σ² will be too BIG. This formula will
be large when there is substantial between group variation. This measure of between
group variation is called the Mean Square for Treatments and is denoted MS_Treat.
Estimate 2: Using Within Group Variation
Another estimate of the common variance ( 2 ) can be found by looking at the variation
of the observations within each of the k treatment groups. By extending the pooled-
variance from the two population case we have the following:
Pooled estimate of σ² = [(n_1 − 1)s_1² + (n_2 − 1)s_2² + ... + (n_k − 1)s_k²] / (n_1 + n_2 + ... + n_k − k)

which simplifies to the mean of the k sample variances when the sample sizes are all equal.
Another way to write this is as follows:

Pooled estimate of σ² = Σ_{i=1}^{k} Σ_{j=1}^{n_i} (X_ij − X̄_i)² / (N − k) = MS_Error
This will be an estimate of the common population variance ( 2 ) regardless of whether
the null hypothesis is true or not. This is called the Mean Square for Error ( MS Error ) .
Thus if the null hypothesis is true we have two estimates of the common variance (σ²),
namely the mean square for treatments (MS_Treat) and the mean square for error (MS_Error).
If MS_Treat >> MS_Error we reject H_o, i.e. the between group variation is large relative to the
within group variation.
If MS_Treat ≈ MS_Error we fail to reject H_o, i.e. the between group variation is NOT large
relative to the within group variation.
The test statistic compares the mean squares for treatment and error in the form of a ratio,

F = MS_Treat / MS_Error ~ F-distribution with numerator df = k − 1 and denominator df = N − k
Large values for F give small p-values and lead to rejection of the null hypothesis.
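These notes use JMP throughout; as a language-neutral illustration, the F statistic above can also be computed by hand and checked against a library routine. The data values below are made up for the example, not taken from any data set in this handout.

```python
import numpy as np
from scipy import stats

# Three hypothetical treatment groups (illustrative numbers only)
groups = [np.array([1.0, 3.0, 2.5, 4.0]),
          np.array([6.0, 5.5, 7.0, 4.5]),
          np.array([2.0, 1.5, 3.5, 3.0])]

k = len(groups)
N = sum(len(g) for g in groups)
grand = np.concatenate(groups).mean()

# Between-group variation: MS_Treat on k - 1 df
ms_treat = sum(len(g) * (g.mean() - grand) ** 2 for g in groups) / (k - 1)
# Within-group (pooled) variation: MS_Error on N - k df
ms_error = sum(((g - g.mean()) ** 2).sum() for g in groups) / (N - k)

F = ms_treat / ms_error
p = stats.f.sf(F, k - 1, N - k)   # large F gives a small p-value

# The hand computation should match scipy's built-in one-way ANOVA
F_scipy, p_scipy = stats.f_oneway(*groups)
```

The agreement with `f_oneway` confirms that the ratio of mean squares is exactly the one-way ANOVA F statistic.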
EXAMPLE 1 - Weight Gain in Anorexia Patients
Data File: Anorexia.JMP in the ANOVA/t-test JMP folder
Keywords: One-way ANOVA
These data give the pre- and post-weights of patients being treated for anorexia nervosa.
There are actually three different treatment plans being used in this study, and we wish to
compare their performance.
The variables in the data file are:
Treatment – Family, Standard, Behavioral
prewt - weight at the beginning of treatment
postwt - weight at the end of the study period
Weight Gain - weight gained (or lost) during treatment (postwt-prewt)
We begin our analysis by examining comparative displays for the weight gained across
the three treatment methods. To do this select Fit Y by X from the Analyze menu and
place the grouping variable, Treatment, in the X box and place the response, Weight Gain, in
the Y box and click OK. Here boxplots, mean diamonds, normal quantile plots, and
comparison circles have been added.
Things to consider from this graphical display:
Do there appear to be differences in the mean weight gain?
Are the weight changes normally distributed?
Is the variation in weight gain equal across therapies?
Checking the Equality of Variance Assumption
To test whether it is reasonable to assume the population variances are equal for these
three therapies select UnEqual Variances from the Oneway Analysis pull down-menu.
Equality of Variance Test
H_o: σ_1² = σ_2² = ... = σ_k²
H_a: the population variances are not all equal
We have no evidence to conclude that the variances/standard deviations of the weight
gains for the different treatment programs differ (p >> .05).
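JMP's UnEqual Variances command reports several tests of this hypothesis. One common choice, the Brown-Forsythe variant of Levene's test, can be sketched outside JMP as follows; the weight-gain values here are made up for illustration, not the anorexia data.

```python
import numpy as np
from scipy import stats

# Hypothetical weight gains for three treatment groups
family     = np.array([1.2, 3.1, 2.5, 4.0, 0.8])
standard   = np.array([0.5, -1.0, 0.2, 1.1, -0.4])
behavioral = np.array([6.0, 5.5, 7.1, 4.5, 6.3])

# center='median' gives the Brown-Forsythe variant, which is robust
# to departures from normality
stat, p = stats.levene(family, standard, behavioral, center='median')
# A large p-value means no evidence against the equal-variance assumption
```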
ONE-WAY ANOVA TEST FOR COMPARING THE THERAPY MEANS
To test the null hypothesis that the mean weight gain is the same for each of the therapy
methods we will perform the standard one-way ANOVA test. To do this in JMP select
Means, Anova/t-Test from the Oneway Analysis pull-down menu. The results of the
test are shown in the Analysis of Variance box.
The mean square for treatments is 5.42 times larger than the
mean square for error! This provides strong evidence against
the null hypothesis.
The p-value contained in the ANOVA table is .0065, thus we reject the null hypothesis at
the .05 level and conclude that statistically significant differences exist in the mean weight
gain experienced by patients in the different therapy groups.
14.2 - MULTIPLE COMPARISONS
Because we have concluded that the mean weight gains across treatment method are not
all equal it is natural to ask the secondary question:
Which means are significantly different from one another?
How large are the differences?
We could consider performing a series of two-sample t-Tests and constructing confidence
intervals for independent samples to compare all possible pairs of means, however if the
number of treatment groups is large we will almost certainly find two treatment means as
being significantly different. Why? Consider a situation where we have k = 7 different
treatments that we wish to compare. To compare all possible pairs of means (1 vs. 2, 1
vs. 3, …, 6 vs. 7) would require performing a total of k(k − 1)/2 = 21 two-sample t-Tests. If
we used P(Type I Error) = α = .05 for each test we would expect to make 21(.05) ≈ 1 Type I
Error, i.e. we expect to find one pair of means as being significantly different when in
fact they are not. This problem only becomes worse as the number of groups, k, gets larger.
Experiment-wise Error Rate
Another way to think about this is to consider the probability of making no Type I Errors
when making our pair-wise comparisons. When k = 7 for example, the probability of
making no Type I Errors is (.95)^21 ≈ .3406, i.e. the probability that we make at least one
Type I Error is therefore .6594, or about a 66% chance. Certainly this is unacceptable! Why
would we conduct a statistical analysis when we know that we have a 66% chance of making an
error in our conclusions? This probability is called the experiment-wise error rate.
There are several different ways to control the experiment-wise error rate. One of the
easiest ways to control the experiment-wise error rate is to use the Bonferroni Correction. If
we plan on making m comparisons or conducting m significance tests the Bonferroni
Correction is to simply use α/m as our significance level rather than α. This simple
correction guarantees that our experiment-wise error rate will be no larger than α. This
correction implies that our p-values will have to be less than α/m, rather than α, to be
considered statistically significant.
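The arithmetic behind these error rates is easy to verify. A short sketch for the k = 7 case discussed above:

```python
from math import comb

k = 7                        # number of treatment groups
m = comb(k, 2)               # number of pair-wise comparisons: k(k-1)/2 = 21
alpha = 0.05

# Expected number of Type I errors if each test uses alpha = .05
expected_errors = m * alpha                  # 21(.05) = 1.05, about 1

# Probability of at least one Type I error (experiment-wise error rate)
experimentwise = 1 - (1 - alpha) ** m        # about .6594

# Bonferroni correction: test each comparison at alpha / m instead
alpha_bonf = alpha / m                       # .05/21, about .00238
```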
Multiple Comparison Procedures for Pair-wise Comparisons of k Population Means
When performing pair-wise comparison of population means in ANOVA there are
several different methods that can be employed. These methods depend on the types of
pair-wise comparisons we wish to perform. The different types available in JMP are
summarized briefly below:
Compare each pair using the usual two-sample t-Test for independent
samples. This choice does not provide any experiment-wise error rate
protection! (DON’T USE)
Compare all pairs using Tukey’s Honest Significant Difference (HSD)
approach. This is the best choice if you are interested in comparing each
possible pair of treatments.
Compare the means to the “Best” using Hsu’s method. The best
mean can either be the minimum (if smaller is better for the response) or
maximum (if bigger is better for the response).
Compare each mean to a control group using Dunnett’s method.
Compares each treatment mean to a control group only. You must
identify the control group in JMP by clicking on an observation in your
comparative plot corresponding to the control group before selecting this option.
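Tukey's HSD works by comparing each pair of means against a critical value from the studentized range distribution. A minimal sketch for a single pair in a balanced design, using SciPy's `studentized_range` distribution (available in SciPy 1.7 and later) and toy numbers chosen only for illustration:

```python
import numpy as np
from scipy import stats

# Two widely separated toy groups in a balanced design
g1 = np.array([1.0, 2.0, 3.0])
g2 = np.array([101.0, 102.0, 103.0])
groups = [g1, g2]

k = len(groups)
n = len(g1)                        # common sample size per group
N = k * n
ms_error = sum(((g - g.mean()) ** 2).sum() for g in groups) / (N - k)

# Studentized range statistic for this pair of means
q = abs(g1.mean() - g2.mean()) / np.sqrt(ms_error / n)
# Tukey HSD p-value: survival function of the studentized range
p = stats.studentized_range.sf(q, k, N - k)
# A tiny p-value means this pair of means differs significantly
```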
Multiple Comparison Options in JMP
EXAMPLE 1 - Weight Gain in Anorexia Patients (cont’d)
For these data we are probably interested in comparing each of the treatments to one
another. For this we will use Tukey’s multiple comparison procedure for comparing all
pairs of population means. Select Compare Means from the Oneway Analysis menu
and highlight the All Pairs, Tukey HSD option. Beside the graph you will now notice
there are circles plotted. There is one circle for each group and each circle is centered at
the mean for the corresponding group. The size of the circles is inversely proportional
to the sample size, thus larger circles will be drawn for groups with smaller sample sizes.
These circles are called comparison circles and can be used to see which pairs of means
are significantly different from each other. To do this, click on the circle for one of the
treatments. Notice that the treatment group selected will appear in the plot window
and the circle will become red & bold. The means that are significantly different from
the treatment group selected will have circles that are gray. These color differences will
also be conveyed in the group labels on the horizontal axis. In the plot below we have
selected the Standard treatment group.
The results of the pair-wise comparisons are also contained in the output window (shown
below). The matrix labeled Comparison for all pairs using Tukey-Kramer HSD identifies
pairs of means that are significantly different using positive entries in this matrix. Here
we see only behavioral cognitive therapy and standard therapy significantly differ in
terms of mean weight change.
The next table conveys the same information by using different letters to represent
populations that have significantly different means. Notice behavioral and standard
therapies are not connected by the same letter so they are significantly different.
Finally the CI’s in the Ordered Differences section give estimates for the differences in
the population means. Here we see that the mean weight gain for patients receiving
behavioral therapy is estimated to be between 2.09 lbs. and 13.34 lbs. larger than the
mean weight gain for patients receiving standard treatment (see the highlighted section of the output).
EXAMPLE 2 – Butter Fat Content in Cow Milk
Data File: Butterfat-cows.JMP
This data set contains the butter fat content of milk samples from five breeds of dairy cows.
The variables in the data file are:
Breed – Ayshire, Canadian, Guernsey, Holstein-Fresian, Jersey
Age Group – two different age class of cows (1 = younger, 2 = older)
Butterfat - % butter fat found in the milk sample
We begin our analysis by examining comparative displays for the butter fat content
across the five breeds. To do this select Fit Y by X from the Analyze menu and place the
grouping variable, Breed, in the X box and place the response, Butterfat, in the Y box
and click OK. The resulting plot simply shows a scatter plot of butter fat content versus
breed. In many cases there will be numerous data points on top of each other in such a
display making the plot harder to read. To help alleviate this problem we can stagger
or jitter the points a bit from vertical by selecting Jitter Points from the Display Options
menu. To help visualize breed differences we could add quantile boxplots, mean
diamonds, or mean with standard error bars to the graph. To do any or all of these select
the appropriate options from the Display Options menu. The plot at the top of the next
page has both quantile boxes and mean with error bars added.
We can clearly see the butter fat content is largest for Jersey and Guernsey cows and
lowest for Holstein-Fresian. There also appears to be some difference in the variation of
butter fat content as well. The summary statistics on the following page confirm our
findings based on the graph above. To obtain these summary statistics select the
Quantiles and Means, Std Dev, Std Err options from the Oneway Analysis menu at the
top of the window.
Summary Statistics for Butter Fat Content by Breed
To test the null hypothesis that the butter fat content is the same for each of the breeds we
can perform a one-way ANOVA test.
H_o: μ_Ayshire = μ_Canadian = ... = μ_Jersey
H_a: at least two breeds have different mean butter fat content in their milk
Again the assumptions for one-way ANOVA are as follows:
1.) The samples are drawn independently or come from using a completely
randomized design. Here we can assume the cows sampled from each breed
were independently sampled.
2.) The variable of interest is normally distributed for each population.
3.) The population variances are equal across groups. Here this means:
σ²_Ayshire = σ²_Canadian = ... = σ²_Jersey
If this assumption is violated we can use Welch’s ANOVA, which allows
for inequality of the population variances, or we can transform the response.
To check these assumptions in JMP select UnEqual Variances and Normal Quantile
Plot > Actual by Quantile (the 1st option). To conduct the one-way ANOVA test select
Means, Anova/t-Test from the Oneway Analysis pull-down menu.
The graphical results and the results of the test are shown on the following page.
Butter Fat Content by Breed
Normality appears to be satisfied
The equality of variance test results are shown below.
We have strong evidence against the equality of the population variances (use Welch’s ANOVA).
We conclude at least two means differ.
Because we have strong evidence against equality of the population variances we could
use Welch’s ANOVA to test the equality of the means. This test allows for the population
variances/standard deviations to differ when comparing the population means. For
Welch’s ANOVA we see that the p-value < .0001, therefore we have strong evidence
against the equality of the means and we conclude these dairy breeds differ in the mean
butter fat content of their milk.
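JMP reports Welch's ANOVA directly. Base SciPy does not, but the statistic is short enough to sketch by hand; the sample values below are made up, and a convenient sanity check is that with k = 2 groups Welch's F equals the square of the Welch two-sample t statistic.

```python
import numpy as np
from scipy import stats

def welch_anova(*groups):
    """Welch's F test for equal means, not assuming equal variances."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])
    w = n / v                                   # precision weights
    grand_w = (w * m).sum() / w.sum()           # weighted grand mean
    num = (w * (m - grand_w) ** 2).sum() / (k - 1)
    lam = (((1 - w / w.sum()) ** 2) / (n - 1)).sum()
    den = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)
    F = num / den
    df1, df2 = k - 1, (k ** 2 - 1) / (3 * lam)
    return F, stats.f.sf(F, df1, df2)

# Sanity check with two made-up samples
x = np.array([4.1, 5.2, 6.3, 5.8, 4.9])
y = np.array([7.0, 8.5, 9.1, 7.7])
F, p = welch_anova(x, y)
t, p_t = stats.ttest_ind(x, y, equal_var=False)   # Welch's t-test
```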
Variance Stabilizing Transformations
Another approach that is often times used when we encounter situations where there is
evidence against the equality of variance assumption is to transform the response. Often
times we find that the variation appears to increase as the mean increases. For these data
we see that this is the case, the estimated standard deviations increase as the estimated
means increase. To correct this we generally consider taking the log of the response.
Sometimes the square root and reciprocal are used but these transformations do not allow
for any meaningful interpretation in the original scale, whereas the log transform does
(see Handout 14).
As we can see, as the sample means for the
groups increase so do the sample standard
deviations. To stabilize the variances/standard
deviations we can try a log transformation.
The results of the analysis using the log response are shown below.
Analysis of the log(Percent Butter Fat) Content for the Five Dairy Cow Breeds
Normality is still satisfied, there is no evidence against
the equality of the variances, and we have strong
evidence that at least two means differ.
Because we have concluded that the mean/median log butter fat content differs across
breed (p < .0001), it is natural to ask the secondary question: Which breeds have typical
values that are significantly different from one another? To do this we again use multiple
comparison procedures. In JMP select Compare Means from the Oneway Analysis
menu and highlight the All Pairs, Tukey HSD option. The results of Tukey’s HSD are
Below is the plot showing the results from clicking on the circle for the Guernseys.
We can see that the Guernsey and Jersey groups do not significantly differ from each
other, but they both differ significantly from the other three breeds.
The results of all pair-wise comparisons are contained in the matrix above. All pairs of
means that are significantly different will have positive entries in this matrix. Here we see
only Guernsey and Jersey do not significantly differ. The table using letters conveys the
same information, using different letters to represent populations that have significantly
different means. The CI’s in the Ordered Differences section give estimates for the
differences in the population means/medians in the log scale. As was the case with using
the log transformation with two populations, we interpret the results in the original scale
in terms of ratios of medians. For example, back transforming the interval comparing
Jersey to Holstein-Fresian, we estimate that the median butter fat content found in Jersey
milk is between 1.33 and 1.55 times larger than the median butter fat content in Holstein-
Fresian milk. We could also state this in terms of percentages as follows: we estimate
that the typical butter fat content found in Jersey milk is between 33% and 55% higher
than the butter fat content found in Holstein-Fresian milk.
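The back-transformation itself is just exponentiation of the log-scale endpoints. The endpoints used below (roughly 0.285 and 0.438) are reverse-engineered from the ratios quoted above rather than read from the JMP output, so treat them as illustrative.

```python
import numpy as np

# Hypothetical log-scale CI endpoints for Jersey minus Holstein-Fresian
lo, hi = 0.285, 0.438

# Back-transform to a ratio of medians in the original scale
ratio_lo, ratio_hi = np.exp(lo), np.exp(hi)   # about 1.33 and 1.55
# i.e. Jersey median butter fat is estimated to be 33% to 55% higher
```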
15.3 - Randomized Complete Block (RCB) Designs
EXAMPLE 1 – Comparing Methods of Determining Blood Serum Level
Data File: Serum-Meth.JMP
The goal of this study was to determine if four different methods for determining blood
serum levels significantly differ in terms of the readings they give. Suppose we plan to
have 6 readings from each method which we will then use to make our comparisons.
One approach we could take would be to find 24 volunteers and randomly allocate six
subjects to each method and compare the readings obtained using the four methods.
(Note: this is called a completely randomized design). There is one major problem with
this approach. What is it?
Instead of taking this approach it would clearly be better to use each method on the same
subject. This removes subject to subject variation from the results and will allow us to
get a clearer picture of the actual differences in the methods. Also if we truly only wish
to have 6 readings for each method, this approach will only require the use of 6 subjects
versus the 24 subjects the completely randomized approach discussed above requires,
thus reducing the “cost” of the experiment.
The experimental design where each patient’s serum level is determined using each
method is called a randomized complete block (RCB) design. Here the patients serve as
the blocks; the term randomized refers to the fact that the methods will be applied to each
patient’s blood sample in a random order, and complete refers to the fact that each method
is used on each patient’s blood sample. In some experiments where blocking is used it is
not possible to apply each treatment to each block resulting in what is called an
incomplete block design. These are less common and we will not discuss them in this course.
The table below contains the raw data from the RCB experiment comparing the serum level determination methods.
Subject   Method 1   Method 2   Method 3   Method 4
1          360        435        391        502
2         1035       1152       1002       1230
3          632        750        591        804
4          581        703        583        790
5          463        520        471        502
6         1131       1340       1144       1300
Visualizing the need for Blocking
Select Fit Y by X from the Analyze menu and place Serum Level in the Y, Response box
and Method in the X, Factor box. The resulting comparative plot is shown below. Do
there appear to be any differences in the serum levels obtained from the four methods?
This plot completely ignores the fact that the same six blood samples were used for each
method. We can incorporate this fact visually by selecting Oneway Analysis >
Matching Column... > then highlight Patient in the list. This will have the following
effect on the plot.
Now we can clearly see that ignoring the fact the same six blood samples were used for
each method is a big mistake!
On the next page we will show how to correctly analyze these data.
Correct Analysis of RCB Design Data in JMP
First select Fit Y by X from the Analyze menu and place Serum Level in the Y,
Response box, Method in the X, Factor box, and Patient in the Block box. The results
from JMP are shown below.
Notice the Y axis is “Serum Level – Block Centered”. This means that what we are
seeing in the display is the differences in the serum level readings adjusting for the fact
that the readings for each method came from the same 6 patients. Examining the data in
this way we can clearly see that the methods differ in the serum level reported when
measuring blood samples from the same patient.
The results of the ANOVA clearly show we have strong evidence that the four methods
do not give the same readings when measuring the same blood sample (p < .0001).
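The ANOVA JMP carries out here partitions the total variation into treatment, block, and error pieces. A Python sketch using the serum table above reproduces the conclusion (the F statistic for methods and its p-value):

```python
import numpy as np
from scipy import stats

# Serum readings from the table: rows = subjects (blocks), cols = methods 1-4
y = np.array([[ 360,  435,  391,  502],
              [1035, 1152, 1002, 1230],
              [ 632,  750,  591,  804],
              [ 581,  703,  583,  790],
              [ 463,  520,  471,  502],
              [1131, 1340, 1144, 1300]], dtype=float)

b, k = y.shape                                   # 6 blocks, 4 treatments
grand = y.mean()
ss_treat = b * ((y.mean(axis=0) - grand) ** 2).sum()
ss_block = k * ((y.mean(axis=1) - grand) ** 2).sum()
ss_error = ((y - grand) ** 2).sum() - ss_treat - ss_block

df_treat, df_error = k - 1, (k - 1) * (b - 1)
F = (ss_treat / df_treat) / (ss_error / df_error)
p = stats.f.sf(F, df_treat, df_error)            # p < .0001, as in JMP
```

Removing the block sum of squares from the error term is exactly what "adjusting for patient" means; without it, the subject-to-subject variation would swamp the method differences.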
The tables below give the block corrected mean for each method and the block means
used to make the adjustment.
As was the case with one-way ANOVA (completely randomized) we may still wish to
determine which methods have significantly different mean serum level readings when
measuring the same blood sample. We can again use multiple comparison procedures.
Select Compare Means... > All Pairs, Tukey’s HSD.
We can see that methods 4 & 2 differ significantly from methods 1 & 3 but not each
other. The same can be said for methods 1 & 3 when compared to methods 4 & 2. The
confidence intervals quantify the size of the difference we can expect on average when
measuring the same blood samples. For example, we see that method 4 will give
between 90.28 and 255.05 higher serum levels than method 3 on average when
measuring the same blood sample. Other comparisons can be interpreted in a similar fashion.
EXAMPLE 2: Exercise Performance and Ipratropium Bromide
A study by Ikeda et al., (Thorax, 1996), was designed to determine the dose of
ipratropium bromide aerosol that improves exercise performance using progressive cycle
ergometry in patients with stable chronic obstructive pulmonary disease. The mean age
of the 20 male subjects was 69.2 years with a standard deviation of 4.6 years. Among the
data collected were the maximum ventilation (VEmax, L/min.) values at maximum
achieved exercise for different ipratropium bromide dosage levels (μg).
Data File: VE max IBA
A portion of the data table in JMP
Each dose (0 or placebo, 40, 80, 160, and
240) was given to the subjects in a random
order. Subjects are the blocking factor,
Treatment (i.e. dose) is the factor of interest,
and VE max is the response.
a) Determine if the mean maximum ventilation differs significantly as a function of dose.
b) Which dosage levels, if any, significantly differ from placebo?
c) Assuming that the larger VE values are best, what dosage would you recommend?
To analyze these data begin by select Analyze > Fit Y by X and place VE max in the Y
box, Treatment (dose) in the X box, and Subject (i.e. blocks) in the Block box as shown below.
The resulting output is shown on the following page.
Analysis of Block Design
There is a significant treatment (dose) effect on the mean VE max (p = .0022). The block
effect also appears to be quite large.
To compare the different dosages to placebo we can use Dunnett’s procedure. First click
on a point in the placebo group to identify that group as control and then select Compare
Means > With Control, Dunnett’s which will produce the output shown below.
Comparisons of different doses vs. placebo
Here the results from Dunnett’s method
show that two doses, 160 μg and 240 μg, are
significantly different from placebo.
By comparing the means in a pair-wise fashion using Tukey’s HSD we see that the two
highest dose levels do not significantly differ, so we could argue that the “optimal” dose
would be 160 μg because it would be “cheaper” in some sense than the higher dose.
Tukey’s HSD Results
15.4 - Nonparametric Approach: Kruskal-Wallis Test
If the normality assumption is suspect or the sample sizes from each of the k populations
are too small to assess normality we can use the Kruskal-Wallis Test to compare the size
of the values drawn from the different populations. There are two basic assumptions for
the Kruskal-Wallis test:
1) The samples from the k populations are independently drawn.
2) The k populations are identical in shape, with the only potential difference
being in the location of the typical values (medians).
H o : All k populations have the same median or location.
H_a: At least one of the populations has a median different from the others, i.e.
at least one population is shifted away from the others.
To perform the test we rank all of the data from smallest to largest and compute the
rank sum R_i for each of the k samples. The test statistic looks at the difference between the
average rank for each group, R̄_i = R_i / n_i, and the average rank of all N observations, (N + 1)/2. If
there are differences in the populations we expect some groups to have an average rank
much larger than the overall average rank and some to have smaller average ranks.

H = [12 / (N(N + 1))] Σ_{i=1}^{k} n_i (R̄_i − (N + 1)/2)² ~ χ²_{k−1} (Chi-square distribution with df = k − 1)
The larger H is the stronger the evidence we have against the null hypothesis that the
populations have the same location/median. Large values of H lead to small p-values!
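The H statistic can be computed directly from this definition and checked against a library implementation; the sample values below are made up for illustration and contain no ties, so no tie correction is needed.

```python
import numpy as np
from scipy import stats

# Three small made-up samples (all values distinct, so no ties)
g1 = [2.1, 3.4, 1.8, 2.9]
g2 = [5.6, 4.8, 6.1, 5.0]
g3 = [3.0, 2.5, 4.1, 3.7]

data = np.concatenate([g1, g2, g3])
ranks = stats.rankdata(data)              # ranks 1..N over the combined data
N = len(data)

# H built from each group's average rank vs the overall average (N + 1)/2
H, start = 0.0, 0
for g in (g1, g2, g3):
    n_i = len(g)
    r_bar = ranks[start:start + n_i].mean()
    H += n_i * (r_bar - (N + 1) / 2) ** 2
    start += n_i
H *= 12 / (N * (N + 1))

p = stats.chi2.sf(H, df=3 - 1)

# SciPy's built-in Kruskal-Wallis test agrees
H_scipy, p_scipy = stats.kruskal(g1, g2, g3)
```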
Example: Movement of Gastropods (Austrocochlea obtusa)
Preliminary observations on North Stradbroke Island indicated that the gastropod Austrocochlea obtusa
preferred the zone just below the mean tide line. In an experiment to test this, A. obtusa were collected,
marked, and placed either 7.5 m above this zone (Upper Shore), 7.5 m below this zone (Lower Shore), or
back in the original area (Control). After two tidal cycles, the snails were recaptured. The distance each
had moved (in cm) from where it had been placed was recorded. Is there a significant difference among the
median distances moved by the three groups?
Enter these data into two
columns, one denoting the group
the other containing the recorded
movement for each snail.
R_1 = 84, R_2 = 79, R_3 = 162, and H = 7.25 (p-value = .0287).
We have evidence to suggest that the movement distances significantly differ between the groups.
To determine which groups significantly differ we can use the procedure outlined below. To determine if
group i significantly differs from group j we compute

z_ij = |R̄_i − R̄_j| / sqrt( [N(N + 1)/12] (1/n_i + 1/n_j) )

and then compute p-value = P(Z > z_ij).
If this p-value is less than α/(2m), where m = # of pair-wise comparisons to be made, which would typically be
m = k(k − 1)/2 if all pair-wise comparisons are of interest, we conclude groups i and j significantly differ. For this example,
we can make a total of m = 3(3 − 1)/2 = 3 pair-wise comparisons, so we compare our one-sided p-values to .05/(2·3) = .00833.
Comparing Upper Shore vs. Control
z = |18.0 − 12.0| / sqrt( [25(26)/12] (1/9 + 1/7) ) = 1.618, and P(Z > 1.62) = .0526 > .00833, so we fail to conclude these locations
significantly differ in terms of gastropod movement.
Comparing Upper Shore vs. Lower Shore
z = |18.0 − 8.78| / sqrt( [25(26)/12] (1/9 + 1/9) ) = 2.657, and P(Z > 2.66) = .0039 < .00833, so we conclude that these
locations significantly differ in terms of gastropod movement. A similar comparison could be made to
compare the lower shore to the control, however it clearly will not be significant given the previous results.
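These pair-wise rank comparisons are easy to script. Using the average ranks from this example (Upper Shore 18.0 with n = 9, Control 12.0 with n = 7, Lower Shore 79/9 ≈ 8.78 with n = 9, and N = 25):

```python
import numpy as np
from scipy import stats

def rank_z(rbar_i, rbar_j, n_i, n_j, N):
    """z statistic comparing two groups' average ranks (Kruskal-Wallis post hoc)."""
    se = np.sqrt(N * (N + 1) / 12 * (1 / n_i + 1 / n_j))
    return abs(rbar_i - rbar_j) / se

N = 25
z_uc = rank_z(18.0, 12.0, 9, 7, N)     # Upper vs Control: about 1.62
z_ul = rank_z(18.0, 79 / 9, 9, 9, N)   # Upper vs Lower: about 2.66

p_uc = stats.norm.sf(z_uc)             # about .053, not below .00833
p_ul = stats.norm.sf(z_ul)             # about .004, below .00833
```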
ONE-WAY ANOVA MODEL (FYI only)
In statistics we often use models to represent the mechanism that generates the observed
response values. Before we introduce the one-way ANOVA model we first must
introduce some notation.
n_i = sample size from group i, (i = 1, ..., k)
X_ij = the jth response value for the ith group, (j = 1, ..., n_i)
μ = mean common to all groups assuming the null hypothesis is true
τ_i = shift in the mean due to the fact the observation came from group i
ε_ij = random error for the jth observation from the ith group.
Note: The test procedure we use requires that the random errors are normally distributed
with mean 0 and variance σ². Equivalently the test procedure requires that the response
is normally distributed with a common standard deviation for all k groups.
One-way ANOVA Model:
X_ij = μ + τ_i + ε_ij,  i = 1, ..., k and j = 1, ..., n_i
The null hypothesis is equivalent to saying that the τ_i are all 0, and the alternative says that
at least one τ_i ≠ 0, i.e. H_o: τ_i = 0 for all i vs. H_a: τ_i ≠ 0 for some i. We must decide
on the basis of the data whether we have evidence against the null. We can display this graphically.
Estimates of the Model Parameters:
μ̂ = X̄ ~ the grand mean
τ̂_i = X̄_i − X̄ ~ the estimated ith treatment effect
X̂_ij = μ̂ + τ̂_i = X̄ + (X̄_i − X̄) = X̄_i ~ these are called the fitted values.
ε̂_ij = X_ij − X̂_ij = X_ij − X̄_i ~ these are called the residuals.
An Alternative Derivation of the Test Statistic Based on the Model
Substituting the estimates into X_ij = μ + τ_i + ε_ij gives
X_ij = X̄ + (X̄_i − X̄) + (X_ij − X̄_i), i.e.
(X_ij − X̄) = (X̄_i − X̄) + (X_ij − X̄_i)
After squaring both sides we have,
(X_ij − X̄)² = (X̄_i − X̄)² + (X_ij − X̄_i)² + 2(X̄_i − X̄)(X_ij − X̄_i)
Finally, after summing over all of the observations (the cross-product term sums to zero) and simplifying, we obtain

Σ_{i=1}^{k} Σ_{j=1}^{n_i} (X_ij − X̄)² = Σ_{i=1}^{k} n_i (X̄_i − X̄)² + Σ_{i=1}^{k} Σ_{j=1}^{n_i} (X_ij − X̄_i)²
These measures of variation are called “Sum of Squares” and are denoted SS. The above
expression can be written
SSTotal SSTreat SS Error
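The decomposition SS_Total = SS_Treat + SS_Error can be verified numerically on any data set; a quick check with made-up, unbalanced groups:

```python
import numpy as np

# Made-up data for three groups of unequal size
groups = [np.array([1.0, 3.0, 2.0]),
          np.array([5.0, 6.0, 7.0, 6.0]),
          np.array([2.0, 4.0, 3.0])]

allobs = np.concatenate(groups)
grand = allobs.mean()

ss_total = ((allobs - grand) ** 2).sum()
ss_treat = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)
# ss_total equals ss_treat + ss_error (the cross term sums to zero)
```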
The degrees of freedom associated with each of these SS’s have the same relationship
and are given below.
df total = df treatment + df error
(N – 1) = (k – 1) + (N – k)
The mean squares (i.e. variance estimates) discussed above are found by taking the sum
of squares and dividing by their associated degrees of freedom, i.e.
MS_Treat = SS_Treat / (k − 1) = Σ_{i=1}^{k} n_i τ̂_i² / (k − 1), which is an estimate of the common pop. variance (σ²) only if H_o is true
MS_Error = SS_Error / (N − k) = σ̂², which is an estimate of the common population variance σ² regardless of H_o.
Thus when testing
H_o: τ_i = 0 for all i
H_a: τ_i ≠ 0 for some i
we reject H_o when MS_Treat >> MS_Error, i.e. when the estimated treatment effects are large.