

ECO391 009           Lecture Handout for Chapter 14.                 SPRING 2003

                        The One Factor ANOVA Model.
I.     Introduction and Theory.
II.    The Actual Test (with an example).
III.   Another Example (Exercise 14.1.18, page 564)

ANOVA stands for “Analysis of Variance”.

General idea:
ANOVA can be used to test whether the means of K different populations are the same or not.

For example, a manager of Burger King may wonder how many employees to keep in his store
during noon hours on weekdays -- is the average number of customers (who come at noon)
different across weekdays?


Suppose we want to see whether the average amount of studying per week is the same regardless of
your year in school, i.e., does the amount of studying differ depending on whether you are a
freshman, a sophomore, a junior, or a senior?

There is a population of students for each year in school so there are four populations of
interest. (K = number of populations)

If we are trying to see if class year influences the amount of studying we would refer to

       Study hours as the dependent variable and
       Class as the independent variable.

If the means are different across samples, it indicates a possible relationship between the
independent and dependent variables.

If the population parameter of interest is the mean number of study hours per week, μj, for class
j, then we are testing:

Ho: μ1 = μ2 = μ3 = μ4
H1: At least two of the population means differ.

ECO 391 009 SPRING 03 Chapter 14 ANOVA                                                          1
To test this hypothesis, we will take four random samples. (One sample will come from each
academic class.)

For each sample, we will calculate the average of that sample to use as a proxy for the
corresponding population means.
To test this hypothesis we will need to compare two types of variation:

       1) The variation across the four sample means. (variation between samples)
       (comparing the means across samples)
       2) The variation within each of the four samples. (variation within samples)
       (This component tells us how good an estimate each sample mean is of its corresponding
       population mean.)

The variation between shows how far apart (or close) the samples are from each other. The bigger
this variation, the easier it is for us to reject Ho.

The variation within shows how well the x̄'s approximate the true population means. If the
variation within the samples is huge, then the x̄'s are not very good measures of the population
means and, therefore, our ability to see differences among the μ's using the x̄'s is diminished.
So, the bigger the variation within, the more difficult it is to reject Ho.
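To make this intuition concrete, here is a small illustrative sketch in Python (not from the handout): two pairs of samples with identical sample means, where only the pair with small within-sample variation produces a large F ratio.

```python
# Sketch: identical sample means, different within-sample variation -> very different F.
from statistics import mean, variance

def f_stat(samples):
    """One-way ANOVA F statistic: MSB / MSW."""
    K = len(samples)
    n = sum(len(xs) for xs in samples)
    grand = mean(x for xs in samples for x in xs)
    msb = sum(len(xs) * (mean(xs) - grand) ** 2 for xs in samples) / (K - 1)
    msw = sum((len(xs) - 1) * variance(xs) for xs in samples) / (n - K)
    return msb / msw

tight = [[9, 10, 11], [14, 15, 16]]  # group means 10 and 15, tiny spread within groups
loose = [[2, 10, 18], [7, 15, 23]]   # same group means, large spread within groups

print(f_stat(tight), f_stat(loose))  # 37.5 versus about 0.59
```

With the between-sample variation held fixed, shrinking the within-sample variation inflates F, which is exactly why large variation within the samples makes it harder to reject Ho.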

In our example: We are trying to see if the amount you study depends upon your year in school.

By conducting a test that allows us to compare these two types of variation, we can see if class
standing is perhaps a determinant of how much we study. (i.e. does the average amount of
studying vary significantly across years in school?)

Suppose that we take a sample of size 5 from each of the four academic classes and the data
are as follows:

                                  Freshmen         Sophomores     Juniors        Seniors
Number of hours spent             10               20             7              15
studying per week                 5                12             22             12
                                  2                18             14             8
                                  7                9              10             11
                                  8                8              16             9
Sample Mean x̄j                    6.4              13.4           13.8           11
Sample Std. Dev. sj               3.05             5.37           5.76           2.74
Sample Variance sj²               9.3              28.8           33.2           7.5
nj is the sample size so in our example n1 = n2 = n3 = n4 = 5
The total sample size n = n1 + n2 + n3 + n4 = 5+5+5+5 = 20
The overall (grand) sample mean is

        X̄ = Σ_{j=1}^{K} (n_j x̄_j / n)
The Total Sum of Squares, or SST, reflects the total variation in the data.

        SST = Σ_{j=1}^{K} Σ_{i=1}^{n_j} (x_ij − X̄)²

(the sum of squared distances between each point in all 4 samples and the overall mean X̄)

The total variation in the data is broken down into two parts:

1) The part that reflects error -- within sample variation
      Sum of Squares Within (SSW)
      We want this to be small so the sample mean is a good estimate of the population mean.

        SSW = SS1 + SS2 + SS3 + ... + SSK, where

        SS_j = Σ_{i=1}^{n_j} (x_ij − x̄_j)²

        In our example:
        SS1 =

        SS2 =

        SS3 =

        SS4 =
        An alternative formula:

        SS_j = (n_j − 1) s_j²

        where s_j² is the variance of sample j

        SS1 =

        SS2 =

        SS3 =

        SS4 =

SSW is also referred to as the error sum of squares as it refers to unexplained variation within
each of the samples.

In our example: SSW =
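In our example, SSW can be computed in Python both from the definition and from the shortcut formula (a sketch for checking the hand calculation, not from the handout):

```python
# Sketch: SSW computed from the definition and from the shortcut formula.
from statistics import mean, variance

samples = [
    [10, 5, 2, 7, 8],    # Freshmen
    [20, 12, 18, 9, 8],  # Sophomores
    [7, 22, 14, 10, 16], # Juniors
    [15, 12, 8, 11, 9],  # Seniors
]

# Definition: SS_j = sum of squared deviations of sample j from its own mean
ssw_direct = sum(sum((x - mean(xs)) ** 2 for x in xs) for xs in samples)

# Shortcut: SS_j = (n_j - 1) * s_j^2
ssw_alt = sum((len(xs) - 1) * variance(xs) for xs in samples)

print(round(ssw_direct, 1), round(ssw_alt, 1))  # both come out to 315.2
```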

2) The part that reflects variation between samples: Sum of Squares Between, or SSB

        SSB = Σ_{j=1}^{K} n_j (x̄_j − X̄)²

A checkpoint: SSB and SSW should add up to the total variation SST:       SST = SSB+SSW

in our example SST =
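SSB and the checkpoint SST = SSB + SSW can likewise be verified in Python (an illustrative sketch, not from the handout):

```python
# Sketch: SSB for the example and the checkpoint SST = SSB + SSW.
from statistics import mean

samples = [
    [10, 5, 2, 7, 8],    # Freshmen
    [20, 12, 18, 9, 8],  # Sophomores
    [7, 22, 14, 10, 16], # Juniors
    [15, 12, 8, 11, 9],  # Seniors
]
all_points = [x for xs in samples for x in xs]
grand = mean(all_points)

ssb = sum(len(xs) * (mean(xs) - grand) ** 2 for xs in samples)
sst = sum((x - grand) ** 2 for x in all_points)
ssw = sst - ssb   # should equal the within-sample sum of squares

print(f"SSB = {ssb:.2f}, SSW = {ssw:.2f}, SST = {sst:.2f}")
```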

What is our test statistic?

If we adjust SSB and SSW for their respective degrees of freedom and take their ratio,
then, if the population means are really the same, the result will be a random variable
distributed according to the F-distribution. This gives us the right to draw conclusions about
how likely it is to obtain the observed evidence if Ho is correct.

(The F-distribution was named in honor of Sir Ronald Fisher (1890-1962), who first studied this
distribution in the early 1920s.)

The F-distribution has the following properties: it takes only nonnegative values; it is skewed
to the right; and its shape is determined by two parameters, the numerator and the denominator
degrees of freedom.
Correcting SSW and SSB for the degrees of freedom:

Mean Square Within, also called Mean Squared Error (MSW):

MSW = SSW/(n-K)                    (n-K) denominator degrees of freedom

        in our example MSW =

MSB = SSB/(K-1)                           (K-1) numerator degrees of freedom

in our example MSB =

The test statistic is distributed according to the F distribution with K-1 and n-K d.f.

F = MSB/MSW (ratio of “the variation between samples” to “the variation within samples”)

The larger the observed F-statistic, the more of the variation in the data can be attributed to
the independent variable rather than to error.

        In our example, F =
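Putting the pieces together, the mean squares and the F statistic for this example can be computed as follows (a sketch for checking the hand calculation, not from the handout):

```python
# Sketch: mean squares and the F statistic for the study-hours example.
from statistics import mean, variance

samples = [
    [10, 5, 2, 7, 8],    # Freshmen
    [20, 12, 18, 9, 8],  # Sophomores
    [7, 22, 14, 10, 16], # Juniors
    [15, 12, 8, 11, 9],  # Seniors
]
K = len(samples)                      # 4 groups
n = sum(len(xs) for xs in samples)    # 20 observations in total
grand = mean(x for xs in samples for x in xs)

ssb = sum(len(xs) * (mean(xs) - grand) ** 2 for xs in samples)
ssw = sum((len(xs) - 1) * variance(xs) for xs in samples)

msb = ssb / (K - 1)   # SSB adjusted by the K-1 numerator d.f.
msw = ssw / (n - K)   # SSW adjusted by the n-K denominator d.f.
f_stat = msb / msw
print(f"MSB = {msb:.2f}, MSW = {msw:.2f}, F = {f_stat:.2f}")  # F comes out near 2.93
```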

The most typical form of ANOVA table is as follows:

Source of          Sum of Squares        Degrees of     Mean Square            F
Variation                                Freedom
Between Group      SSB                   K-1            MSB = SSB/(K-1)        MSB/MSW

Within Group       SSW                   n-K            MSW = SSW/(n-K)

Total              SST                   n-1

                              Actual One - Way (or one-factor) ANOVA Test:

1) Set up the null and alternative hypotheses:

Ho: μ1 = μ2 = μ3 = μ4   (The average amount of studying is the same across the four classes.)

H1: At least two of the population means differ.

(Let α = .05 )
Take a sample of size n_j from each population and calculate MSW and MSB.

2) The test statistic:   F = MSB/MSW


3) Decision Rule: (Again, it is always a positive (right-tail) test.)

      Reject Ho if F > F(α; K-1, n-K)

The critical F value is F(α; K-1, n-K), with degrees of freedom K-1 (numerator) and n-K (denominator).

From our example: F(.05; 3, 16) =

  [Sketch: the f(F) density curve, with the rejection region in the right tail beyond the
  critical value F(α; K-1, n-K)]

4) The decision and conclusion
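For this example the decision step looks like the following sketch; the critical value F(.05; 3, 16) ≈ 3.24 is taken from a standard F table (with SciPy, scipy.stats.f.ppf(0.95, 3, 16) returns the same number), and the observed F ≈ 2.93 comes from the sample data above.

```python
# Sketch: decision step for the study-hours example at alpha = .05.
f_stat = 2.93   # observed F = MSB/MSW, computed from the sample data
f_crit = 3.24   # critical value F(.05; 3, 16) from a standard F table

reject_h0 = f_stat > f_crit
print("Reject Ho" if reject_h0 else "Fail to reject Ho")  # Fail to reject Ho
```

Since the observed F does not fall in the rejection region, the data do not show a significant difference in average study hours across the four classes at the .05 level.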

                       Another practice exercise: Do 14.1.18 on page 564 of the text.

a)                     Sample means       Sample variances
Sample 1:              x̄1 = 5.18          s1² = 0.262
Sample 2:              x̄2 = 4.25          s2² = 0.177
Sample 3:              x̄3 = 5.7           s3² = 0.368

b) For SSW:

SS_j = (n_j − 1) s_j², where s_j² is the variance of sample j

SS1 = (5-1)(0.262) = 1.048
SS2 = (4-1)(0.177) = 0.531
SS3 = (6-1)(0.368) = 1.84

SSW = SS1 + SS2 + SS3 = 3.419

Before finding SSB we need the overall sample mean:

        X̄ = Σ_{j=1}^{K} (n_j x̄_j / n) = (5(5.18) + 4(4.25) + 6(5.7)) / 15 = 5.14

SSB = Σ_{j=1}^{K} n_j (x̄_j − X̄)² = 5(5.18 − 5.14)² + 4(4.25 − 5.14)² + 6(5.7 − 5.14)² = 5.058

c) Correcting for the degrees of freedom:

MSB = SSB/(K-1) = 5.058/2 = 2.529

MSW = SSW/(n-K) = 3.419/12 = 0.285

d) F = MSB/MSW = 2.529/0.285 ≈ 8.88

( degrees of freedom for MSB = K-1 = 2
  degrees of freedom for MSW = n-K = 12 )
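The computations in parts b) through d) can be checked with a short script that uses only the summary statistics from part a) and the sample sizes implied by the (n_j − 1) factors in part b) (a sketch, not part of the text):

```python
# Sketch: one-way ANOVA for exercise 14.1.18 from the summary statistics.
ns    = [5, 4, 6]             # sample sizes implied by the (n_j - 1) factors
means = [5.18, 4.25, 5.7]     # sample means from part a)
vars_ = [0.262, 0.177, 0.368] # sample variances from part a)

K = len(ns)
n = sum(ns)                                                         # 15
grand = sum(nj * xbar for nj, xbar in zip(ns, means)) / n           # overall mean

ssw = sum((nj - 1) * s2 for nj, s2 in zip(ns, vars_))               # within
ssb = sum(nj * (xbar - grand) ** 2 for nj, xbar in zip(ns, means))  # between

msb = ssb / (K - 1)
msw = ssw / (n - K)
f_stat = msb / msw
print(f"grand mean = {grand:.2f}, SSW = {ssw:.3f}, SSB = {ssb:.3f}, F = {f_stat:.2f}")
```

With 2 and 12 degrees of freedom, the resulting F statistic can then be compared against the tabled critical value to finish the test.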

THE ANOVA table for the exercise is the following:

Source of          Sum of Squares        Degrees of      Mean Square   F
Variation                                Freedom
Between Group      5.058                 2               2.529         8.88

Within Group       3.419                 12              0.285

Total              8.477                 14


So, the test is performed as follows:

1) Hypotheses:   Ho: μ1 = μ2 = μ3    H1: At least two of the population means differ.

2) F-statistic:   F = MSB/MSW ≈ 8.88

3) The decision rule:   Reject Ho if F > F(.05; 2, 12) = 3.89

4) Decision and conclusion:   Since 8.88 > 3.89, we reject Ho and conclude that at least two of
the three population means differ.

                                    !!!! END OF CHAPTER 14 !!!!!
