ECO 391 009 Lecture Handout, Chapter 14. SPRING 2003
The One-Factor ANOVA Model.

I. Introduction and Theory.
II. The Actual Test (with an example).
III. Another Example (exercise 14.1.18, page 564).

ANOVA stands for "Analysis of Variance".

General idea: ANOVA can be used to test whether the means of K different populations are the same or not. For example, a manager of Burger King may wonder how many employees to keep in his store during noon hours on weekdays -- is the average number of customers (who come at noon) different across weekdays?

A PRACTICAL EXAMPLE: Suppose we want to see whether the average amount of studying per week is the same regardless of your year in school, i.e. does the amount of studying differ depending on whether you are a freshman, a sophomore, a junior, or a senior? There is a population of students for each year in school, so there are four populations of interest. (K = number of populations)

If we are trying to see whether class year influences the amount of studying, we refer to study hours as the dependent variable and class as the independent variable. If the means differ across samples, that indicates a possible relationship between the independent and dependent variables.

If the population parameter of interest is the number of study hours per week, then we are testing:
Ho:
H1:

To test this hypothesis, we will take four random samples, one from each academic class. For each sample, we will calculate the sample mean to use as a proxy for the corresponding population mean. To test the hypothesis we will need to compare two types of variation:
1) The variation across the four sample means (variation between samples) -- comparing the means across samples; and
2) The variation within each of the four samples (variation within samples). This component tells us how good an estimate each sample mean is of its corresponding population mean.
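The two types of variation can be sketched numerically. The weekday customer counts below are invented for illustration (they are not from the handout); the point is only to show what "between" and "within" variation measure:

```python
# Illustrative sketch only: the weekday customer counts are made up,
# not from the handout, to show the two kinds of variation ANOVA compares.
samples = {
    "Mon": [42, 38, 45],
    "Tue": [40, 44, 41],
    "Wed": [61, 58, 63],   # a day with a clearly higher mean
    "Thu": [39, 43, 40],
    "Fri": [44, 41, 46],
}

means = {day: sum(xs) / len(xs) for day, xs in samples.items()}
n = sum(len(xs) for xs in samples.values())
grand_mean = sum(x for xs in samples.values() for x in xs) / n

# 1) Variation BETWEEN samples: how far each day's mean sits from the
#    grand mean, weighted by that day's sample size.
between = sum(len(xs) * (means[d] - grand_mean) ** 2
              for d, xs in samples.items())

# 2) Variation WITHIN samples: scatter of each day's counts around
#    that day's own mean.
within = sum((x - means[d]) ** 2
             for d, xs in samples.items() for x in xs)

print(f"between = {between:.1f}, within = {within:.1f}")
```

Here the between-sample variation dwarfs the within-sample variation (Wednesday's mean is far from the others while each day's counts are tightly clustered), which is exactly the situation in which ANOVA rejects equal means.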
The variation between shows how far from (or close to) each other the samples are. The bigger this variation, the easier it is for us to reject Ho.

The variation within shows how well the x̄'s approximate the true population means. If the variation within the samples is huge, then the x̄'s are not very good measures of the population means, and that diminishes our ability to see differences among the μ's using the x̄'s. So the bigger the variation within, the more difficult it is to reject Ho.

In our example: We are trying to see if the amount you study depends upon your year in school. By conducting a test that compares these two types of variation, we can see whether class standing is perhaps a determinant of how much we study. (I.e., does the average amount of studying vary significantly across years in school?)

Suppose that we take a sample of size 5 from each of the four academic classes and the data (number of hours spent studying per week) are as follows:

                          Freshmen   Sophomores   Juniors   Seniors
                             10          20           7        15
                              5          12          22        12
                              2          18          14         8
                              7           9          10        11
                              8           8          16         9
Sample mean (x̄j)            6.4        13.4        13.8      11
Sample std. dev. (sj)       3.05        5.37        5.76      2.74
Sample variance (sj²)       9.3        28.8        33.2       7.5

nj is the size of sample j, so in our example n1 = n2 = n3 = n4 = 5.
The total sample size is n = n1 + n2 + n3 + n4 = 5+5+5+5 = 20.

The overall (grand) mean is

    X̄ = Σ (j=1..K) nj x̄j / n

Total Sum of Squares: SST reflects the total variation in the data.

    SST = Σ (j=1..K) Σ (i=1..nj) (xij − X̄)²

(the sum of squared distances between each point in all four samples and the overall mean X̄).

The total variation in the data is broken down into two parts:

1) The part that reflects error -- within-sample variation: the Sum of Squares Within (SSW). We want this to be small so that each sample mean is a good estimate of its population mean.
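The sample statistics and SST above can be checked directly from the data table. A minimal sketch (all numbers come from the handout's study-hours data):

```python
# Reproduce the handout's sample statistics and the total sum of
# squares (SST) for the study-hours data.
samples = {
    "Freshmen":   [10, 5, 2, 7, 8],
    "Sophomores": [20, 12, 18, 9, 8],
    "Juniors":    [7, 22, 14, 10, 16],
    "Seniors":    [15, 12, 8, 11, 9],
}

n = sum(len(xs) for xs in samples.values())               # n = 20
xbar = {g: sum(xs) / len(xs) for g, xs in samples.items()}
s2 = {g: sum((x - xbar[g]) ** 2 for x in xs) / (len(xs) - 1)
      for g, xs in samples.items()}                       # sample variances

# Overall mean: X-bar = (sum over j of n_j * xbar_j) / n
grand = sum(len(xs) * xbar[g] for g, xs in samples.items()) / n

# SST = sum over all 20 observations of (x_ij - X-bar)^2
sst = sum((x - grand) ** 2 for xs in samples.values() for x in xs)

print(xbar)   # {'Freshmen': 6.4, 'Sophomores': 13.4, 'Juniors': 13.8, 'Seniors': 11.0}
print(grand)  # 11.15
print(round(sst, 2))
```

The printed means and variances match the table (6.4, 13.4, 13.8, 11 and 9.3, 28.8, 33.2, 7.5), and the overall mean X̄ = 11.15.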
SSW = SS1 + SS2 + SS3 + ... + SSK, where

    SSj = Σ (i=1..nj) (xij − x̄j)²

In our example:
SS1 =
SS2 =
SS3 =
SS4 =

An alternative formula:

    SSj = (nj − 1) sj²    where sj² is the variance of sample j

SS1 =
SS2 =
SS3 =
SS4 =

SSW is also referred to as the error sum of squares, as it reflects the unexplained variation within each of the samples.

In our example: SSW =

2) The part that reflects variation between samples -- the Sum of Squares Between, or SSB:

    SSB = Σ (j=1..K) nj (x̄j − X̄)²

SSB =

A checkpoint: SSB and SSW should add up to the total variation SST:

    SST = SSB + SSW

In our example: SST =

What is our test statistic? If we adjust SSB and SSW for their respective degrees of freedom and take their ratio, then, if the population means really are the same, the result is a random variable distributed according to the F-distribution. This lets us judge how likely it is to obtain the observed evidence if Ho is correct. (The F-distribution was named in honor of Sir Ronald Fisher (1890-1962), who first studied this distribution in the early 1920s.)

The F-distribution has the following properties:

Correcting SSW and SSB for their degrees of freedom:

Mean Square Within, also called Mean Squared Error:
    MSW = SSW / (n − K)        (n − K) = denominator degrees of freedom
In our example: MSW =

Mean Square Between:
    MSB = SSB / (K − 1)        (K − 1) = numerator degrees of freedom
In our example: MSB =

The test statistic

    F = MSB / MSW

(the ratio of "the variation between samples" to "the variation within samples") is distributed according to the F-distribution with K − 1 and n − K degrees of freedom. The larger the observed F-statistic, the more of the variation in the data can be attributed to the independent variable rather than to error.
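The full decomposition SST = SSB + SSW and the F statistic can be worked out for the study-hours data (K = 4 groups, n = 20 observations). A sketch of the blanks above, computed directly:

```python
# SSW, SSB, the SST checkpoint, and F for the study-hours data.
samples = {
    "Freshmen":   [10, 5, 2, 7, 8],
    "Sophomores": [20, 12, 18, 9, 8],
    "Juniors":    [7, 22, 14, 10, 16],
    "Seniors":    [15, 12, 8, 11, 9],
}
K = len(samples)
n = sum(len(xs) for xs in samples.values())
xbar = {g: sum(xs) / len(xs) for g, xs in samples.items()}
grand = sum(x for xs in samples.values() for x in xs) / n

# SSW: pooled squared deviations of each point from its own group mean
ssw = sum((x - xbar[g]) ** 2 for g, xs in samples.items() for x in xs)
# SSB: squared deviations of group means from the grand mean, weighted by n_j
ssb = sum(len(xs) * (xbar[g] - grand) ** 2 for g, xs in samples.items())

msw = ssw / (n - K)   # mean square within (error), df = n - K = 16
msb = ssb / (K - 1)   # mean square between, df = K - 1 = 3
F = msb / msw

# Checkpoint: SSB + SSW should equal SST computed directly.
sst = sum((x - grand) ** 2 for xs in samples.values() for x in xs)
assert abs((ssb + ssw) - sst) < 1e-9

print(f"SSW={ssw:.2f} SSB={ssb:.2f} MSW={msw:.3f} MSB={msb:.3f} F={F:.3f}")
```

This gives SSW = 315.2, SSB = 173.35 (so SST = 488.55), MSW = 19.7, MSB ≈ 57.783, and F ≈ 2.933.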
In our example, F =

The most typical form of the ANOVA table is as follows:

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F
Between Group
Within Group
Total

The Actual One-Way (or one-factor) ANOVA Test:

1) Set up the null and alternative hypotheses:
   Ho: μ1 = μ2 = μ3 = μ4 (The average amount of study is the same across the four populations.)
   H1: At least two of the population means differ.
   (Let α = .05.) Take a sample of size nj from each population and calculate MSW and MSB.

2) The test statistic F:
   F = MSB/MSW =

3) Decision Rule: (Again, it is always a positive (right-tail) test.)
   Reject Ho if F > Fα,K−1,n−K.
   The critical F value is Fα,K−1,n−K, with K − 1 (numerator) and n − K (denominator) degrees of freedom.
   From our example: Fα,K−1,n−K =
   Rejection region: [sketch of the f(F) density with the rejection region in the right tail]

4) The decision and conclusion:

Another practice exercise: Do 14.1.18 on page 564 of the text.

a) Sample means and variances:
   Sample 1: x̄1 = 5.18, variance = 0.262
   Sample 2: x̄2 = 4.25, variance = 0.177
   Sample 3: x̄3 = 5.7,  variance = 0.368

b) For SSW, use SSj = (nj − 1) sj², where sj² is the variance of sample j:
   SS1 = (5 − 1)(0.262) =
   SS2 = (4 − 1)(0.177) =
   SS3 = (6 − 1)(0.368) =
   SSW = SS1 + SS2 + SS3 =

   Before finding SSB we need the overall sample mean:
   X̄ = Σ (j=1..K) nj x̄j / n =

   SSB = Σ (j=1..K) nj (x̄j − X̄)² =

c) Correcting for the degrees of freedom:
   MSB = SSB/(K − 1) =
   MSW = SSW/(n − K) =

d) F =
   (Degrees of freedom for MSB = K − 1 = ; degrees of freedom for MSW = n − K = .)

The ANOVA table for the exercise is the following:

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F
Between Group
Within Group
Total

So, the test is performed as follows:
1) Hypotheses:
2) F-statistic:
3) The decision rule:
4) Decision and conclusion:

!!!! END OF CHAPTER 14 !!!!!
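Exercise 14.1.18 can be worked entirely from its summary statistics. The sample sizes 5, 4, and 6 are inferred from the (nj − 1) factors shown in part (b); all other numbers are from the exercise:

```python
# Exercise 14.1.18 from summary statistics.
# Sample sizes inferred from the (n_j - 1) factors in part (b): 5, 4, 6.
ns    = [5, 4, 6]
xbars = [5.18, 4.25, 5.70]
s2s   = [0.262, 0.177, 0.368]

K = len(ns)
n = sum(ns)                                             # n = 15
grand = sum(nj * xb for nj, xb in zip(ns, xbars)) / n   # overall mean

ssw = sum((nj - 1) * v for nj, v in zip(ns, s2s))       # SS_j = (n_j - 1) s_j^2
ssb = sum(nj * (xb - grand) ** 2 for nj, xb in zip(ns, xbars))

msb = ssb / (K - 1)   # df = K - 1 = 2
msw = ssw / (n - K)   # df = n - K = 12
F = msb / msw

# From an F table, the critical value F(.05; 2, 12) is about 3.89,
# so an F statistic above that rejects Ho at the 5% level.
print(f"grand={grand:.2f} SSW={ssw:.3f} SSB={ssb:.3f} F={F:.2f}")
```

This yields X̄ = 5.14, SSW = 3.419, SSB = 5.058, and F ≈ 8.88, which exceeds the critical value of about 3.89, so Ho is rejected: at least two of the three population means differ.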