Statistics 160 – Semester 2, 2003

8. Inference for Population Means

8.1 One Sample Inference

8.1.1 Introduction

• Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2$. In this section we shall use the data to make inferences about $\mu$.

• We will assume that the sampling distribution of $\bar{X}$ is $N(\mu, \sigma^2/n)$. This assumption is valid if either (or both) of the following are true:

A1: $X_1, X_2, \ldots, X_n$ are normally distributed;
A2: $n$ is sufficiently large for the Central Limit Theorem to apply.

8.1.2 Student's t Distribution

Estimation

$\bar{X}$ is an estimator of $\mu$, with standard error $SE(\bar{X}) = S/\sqrt{n}$.

Sampling Distribution of $\bar{X}$

Since $\bar{X} \sim N(\mu, \sigma^2/n)$, we know that

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$

If we replace the population standard deviation $\sigma$ by the sample standard deviation $S$ then we get

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{\bar{X} - \mu}{SE(\bar{X})},$$

which has Student's t distribution with $\nu = n - 1$ degrees of freedom.

Notes on the t Distribution

(i) The distribution of $T$ depends upon $n$, but not upon $\mu$ or $\sigma$.

(ii) When $\nu = n - 1$ is large, the t distribution is virtually identical to the $N(0, 1)$ distribution.

(iii) 't-tables' give critical points of the t distribution. That is, values $t_{\alpha/2}(\nu)$ such that
$$P(T > t_{\alpha/2}(\nu)) = \alpha/2$$
where $T$ has a t distribution with $\nu$ degrees of freedom. Note that
$$P(T < -t_{\alpha/2}(\nu)) = \alpha/2$$
by symmetry. Some t-tables are available at the S160 web page.

Critical Points Pictorially

[Figure 35: Critical points $\pm t_{\alpha/2}(\nu)$ for a t distribution with $\nu$ degrees of freedom. The area in each tail beyond a critical point is $\alpha/2$.]

8.1.3 Confidence Intervals for a Population Mean

Example 8.1
A car is tested for fuel economy.
In 7 road trials the number of km per litre was recorded, giving

$$\bar{x} = 15.2, \qquad s = 2.2.$$

Let $\mu$ be the population mean km per litre. Estimate $\mu$ and find a 90% confidence interval for it.

Since $T = (\bar{X} - \mu)/SE(\bar{X})$ has a t distribution with $\nu = n - 1$ degrees of freedom,

$$\begin{aligned}
1 - \alpha &= P\left(-t_{\alpha/2}(\nu) < (\bar{X} - \mu)/SE(\bar{X}) < t_{\alpha/2}(\nu)\right) \\
&= P\left(-t_{\alpha/2}(\nu)\,SE(\bar{X}) < \bar{X} - \mu < t_{\alpha/2}(\nu)\,SE(\bar{X})\right) \\
&= P\left(\bar{X} - t_{\alpha/2}(\nu)\,SE(\bar{X}) < \mu < \bar{X} + t_{\alpha/2}(\nu)\,SE(\bar{X})\right).
\end{aligned}$$

Confidence Interval for $\mu$

From observed data $x_1, x_2, \ldots, x_n$, a $100(1 - \alpha)\%$ confidence interval for $\mu$ is given by

$$\left(\bar{x} - t_{\alpha/2}(\nu)\,SE(\bar{x}),\ \bar{x} + t_{\alpha/2}(\nu)\,SE(\bar{x})\right)$$

where $SE(\bar{x}) = s/\sqrt{n}$.

Example 8.2: Confidence Interval for Mean Pulmonary Compliance

Chronic exposure to asbestos fibre is a well known health hazard. The table below gives measurements of pulmonary compliance (a measure of how effectively the lung can inhale and exhale) for 16 construction workers, 8 months after they had left a site on which they had suffered prolonged exposure to asbestos. (Data source: 'The Acute Effects of Chrysotile Asbestos Exposure on Lung Function', Environmental Research, 1978, pp. 360-372.) In this example we provide an estimate, and a 95% confidence interval, for the population mean pulmonary compliance for people with prolonged asbestos fibre exposure.

Computation by Hand

The mean of the data is $\bar{x} = 209.75$ and the sample standard deviation is $s = 24.16$. The standard error for $\bar{x}$ is $SE(\bar{x}) = 24.16/\sqrt{16} = 6.04$. To find a 95% confidence interval for $\mu$, we need to look up the critical value $t_{\alpha/2}$ in t-tables with $\nu = n - 1 = 15$, where $\alpha = 0.05$ (since 95% = 100(1 − 0.05)%). From tables we find that $t_{0.025} = 2.131$. Hence, a 95% confidence interval for $\mu$ is given by

$$\begin{aligned}
&\left(\bar{x} - t_{\alpha/2}\,SE(\bar{x}),\ \bar{x} + t_{\alpha/2}\,SE(\bar{x})\right) \\
&\quad = (209.75 - 2.131 \times 6.04,\ 209.75 + 2.131 \times 6.04) \\
&\quad = (196.9,\ 222.6) \quad \text{(1 dp)}
\end{aligned}$$

Using MINITAB

First enter the data into a column of a MINITAB worksheet; this column is named PulCom in this example.
Then go through the menu sequence Stat → Basic Statistics → 1-Sample t... which produces the 1-Sample t dialogue box. Choose the confidence level (95% in this case) and click on OK.

    MTB > TInterval 95.0 'PulCom'.

    Confidence Intervals

    Variable    N     Mean   StDev  SE Mean        95.0 % C.I.
    PulCom     16   209.75   24.16     6.04  (196.87, 222.63)

The data for Example 8.2 are as follows.

    167.9  180.8  184.8  189.8  194.8  200.2
    201.9  206.9  207.2  208.4  226.3  227.7
    228.5  232.4  239.8  258.6

Table 16: Pulmonary compliance for 16 workers subjected to prolonged exposure to asbestos fibre.

Here $\mu$ is defined to be the population mean pulmonary compliance for people with prolonged asbestos fibre exposure.

8.1.4 One Sample t-Test

Aim: to investigate the credibility of some claim regarding $\mu$.

Hypotheses: $H_1$ is a statement of the claim about $\mu$; $H_0$ is a statement negating the claim.

Test Statistic: Suppose that we are testing

$$H_0: \mu = \mu_0$$

against any one of the alternatives

$$H_1: \mu > \mu_0 \quad \text{or} \quad H_1: \mu < \mu_0 \quad \text{or} \quad H_1: \mu \neq \mu_0.$$

The appropriate test statistic is the t-statistic

$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{\bar{X} - \mu_0}{SE(\bar{X})}.$$

• If $H_0$ is true then $T$ has a t sampling distribution with $\nu = n - 1$ degrees of freedom.
• $T$ is likely to take extreme values when $H_1$ is correct.

P-Values for T-test

P-Value: Suppose that the observed value of the t-statistic is $t$. The way of calculating the P-value depends on $H_1$:

    H1                   P-value
    $\mu > \mu_0$        $P(T \geq t)$
    $\mu < \mu_0$        $P(T \leq t)$
    $\mu \neq \mu_0$     $2P(T \geq |t|)$

where $T$ has a t distribution with $\nu = n - 1$ degrees of freedom.

P-Value Interpretation and Significance Levels: The interpretation of P-values, and the choice and application of significance levels, are exactly the same as described in section 7.2.1 (for testing a proportion).

Example 8.4: Shoshoni Rectangles

The Greeks called a rectangle 'golden' if the ratio of its width to its length was $\frac{1}{2}(\sqrt{5} - 1) = 0.618$.

Example 8.3
12 king-size chocolate bars were weighed (in grams), giving

$$\bar{x} = 149.10, \qquad s = 1.73.$$

The producer wants to know whether the (population) mean weight differs from the advertised value of 150.0 grams. Write down appropriate hypotheses and calculate the t-statistic to test this.

See also M&M Examples 7.4 and 7.5, and Exercises 7.11, 7.23 and 7.29, for t-tests and confidence intervals. (Use MINITAB for tests.)

Example 8.4: Shoshoni Rectangles (continued)

Such rectangles are aesthetically pleasing to most people, being not 'too square' or 'too oblong'. The Shoshoni Indians used beaded rectangles to decorate their goods. The data tabulated below are the width-to-length ratios for eighteen such rectangles.

    0.693  0.662  0.690  0.606  0.570  0.749
    0.672  0.628  0.609  0.844  0.654  0.615
    0.688  0.601  0.576  0.670  0.606  0.611

Table 17: Width-to-length ratios.

It is of interest to anthropologists to know whether the Shoshoni Indians aimed to produce rectangles with the golden ratio. To investigate this, we test $H_0: \mu = 0.618$ versus $H_1: \mu \neq 0.618$, where $\mu$ is the population mean width-to-length ratio of Shoshoni rectangles. To do so, we enter the data into a column of a MINITAB worksheet. In this example, the column has been named Ratio. To perform a t-test of $H_0$ against $H_1$, go through the menu sequence Stat → Basic Statistics → 1-Sample t.... This will produce a dialogue box in which you must enter the variable to be tested (Ratio in this case). You also need to click on the circle by Test mean, and to fill in the $H_0$ value for $\mu$ (0.618 in this case). Finally, you must choose the type of alternative hypothesis (in our case not equal). When all that's done, click on OK. This should produce the test output in the MINITAB command window.

8.1.5 Paired t-Test

Consider an experiment in which there is one treatment and one control group. Differences between the profile of subjects in each group can cause problems.
Example 8.4: Shoshoni Rectangles (MINITAB output)

    T-Test of the Mean

    Test of mu = 0.6180 vs mu not = 0.6180

    Var      N     Mean    StDev  SE Mean      T  P-Value
    Ratio   18   0.6513   0.0665   0.0157   2.13    0.049

This tells us (amongst other things) that

• the value of the test statistic is 2.13 (which is computed from (0.6513 − 0.6180)/0.0157);
• the P-value for this test is 0.049.

Because the P-value is less than 0.05 (if only just!) we reject $H_0$. However, the evidence in favour of $H_1$ is not very strong.

Conclusion: the data provide some evidence that the population mean width-to-length ratio for the Shoshoni rectangles is not the golden ratio.

Example 8.5: A Small Clinical Trial

Four patients with a rare disease are recruited to a clinical trial. The patients' characteristics are as follows.

    Patient   Age   Sex
    A          71   M
    B          68   M
    C          28   F
    D          34   F

Two patients are to be assigned to the treatment group, and two to the control group. If we did this completely at random, we could get patients A and B in the treatment group, and C and D in the control group. This could cause problems.

• Suppose that the disease tends to be worse in men than women. Then the outcomes for the treatment and control group may be very similar even though the treatment is beneficial, since the benefits are counterbalanced by the increased acuteness of the disease in men. (The same argument could also be made in terms of age rather than sex.)
• Suppose that the treatment group do have better outcomes than the control group. It will be impossible to say whether this is due to the treatment, or due to the fact that the disease tends to be worse in women (or in the young).

Matched Pairs

• A powerful way of designing an experiment to avoid such problems is matching.

More Matched Pairs

It is often possible to 'match' an individual to him/herself, and look at results under two different experimental conditions (e.g.
before and after some treatment).

Example 8.7
Twenty students studying European languages went on a two week intensive course in spoken French. To assess the effectiveness of the course, each student took a spoken French test before the course, and an equivalent test after the course. The before and after test scores on each student are paired observations.

Matched Pairs (continued)

• The idea is to select experimental units in pairs that are 'matched' on potentially important characteristics, such as subject age and sex in a clinical trial.
• We then randomly assign one member of each pair to the treatment group, and the other member of the pair to the control group.
• The result is a treatment group and a control group that are very similar in terms of important characteristics of the experimental units (i.e. the variability between subjects is controlled).

Example 8.6
In Example 8.5, patients A and B form a matched pair, as do patients C and D. We should randomly assign A and B to different groups, and C and D to different groups.

Analysis of Matched Pairs Data

• Matched pairs data should be analysed by looking at the differences between paired observations.
• These differences can be analysed using the methods described earlier (e.g. one sample t-test).

See M&M Examples 7.7 and 7.8, and Exercises 7.17 and 7.19 (use MINITAB).

Example 8.8: Analysis of Shoe Wear

This example is concerned with data from an experiment on shoe wear. A manufacturing company wanted to compare two synthetic materials (A and B) for use in the soles of boys' shoes. Data on the reduction in sole thickness were taken from a number of shoes after boys had worn them for a period of three months. The results are displayed in a dot plot in Figure 36 below (data source: Box, Hunter & Hunter (1978) Statistics for Experimenters).

[Figure 36: Dot plot of shoe wear data. Wear (axis from 6 to 15) is plotted separately for Material A and Material B.]
Looking at the dot plot there appears to be little evidence of difference in wear between the two materials. However, that figure does not tell the whole story. The data were collected via a matched pairs design in which 10 subjects were given one shoe soled with material A, and one shoe soled with material B. Figure 37 redisplays the data on a dot plot, but with paired observations joined by lines.

[Figure 37: Dot plot with paired observations joined by lines. Wear (axis from 6 to 15) is plotted for Material A and Material B.]

The data are displayed in Table 18.

    Subject       1     2     3     4     5     6     7     8     9    10
    A          13.2   8.2  10.9  14.3  10.7   6.6   9.5  10.8   8.8  13.3
    B          14.0   8.8  11.2  14.2  11.8   6.4   9.8  11.3   9.3  13.6
    x = A − B  -0.8  -0.6  -0.3   0.1  -1.1   0.2  -0.3  -0.5  -0.5  -0.3

Table 18: Shoe wear data: A indicates wear on shoe soled with material A, B indicates wear on shoe soled with material B, and x is the paired difference A − B.

Let $\mu$ be the (population) mean difference, and let $\sigma$ be the (population) standard deviation of the differences. We wish to test

$$H_0: \mu = 0 \quad \text{against} \quad H_1: \mu \neq 0$$

in order to assess whether the materials have different levels of durability. The test proceeds as a standard one sample t-test using the differences as data. We conduct this test in MINITAB; see output below.

    MTB > Name c4 = 'x'
    MTB > Let 'x' = 'A' - 'B'
    MTB > TTest 0.0 'x';
    SUBC> Alternative 0.

    T-Test of the Mean

    Test of mu = 0.000 vs mu not = 0.000

    Var    N     Mean   StDev  SE Mean      T  P-Value
    x     10   -0.410   0.387    0.122  -3.35   0.0086

• The worksheet shoewear.mtw contains the shoe wear data. The first command gives the name x to the fourth column of this worksheet, while the second defines x to be the required difference (between the A and B wear data, which are stored in columns C2 and C3 respectively).
• The t-test was conducted via the menu sequence Stat → Basic Statistics → 1-Sample t....
• The observed test statistic is $t = -3.35$, which gives a P-value of 0.0086. The data therefore
The data therefore Statistics 160 – Semester 2, 2003 225 Statistics 160 – Semester 2, 2003 226 provide strong evidence against H0, and hence in 8.2 Two Independent Samples favour of a diﬀerence in durability between the two materials. Inspection of the dot plot (or simply comparing the mean wear for A and B) indicates that material A is the more durable (i.e. is less 8.2.1 Introduction subject to wear). Suppose we want to compare the means of two populations: It is generally a wise move to accompany a test result with a conﬁdence interval for the parameter population 1 population 2 under study. The reason for this is that statistical mean µ1 mean µ2 2 2 signiﬁcance, as a concept, is only concerned with the variance σ1 variance σ2 detection of a departure from H0. It does not tell We take random samples from the populations: you whether the magnitude of the departure from H0 is of practical importance. We therefore ﬁnish this example by calculating a conﬁdence interval for sample 1 the (population) mean diﬀerence between the wear data X11, X12, . . . , X1n1 , size n1 in materials A and B. sample mean ¯ X1 √¯ From the data above, x = −0.410 and sx = 0.387. √ n1 sample std. dev. S1 = 1 ¯ − X 1 )2 Hence SE(¯) = s/ n = 0.387/ 10 = 0.122. A x n1 −1 i=1(X1i 95% conﬁdence interval for µ is given by sample 2 x (¯ − t0.025(9)SE(¯), x + t0.025(9)SE(¯)) x ¯ x data X21, X22, . . . , X2n2 , size n2 sample mean ¯ X2 where the critical point t0.025(9) = 2.262. Hence the 1 n2 ¯ sample std. dev. S2 = n2 −1 i=1(X2i − X2)2 95% conﬁdence interval is given by (−0.410 − 2.262 × 0.122, −0.410 + 2.262 × 0.122) = (−0.69, −0.13) Statistics 160 – Semester 2, 2003 227 Statistics 160 – Semester 2, 2003 228 Estimation 8.2.2 Diﬀerent Population Variances ¯ ¯ We can estimate µ1 −µ2 using the estimator X1 −X2. Assumptions (at least one assumed true) ¯ ¯ Properties of X1 − X2 are: (A3): Both populations are normal; ¯ ¯ ¯ ¯ (i) E[X1 − X2] = E[X1] − E[X2] = µ1 − µ2. Hence it is unbiased. 
(A4): $n_1$ and $n_2$ are sufficiently large for the Central Limit Theorem to apply.

Result for Normal Random Variables

If $W \sim N(\mu_W, \sigma_W^2)$, $Y \sim N(\mu_Y, \sigma_Y^2)$ and $W$ and $Y$ are independent, then

$$W \pm Y \sim N(\mu_W \pm \mu_Y,\ \sigma_W^2 + \sigma_Y^2).$$

Estimation (continued)

(ii) $\mathrm{Var}[\bar{X}_1 - \bar{X}_2] = \mathrm{Var}[\bar{X}_1] + \mathrm{Var}[\bar{X}_2] = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$. Hence the standard deviation is

$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$

The standard error of $\bar{X}_1 - \bar{X}_2$ is therefore

$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}.$$

Two Sample T-Statistic

Using the result for normal random variables,

$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right).$$

Standardizing and replacing the (unknown) population variances by sample variances gives us the statistic

$$T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}.$$

• This statistic has an approximate t distribution, whose degrees of freedom, $\nu$, depend on $n_1$, $n_2$ and the data in a complex way.
• Using exactly the same type of arguments as were presented in section 8.1, we can derive confidence intervals and tests for $\mu_1 - \mu_2$.

Confidence Intervals

An (approximate) $100(1 - \alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is

$$\left(\bar{X}_1 - \bar{X}_2 - t_{\alpha/2}(\nu)\,SE(\bar{X}_1 - \bar{X}_2),\ \bar{X}_1 - \bar{X}_2 + t_{\alpha/2}(\nu)\,SE(\bar{X}_1 - \bar{X}_2)\right)$$

where $SE(\bar{X}_1 - \bar{X}_2) = \sqrt{S_1^2/n_1 + S_2^2/n_2}$.

Two-Sample t-Test

Hypotheses: $H_1$ is a statement of some claim about $\mu_1$ and $\mu_2$; $H_0$ is the negation of this statement. The null hypothesis is of the form

$$H_0: \mu_1 - \mu_2 = \delta_0$$

for some fixed number $\delta_0$ (often zero). The alternative will be one of

$$H_1: \mu_1 - \mu_2 > \delta_0, \qquad H_1: \mu_1 - \mu_2 < \delta_0, \qquad H_1: \mu_1 - \mu_2 \neq \delta_0.$$

Example 8.9

To monitor educational standards, a particular type of test was given to a random sample of year 10 students in 1995, and to a random sample of year 10 students in 2000. Results:

                        1995    2000
    Sample size           50      60
    Sample mean         63.1    60.9
    Sample std. dev.
                        17.8     8.9

Write down hypotheses and the observed value of the two-sample t-test statistic for assessing whether there is a difference in standards between 1995 and 2000.

Two-Sample t-Test (continued)

Test Statistic: Use the t-statistic

$$T = \frac{\bar{X}_1 - \bar{X}_2 - \delta_0}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}.$$

P-Value: Suppose the observed value of the t-statistic is $t$.

    H1                              P-value
    $\mu_1 - \mu_2 > \delta_0$      $P(T \geq t)$
    $\mu_1 - \mu_2 < \delta_0$      $P(T \leq t)$
    $\mu_1 - \mu_2 \neq \delta_0$   $2P(T \geq |t|)$

where $T$ has a t distribution with $\nu$ degrees of freedom.

Example 8.10: Cavendish's Experiment on the Density of the Earth

In Example 6.15 we looked at data collected by Cavendish from an experiment to determine the density of the earth. We reproduce the data in Table 19. As it happened, Cavendish made some changes to his experimental apparatus after the first six measurements. The observations made before this change are marked with a dagger in the Table. In this example we will assess the evidence that the changes to the apparatus made a systematic difference to Cavendish's experimental results.

We wish to test

$$H_0: \mu_1 - \mu_2 = 0 \quad \text{against} \quad H_1: \mu_1 - \mu_2 \neq 0,$$

where $\mu_2$ is the (population) mean experimental measurement after these changes. The analysis can be conducted in MINITAB. The data are stored in the worksheet cavendish2.mtw, in which the first column of data (named x) contains the experimental results, and the second column (named app) indexes these results with a 1 for observations before the change and a 2 for observations after the change. To perform a two-sample t-test on the data, we select the 2-sample t dialogue box via the menu sequence Stat → Basic Statistics → 2-sample t.... Note that we use the Samples in One Column section of the resulting dialogue box to enter the variable names, since this is the way in which our worksheet has been set up.

    5.50†  5.61†  4.88†  5.07†  5.26†  5.55†
    5.36   5.29   5.58   5.56   5.57   5.53
    5.62   6.29   5.44   5.34   5.79   5.10
    5.27   5.39   5.42   5.47   5.63   5.34
    5.46   5.30   5.75   5.68   5.85

Table 19: Cavendish's results for the (mean) density of the Earth (in grams per cubic centimetre). Observations taken before the change in experimental apparatus are marked with a dagger.

Here $\mu_1$ is the (population) mean experimental measurement of the density of the earth before the changes to the apparatus, and $\mu_2$ is the corresponding mean after them.

Having entered x as the Samples and app as the Subscripts we select the appropriate alternative hypothesis (and confidence level if required), then click on OK. The resulting output is given below.

    Two Sample T-Test and Confidence Interval

    Twosample T for x
    app    N     Mean   StDev  SE Mean
    1      6    5.312   0.293     0.12
    2     23    5.483   0.190    0.040

    95% C.I. for mu 1 - mu 2: ( -0.48, 0.136)
    T-Test mu 1 = mu 2 (vs not =): T= -1.36  P=0.22  DF= 6

Comments:

• The observed t-statistic is $t = -1.36$. The degrees of freedom for the t-test have been evaluated as $\nu = 6$ by the software. The P-value for the test is $P = 0.22$. We therefore conclude that there is no statistically significant evidence for a systematic change in the mean level of experimental results after the alterations to the experimental apparatus.
• A 95% confidence interval for $\mu_1 - \mu_2$ is given by (−0.48, 0.136).

8.2.3 Common Population Variance

Assumptions

Assume that A3 and/or A4 hold. Sometimes it will be reasonable to make a further assumption:

(A5): Both populations have the same variance; i.e. $\sigma_1^2 = \sigma_2^2 = \sigma^2$.

We can judge the appropriateness of this assumption by comparing the sample standard deviations, $S_1$ and $S_2$.
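As a rough check on Example 8.10, the unpooled t-statistic, its degrees of freedom and the confidence interval can be recomputed from the summary statistics in the MINITAB output. The sketch below is not the software's own routine: it assumes that ν is obtained from the Welch-Satterthwaite approximation and truncated to a whole number (the notes only say that ν depends on the data 'in a complex way'), and it takes the critical point t0.025(6) = 2.447 from standard t-tables. It also reports the ratio of the larger to the smaller sample standard deviation, the kind of comparison suggested above for judging assumption A5.

```python
import math

# Summary statistics copied from the MINITAB output in Example 8.10.
n1, mean1, sd1 = 6, 5.312, 0.293    # before the apparatus change
n2, mean2, sd2 = 23, 5.483, 0.190   # after the apparatus change

# Unpooled standard error of the difference in sample means.
v1, v2 = sd1**2 / n1, sd2**2 / n2
se = math.sqrt(v1 + v2)
t_stat = (mean1 - mean2) / se

# Assumption: Welch-Satterthwaite approximation for the degrees of freedom.
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Assumption: t_{0.025}(6) = 2.447, read from standard t-tables.
t_crit = 2.447
diff = mean1 - mean2
ci = (diff - t_crit * se, diff + t_crit * se)

# Ratio of larger to smaller sample standard deviation (informal check of A5).
sd_ratio = max(sd1, sd2) / min(sd1, sd2)

print(f"t = {t_stat:.2f}, df = {df:.2f}, CI = ({ci[0]:.2f}, {ci[1]:.3f})")
print(f"sd ratio = {sd_ratio:.2f}")
```

Under these assumptions the recomputed values can be compared directly with the T, DF and C.I. entries in the output above; small discrepancies are expected because the summary statistics are rounded.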
Pooled Estimation of Variance

If A5 is assumed, then we can estimate the common population variance, $\sigma^2$, by the pooled estimator of variance,

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}.$$

Example 8.11
In a medical study, 12 men with high blood pressure were given calcium supplements, and 16 men with high blood pressure were given placebo. The reduction in blood pressure (in mm Hg) over a ten week period was measured for both groups. The summary statistics for the first (i.e. treatment) group were

$$\bar{x}_1 = 4.10, \qquad s_1 = 6.42$$

and for the second (i.e. control) group were

$$\bar{x}_2 = 0.03, \qquad s_2 = 5.25.$$

Assuming that treatment and control population variances are equal, calculate a pooled estimate of the common population standard deviation.

Two Sample T-Statistic (Pooled Variance)

Adapting the working in section 8.2.1 to take into account the common population variance:

$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma^2/n_1 + \sigma^2/n_2}} = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sigma\sqrt{1/n_1 + 1/n_2}} \sim N(0, 1).$$

Replacing the unknown $\sigma^2$ by the estimator $S_p^2$ gives

$$T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{S_p\sqrt{1/n_1 + 1/n_2}}$$

which has a t-distribution with $\nu = n_1 + n_2 - 2$ degrees of freedom.

Confidence Intervals (Pooled Variance)

A $100(1 - \alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is

$$\left(\bar{X}_1 - \bar{X}_2 - t_{\alpha/2}(\nu)\,SE_p(\bar{X}_1 - \bar{X}_2),\ \bar{X}_1 - \bar{X}_2 + t_{\alpha/2}(\nu)\,SE_p(\bar{X}_1 - \bar{X}_2)\right)$$

where $SE_p(\bar{X}_1 - \bar{X}_2) = S_p\sqrt{1/n_1 + 1/n_2}$ is the standard error for $\bar{X}_1 - \bar{X}_2$ with pooling, and $\nu = n_1 + n_2 - 2$.

Two-Sample t-Test (Pooled Variance)

Hypotheses: As for unpooled two-sample t-test.

Test Statistic: Use the t-statistic

$$T = \frac{\bar{X}_1 - \bar{X}_2 - \delta_0}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}.$$

P-Value: As for unpooled two-sample t-test (but with $\nu = n_1 + n_2 - 2$ degrees of freedom).
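The pooled formulas above can be tried out on the summary statistics of Example 8.11. The following is a minimal sketch rather than a model answer: the variable names are illustrative, and the null value δ0 = 0 is an assumption made here (the example itself only asks for the pooled standard deviation).

```python
import math

# Summary statistics from Example 8.11 (reduction in blood pressure, mm Hg).
n1, mean1, sd1 = 12, 4.10, 6.42   # treatment (calcium) group
n2, mean2, sd2 = 16, 0.03, 5.25   # control (placebo) group

# Pooled estimator of the common population variance.
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
pooled_sd = math.sqrt(pooled_var)

# Standard error with pooling, and the degrees of freedom of the t-statistic.
se_pooled = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
df = n1 + n2 - 2

# Assumption: delta0 = 0 is a hypothetical null value chosen for illustration.
delta0 = 0.0
t_stat = (mean1 - mean2 - delta0) / se_pooled

print(f"pooled sd = {pooled_sd:.3f}, SE = {se_pooled:.3f}, df = {df}, t = {t_stat:.2f}")
```

Keeping the pooled standard deviation and the standard error as separate quantities mirrors the notes: the first answers Example 8.11, while the second is what enters the pooled confidence interval and test statistic.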
Example 8.12
Find a 95% confidence interval for the difference in the population mean reduction in blood pressure between the treatment and control groups in Example 8.11.

See M&M Examples 7.14, 7.15, 7.19, 7.20, 7.21; and Exercises 7.51, 7.55, 7.57, 7.59, 7.75, 7.77 (use MINITAB where necessary).

Example 8.13: SIRDS Infants

Severe idiopathic respiratory distress syndrome (SIRDS) is a serious condition that can affect newborn infants, often resulting in death. The table below gives the birth weights (in kilograms) for two samples of SIRDS infants. The first sample contains the weights of 12 SIRDS infants that survived, while the second sample contains the weights of 14 infants that died of the condition. (Data from: van Vliet & Gupta (1973), 'Sodium bicarbonate in idiopathic respiratory distress syndrome', Arch. Diseases in Childhood, 48, 249-255.)

    Survived
    1.130  1.680  1.930  2.090  2.700  3.160  3.640
    1.410  1.720  2.200  2.550  3.005

    Died
    1.050  1.230  1.500  1.720  1.770  2.500  1.100
    1.225  1.295  1.550  1.890  2.200  2.440  2.730

Table 20: Birth weights (in kg) of infants with SIRDS.

Population 2 is the population of non-surviving SIRDS infants, with mean $\mu_2$ and variance $\sigma_2^2$. We will address the following questions:

(i) Is there significant evidence that the population mean birth weights differ for survivors and non-survivors?

(ii) What is a 95% confidence interval for the difference between the population means?

In order to address (i) we shall test

$$H_0: \mu_1 - \mu_2 = 0 \quad \text{against} \quad H_1: \mu_1 - \mu_2 \neq 0.$$

We now need to decide whether or not to assume equal population variances. The sample standard deviations are 0.758 for group 1 (survived) and 0.554 for group 2 (died). The ratio of larger to smaller is 0.758/0.554 = 1.37, suggesting that an assumption of common variance is not too unreasonable.

The analysis of the data (to answer (i) and (ii) above) can be conducted in MINITAB.
It is of interest to medical researchers to know whether birth weight influences a child's chance of surviving with SIRDS. Throughout this example, population 1 is defined to be the population of SIRDS infants that survive, with mean $\mu_1$ and variance $\sigma_1^2$, and population 2 is the population of those that do not.

The data are available in the MINITAB worksheet SIRDS.mtw, in which the first column (named weight) contains the birth weights and the second column (named survived) indexes these with a 1 if the infant survived, and a 0 if it died. As with the unpooled two-sample t-test in Example 8.10, we obtain the 2-sample t dialogue box via the menu sequence Stat → Basic Statistics → 2-sample t.... We enter the necessary information in this dialogue in the same manner as in Example 8.10, but finish by clicking on the box Assume equal variances (since we're assuming $\sigma_1^2 = \sigma_2^2$).

    Two Sample T-Test and Confidence Interval

    Twosample T for weight
    survived    N     Mean   StDev  SE Mean
    1          12    2.268   0.758     0.22
    0          14    1.729   0.554     0.15

    95% C.I. for mu 1 - mu 0: ( 0.01, 1.07)
    T-Test mu 1 = mu 0 (vs not =): T= 2.09  P=0.047  DF= 24
    Both use Pooled StDev = 0.655

• The output confirms that a pooled estimate of standard deviation was used; $s_p = 0.655$.
• The observed value of the t-statistic is $t = 2.09$, which gives a P-value of $P = 0.047$. This gives evidence in favour of $H_1$. (We reject $H_0$ at a 0.05 significance level.)
• A 95% confidence interval for $\mu_1 - \mu_2$ is (0.01, 1.07).

8.3 More Than Two Samples

8.3.1 Introduction

• Sometimes we have samples from more than two populations that we wish to compare.
• It is unwise to start by comparing every possible pair of populations using two-sample tests because:
  – it is a lot of work (e.g. there are more than 100 pairwise comparisons between 15 samples);
  – there are problems with significance levels in multiple testing.

ANOVA

Analysis of variance ('ANOVA') is a method for comparing the means of several populations which avoids these problems.
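Both objections to pairwise testing can be made concrete with a short calculation. The sketch below counts the pairwise comparisons between 15 samples and then gives a rough figure for the chance of at least one false rejection. The 0.05 level per test and the independence of the tests are assumptions made only for illustration; pairwise tests that share samples are not independent, so the second number is a guide rather than an exact probability.

```python
import math

k = 15                        # number of samples, as in the example in the text
n_pairs = math.comb(k, 2)     # number of pairwise two-sample comparisons

# Assumption: every test is run at level 0.05 and the tests are treated as
# independent, which pairwise tests on shared samples are not.
alpha = 0.05
p_any_false_rejection = 1 - (1 - alpha) ** n_pairs

print(n_pairs, round(p_any_false_rejection, 4))
```

Even allowing for the crude independence assumption, the calculation suggests why a single overall test is preferred as a starting point.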
8.3.2 One-Way Analysis of Variance

• The different populations in ANOVA can be defined by one or more factors (i.e. qualitative explanatory variables).
• We can then think of ANOVA as testing for the effect of these factors on the response variable.

Example 8.14
In a medical study, people are classified according to sex and whether or not they smoke. Hence there are four populations to consider:

• female non-smokers;
• female smokers;
• male non-smokers;
• male smokers.

Suppose that lung function is measured for each individual. We could consider the effects of both factors sex and smoking on the response variable 'lung function'.

More ANOVA

An 'm-way analysis of variance' can be used when there are m factors. In this unit we will restrict attention to situations in which there is a single factor.

Data Structure for One-Way ANOVA

    Popn.   (mean, var)               Sample (responses)                    Mean
    1       $(\mu_1, \sigma_1^2)$     $X_{11}, X_{12}, \ldots, X_{1n_1}$    $\bar{X}_1$
    2       $(\mu_2, \sigma_2^2)$     $X_{21}, X_{22}, \ldots, X_{2n_2}$    $\bar{X}_2$
    ...     ...                       ...                                   ...
    k       $(\mu_k, \sigma_k^2)$     $X_{k1}, X_{k2}, \ldots, X_{kn_k}$    $\bar{X}_k$

Grand Mean

The mean of all the data (irrespective of which sample they come from) is called the grand mean, and denoted $\bar{X}$. The total number of observations is denoted $n$.

More ANOVA

Assumptions:

(A6) All the observations are normally distributed;
(A7) The population variances are all the same; i.e. $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$.

ANOVA Table

$$X_{ij} = \underbrace{\bar{X}}_{\text{grand mean}} + \underbrace{(\bar{X}_i - \bar{X})}_{\text{deviation of sample } i} + \underbrace{(X_{ij} - \bar{X}_i)}_{\text{residual}}$$

If $H_0$ (equal population means) is correct then $(\bar{X}_i - \bar{X})$, and hence $(\bar{X}_i - \bar{X})^2$, should be quite close to zero for each sample. It can be shown that
$$\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar{X})^2}_{\text{Total SS}} = \underbrace{\sum_{i=1}^{k} n_i(\bar{X}_i - \bar{X})^2}_{\text{Factor (Treatment) SS}} + \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2}_{\text{Residual (Error) SS}}$$

where SS is an abbreviation for 'sum of squares'.

Hypotheses:
In one-way ANOVA we are testing the hypotheses

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$$

against

$$H_1: \text{the } \mu_i\text{'s are not all equal.}$$

Evidence for $H_1$ will be provided by a substantial degree of variability between the sample means.

ANOVA Table

The decomposition of sums of squares can be displayed in an ANOVA table.

    Source   DF        SS           MS
    Factor   k − 1     $SS_{Fac}$   $SS_{Fac}/(k - 1)$
    Error    n − k     $SS_{Err}$   $SS_{Err}/(n - k)$
    Total    n − 1     $SS_{Tot}$

where $SS_{Tot} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar{X})^2$. In this table, DF is 'degrees of freedom', and MS is 'mean square' (given by MS = SS/DF).

Testing in ANOVA

• If the populations truly have different means (i.e. if the factor has a real effect on the responses) then the Factor MS should be relatively large compared to the Error MS.
• Hence, an appropriate test statistic is
  $$F = \frac{MS_{Fac}}{MS_{Err}}$$
  with extreme positive values of $F$ providing evidence against $H_0$.
• Under the assumption that $H_0$ is correct, $F$ has an F-distribution with $(k - 1, n - k)$ degrees of freedom. The P-value for an observed value $f$ of the F-statistic is $P = P(F \geq f)$ where $F$ has the F-distribution described above.

Example 8.15: Silver Content in Coins

Table 21 below shows the silver content (% Ag) of coins taken from four mintings during the reign of King Manuel I (1143-1180), a Byzantine ruler who lived in what is now Cyprus. These data are stored in the MINITAB worksheet silver.mtw. The silver contents ('responses') are given in the first column (named silver), while the second column (named minting) specifies the minting (with 1 for minting 'A', 2 for minting 'B' and so on). There were nine
coins from minting 'A', seven from 'B', four from 'C' and seven from 'D'. It is of historical interest to investigate whether or not the mean silver content varies from minting to minting. Hence we will test

$$H_0: \mu_A = \mu_B = \mu_C = \mu_D \quad \text{against} \quad H_1: \mu_A, \mu_B, \mu_C, \mu_D \text{ not all equal}$$

where $\mu_A$ is the population mean silver content for minting A (and so on).

    'A'   'B'   'C'   'D'
    5.9   6.9   4.9   5.3
    6.8   9.0   5.5   5.6
    6.4   6.6   4.6   5.5
    7.0   8.1   4.5   5.1
    6.6   9.3         6.2
    7.7   9.2         5.8
    7.2   8.6         5.8
    6.9
    6.2

Table 21: Silver content by minting.

To perform the analysis of variance in MINITAB, we need to go through the menu sequence Stat → ANOVA → Oneway. The details of the analysis are then entered into the resulting dialogue box. Clicking on OK should then provide the following output in the MINITAB session window.

    One-Way Analysis of Variance

    Analysis of Variance on silver
    Source     DF        SS        MS        F        p
    minting     3    37.748    12.583    26.27    0.000
    Error      23    11.015     0.479
    Total      26    48.763

    Lvl    N     Mean    StDev
    1      9   6.7444   0.5434
    2      7   8.2429   1.0998
    3      4   4.8750   0.4500
    4      7   5.6143   0.3625

    Pooled StDev = 0.6920

    [Character plot: Individual 95% CIs For Mean, Based on Pooled StDev, one interval per level; the horizontal axis labels recoverable from the source are 4.5, 6.0 and 7.]

Comments:

• The F test statistic is F = 26.27. (Notice that this is equal to MS(Factor)/MS(Error); i.e. F = 12.583/0.479 = 26.27.)
• The P-value is 0.000 (i.e. the F statistic is far larger than we would expect to get if $H_0$ were true). Since the P-value is so tiny, the data provide very strong evidence against $H_0$, and hence in favour of a variation in mean silver content from minting to minting.

8.3.3 Confidence Intervals

In one-way ANOVA, confidence intervals for population means are based upon a pooled estimate of variance, $S_p^2$ (which is actually equal to MS(Error)).

A $100(1 - \alpha)\%$ confidence interval for $\mu_i$ is given by

$$\left(\bar{X}_i - t_{\alpha/2}(\nu)\,SE_p(\bar{X}_i),\ \bar{X}_i + t_{\alpha/2}(\nu)\,SE_p(\bar{X}_i)\right)$$
¯ √ where SEp(Xi) = Sp/ ni and tα/2(ν) is a critical Endnote: point for a t distribution with ν = n − k degrees of The 95% conﬁdence intervals for each population freedom. mean allow us to get some idea as to the way in which the mean silver content varies between mintings. These conﬁdence intervals suggest that Example 8.16 the earlier mintings, ‘A’ and ‘B’, had a rather higher ¯ For Example 8.15, calculate the standard error for xA mean sliver content than the later ones, ‘C’ and using pooling and hence compute a 99% conﬁdence ‘D’. This seems to be an early example of currency interval for µA. devaluation See also M&M Exercises 12.1, 12.5 (use MINITAB). Statistics 160 – Semester 2, 2003 255 Statistics 160 – Semester 2, 2003 256 9. Comparing Two Proportions 9.2 Estimation 9.1 Introduction ˆ ˆ The statistic D = p1 − p2 is natural estimator of p1 − p 2 . Aim: to compare the proportions of individuals with a particular characteristic in two populations. Data: take simple random samples from both populations, and count the number of ‘successes’ p ˆ E[D] = E[ˆ1 − p2] (i.e. individuals with the characteristic) in both p p = E[ˆ1] − E[ˆ2] samples. = p1 − p2 Population Sample Count of Sample Hence D is an unbiased estimator of p1 − p2. Popn proportion size success proportion 1 p1 n1 X1 p1 = X1/n1 ˆ p ˆ Var(D) = Var(ˆ1 − p2) 2 p2 n2 X2 ˆ p2 = X2/n2 p p = Var(ˆ1) + Var(ˆ2) p1(1 − p1) p2(1 − p2) We will mostly concentrate on inference for p1 − p2. = + n1 n2 so the standard error of D is p1(1 − p1) p2(1 − p2) ˆ ˆ ˆ ˆ p p SE(D) ≡ SE(ˆ1−ˆ2) = + n1 n2 Statistics 160 – Semester 2, 2003 257 Statistics 160 – Semester 2, 2003 258 Example 9.1: Who Will Go Grey First? think the diﬀerence is statistically signiﬁcant or not? We will look at this question when return to this example later. Do men or women tend to get grey hair ﬁrst? 
In work done by two Australian researchers in the late 1980’s (and reported in Family Circle magazine in 1990), simple random samples of 1000 men and 1000 women, all of age 25, were taken. Among the men, 290 had some grey hair, while the corresponding ﬁgure for the women was 340. Let p1 be the population proportion of 25 year old men with some grey hair, and let p2 be the corresponding population proportion for women. An unbiased estimator of p1 − p2 is provided by the ˆ ˆ estimate p1 − p2 = 290/1000 − 340/1000 = −0.050. The standard error of this estimate is ˆ ˆ p1(1 − p1) p2(1 − p2) ˆ ˆ p ˆ SE(ˆ1 − p2) = + n1 n2 0.29 × (1 − 0.29) 0.34 × (1 − 0.34) = + 1000 1000 = 0.0207 The article writer for Family Circle commented that, “this diﬀerence [between sample proportions] is so small that it’s considered insigniﬁcant.” Do you Statistics 160 – Semester 2, 2003 259 Statistics 160 – Semester 2, 2003 260 Conﬁdence Intervals Example 9.2: Conﬁdence Interval from Grey Hair Data Assume: n1p1 ≥ 10, n1(1 − p1) ≥ 10, n2p2 ≥ 10, and n2(1 − p2) ≥ 10. Continuing from Example 9.1, ﬁnd a 95% conﬁdence Then interval for p1 − p2. · ˆ p1 ∼ N (p1, p1(1 − p1)/n1) and · ˆ p2 ∼ N (p2, p2(1 − p2)/n2) from section 6.2.3. Therefore · p1(1 − p1) p2(1 − p2) D∼N p1 − p 2 , + n1 n2 See section 8.2.2. Conﬁdence Interval: Using exactly the same kind of arguments as in 7.1, it follows that an (approximate) 100(1 − α)% conﬁdence interval for p1 − p2 is given by D − zα/2SE(D), D + zα/2SE(D) . Statistics 160 – Semester 2, 2003 261 Statistics 160 – Semester 2, 2003 262 9.4 Signiﬁcance Testing Test Statistic • Would like to use Hypotheses: D−0 D H1 is a statement of some claim about p1 − p2, and z= = σD p1 (1−p1 ) + p2(1−p2) H0 is the negation of this statement. n1 n2 In practice interest is almost always in detecting a However, the denominator depends on the diﬀerence between p1 and p2, so we will take the null unknown population proportions. 
hypothesis to be • Since we assume that p1 = p2 under H0, it H0 : p 1 − p2 = 0 is sensible to replace both p1 and p2 in the denominator by the pooled estimate, The alternative will be one of X1 + X 2 ˆ p= n1 + n 2 H1 : p1−p2 > 0 or H1 : p1−p2 < 0 or H1 : p1−p2 = 0 • We therefore use the test statistic D z = ˆ p p(1−ˆ) n1 ˆ p + p(1−ˆ) n2 D = ˆ ˆ p(1 − p)(1/n1 + 1/n2) This test statistic has a N (0, 1) sampling distribution under the assumption that H0 is correct. Statistics 160 – Semester 2, 2003 263 Statistics 160 – Semester 2, 2003 264 P-Value Example 9.3: Do Men or Women Really Diﬀer in How Early They Go Grey? Suppose that the observed value of the test statistic is z. In this example we take the data on grey hair and H1 P-value test p1 − p 2 > 0 P (Z ≥ z) H0 : p 1 = p2 p1 − p 2 < 0 P (Z ≤ z) against the two-sided alternative, p1 − p 2 = 0 2P (Z ≥ |z|) where Z ∼ N (0, 1). H1 : p 1 = p2 where p1 and p2 are the population proportions as deﬁned in Example 9.1. In order to evaluate the test statistic here, we need to obtain a pooled estimate of proportion: 290 + 340 ˆ p = 1000 + 1000 = 0.315 The test statistic is ˆ ˆ p1 − p 2 z = ˆ ˆ p(1 − p)(1/n1 + 1/n2) 0.05 = 0.315 × (1 − 0.315) × (1/1000 + 1/1000) = 2.41 Statistics 160 – Semester 2, 2003 265 Statistics 160 – Semester 2, 2003 266 The P-value corresponding to this test statistic is 9.5 Relative Risk P = 2P (Z ≥ |2.41|) since 2-sided test = 2 × 0.0080 from normal tables The estimated relative risk is = 0.016 ˆ p1 RR = . We therefore conclude that the data provide evidence ˆ p2 that chances of developing some grey hair by the age of 25 does diﬀer between men and women. It is widely used in medical statistics because it is easy to interpret. End note: Notice that the numbers of men and women with some grey were both multiples of 10. It it quite possible that the data, as reported in Family Example 9.4: Smoking and Heart Attacks in Circle, were rounded to the nearest percentage Women point (i.e. 
the nearest ten individuals). Suppose An article in the British Medical Journal (Prescott, that the true ﬁgures were actually 294 for men, Hippe, Schnor and Vestbo, 1998) gave the following and 336 for women – would your conclusions from data for random samples of Danish women: this signiﬁcance test change? You might like to investigate this as an exercise. Number of Sample size heart attacks Smokers 6461 380 Non-smokers 5011 132 See also M&M Example 8.9, and Exercises 8.29, The sample proportion of heart attacks among the 8.33, 8.35. ˆ smokers is pS = 380/6461 = 0.0588. For non- ˆ smokers the proportion is pN S = 132/5011 = 0.0263. Hence the relative risk of a heart attack for a female smoker (compared to a female non- Statistics 160 – Semester 2, 2003 267 Statistics 160 – Semester 2, 2003 268 smoker) is 10. Inference for Regression 0.0588 RR = = 2.24 0.0263 In other words, a smoker is more than twice as likely to suﬀer a heart attack as a non-smoker. 10.1 The Simple Linear Regression Model It is possible to obtain a conﬁdence interval Simple linear regression aims to model a corresponding to this relative risk, but the details (quantitative) response variable, y, in terms of are beyond the Statistics 160 unit. an explanatory variable (covariate) x. The linear regression model is Yi ∼ N (a + bxi, σ 2) (i = 1, 2, . . . , n) (2) See also M&M Example 8.11. where • xi, Yi are the explanatory and response variables measured on the ith individual; • the responses, Y1, Y2, . . . , Yn are independent; • σ 2 is the ‘error’ variance (which is not dependent on xi). Note that the linear regression model represents the variability in the r.v. Yi, regarding xi as a ﬁxed quantity. 
More on the Simple Linear Regression Model

The model can be written equivalently as

$$Y_i = a + b x_i + \varepsilon_i \qquad (i = 1, 2, \ldots, n)$$

where $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are independent $N(0, \sigma^2)$ random ‘error’ terms. From Eqn. 2 we have

$$E[Y_i \mid x_i] = a + b x_i \qquad (3)$$

• We write $E[Y_i \mid x_i]$ to emphasise that the expected value of $Y_i$ depends on $x_i$.

• Eqn. 3 defines the population regression equation. The quantities $a$ and $b$ are the regression parameters (representing intercept and slope respectively).

Example 10.1: A Linear Regression Model

Suppose that it is known that a response $Y$ is dependent upon the explanatory variable $x$ via the relationship

$$Y = 1 + 2x + \varepsilon$$

where $\varepsilon$ has a normal distribution with zero mean and standard deviation $\sigma = 1.5$. Hence,

$$E[Y] = 1 + 2x$$

and the regression parameters are intercept $a = 1$ and slope $b = 2$.

A computer was used to simulate 20 responses (i.e. observations on $Y$) when $x = 2$, and 20 responses when $x = 4$. The regression line is plotted below, along with a scatterplot of the simulated data.

[Figure: scatterplot of the simulated responses y against x, with the regression line y = 1 + 2x.]

Note:

• The regression model describes the variability in $Y$ for any given $x$ that we choose.

• The regression line itself describes how the mean of $Y_i$ depends on $x_i$.

• The degree of scattering of points about the regression line is controlled by the error standard deviation, $\sigma$.

10.2 Inference for Regression Parameters

10.2.1 Estimation

Data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.

Aim: use these data to estimate the true (population) regression equation.

Least Squares Estimation: The least squares intercept $\hat{a}$ and slope $\hat{b}$ are estimates of the parameters $a$ and $b$ respectively.

The least squares regression line

$$\hat{y} = \hat{a} + \hat{b}x$$

is often called the fitted regression line (to distinguish it from the true regression line).
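The simulation in Example 10.1 and the idea of a fitted line can be sketched in a few lines of Python. This is only an illustration, not the program used to produce the figure: the function names `simulate` and `least_squares` and the seed are hypothetical choices, and the slope and intercept are computed from the standard least squares formulas $\hat{b} = \sum(x_i-\bar{x})(y_i-\bar{y}) / \sum(x_i-\bar{x})^2$ and $\hat{a} = \bar{y} - \hat{b}\bar{x}$, which are assumed here rather than derived in this section.

```python
import random


def least_squares(xs, ys):
    """Return (a_hat, b_hat): least squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


def simulate(a=1.0, b=2.0, sigma=1.5, reps=20, seed=160):
    """Simulate `reps` responses at x = 2 and `reps` at x = 4
    from the model Y = a + b*x + error, error ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    xs = [2.0] * reps + [4.0] * reps
    ys = [a + b * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


xs, ys = simulate()
a_hat, b_hat = least_squares(xs, ys)
print(f"true line:   y = 1 + 2 x")
print(f"fitted line: y = {a_hat:.3f} + {b_hat:.3f} x")
```

Rerunning with different seeds shows the fitted line varying around the true line $y = 1 + 2x$, which is the sampling variability that the rest of section 10.2 quantifies.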
Properties of Least Squares Regression Coefficients

• $E[\hat{a}] = a$ and $E[\hat{b}] = b$ (so both are unbiased).

• $\mathrm{Var}(\hat{b}) = \sigma^2 / \left[\sum_{i=1}^{n}(x_i - \bar{x})^2\right]$, so that

$$\sigma_{\hat{b}} = \frac{\sigma}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}}.$$

To find $SE(\hat{b})$ we must estimate the unknown parameter $\sigma^2$. It turns out that an unbiased estimator of $\sigma^2$ is given by

$$\hat{\sigma}^2 = \frac{1}{n-2}\,RSS$$

where

$$RSS = \sum_{i=1}^{n}\left(y_i - [\hat{a} + \hat{b}x_i]\right)^2$$

is the residual sum of squares. Hence

$$SE(\hat{b}) = \frac{\hat{\sigma}}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}}.$$

Example 10.2: Linear Regression for the Image Quality Data

The image quality data were introduced in Example 3.2 (and revisited in Examples 3.4 and 3.7). In this data set image quality $x$ and reading time $y$ are available for 8 images. From Example 3.7 we see that the fitted regression equation is

$$\hat{y} = 9.6378 - 0.3485x$$

so that the estimated slope parameter is $\hat{b} = -0.3485$. In this Example we shall compute the standard error of this estimate by hand. This is done for illustrative purposes only; in practice you can use a suitable computer package to perform the necessary calculations.

In order to compute $SE(\hat{b})$ we need to obtain an estimate, $\hat{\sigma}^2$, of the error variance. This requires us to compute the residual sum of squares for the regression. This calculation is facilitated by setting out the data and other relevant quantities in a table. Table 22 has columns as follows: x value, y value, fitted value, residual and squared residual. The fitted value for the $i$th observation is defined by

$$\hat{y}_i = \hat{a} + \hat{b}x_i$$

while the residual is defined by

$$e_i = y_i - \hat{y}_i = y_i - [\hat{a} + \hat{b}x_i]$$

    x       y       ŷ       y − ŷ    (y − ŷ)²
    4.30    8.00    8.14    -0.14    0.019
    4.55    8.30    8.05     0.25    0.061
    5.55    7.80    7.70     0.10    0.009
    5.65    7.25    7.67    -0.42    0.175
    5.95    7.70    7.56     0.14    0.018
    6.30    7.50    7.44     0.06    0.003
    6.45    7.60    7.39     0.21    0.044
    6.45    7.20    7.39    -0.19    0.036
    Sum                              0.367

Table 22: x − y data, fitted value, residual and squared residual for the image data.

From Table 22 we see that the residual sum of squares, $RSS = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = 0.367$. Hence the error variance is estimated by

$$\hat{\sigma}^2 = \frac{1}{n-2}\,RSS = \frac{0.367}{8-2} = 0.0612$$

Now $\sum_{i=1}^{n}(x_i - \bar{x})^2 = 4.835$ for these data, so

$$SE(\hat{b}) = \frac{\hat{\sigma}}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}} = \frac{\sqrt{0.0612}}{\sqrt{4.835}} = 0.113$$

In fact this standard error is computed automatically by MINITAB when performing a regression analysis. Below we reproduce some of the MINITAB output from fitting the regression in Example 3.7.

    Regression Analysis

    The regression equation is
    y = 9.64 - 0.349 x

    Predictor       Coef     Stdev    t-ratio        p
    Constant      9.6378    0.6419      15.01    0.000
    x            -0.3485    0.1125      -3.10    0.021

MINITAB gives $SE(\hat{b}) = 0.1125$ in this output, which tallies with the value that we computed by hand.

10.2.2 Confidence Intervals

The sampling distribution of $\hat{b}$ is

$$\hat{b} \sim N\!\left(b,\; \sigma^2 \Big/ \sum_{i=1}^{n}(x_i - \bar{x})^2\right)$$

Using standard arguments it follows that a $100(1-\alpha)\%$ confidence interval for $b$ is given by

$$\left(\hat{b} - t_{\alpha/2}(\nu)\,SE(\hat{b}),\;\; \hat{b} + t_{\alpha/2}(\nu)\,SE(\hat{b})\right)$$

where $\nu = n - 2$.

Example 10.3

From Example 10.2, find a 95% confidence interval for the regression slope $b$.

10.2.3 Significance Testing for a Regression Slope

Example 10.4: Predicting Tooth Age from a Measurement of Dentine

A group of researchers performed a study into the possibility of predicting the age in years, $y$, of a tooth from the percentage, $x$, of the tooth’s root with transparent dentine. The data are displayed in the table below. (Data source: ‘Root dentine transparency: age determination of human teeth using computerized densitometric analysis’, American Journal of Anthropology, 1991, pp. 25-30.)

    x    15    19    31    39    41    44    47    48    55    65
    y    23    52    65    55    32    60    78    59    61    60

A scatter plot of these data is given in Figure 38 below.
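The hand calculation of Example 10.2 (residual sum of squares, then $\hat{\sigma}$, then $SE(\hat{b})$) can be repeated for the tooth data of Example 10.4. The following is a minimal sketch under stated assumptions: the function name `slope_inference` is hypothetical, and the least squares estimates are obtained from the standard formulas $\hat{b} = \sum(x_i-\bar{x})(y_i-\bar{y}) / \sum(x_i-\bar{x})^2$ and $\hat{a} = \bar{y} - \hat{b}\bar{x}$, which these notes use but do not restate here.

```python
import math

# Tooth data from Example 10.4:
# x = percentage of root with transparent dentine, y = age in years.
x = [15, 19, 31, 39, 41, 44, 47, 48, 55, 65]
y = [23, 52, 65, 55, 32, 60, 78, 59, 61, 60]


def slope_inference(x, y):
    """Return (a_hat, b_hat, sigma_hat, se_b) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    # Residual sum of squares: sum of squared (observed - fitted) values.
    rss = sum((yi - (a_hat + b_hat * xi)) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(rss / (n - 2))   # estimate of error std deviation
    se_b = sigma_hat / math.sqrt(sxx)      # standard error of the slope
    return a_hat, b_hat, sigma_hat, se_b


a_hat, b_hat, sigma_hat, se_b = slope_inference(x, y)
print(f"a_hat = {a_hat:.2f}, b_hat = {b_hat:.4f}")
print(f"sigma_hat = {sigma_hat:.2f}, SE(b_hat) = {se_b:.4f}")
print(f"t = b_hat / SE(b_hat) = {b_hat / se_b:.2f}")
```

The slope estimate is about 0.555 with a standard error of about 0.310, so the ratio $\hat{b}/SE(\hat{b})$ is about 1.79; these values can be checked against any regression package.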
[Figure 38: Scatter plot of age, y, against percentage transparent dentine, x.]

Looking at this plot, the evidence that tooth age is related to percentage transparent dentine is not entirely convincing. Is there really an increasing linear relationship between the variables, or is the hint of such behaviour in the scatter plot purely a result of random chance?

Significance Testing Continued

If, in truth, the response is not linearly related to the explanatory variable, then the true (population) value of the slope will be $b = 0$.

Hypotheses

$$H_0: b = 0 \quad\text{i.e. no linear relationship between variables}$$

against one of the alternatives

$$H_1: b \ne 0 \quad\text{or}\quad H_1: b > 0 \quad\text{or}\quad H_1: b < 0$$

Test statistic

$$t = \frac{\hat{b}}{SE(\hat{b})}$$

which has a t distribution with $\nu = n - 2$ degrees of freedom, assuming that $H_0$ is correct.

P-value

The P-value for an observed test statistic $t$ is given by

    H1        P-value
    b > 0     P(T ≥ t)
    b < 0     P(T ≤ t)
    b ≠ 0     2P(T ≥ |t|)

where $T$ has a t distribution with $\nu = n - 2$ degrees of freedom. These P-values can be computed by MINITAB.

Example 10.5: Attempting to Predict Tooth Age

Following on from Example 10.4, we investigate whether or not tooth age ($Y$) can be predicted from the percentage of transparent dentine ($x$) by fitting a linear regression

$$E[Y] = a + bx$$

to the data, and then testing

$$H_0: b = 0 \quad\text{against}\quad H_1: b \ne 0.$$

We can fit the regression model in MINITAB via the menu sequence Stat → Regression → Regression..., as described in Example 3.7.

    Regression Analysis

    The regression equation is
    y = 32.1 + 0.555 x

    Predictor       Coef     Stdev    t-ratio        p
    Constant       32.08     13.32       2.41    0.043
    x             0.5549    0.3101       1.79    0.111

    s = 14.30    R-sq = 28.6%    R-sq(adj) = 19.7%

Recall that for testing $H_0$ against $H_1$ we need to compute the observed value of the test statistic

$$t = \frac{\hat{b}}{SE(\hat{b})}$$

From the MINITAB output, $\hat{b} = 0.5549$ and $SE(\hat{b}) = 0.3101$ so that $t = 0.5549/0.3101 = 1.79$. In fact there is no need to do this calculation: MINITAB also reports that the observed t statistic is 1.79 (which it refers to as a ‘t-ratio’). MINITAB reports the P-value for testing

$$H_0: b = 0$$

against the two-sided alternative

$$H_1: b \ne 0$$

For the data at hand this P-value is P = 0.111. (Had we been considering the alternative $H_1: b > 0$ then we would have had to halve this P-value to give the correct value P = 0.111/2 = 0.056.) We conclude that the data do not provide statistically significant evidence (at even a 0.1 level) against $H_0$.

See also M&M Examples 10.11, 10.12 and Exercises 10.1, 10.3 (use MINITAB).

11. Chi-Square Tests for Categorical Data

11.1 Testing a Probability Model

Data: Count data for a categorical variable with $r$ categories.

Model: A probability model assigns a probability to each category.

Aim: Want to know whether this model is appropriate for the data, i.e. is the model ‘a good fit’?

Notation:

• $p_i$ denotes the probability that a given individual falls in the $i$th category ($i = 1, 2, \ldots, r$).

• The r.v. $N_i$ represents the number of individuals falling in the $i$th category ($i = 1, 2, \ldots, r$) when $N_1 + N_2 + \cdots + N_r = n$ individuals are observed.

Example 11.1: Market Research for Toothpaste

A market research company asks a random sample of $n = 100$ people which of 4 brands of toothpaste (1, 2, 3, 4) they prefer. (So ‘toothpaste brand preferred’ is a categorical variable with $r = 4$ categories.) Anecdotal evidence suggests that the 4 brands of toothpaste are equally popular, but the company believes this may not be true. The market research company therefore wants to test the adequacy of the probability model

$$p_1 = p_2 = p_3 = p_4 = 1/4$$

(where $p_i$ is the category $i$ probability; i.e. the probability that a randomly selected individual prefers brand $i$).

Let the r.v.s $N_1, N_2, N_3, N_4$ represent the counts for each category (i.e. the number of people preferring each brand). If the probability model is correct then $E[N_i] = 25$ for each category $i = 1, 2, 3, 4$. If we observe category counts $n_1, n_2, n_3, n_4$, we will have reason to doubt the probability model if these observed counts differ markedly from 25.

11.1.1 Hypotheses

In a ‘goodness of fit test’ for a probability model, the null hypothesis states that the model is correct. That is,

$$H_0: p_1 = \pi_1,\; p_2 = \pi_2,\; \ldots,\; p_r = \pi_r$$

where $\pi_1, \pi_2, \ldots, \pi_r$ are the category probabilities according to the model.

The alternative hypothesis is

$$H_1: H_0 \text{ is not correct}$$

11.1.2 Test Statistic

• If we observe $n$ individuals, then the expected number falling in category $i$ is

$$E[N_i] = n p_i \qquad (i = 1, 2, \ldots, r)$$

• Hence under the assumption that $H_0$ is correct, the expected category counts are $n\pi_1, n\pi_2, \ldots, n\pi_r$.

• If we observe category counts $n_1, n_2, \ldots, n_r$ then $H_0$ would seem plausible if

$$n_1 \approx n\pi_1, \quad n_2 \approx n\pi_2 \quad\text{and so on}$$

• On the other hand, if $n_i$ is very different to $n\pi_i$ for some categories, this will indicate evidence against $H_0$.

Chi Square Statistic: The appropriate test statistic is the chi square statistic

$$\chi^2 = \sum_{i=1}^{r} \frac{(n_i - n\pi_i)^2}{n\pi_i}$$

(where $\chi$ is the Greek letter chi).

Example 11.2: Testing the Mendelian Theory of Inheritance

The data in Table 23 below give observed counts of different kinds of pea seeds in crosses from plants with round yellow seeds and plants with wrinkled green seeds. The Mendelian theory of inheritance suggests that $\frac{1}{16}$ of the seeds will be wrinkled green (WG), $\frac{3}{16}$ will be round green (RG), $\frac{3}{16}$ will be wrinkled yellow (WY), and the remaining $\frac{9}{16}$ will be round yellow (RY).

    Type of seed      RY    WY    RG    WG    Total
    Observed count    93    27    32     8      160

Table 23: Pea seed counts from an experiment on Mendelian inheritance

Is the Mendelian model appropriate for the data? We investigate by testing

$$H_0: p_1 = \tfrac{9}{16},\quad p_2 = \tfrac{3}{16},\quad p_3 = \tfrac{3}{16},\quad p_4 = \tfrac{1}{16}$$

where $p_1 = P(\text{RY})$, $p_2 = P(\text{WY})$, $p_3 = P(\text{RG})$, $p_4 = P(\text{WG})$.

(There is no need to state an alternative hypothesis in this type of goodness of fit test. It is understood that the alternative is simply ‘$H_0$ is not correct’.)

If $H_0$ is correct, then the expected number of RYs is

$$e_1 = n\pi_1 = 160 \times \tfrac{9}{16} = 90.$$

Similarly,

    for WY, e2 = 160 × 3/16 = 30.
    for RG, e3 = 160 × 3/16 = 30.
    for WG, e4 = 160 × 1/16 = 10.

This can be set out neatly in table form.

    Type of seed         RY     WY     RG       WG     Total
    Observed count, o    93     27     32       8      160
    Expected count, e    90     30     30       10     160
    o − e                3      -3     2        -2     0
    (o − e)²/e           0.1    0.3    0.133    0.4    0.933

Hence, the $\chi^2$ statistic here is 0.933 (3 d.p.).

Chi Square Test Statistic

• The larger the value of the chi square statistic, the more evidence that we have against $H_0$.

• If $H_0$ is correct then a chi square statistic follows a $\chi^2$ distribution with $r - 1$ degrees of freedom (approximately).

11.1.3 χ² Distribution

• If r.v. $X$ has a chi square distribution with $\nu$ degrees of freedom, we write $X \sim \chi^2(\nu)$.

• The degrees of freedom is the number of categories ($r$) minus the number of constraints on the counts. When testing the fit of a probability model there is a single constraint: $\sum_{i=1}^{r} n_i = n$.

• Cumulative chi square probabilities, i.e. $P(X \le x)$ where $X \sim \chi^2(\nu)$ (see Figure 39), can be computed using MINITAB.

[Figure 39: P(X ≤ x), for a χ² distribution.]

Example 11.3: Chi Square Probabilities in MINITAB

Suppose that we wish to find $P(X \le 5.0)$ where $X \sim \chi^2(3)$. This can be done via the menu sequence Calc → Probability Distributions → Chisquare..., but is more easily done by typing in the commands directly in the MINITAB session window. The syntax is as follows:

(i) Type in CDF 5.0; (NB the semi-colon) at the MTB prompt, then press return. (This indicates that we wish to compute a cumulative probability, $P(X \le x)$, for $x = 5.0$.)

(ii) After (i) you will then see a new SUBC prompt (standing for ‘sub-command’). Enter Chisquare 3. (NB the full-stop) at this prompt, then press return.

The MINITAB session window should then look as follows.

    MTB > CDF 5.0;
    SUBC> Chisquare 3.

    Cumulative Distribution Function

    Chisquare with 3 d.f.

         x    P( X <= x)
    5.0000        0.8282

Hence $P(X \le 5.0) = 0.8282$.

Connection with P-Values

To compute P-values in a chi square test we need to compute the probability of getting a test statistic more extreme (i.e. larger) than the observed one. So, if the observed test statistic is $\chi^2$, then we need to find $P(X \ge \chi^2)$ where $X \sim \chi^2(\nu)$. We can do this by using MINITAB to compute $P(X < \chi^2)$ and then using the fact that $P(X \ge \chi^2) = 1 - P(X < \chi^2)$.

So for example, if we observed a test statistic $\chi^2 = 5.0$ when there are $\nu = 3$ degrees of freedom, the P-value will be

    P(X ≥ 5.0) = 1 − P(X < 5.0)
               = 1 − 0.8282    from above
               = 0.1718

11.1.4 P-Values for Chi Square Test

• If $\chi^2$ is the observed value of a chi square test statistic, then the P-value is

$$P = P(X \ge \chi^2)$$

where $X \sim \chi^2(r - 1)$.

• The P-value for a chi square test is interpreted (and compared with a significance level $\alpha$) in the usual way.

Example 11.4

In Example 11.2 we observed a test statistic $\chi^2 = 0.933$. The degrees of freedom were $\nu = r - 1 = 4 - 1 = 3$. Using MINITAB it can be shown that

$$P(X \le 0.933) = 0.1825$$

where $X \sim \chi^2(3)$. Hence the P-value is

$$P = P(X \ge 0.933) = 1 - 0.1825 = 0.8175$$

We conclude that there is no evidence that the Mendelian model is inappropriate for the data.
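For readers without MINITAB, the goodness of fit calculation for the Mendelian data can be sketched in Python using only the standard library. The function names `chi2_sf` and `goodness_of_fit` are hypothetical. The upper tail probability $P(X \ge x)$ is computed from the standard closed-form expressions for a chi square distribution with integer degrees of freedom (a finite sum, plus a complementary error function term when the degrees of freedom are odd); this is a different route from MINITAB's CDF command but should give the same probabilities.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(X >= x) for X ~ chi-square(df); df a positive integer, x >= 0."""
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{k=0}^{df/2 - 1} h^k / k!
        return math.exp(-h) * sum(h ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{k=1}^{(df-1)/2} h^(k - 1/2) / Gamma(k + 1/2)
    extra = sum(h ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * extra


def goodness_of_fit(observed, probs):
    """Return (chi-square statistic, degrees of freedom, P-value)."""
    n = sum(observed)
    expected = [n * p for p in probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # r categories, one constraint on the counts
    return chi2, df, chi2_sf(chi2, df)


# Example 11.2: counts for RY, WY, RG, WG and the Mendelian probabilities.
observed = [93, 27, 32, 8]
probs = [9 / 16, 3 / 16, 3 / 16, 1 / 16]
chi2, df, p_value = goodness_of_fit(observed, probs)
print(f"chi-square = {chi2:.3f}, df = {df}, P-value = {p_value:.3f}")

# Example 11.3: P(X >= 5.0) for 3 degrees of freedom.
print(f"P(X >= 5.0) = {chi2_sf(5.0, 3):.4f}")
```

The sketch does not check for small expected counts; it assumes the usual rule of thumb (expected counts of at least 5) has already been verified by hand.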
Statistics 160 – Semester 2, 2003 295 Statistics 160 – Semester 2, 2003 296 11.1.5 Combining Categories Age Group 18-29 30-39 40-49 50-59 60-69 70-79 80+ Count 15 25 22 18 14 5 1 Table 25: Ages of sample of 100 adult South Africans • The χ2 statistic can be unreliable if some of the categories have expected counts (i.e. nπi = ei) Let pi be the probability that a randomly selected smaller than 5. individual sampled by the company falls in age group i (where age group 1 is 18-29 years, age group 2 is 30-39 years and so on). We can investigate • Categories with small expected counts should be whether the market research company is obtaining combined to overcome this problem. an unrepresentative sample (with respect to age) by testing Example 11.5: A Market Survey H0 : p1 = 0.29 p2 = 0.24 p3 = 0.20 p4 = 0.15 A market research company is concerned that it is not obtaining samples with a representative age p5 = 0.08 p6 = 0.03 p7 = 0.01 proﬁle. One particular survey concerned the opinions (against the assumed alternative, H1, that ‘H0 is of adults in South Africa. The age proﬁle of adult not correct’). Now under this null hypothesis, the South Africans is described in Table 24 below. expected counts from a sample of size n = 100 are as follows: Age Group e1 e2 e3 e4 e5 e6 e7 18-29 30-39 40-49 50-59 60-69 70-79 80+ 29 24 20 15 8 3 1 % 29 24 20 15 8 3 1 Table 24: Age proﬁle for adult South Africans • The expected number of individuals in the last category (80 years or older) is e7 = 1. The company’s survey technique obtained a sample of 100 adults, split by age group as displayed in Table • Since this expect count is less than 5 we should 25. combine this group with another. Statistics 160 – Semester 2, 2003 297 Statistics 160 – Semester 2, 2003 298 • It makes sense to combine groups 6 and 7 (to give 11.2 Testing for Association in a single new group 6, corresponding to 70 years Two-Way Contingency Tables and older). 
• The new group 6 has expected count equal to the sum of the expected counts from the original 11.2.1 Introduction groups 6 and 7; that is, e6 + e7 = 3 + 1 = 4. Recall from section 3.5 that when individuals are • This expected count is less than 5 so we must grouped by two categorical variables, then we can continue to combine categories. arrange the counts in a two-way contingency table. • We now combine the original groups 5, 6 and 7 (to make a new group of 60 years and older). This group has expected count 8 + 3 + 1 = 12, so there is no need for further group combination. • After all the combinations of groups, we obtain the following table of observed and expected counts. Age group 18-29 30-39 40-49 50-59 60+ Expected 29 24 20 15 12 8+3+1 Observed 15 25 22 18 20 14+5+1 • The chi square test can now continue in the usual way using this listing of expected and observed counts. (The remainder of the test is left as an exercise.) Statistics 160 – Semester 2, 2003 299 Statistics 160 – Semester 2, 2003 300 Example 11.6: Child Beneﬁt and 11.2.2 Hypotheses Occupation • Typically want to know if there is a relationship A researcher investigating public attitudes to the level between the row and column variables. of welfare beneﬁts in the United Kingdom carried out a pilot survey in the town where she lived. A simple • However, the exact statement of the hypotheses random sample of individuals was drawn from the depend on the manner in which the data were electoral register, and each member of the sample collected. was asked to complete a questionnaire. Table 26 gives the results for questions on child beneﬁt, in which respondents are categorized by occupation. • Depends on whether marginal totals are ﬁxed. 
Child Non-manual Manual No current Total beneﬁt is: occupation occupation occupation Too high 18 13 3 34 About right 29 40 13 82 Too low 10 26 11 47 Don’t know 15 14 8 37 Total 72 93 35 200 Table 26: Responses to a survey on child beneﬁt These data form an example of a 4 × 3 contingency table. (More generally, a r × c contingency table is a two-way table with r rows and c columns.) It is of interest to social scientists and policy makers to investigate whether an individual’s occupation is related to their feelings about child beneﬁt. Statistics 160 – Semester 2, 2003 301 Statistics 160 – Semester 2, 2003 302 Marginal Totals Fixed Totals Fixed in One Margin Suppose the data have been collected from an • If the data have been collected from a survey experiment (or a survey using stratiﬁed sampling) then neither row nor column marginal totals will so that be ﬁxed. • each row is a sample from a particular population; • In this case the appropriate hypotheses are • the marginal row totals are ﬁxed. H0 : No association between row and column variables In this situation the row variable is an explanatory and variable (or factor) and the column variable is a response. H1 : Association between row and column variables exists The hypotheses are H0 : The distribution of the response is the same for every level of the explanatory variable Example 11.7 From example 11.6 we will be interested in testing and H1 : The distribution of the response is not the H0 : No assoc. between occupation and opinion on c.b. same for every level of the explanatory variable against H1 : Assoc. between occupation and opinion on c.b. exists Statistics 160 – Semester 2, 2003 303 Statistics 160 – Semester 2, 2003 304 Example 11.8 11.2.3 Test Statistics and P-values A market research company surveys 50 employed, and 50 unemployed people, asking how they plan to Denote observed counts in a r × c table as follows. vote in the next federal election. The results are as follows. Column 1 2 ... 
\[
\begin{array}{c|cccc|c}
\text{Row} & 1 & 2 & \cdots & c & \text{Total} \\ \hline
1 & o_{11} & o_{12} & \cdots & o_{1c} & o_{1\cdot} \\
2 & o_{21} & o_{22} & \cdots & o_{2c} & o_{2\cdot} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
r & o_{r1} & o_{r2} & \cdots & o_{rc} & o_{r\cdot} \\ \hline
\text{Total} & o_{\cdot 1} & o_{\cdot 2} & \cdots & o_{\cdot c} & n
\end{array}
\]

Hence $o_{ij}$ is the observed count in cell $(i, j)$ of the table.

Expected Counts

Assuming $H_0$ is correct, the expected count in cell $(i, j)$ is

\[
e_{ij} = \frac{o_{i\cdot}\, o_{\cdot j}}{n}
       = \frac{\text{Row } i \text{ total} \times \text{Column } j \text{ total}}{\text{Grand total}}
\]

                 Labor   Liberal   Other   Total
    Employed        18        27       5      50
    Unemployed      29        17       4      50
    Total           47        44       9     100

Here employment status is an explanatory variable, and voting intention is a response. It may be of interest to test

$H_0$: The distribution of voting intentions is the same for employed and unemployed people

against

$H_1$: The distribution of voting intentions differs according to employment status.

Important: The test statistic, P-value etc. are calculated in the same way whether or not the row marginal totals are fixed.

Chi Square Test Statistic

The chi square test statistic is

\[
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}
\]

• Under the assumption that $H_0$ is correct, this test statistic has an (approximate) chi square distribution with $\nu = (r - 1)(c - 1)$ degrees of freedom.

• The test statistic is likely to take large values if $H_0$ is not correct.

P-Value

The P-value for an observed test statistic $\chi^2$ is given by

\[
P = P(X \ge \chi^2)
\]

where $X \sim \chi^2(\nu)$ and $\nu = (r - 1)(c - 1)$.

Example 11.9: Child Benefit and Occupation

Following on from Example 11.6, we are interested in testing

$H_0$: There is no association between occupation and opinion on child benefit (i.e. they are independent)

against

$H_1$: There is an association between occupation and opinion on child benefit.

Solution by Hand

Suppose $H_0$ is correct. Then the expected number of non-manual workers who think that the level of child benefit is too high will be $72 \times 34/200 = 12.24$. Carrying on in a similar fashion produces Table 27 of expected counts:

    Child           Non-manual   Manual       No current   Total
    benefit is:     occupation   occupation   occupation
    Too high            12.240       15.810        5.950      34
    About right         29.520       38.130       14.350      82
    Too low             16.920       21.855        8.225      47
    Don't know          13.320       17.205        6.475      37
    Total                   72           93           35     200

Table 27: Expected cell counts.

The contributions to the $\chi^2$ statistic from each cell, $(o - e)^2/e$, are

    Child           Non-manual   Manual       No current
    benefit is:     occupation   occupation   occupation
    Too high            2.7106       0.4994       1.4626
    About right         0.0092       0.0917       0.1270
    Too low             2.8302       0.7861       0.9362
    Don't know          0.2119       0.5970       0.3592

The entries in this table sum to give $\chi^2 = 10.6211$. If $H_0$ is correct, this statistic comes from a $\chi^2$ distribution with $(4 - 1) \times (3 - 1) = 6$ degrees of freedom. Using MINITAB,

\[
P(X \le 10.6211) = 0.8993
\]

where $X \sim \chi^2(6)$. Hence the P-value is

\[
P = P(X \ge 10.6211) = 1 - 0.8993 = 0.101 \quad \text{(3 d.p.)}
\]

Conclusion: There is insufficient evidence to conclude that an association between type of occupation and attitude towards level of child benefit exists.

Solution Using MINITAB

The first step is to enter the columns of the table into columns C1-C3 (say) of a new MINITAB Worksheet. The $\chi^2$ test can be carried out via the menu sequence Stat, Tables, Chisquare Test..., which provides a dialogue box asking for the worksheet columns containing the contingency table.

The output in the session window is as follows.

    Expected counts are printed below observed counts

                 C1       C2       C3    Total
        1        18       13        3       34
              12.24    15.81     5.95

        2        29       40       13       82
              29.52    38.13    14.35

        3        10       26       11       47
              16.92    21.85     8.23

        4        15       14        8       37
              13.32    17.20     6.47

    Total        72       93       35      200

    ChiSq = 2.711 + 0.499 + 1.463 +
            0.009 + 0.092 + 0.127 +
            2.830 + 0.786 + 0.936 +
            0.212 + 0.597 + 0.359 = 10.621
    df = 6, p = 0.101

Note that this output gives us the following information.

• Observed and expected counts for each cell.

• Observed chi square statistic ($\chi^2 = 10.621$).

• The degrees of freedom, $\nu = 6$.

• The P-value of $P = 0.101$.

Naturally our conclusion is exactly as above.

Example 11.10: Heart Attacks and Smoking in Danish Women

Recall from Example 9.4 that we had data on over 11,000 Danish women, each of whom was classified according to whether or not she smoked, and also according to whether or not she had suffered a heart attack. These data may be displayed in a 2 × 2 contingency table, as below.

                    Heart attack
                     No      Yes
    Smoker         6081      380
    Non-smoker     4879      132

We may analyse these data using a chi square test in the usual way. We can set up the hypotheses as

$H_0$: No association between heart attacks and smoking

and

$H_1$: Association between heart attacks and smoking.

Using MINITAB we get the following output:

    Chi-Square Test

    Expected counts are printed below observed counts

                 C1        C2     Total
        1      6081       380      6461
            6172.64    288.36

        2      4879       132      5011
            4787.36    223.64

    Total     10960       512     11472

    ChiSq = 1.361 + 29.125 +
            1.754 + 37.553 = 69.793
    df = 1, p = 0.0000

We can conclude that there is extremely strong evidence to support the existence of an association between having a heart attack and smoking.

We could have analysed these data in a different way, by comparing the proportions of women having heart attacks in the smoking and non-smoking groups. For the smokers, the proportion is $\hat{p}_S = 380/6461 = 0.0588$, and for the non-smokers the proportion is $\hat{p}_{NS} = 132/5011 = 0.0263$. For testing

$H_0 : p_S = p_{NS}$

against

$H_1 : p_S \ne p_{NS}$

(where $p_S$ is the population proportion of female smokers having heart attacks, and $p_{NS}$ the population proportion for non-smokers), we can use the techniques from chapter 9. The test statistic is

\[
z = \frac{0.0588 - 0.0263}{\sqrt{0.0446 \times (1 - 0.0446) \times (1/6461 + 1/5011)}} = 8.36
\]

where 0.0446 = 512/11472 is the pooled proportion of women having heart attacks. The P-value corresponding to this test statistic is $p < 0.00001$. Hence we again have extremely strong evidence against $H_0$, i.e. in favour of a difference in risk of heart attack for smokers and non-smokers.

Which Method is Better?

The chi square contingency table approach and the comparison of proportions approach are both perfectly valid. For a 2 × 2 table with a two-sided alternative they give the same P-value, because $z^2$ equals the $\chi^2$ statistic (here $8.36^2 \approx 69.9$, matching $\chi^2 = 69.793$ up to rounding). It is a matter of taste as to which method of analysis you prefer.

See also M&M Exercises 9.1, 9.7, 9.9, 9.21, 9.23
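The hand calculations in Examples 11.9 and 11.10 can also be checked outside MINITAB. The following is a minimal Python sketch, not part of the original notes: the function names are illustrative, and the upper-tail probability uses a standard closed-form expression for the chi square distribution with integer degrees of freedom rather than whatever routine MINITAB uses internally. It computes the expected counts, the test statistic and the P-value for a contingency table, and the pooled two-proportion z statistic for the 2 × 2 case.

```python
import math


def expected_counts(observed):
    """Expected counts under H0: e_ij = (row i total * column j total) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Return (sum of (o - e)^2 / e over all cells, degrees of freedom)."""
    expected = expected_counts(observed)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


def chi_square_p_value(stat, df):
    """Upper-tail probability P(X >= stat) for X ~ chi-square(df), integer df >= 1."""
    if stat <= 0:
        return 1.0
    half = stat / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite correction series.
    root = math.sqrt(stat)
    normal_tail = math.erfc(root / math.sqrt(2.0))
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= stat / (2 * r - 1)
        total += term
    return normal_tail + 2.0 * density * total


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2 (no continuity correction)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


# Example 11.9: rows are opinions, columns are occupation groups.
child_benefit = [[18, 13, 3], [29, 40, 13], [10, 26, 11], [15, 14, 8]]
stat, df = chi_square_statistic(child_benefit)
print(f"Example 11.9: chi-square = {stat:.3f}, df = {df}, "
      f"P = {chi_square_p_value(stat, df):.3f}")

# Example 11.10: rows are smoker / non-smoker, columns are no / yes heart attack.
heart_attack = [[6081, 380], [4879, 132]]
stat2, df2 = chi_square_statistic(heart_attack)
z = two_proportion_z(380, 6461, 132, 5011)
print(f"Example 11.10: chi-square = {stat2:.3f}, df = {df2}, "
      f"z = {z:.2f}, z^2 = {z * z:.3f}")
```

If the sketch is correct, the first line of output should agree with the MINITAB results for Example 11.9 (statistic about 10.621 on 6 degrees of freedom), and the second should show that the squared z statistic coincides with the chi square statistic for the 2 × 2 table.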
