
Statistics 371, Fall 2004

Comparing Two Groups: Two Independent Samples
Bret Larget
Department of Statistics
University of Wisconsin - Madison
October 18, 2004

Comparing Two Groups

• Chapter 7 describes two ways to compare two populations on the basis of independent samples: a confidence interval for the difference in population means and a hypothesis test.
• The basic structure of the confidence interval is the same as in the previous chapter: an estimate plus or minus a multiple of a standard error.
• Hypothesis testing will introduce several new concepts.

Setting

• Model two populations as buckets of numbered balls.
• The population means are µ1 and µ2, respectively.
• The population standard deviations are σ1 and σ2, respectively.
• We are interested in estimating µ1 - µ2 and in testing the hypothesis that µ1 = µ2.

Theory for Confidence Interval

The recipe for constructing a confidence interval for a single population mean is based on facts about the sampling distribution of the statistic

    T = (Ȳ - µ) / SE(Ȳ)

Similarly, the theory for confidence intervals for µ1 - µ2 is based on the sampling distribution of an analogous two-sample statistic.

Sampling Distributions

The sampling distribution of the difference in sample means has these characteristics.

• Mean: µ1 - µ2
• SD: sqrt(σ1^2/n1 + σ2^2/n2)
• Shape: exactly normal if both populations are normal; approximately normal if the populations are not normal but both sample sizes are sufficiently large.

Pooled Standard Error

If we wish to assume that the two population standard deviations are equal, σ1 = σ2, then it makes sense to use data from both samples to estimate the common population standard deviation. We estimate the common population variance with a weighted average of the sample variances, weighted by the degrees of freedom:

    s_pooled^2 = ((n1 - 1)s1^2 + (n2 - 1)s2^2) / (n1 + n2 - 2)

The pooled standard error is then

    SE_pooled = s_pooled * sqrt(1/n1 + 1/n2)
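The pooled variance and pooled standard error defined above can be checked with a short Python sketch; the sample values here are made up for illustration.

```python
import math

def pooled_se(s1, n1, s2, n2):
    # Common variance: weighted average of the sample variances,
    # weighted by their degrees of freedom
    var_pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    # Pooled standard error of the difference in sample means
    return math.sqrt(var_pooled) * math.sqrt(1 / n1 + 1 / n2)

# With equal sds and equal sample sizes this reduces to s * sqrt(2/n)
print(round(pooled_se(4.0, 10, 4.0, 10), 4))  # 1.7889
```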
Theory for Confidence Interval (continued)

The two-sample statistic is

    T = ((Ȳ1 - Ȳ2) - (µ1 - µ2)) / SE(Ȳ1 - Ȳ2)

where we standardize by subtracting the mean and dividing by the standard deviation of the sampling distribution. (Recall the setting: population 1 has mean µ1 and sd σ1 and yields a sample y_1^(1), ..., y_n1^(1) with mean ȳ1 and sd s1; population 2 has mean µ2 and sd σ2 and yields a sample y_1^(2), ..., y_n2^(2) with mean ȳ2 and sd s2.)

If both populations are normal and if we know the population standard deviations, then

    Pr( -1.96 ≤ ((Ȳ1 - Ȳ2) - (µ1 - µ2)) / sqrt(σ1^2/n1 + σ2^2/n2) ≤ 1.96 ) = 0.95

where we can choose a z other than 1.96 for different confidence levels. This statement is true because the expression in the middle has a standard normal distribution.

But in practice, we don't know the population standard deviations. If we substitute in sample estimates instead, we get

    Pr( -t ≤ ((Ȳ1 - Ȳ2) - (µ1 - µ2)) / sqrt(s1^2/n1 + s2^2/n2) ≤ t ) = 0.95

where we need to choose different endpoints to account for the additional randomness in the denominator. It turns out that the sampling distribution of this statistic is approximately a t distribution whose degrees of freedom should be estimated from the data as well. Algebraic manipulation leads to the confidence interval expression on the next slide.

Standard Error of ȳ1 - ȳ2

The standard error of the difference in two sample means is an empirical measure of how far the difference in sample means will typically be from the difference in the respective population means:

    SE(ȳ1 - ȳ2) = sqrt(s1^2/n1 + s2^2/n2)

An alternative formula is

    SE(ȳ1 - ȳ2) = sqrt( (SE(ȳ1))^2 + (SE(ȳ2))^2 )

This formula reminds us of how to find the length of the hypotenuse of a right triangle. (Variances add, but standard deviations don't.)

Example Using R: Exercise 7.21

This exercise examines the growth of bean plants under red and green light. A 95% confidence interval is part of the output below.
Theory for Confidence Interval (continued)

    Pr( (Ȳ1 - Ȳ2) - t * sqrt(s1^2/n1 + s2^2/n2) ≤ µ1 - µ2 ≤ (Ȳ1 - Ȳ2) + t * sqrt(s1^2/n1 + s2^2/n2) ) = 0.95

We use a t multiplier so that the area between -t and t under a t distribution with the estimated degrees of freedom will be 0.95.

> ex7.21 = read.table("lights.txt", header = T)
> str(ex7.21)
'data.frame': 42 obs. of 2 variables:
 $ height: num 8.4 8.4 10 8.8 7.1 9.4 8.8 4.3 9 8.4 ...
 $ color : Factor w/ 2 levels "green","red": 2 2 2 2 2 2 2 2 2 2 ...
> attach(ex7.21)
> t.test(height ~ color)

        Welch Two Sample t-test

data:  height by color
t = 1.1432, df = 38.019, p-value = 0.2601
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.4479687  1.6103216
sample estimates:
mean in group green   mean in group red
           8.940000            8.358824

Confidence Interval for µ1 - µ2

The confidence interval for a difference in population means has the same structure as that for a single population mean:

    (Estimate) ± (t Multiplier) × SE

The only difference is that for this more complicated setting, we have more complicated formulas for the standard error and the degrees of freedom. Here is the df formula:

    df = (SE1^2 + SE2^2)^2 / ( SE1^4/(n1 - 1) + SE2^4/(n2 - 1) )

where SEi = si/sqrt(ni) for i = 1, 2. As a check, the value is often close to n1 + n2 - 2. (This will be exact if s1 = s2 and if n1 = n2.)

Example Assuming Equal Variances

For the same data, were we to assume that the population variances were equal, the degrees of freedom, the standard error, and the confidence interval would all be slightly different.

> t.test(height ~ color, var.equal = T)

        Two Sample t-test

data:  height by color
t = 1.1064, df = 40, p-value = 0.2752
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.4804523  1.6428053
sample estimates:
mean in group green   mean in group red
           8.940000            8.358824
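The remark that the messy df formula is exactly n1 + n2 - 2 when s1 = s2 and n1 = n2 is easy to verify numerically; a Python sketch with illustrative numbers:

```python
def welch_df(s1, n1, s2, n2):
    # The "messy" degrees of freedom formula, with SEi^2 = si^2/ni
    se1sq = s1**2 / n1
    se2sq = s2**2 / n2
    return (se1sq + se2sq)**2 / (se1sq**2 / (n1 - 1) + se2sq**2 / (n2 - 1))

# Equal sds and equal sample sizes: df is exactly n1 + n2 - 2 = 18
print(welch_df(3.0, 10, 3.0, 10))  # 18.0, up to floating-point rounding
```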
The value from the messy formula will always be between the smaller of n1 - 1 and n2 - 1 and n1 + n2 - 2.

Hypothesis Tests

• Hypothesis tests are an alternative approach to statistical inference.
• Unlike confidence intervals, where the goal is estimation with an assessment of the likely precision of the estimate, the goal of hypothesis testing is to ascertain whether or not the data are consistent with what we might expect to see assuming that a hypothesis is true.
• The logic of hypothesis testing is a probabilistic form of proof by contradiction.
• In logic, if we can show that a proposition H leads to a contradiction, then we have proved H false and have proved {not H} to be true.
• In hypothesis testing, if the observed data are highly unlikely under an assumed hypothesis H, then there is strong (but not definitive) evidence that the hypothesis is false.

Logic of Hypothesis Tests

All of the hypothesis tests we will see this semester fall into this general framework.

1. State a null hypothesis and an alternative hypothesis.
2. Gather data and compute a test statistic.

Example: Exercise 7.12

In this example, subjects with high blood pressure are randomly allocated to two treatments. The biofeedback group receives relaxation training aided by biofeedback and meditation over eight weeks. The control group does not. The reduction in systolic blood pressure is tabulated here.

          Biofeedback   Control
    n          99          93
    ȳ        13.8         4.0
    SE       1.34        1.30

For 190 degrees of freedom (which come from both the simple and the messy formulas) the table says to use 1.977 (190 is rounded down to the 140 row), whereas with R you find 1.973. A calculator or R can compute the margin of error.

> se = sqrt(1.34^2 + 1.3^2)
> tmult = qt(0.975, 190)
> me = round(tmult * se, 1)
> se
[1] 1.866976
> tmult
[1] 1.972528
> me
[1] 3.7
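The same margin-of-error arithmetic can be mirrored in Python; the t multiplier 1.972528 is taken from the qt(0.975, 190) output above, since computing t quantiles needs a statistics library.

```python
import math

# Exercise 7.12: reduction in systolic blood pressure
se = math.sqrt(1.34**2 + 1.30**2)  # SE of the difference in sample means
tmult = 1.972528                   # qt(0.975, 190), from the R output above
me = round(tmult * se, 1)          # margin of error
diff = 13.8 - 4.0                  # biofeedback mean minus control mean

print(round(se, 6), me)                          # 1.866976 3.7
print(round(diff - me, 1), round(diff + me, 1))  # 6.1 13.5, the 95% CI endpoints
```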
Logic of Hypothesis Tests (continued)

3. Consider the sampling distribution of the test statistic assuming that the null hypothesis is true.
4. Compute a p-value, a measure of how consistent the data are with the null hypothesis in consideration of a specific alternative hypothesis.
5. Assess the strength of the evidence against the null hypothesis in the context of the problem.

We will introduce all of these concepts in the setting of testing the equality of two population means, but the general ideas will reappear in many settings throughout the remainder of the semester.

Example: Interpreting the Confidence Interval (Exercise 7.12)

We are 95% confident that the mean reduction in systolic blood pressure due to the biofeedback treatment, in a population of individuals similar to those in this study, would be between 6.1 and 13.5 mm more than the mean reduction in the same population undergoing the control treatment.

Wisconsin Fast Plants Example

• In an experiment, seven Wisconsin Fast Plants (Brassica campestris) were grown with a treatment of Ancymidol (ancy) and eight control plants were given ordinary water.
• The null hypothesis is that the treatment has no effect on plant growth (as measured by the height of the plant after 14 days of growth).

Example: Calculate a Test Statistic (preview)

If the population means are equal, their difference is zero. The test statistic computed below tells us that the actual observed difference in sample means is 1.99 standard errors away from zero.

Example: Find the Sampling Distribution

The sampling distribution of the test statistic is a t distribution with degrees of freedom calculated by the messy formula. This useful R code computes it. If you type this in and save your workspace at the end of a session, you can use it again in the future.

> getDF = function(s1, n1, s2, n2) {
+     se1 = s1/sqrt(n1)
+     se2 = s2/sqrt(n2)
+     return((se1^2 + se2^2)^2/(se1^4/(n1 - 1) + se2^4/(n2 - 1)))
+ }
> getDF(4.8, 8, 4.7, 7)
[1] 12.80635

Wisconsin Fast Plants Example (continued)

• The alternative hypothesis is that the treatment has an effect which would result in different mean growth amounts.
• A summary of the sample data is as follows. The eight control plants had a mean growth of 15.9 cm and standard deviation 4.8 cm. The seven ancy plants had a mean growth of 11.0 cm and standard deviation 4.7 cm.
• The question is: is it reasonable to think that the observed difference in sample means of 4.9 cm is due to chance variation alone, or is there evidence that some of the difference is due to the ancy treatment?

Example: State Hypotheses

Let µ1 be the population mean growth with the control conditions and let µ2 be the population mean with ancy. The null and alternative hypotheses are expressed as

    H0: µ1 = µ2
    HA: µ1 ≠ µ2

We state statistical hypotheses as statements about population parameters.

Example: Compute a P-Value

To describe how likely it is to see such a test statistic, we ask: what is the probability that chance alone would result in a test statistic at least this far from zero? The answer is the area below -1.99 and above 1.99 under a t density curve with 12.8 degrees of freedom.

With the t-table, we can only bracket this p-value within a range. If we round down to 12 df, the t statistic 1.99 is bracketed between the tabled values 1.912 and 2.076, so the area to the right of 1.99 is between 0.03 and 0.04. The p-value in this problem is twice as large because we need to include as well the area to the left of -1.99. So, 0.06 < p < 0.08. With R, we can be more precise.
> p = 2 * pt(-ts, getDF(4.8, 8, 4.7, 7))
> p
[1] 0.06783269

Example: Interpreting a P-Value

The smaller the p-value, the more inconsistent the data are with the null hypothesis, and the stronger the evidence is against the null hypothesis in favor of the alternative.

Traditionally, people have measured statistical significance by comparing a p-value with arbitrary significance levels such as α = 0.05. The phrase "statistically significant at the 5% level" means that the p-value is smaller than 0.05. In reporting results, it is best to report an actual p-value and not simply a statement about whether or not it is "statistically significant".

Example: Calculate a Test Statistic

In the setting of a difference between two independent sample means, our test statistic is

    t = ((ȳ1 - ȳ2) - (µ1 - µ2)) / sqrt(s1^2/n1 + s2^2/n2)

(Your book adds a subscript, ts, to remind you that this is computed from the sample.) For the data, we find this.

> se = sqrt(4.8^2/8 + 4.7^2/7)
> se
[1] 2.456769
> ts = (15.9 - 11)/se
> ts
[1] 1.994489

The standard error tells us that we would expect the observed difference in sample means to typically differ from the difference in population means by about 2.5 cm.

Type I and Type II Errors

There are two possible decision errors.

• Rejecting a true null hypothesis is a Type I error.
• You can interpret α = Pr{rejecting H0 | H0 is true}, so α is the probability of a Type I error. (You cannot make a Type I error when the null hypothesis is false.)
• Not rejecting a false null hypothesis is a Type II error.

Example: Summarizing the Results

For this example, I might summarize the results as follows.

    There is slight evidence (p = 0.068, two-sided independent sample t-test) that there is a difference in the mean height at 14 days between Wisconsin Fast Plants grown with ordinary water and those grown with Ancymidol.
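The test statistic and the messy degrees of freedom computed above can be cross-checked with a short stdlib-only Python sketch.

```python
import math

# Wisconsin Fast Plants: control n = 8, mean 15.9, sd 4.8; ancy n = 7, mean 11.0, sd 4.7
se1 = 4.8 / math.sqrt(8)   # SE of the control sample mean
se2 = 4.7 / math.sqrt(7)   # SE of the ancy sample mean

se = math.sqrt(se1**2 + se2**2)                        # "hypotenuse" form of the SE
ts = (15.9 - 11.0) / se                                # test statistic
df = (se1**2 + se2**2)**2 / (se1**4 / 7 + se2**4 / 6)  # messy df formula

# Sanity check: the hypotenuse form equals the direct formula
assert abs(se - math.sqrt(4.8**2 / 8 + 4.7**2 / 7)) < 1e-12

print(se, ts, df)  # matches the R values 2.456769, 1.994489, 12.80635 to printed precision
```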
Type I and Type II Errors (continued)

• It is conventional to use β as the probability of a Type II error: β = Pr{not rejecting H0 | H0 is false}. If the null hypothesis is false, one of the many possible alternative hypotheses is true, so it is typical to calculate β separately for each possible alternative. (In this setting, for each value of µ1 - µ2.)
• Power is the probability of rejecting a false null hypothesis: Power = 1 - β.

Generally speaking, a confidence interval is more informative than a p-value because it estimates a difference in the units of the problem, which allows a reader with background knowledge in the subject area to assess both the statistical significance and the practical importance of the observed difference. In contrast, a hypothesis test examines statistical significance alone.

More on P-Values

Another way to think about p-values is to recognize that they depend on the values of the data, and so are random variables. Let P be the p-value from a test.

• If the null hypothesis is true, then P is a random variable distributed uniformly between 0 and 1.
• In other words, the probability density of P is a flat rectangle.
• Notice that this implies that Pr{P < c} = c for any number c between 0 and 1. If the null is true, there is a 5% probability that P is less than 0.05, a 1% probability that P is less than 0.01, and so on.

Rejection Regions

Suppose that we were asked to make a decision about a hypothesis based on data. We may decide, for example, to reject the null hypothesis if the p-value were smaller than 0.05 and to not reject the null hypothesis if the p-value were larger than 0.05.

This procedure has a significance level of 0.05, which means that if we follow the rule, there is a probability of 0.05 of rejecting a true null hypothesis. (We would need further assumptions to calculate the probability of not rejecting a false null hypothesis.)
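The claim that P is uniform under the null can be checked by simulation. This stdlib-only Python sketch uses a z statistic with a known σ (so the normal CDF from math.erf suffices) rather than the t statistic used in the lecture; the rejection rate at the 5% level should come out near 5%.

```python
import math
import random

def normal_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

random.seed(1)
sigma, n, trials = 4.8, 20, 2000
rejections = 0
for _ in range(trials):
    # Both samples drawn from the same population, so H0 is true
    y1 = [random.gauss(0, sigma) for _ in range(n)]
    y2 = [random.gauss(0, sigma) for _ in range(n)]
    z = (sum(y1) / n - sum(y2) / n) / (sigma * math.sqrt(2 / n))
    p = 2 * (1 - normal_cdf(abs(z)))  # two-sided p-value
    if p < 0.05:
        rejections += 1

print(rejections / trials)  # close to 0.05, as the uniformity of P predicts
```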
More on P-Values (continued)

• On the other hand, if the alternative hypothesis is true, then the distribution of P will not be uniform and instead will be shifted toward zero.

Rejection Regions (continued)

Rejecting the null hypothesis occurs precisely when the test statistic falls into a rejection region, in this case either the upper or the lower 2.5% tail of the sampling distribution.

Relationship between t Tests and Confidence Intervals

The rejection region corresponds exactly to the test statistics for which a 95% confidence interval does not contain 0. That is, we would reject the null hypothesis H0: µ1 - µ2 = 0 versus the two-sided alternative at the α = 0.05 level of significance if and only if a 95% confidence interval for µ1 - µ2 does not contain 0. We could make similar statements for general α and a (1 - α) × 100% confidence interval.

Simulation

We can explore these statements with a simulation based on the Wisconsin Fast Plants example. The first histogram shows p-values from 10,000 samples where µ1 - µ2 = 0, while the second assumes that µ1 - µ2 = 5. Both simulations use σ1 = σ2 = 4.8, but the calculation of the p-value does not.

[Figure: two histograms of simulated p-values. "Sampling Dist of P under Null" is approximately flat on (0, 1); "Sampling Dist of P under Alt." is shifted toward zero.]

Comparing α and P-Values

• In this setting, the significance level α and p-values are both areas under t curves, but they are not the same thing.
• The significance level is a prespecified, arbitrary value that does not depend on the data.
• The p-value depends on the data.

More P-Value Interpretations

A verbal definition of a p-value is as follows.

    The p-value of the data is the probability, calculated assuming that the null hypothesis is true, of obtaining a test statistic that deviates from what is expected under the null (in the direction of the alternative hypothesis) at least as much as the actual data does.
The p-value is not the probability that the null hypothesis is true. Interpreting the p-value in this way will mislead you!

Comparing α and P-Values (continued)

• If a decision rule is to reject the null hypothesis when the test statistic is in a rejection region, this is equivalent to rejecting the null hypothesis when the p-value is less than the significance level α.

Exercise 7.54

Calculate a test statistic.

> ts = (31.96 - 25.32)/sqrt(12.05^2/25 + 13.78^2/25)
> ts
[1] 1.813664

Find the null sampling distribution. The book reports a t distribution with 47.2 degrees of freedom. We can check this.

> degf = getDF(12.05, 25, 13.78, 25)
> degf
[1] 47.16131

Compute a (one-sided) p-value.

> p = 1 - pt(ts, degf)
> p
[1] 0.03804753

Example for P-Value Interpretation

In a medical testing setting, we may want a procedure that indicates when a subject has a disease. We can think of the decision "healthy" as corresponding to a null hypothesis and the decision "ill" as corresponding to the alternative hypothesis.

Consider now a situation where 1% of a population has a disease. Suppose that a test has an 80% chance of detecting the disease when a person has the disease (so the power of the test is 80%) and that the test has a 95% chance of correctly saying the person does not have the disease when the person does not (so there is a 5% chance of a false positive, or falsely rejecting the null).
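Turning these rates into counts for a hypothetical population of 100,000 people takes only a few lines of arithmetic.

```python
population = 100_000
ill = round(0.01 * population)           # 1% have the disease
healthy = population - ill

true_positives = round(0.80 * ill)       # 80% of the ill test positive (power)
false_positives = round(0.05 * healthy)  # 5% of the healthy test positive

positives = true_positives + false_positives
fraction_healthy = false_positives / positives

print(positives)                   # 5750
print(round(fraction_healthy, 2))  # 0.86: most positive results come from healthy people
```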
Example (continued)

Here is a table of the results in a hypothetical population of 100,000 people.

                                  True Situation
                             Healthy          Ill
                          (H0 is true)   (H0 is false)    Total
    Negative (do not
      reject H0)              94,050           200        94,250
    Positive (reject H0)       4,950           800         5,750
    Total                     99,000         1,000       100,000

Notice that of the 5,750 times H0 was rejected (so that the test indicated illness), the person was actually healthy 4,950/5,750 = 86% of the time!

A rule that rejects H0 when the p-value is less than 5% only rejects 5% of the true null hypotheses, but this can be a large proportion of the total number of rejected hypotheses when false null hypotheses occur rarely.

Exercise 7.54 (continued)

Summarize the results.

    There is fairly strong evidence that the drug would provide more pain relief than the placebo on average for a population of women similar to those in this study (p = 0.038, one-sided independent sample t-test).

Notice that this result is "statistically significant at the 5% level" because the p-value is less than 0.05. For a two-sided test, the p-value would be twice as large, and not significant at the 5% level.

Validity of t Methods

• All of the methods seen so far are formally based on the assumption that the populations are normal.
• In practice, they are valid as long as the sampling distribution of the difference in sample means is approximately normal, which occurs when the sample sizes are large enough (justified by the Central Limit Theorem).
• Specifically, we need the sampling distribution of the test statistic to have an approximate t distribution.

One-tailed Tests

• Often, we are interested not only in demonstrating that two population means are different, but in demonstrating that the difference is in a particular direction.
• Instead of the two-sided alternative µ1 ≠ µ2, we would choose one of two possible one-sided alternatives, µ1 < µ2 or µ1 > µ2.
• For the alternative hypothesis HA: µ1 < µ2, the p-value is the area to the left of the test statistic.
• For the alternative hypothesis HA: µ1 > µ2, the p-value is the area to the right of the test statistic.
• If the test statistic is in the direction of the alternative hypothesis, the p-value from a one-sided test will be half the p-value of a two-sided test.

Validity of t Methods (continued)

• But what if the sample sizes are small and the samples indicate non-normality in the populations?
• One approach is to transform the data, often by taking logarithms, so that the transformed distribution is approximately normal.
• The textbook suggests a nonparametric method called the Wilcoxon-Mann-Whitney test that is based on converting the data to ranks.
• I will show an alternative called a permutation test.

Permutation Tests

• The idea of a permutation test in this setting is quite straightforward.
• We begin by computing the difference in sample means for the two samples of sizes n1 and n2.
• Now, imagine taking the group labels, mixing them up (permuting them), and then assigning them at random to the observations. We could then again calculate a difference in sample means.
• Next, imagine doing this process over and over and collecting the permutation sampling distribution of the difference in sample means.
• If the difference in sample means for the actual grouping of the data is atypical as compared to the differences from random groupings, this indicates evidence that the actual grouping is associated with the measured variable.

Exercise 7.54

The following data come from an experiment to test the efficacy of a drug to reduce pain in women after childbirth. Possible pain relief scores vary from 0 (no relief) to 56 (complete relief).

                Pain Relief Score
    Treatment    n    mean     sd
    Drug        25   31.96  12.05
    Placebo     25   25.32  13.78

State hypotheses. Let µ1 be the population mean score for the drug and µ2 be the population mean score for the placebo.
    H0: µ1 = µ2
    HA: µ1 > µ2

Permutation Tests (continued)

• The p-value would be the proportion of random relabellings with sample mean differences at least as extreme as that from the original groups.
• With very small samples, it is possible to enumerate all possible ways to divide the n1 + n2 total observations into groups of size n1 and n2.
• An R function can carry out a permutation test.

Example

Soil cores were taken from two areas: an area under an opening in a forest canopy (the gap) and a nearby area under heavy tree growth (the growth). The amount of carbon dioxide given off by each soil core was measured (in mol CO2/g soil/hr).

> growth = c(17, 20, 170, 315, 22, 190, 64)
> gap = c(22, 29, 13, 16, 15, 18, 14, 6)
> boxplot(list(growth = growth, gap = gap))

[Figure: side-by-side boxplots of the growth and gap measurements; the growth values are larger and far more variable.]

Example: Permutation Test in R

> library(exactRankTests)
> perm.test(growth, gap)

        2-sample Permutation Test

data:  growth and gap
T = 798, p-value = 0.006371
alternative hypothesis: true mu is not equal to 0

There is very strong evidence (p = 0.0064, two-sample permutation test) that the soil respiration rates are different in the gap and growth areas.
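With 15 observations there are only C(15, 7) = 6435 ways to relabel the groups, so the permutation distribution can be enumerated exactly. A stdlib-only Python sketch using the difference in means as the statistic; perm.test above reports p = 0.006371, and this version should agree closely, though two-sided tail conventions can differ slightly between implementations.

```python
from itertools import combinations

# Soil respiration data from the example above
growth = [17, 20, 170, 315, 22, 190, 64]
gap = [22, 29, 13, 16, 15, 18, 14, 6]

pooled = growth + gap
n1, n = len(growth), len(growth) + len(gap)
total = sum(pooled)
observed = sum(growth) / n1 - sum(gap) / (n - n1)

count = splits = 0
# Enumerate every way to choose which 7 observations form the "growth" group
for idx in combinations(range(n), n1):
    g1 = sum(pooled[i] for i in idx)
    diff = g1 / n1 - (total - g1) / (n - n1)
    if abs(diff) >= abs(observed):
        count += 1
    splits += 1

p = count / splits
print(splits, round(p, 4))  # 6435 relabellings; p is small (around 0.006)
```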
