Posted on: 2/24/2013. Public Domain.
Everything about Single Proportions
STA 570 401-402 Spring 2010

Things we’ve learned about hypothesis testing
1) Planning a hypothesis test
– For fixed α and a required power at a fixed alternative value, we can determine the required sample size n.
2) Planning or analysis phase of a hypothesis test
– For fixed n and α, we can compute the cutoff for the hypothesis test (where to reject).
– For fixed n and α, we can compute the power at a fixed alternative value.
3) Analysis phase of a hypothesis test (i.e. we have data)
– Determine whether or not to reject H0.
– Compute the p-value of the test.

Things we’ve learned about confidence intervals
1) Prior to collecting data
– We can determine the sample size required to achieve a particular confidence interval length.
2) After collecting data
– We can construct a confidence interval from the data.

Information common to all hypothesis tests we have studied or will study
The null hypothesis states that a parameter is equal to a specific value, while the alternative states that the parameter is “<”, “>”, or “≠” that specific value.
The cutoff of the hypothesis test is based on the null distribution, which is the sampling distribution when H0 is true.
The cutoff lies in the direction(s) of the alternative hypothesis.
Cutoff(s)
– α percentile for a “<” alternative (reject below)
– 1-α percentile for a “>” alternative (reject above)
– both the α/2 and 1-(α/2) percentiles for a “≠” alternative (reject at either extreme)

More common information
Finding the power of the test requires the alternative distribution, which is the sampling distribution at a fixed point in the alternative hypothesis.
The power of the test is the probability of rejecting the null hypothesis when the alternative hypothesis is true. Rejecting is the right answer in that case, so we want high power.

More common information
To compute a sample size in advance of the experiment, we want to find a sample size which simultaneously achieves a given α and power.
We need both the null and alternative distributions in terms of an unknown n.
We then solve an equation equating percentiles of the null and alternative distributions, based on the direction of the alternative.

Which percentiles to equate?
To find the required sample size in advance, where POW is the desired power:
– “<” alternative: equate the α percentile of the null to the POW percentile of the alternative.
– “>” alternative: equate the 1-α percentile of the null to the 1-POW percentile of the alternative.
– “≠” alternative with alternative value less than null value: equate the α/2 percentile of the null to the POW percentile of the alternative.
– “≠” alternative with alternative value greater than null value: equate the 1-(α/2) percentile of the null to the 1-POW percentile of the alternative.
When we reject above the cutoff, the power is the alternative probability above the cutoff, which is why the cutoff is the 1-POW percentile in those cases.

Common information about p-values
p-values are computed from the data.
For all α > p-value, reject the null hypothesis H0. For all α < p-value, do not reject H0.
Way to remember: small p-values typically result in rejecting H0.
To compute a p-value, you need to compute a probability involving the null distribution, in the direction of the alternative hypothesis.

Computing p-values
To compute a p-value, compute the following probabilities under the null distribution.
– “<” alternative: compute the probability below the data.
– “>” alternative: compute the probability above the data.
– “≠” alternative, data below the null value: compute the probability below the data AND double the result.
– “≠” alternative, data above the null value: compute the probability above the data AND double the result.

Common information about confidence intervals
For STA570 (there are more general situations we do not consider) a confidence interval is centered at the best point estimate available.
The width of the confidence interval is the width needed to contain the middle 1-α of the sampling distribution.
If that width itself depends on the parameter (e.g. p(1-p) involves the parameter p), then plug in the estimate of the parameter when determining the width.
To determine a sample size in advance, set up an equation relating the length of the interval to n, then solve for n.

Inference for Proportions
All inference for proportions is based on drawing a random sample of size n from a large population (remember n>30, and we assume we are sampling less than 5% of the population).
We estimate the population proportion p with the sample proportion phat.
While phat is usually not exactly equal to p, it varies in a close range around p defined by the sampling distribution
phat ~ N(p, sqrt(p(1-p)/n))
where the second argument is the standard deviation.

Conducting a hypothesis test
We have a null hypothesis that specifies a single value for p, H0 : p=p0.
We are testing against an alternative H1 : p<p0, H1 : p>p0, or H1 : p≠p0.
The null distribution is N(p0, sqrt(p0(1-p0)/n))
Where useful, the alternative distribution is N(p1, sqrt(p1(1-p1)/n)) for a fixed p1.

Determining sample sizes for hypothesis tests
Let s0=sqrt(p0(1-p0)) and s1=sqrt(p1(1-p1)).
Let z0 and z1 be the Z values corresponding to the appropriate percentiles from the null and alternative distributions.
Equating the two expressions for the cutoff, p0 + z0 s0/sqrt(n) = p1 + z1 s1/sqrt(n), and solving for n shows that the minimum required sample size, for all cases, is
$$ n = \left( \frac{z_1 s_1 - z_0 s_0}{p_0 - p_1} \right)^2 $$
rounded up to the next whole number.

Confidence Intervals
A confidence interval takes the observed value of phat and computes a range of possible values for the population proportion p.
The “confidence level” of this interval refers to the probability the procedure will produce an interval containing the population proportion.
Here α is 1 minus the confidence level (e.g. 99% confidence results in α=0.01).
Let z* be the Z-score corresponding to the 1-(α/2) percentile. Note that the range from Z=-z* to Z=+z* contains probability equal to the confidence level.

Confidence Intervals
A confidence interval for p is
phat ± z* sqrt(phat(1-phat)/n)
Note we have estimated the standard deviation using phat. This does not cause problems for the sample sizes we are considering (n>30).
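As a minimal sketch of the interval phat ± z* sqrt(phat(1-phat)/n), the Python below uses only the standard library (`statistics.NormalDist`, Python 3.8 or later). The function name `proportion_confidence_interval` and the data (120 successes out of 400) are hypothetical and chosen purely for illustration; they are not part of the course material.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Return (lower, upper) for phat +/- z* sqrt(phat(1-phat)/n)."""
    phat = successes / n
    alpha = 1 - confidence
    # z* is the 1-(alpha/2) percentile of the standard normal distribution.
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    # The standard deviation is estimated by plugging phat in for p.
    half_width = z_star * sqrt(phat * (1 - phat) / n)
    return phat - half_width, phat + half_width


# Hypothetical data: 120 successes in a sample of 400 (so phat = 0.3).
lower, upper = proportion_confidence_interval(120, 400, confidence=0.95)
print(f"95% CI for p: ({lower:.4f}, {upper:.4f})")
```

The interval is symmetric about phat by construction, and raising the confidence level increases z* and therefore widens the interval.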
Note the length of this confidence interval is 2 z* sqrt(phat(1-phat)/n)

Computing sample sizes for confidence intervals
Recall the length of the confidence interval is 2 z* sqrt(phat(1-phat)/n)
To make sure this is no greater than a prespecified length L, you need
$$ n \ge \frac{4\,\hat{p}(1-\hat{p})\,(z^*)^2}{L^2} $$
If you have a guess of phat in advance, use it. Otherwise, use phat=0.5, which maximizes phat(1-phat) and so protects you against all possible values of phat simultaneously.

Common threads
We make repeated use of the sampling distribution (through the null distribution, the alternative distribution, and the form of the confidence interval).
Different situations have different sampling distributions, but the way we use them will remain the same. We will still want particular percentiles of the null and alternative distributions, for example.
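To illustrate how one sampling distribution drives every calculation, here is a hedged Python sketch for a “>” alternative using the standard library's `statistics.NormalDist` (Python 3.8 or later). All numbers (p0=0.5, p1=0.6, α=0.05, n=200, observed phat=0.57, target power 0.90, length L=0.1) are hypothetical, and `sampling_dist` is a helper name introduced here. The test sample size uses n = ((z1 s1 - z0 s0)/(p0 - p1))^2 with z0 the 1-α percentile and z1 the 1-POW percentile, rounded up.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sampling_dist(p, n):
    """Normal approximation to the sampling distribution of phat."""
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))


# Hypothetical setup: H0: p = 0.5 against H1: p > 0.5.
p0, p1, alpha, n = 0.5, 0.6, 0.05, 200
null = sampling_dist(p0, n)  # null distribution
alt = sampling_dist(p1, n)   # alternative distribution at p1

# Cutoff: 1-alpha percentile of the null distribution (reject above).
cutoff = null.inv_cdf(1 - alpha)

# Power: probability above the cutoff under the alternative distribution.
power = 1 - alt.cdf(cutoff)

# p-value for hypothetical data phat = 0.57: null probability above the data.
phat = 0.57
p_value = 1 - null.cdf(phat)

# Sample size for the test at a hypothetical target power of 0.90.
target_power = 0.90
z0 = NormalDist().inv_cdf(1 - alpha)         # 1-alpha percentile (null)
z1 = NormalDist().inv_cdf(1 - target_power)  # 1-POW percentile (alternative)
s0, s1 = sqrt(p0 * (1 - p0)), sqrt(p1 * (1 - p1))
n_test = ceil(((z1 * s1 - z0 * s0) / (p0 - p1)) ** 2)

# Sample size for a 95% confidence interval of length at most L,
# using the conservative guess phat = 0.5.
L, confidence = 0.1, 0.95
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
n_ci = ceil(4 * 0.5 * (1 - 0.5) * z_star ** 2 / L ** 2)

print(f"cutoff={cutoff:.4f} power={power:.4f} p-value={p_value:.4f}")
print(f"n for test={n_test} n for CI={n_ci}")
```

For a “<” or “≠” alternative, only the percentiles passed to `inv_cdf` and the tail used for the probabilities would change; the distributions themselves are built the same way.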