Posted on 2/24/2013. Public Domain.
Everything about Single Proportions
STA 570 001-002, Spring 2008

Inference for Proportions
All inference for proportions is based on drawing a random sample of size n from a large population (remember n > 30, and we assume we are sampling less than 5% of the population). We estimate the population proportion p with the sample proportion phat. While phat is usually not exactly equal to p, it varies in a close range around p, as described by the sampling distribution phat ~ N(p, sqrt(p(1-p)/n)).

Things we’ve learned about proportions
We now know how to
1) Conduct a hypothesis test
2) Compute power for a hypothesis test
3) Compute the p-value for a hypothesis test
4) Determine the sample size for a hypothesis test
5) Construct a confidence interval
6) Determine a sample size in advance for a confidence interval

Conducting a hypothesis test
We have a null hypothesis that specifies a single value for p, H0: p = p0. We test it against one of the alternatives H1: p < p0, H1: p > p0, or H1: p ≠ p0. The null distribution is N(p0, sqrt(p0(1-p0)/n)). The cutoff depends on the alternative:
For “<”, reject for phat below the α percentile of the null distribution.
For “>”, reject for phat above the 1-α percentile of the null distribution.
For “≠”, reject for phat below the α/2 percentile or above the 1-(α/2) percentile of the null distribution.

Computing power for a hypothesis test
Power is the probability of rejecting the null hypothesis when H1 is true (the right decision). For composite alternatives, we compute power at a particular point in the alternative (points near the null have power near α; points far from the null have power near 1). So, given a point p1 in the alternative:
1) Form the alternative distribution N(p1, sqrt(p1(1-p1)/n)).
2) Find the region where you reject H0 (using the null distribution as before).
3) Compute the probability the alternative distribution places in that rejection region. That probability is the power.
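The rejection rules and the three power steps can be sketched in a few lines of Python. This is a minimal sketch, assuming Python 3.8+ (for the standard library's `statistics.NormalDist`) and the normal approximation used throughout these notes. The function names `rejection_cutoffs`, `reject_h0`, and `power`, and the encoding of the alternatives as the strings "<", ">", and "!=", are hypothetical choices made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def rejection_cutoffs(p0, n, alpha, alternative):
    """Cutoffs on phat taken from the null distribution N(p0, sqrt(p0(1-p0)/n)).

    Returns (lower, upper): reject H0 if phat < lower or phat > upper.
    None means the test has no cutoff on that side.
    """
    null = NormalDist(p0, sqrt(p0 * (1 - p0) / n))
    if alternative == "<":
        return null.inv_cdf(alpha), None
    if alternative == ">":
        return None, null.inv_cdf(1 - alpha)
    if alternative == "!=":
        return null.inv_cdf(alpha / 2), null.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be '<', '>' or '!='")


def reject_h0(phat, p0, n, alpha, alternative):
    """True if the observed phat falls in the rejection region."""
    lower, upper = rejection_cutoffs(p0, n, alpha, alternative)
    return (lower is not None and phat < lower) or (upper is not None and phat > upper)


def power(p0, p1, n, alpha, alternative):
    """Probability the alternative distribution N(p1, sqrt(p1(1-p1)/n))
    places in the rejection region built from the null distribution."""
    lower, upper = rejection_cutoffs(p0, n, alpha, alternative)
    alt = NormalDist(p1, sqrt(p1 * (1 - p1) / n))
    prob = 0.0
    if lower is not None:
        prob += alt.cdf(lower)
    if upper is not None:
        prob += 1 - alt.cdf(upper)
    return prob


# Example: H0: p = 0.5 vs H1: p > 0.5 with n = 100 and alpha = 0.05.
print(rejection_cutoffs(0.5, 100, 0.05, ">"))  # upper cutoff is about 0.582
print(power(0.5, 0.6, 100, 0.05, ">"))         # about 0.64
```

Setting p1 equal to p0 returns exactly α, which matches the remark that points near the null have power near α.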
P-values
The p-value is defined as the transition point between α values where you would reject H0 and α values where you would not reject H0; in other words, it is the smallest α at which you would reject H0. If you have a p-value, remember that you reject H0 whenever the p-value is smaller than α, and you do not reject when the p-value is bigger than α. Computing the p-value depends on the alternative AND requires the data.

Computing p-values
For a “<” alternative, the p-value is the probability the null distribution places below phat.
For a “>” alternative, the p-value is the probability the null distribution places above phat.
For a “≠” alternative, it depends on the value of phat. For phat below p0, the p-value is twice the probability the null distribution places below phat. For phat above p0, the p-value is twice the probability the null distribution places above phat.

Determining sample sizes for hypothesis tests
A sample size calculation finds the minimum sample size that achieves a desired α while simultaneously achieving a desired power (POW) at a particular point p1. Remember that power changes as p varies in the alternative hypothesis, so you have to pick a value of p1 you consider meaningful. How to compute this sample size depends on the alternative:
For a “<” alternative, equate the α percentile of the null distribution with the POW percentile of the alternative distribution.
For a “>” alternative, equate the 1-α percentile of the null distribution with the 1-POW percentile of the alternative distribution.
For a “≠” alternative with p1 < p0, equate the α/2 percentile of the null distribution with the POW percentile of the alternative distribution.
For a “≠” alternative with p1 > p0, equate the 1-(α/2) percentile of the null distribution with the 1-POW percentile of the alternative distribution.
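The p-value rules can be sketched the same way. This is a minimal sketch, assuming Python 3.8+ and the normal null distribution from these notes; the function name `p_value` and the "<", ">", "!=" encoding of the alternatives are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def p_value(phat, p0, n, alternative):
    """p-value for H0: p = p0 using the null distribution N(p0, sqrt(p0(1-p0)/n))."""
    null = NormalDist(p0, sqrt(p0 * (1 - p0) / n))
    below = null.cdf(phat)  # probability the null distribution places below phat
    above = 1 - below       # probability the null distribution places above phat
    if alternative == "<":
        return below
    if alternative == ">":
        return above
    if alternative == "!=":
        # Double the tail on whichever side of p0 the observed phat fell.
        return 2 * below if phat < p0 else 2 * above
    raise ValueError("alternative must be '<', '>' or '!='")


alpha = 0.05
pv = p_value(0.6, 0.5, 100, "!=")  # about 0.0455
print("reject H0" if pv < alpha else "do not reject H0")
```

The final comparison applies the decision rule directly: reject H0 whenever the p-value is smaller than α.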
Let s0 = sqrt(p0(1-p0)) and s1 = sqrt(p1(1-p1)). Let z0 and z1 be the Z values corresponding to the appropriate percentiles of the null and alternative distributions. The minimum required sample size, for all cases, is

$$ n \ge \left( \frac{z_1 s_1 - z_0 s_0}{p_0 - p_1} \right)^2 $$

rounded up to the next whole number.

Confidence Intervals
A confidence interval takes the observed value of phat and computes a range of plausible values for the population proportion p. The “confidence level” of this interval refers to the probability that the procedure will produce an interval containing the population proportion. Here α is 1 minus the confidence level (e.g. 99% confidence gives α = 0.01). Let z* be the Z-score corresponding to the 1-(α/2) percentile, so the standard normal distribution places probability equal to the confidence level between Z = -z* and Z = z*.

A confidence interval for p is phat ± z* sqrt(phat(1-phat)/n). Note that we have estimated the standard deviation using phat; this does not cause problems for the sample sizes we are considering (n > 30). The length of this confidence interval is 2 z* sqrt(phat(1-phat)/n).

Computing sample sizes for confidence intervals
Recall that the length of the confidence interval is 2 z* sqrt(phat(1-phat)/n). To make sure this is at most a prespecified length L, you need n to be at least

$$ n \ge \frac{4\,\hat{p}(1-\hat{p})\,(z^*)^2}{L^2} $$

If you have a guess of phat in advance, use it. Otherwise, guess phat = 0.5, which maximizes phat(1-phat) and so protects you against all values of phat simultaneously.

Common threads
We make repeated use of the sampling distribution (through the null distribution, the alternative distribution, and the form of the confidence interval). Different situations have different sampling distributions, but the way we use them remains the same. For example, we will still want particular percentiles of the null and alternative distributions.
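The sample size rule for hypothesis tests (pick z0 and z1 from the four percentile-matching cases, then apply n ≥ ((z1 s1 - z0 s0)/(p0 - p1))^2) can be sketched as follows. This is a minimal sketch, assuming Python 3.8+ and the normal approximation from these notes; the function name `sample_size_for_test`, the parameter name `power_target` (standing for POW), and the "<", ">", "!=" encoding are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_for_test(p0, p1, alpha, power_target, alternative):
    """Minimum n giving level alpha and power `power_target` at the point p1."""
    z = NormalDist().inv_cdf  # standard normal percentile function
    # Match a null percentile (z0) with an alternative percentile (z1).
    if alternative == "<" and p1 < p0:
        z0, z1 = z(alpha), z(power_target)
    elif alternative == ">" and p1 > p0:
        z0, z1 = z(1 - alpha), z(1 - power_target)
    elif alternative == "!=" and p1 < p0:
        z0, z1 = z(alpha / 2), z(power_target)
    elif alternative == "!=" and p1 > p0:
        z0, z1 = z(1 - alpha / 2), z(1 - power_target)
    else:
        raise ValueError("p1 must lie in the alternative; use '<', '>' or '!='")
    s0 = sqrt(p0 * (1 - p0))
    s1 = sqrt(p1 * (1 - p1))
    n = ((z1 * s1 - z0 * s0) / (p0 - p1)) ** 2
    return ceil(n)  # sample sizes are whole numbers, so round up


# H0: p = 0.5 vs H1: p > 0.5, alpha = 0.05, power 0.80 at p1 = 0.6.
print(sample_size_for_test(0.5, 0.6, 0.05, 0.80, ">"))  # 153
```

Because z0 and z1 carry their signs, the same final formula serves all four cases, as the notes state.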
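The confidence interval phat ± z* sqrt(phat(1-phat)/n) and the sample size needed to keep its length at most L can be sketched as follows. This is a minimal sketch, assuming Python 3.8+; the function names `z_star`, `proportion_ci`, and `ci_sample_size`, and the default `phat_guess=0.5`, are hypothetical choices that mirror the conservative guess suggested in the notes.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_star(confidence):
    """Z-score at the 1 - alpha/2 percentile, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def proportion_ci(phat, n, confidence):
    """Interval phat +/- z* sqrt(phat(1-phat)/n)."""
    half_width = z_star(confidence) * sqrt(phat * (1 - phat) / n)
    return phat - half_width, phat + half_width


def ci_sample_size(length, confidence, phat_guess=0.5):
    """Smallest n with 2 z* sqrt(phat(1-phat)/n) <= length."""
    n = 4 * phat_guess * (1 - phat_guess) * z_star(confidence) ** 2 / length ** 2
    return ceil(n)


print(proportion_ci(0.4, 100, 0.95))  # roughly (0.304, 0.496)
print(ci_sample_size(0.1, 0.95))      # 385 with the conservative guess phat = 0.5
```

Supplying a smaller `phat_guess` such as 0.3 lowers the required n, since phat(1-phat) is largest at 0.5.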