Posted on: 8/8/2011
Stat 2013, Chapter 18: Inference about a Population Proportion

The two types of data: a reminder
Quantitative: something that can be counted or measured and then added, subtracted, averaged, etc., across individuals in the population. Examples: your height, your age, your blood cholesterol level.
Categorical: something that falls into one of several categories. What can be counted is the proportion of individuals in each category. Examples: your blood type (A, B, AB, O), your hair color, your family history of genetic diseases, whether you will develop lung cancer.

The sample proportion p̂
We now study categorical data and draw inference about the proportion, or percentage, of the population with a specific characteristic. If we call a given categorical characteristic in the population “success,” then the sample proportion of successes, $\hat{p}$ (“p-hat”), is:
$$\hat{p} = \frac{\text{count of successes in the sample}}{\text{count of observations in the sample}}$$

Sampling distribution of p̂
The sampling distribution of $\hat{p}$ is never exactly Normal, but as the sample size increases, it becomes approximately Normal. Its mean is the population proportion $p$ and its standard deviation is $\sqrt{p(1-p)/n}$.

Example
Suppose that 35% of all nurses in America are willing to switch hospitals if offered a higher salary. If a recruiter randomly contacts 100 nurses, what is the probability that over 40% of the sample would be willing to switch hospitals if offered a higher salary?
Mean of the sampling distribution: $0.35$
Standard deviation of the sampling distribution: $\sqrt{(0.35)(0.65)/100} \approx 0.048$
We want $P(\hat{p} > 0.40)$. The z-score is:
$$z = \frac{0.40 - 0.35}{\sqrt{(0.35)(0.65)/100}} = \frac{0.05}{0.048} \approx 1.05$$
The area to the right of $z = 1.05$ in the z-table is 0.1469, so about 14.7% of samples of 100 nurses would have over 40% willing to switch hospitals.

Conditions for Inference
To use the sampling distribution for proportions, a few conditions must be met:
- The data must be from an SRS of the population of interest.
- The population is at least 10 times as large as the sample.
- The sample size n is large enough to ensure that the distribution of z is close to the Normal model.

One-proportion z-interval
When the conditions are met, we can find a confidence interval for the population proportion $p$:
$$\hat{p} \pm z^{*}\sqrt{\frac{\hat{p}\hat{q}}{n}}, \qquad \hat{q} = 1 - \hat{p}$$
Use this method when the sample contains at least 15 successes and at least 15 failures.

Example
Vitamin D, whether ingested as a dietary supplement or produced naturally when sunlight falls upon the skin, is essential for strong, healthy bones. The bone disease rickets was largely eliminated in England during the 1950s, but now there is concern that a generation of children more likely to watch TV or play computer games than spend time outdoors is at increased risk. A recent study of 2700 children randomly selected from all parts of England found 20% of them deficient in vitamin D. Find a 98% confidence interval for the proportion of children in England who are vitamin D deficient.
$$0.20 \pm 2.326\sqrt{\frac{(0.20)(0.80)}{2700}} = 0.20 \pm 0.0179 = (0.1821,\ 0.2179)$$
We are 98% confident that the true proportion of children in England who are vitamin D deficient is between 18.2% and 21.8%.

Because we have to use the estimate $\hat{p}$ in place of $p$ to compute the margin of error,
$$m = z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},$$
large-sample confidence intervals for a population proportion are not very accurate. Specifically, they miss the true $p$ more often than the confidence level would indicate, and the shortfall is not a fixed amount because it depends on $p$. Use with caution!

“Plus four” confidence interval for p
A simple adjustment produces more accurate confidence intervals. We act as if we had four additional observations, two successes and two failures. Thus the new sample size is $n + 4$ and the count of successes is $X + 2$. The “plus four” estimate of $p$ is:
$$\tilde{p} = \frac{\text{count of successes} + 2}{\text{count of all observations} + 4}$$
An approximate level C confidence interval is:
$$\tilde{p} \pm m, \qquad m = z^{*}\,SE = z^{*}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}}$$
Use this method when C is at least 90% and the sample size is at least 10.
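As a numerical check on the formulas so far, here is a minimal Python sketch that uses only the standard library (`statistics.NormalDist` supplies Normal areas and z* values). The helper names `z_star`, `prob_phat_above`, `one_prop_z_interval`, and `plus_four_interval` are hypothetical, chosen for illustration. It assumes the conditions for inference hold, and it computes z* exactly instead of reading it from a table, so results may differ from the slides in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal: mean 0, standard deviation 1


def z_star(confidence):
    """Critical value z* that leaves (1 - C)/2 in each tail (hypothetical helper)."""
    return STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)


def prob_phat_above(p, n, cutoff):
    """P(p-hat > cutoff) using the Normal approximation to the sampling distribution."""
    sd = sqrt(p * (1 - p) / n)  # standard deviation of p-hat
    z = (cutoff - p) / sd
    return 1 - STD_NORMAL.cdf(z)  # area to the right of z


def one_prop_z_interval(successes, n, confidence):
    """Large-sample interval: p-hat +/- z* sqrt(p-hat (1 - p-hat) / n)."""
    p_hat = successes / n
    m = z_star(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - m, p_hat + m


def plus_four_interval(successes, n, confidence):
    """Add two successes and two failures, then use the same interval form."""
    p_tilde = (successes + 2) / (n + 4)
    m = z_star(confidence) * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde - m, p_tilde + m


# Nurses: p = 0.35, n = 100, probability that p-hat exceeds 0.40
print(round(prob_phat_above(0.35, 100, 0.40), 4))
# Vitamin D: 20% of 2700 children is 540 successes, 98% confidence
print(one_prop_z_interval(540, 2700, 0.98))
# Deer: 32 of 153 carry Lyme-positive ticks, 90% confidence
print(plus_four_interval(32, 153, 0.90))
```

Run as written, it should reproduce the nurse probability (about 0.147), the vitamin D interval (about 0.182 to 0.218), and the plus four interval for the deer example that follows (about 0.162 to 0.271). Keeping the two interval functions separate makes the only difference between them, the +2/+4 adjustment, easy to see.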
Example
Wildlife biologists inspect 153 deer taken by hunters and find 32 of them carrying ticks that test positive for Lyme disease. Use the “plus four” method to create a 90% confidence interval for the percentage of deer that may carry such ticks.
First, find the “plus four” estimate:
$$\tilde{p} = \frac{32 + 2}{153 + 4} = \frac{34}{157} \approx 0.2166$$
Now, calculate the “plus four” interval:
$$0.2166 \pm 1.645\sqrt{\frac{(0.2166)(0.7834)}{157}} = 0.2166 \pm 0.0541 = (0.1625,\ 0.2707)$$
With 90% confidence, between 16.25% and 27.1% of deer may carry Lyme-positive ticks.

Margin of Error and Sample Size
In planning a study, we may want to choose a sample size that will allow us to estimate the parameter within a given margin of error (MoE). The MoE in the large-sample confidence interval is:
$$MoE = z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Because $\hat{p}$ is not known until the data are collected, we must guess its value. We can either use a guess $p^{*}$ based on a pilot study (or past information), or use $p^{*} = 0.50$, the most conservative choice: $p^{*}(1-p^{*})$ is largest at 0.50, so it gives the largest required $n$. Solving the MoE formula for $n$ gives
$$n = \left(\frac{z^{*}}{MoE}\right)^{2} p^{*}(1-p^{*}),$$
rounded up to the next whole number.

Example
It’s believed that as many as 25% of adults over 50 never graduated from high school. We wish to see if this percentage is the same among the 25 to 30 age group. How many of this younger age group must we survey in order to estimate the proportion of non-grads to within 6% with 90% confidence?
MoE = 6%. Since we don’t have data about this age group, use $p^{*} = 0.50$:
$$0.06 = 1.645\sqrt{\frac{(0.5)(0.5)}{n}} \;\Rightarrow\; 0.0365 \approx \sqrt{\frac{0.25}{n}} \;\Rightarrow\; 0.00133 \approx \frac{0.25}{n} \;\Rightarrow\; n \approx 187.92$$
We should sample at least 188 people in the 25 to 30 age group.

The One-Sample Proportions Test
As before, a hypothesis test has four steps:
1) State the hypotheses.
2) Calculate the test statistic (a z-score).
3) Find the p-value (from the calculator) OR the critical value (from the table).
4) State the conclusion (reject or retain the null hypothesis) in the context of your situation.

Example
Some doctors suspect that young mothers have fewer multiple births. In 2001, a national vital statistics report indicated that about 3% of all births produced twins. Data from a large city hospital found only 7 sets of twins were born to 469 teenage girls.
Does that suggest that mothers under age 20 may be less likely to have twins? Test an appropriate hypothesis and state your conclusion. Use α = 0.05.

Step 1: State the hypotheses.
H0: p = 0.03 (no different from the national report)
Ha: p < 0.03 (we suspect young mothers have fewer twins than the national rate)

Step 2: Calculate the test statistic. The sample proportion is $\hat{p} = 7/469 \approx 0.015$, and the standard deviation uses the hypothesized value $p_0 = 0.03$:
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{0.015 - 0.03}{\sqrt{(0.03)(0.97)/469}} = \frac{-0.015}{0.00788} \approx -1.90$$

Step 3: Find the p-value (calculator) OR the critical value (table).
Note: in the calculator, x is the number of successes (here, 7 sets of twins) and must be a whole number. The calculator works with the unrounded $\hat{p} = 7/469$, so it reports $z \approx -1.91$ and a p-value of about 0.028.

| Upper-tail probability P | z* | Confidence level C |
|---|---|---|
| 0.25 | 0.674 | 50% |
| 0.20 | 0.841 | 60% |
| 0.15 | 1.036 | 70% |
| 0.10 | 1.282 | 80% |
| 0.05 | 1.645 | 90% |
| 0.025 | 1.960 | 95% |
| 0.02 | 2.054 | 96% |
| 0.01 | 2.326 | 98% |
| 0.005 | 2.576 | 99% |
| 0.0025 | 2.807 | 99.5% |
| 0.001 | 3.091 | 99.8% |
| 0.0005 | 3.291 | 99.9% |

Since this is a one-tailed test at α = 0.05, we use the row with upper-tail probability 0.05, giving z* = 1.645. Because the test is lower-tailed, the critical value is -1.645, and any test statistic below this value is in the rejection region.

Step 4: State the conclusion (reject or retain) in the context of your situation.
P-value method: The p-value of 0.028 is below our α of 0.05, so we reject the null hypothesis and conclude that there is evidence that mothers under age 20 may be less likely to have twins.
Critical value method: Our test statistic, -1.90, is beyond the critical value of -1.645, so it falls in the rejection region. We reject the null hypothesis and conclude that there is evidence that mothers under age 20 may be less likely to have twins.
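The sample-size calculation and the one-sample proportion test can also be sketched in Python with the standard library. The helper names `sample_size` and `one_prop_z_test` and the `alternative` argument are hypothetical, chosen for illustration; this is a minimal sketch assuming the conditions for inference hold, not a definitive implementation. Note that the test's standard deviation uses the hypothesized $p_0$, while the confidence intervals use $\hat{p}$. Because the code uses the unrounded $\hat{p} = 7/469$, its z-statistic comes out near -1.91 rather than the -1.90 obtained by rounding $\hat{p}$ to 0.015.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal: mean 0, standard deviation 1


def sample_size(margin, confidence, p_guess=0.5):
    """Smallest whole n whose margin of error is at most `margin`.

    p_guess = 0.5 is the conservative choice because p(1 - p) is largest there.
    """
    z = STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)  # critical value z*
    return ceil((z / margin) ** 2 * p_guess * (1 - p_guess))  # always round up


def one_prop_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, p-value) for H0: p = p0; the standard deviation uses p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    if alternative == "less":
        p_value = STD_NORMAL.cdf(z)  # lower-tail area
    elif alternative == "greater":
        p_value = 1 - STD_NORMAL.cdf(z)  # upper-tail area
    elif alternative == "two-sided":
        p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))  # both tails
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value


# Non-graduates: within 6% at 90% confidence, no prior estimate
print(sample_size(0.06, 0.90))
# Twins: 7 of 469 teenage mothers, H0: p = 0.03 vs Ha: p < 0.03, alpha = 0.05
z, p_value = one_prop_z_test(7, 469, 0.03, alternative="less")
print(round(z, 2), round(p_value, 3), p_value < 0.05)
```

`sample_size(0.06, 0.90)` should return 188, matching the hand calculation; for the twins data the sketch should give z ≈ -1.91 and a p-value ≈ 0.028, so the p-value is below α and H0 is rejected. The `alternative` argument mirrors the choice of Ha in Step 1.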