Ch 19: Confidence Intervals for Proportions
Shared by: HC12030809540
Posted: 3/8/2012


INFERENTIAL STATISTICS
- Point vs. interval estimates
- Standard error
- Finding confidence intervals
- Confidence levels
- Examples
- Some relevant points
Summary
• Recall that the population proportion, denoted p, is the
fraction of the population with a certain
characteristic
• Sample proportions are denoted p-hat
• Our goal is to use a p-hat gathered using
an appropriate sampling method to
estimate p
Point Estimate vs. Interval estimate
• We can predict the population proportion using a
point estimate (a single value) or an interval
estimate (a range of values we think contains p)
• For a point estimate, just use p-hat
• In other words, the single number that best
predicts p is p-hat
• However, it is unlikely that the population
proportion is exactly equal to p-hat, so we usually
use an interval estimate
• To figure out our interval estimate, we will find
what is called the standard error
Standard Error??!???!?!?
• Recall that in the last chapter, we studied the
sampling distribution of p-hat and determined that,
over all samples of size n:
• The mean of the p-hats equals p
• The standard deviation of the p-hats equals
sqrt(pq/n), where q = 1 - p
• For large enough n, the p-hats are approximately
normally distributed, so about 95% of p-hats fall
within 2 s.d. of p, and virtually all (about 99.7%)
fall within 3 s.d. of p
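To see these facts in action, here is a small simulation sketch. The population proportion p = 0.5, the sample size n = 104, the 10,000 repetitions, and the helper name `simulate_p_hats` are all assumptions chosen for illustration. The simulated mean and s.d. should land close to p and sqrt(pq/n), but the exact printed values depend on the random seed.

```python
import math
import random
import statistics

def simulate_p_hats(p, n, reps, seed=0):
    """Return the sample proportion (p-hat) from each of `reps` simulated
    random samples of size n drawn from a population with proportion p."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(reps):
        # Each draw is a "success" with probability p
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats

# Hypothetical population proportion and sample size
p, n = 0.5, 104
p_hats = simulate_p_hats(p, n, reps=10_000)

sd_theory = math.sqrt(p * (1 - p) / n)      # sqrt(pq/n)
mean_sim = statistics.mean(p_hats)
sd_sim = statistics.stdev(p_hats)
within_2sd = sum(abs(x - p) <= 2 * sd_theory for x in p_hats) / len(p_hats)

print(f"mean of p-hats {mean_sim:.4f} vs p = {p}")
print(f"s.d. of p-hats {sd_sim:.4f} vs sqrt(pq/n) = {sd_theory:.4f}")
print(f"share within 2 s.d. of p: {within_2sd:.3f}")
```

The share within 2 s.d. comes out a little above 95% here because p-hat can only take the values 0/104, 1/104, and so on.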
OK, but what about Standard Error?
• We want to reverse this process and assert, for
example, that for a given p-hat, the population
proportion p is likely to be within 2 s.d. of p-hat. The
problem is that we don’t know p (it’s what we’re looking
for!), so we can’t compute the standard deviation sqrt(pq/n)
• Instead of the standard deviation, we use what is
called the standard error, SE = sqrt((p-hat * q-hat) / n),
where q-hat = 1 - p-hat
• In other words, it’s the same formula we used for the
standard deviation in the last chapter, except that p-hat
replaces p (and q-hat replaces q)
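Because the standard error is just the old standard-deviation formula with p-hat plugged in, one function covers both cases. This is a minimal sketch; the function name `proportion_spread` is hypothetical, and the usage line borrows the voter-poll numbers (p-hat = .56, n = 1000) used later in these slides.

```python
import math

def proportion_spread(prop, n):
    """Compute sqrt(prop * (1 - prop) / n).

    Pass the true p to get the standard deviation of p-hat (last chapter);
    pass the observed p-hat to get the standard error (this chapter).
    """
    if not 0 <= prop <= 1 or n <= 0:
        raise ValueError("prop must be in [0, 1] and n must be positive")
    return math.sqrt(prop * (1 - prop) / n)

# In practice p is unknown, so we plug in p-hat:
se = proportion_spread(0.56, 1000)
print(f"SE = {se:.4f}")   # SE = 0.0157
```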
Example: We want to estimate the proportion of sea fans
infected with a disease
• We collect sample data and find that 54 of 104 sampled
sea fans are infected
• So p-hat = 54/104 = .519 (and q-hat = .481)
• Our SE (standard error) =
sqrt(.519 * .481 / 104) = .049
• Therefore, assuming a normal model for the distribution of p-hat
is appropriate, we can say that:
• About 68% of sample p-hats will be within 4.9 percentage points (1 SE) of p
• About 95% of sample p-hats will be within 9.8 percentage points (2 SE) of p
• And more importantly… it follows that:
• We are about 68% confident that p lies within 4.9 percentage points
of OUR p-hat of 51.9%
• We are about 95% confident that p lies within 9.8 percentage points
of OUR p-hat
• So we are about 95% confident that p is between 42.1% and
61.7%
• Note: p is a fixed (unknown) number, so it is either in this interval
or not; “95% confident” means that about 95% of samples produce an
interval that captures p
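The sea-fan arithmetic can be checked with a few lines of Python. This sketch only repeats the slide's calculation (p-hat ± 1 SE and ± 2 SE) without rounding p-hat first; the dictionary name `intervals` is just for illustration.

```python
import math

infected, n = 54, 104                 # sea-fan sample from the example
p_hat = infected / n                  # about .519
se = math.sqrt(p_hat * (1 - p_hat) / n)   # about .049

intervals = {}
for k, level in [(1, "about 68%"), (2, "about 95%")]:
    low, high = p_hat - k * se, p_hat + k * se
    intervals[k] = (low, high)
    print(f"{level}: p-hat ± {k} SE = ({low:.3f}, {high:.3f})")
```

The 2 SE line reproduces the slide's interval of 42.1% to 61.7%.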
Additional implications of the interval from 42.1% to 61.7%
• That seems like an awfully broad interval; to pin
down p more precisely, we would need a larger
sample
• About 5% of all samples are unrepresentative enough
that p is NOT within 2 SE of their p-hat, and we can’t
tell whether ours is one of them
• In more formal notation, we write the interval as
.421 < p < .617 and call it a 95% CONFIDENCE INTERVAL
• We call 95% our CONFIDENCE LEVEL
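The meaning of “95% confident” can be checked by simulation: pretend we know p, draw many samples, build p-hat ± 2 SE from each, and count how often the interval captures p. This is a sketch under assumptions: the “true” p = 0.52, n = 104, 5,000 repetitions, and the helper name `covers` are all hypothetical choices.

```python
import math
import random

def covers(p, n, z, rng):
    """Draw one sample of size n, build p-hat ± z * SE, and report
    whether that interval contains the true p."""
    successes = sum(1 for _ in range(n) if rng.random() < p)
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se <= p <= p_hat + z * se

rng = random.Random(1)
p, n = 0.52, 104    # hypothetical "true" p; in real life we never know it
reps = 5_000
hits = sum(covers(p, n, 2, rng) for _ in range(reps))
coverage = hits / reps
print(f"{coverage:.1%} of {reps} intervals (p-hat ± 2 SE) captured p")
```

The printed coverage should be close to 95%. Each individual interval either contains p or misses it; the 95% describes the method, not one particular interval.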
Finding intervals for other confidence levels
• We generally use z-scores to establish confidence
intervals
• We call Z* (z-star) the critical z-score for a given
confidence level
• The most common confidence levels are 90%, 95%,
and 99%
• For 90%, Z* = 1.645, because in a standard normal
model, 90% of the area lies between z = -1.645 and
z = 1.645
• For 95%, Z* is approximately 2, since about 95% of a
normal distribution lies within 2 s.d. of the mean; more
precisely, Z* = 1.96
• For 99%, Z* ≈ 2.576 (often rounded to 2.575), because
99% of a normal distribution lies between z = -2.576 and
z = 2.576
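Critical values do not have to come from a table. This sketch uses the standard library's `statistics.NormalDist().inv_cdf` (Python 3.8+): for confidence level c, each tail holds (1 - c)/2 of the area, so Z* is the point with 1 - (1 - c)/2 of the area below it. The function name `critical_z` is hypothetical.

```python
from statistics import NormalDist

def critical_z(confidence):
    """Critical value Z* such that the middle `confidence` fraction of a
    standard normal model lies between -Z* and Z*."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    tail = (1 - confidence) / 2          # area in each tail
    return NormalDist().inv_cdf(1 - tail)

for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%}: Z* = {critical_z(c):.3f}")
# 90%: Z* = 1.645
# 95%: Z* = 1.960
# 99%: Z* = 2.576
```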
General formula
• To find a confidence interval for p, choose a confidence
level and then find the values of:
• p-hat ± (Z* times SE)
• We call the product (Z* times SE) the MARGIN OF
ERROR
• Polls and surveys often report the margin of error
along with the estimate
• For example, a poll reporting a margin of error of 3%
means that when the interval was calculated,
Z* times SE = .03 (3 percentage points)
• We’ll look at some examples on the next few slides
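The general formula translates directly into a small function. This is a sketch, not a standard library routine: the name `proportion_ci` and its return order (low, high, margin of error) are assumptions. The usage line shows why polls of about 1000 people with p-hat near .50 often report a margin of error of about 3%.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Return (low, high, margin_of_error) for p-hat ± Z* * SE."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    me = z_star * se                      # margin of error
    return p_hat - me, p_hat + me, me

# A hypothetical poll of 1000 people with p-hat = .50
low, high, me = proportion_ci(0.50, 1000)
print(f"margin of error = {me:.3f}")      # about 3 percentage points
```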
Example: A survey of 1000 voters finds that 56% plan to vote for Matt
Shanahan. Find a 95% confidence interval for p
• p-hat = .56, q-hat = .44, n = 1000
• Z* = 1.96
• SE = sqrt (.56 * .44/1000) = .016
• So (p-hat ± (Z* times SE)) becomes
• .56 ± (1.96 * .016) = .56 ± .03136
• So we are 95% confident that p lies between
.529 and .591… SHANAHAN WINS!*
• * Caveat: about 5% of samples produce an interval
that misses p, and we can’t be certain ours isn’t
one of them
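Here is the same voter calculation as a quick check, using Z* = 1.96 as on the slide. This sketch keeps the SE unrounded (about .0157 rather than .016), which still gives an interval of .529 to .591 to three decimals.

```python
import math

p_hat, n, z_star = 0.56, 1000, 1.96
se = math.sqrt(p_hat * (1 - p_hat) / n)   # about .0157; the slide rounds to .016
low, high = p_hat - z_star * se, p_hat + z_star * se
print(f"95% CI: ({low:.3f}, {high:.3f})")
print("entire interval above 50%:", low > 0.5)
```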
What about c = .90 or c = .99? (c = confidence level)
• Again, SE = .016 and p-hat = .56
• So for 90%, the interval is .56 ± 1.645 * .016
• We are 90% confident that p is between 53.4% and
58.6%
• For 99%, the interval is .56 ± 2.575 * .016
• We are 99% confident that p is between .519 and .601
• Notice anything? What happens as c goes up?
• As the confidence level increases, the interval widens…
a TRADE-OFF
• We gain confidence that our interval contains p, but
sometimes the interval becomes so wide that it isn’t very
useful!
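The trade-off can be seen by computing all three intervals in one loop. This sketch uses the unrounded SE (about .0157) and exact Z* values from `statistics.NormalDist`, so the third decimal can differ slightly from the slide: the 99% interval comes out near (.520, .600) rather than (.519, .601), which come from rounding SE to .016.

```python
import math
from statistics import NormalDist

p_hat, n = 0.56, 1000
se = math.sqrt(p_hat * (1 - p_hat) / n)

widths = []
for c in (0.90, 0.95, 0.99):
    z_star = NormalDist().inv_cdf((1 + c) / 2)
    low, high = p_hat - z_star * se, p_hat + z_star * se
    widths.append(high - low)
    print(f"{c:.0%}: ({low:.3f}, {high:.3f}), width {high - low:.3f}")
```

Each step up in confidence buys a wider interval; the only way to get both high confidence and a narrow interval is a larger n.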
CONDITIONS
• Plausible independence – there is no reason to suspect
that the data values somehow affect each other
• Randomization condition – the data must be sampled at
random
• 10% condition – Samples must be less than 10% of the
population, if drawn without replacement
• The normal model we use for p-hat is justified by the
Central Limit Theorem, which requires a large enough
sample; for proportions, “large enough” is checked with the
success/failure condition below (rules of thumb such as
n ≥ 30 are for means, not proportions)
• Finally (PHEW), the SUCCESS/FAILURE condition: we
must observe at least 10 successes and 10 failures (that
is, n * p-hat ≥ 10 and n * q-hat ≥ 10)
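The two numeric conditions can be checked mechanically, as in the sketch below; independence and randomization depend on how the data were collected, so no code can verify them. The function name, its dictionary keys, and the population of 50,000 voters in the usage line are hypothetical.

```python
def check_proportion_conditions(successes, n, population_size=None):
    """Check the numeric conditions for a confidence interval for p.

    Independence and randomization depend on how the data were collected,
    so they cannot be checked from these numbers and are not included.
    """
    failures = n - successes
    checks = {"success_failure": successes >= 10 and failures >= 10}
    if population_size is not None:
        # 10% condition: sample smaller than 10% of the population
        checks["ten_percent"] = n < 0.10 * population_size
    return checks

print(check_proportion_conditions(54, 104))             # sea-fan sample
print(check_proportion_conditions(560, 1000, 50_000))   # hypothetical 50,000 voters
```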
HOMEWORK
• p. 378: 1-7 odd, 11, 15