Ch 19: Confidence Intervals for Proportions

 INFERENTIAL STATISTICS
 -point vs. interval
 -Standard Error
 -finding confidence intervals
 -Confidence levels
 -Examples
 -Some relevant points
                Summary
• Recall that the population proportion is the
  percent of the population with a certain
  characteristic, denoted p
• Sample proportions are denoted p-hat
• Our goal is to use a p-hat gathered using
  an appropriate sampling method to
  estimate p
Point Estimate vs. Interval estimate
• We can predict the population proportion using a
  point estimate (a single value) or an interval
  estimate (a range of values we think contains p)
• For a point estimate, just use p-hat
• In other words, the single number that best
  predicts p is p-hat
• However, it is unlikely that the population
  proportion is exactly equal to p-hat, so usually
  we use an interval estimate
• To figure out our interval estimate, we will find
  what is called the standard error
     Standard Error??!???!?!?
• Recall that in the last chapter, we described the
  possible values of p-hat and determined that, across
  all samples of size n:
• The mean of the p-hats would equal p
• The standard deviation of the p-hats would equal
  sqrt(pq/n), where q = 1 - p
• For large n, the p-hats would be approximately normally
  distributed, so we could assume that most (about 95%)
  p-hats were within 2 s.d. of p, and virtually all
  p-hats would be within 3 s.d. of p
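The three facts above can be checked with a small simulation. This is only a sketch: the population proportion p = 0.5, the sample size n = 100, the number of samples, and the random seed are all made-up values chosen for illustration, not numbers from the chapter.

```python
import math
import random

random.seed(19)  # fixed seed so the run is repeatable (arbitrary choice)

p = 0.5             # hypothetical population proportion
n = 100             # size of each sample
num_samples = 2000  # how many samples we draw

# Draw many samples and record the p-hat from each one
p_hats = []
for _ in range(num_samples):
    successes = sum(1 for _ in range(n) if random.random() < p)
    p_hats.append(successes / n)

mean_p_hat = sum(p_hats) / num_samples
sd_p_hat = math.sqrt(sum((x - mean_p_hat) ** 2 for x in p_hats) / num_samples)
theory_sd = math.sqrt(p * (1 - p) / n)  # sqrt(pq/n)

# Fraction of p-hats that landed within 2 s.d. of p
within_2sd = sum(1 for x in p_hats if abs(x - p) <= 2 * theory_sd) / num_samples

print(f"mean of p-hats: {mean_p_hat:.4f} (p = {p})")
print(f"s.d. of p-hats: {sd_p_hat:.4f} (sqrt(pq/n) = {theory_sd:.4f})")
print(f"share within 2 s.d. of p: {within_2sd:.3f}")
```

The simulated mean and standard deviation should land close to p and sqrt(pq/n), and most p-hats should fall within 2 s.d. of p.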
    OK, but what about Standard Error?
• We want to reverse this process and assert, for
  example, that for a given p-hat, the population
  proportion p will be within 2 s.d. of p-hat. But the
  problem is we don’t know p (it’s what we’re looking
  for!) and therefore can’t find the standard deviation
• Instead of the standard deviation, we use what is
  called the standard error, which is equal to
  sqrt((p-hat * q-hat) / n), where q-hat = 1 - p-hat
• In other words, it’s the same formula we used for
  std dev in the last chapter, except we use p-hat
  in place of p and q-hat in place of q
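As a minimal sketch, the standard error formula can be written as a short Python function. The function name standard_error and the input checks are illustrative choices, and the values 0.5 and 100 in the usage line are made up.

```python
import math

def standard_error(p_hat: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p_hat * q_hat / n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be between 0 and 1")
    q_hat = 1.0 - p_hat
    return math.sqrt(p_hat * q_hat / n)

# Hypothetical sample: p-hat = 0.5 from n = 100, so SE = sqrt(0.25 / 100)
se_example = standard_error(0.5, 100)
print(round(se_example, 3))
```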
 Example: We want to estimate the proportion of sea fans infected with a disease
• We collect some sample data and find that 54 out of 104 sampled
  sea fans are infected
• So p-hat = 54/104 = .519
• Our S.E. (Standard Error) =
  sqrt (.519 * .481 / 104) = .049
• Therefore, assuming a normal model for the distribution
  of p-hat is appropriate, we can say that:
• About 68% of sample p-hats will be within 1 SE
  (4.9 percentage points) of p
• About 95% of sample p-hats will be within 2 SE
  (9.8 percentage points) of p
• And more importantly… it follows that:
• We can be 68% confident that p lies within 4.9
  percentage points of OUR p-hat of 51.9%
• We can be 95% confident that p lies within 9.8
  percentage points of OUR p-hat
• So we are 95% confident that p is between 42.1% and
  61.7%
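The sea fan arithmetic can be reproduced in a few lines of Python. This sketch uses the counts from the example (54 infected out of 104) and the rough "2 SE" rule for 95%; the variable names are illustrative.

```python
import math

infected, sampled = 54, 104

p_hat = infected / sampled                      # sample proportion
se = math.sqrt(p_hat * (1 - p_hat) / sampled)   # standard error

# Rough 95% interval: p-hat plus or minus 2 standard errors
low = p_hat - 2 * se
high = p_hat + 2 * se

print(f"p-hat = {p_hat:.3f}, SE = {se:.3f}")
print(f"interval: {low:.3f} to {high:.3f}")
```

Rounded to three decimal places this gives p-hat = .519, SE = .049, and an interval from .421 to .617.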
Additional implications of the 95% interval from 42.1% to 61.7%


• Seems like an awfully broad interval – to get a
  more precise fix on p we would need a larger
  sample
• About 5% of samples are unrepresentative enough
  that p is NOT within 2 S.E. of p-hat, and OUR
  sample could be one of them
• Using more formal notation, we write “we are 95%
  confident that .421 < p < .617” and we refer
  to this interval as a CONFIDENCE INTERVAL
• We call 95% our CONFIDENCE LEVEL
        Finding intervals for other confidence levels
• We generally use z-scores to establish confidence
  intervals
• We call Z* (z-star) the critical z-score for a given
  confidence level
• The most common levels of confidence are 90%, 95%,
  and 99%
• For 90%, Z* = 1.645. Why? Because when we convert
  the regular scores to z-scores, 90% of the data lies
  between z = -1.645 and z = 1.645
• For 95%, Z* is approximately 2, since about 95% of
  scores lie within 2 standard deviations of the mean.
  More precisely, Z* = 1.96
• For 99%, Z* = 2.575, because 99% of the data in a
  normal distribution lies between z = -2.575 and z =
  2.575
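The critical values above normally come from a z-table, but they can also be computed. The sketch below uses NormalDist from Python's standard statistics module; the helper name critical_z is an illustrative choice. Note that for 99% the computed value rounds to 2.576, slightly different from the table value of 2.575 used in these notes.

```python
from statistics import NormalDist

def critical_z(confidence: float) -> float:
    """Return Z* such that the middle `confidence` share of a
    standard normal distribution lies between -Z* and +Z*."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    tail = (1.0 - confidence) / 2.0          # area in each tail
    return NormalDist().inv_cdf(1.0 - tail)  # z-score with that upper-tail area

for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%} confidence: Z* = {critical_z(c):.3f}")
```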
              General formula
• To find a confidence interval for p, choose a confidence
  level and then find the values of:
• p-hat ± (Z* times SE)
• We call the product (Z* times SE) the MARGIN OF
  ERROR
• Often the margin of error is included when parameter
  estimates are provided in polls, etc.
• For example, when a poll mentions a margin of error of
  3%, this implies that when the interval was calculated,
  the product (Z* times SE) = .03
• We’ll look at some examples on the next few slides
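Put together, the general formula might look like the sketch below. The function name proportion_ci and its return layout are illustrative choices, Z* is computed with the standard statistics module rather than read from a table, and the sample in the usage lines (p-hat = 0.5, n = 400) is made up.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat: float, n: int, confidence: float = 0.95):
    """Return (low, high, margin_of_error) for p-hat ± Z* times SE."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    z_star = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    margin = z_star * se  # the MARGIN OF ERROR
    return p_hat - margin, p_hat + margin, margin

# Hypothetical sample: p-hat = 0.5 from n = 400, at 95% confidence
low, high, margin = proportion_ci(0.5, 400)
print(f"margin of error = {margin:.3f}, interval = ({low:.3f}, {high:.3f})")
```

For this made-up sample the margin of error comes out to about .049, so the interval runs from roughly .451 to .549.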
    Example: A survey of 1000 voters finds that 56% plan to vote for Matt Shanahan. Find a 95% confidence interval for p
• p-hat = .56, q-hat = .44, n = 1000
• Z* = 1.96
• SE = sqrt (.56 * .44/1000) = .016
• So (p-hat ± (Z* times SE)) becomes
• .56 ± (1.96 * .016) = .56 ± .03136
• So we are 95% confident that p lies between
  .529 and .591. The whole interval is above .5…
  SHANAHAN WINS!*
• * caveat: about 5% of samples are unrepresentative
  enough that p does not lie in the interval, and
  ours could be one of them
   What about for c = .9 or c = .99? (c = confidence level)
• Again, SE = .016, p-hat = .56
• So for 90%, the interval is .56 ± 1.645 * .016 = .56 ± .026
• We are 90% confident that p is between .534 and .586
• For 99%, the interval is .56 ± 2.575 * .016 = .56 ± .041
• We are 99% confident that p is between .519 and .601
• Notice anything? What happens as c goes up?
• As the confidence level increases, the interval widens…
  a TRADE-OFF
• We gain more confidence that our interval contains p,
  but sometimes the interval is so wide that it isn’t very
  useful!
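The trade-off is easy to see by computing the voter-survey interval at all three confidence levels. This is a sketch using p-hat = .56 and n = 1000 from the example; it keeps the unrounded SE and a computed Z*, so the 99% interval prints as (.520, .600) rather than the (.519, .601) obtained from the rounded values SE = .016 and Z* = 2.575.

```python
import math
from statistics import NormalDist

p_hat, n = 0.56, 1000
se = math.sqrt(p_hat * (1 - p_hat) / n)  # about .0157 before rounding

intervals = {}
for c in (0.90, 0.95, 0.99):
    z_star = NormalDist().inv_cdf(1 - (1 - c) / 2)  # critical z-score
    margin = z_star * se                            # margin of error
    intervals[c] = (p_hat - margin, p_hat + margin)
    print(f"c = {c:.2f}: Z* = {z_star:.3f}, "
          f"interval = ({p_hat - margin:.3f}, {p_hat + margin:.3f})")

# Interval widths, in order of increasing confidence level
widths = [high - low for low, high in intervals.values()]
print("widths:", [round(w, 3) for w in widths])
```

Each step up in confidence level produces a wider interval.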
                CONDITIONS
• Plausible independence – there is no reason to suspect
  that the data values somehow affect each other
• Randomization condition – the data must be sampled at
  random
• 10% condition – the sample must be less than 10% of the
  population, if drawn without replacement
• The model we use for inference (despite our data being
  sample proportions) is based on the Central Limit
  Theorem; therefore EITHER the population must be
  known to be normal OR n must be sufficiently large (say,
  25 or 30)
• Finally (PHEW) the SUCCESS/FAILURE condition; we
  must expect at least 10 successes and 10 failures (that
  is, np ≥ 10 and nq ≥ 10)
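Two of these conditions are numeric and can be checked mechanically; independence and randomization are judgment calls about how the data were collected, so they are left out. The sketch below makes two assumptions: since p is unknown, it uses the observed counts of successes and failures in place of np and nq, and the population size of 500 in the usage lines is a made-up number (the sea fan example does not give one). The function name check_conditions is illustrative.

```python
def check_conditions(successes: int, n: int, population_size=None) -> dict:
    """Check the success/failure condition and, if a population size
    is supplied, the 10% condition."""
    failures = n - successes
    results = {
        # at least 10 observed successes and 10 observed failures
        "success_failure": successes >= 10 and failures >= 10,
    }
    if population_size is not None:
        # sample must be less than 10% of the population
        results["ten_percent"] = n < 0.10 * population_size
    return results

# Sea fan sample: 54 infected (successes) out of 104
print(check_conditions(54, 104))
# Same sample with a hypothetical population of only 500 sea fans
print(check_conditions(54, 104, population_size=500))
```

With a population of only 500, a sample of 104 is more than 10% of the population, so the 10% condition fails in the second call.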
            HOMEWORK
• Pg. 378, 1 – 7 odds, 11, 15

						