Confidence Intervals

Math 115: Confidence Intervals, Spring 2005

1. In this study, the population of interest is all bags of milk chocolate M&Ms. The parameter
being estimated is the proportion of blue M&Ms per bag. We make two assumptions: that the 30
bags purchased at the Carleton book store are a random sample of bags, and that the 50 or so
M&Ms in each bag are a random sample of M&Ms.

2. Before consumption, count the M&Ms in your bag and record the number of each color.

                       Color          Number         Proportion
                       Total                         100%

3. To estimate the proportion of BLUE M&Ms in the population, record the proportion of
BLUE M&Ms in your sample. We write this sample proportion as $\hat{p}$ to distinguish it from
the population proportion p.

                               $\hat{p}$ = ____________

4. Statistical theory tells us that the sampling distribution of $\hat{p}$ is approximately normal with
mean equal to p and standard deviation equal to $\sqrt{p(1-p)/n}$, where n is the number of M&Ms
in the sample. The M&M Company claims that the
proportion of BLUES in a bag of M&Ms is actually 24%. That is, they state that p = 0.24.

     * Compute the standard deviation for the sampling distribution of $\hat{p}$. Assume the M&M
Company is telling the truth and draw the normal curve showing the sampling distribution of $\hat{p}$.

       Standard deviation of the sampling distribution of $\hat{p}$ = ________________________ .

       * On the x-axis, find your particular sample value of $\hat{p}$. How many standard deviations
(roughly) is your value of $\hat{p}$ from p (the true mean)? Recall that the z-score answers exactly
this question, and that the z-score is equal to (Observed value - Mean)/(St. Dev):

       Number of standard deviations from p = $\dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}}$ = __________________.

        * Does it look like your data is consistent with the M&M Company’s claim? A common
measure of what is “extreme” or “unusual” is an observation that is more than two standard
deviations from the mean. Is your value of $\hat{p}$ within two standard deviations of p?
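The calculation in step 4 can be sketched in a few lines of Python. The bag below (50 M&Ms, 9 of them blue) is made up for illustration, and the function names are our own; substitute your own counts.

```python
import math

def sd_of_sample_proportion(p, n):
    """Standard deviation of the sampling distribution of p-hat: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def z_score(p_hat, p, n):
    """Number of standard deviations that p-hat lies from p."""
    return (p_hat - p) / sd_of_sample_proportion(p, n)

# Hypothetical bag, for illustration only: 50 M&Ms, 9 of them blue.
n = 50
blue = 9
p_hat = blue / n
p_claimed = 0.24  # the M&M Company's claim for BLUE

sd = sd_of_sample_proportion(p_claimed, n)
z = z_score(p_hat, p_claimed, n)

print(f"p-hat = {p_hat:.3f}")
print(f"standard deviation = {sd:.4f}")
print(f"z-score = {z:.2f}")
print("within two standard deviations of p:", abs(z) <= 2)
```

For this made-up bag the z-score is about -1, so the sample would not count as unusual under the two-standard-deviation rule.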

5. Now we estimate the true proportion of ORANGE M&Ms in the population. This will be our
new p. For now, I won’t tell you what the M&M Company claims for the value of p. What is
$\hat{p}$ from your sample? (Record below.) You can’t draw the sampling distribution of $\hat{p}$ because
you don’t know p. However, $\hat{p}$ is an approximation for p, and if the sample is a true random
sample, the approximation should be close. You can approximate the standard deviation by
substituting $\hat{p}$ for p in the standard deviation formula, which gives $\sqrt{\hat{p}(1-\hat{p})/n}$.
Find the approximate standard deviation for $\hat{p}$.

        $\hat{p}$ = ________________.

       Approximate standard deviation of the sampling distribution of $\hat{p}$ = _______________.

        This last number gives a measure of how precise your estimate $\hat{p}$ is. You can interpret it
as the “give or take” in your estimation. That is, “The estimated proportion of ORANGE M&Ms
is ______ ($\hat{p}$) give or take _____ (standard deviation).”
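As a rough sketch of step 5, the approximate standard deviation is the same formula with the sample proportion plugged in for p. The counts here (11 orange out of 50) are invented for illustration, and `approx_sd` is our own name.

```python
import math

def approx_sd(p_hat, n):
    """Approximate standard deviation of p-hat, substituting p-hat for the unknown p."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical bag, for illustration only: 50 M&Ms, 11 of them orange.
n = 50
orange = 11
p_hat = orange / n
give_or_take = approx_sd(p_hat, n)

print(f"The estimated proportion of ORANGE M&Ms is {p_hat:.2f}, "
      f"give or take {give_or_take:.3f}.")
```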

6. To make the above paragraph more precise, we compute a confidence interval for p. A 95%
confidence interval for the proportion of ORANGE M&Ms is obtained by constructing an interval
of the form

                Estimate ± Margin of Error, or $\hat{p}$ ± (2 standard deviations)

Two standard deviations is the most common number of standard deviations to use for a
confidence interval. The 68-95-99.7 rule says that 95% of the area under a normal curve is within
two standard deviations of the mean. Actually, that rule is not exact and the more accurate
number is 1.96 standard deviations. So for 95% confidence intervals we’ll use the more precise
value, 1.96.

Find the 95% confidence interval for the true proportion of ORANGE M&Ms:

                $\hat{p} \pm 1.96\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$.

You should always write this interval in two ways: (1) First in the form $\hat{p}$ ± Margin of Error.
This way of writing the interval makes it apparent what the actual point estimate is ($\hat{p}$) and what
the margin of error is. (2) Also report the actual interval with left and right endpoints [ XX , XX ]
to make it clear that the confidence interval actually is an interval. Notice that the point estimate
$\hat{p}$ lies in the middle of the interval. The range of the interval (right endpoint minus left endpoint)
is the width of the confidence interval.

                       $\hat{p}$ = _________________.

                       Margin of Error = ____________________.

                       Width of interval = ____________________.

Now for the moment you’ve been waiting for: The M&M Company states that the proportion of
ORANGE M&Ms in their milk chocolate bags is actually 20%. Does your interval contain p =
0.20? Do you think the M&M Company’s claim is accurate?
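Here is one way the 95% interval from step 6 might be computed and checked against the company's claim. The sample (11 orange out of 50) is again invented, and `confidence_interval_95` is a name chosen only for this sketch.

```python
import math

def confidence_interval_95(p_hat, n):
    """Return (low, high, margin) for p-hat ± 1.96 * sqrt(p-hat(1 - p-hat)/n)."""
    margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin

# Hypothetical bag, for illustration only: 50 M&Ms, 11 of them orange.
n = 50
p_hat = 11 / n
low, high, margin = confidence_interval_95(p_hat, n)
width = high - low

print(f"Form 1: {p_hat:.3f} ± {margin:.3f}")
print(f"Form 2: [{low:.3f}, {high:.3f}]")
print(f"Width of interval: {width:.3f}")
print("Contains the claimed p = 0.20:", low <= 0.20 <= high)
```

The width always comes out to twice the margin of error, which is a quick check on your arithmetic.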

7. In the previous example, 95% was the confidence level of the interval. We say that we are
“95% confident” that the true parameter value p lies in the interval. It would be nice if you could
be 100% confident about your interval, but you never can be (unless you know the parameter
value to begin with). So we would like to be as confident as possible. At the same time, it would
be nice if the interval were as narrow as possible so that the estimate is very precise.
(Reporting the interval [.24, .26] says a lot more than reporting [.05, .45].) Unfortunately, there is
a trade-off between greater confidence and greater precision.

We can get more confidence, but at a price. To be more confident we have to accept a wider
interval (and hence more uncertainty in the estimate). On the other hand, we can make our
confidence interval narrower, but we have to accept less confidence in the final result.

8. In general a confidence interval for a population proportion takes the form

                                $\hat{p} \pm z^{*}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$.

The z* value is called the critical value and is determined by the confidence level. Some
common values are:

                         Confidence Level         Critical Value
                         90%                      1.64
                         95%                      1.96
                         99%                      2.58

Notice that the greater the confidence level, the larger the critical value. But making the critical
value larger increases the width of the interval.
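If you are curious where the critical values come from, they can be recovered from the standard normal distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); `critical_value` is our own helper name. The table's entries are these values rounded to two decimals.

```python
from statistics import NormalDist

def critical_value(confidence):
    """z* such that the central `confidence` area of the standard normal
    curve lies between -z* and +z*."""
    # The central area leaves (1 - confidence)/2 in each tail, so z* is the
    # point with cumulative area 0.5 + confidence/2 to its left.
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {critical_value(level):.3f}")
```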

9. A confidence level that is not very common is 75%. The critical value for a 75%
confidence interval is z* = 1.15. We’ll use it in our next inference problem, to estimate the
proportion of GREEN M&Ms. Compute two confidence intervals for p, the proportion of
GREEN M&Ms in the population: at the 75% and 95% levels. (Remember to write them in two
ways.)

       75% Confidence Interval:

       95% Confidence Interval:
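The two intervals in step 9 can be compared side by side with a short sketch like the following. The sample (8 green out of 50) is hypothetical, and `proportion_ci` is a name we made up for the illustration.

```python
import math

def proportion_ci(p_hat, n, z_star):
    """Return (low, high) for p-hat ± z_star * sqrt(p-hat(1 - p-hat)/n)."""
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical bag, for illustration only: 50 M&Ms, 8 of them green.
n = 50
p_hat = 8 / n

ci_75 = proportion_ci(p_hat, n, z_star=1.15)
ci_95 = proportion_ci(p_hat, n, z_star=1.96)

for label, (low, high) in (("75%", ci_75), ("95%", ci_95)):
    margin = (high - low) / 2
    print(f"{label}: {p_hat:.3f} ± {margin:.3f}, or [{low:.3f}, {high:.3f}]")
```

Both intervals are centered at the same point estimate; the 95% interval is wider by the ratio of the critical values, 1.96/1.15.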

10. What does a 75% confidence interval really mean? Here is the most precise statement: If you
were to repeat the experiment many, many times, each time taking a different random sample,
finding $\hat{p}$, and computing a 75% confidence interval, then about 75% of all these intervals would
contain the true parameter value. This also means that about 25% of all these intervals would not
contain p. Similarly, if you were estimating the population proportion with 95% confidence
intervals, then about 95% of all the intervals obtained would contain the true proportion value,
and 5% of them would not.
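The repeated-sampling interpretation can be tried out by simulation. The sketch below assumes a true proportion of p = 0.20 and bags of 50 M&Ms (both values chosen only for illustration), draws many simulated bags, and counts how often each interval contains p. All function names are our own.

```python
import math
import random

def interval_covers(p_true, n, z_star, rng):
    """Draw one sample of size n and report whether its interval contains p_true."""
    count = sum(rng.random() < p_true for _ in range(n))
    p_hat = count / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin <= p_true <= p_hat + margin

def coverage(p_true, n, z_star, trials=5000, seed=0):
    """Fraction of simulated intervals that contain p_true."""
    rng = random.Random(seed)
    hits = sum(interval_covers(p_true, n, z_star, rng) for _ in range(trials))
    return hits / trials

# Assumed values, for illustration only: true p = 0.20, bags of 50 M&Ms.
cov_75 = coverage(0.20, 50, z_star=1.15)
cov_95 = coverage(0.20, 50, z_star=1.96)
print(f"75% intervals containing p: {cov_75:.3f}")
print(f"95% intervals containing p: {cov_95:.3f}")
```

Expect the simulated fractions to land near, but not exactly on, 75% and 95%. With only 50 M&Ms per bag the normal approximation is rough, so the observed coverage can fall somewhat short of the nominal level.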

We’ll try to illustrate and verify these theoretical results with the data from your confidence intervals.

11. For more fascinating (!) information about M&Ms, see