CHAPTER 8 Inference for Proportions by liwenting


CHAPTER 8

Inference for Proportions

The U.S. video game market is approximately $8.2 billion and growing. Case 8.1 discusses a
PEW survey that has been used to collect data on gamers.

CHAPTER OUTLINE
8.1 Inference for a Single Proportion
8.2 Comparing Two Proportions

Introduction
We frequently collect data on categorical variables, such as whether or not a
person is a full-time college student or a part-time college student, the brand
name of a cell phone, or the country where a college student studies abroad.
When we record categorical variables, our data consist of counts or of percents
obtained from counts.
     The parameters we want to do inference about in these settings are population
proportions. Just as in the case of inference about population means, we
may be concerned with a single population or with comparing two populations.
Inference about one or two proportions is very similar to inference about means,
which we discussed in Chapter 7. In particular, inference for both means and
proportions is based on sampling distributions that are approximately Normal.
     We begin in Section 8.1 with inference about a single population proportion.
Section 8.2 concerns methods for comparing two proportions.


8.1 Inference for a Single Proportion

CASE 8.1 Adults and Video Games A PriceWaterhouseCooper report estimates that
the U.S. video game market was approximately $8.6 billion in 2007 and is expected
to increase at an annual rate of 6.3% through 2012.¹ Who plays video
games? A PEW survey, conducted by Princeton Survey Research International,
reports that over half of American adults aged 18 and over play video games.²
The PEW survey used a nationally representative sample of 2054 adults. Of the
total, 1063 adults said that they played video games.


                             For problems involving a single proportion, we will use n for the sample size
                             and X for the count of the outcome of interest. Often we will use the terms
                             “success” and “failure” for the two possible outcomes. When we do this, X is
                             the number of successes.


                                EXAMPLE 8.1        Data for the Video Game Case

                CASE 8.1      The count of people who responded “Yes” to the question about whether or not they played video
                              games in the sample of Case 8.1 is X = 1063. The sample size is n = 2054.




     We would like to know the proportion of video game players in the adult U.S.
population. This population proportion is the parameter of interest. The statistic used
to estimate this unknown parameter is the sample proportion. The sample proportion
is $\hat{p} = X/n$.


EXAMPLE 8.2 Estimating the Proportion of Adults Who Play Video Games

CASE 8.1 The sample proportion $\hat{p}$ in Case 8.1 is a discrete random variable that can take the values 0,
1/2054, 2/2054, . . . , 2053/2054, or 1. For our particular sample, we have

$$\hat{p} = \frac{1063}{2054} = 0.52$$




     In many cases, a probability model for $\hat{p}$ can be based on the binomial distributions
for counts, discussed in Chapter 5. If the sample size n is very small, we can base tests and
confidence intervals for p on the discrete distribution of $\hat{p}$. We will focus on situations
where the sample size is sufficiently large that we can approximate the distribution of $\hat{p}$
by a Normal distribution.


Sampling Distribution of a Sample Proportion
Choose an SRS of size n from a large population that contains population proportion p of
“successes.” Let X be the count of successes in the sample and let $\hat{p}$ be the sample
proportion of successes,

$$\hat{p} = \frac{X}{n}$$

Then:
• For large sample sizes, the distribution of $\hat{p}$ is approximately Normal.
• The mean of the distribution of $\hat{p}$ is p.
• The standard deviation of $\hat{p}$ is

$$\sqrt{\frac{p(1 - p)}{n}}$$
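The three facts in the box can be checked informally by simulation. The sketch below is only an illustration: the values p = 0.3 and n = 200 are hypothetical, the helper names `sampling_sd` and `simulate_p_hats` are ours, and an SRS from a large population is approximated by independent draws that each succeed with probability p.

```python
import math
import random
import statistics

def sampling_sd(p, n):
    """Theoretical standard deviation of the sample proportion: sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def simulate_p_hats(p, n, reps, seed=0):
    """Draw `reps` samples of size n and return the sample proportion of each.

    Assumption: the population is large enough that each draw is an
    independent success with probability p.
    """
    rng = random.Random(seed)
    p_hats = []
    for _ in range(reps):
        x = sum(rng.random() < p for _ in range(n))  # count of successes X
        p_hats.append(x / n)                         # sample proportion X/n
    return p_hats

p, n = 0.3, 200  # hypothetical population proportion and sample size
p_hats = simulate_p_hats(p, n, reps=2000)
mean_sim = statistics.fmean(p_hats)
sd_sim = statistics.pstdev(p_hats)

print(f"theory:     mean = {p:.4f}, sd = {sampling_sd(p, n):.4f}")
print(f"simulation: mean = {mean_sim:.4f}, sd = {sd_sim:.4f}")
```

With a few thousand repetitions, the simulated mean and standard deviation should land close to p and to the value given by the formula, and a histogram of `p_hats` should look roughly Normal.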



     Figure 8.1 summarizes these facts in a form that recalls the idea of sampling distributions.
Our inference procedures are based on this Normal approximation. These
procedures are similar to those for inference about the mean of a Normal distribution.
We will see, however, that there are a few extra details involved, caused by the
added difficulty in approximating the discrete distribution of $\hat{p}$ by a continuous Normal
distribution.

[Figure: repeated SRSs of size n from a population with proportion p each give a value of $\hat{p}$; the values of $\hat{p}$ follow a Normal curve with mean p and standard deviation $\sqrt{p(1 - p)/n}$.]

FIGURE 8.1 Draw a large SRS from a population in which the proportion p are successes. The
sampling distribution of the sample proportion $\hat{p}$ of successes has approximately a Normal
distribution.

APPLY YOUR KNOWLEDGE

8.1 Bank acquisitions. The American Bankers Association Community Bank Competitiveness
Survey for 2008 had responses from 760 community banks. Of these,
283 reported that they expected to acquire another bank within five years.³
(a) What is the sample size n for this survey?
(b) What is the count X? Describe the count in a short sentence.
(c) Find the sample proportion $\hat{p}$.

CASE 8.1 8.2 How often do they play? In the PEW survey described in Case 8.1,
those who played video games were asked how often they played. In this subpopulation,
223 adults said that they played every day or almost every day.
(a) What is the sample size n for the subpopulation of U.S. adults who play video
    games? (Hint: Look at Case 8.1.)
(b) What is the count X of those who said that they played every day or almost every
    day?
(c) Find the sample proportion $\hat{p}$.

               Large-sample confidence interval for a single proportion
The sample proportion $\hat{p} = X/n$ is the natural estimator of the population proportion p.
Notice that $\sqrt{p(1 - p)/n}$, the standard deviation of $\hat{p}$, depends upon the unknown
parameter p. In our calculations, we estimate it by replacing the population parameter p with
the sample estimate $\hat{p}$. Therefore, our estimated standard error is
$SE_{\hat{p}} = \sqrt{\hat{p}(1 - \hat{p})/n}$.
If the sample size is large, the distribution of $\hat{p}$ will be approximately Normal with mean
p and standard deviation $SE_{\hat{p}}$. It follows that $\hat{p}$ will be within two standard deviations
($2SE_{\hat{p}}$) of the unknown parameter p about 95% of the time. This is how we use the
Normal approximation to construct the large-sample confidence interval for p. Here are
the details.

z Confidence Interval for a Population Proportion
Choose an SRS of size n from a large population with unknown proportion p of
successes. The sample proportion is

$$\hat{p} = \frac{X}{n}$$

The standard error of $\hat{p}$ is

$$SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

and the margin of error for confidence level C is

$$m = z^* \, SE_{\hat{p}}$$

where $z^*$ is the value for the standard Normal density curve with area C between $-z^*$ and
$z^*$. The large-sample level C confidence interval for p is

$$\hat{p} \pm m$$

You can use this interval for 90% ($z^* = 1.645$), 95% ($z^* = 1.960$), or 99% ($z^* = 2.576$)
confidence when the number of successes and the number of failures are both at least 15.



EXAMPLE 8.3 Confidence Interval for the Proportion of Adults Who Play Video Games

CASE 8.1 The sample survey in Case 8.1 found that 1063 of a sample of 2054 adults reported that they
played video games. So, the sample size is n = 2054 and the count is X = 1063. The sample
proportion of adults who play video games is

$$\hat{p} = \frac{X}{n} = \frac{1063}{2054} = 0.51753$$

The standard error is

$$SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.5175(1 - 0.5175)}{2054}} = 0.011026$$

The z critical value for 95% confidence is $z^* = 1.96$, so the margin of error is

$$m = 1.96\,SE_{\hat{p}} = (1.96)(0.011026) = 0.021610$$

The confidence interval is

$$\hat{p} \pm m = 0.52 \pm 0.02$$

We are 95% confident that between 50% and 54% of adults play video games.
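The arithmetic in Example 8.3 can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `large_sample_ci` is our own, and $z^*$ is taken from `statistics.NormalDist` rather than from a table, so the last digits may differ slightly from hand calculation with $z^* = 1.96$.

```python
import math
from statistics import NormalDist

def large_sample_ci(x, n, conf=0.95):
    """Large-sample z confidence interval for a proportion.

    Returns (p_hat, se, margin, (lower, upper)).
    Intended for use when successes and failures are both at least 15.
    """
    p_hat = x / n                                  # sample proportion X/n
    se = math.sqrt(p_hat * (1 - p_hat) / n)        # standard error of p_hat
    z_star = NormalDist().inv_cdf((1 + conf) / 2)  # area conf between -z* and z*
    margin = z_star * se
    return p_hat, se, margin, (p_hat - margin, p_hat + margin)

# Data from Case 8.1: X = 1063 players among n = 2054 adults.
p_hat, se, margin, (lower, upper) = large_sample_ci(1063, 2054)
print(f"p_hat = {p_hat:.5f}, SE = {se:.6f}, m = {margin:.6f}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

Passing `conf=0.90` or `conf=0.99` gives the other two confidence levels listed in the box.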



                      In performing these calculations we have kept a large number of digits for our
                 intermediate calculations. However, when reporting the results we prefer to use rounded
                 values. For example, “52% with a margin of error of 2%.” In this way we focus attention
                 on what is important. There is no additional information to be gained by reporting
                 0.51753 with a margin of error of 0.021610.
                      Remember that the margin of error in any confidence interval includes only random
                 sampling error. If people do not respond honestly to the questions asked, for example,
                 your estimate is likely to miss by more than the margin of error.
     Because the calculations for statistical inference for a single proportion are relatively
straightforward, we often do them with a calculator or in a spreadsheet. Figure 8.2 gives
output from Minitab and SAS for the data in Case 8.1. As usual, the output reports more
digits than are useful. When you use software, be sure to think about how many digits
are meaningful for your purposes. Do not clutter your report with information that is
not meaningful. SAS gives the standard error next to the label ASE, which stands for
asymptotic standard error. The SAS output also includes an alternative interval based on
an “exact” method.

FIGURE 8.2 Minitab and SAS outputs for the confidence interval in Example 8.3.

Minitab

   Test and CI for One Proportion
   Sample         X        N     Sample p            95% CI
   1           1063     2054     0.517527     (0.495917, 0.539137)
   Using the normal approximation.

SAS

   Binomial Proportion for y = 0

   Proportion                      0.5175
   ASE                             0.0110
   95% Lower Conf Limit            0.4959
   95% Upper Conf Limit            0.5391

   Exact Conf Limits
   95% Lower Conf Limit            0.4957
   95% Upper Conf Limit            0.5393

APPLY YOUR KNOWLEDGE

8.3 Bank acquisitions. Refer to Exercise 8.1 (page 459).
(a) Find $SE_{\hat{p}}$, the standard error of $\hat{p}$.
(b) Give the 95% confidence interval for p in the form of estimate plus or minus the
    margin of error.
(c) Give the confidence interval as an interval of percents.

CASE 8.1 8.4 How often do they play? Refer to Exercise 8.2 (page 459).
(a) Find $SE_{\hat{p}}$, the standard error of $\hat{p}$.
(b) Give the 95% confidence interval for p in the form of estimate plus or minus the
    margin of error.
(c) Give the confidence interval as an interval of percents.


Plus four confidence interval for a single proportion*
*The material on the plus four confidence interval is optional and can be omitted without loss of continuity.

Suppose we have a sample where the count is X = 0. Then, because $\hat{p} = 0$, the standard
error and the margin of error based on this estimate will both be 0. The confidence
interval for any confidence level would be the single point 0. Confidence intervals based
on the large-sample Normal approximation do not make sense in this situation.
     Both computer studies and careful mathematics show that we can do better by
moving the sample proportion $\hat{p}$ slightly away from 0 and 1.⁴ There are several ways to
do this. Here is a simple adjustment that works very well in practice.
     The adjustment is based on the following idea: act as if we have 4 additional observations,
2 of which are successes and 2 of which are failures. The new sample size is
n + 4 and the count of successes is X + 2. Because this estimate was first suggested by
Edwin Bidwell Wilson in 1927 (though rarely used in practice until recently), we call it
the Wilson estimate.
     To compute a confidence interval based on the Wilson estimate, first replace the
value of X by X + 2 and the value of n by n + 4. Then use these values in the formulas
for the z confidence interval.
     In Example 8.1, we had X = 1063 and n = 2054. To apply the plus four approach
we use the z procedure with X = 1065 and n = 2058. You can use this interval when
the sample size is at least n = 10 and the confidence level is 90%, 95%, or 99%.
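The plus four recipe is a small change to the large-sample calculation. The sketch below is one possible way to code it; the function name `plus_four_ci` and the symbol `p_tilde` for the Wilson estimate are our own labels, and clipping the endpoints to the range from 0 to 1 is a convenience that follows the hint given in Exercise 8.6.

```python
import math
from statistics import NormalDist

def plus_four_ci(x, n, conf=0.95):
    """Plus four confidence interval: add 2 successes and 2 failures,
    then apply the usual z interval to the adjusted counts.

    Returns (p_tilde, margin, (lower, upper)), with endpoints clipped to [0, 1].
    """
    x_adj, n_adj = x + 2, n + 4                    # act as if 4 extra observations
    p_tilde = x_adj / n_adj                        # Wilson estimate
    se = math.sqrt(p_tilde * (1 - p_tilde) / n_adj)
    z_star = NormalDist().inv_cdf((1 + conf) / 2)
    margin = z_star * se
    lower = max(0.0, p_tilde - margin)
    upper = min(1.0, p_tilde + margin)
    return p_tilde, margin, (lower, upper)

# Case 8.1 data: the adjusted counts are 1065 successes out of 2058.
p_tilde, margin, (lower, upper) = plus_four_ci(1063, 2054)
print(f"Wilson estimate = {p_tilde:.5f}, m = {margin:.5f}")
print(f"95% plus four CI: ({lower:.4f}, {upper:.4f})")
```

For a sample as large as the one in Case 8.1, four extra observations barely move the estimate. The adjustment matters most when n is small or when X is close to 0 or n.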


                         APPLY YOUR KNOWLEDGE

                        8.5 Use plus-four for adults who play video games. Refer to Example 8.3 (page 460).
                        Compute the plus four 95% confidence interval and compare this interval with the one
                        given in that example.
                        8.6 New-product sales. Yesterday, your top salesperson called on 6 customers and
                        obtained orders for your new product from all 6. Suppose that it is reasonable to view
                        these 6 customers as a random sample of all of her customers.
                        (a) Give the plus four estimate of the proportion of her customers who would buy the
                            new product. Notice that we don’t estimate that all customers will buy, even
                            though all 6 in the sample did.
                        (b) Give the margin of error for 95% confidence. (You may see that the upper
                            endpoint of the confidence interval is greater than 1. In that case, take the upper
                            endpoint to be 1.)
                        (c) Do the results apply to all of your sales force? Explain why or why not.
8.7 Construct an example. Make up an example where the large-sample method and
the plus four method give very different intervals. Do not use a case where either $\hat{p} = 0$
or $\hat{p} = 1$.


                        Significance test for a single proportion
We know that the sample proportion $\hat{p} = X/n$ is approximately Normal, with mean
$\mu_{\hat{p}} = p$ and standard deviation $\sigma_{\hat{p}} = \sqrt{p(1 - p)/n}$. To construct confidence intervals,
we need to use an estimate of the standard deviation based on the data because the standard
deviation depends upon the unknown parameter p. When performing a significance test,
however, the null hypothesis specifies a value for p, which we will call $p_0$. When we
calculate P-values, we act as if the hypothesized p were actually true. When we test
$H_0\colon p = p_0$, we substitute $p_0$ for p in the expression for $\sigma_{\hat{p}}$ and then standardize $\hat{p}$. Here
are the details.


z Significance Test for a Population Proportion
Choose an SRS of size n from a large population with unknown proportion p of
successes. To test the hypothesis $H_0\colon p = p_0$, compute the z statistic

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$

In terms of a standard Normal random variable Z, the approximate P-value for a test of
$H_0$ against

   $H_a\colon p > p_0$   is   $P(Z \ge z)$   (area to the right of z)

   $H_a\colon p < p_0$   is   $P(Z \le z)$   (area to the left of z)

   $H_a\colon p \ne p_0$   is   $2P(Z \ge |z|)$   (area in both tails beyond |z|)

Use this test when the expected number of successes $np_0$ and the expected number of
failures $n(1 - p_0)$ are both at least 10.


     We call this z test a “large-sample test” because it is based on a Normal approximation
to the sampling distribution of $\hat{p}$ that becomes more accurate as the sample size increases.
For small samples, or if the population is less than 10 times as large as the sample, consult
an expert for other procedures.


EXAMPLE 8.4 Comparing Two Sun Block Lotions
Your company produces a sun block lotion designed to protect the skin from both UVA and UVB
exposure to the sun. You hire a company to compare your product with the product sold by your
major competitor. The testing company exposes skin on the backs of a sample of 20 people to
UVA and UVB rays and measures the protection provided by each product. For 13 of the subjects,
your product provided better protection, while for the other 7 subjects, your competitor’s product
provided better protection. Do you have evidence to support a commercial claiming that your
product provides superior UVA and UVB protection? For the data we have n = 20 subjects and
X = 13 successes. To answer the claim question, we test

   $H_0\colon p = 0.5$
   $H_a\colon p \ne 0.5$

The expected numbers of successes (your product provides better protection) and failures (your
competitor’s product provides better protection) are 20 × 0.5 = 10 and 20 × 0.5 = 10. Both are
at least 10, so we can use the z test. The sample proportion is

$$\hat{p} = \frac{X}{n} = \frac{13}{20} = 0.65$$

The test statistic is

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \frac{0.65 - 0.5}{\sqrt{\dfrac{(0.5)(0.5)}{20}}} = 1.34$$

     From Table A we find P(Z ≤ 1.34) = 0.9099, so the probability in the upper tail is
1 − 0.9099 = 0.0901. The P-value is the area in both tails, P = 2 × 0.0901 = 0.1802. Minitab
and SAS outputs for the analysis appear in Figure 8.3. We conclude that the sun block testing data
are compatible with the hypothesis of no difference between your product and your competitor’s
($\hat{p} = 0.65$, z = 1.34, P = 0.18). The data do not provide you with a basis to support your
advertising claim.
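The test in Example 8.4 can also be carried out in code. The following is a minimal sketch using only the Python standard library; the function name `one_proportion_z_test` and the labels used for the `alternative` argument are our own choices, not part of any statistical package.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(x, n, p0, alternative="two-sided"):
    """Large-sample z test of H0: p = p0.

    Intended for use when n*p0 and n*(1 - p0) are both at least 10.
    `alternative` is "greater", "less", or "two-sided".
    Returns (z, p_value).
    """
    p_hat = x / n
    se0 = math.sqrt(p0 * (1 - p0) / n)  # standard deviation of p_hat under H0
    z = (p_hat - p0) / se0
    std_normal = NormalDist()
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)             # P(Z >= z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)                 # P(Z <= z)
    else:
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # 2 P(Z >= |z|)
    return z, p_value

# Example 8.4: 13 of 20 subjects were better protected by your product.
z, p_value = one_proportion_z_test(13, 20, 0.5)
print(f"z = {z:.4f}, two-sided P-value = {p_value:.4f}")
```

Because the code does not round z to 1.34 before finding the tail area, its P-value agrees with the software value of about 0.180 rather than the table-based 0.1802. The two round to the same reported value, P = 0.18.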

FIGURE 8.3 Minitab and SAS outputs for the significance test in Example 8.4.

Minitab

   Test and CI for One Proportion
   Test of p=0.5 vs p not=0.5
   Sample         X      N   Sample p          95% CI          Z-Value   P-Value
   1             13     20   0.650000   (0.440963, 0.859037)      1.34     0.180
   Using the normal approximation.

SAS

   Binomial Proportion for x = 1

   Proportion                 0.6500
   ASE                        0.1067
   95% Lower Conf Limit       0.4410
   95% Upper Conf Limit       0.8590

   Exact Conf Limits
   95% Lower Conf Limit       0.4078
   95% Upper Conf Limit       0.8461

   Test of H0:    Proportion = 0.5

   ASE under H0               0.1118
   Z                          1.3416
   One-sided Pr > Z           0.0899
   Two-sided Pr > |Z|         0.1797

   Sample Size = 20




                                         Note that we used a two-sided hypothesis test when we compared the two sun block
                                    lotions in Example 8.4. In settings like this, we must start with the view that either
                                    product could be better if we want to prove a claim of superiority. Thinking or hoping
                                    that your product is superior cannot be used to justify a one-sided test.

                                     APPLY YOUR KNOWLEDGE

                                    8.8 Draw a picture. Draw a picture of a standard Normal curve and shade the tail
                                    areas to illustrate the calculation of the P-value for Example 8.4.
                                    8.9 What does the confidence interval tell us? Inspect the outputs in Figure 8.3
                                    and report the confidence interval for the percent of people who would get better sun
                                    protection from your product than from your competitor’s. Be sure to convert from
                                    proportions to percents and to round appropriately. Interpret the confidence interval
                                    and compare this way of analyzing data with the significance test.
                                    8.10 The effect of X. In Example 8.4, suppose that your product provided better
                                    UVA and UVB protection for 15 of the 20 subjects. Perform the significance test and
                                    summarize the results.
                                    8.11 The effect of n. In Example 8.4, consider what would have happened if you
                                    had paid for twice as many subjects to be tested. Assume that the results would be
                                    the same as what you obtained for 20 subjects; that is, 65% had better UVA and
                                    UVB protection with your product. Perform the significance test and summarize the
                                    results.

     In Example 8.4, we treated an outcome as a success whenever your product provided
better sun protection. Would we get the same results if we defined success as an outcome
where your competitor’s product was superior? In this setting the null hypothesis is still
H0 : p = 0.5. You will find that the z test statistic is unchanged except for its sign and
that the P-value remains the same.
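
The claim above can be checked numerically. The following is a minimal sketch, assuming the counts from Example 8.4 (13 of 20 subjects favored your product, which is the 65% quoted in Exercise 8.11); the function name `one_proportion_z_test` is ours for illustration and does not come from the text.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0=0.5):
    """Large-sample z test of H0: p = p0 against the two-sided alternative."""
    p_hat = successes / n
    se_null = sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / se_null
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Example 8.4: success means your product gave better protection (X = 13).
z_yours, p_yours = one_proportion_z_test(13, 20)
# Same data, but success now means the competitor's product was better (X = 7).
z_theirs, p_theirs = one_proportion_z_test(7, 20)

print(f"yours:      z = {z_yours:.4f}, P = {p_yours:.4f}")
print(f"competitor: z = {z_theirs:.4f}, P = {p_theirs:.4f}")
```

The two z statistics differ only in sign, and because the two-sided P-value depends on |z|, it is the same under either definition of success.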

 APPLY YOUR KNOWLEDGE

8.12 Yes or no? In Example 8.4 we performed a significance test to compare your
product with your competitor’s. Success was defined as the outcome where your product
provided better protection. Now, take the viewpoint of your competitor and define
success as the outcome where your competitor’s product provides better protection. In
other words, n remains the same (20) but X is now 7.
(a) Perform the two-sided significance test and report the results. How do these
    compare with what we found in Example 8.4?
(b) Find the 95% confidence interval for this setting and compare it with the interval
    calculated where success is defined as the outcome when your product provides
    better protection.


Choosing a sample size
In Chapter 6, we showed how to choose the sample size n to obtain a confidence interval
with specified margin of error m for a Normal mean. Because we are using a Normal ap-
proximation for inference about a population proportion, sample size selection proceeds
in much the same way.
     Recall that the margin of error for the large-sample confidence interval for a popu-
lation proportion is

\[
m = z^{*}\,\mathrm{SE}_{\hat{p}} = z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
\]
Choosing a confidence level C fixes the critical value $z^*$. The margin of error also depends
on the value of $\hat{p}$ and the sample size n. Because we don’t know the value of $\hat{p}$ until we
gather the data, we must guess a value to use in the calculations. We will call the guessed
value $p^*$. Here are two ways to get $p^*$:

• Use the sample estimate from a pilot study or from similar studies done earlier.
• Use $p^* = 0.5$. Because the margin of error is largest when $\hat{p} = 0.5$, this choice
   gives a sample size that is somewhat larger than we really need for the confidence
   level we choose. It is a safe choice no matter what the data later show.

Once we have chosen p ∗ and the margin of error m that we want, we can find the n we
need to achieve this margin of error. Here is the result.


   Sample Size for Desired Margin of Error
   The level C confidence interval for a proportion p will have a margin of error
   approximately equal to a specified value m when the sample size is
\[
n = \left(\frac{z^{*}}{m}\right)^{2} p^{*}(1-p^{*})
\]


   Here $z^*$ is the critical value for confidence C, and $p^*$ is a guessed value for the proportion
   of successes in the future sample.
       The margin of error will be less than or equal to m if $p^*$ is chosen to be 0.5. The
   sample size required is then given by
\[
n = \left(\frac{z^{*}}{2m}\right)^{2}
\]
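
The second formula follows from the first in one step, which we spell out here for convenience. For any guessed value $p^*$ between 0 and 1,
\[
p^{*}(1-p^{*}) = \frac{1}{4} - \left(p^{*} - \frac{1}{2}\right)^{2} \le \frac{1}{4},
\]
with equality only when $p^* = 0.5$. Substituting this largest possible value into the general formula gives
\[
n = \left(\frac{z^{*}}{m}\right)^{2}\cdot\frac{1}{4} = \left(\frac{z^{*}}{2m}\right)^{2}.
\]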


          The value of n obtained by this method is not particularly sensitive to the choice of
      $p^*$ as long as $p^*$ is not too far from 0.5. However, if your actual sample turns out to have
      $\hat{p}$ smaller than about 0.3 or larger than about 0.7, the sample size based on $p^* = 0.5$
      may be much larger than needed.


          EXAMPLE 8.5              Planning a Sample of Customers
      Your company has received complaints about its customer support service. You intend to hire a
      consulting company to carry out a sample survey of customers. Before contacting the consultant,
      you want some idea of the sample size you will have to pay for. One critical question is the degree
      of satisfaction with your customer service, measured on a five-point scale. You want to estimate
      the proportion p of your customers who are satisfied (that is, who choose either “satisfied” or
      “very satisfied,” the two highest levels on the five-point scale).
           You want to estimate p with 95% confidence and a margin of error less than or equal to 3%,
      or 0.03. For planning purposes, you are willing to use $p^* = 0.5$. To find the sample size required,
\[
n = \left(\frac{z^{*}}{2m}\right)^{2} = \left(\frac{1.96}{(2)(0.03)}\right)^{2} = 1067.1
\]

      Round up to get n = 1068. (Always round up. Rounding down would give a margin of error
      slightly greater than 0.03.)
           Similarly, for a 2.5% margin of error we have (after rounding up)
\[
n = \left(\frac{1.96}{(2)(0.025)}\right)^{2} = 1537
\]
      and for a 2% margin of error,
\[
n = \left(\frac{1.96}{(2)(0.02)}\right)^{2} = 2401
\]
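
The arithmetic in Example 8.5 can be reproduced with a few lines of code. This is a sketch only: the function name `sample_size` and its defaults ($p^* = 0.5$, $z^* = 1.96$ for 95% confidence) are our own choices for illustration, and the small rounding guard is an implementation detail that is not part of the method.

```python
from math import ceil


def sample_size(m, p_star=0.5, z_star=1.96):
    """Smallest n giving a margin of error of approximately m or less."""
    raw = (z_star / m) ** 2 * p_star * (1 - p_star)
    # Strip floating-point noise (such as 2401.0000000000005) before rounding up,
    # so that an exact answer is not pushed to the next integer.
    return ceil(round(raw, 9))


for m in (0.03, 0.025, 0.02):
    print(f"m = {m}: n = {sample_size(m)}")
```

Passing a different guess, for example `sample_size(0.03, p_star=0.3)`, shows how much smaller the required sample becomes when the guessed proportion is well away from 0.5.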



           News reports frequently describe the results of surveys with sample sizes between
      1000 and 1500 and a margin of error of about 3%. These surveys generally use sam-
      pling procedures more complicated than simple random sampling, so the calculation
      of confidence intervals is more involved than what we have studied in this section. The
      calculations in Example 8.5 nonetheless show in principle how such surveys are planned.
           In practice, many factors influence the choice of a sample size. Case 8.2 illustrates
      one set of factors.

      CASE 8.2     Marketing Christmas Trees

                   An association of Christmas tree growers in Indiana
                   sponsored a sample survey of Indiana households to help improve the marketing of Christmas
                   trees.5 The researchers decided to use a telephone survey and estimated that each telephone
                   interview would take about 2 minutes. Nine trained students in agribusiness marketing were
                   to make the phone calls between 1:00 P.M. and 8:00 P.M. on a Sunday. After discussing

                 problems related to people not being at home or being unwilling to answer the questions,
                 the survey team proposed a sample size of 500. Several of the questions asked demographic
                 information about the household. The key questions of interest had responses of “Yes” or
                 “No,” for example, “Did you have a Christmas tree last year?” The primary purpose of the
                 survey was to estimate various population proportions for Indiana households. An important
                 issue in designing the survey was therefore whether the proposed sample size of n = 500
                 would be adequate to provide the sponsors of the survey with the information they required.

                To address this question, we calculate the margins of error of 95% confidence
           intervals for various values of $\hat{p}$.

             EXAMPLE 8.6         Margins of Error

CASE 8.2   In the Christmas tree market survey, the margin of error of a 95% confidence interval for any
           value of $\hat{p}$ and n = 500 is
\[
\begin{aligned}
m &= z^{*}\,\mathrm{SE}_{\hat{p}} \\
  &= 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{500}}
\end{aligned}
\]
           The results for various values of $\hat{p}$ are

                                       $\hat{p}$      m          $\hat{p}$      m
                                       0.05         0.019        0.60         0.043
                                       0.10         0.026        0.70         0.040
                                       0.20         0.035        0.80         0.035
                                       0.30         0.040        0.90         0.026
                                       0.40         0.043        0.95         0.019
                                       0.50         0.044


                The survey team judged these margins of error to be acceptable, and they used a sample size
           of 500 in their survey.
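
The table in Example 8.6 can be regenerated directly from the margin-of-error formula. The sketch below assumes n = 500 and $z^* = 1.96$ as in the example; the helper name `margin_of_error` is ours.

```python
from math import sqrt


def margin_of_error(p_hat, n, z_star=1.96):
    """Large-sample margin of error z* * sqrt(p_hat * (1 - p_hat) / n)."""
    return z_star * sqrt(p_hat * (1 - p_hat) / n)


p_hats = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95]
margins = {p: round(margin_of_error(p, 500), 3) for p in p_hats}

for p, m in margins.items():
    print(f"{p:.2f}  {m:.3f}")
```

Because the formula involves $\hat{p}$ only through the product $\hat{p}(1-\hat{p})$, the values for $\hat{p}$ and $1-\hat{p}$ coincide.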


                 The table in Example 8.6 illustrates two points. First, the margins of error for
           $\hat{p} = 0.05$ and $\hat{p} = 0.95$ are the same. The margins of error will always be the same for
           $\hat{p}$ and $1 - \hat{p}$. This is a direct consequence of the form of the confidence interval. Second,
           the margin of error varies only between 0.040 and 0.044 as $\hat{p}$ varies from 0.3 to 0.7, and
           the margin of error is greatest when $\hat{p} = 0.5$, as we claimed earlier. It is true in general
           that the margin of error will vary relatively little for values of $\hat{p}$ between 0.3 and 0.7.
           Therefore, when planning a study, it is not necessary to have a very precise guess for p.
           If $p^* = 0.5$ is used and the observed $\hat{p}$ is between 0.3 and 0.7, the actual interval will
           be a little shorter than needed, but the difference will be quite small.

            APPLY YOUR KNOWLEDGE

           8.13 Is there interest in a new product? One of your employees has suggested that
           your company develop a new product. You decide to take a random sample of your
           customers and ask whether or not there is interest in the new product. The response
           is on a 1 to 5 scale, with 1 indicating “definitely would not purchase”; 2, “probably
           would not purchase”; 3, “not sure”; 4, “probably would purchase”; and 5, “definitely
           would purchase.” For an initial analysis, you will record the responses 1, 2, and 3 as
           “No” and 4 and 5 as “Yes.” What sample size would you use if you wanted the 95%
           margin of error to be 0.1 or less?

      8.14 More information is needed. Refer to the previous exercise. Suppose that, after
      reviewing the results of the previous survey, you proceeded with preliminary develop-
      ment of the product. Now you are at the stage where you need to decide whether or not
      to make a major investment to produce and market the product. You will use another
      random sample of your customers, but now you want the margin of error to be smaller.
      What sample size would you use if you wanted the 95% margin of error to be 0.05 or
      less?


        SECTION 8.1 Summary

      • Inference about a population proportion is based on an SRS of size n. When n is large,
        the distribution of the sample proportion $\hat{p} = X/n$ is approximately Normal with
        mean p and standard deviation $\sqrt{p(1-p)/n}$.
      • The standard error of $\hat{p}$ is
\[
\mathrm{SE}_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
\]
      • The z margin of error for confidence level C is
\[
m = z^{*}\,\mathrm{SE}_{\hat{p}}
\]
         where $z^*$ is the value for the standard Normal density curve with area C between $-z^*$
         and $z^*$.
      • The z large-sample level C confidence interval for p is
\[
\hat{p} \pm m
\]
         We recommend using this method when the number of successes and the number of
         failures are both at least 15.
      • The plus four estimate of a population proportion is obtained by adding two suc-
         cesses and two failures to the sample and then using the z procedure. We recommend
         using this method when the sample size is at least 10 and the confidence level is 90%,
         95%, or 99%.
      • The sample size required to obtain a confidence interval of approximate margin of
         error m for a proportion is found from
\[
n = \left(\frac{z^{*}}{m}\right)^{2} p^{*}(1-p^{*})
\]
         where $p^*$ is a guessed value for the proportion, and $z^*$ is the standard Normal critical
         value for the desired level of confidence. To ensure that the margin of error of the
         interval is less than or equal to m no matter what $\hat{p}$ may be, use
\[
n = \left(\frac{z^{*}}{2m}\right)^{2}
\]
      • Tests of $H_0\colon p = p_0$ are based on the z statistic
\[
z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}
\]
         with P-values calculated from the N(0, 1) distribution. Use this test when the expected
         number of successes $np_0$ and the expected number of failures $n(1-p_0)$ are both at
         least 10.
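
The two interval procedures in this summary can be sketched in code as follows. The function names are ours, the counts passed to `large_sample_ci` are hypothetical and chosen only for illustration, and $z^* = 1.96$ assumes 95% confidence.

```python
from math import sqrt


def large_sample_ci(successes, n, z_star=1.96):
    """z interval p_hat +/- z* SE; suggested when successes and failures are both >= 15."""
    p_hat = successes / n
    m = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - m, p_hat + m


def plus_four_ci(successes, n, z_star=1.96):
    """Plus four interval: add two successes and two failures, then use the z procedure."""
    return large_sample_ci(successes + 2, n + 4, z_star)


# Hypothetical counts (60 successes in 100 trials), for illustration only.
print(large_sample_ci(60, 100))

# Counts from Example 8.4 (13 successes in 20 trials): only 7 failures, so the
# large-sample guideline is not met, but n >= 10 so the plus four interval applies.
print(plus_four_ci(13, 20))
```

Writing the plus four interval as a call to the large-sample function mirrors its definition: the only change is the adjusted count and sample size.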


  SECTION 8.1 Exercises

For Exercises 8.1 and 8.2, see page 459; for 8.3 and 8.4, see page 461; for 8.5 to 8.7,
see page 462; for 8.8 to 8.11, see page 464; for 8.12, see page 465; and for 8.13 and 8.14,
see pages 467–468.

8.15 What’s wrong? Explain what is wrong with each of the following.
(a) You can use a significance test to evaluate the hypothesis H0: $\hat{p} = 0.6$ versus the
two-sided alternative.
(b) The large-sample significance test for a population proportion is based on a t statistic.
(c) A large-sample 95% confidence interval for an unknown proportion p is $\hat{p}$ plus or minus
its standard error.

8.16 What’s wrong? Explain what is wrong with each of the following.
(a) The margin of error for a confidence interval used for an opinion poll takes into account
the fact that people who did not answer the poll questions may have had different responses
from those who did answer the questions.
(b) If the P-value for a significance test is 0.35, we can conclude that the null hypothesis
has a 35% chance of being true.
(c) A student project used a confidence interval to describe the results in a final report.
The confidence level was 110%.

8.17 Draw some pictures. Consider the binomial setting with n = 50 and p = 0.4.
(a) The sample proportion $\hat{p}$ will have a distribution that is approximately Normal. Give
the mean and the standard deviation of this Normal distribution.
(b) Draw a sketch of this Normal distribution. Mark the location of the mean.
(c) Find a value $p^*$ for which the probability is 95% that $\hat{p}$ will be between $p \pm p^*$.
Mark these two values on your sketch.

8.18 Country food and Inuits. Country food includes seal, caribou, whale, duck, fish, and
berries and is an important part of the diet of the aboriginal people called Inuits who
inhabit Inuit Nunaat, the northern region of what is now called Canada. A survey of Inuits in
Inuit Nunaat reported that 3274 out of 5000 respondents said that at least half of the meat
and fish that they eat is country food.6 Find the sample proportion and a 95% confidence
interval for the population proportion of Inuits who eat meat and fish that are at least half
country food.

8.19 Most desirable mates. A poll of 5000 residents in Brazil, Canada, China, France,
Malaysia, South Africa, and the United States asked about what profession they would prefer
their marriage partner to have. The choice receiving the highest percent, 805 of the
responses, was doctors, nurses, and other health care professionals.7
(a) Find the sample proportion and a 95% confidence interval for the proportion of people who
would prefer a doctor, nurse, or other health care professional as a marriage partner.
(b) Convert the estimate and the confidence interval to percents.

8.20 Guitar Hero and Rock Band. An electronic survey of 7061 reported that 67% of players of
Guitar Hero and Rock Band who do not currently play a musical instrument said that they are
likely to begin playing a real musical instrument in the next two years.8 The reports
describing the survey do not give the number of respondents who do not currently play a
musical instrument.
(a) Explain why it is important to know the number of respondents who do not currently play a
musical instrument.
(b) Assume that half of the respondents do not currently play a musical instrument. Find the
count of players who said that they are likely to begin playing a real musical instrument in
the next two years.
(c) Give a 99% confidence interval for the population proportion who would say that they are
likely to begin playing a real musical instrument in the next two years.
(d) The survey collected data from two separate consumer panels. There were 3300 respondents
from the LightSpeed consumer panel and the others were from Guitar Center’s proprietary
consumer panel. Comment on the sampling procedure used for this survey and how it would
influence your interpretation of the findings.

8.21 Guitar Hero and Rock Band. Refer to the previous exercise.
(a) How would the result that you reported in part (c) of the previous exercise change if only
25% of the respondents said that they did not currently play a musical instrument?
(b) Do the same calculations if the percent was 75%.
(c) The main conclusion of the survey that appeared in many news stories was that 67% of
players of Guitar Hero and Rock Band who do not currently play a musical instrument said that
they are likely to begin playing a real musical instrument in the next two years. What can you
conclude about the effect of the three scenarios, namely part (b) in the previous exercise and
parts (a) and (b) in this exercise, on the margin of error for the main result?

8.22 Gambling and college athletics. Gambling is an issue of great concern to those involved
in intercollegiate athletics. Because of this, the National Collegiate Athletic Association
(NCAA) surveyed student-athletes concerning their gambling-related behaviors.9 There were 5594
Division I male athletes in the survey. Of these, 3547 reported participation in some gambling
behavior. This includes playing cards, betting on games of skill, buying lottery tickets,
betting on sports, and similar activities.
(a) Find the sample proportion and the large-sample margin of error for 95% confidence.
Explain in simple terms the meaning of the 95%.
(b) Because of the way that the study was designed to protect the anonymity of the
student-athletes who responded, it was not possible to calculate the number of students who
were asked to respond but did not. Does this fact affect the way that you interpret the
results? Write a short paragraph explaining your answer.

8.23 Women athletes and gambling. In the study described in the previous exercise, 1447 out of
a total of 3469 female student-athletes reported participation in some gambling activity.
(a) Use the large-sample methods to find an estimate of the true proportion with a 95%
confidence interval.
(b) The margin of error for this sample is not the same as the margin of error calculated for
the previous exercise. Explain why.

8.24 Students doing community service. In a sample of 159,949 first-year college students, the
National Survey of Student Engagement reported that 39% participated in community service or
volunteer work.10
(a) Find the margin of error for 99% confidence.
(b) Here are some facts from the report that summarizes the survey. The students were from 617
four-year colleges and universities. The response rate was 36%. Institutions paid a
participation fee of between $1800 and $7800 based on the size of their undergraduate
enrollment. Discuss these facts as possible sources of error in this study. How do you think
these errors would compare with the error that you calculated in part (a)?

8.25 Plans to study abroad. The survey described in the previous exercise also asked about
items related to academics. In response to one of these questions, 42% of first-year students
reported that they plan to study abroad.
(a) Based on the information available, what is the value of the count of students who plan to
study abroad?
(b) Give a 99% confidence interval for the population proportion of first-year college
students who plan to study abroad.

8.26 Dogs or rats to find cocaine (optional). Dogs are big and expensive. Rats are small and
cheap. Can rats be trained to replace dogs in sniffing out illegal drugs? One study trained
six male albino Sprague-Dawley rats to rear up on their hind legs in response to the smell of
cocaine.11 After training, each rat was tested 80 times. In the test a rat was presented with
a large number of cups, one of which smelled like cocaine. A success was recorded if the rat
correctly identified the cup containing cocaine by rearing up in front of it. The numbers of
successes for the six rats were 80, 80, 73, 80, 74, and 80. You want to estimate the success
rate in the future for each of the six rats. Compare the large-sample estimates with the plus
four estimates for this problem and make a recommendation concerning which is better. Write a
short summary giving reasons for your recommendation.

8.27 Long sermons. The National Congregations Study collected data in a one-hour interview
with a key informant (that is, a minister, priest, rabbi, or other staff person or leader).12
One question concerned the length of the typical sermon. For 390 out of 1191 congregations,
the typical sermon lasted more than 30 minutes.
(a) Use the large-sample inference procedures to estimate the true proportion for this
question with a 95% confidence interval.
(b) (Optional) Compute the interval using the plus four method. Compare these results with
those from part (a) and summarize what the comparison tells you about the two methods.
(c) There were 1236 congregations surveyed in this study. Calculate the nonresponse rate for
this question. Does this influence how you interpret the results? Write a short discussion of
this issue.
(d) The respondents to this question were not asked to use a stopwatch to record the lengths
of a random sample of sermons at their congregations. They responded based on their
impressions of the sermons. Do you think that ministers, priests, rabbis, or other staff
persons or leaders might perceive sermon lengths differently from the people listening to the
sermons? Discuss how your ideas influence your interpretation of the results of this study.

8.28 Are the congregations conservative? The study described in the previous exercise also
asked each respondent to classify his or her congregation according to theological
orientation. For this question, 707 out of 1191 congregations were classified as “more
conservative.” Using the questions in the previous exercise as a guide, analyze and interpret
these data. Compare your answers to parts (c) and (d) in the two exercises and discuss reasons
why you think the answers should be similar or different.

8.29 Student credit cards. In a survey of 1430 undergraduate students, 1087 reported that they
had one or more credit cards.13 Give a 95% confidence interval for the proportion of all
college students who have at least one credit card.

8.30 How many credit cards? The survey described in the previous exercise reported that 43% of
undergraduates had four or more credit cards. Give a 95% confidence interval for the
proportion of all college students who have four or more credit cards.

8.31 How would the confidence interval change? Refer to Exercise 8.25. Would a 95% confidence
interval be wider or narrower than the one that you found in that exercise? Verify your
results by computing the interval.

8.32 How would the confidence interval change? Refer to Exercise 8.23. Would a 90% confidence
interval be wider or narrower than the one that you found in that exercise? Verify your
results by computing the interval.

8.33 College students and diets. For a study of unhealthy eating behaviors, 267 college women
aged 18 to 25 years were surveyed.14 Of these, 69% reported that they had been on a diet
sometime during the past year. Give a 95% confidence interval for the true proportion of
college women aged 18 to 25 years in this population who dieted last year.

8.34 High school students and diets. In the study described in the previous exercise, the
researchers also surveyed 266 high school students who were 18 years old. In this sample 58.3%
reported that they had dieted sometime in the past year. Give a 95% confidence interval for
the true proportion of 18-year-old high school students in this population who were on a diet
sometime during the past year.

8.35 Marketing pet care products to older adults. You have
been asked to investigate the possibility of a marketing campaign

to promote your company’s pet care products to older adults. Your
report will include information about your potential market. In a
study of the relationship between pet ownership and physical activity
in older adults, 594 subjects reported that they owned a pet,
while 1939 reported that they did not.15 Give a 95% confidence
interval for the proportion of older adults in this population who
are pet owners.

CASE 8.2 8.36 Christmas tree marketing. One question in
the Christmas tree market survey described in Case 8.2 was “Did
you have a Christmas tree last year?” Of the 500 respondents,
421 answered “Yes.”
(a) What proportion of the sampled households responded “Yes”?
(b) Give the standard error for your estimate in part (a).
(c) Find a 95% confidence interval for the proportion of Indiana
households that had a Christmas tree last year.

8.37 Shipping the orders on time. As part of a quality improvement
program, your mail-order company is studying the process
of filling customer orders. According to company standards, an
order is shipped on time if it is sent within 2 working days of
the time it is received. You select an SRS of 150 of the 5000
orders received in the past month for an audit. The audit reveals
that 124 of these orders were shipped on time. Find a 95% confidence
interval for the true proportion of the month’s orders that
were shipped on time.

8.38 Power companies and trimming trees. Large trees growing
near power lines can cause power failures during storms when
their branches fall on the lines. Power companies spend a great
deal of time and money trimming and removing trees to prevent
this problem. Researchers are developing hormone and chemical
treatments that will stunt or slow tree growth. If the treatment
is too severe, however, the tree will die. In one series of laboratory
experiments on 216 sycamore trees, 41 trees died. Give
a 90% confidence interval for the proportion of sycamore trees
that would be expected to die from this particular treatment.

8.39 Financial goals of college students. In recent years over
70% of first-year college students responding to a national survey
have identified “being well-off financially” as an important personal
goal. A state university finds that 141 of an SRS of 200 of
its first-year students say that this goal is important. Give a 95%
confidence interval for the proportion of all first-year students at
the university who would identify being well-off as an important
personal goal.

8.40 Can we use the z test? In each of the following cases, is
the sample large enough to permit safe use of the z test? (The
population is very large.)
(a) n = 12 and H0: p = 0.6.
(b) n = 100 and H0: p = 0.4.
(c) n = 1000 and H0: p = 0.98.
(d) n = 500 and H0: p = 0.3.

CASE 8.2 8.41 Checking the demographics of a sample. Of
the 500 households that responded to the Christmas tree marketing
survey, 38% were from rural areas (including small towns),
and the other 62% were from urban areas (including suburbs).
According to the census, 36% of Indiana households are in rural
areas, and the remaining 64% are in urban areas. Let p be
the proportion of rural respondents. Set up hypotheses about p0
and perform a test of significance to examine how well the sample
represents the state in regard to rural versus urban residence.
Summarize your results.

8.42 More on demographics. In the previous exercise we arbitrarily
chose to state the hypotheses in terms of the proportion of
rural respondents. We could as easily have used the proportion
of urban respondents.
(a) Write hypotheses in terms of the proportion of urban residents
to examine how well the sample represents the state in
regard to rural versus urban residence.
(b) Perform the test of significance and summarize the results.
(c) Compare your results with the results of the previous exercise.
Summarize and generalize your conclusion.

8.43 Vouchers for schools? A national opinion poll found that
42% of all American adults agree that parents should be given
vouchers good for education at any public or private school of
their choice. The result was based on a small sample. How large
an SRS is required to obtain a margin of error of ±0.035 (that is,
±3.5%) in a 95% confidence interval? (Use the previous poll’s
result to obtain the guessed value p*.)

CASE 8.2 8.44 Profile of the survey respondents. Of the
500 respondents in the Christmas tree market survey of Case 8.2,
44% had no children at home and 56% had at least one child at
home. The corresponding census figures are 48% with no children
and 52% with at least one child. Test the null hypothesis
that the telephone survey technique has a probability of selecting
a household with no children that is equal to the value obtained
by the census. Give the z statistic and the P-value. What do you
conclude?

8.45 Mathematician tosses coin 10,000 times! The South
African mathematician John Kerrich, while a prisoner of war
during World War II, tossed a coin 10,000 times and obtained
5067 heads.
(a) Is this significant evidence at the 5% level that the probability
that Kerrich’s coin comes up heads is not 0.5?
(b) Give a 95% confidence interval to see what probabilities of
heads are roughly consistent with Kerrich’s result.

8.46 Instant versus fresh-brewed coffee. A matched pairs experiment
compares the taste of instant coffee with fresh-brewed
coffee. Each subject tastes two unmarked cups of coffee, one of
each type, in random order and states which he or she prefers. Of
the 60 subjects who participate in the study, 25 prefer the instant
coffee and the other 35 prefer fresh-brewed. Take p to be the
proportion of the population that prefers fresh-brewed coffee.
(a) Test the claim that a majority of people prefer the taste of
fresh-brewed coffee. Report the z statistic and its P-value. Is

your result significant at the 5% level? What is your practical
conclusion?
(b) Find a 90% confidence interval for p.

8.47 High-income households on a mailing list. Land’s Beginning
sells merchandise through the mail. It is considering buying
a list of addresses from a magazine. The magazine claims that at
least 25% of its subscribers have high incomes (that is, household
income in excess of $100,000). Land’s Beginning would like to
estimate the proportion of high-income people on the list. Verifying
income is difficult, but another company offers this service.
Land’s Beginning will pay to verify the incomes of an SRS of
people on the magazine’s list. They would like the margin of error
of the 95% confidence interval for the proportion to be 0.05
or less. Use the guessed value p* = 0.25 to find the required
sample size.

8.48 Change the specs. Refer to the previous exercise. For
each of the following variations on the design specifications,
state whether the required sample size will be higher, lower, or
the same as that found above.
(a) Use a 99% confidence interval.
(b) Change the allowable margin of error to 0.01.
(c) Use a planning value of p* = 0.15.
(d) Use a different company to do the income verification.

8.49 Start a student nightclub? A student organization wants
to start a nightclub for students under the age of 21. To assess
support for this proposal, the organization will select an SRS
of students and ask each respondent if he or she would patronize
this type of establishment. About 75% of the student body
are expected to respond favorably. What sample size is required
to obtain a 95% confidence interval with an approximate margin
of error of 0.06? Suppose that 50% of the sample responds
favorably. Calculate the margin of error of the 95% confidence
interval.

8.50 Are the customers dissatisfied? A cell phone manufacturer
would like to know what proportion of its customers are
dissatisfied with the service received from their local distributor.
The customer relations department will survey a random
sample of customers and compute a 99% confidence interval
for the proportion that are dissatisfied. From past studies, they
believe that this proportion will be about 0.1. Find the sample
size needed if the margin of error of the confidence interval is
to be about 0.02. Suppose 18% of the sample say that they are
dissatisfied. What is the margin of error of the 99% confidence
interval?

8.51 Increase student fees? You have been asked to survey
students at a large college to determine the proportion that favor
an increase in student fees to support an expansion of the student
newspaper. Each student will be asked whether he or she
is in favor of the proposed increase. Using records provided by
the registrar you can select a random sample of students from
the college. After careful consideration of your resources, you
decide that it is reasonable to conduct a study with a sample of
150 students. Construct a table of the margins of error for 95%
confidence when \(\hat{p}\) takes the values 0.1, 0.2, 0.3, 0.4, 0.5, 0.6,
0.7, 0.8, and 0.9.

8.52 Justify the cost of the survey. A former editor of the student
newspaper agrees to underwrite the study in the previous
exercise because she believes the results will demonstrate that
most students support an increase in fees. She is willing to provide
funds for a sample of size 500. Write a short summary for
your benefactor of why the increased sample size will provide
better results.



                                     8.2 Comparing Two Proportions
                                     Because comparative studies are so common, we often want to compare the proportions
                                     of two groups (such as men and women) that have some characteristic. We call the
                                     two groups being compared Population 1 and Population 2, and the two population
                                     proportions of “successes” p1 and p2. The data consist of two independent SRSs. The
                                     sample sizes are n1 for Population 1 and n2 for Population 2. The proportion of successes
                                     in each sample estimates the corresponding population proportion. Here is the notation
                                     we will use in this section:

                                                                Population        Sample         Count of          Sample
                                             Population         proportion          size         successes        proportion
                                                  1                  p1              n1              X1           \(\hat{p}_1 = X_1/n_1\)
                                                  2                  p2              n2              X2           \(\hat{p}_2 = X_2/n_2\)


                                     To compare the two unknown population proportions, start with the observed difference
                                     between the two sample proportions,

                                     \[ D = \hat{p}_1 - \hat{p}_2 \]

FIGURE 8.4 The sampling distribution of the difference between two sample
proportions is approximately Normal. The mean and standard deviation are found
from the two population proportions of successes, p1 and p2.
[Figure: Normal curve over the values of \(\hat{p}_1 - \hat{p}_2\), centered at the mean \(p_1 - p_2\),
with standard deviation \(\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}\).]




                                 When both sample sizes are sufficiently large, the sampling distribution of the difference
                                 D is approximately Normal. What are the mean and the standard deviation of D? Each of
                                 the two \(\hat{p}\)’s has the mean and standard deviation given in the box on page 458. Because
                                 the two samples are independent, the two \(\hat{p}\)’s are also independent. We can apply the
                                 rules for means and variances of sums of random variables. Here is the result, which is
                                 summarized in Figure 8.4.


                                    Sampling Distribution of \(\hat{p}_1 - \hat{p}_2\)
                                    Choose independent SRSs of sizes \(n_1\) and \(n_2\) from two populations with proportions \(p_1\)
                                    and \(p_2\) of successes. Let \(D = \hat{p}_1 - \hat{p}_2\) be the difference between the two sample
                                    proportions of successes. Then
                                    • As both sample sizes increase, the sampling distribution of D becomes approximately
                                       Normal.
                                    • The mean of the sampling distribution is \(p_1 - p_2\).
                                    • The standard deviation of the sampling distribution is
                                       \[ \sigma_D = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} \]
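As a quick numerical illustration of the box above, the following Python sketch evaluates the mean and standard deviation of D from assumed population values. The function name `sampling_dist_of_difference` and the inputs in the usage lines are hypothetical and do not come from the text.

```python
from math import sqrt


def sampling_dist_of_difference(p1, n1, p2, n2):
    """Return (mean, sd) of D = p1_hat - p2_hat for two independent SRSs."""
    mean = p1 - p2
    # Variances of independent sample proportions add, even for a difference.
    sd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return mean, sd


# Hypothetical population proportions and sample sizes (not from the text).
mean_d, sd_d = sampling_dist_of_difference(0.6, 50, 0.5, 40)
print(f"mean = {mean_d:.3f}, sd = {sd_d:.4f}")
```

Because the two variances are added before the square root is taken, the standard deviation of the difference is larger than that of either sample proportion alone.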



                                  APPLY YOUR KNOWLEDGE

                                 8.53 Rules for means and variances. Suppose p1 = 0.4, n1 = 25, p2 = 0.5,
                                 n2 = 30. Find the mean and the standard deviation of the sampling distribution of
                                 \(\hat{p}_1 - \hat{p}_2\).
                                 8.54 Effect of the sample sizes. Suppose p1 = 0.4, n1 = 100, p2 = 0.5, n2 = 120.
                                 (a) Find the mean and the standard deviation of the sampling distribution of \(\hat{p}_1 - \hat{p}_2\).
                                 (b) The sample sizes here are four times as large as those in the previous exercise,
                                     while the population proportions are the same. Compare the results for this
                                     exercise with those that you found in the previous exercise. What is the effect of
                                     multiplying the sample sizes by 4?

                      8.55 Rules for means and variances. It is quite easy to verify the mean and standard
                      deviation of the difference D.
                      (a) What are the means and standard deviations of the two sample proportions \(\hat{p}_1\)
                          and \(\hat{p}_2\)? (Look at the box on page 460 if you need to review this.)
                      (b) Use the addition rule for means of random variables: what is the mean of
                          \(D = \hat{p}_1 - \hat{p}_2\)?
                      (c) The two samples are independent. Use the addition rule for variances of random
                          variables: what is the variance of D?

                      Large-sample confidence intervals for a difference in proportions
                      The large-sample estimate of the difference in two proportions \(p_1 - p_2\) is the corresponding
                      difference in sample proportions \(\hat{p}_1 - \hat{p}_2\). To obtain a confidence interval for
                      the difference, we once again replace the unknown parameters in the standard deviation
                      by estimates to obtain an estimated standard deviation, or standard error. Here is the
                      confidence interval we want.

                                 z Confidence Interval for Comparing Two Proportions
                                 Choose an SRS of size \(n_1\) from a large population having proportion \(p_1\) of successes and
                                 an independent SRS of size \(n_2\) from another population having proportion \(p_2\) of successes.
                                 The large-sample estimate of the difference in proportions is
                                     \[ D = \hat{p}_1 - \hat{p}_2 = \frac{X_1}{n_1} - \frac{X_2}{n_2} \]
                                 The standard error of the difference is
                                     \[ \mathrm{SE}_D = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} \]
                                 and the margin of error for confidence level C is
                                     \[ m = z^* \, \mathrm{SE}_D \]
                                 where \(z^*\) is the value for the standard Normal density curve with area C between \(-z^*\) and
                                 \(z^*\). The large-sample level C confidence interval for \(p_1 - p_2\) is
                                     \[ (\hat{p}_1 - \hat{p}_2) \pm m \]
                                 Use this method when the number of successes and the number of failures in each of the
                                 samples are at least 10.


                                   CASE 8.3 “No Sweat” Garment Labels Following complaints about the working conditions in
                                   some apparel factories both in the United States and abroad, a joint government and industry
                                   commission recommended that companies that monitor and enforce proper standards be
                                   allowed to display a “No Sweat” label on their products. Does the presence of these labels
                                   influence consumer behavior?
                                        A survey of U.S. residents aged 18 or older asked a series of questions about how likely
                                   they would be to purchase a garment under various conditions. For some conditions, it was
                                   stated that the garment had a “No Sweat” label; for others, there was no mention of such a
                                   label. On the basis of the responses, each person was classified as a “label user” or a “label
                                   nonuser.”16 About 16.5% of those surveyed were label users. One purpose of the study was
                                   to describe the demographic characteristics of users and nonusers.
                                        Here is a summary of the data. We let X denote the number of label users.




                                                          Population        n       X         \(\hat{p} = X/n\)
                                                          1 (women)        296      63           0.213
                                                          2 (men)          251      27           0.108

                                  The study in Case 8.3 suggested that there is a gender difference in the proportion
                             of label users. Let’s explore this possibility using a confidence interval.


                               EXAMPLE 8.7         Gender Differences in Label Use

                  CASE 8.3   First, we find the estimate of the difference:
                                 \[ D = \hat{p}_1 - \hat{p}_2 = \frac{X_1}{n_1} - \frac{X_2}{n_2} = 0.213 - 0.108 = 0.105 \]
                             Next, we calculate the standard error:

                                 \[ \mathrm{SE}_D = \sqrt{\frac{0.213(1 - 0.213)}{296} + \frac{0.108(1 - 0.108)}{251}} = 0.0308 \]
                             For 95% confidence, we use \(z^* = 1.96\), so the margin of error is

                                 \[ m = z^* \, \mathrm{SE}_D = (1.96)(0.0308) = 0.060 \]

                             The large-sample 95% confidence interval is

                                                         D ± m = 0.105 ± 0.060 = (0.04, 0.16)

                             With 95% confidence we can say that the difference in the proportions is between 0.04 and 0.16.
                             Alternatively, we can report that the gender difference is about 10% in favor of women, with a
                             95% margin of error of 6%.
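The calculation in Example 8.7 can also be checked in a few lines of Python. This is a minimal sketch using only the standard library; the function name `two_prop_z_interval` is an illustrative assumption, and the sketch computes \(z^*\) from the Normal distribution instead of using the rounded value 1.96.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_interval(x1, n1, x2, n2, conf=0.95):
    """Large-sample z interval for p1 - p2 from counts of successes."""
    # Guideline from the text: at least 10 successes and 10 failures per sample.
    for x, n in ((x1, n1), (x2, n2)):
        if min(x, n - x) < 10:
            raise ValueError("need at least 10 successes and 10 failures in each sample")
    p1_hat, p2_hat = x1 / n1, x2 / n2
    diff = p1_hat - p2_hat
    # Standard error: the denominators are the sample sizes, not the counts.
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_star = NormalDist().inv_cdf((1 + conf) / 2)  # about 1.96 for 95%
    margin = z_star * se
    return diff, se, (diff - margin, diff + margin)


# Case 8.3 counts: 63 of 296 women and 27 of 251 men were label users.
diff, se, (low, high) = two_prop_z_interval(63, 296, 27, 251)
print(f"estimate = {diff:.4f}, SE = {se:.4f}, 95% CI = ({low:.4f}, {high:.4f})")
```

Because the sketch keeps the unrounded sample proportions, its endpoints should agree more closely with the software output in Figure 8.5 than with the hand calculation, which rounds at each step.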



                                  Minitab and SAS output for Example 8.7 appear in Figure 8.5. Other statistical
                             packages provide output that is similar.
                                  In surveys such as this, men and women are typically not sampled separately. The
                             respondents to a single sample are divided after the fact into men and women. The
                             sample sizes are then random and reflect the characteristics of the population sampled.
                             Two-sample significance tests and confidence intervals are still approximately correct in
                             this situation, even though the two sample sizes were not fixed in advance.


FIGURE 8.5 Minitab and SAS outputs for Example 8.7.

                                            Minitab
                                              Sample                             X      N          Sample p
                                              1                                 63      296        0.212838
                                              2                                 27      251        0.107570

                                              Difference = p (1) – p (2)
                                              Estimate for difference: 0.105268
                                              95% CI for difference: (0.0449066, 0.165630)



                                            SAS

                                                                                          (Asymptotic) 95%
                                                                 Risk          ASE        Confidence Limits

                                              Row 1            0.2128        0.0238       0.1662        0.2595
                                              Row 2            0.1076        0.0196       0.0692        0.1459
                                              Total            0.1645        0.0159       0.1335        0.1956

                                              Difference       0.1053        0.0308       0.0449        0.1656

                               In Example 8.7 we chose women to be the first population. Had we chosen men as
                         the first population, the estimate of the difference would be negative (−0.105). Because
                         it is easier to discuss positive numbers, we generally choose the first population to be the
                         one with the higher proportion. The choice doesn’t affect the substance of the analysis.

                          APPLY YOUR KNOWLEDGE

                         8.56 Lying and online dating profiles. JupiterResearch estimates that the U.S. online
                         dating market will reach $932 million by 2011 and that the European online dating
                         sites will double revenues from 243 million euros in 2006 to 549 million euros in
                         2011.17 When trying to start a new relationship, people want to make a favorable
                         impression. Sometimes they will even stretch the truth a bit when disclosing information
                         about themselves. A study of deception in online dating examined the accuracy of the
                         information that 80 online daters gave in their profiles.18 The study found
                         that 22 of 40 men lied about their height, while 17 of 40 women were deceptive in this
                         way. A difference between the person’s actual height and that reported in the online
                         dating profile was classified as a lie if it was greater than 0.5 inches.
                         (a) Find the sample proportion of men who lied about their height. Do the same for
                              the women.
                         (b) Give the estimate of the difference between the proportion of men who lie about
                              their height and the proportion of women who lie about their height.
                         (c) Find the standard error for the estimated difference.
                         (d) Give the 95% confidence interval for the difference.
                         8.57 Lying about weight. The study described in the previous exercise also reported
                         results for lying about weight: 24 men and 23 women lied about
                         their weight. Answer parts (a) through (d) from the previous exercise for these data.

                              The previous two exercises ask questions about lying about weight and lying about
                         height. Suppose we wanted to look at the men only and compare the lying rates for
                         height and weight. Can we do this using the methods that we just studied? Stop for a
                         moment to review the material in the box on page 474, paying particular attention to the
                         assumptions that are needed for this method to be valid. The assumptions state that we
                         have independent samples from the two populations. In our examples, however, we are
                         using data from the same people to examine lying about height and lying about weight.
                         The z confidence interval for comparing two proportions that we have been studying
                         is not valid for this situation. Be sure to check your assumptions before applying any
                         statistical inference procedure.


                         Plus four confidence intervals for a difference in proportions*
                         Just as in the case of estimating a single proportion, a small modification of the sample
                         proportions greatly improves the accuracy of confidence intervals.19 As before, we first
                         add 2 successes and 2 failures to the actual data, dividing them equally between the two
                         samples. That is, add 1 success and 1 failure to each sample. Note that each sample
                         size is increased by 2. We then perform the calculations for the z procedure with the
                         modified data. As in the case of a single sample, we use the term Wilson estimates for


                         *The material on the plus four confidence interval for a difference in proportions is optional and can be
                         omitted without loss of continuity.

the estimates produced in this way. We recommend using this method when both sample
sizes are at least 5 and the confidence level is 90%, 95%, or 99%.
     In Example 8.7, we had X 1 = 63, n 1 = 296, X 2 = 27, and n 2 = 251. For the plus
four procedure, we would use X 1 = 64, n 1 = 298, X 2 = 28, and n 2 = 253.
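The plus four adjustment described above can be sketched in Python as follows. The function name `plus_four_two_prop_interval` and the small-sample counts in the usage lines are hypothetical; the only steps taken from the text are adding 1 success and 1 failure to each sample and then applying the z procedure to the modified data.

```python
from math import sqrt
from statistics import NormalDist


def plus_four_two_prop_interval(x1, n1, x2, n2, conf=0.95):
    """Plus four interval for p1 - p2: add 1 success and 1 failure per sample."""
    # Wilson estimates: each count grows by 1 and each sample size by 2.
    p1_tilde = (x1 + 1) / (n1 + 2)
    p2_tilde = (x2 + 1) / (n2 + 2)
    diff = p1_tilde - p2_tilde
    # Standard error from the z procedure, computed on the modified data.
    se = sqrt(p1_tilde * (1 - p1_tilde) / (n1 + 2)
              + p2_tilde * (1 - p2_tilde) / (n2 + 2))
    margin = NormalDist().inv_cdf((1 + conf) / 2) * se
    return p1_tilde, p2_tilde, (diff - margin, diff + margin)


# Hypothetical small samples (not from the text): 8 of 20 and 3 of 15 successes.
p1_tilde, p2_tilde, (low, high) = plus_four_two_prop_interval(8, 20, 3, 15)
print(f"Wilson estimates: {p1_tilde:.3f} and {p2_tilde:.3f}")
print(f"95% plus four interval: ({low:.3f}, {high:.3f})")
```

Called with the Example 8.7 counts, the function works from the modified values X1 = 64, n1 = 298, X2 = 28, and n2 = 253 given above.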

 APPLY YOUR KNOWLEDGE

8.58 Gender and labels using plus four. Refer to Example 8.7 (page 475), where we
computed a 95% confidence interval for the difference in the proportions of men and
women who were likely to use “No Sweat” labels when deciding to purchase clothing.
Redo the computations using the plus four method and compare your results with those
obtained in Example 8.7.
8.59 Gender and labels using plus four. Refer to the previous exercise and to Exam-
ple 8.7. Suppose that the sample sizes were smaller but that the proportions remained
approximately the same. Specifically, assume that 6 out of 30 women were label users
and 3 out of 25 men were label users. Compute the plus four interval for 95% confi-
dence. Then, compute the corresponding z interval and compare the results.
8.60 Lying about age. Refer to Exercises 8.56 and 8.57, where you analyzed data
about lying about height and weight in online dating profiles. The study also reported
that 10 men and 5 women lied about their age.
(a) The z confidence interval for comparing two proportions should not be used for
    these data. Why?
(b) Compute the plus four confidence interval for the difference in proportions.

Significance tests
Although we prefer to compare two proportions by giving a confidence interval for the
difference between the two population proportions, it is sometimes useful to test the null
hypothesis that the two population proportions are the same.
      We standardize $D = \hat{p}_1 - \hat{p}_2$ by subtracting its mean $p_1 - p_2$ and then dividing by
its standard deviation
$$\sigma_D = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$
If $n_1$ and $n_2$ are large, the standardized difference is approximately $N(0, 1)$. To get
a confidence interval, we used sample estimates in place of the unknown population
proportions $p_1$ and $p_2$ in the expression for $\sigma_D$. Although this approach would lead to a
valid significance test, we follow the more common practice of replacing the unknown
$\sigma_D$ with an estimate that takes into account the null hypothesis that $p_1 = p_2$. If these two
proportions are equal, we can view all of the data as coming from a single population.
Let $p$ denote the common value of $p_1$ and $p_2$. The standard deviation of $D = \hat{p}_1 - \hat{p}_2$
is then
$$\sigma_{Dp} = \sqrt{\frac{p(1 - p)}{n_1} + \frac{p(1 - p)}{n_2}} = \sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
The subscript on $\sigma_{Dp}$ reminds us that this is the standard deviation under the special
condition that the two populations share a common proportion $p$ of successes.

     We estimate the common value of $p$ by the overall proportion of successes in the
two samples:
$$\hat{p} = \frac{\text{number of successes in both samples}}{\text{number of observations in both samples}} = \frac{X_1 + X_2}{n_1 + n_2}$$
This estimate of $p$ is called the pooled estimate because it combines, or pools, the
information from both samples.
     To estimate the standard deviation of $D$, substitute $\hat{p}$ for $p$ in the expression for
$\sigma_{Dp}$. The result is a standard error for $D$ under the condition that the null hypothesis
$H_0\colon p_1 = p_2$ is true. The test statistic uses this standard error to standardize the difference
between the two sample proportions.


    Significance Tests for Comparing Two Proportions
    Choose an SRS of size $n_1$ from a large population having proportion $p_1$ of successes and
    an independent SRS of size $n_2$ from another population having proportion $p_2$ of
    successes. To test the hypothesis
    $$H_0\colon p_1 = p_2$$
    compute the z statistic
    $$z = \frac{\hat{p}_1 - \hat{p}_2}{SE_{Dp}}$$
    where the pooled standard error is
    $$SE_{Dp} = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
    based on the pooled estimate of the common proportion of successes
    $$\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$$
    In terms of a standard Normal random variable $Z$, the P-value for a test of $H_0$ against

        $H_a\colon p_1 > p_2$      is   $P(Z \geq z)$
        $H_a\colon p_1 < p_2$      is   $P(Z \leq z)$
        $H_a\colon p_1 \neq p_2$   is   $2P(Z \geq |z|)$

    Use this test when the number of successes and the number of failures in each of the
    samples are at least 5.
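
The procedure in the box can be sketched in Python as follows. This is a minimal illustration, not the output of any package discussed in the text: the function name `two_proportion_z_test` and the labels used for the `alternative` argument are assumptions made for this sketch, and Normal probabilities come from the standard library's `NormalDist` instead of Table A.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Pooled z test of H0: p1 = p2 (hypothetical helper)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Pooled estimate of the common proportion under H0.
    p_pooled = (x1 + x2) / (n1 + n2)
    # Pooled standard error SE_Dp.
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    std_normal = NormalDist()
    if alternative == "greater":        # Ha: p1 > p2
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":         # Ha: p1 < p2
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":    # Ha: p1 != p2
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value


# Counts from Example 8.7: 63 of 296 women and 27 of 251 men were label users.
z, p_value = two_proportion_z_test(63, 296, 27, 251)
print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```

Because the sketch works from the unrounded sample proportions, its z statistic can differ in the second decimal place from a hand calculation that uses proportions rounded to three places.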



                               EXAMPLE 8.8        Men, Women, and Garment Labels

              CASE 8.3       Example 8.7 (page 475) presents survey data on whether consumers are “label users” who pay
                             attention to label details when buying a garment. Are men and women equally likely to be label
                             users? Here is the data summary:

                           Population          n       X      $\hat{p} = X/n$
                           1 (women)         296       63         0.213
                           2 (men)           251       27         0.108


The sample proportions are certainly quite different, but we need a significance test to verify that
the difference is too large to easily result from the play of chance in choosing the sample. Formally,
we compare the proportions of label users in the two populations (women and men) by testing the
hypotheses

$$H_0\colon p_1 = p_2$$
$$H_a\colon p_1 \neq p_2$$

The pooled estimate of the common value of $p$ is
$$\hat{p} = \frac{63 + 27}{296 + 251} = \frac{90}{547} = 0.1645$$
This is just the proportion of label users in the entire sample.
     First, we compute the standard error
$$SE_{Dp} = \sqrt{(0.1645)(0.8355)\left(\frac{1}{296} + \frac{1}{251}\right)} = 0.03181$$
and then we use this in the calculation of the test statistic
$$z = \frac{\hat{p}_1 - \hat{p}_2}{SE_{Dp}} = \frac{0.213 - 0.108}{0.03181} = 3.30$$
The difference in the sample proportions is more than 3 standard deviations away from zero. The
P-value is 2P(Z ≥ 3.30). From Table A we have P = 2 × 0.0005 = 0.0010. Software gives
P = 0.0009. We report: 21% of women are label users versus only 11% of men; the difference is
statistically significant (z = 3.30, P < 0.001).



     Figure 8.6 gives the Minitab and SAS outputs for Example 8.8. Carefully examine
the output to find all the important pieces that you would need to report the results of the
analysis and to draw a conclusion.
     Many market researchers would expect the proportion of label users to be higher
among women than among men. That is, we might choose the one-sided alternative
Ha : p1 > p2 . The P-value would be half of the value obtained for the two-sided test.
Because the z statistic is so large, this distinction is of no practical importance.


 APPLY YOUR KNOWLEDGE

8.61 Do men lie more often about their height than women? Refer to Exercise 8.56
(page 476) about lying and online dating profiles.
(a) State appropriate null and alternative hypotheses for this setting. Give a
    justification for your choice.
(b) Use the data given in Exercise 8.56 to perform a two-sided significance test. Give
    the test statistic and the P-value.
(c) Summarize the results of your significance test.
8.62 What about weight? Refer to Exercise 8.57 (page 476) for the data on lying
about weight. Answer the questions given in the previous exercise for weight.

FIGURE 8.6 Minitab and SAS outputs for Example 8.8.

Minitab

    Sample      X      N    Sample p
    1          63    296    0.212838
    2          27    251    0.107570

    Difference = p(1) – p(2)
    Estimate for difference: 0.105268
    Test for difference = 0 (vs not = 0): z = 3.31  P–Value = 0.001

SAS

    Two Sample Test of Equality of Proportions

    Sample Statistics
                 – Frequencies of x for gen –
    Value            1            2
    0              233          224
    1               63           27

    Hypothesis Test
        Null hypothesis:
            Proportion of x(gen=1) – Proportion of x(gen=2) = 0
        Alternative:
            Proportion of x(gen=1) – Proportion of x(gen=2) ^= 0

                 – Proportions of x for gen –
    Value            1            2            z     Prob > z
    1           0.2128       0.1076         3.31       0.0009




                                 BEYOND THE BASICS: Relative Risk
In Example 8.7 (page 475) we compared the proportions of women and men who are
“label users” when they shop for clothing by giving a confidence interval for the difference
of proportions. Alternatively, we might choose to make this comparison by giving the
ratio of the two proportions. This ratio is often called the relative risk (RR). A relative
risk of 1 means that the proportions $\hat{p}_1$ and $\hat{p}_2$ are equal. Confidence intervals for relative
risk apply the principles that we have studied, but the details are somewhat complicated.
Fortunately, we can leave the details to software and concentrate on interpreting and
communicating the results.



                                  EXAMPLE 8.9          Relative Risk for Use of Labels

                  CASE 8.3       The following table summarizes the data on the proportions of men and women who use labels
                                 when buying clothing:

                           Population          n       X      $\hat{p} = X/n$
                           1 (women)         296       63        0.2128
                           2 (men)           251       27        0.1076

The relative risk for this sample is
$$RR = \frac{\hat{p}_1}{\hat{p}_2} = \frac{0.2128}{0.1076} = 1.98$$
Confidence intervals for the relative risk in the entire population of shoppers are based on this
sample relative risk. Software (for example, PROC FREQ with the MEASURES option in SAS)
gives a 95% confidence interval as 1.30 to 3.01. Our summary: Women are about twice as likely
as men to use labels; the 95% confidence interval is (1.30, 3.01).


     In Example 8.9 the confidence interval is clearly not symmetric about the estimate:
that is, 1.98 is not the midpoint of 1.30 and 3.01. This is true in general for confidence
intervals for relative risk.
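
The text leaves the interval computation to software and does not say which method the software uses. One common large-sample approach, assumed here purely for illustration, builds the interval for $\log(RR)$ and then transforms back, which also shows where the asymmetry comes from: the interval is symmetric on the log scale, not on the RR scale. The function name `relative_risk_interval` is hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def relative_risk_interval(x1, n1, x2, n2, conf=0.95):
    """Relative risk with a log-scale confidence interval (illustrative sketch)."""
    rr = (x1 / n1) / (x2 / n2)
    # Assumed large-sample standard error of log(RR); not taken from the text.
    se_log_rr = sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    z_star = NormalDist().inv_cdf(0.5 + conf / 2)
    # The interval is symmetric around log(RR); exponentiating makes it asymmetric.
    lower = exp(log(rr) - z_star * se_log_rr)
    upper = exp(log(rr) + z_star * se_log_rr)
    return rr, lower, upper


# Counts from Example 8.9: 63 of 296 women and 27 of 251 men use labels.
rr, lower, upper = relative_risk_interval(63, 296, 27, 251)
print(f"RR = {rr:.2f}, 95% interval ({lower:.2f}, {upper:.2f})")
```

For the counts in Example 8.9 this sketch gives limits close to the interval reported above. Other software may use a different method and give slightly different limits.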
     Relative risk (that is, comparing proportions by a ratio rather than by a difference)
is particularly useful when the proportions are small. This is often the case in
epidemiology and medical statistics. Here is a typical epidemiological example.


 EXAMPLE 8.10          Smoking and Colorectal Cancer
Colorectal cancer is fourth in the list of types of cancers that lead to death. Many studies have
examined the relationship between cigarette smoking and colorectal cancer but the results have
been inconsistent. Twenty-six studies gave relative risk estimates for people who had ever smoked
relative to those who had never smoked. A recent study combined the results of these studies to
obtain a summary measure of relative risk.20 The smokers are Population 1 and the nonsmokers are
Population 2. The report of the study stated that the relative risk was 1.18 with a 95% confidence
interval of 1.11 to 1.25. Since the confidence interval does not include the value of 1, which would
correspond to equal risks in the two populations, we conclude that there is a higher risk of colorectal
cancer for cigarette smokers. The estimated increase in risk is 18% with a 95% confidence interval
of 11% to 25%.




  SECTION 8.2 Summary

• The estimate of the difference in two population proportions is
   $$D = \hat{p}_1 - \hat{p}_2$$
   where
   $$\hat{p}_1 = \frac{X_1}{n_1} \quad\text{and}\quad \hat{p}_2 = \frac{X_2}{n_2}$$
   The standard error of the difference is
   $$SE_D = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
   and the margin of error for confidence level C is
   $$m = z^* SE_D$$
   where $z^*$ is the value for the standard Normal density curve with area C between $-z^*$
   and $z^*$.

• The z large-sample level C confidence interval for the difference in two proportions
   $p_1 - p_2$ is
   $$(\hat{p}_1 - \hat{p}_2) \pm m$$
   We recommend using this method when the number of successes and the number of
   failures in both samples are at least 10.
• The plus four confidence interval for comparing two proportions is obtained by
   adding one success and one failure to each sample and then using the z procedure.
   We recommend using this method when both sample sizes are at least 5 and the
   confidence level is 90%, 95%, or 99%.
• Significance tests of $H_0\colon p_1 = p_2$ use the z statistic
   $$z = \frac{\hat{p}_1 - \hat{p}_2}{SE_{Dp}}$$
   with P-values from the $N(0, 1)$ distribution. In this statistic,
   $$SE_{Dp} = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
   where $\hat{p}$ is the pooled estimate of the common value of $p_1$ and $p_2$,
   $$\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$$
   We recommend using this test when the number of successes and the number of
   failures in each of the samples are at least 5.
• Relative risk is the ratio of two sample proportions:
   $$RR = \frac{\hat{p}_1}{\hat{p}_2}$$
   Confidence intervals for relative risk are an alternative to confidence intervals for the
   difference when we want to compare two proportions.
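
The large-sample interval in the summary, together with its recommended condition, can be sketched in Python. This is an illustration only: the function name `large_sample_interval` is hypothetical, and raising an error when the condition fails is a design choice made for this sketch, not a rule from the text.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_interval(x1, n1, x2, n2, conf=0.95):
    """Large-sample z interval for p1 - p2 (hypothetical helper)."""
    # Recommended condition: at least 10 successes and 10 failures in each sample.
    counts = (x1, n1 - x1, x2, n2 - x2)
    if min(counts) < 10:
        raise ValueError("fewer than 10 successes or failures in a sample")
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Unpooled standard error SE_D.
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # Margin of error m = z* SE_D.
    margin = NormalDist().inv_cdf(0.5 + conf / 2) * se
    diff = p1_hat - p2_hat
    return diff - margin, diff + margin


# Counts used throughout this section: 63 of 296 women and 27 of 251 men.
lower, upper = large_sample_interval(63, 296, 27, 251)
print(f"95% interval for p1 - p2: ({lower:.3f}, {upper:.3f})")
```

Note that the interval uses the unpooled standard error $SE_D$, whereas the significance test uses the pooled standard error $SE_{Dp}$.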


  SECTION 8.2 Exercises

For Exercises 8.53 to 8.55, see pages 473–474; for 8.56 and 8.57, see page 476; for 8.58 to 8.60,
see page 477; and for 8.61 and 8.62, see page 479.

8.63 Podcast downloading. The Podcast Alley Web site recently reported that they have
53,501 podcasts available for downloading, with 3,447,545 episodes.21 A Pew survey of Internet
users described the results of two surveys about podcast downloading. The first was conducted
between February and April 2006 and surveyed 2822 Internet users. They found that 198 of
these said that they had downloaded a podcast to listen to it or view it later at least once. In a
more recent survey, conducted in May 2008, there were 1553 Internet users. Of this total, 295 said
that they had downloaded a podcast to listen to it or view it later.22
(a) Refer to the table that appears at the beginning of this section (page 472). Fill in the
    numerical values of all quantities that are known.
(b) Find the estimate of the difference between the proportion of Internet users who had
    downloaded podcasts as of February to April 2006 and the proportion as of May 2008.
(c) Is the large-sample confidence interval for the difference in two proportions appropriate
    to use in this setting? Explain your answer.
(d) Find the 95% confidence interval for the difference.
(e) Convert your estimated difference and confidence interval to percents.
(f) One of the surveys was conducted between February and April, whereas the other was
    conducted in May. Do you think that this difference should have any effect on the
    interpretation of the results? Be sure to explain your answer.

8.64 Significance test for podcast downloading. Refer to the previous exercise. Test the
null hypothesis that the two proportions are equal. Report the test statistic with the P-value
and summarize your conclusion.

8.65 Are more Internet users downloading podcasts? Refer to the previous two exercises.
The ratio of the proportion in the 2008 sample to the proportion in the 2006 sample is about 2.7.
(a) Can you conclude that 2.7 times as many people are downloading podcasts? Explain why
    or why not.
(b) Can you conclude from the data available that there has been an increase from 2006 to
    2008 in the number of people who download podcasts? If your answer is no, explain what
    additional data you would need or what additional assumptions you would have to make
    to be able to draw this conclusion.

8.66 Adult gamers versus teen gamers. A Pew Internet Project Data Memo presented data
comparing adult gamers with teen gamers with respect to the devices on which they play. The
data are from two surveys. The adult survey had 1063 gamers, and the teen survey had 1064
gamers. The memo reports that 54% of adult gamers played on game consoles (Xbox,
PlayStation, Wii, etc.), and 89% of teen gamers played on game consoles.23
(a) Refer to the table that appears at the beginning of this section (page 472). Fill in the
    numerical values of all quantities that are known.
(b) Find the estimate of the difference between the proportion of teen gamers who played on
    game consoles and the proportion of adults who played on these devices.
(c) Is the large-sample confidence interval for the difference in two proportions appropriate
    to use in this setting? Explain your answer.
(d) Find the 95% confidence interval for the difference.
(e) Convert your estimated difference and confidence interval to percents.
(f) The adult survey was conducted between October and December 2008, whereas the teen
    survey was conducted between November 2007 and February 2008. Do you think that this
    difference should have any effect on the interpretation of the results? Be sure to explain
    your answer.

8.67 Significance test for gaming on consoles. Refer to the previous exercise. Test the null
hypothesis that the two proportions are equal. Report the test statistic with the P-value and
summarize your conclusion.

8.68 Gamers on computers. The report described in Exercise 8.66 also presented data from
the same surveys for gaming on computers (desktops or laptops). These devices were used by
73% of adult gamers and by 76% of teen gamers. Answer the questions given in Exercise 8.66
for gaming on computers.

8.69 Significance test for gaming on computers. Refer to the previous exercise. Test the
null hypothesis that the two proportions are equal. Report the test statistic with the P-value and
summarize your conclusion.

8.70 Can we compare gaming on consoles with gaming on computers? Refer to the
previous four exercises. Do you think that you can use the large-sample confidence intervals for
a difference in proportions to compare teens’ use of computers with teens’ use of consoles?
Write a short paragraph giving the reason for your answer. (Hint: Look carefully in the box
giving the assumptions needed for this procedure.)

8.71 Draw a picture. Suppose that there are two binomial populations. For the first, the true
proportion of successes is 0.4; for the second, it is 0.5. Consider taking independent samples
from these populations, 50 from the first and 60 from the second.
(a) Find the mean and the standard deviation of the distribution of $\hat{p}_1 - \hat{p}_2$.
(b) This distribution is approximately Normal. Sketch this Normal distribution and mark the
    location of the mean.
(c) Find a value d for which the probability is 0.95 that the difference in sample proportions
    is within ±d. Mark these values on your sketch.

8.72 What’s wrong? For each of the following, explain what is wrong and why.
(a) A z statistic is used to test the null hypothesis that $\hat{p}_1 = \hat{p}_2$.
(b) If two sample proportions are equal, then the sample counts are equal.
(c) A 95% confidence interval for the difference in two proportions includes errors due to
    nonresponse.

8.73 College student summer employment. Suppose (as is roughly true) that 85% of
college men and 83% of college women were employed last summer. A sample survey
interviews SRSs of 400 college men and 400 college women. The two samples are of course
independent.
(a) What is the approximate distribution of the proportion $\hat{p}_F$ of women who worked last
    summer? What is the approximate distribution of the proportion $\hat{p}_M$ of men who worked?
(b) The survey wants to compare men and women. What is the approximate distribution of
    the difference in the proportions who worked, $\hat{p}_M - \hat{p}_F$?

8.74 A corporate liability trial. A major court case on liability for contamination of
groundwater took place in the town of Woburn, Massachusetts. A town well in Woburn was
contaminated by industrial chemicals. During the period that residents drank water from this
well, there were 16 birth defects among 414 births. In years when the contaminated well was
shut off and water was supplied from other wells, there were 3 birth defects among 228 births.
The plaintiffs suing the firms responsible for the contamination claimed that these data show
that the rate of birth defects was higher when the contaminated well was in use.24 How
statistically significant is the evidence? Be sure to state what assumptions your analysis
requires and to what extent these assumptions seem reasonable in this case.

CASE 8.2 8.75 Natural versus artificial Christmas trees. In the Christmas tree survey
introduced in Case 8.2 (page 466), respondents who had a tree during the holiday season were
asked whether the tree was natural or artificial. Respondents were also asked if they lived in an
urban area or in a rural area. Of the 421 households displaying a Christmas tree, 160 lived in
rural areas
and 261 were urban residents. The tree growers want to know if there is a difference in
preference for natural trees versus artificial trees between urban and rural households. Here are
the data:

            Population           n          X(natural)
            1 (rural)           160              64
            2 (urban)           261              89

(a) Give the null and alternative hypotheses that are appropriate for this problem assuming
    that we have no prior information suggesting that one population would have a higher
    preference than the other.
(b) Test the null hypothesis. Give the test statistic and the P-value, and summarize the results.
(c) Give a 90% confidence interval for the difference in proportions.

of male references that are juvenile (“boy” rather than “man”)? Here are data from one of the
texts:

            Gender          n         X (juvenile)
            Female          60             48
            Male           132             52

(a) Find the sample proportions of juvenile references for females and for males.
(b) Give a 95% confidence interval for the difference and briefly summarize what the data
    show.

8.78 Is the gender bias statistically significant? The previous exercise addresses a question
about gender bias with a confidence interval. Set up the problem as a significance test. Carry out
the test and summarize the results.

8.79 Effect of the sample size. Return to the study of undergraduate student summer
employment described in Exercise 8.76. Similar results from a smaller number of students may
not have the same statistical significance. Specifically, suppose that 71 of 78 men surveyed were
employed and 62 of 71 women surveyed were employed. The sample proportions are essentially
the same as in the earlier exercise.
(a) Compute the z statistic for these data and report the P-value. What do you conclude?
(b) Compare the results of this significance test with your results in Exercise 8.76. What do
    you observe about the effect of the sample size on the results of these significance tests?

8.76 Summer employment of college students. A university financial aid office polled an
SRS of undergraduate students to study their summer employment. Not all students were
employed the previous summer. Here are the results for men and women:

                                   Men         Women
            Employed                712          623
            Not employed             68           92
            Total                   780          715

(a) Is there evidence that the proportion of male students               8.80 Relative risk for gamers. Refer to the Pew data about
employed during the summer differs from the proportion of female         gaming on game consoles (Xbox, PlayStation, Wii, etc.) by adults
students who were employed? State H0 and Ha, compute the test            and teens in Exercises 8.66 and 8.67 (page 483). Now, compare
statistic, and give its P-value.                                         the adults with the teens using the relative risk approach.
(b) Give a 95% confidence interval for the difference between            (a) Find the proportion of adult gamers who use game consoles.
the proportions of male and female students who were                     Do the same for the teen gamers.
employed during the summer. Does the difference seem practically         (b) Find the relative risk using the teen proportion in the
important to you?                                                        numerator.
8.77 Gender bias in textbooks. To what extent do textbooks               (c) Repeat the computation of the relative risk using percents in
on syntax (analysis of sentence structure) display gender bias? A        place of proportions. Compare this calculation with the one that
study of this question sampled sentences from 10 texts.25 One            you performed in part (b) and explain what you have learned.
part of the study examined the use of the words “girl,” “boy,”           (d) Do you expect the 95% confidence interval for the relative
“man,” and “woman.” Call the first two words juvenile and the             risk to include the value 1? Explain why or why not.
last two adult. Is the proportion of female references that are          (e) Find the 95% confidence interval if you have access to
juvenile (“girl” rather than “woman”) equal to the proportion            software that can do this calculation.

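Exercise 8.80 asks for a relative risk and, in part (e), for a confidence interval computed by software. The sketch below shows one way such a calculation might look. It assumes the common large-sample interval built on the log scale; the exercise does not say which method a given package uses, so treat that choice as an assumption. The function name `relative_risk` and the counts are hypothetical and are not taken from the exercises.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def relative_risk(x1, n1, x2, n2, conf=0.95):
    """Return (RR, lower, upper), where RR = (x1/n1) / (x2/n2)."""
    p1, p2 = x1 / n1, x2 / n2
    rr = p1 / p2
    # Standard error of log(RR): large-sample log method (an assumption here).
    se_log = sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    # Critical value z* for the requested confidence level.
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    lower = exp(log(rr) - z_star * se_log)
    upper = exp(log(rr) + z_star * se_log)
    return rr, lower, upper


# Hypothetical counts: 60 of 100 in group 1 and 40 of 100 in group 2.
rr, lower, upper = relative_risk(x1=60, n1=100, x2=40, n2=100)
print(f"RR = {rr:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

Because the relative risk is a ratio, computing it from percents rather than proportions multiplies the numerator and the denominator by the same factor of 100, so the ratio is unchanged.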

                                       STATISTICS IN SUMMARY
                                       Inference about population proportions is based on sample proportions. We rely on the
                                       fact that a sample proportion has a distribution that is close to Normal unless the sample is
                                       small. All the z procedures in this chapter work well when the samples are large enough.
                                       You must check this before using them. Here is a review list of the most important skills
                                       you should have acquired from your study of this chapter.
                                       A. Recognition
                                          1. Recognize from the design of a study whether one-sample or two-sample
                                             procedures are needed.
                                          2. Recognize what parameter or parameters an inference problem concerns. In
                                             particular, distinguish between settings that require inference about a single
                                             proportion and settings that require comparing two proportions.
                                          3. Calculate from sample counts the sample proportion or proportions.

                                      B. Inference about One Proportion
                                         1. Use the z procedure to give a confidence interval for a population proportion p.
                                         2. Use the z statistic to carry out a test of significance for the hypothesis
                                            H0: p = p0 about a population proportion p against either a one-sided or a
                                            two-sided alternative.
                                         3. Check that you can safely use these z procedures in a particular setting.

                                      C. Comparing Two Proportions
                                         1. Use the two-sample z procedure to give a confidence interval for the difference
                                            p1 − p2 between proportions in two populations based on independent samples
                                            from the populations.
                                         2. Use a z statistic to test the hypothesis H0: p1 = p2 that proportions in two
                                            distinct populations are equal.
                                         3. Check that you can safely use these z procedures in a particular setting.

                                           Statistical inference always draws conclusions about one or more parameters of a
                                      population. When you think about doing inference, ask first what the population is and
                                      what parameter you are interested in. The t procedures of Chapter 7 allow us to give
                                      confidence intervals and carry out tests about population means. We use the z procedures
                                      of this chapter for inference about population proportions.
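
The z procedures listed in parts B and C can be sketched in a few lines of code. This is a minimal illustration using only the Python standard library, assuming the large-sample formulas (unpooled standard error for confidence intervals, pooled standard error for the two-sample test) and two-sided alternatives. The function names and the counts are hypothetical, and the code does not check whether the samples are large enough; that check remains your responsibility.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard Normal distribution


def z_star(conf):
    """Critical value z* for a confidence level such as 0.95."""
    return Z.inv_cdf(1 - (1 - conf) / 2)


def one_prop_ci(x, n, conf=0.95):
    """Large-sample confidence interval for a single proportion p."""
    p_hat = x / n
    margin = z_star(conf) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def one_prop_test(x, n, p0):
    """z statistic and two-sided P-value for H0: p = p0."""
    z = (x / n - p0) / sqrt(p0 * (1 - p0) / n)
    return z, 2 * (1 - Z.cdf(abs(z)))


def two_prop_ci(x1, n1, x2, n2, conf=0.95):
    """Large-sample confidence interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    margin = z_star(conf) * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - margin, (p1 - p2) + margin


def two_prop_test(x1, n1, x2, n2):
    """z statistic and two-sided P-value for H0: p1 = p2 (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * (1 - Z.cdf(abs(z)))


# Hypothetical counts, for illustration only.
print(one_prop_ci(50, 100))
print(one_prop_test(50, 100, p0=0.5))
print(two_prop_ci(60, 100, 40, 100))
print(two_prop_test(60, 100, 40, 100))
```

For a one-sided alternative, the P-value would be the single tail area in the direction of the alternative rather than twice the tail area.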


CHAPTER 8          Review Exercises

8.81 Changes in credit card usage by undergraduates. In                (b) Suppose that the sample size for the 2000 study was 2000.
Exercise 8.30 (page 470) we looked at data from a survey of 1430       Redo the confidence interval and significance testing calculations
undergraduate students and their credit card use. These students       for this scenario.
were surveyed in 2004. In the sample, 43% said that they had four      (c) Compare your results for parts (a) and (b) of this exercise
or more credit cards. A similar study performed in 2000 by the         with the results that you found in the previous two exercises.
same organization reported that 32% of the sample said that they       Write a short paragraph about the effects of assuming a value for
had four or more credit cards.26 Assume that the sample sizes for      the sample size on your conclusions.
the two studies are the same. Find a 95% confidence interval for        8.84 Student employment during the school year. A study of
the change from 2000 to 2004 in the percent of undergraduates          1430 undergraduate students reported that 994 work 10 or more
who report having four or more credit cards.                           hours a week during the school year. Give a 95% confidence in-
8.82 Do the significance test for the change. Refer to the pre-         terval for the proportion of all undergraduate students who work
vious exercise. Perform the significance test for comparing the         10 or more hours a week during the school year.
two proportions. Report your test statistic, the P-value, and sum-     8.85 Examine the effect of the sample size. Refer to the previ-
marize your conclusion.                                                ous exercise. Assume a variety of different scenarios where the
8.83 We did not know the sample size. Refer to the previous            sample size changes but the proportion in the sample who work
two exercises. We did not report the sample size for the 2000          10 or more hours a week during the school year remains the same.
study, but it is reasonable to assume that it is fairly close to the   Write a short report summarizing your results and conclusions.
sample size for the 2004 study.                                        Be sure to include numerical and graphical summaries of what
(a) Suppose that the sample size for the 2000 study was only           you have found.
1000. Redo the confidence interval and significance testing cal-         8.86 Video game genres. U.S. computer and video game soft-
culations for this scenario.                                           ware sales were $9.5 billion in 2007.27 A survey of 1102 teens
collected data about their video game use. The table below lists      of households that were wireless only in December 2007 and the
the most popular game genres.28                                       households that were wireless only in December 2003.
                                                                      (e) Give the margin of error for 95% confidence for the differ-
                                                         Percent
                                                                      ence in proportions.
Genre          Examples                                  who play
Racing         NASCAR, Mario Kart, Burnout                 74         8.90 Analyze the change in terms of relative risk. Refer to
Puzzle         Bejeweled, Tetris, Solitaire                72         the previous two exercises.
Sports         Madden, FIFA, Tony Hawk                     68         (a) Summarize the change data in terms of relative risk. The term
Action         Grand Theft Auto,                           67         “relative risk” is a poor description of the ratio that you are using
               Devil May Cry, Ratchet and Clank                       for this exercise. Give a better term for this ratio.
Adventure      Legend of Zelda, Tomb Raider                 66        (b) Analyze the data in terms of relative risk and write a summary
Rhythm         Guitar Hero,                                 61        of your results.
               Dance Dance Revolution, Lumines                        (c) Compare your results in part (b) with your findings in terms
                                                                      of a difference in proportions from the previous exercise.
Give a 95% confidence interval for the proportion who play             (d) Which approach do you prefer? Give reasons for your
games in each of these six genres.                                    answer.
8.87 Too many errors. Refer to the previous exercise. The             8.91 Gambling and student-athletes. Gambling behaviors of
chance that each of the six intervals that you calculated includes    Division I intercollegiate male student-athletes were analyzed
the true proportion for that genre is approximately 95%. In other     in Exercise 8.22 (page 469). Similar data for women were given
words, the chance that you make an error and your interval misses     in Exercise 8.23. Compare the males and females with a signif-
the true value is approximately 5%.                                   icance test and give an estimate of the difference in proportions
(a) Explain why the chance that at least one of your intervals does   of student-athletes who participate in any gambling activity with
not contain the true value of the parameter is greater than 5%.       a 95% margin of error. We noted in Exercise 8.22 that we do not
(b) One way to deal with this problem is to adjust the confidence      have any information available to assess nonresponse. Consider
level for each interval so that the overall probability of at least   the possibility that the response rates differ by gender and by
one miss is 5%. One simple way to do this is to use a Bonferroni      whether or not the person participates in any gambling activity.
procedure. Here is the basic idea: You have an error budget of        Write a short summary of how these differences might affect
5% and you choose to spend it equally on six intervals. Each          inference on these issues.
interval has a budget of 0.05/6 = 0.0083. So each confidence
interval should have a 0.83% chance of missing the true value.        8.92 Effects of reducing air pollution. A study that evaluated
In other words, the confidence level for each interval should be       the effects of a reduction in exposure to traffic-related air pollu-
1 − 0.0083 = 0.9917. Use Table A to find the value of z for a          tants compared respiratory symptoms of 283 residents of an area
large-sample confidence interval for a single proportion corre-        with congested streets with 165 residents in a similar area where
sponding to 99.17% confidence.                                         the congestion was removed because a bypass was constructed.
(c) Calculate the six confidence intervals using the Bonferroni        The symptoms of the residents of both areas were evaluated
procedure.                                                            at baseline and again a year after the bypass was completed.30
                                                                      For the residents of the congested streets, 17 reported that their
8.88 Wireless only. Are customers giving up their landlines and       symptoms of wheezing improved between baseline and one year
relying on wireless for all their phone needs? Surveys have col-      later, while 35 of the residents of the bypass streets reported
lected data to answer this question.29 In December 2003, 4.2% of      improvement.
households were wireless only. Assume that this survey is based       (a) Find the two sample proportions.
on sampling 15,000 households.                                        (b) Report the difference in the proportions and the standard
(a) Convert the percent to a proportion. Then use the proportion      error of the difference.
and the sample size to find the count of households who were           (c) What are the appropriate null and alternative hypotheses
wireless only.                                                        for examining the question of interest? Be sure to explain your
(b) Find a 95% confidence interval for the proportion of house-        choice of the alternative hypothesis.
holds that were wireless only in December 2003.                       (d) Find the test statistic. Construct a sketch of the distribution of
8.89 Change in wireless only. Refer to the previous exercise.         the test statistic under the assumption that the null hypothesis is
The percent increased to 16.4% in December 2007. Assume the           true. Find the P-value and use your sketch to explain its meaning.
same sample size for this sample.                                     (e) Is no evidence of an effect the same as evidence that there is
(a) Find the proportion and the count for this sample.                no effect? Use a 95% confidence interval to answer this question.
(b) Compute the 95% confidence interval for the proportion.            Summarize your ideas in a way that could be understood by
(c) Convert the estimate and confidence interval in terms of pro-      someone who has very little experience with statistics.
portions to an estimate and confidence interval in terms of            (f) The study was done in the United Kingdom. To what ex-
percents.                                                             tent do you think that the results can be generalized to other
(d) Find the estimate of the difference in the proportions            circumstances?
8.93 Downloading music from the Internet. The following              The table gives the number of subjects in each group and the
quote is from a survey of Internet users.31 The sample size for      number reporting improvement. So, for example, the proportion
the survey was 1371. Since 18% of those surveyed said they           who reported improvement in the number of wheezing attacks
download music, the sample size for this subsample is 247.           was 21/163 in the congested group.
                                                                     (a) The reported sample sizes vary from symptom to symptom.
    Among current music downloaders, 38% say they
                                                                     Give possible reasons for this and discuss the possible impact on
    are downloading less because of the RIAA suits. . . . About a
                                                                     the results.
    third of current music downloaders say they use peer-to-peer
                                                                     (b) Calculate the difference in the proportions for each symp-
    networks. . . . 24% of them say they swap files using
                                                                     tom. Make a table of symptoms ordered from highest to lowest
    email and instant messaging; 20% download files from
                                                                     based on these differences. Include the estimates of the differ-
    music-related Web sites like those run by music magazines or
                                                                     ences and the 95% confidence intervals in the table. Summarize
    musician homepages. And while online music services like
                                                                     your conclusions.
    iTunes are far from trumping the popularity of file-sharing
                                                                     (c) Can you justify a one-sided alternative in this situation? Give
    networks, 17% of current music downloaders say they are
                                                                     reasons for your answer.
    using these paid services. Overall, 7% of Internet users say
                                                                     (d) Perform a significance test to compare the two groups for
    they have bought music at these new services at one time
                                                                     each of the symptoms. Summarize the results.
    or another, including 3% who currently use paid services.
                                                                     (e) Reanalyze the data using only the data from the bypass group.
(a) For each percent quoted, give the 95% margin of error. You       Give confidence intervals for the proportions that reported im-
should express these in percents, as given in the quote.             proved symptoms. Compare the conclusions that someone might
(b) Rewrite the paragraph in a shorter form but include the          make from these results with those you presented in part (b).
margins of error.                                                    Use your analyses of the data in this exercise to discuss the
(c) Pick either side (A) or side (B) and give arguments in favor     importance of a control group in studies such as this.
of the view that you select. (A) The margins of error should be
included because they are necessary for the reader to properly       8.95 The parrot effect: how to increase your tips. An experi-
interpret the results. (B) The margins of error interfere with the   ment examined the relationship between tips and server behavior
flow of the important ideas. It would be better to just report one    in a restaurant.32 In one condition, the server repeated the cus-
margin of error and say that all of the others are no greater than   tomer’s order word for word, while in the other condition, the
this number. If you choose view (B), be sure to give the value of    orders were not repeated. Tips were received in 47 of the 60 trials
the margin of error that you report.                                 under the repeat condition and in 31 of the 60 trials under the
                                                                     no-repeat condition.
8.94 Other effects of reducing air pollution. In Exercise 8.92       (a) Find the sample proportions and compute a 95% confidence
the effects of a reduction in air pollution on wheezing were         interval for the difference in population proportions.
examined by comparing the one-year change in symptoms in a           (b) Use a significance test to compare the two conditions.
group of residents who lived on congested streets with a group       Summarize the results.
who lived in an area that had been congested but from which          (c) The study was performed in a restaurant in the Nether-
the congestion was removed when a bypass was built. The effect       lands. Two waitresses performed the tasks. How do these facts
of the reduction in air pollution was assessed by comparing the      relate to the type of conclusions that can be drawn from this
proportions of residents in the two groups who reported that their   study? Do you think that the parrot effect would apply in other
wheezing symptoms improved. Here are some additional data            countries?
from the same study:                                                 (d) Design a study to test the parrot effect in a setting that
                                                                     is familiar to you. Be sure to include complete details about
                             Bypass              Congested           how the study will be conducted and how you will analyze the
Symptom                 n     Improved          n     Improved       results.
Number of              282       45            163        21
                                                                     8.96 Brand loyalty and the Chicago Cubs. According to lit-
 wheezing attacks                                                    erature on brand loyalty, consumers who are loyal to a brand
Wheezing disturbs      282       45            164        12         are likely to consistently select the same product. This type of
 sleep                                                               consistency may come from a positive childhood association.
Wheezing limits        282       12            164         4         To examine brand loyalty among fans of the Chicago Cubs, 371
  speech                                                             Cubs fans among patrons of a restaurant located in Wrigleyville
Wheezing affects       281       26            165        13         were surveyed before a game at Wrigley Field, the Cubs home
 activities                                                          field.33 The respondents were classified as “die-hard fans” or
Winter cough           261       15            156        14         “less loyal fans.” Of the 134 die-hard fans, 90.3% reported that
Winter phlegm          253       12            144        10         they had watched or listened to Cubs games when they were
                                                                     children. Among the 237 less loyal fans, 67.9% said that they
Consulted doctor       247       29            140        18
                                                                     had watched or listened as children.
(a) Find the numbers of die-hard Cubs fans who watched or              your conclusion with a clear statement of your assumptions and
listened to games when they were children. Do the same for the         the results of your statistical calculations.
less loyal fans.
                                                                       8.102 How much is the improvement? In the setting of the pre-
(b) Use a significance test to compare the die-hard fans with the
                                                                       vious exercise, give a 95% confidence interval for the proportion
less loyal fans with respect to their childhood experiences of the
                                                                       of nonconforming items for the modified process. Then, taking
team.
                                                                       p0 = 0.11 to be the old proportion and p the proportion for the
(c) Express the results with a 95% confidence interval for the
                                                                       modified process, give a 95% confidence interval for p − p0.
difference in proportions.
                                                                       8.103 Choosing sample sizes. For a single proportion the mar-
8.97 Brand loyalty in action. The study mentioned in the pre-
                                                                       gin of error of a confidence interval is largest for any given
vious exercise found that two-thirds of the die-hard fans attended
                                                                       sample size n and confidence level C when p̂ = 0.5. This led
Cubs games at least once a month, but only 20% of the less loyal
                                                                       us to use p* = 0.5 for planning purposes. A similar result is
fans attended this often. Analyze these data using a significance
                                                                       true for the two-sample problem. The margin of error of the
test and a confidence interval. Write a short summary of your
                                                                       confidence interval for the difference between two proportions
findings.
                                                                       is largest when p̂1 = p̂2 = 0.5. Use these conservative values
8.98 Frequent lottery players. A study of state lotteries              in the following calculations, and assume that the sample sizes
included a random digit dialing (RDD) survey conducted by the          n1 and n2 have the common value n. Calculate the margins of
National Opinion Research Center (NORC). The survey asked              error of the 95% confidence intervals for the difference in two
2406 adults about their lottery spending.34 A total of 248 indi-       proportions for the following choices of n: 10, 25, 50, 100, 150,
viduals were classified as “heavy” players. Of these, 152 were          200, 400, and 500. Present the results in a table and with a graph.
male. The study notes that 48.5% of U.S. adults are male. Use          Summarize your conclusions.
a significance test to compare the proportion of males among
                                                                       8.104 Choosing sample sizes, continued. As the previous
heavy lottery players with the proportion of males in the U.S.
                                                                       exercise noted, using the guessed value 0.5 for both p̂1 and p̂2 gives
adult population and write a short summary of your results. For
                                                                       a conservative margin of error in confidence intervals for the dif-
this analysis, assume that the 248 heavy lottery players are a
                                                                       ference between two population proportions. You are planning a
random sample of all heavy lottery players and that the margin
                                                                       survey and will calculate a 95% confidence interval for the differ-
of error for the 48.5% estimate of the percent of males in the
                                                                       ence in two proportions when the data are collected. You would
U.S. adult population is so small that it can be neglected.
                                                                       like the margin of error of the interval to be less than or equal to
8.99 Use a confidence interval. Use a confidence interval to             0.04. You will use the same sample size n for both populations.
give an alternative analysis for the previous exercise.                (a) How large a value of n is needed?
                                                                       (b) Give a general formula for n in terms of the desired margin
8.100 Time to repair golf clubs. The Ping Company makes                of error m and the critical value z*.
custom-built golf clubs and competes in the $4 billion golf equip-
ment industry. To improve its business processes, Ping decided to      8.105 Unequal sample sizes. You are planning a survey in
seek ISO 9001 certification.35 As part of this process, a study of      which a 95% confidence interval for the difference between two
the time it took to repair golf clubs that were sent to the company    proportions will present the results. You will use the conservative
by mail determined that 16% of orders were sent back to the
                                                                       guessed value 0.5 for p̂1 and p̂2 in your planning. You would
customers in 5 days or less. Ping examined the processing of           like the margin of error of the confidence interval to be less
repair orders and made changes. Following the changes, 90% of          than or equal to 0.15. It is very difficult to sample from the first
orders were completed within 5 days. Assume that each of the           population, so that it will be impossible for you to obtain more
estimated percents is based on a random sample of 200 orders.          than 25 observations from this population. Taking n1 = 25, can
(a) How many orders were completed in 5 days or less before            you find a value of n2 that will guarantee the desired margin of
the changes? Give a 95% confidence interval for the proportion          error? If so, report the value; if not, explain why not.
of orders completed in this time.
                                                                       8.106 Students change their majors. In a random sample of
(b) Do the same for orders after the changes.
                                                                       950 students from a large public university, it was found that 444
(c) Give a 95% confidence interval for the improvement. Ex-
                                                                       of the students changed majors during their college years.
press this both for a difference in proportions and for a difference
                                                                       (a) Give a 99% confidence interval for the proportion of students
in percents.
                                                                       at this university who change majors.
8.101 Does the new process give a better product? Eleven               (b) Express your results from (a) in terms of the percent of
percent of the products produced by an industrial process over         students who change majors.
the past several months fail to conform to the specifications. The      (c) University officials are more interested in the number of
company modifies the process in an attempt to reduce the rate           students who change majors than in the proportion. The univer-
of nonconformities. In a trial run, the modified process produces       sity has 30,000 undergraduate students. Convert your confidence
16 nonconforming items out of a total of 300 produced. Do these        interval in (a) to a confidence interval for the number of students
results demonstrate that the modification is effective? Support         who change majors during their college years.

8.107 Statistics and the law. Casteneda v. Partida is an important
court case in which statistical methods were used as part of a legal
argument. When reviewing this case, the Supreme Court used the phrase
“two or three standard deviations” as a criterion for statistical
significance. This Supreme Court review has served as the basis for
many subsequent applications of statistical methods in legal settings.
(The two or three standard deviations referred to by the Court are
values of the z statistic and correspond to P-values of approximately
0.05 and 0.0026.) In Casteneda the plaintiffs alleged that the method
for selecting juries in a county in Texas was biased against Mexican
Americans.36 For the period of time at issue, there were 181,535
persons eligible for jury duty, of whom 143,611 were Mexican
Americans. Of the 870 people selected for jury duty, 339 were Mexican
Americans.
(a) What proportion of eligible jurors were Mexican Americans? Let
this value be p0.
(b) Let p be the probability that a randomly selected juror is a
Mexican American. The null hypothesis to be tested is H0: p = p0.
Find the value of p̂ for this problem, compute the z statistic, and
find the P-value. What do you conclude? (A finding of statistical
significance in this circumstance does not constitute proof of
discrimination. It can be used, however, to establish a prima facie
case. The burden of proof then shifts to the defense.)
(c) We can reformulate this exercise as a two-sample problem. Here we
wish to compare the proportion of Mexican Americans among those
selected as jurors with the proportion of Mexican Americans among
those not selected as jurors. Let p1 be the probability that a
randomly selected juror is a Mexican American, and let p2 be the
probability that a randomly selected nonjuror is a Mexican American.
Find the z statistic and its P-value. How do your answers compare with
your results in (b)?



CHAPTER 8 Case Study Exercises
CASE STUDY EXERCISE 1: Gender bias in textbooks.
Exercise 8.77 (page 484) reports a study of gender bias in 10 syntax
textbooks. Here are the counts of “girl,” “woman,” “boy,” and “man”
for all the texts. The data in Exercise 8.77 are for text number 6.

                              Text Number
            1     2     3     4     5     6     7     8     9    10
Girl        2     5    25    11     2    48    38     5    48    13
Woman       3     2    31    65     1    12     2    13    24     5
Boy         7    18    14    19    12    52    70     6   128    32
Man        27    45    51   138    31    80     2    27    48    95

     Analyze the data and write a report summarizing your
conclusions. The researchers who conducted the study note that the
authors of texts 8, 9, and 10 are women, while the other seven were
written by men. Do you see any pattern that suggests that the gender
of the author is associated with the results?

CASE STUDY EXERCISE 2: Sample size, P-value, and the margin of error.
In this Case Study we examine the effects of the sample size on the
significance test and the confidence interval for comparing two
proportions. For each calculation, suppose that p̂1 = 0.75 and
p̂2 = 0.5, and take n to be the common value of n1 and n2. Use the z
statistic to test H0: p1 = p2 versus the alternative Ha: p1 ≠ p2.
Compute the statistic and the associated P-value for the following
values of n: 12, 20, 40, 80, 100, 200, and 500. Summarize the results
in a table and make a plot. Explain what you observe about the effect
of the sample size on statistical significance when the sample
proportions p̂1 and p̂2 are unchanged.
     Now we will do similar calculations for the confidence interval.
Here, we suppose that p̂1 = 0.75 and p̂2 = 0.5. Compute the margin of
error for the 95% confidence interval for the difference in the two
proportions for n = 12, 20, 40, 80, 100, 200, and 500. Summarize and
explain your results.
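
One possible way to set up the repeated calculations in Case Study
Exercise 2 is sketched below in Python. It assumes the large-sample z
procedures of this chapter: a pooled standard error for the test and
an unpooled standard error for the confidence interval. The function
names are illustrative and not part of the text, and the plot and the
interpretation are left to the reader.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()          # standard Normal distribution
P1_HAT, P2_HAT = 0.75, 0.5         # sample proportions given in the exercise
SAMPLE_SIZES = [12, 20, 40, 80, 100, 200, 500]


def pooled_z_test(p1_hat, p2_hat, n):
    """z statistic and two-sided P-value for H0: p1 = p2, common sample size n."""
    # With equal sample sizes the pooled proportion is the simple average.
    pooled = (p1_hat + p2_hat) / 2
    se_pooled = sqrt(pooled * (1 - pooled) * (2 / n))
    z = (p1_hat - p2_hat) / se_pooled
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_value


def margin_of_error(p1_hat, p2_hat, n, level=0.95):
    """Margin of error of the large-sample interval for p1 - p2."""
    z_star = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    se = sqrt(p1_hat * (1 - p1_hat) / n + p2_hat * (1 - p2_hat) / n)
    return z_star * se


def summary_rows():
    """One (n, z, P-value, margin of error) row per sample size."""
    rows = []
    for n in SAMPLE_SIZES:
        z, p_value = pooled_z_test(P1_HAT, P2_HAT, n)
        rows.append((n, z, p_value, margin_of_error(P1_HAT, P2_HAT, n)))
    return rows


if __name__ == "__main__":
    print(f"{'n':>5} {'z':>8} {'P-value':>10} {'margin':>8}")
    for n, z, p_value, m in summary_rows():
        print(f"{n:>5} {z:>8.3f} {p_value:>10.4f} {m:>8.4f}")
```

Keeping the sample proportions fixed and varying only n makes the
role of the sample size in both the standard errors easy to see.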




CHAPTER 8 Appendix

Using Minitab and Excel for Inference for Proportions

Confidence Interval for a Single Proportion

Minitab:

         Stat ➤ Basic Statistics ➤ 1 Proportion

If the data are in the worksheet, then choose the Samples in columns
option and click the column containing the data into the box below.
With this option, you can construct confidence intervals for more than
one data set simultaneously. With respect to the nature of the data,
data entries can be any two distinct values (numeric or text) where
one value represents “success” and the other represents “failure.”
Alternatively, if you know the number of successes, you can choose the
Summarized data option and then input the number of successes in the
Number of events box and input the sample size in the Number of trials
box. Now it is important that you click the Options button and select
the Use test and interval based on normal distribution option. You can
also input your desired confidence level in the Confidence level box
if other than 95%. Click OK to close the pop-up box and then click OK
to find the sample proportion and confidence interval reported in the
Session window.

Excel:
Confidence intervals for the proportion are not available in standard
Excel but they are available in the WHFStat Add-In for Excel.

Test for a Single Proportion

Minitab:

         Stat ➤ Basic Statistics ➤ 1 Proportion

This is the same routine described in this Appendix for obtaining the
confidence interval for the proportion. If you wish to conduct a
hypothesis test, select the Perform hypothesis test option and input
the null hypothesis value (p0) in the box below. Now click the Options
button and select the Use test and interval based on normal
distribution option. With this pop-up box, you can also select your
alternative hypothesis from the Alternative menu box. Click OK to
close the pop-up box and then click OK to find the test statistic and
corresponding P-value reported in the Session window.

Excel:
Testing for the proportion is not available in standard Excel but it
is available in the WHFStat Add-In for Excel.

Confidence Interval for Comparing Two Proportions

Minitab:

         Stat ➤ Basic Statistics ➤ 2 Proportions

If the data for the two samples are in two separate columns, choose
the Samples in different columns option and then click the columns
containing the data into the two boxes below. As with the single
proportion, data entries can be any two distinct values (numeric or
text) where one value represents “success” and the other represents
“failure.” Minitab will allow you to store the data for the two
samples all in one column with no requirement in terms of placement
order of the data. When this is done, a second column in the worksheet
is required in which there are two distinct labels (numerical or text)
indicating for a given row whether the corresponding data observation
comes from the first or second sample. If the data are stored in this
manner, then choose the Samples in one column option and click the
data column into the Samples box and click the column of labels into
the Subscripts box. As a third option, if you know the number of
successes in each of the samples, you can choose the Summarized data
option and input the number of successes in the Number of events box
and input the sample size in the Number of trials box for each of the
samples. If you wish to change the level of confidence from the
default value of 95%, click the Options button and input your desired
confidence level in the Confidence level box. Click OK to close the
pop-up box and then click OK to find the sample proportions and
confidence interval for the difference between population proportions
reported in the Session window.

Excel:
Confidence intervals for comparing two proportions are not available
in standard Excel but they are available in the WHFStat Add-In for
Excel.

Test for Comparing Two Proportions

Minitab:

         Stat ➤ Basic Statistics ➤ 2 Proportions

This is the same routine described in this Appendix for obtaining the
confidence interval for the difference between two proportions. To
have the test based on a pooled standard error as described in the
chapter, click the Options button and select the Use pooled estimate
of p for test option. With this pop-up box, you can also select your
alternative hypothesis from the Alternative menu box. Click OK to
close the pop-up box and then click OK to find the test statistic and
corresponding P-value reported in the Session window.

Excel:
Testing for two proportions is not available in standard Excel but it
is available in the WHFStat Add-In for Excel.
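
For readers working outside Minitab and Excel, the summarized-data
calculations described in this Appendix can also be sketched in a few
lines of Python. This is a minimal sketch rather than a reproduction
of either package. It assumes the Normal-based (large-sample z)
procedures, with a pooled standard error for the two-sample test. The
function names are hypothetical, and only the Python standard library
is used.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal distribution


def z_critical(level):
    """Critical value z* for a two-sided interval at the given confidence level."""
    return STD_NORMAL.inv_cdf(1 - (1 - level) / 2)


def two_sided_p_value(z):
    """Two-sided P-value for a z statistic."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def one_proportion_ci(events, trials, level=0.95):
    """Large-sample confidence interval for a single proportion."""
    p_hat = events / trials
    margin = z_critical(level) * sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - margin, p_hat + margin


def one_proportion_test(events, trials, p0):
    """z statistic and two-sided P-value for H0: p = p0."""
    p_hat = events / trials
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / trials)
    return z, two_sided_p_value(z)


def two_proportion_ci(events1, trials1, events2, trials2, level=0.95):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    p1_hat, p2_hat = events1 / trials1, events2 / trials2
    se = sqrt(p1_hat * (1 - p1_hat) / trials1 + p2_hat * (1 - p2_hat) / trials2)
    diff = p1_hat - p2_hat
    margin = z_critical(level) * se
    return diff - margin, diff + margin


def two_proportion_test(events1, trials1, events2, trials2):
    """z statistic and two-sided P-value for H0: p1 = p2 (pooled standard error)."""
    p1_hat, p2_hat = events1 / trials1, events2 / trials2
    pooled = (events1 + events2) / (trials1 + trials2)
    se = sqrt(pooled * (1 - pooled) * (1 / trials1 + 1 / trials2))
    z = (p1_hat - p2_hat) / se
    return z, two_sided_p_value(z)


if __name__ == "__main__":
    # Illustrative counts only; they are not taken from the text.
    print(one_proportion_ci(50, 100))
    print(one_proportion_test(50, 100, p0=0.4))
    print(two_proportion_ci(60, 100, 45, 100))
    print(two_proportion_test(60, 100, 45, 100))
```

The arguments mirror the Number of events and Number of trials boxes
of the Summarized data option. Only the two-sided alternative is
computed here.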

								