# CHAPTER 8 Inference for Proportions by liwenting

[Chapter-opening photo (credit: RICHARD KEPPEL-SMITH/GETTY). Caption: The U.S. video game market is approximately \$8.2 billion and growing. Case 8.1 discusses a PEW survey that has been used to collect data on gamers.]

CHAPTER OUTLINE
8.1 Inference for a Single Proportion
8.2 Comparing Two Proportions

Introduction
We frequently collect data on categorical variables, such as whether a person is a full-time or a part-time college student, the brand name of a cell phone, or the country where a college student studies abroad. When we record categorical variables, our data consist of counts or of percents obtained from counts.
The parameters we want to do inference about in these settings are population proportions. Just as in the case of inference about population means, we may be concerned with a single population or with comparing two populations. Inference about one or two proportions is very similar to inference about means, which we discussed in Chapter 7. In particular, inference for both means and proportions is based on sampling distributions that are approximately Normal.
We begin in Section 8.1 with inference about a single population proportion. Section 8.2 concerns methods for comparing two proportions.

8.1 Inference for a Single Proportion

CASE 8.1 Adults and Video Games
A PricewaterhouseCoopers report estimates that the U.S. video game market was approximately \$8.6 billion in 2007 and is expected to increase at an annual rate of 6.3% through 2012.[1] Who plays video games? A PEW survey, conducted by Princeton Survey Research International, reports that over half of American adults aged 18 and over play video games.[2] The PEW survey used a nationally representative sample of 2054 adults. Of the total, 1063 adults said that they played video games.

For problems involving a single proportion, we will use n for the sample size
and X for the count of the outcome of interest. Often we will use the terms
“success” and “failure” for the two possible outcomes. When we do this, X is
the number of successes.

EXAMPLE 8.1        Data for the Video Game Case

CASE 8.1      The count of people who responded “Yes” to the question about whether or not they played video
games in the sample of Case 8.1 is X = 1063. The sample size is n = 2054.

We would like to know the proportion of video game players in the adult U.S. population. This population proportion is the parameter of interest. The statistic used to estimate this unknown parameter is the sample proportion. The sample proportion is $\hat{p} = X/n$.

EXAMPLE 8.2        Estimating the Proportion of Adults Who Play Video Games

CASE 8.1 The sample proportion $\hat{p}$ in Case 8.1 is a discrete random variable that can take the values 0, 1/2054, 2/2054, . . . , 2053/2054, or 1. For our particular sample, we have

$$\hat{p} = \frac{1063}{2054} = 0.52$$

In many cases, a probability model for $\hat{p}$ can be based on the binomial distributions for counts, discussed in Chapter 5. If the sample size n is very small, we can base tests and confidence intervals for p on the discrete distribution of $\hat{p}$. We will focus on situations where the sample size is sufficiently large that we can approximate the distribution of $\hat{p}$ by a Normal distribution.

Sampling Distribution of a Sample Proportion
Choose an SRS of size n from a large population that contains population proportion p of “successes.” Let X be the count of successes in the sample and let $\hat{p}$ be the sample proportion of successes,

$$\hat{p} = \frac{X}{n}$$

Then:
• For large sample sizes, the distribution of $\hat{p}$ is approximately Normal.
• The mean of the distribution of $\hat{p}$ is p.
• The standard deviation of $\hat{p}$ is

$$\sqrt{\frac{p(1-p)}{n}}$$

Figure 8.1 summarizes these facts in a form that recalls the idea of sampling distributions. Our inference procedures are based on this Normal approximation. These procedures are similar to those for inference about the mean of a Normal distribution. We will see, however, that there are a few extra details involved, caused by the added difficulty in approximating the discrete distribution of $\hat{p}$ by a continuous Normal distribution.
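The three facts in the box can be checked with a minimal Python simulation sketch. It assumes a hypothetical population with p = 0.4 and samples of size n = 500. These values are chosen only for illustration. Each simulated sample is drawn as n independent success/failure trials, which is how an SRS behaves when the population is large. The mean and standard deviation of the simulated $\hat{p}$ values should land close to p and $\sqrt{p(1-p)/n}$.

```python
import math
import random
import statistics


def simulate_p_hats(p: float, n: int, reps: int, seed: int = 1) -> list[float]:
    """Draw `reps` samples of size n and return the sample proportion of each.

    Each observation is an independent success with probability p, which
    approximates an SRS from a large population (hypothetical setup).
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    p_hats = []
    for _ in range(reps):
        x = sum(1 for _ in range(n) if rng.random() < p)  # count of successes X
        p_hats.append(x / n)  # sample proportion X/n
    return p_hats


p, n = 0.4, 500  # hypothetical population proportion and sample size
p_hats = simulate_p_hats(p, n, reps=4000)

print("mean of p-hat:", round(statistics.fmean(p_hats), 4))
print("sd of p-hat:  ", round(statistics.stdev(p_hats), 4))
print("theory sd:    ", round(math.sqrt(p * (1 - p) / n), 4))
```

The agreement tightens as `reps` grows. A histogram of `p_hats` would show the approximately Normal shape; plotting is left out to keep the sketch dependency-free.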

[Figure 8.1 diagram: repeated SRSs of size n drawn from a population with proportion p, each giving a value of $\hat{p}$; axis “Values of $\hat{p}$”; Normal curve with mean p and standard deviation $\sqrt{p(1-p)/n}$.]

FIGURE 8.1 Draw a large SRS from a population in which the proportion p are successes. The sampling distribution of the sample proportion $\hat{p}$ of successes has approximately a Normal distribution.

8.1 Bank acquisitions. The American Bankers Association Community Bank Competitiveness Survey for 2008 had responses from 760 community banks. Of these, 283 reported that they expected to acquire another bank within five years.[3]
(a) What is the sample size n for this survey?
(b) What is the count X? Describe the count in a short sentence.
(c) Find the sample proportion $\hat{p}$.
CASE 8.1 8.2 How often do they play? In the PEW survey described in Case 8.1, those who played video games were asked how often they played. In this subpopulation, 223 adults said that they played every day or almost every day.
(a) What is the sample size n for the subpopulation of U.S. adults who play video games? (Hint: Look at Case 8.1.)
(b) What is the count X of those who said that they played every day or almost every day?
(c) Find the sample proportion $\hat{p}$.

Large-sample confidence interval for a single proportion
The sample proportion $\hat{p} = X/n$ is the natural estimator of the population proportion p. Notice that $\sqrt{p(1-p)/n}$, the standard deviation of $\hat{p}$, depends upon the unknown parameter p. In our calculations, we estimate it by replacing the population parameter p with the sample estimate $\hat{p}$. Therefore, our estimated standard error is $SE_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})/n}$.
If the sample size is large, the distribution of $\hat{p}$ will be approximately Normal with mean p and standard deviation approximately equal to $SE_{\hat{p}}$. It follows that $\hat{p}$ will be within two standard deviations ($2SE_{\hat{p}}$) of the unknown parameter p about 95% of the time. This is how we use the Normal approximation to construct the large-sample confidence interval for p. Here are the details.

z Confidence Interval for a Population Proportion
Choose an SRS of size n from a large population with unknown proportion p of successes. The sample proportion is

$$\hat{p} = \frac{X}{n}$$

The standard error of $\hat{p}$ is

$$SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

and the margin of error for confidence level C is

$$m = z^* SE_{\hat{p}}$$

where $z^*$ is the value for the standard Normal density curve with area C between $-z^*$ and $z^*$. The large-sample level C confidence interval for p is

$$\hat{p} \pm m$$

You can use this interval for 90% ($z^* = 1.645$), 95% ($z^* = 1.960$), or 99% ($z^* = 2.576$) confidence when the number of successes and the number of failures are both at least 15.

EXAMPLE 8.3           Confidence Interval for the Proportion of Adults Who Play Video Games

CASE 8.1 The sample survey in Case 8.1 found that 1063 of a sample of 2054 adults reported that they played video games. So, the sample size is n = 2054 and the count is X = 1063. The sample proportion of adults who play video games is

$$\hat{p} = \frac{X}{n} = \frac{1063}{2054} = 0.51753$$

The standard error is

$$SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.5175(1-0.5175)}{2054}} = 0.011026$$

The z critical value for 95% confidence is $z^* = 1.96$, so the margin of error is

$$m = 1.96\,SE_{\hat{p}} = (1.96)(0.011026) = 0.021610$$

The confidence interval is

$$\hat{p} \pm m = 0.52 \pm 0.02$$

We are 95% confident that between 50% and 54% of adults play video games.
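The arithmetic of Example 8.3 can be reproduced with a short Python sketch of the large-sample z interval from the box above. The function name `z_interval` is our own. The critical value comes from `statistics.NormalDist().inv_cdf` in the standard library rather than from a table. For 95% confidence it therefore uses 1.95996... instead of the rounded 1.96, which changes nothing at the reported precision.

```python
import math
from statistics import NormalDist


def z_interval(x: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Large-sample confidence interval p-hat +/- z* SE for a proportion.

    Refuses to run unless both successes and failures are at least 15,
    the rule of thumb stated for this interval.
    """
    if min(x, n - x) < 15:
        raise ValueError("need at least 15 successes and 15 failures")
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)        # estimated standard error
    z_star = NormalDist().inv_cdf((1 + conf) / 2)  # area conf between -z* and z*
    m = z_star * se                                # margin of error
    return p_hat - m, p_hat + m


low, high = z_interval(1063, 2054)  # Case 8.1 data
print(f"95% CI: ({low:.4f}, {high:.4f})")  # about (0.4959, 0.5391)
```

Raising an error below 15 successes or failures is a design choice that enforces the stated guideline. A calculator or spreadsheet would simply return the interval.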

In performing these calculations we have kept a large number of digits for our intermediate calculations. However, when reporting the results we prefer to use rounded values. For example, “52% with a margin of error of 2%.” In this way we focus attention on what is important. There is no additional information to be gained by reporting 0.51753 with a margin of error of 0.021610.
Remember that the margin of error in any confidence interval includes only random sampling error. If people do not respond honestly to the questions asked, for example, your estimate is likely to miss by more than the margin of error.
Because the calculations for statistical inference for a single proportion are relatively straightforward, we often do them with a calculator or in a spreadsheet. Figure 8.2 gives output from Minitab and SAS for the data in Case 8.1. As usual, the output reports more digits than are useful. When you use software, be sure to think about how many digits are meaningful for your purposes. Do not clutter your report with information that is not meaningful. SAS gives the standard error next to the label ASE, which stands for asymptotic standard error. The SAS output also includes an alternative interval based on an “exact” method.

FIGURE 8.2 Minitab and SAS outputs for the confidence interval in Example 8.3.

Minitab

Test and CI for One Proportion
Sample         X        N     Sample p            95% CI
1           1063     2054     0.517527     (0.495917, 0.539137)
Using the normal approximation.

SAS

Binomial Proportion for y = 0

Proportion                      0.5175
ASE                             0.0110
95% Lower Conf Limit            0.4959
95% Upper Conf Limit            0.5391

Exact Conf Limits
95% Lower Conf Limit            0.4957
95% Upper Conf Limit            0.5393

8.3 Bank acquisitions. Refer to Exercise 8.1 (page 459).
(a) Find $SE_{\hat{p}}$, the standard error of $\hat{p}$.
(b) Give the 95% confidence interval for p in the form of estimate plus or minus the margin of error.
(c) Give the confidence interval as an interval of percents.
CASE 8.1 8.4 How often do they play? Refer to Exercise 8.2 (page 459).
(a) Find $SE_{\hat{p}}$, the standard error of $\hat{p}$.
(b) Give the 95% confidence interval for p in the form of estimate plus or minus the margin of error.
(c) Give the confidence interval as an interval of percents.

Plus four confidence interval for a single proportion*
*The material on the plus four confidence interval is optional and can be omitted without loss of continuity.

Suppose we have a sample where the count is X = 0. Then, because $\hat{p} = 0$, the standard error and the margin of error based on this estimate will both be 0. The confidence interval for any confidence level would be the single point 0. Confidence intervals based on the large-sample Normal approximation do not make sense in this situation.
Both computer studies and careful mathematics show that we can do better by moving the sample proportion $\hat{p}$ slightly away from 0 and 1.[4] There are several ways to do this. Here is a simple adjustment that works very well in practice.
The adjustment is based on the following idea: act as if we have 4 additional observations, 2 of which are successes and 2 of which are failures. The new sample size is n + 4 and the count of successes is X + 2. Because this estimate was first suggested by Edwin Bidwell Wilson in 1927 (though rarely used in practice until recently), we call it the Wilson estimate.
To compute a confidence interval based on the Wilson estimate, first replace the value of X by X + 2 and the value of n by n + 4. Then use these values in the formulas for the z confidence interval.
In Example 8.1, we had X = 1063 and n = 2054. To apply the plus four approach we use the z procedure with X = 1065 and n = 2058. You can use this interval when the sample size is at least n = 10 and the confidence level is 90%, 95%, or 99%.
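The Python sketch below applies the plus four adjustment. In the spirit of the computer studies mentioned above, it also computes the exact coverage probability of both intervals by summing binomial probabilities over every possible count X. The setting n = 20, p = 0.1 is hypothetical. It was chosen because a small sample with p near 0 is where the large-sample interval struggles. The function names are our own. Clipping endpoints to [0, 1] mirrors the advice in Exercise 8.6(b) to replace an upper endpoint above 1 by 1; the same is done for a lower endpoint below 0.

```python
import math


def wald_interval(x: int, n: int, z_star: float = 1.96) -> tuple[float, float]:
    """Large-sample interval p-hat +/- z* sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = x / n
    m = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - m, p_hat + m


def plus_four_interval(x: int, n: int, z_star: float = 1.96) -> tuple[float, float]:
    """Plus four interval: add 2 successes and 2 failures, then use the z procedure."""
    low, high = wald_interval(x + 2, n + 4, z_star)
    return max(low, 0.0), min(high, 1.0)  # keep endpoints inside [0, 1]


def coverage(interval, n: int, p: float) -> float:
    """Exact probability that the interval built from X ~ Binomial(n, p) contains p."""
    total = 0.0
    for x in range(n + 1):
        low, high = interval(x, n)
        if low <= p <= high:
            total += math.comb(n, x) * p**x * (1 - p) ** (n - x)
    return total


# X = 0 makes the large-sample interval collapse to a single point.
print(wald_interval(0, 20))       # (0.0, 0.0)
print(plus_four_interval(0, 20))  # roughly (0.0, 0.194)

n, p = 20, 0.1  # hypothetical small-sample setting
print("large-sample coverage:", round(coverage(wald_interval, n, p), 3))
print("plus four coverage:   ", round(coverage(plus_four_interval, n, p), 3))
```

In this one setting, the plus four coverage (about 0.96) is much closer to the nominal 95% than the large-sample coverage (about 0.88). Much of the shortfall comes from the count X = 0, which has probability $0.9^{20} \approx 0.12$ and gives a degenerate large-sample interval. Other choices of n and p give other numbers, so this is an illustration rather than a general proof.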

8.5 Use plus four for adults who play video games. Refer to Example 8.3 (page 460). Compute the plus four 95% confidence interval and compare this interval with the one given in that example.
8.6 New-product sales. Yesterday, your top salesperson called on 6 customers and obtained orders for your new product from all 6. Suppose that it is reasonable to view these 6 customers as a random sample of all of her customers.
(a) Give the plus four estimate of the proportion of her customers who would buy the new product. Notice that we don’t estimate that all customers will buy, even though all 6 in the sample did.
(b) Give the margin of error for 95% confidence. (You may see that the upper endpoint of the confidence interval is greater than 1. In that case, take the upper endpoint to be 1.)
(c) Do the results apply to all of your sales force? Explain why or why not.
8.7 Construct an example. Make up an example where the large-sample method and the plus four method give very different intervals. Do not use a case where either $\hat{p} = 0$ or $\hat{p} = 1$.

Significance test for a single proportion
We know that the sample proportion $\hat{p} = X/n$ is approximately Normal, with mean $\mu_{\hat{p}} = p$ and standard deviation $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$. To construct confidence intervals, we need to use an estimate of the standard deviation based on the data because the standard deviation depends upon the unknown parameter p. When performing a significance test, however, the null hypothesis specifies a value for p, which we will call $p_0$. When we calculate P-values, we act as if the hypothesized p were actually true. When we test $H_0: p = p_0$, we substitute $p_0$ for p in the expression for $\sigma_{\hat{p}}$ and then standardize $\hat{p}$. Here are the details.

z Significance Test for a Population Proportion
Choose an SRS of size n from a large population with unknown proportion p of successes. To test the hypothesis $H_0: p = p_0$, compute the z statistic

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$$

In terms of a standard Normal random variable Z, the approximate P-value for a test of $H_0$ against

• $H_a: p > p_0$ is $P(Z \ge z)$
• $H_a: p < p_0$ is $P(Z \le z)$
• $H_a: p \ne p_0$ is $2P(Z \ge |z|)$

Use this test when the expected number of successes $np_0$ and the expected number of failures $n(1-p_0)$ are both at least 10.

We call this z test a “large-sample test” because it is based on a Normal approximation to the sampling distribution of $\hat{p}$ that becomes more accurate as the sample size increases. For small samples, or if the population is less than 10 times as large as the sample, consult an expert for other procedures.

EXAMPLE 8.4           Comparing Two Sun Block Lotions
Your company produces a sun block lotion designed to protect the skin from both UVA and UVB exposure to the sun. You hire a company to compare your product with the product sold by your major competitor. The testing company exposes skin on the backs of a sample of 20 people to UVA and UVB rays and measures the protection provided by each product. For 13 of the subjects, your product provided better protection, while for the other 7 subjects, your competitor’s product provided better protection. Do you have evidence to support a commercial claiming that your product provides superior UVA and UVB protection? For the data we have n = 20 subjects and X = 13 successes. To answer this question, we test

$$H_0: p = 0.5$$
$$H_a: p \ne 0.5$$

The expected numbers of successes (your product provides better protection) and failures (your competitor’s product provides better protection) are 20 × 0.5 = 10 and 20 × 0.5 = 10. Both are at least 10, so we can use the z test. The sample proportion is

$$\hat{p} = \frac{X}{n} = \frac{13}{20} = 0.65$$

The test statistic is

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}} = \frac{0.65 - 0.5}{\sqrt{\dfrac{(0.5)(0.5)}{20}}} = 1.34$$

From Table A we find $P(Z \le 1.34) = 0.9099$, so the probability in the upper tail is $P(Z \ge 1.34) = 1 - 0.9099 = 0.0901$. The P-value is the area in both tails, P = 2 × 0.0901 = 0.1802. Minitab and SAS outputs for the analysis appear in Figure 8.3. We conclude that the sun block testing data are compatible with the hypothesis of no difference between your product and your competitor’s ($\hat{p} = 0.65$, z = 1.34, P = 0.18). The data do not provide you with a basis to support your claim.

FIGURE 8.3 Minitab and SAS outputs for the significance test in Example 8.4.

Minitab

Test and CI for One Proportion
Test of p=0.5 vs p not=0.5
Sample         X      N   Sample p          95% CI          Z-Value   P-Value
1             13     20   0.650000   (0.440963, 0.859037)      1.34     0.180
Using the normal approximation.

SAS

Binomial Proportion for x = 1

Proportion                 0.6500
ASE                        0.1067
95% Lower Conf Limit       0.4410
95% Upper Conf Limit       0.8590

Exact Conf Limits
95% Lower Conf Limit       0.4078
95% Upper Conf Limit       0.8461

Test of H0:    Proportion = 0.5

ASE under H0               0.1118
Z                          1.3416
One-sided Pr > Z           0.0899
Two-sided Pr > |Z|         0.1797

Sample Size = 20

Note that we used a two-sided hypothesis test when we compared the two sun block
lotions in Example 8.4. In settings like this, we must start with the view that either
product could be better if we want to prove a claim of superiority. Thinking or hoping
that your product is superior cannot be used to justify a one-sided test.
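Here is a minimal Python sketch of the z significance test from the box above, applied to the sun block data of Example 8.4. The function name and its `alternative` argument are our own conventions. P-values come from the exact standard Normal CDF in Python's `statistics.NormalDist` rather than from Table A. That is why the two-sided value matches the software outputs (0.1797) rather than the table-based 0.1802.

```python
import math
from statistics import NormalDist


def one_prop_z_test(x: int, n: int, p0: float, alternative: str = "two-sided") -> tuple[float, float]:
    """Large-sample z test of H0: p = p0; returns (z, P-value).

    alternative is "greater" (Ha: p > p0), "less" (Ha: p < p0) or "two-sided".
    """
    # Rule of thumb from the box: expected successes and failures both >= 10.
    if min(n * p0, n * (1 - p0)) < 10:
        raise ValueError("expected successes and failures should both be at least 10")
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # standard deviation computed under H0
    Z = NormalDist()  # standard Normal distribution
    if alternative == "greater":
        p_value = 1 - Z.cdf(z)              # P(Z >= z)
    elif alternative == "less":
        p_value = Z.cdf(z)                  # P(Z <= z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - Z.cdf(abs(z)))   # 2 P(Z >= |z|)
    else:
        raise ValueError(f"unknown alternative: {alternative}")
    return z, p_value


z, p_value = one_prop_z_test(13, 20, 0.5)  # Example 8.4
print(f"z = {z:.4f}, P = {p_value:.4f}")   # z = 1.3416, P = 0.1797
```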

8.8 Draw a picture. Draw a picture of a standard Normal curve and shade the tail areas to illustrate the calculation of the P-value for Example 8.4.
8.9 What does the confidence interval tell us? Inspect the outputs in Figure 8.3 and report the confidence interval for the percent of people who would get better sun protection from your product than from your competitor’s. Be sure to convert from proportions to percents and to round appropriately. Interpret the confidence interval and compare this way of analyzing data with the significance test.
8.10 The effect of X. In Example 8.4, suppose that your product provided better UVA and UVB protection for 15 of the 20 subjects. Perform the significance test and summarize the results.
8.11 The effect of n. In Example 8.4, consider what would have happened if you had paid for twice as many subjects to be tested. Assume that the results would be the same as what you obtained for 20 subjects; that is, 65% had better UVA and UVB protection with your product. Perform the significance test and summarize the results.

In Example 8.4, we treated an outcome as a success whenever your product provided better sun protection. Would we get the same results if we defined success as an outcome where your competitor’s product was superior? In this setting the null hypothesis is still $H_0: p = 0.5$. You will find that the z test statistic is unchanged except for its sign and that the P-value remains the same.

8.12 Yes or no? In Example 8.4 we performed a significance test to compare your product with your competitor’s. Success was defined as the outcome where your product provided better protection. Now, take the viewpoint of your competitor and define success as the outcome where your competitor’s product provides better protection. In other words, n remains the same (20) but X is now 7.
(a) Perform the two-sided significance test and report the results. How do these compare with what we found in Example 8.4?
(b) Find the 95% confidence interval for this setting and compare it with the interval calculated where success is defined as the outcome when your product provides better protection.

Choosing a sample size
In Chapter 6, we showed how to choose the sample size n to obtain a confidence interval with specified margin of error m for a Normal mean. Because we are using a Normal approximation for inference about a population proportion, sample size selection proceeds in much the same way.
Recall that the margin of error for the large-sample confidence interval for a population proportion is

$$m = z^* SE_{\hat{p}} = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Choosing a confidence level C fixes the critical value $z^*$. The margin of error also depends on the value of $\hat{p}$ and the sample size n. Because we don’t know the value of $\hat{p}$ until we gather the data, we must guess a value to use in the calculations. We will call the guessed value $p^*$. Here are two ways to get $p^*$:

• Use the sample estimate from a pilot study or from similar studies done earlier.
• Use $p^* = 0.5$. Because the margin of error is largest when $\hat{p} = 0.5$, this choice gives a sample size that is somewhat larger than we really need for the confidence level we choose. It is a safe choice no matter what the data later show.

Once we have chosen $p^*$ and the margin of error m that we want, we can find the n we need to achieve this margin of error. Here is the result.

Sample Size for Desired Margin of Error
The level C confidence interval for a proportion p will have a margin of error approximately equal to a specified value m when the sample size is

$$n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*)$$

Here $z^*$ is the critical value for confidence C, and $p^*$ is a guessed value for the proportion of successes in the future sample.
The margin of error will be less than or equal to m if $p^*$ is chosen to be 0.5. The sample size required is then given by

$$n = \left(\frac{z^*}{2m}\right)^2$$

The value of n obtained by this method is not particularly sensitive to the choice of $p^*$ as long as $p^*$ is not too far from 0.5. However, if your actual sample turns out to have $\hat{p}$ smaller than about 0.3 or larger than about 0.7, the sample size based on $p^* = 0.5$ may be much larger than needed.

EXAMPLE 8.5              Planning a Sample of Customers
… consulting company to carry out a sample survey of customers. Before contacting the consultant, you want some idea of the sample size you will have to pay for. One critical question is the degree of satisfaction with your customer service, measured on a five-point scale. You want to estimate the proportion p of your customers who are satisfied (that is, who choose either “satisfied” or “very satisfied,” the two highest levels on the five-point scale).
You want to estimate p with 95% confidence and a margin of error less than or equal to 3%, or 0.03. For planning purposes, you are willing to use $p^* = 0.5$. To find the sample size required,

$$n = \left(\frac{z^*}{2m}\right)^2 = \left(\frac{1.96}{(2)(0.03)}\right)^2 = 1067.1$$

Round up to get n = 1068. (Always round up. Rounding down would give a margin of error slightly greater than 0.03.)
Similarly, for a 2.5% margin of error we have (after rounding up)

$$n = \left(\frac{1.96}{(2)(0.025)}\right)^2 = 1537$$

and for a 2% margin of error,

$$n = \left(\frac{1.96}{(2)(0.02)}\right)^2 = 2401$$
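The planning formula can be wrapped in a small Python function. The sketch below assumes $z^* = 1.96$ (95% confidence), as in Example 8.5. The function name and the rounding tolerance are our own choices. It reproduces the three sample sizes just computed and also lets you try a pilot-study guess for $p^*$ other than 0.5.

```python
import math


def sample_size(m: float, z_star: float = 1.96, p_star: float = 0.5) -> int:
    """Sample size n = (z*/m)^2 p*(1 - p*), always rounded up to a whole number."""
    n_exact = (z_star / m) ** 2 * p_star * (1 - p_star)
    # Always round up. The tiny tolerance keeps floating-point noise in an
    # exactly integer result (for example 2401.0000000001) from adding 1.
    return math.ceil(n_exact - 1e-9)


for m in (0.03, 0.025, 0.02):  # margins of error from Example 8.5
    print(m, sample_size(m))   # 1068, 1537, 2401

# A hypothetical pilot-study guess p* = 0.2 needs far fewer customers for m = 0.03.
print(sample_size(0.03, p_star=0.2))  # 683
```

The last line illustrates the earlier remark: when the true proportion is far from 0.5, the safe choice $p^* = 0.5$ can ask for many more observations than needed.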

News reports frequently describe the results of surveys with sample sizes between 1000 and 1500 and a margin of error of about 3%. These surveys generally use sampling procedures more complicated than simple random sampling, so the calculation of confidence intervals is more involved than what we have studied in this section. The calculations in Example 8.5 nonetheless show in principle how such surveys are planned.
In practice, many factors influence the choice of a sample size. Case 8.2 illustrates one set of factors.

CASE 8.2 Marketing Christmas Trees
An association of Christmas tree growers in Indiana sponsored a sample survey of Indiana households to help improve the marketing of Christmas trees.[5] The researchers decided to use a telephone survey and estimated that each telephone interview would take about 2 minutes. Nine trained students in agribusiness marketing were to make the phone calls between 1:00 P.M. and 8:00 P.M. on a Sunday. After discussing problems related to people not being at home or being unwilling to answer the questions, the survey team proposed a sample size of 500. Several of the questions asked for demographic information about the household. The key questions of interest had responses of “Yes” or “No,” for example, “Did you have a Christmas tree last year?” The primary purpose of the survey was to estimate various sample proportions for Indiana households. An important issue in designing the survey was therefore whether the proposed sample size of n = 500 would be adequate to provide the sponsors of the survey with the information they required.

To address this question, we calculate the margins of error of 95% confidence intervals for various values of $\hat{p}$.

EXAMPLE 8.6         Margins of Error

CASE 8.2 In the Christmas tree market survey, the margin of error of a 95% confidence interval for any value of $\hat{p}$ and n = 500 is

$$m = z^* SE_{\hat{p}} = 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{500}}$$

The results for various values of $\hat{p}$ are

| $\hat{p}$ | m | $\hat{p}$ | m |
|---|---|---|---|
| 0.05 | 0.019 | 0.60 | 0.043 |
| 0.10 | 0.026 | 0.70 | 0.040 |
| 0.20 | 0.035 | 0.80 | 0.035 |
| 0.30 | 0.040 | 0.90 | 0.026 |
| 0.40 | 0.043 | 0.95 | 0.019 |
| 0.50 | 0.044 | | |

The survey team judged these margins of error to be acceptable, and they used a sample size of 500 in their survey.
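The table can be regenerated with a few lines of Python. This sketch assumes only the formula above with $z^* = 1.96$ and n = 500. `margin_of_error` is a name we introduce here. The printed rows for $\hat{p}$ and $1-\hat{p}$ agree, just as they do in the table.

```python
import math


def margin_of_error(p_hat: float, n: int, z_star: float = 1.96) -> float:
    """Margin of error z* sqrt(p-hat(1 - p-hat)/n) of the large-sample interval."""
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)


n = 500  # sample size proposed in Case 8.2
for p_hat in (0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95):
    print(f"{p_hat:.2f}  {margin_of_error(p_hat, n):.3f}")
```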

The table in Example 8.6 illustrates two points. First, the margins of error for $\hat{p} = 0.05$ and $\hat{p} = 0.95$ are the same. The margins of error will always be the same for $\hat{p}$ and $1 - \hat{p}$. This is a direct consequence of the form of the confidence interval: the product $\hat{p}(1-\hat{p})$ is unchanged when $\hat{p}$ is replaced by $1-\hat{p}$. Second, the margin of error varies only between 0.040 and 0.044 as $\hat{p}$ varies from 0.3 to 0.7, and the margin of error is greatest when $\hat{p} = 0.5$, as we claimed earlier. It is true in general that the margin of error will vary relatively little for values of $\hat{p}$ between 0.3 and 0.7. Therefore, when planning a study, it is not necessary to have a very precise guess for p. If $p^* = 0.5$ is used and the observed $\hat{p}$ is between 0.3 and 0.7, the actual interval will be a little shorter than needed, but the difference will be quite small.

8.13 Is there interest in a new product? One of your employees has suggested that your company develop a new product. You decide to take a random sample of your customers and ask whether or not there is interest in the new product. The response is on a 1 to 5 scale, with 1 indicating “definitely would not purchase”; 2, “probably would not purchase”; 3, “not sure”; 4, “probably would purchase”; and 5, “definitely would purchase.” For an initial analysis, you will record the responses 1, 2, and 3 as “No” and 4 and 5 as “Yes.” What sample size would you use if you wanted the 95% margin of error to be 0.1 or less?

8.14 More information is needed. Refer to the previous exercise. Suppose that, after reviewing the results of the previous survey, you proceeded with preliminary development of the product. Now you are at the stage where you need to decide whether or not to make a major investment to produce and market the product. You will use another random sample of your customers, but now you want the margin of error to be smaller. What sample size would you use if you wanted the 95% margin of error to be 0.05 or less?

SECTION 8.1 Summary

• Inference about a population proportion is based on an SRS of size n. When n is large, the distribution of the sample proportion $\hat{p} = X/n$ is approximately Normal with mean p and standard deviation $\sqrt{p(1-p)/n}$.
• The standard error of $\hat{p}$ is

$$SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

• The z margin of error for confidence level C is

$$m = z^* SE_{\hat{p}}$$

where $z^*$ is the value for the standard Normal density curve with area C between $-z^*$ and $z^*$.
• The z large-sample level C confidence interval for p is

$$\hat{p} \pm m$$

We recommend using this method when the number of successes and the number of failures are both at least 15.
• The plus four estimate of a population proportion is obtained by adding two successes and two failures to the sample and then using the z procedure. We recommend using this method when the sample size is at least 10 and the confidence level is 90%, 95%, or 99%.
• The sample size required to obtain a confidence interval of approximate margin of error m for a proportion is found from

$$n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*)$$

where $p^*$ is a guessed value for the proportion, and $z^*$ is the standard Normal critical value for the desired level of confidence. To ensure that the margin of error of the interval is less than or equal to m no matter what $\hat{p}$ may be, use

$$n = \left(\frac{z^*}{2m}\right)^2$$

• Tests of $H_0: p = p_0$ are based on the z statistic

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$$

with P-values calculated from the N(0, 1) distribution. Use this test when the expected number of successes $np_0$ and the expected number of failures $n(1-p_0)$ are both at least 10.

SECTION 8.1 Exercises

For Exercises 8.1 and 8.2, see page 459; for 8.3 and 8.4, see page 461; for 8.5 to 8.7, see page 462; for 8.8 to 8.11, see page 464; for 8.12, see page 465; and for 8.13 and 8.14, see pages 467–468.

8.15 What’s wrong? Explain what is wrong with each of the following.
(a) You can use a significance test to evaluate the hypothesis $H_0: \hat{p} = 0.6$ versus the two-sided alternative.
(b) The large-sample significance test for a population proportion is based on a t statistic.
(c) A large-sample 95% confidence interval for an unknown proportion p is $\hat{p}$ plus or minus its standard error.

8.16 What’s wrong? Explain what is wrong with each of the following.
(a) The margin of error for a confidence interval used for an opinion poll takes into account the fact that people who did not answer the poll questions may have had different responses from those who did answer the questions.
(b) If the P-value for a significance test is 0.35, we can conclude that the null hypothesis has a 35% chance of being true.
(c) A student project used a confidence interval to describe the results in a final report. The confidence level was 110%.

8.17 Draw some pictures. Consider the binomial setting with n = 50 and p = 0.4.
(a) The sample proportion $\hat{p}$ will have a distribution that is approximately Normal. Give the mean and the standard deviation of this Normal distribution.
(b) Draw a sketch of this Normal distribution. Mark the location of the mean.
(c) Find a value $p^*$ for which the probability is 95% that $\hat{p}$ will be between $\pm p^*$. Mark these two values on your sketch.

8.18 Country food and Inuits. Country food includes seal, caribou, whale, duck, fish, and berries and is an important part of the diet of the aboriginal people called Inuits who inhabit Inuit Nunaat, the northern region of what is now called Canada. A survey of Inuits in Inuit Nunaat reported that 3274 out of 5000 respondents said that at least half of the meat and fish that they eat is country food.[6] Find the sample proportion and a 95% con-

8.20 Guitar Hero and Rock Band. An electronic survey of 7061 reported that 67% of players of Guitar Hero and Rock Band who do not currently play a musical instrument said that they are likely to begin playing a real musical instrument in the next two years.[8] The reports describing the survey do not give the number of respondents who do not currently play a musical instrument.
(a) Explain why it is important to know the number of respondents who do not currently play a musical instrument.
(b) Assume that half of the respondents do not currently play a musical instrument. Find the count of players who said that they are likely to begin playing a real musical instrument in the next two years.
(c) Give a 99% confidence interval for the population proportion who would say that they are likely to begin playing a real musical instrument in the next two years.
(d) The survey collected data from two separate consumer panels. There were 3300 respondents from the LightSpeed consumer panel and the others were from Guitar Center’s proprietary consumer panel. Comment on the sampling procedure used for this survey and how it would influence your interpretation of the findings.

8.21 Guitar Hero and Rock Band. Refer to the previous exercise.
(a) How would the result that you reported in part (c) of the previous exercise change if only 25% of the respondents said that they did not currently play a musical instrument?
(b) Do the same calculations if the percent was 75%.
(c) The main conclusion of the survey that appeared in many news stories was that 67% of players of Guitar Hero and Rock Band who do not currently play a musical instrument said that they are likely to begin playing a real musical instrument in the next two years. What can you conclude about the effect of the three scenarios (part (b) in the previous exercise and parts (a) and (b) in this exercise) on the margin of error for the main result?

8.22 Gambling and college athletics. Gambling is an issue of great concern to those involved in intercollegiate athletics. Because of this, the National Collegiate Athletic Association (NCAA) surveyed student-athletes concerning their gambling-
ﬁdence interval for the population proportion of Inuits who eat      related behaviors.9 There were 5594 Division I male athletes in
meat and ﬁsh that are at least half country food.                    the survey. Of these, 3547 reported participation in some gam-
bling behavior. This includes playing cards, betting on games
8.19 Most desirable mates. A poll of 5000 residents in Brazil,       of skill, buying lottery tickets, betting on sports, and similar
Canada, China, France, Malaysia, South Africa, and the United        activities.
States asked about what profession they would prefer their mar-      (a) Find the sample proportion and the large-sample margin of
riage partner to have. The choice receiving the highest percent,     error for 95% conﬁdence. Explain in simple terms the meaning
805 of the responses, was doctors, nurses, and other health care     of the 95%.
professionals.7                                                      (b) Because of the way that the study was designed to protect
(a) Find the sample proportion and a 95% conﬁdence interval          the anonymity of the student-athletes who responded, it was not
for the proportion of people who would prefer a doctor, nurse, or    possible to calculate the number of students who were asked to
other health care professional as a marriage partner.                respond but did not. Does this fact affect the way that you interpret
(b) Convert the estimate and the conﬁdence interval to percents.     the results? Write a short paragraph explaining your answer.

8.23 Women athletes and gambling. In the study described in the previous exercise, 1447 out of a total of 3469 female student-athletes reported participation in some gambling activity.
(a) Use the large-sample methods to find an estimate of the true proportion with a 95% confidence interval.
(b) The margin of error for this sample is not the same as the margin of error calculated for the previous exercise. Explain why.

8.24 Students doing community service. In a sample of 159,949 first-year college students, the National Survey of Student Engagement reported that 39% participated in community service or volunteer work.10
(a) Find the margin of error for 99% confidence.
(b) Here are some facts from the report that summarizes the survey. The students were from 617 four-year colleges and universities. The response rate was 36%. Institutions paid a participation fee of between \$1800 and \$7800 based on the size of their undergraduate enrollment. Discuss these facts as possible sources of error in this study. How do you think these errors would compare with the error that you calculated in part (a)?

8.25 Plans to study abroad. The survey described in the pre-
response to one of these questions, 42% of first-year students reported that they plan to study abroad.
(a) Based on the information available, what is the value of the count of students who plan to study abroad?
(b) Give a 99% confidence interval for the population proportion of first-year college students who plan to study abroad.

8.26 Dogs or rats to find cocaine (optional). Dogs are big and expensive. Rats are small and cheap. Can rats be trained to replace dogs in sniffing out illegal drugs? One study trained six male albino Sprague-Dawley rats to rear up on their hind legs in response to the smell of cocaine.11 After training, each rat was tested 80 times. In the test a rat was presented with a large number of cups, one of which smelled like cocaine. A success was recorded if the rat correctly identified the cup containing cocaine by rearing up in front of it. The numbers of successes for the six rats were 80, 80, 73, 80, 74, and 80. You want to estimate the success rate in the future for each of the six rats. Compare the large-sample estimates with the plus four estimates for this problem and make a recommendation concerning which is better. Write a short summary giving reasons for your recommendation.

8.27 Long sermons. The National Congregations Study collected data in a one-hour interview with a key informant, that is, a minister, priest, rabbi, or other staff person or leader.12 One question concerned the length of the typical sermon. For 390 out of 1191 congregations, the typical sermon lasted more than 30 minutes.
(a) Use the large-sample inference procedures to estimate the true proportion for this question with a 95% confidence interval.
(b) (Optional) Compute the interval using the plus four method. Compare these results with those from part (a) and summarize what the comparison tells you about the two methods.
(c) There were 1236 congregations surveyed in this study. Calculate the nonresponse rate for this question. Does this influence how you interpret the results? Write a short discussion of this issue.
(d) The respondents to this question were not asked to use a stopwatch to record the lengths of a random sample of sermons at their congregations. They responded based on their impressions of the sermons. Do you think that ministers, priests, rabbis, or other staff persons or leaders might perceive sermon lengths differently from the people listening to the sermons? Discuss how your ideas influence your interpretation of the results of this study.

8.28 Are the congregations conservative? The study described in the previous exercise also asked each respondent to classify his or her congregation according to theological orientation. For this question, 707 out of 1191 congregations were classified as “more conservative.” Using the questions in the previous exercise as a guide, analyze and interpret these data. Compare your answers to parts (c) and (d) in the two exercises and discuss reasons why you think the answers should be similar or different.

8.29 Student credit cards. In a survey of 1430 undergraduate
Give a 95% confidence interval for the proportion of all college students who have at least one credit card.

8.30 How many credit cards? The survey described in the previous exercise reported that 43% of undergraduates had four or more credit cards. Give a 95% confidence interval for the proportion of all college students who have four or more credit cards.

8.31 How would the confidence interval change? Refer to Exercise 8.25. Would a 95% confidence interval be wider or narrower than the one that you found in that exercise? Verify your results by computing the interval.

8.32 How would the confidence interval change? Refer to Exercise 8.23. Would a 90% confidence interval be wider or narrower than the one that you found in that exercise? Verify your results by computing the interval.

8.33 College students and diets. For a study of unhealthy eating behaviors, 267 college women aged 18 to 25 years were surveyed.14 Of these, 69% reported that they had been on a diet sometime during the past year. Give a 95% confidence interval for the true proportion of college women aged 18 to 25 years in this population who dieted last year.

8.34 High school students and diets. In the study described in the previous exercise, the researchers also surveyed 266 high school students who were 18 years old. In this sample 58.3% reported that they had dieted sometime in the past year. Give a 95% confidence interval for the true proportion of 18-year-old high school students in this population who were on a diet sometime during the past year.

8.35 Marketing pet care products to older adults. You have been asked to investigate the possibility of a marketing campaign

to promote your company’s pet care products to older adults. Your report will include information about your potential market. In a study of the relationship between pet ownership and physical activity in older adults, 594 subjects reported that they owned a pet, while 1939 reported that they did not.15 Give a 95% confidence interval for the proportion of older adults in this population who are pet owners.

CASE 8.2 8.36 Christmas tree marketing. One question in the Christmas tree market survey described in Case 8.2 was “Did you have a Christmas tree last year?” Of the 500 respondents, 421 answered “Yes.”
(a) What proportion of the sampled households responded “Yes”?
(b) Give the standard error for your estimate in part (a).
(c) Find a 95% confidence interval for the proportion of Indiana households that had a Christmas tree last year.

8.37 Shipping the orders on time. As part of a quality improvement program, your mail-order company is studying the process of filling customer orders. According to company standards, an order is shipped on time if it is sent within 2 working days of the time it is received. You select an SRS of 150 of the 5000 orders received in the past month for an audit. The audit reveals that 124 of these orders were shipped on time. Find a 95% confidence interval for the true proportion of the month’s orders that were shipped on time.

8.38 Power companies and trimming trees. Large trees growing near power lines can cause power failures during storms when their branches fall on the lines. Power companies spend a great deal of time and money trimming and removing trees to prevent this problem. Researchers are developing hormone and chemical treatments that will stunt or slow tree growth. If the treatment is too severe, however, the tree will die. In one series of laboratory experiments on 216 sycamore trees, 41 trees died. Give a 90% confidence interval for the proportion of sycamore trees that would be expected to die from this particular treatment.

8.39 Financial goals of college students. In recent years over 70% of first-year college students responding to a national survey have identified “being well-off financially” as an important personal goal. A state university finds that 141 of an SRS of 200 of its first-year students say that this goal is important. Give a 95% confidence interval for the proportion of all first-year students at the university who would identify being well-off as an important personal goal.

8.40 Can we use the z test? In each of the following cases, is the sample large enough to permit safe use of the z test? (The population is very large.)
(a) n = 12 and H0: p = 0.6.
(b) n = 100 and H0: p = 0.4.
(c) n = 1000 and H0: p = 0.98.
(d) n = 500 and H0: p = 0.3.

CASE 8.2 8.41 Checking the demographics of a sample. Of the 500 households that responded to the Christmas tree marketing survey, 38% were from rural areas (including small towns), and the other 62% were from urban areas (including suburbs). According to the census, 36% of Indiana households are in rural areas, and the remaining 64% are in urban areas. Let p be the proportion of rural respondents. Set up hypotheses about p, using the census value as p0, and perform a test of significance to examine how well the sample represents the state in regard to rural versus urban residence.

8.42 More on demographics. In the previous exercise we arbitrarily chose to state the hypotheses in terms of the proportion of rural respondents. We could as easily have used the proportion of urban respondents.
(a) Write hypotheses in terms of the proportion of urban residents to examine how well the sample represents the state in regard to rural versus urban residence.
(b) Perform the test of significance and summarize the results.
(c) Compare your results with the results of the previous exercise. Summarize and generalize your conclusion.

8.43 Vouchers for schools? A national opinion poll found that 42% of all American adults agree that parents should be given vouchers good for education at any public or private school of their choice. The result was based on a small sample. How large an SRS is required to obtain a margin of error of ±0.035 (that is, ±3.5%) in a 95% confidence interval? (Use the previous poll’s result to obtain the guessed value $p^*$.)

CASE 8.2 8.44 Profile of the survey respondents. Of the 500 respondents in the Christmas tree market survey of Case 8.2, 44% had no children at home and 56% had at least one child at home. The corresponding census figures are 48% with no children and 52% with at least one child. Test the null hypothesis that the telephone survey technique has a probability of selecting a household with no children that is equal to the value obtained by the census. Give the z statistic and the P-value. What do you conclude?

8.45 Mathematician tosses coin 10,000 times! The South African mathematician John Kerrich, while a prisoner of war during World War II, tossed a coin 10,000 times and obtained 5067 heads.
(a) Is this significant evidence at the 5% level that the probability that Kerrich’s coin comes up heads is not 0.5?
(b) Give a 95% confidence interval to see what probabilities of heads are roughly consistent with Kerrich’s result.

8.46 Instant versus fresh-brewed coffee. A matched pairs experiment compares the taste of instant coffee with fresh-brewed coffee. Each subject tastes two unmarked cups of coffee, one of each type, in random order and states which he or she prefers. Of the 60 subjects who participate in the study, 25 prefer the instant coffee and the other 35 prefer fresh-brewed. Take p to be the proportion of the population that prefers fresh-brewed coffee.
(a) Test the claim that a majority of people prefer the taste of fresh-brewed coffee. Report the z statistic and its P-value. Is

your result significant at the 5% level? What is your practical conclusion?
(b) Find a 90% confidence interval for p.

8.47 High-income households on a mailing list. Land’s Beginning sells merchandise through the mail. It is considering buying a list of addresses from a magazine. The magazine claims that at least 25% of its subscribers have high incomes (that is, household income in excess of \$100,000). Land’s Beginning would like to estimate the proportion of high-income people on the list. Verifying income is difficult, but another company offers this service. Land’s Beginning will pay to verify the incomes of an SRS of people on the magazine’s list. They would like the margin of error of the 95% confidence interval for the proportion to be 0.05 or less. Use the guessed value $p^* = 0.25$ to find the required sample size.

8.48 Change the specs. Refer to the previous exercise. For each of the following variations on the design specifications, state whether the required sample size will be higher, lower, or the same as that found above.
(a) Use a 99% confidence interval.
(b) Change the allowable margin of error to 0.01.
(c) Use a planning value of $p^* = 0.15$.
(d) Use a different company to do the income verification.

8.49 Start a student nightclub? A student organization wants to start a nightclub for students under the age of 21. To assess support for this proposal, the organization will select an SRS of students and ask each respondent if he or she would patronize this type of establishment. About 75% of the student body are expected to respond favorably. What sample size is required to obtain a 95% confidence interval with an approximate margin of error of 0.06? Suppose that 50% of the sample responds favorably. Calculate the margin of error of the 95% confidence interval.

8.50 Are the customers dissatisfied? A cell phone manufacturer would like to know what proportion of its customers are dissatisfied with the service received from their local distributor. The customer relations department will survey a random sample of customers and compute a 99% confidence interval for the proportion that are dissatisfied. From past studies, they believe that this proportion will be about 0.1. Find the sample size needed if the margin of error of the confidence interval is to be about 0.02. Suppose 18% of the sample say that they are dissatisfied. What is the margin of error of the 99% confidence interval?

8.51 Increase student fees? You have been asked to survey students at a large college to determine the proportion that favor an increase in student fees to support an expansion of the student newspaper. Each student will be asked whether he or she is in favor of the proposed increase. Using records provided by the registrar you can select a random sample of students from the college. After careful consideration of your resources, you decide that it is reasonable to conduct a study with a sample of 150 students. Construct a table of the margins of error for 95% confidence when $\hat{p}$ takes the values 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9.

8.52 Justify the cost of the survey. A former editor of the student newspaper agrees to underwrite the study in the previous exercise because she believes the results will demonstrate that most students support an increase in fees. She is willing to provide funds for a sample of size 500. Write a short summary for your benefactor of why the increased sample size will provide better results.

8.2 Comparing Two Proportions
Because comparative studies are so common, we often want to compare the proportions
of two groups (such as men and women) that have some characteristic. We call the
two groups being compared Population 1 and Population 2, and the two population
proportions of “successes” $p_1$ and $p_2$. The data consist of two independent SRSs. The
sample sizes are $n_1$ for Population 1 and $n_2$ for Population 2. The proportion of successes
in each sample estimates the corresponding population proportion. Here is the notation
we will use in this section:

Population   Population proportion   Sample size   Count of successes   Sample proportion
1            $p_1$                   $n_1$         $X_1$                $\hat{p}_1 = X_1/n_1$
2            $p_2$                   $n_2$         $X_2$                $\hat{p}_2 = X_2/n_2$

To compare the two unknown population proportions, start with the observed difference
between the two sample proportions,

$$D = \hat{p}_1 - \hat{p}_2$$

FIGURE 8.4 The sampling distribution of the difference between two sample proportions is approximately Normal. The mean and standard deviation are found from the two population proportions of successes, $p_1$ and $p_2$.
[Figure: Normal curve for the sampling distribution of $\hat{p}_1 - \hat{p}_2$, centered at the mean $p_1 - p_2$ and labeled with standard deviation $\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$; horizontal axis: values of $\hat{p}_1 - \hat{p}_2$.]

When both sample sizes are sufficiently large, the sampling distribution of the difference
$D$ is approximately Normal. What are the mean and the standard deviation of $D$? Each of
the two $\hat{p}$'s has the mean and standard deviation given in the box on page 458. Because
the two samples are independent, the two $\hat{p}$'s are also independent. We can apply the
rules for means and variances of sums of random variables: the mean of $D$ is the difference
of the two means, and, because of independence, the variance of $D$ is the sum of the two
variances. Here is the result, which is summarized in Figure 8.4.

Sampling Distribution of $\hat{p}_1 - \hat{p}_2$
Choose independent SRSs of sizes $n_1$ and $n_2$ from two populations with proportions $p_1$
and $p_2$ of successes. Let $D = \hat{p}_1 - \hat{p}_2$ be the difference between the two sample
proportions of successes. Then
• As both sample sizes increase, the sampling distribution of $D$ becomes approximately
Normal.
• The mean of the sampling distribution is $p_1 - p_2$.
• The standard deviation of the sampling distribution is

$$\sigma_D = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
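One way to see the box's three claims at work is to simulate many pairs of independent samples and compare the simulated mean and standard deviation of $D$ with $p_1 - p_2$ and $\sigma_D$. The sketch below is a minimal Python illustration that assumes NumPy is available; the function names `sigma_d` and `simulate_d` and the parameter values $p_1 = 0.2$, $n_1 = 300$, $p_2 = 0.1$, $n_2 = 250$ are hypothetical choices for the demonstration, not data from the text.

```python
import math

import numpy as np


def sigma_d(p1, n1, p2, n2):
    """Standard deviation of D = p1_hat - p2_hat from the box (independent samples)."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def simulate_d(p1, n1, p2, n2, reps=100_000, seed=0):
    """Draw `reps` pairs of independent samples and return the simulated values of D."""
    rng = np.random.default_rng(seed)
    # Each count of successes is binomial; dividing by n gives the sample proportion.
    p1_hat = rng.binomial(n1, p1, size=reps) / n1
    p2_hat = rng.binomial(n2, p2, size=reps) / n2
    return p1_hat - p2_hat


# Hypothetical parameters chosen only for the demonstration.
p1, n1, p2, n2 = 0.2, 300, 0.1, 250
d = simulate_d(p1, n1, p2, n2)
print(f"mean of D: simulated {d.mean():.4f}, theory {p1 - p2:.4f}")
print(f"sd of D:   simulated {d.std():.4f}, theory {sigma_d(p1, n1, p2, n2):.4f}")
```

With 100,000 repetitions the simulated mean and standard deviation should sit close to the theoretical values; a histogram of `d` would show the approximately Normal shape.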

8.53 Rules for means and variances. Suppose $p_1 = 0.4$, $n_1 = 25$, $p_2 = 0.5$,
$n_2 = 30$. Find the mean and the standard deviation of the sampling distribution of
$\hat{p}_1 - \hat{p}_2$.
8.54 Effect of the sample sizes. Suppose $p_1 = 0.4$, $n_1 = 100$, $p_2 = 0.5$, $n_2 = 120$.
(a) Find the mean and the standard deviation of the sampling distribution of $\hat{p}_1 - \hat{p}_2$.
(b) The sample sizes here are four times as large as those in the previous exercise,
while the population proportions are the same. Compare the results for this
exercise with those that you found in the previous exercise. What is the effect of
multiplying the sample sizes by 4?

8.55 Rules for means and variances. It is quite easy to verify the mean and standard
deviation of the difference $D$.
(a) What are the means and standard deviations of the two sample proportions $\hat{p}_1$
and $\hat{p}_2$? (Look at the box on page 460 if you need to review this.)
(b) Use the addition rule for means of random variables: what is the mean of
$D = \hat{p}_1 - \hat{p}_2$?
(c) The two samples are independent. Use the addition rule for variances of random
variables: what is the variance of $D$?

Large-sample confidence intervals for a difference in proportions
The large-sample estimate of the difference in two proportions $p_1 - p_2$ is the corresponding
difference in sample proportions $\hat{p}_1 - \hat{p}_2$. To obtain a confidence interval for
the difference, we once again replace the unknown parameters in the standard deviation
by estimates to obtain an estimated standard deviation, or standard error. Here is the
confidence interval we want.

z Confidence Interval for Comparing Two Proportions
Choose an SRS of size $n_1$ from a large population having proportion $p_1$ of successes and
an independent SRS of size $n_2$ from another population having proportion $p_2$ of successes.
The large-sample estimate of the difference in proportions is

$$D = \hat{p}_1 - \hat{p}_2 = \frac{X_1}{n_1} - \frac{X_2}{n_2}$$

The standard error of the difference is

$$SE_D = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

and the margin of error for confidence level C is

$$m = z^* SE_D$$

where $z^*$ is the value for the standard Normal density curve with area C between $-z^*$ and
$z^*$. The large-sample level C confidence interval for $p_1 - p_2$ is

$$(\hat{p}_1 - \hat{p}_2) \pm m$$

Use this method when the number of successes and the number of failures in each of the
samples are at least 10.
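The box translates almost line for line into code. The sketch below is one possible Python version using only the standard library; the function name `two_prop_z_interval` is hypothetical, $z^*$ is obtained from `statistics.NormalDist().inv_cdf`, and raising an exception when a sample has fewer than 10 successes or failures is a design choice for the sketch rather than part of the method itself.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_interval(x1, n1, x2, n2, conf=0.95):
    """Large-sample level-C confidence interval for p1 - p2.

    Returns (estimate D, standard error SE_D, margin m, (lower, upper)).
    """
    # Guideline from the box: at least 10 successes and 10 failures in each sample.
    for x, n in ((x1, n1), (x2, n2)):
        if min(x, n - x) < 10:
            raise ValueError("each sample needs at least 10 successes and 10 failures")
    p1_hat, p2_hat = x1 / n1, x2 / n2
    d = p1_hat - p2_hat
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # z* leaves area C between -z* and z*, that is, (1 - C)/2 in each tail.
    z_star = NormalDist().inv_cdf((1 + conf) / 2)
    m = z_star * se
    return d, se, m, (d - m, d + m)


# Hypothetical counts, used only to show the call.
d, se, m, (lo, hi) = two_prop_z_interval(120, 400, 90, 400, conf=0.90)
print(f"D = {d:.3f}, SE = {se:.4f}, m = {m:.4f}, interval = ({lo:.3f}, {hi:.3f})")
```

Applied to the Case 8.3 counts (63 of 296 women and 27 of 251 men), this function should reproduce the software interval shown in Figure 8.5 up to rounding.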

CASE 8.3 “No Sweat” Garment Labels
Following complaints about the working conditions in some apparel factories both in the
United States and abroad, a joint government and industry commission recommended that
companies that monitor and enforce proper standards be allowed to display a “No Sweat” label
on their products. Does the presence of these labels influence consumer behavior?
A survey of U.S. residents aged 18 or older asked a series of questions about how likely
they would be to purchase a garment under various conditions. For some conditions, it was
stated that the garment had a “No Sweat” label; for others, there was no mention of such a
label. On the basis of the responses, each person was classified as a “label user” or a “label
nonuser.”16 About 16.5% of those surveyed were label users. One purpose of the study was
to describe the demographic characteristics of users and nonusers.
Here is a summary of the data. We let X denote the number of label users.

Population      n      X      $\hat{p} = X/n$
1 (women)      296     63     0.213
2 (men)        251     27     0.108

The study in Case 8.3 suggested that there is a gender difference in the proportion
of label users. Let’s explore this possibility using a confidence interval.

EXAMPLE 8.7         Gender Differences in Label Use

CASE 8.3   First, we find the estimate of the difference:

$$D = \hat{p}_1 - \hat{p}_2 = \frac{X_1}{n_1} - \frac{X_2}{n_2} = 0.213 - 0.108 = 0.105$$

Next, we calculate the standard error:

$$SE_D = \sqrt{\frac{0.213(1-0.213)}{296} + \frac{0.108(1-0.108)}{251}} = 0.0308$$

For 95% confidence, we use $z^* = 1.96$, so the margin of error is

$$m = z^* SE_D = (1.96)(0.0308) = 0.060$$

The large-sample 95% confidence interval is

$$D \pm m = 0.105 \pm 0.060 = (0.045, 0.165)$$

With 95% confidence we can say that the difference in the proportions is between 0.045 and 0.165.
Alternatively, we can report that the gender difference is about 10% in favor of women, with a
95% margin of error of 6%.
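The hand calculation in Example 8.7 rounds the sample proportions to three decimals, while software keeps full precision, so the printed limits can differ slightly in the third decimal. The Python sketch below (standard library only; the helper name `interval_from` is hypothetical) repeats the example both ways with $z^* = 1.96$.

```python
from math import sqrt

# Case 8.3 counts from Example 8.7: population 1 = women, population 2 = men.
X1, N1 = 63, 296
X2, N2 = 27, 251
Z_STAR = 1.96  # 95% confidence, as in the example


def interval_from(p1_hat, p2_hat):
    """Return (D, SE_D, m, (lower, upper)) for the given sample proportions."""
    d = p1_hat - p2_hat
    se = sqrt(p1_hat * (1 - p1_hat) / N1 + p2_hat * (1 - p2_hat) / N2)
    m = Z_STAR * se
    return d, se, m, (d - m, d + m)


rounded = interval_from(0.213, 0.108)        # proportions rounded as in the text
unrounded = interval_from(X1 / N1, X2 / N2)  # full precision, as software does

for label, (d, se, m, (lo, hi)) in (("rounded", rounded), ("unrounded", unrounded)):
    print(f"{label:>9}: D={d:.4f}  SE={se:.4f}  m={m:.4f}  CI=({lo:.4f}, {hi:.4f})")
```

The rounded version reproduces the hand calculation, and the unrounded limits agree with the Minitab output in Figure 8.5 to about four decimal places; the tiny remaining gap comes from using 1.96 rather than the exact $z^*$.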

Minitab and SAS output for Example 8.7 appear in Figure 8.5. Other statistical
packages provide output that is similar.
In surveys such as this, men and women are typically not sampled separately. The
respondents to a single sample are divided after the fact into men and women. The
sample sizes are then random and reﬂect the characteristics of the population sampled.
Two-sample significance tests and confidence intervals are still approximately correct in
this situation, even though the two sample sizes were not ﬁxed in advance.
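The claim that the interval stays approximately valid when the group sizes are not fixed in advance can be checked by simulation. The Python sketch below assumes NumPy is available; it draws one sample of 547 people, splits it at random into two groups, and records how often the nominal 95% interval covers the true difference. The function name and all population values (a 54% share in group 1 and success rates of 0.21 and 0.11) are hypothetical, loosely patterned on Case 8.3.

```python
import numpy as np


def coverage_random_group_sizes(n=547, share_1=0.54, p1=0.21, p2=0.11,
                                z_star=1.96, reps=20_000, seed=1):
    """Estimate the coverage of the large-sample interval for p1 - p2 when
    the two group sizes come from splitting a single random sample."""
    rng = np.random.default_rng(seed)
    n1 = rng.binomial(n, share_1, size=reps)  # group sizes vary from sample to sample
    n2 = n - n1
    x1 = rng.binomial(n1, p1)                  # successes within each group
    x2 = rng.binomial(n2, p2)
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = np.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    covered = np.abs((p1_hat - p2_hat) - (p1 - p2)) <= z_star * se
    return covered.mean()


print(f"estimated coverage: {coverage_random_group_sizes():.3f}")
```

An estimated coverage close to 0.95 is consistent with the statement in the text; with far smaller samples the same simulation would show the large-sample interval drifting below its nominal level.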

FIGURE 8.5 Minitab and SAS outputs for Example 8.7.

Minitab
Sample                             X      N          Sample p
1                                 63      296        0.212838
2                                 27      251        0.107570

Difference = p (1) – p (2)
Estimate for difference: 0.105268
95% CI for difference: (0.0449066, 0.165630)

SAS

(Asymptotic) 95%
Risk          ASE        Confidence Limits

Row 1            0.2128        0.0238       0.1662        0.2595
Row 2            0.1076        0.0196       0.0692        0.1459
Total            0.1645        0.0159       0.1335        0.1956

Difference       0.1053        0.0308       0.0449        0.1656

In Example 8.7 we chose women to be the first population. Had we chosen men as
the first population, the estimate of the difference would be negative (−0.105). Because
it is easier to discuss positive numbers, we generally choose the first population to be the
one with the higher proportion. The choice doesn’t affect the substance of the analysis.

8.56 Lying and online dating profiles. JupiterResearch estimates that the U.S. online
dating market will reach \$932 million by 2011 and that the European online dating
sites will more than double revenues, from 243 million euros in 2006 to 549 million euros in
2011.17 When trying to start a new relationship, people want to make a favorable
impression. Sometimes they will even stretch the truth a bit when disclosing information
about themselves. A study of deception in online dating examined the accuracy of the
information that 80 online daters gave in their online dating profiles.18 The study found
that 22 of 40 men lied about their height, while 17 of 40 women were deceptive in this
way. A difference between the person’s actual height and that reported in the online
dating profile was classified as a lie if it was greater than 0.5 inches.
(a) Find the sample proportion of men who lied about their height. Do the same for
the women.
(b) Give the estimate of the difference between the proportion of men who lie about
their height and the proportion of women who lie about their height.
(c) Find the standard error for the estimated difference.
(d) Give the 95% confidence interval for the difference.
8.57 Lying about weight. The study described in the previous exercise also described
results for lying about weight. They reported that 24 men and 23 women lied about
their weight. Answer parts (a) through (d) from the previous exercise for these data.

height. Suppose we wanted to look at the men only and compare the lying rates for
height and weight. Can we do this using the methods that we just studied? Stop for a
moment to review the material in the box on page 474, paying particular attention to the
assumptions that are needed for this method to be valid. The assumptions state that we
have independent samples from the two populations. In our examples, however, we are
using data from the same people to examine lying about height and lying about weight.
The z confidence interval for comparing two proportions that we have been studying
is not valid for this situation. Be sure to check your assumptions before applying any
statistical inference procedure.

Plus four confidence intervals for a difference in proportions*
Just as in the case of estimating a single proportion, a small modification of the sample proportions greatly improves the accuracy of confidence intervals.19 As before, we first add 2 successes and 2 failures to the actual data, dividing them equally between the two samples. That is, add 1 success and 1 failure to each sample. Note that each sample size is increased by 2. We then perform the calculations for the z procedure with the modified data. As in the case of a single sample, we use the term Wilson estimates for the estimates produced in this way. We recommend using this method when both sample sizes are at least 5 and the confidence level is 90%, 95%, or 99%.
In Example 8.7, we had $X_1 = 63$, $n_1 = 296$, $X_2 = 27$, and $n_2 = 251$. For the plus four procedure, we would use $X_1 = 64$, $n_1 = 298$, $X_2 = 28$, and $n_2 = 253$.

*The material on the plus four confidence interval for a difference in proportions is optional and can be omitted without loss of continuity.
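Because the plus four interval is just the large-sample z interval applied to modified counts, a few lines of code make the relationship concrete. The following is a minimal Python sketch, assuming the unpooled standard error of the z procedure and using the standard library's `statistics.NormalDist` to obtain $z^*$. The function names `two_prop_z_interval` and `plus_four_interval` are hypothetical and do not come from any package.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_interval(x1, n1, x2, n2, conf=0.95):
    """Large-sample z interval for p1 - p2 (unpooled standard error SE_D)."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    d = p1 - p2
    return d - z_star * se, d + z_star * se


def plus_four_interval(x1, n1, x2, n2, conf=0.95):
    """Plus four: add 1 success and 1 failure to each sample, then apply the z procedure."""
    return two_prop_z_interval(x1 + 1, n1 + 2, x2 + 1, n2 + 2, conf)


# Example 8.7 counts: women 63 of 296, men 27 of 251
z_lo, z_hi = two_prop_z_interval(63, 296, 27, 251)
p4_lo, p4_hi = plus_four_interval(63, 296, 27, 251)
print(f"z interval:         ({z_lo:.4f}, {z_hi:.4f})")
print(f"plus four interval: ({p4_lo:.4f}, {p4_hi:.4f})")
```

With samples this large the two intervals should differ only slightly (roughly 0.045 to 0.166 for the z interval versus 0.044 to 0.165 for plus four); the adjustment matters more for small samples such as those in Exercise 8.59.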

8.58 Gender and labels using plus four. Refer to Example 8.7 (page 475), where we
computed a 95% conﬁdence interval for the difference in the proportions of men and
women who were likely to use “No Sweat” labels when deciding to purchase clothing.
Redo the computations using the plus four method and compare your results with those
obtained in Example 8.7.
8.59 Gender and labels using plus four. Refer to the previous exercise and to Example 8.7. Suppose that the sample sizes were smaller but that the proportions remained approximately the same. Specifically, assume that 6 out of 30 women were label users and 3 out of 25 men were label users. Compute the plus four interval for 95% confidence. Then, compute the corresponding z interval and compare the results.
8.60 Lying about age. Refer to Exercises 8.56 and 8.57, where you analyzed data
about lying about height and weight in online dating proﬁles. The study also reported
that 10 men and 5 women lied about their age.
(a) The z conﬁdence interval for comparing two proportions should not be used for
these data. Why?
(b) Compute the plus four conﬁdence interval for the difference in proportions.

Significance tests
Although we prefer to compare two proportions by giving a conﬁdence interval for the
difference between the two population proportions, it is sometimes useful to test the null
hypothesis that the two population proportions are the same.
We standardize $D = \hat p_1 - \hat p_2$ by subtracting its mean $p_1 - p_2$ and then dividing by its standard deviation

$$\sigma_D = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$

If $n_1$ and $n_2$ are large, the standardized difference is approximately $N(0, 1)$. To get a confidence interval, we used sample estimates in place of the unknown population proportions $p_1$ and $p_2$ in the expression for $\sigma_D$. Although this approach would lead to a valid significance test, we follow the more common practice of replacing the unknown $\sigma_D$ with an estimate that takes into account the null hypothesis that $p_1 = p_2$. If these two proportions are equal, we can view all of the data as coming from a single population. Let $p$ denote the common value of $p_1$ and $p_2$. The standard deviation of $D = \hat p_1 - \hat p_2$ is then

$$\sigma_{Dp} = \sqrt{\frac{p(1 - p)}{n_1} + \frac{p(1 - p)}{n_2}} = \sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

The subscript on $\sigma_{Dp}$ reminds us that this is the standard deviation under the special condition that the two populations share a common proportion $p$ of successes.

We estimate the common value of $p$ by the overall proportion of successes in the two samples:

$$\hat p = \frac{\text{number of successes in both samples}}{\text{number of observations in both samples}} = \frac{X_1 + X_2}{n_1 + n_2}$$

This estimate of $p$ is called the pooled estimate because it combines, or pools, the information from both samples.

To estimate the standard deviation of $D$, substitute $\hat p$ for $p$ in the expression for $\sigma_{Dp}$. The result is a standard error for $D$ under the condition that the null hypothesis $H_0: p_1 = p_2$ is true. The test statistic uses this standard error to standardize the difference between the two sample proportions.
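The difference between the two standard errors is easy to see numerically. The following minimal Python sketch assumes the label-user counts from Example 8.7 (63 of 296 women, 27 of 251 men); the helper names `unpooled_se` and `pooled_se` are hypothetical. The pooled value should agree with the 0.03181 used in Example 8.8.

```python
from math import sqrt


def unpooled_se(x1, n1, x2, n2):
    """SE_D: uses each sample proportion separately (the confidence interval version)."""
    p1, p2 = x1 / n1, x2 / n2
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def pooled_se(x1, n1, x2, n2):
    """SE_Dp: assumes H0: p1 = p2 and uses the pooled estimate p_hat."""
    p_hat = (x1 + x2) / (n1 + n2)  # overall proportion of successes in both samples
    return sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))


print(f"unpooled SE_D : {unpooled_se(63, 296, 27, 251):.5f}")
print(f"pooled SE_Dp  : {pooled_se(63, 296, 27, 251):.5f}")
```

For these data the two standard errors are close but not identical; only the pooled one is used in the significance test.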

Signiﬁcance Tests for Comparing Two Proportions
Choose an SRS of size $n_1$ from a large population having proportion $p_1$ of successes and an independent SRS of size $n_2$ from another population having proportion $p_2$ of successes. To test the hypothesis

$$H_0: p_1 = p_2$$

compute the z statistic

$$z = \frac{\hat p_1 - \hat p_2}{SE_{Dp}}$$

where the pooled standard error is

$$SE_{Dp} = \sqrt{\hat p(1 - \hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

based on the pooled estimate of the common proportion of successes

$$\hat p = \frac{X_1 + X_2}{n_1 + n_2}$$

In terms of a standard Normal random variable $Z$, the P-value for a test of $H_0$ against

$H_a: p_1 > p_2$ is $P(Z \ge z)$

$H_a: p_1 < p_2$ is $P(Z \le z)$

$H_a: p_1 \ne p_2$ is $2P(Z \ge |z|)$

Use this test when the number of successes and the number of failures in each of the samples are at least 5.
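The box translates directly into code. The following is a minimal Python sketch that uses the standard library's `statistics.NormalDist` for Normal probabilities; the function name `two_prop_z_test` and the `alternative` argument values are hypothetical choices, not from any particular package. Applied to the label-user data of Example 8.8, it should reproduce the software values in Figure 8.6 (z = 3.31, P = 0.0009).

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Pooled z test of H0: p1 = p2; returns (z, P-value)."""
    # Rule of thumb from the box: at least 5 successes and 5 failures per sample
    if min(x1, n1 - x1, x2, n2 - x2) < 5:
        raise ValueError("too few successes or failures for the z test")
    p1, p2 = x1 / n1, x2 / n2
    p_hat = (x1 + x2) / (n1 + n2)  # pooled estimate of the common p
    se_dp = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_dp
    Z = NormalDist()  # standard Normal
    if alternative == "greater":  # Ha: p1 > p2
        p_value = 1 - Z.cdf(z)
    elif alternative == "less":  # Ha: p1 < p2
        p_value = Z.cdf(z)
    elif alternative == "two-sided":  # Ha: p1 != p2
        p_value = 2 * (1 - Z.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return z, p_value


z, p = two_prop_z_test(63, 296, 27, 251)
print(f"z = {z:.2f}, two-sided P = {p:.4f}")
```

Passing `alternative="greater"` gives the one-sided P-value, which for a positive z is exactly half of the two-sided value.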

EXAMPLE 8.8        Men, Women, and Garment Labels

CASE 8.3       Example 8.7 (page 475) presents survey data on whether consumers are “label users” who pay
attention to label details when buying a garment. Are men and women equally likely to be label
users? Here is the data summary:

Population     n      X     $\hat p = X/n$
1 (women)     296     63     0.213
2 (men)       251     27     0.108

The sample proportions are certainly quite different, but we need a significance test to verify that the difference is too large to easily result from the play of chance in choosing the sample. Formally, we compare the proportions of label users in the two populations (women and men) by testing the hypotheses

$$H_0: p_1 = p_2$$
$$H_a: p_1 \ne p_2$$

The pooled estimate of the common value of $p$ is

$$\hat p = \frac{63 + 27}{296 + 251} = \frac{90}{547} = 0.1645$$

This is just the proportion of label users in the entire sample.
First, we compute the standard error

$$SE_{Dp} = \sqrt{(0.1645)(0.8355)\left(\frac{1}{296} + \frac{1}{251}\right)} = 0.03181$$

and then we use this in the calculation of the test statistic

$$z = \frac{\hat p_1 - \hat p_2}{SE_{Dp}} = \frac{0.213 - 0.108}{0.03181} = 3.30$$

The difference in the sample proportions is more than 3 standard deviations away from zero. The P-value is $2P(Z \ge 3.30)$. From Table A we have $P = 2 \times 0.0005 = 0.0010$. Software gives $P = 0.0009$. We report: 21% of women are label users versus only 11% of men; the difference is statistically significant ($z = 3.30$, $P < 0.001$).

Figure 8.6 gives the Minitab and SAS outputs for Example 8.8. Carefully examine
the output to ﬁnd all the important pieces that you would need to report the results of the
analysis and to draw a conclusion.
Many market researchers would expect the proportion of label users to be higher among women than among men. That is, we might choose the one-sided alternative $H_a: p_1 > p_2$. The P-value would be half of the value obtained for the two-sided test. Because the z statistic is so large, this distinction is of no practical importance.

8.61 Do men lie more often about their height than women? Refer to Exercise 8.56
(page 476) about lying and online dating proﬁles.
(a) State appropriate null and alternative hypotheses for this setting. Give a
(b) Use the data given in Exercise 8.56 to perform a two-sided signiﬁcance test. Give
the test statistic and the P-value.
(c) Summarize the results of your signiﬁcance test.
8.62 What about weight? Refer to Exercise 8.57 (page 476) for the data on lying
about weight. Answer the questions given in the previous exercise for weight.

FIGURE 8.6 Minitab and SAS outputs for Example 8.8.

Minitab

Sample     X      N    Sample p
1         63    296    0.212838
2         27    251    0.107570

Difference = p(1) - p(2)
Estimate for difference: 0.105268
⋮
Test for difference = 0 (vs not = 0) z = 3.31 P-Value = 0.001

SAS

Two Sample Test of Equality of Proportions
Sample Statistics
- Frequencies of x for gen -
Value      1      2
0        233    224
1         63     27

Hypothesis Test
Null hypothesis:
Proportion of x(gen=1) - Proportion of x(gen=2) = 0
Alternative:
Proportion of x(gen=1) - Proportion of x(gen=2) ^= 0

- Proportions of x for gen -
Value      1         2        z     Prob > z
1        0.2128    0.1076    3.31    0.0009

BEYOND THE BASICS: Relative Risk
In Example 8.7 (page 475) we compared the proportions of women and men who are
“label users” when they shop for clothing by giving a conﬁdence interval for the difference
of proportions. Alternatively, we might choose to make this comparison by giving the
ratio of the two proportions. This ratio is often called the relative risk (RR). A relative risk of 1 means that the proportions $\hat p_1$ and $\hat p_2$ are equal. Confidence intervals for relative risk apply the principles that we have studied, but the details are somewhat complicated.
Fortunately, we can leave the details to software and concentrate on interpreting and
communicating the results.

EXAMPLE 8.9          Relative Risk for Use of Labels

CASE 8.3       The following table summarizes the data on the proportions of men and women who use labels:

Population     n      X     $\hat p = X/n$
1 (women)     296     63     0.2128
2 (men)       251     27     0.1076

The relative risk for this sample is

$$RR = \frac{\hat p_1}{\hat p_2} = \frac{0.2128}{0.1076} = 1.98$$
Conﬁdence intervals for the relative risk in the entire population of shoppers are based on this
sample relative risk. Software (for example, PROC FREQ with the MEASURES option in SAS)
gives a 95% conﬁdence interval as 1.30 to 3.01. Our summary: Women are about twice as likely
as men to use labels; the 95% conﬁdence interval is (1.30, 3.01).
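The text leaves the relative risk interval to software. One common approach, which we assume here, builds a Wald interval on the log scale using the standard error $\sqrt{1/X_1 - 1/n_1 + 1/X_2 - 1/n_2}$ for $\log RR$ and then exponentiates the endpoints. We have not confirmed that this is exactly what SAS computes, but for these counts it gives the same interval to two decimal places. The function name `relative_risk_ci` is hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def relative_risk_ci(x1, n1, x2, n2, conf=0.95):
    """Relative risk p1_hat / p2_hat with a log-scale (Wald) confidence interval."""
    rr = (x1 / n1) / (x2 / n2)
    se_log = sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)  # standard error of log(RR)
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    lo = exp(log(rr) - z_star * se_log)
    hi = exp(log(rr) + z_star * se_log)
    return rr, lo, hi


rr, lo, hi = relative_risk_ci(63, 296, 27, 251)
print(f"RR = {rr:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
# The interval is symmetric on the log scale, so it is not symmetric about RR itself
print(f"distance below RR: {rr - lo:.2f}, distance above RR: {hi - rr:.2f}")
```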

In Example 8.9 the conﬁdence interval is clearly not symmetric about the estimate:
that is, 1.98 is not the midpoint of 1.30 and 3.01. This is true in general for conﬁdence
intervals for relative risk.
Relative risk—that is, comparing proportions by a ratio rather than by a difference—
is particularly useful when the proportions are small. This is often the case in epidemi-
ology and medical statistics. Here is a typical epidemiological example.

EXAMPLE 8.10          Smoking and Colorectal Cancer
Colorectal cancer ranks fourth among the types of cancer that lead to death. Many studies have
examined the relationship between cigarette smoking and colorectal cancer, but the results have
been inconsistent. Twenty-six studies gave relative risk estimates for people who had ever smoked
relative to those who had never smoked. A recent study combined the results of these studies to
obtain a summary measure of relative risk.20 The smokers are Population 1 and the nonsmokers are
Population 2. The report of the study stated that the relative risk was 1.18 with a 95% conﬁdence
interval of 1.11 to 1.25. Since the conﬁdence interval does not include the value of 1, which would
correspond to equal risks in the two populations, we conclude that there is a higher risk of colorectal
cancer for cigarette smokers. The estimated increase in risk is 18% with a 95% conﬁdence interval
of 11% to 25%.

SECTION 8.2 Summary

• The estimate of the difference in two population proportions is

$$D = \hat p_1 - \hat p_2$$

where

$$\hat p_1 = \frac{X_1}{n_1} \quad \text{and} \quad \hat p_2 = \frac{X_2}{n_2}$$

The standard error of the difference is

$$SE_D = \sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}}$$

and the margin of error for confidence level C is

$$m = z^* SE_D$$

where $z^*$ is the value for the standard Normal density curve with area C between $-z^*$ and $z^*$.

• The z large-sample level C confidence interval for the difference in two proportions $p_1 - p_2$ is

$$(\hat p_1 - \hat p_2) \pm m$$

We recommend using this method when the number of successes and the number of failures in both samples are at least 10.
• The plus four confidence interval for comparing two proportions is obtained by adding one success and one failure to each sample and then using the z procedure. We recommend using this method when both sample sizes are at least 5 and the confidence level is 90%, 95%, or 99%.
• Significance tests of $H_0: p_1 = p_2$ use the z statistic

$$z = \frac{\hat p_1 - \hat p_2}{SE_{Dp}}$$

with P-values from the $N(0, 1)$ distribution. In this statistic,

$$SE_{Dp} = \sqrt{\hat p(1 - \hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where $\hat p$ is the pooled estimate of the common value of $p_1$ and $p_2$,

$$\hat p = \frac{X_1 + X_2}{n_1 + n_2}$$

We recommend using this test when the number of successes and the number of failures in each of the samples are at least 5.
• Relative risk is the ratio of two sample proportions:

$$RR = \frac{\hat p_1}{\hat p_2}$$

Confidence intervals for relative risk are an alternative to confidence intervals for the difference when we want to compare two proportions.

SECTION 8.2 Exercises

For Exercises 8.53 to 8.55, see pages 473–474; for 8.56 and 8.57, see page 476; for 8.58 to 8.60, see page 477; and for 8.61 and 8.62, see page 479.

8.63 …cently reported that they have 53,501 podcasts available for … net users described the results of two surveys about podcast down… 2006 and surveyed 2822 Internet users. They found that 198 of … view it later at least once. In a more recent survey, conducted in May 2008, there were 1553 Internet users. Of this total, 295 said …
(a) Refer to the table that appears at the beginning of this section (page 472). Fill in the numerical values of all quantities that are known.
(b) Find the estimate of the difference between the proportion of Internet users who had downloaded podcasts as of February to April 2006 and the proportion as of May 2008.
(c) Is the large-sample confidence interval for the difference in two proportions appropriate to use in this setting? Explain your answer.
(d) Find the 95% confidence interval for the difference.
(e) Convert your estimated difference and confidence interval to percents.
(f) One of the surveys was conducted between February and April, whereas the other was conducted in May. Do you think that this difference should have any effect on the interpretation of the results?

8.64 Significance test for podcast downloading. Refer to the previous exercise. Test the null hypothesis that the two proportions are equal. Report the test statistic with the P-value and summarize your conclusion.

8.65 … to the previous two exercises. The ratio of the proportion in the 2008 sample to the proportion in the 2006 sample is about 2.7.
(a) Can you conclude that 2.7 times as many people are downloading podcasts? Explain why or why not.
(b) Can you conclude from the data available that there has been an increase from 2006 to 2008 in the number of people who … data you would need or what additional assumptions you would have to make to be able to draw this conclusion.

8.66 Adult gamers versus teen gamers. A Pew Internet Project Data Memo presented data comparing adult gamers with teen gamers with respect to the devices on which they play. The data are from two surveys. The adult survey had 1063 gamers, and the teen survey had 1064 gamers. The memo reports that 54% of adult gamers played on game consoles (Xbox, PlayStation, Wii, etc.), and 89% of teen gamers played on game consoles.23
(a) Refer to the table that appears at the beginning of this section (page 472). Fill in the numerical values of all quantities that are known.
(b) Find the estimate of the difference between the proportion of teen gamers who played on game consoles and the proportion of adults who played on these devices.
(c) Is the large-sample confidence interval for the difference in two proportions appropriate to use in this setting? Explain your answer.
(d) Find the 95% confidence interval for the difference.
(e) Convert your estimated difference and confidence interval to percents.
(f) The adult survey was conducted between October and December 2008, whereas the teen survey was conducted between November 2007 and February 2008. Do you think that this difference should have any effect on the interpretation of the results?

8.67 Significance test for gaming on consoles. Refer to the previous exercise. Test the null hypothesis that the two proportions are equal. Report the test statistic with the P-value and summarize your conclusion.

8.68 Gamers on computers. The report described in Exercise 8.66 also presented data from the same surveys for gaming on computers (desktops or laptops). These devices were used by 73% of adult gamers and by 76% of teen gamers. Answer the questions given in Exercise 8.66 for gaming on computers.

8.69 Significance test for gaming on computers. Refer to the previous exercise. Test the null hypothesis that the two proportions are equal. Report the test statistic with the P-value and summarize your conclusion.

8.70 Can we compare gaming on consoles with gaming on computers? Refer to the previous four exercises. Do you think that you can use the large-sample confidence intervals for a difference in proportions to compare teens’ use of computers with teens’ use of consoles? Write a short paragraph giving the … the assumptions needed for this procedure.)

8.71 Draw a picture. Suppose that there are two binomial populations. For the first, the true proportion of successes is 0.4; for the second, it is 0.5. Consider taking independent samples from these populations, 50 from the first and 60 from the second.
(a) Find the mean and the standard deviation of the distribution of $\hat p_1 - \hat p_2$.
(b) This distribution is approximately Normal. Sketch this Normal distribution and mark the location of the mean.
(c) Find a value d for which the probability is 0.95 that the difference in sample proportions is within ±d. Mark these values …

8.72 What’s wrong? For each of the following, explain what is wrong and why.
(a) A z statistic is used to test the null hypothesis that $\hat p_1 = \hat p_2$.
(b) If two sample proportions are equal, then the sample counts are equal.
(c) A 95% confidence interval for the difference in two proportions includes errors due to nonresponse.

8.73 College student summer employment. Suppose (as is roughly true) that 85% of college men and 83% of college women were employed last summer. A sample survey interviews SRSs of 400 college men and 400 college women. The two samples …
(a) What is the approximate distribution of the proportion $\hat p_F$ of women who worked last summer? What is the approximate distribution of the proportion $\hat p_M$ of men who worked?
(b) The survey wants to compare men and women. What is the approximate distribution of the difference in the proportions who worked, $\hat p_M - \hat p_F$?

8.74 A corporate liability trial. A major court case on liability for contamination of groundwater took place in the town of Woburn, Massachusetts. A town well in Woburn was contaminated by industrial chemicals. During the period that residents drank water from this well, there were 16 birth defects among 414 births. In years when the contaminated well was shut off and water was supplied from other wells, there were 3 birth defects among 228 births. The plaintiffs suing the firms responsible for the contamination claimed that these data show that the rate of birth defects was higher when the contaminated well was in use.24 How statistically significant is the evidence? Be sure to state what assumptions your analysis requires and to what extent these assumptions seem reasonable in this case.

CASE 8.2 8.75 Natural versus artificial Christmas trees. In the Christmas tree survey introduced in Case 8.2 (page 466), re… whether the tree was natural or artificial. Respondents were also asked if they lived in an urban area or in a rural area. Of the 421 households displaying a Christmas tree, 160 lived in rural areas and 261 were urban residents. The tree growers want to know if there is a difference in preference for natural trees versus artificial trees between urban and rural households. Here are the data:

Population      n     X (natural)
1 (rural)      160    64
2 (urban)      261    89

(a) Give the null and alternative hypotheses that are appropriate for this problem assuming that we have no prior information suggesting that one population would have a higher preference than the other.
(b) Test the null hypothesis. Give the test statistic and the P-value, and summarize the results.
(c) Give a 90% confidence interval for the difference in proportions.

8.76 Summer employment of college students. A university financial aid office polled an SRS of undergraduate students to study their summer employment. Not all students were employed the previous summer. Here are the results for men and women:

                 Men    Women
Employed         712     623
Not employed      68      92
Total            780     715

(a) Is there evidence that the proportion of male students employed during the summer differs from the proportion of female students who were employed? State $H_0$ and $H_a$, compute the test statistic, and give its P-value.
(b) Give a 95% confidence interval for the difference between the proportions of male and female students who were employed during the summer. Does the difference seem practically important to you?

8.77 Gender bias in textbooks. To what extent do textbooks on syntax (analysis of sentence structure) display gender bias? A study of this question sampled sentences from 10 texts.25 One part of the study examined the use of the words “girl,” “boy,” “man,” and “woman.” Call the first two words juvenile and the last two adult. Is the proportion of female references that are juvenile (“girl” rather than “woman”) equal to the proportion of male references that are juvenile (“boy” rather than “man”)? Here are data from one of the texts:

Gender      n     X (juvenile)
Female      60    48
Male       132    52

(a) Find the sample proportions of juvenile references for females and for males.
(b) Give a 95% confidence interval for the difference and briefly summarize what the data show.

8.78 Is the gender bias statistically significant? The previous exercise addresses a question about gender bias with a confidence interval. Set up the problem as a significance test. Carry out the test and summarize the results.

8.79 Effect of the sample size. Return to the study of undergraduate student summer employment described in Exercise 8.76. Similar results from a smaller number of students may not have the same statistical significance. Specifically, suppose that 71 of 78 men surveyed were employed and 62 of 71 women surveyed were employed. The sample proportions are essentially the same as in the earlier exercise.
(a) Compute the z statistic for these data and report the P-value. What do you conclude?
(b) Compare the results of this significance test with your results in Exercise 8.76. What do you observe about the effect of the sample size on the results of these significance tests?

8.80 Relative risk for gamers. Refer to the Pew data about gaming on game consoles (Xbox, PlayStation, Wii, etc.) by adults and teens in Exercises 8.66 and 8.67 (page 483). Now, compare the adults with the teens using the relative risk approach.
(a) Find the proportion of adult gamers who use game consoles. Do the same for the teen gamers.
(b) Find the relative risk using the teen proportion in the numerator.
(c) Repeat the computation of the relative risk using percents in place of proportions. Compare this calculation with the one that you performed in part (b) and explain what you have learned.
(d) Do you expect the 95% confidence interval for the relative risk to include the value 1? Explain why or why not.
(e) Find the 95% confidence interval if you have access to software that can do this calculation.

STATISTICS IN SUMMARY
Inference about population proportions is based on sample proportions. We rely on the
fact that a sample proportion has a distribution that is close to Normal unless the sample is
small. All the z procedures in this chapter work well when the samples are large enough.
You must check this before using them. Here is a review list of the most important skills
you should have acquired from your study of this chapter.
A. Recognition
1. Recognize from the design of a study whether one-sample or two-sample
procedures are needed.
2. Recognize what parameter or parameters an inference problem concerns. In
particular, distinguish between settings that require inference about a proportion
and comparing two proportions.
3. Calculate from sample counts the sample proportion or proportions.

B. Inference for a Single Proportion
1. Use the z procedure to give a confidence interval for a population proportion p.
2. Use the z statistic to carry out a test of signiﬁcance for the hypothesis
H0: p = p0 about a population proportion p against either a one-sided or a
two-sided alternative.
3. Check that you can safely use these z procedures in a particular setting.

C. Comparing Two Proportions
1. Use the two-sample z procedure to give a conﬁdence interval for the difference
p1 − p2 between proportions in two populations based on independent samples
from the populations.
2. Use a z statistic to test the hypothesis H0: p1 = p2 that proportions in two
distinct populations are equal.
3. Check that you can safely use these z procedures in a particular setting.

Statistical inference always draws conclusions about one or more parameters of a
population. When you think about doing inference, ask ﬁrst what the population is and
what parameter you are interested in. The t procedures of Chapter 7 allow us to give
conﬁdence intervals and carry out tests about population means. We use the z procedures
of this chapter for inference about population proportions.

CHAPTER 8          Review Exercises

8.81 Changes in credit card usage by undergraduates. In Exercise 8.30 (page 470) we looked at data from a survey of 1430 undergraduate students and their credit card use. These students were surveyed in 2004. In the sample, 43% said that they had four or more credit cards. A similar study performed in 2000 by the same organization reported that 32% of the sample said that they had four or more credit cards.26 Assume that the sample sizes for the two studies are the same. Find a 95% confidence interval for the change from 2000 to 2004 in the percent of undergraduates who report having four or more credit cards.

8.82 Do the significance test for the change. Refer to the previous exercise. Perform the significance test for comparing the two proportions. Report your test statistic, the P-value, and summarize your conclusion.

8.83 We did not know the sample size. Refer to the previous two exercises. We did not report the sample size for the 2000 study, but it is reasonable to assume that it is fairly close to the sample size for the 2004 study.
(a) Suppose that the sample size for the 2000 study was only 1000. Redo the confidence interval and significance testing calculations for this scenario.
(b) Suppose that the sample size for the 2000 study was 2000. Redo the confidence interval and significance testing calculations for this scenario.
(c) Compare your results for parts (a) and (b) of this exercise with the results that you found in the previous two exercises. Write a short paragraph about the effects of assuming a value for the sample size on your conclusions.

8.84 Student employment during the school year. A study of 1430 undergraduate students reported that 994 work 10 or more hours a week during the school year. Give a 95% confidence interval for the proportion of all undergraduate students who work 10 or more hours a week during the school year.

8.85 Examine the effect of the sample size. Refer to the previous exercise. Assume a variety of different scenarios where the sample size changes but the proportion in the sample who work 10 or more hours a week during the school year remains the same. Write a short report summarizing your results and conclusions. Be sure to include numerical and graphical summaries of what you have found.

8.86 Video game genres. U.S. computer and video game software sales were \$9.5 billion in 2007.27 A survey of 1102 teens collected data about their video game use. The table below lists the most popular game genres.28

Genre        Examples                                             Percent who play
Racing       NASCAR, Mario Kart, Burnout                          74
Puzzle       Bejeweled, Tetris, Solitaire                         72
Sports       Madden, FIFA, Tony Hawk                              68
Action       Grand Theft Auto, Devil May Cry, Ratchet and Clank   67
Adventure    Legend of Zelda, Tomb Raider                         66
Rhythm       Guitar Hero, Dance Dance Revolution, Lumines         61

Give a 95% confidence interval for the proportion who play games in each of these six genres.

8.87 Too many errors. Refer to the previous exercise. The chance that each of the six intervals that you calculated includes the true proportion for that genre is approximately 95%. In other words, the chance that you make an error and your interval misses the true value is approximately 5%.
(a) Explain why the chance that at least one of your intervals does not contain the true value of the parameter is greater than 5%.
(b) One way to deal with this problem is to adjust the confidence level for each interval so that the overall probability of at least one miss is 5%. One simple way to do this is to use a Bonferroni procedure. Here is the basic idea: You have an error budget of 5% and you choose to spend it equally on six intervals. Each interval has a budget of 0.05/6 = 0.0083. So each confidence interval should have a 0.83% chance of missing the true value. In other words, the confidence level for each interval should be 1 − 0.0083 = 0.9917. Use Table A to find the value of z for a large-sample confidence interval for a single proportion corresponding to 99.17% confidence.
(c) Calculate the six confidence intervals using the Bonferroni procedure.

… of households that were wireless only in December 2007 and the households that were wireless only in December 2003.
(e) Give the margin of error for 95% confidence for the difference in proportions.

8.90 Analyze the change in terms of relative risk. Refer to the previous two exercises.
(a) Summarize the change data in terms of relative risk. The term “relative risk” is a poor description of the ratio that you are using for this exercise. Give a better term for this ratio.
(b) Analyze the data in terms of relative risk and write a summary of your results.
(c) Compare your results in part (b) with your findings in terms of a difference in proportions from the previous exercise.
(d) Which approach do you prefer? Give reasons for your answer.

8.91 Gambling and student-athletes. Gambling behaviors of Division I intercollegiate male student-athletes were analyzed in Exercise 8.22 (page 469). Similar data for women were given in Exercise 8.23. Compare the males and females with a significance test and give an estimate of the difference in proportions of student-athletes who participate in any gambling activity with a 95% margin of error. We noted in Exercise 8.22 that we do not have any information available to assess nonresponse. Consider the possibility that the response rates differ by gender and by whether or not the person participates in any gambling activity. Write a short summary of how these differences might affect inference on these issues.

8.92 Effects of reducing air pollution. A study that evaluated the effects of a reduction in exposure to traffic-related air pollutants compared respiratory symptoms of 283 residents of an area with congested streets with 165 residents in a similar area where the congestion was removed because a bypass was constructed. The symptoms of the residents of both areas were evaluated at baseline and again a year after the bypass was completed.30
For the residents of the congested streets, 17 reported that their
8.88 Wireless only. Are customers giving up their landlines and       symptoms of wheezing improved between baseline and one year
relying on wireless for all their phone needs? Surveys have col-      later, while 35 of the residents of the bypass streets reported
lected data to answer this question.29 In December 2003, 4.2% of      improvement.
households were wireless only. Assume that this survey is based       (a) Find the two sample proportions.
on sampling 15,000 households.                                        (b) Report the difference in the proportions and the standard
(a) Convert the percent to a proportion. Then use the proportion      error of the difference.
and the sample size to ﬁnd the count of households who were           (c) What are the appropriate null and alternative hypotheses
wireless only.                                                        for examining the question of interest? Be sure to explain your
(b) Find a 95% conﬁdence interval for the proportion of house-        choice of the alternative hypothesis.
holds that were wireless only in December 2003.                       (d) Find the test statistic. Construct a sketch of the distribution of
8.89 Change in wireless only. Refer to the previous exercise.         the test statistic under the assumption that the null hypothesis is
The percent increased to 16.4% in December 2007. Assume the           true. Find the P-value and use your sketch to explain its meaning.
same sample size for this sample.                                     (e) Is no evidence of an effect the same as evidence that there is
(a) Find the proportion and the count for this sample.                no effect? Use a 95% conﬁdence interval to answer this question.
(b) Compute the 95% conﬁdence interval for the proportion.            Summarize your ideas in a way that could be understood by
(c) Convert the estimate and conﬁdence interval in terms of pro-      someone who has very little experience with statistics.
portions to an estimate and conﬁdence interval in terms of            (f) The study was done in the United Kingdom. To what ex-
percents.                                                             tent do you think that the results can be generalized to other
(d) Find the estimate of the difference in the proportions            circumstances?

8.93 Downloading music from the Internet. The following quote is from a survey
of Internet users.31 The sample size for the survey was 1371. Since 18% of those
surveyed said they download music, the sample size for this subsample is 247.

    [...] networks. . . . 24% of them say they swap files using music-related
    Web sites like those run by music magazines or musician homepages. And while
    online music services like iTunes are far from trumping the popularity of
    file-sharing [...] using these paid services. Overall, 7% of Internet users
    say they have bought music at these new services at one time or another,
    including 3% who currently use paid services.

(a) For each percent quoted, give the 95% margin of error. You should express
these in percents, as given in the quote.
(b) Rewrite the paragraph in a shorter form but include the margins of error.
(c) Pick either side (A) or side (B) and give arguments in favor of the view
that you select. (A) The margins of error should be included because they are
necessary for the reader to properly interpret the results. (B) The margins of
error interfere with the flow of the important ideas. It would be better to
just report one margin of error and say that all of the others are no greater
than this number. If you choose view (B), be sure to give the value of the
margin of error that you report.

8.94 Other effects of reducing air pollution. In Exercise 8.92 the effects of a
reduction in air pollution on wheezing were examined by comparing the one-year
change in symptoms in a group of residents who lived on congested streets with
a group who lived in an area that had been congested but from which the
congestion was removed when a bypass was built. The effect of the reduction in
air pollution was assessed by comparing the proportions of residents in the two
groups who reported that their wheezing symptoms improved. Here are some
additional data from the same study:

                                Bypass             Congested
Symptom                       n    Improved       n    Improved
Number of wheezing attacks   282      45         163      21
Wheezing disturbs sleep      282      45         164      12
Wheezing limits speech       282      12         164       4
Wheezing affects activities  281      26         165      13
Winter cough                 261      15         156      14
Winter phlegm                253      12         144      10
Consulted doctor             247      29         140      18

The table gives the number of subjects in each group and the number reporting
improvement. So, for example, the proportion who reported improvement in the
number of wheezing attacks was 21/163 in the congested group.
(a) The reported sample sizes vary from symptom to symptom. Give possible
reasons for this and discuss the possible impact on the results.
(b) Calculate the difference in the proportions for each symptom. Make a table
of symptoms ordered from highest to lowest based on these differences. Include
the estimates of the differences and the 95% confidence intervals in the table.
Summarize [...]
(c) Can you justify a one-sided alternative in this situation? Give [...]
(d) Perform a significance test to compare the two groups for each of the
symptoms. Summarize the results.
(e) Reanalyze the data using only the data from the bypass group. Give
confidence intervals for the proportions that reported improved symptoms.
Compare the conclusions that someone might make from these results with those
you presented in part (b). Use your analyses of the data in this exercise to
discuss the importance of a control group in studies such as this.

8.95 The parrot effect: how to increase your tips. An experiment examined the
relationship between tips and server behavior in a restaurant.32 In one
condition, the server repeated the customer’s order word for word, while in the
other condition, the orders were not repeated. Tips were received in 47 of the
60 trials under the repeat condition and in 31 of the 60 trials under the
no-repeat condition.
(a) Find the sample proportions and compute a 95% confidence interval for the
difference in population proportions.
(b) Use a significance test to compare the two conditions. Summarize the
results.
(c) The study was performed in a restaurant in the Netherlands. Two waitresses
performed the tasks. How do these facts relate to the type of conclusions that
can be drawn from this study? Do you think that the parrot effect would apply
in other countries?
(d) Design a study to test the parrot effect in a setting that is familiar to
you. Be sure to include complete details about how the study will be conducted
and how you will analyze the results.

8.96 Brand loyalty and the Chicago Cubs. According to literature on brand
loyalty, consumers who are loyal to a brand are likely to consistently select
the same product. This type of consistency may come from a positive childhood
association. To examine brand loyalty among fans of the Chicago Cubs, 371 Cubs
fans among patrons of a restaurant located in Wrigleyville were surveyed before
a game at Wrigley Field, the Cubs’ home field.33 The respondents were classified
as “die-hard fans” or “less loyal fans.” Of the 134 die-hard fans, 90.3%
reported that they had watched or listened to Cubs games when they were
children. Among the 237 less loyal fans, 67.9% said that they had watched or
listened as children.

(a) Find the numbers of die-hard Cubs fans who watched or listened to games
when they were children. Do the same for the less loyal fans.
(b) Use a significance test to compare the die-hard fans with the less loyal
fans with respect to their childhood experiences of the team.
(c) Express the results with a 95% confidence interval for the difference in
proportions.

8.97 Brand loyalty in action. The study mentioned in the previous exercise
found that two-thirds of the die-hard fans attended Cubs games at least once a
month, but only 20% of the less loyal fans attended this often. Analyze these
data using a significance test and a confidence interval. Write a short summary
of your findings.

8.98 Frequent lottery players. A study of state lotteries included a random
digit dialing (RDD) survey conducted by the National Opinion Research Center
(NORC). The survey asked 2406 adults about their lottery spending.34 A total of
248 individuals were classified as “heavy” players. Of these, 152 were male.
The study notes that 48.5% of U.S. adults are male. Use a significance test to
compare the proportion of males among heavy lottery players with the proportion
of males in the U.S. [...] this analysis, assume that the 248 heavy lottery
players are a random sample of all heavy lottery players and that the margin of
error for the 48.5% estimate of the percent of males in the U.S. adult
population is so small that it can be neglected.

8.99 Use a confidence interval. Use a confidence interval to give an
alternative analysis for the previous exercise.

8.100 Time to repair golf clubs. The Ping Company makes custom-built golf clubs
and competes in the $4 billion golf equipment industry. To improve its business
processes, Ping decided to seek ISO 9001 certification.35 As part of this
process, a study of the time it took to repair golf clubs that were sent to the
company by mail determined that 16% of orders were sent back to the customers
in 5 days or less. Ping examined the processing of repair orders and made
changes. Following the changes, 90% of orders were completed within 5 days.
Assume that each of the estimated percents is based on a random sample of 200
orders.
(a) How many orders were completed in 5 days or less before the changes? Give
a 95% confidence interval for the proportion of orders completed in this time.
(b) Do the same for orders after the changes.
(c) Give a 95% confidence interval for the improvement. Express this both for a
difference in proportions and for a difference in percents.

8.101 Does the new process give a better product? Eleven percent of the
products produced by an industrial process over the past several months fail to
conform to the specifications. The company modifies the process in an attempt
to reduce the rate of nonconformities. In a trial run, the modified process
produces 16 nonconforming items out of a total of 300 produced. Do these
results demonstrate that the modification is effective? Support your conclusion
with a clear statement of your assumptions and the results of your statistical
calculations.

8.102 How much is the improvement? In the setting of the previous exercise,
give a 95% confidence interval for the proportion of nonconforming items for
the modified process. Then, taking p0 = 0.11 to be the old proportion and p the
proportion for the modified process, give a 95% confidence interval for p − p0.

8.103 Choosing sample sizes. For a single proportion the margin of error of a
confidence interval is largest for any given sample size n and confidence level
C when p̂ = 0.5. This led us to use p* = 0.5 for planning purposes. A similar
result is true for the two-sample problem. The margin of error of the
confidence interval for the difference between two proportions is largest when
p̂1 = p̂2 = 0.5. Use these conservative values in the following calculations,
and assume that the sample sizes n1 and n2 have the common value n. Calculate
the margins of error of the 95% confidence intervals for the difference in two
proportions for the following choices of n: 10, 25, 50, 100, 150, 200, 400, and
500. Present the results in a table and with a graph. Summarize your
conclusions.

8.104 Choosing sample sizes, continued. As the previous exercise noted, using
the guessed value 0.5 for both p̂1 and p̂2 gives a conservative margin of error
in confidence intervals for the difference between two population proportions.
You are planning a survey and will calculate a 95% confidence interval for the
difference in two proportions when the data are collected. You would like the
margin of error of the interval to be less than or equal to 0.04. You will use
the same sample size n for both populations.
(a) How large a value of n is needed?
(b) Give a general formula for n in terms of the desired margin of error m and
the critical value z*.

8.105 Unequal sample sizes. You are planning a survey in which a 95% confidence
interval for the difference between two proportions will present the results.
You will use the conservative guessed value 0.5 for p̂1 and p̂2 in your
planning. You would like the margin of error of the confidence interval to be
less than or equal to 0.15. It is very difficult to sample from the first
population, so it will be impossible for you to obtain more than 25
observations from this population. Taking n1 = 25, can you find a value of n2
that will guarantee the desired margin of error? If so, report the value; if
not, explain why not.

8.106 Students change their majors. In a random sample of 950 students from a
large public university, it was found that 444 of the students changed majors
during their college years.
(a) Give a 99% confidence interval for the proportion of students at this
university who change majors.
(b) Express your results from (a) in terms of the percent of students who
change majors.
(c) University officials are more interested in the number of students who
change majors than in the proportion. The university has 30,000 undergraduate
students. Convert your confidence interval in (a) to a confidence interval for
the number of students who change majors during their college years.

8.107 Statistics and the law. Casteneda v. Partida is an important court case
in which statistical methods were used as part of a legal argument. When
reviewing this case, the Supreme Court used the phrase “two or three standard
deviations” as a criterion for statistical significance. This Supreme Court
review has served as the basis for many subsequent applications of statistical
methods in legal settings. (The two or three standard deviations referred to by
the Court are values of the z statistic and correspond to P-values of
approximately 0.05 and 0.0026.) In Casteneda the plaintiffs alleged that the
method for selecting juries in a county in Texas was biased against Mexican
Americans.36 For the period of time at issue, there were 181,535 persons
eligible for jury duty, of whom 143,611 were Mexican Americans. Of the 870
people selected for jury duty, 339 were Mexican Americans.
(a) What proportion of eligible jurors were Mexican Americans?
(b) Let p be the probability that a randomly selected juror is a Mexican
American. The null hypothesis to be tested is H0: p = p0, where p0 is the
proportion you found in (a). Find the value of p̂ for this problem, compute the
z statistic, and find the P-value. What do you conclude? (A finding of
statistical significance in this circumstance does not constitute proof of
discrimination. It can be used, however, to establish a prima facie case. The
burden of proof then shifts to the defense.)
(c) We can reformulate this exercise as a two-sample problem. Here we wish to
compare the proportion of Mexican Americans among those selected as jurors with
the proportion of Mexican Americans among those not selected as jurors. Let p1
be the probability that a randomly selected juror is a Mexican American, and
let p2 be the probability that a randomly selected nonjuror is a Mexican
American. Find the z statistic and its P-value. How [...]

CHAPTER 8             Case Study Exercises
CASE STUDY EXERCISE 1: Gender bias in textbooks. Exercise 8.77 (page 484)
reports a study of gender bias in 10 syntax textbooks. Here are the counts of
“girl,” “woman,” “boy,” and “man” for all the texts. The data in Exercise 8.77
are for text number 6.

                               Text number
           1     2     3     4     5     6     7     8     9    10
Girl       2     5    25    11     2    48    38     5    48    13
Woman      3     2    31    65     1    12     2    13    24     5
Boy        7    18    14    19    12    52    70     6   128    32
Man       27    45    51   138    31    80     2    27    48    95

Analyze the data and write a report summarizing your conclusions. The
researchers who conducted the study note that the authors of texts 8, 9, and 10
are women, while the other seven were written by men. Do you see any pattern
that suggests that the gender of the author is associated with the results?

CASE STUDY EXERCISE 2: Sample size, P-value, and the margin of error. In this
Case Study we examine the effects of the sample size on the significance test
and the confidence interval for comparing two proportions. For each
calculation, suppose that p̂1 = 0.75 and p̂2 = 0.5, and take n to be the common
value of n1 and n2. Use the z statistic to test H0: p1 = p2 versus the
alternative Ha: p1 ≠ p2. Compute the statistic and the associated P-value for
the following values of n: 12, 20, 40, 80, 100, 200, and 500. Summarize the
results in a table and make a plot. Explain what you observe about the effect
of the sample size on statistical significance when the sample proportions p̂1
and p̂2 are unchanged.
    Now we will do similar calculations for the confidence interval. Here, we
suppose that p̂1 = 0.75 and p̂2 = 0.5. Compute the margin of error for the 95%
confidence interval for the difference in the two proportions for n = 12, 20,
40, 80, 100, 200, and 500. Summarize and explain your results.

CHAPTER 8              Appendix

Using Minitab and Excel for Inference for Proportions

Confidence Interval for a Single Proportion

Minitab:

Stat ➤ Basic Statistics ➤ 1 Proportion

If the data are in the worksheet, then choose the “Samples in columns” option
and click the column containing the data into the box below. With this option,
you can construct confidence intervals for more than one data set
simultaneously. Data entries can be any two distinct values (numeric or text),
where one value represents “success” and the other represents “failure.”
Alternatively, if you know the number of successes, you can choose the
“Summarized data” option, then input the number of successes in the “Number of
events” box and the sample size in the “Number of trials” box.
It is important that you then click the Options button and select the “Use
test and interval based on normal distribution” option. You can also input
your desired confidence level in the “Confidence level” box if it is other than
95%. Click OK to close the pop-up box and then click OK to find the sample
proportion and confidence interval reported in the Session window.

Excel:
Confidence intervals for the proportion are not available in standard Excel,
but they are available in the WHFStat Add-In for Excel.

Test for a Single Proportion

Minitab:

Stat ➤ Basic Statistics ➤ 1 Proportion

This is the same routine described in this Appendix for obtaining the
confidence interval for the proportion. If you wish to conduct a hypothesis
test, select the “Perform hypothesis test” option and input the hypothesized
proportion (p0) in the box below. Now click the Options button and select the
“Use test and interval based on normal distribution” option. With this pop-up
box, you can also select your alternative hypothesis from the Alternative menu
box. Click OK to close the pop-up box and then click OK to find the test
statistic and corresponding P-value reported in the Session window.

Excel:
Testing for the proportion is not available in standard Excel, but it is
available in the WHFStat Add-In for Excel.

Confidence Interval for Comparing Two Proportions

Minitab:

Stat ➤ Basic Statistics ➤ 2 Proportions

If the data for the two samples are in two separate columns, choose the
“Samples in different columns” option and then click the columns containing the
data into the two boxes below. As with the single proportion, data entries can
be any two distinct values (numeric or text), where one value represents
“success” and the other represents “failure.” Minitab also allows you to store
the data for both samples in a single column, in any order. In that case, a
second worksheet column is required that contains two distinct labels (numeric
or text) indicating, for each row, whether the observation comes from the first
or the second sample. If the data are stored in this manner, choose the
“Samples in one column” option, click the data column into the Samples box, and
click the column of labels into the Subscripts box. As a third option, if you
know the number of successes in each of the samples, you can choose the
“Summarized data” option and, for each sample, input the number of successes in
the “Number of events” box and the sample size in the “Number of trials” box.
If you wish to change the level of confidence from the default value of 95%,
click the Options button and input your desired confidence level in the
“Confidence level” box. Click OK to close the pop-up box and then click OK to
find the sample proportions and the confidence interval for the difference
between population proportions reported in the Session window.

Excel:
Confidence intervals for comparing two proportions are not available in
standard Excel, but they are available in the WHFStat Add-In for Excel.

Test for Comparing Two Proportions

Minitab:

Stat ➤ Basic Statistics ➤ 2 Proportions

This is the same routine described in this Appendix for obtaining the
confidence interval for the difference between two proportions. To have the
test based on a pooled standard error as described in the chapter, click the
Options button and select the “Use pooled estimate of p for test” option. With
this pop-up box, you can also select your alternative hypothesis from the
Alternative menu box. Click OK to close the pop-up box and then click OK to
find the test statistic and corresponding P-value reported in the Session
window.

Excel:
Testing for two proportions is not available in standard Excel, but it is
available in the WHFStat Add-In for Excel.

```
To top