# Practice Problems

```
Econ 2500 – Introductory Statistics
York University
Department of Economics
Professor Xianghong Li

Practice Problems

Chapter 1

1. The states differ greatly in the kinds of severe weather that afflict them. Table
ta01_005 shows the average property damage caused by tornadoes per year over
the period from 1950 to 1999 in each of the 50 states and Puerto Rico.
a. What are the top five states for tornado damage? The bottom five?
b. Make a histogram of the data by hand, with classes
"0 ≤ damage < 10," "10 ≤ damage < 20," and so on. Describe the shape,
center, and spread of the distribution. Which states may be outliers?

2. ex01-035 presented data on the nightly study time claimed by first-year college
men and women. The most common methods for formal comparison of two
groups use x̄ and s to summarize the data. We wonder if this is appropriate here.
a. What kinds of distributions are best summarized by x̄ and s?
b. Use R to draw separate histograms for men and women.
c. Each set of study times appears to contain a high outlier. Are these points
flagged as suspicious by the 1.5 × IQR rule? How much does removing
the outlier change x̄ and s for each group? The presence of outliers makes
us reluctant to use the mean and standard deviation for these data unless
we remove the outliers on the grounds that these students were
exaggerating.

3. Create a set of 5 positive numbers (repeats allowed) that have median 10 and
mean 7. What thought process did you use to create your numbers?

4. Use the definition of the mean x̄ to show that the sum of the deviations xi − x̄ of
the observations from their mean is always zero. This is one reason why the
variance and standard deviation use squared deviations.

5. If you ask a computer to generate “random numbers” between 0 and 1, you will
get observations from a uniform distribution. The following figure graphs the
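density curve for a uniform distribution. Use areas under this density curve to
answer the following questions.

[Figure: density curve of a uniform distribution, flat between 0 and 1]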

a. Why is the total area under this curve equal to 1?
b. What proportion of the observations lie above 0.75?
c. What proportion of the observations lie between 0.25 and 0.75?

6. What are the mean and median of the uniform distribution in the graph of the
previous question? What are the quartiles?

7. Find the value z of a standard normal variable Z that satisfies each of the
following conditions. (If you use Table A, report the value of z that comes closest
to satisfying the condition.) In each case, sketch a standard normal curve with
your value of z marked on the axis.

a. 20% of the observations fall below z.
b. 30% of the observations fall above z.

8. The Wechsler Adult Intelligence Scale (WAIS) is the most common “IQ test.”
The scale of scores is set separately for each age group and is approximately
normal with mean 100 and standard deviation 15. People with WAIS scores
below 70 are considered mentally retarded when, for example, applying for Social
Security disability benefits. What percent of adults are retarded by this criterion?

9. The quartiles of any distribution are the values with cumulative proportions 0.25
and 0.75.

a. What are the quartiles of the standard normal distribution?
b. Using your numerical values from (a), write an equation that gives the
quartiles of N(µ, σ) distributions in terms of µ and σ.
c. The length of human pregnancies from conception to birth varies
according to a distribution that is approximately normal with mean 266
days and standard deviation 16 days. Apply your results from (b): what are
the quartiles of the distribution of lengths of human pregnancies?

10. Use R to generate 100 observations from the standard normal distribution. Make a
histogram of these observations. How does the shape of the histogram compare
with a normal density curve? Make a normal quantile plot of the data. Does the
plot suggest any important deviations from normality? (Repeating this exercise is
a good way to become familiar with how histograms and normal quantile plots
look when data actually are close to normal.)

11. Use R to plot a standard normal density curve and add an N(0, 2) normal density
curve on top of it (as I showed you in class). In another graph, plot a standard
normal density curve and add an N(−2, 1) normal density curve on top of it.

Note: for the two density curves to display properly, in each graph you have to let
x range over more than 3 standard deviations around the means of both normal
distributions.

Chapter 2

1. Mutual-fund reports often give correlations to describe how the prices of different
investments are related. You look at the correlations between three Fidelity funds
and the Standard & Poor’s 500-Stock Index, which describes stocks of large U.S.
companies. The three funds are Dividend Growth (stocks of large U.S.
companies), Small Cap Stock (stocks of small U.S. companies), and Emerging
Markets (stocks in developing countries). For 2003, the three correlations are r =
0.35, r = 0.81, and r = 0.98.
b. The correlations of the three funds with the index are all positive. Does
this tell you that stocks went up in 2003? Explain your answer.

2. A student wonders if tall women tend to date taller men than do short women. She
measures herself, her dormitory roommate, and the women in the adjoining
rooms; then she measures the next man each woman dates. Here are the data
(heights in inches):

Women (x)     Men (y)
66            72
64            68
66            70
65            68
70            71
65            65

a. Make a scatterplot of these data. Based on the scatterplot, do you expect
the correlation to be positive or negative? Near ±1 or not?
b. Find the correlation, r, between the heights of the men and women.
c. How would r change if all the men were 6 inches shorter than the heights
given in the table? Does the correlation tell us whether women tend to date
men taller than themselves?
d. If heights were measured in centimeters rather than inches, how would the
correlation change? (There are 2.54 centimeters in an inch.)
e. If every woman dated a man exactly 3 inches taller than herself, what
would be the correlation between male and female heights?

3. Make a scatterplot (by hand) of the following data:
x        1     2      3      4     10     10
y        1     3      3      5      1     11

Write down the formula, and calculate the correlation without using the built-in
formula in your calculator. What feature of the data is responsible for reducing the
correlation to this value despite a strong straight-line association between x and y
in most of the observations?

4. Each of the following statements contains a blunder. Explain in each case what is
wrong.
a. “There is a high correlation between the gender of American workers and
their income.”
b. “We found high correlation (r = 1.09) between students’ ratings of faculty
teaching and ratings made by other faculty members.”
c. “The correlation between planting rate and yield of corn was found to be r
= 0.23 bushel.”

5. Many colleges offer online versions of some courses that are also taught in the
classroom. It often happens that the students who enroll in the online versions
do better than the classroom students on the course exams. This does not show
that online instruction is more effective than classroom teaching, because the kind
of people who sign up for online courses are often quite different from the
classroom students. Suggest some student characteristics that you think could be
confounded with online versus classroom. Use a diagram to illustrate your ideas.

6. Data show that men who are married, and also divorced or widowed men, earn
quite a bit more than men who have never been married. This does not mean that
a man can raise his income by getting married. Suggest several lurking variables
that you think are confounded with marital status and that help explain the
association between marital status and income. Use a diagram to illustrate your
ideas.

7. A study shows that there is a positive correlation between the size of a hospital
(measured by its number of beds x) and the median number of days, y, that
patients remain in the hospital. Does this mean that you can shorten hospital stay
by choosing a small hospital? Use a diagram to explain the association.

Chapter 3

1. A typical hour of prime-time television shows three to five violent acts. Linking
family interviews and police records shows a clear association between time spent
watching TV as a child and later aggressive behavior.
a. Explain why this is an observational study rather than an experiment.
What are the explanatory and response variables?
b. Suggest several lurking variables describing a child’s home life that may
be confounded with how much TV he or she watches. Explain why
confounding makes it difficult to conclude that more TV causes more
aggressive behavior.

2. Several large observational studies suggested that women who take hormones
such as estrogen after menopause have lower risk of a heart attack than women
who do not take hormones. Hormone replacement became popular. But in 2002,
several careful experiments showed that hormone replacement does not reduce
heart attacks. The National Institutes of Health, after reviewing the evidence,
concluded that the observational studies were wrong. Taking hormones after
menopause quickly fell out of favor.
a. Explain the difference between an observational study and an experiment
to compare women who do and don’t take hormones after menopause.
b. Suggest some characteristics of women who choose to take hormones that
might affect the rate of heart attacks. In an observational study, these
characteristics are confounded with taking hormones.

3. Stores advertise price reductions to attract customers. What type of price cut is
most attractive? Market researchers prepared ads for athletic shoes announcing
different levels of discounts (20%, 40%, 60%, or 80%). The student subjects who
sale (25%, 50%, 75%, or 100%). Each subject then rated the attractiveness of the
sale on a scale of 1 to 7.
a. There are two factors. Make a sketch that displays the treatments formed
by all combinations of levels of the factors.
b. Outline a completely randomized design using 80 student subjects. Use R
to conduct random trials to choose the subjects for the first treatment.

4. “Bee pollen is effective for combating fatigue, depression, cancer, and colon
disorders.” So says a Web site that offers the pollen for sale. We wonder if bee
pollen really does prevent colon disorders. Here are two ways to study this
question. Explain why the second design will produce more trustworthy data.
a. Find 200 women who take bee pollen regularly. Match each with a woman
of the same age, race, and occupation who does not take bee pollen.
Follow both groups for 5 years.
b. Find 400 women who do not have colon disorders. Assign 200 to take bee
pollen capsules and the other 200 to take placebo capsules that are
identical in appearance. Follow both groups for 5 years.

5. Calcium is important to the development of young girls. To study how the bodies
of young girls process calcium, investigators used the setting of a summer camp.
Calcium was given in Hawaiian Punch at either a high or a low level. The camp
diet was otherwise the same for all girls. Suppose that there are 60 campers.
a. Outline a completely randomized design for this experiment.
calcium (with a “washout period” between). What is the advantage of the
matched pairs design over the completely randomized design?

6. At a party there are 30 students over age 21 and 20 under age 21. You choose at
random 3 of those over 21 and separately choose at random 2 of those under 21 to
interview about attitudes toward alcohol. You have given every student at the
party the same chance to be interviewed: what is that chance? Why is your sample
not an SRS?

7. The following figure shows histograms of four sampling distributions of statistics
intended to estimate the same parameter. Label each distribution relative to the
others as high or low bias and as high or low variability.

Chapter 4

1. About 30% of adult Internet users are between 18 and 29 years of age. Suppose
the probability that a randomly chosen Internet user is in this age group is exactly
0.3. Use R to make a study of short-term variability and long-term regularity as
follows.
a. Set the probability of heads to 0.3. Each head stands for an Internet user
who is between 18 and 29 and each tail is a user who is not. Set the
number of tosses to 20. What is the proportion of heads? Do these 25
times, keep a record of the 25 proportions of heads, and make a stemplot
of these numbers. Lesson: In the short run (20 repetitions) proportions are
quite variable and are often not close to the probability.
b. With the probability of heads still set to 0.3, make 200 tosses. What was
the proportion of heads? Do this 25 times and make a stemplot of the 25
proportions of heads. Lesson: More repetitions make proportions less
variable and generally closer to the probability.

2. Use the same setting as the previous question.
a. Simulate 100 draws of 20 Internet users from the population. (That is, ask
the software to generate 100 binomial observations, each with n = 20 trials
and probability p = 0.3 of a “yes”.) Record the count in the 18 to 29 age
group on each draw. Convert the counts into percents of the 20 Internet
users in each trial who are 18 to 29. Make a histogram of these 100
percents. Describe the shape, center, and spread of this distribution.
b. Now simulate drawing 320 Internet users. (That is, set n = 320 and p =
0.3.) Do this 100 times and record the percent in the 18 to 29 age group for
each of the 100 draws. Make a histogram of the percents and describe the
shape, center, and spread of the distribution.
c. In what ways are the distributions in part (a) and (b) alike? In what ways
do they differ? (Because regularity emerges in the long run, we expect the
results of drawing 320 subjects to be less variable than the results of
drawing 20 subjects.)

3. Dugout Lou thinks that the probabilities for the American League baseball
champion are as follows. The Yankees have probability 0.6 of winning. The Red
Sox and Angels have equal probabilities of winning. The Athletics and White Sox
have equal probabilities, but their probabilities are one-third that of the Red Sox and
Angels. No other team has a chance. What is Lou’s assignment of probabilities to
teams?

4. The 2000 census allowed each person to choose from a long list of races. That is,
in the eyes of the Census Bureau, you belong to whatever race you say you belong
to. “Hispanic/Latino” is a separate category; Hispanics may be of any race. If we
choose a resident of the United States at random, the 2000 census gives these
probabilities:

         Hispanic   Not Hispanic
Asian     0.000        0.036
Black     0.003        0.121
White     0.060        0.691
Other     0.062        0.027

Let A be the event that a randomly chosen American is Hispanic, and let B be the
event that the person chosen is white.
a. Verify that the table gives a legitimate assignment of probabilities.
b. What is P(A)?
c. Describe Bᶜ in words and find P(Bᶜ) by the complement rule.
d. Express “the person chosen is a non-Hispanic white” in terms of events A
and B. What is the probability of this event?

5. Most sample surveys use random digit dialing equipment to call residential
telephone numbers at random. The telephone polling firm Zogby International
reports that the probability that a call reaches a live person is 0.2. Calls are
independent.
a. A polling firm places 5 calls. What is the probability that none of them
reaches a person?
b. When calls are made to New York City, the probability of reaching a
person is only 0.08. What is the probability that none of 5 calls made to
New York City reaches a person?

Questions 6 through 9 are based on the following information about Mendelian
inheritance of blood type.

Each of us has an ABO blood type, which describes whether two characteristics called
A and B are present. Every human being has two blood type alleles (gene forms), one
inherited from our mother and one from our father. Each of these alleles can be A, B,
or O. Which two we inherit determines our blood type. Here is a table that shows
what our blood type is for each combination of two alleles:

Alleles inherited    Blood type
A and A              A
A and B              AB
A and O              A
B and B              B
B and O              B
O and O              O

We inherit each of a parent’s two alleles with probability 0.5. We inherit
independently from our mother and father.

6. Hannah and Jacob both have alleles A and B.
a. What blood types can their children have?
b. What is the probability that their next child has each of these blood types?

7. Nancy and David both have alleles B and O.
a. What blood types can their children have?
b. What is the probability that their next child has each of these blood types?

8. Jennifer has alleles A and O. Jose has alleles A and B. They have two children.
What is the probability that both children have blood type A? What is the
probability that both children have the same blood type?

9. Jasmine has alleles A and O. Joshua has alleles B and O.
a. What is the probability that a child of these parents has blood type O?
b. If Jasmine and Joshua have three children, what is the probability that all
three have blood type O? What is the probability that the first child has
blood type O and the next two do not?

10. Some games of chance rely on tossing two dice. Each die has six faces, marked
with 1, 2, …, 6 spots called pips. The dice used in casinos are carefully balanced
so that each face is equally likely to come up. When two dice are tossed, each of
the 36 possible pairs of faces is equally likely to come up. The outcome of interest
to a gambler is the sum of the pips on the two up-faces. Call this random variable
X.
a. Write down all 36 possible pairs of faces.
b. If all pairs have the same probability, what must be the probability of each
pair?
c. Write down the value of X next to each pair of faces and use this
information with the result of (b) to give the probability distribution of X.
Draw a probability histogram to display the distribution.
d. One bet available in craps wins if a 7 or an 11 comes up on the next roll of
two dice. What is the probability of rolling a 7 or an 11 on the next roll?
e. Several bets in craps lose if a 7 is rolled. If any outcome other than 7
occurs, these bets either win or continue to the next roll. What is the
probability that anything other than a 7 is rolled?

11. Generate two random numbers between 0 and 1 and take Y to be their sum. Then
Y is a continuous random variable that can take any value between 0 and 2. The
density curve of Y is the triangle shown below.

[Figure: triangular density curve of Y, rising from 0 at Y = 0 to height 1 at Y = 1 and falling back to 0 at Y = 2]

a. Verify by geometry that the area under this curve is 1.
b. What is the probability that Y is less than 1? (Sketch the density curve,
shade the area that represents the probability, then find that area. Do this
for (c) also.)
c. What is the probability that Y is less than 0.5?

12. You have two instruments with which to measure the height of a tower. If the true
height is 100 meters, measurements with the first instrument vary with mean 100
meters and standard deviation 1.2 meters. Measurements with the second
instrument vary with mean 100 meters and standard deviation 0.85 meter. You
make one measurement with each instrument. Your results are X1 for the first and
X2 for the second, and are independent.
a. To combine the two measurements, you might average them,

Y = (X1 + X2) / 2

What are the mean and standard deviation of Y?

b. It makes sense to give more weight to the less variable measurement
because it is more likely to be close to the truth. Statistical theory says that
to make the standard deviation as small as possible you should weight the
two measurements inversely proportional to their variances. The variance
of X2 is very close to half the variance of X1, so X2 should get twice the
weight of X1. That is, use

W = (1/3) X1 + (2/3) X2

What are the mean and standard deviation of W?

13. An insurance company sees that in the entire population of homeowners, the
mean loss from fire is µ = \$250 and the standard deviation of the loss is σ = \$300.
What are the mean and standard deviation of the total loss for 12 policies? (Losses
on separate policies are independent.) What are the mean and standard deviation
of the average loss for 12 policies?

Exercises 14 through 16 make use of the following information.
Portfolio analysis: Here are the means, standard deviations, and correlations for the
annual returns from three Fidelity mutual funds for the 10 years ending in February 2004.
Because there are three random variables, there are three correlations. We use subscripts to
show which pair of random variables a correlation refers to.

W = annual return on 500 Index Fund                           µW = 11.2%, σW = 17.46%
X = annual return on Investment Grade Bond Fund               µX = 6.46%, σX = 4.18%
Y = annual return on Diversified International Fund           µY = 11.10%, σY = 15.62%

Correlations
ρWX = −0.22, ρWY = 0.56, ρXY = −0.12

14. Many advisors recommend using roughly 20% foreign stocks to diversify
portfolios of U.S. stocks. You see that the 500 Index (U.S. stocks) and Diversified
International (foreign stocks) Funds had almost the same mean returns. A
portfolio of 80% 500 Index and 20% Diversified International will deliver this
mean return with less risk. Verify this by finding the mean and standard deviation
of returns on this portfolio.

15. Diversification works better when the investments in a portfolio have small
correlations. To demonstrate this, suppose that returns on 500 Index Fund and
Diversified International Fund had the means and standard deviations we have
given but were uncorrelated (ρWY = 0). Show that the standard deviation of a
portfolio that combines 80% 500 Index with 20% Diversified International is then
smaller than your result from the previous exercise. What happens to the mean
return if the correlation is 0?

16. Portfolios often contain more than two investments. The rules for means and
variances continue to apply, though the arithmetic gets messier. A portfolio
containing proportions a of 500 Index Fund, b of Investment Grade Bond Fund,
and c of Diversified International Fund has return R = aW + bX + cY. Because a, b,
and c are the proportions invested in the three funds, a + b + c = 1. The mean and
variance of the portfolio return R are

µR = aµW + bµX + cµY
σ²R = a²σ²W + b²σ²X + c²σ²Y + 2abρWXσWσX + 2acρWYσWσY +
2bcρXYσXσY

A basic well-diversified portfolio has 60% in 500 Index, 20% in Investment
Grade Bond, and 20% in Diversified International. What are the (historical) mean
and standard deviation of the annual returns for this portfolio? What does an
investor gain by choosing this diversified portfolio over 100% U.S. stocks? What
does the investor lose (at least in this time period)?

17. Here are the counts (in thousands) of earned degrees in the United States in the
2005-2006 academic year, classified by level and by the sex of the degree
recipient:

          Bachelor's  Master's  Professional  Doctorate  Total
Female         784       276            39         20   1119
Male           559       197            44         25    825
Total         1343       473            83         45   1944

a. If you choose a degree recipient at random, what is the probability that the
person you choose is a woman?
b. What is the conditional probability that you choose a woman, given that
the person chosen received a professional degree?
c. Are the events “choose a woman” and “choose a professional degree
recipient” independent? How do you know?

Exercises 18 - 20 make use of the following information.

Working: In the language of government statistics, you are “in the labor force” if you are
available for work and either working or actively seeking work. The unemployment rate
is the proportion of the labor force (not of the entire population) who are unemployed.
Here are data from the Current Population Survey for the civilian population aged 25
years and over at the end of 2003. The table entries are counts in thousands of people.

Highest education                        Total population  In labor force  Employed
Did not finish high school                         28,021          12,623    11,552
High school, but no college                        59,844          38,210    36,249
Some college, but no bachelor's degree             46,777          33,928    32,429

18. Find the unemployment rate for people with each level of education. How does
the unemployment rate change with education? Explain carefully why your results
show that level of education and being employed are not independent.

19.
a. What is the probability that a randomly chosen person 25 years of age or
older is in the labor force?
b. If you know that the person chosen is a college graduate, what is the
conditional probability that he or she is in the labor force?
c. Are the events “in the labor force” and “college graduate” independent?
How do you know?

20. You know that a person is employed. What is the conditional probability that he
or she is a college graduate? You know that a second person is a college graduate.
What is the conditional probability that he or she is employed?

21. The probability that a randomly chosen student at the University of New
Harmony is a woman is 0.6. The probability that the student is studying education
is 0.15. The conditional probability that the student is a woman, given that the
student is studying education, is 0.8. What is the conditional probability that the
student is studying education, given that she is a woman?

Chapter 5

1. In each situation below, is it reasonable to use a binomial distribution for the
random variable X? Give reasons for your answer in each case. If a binomial
distribution applies, give the values of n and p.
a. Most calls made at random by sample surveys don’t succeed in talking
with a live person. Of calls to New York City, only 1/12 succeed. A
survey calls 500 randomly selected numbers in New York City. X is the
number that reach a live person.
b. At peak periods, 25% of attempted logins to an Internet service provider
fail. Login attempts are independent and each has the same probability of
failing. Darci logs in repeatedly until she succeeds. X is the number of the
c. On a bright October day, Canada geese arrive to foul the pond at an
apartment complex at the average rate of 12 geese per hour; X is the
number of geese that arrive in the next three hours.

2. In each situation below, is it reasonable to use a binomial distribution for the
random variable X? Give reasons for your answer in each case.
a. An auto manufacturer chooses one car from each hour’s production for a
detailed quality inspection. One variable recorded is the count X of finish
defects (dimples, ripples, etc.) in the car’s paint.
b. The pool of potential jurors for a murder case contains 100 persons chosen
at random from the adult residents of a large city. Each person in the pool
is asked whether he or she opposes the death penalty; X is the number who
say “Yes.”
c. Joe buys a ticket in his state’s “pick 3” lottery game every week; X is the
number of times in a year that he wins a prize.

3. Some of the methods in this chapter are approximations rather than exact
probability results. We have given rules of thumb for safe use of these
approximations.
a. You are interested in attitudes toward drinking among the 75 members of
a fraternity. You choose 25 members at random to interview. One question
is “Have you had five or more drinks at one time during the last week?”
Suppose that in fact 20% of the 75 members would say “Yes.” Explain
why you cannot safely use the B(25, 0.2) distribution for the count X in
b. The National AIDS Behavioral Surveys found that 0.2% (that’s 0.002 as a
had a sexual partner from a group at high risk of AIDS. Suppose that this
national proportion holds for your region. Explain why you cannot safely
use the normal approximation for the sample proportion who fall in this
group when you interview an SRS of 500 adults.

4. “What do you think is the ideal number of children for a family to have?” A
Gallup poll asked this question to 1006 randomly chosen adults. Almost half
(49%) thought two children was ideal. Suppose that p = 0.49 is exactly true for
the population of all adults. Gallup announced a margin of error of ±3 percentage
points for this poll. What is the probability that the sample proportion p̂ for an
SRS of size n = 1006 falls between 0.46 and 0.52? You see that it is likely, but not
certain, that polls like this give results that are correct within their margin of error.
We will say more about margins of error in Chapter 6.

5. Return to the Gallup poll setting of the previous question. We are supposing that
the proportion of all adults who think that two children is ideal is p = 0.49. What
is the probability that a sample proportion p̂ falls between 0.46 and 0.52 (that is,
within ±3 percentage points of the true p) if the sample is an SRS of size n = 250?
Of size n = 4000? Combine these results with your work in the previous question
to make a general statement about the effect of larger samples in a sample survey.

6. The changing probabilities you found in questions 4 and 5 are due to the fact that
the standard deviation of the sample proportion p̂ gets smaller as the sample size
n increases. If the population proportion is p = 0.49, how large a sample is needed
to reduce the standard deviation of p̂ to σp̂ = 0.005? (According to the 68-95-99.7
rule, when the standard deviation is this small, about 95% of all samples will
have p̂ within 0.01 of the true p.)

7. A selective college would like to have an entering class of 1200 students. Because
not all students who are offered admission accept, the college admits more than
1200 students. Past experience shows that about 70% of the students admitted will
accept. The college decides to admit 1500 students. Assuming that students make
their decisions independently, the number X who accept has the B(1500, 0.7)
distribution. If this number is less than 1200, the college will admit students from
its waiting list.

a. What are the mean and the standard deviation of the number X of students
who accept?
b. Use the normal approximation to find the probability that at least 1000
students accept.
c. The college does not want more than 1200 students. What is the
probability that more than 1200 will accept?
d. If the college decides to increase the number of admission offers to 1700,
what is the probability that more than 1200 will accept?

8. The scores of high school seniors on the ACT college entrance examination in
2003 had mean µ = 20.8 and standard deviation σ = 4.8. The distribution of
scores is only roughly normal.
a. What is the approximate probability that a single student randomly chosen
from all those taking the test scores 23 or higher?
b. Now take an SRS of 25 students who took the test. What are the mean and
standard deviation of the sample mean score x̄ of these 25 students?
c. What is the approximate probability that the mean score x̄ of these
students is 23 or higher?

9. North Carolina State University posts the grade distributions for its courses
online. You can find that the distribution of grades in Statistics 101 in the fall
2003 semester was

Grade          A       B       C       D       F
Probability    0.21    0.43    0.3     0.05    0.01

a. Using the common scale A = 4, B = 3, C = 2, D = 1, F = 0, take X to be the
grade of a randomly chosen Statistics 101 student. Find the mean µ and
standard deviation σ of grades in this course.
b. Statistics 101 is a large course. We can take the grades of an SRS of 50
students to be independent of each other. If X̄ is the average of these 50
grades, what are the mean and standard deviation of X̄?
c. What is the probability P(X ≥ 3) that a randomly chosen Statistics 101
student gets a B or better? What is the approximate probability P(X̄ ≥ 3)
that the grade point average for 50 randomly chosen Statistics 101
students is B or better?

10. A \$1 bet in a state lottery’s Pick 3 game pays \$500 if the three-digit number you
choose exactly matches the winning number, which is drawn at random. Here is
the distribution of the payoff X:

Payoff X      \$0 \$500
Probability 0.999 0.001

Each day’s drawing is independent of other drawings.
a. What are the mean and standard deviation of X?
b. Joe buys a Pick 3 ticket every day. What does the law of large numbers
say about the average payoff Joe receives from his bets?
c. What does the central limit theorem say about the distribution of Joe’s
average payoff after 365 bets in a year?
d. Joe comes out ahead for the year if his average payoff is greater than \$1
(the amount he spent each day on a ticket). What is the probability that Joe
comes out ahead for the year?

11. The distribution of annual returns on common stocks is roughly symmetric, but
extreme observations are more frequent than in a normal distribution. Because the
distribution is not strongly nonnormal, the mean return over even a moderate
number of years is close to normal. Annual real returns on the Standard & Poor’s
500-Stock Index over the period 1871 to 2004 have varied with mean 9.2% and
standard deviation 20.6%. Andrew plans to retire in 45 years and is considering
investing in stocks. What is the probability (assuming that the past pattern of
variation continues) that the mean annual return on common stocks over the next
45 years will exceed 15%? What is the probability that the mean return will be
less than 5%?

12. According to genetic theory, the blossom color in the second generation of a
certain cross of sweet peas should be red or white in a 3:1 ratio. That is, each
plant has probability ¾ of having red blossoms, and the blossom colors of
separate plants are independent.
a. What is the probability that exactly 6 out of 8 of these plants have red
blossoms?
b. What is the mean number of red-blossomed plants when 80 plants of this
type are grown from seeds?
c. What is the probability of obtaining at least 50 red-blossomed plants when
80 plants are grown from seeds?

13. Does delaying oral practice hinder learning a foreign language? Researchers
randomly assigned 23 beginning students of Russian to begin speaking practice
immediately and another 23 to delay speaking for 4 weeks. At the end of the
semester both groups took a standard test of comprehension of spoken Russian.
Suppose that in the population of all beginning students, the test scores for early
speaking vary according to the N (32, 6) distribution and scores for delayed
speaking have the N (29, 5) distribution.
a. What is the sampling distribution of the mean score x̄ in the early
speaking group in many repetitions of the experiment? What is the
sampling distribution of the mean score ȳ in the delayed-speaking group?

b. If the experiment were repeated many times, what would be the sampling
distribution of the difference ȳ − x̄ between the mean scores in the two
groups?

c. What is the probability that the experiment will find (misleadingly) that
the mean score for delayed speaking is at least as large as that for early
speaking?

14. Suppose (as is roughly true) that 88% of college men and 82% of college women
were employed last summer. A sample survey interviews SRSs of 500 college
men and 500 college women. The two samples are of course independent.
a. What is the approximate distribution of the proportion p̂F of women who
worked last summer? What is the approximate distribution of the
proportion p̂M of men who worked?
b. The survey wants to compare men and women. What is the approximate
distribution of the difference in the proportions who worked, p̂M − p̂F?
c. What is the probability that in the sample a higher proportion of women
than men worked last summer?

15. A fair coin is tossed 250 times.
a. Name the Bernoulli and Binomial random variables involved in this
experiment. State explicitly the parameters of the Bernoulli and Binomial
random variables in this setting and the relationship of Binomial and
Bernoulli random variables.
b. Write down and briefly explain an expression for the probability that 120
heads are observed. Get an approximation of this probability using the
normal tables.
c. Use the normal tables to approximate the probability of observing more

Chapter 6

1. Suppose that the sample mean is 50 and the standard deviation is assumed to be 5.
Make a diagram that illustrates the effect of sample size on the width of a 95%
interval. Use the following sample sizes: 10, 20, 40, and 100. Summarize what the
diagram shows.

2. A study with 25 observations gave a mean of 70. Assume that the standard
deviation is 15. Make a diagram that illustrates the effect of the confidence level
on the width of the interval. Use 80%, 90%, 95%, and 99%. Summarize what the
diagram shows.

3. Consider the following two scenarios. (A) Take an SRS of 100 students from an
(B) Take a simple random sample of 100 third-graders from the same school. For
each of these samples you will measure the height of each child in the sample.
Which sample should have the smaller margin of error for 95% confidence?

4. A questionnaire about study habits was given to a random sample of students
taking a large introductory statistics class. The sample of 25 students reported that
they spent an average of 80 minutes per week studying statistics. Assume that the
standard deviation is 35 minutes.

a. Give a 95% confidence interval for the mean time spent studying statistics
by students in this class.
b. Is it true that 95% of the students in the class have weekly study times that
lie in the interval you found in part (a)? Explain your answer.

5. You are planning a survey of starting salaries for recent liberal arts major
graduates from your college. From a pilot study you estimate that the standard
deviation is about \$9000. What sample size do you need to have a margin of error
equal to \$40 with 95% confidence?

6. Suppose that in the setting of the previous question you are willing to settle for a
margin of error of \$800. Will the required sample size be larger or smaller?

7. To assess the accuracy of a laboratory scale, a standard weight known to weigh 10
grams is weighed repeatedly. The scale readings are normally distributed with
unknown mean (this mean is 10 grams if the scale has no bias). The standard
deviation of the scale readings is known to be 0.0002 gram.
a. The weight is weighed five times. The mean result is 10.0023 grams. Give
a 98% confidence interval for the mean of repeated measurements of the
weight.
b. How many measurements must be averaged to get a margin of error of
±0.0001 with 98% confidence?

8. A newspaper invites readers to send email stating whether they are in favor of
making full-day kindergarten available to all students in the state. A total of 320
responses are received and, of these, 80% are in favor of the new program. In an
article describing the results, the authors state that the margin of error is 4% for
95% confidence. Assume that they have computed this number correctly.
a. Use the sample proportion and the margin of error to compute the 95%
confidence interval.
b. Do you think that these results are trustworthy? Discuss your answer.

9. Here are several situations where there is an incorrect application of the ideas
presented in Chapter 6. Write a short paragraph explaining what is wrong in each
situation and why it is wrong.
a. A climatologist wants to test the null hypothesis that it will rain tomorrow.
b. A random sample of size 20 is taken from a population that is assumed to
have a standard deviation of 15. The standard deviation of the sample
mean is 15/20.
c. A researcher tests the following null hypothesis: Ho: x̄ = 10.

10. Here are several situations where there is an incorrect application of the ideas
presented in Chapter 6. Write a short paragraph explaining what is wrong in each
situation and why it is wrong.
a. A change is made that should improve student satisfaction with the way
grades are processed at your college. The null hypothesis, that there is an
improvement, is tested versus the alternative, that there is no
improvement.
b. A significance test rejected the null hypothesis that the sample mean is 25.
c. A report on a study says that the results are statistically significant and the
P-value is 0.95.

11. Translate each of the following research questions into appropriate Ho and Ha.
a. Census Bureau data show that the mean household income in the area
served by a shopping mall is \$72,500 per year. A market research firm
questions shoppers at the mall to find out whether the mean household
income of mall shoppers is higher than that of the population.
b. Last year, your company’s service technicians took an average of 1.8
purchased service contracts. Do this year’s data show a different average
response time?

12. A test statistic for a two-sided significance test for a population mean is z = 2.3.
Sketch a standard normal curve and mark this value of z on it. Find the P-value
and shade the appropriate areas under the curve to illustrate your calculations.

13. The P-value for a significance test is 0.082.
a. Do you reject the null hypothesis at level α = 0.05?
b. Do you reject the null hypothesis at level α = 0.01?

14. The P-value for a significance test is 0.032.
a. Do you reject the null hypothesis at level α = 0.05?
b. Do you reject the null hypothesis at level α = 0.01?

15. A test of the null hypothesis Ho: µ = µo gives test statistic z = 1.6.
a. What is the P-value if the alternative is Ha: µ > µo?
b. What is the P-value if the alternative is Ha: µ < µo?
c. What is the P-value if the alternative is Ha: µ ≠ µo?

16. A test of the null hypothesis Ho: µ = µo gives test statistic z = −1.6.
a. What is the P-value if the alternative is Ha: µ > µo?
b. What is the P-value if the alternative is Ha: µ < µo?
c. What is the P-value if the alternative is Ha: µ ≠ µo?

17. The P-value for a two-sided test of the null hypothesis Ho: µ = 30 is 0.09.
a. Does the 95% confidence interval include the value 30? Why?
b. Does the 90% confidence interval include the value 30? Why?

18. The P-value for a two-sided test of the null hypothesis Ho: µ = 30 is 0.04.
a. Does the 95% confidence interval include the value 30? Why?
b. Does the 90% confidence interval include the value 30? Why?

19. A 95% confidence interval for a population mean is (57, 65).
a. Can you reject the null hypothesis that µ = 68 at the 5% significance
level? Why?
b. Can you reject the null hypothesis that µ = 62 at the 5% significance
level? Why?

20. A 90% confidence interval for a population mean is (12, 15).
a. Can you reject the null hypothesis that µ = 13 at the 10% significance
level? Why?
b. Can you reject the null hypothesis that µ = 10 at the 10% significance
level? Why?

21. The Survey of Study Habits and Attitudes (SSHA) is a psychological test that
measures the motivation, attitude toward school, and study habits of students.
Scores range from 0 to 200. The mean score for U.S. college students is about 115,
and the standard deviation is about 30. A teacher who suspects that older students
have better attitudes toward school gives the SSHA to 25 students who are at least
30 years of age. Their mean score is x̄ = 132.2.
a. Assuming that σ = 30 for the population of older students, carry out a test
of
Ho: µ = 115
Ha: µ > 115

b. Your test in (a) required two important assumptions in addition to the
assumption that the value of σ is known. What are they? Which of these
assumptions is most important to the validity of your conclusion in (a)?

22. The level of calcium in the blood in healthy young adults varies with mean about
9.5 milligrams per deciliter and standard deviation about σ = 0.4. A clinic in rural
Guatemala measures the blood calcium level of 160 healthy pregnant women at
their first visit for prenatal care. The mean is x̄ = 9.57. Is this an indication that
the mean calcium level in the population from which these women come differs
from 9.5?
a. State Ho and Ha.
b. Carry out the test and give the P-value, assuming that σ = 0.4 in this
population.
c. Give a 95% confidence interval for the mean calcium level µ in this
population. We can see that µ lies quite close to 9.5. This illustrates the
fact that a test based on a large sample will often declare even a small
deviation from Ho to be statistically significant.

23. Explain in plain language why a significance test that is significant at the 1% level
must always be significant at the 5% level.

24. You are told that a significance test is significant at the 5% level. From this
information can you determine whether or not it is significant at the 1% level?

25. You will perform a significance test of
Ho: µ = 0 versus Ha: µ > 0
a. What values of z would lead you to reject Ho at the 5% level?
b. If the alternative hypothesis was
Ha: µ ≠ 0
what values of z would lead you to reject Ho at the 5% level?
c. Explain why your answers to parts (a) and (b) are different.

26. Radon is a colorless, odorless gas that is naturally released by rocks and soils, and
there is some concern that it may be a health hazard. Radon detectors are sold to
homeowners worried about the risk, but the detectors may be inaccurate.
University researchers placed 12 detectors in a chamber where they were exposed
to 105 picocuries per liter (pCi/l) of radon over 3 days. Here are the readings
given by the detectors:

91.9      97.8      111.4     122.3      105.4         95
103.8      99.6       96.6     119.3      104.8      101.7

Assume (unrealistically) that you know that the standard deviation of
readings for all detectors of this type is σ = 9.
a. Give a 95% confidence interval for the mean reading µ for this type of
detector.
b. Is there significant evidence at the 5% level that the mean reading differs
from the true value 105? State hypotheses and conduct a significance test
based on your confidence interval from (a).

27. Consumers can purchase nonprescription medications at food stores, mass
merchandise stores such as Kmart and Wal-Mart, or pharmacies. About 45% of
consumers make such purchases at pharmacies. What accounts for the popularity
of pharmacies, which often charge higher prices? A study examined consumers’
perceptions of overall performance of the three types of stores, using a long
“knowledgeable staff,” and “assistance in choosing among various types of
nonprescription medication.” A performance score was based on 27 such
questions. The subjects were 201 people chosen at random from the Indianapolis
telephone directory. Here are the means and standard deviations of the
performance scores for the sample:

Store type           x̄     s
Food store         18.67 24.95
Mass merchandisers 32.38 33.37
Pharmacies         48.60 35.62

We do not know the population standard deviations, but a sample standard deviation s
from so large a sample is usually close to σ. Use s in place of the unknown σ in this
exercise.
a. What population do you think the authors of the study want to draw
conclusions about? What population are you certain they can draw
conclusions about?
b. Give 95% confidence intervals for the mean performance for each type of
store.
c. Based on these confidence intervals, are you convinced that consumers
think that pharmacies offer higher quality services than the other types of
stores?

```
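Many of the problems above rest on the same few calculations: the mean and standard deviation of a binomial count, the normal approximation to a binomial tail probability, the z confidence interval for a mean with known σ, the sample size needed for a given margin of error, and the two-sided P-value of a z statistic. The following is a minimal sketch for checking hand calculations, not part of the original problem set. It uses only Python's standard-library `statistics.NormalDist`; the helper names (`binomial_mean_sd`, `normal_approx_at_least`, `z_confidence_interval`, `sample_size_for_margin`, `two_sided_p_value`) are hypothetical and introduced here for illustration. It assumes σ is known and applies no continuity correction.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution N(0, 1)


def binomial_mean_sd(n, p):
    """Mean and standard deviation of a binomial B(n, p) count."""
    return n * p, math.sqrt(n * p * (1 - p))


def normal_approx_at_least(n, p, k):
    """Normal approximation to P(X >= k) for X ~ B(n, p).

    Assumption of this sketch: no continuity correction is applied.
    """
    mean, sd = binomial_mean_sd(n, p)
    return 1 - Z.cdf((k - mean) / sd)


def z_confidence_interval(xbar, sigma, n, level=0.95):
    """Interval xbar +/- z* sigma / sqrt(n) for a mean when sigma is known."""
    z_star = Z.inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin


def sample_size_for_margin(sigma, margin, level=0.95):
    """Smallest n whose margin of error is at most `margin`."""
    z_star = Z.inv_cdf(1 - (1 - level) / 2)
    return math.ceil((z_star * sigma / margin) ** 2)


def two_sided_p_value(z):
    """P-value of a two-sided z test: 2 * P(Z >= |z|)."""
    return 2 * (1 - Z.cdf(abs(z)))


if __name__ == "__main__":
    # Illustrative inputs taken from the problem statements above.
    print(binomial_mean_sd(1500, 0.7))
    print(normal_approx_at_least(1500, 0.7, 1000))
    print(z_confidence_interval(80, 35, 25))
    print(sample_size_for_margin(9000, 800))
    print(two_sided_p_value(2.3))
```

Because `inv_cdf` returns an unrounded critical value rather than the table value 1.960, results may differ from table-based answers in the last digit or so.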