Practice Problems

					                      Econ 2500 – Introductory Statistics
York University
Department of Economics
Professor Xianghong Li

                                  Practice Problems


Chapter 1

   1. The states differ greatly in the kinds of severe weather that afflict them. Table
      ta01_005 shows the average property damage caused by tornadoes per year over
      the period from 1950 to 1999 in each of the 50 states and Puerto Rico.
          a. What are the top five states for tornado damage? The bottom five?
          b. Make a histogram of the data by hand, with classes
              "0 ≤ damage < 10," "10 ≤ damage < 20," and so on. Describe the shape,
              center, and spread of the distribution. Which states may be outliers?

   2. ex01-035 presented data on the nightly study time claimed by first-year college
      men and women. The most common methods for formal comparison of two
      groups use x̄ and s to summarize the data. We wonder if this is appropriate here.
         a. What kinds of distributions are best summarized by x̄ and s?
         b. Use R to draw separate histograms for men and women.
         c. Each set of study times appears to contain a high outlier. Are these points
             flagged as suspicious by the 1.5 × IQR rule? How much does removing
             the outlier change x̄ and s for each group? The presence of outliers makes
             us reluctant to use the mean and standard deviation for these data unless
             we remove the outliers on the grounds that these students were
             exaggerating.

   3. Create a set of 5 positive numbers (repeats allowed) that have median 10 and
      mean 7. What thought process did you use to create your numbers?

   4. Use the definition of the mean x̄ to show that the sum of the deviations xᵢ − x̄ of
      the observations from their mean is always zero. This is one reason why the
      variance and standard deviation use squared deviations.
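
A quick numerical illustration of the claim in Problem 4 (not a substitute for the algebraic argument the problem asks for) can be sketched in Python. The five data values are hypothetical, chosen only for this example.

```python
from statistics import mean

# Hypothetical data, invented for this illustration only.
data = [2.0, 3.5, 7.0, 10.0, 12.5]

x_bar = mean(data)
deviations = [x - x_bar for x in data]

# The deviations cancel, so their plain sum carries no information about spread.
sum_of_deviations = sum(deviations)

# Squaring removes the signs, so the squared deviations do not cancel.
sum_of_squared_deviations = sum(d * d for d in deviations)

print("mean:", x_bar)
print("sum of deviations:", sum_of_deviations)
print("sum of squared deviations:", sum_of_squared_deviations)
```

Trying other data sets in place of `data` shows the same cancellation, up to floating-point rounding.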

   5. If you ask a computer to generate “random numbers” between 0 and 1, you will
      get observations from a uniform distribution. The following figure graphs the
      density curve for a uniform distribution. Use areas under this density curve to
      answer the following questions.
      [Figure: density curve of the uniform distribution, a horizontal line over the
      interval from 0 to 1.]

       a. Why is the total area under this curve equal to 1?
       b. What proportion of the observations lie above 0.75?
       c. What proportion of the observations lie between 0.25 and 0.75?

6. What are the mean and median of the uniform distribution in the graph of the
   previous question? What are the quartiles?

7. Find the value z of a standard normal variable Z that satisfies each of the
   following conditions. (If you use table A, report the value of z that comes closest
   to satisfying the condition.) In each case, sketch a standard normal curve with
   your value of z marked on the axis.

       a. 20% of the observations fall below z.
       b. 30% of the observations fall above z.

8. The Wechsler Adult Intelligence Scale (WAIS) is the most common “IQ test.”
   The scale of scores is set separately for each age group and is approximately
   normal with mean 100 and standard deviation 15. People with WAIS scores
   below 70 are considered mentally retarded when, for example, applying for Social
   Security disability benefits. What percent of adults are retarded by this criterion?

9. The quartiles of any distribution are the values with cumulative proportions 0.25
   and 0.75.

       a. What are the quartiles of the standard normal distribution?
       b. Using your numerical values from (a), write an equation that gives the
          quartiles of N (µ, σ) distributions in terms of µ and σ.
       c. The length of human pregnancies from conception to birth varies
          according to a distribution that is approximately normal with mean 266
          days and standard deviation 16 days. Apply your results from (b): what are
          the quartiles of the distribution of lengths of human pregnancies?
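
A Table A lookup for Problem 9 can be cross-checked in software. The course uses R; the sketch below uses Python's standard-library `statistics.NormalDist` instead, purely as an illustration. The helper name `normal_quartiles` is an assumption made for this example.

```python
from statistics import NormalDist


def normal_quartiles(mu, sigma):
    """Return the first and third quartiles of an N(mu, sigma) distribution."""
    dist = NormalDist(mu, sigma)
    return dist.inv_cdf(0.25), dist.inv_cdf(0.75)


# Standard normal quartiles: the values of z with cumulative proportions 0.25 and 0.75.
z_q1, z_q3 = normal_quartiles(0, 1)

# Pregnancy lengths from part (c): mean 266 days, standard deviation 16 days.
preg_q1, preg_q3 = normal_quartiles(266, 16)

print("standard normal quartiles:", z_q1, z_q3)
print("pregnancy length quartiles (days):", preg_q1, preg_q3)
```

Comparing `preg_q1` with `266 + z_q1 * 16` is one way to check the equation written for part (b).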
   10. Use R to generate 100 observations from the standard normal distribution. Make a
       histogram of these observations. How does the shape of the histogram compare
       with a normal density curve? Make a normal quantile plot of the data. Does the
       plot suggest any important deviations from normality? (Repeating this exercise is
       a good way to become familiar with how histograms and normal quantile plots
       look when data actually are close to normal.)


   11. Use R to plot a standard normal density curve and add an N(0, 2) normal density
       curve on top of it (as I showed you in class). In another graph, plot a standard
       normal density curve and add an N(−2, 1) normal density curve on top of it.

      Note: for both density curves to display properly, the X values in each graph
      must extend more than 3 standard deviations on either side of the mean of
      each normal distribution.


Chapter 2

   1. Mutual-fund reports often give correlations to describe how the prices of different
      investments are related. You look at the correlations between three Fidelity funds
      and the Standard & Poor’s 500-Stock Index, which describes stocks of large U.S.
      companies. The three funds are Dividend Growth (stocks of large U.S.
      companies), Small Cap Stock (stocks of small U.S. companies), and Emerging
      Markets (stocks in developing countries). For 2003, the three correlations are r =
      0.35, r = 0.81, and r = 0.98.
          a. Which correlation goes with each fund? Explain your answer.
          b. The correlations of the three funds with the index are all positive. Does
              this tell you that stocks went up in 2003? Explain your answer.

   2. A student wonders if tall women tend to date taller men than do short women. She
      measures herself, her dormitory roommate, and the women in the adjoining
      rooms; then she measures the next man each woman dates. Here are the data
      (heights in inches):

                               Women (x)     Men (y)
                                 66            72
                                 64            68
                                 66            70
                                 65            68
                                 70            71
                                 65            65

            a. Make a scatterplot of these data. Based on the scatterplot, do you expect
               the correlation to be positive or negative? Near ±1 or not?
            b. Find the correlation, r, between the heights of the men and women.
       c. How would r change if all the men were 6 inches shorter than the heights
          given in the table? Does the correlation tell us whether women tend to date
          men taller than themselves?
       d. If heights were measured in centimeters rather than inches, how would the
          correlation change? (There are 2.54 centimeters in an inch.)
       e. If every woman dated a man exactly 3 inches taller than herself, what
          would be the correlation between male and female heights?

3. Make a scatterplot (by hand) of the following data:
         x        1     2      3      4     10     10
         y        1     3      3      5      1     11

   Write down the formula, and calculate the correlation without using the built-in
   formula in your calculator. What feature of the data is responsible for reducing the
   correlation to this value despite a strong straight-line association between x and y
   in most of the observations?

4. Each of the following statements contains a blunder. Explain in each case what is
   wrong.
      a. “There is a high correlation between the gender of American workers and
          their income.”
      b. “We found high correlation (r = 1.09) between students’ ratings of faculty
          teaching and ratings made by other faculty members.”
      c. “The correlation between planting rate and yield of corn was found to be r
          = 0.23 bushel.”

5. Many colleges offer online versions of some courses that are also taught in the
   classroom. It often happens that the students who enroll in the online versions
   do better than the classroom students on the course exams. This does not show
   that online instruction is more effective than classroom teaching, because the kind
   of people who sign up for online courses are often quite different from the
   classroom students. Suggest some student characteristics that you think could be
   confounded with online versus classroom. Use a diagram to illustrate your ideas.

6. Data show that men who are married, and also divorced or widowed men, earn
   quite a bit more than men who have never been married. This does not mean that
   a man can raise his income by getting married. Suggest several lurking variables
   that you think are confounded with marital status and that help explain the
   association between marital status and income. Use a diagram to illustrate your
   ideas.

7. A study shows that there is a positive correlation between the size of a hospital
   (measured by its number of beds x) and the median number of days, y, that
   patients remain in the hospital. Does this mean that you can shorten hospital stay
   by choosing a small hospital? Use a diagram to explain the association.

Chapter 3

   1. A typical hour of prime-time television shows three to five violent acts. Linking
      family interviews and police records shows a clear association between time spent
      watching TV as a child and later aggressive behavior.
          a. Explain why this is an observational study rather than an experiment.
              What are the explanatory and response variables?
          b. Suggest several lurking variables describing a child’s home life that may
              be confounded with how much TV he or she watches. Explain why
              confounding makes it difficult to conclude that more TV causes more
              aggressive behavior.

   2. Several large observational studies suggested that women who take hormones
      such as estrogen after menopause have lower risk of a heart attack than women
      who do not take hormones. Hormone replacement became popular. But in 2002,
      several careful experiments showed that hormone replacement does not reduce
      heart attacks. The National Institutes of Health, after reviewing the evidence,
      concluded that the observational studies were wrong. Taking hormones after
      menopause quickly fell out of favor.
         a. Explain the difference between an observational study and an experiment
              to compare women who do and don’t take hormones after menopause.
         b. Suggest some characteristics of women who choose to take hormones that
              might affect the rate of heart attacks. In an observational study, these
              characteristics are confounded with taking hormones.

   3. Stores advertise price reductions to attract customers. What type of price cut is
      most attractive? Market researchers prepared ads for athletic shoes announcing
      different levels of discounts (20%, 40%, 60%, or 80%). The student subjects who
      read the ads were also given “inside information” about the fraction of shoes on
      sale (25%, 50%, 75%, or 100%). Each subject then rated the attractiveness of the
      sale on a scale of 1 to 7.
          a. There are two factors. Make a sketch that displays the treatments formed
              by all combinations of levels of the factors.
          b. Outline a completely randomized design using 80 student subjects. Use R
              to conduct random trials to choose the subjects for the first treatment.

   4. “Bee pollen is effective for combating fatigue, depression, cancer, and colon
      disorders.” So says a Web site that offers the pollen for sale. We wonder if bee
      pollen really does prevent colon disorders. Here are two ways to study this
      question. Explain why the second design will produce more trustworthy data.
          a. Find 200 women who take bee pollen regularly. Match each with a woman
              of the same age, race, and occupation who does not take bee pollen.
              Follow both groups for 5 years.
          b. Find 400 women who do not have colon disorders. Assign 200 to take bee
              pollen capsules and the other 200 to take placebo capsules that are
              identical in appearance. Follow both groups for 5 years.
5. Calcium is important to the development of young girls. To study how the bodies
   of young girls process calcium, investigators used the setting of a summer camp.
   Calcium was given in Hawaiian Punch at either a high or a low level. The camp
   diet was otherwise the same for all girls. Suppose that there are 60 campers.
       a. Outline a completely randomized design for this experiment.
       b. Describe a matched pairs design in which each girl receives both levels of
          calcium (with a “washout period” between). What is the advantage of the
          matched pairs design over the completely randomized design?

6. At a party there are 30 students over age 21 and 20 under age 21. You choose at
   random 3 of those over 21 and separately choose at random 2 of those under 21 to
   interview about attitudes toward alcohol. You have given every student at the
   party the same chance to be interviewed: what is that chance? Why is your sample
   not an SRS?

7. The following figure shows histograms of four sampling distributions of statistics
   intended to estimate the same parameter. Label each distribution relative to the
   others as high or low bias and as high or low variability.




Chapter 4

1. About 30% of adult Internet users are between 18 and 29 years of age. Suppose
   the probability that a randomly chosen Internet user is in this age group is exactly
   0.3. Use R to make a study of short-term variability and long-term regularity as
   follows.
       a. Set the probability of heads to 0.3. Each head stands for an Internet user
          who is between 18 and 29 and each tail is a user who is not. Set the
          number of tosses to 20. What is the proportion of heads? Do this 25
          times, keep a record of the 25 proportions of heads, and make a stemplot
          of these numbers. Lesson: In the short run (20 repetitions) proportions are
          quite variable and are often not close to the probability.
       b. With the probability of heads still set to 0.3, make 200 tosses. What was
          the proportion of heads? Do this 25 times and make a stemplot of the 25
          proportions of heads. Lesson: More repetitions make proportions less
          variable and generally closer to the probability.
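
The exercise is meant to be carried out in R. As a language-neutral illustration of the same experiment, here is a minimal Python sketch using only the standard library. The function name `proportions_of_heads` and the seed value are assumptions made for this example, and the simulated proportions will differ with another seed.

```python
import random
from statistics import stdev


def proportions_of_heads(n_tosses, p_heads=0.3, n_repeats=25, seed=None):
    """Repeat a run of n_tosses biased coin tosses n_repeats times.

    Returns the proportion of heads observed in each run.
    """
    rng = random.Random(seed)
    proportions = []
    for _ in range(n_repeats):
        heads = sum(1 for _ in range(n_tosses) if rng.random() < p_heads)
        proportions.append(heads / n_tosses)
    return proportions


short_run = proportions_of_heads(20, seed=2500)   # part (a): 20 tosses per run
long_run = proportions_of_heads(200, seed=2500)   # part (b): 200 tosses per run

print("20 tosses: ", sorted(short_run))
print("200 tosses:", sorted(long_run))
print("standard deviation of the 25 proportions:", stdev(short_run), stdev(long_run))
```

The sorted lists can be copied into a stemplot by hand; the two standard deviations give a rough numerical summary of how much less variable the longer runs are.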

2. Use the same setting as in the previous question.
      a. Simulate 100 draws of 20 Internet users from the population. (That is, ask
         the software to generate 100 binomial observations, each with n = 20 trials
         and probability p = 0.3 of a “yes”.) Record the count in the 18 to 29 age
         group on each draw. Convert the counts into percents of the 20 Internet
         users in each trial who are 18 to 29. Make a histogram of these 100
         percents. Describe the shape, center, and spread of this distribution.
      b. Now simulate drawing 320 Internet users. (That is, set n = 320 and p =
         0.3.) Do this 100 times and record the percent in the 18 to 29 age group for
         each of the 100 draws. Make a histogram of the percents and describe the
         shape, center, and spread of the distribution.
      c. In what ways are the distributions in part (a) and (b) alike? In what ways
         do they differ? (Because regularity emerges in the long run, we expect the
         results of drawing 320 subjects to be less variable than the results of
         drawing 20 subjects.)

3. Dugout Lou thinks that the probabilities for the American League baseball
   champion are as follows. The Yankees have probability 0.6 of winning. The Red
   Sox and Angels have equal probabilities of winning. The Athletics and White Sox
   have equal probabilities, but their probabilities are one-third that of the Red Sox and
   Angels. No other team has a chance. What is Lou’s assignment of probabilities to
   teams?

4. The 2000 census allowed each person to choose from a long list of races. That is,
   in the eyes of the Census Bureau, you belong to whatever race you say you belong
   to. “Hispanic/Latino” is a separate category; Hispanics may be of any race. If we
   choose a resident of the United States at random, the 2000 census gives these
   probabilities:


                     Hispanic   Not Hispanic
           Asian      0.000        0.036
           Black      0.003        0.121
           White      0.060        0.691
           Other      0.062        0.027
   Let A be the event that a randomly chosen American is Hispanic, and let B be the
   event that the person chosen is white.
       a. Verify that the table gives a legitimate assignment of probabilities.
       b. What is P(A)?
       c. Describe Bᶜ (the complement of B) in words and find P(Bᶜ) by the complement rule.
       d. Express “the person chosen is a non-Hispanic white” in terms of events A
           and B. What is the probability of this event?

5. Most sample surveys use random digit dialing equipment to call residential
   telephone numbers at random. The telephone polling firm Zogby International
   reports that the probability that a call reaches a live person is 0.2. Calls are
   independent.
       a. A polling firm places 5 calls. What is the probability that none of them
           reaches a person?
       b. When calls are made to New York City, the probability of reaching a
           person is only 0.08. What is the probability that none of 5 calls made to
           New York City reaches a person?

Questions 6 through 9 are based on the following information about Mendelian
inheritance of blood type.

Each of us has an ABO blood type, which describes whether two characteristics called
A and B are present. Every human being has two blood type alleles (gene forms), one
inherited from our mother and one from our father. Each of these alleles can be A, B,
or O. Which two we inherit determines our blood type. Here is a table that shows
what our blood type is for each combination of two alleles:

           Alleles inherited    Blood type
           A and A              A
           A and B              AB
           A and O              A
           B and B              B
           B and O              B
           O and O              O

   We inherit each of a parent’s two alleles with probability 0.5. We inherit
   independently from our mother and father.


6. Hannah and Jacob both have alleles A and B.
      a. What blood types can their children have?
      b. What is the probability that their next child has each of these blood types?

7. Nancy and David both have alleles B and O.
      a. What blood types can their children have?
      b. What is the probability that their next child has each of these blood types?
8. Jennifer has alleles A and O. Jose has alleles A and B. They have two children.
   What is the probability that both children have blood type A? What is the
   probability that both children have the same blood type?

9. Jasmine has alleles A and O. Joshua has alleles B and O.
      a. What is the probability that a child of these parents has blood type O?
      b. If Jasmine and Joshua have three children, what is the probability that all
          three have blood type O? What is the probability that the first child has
          blood type O and the next two do not?

10. Some games of chance rely on tossing two dice. Each die has six faces, marked
    with 1, 2, …, 6 spots called pips. The dice used in casinos are carefully balanced
    so that each face is equally likely to come up. When two dice are tossed, each of
    the 36 possible pairs of faces is equally likely to come up. The outcome of interest
    to a gambler is the sum of the pips on the two up-faces. Call this random variable
    X.
        a. Write down all 36 possible pairs of faces.
        b. If all pairs have the same probability, what must be the probability of each
            pair?
        c. Write down the value of X next to each pair of faces and use this
            information with the result of (b) to give the probability distribution of X.
            Draw a probability histogram to display the distribution.
        d. One bet available in craps wins if a 7 or an 11 comes up on the next roll of
            two dice. What is the probability of rolling a 7 or an 11 on the next roll?
        e. Several bets in craps lose if a 7 is rolled. If any outcome other than 7
            occurs, these bets either win or continue to the next roll. What is the
            probability that anything other than a 7 is rolled?
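
Hand enumeration in Problem 10 is easy to get wrong, so a short program is a useful check. The Python sketch below lists the 36 equally likely pairs and tallies the sums using exact fractions. The helper name `probability` is an assumption made for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All ordered pairs (first die, second die); each pair is equally likely.
pairs = list(product(range(1, 7), repeat=2))
pair_probability = Fraction(1, len(pairs))

# Count how many pairs give each sum X, then convert counts to probabilities.
counts = Counter(first + second for first, second in pairs)
distribution = {
    total: count * pair_probability for total, count in sorted(counts.items())
}


def probability(event):
    """Add up P(X = total) over every total for which event(total) is true."""
    return sum(p for total, p in distribution.items() if event(total))


for total, p in distribution.items():
    print(total, p)
print("P(7 or 11) =", probability(lambda total: total in (7, 11)))
print("P(not 7)   =", probability(lambda total: total != 7))
```

Exact fractions are used instead of floats so that the probabilities can be compared directly with answers worked out by hand.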

11. Generate two random numbers between 0 and 1 and take Y to be their sum. Then
    Y is a continuous random variable that can take any value between 0 and 2. The
    density curve of Y is the triangle shown below.




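           [Figure: triangular density curve of Y, rising from 0 at Y = 0 to a peak of
           height 1 at Y = 1 and falling back to 0 at Y = 2.]
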
           a. Verify by geometry that the area under this curve is 1.
           b. What is the probability that Y is less than 1? (Sketch the density curve,
              shade the area that represents the probability, then find that area. Do this
              for (c) also.)
           c. What is the probability that Y is less than 0.5?

   12. You have two instruments with which to measure the height of a tower. If the true
       height is 100 meters, measurements with the first instrument vary with mean 100
       meters and standard deviation 1.2 meters. Measurements with the second
       instrument vary with mean 100 meters and standard deviation 0.85 meter. You
       make one measurement with each instrument. Your results are X1 for the first and
       X2 for the second, and are independent.
            a. To combine the two measurements, you might average them,

                     Y = (X1 + X2) / 2
              What are the mean and standard deviation of Y?

           b. It makes sense to give more weight to the less variable measurement
              because it is more likely to be close to the truth. Statistical theory says that
              to make the standard deviation as small as possible you should weight the
              two measurements inversely proportional to their variances. The variance
              of X2 is very close to half the variance of X1, so X2 should get twice the
              weight of X1. That is, use
                      W = (1/3) X1 + (2/3) X2
              What are the mean and standard deviation of W?

   13. An insurance company sees that in the entire population of homeowners, the
       mean loss from fire is µ = $250 and the standard deviation of the loss is σ = $300.
       What are the mean and standard deviation of the total loss for 12 policies? (Losses
       on separate policies are independent.) What are the mean and standard deviation
       of the average loss for 12 policies?
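
Problem 13 relies on two rules for independent random variables: means add, and variances (not standard deviations) add. A minimal Python sketch of those rules follows. The function name `total_and_average` is an assumption made for this example, and the calculation is valid only under the independence stated in the problem.

```python
import math


def total_and_average(mu, sigma, n):
    """Mean and standard deviation of the sum and of the average of n
    independent losses, each with mean mu and standard deviation sigma."""
    return {
        "total_mean": n * mu,                  # means add
        "total_sd": math.sqrt(n) * sigma,      # variances add: n * sigma**2
        "average_mean": mu,                    # total mean divided by n
        "average_sd": sigma / math.sqrt(n),    # total sd divided by n
    }


summary = total_and_average(mu=250, sigma=300, n=12)
for name, value in summary.items():
    print(name, round(value, 2))
```

The standard deviation of the total grows only with the square root of n, which is why the average loss is much less variable than a single loss.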

Exercises 14 through 16 make use of the following information.
Portfolio analysis: Here are the means, standard deviations, and correlations for the
annual returns from three Fidelity mutual funds for the 10 years ending in February 2004.
Because there are three random variables, there are three correlations. We use subscripts to
show which pair of random variables a correlation refers to.

W = annual return on 500 Index Fund                           µW = 11.2%, σW = 17.46%
X = annual return on Investment Grade Bond Fund               µX = 6.46%, σX = 4.18%
Y = annual return on Diversified International Fund           µY = 11.10%, σY = 15.62%
Correlations
ρWX = −0.22, ρWY = 0.56, ρXY = −0.12

14. Many advisors recommend using roughly 20% foreign stocks to diversify
    portfolios of U.S. stocks. You see that the 500 Index (U.S. stocks) and Diversified
    International (foreign stocks) Funds had almost the same mean returns. A
    portfolio of 80% 500 Index and 20% Diversified International will deliver this
    mean return with less risk. Verify this by finding the mean and standard deviation
    of returns on this portfolio.

15. Diversification works better when the investments in a portfolio have small
    correlations. To demonstrate this, suppose that returns on 500 Index Fund and
    Diversified International Fund had the means and standard deviations we have
    given but were uncorrelated (ρWY = 0). Show that the standard deviation of a
    portfolio that combines 80% 500 Index with 20% Diversified International is then
    smaller than your result from the previous exercise. What happens to the mean
    return if the correlation is 0?

16. Portfolios often contain more than two investments. The rules for means and
    variances continue to apply, though the arithmetic gets messier. A portfolio
    containing proportions a of 500 Index Fund, b of Investment Grade Bond Fund,
    and c of Diversified International Fund has return R = aW + bX+ cY. Because a, b,
    and c are the proportions invested in the three funds, a + b + c = 1. The mean and
    variance of the portfolio return R are

           µR = aµW + bµX + cµY
           σ²R = a²σ²W + b²σ²X + c²σ²Y + 2abρWXσWσX + 2acρWYσWσY + 2bcρXYσXσY

   A basic well-diversified portfolio has 60% in 500 Index, 20% in Investment
   Grade Bond, and 20% in Diversified International. What are the (historical) mean
   and standard deviation of the annual returns for this portfolio? What does an
   investor gain by choosing this diversified portfolio over 100% U.S. stocks? What
   does the investor lose (at least in this time period)?
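
The rules for the mean and variance of a portfolio return are tedious to apply by hand, so a small helper can be used to check the arithmetic. The Python sketch below is one possible implementation: the function name `portfolio_mean_sd` and the way correlations are passed (a dictionary keyed by pairs of fund positions) are assumptions made for this example. The demonstration uses only the 500 Index and Diversified International figures given above, including ρWY = 0.56.

```python
import math


def portfolio_mean_sd(weights, means, sds, correlations):
    """Mean and standard deviation of a weighted sum of returns.

    correlations maps a pair of positions (i, j), with i < j, to the
    correlation between fund i and fund j.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("portfolio proportions must sum to 1")
    mean = sum(w * m for w, m in zip(weights, means))
    # Each fund contributes (weight * sd) squared ...
    variance = sum((w * s) ** 2 for w, s in zip(weights, sds))
    # ... and each pair contributes 2 * w_i * w_j * rho_ij * sd_i * sd_j.
    for (i, j), rho in correlations.items():
        variance += 2 * weights[i] * weights[j] * rho * sds[i] * sds[j]
    return mean, math.sqrt(variance)


means = [11.2, 11.10]    # percent: 500 Index (W), Diversified International (Y)
sds = [17.46, 15.62]     # percent

all_index = portfolio_mean_sd([1.0, 0.0], means, sds, {(0, 1): 0.56})
mixed = portfolio_mean_sd([0.8, 0.2], means, sds, {(0, 1): 0.56})
uncorrelated = portfolio_mean_sd([0.8, 0.2], means, sds, {(0, 1): 0.0})

print("100% 500 Index:       ", all_index)
print("80/20, rho = 0.56:    ", mixed)
print("80/20, rho = 0 (hyp.):", uncorrelated)
```

Putting all the weight on one fund should return that fund's own mean and standard deviation, which is a convenient sanity check before trying a three-fund portfolio.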

17. Here are the counts (in thousands) of earned degrees in the United States in the
    2005-2006 academic year, classified by level and by the sex of the degree
    recipient:

                Bachelor's Master's Professional Doctorate Total
      Female           784      276            39        20     1119
      Male             559      197            44        25      825
      Total           1343      473            83        45     1944


       a. If you choose a degree recipient at random, what is the probability that the
          person you choose is a woman?
       b. What is the conditional probability that you choose a woman, given that
          the person chosen received a professional degree?
            c. Are the events “choose a woman” and “choose a professional degree
               recipient” independent? How do you know?

Exercises 18 - 20 make use of the following information.

Working: In the language of government statistics, you are “in the labor force” if you are
available for work and either working or actively seeking work. The unemployment rate
is the proportion of the labor force (not of the entire population) who are unemployed.
Here are data from the Current Population Survey for the civilian population aged 25
years and over at the end of 2003. The table entries are counts in thousands of people.

Highest education                        Total population In Labor Force Employed
Did not finish high school                          28,021         12,623   11,552
High school but no college                          59,844         38,210   36,249
Some college, but no bachelor's degree              46,777         33,928   32,429
College graduate                                    51,568         40,414   39,250


   18. Find the unemployment rate for people with each level of education. How does
       the unemployment rate change with education? Explain carefully why your results
       show that level of education and being employed are not independent.

   19.
            a. What is the probability that a randomly chosen person 25 years of age or
               older is in the labor force?
            b. If you know that the person chosen is a college graduate, what is the
               conditional probability that he or she is in the labor force?
            c. Are the events “in the labor force” and “college graduate” independent?
               How do you know?

   20. You know that a person is employed. What is the conditional probability that he
       or she is a college graduate? You know that a second person is a college graduate.
       What is the conditional probability that he or she is employed?

   21. The probability that a randomly chosen student at the University of New
       Harmony is a woman is 0.6. The probability that the student is studying education
       is 0.15. The conditional probability that the student is a woman, given that the
       student is studying education, is 0.8. What is the conditional probability that the
       student is studying education, given that she is a woman?

Chapter 5

   1. In each situation below, is it reasonable to use a binomial distribution for the
      random variable X? Give reasons for your answer in each case. If a binomial
      distribution applies, give the values of n and p.
          a. Most calls made at random by sample surveys don’t succeed in talking
              with a live person. Of calls to New York City, only 1/12 succeed. A
           survey calls 500 randomly selected numbers in New York City. X is the
           number that reach a live person.
        b. At peak periods, 25% of attempted logins to an Internet service provider
           fail. Login attempts are independent and each has the same probability of
           failing. Darci logs in repeatedly until she succeeds. X is the number of the
           login attempt that finally succeeds.
        c. On a bright October day, Canada geese arrive to foul the pond at an
           apartment complex at the average rate of 12 geese per hour; X is the
           number of geese that arrive in the next three hours.

2. In each situation below, is it reasonable to use a binomial distribution for the
   random variable X? Give reasons for your answer in each case.
       a. An auto manufacturer chooses one car from each hour’s production for a
          detailed quality inspection. One variable recorded is the count X of finish
          defects (dimples, ripples, etc.) in the car’s paint.
       b. The pool of potential jurors for a murder case contains 100 persons chosen
          at random from the adult residents of a large city. Each person in the pool
          is asked whether he or she opposes the death penalty; X is the number who
          say “Yes.”
       c. Joe buys a ticket in his state’s “pick 3” lottery game every week; X is the
          number of times in a year that he wins a prize.

3. Some of the methods in this chapter are approximations rather than exact
   probability results. We have given rules of thumb for safe use of these
   approximations.
      a. You are interested in attitudes toward drinking among the 75 members of
          a fraternity. You choose 25 members at random to interview. One question
          is “Have you had five or more drinks at one time during the last week?”
          Suppose that in fact 20% of the 75 members would say “Yes.” Explain
          why you cannot safely use the B(25, 0.2) distribution for the count X in
          your sample who say “Yes.”
      b. The National AIDS Behavioral Surveys found that 0.2% (that’s 0.002 as a
          fraction) of adult heterosexuals had both received a blood transfusion and
          had a sexual partner from a group at high risk of AIDS. Suppose that this
          national proportion holds for your region. Explain why you cannot safely
          use the normal approximation for the sample proportion who fall in this
          group when you interview an SRS of 500 adults.

4. “What do you think is the ideal number of children for a family to have?” A
   Gallup poll asked this question to 1006 randomly chosen adults. Almost half
   (49%) thought two children was ideal. Suppose that p = 0.49 is exactly true for
   the population of all adults. Gallup announced a margin of error of ±3 percentage
   points for this poll. What is the probability that the sample proportion p̂ for an
   SRS of size n = 1006 falls between 0.46 and 0.52? You see that it is likely, but not
   certain, that polls like this give results that are correct within their margin of error.
   We will say more about margins of error in Chapter 6.

5. Return to the Gallup poll setting of the previous question. We are supposing that
   the proportion of all adults who think that two children is ideal is p = 0.49. What
   is the probability that a sample proportion p̂ falls between 0.46 and 0.52 (that is,
   within ±3 percentage points of the true p) if the sample is an SRS of size n = 250?
   Of size n = 4000? Combine these results with your work in the previous question
   to make a general statement about the effect of larger samples in a sample survey.
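
The calculation in questions 4 and 5 can be checked with the normal approximation for p̂, which has mean p and standard deviation sqrt(p(1 − p)/n). The sketch below is only an illustration and is not part of the original problem set; the function name prob_phat_between is made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def prob_phat_between(p, n, low, high):
    """Normal approximation to P(low < p-hat < high) for an SRS of size n."""
    sd = sqrt(p * (1 - p) / n)            # standard deviation of p-hat
    approx = NormalDist(mu=p, sigma=sd)   # approximate sampling distribution
    return approx.cdf(high) - approx.cdf(low)


if __name__ == "__main__":
    # Same ±3 point window, three different sample sizes.
    for n in (250, 1006, 4000):
        print(n, round(prob_phat_between(0.49, n, 0.46, 0.52), 4))
```

Running it for several n shows the probability of landing within the window rising as the sample grows, which is the pattern question 5 asks you to describe.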

6. The changing probabilities you found in questions 4 and 5 are due to the fact that
   the standard deviation of the sample proportion p̂ gets smaller as the sample size
   n increases. If the population proportion is p = 0.49, how large a sample is needed
   to reduce the standard deviation of p̂ to σ_p̂ = 0.005? (According to the 68-95-99.7
   rule, when the standard deviation is this small, about 95% of all samples will
   have p̂ within 0.01 of the true p.)

7. A selective college would like to have an entering class of 1200 students. Because
   not all students who are offered admission accept, the college admits more than
   1200 students. Past experience shows that about 70% of the students admitted will
   accept. The college decides to admit 1500 students. Assuming that students make
   their decisions independently, the number who accept X has the B (1500, 0.7)
   distribution. If this number is less than 1200, the college will admit students from
   its waiting list.

       a. What are the mean and the standard deviation of the number X of students
          who accept?
       b. Use the normal approximation to find the probability that at least 1000
          students accept.
       c. The college does not want more than 1200 students. What is the
          probability that more than 1200 will accept?
       d. If the college decides to increase the number of admission offers to 1700,
          what is the probability that more than 1200 will accept?
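
For a count X with the B(n, p) distribution, the mean is np and the standard deviation is sqrt(np(1 − p)), and the normal approximation uses a normal curve with those two values. A minimal Python sketch of that setup follows; the helper name binomial_normal_approx is hypothetical, and no continuity correction is applied.

```python
from math import sqrt
from statistics import NormalDist


def binomial_normal_approx(n, p):
    """Return the normal distribution that approximates B(n, p)."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return NormalDist(mu=mean, sigma=sd)


if __name__ == "__main__":
    x = binomial_normal_approx(1500, 0.7)
    print("mean:", x.mean, "sd:", round(x.stdev, 2))
    # P(X >= 1000), without a continuity correction
    print("P(X >= 1000) ~", round(1 - x.cdf(1000), 4))
```

The same helper can be called with n = 1700 to explore part (d).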

8. The scores of high school seniors on the ACT college entrance examination in
   2003 had mean µ = 20.8 and standard deviation σ = 4.8. The distribution of
   scores is only roughly normal.
       a. What is the approximate probability that a single student randomly chosen
           from all those taking the test scores 23 or higher?
       b. Now take an SRS of 25 students who took the test. What are the mean and
           standard deviation of the sample mean score x̄ of these 25 students?
       c. What is the approximate probability that the mean score x̄ of these
           students is 23 or higher?

9. North Carolina State University posts the grade distributions for its courses
   online. You can find that the distribution of grades in Statistics 101 in the fall
   2003 semester was
           Grade               A           B           C         D           F
           Probability        0.21        0.43        0.3       0.05        0.01

       a. Using the common scale A = 4, B = 3, C = 2, D = 1, F = 0, take X to be the
          grade of a randomly chosen Statistics 101 student. Find the mean µ and
          standard deviation σ of grades in this course.
       b. Statistics 101 is a large course. We can take the grades of an SRS of 50
          students to be independent of each other. If x̄ is the average of these 50
          grades, what are the mean and standard deviation of x̄?
       c. What is the probability P(X ≥ 3) that a randomly chosen Statistics 101
          student gets a B or better? What is the approximate probability P(x̄ ≥ 3)
          that the grade point average for 50 randomly chosen Statistics 101
          students is B or better?

10. A $1 bet in a state lottery’s Pick 3 game pays $500 if the three-digit number you
    choose exactly matches the winning number, which is drawn at random. Here is
    the distribution of the payoff X:

                     Payoff X        $0     $500
                     Probability   0.999   0.001


       Each day’s drawing is independent of other drawings.
       a. What are the mean and standard deviation of X?
       b. Joe buys a Pick 3 ticket every day. What does the law of large numbers
          say about the average payoff Joe receives from his bets?
       c. What does the central limit theorem say about the distribution of Joe’s
          average payoff after 365 bets in a year?
       d. Joe comes out ahead for the year if his average payoff is greater than $1
          (the amount he spent each day on a ticket). What is the probability that Joe
          ends the year ahead?
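
The mean and standard deviation of a discrete random variable come from µ = Σ x·p(x) and σ² = Σ (x − µ)²·p(x). The Python sketch below applies those formulas to the payoff table; the function name mean_and_sd is made up for this illustration.

```python
from math import sqrt


def mean_and_sd(values, probs):
    """Mean and standard deviation of a discrete random variable."""
    mu = sum(x * p for x, p in zip(values, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return mu, sqrt(var)


if __name__ == "__main__":
    mu, sigma = mean_and_sd([0, 500], [0.999, 0.001])
    print("payoff mean:", round(mu, 4), "sd:", round(sigma, 4))
    # The average of n independent payoffs has standard deviation sigma / sqrt(n).
    print("sd of 365-day average:", round(sigma / sqrt(365), 4))
```

Because the payoff distribution is extremely skewed, a normal approximation for the 365-day average should be treated as rough. The same function works for the grade distribution in question 9.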

11. The distribution of annual returns on common stocks is roughly symmetric, but
    extreme observations are more frequent than in a normal distribution. Because the
    distribution is not strongly nonnormal, the mean return over even a moderate
    number of years is close to normal. Annual real returns on the Standard & Poor’s
    500-Stock Index over the period 1871 to 2004 have varied with mean 9.2% and
    standard deviation 20.6%. Andrew plans to retire in 45 years and is considering
    investing in stocks. What is the probability (assuming that the past pattern of
    variation continues) that the mean annual return on common stocks over the next
    45 years will exceed 15%? What is the probability that the mean return will be
    less than 5%?

12. According to genetic theory, the blossom color in the second generation of a
    certain cross of sweet peas should be red or white in a 3:1 ratio. That is, each
    plant has probability ¾ of having red blossoms, and the blossom colors of
    separate plants are independent.
       a. What is the probability that exactly 6 out of 8 of these plants have red
          blossoms?
       b. What is the mean number of red-blossomed plants when 80 plants of this
          type are grown from seeds?
       c. What is the probability of obtaining at least 50 red-blossomed plants when
          80 plants are grown from seeds?
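
Exact binomial probabilities follow from P(X = k) = C(n, k) p^k (1 − p)^(n − k). The sketch below is an illustration in Python; the function names are hypothetical, and math.comb assumes Python 3.8 or later.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X with the B(n, p) distribution."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_at_least(k, n, p):
    """P(X >= k), summed exactly from the pmf."""
    return sum(binomial_pmf(j, n, p) for j in range(k, n + 1))


if __name__ == "__main__":
    print(round(binomial_pmf(6, 8, 0.75), 4))         # part (a)
    print(round(binomial_at_least(50, 80, 0.75), 4))  # part (c), exact sum
```

The exact sum for part (c) can be compared with the answer from the normal approximation.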

13. Does delaying oral practice hinder learning a foreign language? Researchers
    randomly assigned 23 beginning students of Russian to begin speaking practice
    immediately and another 23 to delay speaking for 4 weeks. At the end of the
    semester both groups took a standard test of comprehension of spoken Russian.
    Suppose that in the population of all beginning students, the test scores for early
    speaking vary according to the N (32, 6) distribution and scores for delayed
    speaking have the N (29, 5) distribution.
       a. What is the sampling distribution of the mean score x̄ in the early-speaking
           group in many repetitions of the experiment? What is the sampling
           distribution of the mean score ȳ in the delayed-speaking group?

       b. If the experiment were repeated many times, what would be the sampling
          distribution of the difference ȳ − x̄ between the mean scores in the two
          groups?

       c. What is the probability that the experiment will find (misleadingly) that
          the mean score for delayed speaking is at least as large as that for early
          speaking?

14. Suppose (as is roughly true) that 88% of college men and 82% of college women
    were employed last summer. A sample survey interviews SRSs of 500 college
    men and 500 college women. The two samples are of course independent.
       a. What is the approximate distribution of the proportion p̂_F of women who
          worked last summer? What is the approximate distribution of the
          proportion p̂_M of men who worked?
       b. The survey wants to compare men and women. What is the approximate
          distribution of the difference in the proportions who worked, p̂_M − p̂_F?
          Explain the reasoning behind your answer.
       c. What is the probability that in the sample a higher proportion of women
          than men worked last summer?


15. A fair coin is tossed 250 times.
       a. Name the Bernoulli and Binomial random variables involved in this
            experiment. State explicitly the parameters of the Bernoulli and Binomial
            random variables in this setting and the relationship of Binomial and
            Bernoulli random variables.
       b. Write down and briefly explain an expression for the probability that 120
            heads are observed. Get an approximation of this probability using the
            normal tables.
       c. Use the normal tables to approximate the probability of observing more
            than 140 heads.

Chapter 6

   1. Suppose that the sample mean is 50 and the standard deviation is assumed to be 5.
      Make a diagram that illustrates the effect of sample size on the width of a 95%
      interval. Use the following sample sizes: 10, 20, 40, and 100. Summarize what the
      diagram shows.

   2. A study with 25 observations gave a mean of 70. Assume that the standard
      deviation is 15. Make a diagram that illustrates the effect of the confidence level
      on the width of the interval. Use 80%, 90%, 95%, and 99%. Summarize what the
      diagram shows.
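
The intervals for the diagrams in questions 1 and 2 have the form x̄ ± z*·σ/√n when σ is treated as known. The Python sketch below tabulates the endpoints; it is an illustration only, and the function name z_interval is made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, level=0.95):
    """Confidence interval xbar +/- z* sigma / sqrt(n) for known sigma."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin


if __name__ == "__main__":
    for n in (10, 20, 40, 100):               # question 1: vary n
        lo, hi = z_interval(50, 5, n)
        print(n, round(lo, 2), round(hi, 2))
    for level in (0.80, 0.90, 0.95, 0.99):    # question 2: vary the level
        lo, hi = z_interval(70, 15, 25, level)
        print(level, round(lo, 2), round(hi, 2))
```

The printed endpoints can be drawn as horizontal bars to make the diagrams the questions ask for.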

   3. Consider the following two scenarios. (A) Take an SRS of 100 students from an
      elementary school with children in grades kindergarten through fifth grade. (B)
      Take a simple random sample of 100 third-graders from the same school. For
      each of these samples you will measure the height of each child in the sample.
      Which sample should have the smaller margin of error for 95% confidence?
      Explain your answer.

   4. A questionnaire about study habits was given to a random sample of students
      taking a large introductory statistics class. The sample of 25 students reported that
      they spent an average of 80 minutes per week studying statistics. Assume that the
      standard deviation is 35 minutes.

            a. Give a 95% confidence interval for the mean time spent studying statistics
               by students in this class.
            b. Is it true that 95% of the students in the class have weekly study times that
               lie in the interval you found in part (a)? Explain your answer.

   5. You are planning a survey of starting salaries for recent liberal arts major
      graduates from your college. From a pilot study you estimate that the standard
      deviation is about $9000. What sample size do you need to have a margin of error
      equal to $40 with 95% confidence?

   6. Suppose that in the setting of the previous question you are willing to settle for a
      margin of error of $800. Will the required sample size be larger or smaller?
      Verify your answer by performing the calculations.
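
The sample size needed for a margin of error m is n = (z*·σ/m)², rounded up to the next whole number. The Python sketch below applies that formula; the function name sample_size is hypothetical, and it uses the exact normal quantile rather than the table value 1.96, so very large answers may differ slightly from hand calculations.

```python
from math import ceil
from statistics import NormalDist


def sample_size(sigma, margin, level=0.95):
    """Smallest n with z* sigma / sqrt(n) <= margin."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil((z_star * sigma / margin) ** 2)


if __name__ == "__main__":
    print(sample_size(9000, 800))   # the wider margin of error in question 6
```

Calling the function with a smaller margin shows how quickly the required n grows.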

   7. To assess the accuracy of a laboratory scale, a standard weight known to weigh 10
      grams is weighed repeatedly. The scale readings are normally distributed with
      unknown mean (this mean is 10 grams if the scale has no bias). The standard
      deviation of the scale readings is known to be 0.0002 gram.
      a. The weight is weighed five times. The mean result is 10.0023 grams. Give
           a 98% confidence interval for the mean of repeated measurements of the
           weight.
      b. How many measurements must be averaged to get a margin of error of
           ±0.0001 with 98% confidence?

8. A newspaper invites readers to send email stating whether they are in favor of
   making full-day kindergarten available to all students in the state. A total of 320
   responses are received and, of these, 80% are in favor of the new program. In an
   article describing the results, the authors state that the margin of error is 4% for
   95% confidence. Assume that they have computed this number correctly.
       a. Use the sample proportion and the margin of error to compute the 95%
            confidence interval.
       b. Do you think that these results are trustworthy? Discuss your answer.

9. Here are several situations where there is an incorrect application of the ideas
    presented in Chapter 6. Write a short paragraph explaining what is wrong in each
    situation and why it is wrong.
        a. A climatologist wants to test the null hypothesis that it will rain tomorrow.
        b. A random sample of size 20 is taken from a population that is assumed to
            have a standard deviation of 15. The standard deviation of the sample
            mean is 15/20.
        c. A researcher tests the following null hypothesis: Ho: x̄ = 10.

10. Here are several situations where there is an incorrect application of the ideas
    presented in Chapter 6. Write a short paragraph explaining what is wrong in each
    situation and why it is wrong.
        a. A change is made that should improve student satisfaction with the way
            grades are processed at your college. The null hypothesis, that there is an
            improvement, is tested versus the alternative, that there is no
            improvement.
        b. A significance test rejected the null hypothesis that the sample mean is 25.
        c. A report on a study says that the results are statistically significant and the
            P-value is 0.95.

11. Translate each of the following research questions into appropriate Ho and Ha.
       a. Census Bureau data show that the mean household income in the area
           served by a shopping mall is $72,500 per year. A market research firm
           questions shoppers at the mall to find out whether the mean household
           income of mall shoppers is higher than that of the population.
       b. Last year, your company’s service technicians took an average of 1.8
           hours to respond to trouble calls from business customers who had
           purchased service contracts. Do this year’s data show a different average
           response time?

12. A test statistic for a two-sided significance test for a population mean is z = 2.3.
    Sketch a standard normal curve and mark this value of z on it. Find the P-value
    and shade the appropriate areas under the curve to illustrate your calculations.

13. The P-value for a significance test is 0.082.
       a. Do you reject the null hypothesis at level α = 0.05?
       b. Do you reject the null hypothesis at level α = 0.01?
       c. Explain your answers.

14. The P-value for a significance test is 0.032.
       a. Do you reject the null hypothesis at level α = 0.05?
       b. Do you reject the null hypothesis at level α = 0.01?
       c. Explain your answers.

15. A test of the null hypothesis Ho: µ = µo gives test statistic z = 1.6.
        a. What is the P-value if the alternative is Ha: µ > µo?
        b. What is the P-value if the alternative is Ha: µ < µo?
        c. What is the P-value if the alternative is Ha: µ ≠ µo?

16. A test of the null hypothesis Ho: µ = µo gives test statistic z = 1.6.
        a. What is the P-value if the alternative is Ha: µ > µo?
        b. What is the P-value if the alternative is Ha: µ < µo?
        c. What is the P-value if the alternative is Ha: µ ≠ µo?
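
The P-value of a z test depends on the direction of the alternative: the upper tail for Ha: µ > µo, the lower tail for Ha: µ < µo, and twice the smaller tail for Ha: µ ≠ µo. The Python sketch below encodes those three cases; the function name and the alternative labels are made up for this illustration.

```python
from statistics import NormalDist


def z_test_p_value(z, alternative):
    """P-value for a z statistic; alternative is 'greater', 'less', or 'two-sided'."""
    upper_tail = 1 - NormalDist().cdf(z)      # area to the right of z
    if alternative == "greater":
        return upper_tail
    if alternative == "less":
        return 1 - upper_tail                 # area to the left of z
    if alternative == "two-sided":
        return 2 * min(upper_tail, 1 - upper_tail)
    raise ValueError("unknown alternative: " + alternative)


if __name__ == "__main__":
    for alt in ("greater", "less", "two-sided"):
        print(alt, round(z_test_p_value(1.6, alt), 4))
```

The values can be compared with the areas you shade on the standard normal curve.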

17. The P-value for a two-sided test of the null hypothesis Ho: µ = 30 is 0.09.
       a. Does the 95% confidence interval include the value 30? Why?
       b. Does the 90% confidence interval include the value 30? Why?

18. The P-value for a two-sided test of the null hypothesis Ho: µ = 30 is 0.04.
       a. Does the 95% confidence interval include the value 30? Why?
       b. Does the 90% confidence interval include the value 30? Why?

19. A 95% confidence interval for a population mean is (57, 65).
       a. Can you reject the null hypothesis that µ = 68 at the 5% significance
          level? Why?
       b. Can you reject the null hypothesis that µ = 62 at the 5% significance
          level? Why?

20. A 90% confidence interval for a population mean is (12, 15).
       a. Can you reject the null hypothesis that µ = 13 at the 10% significance
          level? Why?
       b. Can you reject the null hypothesis that µ = 10 at the 10% significance
          level? Why?

21. The Survey of Study Habits and Attitudes (SSHA) is a psychological test that
    measures the motivation, attitude toward school, and study habits of students.
    Scores range from 0 to 200. The mean score for U.S. college students is about 115,
    and the standard deviation is about 30. A teacher who suspects that older students
    have better attitudes toward school gives the SSHA to 25 students who are at least
    30 years of age. Their mean score is x̄ = 132.2.
        a. Assuming that σ = 30 for the population of older students, carry out a test
           of
                Ho: µ = 115
                Ha: µ > 115
           Report the P-value of your test, and state your conclusion clearly.

       b. Your test in (a) required two important assumptions in addition to the
          assumption that the value of σ is known. What are they? Which of these
          assumptions is most important to the validity of your conclusion in (a)?

22. The level of calcium in the blood in healthy young adults varies with mean about
    9.5 milligrams per deciliter and standard deviation about σ = 0.4. A clinic in rural
    Guatemala measures the blood calcium level of 160 healthy pregnant women at
    their first visit for prenatal care. The mean is x̄ = 9.57. Is this an indication that
    the mean calcium level in the population from which these women come differs
    from 9.5?
        a. State Ho and Ha.
        b. Carry out the test and give the P-value, assuming that σ = 0.4 in this
             population. Report your conclusion.
        c. Give a 95% confidence interval for the mean calcium level µ in this
             population. We can see that µ lies quite close to 9.5. This illustrates the
             fact that a test based on a large sample will often declare even a small
             deviation from Ho to be statistically significant.

23. Explain in plain language why a significance test that is significant at the 1% level
    must always be significant at the 5% level.

24. You are told that a significance test is significant at the 5% level. From this
    information can you determine whether or not it is significant at the 1% level?
    Explain your answer.

25. You will perform a significance test of
       Ho: µ = 0 versus Ha: µ > 0
       a. What values of z would lead you to reject Ho at the 5% level?
       b. If the alternative hypothesis was
               Ha: µ ≠ 0
       what values of z would lead you to reject Ho at the 5% level?
       c. Explain why your answers to parts (a) and (b) are different.

26. Radon is a colorless, odorless gas that is naturally released by rocks and soils and
    may concentrate in tightly closed houses. Because radon is slightly radioactive,
    there is some concern that it may be a health hazard. Radon detectors are sold to
    homeowners worried about the risk, but the detectors may be inaccurate.
    University researchers placed 12 detectors in a chamber where they were exposed
    to 105 picocuries per liter (pCi/l) of radon over 3 days. Here are the readings
    given by the detectors:

                 91.9      97.8      111.4     122.3      105.4         95
                103.8      99.6       96.6     119.3      104.8      101.7


          Assume (unrealistically) that you know that the standard deviation of
          readings for all detectors of this type is σ = 9.
       a. Give a 95% confidence interval for the mean reading µ for this type of
          detector.
       b. Is there significant evidence at the 5% level that the mean reading differs
          from the true value 105? State hypotheses and conduct a significance test
          based on your confidence interval from (a).
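
A two-sided test at the 5% level rejects Ho: µ = 105 exactly when 105 falls outside the 95% confidence interval, which is why part (b) can be answered from part (a). The Python sketch below works through that link with the readings above; the variable names are made up for this illustration, and σ = 9 is taken as known, as the problem states.

```python
from math import sqrt
from statistics import NormalDist, mean

readings = [91.9, 97.8, 111.4, 122.3, 105.4, 95.0,
            103.8, 99.6, 96.6, 119.3, 104.8, 101.7]
sigma = 9.0                                   # assumed known, as in the problem

xbar = mean(readings)
z_star = NormalDist().inv_cdf(0.975)          # about 1.96 for 95% confidence
margin = z_star * sigma / sqrt(len(readings))
interval = (xbar - margin, xbar + margin)

# Reject Ho: mu = 105 at the 5% level only if 105 is outside the interval.
reject = not (interval[0] <= 105 <= interval[1])
print(round(xbar, 2), tuple(round(v, 2) for v in interval), reject)
```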

27. Consumers can purchase nonprescription medications at food stores, mass
    merchandise stores such as Kmart and Wal-Mart, or pharmacies. About 45% of
    consumers make such purchases at pharmacies. What accounts for the popularity
    of pharmacies, which often charge higher prices? A study examined consumers’
    perceptions of overall performance of the three types of stores, using a long
    questionnaire that asked about such things as “neat and attractive store,”
    “knowledgeable staff,” and “assistance in choosing among various types of
    nonprescription medication.” A performance score was based on 27 such
    questions. The subjects were 201 people chosen at random from the Indianapolis
    telephone directory. Here are the means and standard deviations of the
    performance scores for the sample:

          Store type             x̄       s
          Food store           18.67   24.95
          Mass merchandisers   32.38   33.37
          Pharmacies           48.60   35.62


    We do not know the population standard deviations, but a sample standard deviation s
    from so large a sample is usually close to σ. Use s in place of the unknown σ in this
    exercise.
       a. What population do you think the authors of the study want to draw
           conclusions about? What population are you certain they can draw
           conclusions about?
       b. Give 95% confidence intervals for the mean performance for each type of
           store.
       c. Based on these confidence intervals, are you convinced that consumers
           think that pharmacies offer higher quality services than the other types of
           stores?

				