Econ 2500 – Introductory Statistics
York University
Department of Economics
Professor Xianghong Li

Practice Problems

Chapter 1

1. The states differ greatly in the kinds of severe weather that afflict them. Table ta01_005 shows the average property damage caused by tornadoes per year over the period from 1950 to 1999 in each of the 50 states and Puerto Rico.
a. What are the top five states for tornado damage? The bottom five?
b. Make a histogram of the data by hand, with classes "0 ≤ damage < 10," "10 ≤ damage < 20," and so on. Describe the shape, center, and spread of the distribution. Which states may be outliers?

2. ex01-035 presented data on the nightly study time claimed by first-year college men and women. The most common methods for formal comparison of two groups use \(\bar{x}\) and \(s\) to summarize the data. We wonder if this is appropriate here.
a. What kinds of distributions are best summarized by \(\bar{x}\) and \(s\)?
b. Use R to draw separate histograms for men and women.
c. Each set of study times appears to contain a high outlier. Are these points flagged as suspicious by the 1.5 × IQR rule? How much does removing the outlier change \(\bar{x}\) and \(s\) for each group? The presence of outliers makes us reluctant to use the mean and standard deviation for these data unless we remove the outliers on the grounds that these students were exaggerating.

3. Create a set of 5 positive numbers (repeats allowed) that have median 10 and mean 7. What thought process did you use to create your numbers?

4. Use the definition of the mean \(\bar{x}\) to show that the sum of the deviations \(x_i - \bar{x}\) of the observations from their mean is always zero. This is one reason why the variance and standard deviation use squared deviations.

5. If you ask a computer to generate “random numbers” between 0 and 1, you will get observations from a uniform distribution. The following figure graphs the density curve for a uniform distribution. Use areas under this density curve to answer the following questions.
[Figure: density curve of a uniform distribution; horizontal axis marked at 0 and 1.]
a. Why is the total area under this curve equal to 1?
b. What proportion of the observations lie above 0.75?
c. What proportion of the observations lie between 0.25 and 0.75?

6. What are the mean and median of the uniform distribution in the graph of the previous question? What are the quartiles?

7. Find the value z of a standard normal variable Z that satisfies each of the following conditions. (If you use table A, report the value of z that comes closest to satisfying the condition.) In each case, sketch a standard normal curve with your value of z marked on the axis.
a. 20% of the observations fall below z.
b. 30% of the observations fall above z.

8. The Wechsler Adult Intelligence Scale (WAIS) is the most common “IQ test.” The scale of scores is set separately for each age group and is approximately normal with mean 100 and standard deviation 15. People with WAIS scores below 70 are considered mentally retarded when, for example, applying for Social Security disability benefits. What percent of adults are retarded by this criterion?

9. The quartiles of any distribution are the values with cumulative proportions 0.25 and 0.75.
a. What are the quartiles of the standard normal distribution?
b. Using your numerical values from (a), write an equation that gives the quartiles of N(µ, σ) distributions in terms of µ and σ.
c. The length of human pregnancies from conception to birth varies according to a distribution that is approximately normal with mean 266 days and standard deviation 16 days. Apply your results from (b): what are the quartiles of the distribution of lengths of human pregnancies?

10. Use R to generate 100 observations from the standard normal distribution. Make a histogram of these observations. How does the shape of the histogram compare with a normal density curve? Make a normal quantile plot of the data. Does the plot suggest any important deviations from normality?
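The simulation steps in exercise 10 can be sketched in code. The exercise asks for R; the version below is a Python sketch that uses only the standard library, so the function names (simulate_normal_sample, class_counts, normal_quantile_pairs), the seed, and the class limits and width are illustrative assumptions rather than part of the course material. It draws 100 standard normal observations, tallies them into classes for a text histogram, and pairs each sorted observation with a standard normal quantile, which is the information a normal quantile plot displays.

```python
import random
from statistics import NormalDist

LO, HI, WIDTH = -3.5, 3.5, 0.5  # assumed class limits and class width


def simulate_normal_sample(n=100, seed=2500):
    """Draw n observations from the standard normal distribution."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so the run is repeatable
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def class_counts(data, lo=LO, hi=HI, width=WIDTH):
    """Count the observations falling in each class [lo, lo + width), ..."""
    n_classes = round((hi - lo) / width)
    counts = [0] * n_classes
    for x in data:
        k = int((x - lo) // width)
        k = min(max(k, 0), n_classes - 1)  # put rare extreme values in the end classes
        counts[k] += 1
    return counts


def normal_quantile_pairs(data):
    """Pair the i-th smallest observation with the N(0, 1) quantile of (i - 0.5)/n."""
    n = len(data)
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return [(z.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(sorted(data), start=1)]


sample = simulate_normal_sample()
counts = class_counts(sample)
pairs = normal_quantile_pairs(sample)

# Text histogram: one star per observation in each class.
for k, count in enumerate(counts):
    left = LO + k * WIDTH
    print(f"{left:5.1f} to {left + WIDTH:5.1f} | {'*' * count}")

# First few (theoretical quantile, observed value) pairs for the quantile plot.
for q, x in pairs[:5]:
    print(f"{q:7.3f} {x:7.3f}")
```

Plotting the second coordinate of each pair against the first gives the normal quantile plot. Changing the seed repeats the exercise with a fresh sample.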
(Repeating this exercise is a good way to become familiar with how histograms and normal quantile plots look when data actually are close to normal.)

11. Use R to plot a standard normal density curve and add an N(0, 2) normal density curve on top of it (as I showed you in class). In another graph, plot a standard normal density curve and add an N(−2, 1) normal density curve on top of it. Note: to show the two density curves properly, in each graph you have to let X take values extending more than 3 standard deviations from the mean of both normal distributions.

Chapter 2

1. Mutual-fund reports often give correlations to describe how the prices of different investments are related. You look at the correlations between three Fidelity funds and the Standard & Poor’s 500-Stock Index, which describes stocks of large U.S. companies. The three funds are Dividend Growth (stocks of large U.S. companies), Small Cap Stock (stocks of small U.S. companies), and Emerging Markets (stocks in developing countries). For 2003, the three correlations are r = 0.35, r = 0.81, and r = 0.98.
a. Which correlation goes with each fund? Explain your answer.
b. The correlations of the three funds with the index are all positive. Does this tell you that stocks went up in 2003? Explain your answer.

2. A student wonders if tall women tend to date taller men than do short women. She measures herself, her dormitory roommate, and the women in the adjoining rooms; then she measures the next man each woman dates. Here are the data (heights in inches):

Women (x)   Men (y)
   66          72
   64          68
   66          70
   65          68
   70          71
   65          65

a. Make a scatterplot of these data. Based on the scatterplot, do you expect the correlation to be positive or negative? Near ±1 or not?
b. Find the correlation, r, between the heights of the men and women.
c. How would r change if all the men were 6 inches shorter than the heights given in the table? Does the correlation tell us whether women tend to date men taller than themselves?
d. If heights were measured in centimeters rather than inches, how would the correlation change? (There are 2.54 centimeters in an inch.)
e. If every woman dated a man exactly 3 inches taller than herself, what would be the correlation between male and female heights?

3. Make a scatterplot (by hand) of the following data:

x    1    2    3    4   10   10
y    1    3    3    5    1   11

Write down the formula, and calculate the correlation without using the built-in formula in your calculator. What feature of the data is responsible for reducing the correlation to this value despite a strong straight-line association between x and y in most of the observations?

4. Each of the following statements contains a blunder. Explain in each case what is wrong.
a. “There is a high correlation between the gender of American workers and their income.”
b. “We found high correlation (r = 1.09) between students’ ratings of faculty teaching and ratings made by other faculty members.”
c. “The correlation between planting rate and yield of corn was found to be r = 0.23 bushel.”

5. Many colleges offer online versions of some courses that are also taught in the classroom. It often happens that the students who enroll in the online versions do better than the classroom students on the course exams. This does not show that online instruction is more effective than classroom teaching, because the kind of people who sign up for online courses are often quite different from the classroom students. Suggest some student characteristics that you think could be confounded with online versus classroom. Use a diagram to illustrate your ideas.

6. Data show that men who are married, and also divorced or widowed men, earn quite a bit more than men who have never been married. This does not mean that a man can raise his income by getting married. Suggest several lurking variables that you think are confounded with marital status and that help explain the association between marital status and income.
Use a diagram to illustrate your ideas.

7. A study shows that there is a positive correlation between the size of a hospital (measured by its number of beds x) and the median number of days, y, that patients remain in the hospital. Does this mean that you can shorten a hospital stay by choosing a small hospital? Use a diagram to explain the association.

Chapter 3

1. A typical hour of prime-time television shows three to five violent acts. Linking family interviews and police records shows a clear association between time spent watching TV as a child and later aggressive behavior.
a. Explain why this is an observational study rather than an experiment. What are the explanatory and response variables?
b. Suggest several lurking variables describing a child’s home life that may be confounded with how much TV he or she watches. Explain why confounding makes it difficult to conclude that more TV causes more aggressive behavior.

2. Several large observational studies suggested that women who take hormones such as estrogen after menopause have lower risk of a heart attack than women who do not take hormones. Hormone replacement became popular. But in 2002, several careful experiments showed that hormone replacement does not reduce heart attacks. The National Institutes of Health, after reviewing the evidence, concluded that the observational studies were wrong. Taking hormones after menopause quickly fell out of favor.
a. Explain the difference between an observational study and an experiment to compare women who do and don’t take hormones after menopause.
b. Suggest some characteristics of women who choose to take hormones that might affect the rate of heart attacks. In an observational study, these characteristics are confounded with taking hormones.

3. Stores advertise price reductions to attract customers. What type of price cut is most attractive? Market researchers prepared ads for athletic shoes announcing different levels of discounts (20%, 40%, 60%, or 80%).
The student subjects who read the ads were also given “inside information” about the fraction of shoes on sale (25%, 50%, 75%, or 100%). Each subject then rated the attractiveness of the sale on a scale of 1 to 7.
a. There are two factors. Make a sketch that displays the treatments formed by all combinations of levels of the factors.
b. Outline a completely randomized design using 80 student subjects. Use R to conduct random trials to choose the subjects for the first treatment.

4. “Bee pollen is effective for combating fatigue, depression, cancer, and colon disorders.” So says a Web site that offers the pollen for sale. We wonder if bee pollen really does prevent colon disorders. Here are two ways to study this question. Explain why the second design will produce more trustworthy data.
a. Find 200 women who take bee pollen regularly. Match each with a woman of the same age, race, and occupation who does not take bee pollen. Follow both groups for 5 years.
b. Find 400 women who do not have colon disorders. Assign 200 to take bee pollen capsules and the other 200 to take placebo capsules that are identical in appearance. Follow both groups for 5 years.

5. Calcium is important to the development of young girls. To study how the bodies of young girls process calcium, investigators used the setting of a summer camp. Calcium was given in Hawaiian Punch at either a high or a low level. The camp diet was otherwise the same for all girls. Suppose that there are 60 campers.
a. Outline a completely randomized design for this experiment.
b. Describe a matched pairs design in which each girl receives both levels of calcium (with a “washout period” between). What is the advantage of the matched pairs design over the completely randomized design?

6. At a party there are 30 students over age 21 and 20 under age 21. You choose at random 3 of those over 21 and separately choose at random 2 of those under 21 to interview about attitudes toward alcohol.
You have given every student at the party the same chance to be interviewed: what is that chance? Why is your sample not an SRS?

7. The following figure shows histograms of four sampling distributions of statistics intended to estimate the same parameter. Label each distribution relative to the others as high or low bias and as high or low variability.

Chapter 4

1. About 30% of adult Internet users are between 18 and 29 years of age. Suppose the probability that a randomly chosen Internet user is in this age group is exactly 0.3. Use R to make a study of short-term variability and long-term regularity as follows.
a. Set the probability of heads to 0.3. Each head stands for an Internet user who is between 18 and 29 and each tail is a user who is not. Set the number of tosses to 20. What is the proportion of heads? Do this 25 times, keep a record of the 25 proportions of heads, and make a stemplot of these numbers. Lesson: In the short run (20 repetitions) proportions are quite variable and are often not close to the probability.
b. With the probability of heads still set to 0.3, make 200 tosses. What was the proportion of heads? Do this 25 times and make a stemplot of the 25 proportions of heads. Lesson: More repetitions make proportions less variable and generally closer to the probability.

2. Use the same setting as the previous question.
a. Simulate 100 draws of 20 Internet users from the population. (That is, ask the software to generate 100 binomial observations, each with n = 20 trials and probability p = 0.3 of a “yes”.) Record the count in the 18 to 29 age group on each draw. Convert the counts into percents of the 20 Internet users in each trial who are 18 to 29. Make a histogram of these 100 percents. Describe the shape, center, and spread of this distribution.
b. Now simulate drawing 320 Internet users. (That is, set n = 320 and p = 0.3.) Do this 100 times and record the percent in the 18 to 29 age group for each of the 100 draws.
Make a histogram of the percents and describe the shape, center, and spread of the distribution.
c. In what ways are the distributions in parts (a) and (b) alike? In what ways do they differ? (Because regularity emerges in the long run, we expect the results of drawing 320 subjects to be less variable than the results of drawing 20 subjects.)

3. Dugout Lou thinks that the probabilities for the American League baseball champion are as follows. The Yankees have probability 0.6 of winning. The Red Sox and Angels have equal probabilities of winning. The Athletics and White Sox have equal probabilities, but their probabilities are one-third that of the Red Sox and Angels. No other team has a chance. What is Lou’s assignment of probabilities to teams?

4. The 2000 census allowed each person to choose from a long list of races. That is, in the eyes of the Census Bureau, you belong to whatever race you say you belong to. “Hispanic/Latino” is a separate category; Hispanics may be of any race. If we choose a resident of the United States at random, the 2000 census gives these probabilities:

         Hispanic   Not Hispanic
Asian     0.000        0.036
Black     0.003        0.121
White     0.060        0.691
Other     0.062        0.027

Let A be the event that a randomly chosen American is Hispanic, and let B be the event that the person chosen is white.
a. Verify that the table gives a legitimate assignment of probabilities.
b. What is P(A)?
c. Describe \(B^c\) in words and find \(P(B^c)\) by the complement rule.
d. Express “the person chosen is a non-Hispanic white” in terms of events A and B. What is the probability of this event?

5. Most sample surveys use random digit dialing equipment to call residential telephone numbers at random. The telephone polling firm Zogby International reports that the probability that a call reaches a live person is 0.2. Calls are independent.
a. A polling firm places 5 calls. What is the probability that none of them reaches a person?
b. When calls are made to New York City, the probability of reaching a person is only 0.08. What is the probability that none of 5 calls made to New York City reaches a person?

Questions 6 through 9 are based on the following information about Mendelian inheritance of blood type. Each of us has an ABO blood type, which describes whether two characteristics called A and B are present. Every human being has two blood type alleles (gene forms), one inherited from our mother and one from our father. Each of these alleles can be A, B, or O. Which two we inherit determines our blood type. Here is a table that shows what our blood type is for each combination of two alleles:

Alleles inherited   Blood type
A and A             A
A and B             AB
A and O             A
B and B             B
B and O             B
O and O             O

We inherit each of a parent’s two alleles with probability 0.5. We inherit independently from our mother and father.

6. Hannah and Jacob both have alleles A and B.
a. What blood types can their children have?
b. What is the probability that their next child has each of these blood types?

7. Nancy and David both have alleles B and O.
a. What blood types can their children have?
b. What is the probability that their next child has each of these blood types?

8. Jennifer has alleles A and O. Jose has alleles A and B. They have two children. What is the probability that both children have blood type A? What is the probability that both children have the same blood type?

9. Jasmine has alleles A and O. Joshua has alleles B and O.
a. What is the probability that a child of these parents has blood type O?
b. If Jasmine and Joshua have three children, what is the probability that all three have blood type O? What is the probability that the first child has blood type O and the next two do not?

10. Some games of chance rely on tossing two dice. Each die has six faces, marked with 1, 2, …, 6 spots called pips. The dice used in casinos are carefully balanced so that each face is equally likely to come up.
When two dice are tossed, each of the 36 possible pairs of faces is equally likely to come up. The outcome of interest to a gambler is the sum of the pips on the two up-faces. Call this random variable X.
a. Write down all 36 possible pairs of faces.
b. If all pairs have the same probability, what must be the probability of each pair?
c. Write down the value of X next to each pair of faces and use this information with the result of (b) to give the probability distribution of X. Draw a probability histogram to display the distribution.
d. One bet available in craps wins if a 7 or an 11 comes up on the next roll of two dice. What is the probability of rolling a 7 or an 11 on the next roll?
e. Several bets in craps lose if a 7 is rolled. If any outcome other than 7 occurs, these bets either win or continue to the next roll. What is the probability that anything other than a 7 is rolled?

11. Generate two random numbers between 0 and 1 and take Y to be their sum. Then Y is a continuous random variable that can take any value between 0 and 2. The density curve of Y is the triangle shown below.
[Figure: triangular density curve of Y; horizontal axis marked at 0, 1, and 2; height = 1.]
a. Verify by geometry that the area under this curve is 1.
b. What is the probability that Y is less than 1? (Sketch the density curve, shade the area that represents the probability, then find that area. Do this for (c) also.)
c. What is the probability that Y is less than 0.5?

12. You have two instruments with which to measure the height of a tower. If the true height is 100 meters, measurements with the first instrument vary with mean 100 meters and standard deviation 1.2 meters. Measurements with the second instrument vary with mean 100 meters and standard deviation 0.85 meter. You make one measurement with each instrument. Your results are \(X_1\) for the first and \(X_2\) for the second, and are independent.
a. To combine the two measurements, you might average them,
\[ Y = \frac{X_1 + X_2}{2} \]
What are the mean and standard deviation of Y?
b. It makes sense to give more weight to the less variable measurement because it is more likely to be close to the truth. Statistical theory says that to make the standard deviation as small as possible you should weight the two measurements inversely proportional to their variances. The variance of \(X_2\) is very close to half the variance of \(X_1\), so \(X_2\) should get twice the weight of \(X_1\). That is, use
\[ W = \frac{1}{3} X_1 + \frac{2}{3} X_2 \]
What are the mean and standard deviation of W?

13. An insurance company sees that in the entire population of homeowners, the mean loss from fire is µ = $250 and the standard deviation of the loss is σ = $300. What are the mean and standard deviation of the total loss for 12 policies? (Losses on separate policies are independent.) What are the mean and standard deviation of the average loss for 12 policies?

Exercises 14 through 16 make use of the following information.

Portfolio analysis: Here are the means, standard deviations, and correlations for the annual returns from three Fidelity mutual funds for the 10 years ending in February 2004. Because there are three random variables, there are three correlations. We use subscripts to show which pair of random variables a correlation refers to.

W = annual return on 500 Index Fund: \(\mu_W = 11.2\%\), \(\sigma_W = 17.46\%\)
X = annual return on Investment Grade Bond Fund: \(\mu_X = 6.46\%\), \(\sigma_X = 4.18\%\)
Y = annual return on Diversified International Fund: \(\mu_Y = 11.10\%\), \(\sigma_Y = 15.62\%\)
Correlations: \(\rho_{WX} = 0.22\), \(\rho_{WY} = 0.56\), \(\rho_{XY} = 0.12\)

14. Many advisors recommend using roughly 20% foreign stocks to diversify portfolios of U.S. stocks. You see that the 500 Index (U.S. stocks) and Diversified International (foreign stocks) Funds had almost the same mean returns. A portfolio of 80% 500 Index and 20% Diversified International will deliver this mean return with less risk. Verify this by finding the mean and standard deviation of returns on this portfolio.

15. Diversification works better when the investments in a portfolio have small correlations.
To demonstrate this, suppose that returns on 500 Index Fund and Diversified International Fund had the means and standard deviations we have given but were uncorrelated (\(\rho_{WY} = 0\)). Show that the standard deviation of a portfolio that combines 80% 500 Index with 20% Diversified International is then smaller than your result from the previous exercise. What happens to the mean return if the correlation is 0?

16. Portfolios often contain more than two investments. The rules for means and variances continue to apply, though the arithmetic gets messier. A portfolio containing proportions a of 500 Index Fund, b of Investment Grade Bond Fund, and c of Diversified International Fund has return R = aW + bX + cY. Because a, b, and c are the proportions invested in the three funds, a + b + c = 1. The mean and variance of the portfolio return R are
\[ \mu_R = a\mu_W + b\mu_X + c\mu_Y \]
\[ \sigma_R^2 = a^2\sigma_W^2 + b^2\sigma_X^2 + c^2\sigma_Y^2 + 2ab\rho_{WX}\sigma_W\sigma_X + 2ac\rho_{WY}\sigma_W\sigma_Y + 2bc\rho_{XY}\sigma_X\sigma_Y \]
A basic well-diversified portfolio has 60% in 500 Index, 20% in Investment Grade Bond, and 20% in Diversified International. What are the (historical) mean and standard deviation of the annual returns for this portfolio? What does an investor gain by choosing this diversified portfolio over 100% U.S. stocks? What does the investor lose (at least in this time period)?

17. Here are the counts (in thousands) of earned degrees in the United States in the 2005-2006 academic year, classified by level and by the sex of the degree recipient:

          Bachelor's   Master's   Professional   Doctorate   Total
Female        784         276          39            20       1119
Male          559         197          44            25        825
Total        1343         473          83            45       1944

a. If you choose a degree recipient at random, what is the probability that the person you choose is a woman?
b. What is the conditional probability that you choose a woman, given that the person chosen received a professional degree?
c. Are the events “choose a woman” and “choose a professional degree recipient” independent? How do you know?

Exercises 18 through 20 make use of the following information.
Working: In the language of government statistics, you are “in the labor force” if you are available for work and either working or actively seeking work. The unemployment rate is the proportion of the labor force (not of the entire population) who are unemployed. Here are data from the Current Population Survey for the civilian population aged 25 years and over at the end of 2003. The table entries are counts in thousands of people.

Highest education                        Total population   In labor force   Employed
Did not finish high school                    28,021            12,623        11,552
High school but no college                    59,844            38,210        36,249
Some college, but no bachelor's degree        46,777            33,928        32,429
College graduate                              51,568            40,414        39,250

18. Find the unemployment rate for people with each level of education. How does the unemployment rate change with education? Explain carefully why your results show that level of education and being employed are not independent.

19.
a. What is the probability that a randomly chosen person 25 years of age or older is in the labor force?
b. If you know that the person chosen is a college graduate, what is the conditional probability that he or she is in the labor force?
c. Are the events “in the labor force” and “college graduate” independent? How do you know?

20. You know that a person is employed. What is the conditional probability that he or she is a college graduate? You know that a second person is a college graduate. What is the conditional probability that he or she is employed?

21. The probability that a randomly chosen student at the University of New Harmony is a woman is 0.6. The probability that the student is studying education is 0.15. The conditional probability that the student is a woman, given that the student is studying education, is 0.8. What is the conditional probability that the student is studying education, given that she is a woman?

Chapter 5

1. In each situation below, is it reasonable to use a binomial distribution for the random variable X?
Give reasons for your answer in each case. If a binomial distribution applies, give the values of n and p.
a. Most calls made at random by sample surveys don’t succeed in talking with a live person. Of calls to New York City, only 1/12 succeed. A survey calls 500 randomly selected numbers in New York City. X is the number that reach a live person.
b. At peak periods, 25% of attempted logins to an Internet service provider fail. Login attempts are independent and each has the same probability of failing. Darci logs in repeatedly until she succeeds. X is the number of the login attempt that finally succeeds.
c. On a bright October day, Canada geese arrive to foul the pond at an apartment complex at the average rate of 12 geese per hour; X is the number of geese that arrive in the next three hours.

2. In each situation below, is it reasonable to use a binomial distribution for the random variable X? Give reasons for your answer in each case.
a. An auto manufacturer chooses one car from each hour’s production for a detailed quality inspection. One variable recorded is the count X of finish defects (dimples, ripples, etc.) in the car’s paint.
b. The pool of potential jurors for a murder case contains 100 persons chosen at random from the adult residents of a large city. Each person in the pool is asked whether he or she opposes the death penalty; X is the number who say “Yes.”
c. Joe buys a ticket in his state’s “pick 3” lottery game every week; X is the number of times in a year that he wins a prize.

3. Some of the methods in this chapter are approximations rather than exact probability results. We have given rules of thumb for safe use of these approximations.
a. You are interested in attitudes toward drinking among the 75 members of a fraternity. You choose 25 members at random to interview.
One question is “Have you had five or more drinks at one time during the last week?” Suppose that in fact 20% of the 75 members would say “Yes.” Explain why you cannot safely use the B(25, 0.2) distribution for the count X in your sample who say “Yes.”
b. The National AIDS Behavioral Surveys found that 0.2% (that’s 0.002 as a fraction) of adult heterosexuals had both received a blood transfusion and had a sexual partner from a group at high risk of AIDS. Suppose that this national proportion holds for your region. Explain why you cannot safely use the normal approximation for the sample proportion who fall in this group when you interview an SRS of 500 adults.

4. “What do you think is the ideal number of children for a family to have?” A Gallup poll asked this question to 1006 randomly chosen adults. Almost half (49%) thought two children was ideal. Suppose that p = 0.49 is exactly true for the population of all adults. Gallup announced a margin of error of ±3 percentage points for this poll. What is the probability that the sample proportion \(\hat{p}\) for an SRS of size n = 1006 falls between 0.46 and 0.52? You see that it is likely, but not certain, that polls like this give results that are correct within their margin of error. We will say more about margins of error in Chapter 6.

5. Return to the Gallup poll setting of the previous question. We are supposing that the proportion of all adults who think that two children is ideal is p = 0.49. What is the probability that a sample proportion \(\hat{p}\) falls between 0.46 and 0.52 (that is, within ±3 percentage points of the true p) if the sample is an SRS of size n = 250? Of size n = 4000? Combine these results with your work in the previous question to make a general statement about the effect of larger samples in a sample survey.

6. The changing probabilities you found in questions 4 and 5 are due to the fact that the standard deviation of the sample proportion \(\hat{p}\) gets smaller as the sample size n increases.
If the population proportion is p = 0.49, how large a sample is needed to reduce the standard deviation of \(\hat{p}\) to \(\sigma_{\hat{p}} = 0.005\)? (According to the 68-95-99.7 rule, when the standard deviation is this small, about 95% of all samples will have \(\hat{p}\) within 0.01 of the true p.)

7. A selective college would like to have an entering class of 1200 students. Because not all students who are offered admission accept, the college admits more than 1200 students. Past experience shows that about 70% of the students admitted will accept. The college decides to admit 1500 students. Assuming that students make their decisions independently, the number X who accept has the B(1500, 0.7) distribution. If this number is less than 1200, the college will admit students from its waiting list.
a. What are the mean and the standard deviation of the number X of students who accept?
b. Use the normal approximation to find the probability that at least 1000 students accept.
c. The college does not want more than 1200 students. What is the probability that more than 1200 will accept?
d. If the college decides to increase the number of admission offers to 1700, what is the probability that more than 1200 will accept?

8. The scores of high school seniors on the ACT college entrance examination in 2003 had mean µ = 20.8 and standard deviation σ = 4.8. The distribution of scores is only roughly normal.
a. What is the approximate probability that a single student randomly chosen from all those taking the test scores 23 or higher?
b. Now take an SRS of 25 students who took the test. What are the mean and standard deviation of the sample mean score \(\bar{X}\) of these 25 students?
c. What is the approximate probability that the mean score \(\bar{X}\) of these students is 23 or higher?

9. North Carolina State University posts the grade distributions for its courses online. You can find that the distribution of grades in Statistics 101 in the fall 2003 semester was

Grade          A      B      C      D      F
Probability   0.21   0.43   0.30   0.05   0.01

a. Using the common scale A = 4, B = 3, C = 2, D = 1, F = 0, take X to be the grade of a randomly chosen Statistics 101 student. Find the mean µ and standard deviation σ of grades in this course.
b. Statistics 101 is a large course. We can take the grades of an SRS of 50 students to be independent of each other. If \(\bar{X}\) is the average of these 50 grades, what are the mean and standard deviation of \(\bar{X}\)?
c. What is the probability \(P(X \ge 3)\) that a randomly chosen Statistics 101 student gets a B or better? What is the approximate probability \(P(\bar{X} \ge 3)\) that the grade point average for 50 randomly chosen Statistics 101 students is B or better?

10. A $1 bet in a state lottery’s Pick 3 game pays $500 if the three-digit number you choose exactly matches the winning number, which is drawn at random. Here is the distribution of the payoff X:

Payoff X       $0      $500
Probability   0.999   0.001

Each day’s drawing is independent of other drawings.
a. What are the mean and standard deviation of X?
b. Joe buys a Pick 3 ticket every day. What does the law of large numbers say about the average payoff Joe receives from his bets?
c. What does the central limit theorem say about the distribution of Joe’s average payoff after 365 bets in a year?
d. Joe comes out ahead for the year if his average payoff is greater than $1 (the amount he spent each day on a ticket). What is the probability that Joe ends the year ahead?

11. The distribution of annual returns on common stocks is roughly symmetric, but extreme observations are more frequent than in a normal distribution. Because the distribution is not strongly nonnormal, the mean return over even a moderate number of years is close to normal. Annual real returns on the Standard & Poor’s 500-Stock Index over the period 1871 to 2004 have varied with mean 9.2% and standard deviation 20.6%. Andrew plans to retire in 45 years and is considering investing in stocks.
What is the probability (assuming that the past pattern of variation continues) that the mean annual return on common stocks over the next 45 years will exceed 15%? What is the probability that the mean return will be less than 5%?

12. According to genetic theory, the blossom color in the second generation of a certain cross of sweet peas should be red or white in a 3:1 ratio. That is, each plant has probability ¾ of having red blossoms, and the blossom colors of separate plants are independent.
a. What is the probability that exactly 6 out of 8 of these plants have red blossoms?
b. What is the mean number of red-blossomed plants when 80 plants of this type are grown from seeds?
c. What is the probability of obtaining at least 50 red-blossomed plants when 80 plants are grown from seeds?

13. Does delaying oral practice hinder learning a foreign language? Researchers randomly assigned 23 beginning students of Russian to begin speaking practice immediately and another 23 to delay speaking for 4 weeks. At the end of the semester both groups took a standard test of comprehension of spoken Russian. Suppose that in the population of all beginning students, the test scores for early speaking vary according to the N(32, 6) distribution and scores for delayed speaking have the N(29, 5) distribution.
a. What is the sampling distribution of the mean score x̄ in the early-speaking group in many repetitions of the experiment? What is the sampling distribution of the mean score ȳ in the delayed-speaking group?
b. If the experiment were repeated many times, what would be the sampling distribution of the difference ȳ − x̄ between the mean scores in the two groups?
c. What is the probability that the experiment will find (misleadingly) that the mean score for delayed speaking is at least as large as that for early speaking?

14. Suppose (as is roughly true) that 88% of college men and 82% of college women were employed last summer. A sample survey interviews SRSs of 500 college men and 500 college women. The two samples are of course independent.
a. What is the approximate distribution of the proportion p̂_F of women who worked last summer? What is the approximate distribution of the proportion p̂_M of men who worked?
b. The survey wants to compare men and women. What is the approximate distribution of the difference in the proportions who worked, p̂_M − p̂_F? Explain the reasoning behind your answer.
c. What is the probability that in the sample a higher proportion of women than men worked last summer?

15. A fair coin is tossed 250 times.
a. Name the Bernoulli and Binomial random variables involved in this experiment. State explicitly the parameters of the Bernoulli and Binomial random variables in this setting and the relationship between the Binomial and Bernoulli random variables.
b. Write down and briefly explain an expression for the probability that 120 heads are observed. Get an approximation of this probability using the normal tables.
c. Use the normal tables to approximate the probability of observing more than 140 heads.

Chapter 6

1. Suppose that the sample mean is 50 and the standard deviation is assumed to be 5. Make a diagram that illustrates the effect of sample size on the width of a 95% interval. Use the following sample sizes: 10, 20, 40, and 100. Summarize what the diagram shows.

2. A study with 25 observations gave a mean of 70. Assume that the standard deviation is 15. Make a diagram that illustrates the effect of the confidence level on the width of the interval. Use 80%, 90%, 95%, and 99%. Summarize what the diagram shows.

3. Consider the following two scenarios. (A) Take an SRS of 100 students from an elementary school with children in grades kindergarten through fifth grade. (B) Take a simple random sample of 100 third-graders from the same school.
For each of these samples you will measure the height of each child in the sample. Which sample should have the smaller margin of error for 95% confidence? Explain your answer.

4. A questionnaire about study habits was given to a random sample of students taking a large introductory statistics class. The sample of 25 students reported that they spent an average of 80 minutes per week studying statistics. Assume that the standard deviation is 35 minutes.
a. Give a 95% confidence interval for the mean time spent studying statistics by students in this class.
b. Is it true that 95% of the students in the class have weekly study times that lie in the interval you found in part (a)? Explain your answer.

5. You are planning a survey of starting salaries for recent liberal arts major graduates from your college. From a pilot study you estimate that the standard deviation is about $9000. What sample size do you need to have a margin of error equal to $40 with 95% confidence?

6. Suppose that in the setting of the previous question you are willing to settle for a margin of error of $800. Will the required sample size be larger or smaller? Verify your answer by performing the calculations.

7. To assess the accuracy of a laboratory scale, a standard weight known to weigh 10 grams is weighed repeatedly. The scale readings are normally distributed with unknown mean (this mean is 10 grams if the scale has no bias). The standard deviation of the scale readings is known to be 0.0002 gram.
a. The weight is weighed five times. The mean result is 10.0023 grams. Give a 98% confidence interval for the mean of repeated measurements of the weight.
b. How many measurements must be averaged to get a margin of error of ±0.0001 with 98% confidence?

8. A newspaper invites readers to send email stating whether they are in favor of making full-day kindergarten available to all students in the state. A total of 320 responses are received and, of these, 80% are in favor of the new program.
In an article describing the results, the authors state that the margin of error is 4% for 95% confidence. Assume that they have computed this number correctly.
a. Use the sample proportion and the margin of error to compute the 95% confidence interval.
b. Do you think that these results are trustworthy? Discuss your answer.

9. Here are several situations where there is an incorrect application of the ideas presented in Chapter 6. Write a short paragraph explaining what is wrong in each situation and why it is wrong.
a. A climatologist wants to test the null hypothesis that it will rain tomorrow.
b. A random sample of size 20 is taken from a population that is assumed to have a standard deviation of 15. The standard deviation of the sample mean is 15/20.
c. A researcher tests the following null hypothesis: Ho: x̄ = 10.

10. Here are several situations where there is an incorrect application of the ideas presented in Chapter 6. Write a short paragraph explaining what is wrong in each situation and why it is wrong.
a. A change is made that should improve student satisfaction with the way grades are processed at your college. The null hypothesis, that there is an improvement, is tested versus the alternative, that there is no improvement.
b. A significance test rejected the null hypothesis that the sample mean is 25.
c. A report on a study says that the results are statistically significant and the P-value is 0.95.

11. Translate each of the following research questions into appropriate Ho and Ha.
a. Census Bureau data show that the mean household income in the area served by a shopping mall is $72,500 per year. A market research firm questions shoppers at the mall to find out whether the mean household income of mall shoppers is higher than that of the population.
b. Last year, your company’s service technicians took an average of 1.8 hours to respond to trouble calls from business customers who had purchased service contracts.
Do this year’s data show a different average response time?

12. A test statistic for a two-sided significance test for a population mean is z = 2.3. Sketch a standard normal curve and mark this value of z on it. Find the P-value and shade the appropriate areas under the curve to illustrate your calculations.

13. The P-value for a significance test is 0.082.
a. Do you reject the null hypothesis at level α = 0.05?
b. Do you reject the null hypothesis at level α = 0.01?
c. Explain your answers.

14. The P-value for a significance test is 0.032.
a. Do you reject the null hypothesis at level α = 0.05?
b. Do you reject the null hypothesis at level α = 0.01?
c. Explain your answers.

15. A test of the null hypothesis Ho: µ = µo gives test statistic z = 1.6.
a. What is the P-value if the alternative is Ha: µ > µo?
b. What is the P-value if the alternative is Ha: µ < µo?
c. What is the P-value if the alternative is Ha: µ ≠ µo?

16. A test of the null hypothesis Ho: µ = µo gives test statistic z = −1.6.
a. What is the P-value if the alternative is Ha: µ > µo?
b. What is the P-value if the alternative is Ha: µ < µo?
c. What is the P-value if the alternative is Ha: µ ≠ µo?

17. The P-value for a two-sided test of the null hypothesis Ho: µ = 30 is 0.09.
a. Does the 95% confidence interval include the value 30? Why?
b. Does the 90% confidence interval include the value 30? Why?

18. The P-value for a two-sided test of the null hypothesis Ho: µ = 30 is 0.04.
a. Does the 95% confidence interval include the value 30? Why?
b. Does the 90% confidence interval include the value 30? Why?

19. A 95% confidence interval for a population mean is (57, 65).
a. Can you reject the null hypothesis that µ = 68 at the 5% significance level? Why?
b. Can you reject the null hypothesis that µ = 62 at the 5% significance level? Why?

20. A 90% confidence interval for a population mean is (12, 15).
a. Can you reject the null hypothesis that µ = 13 at the 10% significance level? Why?
b. Can you reject the null hypothesis that µ = 10 at the 10% significance level? Why?

21. The Survey of Study Habits and Attitudes (SSHA) is a psychological test that measures the motivation, attitude toward school, and study habits of students. Scores range from 0 to 200. The mean score for U.S. college students is about 115, and the standard deviation is about 30. A teacher who suspects that older students have better attitudes toward school gives the SSHA to 25 students who are at least 30 years of age. Their mean score is x̄ = 132.2.
a. Assuming that σ = 30 for the population of older students, carry out a test of
Ho: µ = 115
Ha: µ > 115
Report the P-value of your test, and state your conclusion clearly.
b. Your test in (a) required two important assumptions in addition to the assumption that the value of σ is known. What are they? Which of these assumptions is most important to the validity of your conclusion in (a)?

22. The level of calcium in the blood in healthy young adults varies with mean about 9.5 milligrams per deciliter and standard deviation about σ = 0.4. A clinic in rural Guatemala measures the blood calcium level of 160 healthy pregnant women at their first visit for prenatal care. The mean is x̄ = 9.57. Is this an indication that the mean calcium level in the population from which these women come differs from 9.5?
a. State Ho and Ha.
b. Carry out the test and give the P-value, assuming that σ = 0.4 in this population. Report your conclusion.
c. Give a 95% confidence interval for the mean calcium level µ in this population. We can see that µ lies quite close to 9.5. This illustrates the fact that a test based on a large sample will often declare even a small deviation from Ho to be statistically significant.

23. Explain in plain language why a significance test that is significant at the 1% level must always be significant at the 5% level.

24. You are told that a significance test is significant at the 5% level.
From this information can you determine whether or not it is significant at the 1% level? Explain your answer.

25. You will perform a significance test of Ho: µ = 0 versus Ha: µ > 0.
a. What values of z would lead you to reject Ho at the 5% level?
b. If the alternative hypothesis were Ha: µ ≠ 0, what values of z would lead you to reject Ho at the 5% level?
c. Explain why your answers to parts (a) and (b) are different.

26. Radon is a colorless, odorless gas that is naturally released by rocks and soils and may concentrate in tightly closed houses. Because radon is slightly radioactive, there is some concern that it may be a health hazard. Radon detectors are sold to homeowners worried about the risk, but the detectors may be inaccurate. University researchers placed 12 detectors in a chamber where they were exposed to 105 picocuries per liter (pCi/l) of radon over 3 days. Here are the readings given by the detectors:

 91.9    97.8   111.4   122.3   105.4    95
103.8    99.6    96.6   119.3   104.8   101.7

Assume (unrealistically) that you know that the standard deviation of readings for all detectors of this type is σ = 9.
a. Give a 95% confidence interval for the mean reading µ for this type of detector.
b. Is there significant evidence at the 5% level that the mean reading differs from the true value 105? State hypotheses and conduct a significance test based on your confidence interval from (a).

27. Consumers can purchase nonprescription medications at food stores, mass merchandise stores such as Kmart and Wal-Mart, or pharmacies. About 45% of consumers make such purchases at pharmacies. What accounts for the popularity of pharmacies, which often charge higher prices?
A study examined consumers’ perceptions of overall performance of the three types of stores, using a long questionnaire that asked about such things as “neat and attractive store,” “knowledgeable staff,” and “assistance in choosing among various types of nonprescription medication.” A performance score was based on 27 such questions. The subjects were 201 people chosen at random from the Indianapolis telephone directory. Here are the means and standard deviations of the performance scores for the sample:

Store type            x̄       s
Food store            18.67   24.95
Mass merchandisers    32.38   33.37
Pharmacies            48.60   35.62

We do not know the population standard deviations, but a sample standard deviation s from so large a sample is usually close to σ. Use s in place of the unknown σ in this exercise.
a. What population do you think the authors of the study want to draw conclusions about? What population are you certain they can draw conclusions about?
b. Give 95% confidence intervals for the mean performance for each type of store.
c. Based on these confidence intervals, are you convinced that consumers think that pharmacies offer higher quality services than the other types of stores?
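
The interval asked for in part (b) of Problem 27 has the form x̄ ± z*·s/√n, with s standing in for σ as the problem instructs. The following Python sketch shows one way to carry out that arithmetic. It is an illustration and not an official solution: the function name z_interval is hypothetical, and it assumes that each store type was rated by all n = 201 subjects, which the problem statement suggests but does not state outright.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, sd, n, level=0.95):
    """Return (low, high) for the z interval: mean +/- z* * sd / sqrt(n)."""
    # z* is the critical value with central area `level` under the
    # standard normal curve (about 1.96 for 95% confidence).
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z_star * sd / sqrt(n)
    return mean - margin, mean + margin


# Sample means and standard deviations from the table in Problem 27.
# Assumption: every store type was rated by all n = 201 subjects.
stores = {
    "Food store": (18.67, 24.95),
    "Mass merchandisers": (32.38, 33.37),
    "Pharmacies": (48.60, 35.62),
}

for name, (xbar, s) in stores.items():
    low, high = z_interval(xbar, s, n=201)
    print(f"{name}: ({low:.2f}, {high:.2f})")
```

Passing a different level (for example level=0.98) gives intervals at other confidence levels. Whether the three intervals overlap is the starting point for part (c).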
