POSTED ON: 5/3/2012
Ch. 9 Sampling Distribution

Recall the big picture of statistics: we have a question about a group that can be answered with a number, but the group is too large to measure entirely, so we measure a small part instead.

The answer to our question (the number that we can't directly measure, since the group is too large) is called a parameter. A parameter measures some feature of a population (the large group). A parameter has one (and only one) value; we just don't know what it is. The most important parameters for us are the population mean (μ) and population proportion (p or π).

The measurement that we took from the small part is called a statistic. A statistic measures some feature of a sample (the small part of the population). Since there are many, many possible samples that could have been chosen, there are many, many different possible values of a statistic. The most important statistics for us are the sample mean (x-bar) and sample proportion (p-hat).

Compare
• parameter
– mean: μ
– standard deviation: σ
– proportion: p
• Sometimes we call the parameters “true”: true mean, true proportion, etc.
• statistic
– mean: x-bar
– standard deviation: s
– proportion: p-hat
• Sometimes we call the statistics “sample”: sample mean, sample proportion, etc.

A phone-in poll conducted by a newspaper reported that 73% of those who called in liked business tycoon Donald Trump. The number 73% is a
A) Statistic
B) Sample
C) Parameter
D) Population

A phone-in poll conducted by a newspaper reported that 73% of those who called in liked business tycoon Donald Trump. The unknown true percentage of American citizens that like Donald Trump is a
A) Statistic
B) Sample
C) Parameter
D) Population

A Statistic as a Random Variable
Since a statistic can take on many different values, it is a variable. Since the statistic is measured from a random sample, a statistic is a random variable. As with all variables, we are interested in the distribution of the variable (in this case, a statistic).
The distribution of a statistic is called a Sampling Distribution (for that statistic). Thus, we will be concerned with the Sampling Distribution of the Sample Mean, and the Sampling Distribution of the Sample Proportion.

The sampling distribution of a statistic is
A) the probability that we obtain the statistic in repeated random samples.
B) the mechanism that determines whether randomization was effective.
C) the distribution of values taken by a statistic in all possible samples of the same size from the same population.
D) the extent to which the sample results differ systematically from the truth.

I flip a coin 10 times and record the proportion of heads I obtain. I then repeat this process of flipping the coin 10 times and recording the proportion of heads obtained many, many times. When done, I make a histogram of my results. This histogram represents
A) the bias, if any, that is present.
B) the true population parameter.
C) simple random sampling.
D) the sampling distribution of the proportion of heads in 10 flips of the coin.

Unbiased Statistics
Perhaps you are wondering why we care about the distribution of the statistic, when what we really want is the value of the parameter. A good question! It turns out that statistics have very special (and useful) relationships with the parameters that they are estimating, provided some conditions are met. The most important condition is that the statistic is unbiased: the mean of the sampling distribution is the same as the parameter that the statistic is estimating. So, for example, if x-bar is unbiased, $\mu_{\bar{x}} = \mu_x$. It turns out that we can use our statistic (say, x-bar) to make an estimate of the center.

Center
If x-bar is unbiased, then $\mu_{\bar{x}} = \mu_x$. For now, let's just say that a random sample is your ticket to an unbiased statistic. It's actually more complicated than that, but let's just leave it there for now.
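The coin-flip question above describes a simulation that is easy to try. The following is a minimal sketch, assuming a fair coin, 10,000 repetitions, and a fixed random seed (all three are illustrative choices, and the function name is hypothetical). It builds the simulated sampling distribution of the proportion of heads in 10 flips and checks that its mean sits close to the true proportion, which is what "unbiased" means.

```python
import random
from collections import Counter

N_FLIPS = 10  # flips per experiment, as in the question

def simulate_head_counts(n_flips=N_FLIPS, n_repeats=10_000, p=0.5, seed=0):
    """Repeat the n_flips experiment many times; return the heads count of each run."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [sum(rng.random() < p for _ in range(n_flips)) for _ in range(n_repeats)]

counts = simulate_head_counts()
p_hats = [c / N_FLIPS for c in counts]      # one sample proportion per repetition
mean_p_hat = sum(p_hats) / len(p_hats)      # center of the simulated sampling distribution

print(f"mean of the simulated p-hats: {mean_p_hat:.3f} (true p = 0.5)")

# Crude text histogram of the simulated sampling distribution.
tally = Counter(counts)
for heads in range(N_FLIPS + 1):
    share = tally[heads] / len(counts)
    print(f"p-hat = {heads / N_FLIPS:.1f} | {'#' * round(share * 100)}")
```

Individual values of p-hat vary quite a bit from run to run, but their average should land near 0.5.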
Spread
$\sigma_{\bar{x}} = \sigma_x / \sqrt{n}$
Actually, this is only (approximately) true if the size of the sample is quite small (less than 10%) compared to the population.

Shape
If the distribution of the population is normal, then the distribution of the sample mean is also normal. If the distribution of the population isn't normal, or if we don't know, then we might be in trouble: we can't calculate probabilities unless we know the shape of the distribution.

Fortunately, the Central Limit Theorem comes to the rescue! It says that as the sample size increases, the shape of the distribution of x-bar becomes more normal.

Of course, that brings another question: how big does the sample need to be in order for the distribution of the sample mean to be approximately normal? The answer is, it depends…

The shape of the population is the key. If it has a shape that is approximately normal, then you don't need a very large sample; maybe as few as 15 would do. If the population has a slightly skewed shape, then maybe you only need 30 in the sample. If the population is severely skewed, perhaps you might need 45 or more.

Again, the shape of the population is the key. You need some idea about the shape of the population in order to know how big of a sample you'll need in order to get that normal shape for the distribution of x-bar.

But how do you get an idea about the shape of the population? From the distribution of the sample (NOT the sampling distribution). The sample is your best guess as to the nature of the population. If the distribution of the sample is approximately normal, then that's good enough to assume that the population has a distribution which is approximately normal, in which case you don't need a very large sample size to claim that the shape of the distribution of x-bar is approximately normal. On the other hand, if the shape of the sample is terribly skewed, then you need a large sample in order to make an approximately normal claim about the distribution of x-bar.
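The Spread and Shape claims above can be checked by simulation. This is a rough sketch, assuming a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen here only for illustration) and hypothetical helper names. For each sample size it compares the simulated standard deviation of x-bar with σ/√n and reports the skewness of the simulated x-bar values, which should shrink toward 0 as n grows.

```python
import random
from math import sqrt

def sample_means(n, n_repeats=5000, seed=1):
    """Draw n_repeats samples of size n from an exponential population; return each sample mean."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(n_repeats)]

def std_dev(values):
    """Standard deviation of a list of numbers."""
    m = sum(values) / len(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))

def skewness(values):
    """Skewness coefficient: 0 for a symmetric shape, positive for a right-skewed one."""
    m = sum(values) / len(values)
    s = std_dev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

SIGMA = 1.0  # population standard deviation of the assumed exponential population

results = {}
for n in (5, 30, 100):
    means = sample_means(n)
    results[n] = (std_dev(means), SIGMA / sqrt(n), skewness(means))

for n, (simulated_sd, formula_sd, skew) in results.items():
    print(f"n={n:3d}  simulated sd={simulated_sd:.3f}  sigma/sqrt(n)={formula_sd:.3f}  skewness={skew:.2f}")
```

The simulated spread should track σ/√n at every sample size, while the skewness drops as n increases, which is the Central Limit Theorem at work.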
A random sample of size 25 is to be taken from a population that is normally distributed with mean 60 and standard deviation 10. The average x-bar of the observations in our sample is to be computed. The sampling distribution of x-bar is
A) normal with mean 60 and standard deviation 10.
B) normal with mean 60 and standard deviation 2.
C) normal with mean 60 and standard deviation 0.4.
D) normal with mean 12 and standard deviation 2.

An automobile insurer has found that repair claims have a mean of $920 and a standard deviation of $870. Suppose that the next 100 claims can be regarded as a random sample from the long-run claims process. The mean and standard deviation of the average x-bar of the next 100 claims are
A) mean = $920 and standard deviation = $87.
B) mean = $920 and standard deviation = $8.70.
C) mean = $92 and standard deviation = $87.
D) mean = $92 and standard deviation = $870.

For humans, gestation periods are approximately normally distributed, with mean 266 days and standard deviation 16 days. What is the probability that a single child gestates for at least 270 days? What is the probability that a (random) sample of 5 children gestate for an average of at least 270 days?

The average weight of great white sharks is 4000 lbs (with standard deviation 800 lbs). Use this information for all questions on this page.
1) Identify the parameter of interest and the statistic that estimates it.
2) Researchers wonder if the average weight is really lower, and plan on taking a sample of 100 sharks. Each such sample would produce a new mean weight, x-bar. Find the values of $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$.
3) Describe the shape of the sampling distribution of x-bar. Justify your answer.
4) Regardless of your answer to [3], assume that the sampling distribution of x-bar is approximately normally distributed with the mean and standard deviation you gave in [2]. What is the probability that a sample of 100 sharks will have a mean weight of less than 3600 lbs?
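The gestation question above contrasts one observation with the mean of a sample. As a hedged sketch of that contrast, the code below uses Python's standard `statistics.NormalDist` and the rule that the standard deviation of x-bar is σ/√n; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 266, 16   # population mean and standard deviation (days), from the question
cutoff = 270

# One child: use the population distribution itself.
single_child = NormalDist(mu, sigma)
p_single = 1 - single_child.cdf(cutoff)

# Mean of n = 5 children: same center, but spread shrinks to sigma / sqrt(n).
n = 5
sample_mean = NormalDist(mu, sigma / sqrt(n))
p_mean = 1 - sample_mean.cdf(cutoff)

print(f"P(one child gestates >= {cutoff} days)     = {p_single:.4f}")
print(f"P(mean of {n} children >= {cutoff} days)    = {p_mean:.4f}")
```

The second probability is smaller because averages vary less than individual observations, so a sample mean is less likely to stray 4 days above 266.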
Distribution of the Sample Proportion

Center
If p-hat is unbiased, then $\mu_{\hat{p}} = p$. Again, a random sample is your best bet that this condition is met.

Suppose you are going to roll a die 60 times and record p-hat, the proportion of times that an even number (2, 4, or 6) is showing. The sampling distribution of p-hat should be centered about
A) 1/6
B) 1/3
C) 1/2
D) 30

Spread
$\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$. Again, this is actually only true (close enough) if the size of the sample is relatively small compared to the population.

A survey asks a random sample of 1500 adults in Ohio if they support an increase in the state sales tax from 5% to 6%, with the additional revenue going to education. Let p-hat denote the proportion in the sample that say they support the increase. Suppose that 40% of all adults in Ohio support the increase. The standard deviation of p-hat is
A) .4
B) .24
C) .0126
D) .000126

A fair coin (one for which both the probability of heads and the probability of tails are 0.5) is tossed 60 times. The probability that fewer than 1/3 of the tosses are heads is
A) .33
B) .109
C) .09
D) 0.0043

Shape
Since p-hat is (ultimately) measuring a qualitative variable, the population cannot have a normal distribution. However, p-hat itself is quantitative, so p-hat can have a distribution that is approximately normal. In particular, p-hat will have an approximately normal distribution if np and n(1 - p) are each at least 10.

Example 9.5
• Television executives and companies who advertise on TV are interested in how many viewers watch particular television shows. According to 2001 Nielsen ratings, Survivor II was one of the most watched television shows in the US during every week that it aired.
• Suppose that the true proportion of US adults who watched Survivor II is p=.37.
• Suppose we did a survey with n=100.
• Suppose we did this survey 1000 times.

Example 9.5
• Television executives and companies who advertise on TV are interested in how many viewers watch particular television shows.
According to 2001 Nielsen ratings, Survivor II was one of the most watched television shows in the US during every week that it aired.
• Suppose that the true proportion of US adults who watched Survivor II is p=.37.
• Suppose we did a survey with n=1000.
• Suppose we did this survey 1000 times.

Different question
• An SRS of 1500 first-year college students was asked whether they applied for admission to any other college. In fact, 35% of all first-year students applied to colleges besides the one they are attending.
• What is the probability that the poll will be within 2 percentage points of the true p?

$\mu_{\hat{p}} = p = .35$
$\sigma_{\hat{p}} = \sqrt{(.35)(.65)/1500} = .0123153021$
$z = (.33 - .35)/.0123 = -1.626$
$z = (.37 - .35)/.0123 = 1.626$
$P(-1.626 < Z < 1.626) = .9484 - .0516 = .8968$

In 2010, Mars candy company reported that 35% of the M&M's produced were brown M&M's. Use this information for all questions on this slide.
1) What is the parameter of interest, and what statistic estimates it?
2) The student newspaper wants to see if this figure has increased, and plans to check 45 M&M's. Each such sample of 45 M&M's would result in a new value of p-hat. Find the values of $\mu_{\hat{p}}$ and $\sigma_{\hat{p}}$.
3) Describe the shape of the sampling distribution of p-hat. Justify your answer.
4) Regardless of your answer to [3], assume that the shape of the sampling distribution of p-hat is approximately normal with the mean and standard deviation you gave in [2]. What is the probability that a sample of 45 M&M's will result in more than 26 brown M&M's?

Suppose we select an SRS of size n = 100 from a large population having proportion p of successes. Let X be the number of successes in the sample. For which value of p would it be safe to assume the sampling distribution of X is approximately normal?
A) .01
B) 1/9
C) .975
D) .9999

According to USA Today, 56% of the residents of Alaska own cell phones. What is the probability that a random sample of 500 Alaskans will contain fewer than 275 that own cell phones?
*Solve as binomial and proportion.

In a test of ESP (extrasensory perception), the experimenter looks at cards that are hidden from the subject. Each card contains either a star, a circle, a wavy line, or a square. An experimenter looks at each of 100 cards in turn, and the subject tries to read the experimenter's mind and name the shape on each. What is the probability that the subject gets more than 30 correct if the subject does not have ESP and is just guessing?
A) 0.310.
B) 0.250.
C) 0.123.
D) 0.043.

If a statistic used to estimate a parameter is such that the mean of its sampling distribution is equal to the true value of the parameter being estimated, the statistic is said to be
A) random
B) biased
C) a proportion
D) unbiased

The variability of a statistic is described by
A) the spread of its sampling distribution.
B) the amount of bias present.
C) the vagueness in the wording of the question used to collect the sample data.
D) the stability of the population it describes.

A random variable X has mean $\mu_X$ and standard deviation $\sigma_X$. Suppose n independent observations of X are taken and the average x-bar of these n observations is computed. We can assert that if n is very large, the sampling distribution of x-bar is approximately normal. This assertion follows from
A) the law of large numbers
B) the central limit theorem
C) the definition of sampling distribution
D) the bell curve

A researcher initially plans to take an SRS of size n from a population that has mean 80 and standard deviation 20. If he were to double his sample size (to 2n), the standard deviation of the sampling distribution of x-bar would change by a factor of
A) $\sqrt{2}$
B) $1/\sqrt{2}$
C) 2
D) 1/2

The weights of extra-large eggs have a normal distribution with a mean of 1 ounce and a standard deviation of 0.1 ounces. The probability that a dozen eggs weighs more than 13 ounces is closest to
A) 0.0000.
B) 0.0020.
C) 0.1814.
D) 0.2033.
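The note in this set of questions to solve the Alaska cell-phone problem "as binomial and proportion" can be sketched as follows. This is an illustration under stated assumptions, not a prescribed solution: the exact route sums binomial probabilities for X = 0, …, 274 with n = 500 and p = .56, and the proportion route treats p-hat as approximately normal with mean p and standard deviation √(p(1-p)/n), with no continuity correction. The variable names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 500, 0.56      # sample size and population proportion, from the question
threshold = 275       # "fewer than 275" means X <= 274

# Route 1: exact binomial probability P(X <= 274).
p_exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(threshold))

# Route 2: normal approximation for p-hat, P(p-hat < 275/500).
# Conditions np >= 10 and n(1 - p) >= 10 hold here (280 and 220).
sigma_p_hat = sqrt(p * (1 - p) / n)
p_normal = NormalDist(p, sigma_p_hat).cdf(threshold / n)

print(f"exact binomial       : {p_exact:.4f}")
print(f"normal approximation : {p_normal:.4f}")
```

The two answers should be close but not identical, since the normal curve smooths over a count that can only take whole-number values.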
The distribution of actual weights of 8-ounce chocolate bars produced by a certain machine is normal with mean 8.1 ounces and standard deviation 0.1 ounces. If a sample of five of these chocolate bars is selected, the probability that their average weight is less than 8 ounces is
A) 0.0125.
B) 0.1853.
C) 0.4871.
D) 0.9873.

The distribution of actual weights of 8-ounce chocolate bars produced by a certain machine is normal with mean 8.1 ounces and standard deviation 0.1 ounces. If a sample of five of these chocolate bars is selected, there is only a 5% chance that the average weight of the sample of five of the chocolate bars will be below
A) 7.94 ounces.
B) 8.03 ounces.
C) 8.08 ounces.
D) 8.20 ounces.

In a large population of adults, the mean IQ is 112 with a standard deviation of 20. Suppose 200 adults are randomly selected for a market research campaign. The probability that the sample mean IQ is greater than 110 is approximately
A) 0.079.
B) 0.421.
C) 0.921.
D) 0.579.
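The two chocolate-bar questions above run the same calculation in opposite directions: one goes from a cutoff to a probability, the other from a probability to a cutoff. Here is a small sketch of both, assuming Python's standard `statistics.NormalDist` and using illustrative variable names; since the population is stated to be normal, x-bar is normal even for n = 5.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8.1, 0.1, 5                  # values given in the questions
x_bar = NormalDist(mu, sigma / sqrt(n))     # sampling distribution of the sample mean

# Forward direction: cutoff -> probability.
p_below_8 = x_bar.cdf(8.0)

# Reverse direction: probability -> cutoff (the 5th percentile of x-bar).
cutoff_5_percent = x_bar.inv_cdf(0.05)

print(f"P(x-bar < 8 ounces)       = {p_below_8:.4f}")
print(f"5% of sample means fall below {cutoff_5_percent:.3f} ounces")
```

A printed normal table rounds z to two decimals, so table-based answers may differ from these in the last digit.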