Chapter 6: Standard Deviation as a Ruler and the Normal Models

A student takes two exams in a class and wants to know whether she did better on the first or second exam. The results of the exams for the class were quite different, so how can she compare her scores? One common way to compare them is through z-scores (also called standardized scores), which measure how many standard deviations above or below the mean the test scores are. The z-score is computed as:

$$z = \frac{y - \bar{y}}{s}$$

where $\bar{y}$ is the mean and $s$ is the standard deviation.

Example: Suppose that the test 1 scores have a mean of 72.3 with a standard deviation of 13.8, and the test 2 scores have a mean of 59.0 with a standard deviation of 18.5.

1. Suppose she scores an 82 on the 1st test and a 71 on the 2nd test. Which score is better (relative to the class)? Using z-scores:

   Test 1: $z = \frac{82 - 72.3}{13.8} = \frac{9.7}{13.8} = 0.70$

   Test 2: $z =$

2. Consider another student (Hagrid) who scored a 50 on the 1st test, and a 38 on the 2nd test. On which test was his performance “better” based on the z-score criterion?

Notes:
• z-scores are unitless.
• Computing the z-score involves subtracting a measure of center (the mean) and dividing by a measure of spread (the standard deviation).
• z-scores have many useful properties, at least for symmetric, unimodal distributions without outliers. We just need to be cautious that we only use z-scores when they are appropriate.

What are z-scores really measuring?

Computing z-scores involves shifting and rescaling the data values. To understand what the z-value actually measures, we need to understand the effects of shifting and rescaling a set of data values.

1. Shifting: Shifting a set of data values means adding or subtracting a constant from every data value. Consider the following summary statistics and histogram displaying the heights of females who previously took this course.

   Min   Q1    M     Q3    Max   IQR   Mean     Std.Dev.
   146   164   168   173   188   9     168.19   7.172

   [Figure: histogram of heights; x-axis: Height (centimeters), y-axis: Frequency]

   Suppose we added 3 centimeters to each data value. How would this affect:
   • the 5-number summary?
   • the median?
   • the IQR?
   • the mean?
   • the standard deviation?

2. Rescaling: Rescaling a set of data values means multiplying or dividing all values by the same constant. Suppose we multiply each data value by 2. How would this affect:
   • the 5-number summary?
   • the median?
   • the IQR?
   • the mean?
   • the standard deviation?

3. Standardizing: Standardizing data into z-scores means shifting them by the mean $\bar{y}$ and then rescaling them by dividing the shifted values by the standard deviation $s$.
   • What are the mean and standard deviation of the resulting z-scores?

Normal Models:

The normal distribution is an idealized model that is often used for distributions that are unimodal and roughly symmetric: “mound-shaped.”

[Figure: histogram of armspans; x-axis: Armspan (centimeters), y-axis: Frequency]

• It is a bell-shaped curve that can be used to approximate the histogram of a distribution of a quantitative variable.
• A normal model is completely specified by the mean µ and standard deviation σ of the model. That is, there is a normal model for every possible value of µ and every value of σ > 0.
• A model is not useful unless it is flexible enough to be used in a variety of situations. In other words, by choosing different values of µ and σ, we can use the normal distribution to represent SAT scores, weights of newborn babies, or heights of US adult women, as long as the distributions are mound-shaped & symmetric.
• “All models are wrong; some models are useful” - George Box (1970s).

Why use a model? Why not just use the actual distribution?

1. A model is compact. Saying that SAT math scores are approximately normal with mean µ = 500 and standard deviation σ = 100 (written N(500, 100)) is much more compact than giving the entire set of SAT scores.
2. It’s easier to work with a model than the raw data, and it’s often possible to derive simple & useful results (see the 68-95-99.7 rule).
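The shifting, rescaling, and standardizing questions earlier in this chapter can also be explored numerically. The following is a minimal Python sketch and is not part of the original notes: the five height values are hypothetical (chosen only for illustration, not the class data), and the helper names `z_score` and `summarize` are assumptions made for this example.

```python
from statistics import mean, stdev


def z_score(y, center, spread):
    """Number of standard deviations that y lies above (+) or below (-) the center."""
    return (y - center) / spread


def summarize(values):
    """Return (mean, sample standard deviation) of a list of numbers."""
    return mean(values), stdev(values)


# Test 1 from the example: score 82, class mean 72.3, standard deviation 13.8.
z_test1 = z_score(82, 72.3, 13.8)
print("Test 1 z-score:", round(z_test1, 2))

# Hypothetical heights in centimeters (illustrative only).
heights = [158.0, 164.0, 168.0, 173.0, 177.0]

shifted = [y + 3 for y in heights]   # shifting: add a constant to every value
rescaled = [2 * y for y in heights]  # rescaling: multiply every value by a constant

m, s = summarize(heights)
standardized = [z_score(y, m, s) for y in heights]  # shift by mean, divide by SD

print("original    :", summarize(heights))
print("shifted     :", summarize(shifted))
print("rescaled    :", summarize(rescaled))
print("standardized:", summarize(standardized))
```

Comparing the printed (mean, standard deviation) pairs is one way to check your answers to the shifting, rescaling, and standardizing questions.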
The Model Equation: The normal model is a curve represented by the equation:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \quad \text{for } -\infty < x < \infty.$$

• Normal curves have the same shape regardless of the values of µ and σ. The curve for the SAT math scores is shown on the next page.
• If a distribution of data follows a N(µ, σ) model, then if we standardize all of the data values using $z = \frac{y - \mu}{\sigma}$, the standardized values will follow a N(0, 1) model.

The Empirical (68-95-99.7) Rule: One useful result for the normal model is that approximately:

1. 68% of the values fall within 1 standard deviation of µ.
2. 95% of the values fall within 2 standard deviations of µ.
3. 99.7% of the values fall within 3 standard deviations of µ.

Example (Back to SAT math scores ∼ N(500, 100)):

1. According to the empirical rule, about 95% of SAT math scores fall between what two values?
2. About what percentage of scores are below 400? Above 700?
3. About what percentage of scores are between 300 and 400?
4. What percentile is 400? What percentile is 700?
5. About what percentage of scores are below 650?

Calculating Proportions with Normal Models

• Normal models for the distributions of quantitative variables represent proportions by areas under the normal curve. The area under the entire normal curve is 1.
• We can find areas under normal curves using a calculator, software, or a table of the standard normal distribution, given in Table Z on pages A-57 & A-58 in the back of the text. Use of the table is illustrated here.
• To use Table Z, we must first standardize the data distribution to the standard normal distribution using $z = \frac{y - \mu}{\sigma}$.

Example: Suppose the annual snowfall amounts (inches) in Missoula are well-modeled by a normal distribution with mean 46 inches and standard deviation 18 inches.

Question: According to the normal model, what proportion of years have snowfall amounts below 25 inches?

Answer: Let y = annual snowfall (in.). We want the area shaded (where y < 25).
Convert this to a statement about the N(0, 1) distribution by computing the z-score for 25:

$$z = \frac{y - \mu}{\sigma} = \frac{25 - 46}{18} = -1.17.$$

So, the area (y < 25) is the same as the area (z < −1.17). By Table Z, this area is 0.1210. Hence, according to the normal model, about 12.1% of years in Missoula have snowfalls below 25 inches.

Question: According to the normal model, approximately what proportion of years have between 25 and 50 inches of snowfall?

Answer: The area is the same as the area between the corresponding z-scores on the N(0, 1) distribution:

$$z = \frac{25 - 46}{18} = -1.17 \quad \text{and} \quad z = \frac{50 - 46}{18} = 0.22.$$

The area between these two values is the area to the left of 0.22 minus the area to the left of −1.17, which from Table Z is: 0.5871 − 0.1210 = 0.4661. Hence, the normal model estimates that about 46.6% of years have snowfall amounts between 25 and 50 inches.

• Does this seem reasonable based on the graph?
• Approximately what proportion of Missoula annual snowfalls are above 50 inches?

Using z in Reverse: Sometimes, instead of wanting to know what proportion of values are in some interval, we want to know what value corresponds to some proportion (the reverse question).

Question: According to the normal model, what is the 10th percentile of snowfall amounts? (i.e., what is the snowfall such that only 10% of snowfalls are smaller?)

Answer: First, sketch the distribution and guess roughly where the 10th percentile is. We’ll call this unknown 10th percentile y. This sketch gives you a check to see if your answer is reasonable. It should be clear that the 10th percentile here is less than 28 inches. Why?

• First, we calculate the 10th percentile for the standard normal using Table Z. We look for the value z such that the area to the left of this value is about 0.10. The closest we can get is 0.1003, which corresponds to z = −1.28. So, the 10th percentile is 1.28 standard deviations below the mean.
• For the N(46, 18) model, this means that if we standardize the 10th percentile y, we should get z = −1.28. Computing:

$$\frac{y - 46}{18} = -1.28 \implies y = 46 - 1.28(18) = 46 - 23.04 = 22.96 \text{ inches}.$$

• Given your picture, is this reasonable?
• Key Hint to Doing Normal Calculations: DRAW A PICTURE!!!

Example: The weight of a certain candy bar is advertised as 8 ounces. The actual weights of these candy bars are not all 8 ounces, but vary. The distribution of actual weights is closely approximated by a normal model with mean 8.2 ounces and standard deviation 0.13 ounces.

1. According to the normal model, what proportion of the candy bars weigh less than the advertised weight of 8 ounces?
2. According to the normal model, what proportion of the candy bars weigh between 7.9 and 8.1 ounces?
3. According to the normal model, what is the weight such that only 1% of candy bars weigh more than this weight? (For what percentile is this asking?)
4. According to the normal model, how unlikely would it be to obtain a chocolate bar that weighed at least 8.8 ounces? Impossible?
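The Table Z answers in the Missoula snowfall example can be cross-checked with software. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist` class; it is not part of the original notes, and the variable names are illustrative.

```python
from statistics import NormalDist

# Normal model for annual snowfall in Missoula (inches): mean 46, SD 18.
snow = NormalDist(mu=46, sigma=18)

# Proportion of years with snowfall below 25 inches: area to the left of 25.
p_below_25 = snow.cdf(25)

# Proportion of years with snowfall between 25 and 50 inches:
# area to the left of 50 minus area to the left of 25.
p_25_to_50 = snow.cdf(50) - snow.cdf(25)

# "z in reverse": the snowfall amount with 10% of the area to its left.
y_10th = snow.inv_cdf(0.10)

print(f"P(y < 25)       = {p_below_25:.4f}")
print(f"P(25 < y < 50)  = {p_25_to_50:.4f}")
print(f"10th percentile = {y_10th:.2f} inches")
```

The software results should match the Table Z answers (0.1210, 0.4661, and 22.96 inches) only approximately, because the table method rounds each z-score to two decimal places before looking up the area.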