Statistics 111 - Lecture 7
Probability
Normal Distribution and Standardization
June 5, 2008

Administrative Notes
• Homework 2 due on Monday

Outline
• Law of Large Numbers
• Normal Distribution
• Standardization and Normal Table

Data versus Random Variables
• Data variables are variables for which we actually observe values
  • E.g. heights of students in the Stat 111 class
• For these data variables, we can directly calculate the statistics s² (sample variance) and x̄ (sample mean)
• Random variables are things that we don't directly observe, but we still have a probability distribution of all possible values
  • E.g. heights of the entire Penn student population

Law of Large Numbers
• The rest of the course will be about using data statistics (x̄ and s²) to estimate parameters of random variables (μ and σ²)
• Law of Large Numbers: as the size of our data sample increases, the mean x̄ of the observed data variable approaches the mean μ of the population
• If our sample is large enough, we can be confident that our sample mean is a good estimate of the population mean!
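
The Law of Large Numbers can be seen in a small simulation. This is only a sketch: the population below (heights that are Normal with mean 170 and SD 10) is a made-up assumption for illustration, not data from the course, and `sample_mean` is a hypothetical helper name.

```python
import random
import statistics

# Hypothetical population parameters (illustrative only, not from the lecture)
POP_MEAN = 170.0
POP_SD = 10.0

def sample_mean(n, seed=0):
    """Draw n values from the assumed N(POP_MEAN, POP_SD^2) population
    and return the sample mean (x-bar)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    draws = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    return statistics.fmean(draws)

# As n grows, the sample mean should settle closer to POP_MEAN
for n in (10, 100, 1_000, 100_000):
    xbar = sample_mean(n)
    print(f"n = {n:>7}: x-bar = {xbar:.3f}, error = {abs(xbar - POP_MEAN):.3f}")
```

The error for any single small sample can be large or small by chance; the point is that large samples are reliably close to the population mean.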

The Normal Distribution
• The Normal distribution has the shape of a “bell curve” with parameters μ and σ² that determine the center and spread

Different Normal Distributions
• Each different value of μ and σ² gives a different Normal distribution, denoted N(μ, σ²)
  [Figure: density curves for N(0,1), N(2,1), N(-1,2), N(0,2)]
• We can adjust values of μ and σ² to provide the best approximation to observed data
• If μ = 0 and σ² = 1, we have the Standard Normal distribution

Property of Normal Distributions
• Normal distribution follows the 68-95-99.7 rule:
  • 68% of observations are between μ - σ and μ + σ
  • 95% of observations are between μ - 2σ and μ + 2σ
  • 99.7% of observations are between μ - 3σ and μ + 3σ

Calculating Probabilities
• For more general probability calculations, we have to do integration
• For the standard normal distribution, we have tables of probabilities already made for us!
• If Z follows N(0,1): P(Z < -1.00) = 0.1587

Standard Normal Table
• If Z has N(0,1): P(Z > 1.46) = 1 - P(Z < 1.46) = 1 - 0.9279 = 0.0721
• What if we need to do a probability calculation for a non-standard Normal distribution?
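
The table lookups above can also be reproduced in software. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later) in place of the printed table; the helper names `prob_below`, `prob_above`, and `prob_within` are made up for this illustration.

```python
from statistics import NormalDist

# Standard Normal distribution N(0, 1); note NormalDist takes the SD, not the variance
Z = NormalDist(mu=0.0, sigma=1.0)

def prob_below(z):
    """P(Z < z): the quantity a standard normal table lists."""
    return Z.cdf(z)

def prob_above(z):
    """P(Z > z) = 1 - P(Z < z)."""
    return 1.0 - Z.cdf(z)

def prob_within(k):
    """P(-k < Z < k): the share of observations within k SDs of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

print("P(Z < -1.00) =", round(prob_below(-1.00), 4))
print("P(Z > 1.46)  =", round(prob_above(1.46), 4))

# The 68-95-99.7 rule is a rounded version of these three values
for k in (1, 2, 3):
    print(f"within {k} SD:", round(prob_within(k), 4))
```

The cumulative distribution function `cdf` plays the role of the integration mentioned above, so no table is needed.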

Standardization
• If we only have a standard normal table, then we need to transform our non-standard normal distribution into a standard one
• This process is called standardization

Standardization Formula
• We convert a non-standard normal distribution into a standard normal distribution using a linear transformation
• If X has a N(μ, σ²) distribution, then we can convert to Z, which follows a N(0,1) distribution:
  Z = (X - μ)/σ
• First, subtract the mean μ from X
• Then, divide by the standard deviation σ of X

Linear Transformations of Variables
• Sometimes we need to do simple mathematical operations on our variables, such as adding and/or multiplying with constants:
  Y = a·X + b
• Example: changing temperature scales
  Fahrenheit = 9/5 × Celsius + 32
• How are means and variances affected?

Mean/Variances of Linear Transforms
• For transformed variable Y = a·X + b:
  mean(Y) = a·mean(X) + b
  Var(Y) = a²·Var(X)
  SD(Y) = |a|·SD(X)
• Note that adding a constant b does not affect measures of spread (variance and SD)

More Complicated Linear Functions
• We can also do linear transformations involving more than one variable:
  Z = a·X + b·Y + c
• The mean formula is similar:
  mean(Z) = a·mean(X) + b·mean(Y) + c
• If X and Y are also independent, then
  var(Z) = a²·var(X) + b²·var(Y)
• Need a more complicated variance formula (in book) if the variables are not independent

Standardization Example
Dear Abby, You wrote in your column that a woman is pregnant for 266 days. Who said so? I carried my baby for 10 months and 5 days.
My husband is in the Navy and it could not have been conceived any other time because I only saw him once for an hour, and I didn’t see him again until the day after the baby was born. I don’t drink or run around, and there is no way the baby isn’t his, so please print a retraction about the 266-day carrying time because I am in a lot of trouble! -San Diego Reader

Standardization Example
• According to well-documented data, gestation time follows a normal distribution with mean of 266 days and SD of 16 days
• Let X = gestation time. What percent of babies have gestation time greater than 310 days (10 months & 5 days)?
• Need to convert X = 310 into standard Z:
  Z = (X - μ)/σ = (310 - 266)/16 = 44/16 = 2.75
• P(X > 310) = P(Z > 2.75) = 1 - P(Z < 2.75) = 1 - 0.9970 = 0.0030
• So, only a 0.3% chance of a pregnancy lasting as long as 310 days!

Reverse Standardization
• Sometimes, we need to convert a standard normal Z into a non-standard normal X
• Example: what is the length of pregnancy below which we have 10% of the population?
• From the table, we see P(Z < -1.28) = 0.10
• Reverse standardization formula: X = σ·Z + μ
• For Z = -1.28, we calculate X = 16·(-1.28) + 266 ≈ 246 days (8.2 months)

Another Example
• NCAA Division 1 SAT Requirements: athletes are required to score at least 820 on combined math and verbal SAT
• In 2000, SAT scores were normally distributed with mean of 1019 and SD of 209
• What percentage of students have scores greater than 820?
  Z = (X - μ)/σ = (820 - 1019)/209 = -199/209 ≈ -0.95
• P(X > 820) = P(Z > -0.95) = 1 - P(Z < -0.95)
• P(Z < -0.95) = 0.17, so P(X > 820) = 0.83
• 83% of students meet NCAA requirements

SAT Verbal Scores
• Now, just look at X = Verbal SAT score, which is normally distributed with mean of 505 and SD of 110
• What Verbal SAT score will place a student in the top 10% of the population?
• From the table, P(Z > 1.28) = 0.10
• Need to reverse standardize to get X:
  X = σ·Z + μ = 110·1.28 + 505 ≈ 646
• So, a student needs a Verbal SAT score of 646 in order to be in the top 10% of all students

Next Class - Lecture 8
• Chapter 5: Sampling Distributions
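
The standardization and reverse standardization examples from this lecture (gestation time and SAT scores) can be re-checked numerically. This is a sketch using `NormalDist` from Python's standard `statistics` module (Python 3.8 or later) instead of the printed table; the function names are hypothetical helpers, not part of any course material. Because the software uses the exact z-value (about ±1.2816) rather than the table's rounded ±1.28, results can differ slightly from the hand calculations.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)

def standardize(x, mu, sigma):
    """Z = (X - mu) / sigma."""
    return (x - mu) / sigma

def reverse_standardize(z, mu, sigma):
    """X = sigma * Z + mu."""
    return sigma * z + mu

def prob_greater(x, mu, sigma):
    """P(X > x) for X ~ N(mu, sigma^2), computed via the standardized Z."""
    return 1.0 - STD_NORMAL.cdf(standardize(x, mu, sigma))

def value_at_quantile(p, mu, sigma):
    """The x with P(X < x) = p, found by reverse standardizing the z-quantile."""
    return reverse_standardize(STD_NORMAL.inv_cdf(p), mu, sigma)

# Gestation time: mean 266 days, SD 16 days
print("Z for 310 days:     ", standardize(310, 266, 16))
print("P(X > 310):         ", round(prob_greater(310, 266, 16), 4))
print("10th percentile:    ", round(value_at_quantile(0.10, 266, 16), 1), "days")

# Combined SAT: mean 1019, SD 209; Verbal SAT: mean 505, SD 110
print("P(SAT > 820):       ", round(prob_greater(820, 1019, 209), 2))
print("Top-10% Verbal score:", round(value_at_quantile(0.90, 505, 110)))
```

Keeping `standardize` and `reverse_standardize` as separate functions mirrors the two formulas in the slides, so each printed value can be traced back to one step of the hand calculation.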