# Probability by PQ97A8

```
Statistics 111 - Lecture 7

Probability

Normal Distribution
and Standardization

June 5, 2008      Stat 111 - Lecture 7 - Normal Distribution

• Homework 2 due on Monday

Outline

• Law of Large Numbers
• Normal Distribution
• Standardization and Normal Table

Data versus Random Variables

• Data variables are variables for which we
actually observe values
• E.g., heights of students in the Stat 111 class
• For these data variables, we can directly calculate the statistics
s² and x̄

• Random variables are things that we don't
directly observe, but we still have a probability
distribution of all possible values
• E.g., heights of the entire Penn student population

Law of Large Numbers
• Rest of course will be about using data
statistics (x̄ and s²) to estimate parameters of
random variables (μ and σ²)
• Law of Large Numbers: as the size of our
data sample increases, the mean x̄ of the
observed data variable approaches the mean μ
of the population
• If our sample is large enough, we can be
confident that our sample mean is a good
estimate of the population mean!

The Normal Distribution
• The Normal distribution has the shape of a “bell
curve” with parameters μ and σ² that determine
its center and spread





Different Normal Distributions
• Each different value of μ and σ² gives a
different Normal distribution, denoted N(μ,σ²)

[Figure: density curves for N(0,1), N(2,1), N(-1,2), and N(0,2)]

• We can adjust values of μ and σ² to provide
the best approximation to observed data
• If μ = 0 and σ² = 1, we have the Standard
Normal distribution

Property of Normal Distributions
• Normal distribution follows the 68-95-99.7 rule:
• 68% of observations are between μ - σ and μ + σ
• 95% of observations are between μ - 2σ and μ + 2σ
• 99.7% of observations are between μ - 3σ and μ + 3σ

Calculating Probabilities
• For more general probability calculations, we
have to do integration

For the standard normal distribution,
we have tables of probabilities

If Z follows N(0,1):

P(Z < -1.00) = 0.1587

Standard Normal Table
If Z has N(0,1):

P(Z > 1.46)
= 1 - P(Z < 1.46)
= 1 - 0.9279
= 0.0721

• What if we need to do a probability calculation for
a non-standard Normal distribution?

Standardization
• If we only have a standard normal table, then we
need to transform our non-standard normal
distribution into a standard one
• This process is called standardization

                                         1

                                         0

June 5, 2008             Stat 111 - Lecture 7 - Normal           11
Distribution
Standardization Formula
• We convert a non-standard normal distribution
into a standard normal distribution using a linear
transformation
• If X has a N(μ,σ²) distribution, then we can
convert to Z which follows a N(0,1) distribution

Z = (X - μ)/σ

• First, subtract the mean μ from X
• Then, divide by the standard deviation σ of X

Linear Transformations of Variables
• Sometimes we need to do simple mathematical
operations on our variables, such as adding and/or
multiplying by constants

Y = a ·X + b

• Example: changing temperature scales
Fahrenheit = (9/5)·Celsius + 32

• How are means and variances affected?

Mean/Variances of Linear Transforms
• For transformed variable Y = a·X + b

mean(Y) = a·mean(X) + b
Var(Y) = a²·Var(X)
SD(Y) = |a|·SD(X)

• Note that adding a constant b does not affect measures
of spread (variance and SD)

More complicated linear functions
• We can also do linear transformations involving
more than one variable:
Z = a·X + b·Y + c
• The mean formula is similar:
mean(Z) = a·mean(X) + b·mean(Y) + c
• If X and Y are also independent, then
var(Z) = a²·var(X) + b²·var(Y)
• Need more complicated variance formula (in book) if
the variables are not independent

Standardization Example
Dear Abby,

You wrote in your column that a woman is pregnant for
266 days. Who said so? I carried my baby for 10
months and 5 days. My husband is in the Navy and it
could not have been conceived any other time because I
only saw him once for an hour, and I didn’t see him
again until the day after the baby was born. I don’t drink
or run around, and there is no way the baby isn’t his, so
please print a retraction about the 266-day carrying time
because I am in a lot of trouble!

Standardization Example
• According to well-documented data, gestation
time follows a normal distribution with mean μ
of 266 days and SD σ of 16 days
• Let X = gestation time. What percent of
babies have gestation time greater than 310
days (10 months & 5 days)?
• Need to convert X = 310 into standard Z

Z = (X - μ)/σ = (310 - 266)/16 = 44/16 = 2.75

Standardization Example
P(X > 310)
= P(Z > 2.75)
= 1 - P(Z < 2.75)
= 1 - 0.9970
= 0.0030

So, only a 0.3% chance of a pregnancy
lasting longer than 310 days!

Reverse Standardization
• Sometimes, we need to convert a standard
normal Z into a non-standard normal X
• Example: what is the length of pregnancy
below which we have 10% of the population?
• From table, we see P(Z < -1.28) = 0.10
• Reverse Standardization formula:

X = σ·Z + μ

• For Z = -1.28, we calculate
X = -1.28·16 + 266 ≈ 246 days (8.2 months)

Another Example
• NCAA Division 1 SAT Requirements: athletes
are required to score at least 820 on combined
math and verbal SAT
• In 2000, SAT scores were normally distributed
with mean μ of 1019 and SD σ of 209
• What percentage of students have scores
greater than 820?

Z = (X - μ)/σ = (820 - 1019)/209 = -199/209 ≈ -0.95

Another Example
• P(X > 820) = P(Z > -0.95) = 1 - P(Z < -0.95)

• P(Z < -0.95) = 0.17, so P(X > 820) = 0.83
• 83% of students meet NCAA requirements

SAT Verbal Scores
• Now, just look at X = Verbal SAT score, which
is normally distributed with mean μ of 505 and
SD σ of 110
• What Verbal SAT score will place a student in
the top 10% of the population?

SAT Verbal Scores
• From the table, P(Z > 1.28) = 0.10

• Need to reverse standardize to get X:

X = σ·Z + μ = 110·1.28 + 505 ≈ 646

• So, a student needs a Verbal SAT score
of 646 in order to be in the top 10% of all
students

Next Class - Lecture 8

• Chapter 5: Sampling Distributions


```
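The standardization and reverse standardization steps in the slides can be checked numerically. The sketch below is not part of the lecture: it assumes Python's standard-library `statistics.NormalDist` as a stand-in for the printed standard normal table, and the helper names `standardize` and `reverse_standardize` are hypothetical. Because the code uses exact quantiles rather than z-values rounded to two decimals, its results may differ slightly from the table-based answers.

```python
from statistics import NormalDist

# Standard normal distribution N(0,1); replaces the printed table lookup.
std_normal = NormalDist(mu=0, sigma=1)


def standardize(x, mu, sigma):
    """Convert a value x from N(mu, sigma^2) to a standard z-score."""
    return (x - mu) / sigma


def reverse_standardize(z, mu, sigma):
    """Convert a standard z-score back to the N(mu, sigma^2) scale."""
    return sigma * z + mu


# 68-95-99.7 rule: probability within 1, 2, and 3 SDs of the mean.
within = [std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)]

# Gestation example: X ~ N(266, 16^2), P(X > 310).
z_gestation = standardize(310, mu=266, sigma=16)
p_longer_than_310 = 1 - std_normal.cdf(z_gestation)

# Gestation length below which 10% of pregnancies fall.
gestation_10th_pct = reverse_standardize(std_normal.inv_cdf(0.10), 266, 16)

# SAT example: X ~ N(1019, 209^2), P(X > 820).
z_sat = standardize(820, mu=1019, sigma=209)
p_meets_ncaa = 1 - std_normal.cdf(z_sat)

# Verbal SAT score needed for the top 10%: X ~ N(505, 110^2).
verbal_top_10 = reverse_standardize(std_normal.inv_cdf(0.90), 505, 110)

print(f"68-95-99.7 rule: {[round(p, 4) for p in within]}")
print(f"z for 310 days: {z_gestation:.2f}")
print(f"P(gestation > 310): {p_longer_than_310:.4f}")
print(f"10th percentile of gestation: {gestation_10th_pct:.1f} days")
print(f"P(SAT > 820): {p_meets_ncaa:.2f}")
print(f"Verbal SAT cutoff for top 10%: {verbal_top_10:.0f}")
```

`NormalDist(266, 16).cdf(310)` would give the same probability without standardizing by hand; the explicit z-score is kept here only to mirror the steps on the slides.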