
# Statistical Inference and the Normal Distribution

STA 570 401-402
Spring 2006
Review of Inference

   The group of all individuals we are interested
in is called the population. We rarely
actually observe the entire population. If our
question is “will extending the school year by
5 days increase student learning?” then we
are interested in ALL students. We are never
going to design an experiment involving ALL
students.
Parameters

   Numerical aspects of the population are
called parameters. If our population is all
people who drive to work, one parameter is
their average drive time each morning.
   Because we rarely see the entire population,
parameters are typically unknown.
   The goal of inference is to estimate these
unknown parameters.
Samples and Statistics

   We typically observe a small fraction of the
population (we’d prefer to see all of it, but
that just typically isn’t practical). The group
we observe is called the sample. We see
them, we can measure them, etc.
   Any numerical aspect of the sample is called
a statistic. Suppose again we are interested
in the drive time of all drivers, and we send
out a survey. The people who respond are
the sample. Their average drive time is
called the sample mean.
Statistics to Parameters

   Fortunately, probability theory tells us that if
our sample is drawn correctly (i.e. randomly),
then our statistic will be close to our
parameter, allowing us to make educated
guesses about the parameter of interest.
   Drawing a random sample is sometimes
easy, and sometimes difficult (stay tuned,
we’ll cover this more as we go). For now,
we’re going to assume we have a good
sample.
Remember the main idea

   We do NOT see the parameter, we DO see
the statistic.
   Probability theory says there is a little “tether”
connecting the two.
   Imagine seeing a hot air balloon (the
statistic) on a tether over some treetops. You
can’t see where on the ground it is tethered
(the parameter), but you can make a good
guess.
Some limitations of the tether idea

   I like the tether idea, but there are limitations
on how far it applies.
   The “tether” is only probabilistic. It says
things like “there is a 95% chance the
statistic will be within (some number) of the
parameter” and “there is a 99% chance the
statistic will be within (some other number) of
the parameter”, and so on.
More on tethers, continued

   To get a larger probability, you have to increase the
length of the tether. This, I hope, is intuitive. To be
more sure of the result, you have to give the
statistics more room to move.
   If you’re aiming at a dartboard, there is a small
chance you’ll hit the little circle in the middle. There
is a larger chance you’ll hit the dartboard (it’s
bigger). There is a great chance you’ll hit the wall.
The bigger the target, the better the chance of hitting
it. Hence, the longer the tether, the better the chance
of finding the parameter.
Binomial distribution review

   Recall that a binomial setting consists of a fixed
number of responses that are
   1) dichotomous (two-valued),
   2) equally likely to be successes (the same
chance of success on each response), and
   3) independent (responses do not influence
each other).
Inference with Binomial distributions

   Under the binomial setting, if p is the
population proportion, then the sample
proportion phat has a 95% chance of being
within the region p ± 1.96 sqrt(p(1-p)/n)
   In practice, p is unknown, so we use phat to
construct our tether length as well. The
length of the tether (really called the “margin
of error”) is 1.96 sqrt(phat(1-phat)/n)
Binomial Confidence intervals

   In practice, suppose we have n observations in a
binomial setting. We can use those to compute phat
(p remains unknown). A 95% confidence interval for
p is

   phat ± 1.96 sqrt(phat(1-phat)/n)
   To get a 90% confidence interval, replace 1.96 with
1.645. To get a 99% confidence interval, replace
1.96 with 2.576. Typically large confidence levels are
used, but you could in theory construct a 50%
confidence interval, where the coefficient is 0.674.
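As a sketch of the recipe above (in Python with only the standard library, since the course's SAS examples come later in lab), the interval is just the sample proportion plus or minus the coefficient times the standard-error term:

```python
import math

def binomial_ci(successes, n, z=1.96):
    """Confidence interval for a population proportion p.

    z is the normal coefficient: 0.674 for 50%, 1.645 for 90%,
    1.96 for 95%, 2.576 for 99%.
    """
    phat = successes / n                               # sample proportion
    margin = z * math.sqrt(phat * (1 - phat) / n)      # margin of error
    return phat - margin, phat + margin

# 95% interval for 60 successes out of 100 trials
low, high = binomial_ci(60, 100)
print(round(low, 3), round(high, 3))  # → 0.504 0.696
```

Note that passing a smaller coefficient (say z=1.645 for 90%) gives a narrower interval, matching the tether idea: less confidence, shorter tether.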
Another example

   Does a personal phone call make students
more likely to enroll? Suppose you sample
200 admitted students at random and make
a personal phone call encouraging them to
attend your university. Of those 200, 127
eventually enroll. Construct a 90%
confidence interval for the proportion of
called students who enroll.
Another example continued

   Population = all students who may receive a phone
call.
   Sample = the students you actually called (the 200)
   phat = 127/200 = 63.5%
   For 90% confidence, the margin of error is 1.645
sqrt(phat(1-phat)/n) = 1.645 sqrt(0.635*0.365/200) =
1.645 * 0.034 = 0.056.
   The 90% confidence interval is 0.635 ± 0.056, or
between 57.9% and 69.1%
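The arithmetic in this example can be checked in a few lines (Python here purely as a calculator; the course itself uses SAS):

```python
import math

# 90% confidence interval for the phone-call example: 127 of 200 enroll
phat = 127 / 200                                   # sample proportion, 0.635
margin = 1.645 * math.sqrt(phat * (1 - phat) / 200)
lower, upper = phat - margin, phat + margin
print(round(lower, 3), round(upper, 3))  # → 0.579 0.691
```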
To repeat, because it’s important

   If you want more confidence (a better chance
of your interval containing the parameter),
you have to increase the width of your
interval (that’s why the coefficients increase,
from 1.645 for 90% to 2.576 for 99%)
   Larger sample sizes produce more accuracy
than smaller sample sizes.
Normal Distributions

   So where did the 1.96, the 1.645, and the
2.576 come from?
   Answer – the normal distribution, also known
as a Gaussian distribution, the error function,
the “bell curve”, and probably others.
   In any case, the normal distribution is your
friend.
You’ve probably all seen a bell curve…
The Normal distribution is common

   Lots of real data follows a normal shape. For
example
   1) Many/most biometric measurements
(heights, femur lengths, skull diameters, etc.)
   2) Scores on many standardized exams (IQ
tests) are forced into a normal shape before
reporting
   3) Many quality control measurements, if you
take the log first, have a normal shape.
When sampling from a normal

   Normal distributions are typically
characterized by two numbers, their mean or
“expected value” which corresponds to the
peak, and their “standard deviation” which is
the distance from the mean to the inflection
point.
   Large standard deviations result in “spread
out” normals. Small standard deviations
result in “strongly peaked” distributions.
Two normals, corresponding to
different standard deviations.

   Mean=100, std.dev = 16
   Mean=100, std.dev = 4
Probabilities from a Normal
distribution

   Normal distributions have a nice property
that, knowing the mean (μ) and standard
deviation (σ), we can tell how much data will
fall in any region.
   Examples – the normal distribution is
symmetric, so 50% of the data is smaller
than μ and 50% is larger than μ.
More Normal Probabilities

   It is always true that about 68% of the data
appears within 1 standard deviation of the
mean (so about 68% of the data appears in
the region μ±σ)
Yet more normal probabilities

   It is also true that about 95% of the data appears
within 2 standard deviations of the mean, and
about 99.7% of the data appears within 3
standard deviations of the mean (so it's
VERY rare to go beyond 3 standard
deviations).
   Preview of coming attractions, the EXACT
number is that 95% of the data is within 1.96
standard deviations of the mean. That’s
where the 1.96 comes from.
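These percentages can be verified numerically. As an illustration (Python's standard library; the normal CDF is Φ(z) = (1 + erf(z/√2))/2, no SAS needed for this check):

```python
import math

def normal_cdf(z):
    """P(Z <= z) for a standard normal, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return normal_cdf(k) - normal_cdf(-k)

print(round(within(1), 4))     # → 0.6827  (the "68%" rule)
print(round(within(1.96), 4))  # → 0.95    (the exact 95% coefficient)
print(round(within(3), 4))     # → 0.9973  (the "99.7%" rule)
```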
95% within 2 standard deviations,
99.7% within 3 standard deviations
Computing more general probabilities

   Suppose you want to know how much data
appears within 1.5 standard deviations of the
mean, or how much data appears between
1.3 and 1.7 standard deviations of the mean.
   Real answer – use SAS or any of several
other programs.
Another way

   There is another way of computing normal
probabilities that is 1) the way it used to be
done, back in pre-handy-computer days, 2)
useful for understanding more about the
normal distribution.
   The number of standard deviations an
observation is from the mean is called the Z-
score for that observation.
Z-score examples

   If μ=100 and σ=16 (this is true of IQ scores in
the U.S.), then an observation X=125 is 25
points above the mean, which corresponds
to 25/16 = 1.5625 standard deviations above
the mean.
   In general, the Z-score for an observation X is
Z=(X-μ)/σ
   Observations above the mean get positive Z-
scores, observations below the mean get
negative Z-scores.
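The Z-score formula is a one-liner; a minimal sketch (Python, reproducing the IQ example above):

```python
def z_score(x, mu, sigma):
    """Number of standard deviations the observation x lies from the mean mu."""
    return (x - mu) / sigma

print(z_score(125, 100, 16))  # → 1.5625 (above the mean, positive)
print(z_score(75, 100, 16))   # → -1.5625 (below the mean, negative)
```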
Computing probabilities with Z-scores

   Fortunately, the Z-score is all you need to
know to compute probabilities from a normal
distribution.
   The reason is that Z-scores map directly to
percentiles.
   For each Z-score SAS can provide the
percentile (to be shown in lab). For example,
if the Z-score is 1, the percentile is 84.13%. If
the Z-score is 2.3, then the percentile is
98.93%
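The percentiles SAS reports are just the standard normal CDF evaluated at the Z-score; as an illustration, the same numbers from Python's standard-library error function:

```python
import math

def percentile(z):
    """Percentile corresponding to a Z-score (standard normal CDF, via erf)."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(percentile(1), 2))    # → 84.13
print(round(percentile(2.3), 2))  # → 98.93
```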
Probabilities between Z-scores

   Again, IQ scores are normally distributed with mean
100 and standard deviation 16.
   What fraction of people have IQ scores between 90
and 120?
   Compute the corresponding Z-scores. For 90, the Z-
score is (90-100)/16 = -0.625. For 120, the Z-score is
(120-100)/16 = 1.25.
   Find the corresponding percentiles (SAS). The
percentile for Z=1.25 is 89.43%. The percentile for
Z=(-0.625) is 26.6%.
   The amount between these is 89.43 – 26.60 =
62.83%
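Carrying full precision through (rather than rounding the two percentiles before subtracting) gives 62.84%, agreeing with the 62.83% above up to rounding. A sketch of the calculation (Python standing in for SAS):

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Fraction of IQ scores (mean 100, std dev 16) between 90 and 120
frac = normal_cdf(120, 100, 16) - normal_cdf(90, 100, 16)
print(round(100 * frac, 2))  # → 62.84
```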
Comparing observations from different
normal distributions

   The central idea is that a Z-score
corresponds to a percentile for the
observations.
   If you have observations from multiple
normal distributions, you can compute the Z-
score for each observation and compare
which has the “better” score.
Example

   Suppose you have two students, one with a 23 on
the ACT (mean 22 and standard deviation 3) and
another with a 1220 on the SAT (mean 900 and
standard deviation 250).
   The Z-score for the student with the ACT is (23-22)/3
= 0.33 while the Z-score for the student with the SAT
is (1220-900)/250 = 1.28.
   The student with the SAT performed much better
(relative to peers on the exam).
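The comparison in a few lines (Python as a calculator):

```python
def z_score(x, mu, sigma):
    """Standard deviations from the mean."""
    return (x - mu) / sigma

act = z_score(23, 22, 3)       # ACT: mean 22, std dev 3
sat = z_score(1220, 900, 250)  # SAT: mean 900, std dev 250
print(round(act, 2), round(sat, 2))  # → 0.33 1.28
```

The larger Z-score (the SAT student's 1.28) marks the stronger performance relative to that exam's population.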
