Sampling Methods


Sampling Methods and Sampling Distributions
Potential sampling errors
Sampling Distributions and the Central Limit Theorem
Confidence Intervals
Review
Review of terms
A target population is the entire group of
elements about which we want information.
A sample is part of the target population.
Inference
Making an inference means using sample
results to describe the population.
Sample (known) → Inference → Population (unknown)
We don't know the mean of the population, so we
have to infer it from samples of the population.
Sampling Questions
What errors might there be in a sample
conducted over the phone?
If you wanted to estimate the number of
people who would vote Liberal if an
election was held tomorrow, how would
you go about it?
Sampling Terminology
An Element is an object on which we take a
Measurement. Objects that are people are called
Subjects.
A Target Population is a collection of elements
about which we wish to make an Inference.
Sampling Units are non-overlapping collections
of elements from the target population.
A Frame is a list of sampling units.
The sampling Design specifies the Method of
selecting the sample.
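One common design is the simple random sample, where every subset of n units in the frame has the same chance of being chosen. As a minimal sketch, Python's standard `random.sample` does exactly this. The frame of 1,000 household IDs and the helper name `simple_random_sample` are hypothetical, for illustration only.

```python
import random

# Hypothetical frame: a list of 1,000 sampling units (household IDs)
frame = [f"HH-{i:04d}" for i in range(1, 1001)]

def simple_random_sample(frame, n, seed=None):
    """Select n distinct units; every possible subset of size n is equally likely."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(frame, n)

sample = simple_random_sample(frame, n=10, seed=42)
print(sample)
```

Note that the design only controls how units are drawn from the frame; it cannot fix a frame that leaves out part of the target population (see selection error below).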
Errors in Survey Sampling
Selection Error
The sampling frame does not represent the target population:
members of the target population are excluded from the
sample.
Example: we are interested in determining filmgoers' attitudes
toward horror films, but the sampling frame is households that
own a VCR. Many filmgoers do not own VCRs, so we have
committed a selection error.
Increasing the Sample Size Will Not Help.
Errors in Survey Sampling
Response Error
Respondents do not:
1) Understand question
2) Have the information
3) Want to give the information.
Ask 13-year-old school students the following question:
"How often do you imbibe intoxicating spirits?"
Respondents may not understand the question or may not answer honestly.
Increasing the Sample Size Will Not Help.
Errors in Survey Sampling
Non-Response Error
Respondents are not representative of the sampling
frame.
– Be concerned when a large percentage of the sampling
frame does not respond.
– Lower-income families may ignore mailed surveys.
– Families with two wage earners eat out often and are
often not at home when an interviewer calls.
Increasing the Sample Size Will Not Help.
More terms:
Parameters and Statistics
A population parameter is a numerical
measure that describes the target
population.
A sample statistic is an estimate of the
unknown population parameter and will
vary from sample to sample.
A small population (N = 5)
Number of bedrooms per household: 1 2 2 3 5
$$\mu = \frac{1+2+2+3+5}{5} = 2.6$$
$$\sigma = \sqrt{\frac{(1-2.6)^2 + 2(2-2.6)^2 + (3-2.6)^2 + (5-2.6)^2}{5}} = 1.36$$
Note that the denominator for the standard
deviation calculation is N = 5 because this is a
population.
A single sample of size n = 2
from the population of N = 5
Sample values: 2 5
$$\bar{x} = \frac{2+5}{2} = 3.5$$
$$s = \sqrt{\frac{(2-3.5)^2 + (5-3.5)^2}{2-1}} = 2.12$$
Do not expect:
Sample mean to equal population mean of 2.6.
Sample standard deviation to equal
population standard deviation of 1.36.
Sample Statistics
Note that the sample standard deviation has
a different formula from the population
standard deviation: it divides by n − 1 instead of N.
To help keep the ideas separate we use
different symbols for populations and
samples:
The population mean is $\mu$; the sample mean is $\bar{x}$.
The population standard deviation is $\sigma$; the sample standard deviation is $s$.
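The two formulas map directly onto Python's standard `statistics` module: `pstdev` divides by N (population) and `stdev` divides by n − 1 (sample). A minimal sketch reproducing the bedroom figures above:

```python
from statistics import mean, pstdev, stdev

population = [1, 2, 2, 3, 5]   # bedrooms per household, N = 5
sample = [2, 5]                # one sample of size n = 2

mu = mean(population)          # population mean
sigma = pstdev(population)     # population SD: divides by N
x_bar = mean(sample)           # sample mean
s = stdev(sample)              # sample SD: divides by n - 1

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")   # mu = 2.60, sigma = 1.36
print(f"x_bar = {x_bar:.2f}, s = {s:.2f}")     # x_bar = 3.50, s = 2.12
```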
Margin of Error
Because sample statistics and population
parameters will almost always differ,
we incur some error whenever we
take a sample.
But what affects the amount of error?
Dartboard example
Margins of Error
House type   Bedrooms
V            1
W            2
X            2
Y            3
Z            5
Margin of error: the possible difference between the sample
result and the result we would obtain if we selected the
entire population. We want it to be as small as possible.
Samples of 3 from population
Sample   Values   Sample mean
V,W,X    1 2 2    1.67
V,W,Y    1 2 3    2
V,W,Z    1 2 5    2.67
...      ...      ...
W,Y,Z    2 3 5    3.33
X,Y,Z    2 3 5    3.33
Effect of sample size
                          Samples of size 3   Samples of size 4
Population mean           2.6                 2.6
Smallest sample mean      1.67                2
Largest sample mean       3.33                3
Maximum margin of error   0.93                0.6
Increasing the sample size reduces the margin of
error
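Because the population has only five houses, every possible sample can be listed. A minimal sketch using `itertools.combinations` (samples drawn without replacement, as in the table) that reproduces the smallest and largest sample means and the maximum margin of error for samples of size 3 and 4:

```python
from itertools import combinations
from statistics import mean

houses = {"V": 1, "W": 2, "X": 2, "Y": 3, "Z": 5}
mu = mean(houses.values())  # population mean, 2.6

def sample_means(size):
    """Mean bedrooms for every possible sample (without replacement) of the given size."""
    return {
        ",".join(combo): mean(houses[h] for h in combo)
        for combo in combinations(houses, size)
    }

for size in (3, 4):
    means = sample_means(size)
    max_moe = max(abs(m - mu) for m in means.values())  # worst-case sampling error
    print(f"n={size}: {len(means)} samples, "
          f"smallest={min(means.values()):.2f}, largest={max(means.values()):.2f}, "
          f"max margin of error={max_moe:.2f}")
```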
Effect of level of confidence (all 10 samples of size 3)
                                         MOE = ±0.5   MOE = ±0.7
Population mean                          2.6 bdrms    2.6 bdrms
Number of intervals that contain
the population mean                      5            7
Level of confidence                      50%          70%
Increasing the level of confidence increases the
margin of error
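The interval counts can be checked by brute force. This sketch builds the interval sample mean ± MOE for each of the 10 samples of size 3 and counts how many contain the population mean of 2.6; the fraction that do is the level of confidence.

```python
from itertools import combinations
from statistics import mean

bedrooms = [1, 2, 2, 3, 5]
mu = mean(bedrooms)
means = [mean(c) for c in combinations(bedrooms, 3)]  # all 10 samples of size 3

coverage = {}
for moe in (0.5, 0.7):
    # An interval "hits" if mu lies between x_bar - MOE and x_bar + MOE
    hits = sum(1 for m in means if m - moe <= mu <= m + moe)
    coverage[moe] = hits
    print(f"MOE = ±{moe}: {hits} of {len(means)} intervals contain mu "
          f"({hits / len(means):.0%} confidence)")
```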
Effect of population variance
New Population:
1 2 4 6 7 bdrms
                          Small variance   Large variance
                          (1 2 2 3 5)      (1 2 4 6 7)
Population mean           2.6              4
Smallest sample mean      1.67             2.33
Largest sample mean       3.33             5.67
Maximum margin of error   0.93             1.67
As the variance of the population increases, the
margin of error also increases
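A short sketch comparing the two populations with samples of size 3 (all samples, drawn without replacement). The helper `max_margin_of_error` is a hypothetical name for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev

def max_margin_of_error(population, n):
    """Largest gap between any size-n sample mean and the population mean."""
    mu = mean(population)
    return max(abs(mean(c) - mu) for c in combinations(population, n))

small_var = [1, 2, 2, 3, 5]
large_var = [1, 2, 4, 6, 7]
for name, pop in (("small variance", small_var), ("large variance", large_var)):
    print(f"{name}: sigma = {pstdev(pop):.2f}, "
          f"max MOE (n=3) = {max_margin_of_error(pop, 3):.2f}")
```

The larger population spread (sigma of about 2.28 versus 1.36) translates directly into a larger worst-case margin of error.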
Summary:
Sampling Lessons
Increasing the sample size
reduces the margin of error.
If we increase the level of confidence in an
inference, the price we pay is a larger margin of
error.
As the variability of the target population
increases, the margin of error increases.
Sampling Distribution
What is a sampling distribution of the
mean?
Bedrooms (samples of n = 3)
The sampling distribution contains
all possible sample means.
$$\mu_{\bar{x}} = 2.6 = \text{mean}$$
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.36}{\sqrt{3}} = 0.79 = \text{standard error}$$
[Figure: Sampling distribution. Histograms of the population (left) and of the sample means for n = 3 (right).]
This is a sampling distribution
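The formula sigma/sqrt(n) describes the spread of sample means when each draw is independent, i.e. sampling with replacement (or from a very large population). This sketch lists every sample of size 3 from the bedroom population both ways and compares the spread of the sample means with sigma/sqrt(n). Computed from the unrounded sigma, sigma/sqrt(3) is about 0.78; the 0.79 above comes from rounding sigma to 1.36 first. Without replacement from only N = 5 houses the spread is noticeably smaller, which is what the small-population correction later in these notes accounts for.

```python
from itertools import combinations, product
from math import sqrt
from statistics import mean, pstdev

bedrooms = [1, 2, 2, 3, 5]
n = 3
sigma = pstdev(bedrooms)

# Every ordered sample of size 3 drawn WITH replacement (5**3 = 125 samples)
with_repl = [mean(sample) for sample in product(bedrooms, repeat=n)]
# Every sample of size 3 drawn WITHOUT replacement (10 samples)
without_repl = [mean(sample) for sample in combinations(bedrooms, n)]

print(f"sigma / sqrt(n)           = {sigma / sqrt(n):.3f}")
print(f"with replacement:    mean = {mean(with_repl):.2f}, sd = {pstdev(with_repl):.3f}")
print(f"without replacement: mean = {mean(without_repl):.2f}, sd = {pstdev(without_repl):.3f}")
```

In both cases the mean of the sample means equals the population mean, 2.6.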
Standard Error of the Mean
The standard deviation of the sampling
distribution measures the spread of the sample
means around their mean and is called the
standard error of the mean.
The standard error of the mean is smaller than
the standard deviation of the population.
Why?
2 New Populations (both N=6)
A: 1, 1, 2, 4, 5, 5
B: 1, 2, 3, 3, 4, 5
[Figure: histograms of Population A and Population B over the values 1 to 5.]
Central Limit Theorem
No matter what the population distribution
looks like (provided it has a finite variance), the
sampling distribution of the mean will end up looking
like a normal distribution once n is large enough.
[Figure: histograms of Populations A and B (top) and of the sample means drawn from A and from B (bottom).]
Try playing with the Central Limit
Theorem on the class web page.
- Try different sample sizes (n).
- Try different population distributions.
- See how the sampling distributions
look more normal as n increases.
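If the class web page is not to hand, a rough stand-in is a small simulation. This sketch (an assumption, not the web page's own code) draws 10,000 random samples with replacement from Population A, which is far from normal, and prints a crude text histogram of the sample means for n = 2, 5 and 30. The helper names are hypothetical.

```python
import random
from statistics import mean, pstdev

population_a = [1, 1, 2, 4, 5, 5]   # Population A from the slides (not normal)

def simulate_sample_means(population, n, reps=10_000, seed=1):
    """Draw `reps` random samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)
    return [sum(rng.choices(population, k=n)) / n for _ in range(reps)]

def text_histogram(values, lo, hi, bins=12, width=40):
    """Crude text histogram so the shape can be eyeballed without a plotting library."""
    counts = [0] * bins
    step = (hi - lo) / bins
    for v in values:
        i = min(int((v - lo) / step), bins - 1)  # the value hi goes in the last bin
        counts[i] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        print(f"{lo + i * step:5.2f} | {'#' * round(width * c / peak)}")

for n in (2, 5, 30):
    means = simulate_sample_means(population_a, n)
    print(f"\nn = {n}: mean of means = {mean(means):.3f}, sd of means = {pstdev(means):.3f}")
    text_histogram(means, 1, 5)
```

As n grows, the histogram should narrow around 3 (the population mean) and take on a bell shape, while the standard deviation of the means tracks sigma/sqrt(n).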
[Figure: histograms of a population and of the sampling distribution of the mean for samples of size n = 2, 3, 4, 5 and 6.]
Some Conclusions
                     Population             Sampling distribution
Mean                 $\mu$ (unknown)        $\mu_{\bar{x}} = \mu$
Standard deviation   $\sigma$ (unknown)     $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
Shape                Any shape              Approximately normal provided n > 30
Estimating Unknown Population Parameters
Unknown parameter                     Sample statistic
Mean $\mu$                            $\bar{x}$
Standard deviation $\sigma$           $s$
Standard error $\sigma/\sqrt{n}$      $s/\sqrt{n}$
Why does the Central Limit Theorem work?
As sample size increases
– most sample means will be close to the population
mean.
– some sample means will be relatively far above or
below the population mean.
– a few sample means will be very far above or
below the population mean.
Together, these bullets describe the shape of a normal distribution.
Lessons
The mean of the sampling distribution of the sample
mean is the same as the mean of the population
from which the samples are drawn.
The standard error of the mean is smaller than
the standard deviation of the population (for n > 1).
Lessons
The standard error of the mean decreases as the
sample size increases.
If the population is normal or the sample size is
sufficiently large, the distribution of the sample
mean will be near-normal. We will be able to
use the standard normal table to compute
probabilities for the sample means.
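Python's standard `statistics.NormalDist` can stand in for the standard normal table. A minimal sketch under a hypothetical setting: a large population with the bedroom figures mu = 2.6 and sigma = 1.36, and a random sample of n = 36 households. It computes the probability that the sample mean exceeds 3 bedrooms.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical setting: large population with mu = 2.6, sigma = 1.36 bedrooms,
# random sample of n = 36 households
mu, sigma, n = 2.6, 1.36, 36
standard_error = sigma / sqrt(n)

# By the CLT, x_bar is approximately Normal(mu, sigma / sqrt(n))
sampling_dist = NormalDist(mu, standard_error)
p_above_3 = 1 - sampling_dist.cdf(3.0)      # P(x_bar > 3)

z = (3.0 - mu) / standard_error             # the value you would look up in a z-table
print(f"SE = {standard_error:.3f}, z = {z:.2f}, P(x_bar > 3) = {p_above_3:.4f}")
```

Looking up z in the table and taking one minus the table value gives the same probability as `1 - NormalDist().cdf(z)`.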
Two assumptions for the Central Limit Theorem to work
1) Samples are drawn randomly from the
population (each possible sample has an
equal chance of being chosen).
2) The population is (near) normal or the
sample size is large ($n \ge 30$).
Overview of Inference
1) Select a simple random sample.
2) Compute sample statistics and verify assumptions.
3) Construct a confidence interval that includes a margin of error.
4) Draw a conclusion about a population parameter.
Confidence Interval
A confidence interval is a range estimate of
an unknown population parameter.
The level of confidence associated with an
interval estimate is the percentage of
intervals that will include the unknown
population parameter over a large number of
similarly constructed intervals.
– This is the same kind of confidence we had in the
margin of error in an earlier lecture (dartboard example).
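Confidence Intervals
[Figure: sampling distribution of the mean, centred on $\mu$.]
$\mu \pm 1.645\,\sigma/\sqrt{n}$ contains 90% of sample means
$\mu \pm 1.96\,\sigma/\sqrt{n}$ contains 95% of sample means
$\mu \pm 2.58\,\sigma/\sqrt{n}$ contains 99% of sample means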
What does 95% confidence look
like? ($\alpha = 0.05$)
Each tail has probability $\alpha/2 = 0.025$.
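Intervals and Confidence Level
[Figure: sampling distribution of the mean, centred on $\mu_{\bar{x}} = \mu$, with probability $\alpha/2$ in each tail and $1 - \alpha$ in the middle.]
Intervals extend from
$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad\text{to}\quad \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$100(1-\alpha)\%$ of intervals contain $\mu$; $100\alpha\%$ do not.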
Margin of Error
So a confidence interval adds
and subtracts a margin of error from the
sample mean.
The margin of error is:
$$\text{MOE} = z\frac{\sigma}{\sqrt{n}}$$
but if we don't know $\sigma$, then we have to use
s (the sample standard deviation) instead.
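As a minimal sketch, the interval is just the sample mean plus and minus the margin of error, with s standing in for the unknown sigma. The function name and the numbers in the usage line (mean 50, s = 10, n = 100) are hypothetical.

```python
from math import sqrt

def confidence_interval(x_bar, s, n, z=2.0):
    """Return (low, high) = x_bar -/+ z * s / sqrt(n).

    s stands in for the unknown population sigma; z = 2 is the 95% rule of thumb.
    """
    moe = z * s / sqrt(n)
    return x_bar - moe, x_bar + moe

print(confidence_interval(50, 10, 100))   # (48.0, 52.0)
```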
Assumptions for confidence intervals
1. Random samples
2. If n < 30, the population must be near
normal to construct a confidence interval.
(If $n \ge 30$, the sampling distribution is
"close enough" to normal whatever the
population.)
Margin of Error - Three Lessons
$$\text{MOE} = z\frac{s}{\sqrt{n}}$$
Lesson 1: As the sample size ($n$) increases, the
margin of error decreases.
Lesson 2: As the confidence level (and hence $z$) increases,
the margin of error increases.
Lesson 3: As the variance ($s^2$) increases, the margin
of error increases.
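The three lessons can be seen by varying one input of the formula at a time. The baseline values (z = 2, s = 10, n = 100, giving MOE = 2) are hypothetical.

```python
from math import sqrt

def margin_of_error(z, s, n):
    """MOE = z * s / sqrt(n)."""
    return z * s / sqrt(n)

# Hypothetical baseline: z = 2 (about 95%), s = 10, n = 100  ->  MOE = 2.0
print("Lesson 1 (n):", [round(margin_of_error(2, 10, n), 2) for n in (25, 100, 400)])
print("Lesson 2 (z):", [round(margin_of_error(z, 10, 100), 2) for z in (1.64, 2, 2.58)])
print("Lesson 3 (s):", [round(margin_of_error(2, s, 100), 2) for s in (5, 10, 20)])
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error ([4.0, 2.0, 1.0] in the first line).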
Rules of thumb
Some quick rules of thumb for z (values come
from normal distribution)
For confidence of 90% use z = 1.64
For confidence of 95% use z = 2
For confidence of 99% use z = 2.58
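The exact values behind these rules of thumb come from the inverse of the standard normal distribution, leaving alpha/2 in each tail. A minimal sketch with Python's standard `statistics.NormalDist`; 60% is included because it appears in the width comparison below.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, sd 1

for confidence in (0.60, 0.90, 0.95, 0.99):
    alpha = 1 - confidence
    z = standard_normal.inv_cdf(1 - alpha / 2)  # leaves alpha/2 in each tail
    print(f"{confidence:.0%} confidence: z = {z:.3f}")
```

This prints roughly 0.842, 1.645, 1.960 and 2.576, so z = 2 for 95% is a slightly conservative rounding of 1.96.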
T-distribution
What is a t-value?
– Sophisticated statistical way of dealing with smaller sample
sizes by using slightly different values instead of the “rule
of thumb” z-values
Do I have to care?
– No. The increased "accuracy" of t-values is possibly spurious and
not worth the added effort. Just think "z-value" wherever
you see "t-value" and use the rule of thumb.
If accuracy is important, can I use the t-value anyway?
– Yes. Statpro calculations automatically use t-values.
Width versus meaningfulness of Confidence Intervals
For n = 1,000   60% CI                      99% CI
WIDTH           MOE = $0.84\,s/\sqrt{n}$     MOE = $2.58\,s/\sqrt{n}$
CONFIDENCE      LOW                         HIGH
GOAL: A narrow confidence interval and a high level
of confidence.
Try the confidence interval demonstration on
the class web page.
Try different values of $\alpha$. Count how many of
the confidence intervals contain the population
mean.
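The same experiment can be run offline. This sketch repeatedly samples from a hypothetical skewed population of 1,000 values (not from the slides), builds x_bar ± z·s/sqrt(n) each time, and reports the fraction of intervals that contain the true mean. The helper name `coverage` and all numeric settings are assumptions for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def coverage(population, n, confidence, trials=2000, seed=7):
    """Fraction of intervals x_bar +/- z * s / sqrt(n) that contain the population mean."""
    rng = random.Random(seed)
    mu = mean(population)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        sample = rng.choices(population, k=n)   # random sample (with replacement)
        moe = z * stdev(sample) / sqrt(n)
        if abs(mean(sample) - mu) <= moe:
            hits += 1
    return hits / trials

# Hypothetical population: 1,000 skewed (exponential) values with mean about 40
pop_rng = random.Random(0)
population = [pop_rng.expovariate(1 / 40) for _ in range(1000)]

for conf in (0.60, 0.90, 0.95, 0.99):
    print(f"{conf:.0%} intervals: {coverage(population, n=50, confidence=conf):.1%} contain mu")
```

For a skewed population like this one the observed coverage can fall slightly short of the nominal level; the gap shrinks as n grows.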
Using Statpro
Make sure the data are in a single column
with the label in the first row.
Use Statpro function:
Statistical Inference > One
sample analysis…
Select data
Choose “confidence interval
for mean” and input
confidence level (e.g. 95%)
Question:
As the sample size increases, does the
estimated standard error increase, decrease,
or stay the same?
Question:
As the sample size increases, does the
sample standard deviation increase,
decrease, or stay the same?
Question:
As the sample size increases, does the
sample mean increase, decrease, or stay the
same?
Question:
As the sample size increases, does the
margin of error increase, decrease, or stay
the same?
Second-hand cars
In a survey of their latest 20 customers, a
second-hand car dealer found that the
average age of car buyers was 37.3 years,
with a standard deviation of 4.2 years.
What is a 95% CI for the mean age of
second-hand car buyers?
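A worked sketch of the calculation using the rule-of-thumb z = 2 from earlier in these notes:

```python
from math import sqrt

# Figures from the survey: n = 20 customers, mean age 37.3, sd 4.2 years
n, x_bar, s = 20, 37.3, 4.2
z = 2  # 95% rule of thumb (1.96 if you prefer the exact value)

standard_error = s / sqrt(n)
moe = z * standard_error
low, high = x_bar - moe, x_bar + moe
print(f"SE = {standard_error:.2f}, MOE = {moe:.2f}, 95% CI = ({low:.1f}, {high:.1f})")
# SE = 0.94, MOE = 1.88, 95% CI = (35.4, 39.2)
```

Two caveats follow from the assumptions slides. Since n = 20 < 30, the interval relies on buyer ages being roughly normal. Treating the latest 20 customers as a random sample of all second-hand car buyers is also an assumption. StatPro would use a t-value (about 2.09 for 19 degrees of freedom), giving a slightly wider interval.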
Small populations
If you have a relatively large sample
compared to the population ($n/N > 0.05$),
use a correction for the confidence interval:
$$\bar{x} \pm z\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$
where N is the number in the population and
n is the number in the sample.
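The correction factor can be checked against the bedroom population, where n/N = 3/5 is far above 0.05. This sketch uses sigma in place of s because the whole population is known, and compares the corrected standard error with the actual spread of all 10 sample means of size 3. The names `fpc` and `corrected_moe` are hypothetical helpers.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

def fpc(N, n):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))

def corrected_moe(z, s, n, N):
    """Margin of error z * s / sqrt(n), shrunk by the correction factor."""
    return z * s / sqrt(n) * fpc(N, n)

# Check with the bedrooms population (N = 5, n = 3, so n/N = 0.6 > 0.05)
bedrooms = [1, 2, 2, 3, 5]
N, n = len(bedrooms), 3
sigma = pstdev(bedrooms)
actual_se = pstdev([mean(c) for c in combinations(bedrooms, n)])

print(f"uncorrected sigma/sqrt(n) = {sigma / sqrt(n):.3f}")
print(f"corrected                 = {sigma / sqrt(n) * fpc(N, n):.3f}")
print(f"actual sd of the 10 means = {actual_se:.3f}")
```

The corrected value matches the actual spread (about 0.554), while the uncorrected sigma/sqrt(n) (about 0.783) overstates it. When N is huge, the factor is close to 1 and the correction can be ignored.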
What did we do?
Talked about margins of error.
Saw how the Central Limit Theorem ensures
that sample means have an approximately normal
distribution when the sample size is large enough.
Talked about confidence intervals
Reviewed first half of material for subject
Managerial applications
What did you learn today that makes a
difference to the way you manage?
What are the three most important things to
remember from today’s lecture?
Next lecture (after Midterm)
Download the data files metrobus.xls and
customerages.xls and bring them on your laptop.
Read supplementary material on Two
Samples, Matched Pairs and Estimating P.