Sampling Methods

posted:
3/28/2012

```							Sampling Methods and Sampling
Distributions
Potential sampling errors
Sampling Distributions and the Central
Limit Theorem
Confidence Intervals
Review
Review of terms
A target population is the entire group of
elements about which we want information.
A sample is part of the target population.
Inference
Making an inference means using sample
results to describe the population.

Sample        Inference            Population

(Known)                            (Unknown)

We don’t know the mean of the population so we
have to infer it from samples of the population
Sampling Questions
What errors might there be in a sample
conducted over the phone?
If you wanted to estimate the number of
people who would vote Liberal if an
election was held tomorrow, how would
Sampling Terminology
An Element is an object on which we take a
Measurement. Objects that are people are called
Subjects.
A Target Population is a collection of elements
about which we wish to make an Inference.
Sampling Units are non-overlapping collections
of elements from the target population.
A Frame is a list of sampling units.
The sampling Design specifies the Method of
selecting the sample.
Errors in Survey Sampling
Selection Error
Sampling frame does not represent target population. We
exclude members of the target population from the
sample.
Example: we want to determine filmgoers’ attitudes toward
horror films. The sampling frame is households that own a
VCR. Many filmgoers do not own VCRs, so we have
committed a selection error.

Increasing the Sample Size Will Not Help.
Errors in Survey Sampling
Response Error
Respondents do not:
1) Understand question
2) Have the information
3) Want to give the information.
Ask 13-year-old school students the following question:
“How often do you imbibe intoxicating spirits?”
Respondents may not understand or be honest.
Increasing the Sample Size Will Not Help.
Errors in Survey Sampling
Non-Response Error
Respondents are not representative of sampling
frame.
– Be concerned when a large percentage of the sampling
frame does not respond.
– Lower income families may ignore mailed surveys.
– Families with two wage earners eat out often and are
often not at home when an interviewer calls.
Increasing the Sample Size Will Not Help
More terms:
Parameters and Statistics
A population parameter is a numerical
measure that describes the target
population.
A sample statistic is an estimate of the
unknown population parameter and will
vary from sample to sample.
A small population (N = 5)
Number of bedrooms per household:
1      2    2      3    5

\mu = \frac{1 + 2 + 2 + 3 + 5}{5} = 2.6

\sigma = \sqrt{\frac{(1 - 2.6)^2 + 2(2 - 2.6)^2 + (3 - 2.6)^2 + (5 - 2.6)^2}{5}} = 1.36

Note that the denominator for the standard
deviation calculation is N = 5 because this is a
population
A single sample of size n = 2
from the population of N = 5
\bar{x} = \frac{2 + 5}{2} = 3.5
s = \sqrt{\frac{(2 - 3.5)^2 + (5 - 3.5)^2}{2 - 1}} = 2.12
Do not expect:
Sample mean to equal population mean of 2.6.
Sample standard deviation to equal
population standard deviation of 1.36.
Sample Statistics
Note that the sample standard deviation has
a different formula than the population
standard deviation
To help keep the ideas separate we have
different symbols for populations and
samples:
The sample mean is \bar{x}
The sample standard deviation is s
Margin of Error
Because sample statistics and population
parameters will almost always differ,
there is some error whenever we
take a sample.
But what affects the amount of error?

Dartboard example
Margins of Error

House type   Bedrooms
V            1
W            2
X            2
Y            3
Z            5

Margin of Error: possible difference between
the sample result and the result we would obtain
if we selected the entire population.
Want as small as possible.
Samples of 3 from population
Samples   Value   Value   Value   Sample Mean
V,W,X       1       2       2        1.67
V,W,Y       1       2       3        2
V,W,Z       1       2       5        2.67
...                                  ...
W,Y,Z       2       3       5        3.33
X,Y,Z       2       3       5        3.33
Effect of sample size
                        Samples of   Samples of
                        Size 3       Size 4
Population Mean         2.6          2.6
Largest Sample Mean     3.33         3
Smallest Sample Mean    1.67         2
Maximum
Margin of Error         0.93         0.6

Increasing the sample size reduces the margin of
error
Effect of level of confidence
                        MOE = ±0.5   MOE = ±0.7

Population Mean         2.6 bdrms    2.6 bdrms
Number of Intervals
that Contain            6            7
Population Mean
Level of Confidence     60%          70%

Increasing the level of confidence increases the
margin of error
Effect of population variance
New Population:
1 2 4 6 7 bdrms
                        Small Variance   Large Variance

Population Mean         2.6              4
Smallest Sample Mean    1.67             2.33
Largest Sample Mean     3.33             5.67
Maximum
Margin of Error         0.93             1.67

As the variance of the population increases the
margin of error also increases
Summary:
Sampling Lessons
Increasing the sample size
reduces the margin of error.
If we increase the level of confidence in an
inference, the price we pay is in the margin of
error.
As the variability of the target population
increases, the margin of error increases.
Sampling Distribution
What is a sampling distribution of the
mean?
Bedrooms (samples of n = 3)

The sampling distribution contains
all possible sample means.

\mu_{\bar{x}} = \mu = 2.6 = mean
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.36}{\sqrt{3}} = 0.79 = standard error
Sampling distribution

[Figure: two frequency histograms side by side, "Population" and
"Sample means (n=3)"; x-axis bins <=1, <=2, <=3, <=4, <=5]

This is a sampling distribution
Standard Error of the Mean

The standard deviation of the sampling
distribution measures the spread of the sample
means around their mean and is called the
standard error of the mean.
The standard error of the mean is smaller than
the standard deviation of the population.
Why?
2 New Populations (both N=6)
A: 1, 1, 2, 4, 5, 5
B: 1, 2, 3, 3, 4, 5

[Figure: frequency histograms of Population A and Population B;
x-axis values 1 to 5, y-axis frequency 0 to 3]
Central Limit Theorem
For almost any population distribution
(it only needs a finite variance), the sampling
distribution of the mean becomes approximately
normal once n is large enough.

[Figure: histograms of Population A and Population B (top row) and of
"Sample Means from A" and "Sample Means from B" (bottom row)]
Try playing with the Central Limit
Theorem on the class web page.
- Try different sample sizes (n).
- Try different population distributions.
- See how the sampling distributions
look normal.
[Figure: histograms of the population and of the sample means for
samples of size n=2, n=3, n=4, n=5 and n=6; x-axis from 5.1 to 6]
Some Conclusions

             Population          Sampling Distribution

Mean         \mu (unknown)       \mu_{\bar{x}} = \mu

Standard
Deviation    \sigma (unknown)    \sigma_{\bar{x}} = \sigma / \sqrt{n}

Shape        Any Shape           Approx Normal
                                 provided n > 30
Estimating Unknown
Population Parameters
             Unknown Parameter      Sample Statistic

Mean         \mu                    \bar{x}

Standard
Deviation    \sigma                 s

Standard
Error        \sigma / \sqrt{n}      s / \sqrt{n}
Why does the Central Limit
Theorem work?
As sample size increases
– most sample means will be close to population
mean.
– some sample means will be relatively far above or
below population mean.
– a few sample means will be very far above or
below population mean.
The bullets above describe the shape of a normal distribution.
Lessons
The mean of the sampling distribution of the
sample mean equals the mean of the population
from which the samples were drawn.

The standard error of the mean is smaller than
the standard deviation of the population.
Lessons
The standard error of the mean decreases as the
sample size increases.

If the population is normal or the sample size is
sufficiently large, the distribution of the sample
mean will be near-normal. We will be able to
use the standard normal table to compute
probabilities for the sample means.
Two assumptions for Central
Limit Theorem to work
1) Samples are drawn randomly from
population (each possible sample has an
equal chance of being chosen)
2) The population is (near) normal or the
sample size is large (n ≥ 30)
Overview of Inference
Select Simple Random Sample

Compute Sample Statistics and
Verify Assumptions

Construct a Confidence Interval
that Includes a Margin of Error

Population Parameter
Confidence Interval
A confidence interval is a range estimate of
an unknown population parameter.
The level of confidence associated with an
interval estimate is the percentage of
intervals that will include the unknown
population parameter over a large number of
similarly constructed intervals.
– Just like the confidence we had in margin of
error in an earlier lecture (dartboard example)
Confidence Intervals
Sampling distribution of the mean \bar{X}, centered on \mu:

\mu \pm 1.645 \sigma / \sqrt{n}   covers 90% of samples
\mu \pm 1.96 \sigma / \sqrt{n}    covers 95% of samples
\mu \pm 2.58 \sigma / \sqrt{n}    covers 99% of samples
What does 95% confidence look
like? (\alpha = 0.05)

Each tail probability = \alpha / 2 = 0.025
Intervals and Confidence Level
Sampling distribution of the mean \bar{X}:
area 1 - \alpha in the middle, \alpha / 2 in each tail,
centered on \mu_{\bar{x}} = \mu.

Intervals extend from \bar{X} - Z \sigma / \sqrt{n}
to \bar{X} + Z \sigma / \sqrt{n}.

100(1 - \alpha)% of intervals contain \mu.
100\alpha% do not.
Margin of Error
So what a confidence interval does is add
and subtract a margin of error from the
sample mean
The margin of error is:

MOE = z \frac{\sigma}{\sqrt{n}}

but if we don’t know \sigma then we’ll have to use
s (the sample standard deviation) instead.
Assumptions for confidence
intervals
1. Random samples
2. If n < 30 then population must be near
normal to do a confidence interval.
(If n ≥ 30 then sampling distribution is
“close enough” to normal whatever the
population.)
Margin of Error - Three Lessons
MOE = z \frac{s}{\sqrt{n}}
Lesson 1      As sample size (n) increases,
margin of error decreases.
Lesson 2      As confidence level increases (z),
margin of error increases.
Lesson 3      As variance increases (s^2), margin
of error increases.
Rules of thumb
Some quick rules of thumb for z (values come
from normal distribution)
For confidence of 90% use z = 1.645
For confidence of 95% use z = 2 (rounded up from 1.96)
For confidence of 99% use z = 2.58
T-distribution
What is a t-value?
– Sophisticated statistical way of dealing with smaller sample
sizes by using slightly different values instead of the “rule
of thumb” z-values
Do I have to care?
– No. Increased “accuracy” of t-values possibly spurious and
not worth the added effort. Just think “z-value” wherever
you see t-value and use rule of thumb
If accuracy is important, can I use the t-value anyway?
– Yes. Statpro calculations automatically use t-values.
Width versus meaningfulness
of Confidence Intervals
For n = 1,000      60% CI                        99% CI

WIDTH              MOE = \pm 0.84 s / \sqrt{n}   MOE = \pm 2.58 s / \sqrt{n}

CONFIDENCE         LOW                           HIGH

GOAL: Narrow Confidence Interval and high level
of confidence.
Try the confidence interval demonstration on
the class web page.
Try different values of \alpha. Count how many of
the confidence intervals contain the population
mean.
Using Statpro
Make sure data is in a column
with label in first row.
Use Statpro function:
Statistical Inference > One
sample analysis…
Select data
Choose “confidence interval
for mean” and input
confidence level (e.g. 95%)
Question:
As the sample size increases, does the
estimated standard error increase, decrease,
or stay the same?
Question:
As the sample size increases, does the
sample standard deviation increase,
decrease, or stay the same?
Question:
As the sample size increases, does the
sample mean increase, decrease, or stay the
same?
Question:
As the sample size increases, does the
margin of error increase, decrease, or stay
the same?
Second hand cars
In a survey of their latest 20 customers, a
second hand car dealer found that the
average age of car buyers is 37.3 years old
with a standard deviation of 4.2 years.
What is a 95% CI for the mean age of
Small populations
If you have a relatively large sample
compared to the population (n/N > 0.05),
use a correction for the confidence interval:

\bar{x} \pm z \frac{s}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}

N: number in population
n: number in sample
What did we do?
Saw how the Central Limit Theorem ensures
that sample means are approximately normally
distributed when the sample is large enough.
Reviewed first half of material for subject
Managerial applications
What did you learn today that makes a
difference to the way you manage?
What are the three most important things to
remember from today’s lecture?
Next lecture (after Midterm)
customerages.xls and bring them on laptop
Samples, Matched Pairs and Estimating P.

```
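
The small bedroom population in the slides (1, 2, 2, 3, 5) is small enough to enumerate every possible sample, so the numbers quoted above can be checked directly. The following is a minimal sketch in Python using only the standard library. The function names (`sample_means`, `max_margin_of_error`, `z_interval`) are illustrative and do not come from the lecture. `z_interval` assumes the z-based interval from the margin-of-error slide, not a t-based one.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev, stdev

# Bedroom counts for households V, W, X, Y, Z (the N = 5 population).
population = [1, 2, 2, 3, 5]

mu = mean(population)        # population mean
sigma = pstdev(population)   # population SD: divides by N


def sample_means(pop, n):
    """Mean of every possible sample of size n, drawn without replacement."""
    return [mean(sample) for sample in combinations(pop, n)]


def max_margin_of_error(pop, n):
    """Largest gap between any sample mean and the population mean."""
    pop_mean = mean(pop)
    return max(abs(m - pop_mean) for m in sample_means(pop, n))


def z_interval(xbar, s, n, z=1.96):
    """Interval xbar +/- z * s / sqrt(n); z = 1.96 assumes 95% confidence."""
    moe = z * s / sqrt(n)
    return xbar - moe, xbar + moe


# Population parameters, then the single sample {2, 5} from the slides.
print(round(mu, 2), round(sigma, 2))
print(mean([2, 5]), round(stdev([2, 5]), 2))   # stdev divides by n - 1

# A larger sample size shrinks the maximum margin of error.
for n in (3, 4):
    print(n, round(max_margin_of_error(population, n), 2))

# A more variable population increases it.
print(round(max_margin_of_error([1, 2, 4, 6, 7], 3), 2))
```

As a usage example, `z_interval(37.3, 4.2, 20)` applies the same formula to the summary figures from the second hand car slide (n = 20, mean 37.3, standard deviation 4.2). Passing `z=2` instead applies the rule-of-thumb value.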