Sampling Methods and Sampling Distributions

Shared by: dfhdhdhdhjr
Posted: 3/28/2012 (56 pages)
Outline:
- Potential sampling errors
- Sampling distributions and the Central Limit Theorem
- Confidence intervals
- Review
Review of terms
A target population is the entire group of elements about which we want information.
A sample is part of the target population.
Inference
Making an inference means using sample results to describe the population.

Sample (known)  →  Inference  →  Population (unknown)

We don’t know the population mean, so we have to infer it from a sample drawn from the population.
Sampling Questions
What errors might there be in a sample conducted over the phone?
If you wanted to estimate the number of people who would vote Liberal if an election were held tomorrow, how would you go about it?
Sampling Terminology
An Element is an object on which we take a Measurement. Objects that are people are called Subjects.
A Target Population is a collection of elements about which we wish to make an Inference.
Sampling Units are non-overlapping collections of elements from the target population.
A Frame is a list of sampling units.
The sampling Design specifies the Method of selecting the sample.
Errors in Survey Sampling
Selection Error
The sampling frame does not represent the target population: members of the target population are excluded from the sample.
Example: we are interested in filmgoers’ attitudes toward horror films, but the sampling frame is households that own a VCR. Many filmgoers do not own VCRs, so we have committed a selection error.

Increasing the Sample Size Will Not Help.
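A small simulation illustrates why a larger sample does not fix a selection error. Everything here is hypothetical: the population size, the 40% VCR-ownership rate, and the assumption that VCR owners rate horror films higher on average are invented purely for illustration.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population of 10,000 filmgoers: (owns_vcr, attitude score).
# Assumption for illustration only: VCR owners rate horror films higher on average.
population = []
for _ in range(10_000):
    owns_vcr = rng.random() < 0.4
    population.append((owns_vcr, rng.gauss(7.0 if owns_vcr else 4.0, 1.5)))

true_mean = statistics.fmean(score for _, score in population)

# Selection error: the frame contains only VCR-owning households.
frame = [score for owns_vcr, score in population if owns_vcr]

for n in (50, 500, len(frame)):
    estimate = statistics.fmean(rng.sample(frame, n))
    print(f"n = {n:5d}   estimate = {estimate:.2f}   population mean = {true_mean:.2f}")
```

Even when n equals the whole frame, the estimate settles near the frame mean, not the population mean.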
Response Error
Respondents do not:
   1) Understand the question
   2) Have the information
   3) Want to give the information.
Example: ask 13-year-old school students “How often do you imbibe intoxicating spirits?” Respondents may not understand the question or may not answer honestly.
Increasing the Sample Size Will Not Help.
Non-Response Error
Respondents are not representative of the sampling frame.
  – Be concerned when a large percentage of the sampling frame does not respond.
  – Lower-income families may ignore mailed surveys.
  – Families with two wage earners eat out often and are often not at home when an interviewer calls.
Increasing the Sample Size Will Not Help.
More terms: Parameters and Statistics
A population parameter is a numerical measure that describes the target population.
A sample statistic is an estimate of the unknown population parameter and will vary from sample to sample.
A small population (N = 5)
Number of bedrooms per household: 1, 2, 2, 3, 5

$$\mu = \frac{1 + 2 + 2 + 3 + 5}{5} = 2.6$$

$$\sigma = \sqrt{\frac{(1 - 2.6)^2 + (2 - 2.6)^2 + (2 - 2.6)^2 + (3 - 2.6)^2 + (5 - 2.6)^2}{5}} = 1.36$$

Note that the denominator in the standard deviation calculation is N = 5 because this is a population.
A single sample of size n = 2 from the population of N = 5 (values 2 and 5)

$$\bar{x} = \frac{2 + 5}{2} = 3.5$$

$$s = \sqrt{\frac{(2 - 3.5)^2 + (5 - 3.5)^2}{2 - 1}} = 2.12$$

Do not expect:
  the sample mean to equal the population mean of 2.6, or
  the sample standard deviation to equal the population standard deviation of 1.36.
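The same arithmetic in Python's standard `statistics` module, as a quick check: `pstdev` divides by N (population), while `stdev` divides by n − 1 (sample).

```python
import statistics

population = [1, 2, 2, 3, 5]   # bedrooms per household, N = 5
sample = [2, 5]                # one sample of size n = 2

mu = statistics.mean(population)        # population mean
sigma = statistics.pstdev(population)   # population SD: divides by N
x_bar = statistics.mean(sample)         # sample mean
s = statistics.stdev(sample)            # sample SD: divides by n - 1

print(f"mu = {mu}, sigma = {sigma:.2f}")    # mu = 2.6, sigma = 1.36
print(f"x_bar = {x_bar}, s = {s:.2f}")      # x_bar = 3.5, s = 2.12
```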
Sample Statistics
Note that the sample standard deviation has a different formula from the population standard deviation: it divides by n − 1 rather than N.
To help keep the ideas separate, we use different symbols for populations and samples:
The sample mean is $\bar{x}$ (the population mean is $\mu$).
The sample standard deviation is $s$ (the population standard deviation is $\sigma$).
Margin of Error
Because sample statistics and population parameters will almost always differ, taking a sample introduces some error.
But what affects the amount of error?

Dartboard example
Margins of Error

House type   Bedrooms
    V           1
    W           2
    X           2
    Y           3
    Z           5

Margin of error: the possible difference between the sample result and the result we would obtain if we selected the entire population. We want it to be as small as possible.
Samples of 3 from the population
Sample    Values     Sample mean
V,W,X     1, 2, 2       1.67
V,W,Y     1, 2, 3       2.00
V,W,Z     1, 2, 5       2.67
 ...        ...          ...
W,Y,Z     2, 3, 5       3.33
X,Y,Z     2, 3, 5       3.33
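The "..." rows can be filled in by enumerating every sample of 3 houses. This short sketch uses `itertools.combinations`, which yields all C(5, 3) = 10 samples drawn without replacement.

```python
from itertools import combinations
from statistics import mean

# The five houses from the slide and their bedroom counts.
houses = {"V": 1, "W": 2, "X": 2, "Y": 3, "Z": 5}

# Every possible sample of 3 houses (without replacement): C(5, 3) = 10 samples.
sample_means = {}
for combo in combinations(houses, 3):
    sample_means["".join(combo)] = mean(houses[h] for h in combo)

for name, m in sample_means.items():
    print(f"{','.join(name)}  {m:.2f}")
```

Averaging the ten sample means gives back the population mean of 2.6.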
Effect of sample size
                              Samples of size 3   Samples of size 4
Population mean                      2.6                 2.6
Largest sample mean                  3.33                3
Smallest sample mean                 1.67                2
Maximum margin of error              0.93                0.6

Increasing the sample size reduces the margin of error.
Effect of level of confidence
                                          MOE = ±0.5    MOE = ±0.7
Population mean                           2.6 bdrms     2.6 bdrms
Number of intervals that contain
  the population mean                         6             7
Level of confidence                          60%           70%

Increasing the level of confidence increases the margin of error.
Effect of population variance
New population (large variance): 1, 2, 4, 6, 7 bdrms

                              Small variance   Large variance
Population mean                     2.6               4
Smallest sample mean                1.67              2.33
Largest sample mean                 3.33              5.67
Maximum margin of error             0.93              1.67

As the variance of the population increases, the margin of error also increases.
Summary: Sampling Lessons
Increasing the sample size reduces the margin of error.
If we increase the level of confidence in an inference, the price we pay is a larger margin of error.
As the variability of the target population increases, the margin of error increases.
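Lessons 1 and 3 can be checked by brute force. The sketch below takes the slides' "maximum margin of error" to mean the largest gap between any sample mean and the population mean, over all samples drawn without replacement, and reproduces the 0.93, 0.6 and 1.67 figures from the tables.

```python
from itertools import combinations
from statistics import mean

def max_margin_of_error(population, n):
    """Largest |sample mean - population mean| over all samples of size n
    drawn without replacement (the slides' 'maximum margin of error')."""
    mu = mean(population)
    return max(abs(mean(s) - mu) for s in combinations(population, n))

small_var = [1, 2, 2, 3, 5]
large_var = [1, 2, 4, 6, 7]

print(f"{max_margin_of_error(small_var, 3):.2f}")  # 0.93 (samples of 3)
print(f"{max_margin_of_error(small_var, 4):.2f}")  # 0.60 (larger samples)
print(f"{max_margin_of_error(large_var, 3):.2f}")  # 1.67 (more variable population)
```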
Sampling Distribution
What is a sampling distribution of the mean?
Bedrooms (samples of n = 3)

The sampling distribution contains all possible sample means.

$$\mu_{\bar{x}} = \mu = 2.6 \quad \text{(mean)}$$

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.36}{\sqrt{3}} = 0.79 \quad \text{(standard error)}$$

[Figure: histograms of the population (bedrooms) and of the sample means (n = 3), binned as ≤1, ≤2, ≤3, ≤4, ≤5.]

This is a sampling distribution.
Standard Error of the Mean

The standard deviation of the sampling distribution measures the spread of the sample means around their mean and is called the standard error of the mean.
The standard error of the mean is smaller than the standard deviation of the population (for n > 1).
Why?
Two new populations (both N = 6)
                A: 1, 1, 2, 4, 5, 5
                B: 1, 2, 3, 3, 4, 5

[Figure: histograms of Population A and Population B over the values 1 to 5.]
Central Limit Theorem
No matter what the population distribution looks like (provided it has a finite variance), the sampling distribution of the mean gets closer and closer to a normal distribution as n increases.
[Figure: histograms of Population A and Population B (values 1 to 5), and below them the distributions of sample means from A and from B (horizontal axes from 1 to 5 in steps of 0.5).]
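The sample-mean histograms can be generated exactly rather than by simulation. This sketch assumes sampling with replacement (so every ordered sample of size n from Population A is equally likely and the σ/√n formula holds exactly), and uses exact fractions so the mean and variance of the sample means can be compared with μ and σ²/n without rounding.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import product

population_a = [1, 1, 2, 4, 5, 5]

def sampling_distribution(population, n):
    """Exact distribution of the sample mean over all ordered samples of
    size n drawn WITH replacement (assumption: each sample equally likely)."""
    return Counter(Fraction(sum(s), n) for s in product(population, repeat=n))

def summarize(dist):
    """Mean and variance of a distribution given as {value: count}."""
    total = sum(dist.values())
    m = sum(x * c for x, c in dist.items()) / total
    var = sum((x - m) ** 2 * c for x, c in dist.items()) / total
    return m, var

mu, sigma2 = summarize(Counter(Fraction(x) for x in population_a))
for n in (1, 2, 4):
    m, var = summarize(sampling_distribution(population_a, n))
    print(f"n={n}: mean of sample means = {m}, SD = {math.sqrt(var):.3f}, "
          f"sigma/sqrt(n) = {math.sqrt(sigma2 / n):.3f}")

# Text histogram for n = 2: the U-shaped population becomes peaked in the middle.
for x, c in sorted(sampling_distribution(population_a, 2).items()):
    print(f"{float(x):4.1f} {'#' * c}")
```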
Try playing with the Central Limit Theorem on the class web page.
- Try different sample sizes (n).
- Try different population distributions.
- See how the sampling distributions come to look normal.
[Figure: the population and the sampling distributions of the mean for samples of size n = 2, 3, 4, 5 and 6; horizontal axes run from about 5.1 to 6.]
Some Conclusions

                Population              Sampling distribution of the mean
Mean            $\mu$ (unknown)         $\mu_{\bar{x}} = \mu$
Standard        $\sigma$ (unknown)      $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
deviation
Shape           Any shape               Approximately normal, provided n ≥ 30
Estimating Unknown Population Parameters

                Unknown parameter       Sample statistic
Mean            $\mu$                   $\bar{x}$
Standard        $\sigma$                $s$
deviation
Standard error  $\sigma/\sqrt{n}$       $s/\sqrt{n}$
Why does the Central Limit Theorem work?
As sample size increases:
  – most sample means will be close to the population mean;
  – some sample means will be relatively far above or below the population mean;
  – a few sample means will be very far above or below the population mean.
The bullets above describe a normal distribution.
Lessons
The mean of the sampling distribution of the sample mean equals the mean of the population from which the samples are drawn.

The standard error of the mean is smaller than the standard deviation of the population.

The standard error of the mean decreases as the sample size increases.

If the population is normal or the sample size is sufficiently large, the distribution of the sample mean will be near-normal. We will then be able to use the standard normal table to compute probabilities for the sample means.
Two assumptions for the Central Limit Theorem to work
1) Samples are drawn randomly from the population (each possible sample has an equal chance of being chosen).
2) The population is (near) normal or the sample size is large (n ≥ 30).
Overview of Inference
1) Select a simple random sample.
2) Compute sample statistics and verify assumptions.
3) Construct a confidence interval that includes a margin of error.
4) Draw a conclusion about a population parameter.
Confidence Interval
A confidence interval is a range estimate of an unknown population parameter.
The level of confidence associated with an interval estimate is the percentage of intervals that would include the unknown population parameter over a large number of similarly constructed intervals.
  – This is the same idea of confidence as for the margin of error in the earlier dartboard example.
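The "percentage of similarly constructed intervals" definition can be checked by simulation. This is a sketch under stated assumptions: a hypothetical normal population with μ = 50 and σ = 10 (σ treated as known), samples of size 30, and 2,000 repetitions. Roughly 95% of the 95% intervals should contain μ.

```python
import random
import statistics

rng = random.Random(2012)

# Hypothetical normal population (assumed for illustration): known mean and SD.
MU, SIGMA = 50.0, 10.0
N_SAMPLE, Z = 30, 1.96   # sample size and z-value for 95% confidence
TRIALS = 2000            # number of similarly constructed intervals

hits = 0
for _ in range(TRIALS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N_SAMPLE)]
    x_bar = statistics.fmean(sample)
    moe = Z * SIGMA / N_SAMPLE ** 0.5   # sigma treated as known here
    if x_bar - moe <= MU <= x_bar + moe:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.1%} of intervals contain the population mean")
```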
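Confidence Intervals
Sampling distribution of the mean: the proportion of sample means $\bar{X}$ falling within each band around $\mu$:

$$\mu \pm 1.645\frac{\sigma}{\sqrt{n}} \quad \text{contains 90\% of sample means}$$

$$\mu \pm 1.96\frac{\sigma}{\sqrt{n}} \quad \text{contains 95\% of sample means}$$

$$\mu \pm 2.58\frac{\sigma}{\sqrt{n}} \quad \text{contains 99\% of sample means}$$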
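The multipliers 1.645, 1.96 and 2.58 are quantiles of the standard normal distribution, leaving α/2 in each tail. Python's `statistics.NormalDist` (Python 3.8+) can reproduce them; 60% is included because it gives the 0.84 used later for the width comparison.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1
critical = {}
for confidence in (0.60, 0.90, 0.95, 0.99):
    alpha = 1 - confidence
    # Leave alpha/2 in each tail, so look up the 1 - alpha/2 quantile.
    critical[confidence] = standard_normal.inv_cdf(1 - alpha / 2)
    print(f"{confidence:.0%} confidence: z = {critical[confidence]:.3f}")
```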
What does 95% confidence look like? (α = 0.05)

[Figure: sampling distribution of the mean with the central 95% marked; each tail has probability α/2 = 0.025.]
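Intervals and Confidence Level
[Figure: sampling distribution of the mean, centred at $\mu_{\bar{x}} = \mu$, with area α/2 in each tail and 1 − α in the middle.]

Intervals extend from

$$\bar{X} - Z\frac{\sigma}{\sqrt{n}} \quad \text{to} \quad \bar{X} + Z\frac{\sigma}{\sqrt{n}}$$

100(1 − α)% of such intervals contain $\mu$; 100α% do not.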
Margin of Error
So a confidence interval adds a margin of error to, and subtracts it from, the sample mean.
The margin of error is:

$$\text{MOE} = z\frac{\sigma}{\sqrt{n}}$$

but if we don’t know $\sigma$, we use $s$ (the sample standard deviation) instead.
Assumptions for confidence intervals
1. Random samples.
2. If n < 30, the population must be near normal to construct a confidence interval.
     (If n ≥ 30, the sampling distribution is “close enough” to normal whatever the population.)
Margin of Error - Three Lessons

$$\text{MOE} = z\frac{s}{\sqrt{n}}$$

Lesson 1      As sample size (n) increases, margin of error decreases.
Lesson 2      As confidence level (z) increases, margin of error increases.
Lesson 3      As variance ($s^2$) increases, margin of error increases.
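The three lessons follow directly from the formula. In this sketch the baseline values (z = 2, s = 10, n = 100) are hypothetical; note that quadrupling n only halves the margin of error, because n enters through its square root.

```python
import math

def margin_of_error(z, s, n):
    """MOE = z * s / sqrt(n), as on the slide."""
    return z * s / math.sqrt(n)

base = margin_of_error(z=2, s=10, n=100)   # hypothetical baseline: 2.0
print("Baseline:", base)
print("Lesson 1, n = 400:", margin_of_error(2, 10, 400))             # 1.0
print("Lesson 2, 99% (z = 2.58):", margin_of_error(2.58, 10, 100))   # larger than baseline
print("Lesson 3, s = 20:", margin_of_error(2, 20, 100))              # 4.0
```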
Rules of thumb
Some quick rules of thumb for z (values come from the standard normal distribution):
For 90% confidence use z = 1.64
For 95% confidence use z = 2 (more precisely 1.96)
For 99% confidence use z = 2.58
T-distribution
What is a t-value?
   – A statistical refinement for smaller sample sizes: it replaces the rule-of-thumb z-values with slightly larger values.
Do I have to care?
   – No. The increased “accuracy” of t-values is possibly spurious and not worth the added effort. Just think “z-value” wherever you see t-value and use the rule of thumb.
If accuracy is important, can I use t-values anyway?
   – Yes. Statpro calculations automatically use t-values.
Width versus meaningfulness of Confidence Intervals

For n = 1,000       60% CI                               99% CI
WIDTH               $\text{MOE} = 0.84\,s/\sqrt{n}$      $\text{MOE} = 2.58\,s/\sqrt{n}$
CONFIDENCE          LOW                                  HIGH

GOAL: Narrow confidence interval and high level of confidence.
Try the confidence interval demonstration on the class web page.
Try different values of α. Count how many of the confidence intervals contain the population mean.
Using Statpro
1) Make sure the data are in a column, with a label in the first row.
2) Use the Statpro function: Statistical Inference > One sample analysis…
3) Select the data.
4) Choose “confidence interval for mean” and input the confidence level (e.g. 95%).
Question:
As the sample size increases, does the estimated standard error increase, decrease, or stay the same?
Question:
As the sample size increases, does the sample standard deviation increase, decrease, or stay the same?
Question:
As the sample size increases, does the sample mean increase, decrease, or stay the same?
Question:
As the sample size increases, does the margin of error increase, decrease, or stay the same?
Second-hand cars
In a survey of their latest 20 customers, a second-hand car dealer found that the average age of car buyers is 37.3 years, with a standard deviation of 4.2 years.
What is a 95% CI for the mean age of second-hand car buyers?
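One way to work the example, as a sketch: use $\bar{x} \pm z\,s/\sqrt{n}$ with z = 1.96. Two caveats: since n = 20 < 30, the assumptions slide requires buyers' ages to be near normal, and a t-value (about 2.09 for 19 degrees of freedom, which is what Statpro would use) gives a slightly wider interval than the z-based one computed here.

```python
import math

n, x_bar, s = 20, 37.3, 4.2   # the dealer's latest 20 customers
z = 1.96                      # 95% confidence (rule of thumb: z = 2)

standard_error = s / math.sqrt(n)
moe = z * standard_error
lower, upper = x_bar - moe, x_bar + moe
print(f"SE = {standard_error:.3f}, MOE = {moe:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```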
Small populations
If you have a relatively large sample compared to the population (n/N > 0.05), use the finite population correction for the confidence interval:

$$\bar{x} \pm z\frac{s}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$$

where N is the number in the population and n is the number in the sample.
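The bedrooms example is a good test of the correction, because there n/N = 3/5 is far above 0.05. This sketch enumerates all 10 samples of size 3 drawn without replacement and compares the exact standard deviation of their means with σ/√n, with and without the correction factor. It uses the population σ rather than s, since the whole population is known here.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

population = [1, 2, 2, 3, 5]   # bedrooms example, N = 5
N, n = len(population), 3      # n/N = 0.6, far above 0.05

# Exact SD of the sample mean over all C(5, 3) = 10 samples without replacement.
means = [mean(s) for s in combinations(population, n)]
exact_se = pstdev(means)

sigma = pstdev(population)
plain_se = sigma / math.sqrt(n)
corrected_se = plain_se * math.sqrt((N - n) / (N - 1))

print(f"exact = {exact_se:.3f}, sigma/sqrt(n) = {plain_se:.3f}, corrected = {corrected_se:.3f}")
```

Only the corrected value matches the exact spread of the sample means.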
What did we do?
Talked about margins of error.
Saw how the Central Limit Theorem ensures that sample means are approximately normally distributed for large enough samples.
Talked about confidence intervals.
Reviewed the first half of the material for the subject.
Managerial applications
What did you learn today that makes a difference to the way you manage?
What are the three most important things to remember from today’s lecture?
Next lecture (after the midterm)
Download the data files metrobus.xls and customerages.xls and bring them on your laptop.
Read the supplementary material on Two Samples, Matched Pairs and Estimating P.

						