SAMPLING ERROR

Tools for Civil Society to Understand and Use Development Data:
Improving MDG Policymaking and Monitoring

Module 8: Living with Error

What you will learn from this module

• What causes error in MDG indicators (MDGi’s)

• The 3 types of error in MDGi’s, and how they differ

From where does error derive?

• MDG indicators are derived from data

• Data represent the population from which they
  were collected

• Any shortfall in the data collection and handling
  system will, thus, cause error in the MDGi’s

Types of Error

We can identify three types of error in MDG
indicators (and other summary statistics):

     • Computation error

     • Bias error

     • Sampling error

Computation Error

• Errors made in the calculation of an MDG
  indicator or its components

• Purely due to avoidable mistakes

• Less likely when calculation is automated

Bias Error

Bias error is a systematic error that causes all
measured values to deviate from the true value in
a consistent direction, higher or lower

• Arises when the characteristics of the population
  from which the sampling frame is drawn differ from
  the characteristics of the target population
• Almost always a big issue when administrative data
  are used in deriving the MDGi in developing
  countries
• Often also an issue when survey data are used

Bias Error (2)

[Figure: sample means plotted on a measurement scale against the population value, for three cases: 1. Bias (male), 2. Bias (female), 3. No bias]

Sampling Error

• May be thought of as “the difference between a
  sample and the population from which it was
  derived”

• Always present when sample survey data are
  used to derive the MDGi

• Not an issue with administrative data (unless
  these are only collected from a sample)

• Not an issue with a census

Sampling Error (2)

[Figure: the sample mean (male) and the population value marked on a measurement scale; the gap between them is labelled "Sampling error"]

Cumulative effect of bias and sampling error

[Figure: the sample mean and the population value marked on a measurement scale; the distance between them is made up of "Bias error" and "Sampling error"]

SAMPLING ERROR

Dozenland: An Example of Sampling Error

Dozenland is the world’s smallest country
It has only 12 households, each of which
consists of a single person

The Problem

Estimate the average income (in Dozenland dollars)
per person

How shall we do this?
1) Using a census (true value)
2) Using a household sample of size 4
3) Using all possible household samples of any size

Census Data

 Head of Household (initials)    Income (D$)
            WJK                    4200
            RNC                    7500
             MM                    4700
            JHR                    6900
            HRP                    5900
             KP                    6400
            IMW                    4300
            RDS                    3100
            DGN                    4700
             DC                    4500
            MGK                    7000
            DJP                    6400
            Total                  65600
          Average                  5466.7

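The total and average in the census table can be rechecked directly. The following is a minimal sketch in Python; the names CENSUS and census_summary are illustrative and not part of the original module.

```python
# Census incomes (D$) for the 12 Dozenland households,
# keyed by the head of household's initials (from the census table).
CENSUS = {
    "WJK": 4200, "RNC": 7500, "MM": 4700, "JHR": 6900,
    "HRP": 5900, "KP": 6400, "IMW": 4300, "RDS": 3100,
    "DGN": 4700, "DC": 4500, "MGK": 7000, "DJP": 6400,
}


def census_summary(incomes):
    """Return (total income, average income per household)."""
    total = sum(incomes.values())
    return total, total / len(incomes)


total, average = census_summary(CENSUS)
print(f"Total: {total} D$, average: {average:.1f} D$")
```

Because every household is included, this average is the true population value against which any sample estimate can be judged.
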
Sample of 4

• The Dozenland government has insufficient funds to
  carry out a census, so instead it decides to sample
  four of the twelve households

• At random, it samples the households headed by
  WJK, MM, DC, MGK

• Thus the sample results are 4200, 4700, 4500, 7000
  Dozenland dollars (D$)

• The sample average is: (4200+4700+4500+7000)/4 =
  5100 D$

Real Error

Since we know the true answers from the
hypothetical census, we can see the exact error in
our sample-based estimate

The error in the estimate of the mean is
  5100 - 5466.7 = -366.7 Dozenland dollars (D$)

i.e. we have underestimated average income by
about 7%

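The arithmetic behind the sample estimate and its error can be sketched as follows. This is an illustration only: the function name estimation_error is an assumption, and the calculation is possible here only because the hypothetical census gives us the true mean.

```python
# True mean from the census table (total 65600 D$ over 12 households).
CENSUS_MEAN = 65600 / 12

# Incomes (D$) of the four sampled households.
SAMPLE = [4200, 4700, 4500, 7000]


def estimation_error(sample, true_mean):
    """Return (sample mean, absolute error, error as % of the true mean)."""
    sample_mean = sum(sample) / len(sample)
    error = sample_mean - true_mean
    return sample_mean, error, 100 * error / true_mean


sample_mean, error, pct = estimation_error(SAMPLE, CENSUS_MEAN)
print(f"Sample mean: {sample_mean:.0f} D$")
print(f"Error: {error:.1f} D$ ({pct:.1f}% of the true mean)")
```

A negative error means the sample underestimates the true average, as on the slide.
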
Interpretation

• This is NOT bias error, since the sample was
  random

• It is purely a result of the sample being different
  from the population

Can We Do Better?

1. Use a different sample size (the easiest
   improvement is a larger sample, which makes
   the sample more similar to the population from
   which it is drawn)

2. Rely on statistical theory, which tells us how
   to estimate the sampling error

Summary results from taking all possible samples

ALL possible samples of size n (ranging from 1 to 12)
from the 12 households

n = sample size; S = number of samples of size n;
Variation = standard deviation of the S sample means

     n     S    Mean     Variation
     1    12   5466.7    1327.5
     2    66   5466.7     895.0
     3   220   5466.7     693.3
     4   495   5466.7     566.0
     5   792   5466.7     473.6
     6   924   5466.7     400.3
     7   792   5466.7     338.3
     8   495   5466.7     283.0
     9   220   5466.7     231.1
    10    66   5466.7     179.0
    11    12   5466.7     120.7
    12     1   5466.7       0.0

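The table of all possible samples can be reproduced by brute force, since Dozenland is small enough to enumerate every sample. The sketch below assumes that "Variation" is the standard deviation of the sample means taken over all samples of a given size, which matches the tabulated figures; the function name all_sample_summary is illustrative.

```python
from itertools import combinations
from statistics import pstdev

# Census incomes (D$) of the 12 Dozenland households.
INCOMES = [4200, 7500, 4700, 6900, 5900, 6400,
           4300, 3100, 4700, 4500, 7000, 6400]


def all_sample_summary(values, n):
    """Enumerate every sample of size n (without replacement).

    Returns (number of samples, mean of the sample means,
    standard deviation of the sample means).
    """
    sample_means = [sum(s) / n for s in combinations(values, n)]
    count = len(sample_means)
    return count, sum(sample_means) / count, pstdev(sample_means)


print(f"{'n':>3} {'S':>5} {'Mean':>8} {'Variation':>10}")
for n in range(1, len(INCOMES) + 1):
    count, mean_of_means, sd = all_sample_summary(INCOMES, n)
    print(f"{n:>3} {count:>5} {mean_of_means:>8.1f} {sd:>10.1f}")
```

The enumeration covers 4095 samples in total, so it runs almost instantly. For a real population this is impossible, which is why the later slides turn to statistical theory.
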
What can we conclude?

• If you take all possible samples of a given size, the
  mean of the sample means is always the same and
  equals the true population mean

• The variation from sample-to-sample decreases
  as the sample size (n) gets bigger
• That is, there is less uncertainty in the estimate as
  the sample size increases

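The shrinking variation follows a standard textbook result for simple random sampling without replacement. The formula below is not stated in the original module; the symbols N (population size), n (sample size), y_i (income of household i), \bar{Y} (population mean) and S^2 (population variance) are introduced here for illustration.

```latex
\mathrm{SD}(\bar{y}) = \sqrt{\frac{S^2}{n}\left(1 - \frac{n}{N}\right)},
\qquad
S^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(y_i - \bar{Y}\right)^2
```

For Dozenland, N = 12 and S^2 is approximately 1,922,424, so for n = 4 the formula gives \sqrt{(1922424/4)(1 - 4/12)}, or about 566.0, in agreement with the tabulated variation. At n = N the factor (1 - n/N) is zero: a census has no sampling error.
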
Here’s a Big Problem

• In real life we will only take ONE sample

• Thus we cannot see how values vary from
  sample-to-sample for any given sample size, n

• That is, we cannot measure the mean, or the
  variation, over all samples

Here’s a Solution

• We can estimate the sample-to-sample variation
  (“standard error”) from the single sample

• This helps us to understand how our sample mean
  may differ from the true population mean

  Let us consider the sample of four households
  The values in the sample are: 4200, 4700, 4500, and
  7000. This yields:
     • Mean = 5100
     • Standard Error = 524
     • 95% confidence interval = 5100 ± 1666 = [3434 to
       6766]

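The slide does not show how the standard error and confidence interval were obtained. One calculation that reproduces them to within rounding is sketched below. It assumes a simple random sample without replacement, a finite population correction for N = 12, and a Student's t multiplier with 3 degrees of freedom (about 3.182 for 95%). These are assumptions about the method, and the function name srs_standard_error is illustrative.

```python
import math
from statistics import mean, stdev

SAMPLE = [4200, 4700, 4500, 7000]  # incomes (D$) of the sampled households
POPULATION_SIZE = 12               # households in Dozenland
T_CRIT_95_DF3 = 3.182              # two-sided 95% Student's t value, 3 df


def srs_standard_error(sample, population_size):
    """Standard error of the mean for a simple random sample drawn
    without replacement, including the finite population correction."""
    n = len(sample)
    s = stdev(sample)              # sample standard deviation (n - 1 divisor)
    fpc = 1 - n / population_size  # finite population correction
    return s / math.sqrt(n) * math.sqrt(fpc)


estimate = mean(SAMPLE)
se = srs_standard_error(SAMPLE, POPULATION_SIZE)
half_width = T_CRIT_95_DF3 * se
print(f"Mean = {estimate:.0f}, standard error = {se:.0f}")
print(f"95% CI = [{estimate - half_width:.0f} to {estimate + half_width:.0f}]")
```

Note that the interval contains the true mean of 5466.7, even though the point estimate of 5100 misses it.
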
Common Sampling Schemes

• Simple random sampling
• Stratified sampling – sample independently
  within important groups (“strata”) of the
  population
   – Generally decreases sampling error at minimal
     extra cost
• Cluster or multi-stage sampling – sample (or
  sub-sample within) entire groups (“clusters”)
  of the population
   – Generally increases sampling error, but saves
     money and time

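The effect of stratification can be illustrated with the Dozenland data. The strata below are hypothetical and not from the original module: the six lowest-income and six highest-income households. In practice strata must be defined from information known before the survey, not from the variable being measured, so this sketch shows a best case. It compares every simple random sample of 4 with every stratified sample of 2 per stratum.

```python
from itertools import combinations, product
from statistics import pstdev

INCOMES = [4200, 7500, 4700, 6900, 5900, 6400,
           4300, 3100, 4700, 4500, 7000, 6400]

# Hypothetical strata: the six lowest and the six highest incomes.
ordered = sorted(INCOMES)
STRATA = [ordered[:6], ordered[6:]]


def srs_means(values, n):
    """Means of all simple random samples of size n."""
    return [sum(s) / n for s in combinations(values, n)]


def stratified_means(strata, n_per_stratum):
    """Estimates from all stratified samples, taking n_per_stratum
    households independently from each stratum and weighting each
    stratum mean by the stratum's share of the population."""
    total = sum(len(stratum) for stratum in strata)
    weights = [len(stratum) / total for stratum in strata]
    per_stratum = [srs_means(stratum, n_per_stratum) for stratum in strata]
    return [sum(w * m for w, m in zip(weights, combo))
            for combo in product(*per_stratum)]


srs = srs_means(INCOMES, 4)
strat = stratified_means(STRATA, 2)
print(f"Simple random sampling: SD of estimates = {pstdev(srs):.1f}")
print(f"Stratified sampling:    SD of estimates = {pstdev(strat):.1f}")
```

Both designs use four households and both are unbiased, but the stratified estimates vary much less from sample to sample because each sample is forced to include both low and high incomes.
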
Statistical Theory to Practice

• Statistics textbooks tell us how to deal with
   – complex survey designs
   – proportions, ratios and other summaries of data
   – confidence intervals (CIs) at any confidence level

• Although the theory differs, the principles,
  practice and interpretation follow exactly as for
  the simple case we have considered

BIAS ERROR

Missing the Target Population

In many cases, bias arises because we obtain data
from a population that is not the one we really
should be using (the target population)

Example: vital registration
Target population: all deaths
Population used: deaths in urban areas

Does Bias Error Matter?

Whether or not bias error occurs depends upon
the difference between

• the characteristics of the persons included in the
  population used for data collection, and

• the characteristics of the persons not included

  Example: are infant deaths more common
  in rural than in urban areas?

Common Sources of Bias

• Deliberate selection
• Errors in defining the population
• Non-response and human fallacy

Note that there is some overlap between these groupings

Deliberate Selection

This is where some members of the target
population have a greater chance of selection into
the sample than do others

Example: household surveys of income

• An enumerator may not bother to visit isolated
  households, which are hard to access
• Such households are more likely to be
  self-sufficient, with low income
• The result is an upward bias in estimated average income

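The upward bias from skipping isolated households can be illustrated with the Dozenland data. The scenario is hypothetical and not from the original module: suppose the three lowest-income households are isolated and never visited, and samples of 4 are drawn at random from the remaining nine.

```python
from itertools import combinations

INCOMES = [4200, 7500, 4700, 6900, 5900, 6400,
           4300, 3100, 4700, 4500, 7000, 6400]
TRUE_MEAN = sum(INCOMES) / len(INCOMES)

# Hypothetical assumption: the three lowest-income households are
# isolated, so the enumerator never visits them.
reachable = sorted(INCOMES)[3:]


def mean_of_all_sample_means(values, n):
    """Average estimate over every possible sample of size n."""
    means = [sum(s) / n for s in combinations(values, n)]
    return sum(means) / len(means)


expected_estimate = mean_of_all_sample_means(reachable, 4)
bias = expected_estimate - TRUE_MEAN
print(f"True mean:                  {TRUE_MEAN:.1f} D$")
print(f"Average estimate (biased):  {expected_estimate:.1f} D$")
print(f"Bias:                       {bias:+.1f} D$")
```

Unlike sampling error, this error does not average out over repeated samples and does not shrink as the sample grows: even visiting all nine reachable households gives the same overestimate.
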
Errors in Defining the Population

This is where the population has been
incorrectly specified
• We get data for a population either from
  administrative systems or sample surveys
• Incomplete administrative records (rating lists,
  taxpayers' lists, land registers, company registers,
  the voting register or street maps) or weak
  sampling frames from which the sample is drawn can
  cause bias
• In sample surveys the error may arise because the
  sampling frame being used is inadequate

Classic example: a telephone survey, which can only
reach potential respondents who have a telephone

Missing Groups

Sampling frames or administrative systems might
be inadequate in that clusters of the population are
missing and therefore cannot be sampled.

Examples:
  • Sampling frame: a list of households omits people
    in institutions such as orphanages
  • Administrative systems: a business register may
    omit most or all rural businesses

Omission and Superfluous Units

On the other hand, the frame might cover all broad
sectors but have some units omitted, or contain some
“foreign elements”. For example:

• Survey: A list of households used as a sampling
  frame may omit persons who have recently
  moved to the area, or still list persons who have
  moved away

• Administrative systems: A business frame might
  omit new businesses started up in the last
  year because they have not yet been listed, or a
  business register might include businesses that
  have recently closed.

Duplicated Units

Some units in the population might appear
twice or more.

Example:
Administrative data: A business that moves
to a new location may be included in the
register at both locations

Advantages or disadvantages to listing

The quality of administrative records can depend in
part on the incentives for registration

• If subsidies are offered to registrants, then there
  may be an incentive to register fraudulently
• If registrants are taxed, then they may attempt to
  avoid registration.

Example: Casley and Lury (1981) give an example of a Caribbean
finance department that offered fertilizer subsidies for every registered
piece of land on an island

They later found that they were paying subsidies for an area greater than
the entire island!

Non-Response and Human Fallacy
Non-Response

May be classified into three types:

   a) Those unable to respond
   b) Absentees
   c) Refusals

Non-Response and Human Fallacy
Human Fallacy

• Influenced responses occur when respondents
  are encouraged to answer in a certain way

Example 1: farmers might inflate their land holdings, by
always rounding figures upwards, because they believe that
the survey results will be used to allocate state aid, or…

Example 2: the farmers might deflate them, by rounding down, in
the hope of minimizing taxation

Leading Questions and Prestige Error

Sometimes response bias is caused through
leading questions such as, 'Do you agree that meat
eating is barbaric?'

Most people like to please and/or will take the easy
option of agreeing in the hope of avoiding further
questions!

Many people do not want to appear uninformed.

On occasions the very appearance of the
enumerator can cause bias

TOTAL ERROR

Total Error

We have seen that sampling error will decrease as
the sample size increases

Unfortunately the reverse is generally true about
bias error: it tends to increase as sample size
increases

Root Mean Square Error

The total error, sampling and bias combined, is
measured by the root mean square error (RMSE)

This is defined as

$$\mathrm{RMSE} = \sqrt{(\text{Sampling error})^2 + (\text{Bias})^2}$$

[Figure: right-angled triangle with sides "Sampling error" and "Bias" and hypotenuse "RMSE"]

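The RMSE is the hypotenuse of a right-angled triangle whose other two sides are the sampling error and the bias, so it is simple to compute. In the sketch below the sampling error of 524 D$ is the standard error from the Dozenland sample, while the bias of 500 D$ is a purely hypothetical value chosen for illustration. The function name rmse is also illustrative.

```python
import math


def rmse(sampling_error, bias):
    """Root mean square error: sampling error and bias combined
    as the two shorter sides of a right-angled triangle."""
    return math.hypot(sampling_error, bias)


# Standard error from the Dozenland sample of four households.
SAMPLING_ERROR = 524.0
# Hypothetical bias, assumed for illustration only.
HYPOTHETICAL_BIAS = 500.0

total_error = rmse(SAMPLING_ERROR, HYPOTHETICAL_BIAS)
print(f"RMSE = {total_error:.1f} D$")

# Halving the sampling error leaves most of the total error in place
# when the bias is left untouched.
print(f"RMSE with half the sampling error = "
      f"{rmse(SAMPLING_ERROR / 2, HYPOTHETICAL_BIAS):.1f} D$")
```

The second figure shows why a larger sample alone is not enough: once bias dominates, reducing sampling error has little effect on total error.
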
How Should We Treat Error?

• Quantify it, if we can
   – generally only possible for sampling error

• Acknowledge it, when this does not cause
  confusion or lead to lack of trust

• Record it through use of metadata

• Treat small differences in MDGi’s with scepticism
   – differences may be due to error

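One common way to judge whether a difference between two estimates is "small" is to compare it with its own standard error. The sketch below is not part of the original module. It assumes the two estimates come from independent samples, it accounts for sampling error only (not bias), and the indicator values and standard errors are hypothetical.

```python
import math


def difference_in_standard_errors(est_a, se_a, est_b, se_b):
    """Difference between two independent estimates, expressed in
    units of the standard error of that difference."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (est_b - est_a) / se_diff


def looks_like_real_change(est_a, se_a, est_b, se_b, threshold=1.96):
    """True if the difference exceeds roughly two standard errors,
    the usual 95% rule of thumb."""
    ratio = difference_in_standard_errors(est_a, se_a, est_b, se_b)
    return abs(ratio) > threshold


# Hypothetical indicator values (%) from two survey rounds,
# each with a hypothetical standard error of 2.5 percentage points.
round_1, se_1 = 42.0, 2.5
round_2, se_2 = 45.0, 2.5

ratio = difference_in_standard_errors(round_1, se_1, round_2, se_2)
print(f"Difference is {ratio:.2f} standard errors")
print("Real change?", looks_like_real_change(round_1, se_1, round_2, se_2))
```

Here a rise of three percentage points is well within what sampling error alone could produce, so it should not be reported as progress without further evidence.
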
How Can We Minimize Error?

• Use a larger sample size

• Use a better sample design (e.g. stratified)

• Be more careful in survey administration
  (e.g. minimize non-response)

• Increase coverage of administrative data

• Use statistical models to average over time
  periods/countries etc. (e.g. FAO method for
  hunger indicators in MDG1)

Summary

There are 3 types of error that may have affected
an MDG indicator:
• Computation error may be avoided by careful
  arithmetic or appropriate use of software

• Sampling error is unavoidable whenever sample
  survey data are used

• Bias error is often present, not always obvious,
  but can sometimes be minimised by taking care in
  the data collection process

Practical 8

• List three ways by which bias error may arise

• List two methods which can be used to reduce
  sampling error


				