Posted on: 11/1/2012
Tools for Civil Society to Understand and Use Development Data: Improving MDG Policymaking and Monitoring

Module 8: Living with Error

What you will learn from this module
• What causes error in MDG indicators (MDGi’s)
• The 3 types of error in MDGi’s, and how they differ

From where does error derive?
• MDG indicators are derived from data
• Data represent the population from which they were collected
• Any shortfall in the data collection and handling system will, thus, cause error in the MDGi’s

Types of Error
We can identify three types of error in MDG indicators (and other summary statistics):
• Computation error
• Bias error
• Sampling error

Computation Error
• Errors made in the calculation of an MDG indicator, or its components
• Purely due to avoidable mistakes
• Less likely when calculation is automated

Bias Error
Bias error is a systematic error that causes all measured values to deviate from the true value in a consistent direction, higher or lower
• Arises when the characteristics of the population from which the sampling frame is drawn differ from the characteristics of the target population
• Almost always a big issue when administrative data are used in deriving the MDGi in developing countries
• Often also an issue when survey data are used

Bias Error (2)
[Figure: sample means (x) plotted on a measurement scale against the population value for three cases: 1. Bias (male), 2. Bias (female), 3. No bias]

Sampling Error
• May be thought of as “the difference between a sample and the population from which it was derived”
• Always present when sample survey data are used to derive the MDGi
• Not an issue with administrative data (unless these are only collected from a sample)
• Not an issue with a census

Sampling Error (2)
[Figure: sample mean (male) and population value marked on a measurement scale; the gap between them is labelled as the sampling error]

Cumulative effect of bias and sampling error
[Figure: sample mean and population value marked on a measurement scale; the gap between them is split into bias error and sampling error]

SAMPLING ERROR

Dozenland: An Example of Sampling Error
Dozenland is the world’s smallest country
It has only 12 households, each of which consists of a single person

The Problem
Estimate the average income (in Dozenland dollars) per person
How shall we do this?
1) Using a census (true value)
2) Using a household sample of size 4
3) Using all possible household samples of any size

Census Data
Head of household (initials)   Income (D$)
WJK                            4200
RNC                            7500
MM                             4700
JHR                            6900
HRP                            5900
KP                             6400
IMW                            4300
RDS                            3100
DGN                            4700
DC                             4500
MGK                            7000
DJP                            6400
Total                          65600
Average                        5466.7

Sample of 4
The Dozenland government has insufficient funds to carry out a census, so instead it decides to sample four of the twelve households
At random, it samples the households headed by WJK, MM, DC and MGK
Thus the sample results are 4200, 4700, 4500 and 7000 Dozenland dollars (D$)
The sample average is: (4200 + 4700 + 4500 + 7000)/4 = 5100 D$

Real Error
Since we know the true answers from the hypothetical census, we can see the exact error in our sample-based estimate
The error in the estimate of the mean is 5100 - 5466.7 = -366.7 Dozenland dollars (D$), i.e.
we have underestimated average income by about 7%

Interpretation
• This is NOT bias error, since the sample was random
• It is purely a result of the sample being different from the population

Can We Do Better?
1. Use samples of different sizes (the easiest way to do so is to use a larger sample, making the sample more similar to the population from which it is drawn)
2. Rely on statistical theory, which tells us how to estimate the sampling error

Summary results from taking all possible samples
ALL possible samples of size n (ranging from 1 to 12) from the 12 households
n = sample size; S = number of samples of size n

 n     S     Mean   Variation
 1    12   5466.7      1327.5
 2    66   5466.7       895.0
 3   220   5466.7       693.3
 4   495   5466.7       566.0
 5   792   5466.7       473.6
 6   924   5466.7       400.3
 7   792   5466.7       338.3
 8   495   5466.7       283.0
 9   220   5466.7       231.1
10    66   5466.7       179.0
11    12   5466.7       120.7
12     1   5466.7         0.0

What can we conclude?
• If you take all possible samples of a given size, the mean of the sample means is always the same and equals the true population mean
• The variation from sample to sample decreases as the sample size (n) gets bigger
• That is, there is less uncertainty in the estimate as the sample size increases

Here’s a Big Problem
• In real life we will only take ONE sample
• Thus we cannot see how values vary from sample to sample for any given sample size, n
• That is, we cannot measure the mean, or the variation, over all samples

Here’s a Solution
• We can estimate the sample-to-sample variation (“standard error”) from the single sample
• This helps us to understand how our sample mean may differ from the true population mean
Let us consider the sample of four households
The values in the sample are: 4200, 4700, 4500, and 7000.
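A minimal Python sketch of how a standard error and a 95% confidence interval might be computed from these four values. Two choices here are assumptions on our part rather than anything stated in the module: a finite population correction sqrt(1 - n/N) with N = 12 households, and a Student's t critical value of 3.182 (the usual table value for 3 degrees of freedom).

```python
import math
import statistics

# Incomes (D$) of the four sampled households, as given in the module.
sample = [4200, 4700, 4500, 7000]
N = 12            # households in Dozenland
n = len(sample)   # households in the sample

sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)   # standard deviation with divisor n - 1

# Assumption: finite population correction, because a third of the
# population is sampled without replacement.
fpc = math.sqrt(1 - n / N)
standard_error = sample_sd / math.sqrt(n) * fpc

# Assumption: two-sided 95% critical value of Student's t with n - 1 = 3
# degrees of freedom, taken from a standard table.
T_CRIT = 3.182
half_width = T_CRIT * standard_error

print(f"Mean           = {sample_mean:.0f}")
print(f"Standard error = {standard_error:.0f}")
print(f"95% CI         = {sample_mean:.0f} +/- {half_width:.0f}")
```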
This yields:
• Mean = 5100
• Standard Error = 524
• 95% confidence interval = 5100 ± 1666 = [3434 to 6766]

Common Sampling Schemes
• Simple random sampling
• Stratified sampling – sample independently within important groups (“strata”) of the population
  – Generally decreases sampling error at minimal extra cost
• Cluster or multi-stage sampling – sample (or sub-sample within) entire groups (“clusters”) of the population
  – Generally increases sampling error, but saves money and time

Statistical Theory to Practice
• Statistics textbooks tell us how to deal with
  – complex survey designs
  – proportions, ratios and other summaries of data
  – CIs with any degree of % confidence
• Although the theory differs, the principles, practice and interpretation follow exactly as for the simple case we have considered

BIAS ERROR

Missing the Target Population
In many cases, bias arises because we obtain data from a population that is not the one we really should be using, called the target population
Example: vital registration
Target population: all deaths
Population used: urban areas

Does Bias Error Matter?
Whether or not bias error occurs depends upon the difference between
• the characteristics of persons included in the population used for data collection, and
• the characteristics of the persons not included
Example: are infant deaths more common in rural than in urban areas?
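This dependence can be made explicit with a short derivation. The symbols are introduced here for illustration and do not appear in the module: w is the share of the target population that is left out of data collection, \mu_{\text{in}} and \mu_{\text{out}} are the averages of the characteristic among those included and those left out, and \mu is the average over the whole target population.

```latex
\begin{align*}
\mu &= (1 - w)\,\mu_{\text{in}} + w\,\mu_{\text{out}} \\
\text{Bias} &= \mu_{\text{in}} - \mu = w\,(\mu_{\text{in}} - \mu_{\text{out}})
\end{align*}
```

So the bias is zero if nobody is left out (w = 0) or if the included and excluded groups have the same average, and it grows with both the excluded share and the gap between the two groups. In the vital registration example, if infant deaths are more common in rural areas, a rate based on urban areas alone would understate the rate for the whole country.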

Common Sources of Bias
• Deliberate selection
• Errors in defining the population
• Non-response and human fallacy
Note that there is some overlap between these groupings

Deliberate Selection
This is where some members of the target population have a greater chance of selection into the sample than do others
Example: household surveys of income
• An enumerator may not bother to visit isolated households, which are hard to access
• Such households are more likely to be self-dependent, with low income
• The result is upward bias in average income

Errors in Defining the Population
This is where the population has been incorrectly specified
• We get data for a population either from administrative systems or sample surveys
• Incomplete administrative records (rating lists, taxpayers' lists, land registers, company registers, the voting register or street maps) or weak sampling frames from which the sample is drawn can cause bias
• In sample surveys the error may arise because the sampling frame being used is inadequate
Classic example: use of a telephone to question potential respondents

Missing Groups
Sampling frames or administrative systems might be inadequate in that clusters of the population are missing and therefore could not be sampled. Examples:
• Sampling frame: a list of households may omit people in institutions such as orphanages
• Administrative systems: a business register may omit most or all rural businesses

Omission and Superfluous Units
On the other hand, the frame might cover all broad sectors but have some units omitted or contain some “foreign elements”. For example:
• Survey: a list of households used as a sampling frame may omit persons who have recently moved to the area, or include persons who have moved away
• Administrative systems: a business frame might omit new businesses started up in the last year because they have not yet been listed, or a business register might include businesses that have recently closed.

Duplicated Units
Some units in the population might appear twice or more. Example:
• Administrative data: a business that moves to a new location may be included in the register at both locations

Advantages or disadvantages to listing
The quality of administrative records can depend in part on the incentives for registration
• If subsidies are offered to registrants, then there may be an incentive to register fraudulently
• If registrants are taxed, then they may attempt to avoid registration
Example: Casley and Lury (1981) give an example of a Caribbean finance department that offered fertilizer subsidies for every registered piece of land on an island. They later found that they were paying subsidies for an area greater than the entire island!

Non-Response and Human Fallacy
Non-Response
May be classified into three types:
a) Those unable to respond
b) Absentees
c) Refusals

Non-Response and Human Fallacy
Human Fallacy
• Influenced responses occur when respondents are encouraged to answer in a certain way
Example 1: farmers might inflate their land holdings, by always rounding figures upwards, because they believe that the survey results will be used to allocate state aid, or...
Example 2: the farmers might deflate them, by rounding down, in the hope of minimizing taxation

Leading Questions and Prestige Error
Sometimes response bias is caused by leading questions such as, 'Do you agree that meat eating is barbaric?'
Most people like to please and/or will take the easy option of agreeing in the hope of avoiding further questions!
Many people do not want to appear uninformed.
On occasion, the very appearance of the enumerator can cause bias

TOTAL ERROR

Total Error
We have seen that sampling error will decrease as the sample size increases
Unfortunately the reverse is generally true about bias error: it tends to increase as sample size increases

Root Mean Square Error
The total error, sampling and bias combined, is measured by the root mean square error (RMSE)
This is defined as
$\mathrm{RMSE} = \sqrt{(\text{Sampling error})^2 + (\text{Bias})^2}$
[Figure: diagram relating RMSE, Bias and Sampling error]

How Should We Treat Error?
• Quantify it, if we can – generally only possible for sampling error
• Acknowledge it, when this does not cause confusion or lead to lack of trust
• Record it through use of metadata
• Treat small differences in MDGi’s with scepticism – differences may be due to error

How Can We Minimize Error?
• Use a larger sample size
• Use a better sample design (e.g. stratified)
• Be more careful in survey administration (e.g. minimize non-response)
• Increase coverage of administrative data
• Use statistical models to average over time periods/countries etc. (e.g. FAO method for hunger indicators in MDG1)

Summary
There are 3 types of error that may have affected an MDG indicator:
• Computation error may be avoided by careful arithmetic or appropriate use of software
• Sampling error is unavoidable whenever sample survey data are used
• Bias error is often present, not always obvious, but can sometimes be minimised by taking care in the data collection process

Practical 8
• List three ways by which bias error may arise
• List two methods which can be used to reduce sampling error
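As a companion to the Dozenland example, the summary table of all possible samples can be rebuilt by brute force. The sketch below is an illustration, not the module's own method: it assumes that the "Variation" column is the standard deviation of the sample means taken over every possible sample of size n (population form, divisor S), which appears to match the tabulated values. The function name `summarize` is our own.

```python
from itertools import combinations
from statistics import mean, pstdev

# Census incomes (D$) of the 12 Dozenland households, in table order.
incomes = [4200, 7500, 4700, 6900, 5900, 6400,
           4300, 3100, 4700, 4500, 7000, 6400]


def summarize(n):
    """Return (number of samples, mean of sample means, SD of sample means)
    over every possible sample of n households drawn without replacement."""
    sample_means = [mean(sample) for sample in combinations(incomes, n)]
    # Assumption: "Variation" is the SD of the sample means, using the
    # population form because all possible samples are enumerated.
    return len(sample_means), mean(sample_means), pstdev(sample_means)


rows = {n: summarize(n) for n in range(1, len(incomes) + 1)}

print(" n     S     Mean   Variation")
for n, (count, mean_of_means, spread) in rows.items():
    print(f"{n:2d} {count:5d} {mean_of_means:8.1f} {spread:11.1f}")
```

Enumerating every sample is only feasible because Dozenland is tiny (4095 samples in total); for a real population the spread has to be estimated from the single sample drawn.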