Types and Sources of Errors in Statistical Data by hcj


Types and Sources of Errors in Statistical Data




                    SADC Course in Statistics
Types of Errors
• In general, there are two types of errors:
  a. non-sampling errors and
  b. sampling errors.
• It is important for a researcher to be aware of
  these errors, in particular non-sampling errors, so
  that they can be either minimised or eliminated
  from the data collected.




  To put your footer here go to View > Header and Footer   2
Non-sampling errors
  – These are errors that arise during the course of
    all data collection activities.
  – In summary, they have the following
    characteristics:
     • exist in both sample survey and census
       data.
     • are difficult to measure.




Sources of non-sampling errors
Non-sampling errors arise from:
• defects in the sampling frame.
• failure to identify the target population.
• non response.
• responses given by respondents.
• data processing and
• reporting, among others.




Defects in the sampling frame
• These result in coverage errors.
• These occur when there is an omission,
  duplication or wrongful inclusion of units in the
  sampling frame.
• Omissions are referred to as ‘under-coverage’,
  while duplications and wrongful inclusions are
  called ‘over-coverage’.
• These errors are caused by sampling frames that
  are inaccurate, incomplete, inadequate or out of
  date, or that contain duplicates.
• Coverage errors may also occur in field
  operations, that is, when an enumerator misses
  several households or persons during the
  interviewing process.
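
A minimal sketch of how the three kinds of coverage error could be checked, assuming an independent listing of the target population is available to compare against. The unit identifiers and the coverage_report helper are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Hypothetical unit identifiers, used only for illustration.
frame = ["H001", "H002", "H002", "H004", "H999"]      # sampling frame as listed
target_population = {"H001", "H002", "H003", "H004"}  # units that should be covered


def coverage_report(frame, target_population):
    """Classify coverage problems in a frame against a reference listing."""
    counts = Counter(frame)
    frame_units = set(counts)
    return {
        # under-coverage: target units missing from the frame
        "omitted": sorted(target_population - frame_units),
        # over-coverage: units listed more than once
        "duplicated": sorted(u for u, c in counts.items() if c > 1),
        # over-coverage: units that do not belong to the target population
        "wrongly_included": sorted(frame_units - target_population),
    }


report = coverage_report(frame, target_population)
print(report)
```

Here H003 is an omission (under-coverage), while the repeated H002 and the out-of-scope H999 are over-coverage.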

Failure to Identify Target Population
• This occurs when the target population is not
  clearly defined through the use of imprecise
  definitions or concepts or when the survey
  population does not reflect the target population
  due to an inadequate sampling frame and poor
  coverage rules.




Response errors

• These result from data that have been
  requested, provided, received or recorded
  incorrectly.
• They may occur as a result of inefficiencies with
  the questionnaire, the interviewer, the respondent
  or the survey process.




a.       Poor questionnaire design
• The content and wording of the questionnaire may
  be misleading and the layout of the questionnaire
  may make it difficult to accurately record
  responses.
• As a rule, questions in a questionnaire should not be
  loaded, double-barrelled, misleading or
  ambiguous, and should be directly relevant to the
  objectives of the survey.
• It is essential to pilot test questionnaires to
  identify questionnaire flow and question wording
  problems, and allow sufficient time for
  improvements to be made to the questionnaire.


Poor questionnaire design – cont’d
• The questionnaire should then be re-tested to
  ensure changes made do not introduce other
  problems.




b.     Interviewer bias
• An interviewer may influence the way a
  respondent answers survey questions.
• To prevent this, interviewers must be trained to
  remain neutral throughout the interviewing
  process and must pay close attention to the way
  they ask each question.




c.     Respondent errors
• These arise through the respondent providing
  inaccurate or wrong information.
• They occur because of memory biases or
  respondents giving inaccurate or false information
  when they believe that they are protecting their
  personal interests or integrity.
• They can also arise from the way the respondent
  interprets the questionnaire and the wording of
  the answer that the respondent gives.
• Careful questionnaire design and effective
  questionnaire testing can overcome these
  problems to some extent.
d.     Problems with the survey process
• Errors can also occur because of problems with
  the actual survey process, such as using proxy
  responses (taking answers from someone other
  than the respondent) or lacking control over
  the survey procedure.




Non-Response
• Non-response results when data is not collected
  from respondents.
• The proportion of these non-respondents in the
  sample is called the non-response rate.
• Non-response can be either total or partial.
• Total non-response or unit non-response can
  arise if a respondent cannot be contacted
  (because the sampling frame is incomplete or out
  of date), is not at home, is unable to respond
  because of language difficulties or illness, or
  refuses outright to answer any questions, or if
  the dwelling unit is vacant.
• Other respondents may indicate that they simply
  don't have the time to complete the interview or
  survey form.
Non-response - cont’d
• When conducting surveys it is important to
   document information on why a respondent has
   not responded.
• Partial non-response or item non-response
   can occur when a respondent replies to some but
   not all questions of the survey.
• This can arise due to memory problems,
   inadequate information or an inability to answer a
   particular question/section of the questionnaire.
• A respondent may refuse to answer if:
a.     they find the questions particularly sensitive, or
b.     they have been asked too many questions.
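
The two kinds of non-response can be summarised as rates. The sketch below uses hypothetical survey returns. It assumes the item non-response rate is taken among responding units only, which is one common convention and not something these slides specify.

```python
# Hypothetical survey returns: a unit with responded=False gave no answers
# at all (unit non-response); None marks a single unanswered question
# (item non-response).
sample = [
    {"id": 1, "responded": True,  "answers": {"age": 34, "income": 1200}},
    {"id": 2, "responded": True,  "answers": {"age": 51, "income": None}},
    {"id": 3, "responded": False, "answers": {}},
    {"id": 4, "responded": True,  "answers": {"age": 27, "income": 900}},
]


def unit_nonresponse_rate(sample):
    """Proportion of sampled units from which no data were collected."""
    non_respondents = sum(1 for unit in sample if not unit["responded"])
    return non_respondents / len(sample)


def item_nonresponse_rate(sample, item):
    """Proportion of responding units that left one question unanswered.

    Assumption: the denominator is responding units only.
    """
    respondents = [unit for unit in sample if unit["responded"]]
    missing = sum(1 for unit in respondents if unit["answers"].get(item) is None)
    return missing / len(respondents)


unit_rate = unit_nonresponse_rate(sample)              # 1 of 4 units
income_rate = item_nonresponse_rate(sample, "income")  # 1 of 3 respondents
print(unit_rate, income_rate)
```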

Non-response - cont’d
• To reduce non-response, the following approaches
  can be used:
   – designing the questionnaire carefully and
     using simple questions.
   – pilot testing of the questionnaire.
   – explaining survey purposes and uses.
   – assuring confidentiality of responses.
   – public awareness activities including discussions
     with key organisations and interest groups,
     news releases, media interviews and articles.



Processing errors

• These occur at various stages of data processing
  such as data cleaning, data capture and editing.
• Data cleaning involves making preliminary checks
  before entering the data into the processing
  system.
• Coder bias is usually a result of poor training or
  incomplete instructions, variability in coder
  performance and data entry errors.




Processing – cont’d
• Inadequate checking and quality management at
  this stage can introduce data loss (where data is
  not entered into the system) and data duplication
  (where the same data is entered into the system
  more than once) thus introducing errors in data.
• To minimise these errors, processing staff should
  be given adequate training, instructions and
  realistic workloads.




Time Period Bias
• This occurs when a survey is conducted during an
  unrepresentative time period.
• Survey timing is thus important and failure to
  recognise this introduces errors in data.




Analysis and Estimation

• Analysis errors include any errors that occur when
  the wrong analytical tools are used or when
  preliminary results are used instead of the final ones.
• Errors that occur during the publication of the
  data results are also considered as analysis errors.
• Estimation errors occur when inappropriate or
  inaccurate weights are used in the estimation
  procedure thus introducing errors to the data.
• They also occur when wrong estimators are
  selected by the analyst.
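
To illustrate how the choice of weights affects an estimate, here is a minimal sketch with made-up figures. It assumes a design in which urban units were sampled at a much higher rate than rural units, and each design weight is the number of population units that a sampled unit represents. The data and function names are hypothetical.

```python
# Hypothetical sample as (observed value, design weight) pairs.
# Weight = stratum population size / stratum sample size.
sample = [
    (10.0, 50.0), (12.0, 50.0),                # 2 sampled from 100 urban units
    (4.0, 300.0), (6.0, 300.0), (5.0, 300.0),  # 3 sampled from 900 rural units
]


def unweighted_mean(sample):
    """Treats every sampled unit as representing the same number of units."""
    return sum(value for value, _ in sample) / len(sample)


def weighted_mean(sample):
    """Weights each value by the number of population units it represents."""
    total_weight = sum(weight for _, weight in sample)
    return sum(value * weight for value, weight in sample) / total_weight


naive = unweighted_mean(sample)
weighted = weighted_mean(sample)
print(naive, weighted)
```

With these figures the unweighted mean is 7.4 and the weighted mean is 5.6. Ignoring the weights overstates the mean because the over-sampled urban units have the larger values.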



Reducing non-sampling errors
• Non-sampling errors can be minimised by adopting
  any of the following approaches:
   – using an up-to-date and accurate sampling
     frame.
   – careful selection of the time the survey is
     conducted.
   – planning for follow up of non-respondents.
   – careful questionnaire design.
   – providing thorough training and periodic
     retraining of interviewers and processing staff.



Reducing non-sampling errors – cont’d

   – designing good systems to capture errors that
     occur during the process of collecting data,
     sometimes called Data Quality Assurance
     Systems.




Sampling error
• Sampling error refers to the difference between the estimate
  derived from a sample survey and the 'true' value
  that would result if a census of the whole
  population were taken under the same conditions.
• These are errors that arise because data has been
  collected from a part, rather than the whole of the
  population.
• Because of the above, sampling errors are
  restricted to sample surveys, unlike non-sampling
  errors, which can occur in both sample survey
  and census data.
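
The definition can be tried out on a simulated population. The sketch below is only an illustration: the population is randomly generated, simple random sampling without replacement is assumed, and the function name sampling_error is hypothetical.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 values.
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.fmean(population)


def sampling_error(sample_size):
    """Estimate from one simple random sample minus the 'true' value."""
    sample = rng.sample(population, sample_size)
    return statistics.fmean(sample) - true_mean


# Different samples of the same size give different errors.
errors = [sampling_error(100) for _ in range(5)]
print(errors)

# A 'sample' that covers the whole population is a census: no sampling error.
census_error = sampling_error(len(population))
print(census_error)
```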



Sampling errors – cont’d
• There are no sampling errors in a census because
  the calculations are based on the entire
  population.
• They are measurable from the sample data in the
  case of probability sampling.
• Sampling errors are discussed in more detail in the
  more advanced modules of the training programme.




Factors Affecting Sampling Error
Sampling error is affected by a number of factors, including:
a.     sample size.
• In general, larger sample sizes decrease the
   sampling error; however, this decrease is not
   directly proportional to the increase in sample size.
• As a rough rule of thumb, you need to
   increase the sample size fourfold to halve the
   sampling error, but bear in mind that non-sampling
   errors are likely to increase with large samples.
b.     the sampling fraction.
• This is of lesser influence, but as the sample size
   increases as a fraction of the population, the
   sampling error should decrease.


Factors Affecting Sampling Error – cont’d
c.    the variability within the population.
• More variable populations give rise to larger
   errors as the samples or the estimates calculated
   from different samples are more likely to have
   greater variation.
• The effect of variability within the population can
   be reduced by using stratification, which
   accounts for some of the variability in the
   population.
d.    sample design.
• An efficient sampling design will help in reducing
   sampling error.
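
The first three factors can be seen together in the textbook standard error of a sample mean. The sketch below assumes simple random sampling without replacement, which the slides do not specify, and uses the usual finite population correction (1 - n/N). The function name and the figures are illustrative only.

```python
import math


def standard_error(n, N, S):
    """Standard error of a sample mean under simple random sampling
    without replacement (an assumed design, for illustration).

    n: sample size, N: population size, S: population standard deviation.
    """
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and the population size")
    sampling_fraction = n / N
    # (1 - sampling fraction) is the finite population correction.
    return math.sqrt((1 - sampling_fraction) * S**2 / n)


# a. Sample size: a fourfold increase roughly halves the standard error.
ratio = standard_error(400, 10_000, 10.0) / standard_error(100, 10_000, 10.0)

# b. Sampling fraction: the same n drawn from a smaller population gives a
#    slightly smaller error, and a census (n == N) gives none.
small_pop = standard_error(100, 1_000, 10.0)
large_pop = standard_error(100, 10_000, 10.0)
census = standard_error(1_000, 1_000, 10.0)

# c. Variability: doubling S doubles the standard error.
more_variable = standard_error(100, 10_000, 20.0)

print(ratio, small_pop, large_pop, census, more_variable)
```

Because n sits under a square root, the error shrinks more slowly than the sample grows, which is why the decrease is not directly proportional.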

Characteristics of the sampling error
• generally decreases in magnitude as the sample
  size increases (but not proportionally).
• depends on the variability of the characteristic of
  interest in the population.
• can be accounted for and reduced by an
  appropriate sample plan.
• can be measured and controlled in probability
  sample surveys.




Reducing sampling error
 If sampling principles are applied carefully within
  the constraints of available resources, sampling
  error can be kept to a minimum.




Sources
  – http://www.nss.gov.au/nss/home.nsf/SurveyDesignDoc/4354A8928428F834CA2571AB002479CE?OpenDocument
  – http://www.statcan.ca/english/edu/power/ch6/nonsampling/nonsampling.htm
  – http://www.statcan.ca/english/edu/power/ch6/sampling/sampling.htm



