					   RealWorld Evaluation
Designing Evaluations under Budget, Time, Data
           and Political Constraints
       American Evaluation Association
      Professional pre-session workshop
                November 5, 2008

           Facilitated by:
         Michael Bamberger
           and Jim Rugh

          Workshop Objectives
1. The seven steps of the RealWorld Evaluation
   approach for addressing common constraints
   faced by evaluators, such as: the evaluator is
   not called in until the project is nearly
   completed and there was no baseline or
   comparison group; the evaluation must be
   conducted with an inadequate budget and
   insufficient time; or there are political
   pressures and expectations about how the
   evaluation should be conducted and what its
   conclusions should say.

          Workshop Objectives
2. Identifying and assessing various design options
   that could be used in a particular evaluation setting
3. Ways to reconstruct baseline data when the
   evaluation does not begin until the project is well
   advanced or completed.
4. How to identify and to address threats to the validity
   or adequacy of quantitative, qualitative and mixed
   methods designs with reference to the specific
   context of RealWorld evaluations

      Workshop Objectives
Note: Given time constraints, the workshop
will focus on project-level impact
evaluations. However, if the results of a pre-
workshop survey of participants call for it, a
brief introduction to the application of RWE
techniques in other forms of evaluation,
including the assessment of country
programs and policy interventions, could be
provided.
                 Workshop agenda

8:00 – 8:20: Session 1: Introduction
    •   Workshop objectives
    •   Feedback from participant survey
    •   Handout: RealWorld Evaluation Overview (summary chapter of book)

8:20 – 8:50: Session 2: RealWorld Evaluation overview and
addressing the counterfactual
    •   Handout: “Why evaluators can’t sleep at night”

8:50 – 9:20: Session 3: Small group discussions
    •   Participants will introduce themselves and then share experiences on the types of constraints they have
        faced when designing and conducting evaluations, and what they did to try to address those constraints.

9:20 – 10:00: Session 4: RWE Steps 1, 2 and 3: Scoping the
evaluation and strategies for addressing budget and time constraints
    •   Presentation and discussion

10:00 – 10:15: BREAK
        Workshop agenda, cont.
10:15 – 10:45: Session 5: RWE Step 4: Addressing data constraints
    •   Presentation and discussion

10:45 – 11:15: Session 6: Mixed methods
    •   Presentation and discussion

11:15 – 12:00: Session 7: Small groups read their case studies and
begin to discuss the learning exercise.
    •   We will use a low-cost housing case study. All four groups will discuss the same project but from different
        perspectives.

12:00 – 1:00: LUNCH
1:00 – 1:45: Session 8: Identifying and addressing threats to the
validity of the evaluation design and conclusions
1:45 – 2:30: Session 9: Small groups complete exercise.
    •   Negotiate with your paired group how you propose to modify the ToR of your case study.

2:30 – 2:45: Session 10: Feedback from exercise.
    •   Discussion of lessons learned from the case study or the RealWorld Evaluation approach in general.

2:45 – 3:00: Session 11: Wrap-up and workshop evaluation
RealWorld Evaluation
Designing Evaluations under Budget, Time, Data and Political Constraints

          Session 2.a

RealWorld Evaluation Scenarios
Scenario 1: Evaluator(s) not brought in until near
   end of project
For political, technical or budget reasons:
   • There was no baseline survey
   • Project implementers did not collect
      adequate data on project participants at the
      beginning or during the life of the project
   • It is difficult to collect data on comparable
      control groups

RealWorld Evaluation Scenarios
Scenario 2: The evaluation team is called in
   early in the life of the project
But for budget, political or methodological reasons:
 The ‘baseline’ was a needs assessment,
   not comparable to the eventual evaluation
 It was not possible to collect baseline data
   on a comparison group

Reality Check – Real-World
Challenges to Evaluation
•   All too often, project designers do not think
    evaluatively – the evaluation is not designed until
    the project is well advanced
•   There was no baseline – at least not one with data
    comparable to the evaluation
•   There was/can be no control/comparison group.
•   Limited time and resources for evaluation
•   Clients have prior expectations for what the
    evaluation findings will say
•   Many stakeholders do not understand evaluation;
    distrust the process; or even see it as a threat
    (dislike of being judged)
RealWorld Evaluation
Quality Control Goals
   Achieve maximum possible evaluation rigor
    within the limitations of a given context
   Identify and control for methodological
    weaknesses in the evaluation design
   Negotiate with clients trade-offs between
    desired rigor and available resources
   Presentation of findings must recognize
    methodological weaknesses and how they
    affect generalization to broader populations

    The Need for the RealWorld
    Evaluation Approach

   As a result of these kinds of constraints, many
    of the basic principles of impact evaluation
    design (comparable pre-test/post-test design,
    comparison group, instrument development and
    testing, random sample selection, control for
    researcher bias, thorough documentation of the
    evaluation methodology, etc.) often cannot be applied.

The RealWorld Evaluation

                 An integrated approach to
                 ensure acceptable standards
                 of methodological rigor while
                 operating under real-world
                 budget, time, data and
                 political constraints.

     See handout summary chapter extracted from
      RealWorld Evaluation book for more details

The RealWorld Evaluation
   Developed to help evaluation practitioners
    and clients
    • managers, funding agencies and external stakeholders
   A work in progress
   Originally designed for developing countries,
    but equally applicable in industrialized countries

Special Evaluation Challenges in
Developing Countries
   Unavailability of needed data
   Scarce local evaluation resources
   Limited budgets for evaluations
   Institutional and political constraints
   Lack of an evaluation culture
   Many evaluations are designed by, and for,
    external funding agencies and seldom reflect
    local and national stakeholder priorities

Special Evaluation Challenges in
Developing Countries

 Despite these challenges, there is a
 growing demand for methodologically
 sound evaluations which assess the
 impacts, sustainability and replicability of
 development projects and programs

Most RealWorld Tools are not New—
Only the Integrated Approach is New

    Most of the RealWorld Evaluation data
     collection and analysis tools will be familiar to
     most evaluators
    What is new is the integrated approach
     which combines a wide range of tools to
     produce the best quality evaluation under
     real-world constraints

Who Uses RealWorld Evaluation
and When?
   Two main users:
     • Evaluation practitioners
     • Managers, funding agencies and external stakeholders
   The evaluation may start at:
     • The beginning of the project
     • After the project is fully operational
     • During or near the end of project
     • After the project is finished
What is Special About the
RealWorld Evaluation Approach?

   There is a series of steps, each with
    checklists for identifying constraints and
    determining how to address them
   These steps are summarized on the following
    slide and then in the more detailed flow-chart
                 (See page 6 of handout)

The Steps of the RealWorld
Evaluation Approach

Step 1: Planning and scoping the evaluation
Step 2: Addressing budget constraints
Step 3: Addressing time constraints
Step 4: Addressing data constraints
Step 5: Addressing political constraints
Step 6: Assessing and Addressing the strengths
   and weaknesses of the evaluation design
Step 7: Helping clients use the evaluation

The Real-World Evaluation Approach

Step 1: Planning and scoping the evaluation
A. Defining client information needs and understanding the political context
B. Defining the program theory model
C. Identifying time, budget, data and political constraints to be addressed by the RWE
D. Selecting the design that best addresses client needs within the RWE constraints

Step 2: Addressing budget constraints
A. Modify evaluation design
B. Rationalize data needs
C. Look for reliable secondary data
D. Revise sample design
E. Economical data collection methods

Step 3: Addressing time constraints
All Step 2 tools plus:
F. Commissioning preparatory studies
G. Hire more resource persons
H. Revising format of project records to include critical data for impact analysis
I. Modern data collection and analysis technology

Step 4: Addressing data constraints
A. Reconstructing baseline data
B. Recreating comparison groups
C. Working with non-equivalent comparison groups
D. Collecting data on sensitive topics or from difficult-to-reach groups
E. Multiple methods

Step 5: Addressing political influences
A. Accommodating pressures from funding agencies or clients on evaluation design
B. Addressing stakeholder methodological preferences
C. Recognizing influence of professional research paradigms

Step 6: Assessing and addressing the strengths and weaknesses of the evaluation design
An integrated checklist for multi-method designs
A. Objectivity/confirmability
B. Replicability/dependability
C. Internal validity/credibility/authenticity
D. External validity/transferability/fittingness

Step 7: Helping clients use the evaluation
A. Utilization
B. Application
C. Orientation
D. Action
RealWorld Evaluation
Designing Evaluations under Budget, Time, Data and Political Constraints

          Session 2.b
The challenge of attribution and counterfactuals

 How do we know if the observed changes in
 the project participants or communities
  •    income, health, attitudes, school attendance etc
 are due to the implementation of the project
  •    credit, water supply, transport vouchers, school
      construction etc
 or to other unrelated factors?
  •    changes in the economy, demographic movements,
      other development programs etc

The Counterfactual
   What would have been the condition of
    the project population at the time of the
    evaluation if the project had not taken place?

Where is the counterfactual?

After families had been living
  in a new housing project for
  3 years, a study found
  average household income
  had increased by 50%.

Does this show that housing is
  an effective way to raise incomes?

      Comparing the project with two
      possible comparison groups

[Chart, 2000–2002: project group income increases by 50%.
 Scenario 1: 50% increase in comparison group income – no evidence of project impact.
 Scenario 2: no increase in comparison group income – potential evidence of project impact.]
5 main evaluation strategies
for addressing the counterfactual

Randomized designs
I. True experimental designs
II. Randomized field designs
Quasi-experimental designs
III. Strong quasi-experimental designs
IV. Weaker quasi-experimental designs
Non-experimental designs.
V. No logically defensible counterfactual

       The best statistical design option in most field
       settings: Randomized or strong quasi-experimental
       evaluation designs

                      T1 (Pre-test)   T2 (Treatment [project])   T3 (Post-test)
Project group              P1                   X                     P2
Control group              C1                                         C2

Subjects are randomly assigned to the project and control groups, or the control
group is selected using statistical or other matching methods. Conditions of both
groups are not controlled during the project.

          Gain score [impact] = (P2 – P1) – (C2 – C1)
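The gain score above is a difference-in-differences calculation: the project group's change minus the control group's change. A minimal sketch in Python (the numbers are illustrative, not from any actual evaluation):

```python
def did_impact(p1, p2, c1, c2):
    """Gain score [impact] = (P2 - P1) - (C2 - C1):
    the project group's change minus the control group's change."""
    return (p2 - p1) - (c2 - c1)

# Illustrative values: project group moves 100 -> 150 on the impact
# indicator while the control group moves 100 -> 120.
impact = did_impact(p1=100, p2=150, c1=100, c2=120)
print(impact)  # 30: the 20-point control-group change is netted out
```

If the comparison group changed by the same amount as the project group (the chart's Scenario 1), the estimated impact is zero.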
         Control group and comparison group

   Control group = randomized allocation of
    subjects to project and non-treatment group
   Comparison group = separate procedure for
    sampling project and non-treatment groups

Reference sources for
randomized field trial designs
1. MIT Poverty Action Lab

2. Center for Global Development
“When will we ever learn?”

3. International Initiative for Impact Evaluation = 3ie

The limited use of strong
evaluation designs
   It is estimated that
    • Only 5-10% of impact evaluations use a
        strong quasi-experimental design
    •   Significantly less than 5% use randomized
        control trials

Introductory small-group exercise
 Introduce yourselves, including something
 about your experience in coordinating or
 conducting evaluations.

 In particular, share experiences on the types
 of constraints you have faced when
 designing and conducting evaluations, and
 what you did to try to address those constraints.
RealWorld Evaluation
Designing Evaluations under Budget, Time, Data and Political Constraints

       Session 4, Step #1

Step 1: Planning and Scoping the Evaluation

   Understanding client information needs
   Defining the program theory model
   Preliminary identification of constraints to
    be addressed by the RealWorld Evaluation approach

A. Understanding client information needs

Typical questions clients want answered:
 Is the project achieving its objectives?

 Are all sectors of the target population benefiting?

 Are the results sustainable?

 Which contextual factors determine the
   degree of success or failure?

A. Understanding client information needs

A full understanding of client information
  needs can often reduce the types of
  information collected and the level of
  detail and rigor necessary.

However, this understanding could also
 increase the amount of information required.

B. Defining the program theory model
All programs are based on a set of assumptions
   (hypotheses) about how the project’s
   interventions should lead to desired outcomes.
 Sometimes this is clearly spelled out in project documents.
 Sometimes it is only implicit and the evaluator
   needs to help stakeholders articulate the
   hypotheses through a logic model.

B. Defining the program theory model
   Defining and testing critical assumptions
    are essential (but often ignored)
    elements of program theory models.

   The following is an example of a model
    to assess the impacts of microcredit on
    women’s social and economic empowerment.

      Critical Hypotheses for a Gender-Inclusive
      Micro-Credit Program

   Outputs
     • If credit is available women will be willing and able to obtain loans
       and technical assistance.
   Short-term outcomes
     • If women obtain loans they will start income-generating activities.
     • Women will be able to control the use of loans and reimburse them.
   Medium/long-term impacts
     • Economic and social welfare of women and their families will improve.
     • Increased women’s economic and social empowerment.
   Sustainability
     • Structural changes will lead to long-term impacts.
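One way to make such a chain of critical hypotheses explicit is to hold it as data, so each assumption can be listed and reviewed in turn. A minimal sketch; the structure and field names are illustrative, not taken from the RealWorld Evaluation book:

```python
# Each link in the results chain pairs an expected result with the
# critical assumption that must hold for the next link to follow.
results_chain = [
    ("Outputs", "Women obtain loans and technical assistance",
     "Credit is available and women are willing and able to borrow"),
    ("Short-term outcomes", "Women start income-generating activities",
     "Women control the use of loans and can reimburse them"),
    ("Medium/long-term impacts",
     "Economic and social welfare of women and families improves",
     "Income gains translate into economic and social empowerment"),
    ("Sustainability", "Impacts persist after the project ends",
     "Structural changes take hold"),
]

# Print the chain with each critical assumption called out.
for level, result, assumption in results_chain:
    print(f"{level}: {result} [assumes: {assumption}]")
```

Laying the chain out this way makes the often-ignored assumptions as visible as the intended results.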
C. Determining appropriate (and
feasible) evaluation design

   Based on an understanding of client
    information needs, required level of rigor,
    and what is possible given the
    constraints, the evaluator and client
    need to determine what evaluation
    design is required and possible under
    the circumstances.

Let’s focus for a while on evaluation
design (a quick review)
1: Review different evaluation
  (experimental/research) designs
2: Develop criteria for determining appropriate
  Terms of Reference (ToR) for evaluating a
  project, given its own (planned or unplanned)
  evaluation design.
3: Defining levels of rigor
4: A life-of-project evaluation design

      An introduction to various evaluation designs
       Illustrating the need for a quasi-experimental
        longitudinal time-series evaluation design

[Chart: scale of major impact indicator over time for project
 participants vs. comparison group, measured at baseline,
 end-of-project evaluation and post-project evaluation]
  OK, let’s stop the action to
  identify each of the major
types of evaluation (research)
            design …

  … one at a time, beginning with the
       most rigorous design.

   First of all: the key to the traditional symbols:

      X = Intervention (treatment), i.e. what the
       project does in a community
      O = Observation event (e.g. baseline, mid-term
       evaluation, end-of-project evaluation)

      P (top row): Project participants
      C (bottom row): Comparison (control) group

Note: the RWE evaluation designs are laid out in Table 3 on page 46 of your handout

            Design #1: Longitudinal Quasi-experimental
       P1       X         P2        X      P3       P4
       C1                 C2               C3       C4

Project participants

                        Comparison group

    baseline           midterm    end of project   post project
                                  evaluation       evaluation
     Design #1+: Longitudinal Randomized Control Trial
       P1       X         P2            X       P3    P4
       C1                 C2                    C3    C4

Project participants
                         Research subjects
                         randomly assigned
                         either to project or
                         control group.

                        Control group

    baseline           midterm      end of project   post project
                                    evaluation       evaluation
              Design #2: Randomized Control Trial
       P1                    X                 P2
       C1                                      C2

Project participants
                        Research subjects
                        randomly assigned
                        either to project or
                        control group.

                       Control group

    baseline                       end of project
Design #3: Quasi-experimental (pre+post, with comparison)
       P1                   X             P2
       C1                                 C2

Project participants

                       Comparison group

    baseline                     end of project
                Design #7: Truncated Longitudinal
                X         P1        X      P2
                          C1               C2

Project participants

                        Comparison group

                       midterm    end of project
     Design #8: Pre+post of project; post-only comparison
        P1                X                P2
                                           C2

Project participants

                       Comparison group

     baseline                     end of project
      Design #9: Post-test only of project and comparison
                          X                P
                                           C

Project participants

                       Comparison group

                                 end of project
       Design #10: Pre+post of project; no comparison
       P1              X             P2

Project participants

    baseline                 end of project
       Design #11: Post-test only of project participants
                       X               P

Project participants

                               end of project
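The designs above can be compared by encoding each one in the P/C/X notation; two quick checks then show which designs retain a baseline and which retain a comparison group. The helper functions and the dictionary below are an illustrative sketch covering a subset of the designs, not part of the RWE toolkit:

```python
# Each design: (project row, comparison row or None), in the
# notation from the slides. Only a subset of designs is shown.
designs = {
    "#2 Randomized Control Trial":            ("P1 X P2", "C1 C2"),
    "#3 Quasi-experimental":                  ("P1 X P2", "C1 C2"),
    "#9 Post-test only, with comparison":     ("X P",     "C"),
    "#10 Pre+post of project, no comparison": ("P1 X P2", None),
    "#11 Post-test only of participants":     ("X P",     None),
}

def has_baseline(design):
    project_row, _ = design
    return "P1" in project_row          # a pre-test observation exists

def has_comparison(design):
    _, comparison_row = design
    return comparison_row is not None   # some counterfactual group exists

for name, design in designs.items():
    print(name, "| baseline:", has_baseline(design),
          "| comparison:", has_comparison(design))
```

Running the checks makes the rigor gradient concrete: design #11 has neither a baseline nor a comparison group, which is why it is the weakest option for addressing the counterfactual.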
     Some of the questions to consider as
     you customize an evaluation Terms of
     Reference (ToR):

1.    Who asked for the evaluation? (Who are
      the key stakeholders?)
2.    What are the key questions to be answered?
3.    Will this be a formative or summative evaluation?
4.    Will there be a next phase, or other
      projects designed based on the findings of
      this evaluation?
 Other questions to answer as
 you customize an evaluation
5.   What decisions will be made in response
     to the findings of this evaluation?
6.   What is the appropriate level of rigor?
7.   What is the scope / scale of the
     evaluation / evaluand (thing to be evaluated)?
8.   How much time will be needed / available?
9.   What financial resources are needed / available?
 Other questions to answer as
 you customize an evaluation
10.   Should the evaluation rely mainly on
      quantitative or qualitative methods?
11.   Should participatory methods be used?
12.   Can / should there be a household survey?
13.   Who should be interviewed?
14.   Who should be involved in planning /
      implementing the evaluation?
15.   What are the most appropriate media
      for communicating the findings to
      different stakeholder audiences?
  Evaluation (research) design?     Key questions?
  Evaluand (what to evaluate)?      Appropriate level of rigor?
  Resources available?              Time available?
  Skills available?                 Evaluation FOR whom?

 Does this help, or just confuse things more? Who
   said evaluations (like life) would be easy?!!
Now, where were we?

Oh, yes, we’re ready for Steps 2 and
  3 of the RealWorld Evaluation

Let’s continue …

RealWorld Evaluation
Designing Evaluations under Budget, Time, Data and Political Constraints

           Steps 2 + 3

Step 2: Addressing budget constraints
A.   Clarifying client information needs
B.   Simplifying the evaluation design
C.   Look for reliable secondary data
D.   Review sample size
E.   Reducing costs of data collection and analysis

2A: Simplifying the evaluation design
   For quantitative evaluations it is possible
    to select among the most common
    evaluation designs (noting the trade-offs
    when using a simpler design).
   For qualitative evaluations the options
    will vary depending on the type of evaluation.
2A (cont): Qualitative designs
   Depending upon the design, some of the
    options might include:
    • Reducing the number of units studied
        (communities, families, schools)
    •   Reducing the number of case studies or the
        duration and complexity of the cases.
    •   Reducing the duration or frequency of observations.

2.B. Rationalize data needs

   Use information from Step 1 to identify
    client information needs
   Review all data collection instruments
    and cut out any questions not directly
    related to the objectives of the evaluation

2.C. Look for reliable
secondary sources
   Planning studies, project administrative
    records, government ministries, other
    NGOs, universities / research institutes,
    mass media.

2.C. Look for reliable
secondary sources, cont.
Assess the relevance and reliability of
  sources for the evaluation with respect to:
 Coverage of the target population

 Time period

 Relevance of the information collected

 Reliability and completeness of the data

 Potential biases
2.D. Seeking ways to reduce
sample size
Accepting a lower level of precision
  significantly reduces the required
  number of interviews:
 To test for a 5% change in proportions
  requires a maximum sample of 1086
 To test for a 10% change in proportions
  requires a maximum sample of up to 270
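For intuition, a standard margin-of-error formula for a single proportion shows how quickly the required sample size falls as precision is relaxed. This simple formula is a sketch; it is not necessarily the calculation behind the slide's 1086 and 270 figures, which come from the RealWorld Evaluation book:

```python
import math

def sample_size(margin, z=1.96, p=0.5):
    """n = z^2 * p * (1 - p) / margin^2, rounded up.
    z=1.96 gives 95% confidence; p=0.5 is the conservative worst case."""
    return math.ceil(z * z * p * (1 - p) / (margin * margin))

print(sample_size(0.05))  # 385 interviews for +/-5 percentage points
print(sample_size(0.10))  # 97 interviews for +/-10 percentage points
```

Halving the required precision cuts the number of interviews by roughly a factor of four, which is the budget lever the slide describes.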

2.E. Reducing costs of data
collection and analysis
   Use self-administered questionnaires
   Reduce length and complexity of instruments
   Use direct observation
   Obtain estimates from focus groups and
    community forums
   Key informants
   Participatory assessment methods
   Multi-methods and triangulation
Step 3: Addressing time constraints
In addition to Step 2 methods:
 Reduce time pressures on external consultants
    • Commission preparatory studies
    • Video conferences
   Hire more consultants/researchers
   Incorporate outcome indicators in project
    monitoring systems and documents
   Technology for data inputting/coding

Addressing time constraints

   It is important to distinguish between approaches
    that reduce:
    a) the duration in terms of time over the life of the
    project (e.g. from baseline to final evaluation over 5 years),
    b) the duration in terms of the time needed to undertake
    the actual evaluation study/studies (e.g. 6 weeks,
    whether completed in an intensive consecutive 6
    weeks or a cumulative total of 6 weeks periodically
    over the course of a year), and
    c) the level of effort (person-days, i.e. number of
    staff x total days required).

Addressing time constraints

Negotiate with the client to discuss questions such as the following:
1. What information is essential and what could be
   dropped or reduced?
2. How much precision and detail is required for the
   essential information? E.g. is it necessary to have
   separate estimates for each geographical region or
   sub-group or is a population average acceptable?
3. Is it necessary to analyze all project components and
   services or only the most important?
4. Is it possible to obtain additional resources (money,
   staff, computer access, vehicles etc) to speed up the
   data collection and analysis process?
RealWorld Evaluation
Designing Evaluations under Budget, Time, Data and Political Constraints

          Session 5
       Addressing data constraints
Step 4: Addressing data constraints

Step 1: Planning and Scoping the Evaluation
Step 2: Addressing budget constraints
Step 3: Addressing time constraints
Step 4: Addressing data constraints
Step 5: Addressing political constraints
Step 6: Assessing the strengths and weaknesses of the evaluation
Step 7: Strengthening the Evaluation Design

Step 4: Addressing data constraints
A. Reconstructing baseline data
B. Special challenges in working with comparison groups
C. Collecting data on sensitive topics
D. Collecting data on difficult-to-reach groups
Two kinds of data constraints:

 1.   Reconstructing baseline
 2.   Special data issues for
      comparison groups

1. Reconstructing baseline
   conditions for project
   and comparison groups
   [see Table 10, p. 59]
               1. The importance
                  of baseline data
   Hard to assess change without data on pre-
    project conditions
   Post-test comparisons do not fully address:
    •   Selection bias: initial differences between participants
        and non-participants
         • Propensity score matching and instrumental variables
           partially address this
    •   Historical factors influencing outcomes that were
        assumed to have been caused by the project
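The propensity score matching idea mentioned above can be sketched very simply: each participant is paired with the non-participant whose (pre-computed) propensity score is closest, and outcomes are compared across pairs. The scores and outcomes below are illustrative, and real applications estimate the scores from covariates and check match quality:

```python
def match_on_propensity(treated, untreated):
    """Nearest-neighbour matching on pre-computed propensity scores.
    Each element is a (propensity_score, outcome) pair; returns the
    treated-minus-matched-control outcome differences."""
    diffs = []
    for t_score, t_outcome in treated:
        # Find the untreated unit with the closest propensity score.
        c_score, c_outcome = min(untreated, key=lambda u: abs(u[0] - t_score))
        diffs.append(t_outcome - c_outcome)
    return diffs

# Illustrative (score, outcome) pairs.
treated = [(0.80, 150), (0.60, 140)]
untreated = [(0.75, 120), (0.30, 100), (0.55, 115)]
effects = match_on_propensity(treated, untreated)
average_effect = sum(effects) / len(effects)
print(effects, average_effect)  # [30, 25] 27.5
```

Matching on the score rather than on all covariates at once is what makes the approach practical; it only partially addresses selection bias because unobserved differences can remain.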

1. Ways to reconstruct baseline data
A.   Secondary data.
B.   Project records.
C.   Recall
D.   Key informants
E.   PRA and other participatory techniques
     such as timelines, and critical incidents
     to help establish the chronology of
     important changes in the community
1-A. Assessing the utility of
potential secondary data
   Reference period
   Population coverage
   Inclusion of required indicators
   Completeness
   Accuracy
   Free from bias

1-A. Using secondary data to
reconstruct baselines

   Census
   Surveys
   Project administrative data
   Agency reports
   Special studies by NGOs, donors
   University studies
   Mass media (newspapers, radio, TV)

1-A. Using secondary data to
reconstruct baselines

   Community organization records
   Notices in offices, community centers etc
   Posters
   Birth/death records
   Wills and documents concerning
   Private Sector data

1-B. Using project records
Types of data
 Feasibility/planning studies
 Application/registration forms
 Supervision reports
 MIS data
 Meeting reports
 Community and agency meeting minutes
 Progress reports
 Construction costs

    1-B. Assessing the reliability of
            project records
   Who collected the data and for what purpose?
   Were they collected for record-keeping or to
    influence policymakers or other groups?
   Do monitoring data only refer to project
    activities or do they also cover changes in outcomes?
   Were the data intended exclusively for
    internal use? For use by a restricted group?
    Or for public use?
1-B. Assessing the reliability of
project records

   How accurate and complete are the
    data? Are there obvious gaps? Were
    these intentional or due to poor record-keeping?
   Potential biases with respect to the key
    indicators required for the impact evaluation

1-B. Working with the client to improve the
utility of project data for evaluation

   Collecting additional information on
    applicants or participants
   Ensure identification data is included and complete
   Ensure data are organized in the way
    needed for evaluation [by community/
    types of service/ family rather than just
    individuals/ economic level etc]

1-C. Using recall to reconstruct
baseline data
   School attendance and time/cost of travel
   Sickness/use health facilities
   Income and expenditures
   Community/individual knowledge and skills
   Social cohesion/conflict
   Water usage/quality/cost
   Periods of stress
   Travel patterns

1-C. Where Knowledge about
Recall is Greatest
   Areas where most research has been
    done on the validity of recall
    • Income and expenditure surveys
    • Demographic data and fertility behavior
   Types of Questions
    • Yes/No; fact
    • Scaled
    • Easily related to major events
1-C. Limitations of recall

   Generally not reliable for precise
    quantitative data
   Sample selection bias
   Deliberate or unintentional distortion
   Few empirical studies (except on
    expenditure) to help adjust estimates.

1-C. Sources of bias in recall

   Who provides the information
   Under-estimation of small and routine expenditures
   “Telescoping” of recall concerning major expenditures.
   Distortion to conform to accepted behavior.
    •   Intentional
    •   Romanticizing the past
   Contextual factors:
    •   Time intervals used in question
     •   Respondents’ expectations of what the interviewer wants to hear
   Implications for the interview protocol

1-C. Improving the validity of recall
   Conduct small studies to compare recall
    with survey or other findings.
   Ensure all groups interviewed
   Triangulation
   Link recall to important reference events
    • Elections
    • Drought/floods
    • Construction of road, school etc

1-D. Key informants
   Not just officials and high status people
   Everyone can be a key informant on
    their own situation:
    • Single mothers
    • Factory workers
    • Users of public transport
    • Sex-workers
    • Street children
1-D. Guidelines for key-informant analysis
   Triangulation greatly enhances validity
    and understanding
   Include informants with different
    experiences and perspectives
   Understand how each informant fits into
    the picture.
   Employ multiple rounds if necessary
   Carefully manage ethical issues

1-E. PRA and related participatory methods

   PRA techniques collect data at the group
    or community [rather than individual] level
   Can either seek to identify consensus or
    identify different perspectives.
   Risk of bias:
    • Only certain sectors of the community attend
    • Certain people dominate the discussion
1-E. Time-related PRA techniques
useful for reconstructing the past

   Time line
   Trend analysis
   Historical transect
   Seasonal diagram
   Daily activity schedule
   Participatory genealogy
   Dream map
   Critical incidents
1-E. Using PRA recall methods: seasonal calendars

Seasonal Calendar of Poverty Drawn by Villagers in Nyamira, Kenya
               Jan   Feb   Mar   April   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec

Light meals    OOO   OOO   O     O                                                 OO

Begging        OOO   OOO   O                                                       OOO
               OOO   OOO                                                           OO

Migration      OOO   OOO   OO    O       O     OO

Unemployment   OOO   OOO   OO
               OOO   OOO

Income                     O     OOO     OO    OOO   OOO   OOO   OOO   O     O     O
                                 O       OO    O     OOO   OOO   OO

Disease                    O     OOO     OO    OOO   OO    OOO   OO
                                 O       OO

Rainfall                   OO    OOO                       O     O     OO    OO    O
                           OO    O                                     O     O

Source: Rietbergen-McCracken and Narayan 1997
1-F. Issues in baseline reconstruction
   Variations in reliability of recall.
   Memory distortion.
   Secondary data not easy to use
   Secondary data incomplete or unreliable.
   Key informants may distort the past

2. Reconstructing comparison
(control) groups

2. Ways to reconstruct control groups
   Judgmental matching of communities.
   When project services are introduced in
    phases, beneficiaries entering in later
    phases can be used as a “pipeline” control group
   Internal controls when different subjects
    receive different combinations and levels
    of services

2. Using propensity scores to
strengthen comparison groups
   Propensity score matching
   Rapid assessment studies can compare
    characteristics of project and control groups
    •   Observation
    •   Key informants
    •   Focus groups
    •   Secondary data
    •   Aerial photos and GIS data

2. Using propensity scores to
strengthen comparison groups
   Logistic regression (logit) on project and
    comparison population to identify determinants
    of project participation
   Select “nearest neighbors” (usually around 5)
    from comparison group who most closely
    match a participant.
   Project impact = gain score = difference
    between project participant score and mean
    score for nearest neighbors.
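The matching and gain-score steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not workshop material: it assumes the propensity scores have already been estimated (for example by the logit model just described), and the function name `gain_score` and all data are hypothetical.

```python
# Illustrative sketch: nearest-neighbor matching on already-estimated
# propensity scores, and the resulting gain score. All data hypothetical.

def gain_score(participant, comparisons, k=5):
    """participant: (propensity, outcome); comparisons: list of the same.
    Returns the participant's outcome minus the mean outcome of the k
    comparison units whose propensity scores are closest (the "nearest
    neighbors", usually around 5)."""
    p_score, p_outcome = participant
    # Sort comparison units by closeness of propensity score, keep k
    nearest = sorted(comparisons, key=lambda c: abs(c[0] - p_score))[:k]
    return p_outcome - sum(c[1] for c in nearest) / len(nearest)

# One project participant and a small comparison pool:
participant = (0.62, 140.0)            # (propensity score, outcome)
comparisons = [(0.60, 120.0), (0.61, 118.0), (0.65, 125.0),
               (0.30, 90.0), (0.63, 122.0), (0.58, 119.0), (0.90, 160.0)]
impact_estimate = gain_score(participant, comparisons)  # 140 - 120.8 = 19.2
```

Estimating the propensity scores themselves would typically use a logistic regression of participation on observed characteristics, and averaging the gain scores over all participants gives the overall impact estimate.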

Issues in reconstructing control groups
   Project areas often selected purposively and
    difficult to match.
   Differences between project and control
    groups - difficult to assess if outcomes due to
    project or to these initial differences.
   Lack of good data to select control groups
   Contamination
   Econometric methods cannot fully adjust for
    initial differences between the groups

   Bamberger, Rugh and Mabry (2006).
    RealWorld Evaluation. Chapter 5
   Kumar, S. (2002). Methods for Community
    Participation: A Complete Guide for Practitioners.
   Patton, M.Q. (2002). Qualitative research
    and evaluation methods. Chapters 6 and 7.
   Roche, C. (1999). Impact assessment for
    development agencies. Chapter 5.
RealWorld Evaluation
Designing Evaluations under Budget,

Time, Data and Political Constraints

         Session 6
  Mixed-method evaluations
It should NOT be a fight between pure
QUANT (numbers alone) OR pure QUAL (verbiage alone)

Quantoid: “Your human interest story sounds nice,
but let me show you the statistics.”

Qualoid: “Your numbers look impressive,
but let me tell you the human interest story.”

What’s needed is the right combination of both.

I. Mixed Method Designs
1. Quantitative data collection
   Structured surveys (household, farm,
    transport usage etc)
   Structured observation
   Anthropometric methods
   Aptitude and behavioral tests

    1. Quantitative data collection methods
          Strengths and weaknesses

Strengths:
   Generalization
   Statistically representative
   Estimate magnitude and distribution of impacts
   Clear documentation of methods
   Standardized approach
   Statistical control of bias and external factors

Weaknesses:
   Surveys cannot capture many types of information
   Do not work for difficult-to-reach groups
   No analysis of context
   Survey situation may alienate respondents
   Long delay in obtaining results
   Data reduction loses information

2. Qualitative data collection methods

Interviewing:
   Structured
   Semi-structured
   Unstructured
   Focus groups
   Community interviews
   PRA
   Audio recording

Observation:
   Participant observation
   Structured observation
   Unstructured observation
   Photography and video recording

Analysis of documents and artifacts:
   Project documents
   Published reports
   E-mail
   Legal documents:
     •   birth and death certificates
     •   property transfer documents
     •   marriage certificates
   Posters
   Decorations in the house
   Clothing and gang insignia

2. Qualitative data collection methods

   The researcher’s perspective is an integral part of
    what is recorded about the social world
   Scientific detachment is not possible
   Meanings given to social phenomena and situations
    must be understood
   Programs cannot be studied independently of their context
   Cause and effect cannot be clearly separated; change
    must be studied holistically.

2. Qualitative data collection methods
Strengths and weaknesses

Strengths:
   Flexible designs able to evolve
   Sampling focuses on high value subjects
   Holistic focus (“the big picture”)
   Multiple sources provide complex understanding
   Narrative more accessible to many users
   Triangulation strengthens validity of findings

Weaknesses:
   Lack of clear design may frustrate clients
   Lack of generalizability
   Multiple perspectives make it hard to reach consensus
   Individual factors not isolated
   Interpretive methods appear too subjective

3. Mixed method evaluation designs

   Combine the strengths of both QUANT and QUAL
   One approach (QUANT or QUAL) is often
    dominant and the other complements it
   Can have both approaches equal but harder to
    design and manage.
   Can be used sequentially or concurrently

[Diagram: Determining appropriate precision and mix of multiple methods.
Vertical axis: from “High rigor, high quality, more time & expense” (top)
to “Low rigor, questionable quality, quick and cheap” (bottom).
Horizontal axis: from “Extractive --- Quantitative” (measurements, HH
surveys) to “Participatory --- Qualitative” (focus groups).]
      3. Mixed method evaluation designs
      How quantitative and qualitative methods
             complement each other

A. Broaden the conceptual framework
     • Combining theories from different disciplines:
     • Exploratory QUAL studies can help define framework
B. Combine generalizability with depth and context
     • Random subject selection ensures representativity and generalizability
     • Case studies, focus groups etc can help understand the characteristics of the
             different groups selected in the sample
C. Permit access to difficult to reach groups [QUAL]
     • PRA, focus groups, case studies, snowball samples, etc can be effective
             ways to reach women, ethnic minorities and other vulnerable groups
         •   Direct observation can provide information on groups difficult to interview.
             For example, informal sector and illegal economic activities
D. Enable process analysis [QUAL]
     • Observation, focus groups and informal conversations are more effective for
             understanding group processes, interaction between people and public
             agencies, and how organizations operate
         3. Mixed method evaluation designs
         How quantitative and qualitative methods
             complement each other (cont.)
E.   Analysis and control for underlying structural factors [QUANT]
     •   Sampling and statistical analysis can avoid misleading conclusions
     •   Propensity scores and multivariate analysis can statistically control for
         differences between project and control groups
     •   Meetings with women may suggest gender biases in local firms’ hiring
         practices; however,
     •   Using statistical analysis to control for years of education or experience
         may show there are no differences in hiring policies for workers with
         comparable qualifications
     •   Participants who volunteer to attend a focus group may be strongly in
         favor or opposed to a certain project, but
     •   A rapid sample survey may show that most community residents have
         different views
   3. Mixed method evaluation designs
  How quantitative and qualitative methods
      complement each other (cont.)

F. Triangulation and consistency checks
    •   Direct observation may identify inconsistencies in interview responses
    •   Examples:
        •   A family may say they are poor but observation shows they have new
            furniture, good clothes etc.
        •   A woman may say she has no source of income, but an early morning visit
            may show she operates an illegal beer brewing business

G. Broadening the interpretation of findings:
    •   Combining personal experience with “social facts”
    •   Statistical analysis frequently includes unexpected or interesting
        findings which cannot be explained through the statistics. Rapid
        follow-up visits may help explain the findings

3. Mixed method evaluation designs
     How quantitative and qualitative methods
         complement each other (cont.)

G. Interpreting findings
    • A QUANT survey of community water management in
       Indonesia found that with only one exception all village water
       supply was managed by women
    • Follow-up visits found that in the one exceptional village
       women managed a very profitable dairy farming business –
       so men were willing to manage water to allow women time to
       produce and sell dairy produce
       Source: Brown (2000)

Using Qualitative methods to improve
the Evaluation design and results

 Use recall to reconstruct pre-test situation
 Interviews with key informants to identify other changes
  in the community or in gender relations
 Interviews or focus groups with women and men to
   •   assess the effect of loans on gender relations within the
       household, such as
        • changes in control of resources and decision-making
   •   identify other important results or unintended consequences:
         • increase in women’s work load,
         • increase in incidence of gender-based or domestic violence

     Enough of our
presentations: it’s time for
  you (THE RealWorld
PEOPLE!) to get involved
Small group case study work
1.   Some of you are playing the role of
     evaluation consultants, others are clients
     coordinating the evaluation.
2.   Decide what your group will do to
     address the given constraints/issues.
3.   Prepare to negotiate the ToR with the
     other group after lunch.

RealWorld Evaluation
Designing Evaluations under Budget,

Time, Data and Political Constraints

              Session 8
Identifying and addressing threats
  to the validity of the evaluation
      design and conclusions

                The Real World Evaluation [RWE] Approach

Step 1: Planning and scoping the evaluation
Step 2: Addressing budget constraints
Step 3: Addressing time constraints
Step 4: Addressing data constraints
Step 5: Addressing political influences
Step 6: Strengthening the evaluation design and validity
Step 7: Helping clients use the evaluation

Step 6 in detail: Strengthening the evaluation
design and the validity of the conclusions
A. Identifying threats to validity of quasi-experimental designs
B. Assessing the adequacy of qualitative designs
C. An integrated checklist for mixed-method designs
D. Addressing threats to quantitative evaluation designs
E. Addressing threats to the adequacy of qualitative designs
F. Addressing threats to mixed-method designs
Session outline

1.   What is validity and why does it matter?
2.   General guidelines for assessing the
     validity of all evaluation designs
3.   Additional threats to validity for
     quantitative evaluation designs
4.   Strategies for addressing threats to validity
1. What is validity and
why does it matter?
Defining validity
The degree to which the evaluation findings and
  recommendations are supported by:
 The conceptual framework describing how the
  project is supposed to achieve its objectives
 Statistical techniques (including sample design)
 How the project and the evaluation were implemented
 The similarities between the project population and
  the wider population to which findings are generalized

Importance of validity
Evaluations provide recommendations for
  future decisions and action. If the
  findings and interpretation are not valid:
 Programs which do not work may
  continue or even be expanded
 Good programs may be discontinued

 Priority target groups may not have
  access or benefit
RWE quality control goals
   The evaluator must achieve the greatest possible
    methodological rigor within the limitations of a given context.
   Standards must be appropriate for different types of evaluation.
   The evaluator must identify and control for
    methodological weaknesses in the evaluation design.
   The evaluation report must identify methodological
    weaknesses and how these affect generalization to
    broader populations.

2. General guidelines for assessing the
   validity of all evaluation designs
   [see Overview Handbook Appendix 1]

A.   Confirmability
B.   Reliability
C.   Credibility
D.   Transferability
E.   Utilization

A.    Confirmability
Are the conclusions drawn from the available evidence
  and is the research relatively free of researcher bias?
A-1: Inadequate documentation of methods and procedures
A-2: Is data presented to support the conclusions and
  are the conclusions consistent with the findings?
  [Compare the executive summary with the data in
  the main report]

B.    Reliability
Is the process of the study consistent, reasonably
   stable over time and across researchers and methods?
B-2: Data was only collected from people who
   attended focus groups or community meetings
B-4: Were coding and quality checks made and
   did they show agreement?

C. Credibility

Are the findings credible to the people studied and to
     readers? Is there an authentic picture of what is being studied?
C-1: Is there sufficient information to provide a credible
     description of the subjects or situations studied?
C-3: Was triangulation among methods and data sources
     systematically applied? Were findings generally
     consistent? What happened if they were not?

D. Transferability

Do the conclusions fit other contexts and how
    widely can they be generalized?
D-1: Are the characteristics of the sample
    described in enough detail to permit
    comparisons with other samples?
D-4: Does the report present enough detail for
    readers to assess potential transferability?

E. Utilization

Were findings useful to clients,
   researchers and communities studied?
E-1: Were findings intellectually and
   physically accessible to potential users?
E-3: Do the findings provide guidance for
   future action?

3. Additional threats to validity for
   Quasi-Experimental Designs [QED]
     [see Overview Handbook Appendix 1]

F.   Threats to statistical conclusion validity: why inferences
     about statistical association between two variables
     (for example project intervention and outcome) may not be valid

G.   Threats to internal validity: why assumptions that
     project interventions have caused observed outcomes may not be valid

H.   Threats to construct validity: why selected
     indicators may not adequately describe the constructs and causal
     linkages in the evaluation model

I.   Threats to external validity: why assumptions
     about the potential replicability of a project in other locations or with
     other groups may not be valid
F. Statistical conclusion validity
The statistical design and analysis may incorrectly
  assume that program interventions have contributed
  to the observed outputs.
   The wrong tests are used or they are
    applied/interpreted incorrectly
   Problems with sample design
   Measurement errors
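As a hedged illustration (not from the workshop) of what such an association test involves, the sketch below computes Welch's two-sample t statistic comparing mean outcomes in hypothetical project and comparison groups; using the wrong test, a flawed sample design, or noisy measures would invalidate the inference this statistic is meant to support.

```python
# Illustrative sketch: Welch's two-sample t statistic for the difference
# in mean outcomes between project and comparison groups.
# All data are hypothetical.
import math

def welch_t(a, b):
    """t statistic for independent samples with unequal variances."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    # Unbiased sample variances (divide by n - 1)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))

project    = [14.0, 15.5, 16.0, 14.5, 15.0, 16.5]   # outcome scores
comparison = [13.0, 13.5, 14.0, 12.5, 14.5, 13.5]
t = welch_t(project, comparison)                     # ≈ 3.66
```

A large t value suggests the group difference is unlikely to be chance alone, but only if the samples were drawn appropriately and the outcome measure is reliable, which is exactly what the threats above call into question.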

G. Threats to internal validity
It may be incorrectly assumed that there is a
   causal relationship between project
   interventions and observed outputs.
   Unclear temporal sequence between the
    project and the observed outcomes.
   Need to control for external factors
   Effects of time
   Unreliable measures

Example of threat to internal
validity: The assumed causal model

Women join the village bank where they
receive loans, learn skills and gain
confidence, WHICH increases women’s
income, WHICH increases women’s
control over household resources.
An alternative causal model

Some women had previously taken literacy
training which increased their self-confidence
and work skills. Women who had taken literacy
training are more likely to join the village bank,
and their literacy and self-confidence makes
them more effective entrepreneurs. Women’s
income and control over household resources
increased as a combined result of literacy,
self-confidence and loans.
H. Threats to construct validity
The indicators of outputs, impacts and contextual
  variables may not adequately describe and
  measure the constructs [hypotheses/concepts]
  on which the program theory is based.
 Indicators may not adequately measure key constructs.
 The program theory model and the interactions
  between stages of the model may not be
  adequately specified.
 Reactions to the experimental context are not
  well understood.

I. Threats to external validity
Assumptions about how the findings could be
  generalized to other contexts may not be valid.
 Some important characteristics of the project
  context may not be understood.
 Important characteristics of the project
  participants may not be understood.
 Seasonal and other cyclical effects may have
  been overlooked.

RealWorld Evaluation book
• Appendix 2 gives a worksheet for
assessing the quality and validity of an
evaluation design
• Appendix 3 provides worked examples

4. Addressing generic
threats to validity for
all evaluation designs
A. Confirmability

Example: Threat A-1: inadequate
  documentation of methods and procedures
Possible ways to address:
 Request the researchers to revise their
  documentation to explain more fully their
  methodology or to provide missing material.
 Rapid data collection methods (surveys, desk
  research, secondary data) to fill gaps

B. Reliability

Example: Threat B-4: data were not collected across
  the full range of appropriate settings, times,
  respondents etc.

Possible ways to address

   If the study has not yet been conducted revise the
    sample design or use qualitative methods to cover
    the missing settings, times or respondents.
   If data collection has already been completed
    consider using rapid assessment methods such as
    focus groups, interviews with key informants,
    participant observation etc to fill in some of the gaps.

C. Credibility
Example: Threat C-2: The account does not ring true
  and does not reflect the local context
Possible ways to address

   If the study has not yet been conducted, revise the
    sample design or use qualitative methods to cover the
    missing settings, times or respondents.
   If data collection has already been completed consider
    using rapid assessment methods such as focus groups,
    interviews with key informants, participant observation etc
    to fill in some of the gaps.

D. Transferability
Example: Threat D-3: Sample does not permit
  generalization to other populations
Possible ways to address:
   Organize workshops or consult key informants to assess
    whether the problems concern missing information,
    factual issues or how the material was interpreted by the
    researchers.

   Return to the field to fill in the gaps or include the
    impressions of key informants, focus group participants,
    or participant observers to provide different perspectives.

E. Utilization
Example: Threat E-2: The findings do not provide
  guidance for future action
Possible ways to address
 If the researchers have the necessary information, ask
  them to make their recommendations more explicit.
 If they do not have the information, organize
  brainstorming sessions with community groups or the
  implementing agencies to develop more specific
  recommendations for action.

Lightning feedback

What are some of the most serious threats to
  validity affecting your evaluations?
 How can they be addressed?
Time for more discussion
Small group case study work, continued
1.   Evaluation ‘consultants’ meet with
     ‘clients’ working on same case study
     (1A+1B) and (2A+2B)
2.   Negotiate your proposed modification of
     the ToR in order to cope with the given constraints.
3.   Be prepared to summarize lessons
     learned from this exercise (and the workshop).
In conclusion:
Evaluators must be prepared to:
1. Enter at a late stage in the project cycle;
2. Work under budget and time restrictions;
3. Not have access to comparative baseline data;
4. Not have access to identified comparison groups;
5. Work with very few well-qualified evaluation
   specialists;
6. Reconcile different evaluation paradigms and
   information needs of different stakeholders.
     Main workshop messages
1.   Evaluators must be prepared for real-world
     evaluation challenges
2.   There is considerable experience to draw on
3.   A toolkit of rapid and economical “RealWorld”
     evaluation techniques is available
4.   Never use time and budget constraints as an
     excuse for sloppy evaluation methodology
5.   A “threats to validity” checklist helps keep you
     honest by identifying potential weaknesses in
     your evaluation design and analysis
