RealWorld Evaluation
Designing Evaluations under Budget, Time, Data and Political Constraints
By Jim Rugh
(based on joint work with Michael Bamberger and Linda Mabry)
AEA/CDC Summer Evaluation Institute
Atlanta, June 24, 2008
Advanced: Practicing the Approach
Workshop Objectives
1. A very brief summary of what's involved in the RealWorld Evaluation approach.
2. Share our own experiences in dealing with constraints of budget, time, data and political pressures as we have conducted evaluations.
3. Use a case study to practice applying some of the RWE criteria.
4. Take home ideas for how to apply these perspectives and methods in our own evaluation practice.
Proposed Workshop Agenda
Session 1. Introduction: objectives and agenda for today's workshop
Session 2. Brief summary of the RWE approach
Session 3. Initial small group work: sharing personal experiences
Session 4. Using the RWE checklist to strengthen evaluations
Session 5. Small group work on case studies
Session 6. Pairs of groups meet together
Session 7. Discussion, conclusions, evaluation of this workshop
Note: We are assuming that you participated in yesterday's introductory workshop and/or have at least read the RWE summary chapter, if not the book itself.
BRIEF RECAP OF THE RWE APPROACH
The RealWorld Evaluation Approach
An integrated approach to ensure acceptable standards of methodological rigor while operating under real-world budget, time, data and political constraints.
See the handout summary chapter, extracted from the RealWorld Evaluation book, for more details.
The main RWE constraints
Inadequate budget for the kind of evaluation we think is necessary
Not enough time to do what good evaluative practice requires
Lack of needed data (baseline, counterfactual)
Facing political pressures
How can we possibly do a good evaluation?
RealWorld Evaluation Scenarios
An adequate budget but lack of data
No comparable baseline data and/or inability to include a comparison group in the evaluation design
A limited budget but plenty of time
Local evaluation teams may not have the resources to bring in foreign expertise or to conduct large-scale sample surveys, but they may have plenty of time to use qualitative methods and small-scale longitudinal studies
An adequate budget but limited time
This is often the situation when external evaluators are contracted to work under tight deadlines and with limited time in the field
RealWorld Evaluation Scenarios
A ToR that calls for an independent, "objective" evaluation of impact, but pressure from clients to show positive results
Typically, key stakeholders expect the findings of the evaluation to support continued funding for the program.
Preconceived ideas about which evaluation methodologies to use
Whether large-scale quantitative household surveys or a few case studies (written up as interesting human-interest stories); the evaluator(s) may also have biases or preferences.
The Steps of the RealWorld Evaluation Approach
Step 1: Planning and scoping the evaluation
Step 2: Addressing budget constraints
Step 3: Addressing time constraints
Step 4: Addressing data constraints
Step 5: Addressing political constraints
Step 6: Assessing and addressing the strengths and weaknesses of the evaluation design
Step 7: Communicating the findings to clients in useful ways, promoting utilization
See Figure 1, page 7 of summary chapter
The RealWorld Evaluation Approach
Step 1: Planning and scoping the evaluation
A. Defining client information needs and understanding the political context
B. Defining the program theory model
C. Identifying time, budget, data and political constraints to be addressed by the RWE
D. Selecting the design that best addresses client needs within the RWE constraints
Step 2: Addressing budget constraints
A. Modify evaluation design
B. Rationalize data needs
C. Look for reliable secondary data
D. Revise sample design
E. Economical data collection methods
Step 3: Addressing time constraints
All Step 2 tools plus:
F. Commissioning preparatory studies
G. Hiring more resource persons
H. Revising the format of project records to include critical data for impact analysis
I. Modern data collection and analysis technology
Step 4: Addressing data constraints
A. Reconstructing baseline data
B. Recreating comparison groups
C. Working with non-equivalent comparison groups
D. Collecting data on sensitive topics or from difficult-to-reach groups
E. Multiple methods
Step 5: Addressing political influences
A. Accommodating pressures from funding agencies or clients on evaluation design
B. Addressing stakeholder methodological preferences
C. Recognizing the influence of professional research paradigms
Step 6: Strengthening the evaluation design and the validity of the conclusions
A. Identifying threats to validity of quasi-experimental designs
B. Assessing the adequacy of qualitative designs
C. An integrated checklist for multi-method designs
D. Addressing threats to quantitative designs
E. Addressing threats to the adequacy of qualitative designs
F. Addressing threats to mixed-method designs
Step 7: Helping clients utilize the evaluation
A. Ensuring active participation of clients in the scoping phase
B. Formative evaluation strategies
C. Constant communication with all stakeholders throughout the evaluation process
D. Evaluation capacity building
E. Appropriate strategies for communicating findings
F. Developing and monitoring the follow-up action plan
Step 1: Planning and Scoping the Evaluation
A. Understanding client information needs
B. Defining the program theory model
C. Preliminary identification of constraints to be addressed by the RealWorld Evaluation
B. Defining the Program Theory Model
All programs are based on an internal logic (a hypothesis) about how the project's interventions should lead to the desired outcomes.
Sometimes this is clearly spelled out in project documents. Sometimes it is only implicit and the evaluator needs to help stakeholders articulate the hypothesis through a logic model.
One form of Program Theory (Logic) Model
Economic context in which the project operates
Political context in which the project operates
Institutional and operational context
Diagnosis and Design → Inputs → Implementation Process → Outputs → Outcomes → Impacts → Sustainability
Socio-economic and cultural characteristics of the affected populations
Note: The orange boxes are included in conventional Program Theory Models. The addition of the blue boxes provides the more complete analysis recommended for RWE.
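A logic model can also be kept as a simple checklist-style data structure. This makes it easy to see which links in the chain, or which contextual factors, still lack an indicator. The sketch below is a hypothetical illustration and is not part of the RWE materials. The stage and context names come from the diagram above. The `LogicModel` class, its `gaps` method and the example indicators are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Stages of the results chain, in the order shown in the diagram.
RESULTS_CHAIN = [
    "Diagnosis and Design", "Inputs", "Implementation Process",
    "Outputs", "Outcomes", "Impacts", "Sustainability",
]

# Contextual factors shown around the chain in the diagram.
CONTEXT = [
    "Economic context",
    "Political context",
    "Institutional and operational context",
    "Socio-economic and cultural characteristics of the affected populations",
]


@dataclass
class LogicModel:
    # Hypothetical mapping: stage or contextual factor -> planned indicators.
    indicators: dict[str, list[str]] = field(default_factory=dict)

    def gaps(self) -> list[str]:
        """Return chain stages and contextual factors that have no indicator yet."""
        return [name for name in RESULTS_CHAIN + CONTEXT
                if not self.indicators.get(name)]


# Illustrative indicators only.
model = LogicModel({
    "Inputs": ["training of agents"],
    "Outputs": ["S&L groups organized"],
    "Impacts": ["reduction in poverty"],
})
print(model.gaps())
```

Listing the gaps explicitly helps the team notice, for example, that outcomes and the contextual factors have not yet been operationalized.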
[Figure: example program theory model. Boxes include: Training of agents; MFI provides credit; S&L groups organized; Credit provided to entrepreneurs; Women educated; Women able to reimburse loans; Women achieve rights within household; Women in leadership roles; Women empowered; Improved economic conditions; Reduction in poverty]
(see Table 3 on page 50 of the summary chapter)
The 7 RWE evaluation designs
1. Longitudinal quasi-experimental
2. Simple quasi-experimental (pre- and post-test of project and comparison groups)
3. Truncated longitudinal design (midterm but no baseline)
4. Pre+post-test of project group, post-test only of comparison group
5. Post-test only of project and comparison groups (no baseline)
6. Pre+post-test of project group (no control)
7. Post-test only of project group (no baseline, no comparison group)
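One way to see what separates the seven designs is to record which observations each one collects. That, in turn, shows which comparisons it can support. The sketch below is our own hypothetical encoding of the list above, and the `Design` class and its field names are assumptions. The randomized variants (#1+ and #2+) are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    number: int
    name: str
    project_pre: bool     # project group observed before end of project (baseline; midterm for #3)
    comparison: bool      # a comparison group is observed at the end of the project
    comparison_pre: bool  # comparison group also observed at that earlier point

    def available_estimates(self) -> list[str]:
        """Which simple comparisons the observations in this design allow."""
        estimates = []
        if self.project_pre and self.comparison_pre:
            estimates.append("double difference")
        if self.project_pre:
            estimates.append("before-after change (project group)")
        if self.comparison:
            estimates.append("end-of-project difference between groups")
        return estimates


DESIGNS = [
    Design(1, "Longitudinal quasi-experimental", True, True, True),
    Design(2, "Simple quasi-experimental", True, True, True),
    Design(3, "Truncated longitudinal", True, True, True),
    Design(4, "Pre+post project, post-only comparison", True, True, False),
    Design(5, "Post-test only, project and comparison", False, True, False),
    Design(6, "Pre+post project, no comparison", True, False, False),
    Design(7, "Post-test only, project group", False, False, False),
]

for d in DESIGNS:
    print(f"#{d.number} {d.name}: {d.available_estimates() or ['none']}")
```

Design #3 appears to support a double difference. However, its first observation is at midterm, so any change before the midterm is not captured.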
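Design #1: Longitudinal Quasi-experimental
(X = project intervention; P = observation of project participants; C = observation of comparison group)
Project participants: P1 (baseline), X, P2 (midterm), X, P3 (end of project evaluation), P4 (post project evaluation)
Comparison group: C1 (baseline), C2 (midterm), C3 (end of project evaluation), C4 (post project evaluation)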
Design #1+: Longitudinal Randomized Control Trial
Research subjects randomly assigned either to project or control group.
Project participants: P1 (baseline), X, P2 (midterm), X, P3 (end of project evaluation), P4 (post project evaluation)
Control group: C1 (baseline), C2 (midterm), C3 (end of project evaluation), C4 (post project evaluation)
Design #2+: Randomized Control Trial
Research subjects randomly assigned either to project or control group.
Project participants: P1 (baseline), X, P2 (end of project evaluation)
Control group: C1 (baseline), C2 (end of project evaluation)
Design #2: Quasi-experimental (pre+post, with ‘control’)
Project participants: P1 (baseline), X, P2 (end of project evaluation)
Comparison group: C1 (baseline), C2 (end of project evaluation)
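With baseline and end-of-project data on both groups, a common way to estimate impact is the double difference (difference-in-differences). It is the change in the project group minus the change in the comparison group. The sketch below uses the P1, P2, C1, C2 notation of the diagram, and the income figures are hypothetical. The estimate is only as good as the assumption that the comparison group's change is a fair stand-in for what would have happened to project participants without the project.

```python
def double_difference(p1: float, p2: float, c1: float, c2: float) -> float:
    """Project-group change minus comparison-group change (Design #2 notation)."""
    return (p2 - p1) - (c2 - c1)


# Hypothetical mean household incomes, for illustration only.
P1, P2 = 100.0, 150.0   # project group: baseline, end of project
C1, C2 = 95.0, 120.0    # comparison group: baseline, end of project

before_after = P2 - P1  # ignores change that would have happened anyway
post_only = P2 - C2     # ignores initial differences between the groups
impact = double_difference(P1, P2, C1, C2)

print(before_after, post_only, impact)  # 50.0 30.0 25.0
```

The three numbers differ because each simpler design leaves out part of the counterfactual. That gap is exactly what the weaker designs below have to live with.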
Design #3: Truncated Longitudinal
Project participants: X, P1 (midterm), X, P2 (end of project evaluation)
Comparison group: C1 (midterm), C2 (end of project evaluation)
Design #4: Pre+post of project; post-only comparison
Project participants: P1 (baseline), X, P2 (end of project evaluation)
Comparison group: C (end of project evaluation)
Design #5: Post-test only of project and comparison
Project participants: X, P (end of project evaluation)
Comparison group: C (end of project evaluation)
Design #6: Pre+post of project; no comparison
Project participants: P1 (baseline), X, P2 (end of project evaluation)
Design #7: Post-test only of project participants
Project participants: X, P (end of project evaluation)
What else is missing in those examples of evaluation designs?
Assumes one clearly defined, quantifiable 'impact' indicator agreeable to all key stakeholders
Does not examine the process ('X') itself, treating the intervention as a given
Does not recognize multiple actors and changes in external circumstances
Does not construct the logic model to describe the project's cause-effect hypothesis
Step 2: Addressing budget constraints
A. Clarifying client information needs
B. Simplifying the evaluation design
C. Looking for reliable secondary data
D. Reviewing sample size
E. Reducing costs of data collection and analysis
2B, cont. Simplifying the evaluation design
Depending upon the design, some of the options might include:
• Reducing the number of units studied (communities, families, schools)
• Reducing the number of case studies or the duration and complexity of the cases
• Reducing the duration or frequency of observations
2C: Look for reliable secondary sources
Planning studies, project administrative records, government ministries, other NGOs, universities / research institutes, mass media.
2E: Reducing costs of data collection and analysis
Use self-administered questionnaires
Reduce length and complexity of instrument
Use direct observation
Obtain estimates from focus groups and community forums
Key informants
Participatory assessment methods
Multi-methods and triangulation
Step 3: Addressing time constraints
In addition to Step 2 methods:
Reduce time pressures on external consultants:
• Commission preparatory studies
• Use video conferences
Hire more consultants/researchers.
Incorporate outcome indicators in project monitoring systems and documents.
Use technology for data inputting/coding.
Addressing time constraints
Negotiate with the client to discuss questions such as the following:
1. What information is essential and what could be dropped or reduced?
2. How much precision and detail is required for the essential information? E.g. is it necessary to have separate estimates for each geographical region or subgroup, or is a population average acceptable?
3. Is it necessary to analyze all project components and services or only the most important?
4. Is it possible to obtain additional resources (money, staff, computer access, vehicles etc.) to speed up the data collection and analysis process?
Step 4: Addressing data constraints
Lack of baseline data on project population
Lack of comparison groups
Statistical problems with non-equivalent comparison groups
Collecting data on sensitive topics or from inaccessible groups
Ways to reconstruct baseline conditions
Use of secondary data
Project records
Recall
Key informants
PRA (Participatory Rapid Appraisal) or PLA (Participatory Learning and Action) and other participatory techniques, such as timelines and critical incidents, to help establish the chronology of important changes in the community
Recall
Generally not reliable for precise quantitative data
Useful for recalling major events or the impacts of a new service where none existed before
Have you used recall? Give examples.
Improving the validity of recall
Conduct small studies to compare recall with survey or other findings
Triangulation
Link recall to important reference events
Define the context:
• Clarify expectations
• Define time period
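A small validation study of recall can be as simple as asking a subsample about something that is also recorded elsewhere, such as project records or an earlier survey, and then comparing the two. The sketch below is a hypothetical illustration, not RWE guidance. The `recall_check` function, the tolerance and the five respondents' values are assumptions.

```python
from statistics import mean


def recall_check(recalled: list[float], recorded: list[float],
                 tolerance: float) -> dict[str, float]:
    """Compare recalled values with recorded values for the same respondents."""
    if not recalled or len(recalled) != len(recorded):
        raise ValueError("need two non-empty lists of equal length")
    diffs = [r - t for r, t in zip(recalled, recorded)]
    return {
        # Positive bias means respondents over-report on average.
        "mean_bias": mean(diffs),
        "mean_abs_error": mean(abs(d) for d in diffs),
        # Share of respondents whose recall falls within the tolerance.
        "share_within_tolerance": mean(1.0 if abs(d) <= tolerance else 0.0 for d in diffs),
    }


# Hypothetical values for five respondents (recall vs. project records).
result = recall_check(recalled=[12, 8, 15, 10, 20],
                      recorded=[10, 8, 12, 11, 16],
                      tolerance=2)
print(result)
```

A consistent positive or negative bias in such a check can be reported alongside recall-based baseline estimates. It should not be silently "corrected" unless the subsample is large enough to support that.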
Reconstructing comparison groups
Judgmental matching of communities
When project services are introduced in phases, later beneficiaries can be used as a comparison group ("rolling baseline")
Internal controls when different subjects receive different combinations and levels of services
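The rolling baseline idea can be sketched as a simple split of communities by when services reach them. This is a hypothetical illustration: the community names, start years and the `rolling_comparison` function are made up. It also assumes that the order of phasing was not driven by characteristics that themselves affect outcomes.

```python
from dataclasses import dataclass


@dataclass
class Community:
    name: str
    start_year: int  # year project services reach this community (hypothetical)


def rolling_comparison(communities: list[Community],
                       survey_year: int) -> tuple[list[str], list[str]]:
    """Split communities into already-served (project) and not-yet-served (comparison)."""
    project = [c.name for c in communities if c.start_year <= survey_year]
    comparison = [c.name for c in communities if c.start_year > survey_year]
    return project, comparison


phases = [Community("A", 2005), Community("B", 2005),
          Community("C", 2007), Community("D", 2008)]
project, comparison = rolling_comparison(phases, survey_year=2006)
print(project, comparison)  # ['A', 'B'] ['C', 'D']
```

In a later survey round (say 2007), community C would move into the project group. This is why the comparison group "rolls" forward as services expand.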
Problems of non-equivalency of comparison groups
Differences in the characteristics of project and comparison groups make it difficult to assess whether outcome differences are due to the project or to these initial differences
Econometric methods used to control for differences between project and comparison groups in the final evaluation (Model 5) cannot adjust for initial differences between the groups
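A stylised arithmetic sketch shows why a post-test-only comparison cannot separate the project effect from initial differences. It assumes a simple additive model in which both groups share the same underlying trend. All numbers and the `post_only_difference` function are hypothetical.

```python
def post_only_difference(true_effect: float, initial_gap: float,
                         common_trend: float, c1: float) -> float:
    """End-of-project difference P - C under a simple additive model (hypothetical)."""
    p1 = c1 + initial_gap                  # project group starts ahead (or behind)
    c2 = c1 + common_trend                 # comparison group changes with the trend only
    p2 = p1 + common_trend + true_effect   # project group: trend plus project effect
    return p2 - c2                         # equals initial_gap + true_effect


print(post_only_difference(true_effect=10, initial_gap=15, common_trend=5, c1=100))  # 25
```

Here the observed gap of 25 is two and a half times the true effect of 10. Without baseline data, nothing in the end-of-project measurements reveals that 15 of it was there from the start.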
Collecting difficult to obtain information
A. Collecting information on sensitive topics
B. Collecting information on difficult-to-reach groups
Step 5: Addressing political influences on the evaluation design
• Accommodating pressures from funding agencies and clients on the evaluation design
• Addressing stakeholder methodological preferences
• Recognizing the influence of professional research paradigms
Coping with political issues
An evaluator may consider:
• Preliminary negotiation or selection of the work one agrees to perform
• Developing a contract that specifies sufficient resources and the client's obligation to ensure access to crucial or sensitive data
• Considering the public good: thinking in terms of broad stakeholder groups and right-to-know and need-to-know audiences
Coping with political issues
Program managers representing the donor or implementing agency are generally the evaluators' main contacts and clients, what Michael Quinn Patton calls primary users. This group of stakeholders makes program decisions, formalizes procedures and allocates resources. To promote utility and formative use of evaluation for program improvement, some approaches explicitly prioritize providing data and findings that help these stakeholders with programmatic decision-making.
Coping with political issues
Evaluators:
• May need to cite the Program Evaluation Standards: "the full set of evaluation findings along with pertinent limitations are made accessible to the persons affected by the evaluation and any others with expressed legal rights to receive the results"
• Should expect political dimensions and exercise sharp alertness throughout the course of a project.
That's enough of a recap of the introduction to the RWE approach
We'll look at Steps 6 and 7 in more detail later.
But first, let's pause to see if there are any questions so far about Steps 1-5,
and then do some small-group introductory work.
TIME FOR DISCUSSION
Small group introductions
Let's get acquainted and share our own practical experiences:
1. Have you faced these or other constraints in your evaluative practice?
2. If so, how did you handle them?
3. Summarize the common major constraints addressed by members of your group.
Step 6
STRENGTHENING WEAKNESSES IN THE EVALUATION DESIGN
Step 6: Assessing threats to the validity and adequacy of evaluation conclusions
1. Threats to statistical conclusion validity (why inferences about the statistical relationship between the project and its outcomes may be incorrect)
2. Threats to internal validity (why assumptions that the project has caused impacts may be incorrect)
3. Threats to construct validity (why indicators may not adequately describe the constructs in the evaluation model)
4. Threats to external validity (why assumptions about the generalizability of the pilot project may be incorrect)
Rapid evaluation is not an excuse for sloppy methodology
Assess trade-offs relating to:
Sample design and selection
Specification of the evaluation model and constructs
Instrument development
Documentation of the data collection process
Controlling researcher influence over subjects
Rapid evaluation is not an excuse for sloppy methodology
As a result, generalizations are often made from the findings that are not justified.
Results in sloppy research and has discredited rapid, qualitative evaluation in some circles.
This is not just a weakness of qualitative research
Quantitative researchers who work "creatively" with limited data sets can also be criticized for:
Inadequate proxy indicators
Not recognizing the limitations of non-randomly selected comparison groups
No control for effects of history when using cross-sectional ex-post data
Ignoring methods for reconstructing baseline data
RWE quality control goals
Greatest possible methodological rigor within the limitations of a given context
Appropriate standards for different types of evaluation
Identify and control for methodological weaknesses in the evaluation design
The report identifies methodological weaknesses and how they affect generalization to broader populations
Common RWE issues concerning statistical conclusion validity
Sample too small to detect statistically significant effects
Unreliability of measures weakens statistical tests
Restriction of range: a key issue, as most development projects are targeted to the poor
Unreliability of treatment implementation: a common problem
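To judge whether a sample is too small, a rough power calculation helps. The sketch below uses the standard normal-approximation formula for the sample size per group needed to detect a difference between two proportions:

n = (z(1 - alpha/2) + z(power))^2 × [p1(1 - p1) + p2(1 - p2)] / (p1 - p2)^2

This is a generic textbook formula rather than a method from the RWE book. The proportions, significance level and power used here are illustrative assumptions.

```python
import math
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size per group to detect p1 vs p2 with a two-sided test (normal approximation)."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z = NormalDist().inv_cdf                    # standard normal quantile function
    z_alpha = z(1 - alpha / 2)                  # about 1.96 for alpha = 0.05
    z_beta = z(power)                           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


print(n_per_group(0.30, 0.40))  # 354
print(n_per_group(0.30, 0.35))
```

Under these assumptions, detecting a 10-percentage-point difference (30% vs. 40%) needs 354 respondents per group. Halving the detectable difference roughly quadruples the required sample, which is why budget cuts to sample size deserve an explicit check.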
Threats to internal validity
It may be incorrectly assumed that there is a causal relationship between project interventions and observed outputs
Unclear temporal sequence between the project and the observed outcomes
Unreliable measures
Common RWE issues concerning threats to construct validity
1. Inadequate explanation of constructs: definitions are often vague (e.g. well-being, empowerment, improved health)
2. Construct confounding: program interventions, and how they are administered, need to be defined more clearly
3. Reactive self-reports: subjects may have an incentive to misreport or may not have full knowledge
4. Treatment diffusion: comparison groups, or people not involved in the study at all, often receive some benefits, even if indirectly
Parallel criteria for quantitative and qualitative evaluations
Quantitative (post-positivist) criteria / Qualitative criteria
Objectivity / Confirmability
Reliability / Dependability
Internal validity / Credibility
External validity / Transferability
Utilization / Utilization
A. Objectivity/confirmability
Example: inadequate documentation of methods and procedures
Review documentation to ensure it fully explains the methodology and/or provides missing material.
B. Reliability/dependability
Example: The findings do not seem to be true and do not reflect the local context
Possible solutions:
Organize workshops or consult key informants to determine whether the problem concerns:
• missing information (not all groups interviewed)
• factual issues
• how the findings were interpreted
B. Reliability/dependability
Example: data were not collected across the full range of appropriate settings, times, respondents etc.
Study not yet conducted: consider revising the sample design or using qualitative methods to cover the missing settings, times or respondents.
Data collection completed: consider using rapid assessment methods such as focus groups, interviews with key informants, participant observation, etc., to fill in some of the gaps.
C. Internal validity/ credibility/ authenticity
Example: The findings do not seem to be true and do not reflect the local context
If stakeholders question the evaluation findings:
Seek out individuals or groups of appropriately qualified key informants to determine whether there was missing information (e.g. only men interviewed), or whether there were problems in how the evaluators interpreted the data.
D. External validity / transferability / ‘generalizable’
Example: Sample does not permit generalization to other populations
Obtain comparable secondary data on other communities or national statistics
Determine how unique or similar the target communities are in terms of socio-economic and other characteristics
Make a judgment as to whether or not the scope, expense and type of interventions used by the project could feasibly be replicated elsewhere
E. Utilization / application / action orientation
Example: The findings do not provide guidance for future action
If the evaluators have the necessary information: ask them to make their recommendations more explicit and practical.
If the evaluators do not have the information: organize brainstorming sessions with community groups or the implementing agencies to develop more specific recommendations for action.
Mixed methods
Have you noticed how often the recommended solutions for coping with RWE constraints have involved the use of a combination of methods?
Step 7
PROMOTING THE UTILIZATION OF THE EVALUATION’S FINDINGS AND RECOMMENDATIONS
Utilization of evaluation
Were findings useful to clients, researchers and communities studied? Examples:
Were findings intellectually and physically accessible to potential users?
Do the findings provide guidance for future action?
Is there evidence that the evaluation influenced policy decisions?
The RWE Threats to Validity Checklist can be useful for this purpose
For the evaluation team to check on the quality and utility of their work
To help communicate with the clients regarding the methodology and findings, including understanding the credibility and generalizability of the results, and what could be done to strengthen them
When to use the checklist
When the ToR are issued
• To alert consultants to the criteria used to assess their proposals and their work
At the start of the evaluation
• To assess the strengths and weaknesses of the proposed evaluation design
During the implementation of the evaluation
• To identify and correct problems
When the near-final evaluation report is presented to clients
When to use the checklist… continued
When the draft evaluation report is presented
• To identify validity problems when there is still time to make some corrections
When the evaluation is completed
• To assess the credibility of the findings
Who can use the checklist?
Evaluation practitioners
Clients and other users of the evaluation
Funding agencies
How to use the threats to validity checklists
1. The cover sheet
• Use as a summary for management
• Reasons for conducting the evaluation
• Evaluation design used (and why)
• Methodologies used (and why)
• Findings
• Recommended follow-up actions
How to use the checklist (continued)
2. Summary assessment for each component
Use for middle-level management and people who need a little more detail
Summary of findings and recommendations
[If required] overall rating of the methodological soundness of this component [5-point scale]
[If required] how many of the indicators in the corresponding appendix were rated as having problems or serious problems
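If an agency uses the 5-point rating scale, the component summary can be produced mechanically from the detailed ratings. The sketch below assumes a hypothetical coding in which 4 means "problems" and 5 means "serious problems". The RWE checklist may code its scale differently, and the indicator names and ratings are invented.

```python
from collections import Counter

# Hypothetical coding of the 5-point scale; adjust to the checklist actually used.
PROBLEMS = 4
SERIOUS_PROBLEMS = 5


def summarize_component(ratings: dict[str, int]) -> dict[str, int]:
    """Count indicators rated as having problems or serious problems."""
    for indicator, score in ratings.items():
        if not 1 <= score <= 5:
            raise ValueError(f"{indicator!r}: rating {score} is outside the 5-point scale")
    counts = Counter(ratings.values())
    return {
        "indicators_rated": len(ratings),
        "problems": counts[PROBLEMS],
        "serious_problems": counts[SERIOUS_PROBLEMS],
    }


# Invented ratings for one component, for illustration only.
internal_validity = {
    "temporal sequence documented": 2,
    "comparison group equivalence": 4,
    "reliability of measures": 5,
    "implementation documented": 1,
    "external events recorded": 4,
}
print(summarize_component(internal_validity))
```

Agencies that prefer simply to tick problem indicators can keep the same summary by recording only the flagged indicators with the corresponding code.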
How to use the checklist (continued)
3. Detailed checklists for each of the 7 dimensions
Provide a more detailed technical assessment [if required]
General comments on this component [can be as long and detailed as required]
Rating of each indicator on a 5-point scale
Some agencies may prefer just to check the indicators that have problems, rather than using the rating scale
Enough of my presentations: it's time for you (THE RealWorld PEOPLE!) to get involved yourselves.
Small group case study work
1. Some of you are playing the role of evaluation consultants, others are clients coordinating the evaluation.
2. Decide what your group will do to address the given constraints/challenges.
3. Prepare to negotiate the ToR with the other group.
In conclusion:
Evaluators must be prepared to:
1. Enter at a late stage in the project cycle;
2. Work under budget and time restrictions;
3. Not have access to comparative baseline data;
4. Not have access to identified comparison groups;
5. Work with very few well qualified evaluation researchers;
6. Reconcile different evaluation paradigms and information needs of different stakeholders.
Main workshop messages
1. Evaluators must be prepared for real-world evaluation challenges
2. There is considerable experience to draw on
3. A toolkit of rapid and economical "RealWorld" evaluation techniques is available
4. Never use time and budget constraints as an excuse for sloppy evaluation methodology
5. A "threats to validity" checklist helps keep you honest by identifying potential weaknesses in your evaluation design and analysis