RealWorld Evaluation
Designing Evaluations under Budget, Time, Data and Political Constraints
American Evaluation Association professional pre-session workshop
Denver, November 5, 2008
Facilitated by: Michael Bamberger and Jim Rugh

Workshop Objectives
1. The seven steps of the RealWorld Evaluation approach for addressing common issues and constraints faced by evaluators, such as: the evaluator is not called in until the project is nearly completed, and there was no baseline or comparison group; the evaluation must be conducted with an inadequate budget and insufficient time; or there are political pressures and expectations about how the evaluation should be conducted and what the conclusions should say.
2. Identifying and assessing the various design options that could be used in a particular evaluation setting.
3. Ways to reconstruct baseline data when the evaluation does not begin until the project is well advanced or completed.
4. How to identify and address threats to the validity or adequacy of quantitative, qualitative and mixed-methods designs, with reference to the specific context of RealWorld evaluations.

Note: Given time constraints, the workshop will focus on project-level impact evaluations. However, if the results of a pre-workshop survey of participants call for it, a brief introduction to the application of RWE techniques in other forms of evaluation, including the assessment of country programs and policy interventions, could be included.

Workshop Agenda
8:00 – 8:20: Session 1: Introduction
• Workshop objectives
• Feedback from participant survey
• Handout: RealWorld Evaluation Overview (summary chapter of the book)
8:20 – 8:50: Session 2: RealWorld Evaluation overview and addressing the counterfactual
• Handout: "Why evaluators can't sleep at night"
8:50 – 9:20: Session 3: Small group discussions
• Participants will introduce themselves and then share experiences on the types of constraints they have faced when designing and conducting evaluations, and what they did to try to address those constraints.
9:20 – 10:00: Session 4: RWE Steps 1, 2 and 3: Scoping the evaluation and strategies for addressing budget and time constraints
• Presentation and discussion
10:00 – 10:15: BREAK
10:15 – 10:45: Session 5: RWE Step 4: Addressing data constraints
• Presentation and discussion
10:45 – 11:15: Session 6: Mixed methods
• Presentation and discussion
11:15 – 12:00: Session 7: Small groups read their case studies and begin to discuss the learning exercise.
• We will use a low-cost housing case study. All four groups will discuss the same project, but from different perspectives.
12:00 – 1:00: LUNCH
1:00 – 1:45: Session 8: Identifying and addressing threats to the validity of the evaluation design and conclusions
1:45 – 2:30: Session 9: Small groups complete the exercise.
• Negotiate with your paired group how you propose to modify the ToR of your case study.
2:30 – 2:45: Session 10: Feedback from the exercise.
• Discussion of lessons learned from the case study and the RealWorld Evaluation approach in general.
2:45 – 3:00: Session 11: Wrap-up and workshop evaluation

RealWorld Evaluation
Designing Evaluations under Budget, Time, Data and Political Constraints
Session 2.
a: OVERVIEW OF THE RWE APPROACH

RealWorld Evaluation Scenarios
Scenario 1: Evaluator(s) not brought in until near the end of the project. For political, technical or budget reasons:
• There was no baseline survey
• Project implementers did not collect adequate data on project participants at the beginning of, or during, the life of the project
• It is difficult to collect data on comparable control groups

Scenario 2: The evaluation team is called in early in the life of the project. But for budget, political or methodological reasons:
• The "baseline" was a needs assessment, not comparable to the eventual evaluation
• It was not possible to collect baseline data on a comparison group

Reality Check: Real-World Challenges to Evaluation
• All too often, project designers do not think evaluatively; the evaluation is not designed until the end
• There was no baseline, at least not one with data comparable to the evaluation
• There was, or can be, no control/comparison group
• Limited time and resources for the evaluation
• Clients have prior expectations of what the evaluation findings will say
• Many stakeholders do not understand evaluation, distrust the process, or even see it as a threat (a dislike of being judged)

RealWorld Evaluation Quality Control Goals
• Achieve the maximum possible evaluation rigor within the limitations of a given context
• Identify and control for methodological weaknesses in the evaluation design
• Negotiate with clients the trade-offs between desired rigor and available resources
• Presentation of findings must recognize methodological weaknesses and how they affect generalization to broader populations

The Need for the RealWorld Evaluation Approach
As a result of these kinds of constraints, many of the basic principles of impact evaluation design (comparable pre-test/post-test design, comparison group, instrument development and testing, random sample selection, control for researcher bias, thorough documentation of the evaluation methodology, etc.)
are often sacrificed.

The RealWorld Evaluation Approach
An integrated approach to ensure acceptable standards of methodological rigor while operating under real-world budget, time, data and political constraints. See the handout summary chapter extracted from the RealWorld Evaluation book for more details.

The RealWorld Evaluation approach:
• Developed to help evaluation practitioners and clients: managers, funding agencies and external consultants
• A work in progress
• Originally designed for developing countries, but equally applicable in industrialized nations

Special Evaluation Challenges in Developing Countries
• Unavailability of needed data
• Scarce local evaluation resources
• Limited budgets for evaluations
• Institutional and political constraints
• Lack of an evaluation culture
• Many evaluations are designed by, and for, external funding agencies and seldom reflect local and national stakeholder priorities
Despite these challenges, there is a growing demand for methodologically sound evaluations which assess the impacts, sustainability and replicability of development projects and programs.

Most RealWorld Tools are not New; Only the Integrated Approach is New
Most of the RealWorld Evaluation data collection and analysis tools will be familiar to most evaluators. What is new is the integrated approach, which combines a wide range of tools to produce the best quality evaluation under real-world constraints.

Who Uses RealWorld Evaluation and When?
Two main users:
• Evaluation practitioners
• Managers, funding agencies and external consultants
The evaluation may start:
• At the beginning of the project
• After the project is fully operational
• During or near the end of project implementation
• After the project is finished

What is Special About the RealWorld Evaluation Approach?
There is a series of steps, each with checklists for identifying constraints and determining how to address them. These steps are summarized below and then in the more detailed flow-chart. (See page 6 of the handout.)

The Steps of the RealWorld Evaluation Approach
Step 1: Planning and scoping the evaluation
Step 2: Addressing budget constraints
Step 3: Addressing time constraints
Step 4: Addressing data constraints
Step 5: Addressing political constraints
Step 6: Assessing and addressing the strengths and weaknesses of the evaluation design
Step 7: Helping clients use the evaluation

Step 1: Planning and scoping the evaluation
A. Defining client information needs and understanding the political context
B. Defining the program theory model
C. Identifying time, budget, data and political constraints to be addressed by the RWE
D. Selecting the design that best addresses client needs within the RWE constraints

Step 2: Addressing budget constraints
A. Modify the evaluation design
B. Rationalize data needs
C. Look for reliable secondary data
D. Revise the sample design
E. Economical data collection methods

Step 3: Addressing time constraints
All Step 2 tools, plus:
F. Commissioning preparatory studies
G. Hire more resource persons
H. Revising the format of project records to include critical data for impact analysis
I. Modern data collection and analysis technology

Step 4: Addressing data constraints
A. Reconstructing baseline data
B. Recreating comparison groups
C. Working with non-equivalent comparison groups
D. Collecting data on sensitive topics or from difficult-to-reach groups
E. Multiple methods

Step 5: Addressing political constraints
A. Accommodating pressures from funding agencies or clients on evaluation design
B. Addressing stakeholder methodological preferences
C. Recognizing the influence of professional research paradigms
Step 6: Assessing and addressing the strengths and weaknesses of the evaluation design
An integrated checklist for multi-method designs:
A. Objectivity/confirmability
B. Replicability/dependability
C. Internal validity/credibility/authenticity
D. External validity/transferability/fittingness

Step 7: Helping clients use the evaluation
A. Utilization
B. Application
C. Orientation
D. Action

RealWorld Evaluation
Designing Evaluations under Budget, Time, Data and Political Constraints
Session 2.b: The challenge of the counterfactual

Attribution and counterfactuals
How do we know whether the observed changes in the project participants or communities
• income, health, attitudes, school attendance, etc.
are due to the implementation of the project
• credit, water supply, transport vouchers, school construction, etc.
or to other, unrelated factors?
• changes in the economy, demographic movements, other development programs, etc.

The Counterfactual
What would have been the condition of the project population at the time of the evaluation if the project had not taken place?

Where is the counterfactual?
After families had been living in a new housing project for 3 years, a study found average household income had increased by 50%. Does this show that housing is an effective way to raise income?

Comparing the project with two possible comparison groups
[Chart: household income (scale 250, 500, 750) from 2000 to 2002]
• Project group: 50% increase
• Scenario 1: 50% increase in comparison group income; no evidence of project impact
• Scenario 2: no increase in comparison group income; potential evidence of project impact

5 main evaluation strategies for addressing the counterfactual
Randomized designs:
I. True experimental designs
II. Randomized field designs
Quasi-experimental designs:
III. Strong quasi-experimental designs
IV. Weaker quasi-experimental designs
Non-experimental designs:
V.
No logically defensible counterfactual

The best statistical design option in most field settings: randomized or strong quasi-experimental evaluation designs

                 T1 (pre-test)   T2 (treatment [project])   T3 (post-test)
Project group        P1                    X                     P2
Control group        C1                                          C2

• Subjects are randomly assigned to the project and control groups, or the control group is selected using statistical or judgmental matching.
• The conditions of both groups are not controlled during the project.
• Gain score [impact] = (P2 – P1) – (C2 – C1)

Control group and comparison group
• Control group = randomized allocation of subjects to project and non-treatment groups
• Comparison group = separate procedure for sampling project and non-treatment groups

Reference sources for randomized field trial designs
1. MIT Poverty Action Lab www.povertyactionlab.org
2. Center for Global Development, "When will we ever learn?" http://www.cgdev.org/content/publications/detail/7973
3. International Initiative for Impact Evaluation (3ie) http://www.3ieimpact.org/

The limited use of strong evaluation designs
It is estimated that:
• Only 5-10% of impact evaluations use a strong quasi-experimental design
• Significantly less than 5% use randomized control trials

TIME FOR DISCUSSION

Introductory small-group discussions
Introduce yourselves, including something about your experience in coordinating or conducting evaluations. In particular, share experiences on the types of constraints you have faced when designing and conducting evaluations, and what you did to try to address those constraints.

RealWorld Evaluation
Designing Evaluations under Budget, Time, Data and Political Constraints
Session 4, Step #1: PLANNING AND SCOPING THE EVALUATION

Step 1: Planning and Scoping the Evaluation
• Understanding client information needs
• Defining the program theory model
• Preliminary identification of constraints to be addressed by the RealWorld Evaluation
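As a quick aside, the gain-score [impact] calculation described earlier, (P2 – P1) – (C2 – C1), can be sketched in a few lines of Python. The income figures below are the illustrative ones from the housing-project comparison (project group rising from 500 to 750); this is a sketch of the arithmetic only, not of a full evaluation design:

```python
def gain_score(p1, p2, c1, c2):
    """Double-difference impact estimate: (P2 - P1) - (C2 - C1)."""
    return (p2 - p1) - (c2 - c1)

# Scenario 1: comparison group income also rose from 500 to 750.
# The project group's gain is fully matched, so no evidence of impact.
print(gain_score(500, 750, 500, 750))  # 0

# Scenario 2: comparison group income unchanged (500 -> 500).
# The gain of 250 is potential evidence of project impact.
print(gain_score(500, 750, 500, 500))  # 250
```

The same income change in the project group thus supports opposite conclusions depending on what happened to the comparison group, which is the point of the counterfactual discussion above.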
A. Understanding client information needs
Typical questions clients want answered:
• Is the project achieving its objectives?
• Are all sectors of the target population benefiting?
• Are the results sustainable?
• Which contextual factors determine the degree of success or failure?
A full understanding of client information needs can often reduce the types of information collected and the level of detail and rigor necessary. However, this understanding could also increase the amount of information required!

B. Defining the program theory model
All programs are based on a set of assumptions (hypotheses) about how the project's interventions should lead to the desired outcomes. Sometimes this is clearly spelled out in project documents. Sometimes it is only implicit, and the evaluator needs to help stakeholders articulate the hypotheses through a logic model.
Defining and testing critical assumptions are essential (but often ignored) elements of program theory models. The following is an example of a model to assess the impacts of microcredit on women's social and economic empowerment.

Critical Hypotheses for a Gender-Inclusive Micro-Credit Program
Outputs
• If credit is available, women will be willing and able to obtain loans and technical assistance.
Short-term outcomes
• If women obtain loans, they will start income-generating activities.
• Women will be able to control the use of loans and reimburse them.
Medium/long-term impacts
• The economic and social welfare of women and their families will improve.
• Increased women's economic and social empowerment.
Sustainability
• Structural changes will lead to long-term impacts.

C.
Determining the appropriate (and feasible) evaluation design
Based on an understanding of client information needs, the required level of rigor, and what is possible given the constraints, the evaluator and client need to determine what evaluation design is required and possible under the circumstances.

Let's focus for a while on evaluation design (a quick review):
1. Review different evaluation (experimental/research) designs
2. Develop criteria for determining an appropriate Terms of Reference (ToR) for evaluating a project, given its own (planned or unplanned) evaluation design
3. Defining levels of rigor
4. A life-of-project evaluation design perspective

An introduction to various evaluation designs
[Chart illustrating the need for a quasi-experimental longitudinal time-series evaluation design: the scale of a major impact indicator for project participants vs. a comparison group, from baseline through end-of-project and post-project evaluations]

OK, let's stop the action to identify each of the major types of evaluation (research) design, one at a time, beginning with the most rigorous design.

First of all, the key to the traditional symbols:
X = Intervention (treatment), i.e. what the project does in a community
O = Observation event (e.g. baseline, mid-term evaluation, end-of-project evaluation)
P (top row) = Project participants
C (bottom row) = Comparison (control) group
Note: the RWE evaluation designs are laid out in Table 3 on page 46 of your handout.

Design #1: Longitudinal Quasi-experimental
Project participants:  P1   X   P2   X   P3   P4
Comparison group:      C1       C2       C3   C4
(baseline, midterm, end-of-project and post-project evaluations)

Design #1+: Longitudinal Randomized Control Trial
Project participants:  P1   X   P2   X   P3   P4
Control group:         C1       C2       C3   C4
(Research subjects randomly assigned either to project or control group; baseline, midterm, end-of-project and post-project evaluations)

Design #2: Randomized Control Trial
Project participants:  P1   X   P2
Control group:         C1       C2
(Research subjects randomly assigned either to project or control group; baseline and end-of-project evaluation)

Design #3: Quasi-experimental (pre + post, with comparison)
Project participants:  P1   X   P2
Comparison group:      C1       C2
(baseline and end-of-project evaluation)

Design #7: Truncated Longitudinal
Project participants:  X   P1   X   P2
Comparison group:          C1       C2
(midterm and end-of-project evaluation)

Design #8: Pre + post of project; post-only comparison
Project participants:  P1   X   P2
Comparison group:               C
(baseline and end-of-project evaluation)

Design #9: Post-test only of project and comparison
Project participants:  X   P
Comparison group:          C
(end-of-project evaluation)

Design #10: Pre + post of project; no comparison
Project participants:  P1   X   P2
(baseline and end-of-project evaluation)

Design #11: Post-test only of project participants
Project participants:  X   P
(end-of-project evaluation)

Some of the questions to consider as you customize an evaluation Terms of Reference (ToR):
1. Who asked for the evaluation? (Who are the key stakeholders?)
2. What are the key questions to be answered?
3. Will this be a formative or summative evaluation?
4. Will there be a next phase, or other projects designed based on the findings of this evaluation?
5. What decisions will be made in response to the findings of this evaluation?
6. What is the appropriate level of rigor?
7. What is the scope/scale of the evaluation/evaluand (the thing to be evaluated)?
8. How much time will be needed/available?
9. What financial resources are needed/available?
10. Should the evaluation rely mainly on quantitative or qualitative methods?
11. Should participatory methods be used?
12. Can/should there be a household survey?
13.
Who should be interviewed?
14. Who should be involved in planning/implementing the evaluation?
15. What are the most appropriate media for communicating the findings to different stakeholder audiences?

Evaluation (research) design? Key questions? Evaluand (what to evaluate)? Scope? Appropriate level of rigor? Qualitative or quantitative? Participatory or extractive? Resources available? Time available? Skills available? Evaluation FOR whom?
Does this help, or just confuse things more? Who said evaluations (like life) would be easy?!

TIME FOR DISCUSSION

Now, where were we? Oh, yes, we're ready for Steps 2 and 3 of the RealWorld Evaluation Approach. Let's continue…

RealWorld Evaluation
Designing Evaluations under Budget, Time, Data and Political Constraints
Steps 2 + 3: ADDRESSING BUDGET AND TIME CONSTRAINTS

Step 2: Addressing budget constraints
A. Clarifying client information needs
B. Simplifying the evaluation design
C. Look for reliable secondary data
D. Review sample size
E. Reducing the costs of data collection and analysis

2A: Simplifying the evaluation design
For quantitative evaluations it is possible to select among the most common evaluation designs (noting the trade-offs when using a simpler design). For qualitative evaluations the options will vary depending on the type of design. Depending upon the design, some of the options might include:
• Reducing the number of units studied (communities, families, schools)
• Reducing the number of case studies, or the duration and complexity of the cases
• Reducing the duration or frequency of observations

2B: Rationalize data needs
Use information from Step 1 to identify client information needs. Review all data collection instruments and cut out any questions not directly related to the objectives of the evaluation.

2C:
Look for reliable secondary sources
Planning studies, project administrative records, government ministries, other NGOs, universities/research institutes, mass media.
Assess the relevance and reliability of sources for the evaluation with respect to:
• Coverage of the target population
• Time period
• Relevance of the information collected
• Reliability and completeness of the data
• Potential biases

2D: Seeking ways to reduce sample size
Accepting a lower level of precision significantly reduces the required number of interviews:
• To test for a 5% change in proportions requires a maximum sample of 1,086
• To test for a 10% change in proportions requires a maximum sample of up to 270
(Required sample size scales roughly with the inverse square of the smallest difference to be detected, so relaxing the detectable change from 5% to 10% cuts the sample to about a quarter.)

2E: Reducing the costs of data collection and analysis
• Use self-administered questionnaires
• Reduce the length and complexity of the instrument
• Use direct observation
• Obtain estimates from focus groups and community forums
• Key informants
• Participatory assessment methods
• Multi-methods and triangulation

Step 3: Addressing time constraints
In addition to the Step 2 methods:
• Reduce time pressures on external consultants (commission preparatory studies; hold video conferences)
• Hire more consultants/researchers
• Incorporate outcome indicators in project monitoring systems and documents
• Use technology for data inputting/coding

It is important to distinguish between approaches that reduce:
a) the duration in terms of time over the life of the project (e.g. from baseline to final evaluation over 5 years);
b) the duration in terms of the time needed to undertake the actual evaluation study or studies (e.g. 6 weeks, whether completed in an intensive consecutive 6 weeks or in a cumulative total of 6 weeks periodically over the course of a year); and
c) the level of effort (person-days, i.e. number of staff x total days required).

Negotiate with the client to discuss questions such as the following:
1.
What information is essential, and what could be dropped or reduced?
2. How much precision and detail is required for the essential information? E.g. is it necessary to have separate estimates for each geographical region or sub-group, or is a population average acceptable?
3. Is it necessary to analyze all project components and services, or only the most important?
4. Is it possible to obtain additional resources (money, staff, computer access, vehicles, etc.) to speed up the data collection and analysis process?

TIME FOR DISCUSSION

RealWorld Evaluation
Designing Evaluations under Budget, Time, Data and Political Constraints
Session 5: Addressing data constraints (Step 4)

[Flow-chart: Step 1, Planning and scoping the evaluation; Steps 2-5, addressing budget, time, data and political constraints; Step 6, assessing the strengths and weaknesses of the evaluation design; Step 7, strengthening the evaluation design; with Step 4, addressing data constraints, highlighted]

Step 4: Addressing data constraints
A. Reconstructing baseline data
B. Special challenges in working with comparison groups
C. Collecting data on sensitive topics
D. Collecting data on difficult-to-reach groups

Two kinds of data constraints:
1. Reconstructing baseline data
2. Special data issues for comparison groups

1. Reconstructing baseline conditions for project and comparison groups [see Table 10, p. 59]
The importance of baseline data: it is hard to assess change without data on pre-project conditions. Post-test comparisons do not fully address:
• Selection bias: initial differences between participants and non-participants (propensity score matching and instrumental variables only partially address this)
• Historical factors influencing outcomes that were assumed to have been caused by the project intervention

Ways to reconstruct baseline conditions:
A. Secondary data
B. Project records
C. Recall
D. Key informants
E.
PRA and other participatory techniques, such as timelines and critical incidents, to help establish the chronology of important changes in the community

1-A. Assessing the utility of potential secondary data
• Reference period
• Population coverage
• Inclusion of required indicators
• Completeness
• Accuracy
• Freedom from bias

1-A. Using secondary data to reconstruct baselines
• Census
• Surveys
• Project administrative data
• Agency reports
• Special studies by NGOs and donors
• University studies
• Mass media (newspapers, radio, TV)
• Community organization records
• Notices in offices, community centers, etc.
• Posters
• Birth/death records
• Wills and documents concerning property
• Private sector data

1-B. Using project records
Types of data:
• Feasibility/planning studies
• Application/registration forms
• Supervision reports
• MIS data
• Meeting reports
• Community and agency meeting minutes
• Progress reports
• Construction costs

1-B. Assessing the reliability of project records
• Who collected the data, and for what purpose?
• Were they collected for record-keeping, or to influence policymakers or other groups?
• Do monitoring data refer only to project activities, or do they also cover changes in outcomes?
• Were the data intended exclusively for internal use? For use by a restricted group? Or for public use?
• How accurate and complete are the data? Are there obvious gaps? Were these intentional, or due to poor record-keeping?
• Are there potential biases with respect to the key indicators required for the impact evaluation?

1-B. Working with the client to improve the utility of project data for evaluation
• Collect additional information on applicants or participants
• Ensure identification data are included and accurate
• Ensure data are organized in the way needed for the evaluation (by community, types of service, or family, rather than just by individuals, economic level, etc.)

1-C.
Using recall to reconstruct baseline data
• School attendance and the time/cost of travel
• Sickness and use of health facilities
• Income and expenditures
• Community/individual knowledge and skills
• Social cohesion/conflict
• Water usage/quality/cost
• Periods of stress
• Travel patterns

1-C. Where knowledge about recall is greatest
Areas where most research has been done on the validity of recall:
• Income and expenditure surveys
• Demographic data and fertility behavior
Types of questions:
• Yes/no; fact
• Scaled
• Easily related to major events

1-C. Limitations of recall
• Generally not reliable for precise quantitative data
• Sample selection bias
• Deliberate or unintentional distortion
• Few empirical studies (except on expenditure) to help adjust estimates

1-C. Sources of bias in recall
• Who provides the information
• Under-estimation of small and routine expenditures
• "Telescoping" of recall concerning major expenditures
• Distortion to conform to accepted behavior (intentional; romanticizing the past)
• Contextual factors: the time intervals used in the question; respondents' expectations of what the interviewer wants to know
• Implications for the interview protocol

1-C. Improving the validity of recall
• Conduct small studies to compare recall with survey or other findings
• Ensure all groups are interviewed
• Triangulation
• Link recall to important reference events (elections; droughts/floods; construction of a road, school, etc.)

1-D. Key informants
Not just officials and high-status people. Everyone can be a key informant on their own situation:
• Single mothers
• Factory workers
• Users of public transport
• Sex workers
• Street children

1-D. Guidelines for key-informant analysis
• Triangulation greatly enhances validity and understanding
• Include informants with different experiences and perspectives
• Understand how each informant fits into the picture
• Employ multiple rounds if necessary
• Carefully manage ethical issues

1-E.
PRA and related participatory techniques
PRA techniques collect data at the group or community (rather than individual) level. They can either seek to identify consensus or to identify different perspectives. Risk of bias:
• Only certain sectors of the community attend
• Certain people dominate the discussion

1-E. Time-related PRA techniques useful for reconstructing the past
• Time line
• Trend analysis
• Historical transect
• Seasonal diagram
• Daily activity schedule
• Participatory genealogy
• Dream map
• Critical incidents

1-E. Using PRA recall methods: seasonal calendars
[Example: "Seasonal Calendar of Poverty Drawn by Villagers in Nyamira, Kenya", charting the month-by-month (January-December) intensity of light meals, begging, migration, unemployment, income, disease and rainfall. Source: Rietbergen-McCracken and Narayan 1997]

1-F. Issues in baseline reconstruction
• Variations in the reliability of recall
• Memory distortion
• Secondary data not easy to use
• Secondary data incomplete or unreliable
• Key informants may distort the past

2. Reconstructing comparison (control) groups
Ways to reconstruct control groups:
• Judgmental matching of communities
• When there is phased introduction of project services, beneficiaries entering in later phases can be used as a "pipeline" control group
• Internal controls, when different subjects receive different combinations and levels of services

2. Using propensity scores to strengthen comparison groups
Propensity score matching. Rapid assessment studies can compare characteristics of project and control groups using:
• Observation
• Key informants
• Focus groups
• Secondary data
• Aerial photos and GIS data
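The matching step behind propensity-score matching can be illustrated with a small, self-contained Python sketch. The propensity scores and outcomes below are entirely hypothetical (in practice the scores would be estimated from a logit model of project participation, and the nearest-neighbor count is a judgment call, often around 5):

```python
def psm_impact(participants, comparisons, k=5):
    """Mean gain score: each participant's outcome minus the mean outcome
    of its k nearest comparison units by propensity score.
    Each unit is a (propensity_score, outcome) pair."""
    gains = []
    for score, outcome in participants:
        # "Nearest neighbors" = comparison units with the closest scores.
        nearest = sorted(comparisons, key=lambda u: abs(u[0] - score))[:k]
        counterfactual = sum(u[1] for u in nearest) / len(nearest)
        gains.append(outcome - counterfactual)
    return sum(gains) / len(gains)

# Hypothetical (propensity score, outcome) data.
project = [(0.8, 760), (0.6, 700), (0.7, 730)]
comparison = [(0.75, 520), (0.65, 500), (0.55, 490), (0.9, 530), (0.3, 450)]
print(round(psm_impact(project, comparison, k=3), 1))  # 222.2
```

Each participant is compared only with the comparison units most similar to it in estimated participation probability, which is what makes the comparison group more defensible than a simple unmatched average.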
2. Using propensity scores to strengthen comparison groups (cont.)
• Run a logistic regression (logit) on the project and comparison populations to identify the determinants of project participation
• Select the "nearest neighbors" (usually around 5) from the comparison group who most closely match each participant
• Project impact = gain score = the difference between the project participant's score and the mean score of the nearest neighbors

Issues in reconstructing control groups
• Project areas are often selected purposively and are difficult to match
• Differences between project and control groups make it difficult to assess whether outcomes are due to the project or to these initial differences
• Lack of good data to select control groups
• Contamination
• Econometric methods cannot fully adjust for initial differences between the groups [unobservables]

References
Bamberger, M., Rugh, J. and Mabry, L. (2006). RealWorld Evaluation. Chapter 5.
Kumar, S. (2002). Methods for Community Participation: A Complete Guide for Practitioners.
Patton, M.Q. (2002). Qualitative Research and Evaluation Methods. Chapters 6 and 7.
Roche, C. (1999). Impact Assessment for Development Agencies. Chapter 5.

Pause for DISCUSSION

RealWorld Evaluation
Designing Evaluations under Budget, Time, Data and Political Constraints
Session 6: Mixed-method evaluations

It should NOT be a fight between pure QUALITATIVE (verbiage alone) and pure QUANTITATIVE (numbers alone).
Quantoid: "Your human interest story sounds nice, but let me show you the statistics."
Qualoid: "Your numbers look impressive, but let me tell you the human interest story."
What's needed is the right combination of BOTH QUALITATIVE methods AND QUANTITATIVE methods.

I. Mixed Method Designs
1. Quantitative data collection methods
• Structured surveys (household, farm, transport usage, etc.)
• Structured observation
• Anthropometric methods
• Aptitude and behavioral tests

1.
Quantitative data collection methods: strengths and weaknesses
Strengths:
• Generalization; statistically representative
• Estimate the magnitude and distribution of impacts
• Clear documentation of methods
• Standardized approach
• Statistical control of bias and external factors
Weaknesses:
• Surveys cannot capture many types of information
• Do not work for difficult-to-reach groups
• No analysis of context
• The survey situation may alienate respondents
• Long delay in obtaining results
• Data reduction loses information

2. Qualitative data collection methods
Interviewing:
• Structured
• Semi-structured
• Unstructured
• Focus groups
• Community interviews
• PRA
• Audio recording
Observation:
• Participant observation
• Structured observation
• Unstructured observation
• Photography and video recording
Analysis of documents and artifacts:
• Project documents
• Published reports
• E-mail
• Legal documents: birth and death certificates, property transfer documents, marriage certificates
• Posters
• Decorations in the house
• Clothing and gang insignia

2. Qualitative data collection methods: characteristics
• The researcher's perspective is an integral part of what is recorded about the social world
• Scientific detachment is not possible
• The meanings given to social phenomena and situations must be understood
• Programs cannot be studied independently of their context
• Cause and effect cannot be defined, and change must be studied holistically

2. Qualitative data collection methods: strengths and weaknesses
Strengths:
• Flexible, able to evolve
• Sampling focuses on high-value subjects
• Holistic focus ("the big picture")
• Multiple perspectives
• Multiple sources provide complex understanding
• Narrative more accessible to non-specialists
• Triangulation strengthens the validity of findings
Weaknesses:
• Lack of clear design may frustrate clients
• Lack of generalizability
• Hard to reach consensus across multiple perspectives
• Individual factors not isolated
• Interpretive methods appear too subjective

3.
3. Mixed method evaluation designs
• Combine the strengths of both QUANT and QUAL approaches.
• One approach (QUANT or QUAL) is often dominant, and the other complements it.
• The two approaches can be given equal weight, but this is harder to design and manage.
• The approaches can be used sequentially or concurrently.

Determining the appropriate precision and mix of multiple methods
[Figure: methods positioned on a two-dimensional continuum. One axis runs from participatory/qualitative to extractive/quantitative; the other from "high rigor, high quality, more time & expense" to "low rigor, questionable quality, quick and cheap". Focus groups, key informant interviews, household surveys, nutritional measurements and large-group methods can each be conducted at a higher or lower point on the rigor axis.]

3. Mixed method evaluation designs: how quantitative and qualitative methods complement each other

A. Broaden the conceptual framework
• Combining theories from different disciplines.
• Exploratory QUAL studies can help define the framework.

B. Combine generalizability with depth and context
• Random subject selection ensures representativity and generalizability.
• Case studies, focus groups, etc. can help in understanding the characteristics of the different groups selected in the sample.

C. Permit access to difficult-to-reach groups [QUAL]
• PRA, focus groups, case studies, snowball samples, etc. can be effective ways to reach women, ethnic minorities and other vulnerable groups.
• Direct observation can provide information on groups that are difficult to interview, for example the informal sector and illegal economic activities.

D. Enable process analysis [QUAL]
• Observation, focus groups and informal conversations are more effective for understanding group processes, interactions between people and public agencies, and how organizations work.
E. Analysis of and control for underlying structural factors [QUANT]
• Sampling and statistical analysis can help avoid misleading conclusions.
• Propensity scores and multivariate analysis can statistically control for differences between project and control groups.

Example:
• Meetings with women may suggest gender biases in local firms' hiring practices; however,
• using statistical analysis to control for years of education or experience may show there are no differences in hiring policies for workers with comparable qualifications.

Example:
• Participants who volunteer to attend a focus group may be strongly in favor of or opposed to a certain project, but
• a rapid sample survey may show that most community residents hold different views.

F. Triangulation and consistency checks
• Direct observation may identify inconsistencies in interview responses.
Examples:
• A family may say they are poor, but observation shows they have new furniture, good clothes, etc.
• A woman may say she has no source of income, but an early-morning visit may show she operates an illegal beer-brewing business.

G. Broaden the interpretation of findings
• Combining personal experience with "social facts".
• Statistical analysis frequently produces unexpected or interesting findings that cannot be explained through the statistics alone; rapid follow-up visits may help explain them.
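The hiring-practices example earlier in this session can be illustrated with a small simulation (all data and coefficients here are invented for illustration): in a synthetic labor market where pay depends only on schooling, a raw comparison shows a large gender gap in wages, but a regression that controls for years of education shows essentially none.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
female = rng.integers(0, 2, n).astype(float)   # 1 = woman
# Invented structural difference: women average two fewer years of schooling.
educ = 12 - 2 * female + rng.normal(0, 2, n)
# Pay depends on education only -- no direct gender effect in this simulation.
wage = 5 + 1.5 * educ + rng.normal(0, 3, n)

# Raw comparison: women appear to earn several units less on average.
raw_gap = wage[female == 1].mean() - wage[female == 0].mean()

# Multivariate control: regress wage on gender AND education (OLS via lstsq).
X = np.column_stack([np.ones(n), female, educ])
beta, *_ = np.linalg.lstsq(X, wage, rcond=None)
adjusted_gap = beta[1]   # gender coefficient after controlling for education

print(f"raw gap: {raw_gap:.2f}, gap controlling for education: {adjusted_gap:.2f}")
```

The raw gap here reflects the structural difference in schooling, not the hiring policy itself, which is exactly the distinction the QUANT analysis is meant to draw out alongside the QUAL findings from the meetings.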
G. Broaden the interpretation of findings (cont.): an example
• A QUANT survey of community water management in Indonesia found that, with only one exception, all village water supplies were managed by women.
• Follow-up visits found that in the one exceptional village the women managed a very profitable dairy-farming business, so the men were willing to manage the water supply to free the women's time to produce and sell dairy produce.
Source: Brown (2000)

Using qualitative methods to improve the evaluation design and results
• Use recall to reconstruct the pre-test situation.
• Interviews with key informants to identify other changes in the community or in gender relations.
• Interviews or focus groups with women and men to:
  • assess the effect of loans on gender relations within the household, such as changes in control of resources and decision-making;
  • identify other important results or unintended consequences, such as an increase in women's workload or an increase in the incidence of gender-based or domestic violence.

Enough of our presentations: it's time for you (THE RealWorld PEOPLE!) to get involved

Small group case study work
1. Some of you are playing the role of evaluation consultants; others are clients coordinating the evaluation.
2. Decide what your group will do to address the given constraints/challenges.
3. Prepare to negotiate the ToR with the other group after lunch.

RealWorld Evaluation: Designing Evaluations under Budget, Time, Data and Political Constraints
Session 8: Identifying and addressing threats to the validity of the evaluation design and conclusions

The RealWorld Evaluation [RWE] Approach
Step 1: Planning and scoping the evaluation
Step 2: Addressing budget constraints
Step 3: Addressing time constraints
Step 4: Addressing data constraints
Step 5: Addressing political influences
Step 6: Strengthening the evaluation design and the validity of the conclusions
Step 7: Helping clients use the evaluation
A. Identifying threats to the validity of quasi-experimental designs
B. Assessing the adequacy of qualitative designs
C. An integrated checklist for mixed-method designs
D. Addressing threats to quantitative evaluation designs
E. Addressing threats to the adequacy of qualitative designs
F. Addressing threats to mixed-method designs

Session outline
1. What is validity and why does it matter?
2. General guidelines for assessing validity
3. Additional threats to validity for quantitative evaluation designs
4. Strategies for addressing threats to validity

1. What is validity and why does it matter?

Defining validity: the degree to which the evaluation findings and recommendations are supported by:
• the conceptual framework describing how the project is supposed to achieve its objectives;
• statistical techniques (including the sample design);
• how the project and the evaluation were implemented;
• the similarities between the project population and the wider population to which the findings are generalized.

The importance of validity: evaluations provide recommendations for future decisions and action. If the findings and interpretation are not valid:
• programs which do not work may be continued or even expanded;
• good programs may be discontinued;
• priority target groups may not gain access or benefit.

RWE quality control goals
• The evaluator must achieve the greatest possible methodological rigor within the limitations of a given context.
• Standards must be appropriate for different types of evaluation.
• The evaluator must identify and control for methodological weaknesses in the evaluation design.
• The evaluation report must identify methodological weaknesses and how these affect generalization to broader populations.

2. General guidelines for assessing the validity of all evaluation designs [see Overview Handbook Appendix 1]
A. Confirmability
B. Reliability
C. Credibility
D. Transferability
E. Utilization
A. Confirmability
Are the conclusions drawn from the available evidence, and is the research relatively free of researcher bias?
Examples:
• A-1: Inadequate documentation of methods and procedures.
• A-2: Are data presented to support the conclusions, and are the conclusions consistent with the findings? [Compare the executive summary with the data in the main report.]

B. Reliability
Is the process of the study consistent and reasonably stable over time and across researchers and methods?
Examples:
• B-2: Data were collected only from people who attended focus groups or community meetings.
• B-4: Were coding and quality checks made, and did they show agreement?

C. Credibility
Are the findings credible to the people studied and to readers? Is there an authentic picture of what is being studied?
Examples:
• C-1: Is there sufficient information to provide a credible description of the subjects or situations studied?
• C-3: Was triangulation among methods and data sources systematically applied? Were the findings generally consistent? What happened when they were not?

D. Transferability
Do the conclusions fit other contexts, and how widely can they be generalized?
Examples:
• D-1: Are the characteristics of the sample described in enough detail to permit comparisons with other samples?
• D-4: Does the report present enough detail for readers to assess potential transferability?

E. Utilization
Were the findings useful to clients, researchers and the communities studied?
Examples:
• E-1: Were the findings intellectually and physically accessible to potential users?
• E-3: Do the findings provide guidance for future action?

3. Additional threats to validity for quasi-experimental designs [QED] [see Overview Handbook Appendix 1]
F. Threats to statistical conclusion validity: why inferences about the statistical association between two variables (for example, a project intervention and an outcome) may not be valid
G. Threats to internal validity: why assumptions that project interventions have caused observed outcomes may not be valid
H. Threats to construct validity: why selected indicators may not adequately describe the constructs and causal linkages in the evaluation model
I. Threats to external validity: why assumptions about the potential replicability of a project in other locations or with other groups may not be valid

F. Statistical conclusion validity
The statistical design and analysis may incorrectly assume that program interventions have contributed to the observed outputs:
• the wrong tests are used, or they are applied or interpreted incorrectly;
• problems with the sample design;
• measurement errors.

G. Threats to internal validity
It may be incorrectly assumed that there is a causal relationship between project interventions and observed outputs:
• unclear temporal sequence between the project and the observed outcomes;
• need to control for external factors;
• effects of time;
• unreliable measures.

Example of a threat to internal validity
The assumed causal model:
• Women join the village bank, where they receive loans, learn skills and gain self-confidence,
• which increases women's income
• and increases women's control over household resources.

An alternative causal model:
• Some women had previously taken literacy training, which increased their self-confidence and work skills.
• Women who had taken literacy training were more likely to join the village bank.
• Their literacy and self-confidence made them more effective entrepreneurs.
• Women's income and control over household resources increased as a combined result of literacy, self-confidence and loans.

H. Threats to construct validity
The indicators of outputs, impacts and contextual variables may not adequately describe and measure the constructs [hypotheses/concepts] on which the program theory is based.
• Indicators may not adequately measure key concepts.
• The program theory model and the interactions between the stages of the model may not be adequately specified.
• Reactions to the experimental context are not well understood.

I. Threats to external validity
Assumptions about how the findings could be generalized to other contexts may not be valid:
• some important characteristics of the project context may not be understood;
• important characteristics of the project participants may not be understood;
• seasonal and other cyclical effects may have been overlooked.

RealWorld Evaluation book
• Appendix 2 gives a worksheet for assessing the quality and validity of an evaluation design.
• Appendix 3 provides worked examples.

4. Addressing generic threats to validity for all evaluation designs

A. Confirmability
Example threat A-1: inadequate documentation of methods and procedures.
Possible ways to address:
• Request the researchers to revise their documentation to explain their methodology more fully or to provide the missing material.
• Use rapid data collection methods (surveys, desk research, secondary data) to fill the gaps.

B. Reliability
Example threat B-4: data were not collected across the full range of appropriate settings, times, respondents, etc.
Possible ways to address:
• If the study has not yet been conducted, revise the sample design or use qualitative methods to cover the missing settings, times or respondents.
• If data collection has already been completed, consider using rapid assessment methods such as focus groups, interviews with key informants and participant observation to fill in some of the gaps.

C. Credibility
Example threat C-2: the account does not ring true and does not reflect the local context.
Possible ways to address:
• If the study has not yet been conducted, revise the sample design or use qualitative methods to cover the missing settings, times or respondents.
• If data collection has already been completed, consider using rapid assessment methods such as focus groups, interviews with key informants and participant observation to fill in some of the gaps.

D. Transferability
Example threat D-3: the sample does not permit generalization to other populations.
Possible ways to address:
• Organize workshops or consult key informants to assess whether the problems concern missing information, factual issues or how the material was interpreted by the evaluator.
• Return to the field to fill in the gaps, or include the impressions of key informants, focus group participants or participant observers to provide different perspectives.

E. Utilization
Example threat E-2: the findings do not provide guidance for future action.
Possible ways to address:
• If the researchers have the necessary information, ask them to make their recommendations more explicit.
• If they do not have the information, organize brainstorming sessions with community groups or the implementing agencies to develop more specific recommendations for action.

Lightning feedback
• What are some of the most serious threats to validity affecting your evaluations?
• How can they be addressed?

Time for more discussion

Small group case study work, cont.
1. Evaluation "consultants" meet with the "clients" working on the same case study (1A+1B) and (2A+2B).
2. Negotiate your proposed modifications of the ToR in order to cope with the given constraints.
3. Be prepared to summarize lessons learned from this exercise (and the workshop).

In conclusion: evaluators must be prepared to:
1. enter at a late stage in the project cycle;
2. work under budget and time restrictions;
3. work without access to comparative baseline data;
4. work without access to identified comparison groups;
5. work with very few well-qualified evaluation researchers;
6. reconcile the different evaluation paradigms and information needs of different stakeholders.

Main workshop messages
1. Evaluators must be prepared for real-world evaluation challenges.
2. There is considerable experience to draw on.
3. A toolkit of rapid and economical "RealWorld" evaluation techniques is available.
4. Never use time and budget constraints as an excuse for sloppy evaluation methodology.
5. A "threats to validity" checklist helps keep you honest by identifying potential weaknesses in your evaluation design and analysis.