Why Use Randomized Evaluation

Shawn Cole, Harvard Business School and J-PAL


       DIME-FPD
       Dakar, Senegal
       February 1

       AADAPT Workshop South Asia
       Goa, December 17-21, 2009
Fundamental Question

 What is the effect of a program or
  intervention?
   Does microfinance reduce poverty?
   Does streamlining business registration
    encourage entrepreneurship?
   Does auditing reduce tax evasion?
Explaining to grandparents
 Nicholas Kristof, New York Times columnist (11/20/2009)
    “One of the challenges with the empirical approach is that aid
     organizations typically claim that every project succeeds.
     Failures are buried so as not to discourage donors, and
     evaluations are often done by the organizations themselves —
     ensuring that every intervention is above average. Yet recently
     there has been a revolution in evaluation, led by economists at
     Poverty Action Lab at MIT.
    The idea is to introduce new aid initiatives randomly in some
     areas and not in others [or to some people and not to others],
     and to measure how much change occurred and at what cost.
     This approach is expensive but gives a much clearer sense of
     which interventions are most cost-effective.”
Objective


 To identify the causal effect of an
  intervention
    Identify the impact of the program
 Need to find out what would have happened
  without the program
    Cannot observe the same person with and
     without the program at the same point in time
Correlation is not causation
     Question: Does providing credit increase firm profits?

     Suppose we observe that firms with more credit also
     earn higher profits.

1)    Credit use -> Higher profits

OR

2)    Business skills -> Higher profits?
      Business skills -> Credit?
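
The second story can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not data from any real program: in this toy model credit has no effect on profits at all, and unobserved business skills drive both credit use and profits. The function name, coefficients, and sample size are all hypothetical.

```python
import random
from statistics import mean

def simulate_firms(n=10_000, seed=0):
    """Simulate firms where business skills drive both credit use and profits.

    Credit has NO causal effect on profits in this toy model (an assumption
    made purely for illustration).
    """
    rng = random.Random(seed)
    firms = []
    for _ in range(n):
        skills = rng.gauss(0, 1)                      # unobserved business skills
        has_credit = skills + rng.gauss(0, 1) > 0     # skilled owners use credit more often
        profits = 10 + 2 * skills + rng.gauss(0, 1)   # profits depend on skills only
        firms.append((has_credit, profits))
    return firms

firms = simulate_firms()
with_credit = mean(p for c, p in firms if c)
without_credit = mean(p for c, p in firms if not c)
naive_gap = with_credit - without_credit
print(f"Naive gap in profits (credit vs. no credit): {naive_gap:.2f}")
```

Firms with credit earn clearly higher profits in the simulated data even though credit does nothing, which is exactly the pattern a naive comparison would misread as impact.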
Illustration: Credit Program
(Before-After)
[Figure: gross operating margin of the treatment group, 2007 vs. 2009 (vertical axis from 0 to 14)]

 (+6) increase in gross operating margin

 A credit program was offered in 2008.
 Why did operating margin increase?
Motivation
 Hard to distinguish causation from correlation by analyzing
  existing (retrospective) data
    However complex, statistics can only see that X moves with Y
    Hard to correct for unobserved characteristics, like motivation/ability
    These may be very important: they also affect outcomes of interest
 Selection bias is a major issue for impact evaluation
    Projects are started at specific times and places for particular reasons
    Participants may be selected or self-select into programs
    People who have access to credit are likely to be very different from
     the average entrepreneur, so looking at their profits will give you a
     misleading impression of the benefits of credit
Illustration: Credit Program
(Valid Counterfactual)
[Figure: gross operating margin of the control group and the treatment group over time (vertical axis from 0 to 14)]

 (+4) Impact of the program
 (+2) Impact of other (external) factors

 * Macroeconomic environment affects control group
 * Program impact easily identified
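
The arithmetic behind the two credit program illustrations can be written out explicitly. This is a minimal sketch using only the changes quoted in the figures (+6, +2, +4); the variable names are hypothetical.

```python
# Changes in gross operating margin, as quoted in the two illustrations.
before_after_change_treatment = 6   # treatment group, before vs. after the program
change_from_external_factors = 2    # change the control group also experienced

# The control group shows what would have happened without the program,
# so the program impact is the treatment change net of the control change.
program_impact = before_after_change_treatment - change_from_external_factors
print(program_impact)  # 4
```

The before-after comparison alone would have credited the program with the full +6, including the +2 that came from external factors.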
Experimental Design

 All those in the study have the same chance of
  being in the treatment or comparison group
 By design, treatment and comparison have the
  same characteristics (observed and unobserved), on
  average
   Only difference is treatment
 Yields unbiased impact estimates
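
A minimal sketch of what "same characteristics on average" means in practice, assuming a hypothetical sample of 2,000 firms with an ability score the evaluator never observes. The names `assign_randomly` and `ability` are illustrative only.

```python
import random
from statistics import mean

def assign_randomly(units, seed=42):
    """Shuffle the units and split them in half: treatment and comparison."""
    rng = random.Random(seed)
    shuffled = units[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical study sample: each firm has an (unobserved) ability score.
rng = random.Random(1)
firms = [{"id": i, "ability": rng.gauss(50, 10)} for i in range(2000)]

treatment, comparison = assign_randomly(firms)
gap = mean(f["ability"] for f in treatment) - mean(f["ability"] for f in comparison)
print(f"Difference in mean ability: {gap:.2f}")
```

The assignment never looks at ability, yet the two groups end up with nearly the same average. The same holds for every other characteristic, observed or not.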



Medical Trials Analogy

 Medical trials:
   Take 1,000 subjects
   Assign 50% to treatment group, 50% to control
   On average
    ▪ Age in treatment and control group the same
    ▪ Pre-existing health in both groups the same
    ▪ Expected evolution of health in both groups the same
   Track outcomes for treatment and control groups
   “Gold standard” of scientific research
 Development projects
   Many projects amenable to similar design
Options for Randomization

 Lottery (only some receive)
   Lottery to receive new loans
 Random phase-in (everyone gets it eventually)
   Some groups or individuals get credit each year
 Variation in treatment
   Some get matching grant, others get credit, others get
    business development services etc
 Encouragement design
   Some farmers get home visit to explain loan product,
    others do not
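
As one illustration of these options, a random phase-in can be sketched as a randomized rollout order. This is a sketch under assumptions: the village names, the number of phases, and the function `random_phase_in` are hypothetical.

```python
import random

def random_phase_in(groups, n_phases, seed=7):
    """Randomly order the groups and spread them evenly over rollout phases."""
    rng = random.Random(seed)
    order = groups[:]
    rng.shuffle(order)
    # Phase numbers start at 1; groups are dealt out in the shuffled order.
    return {group: 1 + i % n_phases for i, group in enumerate(order)}

villages = [f"village_{i}" for i in range(12)]   # hypothetical names
schedule = random_phase_in(villages, n_phases=3)
for village in villages:
    print(village, "-> phase", schedule[village])
```

Everyone gets the program eventually, and in each phase the groups not yet reached serve as the comparison group.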

Lottery among the qualified

           Must receive the
           program


           Randomize who gets the
           program


           Not suitable for the
           program
Opportunities

 Budget constraint prevents full coverage
   Random assignment (lottery) is fair and
    transparent
 Limited implementation capacity
   Phase-in gives all the same chance to go first
 No evidence on which alternative is best
   Random assignment to alternatives
    with equal ex ante chance of success


Opportunities for
Randomization
 Take-up of existing program is not complete
    Provide information or incentive for some to sign
     up: randomize encouragement
 Pilot a new program
   Good opportunity to test design before scaling
    up
 Operational changes to ongoing programs
   Good opportunity to test changes before scaling
    them up

Different levels you can
randomize at

    Individual/owner/firm
    Business association
    Village level
    Women’s association
    Regulatory jurisdiction/administrative district
    School level
Group or individual
randomization?

 If a program impacts a whole group, usually
  randomize the whole community to treatment or
  comparison
 Easier to get a big enough sample if you randomize
  individuals

[Figure: individual randomization vs. group randomization]
Unit of Randomization

 Randomizing at higher level sometimes necessary:
   Political constraints on differential treatment within community
   Practical constraints—confusing to implement different
    versions
   Spillover effects may require higher level randomization

 Randomizing at group level requires many groups
  because of within-community correlation
    Example: a micro-credit program to treat 100,000 people.
     Choose Senegal and Gambia, and randomly offer the program in
     one country.
    What do we learn?
    Similar problem if we choose only 4 or only 10 districts
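
One way to see why many groups are needed is the standard design effect for equal-sized clusters, 1 + (m - 1) * ICC, where m is the cluster size and ICC is the intra-cluster correlation. The sketch below is illustrative only: the ICC of 0.05 and the alternative split into 500 villages are assumptions, not figures from the example above.

```python
def effective_sample_size(n_clusters, cluster_size, icc):
    """Effective number of independent observations under clustering.

    Uses the standard design effect 1 + (m - 1) * icc for equal-sized clusters.
    """
    design_effect = 1 + (cluster_size - 1) * icc
    return n_clusters * cluster_size / design_effect

# 100,000 people split across 2 countries vs. the same 100,000 in 500 villages.
# The intra-cluster correlation of 0.05 is an assumed value for illustration.
two_countries = effective_sample_size(2, 50_000, icc=0.05)
many_villages = effective_sample_size(500, 200, icc=0.05)

print(f"2 countries:  about {two_countries:.0f} effective observations")
print(f"500 villages: about {many_villages:.0f} effective observations")
```

With one country per arm the situation is even worse than the number suggests: any difference between the two countries is indistinguishable from the program effect.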

     Elements of an experimental design

     Target population: SMEs
              |
     Potential participants: tailors, furniture manufacturers
              |
     Evaluation sample
              |
     Random assignment
        Treatment Group (participants, non-participants)
        Control Group
External and Internal Validity (1)

 External validity
    The evaluation sample is representative of the total
     population
    The results in the sample represent the results in the
     population, so we can apply the lessons to the whole
     population
 Internal validity
    The intervention and comparison groups are truly
     comparable
    The estimated effect of the intervention/program on the
     evaluated population reflects the real impact on that
     population
External and Internal Validity (2)

 An evaluation can have internal validity without
  external validity
   Example: A randomized evaluation of encouraging
    informal firms to register in urban areas may not tell us
    much about impact of a similar program in rural areas
 An evaluation without internal validity can’t have
  external validity
    If you don’t know whether a program works in one place,
     then you have learnt nothing about whether it works
     elsewhere.
Internal & external validity
[Diagram: National Population -> random sample (randomization)
          -> Representative Sample of National Population -> randomization]
   Internal validity

Example: Evaluating a program that targets women

[Diagram: Population -> stratification -> Population stratum
          -> Samples of Population Stratum -> randomization]
Representative but biased:
useless

Example: Randomly select 1 in 100 firms in Senegal. Among this
sample, compare those with bank loans to those without.

[Diagram: National Population -> randomization -> non-random assignment -> USELESS!]
Efficacy & Effectiveness

 Efficacy
   Proof of concept
   Smaller scale
   Pilot in ideal conditions
 Effectiveness
   At scale
   Prevailing implementation arrangements -- “real life”

 Higher or lower impact?
 Higher or lower costs?

Advantages of “experiments”

 Clear and precise causal impact
 Relative to other methods
     Provide correct estimates
     Much easier to analyze: difference in averages
     Easier to explain
     More convincing to policymakers
     Methodologically uncontroversial
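
The "difference in averages" analysis can be sketched in a few lines. The outcome values below are made up for illustration, and the standard error uses the usual unequal-variance formula for two independent samples; `difference_in_means` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, variance

def difference_in_means(treated, control):
    """Return the impact estimate and its standard error (unequal variances)."""
    estimate = mean(treated) - mean(control)
    std_error = sqrt(variance(treated) / len(treated)
                     + variance(control) / len(control))
    return estimate, std_error

# Made-up outcome data, for illustration only.
treated = [12, 15, 11, 14, 13, 16]
control = [10, 11, 9, 12, 10, 8]

estimate, std_error = difference_in_means(treated, control)
print(f"Estimated impact: {estimate:.2f} (standard error {std_error:.2f})")
```

If randomization is done at the group level, the standard error would need to account for clustering; this sketch assumes individual-level assignment.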


Randomly assigning machines within a
plant to receive regular maintenance

  Machines do NOT
   Raise ethical or practical concerns about
    randomization
   Fail to comply with Treatment
   Find a better Treatment
   Move away—so lost to measurement
   Refuse to answer questionnaires

   Human beings can be a little more
    challenging!
What if there are constraints on
randomization?

 Some interventions can’t be assigned
  randomly
 Partial take up or demand-driven
  interventions: Randomly promote the
  program to some
   Participants make their own choices about
    adoption
 Perhaps there is contamination, for instance
  if some in the control group take up
  treatment
Randomly Assigned Marketing
(Encouragement Design)
 Those who receive the marketing treatment are
  more likely to enroll
 But who got marketing was determined
  randomly, so it is not correlated with other
  observables/non-observables
    Compare average outcomes of two groups:
     promoted/not promoted
    Effect of offering the encouragement (Intent-To-Treat)
    Effect of the intervention on the complier population
     (Local Average Treatment Effect)
     ▪ LATE = ITT / difference in take-up between promoted and not promoted
    Randomization

                       Assigned to   Assigned to   Difference   Impact
                       treatment     control

Proportion treated     100%          0%            100%         100%
Mean outcome           103           80            23           23/100% = 23

Difference in proportion treated: impact of assignment.
Difference in mean outcome: intent-to-treat estimate.
Impact: average treatment effect on the treated.

[Figure: treated and non-treated individuals in each group]
    Random encouragement

                       Randomly      Not           Difference   Impact
                       encouraged    encouraged

Proportion treated     70%           30%           40%          100%
Outcome                100           92            8            8/40% = 20

Difference in proportion treated: impact of encouragement.
Difference in outcome: intent-to-treat estimate.
Impact: average treatment effect on compliers (local average treatment effect).

[Figure: treated (did take up program) and non-treated (did not take up program) individuals in each group]
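
The two tables follow the same calculation: the intent-to-treat estimate divided by the difference in the proportion treated. A minimal sketch using the figures from the tables (the function names are hypothetical):

```python
def intent_to_treat(mean_assigned, mean_not_assigned):
    """Difference in mean outcomes by original random assignment."""
    return mean_assigned - mean_not_assigned

def effect_on_compliers(itt, takeup_assigned, takeup_not_assigned):
    """ITT scaled by the difference in take-up between the two groups."""
    return itt / (takeup_assigned - takeup_not_assigned)

# Randomization with full compliance: 100% vs. 0% treated.
itt_full = intent_to_treat(103, 80)
effect_full = effect_on_compliers(itt_full, 1.00, 0.00)

# Random encouragement: 70% vs. 30% treated.
itt_encouraged = intent_to_treat(100, 92)
late = effect_on_compliers(itt_encouraged, 0.70, 0.30)

print(f"Full compliance: ITT = {itt_full}, effect = {effect_full:.1f}")
print(f"Encouragement:   ITT = {itt_encouraged}, LATE = {late:.1f}")
```

With full compliance the divisor is 1, so the intent-to-treat estimate and the effect on the treated coincide. With encouragement, the divisor is the 40-point gap in take-up, not the 70% take-up rate in the encouraged group.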
Common pitfalls to avoid

 Calculating sample size incorrectly
    Randomizing one district to treatment and one
     district to control, then calculating sample size from the
     number of people you interview
 Collecting data differently in treatment and
  control
 Counting those assigned to treatment who
  do not take up the program as control. Don’t
  undo your randomization!!
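
The last pitfall can be illustrated with a toy simulation. This is a sketch under stated assumptions: the program has zero true effect, and only motivated people take it up. All names and coefficients are hypothetical.

```python
import random
from statistics import mean

rng = random.Random(3)
TRUE_EFFECT = 0.0   # the program does nothing in this toy model (assumption)

assigned_treatment, assigned_control = [], []
for _ in range(20_000):
    motivation = rng.gauss(0, 1)                 # unobserved; drives take-up and outcomes
    outcome_without = 10 + 2 * motivation + rng.gauss(0, 1)
    if rng.random() < 0.5:                       # random assignment to treatment
        takes_up = motivation > 0                # only motivated people take up
        outcome = outcome_without + (TRUE_EFFECT if takes_up else 0.0)
        assigned_treatment.append((takes_up, outcome))
    else:
        assigned_control.append((False, outcome_without))

# Correct: compare by original assignment (intent-to-treat).
itt = (mean(y for _, y in assigned_treatment)
       - mean(y for _, y in assigned_control))

# Pitfall: move non-takers from the treatment group into the control group.
takers = [y for took, y in assigned_treatment if took]
relabeled_control = ([y for took, y in assigned_treatment if not took]
                     + [y for _, y in assigned_control])
biased = mean(takers) - mean(relabeled_control)

print(f"By original assignment: {itt:.2f}")
print(f"After relabeling:       {biased:.2f}")
```

Comparing by original assignment recovers an effect close to zero, while relabeling produces a large spurious "impact" because the takers were more motivated to begin with.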
When is it really not possible?

 The treatment is already assigned and
  announced
    and there is no possibility of expanding treatment
 The program is over (retrospective)
 Take-up is already universal
 The program is national and non-excludable
    Freedom of the press, exchange rate policy
     (sometimes some components can be randomized)
 Sample size is too small to make it worth it
Further Resources

 DEC
 J-PAL, IPA
 J-PAL Course on Impact Evaluation
 Duflo, Glennerster, Kremer (2006)
 Email presenters (e.g., scole@hbs.edu)
 Engage academics in studies
Thank You



