Lectures on Empirical Public Finance
Kevin Milligan, University of British Columbia
August 2009
For the Bavarian Graduate Program in Economics
Lecture 3A: Regression Discontinuity: Introduction
1.0 Regression Discontinuity: Introduction
The regression discontinuity design is in essence very simple. The idea is to exploit the presence of a discontinuity in treatment at some value of a ‘forcing’ variable to estimate a treatment effect. The first known usage of this methodology appeared 50 years ago in a paper by Thistlethwaite and Campbell (1960) in The Journal of Educational Psychology. They were interested in the impact of winning a merit scholarship on subsequent educational outcomes. Since the awarding of the scholarship depended on a known test score, you could compare those who just barely qualified for a scholarship to those who just barely did not qualify.
1.1 Defining terms
To begin, let’s use a hypothetical example from Lee and Lemieux to define some terms and give an idea what we’re talking about.
Lectures on Empirical Public Finance: Lecture 3A
The forcing variable here is X. The point of discontinuity is C. The outcome variable of interest is Y. The estimated ‘gap’ in the outcome variable at C is the regression discontinuity estimate of the treatment effect.
This method can be used whenever treatment changes discontinuously at an observable value of some forcing variable. Note that we do not need to assume that X is unrelated to Y. In the above example, there is a clear linear relationship. Instead, what is needed is simply a discontinuity in the relationship between X and Y.
Some important terminology should be introduced here. So far, we have talked about treatment being deterministically assigned depending on which side of C one falls. This is referred to as a ‘sharp’ RD design. What if treatment is not deterministically assigned on either side of the discontinuity? Instead, what if the probability of treatment jumps discontinuously? The answer is that the RD design is still valid. There need only be a jump in the probability of treatment. This is referred to as a ‘fuzzy’ RD design. If the jump in the probability is 1.0, then this collapses back to the sharp design.
The regression discontinuity design is covered in MHE and in Imbens and Wooldridge (2008). However, the best two sources are recent papers by Imbens and Lemieux (2008) and Lee and Lemieux (2009). The first of these is a bit more technical with more econometric detail, while the second is a bit more conversational.
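To make the sharp/fuzzy distinction concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the cutoff at 0, the uniform forcing variable, the treatment probabilities 0.2 and 0.7, and the helper name `treatment_jump`. It only measures the jump in the share treated at the cutoff, which should be 1.0 in the sharp case and roughly 0.5 in this fuzzy case.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
C = 0.0                                  # hypothetical point of discontinuity
x = rng.uniform(-1.0, 1.0, size=n)       # simulated forcing variable X

# Sharp design: treatment is a deterministic function of X.
d_sharp = (x >= C).astype(int)

# Fuzzy design: only the probability of treatment jumps at C.
# The probabilities 0.2 and 0.7 are made up for this illustration.
p_treat = np.where(x >= C, 0.7, 0.2)
d_fuzzy = rng.binomial(1, p_treat)

def treatment_jump(x, d, c, h=0.1):
    """Share treated in [c, c+h) minus share treated in [c-h, c)."""
    above = (x >= c) & (x < c + h)
    below = (x < c) & (x >= c - h)
    return d[above].mean() - d[below].mean()

jump_sharp = treatment_jump(x, d_sharp, C)
jump_fuzzy = treatment_jump(x, d_fuzzy, C)
print(f"jump in P(treatment): sharp={jump_sharp:.2f}, fuzzy={jump_fuzzy:.2f}")
```

The window width `h` is arbitrary here; it matters little because the simulated treatment probability is flat on each side of the cutoff.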
We’ll now turn to four quick examples, just to make sure the concept gets across. There are dozens of RD studies listed in Lee and Lemieux (2009).
Example 1: Almond and Doyle (2009)
What’s the effect of time in hospital post-birth on child health outcomes?
Look at kids born on either side of midnight when hospital rules permit longer stays for those born at 12:05am. Little observable impact on readmission probabilities.
1.2 Example 2: DiNardo and Lee (2004)
Effect of unions on employee and employer outcomes: compare unionization votes that came in just under 50% with those that came in just over 50%. Big impact on unionization; little impact on firm sales.
1.3 Example 3: Oreopoulos (2006)
What’s the effect of another year of schooling on later-in-life earnings? Uses a change in the compulsory schooling age in the UK. Finds an impact on earnings.
1.4 Example 4: Cipollone and Rosolia (2007)
After an earthquake in Italy, the government paid to rebuild schools in the affected area. Compare towns just within and just outside the specified reconstruction area to see the impact of new schools on outcomes of interest.
2.0 How does RD solve the evaluation problem?
The econometrics of the regression discontinuity approach was initially compared to instrumental variables. Angrist and Lavy (1999) explicitly do instrumental variables and talk about interpretation of the estimate as a LATE. Hahn, Todd, and Van der Klaauw also set their discussion in a LATE type of framework. However, Lee and Lemieux (2009) argue that RD looks much more like a randomized experiment than an IV estimator. I think they’re right—let’s see why.
2.1 The RD Estimator and the TAD
Imagine that we have a forcing variable X and we are interested in a point of discontinuity, X=c. For all values of X ≥ c, we have D=1. For all values of X < c, we have D=0. So, this is a sharp RD.
What is the treatment effect that we want to measure? Let’s call it the TAD, for treatment at discontinuity:

$$
\mathrm{TAD} = E[Y_1 \mid D = 1, X = c] - E[Y_0 \mid D = 1, X = c]
$$

Can we just go to the data and estimate this? No, there are two problems. First, we may not have any data points exactly at X=c. Second, we clearly cannot observe the untreated state of those with X=c, so the second object is not observable.
To address the first problem, let’s take a limit from above instead of using points exactly at X=c. For the second problem, let’s substitute the observable untreated outcome among the untreated who are just to the left of the point of discontinuity. Let’s call this the RD estimator, Δ_RD:

$$
\Delta_{RD} = \lim_{x \downarrow c} E[Y_1 \mid X = x, D = 1] - \lim_{x \uparrow c} E[Y_0 \mid X = x, D = 0].
$$
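A sample analogue of the RD estimator can be sketched as follows. This is one possible implementation, not the method prescribed in these notes: each one-sided limit is approximated by the intercept of a linear fit within a window of width `h` on that side of the cutoff. The function name `rd_estimate`, the simulated data, the window `h=0.5`, and the true gap of 2.0 are all assumptions made for illustration.

```python
import numpy as np

def rd_estimate(x, y, c, h):
    """Gap at c between linear fits on [c, c+h) and on (c-h, c)."""
    right = (x >= c) & (x < c + h)
    left = (x < c) & (x > c - h)
    # Fit y = a + b*(x - c) on each side. np.polyfit returns [b, a],
    # and the intercept a is the fitted value at x = c.
    _, a_right = np.polyfit(x[right] - c, y[right], 1)
    _, a_left = np.polyfit(x[left] - c, y[left], 1)
    return a_right - a_left

# Simulated sharp design with a known gap (all values are hypothetical).
rng = np.random.default_rng(1)
n, c, true_gap = 5_000, 0.0, 2.0
x = rng.uniform(-1.0, 1.0, n)
d = (x >= c).astype(float)
y = 1.0 + 0.5 * x + true_gap * d + rng.normal(0.0, 0.5, n)

est = rd_estimate(x, y, c, h=0.5)
print(f"estimated gap at c: {est:.3f} (true gap in the simulation: {true_gap})")
```

Fitting a line on each side, rather than comparing raw means in the two windows, keeps the slope of Y in X from leaking into the estimated gap.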
2.2 Characterizing bias with RD
Now we know how to estimate something ( Δ RD ), and we know the parameter we are interested in (the TAD). What’s next is to characterize the conditions under which Δ RD will give us the TAD. As we have done before, we will add and subtract the same term to the Δ RD estimator and see if we can group the terms into meaningful expressions. Let’s add and subtract three things. First, the unobserved Y0 outcome for those with D=1 and who are right at the discontinuity X=c. Second, the (possibly) observed outcome Y1 for those with D=1 who happen to be right at the point X=c. Third, the unobserved outcome Y0 for those with D=0 if they had been right at the discontinuity (and not assigned to treatment).
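$$
\begin{aligned}
\Delta_{RD} ={} & \left( E[Y_1 \mid X = c, D = 1] - E[Y_0 \mid X = c, D = 1] \right) \\
& + \left( E[Y_0 \mid X = c, D = 1] - E[Y_0 \mid X = c, D = 0] \right) \\
& + \left( \lim_{x \downarrow c} E[Y_1 \mid X = x, D = 1] - E[Y_1 \mid X = c, D = 1] \right) \\
& + \left( E[Y_0 \mid X = c, D = 0] - \lim_{x \uparrow c} E[Y_0 \mid X = x, D = 0] \right).
\end{aligned}
$$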
We can characterize these four terms as TAD + conventional selection bias + limit gap from above + limit gap from below. The TAD is clear; we defined it before. This is what we want.
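As a quick check that nothing has been changed by adding and subtracting, the four terms can be summed with some shorthand. The symbols A, B and G are introduced here only for this check and are not part of the original notation: A is the expected Y1 of the treated at c, B is the expected Y0 of the treated at c, and G is the expected Y0 of the untreated at c.

$$
\begin{aligned}
& A = E[Y_1 \mid X = c, D = 1], \quad B = E[Y_0 \mid X = c, D = 1], \quad G = E[Y_0 \mid X = c, D = 0], \\
& (A - B) + (B - G) + \left( \lim_{x \downarrow c} E[Y_1 \mid X = x, D = 1] - A \right) + \left( G - \lim_{x \uparrow c} E[Y_0 \mid X = x, D = 0] \right) \\
& \quad = \lim_{x \downarrow c} E[Y_1 \mid X = x, D = 1] - \lim_{x \uparrow c} E[Y_0 \mid X = x, D = 0] = \Delta_{RD}.
\end{aligned}
$$

A, B and G each appear once with a plus sign and once with a minus sign, so they cancel and only the two limits remain.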
The selection bias is exactly as we saw for the case of a randomized experiment. It captures any difference between the Y0 of the treated group and its proxy, the Y0 of the untreated group. We can assume this term away based on reasoning very similar to that from a randomized experiment. (See Lee and Lemieux 2009, section 3.1 on local randomization.) So long as individuals cannot precisely control the forcing variable X, which side of the discontinuity they fall on is as good as randomly assigned. This means that their expected Y0’s are the same and that the selection bias term goes to zero.
The limit gap from above tells us how far the limit as x approaches c from above is from the value exactly at c. If we assume that the conditional expectation of Y1 is continuous at X=c, the two objects are equal and the bracket goes to zero.
The limit gap from below tells us how far the limit as x approaches c from below is from the value exactly at c. We now must assume that the conditional expectation of Y0 is continuous at X=c to make the bracket go to zero.
These three assumptions are sufficient to identify the TAD. To see more elegant proofs of identification, see Imbens and Lemieux (2008).
3.0 Challenges for RD
There are several important considerations when using an RD design. Below is a discussion of each.
3.1 Reliance on functional form and bandwidth
Ideally, one would have a lot of data close to the point of discontinuity. In reality, this isn’t always so. This means that an arbitrary decision has to be made about how far away from the discontinuity to go in order to get a good estimate. The farther away one goes, the more strained the assumption becomes that the untreated are a good proxy for what the treated would have looked like without treatment.
Two distinct issues are raised. First is functional form. Should we use a linear OLS specification? Quadratic? A non-parametric estimator? Local linear regression? Theory is no guide here.
The second issue is the bandwidth. Once we’ve decided on the appropriate functional form, how far away from the discontinuity should we go? Should we use a weighting scheme? Again, theory is of little use here.
3.2 Manipulation of the forcing variable
Imagine that an individual could choose the value of his forcing variable X. For example, imagine if one knew that a scholarship is given to everyone scoring more than 80% on a test. One might imagine that students who might achieve a 79% with a normal effort might put in some extra effort to get over 80%. Similarly, those who might score 85%
might slack off a bit. In this way, the existence of the policy changes the distribution of X. McCrary (2008) provides a partial test for this possibility. He argues that we should examine the density of X. If the density of X shows some discontinuity (bunching) at the point of discontinuity, then this is evidence that there may be some selection and manipulation of the forcing variable.
3.3 Context specificity
Thinking back to our discussion of treatment effects, we must ask what we are estimating here. What we find is an extremely local treatment effect—the value of treatment right at the point of discontinuity. In some circumstances, this might be an interesting parameter. However, in others, this might not be terribly informative—especially if there is a lot of heterogeneity in individual treatment effects.
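A small simulation can illustrate how local the estimate is when effects are heterogeneous. All of it is assumed for illustration: the effect function `effect(x) = 1 + 3x`, the cutoff at 0, the window `h=0.5`, and the helper `gap_at_cutoff`, which fits a line on each side of the cutoff. The gap at the cutoff recovers the effect at c (1.0 here), while the average effect among all treated units is much larger (about 2.5 here).

```python
import numpy as np

rng = np.random.default_rng(2)
n, c = 20_000, 0.0
x = rng.uniform(-1.0, 1.0, n)
d = x >= c                                # sharp assignment

def effect(x):
    """Hypothetical individual treatment effect that grows with X."""
    return 1.0 + 3.0 * x

y0 = 0.5 * x + rng.normal(0.0, 0.3, n)    # untreated outcome
y = y0 + d * effect(x)                    # observed outcome

def gap_at_cutoff(x, y, c, h):
    """Difference in fitted values at c from linear fits on each side."""
    right = (x >= c) & (x < c + h)
    left = (x < c) & (x > c - h)
    a_right = np.polyfit(x[right] - c, y[right], 1)[1]
    a_left = np.polyfit(x[left] - c, y[left], 1)[1]
    return a_right - a_left

rd = gap_at_cutoff(x, y, c, h=0.5)
avg_treated = effect(x[d]).mean()         # average effect among the treated
print(f"RD estimate at c: {rd:.2f}")
print(f"average effect among treated: {avg_treated:.2f}")
```

In this setup the RD estimate is a correct answer to a narrow question; whether it is informative about units far from c depends on how strongly the effect varies with X.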
3.4 Limited Application
As attractive as the RD design may be, it cannot be applied when there is no discontinuity to be found, which rules out many circumstances. Moreover, even when there may be an RD to exploit, the sample may not be large enough to permit the data-intensive RD strategy to be used. Creative researchers are finding inventive RDs to exploit all the time, but if we relied only on RD we would not be able to answer many interesting economic questions.
4.0 Regression Discontinuity: summary
The RD design is increasingly popular. Lee and Lemieux advocate it as having particularly favourable properties—akin to randomized experiments, and with weaker assumptions than necessary for IV estimators. It is also very nice that so many of the assumptions that underlie estimation can be tested rather clearly. On the downside, the parameter estimated is very context specific with limited external validity. There are also a limited number of applications for RD estimators—you need to find a convincing discontinuity with lots of data!
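One of the testable assumptions is the absence of manipulation of the forcing variable (section 3.2). The sketch below is a crude stand-in for a density check, not McCrary’s (2008) estimator: it simply counts observations in equal-width bins on either side of the cutoff. The score distribution, the cutoff of 80, the 2-point bin width, and the rule that half of the near-misses push themselves just over the line are all hypothetical.

```python
import numpy as np

def counts_around_cutoff(x, c, width):
    """Number of observations in [c - width, c) and in [c, c + width)."""
    below = int(np.sum((x >= c - width) & (x < c)))
    above = int(np.sum((x >= c) & (x < c + width)))
    return below, above

rng = np.random.default_rng(3)
cutoff = 80.0
scores = rng.normal(75.0, 8.0, 10_000)    # simulated test scores, no manipulation

# Hypothetical manipulation: half of those within 2 points below the
# cutoff end up within 1 point above it.
manipulated = scores.copy()
near_miss = (scores >= cutoff - 2.0) & (scores < cutoff)
push = near_miss & (rng.random(scores.size) < 0.5)
manipulated[push] = cutoff + rng.uniform(0.0, 1.0, push.sum())

clean_below, clean_above = counts_around_cutoff(scores, cutoff, 2.0)
manip_below, manip_above = counts_around_cutoff(manipulated, cutoff, 2.0)
clean_ratio = clean_above / clean_below
manip_ratio = manip_above / manip_below
print(f"above/below count ratio, no manipulation: {clean_ratio:.2f}")
print(f"above/below count ratio, manipulation:    {manip_ratio:.2f}")
```

A ratio far from what a smooth density would produce is a warning sign, although a formal test would also need a standard error for the jump in the density.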
Imbens and Lemieux (2008) close with some practical advice to follow:
1. Graph the data.
2. Estimate effect using linear regressions on each side of the point. Use some bandwidth you think is reasonable.
3. Check robustness of assumptions.
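The three steps can be sketched on simulated data as follows. This is an illustrative workflow under assumed values (cutoff at 0, true gap of 1.5, 20 bins, bandwidths 0.1 to 1.0), not the specific procedure in Imbens and Lemieux (2008). The function `rd_linear` is a hypothetical helper that pools both sides in one regression with separate slopes, which gives the same gap as two separate linear fits. The robustness check here covers only the bandwidth choice.

```python
import numpy as np

rng = np.random.default_rng(4)
n, c = 10_000, 0.0
x = rng.uniform(-1.0, 1.0, n)
# Simulated outcome with a true gap of 1.5 at the cutoff (hypothetical).
y = 1.0 + 0.8 * x + 1.5 * (x >= c) + rng.normal(0.0, 0.5, n)

# Step 1: "graph" the data via bin means of Y over X (plot these in practice).
edges = np.linspace(-1.0, 1.0, 21)        # 20 bins, cutoff on a bin edge
bin_id = np.digitize(x, edges[1:-1])      # bin index 0..19 for each observation
bin_means = np.array([y[bin_id == b].mean() for b in range(20)])

# Step 2: linear regression on each side of the cutoff within bandwidth h.
def rd_linear(x, y, c, h):
    keep = np.abs(x - c) < h
    xc = x[keep] - c
    d = (xc >= 0).astype(float)
    # Columns: constant, treatment dummy, slope, change in slope above c.
    X = np.column_stack([np.ones(xc.size), d, xc, d * xc])
    beta, *_ = np.linalg.lstsq(X, y[keep], rcond=None)
    return beta[1]                        # coefficient on d is the gap at c

# Step 3: check robustness of the estimate to the bandwidth choice.
estimates = {h: rd_linear(x, y, c, h) for h in (0.1, 0.25, 0.5, 1.0)}
for h, est in estimates.items():
    print(f"h={h:<5} estimated gap={est:.3f}")
```

If the estimates move a lot as `h` changes, that suggests the linear functional form is doing much of the work away from the cutoff.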