# Generating Plausible Causal Hypotheses

```
Generating Plausible Causal Hypotheses

By
Larry V. Hedges
Northwestern University

Presented at the 2010 IES Research Conference
Goals
Provide a brief introduction to causal inference

Explain why experiments provide model free estimates of causal effects

Examine the possibility of causal inference from a few
quasi-experimental designs
-Assignment based on a covariate
-Regression discontinuity design
-Nonequivalent control group design

Examine the difference in differences approach in more detail
What is Causal Inference?
We all think we know what we mean by cause and effect

But a formal treatment is useful

It turns out that there are several treatments of cause and
effect

The modern statistical approach is often called the
Rubin-Holland-Rosenbaum model

(But its roots go back as far as Neyman, 1923)
The Rubin Holland Model
Key concepts

Units (e.g., individuals)

Treatments (e.g., 0, 1)

Responses (e.g., r0, r1)
ri0 the response of unit i if it got treatment 0
ri1 the response of unit i if it got treatment 1

Causal effect of treatment 1 versus 0 on unit i
τi = ri1 – ri0
The Rubin Holland Model
The definition of the causal effect of treatment 1 versus 0 on unit i

τi = ri1 – ri0

• This is a relative definition: The effect of treatment 1 compared to
treatment 0

• This is a counterfactual definition: you can’t observe both ri0 and ri1
• The (relative) causal effect of a treatment on a single unit cannot be
observed (although, with additional assumptions, single subject designs
attempt to do so)
Causal Inference and Missing Data
Note that causal inference is a missing data problem

You cannot observe both ri0 and ri1—one of them is always missing

Not surprisingly, modern ideas for causal inference sometimes draw on
modern ideas for handling missing data

Missing data methods try to find conditions under which the missing
data are (conditionally) “as if” from random sampling

Methods for causal inference try to find conditions under which the
missing data are (conditionally) “as if” from random assignment

We will discuss some of these later
The Rubin Holland Model
Example

Note that we assume that both ri0 and ri1 are known for the purposes
of illustration

Unit   r1   r0    τ
1      20   10   10
2      20   10   10
3      20   10   10
4      20   10   10
5      11   20   -9
6      11   20   -9
7      11   20   -9
8      11   20   -9
The Rubin Holland Model
Example (same eight units as in the table above)

Any particular experiment would assign some units to treatment and
others to control, so some ri1’s would be observed and some ri0’s
would be observed
The Rubin Holland Model
Example (same eight units as in the table above)

Each possible experiment would give a different estimate of the
average treatment effect, but the average of these estimates over all
possible assignments would be the average treatment effect
The Rubin Holland Model
Example (same eight units as in the table above)

Note that assigning the best treatment to each unit does not give an
unbiased estimate of the average treatment effect
The Rubin Holland Model
Randomized experiments

Define the assignment variable Z via Z = 0 if a unit gets
control and Z = 1 if a unit gets treatment

Random assignment means that

E r 0 | Z  0  E r 0 | Z  1  E r 0 
E r1 | Z  0  E r1 | Z  1  E r1

Therefore (r0, r1) is independent of Z (assignment)
The Rubin Holland Model
Randomized experiments give model free estimates of
the average (relative) causal effect of a treatment

Why?

Because, independence of Z (assignment) and (r0, r1)
implies

E r  r 0   E r1  E r 0   E r1 | Z  1  E r 0 | Z  0
.
The Rubin Holland Model
This is all very simple

But this is deceptive

Why are there only 2 possible outcomes?

What if the treatment I get affects your response to treatment?

This assumption is called “no interference between units” (e.g., Cox,
1958) or the stable unit treatment value (SUTV) assumption (e.g., Rubin,
1974)
The Rubin Holland Model
SUTV can be wrong!

Consider response to vaccines

The response to the smallpox vaccine (or not) depends on who else
is vaccinated

This is how eradication is possible

Consider classrooms or schools where social interaction is possible
(indeed probable)

Contamination is a violation of SUTV
The Rubin Holland Model
Some associations cannot be causal

Suppose one of ri0 or ri1 does not exist

• Some individuals would never accept treatment (refusers)

• Some individuals would always get treatment (always takers)

• Some individuals would always do the opposite of what they were
assigned (defiers)

This leads to the concept of compliers and complier average treatment
effect
The Rubin Holland Model
On a more philosophical level, not all “what if” questions have causal
answers

The idea of a randomized experiment helps clarify what effects might
be causal

If you cannot imagine an experiment that assigns the treatments being
compared, it may not be sensible to talk of causal effects

It may not be sensible to talk of sex differences as causal effects

But it might be sensible to talk of gender (social) differences as causal
effects
The Rubin Holland Model
Similarly, it may not make sense to talk about causal
effects of treatments on

• Never takers

• Always takers

• Defiers

It makes sense to explicitly limit the scope of our attempts
at causal inference to the compliers
Scope of Causal Inference
Randomized experiments give model-free estimates of average causal
effects

Is there any other way to get them?

No other model-free methods are known

Many other methods can give estimates of causal effects given that a
model is true

The key problem with these methods is that the model must be
assumed to be true, and the model assumptions are often difficult or
impossible to verify

But such methods are useful when experiments cannot be done or to
suggest plausible causal hypotheses
Estimating Treatment Effects
Consider treatment assignment (dummy variable) Z and outcome Y

Regress Y on Z

Yi = β0 + β1 Zi + εi

The estimate of β1 is just the difference between the mean Y for Z = 1
(the treatment group) and the mean Y for Z = 0 (the control group)
Ȳ1 = β0 + β1 + ε̄1
Ȳ0 = β0 + ε̄0
(where Ȳz and ε̄z are the means of Y and ε in the group with Z = z)

Thus the OLS estimate is

Ȳ1 – Ȳ0 = β1 + (ε̄1 – ε̄0)
Estimating Treatment Effects
(With Random Assignment)
If the treatment is randomly assigned, then Z is uncorrelated with ε (Z is
exogenous)

Z is uncorrelated with ε if and only if the group means of ε are equal
in expectation, ε̄1 = ε̄0

But if ε̄1 = ε̄0, then the mean difference is

Ȳ1 – Ȳ0 = β1 + (ε̄1 – ε̄0) = β1

This implies that standard methods (OLS) give an unbiased estimate of
β1, which is the average treatment effect

That is, the treatment-control mean difference is an unbiased estimate
of β1
What goes wrong without randomization?
(Simple Case)
If we do not have randomization, there is no guarantee that Z is
uncorrelated with ε (Z may be endogenous)

Thus the OLS estimate is still

Ȳ1 – Ȳ0 = β1 + (ε̄1 – ε̄0)

If Z is correlated with ε, then ε̄1 ≠ ε̄0

Hence Ȳ1 – Ȳ0 does not estimate β1, but some other quantity that
depends on the correlation of Z and ε

If Z is correlated with ε, then standard methods give a biased estimate
of β1
Instrumental Variables
One way to see this is in terms of two regression equations
Yi = β0 + β1Zi + εi
Zi = γ0 + γ1Xi + ηi

Note that in this model Z is endogenous (it may be
correlated with ε)

The instrumental variables model requires that:
1. γ1 ≠ 0 so that X predicts Z, and
2. X uncorrelated with ε (X is exogenous) [Cov{ε, X} = 0]
Estimating Causal Effects
(IV Studies)
Angrist, Imbens, & Rubin (1996) showed that IV can estimate average
causal effects of Z on Y, if the following assumptions hold:

1.   SUTVA
2.   Random assignment of X
3.   Exclusion restriction (exogeneity of X)
4.   Nonzero causal effect of X on Z
5.   Monotonicity (no defiers)

Then the IV estimate is an estimate of the average treatment effect for
those who comply with assignment
Assignment by Covariate Value
Let X be a covariate and x be the value of X

Suppose that units with the same X value are randomly assigned with
probability π(x), where 0 < π(x) < 1

E r 0 | X , Z  0  E r 0 | X , Z  1  E r 0 | X 
Thus

E r1 | X , Z  0  E r1 | X , Z  1  E r1 | X 
Conditional independence of Z (assignment) and (r0, r1) given X
implies

E r  r 0 | X   E r1 | X   E r 0 X   E r1 | X , Z  1  E r 0 | X , Z  0

Thus the experiment estimates the conditional causal effect given X
Assignment by Covariate Value
The conditional causal effect of treatment τ(x) might be called the local
average treatment effect at X = x

The weighted average of local average treatment effects

Σx w(x) τ(x)

(with weights w(x) over the values x of X) estimates the average causal
effect of treatment

Note that the overall treatment-control mean difference (even
controlling for X) does not necessarily estimate the average causal
effect of treatment, because there may be more
Regression Discontinuity Designs
Regression discontinuity designs (RDD) assign to treatment by
covariate value, but assign all units with X > c to treatment

π(x) = 1 if x > c
π(x) = 0 if x ≤ c

and so violate the principle that 0 < π(x) < 1

However, RDDs can estimate the local average causal effect of the
treatment at X = c

The reason is that the RDD is a randomized experiment at the cutpoint
X=c

More properly, the limit as x → c is a randomized experiment.
Regression Discontinuity Designs
Note that the RDD can support estimation of causal effects

The causal effect that can be estimated, τ(c), is

τ(c) = lim (x ↓ c) E{r1 | X = x, Z = 1} – lim (x ↑ c) E{r0 | X = x, Z = 0}

In other words, it is the causal effect (local average treatment effect) at
the value X = c, which is the gap or discontinuity at X = c

But not every analysis of the design estimates the causal effect

Analyses that use models assuming functional form (e.g., linear
regression) depend on that functional form assumption
Regression Discontinuity Designs
Nonparametric regression methods can, in principle,
provide model-free estimates of the causal effect of
treatment at X = c

But these methods themselves make technical assumptions

Thus estimation of treatment effects in RDD is in practice
somewhat model dependent

Designs with multiple cutpoints can provide estimates of
treatment effects at multiple points or more externally
valid average causal effects
Nonequivalent Control Group Designs
These designs compare a treatment group with a
(nonrandomized) comparison group

There is a huge range of quality in these designs, ranging
from pretty good to awful

Often matching or adjustment for covariates (a form of
pseudo-matching), or both, are used

Can such designs ever provide estimates of average
causal effects?

Yes, but essentially never estimates that are model free
Nonequivalent Control Group Designs
How well they work depends on how well the analytic model captures
essential features of the data

This is not always possible to determine empirically

If we can assume conditional independence of Z (assignment) and (r0,
r1) given X, or even just that

E{r0 | X, Z = 0} = E{r0 | X}
E{r1 | X, Z = 1} = E{r1 | X}

then the design can estimate the causal effect of treatment, since

E{r1 – r0 | X} = E{r1 | X} – E{r0 | X} = E{r1 | X, Z = 1} – E{r0 | X, Z = 0}
Nonequivalent Control Group Designs
Note that this is the equivalent of making the treatment
assignment “as if random” conditional on the covariate
(or matching variable) X

This is the basic strategy of matching for causal inference
(e.g., Rubin, Rosenbaum, Cochran)

It is also the basic strategy for inference under missing data:
find covariates so that, conditional on the observed
covariates, the missing data are “as if random”

In missing data theory, this is called “strong ignorability”
Nonequivalent Control Group Designs
This is all very abstract

Make it concrete by considering response functions—that is r0 or r1 as
a function of covariates or other effects

For example, suppose that
ri0 = α + βxi + εi0
ri1 = α + τ + βxi + εi1
and that εi0 and εi1 are independent of x

Then it easily follows that the usual estimate of the average treatment
effect is unbiased
Nonequivalent Control Group Designs
But suppose that the response functions are a little different
ri0 = α + β0xi + εi0
ri1 = α + τ + β1xi + εi1
and that εi0 and εi1 are independent of x

Then it easily follows that the usual estimate of the
treatment effect is

τ + β̄(x̄1 – x̄0)

where β̄ is an “average” of β0 and β1, and x̄1 and x̄0 are the group
means of x
Nonequivalent Control Group Designs
The analysis could be “fixed up” to remove the bias if we knew the
response function

But that is exactly the point

To get an unbiased estimate of the causal effect, you have to know
the right model, so analyses will be model dependent

It is not easy (maybe impossible) to know what the right model is

Moreover, I chose a very simple model (homogeneous treatment
effects with responses a linear function of the observed covariates)
Differences in Differences
The difference in differences idea can be seen as a particular kind of
nonequivalent control group design

It is frequently used to evaluate the effects of policies in education and
elsewhere

Assume that there is a series of longitudinal observations in locations
(e.g., states) where a policy has been implemented at some time in
some locations

Crudely, we estimate the effect of a policy by comparing
• the difference in outcome before and after the policy is implemented
for individuals affected by the policy, compared to
• the difference for individuals unaffected by the policy

That is why it is a difference in differences estimator
Differences in Differences
More elaborate (and convincing) analyses control for location and time or
model variation as random effects

Let Yist be the outcome for individual i in location s at time t

Let Xist be the corresponding individual level covariates

Then the model might be

Yist = αs + πt + γXist + βTst + εist
where αs and πt are location and time fixed effects, γ is a vector of covariate
effects, Tst is a dummy variable for treatment, and εist is a residual

There may be clustering by location, which needs to be taken into account
Differences in Differences
Obviously the difference in differences estimator has great
appeal

Given a good longitudinal data set, it is easy to use

It is simple to understand and explain to policy makers

It is a natural analysis to learn from “natural experiments”
where a policy has been tried some place and not others
or has been tried at different times in different locations
Differences in Differences
This model may seem hard to formulate in causal model terms

The treatment effect is identified by the difference between post-
policy and pre-policy outcomes, in the treatment (got policy)
group versus the control group

Let ri0 and ri1 be the possible outcomes after treatment and X be the
pretreatment variable

This estimate is estimating

E{r1 – X | Z = 1} – E{r0 – X | Z = 0}, not E{r1} – E{r0}

It can estimate the average causal effect under several
circumstances
Differences in Differences
This estimate is estimating
E r1  X | Z  1  E r 0  X | Z  0
It can estimate the average causal effect under some circumstances

For example, if the response functions are
ri0 = αi + xi + εi0
ri1 = αi + τ + xi + εi1
and εi0 and εi1 are independent of xi, then the difference in
differences estimate does estimate τ, the average causal effect of
treatment
What Can Go Wrong?
One big problem

Z can be correlated with (r0 – X, r1 – X)

• X can both cause the policy and be correlated
with the outcome

• Something else can cause both X and Z

• This is the general endogeneity problem
What Can Go Wrong?
Informal checks

• Look at trends beyond the time of policy implementation

• Estimate effects of treatment where there is no policy
change as a check (you should see no effect)

These are suggestive, not definitive

They can invalidate an analysis, not validate one
What Can Go Wrong?
One smaller problem

The data often exhibit large autocorrelations, and this can lead to large
underestimates of standard errors, making tests reject (far) too often

There are three reasons for this:

• Data are often based on long time series

• Data are highly positively correlated over time

• The treatment variable does not change much
What Can Go Wrong?
The standard error problem is difficult to solve

Parametric analysis (generalized least squares with
autocorrelation) can be done, but inference for
autocorrelation is poor

Randomization tests seem to perform well for problems like
these

Collapsing the data into two time periods is sometimes
useful and improves performance of tests
Conclusion

Without randomization, causal inference is
much harder and more model dependent
References
Estimators, Working Paper, Kennedy School of Government,
Harvard University.

Bertrand, M., Duflo, E., & Mullainathan, S. (2001). How much should
we trust difference in differences estimators? MIT Department of
Economics Working Paper Series 01-34.

Meyer, B. (1995). Natural and Quasi-Natural Experiments in
Economics, Journal of Business and Economic Statistics, 13, 151-
162.

Moulton, B. R. (1990). An Illustration of a Pitfall in Estimating the
Effects of Aggregate Variables in Micro Units, Review of Economics
and Statistics, 72, 334-338.
References (cont.)
Newey, W. & West, K. D. (1987). A Simple, Positive Semi-definite,
Heteroskedasticity and Autocorrelation Consistent Covariance Matrix,
Econometrica, 55, 703-708.

Nickell, S. (1981). Biases in Dynamic Models with Fixed Effects, Econometrica,
49, 1417-1426.

Rosenbaum, P. (1993). Hodges-Lehmann Point Estimates of Treatment Effect
in Observational Studies, Journal of the American Statistical Association,
88, 1250-1253.

Rosenbaum, P. (1996). Observational Studies and Nonrandomized
Experiments, In S. Ghosh and C.R.Rao, (Eds), Handbook of Statistics, 13.

Solon, G. (1984). Estimating Auto-correlations in Fixed-Effects Models, NBER
Technical Working Paper No. 32.

```
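
The eight-unit example from the Rubin Holland Model slides can be checked by brute force. The sketch below is an illustration, not part of the original presentation. It assumes balanced experiments (4 of the 8 units treated), which the slides do not specify, and the helper names `mean` and `diff_in_means` are made up for this example. It enumerates every such assignment, computes the treatment-control mean difference for each, and compares the average of those estimates with the true average treatment effect. It also computes the estimate obtained when each unit is given its own best treatment.

```python
from fractions import Fraction
from itertools import combinations

# Potential outcomes for the eight units in the example table.
# In a real study only one of r1[i], r0[i] would ever be observed.
r1 = [20, 20, 20, 20, 11, 11, 11, 11]
r0 = [10, 10, 10, 10, 20, 20, 20, 20]
n = len(r1)


def mean(values):
    """Exact arithmetic mean (hypothetical helper for this sketch)."""
    values = list(values)
    return Fraction(sum(values), len(values))


def diff_in_means(treated):
    """Treatment-control mean difference for one assignment.

    `treated` holds the indices of units assigned to treatment;
    all remaining units are assigned to control.
    """
    treated = set(treated)
    control = [i for i in range(n) if i not in treated]
    return mean(r1[i] for i in treated) - mean(r0[i] for i in control)


# True average treatment effect: the mean of the unit-level effects.
true_ate = mean(a - b for a, b in zip(r1, r0))

# Assumption: balanced designs only, i.e. exactly half the units treated.
estimates = [diff_in_means(t) for t in combinations(range(n), n // 2)]
average_estimate = mean(estimates)

# Non-random rule: give each unit whichever treatment is better for it.
best = [i for i in range(n) if r1[i] > r0[i]]
best_estimate = diff_in_means(best)

print("true ATE:", true_ate)
print("number of balanced assignments:", len(estimates))
print("average estimate over assignments:", average_estimate)
print("estimate under best-treatment assignment:", best_estimate)
```

Under these assumptions the average over the 70 balanced assignments equals the true average treatment effect of 1/2, even though individual experiments give estimates between 0 and 1. The best-treatment rule gives 0, because every observed outcome is then 20 in both groups.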