Lecture 8
Instrumental Variables
The IV Problem
• We start with our CLM:
           y = Xβ + ε.           (DGP)
 - Let's pre-multiply the DGP by X':
            X'y = X'Xβ + X'ε.
  - We can interpret b as the solution obtained by first approximating
X'ε by zero, and then solving the k equations in k unknowns
            X'y = X'X b                (normal equations).

Note: What makes b consistent is that, when X'ε/T →p 0,
approximating (X'ε/T) by 0 is reasonably accurate in large samples.

• Now, we challenge the assumption that {xi, εi} is a sequence of
independent observations. That is,
         plim (X'ε/T) ≠ 0.
The IV Problem
• Now, we assume that     plim (X'ε/T) ≠ 0.

• This problem is not rare, especially in corporate finance. Suppose we
want to study the relation between a firm's CEO's compensation (y)
and the firm's board (x). Usually, a linear regression model is used,
relating y and x, with additional "control variables" (W) controlling for
other features that make one CEO's compensation different from
another. The term ε represents the effects of individual variation that
have not been controlled for with W or x. The model is:
                    y = xβ + Wγ + ε

If the firm's board is selected by the CEO, we have a problem: y and x
are both endogenous -i.e., influenced by the unobserved CEO's
skills. Then,
                   Cov(x,ε) ≠ 0           (=> by LLN, plim (X'ε/T) ≠ 0)
The IV Problem

• Q: When might an explanatory variable (a regressor) be correlated
with the error term?
 - Correlated shocks across linked equations
 - Simultaneous equations
 - Errors in variables
 - Model has a lagged dependent variable and a serially correlated
error term
The IV Problem
• We start with our linear model
          y = Xβ + ε.

• Now, assume plim(X'ε/T) ≠ 0.

• Then,
          plim b = plim β + plim (X'X/T)-1 plim (X'ε/T)
                 = β + Q-1 plim (X'ε/T) ≠ β

Under the new assumption, b is not a consistent estimator of β.

Note: For finite samples, we could have challenged assumption (A2)
E[ε|X] = 0. Then, Cov(X,ε) ≠ 0 => E[b] ≠ β.
Instrumental Variables
• New Framework:
(A1) DGP: y = Xβ + ε.
(A2') plim (X'ε/T) ≠ 0
(A3) Var[ε|X] = σ2 IT
(A4) X has full column rank -rank(X) = k-, where T ≥ k.
       => b is not a consistent estimator of β.

• We want to construct a consistent estimator of β.

• We assume that there exists a set of l variables, Z, such that
          (1) plim(Z'X/T) ≠ 0            (relevance condition)
          (2) plim(Z'ε/T) = 0            (validity condition -or exogeneity)

• The variables in Z are called instrumental variables (IV).
Instrumental Variables
• We can also write the new framework, emphasizing endogeneity, as:
(A1) DGP: y = Yβ + Uγ + ε.
(A2') plim (Y'ε/T) ≠ 0        (Y: "problem," endogenous, variables)
(A2') plim (U'ε/T) = 0        (U: clean variables)
(A3) Var[ε|Y,U] = σ2 IT
(A4) Y and U have full column rank, say kx and ku.

• We assume we have Z, a matrix of l "excluded" instruments -the
IV. We relate Y to Z (and U) linearly by:
       Y = ZΠ + UΦ + V                      - V ~ D(0, σV2 IT)

Note: When the number kx of "problem" variables is greater than
one, there will be a system of multiple equations. We will call the
estimation of this equation the "first stage."
Instrumental Variables
• Concentrating on the two equations:
(A1) y = Yβ + Uγ + ε
      Y = ZΠ + UΦ + V

Replacing the second equation in (A1):
       y = (ZΠ + UΦ + V)β + Uγ + ε = ZΠβ + Uφ + ξ
where
       φ = Φβ + γ
       ξ = Vβ + ε

This equation is called the reduced form.

• In empirical applications, interest often focuses on β, the coefficient
on the rhs endogenous variable Y.
Instrumental Variables
• New assumption: we have l instrumental variables, Z, such that
      plim(Z'X/T) ≠ 0 but plim(Z'ε/T) = 0

• Then, we state assumptions to construct an alternative (to OLS)
consistent estimator of β.

Assumptions:
{xi, zi, εi} is a sequence of RVs.
E[X'X] = Qxx (pd and finite)          (LLN => plim(X'X/T) = Qxx)
E[Z'Z] = Qzz (finite)                 (LLN => plim(Z'Z/T) = Qzz)
E[Z'X] = Qzx (full rank and finite)   (LLN => plim(Z'X/T) = Qzx)
E[Z'ε] = 0                            (LLN => plim(Z'ε/T) = 0)
Instrumental Variables
• To construct a new estimator, we start by pre-multiplying the DGP
  by W'Z', where W is an l×k weighting matrix that we choose:
       W'Z'y = W'Z'(Xβ + ε) = W'Z'Xβ + W'Z'ε

• Following the same idea as in OLS, we get a system of equations:
        W'Z'X bIV = W'Z'y

• We have two cases:
• Case 1: l = k -i.e., number of instruments = number of regressors.
   (This case is called just identified.)
  - In this case, W is irrelevant; say, W = I.
  - Then,
                  bIV = (Z'X)-1Z'y
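As a quick numeric sketch of the just-identified case (simulated data; numpy assumed, not part of the lecture), bIV = (Z'X)-1Z'y is a one-line computation:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 5000

# Hypothetical DGP: e enters both x and y, so x is endogenous;
# z shifts x but is independent of e (relevant and valid instrument).
z = rng.normal(size=T)
e = rng.normal(size=T)
x = 0.5 * z + e + rng.normal(size=T)
y = 1.0 + 2.0 * x + e                      # true beta = 2

X = np.column_stack([np.ones(T), x])       # regressors (constant + x)
Z = np.column_stack([np.ones(T), z])       # instruments (l = k: just identified)

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)   # b_IV = (Z'X)^-1 Z'y

print(b_ols[1])   # inconsistent: near 2 + 1/2.25 = 2.44 here, not 2
print(b_iv[1])    # consistent: close to the true beta = 2
```

In this design Cov(x, ε) = 1 and Var(x) = 2.25, so OLS converges to 2.44 while the IV estimate recovers β = 2.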
IV Estimators
• Properties of bIV
(1) Consistency
    bIV = (Z'X)-1Z'y = (Z'X)-1Z'(Xβ + ε)
         = (Z'X/T)-1 (Z'X/T) β + (Z'X/T)-1 (Z'ε/T)
         = β + (Z'X/T)-1 (Z'ε/T) →p β          (under assumptions)

(2) Asymptotic normality
    √T (bIV - β) = √T (Z'X)-1 Z'ε
                 = (Z'X/T)-1 √T (Z'ε/T)
   Using the Lindeberg-Feller CLT:    √T (Z'ε/T) →d N(0, σ2 Qzz)

   Then,     √T (bIV - β) →d N(0, σ2 Qzx-1 Qzz Qxz-1)
IV Estimators
• Properties of σ̂2 under IV estimation
- We define σ̂2 as:
       σ̂2 = (1/T) Σi e2IV,i = (1/T) Σi (yi - xi'bIV)2

where eIV = y - X bIV = y - X(Z'X)-1Z'y = [I - X(Z'X)-1Z']y = Mzx y
(note Mzx X = 0, so eIV = Mzx ε)
- Then,
  σ̂2 = eIV'eIV/T = ε'Mzx'Mzx ε/T
      = ε'ε/T - 2 ε'X(Z'X)-1Z'ε/T + ε'Z(X'Z)-1X'X(Z'X)-1Z'ε/T

=> plim σ̂2 = plim(ε'ε/T) - 2 plim(ε'X(Z'X)-1Z'ε/T)
                 + plim(ε'Z(X'Z)-1X'X(Z'X)-1Z'ε/T) = σ2

 Est. Asy. Var[bIV] = σ̂2 (Z'X)-1 Z'Z (X'Z)-1
IV Estimators: 2SLS
• Case 2: l > k -i.e., number of instruments > number of regressors.
  - This is the usual case. We could throw away l - k instruments, but
throwing away information is never optimal.
  - The IV normal equations are a system of l equations in k unknowns:
        Z'y = Z'Xβ + Z'ε
  Note: We cannot approximate all the Z'ε by 0 simultaneously. There
will be at least l - k non-zero residuals. (Similar setup to a regression!)

 - From the IV normal equations       => W'Z'X bIV = W'Z'y
 - We define a different IV estimator:
      - Let ZW = Z(Z'Z)-1Z'X = PZX = X̂
      - Then,         X'PZX bIV = X'PZy
        bIV = (X̂'X)-1 X̂'y = (X'PZX)-1 X'PZPZ y = (X̂'X̂)-1 X̂'y
IV Estimators: 2SLS (2-Stage Least Squares)
• We can easily derive properties for bIV:
       bIV = (X̂'X)-1 X̂'y = (X'PZX)-1 X'PZPZ y = (X̂'X̂)-1 X̂'y

 (1) bIV is consistent.
 (2) bIV is asymptotically normal.
- This estimator is also called GIVE (Generalized IV estimator).

• Interpretations of bIV:

    bIV = b2SLS = (X̂'X̂)-1 X̂'y        This is the 2SLS interpretation.
    bIV = (X̂'X)-1 X̂'y                This is the usual IV, with Z = X̂.
IV Estimators: 2SLS
• Interpretation of bIV as a 2SLS regression -Theil (1953).
                       b2SLS = (X̂'X̂)-1 X̂'y

- First stage: an OLS regression of X on Z. Get fitted values X̂.
- Second stage: another OLS regression of y on X̂. Get bIV = b2SLS.

Note that in the first stage, any variable in X that is also in Z will
achieve a perfect fit (these X are clean), so that this variable is carried
over without modification to the second stage.

• The 2SLS estimator can be interpreted as a member
of the family of GMM estimators.
                         Henri Theil (1924-2000, Netherlands)
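Theil's two-stage recipe can be checked numerically against the one-shot formula (a sketch with simulated data; numpy assumed, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
T, l = 2000, 3                       # over-identified: 3 instruments, 1 regressor

Z = rng.normal(size=(T, l))
e = rng.normal(size=T)
x = Z @ np.array([0.6, 0.4, 0.3]) + e + rng.normal(size=T)
y = 2.0 * x + e                      # true beta = 2
X = x.reshape(-1, 1)

# First stage: OLS of X on Z; keep fitted values X_hat = P_Z X.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]

# Second stage: OLS of y on X_hat gives b_2SLS = (X_hat'X_hat)^-1 X_hat'y.
b_two_stage = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

# One-shot formula: b_2SLS = (X'P_Z X)^-1 X'P_Z y.
PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
b_one_shot = np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ y)

print(b_two_stage, b_one_shot)       # identical, both near beta = 2
```

The two routes agree exactly because X̂ = PZX and PZ is idempotent, so X̂'X̂ = X'PZX and X̂'y = X'PZy.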
IV Estimators: 2SLS
• To check the factors that affect the behavior of IV, let's go back to a
two-equation setting in the endogenous system:
         y1 = Yβ + ε              -- ε ~ N(0, σεε)
         Y = ZΠ + V               -- V ~ N(0, σVV)
Then,
         b2SLS = [Y'PZY]-1 Y'PZ y1
               = [(Π'Z' + V') PZ (ZΠ + V)]-1 (Π'Z' + V') PZ (Yβ + ε)
  b2SLS - β = [Π'Z'ZΠ + V'PZV + Π'Z'V + V'ZΠ]-1 (Π'Z'ε + V'PZε)

The parameter λ = Π'Z'ZΠ/σVV is called the concentration parameter.

• The bias depends on the behavior of Z'ε -exogeneity of Z-, V'PZε,
and ZΠ -the correlation between Z and X.
IV Estimators: 2SLS
• Example: Two endogenous variables, one IV.
      y1 = y2 β + ε              -- ε ~ N(0, σεε)
      y2 = z π + v               -- v ~ N(0, σVV)
Then,
      b2SLS = (z'y2)-1 z'y1 = β + (z'y2)-1 z'ε
      plim(b2SLS) - β = Cov(z,ε)/Cov(z,y2)

Now, let's look at the bias term:
  b2SLS - β = [π2 z'z + v'PZv + 2π z'v]-1 (π z'ε + v'PZε)
Let λ = π2 z'z/σVV be the concentration parameter.

• When Cov(z,ε) ≠ 0, 2SLS is inconsistent. If, in addition, Corr(z,y2) is
not high enough -i.e., λ is small- the bias term will get larger.
IV Estimators: 2SLS
• Case 3: l < k -i.e., number of instruments < number of regressors.
  - We cannot estimate β.
  - This is the identification problem. We do not have enough information
in Z to estimate β.
 - When we can estimate β, we say the model is identified. This happens
when l ≥ k.

Note: When l ≥ k, we have two cases:
       - When l = k, we say the model is just identified.
       - When l > k, we say the model is over-identified.
OLS as an IV Estimator
• Recall the simple IV estimator
        bIV = (Z'X)-1Z'y
Now, let Z = X. Then, the least squares estimator b is
        bIV = b = (X'X)-1X'y

That is, under the usual assumptions, b is an IV estimator with X as its
own instrument.

Note: If plim(X'X/T) = Qxx (pd and finite) and plim(X'ε/T) = 0, => b
is consistent. But bIV is also consistent!

Remark: When plim(X'ε/T) ≠ 0, only the IV estimator is consistent.
Thus, we have an estimator that is consistent when b is not.
Asymptotic Covariance Matrix for 2SLS

General result for instrumental variable estimation:
  E[(bIV - β)(bIV - β)'|X,Z] = σ2 (Z'X)-1 Z'Z (X'Z)-1

Specialize for 2SLS, using Z = X̂ = (I - MZ)X:
  E[(b2SLS - β)(b2SLS - β)'|X,Z] = σ2 (X̂'X)-1 X̂'X̂ (X'X̂)-1
                                 = σ2 (X̂'X̂)-1 X̂'X̂ (X̂'X̂)-1
                                 = σ2 (X̂'X̂)-1
2SLS Has Larger Variance than LS
A comparison to OLS:
  Asy.Var[2SLS] = σ2 (X̂'X̂)-1
Neglecting the inconsistency,
  Asy.Var[LS] = σ2 (X'X)-1
(This is the variance of LS around its mean, not β.)
Asy.Var[2SLS] ≥ Asy.Var[LS] in the matrix sense.
Compare inverses:
  {Asy.Var[LS]}-1 - {Asy.Var[2SLS]}-1 = (1/σ2)[X'X - X̂'X̂]
      = (1/σ2)[X'X - X'(I - MZ)X] = (1/σ2)[X'MZX]
This matrix is nonnegative definite. (Not positive definite,
as it might have some rows and columns which are zero.)
Implication for the "precision" of 2SLS:
the problem of "weak instruments."
Estimating σ2
Estimating the asymptotic covariance matrix -
a caution about estimating σ2.
Since the regression is computed by regressing y on X̂,
one might use
       σ̂2 = (1/n) Σi (yi - x̂i'b2SLS)2
This is inconsistent. Use
       σ̂2 = (1/n) Σi (yi - xi'b2SLS)2
(A degrees of freedom correction is optional. Conventional,
but not necessary.)
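This caution is easy to verify numerically (a sketch with simulated data; numpy assumed, not from the lecture): forming residuals with the fitted values X̂ instead of the original X badly overstates σ2.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 4000

Z = rng.normal(size=(T, 2))
e = rng.normal(size=T)                       # true sigma^2 = 1
x = Z @ np.array([0.8, 0.5]) + e + rng.normal(size=T)
y = 1.0 + 2.0 * x + e

X = np.column_stack([np.ones(T), x])
Zf = np.column_stack([np.ones(T), Z])        # instruments, with constant

PZ = Zf @ np.linalg.solve(Zf.T @ Zf, Zf.T)
X_hat = PZ @ X
b = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)    # 2SLS coefficients

s2_wrong = np.mean((y - X_hat @ b) ** 2)     # residuals formed with X_hat
s2_right = np.mean((y - X @ b) ** 2)         # residuals formed with X

print(s2_wrong, s2_right)    # s2_right is near 1; s2_wrong is far above it
```

The wrong version adds the variation of X around X̂ into the residuals, which has nothing to do with the structural error variance.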
Asymptotic Efficiency
• The IV variance is larger than that of OLS. (A large-sample type
of Gauss-Markov result is at work.)
(1) OLS is inconsistent.
(2) Mean squared error is uncertain:

MSE[estimator|β] = Variance + squared bias.

IV may be better or worse. Depends on the data: X and ε.
 A Popular Misconception
• A popular misconception: if only one variable in X is correlated with
ε, the other coefficients are consistently estimated. False.
  Suppose only the first variable is correlated with ε.
  Under the assumptions, plim(X'ε/n) = [γ1, 0, ..., 0]'. Then

  plim b - β = plim(X'X/n)-1 [γ1, 0, ..., 0]' = γ1 [q11, q21, ..., qK1]'
             = γ1 times the first column of Q-1

  (qi1 denotes the i1-th element of Q-1.)

The problem is "smeared" over the other coefficients.
Two Problems with 2SLS

•   Z'X/T may not be sufficiently large. The covariance matrix for
    the IV estimator is Asy. Var[bIV] = σ2 [(X'Z)(Z'Z)-1(Z'X)]-1
    – If Z'X/T goes to 0 (weak instruments), the variance explodes.
    – Additional problems:
        • 2SLS is biased toward plim OLS.
        • Asymptotic results for inference fall apart.

•   When there are many instruments, X̂ is too close to X; 2SLS
    becomes OLS.
Small sample properties of IV
• What are the finite sample properties of IV estimators? Since we
  no longer have the condition E(ε|X) = 0, we cannot get simple
  expressions for the moments of bIV by first taking expectations
  conditioned on X and Z:
        b2SLS = [W'Z'X]-1 W'Z'y = β + [W'Z'X]-1 W'Z'ε

  We can write the bias as:
       b2SLS - β = [W'Z'X]-1 W'Z'ε

• In particular, we cannot conclude that bIV is unbiased, or that it has
  a Var[b2SLS] equal to its asymptotic covariance matrix.

• In fact, b2SLS can have very bad small-sample properties.
Small sample properties of IV
• In fact, b2SLS can have very bad small-sample properties.

• Example: Let T = l. In this case, Z is a square matrix:
  b2SLS = [W'Z'X]-1 W'Z'y = [X'Z(Z'Z)-1Z'X]-1 X'Z(Z'Z)-1Z'y
        = [X'Z Z-1 Z'-1 Z'X]-1 X'Z Z-1 Z'-1 Z'y = [X'X]-1 X'y = b
  => b is inconsistent when E(ε|X) ≠ 0, so b2SLS is also inconsistent if
  we let the number of instruments grow as fast as T.

• For the IV asymptotic theory to be a good approximation, T must
  be much larger than l.

• Rule-of-thumb for IV: T - l > 40, and T - l should grow with T.
Small sample properties of IV
• To study the behavior of bIV for small T, we set up a simple two-
variable Monte Carlo experiment, using a model appropriate to the
context.

• Recall the asymptotic distribution of bIV:
             √T (bIV - β) →d N(0, σε2/(σX2 ρXZ2))

• We will see that the small sample behavior of bIV will depend on the
nature of the model, the correlation between X and ε, and the
correlation between X and Z.
 Small sample properties of IV
• We start with a simple linear model:
      Y = β1 + β2X + ε
      X = l1Z + l2U + ε
where the observations on Z, U, and ε are drawn independently from a
N(0,1). We think of Z and U as variables and of ε as the error term
in the model. l1 and l2 are constants.
• By construction, X is not independent of ε. OLS will yield
inconsistent estimates, and the standard errors and other diagnostics
will be invalid.
• Z is correlated with X, but independent of ε. It can serve as an
instrument. (U is included to provide some variation in X not
connected with either Z or ε.)
 Small sample properties of IV
• To start the simulation, we set:
         β1 = 10, β2 = 5, l1 = 0.5, and l2 = 2.0.
• That is,
             Y = 10 + 5X + ε          ε ~ iid N(0,1)
             X = 0.5Z + 2.0U + ε      Z ~ iid N(0,1); U ~ iid N(0,1)

• We draw n = 25, n = 100 & n = 3,200. We do 1 million simulations.
• Given the information above, it is easy to verify that plim b2,OLS =
5.19. Of course, plim b2,IV = 5.00.
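A scaled-down sketch of this experiment (2,000 replications instead of 1 million; numpy assumed, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(7)
R, n = 2000, 100                     # replications, sample size

b_ols = np.empty(R)
b_iv = np.empty(R)
for r in range(R):
    z = rng.normal(size=n)
    u = rng.normal(size=n)
    e = rng.normal(size=n)
    x = 0.5 * z + 2.0 * u + e        # X = 0.5 Z + 2.0 U + eps
    y = 10.0 + 5.0 * x + e           # Y = 10 + 5 X + eps

    xd, yd, zd = x - x.mean(), y - y.mean(), z - z.mean()
    b_ols[r] = (xd @ yd) / (xd @ xd)     # OLS slope
    b_iv[r] = (zd @ yd) / (zd @ xd)      # IV slope, z as instrument

print(b_ols.mean())      # near the plim 5 + 1/5.25 = 5.19
print(np.median(b_iv))   # near the true value 5
print(b_ols.std(), b_iv.std())   # IV is much more dispersed
```

The median is reported for IV because the just-identified IV estimator has very heavy tails in small samples with a weak instrument, so its sample mean is unstable across replications.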
  Small sample properties of IV

[Figure: sampling distributions of b2,OLS and b2,IV for n = 25 and n = 100, plotted over the range 4 to 6.]
• b2,IV has a greater variance than b2,OLS. For n = 25, one might prefer
the latter. It is biased, but the MSE can be lower. For n = 100, the b2,IV
estimator looks better. As n grows, b2,IV and b2,OLS tend to their plims
(b2,IV more slowly than b2,OLS, because it has a larger variance).
 Small sample properties of IV

[Figure: distributions of √n (b2,IV - β2) for n = 25, 100, and 3,200, with the limiting normal distribution shown as a dashed line; range -6 to 6.]

• We have the distribution of √n (b2,IV - β2) for n = 25, 100, and 3,200.
It also shows, as the dashed red line, the limiting normal distribution
predicted by the CLT. For n = 3,200, the distribution is very close to the
limiting one. Inference would be OK with samples of this magnitude.
 Small sample properties of IV

[Figure repeated from the previous slide.]

• For n = 25 and n = 100, the tails are too fat. Inference would give rise
to excess instances of Type I error (over-rejection). The distortion for
small sample sizes is partly attributable to the low correlation (weak
instruments) between X and Z (ρXZ = 0.22). This is common in IV
estimation.
Cornwell and Rupert Data
Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years
Variables in the file are
EXP       = work experience
WKS       = weeks worked
OCC       = occupation, 1 if blue collar,
IND       = 1 if manufacturing industry
SOUTH     = 1 if resides in south
SMSA      = 1 if resides in a city (SMSA)
MS        = 1 if married
FEM       = 1 if female
UNION     = 1 if wage set by union contract
ED        = years of education
BLK       = 1 if individual is black
LWAGE     = log of wage = dependent variable in regressions


These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel
Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied
Econometrics, 3, 1988, pp. 149-155. See Baltagi, page 122 for further analysis. The data
were downloaded from the website for Baltagi's text.
Application: Wage Equation
• Are earnings affected by education? In a linear regression, we
  expect the education coefficient to be positive (and significant, if
  human capital theory is correct).

• Linear regression model:
        logWage = y = Xβ + ε
        X = one, exp, occ, ed, wks
  - We expect WKS -weeks worked- to be endogenous.
  - Instruments: Z = one, exp, occ, ed, ind, south, smsa, ms, fem

• Q: How do we know when a variable is exogenous?
            Estimated Wage Equation
+----------------------------------------------------+
| Ordinary      least squares regression              |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
|Constant|      5.30277***       .07406      71.605   .0000           |
|EXP     |       .01294***       .00058      22.393   .0000    19.8538|
|OCC     |      -.08511***       .01575      -5.403   .0000     .51116|
|ED      |       .06694***       .00288      23.204   .0000    12.8454|
|WKS     |       .00641***       .00120       5.330   .0000    46.8115|
+--------+------------------------------------------------------------+
+----------------------------------------------------+
| Two stage     least squares regression              |
+----------------------------------------------------+
+---------------------------------------------------------------------+
|Instrumental Variables:                                              |
|ONE        EXP        OCC       ED      IND        SOUTH    SMSA     |
|MS         FEM                                                       |
+---------------------------------------------------------------------+
|Constant|    -6.60400***       1.81742      -3.634   .0003           |
|EXP     |       .01735***       .00205       8.457   .0000    19.8538|
|OCC     |      -.04375          .05325       -.822   .4113     .51116|
|ED      |       .07840***       .00984       7.968   .0000    12.8454|
|WKS     |       .25530***       .03785       6.745   .0000    46.8115|
+--------+------------------------------------------------------------+
Endogeneity Test (Hausman)
              Exogenous                 Endogenous
  OLS      Consistent, Efficient       Inconsistent

  2SLS     Consistent, Inefficient     Consistent

• Base a test on d = b2SLS - bOLS
   - We can use a Wald statistic: d'[Var(d)]-1d
Note: Under H0 (plim (X'ε/T) = 0), plim bOLS = plim b2SLS = β.
   - Also, under H0: Var[b2SLS] = V2SLS > Var[bOLS] = VOLS
        => Under H0, one estimator is efficient, the other one is not.

• Q: What to use for Var(d)?
   - Hausman (1978): V = Var(d) = V2SLS - VOLS
       H = (b2SLS - bOLS)'[V2SLS - VOLS]-1(b2SLS - bOLS) →d χ2rank(V)
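A minimal numeric sketch of the scalar version of this statistic, applied to the single suspect coefficient (simulated data; numpy assumed; not the lecture's dataset):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 3000

# Simulated example: x endogenous, w exogenous, z1/z2 instruments.
Zi = rng.normal(size=(T, 2))
w = rng.normal(size=T)
e = rng.normal(size=T)
x = Zi @ np.array([0.7, 0.5]) + 0.5 * w + e + rng.normal(size=T)
y = 1.0 + 2.0 * x + 0.5 * w + e

X = np.column_stack([np.ones(T), x, w])
Z = np.column_stack([np.ones(T), Zi, w])

# OLS and its estimated covariance matrix
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
s2_ols = np.mean((y - X @ b_ols) ** 2)
V_ols = s2_ols * np.linalg.inv(X.T @ X)

# 2SLS and its estimated covariance matrix
PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
XPX = X.T @ PZ @ X
b_2sls = np.linalg.solve(XPX, X.T @ PZ @ y)
s2_iv = np.mean((y - X @ b_2sls) ** 2)      # residuals use X, not X_hat
V_2sls = s2_iv * np.linalg.inv(XPX)

# Hausman statistic on the coefficient of x (index 1): chi^2(1) under H0
d = b_2sls[1] - b_ols[1]
H = d**2 / (V_2sls[1, 1] - V_ols[1, 1])
print(H)    # large values reject exogeneity of x
```

Since x is endogenous by construction here, H comes out very large and H0 is rejected.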
Endogeneity Test (Hausman)
Q: What to use for Var(d)?
  - Hausman (1978): V = Var(d) = V2SLS - VOLS
      H = (b2SLS - bOLS)'[V2SLS - VOLS]-1(b2SLS - bOLS)

• Hausman gets Var(d) by using the following result:
   "The covariance between an efficient estimator (bE) and its
   difference from an inefficient estimator (bE - bI) is zero." That is,
        Cov(bE, bE - bI) = Cov(bE, bE) - Cov(bE, bI)
                         = Var(bE) - Cov(bE, bI) = 0
        => Var(bE) = Cov(bE, bI)

• Hausman's case: aVar(b) = aCov(b, b2SLS)
  Then, aVar(d) = aVar(b) + aVar(b2SLS) - 2 aCov(b, b2SLS)
                = aVar(b2SLS) - aVar(b)
Endogeneity Test: The Wu Test
• The Hausman test is complicated to calculate.
• Simplification: the Wu test.
• Consider a regression y = Xβ + ε, an array of proper instruments Z,
and an array of instruments W that includes Z plus other variables
that may be either clean or contaminated.
• Wu test setup:
(1) Regress X on Z. Keep fitted values X̂ = Z(Z'Z)-1Z'X.
(2) Using W as instruments, do a 2SLS regression of y on X; keep
RSS1.
(3) Do a 2SLS regression of y on X and a subset of m columns of X̂
that are linearly independent of X. Keep RSS2.
(4) Do an F-test:       F = [(RSS1 - RSS2)/m]/[RSS2/(T - k)].
Endogeneity Test: The Wu Test
• Under H0: X is clean, the F statistic has an approximate Fm,T-k
distribution.

• The test can be interpreted as a test of whether the m auxiliary
variables from X̂ should be omitted from the regression.

• When a subset of X̂ of maximum possible rank is chosen, this
statistic turns out to be asymptotically equivalent to the Hausman test
statistic.

• These exogeneity tests are usually known as DWH (Durbin, Wu,
Hausman) tests.
Endogeneity Test: Augmented DWH Test
• Davidson and MacKinnon (1993) suggest an augmented regression
test (DWH test), by including the residuals of each endogenous right-
hand side variable.

• Model:         y = Xβ + Uγ + ε         (we suspect X is endogenous).

• Steps for the augmented regression DWH test:
1. Regress x on the IV (Z) and U:
        x = ZΠ + Uφ + υ    => save residuals vx
2. Do an augmented regression:             y = Xβ + Uγ + vx δ + ε
3. Do a t-test of δ. If the estimate of δ, say d, is significantly different
from zero, then OLS is not consistent.
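A sketch of these three steps on simulated data (numpy assumed; variable names hypothetical, not from the lecture's dataset):

```python
import numpy as np

rng = np.random.default_rng(5)
T = 3000

# Simulated data: x endogenous, u exogenous control, z1/z2 instruments.
Z = rng.normal(size=(T, 2))
u = rng.normal(size=T)
e = rng.normal(size=T)
x = Z @ np.array([0.7, 0.5]) + 0.4 * u + e + rng.normal(size=T)
y = 1.0 + 2.0 * x + 0.5 * u + e

# Step 1: regress x on (Z, U); save the residuals v_x.
W = np.column_stack([np.ones(T), Z, u])
v_x = x - W @ np.linalg.lstsq(W, x, rcond=None)[0]

# Step 2: augmented regression y = X beta + U gamma + v_x delta + error.
A = np.column_stack([np.ones(T), x, u, v_x])
coef = np.linalg.solve(A.T @ A, A.T @ y)
resid = y - A @ coef
s2 = resid @ resid / (T - A.shape[1])
se = np.sqrt(s2 * np.linalg.inv(A.T @ A).diagonal())

# Step 3: t-test on delta (the last coefficient).
t_delta = coef[-1] / se[-1]
print(t_delta)    # |t| far from 0 here, since x is endogenous by construction
```

With x endogenous by construction, the t-statistic on δ is large and OLS is (correctly) flagged as inconsistent.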
  Wu Test
+----------------------------------------------------+
| Ordinary    least squares regression               |
| LHS=LWAGE    Mean                 =   6.676346     |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
|Constant|   -6.60400***       .50833      -12.992   .0000            |
|EXP     |     .01735***       .00057       30.235   .0000     19.8538|
|OCC     |    -.04375***       .01489       -2.937   .0033      .51116|
|ED      |     .07840***       .00275       28.489   .0000     12.8454|
|WKS     |     .00355***       .00114        3.120   .0018     46.8115|
|WKSHAT |      .25176***       .01065       23.646   .0000     46.8115|
+--------+------------------------------------------------------------+
| Note: ***, **, * = Significance at 1%, 5%, 10% level.               |
+---------------------------------------------------------------------+

--> Calc     ; list ; Wutest = b(kreg)^2 / Varb(kreg,kreg) $
+------------------------------------+
| Listed Calculator Results          |
+------------------------------------+
 WUTEST =     559.119128
Measurement Error
• DGP: y* = βx* + ε           - ε ~ iid D(0, σε2)
                              - all of the CLM assumptions apply.
• But we do not observe or measure x* correctly. We observe x, y:
    x = x* + u                u ~ iid D(0, σu2) -no correlation to ε, v
    y = y* + v                v ~ iid D(0, σv2) -no correlation to ε, u

• Let's consider two cases:

CASE 1 - Only x* is measured with error (y = y*):
  y = β(x - u) + ε = βx + ε - βu = βx + w
  E[x'w] = E[(x* + u)'(ε - βu)] = -βσu2 ≠ 0
             => CLM assumptions violated!
Measurement Error
• Q: What happens when y is regressed on x?
A: Least squares attenuation:

   plim b = cov(x,y)/var(x) = cov(x* + u, βx* + ε)/var(x* + u)
          = β var(x*)/[var(x*) + var(u)] < β

CASE 2 - Only y* is measured with error (x = x*):
       y* = y - v = βx* + ε
       =>   y = βx* + ε + v = βx* + (ε + v)
• Q: What happens when y is regressed on x?
A: Nothing! We have our usual OLS problem, since ε and v are
independent of each other and of x*. CLM assumptions are not violated!
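The attenuation in Case 1 is easy to reproduce (a sketch with simulated data; numpy assumed, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(11)
T = 100_000
beta = 2.0

x_star = rng.normal(size=T)            # true regressor, var(x*) = 1
u = rng.normal(size=T)                 # measurement error, var(u) = 1
e = rng.normal(scale=0.5, size=T)

y = beta * x_star + e                  # y measured without error (Case 1)
x = x_star + u                         # observed, error-ridden regressor

b = np.cov(x, y)[0, 1] / x.var()
# Predicted plim: beta * var(x*)/(var(x*) + var(u)) = 2 * 1/2 = 1
print(b)
```

The OLS slope converges to half the true β because half of the variance in x is pure measurement noise.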
Measurement Error
• Q: Why is OLS attenuated?
    y = βx* + ε
    x = x* + u
    y = βx + (ε - βu) = βx + w,                cov(x,w) = -β σu2

Some of the variation in x is not associated with variation in y. The
effect of variation in x on y is dampened by the measurement error.

• Q: Is measurement error in finance/economics a problem?
  A: Yes! In surveys and forms, mistakes are common. Most relevant
problem: often, economic theories deal with unobservables (x*).
Famous unobservables: market portfolio, innovation, growth
opportunities, potential output, target debt-equity ratio.
Measurement Error: Proxy Variables
• Often, economic theories deal with unobservables (x*). To test these
theories, practitioners use a proxy (x) instead of x*.

A proxy is a variable that has a "close" relation (usually, linear) with
the unobservable:
       x = δ x* + u              (typical measurement error problem!)

Example: The CAPM:              Ri - Rf = βi (RMP - Rf)
The market portfolio (MP) is unobservable. According to Roll's
(1977) critique, this makes the CAPM untestable!

In practice, we proxy it by a representative stock market index:
        RIndex = δ RMP + u
Measurement Error: Proxy Variables

• Example: Testing the CAPM I.
 (1) CAPM regression:
           Ri - Rf = αi + βi (RMP - Rf) + ε
           H0: αi = 0 (αi is the pricing error, Jensen's alpha.)
 (2) MP unobservable. Proxy: S&P 500 stock market index
           RSP500 = η RMP + u           => RMP = θ RSP500 + u'
 (3) Working CAPM regression:
           Ri - Rf = αi + βi (θ RSP500 + u' - Rf) + ε
                   = αi + βiθ RSP500 - βi Rf + ξ        (ξ = βi u' + ε)
    Or,
           Ri = αi + δi Rf + γi RSP500 + ξ
 where γi = βiθ and δi = 1 - βi => βi cannot be estimated directly!
Measurement Error: Proxy Variables

•     Ri = αi + δi Rf + γi RSP500 + ξ   (ξ=i u’ + )
    (4) Usually, Rf is assumed constant
       Ri = αi‘ + γi RSP500 + ξ
    where αi‘ = αi + δi Rf
                Ri = αi‘ + γi RSP500 + ξ
    We can do an OLS regression to estimate αi‘ and γi.
    But since γi = iθ      => i cannot be estimated!

Note: It is common to just work with “excess returns” directly. In this
case, the proxy would be:
    RSP500 - Rf = η (RMP - Rf)+ u
Measurement Error: Proxy Variables

• Example: Testing the CAPM II. We extend the CAPM (APT style):
  (1) CAPM regression with more explanatory variables (W):
             Ri - Rf = αi + i (RMP - Rf ) + ψi W + 
             H0: ψi=0
  (2) MP unobservable. Proxy: S&P 500 stock market index
             RSP500 = η RMP + u           => RMP = θ RSP500 + u’
  (3) Working CAPM extended regression
     Ri = αi + (1- i) Rf + γi RSP500 + ψi W + ξ       (ξ=i u’ + )

   Under the assumption of a constant Rf, we are back to the
previous case: OLS estimates αi‘, γi, ψi (but, i cannot be estimated).
However, we do estimate ψi and, thus, can test the extended CAPM!
Measurement Error in Multiple Regression
 Multiple regression: y = β1 x1* + β2 x2* + ε
 x1* is measured with error: x1 = x1* + u
 x2* is measured without error.
 The regression is estimated by least squares.
 Popular myth #1: b1 is biased downward, b2 consistent.
 Popular myth #2: All coefficients are biased toward zero.
 Result for the simplest case. Let
    σij = Cov(xi*, xj*), i, j = 1, 2   (elements of the 2x2 covariance matrix)
    σ^ij = ij-th element of the inverse of that covariance matrix
    σu2 = Var(u)
 For the least squares estimators:
    plim b1 = β1 [1/(1 + σu2 σ^11)]
    plim b2 = β2 − β1 [σu2 σ^12/(1 + σu2 σ^11)]
 The effect is called “smearing.”
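These plims can be checked with a short Monte Carlo (a sketch; the betas, the covariance matrix of the true regressors, and the measurement-error variance are all assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500_000
beta1, beta2 = 1.0, 1.0
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])                           # cov(x1*, x2*)
Xstar = rng.multivariate_normal([0.0, 0.0], Sigma, T)
sig2_u = 0.5                                             # var(u), the measurement error
x1 = Xstar[:, 0] + rng.normal(0.0, np.sqrt(sig2_u), T)   # x1* measured with error
x2 = Xstar[:, 1]                                         # x2* measured without error
y = beta1 * Xstar[:, 0] + beta2 * Xstar[:, 1] + rng.normal(0.0, 1.0, T)

b = np.linalg.lstsq(np.column_stack([x1, x2]), y, rcond=None)[0]

# plims implied by the formulas above
Sinv = np.linalg.inv(Sigma)
denom = 1.0 + sig2_u * Sinv[0, 0]
plim_b1 = beta1 / denom                                  # = 0.6: attenuated
plim_b2 = beta2 - beta1 * sig2_u * Sinv[0, 1] / denom    # = 1.2: "smeared" away from zero
print(b, plim_b1, plim_b2)
```

Here b2 is biased away from zero even though x2 is measured perfectly, contradicting both myths.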
Measurement error and IV: Twinsville

• Q: Does education affect earnings?
A: To estimate returns-to-schooling, economists often use a linear
regression model relating log earnings (y) to years of education (x*),
with additional control variables (U). The error term represents the
effects of person-to-person variation that have not been controlled for:
        y = βx* + Uγ + ε
• We expect two people with similar natural abilities, but different
levels of education, to be paid differently; that is, we expect β > 0.
• Problem: x* is self-reported, and often reported with error.
Measurement error and IV: Twinsville
• Linear model:       y = βx* + Uγ + ε
• H0: β = 0.
• We do not observe x*; we observe self-reported x.

• Famous application from the econ literature: Ashenfelter and
Krueger (AER, 1994): a wage equation for twins that includes two
measures of x: each twin reports their own and their twin’s schooling.

• The data suggests that between 8% and 12% of the measured
variance in schooling levels is error.

• Instrument: Reported schooling by the twin.
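The twins design can be sketched in a simulation (the return β = 0.10 and the error variances are illustrative assumptions, not Ashenfelter and Krueger's estimates): both reports are noisy measures of the same true schooling, and the twin's report is a valid instrument because its error is independent of the error in the self-report.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 100_000
beta = 0.10                                   # assumed true return to a year of schooling
s = rng.normal(13.0, 2.0, T)                  # true years of schooling (unobserved)
x_self = s + rng.normal(0.0, 0.6, T)          # self-reported schooling, with error
z_twin = s + rng.normal(0.0, 0.6, T)          # twin's report: independent error, same truth
y = 1.0 + beta * s + rng.normal(0.0, 0.3, T)  # log earnings

ones = np.ones(T)
X = np.column_stack([ones, x_self])
Z = np.column_stack([ones, z_twin])

b_ols = np.linalg.lstsq(X, y, rcond=None)[0][1]  # attenuated: plim = beta*4/(4 + 0.36)
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)[1]      # consistent for beta
print(b_ols, b_iv)
```

The OLS slope is pulled toward zero by the measurement error, while the IV slope recovers β because the instrument is correlated with true schooling but not with the self-report's error.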
Finding an Instrument: Not Easy
• Q: Does education affect earnings?
A: Same setup as before. We use a linear regression model relating
log earnings (y) to years of education (x), with control variables (U):
                y = βx + Uγ + ε
• We expect β > 0.
• In practice, U does not capture much of the variation in earnings.
• Problem: If some of the factors (unobserved skills) that influence
x also appear in ε, then Cov(x, ε) ≠ 0. (OLS is not consistent!)
We can think of this problem as an “omitted variables problem.”

• Solution: We need data on variables (Z) such that
        (1) Cov(x, Z) ≠ 0      -relevance condition
        (2) Cov(Z, ε) = 0      -valid (exogeneity) condition
Finding an Instrument: Not Easy
• In the education/earnings problem, we need variables (Z) that
    (1) Explain the variation in years of schooling -i.e., Cov(x, Z) ≠ 0
    (2) Do not directly affect earnings potential -i.e., Cov(Z, ε) = 0.

First, we run a first-stage regression to obtain fitted values of x:
       x = ZΠ + Uδ + V,                  V ~ N(0, σV2 I)
Then, using the fitted values, we estimate β and do tests on it.
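The two stages can be sketched as follows, assuming a stylized DGP in which unobserved ability drives both schooling and earnings (the controls U are dropped for brevity; the coefficient values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 50_000
ability = rng.normal(0.0, 1.0, T)          # unobserved skill: enters both x and the error
z = rng.normal(0.0, 1.0, T)                # instrument: shifts x, excluded from y
x = z + ability + rng.normal(0.0, 1.0, T)  # years of schooling
y = 0.5 * x + ability + rng.normal(0.0, 1.0, T)   # assumed true beta = 0.5

ones = np.ones(T)

# Stage 1: regress x on (1, z); keep fitted values x_hat
Z1 = np.column_stack([ones, z])
x_hat = Z1 @ np.linalg.lstsq(Z1, x, rcond=None)[0]

# Stage 2: regress y on (1, x_hat)
b_2sls = np.linalg.lstsq(np.column_stack([ones, x_hat]), y, rcond=None)[0][1]
b_ols = np.linalg.lstsq(np.column_stack([ones, x]), y, rcond=None)[0][1]
print(b_ols, b_2sls)    # OLS is biased upward (near 0.83); 2SLS is near 0.5
```

One caveat on this literal two-step recipe: the second-stage OLS standard errors are wrong because they use residuals based on x_hat rather than x; packaged 2SLS routines correct for this.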

• Finding a Z that meets both requirements is not easy. Historically,
the emphasis has been on the valid (exogeneity) condition. But, in the
past 20 years, there has been an additional source of concern:
     The correlation of X and Z may not be high enough.
Finding an Instrument: Not Easy

• The explanatory power of Z may not be enough to allow
inference on . In this case, we say Z is a weak instrument.

• IVs are weak if the mean component of X that depends on the IVs
--ZΠ-- is small relative to the variability of X, or equivalently, to the
variability of the error V.

• There is a theoretical problem when, under the null hypothesis, we
have unidentified parameters: under H0: Π = 0, β is not identified.

• Results from Gleser and Hwang (1987) and Dufour (1997) show
that CIs and tests based on t-tests and F (Wald) tests are not robust
to weak IVs.
Finding an Instrument: Not Easy
• The concern is not just theoretical: numerical studies show that
coverage rates of conventional TSLS CIs can be very poor when
instruments are weak, even if the sample size is large.

• Usual tests for H0: Π = 0: standard F-test on Z in the 1st-stage
regression and the partial-R2 (the exogenous variable U is partialed out).
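Both diagnostics come from comparing restricted and unrestricted first-stage regressions; a minimal sketch with simulated data (the instrument strength, 0.3, is an assumed value):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 5_000
U = np.column_stack([np.ones(T), rng.normal(0.0, 1.0, T)])  # exogenous controls (incl. constant)
z = rng.normal(0.0, 1.0, T)                                 # candidate instrument
x = U @ np.array([1.0, 0.5]) + 0.3 * z + rng.normal(0.0, 1.0, T)

def rss(M, v):
    """Residual sum of squares from an OLS fit of v on M."""
    b = np.linalg.lstsq(M, v, rcond=None)[0]
    e = v - M @ b
    return e @ e

rss_restricted = rss(U, x)                       # first stage without the instrument
rss_full = rss(np.column_stack([U, z]), x)       # first stage with the instrument

q = 1                                            # number of instruments tested
df = T - U.shape[1] - q
F = ((rss_restricted - rss_full) / q) / (rss_full / df)    # F-test of H0: Pi = 0
partial_r2 = (rss_restricted - rss_full) / rss_restricted  # partial R2 of z given U
print(F, partial_r2)
```

With a first-stage coefficient of 0.3 the F-statistic lands far above the Staiger-Stock rule of thumb of 10; shrinking the coefficient toward zero drives F below 10 and flags a weak instrument.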
Finding an Instrument
• Linear model:       y = βx + Uγ + ε
• H0: β = 0.
• U does not capture much of the variation in earnings.
• Cov(x, ε) ≠ 0. (OLS biased and inconsistent!)

• Angrist and Krueger (1991, QJE) Idea: school boards have age at
entry requirements. States have compulsory schooling laws according
to age. So a one-day difference in birth date can create a one year
difference in lifetime schooling.

• Then, z is a valid instrument if Cov(z, ε) = 0 -i.e., if quarter of birth
(QOB) affects earnings only through its effect on schooling.
Finding an Instrument
• Years of schooling vary by quarter of birth (QOB):
    – Someone born in Q1 is a little older and will be able to drop
      out sooner than someone born in Q4.
• Q.O.B. can be treated as a source of exogeneity in schooling.




                    Source: Angrist and Krueger (1991), Figure I
Finding an Instrument
• People born in Q1 do obtain less schooling
   – But pay close attention to the scale of the y-axis
   – Mean difference between Q1 and Q4 is only 0.124, or 1.5
      months
• Thus, we need large T since R2X,Z will be very small
   – A&K had over 300,000 observations for the 1930-39 cohort
• Final 2SLS model interacted QOB with year of birth (30), state of
  birth (150)
   – OLS: b = .0628 (s.e. = .0003)
   – 2SLS: b2SLS = .0811 (s.e. = .0109)
• OLS estimate does not appear to be badly biased.
   – But...
Weak Instruments
• True story: the graduate labor class at the University of Michigan
does replication exercises. Two students, Regina Baker and David
Jaeger, replicated the results in Angrist and Krueger (1991).

• Two things bothered them and their professor, John Bound:
(1) The results are imprecise and unstable when the controls and
instrument sets change.
(2) The results become precise and stable only when the first-stage F
tests cannot reject that the instruments’ coefficients are jointly zero
-i.e., precisely when the instruments are weak.

Note: Consider the first stage: x = ZΠ + ξ.
Even if Π = 0 in the DGP, as the number of instruments increases, the
sample R2 of the first-stage regression can only increase.
Weak Instruments


• As an illustration, BBJ estimated the IV coefficient with a randomly
assigned Z, so that Π = 0 by construction. They reproduced the OLS
estimate.
Weak Instruments
• Potential problems with QOB as an IV:
   (1) Correlation between QOB and schooling is weak
       - Small Cov(X,Z) introduces finite-sample bias, which will be
         exacerbated with the inclusion of many IV’s
   (2) QOB may not be completely exogenous
       - Even a small Cov(Z, ε) will cause inconsistency, and this will
         be exacerbated when Cov(X, Z) is small.

• QOB qualifies as a weak instrument that may be correlated with
  unobserved determinants of wages (e.g., family income).
Weak Instruments: Finance application
• Finance example: The consumption CAPM.
• In both linear and nonlinear versions of the model, the IVs are weak
-see Neely, Roy, and Whiteman (2001), and Yogo (2004).

• In the linear model in Yogo (2004):
X (endogenous variable): consumption growth
Z (the IVs): twice lagged nominal interest rates, inflation,
consumption growth, and log dividend-price ratio.

• But log consumption is close to a random walk, so consumption
growth is difficult to predict. This leads to the IVs being weak.
         => Yogo (2004) finds F-statistics for H0: Π = 0 in the 1st-stage
regression that lie between 0.17 and 3.53 for different countries.
Weak Instruments: Summary
• Even if the instrument is “good” -i.e., it meets the relevance
  condition- matters can be made far worse with IV as opposed to
  OLS (“the cure can be worse...”).

• Weak correlation between IV and endogenous regressor can pose
  severe finite-sample bias.

• Even a small Cov(Z, ε) will cause inconsistency, and this will be
  exacerbated when Cov(X, Z) is small.

• Large T will not help. A&K and Consumption CAPM tests have
  very large samples!
Weak Instruments: Detection and Remedies
•   Symptom: The relevance condition, plim(Z’X/T ) not zero, is close
    to being violated.
•   Detection of weak IV:
    – Standard F test in the 1st stage regression of xk on Z. Staiger
        and Stock (1997) suggest that F < 10 is a sign of problems.
    – Low partial-R2X,Z.
    – Large Var[bIV] as well as potentially severe finite-sample bias.

•   Remedy:
    – Not much – most of the discussion is about the condition,
       not what to do about it.
    – Use LIML? Requires a normality assumption. Probably not
       too restrictive. (Text, 375-77)
Weak Instruments: Detection and Remedies
•   Symptom: The valid condition, plim(Z’ε/T ) zero, is close to being
    violated.

•   Detection of instrument exogeneity:
    – Endogenous IV’s: Inconsistency of bIV that makes it no
       better (and probably worse) than bOLS
    – Durbin-Wu-Hausman test: Endogeneity of the problem
       regressor(s)

•   Remedy:
    – Avoid endogenous weak instruments. (Also avoid weak IVs!)
    – General problem: It is not easy to find good instruments in
       theory and in practice.
Weak Instruments: Pre-testing
• If one uses an F-test to detect weak IVs as a pre-test procedure, then
the usual pre-testing issues arise for subsequent inference --see Hall,
Rudebusch, and Wilcox (1996).
Excessive Overidentification
•   Symptom: Z has many more columns than X
    – First stage of 2SLS almost reproduces X
    – Second stage of 2SLS becomes OLS, which is biased.
•   Detection: Visual – there is no test.
•   Remedy:
    – Fewer instruments? (Several methodological problems with
       this idea)
    – Jackknife estimation –see Ackerberg and Devereux, ReStat,
       (2009).
