Limited Dependent Variable Models
    Limited dependent variables typically are
   (i) qualitative dependent variables;
   (ii) dependent variables having limited support.

    Models for such data are often derived from latent
    variable models.

 1. Probabilistic Choice Models

1.1 Conditional Logit (D. McFadden, R.D. Luce)

    Suppose a decision maker faces m discrete choice alternatives.

    Let Yj⋆ = indirect conditional utility of jth alternative; latent, i.e. unobserved by econometrician.

    Econometrician observes:
          Yj = 1 if alternative j is chosen;
             = 0 if alternative j is not chosen;
             = 1{Yj⋆ = max{Y1⋆, · · · , Ym⋆}} .




    Notation: Binary variable 1{A} takes value 1 if event
    A occurs, and zero otherwise.

    Assume: no ties between alternatives.
Consider latent variable model:
    Yj⋆ = V (xj , θ) + ϵj , j = 1, · · · , m
     xj = attributes of alternative j
V (·, ·) = indirect utility function,
           known up to parameter vector θ
      ϵj = residual, variation in tastes, perceptions,
            unobserved by econometrician.

Assumption about {ϵj , j = 1, · · · , m}: ϵ1 , · · · , ϵm are
i.i.d. with type 1 extreme value distribution, i.e.
with CDF
             F (ϵ) = exp(− exp(−ϵ)), ϵ ∈ R,
and with pdf
               f (ϵ) = exp(−ϵ − exp(−ϵ)).

Lemma: Under above model assumptions,
            Pr(Yi = 1|X) = exp(Vi) / ∑_{j=1}^m exp(Vj),

where X = [x1 , · · · , xm ] and Vj = V (xj , θ), j = 1, · · · , m.



Proof:

   Notice first,
             Yi = 1 ⇔ Yi⋆ = max{Y1⋆, · · · , Ym⋆}
                    ⇔ Vi + ϵi > Vj + ϵj , ∀j ≠ i
                    ⇔ ϵj < ϵi + Vi − Vj , ∀j ≠ i
    ⇒ Pr(Yi = 1|X) = Pr(ϵj < ϵi + Vi − Vj , ∀j ≠ i)
        (by i.i.d.) = ∫_R ∏_{j≠i} F(ϵi + Vi − Vj) f(ϵi) dϵi.




Consider the integrand:
      ∏_{j≠i} F(ϵi + Vi − Vj) f(ϵi)
   = ∏_{j≠i} exp(− exp(−ϵi − Vi + Vj)) · exp(−ϵi − exp(−ϵi))
   = ∏_{j≠i} exp(− exp(−ϵi − Vi) exp(Vj)) · exp(−ϵi − exp(−ϵi))
   = exp(−ϵi − exp(−ϵi)) exp(− exp(−ϵi − Vi) ∑_{j≠i} exp(Vj))
   = exp(−ϵi − exp(−ϵi)) exp(− exp(−ϵi) ∑_{j≠i} exp(Vj)/exp(Vi))
   = exp(−ϵi − exp(−ϵi) [1 + ∑_{j≠i} exp(Vj)/exp(Vi)])
   = exp(−ϵi − exp(−ϵi) ∑_{j=1}^m exp(Vj)/exp(Vi)).

Let λi = ln[∑_{j=1}^m exp(Vj)/exp(Vi)], so that
      exp(λi)  = ∑_{j=1}^m exp(Vj)/exp(Vi),
      exp(−λi) = exp(Vi) / ∑_{j=1}^m exp(Vj).
Then,
   Pr(Yi = 1|X) = ∫_R exp(−ϵi − exp(−ϵi + λi)) dϵi
                = exp(−λi) ∫_R exp(−ϵ̃i − exp(−ϵ̃i)) dϵ̃i,
                  where ϵ̃i = ϵi − λi ; the integrand in ϵ̃i is the
                  type 1 extreme value pdf, so the integral is one,
                = exp(−λi)
                = exp(Vi) / ∑_{j=1}^m exp(Vj).




Typical specification: conditional indirect utility linear in attributes, so that
   Pr(Yi = 1|X) = exp(x′i θ) / ∑_{j=1}^m exp(x′j θ),  i = 1, · · · , m.
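
A quick numerical check of the lemma (a minimal Python sketch; the utilities below are made up for illustration): the closed-form logit probabilities should match Monte Carlo frequencies of argmax_j {Vj + ϵj} under i.i.d. type 1 extreme value draws.

    import numpy as np

    def conditional_logit_probs(V):
        # Pr(Yi = 1 | X) = exp(Vi) / sum_j exp(Vj), computed as a stable softmax.
        e = np.exp(V - V.max())
        return e / e.sum()

    rng = np.random.default_rng(0)
    V = np.array([1.0, 0.0, -0.5])                 # hypothetical Vj = xj' theta
    eps = rng.gumbel(size=(200_000, 3))            # type 1 extreme value draws
    sim = np.bincount(np.argmax(V + eps, axis=1), minlength=3) / 200_000
    print(conditional_logit_probs(V))              # closed form
    print(sim)                                     # simulated frequencies agree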




Primary Features and Limitations of the Conditional Logit Model

(i) Independence of Irrelevant Alternatives (IIA) property: The "odds ratio" for choice alternatives i and j is
          Pr(Yi = 1|X) / Pr(Yj = 1|X) = exp(Vi)/exp(Vj),  i, j = 1, · · · , m;
    i.e. independent of (a) alternatives other than i and j, and (b) the total number m of alternatives.

(ii) The IIA property is inappropriate in many applications in which some choice alternatives are similar, or more closely related than others. Example: "Red bus, blue bus" problem (first pointed out by G. Debreu) - Suppose there are 3 transport options, (option 1) red bus, (option 2) blue bus, and (option 3) car, and travellers do not care about bus color and are indifferent between car and bus. Then, one expects
         Pr(red bus|bus) = Pr(blue bus|bus) = 1/2
                          Pr(bus) = Pr(car) = 1/2
              Pr(red bus) = Pr(blue bus) = 1/4.

Hence, the odds of red bus vs. car are 1:1 if blue busses are not present, and 1:2 if blue busses are present.

This is in contrast to the IIA property implied by the conditional logit model, which applied in this setting would imply Pr(red bus) = Pr(blue bus) = Pr(car) = 1/3.

The problematic implication of the conditional logit
model here: Model implicitly assumes that all three
choice alternatives are independent, conditional on
attributes (suppressed in above notation), while red
and blue busses are perceived as similar (colors do
not matter to travelers) and therefore cannot be
considered independent.

An appropriate model might hierarchically nest the choices: first bus vs. car, and second, conditional on bus, red vs. blue. In choice situations such as this, the conditional logit model predicts a joint probability for bus (2/3) which is higher than the true probability of choosing bus (1/2).




(iii) Consider the change in demand for i in response to a change in an element of xj (e.g. increase in price xjl , the lth element of xj - the cross-price effect of j on i). Let Vk = x′k θ, k = 1, · · · , m. Then,

      ∂Pr(Yi = 1|X)/∂xjl = −[exp(x′i θ) exp(x′j θ) / (∑_{k=1}^m exp(x′k θ))²] θl
                         = −Pr(Yi = 1|X) Pr(Yj = 1|X) θl ,

      where θl ≤ 0 (conditional indirect utility is non-increasing in price).

      Therefore, the cross-price elasticity of demand for alternative i with respect to the price of alternative j is

          ηij := [∂Pr(Yi = 1|X)/∂xjl] · xjl / Pr(Yi = 1|X)
               = −xjl θl Pr(Yj = 1|X) > 0,

      which is seen to be independent of (and hence identical across all) i and proportional to Pr(Yj = 1|X).

      Lost demand for alternative j is re-distributed in equal proportions to all other alternatives, regardless of their proximity to j in the attribute or characteristics space.

      In the "red bus, blue bus" example, for instance, the conditional logit model would imply that a reduction in the frequency of blue busses leads to as many people switching to car as to red busses.
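
To see the identical cross elasticities numerically, a minimal finite-difference sketch (the attribute matrix X and θ below are hypothetical):

    import numpy as np

    def probs(X, theta):
        e = np.exp(X @ theta - (X @ theta).max())   # conditional logit probabilities
        return e / e.sum()

    X = np.array([[1.0, 2.0], [1.5, 3.0], [0.8, 1.0]])
    theta = np.array([-1.0, 0.5])                   # theta_1 < 0: price coefficient
    j, l, h = 1, 0, 1e-6                            # perturb price (l = 0) of alternative j = 1

    p0 = probs(X, theta)
    Xh = X.copy(); Xh[j, l] += h
    eta = (probs(Xh, theta) - p0) / h * X[j, l] / p0
    print(eta)   # entries i != j all equal -x_jl * theta_l * Pr_j; entry i = j differs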

(iv) The conditional logit model is conditional on the choice set, i.e. all substitution occurs within the set of the m given alternatives; there is no "outside option" (such as not consuming any of the alternatives).

    Hence, the model implies no change in demand (or
    no lost demand) in response to an increase in all
    prices by the same proportion.

(v) Nonetheless:
     – Due to its convenience, model is widely applied, esp. in microeconometric demand analysis and in empirical Industrial Organization (cf. Berry (1994)).
     – IIA problems avoided when coefficients θ are considered random (so-called Mixed MNL, cf. McFadden - Train; application in empirical IO: Berry, Levinsohn and Pakes (1995)).




1.2 Multinomial Probit

   This model has the advantage that it overcomes
   the limitations of the IIA property imposed by the
   conditional logit model, but it is computationally
   much more demanding in the case of large m.

   Latent model as before:
            Yi⋆ = V (xi; θ) + ϵi , i = 1, · · · , m
             Yi = 1{Yi⋆ = max{Y1⋆, · · · , Ym⋆}}
              ϵ = (ϵ1 , · · · , ϵm )′ ∼ N (0, Σ),
   where Σ is m × m p.d.s., and non-zero off-diagonal elements allow correlations between alternatives, which can be interpreted as arising from unobserved (by econometrician) attributes. Then,
      Pr(Yi = 1|X) = Pr(Yi⋆ > Yj⋆ , ∀j ≠ i)
                   = Pr(ϵj − ϵi < Vi − Vj , ∀j ≠ i),
   which involves m − 1 comparisons, i.e. an (m − 1)-dimensional integral.

   For m ≥ 4, computing such integrals is very costly and no analytical solution exists, although such integrals can be approximated using simulation methods, as sketched below.
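
A crude frequency simulator just draws ϵ ∼ N(0, Σ) and counts how often each alternative attains the maximal utility. A minimal sketch (V and Σ are hypothetical; applied work typically uses smoother simulators such as GHK):

    import numpy as np

    def mnp_probs_mc(V, Sigma, n_draws=100_000, seed=0):
        # Frequency simulator for multinomial probit choice probabilities:
        # V: (m,) systematic utilities; Sigma: (m, m) covariance of the shocks.
        rng = np.random.default_rng(seed)
        eps = rng.multivariate_normal(np.zeros(len(V)), Sigma, size=n_draws)
        winners = np.argmax(V + eps, axis=1)          # chosen alternative per draw
        return np.bincount(winners, minlength=len(V)) / n_draws

    # Hypothetical example: alternatives 1 and 2 highly correlated (two bus options).
    V = np.array([0.5, 0.5, 0.0])
    Sigma = np.array([[1.0, 0.9, 0.0],
                      [0.9, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
    print(mnp_probs_mc(V, Sigma))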



1.3 Nested Multinomial Logit Model (NMNL, McFadden)

   This model also to some extent overcomes IIA restrictions and is tractable for large-dimensional (large m) problems, but it requires a hierarchical nesting structure (which amounts to a testable assumption).

   Example: Residential heating systems - level 1: room vs. central heating system; level 2: electric, gas, oil, photo-cellular system; transport - level 1: bus vs. car; level 2 (in 'bus' nest): red, blue bus.

   Let Yij⋆ = Vij + ϵij denote the conditional indirect utility of alternative j (on the 2nd, bottom level) in nest i (1st, top level), for i = 1, · · · , c, and j = 1, · · · , Ni ; again, assume that ϵij are i.i.d. type 1 extreme value distributed r.v.s.




Then, in abbreviated notation,
      Prij  = Prj|i · Pri
      Prj|i = exp(Vij) / ∑_{m=1}^{Ni} exp(Vim)
      Pri   = ∑_{j=1}^{Ni} exp(Vij) / ∑_{m=1}^{c} ∑_{n=1}^{Nm} exp(Vmn)
      Prij  = exp(Vij) / ∑_{m=1}^{c} ∑_{n=1}^{Nm} exp(Vmn),
i.e. the NMNL has a conditional logit structure at the bottom level within nests.

So, NMNL exhibits IIA within nests, but not across nests.

Define the inclusive value for nest i,
      Ii = ln(∑_{j=1}^{Ni} exp(Vij)),  i = 1, · · · , c.

Then,
      Prj|i = exp(Vij) / exp(Ii)
      Pri   = exp(Ii) / ∑_{m=1}^{c} exp(Im).
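
In code, the inclusive values are within-nest log-sum-exps; a minimal sketch (the nest structure and utilities are hypothetical):

    import numpy as np
    from scipy.special import logsumexp

    def nmnl_probs(V_nests):
        # V_nests[i][j] = V_ij for alternative j in nest i.
        I = np.array([logsumexp(V) for V in V_nests])       # inclusive values I_i
        pr_nest = np.exp(I - logsumexp(I))                  # Pr_i
        pr_within = [np.exp(V - Ii) for V, Ii in zip(V_nests, I)]  # Pr_{j|i}
        return pr_nest, pr_within

    # Transport example: nest 1 = {red bus, blue bus}, nest 2 = {car}.
    pr_nest, pr_within = nmnl_probs([np.zeros(2), np.zeros(1)])
    print(pr_nest, pr_within)   # nests get (2/3, 1/3); within-bus split is (1/2, 1/2)

Note that the bus nest still receives probability 2/3 here: with the unit coefficient on Ii these formulas collapse to the conditional logit, which is exactly the restriction relaxed by the generalized model below.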

Estimation

(i) ML, using standard techniques (Newton-Raphson, Method of Scoring or BHHH Algorithm); in the case of Multinomial Probit with the proviso that the choice probabilities can be computed.

(ii) Sequential methods

    Consider an extension of NMNL:
          Prj|i = exp(x′ij β) / exp(Ii)
          Pri   = exp(z′i δ + Ii) / ∑_{m=1}^{c} exp(z′m δ + Im),
    where the zi's vary across, but not within, nests i = 1, · · · , c.

    Estimate β from conditional choice models Prj|i ; given the ensuing estimate β̂, estimate inclusive values
          Îi = ln(∑_{j=1}^{Ni} exp(x′ij β̂)),
    and use these and marginal choice probabilities for the nests, Pri , to estimate δ.
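
A minimal simulate-and-estimate sketch of this sequential procedure (everything here is hypothetical: two nests with two alternatives each, one within-nest attribute x, one nest-level attribute z, and choice sets shared by all agents):

    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import logsumexp

    rng = np.random.default_rng(1)
    x = rng.normal(size=(2, 2, 1))                  # x[i, j, :]: within-nest attributes
    z = rng.normal(size=(2, 1))                     # z[i, :]: nest-level attributes
    beta0, delta0, N = np.array([1.0]), np.array([0.5]), 5_000

    # Simulate nest and alternative choices from the NMNL formulas above.
    I0 = logsumexp(x @ beta0, axis=1)               # true inclusive values
    w0 = z @ delta0 + I0
    nests = rng.choice(2, size=N, p=np.exp(w0 - logsumexp(w0)))
    alts = np.array([rng.choice(2, p=np.exp(x[i] @ beta0 - logsumexp(x[i] @ beta0)))
                     for i in nests])

    # Step 1: estimate beta from the within-nest conditional logit Pr_{j|i}.
    def nll_beta(b):
        v = x @ b
        logp = v - logsumexp(v, axis=1, keepdims=True)
        return -logp[nests, alts].sum()
    beta_hat = minimize(nll_beta, np.zeros(1), method="BFGS").x

    # Step 2: plug in estimated inclusive values, estimate delta from nest choices.
    I_hat = logsumexp(x @ beta_hat, axis=1)
    def nll_delta(d):
        w = z @ d + I_hat
        return -(w - logsumexp(w))[nests].sum()
    delta_hat = minimize(nll_delta, np.zeros(1), method="BFGS").x
    print(beta_hat, delta_hat)                      # close to beta0 and delta0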


Extension: Generalized Extreme Value Model
   The NMNL model above implicitly imposes a coefficient on the inclusive value which is restricted to one. The general NMNL model relaxes this to
         Pri = exp(z′i δ + (1 − σ)Ii) / ∑_{m=1}^{c} exp(z′m δ + (1 − σ)Im),
   where σ ∈ [0, 1] captures the possibility that there may be some dependence between the choices.

   Such models arise from the Generalized Extreme
   Value (GEV) Distribution
    F (ϵ1 , · · · , ϵn ) = exp (−G(exp(−ϵ1 ), · · · , exp(−ϵn ))) ,
    where G(Y1 , · · · , Yn) is a non-negative function, homogeneous of degree 1, of (Y1 , · · · , Yn ) ≥ 0.

   Further restrictions are that limYi→+∞ G(Y1 , · · · , Yn) =
   ∞ for any i = 1, · · · , n, and that any kth partial
   derivative is non-negative for k odd and non-positive
   for k even.

    The special case
          G(Y1 , · · · , Yn) = ∑_{i=1}^{n} Yi
    yields the MNL model (independent extreme value distribution).

A bivariate example of a widely used GEV Distribution is
      G(Y1 , Y2) = (Y1^{1/(1−σ)} + Y2^{1/(1−σ)})^{1−σ}.

Here, σ is approximately equal to the correlation between Y1 and Y2 . The implied binomial choice probabilities are
      Pri = exp(Vi/(1 − σ)) / ∑_{k=1}^{2} exp(Vk/(1 − σ)),   (⋆)
where Vi is the indirect conditional utility of alternative i = 1, 2.

If σ → 0, then these reduce to MNL choice probabilities; if σ → 1, then the indirect utilities of the alternatives are (almost) perfectly correlated, and with V1 = V2 the induced choice probabilities amount to pure chance, i.e. Pr1 = Pr2 = 1/2.




Such GEV specifications allow one to avoid the IIA problems, as in the "red bus, blue bus" problem. For example, with 3 alternatives, define
      G(Y1 , Y2 , Y3) = Y1 + (Y2^{1/(1−σ)} + Y3^{1/(1−σ)})^{1−σ}.

The implied choice probabilities can be shown to be
      Pr1 = Y1 / G(Y1 , Y2 , Y3),
      Pr2 = Y2^{1/(1−σ)} (Y2^{1/(1−σ)} + Y3^{1/(1−σ)})^{−σ} / G(Y1 , Y2 , Y3),
with an analogous expression for Pr3 .

If only alternatives 1 and 2 are available (Y3 = 0),
then the model reduces to the standard binomial
logit.

If only alternatives 2 and 3 are available, then the implied binomial choice probabilities are as in (⋆).

If all three alternatives are available, the odds ratio
of 1 vs. 2 depends on the indirect utility of 3 (i.e.
on Y3 = exp(V3 )).
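
A numerical sketch of this three-alternative GEV (labeling 1 as car and 2, 3 as the two busses is an illustrative choice): as σ → 1 the two busses become near-perfect substitutes and the probabilities approach (1/2, 1/4, 1/4), rather than the (1/3, 1/3, 1/3) implied by the conditional logit.

    import numpy as np

    def gev_probs(V, sigma):
        # Pr_i = Y_i dG/dY_i / G for G(Y1,Y2,Y3) = Y1 + (Y2^{1/(1-s)} + Y3^{1/(1-s)})^{1-s},
        # with Y_i = exp(V_i); alternatives 2 and 3 are nested together.
        Y, s = np.exp(V), sigma
        nest = Y[1] ** (1 / (1 - s)) + Y[2] ** (1 / (1 - s))
        G = Y[0] + nest ** (1 - s)
        p2 = Y[1] ** (1 / (1 - s)) * nest ** (-s) / G
        p3 = Y[2] ** (1 / (1 - s)) * nest ** (-s) / G
        return np.array([Y[0] / G, p2, p3])

    V = np.zeros(3)                        # car, red bus, blue bus equally attractive
    print(gev_probs(V, sigma=0.0))         # sigma = 0: plain MNL, each 1/3
    print(gev_probs(V, sigma=0.99))        # approx. [1/2, 1/4, 1/4]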




2. Models for Censored and Truncated Data

(i) Censoring: occurs if values of a r.v. in a certain
    subset of its support are transformed into a single
    value.

(ii) Truncation: occurs if sample data are drawn from a certain subset of the support of the population distribution; e.g. yn⋆ latent data, and observe yn = yn⋆ if yn⋆ ≥ 0; here, unlike in (i), whenever yn⋆ < 0, no observation is recorded (while in (i), zero is recorded).

    Note: data can be censored, truncated, or truncated and censored.




2.1 Models for Truncated Data


   Observe yn = yn⋆ if yn⋆ > c, for some constant c. Let CDF of i.i.d. Yn⋆ be FY⋆(yn⋆), fY⋆(yn⋆) its pdf, and c ∈ supp(Y⋆). Then, for y > c, the CDF of observation Yn is

      FY(y) = FY⋆(y | Y⋆ is observed)
            = Pr(c < Y⋆ < y) / Pr(Y⋆ > c)
            = [FY⋆(y) − FY⋆(c)] / (1 − FY⋆(c)),

   and its pdf is fY(y) = fY⋆(y) / (1 − FY⋆(c)).


   Example: Yn⋆ ∼ i.i.d. N(µ, σ²); observe Yn = Yn⋆ if Yn⋆ > c. Letting ϕ(x) = (1/√(2π)) exp(−x²/2), the pdf of N(0, 1), and Φ(x) = ∫_{−∞}^{x} ϕ(z) dz, it follows that

      fY(y) = (1/σ) ϕ((y − µ)/σ) / [1 − Φ((c − µ)/σ)].
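
A quick simulation check of this truncated density (the values of µ, σ, c below are arbitrary):

    import numpy as np
    from scipy.stats import norm

    mu, sigma, c = 1.0, 2.0, 0.0

    def truncated_pdf(y):
        # f_Y(y) = (1/sigma) phi((y - mu)/sigma) / (1 - Phi((c - mu)/sigma)), y > c
        return norm.pdf(y, mu, sigma) / norm.sf(c, mu, sigma)

    rng = np.random.default_rng(0)
    kept = rng.normal(mu, sigma, 1_000_000)
    kept = kept[kept > c]                             # truncated sampling
    hist, edges = np.histogram(kept, bins=50, density=True)
    mid = 0.5 * (edges[:-1] + edges[1:])
    print(np.max(np.abs(hist - truncated_pdf(mid))))  # small discrepancy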

Truncated Regression Model
   Selection on the basis of the response variable.

   Example: observe (log) earnings of low income families; want to estimate general (log) earnings equation in the population (Hausman and Wise, 1977):
         yn⋆ = x′n β0 + ϵn ,  ϵn|xn ∼ N(0, σ0²);
   "low income family" defined by yn⋆ < c, for some known constant c; observe only family incomes for which yn⋆ < c:
         yn⋆ < c ⇔ ϵn < c − x′n β0 .
   Hence,
         fY(yn|xn; β0 , σ0²) = (1/σ0) ϕ((yn − x′n β0)/σ0) / Φ((c − x′n β0)/σ0),

   and therefore, the average log-likelihood is
      LN = LN(β, σ²; y, X, c)
         = (1/N) ∑_{n=1}^{N} ln[ (1/σ) ϕ((yn − x′n β)/σ) / Φ((c − x′n β)/σ) ]
         ∝ −(1/2) ln(σ²) − (1/(2Nσ²)) ∑_{n=1}^{N} (yn − x′n β)²
           − (1/N) ∑_{n=1}^{N} ln[ Φ((c − x′n β)/σ) ].
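
A minimal ML sketch of this truncated regression on simulated data (all values hypothetical); an OLS fit on the truncated sample is printed for contrast with the discussion that follows:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    N, c, beta0, sigma0 = 50_000, 0.0, np.array([0.5, -1.0]), 1.0
    X = np.column_stack([np.ones(N), rng.normal(size=N)])
    y_star = X @ beta0 + sigma0 * rng.normal(size=N)
    keep = y_star < c                        # only "low income" units are sampled
    y, Xs = y_star[keep], X[keep]

    def neg_loglik(params):
        beta, sigma = params[:-1], np.exp(params[-1])    # sigma > 0 via reparameterization
        ll = (norm.logpdf((y - Xs @ beta) / sigma) - np.log(sigma)
              - norm.logcdf((c - Xs @ beta) / sigma))
        return -ll.mean()

    res = minimize(neg_loglik, np.zeros(3), method="BFGS")
    print(res.x[:-1], np.exp(res.x[-1]))           # ML: close to beta0 and sigma0
    print(np.linalg.lstsq(Xs, y, rcond=None)[0])   # OLS on truncated sample: biased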

Note:
 – The first two summands correspond to the aver-
   age log-likelihood of the normal linear regression
   model;
 – the third (last) summand is a correction term
   that accounts for the truncation of the sample.
   This can be exploited in estimation (see below).

Under the usual regularity conditions (as discussed under the heading of ML Theory), the MLE is √N-CAN and efficient.




If β0 were estimated by OLS, instead of ML, what would be the implication for the resulting estimator?

Just saw: ML ≠ OLS estimator, so OLS estimator cannot be BLUE. Which of the Gauss-Markov assumptions is/are not satisfied?

Need to consider the conditional mean and conditional variance of the observed data. The following lemma is useful.

Lemma: If X ∼ N(µ, σ²) and c = const., then
      E[X|X < c] = µ − σ ϕ((c−µ)/σ) / Φ((c−µ)/σ),
    var(X|X < c) = σ² [1 − δ((c−µ)/σ)],
where
      δ((c−µ)/σ) = [ϕ((c−µ)/σ)/Φ((c−µ)/σ)] · [(c−µ)/σ + ϕ((c−µ)/σ)/Φ((c−µ)/σ)].
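
The formulas of the lemma are easy to verify by simulation (µ, σ, c arbitrary):

    import numpy as np
    from scipy.stats import norm

    mu, sigma, c = 1.0, 2.0, 0.5
    rng = np.random.default_rng(0)
    x = rng.normal(mu, sigma, 2_000_000)
    x = x[x < c]                                   # condition on X < c

    a = (c - mu) / sigma
    lam = norm.pdf(a) / norm.cdf(a)
    delta = lam * (a + lam)
    print(x.mean(), mu - sigma * lam)              # E[X | X < c]
    print(x.var(), sigma**2 * (1 - delta))         # var(X | X < c)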




The following auxiliary result will be useful (here and elsewhere).

The derivative of the normal pdf with respect to x is
   d/dx [ (1/σ) ϕ((x−µ)/σ) ] = d/dx [ (1/√(2πσ²)) exp(−(x−µ)²/(2σ²)) ]
                             = −(1/√(2π)) ((x−µ)/σ³) exp(−(x−µ)²/(2σ²))
                             = −((x−µ)/σ²) (1/σ) ϕ((x−µ)/σ).




Proof of the Lemma:
   E[X|X < c] = ∫_{−∞}^{c} x · (1/σ) ϕ((x−µ)/σ) / Φ((c−µ)/σ) dx
              = ∫_{−∞}^{c} (µ + (x − µ)) · (1/σ) ϕ((x−µ)/σ) / Φ((c−µ)/σ) dx
              = µ − σ² ∫_{−∞}^{c} d/dx [ (1/σ) ϕ((x−µ)/σ) ] / Φ((c−µ)/σ) dx
                (by the auxiliary result above)
              = µ − σ² (1/σ) ϕ((c−µ)/σ) / Φ((c−µ)/σ)
              = µ − σ ϕ((c−µ)/σ) / Φ((c−µ)/σ).

The expression for the conditional variance can be derived in an analogous fashion (cp. Greene (2008), 6th edition, p.866).

Applying the Lemma to the truncated regression model:
   E[yn|xn, observed] = x′n β0 + E[ϵn | ϵn < c − x′n β0]
                      = x′n β0 − σ0 ϕ((c − x′n β0)/σ0) / Φ((c − x′n β0)/σ0).
 Hence,
 (i) the conditional mean of yn , given xn , for the observed data is not linear in the estimable parameter β0 ;

(ii) the "inverse Mills ratio" ϕ((c − x′n β0)/σ0) / Φ((c − x′n β0)/σ0) amounts to an omitted variable in the OLS linear regression equation, and so OLS suffers from omitted-variable bias and is inconsistent; essentially, the OLS linear regression equation is a mis-specified model, because the correct regression equation is nonlinear in β0 ;

(iii) the conditional variance of yn , given xn , for the observed data is not homoskedastic; in fact,
          var(yn|xn, observed) = σ0² [1 − δ((c − x′n β0)/σ0)],
      where
          δ((c − x′n β0)/σ0) = [ϕ((c − x′n β0)/σ0)/Φ((c − x′n β0)/σ0)]
                               · [(c − x′n β0)/σ0 + ϕ((c − x′n β0)/σ0)/Φ((c − x′n β0)/σ0)];
      i.e. this non-linear regression problem is heteroskedastic, the conditional variances are informative about β0 , and efficient estimators need to account for heteroskedasticity (which OLS does not).
2.2 Models for Censored Data
   Observe yn = yn⋆ 1{yn⋆ > 0}.

   Again, let FY⋆(y) be CDF of latent Y⋆, and let 0 ∈ supp(Yn⋆) for all n.

   Then, CDF of observation yn is
         FY(yn) = [FY⋆(0)]^{1{yn=0}} [FY⋆(yn)]^{1{yn>0}}.

   "Pdf" is a mixture of probability mass function (at zero) and pdf (when yn > 0):
         fY(yn) = [FY⋆(0)]^{1{yn=0}} [fY⋆(yn)]^{1{yn>0}}.

   Example: yn⋆ ∼ N(µ, σ²); observe yn = yn⋆ 1{yn⋆ > 0}, so that
         fY(yn) = [Φ(−µ/σ)]^{1{yn=0}} [(1/σ) ϕ((yn − µ)/σ)]^{1{yn>0}}.




Censored Regression Model
   Example: Willingness to pay for a resource; surveys typically record non-negative WTP.

   Let WTP yn⋆ be modelled as
         yn⋆ = x′n β0 + ϵn ,  ϵn|xn ∼ N(0, σ0²).

   Observe yn = 0 if yn⋆ ≤ 0 and yn = yn⋆ if yn⋆ > 0.

   Then,
         f(yn|xn; β0 , σ0²) = [Φ(−x′n β0/σ0)]^{1{yn=0}} [(1/σ0) ϕ((yn − x′n β0)/σ0)]^{1{yn>0}}.

   Assuming i.i.d. data, log-likelihood function is
      L(β, σ; y, X) = ∑_{n=1}^{N} 1{yn=0} ln Φ(−x′n β/σ)
                      + ∑_{n=1}^{N} 1{yn>0} ln[ (1/σ) ϕ((yn − x′n β)/σ) ]
                    = ∑_{n=1}^{N} [ 1{yn=0} ln Φ(−x′n β/σ) + 1{yn>0} ln(1 − Φ(−x′n β/σ)) ]
                      + ∑_{n=1}^{N} 1{yn>0} ln[ (1/σ) ϕ((yn − x′n β)/σ) / (1 − Φ(−x′n β/σ)) ].



 ⇒ Log-likelihood function is sum of
    (i) binomial probit log-likelihood (censored vs. non-
        censored);
   (ii) log-likelihood of truncated data (subsample of
        non-censored data).

    This suggests two approaches to estimation:
(1) ML: Under regularity conditions, MLE is √N-CAN and efficient; MLE can be obtained from both representations of the log-likelihood function (a minimal numerical sketch follows below).

(2) Two-step procedure (due to J. Heckman), exploiting decomposition of log-likelihood in (i) and (ii).
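
For approach (1), a minimal ML sketch of the censored-regression (Tobit) log-likelihood above, on simulated data (all values hypothetical):

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    N, beta0, sigma0 = 20_000, np.array([0.5, 1.0]), 1.0
    X = np.column_stack([np.ones(N), rng.normal(size=N)])
    y = np.maximum(X @ beta0 + sigma0 * rng.normal(size=N), 0.0)  # y = y* 1{y* > 0}
    cens = y == 0

    def neg_loglik(params):
        beta, sigma = params[:-1], np.exp(params[-1])     # sigma > 0
        xb = X @ beta
        ll = np.where(cens,
                      norm.logcdf(-xb / sigma),                       # censored part
                      norm.logpdf((y - xb) / sigma) - np.log(sigma))  # non-censored part
        return -ll.sum()

    res = minimize(neg_loglik, np.zeros(3), method="BFGS")
    print(res.x[:-1], np.exp(res.x[-1]))                  # close to beta0, sigma0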




Heckman 2-Step Procedure

(i) Estimate β0/σ0 = (β01/σ0 , · · · , β0k/σ0)′ using binomial probit on censored vs. non-censored; denote these ML estimates by (β/σ)ˆ.

(ii) Estimate the inverse Mills ratio by
          M̂n = ϕ(−x′n (β/σ)ˆ) / [1 − Φ(−x′n (β/σ)ˆ)],
    as an estimate of the conditional-mean term of the truncated data (non-censored subsample):
          E[yn|xn , yn > 0] = x′n β0 + σ0 ϕ(−x′n β0/σ0) / [1 − Φ(−x′n β0/σ0)],
    i.e. impute the missing variable in the OLS regression; then, in 2nd step, estimate β0 and σ0 by OLS (yn on xn and M̂n).
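
A sketch of the two steps on simulated data (the Mills ratio is written via the identity ϕ(−z)/(1 − Φ(−z)) = ϕ(z)/Φ(z); all numbers hypothetical):

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    N, beta0, sigma0 = 20_000, np.array([0.5, 1.0]), 1.0
    X = np.column_stack([np.ones(N), rng.normal(size=N)])
    y = np.maximum(X @ beta0 + sigma0 * rng.normal(size=N), 0.0)
    d = y > 0                                   # non-censored indicator

    # Step (i): probit of d on x estimates beta0/sigma0.
    def probit_nll(g):
        xg = X @ g
        return -np.where(d, norm.logcdf(xg), norm.logcdf(-xg)).sum()
    g_hat = minimize(probit_nll, np.zeros(2), method="BFGS").x

    # Step (ii): impute the Mills ratio, then OLS on the non-censored subsample.
    mills = norm.pdf(X[d] @ g_hat) / norm.cdf(X[d] @ g_hat)
    Z = np.column_stack([X[d], mills])
    print(np.linalg.lstsq(Z, y[d], rcond=None)[0])  # (beta0, sigma0) estimates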




Note:
 – This yields a √N-CAN estimator.
 – This estimator is inefficient (relative to MLE), because information about β0 and σ0 in conditional variance is not used.
 – Procedure is computationally attractive, because it uses canned estimation routines (probit and OLS; trade-off: computational ease vs. efficiency).
 – In censored and truncated regression models, conditional mean is nonlinear in β0 , even though latent variable model is linear. These models belong to large class of nonlinear regression models:
         E[yn|xn; θ] = g(xn , θ),
   for some (smooth) known function g(·, ·), and θ ∈ Rk unknown.




Incidental Truncation: Non-random Sample Selection, Self-Selection

 • General setup:
  (1) (latent) selection equation;
  (2) equation of (primary) interest.

    Classic examples:
   1. Female Labor Supply
      (1) wage equation: difference between market
      and reservation wage, as function of covariates;
      (2) hours worked equation: only observed if
      woman is working, i.e. whenever market wage
      exceeds reservation wage.
   2. Migration Models
      (1) net benefit from migrating;
      (2) income of migrants: only observed for mi-
      grants.




Formal setup

(1) (latent) selection equation:
         zn⋆ = w′n γ0 + un ;

(2) equation of interest:
         yn = x′n β0 + ϵn .

    Sampling rule: yn observed if zn⋆ > 0.

    Suppose
         (ϵn , un)′ ∼ i.i.d. N( 0 , [ σϵ²  ρσϵσu ;  ·  σu² ] ),
    where ρ = σϵu /(σϵ σu) ∈ (−1, 1).

    Then,
         E[yn|xn, observed] = x′n β0 + E[ϵn | zn⋆ > 0]
                            = x′n β0 + E[ϵn | un > −w′n γ0],
    i.e. need conditional distribution of ϵn , given un . Refer to handout.



Application of properties of conditional normal distribution yields
      ϵn|un ∼ N( ρ(σϵ/σu) un , σϵ²(1 − ρ²) ).

Following similar steps as in computing moments of truncated normal, use this conditional distribution to get moments of incidentally truncated normal. General result:

Lemma: Suppose
      (ϵ, u)′ ∼ i.i.d. N( (µϵ , µu)′ , [ σϵ²  ρσϵσu ;  ·  σu² ] ).

Then, for c a scalar constant,
      E[ϵ|u > c] = µϵ + ρσϵ ϕ((c − µu)/σu) / [1 − Φ((c − µu)/σu)],
    var(ϵ|u > c) = σϵ² [1 − ρ² δ((c − µu)/σu)],
where
          δ(z) = [ϕ(z)/(1 − Φ(z))] · [ϕ(z)/(1 − Φ(z)) − z].

Hence,
   (†)  E[yn|xn, observed] = x′n β0 + ρσϵ ϕ(−w′n γ0/σu) / [1 − Φ(−w′n γ0/σu)].
Conclusion: OLS applied to (2) yields biased and inconsistent estimates since the inverse Mills ratio is omitted; the inverse Mills ratio accounts for non-random sample selection, induced by (1).

Estimation:

Notice first:
 – σu² is not identified, only ratio γ0/σu is identifiable; impose σu² = 1;
 – σϵ² is not identified, only product ρσϵ is; impose σϵ² = 1.

Approaches:
(i) ML: efficient, but computationally burdensome.
(ii) Heckman 2-step procedure:
  (1) binomial probit to estimate γ0 ; impute inverse Mills ratio for selected observations:
          M̂n = ϕ(−w′n γ̂) / [1 − Φ(−w′n γ̂)].
  (2) OLS in (†) for selected observations: For these, regress yn on xn and M̂n .
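
A compact simulate-and-estimate sketch of this two-step ("Heckit") procedure, with σu = 1 imposed as above (data-generating values hypothetical):

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    rng = np.random.default_rng(2)
    N, gamma0, beta0 = 20_000, np.array([0.5, 1.0]), np.array([1.0, -0.5])
    rho, sig_eps = 0.6, 1.0
    W = np.column_stack([np.ones(N), rng.normal(size=N)])
    X = np.column_stack([np.ones(N), rng.normal(size=N)])
    u = rng.normal(size=N)
    eps = rho * sig_eps * u + sig_eps * np.sqrt(1 - rho**2) * rng.normal(size=N)
    sel = W @ gamma0 + u > 0                     # z*_n > 0: y_n observed
    y = X @ beta0 + eps

    # (1) Binomial probit for the selection equation.
    def probit_nll(g):
        wg = W @ g
        return -np.where(sel, norm.logcdf(wg), norm.logcdf(-wg)).sum()
    g_hat = minimize(probit_nll, np.zeros(2), method="BFGS").x

    # (2) OLS of y on x and the imputed Mills ratio, selected sample only
    #     (phi(-w'g)/(1 - Phi(-w'g)) = phi(w'g)/Phi(w'g)).
    mills = norm.pdf(W[sel] @ g_hat) / norm.cdf(W[sel] @ g_hat)
    Z = np.column_stack([X[sel], mills])
    print(np.linalg.lstsq(Z, y[sel], rcond=None)[0])  # (beta0, rho*sig_eps) estimates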




				