Limited Dependent Variable Models

Limited dependent variables typically are
(i) qualitative dependent variables;
(ii) dependent variables having limited support.

Models for such data are often derived from latent
variable models.

1. Probabilistic Choice Models

1.1 Conditional Logit (D. McFadden, R.D. Luce)

Suppose a decision maker faces m discrete choice alternatives.

Let Yj⋆ = indirect conditional utility of the jth alternative; latent, i.e. unobserved by the econometrician.

The econometrician observes:
Yj = 1 if alternative j is chosen;
   = 0 if alternative j is not chosen;
i.e. Yj = 1{Yj⋆ = max{Y1⋆, · · · , Ym⋆}}.

Notation: Binary variable 1{A} takes value 1 if event
A occurs, and zero otherwise.

Assume: no ties between alternatives.
Consider latent variable model:
Yj⋆ = V (xj , θ) + ϵj , j = 1, · · · , m
xj = attributes of alternative j
V (·, ·) = indirect utility function,
known up to parameter vector θ
ϵj = residual, variation in tastes, perceptions,
unobserved by econometrician.

Assumption about {ϵj , j = 1, · · · , m}: ϵ1 , · · · , ϵm are
i.i.d. with type 1 extreme value distribution, i.e.
with CDF
F (ϵ) = exp(− exp(−ϵ)), ϵ ∈ R,
and with pdf
f (ϵ) = exp(−ϵ − exp(−ϵ)).

Lemma: Under the above model assumptions,

Pr(Yi = 1|X) = exp(Vi) / Σ_{j=1}^{m} exp(Vj),

where X = [x1, · · · , xm] and Vj = V(xj, θ), j = 1, · · · , m.

Proof:

Notice first,

Yi = 1 ⇔ Yi⋆ = max{Y1⋆, · · · , Ym⋆}
       ⇔ Vi + ϵi > Vj + ϵj, ∀j ≠ i
       ⇔ ϵj < ϵi + Vi − Vj, ∀j ≠ i
⇒ Pr(Yi = 1|X) = Pr(ϵj < ϵi + Vi − Vj, ∀j ≠ i)
(by i.i.d.)    = ∫_R Π_{j≠i} F(ϵi + Vi − Vj) f(ϵi) dϵi.

Consider the integrand:

Π_{j≠i} F(ϵi + Vi − Vj) · f(ϵi)
= Π_{j≠i} exp(−exp(−ϵi − Vi + Vj)) · exp(−ϵi − exp(−ϵi))
= Π_{j≠i} exp(−exp(−ϵi − Vi) exp(Vj)) · exp(−ϵi − exp(−ϵi))
= exp(−ϵi − exp(−ϵi)) · exp(−exp(−ϵi − Vi) Σ_{j≠i} exp(Vj))
= exp(−ϵi − exp(−ϵi)) · exp(−exp(−ϵi) Σ_{j≠i} exp(Vj)/exp(Vi))
= exp(−ϵi − exp(−ϵi) [1 + Σ_{j≠i} exp(Vj)/exp(Vi)])
= exp(−ϵi − exp(−ϵi) Σ_{j=1}^{m} exp(Vj)/exp(Vi)).

Let λi = ln[ Σ_{j=1}^{m} exp(Vj)/exp(Vi) ], so that

exp(λi) = Σ_{j=1}^{m} exp(Vj)/exp(Vi),
exp(−λi) = exp(Vi) / Σ_{j=1}^{m} exp(Vj).

Then,

Pr(Yi = 1|X) = ∫_R exp(−ϵi − exp(−ϵi + λi)) dϵi
= exp(−λi) ∫_R exp(−ϵ̃i − exp(−ϵ̃i)) dϵ̃i,
  where ϵ̃i = ϵi − λi is again type 1 extreme value distributed (shifted by λi), so the integral equals 1,
= exp(−λi)
= exp(Vi) / Σ_{j=1}^{m} exp(Vj). ∎

Typical specification: conditional indirect utility linear in attributes, so that

Pr(Yi = 1|X) = exp(x′i θ) / Σ_{j=1}^{m} exp(x′j θ),   i = 1, · · · , m.

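As a quick numerical illustration, the choice probabilities above are just a softmax over the index values x′i θ; a minimal sketch (the attribute values and the coefficient vector are made-up numbers, not from the text):

```python
import math

def conditional_logit_probs(X, theta):
    """Conditional logit probabilities with linear-in-attributes utility
    V_j = x_j' theta: Pr(Y_i = 1 | X) = exp(V_i) / sum_j exp(V_j)."""
    V = [sum(xk * tk for xk, tk in zip(x, theta)) for x in X]
    vmax = max(V)                      # subtract max for numerical stability
    w = [math.exp(v - vmax) for v in V]
    s = sum(w)
    return [wi / s for wi in w]

# Three alternatives described by (price, quality); illustrative numbers:
X = [(1.0, 2.0), (1.5, 3.0), (2.0, 1.0)]
theta = (-1.0, 0.5)                    # negative price coefficient
p = conditional_logit_probs(X, theta)
print(p, sum(p))
```

Here alternatives 1 and 2 have equal index value, so they receive equal probability.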
Primary Features and Limitations of the Conditional Logit Model

(i) Independence of Irrelevant Alternatives (IIA) property: The "odds ratio" for choice alternatives i and j is

Pr(Yi = 1|X) / Pr(Yj = 1|X) = exp(Vi) / exp(Vj),   i, j = 1, · · · , m;

i.e. independent of (a) alternatives other than i and j, and (b) the total number m of alternatives.

(ii) The IIA property is inappropriate in many applications in which some choice alternatives are similar, or more closely related than others. Example: the "red bus, blue bus" problem (first pointed out by G. Debreu). Suppose there are 3 transport options, (option 1) red bus, (option 2) blue bus, and (option 3) car, and travellers do not care about bus color and are indifferent between car and bus. Then, one expects

Pr(red bus|bus) = Pr(blue bus|bus) = 1/2,
Pr(bus) = Pr(car) = 1/2,
Pr(red bus) = Pr(blue bus) = 1/4.

Hence, the odds of red bus vs. car are 1:1 if blue busses are not present, and 1:2 if blue busses are present.

This is in contrast to the IIA property implied by the conditional logit model, which applied in this setting would imply Pr(red bus) = Pr(blue bus) = Pr(car) = 1/3.

The problematic implication of the conditional logit
model here: Model implicitly assumes that all three
choice alternatives are independent, conditional on
attributes (suppressed in above notation), while red
and blue busses are perceived as similar (colors do
not matter to travelers) and therefore cannot be
considered independent.

An appropriate model might hierarchically nest the choices: first bus vs. car, and second, conditional on bus, red vs. blue. In choice situations such as this, the conditional logit model predicts a joint probability for bus (2/3) which is higher than the true probability of choosing bus (1/2).

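The IIA prediction in the red bus, blue bus setting can be checked in a few lines; a sketch assuming travellers are indifferent, so all indirect utilities are equal:

```python
import math

def logit_probs(V):
    """MNL choice probabilities from indirect utilities V."""
    w = [math.exp(v) for v in V]
    s = sum(w)
    return [wi / s for wi in w]

# Red bus, blue bus, car with equal utilities: conditional logit assigns
# each alternative 1/3, rather than the intuitive (1/4, 1/4, 1/2).
p3 = logit_probs([0.0, 0.0, 0.0])
print(p3)

# Dropping the blue bus changes the red-bus/car odds from 1:2 to 1:1:
p2 = logit_probs([0.0, 0.0])
print(p2)
```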
(iii) Consider the change in demand for i in response to a change in an element of xj (e.g. an increase in price xjl, the lth element of xj): the cross-price effect of j on i. Let Vk = x′k θ, k = 1, · · · , m. Then,

∂Pr(Yi = 1|X)/∂xjl = −[ exp(x′i θ) exp(x′j θ) / (Σ_{k=1}^{m} exp(x′k θ))² ] θl
= −Pr(Yi = 1|X) Pr(Yj = 1|X) θl,

where θl ≤ 0 (conditional indirect utility is non-increasing in price).

Therefore, the cross-price elasticity of demand for alternative i with respect to the price of alternative j is

ηij := [∂Pr(Yi = 1|X)/∂xjl] · xjl / Pr(Yi = 1|X)
= −xjl θl Pr(Yj = 1|X) > 0,

which is seen to be independent of (and hence identical across all) i, and proportional to Pr(Yj = 1|X).

Lost demand for alternative j is re-distributed in
equal proportions to all other alternatives, regard-
less of their proximity to j in the attribute or char-
acteristics space.

In the “red bus, blue bus” example, for instance, the
conditional logit model would imply that a reduction
in the frequency of blue busses leads to as many
people switching to car as to red busses.

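The cross-price effect formula can be verified against a finite-difference derivative; a sketch with made-up attributes and tastes (the indices i, j, l and all numbers are illustrative):

```python
import math

def probs(X, theta):
    """MNL probabilities with linear-in-attributes utility."""
    V = [sum(a * b for a, b in zip(x, theta)) for x in X]
    m = max(V)
    w = [math.exp(v - m) for v in V]
    s = sum(w)
    return [wi / s for wi in w]

# Illustrative attributes (price, quality); theta[0] = price coefficient.
X = [[1.0, 2.0], [1.5, 1.0], [0.5, 0.5]]
theta = [-0.8, 0.6]
i, j, l = 0, 1, 0            # effect on alternative i of the price of j
p = probs(X, theta)

h = 1e-6                     # central finite difference in x_{jl}
Xp = [row[:] for row in X]; Xp[j][l] += h
Xm = [row[:] for row in X]; Xm[j][l] -= h
num_deriv = (probs(Xp, theta)[i] - probs(Xm, theta)[i]) / (2 * h)

analytic = -p[i] * p[j] * theta[l]   # -Pr_i Pr_j theta_l from the text
print(num_deriv, analytic)
```

With a negative price coefficient the cross-price effect is positive: a price increase for j shifts demand toward i.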
(iv) The conditional logit model is conditional on the
choice set, i.e. all substitution occurs within the
set of the m given alternatives; there is no “outside
option” (such as not consuming any of the alter-
natives).

Hence, the model implies no change in demand (or
no lost demand) in response to an increase in all
prices by the same proportion.

(v) Nonetheless:
– Due to its convenience, the model is widely applied, esp. in microeconometric demand analysis and in empirical Industrial Organization (cf. Berry (1994)).
– IIA problems are avoided when the coefficients θ are considered random (so-called Mixed MNL, cf. McFadden and Train; application in empirical IO: Berry, Levinsohn and Pakes (1995)).

1.2 Multinomial Probit

This model has the advantage that it overcomes
the limitations of the IIA property imposed by the
conditional logit model, but it is computationally
much more demanding in the case of large m.

Latent model as before:

Yi⋆ = V(xi; θ) + ϵi,   i = 1, · · · , m
Yi = 1{Yi⋆ = max{Y1⋆, · · · , Ym⋆}}
ϵ = (ϵ1, · · · , ϵm)′ ∼ N(0, Σ),

where Σ is m × m positive definite and symmetric, and non-zero off-diagonal elements allow correlations between alternatives, which can be interpreted as arising from unobserved (by the econometrician) attributes. Then,

Pr(Yi = 1|X) = Pr(Yi⋆ > Yj⋆, ∀j ≠ i)
= Pr(ϵj − ϵi < Vi − Vj, ∀j ≠ i),

i.e. m − 1 comparisons, an (m − 1)-dimensional integral.

For m ≥ 4, computing such integrals is very costly; no analytical solution exists, although such integrals can be approximated using simulation methods.

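The simulation idea can be illustrated with a crude frequency simulator: draw correlated normal residuals via a Cholesky factor and count how often each alternative attains the maximum utility. The values of V and L below are illustrative:

```python
import math, random

def mnp_probs_mc(V, L, draws=100_000, seed=0):
    """Crude frequency simulator for multinomial probit probabilities:
    eps = L z with z ~ N(0, I), so Cov(eps) = L L'; count argmax of V + eps."""
    rng = random.Random(seed)
    m = len(V)
    counts = [0] * m
    for _ in range(draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        eps = [sum(L[r][k] * z[k] for k in range(r + 1)) for r in range(m)]
        u = [V[r] + eps[r] for r in range(m)]
        counts[u.index(max(u))] += 1
    return [c / draws for c in counts]

# Hypothetical 3-alternative example; L is a lower-triangular Cholesky
# factor of Sigma, with an off-diagonal term correlating alternatives 1, 2.
V = [0.5, 0.0, 0.0]
L = [[1.0, 0.0, 0.0],
     [0.5, 0.8, 0.0],
     [0.0, 0.0, 1.2]]
p = mnp_probs_mc(V, L)
print(p)
```

More accurate smooth simulators (e.g. GHK) are used in practice; the frequency simulator only illustrates the principle.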
1.3 Nested Multinomial Logit Model (NMNL)

This model also to some extent overcomes IIA re-
strictions and is tractable for large-dimensional (large
m) problems, but it requires a hierarchical nesting
structure (which amounts to a testable assump-
tion).

Example: Residential heating systems - level 1:
room vs. central heating system; level 2: electric,
gas, oil, photo-cellular system; transport - level 1:
bus vs. car; level 2 (in ‘bus’ nest): red, blue bus.

Let Yij⋆ = Vij + ϵij denote the conditional indirect utility of alternative j (on the 2nd, bottom level) in nest i (1st, top level), for i = 1, · · · , c and j = 1, · · · , Ni; again, assume that the ϵij are i.i.d. type 1 extreme value distributed r.v.s.

Then, in abbreviated notation,

Prij = Prj|i Pri
Prj|i = exp(Vij) / Σ_{m=1}^{Ni} exp(Vim)
Pri = Σ_{j=1}^{Ni} exp(Vij) / Σ_{m=1}^{c} Σ_{n=1}^{Nm} exp(Vmn)
Prij = exp(Vij) / Σ_{m=1}^{c} Σ_{n=1}^{Nm} exp(Vmn),

i.e. the NMNL has a conditional logit structure at the bottom level within nests.

So, NMNL exhibits IIA within nests, but not across nests.

Define the inclusive value for nest i,

Ii = ln[ Σ_{j=1}^{Ni} exp(Vij) ],   i = 1, · · · , c.

Then,

Prj|i = exp(Vij) / exp(Ii)
Pri = exp(Ii) / Σ_{m=1}^{c} exp(Im).

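A short sketch of these formulas, with two nests chosen to mirror the red bus, blue bus example (utilities are illustrative). Note that with the inclusive-value coefficient implicitly fixed at one, as here, the model collapses to plain MNL, reproducing the bus-nest probability of 2/3 discussed earlier:

```python
import math

def nested_logit(V_nests):
    """NMNL probabilities via inclusive values: for nest i with utilities
    V_ij, I_i = ln sum_j exp(V_ij), Pr(j|i) = exp(V_ij)/exp(I_i),
    Pr(i) = exp(I_i)/sum_m exp(I_m), and Pr(i,j) = Pr(j|i) Pr(i)."""
    I = [math.log(sum(math.exp(v) for v in nest)) for nest in V_nests]
    denom = sum(math.exp(Ii) for Ii in I)
    pr_nest = [math.exp(Ii) / denom for Ii in I]
    pr_joint = [[math.exp(v) / math.exp(I[i]) * pr_nest[i] for v in nest]
                for i, nest in enumerate(V_nests)]
    return pr_nest, pr_joint

# Nest 1: {red bus, blue bus}; nest 2: {car}; all utilities equal.
pr_nest, pr_joint = nested_logit([[0.0, 0.0], [0.0]])
print(pr_nest, pr_joint)
```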
Estimation

(i) ML, using standard techniques (Newton-Raphson, Method of Scoring, or the BHHH algorithm); in the case of Multinomial Probit, with the proviso that the choice probabilities can be computed.

(ii) Sequential methods

Consider an extension of NMNL:

Prj|i = exp(x′ij β) / exp(Ii)
Pri = exp(z′i δ + Ii) / Σ_{m=1}^{c} exp(z′m δ + Im),

where the zi vary across, but not within, nests i = 1, · · · , c.

Estimate β from the conditional choice models Prj|i; given the ensuing estimate β̂, estimate the inclusive values

Îi = ln[ Σ_{j=1}^{Ni} exp(x′ij β̂) ],

and use these and the marginal choice probabilities for the nests, Pri, to estimate δ.

Extension: Generalized Extreme Value Model

The NMNL model above implicitly imposes a coefficient on the inclusive value which is restricted to one. The general NMNL model relaxes this to

Pri = exp(z′i δ + (1 − σ)Ii) / Σ_{m=1}^{c} exp(z′m δ + (1 − σ)Im),

where σ ∈ [0, 1] captures the possibility that there may be some dependence between the choices.

Such models arise from the Generalized Extreme Value (GEV) Distribution

F(ϵ1, · · · , ϵn) = exp(−G(exp(−ϵ1), · · · , exp(−ϵn))),

where G(Y1, · · · , Yn) is a non-negative function, homogeneous of degree 1, of (Y1, · · · , Yn) ≥ 0.

Further restrictions are that lim_{Yi→+∞} G(Y1, · · · , Yn) = ∞ for any i = 1, · · · , n, and that any kth partial derivative is non-negative for k odd and non-positive for k even.

The special case

G(Y1, · · · , Yn) = Σ_{i=1}^{n} Yi

yields the MNL model (independent extreme value distribution).

A bivariate example of a widely used GEV Distribution is

G(Y1, Y2) = ( Y1^{1/(1−σ)} + Y2^{1/(1−σ)} )^{1−σ}.

Here, σ is approximately equal to the correlation between Y1 and Y2. The implied binomial choice probabilities are

Pri = exp(Vi/(1 − σ)) / Σ_{k=1}^{2} exp(Vk/(1 − σ)),   (⋆)

where Vi is the indirect conditional utility of alternative i = 1, 2.

If σ → 0, then these reduce to MNL choice prob-
abilities; if σ → 1, then the indirect utilities of the
alternatives are highly correlated, so that the in-
duced choice probabilities amount to pure chance,
i.e. Pr1 = Pr2 = 1/2.

Such GEV specifications allow one to avoid the IIA problems, as in the "red bus, blue bus" problem. For example, with 3 alternatives, define

G(Y1, Y2, Y3) = Y1 + ( Y2^{1/(1−σ)} + Y3^{1/(1−σ)} )^{1−σ}.

The implied choice probabilities can be shown to be

Pr1 = Y1 / G(Y1, Y2, Y3),
Pr2 = Y2^{1/(1−σ)} ( Y2^{1/(1−σ)} + Y3^{1/(1−σ)} )^{−σ} / G(Y1, Y2, Y3),

with an analogous expression for Pr3.

If only alternatives 1 and 2 are available (Y3 = 0), then the model reduces to the standard binomial logit.

If only alternatives 2 and 3 are available, then the implied binomial choice probabilities are as in (⋆).

If all three alternatives are available, the odds ratio of 1 vs. 2 depends on the indirect utility of 3 (i.e. on Y3 = exp(V3)).

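The nested-GEV resolution of the red bus, blue bus problem can be checked numerically; a sketch with equal indirect utilities, nesting alternatives 2 and 3 (the σ values are illustrative):

```python
import math

def gev_probs(V, sigma):
    """Choice probabilities from G(Y1,Y2,Y3) = Y1 + (Y2^a + Y3^a)^(1-sigma),
    a = 1/(1-sigma), with Y_i = exp(V_i); alternatives 2, 3 are nested."""
    Y = [math.exp(v) for v in V]
    a = 1.0 / (1.0 - sigma)
    inner = Y[1] ** a + Y[2] ** a
    G = Y[0] + inner ** (1.0 - sigma)
    p1 = Y[0] / G
    p2 = Y[1] ** a * inner ** (-sigma) / G
    p3 = Y[2] ** a * inner ** (-sigma) / G
    return [p1, p2, p3]

p_indep = gev_probs([0.0, 0.0, 0.0], 0.0)    # sigma -> 0: plain MNL, 1/3 each
p_nested = gev_probs([0.0, 0.0, 0.0], 0.999) # sigma -> 1: ~ (1/2, 1/4, 1/4)
print(p_indep)
print(p_nested)
```

As σ → 1 the two bus alternatives behave as one option against the car, which is exactly the intuitive (1/2, 1/4, 1/4) split.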
2. Models for Censored and Truncated Data

(i) Censoring: occurs if values of a r.v. in a certain
subset of its support are transformed into a single
value.

(ii) Truncation: occurs if sample data are drawn from a certain subset of the support of the population distribution; e.g. yn⋆ latent data, and observe yn = yn⋆ if yn⋆ ≥ 0; here, unlike in (i), whenever yn⋆ < 0, no observation is recorded (while in (i), zero is recorded).

Note: data can be censored, truncated, or trun-
cated and censored.

2.1 Models for Truncated Data

Observe yn = yn⋆ if yn⋆ > c, for some constant c. Let the CDF of the i.i.d. Yn⋆ be F_{Y⋆}(y), f_{Y⋆}(y) its pdf, and c ∈ supp(Y⋆). Then, for y > c, the CDF of observation Yn is

FY(y) = F_{Y⋆}(y | Y⋆ is observed)
= Pr(c < Y⋆ < y) / Pr(Y⋆ > c)
= [F_{Y⋆}(y) − F_{Y⋆}(c)] / (1 − F_{Y⋆}(c)),

and its pdf is fY(y) = f_{Y⋆}(y)/(1 − F_{Y⋆}(c)).

Example: Yn⋆ ∼ i.i.d. N(µ, σ²); observe Yn = Yn⋆ if Yn⋆ > c. Letting ϕ(x) = (1/√(2π)) exp(−x²/2), the pdf of N(0, 1), and Φ(x) = ∫_{−∞}^{x} ϕ(z)dz, it follows that

fY(y) = (1/σ) ϕ((y − µ)/σ) / [1 − Φ((c − µ)/σ)].

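A quick numerical check that the truncated density integrates to one; the values of µ, σ, c and the integration grid are arbitrary choices:

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def truncated_pdf(y, mu, sigma, c):
    """Density of Y = Y* | Y* > c for Y* ~ N(mu, sigma^2):
    (1/sigma) phi((y-mu)/sigma) / (1 - Phi((c-mu)/sigma)) on y > c."""
    if y <= c:
        return 0.0
    return norm_pdf((y - mu) / sigma) / sigma / (1.0 - norm_cdf((c - mu) / sigma))

# Midpoint-rule integral over (c, mu + 10 sigma), illustrative values:
mu, sigma, c = 1.0, 2.0, 0.0
n, hi = 20_000, mu + 10 * sigma
h = (hi - c) / n
total = sum(truncated_pdf(c + (k + 0.5) * h, mu, sigma, c) for k in range(n)) * h
print(total)
```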
Truncated Regression Model

Selection on the basis of the response variable.

Example: observe (log) earnings of low income families; want to estimate a general (log) earnings equation for the population (Hausman and Wise, 1977):

yn⋆ = x′n β0 + ϵn,   ϵn|xn ∼ N(0, σ0²);

"low income family" defined by yn⋆ < c, for some known constant c; observe only family incomes for which yn⋆ < c:

yn⋆ < c ⇔ ϵn < c − x′n β0.

Hence,

fY(yn|xn; β0, σ0²) = (1/σ0) ϕ((yn − x′n β0)/σ0) / Φ((c − x′n β0)/σ0),

and therefore, the average log-likelihood is

LN = LN(β, σ²; y, X, c)
= (1/N) Σ_{n=1}^{N} ln[ (1/σ) ϕ((yn − x′n β)/σ) / Φ((c − x′n β)/σ) ]
∝ −(1/2) ln(σ²) − (1/(2Nσ²)) Σ_{n=1}^{N} (yn − x′n β)² − (1/N) Σ_{n=1}^{N} ln Φ((c − x′n β)/σ).

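The average log-likelihood is straightforward to code; a sketch with a tiny made-up data set (all names and numbers are illustrative):

```python
import math

def norm_log_pdf(z):
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def avg_loglik_truncated(beta, sigma, y, X, c):
    """Average log-likelihood of the truncated regression model
    (only observations with y* < c are sampled):
    (1/N) sum_n [ ln (1/sigma) phi((y_n - x_n'b)/sigma)
                  - ln Phi((c - x_n'b)/sigma) ]."""
    total = 0.0
    for yn, xn in zip(y, X):
        xb = sum(a * b for a, b in zip(xn, beta))
        total += (norm_log_pdf((yn - xb) / sigma) - math.log(sigma)
                  - math.log(norm_cdf((c - xb) / sigma)))
    return total / len(y)

# Tiny illustrative sample (constant + one regressor), truncation at c = 0:
y = [-0.3, -1.2, -0.7]
X = [(1.0, 0.5), (1.0, -0.2), (1.0, 1.0)]
ll = avg_loglik_truncated(beta=(0.1, 0.2), sigma=1.0, y=y, X=X, c=0.0)
print(ll)
```

In practice this function would be handed to a numerical optimizer to obtain the MLE.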
Note:
– The ﬁrst two summands correspond to the aver-
age log-likelihood of the normal linear regression
model;
– the third (last) summand is a correction term
that accounts for the truncation of the sample.
This can be exploited in estimation (see below).

Under the usual regularity conditions (as discussed under the heading of ML Theory), the MLE is √N-CAN and efficient.

If β0 were estimated by OLS, instead of ML, what
would be the implication for the resulting estima-
tor?

Just saw: ML ≠ OLS estimator, so the OLS estimator cannot be BLUE. Which of the Gauss-Markov assumptions is/are not satisfied?

Need to consider the conditional mean and conditional variance of the observed data. The following lemma is useful.

Lemma: If X ∼ N(µ, σ²) and c = const., then

E[X|X < c] = µ − σ ϕ((c − µ)/σ) / Φ((c − µ)/σ),
var(X|X < c) = σ² [1 − δ((c − µ)/σ)],

where δ((c − µ)/σ) = [ϕ((c − µ)/σ)/Φ((c − µ)/σ)] · [(c − µ)/σ + ϕ((c − µ)/σ)/Φ((c − µ)/σ)].

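Both parts of the lemma can be verified by Monte Carlo; a sketch with arbitrary values of µ, σ, c:

```python
import math, random

def norm_pdf(x): return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
def norm_cdf(x): return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mu, sigma, c = 1.0, 2.0, 0.5
rng = random.Random(42)
draws = [x for x in (rng.gauss(mu, sigma) for _ in range(400_000)) if x < c]

a = (c - mu) / sigma
lam = norm_pdf(a) / norm_cdf(a)          # inverse Mills ratio phi(a)/Phi(a)
theory_mean = mu - sigma * lam           # E[X | X < c] from the lemma
delta = lam * (a + lam)
theory_var = sigma * sigma * (1.0 - delta)  # var(X | X < c) from the lemma

mc_mean = sum(draws) / len(draws)
mc_var = sum((v - mc_mean) ** 2 for v in draws) / len(draws)
print(theory_mean, mc_mean)
print(theory_var, mc_var)
```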
The following auxiliary result will be useful (here
and elsewhere).

The derivative of the normal pdf with respect to x is

(1/σ) ϕ′((x − µ)/σ) = d/dx [ (1/σ) ϕ((x − µ)/σ) ]
= d/dx [ (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)) ]
= −[(x − µ)/(√(2π) σ³)] exp(−(x − µ)²/(2σ²))
= −[(x − µ)/σ²] · (1/σ) ϕ((x − µ)/σ).

Proof of the Lemma:

E[X|X < c] = ∫_{−∞}^{c} x · (1/σ)ϕ((x − µ)/σ) / Φ((c − µ)/σ) dx
= ∫_{−∞}^{c} (x − µ + µ) · (1/σ)ϕ((x − µ)/σ) / Φ((c − µ)/σ) dx
= µ − σ² ∫_{−∞}^{c} [−(x − µ)/σ²] · (1/σ)ϕ((x − µ)/σ) / Φ((c − µ)/σ) dx
= µ − σ² ∫_{−∞}^{c} (1/σ) ϕ′((x − µ)/σ) / Φ((c − µ)/σ) dx
= µ − σ² · (1/σ) ϕ((c − µ)/σ) / Φ((c − µ)/σ)
= µ − σ ϕ((c − µ)/σ) / Φ((c − µ)/σ). ∎

The expression for the conditional variance can be derived in an analogous fashion (cf. Greene (2008), 6th edition, p. 866).

Applying the Lemma to the truncated regression model:

E[yn|xn, observed] = x′n β0 + E[ϵn | ϵn < c − x′n β0]
= x′n β0 − σ0 ϕ((c − x′n β0)/σ0) / Φ((c − x′n β0)/σ0).

Hence,

(i) the conditional mean of yn, given xn, for the observed data is not linear in the estimable parameter β0;

(ii) the "inverse Mills ratio" ϕ((c − x′n β0)/σ0) / Φ((c − x′n β0)/σ0) amounts to an omitted variable in the OLS linear regression equation, and so OLS suffers from omitted-variable bias and is inconsistent; essentially, the OLS linear regression equation is a mis-specified model, because the correct regression equation is nonlinear in β0;

(iii) the conditional variance of yn, given xn, for the observed data is not homoskedastic; in fact,

var(yn|xn, observed) = σ0² [1 − δ((c − x′n β0)/σ0)],

where

δ((c − x′n β0)/σ0) = [ϕ((c − x′n β0)/σ0)/Φ((c − x′n β0)/σ0)] · [(c − x′n β0)/σ0 + ϕ((c − x′n β0)/σ0)/Φ((c − x′n β0)/σ0)];

i.e. this non-linear regression problem is heteroskedastic, the conditional variances are informative about β0, and efficient estimators need to account for heteroskedasticity (which OLS does not).

2.2 Models for Censored Data

Observe yn = yn⋆ 1{yn⋆ > 0}.

Again, let F_{Y⋆}(y) be the CDF of the latent Y⋆, and let 0 ∈ supp(Yn⋆) for all n.

Then, the CDF of observation yn is

FY(yn) = [F_{Y⋆}(0)]^{1{yn=0}} [F_{Y⋆}(yn)]^{1{yn>0}}.

The "pdf" is a mixture of a probability mass function (at zero) and a pdf (when yn > 0):

fY(yn) = [F_{Y⋆}(0)]^{1{yn=0}} [f_{Y⋆}(yn)]^{1{yn>0}}.

Example: yn⋆ ∼ N(µ, σ²); observe yn = yn⋆ 1{yn⋆ > 0}, so that

fY(yn) = [Φ(−µ/σ)]^{1{yn=0}} [(1/σ) ϕ((yn − µ)/σ)]^{1{yn>0}}.
Censored Regression Model

Example: Willingness to pay (WTP) for a resource; surveys typically record non-negative WTP. Let WTP yn⋆ be modelled as

yn⋆ = x′n β0 + ϵn,   ϵn|xn ∼ N(0, σ0²).

Observe yn = 0 if yn⋆ ≤ 0 and yn = yn⋆ if yn⋆ > 0. Then,

f(yn|xn; β0, σ0²) = [Φ(−x′n β0/σ0)]^{1{yn=0}} [(1/σ0) ϕ((yn − x′n β0)/σ0)]^{1{yn>0}}.

Assuming i.i.d. data, the log-likelihood function is

L(β, σ; y, X) = Σ_{n=1}^{N} 1{yn=0} ln Φ(−x′n β/σ) + Σ_{n=1}^{N} 1{yn>0} ln[ (1/σ) ϕ((yn − x′n β)/σ) ]
= Σ_{n=1}^{N} [ 1{yn=0} ln Φ(−x′n β/σ) + 1{yn>0} ln(1 − Φ(−x′n β/σ)) ]
  + Σ_{n=1}^{N} 1{yn>0} ln[ (1/σ) ϕ((yn − x′n β)/σ) / (1 − Φ(−x′n β/σ)) ].

⇒ Log-likelihood function is sum of
(i) binomial probit log-likelihood (censored vs. non-
censored);
(ii) log-likelihood of truncated data (subsample of
non-censored data).
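The decomposition into (i) and (ii) can be checked numerically: the two ways of writing the censored-regression log-likelihood agree term by term. The data and parameter values below are made up:

```python
import math

def norm_log_pdf(z): return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
def norm_cdf(z): return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik(beta, sigma, y, X):
    """Censored-regression log-likelihood: probit term for y_n = 0,
    normal density term for y_n > 0."""
    ll = 0.0
    for yn, xn in zip(y, X):
        xb = sum(a * b for a, b in zip(xn, beta))
        if yn == 0.0:
            ll += math.log(norm_cdf(-xb / sigma))
        else:
            ll += norm_log_pdf((yn - xb) / sigma) - math.log(sigma)
    return ll

def tobit_loglik_decomposed(beta, sigma, y, X):
    """Same likelihood written as probit part (censored vs. not) plus
    truncated part (non-censored subsample), as in the text."""
    probit, truncated = 0.0, 0.0
    for yn, xn in zip(y, X):
        xb = sum(a * b for a, b in zip(xn, beta))
        p0 = norm_cdf(-xb / sigma)
        if yn == 0.0:
            probit += math.log(p0)
        else:
            probit += math.log(1.0 - p0)
            truncated += (norm_log_pdf((yn - xb) / sigma) - math.log(sigma)
                          - math.log(1.0 - p0))
    return probit + truncated

# Illustrative data with some zeros (censored observations):
y = [0.0, 1.2, 0.0, 0.4]
X = [(1.0, -0.5), (1.0, 0.8), (1.0, -1.0), (1.0, 0.1)]
a = tobit_loglik((0.2, 0.5), 1.0, y, X)
b = tobit_loglik_decomposed((0.2, 0.5), 1.0, y, X)
print(a, b)
```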

This suggests two approaches to estimation:
(1) ML: Under regularity conditions, the MLE is √N-CAN and efficient; the MLE can be obtained from both representations of the log-likelihood function.

(2) Two-step procedure (due to J. Heckman), exploiting the decomposition of the log-likelihood into (i) and (ii).

Heckman 2-Step Procedure

(i) Estimate β0/σ0 = (β01/σ0, · · · , β0k/σ0)′ using a binomial probit on censored vs. non-censored observations; denote the resulting ML estimate of the ratio by (β/σ)ˆ.

(ii) Estimate the inverse Mills ratio by

M̂n = ϕ(−x′n (β/σ)ˆ) / [1 − Φ(−x′n (β/σ)ˆ)],

which estimates the correction term in the conditional mean of the truncated data (non-censored subsample):

E[yn|xn, yn > 0] = x′n β0 + σ0 ϕ(−x′n β0/σ0) / [1 − Φ(−x′n β0/σ0)],

i.e. impute the missing variable in the OLS regression; then, in the 2nd step, estimate β0 and σ0 by OLS (yn on xn and M̂n).

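A compact simulation of the two-step procedure (single regressor, no intercept, all parameter values made up); the probit step uses a simple Newton iteration on the score rather than a canned routine:

```python
import math, random

def pdf(z): return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
def cdf(z): return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Simulate censored data y_n = max(beta0 * x_n + eps_n, 0); values made up.
beta0, sigma0 = 1.0, 1.5
rng = random.Random(7)
x = [rng.uniform(-2.0, 2.0) for _ in range(20_000)]
y = [max(beta0 * xi + rng.gauss(0.0, sigma0), 0.0) for xi in x]
d = [1.0 if yi > 0.0 else 0.0 for yi in y]

# Step (i): probit for t0 = beta0/sigma0 by Newton iteration on the score.
def score(t):
    s = 0.0
    for xi, di in zip(x, d):
        P = cdf(t * xi)
        s += (di - P) / (P * (1.0 - P)) * pdf(t * xi) * xi
    return s

t = 0.0
for _ in range(15):
    g = score(t)
    h = (score(t + 1e-5) - g) / 1e-5     # numerical derivative of the score
    t -= g / h

# Step (ii): OLS of y on x and the estimated inverse Mills ratio over the
# non-censored subsample; the coefficient on M estimates sigma0.
sel = [(xi, yi) for xi, yi in zip(x, y) if yi > 0.0]
M = [pdf(t * xi) / cdf(t * xi) for xi, _ in sel]  # phi(-tx)/(1 - Phi(-tx))
Sxx = sum(xi * xi for xi, _ in sel)
SxM = sum(xi * m for (xi, _), m in zip(sel, M))
SMM = sum(m * m for m in M)
Sxy = sum(xi * yi for xi, yi in sel)
SMy = sum(m * yi for (_, yi), m in zip(sel, M))
det = Sxx * SMM - SxM * SxM
beta_hat = (SMM * Sxy - SxM * SMy) / det
sigma_hat = (Sxx * SMy - SxM * Sxy) / det
print(t, beta_hat, sigma_hat)
```

With 20,000 observations the estimates land close to the true (β0, σ0) = (1.0, 1.5), illustrating consistency (though not efficiency).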
Note:
– This yields a √N-CAN estimator.
– This estimator is inefficient (relative to the MLE), because information about β0 and σ0 in the conditional variance is not used.
– Procedure is computationally attractive, because
it uses canned estimation routines (probit and
OLS; trade-oﬀ: computational ease vs. eﬃ-
ciency).
– In censored and truncated regression models, the conditional mean is nonlinear in β0, even though the latent variable model is linear. These models belong to the large class of nonlinear regression models:

E[yn|xn; θ] = g(xn, θ),

for some (smooth) known function g(·, ·), and θ ∈ Rk unknown.

Incidental Truncation: Non-random Sample Selection, Self-Selection

• General setup:
(1) (latent) selection equation;
(2) equation of (primary) interest.

Classic example:
1. Female Labor Supply
(1) wage equation: diﬀerence between market
and reservation wage, as function of covariates;
(2) hours worked equation: only observed if
woman is working, i.e. whenever market wage
exceeds reservation wage.
2. Migration Models
(1) net benefit from migrating;
(2) income of migrants: only observed for migrants.

Formal setup

(1) (latent) selection equation:
zn⋆ = w′n γ0 + un;

(2) equation of interest:
yn = x′n β0 + ϵn.

Sampling rule: yn observed if zn⋆ > 0.

Suppose

(ϵn, un)′ ∼ i.i.d. N(0, Σ),   Σ = [ σϵ²  ρσϵσu ; ρσϵσu  σu² ],

where ρ = σϵu/(σϵσu) ∈ (−1, 1).

Then,

E[yn|xn, observed] = x′n β0 + E[ϵn | zn⋆ > 0]
= x′n β0 + E[ϵn | un > −w′n γ0],

i.e. we need the conditional distribution of ϵn, given un. Refer to handout.

Application of the properties of the conditional normal distribution yields

ϵn|un ∼ N( ρ (σϵ/σu) un, σϵ² (1 − ρ²) ).

Following similar steps as in computing the moments of the truncated normal, use this conditional distribution to get the moments of the incidentally truncated normal. General result:

Lemma: Suppose

(ϵ, u)′ ∼ i.i.d. N( (µϵ, µu)′, Σ ),   Σ = [ σϵ²  ρσϵσu ; ρσϵσu  σu² ].

Then, for c a scalar constant,

E[ϵ|u > c] = µϵ + ρσϵ ϕ((c − µu)/σu) / [1 − Φ((c − µu)/σu)],
var(ϵ|u > c) = σϵ² [1 − δ((c − µu)/σu)],
δ(z) = [ϕ(z)/(1 − Φ(z))] · [ϕ(z)/(1 − Φ(z)) − z].

Hence,

(†)   E[yn|xn, observed] = x′n β0 + ρσϵ ϕ(−w′n γ0/σu) / [1 − Φ(−w′n γ0/σu)].

Conclusion: OLS applied to (2) yields biased and inconsistent estimates, since the inverse Mills ratio is omitted; the inverse Mills ratio accounts for the non-random sample selection induced by (1).

Estimation:

Notice first:
– σu² is not identified, only the ratio γ0/σu is identifiable; impose σu² = 1;
– σϵ² is not identified, only the product ρσϵ is; impose σϵ² = 1.

Approaches:

(i) ML: efficient, but computationally burdensome.

(ii) Heckman 2-step procedure:
(1) binomial probit to estimate γ0; impute the inverse Mills ratio for the selected observations:

M̂n = ϕ(−w′n γ̂) / [1 − Φ(−w′n γ̂)].

(2) OLS in (†) for the selected observations: for these, regress yn on xn and M̂n.

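A sketch of step (2) on simulated data; to keep it short, the first-stage probit estimate γ̂ is replaced by the true γ0 (and σu = 1 is imposed, as above). All parameter values are made up:

```python
import math, random

def pdf(z): return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
def cdf(z): return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Simulated sample-selection model: z* = gamma0 w + u, y = beta0 x + eps,
# with corr(eps, u) = rho; y observed only when z* > 0.
gamma0, beta0, rho, sig_eps = 0.8, 1.0, 0.6, 1.2
rng = random.Random(11)
rows = []
for _ in range(40_000):
    w, xv = rng.uniform(-2, 2), rng.uniform(-2, 2)
    u = rng.gauss(0.0, 1.0)
    eps = (rho * sig_eps * u
           + sig_eps * math.sqrt(1 - rho * rho) * rng.gauss(0.0, 1.0))
    if gamma0 * w + u > 0:                              # selection rule
        M = pdf(-gamma0 * w) / (1.0 - cdf(-gamma0 * w)) # inverse Mills ratio
        rows.append((xv, M, beta0 * xv + eps))

# OLS of y on x and M over the selected observations (2x2 normal equations);
# the coefficient on M estimates rho * sigma_eps, as in (the dagger formula).
Sxx = sum(x * x for x, m, y in rows)
SxM = sum(x * m for x, m, y in rows)
SMM = sum(m * m for x, m, y in rows)
Sxy = sum(x * y for x, m, y in rows)
SMy = sum(m * y for x, m, y in rows)
det = Sxx * SMM - SxM * SxM
beta_hat = (SMM * Sxy - SxM * SMy) / det
rho_sig_hat = (Sxx * SMy - SxM * Sxy) / det
print(beta_hat, rho_sig_hat)
```

The recovered slope is close to β0 = 1.0 and the Mills-ratio coefficient close to ρσϵ = 0.72, while a regression of y on x alone would be biased.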
