
Limited Dependent Variable Models

Limited dependent variables are typically (i) qualitative dependent variables, or (ii) dependent variables with limited support. Models for such data are often derived from latent variable models.

1. Probabilistic Choice Models

1.1 Conditional Logit (D. McFadden, R.D. Luce)

Suppose a decision maker faces m discrete choice alternatives. Let Y_j^* denote the indirect conditional utility of the j-th alternative; it is latent, i.e. unobserved by the econometrician. The econometrician observes Y_j = 1 if alternative j is chosen and Y_j = 0 otherwise; that is,

  Y_j = 1{Y_j^* = max{Y_1^*, ..., Y_m^*}}.

Notation: the binary variable 1{A} takes the value 1 if event A occurs, and zero otherwise. Assume there are no ties between alternatives.

Consider the latent variable model

  Y_j^* = V(x_j, θ) + ε_j,  j = 1, ..., m,

where x_j are the attributes of alternative j, V(·,·) is the indirect utility function, known up to the parameter vector θ, and ε_j is a residual capturing variation in tastes and perceptions, unobserved by the econometrician.

Assumption on {ε_j, j = 1, ..., m}: ε_1, ..., ε_m are i.i.d. with the type 1 extreme value distribution, i.e. with CDF F(ε) = exp(−exp(−ε)), ε ∈ R, and pdf f(ε) = exp(−ε − exp(−ε)).

Lemma: Under the above model assumptions,

  Pr(Y_i = 1 | X) = exp(V_i) / Σ_{j=1}^m exp(V_j),

where X = [x_1, ..., x_m] and V_j = V(x_j, θ), j = 1, ..., m.

Proof: Notice first that

  Y_i = 1 ⇔ Y_i^* = max{Y_1^*, ..., Y_m^*}
        ⇔ V_i + ε_i > V_j + ε_j for all j ≠ i
        ⇔ ε_j < ε_i + V_i − V_j for all j ≠ i,

so that, by independence,

  Pr(Y_i = 1 | X) = Pr(ε_j < ε_i + V_i − V_j, ∀ j ≠ i)
                  = ∫_R Π_{j≠i} F(ε_i + V_i − V_j) f(ε_i) dε_i.

Consider the integrand:

  Π_{j≠i} F(ε_i + V_i − V_j) f(ε_i)
    = Π_{j≠i} exp(−exp(−ε_i − V_i + V_j)) · exp(−ε_i − exp(−ε_i))
    = exp(−ε_i − exp(−ε_i)) · exp(−exp(−ε_i − V_i) Σ_{j≠i} exp(V_j))
    = exp(−ε_i − exp(−ε_i) [1 + Σ_{j≠i} exp(V_j)/exp(V_i)])
    = exp(−ε_i − exp(−ε_i) Σ_{j=1}^m exp(V_j)/exp(V_i)).

Let λ_i = ln[Σ_{j=1}^m exp(V_j)/exp(V_i)], so that

  exp(λ_i) = Σ_{j=1}^m exp(V_j)/exp(V_i)  and  exp(−λ_i) = exp(V_i)/Σ_{j=1}^m exp(V_j).

Then,

  Pr(Y_i = 1 | X) = ∫_R exp(−ε_i − exp(−(ε_i − λ_i))) dε_i
                  = exp(−λ_i) ∫_R exp(−ε̃_i − exp(−ε̃_i)) dε̃_i,

where ε̃_i = ε_i − λ_i is type 1 extreme value distributed (shifted by λ_i), and the integral of its pdf is one, so

  Pr(Y_i = 1 | X) = exp(−λ_i) = exp(V_i)/Σ_{j=1}^m exp(V_j).

Typical specification: conditional indirect utility linear in attributes, so that

  Pr(Y_i = 1 | X) = exp(x_i'θ) / Σ_{j=1}^m exp(x_j'θ),  i = 1, ..., m.

Primary Features and Limitations of the Conditional Logit Model

(i) Independence of Irrelevant Alternatives (IIA) property: the "odds ratio" for choice alternatives i and j is

  Pr(Y_i = 1 | X) / Pr(Y_j = 1 | X) = exp(V_i)/exp(V_j),  i, j = 1, ..., m,

i.e. independent of (a) alternatives other than i and j, and (b) the total number m of alternatives.

(ii) The IIA property is inappropriate in many applications in which some choice alternatives are similar, or more closely related than others. Example: the "red bus, blue bus" problem (first pointed out by G. Debreu). Suppose there are 3 transport options, (option 1) red bus, (option 2) blue bus, and (option 3) car, and travellers do not care about bus color and are indifferent between car and bus. Then, one expects

  Pr(red bus | bus) = Pr(blue bus | bus) = 1/2,
  Pr(bus) = Pr(car) = 1/2,
  Pr(red bus) = Pr(blue bus) = 1/4.

Hence, the odds of red bus vs. car are 1:1 if blue buses are not present, and 1:2 if blue buses are present. This contrasts with the IIA property implied by the conditional logit model, which, applied in this setting, would imply Pr(red bus) = Pr(blue bus) = Pr(car) = 1/3. The problematic implication of the conditional logit model here: the model implicitly assumes that all three choice alternatives are independent, conditional on attributes (suppressed in the above notation), while red and blue buses are perceived as similar (colors do not matter to travelers) and therefore cannot be considered independent.
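As a quick numerical illustration of the conditional logit probabilities and the IIA property, here is a minimal Python sketch (NumPy assumed available; the zero utilities are made up to reproduce the red bus / blue bus setting):

```python
import numpy as np

def conditional_logit_probs(V):
    """Choice probabilities Pr(Y_i = 1 | X) = exp(V_i) / sum_j exp(V_j).

    Subtracting max(V) before exponentiating keeps the computation
    numerically stable without changing the probabilities (the common
    factor cancels in the ratio)."""
    V = np.asarray(V, dtype=float)
    e = np.exp(V - V.max())
    return e / e.sum()

# Red bus / blue bus: three alternatives with identical utility.
p3 = conditional_logit_probs([0.0, 0.0, 0.0])   # red bus, blue bus, car
p2 = conditional_logit_probs([0.0, 0.0])        # red bus, car only

# IIA: the odds of red bus vs. car are the same (1:1) in both choice
# sets, even though intuition says they should be 1:2 with the blue bus.
```

The model assigns each alternative probability 1/3 in the three-way case, illustrating the IIA implication criticized in the text.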
An appropriate model might hierarchically nest the choices: first bus vs. car, and second, conditional on bus, red vs. blue. In choice situations such as this, the conditional logit model predicts a joint probability for bus (2/3) which is higher than the true probability of choosing bus (1/2).

(iii) Consider the change in demand for i in response to a change in an element of x_j (e.g. an increase in the price x_jl, the l-th element of x_j: the cross-price effect of j on i). Let V_k = x_k'θ, k = 1, ..., m. Then,

  ∂Pr(Y_i = 1 | X)/∂x_jl = −[exp(x_i'θ) exp(x_j'θ) / (Σ_{k=1}^m exp(x_k'θ))²] θ_l
                         = −Pr(Y_i = 1 | X) Pr(Y_j = 1 | X) θ_l,

where θ_l ≤ 0 (conditional indirect utility is non-increasing in price). Therefore, the cross-price elasticity of demand for alternative i with respect to the price of alternative j is

  η_ij := [∂Pr(Y_i = 1 | X)/∂x_jl] · x_jl / Pr(Y_i = 1 | X) = −x_jl θ_l Pr(Y_j = 1 | X) > 0,

which is seen to be independent of (and hence identical across all) i and proportional to Pr(Y_j = 1 | X). Lost demand for alternative j is re-distributed in equal proportions to all other alternatives, regardless of their proximity to j in the attribute or characteristics space. In the "red bus, blue bus" example, for instance, the conditional logit model would imply that a reduction in the frequency of blue buses leads to as many people switching to car as to red buses.

(iv) The conditional logit model is conditional on the choice set, i.e. all substitution occurs within the set of the m given alternatives; there is no "outside option" (such as not consuming any of the alternatives). Hence, the model implies no change in demand (no lost demand) in response to an increase in all prices by the same proportion.

(v) Nonetheless:

– Due to its convenience, the model is widely applied, especially in microeconometric demand analysis and in empirical Industrial Organization (cf. Berry (1994)).
– IIA problems are avoided when the coefficients θ are treated as random (so-called Mixed MNL, cf. McFadden and Train; application in empirical IO: Berry, Levinsohn and Pakes (1995)).

1.2 Multinomial Probit

This model has the advantage that it overcomes the limitations of the IIA property imposed by the conditional logit model, but it is computationally much more demanding in the case of large m. Latent model as before:

  Y_i^* = V(x_i; θ) + ε_i,  i = 1, ..., m,
  Y_i = 1{Y_i^* = max{Y_1^*, ..., Y_m^*}},
  ε = (ε_1, ..., ε_m)' ~ N(0, Σ),

where Σ is m × m positive definite symmetric, and non-zero off-diagonal elements allow correlations between alternatives, which can be interpreted as arising from attributes unobserved by the econometrician. Then,

  Pr(Y_i = 1 | X) = Pr(Y_i^* > Y_j^*, ∀ j ≠ i) = Pr(ε_j − ε_i < V_i − V_j, ∀ j ≠ i),

i.e. m − 1 comparisons, hence an (m − 1)-dimensional integral. For m ≥ 4, computing such integrals is very costly and no analytical solution exists, although they can be approximated using simulation methods.

1.3 Nested Multinomial Logit Model (NMNL, McFadden)

This model also to some extent overcomes the IIA restrictions and is tractable for large-dimensional (large m) problems, but it requires a hierarchical nesting structure (which amounts to a testable assumption). Examples: residential heating systems — level 1: room vs. central heating system; level 2: electric, gas, oil, photo-cellular system. Transport — level 1: bus vs. car; level 2 (in the 'bus' nest): red vs. blue bus.

Let Y_ij^* = V_ij + ε_ij denote the conditional indirect utility of alternative j (on the 2nd, bottom level) in nest i (1st, top level), for i = 1, ..., c and j = 1, ..., N_i; again, assume that the ε_ij are i.i.d. type 1 extreme value distributed random variables. Then, in abbreviated notation,

  Pr_ij = Pr_{j|i} Pr_i,
  Pr_{j|i} = exp(V_ij) / Σ_{m=1}^{N_i} exp(V_im),
  Pr_i = Σ_{j=1}^{N_i} exp(V_ij) / Σ_{m=1}^{c} Σ_{n=1}^{N_m} exp(V_mn),
  Pr_ij = exp(V_ij) / Σ_{m=1}^{c} Σ_{n=1}^{N_m} exp(V_mn),

i.e. the NMNL has a conditional logit structure at the bottom level within nests. So, the NMNL exhibits IIA within nests, but not across nests.
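A small numerical check of the NMNL formulas above, under made-up nest structure and utilities: with the inclusive-value coefficient implicitly equal to one, the product Pr_{j|i} · Pr_i reproduces a flat MNL over all alternatives, as the fourth display states.

```python
import numpy as np

# V[i][j] = V_ij for alternative j in nest i (made-up values).
V = [np.array([1.0, 0.5]), np.array([0.2, 0.0, -0.3])]

denom = sum(np.exp(v).sum() for v in V)                    # sum over all m, n
P_joint = [np.exp(v) / denom for v in V]                   # flat MNL: Pr_ij

P_nest = [np.exp(v).sum() / denom for v in V]              # Pr_i
P_within = [np.exp(v) / np.exp(v).sum() for v in V]        # Pr_{j|i}
P_product = [pw * pn for pw, pn in zip(P_within, P_nest)]  # Pr_{j|i} * Pr_i
```

The two constructions agree term by term, and the probabilities sum to one across all alternatives.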
Define the inclusive value for nest i,

  I_i = ln[Σ_{j=1}^{N_i} exp(V_ij)],  i = 1, ..., c.

Then,

  Pr_{j|i} = exp(V_ij)/exp(I_i),
  Pr_i = exp(I_i) / Σ_{m=1}^{c} exp(I_m).

Estimation

(i) ML, using standard techniques (Newton-Raphson, Method of Scoring or the BHHH algorithm); in the case of the Multinomial Probit, with the proviso that the choice probabilities can be computed.

(ii) Sequential methods. Consider an extension of the NMNL:

  Pr_{j|i} = exp(x_ij'β)/exp(I_i),
  Pr_i = exp(z_i'δ + I_i) / Σ_{m=1}^{c} exp(z_m'δ + I_m),

where the z_i vary across, but not within, nests i = 1, ..., c. Estimate β from the conditional choice models Pr_{j|i}; given the ensuing estimate β̂, estimate the inclusive values

  Î_i = ln[Σ_{j=1}^{N_i} exp(x_ij'β̂)],

and use these and the marginal choice probabilities for the nests, Pr_i, to estimate δ.

Extension: Generalized Extreme Value Model

The NMNL model above implicitly imposes a coefficient on the inclusive value which is restricted to one. The general NMNL model relaxes this to

  Pr_i = exp(z_i'δ + (1 − σ)I_i) / Σ_{m=1}^{c} exp(z_m'δ + (1 − σ)I_m),

where σ ∈ [0, 1] captures the possibility that there may be some dependence between the choices. Such models arise from the Generalized Extreme Value (GEV) distribution

  F(ε_1, ..., ε_n) = exp(−G(exp(−ε_1), ..., exp(−ε_n))),

where G(Y_1, ..., Y_n) is a non-negative function, homogeneous of degree 1, of (Y_1, ..., Y_n) ≥ 0. Further restrictions are that lim_{Y_i→+∞} G(Y_1, ..., Y_n) = ∞ for any i = 1, ..., n, and that any k-th partial derivative is non-negative for k odd and non-positive for k even. The special case

  G(Y_1, ..., Y_n) = Σ_{i=1}^n Y_i

yields the MNL model (independent extreme value distribution).

A bivariate example of a widely used GEV distribution is

  G(Y_1, Y_2) = (Y_1^{1/(1−σ)} + Y_2^{1/(1−σ)})^{1−σ}.

Here, σ is approximately equal to the correlation between Y_1 and Y_2.
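To see how the (1 − σ) inclusive-value coefficient of the general NMNL relaxes IIA, the following sketch computes the nest probabilities for the red-bus/blue-bus example (nest-level covariates z_i'δ are set to zero; the utilities are made up):

```python
import numpy as np

def nest_probs(V_nests, sigma):
    """Nest-level probabilities of the generalized NMNL,
    Pr_i proportional to exp((1 - sigma) * I_i), with inclusive value
    I_i = ln(sum_j exp(V_ij)); nest-level covariates omitted here."""
    I = np.array([np.log(np.exp(np.asarray(V)).sum()) for V in V_nests])
    w = np.exp((1.0 - sigma) * I)
    return w / w.sum()

nests = [[0.0, 0.0], [0.0]]            # {red bus, blue bus} vs. {car}
p_mnl = nest_probs(nests, sigma=0.0)   # sigma = 0: plain MNL, Pr(bus) = 2/3
p_dep = nest_probs(nests, sigma=0.999) # sigma near 1: Pr(bus) near 1/2
```

As σ approaches 1, the two buses are treated as a single option and the nest probabilities approach the intuitive 1/2-1/2 split.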
The implied binomial choice probabilities are

  Pr_i = exp(V_i/(1−σ)) / Σ_{k=1}^{2} exp(V_k/(1−σ)),  (⋆)

where V_i is the indirect conditional utility of alternative i = 1, 2. If σ → 0, these reduce to MNL choice probabilities; if σ → 1, the indirect utilities of the alternatives are highly correlated, so that the induced choice probabilities amount to pure chance, i.e. Pr_1 = Pr_2 = 1/2.

Such GEV specifications make it possible to avoid the IIA problems, as in the "red bus, blue bus" problem. For example, with 3 alternatives, define

  G(Y_1, Y_2, Y_3) = Y_1 + (Y_2^{1/(1−σ)} + Y_3^{1/(1−σ)})^{1−σ}.

The implied choice probabilities can be shown to be

  Pr_1 = Y_1 / G(Y_1, Y_2, Y_3),
  Pr_2 = Y_2^{1/(1−σ)} (Y_2^{1/(1−σ)} + Y_3^{1/(1−σ)})^{−σ} / G(Y_1, Y_2, Y_3),

with an analogous expression for Pr_3. If only alternatives 1 and 2 are available (Y_3 = 0), the model reduces to the standard binomial logit. If only alternatives 2 and 3 are available, the implied binomial choice probabilities are as in (⋆). If all three alternatives are available, the odds ratio of 1 vs. 2 depends on the indirect utility of 3 (i.e. on Y_3 = exp(V_3)).

2. Models for Censored and Truncated Data

(i) Censoring occurs if values of a random variable in a certain subset of its support are transformed into a single value.

(ii) Truncation occurs if sample data are drawn from a certain subset of the support of the population distribution; e.g. y_n^* is latent, and one observes y_n = y_n^* if y_n^* ≥ 0. Here, unlike in (i), whenever y_n^* < 0 no observation is recorded (while in (i), zero is recorded).

Note: data can be censored, truncated, or truncated and censored.

2.1 Models for Truncated Data

Observe y_n = y_n^* if y_n^* > c, for some constant c. Let the CDF of the i.i.d. Y_n^* be F_{Y*}(y), with pdf f_{Y*}(y), and c ∈ supp(Y^*). Then, for y > c, the CDF of the observation Y_n is

  F_Y(y) = F_{Y*}(y | Y^* is observed) = Pr(c < Y^* ≤ y) / Pr(Y^* > c)
         = [F_{Y*}(y) − F_{Y*}(c)] / (1 − F_{Y*}(c)),

and its pdf is

  f_Y(y) = f_{Y*}(y) / (1 − F_{Y*}(c)).
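The truncated density just derived can be checked numerically: it should integrate to one over (c, ∞). A minimal sketch for the normal case (SciPy assumed; μ, σ, c are arbitrary):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def truncated_normal_pdf(y, mu, sigma, c):
    """pdf of Y = Y* | Y* > c for Y* ~ N(mu, sigma^2):
    f_Y(y) = (1/sigma) phi((y - mu)/sigma) / (1 - Phi((c - mu)/sigma)),
    for y > c, and zero otherwise."""
    y = np.asarray(y, dtype=float)
    dens = norm.pdf(y, loc=mu, scale=sigma) / norm.sf(c, loc=mu, scale=sigma)
    return np.where(y > c, dens, 0.0)

# Sanity check: the truncated density integrates to one over (c, infinity).
total, _ = quad(lambda y: float(truncated_normal_pdf(y, mu=1.0, sigma=2.0, c=0.0)),
                0.0, np.inf)
```

Here `norm.sf(c, loc=mu, scale=sigma)` computes the survival function 1 − Φ((c − μ)/σ), which is the normalizing constant in the denominator.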
Example: Y_n^* ~ i.i.d. N(μ, σ²); observe Y_n = Y_n^* if Y_n^* > c. Letting φ(x) = (1/√(2π)) exp(−x²/2), the pdf of N(0, 1), and Φ(x) = ∫_{−∞}^x φ(z) dz, it follows that

  f_Y(y) = (1/σ) φ((y − μ)/σ) / [1 − Φ((c − μ)/σ)].

Truncated Regression Model

Selection on the basis of the response variable. Example: observe (log) earnings of low-income families; want to estimate the general (log) earnings equation in the population (Hausman and Wise, 1977):

  y_n^* = x_n'β_0 + ε_n,  ε_n | x_n ~ N(0, σ_0²);

"low income family" is defined by y_n^* < c, for some known constant c; observe only family incomes for which y_n^* < c:

  y_n^* < c ⇔ ε_n < c − x_n'β_0.

Hence,

  f_Y(y_n | x_n; β_0, σ_0², c) = (1/σ_0) φ((y_n − x_n'β_0)/σ_0) / Φ((c − x_n'β_0)/σ_0),

and therefore the average log-likelihood is

  L_N = L_N(β, σ²; y, X, c)
      = (1/N) Σ_{n=1}^N ln[ (1/σ) φ((y_n − x_n'β)/σ) / Φ((c − x_n'β)/σ) ]
      ∝ −(1/2) ln(σ²) − (1/(2Nσ²)) Σ_{n=1}^N (y_n − x_n'β)²
        − (1/N) Σ_{n=1}^N ln[Φ((c − x_n'β)/σ)].

Note:

– The first two summands correspond to the average log-likelihood of the normal linear regression model;
– the third (last) summand is a correction term that accounts for the truncation of the sample. This can be exploited in estimation (see below).

Under the usual regularity conditions (as discussed under the heading of ML Theory), the MLE is √N-CAN and efficient.

If β_0 were estimated by OLS instead of ML, what would be the implication for the resulting estimator? We just saw that the ML and OLS estimators differ, so the OLS estimator cannot be BLUE. Which of the Gauss-Markov assumptions is/are not satisfied? We need to consider the conditional mean and conditional variance of the observed data.

Useful Lemma: If X ~ N(μ, σ²) and c is a constant, then

  E[X | X < c] = μ − σ φ((c − μ)/σ) / Φ((c − μ)/σ),
  var(X | X < c) = σ² [1 − δ((c − μ)/σ)],

where δ(z) = [φ(z)/Φ(z)] · [φ(z)/Φ(z) + z].

The following auxiliary result will be useful (here and elsewhere).
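Returning to the truncated regression log-likelihood above: a minimal ML sketch on simulated data (SciPy assumed; the true parameter values and the truncation point are made up). The objective is the negative average log-likelihood, with the ln Φ truncation correction:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Simulate the Hausman-Wise setup: keep only observations with y* < c.
beta0, sigma0, c, N = np.array([1.0, -0.5]), 1.0, 1.5, 5000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y_star = X @ beta0 + sigma0 * rng.normal(size=N)
keep = y_star < c
y, Xt = y_star[keep], X[keep]

def neg_avg_loglik(params):
    """Negative average log-likelihood of the truncated regression:
    normal regression term minus the ln Phi((c - x'b)/s) correction."""
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)            # parametrize via log to keep sigma > 0
    r = (y - Xt @ beta) / sigma
    ll = norm.logpdf(r) - np.log(sigma) - norm.logcdf((c - Xt @ beta) / sigma)
    return -ll.mean()

fit = minimize(neg_avg_loglik, x0=np.zeros(3), method="BFGS")
beta_hat, sigma_hat = fit.x[:-1], np.exp(fit.x[-1])
```

Dropping the ln Φ correction here would reproduce the OLS-style normal likelihood, which is inconsistent under truncation, as discussed next.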
The derivative of the normal pdf with respect to x is

  d/dx [ (1/σ) φ((x − μ)/σ) ] = d/dx [ (1/√(2πσ²)) exp(−(x − μ)²/(2σ²)) ]
    = −(1/√(2π)) ((x − μ)/σ³) exp(−(x − μ)²/(2σ²))
    = −((x − μ)/σ²) (1/σ) φ((x − μ)/σ).

Proof of the Lemma:

  E[X | X < c] = ∫_{−∞}^c x (1/σ) φ((x − μ)/σ) / Φ((c − μ)/σ) dx
    = ∫_{−∞}^c (x − μ + μ) (1/σ) φ((x − μ)/σ) / Φ((c − μ)/σ) dx
    = μ + ∫_{−∞}^c (x − μ) (1/σ) φ((x − μ)/σ) / Φ((c − μ)/σ) dx
    = μ − σ² ∫_{−∞}^c (d/dx)[(1/σ) φ((x − μ)/σ)] / Φ((c − μ)/σ) dx
    = μ − σ² (1/σ) φ((c − μ)/σ) / Φ((c − μ)/σ)
    = μ − σ φ((c − μ)/σ) / Φ((c − μ)/σ).

The expression for the conditional variance can be derived in an analogous fashion (cp. Greene (2008), 6th edition, p. 866).

Applying the Lemma to the truncated regression model:

  E[y_n | x_n, observed] = x_n'β_0 + E[ε_n | ε_n < c − x_n'β_0]
    = x_n'β_0 − σ_0 φ((c − x_n'β_0)/σ_0) / Φ((c − x_n'β_0)/σ_0).

Hence,

(i) the conditional mean of y_n, given x_n, for the observed data is not linear in the estimable parameter β_0;

(ii) the "inverse Mills ratio" φ((c − x_n'β_0)/σ_0) / Φ((c − x_n'β_0)/σ_0) amounts to an omitted variable in the OLS linear regression equation, so OLS suffers from omitted-variable bias and is inconsistent; essentially, the OLS linear regression equation is a mis-specified model, because the correct regression equation is nonlinear in β_0;

(iii) the conditional variance of y_n, given x_n, for the observed data is not homoskedastic; in fact,

  var(y_n | x_n, observed) = σ_0² [1 − δ((c − x_n'β_0)/σ_0)],

where δ(z) = [φ(z)/Φ(z)] · [φ(z)/Φ(z) + z]; i.e. this nonlinear regression problem is heteroskedastic, the conditional variances are informative about β_0, and efficient estimators need to account for heteroskedasticity (which OLS does not).

2.2 Models for Censored Data

Observe y_n = y_n^* 1{y_n^* > 0}. Again, let F_{Y*}(y) be the CDF of the latent Y^*, and let 0 ∈ supp(Y_n^*) for all n. Then, the CDF of the observation y_n is

  F_Y(y_n) = [F_{Y*}(0)]^{1{y_n = 0}} [F_{Y*}(y_n)]^{1{y_n > 0}}.
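As a quick check of the censored distribution just described (the values of μ and σ are arbitrary): the point mass at zero, Φ(−μ/σ), and the continuous normal part on (0, ∞) must sum to one.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma = 0.7, 1.3

# Mixed distribution of the censored observation y = y* 1{y* > 0}:
# a point mass Phi(-mu/sigma) at zero, plus the N(mu, sigma^2) density
# on the positive half-line.
mass_at_zero = norm.cdf(-mu / sigma)
cont_part, _ = quad(lambda y: norm.pdf(y, loc=mu, scale=sigma), 0.0, np.inf)
total = mass_at_zero + cont_part
```

The continuous part integrates to 1 − Φ(−μ/σ), so the total probability is one, confirming that no normalizing constant is needed (in contrast to the truncated case).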
The "pdf" is a mixture of a probability mass function (at zero) and a pdf (when y_n > 0):

  f_Y(y_n) = [F_{Y*}(0)]^{1{y_n = 0}} [f_{Y*}(y_n)]^{1{y_n > 0}}.

Example: y_n^* ~ N(μ, σ²); observe y_n = y_n^* 1{y_n^* > 0}, so that

  f_Y(y_n) = [Φ(−μ/σ)]^{1{y_n = 0}} [(1/σ) φ((y_n − μ)/σ)]^{1{y_n > 0}}.

Censored Regression Model

Example: willingness to pay (WTP) for a resource; surveys typically record non-negative WTP. Let the WTP y_n^* be modelled as

  y_n^* = x_n'β_0 + ε_n,  ε_n | x_n ~ N(0, σ_0²).

Observe y_n = 0 if y_n^* ≤ 0 and y_n = y_n^* if y_n^* > 0. Then,

  f(y_n | x_n; β_0, σ_0²) = [Φ(−x_n'β_0/σ_0)]^{1{y_n = 0}} [(1/σ_0) φ((y_n − x_n'β_0)/σ_0)]^{1{y_n > 0}}.

Assuming i.i.d. data, the log-likelihood function is

  L(β, σ; y, X) = Σ_{n=1}^N 1{y_n = 0} ln Φ(−x_n'β/σ)
                + Σ_{n=1}^N 1{y_n > 0} ln[(1/σ) φ((y_n − x_n'β)/σ)]
    = Σ_{n=1}^N [ 1{y_n = 0} ln Φ(−x_n'β/σ) + 1{y_n > 0} ln(1 − Φ(−x_n'β/σ)) ]
    + Σ_{n=1}^N 1{y_n > 0} ln[ (1/σ) φ((y_n − x_n'β)/σ) / (1 − Φ(−x_n'β/σ)) ].

⇒ The log-likelihood function is the sum of

(i) a binomial probit log-likelihood (censored vs. non-censored);
(ii) the log-likelihood of truncated data (subsample of non-censored data).

This suggests two approaches to estimation:

(1) ML: under regularity conditions, the MLE is √N-CAN and efficient; the MLE can be obtained from both representations of the log-likelihood function.

(2) Two-step procedure (due to J. Heckman), exploiting the decomposition of the log-likelihood into (i) and (ii).

Heckman 2-Step Procedure

(i) Estimate β_0/σ_0 = (β_01/σ_0, ..., β_0k/σ_0)' using a binomial probit on censored vs. non-censored; denote these ML estimates by (β/σ)^.

(ii) Estimate the inverse Mills ratio by

  M̂_n = φ(−x_n'(β/σ)^) / [1 − Φ(−x_n'(β/σ)^)],

as an estimate in the conditional mean of the truncated data (non-censored subsample):

  E[y_n | x_n, y_n > 0] = x_n'β_0 + σ_0 φ(−x_n'β_0/σ_0) / [1 − Φ(−x_n'β_0/σ_0)],

i.e. impute the missing variable in the OLS regression; then, in the 2nd step, estimate β_0 and σ_0 by OLS (y_n on x_n and M̂_n).

Note:

– This yields a √N-CAN estimator.
– This estimator is inefficient (relative to the MLE), because the information about β_0 and σ_0 in the conditional variance is not used.
– The procedure is computationally attractive, because it uses canned estimation routines (probit and OLS; trade-off: computational ease vs. efficiency).
– In censored and truncated regression models, the conditional mean is nonlinear in β_0, even though the latent variable model is linear. These models belong to the large class of nonlinear regression models:

  E[y_n | x_n; θ] = g(x_n, θ),

for some (smooth) known function g(·,·), and θ ∈ R^k unknown.

Incidental Truncation: Non-random Sample Selection, Self-Selection

• General setup: (1) a (latent) selection equation; (2) an equation of (primary) interest.

Classic examples:

1. Female labor supply:
(1) wage equation: difference between market and reservation wage, as a function of covariates;
(2) hours worked equation: only observed if the woman is working, i.e. whenever the market wage exceeds the reservation wage.

2. Migration models:
(1) net benefit from migrating;
(2) income of migrants: only observed for migrants.

Formal setup:

(1) (latent) selection equation: z_n^* = w_n'γ_0 + u_n;
(2) equation of interest: y_n = x_n'β_0 + ε_n.

Sampling rule: y_n is observed if z_n^* > 0. Suppose

  (ε_n, u_n)' ~ i.i.d. N( (0, 0)', [σ_ε², ρσ_ε σ_u; ·, σ_u²] ),

where ρ = σ_εu/(σ_ε σ_u) ∈ (−1, 1). Then,

  E[y_n | x_n, observed] = x_n'β_0 + E[ε_n | z_n^* > 0] = x_n'β_0 + E[ε_n | u_n > −w_n'γ_0],

i.e. we need the conditional distribution of ε_n given u_n. Refer to the handout.

Application of the properties of the conditional normal distribution yields

  ε_n | u_n ~ N( ρ (σ_ε/σ_u) u_n, σ_ε² (1 − ρ²) ).

Following similar steps as in computing the moments of the truncated normal, use this conditional distribution to obtain the moments of the incidentally truncated normal.

General result:

Lemma: Suppose

  (ε, u)' ~ i.i.d. N( (μ_ε, μ_u)', [σ_ε², ρσ_ε σ_u; ·, σ_u²] ).

Then, for c a scalar constant,

  E[ε | u > c] = μ_ε + ρσ_ε φ((c − μ_u)/σ_u) / [1 − Φ((c − μ_u)/σ_u)],
  var(ε | u > c) = σ_ε² [1 − ρ² δ((c − μ_u)/σ_u)],
  δ(z) = [φ(z)/(1 − Φ(z))] · [φ(z)/(1 − Φ(z)) − z].

Hence,

  E[y_n | x_n, observed] = x_n'β_0 + ρσ_ε φ(−w_n'γ_0/σ_u) / [1 − Φ(−w_n'γ_0/σ_u)].  (†)

Conclusion: OLS applied to (2) yields biased and inconsistent estimates, since the inverse Mills ratio is omitted; the inverse Mills ratio accounts for the non-random sample selection induced by (1).

Estimation: Notice first:

– σ_u² is not identified, only the ratio γ_0/σ_u is identifiable; impose σ_u² = 1;
– σ_ε² is not identified, only the product ρσ_ε is; impose σ_ε² = 1.

Approaches:

(i) ML: efficient, but computationally burdensome.

(ii) Heckman 2-step procedure:
(1) binomial probit to estimate γ_0; impute the inverse Mills ratio for the selected observations:

  M̂_n = φ(−w_n'γ̂) / [1 − Φ(−w_n'γ̂)].

(2) OLS in (†) for the selected observations: for these, regress y_n on x_n and M̂_n.
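The two-step procedure can be sketched on simulated data as follows (SciPy assumed; all parameter values and the design, including an extra covariate w2 that enters selection but not the outcome equation, are made up):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(3)
N = 20_000

# Simulated selection model with sigma_u and sigma_eps normalized to 1.
gamma0 = np.array([0.5, 1.0, 1.0])
beta0 = np.array([1.0, 2.0])
rho = 0.6
w1, w2 = rng.normal(size=N), rng.normal(size=N)
W = np.column_stack([np.ones(N), w1, w2])   # selection covariates
X = np.column_stack([np.ones(N), w1])       # outcome covariates
u = rng.normal(size=N)
eps = rho * u + np.sqrt(1 - rho**2) * rng.normal(size=N)
selected = W @ gamma0 + u > 0
y = (X @ beta0 + eps)[selected]             # outcome observed only if selected

# Step 1: binomial probit for gamma_0 on the selection indicator.
def probit_nll(g):
    xb = W @ g
    return -np.where(selected, norm.logcdf(xb), norm.logcdf(-xb)).sum()

gamma_hat = minimize(probit_nll, np.zeros(3), method="BFGS").x

# Step 2: impute the inverse Mills ratio for the selected observations;
# by symmetry of the normal, phi(-w'g)/(1 - Phi(-w'g)) = phi(w'g)/Phi(w'g).
wg = W[selected] @ gamma_hat
M_hat = norm.pdf(wg) / norm.cdf(wg)
Z = np.column_stack([X[selected], M_hat])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
beta_hat, rho_sigma_hat = coef[:2], coef[2]  # Mills coefficient estimates rho*sigma_eps
```

The coefficient on the imputed Mills ratio estimates ρσ_ε, consistent with (†); omitting M̂_n from the second-step regression would reproduce the omitted-variable bias discussed above.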
