# Lecture 6 – Limited Dependent Variable Models

Limited Dependent Variable Models & Sample Selection

Lecture 10, EMET 8002, August 28, 2009
## Review

In Thursday's lecture we studied two types of limited dependent variables:
- binary variables (i.e., y = 1 or y = 0)
- corner solutions (i.e., y = 0 or y > 0)

We learned that in both cases the OLS estimator remains unbiased and consistent (under the first four OLS assumptions), but the linear model can lead to some strange behaviour:
- predicted values greater than 1 or less than 0 in the binary context
- predicted values less than 0 in the corner solution context
We subsequently studied probit and logit models for the case of a binary dependent variable, and the Tobit model for the case of a corner solution dependent variable. Using maximum likelihood estimation we obtain consistent estimators of the model parameters.

The biggest difference from OLS is that the coefficients cannot be directly interpreted as the partial effects on E(y|x). Instead, the partial effect of $x_j$ on E(y|x) depends on all the parameters (including σ in the Tobit model) and on the values of x.
## Count data

Today we introduce a third commonly encountered type of limited dependent variable, a count variable: a dependent variable that can take on nonnegative integer values {0, 1, 2, …}. We are especially interested in cases where y takes on a relatively small number of values. For example:
- the number of children born to a woman
- the number of courses a student enrols in
- the number of jobs a person has throughout their career
- the number of unemployment spells
Similar to the problems encountered with binary and Tobit response variables, a linear model for E(y|x) may not provide a good fit for all values of the explanatory variables. We might still start with a linear model, if for no other reason than that it is a very simple model to interpret and understand.

A useful alternative is to model E(y|x) as an exponential function:

$$E(y\mid \mathbf{x}) = \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)$$

Since exp(z) is always positive, the predicted values of y will always be positive.
## Count response variable

Moreover, we have the added benefit of easy interpretation when we use the exponential function:

$$\ln\!\big[E(y\mid \mathbf{x})\big] = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$$

so that

$$\beta_j = \frac{\partial \ln\!\big[E(y\mid \mathbf{x})\big]}{\partial x_j}$$

or, approximately,

$$\%\Delta E(y\mid \mathbf{x}) \approx (100\,\beta_j)\,\Delta x_j$$
## Poisson Regression Model

If we need a more accurate estimate than the previous approximation, we can keep all explanatory variables fixed except the one of interest and find the exact proportionate change in E(y|x):

$$\exp(\beta_k \Delta x_k) - 1$$

Bottom line: we can interpret the coefficients as if the model is linear with ln(y) as the dependent variable.
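The two interpretations can be compared numerically; this is a small sketch with illustrative values (0.661 is the Poisson coefficient on Black from Example 17.3 below, with a dummy change Δx = 1):

```python
import math

# Approximate vs exact percentage change in E(y|x) for a coefficient beta_j
# and a change dx in x_j (illustrative values, not an estimation routine).
def approx_pct_change(beta_j, dx):
    # First-order approximation: %change is about 100 * beta_j * dx
    return 100 * beta_j * dx

def exact_pct_change(beta_j, dx):
    # Exact proportionate change: 100 * (exp(beta_j * dx) - 1)
    return 100 * (math.exp(beta_j * dx) - 1)

print(approx_pct_change(0.661, 1.0))   # 66.1
print(exact_pct_change(0.661, 1.0))    # about 93.7
```

For large values of β·Δx the approximation is poor, which is why the exact formula matters for dummy variables with sizeable coefficients.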
Since E(y|x) is modelled as being nonlinear in the parameters, we cannot use linear regression. We could use nonlinear least squares, which also minimizes the sum of squared residuals, but this estimator ignores the heteroskedasticity present in standard count data (see Wooldridge, 2002, Chapter 12).

Thus, we will use maximum likelihood estimation, which requires us to assume a distribution.
Unlike in the Tobit and probit models, it makes no sense to assume a normal distribution, since the normal distribution is for continuous variables and count data are obviously not continuous. Instead, the conventional distribution assumed is the Poisson distribution.
The Poisson distribution is determined entirely by its mean, so we only need to specify a model for E(y|x), which we assume has the same exponential form as specified earlier. Then:

$$P(y = h \mid \mathbf{x}) = \frac{\exp\!\big[-\exp(\mathbf{x}\boldsymbol\beta)\big]\,\big[\exp(\mathbf{x}\boldsymbol\beta)\big]^{h}}{h!},\qquad h = 0, 1, 2, \ldots$$
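A minimal sketch of this pmf, using made-up values for x and β (the leading 1.0 in x plays the role of the intercept):

```python
import math

# Poisson pmf with conditional mean mu = exp(x'beta), as on the slide.
def poisson_pmf(h, x, beta):
    mu = math.exp(sum(xj * bj for xj, bj in zip(x, beta)))
    return math.exp(-mu) * mu ** h / math.factorial(h)

x = [1.0, 0.5]        # intercept "1" plus one regressor
beta = [0.2, -0.4]    # hypothetical coefficients
probs = [poisson_pmf(h, x, beta) for h in range(50)]
print(sum(probs))     # probabilities over h = 0, 1, 2, ... sum to 1
```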
Given a random sample on (x, y), we can construct the log-likelihood function and use maximum likelihood estimation to produce consistent estimators of the model parameters:

$$\mathcal{L}(\boldsymbol\beta) = \sum_{i=1}^{n} \ell_i(\boldsymbol\beta) = \sum_{i=1}^{n} \big[\, y_i \mathbf{x}_i\boldsymbol\beta - \exp(\mathbf{x}_i\boldsymbol\beta) \,\big]$$

(the term $-\ln(y_i!)$ does not involve β and is dropped).

We cannot directly compare the magnitudes of the Poisson estimates with OLS estimates: the partial effect of $x_j$ on E(y|x) is $\exp(\mathbf{x}\boldsymbol\beta)\,\beta_j$.
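The log-likelihood above can be sketched directly; X and y here are tiny made-up data for illustration:

```python
import math

# Poisson log-likelihood, dropping the -ln(y_i!) constant that does not
# involve beta (matches the expression on the slide).
def poisson_loglik(beta, X, y):
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = sum(xij * bj for xij, bj in zip(xi, beta))
        ll += yi * xb - math.exp(xb)
    return ll

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]   # intercept plus one regressor
y = [0, 1, 3]
print(poisson_loglik([0.0, 0.5], X, y))
```

In practice this function would be handed to a numerical optimizer to obtain the MLE.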
Thus, we need a scale factor, the same as for the models studied yesterday. The average partial effect (APE) of $x_j$ is given by:

$$\widehat{APE}_j = n^{-1}\sum_{i=1}^{n}\exp(\mathbf{x}_i\hat{\boldsymbol\beta})\,\hat\beta_j = \hat\beta_j\, n^{-1}\sum_{i=1}^{n}\hat y_i = \bar y\,\hat\beta_j$$

(with an intercept in the model, the Poisson first-order conditions imply that the average of the fitted values $\hat y_i$ equals $\bar y$).

Thus, we can compare the magnitude of the OLS estimate to the Poisson estimate multiplied by the sample mean of the dependent variable.
Using the Poisson MLE is a natural first step, but it can be restrictive: all higher moments of the Poisson distribution are determined by the mean (notice that we have not made a variance assumption yet!). In particular, under the Poisson distribution the conditional variance is equal to the conditional mean:

$$\operatorname{var}(y\mid\mathbf{x}) = E(y\mid\mathbf{x})$$
Obviously this is a restrictive assumption; we never imposed anything like it when doing OLS. Nonetheless, even if the Poisson distribution fails to hold, we still get consistent, asymptotically normal estimators of β (see Wooldridge, 2002, Chapter 19). Unfortunately, the reported standard errors will then not be correct.
If we use the Poisson MLE but do not assume that the Poisson distribution is entirely correct, the procedure is called quasi-maximum likelihood estimation (QMLE). We allow a simple adjustment of the standard errors by assuming:

$$\operatorname{var}(y\mid\mathbf{x}) = \sigma^2\, E(y\mid\mathbf{x})$$

where σ² > 0 is an unknown parameter.
Under this relaxed conditional variance assumption, it is easy to adjust the reported standard errors:

1. Obtain the Poisson MLE estimates.
2. Define the residuals as $\hat u_i = y_i - \hat y_i$.
3. A consistent estimator of σ² is given by:

$$\hat\sigma^2 = \frac{1}{n-k-1}\sum_{i=1}^{n}\frac{\hat u_i^2}{\hat y_i}$$

4. Multiply the standard errors from the Poisson estimation by $\hat\sigma$ (the square root of the variance estimate) to obtain the Poisson QMLE standard errors.
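The adjustment in steps 2 to 4 can be sketched as follows; y, yhat and the MLE standard error are made-up numbers for illustration:

```python
import math

# QMLE variance factor: sigma2_hat = (1/(n-k-1)) * sum(u_i^2 / yhat_i),
# then scale each MLE standard error by sqrt(sigma2_hat).
def qmle_sigma2(y, yhat, k):
    n = len(y)
    return sum((yi - fi) ** 2 / fi for yi, fi in zip(y, yhat)) / (n - k - 1)

def adjust_se(se_mle, sigma2):
    return [s * math.sqrt(sigma2) for s in se_mle]

y    = [0, 1, 3, 2, 0, 1]                  # hypothetical counts
yhat = [0.5, 1.2, 2.1, 1.8, 0.7, 0.9]      # hypothetical Poisson fitted values
sigma2 = qmle_sigma2(y, yhat, k=1)
print(adjust_se([0.085], sigma2))          # scaled standard error
```

When $\hat\sigma^2 > 1$ (overdispersion) the QMLE standard errors are larger than the MLE ones; here the toy data happen to give $\hat\sigma^2 < 1$.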
## Example 17.3: Number of Arrests

We'll apply the Poisson model to the number of arrests using the dataset CRIME1.RAW. The dependent variable is narr86, the number of times a man was arrested in 1986.
| | OLS | Poisson | Poisson (QMLE) | APE, Poisson (QMLE) |
|---|---|---|---|---|
| Proportion of prior arrests that led to conviction | -0.132 (0.040) | -0.402 (0.085) | -0.402 (0.105) | -0.162 (0.040) |
| Black | 0.327 (0.045) | 0.661 (0.074) | 0.661 (0.091) | 0.328 (0.061) |
The OLS estimate suggests that if the proportion of prior arrests leading to a conviction increases by 10 percentage points (i.e., Δx = 0.1), the expected number of arrests falls by 0.013.

We cannot directly compare this to the Poisson coefficient, which suggests that the same increase in the proportion of prior arrests leading to a conviction reduces expected arrests by about 4 percent ($\exp(-0.402\times 0.1) - 1 \approx -0.039$).

We can, however, compare the magnitude of the OLS coefficient to the APE from the Poisson model, which suggests a fall of 0.016 in the number of arrests.
## Example of Poisson: Silva and Tenreyro (2006)

International trade theory gives predictions for a gravity model of international trade flows. The equation often looks like:

$$T_{ij} = \alpha_0\, Y_i^{\alpha_1}\, Y_j^{\alpha_2}\, D_{ij}^{\alpha_3}$$

The stochastic model is given by:

$$T_{ij} = \alpha_0\, Y_i^{\alpha_1}\, Y_j^{\alpha_2}\, D_{ij}^{\alpha_3}\,\eta_{ij},\qquad E(\eta_{ij}\mid Y_i, Y_j, D_{ij}) = 1$$
Researchers have traditionally log-linearized the model, estimating the specification:

$$\ln T_{ij} = \ln\alpha_0 + \alpha_1\ln Y_i + \alpha_2\ln Y_j + \alpha_3\ln D_{ij} + \ln\eta_{ij}$$

This introduces two problems:
- possible inconsistency
- what to do with zero trade flows
Possible inconsistency:
- The expected value of the log of a random variable depends on both the mean of the random variable and its higher-order moments.
- Hence, if the higher-order moments of η depend on any of the explanatory variables, then the expected value of ln(η) will also depend on these explanatory variables.
- Thus OLS is no longer consistent, as the explanatory variables are correlated with the error term ln(η).
What to do with zero trade flows:
- ln(0) is undefined.
- Researchers have generally used one of three options: drop all observations with zero trade flows from the sample; add 1 and use ln(T_ij + 1); or use a Tobit estimator.
- Each of these can introduce inconsistencies into the econometric model.
The stochastic model can instead be written in exponential form:

$$T_{ij} = \exp\!\big(\ln\alpha_0 + \alpha_1\ln Y_i + \alpha_2\ln Y_j + \alpha_3\ln D_{ij}\big) + \varepsilon_{ij}$$

which can be estimated by Poisson pseudo-maximum likelihood (PPML) without taking logs.
| | OLS ln(T_ij) | OLS ln(T_ij+1) | PPML T_ij>0 | PPML T_ij |
|---|---|---|---|---|
| log exporter's GDP | 0.938 (0.012) | 1.128 (0.011) | 0.721 (0.027) | 0.733 (0.027) |
| log importer's GDP | 0.798 (0.012) | 0.866 (0.012) | 0.732 (0.028) | 0.741 (0.027) |
| log distance | -1.166 (0.034) | -1.151 (0.040) | -0.766 (0.055) | -0.784 (0.055) |
| RESET test p-value | 0.000 | 0.000 | 0.941 | 0.331 |
Interpretation:
- The coefficient estimates in the two PPML columns are very similar. This suggests that sample truncation (dropping the zero observations) is not important for explaining the differences between the OLS and PPML estimates; heteroskedasticity is the more likely culprit. Indeed, the null hypothesis of homoskedastic errors is strongly rejected.
- The PPML estimates reveal that the elasticity of trade with respect to exporter's and importer's GDP is not 1, as commonly believed.
- Geographical distance is less of a trade deterrent according to the PPML estimates.
- Silva and Tenreyro perform a heteroskedasticity-robust RESET test: the OLS specifications do not pass it, whereas the PPML specifications do.
Take-home message:
- Pay attention to theory as a motivator for your empirical specification.
- In this case, log-linearizing the empirical model in the presence of heteroskedasticity leads to inconsistent estimates.
## Censored Regression Models

In the previous limited dependent variable models, there was never an issue of observing the outcome:
- In the Tobit model a non-trivial fraction of the observations took the value 0.
- Similarly, in the Poisson model we observed the outcome for each observation, but again some values were 0.

In censored regression models we do not observe the true value of the dependent variable for some observations. This is essentially a missing data problem, but one where we have some information about the missing value.
Examples:
- In some household surveys, household income may be "top coded" to help ensure the anonymity of respondents. In other words, for individuals with high incomes we don't observe the income level, but only that it is above some threshold: y = {10000, 75000, >250000, 25000, 43000, …}
- In jurisdictions with minimum wage laws, we do not observe the value of the marginal product of labour for workers paid the minimum wage.
## Censored Regression Models and Tobit Models

Recall that in the Tobit model we motivated the estimation strategy through an unobserved latent variable y*. The observed outcome variable, y, was related to y* according to y = max(0, y*). It is a common mistake to think of this situation as a censored model since y* is censored at 0. However, y* is not our variable of interest; the variable of interest is y, and we have no problem observing y.
We are going to study the censored normal regression model:

$$y_i = \mathbf{x}_i\boldsymbol\beta + u_i,\qquad u_i \mid \mathbf{x}_i, c_i \sim N(0, \sigma^2)$$
$$w_i = \min(y_i, c_i)$$

We only observe $y_i$ if it is below the threshold $c_i$; otherwise we observe $c_i$. We have explicitly allowed the threshold to differ between units of observation. The censoring is represented by the variable $w_i$.
If we observed a random sample of (x, y) we could simply use OLS, and statistical inference would be standard. If we instead used OLS only on the uncensored observations (i.e., $y_i < c_i$), the estimator would be inconsistent. An OLS estimator on the sample (x, w) is similarly inconsistent unless there is no censoring (i.e., (x, w) = (x, y)).
If the model is given by

$$y_i = \mathbf{x}_i\boldsymbol\beta + u_i,\qquad u_i \mid \mathbf{x}_i, c_i \sim N(0, \sigma^2),\qquad w_i = \min(y_i, c_i),$$

we can estimate it using maximum likelihood on a random sample of $(\mathbf{x}_i, w_i)$, but we need the density of $w_i$ given $(\mathbf{x}_i, c_i)$:
- For $w_i = y_i$, the density of $w_i$ is the same as that of $y_i$: $N(\mathbf{x}_i\boldsymbol\beta, \sigma^2)$.
- For $w_i = c_i$, we need the censoring probability given $\mathbf{x}_i$:

$$P(w_i = c_i \mid \mathbf{x}_i) = P(y_i \ge c_i \mid \mathbf{x}_i) = P(u_i \ge c_i - \mathbf{x}_i\boldsymbol\beta) = 1 - \Phi\!\left(\frac{c_i - \mathbf{x}_i\boldsymbol\beta}{\sigma}\right)$$
We can combine these two parts to obtain the density of $w_i$ given $(\mathbf{x}_i, c_i)$:

$$f(w \mid \mathbf{x}_i, c_i) = \begin{cases} 1 - \Phi\!\left(\dfrac{c_i - \mathbf{x}_i\boldsymbol\beta}{\sigma}\right), & w = c_i \\[2ex] \dfrac{1}{\sigma}\,\phi\!\left(\dfrac{w - \mathbf{x}_i\boldsymbol\beta}{\sigma}\right), & w < c_i \end{cases}$$
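The log of this density, summed over observations, gives the log-likelihood; a minimal sketch with made-up data and parameter values:

```python
import math

# Log-density of w given (x, c) in the censored normal model; Phi is
# computed via math.erf. beta and sigma are illustrative values.
def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def norm_pdf(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def censored_logdensity(w, x, c, beta, sigma):
    xb = sum(xj * bj for xj, bj in zip(x, beta))
    if w >= c:   # censored observation: w = c
        return math.log(1 - norm_cdf((c - xb) / sigma))
    # uncensored observation: normal density of y evaluated at w
    return math.log(norm_pdf((w - xb) / sigma) / sigma)

X = [[1.0, 0.5], [1.0, 1.5]]
w = [2.0, 3.0]                 # second observation is censored at c = 3.0
c = [5.0, 3.0]
beta, sigma = [1.0, 1.0], 1.0
ll = sum(censored_logdensity(wi, xi, ci, beta, sigma)
         for wi, xi, ci in zip(w, X, c))
print(ll)
```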
Unlike in the Tobit model, we can directly interpret the coefficients in terms of their partial effects on the observed variable y (just like using OLS on a linear model).
## Example: Duration of Recidivism

Using the dataset recid.dta we are going to explore what factors influence the number of months until a former inmate is arrested again after being released from prison.

Out of 1,445 former inmates, 893 had not been arrested again during the time period in which they were followed:
- These observations are censored.
- They are top coded according to the number of months they were observed after release.
- This threshold varies between 70 and 81 months, depending on how early they were released relative to when the study concluded.

The dependent variable is ln(duration).
| Variable | Coef. (SE) | Variable | Coef. (SE) |
|---|---|---|---|
| workprg | -0.063 (0.120) | drugs | -0.298 (0.133) |
| # of prior convictions | -0.137 (0.021) | black | -0.543 (0.117) |
| total months spent in prison | -0.019 (0.003) | married | 0.341 (0.140) |
| felon | 0.444 (0.145) | educ | 0.023 (0.025) |
| alcohol | -0.635 (0.144) | age | 0.0039 (0.0006) |
- An inmate with one more prior conviction has a duration until next arrest that is almost 14% shorter (-0.137 × 1; remember that the dependent variable is ln(duration), so the coefficients are interpreted as proportionate changes).
- An additional year of time served reduces the duration until the next offence.
- Those with a history of alcohol or drug abuse have substantially shorter expected durations, as do black men compared to white men; this might suggest running two separate models, one each for white and black men.
- Being involved in a work program, the only policy variable in the regression, does not have a statistically significant impact.
As per usual, if any of our modelling assumptions is false (heteroskedasticity, nonnormality), the ML estimators are generally inconsistent. Thus censoring can be very costly: if the data were not censored, we could use OLS to obtain consistent estimators without assuming homoskedasticity or normality. Of course, we can always turn to Wooldridge (2002) for more advanced methods that loosen some of these assumptions (see Chapter 16 for censored regression models and Chapter 20 for duration analysis).
## Truncated Regression Models

Unlike in censored models, we do not observe any information for a subset of the population, not even the explanatory variables.

The truncated normal regression model begins with an underlying population model that satisfies the classical linear model assumptions:

$$y = \mathbf{x}\boldsymbol\beta + u,\qquad u \mid \mathbf{x} \sim N(0, \sigma^2)$$
If we observed a random sample, OLS would be the efficient estimator; but we are violating assumption MLR.2, since we do not have a random sample. In particular, a random draw $(\mathbf{x}_i, y_i)$ is observed only if $y_i \le c_i$, where the truncation threshold $c_i$ can depend on exogenous variables, including the x's.

To estimate the parameters of the model (the $\beta_j$'s and σ) we need the distribution of $y_i$ given that it is below the threshold $c_i$.
$$g(y \mid \mathbf{x}_i, c_i) = \frac{f(y \mid \mathbf{x}_i\boldsymbol\beta, \sigma^2)}{F(c_i \mid \mathbf{x}_i\boldsymbol\beta, \sigma^2)},\qquad y \le c_i$$

From this, we can construct the log-likelihood function and estimate the parameters using MLE. This leads to consistent, approximately normal estimators.
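A small numeric sketch of this truncated density, with illustrative mean, σ and threshold (rescaling by F(c) makes the density integrate to one over the observed region):

```python
import math

# Truncated normal density g(y|x,c) = f(y)/F(c) for y <= c.
def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def norm_pdf_scaled(y, mean, sigma):
    z = (y - mean) / sigma
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))

def truncated_density(y, mean, sigma, c):
    if y > c:
        return 0.0   # outside the observed region
    return norm_pdf_scaled(y, mean, sigma) / norm_cdf((c - mean) / sigma)

# Truncating N(0,1) from above at c = 0 doubles the density below 0,
# since F(0) = 0.5:
print(truncated_density(0.0, 0.0, 1.0, 0.0))
```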
Example:
- Hausman and Wise (1977) emphasize that OLS applied to a sample truncated from above generally produces estimators biased towards 0.
- Suppose we are investigating the relationship between income and education, but we only observe people for whom income is below some threshold.

[Figure: "Wages and education": scatter plot of average hourly earnings (0 to 25) against years of education (0 to 20).]
## Sample Selection Corrections

Often, respondents fail to answer particular survey questions. This generally leaves us with two initial options:
- don't include variables in which there are missing values, to keep the full sample size;
- drop observations with missing values, to keep all the variables of interest.

This situation is known as nonrandom sample selection. Thus far, for all our estimators except the truncated model, we have assumed that we had a random sample of (x, y). This assumption was always part of the set of assumptions involved in claiming we had either an unbiased or consistent estimator.
Nonrandom sample selection can arise in panel data settings:
- Suppose we have two years of data but, due to attrition, some people leave the sample and are thus not reinterviewed in the second year.
- This can be very problematic for policy analysis if attrition is related to the effectiveness of the program.

Example:
- Suppose we are testing for convergence of per capita income across countries, as predicted by the Solow model, and we select only those countries for which data are easily available. In the past, only rich countries could afford decent statistics bureaus, so the final sample is composed only of countries that are rich today, regardless of whether they were rich or poor at the beginning of the sample period. We will find convergence purely because of the nonrandom sample.
[Figure: per capita income paths over time, illustrating how selecting countries that are rich at the end of the sample generates spurious convergence.]
## Sample Selection Corrections: Is OLS Biased?

Consider the following population model:

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + u,\qquad E(u \mid \mathbf{x}) = 0$$

Let n be the size of a random sample from the population. Define $s_i = 1$ if we observe all of $(\mathbf{x}_i, y_i)$ for observation i, and $s_i = 0$ if we don't. Thus $s_i = 1$ indicates that we will use the observation in our analysis, and $s_i = 0$ indicates that we will drop it.
We are interested in the properties of the OLS estimator applied to the selected sample, with $n_1 < n$ observations. We effectively estimate the following regression:

$$s_i y_i = s_i \mathbf{x}_i\boldsymbol\beta + s_i u_i$$

The OLS estimators from the above regression are consistent if the error term has zero mean and is uncorrelated with each explanatory variable:
- $E(su) = 0$
- $E[(s x_j)(s u)] = E(s x_j u) = 0$ (using $s^2 = s$)

The key condition for an unbiased estimator is $E(su \mid \mathbf{x}) = 0$.
## Sample Selection Corrections: When is OLS on the Selected Sample Consistent?

There are three cases in which OLS is still unbiased, or at least consistent:
1. s is a function only of the explanatory variables;
2. sample selection is entirely random;
3. s depends on the explanatory variables and additional random terms that are independent of x and u.
Case 1: s is a function only of the explanatory variables. Since s is fixed once we condition on x,

$$E(su \mid \mathbf{x}) = s\,E(u \mid \mathbf{x}) = 0$$

Thus, OLS on the selected sample is unbiased in this case.

Example: suppose we are estimating a wage regression where the explanatory variables are education, experience, tenure, gender, and marital status, all assumed to be exogenous. If selection is based solely on these variables (e.g., we only sample individuals with at least one year of tenure), then OLS is unbiased.
Case 2: sample selection is entirely random.

$$E(s x_j u) = E(s)\,E(x_j u) = 0$$

Intuitively, if we started with a random sample and randomly dropped observations, all we have done is decrease the sample size. Assuming there is no perfect multicollinearity in the selected sample, OLS will still be unbiased.
Case 3: s depends on the explanatory variables and additional random terms that are independent of x and u.
- The argument here is very similar to the argument when s is a function solely of x.
- Example: suppose IQ is an explanatory variable in a wage regression, but this information is missing for some individuals. Suppose s = 1 if IQ ≥ v and s = 0 if IQ < v, where v is an unobserved random variable that is independent of IQ, u, and the remaining x's. Then, conditional on the explanatory variables, s is independent of u.
## Sample Selection Corrections: Incidental Truncation

Let's consider the case of incidental truncation: we observe all of the x variables for each observation, but we only observe y for some of the observations.

A good example is when we are interested in the log of the wage offer as the dependent variable: for individuals not in the labour force we do not observe the wage offer. This is incidental truncation because observing y depends on another variable, labour force participation.
The usual approach to incidental truncation is to add an explicit selection equation to the model:

$$y = \mathbf{x}\boldsymbol\beta + u,\qquad E(u \mid \mathbf{x}) = 0$$
$$s = \mathbf{1}[\mathbf{z}\boldsymbol\gamma + v \ge 0]$$

We will generally assume that the explanatory variables in the selection equation are exogenous in the population model equation: $E(u \mid \mathbf{x}, \mathbf{z}) = 0$.

Furthermore, we will require x to be a strict subset of z (i.e., there is at least one variable that influences selection, but does not influence the outcome).
Further estimation assumptions:
- the error term, v, in the sample selection equation is independent of z;
- v has a standard normal distribution.

Correlation between u and v causes a sample selection problem:

$$
\begin{aligned}
E(y \mid \mathbf{z}, v) &= E(\mathbf{x}\boldsymbol\beta + u \mid \mathbf{z}, v) && \text{definition of } y\\
&= E(\mathbf{x}\boldsymbol\beta \mid \mathbf{z}, v) + E(u \mid \mathbf{z}, v) && \text{linearity of the } E(\cdot) \text{ operator}\\
&= \mathbf{x}\boldsymbol\beta + E(u \mid \mathbf{z}, v) && \mathbf{x} \text{ is a subset of } \mathbf{z}\\
&= \mathbf{x}\boldsymbol\beta + E(u \mid v) && \text{we assume } (u, v) \text{ is independent of } \mathbf{z}
\end{aligned}
$$
Now, suppose that (u, v) has a jointly normal distribution. One of the implications is that for some parameter ρ we can express the conditional expectation as $E(u \mid v) = \rho v$. Then:

$$E(y \mid \mathbf{z}, v) = \mathbf{x}\boldsymbol\beta + \rho v$$

We don't observe v, but we can use the above equation to compute $E(y \mid \mathbf{z}, s)$.
$$
\begin{aligned}
E(y \mid \mathbf{z}, s) &= E(\mathbf{x}\boldsymbol\beta + u \mid \mathbf{z}, s) && \text{definition of } y\\
&= E(\mathbf{x}\boldsymbol\beta \mid \mathbf{z}, s) + E(u \mid \mathbf{z}, s) && \text{linearity of the } E(\cdot) \text{ operator}\\
&= \mathbf{x}\boldsymbol\beta + E(u \mid \mathbf{z}, s) && \mathbf{x} \text{ is a function of } \mathbf{z}\\
&= \mathbf{x}\boldsymbol\beta + E\big[E(u \mid \mathbf{z}, v) \mid \mathbf{z}, s\big] && \text{law of iterated expectations}\\
&= \mathbf{x}\boldsymbol\beta + E\big[E(u \mid v) \mid \mathbf{z}, s\big] && (u, v) \text{ independent of } \mathbf{z}\\
&= \mathbf{x}\boldsymbol\beta + E[\rho v \mid \mathbf{z}, s] && (u, v) \text{ jointly normal}\\
&= \mathbf{x}\boldsymbol\beta + \rho\, E[v \mid \mathbf{z}, s]
\end{aligned}
$$
Because the selected sample has only s = 1, we need only find $E(y \mid \mathbf{z}, s = 1)$. If s = 1, then it must be that:

$$\mathbf{z}\boldsymbol\gamma + v > 0 \quad\text{or}\quad v > -\mathbf{z}\boldsymbol\gamma$$

Since we assumed v is normally distributed:

$$E[v \mid \mathbf{z}, s = 1] = E(v \mid v > -\mathbf{z}\boldsymbol\gamma) = \frac{\phi(\mathbf{z}\boldsymbol\gamma)}{\Phi(\mathbf{z}\boldsymbol\gamma)} = \lambda(\mathbf{z}\boldsymbol\gamma)$$

(see Johnson, Kotz and Balakrishnan (1994, pp. 156-158) for details). Note that this formula applies to truncation from below.
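The inverse Mills ratio λ(z) = φ(z)/Φ(z) is easy to compute directly; a minimal sketch using `math.erf` for Φ:

```python
import math

# Inverse Mills ratio: the mean of a standard normal truncated from
# below at -z, lambda(z) = phi(z) / Phi(z).
def inverse_mills(z):
    phi = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    Phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return phi / Phi

print(inverse_mills(0.0))   # phi(0)/Phi(0) = 0.3989.../0.5
```

λ(z) is positive and decreasing in z: observations with a high selection index need little adjustment, while marginal ones need a lot.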
Thus, we can rewrite our conditional expectation as:

$$E(y \mid \mathbf{z}, s = 1) = \mathbf{x}\boldsymbol\beta + \rho\,\lambda(\mathbf{z}\boldsymbol\gamma)$$

This formula shows that using OLS on the selected sample will be consistent if ρ = 0, since the adjustment factor is then no longer relevant. We have ρ = 0 when u and v are uncorrelated (in other words, when the unobserved characteristics that influence selection are uncorrelated with the unobserved characteristics that influence the outcome).
Of course, we do not observe γ and thus we need to estimate it. Under the previous assumptions:

$$P(s = 1 \mid \mathbf{z}) = \Phi(\mathbf{z}\boldsymbol\gamma)$$

so we can use a probit model (on the entire sample) to estimate γ.
## Summary of the Heckit Procedure

1. Using all observations, estimate a probit model of $s_i$ on $\mathbf{z}_i$ and obtain the estimates $\hat{\boldsymbol\gamma}$.
2. Compute the inverse Mills ratio for each observation: $\hat\lambda_i = \lambda(\mathbf{z}_i\hat{\boldsymbol\gamma})$.
3. Using the selected sample, run the regression of $y_i$ on $\mathbf{x}_i$ and $\hat\lambda_i$.

The resulting estimator is consistent and approximately normally distributed.
## Testing for sample selection

Under the null hypothesis of no sample selection, ρ = 0 and we can use the usual t statistic on $\hat\lambda$.

Implications:
- All variables that appear in the outcome equation (the x's) should appear in the selection equation.
- We need at least one variable in the selection equation that does not appear in the outcome equation. Strictly speaking this is not absolutely necessary, but: the inverse Mills ratio is a nonlinear function of z that is often well approximated by a linear function. Thus, if z = x, the estimated inverse Mills ratio will be highly correlated with x, potentially leading to a severe multicollinearity problem.
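The near-linearity of the inverse Mills ratio over a typical range of the index can be checked numerically; this sketch (grid values are illustrative) computes the sample correlation between z and λ(z):

```python
import math

# Show that lambda(z) = phi(z)/Phi(z) is close to linear over a moderate
# range of the index, which is what drives the collinearity problem.
def inverse_mills(z):
    phi = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    Phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return phi / Phi

zs = [i / 10 for i in range(-15, 16)]   # index values from -1.5 to 1.5
ls = [inverse_mills(z) for z in zs]

n = len(zs)
mz, ml = sum(zs) / n, sum(ls) / n
cov = sum((z - mz) * (l - ml) for z, l in zip(zs, ls))
corr = cov / math.sqrt(sum((z - mz) ** 2 for z in zs) *
                       sum((l - ml) ** 2 for l in ls))
print(corr)   # close to -1: lambda(z) is nearly a linear function of z
```

A correlation near -1 means that, with z = x, the regression of y on x and λ̂ is close to perfectly collinear; an exclusion restriction breaks this.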
## Example: Wage offer for married women

We're going to apply the sample selection procedure to the data on married women in mroz.dta:
- y = ln(wage)
- x = {education, experience, experience²}
- z = {x, other income, # young children, # older children}

    heckman lwage educ exper expersq, select(inlf=educ exper expersq nwifeinc age kidslt6 kidsge6) twostep
| | OLS | Heckit |
|---|---|---|
| Education | 0.108 (0.014) | 0.109 (0.016) |
| Experience | 0.042 (0.012) | 0.044 (0.016) |
| Experience² | -0.00081 (0.00039) | -0.00086 (0.00044) |
| $\hat\lambda$ | | 0.032 (0.134) |
The Heckit coefficients can be directly interpreted as partial effects on ln(wage). Remember, they come from a linear equation estimated using OLS; it is simply estimated on a reduced sample (only those women who report a wage) and with the selection-adjustment factor included.

In this instance, there is no statistical evidence of a selection effect: the estimate on the inverse Mills ratio is not statistically significant.
## Endogeneity and selection

If an explanatory variable is also endogenous, we need one exogenous variable that influences selection and another exogenous variable that influences the endogenous variable, and both have to be excluded from the structural model.
## Practice questions

17.4, 17.5(iii), 17.6, 17.7; C17.5(i and ii), C17.6, C17.7, C17.8
## Computer Exercise C17.8

The file JTRAIN2.RAW contains data on a job training experiment for a group of men. Men could enter the program starting in January 1976 through about mid-1977; the program ended in December 1977. The idea is to test whether participation in the job training program had an effect on unemployment probabilities and earnings in 1978.

The variable train is the job training indicator. How many men in the sample participated in the job training program? What was the highest number of months a man actually participated in the program?
Computer Exercise C17.8
tab train
This shows us that 185 men out of 445 participated
in the program

tab mostrn
This shows us that the maximum duration of
participating in the program was 24 months
Computer Exercise C17.8
Run a linear regression of train on unem74, unem75,
age, educ, black, hisp, and married. Are these
variables jointly significant at the 5% level?

regress train unem74 unem75 age educ black hisp
married
Computer Exercise C17.8
Source |       SS       df      MS          Number of obs =   445
-------------+------------------------------   F( 7, 437) = 1.43
Model | 2.41922955          7 .345604222        Prob > F   = 0.1915
Residual | 105.670658 437 .241809286                R-squared   = 0.0224
Total | 108.089888 444 .243445693               Root MSE    = .49174

------------------------------------------------------------------------------
train |      Coef. Std. Err.        t P>|t|        [95% Conf. Interval]
-------------+----------------------------------------------------------------
unem74 |        .02088 .0772939           0.27 0.787 -.1310341            .172794
unem75 | -.0955711 .0719021 -1.33 0.184                        -.236888 .0457459
age | .0032057 .0034027               0.94 0.347         -.003482 .0098933
educ | .0120131 .0133419               0.90 0.368 -.0142092 .0382354
black | -.0816663 .0877325 -0.93 0.352 -.2540963 .0907637
hisp | -.2000168 .1169708 -1.71 0.088 -.4299122 .0298785
married | .0372887 .0644037                0.58 0.563 -.0892909 .1638683
_cons | .3380222 .1894451                1.78 0.075 -.0343147 .7103591
Computer Exercise C17.8
Estimate a probit version of the previous regression.
Are the variables jointly significant?

probit train unem74 unem75 age educ black hisp
married
Computer Exercise C17.8
Probit regression                      Number of obs =      445
LR chi2(7)   =    10.18
Prob > chi2   =   0.1785
Log likelihood = -297.0088                 Pseudo R2    =   0.0169

------------------------------------------------------------------------------
train |      Coef. Std. Err.        z P>|z|         [95% Conf. Interval]
-------------+----------------------------------------------------------------
unem74 | .0530256 .1992686                  0.27 0.790 -.3375337 .4435849
unem75 | -.2477249            .18505 -1.34 0.181 -.6104163 .1149665
age | .0083443 .0087982               0.95 0.343 -.0088999 .0255886
educ | .0314431 .0343238               0.92 0.360 -.0358304 .0987165
black | -.2069299 .2249003 -0.92 0.358 -.6477264 .2338666
hisp | -.5397772 .3085029 -1.75 0.080 -1.144432 .0648773
married | .0966251 .1655823                0.58 0.560 -.2279101 .4211604
_cons | -.4241079 .4870267 -0.87 0.384 -1.378663 .5304469
Computer Exercise C17.8
Based on the previous two answers, can we treat
participation in job training as exogenous for
explaining unemployment in 1978? Explain.
Since participation in the program was randomly
assigned it is not surprising that train appears to be
independent appears to be independent of other
factors.
However, there can be a difference between eligibility
and actual participation as men can refuse to participate
Computer Exercise C17.8
Run a simple regression of unem78 on train.
Interpret.
unem78   Coef.    Std. Err.   t       P>t     [95% Conf.Interval]

train    -.1106029 .0441888   -2.50   0.013   -.1974486 -.0237572
_cons    .3538462 .0284917    12.42   0.000   .2978505 .4098418

Participating in the job training program lowers the
estimated probability of being unemployed in 1978
by 11.1 percent. The probability of being unemployed
in 1978 in this sample was 35.4 percent, so this is a
large difference.
Computer Exercise C17.8
Do the same with a probit model.
unem78      Coef.        Std. Err.   z            P>z         [95% Conf.Interval]

train       -.3209508 .1284763       -2.50        0.012       -.5727597 -.0691418
_cons       -.3749572 .0797458       -4.70        0.000       -.5312561 -.2186583

Recall, we cannot directly compare the probit
estimates with the LPM estimates, so let’s calculate
the average partial effect
Average marginal effects on Prob(unem78==1) after probit

------------------------------------------------------------------------------
unem78 |         Coef. Std. Err.         z P>|z|        [95% Conf. Interval]
-------------+----------------------------------------------------------------
train | -.1106029 .0432942 -2.55 0.011 -.1954579 -.0257479
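The reported marginal effect can be reproduced by hand. With train as the only regressor, the effect of the dummy is Φ(b₀ + b₁) − Φ(b₀) for every observation, using the probit coefficients above:

```python
import math

# Average partial effect of a binary regressor in a probit with no other
# covariates: Phi(b0 + b1) - Phi(b0), with b0, b1 from the probit output.
def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

b0, b1 = -0.3749572, -0.3209508   # _cons and train
ape = norm_cdf(b0 + b1) - norm_cdf(b0)
print(ape)   # close to the reported -.1106
```

This matches the LPM estimate closely, which is typical when fitted probabilities are not near 0 or 1.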
