# QUALITATIVE AND LIMITED DEPENDENT VARIABLE MODELS

• Models that describe choice behavior
• Models for dependent variables that are limited, that is, the range of values is constrained, or the values are not completely observable

Examples:
• a worker decides to drive to work or not

• a high school graduate decides to go to
college or not

• a household decides to purchase a house
or rent
• why are some loan applications accepted
and others not?
Qualitative Choice Models:
Binary choice models
These arise when an individual chooses between two alternatives.
Models with Binary Dependent Variables
• The dependent variable assumes only 2 values:
1 if the outcome is chosen and 0 if not
• For these models, least squares estimation methods are not the best choice.

• In this case OLS is generally biased and inconsistent as an estimator of the choice probabilities
• OLS also suffers from a heteroscedasticity problem
• Maximum likelihood estimation (MLE) is the usual method used
• MLE of probit (or logit) discrete choice
model
• Slopes can only be estimated up to a
scale factor
• But coefficient signs and t-values have the
usual interpretation
An example
• Y = Affairs = 1 if individual has had an
affair
• X1 = Dummy for male
• X2 = Years of marriage
• X3 = Dummy for kids in the marriage
• X4 = Dummy for religion
• X5 = Years of education
• X6 = Dummy for Happy
Logit (MLE) estimate gives

Estimated index (log-odds that Y = 1) = 1.29 + 0.25X1 + 0.05X2 + 0.44X3 - 0.89X4 + 0.01X5 - 0.87X6

The coefficients on X2, X4 and X6 have p-values below 5%.
Logit coefficients do not directly measure marginal effects on the probability, so it is hard to interpret their size.
However, we can interpret the signs.
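
To make the sign interpretation concrete, the sketch below plugs the coefficients reported above into the logistic function for one made-up individual. The dictionary keys and the individual's characteristics are illustrative assumptions, not values from the underlying data set.

```python
import math

# Coefficients copied from the logit estimate above (intercept, X1..X6).
# The key names are labels chosen here for readability.
COEFS = {
    "male": 0.25,
    "years_married": 0.05,
    "kids": 0.44,
    "religious": -0.89,
    "education": 0.01,
    "happy": -0.87,
}
INTERCEPT = 1.29


def logistic(z):
    """Map the linear index z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_probability(person):
    """Predicted P(Y = 1) for a dict of regressor values."""
    index = INTERCEPT + sum(COEFS[name] * value for name, value in person.items())
    return logistic(index)


# Hypothetical individual (assumed values, for illustration only).
person = {"male": 1, "years_married": 10, "kids": 1,
          "religious": 0, "education": 16, "happy": 0}

p_unhappy = predicted_probability(person)
p_happy = predicted_probability({**person, "happy": 1})

print(f"P(affair | unhappy) = {p_unhappy:.3f}")
print(f"P(affair | happy)   = {p_happy:.3f}")
```

The negative coefficient on Happy lowers the predicted probability, but the size of the drop depends on where the individual starts, which is why the raw coefficient is not a marginal effect.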
Odds ratio
• The odds are P(Y=1)/P(Y=0). The odds ratio for a variable is the factor by which the odds are multiplied when that variable increases by one unit.
• Suppose the odds ratio for Happy is 0.42. How do we interpret this number?
• Happy is a dummy variable, so if a person switches from an unhappy relationship to a happy one, the odds would be 42% of what they were before.
Suppose an individual had odds of 4,
i.e. P(Y=1) = 4/5 and P(Y=0) = 1/5, so there's an 80% chance that the individual will have an affair.
If the individual's marriage becomes Happy, then the odds become 42% of what they were before:
4*0.42 = 1.68
P(Y=1) = 1.68/(1 + 1.68) ≈ 0.63, so there is a 63% chance the individual will have an affair
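
The arithmetic above can be checked with a short sketch. The helper name `odds_to_prob` is hypothetical. The last line also shows that exponentiating the Happy coefficient from the logit estimate (-0.87) gives roughly 0.42, which appears to be where that odds ratio comes from.

```python
import math


def odds_to_prob(odds):
    """Convert odds P(Y=1)/P(Y=0) into the probability P(Y=1)."""
    return odds / (1.0 + odds)


odds_before = 4.0          # starting odds used in the example
odds_ratio_happy = 0.42    # odds ratio for Happy used in the example

odds_after = odds_before * odds_ratio_happy
p_before = odds_to_prob(odds_before)
p_after = odds_to_prob(odds_after)

# Odds ratio implied by the logit coefficient on Happy (-0.87).
implied_odds_ratio = math.exp(-0.87)

print(f"odds before = {odds_before:.2f}, P = {p_before:.2f}")
print(f"odds after  = {odds_after:.2f}, P = {p_after:.2f}")
print(f"exp(-0.87)  = {implied_odds_ratio:.3f}")
```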
The linear probability model

Suppose we wish to explain an individual’s
choice between driving to work (private
transportation) and taking the bus (public
transportation)

Individual’s choice can be represented by a
dummy variable
Y = 1 if individual drives to work
= 0 if takes bus

If we collect a random sample of workers, then the outcome y has the probability function

f(y) = p^y (1 - p)^(1 - y), for y = 0, 1

where p = probability that y=1.

E(y) = p
Explanatory variable: x = (commuting time by bus) - (commuting time by car)

What we expect:
As x increases, an individual would be more inclined to drive to work
We expect a positive relationship between x and p (the probability of driving to work)
Applied to a binary outcome, the linear regression model is called the linear probability model. It sets p = B0 + B1x, so that
y = E(y) + e
y = p + e
y = B0 + B1x + e

This model is heteroscedastic, because the
variance of the error term varies from one
observation to another.
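
A short derivation shows where the heteroscedasticity comes from. It assumes only the setup already given (y is 0 or 1, y = p + e, and p = B0 + B1x); writing the coefficients as B_0 and B_1 is a notational choice made here.

```latex
\begin{aligned}
e &= \begin{cases}
       1 - p & \text{with probability } p \quad (y = 1) \\
       -p    & \text{with probability } 1 - p \quad (y = 0)
     \end{cases} \\
E(e) &= p(1 - p) + (1 - p)(-p) = 0 \\
\operatorname{var}(e) &= p(1 - p)^2 + (1 - p)p^2 = p(1 - p)
  = (B_0 + B_1 x)(1 - B_0 - B_1 x)
\end{aligned}
```

Because the variance depends on x, it differs across observations whenever B_1 is not zero.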
• If we use OLS to estimate the model, we'd obtain the fitted model phat = b0 + b1x, where b0 and b1 are the OLS estimates of B0 and B1
Consider what happens when you use this
model to predict behavior:
If you substitute different values of x in the
equation, you might obtain values of
phat that are

1. Less than 0 (negative)
2. Greater than 1

Values that do not make sense as
probabilities
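
The sketch below illustrates the problem with an OLS fit of the linear probability model. The ten observations are made up for illustration (x is bus time minus car time in minutes, y = 1 if the worker drives); they are not data from the slides.

```python
# Hypothetical sample: x = bus time minus car time (minutes), y = 1 if drives.
x = [-40, -30, -20, -10, 0, 10, 20, 30, 40, 50]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Simple-regression OLS formulas for slope and intercept.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar


def p_hat(x_value):
    """Fitted 'probability' from the linear probability model."""
    return b0 + b1 * x_value


print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
for x_value in (-40, 0, 50):
    print(f"x = {x_value:4d}  p_hat = {p_hat(x_value):.3f}")
```

With these numbers the fitted value is negative at x = -40 and above 1 at x = 50, even though both x values are inside the sample.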
• Generally, in a linear regression model the slope coefficient (if positive) implies that an increase in x has the same constant effect on y at every value of x

• But in probability models (binary dependent variable models), a constant rate of increase is impossible because
0 ≤ p ≤ 1
To overcome this problem we use nonlinear
probit and logit models.
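
As a minimal sketch of why these models solve the problem, the code below evaluates the logistic and standard normal CDFs at a linear index. The coefficients `B0` and `B1` are hypothetical values chosen for illustration. It also prints the logit marginal effect B1 * p * (1 - p), which shrinks as p approaches 0 or 1 instead of staying constant.

```python
import math


def logit_prob(z):
    """Logistic CDF: maps any index z to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def probit_prob(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical coefficients, chosen only for illustration.
B0, B1 = -0.2, 0.05


def logit_marginal_effect(x_value):
    """Change in P(y = 1) per unit of x under the logit model: B1 * p * (1 - p)."""
    p = logit_prob(B0 + B1 * x_value)
    return B1 * p * (1.0 - p)


for x_value in (-60, -20, 4, 20, 60):
    z = B0 + B1 * x_value
    print(f"x = {x_value:4d}  logit p = {logit_prob(z):.3f}  "
          f"probit p = {probit_prob(z):.3f}  "
          f"logit dp/dx = {logit_marginal_effect(x_value):.4f}")
```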

In EViews, choose Objects, New Object,
Equation and select the Binary Estimation
option.
Specify your equation in the equation box.

EViews estimates these models by maximum likelihood
• The estimates produced by a probit or logit
model that look like slopes and intercepts
in the output are actually standardized
slopes and intercepts, known only up to a
scale factor

• So the size of these coefficients is not
viewed in the same way (as the change in
Y for a one-unit change in X). Instead, we
focus on the sign and the statistical
significance of these coefficients, in
particular, the "slopes."
• If a "slope" is positive, it means that an
increase in the corresponding X variable
increases the latent propensity to choose
the 1 alternative, thereby also increasing
the predicted probability of choosing the 1
alternative. (Analogously for negative
slope coefficients.)
• If the fitted probability is greater than 0.5,
we predict the individual will choose 1. If it
is less than 0.5, we predict they will
choose 0.
• The asymptotic t-ratios allow us to test the null hypothesis that a "slope" is zero. If
the t-ratio exceeds (roughly) 2 in absolute
value, we can reject the hypothesis that
the relevant X variable has no effect on
fitted choice probabilities.
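
The points above (sign of the slope, the 0.5 prediction rule, and the asymptotic t-ratio) can be sketched with a hand-rolled logit MLE. This is an illustration using Newton's method on made-up data, not the routine EViews uses; the sample and all variable names are assumptions.

```python
import math

# Hypothetical sample (not from the slides): x = bus time minus car time in
# minutes, y = 1 if the worker drives.
x = [-40, -30, -20, -10, 0, 10, 20, 30, 40, 50]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def logistic(z):
    """Numerically stable form of 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score_and_information(b0, b1):
    """Gradient of the log-likelihood and the 2x2 information matrix."""
    p = [logistic(b0 + b1 * xi) for xi in x]
    g0 = sum(yi - pi for yi, pi in zip(y, p))
    g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
    w = [pi * (1.0 - pi) for pi in p]
    h00 = sum(w)
    h01 = sum(wi * xi for wi, xi in zip(w, x))
    h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    return (g0, g1), (h00, h01, h11)


# Newton's method: b <- b + (information matrix)^(-1) * gradient.
b0, b1 = 0.0, 0.0
for _ in range(25):
    (g0, g1), (h00, h01, h11) = score_and_information(b0, b1)
    det = h00 * h11 - h01 * h01
    b0 += (h11 * g0 - h01 * g1) / det
    b1 += (h00 * g1 - h01 * g0) / det

# Asymptotic standard errors from the inverse information matrix.
_, (h00, h01, h11) = score_and_information(b0, b1)
det = h00 * h11 - h01 * h01
se_b0 = math.sqrt(h11 / det)
se_b1 = math.sqrt(h00 / det)
t_b1 = b1 / se_b1

# Prediction rule: predict 1 when the fitted probability exceeds 0.5.
fitted = [logistic(b0 + b1 * xi) for xi in x]
predicted = [1 if p > 0.5 else 0 for p in fitted]
n_correct = sum(int(a == b) for a, b in zip(predicted, y))

print(f"b0 = {b0:.4f} (se {se_b0:.4f}), b1 = {b1:.4f} (se {se_b1:.4f})")
print(f"t-ratio for b1 = {t_b1:.2f}")
print(f"correct predictions: {n_correct} of {len(y)}")
```

A positive `b1` means a larger time advantage for the car raises the fitted probability of driving. With a sample this small the t-ratio need not exceed 2 even when the sign is as expected.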
Missing Observations
• Suppose you collect data on Sleep and Age
• Sleep is observed for everyone, but 20% of the Age values are missing
• How do you use all the data to show the
effect of Age on Sleep?
Non-Experimental Data
• Non-experimental data can sometimes
make it very difficult to draw policy
implications from regression analysis
GUN CONTROL

• Suppose your sample consists of households that have
been victimized by robbery. The dependent variable
takes a value of 1 if a household member is shot during
the robbery and 0 otherwise. One of your explanatory
variables is a dummy variable equal to 1 if there is a
handgun present in the house, 0 otherwise. When a
handgun is present in a household, an occupant of that
house is much more likely to be shot in the process of a
robbery than when no handgun is present. Therefore, to
minimize injury and loss of life from robbery incidents,
private ownership of handguns should be banned.
Evaluate this policy proposal and the "evidence" upon
which it is premised
• Briefly describe the nature of the true
"experiment" that would allow an
unambiguous determination of the effect of
handgun presence on robbery shootings
via a regression like this.
LEGALIZATION OF MARIJUANA:
• Suppose you have a random sample of at-risk 18-year-
olds. The dependent variable is the number of times
each teenager has used heroin. Among the explanatory
variables is a dummy variable that takes a value of 1 if
the subject experimented with marijuana prior to age 13,
and 0 otherwise. You find that the coefficient on this
dummy variable is positive and strongly statistically
significant. Therefore, we should not legalize marijuana
use (which would make it much more accessible to pre-teens).
Evaluate this policy proposal and the "evidence" upon
which it is premised
• Briefly describe the nature of the true
"experiment" that would allow an
unambiguous determination of the effect of
pre-teen marijuana use on subsequent
heroin use via a regression like this
