Limited Dependent Variable Models and Sample Selection Correlations

1. Introduction
Limited dependent variable models have been developed to analyze the behavior of
individuals, families, or firms. Suppose we want to study the labor force participation of
adult males as a function of the unemployment rate, average wage rate, family income,
education, and so on. A person is either in the labor force or not. Hence the dependent
variable, labor force participation, can take only two values: 1 if the person is in the
labor force and 0 if he or she is not. This type of 1/0 or yes/no outcome is called a binary
response and is the simplest case of discrete choice; a discrete choice may also involve
more than two alternatives. A discrete choice variable is one kind of limited dependent
variable (LDV).
         Consider the following simple model:
                  $y_i = \beta_1 + \beta_2 X_i + u_i,$
where
                  $X_i$ = set of explanatory variables,
                  $y_i = 1$ if he or she is in the labor market,
                  $y_i = 0$ if he or she is not in the labor market.
         When the discrete choice involves more than two alternatives, for example three,
a possible coding would be:
                  $y = 1$ if he or she goes to America,
                  $y = 2$ if he or she goes to Australia,
                  $y = 3$ if he or she goes to England,
                  $X_i$ = the price of air tickets.


         Four approaches are most commonly used to estimate such models:
            1. The linear probability model (LPM)
            2. The logit model
            3. The probit model
            4. The Tobit model


The first three models are typically used when the dependent variable is binary. The
logit and probit models can be extended to more general cases of discrete choice, such as
more than two unordered alternatives (multinomial logit model) or an ordered outcome
(ordered probit model). The Tobit model is used when the value of the dependent variable
is bounded.
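         The first model, the linear probability model (LPM), is simple to estimate. It
starts from the response probability

         $$P(y = 1 \mid x) = P(y = 1 \mid x_1, x_2, \ldots, x_k) \tag{1}$$

and takes it to be linear in the explanatory variables: $\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$.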

Despite its simplicity, it has some drawbacks. The two most important disadvantages
are that the fitted probabilities can be less than zero or greater than one and the partial
effect of any explanatory variable (appearing in level form) is constant.
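
Both drawbacks can be seen in a small numerical sketch. The Python code below is an illustration only: the eight observations are made up, and the helper name `ols_simple` is an assumption introduced here, not something from the text. It fits the LPM by ordinary least squares with a single regressor and shows that the fitted "probabilities" at the extremes of x fall outside the unit interval, while the estimated partial effect is the same slope at every x.

```python
# Hypothetical data: a 0/1 outcome and one explanatory variable.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def ols_simple(x, y):
    """Return (intercept, slope) of the least squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = ols_simple(x, y)

# Fitted LPM "probabilities": nothing keeps them inside [0, 1].
fitted = [b0 + b1 * xi for xi in x]

print("intercept, slope:", b0, b1)
print("smallest fitted value:", min(fitted))
print("largest fitted value:", max(fitted))
```

The slope `b1` is the LPM partial effect of x, and it does not change with the level of x, which is the second drawback noted above.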


2. Logit and probit models for binary response
2.1 Specifying logit and probit Models
To avoid the LPM limitations, consider a class of binary response models of the form

         $$P(y = 1 \mid x) = G(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k) = G(\beta_0 + x\beta) \tag{2}$$

In the logit model, G is the logistic function:
         $$G(z) = \exp(z) / [1 + \exp(z)] = \Lambda(z) \tag{3}$$
which is between zero and one for all real numbers z. This is the cumulative distribution
function for a standard logistic random variable. In the probit model, G is the standard
normal cumulative distribution function (cdf), which is expressed as an integral
         $$G(z) = \Phi(z) \equiv \int_{-\infty}^{z} \phi(v)\,dv$$
where $\phi(z)$ is the standard normal density
         $$\phi(z) = (2\pi)^{-1/2} \exp(-z^2 / 2). \tag{4}$$
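
As a minimal sketch, the two choices of G can be written with the Python standard library alone. The function names `logistic_cdf`, `normal_pdf` and `normal_cdf` are assumptions made for this illustration; the normal cdf is obtained from the error function, using the identity $\Phi(z) = \tfrac{1}{2}[1 + \operatorname{erf}(z/\sqrt{2})]$.

```python
import math


def logistic_cdf(z):
    """Equation (3): exp(z) / (1 + exp(z)), arranged to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def normal_pdf(z):
    """Equation (4): the standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cdf, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Both functions rise from near 0 to near 1 and equal 0.5 at z = 0.
for z in (-3, -1, 0, 1, 3):
    print(z, round(logistic_cdf(z), 4), round(normal_cdf(z), 4))
```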


         The logistic function is plotted in the following figure.

         [Figure: graph of the logistic function $G(z) = \exp(z) / [1 + \exp(z)]$ for $z$ from -3 to 3; vertical axis marked at 0, .5 and 1.]

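         Logit and probit models can be derived from an underlying latent variable
model. Let $y^*$ be an unobserved, or latent, variable determined by
         $$y^* = \beta_0 + x\beta + e, \qquad y = 1[y^* > 0], \tag{5}$$
where $1[\cdot]$ equals one if the event in brackets is true and zero otherwise, and the
error $e$ is independent of $x$ with a cdf $G$ that is symmetric about zero. We can derive
the response probability for y:

         $$P(y = 1 \mid x) = P(y^* > 0 \mid x) = P[e > -(\beta_0 + x\beta) \mid x]$$

         $$= 1 - G[-(\beta_0 + x\beta)] = G(\beta_0 + x\beta),$$
where the last equality uses the symmetry of $G$, that is, $1 - G(-z) = G(z)$.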
According to (3), the logit cdf for this specification is
         $$G(\beta_0 + x\beta) = \exp(\beta_0 + x\beta) / [1 + \exp(\beta_0 + x\beta)].$$
For the probit model,
         $$G(\beta_0 + x\beta) = \int_{-\infty}^{\beta_0 + x\beta} \phi(v)\,dv = \Phi(\beta_0 + x\beta).$$
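
The latent variable derivation can be checked by simulation. The sketch below is an illustration under stated assumptions: the index value 0.8 standing in for $\beta_0 + x\beta$ is hypothetical, the function name `simulate_binary_share` is introduced here, and the error is drawn from the standard logistic distribution by inverting its cdf. The share of draws with $y^* > 0$ should be close to the logistic cdf evaluated at the index.

```python
import math
import random


def simulate_binary_share(index, n_draws=20000, seed=0):
    """Share of draws with y* = index + e > 0, where e is standard logistic."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        u = rng.random()
        while u == 0.0:  # rng.random() lies in [0, 1); avoid log(0)
            u = rng.random()
        e = math.log(u / (1.0 - u))  # inverse-cdf draw from the logistic
        if index + e > 0:
            hits += 1
    return hits / n_draws


index = 0.8  # hypothetical value of beta_0 + x*beta
simulated = simulate_binary_share(index)
implied = 1.0 / (1.0 + math.exp(-index))  # G(beta_0 + x*beta) for the logit
print("simulated share of y = 1:", simulated)
print("logit response probability:", implied)
```

Replacing the logistic draw with a standard normal draw (for example `rng.gauss(0.0, 1.0)`) and the comparison value with the normal cdf would give the corresponding check for the probit model.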

To find the partial effect of roughly continuous variables on the response probability, we
must rely on calculus. If $x_j$ is a roughly continuous variable, its partial effect on
$p(x) = P(y = 1 \mid x)$ is obtained from the partial derivative:

         $$\frac{\partial p(x)}{\partial x_j} = g(\beta_0 + x\beta)\,\beta_j, \quad \text{where } g(z) \equiv \frac{dG(z)}{dz}. \tag{6}$$
Because G is the cdf (cumulative distribution function) of a continuous random variable, g is
the pdf (probability density function). In the logit and probit cases $g(z) > 0$ for all $z$,
which implies that
         $$\operatorname{sign}\left(\frac{\partial p(x)}{\partial x_j}\right) = \operatorname{sign}(\beta_j). \tag{7}$$
The sign of the marginal effect therefore does not depend on x, while its magnitude is
proportional to $g(\beta_0 + x\beta)$ and so varies with x.
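
A short numerical sketch of equation (6) follows. The coefficient values and evaluation points are hypothetical, and the function names are assumptions introduced for illustration. The logistic density is computed as $\Lambda(z)[1 - \Lambda(z)]$, which is the derivative of (3).

```python
import math


def logistic_pdf(z):
    """g(z) for the logit model: Lambda(z) * (1 - Lambda(z))."""
    lam = 1.0 / (1.0 + math.exp(-z))
    return lam * (1.0 - lam)


def normal_pdf(z):
    """g(z) for the probit model: the standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def partial_effect(g, beta0, beta, x, j):
    """Equation (6): g(beta0 + x*beta) * beta_j for a continuous x_j."""
    index = beta0 + sum(b * v for b, v in zip(beta, x))
    return g(index) * beta[j]


beta0, beta = -1.0, [0.5, -0.25]  # hypothetical coefficients

# The sign of each effect matches the sign of beta_j at every x,
# but the size of the effect changes with x.
for x in ([0.0, 0.0], [2.0, 0.0], [6.0, 0.0]):
    logit_effects = [partial_effect(logistic_pdf, beta0, beta, x, j) for j in (0, 1)]
    probit_effects = [partial_effect(normal_pdf, beta0, beta, x, j) for j in (0, 1)]
    print(x, logit_effects, probit_effects)
```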


2.2 Maximum Likelihood Estimation of logit and probit Models

Because $E(y \mid x)$ is nonlinear in the parameters, OLS and WLS (weighted least squares) are
not applicable. We can instead use maximum likelihood estimation (MLE).
         Assume that we have a random sample of size n. To obtain the maximum
likelihood estimator, conditional on the explanatory variables, we need the density of
$y_i$ given $x_i$. Absorbing the intercept into the vector $x_i$ for simplicity, we can write
this as

         $$f(y \mid x_i; \beta) = [G(x_i\beta)]^{y} [1 - G(x_i\beta)]^{1-y}, \quad y = 0, 1. \tag{9}$$
The log-likelihood function for observation i is a function of the parameters and the data
$(x_i, y_i)$ and is obtained by taking the log of (9):
         $$\ell_i(\beta) = y_i \log[G(x_i\beta)] + (1 - y_i) \log[1 - G(x_i\beta)]. \tag{10}$$


The log-likelihood for a sample of size n is obtained by summing (10) across all
observations:

         $$L(\beta) = \sum_{i=1}^{n} \ell_i(\beta) = \sum_{i=1}^{m} y_i \log[G(x_i\beta)] + \sum_{i=m+1}^{n} (1 - y_i) \log[1 - G(x_i\beta)],$$

where the observations are ordered so that $y_i = 1$ for $i = 1, \ldots, m$ and $y_i = 0$ for
$i = m+1, \ldots, n$.

          The MLE of $\beta$, denoted by $\hat{\beta}$, maximizes this log-likelihood. If $G(\cdot)$ is the
standard logit cdf, then $\hat{\beta}$ is the logit estimator; if $G(\cdot)$ is the standard normal cdf,
then $\hat{\beta}$ is the probit estimator.
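
To make the maximization concrete, the sketch below fits a logit with an intercept and one regressor by Newton-Raphson, using only the Python standard library. It is an illustration under stated assumptions, not the procedure of any particular software package: the data are made up, the names `fit_logit` and `log_likelihood` are introduced here, and Newton-Raphson is simply one common way to maximize the log-likelihood. For the logit model the score for observation i is $(y_i - p_i)x_i$ and minus the Hessian is the sum of $p_i(1 - p_i)x_i'x_i$, where $p_i = G(x_i\beta)$.

```python
import math


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, x, y):
    """Sum of equation (10) over the sample, with G the logistic cdf."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = logistic_cdf(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


def fit_logit(x, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for a logit with an intercept and one regressor."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = logistic_cdf(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p          # score with respect to b0
            g1 += (yi - p) * xi   # score with respect to b1
            h00 += w              # entries of minus the Hessian
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: (minus Hessian)^(-1) times the score, for a 2x2 system.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical data; the outcomes overlap in x, so the MLE exists.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]

b0_hat, b1_hat = fit_logit(x, y)
print("estimates:", b0_hat, b1_hat)
print("log-likelihood at the estimates:", log_likelihood(b0_hat, b1_hat, x, y))
print("log-likelihood at zero:", log_likelihood(0.0, 0.0, x, y))
```

If the outcomes were perfectly separated by x (for example, all zeros below some threshold and all ones above it), the log-likelihood would have no finite maximizer and this iteration would not converge.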



2.3 Testing multiple hypotheses
The likelihood ratio test is most commonly used to test multiple restrictions, such as
$\beta_1 = \beta_2 = \cdots = \beta_k = 0$, in the logit and probit models. It plays the same role as
the F test of joint significance in least squares estimation.


The likelihood ratio (LR) test
The likelihood ratio statistic is twice the difference in the log-likelihoods:
         $$LR = 2(L_{ur} - L_r) \tag{11}$$
where $L_{ur}$ is the log-likelihood value for the unrestricted model (that with no constraint)
and $L_r$ is the log-likelihood value for the restricted model ($\beta_1 = \beta_2 = \cdots = \beta_k = 0$
in the above example). Because $L_{ur} \ge L_r$, LR is nonnegative and usually strictly positive.
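
A minimal sketch of how the statistic might be used in practice follows. It relies on the standard asymptotic result, not stated above, that under the null hypothesis LR is approximately chi-square distributed with degrees of freedom equal to the number of restrictions. The two log-likelihood values are hypothetical, the function names are assumptions, and the tail probability uses a closed form that holds only for an even number of restrictions; a statistics library would be used for the general case.

```python
import math


def lr_statistic(loglik_unrestricted, loglik_restricted):
    """Equation (11): LR = 2 * (L_ur - L_r)."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


def chi2_sf_even_df(stat, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive even df only")
    half = stat / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


# Hypothetical log-likelihood values, for illustration only.
l_ur, l_r = -45.2, -52.9
lr = lr_statistic(l_ur, l_r)

# Two restrictions, e.g. beta_1 = beta_2 = 0 in a model with x1 and x2.
p_value = chi2_sf_even_df(lr, df=2)
print("LR statistic:", lr)
print("p-value:", p_value)
```

A small p-value is evidence against the restrictions, in the same way that a large F statistic is in least squares estimation.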


                                                    Stata commands
                   1. Logit model
                           logit y x1 x2
                           dlogit y x1 x2 (The marginal effects are returned)
                           mlogit y x1 x2 (Multinomial logit model)
                   2. Probit model
                           probit y x1 x2
                           dprobit y x1 x2 (The marginal effects are returned)
                           mprobit y x1 x2 (Multinomial probit model)
                           oprobit y x1 x2 (Ordered probit model, e.g. ordered categorical outcomes)