Quantitative Methods by dfhdhdhdhjr


									Quantitative Methods

  Analyzing dichotomous dummy
Logistic Regression Analysis

  Like ordinary regression and ANOVA, logistic
     regression is part of a category of models
     called generalized linear models.

  Generalized linear models were developed
    to unify various statistical models (linear
    regression, logistic regression, Poisson
    regression). We can think of maximum
    likelihood as a general method for
    estimating all of these models.
Logistic Regression Analysis--GLM


       Each outcome of the dependent variable
       (that is, each Y) is assumed to be generated
       from a particular distribution function in the
       exponential family (normal, binomial, Poisson,
       and so on).
Logistic Regression Analysis
(a diversion into probability distributions)

  Normal distribution: a family of distributions, each
     member of which can be defined by its mean
     and variance. Many physical phenomena can
     be approximated well by the normal distribution.

  Binomial distribution: the probability distribution of the number of
     successes in a sequence of Bernoulli trials (where
     outcomes fall into one of two categories, i.e.,
     “occurred” and “did not occur”). Note that in
     large samples, if the dependent variable is not too
     skewed, the normal distribution approximates
     the binomial distribution.
Logistic Regression Analysis
(a diversion into probability distributions)

  Poisson distribution: expresses the probability of a given number
     of events occurring in a fixed period of time, if the
     events occur with a known average rate and
     independently of the time since the last event.
     (Note that the negative binomial distribution is
     used to model event counts that are skewed. One
     can also think about the Pólya distribution, which
     can be used to model occurrences of
     “contagious” discrete events, such as tornado outbreaks.)
Logistic Regression—when?

 Logistic regression models are appropriate for
   dependent variables coded 0/1.

 We only observe “0” and “1” for the
   dependent variable—but we think of the
   dependent variable conceptually as a
   probability that “1” will occur.
Logistic Regression--examples

 Some examples

   Vote for Obama (yes, no)

   Turned out to vote (yes, no)

   Sought medical assistance in last year (yes, no)
Logistic Regression—why not OLS?

  Why can’t we use OLS? After all, linear
    regression is so straightforward, and (unlike
    other models) actually has a “closed form
    solution” for the estimates.
Logistic Regression—why not OLS?

  Three problems with using OLS.

    First, what is our dependent variable,
        conceptually? It is the probability of y=1. But
        we only observe y=0 and y=1. If we use OLS,
        we’ll get predicted values that fall between 0
        and 1—which is what we want—but we’ll also
        get predicted values that are greater than 1,
        or less than 0. That makes no sense.
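The first problem is easy to see with a small sketch. The data below are made up for illustration (a 0-10 score and a 0/1 outcome), and `ols_fit` is a helper written here for a one-predictor OLS fit, not a library function.

```python
# Fit a straight line to a 0/1 outcome with OLS and inspect the fitted values.

def ols_fit(x, y):
    """Return (intercept, slope) for a simple OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: x is a 0-10 score, y is the observed 0/1 outcome.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

a, b = ols_fit(x, y)
fitted = [a + b * xi for xi in x]

# Fitted "probabilities" that fall outside the 0-1 range.
out_of_range = [round(f, 3) for f in fitted if f < 0 or f > 1]
print(out_of_range)
```

With these made-up numbers the line dips below 0 at the low end of x and rises above 1 at the high end, which is exactly the nonsense result described above.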
Logistic Regression—Why not OLS?

  Three problems using OLS.
       Second problem—there is heteroskedasticity in the
       model. Think about the meaning of “residual”. The
       residual is the difference between the observed and
       the predicted Y.

       By definition, what will that residual look like at the
       center of the distribution?

       By definition, what will that residual look like at the tails
       of the distribution?
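One way to think through those two questions: a 0/1 outcome with probability p has variance p(1 - p), so the spread of the residuals depends on where the predicted probability sits. The sketch below just tabulates that formula; the probabilities chosen are arbitrary illustration values.

```python
def bernoulli_variance(p):
    """Variance of a 0/1 variable that equals 1 with probability p."""
    return p * (1 - p)

# Arbitrary predicted probabilities, from the tails to the center.
for p in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(p, round(bernoulli_variance(p), 4))

center = bernoulli_variance(0.50)
tail = bernoulli_variance(0.05)
```

The variance peaks at the center (p = 0.5) and shrinks toward the tails, so the error variance is not constant across observations, which is what heteroskedasticity means.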
Logistic Regression—why not OLS?

  Three problems using OLS.

      The third problem is substantive. The reality is that
      many choice functions can be modeled by an S-
      shaped curve. Therefore (much as when we
      discussed linear transformations of the X variable),
      it makes sense to model a non-linear relationship.
Logistic Regression—but similar to

  So. We actually could correct for the
    heteroskedasticity, and we could
    transform the equation so that it
    captured the “non-linear”
    relationship, and then use linear
    regression. But what we usually do....
Logistic Regression—but similar to

  ...is use logistic regression to predict the
      probability of the occurrence of an event.
Logistic Regression—s shaped curve
Logistic Regression—
S shaped curve and Bernoulli variables

  Note that the observed dependent
    variable is a Bernoulli (or binary)
    variable. But what we are really
    interested in is predicting the
    probability that an event occurs (i.e.,
    the probability that y=1).
Logistic Regression--advantage

 Logistic regression is particularly handy
   because (unlike, say, discriminant
   analysis) it makes no assumptions about
   how the independent variables are
   distributed. They don’t have to be
   continuous versus categorical, normally
   distributed—they can take any form.
Logistic Regression—
exponential values and natural logs
  Note: “exp” is the exponential function and “ln” is
    the natural log. These are inverses of each other.

  When we take the exponential function of any
    number, we raise e (approximately 2.72) to the power of
    that number. So, exp(3) = e * e * e, which is about 20.09.

  If we take ln(20.09), we get the number 3.
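A quick check of this arithmetic, using Python's standard `math` module (`math.exp` is exp and `math.log` is the natural log):

```python
import math

e_cubed = math.exp(3)        # e raised to the power 3
back = math.log(e_cubed)     # the natural log undoes the exponential

print(round(e_cubed, 2), round(back, 2))
```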
Logistic Regression--transformation
     Note that you can think of logistic regression in terms of
      transforming the dependent variable so that it fits an S-shaped
      curve. Note that the odds are the probability that a case
      will be a 1 divided by the probability that it will not be a 1. The
      natural log of the odds is the “logit,” and it is a linear
      function of the x’s (that is, of the right-hand side of the model).
Logistic Regression--transformation
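   Note that you can equivalently talk about
   modelling the probability that y=1 (theta,
   below), as follows (these are the same
   mathematical expressions):

   theta = exp(a + b1x1 + b2x2 + ...) / (1 + exp(a + b1x1 + b2x2 + ...))
         = 1 / (1 + exp(-(a + b1x1 + b2x2 + ...)))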
Logistic Regression

Note that the independent variables are not
  linearly related to the probability that y=1.

However, the independent variables are
  linearly related to the logit of the
  dependent variable.
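A small sketch of that distinction, assuming a one-predictor model with made-up coefficients (a = -4, b = 1). The helper names `logit` and `inv_logit` are introduced here for illustration.

```python
import math

def inv_logit(z):
    """Turn a logit (log odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

a, b = -4.0, 1.0            # hypothetical intercept and slope
xs = [0, 2, 4, 6, 8]        # equally spaced values of x

probs = [inv_logit(a + b * x) for x in xs]

# Change between neighbouring x values, on each scale.
logit_steps = [logit(probs[i + 1]) - logit(probs[i]) for i in range(len(xs) - 1)]
prob_steps = [probs[i + 1] - probs[i] for i in range(len(xs) - 1)]

print([round(s, 3) for s in logit_steps])
print([round(s, 3) for s in prob_steps])
```

Every 2-unit step in x moves the logit by the same amount (2b), but the matching change in probability is small in the tails and large in the middle: the S-shaped curve.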
Logistic Regression--recap

 Logistic regression analysis, in other words, is
   very similar to OLS regression, just with a
   transformation of the regression formula. We
   also use binomial theory to conduct the
Logistic Regression—Model fit

 Recall that in OLS, we minimized the
   squared residuals in order to find the
   line that best fit the data.

 In logistic regression analysis, we use a
    calculus-based estimation method called
    maximum likelihood estimation (MLE).
Logistic Regression—MLE

Through an iterative process, MLE finds the
   function that will maximize our ability to
   predict the probability of y based on what
   we know about x. In other words, ML will
   find the best values for the estimated
   effects of party, ideology, sex, race, etc.
   that predict the likelihood that someone
   will vote for Obama.
Logistic Regression Analysis--
 In other words, MLE starts with an initial (arbitrary)
  guesstimate of what the coefficients will be, and
  then determines the direction and size of the change
  that will increase the log likelihood (goodness of
  fit, that is, how likely it is that the observed values
  of the dependent variable can be predicted from
  the observed values of the independent variables).
Logistic Regression Analysis--
 After estimating an initial function,
  the program continues estimating
  with new estimates to reach an
  improved function—until convergence
  is reached (that is, the log likelihood,
  or the goodness of fit, does not
  change significantly).
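The iterate-until-convergence idea can be sketched in a few lines. This is a bare-bones illustration, not what any statistics package actually runs: it uses plain gradient ascent with a fixed step size (real software typically uses Newton-type updates), one predictor, and made-up data. The function names, step size, and tolerance are all assumptions chosen for the example.

```python
import math

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(a, b, x, y):
    """Log likelihood of 0/1 outcomes y under logit(p) = a + b*x."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = inv_logit(a + b * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

def fit_logit(x, y, step=0.03, tol=1e-8, max_iter=10000):
    a, b = 0.0, 0.0                                  # arbitrary starting guess
    history = [log_likelihood(a, b, x, y)]
    for _ in range(max_iter):
        resid = [yi - inv_logit(a + b * xi) for xi, yi in zip(x, y)]
        a += step * sum(resid)                       # move uphill in a
        b += step * sum(r * xi for r, xi in zip(resid, x))  # and in b
        history.append(log_likelihood(a, b, x, y))
        if abs(history[-1] - history[-2]) < tol:     # convergence check
            break
    return a, b, history

# Hypothetical data: a centered score and a 0/1 outcome.
x = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]
y = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

a_hat, b_hat, history = fit_logit(x, y)
print(round(a_hat, 3), round(b_hat, 3), len(history) - 1)
```

The log likelihood rises at every iteration and the loop stops once the improvement is negligible, which is the convergence criterion described above.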
Logistic Regression--tests

  There are two main forms of the
    likelihood ratio test for goodness of
    fit.
Logistic Regression--tests

 1. Test of the overall model (model chi-square test).
    Compares the researcher’s model to a reduced
    model (the baseline model with the constant only).
    A well-fitting model is significant at the .05 level or
    better; that is, a well-fitting model is one that fits
    the data better than a model with only the
    constant. A finding of significance means that one
    can reject the null hypothesis that all of the
    predictor effects are zero (this is equivalent to an
    F test in OLS).
Logistic Regression--tests

  2. Test of individual model parameters.

  (Note that the Wald statistic has a chi-squared distribution, but other than
      that, it is just the same as the “t” that we use in OLS.)

  You can also calculate a likelihood ratio statistic. Essentially, one
      compares the goodness of fit of the overall model with the
      goodness of fit of a “nested” model that drops an independent
      variable. (This is generally considered preferable to the Wald statistic
      if the coefficient values are very large.)
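Both likelihood ratio tests work the same way: take twice the difference in log likelihood between the fuller model and the restricted one, and compare it to a chi-squared distribution. The sketch below assumes the two models differ by exactly one parameter (1 degree of freedom), which lets the p-value be computed with `math.erfc` from the standard library. The log likelihood values are invented for illustration.

```python
import math

def lr_test_1df(ll_restricted, ll_full):
    """Likelihood ratio statistic and p-value, assuming 1 degree of freedom."""
    lr = 2.0 * (ll_full - ll_restricted)
    # For a chi-squared variable with 1 df, P(X > lr) = erfc(sqrt(lr / 2)).
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value

# Hypothetical log likelihoods: constant-only model vs. model with one predictor.
ll_null = -68.3
ll_model = -61.2

lr, p_value = lr_test_1df(ll_null, ll_model)
print(round(lr, 2), p_value)
```

With more than one dropped parameter the degrees of freedom equal the number of parameters dropped, and a general chi-squared routine from a statistics package would be needed.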
Logistic Regression--interpretation

  Most commonly, we say that with all other variables held
    constant, there is a constant increase of b1
    in logit(p) for every 1-unit increase in x1.

  But the right-hand side of the model is
     linearly related to the logit (that is, to the
     natural log of the odds). What does that
     mean for the actual probability
     that y=1?
Logistic Regression

  It’s fairly straightforward—it’s

  If b1 takes the value of 2.3 (and we know that
      exp(2.3) is approximately 10), then if x1 increases by 1, the
      odds that the dependent variable takes the
      value of 1 increase roughly tenfold.
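A short sketch of that calculation. The coefficient 2.3 comes from the example above; the starting probabilities are arbitrary, and the helper `new_prob` is introduced here for illustration.

```python
import math

b1 = 2.3
odds_multiplier = math.exp(b1)   # roughly 10

def new_prob(p0, multiplier):
    """Probability after the odds p0 / (1 - p0) are multiplied by `multiplier`."""
    new_odds = (p0 / (1 - p0)) * multiplier
    return new_odds / (1 + new_odds)

# Same tenfold change in odds, very different changes in probability.
for p0 in (0.05, 0.50, 0.90):
    print(p0, round(new_prob(p0, odds_multiplier), 3))
```

The odds always go up by the same factor, but how much the probability moves depends on where the case started.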
Logistic Regression—presentation

 Likewise, it’s difficult to explain to the reader what the
     parameter estimates mean, because they reflect
     changes in the logit (the natural log of the odds)
     for each one-unit change in x.

 But what you want to tell your readers is how much the
     probability that y=1 changes (given a 1-unit change in
     x).
Logistic Regression—transform back

  So, you need to transform into predicted probabilities.

  Create predicted y’s (just as you would in OLS: predicted y = a +
     b1x1 + b2x2 + ...)

  And then transform:

  exp(predicted y) / (1 + exp(predicted y)) = predicted probability

  (Many software packages will do this for you. See Gary King. Or,
     if you are fond of rotary dial phones, create your own Excel
     file to do this, which has the advantage of flexibility.)
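The same two steps can be sketched in code. The intercept, coefficients, and variable names (`democrat`, `ideology`) are invented for illustration and are not estimates from any real model.

```python
import math

def predicted_probability(intercept, coefs, values):
    """Linear prediction on the logit scale, then transform to a probability."""
    predicted_y = intercept + sum(coefs[name] * values[name] for name in coefs)
    return math.exp(predicted_y) / (1 + math.exp(predicted_y))

# Hypothetical logit estimates.
intercept = -1.0
coefs = {"democrat": 2.0, "ideology": 0.5}

# Two profiles that differ only on the 0/1 party variable.
p_low = predicted_probability(intercept, coefs, {"democrat": 0, "ideology": 3})
p_high = predicted_probability(intercept, coefs, {"democrat": 1, "ideology": 3})

print(round(p_low, 3), round(p_high, 3), round(p_high - p_low, 3))
```

Reporting the difference between the two predicted probabilities, with the other variables held at chosen values, is usually easier for readers to follow than the raw logit coefficient.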
Logistic Regression—logit v. probit

  What’s the difference? Well, MLE
   requires assumptions about the
   probability distribution of the errors—
   logistic regression uses the standard
   logistic probability distribution,
   whereas probit uses the standard
   normal distribution.
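To get a feel for how close the two distributions are, the sketch below compares their cumulative distribution functions. It relies on a common rule of thumb that is not from these slides: rescaling the logistic by about 1.7 (1.702 here) makes it track the standard normal closely.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

scale = 1.702                                # rule-of-thumb rescaling (assumption)
grid = [i / 100 for i in range(-500, 501)]   # z from -5 to 5

max_gap = max(abs(logistic_cdf(scale * z) - normal_cdf(z)) for z in grid)
print(round(max_gap, 4))
```

The two curves differ by about one percentage point at most, with the logistic having somewhat heavier tails. Because of the rescaling, logit coefficients tend to be larger than probit coefficients for the same data.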
Logistic Regression—logit v. probit

  Logit is more common. And note that
    logit and probit often give the same
    results.

  But note that there can be differences
    between the two link functions; see
    the paper by Hahn and Soyer.
Logistic Regression—ordered logit

  Ordered models assume there's some
     underlying, unobservable true outcome variable,
     occurring on an interval scale.

  We don't observe that interval-level information about
    the outcome, but only whether that unobserved value
    crosses some threshold(s) that put the outcome into
    a lower or a higher category, categories which are
    ranked, revealing ordinal but not interval-level
    information.
Logistic Regression—ordered logit

  If you are using ordered logit, you will
      get results that include “cut points”
      (intercepts) and coefficients.

  Ordered logistic regression (OLR) essentially
    runs multiple equations: one fewer than the
    number of options on one’s scale.
Logistic Regression—ordered logit

  For example, assume that you have a 4 point
     scale, 1=not at all optimistic, 2=not very
     optimistic, 3=somewhat optimistic, and
     4=very optimistic.

  The first equation compares the likelihood that
     y=1 to the likelihood that y does not =1 (that
     is, y=2 or 3 or 4)
Logistic Regression—ordered logit

  The second equation compares the
    likelihood that y=1 or 2 to the
    likelihood that y=3 or 4.

  The third equation compares the
    likelihood that y=1, 2, or 3 to the
    likelihood that y=4.
Logistic Regression—ordered logit

  Note that OLR only reports one
    parameter estimate for each
    independent variable. That is, it
    constrains the parameter estimates
    to be constant across categories.
Logistic Regression—ordered logit

  It assumes that the coefficients for the
     variables would not vary if one
     actually separately estimated the
     different equations.
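A sketch of how one coefficient plus three cut points produce probabilities for a 4-point scale. It assumes the parameterization P(y <= k) = inv_logit(cut_k - b*x), which is one common convention; as noted elsewhere in these slides, packages differ on signs, so check your software. The cut points and coefficient are made up.

```python
import math

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def ordered_logit_probs(xb, cuts):
    """P(y = 1), ..., P(y = len(cuts) + 1), assuming P(y <= k) = inv_logit(cut_k - xb)."""
    cumulative = [inv_logit(c - xb) for c in cuts] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)   # P(y = k) = P(y <= k) - P(y <= k - 1)
        previous = c
    return probs

cuts = [-1.0, 0.5, 2.0]   # hypothetical cut points for a 4-point scale
b = 0.8                   # hypothetical single coefficient on x

probs_x0 = ordered_logit_probs(b * 0, cuts)
probs_x2 = ordered_logit_probs(b * 2, cuts)
print([round(p, 3) for p in probs_x0])
print([round(p, 3) for p in probs_x2])
```

The same b shifts all three cumulative equations at once; only the cut points differ between them, which is the constraint described above.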
Logistic Regression—ordered logit

(Note that in Stata one can actually test if this
  assumption is true, without running the
  separate models. There’s some parallel here
  to the non-linearity issue we discussed last
  week, where OLS is assuming that your
  independent variable is linearly related to the
  dependent variable—but you can actually
  break apart the independent variable to test
  whether that is true.)
Logistic Regression—ordered logit

  The results also give you intercepts. Check to see
     how these are coded: they generally mean
     the same thing, but the directions of the
     parameters are different in SAS versus Stata
     (just as an example). (SAS also models y=0 in
     a regular logistic regression, so you need to flip
     the signs to get the more intuitive results.)
Multinomial Analyses

 Multinomial logit can be used when
   categories of the dependent variable
   cannot be ordered in a meaningful way.

 One category is chosen as the “comparison
   category”, and the beta coefficient (b)
   represents the change in the log odds of being in
   the dependent variable category relative
   to the comparison category (for a one-unit
   change in the right-hand side variables).
Multinomial Analyses
 The model:
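A standard textbook way to write the multinomial logit model is sketched below. The notation is an assumption introduced here: J categories with category J as the comparison category, and a separate intercept a_j and coefficients b_{j1}, ..., b_{jk} for each of the other categories.

```latex
\ln\frac{P(y=j)}{P(y=J)} = a_j + b_{j1}x_1 + \dots + b_{jk}x_k,
\qquad j = 1, \dots, J-1

P(y=j) = \frac{\exp(a_j + b_{j1}x_1 + \dots + b_{jk}x_k)}
              {1 + \sum_{m=1}^{J-1}\exp(a_m + b_{m1}x_1 + \dots + b_{mk}x_k)},
\qquad
P(y=J) = \frac{1}{1 + \sum_{m=1}^{J-1}\exp(a_m + b_{m1}x_1 + \dots + b_{mk}x_k)}
```

Each coefficient is therefore a change in the log odds of category j relative to the comparison category.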
Multinomial Analyses
 Multinomial logit is simple to estimate—and is often

 However, it is appropriate only if the introduction or
  removal of a choice has no effect on the
  (proportional) probability of choosing each of the
  other choices.

 For example: Perot versus Clinton versus Bush, 1992.
  Does removing Perot from the equation mean that
  the probability of choosing Clinton relative to the
  probability of choosing Bush changes? If so,
  multinomial logit is inappropriate.
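This property (usually called independence of irrelevant alternatives) is built into the multinomial logit formula, as the sketch below shows. The numbers are invented linear predictors for one hypothetical voter, not estimates from 1992 data, and `mnl_probs` is a helper written for this example.

```python
import math

def mnl_probs(linear_predictors):
    """Multinomial logit choice probabilities from each option's linear predictor."""
    exps = {k: math.exp(v) for k, v in linear_predictors.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

# Hypothetical linear predictors for one voter.
voter = {"Clinton": 0.4, "Bush": 0.1, "Perot": -0.5}

with_perot = mnl_probs(voter)
without_perot = mnl_probs({k: v for k, v in voter.items() if k != "Perot"})

ratio_with = with_perot["Clinton"] / with_perot["Bush"]
ratio_without = without_perot["Clinton"] / without_perot["Bush"]
print(round(ratio_with, 4), round(ratio_without, 4))
```

The Clinton-to-Bush ratio is identical with and without Perot. The model cannot represent a world in which Perot draws more heavily from one candidate than the other, so if that is what you believe happened, the assumption is violated.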
Multinomial Analyses
 Multinomial probit does not require
  the assumption that choices are
  independent across alternatives.
  And, though it demands a great deal
  of computing resources, recent
  advances mean that it is increasingly
  practical to use.
Multinomial Analyses
 So, often Multinomial Probit is

 Dow and Endersby (2004) point out,
  however, that the choice of a model really
  depends on how you see the underlying
  choice process that generated the observed
  data. In reality, neither model (MNP or
  MNL) will be clearly advantageous.
Multinomial Analyses
 And Dow and Endersby argue that MNP
  sometimes “fails to converge at a global
  optimum”. Put simply, they argue that
  MNP often comes up with imprecise
  estimates—that is, there are multiple sets
  of estimates that fit the data equally well.

 Two studies that compare the MNP and
  MNL models: Alvarez and Nagler (2001)
  and Quinn et al. (1999). Alvarez and
  Nagler argue for MNP; Quinn et al. are
  more agnostic.
Multinomial logit
 Also, conditional logit: Conditional
  logit only includes variables that are
  related to the options being chosen
  for the dependent variable.
