Beyond Bivariate: Exploring Multivariate Analysis

3 Topics Covered

1. Logic of introducing a third variable

2. Multiple linear regression: Which independent
   (predictor) variables are significantly related to
   the dependent (outcome) variable?

3. Logistic regression: Binary outcome variable

A Focal Relationship

   Residential mobility and school achievement

This is a negative or inverse relationship:

  Higher residential mobility → Lower achievement

The 0-Order Bivariate Relationship

We are going to call our initial bivariate relationship
the 0-order relationship:

    Residential mobility → School achievement

Spurious Relationship/Explanation

Could there be variables that are associated with
high levels of residential mobility and with low
school achievement, creating an apparent but
spurious relationship between residential mobility
and achievement — thus EXPLAINING AWAY the
initial bivariate relationship?

Spurious Relationship

Do taller people like action movies more than
shorter people do?
 What is the third variable?

Do days of high lemonade sales have more
drowning fatalities than days with low lemonade
sales?
 What is the third variable?
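
The logic of a spurious relationship can be sketched with a few invented numbers. The example below assumes, purely for illustration, that daily temperature is the third variable behind both lemonade sales and drowning fatalities. The data, the `days` list, and the `pearson_r` helper are all made up for this sketch. It compares the 0-order correlation with the correlation inside each temperature category.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented data: (temperature, lemonade sales, drowning fatalities) per day.
days = [
    ("cool", 10, 0), ("cool", 10, 1), ("cool", 20, 0), ("cool", 20, 1),
    ("hot", 50, 2), ("hot", 50, 3), ("hot", 60, 2), ("hot", 60, 3),
]

# 0-order relationship: all days pooled together.
r_all = pearson_r([d[1] for d in days], [d[2] for d in days])

# The same relationship within each category of the third variable.
r_by_temp = {}
for level in ("cool", "hot"):
    subset = [d for d in days if d[0] == level]
    r_by_temp[level] = pearson_r([d[1] for d in subset],
                                 [d[2] for d in subset])

print(f"0-order r = {r_all:.2f}")
for level, r in r_by_temp.items():
    print(f"r within {level} days = {r:.2f}")
```

With these invented numbers the pooled correlation is strong, but it disappears once temperature is held constant. That disappearance is what "explaining away" means.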

Intervening Variables: Interpretation

 What variables can you suggest that “go in
 between” residential mobility and school
 achievement that might help us understand our
 focal relationship better?

  These intervening variables do NOT explain
   away the relationship — they clarify why/how it
   comes about.

Intervening Variables: Interpretation

Why do women have lower incomes than men?

 Maybe they have not acquired the technical and
  managerial skills that men have.

 Maybe they are less interested in promotions
  into management than men are.

(These interpretations suggest that gender
  discrimination in salary decisions is not the only
  reason women have lower incomes than men.)

The Difference between Interpretation (Intervening) and Explanation (Spurious)

Gender → height; gender → movie preferences
 Gender, the third variable, explains away the
  spurious height → movie preference relationship.

Gender → career choices → income
 Career choices, the intervening third variable,
  contributes to interpreting the initial relationship
  between gender and income.

Specification or Interaction Effects

Sometimes when we introduce a third variable, we
find that the initial bivariate (0-order) relationship is
different for different categories of the third
variable.

Specification: Examples [1]

In research on school achievement we (Prof.
Bootcheck and I) looked at the relationship
between living in a nuclear family and grades.

 For whites, this relationship was positive.

 For all other racial-ethnic categories, there was
  no relationship.

Specification: Examples [2]

Can you think of a variable we could introduce into
our statistical analysis of the relationship
between residential mobility and school
achievement that might have different bivariate
relationships (one strong, one absent) for different
categories of the third variable?

Specification in a Crosstab

In a crosstab, this specification or interaction effect
would show up as a strong/significant relationship
in one of the tables for the layer variable (the third
variable), and it would be “Not Significant” in the
table for the other category of the layer variable.

In other words, the chi-square for one partial table
is significant, but it is not significant for the other
partial table.
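
A minimal sketch of this comparison, using invented counts: each partial table below is a 2x2 crosstab of residential mobility (rows) by achievement (columns) for one category of a hypothetical layer variable. The `chi_square` helper and the table labels are assumptions made for the illustration. The critical value 3.841 is the standard chi-square cutoff for 1 degree of freedom at the .05 level.

```python
def chi_square(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Critical value for df = 1 at alpha = .05 (a 2x2 table has 1 df).
CRITICAL_DF1_05 = 3.841

# Invented partial tables, one per category of the layer variable.
# Rows: low / high mobility. Columns: low / high achievement.
partial_tables = {
    "layer category A": [[10, 40], [40, 10]],
    "layer category B": [[24, 26], [26, 24]],
}

results = {}
for label, table in partial_tables.items():
    stat = chi_square(table)
    results[label] = (stat, stat > CRITICAL_DF1_05)
    verdict = "significant" if stat > CRITICAL_DF1_05 else "not significant"
    print(f"{label}: chi-square = {stat:.2f} ({verdict})")
```

One partial table clears the cutoff and the other does not, which is the pattern a specification (interaction) effect produces.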

Suppressed Effects [1]

Introducing a third variable can reveal
suppressed effects: relationships that work in
opposing directions and cancel each other out.

Fictitious example: Religious intensity and death
penalty views

 0-order: There appears to be no relationship.

Suppressed Effects [2]

 When we introduce region (north or south), we
  see that the effects are opposite:
   For people living in the north of this fictitious country,
    high religious intensity goes with opposition to the
    death penalty.
   For people living in the south, high religious intensity
    goes with support for the death penalty.

 The two inverse or opposed relationships cancel
  each other out, unless we break the data down
  by the regional variable.
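
The cancelling-out can be shown with a small invented data set for this fictitious country. Each record below is (region, religious intensity, support for the death penalty), with both scores on an assumed 1 to 5 scale. The scores and the `pearson_r` helper are made up for this sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented records: (region, religious intensity, death penalty support).
people = [
    ("north", 1, 5), ("north", 2, 3), ("north", 3, 4),
    ("north", 4, 2), ("north", 5, 1),
    ("south", 1, 1), ("south", 2, 3), ("south", 3, 2),
    ("south", 4, 4), ("south", 5, 5),
]

def r_for(records):
    """Correlation between intensity and support for a set of records."""
    return pearson_r([p[1] for p in records], [p[2] for p in records])

r_all = r_for(people)  # 0-order relationship
r_north = r_for([p for p in people if p[0] == "north"])
r_south = r_for([p for p in people if p[0] == "south"])

print(f"0-order r = {r_all:.2f}")
print(f"north r   = {r_north:.2f}")
print(f"south r   = {r_south:.2f}")
```

Pooled, the two variables look unrelated. Split by region, a strong negative relationship appears in the north and a strong positive one in the south.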

Final Possibility: Replication

It is possible that the initial bivariate relationship
persists when we introduce the third variable.

 The partial tables for the categories of the third
  (layer) variable look just the same as the initial
  two-variable table.

Multivariate or Multiple Linear Regression

 We specify two or more independent variables.
 Each may have a significant and maybe
  moderate or even strong correlation with the
  dependent variable.
 When they are placed in the regression model,
  “only the strongest survive.”
 If they do not have a relationship with the DV
  independent of their relationship with each other,
  they will not be significant in the model.
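
A small sketch of why a predictor with a strong 0-order correlation can drop out of the model. With exactly two predictors, the standardized coefficients (betas) can be computed from the three pairwise correlations. The data are invented: `y` is built from `x1` only, while `x2` merely overlaps with `x1`. The sketch shows the betas only. It does not run the significance tests a statistics package would report.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def standardized_betas(r_y1, r_y2, r_12):
    """Betas for a linear regression with exactly two predictors."""
    denom = 1 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2

# Invented data: y depends on x1; x2 is correlated with x1 but adds
# nothing of its own to y.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 2, 5, 6, 5, 6, 9]
y = [3, 3, 7, 7, 9, 13, 13, 17]

r_y1 = pearson_r(x1, y)   # 0-order correlation of x1 with y
r_y2 = pearson_r(x2, y)   # 0-order correlation of x2 with y
r_12 = pearson_r(x1, x2)  # correlation between the two predictors

beta1, beta2 = standardized_betas(r_y1, r_y2, r_12)

print(f"0-order correlations with y: x1 = {r_y1:.2f}, x2 = {r_y2:.2f}")
print(f"betas in the two-predictor model: x1 = {beta1:.2f}, x2 = {beta2:.2f}")
```

Here `x2` correlates strongly with `y` on its own, yet its beta is essentially zero once `x1` is in the model, because it has no relationship with `y` independent of `x1`.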

Examples from the Country Data Set

Look at adjusted R².

 Which variables have significant coefficients?

 What do the relative sizes of the betas tell you?

Models with several predictors are hard to visualize.

Building models—all variables entered at the same
  time or stepwise. See Nardi (2006, p. 97), which
  is cited in Garner (2010, p. 333).

Logistic Regression [1]

Currently, logistic regression is a very popular
statistical technique!
 It involves a dichotomous (or binary) outcome variable.
We can compute the overall odds of one of the two
possible outcomes of this variable relative to the other.
 It involves examining predictor variables (IVs) to see if
  each one is related to a change in the odds from their
  overall level.
Does growing up in a bilingual family raise or lower
an individual’s odds of completing high school,
compared to the overall odds of doing so?

Logistic Regression [2]

Independent variables need to be interval-ratio or
dummy variables (a categorical variable broken
down into binary variables).

     Alert: Which categories are defined as 0 and
     1 for all the binary variables?

Negative coefficients mean lower odds. The odds
ratio falls below 1.
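
The link between coding, odds ratios, and the sign of a coefficient can be checked by hand for the simplest case, one binary predictor. With a single binary predictor, the fitted logistic coefficient equals the natural log of the odds ratio between the category coded 1 and the category coded 0. The counts below are invented, and the variable names are assumptions made for this sketch.

```python
from math import log

def odds(yes_count, no_count):
    """Odds of the outcome: cases with it divided by cases without it."""
    return yes_count / no_count

# Invented counts of a binary outcome for each category of one predictor.
coded_1 = {"yes": 60, "no": 40}   # category coded 1
coded_0 = {"yes": 80, "no": 20}   # category coded 0 (reference)

overall_odds = odds(coded_1["yes"] + coded_0["yes"],
                    coded_1["no"] + coded_0["no"])
odds_1 = odds(coded_1["yes"], coded_1["no"])
odds_0 = odds(coded_0["yes"], coded_0["no"])

# Odds ratio for the category coded 1 relative to the category coded 0.
odds_ratio = odds_1 / odds_0
coef = log(odds_ratio)            # the logistic coefficient in this case

# Swap which category is coded 1: the ratio inverts, the sign flips.
odds_ratio_flipped = odds_0 / odds_1
coef_flipped = log(odds_ratio_flipped)

print(f"overall odds = {overall_odds:.2f}")
print(f"odds ratio = {odds_ratio:.3f}, coefficient = {coef:.2f}")
print(f"after recoding: odds ratio = {odds_ratio_flipped:.3f}, "
      f"coefficient = {coef_flipped:.2f}")
```

The same data give a negative coefficient under one coding and a positive one under the other, so the table of coefficients can only be read once the 0 and 1 categories are known.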

Logistic Regression: Example 1

Are income, race-ethnicity, gender, region, and
religion related to a vote for the Republican
presidential candidate?

 What characteristics raise the odds and which
  lower the odds of a Republican vote?

 Which categories are labelled 1? Which 0?
  (This will make a difference in how to read the
  table of coefficients.)

Logistic Regression: Example 2

What individual characteristics are related to
experiencing foreclosure on one’s home?

Binary outcome (foreclosed or not foreclosed), so we
use logistic regression.

Contrast this to a question that could be answered
with linear regression.

 What neighbourhood characteristics are related
  to a high foreclosure rate?