multivariate statistical analysis

					Multivariate Statistical Analysis:
Analyzing Criterion-predictor
   Chapter 18
Knowing Which Techniques to Use
• The type of analysis to be performed
  depends on several things
  – The number of variables
  – The existence of dependencies
  – Number of dependencies
  – Measurement of data (scaling: nominal, ordinal, interval, ratio)
• In many marketing research studies, the goal is
  to explain variation in one set of variables (e.g.,
  sales) with matching variation in other sets of
  variables (e.g., advertising)
   – Identifying criterion and predictor variables
     (dependent and independent variables)
   – Creating a function or equation that estimates the
     criterion from predictor variables
   – Establishing confidence in the relationship between
         Multiple Regression
• Same as simple regression, but with
  more predictor terms
        Y = α + β1 X1 + β2 X2 + ε
• In this new case, β1 is a partial regression
  coefficient with respect to x1
  – Because x1 is usually correlated with x2, it is not
    obvious how much x2 adds beyond x1, so the question
    remains how to tell whether multiple regression is
    better than simple regression
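To make the partial regression coefficients concrete, here is a minimal sketch that estimates the intercept and the coefficients by least squares using the normal equations. The function names and the data are hypothetical and chosen for illustration; the data are generated exactly from Y = 1 + 2·x1 + 3·x2, so the fit should recover those values.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update both sides together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_multiple_regression(predictors, y):
    """Return [intercept, b1, b2, ...] minimizing the sum of squared errors.

    predictors is a list of rows, one row of predictor values per case.
    """
    # A leading 1 in each row makes the first coefficient the intercept.
    design = [[1.0] + list(row) for row in predictors]
    k = len(design[0])
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from Y = 1 + 2*x1 + 3*x2 (no error term).
x = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in x]
coefficients = fit_multiple_regression(x, y)
print([round(c, 6) for c in coefficients])
```

In practice a statistics package would be used; the point of the sketch is only that each coefficient is estimated jointly with the others, which is why it is called partial.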
  Regression Output Measures of
1. Regression equation
2. R2—both sample-determined and population
3. F-test for the overall equation
4. Individual t-tests and standard error for testing
   each partial regression coefficient
5. Partial correlation coefficients
6. Accounted-for variance explained by each
   predictor (overlap goes to preceding predictor)
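Several of these output measures follow directly from the observed and fitted values. The sketch below computes R2, an adjusted R2 (one common population-oriented correction), and the overall F statistic. The function name and the numbers are hypothetical, and the formulas assume the fitted values come from a least-squares equation with an intercept.

```python
def fit_measures(y, y_hat, num_predictors):
    """Return (R-squared, adjusted R-squared, overall F statistic)."""
    n = len(y)
    k = num_predictors
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    r2 = 1 - ss_residual / ss_total
    # Adjusted R-squared penalizes the number of predictors.
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    # F compares explained variance per predictor with residual variance.
    f_statistic = (r2 / k) / ((1 - r2) / (n - k - 1))
    return r2, adjusted_r2, f_statistic


# Hypothetical observed values and fitted values from a two-predictor equation.
observed = [3.0, 5.0, 4.0, 7.0, 8.0, 9.0]
fitted = [3.5, 4.5, 4.5, 6.5, 8.0, 9.0]
r2, adjusted_r2, f_statistic = fit_measures(observed, fitted, num_predictors=2)
print(f"R2={r2:.3f}  adjusted R2={adjusted_r2:.3f}  F={f_statistic:.1f}")
```

The F statistic would then be compared against an F distribution with k and n - k - 1 degrees of freedom; that lookup is left out here.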
Contribution to Explained Variance
• Example
  – Bivariate regression
     Ŷ = 0.491 + 0.886 x1
  – Same data, multiple regression
     Ŷ = 0.247 + 0.493 x1 + 0.484 x2
  – Independent variables may be correlated, so looking
    at R2 values is critical
     • R2 of the first equation is 0.723, meaning 72% of the variation
       in Y is explained by variation in x1
     • R2 of the second equation is 0.783, meaning that adding
       x2 to the equation explains only 6 percentage points more
       of the variation in Y
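The arithmetic behind the comparison can be written out directly from the two R2 values in the example. The variable names are illustrative, and the second ratio (the share of previously unexplained variation) is an extra way of reading the same numbers rather than something stated in the example.

```python
r2_bivariate = 0.723  # Y regressed on x1 only
r2_multiple = 0.783   # Y regressed on x1 and x2

# Additional variation in Y explained once x2 is added.
increment = r2_multiple - r2_bivariate

# The same gain expressed as a share of what x1 alone left unexplained.
unexplained_before = 1 - r2_bivariate
share_of_remaining = increment / unexplained_before

print(f"Added explained variation: {increment:.3f}")
print(f"Share of previously unexplained variation: {share_of_remaining:.3f}")
```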
         Multicollinearity
•   Problem in regressions when predictor variables are
    highly correlated
    – Distorts coefficients, making it tough to tell which predictor
      variables are most relevant
•   No standard as to how much collinearity is too much
•   3 solutions for multicollinearity
    1. Ignore it
    2. Delete one or more correlated predictors
        •   As a rule of thumb, some researchers toss one or more correlated
            variables when the correlation is .9 or more
    3. Find predictor variable combinations that are uncorrelated
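A simple screen for the .9 rule of thumb is to compute the correlation between every pair of predictors and flag the pairs at or above the threshold. This is a minimal sketch; the predictor names and values are hypothetical, and pairwise correlation will not catch collinearity that involves three or more predictors at once.

```python
from itertools import combinations
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)


def highly_correlated_pairs(predictors, threshold=0.9):
    """List predictor pairs whose absolute correlation reaches the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical predictors: sales_calls moves almost in step with advertising.
predictors = {
    "advertising": [10, 12, 15, 18, 20, 25],
    "sales_calls": [21, 25, 29, 37, 41, 49],
    "price": [9, 4, 7, 3, 8, 5],
}
flagged = highly_correlated_pairs(predictors)
print(flagged)
```

A flagged pair is a candidate for solution 2 above: drop one of the two, or combine them, before running the regression.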
         Cross-Validation
1. Randomly split cases into groups
2. Compute separate regressions for each group
3. Use equations to predict Y-values in the other groups
4. Compare coefficients across groups for agreement
   in algebraic sign and magnitude
5. Compute a regression for the entire sample,
   using only the variables that were similar
   between groups
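The five steps above can be sketched with two groups and a single predictor, which keeps the code short; with several predictors the same comparison would be made coefficient by coefficient. All names and data here are hypothetical, and the seed is fixed only so the split is repeatable.

```python
import random


def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_squared_error(model, x, y):
    """Average squared prediction error of a fitted line on given cases."""
    intercept, slope = model
    return sum((yi - (intercept + slope * xi)) ** 2
               for xi, yi in zip(x, y)) / len(x)


rng = random.Random(0)
# Hypothetical cases: y is roughly 2 + 0.5*x plus uniform noise.
cases = [(x, 2 + 0.5 * x + rng.uniform(-1, 1)) for x in range(1, 41)]

# Step 1: randomly split cases into two groups.
rng.shuffle(cases)
group_a, group_b = cases[:20], cases[20:]
xa, ya = zip(*group_a)
xb, yb = zip(*group_b)

# Step 2: compute a separate regression for each group.
model_a = fit_simple_regression(xa, ya)
model_b = fit_simple_regression(xb, yb)

# Step 3: use each equation to predict Y-values in the other group.
error_a_on_b = mean_squared_error(model_a, xb, yb)
error_b_on_a = mean_squared_error(model_b, xa, ya)

# Step 4: compare coefficients for agreement in sign and magnitude.
same_sign = (model_a[1] > 0) == (model_b[1] > 0)
slope_gap = abs(model_a[1] - model_b[1])

# Step 5: if the groups agree, fit the entire sample with that predictor.
if same_sign:
    all_x, all_y = zip(*cases)
    final_model = fit_simple_regression(all_x, all_y)

print(same_sign, round(slope_gap, 3))
```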
         Stepwise Regression
1. Run a regression with many predictor variables
2. Review the results and remove the least
   important variables
3. Keep re-running the regression and removing
   variables until a solid model emerges
  –   Can also start with a single variable and add
      more, keeping those that seem promising
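The add-one-at-a-time variant can be sketched as follows. This is an illustration under simplifying assumptions: it uses the gain in R2 as the stopping criterion (statistical packages typically use F or p-value thresholds instead), and the function names, the `min_gain` threshold, and the data are all hypothetical.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, y):
    """R-squared of a least-squares fit of y on the given predictor columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def forward_stepwise(data, y, min_gain=0.01):
    """Add the predictor with the largest R-squared gain until gains are small."""
    selected, best_r2 = [], 0.0
    remaining = list(data)
    while remaining:
        trials = {name: r_squared([data[s] for s in selected + [name]], y)
                  for name in remaining}
        candidate = max(trials, key=trials.get)
        if trials[candidate] - best_r2 < min_gain:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_r2 = trials[candidate]
    return selected, best_r2


# Hypothetical data: y depends on advertising and price, not on store_age.
data = {
    "advertising": [1, 2, 3, 4, 5, 6, 7, 8],
    "price": [5, 3, 6, 2, 7, 1, 8, 4],
    "store_age": [3, 4, 8, 1, 6, 5, 2, 7],
}
noise = [0.1, -0.1, 0.1, -0.1, -0.1, 0.1, -0.1, 0.1]
y = [3 + 2 * a - 1.5 * p + e
     for a, p, e in zip(data["advertising"], data["price"], noise)]
selected, best_r2 = forward_stepwise(data, y)
print(selected, round(best_r2, 4))
```

Changing `min_gain` or the starting set of predictors can change which variables are kept, so the result is sensitive to these choices.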
Downsides of Stepwise Regression
• Several supposedly "optimal" equations
  emerge, depending on the variables
  started with and the method (eliminating or
  adding variables)
• Multicollinearity clouds which variables
  might make the most predictive model
        Discriminant Analysis
• Discriminant analysis is used when the predictor
  variables are interval-scaled data but the
  criterion is categorical
  – Example: A B2B company has used its database to
    segment its customers into groups that prefer
    ordering over the internet, through a catalog, or from
    a salesperson.
    With discriminant analysis, the company can use data
    on new customers (annual sales, number of
    employees) to predict which segment they might fall into
Two Types of Discriminant Analysis
• Discriminant predictive (or explanatory) analysis
  – Used to optimize predictive functions
• Discriminant classification analysis
  – Like other grouping techniques, this analysis
    seeks to minimize differences within a single
    group, and then maximize differences
    between several groups
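As a rough illustration of classification from interval-scaled predictors, the sketch below builds a two-group Fisher linear discriminant with two predictors. This is a simplification of the B2B example, which has three segments. The function names, segment labels, and customer figures are hypothetical, and the cutoff assumes equal group sizes and equal misclassification costs.

```python
def mean_vector(rows):
    """Mean of each of the two predictors across a group."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter_matrix(rows, mean):
    """2x2 within-group sums of squares and cross-products."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_discriminant(group_a, group_b):
    """Return (weights, cutoff); scores above the cutoff point to group_a."""
    mean_a, mean_b = mean_vector(group_a), mean_vector(group_b)
    sa = scatter_matrix(group_a, mean_a)
    sb = scatter_matrix(group_b, mean_b)
    # Pooled within-group scatter.
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    diff = [mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]]
    # weights = inverse(sw) * diff, using the closed-form 2x2 inverse.
    weights = [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
               (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]
    score_a = weights[0] * mean_a[0] + weights[1] * mean_a[1]
    score_b = weights[0] * mean_b[0] + weights[1] * mean_b[1]
    # Cutoff halfway between the two projected group means.
    return weights, (score_a + score_b) / 2


def classify(case, weights, cutoff):
    score = weights[0] * case[0] + weights[1] * case[1]
    return "internet" if score > cutoff else "salesperson"


# Hypothetical customers: (annual sales in $ millions, number of employees).
internet = [(1.0, 10), (1.5, 12), (2.0, 20), (1.2, 15)]
salesperson = [(5.0, 80), (6.5, 120), (4.0, 60), (7.0, 100)]
weights, cutoff = fisher_discriminant(internet, salesperson)

small_customer = classify((1.8, 18), weights, cutoff)
large_customer = classify((6.0, 90), weights, cutoff)
print(small_customer, large_customer)
```

The weights are chosen to separate the group means as far as possible relative to the spread within groups, which is the within-versus-between idea described above.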
 Chi-square Automatic Interaction
       Detection (CHAID)
• CHAID analysis is used to find
  dependencies between variables
• Typically used in data mining
• Useful when…
  – No desire to make assumptions about
  – Many potential explanatory variables
  – Large number of observations
           CHAID Process
• Computer takes a large data set and tests
  different ways data can be split up,
  selecting the best available split
• Groups can be further split
• Because there are so many splits, CHAID
  requires very large samples
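The split-selection step can be illustrated with a chi-square statistic computed for each candidate predictor against the response. This is a simplified sketch, not full CHAID: it does not merge categories or adjust significance levels, and comparing raw statistics is reasonable here only because both candidate tables have the same dimensions. The field names and counts are hypothetical.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (rows of counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def best_split(records, predictors, target):
    """Return the predictor with the largest chi-square against the target."""
    outcomes = sorted({r[target] for r in records})
    scores = {}
    for name in predictors:
        levels = sorted({r[name] for r in records})
        # Cross-tabulate predictor level by outcome.
        table = [[sum(1 for r in records
                      if r[name] == level and r[target] == outcome)
                  for outcome in outcomes]
                 for level in levels]
        scores[name] = chi_square(table)
    return max(scores, key=scores.get), scores


def make_records(count, income, gender, responded):
    return [{"income": income, "gender": gender, "responded": responded}
            for _ in range(count)]


# Hypothetical mailing results: 400 households in total.
records = (
    make_records(30, "high", "F", True) + make_records(70, "high", "F", False)
    + make_records(20, "high", "M", True) + make_records(80, "high", "M", False)
    + make_records(6, "low", "F", True) + make_records(94, "low", "F", False)
    + make_records(4, "low", "M", True) + make_records(96, "low", "M", False)
)
best, scores = best_split(records, ["income", "gender"], "responded")
print(best, {name: round(s, 2) for name, s in scores.items()})
```

Applying `best_split` again within each resulting group gives the further splits described above, and each split leaves fewer cases per group, which is why large samples are needed.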
              CHAID Example
A bank is planning a direct-mail campaign to sign up new
  customers for a credit card.
CHAID analysis can be run on a customer database to find
  variables that will lead to a greater response.
CHAID analysis might show that households earning
  $50,000 per year or more are more likely to sign up.
  Within that group women are more likely to respond, and
  women with household size of 5 or more are more likely
  to respond.
Purchasing a mailing list from a credit reporting agency for
  women in large households where reported income is
  over $50,000 results in a greater response rate over past
  campaigns—4% for a targeted list compared to 1% for a
  broad mailing—at reduced cost.
       Canonical Correlation
• Generalization of multiple correlation to
  two or more criterion variables
• Answers the question
  – How well does one group of variables predict
    Correspondence Analysis
• Specialized form of canonical correlation
• Useful for studying brands and attributes
• Includes a visual display of results to help
  interpret data
           Probit and Logit
• Deal with the same type of problem as
  regression, but are designed for nominal- or
  ordinal-scaled data
  – Probit: Response is assumed to be normally distributed
  – Logit: Logistic distribution is assumed
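The difference between the two models comes down to the curve that turns a linear combination of predictors into a probability. In this small sketch, `z` is a hypothetical name for that linear combination: logit uses the logistic function and probit uses the standard normal cumulative distribution.

```python
from math import erf, exp, sqrt


def logit_probability(z):
    """Logistic curve: probability implied by a logit model at score z."""
    return 1.0 / (1.0 + exp(-z))


def probit_probability(z):
    """Standard normal CDF: probability implied by a probit model at score z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"z={z:+.1f}  logit={logit_probability(z):.3f}  "
          f"probit={probit_probability(z):.3f}")
```

Both curves are S-shaped and pass through 0.5 at z = 0; the logistic curve has heavier tails, so estimated coefficients from the two models are on different scales.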
Path Analysis / Causal Modeling
• Hybrid of factor analysis and simultaneous
  equation regression
• The goal is measuring links between
  observed measures and unobservable
• These advanced techniques require the
  right software and expertise on the part of
  the researcher
