# Multivariate Statistical Analysis

*Analyzing Criterion-Predictor Association (Chapter 18)*
## Knowing Which Techniques to Use

- The type of analysis to be performed depends on several things:
  - The number of variables
  - The existence of dependencies
  - The number of dependencies
  - The measurement level of the data (scaling: nominal, interval)
## Covariation

- In many marketing research studies, the goal is to explain variation in one set of variables (e.g., sales) with corresponding variation in other sets of variables. This involves:
  - Identifying criterion and predictor variables (dependent and independent variables)
  - Creating a function or equation that estimates the criterion from the predictor variables
  - Establishing confidence in the relationship between the variables
## Multiple Regression

- The same as simple regression, but with more predictor terms:

  Y = α + β₁X₁ + β₂X₂ + ε

- Here β₁ is a *partial* regression coefficient with respect to X₁.
  - Because X₁ is usually correlated with X₂, the question remains how to tell whether multiple regression is better than simple regression.
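As a minimal sketch of the idea (made-up data and made-up true coefficients, not from the chapter), a two-predictor regression like the equation above can be fit by ordinary least squares:

```python
import numpy as np

# Made-up data: the true equation is Y = 1.0 + 2.0*X1 + 1.5*X2 + noise,
# with X2 correlated with X1 (as the text notes is typical).
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 0.6 * x1 + rng.normal(scale=0.8, size=100)
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(scale=0.5, size=100)

# Design matrix with an intercept column; solve for (alpha, beta1, beta2).
X = np.column_stack([np.ones_like(x1), x1, x2])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coefs)   # estimates land near the true values (1.0, 2.0, 1.5)
```

The fitted values are *partial* coefficients: each measures the effect of its predictor with the other held fixed.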
## Regression Output Measures of Interest

1. The regression equation
2. R² — both the sample value and the population estimate
3. The F-test for the overall equation
4. Individual t-tests and standard errors for testing each partial regression coefficient
5. Partial correlation coefficients
6. Accounted-for variance explained by each predictor (overlap goes to the preceding predictor)
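Several of these measures fall straight out of the residuals. The sketch below (made-up data, plain NumPy standing in for a statistics package) computes items 1-4 — the equation, R², the overall F-statistic, and per-coefficient standard errors and t-statistics:

```python
import numpy as np

# Made-up data with two predictors and n = 200 cases.
rng = np.random.default_rng(1)
n, k = 200, 2
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 0.5 + 1.2 * x1 + 0.8 * x2 + rng.normal(size=n)

# 1. The regression equation (intercept plus partial coefficients).
X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

# 2. R-squared: the share of variation in y the predictors explain.
ss_res = resid @ resid
ss_tot = ((y - y.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot

# 3. F-statistic for the overall equation.
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))

# 4. Standard error and t-statistic for each partial coefficient.
sigma2 = ss_res / (n - k - 1)
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = beta / se
print(r2, f_stat, t_stats)
```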
## Contribution to Explained Variance

- Example
  - Bivariate regression: Ŷ = 0.491 + 0.886X₁
  - Same data, multiple regression: Ŷ = 0.247 + 0.493X₁ + 0.484X₂
  - Independent variables may be correlated, so looking at R² values is critical:
    - R² of the first equation is 0.723, meaning 72% of the variation in Y is explained by variation in X₁.
    - R² of the second equation is 0.783, meaning that adding X₂ to the equation explains only 6% more of the variation.
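The effect in the example is easy to reproduce: when the second predictor is highly correlated with the first, most of its explanatory power overlaps what the first already covers, so R² barely moves. A sketch with made-up data (the coefficients and R² values will differ from the chapter's numbers):

```python
import numpy as np

def r_squared(X, y):
    """R^2 for an OLS fit of y on X (X includes an intercept column)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) ** 2).sum()

# Made-up data: x2 is highly correlated with x1, so it repeats most
# of the information x1 already carries about y.
rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=n)
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

ones = np.ones(n)
r2_simple = r_squared(np.column_stack([ones, x1]), y)
r2_multiple = r_squared(np.column_stack([ones, x1, x2]), y)
print(r2_simple, r2_multiple)   # the gain from adding x2 is small
```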
## Multicollinearity

- A problem in regression when predictor variables are highly correlated
  - Distorts the coefficients, making it tough to tell which predictor variables are most relevant
- There is no standard for how much collinearity is too much
- Three solutions for multicollinearity:
  1. Ignore it
  2. Delete one or more of the correlated predictors
     - As a rule of thumb, some researchers toss one of a pair of correlated variables when the correlation is .9 or more
  3. Find combinations of the predictor variables that are uncorrelated
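Solution 2 can be sketched as a screening pass over the predictor correlation matrix, applying the .9 rule of thumb mentioned above (variable names and data are made up for illustration):

```python
import numpy as np

# Made-up predictors: x3 is nearly a copy of x1 (near-perfect collinearity).
rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + rng.normal(scale=0.1, size=n)
predictors = {"x1": x1, "x2": x2, "x3": x3}

# Rule of thumb from the text: flag pairs correlated at .9 or more
# and toss one member of each such pair.
names = list(predictors)
corr = np.corrcoef(np.column_stack(list(predictors.values())), rowvar=False)
dropped = set()
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        if (abs(corr[i, j]) >= 0.9
                and names[i] not in dropped and names[j] not in dropped):
            dropped.add(names[j])   # keep the first of the pair

kept = [nm for nm in names if nm not in dropped]
print(kept)   # x3 is dropped; x1 and x2 remain
```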
## Cross-validation

1. Randomly split the cases into groups
2. Compute separate regressions for each group
3. Use each group's equation to predict the Y-values in the other groups
4. Check the coefficients for agreement in algebraic sign and magnitude
5. Compute a regression for the entire sample, using only the variables that were similar between groups
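The first four steps can be sketched as follows: split made-up data in half, fit each half separately, predict each half from the other half's equation, and compare the two sets of coefficients:

```python
import numpy as np

# Made-up data for a regression with an intercept and two predictors.
rng = np.random.default_rng(4)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([0.5, 2.0, -1.0]) + rng.normal(size=n)

# Step 1: randomly split the cases into two groups.
idx = rng.permutation(n)
a, b = idx[: n // 2], idx[n // 2 :]

# Step 2: compute a separate regression for each group.
beta_a = np.linalg.lstsq(X[a], y[a], rcond=None)[0]
beta_b = np.linalg.lstsq(X[b], y[b], rcond=None)[0]

# Step 3: use one group's equation to predict Y in the other group.
pred_b = X[b] @ beta_a
cv_corr = np.corrcoef(pred_b, y[b])[0, 1]   # holdout predictive correlation

# Step 4: check the coefficients for agreement in sign and magnitude.
signs_agree = bool(np.all(np.sign(beta_a) == np.sign(beta_b)))
print(beta_a, beta_b, signs_agree)
```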
## Stepwise Regression

1. Run a regression with many predictor variables
2. Review the results and remove the least important variables
3. Keep re-running the regression and removing variables until a solid model emerges

## Downsides of Stepwise Regression

- Several supposedly "optimal" equations can emerge, depending on the variables started with and the method used (eliminating variables, or adding more and keeping those that seem promising)
- Multicollinearity clouds which variables might make the most predictive model
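A bare-bones backward-elimination version of the stepwise procedure might look like the following (made-up data; the |t| ≥ 2 cutoff is an assumption for illustration, not a universal rule):

```python
import numpy as np

def t_statistics(X, y):
    """OLS t-statistics for each column of X (first column = intercept)."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta / se

# Made-up data: only the first two of four predictors actually drive y.
rng = np.random.default_rng(5)
n = 300
Xfull = rng.normal(size=(n, 4))
y = 2.0 * Xfull[:, 0] + 1.0 * Xfull[:, 1] + rng.normal(size=n)

# Backward elimination: repeatedly drop the predictor with the smallest
# |t| until every remaining predictor clears the cutoff.
keep = list(range(4))
while True:
    X = np.column_stack([np.ones(n), Xfull[:, keep]])
    t = t_statistics(X, y)[1:]           # skip the intercept's t
    weakest = int(np.argmin(np.abs(t)))
    if abs(t[weakest]) >= 2.0:           # assumed cutoff
        break
    keep.pop(weakest)

print([f"x{i + 1}" for i in keep])       # the real drivers x1, x2 survive
```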
## Discriminant Analysis

- Discriminant analysis is used when the predictor variables are interval-scaled but the criterion is categorical
  - Example: A B2B company has used its database to segment its customers into groups that prefer ordering over the internet, through a catalog, or from a salesperson. With discriminant analysis, the company can use data on new customers (annual sales, number of employees) to predict which segment they might fall into.
## Two Types of Discriminant Analysis

- Discriminant predictive (or explanatory) analysis
  - Used to optimize the predictive functions
- Discriminant classification analysis
  - Like other grouping techniques, this analysis seeks to minimize differences within a single group and maximize differences between groups
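For the two-group case, discriminant classification can be sketched with Fisher's linear discriminant (made-up data standing in for measures such as annual sales and number of employees; the segment labels are hypothetical):

```python
import numpy as np

# Made-up data: two known segments, each described by two interval-scaled
# predictors.
rng = np.random.default_rng(6)
g1 = rng.normal(loc=[0.0, 0.0], size=(100, 2))   # e.g. "orders online"
g2 = rng.normal(loc=[3.0, 2.0], size=(100, 2))   # e.g. "orders via salesperson"

# Fisher's two-group linear discriminant: w = S_pooled^-1 (m2 - m1).
m1, m2 = g1.mean(axis=0), g2.mean(axis=0)
S_pooled = (np.cov(g1, rowvar=False) + np.cov(g2, rowvar=False)) / 2
w = np.linalg.solve(S_pooled, m2 - m1)
cutoff = w @ (m1 + m2) / 2        # midpoint between the projected means

def classify(x):
    """Assign a new case to segment 0 (g1) or 1 (g2)."""
    return int(w @ x > cutoff)

print(classify(np.array([0.2, -0.1])), classify(np.array([2.8, 2.1])))
```

A new customer's scores are projected onto the discriminant axis and assigned to whichever group's mean is closer.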
## Chi-square Automatic Interaction Detection (CHAID)

- CHAID analysis is used to find dependencies between variables
- Typically used in data mining
- Useful when...
  - There is no desire to make assumptions about relationships
  - There are many potential explanatory variables
  - There is a large number of observations
## CHAID Process

- The computer takes a large data set and tests different ways the data can be split up, selecting the best available split
- Groups can then be further split
- Because there are so many splits, CHAID requires very large samples
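A single splitting step reduces to cross-tabulating the response against each candidate predictor and keeping the split with the largest chi-square statistic. A sketch with made-up mailing data (variable names are illustrative):

```python
import numpy as np

def chi_square(table):
    """Pearson chi-square statistic for a contingency table."""
    table = np.asarray(table, dtype=float)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return ((table - expected) ** 2 / expected).sum()

# Made-up mailing data: a 0/1 response plus two candidate split variables.
rng = np.random.default_rng(7)
n = 2000
income_high = rng.integers(0, 2, size=n)     # response truly depends on this
region = rng.integers(0, 2, size=n)          # pure noise
response = rng.random(n) < (0.01 + 0.03 * income_high)

def crosstab(split):
    """2x2 table of split value vs. response."""
    return [[int(np.sum((split == v) & (response == r)))
             for r in (False, True)] for v in (0, 1)]

# CHAID-style step: keep the candidate with the largest chi-square.
stats = {"income_high": chi_square(crosstab(income_high)),
         "region": chi_square(crosstab(region))}
best = max(stats, key=stats.get)
print(best, stats)
```

Each resulting group can then be split again on the remaining candidates, which is why the method needs large samples.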
## CHAID Example

- A company wants to solicit new customers for a credit card.
- CHAID analysis can be run on a customer database to find variables that will lead to a greater response rate.
- CHAID analysis might show that households earning over \$50,000 are more likely to respond. Within that group, women are more likely to respond, and women with a household size of 5 or more are more likely still.
- Purchasing a mailing list from a credit reporting agency for women in large households where reported income is over \$50,000 results in a greater response rate than past campaigns — 4% for the targeted list compared to 1% for an untargeted one.
## Canonical Correlation

- A generalization of multiple correlation to two or more criterion variables
  - How well does one group of variables predict another?
## Correspondence Analysis

- A specialized set of canonical correlation techniques
- Useful for studying brands and their attributes
- Includes a visual display of results to help interpret the data
## Probit and Logit

- Deal with the same type of problem as regression, but are designed for nominal- or ordinal-scaled response data
  - Probit: the response is assumed to follow a normal distribution
  - Logit: a logistic distribution is assumed
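A logit model is fit by maximizing the log-likelihood of the observed 0/1 responses. The sketch below does this with plain gradient ascent on made-up data; a real study would use a statistics package rather than hand-rolled optimization:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up binary outcome (e.g. responded / did not respond) driven by one
# predictor through a logistic curve with true coefficients (-1.0, 2.0).
rng = np.random.default_rng(8)
n = 1000
x = rng.normal(size=n)
y = (rng.random(n) < sigmoid(-1.0 + 2.0 * x)).astype(float)

# Fit P(y = 1) = sigmoid(b0 + b1*x) by gradient ascent on the
# log-likelihood.
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(5000):
    beta += 0.5 * X.T @ (y - sigmoid(X @ beta)) / n

print(beta)   # estimates land near the true values (-1.0, 2.0)
```

A probit model swaps the logistic curve for the normal CDF; the fitting idea is the same.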
## Path Analysis / Causal Modeling

- A hybrid of factor analysis and simultaneous-equation regression
- The goal is to measure links between observed measures and unobservable constructs
- These advanced techniques require the right software and expertise on the part of the researcher
