# Discriminant Analysis


```
Discriminant Analysis

Database Marketing
Instructor: Nanda Kumar
Multiple Regression
* Y = b0 + b1 X1 + b2 X2 + … + bn Xn

* Same as Simple Regression in principle

* New Issues:
– Each Xi must represent something unique
– Variable selection
Multiple Regression
* Example 1:
– Spending = a + b*income + c*age

* Example 2:
– weight = a + b*height + c*sex + d*age
Real Estate Example

* How is price related to the characteristics
of the house?
SAS Code

/* Regress price on the house characteristics */
proc reg;
  model price = section lotsize bed bath age other;
run;
Interpreting the Regression Output

* Parameter Estimates or Slope Coefficients
capture the marginal impact of each explanatory
variable on price, holding the other variables fixed
* Example: the coefficient of the variable
bed represents the impact on price of increasing
the number of bedrooms by one
Significance of the Coefficients
* Are they significantly different from zero?
– Look at the T values and p values
• T value higher than 1.8 or p<0.05 is good
• Sometimes p<0.10 is considered reasonably
significant
* Overall Goodness of Fit
– Look at R² (also refer to note in Session 1)
Where are we Now?
[Diagram. Recoverable labels: Secondary Data, Behavior Data, Factor Analysis, Cluster Analysis, Segment 1, Segment 2, Discriminant/Logit Analysis, Distinguishing Characteristics, Targeting]
Web Browsing
* Identified two groups of consumers
– One that visits your website frequently
– One that doesn’t
* Can the differences in behavior be related
to socio-demographic variables?
* Can we use these discriminators to classify
prospects into one of these two groups?

* Identified two consumer segments
– One which buys a lot
– Other which does not buy as much
* Can we find variables that help discriminate the
behavior of these two groups?
* Can we use these discriminators to classify other
consumers into one of these two groups?
Promotional Campaigns
* Identify groups based on their response to
promotional campaigns
– One group purchases a lot on promotion
– Other does not
* Identify characteristics that distinguish these two
groups
* Can we use these discriminators to identify price-
sensitive prospects from the not-so-price-sensitive
ones?
Segmentation Analysis
* General Problem
– Identified segments in the population based on
behavior
– Want to find targetable characteristics that
discriminate these groups
– Classify prospects into different groups
Data
Stock #    GE/A      ROI        Stock #    GE/A      ROI
   1       0.158     0.182        13      -0.012    -0.031
   2       0.21      0.206        14       0.036     0.053
   3       0.207     0.188        15       0.038     0.036
   4       0.28      0.236        16      -0.063    -0.074
   5       0.197     0.193        17      -0.054    -0.119
   6       0.227     0.173        18       0        -0.005
   7       0.148     0.196        19       0.005     0.039
   8       0.254     0.212        20       0.091     0.122
   9       0.079     0.147        21      -0.036    -0.072
  10       0.149     0.128        22       0.045     0.064
  11       0.2       0.15         23      -0.026    -0.024
  12       0.187     0.191        24       0.016     0.026
Good Stocks
[Scatter plot: ROI (vertical axis) against GE/A (horizontal axis)]
[Second scatter plot, title not recoverable: ROI against GE/A, with both axes extending below zero]
All Stocks
[Scatter plot: ROI (vertical axis) against GE/A (horizontal axis)]
Identifying the Best Discriminators

* Two groups appear to be well separated on
each ratio: ROI and GE/A
* Also well separated in two-dimensional
space
* But this need not always be the case!
Discriminating Variables
[Figure: axes X1 and X2]
Discriminant Analysis
* Identify a set of variables that best
discriminate between the two groups
* Does so by choosing a new line that
maximizes the similarity between members
of the same group and minimizes the
similarity between members belonging to
different groups
Discriminant Function

Z = w1 GEA + w2 ROI

Between-Group Sum of Squares – SSb
Within-Group Sum of Squares – SSw
λ = SSb/SSw (the weights w1 and w2 are chosen to maximize this ratio)
More on the Criterion
* For Z to provide maximum separation
between the groups, the following must be
satisfied:
– The means of Z for the two groups should be
as far apart as possible (or high SSb)
– Values of Z for each group should be as
homogeneous as possible (or low SSw)
Classification
* Discriminant Function: The line that
separates the members of the two groups
* Methods of Classification
– Cut-Off Value Method
– Decision Theory Approach
– Classification Function Approach
– Mahalanobis Distance Method
Cut-Off Value Method

* Uses the Discriminant Function line to
score new observations (prospects) and
classify them into one of two groups based
on a cut-off value
Classification
[Figure: the Z axis split at the cut-off value into classification regions R2 and R1]
Classification Function Approach

* Classifications based on this approach are
identical to those done by Decision Theory
approach
* Classification functions are computed for
each group:
C1 = -7.87 + 61.237*GEA + 21.027*ROI
C2 = -0.004 + 2.551*GEA - 1.404*ROI
Basic Idea
* Score each new observation using these
two scoring functions

* The observation gets assigned to the group
with the higher score
What To Look For In The Results?

* Significance of the Discriminating
Variables
– Idea is to test whether the means of the
discriminating variables are statistically
different across the two groups
– Statistic: Wilks’ Lambda must be small (Look
for the p value/significance level)
Estimate of The Discriminant Function
* Canonical Discriminant Function
Z = -2.0018 + 15.0919*GEA + 5.769*ROI
* It is possible that the group means are statistically
different even though for all practical purposes,
the differences between the groups may not be
large
* Look at the squared Canonical Correlation: ratio
of between-group SS to Total SS (High is good)
Importance of the Discriminant Variables and the Discriminant Function

* How important is a variable to the Discriminant
Function?
Canonical Structure
important
– Caution: if the variables are highly correlated, their relative
importance can change from sample to sample
Classification Summary
* Look at Cross-Validation results
Web Browsing
* Can use the Discriminant function to
classify prospects into one of these two
groups

* Target Appropriately
* Classify other consumers into one of these
two groups

* Do stuff!
Promotional Campaigns
* Classify Prospects into price-sensitive and
not-so-price-sensitive segments

* Target appropriately
Summary
* Discriminant Analysis
* Extremely Useful Segmentation Analysis
tool
* Intermediate step in the overall picture:
helps classify prospects and devise the
appropriate targeting strategies

```
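The classification function approach from the slides can be tried out on the stock data with a short script. The sketch below is illustrative only: it hard-codes the coefficients quoted for C1, C2 and the canonical function Z, and the 24 rows of the Data slide. The function names (`c1`, `c2`, `z_score`, `classify`) and the `STOCKS` list are hypothetical names chosen for this example, and the group numbers 1 and 2 simply follow the subscripts of C1 and C2, since the slides do not say which group each function belongs to.

```python
# Coefficients are copied from the slides; everything else (names, layout)
# is an assumption made for illustration.

def c1(gea, roi):
    """Classification function C1 as quoted on the slide."""
    return -7.87 + 61.237 * gea + 21.027 * roi

def c2(gea, roi):
    """Classification function C2 as quoted on the slide."""
    return -0.004 + 2.551 * gea - 1.404 * roi

def z_score(gea, roi):
    """Canonical discriminant function Z as quoted on the slide."""
    return -2.0018 + 15.0919 * gea + 5.769 * roi

def classify(gea, roi):
    """Assign the observation to the group whose function scores higher."""
    return 1 if c1(gea, roi) > c2(gea, roi) else 2

# (stock number, GE/A, ROI) rows from the Data slide.
STOCKS = [
    (1, 0.158, 0.182), (2, 0.21, 0.206), (3, 0.207, 0.188),
    (4, 0.28, 0.236), (5, 0.197, 0.193), (6, 0.227, 0.173),
    (7, 0.148, 0.196), (8, 0.254, 0.212), (9, 0.079, 0.147),
    (10, 0.149, 0.128), (11, 0.2, 0.15), (12, 0.187, 0.191),
    (13, -0.012, -0.031), (14, 0.036, 0.053), (15, 0.038, 0.036),
    (16, -0.063, -0.074), (17, -0.054, -0.119), (18, 0.0, -0.005),
    (19, 0.005, 0.039), (20, 0.091, 0.122), (21, -0.036, -0.072),
    (22, 0.045, 0.064), (23, -0.026, -0.024), (24, 0.016, 0.026),
]

# Score every stock and record the assigned group.
assigned = {n: classify(gea, roi) for n, gea, roi in STOCKS}

for n, gea, roi in STOCKS:
    print(f"stock {n:2d}  Z={z_score(gea, roi):7.3f}  "
          f"C1={c1(gea, roi):8.3f}  C2={c2(gea, roi):7.3f}  "
          f"group={assigned[n]}")
```

With these coefficients, C1 - C2 works out to roughly a constant multiple of Z, so comparing the two scores behaves much like applying a cut-off close to zero on Z. The assignments are scored on the same 24 stocks listed in the Data slide, which is why the slides recommend looking at cross-validation results before trusting the classification rate.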