An Introduction to Regression with Binary Dependent Variables

Brian Goff

Department of Economics

Western Kentucky University

Introduction and Description

• Examples of binary regression
• Features of linear probability models
• Why use logistic regression?
• Interpreting coefficients
• Evaluating the performance of the model

Binary Dependent Variables

In many regression settings, the Y variable takes only two values, coded 0 and 1.

A Few Examples:
• A consumer chooses a brand (1) or not (0);
• A quality defect occurs (1) or not (0);
• A person is hired (1) or not (0);
• A household evacuates during a hurricane (1) or not (0);
• Other examples?

Scatterplot with Y = (0,1):

Y = Hired / Not Hired; X = Experience

[Figure: scatterplot; every observation lies at Y = 1 or Y = 0]

The Linear Probability Model (LPM)

If we estimate the slope using OLS regression:

Hired = α + β*Income + e

• The result is called a “Linear Probability Model”
• The predicted values are probabilities that Y equals 1
• The equation is linear: the slope β is constant
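The three points above can be sketched in a few lines of Python. This is only an illustration on synthetic data (the seed, sample size, and "true" probabilities are assumptions made up for the example, not the slide data), and `ols_fit` is a hypothetical helper name.

```python
import random

# Synthetic, illustrative data (not the slide data): x is a single regressor,
# y is a 0/1 outcome whose true probability rises linearly with x.
random.seed(0)
x = [random.uniform(0, 10) for _ in range(200)]
y = [1.0 if random.random() < 0.1 + 0.08 * xi else 0.0 for xi in x]

def ols_fit(x, y):
    """Return (alpha, beta) for the least-squares line y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta

alpha, beta = ols_fit(x, y)

# Fitted values are read as estimated probabilities that Y = 1.
predicted = [alpha + beta * xi for xi in x]

# The slope is constant: one more unit of x adds beta at any starting point.
step_low = (alpha + beta * 2.0) - (alpha + beta * 1.0)
step_high = (alpha + beta * 9.0) - (alpha + beta * 8.0)
```

Because the model includes an intercept, the average fitted probability equals the share of 1s in the data.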

Picture of LPM

[Figure: LPM regression line (constant slope coefficient) plotted against observations at Y = 1 and Y = 0. Points on the regression line represent predicted probabilities for Y at each value of X.]

An Example: Loan Approvals

Data:

Dependent Variable: Loaned
1 if loan approved, 0 if not approved by Bank Z

Independent Variables:
ROA = net income as % of total assets of applicant (nita in the output);
Debt = debt as % of total assets of applicant (tdta in the output);
Officer = 1 if loan handled by loan officer A and 0 if handled by officer B

Scatterplot (Loaned – NITA)

LPM Results

Coefficients (Dependent Variable: loaned)

Model 1       B        Std. Error   Beta (standardized)   t        Sig.
(Constant)    1.087    .192                               5.659    .000
nita           .022    .013          .237                 1.655    .105
tdta          -.063    .029         -.291                -2.156    .036
officer       -.279    .138         -.291                -2.020    .049

B and Std. Error are the unstandardized coefficients.









The coefficient on NITA implies that a 1 percentage point increase in ROA raises the predicted probability of a loan by 2.2 percentage points (0.022).
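As a quick check of this reading, the reported LPM coefficients can be plugged into a small function. The applicant values used here (NITA of 5 and 6, TDTA of 8, officer B) are hypothetical and chosen only for illustration; `lpm_probability` is an assumed helper name.

```python
# Coefficients copied from the LPM output above.
LPM = {"const": 1.087, "nita": 0.022, "tdta": -0.063, "officer": -0.279}

def lpm_probability(nita, tdta, officer):
    """Predicted probability of approval from the linear probability model."""
    return (LPM["const"] + LPM["nita"] * nita
            + LPM["tdta"] * tdta + LPM["officer"] * officer)

# Hypothetical applicant (illustrative values, not taken from the data set).
base = lpm_probability(nita=5.0, tdta=8.0, officer=0)
plus_one = lpm_probability(nita=6.0, tdta=8.0, officer=0)

# One more unit of NITA adds 0.022 regardless of the starting point.
effect = plus_one - base
```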

LPM Weaknesses

• The predicted probabilities can be greater than 1 or less than 0
    ◦ Probabilities, by definition, have max = 1; min = 0
    ◦ This is not a big issue if the out-of-range predictions are very close to 0 and 1
• The error terms vary based on the size of the X-variable (“heteroskedastic”)
    ◦ There may be models that have lower variance, that is, are more “efficient”
• The errors are not normally distributed because Y takes on only two values

 Creates problems for

 More of an issue for statistical theorists
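The first two weaknesses can be illustrated with the LPM coefficients reported for the loan example. The two applicants below are hypothetical, picked only to push the linear prediction outside the 0 to 1 range; the variance formula p*(1-p) is the standard variance of a 0/1 outcome.

```python
def lpm_probability(nita, tdta, officer):
    """Linear prediction using the LPM coefficients reported for the loan example."""
    return 1.087 + 0.022 * nita - 0.063 * tdta - 0.279 * officer

# Hypothetical applicants, chosen only to illustrate the range problem.
strong = lpm_probability(nita=10.0, tdta=0.0, officer=0)  # lands above 1
weak = lpm_probability(nita=0.0, tdta=15.0, officer=1)    # lands below 0

def error_variance(p):
    """Variance of a 0/1 outcome whose true probability is p."""
    return p * (1.0 - p)

# The variance depends on p, and p depends on X, so the errors cannot have
# constant variance: this is the heteroskedasticity noted above.
variances = {p: error_variance(p) for p in (0.1, 0.5, 0.9)}
```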

Predicted Probabilities in LPM Loans Model

In the loan case, all of the predicted probabilities fall within the (0,1) range.

Descriptive Statistics

                     N    Minimum   Maximum   Mean       Std. Deviation
predicted_loans      51   .01273    .97245    .6666667   .19034701
Valid N (listwise)   51

(Binary) Logistic Regression or “Logit”

• Selects regression coefficients to force predicted values for Y to be between 0 and 1
• Produces S-shaped regression predictions rather than a straight line
• Selects these coefficients through the “Maximum Likelihood” estimation technique
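A minimal sketch of what maximum likelihood estimation does for a logit with one X-variable follows. It uses synthetic data and a plain Newton-Raphson update; the data, seed, and function names are assumptions for illustration, and statistical packages use more careful numerical routines.

```python
import math
import random

def logistic(z):
    """S-shaped function mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic data for illustration only: one regressor x and a 0/1 outcome y.
random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(300)]
y = [1 if random.random() < logistic(0.5 + 1.0 * xi) else 0 for xi in x]

def fit_logit(x, y, iterations=25):
    """Maximize the log-likelihood of y given x by Newton-Raphson."""
    a, b = 0.0, 0.0  # intercept and slope
    for _ in range(iterations):
        p = [logistic(a + b * xi) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = gradient.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

a_hat, b_hat = fit_logit(x, y)
fitted = [logistic(a_hat + b_hat * xi) for xi in x]
```

Every fitted value lies strictly between 0 and 1, and at the maximum the average fitted probability equals the share of 1s in the data.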

Picture of Logistic Regression

[Figure: S-shaped logistic regression curve (non-linear slope coefficient) plotted against observations at Y = 1 and Y = 0. Points on the regression line represent predicted probabilities for Y at each value of X.]

LPM & Logit Regressions

• LPM & Logit regressions in some cases provide similar answers
• If there are few “outlying” X-values on the upper or lower ends, then the LPM often produces predicted values within the (0,1) band
• In such cases, the non-linear sections of the Logit regression are not needed
• In such cases, the simplicity of the LPM may be a reason to use it
• See following slide for an illustration

Example where LPM & Logit Results Similar

[Figure: LP Model line plotted against observations at Y = 1 and Y = 0]

LPM & Logit: Loan Case

• In the loan example the results are similar:
    ◦ R-square = 98% for a regression relating the LPM-predicted and Logit-predicted probabilities
• Descriptive statistics for both sets of probabilities appear below:
    ◦ The main difference is that the LPM minimum and maximum are closer to 0 and 1

Descriptive Statistics

                     N    Minimum   Maximum   Mean       Std. Deviation
pred_lpm             51   .01273    .97245    .6666667   .19034701
pred_logit           51   .06948    .91364    .6666667   .19209809
Valid N (listwise)   51

SPSS Logistic Regression Output for Loan Approval:

Variables in the Equation

Step 1 (a)    B        S.E.     Wald     df   Sig.   Exp(B)
nita           .108     .070    2.393    1    .122    1.114
tdta          -.325     .180    3.241    1    .072     .723
officer      -1.455     .767    3.599    1    .058     .233
Constant      2.968    1.187    6.248    1    .012   19.443

a. Variable(s) entered on step 1: nita, tdta, officer.









Note: Instead of t-statistics, “Wald” statistics are used to test whether the coefficients differ from zero; the associated p-values (Sig.) have the same interpretation as in any other regression output.
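The Exp(B) column in the output is the exponential of each B coefficient, which gives the multiplicative change in the odds P/(1-P) for a one-unit increase in that variable. A short sketch that recomputes it from the rounded B values:

```python
import math

# B coefficients copied from the logistic regression output above.
coefficients = {"nita": 0.108, "tdta": -0.325, "officer": -1.455, "Constant": 2.968}

# Exp(B): multiplicative change in the odds P/(1-P) per one-unit increase in X.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}
```

The recomputed values agree with the Exp(B) column up to rounding; the Constant differs slightly in the second decimal because the B shown in the table is itself rounded.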

Interpreting Logistic Regression (Logit) Coefficients

• The slope coefficient from a logistic regression (β) = the rate of change in the "log odds" of the event under study as X changes one unit
    ◦ What in the world does that mean?
    ◦ We want to know the change in the probability of the event as X changes
    ◦ In logistic regression, this value changes as X changes (S-shape instead of linear)

Loan Example: Effect of NITA on Probability of Loan

NITA coefficient (B) = 0.11

                      P     (1-P)   B*(P)*(1-P)
Low Probability       0.1   0.9     0.0099
Medium Probability    0.5   0.5     0.0275
High Probability      0.9   0.1     0.0099

Meaning?

• At moderate probabilities (around 0.5) of getting a loan (corresponding to an average NITA of about 5), the probability of getting a loan increases by about 2.75 percentage points for each 1 percentage point increase in NITA
    ◦ This estimate is very close to the LPM estimate of 2.2 percentage points
• At the lower and upper extremes (NITA values in the -/+ teens), the probability changes by only about 1 percentage point for a 1 unit increase in NITA
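The B*(P)*(1-P) calculation can be reproduced directly. The sketch below uses the rounded NITA coefficient of 0.11; `marginal_effect` is an assumed helper name.

```python
B_NITA = 0.11  # NITA coefficient, rounded as on the slide

def marginal_effect(b, p):
    """Approximate change in probability per one-unit change in X at probability p."""
    return b * p * (1.0 - p)

effects = {p: marginal_effect(B_NITA, p) for p in (0.1, 0.5, 0.9)}
```

The effect is largest at P = 0.5 (0.0275) and shrinks symmetrically toward the extremes (about 0.0099 at both P = 0.1 and P = 0.9), which is the S-shape expressed in numbers.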

Alternative Methods of Evaluating Logit Regressions

Statistics for comparing alternative logit models:
• Model Chi-Square
• Percent Correct Predictions
• Pseudo-R2

Chi-Square Test for Fit



Omnibus Tests of Model Coefficients

                 Chi-square   df   Sig.
Step 1   Step    8.498        3    .037
         Block   8.498        3    .037
         Model   8.498        3    .037









• The Chi-Square statistic and associated p-value (Sig.) test whether the model coefficients as a group equal zero
• Larger Chi-squares and smaller p-values indicate greater confidence in rejecting the null hypothesis that the coefficients as a group equal zero
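The reported p-value can be recovered from the chi-square statistic and its degrees of freedom. The sketch below uses a closed-form expression for the upper tail of a chi-square distribution that holds only for 3 degrees of freedom (the case in the omnibus table); `chi2_sf_df3` is an assumed helper name, and a statistics library would normally be used for general df.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom.

    Closed form valid only for df = 3.
    """
    z = x / 2.0
    return (math.erfc(math.sqrt(z))
            + 2.0 / math.sqrt(math.pi) * math.sqrt(z) * math.exp(-z))

# Model chi-square from the omnibus test, df = 3.
p_value = chi2_sf_df3(8.498)
```

The result is roughly 0.037, matching the Sig. column.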

Percent Correct Predictions

• The "Percent Correct Predictions" statistic assumes that if the estimated p is greater than or equal to .5 then the event is expected to occur, and otherwise it is not.
• By converting the estimated probabilities to 0s and 1s in this way and comparing them to the actual 0s and 1s, the % correct Yes, % correct No, and overall % correct scores are calculated.
• Note: the % correctly predicted within each subgroup is also important, especially if most of the data are 0s or most are 1s

Percent Correct Results

Classification Table (a)

                          Predicted loaned
Observed                  .00    1.00    Percentage Correct
Step 1   loaned   .00     6      11      35.3
                  1.00    2      32      94.1
         Overall Percentage              74.5

a. The cut value is .500

• 35% of loan rejected cases (0) were correctly predicted
• 94% of loan accepted cases (1) were correctly predicted
• 75% of all cases (0,1) were correctly predicted



Note: The model is much better at predicting loan acceptance than loan rejection; this may serve as a basis for thinking about additional variables to improve the model.
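The percentages in the classification table follow from its four counts. The sketch below recomputes them and adds, as an assumed point of comparison that is not part of the SPSS output, the share that would be correct if every case were simply predicted as the more common outcome.

```python
# Counts copied from the classification table (cut value 0.5).
# Keys are (observed, predicted).
counts = {(0, 0): 6, (0, 1): 11, (1, 0): 2, (1, 1): 32}

def percent_correct(counts):
    """Return % correct for observed 0s, observed 1s, and overall."""
    n0 = counts[(0, 0)] + counts[(0, 1)]
    n1 = counts[(1, 0)] + counts[(1, 1)]
    correct_no = 100.0 * counts[(0, 0)] / n0
    correct_yes = 100.0 * counts[(1, 1)] / n1
    overall = 100.0 * (counts[(0, 0)] + counts[(1, 1)]) / (n0 + n1)
    return correct_no, correct_yes, overall

correct_no, correct_yes, overall = percent_correct(counts)

# Benchmark: always predict the more common observed outcome.
n0 = counts[(0, 0)] + counts[(0, 1)]
n1 = counts[(1, 0)] + counts[(1, 1)]
baseline = 100.0 * max(n0, n1) / (n0 + n1)
```

Since two thirds of the loans were approved, predicting "approved" for everyone would already be right about 66.7% of the time, which puts the 74.5% overall figure in context.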

R2 Problems









[Figure: observations at Y = 1 and Y = 0 with fitted regression line]

Notice that whether using LPM or logit, the predicted values on the regression lines are not near the actual observations (which are all either 0 or 1). This makes the typical R-square statistic of no value in assessing how well the model “fits” the data.

Pseudo-R2 Values

Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      56.427 (a)          .153                   .213

a. Estimation terminated at iteration number 4 because parameter estimates changed by less than .001.









• There are pseudo-R2 statistics that adjust for the (0,1) nature of the actual data: two are listed above
• Their computation is somewhat complicated, but they yield measures that vary between 0 and (somewhat close to) 1, much like the R2 in an LP model.
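For readers who want to see the computation, both pseudo-R2 values can be recovered from numbers already shown: the -2 log likelihood in the Model Summary, the model chi-square from the omnibus test, and the sample size of 51. The sketch uses the standard Cox & Snell and Nagelkerke definitions; the variable names are assumptions for illustration.

```python
import math

n = 51                   # number of cases
deviance_model = 56.427  # -2 log likelihood from the Model Summary table
chi_square = 8.498       # model chi-square from the omnibus test

# -2 log likelihood of the intercept-only model.
deviance_null = deviance_model + chi_square

# Cox & Snell: 1 - (L_null / L_model)^(2/n), written in terms of the chi-square.
cox_snell = 1.0 - math.exp(-chi_square / n)

# Nagelkerke rescales Cox & Snell by its maximum attainable value.
max_cox_snell = 1.0 - math.exp(-deviance_null / n)
nagelkerke = cox_snell / max_cox_snell
```

Cox & Snell cannot reach 1 for binary data (its maximum here is about 0.72), which is why the Nagelkerke version divides by that maximum.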

Appendix: Calculating Effect of X-variable on Probability of Y

• Effect on probability of Y from a 1 unit change in X = (β)*(Probability)*(1-Probability)
    ◦ Probability changes as the value of X changes
• To calculate (1-P) for given X values:
    ◦ (1-P) = 1/(1 + exp[α + β1*X1 + β2*X2 …])
• With multiple X-variables it is common to focus on one at a time and use average values for all but one
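A worked sketch of this procedure with the loan-example logit coefficients follows. It uses the standard logistic form, in which 1 - P = 1/(1 + exp[α + β1*X1 + β2*X2 …]). The values plugged in for the applicant (NITA of 5, TDTA of 8, officer B) are hypothetical stand-ins, since the sample averages are not reported here.

```python
import math

# B coefficients from the SPSS logistic regression output.
B = {"const": 2.968, "nita": 0.108, "tdta": -0.325, "officer": -1.455}

def logit_probability(nita, tdta, officer):
    """P(loan approved) = 1 / (1 + exp(-(alpha + b1*X1 + b2*X2 + ...)))."""
    index = (B["const"] + B["nita"] * nita
             + B["tdta"] * tdta + B["officer"] * officer)
    return 1.0 / (1.0 + math.exp(-index))

def effect_of_nita(nita, tdta, officer):
    """Approximate change in P for a one-unit change in NITA: b * P * (1 - P)."""
    p = logit_probability(nita, tdta, officer)
    return B["nita"] * p * (1.0 - p)

# Hypothetical values for the X-variables (not the sample averages).
p = logit_probability(nita=5.0, tdta=8.0, officer=0)
one_minus_p = 1.0 / (1.0 + math.exp(2.968 + 0.108 * 5.0 - 0.325 * 8.0))
effect = effect_of_nita(nita=5.0, tdta=8.0, officer=0)
```

Holding TDTA and the officer fixed while varying NITA traces out how the effect of NITA changes along the S-shaped curve.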

