# Logistic regression - DOC

Posted: 2/16/2010

```
Civ E 640: URBAN TRANSPORT PLANNING

Calibrating Binary Logit Mode-Split Models Using
Logistic Regression in SPSS

Get Started With SPSS
What is SPSS?
SPSS is a comprehensive and flexible statistical analysis and data management system. SPSS can
take data from almost any type of file and use them to generate tabulated reports, charts, and plots
of distributions and trends, descriptive statistics, and conduct complex statistical analyses.

SPSS is available on Polaris

Start SPSS
-   Log in to Polaris
-   Start => Program => Scientific => SPSS 8.0 for Windows

SPSS Main Window
Quickest Way to Get Familiar with SPSS
-   Start SPSS
-   From the menus choose Help => Tutorial

Logistic Regression (From SPSS User's Guide)
Logistic regression is useful for situations in which you want to be able to predict the presence or absence
of a characteristic or outcome based on values of a set of predictor variables. It is similar to a linear
regression model but is suited to models where the dependent variable is dichotomous. Logistic regression
coefficients can be used to estimate odds ratios for each of the independent variables in the model. Logistic
regression is applicable to a broader range of research situations than discriminant analysis.

To Obtain a Logistic Regression Analysis
From the menus choose:

Statistics
Regression
Logistic...

Select one dichotomous dependent variable. This variable may be numeric or short string. Select one or
more covariates - independent variables.

Click OK. SPSS then performs the logistic regression and reports the related statistics:

Statistics. For each analysis: total cases, selected cases, valid cases. For each step: variable(s) entered or
removed, iteration history, -2 log-likelihood, goodness of fit, ... For each variable in the equation:
coefficient (B), standard error of B, Wald statistic, R, estimated odds ratio (exp(B)), confidence interval
for exp(B). …
SPSS Output Window

Logistic Regression Data Considerations
Data. The dependent variable should be dichotomous. Independent variables can be interval level or
categorical; if categorical, they should be dummy or indicator coded (there is an option in the procedure to
recode categorical variables automatically).
Assumptions. Logistic regression does not rely on distributional assumptions in the same sense that
discriminant analysis does. However, your solution may be more stable if your predictors have a
multivariate normal distribution. Additionally, as with other forms of regression, multicollinearity among
the predictors can lead to biased estimates and inflated standard errors. The procedure is most effective
when group membership is a truly categorical variable; if group membership is based on values of a
continuous variable (for example, "high IQ" versus "low IQ"), you should consider using linear regression
to take advantage of the richer information offered by the continuous variable itself.

Related procedures. Use the Scatterplot procedure to screen your data for multicollinearity. If assumptions
of multivariate normality and equal variance-covariance matrices are met, you may be able to get a quicker
solution using the Discriminant Analysis procedure. If all of your predictor variables are categorical, you
can also use the Loglinear procedure. If your dependent variable is continuous, use the Linear Regression
procedure.

Logistic Regression Options

You can specify options for your logistic regression analysis:
Statistics and Plots. Allows you to request statistics and plots. Available options are Classification plots,
Hosmer-Lemeshow goodness-of-fit, Casewise listing of residuals, Correlation of estimates, Iteration
history, and CI for exp(B). Select one of the alternatives in the Display group to display statistics and plots
either At each step or, only for the final model, At last step.
Probability for Stepwise. Allows you to control the criteria by which variables are entered into and
removed from the equation. You can specify criteria for entry or removal of variables.
Classification cutoff. Allows you to determine the cut point for classifying cases. Cases with predicted
values that exceed the classification cutoff are classified as positive, while those with predicted values
smaller than the cutoff are classified as negative. To change the default, enter a value between 0.01 and
0.99.
Maximum Iterations. Allows you to change the maximum number of times that the model iterates before
terminating.
Include constant in model. Allows you to indicate whether the model should include a constant term. If
disabled, the constant term will equal 0.

Logistic Regression Output Report

Total number of cases:      270 (Unweighted)
Number of selected cases:   270
Number of unselected cases: 0

Number of selected cases:                 270
Number rejected because of missing data: 0
Number of cases included in the analysis: 270

Dependent Variable Encoding:

Original          Internal
Value             Value
1.00          0
2.00          1

Dependent Variable..        CHOICE

Beginning Block Number       0.   Initial Log Likelihood Function

-2 Log Likelihood       373.09859

* Constant is included in the model.

Beginning Block Number       1.   Method: Enter

Variable(s) Entered on Step Number
1..       ACOST
          ATIME
          AXTIME
          INCOME
          TCOST
          TTIME
          TXTIME

Estimation terminated at iteration number 4 because
Log Likelihood decreased by less than .01 percent.

-2 Log Likelihood         290.390
Goodness of Fit           251.560
Cox & Snell - R^2            .264
Nagelkerke - R^2             .352

                       Chi-Square      df   Significance

Model                      82.709       7        .0000
Block                      82.709       7        .0000
Step                       82.709       7        .0000

Classification Table for CHOICE
The Cut Value is .50
                         Predicted
                      1.00    2.00     Percent Correct
                        1   I   2
Observed            +-------+-------+
   1.00     1       I  111  I   33  I   77.08%
                    +-------+-------+
   2.00     2       I   48  I   78  I   61.90%
                    +-------+-------+
                            Overall     70.00%


---------------------- Variables in the Equation -----------------------

Variable            B       S.E.       Wald    df         Sig       R    Exp(B)

ACOST         .0013        .0009      2.0163    1    .1556       .0066   1.0013
ATIME         .0334        .0277      1.4511    1    .2284       .0000   1.0339
AXTIME        .1760        .1463      1.4471    1    .2290       .0000   1.1925
INCOME       -.0120        .0026     20.8520    1    .0000      -.2248    .9880
TCOST        -.0139        .0106      1.7129    1    .1906       .0000    .9862
TTIME        -.0155        .0189       .6727    1    .4121       .0000    .9846
TXTIME       -.0183        .0502       .1327    1    .7157       .0000    .9819
Constant     2.7105        .9380      8.3494    1    .0039

Calibrating Binary Logit Mode-Split Models: Procedure

Example: Mode choice survey data

Person                    Chosen Mode*                        T auto       T transit
1                             1                               50            30
2                             1                               10            20
3                             0                               30            40

* 1 - Auto; 0 - Transit

Step 1: Identify Choice Set

A = {auto, transit} = {1, 0}

Step 2: Assume Utility Function for Each Mode

V1 = a1 Tauto + a2
V2 = a1 Ttransit + a3

(a1 is assumed to be generic parameter)

Step 3: Formulate the Logit Model

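P(1) = \frac{e^{V_1}}{e^{V_1} + e^{V_2}}
     = \frac{1}{1 + e^{V_2 - V_1}}
     = \frac{1}{1 + e^{a_1 (T_{transit} - T_{auto}) + a_3 - a_2}}
     = \frac{1}{1 + e^{\beta_1 \Delta T + \beta_0}}

where: \beta_1 = a_1
       \beta_0 = a_3 - a_2
       \Delta T = T_{transit} - T_{auto}

Step 4: Use SPSS to Estimate Parameters (\beta_0, \beta_1)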

4.1 Prepare data for logistic regression for the assumed model structure
(in Excel)

Person               Chosen Mode*                T transit - T auto
1                        1                            -20
2                        1                             10
3                        0                             10

4.2 Import data to SPSS
- Import data from Excel file
- Name variables (maximum 8 characters)

4.3 Perform logistic regression
- From the menus choose "Statistics => Regression => Logistic..."
- Set "Dependent" variable = "Mode"
- Set "Covariates" = "Tdeta" (= Ttransit-Tauto)
- Click "OK"

4.4 Output
- l = -2 log likelihood
- Coefficients (\beta_0, \beta_1)
- t statistics

Step 5: Examine the Quality of the Model

-   …

Step 6: Recover the Utility Functions

V1 = a1 Tauto + a2
V2 = a1 Ttransit + a3

Step 7: Go back to Step 2 if necessary

e.g.:

V1 = a1 Tauto + a2
V2 = a3 Ttransit + a4

(a1 and a3 are assumed to be mode specific parameters)

….

Step 8: Select the Best Model

```
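
The data preparation in Step 4.1, the logit formula in Step 3, and the "-2 log likelihood" figure in the output report can be sketched in a few lines of Python. This is a minimal illustration, not part of the handout or of SPSS. The function names (`prepare`, `p_auto`, `neg2_log_likelihood`) and the parameter values passed to them are assumptions made up for the example, and no model is actually fitted here. The last part recomputes three figures from the sample output report using only the counts printed in its classification table.

```python
import math

# Survey rows from the handout's example: (person, chosen mode, T_auto, T_transit),
# with mode coded 1 = auto, 0 = transit.
survey = [(1, 1, 50, 30), (2, 1, 10, 20), (3, 0, 30, 40)]


def prepare(rows):
    """Step 4.1: reduce each row to (chosen mode, T_transit - T_auto)."""
    return [(mode, t_transit - t_auto) for _, mode, t_auto, t_transit in rows]


def p_auto(beta0, beta1, delta_t):
    """Step 3: P(1) = 1 / (1 + exp(beta1 * delta_t + beta0))."""
    return 1.0 / (1.0 + math.exp(beta1 * delta_t + beta0))


def neg2_log_likelihood(beta0, beta1, data):
    """-2 log likelihood of the observed choices under the binary logit model."""
    total = 0.0
    for mode, delta_t in data:
        p = p_auto(beta0, beta1, delta_t)
        total += math.log(p if mode == 1 else 1.0 - p)
    return -2.0 * total


data = prepare(survey)
print(data)  # [(1, -20), (1, 10), (0, 10)]

# Hypothetical parameter values, chosen only to exercise the functions.
# They are NOT estimates from the handout or from SPSS.
beta0, beta1 = 0.0, -0.05
for mode, delta_t in data:
    print(mode, delta_t, round(p_auto(beta0, beta1, delta_t), 3))
print(round(neg2_log_likelihood(beta0, beta1, data), 3))

# Cross-check of the sample output report, using the observed counts in the
# rows of its classification table (111 + 33 coded 1.00, 48 + 78 coded 2.00).
n1, n2 = 111 + 33, 48 + 78
n = n1 + n2
# Constant-only model: every case gets the sample share of its own category.
null_neg2ll = -2.0 * (n1 * math.log(n1 / n) + n2 * math.log(n2 / n))
# Model chi-square = drop in -2 log likelihood (290.390 is the final value).
model_chi_square = null_neg2ll - 290.390
overall_correct = 100.0 * (111 + 78) / n
print(round(null_neg2ll, 3), round(model_chi_square, 3), round(overall_correct, 2))
```

One point to check before recovering the utility functions in Step 6: SPSS is usually documented as modelling the probability of the category with internal value 1 as 1 / (1 + e^(-Z)), where Z is the constant plus the weighted covariates. If that holds for the version in use, the reported B values carry the opposite sign to \beta_0 and \beta_1 as defined in Step 3.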