Shared by: tyndale
Posted: 2/16/2010
Language: English
Pages: 7


Civ E 640: URBAN TRANSPORT PLANNING
Calibrating Binary Logit Mode-Split Models Using
Logistic Regression in SPSS
Get Started With SPSS
What is SPSS?
SPSS is a comprehensive and flexible statistical analysis and data management system. SPSS can
take data from almost any type of file and use them to generate tabulated reports, charts, plots of
distributions and trends, and descriptive statistics, as well as to conduct complex statistical analyses.
SPSS is available on Polaris.
Start SPSS
Log on to Polaris.
Start => Program => Scientific => SPSS 8.0 for Windows
[Figure: SPSS Main Window]
Quickest Way to Get Familiar with SPSS
- Start SPSS
- From the menus, choose Help => Tutorial
Logistic Regression (From SPSS User's Guide)
Logistic regression is useful for situations in which you want to be able to predict the presence or absence
of a characteristic or outcome based on values of a set of predictor variables. It is similar to a linear
regression model but is suited to models where the dependent variable is dichotomous. Logistic regression
coefficients can be used to estimate odds ratios for each of the independent variables in the model. Logistic
regression is applicable to a broader range of research situations than discriminant analysis.
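As a small illustration of the odds-ratio statement (a sketch, not SPSS's code): in a binary logistic model the predicted probability is p = 1 / (1 + e^-(b0 + b1·x)), so the odds p / (1 - p) equal e^(b0 + b1·x), and raising x by one unit multiplies the odds by e^b1. The coefficients b0 and b1 below are hypothetical values chosen only for the demonstration.

```python
import math

def logistic(z: float) -> float:
    """Logistic function: turns the linear predictor z into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p: float) -> float:
    """Odds p / (1 - p) of the outcome."""
    return p / (1.0 - p)

# Hypothetical coefficients, not taken from any SPSS run.
b0, b1 = -1.5, 0.4

x = 2.0
p_x = logistic(b0 + b1 * x)                  # predicted probability at x
p_x_plus_1 = logistic(b0 + b1 * (x + 1.0))   # predicted probability at x + 1

# The odds ratio for a one-unit increase in x equals exp(b1),
# the quantity SPSS prints in the Exp(B) column.
odds_ratio = odds(p_x_plus_1) / odds(p_x)
print(odds_ratio, math.exp(b1))
```

The ratio does not depend on the starting value of x, which is why a single Exp(B) per variable is meaningful.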
To Obtain a Logistic Regression Analysis
From the menus, choose:
Statistics => Regression => Logistic...
Select one dichotomous dependent variable. This variable may be numeric or short string. Select one or
more covariates (independent variables).
Click OK. SPSS will automatically perform the logistic regression and report related statistics:
Statistics. For each analysis: total cases, selected cases, valid cases. For each step: variable(s) entered or
removed, iteration history, -2 log-likelihood, goodness of fit, ... For each variable in the equation:
coefficient (B), standard error of B, Wald statistic, R, estimated odds ratio (exp(B)), confidence interval
for exp(B). …
[Figure: SPSS Output Window]
Logistic Regression Data Considerations
Data. The dependent variable should be dichotomous. Independent variables can be interval level or
categorical; if categorical, they should be dummy or indicator coded (there is an option in the procedure to
recode categorical variables automatically).
Assumptions. Logistic regression does not rely on distributional assumptions in the same sense that
discriminant analysis does. However, your solution may be more stable if your predictors have a
multivariate normal distribution. Additionally, as with other forms of regression, multicollinearity among
the predictors can lead to biased estimates and inflated standard errors. The procedure is most effective
when group membership is a truly categorical variable; if group membership is based on values of a
continuous variable (for example, "high IQ" versus "low IQ"), you should consider using linear regression
to take advantage of the richer information offered by the continuous variable itself.
Related procedures. Use the Scatterplot procedure to screen your data for multicollinearity. If assumptions
of multivariate normality and equal variance-covariance matrices are met, you may be able to get a quicker
solution using the Discriminant Analysis procedure. If all of your predictor variables are categorical, you
can also use the Loglinear procedure. If your dependent variable is continuous, use the Linear Regression
procedure.
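The Data paragraph above asks for categorical predictors to be dummy (indicator) coded. SPSS can do this recoding itself; the Python sketch below only shows what it produces: a variable with k categories becomes k - 1 columns of 0/1 values, and the omitted reference category is the case where every indicator is 0. The function name `dummy_code` and the trip-purpose data are hypothetical.

```python
from typing import Dict, List, Sequence

def dummy_code(values: Sequence[str], reference: str) -> Dict[str, List[int]]:
    """Indicator-code a categorical variable (hypothetical helper).

    Returns one 0/1 column per category except the reference category,
    which is represented by all indicators being 0.
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference category {reference!r} not found")
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference
    }

# Hypothetical categorical predictor, not from the handout.
purpose = ["work", "shop", "school", "work", "shop"]
columns = dummy_code(purpose, reference="work")
for name, col in columns.items():
    print(name, col)
```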
Logistic Regression Options
You can specify options for your logistic regression analysis:
Statistics and Plots. Allows you to request statistics and plots. Available options are Classification plots,
Hosmer-Lemeshow goodness-of-fit, Casewise listing of residuals, Correlation of estimates, Iteration
history, and CI for exp(B). Select one of the alternatives in the Display group to display statistics and plots
either At each step or, only for the final model, At last step.
Probability for Stepwise. Allows you to control the criteria by which variables are entered into and
removed from the equation. You can specify criteria for entry or removal of variables.
Classification cutoff. Allows you to determine the cut point for classifying cases. Cases with predicted
values that exceed the classification cutoff are classified as positive, while those with predicted values
smaller than the cutoff are classified as negative. To change the default, enter a value between 0.01 and
0.99.
Maximum Iterations. Allows you to change the maximum number of times that the model iterates before
terminating.
Include constant in model. Allows you to indicate whether the model should include a constant term. If
disabled, the constant term will equal 0.
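The classification cutoff rule can be sketched in a few lines of Python. This is an illustration under assumptions: the observed outcomes and predicted probabilities are made up, and a probability exactly equal to the cutoff is treated as negative, since the option text only covers values that exceed or fall below it. Lowering the cutoff moves cases from predicted 0 to predicted 1 and so changes the percent correct in each row.

```python
from typing import List, Sequence

def classify(probs: Sequence[float], cutoff: float = 0.5) -> List[int]:
    """Predict 1 (positive) when the probability exceeds the cutoff, otherwise 0.

    A probability exactly equal to the cutoff is treated as negative (an assumption).
    """
    if not 0.01 <= cutoff <= 0.99:
        raise ValueError("cutoff must be between 0.01 and 0.99")
    return [1 if p > cutoff else 0 for p in probs]

def classification_table(observed: Sequence[int], predicted: Sequence[int]) -> List[List[int]]:
    """2x2 counts with rows = observed (0, 1) and columns = predicted (0, 1)."""
    table = [[0, 0], [0, 0]]
    for o, p in zip(observed, predicted):
        table[o][p] += 1
    return table

# Hypothetical observed outcomes and predicted probabilities.
observed = [0, 0, 1, 1, 1, 0]
probs = [0.20, 0.65, 0.70, 0.45, 0.90, 0.30]

for cutoff in (0.5, 0.4):
    table = classification_table(observed, classify(probs, cutoff))
    percent_correct = 100 * (table[0][0] + table[1][1]) / len(observed)
    print(f"cutoff {cutoff}: {table}, {percent_correct:.2f}% correct")
```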
Logistic Regression Output Report
Total number of cases: 270 (Unweighted)
Number of selected cases: 270
Number of unselected cases: 0
Number of selected cases: 270
Number rejected because of missing data: 0
Number of cases included in the analysis: 270
Dependent Variable Encoding:
Original     Internal
Value        Value
  1.00         0
  2.00         1
Dependent Variable.. CHOICE
Beginning Block Number 0. Initial Log Likelihood Function
-2 Log Likelihood 373.09859
* Constant is included in the model.
Beginning Block Number 1. Method: Enter
Variable(s) Entered on Step Number
1.. ACOST
ATIME
AXTIME
INCOME
TCOST
TTIME
TXTIME
Estimation terminated at iteration number 4 because
Log Likelihood decreased by less than .01 percent.
-2 Log Likelihood 290.390
Goodness of Fit 251.560
Cox & Snell - R^2 .264
Nagelkerke - R^2 .352
                 Chi-Square    df   Significance
Model                82.709     7          .0000
Block                82.709     7          .0000
Step                 82.709     7          .0000
Classification Table for CHOICE
The Cut Value is .50
                     Predicted
                   1.00    2.00    Percent Correct
                     1   |   2
Observed         +-------+-------+
   1.00     1    |  111  |   33  |    77.08%
                 +-------+-------+
   2.00     2    |   48  |   78  |    61.90%
                 +-------+-------+
                               Overall   70.00%
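Most summary figures in this part of the report can be reproduced from the counts in the classification table and the two -2 log-likelihood values. The Python sketch below is a consistency check, not SPSS's own code: the constant-only model predicts the sample share for every case, the model chi-square is the drop in -2 log-likelihood, the Cox & Snell and Nagelkerke R² use the standard formulas given in the comments, and each Percent Correct entry is a diagonal count divided by its row total. The results agree with the report up to rounding.

```python
import math

def null_minus2ll(n1: int, n0: int) -> float:
    """-2 log-likelihood of the constant-only model (Block 0).

    With only a constant, every case gets p = n1 / (n1 + n0), the sample share.
    """
    p = n1 / (n1 + n0)
    return -2.0 * (n1 * math.log(p) + n0 * math.log(1.0 - p))

def fit_summary(m2ll_null: float, m2ll_model: float, n: int) -> dict:
    """Model chi-square, Cox & Snell R^2 and Nagelkerke R^2 from the two -2LL values."""
    chi_square = m2ll_null - m2ll_model              # likelihood-ratio chi-square
    cox_snell = 1.0 - math.exp(-chi_square / n)      # 1 - (L0 / L1)^(2/n)
    max_cox_snell = 1.0 - math.exp(-m2ll_null / n)   # largest attainable Cox & Snell value
    return {
        "chi_square": chi_square,
        "cox_snell": cox_snell,
        "nagelkerke": cox_snell / max_cox_snell,     # rescaled so the maximum is 1
    }

# Observed-by-predicted counts from the classification table (cut value .50).
# Rows: observed CHOICE 1.00, 2.00; columns: predicted 1.00, 2.00.
table = [[111, 33],
         [48, 78]]
n = sum(map(sum, table))      # 270 cases
n_choice2 = sum(table[1])     # CHOICE = 2.00 (internal value 1)
n_choice1 = sum(table[0])     # CHOICE = 1.00 (internal value 0)

m2ll_null = null_minus2ll(n_choice2, n_choice1)
summary = fit_summary(m2ll_null, 290.390, n)   # 290.390 = final -2LL in the report
row_pct = [100 * table[i][i] / sum(table[i]) for i in range(2)]
overall_pct = 100 * (table[0][0] + table[1][1]) / n

print(f"initial -2LL = {m2ll_null:.4f}")
print({k: round(v, 3) for k, v in summary.items()})
print([round(p, 2) for p in row_pct], round(overall_pct, 2))
```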
---------------------- Variables in the Equation -----------------------
Variable        B     S.E.     Wald   df     Sig        R   Exp(B)
ACOST       .0013    .0009   2.0163    1   .1556    .0066   1.0013
ATIME       .0334    .0277   1.4511    1   .2284    .0000   1.0339
AXTIME      .1760    .1463   1.4471    1   .2290    .0000   1.1925
INCOME     -.0120    .0026  20.8520    1   .0000   -.2248    .9880
TCOST      -.0139    .0106   1.7129    1   .1906    .0000    .9862
TTIME      -.0155    .0189    .6727    1   .4121    .0000    .9846
TXTIME     -.0183    .0502    .1327    1   .7157    .0000    .9819
Constant   2.7105    .9380   8.3494    1   .0039
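The Wald, Sig and Exp(B) columns are simple functions of B and S.E. for a single coefficient: Wald = (B / S.E.)², Sig is the upper-tail probability of a chi-square with 1 df, and Exp(B) = e^B. The Python sketch below recomputes them from the rounded values printed above, so it matches only approximately (ACOST, for example, gives a Wald value of about 2.09 rather than 2.0163, because SPSS works with unrounded estimates). The R column is not reproduced.

```python
import math

# (variable, B, S.E.) as printed in "Variables in the Equation".
rows = [
    ("ACOST", 0.0013, 0.0009),
    ("ATIME", 0.0334, 0.0277),
    ("AXTIME", 0.1760, 0.1463),
    ("INCOME", -0.0120, 0.0026),
    ("TCOST", -0.0139, 0.0106),
    ("TTIME", -0.0155, 0.0189),
    ("TXTIME", -0.0183, 0.0502),
    ("Constant", 2.7105, 0.9380),
]

def wald_stats(b: float, se: float):
    """Return (Wald, Sig, Exp(B)) for one coefficient."""
    wald = (b / se) ** 2                      # Wald chi-square with 1 df
    sig = math.erfc(math.sqrt(wald / 2.0))    # P(chi-square with 1 df > wald)
    return wald, sig, math.exp(b)

for name, b, se in rows:
    wald, sig, exp_b = wald_stats(b, se)
    # SPSS leaves Exp(B) blank for the constant, so it is skipped here as well.
    exp_text = "" if name == "Constant" else f"{exp_b:.4f}"
    print(f"{name:<9}{wald:9.4f}{sig:8.4f}  {exp_text}")
```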
Calibrating Binary Logit Mode-Split Models: Procedure
Example: Mode choice survey data
Person   Chosen Mode*   Tauto   Ttransit
1        1              50      30
2        1              10      20
3        0              30      40
* 1 = auto; 0 = transit
Step 1: Identify Choice Set
A = {auto, transit} = {1, 0}
Step 2: Assume Utility Function for Each Mode
V1 = a1 Tauto + a2
V2 = a1 Ttransit + a3
(a1 is assumed to be a generic parameter, i.e., the same for both modes)
Step 3: Formulate the Logit Model
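$$P(1) = \frac{e^{V_1}}{e^{V_1} + e^{V_2}} = \frac{1}{1 + e^{V_2 - V_1}} = \frac{1}{1 + e^{a_1 (T_{transit} - T_{auto}) + a_3 - a_2}} = \frac{1}{1 + e^{\beta_1 \Delta T + \beta_0}}$$
where: β1 = a1
β0 = a3 - a2
ΔT = Ttransit - Tauto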
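A quick numerical check of the Step 3 algebra, assuming made-up values for a1, a2 and a3: the direct logit formula and the reduced form in ΔT give the same P(1) for each surveyed person, and adding the same constant to a2 and a3 leaves P(1) unchanged, since only a3 - a2 enters the model.

```python
import math

def p_auto_direct(t_auto, t_transit, a1, a2, a3):
    """P(1) = e^V1 / (e^V1 + e^V2) with the Step 2 utilities."""
    v1 = a1 * t_auto + a2       # utility of auto
    v2 = a1 * t_transit + a3    # utility of transit
    return math.exp(v1) / (math.exp(v1) + math.exp(v2))

def p_auto_reduced(t_auto, t_transit, a1, a2, a3):
    """P(1) = 1 / (1 + e^(beta1 * dT + beta0)) with beta1 = a1, beta0 = a3 - a2."""
    beta1, beta0 = a1, a3 - a2
    d_t = t_transit - t_auto
    return 1.0 / (1.0 + math.exp(beta1 * d_t + beta0))

# Hypothetical parameter values, chosen only to exercise the algebra.
a1, a2, a3 = -0.05, 0.3, -0.2

# (Tauto, Ttransit) of the three surveyed persons in the example table.
people = [(50, 30), (10, 20), (30, 40)]
for t_auto, t_transit in people:
    print(round(p_auto_direct(t_auto, t_transit, a1, a2, a3), 6),
          round(p_auto_reduced(t_auto, t_transit, a1, a2, a3), 6))

# Shifting both constants by the same amount does not change P(1): only a3 - a2 matters.
shifted = p_auto_direct(50, 30, a1, a2 + 1.0, a3 + 1.0)
```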
Step 4: Use SPSS to Estimate Parameters (β0, β1)
4.1 Prepare the data for logistic regression under the assumed model structure (in Excel)
Person   Chosen Mode*   Ttransit - Tauto
1        1              -20
2        1               10
3        0               10
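In Excel this column is a simple formula copied down; the same preparation step is sketched in Python below. It only reshapes the example survey table into one row per person, with the chosen mode (1 = auto, 0 = transit) as the dependent variable and Ttransit - Tauto as the single covariate. Writing the rows out as CSV text is an assumed interchange format (the handout itself imports from an Excel file); the column names Mode and Tdeta are the ones used in step 4.3.

```python
import csv
import io

# Survey records from the example: (person, chosen mode, Tauto, Ttransit).
survey = [
    (1, 1, 50, 30),
    (2, 1, 10, 20),
    (3, 0, 30, 40),
]

# One row per person: dependent variable and the covariate Ttransit - Tauto.
rows = [(person, mode, t_transit - t_auto) for person, mode, t_auto, t_transit in survey]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Person", "Mode", "Tdeta"])   # variable names as used in step 4.3
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```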
4.2 Import data to SPSS
- Import data from Excel file
- Name variables (maximum 8 characters)
4.3 Perform logistic regression
- From the menus, choose "Statistics => Regression => Logistic…"
- Set "Dependent" variable = "Mode"
- Set "Covariates" = "Tdeta" (= Ttransit-Tauto)
- Click "OK"
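The handout does not describe how SPSS computes the estimates. A common method for logistic regression is Newton-Raphson iteration on the log-likelihood, sketched below in Python for the one-covariate model logit P(Mode = 1) = B0 + B1·Tdeta; treat it as an illustration of the technique, not as SPSS's implementation. The stopping rule imitates the "Log Likelihood decreased by less than .01 percent" message in the output report. The data are hypothetical: the three-person example cannot be fitted, because auto is the only choice at Tdeta = -20 while both modes occur at Tdeta = 10, so the likelihood keeps improving as B1 decreases without bound and no finite estimate exists.

```python
import math
from typing import Sequence, Tuple

def log_likelihood(b0: float, b1: float, x: Sequence[float], y: Sequence[int]) -> float:
    """Binary logit log-likelihood: sum of y*z - ln(1 + e^z) with z = b0 + b1*x."""
    return sum(yi * (b0 + b1 * xi) - math.log1p(math.exp(b0 + b1 * xi))
               for xi, yi in zip(x, y))

def fit_binary_logit(x: Sequence[float], y: Sequence[int],
                     max_iter: int = 25, rel_tol: float = 1e-4) -> Tuple[float, float, int]:
    """Newton-Raphson estimates (B0, B1) for logit P(y = 1) = B0 + B1 * x.

    Stops when the log-likelihood changes by less than rel_tol in relative terms
    (1e-4 corresponds to 0.01 percent).
    """
    b0 = b1 = 0.0
    ll_old = log_likelihood(b0, b1, x, y)
    for iteration in range(1, max_iter + 1):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Score vector X'(y - p) ...
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # ... and information matrix X'WX for the two parameters.
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X'WX) * delta = X'(y - p).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
        ll_new = log_likelihood(b0, b1, x, y)
        if abs(ll_new - ll_old) < rel_tol * abs(ll_old):
            return b0, b1, iteration
        ll_old = ll_new
    return b0, b1, max_iter

# Hypothetical data: Tdeta = Ttransit - Tauto and chosen mode (1 = auto, 0 = transit).
tdeta = [-20, -15, -10, -5, 0, 5, 10, 15, 20, 25]
mode = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1, iterations = fit_binary_logit(tdeta, mode)
print(f"B0 = {b0:.4f}, B1 = {b1:.4f}, iterations = {iterations}")
```

The returned B0 and B1 play the same role as the Constant and Tdeta rows SPSS prints for this model.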
4.4 Output
- l = -2 log likelihood
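- Coefficients B for the constant and for Tdeta. SPSS fits ln[P(1) / (1 - P(1))] = B0 + B1·ΔT, so with the sign convention of Step 3, β0 = -B0 and β1 = -B1
- t statistics (t = B / S.E.; SPSS prints the Wald statistic, which for a single coefficient equals t²)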
Step 5: Examine the Quality of the Model
- …
Step 6: Recover the Utility Functions
V1 = a1 Tauto + a2
V2 = a1 Ttransit + a3
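Recovering the utilities needs two facts that the steps leave implicit. First, SPSS reports B for logit P(Mode = 1) = B0 + B1·Tdeta, whereas Step 3 writes P(1) = 1 / (1 + e^(β1ΔT + β0)), so β0 = -B0 and β1 = -B1. Second, only the difference a3 - a2 is estimated, so one constant has to be fixed; the Python sketch assumes a2 = 0 (auto as the reference mode), a normalization choice the handout does not specify. The helper name and coefficient values are hypothetical.

```python
import math

def recover_generic_utilities(b0: float, b1: float, a2: float = 0.0):
    """Map SPSS estimates (Constant = b0, Tdeta = b1) to (a1, a2, a3) of Step 6.

    beta1 = -b1 and beta0 = -b0 (sign convention of Step 3);
    a1 = beta1 and a3 = beta0 + a2, with a2 fixed by assumption (default 0).
    """
    beta0, beta1 = -b0, -b1
    return beta1, a2, beta0 + a2

# Hypothetical SPSS output for the generic model.
b0_spss, b1_spss = 0.40, 0.08
a1, a2, a3 = recover_generic_utilities(b0_spss, b1_spss)

# Check: the recovered utilities reproduce the fitted SPSS probability.
t_auto, t_transit = 30.0, 40.0
v1 = a1 * t_auto + a2
v2 = a1 * t_transit + a3
p_from_utilities = math.exp(v1) / (math.exp(v1) + math.exp(v2))
p_from_spss = 1.0 / (1.0 + math.exp(-(b0_spss + b1_spss * (t_transit - t_auto))))
print(a1, a2, a3, round(p_from_utilities, 6), round(p_from_spss, 6))
```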
Step 7: Go back to Step 2 if necessary
e.g.:
V1 = a1 Tauto + a2
V2 = a3 Ttransit + a4
(a1 and a3 are assumed to be mode-specific parameters)
….
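For the mode-specific alternative, V2 - V1 = a3·Ttransit - a1·Tauto + (a4 - a2), so Tauto and Ttransit must enter SPSS as two separate covariates instead of their difference. The Python sketch below shows the data layout and one way to map the SPSS coefficients back, again assuming SPSS models logit P(Mode = 1) and fixing a2 = 0 by assumption; the helper name and coefficient values are hypothetical.

```python
import math

# Survey records from the example: (person, chosen mode, Tauto, Ttransit).
survey = [(1, 1, 50, 30), (2, 1, 10, 20), (3, 0, 30, 40)]

# Mode is the dependent variable; Tauto and Ttransit are two separate covariates.
rows = [(mode, t_auto, t_transit) for _, mode, t_auto, t_transit in survey]

def recover_mode_specific(b0: float, b_auto: float, b_transit: float, a2: float = 0.0):
    """Map SPSS estimates for logit P(Mode = 1) = b0 + b_auto*Tauto + b_transit*Ttransit
    to (a1, a2, a3, a4) in V1 = a1*Tauto + a2, V2 = a3*Ttransit + a4.

    Since logit P(auto) = V1 - V2 = a1*Tauto - a3*Ttransit + (a2 - a4):
    a1 = b_auto, a3 = -b_transit, a4 = a2 - b0 (a2 fixed by assumption).
    """
    return b_auto, a2, -b_transit, a2 - b0

# Hypothetical SPSS estimates.
b0, b_auto, b_transit = 0.5, -0.06, 0.04
a1, a2, a3, a4 = recover_mode_specific(b0, b_auto, b_transit)

# Consistency check on person 1.
_, t_auto, t_transit = rows[0]
v1, v2 = a1 * t_auto + a2, a3 * t_transit + a4
p_utilities = math.exp(v1) / (math.exp(v1) + math.exp(v2))
p_spss = 1.0 / (1.0 + math.exp(-(b0 + b_auto * t_auto + b_transit * t_transit)))
print(rows, (a1, a2, a3, a4), round(p_utilities, 6), round(p_spss, 6))
```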
Step 8: Select the Best Model
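The handout does not state a selection criterion. One standard option when one specification is nested in another, as the generic model of Step 2 is nested in the mode-specific model of Step 7 (it imposes a1 = a3), is a likelihood-ratio test on the two -2 log-likelihood values SPSS prints. The Python sketch below assumes exactly one restricted parameter, so the statistic is referred to a chi-square distribution with 1 df; the -2LL inputs are hypothetical.

```python
import math

def lr_test_1df(m2ll_restricted: float, m2ll_full: float):
    """Likelihood-ratio test for nested models differing by one parameter.

    LR = (-2LL of restricted model) - (-2LL of full model), chi-square with 1 df;
    for 1 df the upper-tail probability is erfc(sqrt(LR / 2)).
    """
    lr = m2ll_restricted - m2ll_full
    return lr, math.erfc(math.sqrt(max(lr, 0.0) / 2.0))

# Hypothetical -2 log-likelihoods (not from the handout):
# generic time coefficient (Step 2) vs. mode-specific coefficients (Step 7).
lr, p_value = lr_test_1df(m2ll_restricted=310.4, m2ll_full=306.1)
print(f"LR = {lr:.2f}, p = {p_value:.4f}")
```

A small p-value favors keeping the extra mode-specific parameter; otherwise the simpler generic model is usually preferred.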