Discrete time logistic regression
Applied Longitudinal Data Analysis
JD Singer and JB Willett
Discrete time logistic regression
• Time-to-event data
• Analyzed by logistic regression
Time
• Time is treated as intervals
• Outcome did or did not occur in the
interval
• Fits data where measurements are made
at intervals (e.g., yearly)
• Data can be grouped into intervals even if event times are measured exactly
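A minimal sketch of grouping exactly measured times into discrete intervals. It assumes times are measured from the defining origin and that periods are half-open intervals of equal width numbered from 1. The function name `to_period` and the 365-day width are illustrative choices, not part of the source.

```python
import math


def to_period(t, width=1.0):
    """Map an exact time t >= 0 (measured from the time origin) to a
    1-based discrete period: [0, width) -> 1, [width, 2*width) -> 2, ..."""
    if t < 0:
        raise ValueError("time must be measured from the origin (t >= 0)")
    return math.floor(t / width) + 1


# Example: exact event times in days, grouped into 365-day "years".
exact_days = [12, 364, 365, 800]
print([to_period(d, width=365) for d in exact_days])
```

Half-open intervals mean a time that falls exactly on a boundary belongs to the later period; any consistent convention works as long as it is applied to everyone.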
Beginning of time
• Intervals must start from a defining moment, for example:
  - Release from the hospital
  - Turning 40
• Choice of the beginning can substantially
affect the results
Person-period data set
ID PERIOD EVENT
20 1 0
20 2 0
20 3 1
----------------------
126 1 0
126 2 0
126 3 0
126 4 0
126 5 0
126 6 0
126 7 0
126 8 0
126 9 0
126 10 0
126 11 0
126 12 1
----------------------
129 1 0
129 2 0
129 3 0
Different numbers of intervals for participants are OK.
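The person-period layout can be produced from one record per person. This sketch assumes a hypothetical person-level layout (id, last observed period, event indicator) with periods numbered from `first_period`; `to_person_period` is an illustrative name.

```python
def to_person_period(people, first_period=1):
    """Expand person-level records (id, last_period, event) into person-period rows.

    last_period is the last period the person was observed; event is 1 if the
    event occurred in that period and 0 if the person was censored.
    """
    rows = []
    for pid, last_period, event in people:
        for period in range(first_period, last_period + 1):
            # EVENT is 1 only in the period where the event occurred.
            occurred = 1 if (event == 1 and period == last_period) else 0
            rows.append((pid, period, occurred))
    return rows


people = [(20, 3, 1), (126, 12, 1)]
for row in to_person_period(people):
    print(*row)
```

A censored person simply contributes rows with EVENT = 0 in every observed period.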
Proportion of events by period (sample hazard)
PERIOD EVENTS AT_RISK PROPORTION
1 456 3941 0.11571
2 384 3485 0.11019
3 359 3101 0.11577
4 295 2742 0.10759
5 218 2447 0.08909
6 184 2229 0.08255
7 123 2045 0.06015
8 79 1642 0.04811
9 53 1256 0.04220
10 35 948 0.03692
11 16 648 0.02469
12 5 391 0.01279
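Each proportion is the number of events in a period divided by the number of person-period records in that period, i.e., the people still at risk; this is the sample hazard. A sketch that tallies it from (id, period, event) rows. It uses only the records displayed on the person-period slide, so its proportions differ from the full-sample table; `proportion_by_period` is an illustrative name.

```python
from collections import defaultdict


def proportion_by_period(rows):
    """rows: iterable of (id, period, event) person-period records.
    Returns {period: (events, at_risk, proportion)}."""
    events = defaultdict(int)
    at_risk = defaultdict(int)
    for _pid, period, event in rows:
        at_risk[period] += 1  # every record in a period is someone still at risk
        events[period] += event
    return {p: (events[p], at_risk[p], events[p] / at_risk[p]) for p in sorted(at_risk)}


# Records shown on the person-period slide (IDs 20, 126, and the first 3 periods of 129).
rows = (
    [(20, p, int(p == 3)) for p in range(1, 4)]
    + [(126, p, int(p == 12)) for p in range(1, 13)]
    + [(129, p, 0) for p in range(1, 4)]
)
table = proportion_by_period(rows)
for period, (ev, n, prop) in table.items():
    print(period, ev, n, round(prop, 5))
```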
Study problem
• Outcome: grade at first heterosexual intercourse, compared by presence or absence of a parenting transition before seventh grade
• Sample: boys who were virgins at the start of seventh grade
Hazard and survival for all participants
GRADE AT_RISK FAILED HAZARD SURVIVAL
7 180 15 0.08333 0.9167
8 165 7 0.04242 0.8778
9 158 24 0.15190 0.7444
10 134 29 0.21642 0.5833
11 105 25 0.23810 0.4444
12 80 26 0.32500 0.3000
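The hazard and survival columns follow from the at-risk and event counts: hazard is events divided by the number at risk, and survival is the running product of (1 - hazard). A short sketch that reproduces the table above; the function name `life_table` is illustrative.

```python
def life_table(at_risk, failed):
    """Return (hazard, survival) per period from at-risk and event counts.
    hazard h_j = failed_j / at_risk_j; survival S_j = S_{j-1} * (1 - h_j), S_0 = 1."""
    survival = 1.0
    result = []
    for n, d in zip(at_risk, failed):
        hazard = d / n
        survival *= 1.0 - hazard
        result.append((hazard, survival))
    return result


grades = range(7, 13)
at_risk = [180, 165, 158, 134, 105, 80]
failed = [15, 7, 24, 29, 25, 26]
for grade, (h, s) in zip(grades, life_table(at_risk, failed)):
    print(f"{grade:>2} {h:.5f} {s:.4f}")
```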
Hazard and survival curves by parental
transitions (PTs)
Indicator coding
Coding of indicators for time intervals
PERIOD D1 D2 D3 D4 D5 D6 D7
1 1 0 0 0 0 0 0
2 0 1 0 0 0 0 0
3 0 0 1 0 0 0 0
4 0 0 0 1 0 0 0
5 0 0 0 0 1 0 0
6 0 0 0 0 0 1 0
7 0 0 0 0 0 0 1
D1-D7 are indicator (dummy) variables, one per time interval: Dj = 1 in period j and 0 otherwise
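A sketch of the indicator coding, assuming periods are consecutive integers from `first` to `last`. The defaults match the seven-period table; the grade 7 to 12 study would use first=7, last=12. `period_indicators` is an illustrative name.

```python
def period_indicators(period, first=1, last=7):
    """Dummy (indicator) coding: one 0/1 column per period, with D_j = 1 only when period == j."""
    if not first <= period <= last:
        raise ValueError("period outside the coded range")
    return [1 if period == j else 0 for j in range(first, last + 1)]


for period in range(1, 8):
    print(period, *period_indicators(period))
```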
Regression model
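$$\operatorname{logit} h(t_j) = \alpha_7 D_7 + \alpha_8 D_8 + \alpha_9 D_9 + \alpha_{10} D_{10} + \alpha_{11} D_{11} + \alpha_{12} D_{12} + \beta_1 \mathit{PT}$$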
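Because exactly one indicator equals 1 in each period, the model for period j reduces to α_j plus the PT term. Writing the PT coefficient as β_1, inverting the logit gives the fitted hazard, and exponentiating β_1 gives an odds ratio that the model holds constant across periods (the proportionality assumption):

```latex
\begin{aligned}
\operatorname{logit} h(t_j) &= \alpha_j + \beta_1 \mathit{PT}
  \quad (\text{only } D_j = 1 \text{ in period } j)\\
h(t_j) &= \frac{1}{1 + e^{-(\alpha_j + \beta_1 \mathit{PT})}}\\
\frac{\operatorname{odds}\{h(t_j) \mid \mathit{PT} = 1\}}
     {\operatorname{odds}\{h(t_j) \mid \mathit{PT} = 0\}} &= e^{\beta_1}
  \quad \text{for every } j
\end{aligned}
```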
Regression results
Variable Estimate Odds Fitted hazard
D7 -2.9943 0.05007 0.04768
D8 -3.7001 0.02472 0.02412
D9 -2.2811 0.10217 0.09270
D10 -1.8226 0.16161 0.13912
D11 -1.6542 0.19124 0.16054
D12 -1.1791 0.30757 0.23522
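PT 0.8736 2.39556 0.70550
For D7-D12, Odds = exp(Estimate) is the baseline odds of the event in that grade (PT = 0), and Fitted hazard = Odds/(1 + Odds). For PT, exp(0.8736) = 2.40 is an odds ratio: in every grade, the odds are about 2.4 times higher for boys who experienced a parenting transition. Applying the hazard transformation to it (0.70550) does not yield a meaningful hazard.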
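The Odds and Fitted hazard columns can be checked by exponentiating each estimate and applying the inverse logit. A minimal sketch using the estimates from the table; the variable and function names are illustrative.

```python
import math

# Estimates copied from the regression-results table.
alphas = {7: -2.9943, 8: -3.7001, 9: -2.2811, 10: -1.8226, 11: -1.6542, 12: -1.1791}
beta_pt = 0.8736


def inv_logit(x):
    """Convert a logit (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


for grade, alpha in alphas.items():
    odds = math.exp(alpha)     # baseline odds in this grade (PT = 0)
    hazard = inv_logit(alpha)  # equals odds / (1 + odds)
    print(f"D{grade:<3} {alpha:8.4f} {odds:.5f} {hazard:.5f}")

odds_ratio = math.exp(beta_pt)  # multiplicative effect of PT on the odds
print(f"PT odds ratio: {odds_ratio:.3f}")
```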
Model hazard and survival curves
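The model-implied curves for each group come from adding the PT estimate times PT to each grade's estimate, converting to hazards, and accumulating survival as the product of (1 - hazard). A sketch using the tabled estimates, assuming PT is coded 0/1; `fitted_curves` is an illustrative name.

```python
import math

alphas = [-2.9943, -3.7001, -2.2811, -1.8226, -1.6542, -1.1791]  # grades 7..12
beta_pt = 0.8736


def fitted_curves(pt):
    """Model-implied hazard and survival by grade for a given PT value (0 or 1)."""
    hazards, survival, s = [], [], 1.0
    for alpha in alphas:
        h = 1.0 / (1.0 + math.exp(-(alpha + beta_pt * pt)))
        s *= 1.0 - h  # S(t_j) = S(t_{j-1}) * (1 - h(t_j))
        hazards.append(h)
        survival.append(s)
    return hazards, survival


h0, s0 = fitted_curves(pt=0)
h1, s1 = fitted_curves(pt=1)
for grade, a, b in zip(range(7, 13), s0, s1):
    print(grade, round(a, 4), round(b, 4))
```

Because PT shifts every logit by the same amount, the PT = 1 hazard is higher in every grade, and its survival curve drops faster.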
Regression model
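$$\operatorname{logit} h(t_j) = \alpha_7 D_7 + \alpha_8 D_8 + \alpha_9 D_9 + \alpha_{10} D_{10} + \alpha_{11} D_{11} + \alpha_{12} D_{12} + \beta_1 \mathit{PT} + \beta_2 \mathit{PAS}$$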
PAS is a measure of antisocial behavior
Estimated hazards
What if the data are sparse for some
time intervals?
• Replace the period indicators with a smooth function of time, such as a low-order polynomial, which needs fewer parameters than one indicator per period
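The two time specifications differ only in the columns that represent time. A sketch of both, where the 36-period range and the centering value are hypothetical: a cubic needs four time parameters however many periods there are, while the general model needs one per period.

```python
def general_terms(period, first, last):
    """General specification: one 0/1 indicator per period."""
    return [1 if period == j else 0 for j in range(first, last + 1)]


def cubic_terms(period, center):
    """Cubic specification: ONE plus linear, quadratic, and cubic terms in
    centered time. Centering makes ONE the logit hazard at period == center
    (when other predictors are 0)."""
    t = period - center
    return [1, t, t ** 2, t ** 3]


# Hypothetical example: 36 periods (1..36), time centered at period 18.
print(len(general_terms(20, 1, 36)), "time parameters (general)")
print(len(cubic_terms(20, 18)), "time parameters (cubic):", cubic_terms(20, 18))
```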
Cubic versus general models
Study example
• Outcome: first depressive episode
• Predictors: age, gender, number of
siblings (nsibs), and parental divorce (PD)
• Time: age, modeled as a cubic polynomial
PD (parental divorce) is a
time-varying covariate
ID PERIOD EVENT AGE FEMALE PD NSIBS
8 29 0 51 1 1 1
8 30 0 51 1 1 1
8 31 0 51 1 1 1
8 32 0 51 1 1 1
8 33 0 51 1 1 1
8 34 0 51 1 1 1
8 35 1 51 1 1 1
9 4 0 50 0 0 9
9 5 0 50 0 0 9
9 6 0 50 0 0 9
9 7 0 50 0 1 9
9 8 0 50 0 1 9
9 9 0 50 0 1 9
9 10 0 50 0 1 9
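A time-varying covariate is recorded separately in each person-period row. A sketch of building PD, assuming (as ID 9's rows suggest) that PD switches to 1 in the period of the divorce and stays 1 afterward; `time_varying_indicator` is an illustrative name.

```python
def time_varying_indicator(periods, switch_period):
    """Return 0 for periods before switch_period and 1 from switch_period on.
    Use switch_period=None for someone who never experiences the change."""
    return [0 if switch_period is None or p < switch_period else 1 for p in periods]


# ID 9 is observed in periods 4..10; PD first equals 1 in period 7.
print(time_varying_indicator(range(4, 11), switch_period=7))
```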
Shifted curves from regression model
People switch curves if they experience a parental divorce
State or rate dependence
• A model that links the outcome to contemporaneous values of a time-varying predictor cannot establish the direction of the link
• The outcome (or its immediate rate of change) might influence the predictor, rather than the reverse
Lagged exposures
• Using exposure measurements from the
previous period can address this concern
• Comparing models with lagged and concurrent exposures shows the relative strength of previous versus concurrent exposure
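A lagged exposure is the same person's value from the previous period. A sketch, assuming rows are sorted by ID and then period, and that each person's first observed period gets a chosen fill value (0 here, an assumption, since no earlier period exists); `add_lag` is an illustrative name.

```python
from itertools import groupby


def add_lag(rows, fill=0):
    """rows: list of (id, period, value) sorted by id then period.
    Returns (id, period, value, lagged_value), where lagged_value is the person's
    value in the previous period; each person's first row uses `fill`."""
    out = []
    for _pid, group in groupby(rows, key=lambda r: r[0]):
        previous = fill  # reset at the start of each person
        for pid, period, value in group:
            out.append((pid, period, value, previous))
            previous = value
    return out


# ID 9's PD values for periods 4..10.
rows = [(9, p, pd) for p, pd in zip(range(4, 11), [0, 0, 0, 1, 1, 1, 1])]
for row in add_lag(rows):
    print(*row)
```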
Non-linear effects
Parameter DF Estimate Standard Error Chi-Square Pr > ChiSq
ONE 1 -4.5001 0.2067 474.0222 <.0001
ONE 1 -4.4828 0.1087 1700.3629 <.0001
age_18 1 0.0614 0.0117 27.7226 <.0001
age_18sq 1 -0.00729 0.00122 35.4867 <.0001
age_18cub 1 0.000182 0.000079 5.2809 0.0216
PD 1 0.3710 0.1623 5.2265 0.0222
FEMALE 1 0.5581 0.1095 25.9854 <.0001
BIGFAMILY 1 -0.6108 0.1446 17.8472 <.0001
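A sketch of the fitted hazard from this table, showing that a parental divorce shifts the whole logit-hazard curve up by the PD estimate, so a person who experiences a divorce moves to the higher curve from that period on. It assumes age_18 is age minus 18 and that the ONE = -4.4828 row is this model's intercept; both are inferences from the variable names and table layout, not stated in the source.

```python
import math

# Estimates from the table; the intercept choice and the centering are assumptions.
coef = {
    "ONE": -4.4828, "age_18": 0.0614, "age_18sq": -0.00729, "age_18cub": 0.000182,
    "PD": 0.3710, "FEMALE": 0.5581, "BIGFAMILY": -0.6108,
}


def fitted_hazard(age, pd, female, bigfamily):
    t = age - 18  # assumed centering implied by the name age_18
    logit = (coef["ONE"] + coef["age_18"] * t + coef["age_18sq"] * t ** 2
             + coef["age_18cub"] * t ** 3 + coef["PD"] * pd
             + coef["FEMALE"] * female + coef["BIGFAMILY"] * bigfamily)
    return 1.0 / (1.0 + math.exp(-logit))


def odds(p):
    return p / (1.0 - p)


# At any age, the odds ratio for PD = 1 versus PD = 0 is exp(0.3710).
for age in (10, 18, 30):
    h0 = fitted_hazard(age, pd=0, female=1, bigfamily=0)
    h1 = fitted_hazard(age, pd=1, female=1, bigfamily=0)
    print(age, round(h0, 4), round(h1, 4), round(odds(h1) / odds(h0), 3))
```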
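Non-proportional hazards: test by adding an interaction between the predictor and time; a significant interaction means the predictor's effect changes across periods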
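One way to set up the test is to add a column equal to the predictor times (centered) time. The sketch below uses PT with a linear time interaction and hypothetical defaults (grades 7 to 12, centered at 7); a nonzero interaction coefficient means the PT effect on the logit hazard changes over time, and it can be tested with a Wald or likelihood-ratio test. `design_row` is an illustrative name.

```python
def design_row(period, pt, first=7, last=12, center=7):
    """Predictor values for one person-period row: period indicators, PT,
    and a PT x (period - center) interaction that lets the PT effect vary
    linearly with time."""
    indicators = [1 if period == j else 0 for j in range(first, last + 1)]
    return indicators + [pt, pt * (period - center)]


# Rows for a boy with PT = 1 in grades 7..9.
for grade in (7, 8, 9):
    print(grade, design_row(grade, pt=1))
```

A linear interaction is one choice; interacting the predictor with each period indicator is a more flexible alternative that uses more parameters.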