


Survival, Duration, and Hazard Models
and Their Applications in Finance and Economics

Zongwu Cai

University of North Carolina at Charlotte
Shanghai Jiaotong University, China

1.   Introduction

The use of duration (or survival) models is
relatively recent in economics although they
have been extensively used in engineering and
biomedical research for many years. Survival
analysis, the term used within the biomedical
tradition, is concerned with a group of individuals
for whom a point event of some kind is defined.
This point event is often referred to as a failure.
The event of failure occurs after a length of time
called the failure time and can occur at most once
for any individual or phenomenon under scrutiny.
Survival analysis can also apply to other
phenomena and need not be concerned only
with individuals or groups of individuals.
Examples of failure times include the life-time of

machine components in engineering and
electronics, the time taken by subjects to
complete a psychological experiment, the
survival time of patients in a clinical trial, the
time to failure of a business, the duration of an
industrial strike, or the duration of a period of
unemployment experienced by an individual.
Given there is only a single response, although
there may be many explanatory variables, survival
analysis is best understood as a univariate rather
than a multivariate problem.

Let me show you some applications in economics.
Duration analysis is a core subject of
econometrics. Since the 1980s, the empirical analysis
of duration variables has become widespread.
There are a number of distinct reasons for this
development. First of all, many types of
behaviour over time tend increasingly to be
regarded as movements at random intervals
from one state to another. Examples include
movements by individuals between the labour
market states of employment, unemployment and
non-participation, and movements between
different types of marital status. This development
reflects the fact that dynamic aspects of
economic behaviour have become more
important in economic theories and that in these
theories the arrival of new information (and
thus the change in behaviour in response to this)

occurs at random intervals. Secondly,
longitudinal data covering more than just one spell
per respondent are widely available in labour
economics, as well as in demography and medical
research.

Applications of duration analysis include:

In labour economics, study the duration of
unemployment and the duration of jobs [see e.g.,
the survey by Devine and Kiefer (1991)], strike
durations [e.g., Kennan (1985)], and the duration
of training programs [Bonnal, Fougere and
Serandon (1997)].

In business economics, duration models have
been used to study the duration until a major
investment [e.g., Nilsen and Schiantarelli (1998)].

In population economics, duration analysis has
been applied to study marriage durations
[Lillard (1993)], the duration until the birth of a
child [Heckman and Walker (1990)], and the
duration until death.

In econometric analyses dealing with selective
observation, duration models have been used to
study the duration of panel survey participation
[e.g., Van den Berg and Lindeboom (1998)].

In marketing, duration models have been used
to study household purchase timing [e.g.,
Vilcassim and Jain (1991)].

In consumer economics, study the duration until
purchase of a durable or storable product
[Antonides (1988), Boizot, Robin and Visser
(1997)].

In migration economics, study the duration until
return migration [e.g., Lindstrom (1996)].

In macro economics, study the duration of
business cycles [e.g., Diebold and Rudebusch].

In finance, study the duration between stock-market
share transactions [Engle and Russell].

In political economics, study the duration of
wars [see Horvath (1968)].

In industrial organization, study the duration of
a patent [Pakes and Schankerman (1984)].

In international trade, study the duration of
time between order submission and finding a
match for trade execution [e.g., Melvin and
Wen (2003)].

     There are two main components for survival
analysis: “failure time” and “censoring time”.

2.   Defining Failure Times

There are three pre-conditions required to
determine failure times precisely. Firstly, a time
origin must be unambiguously defined. Although a
precise definition is required for the time origin,
this need not be represented by the same calendar
time for each individual. Most clinical trials, for
instance, have staggered entry, and entry into
unemployment is likewise staggered by its
nature. There may be cases, of course, where mass
redundancies due to plant closure in a given area
lead to the time origin having the same calendar
date for a large proportion of the sample of
unemployed in a given region.

Secondly, a scale for measuring the passage of time
must be agreed. The scale for measuring the
passage of time is usually “clock” or real time but
this can vary depending on the application. For
instance, in engineering the operating time of a
system to a fault, the mileage of a car to a
breakdown, or the cumulative load to a collapse
provide a set of alternative measurement scales. In
terms of economic applications, however, the scale
is always real time with real time measured on

person-specific clocks that are set to zero at the
moment the person enters the state in question.

Thirdly, the meaning of failure must be
unambiguous and understood. The meaning and
interpretation of the point event must be
transparent. In medical work, failure could mean
death from a specific cause (e.g., lung cancer) but
what if death is from a source unrelated to cancer?
In modelling unemployment durations we must
be clear what constitutes failure. For instance,
an individual may exit unemployment through
undertaking training, gaining employment,
retiring from the labour market, or dying. We
have to be clear which exits we wish to include in
our analysis to ensure that the modelling is well
defined.

3.   Censoring

The major problem with analysing survival data is
that the data are invariably censored in one way or
another.     The common cause in economic
applications is that the measurement is
undertaken when the process of interest is still
ongoing. Thus, if we obtain sample spells of
unemployment drawn from surveys, these will
include some individuals who are unemployed at
the time of the survey. If the survey was
undertaken at time ci, the observed duration for
these individuals runs only to ci; the true duration
or survival time is at least ci but need not equal it.
The data are thus censored, as those individuals
who are unemployed at this time could continue in
unemployment for considerably longer than ci.
Estimation must take account of this form of
censorship. The consequences of ignoring data
censored in this way are analogous to the
censorship problems discussed in regression
analysis in either an Econometrics or a Statistics
course.

Survival and Hazard Analysis

Accounts for censoring: data that are censored at
time t can be right- and/or left-censored.

[Figure: six observation spells plotted against time,
with the observation window running from time a
to time b]

Suppose the figure represents observations on
patients selected to participate in a therapy study.

  • Observation 1 is not observed, since it does
    not fall within the time period of observation
  • Observation 2 is left and right censored
    because it is not known when driving began
    and the first accident is not observed in the a
    to b time interval.
  • Observation 3 is complete with both start and
    ending times in the observed period.
  • Observations 4 and 6 are left censored
  • Observation 5 is right censored.
Right censoring is easier to accommodate
analytically than left censoring in classical
duration/hazard models.

We can be slightly more formal than this and
introduce some notation. In the absence of
censoring, the ith individual in a sample of n has
failure time denoted by Ti, a random variable.
We assume that there is a period of observation
(ci ) such that the observation on that individual
ceases at ci if failure has not occurred by then.
The observations consist of Xi = min[Ti, ci]
together with the indicator variable yi = 1 if Ti ≤
ci (the set of uncensored observations) and yi = 0
if Ti > ci (the set of censored observations). The
ci of individuals who are observed to fail (i.e., yi
= 1) are referred to as unrealised censoring
times, whilst the ci of individuals who are
observed not to fail (i.e., yi = 0) are referred to as
realised censoring times.
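The construction of (Xi, yi) from latent failure and censoring times can be sketched directly in code; a minimal illustration, with invented data values:

```python
def observe(failure_times, censoring_times):
    """Build (X_i, y_i) pairs: X_i = min(T_i, c_i), with y_i = 1 when the
    failure is observed (T_i <= c_i) and y_i = 0 when it is censored."""
    observations = []
    for t, c in zip(failure_times, censoring_times):
        observations.append((min(t, c), 1 if t <= c else 0))
    return observations

# Three spells, each observed up to its censoring time c_i = 5.0.
data = observe([2.0, 7.5, 4.0], [5.0, 5.0, 5.0])
print(data)  # [(2.0, 1), (5.0, 0), (4.0, 1)]
```

The second spell (latent duration 7.5) is still ongoing when observation ceases at 5.0, so it is recorded as (5.0, 0): right-censored.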

4.   The Hazard Function

4.1 Model

Econometricians use the term spell length to
describe time occupancy or duration of a given
state. The spell length is usually represented, as
above, by a random variable, which is denoted by
T. T is assumed to be a continuous random
variable and we assume a large population of
people enter some given state at a time identified
by T = 0. As stated above in Section 2, the
calendar time of state entry need not be the
same for all individuals. T is thus the duration of
stay in the state. The population is assumed to be
homogeneous, implying that everyone's duration of
stay will be a realisation of a random variable from
the same probability distribution.

If we define the probability that a person who has
occupied a state for a time t leaves it in the short
interval of length Δt after t as:

P[t ≤ T < t + Δt | T ≥ t]                    [1]

The conditioning event (T ≥ t) in [1] is the event
that the state is still occupied at t. In other words,
the conditioning event is that the individual has not
left the state before time t. If we divide [1] by Δt
we obtain the average probability of leaving per
unit of time over a short interval after t.
If we take this average over shorter and shorter
intervals we can formally define:

λ(t) = lim_{Δt→0} P[t ≤ T < t + Δt | T ≥ t] / Δt                    [2]

as the hazard function. It is the instantaneous rate
of leaving per unit of time period at t. The
interpretation of λ(t)Δt (sometimes written as
λ(t)dt) is the probability of exit from a given
state in the short interval of time Δt after t,
conditional on the state being occupied at time t. It
is also possible to specify the probability
unconditionally (i.e., without the condition T ≥ t).
This is a considerably different concept from the
hazard. For instance, in the context of mortality
data, the hazard function gives the probability that
a fifty-year old person will die whereas the
unconditional concept gives the probability that a
person will die at fifty. In terms of relative
frequencies, λ(50)Δt gives the proportion of fifty-year-olds
who die within a small interval (Δt) of
their fiftieth birthday. The unconditional concept
gives the proportion of people (ever born) who die
within a small interval (Δt) of their fiftieth birthday.
These concepts are clearly different. It will prove
convenient to express the hazard function and the
unconditional probability of exit in terms of
distribution and density functions of a continuous
random variable T.

Define P[T < t] = F(t) and note trivially that P[T ≥
t] = 1 − F(t). One minus the distribution function is
an expression that recurs in applications
involving duration or survival data. It is known as
the survivor function, since it gives the probability
of survival to time t. In terms of frequencies, it
gives the proportion of a given population who stay
(or survive) at least t years. Because of its special
significance in these applications it has its own
special notation: 1 − F(t) = F̄(t).

Now recall that the derivative of the cumulative
distribution function is the density function,
∂F(t)/∂t = f(t), and recall also that the conditional
probability may be written as f(x|y) = f(x, y)/f(y).

We can use this last result to write an expression
for the conditional probability [1] as

P[t ≤ T < t + Δt | T ≥ t] = P[t ≤ T < t + Δt, T ≥ t] / P[T ≥ t]                    [3]

The numerator of [3] is the joint probability that
someone leaves a given state in the time interval
specified and that the duration satisfies T ≥ t.
We should note that the intersection of the
two sets {t ≤ T < t + Δt} and {T ≥ t} appears here;
the former is contained in the latter, so the
intersection is simply {t ≤ T < t + Δt}. This allows
us to re-write [3] as:

P[t ≤ T < t + Δt | T ≥ t] = P[t ≤ T < t + Δt] / P[T ≥ t]                    [4]

In terms of the distribution function, [4] can be
re-expressed as:

P[t ≤ T < t + Δt | T ≥ t] = [F(t + Δt) − F(t)] / [1 − F(t)]                    [5]

If we divide expression [5] through by Δt and take
the limit as Δt approaches zero we obtain:

λ(t) = lim_{Δt→0} P[t ≤ T < t + Δt | T ≥ t]/Δt
     = lim_{Δt→0} [F(t + Δt) − F(t)]/Δt × 1/[1 − F(t)]                    [6]

The final part of expression [6], excluding the
reciprocal of the survivor function, is the derivative
of the distribution function with respect to t, which,
of course, is the density function. It should be
noted that F(t + Δt) − F(t) = ΔF(t) and recall that
ΔF(t)/Δt → f(t) as Δt → 0. This allows us to write [6] as:

λ(t) = f(t)/[1 − F(t)] = f(t)/F̄(t)                    [7]

This provides a more compact expression for the
hazard function. This expression also allows us to
see the difference between the unconditional exit
probability, which is the area under the probability
density function of T from t to t + Δt [i.e., f(t)Δt],
and the conditional exit probability, which is given
by λ(t)Δt = f(t)Δt/F̄(t); the two coincide only in the
extreme case t = 0, where F̄ = 1. Note also that
expression [7] is similar to the expression for the
lower truncated or conditional probability density
function. For this reason, [7] is sometimes
erroneously referred to as a conditional probability
density function. It is not, since it is a function of t
defined over the whole non-negative axis and not
just a truncated part of it. Prior to proceeding
further it is worth noting that λ(t) can be expressed
in an alternative way. The log of the survivor
function can be written as log[1 − F(t)].
What is d log[1 − F(t)]/dt? Writing z = 1 − F(t), this can
be obtained using the chain rule as:

d log[1 − F(t)]/dt = (d log[z]/dz)(dz/dt) = [1/(1 − F(t))] × [−f(t)] = −λ(t)

This can be expressed as:

λ(t) = −d log[1 − F(t)]/dt = −d log[F̄(t)]/dt                    [8]
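The identity λ(t) = f(t)/F̄(t) = −d log F̄(t)/dt in [7] and [8] can be checked numerically; here is a sketch using a Weibull survivor function with illustrative parameter values:

```python
import math

# Weibull survivor function F-bar(t) = exp[-(theta*t)^alpha] with
# illustrative parameters theta = 0.5, alpha = 2.
theta, alpha = 0.5, 2.0

def surv(t):
    return math.exp(-(theta * t) ** alpha)

def hazard(t):
    # Closed-form hazard f(t)/F-bar(t) = alpha * theta^alpha * t^(alpha-1)
    return alpha * theta ** alpha * t ** (alpha - 1)

# Numerical -d log F-bar(t)/dt via a central difference, as in [8].
t, h = 1.3, 1e-6
numeric = -(math.log(surv(t + h)) - math.log(surv(t - h))) / (2 * h)
print(hazard(t), numeric)  # 0.65 and ~0.65: the identity holds
```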

 4.2 An example of using simulated data:

 A. Data structure:
Account Number   Age   Utilization Credit Score   Lien Position   Rate Difference   Event

   435430        44     86.9%           684                   1        0.75          0
   435430        45     86.1%           684                   1        0.50          0
   435430        57     92.1%           704                   1        0.25          0
   435430        58     94.0%           721                   1        0.25          0
   435430        59     94.7%           721                   1        0.25          0
   451620        63      0.0%           641                   2        1.25          0
   451620        64      0.0%           641                   2        1.25          0
   451620        65      0.0%           641                   2        1.75          0
   451620        70      0.0%           649                   2        1.50          0
   451620        71      0.0%           647                   2        1.75          0
   451620        72      0.0%           647                   2        1.25          2
   452750        49      0.0%           620                   2        0.75          0
   452750        50     13.4%           620                   2        1.00          0
   452750        63     96.1%           644                   2        1.50          0
   452750        64     95.8%           644                   2        1.50          1

 B. Description of Variables:
   • Account Number
   • Age: Account age in months.
   • Utilization: ratio of account balance over
     account limit.
   • Credit Score: A score to represent
     customer’s credit history.
   • Lien Position: Denote the lien position of the
     bank on the collateral (first mortgage or
     second mortgage).
   • Rate Difference: Difference between
     customer's APR and the prevailing interest
     rate in the market.

• Event: 1 denotes default, 2 denotes prepay,
  0 denotes revolving.

C. Model:

• Due to its flexibility, we use Cox's semi-
  parametric proportional hazards model to
  fit the data.
• We take a simple competing risk method to
  fit prepay and default hazards separately.
  When fitting prepay, the rest is regarded as
  censoring. When fitting default, the rest is
  regarded as censoring.
• We use PROC PHREG in SAS to fit the
  model. Of course, other statistical packages
  are applicable too, such as R, where the
  relevant functions are “Surv”, “coxph”, and
  “survreg” (in the survival package).
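The cause-specific recoding described above (treating every outcome other than the one being fitted as censored) can be sketched as follows; the event codes follow the data section, and the sample column is invented:

```python
def cause_specific(events, cause):
    """Return 0/1 failure indicators for one risk, treating every other
    outcome (including revolving) as censored."""
    return [1 if e == cause else 0 for e in events]

# Event codes as in the data section: 0 revolving, 1 default, 2 prepay.
events = [0, 0, 2, 0, 1, 0]       # an invented Event column
print(cause_specific(events, 2))  # prepay fit:  [0, 0, 1, 0, 0, 0]
print(cause_specific(events, 1))  # default fit: [0, 0, 0, 0, 1, 0]
```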

  D. SAS Output:

  • Prepay

                    Parameter Standard                       Hazard
     Variable    DF Estimate    Error  Chi-Square Pr > ChiSq Ratio

   UTILIZATION   1   0.36496     0.09489      14.7926    0.0001   1.44
   RATE_DIFF     1   0.16309     0.07333       4.9463    0.0261   1.177

  • Default

                     Parameter Standard                       Hazard
      Variable    DF Estimate    Error  Chi-Square Pr > ChiSq Ratio

   CREDIT_SCORE 1     -0.02248      0.00436    26.6486      <.0001   0.978

E. Interpretation:
  • Prepay
       – The higher the utilization, the higher
         the financing leverage, and the more
         likely the customer is to prepay.
       – The bigger the rate difference, the
         lower the prevailing interest rate, the
         bigger the incentive to prepay.

  • Default
     – The higher the credit score, the better
        the customer's credit, and the less
        likely the customer is to default.
5.   Modelling the Hazard Rate and Duration

Plotting hazard functions can provide some
useful insights into how the exit probability is
behaving with state duration. The nature of the
relationship between the hazard rate and the
duration is known as duration dependence. For
instance, it is possible to have

(a) ∂λ/∂t = 0. In this case, the hazard rate is
constant, suggesting that the instantaneous rate of
exit is invariant to spell duration.

(b) ∂λ/∂t > 0, suggesting positive duration
dependence, where the instantaneous rate of exit
increases with spell duration (e.g., employment/job
durations may provide an example of this: the
longer you have been in a given job, the greater the
likelihood of a quit).

(c) ∂λ/∂t < 0, suggesting negative duration
dependence, where the instantaneous rate of exit
decreases with spell duration (e.g., in labour
economics, the exit rate from unemployment is
seen to decline with duration).

There are a number of special distributions that are
useful for modelling duration data. The use of

specific distributions to model time to failure
assumes what is called a parametric approach. We
now turn to examine three popular distributions in
failure time analysis.

5.1 The Exponential Distribution

We could focus on (a) above in the first instance
and specify a model incorporating a process that
has no memory. Thus, the conditional probability
of failure in a given short interval is the same
regardless of when the observation is made. The
cumulative distribution function for the exponential
distribution is given by:

F(t) = 1 − exp[−θt] for t ≥ 0, where θ is a positive
parameter. The corresponding density function is
given by:

f(t) = θ exp[−θt]

The hazard function is thus expressed as:

λ(t) = f(t)/[1 − F(t)] = f(t)/F̄(t) = θ exp[−θt] / exp[−θt] = θ                    [9]

 The hazard is a constant and thus independent of
time. This distribution has been used to model the
time until failure of electronic components
primarily because of the memoryless property of

the distribution. It is completely described by one
parameter (θ). Each unique value of θ determines a
different exponential distribution, thus implying the
existence of a family of exponential distributions.
The distribution is skewed to the right and the
values of the random variable, T, can vary from 0
to ∞. However, given that the distribution has only
one adjustable parameter, methods based on it are
very sensitive to modest departures in the tail of
this distribution. This fact, and the inherent
constancy of the hazard, has encouraged applied
economists to look at alternative distributions.
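The constancy of the exponential hazard is equivalent to the memoryless property: the probability of surviving a further s units of time does not depend on the time t already spent in the state. A quick numerical check with an illustrative rate parameter:

```python
import math

theta = 0.8  # illustrative rate parameter

def surv(t):
    # Exponential survivor function: P[T >= t] = exp(-theta * t)
    return math.exp(-theta * t)

# Memorylessness: P[T >= t + s | T >= t] = P[T >= s] for any t, s.
t, s = 2.0, 1.5
conditional = surv(t + s) / surv(t)
print(conditional, surv(s))  # both equal exp(-1.2) ~ 0.3012
```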

5.2 The Weibull Distribution

In contrast to the hazard function for the
exponential distribution which is invariant to the
duration, the Weibull distribution is monotonically
increasing or decreasing in duration depending on
certain parameter values. This is clearly less
restrictive. The survivor function can be written as

F̄(t) = exp[−(θt)^α]                        [10]

and the cumulative distribution function is given by

F(t) = 1 − exp[−(θt)^α]                        [11]

The derivative of expression [11] with respect to t
yields the density function which, in this case, is

f(t) = αθ^α t^(α−1) exp[−(θt)^α]                    [12]

and the hazard function can be written as:

λ(t) = f(t)/F̄(t) = αθ^α t^(α−1)                 [13]

This is the Weibull family of distributions and has
been the most extensively used in applied
econometric duration studies. The popularity of
this family of distributions is attributable to the
simplicity of the expressions [10] to [13]. In
contrast to the exponential case, the hazard is not
constant and can either rise or fall with duration.
The θ parameter is known as the scale parameter
and α the index parameter. The behaviour of the
hazard depends on the parameter α. The following
should be noted from expression [13]:

If α > 1, this implies ∂λ/∂t > 0 and suggests an
increasing hazard rate and positive duration
dependence.

If α = 1, this implies ∂λ/∂t = 0 and suggests a
constant hazard rate and no duration dependence.

If α < 1, this implies ∂λ/∂t < 0 and suggests a
decreasing hazard rate and negative duration
dependence.

It is clear from the above that use of the exponential
distribution could be problematic if the duration
data are characterized by either positive or negative
duration dependence. This is one reason why the
Weibull is more popular among economists. However,
its limitation is that it only allows for increasing or
decreasing hazards and not combinations of both.
In other words, the hazard is monotonic in t. It is
possible that the data are not consistent with such
monotonicity.
6.   Maximum Likelihood Estimation

The parameters θ and α of the Weibull distribution
can be estimated by maximum likelihood
procedures. The likelihood function bears an
uncanny resemblance to the standard censored
Tobit likelihood function reported in lecture five.
The likelihood function in this case could be
defined as:

ℒ = Π_{i: yi=1} f(ti; θ, α) × Π_{i: yi=0} F̄(ti; θ, α)                [14]

For some distributions it can be more tractable to
formulate the likelihood function in terms of the
hazard function. Since f(t) = λ(t)F̄(t), [14] can be
rewritten as:

ℒ = Π_{i: yi=1} λ(ti; θ, α) × Π_{i=1..n} F̄(ti; θ, α)                [15]

The likelihood functions can be expressed as logs
and the usual algorithms can be invoked for the
estimation of its parameters. The inverse of the
information matrix can be used to compute the
asymptotic variance-covariance matrix for the
parameter estimates.
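For the exponential special case, the censored likelihood [14] has a closed-form maximiser, which makes its structure concrete: the log-likelihood is Σyi·log θ − θ·Σxi, maximised at θ̂ = (number of observed failures)/(total time at risk). A sketch with an invented sample:

```python
def exp_mle(data):
    """data: list of (x_i, y_i); returns the censored-data MLE of theta.
    Log-likelihood sum(y_i)*log(theta) - theta*sum(x_i) is maximised at
    theta_hat = events / exposure."""
    events = sum(y for _, y in data)     # number of observed failures
    exposure = sum(x for x, _ in data)   # total time at risk
    return events / exposure

# Invented sample: three observed failures, two right-censored spells.
data = [(2.0, 1), (5.0, 0), (4.0, 1), (1.5, 1), (5.0, 0)]
print(exp_mle(data))  # 3 / 17.5 ~ 0.1714
```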

7.   Exogenous Variables and Duration Analysis

A limitation of the duration models outlined so far
is that there has been no role for external factors in
the survival distribution. There are measured
differences in individuals or firms that may
influence their survival chances in any given
application.      Regressors can be introduced
relatively easily into the duration models
encountered so far. These regressors are called
“covariates” and can either be time-invariant
covariates or time-varying covariates. For instance,
gender and race are time invariant covariates but
the age of an individual or company or the state of

the economy (e.g., the unemployment rate) are
time-varying covariates. The use of time-varying
covariates, however, poses problems for some
duration models.       The specification of the
likelihood function is considerably more complex
and is not discussed here. However, a simple
approach is suggested below that reduces the
complexity of the problem.

The extension to the hazard specification is
relatively straightforward for time-invariant
covariates. For instance, recall the hazard function
for the Weibull model:

λ(t) = f(t)/F̄(t) = αθ^α t^(α−1)

In this case the covariates are introduced as a
function of θ, where θ = exp[β′xi] and the xi
includes a constant term and a set of regressors that
are assumed not to change from T = 0 to the failure
time T = t. These models can sometimes be cast as
“accelerated failure time” (AFT) models. The
effect of the covariates in accelerated time models
is to re-scale time. In other words, here a covariate
accelerates (or decelerates) the time to failure. This
is in contrast to a hazard model (e.g., a proportional
hazards model) where the role of the covariate is to
change the hazard rate.

The advantage of the AFT model is that it has a
linear regression model interpretation if α = 1 (i.e.,
the exponential case) of the following form:

ln(Ti) = −β′xi + vi                          [16]

where vi is a random error that follows some
continuous distribution and Ti is time. The
distribution of the random term does not involve
either x or β and the regression model is thus
homoscedastic. On the other hand, if α ≠ 1 (e.g.,
the Weibull case), then the regression model is
non-linear and interpreting the estimated effects of
the covariates on time to failure is more difficult.
The Weibull specification (with the exponential as
the special case when α = 1) is the only member that
belongs to both the AFT family of models and the
proportional hazards family of models. We now
turn to the proportional hazards model.

8.   Proportional Hazards Model

If we introduce a covariate function, then the
hazard function is expressed as:

λ(x; t) = k1(x) k2(t)                           [17]

where k1 and k2 are the same functions for all
individuals. The baseline hazard is common to all

units in the population and does not vary across
individuals. Individual hazards differ
proportionately based on the realizations of the
covariates. Note that the proportional hazards
model does not contain a constant term, as any
constant can be absorbed into the baseline hazard.

The model is called a proportional hazards model
because, for two individuals with regressor
realizations x1 and x2, the hazards are in the same
ratio k1(x1)/k1(x2) for all
t. The proportionate effect of x on the hazard is the
same for all dates. Thus, if being over fifty years
of age lowers the probability of exit from
unemployment on the first day by two percent, it
lowers the probability of leaving on the hundredth
day by the same amount. This is a relatively
restrictive form and it should be noted, in passing,
that there is no obvious reason why hazards should
be proportional with duration data drawn from
economic applications.
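The proportionality claim (the hazard ratio between two covariate realizations is the same at every duration t) can be verified numerically; a sketch with k1(x) = exp(βx) and a Weibull baseline k2(t) = αt^(α−1), using illustrative parameter values:

```python
import math

beta, alpha = 0.4, 1.5   # illustrative coefficient and Weibull index

def hazard(x, t):
    # k1(x) * k2(t) with k1(x) = exp(beta*x), k2(t) = alpha * t^(alpha-1)
    return math.exp(beta * x) * alpha * t ** (alpha - 1)

x1, x2 = 2.0, 1.0
ratios = [hazard(x1, t) / hazard(x2, t) for t in (0.1, 1.0, 10.0)]
print(ratios)  # the same value, exp(beta*(x1 - x2)) ~ 1.4918, at every t
```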

Given a proportional hazard of the form expressed
in [17], Cox (1972) suggested a partial maximum
likelihood method for its estimation.        The
conditioning approach adopted effectively sweeps
out the baseline hazard function in an analogous
way to how fixed effects are removed in panel
models. The advantage of this approach is that it

provides maximum likelihood estimates of the
coefficient vector β without specifying the baseline
hazard. This is generally known as Cox's
proportional hazards model. This model has some
inherent disadvantages in the sense that we may be
interested in knowing the nature of duration
dependence in a particular application and since the
baseline hazard is not specified, this is difficult to
infer. This is one reason why a more general
Weibull model is usually preferred in many
duration applications.

Note that there are some extensions in the literature
that relax the Cox model. For example, the time-varying
coefficient Cox model has been considered
by Cai and Sun (2003).

9.   The Weibull Hazard Model

This turns out to be one of the more popular
parametric      duration     models     in    applied
econometrics.      It provides estimates for the
baseline hazard and the covariate vector. The
estimates for the baseline hazard provide
information on the nature of duration dependence
and so this may be important from a policy
perspective. Sometimes these models are referred
to as non-stationary since they allow for duration
dependence. Under the assumption of a Weibull
specification, the baseline hazard is expressed as:

k2(t) = αt^(α−1)

We now need to assume some functional form for
k1(xi) and the most tractable generally is:

k1(xi) = exp(β′xi)

Lancaster (1979) modelled unemployment
durations for a sample of the unemployed stock
drawn at a date during 1973 and interviewed some
five weeks later using exactly this type of
specification. The Weibull hazard is:

λi(x; t) = exp(β′xi) αt^(α−1)                     [18]

This choice of functional form for the covariates,
k1(x) = exp(β′x), is the most commonly used
because it renders the hazard a log-linear
function of the covariates. This facilitates a cleaner
interpretation of the covariate effects.

Lancaster's study used this type of model for a
sample of unemployed males. He was concerned
with testing whether α = 1. The estimate for α of
0.7 suggested a decreasing hazard rate with time.
Lancaster, however, reported that his estimated α
increased as additional covariates were added to the
specification. This suggests that the decreasing
hazard implied by α = 0.7 may not be indicative of
true duration dependence but may instead be due to
omitted variables.
This problem is known as heterogeneity in duration
models and failure to account for it in one way or
another creates a problem generally described as
“neglected heterogeneity” (see next section).

In contrast to the AFT model, the effect of the
covariates is on the hazard not on the time to
failure. However, we can use the Weibull hazard
function estimates to inform of the effect on
duration, as many investigators are more interested
in the effects of a change in the covariate on the
expected (or average) duration associated with a
particular state. In the Weibull hazard model,
although the derivations are somewhat convoluted,
it can be shown that expected (or average) duration
can be expressed as:

E(T) = Γ(1 + 1/α) exp(−β′x/α)                    [19]

where Γ(·) is the Gamma function. Note that
Γ(2) = 1.

If we take logarithms we get:

ln(E(T)) = −β′x/α + constant     if α ≠ 1

An alternative way of expressing this is in terms of
log duration for the ith observation as:

ln(Ti) = −β′xi/α + ui                           [20]

where ui is an error term and α is the duration
dependence parameter from the Weibull hazard
model. In this case the expected value of the error
term ui is non-zero since it depends on the Gamma
function. However, it is obviously independent of
xi. If we want the effect of the covariates on
the expected log duration, we take the derivative:

∂ln(E(T))/∂x = −β/α                               [21]

This makes intuitive sense for a number of reasons.
Firstly, if the covariate x has a positive (negative)
effect on the exit hazard (i.e., the instantaneous exit
rate), then the covariate should exert a negative
(positive) effect on the expected log duration in the
state in question. For instance, a high level of
education should increase the instantaneous exit
rate from unemployment and thus reduce the
expected unemployment duration. Secondly, the
scaling of the covariate's coefficient in this case
also makes sense. If α < 1 (α > 1), we have
negative (positive) duration dependence, and the
effect of a small change in the covariate on the
expected duration is thus larger (smaller) relative
to the no duration dependence case.

It is also evident that if there is no duration
dependence (α = 1), a conventional log-linear
interpretation of the estimated effect is warranted.
This is the case of an exponential distribution of
failure times. Note that if α = 1, Γ(2) = 1 and

E(T) = exp(−β′x)   and thus   ln(E(T)) = −β′x

Thus, in the exponential case:

∂ln(E(T))/∂x = −β
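The relations in [19] and [21] can be checked numerically. The sketch below assumes the Weibull parameterization λ(t) = α t^(α−1) exp(β′x), so that the survivor function is S(t) = exp(−t^α exp(β′x)); the parameter values are invented for illustration only. It compares the closed form for E(T) with direct numerical integration of S(t), and checks the derivative in [21] by finite differences.

```python
import math

# Sketch only: assumes the Weibull hazard lam(t) = alpha * t**(alpha-1) * exp(bx),
# so the survivor function is S(t) = exp(-t**alpha * exp(bx)) and, by [19],
# E(T) = Gamma(1 + 1/alpha) * exp(-bx / alpha).  Parameter values are invented.

def expected_duration(alpha, bx):
    """Closed form for E(T) from equation [19]."""
    return math.gamma(1.0 + 1.0 / alpha) * math.exp(-bx / alpha)

def expected_duration_numeric(alpha, bx, upper=50.0, n=100_000):
    """E(T) = integral of S(t) dt, via the trapezoid rule, as a check."""
    h = upper / n
    total = 0.0
    for i in range(n + 1):
        t = i * h
        s = math.exp(-(t ** alpha) * math.exp(bx))
        total += 0.5 * s if i in (0, n) else s
    return total * h

alpha, bx = 0.7, 0.5            # alpha < 1: negative duration dependence
closed = expected_duration(alpha, bx)
numeric = expected_duration_numeric(alpha, bx)
print(round(closed, 3), round(numeric, 3))

# Equation [21]: d ln E(T) / d x_j = -beta_j / alpha.  Perturbing x_j by eps
# changes b'x by beta_j * eps, so check the slope by central differences.
beta_j, eps = 0.8, 1e-5
grad = (math.log(expected_duration(alpha, bx + beta_j * eps))
        - math.log(expected_duration(alpha, bx - beta_j * eps))) / (2 * eps)
print(round(grad, 3), round(-beta_j / alpha, 3))
```

With α = 0.7 the slope −β/α exceeds −β in magnitude, which is the scaling point made above.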

10 Neglected Heterogeneity

The problem of heterogeneity can be viewed as the
result of an incomplete specification. The
inclusion of individual specific covariates is
designed to incorporate observation specific
effects. If the model specification is incomplete,
and if systematic individual differences in the
distribution remain after the observed effects have
been controlled for, then inferences based on a mis-
specified model may be incorrect. This was the
problem Lancaster faced when trying to draw
inferences about the presence of duration
dependence. This problem is akin to specification
analysis in the linear regression model. Duration
analysis can be extended to handle neglected
heterogeneity. Lancaster proposed an alternative
specification for the hazard function as:

λ*i(x;t) = vi λi(x;t)                        [22]

where vi is an unobservable random variable
independently and identically distributed as
Gamma(1, σ²), i.e., with mean one and variance
σ². This random variable may be regarded as a
proxy for all unobservable exogenous variables.
This is the mechanism through which
unobservables are captured and is analogous to the
inclusion of an error term in a regression model.

After some extensive algebra, which we do not
pursue, the hazard function can be written as
(suppressing the i subscripts):

λ*(x;t) = λ(t)[1 − F*(x;t)]^(σ²)             [23]

where F*(x;t) is the expected value of the
cumulative distribution function conditional on v.

This used to be a relatively common approach to
modelling neglected heterogeneity. Because
[1 − F*(x;t)]^(σ²) is a decreasing function of t,
[23] demonstrates that heterogeneity introduces a
tendency towards a decreasing hazard rate.
Estimation of this model is readily done by
maximum likelihood and involves estimating one
additional parameter, σ². It should be noted that if
σ² = 0 (i.e., the variance of vi is zero), there is no
heterogeneity present in the data and the model
collapses to λ*(t) = λ(t), the standard Weibull
model in this particular case. In the Lancaster
study of unemployment duration, the maximum
likelihood estimate of α rises from 0.7 to 0.9,
confirming that the negative duration dependence
in the Weibull model is more attributable to
neglected heterogeneity than to pure duration
dependence effects. The intuition for this is
straightforward: the more mobile and employable
individuals are likely to be the first to leave
unemployment, leaving the least mobile and least
employable behind, hence creating the illusion of
stronger negative duration dependence than
actually exists. It can be shown mathematically
that uncontrolled unobservables bias the estimated
hazards towards negative duration dependence.
Failure to control for observables has similar
effects, but the direction of the bias cannot be
known a priori. All of the foregoing has important
implications for policy.
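This masquerading of heterogeneity as negative duration dependence can be illustrated by simulation. In the sketch below (all parameter values invented for illustration), every individual exits at the constant rate vᵢλ, i.e., with no true duration dependence, and the frailty vᵢ is drawn from a Gamma distribution with unit mean and variance σ² as in [22]. The pooled empirical hazard nevertheless declines with elapsed duration, because high-v individuals exit first.

```python
import random

# Sketch only (parameters invented): each individual exits at constant rate
# v_i * lam (no true duration dependence), with frailty v_i ~ Gamma having
# mean 1 and variance sigma2, in the spirit of specification [22].
random.seed(1)
lam, sigma2, n = 0.5, 1.0, 200_000
shape, scale = 1.0 / sigma2, sigma2       # Gamma(mean 1, variance sigma2)

durations = []
for _ in range(n):
    v = random.gammavariate(shape, scale)
    durations.append(random.expovariate(v * lam))   # exponential given v_i

def pooled_hazard(t, dt=0.25):
    """Empirical exit rate on [t, t+dt): exits / (at-risk * dt)."""
    at_risk = sum(1 for d in durations if d >= t)
    exits = sum(1 for d in durations if t <= d < t + dt)
    return exits / (at_risk * dt)

h0, h2 = pooled_hazard(0.0), pooled_hazard(2.0)
print(round(h0, 3), round(h2, 3))   # the pooled hazard falls with duration
```

Each individual's own hazard is flat, yet the survivors at t = 2 are disproportionately the low-v individuals, so the pooled hazard there is well below its value at t = 0, mimicking negative duration dependence.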

However, the approach has had its critics. In
particular, it has been argued that the approach
tends to over-parameterize the survival distribution
and leads to serious errors in inference. Heckman
and Singer (1984, Journal of Econometrics)
provide estimates that vary dramatically depending
on the functional form specified and the mixing
distribution used for the neglected heterogeneity.
In addition, the choice of distribution for the
heterogeneity (for instance, the Gamma
distribution in our case above) is not motivated by
any obvious economic consideration but rather by
mathematical convenience.
11 Discrete-Time Duration Models

11.1 Kaplan–Meier Product Limit Estimator

It may sometimes be convenient to graph the
survivor function associated with a particular
dataset. A simple non-parametric procedure can be
used to do just that. The Kaplan–Meier survivor
function provides one such procedure and can be
expressed as:

F̂(Tk) = ∏_{i=1}^{k} (ni − hi)/ni

where k is the number of distinct survival times;
Tk is the kth survival time; nk is the size of the risk
set at time k; and hk is the number of observation
spells completed at time k. The computation of
this measure can be illustrated using the example
in the following table:
Table 1: Employer Change for 200 Employees

Year              Exits (hk)       Risk Set (nk)
1                 11               200
2                 25               189
3                 10               164
4                 13               154
5                 12               141
Greater than 5     0

The risk set at the start is 200. Therefore,
n1 = 200. In the first year 11 individuals leave
their employer, implying h1 = 11.

At the start of the first year:
F̂(T1) = (200 − 11)/200 = 0.945

At the start of the second year, the risk set is
200 − 11 = 189 = n2 and h2 = 25. Therefore:

F̂(T2) = 0.945 × (189 − 25)/189 = 0.820

At the start of the third year: n3 = 164 and h3 = 10.

F̂(T3) = 0.820 × (164 − 10)/164 = 0.770

At the start of the fourth year: n4 = 154 and h4 = 13.

F̂(T4) = 0.770 × (154 − 13)/154 = 0.705

At the start of the fifth year: n5 = 141 and h5 = 12.

F̂(T5) = 0.705 × (141 − 12)/141 = 0.645
In this case the survivor function declines steadily
over the five years of interest. After the fifth year
the data points could be viewed as being censored.
It is worth noting that the formula for the survivor
function is not affected if hk = 0 (i.e., when there
are no exits from the state in question). These
estimates for the survivor function obtained above
can be shown to be maximum likelihood estimates.

As we have already seen, however, economists
tend to be more interested in hazard functions than
survivor functions. We could use the above data to
compute hazard rates using the Kaplan–Meier
hazard formula. This is defined as:

λ̂(Tk) = hk/nk

This is a non-parametric empirical approach that
imposes no restrictions on the data in the way that
the parametric (i.e., Weibull or exponential)
models encountered earlier do. Given the data
above, the hazard rates for the five spells at risk
can be computed as follows:

λ̂(T1) = 11/200 = 0.055;   λ̂(T2) = 25/189 = 0.132;
λ̂(T3) = 10/164 = 0.061;   λ̂(T4) = 13/154 = 0.084;
λ̂(T5) = 12/141 = 0.085

The interpretation of the first hazard rate estimate
is that there is a 5.5% chance of exiting the state in
the first year. The interpretation of the second
hazard estimate is that conditional on surviving to
the second year, there is a 13.2% chance of exiting
the state. These are clearly useful estimates to have
at ones disposal. However, the approach does not
allow for the introduction of any covariates. In the
Kaplan–Meier procedure the unit of observation for
the empirical analysis is no longer the individual
but the spell at risk of the event occurring. This
point will be developed a little further in an attempt
to introduce covariates.
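The worked computations above can be verified in a few lines. The sketch below takes only the exit counts from Table 1 and recomputes both the product-limit survivor estimates and the Kaplan–Meier hazard rates.

```python
# Check of the Kaplan-Meier computations for Table 1: the survivor function
# F(T_k) = product over i <= k of (n_i - h_i)/n_i, and the hazards h_k/n_k.
exits = [11, 25, 10, 13, 12]            # h_k from Table 1
survivor, hazard = [], []
at_risk, s = 200, 1.0                   # initial risk set n_1 = 200
for h in exits:
    hazard.append(round(h / at_risk, 3))
    s *= (at_risk - h) / at_risk        # multiply in the k-th factor
    survivor.append(round(s, 3))
    at_risk -= h                        # next risk set: n_{k+1} = n_k - h_k
print(survivor)   # [0.945, 0.82, 0.77, 0.705, 0.645]
print(hazard)     # [0.055, 0.132, 0.061, 0.084, 0.085]
```

Since no spells are censored before year five here, the survivor estimate at year k simplifies to n_{k+1}/200; with censoring, only the product form is correct.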

11.2   Using a Logistic Model for Discrete
Duration Modelling

Duration or failure time analysis is usually situated
in a continuous time framework and this approach
has dominated the discussion thus far. There are
sound reasons for the domination of models rooted
in continuous time. Firstly, in most economic
models there is no natural time unit inside which

individuals make their decisions and take their
actions. Secondly, even if there were natural time
units, there is no guarantee that they correspond to
the monthly, quarterly or annual data normally
available to empiricists. Thirdly, inferences about
an underlying stochastic process based on interval
or point sampled data may be misleading if the
assumption of discrete time invoked is incorrect.
Fourthly, continuous models are invariant to the
time units used. This is not the case for discrete
models. Other reasons are also invariably cited
such as the fact that continuous time is simpler
mathematically and more elegant with some also
arguing that it is much more efficient to undertake
theoretical thinking in a continuous time context.

The foregoing provides some justification for an
emphasis on continuous time in this literature.
More recently, however, there have been some
developments in this area of applied econometrics
techniques, which facilitate the use of discrete time
models. Jenkins (1995), using a sample of lone
mothers on Income Support (IS), suggests a useful
“trick” for the estimation of certain duration
models using a regression model for a binary
dependent variable (e.g., either a logit or a probit).
The implementation of the "trick" requires a re-
organization of the data away from the individual
as the unit of observation to the spell at risk of
event occurrence. (The "trick" has its origins in
the work of the quantitative sociologist Allison
(1982).)

The Kaplan-Meier hazard function provides a basis
for explaining the nature of the "trick". We can
take an even simpler example than the earlier one.
Assume 10 employees observed for a maximum of
three years have the following turnover pattern in a
certain firm:

Table 2: Employer Change for 10 Employees

Year              Exits       Risk Set
1                 3           10
2                 1            7
3                 2            6
Greater than 3    4

These data could be converted to separate
observations for each year that each person was
observed. Thus, those who changed employer in
the first year contributed one person year each,
those who changed employer in the second year
contributed two person years and those that
changed in the third year contributed the maximum
three person years. The final four cases that are

still present after the third year comprise the set of
censored observations. The total sample size can
be obtained by summing the risk set. In this case
this yields a sample size of 23. We now construct a
dummy variable yit =1 if the individual exited the
state and zero otherwise. This variable would be
constructed as follows:

y = [(0,0,0,0,0,0,0,1,1,1), (0,0,0,0,0,0,1), (0,0,0,0,1,1)]

The first set of binary numbers in brackets refers
to the person-specific contributions for year one,
the second set to year two, and the third set to
year three.
If we construct three dummies, D1 = 1 for year
one, D2 = 1 for year two and D3 = 1 for year
three, we can specify the following linear
probability model (LPM):

yit = α1D1 + α2D2 + α3D3 + ui

Estimating this by OLS, we get:

yit = 0.3D1 + 0.14D2 + 0.33D3 + error

These estimates are estimates of the Kaplan-Meier
hazard rates for year one, two and three
respectively. Although we are well aware of the
limitations of the LPM, it provides a useful way of
obtaining Kaplan-Meier hazard rate estimates.
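The data reorganisation behind these estimates can be reproduced directly. The sketch below expands Table 2 into person-period records; because OLS on a set of mutually exclusive, exhaustive dummies simply returns the within-group mean of the dependent variable, the LPM coefficients can be computed as per-year means of y, which are exactly the Kaplan-Meier hazards.

```python
# Sketch of the person-period expansion for Table 2.  OLS of y on the three
# mutually exclusive year dummies returns the per-year mean of y, i.e. the
# Kaplan-Meier hazard h_k / n_k, so no matrix algebra is needed here.
exits = {1: 3, 2: 1, 3: 2}
at_risk = {1: 10, 2: 7, 3: 6}

person_period = []                       # (year, y) records
for year in (1, 2, 3):
    person_period += [(year, 0)] * (at_risk[year] - exits[year])
    person_period += [(year, 1)] * exits[year]

print(len(person_period))                # 23 records, as in the text

# LPM coefficient on D_k = mean of y within year k = KM hazard for year k.
coefs = {}
for year in (1, 2, 3):
    ys = [y for t, y in person_period if t == year]
    coefs[year] = round(sum(ys) / len(ys), 2)
print(coefs)                             # {1: 0.3, 2: 0.14, 3: 0.33}
```
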
This can be developed a little further by
introducing covariates and using a more
appropriate model given the discrete nature of the
data. One possibility is to use a logistic function
for the hazard, which also has the advantage of
being a non-proportional hazard. A logistic form
for the hazard can be formulated as:

λit(x,t) = exp[k1(x) + k2(t)] / (1 + exp[k1(x) + k2(t)])

where λit is the probability of individual i exiting
in interval t conditional on having survived to
period t, k1(x) = β′Xit, and k2(t) denotes the
baseline hazard as introduced earlier.
This baseline hazard can be specified by a set of
binary/dummy variables for each time period as we
did for the LPM above. This provides another
advantage to the discrete-time in that the estimates
for the baseline hazards are derived directly as part
of the estimation procedure. The covariates are
allowed to vary with time, which provides another
advantage over the continuous approach. This type
of hazard model is known as a semi-parametric
model. It is also possible to estimate the above
model using a probit specification.
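A minimal sketch of this logistic discrete-time model, fitted to the Table 2 person-period data with the baseline k2(t) specified by the three year dummies and no covariates. The optimiser here is plain gradient ascent on the Bernoulli log-likelihood (an illustrative choice, not a standard package routine); with a saturated baseline, the fitted hazards reproduce the empirical ones, and the dummy coefficients are the estimated baseline hazards on the log-odds scale.

```python
import math

# Sketch only: discrete-time logistic hazard
#   lambda_it = exp(k2(t)) / (1 + exp(k2(t)))
# fitted to the Table 2 person-period data, with k2(t) given by the three
# year dummies and no covariates.  Fitted by plain gradient ascent on the
# Bernoulli log-likelihood (step size and iteration count are arbitrary).
records = [(1, 0)] * 7 + [(1, 1)] * 3 + [(2, 0)] * 6 + [(2, 1)] * 1 \
        + [(3, 0)] * 4 + [(3, 1)] * 2          # (year, y) person-periods

gamma = {1: 0.0, 2: 0.0, 3: 0.0}               # baseline-hazard dummy coefs
for _ in range(5000):
    grad = {k: 0.0 for k in gamma}
    for year, y in records:
        p = 1.0 / (1.0 + math.exp(-gamma[year]))
        grad[year] += y - p                    # logit score contribution
    for k in gamma:
        gamma[k] += 0.1 * grad[k]

hazards = {k: round(1.0 / (1.0 + math.exp(-g)), 2) for k, g in gamma.items()}
print(hazards)                                 # {1: 0.3, 2: 0.14, 3: 0.33}
```

Adding covariates would amount to including a β′Xit term inside the exponent; the baseline-dummy estimates would then no longer coincide exactly with the raw Kaplan-Meier hazards.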

There are clearly a number of advantages to this
approach. Much economic data are reported at

discrete intervals (e.g., weeks or months) and this
approach commends itself for use in such
circumstances, though it is recognised that the
decisions of agents may not be made in such
discrete intervals. The approach allows the
introduction of time-varying covariates and thus
reduces the need to develop application specific
likelihood functions for continuous time models
that may be extremely complicated in structure.
The approach also allows scope for the
development of a very flexible non-parametric
base-line hazard ultimately determined by the data.

This approach is computationally straightforward
and provides a reasonably good approximation to
continuous-time duration models but it is not free
of criticism. The first relates to the inflation of the
sample size through the re-organization of the data.
However, the estimates obtained are still maximum
likelihood estimates and possess the asymptotic
properties of this type of estimator. The second
relates to the treatment of the repeated observations
as if they were independent of each other. This is
clearly not the case and correlation across
observations is likely to introduce some degree of
inefficiency in the estimates and a potential
downward bias in the sampling variance. A
potential solution to this problem is the
introduction of an error term of the type introduced
to model neglected heterogeneity in duration

models or random effects in panel models. Logit
and probit models are amenable to panel estimation
and a random effects model could be used for this
purpose. (These types of models are discussed
further in the econometrics literature in the context
of a continuous dependent variable.) Finally, the
censoring problem is ignored in estimation. The
implication of this depends on the scale of the
censoring problem, which may or may not be a
non-trivial issue and is application dependent.
