Survival, Duration, and Hazard Models and Their Applications in Finance and Economics

Zongwu Cai
University of North Carolina at Charlotte
Shanghai Jiaotong University, China

1. Introduction

The use of duration (or survival) models is relatively recent in economics, although they have been used extensively in engineering and biomedical research for many years. Survival analysis, the term used within the biomedical tradition, is concerned with a group of individuals for whom a point event of some kind is defined. This point event is often referred to as a failure. The event of failure occurs after a length of time called the failure time and can occur at most once for any individual or phenomenon under scrutiny. Survival analysis can also apply to other phenomena and need not be concerned only with individuals or groups of individuals.

Examples of failure times include the lifetime of machine components in engineering and electronics, the time taken by subjects to complete a psychological experiment, the survival time of patients in a clinical trial, the time to failure of a business, the duration of an industrial strike, or the duration of a period of unemployment experienced by an individual. Given that there is only a single response, although there may be many explanatory variables, survival analysis is best understood as a univariate rather than a multivariate phenomenon.

Let me show you some applications in economics. Duration analysis is a core subject of econometrics. Since the 1980s, the empirical analysis of duration variables has become widespread. There are a number of distinct reasons for this development. First of all, many types of behaviour over time are increasingly regarded as movements at random intervals from one state to another. Examples include movements by individuals between the labour market states of employment, unemployment and non-participation, and movements between different types of marital status.
This development reflects the fact that dynamic aspects of economic behaviour have become more important in economic theories and that in these theories the arrival of new information (and thus the change in behaviour in response to it) occurs at random intervals. Secondly, longitudinal data covering more than just one spell per respondent are widely available in labour economics, as well as in demography and medical science.

Applications of duration analysis include:

• In labour economics, the duration of unemployment and the duration of jobs [see e.g., the survey by Devine and Kiefer (1991)], strike durations [e.g., Kennan (1985)], and the duration of training programs [Bonnal, Fougere and Serandon (1997)].
• In business economics, the duration until a major investment [e.g., Nilsen and Schiantarelli (1998)].
• In population economics, marriage durations [Lillard (1993)], the duration until the birth of a child [Heckman and Walker (1990)], and the duration until death.
• In econometric analyses dealing with selective observation, the duration of panel survey participation [e.g., Van den Berg and Lindeboom (1998)].
• In marketing, household purchase timing [e.g., Vilcassim and Jain (1991)].
• In consumer economics, the duration until purchase of a durable or storable product [Antonides (1988), Boizot, Robin and Visser (1997)].
• In migration economics, the duration until return migration [e.g., Lindstrom (1996)].
• In macroeconomics, the duration of business cycles [e.g., Diebold and Rudebusch (1990)].
• In finance, the duration between stock-market share transactions [Engle and Russell (1998)].
• In political economics, the duration of wars [see Horvath (1968)].
• In industrial organization, the duration of a patent [Pakes and Schankerman (1984)].
• In international trade markets, the duration of time between order submission and finding a match for trade execution [Melvin and Wen (2003)].

There are two main components of survival analysis: “failure time” and “censoring time”.

2. Defining Failure Times

There are three pre-conditions required to determine failure times precisely.

Firstly, a time origin must be unambiguously defined. Although a precise definition is required for the time origin, it need not be represented by the same calendar time for each individual. Most clinical trials, for instance, have staggered entry, and entry into unemployment is also staggered by its nature. There may be cases, of course, where mass redundancies due to plant closure lead to the time origin having the same calendar date for a large proportion of the sample of unemployed in a given region.

Secondly, a scale for measuring the passage of time must be agreed. The scale is usually “clock” or real time, but this can vary depending on the application. For instance, in engineering the operating time of a system to a fault, the mileage of a car to a breakdown, or the cumulative load to a collapse provide a set of alternative measurement scales. In economic applications, however, the scale is always real time, measured on person-specific clocks that are set to zero at the moment the person enters the state in question.

Thirdly, the meaning of failure must be unambiguous and understood. The meaning and interpretation of the point event must be transparent. In medical work, failure could mean death from a specific cause (e.g., lung cancer), but what if death is from a source unrelated to cancer? In modelling unemployment durations we must be clear what constitutes failure. For instance, an individual may exit unemployment through undertaking training, gaining employment, retiring from the labour market, or dying.
We have to be clear which exits we wish to include in our analysis to ensure that the modelling is meaningful.

3. Censoring

The major problem with analysing survival data is that the data are invariably censored in one way or another. The most common cause in economic applications is that measurement is undertaken while the process of interest is still ongoing. Thus, if we obtain sample spells of unemployment drawn from surveys, these will include some individuals who are still unemployed at the time of the survey. If the survey was undertaken at time $c_i$, then for these individuals the duration is at least $c_i$ but is not equal to it. The data are thus censored, as those individuals who are unemployed at this time could continue in unemployment for considerably longer than $c_i$. Estimation must take account of this form of censoring. The consequences of ignoring data censored in this way are analogous to the censoring problems discussed for regression analysis in econometrics or statistics courses.

Survival and hazard analysis accounts for censoring: data may be censored at time t, and can be right and/or left censored.

[Figure: six observations, numbered 1 to 6 and labelled t1 to t6, drawn as spells against time; the observation period runs from a to b.]

Suppose the figure represents observations on patients selected to participate in a therapy study.

• Observation 1 is not observed, since it does not fall within the time period of observation (missing).
• Observation 2 is left and right censored, because neither its start nor its end falls within the interval from a to b (in a road-safety study, for example, it is not known when driving began and the first accident is not observed in the interval).
• Observation 3 is complete, with both start and ending times in the observed period.
• Observations 4 and 6 are left censored.
• Observation 5 is right censored.

Right censoring is easier to accommodate analytically than left censoring in classical duration/hazard models. We can be slightly more formal than this and introduce some notation.
In the absence of censoring, the ith individual in a sample of n has failure time denoted by $T_i$, a random variable. We assume that there is a period of observation ($c_i$) such that observation on that individual ceases at $c_i$ if failure has not occurred by then. The observations consist of $X_i = \min[T_i, c_i]$ together with the indicator variable $y_i = 1$ if $T_i \le c_i$ (the set of uncensored observations) and $y_i = 0$ if $T_i > c_i$ (the set of censored observations). The $c_i$ of individuals who are observed to fail (i.e., $y_i = 1$) are referred to as unrealised censoring times, whilst the $c_i$ of individuals who are observed not to fail (i.e., $y_i = 0$) are referred to as realised censoring times.

4. The Hazard Function

4.1 Model

Econometricians use the term spell length to describe time occupancy or duration of a given state. The spell length is usually represented, as above, by a random variable, which is denoted by T. T is assumed to be a continuous random variable, and we assume a large population of people enter some given state at a time identified by T = 0. As stated above in Section 2, the calendar time of state entry need not be the same for all individuals. T is thus the duration of stay in the state. The population is assumed to be homogeneous, implying that everyone's duration of stay will be a realisation of a random variable from the same probability distribution.

Define the probability that a person who has occupied a state for a time t leaves it in the short interval of length $\Delta t$ after t as:

$$P[t \le T \le t + \Delta t \mid T \ge t] \qquad [1]$$

The conditioning event ($T \ge t$) in [1] is the event that the state is still occupied at t. In other words, the conditioning event is that the individual has not left the state before time t. If we divide [1] by $\Delta t$ we obtain the average probability of leaving per unit of time over a short interval after t. If we take this average over shorter and shorter intervals we can formally define:

$$\theta(t) = \lim_{\Delta t \to 0} \frac{P[t \le T \le t + \Delta t \mid T \ge t]}{\Delta t} \qquad [2]$$

as the hazard function.
It is the instantaneous rate of leaving per unit of time at t. The interpretation of $\theta(t)\Delta t$ (sometimes written as $\theta(t)\,dt$) is the probability of exit from a given state in the short interval of time $\Delta t$ after t, conditional on the state being occupied at time t.

It is also possible to specify the probability unconditionally (i.e., without the condition $T \ge t$). This is a considerably different concept from the hazard. For instance, in the context of mortality data, the hazard function gives the probability that a fifty-year-old person will die, whereas the unconditional concept gives the probability that a person will die at fifty. In terms of relative frequencies, $\theta(50)\Delta t$ gives the proportion of fifty-year-olds who die within a small interval ($\Delta t$) of their fiftieth birthday. The unconditional concept gives the proportion of people (ever born) who die within a small interval ($\Delta t$) of their fiftieth birthday. These concepts are clearly different.

It will prove convenient to express the hazard function and the unconditional probability of exit in terms of the distribution and density functions of a continuous random variable T. Define $P[T < t] = F(t)$ and note trivially that $P[T \ge t] = 1 - F(t)$. One minus the distribution function is an expression that recurs in applications involving duration or survival data. It is known as the survivor function, since it gives the probability of survival to time t. In terms of frequencies, it gives the proportion of a given population who stay (or survive) at least t years. Because of its special significance in these applications it has its own notation:

$$1 - F(t) = \bar{F}(t)$$

Now recall that the derivative of the cumulative distribution function is the density function, $dF(t)/dt = f(t)$, and recall also that a conditional probability may be written as $f(x \mid y) = f(x, y)/f(y)$.
We can use this last result to write the conditional probability [1] as:

$$P[t \le T \le t + \Delta t \mid T \ge t] = \frac{P[t \le T \le t + \Delta t,\; T \ge t]}{P[T \ge t]} \qquad [3]$$

The numerator of [3] is the joint probability that someone leaves a given state in the time interval specified and that the duration satisfies $T \ge t$. The numerator therefore involves the intersection of the two sets $\{t \le T \le t + \Delta t\}$ and $\{T \ge t\}$. The former is contained in the latter, so the intersection is simply the former. This allows us to re-write [3] as:

$$P[t \le T \le t + \Delta t \mid T \ge t] = \frac{P[t \le T \le t + \Delta t]}{P[T \ge t]} \qquad [4]$$

In terms of the distribution function, [4] can be re-expressed as:

$$P[t \le T \le t + \Delta t \mid T \ge t] = \frac{F(t + \Delta t) - F(t)}{1 - F(t)} \qquad [5]$$

If we divide expression [5] through by $\Delta t$ and take the limit as $\Delta t$ approaches zero we obtain:

$$\theta(t) = \lim_{\Delta t \to 0} \frac{P[t \le T \le t + \Delta t \mid T \ge t]}{\Delta t} = \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t} \cdot \frac{1}{1 - F(t)} \qquad [6]$$

The final part of expression [6], excluding the reciprocal of the survivor function, is the derivative of the distribution function with respect to t, which, of course, is the density function. Writing $F(t + \Delta t) - F(t) = \Delta F(t)$, the ratio $\Delta F(t)/\Delta t$ tends to $dF(t)/dt = f(t)$ as $\Delta t \to 0$. This allows us to write [6] as:

$$\theta(t) = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{\bar{F}(t)} \qquad [7]$$

This provides a more compact expression for the hazard function. It also shows the difference between the unconditional exit probability, which is the area under the probability density function of T from t to $t + \Delta t$ [i.e., approximately $f(t)\Delta t$], and the conditional exit probability, which is given by $f(t)\Delta t / \bar{F}(t)$. The two differ except in the extreme case t = 0, where $\bar{F} = 1$. Note also that expression [7] is similar to the expression for the lower truncated or conditional probability density function. For this reason, [7] is sometimes erroneously referred to as a conditional probability density function. It is not, since it is a function of t defined over the whole non-negative axis and not just a truncated part of it.

Prior to proceeding further it is worth noting that $\theta(t)$ can be expressed in an alternative way.
The log of the survivor function can be written as $\log[1 - F(t)]$. What is $d\log[1 - F(t)]/dt$? If we set $1 - F(t) = z$, this can be obtained using the chain rule as:

$$\frac{d\log[1 - F(t)]}{dt} = \frac{d\log z}{dz} \cdot \frac{dz}{dt} = \frac{1}{z}\,[-f(t)] = -\frac{f(t)}{1 - F(t)} = -\theta(t)$$

This can be expressed as:

$$\theta(t) = -\frac{d\log[1 - F(t)]}{dt} = -\frac{d\log \bar{F}(t)}{dt} \qquad [8]$$

or, equivalently, $d\log[1 - F(t)]/dt = -\theta(t)$.

4.2 An example of using simulated data

A. Data structure:

Account Number   Age   Utilization   Credit Score   Lien Position   Rate Difference   Event
435430           44    86.9%         684            1               0.75              0
435430           45    86.1%         684            1               0.50              0
435430           57    92.1%         704            1               0.25              0
435430           58    94.0%         721            1               0.25              0
435430           59    94.7%         721            1               0.25              0
451620           63    0.0%          641            2               1.25              0
451620           64    0.0%          641            2               1.25              0
451620           65    0.0%          641            2               1.75              0
451620           70    0.0%          649            2               1.50              0
451620           71    0.0%          647            2               1.75              0
451620           72    0.0%          647            2               1.25              2
452750           49    0.0%          620            2               0.75              0
452750           50    13.4%         620            2               1.00              0
452750           63    96.1%         644            2               1.50              0
452750           64    95.8%         644            2               1.50              1

B. Description of Variables:

• Account Number
• Age: account age in months.
• Utilization: ratio of account balance to account limit.
• Credit Score: a score representing the customer's credit history.
• Lien Position: the lien position of the bank on the collateral (first mortgage or second mortgage).
• Rate Difference: difference between the customer's APR and the prevailing interest rate, in percentage points.
• Event: 1 denotes default, 2 denotes prepay, 0 denotes revolving.

C. Model:

• Because of its flexibility, we use Cox's semi-parametric proportional hazards model to fit the data.
• We take a simple competing risk approach and fit the prepay and default hazards separately. When fitting prepay, all other outcomes are treated as censored. When fitting default, all other outcomes are treated as censored.
• We use PROC PHREG in SAS to fit the model. Other statistical packages, such as R, can be used too. In R, the relevant functions are “Surv”, “coxph”, and “survreg” (in the survival package).

D. SAS Output:

• Prepay

Variable       DF   Parameter Estimate   Standard Error   Chi-Square   Pr > ChiSq   Hazard Ratio
UTILIZATION    1    0.36496              0.09489          14.7926      0.0001       1.44
RATE_DIFF      1    0.16309              0.07333          4.9463       0.0261       1.177

• Default

Variable       DF   Parameter Estimate   Standard Error   Chi-Square   Pr > ChiSq   Hazard Ratio
CREDIT_SCORE   1    -0.02248             0.00436          26.6486      <.0001       0.978

E. Interpretation:

• Prepay
– The higher the utilization, the higher the financing leverage, and the more likely the account is to prepay.
– The bigger the rate difference, the lower the prevailing interest rate relative to the customer's APR, and the bigger the incentive to prepay.
• Default
– The higher the credit score, the better the customer's credit, and the less likely the account is to default.

5 Modelling the Hazard Rate and Duration Dependence

Plotting hazard functions can provide some useful insights into how the exit probability behaves with state duration. The nature of the relationship between the hazard rate and the duration is known as duration dependence. For instance, it is possible to have:

(a) $d\theta(t)/dt = 0$. In this case, the hazard rate is constant, suggesting that the instantaneous rate of exit is invariant to spell duration.

(b) $d\theta(t)/dt > 0$, suggesting positive duration dependence, where the instantaneous rate of exit increases with spell duration (e.g., employment/job durations may provide an example of this: the longer you have been in a given job, the greater the likelihood of a quit).

(c) $d\theta(t)/dt < 0$, suggesting negative duration dependence, where the instantaneous rate of exit decreases with spell duration (e.g., in labour economics, the exit rate from unemployment is seen to decline with duration).

There are a number of special distributions that are useful for modelling duration data. The use of specific distributions to model time to failure is called a parametric approach. We now turn to examine three popular distributions in failure time analysis.
5.1 The Exponential Distribution

We could focus on (a) above in the first instance and specify a model incorporating a process that has no memory. Thus, the conditional probability of failure in a given short interval is the same regardless of when the observation is made. The cumulative distribution function for the exponential distribution is given by:

$$F(t) = 1 - \exp[-\theta t] \quad \text{for } t \ge 0$$

where $\theta$ is a positive parameter. The corresponding density function is given by:

$$\frac{dF(t)}{dt} = \theta \exp[-\theta t] = f(t)$$

The hazard function is thus expressed as:

$$\theta(t) = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{\bar{F}(t)} = \frac{\theta \exp[-\theta t]}{\exp[-\theta t]} = \theta \qquad [9]$$

The hazard is a constant and thus independent of time. This distribution has been used to model the time until failure of electronic components, primarily because of the memoryless property of the distribution. It is completely described by one parameter ($\theta$). Each unique value of $\theta$ determines a different exponential distribution, thus implying the existence of a family of exponential distributions. The distribution is skewed to the right and the values of the random variable, T, can vary from 0 to $\infty$. However, given that the distribution has only one adjustable parameter, methods based on it are very sensitive to even modest departures in the tail of this distribution. This fact, and the inherent constancy of the hazard, has encouraged applied economists to look at alternative distributions.

5.2 The Weibull Distribution

In contrast to the hazard function for the exponential distribution, which is invariant to the duration, the hazard function for the Weibull distribution is monotonically increasing or decreasing in duration, depending on its parameter values. This is clearly less restrictive.
The survivor function can be written as follows:

$$\bar{F}(t) = \exp[-(\theta t)^{\alpha}] \qquad [10]$$

and the cumulative distribution function is given by:

$$F(t) = 1 - \exp[-(\theta t)^{\alpha}] \qquad [11]$$

The derivative of expression [11] with respect to t yields the density function which, in this case, is:

$$f(t) = \alpha \theta^{\alpha} t^{\alpha - 1} \exp[-(\theta t)^{\alpha}] \qquad [12]$$

and the hazard function can be written as:

$$\frac{f(t)}{\bar{F}(t)} = \theta(t) = \alpha \theta^{\alpha} t^{\alpha - 1} \qquad [13]$$

This is the Weibull family of distributions, which has been the most extensively used in applied econometric duration studies. The popularity of this family is attributable to the simplicity of expressions [10] to [13]. In contrast to the exponential case, the hazard is not constant and can either rise or fall with duration. The parameter $\theta$ is known as the scale parameter and $\alpha$ as the index parameter. The behaviour of the hazard depends on the parameter $\alpha$. The following should be noted from expression [13]:

• If $\alpha > 1$, this implies $d\theta(t)/dt > 0$ and suggests an increasing hazard rate and positive duration dependence.
• If $\alpha = 1$, this implies $d\theta(t)/dt = 0$ and suggests a constant hazard rate and no duration dependence.
• If $\alpha < 1$, this implies $d\theta(t)/dt < 0$ and suggests a decreasing hazard rate and negative duration dependence.

It is clear from the above that use of the exponential distribution could be problematic if the duration data are characterized by either positive or negative duration dependence. This is one reason why the Weibull is more popular among economists. However, its limitation is that it only allows for increasing or decreasing hazards and not combinations of both. In other words, the hazard is monotonic in t. It is possible that the data are not consistent with such monotonicity.

6 Maximum Likelihood Estimation

The parameters $\alpha$ and $\theta$ of the Weibull distribution can be estimated by maximum likelihood procedures. The likelihood functions bear a close resemblance to the standard censored Tobit likelihood function reported in lecture five.
The likelihood function in this case could be defined as:

$$L = \prod_{\text{uncensored}} f(t; \alpha, \theta) \prod_{\text{censored}} \bar{F}(t; \alpha, \theta) \qquad [14]$$

For some distributions it can be more tractable to formulate the likelihood function in terms of the hazard function. Since $f(t) = \theta(t)\,\bar{F}(t)$ from [7], every uncensored observation contributes a hazard term and a survivor term, so the survivor product runs over all observations:

$$L = \prod_{\text{uncensored}} \theta(t; \alpha, \theta) \prod_{\text{all}} \bar{F}(t; \alpha, \theta) \qquad [15]$$

The likelihood functions can be expressed in logs and the usual algorithms can be invoked for the estimation of the parameters. The inverse of the information matrix can be used to compute the asymptotic variance-covariance matrix for the parameter estimates.

7 Exogenous Variables and Duration Analysis

A limitation of the duration models outlined so far is that there has been no role for external factors in the survival distribution. There are measured differences between individuals or firms that may influence their survival chances in any given application. Regressors can be introduced relatively easily into the duration models encountered so far. These regressors are called “covariates” and can be either time-invariant or time-varying. For instance, gender and race are time-invariant covariates, but the age of an individual or company or the state of the economy (e.g., the unemployment rate) are time-varying covariates.

The use of time-varying covariates, however, poses problems for some duration models. The specification of the likelihood function is considerably more complex and is not discussed here. However, a simple approach is suggested below that reduces the complexity of the problem.

The extension to the hazard specification is relatively straightforward for time-invariant covariates. For instance, recall the hazard function for the Weibull model:

$$\frac{f(t)}{\bar{F}(t)} = \theta(t) = \alpha \theta^{\alpha} t^{\alpha - 1}$$

In this case the covariates are introduced as a function of $\theta$, where $\theta = \exp[x_i'\beta]$ and $x_i$ includes a constant term and a set of regressors that are assumed not to change from T = 0 to the failure time T = t. These models can sometimes be cast as “accelerated failure time” (AFT) models.
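The two ways of writing the censored likelihood can be checked numerically. The following is a minimal sketch, not the estimation routine used in these notes: it assumes the Weibull survivor function $\exp[-(\theta t)^{\alpha}]$, and the function names, sample size, seed and censoring time are all hypothetical choices made for illustration. It evaluates the log-likelihood once from density and survivor terms and once from hazard and survivor terms (using $f = \theta(t)\bar{F}$, so survivor terms enter for every spell), and then uses the exponential special case, where the censored maximum likelihood estimate has the closed form "number of completed spells divided by total observed time".

```python
import math
import random


def weibull_loglik_density_form(durations, uncensored, alpha, theta):
    """Log-likelihood from density terms (completed spells) and survivor terms (censored spells)."""
    total = 0.0
    for t, done in zip(durations, uncensored):
        survivor = math.exp(-((theta * t) ** alpha))
        if done:
            density = alpha * theta ** alpha * t ** (alpha - 1) * survivor
            total += math.log(density)
        else:
            total += math.log(survivor)
    return total


def weibull_loglik_hazard_form(durations, uncensored, alpha, theta):
    """Log-likelihood from hazard terms (completed spells) and survivor terms (every spell)."""
    log_hazards = sum(
        math.log(alpha * theta ** alpha * t ** (alpha - 1))
        for t, done in zip(durations, uncensored)
        if done
    )
    log_survivors = sum(-((theta * t) ** alpha) for t in durations)
    return log_hazards + log_survivors


def simulate_censored_exponential(n, theta, censor_time, seed=0):
    """Exponential spells with rate theta, all censored at a common (hypothetical) time."""
    rng = random.Random(seed)
    durations, uncensored = [], []
    for _ in range(n):
        t = rng.expovariate(theta)
        durations.append(min(t, censor_time))   # X_i = min(T_i, c_i)
        uncensored.append(t <= censor_time)      # y_i = 1 if the spell is completed
    return durations, uncensored


durations, uncensored = simulate_censored_exponential(20000, 0.5, 2.0)

# Exponential case (alpha = 1): maximising the censored log-likelihood gives
# theta_hat = (number of completed spells) / (total observed time).
theta_mle = sum(uncensored) / sum(durations)

# Treating censored spells as if they were completed overstates the exit rate.
theta_naive = len(durations) / sum(durations)

print(round(theta_mle, 3), round(theta_naive, 3))
```

With the true rate set to 0.5, the censoring-aware estimate should land close to 0.5, while the naive estimate should be noticeably larger, which illustrates why estimation must take account of censoring.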
The effect of the covariates in accelerated failure time models is to re-scale time. In other words, a covariate accelerates (or decelerates) the time to failure. This is in contrast to a hazard model (e.g., a proportional hazards model), where the role of the covariate is to change the hazard rate.

The advantage of the AFT model is that it has a linear regression model interpretation if $\alpha = 1$ (i.e., the exponential case), of the following form:

$$\ln(T_i) = -x_i'\beta + v_i \qquad [16]$$

where $v_i$ is a random error that follows some continuous distribution and $T_i$ is time. The distribution of the random term does not involve either x or $\beta$, and the regression model is thus homoscedastic. On the other hand, if $\alpha \ne 1$ (e.g., the Weibull case), then the regression model is non-linear and interpreting the estimated effects of the covariates on time to failure is more difficult. The Weibull specification (with the exponential as the special case when $\alpha = 1$) is the only member that belongs to both the AFT family of models and the proportional hazards family of models. We now turn to the proportional hazards model.

8 Proportional Hazards Model

If we introduce a function of the covariates, then the hazard function is expressed as:

$$\theta(x; t) = k_1(x)\,k_2(t) \qquad [17]$$

where $k_1$ and $k_2$ are the same functions for all individuals. The baseline hazard $k_2(t)$ is common to all units in the population and does not vary across individuals. Individual hazards differ proportionately, based on the realizations of the covariates. Note that the proportional hazards model does not contain a constant term, as the constant is absorbed into the baseline hazard.

The model is called a proportional hazards model because, for two individuals with regressor realizations $x_1$ and $x_2$, the hazards are in the same ratio $k_1(x_1)/k_1(x_2)$ for all t. The proportionate effect of x on the hazard is the same for all dates.
Thus, if being over fifty years of age lowers the probability of exit from unemployment on the first day by two percent, it lowers the probability of leaving on the hundredth day by the same proportion. This is a relatively restrictive form and it should be noted, in passing, that there is no obvious reason why hazards should be proportional with duration data drawn from economic applications.

Given a proportional hazard of the form expressed in [17], Cox (1972) suggested a partial maximum likelihood method for its estimation. The conditioning approach adopted effectively sweeps out the baseline hazard function, in a way analogous to how fixed effects are removed in panel models. The advantage of this approach is that it provides maximum likelihood estimates of the coefficient vector on the covariates without specifying the baseline hazard. This is generally known as Cox's proportional hazards model. This model has some inherent disadvantages, in the sense that we may be interested in knowing the nature of duration dependence in a particular application and, since the baseline hazard is not specified, this is difficult to infer. This is one reason why a more general Weibull model is usually preferred in many duration applications.

Note that there are some extensions in the literature that relax the Cox model. For example, the time-varying coefficient Cox model has been considered by Cai and Sun (2003).

9 The Weibull Hazard Model

This turns out to be one of the more popular parametric duration models in applied econometrics. It provides estimates for the baseline hazard and the covariate vector. The estimates for the baseline hazard provide information on the nature of duration dependence, and so this may be important from a policy perspective. Sometimes these models are referred to as non-stationary, since they allow for duration dependence.
Under the assumption of a Weibull specification, the baseline hazard is expressed as:

$$k_2(t) = \alpha t^{\alpha - 1}$$

We now need to assume some functional form for $k_1(x_i)$, and the most tractable generally is:

$$k_1(x_i) = \exp(x_i'\beta)$$

Lancaster (1979) modelled unemployment durations for a sample of the unemployed stock drawn at a date during 1973 and interviewed some five weeks later, using exactly this type of specification. The Weibull hazard is:

$$\theta_i(x; t) = \exp(x_i'\beta)\,\alpha t^{\alpha - 1} \qquad [18]$$

This functional form for the covariates, $k_1(x)$, is one of the most commonly used because it renders the hazard a log-linear function of the covariates. This facilitates a cleaner interpretation of the covariate effects.

Lancaster's study used this type of model for a sample of unemployed males. He was concerned with testing whether $\alpha = 1$. The estimate for $\alpha$ of 0.7 suggested a decreasing hazard rate with time. Lancaster, however, reported that his estimated $\alpha$ increased as additional covariates were added to the specification. This suggests that the decreasing hazard implied by $\alpha = 0.7$ may not be indicative of true duration dependence but may instead be due to omitted variables. This problem is known as heterogeneity in duration models, and failure to account for it in one way or another creates a problem generally described as “neglected heterogeneity” (see next section).

In contrast to the AFT model, the effect of the covariates is on the hazard, not on the time to failure. However, we can use the Weibull hazard function estimates to inform us of the effect on duration, as many investigators are more interested in the effects of a change in a covariate on the expected (or average) duration associated with a particular state. In the Weibull hazard model, although the derivations are somewhat convoluted, it can be shown that the expected (or average) duration can be expressed as:

$$E(T) = \Gamma\!\left(\frac{1}{\alpha} + 1\right) \exp\!\left(-\frac{\beta' x}{\alpha}\right) \qquad [19]$$

where $\Gamma(\cdot)$ is the Gamma function. Note that $\Gamma(2) = 1$.
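The expected-duration formula can be checked by simulation. The sketch below is illustrative only and assumes the Weibull hazard $\exp(x'\beta)\,\alpha t^{\alpha-1}$, whose integrated hazard is $\exp(x'\beta)\,t^{\alpha}$; the function names, the value 0.5 chosen for $x'\beta$, the sample size and the seed are hypothetical. Durations are drawn by setting the integrated hazard equal to a unit exponential draw, and the sample mean is compared with $\Gamma(1/\alpha + 1)\exp(-\beta'x/\alpha)$.

```python
import math
import random


def expected_duration(alpha, xb):
    """Gamma(1/alpha + 1) * exp(-xb / alpha), where xb stands for x'beta."""
    return math.gamma(1.0 / alpha + 1.0) * math.exp(-xb / alpha)


def simulate_weibull_durations(n, alpha, xb, seed=0):
    """Draw durations with hazard exp(xb) * alpha * t**(alpha - 1)."""
    rng = random.Random(seed)
    mu = math.exp(xb)
    draws = []
    for _ in range(n):
        e = -math.log(1.0 - rng.random())        # unit exponential draw
        draws.append((e / mu) ** (1.0 / alpha))  # solves mu * t**alpha = e for t
    return draws


alpha, xb = 0.7, 0.5   # alpha = 0.7 echoes Lancaster's estimate; xb is made up
durations = simulate_weibull_durations(200000, alpha, xb)
sample_mean = sum(durations) / len(durations)
formula_mean = expected_duration(alpha, xb)
print(round(sample_mean, 3), round(formula_mean, 3))
```

The two printed numbers should agree to within simulation error. Setting alpha to 1 and xb to 0 reduces the formula to $\Gamma(2) = 1$, the mean of a unit exponential.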
Taking logarithms of the expected duration gives:

$$\ln(E(T)) = -\frac{\beta' x}{\alpha} + \text{constant} \quad \text{if } \alpha \ne 1$$

An alternative way of expressing this is in terms of log duration for the ith observation:

$$\ln(T_i) = -\frac{x_i'\beta}{\alpha} + u_i \qquad [20]$$

where $u_i$ is an error term and $\alpha$ is the duration dependence parameter from the Weibull hazard model. In this case the expected value of the error term $u_i$ is non-zero, since it depends on the Gamma function. However, it is independent of $x_i$. If we want the effect of a covariate on the log of expected duration, we take the derivative as follows:

$$\frac{\partial \ln(E(T))}{\partial x} = -\frac{\beta}{\alpha} \qquad [21]$$

This makes intuitive sense for a number of reasons. Firstly, if the covariate x has a positive (negative) effect on the exit hazard (i.e., the instantaneous exit rate), then the covariate should exert a negative (positive) effect on the expected log duration in the state in question. For instance, a high level of education should increase the instantaneous exit rate from unemployment and thus reduce the expected unemployment duration. Secondly, the scaling of the covariate's coefficient in this case also makes sense. If $\alpha < 1$ ($\alpha > 1$), we have negative (positive) duration dependence, and the effect of a small change in the covariate on the expected duration is thus higher (lower) relative to the no duration dependence case.

It is also evident that if there is no duration dependence ($\alpha = 1$), a conventional log-linear interpretation of the estimated effect is warranted. This is the case of an exponential distribution in failure times. Note that if $\alpha = 1$, $\Gamma(2) = 1$ and $E(T) = \exp(-\beta' x)$, and thus $\ln(E(T)) = -\beta' x$. Thus, in the exponential case:

$$\frac{\partial \ln(E(T))}{\partial x} = -\beta$$

10 Neglected Heterogeneity

The problem of heterogeneity can be viewed as the result of an incomplete specification. The inclusion of individual-specific covariates is designed to incorporate observation-specific effects.
If the model specification is incomplete, and if systematic individual differences in the distribution remain after the observed effects have been controlled for, then inferences based on a mis-specified model may be incorrect. This was the problem Lancaster faced when trying to draw inferences about the presence of duration dependence. This problem is akin to specification analysis in the linear regression model.

Duration analysis can be extended to handle neglected heterogeneity. Lancaster proposed an alternative specification for the hazard function as:

$$\theta_i^{*}(x; t) = v_i\,\theta_i(x; t) \qquad [22]$$

where $v_i$ is an unobservable random variable, independently and identically distributed as Gamma with mean 1 and variance $\sigma^2$. This random variable may be regarded as a proxy for all unobservable exogenous variables. This is the mechanism through which unobservables are captured and is analogous to the inclusion of an error term in a regression model. After some extensive algebra, which we do not pursue, the hazard function can be written as (suppressing the i subscripts):

$$\theta^{*}(x; t) = \theta(t)\,[1 - F^{*}(x; t)]^{\sigma^2} \qquad [23]$$

where $F^{*}(x; t)$ is the expected value, taken over v, of the cumulative distribution function conditional on v. This used to be a relatively common approach to modelling neglected heterogeneity. Because $[1 - F^{*}(x; t)]^{\sigma^2}$ is a decreasing function of t, [23] demonstrates that heterogeneity introduces a tendency for a decreasing hazard rate.

Estimation of this model is readily done through the use of maximum likelihood techniques and involves the estimation of an additional parameter, $\sigma^2$. It should be noted that if $\sigma^2 = 0$ (i.e., the variance of $v_i$ is zero), there is no heterogeneity present in the data and the model collapses to $\theta^{*}(t) = \theta(t)$, or the standard Weibull model in this particular case.
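The tendency towards a decreasing hazard can be seen in the simplest possible case. The sketch below is an illustration under stated assumptions rather than Lancaster's model: every individual has a constant hazard $v\theta$ (the exponential case, with no covariates), and v is Gamma distributed with mean 1 and variance $\sigma^2$. Under those assumptions the population survivor function has the closed form $(1 + \sigma^2\theta t)^{-1/\sigma^2}$, and the population hazard equals $\theta$ times that survivor function raised to the power $\sigma^2$. The function names, parameter values, sample size and seed are hypothetical.

```python
import math
import random


def mixture_survivor(t, theta, sigma2):
    """Population survivor when individual hazards are v * theta, v ~ Gamma(mean 1, variance sigma2)."""
    return (1.0 + sigma2 * theta * t) ** (-1.0 / sigma2)


def mixture_hazard(t, theta, sigma2):
    """Population hazard written as theta * survivor ** sigma2."""
    return theta * mixture_survivor(t, theta, sigma2) ** sigma2


def simulated_survival_share(t, theta, sigma2, n=100000, seed=0):
    """Share of simulated spells lasting longer than t."""
    rng = random.Random(seed)
    survivors = 0
    for _ in range(n):
        # gammavariate takes (shape, scale): shape 1/sigma2 and scale sigma2
        # give mean 1 and variance sigma2.
        v = rng.gammavariate(1.0 / sigma2, sigma2)
        if rng.expovariate(v * theta) > t:   # individual hazard is constant at v * theta
            survivors += 1
    return survivors / n


theta, sigma2 = 1.0, 0.5
hazards = [mixture_hazard(t, theta, sigma2) for t in (0.0, 1.0, 2.0, 4.0)]
share = simulated_survival_share(1.0, theta, sigma2)
print([round(h, 3) for h in hazards], round(share, 3))
```

Although no individual's hazard changes over time, the population hazard falls with duration, because individuals with a high v leave first and the remaining stock is increasingly made up of low-v individuals.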
In the Lancaster study of unemployment duration, the maximum likelihood estimate of $\alpha$ rises from 0.7 to 0.9, confirming that the negative duration dependence in the Weibull model is more attributable to neglected heterogeneity than to pure duration dependence effects. The intuition for this is straightforward. The more mobile and employable individuals are likely to be the first to leave unemployment, leaving the less mobile and less employable behind and hence creating the illusion of a stronger negative duration dependence than actually exists. It can be shown mathematically that uncontrolled unobservables bias the estimated hazards towards negative duration dependence. Failure to control for observables has similar effects, but the direction of the bias cannot be known a priori. All of the foregoing has important implications for policy.

However, the approach has had its critics. It has been argued that the approach tends to over-parameterize the survival distribution and leads to serious errors in inference. In particular, Heckman and Singer (1984, Journal of Econometrics) provide estimates that vary dramatically depending on the functional form specified and the mixing distribution used for the neglected heterogeneity. In addition, the choice of distribution for heterogeneity (for instance, the Gamma distribution in our case above) is not motivated by any obvious economic consideration but rather by mathematical convenience.

11 Discrete-Time Duration Models

11.1 Kaplan–Meier Product Limit Estimator

It may sometimes be convenient to graph the survivor function associated with a particular dataset. A simple non-parametric procedure can be used to do just that.
The Kaplan–Meier survivor function provides one such procedure and can be expressed as:

$$\hat{F}(T_k) = \prod_{i=1}^{k} \frac{n_i - h_i}{n_i} \qquad [24]$$

where $k$ indexes the distinct survival times; $T_k$ is the $k$th survival time; $n_k$ is the size of the risk set at time $k$; and $h_k$ is the number of observation spells completed at time $k$. The computation of this measure can be illustrated using the example in the following table:

Table 1: Employer Change for 200 Employees

Year             Number Changing Employer   Risk Set
1                11                         200
2                25                         189
3                10                         164
4                13                         154
5                12                         141
Greater than 5   0

The risk set at the start of the period is 200. Therefore, $n_1 = 200$. In the first year 11 individuals leave their employer, implying $h_1 = 11$. For the first year:

$$\hat{F}(T_1) = \frac{200 - 11}{200} = 0.945$$

At the start of the second year, the risk set is $200 - 11 = 189 = n_2$ and $h_2 = 25$. Each subsequent estimate multiplies the previous survivor estimate by the fraction of the current risk set that does not exit. Therefore:

$$\hat{F}(T_2) = 0.945 \times \frac{189 - 25}{189} = 0.820$$

At the start of the third year, $n_3 = 164$ and $h_3 = 10$. Therefore:

$$\hat{F}(T_3) = 0.820 \times \frac{164 - 10}{164} = 0.770$$

At the start of the fourth year, $n_4 = 154$ and $h_4 = 13$. Therefore:

$$\hat{F}(T_4) = 0.770 \times \frac{154 - 13}{154} = 0.705$$

At the start of the fifth year, $n_5 = 141$ and $h_5 = 12$. Therefore:

$$\hat{F}(T_5) = 0.705 \times \frac{141 - 12}{141} = 0.645$$

Because no spells are censored before the fifth year, each estimate equals the share of the original 200 employees who have not yet changed employer (for example, $129/200 = 0.645$ after five years). In this case the survivor function declines steadily over the five years of interest. After the fifth year the data points could be viewed as being censored. It is worth noting that the formula for the survivor function is not affected if $h_k = 0$ (i.e., when there are no exits from the state in question). The estimates of the survivor function obtained above can be shown to be maximum likelihood estimates.

As we have already seen, however, economists tend to be more interested in hazard functions than survivor functions. We could use the above data to compute hazard rates using the Kaplan–Meier hazard formula.
This is defined as:

$$\hat{\lambda}(T_k) = \frac{h_k}{n_k} \qquad [25]$$

This is a non-parametric empirical approach that imposes no restrictions on the data in the way that the parametric (i.e., Weibull or exponential) models encountered earlier do. Given the data above, the hazard rates for the five spells at risk could be computed as follows:

$$\hat{\lambda}(T_1) = \frac{11}{200} = 0.055; \quad \hat{\lambda}(T_2) = \frac{25}{189} = 0.132; \quad \hat{\lambda}(T_3) = \frac{10}{164} = 0.061;$$

$$\hat{\lambda}(T_4) = \frac{13}{154} = 0.084; \quad \hat{\lambda}(T_5) = \frac{12}{141} = 0.085$$

The interpretation of the first hazard rate estimate is that there is a 5.5% chance of exiting the state in the first year. The interpretation of the second hazard estimate is that, conditional on surviving to the second year, there is a 13.2% chance of exiting the state. These are clearly useful estimates to have at one's disposal. However, the approach does not allow for the introduction of any covariates. In the Kaplan–Meier procedure the unit of observation for the empirical analysis is no longer the individual but the spell at risk of the event occurring. This point will be developed a little further in an attempt to introduce covariates.

11.2 Using a Logistic Model for Discrete Duration Modelling

Duration or failure time analysis is usually situated in a continuous-time framework, and this approach has dominated the discussion thus far. There are sound reasons for the dominance of models rooted in continuous time. Firstly, in most economic models there is no natural time unit within which individuals make their decisions and take their actions. Secondly, even if there were natural time units, there is no guarantee that they correspond to the monthly, quarterly or annual data normally available to empiricists. Thirdly, inferences about an underlying stochastic process based on interval or point-sampled data may be misleading if the assumption of discrete time is incorrect. Fourthly, continuous-time models are invariant to the time units used. This is not the case for discrete-time models.
Other reasons are also invariably cited, such as the fact that continuous time is mathematically simpler and more elegant, with some also arguing that it is much more efficient to undertake theoretical thinking in a continuous-time context. The foregoing provides some justification for an emphasis on continuous time in this literature.

More recently, however, there have been developments in applied econometric techniques that facilitate the use of discrete-time models. Jenkins (1995), using a sample of lone mothers on Income Support (IS), suggests a useful "trick" for the estimation of certain duration models using a regression model for a binary dependent variable (e.g., either a logit or a probit).¹ The implementation of the "trick" requires a re-organization of the data away from the individual as the unit of observation to the spell at risk of event occurrence. The Kaplan–Meier hazard function provides a basis for explaining the nature of the "trick".

¹ The "trick" has its origins in the work of the quantitative sociologist Allison (1982).

We can take an even simpler example than the earlier one. Assume 10 employees, observed for a maximum of three years, have the following turnover pattern in a certain firm:

Table 2: Employer Change for 10 Employees

Year             Number Changing Employer   Risk Set
1                3                          10
2                1                          7
3                2                          6
Greater than 3   4

These data could be converted to separate observations for each year that each person was observed. Thus, those who changed employer in the first year contribute one person-year each, those who changed employer in the second year contribute two person-years each, and those who changed in the third year contribute the maximum of three person-years each. The final four cases still present after the third year comprise the set of censored observations, and each of these also contributes three person-years. The total sample size can be obtained by summing the risk sets. In this case this yields a sample size of 10 + 7 + 6 = 23.
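The re-organization from individuals to person-years can be sketched in a few lines of Python. This is an illustrative sketch only: the `spells` list simply encodes Table 2 as (years observed, changed employer) pairs, and the function and variable names are hypothetical rather than taken from Jenkins (1995).

```python
from collections import Counter

# (years observed, changed employer?) for the ten employees in Table 2.
# The last four entries are the censored cases still present after year three.
spells = [(1, True)] * 3 + [(2, True)] * 1 + [(3, True)] * 2 + [(3, False)] * 4


def to_person_years(spells):
    """Return one (person, year) row for every year a person was at risk."""
    rows = []
    for person, (years, _changed) in enumerate(spells, start=1):
        for year in range(1, years + 1):
            rows.append((person, year))
    return rows


rows = to_person_years(spells)

# Counting rows by year recovers the risk-set column of Table 2.
risk_set = Counter(year for _person, year in rows)

if __name__ == "__main__":
    for year in sorted(risk_set):
        print("year", year, "risk set", risk_set[year])
    print("person-years:", len(rows))
```

Counting the expanded rows by year gives back the risk sets, and the number of rows is the inflated sample size used in the regression approach.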
We now construct a dummy variable $y_{it} = 1$ if individual $i$ exited the state in year $t$ and zero otherwise. This variable would be constructed as follows:

y = [(0,0,0,0,0,0,0,1,1,1), (0,0,0,0,0,0,1), (0,0,0,0,1,1)]

The first set of binary numbers in brackets refers to the person-specific contributions for year one, the second set to year two and the third set to year three. If we construct three dummies, $D_1 = 1$ for year one, $D_2 = 1$ for year two and $D_3 = 1$ for year three, we could specify the following linear probability model (LPM):

$$y_{it} = \gamma_1 D_1 + \gamma_2 D_2 + \gamma_3 D_3 + u_{it}$$

Estimating by OLS, we get the following estimates:

$$y_{it} = 0.30 D_1 + 0.14 D_2 + 0.33 D_3 + \text{error}$$

These are estimates of the Kaplan–Meier hazard rates for years one, two and three respectively (3/10, 1/7 and 2/6). Although we are well aware of the limitations of the LPM, it provides a useful way of obtaining Kaplan–Meier hazard rate estimates.

This could be developed a little further by introducing covariates and using a more appropriate model given the discrete nature of the data. One possibility is to use a logistic function for the hazard, which also has the advantage of being a non-proportional hazard. A logistic form for the hazard could be formulated as:

$$\lambda_{it}(x,t) = \frac{\exp[k_1(x) + k_2(t)]}{1 + \exp[k_1(x) + k_2(t)]}$$

where $\lambda_{it}$ is the probability of individual $i$ exiting in interval period $t$ conditional on having survived to period $t$, where $k_1(x) = \beta' X_{it}$, and where $k_2(t)$ denotes the baseline hazard as introduced earlier. This baseline hazard can be specified by a set of binary (dummy) variables for each time period, as we did for the LPM above. This provides another advantage of the discrete-time approach, in that the estimates of the baseline hazard are derived directly as part of the estimation procedure. The covariates are allowed to vary with time, which provides another advantage over the continuous approach. This type of hazard model is known as a semi-parametric model. It is also possible to estimate the above model using a probit specification.
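The claim that the linear probability model reproduces the Kaplan–Meier hazard rates can be checked directly. The sketch below builds the 23 person-year observations from the Table 2 counts and solves the OLS normal equations in plain Python. The helper `ols` and the variable names are hypothetical, and the small Gauss-Jordan solver is used only to keep the example free of external libraries.

```python
# Table 2 counts: year -> number changing employer, and year -> risk set.
exits = {1: 3, 2: 1, 3: 2}
risk = {1: 10, 2: 7, 3: 6}
years = (1, 2, 3)

# Person-year data: y is the exit dummy, X holds the year dummies D1, D2, D3.
X, y = [], []
for year in years:
    stayers = risk[year] - exits[year]
    for j in range(risk[year]):
        y.append(1.0 if j >= stayers else 0.0)
        X.append([1.0 if year == d else 0.0 for d in years])


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


coef = ols(X, y)

if __name__ == "__main__":
    for year, b in zip(years, coef):
        print("year", year, "LPM coefficient", round(b, 3),
              "exits / risk set", round(exits[year] / risk[year], 3))
```

Because the year dummies are mutually exclusive and there is no intercept, each coefficient is simply the mean of the exit dummy within that year, which is the ratio of exits to the risk set.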
There are clearly a number of advantages to this approach. Economic data are often reported at discrete intervals (e.g., weeks or months) and this approach commends itself for use in such circumstances, though it is recognised that the decisions of agents may not be made at such discrete intervals. The approach allows the introduction of time-varying covariates and thus reduces the need to develop application-specific likelihood functions for continuous-time models, which may be extremely complicated in structure. The approach also allows scope for the development of a very flexible non-parametric baseline hazard ultimately determined by the data.

This approach is computationally straightforward and provides a reasonably good approximation to continuous-time duration models, but it is not free of criticism. The first criticism relates to the inflation of the sample size through the re-organization of the data. However, the estimates obtained are still maximum likelihood estimates and possess the asymptotic properties of this type of estimator. The second relates to the treatment of the repeated observations as if they were independent of each other. This is clearly not the case, and correlation across observations is likely to introduce some degree of inefficiency in the estimates and a potential downward bias in the sampling variance. A potential solution to this problem is the introduction of an error term of the type used to model neglected heterogeneity in duration models or random effects in panel models. Logit and probit models are amenable to panel estimation and a random effects model could be used for this purpose. (These types of models are discussed further in the econometrics literature in the context of a continuous dependent variable.) Finally, the censoring problem is ignored in estimation. The implication of this depends on the scale of the censoring problem, which may or may not be a trivial issue and is application dependent.
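As a small illustration of the logit form of the discrete-time hazard used in this section, the sketch below fits a logit with only the three year dummies to the Table 2 counts. It relies on one simplifying observation, stated here as an assumption of the sketch: with mutually exclusive year dummies and no other covariates, the log-likelihood separates by year, so each coefficient can be found with a one-dimensional Newton-Raphson iteration. The function names are hypothetical, and a model with covariates would need a general-purpose logit routine instead.

```python
import math

# Table 2: year -> (number changing employer, risk set).
table2 = {1: (3, 10), 2: (1, 7), 3: (2, 6)}


def logistic(z):
    """Logistic function, mapping a logit coefficient to a hazard in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_year_effect(exits, at_risk, iterations=25):
    """Newton-Raphson for the logit coefficient on a single year dummy."""
    gamma = 0.0
    for _ in range(iterations):
        p = logistic(gamma)
        score = exits - at_risk * p            # first derivative of the log-likelihood
        information = at_risk * p * (1.0 - p)  # minus the second derivative
        gamma += score / information
    return gamma


if __name__ == "__main__":
    for year, (exits, at_risk) in table2.items():
        gamma = fit_year_effect(exits, at_risk)
        print("year", year, "coefficient", round(gamma, 4),
              "fitted hazard", round(logistic(gamma), 3))
```

In this covariate-free case the fitted hazards coincide with the ratios of exits to risk sets, so the year-dummy coefficients play the role of the baseline hazard $k_2(t)$ on the logit scale.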