Bayesian Inference of Credit Risk Models
Xianghua Liu
First Draft, August 2008
October 20, 2008
Abstract

Quantitative models have been applied in credit risk measurement for financial asset pricing and to meet banking regulatory requirements. The implementation of quantitative credit risk models faces two major problems in practice: data insufficiency and model risk. The data problem arises when there is not enough information to reach a reliable estimate, and the model problem arises when we cannot confidently choose the “best” model among many competitors. We claim that these problems are likely to be reduced by using a Bayesian approach to combine information across different models and data sets. This approach is in fact applicable to the validation of quantitative models in all social sciences, which commonly encounter data and model problems. Markov Chain Monte Carlo (MCMC) and importance sampling algorithms are also developed to obtain estimates in all the credit risk models discussed in this paper.

Key Words: credit risk, probability of default, Bayesian inference, MCMC, importance sampling
1 Introduction
Credit risk (or default risk) is the uncertainty about a party’s ability to service its obligations under a financial contract. While the concept of “credit risk” is as old as banking itself, credit risk emerged as a significant risk management issue during the 1990s. A series of financial crises in the 1990s and the more recent subprime crisis demonstrate the importance of credit risk measurement and management in the financial industry. With the introduction of financial mathematics to this area, practitioners have come to rely more and more on quantitative models to measure credit risk and to price securities subject to default, such as corporate bonds, mortgage-backed securities and complex credit derivatives. Duffie and Singleton (2003) and Lando (2004) provide comprehensive treatments of the theoretical and practical foundations of credit risk modeling. Measurement of credit risk is also an unavoidable task for banking regulation since it directly determines the risk capital requirement. In June 2004, the Basel Committee on Banking Supervision issued the long-awaited International Convergence of Capital Measurement and Capital Standards: a Revised Framework. The key element of this so-called Basel II Accord, compared to the 1988 Basel Accord, is greater reliance on banks’ own internal credit risk measures, including PD (Probability of Default), LGD (Loss Given Default) and EAD (Exposure At Default), in the calculation of regulatory capital requirements. This regulatory change further stimulates quantitative research in credit risk. Although there is no unanimous theoretical framework for modeling the credit risk of financial contracts, researchers and practitioners have made extensive efforts to explore the mechanism underlying the default behavior of obligors. A large number of statistical models have been developed in academia and applied in various forms in industry.
However, practitioners and a series of studies conducted by the Basel Committee have widely reported that validating quantitative models for credit risk measures faces great difficulties in practice, which can be summarized in two aspects:
• Data insufficiency: Banks and researchers have reported data limitations to be a key impediment to the calibration and validation of credit risk models. Default events are naturally infrequent. Therefore, for obligors with a short credit history or low default probabilities, it is often unlikely that enough historical defaults are observed to make reliable statistical inference.
• Model risk: There are a variety of competing credit risk models in the market, and no simple guideline tells practitioners which one is appropriate. Although model validation is required, it is often hard to claim that one model is consistently superior to another, or even that a “correct” model exists. Empirical results have shown that existing models often produce different results and that none of them has perfect predictive power in all cases.
To tackle these problems, practitioners often have to make unrealistic assumptions (for instance,
obligors with the same external rating have the same PD) or rely heavily on subjective judgement (for instance, adjusting the estimates from quantitative models based on personal experience or opinion). Some researchers have made efforts to solve these problems from a statistical perspective. Pluto and Tasche (2005) propose a “most-prudent” estimation methodology in which a wide confidence interval or its upper bound replaces a point estimate of the PD. Forrest (2005) also prefers to estimate a confidence interval for the PD, computed using the log-likelihood ratio. Schuermann and Hanson (2005) use the bootstrap method to deal with the small sample size and obtain a tighter confidence interval. Admittedly, it is not unusual to doubt the accuracy of quantitative analysis in social science because of the imperfection of models and data. It seems impossible to “solve” these problems in practice. Therefore, this paper aims to provide an insightful starting point, rather than a final solution, for alleviating the data and model problems in estimating the PD. The idea was motivated by a statement in the original Basel II Accord: “All banks using the IRB (Internal-Rating Based) approach must estimate a PD for each internal borrower grade. These internal estimates must incorporate all relevant and available data and methods.” Having no historical default data does not mean having no data at all. That a specific model is unreliable does not mean that no model is reliable at all. It is reasonable to believe that the problems of data insufficiency and model risk can be reduced by incorporating all relevant and available information from different data sets and even across different credit risk models. The question is: how should that information be incorporated? Although Basel II does not directly suggest the Bayesian approach, Bayesian theory is very suitable for the task in the statement.
Incorporating information from different sources to yield a “better” estimate is not a new issue in statistics. Various estimation methods under different names, such as mixed estimation, combined estimation, pooled estimation, weighted estimation, shrinkage, model averaging, meta-analysis, transferability theory and credibility theory, have long been available in the classical approach, but almost all of them can be treated as special cases of Bayesian inference. Compared to the classical approach, Bayesian decision theory is well suited to dealing with imperfect information and unavoidable uncertainty in statistical inference. The Bayesian approach also provides a scientific way to update and combine information extracted from different sources and based on a set of competing models, and it incorporates human judgement inherently. This is not the first paper on the application of Bayesian inference to credit risk models. Some researchers and practitioners have recognized that Bayesian inference should be very useful in credit risk model validation. However, most of the existing literature focuses on assigning priors or adding parameter uncertainty in a particular credit risk model. McNeil and Wendin (2007) apply a generalized linear mixed model to credit risk and estimate the model using a Gibbs sampler. Gossl (2005) jointly estimates the PD and default correlation for a credit risk portfolio in a Bayesian approach. Kiefer (2006) shows how to incorporate expert opinion as priors for low-default portfolios with little historical default information. By using Bayesian inference, Kadam
and Lenk (2007) allow for issuer heterogeneity in credit rating migration despite data sparsity. However, as far as I know, no existing study discusses how to use Bayesian inference to incorporate relevant information from different types of data sets and across different structural or reduced-form credit risk models. This paper tries to answer questions such as:
• If we have a PD estimate based on sparse historical default data, but we have sufficient equity price data to yield a PD estimate based on the Merton model, how can the Bayesian method help us incorporate equity price information into the historical default estimate?
• If we obtain one PD estimate from observed corporate bond prices of the obligor based on a reduced-form model, and another PD estimate from observed stock prices based on a structural model, and these two estimates are quite different, how can we apply the Bayesian method to mix these estimates instead of simply dropping one of them?
Not only does Bayesian inference potentially provide a scientific way to synthesize various credit risk models and incorporate different sources of information to improve the estimation of the PD, it also has many advantages in dealing with complicated credit risk models. For example, Bayesian estimators share the efficiency of likelihood-based methods; MCMC algorithms provide an integrated framework to estimate latent variables as well as parameters; and interval estimates, hypothesis testing and model comparison are more easily interpreted in the Bayesian framework. Admittedly, the Bayesian analogue of the PD estimation discussed below is more time-consuming and, certainly, more computationally intensive. Continuous advances in computing power, however, mitigate these drawbacks.
The paper is structured as follows. Section two introduces several types of credit risk models that are widely used in the industry, conducts the Bayesian inference for each, and introduces or develops the associated MCMC algorithms. Section three explores how the estimates based on different models and information can be synthesized in the Bayesian approach. The importance sampling algorithm is used as the major tool to combine these PD estimates. At the end of that section, a simple formula is developed based on an approximation, which is hopefully useful in practice. Section four conducts an empirical analysis using real data. The last section concludes.
2 Models and Bayesian Inference
In this section, four classes of credit risk models are explored and the corresponding Bayesian inference is conducted. The credit risk models are classified according to the information they use to estimate the probability of default (PD) of a specific debt obligor. Before we proceed to a specific credit risk model, we define the default of obligor i during a time period as a binary variable $D_i$:
$$D_i = \begin{cases} 1 & \text{if obligor } i \text{ defaults} \\ 0 & \text{otherwise} \end{cases}$$
which is often assumed to follow a Bernoulli distribution $D_i \sim Ber(PD_i)$, where $PD_i = Prob\{D_i = 1\} = 1 - Prob\{D_i = 0\}$ is the default probability of obligor i. We start with a “naive” model for estimating the PDs. In the “naive” model, we assume that we can observe a random sample $\{D_{il}\}_{l=1}^{N_i}$ of the random variable $D_i$, and the maximum likelihood estimator of the Bernoulli parameter is simply $\widehat{PD}_i = \sum_{l=1}^{N_i} D_{il} / N_i$. But this “naive” model is not easy to implement in practice. First, we need a large sample size $N_i$ to obtain a reliable estimate of $PD_i$. However, default is naturally infrequent. Hence it is difficult to collect a sufficiently long default history for an obligor with a relatively low default probability. For example, suppose that an obligor has a true PD of 0.5%, which means we expect to observe only one default among 200 of its debt issues. For most obligors in practice we might observe only several debt issues in history, none of which ever defaulted. The estimate based on the “naive” model would then be 0, which obviously does not make sense. So the “naive” model tends to underestimate the default probability of an obligor with a low default probability and a short debt history. Moreover, the assumption of this “naive” model is not realistic. How could the default probability of an obligor be time-invariant? But if $PD_i$ is indexed by time, we need a large sample for this obligor within an even shorter time period. Relaxing the model assumption thus leads to more serious problems of data insufficiency and model implementation.
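To make the small-sample problem concrete, the following Python sketch computes how likely it is that the “naive” estimator returns exactly zero. It is only an illustration under the assumption of independent Bernoulli issues; the function names and the sample sizes 5, 20 and 200 are hypothetical choices, and only the 0.5% PD comes from the example above.

```python
def prob_zero_defaults(pd_true, n_issues):
    """Probability that n_issues independent Bernoulli(pd_true) issues show no default."""
    return (1.0 - pd_true) ** n_issues


def naive_pd(defaults):
    """Maximum likelihood estimate of the PD: the sample mean of the 0/1 default indicators."""
    return sum(defaults) / len(defaults)


pd_true = 0.005  # true PD of 0.5%, as in the example in the text
for n in (5, 20, 200):  # hypothetical numbers of observed debt issues
    print(n, prob_zero_defaults(pd_true, n))

# A short history with no default gives a naive estimate of exactly zero.
print(naive_pd([0, 0, 0, 0, 0]))
```

Even with 200 issues, the chance of seeing no default at all stays well above one third, so a zero estimate says little about the true PD.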
2.1 Rating Migration Model
Despite its limitations, the methodology of the “naive” model has frequently been used in practical PD estimation. The limitations are generally reduced by using credit ratings. Obligors are grouped into different credit ratings, assigned either by external rating agencies such as Moody’s and S&P or by an internal rating system, and obligors with the same credit rating are assumed to be identical in terms of PD. When the number of obligors in a group is large enough, the data insufficiency problem of the “naive” model is dramatically reduced. When we are able to observe a number of defaults within a group of obligors, the sample size $N_i$ is likely to be large enough to ensure that the maximum likelihood estimator $\widehat{PD}_i = \sum_{l=1}^{N_i} D_{il} / N_i$ is closer to the true PD than in the “naive” model. We call the credit risk models using this approach rating migration models. Rating migration models and their maximum likelihood estimation are extensively explained by Lando and Skodeberg (2002). Suppose that we are interested in estimating the PD of an obligor rated i (for example, AAA) at time t. In total, $N_i$ obligors were rated i a certain time period, say 1 year, ago. Then we count how many of them have defaulted (been downgraded to rating D) by the end of the time period, say $N_{iD} = \sum_{l=1}^{N_i} D_{il}$. The average annual PD of the obligors with rating i is computed by simply taking the ratio
$$\widehat{PD}_i = \frac{N_{iD}}{N_i}. \tag{2.1}$$

Table 1: One-year Credit Rating Transition Matrix 05/2004-05/2005 (%). Rows are ratings at the beginning of the period; columns are ratings at the end of the period.

From \ To    AAA     AA      A    BBB     BB      B    CCC     CC  Default     NA
AAA        90.00  10.00      0      0      0      0      0      0        0      0
AA             0  87.95   7.23   1.20      0      0      0      0        0   3.62
A              0      0  92.49   4.92      0      0      0      0        0   2.59
BBB            0   0.16   1.44  92.47   3.85   0.32      0      0        0   1.76
BB             0      0      0   4.85  87.44   4.41   0.22      0        0   3.08
B              0      0   0.50   0.50   8.73  72.57   3.24   0.50     1.50  12.47
CCC            0      0      0      0   1.75  14.04  56.14      0    19.30   8.77
CC             0      0  14.29      0      0  28.57      0      0    14.29  42.86
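The cohort ratio in Equation (2.1), applied to every pair of start and end ratings, can be sketched in a few lines of Python. This is a minimal illustration, not the code used for Table 1: the function name and the input layout are assumptions, and the seven CC-rated obligors are made-up counts chosen only because they reproduce shares similar to the CC row (the actual S&P sample sizes are not given in the text).

```python
from collections import Counter, defaultdict


def cohort_transition_matrix(pairs):
    """Estimate transition shares (in percent) from (start_rating, end_rating) pairs.

    Each pair describes one obligor over the observation period; the estimate for
    (i, j) is the number of obligors moving from i to j divided by N_i.
    """
    counts = defaultdict(Counter)
    for start, end in pairs:
        counts[start][end] += 1
    matrix = {}
    for start, row in counts.items():
        n_i = sum(row.values())  # N_i: obligors rated `start` at the beginning
        matrix[start] = {end: 100.0 * n / n_i for end, n in row.items()}
    return matrix


# Hypothetical cohort of 7 CC-rated obligors (illustrative counts only).
pairs = [("CC", "A")] + [("CC", "B")] * 2 + [("CC", "D")] + [("CC", "NA")] * 3
matrix = cohort_transition_matrix(pairs)
for end, share in sorted(matrix["CC"].items()):
    print("CC ->", end, round(share, 2))
```

With so few obligors in a cohort, a single default or withdrawal moves the estimated share by more than 14 percentage points, which is one way to see why estimates for sparsely populated ratings are fragile.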
The average PDs for all ratings can be computed in the same way. Finally we obtain a transition matrix with elements $PD_{ij}$ representing the transition probability from rating i to rating j over one year. The PDs form one part of the transition matrix. We use S&P historical credit rating data to compute the one-year credit rating transition matrix from 05/2004 to 05/2005, which is presented in Table 1. Note that the problem of insufficient data is only reduced, and still remains to some extent, in this rating migration model. For instance, only low ratings such as B, CCC and CC have positive PD estimates, while the PDs of the ratings from AAA to BB are estimated to be zero. Those estimates do not reflect the true default likelihood of highly rated obligors, because the probability that an AAA- or BB-rated obligor defaults within one year is certainly not zero. These biased estimates are most likely due to data insufficiency. Moreover, the probability that a CCC rating is downgraded to CC is even lower than that of a downgrade by two grades to D, which does not make sense either. Although we report sizable PDs for low ratings, those estimates are not reliable because of either the small sample sizes or the low default probabilities. The data problem is more severe for the CC rating: almost half of the CC-rated obligors leave the S&P rating system within one year; the probability that a CC rating remains unchanged is zero, while the probability that a CC rating is upgraded to A is as high as 14%. We conclude that the rating migration model still suffers from the data problem, and yields rough estimates of the PD that should be used with great caution. This generic rating migration model has been extended to incorporate more complicated model
structure or more exogenous variables to model time-varying PDs. The matrix in Table 1 is often called an unconditional transition matrix because exogenous information about macroeconomic conditions is not considered. Nickell et al. (2000) report that credit rating transitions are sensitive to business cycles. Blume et al. (1998) and Nickell et al. (2000) use the ordered probit model to estimate the conditional transition matrix of credit ratings. Bayesian inference for the generic rating migration model is straightforward. Suppose that we are interested in estimating the PD of an obligor with credit rating i. We observe that there are $N_i$ obligors with credit rating i at the beginning of the observation period. At the end of the observation period, say 1 year, $N_{iD}$ ($= \sum_{l=1}^{N_i} D_{il}$) of them have defaulted and the rest have not. Since we assume that the default of each obligor follows a Bernoulli $Ber(PD_i)$ distribution, the joint distribution of the observed defaults, or the likelihood, is proportional to
$$\prod_{l=1}^{N_i} p(D_{il} \mid PD_i) = \prod_{l=1}^{N_i} PD_i^{D_{il}} (1 - PD_i)^{1 - D_{il}} = PD_i^{\sum_{l=1}^{N_i} D_{il}} (1 - PD_i)^{N_i - \sum_{l=1}^{N_i} D_{il}} = PD_i^{N_{iD}} (1 - PD_i)^{N_i - N_{iD}}.$$
If a noninformative prior is used, the posterior density function of $PD_i$ is
$$p(PD_i \mid data) \propto PD_i^{N_{iD}} (1 - PD_i)^{N_i - N_{iD}}, \tag{2.2}$$
which is a $Beta(N_{iD} + 1, N_i - N_{iD} + 1)$ distribution. Note that the posterior mean of $PD_i$ with a noninformative prior is $\frac{N_{iD} + 1}{N_i + 2}$, which is approximately equal to the maximum likelihood estimator when the sample sizes $N_i$ and $N_{iD}$ are large. Furthermore, if a conjugate Beta prior $Beta(a_0, b_0)$ is used, the posterior density function of $PD_i$ is
$$p(PD_i \mid data) \propto PD_i^{N_{iD} + a_0 - 1} (1 - PD_i)^{N_i - N_{iD} + b_0 - 1}, \tag{2.3}$$
which is a $Beta(N_{iD} + a_0, N_i - N_{iD} + b_0)$ distribution. Later we will introduce an expert opinion represented by a Beta distribution as a conjugate prior and feed it into the rating-based model. Sampling from a Beta distribution is straightforward, and MCMC algorithms are not needed.
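The conjugate update in Equations (2.2) and (2.3) can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names, the cohort of 400 obligors with no default and the number of draws are hypothetical, and the uniform prior Beta(1, 1) is taken as the noninformative prior of Equation (2.2).

```python
import random


def beta_posterior(n_obligors, n_defaults, a0=1.0, b0=1.0):
    """Posterior Beta(a, b) parameters of the PD under a Beta(a0, b0) prior.

    a0 = b0 = 1 is the uniform prior, giving Beta(N_iD + 1, N_i - N_iD + 1).
    """
    return n_defaults + a0, n_obligors - n_defaults + b0


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical high-grade cohort: 400 obligors, no observed default.
a, b = beta_posterior(n_obligors=400, n_defaults=0)
print("posterior mean:", beta_mean(a, b))  # equals (N_iD + 1) / (N_i + 2)

# Direct sampling from the Beta posterior; no MCMC is required.
rng = random.Random(0)
draws = sorted(rng.betavariate(a, b) for _ in range(20000))
upper95 = draws[int(0.95 * len(draws))]
print("95% posterior upper bound:", upper95)
```

Unlike the maximum likelihood estimate, which is zero for this cohort, the posterior mean and upper bound are strictly positive. An expert prior would enter simply through different values of `a0` and `b0`.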
2.2 Credit Scoring Model
The most important limitation of rating-based models is the assumption that a group of heterogeneous obligors have identical default probabilities. Credit scoring models, initiated by Altman (1968), employ obligor-specific information to estimate the PDs of heterogeneous obligors. In credit scoring models, the PD of an obligor depends on a set of obligor characteristics, for example, financial ratios. A renowned example of financial ratios used to predict the default of a firm is Altman’s Z-score, which combines five financial ratios: working capital to total assets, retained earnings to total assets, EBIT to total assets, market value of equity to book value of total debt, and sales to total assets. Other financial ratios and firm-specific characteristics that have been investigated in the literature include pretax interest coverage, operating income to sales, long-term debt to assets, firm size, beta, etc. There are different ways to model the dependence between defaults and explanatory variables. Altman (1968) uses linear discriminant analysis. Shumway (2001) proposes a proportional hazard regression. Many other statistical and data mining techniques, such as neural networks and support vector machines, are also used by practitioners. The most popular and most easily interpreted credit scoring model is the probit (or logit) model. In a probit credit scoring model, the default of a firm i still follows a Bernoulli distribution, while the PD depends on a linear combination of k explanatory variables $X = (x_1, x_2, \cdots, x_k)$, instead of being a constant as in rating-based models:
$$Prob\{D_i = 1\} = \Phi(X_i \beta), \tag{2.4}$$
where $\beta = (\beta_0, \beta_1, \beta_2, \cdots, \beta_k)$ and $\Phi(\cdot)$ is the cumulative distribution function of a standard normal random variable. Classical approaches to estimating probit/logit models were available well before Albert and Chib (1993) proposed a Gibbs sampler for categorical models using data augmentation. Chen (2008) uses the Metropolis-Hastings algorithm to draw directly from the posterior distributions of the parameters. Albert and Chib (1993) rewrite the probit model using latent variables as
$$D_i^* = X_i \beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, 1) \tag{2.5}$$
$$D_i = \begin{cases} 1 & \text{if } D_i^* \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{2.6}$$
where the latent variables $D_i^*$ are continuous rather than binary like $D_i$. To obtain a maximum likelihood estimator of $\beta$, we have to integrate the latent variables $D_i^*$ out of the likelihood. Under the Gibbs sampler, however, $D_i^*$ can be sampled along with the parameters $\beta$.
To implement the Gibbs sampler, we have to specify the full conditional posterior distributions of $\beta$ and $D_i^*$. Given all latent variables $D^* = (D_1^*, D_2^*, \cdots, D_N^*)$ and data $D = (D_1, D_2, \cdots, D_N)$, the conditional posterior distribution of $\beta$ follows directly from a linear regression model:
$$p(\beta \mid D^*, D) \sim N(\hat\beta, V), \tag{2.7}$$
where $\hat\beta = (X'X)^{-1} X'D^*$ and $V = (X'X)^{-1}$ are just the OLS estimates when the priors are noninformative. It can be shown that $D_i^*$ follows a truncated normal distribution given $\beta$ and $D_i$:
$$D_i^* \mid D_i = 1, \beta \;=\; D_i^* \mid D_i^* \ge 0, \beta \;\sim\; N(X_i\beta, 1)\, I[0, \infty)$$
$$D_i^* \mid D_i = 0, \beta \;=\; D_i^* \mid D_i^* < 0, \beta \;\sim\; N(X_i\beta, 1)\, I(-\infty, 0). \tag{2.8}$$
Sampling from a truncated normal distribution is not trivial. Different sampling algorithms are available, such as accept-reject and Metropolis-Hastings algorithms. An efficient algorithm using the inverse CDF transformation was developed by Devroye (1986); it is based on the simple fact that the CDF of a continuous random variable, evaluated at that variable, has a uniform [0, 1] distribution. Define $\varepsilon_i = D_i^* - X_i\beta$, which has a truncated N(0, 1) distribution, so its CDF must follow a U[0, 1] distribution:
$$F(\varepsilon_i \mid D_i = 1, \beta) = \frac{\Phi(\varepsilon_i) - \Phi(-X_i\beta)}{1 - \Phi(-X_i\beta)} \sim U[0, 1]$$
$$F(\varepsilon_i \mid D_i = 0, \beta) = \frac{\Phi(\varepsilon_i)}{\Phi(-X_i\beta)} \sim U[0, 1]. \tag{2.9}$$
Specifically, the Gibbs sampler using data augmentation and the inverse CDF transformation is carried out in the following steps:
Step 1: Choose an initial value $\beta^{(0)}$ for $\beta$.
Step 2: Draw $u_i$ from the uniform [0, 1] distribution. Generate $\varepsilon_i$ as
$$\varepsilon_i = \begin{cases} \Phi^{-1}\big(u_i (1 - \Phi(-X_i\beta)) + \Phi(-X_i\beta)\big) & \text{if } D_i = 1 \\ \Phi^{-1}\big(u_i \Phi(-X_i\beta)\big) & \text{if } D_i = 0 \end{cases}$$
and the latent variable $D_i^* = X_i\beta + \varepsilon_i$ for $i = 1, 2, \cdots, N$.
Step 3: Draw $\beta^{(1)}$ from $N((X'X)^{-1} X'D^*, (X'X)^{-1})$.
Step 4: Repeat Step 2 through Step 3 for M iterations.
The Gibbs sampler yields random draws from the posterior distribution of $\beta$. Note that the PD is not directly a parameter in the credit scoring model. Instead, $PD_i$ is a function of the parameter $\beta$ and the explanatory variables $X_i$, namely $PD_i = \Phi(X_i\beta)$. Random draws of the $PD_i$ of interest can be obtained by directly computing $\Phi(X_i\beta)$ using the MCMC draws from the joint posterior distribution of $\beta$ and the values of $X_i$.
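Steps 1 to 4 can be sketched in pure Python for the simplest case of an intercept and a single explanatory variable. This is an illustrative sketch under a flat prior, not the implementation used in this paper: the function names, the simulated data, the true coefficients (-1, 1), the chain length and the burn-in are all hypothetical choices.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal, provides cdf (Phi) and inv_cdf (Phi^-1)


def draw_latent(xb, d, rng):
    """Step 2: inverse-CDF draw of D* = xb + eps with eps a truncated N(0, 1)."""
    p0 = STD.cdf(-xb)  # Phi(-X_i beta)
    u = rng.random()
    q = u * (1.0 - p0) + p0 if d == 1 else u * p0
    q = min(max(q, 1e-12), 1.0 - 1e-12)  # keep inv_cdf strictly inside (0, 1)
    return xb + STD.inv_cdf(q)


def gibbs_probit(x, d, n_iter=1500, burn=500, seed=0):
    """Data-augmentation Gibbs sampler for Prob{D = 1} = Phi(b0 + b1 * x)."""
    rng = random.Random(seed)
    # V = (X'X)^{-1} for the design columns (1, x), in closed form for 2 x 2.
    n, sx, sxx = float(len(x)), sum(x), sum(v * v for v in x)
    det = n * sxx - sx * sx
    v00, v01, v11 = sxx / det, -sx / det, n / det
    # Cholesky factor of V, used to draw beta ~ N(beta_hat, V).
    l00 = math.sqrt(v00)
    l10 = v01 / l00
    l11 = math.sqrt(v11 - l10 * l10)
    b0, b1 = 0.0, 0.0  # Step 1: initial value
    draws = []
    for it in range(n_iter):  # Step 4: repeat Steps 2 and 3
        dstar = [draw_latent(b0 + b1 * xi, di, rng) for xi, di in zip(x, d)]
        sy = sum(dstar)
        sxy = sum(xi * yi for xi, yi in zip(x, dstar))
        m0 = v00 * sy + v01 * sxy  # beta_hat = (X'X)^{-1} X'D*
        m1 = v01 * sy + v11 * sxy
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        b0 = m0 + l00 * z0  # Step 3: draw beta
        b1 = m1 + l10 * z0 + l11 * z1
        if it >= burn:
            draws.append((b0, b1))
    return draws


# Simulated data with hypothetical true coefficients (-1, 1).
data_rng = random.Random(1)
x = [data_rng.gauss(0.0, 1.0) for _ in range(300)]
d = [1 if -1.0 + xi + data_rng.gauss(0.0, 1.0) >= 0 else 0 for xi in x]

draws = gibbs_probit(x, d)
b0_mean = sum(b[0] for b in draws) / len(draws)
b1_mean = sum(b[1] for b in draws) / len(draws)
pd_draws = [STD.cdf(b0 + b1 * 0.5) for b0, b1 in draws]  # posterior draws of PD at x = 0.5
print(b0_mean, b1_mean, sum(pd_draws) / len(pd_draws))
```

The last lines illustrate the point made above: posterior draws of the PD for a given obligor are obtained by pushing each draw of $\beta$ through $\Phi(X_i\beta)$.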
2.3 Credit Spread Model
Both rating migration models and credit scoring models use rating and historical default information to estimate the default probability. The Basel II Accord says: “An external rating can be the primary factor determining an internal rating, but the bank must consider other relevant information.” We have already seen that both models suffer considerably from data insufficiency, because credit ratings and financial statement information are measured infrequently. Both approaches are criticized for producing only static and backward-looking PD estimates. Is there any other source of information from which to compute the PDs? The answer is yes. Sometimes we can even observe the PDs on financial markets, when the associated credit risk is directly “traded” on public markets. For example, investors can purchase a Credit Default Swap (CDS) contract from a counterparty in order to be protected against a certain credit risk. The buyer of the CDS makes periodic payments to the seller in exchange for compensation from the seller when a credit event occurs. The periodic payment from the buyer is the price of the CDS, expressed as a percentage of the notional amount of the CDS. It is easy to see that the price of the CDS reflects the PD of the reference entity as perceived by the market. If we can observe the time series of CDS prices, we have a very good estimate of the PD. However, CDS contracts are infrequently traded. Thus it is not easy to obtain adequate sequences of CDS prices. Hull and White (2000) develop a pricing model for CDS using the credit spreads between zero-coupon corporate bonds and Treasury bonds. In practice, it is generally easier to observe frequent time series of corporate bond prices (or equivalently credit spreads) than those of CDS prices. In the following, we discuss how we can extract implied PDs from observed corporate bond prices or yields. Note that the credit spread model and the asset value model we discuss later are continuous-time models and use frequent observations of market prices such as bond prices and stock prices. These market prices are capable of producing dynamic and forward-looking PD estimates, while both rating migration models and credit scoring models yield backward-looking PD estimates because they use historical defaults to predict the future possibility of default.
The credit spread model introduced in the following is based on Duffie and Singleton (1997) and Lando (1998). First we consider a zero-coupon Treasury bond without default risk. The instantaneous risk-free rate is denoted $r_t$ and the maturity is T. Then the time t price of the Treasury bond with 1 dollar face value is
$$B_t = E_t^Q\left[\exp\left(-\int_t^T r_s\, ds\right)\right]$$
where Q denotes the risk-neutral probability measure. Next we consider a zero-coupon corporate bond, which possibly defaults at time τ prior to the maturity T. This default time τ is a positive random variable and its distribution is described in terms of a hazard function h as
$$Prob\{\tau > t\} = \exp\left(-\int_0^t h_s\, ds\right). \tag{2.10}$$
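Suppose that the recovery rate is zero when default occurs prior to maturity. Then the time t value of this zero-coupon corporate bond is
$$\begin{aligned} V_t &= E_t^Q\left[\exp\left(-\int_t^T r_s\, ds\right) 1_{\tau > T}\right] \\ &= E_t^Q\left[\exp\left(-\int_t^T r_s\, ds\right) E_T(1_{\tau > T})\right] \\ &= E_t^Q\left[\exp\left(-\int_t^T r_s\, ds\right) \exp\left(-\int_t^T h_s\, ds\right)\right] \\ &= E_t^Q\left[\exp\left(-\int_t^T (r_s + h_s)\, ds\right)\right]. \end{aligned} \tag{2.11}$$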
It is not realistic to assume that the recovery rate is zero in practice. Assuming a fixed loss rate LGD, we can obtain a more general pricing formula for a defaultable zero-coupon corporate bond:
$$V_t = E_t^Q\left[\exp\left(-\int_t^T (r_s + LGD \cdot h_s)\, ds\right)\right]. \tag{2.12}$$
Note that LGD cannot be identified jointly with $h_t$. The common way to solve this identification problem is to estimate LGD separately and plug in the estimate. Finally, we can compute the n-year PD of the bond issuer at time t, given no default up to time t, as
$$Prob\{\tau \le t + n \mid \tau > t\} = 1 - \exp\left(-\int_t^{t+n} h_s\, ds\right). \tag{2.13}$$
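Under the simplifying assumption that $r_t$ and $h_t$ are constant, Equations (2.12) and (2.13) reduce to closed forms that can be sketched in Python. The function names and all numerical inputs below (a 4% risk-free rate, a 120 basis point spread, an LGD of 0.6 and a 5-year maturity) are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def defaultable_zero_price(r, h, lgd, maturity):
    """Equation (2.12) with constant r and h: exp(-(r + LGD * h) * (T - t))."""
    return math.exp(-(r + lgd * h) * maturity)


def implied_hazard(spread, lgd):
    """With constant rates the credit spread equals LGD * h, so h = spread / LGD."""
    return spread / lgd


def n_year_pd(h, n):
    """Equation (2.13) with a constant hazard rate: 1 - exp(-h * n)."""
    return 1.0 - math.exp(-h * n)


r, spread, lgd, maturity = 0.04, 0.012, 0.6, 5.0  # hypothetical inputs
h = implied_hazard(spread, lgd)
price = defaultable_zero_price(r, h, lgd, maturity)
model_spread = -math.log(price) / maturity - r  # recovers the input spread
print("implied hazard rate:", h)
print("bond price:", price)
print("1-year PD:", n_year_pd(h, 1.0))
print("5-year PD:", n_year_pd(h, 5.0))
```

The sketch also makes the identification problem visible: only the product LGD · h enters the price, so halving the assumed LGD doubles the implied hazard rate without changing the bond value.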
The pricing formula in Equation (2.12) states that the interest rate of a defaultable bond includes two components: the risk-free rate, which compensates investors for the time value of money, and the credit spread, which compensates investors for taking credit risk. However, this results from an oversimplified theoretical model. In fact, spreads between corporate bonds and Treasury bonds are known to include other components such as a liquidity premium and tax effects. The so-called “credit spread puzzle” indicates that observed spreads on corporate bonds do not match the default probabilities of corporate bonds in practice. For example, the average spread on BBB-rated corporate bonds is about 170 basis points in 1997-2003, while the average PD of those bonds is only 20 basis points during the same period. Elton et al. (2001) find that the default component accounts for a surprisingly small fraction of the spreads and that state tax treatments explain a substantial portion of the difference. Huang and Huang (2003) report similar findings. Based on this empirical evidence, researchers believe that credit spread models are suitable for corporate bond pricing rather than PD estimation. This is also the reason why there are few commercial applications of credit spread models in bank risk management. The instantaneous risk-free rate $r_t$ and default intensity $h_t$ can furthermore be assumed to follow some diffusion process, as in pricing models for Treasury bonds. Popular choices include the Vasicek model and the CIR model. The limitation of the Vasicek model is that the probability that $r_t$ or $h_t$ becomes negative is nonzero, which is not realistic. The CIR model guarantees positive
values but has a nonnormal transition distribution. Duffee (1999) adopts the CIR model for both $r_t$ and $h_t$ as
$$dr_t = \kappa_1(\theta_1 - r_t)\,dt + \sigma_1\sqrt{r_t}\,dW_{1t}$$
$$dh_t = \kappa_2(\theta_2 - h_t)\,dt + \sigma_2\sqrt{h_t}\,dW_{2t}. \tag{2.14}$$
The yield to maturity of a zero-coupon bond is $y_t = -\log P_t/(T - t)$. Assuming $r_t$ and $h_t$ follow independent CIR processes, $y_t$ has a closed-form expression
$$y_t = -(A_1 + A_2) + B_1 r_t + B_2 h_t \tag{2.15}$$
where
$$A_j = \frac{2\kappa_j\theta_j}{\sigma_j^2} \cdot \frac{1}{T - t} \log\left[\frac{2\gamma_j \exp[(\kappa_j + \lambda_j + \gamma_j)(T - t)/2]}{(\kappa_j + \lambda_j + \gamma_j)[\exp(\gamma_j(T - t)) - 1] + 2\gamma_j}\right];$$
$$B_j = \frac{1}{T - t} \cdot \frac{2[\exp(\gamma_j(T - t)) - 1]}{(\kappa_j + \lambda_j + \gamma_j)[\exp(\gamma_j(T - t)) - 1] + 2\gamma_j};$$
$$\gamma_j = \sqrt{(\kappa_j + \lambda_j)^2 + 2\sigma_j^2}$$
for j = 1, 2. In practice we are only able to observe the bond yields; both $r_t$ and $h_t$ are unobserved latent variables and must be estimated from the data. Suppose that we observe the yields $y_t^f$ for zero-coupon Treasury bonds and $y_t$ for zero-coupon corporate bonds. When measurement errors are introduced, we arrive at two sets of state-space models:
$$y_t^f = B_1 r_t - A_1 + \sigma_{e1}\varepsilon_{1t} \tag{2.16}$$
$$dr_t = \kappa_1(\theta_1 - r_t)\,dt + \sigma_1\sqrt{r_t}\,dW_{1t}, \tag{2.17}$$
and
$$y_t - y_t^f = B_2 h_t - A_2 + \sigma_{e2}\varepsilon_{2t} \tag{2.18}$$
$$dh_t = \kappa_2(\theta_2 - h_t)\,dt + \sigma_2\sqrt{h_t}\,dW_{2t}. \tag{2.19}$$
Both state-space models are nonlinear and nonnormal. Duffee (1999) follows Chen and Scott (1993) in using the extended Kalman filter to estimate these nonlinear state-space models. Bayesian inference has been shown to be a powerful way to deal with nonlinear and nonnormal state-space models. Single-move and multi-move Gibbs samplers for nonlinear and nonnormal state-space models have been developed by Carlin, Polson and Stoffer (1992) and Carter and Kohn (1994). We use the single-move Gibbs sampler with a Metropolis-Hastings step in each Gibbs step to estimate these state-space models.
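The yield loadings $A_j$ and $B_j$ in Equation (2.15), which also appear in the measurement equations of the state-space models, can be evaluated with a short Python sketch. The function names and every parameter value below are hypothetical; $\lambda_j$ is treated simply as the additional parameter that enters the formulas together with $\kappa_j$, and the code is an illustration of the closed form rather than an estimation routine.

```python
import math


def cir_yield_loadings(kappa, theta, sigma, lam, tau):
    """Return (A_j, B_j) of Equation (2.15) for time to maturity tau = T - t."""
    gamma = math.sqrt((kappa + lam) ** 2 + 2.0 * sigma ** 2)
    k = kappa + lam + gamma
    growth = math.exp(gamma * tau) - 1.0  # exp(gamma_j (T - t)) - 1
    denom = k * growth + 2.0 * gamma
    a = (2.0 * kappa * theta / sigma ** 2) / tau * math.log(
        2.0 * gamma * math.exp(k * tau / 2.0) / denom
    )
    b = (2.0 * growth / denom) / tau
    return a, b


def model_yield(r, h, params_r, params_h, tau):
    """Corporate zero-coupon yield y_t = -(A_1 + A_2) + B_1 r_t + B_2 h_t."""
    a1, b1 = cir_yield_loadings(*params_r, tau)
    a2, b2 = cir_yield_loadings(*params_h, tau)
    return -(a1 + a2) + b1 * r + b2 * h


# Hypothetical parameters in the order (kappa, theta, sigma, lambda).
params_r = (0.5, 0.05, 0.10, 0.0)
params_h = (0.3, 0.02, 0.10, 0.0)
for tau in (0.25, 1.0, 5.0, 10.0):
    print(tau, model_yield(0.04, 0.02, params_r, params_h, tau))
```

A quick sanity check on the formulas is the short-maturity limit: as T - t goes to zero, $A_j$ goes to 0 and $B_j$ goes to 1, so the model yield approaches $r_t + h_t$.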
Bayesian inference in the state-space model aims at deriving the joint posterior distribution of the parameters $\theta = (\kappa_1, \kappa_2, \theta_1, \theta_2, \lambda_1, \lambda_2, \sigma_1, \sigma_2, \sigma_{e1}, \sigma_{e2})$ and the latent variables $X = (r_1, \cdots, r_n, h_1, \cdots, h_n)$ given the data $Y = (y_1^f, \cdots, y_n^f, y_1, \cdots, y_n)$, which can be obtained by Bayes’ rule as
$$p(\theta, X \mid Y) \propto p(Y \mid X, \theta)\, p(X, \theta) \propto p(Y \mid X, \theta)\, p(X \mid \theta)\, p(\theta) \tag{2.20}$$
where $p(Y \mid X, \theta)$ is determined by the measurement equations, $p(X \mid \theta)$ is determined by the state equations, and $p(\theta)$ is the prior of the parameters. Generally there is no analytical expression for this posterior density if the state-space model is nonlinear and nonnormal. MCMC algorithms, including the Gibbs sampler and the Metropolis-Hastings algorithm, can be used to draw random samples from this joint posterior distribution. The details of the MCMC algorithms are presented in Appendix A.
2.4 Asset Value Model
Credit spread models are typical reduced-form credit risk models. No economic theory tells us what triggers the default. Structural credit risk models contain richer economic reasoning to explain the default behavior of an obligor. Merton (1974) is the benchmark for all structural credit risk models, in that they can be treated as its extensions. We call this class of credit risk models asset value models because the default probabilities are implicit in the movements of the market value of the firm’s assets. Following Merton (1974), we assume that a firm, which is the debt issuer, has a simple capital structure: a single homogeneous class of debt and the residual claim (equity). At time t, the market values of its assets, liability (bond) and equity (stock) are denoted $V_t$, $B_t$ and $S_t$. The following accounting equation must hold:
$$V_t = B_t + S_t. \tag{2.21}$$
Next we assume that the market value of the assets follows a geometric Brownian motion:
$$dV_t = \mu V_t\,dt + \sigma V_t\,dW_t, \tag{2.22}$$
where $\mu V_t$ is the physical drift, $\sigma V_t$ is the physical diffusion and $W_t$ is a standard Brownian motion under the physical probability space $(\Omega, \mathcal{F}, P)$. If the maturity of the debt is T, the asset value at T will be
$$V_T = V_t \exp\left[\left(\mu - \tfrac{1}{2}\sigma^2\right)(T - t) + \sigma(W_T - W_t)\right],$$
which follows a lognormal distribution:
$$\log(V_T) - \log(V_t) \mid \mathcal{F}_t \sim N\left(\left(\mu - \tfrac{1}{2}\sigma^2\right)(T - t),\; \sigma^2(T - t)\right).$$
At the same time, the only liability of the firm is assumed to be a zero-coupon bond with face value F, and the risk-free interest rate r is assumed constant.
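The lognormal law of the terminal asset value can be checked by simulation. The following Python sketch draws $V_T$ directly from the closed-form solution of the geometric Brownian motion and compares the sample moments of the log return with the stated mean and variance. The function name and the values of $V_t$, $\mu$, $\sigma$ and $T - t$ are hypothetical.

```python
import math
import random


def simulate_terminal_asset(v_t, mu, sigma, tau, rng):
    """Draw V_T = V_t * exp((mu - sigma^2 / 2) * tau + sigma * (W_T - W_t))."""
    z = rng.gauss(0.0, 1.0)  # (W_T - W_t) / sqrt(tau) is standard normal
    return v_t * math.exp((mu - 0.5 * sigma ** 2) * tau + sigma * math.sqrt(tau) * z)


rng = random.Random(0)
v_t, mu, sigma, tau = 100.0, 0.08, 0.25, 1.0  # hypothetical inputs
log_returns = [
    math.log(simulate_terminal_asset(v_t, mu, sigma, tau, rng) / v_t)
    for _ in range(50000)
]
sample_mean = sum(log_returns) / len(log_returns)
sample_var = sum((x - sample_mean) ** 2 for x in log_returns) / (len(log_returns) - 1)

print("sample mean:", sample_mean, "theory:", (mu - 0.5 * sigma ** 2) * tau)
print("sample variance:", sample_var, "theory:", sigma ** 2 * tau)
```

Note that the second parameter of the normal distribution is the variance $\sigma^2(T - t)$, not the standard deviation, which the simulated sample variance confirms.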
Then the default motivation is modeled. Merton assumes that, acting in the best interests of stockholders at maturity, the firm either pays the promised payment $F$ to the bondholders when the residual claim is nonnegative, i.e. $V_T - F \geq 0$, or defaults on its debt and liquidates its assets to repay the bondholders when its assets cannot cover the promised payment, i.e. $V_T - F < 0$. Then at maturity the value of the bond is
$$B_T = \min(F, V_T) = F - \max(0, F - V_T). \tag{2.23}$$
At the same time, the value of equity at maturity is
$$S_T = V_T - \min(F, V_T) = \max(V_T - F, 0), \tag{2.24}$$
which exactly matches the expiration value of a European call option on a non-dividend-paying common stock, where $V_t$ corresponds to the value of the underlying stock and $F$ corresponds to the strike price. The Black-Scholes formula gives the market value of equity at time $t$ as
$$S_t = E_t^Q\left\{\exp[-r(T - t)] \cdot \max(V_T - F, 0)\right\} = V_t \cdot \Phi(d_1) - F \cdot \exp[-r(T - t)] \cdot \Phi(d_2), \tag{2.25}$$
where
$$d_1 = \frac{\log(V_t / F) + \left(r + \tfrac{1}{2}\sigma^2\right)(T - t)}{\sigma \sqrt{T - t}}, \qquad d_2 = d_1 - \sigma \sqrt{T - t},$$
and $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution. The assumption about the default boundary in the Merton model has been criticized as unrealistic. In practice, firms often default when the asset value is much lower than the total liabilities. Note that the Moody's KMV model, a successful commercial application of the Merton model, assumes a default threshold between total current liabilities and total liabilities, $F = \text{short-term debt} + \tfrac{1}{2} \cdot \text{long-term debt}$. Moreover, a number of structural credit risk models extend the Merton model by allowing more complicated assumptions about the default threshold process: Black and Cox (1976) use an exponentially increasing function for the default threshold, $F_t = F_0 \exp(-\gamma (T - t))$; Leland and Toft (1996) argue that the default threshold $F_t$ is determined endogenously through the optimal capital structure of the firm; Collin-Dufresne and Goldstein (2001) assume a stochastic process for the default threshold, $d \ln F_t = \kappa_l [\ln(V_t / F_t) - \nu - \phi(r - \theta)]\,dt$. The Merton model was originally developed for pricing a defaultable corporate bond. We can also use it to find the implicit $n$-year default probability of the debt issuer, which is the probability that the time $t + n$ asset value of the firm falls below the default threshold:
$$PD_t = \text{Prob}\{V_{t+n} < F \mid \mathcal{F}_t\} = \Phi\left(-\frac{\ln(V_t) + \left(\mu - \tfrac{1}{2}\sigma^2\right) n - \ln F}{\sigma \sqrt{n}}\right), \tag{2.26}$$
where the fraction inside the brackets is called the DD (Distance to Default) in the Moody's KMV model, and the PD calculated using the Merton model is called the EDF (Expected Default Frequency), i.e. $EDF = \Phi(-DD)$. To implement the Merton model and compute the default probability, we need the asset value $V_t$ and its drift $\mu$ and volatility $\sigma$ as inputs. However, they are not observable in practice. Different methods have been proposed to find their values. Since we can often observe equity values $S_t$ on the stock market for the obligor, a rough estimate of the market value of assets would be the sum of the market value of equity and the book value of liabilities. A technique called implicit estimation uses the observable equity price and its volatility to derive implicitly the asset value and its volatility. Such calibration methods are very popular in practice. Both Duan et al. (2002) and Ericsson and Reneby (2002) develop maximum likelihood estimators for structural credit risk models and argue that MLE has several advantages over the traditional implicit estimation method. In their MLE, the randomness of observable prices comes only from the asset value dynamics and there is no room for measurement error in the market price of equity. We introduce measurement errors to the stock prices. Then the Merton model has a state-space form:
$$S_t = C^{BS}(V_t, \sigma) + \varepsilon_t, \qquad dV_t = \mu V_t\,dt + \sigma V_t\,dW_t, \tag{2.27}$$
in which the equity values $S_t$ are observed with errors $\varepsilon_t \sim N(0, \sigma_e^2)$, and $C^{BS}(V_t, \sigma)$ is the Black-Scholes formula with underlying asset value $V_t$ and volatility $\sigma$; the other parameters, including the risk-free interest rate, strike price and maturity, are observed or defined. Again this is a nonlinear state-space model. MCMC algorithms using a single-move Gibbs sampler are developed and presented in Appendix B. With MCMC draws from the joint posterior distribution of all parameters and latent variables, the estimate of PD can be obtained by computing
$$PD_t = \Phi\left(-\frac{\log(V_t / F) + \left(\mu - \tfrac{1}{2}\sigma^2\right) n}{\sigma \sqrt{n}}\right).$$
Note that PD in the asset value model is a function of both model parameters and a latent variable.
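As an illustration of how the two Merton formulas in this section are evaluated, the following minimal sketch codes the Black-Scholes equity value (2.25) and the default probability (2.26). The function names and all numerical inputs are hypothetical and are not the estimates reported in this paper; in the Bayesian setting the same PD function would be evaluated at each MCMC draw of the asset value, drift and volatility.

```python
from math import exp, log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def merton_equity(V, F, r, sigma, tau):
    """Black-Scholes value of equity as a call option on the assets, as in (2.25)."""
    d1 = (log(V / F) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return V * PHI(d1) - F * exp(-r * tau) * PHI(d2)


def merton_pd(V, F, mu, sigma, n):
    """n-year default probability under the physical measure, as in (2.26)."""
    dd = (log(V / F) + (mu - 0.5 * sigma ** 2) * n) / (sigma * sqrt(n))  # distance to default
    return PHI(-dd)


# Hypothetical inputs chosen only for illustration.
V, F, r, mu, sigma = 100.0, 60.0, 0.028, 0.05, 0.25
equity = merton_equity(V, F, r, sigma, tau=5.0)
pd_1y = merton_pd(V, F, mu, sigma, n=1.0)
print(f"equity value: {equity:.2f}, one-year PD: {pd_1y:.4f}")
```

Note that the equity value uses the risk-free rate $r$ while the PD uses the physical drift $\mu$, which mirrors the distinction between (2.25) and (2.26).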
2.5
Expert Judgement and Prior Information
Finally, there is a model that is extensively used in practice but easily overlooked by quantitative researchers. Credit risk measurement and management are not typically done in a purely quantitative framework. Qualitative methods, including subjective judgement and adjustment of the estimates from quantitative models, are very popular in practice. This can be partially due to data insufficiency and model uncertainty in validating PD models, which make the estimates unreliable. Basel II also allows both quantitative and qualitative elements in the estimation of PD, as in this statement: “While credit scoring models and other mechanical procedures are permissible as primary drivers of risk assessments, sufficient human judgment is necessary to ensure that all relevant information is considered. When combining models with human judgment, the judgment must take into account all information not addressed by the model.” In fact, subjective judgement or adjustment can be incorporated more scientifically in statistical inference under the Bayesian approach, because Bayesian inference is in essence a subjective decision theory. For Bayesians, expert opinion and other analysis take the role of priors, which can be combined with the information in the data through Bayes' rule. Expert judgement then becomes the elicitation of a prior distribution for PD. PD is a quantity falling in the interval between 0 and 1. Therefore, a candidate for the prior distribution of PD is a Beta distribution, $PD \sim \text{Beta}(a_0, b_0)$, with mean $a_0 / (a_0 + b_0)$ and variance $a_0 b_0 / [(a_0 + b_0)^2 (a_0 + b_0 + 1)]$. Experts can express their opinion about PD by controlling the mode and dispersion of the prior Beta distribution. The choice of a Beta prior is further justified because, as noted in the previous section, the Beta distribution is the conjugate distribution for the individual likelihood with a Bernoulli kernel.

After all, the models discussed above are just a small part of the available credit risk models used in academia and industry¹. However, these models can represent the majority of credit risk models.
Implementation of these models in the banking industry relies on a large number of commercially available models. Table 2 lists the major commercial credit risk models used by practitioners, together with the statistical techniques and data sources used to estimate the default probabilities. A survey of the models that financial institutions actually use was conducted by Rutter Associates in 2002. Among the respondents, 80% use (internal) rating migration models, 78% use (external) rating migration models, 78% use the KMV (asset value) model, 33% use credit scoring models, and 30% use credit spread models to calibrate their credit risk measures. The survey also tells us that practitioners do use more than one type of model or information to estimate PDs.
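To make the Beta prior of this section concrete, here is a small sketch of how an expert view could be turned into $(a_0, b_0)$ and then updated with default counts. Parameterizing the prior by its mean and an "equivalent sample size" $a_0 + b_0$ is an assumption made here for convenience (the text speaks of mode and dispersion), and the expert numbers and the 2-in-400 data are hypothetical.

```python
def beta_from_mean_and_size(mean, prior_size):
    """Return (a0, b0) with a0 / (a0 + b0) = mean and a0 + b0 = prior_size."""
    a0 = mean * prior_size
    b0 = (1.0 - mean) * prior_size
    return a0, b0


def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var


def update(a0, b0, defaults, obligors):
    """Conjugate update of a Beta prior with Bernoulli default indicators."""
    return a0 + defaults, b0 + obligors - defaults


# Hypothetical expert view: PD around 1%, worth about 50 pseudo-observations.
a0, b0 = beta_from_mean_and_size(0.01, 50.0)
# Hypothetical data: 2 defaults among 400 obligors.
a1, b1 = update(a0, b0, defaults=2, obligors=400)
print("prior mean, variance:    ", beta_mean_var(a0, b0))
print("posterior mean, variance:", beta_mean_var(a1, b1))
```

A larger `prior_size` expresses a more confident expert and pulls the posterior mean more strongly toward the prior mean.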
¹ It is important to note that this paper only covers the PD of a single obligor; default correlation is a factor that cannot be ignored when estimating the PD of a credit risk portfolio.
Table 2: Major Commercial Credit Risk Models

| Model | Data Source | Statistical Method |
|---|---|---|
| Moody's KMV | Equity Values | Continuous-Time Stochastic Process |
| CreditMetrics | Credit Ratings | Discrete Markov Process |
| Zeta (Altman's Z-Score) | Financial Ratios | Discriminant Analysis |
| Moody's RiskCalc | Financial Ratios | Probit/Logit Regression |
| S&P's CreditModel | Financial Ratios | Neural Network or PSV |
| Kamakura | Credit Spreads | Continuous-Time Stochastic Process |
| CSFB's CreditRisk+ | Historical Defaults | Poisson Distribution |
3
Combining PD Estimates in Bayesian Inference
The classical way to make inference about a quantity of interest can be described as follows. We set up a quantitative model that contains the quantity we would like to make inference about and some observable variables. Assuming that the model correctly reflects the relationship between those quantities and variables, and ensuring that the quantity is identified in the model, we collect data on the observable variables. We use a particular statistical method to make inference and claim that the estimator is consistent, which means that the estimate is close to the “true value” when data are sufficient. We also conduct hypothesis tests on the assumptions or specification of the model, for example selecting among competing models based on in-sample or out-of-sample tests (model validation). If this approach were applied to the estimation of PD, we would select a theoretical credit risk model (probably one discussed in the previous section) and collect data on observable financial variables such as historical defaults, credit ratings, credit spreads, financial ratios, bond or stock prices and so on. Estimation methods, either classical or Bayesian, are available to obtain consistent estimators for PD or related quantities. The selection of the appropriate credit risk model could be based on subjective judgement or statistical testing. However, this approach faces problems such as data insufficiency and model uncertainty, as discussed before. It is often quite difficult to find sufficient data on historical defaults, rating migrations for highly rated obligors, or market prices or spreads of corporate bonds. Data availability might be an important criterion for researchers selecting a credit risk model. However, a model selection procedure limited by data problems is likely to miss a better candidate.
It has to be admitted that none of the available credit risk models is perfect. Extensive research has been conducted to compare various credit risk models empirically. The models using historical defaults or rating migrations have been criticized as likely to underestimate PDs because of data scarcity. The credit spread models tend to overestimate PDs since the default component in the credit spread is difficult to distinguish from other components such as liquidity and tax effects. Huang and Huang (2003) conclude that PD accounts for only 20%-30% of the credit spreads of investment grade bonds. The major problem facing the credit scoring model is the selection of appropriate characteristics to detect potential defaults, and the fit of the models to the data is quite unsatisfactory. The empirical results for the structural models are mixed. The early work by Jones et al. (1984) claims that the Merton model is not able to generate credit spreads as high as those observed, while a comprehensive project by Eom et al. (2003) reveals that most of the other structural models predict spreads that are too high on average. Therefore, it can be expected that the results will be very ambiguous if we conduct statistical model comparison on credit risk models under severe data limitations and model uncertainty. Model validation will be difficult and unreliable in such situations. However, when a quantitative model yields unreliable results that suggest we should reject the model specification statistically, it is logically more consistent to incorporate those results in the analysis right at the beginning than to reject the model afterwards. This paper offers an insight, rather than a final answer, into dealing with these difficulties in estimating the PD. The idea is: why not mix the estimates if none of them can be trusted? We have a number of economic models that try to approach the “truth” about the PD of an obligor from different angles. If all of them are correct, they should yield the “same” estimates of PD. However, we are not able to guarantee that any model is 100% correct. In fact, each of them lies somewhere different around the “truth”. Moreover, data insufficiency still yields inaccurate estimates even if the model is absolutely “correct”.
Our approach is based on the hypothesis that the combined results from a number of imperfect studies contain at least as much information as any individual result. For example, when it is hard to find sufficient data on the bond prices of an obligor, why not use stock prices and the asset value model to supplement the information about the PD? The next question is: how to combine? Combining information from different sources to improve PD estimation is not new. Some commercially available credit risk models indicate that they produce “hybrid” estimates of PD because a single quantitative model is unreliable. This paper employs a Bayesian approach to integrate diverse information across models. Heterogeneous data sets can be analyzed consecutively, with the posterior of the previous analysis taking the role of the prior in the next one. Expert knowledge can be used as prior information for the first analysis. In the previous section, we studied four types of credit risk models, which are summarized in the following table.

Table 3: Summary of Credit Risk Models Investigated

| Model | PD | Data |
|---|---|---|
| $M_1$: Rating Migration | a parameter | $D_1$: defaults of all obligors with the same rating |
| $M_2$: Credit Scoring | a function of parameters and observed variables | $D_2$: defaults of all obligors |
| $M_3$: Credit Spread | a latent variable | $D_3$: corporate bond prices |
| $M_4$: Asset Value | a function of parameters and latent variables | $D_4$: stock prices |

Expert opinion can be easily incorporated in the rating migration model as a Beta prior. With each model $M_i$ ($i = 1, \dots, 4$), we are able to find the posterior distribution of PD, $p_i(PD \mid D_i)$, analytically or numerically. We use a subscript $i$ to denote that the joint distribution of $PD$ and $D_i$ is governed by the model structure $M_i$. To incorporate information from heterogeneous data sets, we derive the posterior distribution given more than one data set, such as $p(PD \mid D_2, D_3)$, from the Bayesian point of view. Note that $D_1$, which the rating migration model uses, is a subset of $D_2$ used by the credit scoring model. We do not consider a combination of these two because we need supplemental information and conditional independence between data sets. Suppose that we are interested in combining the information in $D_1$ and $D_4$ for estimating PD. Bayesian inference is always based on Bayes' formula. We can apply it to this case as
$$p_{14}(PD \mid D_1, D_4) \propto p_1(D_1 \mid PD)\, p_4(PD \mid D_4), \tag{3.1}$$
assuming that $D_4$ and $D_1$ are conditionally independent given $PD$. This result shows that we can take the posterior distribution based on the asset value model as the prior distribution in a consecutive study in which the rating migration model is adopted. Generally we are not likely to find a closed-form density function for a posterior distribution such as $p_{14}(PD \mid D_1, D_4)$. We need to resort to a numerical solution. Prior to Bayesian combination, we have MCMC draws from $p_4(PD \mid D_4)$. If we are able to evaluate $p_1(D_1 \mid PD)$, we can obtain draws from the posterior distribution $p_{14}(PD \mid D_1, D_4)$ by MCMC methods or by an importance sampling algorithm. Importance sampling is originally a numerical integration method. Suppose that we are interested in evaluating the expectation of a function of a continuous random variable $X$,
$$E[h(X)] = \int h(x) f(x)\,dx, \tag{3.2}$$
where $f(x)$ is the density function of $X$. If we have a large number of independent draws $X_1, \dots, X_n$ from its distribution, the integral can be approximately computed as
$$\int h(x) f(x)\,dx \approx \frac{1}{n} \sum_{i=1}^{n} h(X_i). \tag{3.3}$$
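The Monte Carlo average in (3.3) can be checked on a case where the answer is known. The sketch below is only an illustration: the Beta(2, 5) target, the seed and the sample size are arbitrary choices made here, with $h(x) = x$ so that the exact value is $2/7$.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Target: X ~ Beta(2, 5), an arbitrary illustrative choice with E[X] = 2/7.
draws = [rng.betavariate(2.0, 5.0) for _ in range(100_000)]


def mc_expectation(h, xs):
    """Approximate E[h(X)] by the sample average of h over the draws, as in (3.3)."""
    return sum(h(x) for x in xs) / len(xs)


estimate = mc_expectation(lambda x: x, draws)
print(f"Monte Carlo estimate: {estimate:.4f}, exact mean: {2 / 7:.4f}")
```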
However, we are often not able to draw from the distribution $f(x)$ directly. Rather, we can draw $X_1, \dots, X_n$ from another distribution with density $g(x)$ that is ideally close to $f(x)$. The expectation can be approximated as
$$E[h(X)] = \int h(x) \frac{f(x)}{g(x)} g(x)\,dx \approx \frac{1}{n} \sum_{i=1}^{n} h(X_i) \frac{f(X_i)}{g(X_i)}, \tag{3.4}$$
where $g(x)$ is called the importance function of $f(x)$. Rubin (1987) develops a sampling algorithm to draw from the target density based on importance sampling, which is often called Sampling/Importance Resampling (SIR). The idea can be illustrated using our case of PD estimation. Suppose that we are interested in drawing from the posterior distribution $p_{14}(PD \mid D_1, D_4)$, which is not feasible directly. However, we are able to draw from the “prior” $p_4(PD \mid D_4)$ and we know that $p_{14}(PD \mid D_1, D_4) \propto p_1(D_1 \mid PD)\, p_4(PD \mid D_4)$. SIR is implemented in the following steps:

Step 1: Draw $PD_j$ for $j = 1, \dots, m$ from the “prior” $p_4(PD \mid D_4)$.
Step 2: Evaluate the likelihood $\tilde{w}_j = p_1(D_1 \mid PD_j)$ at each $PD_j$, and compute the weights as $w_j = \tilde{w}_j / \sum_{i=1}^{m} \tilde{w}_i$.
Step 3: Resample $m$ draws with replacement from the multinomial distribution on $\{PD_j\}_{j=1}^{m}$ with probabilities $\{w_j\}_{j=1}^{m}$.

The $m$ draws we obtain are approximately from the posterior distribution $p_{14}(PD \mid D_1, D_4)$. Importance sampling requires that the importance density be close to the target density, which is easily satisfied in our case as long as the two models do not diverge too much in PD estimation. When this condition is not met, we can always increase the number of independent draws, $m$, to reduce the approximation error. The numerical evaluation of the likelihood function is the other problem we need to be concerned about. If the model is $M_1$, the likelihood $p_1(D_1 \mid PD)$ can be directly evaluated since $PD$ is the only parameter in $M_1$. If the model is another one with a complicated structure such as a state-space form, the evaluation of the likelihood is not trivial. We need to find another way. Suppose that we are interested in combining the information in both corporate bond prices and stock prices to infer PD. Again, assuming that $D_3$ and $D_4$ are conditionally independent, the posterior distribution is
$$\begin{aligned} p_{34}(PD \mid D_3, D_4) &\propto p_3(D_3 \mid PD)\, p_4(PD \mid D_4) \\ &\propto \frac{p_3(PD \mid D_3)}{p_3(PD)}\, p_4(PD \mid D_4) \\ &\propto p_3(PD \mid D_3)\, p_4(PD \mid D_4), \end{aligned} \tag{3.5}$$
where $p_3(PD)$ is the prior of $PD$ used in the model $M_3$ and is assumed to be uninformative. This assumption is not necessary but simplifies our inference, since we can always incorporate any informative Beta prior of $PD$ in $M_1$ and combine it later. Now we only need to evaluate $p_3(PD \mid D_3)$ numerically to run the importance sampling algorithm. A kernel density estimator is applicable since we have MCMC draws from the distribution. If we want to incorporate more information in the inference of PD, we can continue to update the posterior distribution. For example, since we are able to draw from $p_{34}(PD \mid D_3, D_4)$, we are also able to draw from
$$p_{134}(PD \mid D_1, D_3, D_4) \propto p_1(D_1 \mid PD)\, p_{34}(PD \mid D_3, D_4) \propto p_1(PD \mid D_1)\, p_3(PD \mid D_3)\, p_4(PD \mid D_4)$$
using the importance sampling algorithm. This posterior combines the information about PD embedded in credit ratings, corporate bond prices and stock prices. In fact, under some mild assumptions the posterior density of a quantity given multiple data sets is proportional to the product of the individual posterior densities given each single data set, which can loosely be treated as a “weighted” average of the individual posteriors. This result can be further justified by the following derivation. The numerical methods for finding the posterior distribution are time-consuming in practice. Hence we introduce an approximation for computing the posterior distribution, which could be very useful when speed matters more than accuracy. The Bayesian central limit theorem tells us that the posterior distribution converges to a normal distribution when the size of the data set is large. We can then approximate each individual posterior density with a normal density whose mean and standard deviation are the posterior mean and standard deviation computed from the MCMC draws.
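The three SIR steps described earlier in this section can be sketched in a few lines for the case where the likelihood is that of the rating migration model $M_1$. This is a minimal illustration under stated assumptions: the Beta(1, 20) draws merely stand in for MCMC output from $p_4(PD \mid D_4)$, the data (3 defaults among 200 obligors) are hypothetical, and the function names are introduced here. The weights are computed on the log scale only for numerical stability.

```python
import math
import random


def sir(prior_draws, log_lik, m, rng):
    """Sampling/Importance Resampling: reweight prior draws by the likelihood."""
    log_w = [log_lik(pd) for pd in prior_draws]      # Step 2: unnormalized log weights
    shift = max(log_w)                               # subtract the max to avoid underflow
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    w = [wi / total for wi in w]                     # normalized weights w_j
    return rng.choices(prior_draws, weights=w, k=m)  # Step 3: resample with replacement


def binomial_log_lik(defaults, obligors):
    """Log-likelihood of PD under M1, up to an additive constant."""
    def log_lik(pd):
        return defaults * math.log(pd) + (obligors - defaults) * math.log(1.0 - pd)
    return log_lik


rng = random.Random(1)
# Step 1: stand-in for MCMC draws from p4(PD | D4); the Beta(1, 20) is an assumption.
prior_draws = [rng.betavariate(1.0, 20.0) for _ in range(20_000)]
# Hypothetical D1: 3 defaults among 200 obligors with the same rating.
posterior_draws = sir(prior_draws, binomial_log_lik(3, 200), m=20_000, rng=rng)
print(sum(posterior_draws) / len(posterior_draws))
```

Because the stand-in prior is itself a Beta distribution, the exact posterior here is Beta(4, 217) with mean $4/221$, which gives a simple check on the resampled draws.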
Suppose we want to combine $k$ individual posteriors $p_i(PD \mid D_i)$ for $i = 1, \dots, k$, each of which has posterior mean $\mu_i$ and standard deviation $\sigma_i$. Then each posterior density function can be approximated as
$$p_i(PD \mid D_i) \approx \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{(PD - \mu_i)^2}{2\sigma_i^2}\right). \tag{3.6}$$
The product of normal densities still yields a normal density:
$$p(PD \mid D_1, \dots, D_k) \propto \prod_{i=1}^{k} p_i(PD \mid D_i) \tag{3.7}$$
$$\propto N(m, V), \tag{3.8}$$
where
$$m = \left(\sum_{i=1}^{k} \frac{\mu_i}{\sigma_i^2}\right) \cdot V, \qquad V = \left(\sum_{i=1}^{k} \frac{1}{\sigma_i^2}\right)^{-1}.$$
If we use the posterior mean as a point estimate of PD, following the convention of classical inference, then
$$PD \approx \sum_{i=1}^{k} w_i \mu_i, \qquad \text{where } w_i = \frac{1/\sigma_i^2}{\sum_{j=1}^{k} 1/\sigma_j^2}, \tag{3.9}$$
which indicates that the “combined” estimate of PD is approximately a weighted average of a number of individual estimates and the weights are determined by the variances of the individual estimates. It is intuitive that the individual estimate with high variance is less credible, thus has lower weight in the average.
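The precision-weighted combination in (3.8) and (3.9) is simple enough to compute directly. The sketch below assumes that each individual posterior has already been summarized by its mean and standard deviation; the three pairs of numbers are hypothetical and the function name is introduced here for illustration.

```python
def combine_normal(means, sds):
    """Combine normal approximations by precision weighting, as in (3.8) and (3.9)."""
    precisions = [1.0 / s ** 2 for s in sds]
    V = 1.0 / sum(precisions)              # combined variance
    weights = [p * V for p in precisions]  # weights w_i, which sum to one
    m = sum(w * mu for w, mu in zip(weights, means))
    return m, V, weights


# Hypothetical posterior means and standard deviations from three models.
m, V, weights = combine_normal([0.002, 0.015, 0.010], [0.003, 0.005, 0.050])
print(f"combined mean {m:.5f}, combined sd {V ** 0.5:.5f}, weights {weights}")
```

In this example the third estimate has by far the largest standard deviation and therefore receives almost no weight, while the combined variance is smaller than any of the individual variances.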
4
Empirical Analysis
In this section, we empirically estimate the PDs for a specific obligor using real data, and compare the estimates from the four credit risk models discussed in the second section. Although there is a large body of empirical research on different credit risk models, empirical comparisons in the literature are generally restricted to a single class of credit risk models. Since the four models use different data, a direct model comparison based on model fit is not feasible. Our approach is to compare the four models according to their estimates of PD for the same obligor.
4.1
Data
Credit ratings and financial statement information are from COMPUSTAT, a database of financial, statistical and market information covering publicly traded companies in the US and Canada over the past 20 years. It provides annual or quarterly data from financial statements and supplemental data. It also contains monthly long-term debt ratings since 1985. The S&P Long Term Domestic Issuer Credit Ratings range from AAA to D. A bond with a rating of BBB or above is known as an investment grade bond; one with a rating of BB or lower is known as a speculative grade bond, or junk bond. In an attempt to refine these ratings further, S&P now on occasion assigns a + or − to its ratings to indicate that the bond is at the upper or lower end of the rating category. If we adopted the finer rating system, the estimates of credit rating transition probabilities would more accurately reflect the heterogeneity of obligors regarding the default probability. However, we might have to face more severe data insufficiency. Since we do not rely on the rating migration model, the coarser rating system is adopted and the + and − modifiers are dropped. Among the 9854 firms, 2032 firms have a valid rating at the beginning of the period we are interested in, May 2004 to May 2005.
For credit scoring models, we need to specify a set of explanatory variables. In this study, we adopt the five financial ratios used in the renowned Altman's Z score:

1. working capital/total assets
2. retained earning/total assets
3. EBIT/total assets
4. market value of equity/book value of total debt
5. sales/total assets

Note that there might be more appropriate choices of financial ratios than these five. Our empirical analysis is focused on how diverse PD estimates can be combined in the Bayesian approach, not on choosing the “best” PD model. Some firms in the COMPUSTAT data set do not have the financial statement information we need. The sample size is reduced to 1972 after we drop the firms with no information or with extreme financial ratios that might impair the classification ability of our model. Among the 1972 firms, 14 firms default by 05/2005.

For the purpose of illustration, one company, Honeywell International Inc. (NYSE symbol: HON), is selected for the empirical analysis of the asset value and credit spread models. Corporate bond and Treasury strips data are obtained from Datastream. Honeywell issued a zero-coupon corporate bond that expires on August 1, 2009. We match the corporate bond with a Treasury bond of the same maturity. Daily bond prices over the year up to May 2005 are collected and bond yields are computed. In total, 73 daily yields and spreads are observed. Stock prices of Honeywell are collected from CRSP to estimate the asset value model. We compute the total market value of equity across time using the quoted stock prices and the numbers of common shares and preferred shares obtained from COMPUSTAT. We need stock prices to estimate important quantities such as the instantaneous asset value and its drift and volatility.
Since we are interested in the PD at the end of May 2005, we collect the daily stock prices from 01/01/2005 to 05/31/2005, and the number of outstanding common shares and total liabilities at the end of 2004. The risk-free rate is set to the 3-month T-bill rate on 05/31/2005, which is 2.8%. The time to maturity of the liabilities is set to 5 years. Note that the estimation results are robust to the time to maturity. Finally, credit default swap (CDS) prices are also collected from Datastream. A CDS is a financial contract in which one party makes periodic payments in exchange for protection from the other party when the reference entity defaults. The price of the CDS, expressed as a percentage of the notional amount, can be treated as the investors' expectation about the PD of the reference entity. Although the CDS market is not very liquid and the prices may be a distorted indicator of the true PD, the CDS price can be used as a criterion to compare the PD estimates from different credit risk models, since the usual model comparison criteria are not applicable to our case.
Table 4: Number of Defaults during 05/2004-05/2005

| | AAA | AA | A | BBB | BB | B | CCC | CC |
|---|---|---|---|---|---|---|---|---|
| Number of Obligors (05/2004) | 20 | 83 | 386 | 624 | 454 | 401 | 57 | 7 |
| Defaults (05/2005) | 0 | 0 | 0 | 0 | 0 | 6 | 11 | 1 |
Table 5: Estimates of The Credit Scoring Model

| Coefficient | Posterior Mean | Std Deviation |
|---|---|---|
| constant | -2.1272 | 0.1167 |
| β1: working capital/total assets | -0.0180 | 0.0045 |
| β2: retained earning/total assets | 0.0007 | 0.0017 |
| β3: EBIT/total assets | -0.0503 | 0.0121 |
| β4: market value of equity/book value of total debt | -0.0100 | 0.0047 |
| β5: sales/total assets | 0.0015 | 0.0009 |
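To show how coefficient estimates such as those in Table 5 translate into a PD through the probit link $PD = \Phi(X\beta)$, the following sketch plugs in the posterior means. The financial ratios and their units are hypothetical (they are not Honeywell's data), and evaluating at posterior means is a shortcut used only for illustration; with MCMC output, the same function would be applied to every draw of $\beta$ to obtain draws of the PD.

```python
from statistics import NormalDist

# Posterior means from Table 5: constant, then beta_1 to beta_5.
beta = [-2.1272, -0.0180, 0.0007, -0.0503, -0.0100, 0.0015]


def probit_pd(ratios, coef):
    """PD = Phi(X beta), where X is a constant followed by the financial ratios."""
    x = [1.0] + list(ratios)
    index = sum(b * xi for b, xi in zip(coef, x))
    return NormalDist().cdf(index)


# Hypothetical financial ratios, in the order of the five Altman variables.
ratios = [0.10, 0.30, 0.08, 2.0, 0.9]
print(f"PD at the posterior means: {probit_pd(ratios, beta):.4f}")
```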
4.2
Results
First we estimate the rating migration model. The rating migration model produces the roughest estimate of all, in that PDs are reported only for a number of ordinal ratings. Table 4 presents how many obligors default during 06/2004-05/2005. The point estimates of PD in Table 1 are computed using these numbers. One important advantage of the Bayesian approach over the classical approach is that we can obtain the whole posterior distribution of PD, which follows a Beta distribution. For the purpose of comparison, we use a flat prior: a uniform $[0, 1]$ distribution. We conclude that the posterior distribution of the PD of an AAA-rated obligor is $\text{Beta}(1, 21) \propto (1 - PD)^{20}$, the posterior distribution of the PD of a CCC-rated obligor is $\text{Beta}(12, 47) \propto PD^{11} (1 - PD)^{46}$, and so on. Honeywell was rated A by S&P during the year, so the posterior distribution of its PD is Beta(1, 387), which is plotted in Figure 1. Even if the maximum likelihood estimate of a PD is zero, the posterior mean of the PD is above zero.

The credit scoring model using the Altman's Z score variables yields estimates of the constant and the five coefficients of the explanatory variables. We draw 5,000 samples from their posterior distributions using the Gibbs sampler. The first 2,000 are discarded as burn-in. All MCMC chains converged very quickly. The MCMC draws and posterior densities for the six parameters are plotted in Figures 2 and 3. The posterior means and standard deviations are presented in Table 5. It can be seen that four of the five explanatory variables are significant, and three of them are significantly negative, which is reasonable since those variables should be low when default happens. Retained earning/total assets is not significant, and the coefficient of sales/total assets seems to contradict our expectation. With the MCMC draws from the posterior distribution of the parameters, we can easily obtain
Figure 1: Posterior distribution of PD of Honeywell, Beta(1,387), using the rating migration model
Figure 2: MCMC draws of six parameters in the credit scoring model. Panels: constant, working capital/total assets, retained earning/total assets, EBIT/total assets, market value of equity/book value of total debt, sales/total assets.
Figure 3: Posterior densities of six parameters in the credit scoring model. Panels: constant, working capital/total assets, retained earning/total assets, EBIT/total assets, market value of equity/book value of total debt, sales/total assets.
Figure 4: Posterior distribution of PD of Honeywell, using the credit scoring model

MCMC draws for the PD of any firm by computing the function $PD_i = \Phi(X_i \beta)$. These MCMC draws are from the posterior distributions of the PDs. The posterior distribution of the PD of Honeywell using the credit scoring model is displayed in Figure 4. Compared with the highly skewed posterior distribution in the rating migration model, the posterior distribution of Honeywell's PD here seems to be roughly normal. The posterior mean and mode are also much larger than those from the rating migration model.

Using the daily stock prices of Honeywell in 2005, we estimate the asset value model. We again discard the first 2,000 draws as burn-in and keep 3,000 draws. The traces in Figure 5 show that the MCMC algorithms converge very well for the three parameters and the latent variable that we are interested in. The posterior distributions of the four quantities are plotted in Figure 6. Again, with the MCMC draws from the posterior distribution of the parameters and the latent variable, we can obtain MCMC draws for the PD based on the PD formula of the asset value model. The posterior distribution of the PD of Honeywell is plotted in Figure 7. This distribution is even more skewed to the right than the posterior distribution from the rating migration model, and also much flatter.

MCMC algorithms fail to estimate the credit spread model using the real data of Honeywell
Figure 5: MCMC draws of three parameters and a latent variable in the asset value model. Panels: µ, σ, σe, vt.
Figure 6: Posterior densities of three parameters and a latent variable in the asset value model. Panels: µ*, σ, σe, vt.
Figure 7: Posterior distribution of PD of Honeywell using the asset value model
Figure 8: Corporate bond yields and credit spreads of Honeywell before May 31 2005. Panels: bond yields (corporate bond yields and Treasury strip yields) and credit spreads.

credit spreads before May 31 2005, though they work well in the simulation study. From the credit spreads plotted in Figure 8, we find that the series does not show the mean-reverting feature of the CIR model. Although the assumption that interest rates are mean-reverting is economically sound, there is no economic reason to claim that default intensities also follow a mean-reverting process. The selection of the CIR model is based on the need for a closed-form solution for bond prices rather than on its fit to the real data. At the same time, we also have reasons to doubt the quality of corporate bond prices and credit spreads, given the illiquidity of corporate bond markets. Either a misspecified model or badly behaved data may cause MCMC algorithms to collapse. These empirical results show again why credit spread models are rarely used for the purpose of PD estimation.

Finally, we pool the three posterior distributions of the PD of Honeywell at the end of May 2005 in Figure 9. The three posterior distributions are so different from one another that it is not appropriate to display them in a single plot, so each is shown in its own panel. The vertical line in the three panels represents the CDS price of Honeywell on May 31 2005, 6.8 bp, which can be used here to evaluate how far the three distributions are from one another. From the graphs, we can observe that the posterior
32
Rating Migration Model 300 200 100 0 100
0
0.002
0.004
0.006
0.008 0.01 Credit Scoring Model
0.012
0.014
0.016
0.018
50
0 5 4 3 2 1 0
0
0.005
0.01
0.015
0.02 0.025 Asset Value Model
0.03
0.035
0.04
0
0.05
0.1
0.15
0.2
0.25
0.3
Figure 9: Comparison of three posterior distributions of Honeywell’s PD using three different credit risk models. The vertical lines are the CDS price of 0.00068 density corresponding to the rating migration model has the highest peak, above 200, among the three, and the smallest variance. The posterior mode is near to the CDS price. The posterior density corresponding to the credit scoring model has a lower peak, below 100, and is less skewed. But its posterior mode or mean, even the majority of the distribution, is well above the CDS price. We conclude that the credit scoring model tends to overestimate the PD of Honeywell compared to the other models. The posterior density corresponding to the asset value model is very flat and the peak is below 4. As a result, it has a very fat tail and large variance. This posterior distribution seems to be much less informative about the PD than the other two. This should be due to a much smaller sample size and more complicated state-space model structure used by the asset value model. Cross-sectional data are used in both the rating migration and credit scoring model and the sample size is over one thousand, while the time series data used by the asset value model only contains around one hundred daily stock prices. Moreover, the asset value model tends to underestimate the PD since the posterior mode is the nearest to zero among the three. This result coincides the finding by Jones et. al. (1984) and Eoms et. al. (2003).
Figure 10: Combination of two posterior distributions of Honeywell’s PD from the credit scoring model and the asset value model (curves: Asset Value Model, Credit Scoring Model, Combined)

Figure 10 shows how we can combine two posterior distributions of the same PD by using the importance sampling algorithm. We treat the posterior distribution from the asset value model as the prior distribution in the credit scoring model. Although the asset value model yields a flat posterior distribution, it still assigns more weight (more importance) to the draws near zero. Thus, the combined posterior density has a higher peak than the posterior density from the credit scoring model, and its peak lies to the left of the peak of the credit scoring model. As a result, the combined posterior distribution is more concentrated and has a smaller variance. If the credit scoring model suffers from overestimation and the asset value model suffers from underestimation, the combined PD estimate is likely to be more moderate than either.
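The combination step described for Figure 10 can be sketched with sampling/importance resampling (SIR; Rubin, 1988). This is a minimal illustration under stated assumptions, not the implementation used in the paper: the two sets of draws are synthetic Beta stand-ins rather than Honeywell results, the function name `combine_by_sir` is hypothetical, and the credit scoring likelihood is approximated by a kernel density estimate of its flat-prior posterior draws.

```python
import numpy as np
from scipy.stats import gaussian_kde


def combine_by_sir(prior_draws, likelihood, n_out, rng):
    """Reweight draws from a prior by a likelihood and resample (SIR)."""
    weights = np.asarray(likelihood(prior_draws), dtype=float)
    weights = weights / weights.sum()  # normalised importance weights
    idx = rng.choice(prior_draws.size, size=n_out, replace=True, p=weights)
    return prior_draws[idx]


rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions, not the paper's output):
pd_asset = rng.beta(1.0, 20.0, size=5000)  # diffuse posterior, most mass near zero
pd_score = rng.beta(3.0, 60.0, size=5000)  # more peaked posterior, further right

# Under a flat prior the credit scoring posterior is proportional to its
# likelihood, so a kernel density estimate of its draws stands in for it.
score_likelihood = gaussian_kde(pd_score)

# Asset value posterior acts as the prior; credit scoring supplies the weights.
pd_combined = combine_by_sir(pd_asset, score_likelihood, 5000, rng)

print("credit scoring mean/var:", pd_score.mean(), pd_score.var())
print("combined mean/var:      ", pd_combined.mean(), pd_combined.var())
```

With these stand-ins the combined draws are pulled towards zero and are less dispersed than the credit scoring draws, which is the qualitative pattern the text describes. Resampling with replacement keeps the sketch short; with very uneven weights a larger pool of prior draws would be needed.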
5 Concluding Remarks
The classical approach to applying a quantitative model to credit risk is to build a model, collect data for it, and make statistical inference, including estimating the parameters and testing the validity of the model. The implementation of this approach suffers from data insufficiency and model uncertainty. Data insufficiency results from the infrequent nature of defaults, which often leaves the sample too small to yield a reliable estimate. Model uncertainty is common in every social science discipline in which multiple competing models are available, and so far there is no consensus about which credit risk model is the “best”. These problems limit the usefulness of quantitative models in credit risk. Practitioners often need to rely on naive assumptions or crude subjective judgement to supplement the quantitative analysis.

This paper aims neither at solving these problems nor at proposing a “superior” model that yields better estimates or predictions than existing credit risk models. Rather, it provides an alternative to the classical approach. Instead of selecting the “best” model and sticking to it in practice, we suggest using the Bayesian approach to combine the information about default contained in heterogeneous data sets and across various types of credit risk models. The combined PD estimate, which incorporates more information than any individual credit risk model, should be more efficient in theory and more robust against data and model problems in practice.

This paper is far from the end of the research project, since only the simplest credit risk models are investigated. To produce more accurate PD estimates, more complicated quantitative models should be allowed for practical use. Whether different models can be combined needs to be explored case by case. We hope this paper provides a useful starting point for this approach.
A MCMC Algorithms for the Credit Spread Model
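Given that we can observe the credit spreads $s_t = y_t - y_t^f$, we try to estimate the parameters and latent variables in the following model:

\[
\begin{aligned}
s_t &= B_2 h_t - A_2 + \sigma_{e2}\,\varepsilon_t, \\
h_t &= \kappa_2\theta_2\Delta + (1-\kappa_2\Delta)\,h_{t-\Delta} + \sigma_2\sqrt{h_{t-\Delta}\,\Delta}\;\varepsilon_{2t},
\end{aligned}
\tag{A.1}
\]

where

\[
\begin{aligned}
A_2 &= \frac{1}{T-t}\cdot\frac{2\kappa_2\theta_2}{\sigma_2^2}\,
\log\frac{2\gamma_2\exp[(\kappa_2+\lambda_2+\gamma_2)(T-t)/2]}{(\kappa_2+\lambda_2+\gamma_2)[\exp(\gamma_2(T-t))-1]+2\gamma_2}; \\
B_2 &= \frac{1}{T-t}\cdot\frac{2[\exp(\gamma_2(T-t))-1]}{(\kappa_2+\lambda_2+\gamma_2)[\exp(\gamma_2(T-t))-1]+2\gamma_2}; \\
\gamma_2 &= \sqrt{(\kappa_2+\lambda_2)^2+2\sigma_2^2}.
\end{aligned}
\]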
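As a hedged illustration of the state-space model in (A.1), the sketch below evaluates the coefficients A2 and B2 and simulates spreads from the Euler-discretized intensity. The function names, the parameter values, the use of a constant time to maturity `tau` for every observation, and the small positive floor that keeps the discretized intensity above zero are all assumptions made for this example and are not part of the paper.

```python
import numpy as np


def cir_yield_coeffs(kappa, theta, sigma, lam, tau):
    """Return (A2, B2) so that the model spread is B2 * h - A2, with tau = T - t."""
    gam = np.sqrt((kappa + lam) ** 2 + 2.0 * sigma ** 2)
    growth = np.exp(gam * tau) - 1.0
    denom = (kappa + lam + gam) * growth + 2.0 * gam
    log_term = np.log(2.0 * gam * np.exp((kappa + lam + gam) * tau / 2.0) / denom)
    A2 = (2.0 * kappa * theta / sigma ** 2) * log_term / tau
    B2 = 2.0 * growth / denom / tau
    return A2, B2


def simulate_spreads(kappa, theta, sigma, lam, sigma_e, tau, dt, n, h0, rng):
    """Simulate the latent intensity h (Euler scheme) and noisy spreads s."""
    h = np.empty(n + 1)
    h[0] = h0
    for i in range(1, n + 1):
        shock = sigma * np.sqrt(h[i - 1] * dt) * rng.standard_normal()
        h_next = kappa * theta * dt + (1.0 - kappa * dt) * h[i - 1] + shock
        h[i] = max(h_next, 1e-8)  # floor (assumption): Euler steps can go negative
    A2, B2 = cir_yield_coeffs(kappa, theta, sigma, lam, tau)
    s = B2 * h[1:] - A2 + sigma_e * rng.standard_normal(n)
    return h, s


rng = np.random.default_rng(0)
h, s = simulate_spreads(kappa=0.5, theta=0.02, sigma=0.05, lam=-0.1,
                        sigma_e=0.001, tau=5.0, dt=1.0 / 252, n=80,
                        h0=0.02, rng=rng)
print(h[:3], s[:3])
```

A quick sanity check on the coefficients: as the volatility goes to zero with a zero risk premium, the intensity is deterministic, so the model spread at h = θ should be close to θ itself.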
Note that we discretize the CIR diffusion process using the Euler scheme. Strictly speaking, the discretization is not necessary, since we are able to evaluate the transition density of the CIR process, which is proportional to a non-central chi-square density. Given the model specification, the conditional posterior densities of the five parameters fall into two categories:

• for $\sigma_{e2}$ and $\lambda_2$, the conditional posterior is of the form

\[
p(\sigma_{e2}\,|\,\cdot) \propto \prod_{i=1}^{N}\phi(s_i;\hat{s}_i,\sigma_{e2}^2)\cdot p(\sigma_{e2}),
\]

where $\hat{s}_i = B_2 h_i - A_2$ is a nonlinear function of the three parameters $\kappa_2$, $\lambda_2$, $\sigma_2$, but a linear function of the parameter $\theta_2$, and $\phi(x;\mu,V)$ is the normal density function of a random variable $x$ with mean $\mu$ and variance $V$.

• for $\kappa_2$, $\theta_2$ and $\sigma_2$, the conditional posterior is of the form

\[
p(\kappa_2\,|\,\cdot) \propto \prod_{i=1}^{N}\phi(s_i;\hat{s}_i(\kappa_2),\sigma_{e2}^2)\cdot
\prod_{i=1}^{N}\phi\big(h_i;\ \kappa_2\theta_2\Delta+(1-\kappa_2\Delta)h_{i-1},\ \sigma_2^2 h_{i-1}\Delta\big)\cdot p(\kappa_2).
\tag{A.2}
\]
Assuming noninformative priors, we use a Gibbs sampler to draw samples of the five parameters from their conditional posteriors iteratively:

1. Draw $\sigma_{e2}^2$ from

\[
IG\Big(\frac{N}{2},\ \frac{1}{2}\sum_{i=1}^{N}(s_i-\hat{s}_i)^2\Big);
\]
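2. Draw $\lambda_2$ using a Metropolis-Hastings (MH) step. The proposal, conditioned on the last draw $\lambda_2$, is a normal distribution $q(\lambda_2'\,|\,\lambda_2) = N(\mu_\lambda, V_\lambda)$ where

\[
\mu_\lambda = \lambda_2 + V_\lambda\cdot\sum_{i=1}^{N}\frac{(s_i-\hat{s}_i(\lambda_2))\,\partial\hat{s}_i(\lambda_2)/\partial\lambda_2}{\sigma_{e2}^2};
\qquad
V_\lambda^{-1} = \sum_{i=1}^{N}\frac{[\partial\hat{s}_i(\lambda_2)/\partial\lambda_2]^2}{\sigma_{e2}^2}.
\]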
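3. Draw $\theta_2$ using an independent MH step with the proposal $q(\theta_2) = N(\mu_\theta, V_\theta)$ where

\[
\begin{aligned}
\mu_\theta &= V_\theta\cdot\left[\sum_{i=1}^{N}\frac{(B_2 h_i - s_i)\,\partial A_2/\partial\theta_2}{\sigma_{e2}^2}
+ \sum_{i=1}^{N}\frac{\kappa_2\,[h_i-(1-\kappa_2\Delta)h_{i-1}]}{\sigma_2^2 h_{i-1}}\right]; \\
V_\theta^{-1} &= N\,\frac{[\partial A_2/\partial\theta_2]^2}{\sigma_{e2}^2}
+ \sum_{i=1}^{N}\frac{\kappa_2^2\Delta}{\sigma_2^2 h_{i-1}}.
\end{aligned}
\tag{A.3}
\]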
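4. Draw $\kappa_2$ using an MH step with the proposal $q(\kappa_2'\,|\,\kappa_2) = N(\mu_\kappa, V_\kappa)$ where

\[
\begin{aligned}
\mu_\kappa &= \kappa_2 + V_\kappa\cdot\left[\sum_{i=1}^{N}\frac{(s_i-\hat{s}_i(\kappa_2))\,\partial\hat{s}_i(\kappa_2)/\partial\kappa_2}{\sigma_{e2}^2}
+ \sum_{i=1}^{N}\frac{\kappa_2\,[h_i-(1-\kappa_2\Delta)h_{i-1}]}{\sigma_2^2 h_{i-1}}\right]; \\
V_\kappa^{-1} &= \sum_{i=1}^{N}\frac{[\partial\hat{s}_i(\kappa_2)/\partial\kappa_2]^2}{\sigma_{e2}^2}
+ \sum_{i=1}^{N}\frac{\kappa_2\Delta\,(h_{i-1}-\theta_2)^2}{\sigma_2^2 h_{i-1}}.
\end{aligned}
\tag{A.4}
\]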
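5. Let $\phi_2 = 1/\sigma_2^2$, and draw $\phi_2$ using an MH step. The proposal is a Gamma distribution

\[
q(\phi_2'\,|\,\phi_2) = \mathrm{Gamma}\left(\frac{\mu_\phi^2}{V_\phi}+\frac{N}{2},\ \
\frac{\mu_\phi}{V_\phi}+\frac{1}{2}\sum_{i=1}^{N}\frac{[h_i-(1-\kappa_2\Delta)h_{i-1}-\kappa_2\theta_2\Delta]^2}{h_{i-1}\Delta}\right)
\]

where

\[
\mu_\phi = \phi_2 + V_\phi\cdot\sum_{i=1}^{N}\frac{(s_i-\hat{s}_i(\phi_2))\,\partial\hat{s}_i(\phi_2)/\partial\phi_2}{\sigma_{e2}^2};
\qquad
V_\phi^{-1} = \sum_{i=1}^{N}\frac{[\partial\hat{s}_i(\phi_2)/\partial\phi_2]^2}{\sigma_{e2}^2}.
\tag{A.5}
\]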
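6. Draw the latent variables $\{h_i\}_{i=0}^{N}$ iteratively under the single-move approach. At each move, draw $h_i$ using an MH step. The proposal is a normal distribution $q(h_i'\,|\,h_i) = N(\mu_h, V_h)$ where

\[
\begin{aligned}
\mu_h &= V_h\cdot\left[\frac{(s_i+A_2)B_2}{\sigma_{e2}^2}
+ \frac{\kappa_2\theta_2\Delta+(1-\kappa_2\Delta)h_{i-1}}{\sigma_2^2 h_{i-1}\Delta}
+ \frac{(h_{i+1}-\kappa_2\theta_2\Delta)(1-\kappa_2\Delta)}{\sigma_2^2 h_i\Delta}\right]; \\
V_h^{-1} &= \frac{B_2^2}{\sigma_{e2}^2} + \frac{1}{\sigma_2^2 h_{i-1}\Delta} + \frac{(1-\kappa_2\Delta)^2}{\sigma_2^2 h_i\Delta}.
\end{aligned}
\]

7. Repeat the above steps for a number of iterations.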
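Steps 1 and 2 of the sampler above can be sketched as follows. This is an illustrative sketch rather than the paper's code: all function names are hypothetical, the time to maturity `tau` is held constant across observations, the analytic derivative of the model spread with respect to λ2 is replaced by a central finite difference, the prior on λ2 is taken to be flat, and the latent intensities `h` are held fixed at synthetic values.

```python
import numpy as np
from scipy.stats import norm


def model_spread(h, kappa, theta, sigma, lam, tau):
    """s_hat = B2 * h - A2 for a fixed time to maturity tau (an assumption)."""
    gam = np.sqrt((kappa + lam) ** 2 + 2.0 * sigma ** 2)
    growth = np.exp(gam * tau) - 1.0
    denom = (kappa + lam + gam) * growth + 2.0 * gam
    log_term = np.log(2.0 * gam * np.exp((kappa + lam + gam) * tau / 2.0) / denom)
    A2 = (2.0 * kappa * theta / sigma ** 2) * log_term / tau
    B2 = 2.0 * growth / denom / tau
    return B2 * h - A2


def draw_sigma_e2(s, s_hat, rng):
    """Step 1: IG(N/2, sum((s - s_hat)^2)/2), drawn as scale / Gamma(shape)."""
    return 0.5 * np.sum((s - s_hat) ** 2) / rng.gamma(0.5 * s.size)


def lambda_proposal(lam, s, h, kappa, theta, sigma, sigma_e2, tau, eps=1e-6):
    """Mean and variance of the normal proposal for lambda_2 (Step 2)."""
    s_hat = model_spread(h, kappa, theta, sigma, lam, tau)
    # central finite difference stands in for the analytic derivative
    grad = (model_spread(h, kappa, theta, sigma, lam + eps, tau)
            - model_spread(h, kappa, theta, sigma, lam - eps, tau)) / (2.0 * eps)
    var = sigma_e2 / np.sum(grad ** 2)
    mean = lam + var * np.sum((s - s_hat) * grad) / sigma_e2
    return mean, var


def log_post_lambda(lam, s, h, kappa, theta, sigma, sigma_e2, tau):
    """Log conditional posterior of lambda_2 under a flat prior (assumption)."""
    resid = s - model_spread(h, kappa, theta, sigma, lam, tau)
    return -0.5 * np.sum(resid ** 2) / sigma_e2


def mh_step_lambda(lam, rng, *args):
    """One MH update; the proposal depends on the current state, so both
    the forward and the backward proposal densities enter the ratio."""
    m_fwd, v_fwd = lambda_proposal(lam, *args)
    cand = rng.normal(m_fwd, np.sqrt(v_fwd))
    m_bwd, v_bwd = lambda_proposal(cand, *args)
    log_rho = (log_post_lambda(cand, *args) - log_post_lambda(lam, *args)
               + norm.logpdf(lam, m_bwd, np.sqrt(v_bwd))
               - norm.logpdf(cand, m_fwd, np.sqrt(v_fwd)))
    if np.log(rng.uniform()) < log_rho:
        return cand, True
    return lam, False


rng = np.random.default_rng(1)
kappa, theta, sigma, lam_true, tau = 0.5, 0.02, 0.05, -0.1, 5.0
h = 0.02 + 0.002 * rng.random(100)  # synthetic latent intensities
s = (model_spread(h, kappa, theta, sigma, lam_true, tau)
     + 0.0005 * rng.standard_normal(100))

lam, lam_draws = -0.08, []
for _ in range(200):
    s_hat = model_spread(h, kappa, theta, sigma, lam, tau)
    sigma_e2 = draw_sigma_e2(s, s_hat, rng)
    lam, _ = mh_step_lambda(lam, rng, s, h, kappa, theta, sigma, sigma_e2, tau)
    lam_draws.append(lam)
print(np.mean(lam_draws[100:]), sigma_e2)
```

Steps 3 to 6 follow the same pattern, with the proposal moments replaced by those given for each parameter and for the latent variables.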
B MCMC Algorithms for the Asset Value Model
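Suppose that the observed data consist of a sequence of stock prices of the obligor, $S = \{S_{i\Delta}\}_{i=1}^{n}$, observed at equal time intervals $\Delta$ before the maturity date $T$. Before we proceed, let us reparameterize the model (2.27) as

\[
\begin{aligned}
S_t &= C(v_t,\sigma) + \varepsilon_t, \\
v_t &= v_{t-\Delta} + \mu^*\Delta + \sigma\sqrt{\Delta}\,\eta_t,
\end{aligned}
\]

where $v_t = \ln(V_t)$, $\mu^* = \mu - \sigma^2/2$, $\eta_t \sim N(0,1)$, and the function $C(v_t,\sigma) = \exp(v_t)\Phi(d_1) - \exp[-r(T-t)]\Phi(d_2)$. The joint posterior density function of the set of latent variables $v = \{v_{i\Delta}\}_{i=1}^{n}$ and the parameters $\Theta = (\sigma_e, \mu^*, \sigma)$ can be derived using Bayes' rule as:

\begin{align}
p(v,\Theta\,|\,S) &\propto p(S\,|\,v,\Theta)\cdot p(v\,|\,\Theta)\cdot p(\Theta) \tag{B.1}\\
&\propto \prod_{i=1}^{n} p(S_{i\Delta}\,|\,v_{i\Delta},\sigma,\sigma_e)\cdot\prod_{i=1}^{n} p(v_{i\Delta}\,|\,v_{(i-1)\Delta},\mu^*,\sigma)\cdot p(\Theta) \tag{B.2}\\
&\propto \prod_{i=1}^{n}\frac{1}{\sigma_e}\exp\left\{-\frac{(S_{i\Delta}-C(v_{i\Delta},\sigma))^2}{2\sigma_e^2}\right\}\cdot
\prod_{i=1}^{n}\frac{1}{\sigma}\exp\left\{-\frac{(v_{i\Delta}-v_{(i-1)\Delta}-\mu^*\Delta)^2}{2\sigma^2\Delta}\right\}\cdot p(\Theta). \notag
\end{align}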
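To make the joint posterior kernel concrete, the following sketch evaluates the equity pricing function C and the log of the unnormalised joint posterior. It is an illustration under assumptions rather than the paper's code: the function names are hypothetical, the formula for C is read as having the debt face value normalised to one (no face value appears in it), the initial log asset value `v[0]` is treated as given, the prior enters only through a user-supplied `log_prior` term, and the data are simulated.

```python
import numpy as np
from scipy.stats import norm


def equity_value(v, sigma, r, tau):
    """C(v, sigma) = exp(v) Phi(d1) - exp(-r tau) Phi(d2), with tau = T - t."""
    d1 = (v + (r + 0.5 * sigma ** 2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return np.exp(v) * norm.cdf(d1) - np.exp(-r * tau) * norm.cdf(d2)


def log_joint_kernel(v, S, tau, mu_star, sigma, sigma_e, r, dt, log_prior=0.0):
    """Log of the unnormalised joint posterior p(v, Theta | S).

    v has length n + 1 (v[0] is the initial log asset value, treated as given);
    S and tau have length n.
    """
    n = S.size
    resid = S - equity_value(v[1:], sigma, r, tau)  # observation equation
    incr = np.diff(v) - mu_star * dt                # state equation
    log_obs = -n * np.log(sigma_e) - 0.5 * np.sum(resid ** 2) / sigma_e ** 2
    log_state = -n * np.log(sigma) - 0.5 * np.sum(incr ** 2) / (sigma ** 2 * dt)
    return log_obs + log_state + log_prior


# Simulated data (assumed parameter values, for illustration only)
rng = np.random.default_rng(0)
n, dt, r, T = 100, 1.0 / 252, 0.03, 1.0
tau = T - dt * np.arange(1, n + 1)
mu_star, sigma, sigma_e = 0.05, 0.2, 0.01
steps = mu_star * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)
v = np.concatenate(([0.2], 0.2 + np.cumsum(steps)))
S = equity_value(v[1:], sigma, r, tau) + sigma_e * rng.standard_normal(n)

lp_true = log_joint_kernel(v, S, tau, mu_star, sigma, sigma_e, r, dt)
v_shifted = v.copy()
v_shifted[1:] += 1.0  # latent path moved far away from the data
lp_shifted = log_joint_kernel(v_shifted, S, tau, mu_star, sigma, sigma_e, r, dt)
print(lp_true, lp_shifted)
```

Working on the log scale avoids underflow when the product over n observations is formed.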
We use a Gibbs sampler to draw samples from the marginal posterior distribution of each hyperparameter. To implement the Gibbs sampler, we need the full conditional posterior density functions of all hyperparameters. For simplicity, we assume independent noninformative priors (conjugate priors $\mu^* \sim N$, $\sigma^2 \sim IG$, $\sigma_e^2 \sim IG$ can also be used):

\[
p(\mu^*) \propto c, \qquad p(\sigma^2) \propto \sigma^{-2}, \qquad p(\sigma_e^2) \propto \sigma_e^{-2}.
\]

The full conditional posterior density functions are derived as follows.
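• The conditional posterior distribution of $\mu^*$ is normal:

\[
\begin{aligned}
p(\mu^*\,|\,\cdot) &\propto p(v\,|\,\mu^*,\sigma)\cdot p(\mu^*) \\
&\propto \prod_{i=1}^{n}\exp\left\{-\frac{(v_{i\Delta}-v_{(i-1)\Delta}-\mu^*\Delta)^2}{2\sigma^2\Delta}\right\} \\
&\sim N\left(\frac{1}{n\Delta}\sum_{i=1}^{n}(v_{i\Delta}-v_{(i-1)\Delta}),\ \frac{\sigma^2}{n\Delta}\right),
\end{aligned}
\tag{B.3}
\]

and the conditional posterior distribution of $\sigma_e^2$ is inverted Gamma:

\[
\begin{aligned}
p(\sigma_e^2\,|\,\cdot) &\propto p(S\,|\,\sigma_e^2,\sigma,v)\cdot p(\sigma_e^2) \\
&\propto \prod_{i=1}^{n}\frac{1}{\sigma_e}\exp\left\{-\frac{[S_{i\Delta}-C(v_{i\Delta},\sigma)]^2}{2\sigma_e^2}\right\}\cdot\frac{1}{\sigma_e^2} \\
&\sim IG\left(\frac{n}{2},\ \frac{1}{2}\sum_{i=1}^{n}[S_{i\Delta}-C(v_{i\Delta},\sigma)]^2\right).
\end{aligned}
\tag{B.4}
\]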
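• The conditional posterior distribution of $\sigma^2$ is nonstandard:

\begin{align}
p(\sigma^2\,|\,\cdot) &\propto p(S\,|\,v,\sigma)\cdot p(v\,|\,\sigma)\cdot p(\sigma^2) \tag{B.5}\\
&\propto \prod_{i=1}^{n}\exp\left\{-\frac{[S_{i\Delta}-C(v_{i\Delta},\sigma)]^2}{2\sigma_e^2}\right\}\cdot
\prod_{i=1}^{n}\frac{1}{\sigma\sqrt{\Delta}}\exp\left\{-\frac{[v_{i\Delta}-v_{(i-1)\Delta}-\mu^*\Delta]^2}{2\sigma^2\Delta}\right\}\cdot\frac{1}{\sigma^2} \tag{B.6}
\end{align}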
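where the second part of the posterior kernel is an inverted Gamma kernel, but the first part is not recognized as the kernel of any standard distribution. We propose a Metropolis-Hastings algorithm to draw from this nonstandard distribution. Define $\phi = \sigma^{-2}$; then the posterior density function of $\phi$ is

\[
p(\phi\,|\,\cdot) \propto \prod_{i=1}^{n}\exp\left\{-\frac{[S_{i\Delta}-C(\phi)]^2}{2\sigma_e^2}\right\}\cdot
\prod_{i=1}^{n}\phi^{1/2}\exp\left\{-\frac{[v_{i\Delta}-v_{(i-1)\Delta}-\mu^*\Delta]^2}{2\Delta}\,\phi\right\}\cdot\phi,
\tag{B.7}
\]

where $C(\phi) = \exp(v_{i\Delta})\Phi(d_1) - \exp[-r(T-i\Delta)]\Phi(d_2)$ is a nonlinear function of $\phi$, since

\[
d_1 = \frac{\left[v_{i\Delta}+\left(r+\frac{1}{2\phi}\right)(T-i\Delta)\right]\sqrt{\phi}}{\sqrt{T-i\Delta}}
\quad\text{and}\quad
d_2 = \frac{\left[v_{i\Delta}+\left(r-\frac{1}{2\phi}\right)(T-i\Delta)\right]\sqrt{\phi}}{\sqrt{T-i\Delta}}.
\]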
We approximate the nonlinear function using a Taylor expansion around the last draw, say $\phi^{(j-1)}$, as

\[
S_{i\Delta}-C(\phi) \approx S_{i\Delta}-C(\phi^{(j-1)})-(\phi-\phi^{(j-1)})\,C_\phi(\phi^{(j-1)}),
\tag{B.8}
\]

where $C_\phi(\phi^{(j-1)})$ is the partial derivative of $C$ with respect to $\phi$ evaluated at $\phi^{(j-1)}$. Then the first part of the posterior density function in (B.7) can be approximated by a normal kernel for $\phi$ with mean $m$ and variance $V$, where

\[
m = \phi^{(j-1)} + V\cdot\frac{\sum_{i=1}^{n}\{[S_{i\Delta}-C(\phi^{(j-1)})]\,C_\phi(\phi^{(j-1)})\}}{\sigma_e^2},
\qquad
V = \frac{\sigma_e^2}{\sum_{i=1}^{n}C_\phi^2(\phi^{(j-1)})}.
\tag{B.9}
\]
Furthermore, we approximate this normal distribution by a Gamma distribution with the same mean and variance:

\[
N(m,V) \approx \mathrm{Gamma}\left(\frac{m^2}{V},\ \frac{m}{V}\right).
\tag{B.10}
\]

Since the product of two Gamma kernels is still a Gamma kernel, we finally derive a Gamma proposal distribution for the Metropolis-Hastings algorithm:

\[
q(\phi\,|\,\phi^{(j-1)}) \sim \mathrm{Gamma}\left(\frac{m^2}{V}+\frac{n}{2},\ \
\frac{m}{V}+\frac{1}{2\Delta}\sum_{i=1}^{n}[v_{i\Delta}-v_{(i-1)\Delta}-\mu^*\Delta]^2\right).
\tag{B.11}
\]
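• The conditional posterior distribution of the latent variable $v_{i\Delta}$, $i = 1, 2, \cdots, n$, is also nonstandard:

\begin{align}
p(v_{i\Delta}\,|\,\cdot) &\propto p(S_{i\Delta}\,|\,v_{i\Delta},\Theta)\cdot p(v_{(i+1)\Delta}\,|\,v_{i\Delta},\Theta)\cdot p(v_{i\Delta}\,|\,v_{(i-1)\Delta},\Theta) \tag{B.12}\\
&\propto \exp\left\{-\frac{[S_{i\Delta}-C(v_{i\Delta})]^2}{2\sigma_e^2}\right\}\cdot
\exp\left\{-\frac{[v_{(i+1)\Delta}-v_{i\Delta}-\mu^*\Delta]^2}{2\sigma^2\Delta}\right\}\cdot
\exp\left\{-\frac{[v_{i\Delta}-v_{(i-1)\Delta}-\mu^*\Delta]^2}{2\sigma^2\Delta}\right\}. \tag{B.13}
\end{align}

The last two components of the product are normal kernels for $v_{i\Delta}$, but the first component is nonstandard. Again, we use a Taylor expansion around the last draw $v_{i\Delta}^{(j-1)}$ to approximate the first component as

\[
S_{i\Delta}-C(v_{i\Delta}) \approx S_{i\Delta}-C(v_{i\Delta}^{(j-1)})-(v_{i\Delta}-v_{i\Delta}^{(j-1)})\,C_v(v_{i\Delta}^{(j-1)}),
\tag{B.14}
\]

where $C_v(v_{i\Delta}^{(j-1)})$ is the partial derivative of $C$ with respect to $v_{i\Delta}$ evaluated at $v_{i\Delta}^{(j-1)}$. Then we derive a normal proposal distribution for the Metropolis-Hastings algorithm:

\[
q(v_{i\Delta}\,|\,v_{i\Delta}^{(j-1)}) \sim N(m_v, V_v),
\tag{B.15}
\]

where

\[
\begin{aligned}
m_v &= v_{i\Delta}^{(j-1)} + V_v\left\{\frac{[S_{i\Delta}-C(v_{i\Delta}^{(j-1)})]\,C_v(v_{i\Delta}^{(j-1)})}{\sigma_e^2}
+ \frac{v_{(i+1)\Delta}+v_{(i-1)\Delta}-2v_{i\Delta}^{(j-1)}}{\sigma^2\Delta}\right\}, \\
V_v^{-1} &= \frac{C_v^2(v_{i\Delta}^{(j-1)})}{\sigma_e^2} + \frac{2}{\sigma^2\Delta}.
\end{aligned}
\]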
Note that $v_{n\Delta}$ and $v_0$ have slightly different posterior densities and proposal distributions. The MCMC algorithm is implemented in the following steps:

Step 1 Set the initial values $\mu^{*(0)}$, $\sigma^{(0)}$, $\sigma_e^{(0)}$ and $v^{(0)}$;

Step 2 At the j-th iteration, draw $\mu^{*(j)}$ from the normal distribution in (B.3);

Step 3 Draw $\sigma_e^{2(j)}$ from the inverted Gamma distribution in (B.4);

Step 4 Draw $\phi'$ from the proposal Gamma distribution in (B.11), accept this draw and let $\phi^{(j)} = \phi'$ with probability

\[
\rho = \min\left\{1,\ \frac{p(\phi'\,|\,\cdot)}{p(\phi^{(j-1)}\,|\,\cdot)}\cdot\frac{q(\phi^{(j-1)}\,|\,\phi')}{q(\phi'\,|\,\phi^{(j-1)})}\right\},
\]

otherwise let $\phi^{(j)} = \phi^{(j-1)}$, and recover the draw of $\sigma$ by letting $\sigma^{(j)} = 1/\sqrt{\phi^{(j)}}$;

Step 5 Draw $\{v_{i\Delta}\}_{i=1}^{n}$ iteratively under the single-move approach. At each move, draw $v_{i\Delta}'$ from the normal proposal distribution in (B.15) and accept it with probability

\[
\rho = \min\left\{1,\ \frac{p(v_{i\Delta}'\,|\,\cdot)}{p(v_{i\Delta}^{(j-1)}\,|\,\cdot)}\cdot\frac{q(v_{i\Delta}^{(j-1)}\,|\,v_{i\Delta}')}{q(v_{i\Delta}'\,|\,v_{i\Delta}^{(j-1)})}\right\};
\]

Step 6 Repeat Step 2 through Step 5 for M iterations.
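Steps 2 to 4 of the algorithm can be sketched as below. This is a simplified illustration, not the paper's implementation: the latent path `v` is held fixed at simulated values (Step 5 is omitted), the derivative of C with respect to φ is taken by a central finite difference, the floor on `m` is a safeguard added here to keep the Gamma rate positive, and all function names and parameter values are hypothetical. The variance of the draw in Step 2 is obtained by completing the square in the transition density.

```python
import numpy as np
from scipy.stats import norm, gamma


def equity_value(v, phi, r, tau):
    """C as a function of phi = sigma^(-2), debt face value read as one."""
    sigma = 1.0 / np.sqrt(phi)
    d1 = (v + (r + 0.5 / phi) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return np.exp(v) * norm.cdf(d1) - np.exp(-r * tau) * norm.cdf(d2)


def draw_mu_star(v, phi, dt, rng):
    """Step 2: normal full conditional of mu* under a flat prior."""
    incr = np.diff(v)
    n = incr.size
    # completing the square gives precision n * dt * phi
    return rng.normal(incr.sum() / (n * dt), np.sqrt(1.0 / (n * dt * phi)))


def draw_sigma_e2(S, C, rng):
    """Step 3: IG(n/2, sum((S - C)^2)/2), drawn as scale / Gamma(shape)."""
    return 0.5 * np.sum((S - C) ** 2) / rng.gamma(0.5 * S.size)


def log_post_phi(phi, S, v, tau, mu_star, sigma_e2, r, dt):
    """Log of the kernel in (B.7)."""
    resid = S - equity_value(v[1:], phi, r, tau)
    incr = np.diff(v) - mu_star * dt
    return (-0.5 * np.sum(resid ** 2) / sigma_e2
            + (0.5 * incr.size + 1.0) * np.log(phi)
            - 0.5 * phi * np.sum(incr ** 2) / dt)


def gamma_proposal(phi0, S, v, tau, mu_star, sigma_e2, r, dt):
    """Shape and rate of the Gamma proposal (B.9)-(B.11) around phi0."""
    eps = 1e-5 * phi0
    C0 = equity_value(v[1:], phi0, r, tau)
    # central finite difference stands in for the analytic derivative C_phi
    C_phi = (equity_value(v[1:], phi0 + eps, r, tau)
             - equity_value(v[1:], phi0 - eps, r, tau)) / (2.0 * eps)
    V = sigma_e2 / np.sum(C_phi ** 2)
    m = phi0 + V * np.sum((S - C0) * C_phi) / sigma_e2
    m = max(m, 1e-8)  # safeguard (not in the paper): keeps the rate positive
    incr = np.diff(v) - mu_star * dt
    return m ** 2 / V + 0.5 * incr.size, m / V + 0.5 * np.sum(incr ** 2) / dt


def mh_step_phi(phi0, rng, *args):
    """Step 4: propose from the Gamma distribution and accept or reject."""
    a_f, b_f = gamma_proposal(phi0, *args)
    cand = rng.gamma(a_f, 1.0 / b_f)  # NumPy uses shape and scale = 1 / rate
    a_b, b_b = gamma_proposal(cand, *args)
    log_rho = (log_post_phi(cand, *args) - log_post_phi(phi0, *args)
               + gamma.logpdf(phi0, a_b, scale=1.0 / b_b)
               - gamma.logpdf(cand, a_f, scale=1.0 / b_f))
    return (cand, True) if np.log(rng.uniform()) < log_rho else (phi0, False)


# Simulated data (assumed values, for illustration only)
rng = np.random.default_rng(2)
n, dt, r, T = 100, 1.0 / 252, 0.03, 1.0
tau = T - dt * np.arange(1, n + 1)
steps = 0.05 * dt + 0.2 * np.sqrt(dt) * rng.standard_normal(n)
v = np.concatenate(([0.2], 0.2 + np.cumsum(steps)))
S = equity_value(v[1:], 25.0, r, tau) + 0.01 * rng.standard_normal(n)

phi, sigma_draws = 20.0, []
for _ in range(300):
    mu_star = draw_mu_star(v, phi, dt, rng)
    sigma_e2 = draw_sigma_e2(S, equity_value(v[1:], phi, r, tau), rng)
    phi, _ = mh_step_phi(phi, rng, S, v, tau, mu_star, sigma_e2, r, dt)
    sigma_draws.append(1.0 / np.sqrt(phi))  # recover sigma from phi
print(np.mean(sigma_draws[100:]))
```

A full sampler would add the single-move update of each latent value with the normal proposal in (B.15), using the same forward and backward proposal densities in the acceptance ratio.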
References
[1] Albert, J. and S. Chib (1993), Bayesian Analysis of Binary and Polychotomous Response Data, Journal of the American Statistical Association, 88, 669-679.
[2] Altman, E. I. (1968), Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy, Journal of Finance, 23, 589-609.
[3] Basel Committee on Banking Supervision (2004), International Convergence of Capital Measurement and Capital Standards: A Revised Framework, Bank for International Settlements.
[4] Black, F. and J. Cox (1976), Valuing Corporate Securities: Some Effects of Bond Indenture Provisions, Journal of Finance, 31, 351-367.
[5] Black, F. and M. Scholes (1973), The Pricing of Options and Corporate Liabilities, Journal of Political Economy, 81, 637-654.
[6] Carlin, B. P., N. G. Polson and D. S. Stoffer (1992), A Monte Carlo Approach to Nonnormal and Nonlinear State-Space Modeling, Journal of the American Statistical Association, 87, 493-500.
[7] Carter, C. K. and R. Kohn (1994), On Gibbs Sampling for State Space Models, Biometrika, 81, 541-553.
[8] Chen, R. R. and L. Scott (1993), Multi-factor Cox-Ingersoll-Ross Models of the Term Structure: Estimates and Tests from a Kalman Filter Model, working paper, University of Georgia.
[9] Collin-Dufresne, P. and R. Goldstein (2001), Do Credit Spreads Reflect Stationary Leverage Ratios?, Journal of Finance, 56, 1929-1957.
[10] Cox, J. C., J. E. Ingersoll and S. A. Ross (1985), A Theory of the Term Structure of Interest Rates, Econometrica, 53, 385-407.
[11] Devroye, L. (1986), Non-Uniform Random Variate Generation, Springer, New York.
[12] Duan, J., G. Gauthier, J. Simonato and S. Zaanoun (2002), Maximum Likelihood Estimation of Structural Credit Spread Models - Deterministic and Stochastic Interest Rates, working paper, University of Toronto.
[13] Duffee, G. (1999), Estimating the Price of Default Risk, Review of Financial Studies, 12, 197-226.
[14] Duffie, D. and K. Singleton (1999), Modeling Term Structures of Defaultable Bonds, Review of Financial Studies, 12, 687-720.
[15] Duffie, D. and K. Singleton (2003), Credit Risk: Pricing, Measurement, and Management, Princeton University Press, Princeton.
[16] Eom, Y. H., J. Helwege, and J. Huang (2002), Structural Models of Corporate Bond Pricing: An Empirical Analysis, Review of Financial Studies, 17, 499-544.
[17] Ericsson, J. and J. Reneby (2002), Estimating Structural Bond Pricing Models, working paper, McGill University.
[18] Fruhwirth-Schnatter, S. and A. L. J. Geyer (1996), Bayesian Estimation of Econometric Multi-Factor Cox-Ingersoll-Ross Models of the Term Structure of Interest Rates Via MCMC Methods, working paper, Vienna University.
[19] Gossl, C. (2005), Predictions Based on Certain Uncertainties - A Bayesian Credit Portfolio Approach, working paper, HypoVereinsbank AG.
[20] Huang, J. and M. Huang (2003), How Much of the Corporate-Treasury Yield Spread Is Due to Credit Risk? A New Calibration Approach, working paper, Stanford University.
[21] Hull, J. and A. White (2000), Valuing Credit Default Swaps I: No Counterparty Default Risk, Journal of Derivatives, 8, 29-40.
[22] Jarrow, R., D. Lando, and S. Turnbull (1997), A Markov Model for the Term Structure of Credit Spreads, Review of Financial Studies, 10, 481-523.
[23] Jarrow, R. and S. Turnbull (1995), Pricing Options on Financial Securities Subject to Default Risk, Journal of Finance, 50, 53-86.
[24] Johannes, M. and N. G. Polson (2004), MCMC Methods for Financial Econometrics, in Handbook of Financial Econometrics, Y. Ait-Sahalia and L. Hansen eds.
[25] Jones, E., S. Mason and E. Rosenfeld (1984), Contingent Claims Analysis of Corporate Capital Structures: An Empirical Investigation, Journal of Finance, 39, 611-627.
[26] Kadam, A. and P. Lenk (2007), Bayesian Inference for Issuer Heterogeneity in Credit Ratings Migration, working paper, Cass Business School, London.
[27] Kiefer, N. M. (2006), Default Estimation for Low-Default Portfolios, working paper, Department of Economics and Statistical Science, Cornell University.
[28] Lando, D. (1998), On Cox Processes and Credit Risky Securities, Review of Derivatives Research, 2, 99-120.
[29] Lando, D. (2004), Credit Risk Modeling, Princeton University Press, Princeton.
[30] Lando, D. and T. M. Skodeberg (2002), Analyzing Rating Transitions and Rating Drift with Continuous Observations, Journal of Banking and Finance, 26, 423-444.
[31] Leland, H. and K. Toft (1996), Optimal Capital Structure, Endogenous Bankruptcy, and the Term Structure of Credit Spreads, Journal of Finance, 51, 987-1019.
[32] Longstaff, F. A. and E. S. Schwartz (1995), A Simple Approach to Valuing Risky Fixed and Floating Rate Debt, Journal of Finance, 50, 789-820.
[33] Merton, R. (1974), On the Pricing of Corporate Debt: The Risk Structure of Interest Rates, Journal of Finance, 29, 449-470.
[34] McNeil, A. J. and J. P. Wendin (2007), Bayesian Inference for Generalized Linear Mixed Models of Portfolio Credit Risk, Journal of Empirical Finance, 14, 131-149.
[35] Pluto, K. and D. Tasche (2005), Estimating Probabilities of Default for Low Default Portfolios, Deutsche Bundesbank.
[36] Robert, C. and G. Casella (1999), Monte Carlo Statistical Methods, Springer-Verlag, New York.
[37] Rubin, D. B. (1988), Using the SIR Algorithm to Simulate Posterior Distributions, in Bayesian Statistics, eds. J. M. Bernardo et al., Clarendon Press, Oxford, pp. 395-420.
[38] Shumway, T. (2001), Forecasting Bankruptcy More Accurately: A Simple Hazard Model, Journal of Business, 74, 101-124.
[39] UK Financial Services Authority (2007), Expert Group Paper on Low Default Portfolios.
[40] Vasicek, O. A. (1977), An Equilibrium Characterization of the Term Structure, Journal of Financial Economics, 5, 177-188.