Markov Modulated Poisson Processes in Credit Risk Modeling
Daniel Wei-Chung Miao Linacre College University of Oxford
D.Phil Transfer Report Hilary 2005
Contents
1 Introduction 1.1 Credit Risk Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1.1 1.1.2 1.2 1.3 Structural Approach . . . . . . . . . . . . . . . . . . . . . . . Reduced Form Approach . . . . . . . . . . . . . . . . . . . . . 1 1 1 2 3 4 5 6 8 8 10 15 15 16 18 20 23 26 30 34 39 50 54 56 56 58
Markov Modulated Poisson Processes . . . . . . . . . . . . . . . . . . Problems and Approaches in this Report . . . . . . . . . . . . . . . . 1.3.1 1.3.2 Topics in this Report . . . . . . . . . . . . . . . . . . . . . . . Organization . . . . . . . . . . . . . . . . . . . . . . . . . . .
2 The Convergence Analysis of MMPP to OU Processes 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 2.3 Moment Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . Convergence of n-th Moment 2.3.1 2.3.2 2.3.3 2.3.4 . . . . . . . . . . . . . . . . . . . . . . Definitions of Convergence Order . . . . . . . . . . . . . . . . Convergence of the First and Second Moments . . . . . . . . . Convergence of the Third Moment . . . . . . . . . . . . . . . Convergence of the Fourth Moment . . . . . . . . . . . . . . .
2.3.5 Convergence of Higher Moments . . . . . . . . . . . . . . . . . 2.4 Steady State Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5 Convergence in One Dimensional Distribution . . . . . . . . . . . . . 2.6 Convergence in Finite Dimensional Distribution . . . . . . . . . . . . 2.7 Tightness and Weak Convergence . . . . . . . . . . . . . . . . . . . . 2.8 Evaluation of Survival Probabilities . . . . . . . . . . . . . . . . . . . 2.9 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Default Term Structure in Advanced Models 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Models Extended from OU . . . . . . . . . . . . . . . . . . . . . . . .
i
3.2.1 3.2.2 3.3
Processes with Generaized Drift . . . . . . . . . . . . . . . . . Processes with Generalized Volatility . . . . . . . . . . . . . .
58 59 64 65 68 70 70 74 75 78 81
Jump Diffusion Models . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3.1 OU-based Jump Diffusion . . . . . . . . . . . . . . . . . . . . 3.3.2 CIR-based Jump Diffusion . . . . . . . . . . . . . . . . . . . . Switching Caused by a Background Markov Chain . . . . . . . Switching Triggered by Hitting Threshold . . . . . . . . . . . Models with Memory Effects . . . . . . . . . . . . . . . . . . . Switched Diffusion Models . . . . . . . . . . . . . . . . . . . . . . . . 3.4.1 3.4.2 3.4.3
3.4
3.5 3.6
Models with Time-Varying Parameters . . . . . . . . . . . . . . . . . Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . .
4 Higher Dimensional MMPP Formulation of Advanced Models with Stochastic Parameters 4.1 4.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Formulation of Two Dimensional MMPP . . . . . . . . . . . . . . . . 4.2.1 Two Independent MMPPs . . . . . . . . . . . . . . . . . . . . 4.2.2 4.2.3 4.3 4.4 Two Correlated MMPPs . . . . . . . . . . . . . . . . . . . . . Extension to n Correlated MMPPs . . . . . . . . . . . . . . . 83 83 85 86 88 92 93 95 96 96 96 97 98 99
Convergence Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . Default Rate Models with Stochastic Parameters . . . . . . . . . . . . 4.4.1 4.4.2 Default Rate Models with Stochastic Drift . . . . . . . . . . . Default Rate Models with Stochastic Volatility . . . . . . . . .
4.5
4.4.3 Default Rate Models with Stochastic Drift and Volatility . . . Stock Price Models with Stochastic Parameters . . . . . . . . . . . . 4.5.1 4.5.2 Stock Price Models with Stochastic Drift . . . . . . . . . . . . Stock Price Models with Stochastic Volatility . . . . . . . . .
4.6
4.5.3 Stock Price Models with Stochastic Drift and Volatility . . . . 99 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 101 . . . . . . . . . . . . . . . . . . . . . . 101
5 Summaries and Future Works 5.1 5.2 Summaries of Current Works
Future Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 103
Bibliography
ii
List of Figures
1.1 2.1 2.2 (a) A 2-state MMPP (b) Default rate v.s. time . . . . . . . . . . . . 3
The formulation of MMPP (a) Discretization in state space (b) The 2N + 1 discrete states . . . . . . . . . . . . . . . . . . . . . . . . . . (3) The convergence of Rh (t) . . . . . . . . . . . . . . . . . . . . . . . .
(4)
12 20 22 40 52 52 53 53 59 61 63 66 67 69 71 75 77
2.3 The convergence of Rh (t) . . . . . . . . . . . . . . . . . . . . . . . . ˜ 2.4 The construction of Xh (t) from Xh (t) . . . . . . . . . . . . . . . . . . 2.5 2.6 2.7 2.8 Survival Probabilitiy term structure for OU-1 (a) t = 0 ∼ 2 (b) Closeup at t = 1.5 ∼ 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Survival Probabilitiy term structure for OU-2 (a) t = 0 ∼ 10 (b) Closeup at t = 9 ∼ 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Convergence in survival Probability for OU-1 (a) Error v.s. N (b) Error v.s. h . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Convergence in survival Probability for OU-2 (a) Error v.s. N (b) Error v.s. h . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 3.2 3.4 3.5 The sample drift function . . . . . . . . . . . . . . . . . . . . . . . . The drift function µ(x) corresponding to CIR process . . . . . . . . . Incorporating a jump into MMPP (a) The desired jump rate and size (b) Added into MMPP . . . . . . . . . . . . . . . . . . . . . . . . . . (a) The extra states and extended area (b) The transition rates for a state in the extended area . . . . . . . . . . . . . . . . . . . . . . . . 3.6 The treatment of jumps in the dimension of Yh (t) . . . . . . . . . . . 3.7 (a) Background Markov Chain (b) The overall 2-layer MMPP . . . . 3.9 Switching caused by hitting threshold . . . . . . . . . . . . . . . . . . 3.10 The k-stage recovery process . . . . . . . . . . . . . . . . . . . . . . .
3.3 The difference between Xh (t) and Yh (t) . . . . . . . . . . . . . . . . .
3.8 (a) Erlang distribution (k = 3) (b) Hyperexponential distribution (k = 3) 74
iii
4.1 A 2-D MMPP (a) The grid structure (b) The transitions to adjacent states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 91 4.2 Allowed transitions for each case (a) ρ = 0 (b) ρ > 0 (c) ρ < 0 . . . .
iv
Chapter 1 Introduction
1.1 Credit Risk Modeling
The main objective of credit risk modeling is to capture the behaviour of default risk, based on which we may be able to determine default probability term structure and price credit derivatives. In the finance literature[4][11][14], there are two major approaches to model default risk. The first approach, termed the structural approach, considers a firm’s asset value and defines default as the event when the firm’s value process reaches a prespecified default boundary for the first time. On the other hand, the second approach, called the reduced-form approach, or intensity based approach, assumes that the default arrival follows a Possion process with stochastic intensity. In this section we give a brief review of these two approaches.
1.1.1
Structural Approach
The structural approach can be dated back to the pathbreaking work of Black and Scholes (1973) and Merton (1974). In their original work, default occurs at the maturity date of debt in the event that the issuer’s assets are less than the face value of the debt. More precisely, default is the event A(T ) ≤ D, where A(t) is the asset value process and D is the face value of debt with the maturity time T . Under the log-normal assumption on the asset price, the concept of distance to default is introduced as log A(t) − log D , σ where σ is the asset volatility. As a result, making use of the nice properties of the X(t) = Normal distribution, the default probability P (X(T ) ≤ 0|X(t)) can be calculated. 1
A second type of structural approach was introduced by Black and Cox (1976). In this model default occurs as soon as the asset value of a firm falls below a certain boundary. It is also known as the first-passage model. That is, the default epoch is defined by the first passage time as τ = inf{t ≥ 0 : A(t) ≤ D(t)}, where D(t) is a prespecified default boundary which is often an increasing function of t. The survival probability is then the probability that the distance to default does not reach zero during the time interval (t, T ), i.e. p(t, T ) = P(X(s) ≥ 0, t ≤ s ≤ T |X(t)), which can be evaluated under suitable assumptions on A(t) and D(t).
1.1.2
Reduced Form Approach
The reduced-form approach is based on the notion of the arrival intensity of default. When the default arrival is a Poisson process with a stochastic intensity (rate), then it becomes a Cox process[14]. A Cox processes is doubly stochastic in that the uncertainty arises from two layers: 1. The randomness of the default rate process itself, and 2. The randomness of the Poisson process. (Conditioning on the path of the default rate process, the default arrival becomes a Poisson with time-varying rate.) The instantaneous default rate λ(t) is defined by λ(t) = lim 1 P(t ≤ τ ≤ t + ∆t|τ > t), ∆t→0 ∆t
where τ is the lifetime of a corporate bond. Thus, over a small time interval ∆ → 0 P(t ≤ τ ≤ t + ∆t|τ > t) = λ(t)∆t. Given the path of λ(s) where t ≤ s ≤ T , the survival probability during the interval (t, T ) is given by P(τ > T |λ(s) : t ≤ s ≤ T ) = e− Therefore, this survival probability can be obtained as p(t, T ) = E[e− 2
T t T t
λ(s)ds
.
λ(s)ds
].
This formula has the same form as that for the zero coupon bond (ZCB) price in interest rate models, i.e. B(t, T ) = E[e−
T t
r(s)ds
],
where B(t, T ) is the time t price of ZCB at maturity time T . Apart from the similarity in formula, both the credit risk model and interest rate model usually have meanreverting properties, therefore some standard interest rate models[1] may be used as the models for default risk. For example, the OU (when the probability of going negative is small) and CIR processes.
1.2
Markov Modulated Poisson Processes
A Markov Modulated Poisson Process (MMPP) is a doubly stochastic Poisson process where the default rate is determined by the state of the underlying continuous-time Markov Chain. Since the Markov Chain has a finite number of states, the Poisson arrival rate takes discrete values corresponding to each state. Shown in Fig. 1.1 is an example of 2-state MMPP. Due to its discrete property, the MMPP is actually
Figure 1.1: (a) A 2-state MMPP (b) Default rate v.s. time quite different from those (continuous state) doubly stochastic diffusion processes commonly used in finance. When we use a MMPP to model default rate, we sacrifice the continuous change (diffusion) of default rates across time, which we have from the diffusion models. However, using finite states to model just a discrete set of default rates allows us to use Markov Chain theory to obtain the survival (default) probabilities. We take the simplest 2-state MMPP as an example. Define • i, j : state index, 1 ≤ i, j ≤ 2 • S(t) = i: state variable is i at time t 3
• λi : default rate for state i (S(t) = i) • pij (t) = P[no default arrival in (0, t), S(t) = j|S(0) = i] Consider the time interval from t to t + ∆t, we may write down the relation between pij (t) and pij (t + ∆t) as pi1 (t + ∆t) = pi1 (t)(1 − α1 ∆t − λ1 ∆t) pi2 (t + ∆t) = pi1 (t)α1 ∆t + + pi2 (t)α2 ∆t .
pi2 (t)(1 − α2 ∆t − λ2 ∆t)
Therefore we may write down the system of differential equations for pij (t) d pi1 (t) = pi1 (t)(−α1 − λ1 ) + pi2 (t)α2 dt , d p (t) = p (t)α + pi2 (t)(−α2 − λ2 ) i2 i1 1 dt which can be expressed in matrix form as d dt or equivalently p (t) = A · p(t), where p(t) and A are defined as shown above. Thus p(t) can be solved as p(t) = eAt · p(0). Therefore, the survival probability can be obtained as Pi (t) = P(survival until t|S(0) = i) =
j=1,2
pi1 (t) pi2 (t)
=
−α1 − λ1 α2 α1 −α2 − λ2
pi1 (t) pi2 (t)
,
pij (t).
Namely, the survival probability can be calculated through a matrix exponential where the matrix dimension is 2×2. Obviously this is due to the fact that the MMPP has just 2 states. If we consider an N -state MMPP, we should have to evaluate the exponential of a matrix of size N × N .
1.3
Problems and Approaches in this Report
The central idea of this report is to model the default intensity through an MMPP. In a traditional reduced-form setting, the default rate is often assumed to be a diffusion process like OU or CIR. To make the MMPP as close as possible to such a class of 4
diffusion processes, we may use a sufficient large (but hopefully not too large) number of states to approximate them well. In other words, an MMPP with a suitable number of states may behave quite like a diffusion process. Another point to make here is that, those candidate processes usually have the mean-reverting property that prevents the process from going far away from its central values. In some models, such as the geometric Brownian motion which is often used to model stock prices, the process value will go to infinity as time tends to infinity. In principle, the MMPP with a finite number of states is not suitable to approximate such unbounded processes (unless further treatments are given). On the contrary, due to the (stochastic) boundness of those mean-reverting processes, the MMPP discretization is particularly suitable because we can just ignore the small probability that the process goes beyond the range of the MMPP.
1.3.1
Topics in this Report
For those mean-reverting processes whose process values are mostly distributed within a certain upper and lower bounds, using such an finite state approximation immediately rise a natural question: how good is the accuracy? Is there a relation between the number of states we use and the accuracy we acquire? This will lead to the first topic in this report: the convergence analysis of the MMPP to the OU process. In the study of convergence we will see that, using the moment matching technique on an infinitesimal time interval, the MMPP converges weekly to the original OU process. Furthermore, the convergence of the moments is in second order, which means the relevant (path-dependent) expectations converge in a rather fast manner. These results justify the approach to discretize an OU process. Our main goal is certainly not just to approximate the well know processes like the OU. In fact, when we turn a diffusion process into a MMPP setting, it becomes easier to incorporate some interesting properties without compromising the computational complexity and tractability. For example, due to the discrete nature of MMPP, any state change can be seen as a (small) jump in default rate. Consequently, it is natural that we may incorporate a greater jump (as occurs in a jump-diffusion process) as a state change across many states. Such an extension of the original model does not bring more computational complexity to the MMPP, and the survival probability can be calculated through the same procedure. In the following, we summarize the extended models that we are going to discuss in this report. The study of these models forms the second topic of this report.
5
• Models with generalized drift and volatility • Jump diffusion models • Switched diffusion models • Models with time-varying parameters All the above models can be extended from discretized OU processes. We will discuss how to contruct each model such that the desired properties may be included. Based on the model formulations, the numerical works on the default probability term structure will be carried out in the future. A further extension of the MMPP discretization technique is to deal with 2- or n-factor models. This is done through a two or higher dimensional MMPP. One major distinction between these and one dimensional case is that we need to deal with the correlation between the driving Brownian motions. The idea to determine the transition rates is still through moment matching. One application of the higher dimensional MMPP is the models with stochastic parameters. For a diffusion process, if we allow the drift or volatility to be stochastic and follow another diffusion process, then the overall model becomes a 2-factor model. When both of them becomes stochastic, then it is a 3-factor models. Two classes of stochastic parameter models will be discussed in this report. In the first one the main process is the default rate, while in the second one, the main process is the stock price. In the former case, we intend to study its default term structure, while in the latter we are interested in derivative pricing problems. In short, this report intends to study a credit risk model which may include a few desired features without sacrificing tractability. Discretization of a diffusion process to an MMPP is the main idea and each model will end up with an MMPP formulation, where the calculation of the default probability can be conducted through matrix computations. One major benefit is that the MMPP discretization technique serves as a good alternative to Monte-Carlo simulation that we usually have to use to deal with advanced models.
1.3.2
Organization
This report is organized as follows. In Chapter 2, we investigate the convergence behaviour of the moment matched MMPP to its original OU process. Chapter 3 extends the discretized MMPP models to incorporate desirable features. In Chapter
6
4 we use two or higher dimensional MMPP to deal with 2- or n-factor models with stochastic parameters. Summaries and future works are given in Chapter 5. It should be also noted that, Chapter 2 is a more complete piece of work. Chapter 3 and 4 are considered as reserach plans, where more analytical and numerical works will be carried out in the future.
7
Chapter 2 The Convergence Analysis of MMPP to OU Processes
2.1 Introduction
The Ornstein-Uhlenbeck (OU) process is perhaps the most fundamental diffusion process used in credit risk or interest rate modeling, as it has constant volatility[1][2]. With its mean-reverting property, it is used to model the dynamics of default rate (reduced-form modeling) or interest rate. In the text of interest rate term stucture, it is also known as the Vasicek model. The OU process has been well studied and is known to have many nice mathematical properties, such as its value at any time horizon follows a Gaussian distribution, which makes it completely tractable. However, one of the major drawbacks is the possibility for it to go negative, which is undesirable when modeling default rates or interest rates. Even so, the OU process is still a good place to start with if we want to develop more advanced default rate models. Time discretization is quite often used in the implementation of advanced financial models, especially when analytical results are not available. Perhaps the best known example of discretization in mathematical finance is to use the binomial method to discretize the stock price which follows geometric Brownian motion[9][13]. Such a discretization is made on the time axis (and thus also makes the state axis discrete), with equally spaced time steps. It is widely known that, as time step shrinks to 0, the option price from binomial method will converge to the Black-Scholes formula. [16] is a study of the speed of convergence from the binomial model to Black-Scholes. For general cases where more complicated models are considered and more branches are allowed, such discretization is also known by a general term — the tree method[2][13]. In the tree method, moment matching is used to contruct the tree, or more precisely, 8
to determine the probabilities for the tree to move to a specific branch as time goes one step forward. Another way to deal with more advanced models with less tractability is Monte-Carlo simulation. Time discretization is also central to the implementation of Monte-Carlo. For all these methods, it is natural to study the convergence behaviour of the time discretized process to the original process, as the time step tends to zero[5][15]. The basic idea of this report to develop advanced models is through the discretization of the OU process. However, unlike the standard tree method or Monte-Carlo simulation, our method is to discretize the state space but keep the continuity of time axis. In other words, we allow the stochastic factor (such as default rate) to operate with jump-like dynamics instead of a diffusion-like one, and all the jumps may happen in continuous time. When we use sufficiently many discrete states and sufficiently small jump size, the space discretized process should be quite close to the original diffusion process. In the text of credit risk modeling, the stochastic factor of interest is the default rate. When the default rate is assumed to be a diffusion process like an OU, it is actually an inhomogeneous Poisson process with a stochastic intensity. Such a process is also said to be doubly stochastic in the sense that there are two levels of randomess: one comes from the Poisson process itself and the other comes from the stochastic intensity. Our discretization on its state space is to use a finite number of discrete states (note that the OU takes value over the whole real line) to cover the area within which the OU process is most likely to sojourn (i.e. it may move out of this area with a sufficiently small probability). Since OU has long term stability (its steady state exists, and never becomes unbounded as time goes on), these finite number of states are considered to be enough to approximate the original OU. With a set of discrete states, we use moment matching to determine the transition rates between them. Through this procedure the discretized process is well defined and is actually a Markov Modulated Poisson Process (MMPP)[7]. An MMPP is a Poisson process with intensity depending on the state of an underlying continuous time Markov Chain. In other words, such a discretization has put the original problem into a Markov Chain setting. This naturally raises the question about whether the moment matched MMPP converges to the original OU, and how fast the convergence is. This will be the main focus in this chapter. Through the convergence analysis done in this chapter, we will come to our main conclusion which can be stated in a simple claim: the moment
9
matched MMPP converges weakly to its original OU, and the expectations converge in second order. Our ultimate goal is not just to discretize a fundemantal process like the OU. We take the OU as an initial step in order to have a theoretical foundation to determine how good the moment matching method is. Based on the work in this chapter, we shall extend the MMPP model in order to incorporate more complicated behaviour of the default rate. For those advanced models, we are not likely to have analytic results and Monte-Carlo seems to be the only way out. By bringing them to the MMPP setting, the techniques from Markov Chains can be used to deal with these advanced models. In other words, with the MMPP discretization, we expect to develop more advanced models to cover more complicated default rate dynamics without sacrificing too much tractability. These will be the focus of the subsequent chapters. This chapter is organized as follows. In Section 2.2, we use moment matching to define the approximating MMPP. In Section 2.3, we study the convergence of all the moments. Steady state analysis is given in Section 2.4. The convergence of the one dimensional distribution is investigated in Section 2.5, on which Section 2.6 is based to discuss the convergence of the finite dimensional distributions. Section 2.7 discusses the tightness condition which is connected with the concept of weak convergence. Section 2.8 introduces the method to evaluate the term structure of survival probability. Concluding remarks are given in Section 2.9.
2.2
Moment Matching
The original OU process is described by the following stochastic differential equation (SDE). dX(t) = k(θ − X(t))dt + σdW (t), X(0) = x0
where k is the cofficient of reversion, θ is the long term mean, and σ is the volatility. An OU process can be then characterized by (k, θ, σ, x0 ). From the basic texts [1][13], it is well known that the mean and variance for X(t) are given by −kt E[X(t)] = θ + (x0 − θ)e
2 V[X(t)] = σ (1 − e−2kt ) 2k
The state discretized process MMPP is denoted by Xh (t) of which the first two moments should be matched to the original OU. Let h stand for the step size between equally spaced adjacent states. Then Xh (t) is characterized by the five-tuple 10
(h; k, θ, σ, x0 ). Now we use moment matching to define the transition rates between adjacent states. To make the MMPP as close to the original OU as possible, we only allow the transitions to happen between adjacent states, in order to capture the diffusion behaviour (continuous state). Assuming Xh (t) = x, when time moves from t to t + ∆t (∆t → 0), only three cases are allowed to happen for Xh (t + ∆t): jump up to x + h, jump down to x − h, or stay at x. During the infinitesimal time interval, the first two moments for the OU, which are going to be matched, are E[ X(t + ∆t) |X(t) = x] = x + k(θ − x)∆t E[X 2 (t + ∆t)|X(t) = x] = σ 2 ∆t + [x2 + 2xk(θ − x)∆t] Let u and d denote the up and down jump rates, then the state transition probabilities are given by P[Xh (t) = x + h|Xh (t)] = u∆t P[Xh (t) = x |Xh (t)] = 1 − u∆t − d∆t P[X (t) = x − h|X (t)] = d∆t h h Therefore, the first two moments for MMPP are E[Xh (t + ∆t)|Xh (t) = x] = u∆t(x + h) + (1 − u∆t − d∆t)x + d∆t(x − h)
2 E[Xh (t + ∆t)|Xh (t) = x] = u∆t(x + h)2 + (1 − u∆t − d∆t)x2 + d∆t(x − h)2
Matching the first two moments of Xh (t) to those of X(t), we obtain the following formulae 2 u(x) = σ + hk(θ − x) 2h2 2 d(x) = σ − hk(θ − x) 2h2 Note that in the formulae, when x > θ, the states with larger values of x have greater
jump down rates and lower jump up rates. On the other hand, when x < θ, the states with smaller values of x have greater jump up rates and lower jump down rates. This simply reflects the mean-reverting property. Moreover, in any case the transition rate should be kept non-negative, and this means we have some restrictions on the largest and smallest possible values of x. Let xmax and xmin denote the upper and lower bounds that generate sensible transition rates. To determine xmax and xmin , we set u(xmax ) = d(xmin ) = 0 to obtain 2 σ 2 + hk(θ − xmax ) u(xmax ) = xmax = θ + σ =0 2h2 hk =⇒ σ 2 − hk(θ − xmin ) σ2 d(xmin ) = xmin = θ − =0 2h2 hk 11
Obviously any state value x should satisfy xmin ≤ x ≤ xmax . The next questions to ask are how many states should we use for MMPP and how should we assign the discrete value x to each state? Let 2N + 1 denote the number of discrete states which are indexed from 0 to 2N , and xi represents the state value (default rate) of the MMPP, where i = 0, · · · , 2N . Notice that OU is mean-reverting and symmetric in that, for the states of equal distance above and below θ, i.e. θ+c and θ − c, the “momentum” to be pulled back toward θ are the same. Taking symmetry into account when assigning state values for the MMPP, we let the center state N take the value of the long term mean θ, with an equal number (N ) of states above (indexed N + 1, · · · , 2N + 1) and below (indexed 0, · · · , N − 1) it. In other words, set center state xN = θ, as well as the lowest state x0 = θ − N h and the highest state x2N = θ + N h. Obviously, the MMPP can only take discrete values within the range [x0 , x2N ]. More precisely, xi = x0 + ih = θ − (N − i)h, 0 ≤ i ≤ 2N
Shown in Fig. 2.1 is a conceptual diagram of the MMPP discretization and the
Figure 2.1: The formulation of MMPP (a) Discretization in state space (b) The 2N +1 discrete states arrangement of states. Note that since the MMPP is upper and lower bounded, it will serve as a good approximation only if the probability for the original OU to move out of this range is sufficiently small. Furthermore, in order to have sensible transition rates, we need to demand that xmin ≤ x0 ≤ xi ≤ x2N ≤ xmax . In particular, if we
12
further demand x0 = xmin and x2N = xmax , we will come to σ2 x0 = θ − N h = xmin = θ − σ hk =⇒ h = √ 2 Nk x2N = θ + N h = xmax = θ + σ hk which gives a relation between h and N . If number of states 2N + 1 is given, then h is thus decided. Conversely, if h is given first, then the integer N should be given σ2 σ2 ≤ 2 , i.e. only particular choices of h will make x0 = xmin and by N = h2 k hk x2N = xmax . σ In this report, we consider just the cases when h satisfies h = √ for all possible Nk integer N = 1, 2, · · · . As N → ∞, the number of states 2N + 1 → ∞ and step size h → 0, and also σ2 x0 = θ − Nh = θ − → −∞ hk 2 x2N = θ + N h = θ + σ → ∞ hk Thus we see when N → ∞ and h → 0, [x0 , x2N ] → (−∞, ∞) which means the MMPP tends to cover the whole real number just like OU. Now we summarize the introduction of the moment matched MMPP and give a precise definition as below. Definition 2.2.1. (Moment Matching) For a given OU(k, θ, σ, x0 ), we use an MMPP(h, k, θ, σ, x0 ) with 2N +1 discrete states to approximate it by matching its first and second moments at any time t across an infinitesimal ∆t. The state values are denoted by xi where 0 ≤ i ≤ 2N , and we define σ for N = 1, 2, · · · , such that u(x2N ) = d(x0 ) = 0. 1. h satisfies h = √ Nk 2. xi = x0 + ih = θ − (N − i)h where 0 ≤ i ≤ 2N . In particular 2 x0 = θ − N h = θ − σ hk xN = θ = θ 2 x2N = θ + N h = θ + σ hk 3. The up and down jump rates are given by 2 2 2 ui = u(xi ) = σ + hk(θ − xi ) = σ + h k(N − i) = 2h2 2h2
2 2 2 di = d(xi ) = σ − hk(θ − xi ) = σ − h k(N − i) = 2h2 2h2
σ2 2h2 σ2 2h2
1+ 1−
N −i N N −i N
13
4. We see the mean-reverting property 0 ≤ i ≤ N − 1 (xi ≤ θ) ui < di ui = di i=N (xi = θ) u >d N + 1 ≤ i ≤ 2N (xi ≥ θ) i i 5. We see that ui + di = σ2 for each state i. Moreover, {ui } is a decreasing h2 arithmetic sequences, while {di } is an increasing one with the same difference between adjacent terms, i.e. ui − ui+1 = di+1 − di = 1 N σ2 2h2 = k 2
From the fact that u + d is a constant for each state, we have the following proposition about the distribution of state holding time and number of state transitions during a given time interval. Proposition 2.2.1. For a moment matched MMPP: 1. The holding time in each state follows an exponential distribution with parameter σ2 . h2 2. The number of state transitions which happen in the time interval [0, T ] follows σ2 a Poisson distribution with parameter 2 T . h Proof. For a continuous time Markov Chain, the holding time of each state is exponential distributed with parameter equaling the sum of the transition rates from that σ2 state. In the moment matched MMPP, this is ui + di = 2 , which is a constant h independent of the state index i. Therefore, no matter how the state transitions from one to another, the holding time distribution is always the same. This makes the number of state changes within a specific time interval a Poisson distribution. Consequently we have the claimed results. The above proposition also indicates that the counting process of the state transition events of a moment matched MMPP is actually a Poisson process with deterσ2 ministic intensity 2 . h The rest of the sections in this chapter focus on the convergence behaviour of the moment matched MMPP to its original OU as h → 0. Basically, we want to answer these two questions: (1) whether the MMPP will converge to OU (2) How fast is the convergence? 14
2.3
Convergence of n-th Moment
In order to discuss the speed of convergence, we introduce the definition of order of convergence, which is often used to discuss the convergence of the Monte-Carlo method. Following Kloeden and Platen[12][5], for a continuous time process X(t), ˆ t ∈ R and its time discretized process X(i∆t), i = 1, 2, · · · , with ∆t denoting the time step size. The strong and weak order of convergence are defined as below.
2.3.1
Definitions of Convergence Order
Definition 2.3.1. (Order of Convergence) Fix a time T and let n∆t = T . ˆ 1. The discretization X is said to have strong order of convergence β > 0 if ˆ E | X(n∆t) − X(T ) | ≤ C · ∆tβ for some constant C and all sufficient small ∆t. ˆ 2. The discretization X is said to have weak order of convergence β > 0 if ˆ E [g(X(n∆t))] − E[g(X(T ))] ≤ C · ∆tβ for some constant C and all sufficient small ∆t, for all regular g satisfying some technical conditions. In financial applications, weak order of convergence is more relevant because the pricing formulae often end up with an expectation which is what we need to evaluate. ˆ For a good approximation, we need to ensure the expectations computed from X are close to those computed from X. However, in general we are not concerned about the paths of the two processes. In our analysis, it is the weak order of convergence that we want to study. In addition, since our discretization is done over the space axis while time is kept continuous, we need to slightly modify the definition of weak order of convergence, so as to discuss the convergence order as the step size h goes to 0. Apart from the convergence analysis of an expectation on a single time, we would also like to generalize the concept to discuss the convergence order of an expectation which depends on the path. There are two levels of such generalization. The first level is about an expectation of a function depending on the process value at a finite number of time instants. The second level is to discuss an expectation of a function depending on the whole path. Their (weak) order of convergence is defined as follows. 15
Definition 2.3.2. (Order of Convergence) The (weak) order of convergence from the moment matched MMPP Xh (t) to its original OU X(t) is defined in three levels as below. 1. at the level of a single time instant: g : R → R E[g(Xh (t))] − E[g(X(t))] ≤ C · hβ 2. at the level of multiple time instants: g : Rn → R E[g(Xh (t1 ), · · · , Xh (tn )] − E[g(X(t1 ), · · · , X(tn ))] ≤ C · hβ 3. at the level of the whole path: g : D[0, T ] → R (g is a function on path; D[0, T ] is the set of the cadlag paths on [0, T ]) E[g({Xh (t)})] − E[g({X(t)})] ≤ C · hβ The function g(·) is assumed to be a sufficiently nice function in each case. This section discusses the convergence of moments of process value at a single time point, which is at the first level. We start with the first two moments, continue with the third and fourth moments, and then discuss the higher moments.
2.3.2
Convergence of the First and Second Moments
Before our analysis, let us define the following notation
n Rh (t) = E[Xh (t)] − E[X n (t)] (n)
representing the time function of error for the n-th moment between an MMPP and an OU. Since the moment matching uses the first two moments, it is expected that we (1) (2) should get much better error functions Rh (t) and Rh (t) than for higher order moments. As a matter of fact, it turns out the time functions of the first and second moments of MMPP exactly match those of OU, which are presented as below. Proposition 2.3.1. The MMPP and OU have an exact match in their first and second moments. In other words, the error is zero for any single time horizon t. E[Xh (t)] = E[X(t)], i.e. Rh (t) = 0
(2) (1)
2 E[Xh (t)] = E[X 2 (t)], i.e. Rh (t) = 0
∀t
16
Proof. Recall Section 2.2, the moment matching is done across an infinitesimal time interval from t to t + ∆t. The first moment matching formula implies E[Xh (t + ∆t)|Xh (t) = x] = x + k(θ − x)∆t Taking expectation on Xh (t) gives E[Xh (t + ∆t)] = E[Xh (t)] + k(θ − E[Xh (t)])∆t Letting ∆t approach 0, we write down the simple ODE dE[Xh (t)] = −k(E[Xh (t)] − θ) dt which is solved by E[Xh (t)] = c1 e−kt + c2 The constants c1 and c2 are determined through these initial conditions E[Xh (0)] = x0 and E[Xh (∆t)] = x0 + k(θ − x)∆t. Thus we obtain E[Xh (t)] = (x0 − θ)e−kt + θ which is exactly the same as E[X(t)]. Similarly, for the second moment, moment matching implies
2 E[Xh (t + ∆t)|Xh (t) = x] = x2 + [2xk(θ − x) + σ 2 ]∆t
Taking expectation on Xh (t) gives
2 2 2 E[Xh (t + ∆t)] = E[Xh (t)] + −2kE[Xh (t)] + 2kθE[Xh (t)] + σ 2 ∆t (2)
(2.1)
2 For simplicity, we do not solve for E[Xh (t)] directly. Instead, we solve for Rh (t). We
know for the original OU E[X 2 (t + ∆t)] = E[X 2 (t)] + −2kE[X 2 (t)] + 2kθE[X(t)] + σ 2 ∆t Substracting (2.1) by (2.2), we have Rh (t + ∆t) = Rh (t) + −2kRh (t) + 2kθRh (t) + σ 2 ∆t Expressing this by an ODE and using Rh (t) = 0, we have dRh (t) (2) = −2kRh (t) dt 17
(2) (1) (2) (2) (2) (1)
(2.2)
which is solved by Rh (t) = c1 e−2kt + c2 Using the initial condition Rh (0) = 0 and Rh (∆t) = 0, we obtain Rh (t) = 0 as required. The results for the first and second moments fit our intuition, because the moment matching defines the dyanmics of the first two moments across an infinitesimal interval, which lead to the same ODE as OU. In addition, it should be noted that the step size h is not involved in the above proof, which means such formulae are valid for any sensible choice of N . Taking the smallest possible N = 1 which defines an MMPP with 2N + 1 = 3 states, we see that even we use just 3 states to approximate OU, we still get a exact match in the first two moments.
(2) (2) (2) (2)
2.3.3
Convergence of the Third Moment
After dealing with the first two moments, it is then natural to study the error of higher moments under the moment matching method. We cannot expect the errors in the higher moments to be independent of h as in the first two moments. But we do expect to see for higher moments the error shrinks to zero as h → 0. In the following we present the error function for the third moment. Proposition 2.3.2. The third moment error function is given by Rh (t) = h2 · R1 (t) where R1 (t) =
(3) (3) (3)
(θ − x0 ) −kt (e − e−3kt ) 2
Proof. For MMPP, we use the fact u − d = k(θ − x) h σ2 u+d = h2 to obtain
3 E[Xh (t + ∆t)|Xh (t) = x] = u∆t(x + h)3 + (1 − u∆t − d∆t)x3 + d∆t(x − h)3
= x3 + 3x2 k(θ − x) + 3xσ 2 + h2 k(θ − x) ∆t 18
Similar to the lower moment cases, we take expectation on Xh (t) to reach
3 3 3 2 E[Xh (t + ∆t)] = E[Xh (t)] + −3kE[Xh (t)] + 3kθE[Xh (t)] + 3σ 2 E[Xh (t)]
(2.3) +h2 k(θ − E[Xh (t)]) ∆t To repeat the process for OU, we need to have the third moment for OU. Recall that the MGF of a Gaussian random variable X is E[eαX ] = eαm+
α2 v 2
where m and v denote mean and variance of X. From MGF we have the third moment as E[X 3 ] = m3 + 3mv Applying this to our OU X(t + ∆t) given X(t) = x, we have E[X 3 (t + ∆t)|X(t) = x] = m3 + 3mv where m = E[X(t + ∆t)|X(t) = x] = x + k(θ − x)∆t v = V[X(t + ∆t)|X(t) = x] = σ 2 ∆t
This enables us to write down E[X 3 (t + ∆t)] = E[X 3 (t)] + −3kE[X 3 (t)] + 3kθE[X 2 (t)] + 3σ 2 E[X(t)] ∆t (2.4) Substracting (2.3) by (2.4) Rh (t + ∆t) = Rh (t) + −3kRh (t) + 3kθRh (t) + 3σ 2 Rh (t) +h2 k(θ − x0 )e−kt ∆t which leads to this ODE (using Rh (t) = Rh (t) = 0) dRh (t) (3) + 3kRh (t) = h2 k(θ − x0 )e−kt dt This ODE can be solved to obtain Rh (t) = e−3kt = h2
(3) (3) (2) (1) (3) (3) (3) (2) (1)
e3kt · h2 k(θ − x0 )e−kt dt + c
(θ − x0 ) −kt e + ce−3kt 2 (θ − x0 ) and then obtain the 2
With the initial condition R(3) (0) = 0, we have c = −h2 claimed result. 19
The simple formula shown in the above proposition describes the relation of error functions between the cases with general h and the case h = 1. For a given original OU(k, θ, σ, x0 ), R1 (t) is fixed, thus Rh (t) converges as h → 0. Following the definition of convergence order, we may write
3 E[Xh (t)] − E[X 3 (t)] ≤ C · h2 (3) (3)
which indicates the convergence is in second order. In fact, throughout our analysis we will see that, the errors of all the expectations turn out to converge to zero in second order. The above formula is just the first place to reveal this mathematical fact. Note that the error function Rh (t) is valid for all 0 ≤ t ≤ ∞. In particular, at steady state when t → ∞, the error becomes 0 which says the third moments have an exact match at steady state (no matter which h is chosen), i.e.
t→∞ (3)
lim Rh (t) = 0
(3)
=⇒
t→∞
3 lim E[Xh (t)] = lim E[X 3 (t)] t→∞ (3)
Shown in Fig. 2.2 is a plot of the convergence of Rh (t) where θ = 10, x0 = 5, k = 0.5. Due to the second order convergence, we see the error goes to zero quite fast as (3) (3) h → 0. Note that Rh (t) does not depend on the choice of σ. Also, we see Rh (t) → 0 when t → ∞ regardless of h.
Figure 2.2: The convergence of Rh (t)
(3)
2.3.4
Convergence of the Fourth Moment
In this section we move on to derive the error function for the fourth moment. We use (4) (4) the same method and show in the following proposition that, Rh (t) and R1 (t) are 20
related by a multiplier h2 , exactly the same as what we have for the third moment. Certainly we have a slightly messy expression for R1 (t). Proposition 2.3.3. The fourth moment error function is given by Rh (t) = h2 · R1 (t) where
(4) (4) (4) (4)
R1 (t) = −
σ2 (1 − e−4kt ) + 2θ(θ − x0 )(e−kt − e−3kt ) 4k 2σ 2 − 4k(x0 − θ)2 −2kt + (e − e−4kt ) 2k
Proof. For MMPP we have
4 E[Xh (t + ∆t)|Xh (t) = x] = u∆t(x + h)4 + (1 − u∆t − d∆t)x4 + d∆t(x − h)4
= x4 + 4x3 k(θ − x) + 6x2 σ 2 + 4h2 xk(θ − x) + h2 σ 2 ∆t Similarly, we take expectation on Xh (t) to have
4 4 E[Xh (t + ∆t)] = E[Xh (t)] + 4 3 2 −4kE[Xh (t)] + 4kθE[Xh (t)] + 6σ 2 E[Xh (t)] 2 −4h2 kE[Xh (t)] + 4h2 kθE[Xh (t)] + h2 σ 2
∆t (2.5)
From the MGF of a Gaussian random variable X we have E[X 4 ] = m4 + 6m2 v + 3v 2 Similarly, we apply this to write down E[X 4 (t + ∆t)|X(t) = x] and take expectation over X(t) to come to E[X 4 (t + ∆t)] = E[X 4 (t)] + −4kE[X 4 (t)] + 4kθE[X 3 (t)] + 6σ 2 E[X 2 (t)] ∆t (2.6)
Substracting (2.5) by (2.6) yields Rh (t + ∆t) = Rh (t) +
(4) (4)
−4kRh (t) + 4kθRh (t) + 6σ 2 Rh (t) +h2 −4kE[X 2 (t)] + 4kθE[X(t)] + σ 2
(4)
(3)
(2)
∆t ∆t
and its ODE as
dRh (t) (4) + 4kRh (t) = h2 · g(t) dt where g(t) = 2kθ(θ − x0 )(e−kt − e−3kt ) −2(1 − e−2kt ) − 4k[θ + (x0 − θ)e−kt ] +4kθ[θ + (x0 − θ)e−kt ] + σ 2 21
2
(4)
Solving the ODE gives Rh (t) = h2 · e−4kt where
(4)
e4kt g(t)dt + c
e4kt g(t)dt = −
σ 2 4kt e + 2θ(θ − x0 )(e3kt − ekt ) 4k 2σ 2 − 4k(x0 − θ)2 2kt + e 2k
Using the initial condition Rh (0) = 0 to find the constant c, we obtain the claimed result. We see Rh (t) has the same neat formula as Rh (t), and such formula again implies second order convergence. However, one different thing between the third and fourth moments is their steady state behaviour. Let t → ∞, it is seen that the error doesn’t vanish and one term involving h2 is left as shown below. lim R(4) (t) = h2 − σ2 4k
(4) (3)
(4)
t→∞
This is actually the lowest moment with nonzero error at steady state.
Figure 2.3: The convergence of Rh (t) Shown in Fig. 2.3 is a plot of the convergence of Rh (t) where θ = 5, x0 = 3, k = 0.5, σ = 1. Note that Rh (t) is dependent on σ which is not the case in Rh (t). (4) Furthermore, when t → ∞, Rh (t) do not tend to zero.
(4) (3) (4)
(4)
22
2.3.5
Convergence of Higher Moments
In principle, following similar procedure we are able to obtain the formulae for error functions of all the higher moments, although it might be quite messy. If we go on (5) with such procedure to find Rh (t), it can be seen that not only h2 but also h4 will be involved. Since h is to shrink to zero, the higher order terms do not affect the convergence order. In other words, we still have second order convergence for the fifth moment. But how should we investigate the convergence order for even higher moments?
i Since we see Rh (t) depends on Rh (t) and/or E[Xh (t)], for all i < n, our approach (n) (i)
to study higher moment convergence is to contruct the recursive relation between Rh (t) and the lower order terms. In the following we will show that, every higher moments actually converges in second order as well. The proof is based on induction principle, which lies on the recursive relation between higher and lower order terms. However, for all the higher moments the simple relations we have in the third and fourth moments do not hold any longer, i.e. Rh (t) = h2 · R1 (t) n ≥ 5 Instead, we have a slightly weaker form still indicating their second order convergence. The following proposition is about the expressions of the higher moments of a Gaussian random variable, which will be used to calculate E[X n (t)]. Proposition 2.3.4. For a Gaussian random variable X with mean m and variance v, its n-th moment can be expressed as the following general form
n 2
(n)
(n)
(n)
E[X n ] =
k=0
n (2k)! v mn−2k 2k k! 2
k
Proof. The MGF for X is given by E[eαX ] = eαm+ where A = eαm and B = e
n
α2 v 2 α2 v 2
= (eαm ) · (e
α2 v 2
)=A·B
. Thus the n-th moment is
n
dn E[X ] = E[eαX ] dαn
= (A · B)
α=0
(n) α=0
=
j=0
n (n−j) j A B j
α=0
where the superscript (n) represents the n-th derivative. Note that A(j) = mj eαm thus A(j)
α=0
= mj . Also, B (j) can be expressed as B (j) = v 1 v 1 + α2 + · · · + ( )i α2i + · · · 2 i! 2 23
(j)
2k! v α=0 α=0 k! 2 tuting back and changing the index j to k, we obtain the desired result. For all odd j, B (j) = 0. For even j = 2k, we have B (j) =
k
. Substi-
The purpose of the above proposition is to calculate E[X n (t + ∆t)|X(t) = x] for OU. This is to be used in the proof of the following proposition, which is our main result in this section. Proposition 2.3.5. For n ≥ 5, the n-th moment error functions have the following form Rh (t) = h2 · Sh (t) where S (n) (t) = A0 (t) + h2 · A2 (t) + h4 · A4 (t) + · · · and S (n) (t) is bounded as h → 0. Proof.
n E[Xh (t + ∆t)|Xh (t) = x] (n) (n) (n) (n) (n)
= u∆t(x + h)n + (1 − u∆t − d∆t)xn + d∆t(x − h)n = xn + + + = xn + + +
n 1 n 3 n 5 n 1 n 3 n 5 n xn−2 h2 (u + d)∆t 2 xn−3 h3 (u − d)∆t + n xn−4 h4 (u + d)∆t 4 n n−5 5 x h (u − d)∆t + 6 xn−6 h6 (u + d)∆t + xn−1 k(θ − x)∆t + n xn−2 σ 2 ∆t 2 n−3 2 x h k(θ − x)∆t + n xn−4 h2 σ 2 ∆t 4 n n−5 4 x h k(θ − x)∆t + 6 xn−6 h4 σ 2 ∆t + · · ·
xn−1 h(u − d)∆t +
···
= xn + nxn−1 k(θ − x)∆t + where Sh,1 (x) is of such form as
(n)
n(n − 1) 2 n−2 (n) σ x ∆t + h2 · Sh,1 (x)∆t 2
Sh,1 (x) = a0 (x) + h2 · a2 (x) + h4 · a4 (x) + · · · Taking expectation yields
n−1 n n n E[Xh (t + ∆t)] = E[Xh (t)] + −nkE[Xh (t)] + nkθE[Xh (t)]
(n)
(n)
(n)
(n)
+
n(n − 1) 2 (n) n−2 σ E[Xh (t)] + h2 · E[Sh,1 (Xh (t))] 2
∆t
(2.7)
As for OU, when ∆t → 0 (terms with high order of ∆t ignored), we have E[X(t + ∆t)|X(t) = x]n = xn + nkxn−1 (θ − x)∆t = mn V[X(t + ∆t)|X(t) = x] = 24 σ 2 ∆t = v
Using Proposition 2.3.4 with the above m and v, we obtain E[X n (t + ∆t)|X(t) = x] = xn + nkxn−1 (θ − x)∆t + Therefore E[X n (t + ∆t)] = E[X n (t)] + −nkE[X n (t)] + nkθE[X n−1 (t)] + Substracting (2.7) by (2.8) gives Rh (t + ∆t) = +
(n) (n) (n)
n(n − 1) 2 n−2 σ x ∆t 2
n(n − 1) 2 σ E[X n−2 (t)] 2
∆t
(2.8)
Rh (t) +
(n)
−nkRh (t) + nkθRh
(n)
(n−1)
(t)
n(n − 1) 2 (n−2) (n) σ Rh (t) + h2 · Sh,2 (t) ∆t 2
where Sh,2 (t) = E[Sh,1 (Xh (t))]. Now we use the principle of mathematical induction to prove the claim. From earlier subsections, we have seen the claim is true for n = 3 and n = 4. Assuming it is true for all m ≤ n − 1, i.e. Rh (t) = h2 · Sh (t) where Sh (t) is bounded as h → 0. Then the above formula becomes the following ODE dRh (t) (n) (n) + nkRh (t) = h2 · Sh,3 (t) dt where Sh,3 (t) = nkθSh This ODE is solved by Rh (t) = e−nkt
(n) (n) (n) (n−1) (n) (m) (m) (m)
(t) +
n(n − 1) 2 (n−2) (n) σ Sh (t) + Sh,2 (t) 2
enkt · h2 · Sh,3 (t)dt + c
(n)
= h2 · Sh,4 (t) + ce−nkt where Sh,4 (t) = e−nkt Therefore Rh (t) = h2 · Sh,4 (t) − Sh,4 (0) = h2 · Sh (t) From the definition of Sh,i (t), i = 1, 2, 3, 4 and assumptions on Sh (t), m ≤ n − 1, we see Sh (t) is also bounded as h → 0. More precisely, Sh (t) is of the following form Sh (t) = A0 (t) + h2 · A2 (t) + h4 · A4 (t) + · · · with all Ai (t) bounded for a fixed n. Consequently, the claim is true for all integer n. 25
(n) (n) (n) (n) (n) (n) (n) (n) (m) (n) (n) (n) (n) (n)
enkt · Sh,3 (t)dt . Using Rh (0) = 0 we get c = −h2 · Sh,4 (0).
(n)
(n)
(n)
This section can be summarized in the following statement: all the error functions for the n-th moments converge to 0 in second order. In particular, when n = 1, 2, the error functions are zero, indicating exact match. When n = 3, 4, we have a simple relation between cases with general h and the special case h = 1. For n ≥ 5, h2 always exists in the error function formulae, which are weaker than the cases n = 3, 4.
2.4
Steady State Analysis
The analysis in the previous section is valid for any time horizon t including steady state (t → ∞). Since the steady state behaviour can be also treated through Markov Chain theory, in this section we focus on the convergence analysis at steady state using this approach. We expect to see the agreements between this analysis and our earlier results. Notation. The following are defined to lighten our notations. Xh = lim Xh (t)
t→∞ t→∞
X
= lim X(t)
For a finite t < ∞, the distribution function of Xh (t) is not easy to obtain in an explicit form. On the contrary, at steady state, the distribution of Xh is particularly tractable. In fact, it has the same distribution as an affine function of a binomial random variable. Here we present the result. Proposition 2.4.1. For the MMPP (h, k, θ, σ, x0 ), Xh has the same distribution as the random variable Y = h · Z + (θ − N h) where Z follows a binomial distribution with parameter 2N, 1 . 2
Proof. Recall Section 2.2, Xh can take values at the following discrete points: xi = x0 + ih = ih + (θ − N h) i = 0, · · · , 2N with their center at xN = θ. Since Z is binomial and take integer values from 0 to 2N, Y takes value at all xi , i = 0, · · · , 2N . Therefore P(Y = xi ) = P(Z = i) = 2N i 1 2
2N
,
i = 0, · · · , 2N
26
This claim actually says P(Xh = xi ) = P(Y = xi ), ∀i, which is what we are going to prove. For simplicity we denote P(Xh = xi ) by pi . At steady state, the adjacent state probabilities should follow the local balance equations as pi ui = pi+1 di+1 , i = 0, · · · , 2N − 1
By definitions of ui and di in Section 2.2, this becomes pi (2N − i) = pi+1 (i + 1), i = 0, · · · , 2N − 1
Because of the symmetric property between the first half states (i < N ) and the second half (i > N ), we have pN −j = pN +j , j = 1, · · · , N . Therefore, we only need to derive either half and the other half is obtained by symmetry. The balance equations for states i ≥ N can be written as pN · (N ) pN +1 · (N − 1) ··· pN +j−1 · (N − j + 1) ··· p2N −1 · (1) = = = = = = pN +1 · (N + 1) pN +2 · (N + 2) ··· pN +j · (N + j) ··· p2N · (2N )
Let pN = p, we can recursively calculate all the state probabilities as below: pN pN −1 pN −2 ··· pN −j ··· p0 = pN +1 = pN +2 ··· = pN +j ··· = pN = p = p · NN +1 = p · NN · +1 ··· = p · NN · +1 ··· = p · NN · +1
N −1 N +2 N −1 N +2 N −1 N +2 −j+1 · · · NN +j −j+1 1 · · · NN +j · · · 2N
Since all the probabilities should sum to 1
1 p · 1 + 2 NN + · · · + 2 NN N −1 · · · 2N = 1 +1 +1 N +2
Multiplying both sides by p·
2N N
2N N
=
2N (2N −1)···(N +1) , 1·2···N 2N 0
we come to
2N N
+ ··· + 2
2N N −j
+ ···2
= p · 22N =
=⇒ p =
2N N
1 2N 2
As a consequence, we have pN −j = pN +j = Namely P(Xh = xi ) = pi = which completes the proof. 27
2N i 1 2N , 2 2N N 1 2N N 2 N +1
·
N −1 N +2
−j+1 · · · NN +j =
2N N −j
1 2N 2
=
2N N +j
1 2N 2
i = 0, · · · , 2N
We have fully characterized the distribution of Xh , thus its MGF and the moments can be obtained without difficulty. In the following we derive the first four moments for Xh in order to check whether these agree with what we obtained earlier. As a comparison to Xh , we firstly summerize the first four moments of X as below. Proposition 2.4.2. 1. The MGF of X is given by E[eαX ] = exp αθ + α2 2 σ2 2k
2. The first four moments can be expressed as • The first moment E[X]
2
= θ
2
σ2 • The second moment E[X ] = θ + 2k 2 σ 3 3 • The thrid moment E[X ] = θ + 3θ 2k σ2 • The fourth moment E[X 4 ] = θ4 + 6θ2 2k
σ2 +3 2k
2
Proof. For a Gaussian random variable X with mean m and variance v, we have E[eαX ] = exp αm + E[X] E[X 2 ] E[X 3 ] E[X 4 ] = = = = α2 v 2
m m2 + v m3 + 3mv m4 + 6m2 v + 3v 2 σ2 to get the results. 2k
We simply substitute m = θ and v =
In parallel to X, the MGF and the first four moments for Xh are summarized as below, based on our understanding on its distribution. Proposition 2.4.3. 1. The MGF of Xh is given by E[eαXh ] = eαθ e
αh 2
+ e− 2
αh 2
2N
28
2. The first four moments of Xh can be expressed as • The first moment E[Xh ] = θ σ2 2k 2 σ 3 E[Xh ] = θ3 + 3θ 2k σ2 4 E[Xh ] = θ4 + 6θ2 2k
2 • The second moment E[Xh ] = θ2 +
• The thrid moment • The fourth moment
+3
σ2 2k
2
+ h2 −
σ2 4k
Proof. For the first part, recall that the MGF of a binomial random variable (n, p) is (1 − p + peα )n . Since Z ∼ Bin(2N, 1 ), 2 E[e
αZ
1 + eα ]= 2
2N
According to Y = h · Z + (θ − N h) E[e
αX
] = E[e
αY
] = E[e
αhZ
]·e
α(θ−N h)
1 + eαh = 2
2N
· eα(θ−N h)
The claimed result can be obtained by simple calculations. In the second part, we derive the first four moments for Z as E[Z] =
d E[eαZ ] α=0 dα d2 E[eαZ ] dα2 α=0 d3 αZ E[e ] dα3 α=0 d4 αZ E[e ] dα4 α=0
= N
1 = N2 + 2N 1 = N3 + 2N2 + 1N 2
E[Z 2 ] = E[Z 3 ] = E[Z 4 ] =
= N 4 + 3N 3 + 3 N 2 − 1 N 4 4
Then, since X and Y has the same distribution and from the affine relation between Y and Z, their moments are related as follows. E[X] E[X 2 ] E[X 3 ] E[X 4 ] = = = = E[Y ] E[Y 2 ] E[Y 3 ] E[Y 4 ] = = = = E[hZ + θ − N h] E[(hZ + θ − N h)2 ] E[(hZ + θ − N h)3 ] E[(hZ + θ − N h)4 ]
After some messy derivations, the results are obtained as claimed. From the Proposition 2.4.2 and 2.4.3, we summarize the error terms at steady state for the first four moments as below 0 (n) σ2 lim R (t) = n→∞ h h2 − 4k 29 n = 1, 2, 3 n=4
which agrees with our earlier results. Apart from the moments, the convergence behaviour for MGF is also observed in the following proposition. Proposition 2.4.4. The MGF of Xh converges to that of X. That is E[eαXh ] → E[eαX ] as h → 0 Proof. E[e
αXh
]
=
e
αθ
e
αh 2
+ e− 2
αh 2
2N
=
eαθ 1 +
1 αh 1 αh ( ) + ( ) + ··· 2! 2 4! 2 α 2 h + o(h2 ) 8 =
2
1 h2 2σ 2 k
2
4
2σ 2 h2 k
=
h→0
eαθ
1+
− → eαθ · e −
α2 2σ 2 · k 8
exp αθ +
α2 σ 2 2 2k
=
E[eαX ]
We see the h2 is involved in the expression of E[eαXh ], indicating the agreement with our earlier result about second order convergence.
2.5
Convergence in One Dimensional Distribution
We have seen that all the moments of Xh (t) converge in second order to X(t). It is then natural to ask whether Xh (t) converges to X(t), and whether the convergence order is preserved for the expectations of general functions. In the rest part of this chapter, we will discuss such convergence behaviour in a more general sense. When it comes to the convergence of the process Xh (t) to X(t), there are in fact three different levels to be discussed, which are described as below. 1. Convergence in one-dimensional distribution — for a single t Xh (t) converges to X(t) in distribution, denoted by Xh (t) − X(t). This implies → E[g(Xh (t))] −→ E[g(X(t))] for a bounded continuous function g(·). 30
D
2. Convergence in finite-dimensional distribution — for time instants t1 , · · · , tn (Xh (t1 ), · · · , Xh (tn )) converges to (X(t1 ), · · · , X(tn )) in distribution, denoted by (Xh (t1 ), · · · , Xh (tn )) − (X(t1 ), · · · , X(tn )). This implies → E[g(Xh (t1 ), · · · , Xh (tn ))] −→ E[g(X(t1 ), · · · , X(tn ))] for a bounded continuous function g(·). 3. Weak Convergence — for all t ∈ R {Xh (t)} converges weakly to {X(t)}, denoted by {Xh (t)} − {X(t)}. This → implies E[g({Xh (t)})] −→ E[g({X(t)})] for a bounded continuous function g(·) on paths. As far as convergence order is concerned, we are interested in knowing whether the second order convergence is preserved to the expectations in all the three levels. Namely, we want to ask whether the following hold E[g(Xh (t))] − E[g(X(t))] = h2 · Sh = h2 · Sh
(g) W D
E[g(Xh (t1 ), · · · , Xh (tn ))] − E[g(X(t1 ), · · · , X(tn ))] = h2 · Sh E[g({Xh (t)})] − E[g({X(t)})]
(g)
(g)
In this section we will discuss the first level, the convergence in one-dimensional distribution, as well as the order of convergence for the expectations. We begin with the expectation of a nice function g(·). Proposition 2.5.1. For a continuous function g(x) which is sufficiently nice such that it can be expressed by the Taylor expansion. Assuming both E[g(Xh (t))] and E[g(X(t))] exist, we have E[g(Xh (t))] − E[g(X(t))] = h2 · Sh (t) where Sh (t) = A0 (t) + h2 · A2 (t) + h4 · A4 (t) + · · · and Sh (t) is bounded as h → 0.
(g) (g) (g) (g) (g) (g)
31
Proof. This is a direct result from Section 2.3. Express g(x) by its Taylor expansion
∞
as g(x) =
i=0
ai xi . Therefore
∞ ∞ n ai E[Xh (t)] i=0 ∞ ∞ ∞
E[g(Xh (t))] − E[g(X(t))] = = h2 ·
i=0 ∞ (i)
−
i=0 ∞
ai E[X (t)] = h ·
i=0 (i)
n
2
ai Sh (t)
(i)
ai A0 (t) + h2 ·
i=0
(i)
ai A2 (t) + h4 ·
i=0
(i)
ai A4 (t) + · · ·
Note that those
i=0
ai Aj (t), j = 0, 2, 4, · · · terms have no dependence on h. Due
∞
to the existence of E[g(Xh (t))] and E[g(X(t))], all the
i=0 ∞
ai Aj (t) terms must be
∞
(i)
bounded. (If any of these terms is unbounded, then we won’t have a finite E[g(Xh (t))]− E[g(X(t))]) Consequently,
∞ i=0 (i) ai Sh (t)
is bounded as h → 0. Define
(g) Aj
=
i=0
ai Aj (t),
(i)
∀j and Sh (t) =
i=0
(g)
ai Sh (t), we complete the proof.
(i)
This proposition indicates the second order convergence of Taylor expandable functions. One typical example of such functions is moment generating function. According to the above, the MGF of Xh (t) and X(t) with a fixed α (such that the expectations are bounded) can be related as below. E[eαXh (t) ] − E[eαX(t) ] = h2 · Sh (t, α) where Sh (t, α) = A0 (t, α) + h2 · A2 (t, α) + · · · is bounded as h → 0. As to the convergence in distribution, we need to look at the characteristic functions and distribution functions of Xh (t) and X(t). Based on the results on distribution functions, we are then able to discuss the expectations of more general functions. Proposition 2.5.2. 1. The chararteristic functions of Xh (t) and X(t) are related as follows ΦXh (t) (α) − ΦX(t) (α) = h2 · Sh (t, α) where ΦXh (t) (α) = E[eiαXh (t) ] and ΦX(t) (α) = E[eiαX(t) ], and Sh (t, α) = A0 (t, α) + h2 · A2 (t, α) + h4 · A4 (t, α) + · · · which is bounded as h → 0. 32
(Φ) (Φ) (Φ) (Φ) (Φ) (M ) (M ) (M ) (M )
2. Xh (t) converges to X(t) in distribution as h → 0. Moreover, their distribution functions are related as below FXh (t) (x) − FX(t) (x) = h2 · Sh (t, x) where FXh (t) (x) = P(Xh (t) ≤ x) and FX(t) (x) = P(X(t) ≤ x), and Sh (t) = A0 (t, x) + h2 · A2 (t, x) + h4 · A4 (t, x) · · · which is bounded as h → 0. 3. For any function g(x) such that both E[g(Xh (t))] and E[g(X(t))] exist, we have E[g(Xh (t))] − E[g(X(t))] = h2 · Sh (t) where Sh (t) = A0 (t) + h2 · A2 (t) + h4 · A4 (t) + · · · is bounded as h → 0. Proof. Similar to MGF, characteristic functions are also Taylor expandable, thus claim 1 can be directly proved from Proposition 2.5.1. All the treatments are actually the same as those in MGF with α replaced by iα. However, unlike MGF which may grow unbounded as α getting larger, the absolute value of a characteristic function is always bounded for any α. Write ΦXh (t) (α) − ΦX(t) (α) = h2 · Sh (t, α) In other words, the left hand side is bounded on α, which implies Sh (t, α) is also bounded on α. In addition, Aj (t, α) = Aj
(Φ) (M ) (Φ) (Φ) (g) (g) (g) (g) (g) (F ) (F ) (F ) (F ) (F )
(t, iα), for all j = 0, 2, 4, · · · .
Claim 1 indicates the convergence of their characteristic functions, which implies the convergence in their distributions. Using the inversion formula, we can link their distribution functions to characteristic functions as
∞
FXh (t) (x) = FXh (t) (x) − FXh (t) (−∞) = FX(t) (x) = FX(t) (x) − FX(t) (−∞)
(Φ)
a→−∞
lim
−∞ ∞ −∞
=
a→−∞
lim
e−iαa − e−iαx ΦXh (t) (α)dα 2πiα e−iαa − e−iαx ΦX(t) (α)dα 2πiα
Using φXh (t) (α) − φX(t) (α) = h2 · Sh (t, α), we have
∞
FXh (t) (x) − FX(t) (x) = h2 ·
a→−∞
lim
−∞
e−iαa − e−iαx (Φ) (F ) Sh (t, α)dα = h2 · Sh (t, x) 2πiα 33
with all the Aj (t, x) coming from the inverse Fourier transform of Aj (t, α). All these Aj (t, x) terms are bounded, thus Sh (t, x) is bounded when h → 0, otherwise the left hand side cannot be bounded. Therefore, claim 2 is proved. As to claim 3, the expectations can be expressed by definition as
∞ (F ) (F )
(F )
(Φ)
E[g(Xh (t))] =
−∞ ∞
g(x)dFXh (t) (x) g(x)dFX(t) (x)
−∞
E[g(X(t))]
(F )
=
Using FXh (t) (x) − FX(t) (x) = h2 · Sh (t, x), we have
∞
E[g(Xh (t))] − E[g(X(t))] = h ·
−∞
2
g(x)dSh (t, x) = h2 · Sh (t)
(g)
(F )
(g)
Also we see Aj (t) =
−∞
(g)
∞
g(x)dAj (t, x), j = 0, 2, 4, · · · . The boundedness of Sh (t)
(F )
comes from the same reason. Thus we complete the proof. The above proposition summarizes the main results in the one-dimensional level. In short, for any single t, Xh (t) converges to X(t) in distribution, and all the expectations converge in second order.
2.6
Convergence in Finite Dimensional Distribution
In this section, we discuss the convergence of finite dimensional distribution, the second level according to the previous section. We are now looking at these two random vectors (Xh (t1 ), · · · , Xh (tn )) and (X(t1 ), · · · , X(tn )) and wish to study the relations between their joint distribution functions and their general expectations. For simplicity, we use the following lighter notations in the rest of this chapter: Xh (ti ) = Xih , X(ti ) = Xi , i = 1, · · · , n
Unlike in the previous section, we start the analysis with their finite dimensional expectations instead of their characteristic functions or distribution functions. This is because we intend to construct the error formula for a general multi-dimensional expectation by recursively using the one-dimensional result (Proposition 2.5.2, point 3). When the general expectations are well treated, then the characteristic functions and distribution functions (which can be expressed as expectations of a multi-dimensional indicator function) are just special cases included in the general result. The following is a multi-dimensional extension of the point 3 in Proposition 2.5.2. 34
h h Proposition 2.6.1. For a function g(x1 , · · · , xn ) : Rn → R. Assuming E[g(X1 , · · · , Xn )]
and E[g(X1 , · · · , Xn )] exist, we have
h h E[g(X1 , · · · , Xn )] − E[g(X1 , · · · , Xn )] = h2 · Sh (g)
where Sh = Ah,0 + h2 · Ah,2 + h4 · Ah,4 + · · · in which Ah,j , j = 0, 2, 4, · · · and Sh are bounded as h → 0. Proof. We first define the following notations
h h h Gh (x1 , · · · , xi ) = E[g(X1 , · · · , Xn )|X1 = x1 , · · · , Xih = xi ] i h h h = E[g(x1 , · · · , xi , Xi+1 , · · · , Xn )|X1 = x1 , · · · , Xih = xi ] h h = E[g(x1 , · · · , xi , Xi+1 , · · · , Xn )|Xih = xi ] (g) (g) (g) (g) (g) (g)
(Markov)
Gi (x1 , · · · , xi ) = E[g(X1 , · · · , Xn )|X1 = x1 , · · · , Xi = xi ] = E[g(x1 , · · · , xi , Xi+1 , · · · , Xn )|X1 = x1 , · · · , Xi = xi ] = E[g(x1 , · · · , xi , Xi+1 , · · · , Xn )|Xi = xi ] Clearly, with i = 0
h h Gh = E[g(X1 , · · · , Xn )] , 0
(Markov)
G0 = E[g(X1 , · · · , Xn )]
Also, with i = n Gh (x1 , · · · , xn ) = Gn (x1 , · · · , xn ) = g(x1 , · · · , xn ) n Note that, for both Gh and Gi , we have the recursive relations between index i − 1 i and i as follows.
h Gh (x1 , · · · , xi−1 ) = E[Gh (x1 , · · · , xi−1 , Xih )|Xi−1 = xi−1 ] i−1 i
Gi−1 (x1 , · · · , xi−1 ) = E[Gi (x1 , · · · , xi−1 , Xi ) |Xi−1 = xi−1 ] for i = 1, · · · , n. The reasons for (2.9) can be seen as below. Firstly, we see
h h Gh (x1 , · · · , xi−1 ) = E[g(x1 , · · · , xi−1 , Xih , · · · , Xn )|Xi−1 = xi−1 ] i−1
(2.9)
=
yi yi+1
···
yn
h h g(x1 , · · · , xi−1 , yi , · · · , yn )P(Xih = yi , · · · , Xn = yn |Xi−1 = xi−1 )
=
yi
yi+1
···
yn
h h g(x1 , · · · , xi−1 , yi , · · · , yn )P(Xi+1 = yi+1 , · · · , Xn = yn |·
h h Xih = yi , Xi−1 = xi−1 ) P(Xih = yi |Xi−1 = xi−1 )
taken out by Markov
35
=
yi
h h h E[g(x1 , · · · , xi−1 , yi , Xi+1 , · · · , Xn )|Xih = yi ]P(Xih = yi |Xi−1 = xi−1 ) h Gh (x1 , · · · , xi−1 , yi )P(Xih = yi |Xi−1 = xi−1 ) i yi
=
h = E[Gh (x1 , · · · , xi−1 , Xih )|Xi−1 = xi−1 ] i
Therefore, similarly we have Gi (x1 , · · · , xi−1 ) = E[g(x1 , · · · , xi−1 , Xi , · · · , Xn )|Xi−1 = xi−1 ] =
yi yi+1
···
yn
h g(x1 , · · · , xi−1 , yi , · · · , yn )f (Xih = yi , · · · , Xn = yn | h Xi−1 = xi−1 )dyn · · · dyi
=
yi yi+1
···
yn
g(x1 , · · · , xi−1 , yi , · · · , yn )f (Xi+1 = yi+1 , · · · , Xn = yn |
Xi = yi , Xi−1 = xi−1 )dyn · · · dyi+1 · f (Xi = yi |Xi−1 = xi−1 )dyi
taken out by Markov
=
yi
E[g(x1 , · · · , xi−1 , yi , Xi+1 , · · · , Xn )|Xi = yi ]f (Xi = yi |Xi−1 = xi−1 )dyi Gi (x1 , · · · , xi−1 , yi )f (Xi = yi |Xi−1 = xi−1 )dyi
yi
=
= E[Gi (x1 , · · · , xi−1 , Xi )|Xi−1 = xi−1 ] The idea of the proof is to relate Gh to Gi−1 with the index i going backward i−1 from n to 1. Each time when we move one step backward, we need to use the one dimensional result, i.e.
h E[fi (Xih )|Xi−1 = xi−1 ] = E[fi (Xi )|Xi−1 = xi−1 ] + h2 · Sh,i−1 (xi−1 )
where Sh,i−1 (xi−1 ) = Ai−1,0 (xi−1 ) + h2 · Ai−1,2 (xi−1 ) + h4 · Ai−1,4 (xi−1 ) + · · · and fi (·) is a one-variable function, for i = 1, · · · , n. Consider the case i = n, since Gh = Gn = g and consider xn as the only variable by seeing x1 , · · · , xn−1 as constants, n i.e. fn (xn ) = g(x1 , · · · , xn ), we have
h h Gh (x1 , · · · , xn−1 ) = E[g(x1 , · · · , xn−1 , Xn )|Xn−1 = xn−1 ] n−1
= E[g(x1 , · · · , xn−1 , Xn )|Xn−1 = xn−1 ] + h2 · Sh,n−1 (xn−1 ) = Gn−1 (x1 , · · · , xn−1 ) + h2 · Sh,n−1 (xn−1 ) 36
Now we move one more step backward to i = n−1, let fn−1 (xn−1 ) = Gn−1 (x1 , · · · , xn−1 ), we have Gh (x1 , · · · , xn−2 ) n−2
h h = E[Gh (x1 , · · · , xn−2 , Xn−1 )|Xn−2 = xn−2 ] n−1 h h h h = E[Gn−1 (x1 , · · · , xn−2 , Xn−1 )|Xn−2 = xn−2 ] + h2 · E[Sh,n−1 (Xn−1 )|Xn−2 = xn−2 ]
= E[Gn−1 (x1 , · · · , xn−2 , Xn−1 )|Xn−2 = xn−2 ] + h2 · Sh,n−2 (xn−2 )
h h + h2 · E[Sh,n−1 (Xn−1 )|Xn−2 = xn−2 ] h h = Gn−2 (x1 , · · · , xn−2 ) + h2 · Sh,n−2 (xn−2 ) + h2 · E[Sh,n−1 (Xn−1 )|Xn−2 = xn−2 ]
Repeat this procedure for i = n − 2, n − 3, · · · and define fi (xi ) = Gi (x1 , · · · , xi ) for all i. Consequently for each i we have
n−1
Gh (x1 , · · · i−1
, xi−1 ) = Gi−1 (x1 , · · · , xi−1 )+h ·Sh,i−1 (xi−1 )+h ·
j=i
2
2
h h E[Sh,j (Xj )|Xi−1 = xi−1 ]
Therefore, when the procedure is finished at i = 1, we have
h h E[g(X1 , · · · , Xn )] n−1
= Gh 0
=
G0 + h2 · Sh,0 (x0 ) + h2 ·
j=1 n−1
h h E[Sh,j (Xj )|X0 = x0 ]
= E[g(X1 , · · · , Xn )] + h2 ·
j=0 n−1
h h E[Sh,j (Xj )|X0 = x0 ] n−1 h h E[Aj,0 (Xj )|X0 j=0
= E[g(X1 , · · · , Xn )] + h
n−1
2
= x0 ] + h ·
j=0
2
h h E[Aj,2 (Xj )|X0 = x0 ] + · · ·
n−1 h E[Sh,j (Xj )]
n−1 h h E[Sh,j (Xj )|X0
Define
n−1
(g) Sh
=
j=0
=
j=0
= x0 ] and
(g) Ah,l
=
j=0
h E[Aj,l (Xj )] =
h h E[Aj,l (Xj )|X0 = x0 ], l = 0, 2, 4 · · · , then j=0 h h E[g(X1 , · · · , Xn )] − E[g(X1 , · · · , Xn )] = h2 · Sh = h2 · Ah,0 + h2 · Ah,2 + · · · h h Note that in Ah,l , the dependence on h comes from Xj . As h → 0, Xj → Xj (g) (g) (g) (g)
(convergence in one dimensional distribution), limh→o Ah,l does not depend on h and should be bounded. Therefore, limh→o Sh is bounded, and we complete the proof. Remark. In fact, (2.9) can be also proved by tower law. We simply redefine
h h Gh = E[g(X1 , · · · , Xn )|Fi ] i (g)
(g)
Gi
= E[g(X1 , · · · , Xn )|Fi ] 37
where the filtration Fi represents the information up to time i. By tower law, we have
h h h h h Gh i−1 = E[g(X1 , · · · , Xn )|Fi−1 ] = E[E[g(X1 , · · · , Xn )|Fi ]|Fi−1 ] = E[Gi |Fi−1 ]
Gi−1 = E[g(X1 , · · · , Xn )|Fi−1 ] = E[E[g(X1 , · · · , Xn )|Fi ]|Fi−1 ] = E[Gi |Fi−1 ] Therefore, using Markov property we have (2.9). This proposition is applicable to a general function g(·). In order to discuss the convergence in distribution, we need to look at the joint characteristic functions and joint distribution functions. This can be obtained as special cases from the above proposition. Below we present the results. Proposition 2.6.2. 1. The joint characteristic functions for both processes are related as below E[ei(α1 X1 +···+αn Xn ) ] − E[ei(α1 X1 +···+αn Xn ) ] = h2 · Sh where Sh
(Φ) (Φ) (Φ) (Φ)
h h
(Φ)
= A0,h + h2 · A2,h + h4 · A4,h + · · · is bounded as h → 0.
2. The joint distribution functions for both processes are related as below Fh (x1 , · · · , xn ) − F (x1 , · · · , xn ) = h2 · Sh where Sh
(F ) (F ) (F ) (F ) (F )
= A0,h + h2 · A2,h + h4 · A4,h + · · · is bounded as h → 0.
Proof. The first claim is simply proved by applying the above proposition to the characteristic functions (although they are complex functions). As to claim 2, we need to express the distribution functions as expectations of an indicator function, i.e.
h h h h Fh (x1 , · · · , xn ) = P(X1 ≤ x1 , · · · , Xn ≤ xn ) = E[1{X1 ≤ x1 , · · · , Xn ≤ xn }]
F (x1 , · · · , xn )
= P(X1 ≤ x1 , · · · , Xn ≤ xn )
= E[1{X1 ≤ x1 , · · · , Xn ≤ xn }]
Thus the result is straightforward by applying the above proposition. Therefore, we see the second order convergence of finite dimensional distribution. fd D h h h Namely, (X1 , · · · , Xn ) − (X1 , · · · , Xn ) (or Xh (t) − X(t)), and all finite dimen→ → sional expectations converge in second order.
38
2.7
Tightness and Weak Convergence
In this section, we discuss the convergence at the third level, the weak convergence. This is a more general idea than finite dimensional distribution, in the sense that we concern infinite number of time instants. The objective is to see the convergence order is preserved when it comes to the expectations of functions on paths. The sufficient condition for the weak convergence[10] of a sequence of continuous process is given by 1. The convergence of finite dimensional distribution 2. Tightness condition where the second condition is described formally as follows. Definition 2.7.1. (Tightness) Consider the sequence of continuous random processes {Xn (t), n = 1, 2, · · · } and X(t). Under the condition that Xn (t) − X(t) (convergence in finite dimensional → distribution), the sequence {Xn (t)} is said to be tight if
T →0 n→∞ fd
lim lim E
sup Xn (s + t) − Xn (s)
0≤t≤T
→0
In the above definition, all the Xn (t) and the converged process X(t) are continuous random processes. However, in our setting, the MMPP Xh (t) is not a continuous random process, therefore the above conditions can not be applied directly. In or˜ der to make these conditions work, we define another continuous process Xh (t) from Xh (t) as below. ˜ Xh (t) = τi+1 − t t − τi Xh (τi ) + Xh (τi+1 ) τi+1 − τi τi+1 − τi Xh (τi+1 ) − Xh (τi ) τi+1 Xh (τi ) − τi Xh (τi+1 ) = t + τi+1 − τi τi+1 − τi
τi ≤ t ≤ τi+1
˜ where τi , i = 1, 2, · · · are the jump instants of Xh (t). Namely, Xh (t) is a linear interpolation for each interval (τi , τi+1 ), as shown in Fig. 2.4. Clearly, when h → 0, these two processes become the same one, i.e.
h→0
˜ lim Xh (t) = lim Xh (t)
h→0
fd fd ˜ ˜ Since Xh (t) − X(t), we also have Xh (t) − X(t). If {Xh (t)} can be proved to be → → W W ˜ tight, then Xh (t) − X(t), consequently, Xh (t) − X(t) as well. That is to say, → →
39
˜ Figure 2.4: The construction of Xh (t) from Xh (t) in order to discuss the weak convergence of {Xh (t)}, we should check the tightness ˜ of {Xh (t)} because they are continuous processes. In the following we look at the tightness condition in our setting, which is to be proved in this section. Definition 2.7.2. (Tightness in our setting) ˜ The convergence of the Xh (t) to X(t) is said to be tight if lim lim E ˜ ˜ sup Xh (s + t) − Xh (s)
0≤t≤T
T →0 h→0
→0
Note that we change the index of the sequence from n to h and use h → 0 to represent n → ∞. ˜ From the definition of Xh (t) (see Fig. 2.4), we know that ˜ Xh (t) − Xh (t) ≤ h Consequently, we have sup
0≤t≤T
∀t
˜ ˜ Xh (s + t) − Xh (s) ≤ sup
0≤t≤T
Xh (s + t) − Xh (s) + 2h
∀s
Taking lim lim E[·] on both sides, we see that
T →0 h→0
T →0 h→0
lim lim E
sup Xh (s + t) − Xh (s)
0≤t≤T
→0
(2.10)
is a sufficient condition based for the tightness property in Definition 2.7.2. Clearly, ˜ (2.10) is based on Xh (t) instead of Xh (t), therefore is easier to handle using our knowledge on Xh (t). In other words, (2.10) is the objective of our proof in this section. 40
In the following we present two useful results which will support the proof of (2.10). The first one is about the upper bond of E and X(t). Proposition 2.7.1. The following holds for Xh (t) no matter what we choose for h E sup |Xh (t)| ≤ 2 σ 2 T + (x0 − θ)2 + |θ|
0≤t≤T
sup |Xh (t)| while the second
0≤t≤T
one is a continous version of Doob’s decomposition for both the processes Xh (t)
∀h
Proof. Define the Brownian motion Y (t) = σdW (t), i.e. the original OU with no drift term (k = 0). Let Yh (t) denote the discretized MMPP of Y (t) with step size σ2 h. From moment matching, the up and down jump rates are both 2 for each state 2h (there are infinitely many states for Yh (t) actually). Obviously, both Y (t) and Yh (t) are martingales. The proof can be divided into the following three claims. 1. E sup |Xh (t)| ≤ E
0≤t≤T
sup |Xh (t) − θ| + |θ|
0≤t≤T
2. E
sup |Xh (t) − θ| < E
0≤t≤T
sup |Yh (t) − θ|
0≤t≤T
3. E
sup |Yh (t) − θ| ≤ 2 σ 2 T + (x0 − θ)2
0≤t≤T
The first claim is trivial from the fact |Xh (t)| ≤ |Xh (t) − θ| + |θ|. The third one can be proved through Doob’s inequality which gives a upper bound for the second moment of a martingale.
2
E
sup |Yh (t) − θ|
0≤t≤T
≤ E
sup (Yh (t) − θ)2
0≤t≤T
(E[X]2 ≤ E[X 2 ]) (Doob’s inequality)
≤ 4E (Yh (T ) − θ)2 = 4 V[Yh (T ) − θ] + E[Yh (T ) − θ]2 = 4 σ 2 T + (x0 − θ)2 Claim 3 can be obtained by taking square root on both sides.
Now we prove claim 2. According to proposition 2.2.1, for both Xh (t) and Yh (t), σ2 the rate to jump out of any current state is u + d = 2 , and the number of jumps J h
41
occuring during [0, T ] follows a Poisson distribution with parameter
σ2 σ2 T ) e− h2 T 2 P(J = j) = h j!
σ2 T , i.e. h2
j
(
j = 0, 1, 2, · · ·
Let Xi (and Yi ) denote the values of Xh (t) (and Yh (t)) after the i-th jump occurs, where 0 ≤ i ≤ j. In order words, {Xi } (and {Yi }) actually forms a discrete-time Markov Chain observed at the jump time of the original continuous-time Markov Chain. Consequently, we write
∞
E
sup |Xh (t) − θ|
0≤t≤T
=
j=0 ∞
P(J = j)E P(J = j)E
j=0
sup |Xh (t) − θ| J = j
0≤t≤T
= Similarly
sup |Xi − θ| J = j
0≤i≤j
∞
E
sup |Yh (t) − θ|
0≤t≤T
=
j=0
P(J = j)E
sup |Yi − θ| J = j
0≤i≤j
¯ ¯ ¯ Define Xi = |Xi − θ| (and Yi = |Yi − θ|), i.e. Xi is shifted in such a way that Xi would get less and less because Xi tends to move closer to its long term mean θ ¯ (while Yi would have the same probabilities to get greater or less). Consequently, the mean-reverting property for Xi can be rephrased as ¯ ¯ ¯ ¯ P(Xj > Xi ) < P(Xj < Xi ) ¯ while for Yi ¯ ¯ ¯ ¯ P(Yj > Yi ) = P(Yj < Yi ) ∀j > i ¯∗ ¯ ¯∗ ¯ Define Xj = supi≤j Xi (and Yj = supi≤j Yi ), from the mean-reverting property of ¯ ¯ Xi and the martingale property of Yi , we have ¯ ¯∗ ¯ ¯∗ P(Xj+1 > Xj ) < P(Xj+1 < Xj ) ¯ ¯ ¯ ¯ P(Yj+1 > Yj∗ ) = P(Yj+1 < Yj∗ ) As a consequence, we obtain ¯ ¯∗ ¯ ¯ P(Xj+1 > Xj ) < P(Yj+1 > Yj∗ ) ∀j > i
42
¯∗ ¯∗ Now we claim E[Xj ] ≤ E[Yj ] for all integer j by induction. As an initial step, we ¯∗ ¯ ¯∗ know E[X0 ] = x0 = E[Y0∗ ] (they start from the same value x0 ). Assuming E[Xj ] ≤ ¯ E[Yj∗ ], then ¯∗ E[Xj+1 ] ≤ ¯ ¯∗ ¯∗ ¯ ¯∗ P(Xj+1 ≤ Xj )E[Xj+1 |Xj+1 ≤ Xj ] ¯ ¯∗ ¯∗ ¯ ¯∗ + P(Xj+1 > Xj )E[Xj+1 |Xj+1 > Xj ] ¯∗ ¯ ¯∗ = E[Xj ] + P(Xj+1 > Xj ) · h ¯ ¯ ¯ ≤ E[Yj∗ ] + P(Yj+1 > Yj∗ ) · h = As a result E sup |Xh (t) − θ|
0≤t≤T
¯∗ ¯∗ (note Xj+1 = Xj + 0) ¯∗ ¯∗ (note Xj+1 = Xj + h)
¯∗ E[Yj+1 ]
= ≤
∞ j=0 ∞ j=0
¯∗ P(J = j)E Xj ¯ P(J = j)E Yj∗ = E sup |Yh (t) − θ|
0≤t≤T
Thus claim 2 is proved. Therefore, we complete the whole proof. Note that the above proposition is independent of h, thus the upper bond is also valid for the original OU process X(t) (the case h → 0). However, what is relevant to our main result is the fact that E sup0≤t≤T |Xh (t)| is bounded. The upper bound doesn’t actually matter. Moving on, in the following we present the next supportive result, the continuous time version of the Doob’s decomposition for Xh (t). The Doob’s decomposition is a general result for discrete time processes[17], saying that any such process Xi can be decomposed by Xi = X0 + Mi + Ai where Mi is a martingale starting from 0 and Ai can be expressed as
i
Ai =
j=1
E[Xj − Xj−1 |Fj−1 ]
Consider the MMPP(h, k, θ, σ, x0 ) with its mean as E[Xh (t)] = θ + (x0 − θ)e−kt which is independent of h. If it is time discretized with step size ∆t and let Xi = Xh (i∆t), we apply the Doob’s decomposition to Xi and have that, for a small ∆t E[Xj − Xj−1 |Fj−1 ] = θ + (Xj−1 − θ)e−k∆t − Xj−1 = θ + (Xj−1 − θ)(1 − k∆t) − Xj−1 = k(θ − Xj−1 )∆t 43
Consequently
i i
Ai =
j=1
[k(θ − Xj−1 )∆t] = kθi∆t − k
j=1
Xj−1 ∆t
Now we make a heuristical extension to continuous time. Let t = i∆t and let ∆t → 0 (i → ∞) while keeping t fixed, we write
t
Ai −→ Ah (t) = kθt − k
0
Xh (τ )dτ
The correctness of such extension is validated by the following proposition. Proposition 2.7.2. The continuous time process Xh (t) can be decomposed as below Xh (t) = Xh (0) + Mh (t) + Ah (t) where Mh (t) is a martingale starting from 0, and
t
Ah (t) = kθt − k
0
Xh (τ )dτ
Proof. This is equivalent to prove that Mh (t) = Xh (t) − Xh (0) − Ah (t) is a martingale starting from 0. Clearly, Mh (0) = −Ah (0) = 0. Also, we need to prove E[Mh (s + t)|Fs ] = Mh (s). Since Mh (s + t) = Xh (s + t) − Xh (0) − Ah (s + t)
s s+t
= Xh (s + t) − Xh (0) − kθ(s + t) − k
0
Xh (τ )dτ − k
s
Xh (τ )dτ
Thus E[Mh (s + t)|Fs ] = E[Xh (s + t)|Fs ] − Xh (0) − E[Ah (s + t)|Fs ] = E[Xh (s + t)|Fs ] − Xh (0)
s s+t
− kθ(s + t) − k
0
Xh (τ )dτ −
s
E[Xh (s + t)|Fs ]dτ
= [θ + (Xh (s) − θ)e−kt ] − Xh (0)
s s+t
− kθ(s + t) − k
0
Xh (τ )dτ −
s s
[θ + (Xh (s) − θ)e−kt ]dτ
= Xh (s) − Xh (0) − kθs − k
0
Xh (τ )dτ
= Xh (s) − Xh (0) − Ah (s) = Mh (s) which is the desired result. 44
Based on the above two results, we now move on to check the tightness condition of Xh (t). Because of Markov property, we set s = 0 in definition 2.7.1 without loss of generality. Also because our treatment on the expectation E sup0≤t≤T Xh (t) − Xh (0) doesn’t depend on h, we may simply drop lim in the definiton. In the following we h→0 present our main result in this section, in which the tightness condition is simplified to fit our setting. Proposition 2.7.3. The MMPP Xh (t), ∀h satisfies the condition (2.10) E sup Xh (t) − Xh (0)
0≤t≤T
→0
as T → 0
Proof. Since Xh (t) = Xh (0) + Mh (t) + Ah (t), we have E ≤ E sup Xh (t) − Xh (0)
0≤t≤T
=E
sup Mh (t) + Ah (t)
0≤t≤T
sup Mh (t)
0≤t≤T
+E
sup Ah (t)
0≤t≤T
The second term can be treated as
t
E ≤ E
sup Ah (t)
0≤t≤T
=E +E
sup kθt − k
0≤t≤T t 0
Xh (τ )dτ
sup kθt
0≤t≤T
sup k
0≤t≤T 0
Xh (τ )dτ −− 0 −→
T →0
≤ kθT + k · E
sup Xh (t)
0≤t≤T
bounded
·T
In order to check if the first term E
sup Mh (t)
0≤t≤T
→ 0, we square it and use Doob’s
inequality (since Mh (t) is a martingale) to have
2
E
sup Mh (t)
0≤t≤T
≤E
sup Mh (t)2 ≤ 4E Mh (T )2
0≤t≤T
Thus we have to derive E [Mh (T )2 ]. Since Mh (T ) = Xh (T ) − Xh (0) − Ah (T )
T
= Xh (T ) −Xh (0) − kθT + k
0
stochastic
Xh (τ )dτ
stochastic
45
Squaring, taking expectation over the stochastic terms, we come to E [Mh (T )2 ] = E (Xh (T ) − Xh (0))2 + (kθT )2
(A) T 2
−− 0 −→
T →0
+k E
0
2
Xh (τ )dτ
(D) T
−2kθT E[Xh (T )] − Xh (0)
(B) T
+2k E Xh (T )
0 (E)
Xh (τ )dτ −2k(Xh (0) + kθT ) E
0
Xh (τ )dτ
(C)
In the following we deal with the expected terms (A) − (E) as above to see all of them tend to 0 as T → 0. 1. term (A) E [(Xh (T ) − Xh (0))2 ] = = E [Xh (T )2 ] − 2Xh (0)E [Xh (T )2 ] + Xh (0)2 σ2 2 (1 − e−2kT ) + [θ + (Xh (0) − θ)e−kT ] 2k −2Xh (0)[θ + (Xh (0) − θ)e−kT ] + Xh (0)2
− − 0 + [θ + (Xh (0) − θ)]2 − 2Xh (0)[θ + (Xh (0) − θ)] + Xh (0)2 = 0 −→ 2. term (B) E[Xh (T )] − Xh (0) = θ + (Xh (0) − θ)e−kT − Xh (0) − − 0 −→ 3. term (C)
T T T →0
T →0
E[
0 T
Xh (τ )dτ ] =
0
E[Xh (τ )]dτ (Xh (0) − θ) (1 − e−kT ) k
2
=
0
[θ + (Xh (0) − θ)e−kτ ]dτ = θT +
−− 0 −→
T →0
4. term (D) From the Cauchy-Schwartz inequality : with a = 1 and b = Xh (τ )
T 2 T T T (a 0
· b)dτ
≤
T 0
a2 dτ ·
T 0
b2 dτ
1 · Xh (τ )dτ
0 T 2
≤
0
12 dτ
0 T
Xh (τ )2 dτ −− 0 −→
T →0
E[
0
Xh (τ )dτ
] ≤ T·
0
E[Xh (τ )2 ]dτ
46
5. term (E) From the Cauchy-Schwartz inequality : E[(AB)2 ] ≤ E[A2 ]E[B 2 ] with A = Xh (τ ) and B =
T 0
Xh (τ )dτ
T 2 T 2
E[Xh (T )
0
Xh (τ )dτ ]
T
≤ E[ Xh (T )
0 2
Xh (τ )dτ
]
≤ E[Xh (T ) ] ·
2
2
E[
0 T
Xh (T )dτ
] −− 0 −→
T →0
≤ (E[Xh (T ) ]) · T
0
E[Xh (T )2 ]dτ
We also note that, all the above treatments are independent of h. Thus we see E [Mh (T )2 ] → 0. Therefore, we complete the proof of our original claim. From above we see the tightness condition is satisfied for the MMPP Xh (t) as h → 0. Together with the convergence of finite dimensional distribution presented in the previous section, it is sufficient to conclude that Xh (t) converges weakly to X(t). More importantly, from weak convergence we also have the convergence of the path-dependent expectations. We summarize the direct result below. Proposition 2.7.4. {Xh (t)} − {X(t)}. Also, for a continuous path-dependent → function g(·), we have E[g({Xh (t)})] −→ E[g({X(t)})]
T 0
W
as
h→0
T 0
Perhaps the most important example of path dependent functions in the text of credit risk (or interest rate) modeling is e−
X(t)dt
, since E[e−
X(t)dt
] is the formula
to evaluate the survival probability (or the zero coupon bond price). From the above general result, we know E[e−
T 0
Xh (t)dt
] −→ E[e−
T 0
X(t)dt
]
as h → 0
Furthermore, as to its convergence order, we extend the argument for the expectations of finite dimensional functions to see the order is preserved when the expectation is taken over an infinite dimensional path-dependent function. The result is presented as follows. Proposition 2.7.5. For a continuous function on paths g(·) : D[0, T ] → R (D stand for the set of right continuous functions with left limits) which are nice enough to be expressed by g({x(t), 0 ≤ t ≤ T }) = lim g (n) (x1 , · · · , xn )
n→∞
47
where (x1 , · · · , xn ) is an equally spaced partition for {x(t), 0 ≤ t ≤ T } i.e. xi = x(i∆t), i = 1, · · · , n, and ∆t = T /n, and g (n) (·) : Rn → R is a n-step approximating function to g(·). Assuming both E[g({Xh (t)})] and E[g({X(t)})] exist, we have E[g({Xh (t)})] − E[g({X(t)})] = h2 · Zh where Zh = B0 + h2 · B2 + h4 · B4 + · · · Proof. This is the infinite dimensional version of the Proposition 2.6.1. Therefore, basically, the procedure to prove the claim is similar, except we need to consider the modifications on the earlier results when we are looking at the case ∆t → 0. Consider the random variables Xh (t + ∆t) (and X(t)) conditioning on Xh (t) (and X(t)), we wish to claim that, when ∆t → 0 (all the high order terms of ∆t ignored), all the error terms (which are of the form h2 · Sh ) in one dimension case should be expressed as h2 ·Zh ∆t, i.e. an extra ∆t would appear in these terms. More specifically, to see where it comes from, recall Proposition 2.3.5 (in the proof), we see
n E[Xh (t + ∆t)|Xh (t) = x] − E[X n (t + ∆t)|X(t) = x] = h2 · Sh,1 (x)∆t = h2 · Zh (x)∆t (n) (n) (g) (g) (g) (g) (g)
(the notation Sh,1 (x) is changed to Zh (x)) If we do not take expectations over Xh (t) (and X(t)) and just keep the conditioning, then all the subsequent results will get an extra ∆t involved in their error terms. Namely, for the characteristic functions, we have ΦXh (t+∆t)|Xh (t)=x (α) − ΦX(t+∆t)|X(t)=x (α) = h2 · Zh (x, α)∆t Also, for a general function g(·), we have E[g(Xh (t + ∆t))|Xh (t) = x] − E[g(X(t + ∆t))|X(t) = x] = h2 · Zh (x)∆t where Zh (x) = B0 (x) + h2 · B2 (x) + h4 · B4 (x) + · · · Now we turn to look at our claim. Fixed T , consider the interval [0, T ] which is chopped into n slots indexed by i = 1, · · · , n. Let ti = i∆t where ∆t = T /n. When ∆t is sufficiently small (n is sufficiently large), from the above reason, the one dimensional recursive formula in Proposition 2.6.1 can be modified as
h E[fi (Xih )|Xi−1 = xi−1 ] = E[fi (Xi )|Xi−1 = xi−1 ] + h2 · Zh,i−1 (xi−1 )∆t (g) (g) (g) (g) (g) (Φ)
(n)
(n)
48
where Zh,i−1 (xi−1 ) = Bi−1,0 (xi−1 ) + h2 · Bi−1,2 (xi−1 ) + h4 · Bi−1,4 (xi−1 ) + · · · and fi (·) is a one-variable function, for i = 1, · · · , n. Now we follow the definitions and repeat the steps in the proof of Proposition 2.6.1, we have for general i
n−1
Gh (x1 , · · · i−1
, xi ) = Gi−1 (x1 , · · · , xi )+h ·Zh,i−1 (xi−1 )∆t+h ·
j=i
2
2
h h E[Zh,j (Xj )|Xi−1 = xi−1 ]∆t
When the procedure is finished at i = 1, we have
h h E[g (n) (X1 , · · · , Xn )] n−1
=
Gh 0
=
G0 + h · Zh,0 (x0 )∆t + h ·
j=1 n−1
2
2
h E[Zh,j (Xj )]∆t
= E[g (n) (X1 , · · · , Xn )] + h2 ·
j=0
h E[Zh,j (Xj )]∆t n−1 n−1 h E[Bj,0 (Xj )] + h2 · j=0 j=0 h E[Bj,2 (Xj )] · · · ∆t
= E[g (n) (X1 , · · · , Xn )] + h2 ·
Taking limit n → ∞ on both side, we obtain E[g({Xh (t)})] = =
n→∞ h h lim E[g (n) (X1 , · · · , Xn )] n−1 n→∞
lim E[g
(n)
(X1 , · · · , Xn )] + h · lim
T
2
n→∞
h E[Zh,j (Xj )]∆t j=0
= E[g({X(t)})] + h2 ·
0
E[Zh,t (Xh (t))]dt
(g) Bl T
Similarly, we define
(g) Zh
T
=
0
E[Zh,t (Xh (t))]dt and
=
0
E[Bl,t (Xh (t))], l =
0, 2, 4 · · · , then we complete the proof.
Therefore, we see the second order convergence is preserved to such a nice class of path-dependent functions g(·). In particular, the statement is valid for the function T n g({x(t)}) = e− 0 x(t)dt . To see this, we simply define g (n) (x1 , · · · , xn ) = e− i=1 xi ∆t and clearly g({x(t)}) = e− Therefore E[e−
T 0 T 0
x(τ )dτ
= e− limn→∞
T 0
n i=1
xi ∆t
= lim g (n) (x1 , · · · , xn )
n→∞
Xh (t)dt
] − → E[e− −
h→0
X(t)dt
]
in second order
This results presented in this section, which are at the third level, has put an end to our convergence analysis. 49
2.8
Evaluation of Survival Probabilities
Now that we know the path-dependent expectations converge in second order. In credit risk modeling, this means, when default rates follow MMPP and OU, the survival probability obtained from MMPP should serve as a good approximation to that obtained from OU (with a suitable choice of h). The survival probability for the OU(k, θ, σ, x0 ) is available in the following analytic formula[1][2] P(τ > T ) = E[e− = exp (θ −
T 0
X(t)dt
] where B = 1 (1 − e−kT ) k
σ2 σ2 )(B − T ) − B 2 − x0 B 2k 2 4k
in which τ represents the time of first default. The next question is, how may we evaluate the survival probability when default rate follows MMPP? In the following we show that this can be obtained by solving a system of first order ODE. Assume the default rate follows MMPP(h, k, θ, σ, x0 ) and define pi (t) = P(τ > T, Xh (t) = xi ) i = 0, · · · , 2N Consider the infinitesimal time interval (t, t + ∆t) where ∆t → 0, at most only one event may happen during ∆t. This event can be a transition between adjacent states, or a default event. Therefore, for each state i, we may write down the recurrence equations for pi (t) as follows p0 (t + ∆t) pi (t + ∆t) = p0 (t)(1 − u0 ∆t − x0 ∆t) + p1 (t)(d1 ∆t) = pi−1 (t)(ui−1 ∆t) + pi (t)(1 − ui ∆t − di ∆t − xi ∆t) +pi+1 (t)(di+1 ∆t) p2N (t + ∆t) = p2N − 1 (t)(u2N − 1 ∆t) + p2N (t)(1 − d2N ∆t − x2N ∆t) i = 1 · · · , 2N − 1 i = 2N i=0
Letting ∆t → 0, these equations can be expressed as a system of first order ODE d p0 (t) = p0 (t)(−u0 − x0 ) + p1 (t)(d1 ) dt d pi (t) = pi−1 (t)(ui−1 ) + pi (t)(−ui − di − xi ) + pi+1 (t)(di+1 ) dt d p2N (t) = p2N − 1 (t)(u2N − 1 ) + p2N (t)(−d2N − x2N ) dt which can also be expressed in matrix form as p (t) = A · p(t) 50 i=0 i = 1 · · · , 2N − 1 i = 2N
where p(t) denotes the column vector [p0 (t), · · · , p2N (t)]T , and A stands for the following (2N + 1) × (2N + 1) matrix
−u0 − x0 u0 0 . . . . . . . . . 0 d1 0
0 0
−u1 − d1 − x1 u1 0 . . . . . . 0
d2 .. . ui−1 0 . . . 0
0 . . . 0 di+1 .. . u2N −2 0
di −ui − di − xi ui 0 0
0 . . . . . . 0 d2N −1 −u2N −1 − d2N −1 − x2N −1 u2N −1
0 . . . . . . . . . 0 d2N −d2N − x2N
Therefore, p(t) is solved as p(t) = eAt · p(0) In other words, we have to calculate the matrix exponential in order to obtain p(t). Consequently, the survival probability is then given by
2N
P(τ > T ) =
i=0
pi (T )
The default rate xi shown in the matrix A can be also seen as the transition rate to an absorbing state, i.e. the default state. In other words, apart from the original 2N + 1 states, we consider there is an extra state (indexed 2N + 1) which represents default and this state is essentially absorbing. In this viewpoint, we have a total of 2N + 2 states, and all pi (t) can be considered as the time t state probabilities, i.e. pi (t) = P(Xh (t) = xi ), i = 0, · · · , 2N + 1, and they sum to 1. In particular, p2N + 1 (t) = P(Xh (t) = default) = P(τ < t). Therefore, the time T suvival probability is the probability that the Markov Chain is not yet in the absorbing state at time T , i.e. P(τ > T ) = 1 − p2N + 1 (T ) =
i=0 2N
pi (T )
which agrees with the above formula. Our numerical results are given in Fig. 2.5 ∼ Fig. 2.8. Two parameter sets of OU processes are used: OU-1 OU-2 (k, θ, σ, x0 ) = (0.2, 5.0, 1.0, 5.0) (k, θ, σ, x0 ) = (0.2, 0.5, 0.1, 0.5)
where OU-1 is just an demonstrative example and OU-2 is a more realistic one. Note that default is a rare event thus in general the default rate of a normal business entity 51
shouldn’t be too high. In OU-1, the long-run mean θ = 5.0 (defaults/year) indicates that the average default rate is 5 times per year, which is apparently too high for a normal defaultable entity. It is just used to cover the extreme cases with very high default rates. On the other hand, OU-2 has a more reasonable θ = 0.5 (defaults/year) indicating that on the average one default is expected to happen during two years time (although this is also a bit high). For convenience, we pick x0 = θ. In addition, the parameter sets are selected such that the following condition σ θ > n√ 2k n≥1
is satisfied. This is a condition suggested in [13] to ensure the probability for an OU process to go negative is sufficiently low. A greater n implies a lower probability of going negative.
Figure 2.5: Survival Probabilitiy term structure for OU-1 (a) t = 0 ∼ 2 (b) Close-up at t = 1.5 ∼ 2
Figure 2.6: Survival Probabilitiy term structure for OU-2 (a) t = 0 ∼ 10 (b) Close-up at t = 9 ∼ 10 52
Figure 2.7: Convergence in survival Probability for OU-1 (a) Error v.s. N (b) Error v.s. h
Figure 2.8: Convergence in survival Probability for OU-2 (a) Error v.s. N (b) Error v.s. h In these numerical examples, we see that the MMPP approximation provides a good accuracy. The survival probability term structure for both OU examples are given in Fig. 2.5 (OU-1) and Fig. 2.6 (OU-2). In Fig. 2.5(a), we see the two curves nearly coincide with each other and it’s a bit difficult to tell them apart. Fig. 2.5(b) give a close-up at t = 1.5 ∼ 2 years, where the two curves are very close to each other even in such a small order of magnitude (10−4 ). The close agreement can be also observed in Fig. 2.6. Note that in both cases, N = 5, meaning that such a good approximation is obtained when only 2N + 1 = 11 discrete states are used. Fig. 2.7 (OU-1) and Fig. 2.8 (OU-2) is about the convergence of error in the survival probability as N → ∞ or h → 0. It can be seen in Fig. 2.7(a) and Fig. 2.8(a) that the error converges quite fast to zero as greater N is used. Even if smaller N is used, the approximation has been satisfactorily good. Second order of convergence 53
can be seen more clearly in Fig. 2.7(b) and Fig. 2.8(b), where the horizontal axis is h. If we compare the two points h = 1 and h = 2 in Fig. 2.7(b) (or h = 0.1 and h = 0.2 in Fig. 2.8(b)), the error in h = 2 is roughly 4-fold higher than in h = 1. Furthermore, in general the greater t is, the larger the error is, since the error will accummulate as t turns greater. But it is not necessarily the case, see t = 1.0 and t = 1.5 in Fig. 2.7. This is because with greater t, the survival probability becomes lower (default probability becomes higher), so does its error. When the survival probability is not in a comparable order of magnitute, it may be the case that larger t brings lower error.
2.9
Concluding Remarks
In this chapter, we analyze the convergence behaviour of the moment matched MMPP to its original OU process. The central conclusion can be given by this simple statement: under moment matching, the MMPP converges weakly to the OU, and the expectations converge in second order. The results can be presented in three levels. Level 1 concerns a single time horizon. The main results are 1. Convergence in distribution, i.e. Xh (t) − X(t), ∀t. → 2. The first two moments match exactly, i.e. E[Xh (t)] = E[X(t)] 2 E[Xh (t)] = E[X 2 (t)] This holds for any sensible choice of h. 3. The third and fourth moments converge in second order, with a neat formula as (3) (3) Rh (t) = h2 · R1 (t) Rh (t) = h2 · R1 (t) In particular, at steady state, we have
t→∞ (4) (4) D
lim Rh (t) = 0,
(n)
n = 1, 2, 3
4. The higher moments converge in second order, with a slightly weaker formula as Rh (t) = h2 · Sh (t)
(n) (n)
54
5. Second order convergence on general expectations, i.e. E[g(Xh (t))] − E[g(X(t))] = h2 · Sh (t) Level 2 concerns multiple instants along time axis. The main results are 1. Convergence of finite dimensional distribution, i.e. (Xh (t1 ), · · · , Xh (tn )) − (X(t1 ), · · · , X(tn )) → 2. Second order convergence on finite dimensional expectations, i.e. E[g(Xh (t1 ), · · · , Xh (tn ))] − E[g(X(t1 ), · · · , X(tn ))] = h2 · Sh Level 3 concerns the whole time axis. The main results are 1. Weak convergence, i.e. {Xh (t)} − {X(t)}. → 2. Second order convergence on expectations of functions on paths, i.e. E[g({Xh (t)})] − E[g({X(t)})] = h2 · Zh In particular, E[e−
T 0
(g)
D
(g)
W
(g)
Xh (t)dt
] −→ E[e−
T 0
X(t)dt
] in second order.
55
Chapter 3 Default Term Structure in Advanced Models
3.1 Introduction
In this chapter, we use MMPP discretization to analyze the default term structure of advanced models that are generalized from OU processes. From Chapter 2, we have seen a moment matched MMPP converges weakly to its original OU, and the expectations converge in second order. In credit risk modeling (reduced form approach), the most relevant expectation is the formula for survival probability, which is the center quantity in this chapter. The relation about how survival (default) probabilities vary with time is known as default term structure, which is usually expressed by the time function of survival probabilities, i.e. {P(t), ∀t} in which P(t) = P(τ ≥ t) = E[e−
t 0
λ(s)ds
]
where τ is the (first) default time and λ(t) the default rate. As we have mentioned, the purpose of MMPP discretization is not just to develop another method to calculate survival probabilities for OU, but to use this technique to deal with more advanced models that incorporate more complicated characteristics. The analytic (closed-form) solution for survival probability exists for only fundamental models, such as OU and CIR[1][2][13]. It is often the case that a generalization or extension of such fundamental models makes them no longer analytically tractable. In such cases, we usually have to turn to numerical methods like Monte-Carlo. The MMPP discretization actually gives another viable approach to deal with these models and obtain their default term structure. When the state space of such a general diffusion model is discretized, it is turned into a (continuous time) Markov Chain,
56
and the survival probabilities can be obtained from some matrix computations like matrix exponential. In this chapter, we will develop a few classes of advanced models. The first class is a direct extension from OU, in which the drift and volatility are made state dependent. In fact, the MMPP technique can be applied to those processes with general (state dependent) drift. For processes with general volatility (such as CIR), we need to first express it as a function of a constant volatility process and then discretize it indirectly. The second class of processes that are suitable to handle in the Markov Chain setting is the jump diffusion processes[9][18][15]. This is because in an MMPP, the diffusion behaviour is approximated by small jumps between adjacent states, thus a real jump can be seen a state transition across a couple of adjacent states. We will discuss the state dependent jumps added to OU and CIR processes. Next, we turn to look at the third class termed switched diffusion processes, where the process state may switch between two or more diffusion processes. Two switching rules are considered. The first one is that a switching happens according to a background Markov Chain, while the second is that a switching is caused by hitting a prespecified threshold. Generalized from the switched models, we move on to discuss a model with memory effect. This type of models is aiming to capture the phenomenon that, when the default rate has once been to a risky area, then it is more likely to go back, until it recovers to normal after a sufficiently long period of time. Apart from these models with fixed parameters on time, the MMPP technique is in fact also suitable for the analysis of models with time-varying parameters. The idea is to decompose such models into two parts. One is a stochastic process without time-varying parameters, and the other is a pure time function. Therefore, with a minor treatment, their default term structure can be obtained. Unlike the previous chapter, this chapter is about the problem formulations instead of complete research. In each type of models we intend to study, only main ideas and prospective approaches are addressed. Namely, this chapter is considered as a research plan and further analytical and numercial works will be carried out in the future. This chapter is currently organized as follows. Section 2 applies MMPP to extended models with general drift and volatility. Section 3 incorporates jumps and discusses the jump diffusion processes. Switched diffusion processes are discussed in Section 4. In Secton 5, we deal with the models with time-varying parameters. Finally, concluding remarks are given in Section 6. 57
3.2
Models Extended from OU
In this section, we consider the models that are directly extended from OU, allowing the drift and volatility to be made dependent on the state values. Consider the following general diffusion process dX(t) = µ(X(t))dt + σ(X(t))dW (t), X(0) = x0
OU process is clearly the case when µ(x) = k(θ − x) and σ(x) = σ. The tractability of OU mainly comes from its constant volatility, making X(t) Gaussian for any t. The mean-reverting property can be seen from the drift term that µ(x) > 0 for x < θ and µ(x) < 0 for x > θ. We first discuss the extension on drift and keep volatility constant. In OU, the momentum of X(t) to be pulled back toward θ is proportional to its distance from θ. In this section we generalize the drift µ(x) to have nonlinear relation on x, but still keep it mean-reverting. Because of its Gaussian nature that comes from its constant volatility, the first two moments across an infinitesimal time interval can be obtained without difficulty, thus we may come straight to a generalized version of the MMPP moment matching formula. The objective of this study is to see how the nonlinearity affects its default term structure. The second extension is to allow nonconstant volatility of this form σ(x) = σxβ and keep drift constant. This can be done through a function between two diffusion processes (the target and underlying processes). Then Ito’s lemma is used to on this function characterize their relations. We will see that, for any diffusion with nonconstant volatility of this kind, it may be expressed as a function of another diffusion process with constant volatility, which is to be discretized to MMPP.
3.2.1
Processes with Generaized Drift
The diffusion process with generalized mean-reverting drift is expressed as dX(t) = µ(X(t))dt + σdW (t) where µ(x) > 0 x < θ µ(x) < 0 x > θ
Due to its constant volatility, when time t state is given, the variance across an infinitesimal time interval ∆t is V[X(t + ∆t)|X(t) = x] = σ 2 ∆t. Therefore, the first two moments for this process across (t, t + ∆t) are E[ X(t + ∆t) |X(t) = x] = x + µ(x)∆t E[X 2 (t + ∆t)|X(t) = x] = σ 2 ∆t + [x2 + 2xµ(x)∆t] 58
Similar to the OU case, using moment matching, the approximating MMPP Xh (t) has up and down jump rates (for Xh (t) = x) as below σ 2 + h · µ(x) u(x) = 2h2 2 d(x) = σ − h · µ(x) 2h2 Note that OU is a special case when µ(x) = k(θ − x). Some basic arguments in Chapter 2 still work if we replace k(θ − x) that appear in the convergence analysis by µ(x), since the drift does not affect its Gaussian nature. It is hoped that, the conclusion about second order convergence can still apply to this general drift case (a more careful examination should be carried out). In particular, we consider the following state dependent drift function as µ(x) = sgn(θ − x) · k|θ − x|α where sgn(x) is the sign function. Shown in Fig. 3.1 is a plot of this function when θ = 0 and k = 1. Obviously, OU is the case when α = 1. For the case when α = 0, the drift does not depend on the distance from θ. For general α we have a nonlinear relation between µ(x) and x. The effects of different choice of α will be studied through numerical works.
Figure 3.1: The sample drift function
3.2.2
Processes with Generalized Volatility
Now we turn to consider the diffusion process with general volatility as below dY (t) = K(Θ − Y (t))dt + ΣY β (t)dW (t), 59 Y (0) = y0
where the drift is kept the same as in OU and the volatility is of such form as Σ · y β . Obviously, OU is the case with β = 0 and CIR is the case with β = 1/2. The reason we use Y (t) as a notation for this process is because we will use X(t) to denote another process and build up the connection between X(t) and Y (t). Let X(t) stand for a process with general (mean-reverting) drift and constant volatility (i.e. the one considered in the previous subsection) as shown below dX(t) = µ(X(t))dt + σdW (t), X(0) = x0
The idea in this subsection is to express Y (t) (termed the target process) as a function of X(t) (termed the underlying process) to obtain the desired volatility in Y (t). Write Y (t) = f (X(t)). To see how it works, we start with a special example, the CIR process. Using X(t) to denote an OU process with long-run mean θ = 0 and Y (t) is a standard CIR process that we wish to define from X(t), i.e. dX(t) = −kX(t)dt + σdW (t) Y (t)dW (t)
dY (t) = K(Θ − Y (t))dt + Σ
If we use Y (t) = X 2 (t), then from Ito’s lemma, we have dY (t) = 2X(t)dX(t) + (dX(t))2 = 2X(t) = 2k = 2k − kX(t)dt + σdW (t) + σ 2 dt σ2 − X 2 (t) dt + 2σX(t)dW (t) 2k σ2 − Y (t) dt + 2σ 2k Y (t)dW (t)
σ2 , and Σ = 2σ. However, 2k there is not sufficient degree of freedom in the selection of CIR parameters (K, Θ, Σ), Namely, Y (t) become a CIR process with K = 2k, Θ = since there are only two parameters k and σ in OU. This indicates, we can not use such method to define a CIR with arbitrary parameters from an underlying OU (even if θ in this OU is nonzero). In order to obtain a CIR with any sensible parameter set, we use a general drift term on the underlying X(t) in the above problem, i.e. dX(t) = µ(X(t))dt + σdW (t) Y (t)dW (t)
dY (t) = K(Θ − Y (t))dt + Σ
60
With the relation Y (t) = X 2 (t), we repeat the above procedure to come to dY (t) = 2X(t)dX(t) + (dX(t))2 = 2X(t) µ(X(t))dt + σdW (t) + σ 2 dt = Also the target process is dY (t) = = K(Θ − Y (t))dt
2
2X(t)µ(X(t)) + σ 2 dt + 2σX(t)dW (t)
+ Σ
Y (t)dW (t)
KΘ − KX (t) dt +
ΣX(t)dW (t)
Matching the drift and volatility to give 2X(t)µ(X(t)) + σ 2 = KΘ − KX 2 (t) and 2σ = Σ
Therefore, for a desired CIR parameter (K, Θ, Σ), the drift and volatility of the underlying process are µ(x) = KΘ Σ2 − 2 8 x−1 − K x 2 and σ= Σ 2 (3.1)
Shown in Fig. 3.2 is a plot of µ(x) when K = 1, Θ = 1, Σ = 1. Now we discuss
Figure 3.2: The drift function µ(x) corresponding to CIR process whether this form of drift µ(x) makes X(t) a stable (mean reverting) process. Looking at the drift term we observe that, with large value of |x| (X(t) = x), the first order K term − x will dominate the drift. From the negative coefficient in the dominant 2 term, we see X(t) will be prevented from going far away (→ ±∞), thus in a sense it is mean reverting. Another angle is to look at the relation between X(t) and Y (t). According to Y (t) = X 2 (t), since Y (t) is mean-reverting and is prevented from going 61
far away, so is X(t). That is to say, the mean-reverting property of Y (t) can be carried over to X(t). Also we notice from Fig. 3.2 that, when x → ±0, the drift is dominated by the x term and µ(x) → ±∞. This means, when the process X(t) starts from either
−1
the x > 0 or x < 0 part, it would always stay in that part and never go to the other part, since the large drift will pull it back when X(t) → ±0. However, this does not matter from Y (t) point of view, because either X(t) ∈ (0, ∞) or X(t) ∈ (−∞, 0) will make Y (t) well defined. Our MMPP discretization for Y (t) is made on the underlying process X(t) instead of Y (t), as the convergence analysis can be only applied to a process with constant volatility. When X(t) is discretized into Xh (t), Xh (t) is expected to capture the dynamics of X(t). Since Y (t) = f (X(t)) where f (·) is a sufficiently nice function, f (Xh (t)) is also expected to capture the dynamics of f (X(t)). Thus we may define Yh (t) = f (Xh (t)), which is the indirect discretization of Y (t). From Chapter 2, we see that {Xh (t)} − {X(t)} →
W
=⇒
{Yh (t)} = {f (Xh (t))} − {f (X(t))} = {Y (t)} →
W
although there should be some technical conditions on f (·) for the above to be true. In summary, for a target process with non-constant volatility Y (t), its discretization Yh (t) is done on its underlying process X(t) which is of constant volatility. Through the function f (·), the weak convergence in Xh (t) is carried over to Yh (t). From Y (t) point of view, Yh (t) can be seen as a discretization with nonuniform step size. The survival probabilities when default rate follows Yh (t) can be easily obtained as described below. In the underlying discretized process Xh (t), each state i has an up jump rate ui , a down jump rate di . When the default rate follows Xh (t), we have a third transition rate (going to the absorbing state representing default) that is default rate xi . Since now the default rate actually follows Y (t) = f (X(t)), we simply replace the original default rate in each state xi by yi = f (xi ) and keep each ui and di intact, see Fig. 3.3. Namely, ui and di are used to describe the dynamics of both Xh (t) and Yh (t). Following the procedure in Chapter 2, we can calculate the survival probability from matrix computations. CIR is just a special case. For the target process Y (t) with general volatility, we may repeat the procedure to find its underlying process X(t). Again, consider the
62
Figure 3.3: The difference between Xh (t) and Yh (t) following pair of processes dX(t) = µ(X(t))dt + σdW (t)
dY (t) = K(Θ − Y (t))dt + ΣY β (t)dW (t) Let us take Y (t) = X a (t) where a ∈ R. Using Ito’s lemma on Y (t), we have dY (t) = aX a−1 (t)dX(t) + a(a − 1) a−2 X (t) · (dX(t))2 2 a(a − 1) a−2 = aX a−1 (t) µ(X(t))dt + σdW (t) + X (t) · σ 2 dt 2 a(a − 1) 2 a−2 = aX a−1 (t)µ(X(t)) + σ X (t) dt + aσX a−1 (t)dW (t) 2
This should be equal to the SDE for Y (t) as shown below. dY (t) = = K(Θ − Y (t))dt + ΣY β (t)dW (t)
KΘ − KX a (t) dt + ΣX aβ (t)dW (t)
Matching the drift and volatility yields aX a−1 (t)µ(X(t)) + a(a − 1) σ 2 X a−2 (t) = KΘ − KX a (t) 2 a−1 aσX (t) = ΣX aβ (t) Therefore, we have a= 1 1−β σ = Σ(1 − β) (3.2)
KΘ 1−a K (a − 1) 2 −1 µ(x) = x − x− σ x a a 2 63
which indicates for any sensible parameter set (K, Θ, Σ, β) of Y (t) whenever β = 1, the underlying X(t) can be constructed through this formula. For the case when β = 1, we shall use Y (t) = eX(t) . Once again, we see dY (t) = eX(t) µ(X(t))dt + σdW (t) + eX(t) · σ 2 dt = eX(t) µ(X(t)) + eX(t) · σ 2 dt + σeX(t) dW (t) = K(Θ − Y (t))dt + ΣY (t)dW (t) Thus we have the formula for β = 1 as σ=Σ µ(x) = KΘe−x − K − σ 2 (3.3)
As a summary of this section, the MMPP discretization technique can be actually applied to more general mean reverting process while the weak convergence property is still preserved. We shall use numerical examples to validate our arguments.
3.3
Jump Diffusion Models
In this section, we discuss another type of extension from the fundamental diffusion processes, the jump diffusion process. Since in MMPP discretization we use small jumps to approximate diffusion dynamics, a real jump is considered as a state transition across a couple of states and thus can be easily incorporated in the Markov Chain setting. In the literature of jump diffusion process[5][4], the jump part is often described by an Poisson process with an arrival rate independent of the diffusion process, such that it has more tractability and easier to perform Monte-Carlo simulation. However, when an MMPP is used, we may allow the dependence between the diffusion and the jump parts. This is because we may specify the individual jump behaviour for each single state when we are constructing the Markov Chain. The processes we will consider in this section are OU and CIR processes with state dependent jumps. The diffusion part is discretized as an MMPP first and the jump part will be added to this MMPP. OU can be discretized directly but CIR is done indirectly by the procedure given in the previous section. If fact, more general diffusion can be considered using the indirect discretization. Without loss of generality, the CIR case will be studied, and more general cases can be treated in a similar way.
64
3.3.1
OU-based Jump Diffusion
We first consider the jump diffusion processes where the diffusion part is an OU process, i.e. dX(t) = k(θ − X(t))dt + σdW (t) + dJ(t, X(t)) where the jump part J(t, X(t)) is a process with piecewisely constant sample paths. To be more precise, J(t, X(t)) can be expressed as
M (t)
J(t, X(t)) =
j=1
Z(X(τj ))
where M (t) is a counting process which counts the number of jump arrivals in [0, t], τj , j = 1, 2, · · · are the j-th jump arrival times, and Z(X(τj )) are the jump size of the j-th jump occuring at τj . Note that M (t) and τj are related by M (t) = sup{n : τn ≤ t} Let ν(x) denote the Poisson jump arrival rate when the process state is X(t) = x. Based on these notations, such a state dependent jump process is then completely characterized by 1. The Poisson arrival rate of jump ν(x) 2. The distribution of jump size Z(x) which are both dependent on process state X(t) = x. In principle, Z(x) can be any discrete or continuous distribution taking positive values, such as Poisson, lognormal, and etc. For simplicity, we first consider deterministic jump size. One key condition for a process to be well described by an MMPP is mean reverting. Namely, a finite state Markov Chain is sufficient to capture its behaviour. To make the MMPP discretization work for jump diffusion processes, we have to assume that the jump part is not too harmful such that the overall process can be still well described by a finite number of states. In other words, the overall process should be assumed to be still mean-reverting. The question remains to ask is how can we construct the MMPP for the overall process? Consider the first case that jump is added to OU, firstly we have to discretize the OU diffusion part. This is done through the moment matching formula given in Chapter 2. Let (h, k, θ, σ, x0 ) denote the MMPP parameters, and let xi , i = 0, · · · , 2N represent the discrete state values. For each state i, the moment matching gives the 65
formulae for its up and down jump rates, ui and di . Obviously, in each state i, there are three state change events that may occur in an infinitesimal time interval, i.e. jump up with rate ui , jump down with rate di , and default with rate xi . To incorporate the jump part, we should allow more state changes to happen.
Figure 3.4: Incorporating a jump into MMPP (a) The desired jump rate and size (b) Added into MMPP Assume the state dependent jump rate is νi and jump size is a deterministic constant zi . Then a jump means a state change from xi to xi + zi with a rate νi . Let us first consider the case xi + zi ≤ x2N , which means the jump does not make the process go outside the original upper boundary of discretized OU. Also because xi + zi does not necessarily happen to be a state value, i.e. xj ≤ xi + zi ≤ xj+1 (see Fig. 3.4(a)), when incorporating into MMPP, such a jump is then regarded as stage changes from state i to both state j and j + 1 with rates summing to νi (see Fig. 3.4(b)). Consequently, we have to determine the state change rates from i to j and j + 1, νi,j and νi,j+1 , which satisfy νi,j + νi,j+1 = νi νi,j νi,j+1 (xj − xi ) + (xj+1 − xi ) = zi νi νi Thus νi,j and νi,j+1 can be easily obtained. In the cases where the jump may take a value outside the original upper boundary x2N , we need to have some further treatment. Let us assume we need to add extra L states (indexed 2N + 1, · · · , 2N + L) with the same step size h such that all jumps from states 0 to 2N won’t go beyond the new upper boundary x2N +L . The L states form the extended area, as shown in Fig. 3.5(a). In addition, we have to assume that there is no jump occuring when the process is in the extended area. The transition 66 (3.4)
rates (which are related to diffusion part) for these extra states should be determined through moment matching. However, since these states are of higher values than x2N , from the moment matching formula we are unable to obtain sensible up jump rates (they will become negative). But we know with such higer state values, these states have a stronger momentum to move toward its long-run mean. Therefore, we simply assume there is no up jump event for these states, i.e. ui = 0, i = 2N + 1, · · · , 2N + L. Instead, we allow the process to move down not just to the lower adjacent state (i−1), but also to state i − 2, as shown in Fig. 3.5(b). Let di and qi denote the rates from state i to state i − 1 and i − 2, where i = 2N + 1, · · · , 2N + L, the moment matching says they should satisfy (1 − di ∆t − qi ∆t)x + di ∆t(xi − h) + qi ∆t(xi − 2h) = xi + k(θ − xi )∆t (1 − di ∆t − qi ∆t)x2 + di ∆t(xi − h)2 + qi ∆t(xi − 2h)2 = σ 2 ∆t + [x2 + 2x k(θ − x )∆t]
i i i
Thus we obtain di and qi , i = 2N + 1, · · · , 2N + L.
Figure 3.5: (a) The extra states and extended area (b) The transition rates for a state in the extended area The assumption that no jump may happen to the extended states is in order to fit the jump diffusion process into our Markov Chain setting. In fact, it is acceptable when the jump part is not that harmful. This can be explained as follows. In the original OU, the state can only move outside x2N with a rather small probability, so the extra states do not play a role as important as the original ones. Jumps are actually the major reason to drive the overall process to the extended states. Since the extra states have relatively stronger driving forces to move downward, the down jump rates di and qi , i = 2N + 1, · · · , 2N + L are relatively greater. Also because 67
jumps are often used to describe rare events, the jump rate is in general lower. If the jump rates at extended states νi , i = 2N + 1, · · · , 2N + L, are much smaller compared with di and qi , then they are neglectable and it can be seen as there is no jump at those states. However, such assumption need to be validated through numerical results. Thus far our discussion is aiming at fixed jump size. When Z(xi ) is no longer deterministic and assumed to follow a specific distribution (either discrete or continuous), we should (re)-discretize the distribution in such a way that Z(xi ) = nh (or xi +Z(xi ) = xj , i < j ≤ 2N +L), which indicates a jump can only drive the process to another discrete state points. Assuming that, after re-discretization the probabilities that the jump size is nh is pi,j = P(Z(xi ) = nh = yj − yi ) Therefore, the jump rate from state i to j is νi,j = νi · pi,j , ∀i, j, j>i and
∀j>i
∀i, j,
j>i
νi,j = νi
As a result, the jump rates to all the other states νi,j , i < j ≤ 2N + L can be well defined.
3.3.2
CIR-based Jump Diffusion
This subsection considers the following jump diffusion process where the diffusion part is a CIR. dY (t) = K(Θ − Y (t))dt + Σ Y (t)dW (t) + dJ(t, Y (t))
The basic principle is the same as the OU-based case, except it needs more steps in constructing the Markov Chain for the CIR diffusion part. This is because the CIR part should be discretized indirectly through its underlying process. As defined above, Y (t) is the target jump diffusion process, and the underlying jump diffusion process X(t) is shown as dX(t) = k(θ − X(t))dt + σdW (t) + dJ (t, X(t)) ¯ ¯ which satisfies Y (t) = X 2 (t). We further use X(t) and Y (t) to represent the OU and CIR part of X(t) and Y (t), namely ¯ dX(t) = ¯ k(θ − X(t))dt + σdW (t) Y (t)dW (t)
¯ ¯ dY (t) = K(Θ − Y (t))dt + Σ 68
¯ ¯ ¯ ¯ where the CIR process Y (t) is related to its underlying process X(t) by Y (t) = X 2 (t). ¯ ¯ Discretizing X(t) we obain the MMPP Xh (t) with up and jump rates ui and di , thus ¯ ¯ ¯2 the discretization of Y (t) is indirectly defined by Yh (t) = Xh (t). ¯ ¯ ¯2 Let {xi , ∀i} be all the state values in Xh (t). Since Yh (t) = Xh (t), these state values ¯ ¯ for Yh (t) become {yi , ∀i} where yi = x2 . From Yh (t) point of view, there are three
i
state changes that may happen at state i, i.e. up jump with rate ui , down jump with ¯ rate di , and default with rate yi = x2 . Now we incorporate jumps into Yh (t) to form i Yh (t). Firstly, we have to assume that L extra equally spaced states x2N1 , · · · , x2N +L ¯ can be added to Xh (t) such that all the jumps from Yh (t) point of view will not drive the overall process Yh (t) out of the new upper bound y2N +L = x2 +L . Note 2N that these extra states are of nonuniform step size from Yh (t) point of view since yi = x2 , 2N + 1 ≤ i ≤ 2N + L. i
Figure 3.6: The treatment of jumps in the dimension of Yh (t) Suppose at each state i, the jump arrival rate is νi and jump size is a deterministic ¯ value zi . An occuring of a jump from state i means Yh (t) is changing from yi to the nearest state values to yi +zi , since jumps have to happen between the discrete states. ¯ Notice that we are now incorporating jumps into the Yh (t) (or Yh (t)) domain instead ¯ of the Xh (t) (or Xh (t)) domain. The relation of a jump in both Xh (t) and Yh (t) domains is shown in Fig. 3.6. Therefore, we have to determine two adjacent states j and j + 1 such that x2 = yj ≤ yi + zi ≤ yj+1 = x2 j j+1 =⇒ xj ≤ √ yi + zi ≤ xj+1
Following similar procedure, we may determine νi,j and νi,j+1 , the jump rates from
69
state i to state j and j + 1, which satisfies νi,j + νi,j+1 = νi νi,j νi,j+1 (yj − yi ) + (yj+1 − yi ) = zi νi νi ¯ When all the jumps are added into Xh (t), then we have constructed Xh (t). At the same time, Yh (t) is also well defined. For general jump size distribution, the principle to incorporate them is the same. ¯ For each state i in Yh (t), the chosen jump size distribution should be first re-discretized on the nonuniformly spaced points {yj − yi , j > i} so that Z(yi ) may satisfy yi + Z(yi ) = yj , ∀j > i. Through the re-discretization we obtain the probabilities that the jump size is yj − yi as pi,j = P(Z(yi ) = yj − yi ) Therefore, the jump rate from state i to j is νi,j = νi · pi,j , which sum to νi . ∀i, j, j>i ∀i, j, j>i
3.4
Switched Diffusion Models
The models we have discussed thus far are one dimensional (1-D) diffusion processes (with jumps). In this section, we consider the models in which the state may switch between more than one 1-D processes. Two types of switching rules are considered. The first is that such a switching is governed by a background Markov Chain with transition rates possibly dependent on the diffusion process values. The second is that a switching is triggered by hitting a prespecified threshold. Both these types of switched models are suitably to be formulated by MMPP.
3.4.1
Switching Caused by a Background Markov Chain
This subsection considers the first case where a switching happens according to a background Markov Chain. To illustrate the idea, let us consider a continuous time Markov Chain I(t) toggling between two states (good and bad), as shown in Fig. 3.7(a). The states can be used to reflect, for example, the market status, i.e. the market is in either good or bad state. When it is in the good state (state 1), we 70
assume the default rate follows an OU process X1 (t). On the other hand, when it is in the bad state (state 2), the default rate is described by another OU X2 (t). Therefore, the overall default rate is actually an OU with its parameters dependent on the state of the background Markov Chain. When the diffusion process in each state is discretized, the overall process becomes a two-layer Markov Chain, as shown in Fig. 3.7 (b).
Figure 3.7: (a) Background Markov Chain (b) The overall 2-layer MMPP Let ri denote the transition rate for the background state I(t) to switch from state i to state 2 − i, i = 1, 2. Assume that the OU processes have only different long-run mean θ1 and θ2 , with other parameters kept the same, as dX1 (t) = k(θ1 − X1 (t))dt + σdW1 (t) dX2 (t) = k(θ2 − X2 (t))dt + σdW2 (t) where W1 (t) and W2 (t) are independent. Since state 1 is seen as a better state than state 2, we simply let θ1 < θ2 . Let X1,h (t) and X2,h (t) represent the MMPP that are discretized from X1 (t) and X2 (t). Assume both Xi,h (t) takes values at the same N discrete points xj , j = 1, · · · , N , and when Xi,h (t) = xj , its up and jump rates are ui,j and di,j . (A further treatment like the extended states in jump diffusion process needs to be used for these two MMPPs to have the same state value sets.) Combine these two MMPPs, we have the overall two-layer MMPP, which is specified by the state variables (I(t), J(t)), where I(t) = 1, 2 is the index of background state and J(t) = 1, · · · , N is the index of diffusion state. In this two-layer MMPP, when the background state is I(t) = i, the diffusion behaviour follows the Xi,h (t) moving between the state values x1 , · · · , xN . When J(t) = j, it indicates the diffusion state 71
value at time t is xj . From the two-layer MMPP point of view, for the process state (I(t), J(t)) = (i, j), the transition rates of moving in the diffusion (vertical) direction are ui,j and di,j , which are directly from the one-layer process Xi,h (t). The default rate at state (i, j) are also defined as from Xi,h (t) as 2-layer viewpoint default rate xi,j = 1-layer viewpoint xj
where the notation xi,j is used to distinguish from xj . On the other hand, in the background state (horizontal) direction, the switching rate for all states in layer i is ri , which is also directly from the background Markov Chain. Namely, we define 2-layer viewpoint switching rate ri,j = background state viewpoint ri
where ri,j is also used to distinguish from ri . Note that, when the state is switched from (i, j) to (2 − i, j) in the horizontal direction, the instantaneous default rate is not changed, i.e. xi,j = x2−i,j = xj . What is changed is the diffusion dynamics after the switching time. In fact, we may allow the dependence of ri,j on the diffusion index j, as in a Markov Chain, we may assign different transition rates for each state. This will not sacrifice its computational tractability. For example, we may define the switching rate for state (i, j) to have linear dependence on the default rate as ri,j = ai xi,j + bi in which ai can be used to reflect the relation between default rate and switching rate. When I(t) = 1 (in the good state), the default rate turning higher means this default entity becomes more risky, thus should be also more likely to switch to a bad state. On the other hand, it goes the opposite way when I(t) = 2. Therefore, it would be sensible to demand that a1 > 0 and a2 < 0 Next, we consider a generalization on the background state holding time. When the background state switching rates do not depend on diffusion state values, i.e. ri,j = ri , the background state holding time follows a exponential distribution with 1 mean . Since exponential distribution is known to be memoryless, the probability ri for a switching to happen in the next infinitesimal time interval is independent of how long it has stayed in this state. In order to generalize, we use the concept of hazard (failure) rate function h(t) [11][6] of a distribution as below h(t)dt = P(t ≤ T ≤ t + dt|T ≥ t) = f (t)dt 1 − F (t) =⇒ h(t) = f (t) 1 − F (t)
72
where f (t) and F (t) are PDF and CDF for a random variable T standing for the background state holding time. Obviously, the exponential distribution has a constant hazard rate. In this study we intend to consider the generalization on holding time such that 1. h(t) is an increasing function on t (known as increasing failure rate, or IFR) 2. h(t) is an decreasing function on t (known as decreasing failure rate, or DFR) In case of IFR, the longer it stays in the current state, the more likely for it to switch. The DFR indicates the opposite. Now the next question is how we can incorporate IFR or DFR background state holding time to our MMPP. To bring IFR or DFR into our setting, we have to use some special distributions which can be decomposed to several exponential distributed random variables. An example of IFR satisfying the above condition is Erlang distribution. The density of an Erlang random variable T (k, µ) is given by (µk)k k−1 −kµx f (t) = x e (k − 1)! (0 < x < ∞)
where k is a positive integer and µ is a positive constant. In addition, its mean and variance are given by E[T ] = 1 µ V[T ] = 1 kµ2
We see that, with fixed mean, as k increases, its variance becomes less and the Erlang distribution becomes more deterministic. Such T can be actually expressed by a sum of k independent and identical exponential distributed random variables. Therefore, we can use k successive states to describe the phases or stages that T will go through. Sketched in Fig. 3.8(a) is an example of an Erlang holding time with k = 3. Consequently, Erlang distributed holding time is able to be included in our MMPP setting, although a k-fold number of states are required to be used. Similarly, when it comes to DFR, we use another distribution which can be also expressed in phase form, the hyperexponential distribution. A hyperexponential distribution is a mixed exponential distribution, or more precisely, a weighted avarage of k (different) exponential distributed random variables. Let Ti , i = 1, · · · , k be the exponential distributed random variables, then the hyperexponential distributed T can be defined as T = Ti with probability pi , i = 1, · · · , k, where p1 + · · · + pk = 1
73
Figure 3.8: (a) Erlang distribution (k = 3) (b) Hyperexponential distribution (k = 3) Similar to the Erlang case, we have to use a k-fold number of states to bring a hyperexponential holding time into our setting. Shown in Fig. 3.8(b) is an example of hyperexponential distribution with k = 3.
3.4.2
Switching Triggered by Hitting Threshold
In this subsection, we discuss another type of switching which is caused by state value hitting a prespecified threshold. Consider two background states, a good (i = 1) and a bad (i = 2) states. The diffusion behaviour of default rate is described by Xi (t), i = 1, 2. If it is currently in the good state, and the default rate X1 (t) is moving high up to a certain level, say
1,
then the background state switches to the bad one. On
the the hand, when it is in the bad state, and the default rate X2 (t) is dropping down to another specific value 2 , it switches back to the good state. Without loss of generality, we are still based on two OUs with different long-run means as described earlier. dX1 (t) = k(θ1 − X1 (t))dt + σdW1 (t) dX2 (t) = k(θ2 − X2 (t))dt + σdW2 (t) Let i , i = 1, 2 represent the threshold in each background state. Like in the previous model, we let θ1 < θ2 since state 1 is considered as a better state than state 2. To formulate this process by a two layer MMPP, we first discretize both OU processes into MMPP X1,h (t) and X2,h (t). For simplicity, the threshold one of the discrete state values of in Xi,h (t). Once X1,h (t) hits
1 i
is assumed to take
from downward, the
background state will be switched to state 2 immediately. Therefore, in the MMPP Xh,1 (t), all the discrete points with higher values than 1 would have no chance to be visited and thus should be truncated. Similarly, in X2,h (t) all the states with values lower than
2
should be also truncated. The final MMPP is shown as in Fig. 3.9. 74
Note that we need to have
1
≥
2.
Otherwise, the default rates before and after
1
switching can not be at the same values. In particular, when
=
2,
the overall
process can be seen as a single OU process with state dependent long-run mean.
Figure 3.9: Switching caused by hitting threshold In fact, this model can be also seen as a special case of the models discussed in the previous subsection. From this point of view, the background state switching rates ri are dependent on the diffusion state values xi,j as r1 (x1,j ) = ∞ 0 x1,j = x1,j <
1 1
r2 (x2,j ) =
0 ∞
x2,j > x2,j =
2 2
Namely, there is an infinity switching rate at the threshold states. One can also allow non-zero state dependent switching rates at those states which are not threshold. Such a model can be seen as a combination of models in this and the previous section. The numerical examples will be carried out in the future.
3.4.3
Models with Memory Effects
In this section we further generalize the switched diffusion model to incorporate a phenomenon which is refered to as memory effect. Consider a diffusion process X(t) with a predefined threshold which is seen as a boundary between the normal area (X(t) < ) and risky area (X(t) > ). The memory effect we want to discuss includes the following behaviour 1. If it has once hit the threshold (been to the risky area), then it is considered as being in a risky state and more inclined to hit that threshold again (return the risky area). 75
2. After hitting the threshold, the longer it keeps in the normal area, the less likely it will hit threshold. Namely, it will gradually become normal if it stays in the normal area. It finally become completely normal (in the normal state) if it keeps there for a sufficient long time. The difference between this and the previous model (Section 3.4.2) is in how a process in the risky state returns to the normal state. In the previous model, the process in bad state (state 2) is switched back if it hits another threshold 2 . In the current model, the rule is based on the time the process stays in the normal area. One reason to support such assumption is that, the longer it stays in the normal area, the more likely the concerned defaultable entity is under a good control. When such controlled status is maintained for sufficiently long, the entity should be seen the same as other normal entities. To be more precise, we assume when the default rate process is in the normal (i = 1) and risky (i = 2) states, the it follows OU processes with different long-run means as below dX1 (t) = k(θ1 − X1 (t))dt + σdW (t) dX2 (t) = k(θ2 − X2 (t))dt + σdW (t) normal state risky state (after hitting )
where θ1 is a constant, and θ2 should be decreasing on time in some sense. Since state 2 is considered more risky than state 1, we have θ2 > θ1 . In order to model the time related behaviour, the whole recovering process (from state 2 to state 1) is assumed to have k stages. In each stage it still follows an OU process X2,j (t) dX2,j (t) = k(θ2,j − X2,j (t))dt + σdW (t) risky state (after hitting ) where j = 0, · · · , k. In particular, we let θ2,0 = θ2 and θ2,k = θ1 . Therefore, in the entire recovery process, as long as the default rate stays in normal area, the long-run mean would move like θ2 = θ2,0 → θ2,1 → · · · → θ2,k−1 → θ2,k = θ1 That is to say, we need to use another k −1 diffusion processes to describe the gradual decreasing property of θ2 . Discretizing the k + 1 diffusion processes gives us an MMPP of k + 1 layers, as sketched in Fig. 3.10. The diffusion dynamics in each layer is described by the single layer MMPP Xj,h (t), j = 0, · · · , k. The next question becomes how to specify 76
the transition across the layers. Here we assume the entire recovery time follows an Erlang distribution with parameters (k, µ). It is chosen for two reasons. The first reason is that from Section 3.4.1, an Erlang distribution has IFR property which is in accordance with our requirement here: the longer it stays, the sooner it finishes. The second reason is because an Erlang recovery time can be seen as several exponential phases, thus it can fit into our MMPP. Consequently, we simply assume µ to be the transition rate across layers in the k-stage recovery process, then the whole recovery time will follow Erlang (k, µ).
Figure 3.10: The k-stage recovery process Note that, as shown in Fig. 3.10, in any time of the recovery process X2,j , if the process value has hit again, then it will be immediately switched back to the process X2,0 , which is where the recovery starts from. Furthermore, we assume θ2,j decreases linearly on the stage index j as θ2,j = θ2 − θ2 − θ1 j k
Let θ2 (t) denote the time t value of long-run mean where t is measured from the beginning of the recovery process. Clearly, θ2 (t) is a piecewisely constant time function, with each constant piece following exponential distribution. It is expected that, E[θ2 (t)] is exponentially decreased from θ2 to θ1 . When the mean recovery time is fixed, the higher k means the more deterministic recovery time, as well as the more deterministic time function θ2 (t), although more layers are required to be used.
77
3.5
Models with Time-Varying Parameters
The last type of generalization is a diffusion process with time-varying parameters. With a further treatment, the default term structure for such models can be obtained through MMPP discretization as well. As a first step, we look at the simplest case where the parameters are piecewisely constant. Consider an OU process with time-varying parameter set of the following form (k(t), θ(t), σ(t)) = (kj , θj , σj ) j∆t ≤ t ≤ (j + 1)∆t
In the j-th time interval j∆t ≤ t ≤ (j + 1)∆t, the parameter set is constant, thus it is a standard OU during this period. This means we have to use one MMPP for each time interval. Assuming the total state number 2N + 1 and the step size h are the same for all intervals, and define pj = p(j∆t) = [p1 (j∆t), · · · , p2N +1 (j∆t)]T standing for the state probability column vector. In the boundary time instants, the state probability vector can be given by pj+1 = eAj ∆t · pj j = 1, 2, · · ·
where Aj is the state transition matrix for the MMPP in interval j (associated with parameter set (kj , θj , σj )). For each time interval, we have p(t) = eAj (t−j∆t) · pj j∆t ≤ t ≤ (j + 1)∆t
The survival probability at time t is given by
2N +1
P(τ ≥ t) = p(t) · 1 =
i=1
pi (t)
The only difference between this case and the basic OU case is that we need to calculate the state transition matrix Aj for each interval j. In addition to the case of piecewisely constant parameters, we come to consider the case where the long-run mean is a continuous time function θ(t). This is a special type of the Hull-White extended models[1]. The Hull-White extended models are extended from an OU or CIR allowing all parameters to be time dependent. The most commonly used models in this class are those with time-dependent long-run mean φ(t), i.e. dY (t) = k(φ(t) − Y (t))dt + dY (t) = k(φ(t) − Y (t))dt + σ 78 σdW (t) Y (t)dW (t) (OU) (CIR)
The function φ(t) is often used to fit the term structure of zero coupon bond (ZCB) price in interest rate modeling. The idea to deal with φ(t) is decomposition, which is to express a target process Y (t) as a sum of another diffusion process without time-varying parameters X(t), as well as a pure (deterministic) time function ϕ(t). If such a decomposition exists, we have Y (t) = X(t) + ϕ(t) which yields P(τY ≥ t) = E[e−
t 0
Y (s)ds
] = E[e−
t 0
X(s)ds
] · e−
t 0
ϕ(s)ds
= P(τX ≥ t) · e−
t 0
ϕ(s)ds
where τY and τX are the default times when the default rates follow Y (t) and X(t). As a result, the calculation of the survival probability P(τY ≥ t) can be done separately by calculating P(τX ≥ t), which can be obtained through MMPP, and the pure time function part. The next question is how to derive such a decomposition. Assuming X(t) is an OU and Y (t) is a Hull-White extended OU with time-varying long-run mean as below. dX(t) = k(θ − X(t))dt + σdW (t)
dY (t) = k(φ(t) − Y (t))dt + σdW (t) Now we look for a function ϕ(t) that satisfies the decomposition Y (t) = X(t) + ϕ(t). From Ito’s lemma, dY (t) = dX(t) + ϕ (t)dt = k(θ − X(t))dt + σdW (t) + ϕ (t)dt = = kθ − kX(t) + ϕ (t) dt + σdW (t) kφ(t) − k(X(t) + ϕ(t)) dt + σdW (t)
Equating the drift term, we see that for a given φ(t), the corresponding ϕ(t) should satisfy the following ODE ϕ (t) + kϕ(t) = kφ(t) − kθ As a simple example, if ϕ(t) and φ(t) are both second order polynomial functions as below φ(t) = at2 + bt + c , ϕ(t) = At2 + Bt + c
Given any (a, b, c) we use the above ODE to determine (A, B, C) as follows A = a, B= kb − 2A , k 79 C= kc − kθ − B k
Generalizing this idea, if φ(t) is an n-th order polynomial, using the above ODE we may obtain the corresponding ϕ(t) which is also in n-th order. For a general φ(t), we have to solve the ODE to obtain ϕ(t). For example, if φ(t) = ae−bt + c is given, we shall solve the above ODE to find ϕ(t) as ϕ(t) = e−kt = e−kt = ekt [kφ(t) − kθ]dt + d ekt [kae−bt + k(c − θ)]dt + d e−kt
ka −bt ka e + (c − θ) + a + θ − k−b k−b
where the constant d = a+θ−
ka is obtained from the initial condition φ(0) = a+c. k−b In case that the given φ(t) is not nice enough for us to find ϕ(t) by solving the
ODE, we may develop a numerical procedure to find a piecewisely polynomial function ˜ φ(t), as an approximation to φ(t). This may allow us to obtain ϕ(t), which is another ˜ piecewisely polynomial function to approximate ϕ(t). In the following we present the ˜ case where second order polynomials are used for φ(t) and ϕ(t). ˜ Divide the time axis into equally spaced interval with step size ∆t. In each time interval, we use an second order polynomial to approximate the curve. Consider the ˜ j-th interval, i.e. j∆t ≤ t ≤ (j + 1)∆t, in order to get the second order function φ(t), we pick three time points j∆t, (j + 1 )∆t, and (j + 1)∆t. For simplicity we use the 2 following notations tj,0 = j∆t tj,1 = (j + 1 )∆t 2
1 )∆t) 2
tj,2 = (j + 1)∆t φj,2 = φ((j + 1)∆t)
φj,0 = φ(j∆t) φj,1 = φ((j + and furthermore ˜ ˜ φj (t) = φ(j∆t + t) where the time index is shifted by j∆t.
and
ϕj (t) = ϕ(j∆t + t) ˜ ˜
˜ In the j-th interval j∆t ≤ t ≤ (j +1)∆t, φj (t) can be obtained through the second Lagrange interpolating polynomial[3] as follows
1 (t − 0)(t − 1 ∆t) (t − 0)(t − ∆t) 2 ˜j (t) = φj,0 (t − 2 ∆t)(t − ∆t) + φj,1 φ + φj,2 (0 − 1 ∆t)(0 − ∆t) ( 1 ∆t − 0)( 1 ∆t − ∆t) (∆t − 0)(∆t − 1 ∆t) 2 2 2 2
After some algebraic treatments, we may write ˜ φj (t) = aj t2 + bj t + cj aj = ∀j where
1 1 (2φj,0 − 4φj,1 + 2φj,2 ) bj = (−3φj,0 + 4φj,1 − φj,2 ) cj = φj,0 2 ∆t ∆t 80
Using earlier result the corresponding ϕj (t) is given by ˜ ϕj (t) = Aj t2 + Bj t + Cj ˜ Aj = a j Bj = k − bj − 2Aj k ∀j Cj = where kcj − kθ − Bj k
As a result, with t = n∆t, we have
t n (j+1)∆t n ∆t n
ϕ(s)ds = ˜
0 j=0 j∆t
ϕ(s)ds = ˜
j=0 t 0 0
ϕj (s)ds = ˜
j=0
Aj
∆t2 ∆t3 + Bj + Cj ∆t 3 3
as an approximation to
ϕ(s)ds. Therefore, the survival probability for default rate
t 0
process Y (t) is obtained by P(τY ≥ t) = P(τX ≥ t) · e−
ϕ(s)ds
.
When the target process Y (t) is a CIR with time varying long-run mean φ(t), the above decomposition procedure still work, because the time function ϕ(t) only contributes to the drift term in the target process. Repeat the steps to find ϕ(t) and the survival probability can be obtained accordingly.
3.6
Concluding Remarks
In this chapter, we use MMPP discretization to obtain the default term structure for the advanced models that are generalized or extended from OU. Such generalization or extension usually mades these models no longer analytically tractable. MMPP is a means to formulate such models so that the default term structure can be calculated numerically. The models considered in this chapter include four major categories. The first category is a direct extension from OU, where the drift and volatility are of general forms. Our MMPP discretization actually works for such generalization. For the general drift models, a direct use of MMPP is enough. For models with general volatility, a function to link the target and underlying processes is used, and discretization is done on the underlying process. One important example is the CIR process. The second category is to incorporate jumps into the diffusion processes such as OU and CIR. They are easily incorporated since a jump can be considered as a state change across several states in MMPP. Due to the nature of MMPP, jumps can be actually made dependent on the state values without sacrificing model tractability. The third category is the switched diffusion models, in which the model state may move between more than one diffusion processes. When it is turned into MMPP, each diffusion process becomes a layer of states. The overall process is therefore moving between multiple layers according to some rules. Two different switching rules are 81
considered: (1) according to a background Markov Chain (2) triggered by hitting a threshold. We also consider a further extension to a class of models with memory effect. The last category is the models with time dependent parameters. When the parameter vector is piecewisely constant, the model still fits to the MMPP technique, although the state transition matrix has to be updated for each time interval. We also consider a special type of the Hull-White extended models, an OU or CIR model with time dependent long-run mean. Such a process can decomposed into a process with constant parameters and a pure time function. The survival probability can be calculated separately from these two parts. For the cases where such decomposition is not easily performed (difficult to solve a ODE), a method based on Lagrange interpolating polynomial can be used to find the time-varying part. In fact, there is another category of the advanced models which have stochastic parameters (such as stochastic volatility models) and are suitable to be analyzed by MMPP. However, a two or higher dimensional MMPP is needed to formulate such models. This type of models will be discussed in the next chapter.
82
Chapter 4 Higher Dimensional MMPP Formulation of Advanced Models with Stochastic Parameters
4.1 Introduction
In this chapter, we consider the models with more than one correlated Brownian motions. The models introduced in Chapter 3 are all based on one dimensional (1-D) MMPP. Even in the cases of switched diffusion models where multiple layers of one dimensional MMPP are used, they are still regarded as based on 1-D MMPP. When it comes to more advanced 2- or n-factor models involing two or more Brownian motions, such as the volatility of any earlier model being assumed stochastic and following another diffusion process, then the 1-D MMPP discretization will no longer work. This gives rise to the two or higher dimensional MMPP formulation. Following the 1-D MMPP discretization done in Chapter 2, to formulate a 2-D MMPP, we also start with the simplest 2-D OU process with correlated Brownian motions. The idea to construct the transition rates between states is still through moment matching. By matching the first two moments of the process values in individual dimensions, as well as the covariance between processes, we determine the transition rates from each state to its adjacent states. As we have seen in Chapter 3, through a function between (target and underlying) processes we are able to deal with processes with more general drift and volatility. Applying such an idea to the 2-D case, more general models like 2-D CIR processes can be also studied. That is to say, the 2-D MMPP discretization can be used to deal with more general 2-D diffusion processes. Moreover, the same idea can be further extended to construct an n-D MMPP which is used to deal with a general n-factor model.
83
It still remains interesting to ask how the convergence behaviour is in this 2-D case. Like treated in Chapter 2, we plan to analyze the errors in joint moments, as well as in joint moment generating functions (MGF). The convergence in joint distribution can be examined through the convergence of joint characteristic functions, which is closely related to joint MGF. We also hope to prove that the second order convergence in the 1-D case can be preserved to the n-D cases. Besides, an analysis on the steady state behaviour will be also addressed. As to the applications in default term structure, we simply use the same technique as shown in Chapter 2 to calculate the survival probabilities through matrix exponential. Due to the 2-D (n-D) structure of state transitions, it is indeed more cumbersome and messy in formulating the state transition matrix. However, the principle is the same and the survival probability for each time horizon should be obtained without difficulty. The n-factor models with stochastic parameters considered in this chapter are divided into two categories. The first class is default rate models with stochastic parameters, while the second class is (risk neutral) stock price models with stochastic parameters. The major difference of these two lies in the long term stability of the main processes (default rate and stock price). Default rate processes are often with mean-reverting property and thus we may set up a set of upper and lower bounds that the process may move outside these bounds with a sufficiently small probability. Generally the drift and volatility can be also assumed mean-reverting thus the whole model is suitably described by a Markov Chain with finite states. Consequently, an n-D MMPP should be a reasonable method to approximate its major dynamics. On the other hand, however, the positive drift term in a stock price will drive it toward infinity as time goes on, and therefore it is not suitable to use such a fixed set of bounds to define its possible range for the entire time horizon. In fact, to price a payoff at a fixed maturity time, we only need to consider the stock price dynamics up to this maturity time. During the life of this payoff, the stock price is still stable and thus a finite state Markov Chain should be still a reasonable approximation. The difference between this and the previous case is that we need to set a set of suitable bounds for each choice of maturity time. Similar to the default rate case, we will also consider the stock price models with stochastic drift and volatility. Like the previous chapter, this chapter is also a research plan. Only main ideas and problem formulation are presented, and the real part will be carried out in the future. It is organized as follows. In Section 2, we discuss how to formulate 2-D MMPP using moment matching and make extensions to n-D cases. In Section 3, we 84
give a preliminary study of the convergence behaviour of n-D MMPP to its original OU. In Section 4, we use the 2-D and 3-D MMPP to study the default term sturcture in default rate models with stochastic drift and volatility. In Section 5, the main process is changed to a stock price, and we will study the derivative pricing problem where the drift and volatility of stock price are stochastic. Finally, concluding remarks are given in Section 6.
4.2
Formulation of Two Dimensional MMPP
dX1 (t) = k1 (θ1 − X1 (t))dt + σ1 dW1 (t) dX2 (t) = k2 (θ2 − X2 (t))dt + σ2 dW2 (t)
Consider the following 2-D OU process (X1 (t), X2 (t))
where dW1 (t)·dW2 (t) = ρ·dt. Denote this 2-D OU parameter set by ((k1 , θ1 , σ1 ), (k2 , θ2 , σ2 ), ρ). Let (X1,h1 (t), X2,h2 (t)) be the discretized 2-D MMPP and h1 and h2 be the step size in each dimension. (To keep notations simple, we use X(t) and X t interchangably in this chapter, for a stochastic process X.) Also let 2N1 + 1 and 2N2 + 1 be the number of states in each dimension and x1,i1 and x2,i2 be the state values where 0 ≤ i1 ≤ 2N1 , 0 ≤ i2 ≤ 2N2 . The total state number comes to (2N1 + 1) × (2N2 + 1). As in Chapter 2, we simply assign x1,N1 = θ1 and x2,N2 = θ2 . Sketched in Fig. 4.1(a) is the grid structure of a 2-D MMPP.
Figure 4.1: A 2-D MMPP (a) The grid structure (b) The transitions to adjacent states Like in Chapter 2, we wish to use moment matching to determine the transition rates from any state (x1,i1 , x2,i2 ) to its eight adjacent states, as shown in Fig. 4.1(b). 85
One basic principle is that if we just look at either one direction and ignore the other one, the first two moments in that dimension should be matched, i.e.
t+∆t t t t t E[Xit+∆t |(X1 , X2 ) = (x1 , x2 )] = E[Xi,hi |(X1,h1 , X2,h2 ) = (x1 , x2 )] t+∆t t t t t V[Xit+∆t |(X1 , X2 ) = (x1 , x2 )] = V[Xi,hi |(X1,h1 , X2,h2 ) = (x1 , x2 )]
(4.1)
where i = 1, 2 and (x1 , x2 ) is used to lighten notations. In addition to each individual dimension, in order to incorporate the correlation between these two processes, we need also to match their covariance as below
t+∆t t+∆t t+∆t t t t t t+∆t Cov[X1 , X2 |(X1 , X2 ) = (x1 , x2 )] = Cov[X1,h1 , X2,h2 |(X1,h1 , X2,h2 ) = (x1 , x2 )]
(4.2) Both the variance of a single process and covariance between two processes are second order statistics, thus the matching is still regarded as done on the first two moments. This is equivalent to say the first two moments of their sum process are matched, i.e.
t+∆t t+∆t t+∆t t+∆t t t t t E[X1 + X2 |(X1 , X2 ) = (x1 , x2 )] = E[X1,h1 + X2,h2 |(X1,h1 , X2,h2 ) = (x1 , x2 )] t+∆t t+∆t t+∆t t+∆t t t t t V[X1 + X2 |(X1 , X2 ) = (x1 , x2 )] = V[X1,h1 + X2,h2 |(X1,h1 , X2,h2 ) = (x1 , x2 )]
In particular, when the two processes are independent, then matching the moments for each individual process is enough to determine the transition rates. In the following subsections, we derive the moment matching formula that satisfies the above requirements. We start with the independent case, based on which we allow joint movement to incorporate their correlation. We also discuss a further generalization to higher dimensional cases with more correlated Brownian motions.
4.2.1
Two Independent MMPPs
When the two OU processes are independent (ρ = 0), the two discretized MMPP must be also independent. In this case, a movement in one dimension does not have any impact on the other dimension. Looking back to Fig. 4.1(b), since a diagonal transition (from O to E, F, G, H) actually indicates cross dependence between processes, in this independent case we simply allow the transitions to happen only in the horizontal (from O to A, B) and vertical (from O to C, D) directions. Namely, this 2-D MMPP can be simply seen as a direct combination of the two 1-D MMPPs which are separately discretized. The procedure to construct this 2-D MMPP is described as follows. In each demension i, i = 1, 2, we first use the moment matching formulae in Chapter 2 to
86
obtain the single dimension up and down jump rates as 2 ui (xi,j ) = σi + hi ki (θi − xi,ji ) i 2h2 i where i = 1, 2 and ji = 0, · · · , 2Ni 2 σi − hi ki (θi − xi,ji ) di (xi,ji ) = 2h2 i Combining these two 1-D MMPPs, we have the desired 2-D MMPP with horizontal and vertical transition rates as 2 2 σ1 + h1 k1 (θ1 − x1,j1 ) u1 (x1,j , x2,j ) = u2 (x1,j , x2,j ) = σ2 + h2 k2 (θ2 − x2,j2 ) 2 1 1 2 2h2 2h2 1 2 2 2 σ1 − h1 k1 (θ1 − x1,j1 ) σ2 − h2 k2 (θ2 − x2,j2 ) d1 (x1,j1 , x2,j2 ) = d2 (x1,j1 , x2,j2 ) = 2h2 2h2 1 2 where u1 and d1 are horizontal transition rates (O to A, B) and u2 and d2 are vertical transition rates (O to C, D). Note that in each dimension 1 and 2, we have u1 + d1 =
2 σ1 h2 1 2 σ2 h2 2
u2 + d2 =
h2 1 i = . Although we 2 ui + di σi may allow any choice of (h1 , h2 ), different selections of (h1 , h2 ) will result in different degrees of fineness as well as different sojourn times in both dimensions. To make Thus the mean state sojourn time in dimension i is the step size in one dimension is in a suitable relative level to the other dimension, we need to have a restriction on the selection of (h1 , h2 ). Here we make a heuristic suggestion to demand that in both directions, the mean sojourn time should be equal, i.e. h2 h2 h1 h2 1 = 2 =⇒ = 2 2 σ1 σ2 σ1 σ2 which says the step size should be chosen in proportion to their volatility. This is in fact a reasonable requirement since this will keep roughly equal tendancy of moving along both directions. This is because the more the volatility is, the easier it is to have a state change if the step size is fixed. When a higher volatility is given, a greater value of step size could be used to offset against the effect of the volatility such that the tendancy for a jump to happen in both directions are roughly the same. hi We further let H = , i = 1, 2, representing the normalized step size. In this σi ρ = 0 case, the sojourn time in each dimension is H 2 . When the 2-D MMPP is considered as a whole, the total moving out rate in a state is u1 + d1 + u2 + d2 = 87
2 σ1 σ2 2 + 2 = 2 2 2 h1 h2 H
Therefore, the overall mean sojourn time becomes 1 H2 = u1 + d1 + u2 + d2 2 which is a half of the single dimension sojourn time. Clearly it agrees with our intuition: since the process has equal tendancy to jump in both dimensions, doubling the tendancy in the 1-D case, therefore the mean sojourn time becomes a half.
4.2.2
Two Correlated MMPPs
In this subsection we discuss the correlated case ρ = 0. Clearly, we have to allow the transitions in the diagonal directions (from O to E, F, G, H) to happen, in addition to the transitions in horizontal or vertical directions. Let r1 , r2 , r3 , r4 be the state transition rates toward E, F, G, H, and we refer to them as the cross transition rates. It is obvious that the existence of r1 and r3 indicates positive correlation while r2 and r4 indicate negative correlation. Also, let us use ui and di , i = 1, 2 to denote the up and down jump rates in dimension i for the ρ = 0 case, in contrast with ui , di representing the transition rates defined in the previous subsection for the ρ = 0 case. As stated earlier, to match the first two moments in the joint sense is equivalent to match the first two moments in the individual dimensions, as well as the covariance. In (4.1)-(4.2) there are totally 5 equations to be used to determine the transition rates in 8 directions. If we just look at each dimension separately, the up and down jump rates from dimension i point of view are Dimension 1 viewpoint up jump rate = u1 + r1 + r 4 Dimension 2 viewpoint up jump rate = u2 + r1 + r2
down jump rate = d1 + r2 + r3
down jump rate = d2 + r3 + r4
Since ui and di , i = 1, 2 are actually the up and down jump rates in dimension i which make the first two moments matched, therefore, matching individual dimension’s moments is equivalent to say the following should hold u1 + r1 + r 4 = u 1 d1 + r2 + r3 = d1 u 2 + r1 + r 2 = u 2 d2 + r3 + r4 = d2 (4.3)
These four equations are seen as a simple version of the original four equations in (4.1). Consequently, we have only one condition left, which is the matching of covariance condition (4.2). We first derive the covariance for the 2-D OU process. Recall the results on bivariate normal distribution. Assume (X1 , X2 ) is a bivariate normal random vector 88
with parameter set ((m1 , v1 ), (m2 , v2 ), ρ), where mi , vi denote mean and standard deviation (square root of variance) of Xi , i = 1, 2, and ρ stands for the correlation coefficient, which is by definition ρ= Cov[X1 , X2 ] V[X1 ] V[X2 ] =⇒ Cov[X1 , X2 ] = ρ · v1 v2
t t Now we look back to our 2-D OU process (X1 , X2 ). Suppose we are given the time
t state vector (x1 , x2 ), then as time moves onward by an infinitesimal time ∆t, the
t+∆t t+∆t vector (X1 , X2 ) follows a bivariate normal distribution with mean and variance t t in each dimension as (the conditioning (X1 , X2 ) = (x1 , x2 ) is dropped in the following for simplicity) t+∆t E[X1 ] = m1 = x1 + µ1 ∆t t+∆t E[X2 ] = m2 = x2 + µ2 ∆t t+∆t 2 2 V[X1 ] = v1 = σ1 ∆t t+∆t 2 2 V[X2 ] = v2 = σ2 ∆t
where µ1 = k1 (θ1 − x1 ) and µ2 = k2 (θ2 − x2 ) are short for the drift terms, and their the covariance is given by
t+∆t t+∆t Cov[X1 , X2 ] = ρ · v1 v2 = ρ · σ1 σ2 ∆t t+∆t t+∆t On the other hand, the covariance of X1,h1 and X2,h2 conditioned on the time t state is obtained by t+∆t t+∆t t+∆t t+∆t Cov[X1,h1 , X2,h2 ] = E[X1,h1 · X2,h2 ] − m1 · m2
where (by ignoring ∆t2 term) m1 · m2 = (x1 + µ1 ∆t)(x2 + µ2 ∆t) ∼ x1 x2 + µ1 x2 + µ2 x1 ∆t =
89
and
t+∆t t+∆t E[X1,h1 · X2,h2 ]
= u1 ∆t · (x1 + h1 )x2 + d1 ∆t · (x1 − h1 )x2 + u2 ∆t · x1 (x2 + h2 ) + d2 ∆t · x1 (x2 − h2 ) +r1 ∆t · (x1 + h1 )(x2 + h2 ) + r2 ∆t · (x1 − h1 )(x2 + h2 ) +r3 ∆t · (x1 − h1 )(x2 − h2 ) + r4 ∆t · (x1 + h1 )(x2 − h2 ) + 1 − (u1 + d1 + u2 + d2 + r1 + r2 + r3 + r4 )∆t · x1 x2 = x1 x2 + (u1 − d1 + r1 − r2 − r3 + r4 )h1 x2 ∆t + (u2 − d2 + r1 + r2 − r3 − r4 )h2 x1 ∆t + (r1 − r2 + r3 − r4 )h1 h2 ∆t µ1 u1 − d1 + r1 − r2 − r3 + r4 = u1 − d1 = h1 Note that µ2 u −d +r +r −r −r = u −d = 2 1 2 3 4 2 2 2 h2 = x1 x2 + µ1 x2 + µ2 x1 ∆t + (r1 − r2 + r3 − r4 )h1 h2 ∆t Therefore, we have
t+∆t t+∆t Cov[X1,h1 , X2,h2 ] = (r1 − r2 + r3 − r4 )h1 h2 ∆t
Obviously we see the covariance is determined only by the cross transition rates r1 , r2 , r3 , r4 . Matching the covariance we have r1 − r2 + r3 − r4 = From this formula we see that • ρ>0 • ρ<0 r 1 + r3 > r 2 + r 4 r 1 + r3 < r 2 + r 4 ρσ1 σ2 h1 h2
This agrees with our intuition since r1 , r3 indicate both processes moving in the same direction while r2 , r4 indicate them moving in the opposite way. As mentioned earlier, we have 5 conditions for 8 variables, so we need to add extra conditions to get a suitable solution. Here we make a heuristic suggestion: • ρ>0 • ρ<0 r1 = r3 r 2 = r4 r2 = r4 = 0 r1 = r3 = 0
This is to say, we only use relevant transition rates for either case of ρ. Moreover, in each case, the remaining two transition rates have actually the same contributions to 90
the covariance, thus they are chosen to be equal. Based on such heuristic condition, all the nonzero rates are r1 = r3 (ρ > 0) = r= ρσ1 σ2 2h1 h2 = r2 = r4 (ρ < 0)
Therefore, the up and jump rates in each dimension u1 , d1 , u2 , d2 are obtained from (4.3). Shown in Fig. 4.2 are the allowed transitions in each case of ρ.
Figure 4.2: Allowed transitions for each case (a) ρ = 0 (b) ρ > 0 (c) ρ < 0 To sum up, the moment matching formulae for 2-D MMPP can be summarized as r1 = r3 = r r2 = r4 = r (ρ > 0) (ρ > 0) u1 = u1 − r d1 = d1 − r u2 = u2 − r d2 = d2 − r (4.4)
We note that, in this 2-D MMPP, the total moving out rate at a state is u1 + d1 + u2 + d2 + r1 + r2 + r3 + r4
= u1 + d1 + u2 + d2 − (r1 + r2 + r3 + r4 ) =
2 σ1 h2 1
+
2 σ2 h2 2
−
ρσ1 σ2 h1 h2
=
2−ρ H2
Therefore, the mean state sojourn time becomes 1 H2 = u1 + d1 + u2 + d2 + r1 + r2 + r3 + r4 2−ρ which will reach its maximum at |ρ| = 1 and minimum at ρ = 0. Like presented in Chapter 3, the 2-D MMPP discretization can be applied to more general models such as a 2-D CIR process. The idea is the same as in Chapter 3, using a function between the target and the underlying process. By applying Ito to 91
this function, for a given target process we may find the corresponding underlying process with constant volatility. Such underlying processes with Gaussian process values are suitable for our MMPP technique. For example, let (Y1 (t), Y2 (t)) stands for the target 2-D CIR process, i.e. dY1 (t) = k1 (θ1 − Y1 (t))dt + σ1 dY2 (t) = k2 (θ2 − Y2 (t))dt + σ2 Y1 (t)dW1 (t) Y2 (t)dW2 (t)
where dW1 (t) · dW2 (t) = ρ · dt. Using Yi (t) = Xi2 (t), i = 1, 2 we obtain the underlying process (X1 (t), X2 (t)) of such form as dX1 (t) = µ1 (X1 (t))dt + σ1 dW1 (t) dX2 (t) = µ2 (X2 (t))dt + σ2 dW2 (t) where µ1 (x), µ2 (x) can be found through Ito. When (X1 (t), X2 (t)) is discretized, the target process (Y1 (t), Y2 (t)) is also discretized indirectly.
4.2.3
Extension to n Correlated MMPPs
The principle to match the first two moments can actually be applied to higher dimensional cases. In 2-D case we have seen that, matching the first two moments in a joint sense includes two conditions 1. Matching the first two moments in each dimension. 2. Matching the covariance between two dimensions. It is also seen that, the cross transition rates (in diagonal directions) which are directly related to the covariance (or ρ) should be obtained first using condition 2, and then the transition rates in individual dimensions are determined through condition 1. Now we extend this idea to a higher dimensional case. Consider an n-D OU vector process (X1 (t), X2 (t), · · · , Xn (t)) dX1 (t) = k1 (θ1 − X1 (t))dt dX2 (t) = k2 (θ2 − X2 (t))dt ··· = ······ dXn (t) = kn (θn − Xn (t))dt of the first two moments means we need to match • E[Xi (t)], V[Xi (t)] • Cov[Xi (t), Xj (t)] 92 i = 1, · · · , n i, j = 1, · · · , n, i = j
+ σ1 dW1 (t) + σ2 dW2 (t) + ······ + σn dWn (t)
where dWi (t) · dWj (t) = ρij · dt, i, j = 1, · · · , n, i = j. In this n-D case, the matching
which sum to 2n +
n 2
equations in total. Again we let ui , di , i = 1, · · · , n represent
the up and down jump rates in the 1-D sense, and ui , di denote the up and down jump
++ −+ −− +− rates along each single dimension in this n-D MMPP. Further we let rij , rij , rij , rij +− denote the cross transition rates involving dimensions i and j. For instance, rij
stands for the rate to move one step up in dimension i and one step down in dimension j, and so on. From the previous subsection, these cross transition rates are determined by (ρ > 0)
++ −− rij = rij = +− −+ rij = rij =
(ρ < 0) rij = ρij σi σj 2hi hj 0
−+ +− = rij = rij −− ++ = rij = rij
and the single dimension up and down jump rates are given by ρij σi σj ++ +− ui = ui − (rij + rij ) = ui − 2hi hj ∀j=i ∀j=i d = d − i i
−+ (rij ∀j=i
+
−− rij )
ρij σi σj = di − 2hi hj ∀j=i
i = 1, · · · , n
One can still use a set of suitable functions to indirectly discretize a more general n-D process such as an n-D CIR process.
4.3
Convergence Analysis
In this section we plan to discuss the convergence behaviour of a 2-D MMPP to its original 2-D OU process. We will follow the idea presented in Chapter 2 to conduct the convergence analysis. Like in Chapter 2, the objective is to study 1. Whether or not (X1,h1 (t), X2,h2 (t)) converges to (X1 (t), X2 (t)) in distribution. 2. Whether or not the expectations converge in second order. This section has not yet been finished actually. Here we just present the preliminary results on the convergence of the joint MGF. It can be seen in the following that, when time t state is given, the joint MGF at time t + ∆t (∆t is infinitesimal) converges in second order as (h1 , h2 ) → 0. We first recall the result about the joint MGF in bivariate normal distribution. Consider the bivariate normal random vector (X1 , X2 ) with parameters ((m1 , v1 ), (m2 , v2 ), ρ), its joint MGF is given by E[eα1 X1 +α2 X2 ] = exp α1 m1 + α2 m2 + 93
2 2 2 2 α1 v1 + 2α1 α2 ρv1 v2 + α2 v2 2
Given the time t state (x1 , x2 ), the joint MGF for 2-D OU at time t + ∆t is given by E[eα1 X1
t+∆t t+∆t +α2 X2
t t |(X1 , X2 ) = (x1 , x2 )]
= exp(α1 x1 + α2 x2 ) · exp
α1 µ1 + α2 µ2 +
2 2 2 2 α1 σ1 + 2α1 α2 ρσ1 σ2 + α2 σ2 2
∆t ∆t
2 2 2 2 ∼ exp(α1 x1 + α2 x2 ) · 1 + α1 µ1 + α2 µ2 + α1 σ1 + 2α1 α2 ρσ1 σ2 + α2 σ2 = 2
On the other hand, for the moment matched 2-D MMPP, we have E[eα1 X1,h1
t+∆t t+∆t +α2 X2,h 2
t t |(X1,h1 , X2,h2 ) = (x1 , x2 )]
= u1 ∆t · eα1 (x1 +h1 )+α2 (x2 ) + d1 ∆t · eα1 (x1 −h1 )+α2 (x2 ) +u2 ∆t · eα1 (x1 )+α2 (x2 +h2 ) + d2 ∆t · eα1 (x1 )+α2 (x2 −h2 ) +r1 ∆t · eα1 (x1 +h1 )+α2 (x2 +h2 ) + r2 ∆t · eα1 (x1 −h1 )+α2 (x2 +h2 ) +r3 ∆t · eα1 (x1 −h1 )+α2 (x2 −h2 ) + r4 ∆t · eα1 (x1 +h1 )+α2 (x2 −h2 ) + (1 − R∆t) · eα1 x1 +α2 x2 ∼ eα1 x1 +α2 x2 · 1 + = u1 · eα1 (h1 ) + d1 · eα1 (−h1 ) + u2 · eα2 (h2 ) + d2 · eα2 (−h2 ) +r1 · eα1 (h1 )+α2 (h2 ) + r2 · eα1 (−h1 )+α2 (h2 ) +r3 · eα1 (−h1 )+α2 (−h2 ) + r4 · eα1 (h1 )+α2 (−h2 ) ∆t
where R = u1 − d1 − u2 − d2 − r1 − r2 − r3 − r4 . We expand all the exponential terms in the bracket and use the following conditions for the case ρ > 0 (similar derivation for case ρ < 0). ρσ1 σ2 r2 = r4 = 0 r 1 + r3 = h1 h2 and 2 2 (u + r1 ) + (d + r3 ) = u1 + d1 = σ1 (u + r1 ) + (d + r3 ) = u2 + d2 = σ2 1 2 1 2 h2 h2 1 2 (u + r ) − (d + r ) = u − d = µ1 (u + r ) − (d + r ) = u − d = µ2 1 2 1 3 1 1 1 3 2 2 1 2 h1 h2 then we continue with the above derivation to arrive at E[eα1 X1,h1 +
t+∆t t+∆t +α2 X2,h 2
t t |(X1,h1 , X2,h2 ) = (x1 , x2 )] = exp(α1 x1 + α2 x2 ) ·
1
α1 µ1 + α2 µ2 +
2 2 2 2 α1 σ1 + 2α1 α2 ρσ1 σ2 + α2 σ2 + O(h2 + h1 h2 + h2 ) ∆t 1 2 2
Obviously, we see the conditioned joint MGF of the 2-D MMPP will converge to that of the original 2-D OU as (h1 , h2 ) → 0. Moreover, the above formula points out the least power terms in the error is of the second order on h1 and h2 , which indicates the second order convergence in this 2-D case. 94
Note that such joint MGF is taken at time t + ∆t conditioned on the time t state. Thus, this formula actually says the error brought in across a time interval ∆t is in the second order on (h1 , h2 ). Intuitively, if the conditioning in the joint MGF is taken t+∆t t+∆t t−∆t t−∆t one more time step ∆t backward, i.e. E[eα1 X1,h1 +α2 X2,h2 |(X1,h1 , X2,h2 ) = (x1 , x2 )]), the error brought in by the interval (t − ∆t, t) should be also of the second order on (h1 , h2 ). Therefore, it can be anticipated that, so on and so forth until the conditioning is on time 0 state, the overall error should be still in the second order on (h1 , h2 ). This suggests an intuitive way to see the second order convergence of the joint MGF. However, the formal proof should be carried out in the future work. Due to the similarity between MGF and characteristic funciton, the joint characteristic functions also converge in the same way, i.e. E[ei(α1 X1,h1 +α2 X2,h2 ) ] −→ E[ei(α1 X1 +α2 X2 ) ] which implies the convergence in joint distribution. Furthermore, the above derivations can be also applied to the joint MGF of n-D MMPP and OU. We expect the error term in joint MGF should be like O(
∀i
t t t t
h2 + i
∀i,j
hi hj )
which is again in the second order on (h1 , · · · , hn ). In addition, it is also possible to analyze the convergence of the steady state distribution. Hopefully, we may completely characterize the steady state distribution as we have done in the 1-D case.
4.4
Default Rate Models with Stochastic Parameters
Based on the proposed 2-D (or n-D) MMPP discretization, in this section we discuss the default term structure of 2-factor (or n-factor) default rate models with stochastic parameters. The main process is certainly a default rate process. Clearly, if a 2-factor model is used, we may allow one more parameter in the default rate process to be made stochastic. Moreover, when a 3-factor model is used, two stochastic parameters are allowed. In this section, we introduce the 2- and 3- factor models in the subsections, allowing drift, or volatility, or both of them to be assumed stochastic. Like the main default rate process, these stochastic parameters are also assumed to follow general 95
mean reverting processes. The objective is to study its survival probability term structure, and the impacts of the stochastic parameters on the term structure. We note that, all the processes (including the default rate process and the stochastic parameter processes) have mean-reverting property and thus are stable in long term. This means the discretization can be made within a specific upper and lower bounds in each dimension without causing much error.
4.4.1
Default Rate Models with Stochastic Drift
In this subsection, we use the following 2-factor model to describe a default rate process (X1 (t)) with stochastic drift (X2 (t)).
β dX1 (t) = k1 (X2 (t) − X1 (t))dt + σ1 X1 1 (t)dW1 (t)
dX2 (t) =
k2 (θ2 − X2 (t))dt
β + σ2 X2 2 (t)dW2 (t)
where X2 (t) takes the place of θ1 , the long-run mean of the process X1 (t), and dW1 (t)· dW2 (t) = ρ · dt. When βi = 0, then the technique of using a process function should be again employed to transform Xi (t) to another constant volatility process. Note that for any discrete state (x1 , x2 ), its default rate is λ(x1 , x2 ) = x1 . Namely,
in the default term structure analysis, in addition to the state transition rates u1 , d1 , u2 , d2 , r1 , r2 , r3 , r there is another transition rate λ jumping to the default absorbing state.
4.4.2
follows.
Default Rate Models with Stochastic Volatility
In this subsection, we keep the drift intact and make the volatility stochastic as dX1 (t) = k1 (θ1 − X1 (t))dt + dX2 (t) = k2 (θ2 − X2 (t))dt +
β X2 (t)X1 1 (t)dW1 (t) β σ2 X2 2 (t)dW2 (t)
2 where X2 (t) take the place of σ1 in the process X1 (t), and dW1 (t) · dW2 (t) = ρ · dt.
Similarly, for any discrete state (x1 , x2 ), the transition rate to the default absorbing state is also λ(x1 , x2 ) = x1 .
4.4.3
Default Rate Models with Stochastic Drift and Volatility
two models. β + X3 (t)X1 1 (t)dW1 (t) + +
β σ2 X2 2 (t)dW2 (t) β σ3 X3 3 (t)dW3 (t)
In this subsection, we use a 3-D model to allow both drift and volatility to be stochastic as follows. It’s a combination of the above dX1 (t) = k1 (X2 (t) − X1 (t))dt dX2 (t) = k2 (θ2 − X2 (t))dt dX3 (t) = k3 (θ3 − X3 (t))dt 96
2 where X2 (t), X3 (t) take the place of θ1 and σ1 in the process X1 (t), and dWi (t) ·
dWj (t) = ρij · dt, i, j = 1, 2, 3, i = j. Similarly, for any discrete state (x1 , x2 , x3 ), the transition rate to the default absorbing state is λ(x1 , x2 , x3 ) = x1 .
4.5
Stock Price Models with Stochastic Parameters
In this section, we discuss another application of the 2-D (or n-D) MMPP discretization to the case where the main process is a stock price process. The objective is to price a general payoff that depends on the stock price at maturity. The major difference between this and previous cases is that a stock price process is not long-term stable (i.e. steady state does not exist) like a mean-reverting default rate process. However, given any maturity time T , a stock price has bounded parameters (mean and variance) and should be still well covered by a suitable set of upper and lower bounds. Namely, we need to use different bounds in the dimension of stock price for differernt choice of maturity time T . When it comes to a derivative pricing problem, we need to consider the stock price and stochastic parameter processes under the risk neutral measure, instead of the real world measure. Risk neutrally, the stock price should have the return rate equal to the spot interest rate r(t). According to the risk neutral pricing formula, for a time T payoff Φ(S(T )), its time 0 price is given by E[e−
T 0
r(t)dt
· Φ(S(T ))]
where E is an expectation taken under risk neutral measure and S(t) is a stock price process. Such expectation has a more complicated form than the survival probability formula, since the payoff term Φ(S(T )) is involved. One possible approach to evaluate this type of expectations is to use the change of numeraire technique[1][2]. However, this is not what we want to use here. Instead, we will use the same technique as that we used in calculating survival probabilities to calculate this expectation. Because of the similarity between the role of interest rate r(t) in the above formula and the role of default rate λ(t) in the survival probability formula, to evaluate this expectation, we simply think of this stock price model as a default rate model with
97
a default rate equal to r(t). Therefore, the following expectation can be obtained through the calculation of the survival probability as E[e−
T 0
r(t)dt
] = P(τ > T ) =
∀i
P(τ > T, X(T ) = xi )
where P(τ > T, X(T ) = xi ) denotes the probability of survival given the state vector information at maturity X(T ) = xi . Note that in the MMPP setting, we are able to calculate these state probabilities P(τ > T, X(T ) = xi ) and P(X(T ) = xi ) ∀i
We also note that the following conditioned expectation can be also obtained by E[e−
T 0
r(t)dt
|X(T ) = xi ] = P(τ > T |X(T ) = xi ) =
P(τ > T, X(T ) = xi ) P(X(T ) = xi )
∀i
Consequently, the desired expectation is thus obtained using the procedure as follows (note that S(t) is an element in the vector X(T ) thus we express the payoff in a more general form as Φ(X(T ))) E[e− =
∀i
T 0
r(t)dt
· Φ(X(T ))]
T 0
P(X(T ) = xi ) · E[e−
r(t)dt
· Φ(X(T ))|X(T ) = xi ]
T 0
=
∀i
P(X(T ) = xi ) · Φ(xi ) · E[e−
r(t)dt
|X(T ) = xi ]
=
∀i
P(X(T ) = xi ) · Φ(xi ) · P(τ > T |X(T ) = xi ) P(τ > T, X(T ) = xi ) · Φ(xi )
∀i
=
In particular, when the interest rate is deterministic r(t) = r, the pricing formula is reduced to e−rT · E[Φ(X(T ))] = e−rT ·
∀i
P(X(T ) = xi ) · Φ(xi )
which is much easier to calculate.
4.5.1
Stock Price Models with Stochastic Drift
In this subsection, we consider a stock price model with stochastic drift. Under risk neutral measure, the drift (return rate) in stock price is equal to the spot interest 98
rate, thus we are actually considering a stock price model with stochastic interest rate. In the following we define a 2-factor model where interest rate is a general mean-reverting process. dS(t) = r(t)dt + vdW1 (t) S(t) dr(t) = k(θ − r(t))dt + σrβ (t)dW (t) 2 where dW1 (t) · dW2 (t) = ρ · dt. When this model is discretized by a 2-D MMPP, then the price of a general payoff can be obtained through the procedure presented above.
4.5.2
Stock Price Models with Stochastic Volatility
In this subsection, we allow the volatility of stock price to be stochastic. This is what the term stochastic volatility models are mostly refered to. The volatility is also assumed to follow another mean-reverting process, which makes the 2-factor model as dS(t) = rdt + V (t)dW1 (t) S(t) dV (t) = k(θ − V (t))dt + σV β (t)dW (t) 2
where dW1 (t) · dW2 (t) = ρ · dt. In the case of β = 1/2, it becomes the famous Heston models[9][8] which have analytic results. Our aim here is again to give an alternative approach to obtain derivative price from our 2-D MMPP discretization.
4.5.3
Stock Price Models with Stochastic Drift and Volatility
In this subsection, we combine the above two models to have the following 3-factor model, where both the drift (interest rate) and volatility are assumed stochastic. dS(t) = r(t)dt + V (t)dW1 (t) S(t) dr(t) = k2 (θ2 − r(t))dt + σ2 rβ2 (t)dW2 (t) dV (t) = k3 (θ3 − V (t))dt + σ3 V β3 (t)dW3 (t) where dWi (t) · dWj (t) = ρij · dt, i, j = 1, 2, 3, i = j. In principle, the derivative price can be obtained through the same procedure, although the computations could be quite slow and cumbersome. If necessary, a technique to speed up the computations can be a direction for future study.
99
4.6
Concluding Remarks
In this chapter, we extend the MMPP discretization technique to two dimensional cases. The idea to construct the two dimensional MMPP is still through matching the first two moments. One major distinction between 2-D and 1-D cases is that we need to deal with the correlation between the driving Brownian motions. We use the cross transition rates along diagonal directions to achieve the desired correlation. Such a method can be further extended to higher dimensional cases. Two classes of stochastic parameter models are discussed in this chapter. In the first class, the main process is a default rate process, and the objective is to calculate the default term structure. When either the drift or the volatility in the default rate process becomes stochastic, we have a 2-factor model, which can be analyzed by a 2-D MMPP. Moreover, we consider the case where both drift and volatility are stochastic, which gives rise to a 3-factor model and is approximated by a 3-D MMPP. In the other class, the main process is a risk neutral stock price process, and the objective is to evaluate the price of a general payoff at a specific maturity time. The major difference between this and the first class is that the stock price is not stable in long term, thus we have to define a set of upper and lower bounds for each maturity time. The price of the general payoff is given by the risk neutral pricing formula, which can be calculated through all the state probabilities in MMPP. The major problem of the higher dimensional MMPP discretization is the problem size. The MMPP technique is, in a sence, like the tree method or finite difference method, and is subject to the so-called curse of dimensionality[13]. When more dimensions are involved, the problem size (number of states) grows up exponentially and soon becomes difficult to deal with. In such a higher dimensional case, the development of efficient computing methods is necessary. This chapter is basically an introduction of our plans to deal with the n-factor models. Only the main ideas are presented, and many analytic works and computations/simulations will be carried out in the future. In addition, some extended topics such as developing more efficient computation methods, will also be studied in the future.
100
Chapter 5 Summaries and Future Works
5.1 Summaries of Current Works
As the last chapter of this report, we give a brief summary of our work so far, and point our the future directions of this research topic. The central idea in this work is to use MMPP to discretize diffusion models. Through moment matching, we determine the transition rates between adjacent states. When the original diffusion models are turned into MMPP setting, we may use standard Markov Chain techniques to solve all the state probabilities. These will enable us to solve the expectations of the following types E[e−
T 0
X(t)dt
]
and
E[e−
T 0
r(t)dt
· Φ(X(T ))]
which represents the survival probability (X(t) : default rate) and the derivative price (X(t) : stock price). The main conclusion in Chapter 2 is that the moment matched (1-D) MMPP converges weakly to its original OU process. Moreover, expectations (including some path-dependent expectations) converge in second order. This is a theoretical work which aims to give an insight on how well the MMPP approximation is. In Chapter 3 we have seen that the MMPP can be extended to deal with more general diffusion models, where a general drift or volatility is allowed. It is usually difficult to have analytical solutions for such models, and quite often we have to turn to Monte-Carlo. The MMPP technique in fact offers a viable alternative approach for these models. With our MMPP discretization, their default term structure can be obtained through some matrix computations. In Chapter 4 we further extend to 2- or n-D MMPP in order to deal with 2- or n-factor models. Matching both the first two moments in each dimension, as well as the covariance between any two dimensions, we may derive the transition rates of the 101
corresponding MMPP. There is also a sign showing that the second order convergence is preserved to such higher dimensional case. The applications are mainly on the diffusion processes with stochastic parameters. In summary, Chapter 2 is a more complete piece of work, while Chapter 3 and 4 should be seen as reserach plans and only main ideas are presented. Many analytical and numerical works will be carried out in the future.
5.2
Future Works
Apart from the works we have to finish in the previous chapters, there are a few future directions of this research topic we can follow. These include at least 1. Defaultable bond pricing 2. Correlated default In the pricing of defaultable bond, it is important to model the interest rate and default rate simultaneously. A 2-D MMPP should be established, and more general model behaviours and pricing problems will be considered, such as • Default rate is sensitive to the change of interest rate. • Stochastic default recovery rate. • Pricing of payoff that depends the second or subsequent default events. A correlated default model involving two default entities is basically a 2-factor model, and is suitable to be described by a 2-D MMPP. Based on this 2-D MMPP, we plan to incorporate • Joint jumps in default rates • Contagious/Infectious effect in order to generate sufficient correlations in joint default events. This will lead to the pricing problems of credit derivatives such as first-to-default swap[14]. In addition, we shall extend the 2-entity models to include more defaultable entities.
102
Bibliography
[1] T. Bjork. Arbitrage Theory in Continuous Time. Oxford University Press, 1 edition, 1998. [2] D. Brigo and F. Mercurio. Interest Rate Models, Theory and Practice. Springer, 2001. [3] R. L. Burden and J. D. Faires. Numerical Analysis. PWS-KENT, 4 edition, 1989. [4] D. Duffie and K.J. Singleton. Credit Risk. Princeton University Press, 2003. [5] P. Glasserman. Monte Carlo Methods in Financial Engineering. Springer, 2004. [6] D. Gross and C. M. Harris. Fundamentals of Queueing Theory. Wiley, 2 edition, 1985. [7] H. Heffes and D.M. Lucantoni. A markov modulated characterization of packetized voice and data traffic and related statistical multiplexer performance. IEEE J. Selected Areas in Commun., Vol. SAC-4, No. 6:856–868, 1986. [8] S.L. Heston. A closed form solution for options with stochastic volatility with applications to bonds and currency options. Review of Financial Studies, Vol. 6, No. 2:327–343, 1993. [9] J.C. Hull. Options, Futures, and Other Derivatives. Wiley, 5 edition, 2003. [10] O. Kallenberg. Foundations of Modern Probability. Springer, 1997. [11] M. Kijima. Stochastic Processes with Applications to Finance. Chapman and Hall/CRC, 2003.
103
[12] P.E. Kloeden and E. Platen. Numerical Solution of Stochastic Differential Equations. Springer-Verlag, 1992. [13] Y.D. Lyuu. Financial Engineering and Computation, Principles, Mathematics, Algorithms. Cambridge University Press, 5 edition, 2002. [14] P. Schonbucher. Credit Derivatives Pricing Models. Wiley, 2003. [15] D.A. Tavella. Quantitative Methods in Derivatives Pricing. Wiley, 2002. [16] J.B. Walsh. The rate of convergence of the binomial tree scheme. Finance and Stochastics, Vol. 7:337–361, 2003. [17] D. Williams. Probability with Martingales. Cambridge University Press, 1991. [18] P. Wilmott. Derivatives. Wiley, 1998.
104