Switching VARMA Term Structure Models

               Alain MONFORT (1)              Fulvio PEGORARO (2)

                          First version : February, 2005
                            This version : April, 2006


                       [Preliminary and incomplete version]


                                  Abstract
                    Switching VARMA Term Structure Models


The purpose of this paper is to propose a global discrete-time model of the
term structure of interest rates that captures simultaneously the following
important features: (i) interest rates whose historical dynamics involve
several lagged values and switching regimes; (ii) a specification of the
stochastic discount factor (SDF) with time-varying and regime-dependent risk
premia; (iii) the possibility of deriving explicit or quasi-explicit formulas
for zero-coupon bond and interest rate derivative prices; (iv) the
positiveness of the yields at each maturity. We develop the Switching
Autoregressive Normal (SARN) Term Structure model of order p and the
Switching Autoregressive Gamma (SARG) Term Structure model of order p. Regime
shifts are described by a Markov chain with (historical) state-dependent
transition probabilities. In both cases multifactor generalizations are
proposed. An empirical application to the U.S. term structure of interest
rates, observed from June 1964 to December 1995, is presented.


Keywords : Affine Term Structure Models, Stochastic Discount Factor, Car processes,
Switching Regimes, VARMA processes, Lags, Positiveness, Derivative Pricing.

JEL number : C1, C5, G1


   (1) CNAM, Chaire de Modelisation Statistique, 292, rue Saint-Martin, 75141
Paris Cedex 03 (France); E-mail: monfort@cnam.fr. CREST, Laboratoire de
Finance-Assurance, Bureau 1121, Timbre J320, 15, Boulevard Gabriel Péri,
92245 Malakoff Cedex (France); E-mail: monfort@ensae.fr.
   (2) CEREMADE, Université Paris-Dauphine, Place du Maréchal de Lattre de
Tassigny, 75775 Paris Cedex 16 (France); E-mail:
pegoraro@ceremade.dauphine.fr. CREST, Laboratoire de Finance-Assurance,
Bureau 1112, Timbre J320, 15, Boulevard Gabriel Péri, 92245 Malakoff Cedex
(France); E-mail: pegoraro@ensae.fr.

1     INTRODUCTION
In this paper we propose a global discrete-time model of the term structure
of interest rates that captures simultaneously the following important
features:

    - interest rates whose historical dynamics involve several lagged
      values and switching regimes;

    - a specification of the stochastic discount factor (SDF) with
      time-varying and regime-dependent risk premia;

    - the possibility of deriving explicit or quasi-explicit formulas for
      zero-coupon bond and interest rate derivative prices;

    - the positiveness of the yields at each maturity.

     It is well known in the literature that interest rates exhibit
historical dynamics involving lagged values and switching regimes [see, among
others, Hamilton (1988), Cai (1994), Driffill and Sola (1994), Garcia and
Perron (1996), Gray (1996), Boudoukh, Richardson, Smith, and Whitelaw
(1999), Ang and Bekaert (2002a, 2002b), Christiansen (2002), Christiansen
and Lund (2005), Cochrane and Piazzesi (2005)]; indeed, changes in business
cycle conditions or in monetary policy may affect real rates and expected
inflation and cause interest rates to behave quite differently in different
time periods, both in terms of level and volatility. In addition, there is a
large empirical literature on bond yields, based in general on the class of
Affine Term Structure Models (ATSMs)³, suggesting that regime-switching
models describe the term structure of interest rates better than
single-regime models [see, for example, Bansal and Zhou (2002), Driffill,
Kenc and Sola (2003), Evans (2003), Ang and Bekaert (2005), Dai, Singleton
and Yang (2005)].
     These results lead us to propose dynamic term structure models (DTSMs)
where the yield curve is driven by a univariate or multivariate factor
$(x_t)$ which depends on its p most recent lagged values [$X_t$, say] and for
which all the sensitivity coefficients depend on the present and past values
of a latent J-state non-homogeneous Markov chain $(z_t)$ [$Z_t$, say]
describing different regimes in the economy. Consequently, the joint dynamics
of $(x_t, z_t)$ is not a Compound Autoregressive (Car) process⁴ under the
historical probability, and allows for the nonlinearities that have been
documented in the literature [see Ait-Sahalia (1996), Stanton (1997), Ang and
Bekaert (2002b)]. The factor $(x_t)$ is treated either as an exogenous
variable or as an endogenous variable; in the second case the factor is a
vector of several yields.

    ³ The affine family of dynamic term structure models (DTSMs) is
characterized by the fact that the zero-coupon bond yields are affine
functions of Markovian state variables, and it gives closed-form expressions
for zero-coupon bond prices, which greatly facilitates pricing and
econometric implementation [see Duffie and Kan (1996), and Dai and Singleton
(2003) and Piazzesi (2003) for a survey]. Observe that the affine term
structure family is much larger than has been considered in the literature:
indeed, it has recently been observed that the family of Quadratic Term
Structure Models (QTSMs) [see Beaglehole and Tenney (1991), Ahn, Dittmar and
Gallant (2002), and Leippold and Wu (2002)] is a special case of the affine
class obtained by stacking the factor values and their squares [see
Gourieroux and Sufana (2003), Cheng and Scaillet (2005)].

     We consider an exponential-affine SDF with time-varying and
regime-dependent risk correction coefficients defined as functions of the
present and past values of the factor $(x_t)$ and of the regime indicator
$(z_t)$. In our models, both factor risk and regime-shift risk are priced,
and this is done by taking into account not just the information at date t,
that is $(x_t, z_t)$, but the larger information set given by $(X_t, Z_t)$.
This specification leads to stochastic and regime-dependent risk premia, and
it is coherent with the recent empirical literature, which suggests defining
risk correction coefficients as functions of both the factors and their
volatilities in order to replicate the observed temporal variation of
one-period expected excess returns on zero-coupon bonds [see Ahn, Dittmar
and Gallant (2002), Dai and Singleton (2002), Duffee (2002), Duarte (2004),
Dai, Singleton and Yang (2005)]. Moreover, specifying these coefficients as
functions of $(X_t, Z_t)$ leads to a multi-lag specification which
generalizes the first-order Markovian specifications proposed in the
literature [see Dai and Singleton (2000), Duffee (2002), Duarte (2004),
Cheridito, Filipovic and Kimmel (2005), Dai, Le and Singleton (2006)].
     At the same time, we want to exploit the tractability of Car models and
obtain explicit or quasi-explicit formulas for zero-coupon bond and interest
rate derivative prices. This result is achieved by matching the historical
distribution and the SDF in order to get a Car risk-neutral joint dynamics
for $(x_t, z_t)$. Moreover, in this paper we make extensive use of the
ability of the Car family to incorporate lags and switching regimes.
     It is now well known [see Gourieroux, Monfort and Polimenis (2005),
and Darolles, Gourieroux, Jasiak (2006)] that the class of discrete-time
affine (Car) models is much larger than the discrete-time counterparts of
the continuous-time affine processes [see Duffie and Kan (1996), Dai and
Singleton (2000), and Duffie, Filipovic and Schachermayer (2003)].
    ⁴ A Car (discrete-time affine) process is a Markovian process with an
exponential-affine conditional Laplace transform [see Darolles, Gourieroux,
Jasiak (2006) for details].



     We develop the Switching Autoregressive Normal (SARN) Term Structure
model of order p and the Switching Autoregressive Gamma⁵ (SARG) Term
Structure model of order p, and in both cases we propose multifactor
generalizations: the Switching Vector Autoregressive Normal (SVARN) and the
Switching Vector Autoregressive Gamma (SVARG) Term Structure models of
order p.
     Even if the Gaussian family of models does not guarantee the
positiveness of the yields for every time to maturity [see, among others,
Vasicek (1977), Dai and Singleton (2000), Bekaert and Grenadier (2001), Ang
and Bekaert (2002), Ang and Piazzesi (2003), Ang, Piazzesi and Wei (2005)],
we study the SARN(p) Term Structure model (and its multivariate
generalization) because it extends many standard models, such as the ones
just mentioned and more recent ones like Dai, Singleton and Yang (2005).
Indeed, the historical and risk-neutral dynamics of $(x_t)$ depend on
several of its lagged values and on several lagged values of the
regime-indicator variable $(z_t)$. In this general setting, we are able to
derive formulas for the yield curve and for the price of derivatives with
simple analytical or quasi-explicit representations.
     The second kind of model we propose in the paper, based on the (scalar
and vector) Switching Autoregressive Gamma process of order p (which has a
regime-switching AR(p) representation with a martingale difference error),
implies the positiveness of the yields for each time to maturity, regardless
of whether the factor $(x_t)$ is specified as exogenous or endogenous. The
SARG(p) and the SVARG(p) term structure models make it possible to replicate
complex nonlinear (historical and risk-neutral) factor dynamics and provide
explicit or tractable formulas for zero-coupon bond and derivative prices.
In a related study, Bansal and Zhou (2002) propose an (approximate, scalar
and bivariate) discrete-time Cox-Ingersoll-Ross term structure model with
regime shifts. We extend their framework by using the exact discrete-time
equivalent of the CIR process (with switching regimes) generalized to an
autoregressive order p larger than one (the SARG(p) and the SVARG(p)
processes), by allowing for a non-homogeneous historical transition matrix
for $(z_t)$, by pricing the regime-shift risk, and by providing an exact
yield-to-maturity formula [in Bansal and Zhou (2002), $(z_t)$ is a
homogeneous Markov chain, the associated risk correction coefficient is
assumed equal to zero, and the term structure formula they provide is based
on a log-linear approximation applied to the fundamental asset pricing
equation].

    ⁵ The Autoregressive Gamma (ARG) process is a Car process, and the
ARG(1) specification is the discrete-time counterpart of the
Cox-Ingersoll-Ross process [see Gourieroux and Jasiak (2006), Cox,
Ingersoll, and Ross (1985)].

     In a recent paper, Dai, Le and Singleton (2006) propose a (discrete-time
multivariate) conditionally Gaussian term structure model where
nonlinearities are introduced in the (latent) state-factor (historical and
risk-neutral) dynamics by means of stochastic volatility factors, for which
the risk-neutral conditional distribution is described by a particular
VARG(1) process with conditionally independent components. The Switching
Vector Autoregressive Gamma process we use to describe the risk-neutral
dynamics of the factor $(x_t)$ in the SVARG(p) term structure model presents
three generalizations with respect to their first-order Markovian
specification: a) we consider an autoregressive order p that is in general
larger than one; b) conditionally on the present and past values of $x_t$
and $z_t$, there is dependence between the components of the factor
$x_{t+1}$; c) the historical and risk-neutral dynamics of $x_{t+1}$ are
affected by switching regimes.
    The plan of the paper is as follows. In Section 2, we present the
Index-Car(p) processes. This family of processes is developed in a
univariate and multivariate setting, with and without switching regimes. In
particular, we study the (scalar and vector) Autoregressive Gaussian models
of order p and the (scalar and vector) Autoregressive Gamma models of order
p, under single-regime and regime-switching specifications. This class of
processes is then used, following the SDF modeling principle, to derive the
SARN(p) and the SARG(p) discrete-time term structure models and their
multivariate generalizations. In Section 3 we study the SARN(p) and the
SVARN(p) Term Structure models, we derive the Generalized Linear Term
Structure formulas, and we specify the historical and risk-neutral dynamics
of the yield curve processes. These results are given for an exogenous or an
endogenous factor. Moreover, we discuss the propagation of shocks on the
interest rate surface. Section 4 deals with the SARG(p) and the SVARG(p)
Term Structure models. Here, regardless of the endogenous or exogenous
nature of the factor $(x_t)$, we derive the Generalized Linear Term
Structure formulas and the yield curve processes, and we guarantee the
positiveness of the yields for each time to maturity. Finally, the pricing
methodology proposed in Sections 3 and 4 for zero-coupon bonds is
generalized in Section 5 to the case of interest rate derivatives. Section 6
concludes, and the appendices gather the proofs.




2      LAPLACE TRANSFORMS, CAR(p)
       PROCESSES AND SWITCHING REGIMES

It is now well documented [see e.g. Darolles, Gourieroux and Jasiak (2006),
Gourieroux and Monfort (2006), Gourieroux, Monfort and Polimenis (2002,
2003), Polimenis (2001)] that the Laplace transform (or moment generating
function) is a very convenient mathematical tool in many financial domains.
It is, in particular, a crucial notion in the theory of Car(p) processes [see
Darolles, Gourieroux and Jasiak (2006) for details].

2.1     Definition of a Car(p) process
Definition 1 [Car(p) process]: An n-dimensional process
$\tilde{x} = (\tilde{x}_t, t \ge 0)$ is a compound autoregressive process of
order p [Car(p)] if the distribution of $\tilde{x}_{t+1}$ given the past
values $\underline{\tilde{x}_t} = (\tilde{x}_t, \tilde{x}_{t-1}, \ldots)$
admits a real Laplace transform of the following type:

\[
\begin{aligned}
E\left[\exp(u'\tilde{x}_{t+1}) \mid \underline{\tilde{x}_t}\right]
  &= E_t\left[\exp(u'\tilde{x}_{t+1})\right] \\
  &= \exp\left[\tilde{a}_1(u)'\tilde{x}_t + \ldots
       + \tilde{a}_p(u)'\tilde{x}_{t+1-p} + \tilde{b}(u)\right] ,
       \quad u \in \mathbb{R}^n ,
\end{aligned}
\tag{1}
\]

where $\tilde{a}_i(u)$, $i \in \{1, \ldots, p\}$, and $\tilde{b}(u)$ are
nonlinear functions, and where $\tilde{a}_p$ is not identically zero on
$\mathbb{R}^n$ (so that the order is exactly p). The existence of this
Laplace transform in a neighborhood of $u = 0$ implies that all the
conditional moments exist, and that the conditional expectations and
variance-covariance matrices (and all conditional cumulants) are affine
functions of $(\tilde{x}_t, \tilde{x}_{t-1}, \ldots, \tilde{x}_{t+1-p})$.
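
The affine-moment property can be made explicit with a short worked derivation. The sketch below is written for the scalar case n = 1 to keep the notation light; the symbol $\psi_t$ for the conditional log-Laplace transform is introduced here for illustration only and is not part of the original notation.

```latex
% Conditional log-Laplace transform of a scalar Car(p) process
\psi_t(u) := \log E_t\left[\exp(u\, x_{t+1})\right]
           = \sum_{i=1}^{p} a_i(u)\, x_{t+1-i} + b(u) .

% Cumulants are derivatives of \psi_t at u = 0, hence affine in the lags:
E_t[x_{t+1}] = \psi_t'(0)
             = \sum_{i=1}^{p} a_i'(0)\, x_{t+1-i} + b'(0) ,
\qquad
V_t[x_{t+1}] = \psi_t''(0)
             = \sum_{i=1}^{p} a_i''(0)\, x_{t+1-i} + b''(0) .
```

The same argument applies to the k-th conditional cumulant, which is the k-th derivative of $\psi_t$ at zero.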

2.2     Univariate Index-Car(p) process
An important class of Car(p) processes are the Index-Car(p) processes, which
are built from a Car(1) process. In this section we consider a univariate
process xt and the multivariate case will be considered in sections 2.6 and
2.7.
Definition 2 [Univariate Index-Car(p) process]: Let $\exp[a(u) y_t + b(u)]$
be the conditional Laplace transform of a univariate Car(1) process $y_t$.
The process $x_t$ admitting a conditional Laplace transform defined by:

\[
E\left[\exp(u x_{t+1}) \mid \underline{x_t}\right]
  = \exp\left[a(u)(\beta_1 x_t + \ldots + \beta_p x_{t+1-p}) + b(u)\right] ,
  \quad u \in \mathbb{R} ,
\tag{2}
\]

is called a univariate Index-Car(p) process.
Note that, if $y_t$ is a positive process and if the parameters
$\beta_1, \ldots, \beta_p$ are positive, the process $x_t$ will be positive.
   Using the notation $\beta = (\beta_1, \ldots, \beta_p)'$ and
$X_t = (x_t, x_{t-1}, \ldots, x_{t+1-p})'$, the Laplace transform (2) can be
written as:

\[
E\left[\exp(u x_{t+1}) \mid \underline{x_t}\right]
  = \exp\left[a(u)\beta' X_t + b(u)\right] .
\tag{3}
\]

2.3     Examples of Univariate Index-Car(p) processes
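a. Gaussian model
If $y_t$ is a Gaussian AR(1) process defined by:

\[ y_{t+1} = \nu + \rho y_t + \varepsilon_{t+1} , \]

where $\varepsilon_{t+1}$ is a Gaussian white noise distributed as
$N(0, \sigma^2)$, the conditional Laplace transform of $y_{t+1}$ given $y_t$
is:

\[
E\left[\exp(u y_{t+1}) \mid y_t\right]
  = \exp\left[u\rho y_t + u\nu + \frac{\sigma^2}{2} u^2\right] .
\]

The process is Car(1) with $a(u) = u\rho$ and
$b(u) = u\nu + \frac{\sigma^2}{2} u^2$. The associated Index-Car(p) process
has a conditional Laplace transform defined by:

\[
E\left[\exp(u x_{t+1}) \mid \underline{x_t}\right]
  = \exp\left[u\rho(\beta_1 x_t + \ldots + \beta_p x_{t+1-p})
      + u\nu + \frac{\sigma^2}{2} u^2\right] ;
\]

so, using the notation $\varphi_i = \rho\beta_i$, we see that $x_{t+1}$ is
the Gaussian AR(p) process defined by:

\[
x_{t+1} = \nu + \varphi_1 x_t + \ldots + \varphi_p x_{t+1-p}
          + \varepsilon_{t+1}
\tag{4}
\]

and its conditional Laplace transform becomes:

\[
E\left[\exp(u x_{t+1}) \mid \underline{x_t}\right]
  = \exp\left[u\varphi' X_t + u\nu + \frac{\sigma^2}{2} u^2\right] ,
\tag{5}
\]

where $\varphi = (\varphi_1, \ldots, \varphi_p)'$.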
b. Gamma model
   Let us now consider an autoregressive gamma process of order one [ARG(1)]
$y_t$. Its conditional Laplace transform is [see Gourieroux and Jasiak
(2005) for details]:

\[
E\left[\exp(u y_{t+1}) \mid y_t\right]
  = \exp\left[\frac{\rho u}{1 - u\mu}\, y_t - \nu \log(1 - u\mu)\right] ,
  \quad \rho > 0 , \ \mu > 0 , \ \nu > 0 ,
\]

and it is well known that, given $y_t$, $y_{t+1}$ can be obtained by first
drawing a latent variable $U_{t+1}$ from the Poisson distribution
$\mathcal{P}(\rho y_t / \mu)$ and then drawing $y_{t+1}/\mu$ from the gamma
distribution $\gamma(\nu + U_{t+1})$. The process $y_{t+1}$ is positive and
the associated Index-Car(p) process $x_{t+1}$ is also positive. The
conditional Laplace transform of this process is:

\[
E\left[\exp(u x_{t+1}) \mid \underline{x_t}\right]
  = \exp\left[\frac{\rho u}{1 - u\mu}(\beta_1 x_t + \ldots + \beta_p x_{t+1-p})
      - \nu \log(1 - u\mu)\right] ,
\]

with $\beta_i \ge 0$ for $i \in \{1, \ldots, p\}$, or, using the same
notation as above:

\[
E\left[\exp(u x_{t+1}) \mid \underline{x_t}\right]
  = \exp\left[\frac{u}{1 - u\mu}\, \varphi' X_t - \nu \log(1 - u\mu)\right] .
\tag{6}
\]

    Similarly, given $X_t$, $x_{t+1}$ can be obtained by drawing $U_{t+1}$
from $\mathcal{P}(\varphi' X_t / \mu)$ and $x_{t+1}/\mu$ from
$\gamma(\nu + U_{t+1})$. It is easily seen that the conditional mean and
variance of $x_{t+1}$, given $\underline{x_t}$, are respectively given by
$\nu\mu + \varphi' X_t$ and $\nu\mu^2 + 2\mu\varphi' X_t$; so, the process
$x_{t+1}$ has the weak AR(p) representation:

\[ x_{t+1} = \nu\mu + \varphi' X_t + \varepsilon_{t+1} , \tag{7} \]

where $\varepsilon_{t+1}$ is a conditionally heteroscedastic martingale
difference, whose conditional variance is $\nu\mu^2 + 2\mu\varphi' X_t$; the
process is stationary if and only if $\varphi' e < 1$ [where
$e = (1, \ldots, 1)' \in \mathbb{R}^p$] and, in this case, the process
$\varepsilon_{t+1}$ has finite unconditional variance given by
$\nu\mu^2 + 2\nu\mu^2 \frac{\varphi' e}{1 - \varphi' e}$. The unconditional
mean of $x_{t+1}$ is given by $\frac{\nu\mu}{1 - \varphi' e}$.
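
The Poisson-gamma mixture described above translates directly into a simulation recipe. The following is a minimal sketch, assuming the standard library only; the function names (`poisson`, `arg_step`, `arg_conditional_moments`, `simulate_arg`) and the choice of Knuth's multiplication method for the Poisson draw are illustrative assumptions, not part of the paper.

```python
import math
import random


def poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's method (adequate for moderate lam)."""
    if lam <= 0.0:
        return 0
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def arg_step(X, phi, nu, mu, rng):
    """One draw of x_{t+1} given X = (x_t, ..., x_{t+1-p})."""
    index = sum(f * x for f, x in zip(phi, X))   # phi' X_t
    U = poisson(index / mu, rng)                 # latent Poisson mixing variable
    return mu * rng.gammavariate(nu + U, 1.0)    # x_{t+1}/mu ~ gamma(nu + U)


def arg_conditional_moments(X, phi, nu, mu):
    """Conditional mean and variance of x_{t+1} stated before equation (7)."""
    index = sum(f * x for f, x in zip(phi, X))
    return nu * mu + index, nu * mu ** 2 + 2.0 * mu * index


def simulate_arg(phi, nu, mu, T, seed=0):
    """Simulate T values, starting from the unconditional mean."""
    rng = random.Random(seed)
    X = [nu * mu / (1.0 - sum(phi))] * len(phi)
    path = []
    for _ in range(T):
        x_next = arg_step(X, phi, nu, mu, rng)
        path.append(x_next)
        X = [x_next] + X[:-1]                    # shift the lag vector
    return path
```

With, say, `phi = [0.3, 0.2]`, `nu = 1.5` and `mu = 0.5`, the stationarity condition holds and the sample mean of a long simulated path should be close to the unconditional mean 1.5 given in the text.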

2.4     Univariate Switching regimes Car(p) process
Let us first consider a J-state homogeneous Markov chain $z_{t+1}$, which
can take the values $e_j \in \mathbb{R}^J$, $j \in \{1, \ldots, J\}$, where
$e_j$ is the j-th column of the (J × J) identity matrix. The transition
probability from state $e_i$ to state $e_j$ is
$\pi(e_i, e_j) = \Pr(z_{t+1} = e_j \mid z_t = e_i)$. It is first worth
noting that $z_{t+1}$ is a Car(1) process.
Proposition 1: The Markov chain process $z_{t+1}$ is a Car(1) process with a
conditional Laplace transform given by:

\[
E\left[\exp(v' z_{t+1}) \mid z_t\right] = \exp\left(a_z(v, \pi)' z_t\right) ,
\tag{8}
\]

where

\[
a_z(v, \pi) = \left[\log\left(\sum_{j=1}^{J} \exp(v' e_j)\, \pi(e_1, e_j)\right) ,
  \ldots ,
  \log\left(\sum_{j=1}^{J} \exp(v' e_j)\, \pi(e_J, e_j)\right)\right]' .
\]

[Proof: straightforward.]
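
Proposition 1 can be checked numerically for a small chain. The sketch below is illustrative only: the transition matrix `P` (with `P[i][j]` standing for $\pi(e_{i+1}, e_{j+1})$), the vector `v` and the function names are assumptions chosen for the example.

```python
import math


def a_z(v, P):
    """Vector a_z(v, pi): entry i is log sum_j exp(v_j) * pi(e_i, e_j)."""
    return [math.log(sum(math.exp(vj) * pij for vj, pij in zip(v, row)))
            for row in P]


def laplace_direct(v, P, i):
    """E[exp(v' z_{t+1}) | z_t = e_i] computed from the definition."""
    return sum(P[i][j] * math.exp(v[j]) for j in range(len(v)))


P = [[0.9, 0.1],
     [0.2, 0.8]]          # hypothetical two-regime transition matrix
v = [0.3, -0.5]
az = a_z(v, P)
for i in range(len(P)):
    # a_z(v, pi)' z_t selects entry i when z_t = e_i, as in equation (8)
    assert math.isclose(math.exp(az[i]), laplace_direct(v, P, i))
```

Because $z_t$ is a unit vector, the inner product $a_z(v, \pi)' z_t$ simply picks the entry of $a_z$ corresponding to the current regime, which is why the exponent is linear in $z_t$.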
   Let us now consider a univariate Index-Car(p) process with a conditional
Laplace transform given by $\exp[a(u)\beta' X_t + b(u)]$, and let us assume
that $b(u)$ can be written:

\[
b(u) = \tilde{b}(u)'\lambda , \quad \text{where} \quad
\tilde{b}(u) = (b_1(u), \ldots, b_m(u))' \ \text{ and } \
\lambda = (\lambda_1, \ldots, \lambda_m)' .
\tag{9}
\]

    We can generalize this model by assuming that the parameters
$\lambda_i$ are stochastic and linear functions of
$Z_t = (z_t', \ldots, z_{t-p}')'$. More precisely, we assume that the
conditional distribution of $x_{t+1}$ given $\underline{x_t}$ and
$\underline{z_{t+1}}$ has a Laplace transform given by:

\[
E\left[\exp(u x_{t+1}) \mid \underline{x_t}, \underline{z_{t+1}}\right]
  = \exp\left[a(u)\beta' X_t + \tilde{b}(u)'\Lambda Z_t\right] ,
\tag{10}
\]

where $\Lambda$ is an [m, (p + 1)J] matrix. Note that we assume no
instantaneous causality between $x_{t+1}$ and $z_{t+1}$, and that we admit
one more lag in $Z_t$ than in $X_t$ [examples given in Section 2.5 show that
this assumption may be convenient]. If the process $z_t$ is not observed by
the econometrician, the no-instantaneous-causality assumption is not really
important at the estimation stage, since we could rename $z_t$ as $z_{t+1}$;
however, it will be useful at the pricing level in order to obtain simple
pricing procedures [Dai, Singleton and Yang (2005) also make this kind of
assumption]. The joint process $(x_{t+1}, z_{t+1})$ is easily seen to be a
Car(p + 1) process.
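Proposition 2: The conditional Laplace transform of $(x_{t+1}, z_{t+1})$
given $\underline{x_t}$, $\underline{z_t}$ has the following form:

\[
E\left[\exp(u x_{t+1} + v' z_{t+1}) \mid \underline{z_t}, \underline{x_t}\right]
  = \exp\left\{a(u)\beta' X_t
      + \left[e_1' \otimes a_z(v, \pi)' + \tilde{b}(u)'\Lambda\right] Z_t\right\} ,
\tag{11}
\]

where $e_1$ is the first vector of the canonical basis in
$\mathbb{R}^{p+1}$, and where $\otimes$ denotes the Kronecker product.
[Proof: straightforward.]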
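
For completeness, the argument behind Proposition 2 can be sketched as follows. The derivation assumes, as in the homogeneous chain introduced at the start of this section, that the distribution of $z_{t+1}$ given the joint past depends on $z_t$ only; the underlined symbols denote the past histories of the processes.

```latex
\begin{aligned}
E\left[\exp(u x_{t+1} + v' z_{t+1}) \mid \underline{x_t}, \underline{z_t}\right]
 &= E\left[\exp(v' z_{t+1})\,
      E\left[\exp(u x_{t+1}) \mid \underline{x_t}, \underline{z_{t+1}}\right]
      \mid \underline{x_t}, \underline{z_t}\right] \\
 &= \exp\left[a(u)\beta' X_t + \tilde{b}(u)'\Lambda Z_t\right]\,
      E\left[\exp(v' z_{t+1}) \mid z_t\right]
      && \text{by (10)} \\
 &= \exp\left[a(u)\beta' X_t + \tilde{b}(u)'\Lambda Z_t
      + a_z(v, \pi)' z_t\right]
      && \text{by (8)} \\
 &= \exp\left\{a(u)\beta' X_t
      + \left[e_1' \otimes a_z(v, \pi)' + \tilde{b}(u)'\Lambda\right] Z_t\right\} ,
\end{aligned}
```

where the last line uses $a_z(v, \pi)' z_t = (e_1' \otimes a_z(v, \pi)') Z_t$, since $z_t$ is the first block of $Z_t$.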




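2.5    Examples of Univariate Switching regimes Car(p) processes
a. Gaussian case
    Let us start from the AR(p) model (4). Its conditional Laplace transform
is given by (5):

\[
E\left[\exp(u x_{t+1}) \mid \underline{x_t}\right]
  = \exp\left[u\varphi' X_t + u\nu + \frac{\sigma^2}{2} u^2\right] ,
\]

and the function $b(u)$ has the form (9) with
$\tilde{b}(u) = \left(u, \frac{u^2}{2}\right)'$ and
$\lambda = (\nu, \sigma^2)'$.
    If $\lambda$ is replaced by $\Lambda Z_t$, the joint process
$(x_{t+1}, z_{t+1})$ is Car(p + 1) with a conditional Laplace transform
given by:

\[
E\left[\exp(u x_{t+1} + v' z_{t+1}) \mid \underline{z_t}, \underline{x_t}\right]
  = \exp\left[u\varphi' X_t + \left(u, \frac{u^2}{2}\right) \Lambda Z_t
      + a_z(v, \pi)' z_t\right] .
\tag{12}
\]

More precisely, the dynamics is given by [using the notation
$\Lambda = \begin{pmatrix} \lambda_1' \\ \lambda_2' \end{pmatrix}$]:

\[
x_{t+1} = \lambda_1' Z_t + \varphi' X_t
          + (\lambda_2' Z_t)^{1/2}\, \varepsilon_{t+1} ,
\tag{13}
\]

where $\varepsilon_{t+1}$ is a Gaussian white noise distributed as
$N(0, 1)$ [the conditional variance $\lambda_2' Z_t$ now plays the role of
$\sigma^2$], $Z_t = (z_t', \ldots, z_{t-p}')'$, and $z_t$ is a Markov chain
such that $\Pr(z_{t+1} = e_j \mid z_t = e_i) = \pi(e_i, e_j)$.
   In particular, let us consider the case:

\[
\Lambda = \begin{pmatrix}
  (1, -\varphi_1, \ldots, -\varphi_p) \otimes \nu^{*\prime} \\
  e_1' \otimes \sigma^{*2\prime}
\end{pmatrix}
\tag{14}
\]

with $\nu^* = (\nu_1^*, \ldots, \nu_J^*)'$ and
$\sigma^{*2} = (\sigma_1^{*2}, \ldots, \sigma_J^{*2})'$. The conditional
distribution of $x_{t+1}$ given $\underline{x_t}$ and $\underline{z_{t+1}}$
is then the one corresponding to the switching AR(p) model defined by:

\[
x_{t+1} - \nu^{*\prime} z_t
  = \varphi_1 (x_t - \nu^{*\prime} z_{t-1}) + \ldots
    + \varphi_p (x_{t+1-p} - \nu^{*\prime} z_{t-p})
    + (\sigma^{*\prime} z_t)\, \varepsilon_{t+1} ,
\tag{15}
\]

where $\sigma^* = (\sigma_1^*, \ldots, \sigma_J^*)'$.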
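
The construction of the matrix in (14) can be illustrated with a small numerical sketch. It assumes that the innovation in (13) is standard normal, so that the regime-dependent variance enters only through the second row of the matrix; all function names and parameter values below are hypothetical, and regimes are coded by 0-based indices.

```python
import math
import random


def unit(j, J):
    """j-th column of the J x J identity matrix (0-based j)."""
    return [1.0 if k == j else 0.0 for k in range(J)]


def kron(a, b):
    """Kronecker product of two vectors, as a flat list."""
    return [ai * bj for ai in a for bj in b]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def build_lambda(phi, nu_star, sigma2_star):
    """Rows lambda_1' and lambda_2' of the matrix Lambda in equation (14)."""
    p = len(phi)
    lam1 = kron([1.0] + [-f for f in phi], nu_star)
    lam2 = kron(unit(0, p + 1), sigma2_star)
    return lam1, lam2


def stack(regimes, J):
    """Z_t built from regime indices ordered (z_t, z_{t-1}, ..., z_{t-p})."""
    return [c for j in regimes for c in unit(j, J)]


def switching_ar_step(X, regimes, phi, nu_star, sigma2_star, rng):
    """Draw x_{t+1} from (13), with a standard normal innovation (assumption)."""
    lam1, lam2 = build_lambda(phi, nu_star, sigma2_star)
    Z = stack(regimes, len(nu_star))
    return (dot(lam1, Z) + dot(phi, X)
            + math.sqrt(dot(lam2, Z)) * rng.gauss(0.0, 1.0))


phi, nu_star, sigma2_star = [0.5, 0.2], [1.0, 3.0], [0.04, 0.25]
regimes = [1, 0, 1]                   # z_t = e_2, z_{t-1} = e_1, z_{t-2} = e_2
lam1, lam2 = build_lambda(phi, nu_star, sigma2_star)
Z = stack(regimes, 2)
drift = dot(lam1, Z)       # nu*'z_t - phi_1 nu*'z_{t-1} - phi_2 nu*'z_{t-2}
variance = dot(lam2, Z)    # sigma*2' z_t
```

Here `drift` reproduces the regime-dependent intercept implied by (15), and `variance` is the variance attached to the current regime only, which is the effect of the $e_1$ factor in the second row.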
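b. Gamma case
    Let us now start from the ARG(p) process associated with the conditional
Laplace transform (6):

\[
E\left[\exp(u x_{t+1}) \mid \underline{x_t}\right]
  = \exp\left[\frac{u}{1 - u\mu}\, \varphi' X_t - \nu \log(1 - u\mu)\right] .
\]

Here we have $\tilde{b}(u) = -\log(1 - u\mu)$ and $\lambda = \nu$. If $\nu$
is replaced by $\Lambda Z_t$, where $\Lambda Z_t > 0$, the process $x_t$
has, conditionally on the process $z_t$, a weak AR(p) representation given
by:

\[
x_{t+1} = \mu \Lambda Z_t + \varphi_1 x_t + \ldots + \varphi_p x_{t+1-p}
          + \zeta_{t+1} ,
\tag{16}
\]

where $\zeta_{t+1}$ is a conditionally heteroscedastic martingale
difference. For instance, we can take:

\[ \Lambda = e_1' \otimes \frac{\tilde{\nu}'}{\mu} , \tag{17} \]

where $\tilde{\nu} = (\tilde{\nu}_1, \ldots, \tilde{\nu}_J)'$,
$\tilde{\nu}_j \ge 0$. We have $\Lambda Z_t = \frac{\tilde{\nu}' z_t}{\mu}$
and, conditionally on the process $z_t$, the process $x_t$ has a weak AR(p)
representation given by:

\[
x_{t+1} = \tilde{\nu}' z_t + \varphi_1 x_t + \ldots + \varphi_p x_{t+1-p}
          + \zeta_{t+1} .
\tag{18}
\]

It is also possible to consider a $\Lambda$ of the form
$(1, -\varphi_1, \ldots, -\varphi_p) \otimes \frac{\tilde{\nu}'}{\mu}$ if
$\min_i(\tilde{\nu}_i) > \max_i(\tilde{\nu}_i) \sum_{j=1}^{p} \varphi_j$,
since in this case
$\Lambda Z_t = \frac{1}{\mu}\left[\tilde{\nu}' z_t
  - \sum_{i=1}^{p} \varphi_i\, \tilde{\nu}' z_{t-i}\right] \ge 0$.
The weak conditional AR(p) representation is then given by:

\[
x_{t+1} - \tilde{\nu}' z_t
  = \varphi_1 (x_t - \tilde{\nu}' z_{t-1}) + \ldots
    + \varphi_p (x_{t+1-p} - \tilde{\nu}' z_{t-p}) + \zeta_{t+1} .
\tag{19}
\]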
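
The positivity requirement in the second specification can be verified by brute force for small J and p. The sketch below enumerates every regime history $(z_t, \ldots, z_{t-p})$; it assumes non-negative autoregressive coefficients, uses 0-based regime indices, and all names and parameter values are illustrative assumptions.

```python
from itertools import product


def lambda_z(regimes, phi, nu_tilde, mu):
    """Lambda Z_t = (1/mu) [nu~'z_t - sum_i phi_i nu~'z_{t-i}] for one history."""
    head, lags = regimes[0], regimes[1:]
    return (nu_tilde[head]
            - sum(f * nu_tilde[s] for f, s in zip(phi, lags))) / mu


def min_lambda_z(phi, nu_tilde, mu):
    """Smallest value of Lambda Z_t over all J^(p+1) regime histories."""
    J, p = len(nu_tilde), len(phi)
    return min(lambda_z(s, phi, nu_tilde, mu)
               for s in product(range(J), repeat=p + 1))


def sufficient_condition(phi, nu_tilde):
    """Condition from the text: min(nu~) > max(nu~) * sum(phi)."""
    return min(nu_tilde) > max(nu_tilde) * sum(phi)


phi, mu = [0.3, 0.2], 0.5
good = [1.0, 1.5]     # satisfies the condition: 1.0 > 1.5 * 0.5
bad = [0.5, 2.0]      # violates it:             0.5 < 2.0 * 0.5
```

For `good` the minimum of $\Lambda Z_t$ over all histories is strictly positive, whereas for `bad` some histories give a negative value, so the gamma shape parameter would not be well defined.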

2.6    Specification of multivariate Car(1) processes
In order to keep the notation simple we will consider the bivariate case,
but all the results are easily extended to the general case. A bivariate
Car(1) process $y_t = (y_{1,t}, y_{2,t})'$ will be defined in a recursive
way. We consider two univariate exponential-affine Laplace transforms:

\[
\exp\left[a_1(u_1) w_{1,t} + b_1(u_1)\right]
\quad \text{and} \quad
\exp\left[a_2(u_2) w_{2,t} + b_2(u_2)\right] .
\tag{20}
\]

Then, we assume that the conditional distribution of $y_{1,t+1}$ given
$(y_{2,t+1}, y_{1,t}, y_{2,t})$ has a Laplace transform given by:

\[
E_t\left[\exp(u_1 y_{1,t+1}) \mid y_{2,t+1}, y_{1,t}, y_{2,t}\right]
  = \exp\left[a_1(u_1)(\beta_o y_{2,t+1} + \beta_{11} y_{1,t}
      + \beta_{12} y_{2,t}) + b_1(u_1)\right]
\tag{21}
\]

and the conditional distribution of $y_{2,t+1}$, given
$(y_{1,t}, y_{2,t})$, has a Laplace transform given by:

\[
E_t\left[\exp(u_2 y_{2,t+1}) \mid y_{1,t}, y_{2,t}\right]
  = \exp\left[a_2(u_2)(\beta_{21} y_{1,t} + \beta_{22} y_{2,t})
      + b_2(u_2)\right] .
\tag{22}
\]

    Note that, if the Laplace transforms (20) correspond to positive
variables and if the parameters $\beta_o$, $\beta_{11}$, $\beta_{12}$,
$\beta_{21}$, $\beta_{22}$ are positive, the bivariate process $y_t$ has
positive components. Moreover, we have the following result:
Proposition 3: The bivariate process $y_t$ defined by the conditional
dynamics (21), (22) is a bivariate Car(1) process with a conditional Laplace
transform given by:

\[
\begin{aligned}
& E\left[\exp(u_1 y_{1,t+1} + u_2 y_{2,t+1}) \mid y_{1,t}, y_{2,t}\right] \\
& \quad = \exp\{ [a_1(u_1)\beta_{11}
      + a_2(u_2 + a_1(u_1)\beta_o)\beta_{21}]\, y_{1,t} \\
& \qquad\quad + [a_1(u_1)\beta_{12}
      + a_2(u_2 + a_1(u_1)\beta_o)\beta_{22}]\, y_{2,t} \\
& \qquad\quad + b_1(u_1) + b_2(u_2 + a_1(u_1)\beta_o) \} .
\end{aligned}
\tag{23}
\]

[Proof: see Appendix 1.]
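
Formula (23) can be checked against a direct calculation when both kernels in (20) are Gaussian, since the linear combination $u_1 y_{1,t+1} + u_2 y_{2,t+1}$ is then conditionally normal. The sketch below is only a numerical sanity check under that Gaussian assumption; the function names and parameter values are hypothetical.

```python
import math


def gaussian_kernel(rho, nu, sigma2):
    """(a, b) of a Gaussian Car(1): a(u) = rho*u, b(u) = nu*u + sigma2*u^2/2."""
    return (lambda u: rho * u), (lambda u: nu * u + 0.5 * sigma2 * u * u)


def joint_exponent(u1, u2, y1, y2, k1, k2, beta):
    """Exponent of equation (23) for generic kernels k1 = (a1, b1), k2 = (a2, b2)."""
    a1, b1 = k1
    a2, b2 = k2
    bo, b11, b12, b21, b22 = beta
    w = u2 + a1(u1) * bo                 # shifted argument of a2 and b2
    return ((a1(u1) * b11 + a2(w) * b21) * y1
            + (a1(u1) * b12 + a2(w) * b22) * y2
            + b1(u1) + b2(w))


def gaussian_direct(u1, u2, y1, y2, p1, p2, beta):
    """log E[exp(u1*y1' + u2*y2')] from the recursive Gaussian dynamics."""
    rho1, nu1, s1 = p1
    rho2, nu2, s2 = p2
    bo, b11, b12, b21, b22 = beta
    m2 = nu2 + rho2 * (b21 * y1 + b22 * y2)      # conditional mean of y2'
    w = u2 + u1 * rho1 * bo                      # total loading on y2'
    mean = u1 * (nu1 + rho1 * (b11 * y1 + b12 * y2)) + w * m2
    var = u1 * u1 * s1 + w * w * s2
    return mean + 0.5 * var                      # Gaussian log-Laplace transform


p1, p2 = (0.8, 0.1, 0.04), (0.6, 0.2, 0.09)      # (rho, nu, sigma^2) per kernel
beta = (0.5, 0.4, 0.1, 0.2, 0.7)                 # (beta_o, b11, b12, b21, b22)
lhs = joint_exponent(0.3, -0.2, 1.0, 2.0,
                     gaussian_kernel(*p1), gaussian_kernel(*p2), beta)
rhs = gaussian_direct(0.3, -0.2, 1.0, 2.0, p1, p2, beta)
assert math.isclose(lhs, rhs)
```

The key step is visible in the variable `w`: conditioning first on $y_{2,t+1}$ shifts the argument of the second kernel from $u_2$ to $u_2 + a_1(u_1)\beta_o$.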

2.7     Specification of multivariate Index-Car(p) processes
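We consider a bivariate process $\tilde{x}_t = (x_{1,t}, x_{2,t})'$ and we
introduce the notations $X_{1t} = (x_{1,t}, \ldots, x_{1,t+1-p})'$ and
$X_{2t} = (x_{2,t}, \ldots, x_{2,t+1-p})'$. Given univariate Laplace
transforms like (20), a bivariate Index-Car(p) is defined in the following
way.
Definition 3: A bivariate Index-Car(p) dynamics is defined by the
conditional Laplace transforms:

\[
\begin{aligned}
E_t\left[\exp(u_1 x_{1,t+1}) \mid x_{2,t+1}, \underline{x_{1,t}}, \underline{x_{2,t}}\right]
  &= \exp\left[a_1(u_1)(\beta_o x_{2,t+1} + \beta_{11}' X_{1t}
       + \beta_{12}' X_{2t}) + b_1(u_1)\right] , \\
E_t\left[\exp(u_2 x_{2,t+1}) \mid \underline{x_{1,t}}, \underline{x_{2,t}}\right]
  &= \exp\left[a_2(u_2)(\beta_{21}' X_{1t} + \beta_{22}' X_{2t})
       + b_2(u_2)\right] ,
\end{aligned}
\tag{24}
\]

where the $\beta_{ij}$ are p-vectors. It is easily seen that the process
$\tilde{x}_t$ is a Car(p) process with a conditional Laplace transform given
by (23) in which $y_{1,t}$ is replaced by $X_{1t}$, $y_{2,t}$ by $X_{2t}$,
and the scalars $\beta_{ij}$ by the vectors $\beta_{ij}'$, i.e.

\[
\begin{aligned}
& E\left[\exp(u'\tilde{x}_{t+1}) \mid \underline{\tilde{x}_t}\right] \\
& \quad = \exp\{ [a_1(u_1)\beta_{11}
      + a_2(u_2 + a_1(u_1)\beta_o)\beta_{21}]' X_{1t} \\
& \qquad\quad + [a_1(u_1)\beta_{12}
      + a_2(u_2 + a_1(u_1)\beta_o)\beta_{22}]' X_{2t} \\
& \qquad\quad + b_1(u_1) + b_2(u_2 + a_1(u_1)\beta_o) \} .
\end{aligned}
\tag{25}
\]

From the properties of Car(p) processes we get a representation of the form:

\[
\begin{cases}
x_{1,t+1} = \alpha_1 + \alpha_o x_{2,t+1} + \alpha_{11}' X_{1t}
            + \alpha_{12}' X_{2t} + \varepsilon_{1,t+1} \\
x_{2,t+1} = \alpha_2 + \alpha_{21}' X_{1t} + \alpha_{22}' X_{2t}
            + \varepsilon_{2,t+1}
\end{cases}
\tag{26}
\]

where the error terms satisfy:

\[
\begin{aligned}
E[\varepsilon_{1,t+1} \mid x_{2,t+1}, \underline{\tilde{x}_t}] &= 0 \\
E[\varepsilon_{2,t+1} \mid \underline{\tilde{x}_t}] &= 0 ;
\end{aligned}
\tag{27}
\]

in particular, we get

\[
\begin{aligned}
E[\varepsilon_{1,t+1} \mid \underline{\tilde{x}_t}] &= 0 \\
E[\varepsilon_{2,t+1} \mid \underline{\tilde{x}_t}] &= 0 \\
\mathrm{Cov}(\varepsilon_{1,t+1}, \varepsilon_{2,t+1} \mid \underline{\tilde{x}_t})
  &= E(\varepsilon_{1,t+1}\varepsilon_{2,t+1} \mid \underline{\tilde{x}_t}) \\
  &= E\left[\varepsilon_{2,t+1}\,
       E(\varepsilon_{1,t+1} \mid x_{2,t+1}, \underline{\tilde{x}_t})
       \mid \underline{\tilde{x}_t}\right] \\
  &= 0 .
\end{aligned}
\tag{28}
\]

So, the error terms are uncorrelated, conditionally heteroscedastic
martingale differences. In particular, in the stationary case,
$\varepsilon_{1,t}$ and $\varepsilon_{2,t}$ are uncorrelated weak white
noises and (26) is a weak recursive VAR(p) representation of the process
$\tilde{x}_t$.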
    In the rest of the paper we will consider two important particular cases.




a) Normal VAR(p) or VARN(p) processes
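In this case the conditional distributions defined by (20) are Gaussian, with affine expectations and fixed variances. In other words:
\[
\begin{aligned}
a_1(u_1) &= \rho_1 u_1 , & b_1(u_1) &= \nu_1 u_1 + \frac{\sigma_1^2}{2} u_1^2 , \\
a_2(u_2) &= \rho_2 u_2 , & b_2(u_2) &= \nu_2 u_2 + \frac{\sigma_2^2}{2} u_2^2 .
\end{aligned}
\tag{29}
\]
Using the notations $\varphi_o = \rho_1\beta_o$, $\varphi_{11} = \rho_1\beta_{11}$, $\varphi_{12} = \rho_1\beta_{12}$, $\varphi_{21} = \rho_2\beta_{21}$, $\varphi_{22} = \rho_2\beta_{22}$, we have the following strong recursive VAR(p) representation for the process $\tilde{x}_t = (x_{1,t}, x_{2,t})'$:
\[
\begin{cases}
x_{1,t+1} = \nu_1 + \varphi_o x_{2,t+1} + \varphi_{11}' X_{1t} + \varphi_{12}' X_{2t} + \sigma_1 \eta_{1,t+1} \\
x_{2,t+1} = \nu_2 + \varphi_{21}' X_{1t} + \varphi_{22}' X_{2t} + \sigma_2 \eta_{2,t+1} ,
\end{cases}
\tag{30}
\]
where $\eta_t = (\eta_{1,t}, \eta_{2,t})'$ is a bivariate Gaussian white noise distributed as $N(0, I_2)$, where $I_2$ denotes the $(2 \times 2)$ identity matrix.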
b) Gamma VAR(p) or VARG(p) processes
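In this case we have:
\[
\begin{aligned}
a_1(u_1) &= \frac{\rho_1 u_1}{1 - u_1\mu_1} , & b_1(u_1) &= -\nu_1 \log(1 - u_1\mu_1) , \\
a_2(u_2) &= \frac{\rho_2 u_2}{1 - u_2\mu_2} , & b_2(u_2) &= -\nu_2 \log(1 - u_2\mu_2) ,
\end{aligned}
\tag{31}
\]
and the process $\tilde{x}_t = (x_{1,t}, x_{2,t})'$ has the following weak VAR(p) representation (using the same notation as above, and where all the parameters are positive):
\[
\begin{cases}
x_{1,t+1} = \nu_1\mu_1 + \varphi_o x_{2,t+1} + \varphi_{11}' X_{1t} + \varphi_{12}' X_{2t} + \xi_{1,t+1} \\
x_{2,t+1} = \nu_2\mu_2 + \varphi_{21}' X_{1t} + \varphi_{22}' X_{2t} + \xi_{2,t+1} ,
\end{cases}
\tag{32}
\]
where $\xi_{1,t}$ and $\xi_{2,t}$ are uncorrelated, conditionally heteroscedastic martingale differences. The conditional variances of $\xi_{1,t+1}$ and $\xi_{2,t+1}$ are given by:
\[
\begin{aligned}
V[\xi_{1,t+1} \mid \tilde{x}_t] &= \nu_1\mu_1^2 + 2\mu_1 [\varphi_o(\nu_2\mu_2 + \varphi_{21}' X_{1t} + \varphi_{22}' X_{2t}) + \varphi_{11}' X_{1t} + \varphi_{12}' X_{2t}] \\
V[\xi_{2,t+1} \mid \tilde{x}_t] &= \nu_2\mu_2^2 + 2\mu_2 (\varphi_{21}' X_{1t} + \varphi_{22}' X_{2t}) .
\end{aligned}
\tag{33}
\]
It is important to stress that the components of this VARG(p) process are positive.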
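The positivity and the variance formulas (33) can be illustrated numerically. The following is a minimal sketch, assuming the usual Poisson-gamma mixture representation of autoregressive gamma processes (an assumption, not stated in this section): for an intensity m >= 0, draw U ~ Poisson(m/mu) and then x = mu * Gamma(nu + U, 1), which has conditional mean nu*mu + m and conditional variance nu*mu^2 + 2*mu*m. The function names `simulate_varg` and `cond_var` and all parameter values are hypothetical.

```python
import numpy as np

def simulate_varg(nu, mu, phi_o, phi11, phi12, phi21, phi22, n_steps, seed=0):
    """Simulate the recursive system (32); nu = (nu1, nu2), mu = (mu1, mu2).

    Assumption: each component is drawn as a Poisson-gamma mixture.
    """
    rng = np.random.default_rng(seed)
    p = len(phi11)
    x1 = np.ones(p + n_steps)  # arbitrary positive starting values
    x2 = np.ones(p + n_steps)
    for t in range(p - 1, p + n_steps - 1):
        X1 = x1[t - p + 1:t + 1][::-1]  # (x_{1,t}, ..., x_{1,t+1-p})
        X2 = x2[t - p + 1:t + 1][::-1]
        m2 = phi21 @ X1 + phi22 @ X2
        x2[t + 1] = mu[1] * rng.gamma(nu[1] + rng.poisson(m2 / mu[1]))
        m1 = phi_o * x2[t + 1] + phi11 @ X1 + phi12 @ X2
        x1[t + 1] = mu[0] * rng.gamma(nu[0] + rng.poisson(m1 / mu[0]))
    return x1[p:], x2[p:]

def cond_var(nu, mu, phi_o, phi11, phi12, phi21, phi22, X1, X2):
    """Conditional variances of xi_{1,t+1} and xi_{2,t+1} as in (33)."""
    m2 = phi21 @ X1 + phi22 @ X2
    v1 = nu[0] * mu[0] ** 2 + 2 * mu[0] * (
        phi_o * (nu[1] * mu[1] + m2) + phi11 @ X1 + phi12 @ X2)
    v2 = nu[1] * mu[1] ** 2 + 2 * mu[1] * m2
    return v1, v2

# Hypothetical parameter values with p = 1.
nu, mu = (1.0, 2.0), (0.5, 0.25)
phi_o = 0.2
phi11, phi12 = np.array([0.3]), np.array([0.1])
phi21, phi22 = np.array([0.05]), np.array([0.4])
x1_path, x2_path = simulate_varg(nu, mu, phi_o, phi11, phi12, phi21, phi22, 200)
v1, v2 = cond_var(nu, mu, phi_o, phi11, phi12, phi21, phi22,
                  np.array([1.0]), np.array([2.0]))
```

Because every draw is a scaled gamma variate with strictly positive shape, the simulated paths stay positive without any truncation.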

2.8    Switching Multivariate Index-Car processes
Switching regimes can be introduced in a multivariate Index-Car(p) model using a method extending the one retained in the univariate case. If we assume that the functions $b_1(u_1)$, $b_2(u_2)$ appearing in definition 3 can be written, respectively, as $\tilde{b}_1(u_1)'\lambda_1$ and $\tilde{b}_2(u_2)'\lambda_2$, and if we replace $\lambda_1$ and $\lambda_2$, respectively, by $\Lambda_1 Z_t$ and $\Lambda_2 Z_t$, we obtain the following conditional Laplace transform for the distribution of $(x_{1,t+1}, x_{2,t+1}, z_{t+1})$ given $(x_{1,t}, x_{2,t}, z_t)$:

\[
\begin{aligned}
& E[\exp(u_1 x_{1,t+1} + u_2 x_{2,t+1} + v' z_{t+1}) \mid x_{1,t}, x_{2,t}, z_t] \\
&\quad = \exp\{ [a_1(u_1)\beta_{11} + a_2(u_2 + a_1(u_1)\beta_o)\beta_{21}]' X_{1t} \\
&\qquad + [a_1(u_1)\beta_{12} + a_2(u_2 + a_1(u_1)\beta_o)\beta_{22}]' X_{2t} \\
&\qquad + [\, (e_1 \otimes a_z(v,\pi))' + \tilde{b}_1(u_1)'\Lambda_1 + \tilde{b}_2(u_2 + a_1(u_1)\beta_o)'\Lambda_2 \,] Z_t \} ,
\end{aligned}
\tag{34}
\]
where $a_z(v, \pi)$ is given in proposition 1. So we obtain a multivariate Car(p+1) process.
Proposition 4 : The Laplace transform of $(x_{1,t+1}, x_{2,t+1}, z_{t+1})$, conditional on $(x_{1,t}, x_{2,t}, z_t)$, has the form given in (34) and the process $(x_{1,t}, x_{2,t}, z_t)$ is Car(p + 1).

2.9    Examples of Switching Multivariate Index-Car processes
a. Gaussian case
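    Taking
\[
\begin{aligned}
a_1(u_1) &= \rho_1 u_1 , & b_1(u_1) &= \nu_1 u_1 + \frac{\sigma_1^2}{2} u_1^2 , & \tilde{b}_1(u_1) &= \left(u_1 , \frac{u_1^2}{2}\right)' , \\
a_2(u_2) &= \rho_2 u_2 , & b_2(u_2) &= \nu_2 u_2 + \frac{\sigma_2^2}{2} u_2^2 , & \tilde{b}_2(u_2) &= \left(u_2 , \frac{u_2^2}{2}\right)' ,
\end{aligned}
\]
\[
\Lambda_1 = \begin{pmatrix} \lambda_{11}' \\ \lambda_{12}' \end{pmatrix} , \qquad
\Lambda_2 = \begin{pmatrix} \lambda_{21}' \\ \lambda_{22}' \end{pmatrix} ,
\]
and using the notations $\varphi_o = \rho_1\beta_o$, $\varphi_{11} = \rho_1\beta_{11}$, $\varphi_{12} = \rho_1\beta_{12}$, $\varphi_{21} = \rho_2\beta_{21}$, $\varphi_{22} = \rho_2\beta_{22}$, we obtain the Switching VARN(p) model:
\[
\begin{cases}
x_{1,t+1} = \lambda_{11}' Z_t + \varphi_o x_{2,t+1} + \varphi_{11}' X_{1t} + \varphi_{12}' X_{2t} + (\lambda_{12}' Z_t)^{1/2} \eta_{1,t+1} \\
x_{2,t+1} = \lambda_{21}' Z_t + \varphi_{21}' X_{1t} + \varphi_{22}' X_{2t} + (\lambda_{22}' Z_t)^{1/2} \eta_{2,t+1} ,
\end{cases}
\tag{35}
\]
where $\eta_t = (\eta_{1,t}, \eta_{2,t})'$ is a Gaussian white noise distributed as $N(0, I_2)$, $Z_t = (z_t', \ldots, z_{t-p}')'$, and where $z_t$ is a homogeneous J-state Markov chain with transition probability $\pi(e_i, e_j)$. Note that (35) can also be written as:
\[
\begin{cases}
x_{1,t+1} = \tilde{\lambda}_{11}' Z_t + \tilde{\varphi}_{11}' X_{1t} + \tilde{\varphi}_{12}' X_{2t} + \varphi_o (\lambda_{22}' Z_t)^{1/2} \eta_{2,t+1} + (\lambda_{12}' Z_t)^{1/2} \eta_{1,t+1} \\
x_{2,t+1} = \lambda_{21}' Z_t + \varphi_{21}' X_{1t} + \varphi_{22}' X_{2t} + (\lambda_{22}' Z_t)^{1/2} \eta_{2,t+1} ,
\end{cases}
\tag{36}
\]
with $\tilde{\lambda}_{11} = \lambda_{11} + \varphi_o \lambda_{21}$, $\tilde{\varphi}_{11} = \varphi_{11} + \varphi_o \varphi_{21}$, $\tilde{\varphi}_{12} = \varphi_{12} + \varphi_o \varphi_{22}$ or, with obvious notations,
\[
\tilde{x}_{t+1} = \tilde{\lambda} Z_t + \tilde{\Phi} \tilde{X}_t +
\begin{pmatrix}
(\lambda_{12}' Z_t)^{1/2} & \varphi_o (\lambda_{22}' Z_t)^{1/2} \\
0 & (\lambda_{22}' Z_t)^{1/2}
\end{pmatrix} \eta_{t+1} .
\tag{37}
\]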
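The equivalence between the recursive form (35) and the canonical form (37) can be checked numerically. The sketch below is illustrative only: the dictionary `par`, the function names and the parameter values are hypothetical, with J = 2 regimes and p = 1 lag, and $\tilde{\lambda}$ and $\tilde{\Phi}$ are stored as matrices whose rows are the coefficient vectors of each equation.

```python
import numpy as np

def step_recursive(par, Z, X1, X2, eta):
    """One step of the recursive form (35)."""
    x2 = (par["lam21"] @ Z + par["phi21"] @ X1 + par["phi22"] @ X2
          + np.sqrt(par["lam22"] @ Z) * eta[1])
    x1 = (par["lam11"] @ Z + par["phi_o"] * x2 + par["phi11"] @ X1
          + par["phi12"] @ X2 + np.sqrt(par["lam12"] @ Z) * eta[0])
    return np.array([x1, x2])

def step_canonical(par, Z, X1, X2, eta):
    """One step of the canonical form (37)."""
    po = par["phi_o"]
    lam_tilde = np.vstack([par["lam11"] + po * par["lam21"], par["lam21"]])
    Phi_tilde = np.vstack([
        np.concatenate([par["phi11"] + po * par["phi21"],
                        par["phi12"] + po * par["phi22"]]),
        np.concatenate([par["phi21"], par["phi22"]]),
    ])
    s1 = np.sqrt(par["lam12"] @ Z)
    s2 = np.sqrt(par["lam22"] @ Z)
    S = np.array([[s1, po * s2], [0.0, s2]])
    return lam_tilde @ Z + Phi_tilde @ np.concatenate([X1, X2]) + S @ eta

# Hypothetical parameters; Z_t stacks z_t = e_1 and z_{t-1} = e_2.
par = {
    "phi_o": 0.5,
    "lam11": np.array([0.1, 0.2, 0.0, 0.0]),
    "lam12": np.array([0.04, 0.09, 0.0, 0.0]),
    "lam21": np.array([0.3, 0.1, 0.05, 0.0]),
    "lam22": np.array([0.01, 0.16, 0.0, 0.0]),
    "phi11": np.array([0.6]), "phi12": np.array([0.1]),
    "phi21": np.array([0.2]), "phi22": np.array([0.7]),
}
Z = np.array([1.0, 0.0, 0.0, 1.0])
X1, X2 = np.array([1.5]), np.array([-0.4])
eta = np.array([0.3, -1.2])
rec = step_recursive(par, Z, X1, X2, eta)
can = step_canonical(par, Z, X1, X2, eta)
```

Both functions return the same vector for any shock `eta`, which is the substitution of the second equation into the first.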
b. Gamma case
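   If we take
\[
\begin{aligned}
a_1(u_1) &= \frac{\rho_1 u_1}{1 - u_1\mu_1} , & b_1(u_1) &= -\nu_1 \log(1 - u_1\mu_1) , & \tilde{b}_1(u_1) &= -\log(1 - u_1\mu_1) , \\
a_2(u_2) &= \frac{\rho_2 u_2}{1 - u_2\mu_2} , & b_2(u_2) &= -\nu_2 \log(1 - u_2\mu_2) , & \tilde{b}_2(u_2) &= -\log(1 - u_2\mu_2) ,
\end{aligned}
\]
we obtain the positive Switching VARG(p) model
\[
\begin{cases}
x_{1,t+1} = \mu_1 \Lambda_1 Z_t + \varphi_o x_{2,t+1} + \varphi_{11}' X_{1t} + \varphi_{12}' X_{2t} + \xi_{1,t+1} \\
x_{2,t+1} = \mu_2 \Lambda_2 Z_t + \varphi_{21}' X_{1t} + \varphi_{22}' X_{2t} + \xi_{2,t+1} ,
\end{cases}
\tag{38}
\]
where $\xi_{1,t}$ and $\xi_{2,t}$ are uncorrelated, conditionally heteroscedastic martingale differences, the conditional variances being respectively given by:
\[
\begin{aligned}
V[\xi_{1,t+1} \mid \tilde{x}_t] &= \Lambda_1 Z_t\, \mu_1^2 + 2\mu_1 [\varphi_o(\Lambda_2 Z_t\, \mu_2 + \varphi_{21}' X_{1t} + \varphi_{22}' X_{2t}) + \varphi_{11}' X_{1t} + \varphi_{12}' X_{2t}] \\
V[\xi_{2,t+1} \mid \tilde{x}_t] &= \Lambda_2 Z_t\, \mu_2^2 + 2\mu_2 (\varphi_{21}' X_{1t} + \varphi_{22}' X_{2t}) .
\end{aligned}
\tag{39}
\]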


3     SWITCHING AUTOREGRESSIVE NORMAL (SARN) TERM STRUCTURE MODEL OF ORDER p
We first consider the case of a univariate exogenous factor; the endogenous case and the multivariate case will be discussed in sections 3.7 and 3.8, respectively.

3.1     The historical dynamics
The first set of assumptions of a SARN(p) Term Structure model deals with the historical dynamics. We assume that the historical dynamics of the exogenous factor $x_t$ is given by

\[
x_{t+1} = \nu(Z_t) + \varphi_1(Z_t) x_t + \ldots + \varphi_p(Z_t) x_{t+1-p} + \sigma(Z_t)\varepsilon_{t+1} ,
\tag{40}
\]

where $\varepsilon_{t+1}$ is a Gaussian white noise with $N(0, 1)$ distribution, $Z_t = (z_t', \ldots, z_{t-p}')'$, and $z_t$ is a J-state non-homogeneous Markov chain such that $P(z_{t+1} = e_j \mid z_t = e_i ; x_t) = \pi(e_i, e_j ; X_t)$ ($e_i$ is the $i$th column of the identity matrix $I_J$). Equation (40) can also be written

\[
x_{t+1} = \nu(Z_t) + \varphi(Z_t)' X_t + \sigma(Z_t)\varepsilon_{t+1} ,
\tag{41}
\]

where $X_t = (x_t, \ldots, x_{t+1-p})'$, $\varphi(Z_t) = (\varphi_1(Z_t), \ldots, \varphi_p(Z_t))'$. This model can also be rewritten in the following vectorial form:

\[
X_{t+1} = \Phi(Z_t) X_t + [\nu(Z_t) + \sigma(Z_t)\varepsilon_{t+1}]\, e_1
\tag{42}
\]

where
\[
\Phi(Z_t) = \begin{pmatrix}
\varphi_1(Z_t) & \cdots & \cdots & \varphi_p(Z_t) \\
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & \cdots & 1 & 0
\end{pmatrix}
\]
is a $(p \times p)$-matrix, and where $e_1$ is the first column of the identity matrix $I_p$. Note that, since the coefficients $\varphi_i$ are allowed to depend on $Z_t$ and since the Markov chain $z_t$ may not be homogeneous, the dynamics of $(x_t, z_t)$ is not Car in general.
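The vectorial form (42) can be sketched in a few lines. This is an illustrative example only: the function names `companion` and `step` are hypothetical, and the arguments `nu`, `phi` and `sigma` stand for the values $\nu(Z_t)$, $\varphi(Z_t)$ and $\sigma(Z_t)$ already evaluated at the current regime path.

```python
import numpy as np

def companion(phi):
    """Build the (p x p) matrix Phi(Z_t) from the coefficient vector phi(Z_t)."""
    p = len(phi)
    Phi = np.zeros((p, p))
    Phi[0, :] = phi                # first row: phi_1, ..., phi_p
    Phi[1:, :-1] = np.eye(p - 1)   # sub-diagonal of ones shifts the lags down
    return Phi

def step(X, nu, phi, sigma, eps):
    """One step of (42): X_{t+1} = Phi X_t + [nu + sigma * eps] e_1."""
    e1 = np.zeros(len(phi))
    e1[0] = 1.0
    return companion(phi) @ X + (nu + sigma * eps) * e1

# Hypothetical values with p = 3.
X_next = step(np.array([1.0, 2.0, 3.0]), nu=0.3,
              phi=np.array([0.5, 0.2, 0.1]), sigma=2.0, eps=0.5)
```

The first entry of `X_next` reproduces (41), while the remaining entries are the lagged values shifted by one position.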




3.2   The Stochastic Discount Factor
The second element of a SARN(p) modeling is the SDF. We denote by $M_{t,t+1}$ the stochastic discount factor (SDF) between the dates $t$ and $t + 1$ and, in order to get time-varying risk premia, we specify it as an exponential affine function of the variables $(x_{t+1}, z_{t+1})$ but with coefficients depending on the information at time $t$. More precisely we assume that:

\[
M_{t,t+1} = \exp\left[ -c' X_t - d' Z_t + \Gamma(Z_t, X_t)\,\varepsilon_{t+1} - \tfrac{1}{2}\Gamma(Z_t, X_t)^2 - \delta(Z_t, X_t)' z_{t+1} \right] ,
\tag{43}
\]

where $\Gamma(Z_t, X_t) = \gamma(Z_t) + \tilde{\gamma}(Z_t)' X_t$. Observe that this specification extends to the multi-lag case the one proposed by Dai, Singleton and Yang (2005). It is well known that the existence of a positive stochastic discount factor is equivalent to the absence of arbitrage opportunity condition and that the price $p_t$ at $t$ of a payoff $W_{t+1}$ at $t + 1$ is given by:

\[
p_t = E[M_{t,t+1} W_{t+1} \mid I_t] = E_t[M_{t,t+1} W_{t+1}] ,
\]

where the information $I_t$, available to the investors at the date $t$, is given by $(x_t, z_t)$. More generally, the price $p_{t,h}$ at $t$ of an asset paying $W_{t+h}$ at $t + h$ is:
\[
p_{t,h} = E_t[M_{t,t+1} \cdot \ldots \cdot M_{t+h-1,t+h}\, W_{t+h}] .
\]
Using the absence of arbitrage assumption for the short-term interest rate between $t$ and $t + 1$, denoted by $r_{t+1}$ and known at $t$, we get:

\[
\exp(-r_{t+1}) = E_t(M_{t,t+1}) = \exp[-c' X_t - d' Z_t] \times \sum_{j=1}^{J} \pi(e_i, e_j ; X_t) \exp[-\delta(Z_t, X_t)' e_j] ,
\]

and assuming the normalization condition:
\[
\sum_{j=1}^{J} \pi(e_i, e_j ; X_t) \exp[-\delta(Z_t, X_t)' e_j] = 1 \quad \forall Z_t, X_t ,
\tag{44}
\]

we obtain:
\[
r_{t+1} = c' X_t + d' Z_t .
\tag{45}
\]




3.3   Risk premia
In this paper we will use the following definition of a risk premium.
Definition 4 : Let $p_t$ be the price of a given asset at time $t$. The risk premium of this asset between $t$ and $t + 1$ is $\omega_t = \log(E_t\, p_{t+1}) - \log p_t - r_{t+1}$.
   Using this definition we obtain interpretations of the $\Gamma$ and $\delta$ functions appearing in the SDF which generalize those obtained by Dai, Singleton and Yang (2005).
Proposition 5 : The risk premium between $t$ and $t+1$ of an asset providing the payoff $\exp(-\theta x_{t+1})$ at $t + 1$ is :
\[
\omega_t(\theta) = \theta\, \Gamma(Z_t, X_t)\, \sigma(Z_t) .
\tag{46}
\]
Therefore, $\theta$, $\Gamma(Z_t, X_t)$ and $\sigma(Z_t)$ can be seen, respectively, as a risk sensitivity of the asset, a risk price and a risk measure. [Proof : see Appendix 2.]
Proposition 6 : If we consider a digital asset providing one money unit at $t + 1$ if $z_{t+1} = e_j$, its risk premium between $t$ and $t + 1$ is given by :
\[
\omega_t = \delta_j(Z_t, X_t) ,
\tag{47}
\]
and the $j$th component of $\delta$ can be seen as the risk premium associated with the digital asset. [Proof : see Appendix 2.]
    We observe that, in general, the magnitude of the risk premium $\omega_t(\theta)$ does not depend only on the currently observed values $x_t$ and $z_t$: it reflects the present and past values of both factors, that is, it is a function of the larger information set represented by $X_t$ and $Z_t$.
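The formula of Proposition 5 can be verified directly from (40), (43) and (44). The lines below are a verification sketch only, assuming that $\varepsilon_{t+1}$ and $z_{t+1}$ are independent given the information at $t$; the paper's own proof is in Appendix 2 and may proceed differently. The shorthand $m_t$, $\Gamma_t$, $\sigma_t$ is introduced here for readability and is not the paper's notation.

```latex
% Shorthand (introduced here): m_t = \nu(Z_t) + \varphi(Z_t)'X_t,
% \Gamma_t = \Gamma(Z_t, X_t), \sigma_t = \sigma(Z_t).
\begin{aligned}
\log E_t\left[e^{-\theta x_{t+1}}\right]
  &= -\theta m_t + \tfrac{1}{2}\theta^2\sigma_t^2 , \\
\log p_t
  &= -r_{t+1} + \log E_t\left[e^{\Gamma_t\varepsilon_{t+1} - \frac{1}{2}\Gamma_t^2
       - \theta(m_t + \sigma_t\varepsilon_{t+1})}\right] \\
  &= -r_{t+1} - \theta m_t - \tfrac{1}{2}\Gamma_t^2
       + \tfrac{1}{2}(\Gamma_t - \theta\sigma_t)^2 \\
  &= -r_{t+1} - \theta m_t - \theta\,\Gamma_t\,\sigma_t
       + \tfrac{1}{2}\theta^2\sigma_t^2 , \\
\omega_t(\theta)
  &= \log E_t\left[e^{-\theta x_{t+1}}\right] - \log p_t - r_{t+1}
   = \theta\,\Gamma_t\,\sigma_t .
\end{aligned}
```

The factor $\exp(-r_{t+1})$ in the second line collects $\exp(-c'X_t - d'Z_t)$ and the regime term, which equals one under the normalization (44).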

3.4   Risk-Neutral dynamics
The assumptions on the historical dynamics and on the SDF imply a risk-neutral dynamics. The probability density function of the one-period conditional risk-neutral probability with respect to the corresponding historical probability is $\frac{M_{t,t+1}}{E_t(M_{t,t+1})} = \exp(r_{t+1}) M_{t,t+1}$. Note that, using $E_t^Q$ as the conditional expectation with respect to this risk-neutral distribution, the risk premium $\omega_t$ can be written $\log(E_t\, p_{t+1}) - \log(E_t^Q\, p_{t+1})$.
Proposition 7 : The risk-neutral dynamics of the process $(x_t, z_t)$ is given by:
\[
x_{t+1} \overset{Q}{=} \nu(Z_t) + \gamma(Z_t)\sigma(Z_t) + [\varphi(Z_t) + \tilde{\gamma}(Z_t)\sigma(Z_t)]' X_t + \sigma(Z_t)\xi_{t+1} ,
\tag{48}
\]
where $\overset{Q}{=}$ denotes the equality in distribution (associated with the probability $Q$), $\xi_{t+1}$ is (under $Q$) a Gaussian white noise with $N(0, 1)$ distribution, and where $Z_t = (z_t', \ldots, z_{t-p}')'$, $z_t$ being a Markov chain such that:

\[
Q(z_{t+1} = e_j \mid z_t ; x_t) = \pi(z_t, e_j ; X_t) \exp[-\delta(Z_t, X_t)' e_j] .
\]

Note that, from (44), these probabilities add to one. [Proof : see Appendix 3.]
    In order to get a generalized linear term structure we impose that the risk-neutral dynamics is switching regime Gaussian Car(p). Using (13), this imposes that the dynamics satisfies the following specification:
\[
x_{t+1} \overset{Q}{=} \nu^{*\prime} Z_t + \varphi^{*\prime} X_t + (\sigma^{*\prime} Z_t)\,\xi_{t+1} ,
\tag{49}
\]

where $Z_t = (z_t', \ldots, z_{t-p}')'$, with $z_t$ a J-state Markov chain such that

\[
Q(z_{t+1} = e_j \mid z_t = e_i) = \pi^*(e_i, e_j) .
\tag{50}
\]
    From proposition 7, this implies the following restrictions on the historical dynamics and on the SDF:

   i) $\sigma(Z_t) = \sigma^{*\prime} Z_t$ : the historical stochastic volatility must be linear in $Z_t$;

  ii)
      \[ \gamma(Z_t) = \frac{\nu^{*\prime} Z_t - \nu(Z_t)}{\sigma^{*\prime} Z_t} : \]
        for a given historical stochastic drift $\nu(Z_t)$ and stochastic volatility $\sigma^{*\prime} Z_t$, the coefficient $\gamma(Z_t)$ belongs to the previous family indexed by the free parameter vector $\nu^*$.

 iii)
      \[ \tilde{\gamma}(Z_t) = \frac{\varphi^* - \varphi(Z_t)}{\sigma^{*\prime} Z_t} : \]
        for a given historical stochastic slope parameter $\varphi(Z_t)$ and stochastic volatility $\sigma^{*\prime} Z_t$, the coefficient vector $\tilde{\gamma}(Z_t)$ belongs to the previous family indexed by the free parameter vector $\varphi^*$.

  iv)
      \[ \delta_j(X_t, Z_t) = \log\left[\frac{\pi(z_t, e_j ; X_t)}{\pi^*(z_t, e_j)}\right] : \]
        for a given historical transition matrix $\pi(z_t, e_j ; X_t)$, the coefficient $\delta_j(X_t, Z_t)$ depends on $Z_t$ through $z_t$ only and belongs to the previous family indexed by the entries $\pi^*(z_t, e_j)$ of a transition matrix.

Note that condition iv) implies that the risk premia coefficients $\delta_j$, $j \in \{1, \ldots, J\}$, cannot be all positive [or all negative] since this would imply $\pi(z_t, e_j ; X_t) > \pi^*(z_t, e_j)$, $\forall j$ [or $\pi(z_t, e_j ; X_t) < \pi^*(z_t, e_j)$, $\forall j$], which is impossible since $\sum_{j=1}^{J} \pi(z_t, e_j ; X_t) = \sum_{j=1}^{J} \pi^*(z_t, e_j) = 1$. Also note that condition iv) implies the normalization condition (44).
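Restrictions i) to iv) translate into a few arithmetic operations. The sketch below is illustrative: the function names `risk_prices` and `regime_premia` and all numerical values are hypothetical, with J = 2 and p = 1. It checks that the implied coefficients reproduce the risk-neutral drift and slope of (49), that the normalization (44) holds, and that the $\delta_j$ do not all share the same sign.

```python
import numpy as np

def risk_prices(Z, nu_hist, phi_hist, nu_star, phi_star, sigma_star):
    """Return sigma(Z_t), gamma(Z_t) and gamma_tilde(Z_t) from i), ii), iii)."""
    sig = sigma_star @ Z                       # i)  volatility linear in Z_t
    gamma = (nu_star @ Z - nu_hist) / sig      # ii)
    gamma_tilde = (phi_star - phi_hist) / sig  # iii)
    return sig, gamma, gamma_tilde

def regime_premia(pi_row, pi_star_row):
    """Return the vector of delta_j from iv), for the current regime z_t."""
    return np.log(pi_row / pi_star_row)

# Hypothetical values; Z_t stacks z_t = e_2 and z_{t-1} = e_1.
Z = np.array([0.0, 1.0, 1.0, 0.0])
nu_star = np.array([0.1, 0.2, 0.0, 0.0])
sigma_star = np.array([0.5, 0.8, 0.0, 0.0])
phi_star = np.array([0.9])
nu_hist, phi_hist = 0.3, np.array([0.7])   # nu(Z_t), phi(Z_t) at this Z_t

sig, gamma, gamma_tilde = risk_prices(Z, nu_hist, phi_hist,
                                      nu_star, phi_star, sigma_star)
pi_row = np.array([0.9, 0.1])        # historical pi(z_t, e_j; X_t)
pi_star_row = np.array([0.7, 0.3])   # risk-neutral pi*(z_t, e_j)
delta = regime_premia(pi_row, pi_star_row)
normalization = np.sum(pi_row * np.exp(-delta))
```

In this example the first regime is more likely under the historical probability than under the risk-neutral one, so its $\delta_j$ is positive and the other is negative.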

3.5     The Generalised Linear Term Structure
We have seen in the previous section that the risk-neutral dynamics is defined by relations (49), (50); relation (49) can be rewritten:
\[
X_{t+1} \overset{Q}{=} \Phi^* X_t + [\nu^{*\prime} Z_t + (\sigma^{*\prime} Z_t)\,\xi_{t+1}]\, e_1
\tag{51}
\]

where
\[
\Phi^* = \begin{pmatrix}
\varphi_1^* & \cdots & \cdots & \varphi_p^* \\
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & \cdots & 1 & 0
\end{pmatrix}
\quad \text{is a } (p \times p)\text{-matrix} ,
\]
\[
X_t = (x_t, \ldots, x_{t+1-p})' ,
\]
and where $e_1$ is the first column of the identity matrix $I_p$.
    Denoting by $B(t, h)$ the price at $t$ of a zero-coupon bond with residual maturity $h$, we have the following result.
Proposition 8 : In the univariate SARN(p) Term Structure model the price at date $t$ of the zero-coupon bond with residual maturity $h$ is :
\[
B(t, h) = \exp(C_h' X_t + D_h' Z_t) , \quad \text{for } h \geq 1 ,
\tag{52}
\]
where the vectors $C_h$ and $D_h$ satisfy the following recursive equations :
\[
\begin{cases}
C_h = \Phi^{*\prime} C_{h-1} - c \\
D_h = -d + C_{1,h-1}\,\nu^* + \tfrac{1}{2} C_{1,h-1}^2\, \sigma^{*2} + \tilde{D}_{h-1} + F(D_{1,h-1}) ,
\end{cases}
\tag{53}
\]
where $C_{1,h-1}$ denotes the first component of the $p$-dimensional vector $C_{h-1}$, $D_{1,h-1}$ and $D_{2,h-1}$ are, respectively, the first $J$-dimensional component and the remaining $(pJ)$-dimensional component of $D_{h-1}$, i.e. $D_{h-1} = (D_{1,h-1}', D_{2,h-1}')'$, $\tilde{D}_{h-1} = (D_{2,h-1}', 0')'$, and where $F(D_{1,h-1}) = e_1 \otimes a_z(D_{1,h-1}, \pi^*)$, $e_1$ being the vector $(1, 0, \ldots, 0)'$ of size $(p + 1)$ and $a_z$ the $J$-vector given in proposition 1; $\sigma^{*2}$ is the vector whose components are the squares of the entries of $\sigma^*$. The initial conditions are $C_0 = 0$, $D_0 = 0$ (or $C_1 = -c$, $D_1 = -d$). [Proof : see Appendix 4.]
For clarity we give again the expression of $a_z(D_{1,h-1}, \pi^*)$ :

\[
a_z(D_{1,h-1}, \pi^*) = \left[ \log \sum_{j=1}^{J} \exp(D_{1,h-1}' e_j)\, \pi^*(e_1, e_j) , \; \ldots , \; \log \sum_{j=1}^{J} \exp(D_{1,h-1}' e_j)\, \pi^*(e_J, e_j) \right]' .
\]

   From proposition 8 we see that the yields to maturity are:
\[
R(t, h) = -\frac{1}{h} \log B(t, h) = -\frac{C_h'}{h} X_t - \frac{D_h'}{h} Z_t , \quad h \geq 1 .
\tag{54}
\]
So, they are linear functions of the $p$-dimensional vector $X_t$ and of the $(p + 1)J$-dimensional vector $Z_t$. This means that the term structure at date $t$ depends on the present and past values of $x_t$ and $z_t$, and not just on their values at $t$. Moreover, we observe that there is, in general, instantaneous causality between $x_t$ and $z_t$.
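The recursion of proposition 8 and the yield formula (54) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function names are hypothetical, $\sigma^{*2}$ is taken entrywise as in the text, and the example uses J = 2, p = 1 with made-up parameter values in which $\sigma^*$ loads on the current regime only.

```python
import numpy as np

def a_z(D1, pi_star):
    """J-vector whose i-th entry is log sum_j exp(D1_j) * pi_star[i, j]."""
    return np.log(pi_star @ np.exp(D1))

def bond_loadings(c, d, phi_star, nu_star, sigma_star, pi_star, H):
    """Return [(C_1, D_1), ..., (C_H, D_H)] from the recursion (53)."""
    p, J = len(phi_star), pi_star.shape[0]
    Phi = np.zeros((p, p))
    Phi[0, :] = phi_star
    Phi[1:, :-1] = np.eye(p - 1)
    C, D = np.zeros(p), np.zeros((p + 1) * J)   # C_0 = 0, D_0 = 0
    out = []
    for _ in range(H):
        C1 = C[0]                                        # C_{1,h-1}
        D_tilde = np.concatenate([D[J:], np.zeros(J)])   # (D_{2,h-1}', 0')'
        F = np.concatenate([a_z(D[:J], pi_star), np.zeros(p * J)])
        D = -d + C1 * nu_star + 0.5 * C1**2 * sigma_star**2 + D_tilde + F
        C = Phi.T @ C - c
        out.append((C.copy(), D.copy()))
    return out

def yield_to_maturity(C_h, D_h, X, Z, h):
    """Yield R(t, h) from (54)."""
    return -(C_h @ X + D_h @ Z) / h

# Hypothetical parameters.
c = np.array([1.0])
d = np.array([0.01, 0.02, 0.0, 0.005])
phi_star = np.array([0.9])
nu_star = np.array([0.001, 0.002, 0.0, 0.0005])
sigma_star = np.array([0.01, 0.02, 0.0, 0.0])
pi_star = np.array([[0.95, 0.05], [0.10, 0.90]])
loadings = bond_loadings(c, d, phi_star, nu_star, sigma_star, pi_star, H=2)
```

For h = 1 the recursion returns $C_1 = -c$ and $D_1 = -d$, so that the one-period yield equals the short rate (45).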

3.6   The Switching VARMA yield curve process
The result presented in Proposition 8 describes, conditional on $X_t$ and $Z_t$, the yields as a deterministic function of the time to maturity $h$, for a fixed date $t$. Nevertheless, in many financial and economic contexts one also needs, for instance, to study the effects of a shock in the state variables on the yield curve at different future times and for several maturities (e.g. a Central Bank that needs to set a monetary policy). This means that we are interested in the dynamics of the process $R_{\mathcal{H}} = [\, R(t, h), 0 \leq t < T, h \in \mathcal{H} \,]$, for a given set of residual times to maturity $\mathcal{H} = (1, \ldots, H)$.
    If we consider a fixed $h$, the process $R = [\, R(t, h), 0 \leq t < T \,]$ can be described by the following proposition.
Proposition 9 : For a fixed time to maturity $h$, the process $R = [\, R(t, h), 0 \leq t < T \,]$ is, under the historical probability, a Switching ARMA(p, p − 1) process of the following type :

\[
\Psi(L, Z_t)\, R(t + 1, h) = D_h(L)\, \Psi(L, Z_t)\, z_{t+1} + C_h(L)\, \nu(Z_t) + C_h(L)[(\sigma^{*\prime} Z_t)\, \varepsilon_{t+1}] ,
\tag{55}
\]

where
\[
\begin{aligned}
C_h(L) &= -\frac{1}{h}\left(C_{1,h} + C_{2,h} L + \ldots + C_{p,h} L^{p-1}\right) \\
D_h(L) &= -\frac{1}{h}\left(D_{1,h}' + D_{2,h}' L + \ldots + D_{p+1,h}' L^{p}\right) \\
\Psi(L, Z_t) &= 1 - \varphi_1(Z_t) L - \ldots - \varphi_p(Z_t) L^p ,
\end{aligned}
\]

are lag polynomials in the lag operator $L$, and where the AR polynomial $\Psi(L, Z_t)$ applies to $t$. [Proof : see Appendix 5.]
Proposition 10 : For a given set of residual times to maturity $\mathcal{H} = (1, \ldots, H)$, the stochastic evolution of the yield curve process $R_{\mathcal{H}} = [\, R(t, h), 0 \leq t < T, h \in \mathcal{H} \,]$ takes the following particular Switching H-variate VARMA(p, p − 1) representation:
\[
\Psi(L, Z_t) \begin{pmatrix} R(t+1, 1) \\ R(t+1, 2) \\ \vdots \\ R(t+1, H) \end{pmatrix}
= \begin{pmatrix} C_1(L) \\ C_2(L) \\ \vdots \\ C_H(L) \end{pmatrix} [(\sigma^{*\prime} Z_t)\, \varepsilon_{t+1}]
+ \begin{pmatrix} D_1(L) \\ D_2(L) \\ \vdots \\ D_H(L) \end{pmatrix} \Psi(L, Z_t)\, z_{t+1}
+ \begin{pmatrix} C_1(L) \\ C_2(L) \\ \vdots \\ C_H(L) \end{pmatrix} \nu(Z_t) .
\tag{56}
\]
Similar results are easily obtained in the risk-neutral world.


3.7     Endogenous case
In the previous sections the factor $x_t$ was exogenous. It is often assumed, in term structure models, that the factor $x_t$ is the short rate process $r_{t+1}$. In this case the previous results remain valid; the only modification comes from the absence of arbitrage opportunity condition for $r_{t+1}$, which imposes:
\[
c = e_1 , \quad d = 0 ,
\tag{57}
\]
with $e_1$ the first column of the identity matrix $I_p$; consequently, the initial conditions in the recursive equations of proposition 8 become:
\[
C_1 = -e_1 , \quad D_1 = 0 .
\tag{58}
\]
Moreover, the Switching ARMA(p, p − 1) representation (55), or its analogue in the risk-neutral world, could be used to analyse how a shock on $\varepsilon_t$, i.e. on $r_{t+1} = R(t, 1)$, is propagated on the surface $[\, R(t + \tau, h), \tau \in \mathcal{T}, h \in \mathcal{H} \,]$, where $\mathcal{T} = \{0, \ldots, T - t - 1\}$ and $\mathcal{H} = (1, \ldots, H)$ (for instance when the process $z_t$ is exogenous).

3.8     Multi-Factor generalization : the SVARN(p) Term
        Structure model
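For the sake of notational simplicity we consider the two-factor case, but an extension to more than two factors is straightforward. The historical dynamics of $\tilde{x}_t = (x_{1,t}, x_{2,t})'$ is a bivariate SVARN(p) model given by:
\[
\begin{cases}
x_{1,t+1} = \nu_1(Z_t) + \varphi_o(Z_t)\, x_{2,t+1} + \varphi_{11}(Z_t)' X_{1t} + \varphi_{12}(Z_t)' X_{2t} + \sigma_1(Z_t)\, \varepsilon_{1,t+1} \\
x_{2,t+1} = \nu_2(Z_t) + \varphi_{21}(Z_t)' X_{1t} + \varphi_{22}(Z_t)' X_{2t} + \sigma_2(Z_t)\, \varepsilon_{2,t+1} ,
\end{cases}
\tag{59}
\]
where $\varepsilon_{1,t}$ and $\varepsilon_{2,t}$ are independent standard normal white noises, $X_{1t} = (x_{1,t}, \ldots, x_{1,t+1-p})'$, $X_{2t} = (x_{2,t}, \ldots, x_{2,t+1-p})'$, $Z_t = (z_t', \ldots, z_{t-p}')'$, with $z_t$ a J-state non-homogeneous Markov chain such that $P(z_{t+1} = e_j \mid z_t = e_i ; \tilde{x}_t) = \pi(e_i, e_j ; \tilde{X}_t)$, and where $\tilde{X}_t = (X_{1t}', X_{2t}')'$. The recursive form (59) is equivalent to the canonical form :
\[
\begin{cases}
x_{1,t+1} = \tilde{\nu}_1(Z_t) + \tilde{\varphi}_{11}(Z_t)' X_{1t} + \tilde{\varphi}_{12}(Z_t)' X_{2t} + \sigma_1(Z_t)\, \varepsilon_{1,t+1} + \varphi_o(Z_t)\, \sigma_2(Z_t)\, \varepsilon_{2,t+1} \\
x_{2,t+1} = \nu_2(Z_t) + \varphi_{21}(Z_t)' X_{1t} + \varphi_{22}(Z_t)' X_{2t} + \sigma_2(Z_t)\, \varepsilon_{2,t+1} ,
\end{cases}
\tag{60}
\]
where $\tilde{\nu}_1 = \nu_1 + \varphi_o \nu_2$, $\tilde{\varphi}_{11} = \varphi_{11} + \varphi_o \varphi_{21}$, $\tilde{\varphi}_{12} = \varphi_{12} + \varphi_o \varphi_{22}$ or, with obvious notations:
\[
\tilde{x}_{t+1} = \tilde{\nu}(Z_t) + \tilde{\Phi}(Z_t)\, \tilde{X}_t + S(Z_t)\, \varepsilon_{t+1} ,
\tag{61}
\]
where
\[
S(Z_t) = \begin{pmatrix} \sigma_1(Z_t) & \varphi_o(Z_t)\, \sigma_2(Z_t) \\ 0 & \sigma_2(Z_t) \end{pmatrix} .
\]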
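Using the notation
\[
\Gamma(Z_t, \tilde{X}_t) = [\Gamma_1(Z_t, \tilde{X}_t), \Gamma_2(Z_t, \tilde{X}_t)]' ,
\]
where $\Gamma_i(Z_t, \tilde{X}_t) = \gamma_i(Z_t) + \tilde{\gamma}_i(Z_t)' \tilde{X}_t$, $i \in \{1, 2\}$, and $\Gamma(Z_t, \tilde{X}_t) = \gamma(Z_t) + \tilde{\Gamma}(Z_t, \tilde{X}_t)\, \tilde{X}_t$, with $\gamma(Z_t) = [\gamma_1(Z_t), \gamma_2(Z_t)]'$, $\tilde{\Gamma}(Z_t, \tilde{X}_t) = [\tilde{\gamma}_1(Z_t), \tilde{\gamma}_2(Z_t)]'$, the SDF is defined as :
\[
M_{t,t+1} = \exp\left[ -c' \tilde{X}_t - d' Z_t + \Gamma(Z_t, \tilde{X}_t)' \varepsilon_{t+1} - \tfrac{1}{2} \Gamma(Z_t, \tilde{X}_t)' \Gamma(Z_t, \tilde{X}_t) - \delta(Z_t, \tilde{X}_t)' z_{t+1} \right] .
\tag{62}
\]

Assuming the normalization condition (44) and the absence of arbitrage opportunity for $r_{t+1}$ we get:
\[
r_{t+1} = c' \tilde{X}_t + d' Z_t .
\tag{63}
\]
It is also easily seen that the risk premium for an asset providing the payoff $\exp(-\theta' \tilde{x}_{t+1})$ at $t + 1$ is $\omega(\theta) = \theta' S(Z_t)\, \Gamma(Z_t, \tilde{X}_t)$ and that the risk premium associated with the digital payoff $I_{(e_j)}(z_{t+1})$ is unchanged.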
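The bivariate risk premium $\omega(\theta) = \theta' S(Z_t) \Gamma(Z_t, \tilde{X}_t)$ can be cross-checked against Definition 4 using Gaussian log-moment generating functions. The sketch below is illustrative: the function names and numerical values are hypothetical, and the arguments are the values of $\sigma_1(Z_t)$, $\sigma_2(Z_t)$, $\varphi_o(Z_t)$ and $\Gamma(Z_t, \tilde{X}_t)$ at a given date.

```python
import numpy as np

def S_matrix(sigma1, sigma2, phi_o):
    """Matrix S(Z_t) of the canonical form (61)."""
    return np.array([[sigma1, phi_o * sigma2], [0.0, sigma2]])

def risk_premium(theta, S, Gamma):
    """Closed form omega(theta) = theta' S Gamma."""
    return theta @ S @ Gamma

def risk_premium_from_moments(theta, S, Gamma):
    """Definition 4 evaluated with Gaussian log-mgfs.

    The payoff exponent is -theta'x = constant + a'eps with a = -S'theta.
    Constants and the short rate cancel in the difference.
    """
    a = -S.T @ theta
    log_E_payoff = 0.5 * a @ a
    log_E_sdf_payoff = 0.5 * (Gamma + a) @ (Gamma + a) - 0.5 * Gamma @ Gamma
    return log_E_payoff - log_E_sdf_payoff

# Hypothetical values.
S = S_matrix(sigma1=0.02, sigma2=0.05, phi_o=0.4)
theta = np.array([1.0, -0.5])
Gamma = np.array([-0.3, 0.8])
omega_closed = risk_premium(theta, S, Gamma)
omega_moments = risk_premium_from_moments(theta, S, Gamma)
```

The two computations agree, which reflects that the SDF shifts the mean of $\varepsilon_{t+1}$ by $\Gamma(Z_t, \tilde{X}_t)$ without changing its covariance.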
Proposition 11 : The risk-neutral dynamics of the process $(\tilde{x}_t, z_t)$ is given by:
\[
\tilde{x}_{t+1} \overset{Q}{=} \tilde{\nu}(Z_t) + S(Z_t)\gamma(Z_t) + [\tilde{\Phi}(Z_t) + S(Z_t)\tilde{\Gamma}(Z_t, \tilde{X}_t)]\, \tilde{X}_t + S(Z_t)\, \xi_{t+1} ,
\tag{64}
\]
where $\overset{Q}{=}$ denotes the equality in distribution (associated with the probability $Q$), $\xi_{t+1}$ is (under $Q$) a bivariate Gaussian white noise with $N(0, I_2)$ distribution, and where $Z_t = (z_t', \ldots, z_{t-p}')'$, with $z_t$ a Markov chain such that:

\[
Q(z_{t+1} = e_j \mid z_t ; \tilde{x}_t) = \pi(z_t, e_j ; \tilde{X}_t) \exp[-\delta(Z_t, \tilde{X}_t)' e_j] .
\]

[Proof : see Appendix 6.]
If we want to obtain a Switching bivariate Car process in the risk-neutral world, we must have, using (37) :

   i)
      \[ \sigma_1(Z_t) = \sigma_1^{*\prime} Z_t , \quad \sigma_2(Z_t) = \sigma_2^{*\prime} Z_t , \quad \varphi_o(Z_t) = \varphi_o^* , \]
        and, therefore,
      \[ S(Z_t) = \begin{pmatrix} \sigma_1^{*\prime} Z_t & \varphi_o^*\, \sigma_2^{*\prime} Z_t \\ 0 & \sigma_2^{*\prime} Z_t \end{pmatrix} \]

  ii)
      \[ \gamma(Z_t) = [S(Z_t)]^{-1} [\nu^* Z_t - \tilde{\nu}(Z_t)] , \]
        where $\nu^*$ is a $(2 \times (p + 1)J)$-matrix.

 iii)
      \[ \tilde{\Gamma}(Z_t, \tilde{X}_t) = [S(Z_t)]^{-1} [\Phi^* - \tilde{\Phi}(Z_t)] , \]
        where $\Phi^*$ is a $(2 \times 2p)$-matrix.

  iv)
      \[ \delta_j(\tilde{X}_t, Z_t) = \log\left[\frac{\pi(z_t, e_j ; \tilde{X}_t)}{\pi^*(z_t, e_j)}\right] . \]

The risk-neutral dynamics can be written:
                
                          Q             ˜
                 x1,t+1 = ν1 Zt + Φ∗ Xt + S1 (Zt )ξt+1
                              ∗
                                      1
                                            ∗

                                                                             (65)
                  
                                Q
                                ∗         ˜     ∗
                      x2,t+1 = ν2 Zt + Φ∗ Xt + S2 (Zt )ξt+1 ,
                                        2

       ∗         ∗
where νi , Φ∗ , Si are the ith row of ν ∗ , Φ∗ , S ∗ , with i ∈ {1, 2}, or
            i

   ˜    Q ˜ ˜       ∗       ∗                    ∗       ∗
   Xt+1 = Φ∗ Xt + [ν1 Zt + S1 (Zt )ξt+1 ] e1 + [ν2 Zt + S2 (Zt )ξt+1 ] ep+1 ,




                                           25
where $e_1$ (respectively, $e_{p+1}$) is of size $2p$, with entries equal to zero except
the first (respectively, the $(p+1)$th) one, which is equal to one, and
\[
\tilde\Phi^* =
\begin{pmatrix}
\Phi^*_{11} & \Phi^*_{12} \\
\tilde I & \tilde 0 \\
\Phi^*_{21} & \Phi^*_{22} \\
\tilde 0 & \tilde I
\end{pmatrix},
\]
where $\Phi_1^* = (\Phi^*_{11}, \Phi^*_{12})$, $\Phi_2^* = (\Phi^*_{21}, \Phi^*_{22})$, and where $\tilde 0$ is a
$[(p-1) \times p]$-matrix of zeros and $\tilde I$ is a $[(p-1) \times p]$-matrix equal to
$(I_{p-1}, 0)$, where $0$ is a vector of size $(p-1)$.
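The block structure of the companion matrix can be illustrated with a minimal Python sketch. The function name `companion_matrix` and the numerical coefficients are hypothetical and serve only as an illustration; the stacking order assumed is $\tilde X_t = (x_{1,t}, \ldots, x_{1,t+1-p}, x_{2,t}, \ldots, x_{2,t+1-p})'$.

```python
def companion_matrix(phi1, phi2, p):
    """Assemble the (2p x 2p) companion matrix from the two rows Phi*_1 and Phi*_2.

    phi1, phi2: lists of length 2p (illustrative inputs, not estimates).
    """
    n = 2 * p
    assert len(phi1) == n and len(phi2) == n
    rows = [list(phi1)]                  # row 1: (Phi*_11, Phi*_12)
    for i in range(p - 1):               # block (I~, 0~): shifts the lags of x1
        row = [0.0] * n
        row[i] = 1.0
        rows.append(row)
    rows.append(list(phi2))              # row p+1: (Phi*_21, Phi*_22)
    for i in range(p - 1):               # block (0~, I~): shifts the lags of x2
        row = [0.0] * n
        row[p + i] = 1.0
        rows.append(row)
    return rows


# Hypothetical coefficients for p = 2
phi1 = [0.5, 0.1, 0.2, 0.0]
phi2 = [0.0, 0.0, 0.9, 0.05]
M = companion_matrix(phi1, phi2, p=2)
```

Rows 2 and 4 of `M` simply copy $x_{1,t}$ and $x_{2,t}$ into the lagged positions of $\tilde X_{t+1}$, which is what the blocks $(\tilde I, \tilde 0)$ and $(\tilde 0, \tilde I)$ express.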
The term structure is given by the following proposition:
Proposition 12 : In the bivariate SVARN(p) Term Structure model the
price at date t of the zero-coupon bond with residual maturity h is :
\[
B(t, h) = \exp\left(C_h' \tilde X_t + D_h' Z_t\right), \quad \text{for } h \ge 1, \tag{66}
\]

where the vectors $C_h$ and $D_h$ satisfy the following recursive equations:
\[
\begin{cases}
C_h = \tilde\Phi^{*\prime} C_{h-1} - c \\[6pt]
D_h = -d + C_{1,h-1}\,\nu_1^* + \tfrac{1}{2} C_{1,h-1}^2\,(\sigma_1^{*2} + \varphi_o^{*2}\sigma_2^{*2}) \\[2pt]
\qquad\quad + C_{p+1,h-1}\,\nu_2^* + \tfrac{1}{2} C_{p+1,h-1}^2\,\sigma_2^{*2} + \tilde D_{h-1} + F(D_{1,h-1}),
\end{cases}
\tag{67}
\]
where $\tilde D_{h-1}$ and $F(D_{1,h-1})$ have the same meaning as in proposition 8, and
the initial conditions are $C_0 = 0$, $D_0 = 0$ (or $C_1 = -c$, $D_1 = -d$). [Proof :
see Appendix 7.]
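As an illustration of the first recursion only, the following Python sketch iterates $C_h = \tilde\Phi^{*\prime} C_{h-1} - c$ from $C_0 = 0$. The function name `c_recursion` and the numbers are hypothetical; the $D_h$ recursion is not sketched because it relies on $\tilde D_{h-1}$ and $F$ from proposition 8.

```python
def c_recursion(phi_tilde, c, h_max):
    """Return [C_0, C_1, ..., C_hmax] with C_h = (Phi~*)' C_{h-1} - c and C_0 = 0."""
    n = len(c)
    path = [[0.0] * n]
    for _ in range(h_max):
        prev = path[-1]
        # j-th entry of (Phi~*)' C_{h-1} is column j of Phi~* dotted with C_{h-1}
        nxt = [sum(phi_tilde[i][j] * prev[i] for i in range(n)) - c[j]
               for j in range(n)]
        path.append(nxt)
    return path


# Hypothetical example with p = 1 (so Phi~* is 2 x 2) and c = e_1
phi_tilde = [[0.9, 0.0],
             [0.0, 0.8]]
C = c_recursion(phi_tilde, c=[1.0, 0.0], h_max=3)
```

With $c = e_1$ the first step gives $C_1 = -e_1$, which is the initial value used in the endogenous case.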
So, proposition 12 shows that the yields to maturity are:
\[
R(t, h) = -\frac{C_h'}{h}\,\tilde X_t - \frac{D_h'}{h}\,Z_t, \qquad h \ge 1. \tag{68}
\]
In the endogenous case we can take $x_{1t} = r_{t+1}$ and $x_{2t} = R(t, H)$ for a
given time to maturity $H$. In this case the absence of arbitrage conditions
for $r_{t+1}$ and $R(t, H)$ imply:
\[
\begin{aligned}
&\text{(i)} \quad C_1 = -e_1,\; D_1 = 0, \quad \text{or} \quad c = e_1,\; d = 0, \\
&\text{(ii)} \quad C_H = -H\, e_{p+1},\; D_H = 0.
\end{aligned}
\tag{69}
\]

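Using the notations $C_h = (C_{1,h}, C_{1,h}^{*\prime}, C_{p+1,h}, C_{2,h}^{*\prime})'$, $\tilde C_{1,h} = (C_{1,h}^{*\prime}, 0)'$,
$\tilde C_{2,h} = (C_{2,h}^{*\prime}, 0)'$ (where the zeros are scalars), and $\tilde C_h = (\tilde C_{1,h}', \tilde C_{2,h}')'$, it is
easily seen that the recursive equation $C_h = \tilde\Phi^{*\prime} C_{h-1} - c$ can be written:
\[
C_h = \Phi_1^{*\prime}\, C_{1,h-1} + \Phi_2^{*\prime}\, C_{p+1,h-1} + \tilde C_{h-1} - c.
\]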

Conditions (i) are used as initial values in the recursive procedure of proposition 10,
and conditions (ii) imply restrictions on the parameters $\tilde\Phi^*$, $\nu_1^*$, $\nu_2^*$,
$\sigma_1^*$, $\sigma_2^*$, $\varphi_o^*$, $\pi^*(z_t, e_j)$, which must be taken into account at the
estimation stage.


4     SWITCHING AUTOREGRESSIVE GAMMA (SARG) TERM STRUCTURE MODEL OF ORDER p
As for the SARN(p) models, we start the description of the SARG(p) modeling
with the case of one exogenous factor.

4.1    The historical dynamics
We assume that the Laplace transform of the conditional distribution of
$x_{t+1}$, given $(x_t, z_t)$, is:
\[
\begin{aligned}
E[\exp(u x_{t+1}) \mid x_t, z_t] = \exp\Big\{ & \frac{u}{1 - u\mu(X_t, Z_t)}\,[\varphi_1(Z_t) x_t + \ldots + \varphi_p(Z_t) x_{t-p+1}] \\
& - \nu(Z_t) \log(1 - u\mu(X_t, Z_t)) \Big\},
\end{aligned}
\tag{70}
\]
where $Z_t = (z_t', \ldots, z_{t-p}')'$, with $z_t$ a $J$-state non-homogeneous Markov
chain such that $P(z_{t+1} = e_j \mid z_t = e_i; x_t) = \pi(e_i, e_j; X_t)$, and where
$X_t = (x_t, \ldots, x_{t+1-p})'$. Using the notation:
\[
\begin{aligned}
A[u; \varphi(Z_t), \mu(X_t, Z_t)] &= \frac{u}{1 - u\mu(X_t, Z_t)}\,[\varphi_1(Z_t), \ldots, \varphi_p(Z_t)]' = \frac{u}{1 - u\mu(X_t, Z_t)}\,\varphi(Z_t), \\
b[u; \nu(Z_t), \mu(X_t, Z_t)] &= -\nu(Z_t) \log(1 - u\mu(X_t, Z_t)),
\end{aligned}
\]
relation (70) can be written:
\[
E[\exp(u x_{t+1}) \mid x_t, z_t] = \exp\{A[u; \varphi(Z_t), \mu(X_t, Z_t)]' X_t + b[u; \nu(Z_t), \mu(X_t, Z_t)]\}. \tag{71}
\]

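The process $(x_t)$ can also be written:
\[
\begin{aligned}
x_{t+1} &= \nu(Z_t)\mu(X_t, Z_t) + \varphi_1(Z_t) x_t + \ldots + \varphi_p(Z_t) x_{t+1-p} + \varepsilon_{t+1} \\
&= \nu(Z_t)\mu(X_t, Z_t) + \varphi(Z_t)' X_t + \varepsilon_{t+1},
\end{aligned}
\tag{72}
\]
where $\varepsilon_{t+1}$ is a martingale difference sequence with conditional Laplace
transform given by:
\[
\begin{aligned}
E[\exp(u\varepsilon_{t+1}) \mid x_t, z_t]
&= \exp\{-u[\nu(Z_t)\mu(X_t, Z_t) + \varphi(Z_t)' X_t] + A[u; \varphi(Z_t), \mu(X_t, Z_t)]' X_t \\
&\qquad\quad + b[u; \nu(Z_t), \mu(X_t, Z_t)]\} \\
&= \exp\{[A[u; \varphi(Z_t), \mu(X_t, Z_t)] - u\varphi(Z_t)]' X_t \\
&\qquad\quad + b[u; \nu(Z_t), \mu(X_t, Z_t)] - u\,\nu(Z_t)\mu(X_t, Z_t)\}.
\end{aligned}
\tag{73}
\]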
Note that the dynamics of (xt , zt ) is in general not Car.
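The link between the Laplace transform (70) and the conditional mean appearing in (72) can be checked numerically. The sketch below fixes the regime-dependent quantities at hypothetical constant values (`phi`, `nu`, `mu` are illustrative numbers) and compares a finite-difference derivative of the log-Laplace transform at $u = 0$ with $\nu\mu + \varphi' X_t$.

```python
import math


def log_laplace(u, X, phi, nu, mu):
    """Log of the conditional Laplace transform (70) for fixed values of phi, nu, mu."""
    assert u * mu < 1.0, "the transform is only defined for u * mu < 1"
    lin = sum(f * x for f, x in zip(phi, X))          # phi' X_t
    return u / (1.0 - u * mu) * lin - nu * math.log(1.0 - u * mu)


def conditional_mean(X, phi, nu, mu):
    """Conditional mean read off equation (72): nu * mu + phi' X_t."""
    return nu * mu + sum(f * x for f, x in zip(phi, X))


# Hypothetical values for one regime, with p = 2
X, phi, nu, mu = [0.05, 0.04], [0.6, 0.2], 1.5, 0.01
step = 1e-5
numerical_mean = (log_laplace(step, X, phi, nu, mu)
                  - log_laplace(-step, X, phi, nu, mu)) / (2 * step)
exact_mean = conditional_mean(X, phi, nu, mu)
```

Since the first derivative of a log-Laplace transform at zero is the mean, the two numbers agree up to discretisation error, which is consistent with $\varepsilon_{t+1}$ being a martingale difference.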

4.2    The Stochastic Discount Factor
In the SARG(p) model the SDF is specified in the following way:
\[
\begin{aligned}
M_{t,t+1} = \exp\{ & -c' X_t - d' Z_t + \Gamma(Z_t, X_t)\,\varepsilon_{t+1} + \Gamma(Z_t, X_t)\,[\nu(Z_t)\mu(X_t, Z_t) + \varphi(Z_t)' X_t] \\
& - A[\Gamma(Z_t, X_t); \varphi(Z_t), \mu(X_t, Z_t)]' X_t \\
& - b[\Gamma(Z_t, X_t); \nu(Z_t), \mu(X_t, Z_t)] - \delta(Z_t, X_t)' z_{t+1} \},
\end{aligned}
\tag{74}
\]
where $\Gamma(Z_t, X_t) = \gamma(Z_t) + \tilde\gamma(Z_t)' X_t$, or, equivalently,
\[
\begin{aligned}
M_{t,t+1} = \exp\{ & -c' X_t - d' Z_t + \Gamma(Z_t, X_t)\,x_{t+1} - A[\Gamma(Z_t, X_t); \varphi(Z_t), \mu(X_t, Z_t)]' X_t \\
& - b[\Gamma(Z_t, X_t); \nu(Z_t), \mu(X_t, Z_t)] - \delta(Z_t, X_t)' z_{t+1} \}.
\end{aligned}
\tag{75}
\]
Assuming the normalisation condition (44), we get that:
\[
r_{t+1} = c' X_t + d' Z_t. \tag{76}
\]



4.3    Useful lemmas
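In the subsequent sections we will repeatedly use the following lemmas.
Let us consider the functions:
\[
\tilde a(u; \rho, \mu) = \frac{\rho u}{1 - u\mu} \quad \text{and} \quad \tilde b(u; \nu, \mu) = -\nu \log(1 - u\mu);
\]
we have:
Lemma 1 :
\[
\begin{aligned}
\tilde a(u + \alpha; \rho, \mu) - \tilde a(\alpha; \rho, \mu) &= \tilde a(u; \rho^*, \mu^*), \\
\tilde b(u + \alpha; \nu, \mu) - \tilde b(\alpha; \nu, \mu) &= \tilde b(u; \nu, \mu^*),
\end{aligned}
\qquad \text{with } \rho^* = \frac{\rho}{(1 - \alpha\mu)^2}, \quad \mu^* = \frac{\mu}{1 - \alpha\mu}.
\]
[Proof : see Appendix 8.]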
Lemma 1 immediately implies lemma 2.
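Lemma 2 :
\[
\begin{aligned}
A[u + \alpha; \varphi(Z_t), \mu(X_t, Z_t)] - A[\alpha; \varphi(Z_t), \mu(X_t, Z_t)] &= A[u; \varphi^*(Z_t), \mu^*(X_t, Z_t)], \\
b[u + \alpha; \nu(Z_t), \mu(X_t, Z_t)] - b[\alpha; \nu(Z_t), \mu(X_t, Z_t)] &= b[u; \nu(Z_t), \mu^*(X_t, Z_t)],
\end{aligned}
\]
\[
\text{with } \varphi^*(Z_t) = \frac{\varphi(Z_t)}{[1 - \alpha\mu(X_t, Z_t)]^2}, \quad \mu^*(X_t, Z_t) = \frac{\mu(X_t, Z_t)}{1 - \alpha\mu(X_t, Z_t)}.
\]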
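Lemma 1 is a purely algebraic identity, so it can be checked numerically. The following Python sketch does this for one arbitrary (hypothetical) set of values satisfying $1 - (u + \alpha)\mu > 0$ and $1 - \alpha\mu > 0$; the function names are illustrative.

```python
import math


def a_tilde(u, rho, mu):
    """a~(u; rho, mu) = rho * u / (1 - u * mu)."""
    return rho * u / (1.0 - u * mu)


def b_tilde(u, nu, mu):
    """b~(u; nu, mu) = -nu * log(1 - u * mu)."""
    return -nu * math.log(1.0 - u * mu)


def shifted_parameters(rho, mu, alpha):
    """Return (rho*, mu*) as defined in lemma 1."""
    k = 1.0 - alpha * mu
    return rho / k ** 2, mu / k


# Arbitrary illustrative values
u, alpha, rho, mu, nu = -0.7, -0.3, 0.8, 0.5, 2.0
rho_star, mu_star = shifted_parameters(rho, mu, alpha)

gap_a = (a_tilde(u + alpha, rho, mu) - a_tilde(alpha, rho, mu)
         - a_tilde(u, rho_star, mu_star))
gap_b = (b_tilde(u + alpha, nu, mu) - b_tilde(alpha, nu, mu)
         - b_tilde(u, nu, mu_star))
```

Both `gap_a` and `gap_b` vanish up to floating-point rounding. Lemma 2 follows by applying the same identity to each component of $\varphi(Z_t)$.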




4.4    Risk-neutral dynamics
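The Laplace transform of the risk-neutral conditional distribution of $(x_{t+1}, z_{t+1})$
is, using the notation $\Gamma_t = \Gamma(X_t, Z_t)$:
\[
\begin{aligned}
& E_t^Q[\exp(u x_{t+1} + v' z_{t+1})] \\
&= E_t\{\exp[(u + \Gamma_t)x_{t+1} - A[\Gamma_t; \varphi(Z_t), \mu(X_t, Z_t)]' X_t \\
&\qquad\qquad - b[\Gamma_t; \nu(Z_t), \mu(X_t, Z_t)] + (v - \delta(X_t, Z_t))' z_{t+1}]\} \\
&= \exp\{(A[u + \Gamma_t; \varphi(Z_t), \mu(X_t, Z_t)] - A[\Gamma_t; \varphi(Z_t), \mu(X_t, Z_t)])' X_t \\
&\qquad\qquad + b[u + \Gamma_t; \nu(Z_t), \mu(X_t, Z_t)] - b[\Gamma_t; \nu(Z_t), \mu(X_t, Z_t)]\} \\
&\quad \times \sum_{j=1}^{J} \pi(z_t, e_j; X_t) \exp[(v - \delta(Z_t, X_t))' e_j],
\end{aligned}
\tag{77}
\]
and, using lemma 2, (77) can be written:
\[
\begin{aligned}
E_t^Q[\exp(u x_{t+1} + v' z_{t+1})]
= {} & \exp\{A[u; \varphi^*(Z_t), \mu^*(X_t, Z_t)]' X_t + b[u; \nu(Z_t), \mu^*(X_t, Z_t)]\} \\
& \times \sum_{j=1}^{J} \pi(z_t, e_j; X_t) \exp[(v - \delta(Z_t, X_t))' e_j],
\end{aligned}
\tag{78}
\]
with
\[
\varphi^*(Z_t) = \frac{\varphi(Z_t)}{[1 - \Gamma_t\,\mu(X_t, Z_t)]^2} \quad \text{and} \quad \mu^*(X_t, Z_t) = \frac{\mu(X_t, Z_t)}{1 - \Gamma_t\,\mu(X_t, Z_t)}.
\]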
   So, from (71), we see that the risk-neutral conditional distribution of
$x_{t+1}$, given $(x_t, z_t)$, is in the same class as the historical one and is obtained
by replacing $\varphi(Z_t)$ with $\varphi^*(Z_t)$, and $\mu(X_t, Z_t)$ with $\mu^*(X_t, Z_t)$.
    In order to get a generalized linear term structure we impose that the
risk-neutral dynamics is a switching regime Gamma Car(p) process. So,
using the results in section 2.5.b, we get that $\varphi^*(Z_t)$ and $\mu^*(X_t, Z_t)$ must be
constant, $\nu(Z_t) = \nu^{*\prime} Z_t$ and $\pi(z_t, e_j; X_t) = \pi^*(z_t, e_j) \exp[\delta(Z_t, X_t)' e_j]$.
Also note that $\mu^*$ must be positive, as well as the components of $\nu^*$ and $\varphi^*$.
This implies the following constraints on the historical dynamics and on the




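SDF:
\[
\begin{aligned}
\mu(X_t, Z_t) &= \mu^*\,[1 - \Gamma(X_t, Z_t)\mu(X_t, Z_t)] \\
\varphi(Z_t) &= \varphi^*\,[1 - \Gamma(X_t, Z_t)\mu(X_t, Z_t)]^2 \\
\nu(Z_t) &= \nu^{*\prime} Z_t \\
\delta_j(X_t, Z_t) &= \log\left[\frac{\pi(z_t, e_j; X_t)}{\pi^*(z_t, e_j)}\right].
\end{aligned}
\]
We see that $\varphi(Z_t) = \frac{\varphi^*}{\mu^{*2}}\,\mu(X_t, Z_t)^2$, so $\mu(X_t, Z_t)$ must depend only on $Z_t$,
and therefore the same is true for $\Gamma(X_t, Z_t)$. Finally, we have the constraints: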

   i)
\[
\mu(Z_t) = \mu^*\,[1 - \Gamma(Z_t)\mu(Z_t)]
\]
  ii)
\[
\varphi(Z_t) = \varphi^*\,[1 - \Gamma(Z_t)\mu(Z_t)]^2
\]
 iii)
\[
\nu(Z_t) = \nu^{*\prime} Z_t
\]
  iv)
\[
\delta_j(X_t, Z_t) = \log\left[\frac{\pi(z_t, e_j; X_t)}{\pi^*(z_t, e_j)}\right];
\]
In particular, since $\varphi(Z_t) = \frac{\varphi^*}{\mu^{*2}}\,\mu(Z_t)^2$, the random vector $\varphi(Z_t)$ must be
proportional to the deterministic vector $\varphi^*$.
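Constraints i) and ii) can be inverted: i) gives $\Gamma(Z_t) = 1/\mu(Z_t) - 1/\mu^*$, and ii) then gives $\varphi(Z_t) = \varphi^* [\mu(Z_t)/\mu^*]^2$. The Python sketch below applies this to a hypothetical finite list of values taken by $\mu(Z_t)$; the function name and all numbers are illustrative assumptions.

```python
def sarg_constraints(mu_star, phi_star, mu_values):
    """For each value of mu(Z_t), return the implied Gamma(Z_t) and phi(Z_t)."""
    out = []
    for mu in mu_values:
        gamma = 1.0 / mu - 1.0 / mu_star       # solves mu = mu* (1 - Gamma mu)
        scale = (mu / mu_star) ** 2            # equals (1 - Gamma mu)^2
        out.append({"mu": mu,
                    "Gamma": gamma,
                    "phi": [scale * f for f in phi_star]})
    return out


# Hypothetical risk-neutral parameters and two historical values of mu(Z_t)
mu_star, phi_star = 0.025, [0.6, 0.2]
regimes = sarg_constraints(mu_star, phi_star, mu_values=[0.02, 0.03])
```

In every regime the vector `phi` is a scalar multiple of `phi_star`, which is the proportionality property stated above; only the scale factor and the risk-correction coefficient change with the regime.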
    Moreover, it is easily seen that the risk premium corresponding to the
payoff $\exp(-\theta x_{t+1})$ at $t + 1$ is:
\[
\begin{aligned}
\omega_t(\theta) = {} & \{A[-\theta; \varphi(Z_t), \mu(Z_t)] - A[-\theta; \varphi^*, \mu^*]\}' X_t \\
& + b[-\theta; \nu^{*\prime} Z_t, \mu(Z_t)] - b[-\theta; \nu^{*\prime} Z_t, \mu^*].
\end{aligned}
\]
As in the Gaussian case, we obtain an affine function of $X_t$ also depending
on $Z_t$. The risk premium associated with the digital asset providing one
money unit at $t + 1$ if $z_{t+1} = e_j$ is still given by (47).




4.5    The Generalised Linear Term Structure
Let us introduce the notations:
\[
\begin{aligned}
A^*(u) &= A(u; \varphi^*, \mu^*), \\
\tilde C_h &= (C_{2,h}, \ldots, C_{p,h}, 0)'.
\end{aligned}
\tag{79}
\]

As usual, $B(t, h)$ is the price at $t$ of a zero-coupon bond with residual
maturity $h$.
Proposition 13 : In the univariate SARG(p) Term Structure model the
price at date t of the zero-coupon bond with residual maturity h is :
\[
B(t, h) = \exp(C_h' X_t + D_h' Z_t), \quad \text{for } h \ge 1, \tag{80}
\]
where the vectors $C_h$ and $D_h$ satisfy the following recursive equations:
\[
\begin{cases}
C_h = -c + A^*(C_{1,h-1}) + \tilde C_{h-1} \\[4pt]
D_h = -d - \nu^* \log(1 - C_{1,h-1}\,\mu^*) + \tilde D_{h-1} + F(D_{1,h-1}),
\end{cases}
\tag{81}
\]
where $\tilde D_{h-1}$ and $F(D_{1,h-1})$ have the same meaning as in proposition 8; the
initial conditions are $C_0 = 0$, $D_0 = 0$ (or $C_1 = -c$, $D_1 = -d$) [Proof : see
Appendix 9].
Again, we obtain a generalised linear term structure given by:
\[
R(t, h) = -\frac{C_h'}{h}\,X_t - \frac{D_h'}{h}\,Z_t, \qquad h \ge 1, \tag{82}
\]
and, in the same spirit as propositions 9 and 10 for the univariate SARN(p)
model [see section 3.6], it is easy to verify that the processes
$R = [R(t, h),\ 0 \le t < T]$ and $R_H = [R(t, h),\ 0 \le t < T,\ h \in H]$ are, respectively, a
weak Switching ARMA(p, p − 1) process and a weak H-variate Switching
VARMA(p, p − 1) process.
In the endogenous case, where $x_t = r_{t+1}$, the previous results remain valid
with $C_1 = -e_1$, $D_1 = 0$.
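The recursion of proposition 13 can be sketched in Python under a simplifying assumption that is not made in the paper: a single regime ($J = 1$), so that $D_h' Z_t$ reduces to a scalar and the terms $\tilde D_{h-1} + F(D_{1,h-1})$ are assumed to collapse to the previous scalar $D_{h-1}$. Function names and parameter values are hypothetical.

```python
import math


def sarg_bond_recursion(c, d, phi_star, nu_star, mu_star, h_max):
    """Return [(C_1, D_1), ..., (C_hmax, D_hmax)] in the single-regime case.

    Assumption (not from the paper): no regime switching, so D_h is a scalar
    and D~_{h-1} + F(D_{1,h-1}) is replaced by D_{h-1}.
    """
    p = len(c)
    C, D = [0.0] * p, 0.0                       # C_0 = 0, D_0 = 0
    out = []
    for _ in range(h_max):
        c1 = C[0]                               # C_{1,h-1}
        a_star = [c1 / (1.0 - c1 * mu_star) * f for f in phi_star]   # A*(C_{1,h-1})
        c_shift = C[1:] + [0.0]                 # C~_{h-1} = (C_2, ..., C_p, 0)'
        D = -d - nu_star * math.log(1.0 - c1 * mu_star) + D
        C = [-ci + ai + si for ci, ai, si in zip(c, a_star, c_shift)]
        out.append((list(C), D))
    return out


def sarg_yield(C, D, X, h):
    """Yield to maturity R(t, h) = -(C_h' X_t + D_h) / h in the single-regime case."""
    return -(sum(ci * xi for ci, xi in zip(C, X)) + D) / h


# Endogenous case c = e_1, d = 0, with hypothetical risk-neutral parameters
curve = sarg_bond_recursion(c=[1.0, 0.0], d=0.0, phi_star=[0.6, 0.2],
                            nu_star=1.5, mu_star=0.01, h_max=10)
X_t = [0.05, 0.04]
yields = [sarg_yield(C, D, X_t, h) for h, (C, D) in enumerate(curve, start=1)]
```

The first element reproduces $C_1 = -e_1$, $D_1 = 0$, so the one-period yield equals the short rate $x_t = 0.05$, and all yields computed from positive factor values are positive for these parameter values.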

4.6    Positiveness of the yields
Since $r_{t+1} = R(t, 1) = c' X_t + d' Z_t$, and since the components of $X_t$ are
positive, the short term process will be positive as soon as the components of $c$
and $d$ are nonnegative. The positiveness of $r_{t+1}$ implies that of $R(t, h)$, at any
date $t$ and time to maturity $h$, because
$R(t, h) = -\frac{1}{h} \log E_t^Q[\exp(-r_{t+1} - \ldots - r_{t+h})]$.
    This positiveness can also be observed from the recursive equations of
proposition 13. Indeed, using the fact that $\mu^*$ and the components of $\varphi^*$
and $\nu^*$ are positive and that $0 < \pi^*_{ij} < 1$, it is easily seen that, for any $u < 0$,
the components of $A^*(u)$ and $-\nu^* \log(1 - C_{1,h-1}\,\mu^*)$ are negative and the
result follows.

4.7   Multi-Factor generalization : the SVARG(p) Term
      Structure model
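The bivariate process $\tilde x_t = (x_{1,t}, x_{2,t})'$ is a SVARG(p) model defined by the
following conditional Laplace transforms:
\[
\begin{aligned}
& E_t[\exp(u_1 x_{1,t+1}) \mid x_{2,t+1}, x_{1,t}, z_t] \\
&= \exp\Big\{\frac{u_1}{1 - u_1\mu_1(Z_t)}\,[\varphi_o(Z_t)\,x_{2,t+1} + \varphi_{11}(Z_t)' X_{1t} + \varphi_{12}(Z_t)' X_{2t}] \\
&\qquad\qquad - \nu_1(Z_t) \log(1 - u_1\mu_1(Z_t))\Big\},
\end{aligned}
\tag{83}
\]
\[
\begin{aligned}
& E_t[\exp(u_2 x_{2,t+1}) \mid x_{1,t}, x_{2,t}, z_t] \\
&= \exp\Big\{\frac{u_2}{1 - u_2\mu_2(Z_t)}\,[\varphi_{21}(Z_t)' X_{1t} + \varphi_{22}(Z_t)' X_{2t}] - \nu_2(Z_t) \log(1 - u_2\mu_2(Z_t))\Big\}.
\end{aligned}
\tag{84}
\]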
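We will use the notations:
\[
\begin{aligned}
& \varphi_o(Z_t) = \varphi_{o,t}, \\
& [\varphi_{11}(Z_t)', \varphi_{12}(Z_t)'] = \varphi_{1,t}', \quad [\varphi_{21}(Z_t)', \varphi_{22}(Z_t)'] = \varphi_{2,t}', \\
& \mu_i(Z_t) = \mu_{i,t}, \quad \nu_i(Z_t) = \nu_{i,t}, \quad i \in \{1, 2\},
\end{aligned}
\]
and, using the functions $\tilde a$, $\tilde b$, $A$, $b$ defined in lemma 1 and in section 4.1, we
will introduce the notations:
\[
\begin{aligned}
a_{1,t}(u_1) &= \tilde a(u_1; \varphi_{o,t}, \mu_{1,t}), \\
b_{1,t}(u_1) &= \tilde b(u_1; \nu_{1,t}, \mu_{1,t}), & b_{2,t}(u_2) &= \tilde b(u_2; \nu_{2,t}, \mu_{2,t}), \\
A_{1,t}(u_1) &= A(u_1; \varphi_{1,t}, \mu_{1,t}), & A_{2,t}(u_2) &= A(u_2; \varphi_{2,t}, \mu_{2,t}).
\end{aligned}
\]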

With these notations, the Laplace transforms (83) and (84) become respectively:
\[
E_t[\exp(u_1 x_{1,t+1}) \mid x_{2,t+1}, x_{1,t}, z_t] = \exp\{a_{1,t}(u_1)\,x_{2,t+1} + A_{1,t}(u_1)' \tilde X_t + b_{1,t}(u_1)\}, \tag{85}
\]
\[
E_t[\exp(u_2 x_{2,t+1}) \mid x_{1,t}, x_{2,t}, z_t] = \exp\{A_{2,t}(u_2)' \tilde X_t + b_{2,t}(u_2)\}, \tag{86}
\]
where $\tilde X_t = (X_{1t}', X_{2t}')'$. Moreover, the joint conditional Laplace transform
of $(x_{1,t+1}, x_{2,t+1})$, given $(x_{1,t}, x_{2,t}, z_t)$, is:
\[
\begin{aligned}
& E_t[\exp(u_1 x_{1,t+1} + u_2 x_{2,t+1}) \mid x_{1,t}, x_{2,t}, z_t] \\
&= \exp\{[A_{1,t}(u_1) + A_{2,t}(u_2 + a_{1,t}(u_1))]' \tilde X_t + b_{1,t}(u_1) + b_{2,t}(u_2 + a_{1,t}(u_1))\}.
\end{aligned}
\tag{87}
\]
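Formula (87) can be illustrated by simulation. The sketch below relies on an assumption that is standard for autoregressive gamma processes but is not stated in this section: a variable with log-Laplace transform $\frac{u}{1 - u\mu}\lambda - \nu\log(1 - u\mu)$ can be drawn as a gamma variable with scale $\mu$ and shape $\nu + N$, where $N$ is Poisson with intensity $\lambda/\mu$. The scalars `lin1` and `lin2` stand for $\varphi_{1,t}'\tilde X_t$ and $\varphi_{2,t}'\tilde X_t$; all names and values are hypothetical.

```python
import math
import random


def draw_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small intensities used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def draw_arg(lin, nu, mu, rng):
    """Poisson-mixed gamma draw (assumed representation, see lead-in)."""
    return rng.gammavariate(nu + draw_poisson(lin / mu, rng), mu)


def joint_log_laplace(u1, u2, lin1, lin2, phi_o, nu1, mu1, nu2, mu2):
    """Log of the right-hand side of (87) for fixed regime values."""
    w = u2 + phi_o * u1 / (1.0 - u1 * mu1)          # u2 + a_{1,t}(u1)
    return (u1 / (1.0 - u1 * mu1) * lin1 - nu1 * math.log(1.0 - u1 * mu1)
            + w / (1.0 - w * mu2) * lin2 - nu2 * math.log(1.0 - w * mu2))


def mc_laplace(u1, u2, lin1, lin2, phi_o, nu1, mu1, nu2, mu2, n=50000, seed=0):
    """Monte Carlo estimate of E[exp(u1 x1 + u2 x2)]: draw x2 from (84), then x1 from (83)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x2 = draw_arg(lin2, nu2, mu2, rng)
        x1 = draw_arg(phi_o * x2 + lin1, nu1, mu1, rng)
        total += math.exp(u1 * x1 + u2 * x2)
    return total / n


params = dict(lin1=0.4, lin2=0.6, phi_o=0.3, nu1=1.2, mu1=0.5, nu2=0.8, mu2=0.4)
exact = math.exp(joint_log_laplace(-0.5, -0.3, **params))
estimate = mc_laplace(-0.5, -0.3, **params)
```

The simulated average and the closed-form value agree up to Monte Carlo error, which reflects the iterated-expectation argument behind (87): the transform of $x_{1,t+1}$ given $x_{2,t+1}$ shifts the argument of the transform of $x_{2,t+1}$ by $a_{1,t}(u_1)$.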
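The process $z_t$ is assumed to be a non-homogeneous Markov chain such that
$P(z_{t+1} = e_j \mid z_t = e_i; \tilde x_t) = \pi(e_i, e_j; \tilde X_t)$.
    We now introduce the SDF:
\[
\begin{aligned}
M_{t,t+1} = \exp\{ & -c' \tilde X_t - d' Z_t + \Gamma_{1t}\,x_{1,t+1} + \Gamma_{2t}\,x_{2,t+1} \\
& - [A_{1,t}(\Gamma_{1t}) + A_{2,t}(\Gamma_{2t} + a_{1,t}(\Gamma_{1t}))]' \tilde X_t \\
& - [b_{1,t}(\Gamma_{1t}) + b_{2,t}(\Gamma_{2t} + a_{1,t}(\Gamma_{1t}))] - \delta(Z_t, \tilde X_t)' z_{t+1} \},
\end{aligned}
\tag{88}
\]
where $\Gamma_{1t} = \Gamma_1(Z_t)$ and $\Gamma_{2t} = \Gamma_2(Z_t)$.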




4.8     Risk-neutral dynamics in the multifactor case
Using the lemmas presented above, we can now give the joint conditional
Laplace transform of $(x_{1,t+1}, x_{2,t+1})$ in the risk-neutral world in the following
proposition.
Proposition 14 : The joint conditional Laplace transform of $(x_{1,t+1}, x_{2,t+1})$
in the risk-neutral world is given by :
\[
\begin{aligned}
& E_t^Q[\exp(u_1 x_{1,t+1} + u_2 x_{2,t+1}) \mid x_{1t}, x_{2t}, z_t] \\
&= \exp\{[A^*_{1,t}(u_1) + A^*_{2,t}[u_2 + a^*_{1,t}(u_1)]]' \tilde X_t + b^*_{2,t}[u_2 + a^*_{1,t}(u_1)] + b^*_{1,t}(u_1)\},
\end{aligned}
\tag{89}
\]
where
\[
\begin{aligned}
A^*_{1,t}(u_1) &= A(u_1; \varphi^*_{1t}, \mu^*_{1t}), \\
A^*_{2,t}[u_2 + a^*_{1,t}(u_1)] &= A[u_2 + \tilde a(u_1; \varphi^*_{ot}, \mu^*_{1t}); \varphi^*_{2t}, \mu^*_{2t}], \\
b^*_{2,t}[u_2 + a^*_{1,t}(u_1)] &= \tilde b[u_2 + \tilde a(u_1; \varphi^*_{ot}, \mu^*_{1t}); \nu_{2t}, \mu^*_{2t}], \\
b^*_{1,t}(u_1) &= \tilde b(u_1; \nu_{1t}, \mu^*_{1t}),
\end{aligned}
\]
and with
\[
\begin{aligned}
\varphi^*_{ot} &= \frac{\varphi_{ot}}{(1 - \Gamma_{1t}\mu_{1t})^2}, &
\varphi^*_{1t} &= \frac{\varphi_{1t}}{(1 - \Gamma_{1t}\mu_{1t})^2}, &
\varphi^*_{2t} &= \frac{\varphi_{2t}}{\{1 - [\Gamma_{2t} + a_{1,t}(\Gamma_{1t})]\mu_{2t}\}^2}, \\
\mu^*_{1t} &= \frac{\mu_{1t}}{1 - \Gamma_{1t}\mu_{1t}}, &
\mu^*_{2t} &= \frac{\mu_{2t}}{1 - [\Gamma_{2t} + a_{1,t}(\Gamma_{1t})]\mu_{2t}}.
\end{aligned}
\]
So, (89) has exactly the same form as (87) with different parameters. In
other words the risk-neutral dynamics belongs to the same class as the
historical one. [Proof : see Appendix 10.]
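The statement of proposition 14 can be verified numerically for fixed regime values. In the hedged Python sketch below, the risk-neutral log-Laplace transform is computed as $\psi(u_1 + \Gamma_{1t}, u_2 + \Gamma_{2t}) - \psi(\Gamma_{1t}, \Gamma_{2t})$, where $\psi$ is the log of (87), and compared with $\psi$ evaluated at the starred parameters. As an illustrative simplification, `lin1` and `lin2` stand for the scalars $\varphi_{1t}'\tilde X_t$ and $\varphi_{2t}'\tilde X_t$, so dividing them by the squared factors is equivalent to rescaling $\varphi_{1t}$ and $\varphi_{2t}$; all names and values are hypothetical.

```python
import math


def psi(u1, u2, lin1, lin2, phi_o, nu1, mu1, nu2, mu2):
    """Joint log-Laplace transform, i.e. the log of the right-hand side of (87)."""
    w = u2 + phi_o * u1 / (1.0 - u1 * mu1)          # u2 + a_1(u1)
    return (u1 / (1.0 - u1 * mu1) * lin1 - nu1 * math.log(1.0 - u1 * mu1)
            + w / (1.0 - w * mu2) * lin2 - nu2 * math.log(1.0 - w * mu2))


def starred(g1, g2, lin1, lin2, phi_o, mu1, mu2):
    """Starred (risk-neutral) parameters of proposition 14."""
    k1 = 1.0 - g1 * mu1
    k2 = 1.0 - (g2 + phi_o * g1 / k1) * mu2         # 1 - [Gamma_2 + a_1(Gamma_1)] mu_2
    return {"lin1": lin1 / k1 ** 2, "lin2": lin2 / k2 ** 2,
            "phi_o": phi_o / k1 ** 2, "mu1": mu1 / k1, "mu2": mu2 / k2}


hist = dict(lin1=0.4, lin2=0.6, phi_o=0.3, nu1=1.2, mu1=0.5, nu2=0.8, mu2=0.4)
g1, g2 = -0.2, -0.1                                  # hypothetical Gamma_1t, Gamma_2t
u1, u2 = -0.5, -0.3

lhs = psi(u1 + g1, u2 + g2, **hist) - psi(g1, g2, **hist)
s = starred(g1, g2, hist["lin1"], hist["lin2"], hist["phi_o"],
            hist["mu1"], hist["mu2"])
rhs = psi(u1, u2, s["lin1"], s["lin2"], s["phi_o"],
          hist["nu1"], s["mu1"], hist["nu2"], s["mu2"])
```

The two quantities coincide up to rounding, with $\nu_{1t}$ and $\nu_{2t}$ left unchanged by the change of measure, as lemma 1 implies.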
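    In order to have a Car process in the risk-neutral world, we know from
section 2.9 that we must have the following constraints between the SDF and
the historical dynamics:
   i)
\[
\frac{\mu_{1t}}{1 - \Gamma_{1t}\mu_{1t}} = \mu_1^*
\]
  ii)
\[
\frac{\varphi_{1t}}{(1 - \Gamma_{1t}\mu_{1t})^2} = \varphi_1^*
\]
 iii)
\[
\nu_1(Z_t) = \nu_1^{*\prime} Z_t
\]
  iv)
\[
\frac{\varphi_{ot}}{(1 - \Gamma_{1t}\mu_{1t})^2} = \varphi_o^*
\]
   v)
\[
\frac{\mu_{2t}}{1 - [\Gamma_{2t} + a_{1,t}(\Gamma_{1t})]\mu_{2t}} = \mu_2^*
\]
  vi)
\[
\frac{\varphi_{2t}}{(1 - [\Gamma_{2t} + a_{1,t}(\Gamma_{1t})]\mu_{2t})^2} = \varphi_2^*
\]
 vii)
\[
\nu_2(Z_t) = \nu_2^{*\prime} Z_t.
\]
Moreover, the constraints on the dynamics of the Markov chain are the same
as in the Gaussian case, namely:
viii)
\[
\delta_j(\tilde X_t, Z_t) = \log\left[\frac{\pi(z_t, e_j; \tilde X_t)}{\pi^*(z_t, e_j)}\right].
\]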

It is worth noting that, if there is no instantaneous causality between $x_{1,t+1}$
and $x_{2,t+1}$, that is if $\varphi_{ot} = 0$, the function $a_{1,t}$ is also equal to zero and
constraints v) and vi) become simpler and similar to i) and ii).

4.9     The Generalized Linear Term Structure in the multifactor case
Using the notations:
\[
\begin{aligned}
a_1^*(u_1) &= \tilde a(u_1; \varphi_o^*, \mu_1^*), \\
A_1^*(u_1) &= A(u_1; \varphi_1^*, \mu_1^*), \\
A_2^*(u_2) &= A(u_2; \varphi_2^*, \mu_2^*), \\
\tilde C_h &= (C_{2,h}, \ldots, C_{p,h}, 0, C_{p+2,h}, \ldots, C_{2p,h}, 0)',
\end{aligned}
\]
we have
Proposition 15 : In the bivariate SVARG(p) Term Structure model the
price at date t of the zero-coupon bond with residual maturity h is :
\[
B(t, h) = \exp\left(C_h' \tilde X_t + D_h' Z_t\right), \quad \text{for } h \ge 1, \tag{90}
\]
where the vectors $C_h$ and $D_h$ satisfy the following recursive equations:
\[
\begin{cases}
C_h = -c + A_1^*(C_{1,h-1}) + A_2^*[C_{p+1,h-1} + a_1^*(C_{1,h-1})] + \tilde C_{h-1} \\[6pt]
D_h = -d - \nu_1^* \log(1 - C_{1,h-1}\,\mu_1^*) \\[2pt]
\qquad\quad - \nu_2^* \log[1 - (C_{p+1,h-1} + a_1^*(C_{1,h-1}))\,\mu_2^*] + \tilde D_{h-1} + F(D_{1,h-1}),
\end{cases}
\tag{91}
\]
where $\tilde D_{h-1}$ and $F(D_{1,h-1})$ have the same meaning as in proposition 8; the
initial conditions are $C_0 = 0$, $D_0 = 0$ (or $C_1 = -c$, $D_1 = -d$) [Proof : see
Appendix 11].
So, proposition 15 shows that, also for the SVARG(p) model, yields to
maturity are linear functions of $\tilde X_t$ and $Z_t$.
    In the endogenous case, we can consider as factors the short rate $r_{t+1}$
and the long rate $R(t, H)$, for a given time to maturity $H$. Now, if we
want to define joint historical and risk-neutral dynamics for these variables,
compatible with the absence of arbitrage opportunities (A.A.O.), we have to
take into account domain restrictions on $R(t, H)$: given that the support
of $r_{t+1}$ is $D_1 = (0, +\infty)$, under A.A.O. the support of $R(t, H)$ has to be
$D_H = [b, +\infty)$, for some constant $b > 0$ [see Gourieroux, Monfort (2006)
for details]. Consequently, the bivariate SVARG(p) process $\tilde x_t$, having
support $D = D_1 \times D_1$, will be specified for $x_{1t} = r_{t+1}$ and $x_{2t} = R(t, H) - b$,
and the results presented for the SVARN(p) case [see section 3.8] will also
apply in this case.
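It is also easily seen that the risk premium of the payoff
$p_{t+1} = \exp(-\theta_1 x_{1,t+1} - \theta_2 x_{2,t+1})$ is:
\[
\begin{aligned}
\omega_t(\theta_1, \theta_2) = {} & \{A_{2,t}[-\theta_2 + a_{1,t}(-\theta_1)] + A_{1,t}(-\theta_1) \\
& \quad - A_2^*[-\theta_2 + a_1^*(-\theta_1)] - A_1^*(-\theta_1)\}' \tilde X_t \\
& + b_{2,t}[-\theta_2 + a_{1,t}(-\theta_1)] + b_{1,t}(-\theta_1) \\
& - b^*_{2,t}[-\theta_2 + a_1^*(-\theta_1)] - b^*_{1,t}(-\theta_1),
\end{aligned}
\]
with
\[
\begin{aligned}
b^*_{1,t}(u_1) &= -\nu_1^{*\prime} Z_t \log(1 - u_1\mu_1^*), \\
b^*_{2,t}(u_2) &= -\nu_2^{*\prime} Z_t \log(1 - u_2\mu_2^*),
\end{aligned}
\]
and the risk premium of the digital asset is still given by relation (47).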


5      DERIVATIVE PRICING
5.1    Generalization of the recursive pricing formula
In the previous sections we have derived recursive formulas for the
zero-coupon bond price $B(t, h)$ in various contexts which share the feature that
the process $(\tilde x_t, z_t)$ is Car in the risk-neutral world. In fact the recursive
approach can be generalized to other assets.
    Let us consider a class of payoffs $g(\tilde X_{t+h}, Z_{t+h})$, $(t, h)$ varying, for a given
function $g$, and let us assume that the price at $t$ of this payoff is of the form:
\[
P_t(g, h) = \exp[C_h(g)' \tilde X_t + D_h(g)' Z_t]. \tag{92}
\]
It is clear that:
\[
\begin{aligned}
\exp[C_h(g)' \tilde X_t + D_h(g)' Z_t]
&= E_t\{M_{t,t+1} \exp[C_{h-1}(g)' \tilde X_{t+1} + D_{h-1}(g)' Z_{t+1}]\} \\
&= \exp(-c' \tilde X_t - d' Z_t)\, E_t^Q\{\exp[C_{h-1}(g)' \tilde X_{t+1} + D_{h-1}(g)' Z_{t+1}]\};
\end{aligned}
\]
so the sequences $C_h(g)$, $D_h(g)$, $h \ge 1$, follow recursive equations which do
not depend on $g$ and, therefore, are identical to the case $g = 1$, that is to
say to the zero-coupon bond pricing formulas given in the previous sections.
The only condition for (92) to be true is that it holds for $h = 1$ and, of course,
this initial condition depends on $g$.
    Formula (92) is valid for $h = 1$ if $g(\tilde X_{t+h}, Z_{t+h}) = \exp(\tilde u' \tilde X_{t+h} + \tilde v' Z_{t+h})$
for some vectors $\tilde u$ and $\tilde v$. Indeed, using the notations
\[
\begin{aligned}
\tilde u' \tilde X_{t+1} &= u_1' \tilde x_{t+1} + u_{-1}' \tilde X_t, \\
\tilde v' Z_{t+1} &= v_1' z_{t+1} + v_{-1}' Z_t,
\end{aligned}
\]
with $u_{-1} = (u_2', \ldots, u_p', 0)'$, $v_{-1} = (v_2', \ldots, v_p', 0)'$, we get:
\[
\begin{aligned}
P_t(\tilde u, \tilde v; 1) = {} & \exp(-c' \tilde X_t - d' Z_t + u_{-1}' \tilde X_t + v_{-1}' Z_t) \\
& \times E_t^Q[\exp(u_1' \tilde x_{t+1} + v_1' z_{t+1})],
\end{aligned}
\tag{93}
\]
which, using the Car representation of $(\tilde x_{t+1}, z_{t+1})$ under the probability
Q, has the exponential linear form (92) and provides the initial
conditions of the recursive equations. The standard recursive equations then
provide the price $P_t(\tilde u, \tilde v; h)$ at date $t$ for the payoff $\exp(\tilde u' \tilde X_{t+h} + \tilde v' Z_{t+h})$.
So we have the following proposition.
Proposition 16 : The price $P_t(\tilde u, \tilde v; h)$ at time $t$ of the payoff
$g(\tilde X_{t+h}, Z_{t+h}) = \exp(\tilde u' \tilde X_{t+h} + \tilde v' Z_{t+h})$ has the exponential form (92), where
$C_h(g)$ and $D_h(g)$ follow the same recursive equations as in the zero-coupon
bond case, with initial values $C_1(g)$ and $D_1(g)$ given by the coefficients of
$\tilde X_t$ and $Z_t$ in equation (93).
When $\tilde u$ and $\tilde v$ have complex components, $P_t(\tilde u, \tilde v; h)$ provides the complex
Laplace transform $E_t[M_{t,t+h} \exp(\tilde u' \tilde X_{t+h} + \tilde v' Z_{t+h})]$.

5.2     Explicit and quasi explicit pricing formulas
The explicit formulas for zero-coupon bond prices also immediately provide
explicit formulas for some derivatives like swaps. Moreover, the result of
section 5.1, where $\tilde u$ and $\tilde v$ have complex components, can be used to price
payoffs of the form:
\[
\left[\exp(\tilde u_1' \tilde X_{t+h} + \tilde v_1' Z_{t+h}) - \exp(\tilde u_2' \tilde X_{t+h} + \tilde v_2' Z_{t+h})\right]^+,
\]
like caps, floors or options on zero-coupon bonds. Let us consider, for
instance, the problem of pricing, at date $t$, a European call option on the
zero-coupon bond $B(t + h, H - h)$; the pricing relation is:
\[
\begin{aligned}
p_t(K, h) &= E_t\left[M_{t,t+h}\,(B(t + h, H - h) - K)^+\right] \\
&= E_t\left[M_{t,t+h}\,(\exp[-(H - h) R(t + h, H - h)] - K)^+\right],
\end{aligned}
\tag{94}
\]

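and, substituting here the yield to maturity formula (68), for the SVARN(p)
model, or formula (90), for the SVARG(p) model, we can write:
\[
\begin{aligned}
p_t(K, h) &= E_t\left[M_{t,t+h}\,(\exp[C_{H-h}' \tilde X_{t+h} + D_{H-h}' Z_{t+h}] - K)^+\right] \\
&= E_t\left[M_{t,t+h}\,(\exp[C_{H-h}' \tilde X_{t+h} + D_{H-h}' Z_{t+h}] - K)\, I_{[-C_{H-h}' \tilde X_{t+h} - D_{H-h}' Z_{t+h} < -\log K]}\right] \\
&= E_t\left[M_{t,t+h} \exp[C_{H-h}' \tilde X_{t+h} + D_{H-h}' Z_{t+h}]\, I_{[-C_{H-h}' \tilde X_{t+h} - D_{H-h}' Z_{t+h} < -\log K]}\right] \\
&\quad - K\, E_t\left[M_{t,t+h}\, I_{[-C_{H-h}' \tilde X_{t+h} - D_{H-h}' Z_{t+h} < -\log K]}\right] \\
&= G_t(C_{H-h}, D_{H-h}, -C_{H-h}, -D_{H-h}, -\log K; h) \\
&\quad - K\, G_t(0, 0, -C_{H-h}, -D_{H-h}, -\log K; h),
\end{aligned}
\tag{95}
\]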
where I denotes the indicator function, and where

                  u ˜ ˜ ˜
              Gt (˜0 , v0 , u1 , v1 , K; h)

                         u ˜       ˜
         = Et Mt,t+h exp[˜0 Xt+h + v0 Zt+h ] I[−˜
                                                u                 ˜
                                                                 1 Xt+h −˜1 Zt+h <K]
                                                                         v

denotes the truncated real Laplace transform that we can deduce from the
(untruncated) complex Laplace transform. More precisely, we have the fol-
lowing formula [see Duffie, Pan, Singleton (2000) for details]:

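$$
G_t(\tilde u_0, \tilde v_0, \tilde u_1, \tilde v_1, K; h)
= \frac{P_t(\tilde u_0, \tilde v_0; h)}{2}
- \frac{1}{\pi} \int_0^{+\infty} \frac{\mathrm{Im}\big[ P_t(\tilde u_0 + i\tilde u_1 y,\ \tilde v_0 + i\tilde v_1 y;\ h)\, \exp(-iyK) \big]}{y}\, dy
\tag{96}
$$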
where Im(z) denotes the imaginary part of the complex number z. So,
formula (95) is quasi explicit since it only requires a simple (one-dimensional)
integration to derive the values of Gt .
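As a hedged numerical illustration of the inversion formula (96), the sketch below uses a toy case that is not taken from the paper: a single Gaussian variable X ~ N(mu, s^2) plays the role of the factor, the SDF is set to one, and P is the Gaussian Laplace transform. The indicator is taken as 1{u1 X < K}, the convention of Duffie, Pan and Singleton (2000). The function names, the midpoint quadrature and the truncation point y_max are illustrative assumptions. The one-dimensional integral is compared with the closed form available in this Gaussian case.

```python
import cmath
import math


def laplace_gaussian(u, mu, s):
    """Laplace transform E[exp(u X)] of X ~ N(mu, s^2); u may be complex."""
    return cmath.exp(u * mu + 0.5 * u * u * s * s)


def truncated_laplace(u0, u1, K, mu, s, y_max=60.0, n=60000):
    """E[exp(u0 X) 1{u1 X < K}] by one-dimensional Fourier inversion.

    Toy assumption: the integral over [0, +inf) is truncated at y_max and
    approximated with a midpoint rule on n cells.
    """
    dy = y_max / n
    total = 0.0
    for j in range(n):
        y = (j + 0.5) * dy  # midpoints avoid the removable singularity at y = 0
        val = laplace_gaussian(u0 + 1j * u1 * y, mu, s) * cmath.exp(-1j * y * K)
        total += val.imag / y
    return laplace_gaussian(u0, mu, s).real / 2.0 - total * dy / math.pi


def truncated_laplace_closed_form(u0, u1, K, mu, s):
    """Closed form of the same quantity in the Gaussian case, for u1 > 0."""
    z = (K / u1 - mu - u0 * s * s) / s
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return math.exp(u0 * mu + 0.5 * u0 * u0 * s * s) * cdf


if __name__ == "__main__":
    num = truncated_laplace(1.0, 1.0, 0.1, mu=0.05, s=0.3)
    ref = truncated_laplace_closed_form(1.0, 1.0, 0.1, mu=0.05, s=0.3)
    print(num, ref)
```

In the model of the paper the same one-dimensional integration would be applied to the complex Laplace transform Pt of the factor and regime variables rather than to the Gaussian stand-in used here.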




6     Empirical Analysis
6.1   Introduction
The purpose of this section is to propose an empirical analysis of the Gaus-
sian term structure models presented in Section 3, using observations on the
U. S. term structure of interest rates.
    We have seen that the Gaussian SVARN(p) Term Structure Models can
be characterized by an exogenous or an endogenous factor (xt ). In the present
empirical analysis we follow the endogenous approach, because observing
the factor (the short rate in the scalar case, or yields at different maturities
in the multivariate framework) brings several important advantages. First,
the data allow us to detect stylized facts on interest rates that justify the
autoregressive model with switching regimes we propose for the historical
dynamics of (xt ) : indeed, a large empirical literature on bond yields shows
that interest rates have a multi-lag historical dynamics characterized by
regime switching [see, among others, Hamilton (1989), Christiansen and
Lund (2003), Cochrane and Piazzesi (2005)]. Second, observations on the
Gaussian-distributed factor lead to a maximum likelihood estimation of the
historical parameters: in this way, we are able to test hypotheses using
likelihood ratio statistics, and to rank the models in terms of various
information criteria. Finally, the difference between directly observed and
estimated factor values determines model residuals that can be used to
derive various diagnostic criteria.
    Compared with this multi-lag regime-switching endogenous approach,
the classical continuous-time affine term structure approach à la Duffie and
Kan (1996) and Dai and Singleton (2000) has some different features. First,
the factors are in general assumed to be unobservable and therefore a
justification of the (historical) factor dynamics, along with a precise
econometric analysis of model residuals, is not possible. Second, in order
to reconstruct a time series of the latent factors for an exact maximum
likelihood estimation, the prices of some zero-coupon bonds are assumed to
be perfectly observed so that the pricing equations can be inverted [see
Chen and Scott (1987) and Pearson and Sun (1994)]; this inversion technique
depends on the zero-coupon bonds selected and on the values of the
parameters, which are not initially available, and therefore the reconstructed
time series is model-sensitive [see Collin-Dufresne, Goldstein and Jones
(2004) for a discussion]. Third, the class of discrete-time affine (Compound
Autoregressive) processes is much larger than the discrete-time counterpart
of the continuous-time affine class (see footnote 6) [see Gourieroux, Monfort
and Polimenis (2005), and Darolles, Gourieroux and Jasiak (2006)].
    We start the empirical analysis with the single-regime framework,
estimating AR(p) and VAR(p) Factor-Based Term Structure models [see
Monfort and Pegoraro (2006)] in a scalar (short rate) and a bivariate (short
rate and spread between the long and short rate) setting. The historical
parameters are estimated by exact Maximum Likelihood, while the
risk-neutral parameters are estimated by nonlinear least squares (NLLS).
We observe that the introduction of lags greatly improves the goodness-of-fit
of the models, and helps replicate stylized facts such as the increase of the
interest rate autocorrelation with the time to maturity.
    The next step of the empirical analysis concerns the regime-switching
framework, that is, the estimation of the SARN(p) and bivariate SVARN(p)
term structure models, where the latent variable (zt ) is assumed to be a
two-state non-homogeneous Markov chain. As in the single-regime
specifications, the factor is the short rate in the scalar case, and the short
rate and spread in the bivariate case. The historical parameters are estimated
by maximization of the likelihood function calculated using the
Kitagawa-Hamilton filter [see Hamilton (1994)].
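The filtering recursion behind such a likelihood can be sketched as follows. This is a minimal illustration under simplifying assumptions that are not those of the paper: a scalar AR(1) whose intercept and volatility switch with a two-state chain, and constant transition probabilities (the paper uses a non-homogeneous chain and more general dynamics). All names and numerical values are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def hamilton_loglik(x, nu, phi, sigma, trans, init):
    """Log-likelihood of x[1:], conditional on x[0], for the toy model
        x[t] = nu[z_t] + phi * x[t-1] + sigma[z_t] * eps[t],
    with z_t a two-state Markov chain.

    trans[i][j] = P(z_t = j | z_{t-1} = i) (assumed constant here);
    init[i] = P(z_0 = i). Returns the log-likelihood and the final
    filtered regime probabilities.
    """
    filt = list(init)
    loglik = 0.0
    for t in range(1, len(x)):
        # Prediction step: P(z_t = j | x[0..t-1])
        pred = [sum(filt[i] * trans[i][j] for i in range(2)) for j in range(2)]
        # Update step: weight each regime by its conditional density
        joint = [
            pred[j] * normal_pdf(x[t], nu[j] + phi * x[t - 1], sigma[j])
            for j in range(2)
        ]
        lik_t = joint[0] + joint[1]  # conditional density of x[t]
        loglik += math.log(lik_t)
        filt = [joint[0] / lik_t, joint[1] / lik_t]
    return loglik, filt


if __name__ == "__main__":
    rates = [0.0050, 0.0052, 0.0049, 0.0055, 0.0060, 0.0058]
    ll, probs = hamilton_loglik(
        rates, nu=[1e-4, 4e-4], phi=0.96, sigma=[4e-4, 9e-4],
        trans=[[0.95, 0.05], [0.10, 0.90]], init=[0.5, 0.5],
    )
    print(ll, probs)
```

Maximizing the returned log-likelihood over the parameters gives the ML estimator in this toy setting; when the two regimes share the same parameters, the recursion collapses to the ordinary Gaussian AR(1) log-likelihood, which is a convenient check.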

6.2    Description of the Data
The CRSP data set on the U. S. term structure of interest rates [Treasury
zero-coupon bond (ZCB) yields] that we consider in the following application
covers the period from June 1964 to December 1995 and contains 379
monthly observations for each of the nine maturities : 1, 3, 6 and 9 months
and 1, 2, 3, 4 and 5 years [Figure 1 shows a plot of (annualized) monthly
ZCB yields of maturity 1, 12 and 60 months] (see footnote 7). Summary
statistics on these (annualized) yields are presented in Table 1 : the term
structure is, on average, upward sloping and the yields with larger standard
deviation, skewness and kurtosis are those with shorter maturities.
Moreover, yields are highly autocorrelated, with a persistence which
increases with the time to maturity : we refer to this stylized fact as the
increasing term structure of autocorrelations [see Figure 2].

   6
      For instance, the discrete-time Gaussian VAR(1) process has a continuous-time
equivalent if and only if there exists a matrix ϑ such that ϕ = exp(−ϑ); moreover, any
Car(p) process [like the Gaussian VAR(p) process] with p ≥ 2 cannot be the time-discretized
version of a continuous-time affine process.
   7
      The same data set is used in the papers of Longstaff and Schwartz (1992) and Bansal
and Zhou (2002). We are grateful to Ravi Bansal and Hao Zhou for providing us with the
data set.

Table 1 : Summary Statistics on U. S. Monthly Yields from June 1964 to December 1995. ACF(k) indicates
the empirical autocorrelation between yields R(t, h) and R(t − k, h), with h and k expressed on a monthly basis.


      Maturity      1-m       3-m       6-m       9-m       1-yr      2-yr       3-yr      4-yr      5-yr


      Mean         0.0645    0.0672    0.0694    0.0709    0.0713    0.0734    0.0750    0.0762     0.0769
      Std. Dev.    0.0265    0.0271    0.0270    0.0269    0.0260    0.0252    0.0244    0.0240     0.0237
      Skewness     1.2111    1.2118    1.1518    1.1013    1.0307    0.9778    0.9615    0.9263     0.8791
      Kurtosis     4.5902    4.5237    4.3147    4.1605    3.9098    3.6612    3.5897    3.5063     3.3531
      Minimum      0.0265    0.0277    0.0287    0.0299    0.0311    0.0366    0.0387    0.0397     0.0398
      Maximum      0.1640    0.1612    0.1655    0.1644    0.1581    0.1564    0.1556    0.1582     0.1500


      ACF(5)       0.8288    0.8531    0.8579    0.8588    0.8604    0.8783    0.8915    0.8986     0.9053
      ACF(10)      0.7278    0.7590    0.7691    0.7699    0.7683    0.7885    0.8021    0.8075     0.8212
      ACF(15)      0.5887    0.6164    0.6285    0.6313    0.6395    0.6720    0.6908    0.6987     0.7201
      ACF(20)      0.4303    0.4631    0.4880    0.4996    0.5156    0.5742    0.6051    0.6193     0.6431




6.3      Estimated VAR(p) Factor-Based Term Structure Models
6.3.1      Estimation Method
The methodology we follow to estimate the parameters of the endogenous
VAR(p) term structure models is based on a consistent two-step procedure.
     In the first step, thanks to observations on the n-dimensional endogenous
factor (xt ), we estimate by Maximum Likelihood (ML) the
[n(1 + np) + n(n + 1)/2]-dimensional vector of parameters
θP = [ν', vec(ϕ)', vech(σσ')']', characterizing the historical dynamics of (xt ).
     In the second step, using observations on yields with maturities different
from those used in the first step, and for given estimates of vech(σσ'), we
estimate the [n(1 + np)]-dimensional vector of parameters
θQ = [(ν∗)', vec(ϕ∗)']', characterizing the risk-neutral dynamics of (xt ), by
minimizing the sum of squared fitting errors between the observed and
theoretical yields. More precisely, in the scalar case, we estimate θQ by
nonlinear least squares (NLLS), while, in the multivariate case, these
parameters are estimated by constrained NLLS. The constraints are imposed
to satisfy restrictions (69) implied by the absence of arbitrage opportunity
principle [see Section 3.8].
     Given the complete set of nine maturities of our database, and given
a number m of yields used to estimate the vector of historical parameters
θP , we denote by $H_m^*$ the set of remaining maturities used to estimate the
vector of risk-neutral parameters θQ .
     In the AR(p) Factor-Based case, xt is the one-month yield to maturity
R(t, 1) expressed at a monthly frequency, while, in the bivariate VAR(p)
Factor-Based case the factor is given by:

$$
x_t = [\, R(t,1),\ R(t,60) - R(t,1) \,]' ,
$$

where [R(t, 60) − R(t, 1)] is the spread at date t between the five-year and
one-month yield to maturity, expressed at a monthly frequency [see Ang and
Bekaert (2002), and Ang, Piazzesi and Wei (2005) for similar specifications].
   The NLLS estimator for the AR(p) case is determined by :

$$
\begin{cases}
\hat\theta_Q = \arg\min_{\theta_Q} S^2(\theta_Q), \\[4pt]
S^2(\theta_Q) = \displaystyle\sum_{t=p}^{T} \sum_{h \in H_1^*} [\tilde R(t,h) - R(t,h)]^2 ,
\end{cases}
\tag{97}
$$

given the set $H_1^*$ of maturities used to estimate the risk-neutral parameters;
$\tilde R(t,h)$ is the observed yield, and R(t, h) is the theoretical yield.
    The constrained NLLS estimator, in our bivariate model specification, is
given by :

$$
\begin{cases}
\hat\theta_Q = \arg\min_{\theta_Q} S^2(\theta_Q), \\[4pt]
S^2(\theta_Q) = \displaystyle\sum_{t=p}^{T} \sum_{h \in H_2^*} [\tilde R(t,h) - R(t,h)]^2 , \\[4pt]
\text{s. t.} \quad \displaystyle\sum_{t=p}^{T} [\tilde R(t,60) - R(t,60)]^2 = 0 ,
\end{cases}
\tag{98}
$$

where R(t, h) is the theoretical yield. The constraint in the minimization
program (98) guarantees the absence of arbitrage opportunity on the five-
year yield to maturity.

6.3.2   Estimation Results for the AR(p) model
Historical Parameter Estimates
The maximum value of the mean Log-Likelihood and the values of the
estimated vector of parameters θP = (ν, ϕ1 , . . . , ϕp , σ) of the AR(p)
Factor-Based Term Structure models, for p ∈ {1, . . . , 6}, are presented in
Tables 2 and 3 [the t-values are given in brackets]. We also rank the models
in terms of the Akaike Information Criterion (AIC).




Table 2 : AR(p) Factor-Based Term Structure models. Maximum value of the mean Log-Likelihood, AIC and
parameter estimates of ν and σ². The short rate observations are expressed at a monthly frequency. Parameter
estimates are expressed in basis points (bp). We denote by mlogL the mean log-Likelihood of the AR(p)
model : mlogL = logL(θP |x1 , . . . , xT −p )/(T − p). (∗∗) denotes a parameter significant at 0.05; (∗) denotes a
parameter significant at 0.1. The Akaike Information Criterion (AIC) is given by 2mlogL − (2k/(T − p)), with
k denoting the dimension of θP .


                  AR(1)          AR(2)          AR(3)          AR(4)          AR(5)          AR(6)

     mlogL       5.95657        5.95868        5.96082        5.96134        5.97224        5.97092

     AIC         11.8973        11.8961        11.8950        11.8907        11.9071        11.8990

     ν           2.3∗∗ bp       2.1∗∗ bp       2.3∗∗ bp       2.1∗∗ bp       1.9∗∗ bp       1.9∗∗ bp
                 [2.6725]       [2.4822]       [2.6598]       [2.4761]       [2.1571]       [2.1262]

     σ²        0.0039∗∗ bp    0.0039∗∗ bp    0.0039∗∗ bp    0.0039∗∗ bp    0.0038∗∗ bp    0.0038∗∗ bp
                [13.7483]      [13.7301]      [13.7118]      [13.6937]      [13.6754]      [13.6571]




Table 3 : AR(p) Factor-Based Term Structure models. Parameter estimates of (ϕ1 , . . . , ϕp ). (∗∗ ) denotes a
parameter significant at 0.05; (∗ ) denotes a parameter significant at 0.1.


                        AR(1)         AR(2)           AR(3)              AR(4)                AR(5)           AR(6)


               ϕ1     0.9580 ∗∗     0.8798 ∗∗        0.8861 ∗∗          0.8912 ∗∗         0.8814 ∗∗          0.8806 ∗∗
                      [65.5620]     [17.2393]        [17.1525]          [17.1688]         [17.1628]          [16.9714]

               ϕ2                     0.0811         0.1547 ∗∗          0.1456 ∗∗         0.1672 ∗∗          0.1675 ∗∗
                                     [1.5938]         [2.2869]           [2.0843]          [2.4260]           [2.3885]

               ϕ3                                    −0.0829 ∗      −0.1372 ∗            −0.1595 ∗∗        −0.1586 ∗∗
                                                     [−1.6459]      [−1.9204]             [−2.3048]         [−2.2623]

               ϕ4                                                        0.0608            −0.0790            −0.0798
                                                                        [1.1455]          [−1.1788]          [−1.1240]

               ϕ5                                                                         0.1557 ∗∗          0.1510 ∗∗
                                                                                           [3.1048]           [2.4443]

               ϕ6                                                                                             0.0053
                                                                                                             [0.1232]




    An examination of the parameter estimates displayed above shows, first
of all, that the historical dynamics of the (one-month to maturity) short rate
is not Markovian of order one, given that, in the AR(5) and AR(6)
specifications, the parameters (ϕ1 , ϕ2 , ϕ3 , ϕ5 ) are always significant.
Moreover, the AIC indicates these models as the preferred ones.
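The ranking can be checked directly from the caption of Table 2. The short sketch below recomputes the AIC row from the reported mean log-likelihoods, assuming T = 379 observations (Section 6.2) and k = p + 2 parameters in the scalar case, as suggested by θP = (ν, ϕ1 , . . . , ϕp , σ); both are our reading of the text rather than values stated next to the table. With the sign convention used in the caption, larger values are preferred.

```python
T = 379  # number of monthly observations (Section 6.2)

# Mean log-likelihoods reported in Table 2, indexed by the AR order p
mlogl = {1: 5.95657, 2: 5.95868, 3: 5.96082, 4: 5.96134, 5: 5.97224, 6: 5.97092}


def aic(mean_loglik, k, n_obs):
    """AIC as defined in the caption of Table 2: 2*mlogL - 2k/(T - p)."""
    return 2.0 * mean_loglik - 2.0 * k / n_obs


# Assumption: k = p + 2 parameters (nu, phi_1..phi_p, sigma)
aic_by_order = {p: aic(m, p + 2, T - p) for p, m in mlogl.items()}

# With this sign convention the preferred model has the largest AIC
best = max(aic_by_order, key=aic_by_order.get)

if __name__ == "__main__":
    for p in sorted(aic_by_order):
        print(p, round(aic_by_order[p], 4))
    print("preferred order:", best)
```

Under these assumptions the recomputed values agree with the AIC row of Table 2 up to rounding, with the AR(5) model ranked first and the AR(6) model second.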
    Another indication of the key role played by the lagged values in
obtaining well-specified models for the short rate historical dynamics is given
by the Ljung-Box test (for the Gaussian AR(p) short rate model residuals)
presented in Table 4.



Table 4 : AR(p) Factor-Based Term Structure models. Ljung-Box test for model residuals. (∗∗ ) denotes the
null hypothesis accepted at 0.05; (∗ ) denotes the null hypothesis accepted at 0.01.


           Lags         AR(1)          AR(2)          AR(3)          AR(4)          AR(5)          AR(6)

             5       13.2842 ∗      11.7456 ∗       9.2398 ∗∗      9.0748 ∗∗      0.2110 ∗∗      0.2933 ∗∗

            10       20.7693 ∗      19.4151 ∗      17.4643 ∗∗     17.3474 ∗∗      8.8896 ∗∗      8.7842 ∗∗

            15       33.2058        28.8190 ∗      25.5995 ∗      25.3238 ∗      18.3397 ∗∗     18.2022 ∗∗

            20       43.9331        41.4980        39.3640        38.2511        28.2745 ∗∗     28.1044 ∗∗




We observe that the models with small autoregressive orders are not able
to pass the test for large lags, which indicates that these specifications fail
to explain the strong autocorrelation characterizing the short rate [see Table
1]. We are able to explain the short rate persistence also for large lags only
when the autoregressive order moves to p = 5 and p = 6.
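For reference, the test statistic can be sketched as follows, using the standard Ljung-Box formula Q(m) = n(n + 2) Σ_{k=1..m} ρ_k² / (n − k). The comparison with chi-square critical values at m degrees of freedom (without correction for the estimated AR parameters) is an assumption on our part; it happens to reproduce the markers of Table 4. Function names are illustrative.

```python
def autocorr(x, k):
    """Empirical autocorrelation of the sequence x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def ljung_box(resid, m):
    """Ljung-Box statistic Q(m) computed from the first m autocorrelations."""
    n = len(resid)
    return n * (n + 2) * sum(autocorr(resid, k) ** 2 / (n - k) for k in range(1, m + 1))


# Chi-square critical values (5% level, 1% level) for m degrees of freedom
CHI2_CRIT = {5: (11.070, 15.086), 10: (18.307, 23.209),
             15: (24.996, 30.578), 20: (31.410, 37.566)}


def decision(q, m):
    """Classify a statistic as in the tables: accepted at 0.05, at 0.01, or rejected."""
    c05, c01 = CHI2_CRIT[m]
    if q < c05:
        return "accepted at 0.05"
    if q < c01:
        return "accepted at 0.01"
    return "rejected"


if __name__ == "__main__":
    # Statistics of the AR(1) column of Table 4
    for m, q in [(5, 13.2842), (10, 20.7693), (15, 33.2058), (20, 43.9331)]:
        print(m, q, decision(q, m))
```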

Risk-Neutral Parameter Estimates
The minimum value of the mean nonlinear least squares criterion
[$S^2(\hat\theta_Q)/(T-p)$] and the values of the estimated vector of risk-neutral
parameters $\theta_Q = (\nu^*, \varphi_1^*, \ldots, \varphi_p^*)$, with p ∈ {1, . . . , 6}, are presented in
Tables 5 and 6 [the t-values are given in brackets].
Table 5 : AR(p) Factor-Based Term Structure models. Minimum value of the mean NLLS criterion, RMSE,
MAE and parameter estimates of ν∗ . Yields to maturity observations are expressed at a monthly frequency.
Parameter estimates are expressed in basis points (bp). (∗∗ ) denotes a parameter significant at 0.05; (∗ )
denotes a parameter significant at 0.1.


                          AR(1)         AR(2)         AR(3)         AR(4)         AR(5)         AR(6)

   S²(θ̂Q)/(T − p)      0.00000054    0.00000051    0.00000050    0.00000048    0.00000047    0.00000046

   RMSE                  0.000736      0.000716      0.000709      0.000696      0.000687      0.000679

   MAE                   0.000530      0.000526      0.000528      0.000524      0.000517      0.000509

   ν∗                   1.10∗∗ bp     1.51∗∗ bp     1.52∗∗ bp     1.48∗∗ bp     1.48∗∗ bp     1.52∗∗ bp
                        [33.2526]     [22.6031]     [22.9266]     [22.9794]     [22.7051]     [22.4479]




Table 6 : AR(p) Factor-Based Term Structure models. Parameter estimates of (ϕ∗_1 , . . . , ϕ∗_p ). (∗∗ ) denotes a
parameter significant at 0.05; (∗ ) denotes a parameter significant at 0.1.


                    AR(1)         AR(2)         AR(3)         AR(4)         AR(5)         AR(6)

           ϕ∗_1   0.9899 ∗∗     0.5076 ∗∗     0.7333 ∗∗     0.7758 ∗∗     0.7382 ∗∗     0.7037 ∗∗
                    [1877]      [9.6003]     [14.2703]     [15.4922]     [14.2057]     [13.3209]

           ϕ∗_2                 0.4788 ∗∗      −0.0299      0.2291 ∗∗     0.2947 ∗∗     0.2998 ∗∗
                                [9.1313]     [−0.4132]      [2.8931]      [3.6124]      [3.6802]

           ϕ∗_3                               0.2832 ∗∗    −0.3860 ∗∗    −0.1600 ∗∗      −0.1069
                                              [7.5221]     [−5.3681]     [−2.0834]     [−1.3898]

           ϕ∗_4                                             0.3685 ∗∗    −0.1977 ∗∗       0.0123
                                                           [10.2233]     [−2.6864]      [0.1609]

           ϕ∗_5                                                           0.3126 ∗∗    −0.2173 ∗∗
                                                                          [8.4180]     [−2.9386]

           ϕ∗_6                                                                         0.2961 ∗∗
                                                                                        [7.7697]




    One may observe that all AR risk-neutral coefficients are significant
in the AR(4) and AR(5) model specifications, and that the coefficients
$\varphi_5^*$ and $\varphi_6^*$ are significant in the AR(6) case.


6.3.3       Estimation Results for the bivariate VAR(p) model
Historical Parameter Estimates
As in the scalar case, we present the maximum value of the mean
Log-Likelihood and the values of the estimated vector of parameters
θP = [ν', vec(ϕ)', vech(σσ')']' of the bivariate VAR(p) Factor-Based Term
Structure models, for an AR order p = 1 and p = 2. These results are
presented in Tables 7 and 8 [the t-values are given in brackets]. We also rank
the models in terms of the Akaike Information Criterion (AIC) (see footnote 8).

   8
    We have also estimated the historical parameters of the above-mentioned bivariate
VAR(p) model for p larger than 2, but the AIC has indicated the first two AR orders as
the preferred ones.

Table 7 : VAR(p) Factor-Based Term Structure models. Maximum value of the mean Log-Likelihood, AIC and
parameter estimates of (ν1 , ν2 ) and (σ²_1 , σ21 , σ²_2 ). The short rate and long rate observations are expressed at a
monthly frequency. Parameter estimates are expressed in basis points (bp). We denote by mlogL the mean
log-Likelihood of the VAR(p) model : mlogL = logL(θP |x1 , . . . , xT −p )/(T − p). (∗∗ ) denotes a parameter
significant at 0.05; (∗ ) denotes a parameter significant at 0.1. The Akaike Information Criterion (AIC) is given
by 2mlogL − (2k/(T − p)), with k denoting the dimension of θP .


                                                  VAR(1)                VAR(2)

                             mlogL                12.6403               12.6837

                             AIC                  25.2330               25.2984

                             ν1                   0.65 bp               1.32 bp
                                                  [0.5856]              [1.2262]

                             ν2                   0.80 bp               0.26 bp
                                                  [0.8157]              [0.2701]

                             σ²_1               0.0039∗∗ bp           0.0036∗∗ bp
                                                 [5.94750]             [6.02614]

                             σ21               −0.0028∗∗ bp          −0.0026∗∗ bp
                                                 [−6.0995]             [−6.2100]

                             σ²_2               0.0030∗∗ bp           0.0028∗∗ bp
                                                  [7.6713]              [8.0731]




Table 8 : VAR(1) and VAR(2) Factor-Based Term Structure models. Parameter estimates of (ϕ1 , ϕ2 ). (∗∗ )
denotes a parameter significant at 0.05; (∗ ) denotes a parameter significant at 0.1.


                                     VAR(1)                        VAR(2)


                        ϕ1          0.9742∗∗       0.0719∗∗       1.3318∗∗          0.6207∗∗

                                    [59.8835]       [2.2174]      [15.0111]         [7.0095]

                                     0.0091        0.8769∗∗       −0.2744∗∗         0.4353∗∗

                                     [0.6388]       [30.7835]     [-3.4988]         [5.5601]

                        ϕ2                                        −0.3648∗∗     −0.5762∗∗

                                                                  [-3.6117]         [-5.8201]

                                                                  0.2893∗∗          0.4642∗∗

                                                                   [3.2397]         [5.3020]




    If we consider the parameter estimates of Tables 7 and 8, we observe
that the joint historical dynamics of short rate and spread is not Markovian
of order one, given that, in the VAR(2) specification, the parameters in
the second autoregressive matrix ϕ2 are significantly different from zero.
Moreover, the AIC indicates this model as the preferred one. Table 7 also
shows that the constant term (ν1 , ν2 ) is not significant for either AR order.

Table 9 : VAR(p) Factor-Based Term Structure models. LBshort denotes the value of the Ljung-Box test
statistic for short rate residuals, while LBspread denotes the value of the Ljung-Box test statistic for spread
residuals. Q denotes the value of the (adjusted) Portmanteau test statistic for the VAR(1) and VAR(2) model
residuals. (∗∗ ) denotes the null hypothesis accepted at 0.05; (∗ ) denotes the null hypothesis accepted at 0.01.


                                     VAR(1)                                   VAR(2)


             Lags      LBshort      LBspread         Q          LBshort      LBspread         Q


               5      10.2909 ∗∗     22.8735      58.9900      5.8051 ∗∗     12.4876 ∗     29.9870

              10      13.5065 ∗∗     34.1952      82.6678      9.0209 ∗∗     19.8705 ∗    51.2640 ∗

              15      29.2167 ∗      63.4161     134.2488     23.0114 ∗∗      35.8770      91.3809

              20      36.2600 ∗      69.7833     165.8618     30.0413 ∗∗      45.5552      127.9043




    With regard to the autocorrelation analysis of model residuals, presented
in Table 9, we observe that the ability of the VAR(p) model to explain the
serial dependence in the (univariate and joint) short rate and spread
historical dynamics improves when we move from the VAR(1) to the VAR(2)
specification, even though neither model is able to pass the portmanteau test
on the bivariate residual vectors. Indeed, for p = 1, the Ljung-Box test on
the short rate residuals accepts the absence of serial correlation only at 0.01
for large lags, while the same test on the spread residuals strongly rejects it
for all lags. For p = 2, the Ljung-Box test applied to the short rate residuals
always accepts the absence of serial correlation at 0.05, while, applied to
the spread residuals, it accepts it for five and ten lags, but rejects it for
larger lags. The rejection by the portmanteau test (stronger when p = 1)
stresses the difficulty of the specified models in explaining the serial
dependence in the spread historical dynamics.

Risk-Neutral Parameter Estimates
We present the minimum value of the mean nonlinear least squares criterion
[$S^2(\hat\theta_Q)/(T-p)$] and the values of the estimated vector of risk-neutral
parameters θQ = [(ν∗)', vec(ϕ∗)']', for the bivariate VAR(1) and VAR(2)
Factor-Based Term Structure models, in Tables 10 and 11 [the t-values are
given in brackets].




Table 10 : VAR(p) Factor-Based Term Structure models. Minimum value of the mean NLLS criterion, RMSE,
MAE and parameter estimates of (ν∗_1 , ν∗_2 ). Yields to maturity observations are expressed at a monthly
frequency. Parameter estimates are expressed in basis points (bp). (∗∗ ) denotes a parameter significant at 0.05;
(∗ ) denotes a parameter significant at 0.1.


                                                   VAR(1)                    VAR(2)

                           S²(θ̂Q)/(T − p)       0.00000009                0.00000008

                           RMSE                   0.000297                  0.000283

                           MAE                    0.000208                  0.000198

                           ν∗_1                 −0.58∗∗ bp                −0.55∗∗ bp
                                                  [−6.6459]                 [−4.9423]

                           ν∗_2                  0.72∗∗ bp                 0.71∗∗ bp
                                                  [5.7860]                  [4.5783]




Table 11 : VAR(1) and VAR(2) Factor-Based Term Structure models. Parameter estimates of (ϕ∗_1 , ϕ∗_2 ). (∗∗ )
denotes a parameter significant at 0.05; (∗ ) denotes a parameter significant at 0.1.


                                      VAR(1)                           VAR(2)

                      ϕ∗_1      1.0131∗∗       0.1105∗∗          1.3154∗∗       0.6020∗∗
                               [805.8869]     [34.5743]         [28.4716]      [9.5120]

                               −0.0156∗∗       0.9072∗∗         −0.2528∗∗       0.4142∗∗
                                [−8.6611]    [203.2978]         [−3.5778]      [4.2509]

                      ϕ∗_2                                      −0.3004∗∗      −0.4890∗∗
                                                                [−6.5177]     [−7.8923]

                                                                 0.2342∗∗       0.4839∗∗
                                                                 [3.3244]      [5.0769]




    We find that, also in this bivariate risk-neutral (pricing) framework, the
lagged values of the short rate and spread play an important role in the
model specification. Moreover, all risk-neutral AR coefficients are significant
in the VAR(2) specification.
    The goodness-of-fit of the VAR(2) Factor-Based Term Structure Model
is better than that of the VAR(1) specification. In other words, a VAR(2)
specification for the historical and risk-neutral dynamics of the factor
driving term structure shapes leads to a bivariate term structure model
which fits yields to maturity better than the VAR(1) (bivariate endogenous
Vasicek) specification.




6.3.4   The Term Structure of Autocorrelations
We have mentioned in Section 6.2 that interest rates are characterized by an
autocorrelation which is, for each lag, increasing with the time to maturity.
The purpose of this section is to propose this stylized fact as a new test that
a well-specified term structure model should be able to pass, and to verify
whether the endogenous AR(p) and VAR(p) term structure models presented
in the previous sections are able to replicate the above-mentioned shape of
the term structure of autocorrelations.
    This test is based on the comparison between the empirical and
model-implied term structures of yield autocorrelations, for each of the
estimated models, at 5, 10, 15 and 20 lags. The model-implied autocorrelations
are calculated on the basis of 10,000 simulations of yields from each of the
fitted yield-to-maturity formulas [estimated using maturities $h \in H_1^*$ in the
scalar case, and maturities $h \in H_2^*$ in the bivariate case], with each
replication having a sample size of 379. The replications of yields are based
on the simulation of factor values using the estimated historical dynamics.
The results, for each of the selected lag lengths, are presented in
Figures 3, 4, 5 and 6, respectively.
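The simulation step can be sketched as follows for the factor alone. This is a reduced illustration, not the procedure used for the figures: it simulates only the short rate under the AR(5) historical estimates of Tables 2 and 3, uses 200 replications instead of 10,000, and does not apply the fitted yield-to-maturity formulas. The unit conversion (1 bp = 1e-4, applied to ν and to σ²), the burn-in length and the function names are assumptions.

```python
import random


def autocorr(x, k):
    """Empirical autocorrelation of the sequence x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def simulate_ar(nu, phi, sigma, n_obs, rng, burn_in=200):
    """Simulate a Gaussian AR(p) path of length n_obs after a burn-in period."""
    p = len(phi)
    mean = nu / (1.0 - sum(phi))  # unconditional mean, used as starting value
    path = [mean] * p
    for _ in range(burn_in + n_obs):
        nxt = nu + sum(phi[i] * path[-1 - i] for i in range(p))
        path.append(nxt + sigma * rng.gauss(0.0, 1.0))
    return path[-n_obs:]


def mean_acf(nu, phi, sigma, lags, n_obs=379, n_rep=200, seed=0):
    """Average, over replications, of the sample autocorrelations at the given lags."""
    rng = random.Random(seed)
    sums = [0.0] * len(lags)
    for _ in range(n_rep):
        x = simulate_ar(nu, phi, sigma, n_obs, rng)
        for j, k in enumerate(lags):
            sums[j] += autocorr(x, k)
    return [s / n_rep for s in sums]


# AR(5) historical estimates from Tables 2 and 3 (assumed units: 1 bp = 1e-4)
NU = 1.9e-4
PHI = [0.8814, 0.1672, -0.1595, -0.0790, 0.1557]
SIGMA = (0.0038e-4) ** 0.5

acf = mean_acf(NU, PHI, SIGMA, lags=[5, 10, 15, 20])

if __name__ == "__main__":
    print(acf)
```

The averaged values can then be set against the 1-m column of Table 1; repeating the exercise for the model-implied yields at the other maturities would require the pricing formulas of Section 3.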
    The first result we observe is that, for each lag, the model specifications
giving the best fit of the short-term part (1, 3 and 6 months to maturity)
of the autocorrelation curve are the scalar AR(5) and AR(6) term structure
models, and not the bivariate specifications. Moreover, for 5 and 10 lags,
this dominance extends to 9 and 12 months to maturity. Therefore, it seems
that, if we want to correctly replicate the short-term part of the
autocorrelation curve, it is more important to increase the autoregressive
order in the AR(p) short-rate term structure model than to move to a
bivariate (short rate and spread) setting. In other words, the AR(5) and
AR(6) short-rate term structure models lead to a more precise representation
of the persistence of short-term interest rates than the proposed bivariate
cases. Nevertheless, for maturities larger than 1 year, the best fit is obtained
from the bivariate VAR(1) and VAR(2) specifications, thanks to the
information on the long-term part of the yield curve supplied by the spread
factor. In particular, while scalar models replicate an autocorrelation curve
which is flat in the long-term part, the VAR(1) and VAR(2) models are
able to reproduce the observed increasing shape for each lag and time to
maturity. In addition, if we compare the performances of the two bivariate
cases, we observe that the introduction of the second lag improves the
replication of the short-term part of the autocorrelation curve for 5 and
10 lags, while, for larger lags or maturities, the best fit is obtained in
the VAR(1) setting.

6.4     Estimated SARN(p) Term Structure Models [to be completed]
6.4.1     Estimated Models and Estimation Method
In the following sections we present the parameter estimates of alternative
regime-switching short rate historical dynamics, specified as particular
cases of the general relation (40) presented in Section 3.1. In particular,
we will first estimate the following model [see Hamilton (1988)] :

\[
x_{t+1} - \nu' z_t = \varphi_1 \, [\, x_t - \nu' z_{t-1} \,] + \ldots
  + \varphi_p \, [\, x_{t+1-p} - \nu' z_{t-p} \,] + (\sigma' z_t)\, \varepsilon_{t+1} ,
\tag{99}
\]

and, then, we will consider the generalization where the autoregressive
coefficients are also functions of the regime indicator :

\[
x_{t+1} - \nu' z_t = (\varphi_1' z_{t-1}) \, [\, x_t - \nu' z_{t-1} \,] + \ldots
  + (\varphi_p' z_{t-p}) \, [\, x_{t+1-p} - \nu' z_{t-p} \,] + (\sigma' z_t)\, \varepsilon_{t+1} .
\tag{100}
\]

Model (99) will be called the SARN1 (p) model, while model (100) will be
called the SARN2 (p) model. In both cases (εt ) is a Gaussian white noise with
N (0, 1) distribution, p ∈ {1, . . . , 6}, and (zt ) is a 2-state
non-homogeneous Markov chain whose transition probabilities have the
following logistic form:

\[
P(z_{t+1} = e_i \mid z_t = e_i , r_{t+1}) = \pi(e_i , e_i , r_{t+1})
  = \frac{e^{a_i + b_i r_{t+1}}}{1 + e^{a_i + b_i r_{t+1}}} , \qquad i \in \{1, 2\} .
\tag{101}
\]

The estimates are obtained by maximizing the likelihood function, calculated
by means of the Kitagawa-Hamilton filter [see Hamilton (1994)].
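To make the data-generating process in (99) and (101) concrete, the following sketch simulates a SARN1(p) path. It is an illustration under assumptions rather than the estimation code: regimes e1 and e2 are coded as 0 and 1, the conditioning variable r_{t+1} in (101) is taken to be the current factor value x_t (the endogenous short-rate case), the start-up values are arbitrary, and all numerical parameter values are hypothetical.

```python
import math
import random

def stay_probability(a_i, b_i, r):
    """Logistic probability, as in (101), of remaining in the current regime."""
    return 1.0 / (1.0 + math.exp(-(a_i + b_i * r)))

def simulate_sarn1(phi, nu, sigma, a, b, n_obs, x_init, seed=0):
    """Simulate n_obs steps of model (99) with regime-dependent mean and volatility.

    phi   : AR coefficients [phi_1, ..., phi_p] (regime independent in SARN1)
    nu    : regime means (nu_1, nu_2); sigma : regime volatilities
    a, b  : logistic coefficients of the staying probabilities, one pair per regime
    """
    rng = random.Random(seed)
    p = len(phi)
    x = [x_init] * (p + 1)      # arbitrary start-up values x_0, ..., x_p
    z = [0] * (p + 1)           # start-up regimes (all in the first regime)
    for t in range(p, p + n_obs):
        ar_part = sum(phi[i - 1] * (x[t + 1 - i] - nu[z[t - i]])
                      for i in range(1, p + 1))
        x_next = nu[z[t]] + ar_part + sigma[z[t]] * rng.gauss(0.0, 1.0)
        # assumption: r_{t+1} is the short rate known at t, i.e. x_t
        stay = stay_probability(a[z[t]], b[z[t]], x[t])
        z_next = z[t] if rng.random() < stay else 1 - z[t]
        x.append(x_next)
        z.append(z_next)
    return x[p + 1:], z[p + 1:]

# Hypothetical parameter values (monthly short rate in decimal form).
path, regimes = simulate_sarn1(phi=[0.90, 0.05], nu=(0.0056, 0.0051),
                               sigma=(0.0003, 0.0012),
                               a=(5.6, -4.6), b=(-500.0, 900.0),
                               n_obs=379, x_init=0.0056, seed=1)
print(len(path), sum(regimes) / len(regimes))   # sample size, share of regime 2
```

The SARN2(p) case of (100) would only require replacing `phi[i - 1]` by a regime-dependent coefficient indexed by `z[t - i]`.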

6.4.2     Estimation Results for the SARN1 (p) model
Historical Parameter Estimates
The maximum value of the mean Log-Likelihood and the values of the estimated
vector of parameters θP = [ ν1 , ν2 , ϕ1 , . . . , ϕp , σ1 , σ2 , a1 , b1 , a2 , b2 ] of
the SARN1 (p) model, for p ∈ {1, . . . , 6}, are presented in Tables 12, 13 and
14 [the t-values are given in brackets].
    An analysis of the parameter estimates presented in Table 13 shows
that, even with the introduction of (non-homogeneous) switching regimes
in the parameters, the short rate historical dynamics is not Markovian of
order one. Indeed, the AR coefficient ϕ4 is significant in the SARN1 (4),
SARN1 (5) and SARN1 (6) models, and ϕ5 is significant (at the 0.1 level) in
the SARN1 (5) case. In other words, the important role played by
higher-order AR (historical) coefficients in the AR(p) short rate dynamics
of Section 6.3.2 seems to be not just a misspecification effect, induced by
the lack of nonlinearities in the model, but a genuine feature of the short
rate behavior. In addition, the AIC indicates models SARN1 (4) and SARN1 (5)
as the preferred specifications [see Table 12].
    In Table 12 we observe that (ν1 , ν2 ) and (σ1 , σ2 ) are always significantly
different from zero in each of the estimated models, while Table 14
highlights the significant role played by the lagged short rate in the
transition probabilities. Parameter b1 is significantly different from zero
for models SARN1 (1), SARN1 (4), SARN1 (5) and SARN1 (6), and parameter b2 is
significant for all the estimated models. In the first (low volatility)
regime, the negative sign and the magnitude of b1 imply that, when the short
rate increases, the probability of switching to the second (high volatility)
regime increases, while, in the high volatility regime, the positive sign of
b2 induces, as the short rate rises, an increased probability of remaining in
this second regime [see Figure 7, and Ang and Bekaert (2002b) for similar
results].
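As a worked illustration of the sign effects just described, the snippet below evaluates the logistic form (101) at the SARN1(4) estimates of (a1, b1, a2, b2) reported in Table 14. The grid of short-rate values is hypothetical, and it is an assumption of this sketch that the short rate enters the formula as a monthly rate in decimal form (for example 0.005 for 50 bp per month).

```python
import math

def stay_probability(a_i, b_i, r):
    """Probability of remaining in regime i given the short rate r, as in (101)."""
    return 1.0 / (1.0 + math.exp(-(a_i + b_i * r)))

# SARN1(4) column of Table 14.
a1, b1 = 6.1772, -588.1480     # low volatility regime
a2, b2 = -4.3966, 880.3991     # high volatility regime

rates = (0.003, 0.005, 0.007, 0.010)   # hypothetical monthly short rates
probs = [(r, stay_probability(a1, b1, r), stay_probability(a2, b2, r))
         for r in rates]

for r, p11, p22 in probs:
    # 1 - p11 is the probability of leaving the low volatility regime
    print(f"r = {r:.3f}: P(stay low vol) = {p11:.3f}, "
          f"P(switch to high vol) = {1 - p11:.3f}, P(stay high vol) = {p22:.3f}")
```

Under these assumptions the staying probability of the first regime falls, and that of the second regime rises, as the short rate increases, which is the pattern described in the text.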
Table 12 : SARN1 (p) model. Maximum value of the mean Log-Likelihood, AIC and parameter estimates of
ν1 , ν2 , σ1² , σ2² . The short rate observations are expressed at a monthly frequency. Parameter estimates are
expressed in basis points (bp). We denote with mlogL the mean log-Likelihood of the model :
mlogL = logL(θP |x1 , . . . , xT −p )/(T − p). (∗∗ ) denotes a parameter significant at 0.05; (∗ ) denotes a
parameter significant at 0.1. The Akaike Information Criterion (AIC) is given by 2mlogL − (2k/(T − p)), with
k denoting the dimension of θP .


              Vasicek       SARN1 (1)        SARN1 (2)        SARN1 (3)       SARN1 (4)        SARN1 (5)         SARN1 (6)


 mlogL        5.95657         6.2972           6.3015           6.3012          6.3073           6.3107            6.3085

   AIC        11.8973         12.5468         12.5499          12.5439          12.5506         12.5519           12.5419


   ν1        0.23∗∗ bp        56∗∗ bp          69∗ bp          64∗∗ bp         56∗∗ bp          54∗∗ bp           54∗∗ bp
              [2.6725]        [4.1940]        [1.9038]         [2.4589]        [5.1030]         [4.9714]          [4.8997]

   ν2            −            51∗∗ bp          68∗ bp          63∗∗ bp         50∗∗ bp          48∗∗ bp           47∗∗ bp
                              [3.6951]        [1.8661]         [2.3859]        [4.4237]         [4.2362]          [4.1762]

   σ1²      0.0039∗∗ bp    0.00078∗∗ bp     0.00078∗∗ bp    0.00078∗∗ bp     0.00090∗∗ bp    0.00078∗∗ bp       0.00078∗∗ bp
              [13.7483]       [7.9280]         [7.6578]        [7.3951]         [9.0231]        [9.1997]           [9.2470]

   σ2²           −          0.0144∗∗ bp     0.0144∗∗ bp      0.0144∗∗ bp     0.0158∗∗ bp      0.0158∗∗ bp       0.0161∗∗ bp
                              [5.8170]        [5.8270]         [5.7826]        [5.6030]         [5.2660]          [5.2091]
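The AIC definition given in the caption of Table 12 can be checked against the reported figures with a few lines of arithmetic. The sketch below assumes T = 379 monthly observations (the sample size quoted for the simulations in Section 6.3.4) and k = p + 8 parameters for the SARN1(p) model, i.e. the dimension of θP; both are inferences from the text, not values stated in the table. Note that, with this definition, larger AIC values are preferred.

```python
def aic(mlogl, k, n_eff):
    """AIC as defined in the caption of Table 12: 2 * mlogL - 2k / (T - p)."""
    return 2.0 * mlogl - 2.0 * k / n_eff

T = 379   # assumption: number of monthly observations, June 1964 to December 1995

# Mean log-likelihoods of the SARN1(p) models reported in Table 12.
mlogl_by_order = {1: 6.2972, 2: 6.3015, 3: 6.3012, 4: 6.3073, 5: 6.3107, 6: 6.3085}

# k = p + 8: two means, p AR coefficients, two volatilities, four logistic terms.
aic_by_order = {p: aic(m, p + 8, T - p) for p, m in mlogl_by_order.items()}
ranking = sorted(aic_by_order, key=aic_by_order.get, reverse=True)

for p in sorted(aic_by_order):
    print(f"SARN1({p}): AIC = {aic_by_order[p]:.4f}")
print("orders ranked by AIC (best first):", ranking)
```

Under these assumptions the computed values agree with the AIC row of Table 12 up to rounding of the reported mean log-likelihoods, and the two highest values are obtained for p = 5 and p = 4.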




Table 13 : SARN1 (p) model. Parameter estimates of (ϕ1 , . . . , ϕp ). (∗∗ ) denotes a parameter significant at
0.05; (∗ ) denotes a parameter significant at 0.1.


                Vasicek      SARN1 (1)        SARN1 (2)    SARN1 (3)    SARN1 (4)     SARN1 (5)      SARN1 (6)


       ϕ1    0.9580∗∗         0.9834∗∗        0.9001∗∗     0.9040∗∗     0.8946∗∗      0.9318∗∗       0.9294∗∗
             [65.5556]        [83.7917]       [17.7576]    [17.9132]    [14.7545]     [14.7533]      [15.0001]

       ϕ2          −             −             0.0911∗     0.1443∗∗     0.1788∗∗      0.1619∗∗       0.1671∗∗
                                               [1.7924]     [2.1623]     [2.2788]      [2.1545]       [2.2149]

       ϕ3          −             −                −          -0.0591      0.0482        0.0277         0.0274
                                                            [-1.2024]    [0.6917]      [0.4609]       [0.4352]

       ϕ4          −             −                −            −        -0.1412∗∗     -0.2549∗∗      -0.2579∗∗
                                                                         [-2.6212]     [-3.2006]      [-3.3820]

       ϕ5          −             −                −            −            −          0.1147∗         0.0953
                                                                                       [1.9014]       [1.2282]

       ϕ6          −             −                −            −            −             −            0.0202
                                                                                                      [0.3462]



Table 14 : SARN1 (p) model. Parameter estimates of (a1 , b1 , a2 , b2 ). (∗∗ ) denotes a parameter significant at
0.05; (∗ ) denotes a parameter significant at 0.1.


         Vasicek       SARN1 (1)           SARN1 (2)      SARN1 (3)      SARN1 (4)      SARN1 (5)       SARN1 (6)


 a1         −             5.5958∗∗          4.0455∗∗       4.1295∗∗      6.1772∗∗        6.1917∗∗        6.2571∗∗
                          [3.9982]           [3.3300]       [3.0575]      [4.0186]        [4.5137]        [4.5461]

  b1        −          -512.1201∗∗         -292.6488       -306.5667    -588.1480∗∗    -617.5216∗∗      -629.2369∗∗
                         [-2.2888]          [-1.4467]       [-1.3875]     [-2.4228]      [-2.7342]        [-2.7924]

 a2         −           -4.5652∗∗          -9.1173∗∗       -9.2568∗       -4.3966        -4.1375∗         -4.0951∗
                         [-2.0169]          [-1.9671]      [-1.8638]     [-1.7952]       [-1.9189]        [-1.8769]

  b2        −          912.7172∗∗         1657.95914∗∗    1676.5307∗∗   880.3991∗∗      770.6290∗∗      757.2393∗∗
                        [2.4675]             [2.0872]       [1.9784]      [2.1495]        [2.0248]        [2.0073]




    With regard to the ability of the estimated models to explain
autocorrelation in the short rate historical dynamics, we observe that the
Ljung-Box test for model residuals [see Table 15] rejects the null hypothesis
of no serial correlation at each lag for the SARN1 (1) model, while the
lowest values of the test statistics are obtained for the SARN1 (4),
SARN1 (5) and SARN1 (6) cases, where the null is accepted at each lag. This
means that, in this regime-switching setting too, as in the single-regime
case presented in Section 6.3.2, the introduction of lags makes it possible
to explain the observed short rate autocorrelation better.
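For readers who wish to reproduce this kind of diagnostic, a minimal sketch of the Ljung-Box statistic and of the accept/reject rule follows. It assumes that the statistic at lag m is compared with chi-square critical values with m degrees of freedom (no correction for estimated parameters), which is consistent with the significance markers in Table 15 but is not stated explicitly in the text; the simulated residual series is purely illustrative.

```python
import random

# Standard chi-square critical values, indexed by test level and degrees of freedom.
CHI2_CRIT = {0.05: {5: 11.070, 10: 18.307, 15: 24.996, 20: 31.410},
             0.01: {5: 15.086, 10: 23.209, 15: 30.578, 20: 37.566}}

def ljung_box(residuals, max_lag):
    """Ljung-Box statistic Q = n (n + 2) * sum_{k=1..m} rho_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((e - mean) ** 2 for e in residuals)
    total = 0.0
    for k in range(1, max_lag + 1):
        rho_k = sum((residuals[t] - mean) * (residuals[t + k] - mean)
                    for t in range(n - k)) / denom
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total

def null_accepted(q_stat, lag, level):
    """True if the null of no serial correlation up to `lag` is not rejected."""
    return q_stat <= CHI2_CRIT[level][lag]

# Illustration on a strongly autocorrelated artificial series (AR(1), 0.9).
rng = random.Random(1)
series, value = [], 0.0
for _ in range(379):
    value = 0.9 * value + rng.gauss(0.0, 1.0)
    series.append(value)
q_ar = ljung_box(series, 5)
print(f"Q(5) = {q_ar:.1f}, null accepted at 0.05: {null_accepted(q_ar, 5, 0.05)}")

# Reading Table 15: the Vasicek statistic at 5 lags is 13.2842.
print(null_accepted(13.2842, 5, 0.05), null_accepted(13.2842, 5, 0.01))
```

Applied to the squared residuals instead of the residuals, the same function gives the test for nonlinear serial dependence.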
    The introduction of switching regimes turns out to be decisive in the
explanation of nonlinear serial dependence. Indeed, if we study the presence
of serial correlation in squared model residuals, the Ljung-Box test detects
it for univariate and bivariate single-regime models at each lag⁹, while the
introduction of switching regimes strongly reduces the values of the test
statistics, and the null hypothesis of no serial correlation is accepted for
each lag and AR order [see Table 16].

   ⁹ The results are available upon request from the authors.

Table 15 : SARN1 (p) model. Ljung-Box test for model residuals. (∗∗ ) denotes the null hypothesis accepted at
0.05; (∗ ) denotes the null hypothesis accepted at 0.01.


    Lags     Vasicek     SARN1 (1)    SARN1 (2)     SARN1 (3)      SARN1 (4)      SARN1 (5)      SARN1 (6)


      5     13.2842∗      17.5327      8.8110∗∗      7.2849∗∗      1.6654∗∗       0.6126∗∗       1.0633∗∗

     10     20.7693∗      25.1401      14.0125∗∗    13.4573∗∗      8.8147∗∗       7.7319∗∗       8.8291∗∗
     15      33.2058      40.4152      25.0287∗     24.8636∗∗      21.1171∗∗      19.8278∗∗      18.7125∗∗

     20      43.9331      41.8320      26.7724∗∗    26.3952∗∗      22.8609∗∗      21.7085∗∗      20.0667∗∗



Table 16 : SARN1 (p) model. Ljung-Box test for model squared residuals. (∗∗ ) denotes the null hypothesis
accepted at 0.05; (∗ ) denotes the null hypothesis accepted at 0.01.


    Lags     Vasicek     SARN1 (1)    SARN1 (2)     SARN1 (3)      SARN1 (4)      SARN1 (5)      SARN1 (6)


      5      74.8243     7.6442∗∗      2.1786∗∗      1.5209∗∗      11.7751∗∗      7.9440∗∗       5.7324∗∗

      10     129.0656    15.4161∗∗     5.3786∗∗      3.6072∗∗      14.1805∗∗      12.1060∗∗      8.1270∗∗
      15     147.6571    20.5339∗∗     7.4220∗∗      6.0125∗∗      18.2715∗∗      18.4684∗∗      15.2539∗∗

      20     160.3003    26.0210∗∗     9.5002∗∗      7.4935∗∗      19.1359∗∗      20.6802∗∗      18.8141∗∗




6.4.3      Estimation Results for the SARN2 (p) model
Historical Parameter Estimates
The maximum value of the mean Log-Likelihood and the values of the estimated
vector of parameters θP = [ ν1 , ν2 , ϕ1 , . . . , ϕp , σ1 , σ2 , a1 , b1 , a2 , b2 ] of
the SARN2 (p) model, for p ∈ {1, . . . , 6}, are presented in Tables 17, 18 and
19 [the t-values are given in brackets].
    We observe, in this model specification where the AR coefficients are
also regime-switching, that coefficient ϕ4 is significantly different from
zero in both regimes for all the estimated models with p ≥ 4, and that, in
the SARN2 (6) case, the coefficients ϕ5 and ϕ6 are also significantly
different from zero in each regime [see Table 18]. These results give a
further important indication that the historical dynamics of the short rate
is not Markovian of order one.
    Parameters (ν1 , ν2 ) and (σ1 , σ2 ) are significantly different from zero in all
the estimated models [see Table 17], and the coefficients b1 and b2 ,
characterizing the state dependence of transition probabilities, are always
significantly different from zero (at least at the 0.1 level), with a
magnitude and sign of the same kind as in the SARN1 (p) case [see Table 19].
    With regard to the Ljung-Box test for model residuals [see Table 20]
and squared model residuals [see Table 21], we find, as in the SARN1 (p)
model, the important role played by lagged values in explaining short rate
autocorrelation, and the one played by switching regimes in explaining the
short rate nonlinear serial dependence.
Table 17 : SARN2 (p) model. Maximum value of the mean Log-Likelihood, AIC and parameter estimates of
ν1 , ν2 , σ1² , σ2² . The short rate observations are expressed at a monthly frequency. Parameter estimates are
expressed in basis points (bp). We denote with mlogL the mean log-Likelihood of the model : mlogL =
logL(θP |x1 , . . . , xT −p )/(T − p). (∗∗ ) denotes a parameter significant at 0.05; (∗ ) denotes a parameter
significant at 0.1. The Akaike Information Criterion (AIC) is given by 2mlogL − (2k/(T − p)), with k denoting
the dimension of θP .


              Vasicek       SARN2 (1)        SARN2 (2)       SARN2 (3)        SARN2 (4)       SARN2 (5)          SARN2 (6)


 mlogL        5.95657         6.3049           6.3148          6.3212           6.3270          6.3321             6.3356

   AIC        11.8973         12.5569         12.5659          12.5679         12.5687          12.5679           12.5640



   ν1        0.23∗∗ bp       78∗∗ bp           50∗∗ bp         51∗∗ bp         38∗∗ bp          46∗∗ bp            52∗∗ bp
              [2.6725]       [8.5840]         [19.0153]       [20.1164]        [6.3938]        [10.3287]          [15.8177]

   ν2            −           75∗∗ bp           47∗∗ bp         46∗∗ bp         37∗∗ bp          44∗∗ bp            46∗∗ bp
                             [7.3373]         [13.9981]       [14.7861]        [7.1289]        [11.4610]          [12.1064]

   σ1²     0.0039∗∗ bp     0.00078∗∗ bp    0.00078∗∗ bp     0.00068∗∗ bp    0.00068∗∗ bp     0.00078∗∗ bp       0.00068∗∗ bp
             [13.7483]        [8.4306]        [8.9994]         [9.5433]        [6.8239]         [8.7889]           [8.9726]

   σ2²           −         0.0139∗∗ bp      0.0121∗∗ bp      0.0121∗∗ bp     0.0154∗∗ bp      0.0161∗∗ bp       0.0156∗∗ bp
                             [5.9586]         [5.2087]         [5.3261]        [4.8471]         [3.9924]          [4.7739]




Table 18 : SARN2 (p) model. Parameter estimates of (ϕ1 , . . . , ϕp ). (∗∗ ) denotes a parameter significant at
0.05; (∗ ) denotes a parameter significant at 0.1.


         zt       Vasicek      SARN2 (1)       SARN2 (2)       SARN2 (3)       SARN2 (4)      SARN2 (5)    SARN2 (6)


  ϕ1     e1    0.9580∗∗         0.9958∗∗        1.0867∗∗        1.0531∗∗        1.0496∗∗      1.0611 ∗∗     1.0663∗∗
               [65.5556]       [187.8362]       [19.6710]       [20.7064]       [19.0948]     [20.5741]     [21.5081]

         e2         −           0.8715∗∗        0.7625∗∗        0.7178∗∗        0.7056∗∗      0.6964∗∗      0.7808∗∗
                                [20.0540]       [15.6587]       [14.5150]       [13.4447]     [13.2841]     [16.3176]

  ϕ2     e1         −                −          -0.1017∗          0.0473         0.0441         0.0508        0.0269
                                                [-1.8322]        [0.6045]       [0.4390]       [0.6064]      [0.3371]

         e2         −                −          0.1970∗∗        0.2770∗∗        0.2927∗∗      0.3009∗∗      0.1618∗∗
                                                [3.7571]         [4.0588]        [4.0965]      [4.3338]      [2.6858]

  ϕ3     e1         −                −                 −         -0.0276         0.0176         -0.0389      -0.0034
                                                                [-0.5664]       [0.1707]       [-0.3909]    [-0.0715]

         e2         −                −                 −       -0.1202 ∗∗        0.0861         0.0602       -0.0467
                                                                [-2.1338]       [1.2984]       [1.0520]     [-1.0932]

  ϕ4     e1         −                −                 −            −          -0.1322∗∗      -0.1557 ∗∗   -0.3221 ∗∗
                                                                                [-2.3060]      [-2.2900]    [-5.3856]

         e2         −                −                 −            −          -0.1116∗∗      -0.1814 ∗∗   -0.2013 ∗∗
                                                                                [-2.6367]      [-3.1485]    [-4.4420]

  ϕ5     e1         −                −                 −            −              −            0.0626      0.3358 ∗∗
                                                                                               [1.1601]      [5.4958]

         e2         −                −                 −            −              −          0.0875 ∗∗     0.3858 ∗∗
                                                                                               [1.9589]      [7.7233]

  ϕ6     e1         −                −                 −            −              −              −        -0.1228 ∗∗
                                                                                                            [-2.7012]

         e2         −                −                 −            −              −              −        -0.1791 ∗∗
                                                                                                            [-4.9709]



Table 19 : SARN2 (p) model. Parameter estimates of (a1 , b1 , a2 , b2 ). (∗∗ ) denotes a parameter significant at
0.05; (∗ ) denotes a parameter significant at 0.1.


        Vasicek       SARN2 (1)          SARN2 (2)          SARN2 (3)       SARN2 (4)       SARN2 (5)      SARN2 (6)


  a1       −            5.2109∗∗          5.7561∗∗          5.8630∗∗         4.1691∗∗        4.4126∗∗       6.1397∗∗
                        [4.4917]           [5.9835]          [6.2337]         [4.2009]        [4.3359]       [6.7242]

  b1       −         -471.7369∗∗         -617.2554∗∗       -639.1367∗∗      -322.2644∗∗     -328.7332∗∗    -714.5599∗∗
                       [-2.3394]           [-4.1536]         [-4.3501]        [-2.3534]       [-2.0826]      [-5.1708]

  a2       −            -4.8429∗∗         -2.8015∗∗         -2.8419∗          -1.7158         -1.8244       -2.3127∗
                         [-2.0089]         [-2.0076]        [-1.9278]        [-1.2910]       [-1.3822]      [-1.9331]

  b2       −         942.0812∗∗          477.1256∗∗        454.7091∗∗       305.8535∗       354.1494∗∗     230.9667∗
                      [2.3251]             [2.5297]          [2.1931]        [1.7822]         [2.0198]      [1.7044]




Table 20 : SARN2 (p) model. Ljung-Box test for model residuals. (∗∗ ) denotes the null hypothesis accepted at
0.05; (∗ ) denotes the null hypothesis accepted at 0.01.


    Lags     Vasicek     SARN2 (1)    SARN2 (2)     SARN2 (3)      SARN2 (4)      SARN2 (5)      SARN2 (6)


      5     13.2842∗     14.7609∗      8.4215∗∗      5.8669∗∗      1.5441∗∗       1.3751∗∗       5.7215∗∗

     10     20.7693∗     23.1617∗      18.0274∗∗    13.9979∗∗      4.4104∗∗       8.3853∗∗       9.1638∗∗
     15      33.2058      34.9264       31.4173     23.9347∗∗      12.6148∗∗      19.4970∗∗      20.4498∗∗

     20      43.9331     35.7057∗      34.7604∗     26.9634∗∗      15.9674∗∗      24.5028∗∗      28.9044∗∗



Table 21 : SARN2 (p) model. Ljung-Box test for squared model residuals. (∗∗ ) denotes the null hypothesis
accepted at 0.05; (∗ ) denotes the null hypothesis accepted at 0.01.


    Lags     Vasicek     SARN2 (1)    SARN2 (2)     SARN2 (3)      SARN2 (4)      SARN2 (5)      SARN2 (6)


      5      74.8243     13.7833∗      3.1002∗∗      2.9669∗∗      4.8844∗∗       3.3438∗∗       11.5600∗∗

      10     129.0656    19.0444∗      7.0689∗∗      4.0882∗∗      6.4805∗∗       13.6854∗∗      13.4562∗∗
      15     147.6571    23.4250∗∗     9.6184∗∗      9.1265∗∗      11.8291∗∗      21.4573∗∗      16.0471∗∗

      20     160.3003    27.5641∗∗     14.5261∗∗    10.7503∗∗      13.8441∗∗      22.1072∗∗      19.9279∗∗




7    Conclusions
This paper has developed a general discrete-time modeling of the term
structure of interest rates able to take into account, at the same time,
several important features : a) interest rates with an historical dynamics
involving several lagged values and the present and past values of the
(non-homogeneous) regime indicator function (zt ); b) a specification of the
exponential-affine stochastic discount factor (SDF) with time-varying
coefficients implying stochastic risk premia, functions of the present and
past values of the factor (xt ) and the regime indicator function (zt ); c) the
possibility to derive explicit or quasi explicit formulas for zero-coupon
bond prices (the Generalized Linear Term Structure formula) and interest
rate derivative prices; d) the positiveness of the yields at each maturity
(in the Autoregressive Gamma framework), regardless of the endogenous or
exogenous nature of the factor (xt ).
     We have studied, in the Gaussian framework, the SARN(p) and the
SVARN(p) Term Structure models, providing a generalization of the recent
model proposed by Dai, Singleton and Yang (2005). In the Autoregressive
Gamma setting, we have proposed the SARG(p) and the SVARG(p) Term Structure
models, extending several discrete-time CIR term structure models like
Bansal and Zhou (2002).
     In the last section of the paper, using monthly observations on the U.S.
term structure of interest rates, from June 1964 to December 1995, we have
estimated several endogenous Gaussian (single-regime and regime-switching)
Term Structure models, with scalar and bivariate factors. We have verified
that the goodness-of-fit of these models improves when the autoregressive
order increases, under both the historical and the risk-neutral settings.




                                 Appendix 1

                          Proof of Proposition 3




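\[
\begin{aligned}
& E[\exp(u_1 y_{1,t+1} + u_2 y_{2,t+1}) \mid y_{1,t}, y_{2,t}] \\
&= E\big\{ \exp(u_2 y_{2,t+1}) \, E[\exp(u_1 y_{1,t+1}) \mid y_{1,t}, y_{2,t+1}] \;\big|\; y_{1,t}, y_{2,t} \big\} \\
&= \exp[a_1(u_1)(\beta_{11} y_{1,t} + \beta_{12} y_{2,t}) + b_1(u_1)] \;
   E\big\{ \exp[(u_2 + a_1(u_1)\beta_o)\, y_{2,t+1}] \;\big|\; y_{1,t}, y_{2,t} \big\} \\
&= \exp[a_1(u_1)(\beta_{11} y_{1,t} + \beta_{12} y_{2,t}) + b_1(u_1) \\
&\qquad\quad + a_2(u_2 + a_1(u_1)\beta_o)(\beta_{21} y_{1,t} + \beta_{22} y_{2,t}) + b_2(u_2 + a_1(u_1)\beta_o)] \\
&= \exp\{ [a_1(u_1)\beta_{11} + a_2(u_2 + a_1(u_1)\beta_o)\beta_{21}]\, y_{1,t} \\
&\qquad\quad + [a_1(u_1)\beta_{12} + a_2(u_2 + a_1(u_1)\beta_o)\beta_{22}]\, y_{2,t} + b_1(u_1) + b_2(u_2 + a_1(u_1)\beta_o) \} .
\end{aligned}
\]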




                                 Appendix 2

                      Proof of propositions 5 and 6


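Proof of Proposition 5 : Let us first consider an asset providing the
payoff exp(−θxt+1 ) at t + 1; the price at t of this asset is

\[
\begin{aligned}
p_t &= E_t[M_{t,t+1} \exp(-\theta x_{t+1})] \\
&= \exp\left[ -r_{t+1} - \theta \nu(Z_t) - \theta \varphi(Z_t)' X_t - \tfrac{1}{2} \Gamma(X_t, Z_t)^2 \right] \times
   E_t\{ \exp[ [\Gamma(X_t, Z_t) - \theta \sigma(Z_t)]\, \varepsilon_{t+1} ] \} \\
&= \exp\left[ -r_{t+1} - \theta \nu(Z_t) - \theta \varphi(Z_t)' X_t - \theta \Gamma(X_t, Z_t) \sigma(Z_t) + \tfrac{\theta^2}{2} \sigma^2(Z_t) \right] ,
\end{aligned}
\]

and

\[
\begin{aligned}
E_t\, p_{t+1} &= E_t[\exp(-\theta x_{t+1})] \\
&= \exp[ -\theta \nu(Z_t) - \theta \varphi(Z_t)' X_t ] \times E_t\{ \exp[ [-\theta \sigma(Z_t)]\, \varepsilon_{t+1} ] \} \\
&= \exp\left[ -\theta \nu(Z_t) - \theta \varphi(Z_t)' X_t + \tfrac{\theta^2}{2} \sigma^2(Z_t) \right] .
\end{aligned}
\]

Finally, from Definition 4, the risk premium is:

\[
\omega_t(\theta) = \theta\, \Gamma(X_t, Z_t)\, \sigma(Z_t) .
\]

Proof of Proposition 6 : Similarly, if we consider a digital asset providing
one money unit at t + 1 if zt+1 = ej , we get:

\[
\begin{aligned}
p_t &= E_t[M_{t,t+1}\, I_{(e_j)}(z_{t+1})] \\
&= \exp[-r_{t+1}] \exp[-\delta_j(X_t, Z_t)]\, \pi(z_t, e_j ; X_t) ,
\end{aligned}
\]

and

\[
E_t\, p_{t+1} = E_t[I_{(e_j)}(z_{t+1})] = \pi(z_t, e_j ; X_t) .
\]

Therefore, applying Definition 4, the risk premium is :

\[
\omega_t(\theta) = \delta_j(X_t, Z_t) .
\]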
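The Gaussian computation in the proof of Proposition 5 can be checked numerically. The sketch below is only a sanity check under assumptions that should be read carefully: Definition 4 is not reproduced in this part of the paper, so the risk premium is taken here to be log E_t[p_{t+1}] − log p_t − r_{t+1}, which is the quantity that the two closed forms above imply; the regime term of the SDF is ignored, so that M_{t,t+1} = exp(−r_{t+1} + Γ ε_{t+1} − Γ²/2); the conditional mean ν(Z_t) + ϕ(Z_t)'X_t is collapsed into a single number `mean`; and all numerical values are hypothetical.

```python
import math

def normal_expectation(f, lo=-12.0, hi=12.0, n=24000):
    """Trapezoidal approximation of E[f(eps)] for eps ~ N(0, 1)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        e = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * f(e) * math.exp(-0.5 * e * e)
    return total * h / math.sqrt(2.0 * math.pi)

def risk_premium(theta, gamma, sigma, mean, r):
    """Numerical log E_t[p_{t+1}] - log p_t - r for the payoff exp(-theta * x_{t+1})."""
    def sdf(e):        # assumed one-period SDF without the regime term
        return math.exp(-r + gamma * e - 0.5 * gamma ** 2)
    def payoff(e):     # x_{t+1} = mean + sigma * eps
        return math.exp(-theta * (mean + sigma * e))
    price = normal_expectation(lambda e: sdf(e) * payoff(e))
    expected_payoff = normal_expectation(payoff)
    return math.log(expected_payoff) - math.log(price) - r

theta, gamma, sigma = 2.0, -0.3, 0.01      # hypothetical values
numerical = risk_premium(theta, gamma, sigma, mean=0.005, r=0.004)
closed_form = theta * gamma * sigma        # omega_t(theta) of Proposition 5
print(numerical, closed_form)
```

The agreement between the two numbers reflects the identity E[exp(c ε)] = exp(c²/2) used twice in the proof, once with c = Γ − θσ and once with c = −θσ.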


                                     Appendix 3

                           Proof of Proposition 7


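The Laplace transform of the one-period conditional risk-neutral probability
is:

\[
\begin{aligned}
& E_t^Q[\exp(u x_{t+1} + v' z_{t+1})] \\
&= E_t\{ \exp[ \Gamma(X_t, Z_t)\, \varepsilon_{t+1} - \tfrac{1}{2} \Gamma(X_t, Z_t)^2 - \delta(Z_t, X_t)' z_{t+1} \\
&\qquad\qquad + u[\nu(Z_t) + \varphi(Z_t)' X_t + \sigma(Z_t) \varepsilon_{t+1}] + v' z_{t+1} ] \} \\
&= \exp\left\{ u[\varphi(Z_t)' X_t + \Gamma(X_t, Z_t) \sigma(Z_t)] + u \nu(Z_t) + \tfrac{1}{2} u^2 \sigma(Z_t)^2 \right\} \times
   \sum_{j=1}^{J} \pi(z_t, e_j ; X_t) \exp[(v - \delta(Z_t, X_t))' e_j] \\
&= \exp\left\{ u[\varphi(Z_t) + \tilde{\gamma}(Z_t) \sigma(Z_t)]' X_t + u[\nu(Z_t) + \gamma(Z_t) \sigma(Z_t)] + \tfrac{1}{2} u^2 \sigma(Z_t)^2 \right\} \times
   \sum_{j=1}^{J} \pi(z_t, e_j ; X_t) \exp[(v - \delta(Z_t, X_t))' e_j] .
\end{aligned}
\]

Therefore, we get the result of Proposition 7.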




                               Appendix 4

                         Proof of Proposition 8




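\[
\begin{aligned}
B(t, h) &= \exp(C_h' X_t + D_h' Z_t) \\
&= \exp(-r_{t+1})\, E_t^Q[B(t+1, h-1)] \\
&= \exp[-c' X_t - d' Z_t]\; E_t^Q \exp\left[ C_{h-1}' X_{t+1} + D_{h-1}' Z_{t+1} \right] \\
&= \exp[-c' X_t - d' Z_t] \times
   E_t^Q \exp\left[ C_{h-1}' \left( \Phi^* X_t + [\nu^{*\prime} Z_t + (\sigma^{*\prime} Z_t)\, \xi_{t+1}]\, e_1 \right)
   + D_{1,h-1}' z_{t+1} + \tilde{D}_{h-1}' Z_t \right] \\
&= \exp\left\{ \left[ \Phi^{*\prime} C_{h-1} - c \right]' X_t
   + \left[ -d + C_{1,h-1} \nu^* + \tfrac{1}{2} C_{1,h-1}^2 \sigma^{*2} + \tilde{D}_{h-1} \right]' Z_t \right\} \times
   E_t^Q \exp\left[ D_{1,h-1}' z_{t+1} \right] \\
&= \exp\left\{ \left[ \Phi^{*\prime} C_{h-1} - c \right]' X_t
   + \left[ -d + C_{1,h-1} \nu^* + \tfrac{1}{2} C_{1,h-1}^2 \sigma^{*2} + \tilde{D}_{h-1} + F(D_{1,h-1}) \right]' Z_t \right\} ,
\end{aligned}
\]

and the result follows by identification.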




                               Appendix 5

                         Proof of Proposition 9


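Using the lag polynomials:

\[
\begin{aligned}
C_h(L) &= -\frac{1}{h} \left( C_{1,h} + C_{2,h} L + \ldots + C_{p,h} L^{p-1} \right) \\
D_h(L) &= -\frac{1}{h} \left( D_{1,h} + D_{2,h} L + \ldots + D_{p+1,h} L^{p} \right) \\
\Psi(L, Z_t) &= 1 - \varphi_1(Z_t) L - \ldots - \varphi_p(Z_t) L^p ,
\end{aligned}
\]

we get from (54):

\[
R(t, h) = C_h(L)\, x_t + D_h(L)' z_t ,
\]

and

\[
\begin{aligned}
\Psi(L, Z_t)\, R(t+1, h) &= C_h(L)\, \Psi(L, Z_t)\, x_{t+1} + D_h(L)' \Psi(L, Z_t)\, z_{t+1} \\
&= D_h(L)' \Psi(L, Z_t)\, z_{t+1} + C_h(L)\, \nu(Z_t) + C_h(L) [ (\sigma^{*\prime} Z_t)\, \varepsilon_{t+1} ] .
\end{aligned}
\]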




                                    Appendix 6

                          Proof of Proposition 11


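The Laplace transform of the one-period conditional risk-neutral
distribution is :

\[
\begin{aligned}
& E_t^Q[\exp(u' \tilde{x}_{t+1} + v' z_{t+1})] \\
&= E_t\{ \exp[ \Gamma(\tilde{X}_t, Z_t)' \varepsilon_{t+1} - \tfrac{1}{2} \Gamma(\tilde{X}_t, Z_t)' \Gamma(\tilde{X}_t, Z_t) - \delta(Z_t, \tilde{X}_t)' z_{t+1} \\
&\qquad\qquad + u' [\tilde{\nu}(Z_t) + \tilde{\Phi}(Z_t) \tilde{X}_t + S(Z_t) \varepsilon_{t+1}] + v' z_{t+1} ] \} \\
&= \exp\left\{ u' [\tilde{\Phi}(Z_t) \tilde{X}_t + S(Z_t) \Gamma(\tilde{X}_t, Z_t)] + u' \tilde{\nu}(Z_t) + \tfrac{1}{2} u' S(Z_t) S(Z_t)' u \right\} \times
   \sum_{j=1}^{J} \pi(z_t, e_j ; \tilde{X}_t) \exp\left[ (v - \delta(Z_t, \tilde{X}_t))' e_j \right] \\
&= \exp\left\{ u' [\tilde{\Phi}(Z_t) + S(Z_t) \tilde{\Gamma}(Z_t, \tilde{X}_t)] \tilde{X}_t + u' [\tilde{\nu}(Z_t) + S(Z_t) \gamma(Z_t)] + \tfrac{1}{2} u' S(Z_t) S(Z_t)' u \right\} \times
   \sum_{j=1}^{J} \pi(z_t, e_j ; \tilde{X}_t) \exp\left[ (v - \delta(Z_t, \tilde{X}_t))' e_j \right] .
\end{aligned}
\]

Therefore, we get the result of Proposition 11.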




                               Appendix 7

                        Proof of Proposition 12




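\[
\begin{aligned}
B(t, h) &= \exp(C_h' \tilde{X}_t + D_h' Z_t) \\
&= \exp(-r_{t+1})\, E_t^Q[B(t+1, h-1)] \\
&= \exp\left[ -c' \tilde{X}_t - d' Z_t \right] E_t^Q \exp\left[ C_{h-1}' \tilde{X}_{t+1} + D_{h-1}' Z_{t+1} \right] \\
&= \exp\left[ -c' \tilde{X}_t - d' Z_t \right] \times
   E_t^Q \exp\big[ C_{h-1}' \tilde{\Phi}^* \tilde{X}_t + C_{1,h-1} (\nu_1^{*\prime} Z_t + S_1^*(Z_t)\, \xi_{t+1}) \\
&\qquad\qquad + C_{p+1,h-1} (\nu_2^{*\prime} Z_t + S_2^*(Z_t)\, \xi_{t+1}) + D_{1,h-1}' z_{t+1} + \tilde{D}_{h-1}' Z_t \big] \\
&= \exp\Big\{ \left[ \tilde{\Phi}^{*\prime} C_{h-1} - c \right]' \tilde{X}_t
   + \Big[ -d + C_{1,h-1} \nu_1^* + \tfrac{1}{2} C_{1,h-1}^2 (\sigma_1^{*2} + \varphi_o^{*2} \sigma_2^{*2}) \\
&\qquad\qquad + C_{p+1,h-1} \nu_2^* + \tfrac{1}{2} C_{p+1,h-1}^2 \sigma_2^{*2} + \tilde{D}_{h-1} + F(D_{1,h-1}) \Big]' Z_t \Big\} ,
\end{aligned}
\]

and the result follows by identification.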




                          Appendix 8

                      Proof of Lemma 1



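\[
\begin{aligned}
\tilde a(u+\alpha;\rho,\mu) - \tilde a(\alpha;\rho,\mu) &= \frac{\rho(u+\alpha)}{1-(u+\alpha)\mu} - \frac{\rho\alpha}{1-\alpha\mu} \\
&= \rho\,\frac{u}{(1-\alpha\mu)^{2} - u\mu(1-\alpha\mu)} \\
&= \frac{\rho}{(1-\alpha\mu)^{2}}\;\frac{u}{1 - \dfrac{u\mu}{1-\alpha\mu}} \\
&= \frac{\rho^{*} u}{1-u\mu^{*}} = \tilde a(u;\rho^{*},\mu^{*})\,;
\end{aligned}
\]

\[
\begin{aligned}
\tilde b(u+\alpha;\nu,\mu) - \tilde b(\alpha;\nu,\mu) &= -\nu\log(1-(u+\alpha)\mu) + \nu\log(1-\alpha\mu) \\
&= -\nu\log\frac{1-(u+\alpha)\mu}{1-\alpha\mu} \\
&= -\nu\log\Big(1 - \frac{u\mu}{1-\alpha\mu}\Big) \\
&= -\nu\log(1-u\mu^{*}) \\
&= \tilde b(u;\nu,\mu^{*})\,.
\end{aligned}
\]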
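
Both identities are easy to confirm numerically. The following Python fragment is a minimal sketch: it takes $\rho^{*} = \rho/(1-\alpha\mu)^{2}$ and $\mu^{*} = \mu/(1-\alpha\mu)$, as read off the last two lines of the first chain of equalities, and evaluates both sides at arbitrary illustrative values. The function names and the numbers are assumptions made for the illustration, not part of the paper.

```python
import math

def a_tilde(u, rho, mu):
    """a~(u; rho, mu) = rho * u / (1 - u * mu)."""
    return rho * u / (1.0 - u * mu)

def b_tilde(u, nu, mu):
    """b~(u; nu, mu) = -nu * log(1 - u * mu)."""
    return -nu * math.log(1.0 - u * mu)

def starred(rho, mu, alpha):
    """Shifted parameters (rho*, mu*) implied by the proof of Lemma 1."""
    scale = 1.0 - alpha * mu
    return rho / scale ** 2, mu / scale

# Arbitrary illustrative values; all arguments of the logarithms stay positive.
rho, nu, mu, alpha, u = 0.9, 1.5, 0.02, -3.0, 2.0
rho_star, mu_star = starred(rho, mu, alpha)

lhs_a = a_tilde(u + alpha, rho, mu) - a_tilde(alpha, rho, mu)
rhs_a = a_tilde(u, rho_star, mu_star)
lhs_b = b_tilde(u + alpha, nu, mu) - b_tilde(alpha, nu, mu)
rhs_b = b_tilde(u, nu, mu_star)

# Both differences should vanish up to floating-point rounding.
print(abs(lhs_a - rhs_a), abs(lhs_b - rhs_b))
```

Note that $\nu$ is left unchanged by the shift: only $\rho$ and $\mu$ acquire a star.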




                               Appendix 9

                        Proof of Proposition 13




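\[
\begin{aligned}
B(t,h) &= \exp\big(C_h' X_t + D_h' Z_t\big) \\
&= \exp\big[-c' X_t - d' Z_t\big]\, E_t^{Q}\exp\big(C_{h-1}' X_{t+1} + D_{h-1}' Z_{t+1}\big) \\
&= \exp\big(-c' X_t - d' Z_t + \tilde C_{h-1}' X_t + \tilde D_{h-1}' Z_t\big)\, E_t^{Q}\exp\big(C_{1,h-1}\, x_{t+1} + D_{1,h-1}' z_{t+1}\big) \\
&= \exp\Big\{ -c' X_t - d' Z_t + \tilde C_{h-1}' X_t + \tilde D_{h-1}' Z_t + A^{*}(C_{1,h-1})' X_t \\
&\qquad\qquad - \nu^{*\prime} Z_t \log(1 - C_{1,h-1}\mu^{*}) + F(D_{1,h-1})' Z_t \Big\},
\end{aligned}
\]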

and the result follows by identification.
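
The passage from the third to the fourth line uses the exponential-affine conditional Laplace transform of an autoregressive gamma variable. The sketch below illustrates that step under simplifying assumptions that are not taken from this appendix: a scalar process of order one, a single regime, and the usual Poisson-gamma mixture representation, in which $U \mid x_t$ is Poisson with mean $\rho x_t/\mu$ and $x_{t+1}/\mu \mid U$ is gamma with shape $\nu + U$. It sums the mixture series and compares it with $\exp[\tilde a(u;\rho,\mu)\,x_t + \tilde b(u;\nu,\mu)]$. Function names and parameter values are hypothetical.

```python
import math

def arg_laplace_closed_form(u, x_t, rho, nu, mu):
    """exp(a~(u) * x_t + b~(u)), a~ = rho*u/(1-u*mu), b~ = -nu*log(1-u*mu)."""
    a = rho * u / (1.0 - u * mu)
    b = -nu * math.log(1.0 - u * mu)
    return math.exp(a * x_t + b)

def arg_laplace_series(u, x_t, rho, nu, mu, n_terms=200):
    """Sum over k of P(U = k) * E[exp(u * x_{t+1}) | U = k].

    Assumed representation: U ~ Poisson(lam) with lam = rho * x_t / mu, and
    x_{t+1} / mu given U = k is Gamma(shape = nu + k, scale = 1), so that
    E[exp(u * x_{t+1}) | U = k] = (1 - u*mu) ** -(nu + k) for u * mu < 1.
    """
    lam = rho * x_t / mu
    log_scale = math.log(1.0 - u * mu)
    total = 0.0
    log_pk = -lam  # log P(U = 0)
    for k in range(n_terms):
        total += math.exp(log_pk - (nu + k) * log_scale)
        log_pk += math.log(lam) - math.log(k + 1)  # log P(U = k + 1)
    return total

# Illustrative values only (x_t > 0 and u * mu < 1 are required).
u, x_t, rho, nu, mu = -5.0, 0.06, 0.95, 2.0, 0.01
closed = arg_laplace_closed_form(u, x_t, rho, nu, mu)
series = arg_laplace_series(u, x_t, rho, nu, mu)
print(closed, series)
```

The two printed numbers should agree up to rounding, since the series is truncated far in the tail of the Poisson distribution.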




                                   Appendix 10

                             Proof of Proposition 14


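The joint conditional Laplace transform of $(x_{1,t+1}, x_{2,t+1})$ in the risk-neutral world is:
\[
\begin{aligned}
& E_t^{Q}\big[\exp(u_1 x_{1,t+1} + u_2 x_{2,t+1}) \mid x_{1,t}, x_{2,t}, z_t\big] \\
&\quad = \exp\Big\{ A_{2,t}\big[u_2 + \Gamma_{2t} + a_{1,t}(u_1 + \Gamma_{1t})\big]'\tilde X_t + b_{2,t}\big(u_2 + \Gamma_{2t} + a_{1,t}(u_1 + \Gamma_{1t})\big) \\
&\qquad\qquad + A_{1,t}(u_1 + \Gamma_{1t})'\tilde X_t + b_{1,t}(u_1 + \Gamma_{1t}) \\
&\qquad\qquad - A_{2,t}\big(\Gamma_{2t} + a_{1,t}(\Gamma_{1t})\big)'\tilde X_t - b_{2,t}\big(\Gamma_{2t} + a_{1,t}(\Gamma_{1t})\big) \\
&\qquad\qquad - A_{1,t}(\Gamma_{1t})'\tilde X_t - b_{1,t}(\Gamma_{1t}) \Big\}.
\end{aligned}
\]

Using Lemma 2 we get:
\[
A_{2,t}\big[u_2 + \Gamma_{2t} + a_{1,t}(u_1 + \Gamma_{1t})\big] - A_{2,t}\big(\Gamma_{2t} + a_{1,t}(\Gamma_{1t})\big)
= A\big[u_2 + a_{1,t}(u_1 + \Gamma_{1t}) - a_{1,t}(\Gamma_{1t});\ \varphi_{2t}^{*}, \mu_{2t}^{*}\big],
\]
with
\[
\varphi_{2t}^{*} = \frac{\varphi_{2t}}{\{1 - [\Gamma_{2t} + a_{1,t}(\Gamma_{1t})]\mu_{2t}\}^{2}}, \qquad
\mu_{2t}^{*} = \frac{\mu_{2t}}{1 - [\Gamma_{2t} + a_{1,t}(\Gamma_{1t})]\mu_{2t}},
\]
and using Lemma 1
\[
\begin{aligned}
& A\big[u_2 + a_{1,t}(u_1 + \Gamma_{1t}) - a_{1,t}(\Gamma_{1t});\ \varphi_{2t}^{*}, \mu_{2t}^{*}\big] \\
&\quad = A\big[u_2 + \tilde a(u_1 + \Gamma_{1t}; \varphi_{ot}, \mu_{1t}) - \tilde a(\Gamma_{1t}; \varphi_{ot}, \mu_{1t});\ \varphi_{2t}^{*}, \mu_{2t}^{*}\big] \\
&\quad = A\big[u_2 + \tilde a(u_1; \varphi_{ot}^{*}, \mu_{1t}^{*});\ \varphi_{2t}^{*}, \mu_{2t}^{*}\big] \\
&\quad = A_{2,t}^{*}\big[u_2 + a_{1,t}^{*}(u_1)\big] \quad \text{(say)}
\end{aligned}
\]
with
\[
\varphi_{ot}^{*} = \frac{\varphi_{ot}}{(1 - \Gamma_{1t}\mu_{1t})^{2}}, \qquad
\mu_{1t}^{*} = \frac{\mu_{1t}}{1 - \Gamma_{1t}\mu_{1t}}.
\]

Similarly, we get:
\[
\begin{aligned}
b_{2,t}\big[u_2 + \Gamma_{2t} + a_{1,t}(u_1 + \Gamma_{1t})\big] - b_{2,t}\big(\Gamma_{2t} + a_{1,t}(\Gamma_{1t})\big)
&= \tilde b\big[u_2 + \tilde a(u_1; \varphi_{ot}^{*}, \mu_{1t}^{*});\ \nu_{2t}^{*}, \mu_{2t}^{*}\big] \\
&= b_{2,t}^{*}\big[u_2 + a_{1,t}^{*}(u_1)\big] \quad \text{(say)},
\end{aligned}
\]
\[
\begin{aligned}
b_{1,t}(u_1 + \Gamma_{1t}) - b_{1,t}(\Gamma_{1t})
&= \tilde b_1(u_1;\ \nu_{1t}^{*}, \mu_{1t}^{*}) \\
&= b_{1,t}^{*}(u_1) \quad \text{(say)},
\end{aligned}
\]
\[
\begin{aligned}
A_{1,t}(u_1 + \Gamma_{1t}) - A_{1,t}(\Gamma_{1t})
&= A_1(u_1;\ \varphi_{1t}^{*}, \mu_{1t}^{*}) \\
&= A_{1,t}^{*}(u_1) \quad \text{(say)},
\end{aligned}
\]
with
\[
\varphi_{1t}^{*} = \frac{\varphi_{1t}}{(1 - \Gamma_{1t}\mu_{1t})^{2}}.
\]

Finally, the joint conditional Laplace transform of $(x_{1,t+1}, x_{2,t+1})$ becomes:
\[
\begin{aligned}
& E_t^{Q}\big[\exp(u_1 x_{1,t+1} + u_2 x_{2,t+1}) \mid x_{1t}, x_{2t}, z_t\big] \\
&\quad = \exp\Big\{ \big[A_{1,t}^{*}(u_1) + A_{2,t}^{*}[u_2 + a_{1,t}^{*}(u_1)]\big]'\tilde X_t + b_{2,t}^{*}\big[u_2 + a_{1,t}^{*}(u_1)\big] + b_{1,t}^{*}(u_1) \Big\},
\end{aligned}
\]
and the result of Proposition 14 is proved.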
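
The two-step reduction (Lemma 2 for the outer function, Lemma 1 for the inner one) can be checked numerically. The sketch below assumes scalar coefficients, that is, it takes $A_{2,t}(u) = \tilde a(u; \varphi_{2t}, \mu_{2t})$ and $a_{1,t}(u) = \tilde a(u; \varphi_{ot}, \mu_{1t})$ with $\tilde a(u;\rho,\mu) = \rho u/(1-u\mu)$ as in Appendix 8. In the proof $A_{2,t}$ multiplies the vector $\tilde X_t$, so this only illustrates the algebra; all names and numerical values are hypothetical.

```python
def a_tilde(u, rho, mu):
    """a~(u; rho, mu) = rho * u / (1 - u * mu), as in Appendix 8."""
    return rho * u / (1.0 - u * mu)

# Illustrative historical parameters and risk-correction terms (assumed values).
phi_o, mu_1 = 0.4, 0.03        # parameters of the inner function a_{1,t}
phi_2, mu_2 = 0.8, 0.02        # parameters of the outer function A_{2,t} (scalar stand-in)
gamma_1, gamma_2 = -2.0, -1.5  # Gamma_{1t}, Gamma_{2t}
u_1, u_2 = 0.7, 1.2

def a_1(u):
    return a_tilde(u, phi_o, mu_1)

def A_2(u):
    return a_tilde(u, phi_2, mu_2)

# Starred parameters, following the definitions given in the proof.
shift_2 = gamma_2 + a_1(gamma_1)
phi_2_star = phi_2 / (1.0 - shift_2 * mu_2) ** 2
mu_2_star = mu_2 / (1.0 - shift_2 * mu_2)
phi_o_star = phi_o / (1.0 - gamma_1 * mu_1) ** 2
mu_1_star = mu_1 / (1.0 - gamma_1 * mu_1)

# Left-hand side: difference of the historical function at shifted arguments.
lhs = A_2(u_2 + gamma_2 + a_1(u_1 + gamma_1)) - A_2(gamma_2 + a_1(gamma_1))
# Right-hand side: A*_{2,t}[u_2 + a*_{1,t}(u_1)] built from the starred parameters.
rhs = a_tilde(u_2 + a_tilde(u_1, phi_o_star, mu_1_star), phi_2_star, mu_2_star)
print(lhs, rhs)
```

The two values should coincide up to rounding, which is the scalar counterpart of the equality defining $A_{2,t}^{*}[u_2 + a_{1,t}^{*}(u_1)]$.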

                              Appendix 11

                        Proof of Proposition 15




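\[
\begin{aligned}
B(t,h) &= \exp\big(C_h'\tilde X_t + D_h' Z_t\big) \\
&= \exp\big(-c'\tilde X_t - d' Z_t\big)\, E_t^{Q}\exp\big(C_{h-1}'\tilde X_{t+1} + D_{h-1}' Z_{t+1}\big) \\
&= \exp\big(-c'\tilde X_t - d' Z_t + \tilde C_{h-1}'\tilde X_t + \tilde D_{h-1}' Z_t\big)\, E_t^{Q}\exp\big(C_{1,h-1}\, x_{1,t+1} + C_{p+1,h-1}\, x_{2,t+1} + D_{1,h-1}' z_{t+1}\big) \\
&= \exp\Big\{ -c'\tilde X_t - d' Z_t + \tilde C_{h-1}'\tilde X_t + \tilde D_{h-1}' Z_t + A_1^{*}(C_{1,h-1})'\tilde X_t \\
&\qquad\qquad - \nu_1^{*\prime} Z_t \log(1 - C_{1,h-1}\mu_1^{*}) + A_2^{*}\big[C_{p+1,h-1} + a_1^{*}(C_{1,h-1})\big]'\tilde X_t \\
&\qquad\qquad - \nu_2^{*\prime} Z_t \log\big[1 - (C_{p+1,h-1} + a_1^{*}(C_{1,h-1}))\mu_2^{*}\big] + F(D_{1,h-1})' Z_t \Big\},
\end{aligned}
\]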

and the result follows by identification.
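
Written out, and only as a sketch read off the last line of the derivation, equating the coefficients of $\tilde X_t$ and $Z_t$ gives the following recursions in the residual maturity $h$:

\[
\begin{aligned}
C_h &= -c + \tilde C_{h-1} + A_1^{*}(C_{1,h-1}) + A_2^{*}\big[C_{p+1,h-1} + a_1^{*}(C_{1,h-1})\big], \\
D_h &= -d + \tilde D_{h-1} - \nu_1^{*}\log(1 - C_{1,h-1}\mu_1^{*}) - \nu_2^{*}\log\big[1 - (C_{p+1,h-1} + a_1^{*}(C_{1,h-1}))\mu_2^{*}\big] + F(D_{1,h-1}).
\end{aligned}
\]

The logarithms are well defined only when their arguments are positive, which restricts the admissible values of $C_{1,h-1}$ and $C_{p+1,h-1}$ at each step.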




                        REFERENCES


   Ahn, D., Dittmar, R., and R. Gallant (2002) : ”Quadratic Term Struc-
ture Models : Theory and Evidence”, Review of Financial Studies, 15, 243-
288.

   Ang, A., and G. Bekaert, (2002a) : ”Regime Switches in Interest Rates”,
Journal of Business and Economic Statistics, 20, 163-182.

   Ang, A., and G. Bekaert, (2002b) : ”Short Rate Nonlinearities and
Regime Switches”, Journal of Economic Dynamics and Control, 26, 1243-
1274.

   Ang, A., and G. Bekaert, (2005) : ”Term Structure of Real Rates and
Expected Inflation”, Working Paper.

    Ang, A., and M. Piazzesi, (2003) : ”A No-Arbitrage Vector Autoregres-
sion of Term Structure Dynamics with Macroeconomic and Latent Vari-
ables”, Journal of Monetary Economics, 50, 745-787.

    Ang, A., Piazzesi, M., and M. Wei (2005) : ”What does the Yield Curve
tell us about GDP Growth”, forthcoming Journal of Econometrics.

   Bansal, R., and H. Zhou, (2002) : ”Term Structure of Interest Rates
with Regime Shifts”, Journal of Finance, 57, 1997-2043.

   Beaglehole, D. R., and M. S. Tenney (1991) : ”General Solutions of
Some Interest Rate Contingent Claim Pricing Equations”, Journal of Fixed
Income, September, 69-83.

  Bekaert, G. and S. Grenadier (2001) : ”Stock and Bond Pricing in an
Affine Equilibrium”, Working Paper.

   Boudoukh, J., Richardson, M., Smith, T., and R. F. Whitelaw (1999) :
”Regime Shifts and Bond Returns”, Working Paper.

   Cai, J., (1994) : ”A Markov Model of Unconditional Variance in ARCH”,
Journal of Business and Economic Statistics, 12, 309-316.

  Campbell, J., and R. J. Shiller (1991) : ”Yield Spreads and Interest Rate
Movements : A Bird’s Eye View”, Review of Economic Studies, 58, 495-514.


                                    72
  Cheng, P., and O. Scaillet, (2005) : ”Linear-Quadratic Jump-Diffusion
modeling with Application to Stochastic Volatility”, Working Paper.

   Cheridito, R., Filipovic, D., and R. Kimmel (2005) : ”Market Prices
of Risk in Affine Models : Theory and Evidence”, forthcoming Journal of
Financial Economics.

    Christiansen, C., (2002) : ”Regime Switching in the Yield Curve”, Work-
ing Paper.

   Christiansen, C., and J. Lund (2005) : ”Revisiting the Shape of the Yield
Curve : the Effect of Interest Rate Volatility”, Working Paper.

   Cochrane, J., and M. Piazzesi, (2005) : ”Bond Risk Premia”, American
Economic Review, 95/1, 138-160.

    Cox, J.C., Ingersoll, J.E. and S. A. Ross (1985) : ”A theory of the term
structure of interest rates”, Econometrica, 53/2, pp. 385-407.

   Dai, Q., and K. Singleton (2000) : ”Specification Analysis of Affine Term
Structure Models”, Journal of Finance, 55, 1943-1978.

   Dai, Q., and K. Singleton (2002) : ”Expectations Puzzles, Time-varying
Risk Premia, and Affine Model of the Term Structure”, Journal of Financial
Economics, 63, 415-441.

   Dai, Q., and K. Singleton (2003) : ”Term Structure Dynamics in Theory
and Reality”, Review of Financial Studies, 16, 631-678.

   Dai, Q., Singleton, K., and W. Yang (2005) : ”Regime Shifts in a Dy-
namic Term Structure Model of U.S. Treasury Bond Yields”, Working Paper.

  Dai, Q., Le, A., and K. Singleton (2006) : ”Discrete-time Term Structure
Models with Generalized Market Prices of Risk”, Working Paper.

   Darolles, S., Gourieroux, C. and J. Jasiak (2006) : ”Structural Laplace
Transform and Compound Autoregressive Models”, forthcoming Journal of
Time Series Analysis.

   Driffill, J., and M. Sola, (1994) : ”Testing the Term Structure of Interest
Rates using a Stationary Vector Autoregression with Regime Switching”,
Journal of Economic Dynamics and Control, 18, 601-628.


                                    73
    Driffill, J., T., Kenc, and M. Sola, (2003) : ”An Empirical Examination
of Term Structure Models with Regime Shifts”, Working Paper.

   Duarte, J. (2004) : ”Evaluating An Alternative Risk Preference in Affine
Term Structure Models”, Review of Financial Studies, 17, 379-404.

  Duffee, G. R., (2002) : ”Term Premia and Interest Rate Forecasts in
Affine Models”, Journal of Finance, 57, 405-443.

   Duffie, D., Filipovic, D. and W. Schachermayer (2003) : ”Affine processes
and applications in finance”, The Annals of Applied Probability 13, 3.

  Duffie, D., and R., Kan (1996) : ”A yield-factor model of interest rates”,
Mathematical Finance, 6, 379-406.

   Duffie, D., J. Pan, and K. Singleton (2000) : ”Transform Analysis and
Asset Pricing for Affine Jump Diffusions”, Econometrica, 68, 1343-1376.

   Evans, M. (2003) : ”Real Risk, Inflation Risk, and the Term Structure”,
The Economic Journal, 113, 345-389.

  Garcia, R. and P. Perron (1996) : ”An Analysis of Real Interest Rate
Under Regime Shifts”, Reviews of Economics and Statistics, 1, 111-125.

    Gourieroux, C., and J. Jasiak (2006) : ”Autoregressive Gamma Pro-
cesses”, forthcoming Journal of Forecasting.

  Gourieroux, C., Jasiak, J., and R. Sufana (2004) : ”A Dynamic Model for
Multivariate Stochastic Volatility : The Wishart Autoregressive Process”,
Working Paper.

    Gourieroux, C., and A. Monfort (2006) : ”Domain Restrictions on In-
terest Rates Implied by No Arbitrage”, Crest DP.

   Gourieroux, C., and A. Monfort (2006) : ”Econometric Specifications of
Stochastic Discount Factor Models”, forthcoming Journal of Econometrics.

  Gourieroux, C., Monfort, A. and V. Polimenis (2003) : ”Discrete Time
Affine Term Structure Models”, Crest DP.

    Gourieroux, C., Monfort, A. and V. Polimenis (2005) : ”Affine Model
for Credit Risk Analysis”, Crest DP.


                                   74
   Gourieroux, C., and R. Sufana, (2003) : ”Wishart Quadratic Term Struc-
ture Models”, Working Paper.

   Gray, S., (1996) : ”Modeling the Conditional Distribution of Interest
Rates as a Regime Switching Process”, Journal of Financial Economics, 42,
27-62.

    Hamilton, J. D., (1988) : ”Rational-Expectations Econometric Analysis
of Changes in Regimes - An Investigation of the Term Structure of Interest
Rates”, Journal of Economic Dynamics and Control, 12, 385-423.

   Jamshidian, F. (1989) : ”An Exact Bond Option Formula”, Journal of
Finance, 44, 205-209.

    Monfort, A., and F. Pegoraro (2006) : ”Multi-Lag Term Structure Mod-
els with Stochastic Risk Premia”, CREST DP.

   Piazzesi, M. (2003) : ”Affine Term Structure Models”, forthcoming
Handbook of Financial Econometrics.

   Polimenis, V. (2001) : ”Essays in Discrete Time Asset Pricing”, Ph. D.
Thesis, Wharton School, University of Pennsylvania.

   Vasicek, O. (1977) : ”An Equilibrium Characterization of the Term
Structure”, Journal of Financial Economics, 5, 177-188.




                                   75