					  Efficient Estimation of Markov Models Where the
                        Transition Density is Unknown
                             George J. Jiang∗ and John L. Knight†

                                     First version: March 2001
                                  This Version: December 2001‡

       ∗George J. Jiang, Finance Department, Eller College of Business & Public Administration, University of
Arizona, P.O. Box 210108, Tucson, Arizona 85721-0108, and Finance Area, Schulich School of Business, York
University, 4700 Keele Street, Toronto, Ontario, Canada M3J 1P3. George J. Jiang is also a SOM research
fellow at the University of Groningen in The Netherlands.
       †John L. Knight, Department of Economics, University of Western Ontario, London, Ontario, Canada.
       ‡We wish to thank Alan Rogers for helpful comments, along with participants at the New Zealand Econo-
metric Study Group, Auckland, March 2001. Both authors acknowledge financial support from NSERC, Canada.
The usual disclaimer applies.

   In this paper we consider the estimation of Markov models where the transition den-
sity is unknown. The approach we propose is the empirical characteristic function (ECF)
estimation procedure with an approximate optimal weight function. The approximate
optimal weight function is obtained through an Edgeworth/Gram-Charlier expansion of
the logarithmic transition density of the Markov process. Based on the ECF estimation
procedure, we derive the estimating equations which are essentially a system of mo-
ment conditions. When the approximation error of the optimal weight function is arbitrarily
small, the new estimation approach, which we term the method of system of moments
(MSM), leads to consistent parameter estimators with ML efficiency. We illustrate our
approach with examples of various Markov processes. Monte Carlo simulations are per-
formed to investigate the finite sample properties of the proposed estimation procedure
in comparison with other methods.

   JEL Classification: C13, C22, C52, G10

   Key Words: Markov Process, Efficient Estimation, Empirical Characteristic Function
(ECF), Method of System of Moments (MSM), Edgeworth/Gram-Charlier Expansion
1 Introduction

The estimation of Markov models, in either continuous or discrete time, can proceed straight-
forwardly via the maximum likelihood (ML) estimation method based on the observations
of state variables when the transition density is known in closed form. However, when the
transition density is unknown, alternatives to ML estimation need to be considered.
Various estimation methods have been proposed and applied in the literature, e.g. the quasi-
maximum likelihood (QML) method by Fisher and Gilles (1996) among others, simulation
based methods such as the simulated moments estimator (SME) by Duffie and Singleton
(1993), the indirect inference approach by Gouriéroux, Monfort and Renault (1993) and the
efficient method of moments (EMM) by Gallant and Tauchen (1996) and Gallant and Long
(1997) with applications by Andersen, Benzoni, and Lund (2001) and Chernov and Ghy-
sels (2000) among others, and the generalized method of moments (GMM) by Hansen and
Scheinkman (1995), Liu (1997), Jiang (2000) and Pan (2000), and the C-GMM by Carrasco
and Florens (2000a) that extends the GMM procedure to the case of a continuum of mo-
ment conditions, as well as the Bayesian method by Jones (1997), etc. A new approximate
maximum likelihood (AML) approach to the estimation of continuous-time diffusion mod-
els was recently developed in Aït-Sahalia (1999, 2000). The basic idea is to approximate the
transition density, after a suitable transformation of the diffusion process, by deriving a se-
quence of Hermite approximations to the transition density function and then to maximize
the approximate log-likelihood function, which is given by the sum of the logarithms of these
approximate densities.
   A number of estimation methods have also been developed in the frequency domain ex-
ploiting the analytical characteristic functions of the state variables. It is noted that, for
many Markov processes, while a closed form of the transition density is not available, the
associated conditional characteristic function (CCF) of the state variables can often be de-

rived analytically. For instance, in the continuous time finance literature, models specified
within the affine framework often have a closed-form CCF. This observation opens the door
to alternative estimation methods using the characteristic functions. In this regard we have
the estimators developed in Chacko and Viceira (1999), Singleton (2001), Jiang and Knight
(2001) and Carrasco, Chernov, Florens and Ghysels (2001) for continuous time diffusion
and jump-diffusion models. In these approaches the models are estimated by minimizing
some weighted distance between the empirical characteristic function (ECF) or joint ECF
and their theoretical counterparts. These methods are essentially in the framework of GMM,
albeit they may involve a continuum of conditional or unconditional moment restrictions.
Singleton (2001) also proposes to proceed with estimation using the likelihood function
obtained by Fourier inversion of the conditional characteristic function of the state variables.
   The question of efficiency is paramount for these alternative estimation techniques, and
it can be shown that those based on the ECF and CCF can have efficiency arbitrarily close
to that of the ML estimator. However, in the case of the estimator proposed in Singleton (2001) the
efficiency is related to the optimal choice of the discrete grid of points at which the ECF
and CCF are matched with the weight function being the covariance of the ECF. While in
the univariate case the choice of these points has received some attention, see Feuerverger
and Mureika (1977), Schmidt (1982), Knight and Satchell (1997), Yu (1998), and Singleton
(2001), in the multivariate case the choice is indeed an open question. To overcome some
of these problems, Jiang and Knight (2001) propose minimizing the integrated MSE (IMSE)
between the empirical characteristic function (ECF) and its theoretical counterpart. This thus
avoids the choice of points at which the matching is done. The optimal weight function used
in the estimation, however, is no longer readily available.
   The approach we propose in this paper combines the ECF technique with an approximate
optimal weight function. The basic idea stems from the fact that, owing to the one-to-one
correspondence between the CCF and the transition density, the first-order conditions

for ML estimation can be written as a sum of weighted integrals of the difference
between the ECF and the CCF. The optimal weight in this set-up is the inverse Fourier transform of
the tth score. Thus, by approximating the logarithm of the transition density we can approx-
imate the optimal weight function and hence solve the integral to develop the appropriate
estimating equations, whose solution leads to the approximate ML estimator. The es-
timating equations turn out to be in the form of a system of moment conditions, which is
different from the conventional method of moments (MM) or the generalized method of mo-
ments (GMM). We name this new estimation approach the method of system of moments,
or MSM. In addition to being exact and parsimonious, the method leads to consistent and
asymptotically efficient parameter estimators. When the approximation error of the optimal
weight function is arbitrarily small, the asymptotic efficiency of the estimator approaches
that of the ML estimator.
   The paper is organized as follows. Section 2 will develop the ECF estimation approach
with approximate optimal weight functions, detailing the expansion for the transition density
for an N-dimensional Markov process and deriving the appropriate estimating equations. In
a series of remarks we relate our approach to various existing estimation methods, such as
GMM, QML, and the approximate ML estimator of Aït-Sahalia (1999, 2000). We show that
the estimating equations are in the form of a system of moment restrictions and specialize the
equations explicitly in the univariate case. In Section 3 we illustrate the application of our
approach with examples of various Markov processes in both discrete time and continuous
time and for both univariate and multivariate cases. In Section 4, we perform Monte Carlo
simulations to investigate the finite sample properties of the proposed estimation procedure
in comparison with other methods for selected models. A brief conclusion is contained
in Section 5, along with some ideas for further research. All proofs are collected in the Appendix.

2 ECF Estimation and the Method of System of Moments

Let x_t ∈ R^N, t > 0, be an N-dimensional stationary Markov process, in either discrete
or continuous time, defined on a complete probability space (Ω, F, P). Suppose that
{x_t}_{t=1}^{T+1} represents a sample observed at discrete points in time from the Markov
process. Let f(x_{t+1}|x_t; θ): R^N × R^N × Θ → R_+ denote the measurable transition density of
the Markov process and θ ∈ Θ ⊂ R^K denote the parameter vector of the data generating
process for x_t. Following Singleton (2001), a consistent estimator of the parameters based
on the empirical characteristic function (ECF) is given by

$$\frac{1}{T}\sum_{t=1}^{T}\int w(\theta,t,r|x_t)\left(e^{ir'x_{t+1}} - \phi(r,x_{t+1}|x_t)\right)dr = 0 \qquad (1)$$

where φ(r, x_{t+1}|x_t) = E[exp(ir′x_{t+1})|x_t] is the conditional characteristic function (CCF)
and w(θ, t, r|x_t) ∈ W(θ, t, r|x_t), with W(θ, t, r|x_t) being the set of “instrument” or “weight”
functions as defined in Singleton (2001). Namely, each “instrument” or “weight” func-
tion w(θ, t, r|x_t): Θ × R_+ × R^N × R^N → C^K, where C^K denotes the K-dimensional complex
space, satisfies w(θ, t, r|x_t) ∈ I_t and w(θ, t, r|x_t) = w̄(θ, t, −r|x_t), the complex conjugate,
for t = 1, 2, ..., T + 1, where I_t is the σ-algebra generated by x_t. The ECF estimation
procedure was proposed by Feuerverger and Mureika (1977), Schmidt (1982), and
Feuerverger and McDunnough (1981b) for i.i.d. cases and by Feuerverger and McDunnough
(1981b) and Feuerverger (1990) for generic stationary Markov processes using the joint
ECF of the state variables.
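As a concrete illustration of how an estimator of the form (1) can be implemented (our own sketch, not an example from the paper; the AR(1) model, grid points, and instruments below are illustrative choices), consider a Gaussian AR(1) process, whose CCF is available in closed form. A discrete-grid version of the ECF moment conditions can then be solved numerically:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

# Gaussian AR(1): x_{t+1} = a + b*x_t + s*eps.  Its CCF is known in closed form:
# phi(r | x_t) = exp(i*r*(a + b*x_t) - 0.5*r**2*s**2)
a_true, b_true, s_true = 0.1, 0.8, 0.5
T = 5000
x = np.empty(T + 1)
x[0] = a_true / (1 - b_true)          # start at the stationary mean
for t in range(T):
    x[t + 1] = a_true + b_true * x[t] + s_true * rng.standard_normal()

r_grid = np.array([0.5, 1.0, 2.0])    # a small, arbitrary discrete grid of r's

def moment_conditions(theta):
    a, b, s = theta
    xt, xt1 = x[:-1], x[1:]
    g = []
    for r in r_grid:
        ccf = np.exp(1j * r * (a + b * xt) - 0.5 * r**2 * s**2)
        diff = np.exp(1j * r * xt1) - ccf          # empirical CF term minus CCF
        g += [diff.mean(), (xt * diff).mean()]     # simple instruments: 1 and x_t
    g = np.asarray(g)
    return np.concatenate([g.real, g.imag])        # stack real and imaginary parts

fit = least_squares(moment_conditions, x0=[0.0, 0.5, 1.0])
print(fit.x)   # close to the true (0.1, 0.8, 0.5) up to sampling error
```

Here the integral over r is replaced by a few grid points and the weight function by elementary instruments, so this sketch forgoes the efficiency considerations discussed below; it only shows the mechanics of matching the ECF to the CCF.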
    As shown in Feuerverger (1990), Singleton (2001) and Jiang and Knight (2001), there
exists an optimal “instrument" or “weight" function in the sense that the estimator defined in
(1) is also an efficient estimator with the same asymptotic properties as the ML estimator. We
summarize these results in the following lemma.

Lemma 1 Under standard regularity conditions where w(θ, t, r|x_t) ∈ W(θ, t, r|x_t) is a
well-defined “instrument” or “weight” function¹, equation (1) leads to consistent parameter
estimators. Furthermore, let f(x_{t+1}|x_t) be the transition density of the Markov process;
then the following weight function is optimal

$$w(\theta,t,r|x_t) = \frac{1}{(2\pi)^N}\int\!\cdots\!\int \frac{\partial \ln f(x_{t+1}|x_t)}{\partial\theta}\, e^{-ir'x_{t+1}}\,dx_{t+1} \qquad (2)$$

in the sense that equation (1) will result in parameter estimators with ML efficiency.

       Proof: See Appendix.
       It is noted that the optimal weight function is determined by the logarithm of the tran-
sition density, or likelihood function, of the Markov process. When f(x_{t+1}|x_t) is explicitly
known, the Markov model can be estimated straightforwardly via the ML method, and it can
be shown that the estimating equation (1) results exactly in the conditional ML estimator.
However, if f (xt+1 |xt ) is not known explicitly but φ(r, xt+1 |xt ) is, then (1) must be imple-
mented with other than the optimal weight function w(θ, t, r|xt ). The solution proposed in
Singleton (2001) is to first approximate the integral in (1) with a sum over a discrete set of
r and then appeal to the GMM theory to find the approximate optimal weight function. As
shown in both Feuerverger (1990) and Singleton (2001), under certain regularity conditions
as the grid of r's becomes increasingly fine in R^N, the asymptotic efficiency of the estimator
approaches that of the ML estimator. The appeal to GMM to find the optimal weight matrix
is in essence a GLS solution, but its major drawback is the necessity to choose the vectors
r over which the sum and hence the integral is approximated. In the scalar Markov case
the choice of the points has been considered in the literature, at least in the i.i.d. case, see
Feuerverger and Mureika (1977), Schmidt (1982), Knight and Satchell (1997), Yu (1998),
and Singleton (2001). In the N-dimensional case (N ≥ 2) the problem is much more com-
plicated and there is virtually no guidance given in the literature. In general, there is an
       ¹For the ECF-based estimation procedures considered in this paper, as in Singleton (2000) we assume that
Hansen (1982)’s regularity conditions for the generalized method of moments (GMM) are satisfied.

obvious trade-off between finer and coarser grids. With too coarse a grid, the GMM
estimation based on the selected set of moments is easier to implement but achieves lower
asymptotic efficiency owing to the loss of information, while with too fine a grid the number
of moment conditions can be so large that the variance-covariance matrix becomes singu-
lar and the estimation procedure becomes infeasible to implement. Thus, in
practice there is a limit to how fine a grid can actually be used. In the context of
the estimation of mixture distributions, Carrasco and Florens (2000b) provide Monte Carlo
evidence of efficiency loss relative to ML estimation for ECF-based GMM estimators
using a discrete grid. To overcome some of these problems, Jiang and Knight (2001) propose
minimizing the integrated MSE (IMSE) between the empirical characteristic function (ECF)
and its theoretical counterpart, which avoids the choice of points at which the matching
is done. The optimal weight function used in the estimation, however, is no longer readily available.
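The near-singularity caused by an overly fine grid is easy to demonstrate numerically. In the sketch below (our own illustration, not from the paper), the sample covariance matrix of Re e^{irx} across five grid points becomes severely ill-conditioned as the points coalesce:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(10_000)

def cond_number(spacing):
    # sample covariance matrix of Re e^{i r x} over five grid points r
    r = np.arange(1, 6) * spacing
    m = np.cos(np.outer(x, r))        # one column per grid point
    return np.linalg.cond(np.cov(m.T))

c_coarse = cond_number(0.5)   # well-separated grid points: modest condition number
c_fine = cond_number(0.01)    # nearly coincident grid points: near-singular covariance
print(c_coarse, c_fine)
```

With nearly coincident points the columns are almost collinear, so inverting the covariance matrix, as the GMM optimal weighting requires, becomes numerically infeasible.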
   The approach we are proposing in this paper is to first approximate the ln f (xt+1 |xt )
which will then, via (2), give us an approximate weight function to use in (1) and hence result
in a consistent estimator. When the approximation error of the optimal weight function is
arbitrarily small, the estimator also has ML efficiency.
   In this paper, we consider series expansions for the log transition density rather than
for the density itself. In addition to the fact that the log transition density appears explic-
itly in the optimal weight function and is thus the function we aim to approximate, better
approximations may often be obtained, for a number of reasons, by approximating the log
transition density and then exponentiating. Since the solution of (1) requires the knowledge
of φ(r, xt+1 |xt ) or ln φ(r, xt+1 |xt ), we can use this function to develop approximations to
ln f (xt+1 |xt ). The approximation we propose is the multivariate Edgeworth/Gram-Charlier
expansion. This expansion will be taken around a multivariate normal density as an initial
first-order approximation with the mean vector and covariance matrix consistent with the

model and readily derived from ln φ(r, xt+1 |xt ). The correction terms in the expansion will
involve generalized third, fourth and higher order cumulant products as coefficients of the
Hermite polynomials. More generally, however, in order to minimize the number of correc-
tion terms, it may be advantageous to use a first-order approximation other than the normal
density. The correction terms then involve derivatives of the approximating density function
and are not necessarily polynomials.
   Following McCullagh (1987), via the use of tensor notation, we have the general Gram-
Charlier/Edgeworth expansion for the log multivariate density ln f (xt+1 |xt ) given by

$$\begin{aligned} \ln f(x_{t+1}|x_t) ={}& \ln f_0(x_{t+1}|x_t) + \frac{1}{3!}K^{i,j,k} h_{ijk}(x_{t+1}|x_t) + \frac{1}{4!}K^{i,j,k,l} h_{ijkl}(x_{t+1}|x_t) \\ &+ \frac{1}{5!}K^{i,j,k,l,m} h_{ijklm}(x_{t+1}|x_t) \\ &+ \frac{1}{6!}\left[K^{i,j,k,l,m,n} h_{ijklmn}(x_{t+1}|x_t) + K^{i,j,k}K^{l,m,n} h_{ijk,lmn}(x_{t+1}|x_t)[10]\right] + \cdots \qquad (3) \end{aligned}$$

where f0 (xt+1 |xt ) is chosen such that its first-order and second-order moments agree with
those of x_{t+1} conditional on x_t. Upon letting ψ(r, x_{t+1}|x_t) = ln φ(−ir, x_{t+1}|x_t) (the
conditional cumulant generating function), we have the conditional cumulants of various orders

$$\lambda^{i} = \frac{\partial \psi(r, x_{t+1}|x_t)}{\partial r_i}\bigg|_{r=0}, \qquad \lambda^{i,j} = \frac{\partial^2 \psi(r, x_{t+1}|x_t)}{\partial r_i\, \partial r_j}\bigg|_{r=0},$$

$$K^{i,j,k} = \frac{\partial^3 \psi(r, x_{t+1}|x_t)}{\partial r_i\, \partial r_j\, \partial r_k}\bigg|_{r=0}, \qquad K^{i,j,k,l} = \frac{\partial^4 \psi(r, x_{t+1}|x_t)}{\partial r_i\, \partial r_j\, \partial r_k\, \partial r_l}\bigg|_{r=0}.$$

It is noted that Edgeworth series used for approximations to distributions are most conve-

niently expressed using cumulants. Moreover, where approximate normality is involved,
higher-order cumulants, though not higher-order moments, can usually be neglected. In the
present paper, since we often deal with situations where the log characteristic function has a
simpler expression, the cumulants can be obtained more conveniently than the moments.
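As a quick illustration of how the cumulants fall out of the log characteristic function (our own example, using a Gamma distribution rather than any model from the paper), the successive derivatives at r = 0 can be checked symbolically:

```python
import sympy as sp

r, k, th = sp.symbols('r k theta', positive=True)
# Example: Gamma(k, theta) has phi(r) = (1 - i*theta*r)**(-k),
# hence psi(r) = ln phi(-i*r) = -k*ln(1 - theta*r)
psi = -k * sp.log(1 - th * r)

# nth derivative at r = 0 gives the nth cumulant: k*theta, k*theta^2, 2k*theta^3, 6k*theta^4
cumulants = [sp.diff(psi, r, n).subs(r, 0) for n in range(1, 5)]
print([sp.simplify(c) for c in cumulants])
```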
Furthermore, the Hermite polynomial tensors in the general Edgeworth/Gram-Charlier ex-
pansion of equation (3) are given by²

$$\begin{aligned} h_i &= \lambda_{i,j}(x^j - \lambda^j) \\ h_{ij} &= h_i h_j - \lambda_{i,j} \\ h_{ijk} &= h_i h_j h_k - h_i\lambda_{j,k}[3] \\ h_{ijkl} &= h_i h_j h_k h_l - h_i h_j\lambda_{k,l}[6] + \lambda_{i,j}\lambda_{k,l}[3] \\ h_{ijklm} &= h_i h_j h_k h_l h_m - h_i h_j h_k\lambda_{l,m}[10] + h_i\lambda_{j,k}\lambda_{l,m}[15] \\ h_{ijklmn} &= h_i h_j h_k h_l h_m h_n - h_i h_j h_k h_l\lambda_{m,n}[15] + h_i h_j\lambda_{k,l}\lambda_{m,n}[45] - \lambda_{i,j}\lambda_{k,l}\lambda_{m,n}[15] \end{aligned}$$


In this paper, we let the initial approximating function be the multivariate normal density
with mean vector λ^i and covariance matrix λ^{i,j}, i.e.

$$f_0(x_{t+1}|x_t) = (2\pi)^{-N/2}\,\big|\lambda^{i,j}\big|^{-1/2}\exp\!\Big(-\frac{1}{2}\big(x^i_{t+1}-\lambda^i\big)\big(x^j_{t+1}-\lambda^j\big)\lambda_{i,j}\Big) \qquad (4)$$

where x_{t+1} is an N × 1 vector whose ith element is x^i_{t+1}, λ_{i,j} is the (i, j) element of the
inverse of the covariance matrix λ^{i,j}, and |λ^{i,j}| is the determinant of the covariance matrix.
       For clarity of presentation and ease of notation and without loss of generality, in the
following discussion we focus on the Gram-Charlier series expansion and set the truncation
       ²In tensor notation it is understood that any index repeated once as a subscript and once as a superscript is
summed over, i.e.

$$h_i = \lambda_{i,j}(x^j - \lambda^j) = \sum_{j=1}^{N} \lambda_{i,j}(x^j - \lambda^j),$$

etc. Also, the numbers in square brackets refer to the number of permutations of the various subscripts.

order p = 4; consequently we have

$$\begin{aligned} \ln \hat{f}(x_{t+1}|x_t) &= -\frac{N}{2}\ln 2\pi - \frac{1}{2}\ln\big|\lambda^{i,j}\big| - \frac{1}{2}\big(x^i-\lambda^i\big)\big(x^j-\lambda^j\big)\lambda_{i,j} + \frac{1}{3!}K^{i,j,k}h_{ijk} + \frac{1}{4!}K^{i,j,k,l}h_{ijkl} \\ &= \ln f(x_{t+1}|x_t) - \ln f^{\Delta}(x_{t+1}|x_t) \end{aligned} \qquad (5)$$

where ln f^Δ(x_{t+1}|x_t) is the approximation error. It is noted that the choice between the Edge-
worth expansion and the Gram-Charlier expansion has no impact on the following analysis
and the derivation of the results presented in Lemma 3, as long as the expansion is based on
the cumulants. The only difference between the use of the Edgeworth series and that of the Gram-
Charlier series of the same order is that different cumulants or moments may appear in the
estimating equations. Similarly, the difference that the truncation order makes is the highest
order of cumulants or moments included in the estimating equations; with a higher trunca-
tion order, the expressions in Lemma 3 become more cumbersome. The Edgeworth series
and the Gram-Charlier series are formally identical when the expansion order is infinite; the
main difference lies in the criteria used for collecting terms in a truncated series. The
Edgeworth series is often preferred for statistical calculations. According to Blinnikov and
Moessner (1998), while both the Edgeworth series and the Gram-Charlier series diverge in many
situations of practical interest, when fitting weakly non-normal distributions better results
can often be achieved with the asymptotic Edgeworth expansion.
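To give a feel for the quality of such a truncated expansion (our own univariate numerical sketch, not one of the paper's applications), the order-4 log-density expansion around a normal can be built from the known cumulants K_n = k(n−1)! of a Gamma(k, 1) distribution, using h_1 = (x − K_1)/K_2, h_{111} = h_1³ − 3h_1/K_2, and h_{1111} = h_1⁴ − 6h_1²/K_2 + 3/K_2², and compared against the exact density:

```python
import numpy as np
from math import gamma as gamma_fn

# Gamma(shape=k, scale=1): cumulants K1 = K2 = k, K3 = 2k, K4 = 6k
k = 16.0
K1, K2, K3, K4 = k, k, 2 * k, 6 * k

x = np.linspace(8.0, 24.0, 200)          # roughly mean +/- 2 standard deviations
h1 = (x - K1) / K2
h111 = h1**3 - 3 * h1 / K2
h1111 = h1**4 - 6 * h1**2 / K2 + 3 / K2**2

log_f0 = -0.5 * np.log(2 * np.pi * K2) - 0.5 * (x - K1) ** 2 / K2
log_f = log_f0 + K3 / 6 * h111 + K4 / 24 * h1111   # order-4 expansion as in (5)

true_pdf = x ** (k - 1) * np.exp(-x) / gamma_fn(k)
err_normal = np.abs(np.exp(log_f0) - true_pdf).max()
err_gc = np.abs(np.exp(log_f) - true_pdf).max()
print(err_gc < err_normal)   # the cumulant corrections reduce the error here
```

Over this central range the correction terms improve on the plain normal first-order approximation; in the far tails such truncated expansions can deteriorate, consistent with the divergence caveats just discussed.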
    Let the parameter vector to be estimated be denoted by θ ∈ Θ; then all cumulants and
the Hermite tensors are functions of θ. The approximate score function, i.e., the derivative
of ln f̂(x_{t+1}|x_t), is given by

$$\begin{aligned} \frac{\partial \ln \hat{f}(x_{t+1}|x_t)}{\partial\theta} &= -\frac{1}{2\big|\lambda^{i,j}\big|}\frac{\partial\big|\lambda^{i,j}\big|}{\partial\theta} + \frac{\partial\lambda^{i}}{\partial\theta}h_i - \frac{1}{2}\big(x^i-\lambda^i\big)\big(x^j-\lambda^j\big)\frac{\partial\lambda_{i,j}}{\partial\theta} \\ &\quad + \frac{1}{6}\left[\frac{\partial K^{i,j,k}}{\partial\theta}h_{ijk} + K^{i,j,k}\frac{\partial h_{ijk}}{\partial\theta}\right] + \frac{1}{24}\left[\frac{\partial K^{i,j,k,l}}{\partial\theta}h_{ijkl} + K^{i,j,k,l}\frac{\partial h_{ijkl}}{\partial\theta}\right] \end{aligned} \qquad (6)$$

    Using the approximate score in (6), we can define an approximate optimal weight func-
tion from (2) as

$$\omega(\theta,t,r|x_t) = \frac{1}{(2\pi)^N}\int\!\cdots\!\int \frac{\partial \ln \hat{f}(x_{t+1}|x_t)}{\partial\theta}\, e^{-ir'x_{t+1}}\,dx_{t+1} \qquad (7)$$

and hence our approximate ML estimator as the solution of (1) with w(θ, t, r|x_t) replaced by
ω(θ, t, r|x_t). That is,

$$\frac{1}{T}\sum_{t=1}^{T}\int \omega(\theta,t,r|x_t)\left(e^{ir'x_{t+1}} - \phi(r,x_{t+1}|x_t)\right)dr = 0 \qquad (8)$$

Manipulating the above equation as in the proof of Lemma 1, we have

$$\frac{1}{T}\sum_{t=1}^{T}\left\{\frac{\partial \ln \hat{f}(x_{t+1}|x_t)}{\partial\theta} - E\!\left[\frac{\partial \ln \hat{f}(x_{t+1}|x_t)}{\partial\theta}\,\bigg|\,x_t\right]\right\} = 0 \qquad (9)$$

When the true transition density of the Markov process f(x_{t+1}|x_t) is known and f̂(x_{t+1}|x_t) =
f(x_{t+1}|x_t), we have E[∂ ln f(x_{t+1}|x_t)/∂θ | x_t] = 0. However, this will not necessarily be the case
if we approximate f(x_{t+1}|x_t), i.e. if f̂(x_{t+1}|x_t) ≠ f(x_{t+1}|x_t).

Lemma 2 Under standard regularity conditions, the approximate ML estimator defined in
(8) or (9) is consistent and asymptotically normal, i.e.

$$\sqrt{T}\,\big(\hat{\theta} - \theta\big) \stackrel{d}{\longrightarrow} N(0, \Omega) \qquad (10)$$

where the limiting covariance matrix Ω is given in the appendix. Furthermore, when the ap-
proximation error of the optimal weight function is arbitrarily small, the limiting covariance
matrix Ω reaches the asymptotic Cramér-Rao lower bound.

    Proof: See Appendix.

   Furthermore, from the definition of the Hermite polynomials we can readily establish that

$$\begin{aligned} \frac{\partial h_i}{\partial\theta} &= \frac{\partial\lambda_{i,j}}{\partial\theta}\big(x^j - \lambda^j\big) - \lambda_{i,j}\frac{\partial\lambda^{j}}{\partial\theta} \equiv \bar{h}_i - \bar{z}_i \\ \frac{\partial h_{ijk}}{\partial\theta} &= \frac{\partial h_i}{\partial\theta}h_j h_k[3] - \frac{\partial h_i}{\partial\theta}\lambda_{j,k}[3] - h_i\frac{\partial\lambda_{j,k}}{\partial\theta}[3] \\ \frac{\partial h_{ijkl}}{\partial\theta} &= \frac{\partial h_i}{\partial\theta}h_j h_k h_l[4] - h_i h_j\frac{\partial\lambda_{k,l}}{\partial\theta}[6] - \frac{\partial h_i}{\partial\theta}h_j\lambda_{k,l}[12] + \frac{\partial\lambda_{i,j}}{\partial\theta}\lambda_{k,l}[6] \end{aligned}$$

where $\bar{h}_i \equiv \frac{\partial\lambda_{i,j}}{\partial\theta}(x^j-\lambda^j)$ and $\bar{z}_i \equiv \lambda_{i,j}\frac{\partial\lambda^{j}}{\partial\theta}$.

   Substituting these derivatives into the expansion given by (6) and taking expectations we
can derive the appropriate estimating equations, which are stated in the following Lemma.

Lemma 3 For an N-dimensional Markov process with known conditional characteristic
function associated with an unknown transition density, following the ECF estimation proce-
dure with the approximate optimal weight function, the use of a Gram-Charlier approximation
for the unknown transition density as in (5) results in the following estimating equations:

$$\begin{aligned} \frac{1}{T}\sum_{t=1}^{T}\Bigg\{ & \frac{\partial\lambda^{i}}{\partial\theta}h_i - \frac{1}{2}\Big[\big(x^i-\lambda^i\big)\big(x^j-\lambda^j\big) - \lambda^{i,j}\Big]\frac{\partial\lambda_{i,j}}{\partial\theta} \\ &+ \frac{1}{6}\bigg[\frac{\partial K^{i,j,k}}{\partial\theta}\big(h_{ijk} - E[h_{ijk}|x_t]\big) + 3K^{i,j,k}\Big(\big(\bar{h}_i h_j h_k - E[\bar{h}_i h_j h_k|x_t]\big) \\ &\qquad - \bar{z}_i\big(h_j h_k - E[h_j h_k|x_t]\big) - \bar{h}_i\lambda_{j,k} - h_i\frac{\partial\lambda_{j,k}}{\partial\theta}\Big)\bigg] \\ &+ \frac{1}{24}\bigg[\frac{\partial K^{i,j,k,l}}{\partial\theta}\big(h_{ijkl} - E[h_{ijkl}|x_t]\big) + K^{i,j,k,l}\Big(4\big(\bar{h}_i h_j h_k h_l - E[\bar{h}_i h_j h_k h_l|x_t]\big) \\ &\qquad - 4\bar{z}_i\big(h_j h_k h_l - E[h_j h_k h_l|x_t]\big) - 12\big(\bar{h}_i h_j\lambda_{k,l} - E[\bar{h}_i h_j\lambda_{k,l}|x_t]\big) \\ &\qquad + 12\,\bar{z}_i h_j\lambda_{k,l} - 6\big(h_i h_j - E[h_i h_j|x_t]\big)\frac{\partial\lambda_{k,l}}{\partial\theta}\Big)\bigg]\Bigg\} = 0 \qquad (11) \end{aligned}$$

If θ is of dimension K then there will be K such equations, the solution of which leads to
the approximate ML estimator.

   Proof: See Appendix.

   Remark 1 As we have mentioned, there are various approaches to approximating proba-
bility density functions. In this paper we use the widely used multivariate Edgeworth/Gram-
Charlier expansion to approximate the logarithmic transition density. Many choices still
remain, such as how many terms should be used in the expansion and what the initial
approximating density should be. In practice, most of these issues have to be dealt with
on a case-by-case basis. One difficulty associated with the infinite series expansion of
the density or log density function is that it is very likely divergent. In the univariate case,
when the state variable is the standardized sum of N independent and identically distributed
random variables, Feller (1971) gives conditions ensuring the validity of the Edgeworth ex-
pansions for the density and log density. Similar conditions dealing with the multivariate
case are given by Barndorff-Nielsen and Cox (1979).
   Remark 2 Our method is similar to, but distinct from, the conventional method of moments
(MM) and the generalized method of moments (GMM). Our estimating equations also turn out
to be based on moment restrictions, but they may involve more moments than in the case
of MM estimation. Moreover, the dimension of the estimating equations equals that of the
parameter vector rather than the number of moments used, as in GMM.
   Remark 3 When applied to a univariate diffusion process, our method is similar to
the approximate ML estimator of Aït-Sahalia (2000). The approximate MLE in Aït-Sahalia
(2000) is developed from an alternative expansion of ln f(x_{t+1}|x_t) after first convert-
ing the original diffusion process to a unit diffusion process. The method is based on the
following estimating equation:

$$\sum_{t=1}^{T}\frac{\partial \ln \hat{f}}{\partial\theta}(x_{t+1}|x_t) = 0.$$

Since it is very likely that $E\left[\frac{\partial \ln \hat{f}}{\partial\theta}(x_{t+1}|x_t)\right] \neq 0$ due to approximation error, the estimators
can be inconsistent.
   Remark 4 (Consistency of QML) If one were merely to approximate f(x_{t+1}|x_t) by a

normal density, i.e. to use only the initial approximating density with f̂(x_{t+1}|x_t) ≠ f(x_{t+1}|x_t),
the approach is essentially quasi maximum likelihood (QML) estimation. In this case, because
the normal density f_0 matches the first two conditional moments of the process exactly, we do have

$$E\left[\frac{\partial \ln \hat{f}}{\partial\theta}(x_{t+1}|x_t)\right] = 0,$$

which shows the consistency of quasi maximum likelihood (QML) estimation.
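A quick numerical check of this consistency claim (our own illustration, assuming an AR(1) with heavy-tailed Student-t innovations): because the Gaussian quasi-likelihood matches the first two conditional moments, the QML slope estimate remains consistent even though the transition density is misspecified.

```python
import numpy as np

rng = np.random.default_rng(3)
b_true = 0.7
T = 200_000
# Student-t(5) innovations scaled to unit variance: the transition density is NOT normal
eps = rng.standard_t(df=5, size=T) * np.sqrt(3 / 5)
x = np.empty(T + 1)
x[0] = 0.0
for t in range(T):
    x[t + 1] = b_true * x[t] + eps[t]

# Gaussian QML for this zero-intercept AR(1) slope coincides with
# least squares of x_{t+1} on x_t
b_hat = (x[:-1] @ x[1:]) / (x[:-1] @ x[:-1])
print(b_hat)   # close to 0.7 despite the misspecified (normal) likelihood
```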
       Remark 5 While the estimating equations in the general case are cumbersome, in the
univariate case they collapse into the well-known method of moments, as we now illus-
trate, albeit with the moment restrictions forming a system of conventional moment conditions. In the
univariate case we simply set i = j = k = l = 1 and drop the superscript index; then

$$\begin{aligned} \lambda^{1} &= K_1, \qquad \lambda^{1,1} = K_2, \qquad \lambda_{1,1} = 1/K_2 \\ h_1 &= (x_{t+1} - K_1)/K_2 \\ h_{11} &= h_1^2 - 1/K_2 \\ h_{111} &= h_1^3 - 3h_1/K_2 \\ h_{1111} &= h_1^4 - 6h_1^2/K_2 + 3/K_2^2 \end{aligned}$$

$$\begin{aligned} \frac{\partial h_1}{\partial\theta} &= -h_1\frac{\partial K_2/\partial\theta}{K_2} - \frac{\partial K_1/\partial\theta}{K_2} \\ h_1\frac{\partial h_1}{\partial\theta} &= -h_1^2\frac{\partial K_2/\partial\theta}{K_2} - h_1\frac{\partial K_1/\partial\theta}{K_2} \\ h_1^2\frac{\partial h_1}{\partial\theta} &= -h_1^3\frac{\partial K_2/\partial\theta}{K_2} - h_1^2\frac{\partial K_1/\partial\theta}{K_2} \\ h_1^3\frac{\partial h_1}{\partial\theta} &= -h_1^4\frac{\partial K_2/\partial\theta}{K_2} - h_1^3\frac{\partial K_1/\partial\theta}{K_2} \end{aligned}$$
and since

$$\begin{aligned} E(h_1|x_t) &= 0 \\ E\big(h_1^2\big|x_t\big) &= 1/K_2 \\ E[h_{11}|x_t] &= 0 \\ E[h_{111}|x_t] &= K_3/K_2^3 \\ E[h_{1111}|x_t] &= E\big[h_1^4\big|x_t\big] - 6E\big[h_1^2\big|x_t\big]/K_2 + 3/K_2^2 = K_4/K_2^4 \end{aligned}$$
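These conditional moment identities are easy to verify by Monte Carlo (our own check, using Gamma(5, 1) draws in place of the conditional distribution, so that K_1 = K_2 = 5, K_3 = 10, K_4 = 30):

```python
import numpy as np

rng = np.random.default_rng(1)
k = 5.0  # Gamma(shape=k, scale=1): cumulants K_n = k*(n-1)!, so K1=K2=5, K3=10, K4=30
K1, K2, K3, K4 = k, k, 2 * k, 6 * k

# draws standing in for x_{t+1} given x_t
x = rng.gamma(k, 1.0, size=2_000_000)
h1 = (x - K1) / K2
h111 = h1**3 - 3 * h1 / K2
h1111 = h1**4 - 6 * h1**2 / K2 + 3 / K2**2

print(round(h111.mean(), 3), K3 / K2**3)    # both are approximately 0.08
print(round(h1111.mean(), 3), K4 / K2**4)   # both are approximately 0.048
```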

The estimating equations collapse to:

       (1/T) Σ_{t=1}^{T} { h1 ∂K1/∂θ + (1/2)(h1^2 − 1/K2) ∂K2/∂θ
           + (1/6)[ (∂K3/∂θ)(h1^3 − K3/K2^3) − 3(∂K3/∂θ) h1/K2 ]
           − (K3/2)[ (h1^3 − K3/K2^3)(∂K2/∂θ)/K2 + (h1^2 − 1/K2)(∂K1/∂θ)/K2
               − 2h1 (∂K2/∂θ)/K2^2 ]
           + (1/24)[ (∂K4/∂θ)(h1^4 − (K4 + 3K2^2)/K2^4) − 6(∂K4/∂θ)(h1^2 − 1/K2)/K2 ]
           − (K4/24)[ 4(h1^4 − (K4 + 3K2^2)/K2^4)(∂K2/∂θ)/K2
               + 4(h1^3 − K3/K2^3)(∂K1/∂θ)/K2
               − 18(h1^2 − 1/K2)(∂K2/∂θ)/K2^2 − 12h1 (∂K1/∂θ)/K2^2 ] } = 0          (12)

Again the derivatives are taken with respect to all elements in the K-dimensional parameter
vector θ.
    Remark 6 The above estimating equations can be readily put into a more recognizable
form by combining coefficients on (h1^j − E[h1^j |xt]), j = 1, 2, 3, 4 (p = 4). Letting A^i_{jt} be
the appropriate coefficient on (h1^j − E[h1^j |xt]), j = 1, 2, 3, 4 (p = 4), associated with the
derivative with respect to the ith element of θ, we have the estimating equations given by:

                              (1/T) Σ_{t=1}^{T} A_t g_t = 0

where A_t is a K × 4 (p = 4) matrix with the ith row being associated with ∂/∂θ_i and g_t is a
4 × 1 (p = 4) vector given by

                   g_t = ( h1, h1^2 − 1/K2, h1^3 − K3/K2^3, h1^4 − (K4 + 3K2^2)/K2^4 )′

More specifically, we have

       A_{1t} = ∂K1/∂θ − (∂K3/∂θ)/(2K2) + K3 (∂K2/∂θ)/K2^2 + (K4/(2K2^2)) ∂K1/∂θ
       A_{2t} = (∂K2/∂θ)/2 − (K3/(2K2)) ∂K1/∂θ − (1/(4K2)) ∂K4/∂θ + (3K4/(4K2^2)) ∂K2/∂θ
       A_{3t} = (∂K3/∂θ)/6 − (K3/(2K2)) ∂K2/∂θ − (K4/(6K2)) ∂K1/∂θ
       A_{4t} = (∂K4/∂θ)/24 − (K4/(6K2)) ∂K2/∂θ
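To see that the A_t g_t form agrees with the expanded equation (12), the following sketch (ours; the cumulant values and derivatives are arbitrary illustrative numbers, and only one element of the K-dimensional θ is considered) evaluates both forms for a single observation:

```python
import numpy as np

K1, K2, K3, K4 = 0.1, 0.5, 0.2, 0.3        # illustrative cumulant values
dK1, dK2, dK3, dK4 = 0.7, -0.4, 0.9, 0.2   # illustrative ∂Ki/∂θ for one element of θ
x = 0.8                                    # one observation x_{t+1}

h1 = (x - K1) / K2
g = np.array([h1,
              h1**2 - 1 / K2,
              h1**3 - K3 / K2**3,
              h1**4 - (K4 + 3 * K2**2) / K2**4])

A = np.array([
    dK1 - dK3 / (2*K2) + K3 * dK2 / K2**2 + K4 * dK1 / (2*K2**2),
    dK2 / 2 - K3 * dK1 / (2*K2) - dK4 / (4*K2) + 3*K4 * dK2 / (4*K2**2),
    dK3 / 6 - K3 * dK2 / (2*K2) - K4 * dK1 / (6*K2),
    dK4 / 24 - K4 * dK2 / (6*K2),
])

# The expanded form of equation (12), for the same single observation
expanded = (h1*dK1 + 0.5*(h1**2 - 1/K2)*dK2
            + (dK3*(h1**3 - K3/K2**3) - 3*dK3*h1/K2) / 6
            - (K3/2)*((h1**3 - K3/K2**3)*dK2/K2
                      + (h1**2 - 1/K2)*dK1/K2 - 2*h1*dK2/K2**2)
            + (dK4*(h1**4 - (K4 + 3*K2**2)/K2**4)
               - 6*dK4*(h1**2 - 1/K2)/K2) / 24
            - (K4/24)*(4*(h1**4 - (K4 + 3*K2**2)/K2**4)*dK2/K2
                       + 4*(h1**3 - K3/K2**3)*dK1/K2
                       - 18*(h1**2 - 1/K2)*dK2/K2**2 - 12*h1*dK1/K2**2))

print(np.isclose(A @ g, expanded))   # the two forms coincide
```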

    Remark 7 In the estimating equation (11) for the multivariate case and (12) for the uni-
variate case, we implicitly assume that the various moments or cumulants are known in analyt-
ical form. When the moments or cumulants exist but are unavailable in analytical form, path
simulation of the Markov process can be used to generate them for the estimation procedure.
The estimation is then in the framework of Duffie and Singleton (1993), and under the regu-
larity conditions there we have both consistency and asymptotic normality of the parameter
estimators.

3 Illustrative Examples of Markov Processes

Example 1: The Ornstein-Uhlenbeck Process (Equivalence to MLE) The O-U process is a
univariate diffusion specified by the following stochastic differential equation:

                               dxt = β(α − xt )dt + σdwt                                   (13)

where wt is a standard Brownian motion. Its discrete-time representation is an AR(1) process
with Gaussian errors,

                                  x_{t+1} = α + (x_t − α)e^{−β} + ε_{t+1}

where ε_{t+1} ∼ N(0, (σ^2/(2β))(1 − e^{−2β})). The Ornstein-Uhlenbeck process or the discrete-time

AR(1) process has a normal transition density function given by

                  f(x_{t+1}|x_t) = (1/√(2πs^2)) exp{ −(x_{t+1} − α − (x_t − α)e^{−β})^2 / (2s^2) }

where s^2 = (σ^2/(2β))(1 − e^{−2β}). The conditional log-likelihood is given by ln L = Σ_{t=1}^{T} ln f(x_{t+1}|x_t),
and maximizing the likelihood function yields the ML estimator.
    As a member of the affine class of diffusions, the O-U process has the conditional char-
acteristic function

               φ(r, x_{t+1}|x_t) = exp{ ir(α + (x_t − α)e^{−β}) − (r^2σ^2/(4β))(1 − e^{−2β}) }

or the cumulant generating function

               ln φ(−ir, x_{t+1}|x_t) = r(α + (x_t − α)e^{−β}) + (r^2σ^2/(4β))(1 − e^{−2β})

The conditional cumulants can be easily derived as

                                 K1 = α + (x_t − α)e^{−β}

                                 K2 = (σ^2/(2β))(1 − e^{−2β})

                                 Ki = 0, ∀i ≥ 3

Substituting the cumulants into the estimating equation, we have

                   (1/T) Σ_{t=1}^{T} ( h1 ∂K1/∂θ + (1/2)(h1^2 − 1/K2) ∂K2/∂θ ) = 0

where h1 = (x_{t+1} − K1)/K2 and θ = (α, β, σ). It is straightforward to verify that this is
equivalent to ML estimation.
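The equivalence can be checked numerically: for the O-U process the MSM summand h1 ∂K1/∂θ + ½(h1^2 − 1/K2) ∂K2/∂θ is exactly the Gaussian score. The sketch below (ours; the fixed data series and parameter values are purely illustrative) compares the MSM equations with a finite-difference gradient of the conditional log-likelihood.

```python
import numpy as np

def cumulants(alpha, beta, sigma, x_lag):
    # Conditional cumulants of the O-U transition: K1 depends on x_t, K2 does not
    K1 = alpha + (x_lag - alpha) * np.exp(-beta)
    K2 = sigma**2 / (2 * beta) * (1 - np.exp(-2 * beta))
    return K1, K2

def loglik(theta, x):
    alpha, beta, sigma = theta
    K1, K2 = cumulants(alpha, beta, sigma, x[:-1])
    return np.sum(-0.5 * np.log(2 * np.pi * K2) - (x[1:] - K1) ** 2 / (2 * K2))

def msm_equations(theta, x):
    # sum_t [ h1*dK1/dθ + (1/2)(h1^2 - 1/K2)*dK2/dθ ] for θ = (α, β, σ)
    alpha, beta, sigma = theta
    K1, K2 = cumulants(alpha, beta, sigma, x[:-1])
    h1 = (x[1:] - K1) / K2
    e = np.exp(-beta)
    dK1 = np.stack([np.full_like(K1, 1 - e),          # ∂K1/∂α
                    -(x[:-1] - alpha) * e,            # ∂K1/∂β
                    np.zeros_like(K1)])               # ∂K1/∂σ
    dK2 = np.array([0.0,                              # ∂K2/∂α
                    sigma**2 / (2 * beta**2) * ((1 + 2 * beta) * np.exp(-2 * beta) - 1),
                    sigma / beta * (1 - np.exp(-2 * beta))])
    return (h1 * dK1).sum(axis=1) + 0.5 * (h1**2 - 1 / K2).sum() * dK2

theta = np.array([0.5, 0.8, 0.3])
x = np.sin(np.arange(60)) * 0.2 + 0.5      # any fixed data; the identity is algebraic
eps = 1e-6
grad = np.array([(loglik(theta + eps * d, x) - loglik(theta - eps * d, x)) / (2 * eps)
                 for d in np.eye(3)])
ok = np.allclose(msm_equations(theta, x), grad, rtol=1e-4, atol=1e-6)
print(ok)                                  # MSM equations equal the Gaussian score
```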
    Example 2: The Square-Root Diffusion Process (Continuous-Time Process) The square-
root process is a univariate diffusion specified by the following stochastic differential equation:

                             dx_t = β(α − x_t)dt + σ√(x_t) dw_t                             (14)

This is also a member of the affine class of diffusions and has the following conditional
characteristic function associated with its transition density

                   φ(r, x_{t+τ}|x_t) = (1 − ir/c)^{−(q+1)} exp{ (ir e^{−βτ}/(1 − ir/c)) x_t }

or the cumulant generating function

                ln φ(−ir, x_{t+τ}|x_t) = −(q + 1) ln(1 − r/c) + (r e^{−βτ}/(1 − r/c)) x_t

where c = 2β/(σ^2(1 − e^{−βτ})) and q = 2αβ/σ^2 − 1. The following four cumulants are easily derived.

                    K1 = α(1 − e^{−βτ}) + x_t e^{−βτ}

                    K2 = (ασ^2/(2β))(1 − e^{−βτ})^2 + (x_t σ^2/β) e^{−βτ}(1 − e^{−βτ})

                    K3 = (ασ^4/(2β^2))(1 − e^{−βτ})^3 + (3x_t σ^4/(2β^2)) e^{−βτ}(1 − e^{−βτ})^2

                    K4 = (3ασ^6/(4β^3))(1 − e^{−βτ})^4 + (3x_t σ^6/β^3) e^{−βτ}(1 − e^{−βτ})^3

    From the estimating equations detailed above, it is clear we require the derivatives of
these cumulants with respect to the elements in the parameter vector θ, where θ = (α, β, σ 2 ).
The necessary derivatives can be easily derived and are included in the Appendix.
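These closed-form cumulants can be verified directly against the cumulant generating function: the nth cumulant is the nth derivative of ln φ(−ir, ·) at r = 0. A sketch of this check (ours; the parameter values and finite-difference step are illustrative choices) follows.

```python
import numpy as np

alpha, beta, sigma, tau, x_t = 0.075, 0.8, 0.1, 1.0, 0.06
e = np.exp(-beta * tau)
c = 2 * beta / (sigma**2 * (1 - e))
q = 2 * alpha * beta / sigma**2 - 1

def cgf(r):
    # ln φ(−ir, x_{t+τ}|x_t) for the square-root process
    return -(q + 1) * np.log(1 - r / c) + r * e * x_t / (1 - r / c)

# Closed-form conditional cumulants
K1 = alpha * (1 - e) + x_t * e
K2 = alpha*sigma**2/(2*beta) * (1 - e)**2 + x_t*sigma**2/beta * e * (1 - e)
K3 = alpha*sigma**4/(2*beta**2) * (1 - e)**3 + 3*x_t*sigma**4/(2*beta**2) * e * (1 - e)**2
K4 = 3*alpha*sigma**6/(4*beta**3) * (1 - e)**4 + 3*x_t*sigma**6/beta**3 * e * (1 - e)**3

# Central finite-difference derivatives of the CGF at r = 0 (step h << c)
h = 1.0
f = cgf(np.array([-2, -1, 0, 1, 2]) * h)
d1 = (f[3] - f[1]) / (2 * h)
d2 = (f[3] - 2 * f[2] + f[1]) / h**2
d3 = (f[4] - 2 * f[3] + 2 * f[1] - f[0]) / (2 * h**3)
d4 = (f[4] - 4 * f[3] + 6 * f[2] - 4 * f[1] + f[0]) / h**4
print(d1, K1)   # first cumulant
print(d4, K4)   # fourth cumulant
```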
    Other Examples

4 Monte Carlo Simulations

4.1 Model I: The continuous-time square-root diffusion process

The square-root diffusion process is specified in (14). The parameter values are set as α =
0.075, β = 0.80, σ = 0.10, which are close to estimates of the interest rate process obtained
from historical U.S. 3-month Treasury bill yields. This choice of parameter values gives an
integer value for the degrees of freedom of the non-central chi-square transition density and
makes it feasible to generate exact sample paths. Thus, there is no approximation error
involved in the path simulation, and differences between estimates are due entirely to the
different estimation methods.
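The exact sampling scheme can be sketched as follows (ours; `simulate_cir_exact`, the seed, burn-in handling and starting value are our own illustrative choices), drawing each transition from the non-central chi-square transition law of the square-root process:

```python
import numpy as np

def simulate_cir_exact(alpha, beta, sigma, delta, T, x0, burn=200, seed=0):
    """Exact transition sampling: x_{t+Δ}|x_t ~ χ²'(2q+2, 2u) / (2c)."""
    rng = np.random.default_rng(seed)
    c = 2 * beta / (sigma**2 * (1 - np.exp(-beta * delta)))
    df = 4 * alpha * beta / sigma**2          # 2q + 2 degrees of freedom
    x = x0
    path = np.empty(T)
    for t in range(T + burn):
        u = c * x * np.exp(-beta * delta)     # half the non-centrality parameter
        x = rng.noncentral_chisquare(df, 2 * u) / (2 * c)
        if t >= burn:
            path[t - burn] = x
    return path

path = simulate_cir_exact(0.075, 0.8, 0.1, delta=0.25, T=500, x0=0.075)
print(path.mean())   # close to the stationary mean α = 0.075
```

With α = 0.075, β = 0.8, σ = 0.1 the degrees of freedom are 4αβ/σ² = 24, the integer value referred to above.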
   We consider two sampling intervals, ∆ = 1/4 and ∆ = 1, with sample sizes T = 250 and
T = 500. In each sample path simulation, the first 200 observations are discarded to mitigate
start-up effects. The number of replications in the Monte Carlo simulation is 1000. The
estimation methods we consider include the empirical characteristic function based method
of system of moments (ECF/MSM) developed in this paper, GMM based on the continuous-
time model, GMM based on the discretized model, and ML and QML estimation based on
the continuous-time model.

4.1.1 GMM estimation based on the Continuous-Time Model

The same moment conditions as in Chan, Karolyi, Longstaff and Sanders (1992) are used for
GMM, except that the moment conditions are exact in the sense that they are derived from
the continuous-time model, i.e.

                             f_t(θ) = ( ε_t , ε_t^2 − E[ε_t^2 |x_{t−1}] )′

where t = 1, 2, ..., T, θ = (α, β, σ), and the lagged variable is used as the instrumental
variable in the estimation, with ε_t = ∆x_t − E[∆x_t |x_{t−1}] and

                    E[∆x_t |x_{t−1}] = (1 − e^{−β})(α − x_{t−1})

                    E[ε_t^2 |x_{t−1}] = (σ^2/β)(e^{−β} − e^{−2β}) x_{t−1} + (ασ^2/(2β))(1 − e^{−β})^2        (15)

These moment conditions correspond to transitions over a unit period and are not subject to
discretization bias.
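A sketch of how these exact moment conditions can be evaluated (ours; `exact_moments` is a hypothetical helper name, the sample series is illustrative, and the base moments are interacted with the instruments (1, x_{t−1})):

```python
import numpy as np

def exact_moments(theta, x):
    """Sample averages of CKLS-style moments with instruments (1, x_{t-1})."""
    alpha, beta, sigma = theta
    e = np.exp(-beta)
    dx = np.diff(x)
    xlag = x[:-1]
    eps = dx - (1 - e) * (alpha - xlag)                 # ε_t = Δx_t − E[Δx_t|x_{t-1}]
    cond_var = (sigma**2 / beta * (e - e**2) * xlag
                + alpha * sigma**2 / (2 * beta) * (1 - e)**2)
    base = np.stack([eps, eps**2 - cond_var])           # the two base moments
    inst = np.stack([np.ones_like(xlag), xlag])         # instruments (1, x_{t-1})
    # interact each base moment with each instrument -> 4 moment conditions
    f = (base[:, None, :] * inst[None, :, :]).reshape(4, -1)
    return f.mean(axis=1)

x = np.array([0.070, 0.080, 0.065, 0.075, 0.090, 0.070])   # toy series
g = exact_moments((0.075, 0.8, 0.1), x)
print(g.shape)   # four sample moment conditions
```

A GMM estimator would then minimize a quadratic form in these sample moments.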

4.1.2 GMM estimation based on the Discretized Model

In the financial economics literature, estimation of the square-root process using GMM often
consists of first discretizing the continuous-time model and then deriving the moment condi-
tions from the discrete-time model. The moment conditions used in the literature are

                                   f_t(θ) = ( ε_t , ε_t^2 − σ^2 x_{t−1} )′

also with the lagged variable as the instrumental variable in the estimation of the square-root
process, where ε_t = ∆x_t − β(α − x_{t−1}). It is clear that these moment conditions differ
from those derived from the continuous-time model.

4.1.3 ML estimation based on the Continuous-Time Model

Solving the Kolmogorov forward (or Fokker-Planck) equation, or inverting the conditional
characteristic function via Fourier inversion, the transition density function of the square-root
process can be obtained as

                           f(x_t |x_{t−1}) = c e^{−u−v} (v/u)^{q/2} I_q(2(uv)^{1/2})        (16)

with x_t taking nonnegative values, where c = 2β/(σ^2(1 − e^{−βτ})), u = c x_{t−1} e^{−βτ}, v =
c x_t, q = 2αβ/σ^2 − 1, and I_q(·) is the modified Bessel function of the first kind of order q.
The transition density function is non-central chi-square, χ^2[2c x_t; 2q + 2, 2u], with 2q + 2
degrees of freedom and non-centrality parameter 2u proportional to the current level of
the stochastic process. If the process displays the property of mean reversion (β > 0), the
process is stationary and its marginal distribution can be derived from the transition density;
it is a gamma probability density function, i.e., g(x_t) = (ω^s/Γ(s)) x_t^{s−1} e^{−ω x_t}, where
ω = 2β/σ^2 and s = 2αβ/σ^2, with mean α and variance ασ^2/(2β).
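As a consistency check, the Bessel-function form (16) can be compared against an off-the-shelf non-central chi-square density, exploiting the change of variables y = 2c x_t. The sketch below (ours; the parameter values and evaluation grid are illustrative, and scipy is used for the special functions) evaluates both forms.

```python
import numpy as np
from scipy.special import iv
from scipy.stats import ncx2

alpha, beta, sigma, tau = 0.075, 0.8, 0.1, 1.0
c = 2 * beta / (sigma**2 * (1 - np.exp(-beta * tau)))
q = 2 * alpha * beta / sigma**2 - 1

def density_bessel(x, x_prev):
    # Equation (16): c e^{-u-v} (v/u)^{q/2} I_q(2 sqrt(uv))
    u = c * x_prev * np.exp(-beta * tau)
    v = c * x
    return c * np.exp(-u - v) * (v / u) ** (q / 2) * iv(q, 2 * np.sqrt(u * v))

def density_ncx2(x, x_prev):
    # Non-central chi-square form: 2c * pdf of χ²'(2q+2, 2u) evaluated at 2c x
    u = c * x_prev * np.exp(-beta * tau)
    return 2 * c * ncx2.pdf(2 * c * x, df=2 * q + 2, nc=2 * u)

xs = np.linspace(0.02, 0.15, 5)
print(np.allclose(density_bessel(xs, 0.06), density_ncx2(xs, 0.06)))
```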

4.1.4 QML estimation based on the Continuous-Time Model

The QML estimator is based on the conditional mean and variance, as well as the uncondi-
tional mean and variance, of the square-root process. The conditional mean and variance are
given in (15) and the unconditional mean and variance are given in Section 4.1.3.
   The simulation results for the alternative estimators are reported in Table 1 for different
sampling intervals and sample sizes. Overall, the ECF/MSM estimator performs nearly as
well as the ML estimator and consistently better than the other estimators. For simulations
with different sample sizes and parameter values, the relative performance of the alternative
estimators is similar to that reported in Table 1.

5 Conclusion

In this paper we have developed a new estimator for Markov models by combining an ap-
proximation to the transition density along with the first-order conditions associated with the
ECF estimation approach. The estimator is guaranteed to be consistent and the asymptotic
efficiency of the estimator approaches that of the exact ML estimator when the approxima-
tion error of the optimal function is arbitrarily small.
   We are currently pursuing an extensive Monte Carlo study to ascertain both the accuracy
of the approximation of the transition density as well as that of the estimator itself.

       Table 1: Monte Carlo Simulation Results of Alternative Estimation Methods
Panel A: Sampling Interval ∆ = 1/4, Sample Size = 250
parameter   estimation method   mean     median   st dev    m.s.e   (95 percentiles)
   α           ECF/MSM          0.0748   0.0746   0.0043   0.0043   (0.0671 0.0829)
 (0.075)          ML            0.0748   0.0746   0.0043   0.0043   (0.0672 0.0840)
                  QML           0.0748   0.0746   0.0043   0.0043   (0.0672 0.0838)
                 GMM            0.0747   0.0746   0.0044   0.0044   (0.0667 0.0839)
                 dGMM           0.0746   0.0746   0.0044   0.0044   (0.0669 0.0837)
   β           ECF/MSM          0.8925   0.8797   0.2002   0.2205   (0.5535 1.3618)
 (0.800)          ML            0.8958   0.8803   0.1991   0.2208   (0.5486 1.3545)
                  QML           0.8929   0.8790   0.2089   0.2285   (0.5221 1.3691)
                 GMM            0.8840   0.8692   0.2046   0.2211   (0.5465 1.3476)
                 dGMM           0.7718   0.7635   0.1582   0.1606   (0.5015 1.1180)
   σ           ECF/MSM          0.0995   0.0994   0.0050   0.0051   (0.0894 0.1084)
 (0.100)          ML            0.0994   0.0994   0.0051   0.0051   (0.0896 0.1100)
                  QML           0.0712   0.0712   0.0038   0.0291   (0.0642 0.0792)
                 GMM            0.0997   0.0996   0.0055   0.0055   (0.0894 0.1103)
                 dGMM           0.0897   0.0898   0.0048   0.0113   (0.0805 0.0990)
Panel B: Sampling Interval ∆ = 1/4, Sample Size = 500
parameter   estimation method   mean     median   st dev    m.s.e   (95 percentiles)
   α           ECF/MSM          0.0748   0.0747   0.0031   0.0031   (0.0686 0.0802)
 (0.075)          ML            0.0748   0.0746   0.0031   0.0031   (0.0687 0.0811)
                  QML           0.0748   0.0746   0.0031   0.0031   (0.0688 0.0811)
                 GMM            0.0747   0.0746   0.0031   0.0031   (0.0687 0.0810)
                 dGMM           0.0747   0.0746   0.0031   0.0031   (0.0686 0.0809)
   β           ECF/MSM          0.8508   0.8420   0.1374   0.1465   (0.6133 1.1521)
 (0.800)          ML            0.8517   0.8427   0.1370   0.1464   (0.6094 1.1480)
                  QML           0.8453   0.8337   0.1450   0.1519   (0.5921 1.1606)
                 GMM            0.8428   0.8301   0.1403   0.1466   (0.6016 1.1400)
                 dGMM           0.7427   0.7352   0.1101   0.1241   (0.5497 0.9692)
   σ           ECF/MSM          0.0995   0.0991   0.0034   0.0035   (0.0922 0.1050)
 (0.100)          ML            0.0990   0.0990   0.0035   0.0036   (0.0924 0.1056)
                  QML           0.0709   0.0709   0.0026   0.0292   (0.0659 0.0759)
                 GMM            0.0998   0.0997   0.0037   0.0037   (0.0927 0.1074)
                 dGMM           0.0903   0.0903   0.0032   0.0103   (0.0839 0.0966)
 Panel C: Sampling Interval ∆ = 1, Sample Size = 250
 parameter   estimation method    mean    median    st dev    m.s.e    (95 percentiles)
     α          ECF/MSM          0.0749   0.0749   0.0023    0.0023    (0.0705 0.0793)
  (0.075)           ML           0.0749   0.0749   0.0022    0.0022    (0.0709 0.0796)
                   QML           0.0749   0.0749   0.0022    0.0022    (0.0709 0.0796)
                   GMM           0.0750   0.0749   0.0023    0.0023    (0.0708 0.0797)
                  dGMM           0.0748   0.0748   0.0023    0.0023    (0.0706 0.0793)
     β          ECF/MSM          0.8333   0.8236   0.1424    0.1462    (0.5896 1.1490)
  (0.800)           ML           0.8385   0.8246   0.1364    0.1417    (0.5955 1.1497)
                   QML           0.8376   0.8203   0.1473    0.1520    (0.5883 1.1710)
                   GMM           0.8327   0.8200   0.1438    0.1474    (0.5861 1.1631)
                  dGMM           0.5402   0.5383   0.0578    0.2661    (0.4292 0.6599)
     σ          ECF/MSM          0.1003   0.0997   0.0067    0.0067    (0.0881 0.1137)
  (0.100)           ML           0.1002   0.1000   0.0066    0.0066    (0.0883 0.1138)
                   QML           0.0714   0.0711   0.0050    0.0291    (0.0625 0.0818)
                   GMM           0.1001   0.0997   0.0070    0.0070    (0.0874 0.1143)
                  dGMM           0.0697   0.0697   0.0038    0.0306    (0.0623 0.0773)
 Panel D: Sampling Interval ∆ = 1, Sample Size = 500
 parameter   estimation method    mean    median    st dev    m.s.e    (95 percentiles)
     α          ECF/MSM          0.0750   0.0749   0.0015    0.0015    (0.0718 0.0779)
  (0.075)           ML           0.0749   0.0748   0.0016    0.0016    (0.0718 0.0781)
                   QML           0.0749   0.0748   0.0016    0.0016    (0.0718 0.0781)
                   GMM           0.0749   0.0749   0.0016    0.0016    (0.0718 0.0781)
                  dGMM           0.0748   0.0748   0.0016    0.0016    (0.0715 0.0780)
     β          ECF/MSM          0.8145   0.8131   0.0991    0.1002    (0.6452 1.0244)
  (0.800)           ML           0.8212   0.8146   0.0977    0.0999    (0.6531 1.0294)
                   QML           0.8173   0.8113   0.1051    0.1065    (0.6329 1.0490)
                   GMM           0.8151   0.8108   0.0998    0.1009    (0.6451 1.0286)
                  dGMM           0.5350   0.5345   0.0413    0.2682    (0.4578 0.6205)
     σ          ECF/MSM          0.0998   0.0996   0.0047    0.0047    (0.0907 0.1090)
  (0.100)           ML           0.0997   0.0996   0.0047    0.0047    (0.0911 0.1093)
                   QML           0.0710   0.0708   0.0036    0.0293    (0.0644 0.0784)
                   GMM           0.0999   0.0997   0.0049    0.0049    (0.0906 0.1098)
                 dGMM         0.0702 0.0701 0.0026 0.0300 (0.0651 0.0754)
Note: dGMM is the GMM estimation based on the discretized model, all other methods are based on
the continuous-time model.

    Aït-Sahalia, Y. (1999), “Transition Densities for Interest Rate and Other Nonlinear Dif-
fusions,” Journal of Finance 54, 1361-1395.
            (2000), “Maximum-likelihood Estimation of Discretely Sampled Diffusions: A
Closed Form Approach,” Econometrica, forthcoming, 2001.
    Andersen, T. G., L. Benzoni, and J. Lund (2001), “Estimating jump-diffusions for equity
returns” [Working paper, Northwestern University].
    Barndorff-Nielsen, O.E. and D.R. Cox (1979), “Edgeworth and saddle-point approxima-
tions with statistical applications,” J. Roy. Statist. Soc. B, 41, 279-312.
    Blinnikov, S. and R. Moessner (1998), “Expansions for nearly Gaussian distributions,”
Astron. Astrophys. Suppl. Ser., 130, 193-205.
    Carrasco, M., M. Chernov, J. Florens, and E. Ghysels (2001), “Estimating diffusions with
a continuum of moment conditions” [Working paper, University of North Carolina].
    Carrasco, M., J. Florens (2000a), “Generalization of GMM to a continuum of moment
conditions”, Econometric Theory, 16, 797-834.
    Carrasco, M., J. Florens (2000b), “Efficient GMM estimation using the empirical char-
acteristic function”, [Working paper, CREST, Paris].
    Chacko, G. and L. M. Viceira (1999), “Spectral GMM Estimation of Continuous-Time
Processes,” [Working Paper, Graduate School of Business Administration, Harvard University].
    Chen, R. and L. Scott (1993), “Maximum likelihood estimation for a multifactor equilibrium
model of the term structure of interest rates,” Journal of Fixed Income, 3, 14-31.
    Chernov, M. and E. Ghysels (2000), “A study toward a unified approach to the joint
estimation of objective and risk-neutral measures for the purposes of options valuation”,
Journal of Financial Economics, 56, 407-458.

   Cox, J. C., J. E. Ingersoll and S. A. Ross (1985), “A Theory of the Term Structure of
Interest Rates,” Econometrica, 53, 385-407.
   Duffie, D. and K.J. Singleton (1993), “Simulated moments estimation of Markov models
of asset prices”, Econometrica, 61, 929-952.
   Feller, W. (1971), An Introduction to Probability Theory and Its Applications 2, New
York: Wiley.
   Feuerverger, A. (1990), “An efficiency result for the empirical characteristic function in
stationary time-series models”, The Canadian Journal of Statistics, 18, 155–161.
   Feuerverger, A. and P. McDunnough (1981a), “On some Fourier methods for inference,”
Journal of the American Statistical Association, 76, 379-387.
   Feuerverger, A. and P. McDunnough (1981b), “On the efficiency of empirical charateris-
tic function procedures”, Journal of the Royal Statistical Society, Series B 43, 20-27.
   Feuerverger, A. and R. A. Mureika (1977), “The empirical characteristic function and its
applications”, The Annals of Statistics, 5, 88–97.
   Fisher, M. and C. Gilles (1996), “Estimating exponential affine models of the term struc-
ture.” [Working Paper].
   Gallant, A. R. and J. R. Long (1997), “Estimating stochastic differential equations effi-
ciently by minimum chi-squared,” Biometrika, 84, 125-141.
   Gallant, A. R and G. E. Tauchen (1996), “Which moments to match?”, Econometric
Theory, 12, 657–681.
   Gouriéroux, C., A. Monfort and E. Renault (1993), “Indirect Inference,” Journal of
Applied Econometrics, 8, S85-S118.
   Hansen, L.P. and J.A. Scheinkman (1995), “Back to the Future: Generating Moment
Implications for Continuous-time Markov processes,” Econometrica, 63, 767-804.

   Jiang, G. J. and J. L. Knight (2001), “Estimation of Continuous Time Processes Via the
Empirical Characteristic Function,” Journal of Business & Economic Statistics, forthcoming.
   Jones, C.S. (1997), “Bayesian analysis of the short-term interest rate,” [Working paper,
The Wharton School, University of Pennsylvania.]
   Knight, J. L. and S. E. Satchell (1997), “The Cumulant Generating Function Estimation
Method,” Econometric Theory, 13, 170-184.
   Liu, J. (1997), “Generalized method of moments estimation of affine diffusion pro-
cesses.” [Working Paper, Graduate School of Business, Stanford University].
   McCullagh, P. (1987), Tensor Methods in Statistics, Chapman and Hall, London.
   Pearson, N. D. and T. Sun (1994), “Exploiting the conditional density in estimating the
term structure: an application to the Cox, Ingersoll, and Ross model”, Journal of Finance,
XLIX(4), 1279-1304.
   Schmidt, P. (1982), “An Improved Version of the Quandt-Ramsey MGF Estimator for
Mixtures of Normal Distributions and Switching Regressions,” Econometrica, 50, 501-524.
   Singleton, K. J. (2001), “Estimation of Affine Asset Pricing Models Using the Empirical
Characteristic Function,” Journal of Econometrics, 102, 111-141.
   Yu, J. (1998), “Empirical Characteristic Function in Time Series Estimation and a Test
Statistic in Financial Modelling,” [unpublished Ph.D. dissertation].


Proof of Lemma 1: From (2), we have

                  ∂ln f/∂θ (x_{t+1}|x_t) = ∫ e^{ir′x_{t+1}} w(θ, t, r|x_t) dr

and

                  ∫ w(θ, t, r|x_t) φ(r, x_{t+1}|x_t) dr
                      = ∫ w(θ, t, r|x_t) ∫ e^{ir′x_{t+1}} f(x_{t+1}|x_t) dx_{t+1} dr
                      = ∫ ( ∫ w(θ, t, r|x_t) e^{ir′x_{t+1}} dr ) f(x_{t+1}|x_t) dx_{t+1}
                      = E[ ∫ e^{ir′x_{t+1}} w(θ, t, r|x_t) dr | x_t ]
                      = E[ ∂ln f(x_{t+1}|x_t)/∂θ | x_t ]

Thus the estimating equations (1) lead to

          (1/T) Σ_{t=1}^{T} [ ∂ln f/∂θ (x_{t+1}|x_t) − E[ ∂ln f/∂θ (x_{t+1}|x_t) | x_t ] ] = 0,

which is equivalent to ML estimation.
    Proof of Lemma 2: Firstly, from (8) or (9), we note immediately from Singleton (2001)
that our estimator is merely a GMM estimator and as such will be asymptotically normally
distributed. Secondly, denote equation (8) as

                                 H(θ) = (1/T) Σ_{t=1}^{T} h_t(θ) = 0.

Under certain regularity conditions, we have that the asymptotic variance-covariance matrix
Ω is given by

                                 Ω = D(θ)^{−1} Σ(θ) D(θ)^{−1}

with D(θ) = plim (1/T) Σ_{t=1}^{T} ∂h_t(θ)/∂θ and Σ(θ) = plim T H(θ)H(θ)′.
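The sandwich form of Ω can be illustrated numerically (a toy computation of ours; the D and Σ matrices below are made-up numbers, not estimates from any model):

```python
import numpy as np

D = np.array([[-2.0, 0.3],          # D(θ): probability limit of the Jacobian (1/T)Σ ∂h_t/∂θ
              [0.1, -1.5]])
S = np.array([[0.8, 0.1],           # Σ(θ): probability limit of T·H(θ)H(θ)'
              [0.1, 0.5]])

D_inv = np.linalg.inv(D)
Omega = D_inv @ S @ D_inv.T         # sandwich asymptotic variance-covariance matrix
print(Omega)                        # symmetric positive definite
```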

    Proof of Lemma 3: The results in Lemma 3 follow immediately from the substitution of
equation (6) into equation (9). Alternatively, the estimating equation (11) can be derived by
substituting the approximating weight function into equation (8) and applying the definitions
of the h polynomials.
    The Square-Root Diffusion Process: Derivatives of Cumulants with respect to the Param-
eters
              ∂K1/∂α = (1 − e^{−βτ})
              ∂K2/∂α = (σ^2/(2β))(1 − e^{−βτ})^2
              ∂K3/∂α = (σ^4/(2β^2))(1 − e^{−βτ})^3
              ∂K4/∂α = (3σ^6/(4β^3))(1 − e^{−βτ})^4

              ∂K1/∂β = τ(α − x_t)e^{−βτ}
              ∂K2/∂β = (ασ^2/(2β^2))(1 − e^{−βτ})((1 + 2βτ)e^{−βτ} − 1)
                        + (x_t σ^2/β^2) e^{−βτ}((1 + 2βτ)e^{−βτ} − 1 − βτ)
              ∂K3/∂β = (ασ^4/(2β^3))(1 − e^{−βτ})^2((2 + 3βτ)e^{−βτ} − 2)
                        + (3x_t σ^4/(2β^3)) e^{−βτ}(1 − e^{−βτ})((2 + 3βτ)e^{−βτ} − 2 − βτ)
              ∂K4/∂β = (3ασ^6/(4β^4))(1 − e^{−βτ})^3((3 + 4βτ)e^{−βτ} − 3)
                        + (3x_t σ^6/β^4) e^{−βτ}(1 − e^{−βτ})^2((3 + 4βτ)e^{−βτ} − 3 − βτ)

              ∂K1/∂σ^2 = 0
              ∂K2/∂σ^2 = K2/σ^2
              ∂K3/∂σ^2 = 2K3/σ^2
              ∂K4/∂σ^2 = 3K4/σ^2

