Introduction to Time Series Regression by cPitDSc



        Time series data is extremely common, and forecasting is perhaps the most employable
subdiscipline in economics. It is also a field that has seen a revolution in the last 20 years.
Better yet, data is readily available, making paper writing much easier. Furthermore, the
procedures the profession wants to see have become more clearly defined: there is a template
students can follow. Within the discipline imposed by the template there is still plenty of room
for creativity.
        This handout loosely follows the content of chapter 12 of Stock and Watson. We begin
by examining the simplest time series regression: a first-order autoregression. This simply
means we regress a variable on one of its lags.

[Figure: plot of inflation (inf), 1950 to 2000]

It is generally a good idea to plot the data you will be studying. This plot of inflation was
generated in EViews from the commands:
Create q 46.1 2004.1
Cfetch pzunew
Genr inf =
Plot inf
The graph shows gradually rising inflation until 1980 or so and falling thereafter. This
corresponds to a well-known structural change in monetary policy. On October 6, 1979, Paul
Volcker announced that the Fed would control the money supply rather than interest rates in an
effort to bring inflation under control. The first-order autoregressive model of inflation is
LS d(inf) c d(inf(-1))
Dependent Variable: D(INF)
Method: Least Squares
Date: 04/15/03 Time: 14:06
Sample(adjusted): 1947:4 2000:1
Included observations: 210 after adjusting endpoints
       Variable        Coefficient    Std. Error       t-Statistic       Prob.
           C            -0.000463     0.001629     -0.284402           0.7764
       D(INF(-1))       -0.182134     0.067176     -2.711316           0.0073
R-squared                0.034136    Mean dependent var              -0.000437
Adjusted R-squared       0.029492    S.D. dependent var               0.023961
S.E. of regression       0.023605    Akaike info criterion           -4.645261
Sum squared resid        0.115894    Schwarz criterion               -4.613384
Log likelihood           489.7524    F-statistic                      7.351235
Durbin-Watson stat       2.151828    Prob(F-statistic)                0.007262
We differenced the data for reasons explained later.
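To make "regress a variable on one of its lags" concrete outside EViews, here is a minimal Python sketch. It does not use the pzunew data: it simulates a stand-in series for d(inf) as an AR(1) with an assumed coefficient of -0.2 (chosen only because it is near the -0.18 estimated above), then estimates it by pairing each observation with its own lag, which is the structure of LS d(inf) c d(inf(-1)). The helper `ols_simple` is hypothetical and written for illustration.

```python
import random

def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, standard error of slope)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = (ssr / (n - 2) / sxx) ** 0.5
    return intercept, slope, se

# Simulated stand-in for d(inf): an AR(1) with an assumed coefficient of -0.2
random.seed(1)
phi = -0.2
dinf = [0.0]
for _ in range(5000):
    dinf.append(phi * dinf[-1] + random.gauss(0.0, 0.02))

# Pair each value with the one before it
y = dinf[1:]    # d(inf)
x = dinf[:-1]   # d(inf(-1))
c, b, se = ols_simple(x, y)
print(f"C = {c:.6f}  D(INF(-1)) = {b:.4f}  t = {b / se:.2f}")
```

Adding more lags (as in the fourth-order model) requires a multiple regression, which EViews handles directly.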
A fourth-order autoregressive model is
LS d(inf) c d(inf(-1)) d(inf(-2)) d(inf(-3)) d(inf(-4))
Dependent Variable: D(INF)
Method: Least Squares
Date: 04/15/03 Time: 14:10
Sample(adjusted): 1948:3 2000:1
Included observations: 207 after adjusting endpoints
       Variable        Coefficient    Std. Error       t-Statistic       Prob.
          C             -0.000363     0.001504     -0.241720           0.8092
      D(INF(-1))        -0.202177     0.070248     -2.878045           0.0044
      D(INF(-2))        -0.372561     0.070958     -5.250449           0.0000
      D(INF(-3))         0.124972     0.070586      1.770475           0.0782
      D(INF(-4))        -0.044090     0.069287     -0.636335           0.5253
R-squared                0.200928    Mean dependent var              -0.000219
Adjusted R-squared       0.185105    S.D. dependent var               0.023948
S.E. of regression       0.021618    Akaike info criterion           -4.806693
Sum squared resid        0.094405    Schwarz criterion               -4.726193
Log likelihood           502.4928    F-statistic                      12.69832
Durbin-Watson stat       1.990404    Prob(F-statistic)                0.000000

Adding unemployment produces an autoregressive distributed lag model:
LS d(inf) c d(inf(-1)) d(inf(-2)) d(inf(-3)) d(inf(-4)) lhur(-1) lhur(-2) lhur(-3) lhur(-4)
Dependent Variable: D(INF)
Method: Least Squares
Date: 04/15/03 Time: 15:13
Sample(adjusted): 1950:1 2000:1
Included observations: 201 after adjusting endpoints
       Variable        Coefficient    Std. Error       t-Statistic       Prob.
          C              0.006672     0.005521      1.208511           0.2283
      D(INF(-1))        -0.338561     0.071643     -4.725677           0.0000
      D(INF(-2))        -0.431820     0.075299     -5.734727           0.0000
      D(INF(-3))         0.017905     0.072459      0.247104           0.8051
      D(INF(-4))        -0.076628     0.068315     -1.121686           0.2634
      LHUR(-1)          -0.016334     0.004572     -3.572454           0.0004
      LHUR(-2)           0.012626     0.009012      1.401065           0.1628
      LHUR(-3)           0.004539     0.009021      0.503131           0.6154
      LHUR(-4)          -0.001980     0.004592     -0.431099           0.6669
R-squared                0.284873    Mean dependent var               0.000194
Adjusted R-squared       0.255076    S.D. dependent var               0.022060
S.E. of regression       0.019040    Akaike info criterion           -5.040853
Sum squared resid        0.069601    Schwarz criterion               -4.892944
Log likelihood           515.6057    F-statistic                      9.560489
Durbin-Watson stat       1.965237    Prob(F-statistic)                0.000000

We can check whether the unemployment variables are informative by conducting an F-test of the
null hypothesis that their coefficients are jointly zero.
To do this in EViews, while the equation above is displayed, click View / Coefficient Tests /
Redundant Variables and enter the four lags of unemployment in the dialog box.
Redundant Variables: LHUR(-1) LHUR(-2) LHUR(-3) LHUR(-4)
F-statistic              6.101250    Probability                     0.000121
Log likelihood ratio     24.05091    Probability                     0.000078
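The redundant-variables statistics can be reproduced by hand. The F-statistic compares restricted and unrestricted sums of squared residuals, F = [(SSR_r - SSR_u)/q] / [SSR_u/(n - k)], and for Gaussian least squares the likelihood ratio statistic is n ln(SSR_r/SSR_u). The restricted SSR on the same 201 observations is not printed above, so this Python sketch backs it out from the reported F and then checks that the implied LR statistic matches the 24.05 in the output. The function names are illustrative only.

```python
import math

def f_stat(ssr_r, ssr_u, q, n, k):
    """F test of q zero restrictions; k = number of coefficients in the unrestricted model."""
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))

def lr_stat(ssr_r, ssr_u, n):
    """Likelihood ratio statistic for Gaussian least squares on the same n observations."""
    return n * math.log(ssr_r / ssr_u)

# Figures taken from the autoregressive distributed lag output above
ssr_u, n, k, q = 0.069601, 201, 9, 4
F_reported = 6.101250

# Invert the F formula to get the restricted SSR implied by the reported F
ssr_r = ssr_u * (1 + q * F_reported / (n - k))
print(f"implied restricted SSR = {ssr_r:.6f}")
print(f"F  = {f_stat(ssr_r, ssr_u, q, n, k):.4f}")
print(f"LR = {lr_stat(ssr_r, ssr_u, n):.4f}")  # compare with 24.05091 in the output
```

That the LR statistic lines up is a useful check that both tests were run on the same 201 observations.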

This kind of test is so common that it has a name, “Granger causality,” and is automated in EViews:
Cause(4) d(inf) lhur

Pairwise Granger Causality Tests
Date: 04/15/03 Time: 15:14
Sample: 1950:1 2004:1
Lags: 4
 Null Hypothesis:                              Obs        F-Statistic     Probability
 LHUR does not Granger Cause D(INF)            201         6.10125         0.00012
 D(INF) does not Granger Cause LHUR                        0.52608         0.71668

Note that we reject the hypothesis that unemployment does not “Granger cause” changes in
inflation at exactly the same significance level in both tests (F = 6.10, p = 0.00012), because the
two tests are identical. As the text points out, the test is badly named; “Granger informative”
would be better. We have shown that unemployment helps predict changes in inflation even when
the lags of inflation are available. The EViews printout also reports a second test: it regressed
unemployment on four lags of unemployment and four lags of the change in inflation. The change
in inflation is not informative in predicting unemployment (F = 0.53, p = 0.72). Causation
requires exogeneity, and neither inflation nor unemployment can be said to be exogenous.

We have used four lags arbitrarily. To select the lag length, use the Schwarz criterion. For
example, if we run the above regression with three lags:
Dependent Variable: D(INF)
Method: Least Squares
Date: 04/15/03 Time: 15:16
Sample(adjusted): 1950:1 2000:1
Included observations: 201 after adjusting endpoints
       Variable        Coefficient    Std. Error       t-Statistic       Prob.
          C              0.005331     0.005367      0.993322            0.3218
      D(INF(-1))        -0.342109     0.071489     -4.785486            0.0000
      D(INF(-2))        -0.405157     0.068535     -5.911658            0.0000
      D(INF(-3))         0.036063     0.068039      0.530037            0.5967
      LHUR(-1)          -0.016935     0.004499     -3.764030            0.0002
      LHUR(-2)           0.015377     0.007987      1.925363            0.0556
      LHUR(-3)           0.000638     0.004609      0.138386            0.8901
R-squared                0.279143    Mean dependent var               0.000194
Adjusted R-squared       0.256848    S.D. dependent var               0.022060
S.E. of regression       0.019017    Akaike info criterion           -5.052772
Sum squared resid        0.070159    Schwarz criterion               -4.937732
Log likelihood           514.8036    F-statistic                      12.52066
Durbin-Watson stat       1.975165    Prob(F-statistic)                0.000000

The Schwarz criterion weighs the improvement in SSR against the number of additional variables.
It is defined so that adding a variable increases its value unless the fit improves, and SSR
declines, by enough to offset the penalty. The criterion uses the log of SSR, and our SSR is a
fraction, so the Schwarz criterion is negative. Still, we choose the model with the smallest value.
Here -4.938 < -4.893, so the three-lag model is preferred. We would continue making these
comparisons. Note that my comparisons use exactly the same data and the same number of
observations. This is not automatic: I reset the sample to start in 1950 so that enough prior data
exists to allow 3 or 4 lags. (With two lags the Schwarz criterion is -4.988, which is better still.
With one lag it is -4.765, so two lags is the right choice.) (We could have avoided negative
information criteria by defining inflation as 400*d(log(pzunew)). This would express inflation as
an annualized percentage, giving whole numbers rather than fractions.)
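The information criteria in the output can be recomputed from the log likelihood. Assuming EViews uses SC = -2 logL/T + k ln(T)/T and AIC = -2 logL/T + 2k/T (these formulas reproduce the printed values), this Python sketch recomputes both for the four- and three-lag models. Stock and Watson write the criterion as ln(SSR/T) + k ln(T)/T; for least squares the two versions differ only by a constant on a given sample, so they rank models identically.

```python
import math

def schwarz(loglik, k, T):
    """Schwarz criterion in the form EViews reports: -2*logL/T + k*ln(T)/T."""
    return -2 * loglik / T + k * math.log(T) / T

def akaike(loglik, k, T):
    """Akaike criterion in the form EViews reports: -2*logL/T + 2*k/T."""
    return -2 * loglik / T + 2 * k / T

T = 201  # same sample in both regressions
models = {
    "4 lags": (515.6057, 9),  # (log likelihood, number of coefficients)
    "3 lags": (514.8036, 7),
}
for name, (ll, k) in models.items():
    print(f"{name}: SC = {schwarz(ll, k, T):.6f}  AIC = {akaike(ll, k, T):.6f}")

# Pick the model with the smallest Schwarz criterion
best = min(models, key=lambda m: schwarz(*models[m], T))
print("SC prefers", best)
```

Because the penalty depends on T, the comparison is only meaningful when every model is fit to the same observations, as the text stresses.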

The text notes that the Akaike criterion is commonly used but has a theoretical flaw: even in
large samples it tends to choose too many lags.

         We began by noting that inflation rose and then fell, and that the switch occurred at the
same time the Fed changed policy. Estimating a single equation through both historical periods
is apt to be misleading because the relationships we are estimating are unlikely to be stable, or
stationary. Two forms of nonstationarity are important: structural breaks and unit roots.
Structural breaks are intuitive and are dealt with in straightforward ways. If you believe
Y = a + bX changes at a particular date, define a dummy variable d that is zero before that date
and 1 from that date on. Now estimate
$$Y = \beta_0 + \beta_1 X + \beta_3 d + \beta_4 dX$$
and conduct the F-test that $\beta_3 = \beta_4 = 0$. If you are uncertain about the date, define a
dummy for each candidate date, compute the F-statistic for each, and compare the largest of them
(the QLR statistic) with the table of critical values in the text. As it happens, EViews provides a
rather different test for this purpose, so we shall leave this for now.
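A minimal Python sketch of the break test and the QLR idea, on simulated data with an assumed break (the data, break date, and helper names are hypothetical). It uses the fact that a regression with both a shift dummy d and an interaction dX fits exactly the same values as separate regressions before and after the break, so the unrestricted SSR is the sum of the two sub-sample SSRs. Candidate dates in the middle 70% of the sample are tried, and the largest F is the QLR statistic; its critical values are not those of the ordinary F distribution, so use the table in the text.

```python
import random

def simple_ssr(x, y):
    """SSR from OLS of y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    syy = sum((b - ybar) ** 2 for b in y)
    return syy - sxy ** 2 / sxx

def break_f(x, y, tau):
    """F test of beta3 = beta4 = 0 in Y = b0 + b1*X + b3*d + b4*dX, where d = 1 from index tau on.
    With both dummies the unrestricted fit equals separate regressions before and after tau."""
    n = len(y)
    ssr_r = simple_ssr(x, y)
    ssr_u = simple_ssr(x[:tau], y[:tau]) + simple_ssr(x[tau:], y[tau:])
    return ((ssr_r - ssr_u) / 2) / (ssr_u / (n - 4))

# Hypothetical data: the intercept jumps from 1 to 3 at observation 100
random.seed(2)
n, true_break = 200, 100
x = [random.gauss(0, 1) for _ in range(n)]
y = [(1.0 if t < true_break else 3.0) + 0.5 * x[t] + random.gauss(0, 0.5) for t in range(n)]

# QLR: the largest F over candidate dates, trimming 15% of the sample at each end
lo, hi = int(0.15 * n), int(0.85 * n)
f_by_date = {tau: break_f(x, y, tau) for tau in range(lo, hi + 1)}
est_break = max(f_by_date, key=f_by_date.get)
qlr = f_by_date[est_break]
print(f"QLR = {qlr:.1f} at observation {est_break}")
```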

The second form of nonstationarity is more subtle but no less important.
Unit roots are a point that the profession tried to ignore as inconvenient for a time, but one that
we all now recognize as fundamental. Students often go through the same process: why torture
us with one more problem? Isn’t checking for multicollinearity, heteroskedasticity and
autocorrelation enough? After all, it would be, if the data satisfied the classical regression
model.

Unfortunately, much, perhaps most, economic data does not follow the classical regression
model. Economic data is typically time series, in which the prior value powerfully affects the
next value. Worse, much data is the value of an asset. If the asset’s value is expected to rise,
people will buy it, and the value rises until further increases are no longer predicted. Therefore
rational expectations and efficient markets require that changes in asset values be unpredictable.
If the change in an asset’s value is unpredictable, then the changes are random and may be
written as
(1) $p_t - p_{t-1} = \varepsilon_t$
where $\varepsilon_t$ is a normally distributed random error.
Rearranging, we have
(2) $p_t = p_{t-1} + \varepsilon_t$
which allows us to recognize the series as a random walk.
        Random walks are quite intuitive. Consider a person at a particular place. If they take a
step, they are where they were plus a one-step movement in some direction. That is precisely
what the equation says. If the person takes enough steps, even if the steps themselves are
random, they may end up at a place arbitrarily far from the initial point.
          Now consider two random walkers both beginning from Denver. Each flips a fair coin
and takes one step north on heads and one step south on tails. We stop them 1 million steps
later. One finds himself in Canada and the other swimming in the Gulf of Mexico. If we record
their positions as distance from the equator, a regression of one walker’s position on the other’s
would uncover a negative correlation that is completely spurious. If they had both ended up in
Canada, the regression would claim a positive correlation for a purely random and unpredictable
event. Indeed, only if one of the walkers miraculously finds himself back in Denver will the
regression give the correct result that the two walks are uncorrelated.
          If all series are random walks, the solution is easy: regress the change in one walker’s
position on the change in the other’s. Knowing that on the 50,000th step walker A stepped north
tells you nothing about walker B.
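The Denver story is easy to simulate. This Python sketch (the walk length, number of replications, and seed are arbitrary choices) generates pairs of independent coin-flip walkers and compares the correlation between their positions with the correlation between their individual steps. Positions are typically substantially correlated even though the walkers are independent; steps are essentially uncorrelated, which is why differencing is the cure.

```python
import random

def corr(u, v):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(u)
    ub, vb = sum(u) / n, sum(v) / n
    cov = sum((a - ub) * (b - vb) for a, b in zip(u, v))
    su = sum((a - ub) ** 2 for a in u) ** 0.5
    sv = sum((b - vb) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

def walk(steps, rng):
    """Coin-flip walk: +1 (north) on heads, -1 (south) on tails, starting at 0 (Denver)."""
    pos, path = 0, []
    for _ in range(steps):
        pos += 1 if rng.random() < 0.5 else -1
        path.append(pos)
    return path

rng = random.Random(3)
reps, steps = 200, 1000
level_corrs, change_corrs = [], []
for _ in range(reps):
    a, b = walk(steps, rng), walk(steps, rng)
    level_corrs.append(abs(corr(a, b)))            # positions
    da = [a[t] - a[t - 1] for t in range(1, steps)]
    db = [b[t] - b[t - 1] for t in range(1, steps)]
    change_corrs.append(abs(corr(da, db)))         # individual steps

print(f"mean |corr| of positions: {sum(level_corrs) / reps:.3f}")
print(f"mean |corr| of steps:     {sum(change_corrs) / reps:.3f}")
```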
          There is a closely related time series process with very different properties. Consider
(3) $p_t = a + b p_{t-1} + \varepsilon_t, \quad b < 1.$
If we want to continue with the random walker analogy, the story becomes rather strange. Think
of a hot air balloon with an elastic band connecting it to the equator and engines that push it
north. The parameter b measures the weakness of the elastic band. For b = 1 the band is so
weak that the balloon stays wherever it was the previous period. For b = .9, the band is strong
enough to pull the balloon 10% of the way to the equator each period. The parameter a measures
the strength of the engines pushing north. At some point, the strength of the engines is just
enough to offset the pull from the elastic band. At this long-run equilibrium point, $p_t = p_{t-1}$
and
$$p_t(1 - b) = a + \varepsilon_t \quad\text{or}\quad p_t = \frac{a}{1-b} + \frac{\varepsilon_t}{1-b}.$$

Clearly, such a process is in deep trouble if $b \ge 1$. Our balloon is pushed farther and farther
north by its engines while the elastic band has no ability to pull it back. Eventually, it will find
itself well beyond the North Pole and in deep space, while the accumulated error becomes
arbitrarily large as well. These facts are commonly referred to by the mystic muttering “unit roots
produce nonstationary processes that are not mean reverting and have variances that increase
with the number of observations.” Mean reversion just means the process has an equilibrium it
returns to, although shocks may push it away temporarily. The process above is mean reverting
only if $b < 1$.
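The balloon analogy can be checked numerically. This Python sketch simulates equation (3) with an assumed a = 1 and standard normal errors. With b = 0.9 the series keeps returning toward a/(1 - b) = 10, so its sample mean settles near 10; with b = 1 nothing pulls it back and the engines carry it off. The parameter values and the function name are illustrative.

```python
import random

def simulate(a, b, n, p0=0.0, seed=4):
    """Simulate p_t = a + b*p_{t-1} + e_t with standard normal e_t, starting from p0."""
    rng = random.Random(seed)
    p = [p0]
    for _ in range(n):
        p.append(a + b * p[-1] + rng.gauss(0, 1))
    return p

a, n = 1.0, 100_000
stationary = simulate(a, 0.9, n)  # band pulls the balloon 10% of the way back each period
unit_root = simulate(a, 1.0, n)   # band has no pull: a random walk with drift a

print(f"b = 0.9: sample mean {sum(stationary) / len(stationary):.2f}, "
      f"equilibrium a/(1-b) = {a / (1 - 0.9):.2f}")
print(f"b = 1.0: final position {unit_root[-1]:.0f} (keeps drifting)")
```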
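         It is easy to give (3) an economic interpretation. Consider the following model, in which
investment I is held constant and income satisfies the identity Y = C + I:
$$C = a + bY(-1) + \varepsilon$$
Substituting Y(-1) = C(-1) + I, we find
$$C = a + bI + bC(-1) + \varepsilon$$
We hope the system converges to an equilibrium where C = C(-1); if so,
$$C - bC(-1) = a + bI + \varepsilon$$
$$C = \frac{a + bI}{1 - b} + \frac{\varepsilon}{1 - b}.$$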
        To summarize: random walks are formally rather similar to convergent processes like
the multiplier. Regressing random walks on each other is very likely to produce spurious
regression results. Therefore, before running regressions, it is important at least to know
whether the series are stationary. Checking this can take a number of forms that we will explore
soon, but common sense certainly plays a role. Is the series an asset value that theoretically ought
to be a random walk? Is the model underlying the process a convergent one?
        Before moving on to testing for unit roots, it is a good idea to experiment with
regressions whose structure we know, to get some feel for the seriousness of the problem.
The following program is easily run in EViews.
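' Simulate two random walks (a, b) and two stationary AR(1) series (x, y)
Create u 1000
' starting values in observation 1
smpl 1 1
genr a = 10
genr b = 5
genr x = 10
genr y = 5
' observations 2 to 1000: previous value (scaled by .80 for x and y) plus a N(0,1) draw
smpl 2 1000
genr a = a(-1) + nrnd
genr b = b(-1) + nrnd
genr x = .80*x(-1) + nrnd
genr y = .80*y(-1) + nrnd
' graph and regress over the full sample
smpl 1 1000
graph g1.line a b
graph g2.line x y
equation eq1.ls a c b
equation eq2.ls x c y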

You should be able to see the mean reversion in x and y but not in a and b. The regression of a
on b will likely generate a large t-statistic, indicating an association that does not exist, while
the t-statistic on y in the regression of x on y will likely be small. Try it.


[Figure: line graphs of A and B (left panel) and X and Y (right panel), observations 1 to 1000]
Dependent Variable: A
Method: Least Squares
Date: 04/15/03 Time: 16:24
Sample: 1 1000
Included observations: 1000
       Variable         Coefficient   Std. Error    t-Statistic    Prob.
          C              13.86998      0.396701     34.96335      0.0000
          B             -0.121270      0.016461    -7.367050      0.0000

Dependent Variable: X
Method: Least Squares
Date: 04/15/03 Time: 16:24
Sample: 1 1000
Included observations: 1000
       Variable         Coefficient   Std. Error    t-Statistic    Prob.
          C             -0.314450      0.054595    -5.759645      0.0000
          Y             -0.007032      0.030604    -0.229767      0.8183
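For readers without EViews, this Python sketch mirrors the program above: random walks a and b start at 10 and 5, AR(1) series x and y with coefficient .80 start at the same values, and each regression includes an intercept. Rather than a single draw, it repeats the experiment 200 times (an arbitrary choice) and counts how often the conventional t-statistic exceeds 1.96 in absolute value. Expect the random-walk regression to "find" a relationship in most samples. The stationary regression rejects far less often, though still more than 5% of the time, because its residuals are autocorrelated and the conventional standard error understates the uncertainty.

```python
import random

def t_stat_slope(x, y):
    """Conventional OLS t-statistic on the slope in y = c + beta*x + e."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((v - xbar) ** 2 for v in x)
    sxy = sum((u - xbar) * (v - ybar) for u, v in zip(x, y))
    beta = sxy / sxx
    c = ybar - beta * xbar
    ssr = sum((v - c - beta * u) ** 2 for u, v in zip(x, y))
    return beta / (ssr / (n - 2) / sxx) ** 0.5

def series(start, rho, n, rng):
    """Series of length n: start, then s_t = rho*s_{t-1} + N(0,1); rho = 1 is a random walk."""
    s = [start]
    for _ in range(n - 1):
        s.append(rho * s[-1] + rng.gauss(0, 1))
    return s

rng = random.Random(5)
reps, n = 200, 1000
rw_reject = ar_reject = 0
for _ in range(reps):
    a, b = series(10, 1.0, n, rng), series(5, 1.0, n, rng)  # like genr a = a(-1) + nrnd
    x, y = series(10, 0.8, n, rng), series(5, 0.8, n, rng)  # like genr x = .80*x(-1) + nrnd
    rw_reject += abs(t_stat_slope(b, a)) > 1.96             # regress a on c and b
    ar_reject += abs(t_stat_slope(y, x)) > 1.96             # regress x on c and y

print(f"random walks: |t| > 1.96 in {rw_reject / reps:.0%} of samples")
print(f"AR(0.8):      |t| > 1.96 in {ar_reject / reps:.0%} of samples")
```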

Unit Root Testing

So we need a test to help distinguish which of the two we have: nonstationary, non-mean-reverting
unit root processes or convergent series. What follows is informal. I strongly recommend reading
the sections on unit roots in the EViews manual (starting on p. 325 of the User’s Guide).

Recall that the problem is that
$$y_t = a y_{t-1} + \varepsilon_t$$
and we want to know whether a is too close to 1. Subtracting $y_{t-1}$ from both sides gives
$d(y_t) = (a - 1) y_{t-1} + \varepsilon_t$. The accepted procedure, known as the Dickey-Fuller test,
is to run
$$d(y_t) = \delta y_{t-1} + \varepsilon_t, \qquad \delta = a - 1,$$
and test whether $\delta = 0$. If $\delta = 0$, then a = 1. If instead we can reject the null in favor of
$\delta < 0$, the series does not have a unit root. There are several complications. First, under the
null hypothesis of a unit root, the t-statistic for this test does not have the usual (normal) critical
values. EViews will display the correct critical values for each test it runs.
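To see why the usual critical values fail, this Python sketch simulates many pure random walks (so the null a = 1 is true), runs the Dickey-Fuller regression, and takes the 5th percentile of the resulting t-statistics. Unlike the equation above, it includes an intercept, as Mei Chu Hsiao recommends below. The sample size, number of replications, and seed are arbitrary. The simulated 5% critical value should come out well below the one-sided normal value of -1.645, roughly -2.9, in line with tabulated Dickey-Fuller values for a regression with an intercept.

```python
import random

def df_tstat(y):
    """t-statistic on delta in d(y_t) = c + delta*y_{t-1} + e_t (Dickey-Fuller with intercept)."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    ylag = y[:-1]
    n = len(dy)
    xbar, ybar = sum(ylag) / n, sum(dy) / n
    sxx = sum((v - xbar) ** 2 for v in ylag)
    sxy = sum((u - xbar) * (v - ybar) for u, v in zip(ylag, dy))
    delta = sxy / sxx
    c = ybar - delta * xbar
    ssr = sum((v - c - delta * u) ** 2 for u, v in zip(ylag, dy))
    return delta / (ssr / (n - 2) / sxx) ** 0.5

rng = random.Random(6)
reps, n = 2000, 200
stats = []
for _ in range(reps):
    y = [0.0]
    for _ in range(n):
        y.append(y[-1] + rng.gauss(0, 1))  # true unit root: a = 1, delta = 0
    stats.append(df_tstat(y))

stats.sort()
crit_5 = stats[int(0.05 * reps)]
print(f"simulated 5% critical value: {crit_5:.2f}  (one-sided normal: -1.645)")
```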
        Next, we need the residual to be white noise. Autocorrelation is treated by adding lagged
difference terms, so the test equation might be
$$d(y_t) = \delta y_{t-1} + \sum_{i=1}^{p} \gamma_i \, d(y_{t-i}) + \varepsilon_t$$
with enough lags p to remove the autocorrelation. This is referred to as the Augmented
Dickey-Fuller test. It may also be necessary to include an intercept or a time trend. Mei Chu
Hsiao recommends always adding an intercept and including the time trend if it is statistically
significant. She suggests using the Schwarz or Akaike criterion to select the number of lags.
Stock and Watson suggest using the Akaike criterion precisely because it tends to choose too
many lags, and for this test the problems from too many lags are not severe. They also point out
that linear trends are not the only alternative to stochastic trends. A nonlinear trend may be the
appropriate alternative hypothesis, in which case they refer you to a more advanced text. In the
exercise you will see how to get EViews to print out a summary of several tests for lags. The
EViews manual is less clear. Its authors prefer a theory-based decision but are aware this may not be
