BABS 502


ARIMA Forecasting
Lecture 8 - March 16-18, 2009
              General Overview
• An ARIMA model is a mathematical model for time
  series data.
• George Box and Gwilym Jenkins developed a systematic
  approach for fitting these models to data so these
  models are often called Box-Jenkins models.
• We always use statistical or forecasting programs to fit
  these models.
   – The programs fit models and produce forecasts for us.
• But it is beneficial to understand the basic model so that we
  can judge whether what the software is doing makes sense
   – Especially if we use an automatic forecasting program.



                          (c) Martin L. Puterman              2
               ARIMA Models
• ARIMA stands for AutoRegressive Integrated Moving
  Average.
• We also speak of AR models, MA models, ARMA
  models and IMA models, which are special cases of this
  general class.
• These models generalize regression, but the “independent”
  variables are past values of the series itself and
  unobservable random disturbances.
• Estimation is based on maximum likelihood, not least
  squares.
• We distinguish between seasonal and non-seasonal
  models.

                    Notation
• Y1, Y2, …, Yt denote a series of values for a
  time series.
  – These are observable.
• e1, e2, …, et denote a series of random
  disturbances.
  – These are not observable.
  – They may be thought of as a series of random
    shocks.
  – Usually they are assumed to be generated from a
    Normal distribution with mean 0 and standard
    deviation σ and to be uncorrelated with each other.
  – They are often called “white noise”.
 An Autoregressive (AR(p)) Model
• AR(1) Model: Yt = A1Yt-1 + et
    – A1 is an unknown parameter with values between -1 and +1 which is to
      be estimated from data
    – As a first approximation we can estimate A1 by linear regression (with
      intercept set equal to 0)
• When A1 = 1, the model is called a random walk.
    – In this case,
             Yt = Yt-1 + et
    – or alternatively
             Yt - Yt-1 = et
    – We can show (by back substitution and assuming Y0 = 0) that for a
      random walk
        • E(Yt) = 0 and Var(Yt) = tσ²
        • Hence the values get more variable as you move out in the series.
        • This means that when data follows a random walk the best
          prediction of the future is the present (a naïve forecast) and the
          prediction gets less accurate the further into the future we forecast.
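
The two claims on this slide can be checked by simulation. The sketch below is only illustrative: the function names, sample sizes and seeds are assumptions, not part of the lecture. It estimates A1 by a no-intercept regression of Yt on Yt-1, and then checks that for a random walk started at Y0 = 0 the variance of Yt is roughly tσ².

```python
import random

def simulate_ar1(a1, n, sigma=1.0, seed=0):
    """Simulate Y_t = a1 * Y_{t-1} + e_t with Y_0 = 0 and Normal(0, sigma) shocks."""
    rng = random.Random(seed)
    y, prev = [], 0.0
    for _ in range(n):
        prev = a1 * prev + rng.gauss(0.0, sigma)
        y.append(prev)
    return y

def estimate_a1(y):
    """Least-squares slope of Y_t on Y_{t-1} with the intercept fixed at 0."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den

def random_walk_variance(t, paths=4000, sigma=1.0):
    """Sample variance of Y_t across many independent random walks (a1 = 1)."""
    finals = [simulate_ar1(1.0, t, sigma, seed=s)[-1] for s in range(paths)]
    mean = sum(finals) / paths
    return sum((v - mean) ** 2 for v in finals) / (paths - 1)

a1_hat = estimate_a1(simulate_ar1(0.8, n=2000))  # should be close to 0.8
var_10 = random_walk_variance(10)                # should be close to 10 * sigma^2
var_40 = random_walk_variance(40)                # should be close to 40 * sigma^2
print(round(a1_hat, 3), round(var_10, 2), round(var_40, 2))
```

The regression estimate is only a first approximation; a forecasting package would refine it by maximum likelihood.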


         Other AR(p) models
• The AR(2) Model
  – Yt = A1Yt-1 +A2 Yt-2 + et
  – Here, A1 and A2 are unknown parameters
• The AR(p) Model
  – Yt = A1Yt-1 +A2 Yt-2 + … + Ap Yt-p+ et
  – Here, A1, … Ap are unknown parameters
• To apply these in practice, we estimate the
  parameters and then use the model for
  forecasting by substituting past observed values.
• These models are called ARIMA(p,0,0) models.
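
A minimal sketch of "forecasting by substituting past observed values", assuming the coefficients have already been estimated. The function name, coefficients and history below are illustrative values, not estimates from any data in the lecture. Future disturbances are replaced by their mean of 0, and each forecast is fed back in as if it had been observed.

```python
def forecast_ar(coeffs, history, steps):
    """Forecast an AR(p) series from its fitted equation.

    coeffs  : [A1, ..., Ap], assumed already estimated
    history : observed values, oldest first (at least p of them)
    """
    p = len(coeffs)
    values = list(history)
    for _ in range(steps):
        recent = values[-p:][::-1]  # Y_{t-1}, Y_{t-2}, ..., Y_{t-p}
        values.append(sum(a * y for a, y in zip(coeffs, recent)))
    return values[len(history):]

# Illustrative AR(2): Y_t = 0.8 Y_{t-1} - 0.5 Y_{t-2} + e_t, last two observations 1.0 and 2.0
forecasts = forecast_ar([0.8, -0.5], history=[1.0, 2.0], steps=3)
print(forecasts)
```

The first forecast is 0.8(2.0) - 0.5(1.0) = 1.1; the second uses that forecast in place of an observation, giving 0.8(1.1) - 0.5(2.0) = -0.12.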

            Which Model to Fit?
• The Autocorrelation Function (ACF) and Partial
  Autocorrelation Function (PACF) give some insight into
  what model to fit to data.
   – We work backwards here.
      • Given a theoretical model, we can determine theoretically what its
        ACF and PACF should be.
      • So if the ACF and PACF from the data have a recognizable pattern
        then we try fitting a model that could generate that pattern to the
        data.
• What is a PACF?
   – The pth partial autocorrelation is the coefficient of Yt-p in a
     regression of Yt on Yt-1, Yt-2, …, Yt-p.
   – Thus, if the data were generated by an AR(2) model, in theory the
     first two partial autocorrelations would be non-zero and all
     partial autocorrelations at lags higher than two would be zero.
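
The AR(2) statement above can be illustrated on simulated data. This is a sketch under stated assumptions: instead of running the regressions directly, it uses the Durbin-Levinson recursion on the sample autocorrelations, a standard shortcut that gives the Yule-Walker version of the same coefficients. All function names, the seed and the sample size are hypothetical choices.

```python
import random

def simulate_ar2(a1, a2, n, seed=1):
    """Simulate Y_t = a1*Y_{t-1} + a2*Y_{t-2} + e_t, dropping 200 start-up values."""
    rng = random.Random(seed)
    y = [0.0, 0.0]
    for _ in range(n + 200):
        y.append(a1 * y[-1] + a2 * y[-2] + rng.gauss(0.0, 1.0))
    return y[202:]

def sample_acf(y, max_lag):
    """Sample autocorrelations r_0 = 1, r_1, ..., r_max_lag."""
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t - k] for t in range(k, n)) / c0 for k in range(max_lag + 1)]

def pacf_from_acf(r):
    """Durbin-Levinson recursion: partial autocorrelations for lags 1..len(r)-1."""
    pacf, phi = [], []
    for k in range(1, len(r)):
        num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # update the lower-order coefficients, then append the new last one
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

pacf = pacf_from_acf(sample_acf(simulate_ar2(0.8, -0.5, n=5000), max_lag=6))
print([round(v, 3) for v in pacf])  # lags 1 and 2 clearly non-zero, lags 3 to 6 near zero
```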


 Some further comments on ACFs
           and PACFs
• Computing autocorrelations (ACs) is similar to
  performing a series of simple regressions of Yt on Yt-1,
  then on Yt-2, then on Yt-3, ….
   – The AC coefficients reflect only the relationship between the two
     quantities included in the regression.
• Computing partial autocorrelations (PACs) is more in the
  spirit of multiple regression. The PACs remove the
  effects of all lower order lags before computing the
  autocorrelation.
   – For example the 2nd order PAC is the effect of observations two
     periods ago on the current observation, given that the effect of
     the observation one period ago has been removed.
   – This can be viewed as multiple regression.
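
A short worked example of "removing the effect of the observation one period ago". It uses the standard formula for the lag 2 partial autocorrelation in terms of the lag 1 and lag 2 autocorrelations r1 and r2, applied to theoretical (not estimated) autocorrelations; the function name is an assumption made for illustration.

```python
def lag2_partial_autocorrelation(r1, r2):
    """Lag 2 PAC: the lag 2 relationship left over once lag 1 is accounted for."""
    return (r2 - r1 ** 2) / (1.0 - r1 ** 2)

# AR(1) with A1 = 0.8: theoretical autocorrelations are A1 and A1^2.
a1 = 0.8
pac2_ar1 = lag2_partial_autocorrelation(a1, a1 ** 2)

# AR(2) with A1 = 0.8, A2 = -0.5: r1 = A1 / (1 - A2), r2 = A1 * r1 + A2.
A1, A2 = 0.8, -0.5
r1 = A1 / (1.0 - A2)
r2 = A1 * r1 + A2
pac2_ar2 = lag2_partial_autocorrelation(r1, r2)

print(pac2_ar1, pac2_ar2)
```

For the AR(1) case the lag 2 AC is 0.64 but the lag 2 PAC is 0: all of the lag 2 relationship passes through lag 1. For the AR(2) case the lag 2 PAC recovers A2 = -0.5.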

Example: AR(1) model A1 = .8
[Figure: Model: ArmaRoutine(0.8;0;0;0). Three panels: Autocorrelations vs. Lag; Partial Autocorrelations vs. Lag; Plot of Simulated Data (Simulated Data vs. Time).]
Example: AR(1) Model; A1 =-.7
[Figure: Model: ArmaRoutine(-0.7;0;0;0). Three panels: Autocorrelations vs. Lag; Partial Autocorrelations vs. Lag; Plot of Simulated Data (Simulated Data vs. Time).]
                            Example: AR(2) Model
[Figure: Model: ArmaRoutine(0.8,-0.5;0;0;0). Three panels: Autocorrelations vs. Lag; Partial Autocorrelations vs. Lag; Plot of Simulated Data (Simulated Data vs. Time).]
                                       Random Walk
[Figure: Model: ArmaRoutine(1;0;0;0). Three panels: Autocorrelations vs. Lag; Partial Autocorrelations vs. Lag; Plot of Simulated Data (Simulated Data vs. Time).]
                          Monthly Pulp Price Data
[Figure: three panels. Autocorrelations of pulp (0,0,12,1,0); Partial Autocorrelations of pulp (0,0,12,1,0); Plot of pulp (pulp vs. Time).]
                                      Annual Births Data
[Figure: three panels. Autocorrelations of Births (0,0,12,1,0); Partial Autocorrelations of Births (0,0,12,1,0); Plot of Births (Births vs. Time).]
                            Stationarity
• A time series is stationary if:
     – Its mean is the same at every time
     – Its variance is the same at every time
     – Its autocorrelations are the same at every time
•   A series of outcomes from independent identical trials is stationary.
•   A series with a trend is not stationary.
•   A random walk is not stationary.
•   If a time series is non-stationary, its ACF typically dies off slowly and
    the first partial autocorrelation is near 1.
     – In such cases we can sometimes create a stationary series by
       differencing the original series.
     – If Yt is a random walk, then its differences are white noise which is
       stationary
• A unit root test is a formal test for non-stationarity
     – One such test is the Dickey-Fuller test
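
A rough illustration of the differencing point, using a simulated random walk. The seed, series length and function name are assumptions, and this is not a unit root test; it only compares the lag 1 autocorrelation before and after differencing.

```python
import random

def lag1_autocorrelation(y):
    """Sample lag 1 autocorrelation of a series."""
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]
    return sum(d[t] * d[t - 1] for t in range(1, n)) / sum(v * v for v in d)

rng = random.Random(7)
walk, level = [], 0.0
for _ in range(1000):
    level += rng.gauss(0.0, 1.0)   # Y_t = Y_{t-1} + e_t
    walk.append(level)

diffs = [walk[t] - walk[t - 1] for t in range(1, len(walk))]  # Y_t - Y_{t-1} = e_t

r1_walk = lag1_autocorrelation(walk)    # expected to be near 1
r1_diffs = lag1_autocorrelation(diffs)  # expected to be near 0 (white noise)
print(round(r1_walk, 3), round(r1_diffs, 3))
```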

                          Differenced Births Data
[Figure: two panels. Autocorrelations of Births (1,0,12,1,0); Partial Autocorrelations of Births (1,0,12,1,0).]

The PACF suggests that the differences of the birth data
may follow an AR(1) or AR(2) or AR(5) model.
Differenced Pulp Price Data
[Figure: two panels. Autocorrelations of pulp (1,0,12,0,0); Partial Autocorrelations of pulp (1,0,12,0,0).]

The story is less clear here. Perhaps the differences
follow an AR(1) model: the lag 1 PAC is .346 and the lag 2 PAC is
.184.
        Differenced Models
• We let Zt = Yt – Yt-1.
• When the differenced model is stationary,
  we can write a model in terms of Zt .
• If Zt follows an AR(p) model, then Yt
  follows an ARIMA(p,1,0) model.
• In practice ARIMA(1,1,0) and
  ARIMA(2,1,0) are quite common.
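
A sketch of how forecasts of Yt can be built from an AR(1) model for the differences Zt: forecast each future difference, then add it back onto the last level. The coefficient 0.5 and the two observations are hypothetical values chosen so the arithmetic is easy to follow, and the function name is an assumption.

```python
def forecast_arima_110(a1, y, steps):
    """Forecast Y when Z_t = Y_t - Y_{t-1} follows Z_t = a1 * Z_{t-1} + e_t."""
    z = y[-1] - y[-2]        # last observed difference
    level, out = y[-1], []
    for _ in range(steps):
        z = a1 * z           # forecast of the next difference (future shock set to 0)
        level += z           # undo the differencing: next Y = previous Y + next Z
        out.append(level)
    return out

forecasts = forecast_arima_110(0.5, y=[100.0, 104.0], steps=4)
print(forecasts)  # [106.0, 107.0, 107.5, 107.75]
```

Because the forecast differences shrink by a factor of A1 at each step, the forecasts of Yt level off after a few periods.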

                       Pulp Data
• The fit from an ARIMA(1,1,0) model is
  – A1 = .346 (t-value 5.46)
  – So fitted model is
     • Zt = .346 Zt-1 + et
  – The residuals appear to have no remaining autocorrelation
  – Forecasts seem pretty flat; 561.7, 562.3, 562.6, 562.6, 562.6

[Figure: two panels. Autocorrelations of Residuals (Autocorrelations vs. Lag); pulp Chart (pulp vs. Time).]
                     MA(q) Models
• These are less intuitively plausible than AR models but fit many series well.
• MA(1) model:
   – Yt = et + W1 et-1
• MA(2) model:
   – Yt = et + W1 et-1 + W2 et-2
• MA(q) model
   – Yt = et + W1 et-1 + W2 et-2 +…+ Wq et-q
   – This is referred to as an ARIMA(0,0,q) model.
• Rationale for MA models is that effects of disturbances
  are short lived (q periods) as opposed to an AR model
  where they persist forever.
• Note that the disturbances are not observable.
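
The rationale above can be made concrete by tracing the effect of a single unit disturbance through each model. This is a sketch with hypothetical function names; the value 0.7 is used for both W1 and A1 purely for comparison.

```python
def ma_response(weights, horizon):
    """Effect on Y_t, Y_{t+1}, ... of one unit shock e_t = 1 in an MA(q) model."""
    full = [1.0] + list(weights)          # weights on e_t, e_{t-1}, ..., e_{t-q}
    return [full[h] if h < len(full) else 0.0 for h in range(horizon)]

def ar1_response(a1, horizon):
    """Same for an AR(1) model: the effect decays as a1**h but never reaches exactly 0."""
    return [a1 ** h for h in range(horizon)]

ma_effect = ma_response([0.7], horizon=5)   # [1.0, 0.7, 0.0, 0.0, 0.0]
ar_effect = ar1_response(0.7, horizon=5)    # roughly 1, 0.7, 0.49, 0.343, 0.2401
print(ma_effect, ar_effect)
```

In the MA(1) case the shock is gone after one further period; in the AR(1) case it fades geometrically.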

                   An MA(1) Model: W1 = .7
[Figure: Model: ArmaRoutine(0;0;.7;0). Three panels: Autocorrelations vs. Lag; Partial Autocorrelations vs. Lag; Plot of Simulated Data (Simulated Data vs. Time).]
                   An MA(1) Model: W1 = -.7
[Figure: autocorrelations and partial autocorrelations plotted against lag, and a plot of the simulated data against time, for Model: ArmaRoutine(0;0;-.7;0)]
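The pattern in these plots can be reproduced with a short simulation. The sketch below assumes the sign convention Yt = et + W1 et-1 (consistent with the substitution w = -W1 used for the ARIMA(0,1,1) model later in the lecture); under the opposite convention the lag 1 autocorrelation changes sign. The function names `simulate_ma1` and `sample_acf`, the seed, and the series length are illustrative choices, not part of the lecture.

```python
import random

def simulate_ma1(w1, n, seed=0):
    """Simulate Y_t = e_t + w1 * e_{t-1} with standard Normal shocks (assumed convention)."""
    rng = random.Random(seed)
    shocks = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [shocks[t] + w1 * shocks[t - 1] for t in range(1, n + 1)]

def sample_acf(series, max_lag):
    """Return the sample autocorrelations r_1, ..., r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]

W1 = -0.7
# Theoretical lag 1 autocorrelation of an MA(1): W1 / (1 + W1^2), about -0.47.
theoretical_lag1 = W1 / (1 + W1 ** 2)

acf = sample_acf(simulate_ma1(W1, 20000), 5)
print("theoretical lag 1:", round(theoretical_lag1, 3))
print("sample ACF, lags 1-5:", [round(r, 3) for r in acf])
```

The sample autocorrelation at lag 1 should sit close to the theoretical value and the higher lags close to zero, which is the "short ACF" signature of an MA(1) model.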

                     Births Data
• Clearly differencing is required
• Consider fitting an MA(1) model to the differenced data
• Find that estimated coefficient is -.42 with a T-value of -3.87
• But autocorrelation of residuals contains information
   – Note lag 2 AC = .349

[Figure: Autocorrelations of Residuals, plotted against lag]

                 Births Data
• Try an ARIMA(0,1,2) model
• Parameters are -.37 (t = -3.47), -.59 (t = -5.76)
• Residuals appear to be white noise.
• Forecasts are 338311, 340936, 340936, …

[Figure: Autocorrelations of Residuals, plotted against lag]
[Figure: Births Chart, births plotted against time]
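The second and third forecasts are equal because an MA(2) model on the differences only "remembers" the last two shocks. A minimal sketch of the point-forecast recursion follows, assuming the sign convention Yt - Yt-1 = et + W1 et-1 + W2 et-2. The function name `arima012_forecasts` and all input values are hypothetical; they are not taken from the births data.

```python
def arima012_forecasts(last_value, last_shock, prev_shock, w1, w2, horizon):
    """Point forecasts for Y_t - Y_{t-1} = e_t + w1*e_{t-1} + w2*e_{t-2}.

    Future shocks are replaced by their mean (zero), so the forecast
    difference vanishes from the third step onward.
    """
    forecasts = []
    level = last_value
    for h in range(1, horizon + 1):
        if h == 1:
            diff = w1 * last_shock + w2 * prev_shock
        elif h == 2:
            diff = w2 * last_shock
        else:
            diff = 0.0
        level += diff
        forecasts.append(level)
    return forecasts

# Hypothetical last observation and last two residuals (illustration only).
f = arima012_forecasts(last_value=100.0, last_shock=4.0, prev_shock=-2.0,
                       w1=-0.37, w2=-0.59, horizon=4)
print(f)
```

With these inputs the forecasts change at steps 1 and 2 and are constant afterwards, the same shape as the births forecasts.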




The ARIMA(0,1,1) Model Revisited
• This model can be written as (letting w = -W1)
   Yt - Yt-1 = et - w et-1
• The forecast from this model is
   Ft = Yt-1 - w(Yt-1 - Ft-1) = (1-w) Yt-1 + w Ft-1
• This is simple exponential smoothing with smoothing constant 1-w
• The new concept here is that the ARIMA(0,1,1)
  model is a formal statistical model while simple
  exponential smoothing is an ad hoc approach to forecasting.
   – This means that there is an error term and hence forecast
     standard errors and hypothesis tests are part of the model.
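The equivalence can be checked numerically. The sketch below computes one-step forecasts two ways: from the ARIMA(0,1,1) form Ft = Yt-1 - w et-1, with the shock estimated by the previous forecast error, and from the exponential smoothing form Ft = (1-w) Yt-1 + w Ft-1. The data, the value of w, the starting forecast, and the function names are hypothetical.

```python
def ses_forecasts(series, w, initial_forecast):
    """Simple exponential smoothing: F_t = (1 - w) * Y_{t-1} + w * F_{t-1}."""
    forecasts = [initial_forecast]
    for y in series:
        forecasts.append((1 - w) * y + w * forecasts[-1])
    return forecasts

def arima011_forecasts(series, w, initial_forecast):
    """ARIMA(0,1,1) one-step forecasts: F_t = Y_{t-1} - w * e_{t-1},
    where e_{t-1} = Y_{t-1} - F_{t-1} is the previous forecast error."""
    forecasts = [initial_forecast]
    for y in series:
        error = y - forecasts[-1]
        forecasts.append(y - w * error)
    return forecasts

data = [12.0, 15.0, 11.0, 14.0, 16.0, 13.0]  # hypothetical series
w = 0.42                                      # hypothetical parameter value

ses = ses_forecasts(data, w, initial_forecast=data[0])
arima = arima011_forecasts(data, w, initial_forecast=data[0])
largest_gap = max(abs(a - b) for a, b in zip(ses, arima))
print(largest_gap)
```

The two forecast sequences agree up to floating-point rounding, so fitting w by maximum likelihood amounts to choosing the smoothing constant 1-w from the data.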

 Relationship between MA and AR Models
• Any finite (stationary) AR model can be written as an infinite MA
  model.
• Any finite (invertible) MA model can be written as an infinite AR
  model.
   – These results can be shown by backward substitution (as
     we did previously for the AR models)
• Two consequences of these observations
   – Model Selection
      • If your best fit is an AR model with several terms (e.g., 4 or
        more), try an MA model with a few terms, and conversely.
   – Identification
      • AR models have long ACFs (several terms) and short PACFs
      • MA models have short ACFs and long PACFs
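Backward substitution for the AR(1) case can be illustrated numerically. Substituting Yt-1 = A1 Yt-2 + et-1 repeatedly into Yt = A1 Yt-1 + et gives Yt = et + A1 et-1 + A1^2 et-2 + …, an MA model with weights A1^j. The sketch below assumes the series starts from zero before the first shock, so the sum is finite and the two forms match exactly; the value of A1, the seed, and the function names are illustrative.

```python
import random

def ar1_series(a1, shocks):
    """Generate Y_t = a1 * Y_{t-1} + e_t, with the pre-sample value set to 0."""
    values = []
    prev = 0.0
    for e in shocks:
        prev = a1 * prev + e
        values.append(prev)
    return values

def ma_representation(a1, shocks, t):
    """Evaluate the MA form: sum over j = 0..t of a1**j * e_{t-j}."""
    return sum(a1 ** j * shocks[t - j] for j in range(t + 1))

rng = random.Random(1)
shocks = [rng.gauss(0.0, 1.0) for _ in range(50)]
a1 = 0.6  # hypothetical AR(1) parameter between -1 and +1

y = ar1_series(a1, shocks)
max_gap = max(abs(y[t] - ma_representation(a1, shocks, t)) for t in range(50))
print(max_gap)
```

Because the weights A1^j shrink geometrically when A1 is between -1 and +1, a handful of MA terms can approximate the AR(1) model well, which is the reasoning behind the model selection advice above.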
