Lecture 5: Estimation of time series

Outline of lesson 5 (chapter 4) (extended version of the book):

a.) Model formulation
    - Explorative analysis
    - Model formulation
b.) Model estimation
    - Identification of order
    - Estimation of parameters
c.) Model checking
    - Residual checking
    - (forecasting)

___________________________________________
C:\Kyrre\studier\drgrad\Kurs\Timeseries\lecture 05 010123.doc, KL, 26.09.02

a.) Model formulation

Explorative analysis

• Always plot the data
  - The best way to discover features of the data (e.g. non-stationarity) that you might want to take into consideration (variance change, trends, seasonality, normality).

    par(mfrow=c(1,1))
    ts.plot(airpass, main="International Airline Passengers (Airpass.dat)",
            xlab="Year", ylab="(thousands)", ylim=c(-100, 600))
    ts.points(airpass, pch=28, col=8)
    ts.lines(airpass.stl$seas, col=4)
    ts.lines(airpass.stl$rem, lty=1, col=3)
    legend(locator(1), legend=c("Data", "Seasonal effects", "remainder"),
           lty=1, col=c(1,4,3))

[Figure: International Airline Passengers (Airpass.dat); thousands of passengers vs. year (Jan 54 - Jan 64), with the data, the seasonal effects and the remainder plotted as separate lines.]

• Estimate the ACF and PACF (Tell you about trends. How?)

[Figure: Seasonal components of Airpass.dat, ln(thousands) vs. year, and the ACF of the series airpass.ln.stl$rem vs. lag.]

• Decide on transformations, trend removal etc.
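The sample ACF and PACF that S-Plus's acf function plots can be computed directly from the autocovariances, the PACF via the Durbin-Levinson recursion. A minimal, stdlib-only Python sketch of those estimators (the series x is any list of numbers, not the Airpass data):

```python
def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (biased autocovariance estimator)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n  # lag-0 autocovariance
    return [sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / n / c0
            for k in range(max_lag + 1)]

def pacf(x, max_lag):
    """Partial autocorrelations phi_kk for k = 1..max_lag (Durbin-Levinson)."""
    r = acf(x, max_lag)
    phi_prev, pac = [], []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        pac.append(phi_kk)
        # update the order-k AR coefficients from the order-(k-1) ones
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
    return pac
```

A slowly decaying ACF indicates a trend (neighbouring values stay correlated over many lags), which is one answer to the "How?" above.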
[Figure: Residual time series w/linear trend (Airpass.dat) and the detrended time series, Jan 54 - Jan 64, with the ACF and PACF of the series airpass.resid.]

Model formulation

-> We will return to this theme in more detail. In this context: what kind of lag structure is likely/possible in your data?

[Diagram: at each time step (t-2, t-1, t, t+1) a "disturbance" (weather etc.; ε_{t-1}, ε_t) acts on the population. Mature adults give rise to progeny; the surviving adults and the new adults give rise to new progeny.]

- How many time steps back is it reasonable that there is a biological relation? Generation time?
- For how long back can we expect external processes to have an influence? Remember that the previous generation is also influenced by disturbance, and that this disturbance is independent of the disturbance of the current generation. We are looking only for direct influences on "this" generation.

b.) Model estimation

Identification of order

-> Should be based on biology: what is your biological assessment of the most comprehensive model that you can believe?

Our formal tool to choose the appropriate model is AIC. To understand AIC, we need some background. (Try to understand the concepts.)

Remember: we wanted our data to be normally distributed
    ⇓
so that we can do Maximum Likelihood (ML) estimation.

Maximum Likelihood and AIC

If we know the distribution of the noise, we can write out the probability density function for the data.
We write: ε₁, …, ε_n iid f(x; θ)

(The noise terms are independent and identically distributed, with a density f() that depends on the data x and on some parameters θ.)

If so, we can find the most likely values of the parameters given these data. This is denoted L(θ; x) – the likelihood.

If we can assume normality, the likelihood is

    L = ∏_{t=1}^{T} (1 / (√(2π) σ)) exp(−ε_t² / (2σ²))

where the ε_t depend on the parameters θ.

We want to find a θ̂ (an estimate of the parameters) that is most likely given the data and the distribution.

As many distributions involve exponentials, the likelihood is logged, and for technical reasons the negative is usually taken. Thus we want to minimise −log L with respect to the parameters θ (by some numerical optimisation routine). This is the NLLH, the Negative Log Likelihood.

The AIC – Akaike's Information Criterion – is

    AIC = −2 loglik + 2 × (number of independent parameters; here p + q + 1)

and the corrected AIC is

    AICC = −2 loglik + 2(p + q + 1)n / (n − p − q − 2)

Practically: we estimate the parameters of different models, compare the AIC values of the models, and adopt the model with the lowest AIC value (a difference of 2 is significant).
2.) If MA terms are involved (alone or in addition to AR terms), iterative methods must be used to fit the parameters.

Concept, least squares: different values of the parameters are tried successively until the lowest deviation from the observed values is obtained (page 59).

In practice: maximum likelihood, optimised by some exact method in the software (Splus: arima.mle). AIC can then be used to choose the appropriate order of the process.

c.) Model checking

As in all other statistics, we check the residuals:

    residual = observation − fitted value

In time series, the fitted value is the one-step-ahead forecast (we will return to forecasting later). E.g., for an AR(1) model:

    ε̂_t = x_t − α̂₁ x_{t−1}

To check the residuals, the book recommends:
- Plot the residuals (in time)
- Calculate the correlogram
- (test)

When fitting models with arima.mle, all of this can be taken care of by arima.diag().
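For the AR(1) case the least-squares fit from section b has a closed form, and the one-step-ahead residuals follow directly from it. A stdlib-only Python sketch (with the mean handled explicitly; the input series stands in for real data):

```python
def fit_ar1(x):
    """Least-squares AR(1): minimise S = sum_t [(x_t - mu) - a(x_{t-1} - mu)]^2.
    With mu estimated by the sample mean, a is the regression slope of
    (x_t - mu) on (x_{t-1} - mu)."""
    n = len(x)
    mu = sum(x) / n
    num = sum((x[t] - mu) * (x[t - 1] - mu) for t in range(1, n))
    den = sum((x[t - 1] - mu) ** 2 for t in range(1, n))
    return mu, num / den

def ar1_residuals(x, mu, alpha):
    """One-step-ahead residuals: eps_t = (x_t - mu) - alpha * (x_{t-1} - mu)."""
    return [(x[t] - mu) - alpha * (x[t - 1] - mu) for t in range(1, len(x))]
```

These residuals are exactly what gets plotted and tested in the model-checking step.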
Here the test is Ljung and Box (1978).

    airpass.arima22 <- arima.mle(airpass.resid,
        model = list(order=c(2,0,2)), n.cond=6)
    aicc <- function(loglik, p, q, n) {
        a <- loglik + ((2 * (p + q + 1) * n)/(n - p - q - 2))
        return(a)
    }
    > airpass.arima22$aic
    [1] -531.36
    > aicc(airpass.arima22$loglik, p=2, q=2, n=144)
    [1] -528.93
    par(mfrow=c(1,2))
    arima.diag(airpass.arima22)

[Figure: ARIMA model diagnostics for airpass.resid, ARIMA(2,0,2) model with mean 0: the standardized residuals in time (1953-1965), the ACF and PACF of the residuals, and the p-values of the Ljung-Box chi-squared statistics by lag.]
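The Ljung-Box statistic that arima.diag() reports is easy to state: Q = n(n+2) Σ_{k=1..h} r_k²/(n−k), compared against a chi-squared distribution with h − (p+q) degrees of freedom. A rough stdlib-only Python sketch (the chi-squared tail probability is computed with a simple series expansion, and the residual series is assumed to come from elsewhere, not from the Airpass fit):

```python
import math

def chi2_sf(x, df):
    """Survival function of the chi-squared distribution, via the series
    expansion of the regularized lower incomplete gamma function P(s, z)."""
    s, z = df / 2.0, x / 2.0
    if z == 0.0:
        return 1.0
    if z > 700.0:          # far right tail: the e^{-z} factor underflows
        return 0.0
    term = math.exp(s * math.log(z) - z - math.lgamma(s + 1.0))
    total, k = term, 1
    while term > 1e-15 * total:
        term *= z / (s + k)  # next series term: z^k / ((s+1)...(s+k))
        total += term
        k += 1
    return max(0.0, 1.0 - total)

def ljung_box(resid, h, fitted=0):
    """Ljung-Box Q over lags 1..h; 'fitted' = p + q for residuals of an ARMA fit.
    Returns (Q, p-value); under H0 (white noise) Q ~ chi^2 with h - fitted df."""
    n = len(resid)
    mean = sum(resid) / n
    c0 = sum((v - mean) ** 2 for v in resid)
    q = 0.0
    for k in range(1, h + 1):
        r_k = sum((resid[t] - mean) * (resid[t - k] - mean)
                  for t in range(k, n)) / c0
        q += r_k * r_k / (n - k)
    q *= n * (n + 2)
    return q, chi2_sf(q, h - fitted)
```

A small p-value means the residuals are still autocorrelated, i.e. the model has not captured all the structure in the data.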