MODEL BUILDING IN NEURAL NETWORKS FOR TIME SERIES FORECASTING BY USING INFERENCE OF R² INCREMENTAL AND SIC CRITERION

Suhartono*, Subanar†, Suryo Guritno†

* PhD Student, Mathematics Department, Gadjah Mada University; Statistics Department, Sepuluh Nopember Institute of Technology
† Mathematics Department, Gadjah Mada University
suhartono@statistika.its.ac.id

ABSTRACT

The aim of this paper is to discuss and propose a procedure for model building in neural networks for time series forecasting. We focus on model selection strategies based on statistical concepts, particularly the inference of the R² incremental contribution and the SIC criterion. We employ this new procedure in a bottom-up, or forward, approach that starts with a simple neural network, and we use simulated data as a case study. The results show that statistical inference of the R² incremental contribution combined with the SIC criterion is an effective procedure for model selection in neural networks for time series forecasting.

Keywords: neural networks, model building, statistical inference, time series forecasting.

1 INTRODUCTION

In recent years, an impressive array of publications has appeared claiming considerable successes of neural networks (NN) in data analysis and engineering applications. The NN model is a prominent example of a flexible functional form. Its use in applied work is generally motivated by the mathematical result that, under mild regularity conditions, a relatively simple NN model is capable of approximating any Borel-measurable function to any given degree of accuracy (see e.g. [5, 6, 16]).

Time series forecasting has been an important application of NN from the very beginning. Lapedes and Farber [9] were among the first researchers to use an NN for time series forecasting; they explored the ability of a multilayer perceptron to forecast a nonlinear computer-generated signal such as the Mackey-Glass differential equation.

In applications, an NN contains a limited number of parameters (weights). How to find the best NN model, i.e. how to find the right combination of the number of input variables and the number of nodes in the hidden layer, is a central topic in the NN literature, discussed in many articles and books (see e.g. [1, 4, 12]).

In general, two procedures are used to find the best NN model (the optimal architecture): the "general-to-specific" or "top-down" procedure, and the "specific-to-general" or "bottom-up" procedure. The top-down procedure starts from a complex model and applies an algorithm that reduces the number of parameters by using some stopping criterion, whereas the bottom-up procedure works upward from a simple model. The first procedure is also known in the literature as "pruning" (see [11]) and corresponds to the "backward" method in statistical modeling. The second is also known as "constructive learning", of which one of the most popular variants is "cascade correlation" (see e.g. [2, 10]); it corresponds to the "forward" method in statistical modeling.

The aim of this paper is to discuss and propose a new forward procedure that combines the inference of the R² incremental contribution with the SIC (Schwarz Information Criterion). We emphasize the use of NN for time series forecasting.

2 FFNN OR FEEDFORWARD NEURAL NETWORKS

The feedforward neural network (FFNN) is the most popular NN model for time series forecasting applications. Figure 1 shows a typical three-layer FFNN used for forecasting purposes. The input nodes are the previous lagged observations, while the output provides the forecast for the future value. Hidden nodes with appropriate nonlinear transfer functions process the information received by the input nodes.

The FFNN model in Figure 1 can be written as

    y_t = \beta_0 + \sum_{j=1}^{q} \beta_j \, f\left( \sum_{i=1}^{p} \gamma_{ij} y_{t-i} + \gamma_{0j} \right) + \varepsilon_t ,    (1)

where p is the number of input nodes, q is the number of hidden nodes, and f is a sigmoid transfer function such as the logistic

    f(x) = \frac{1}{1 + e^{-x}} .    (2)

Here \{\beta_j : j = 0, 1, \ldots, q\} is the vector of weights from the hidden nodes to the output node and \{\gamma_{ij} : i = 0, 1, \ldots, p;\ j = 1, 2, \ldots, q\} are the weights from the input nodes to the hidden nodes. Note that equation (1) employs a linear transfer function in the output node.

Functionally, the FFNN in equation (1) is equivalent to a nonlinear AR model. This simple network structure has been shown to be capable of approximating arbitrary functions (see e.g. [5, 6, 16]). However, few practical guidelines exist for building an FFNN for a time series; in particular, specifying the FFNN architecture in terms of the number of input and hidden nodes is not an easy task.
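As a concrete reading of equations (1)-(2), the following sketch computes a one-step FFNN forecast. It is illustrative only: the function names, the weight layout (gamma stored as a p-by-q matrix), and the NumPy dependency are our assumptions, not part of the paper.

```python
import numpy as np

def logistic(x):
    # Logistic transfer function of equation (2): f(x) = 1 / (1 + e^{-x}).
    return 1.0 / (1.0 + np.exp(-x))

def ffnn_forecast(lags, beta0, beta, gamma0, gamma):
    """One-step forecast of the FFNN in equation (1).

    lags   -- array of length p with the inputs (y_{t-1}, ..., y_{t-p})
    beta0  -- output-node bias (beta_0)
    beta   -- length-q array of hidden-to-output weights (beta_j)
    gamma0 -- length-q array of hidden-node biases (gamma_{0j})
    gamma  -- (p, q) array of input-to-hidden weights (gamma_{ij})
    """
    hidden = logistic(lags @ gamma + gamma0)  # q hidden-node activations
    return beta0 + hidden @ beta              # linear output node
```

Because the output node is linear, the forecast is just a weighted sum of the q hidden activations plus a constant, which is what makes the R²-based cell-by-cell comparisons below straightforward.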
[Figure 1. Architecture of a neural network model with a single hidden layer]

Kaashoek and Van Dijk [7] introduced a "pruning" procedure that applies three methods to find the best FFNN model: the incremental contribution (R² incremental), principal component analysis, and graphical analysis. Swanson and White [14, 15], by contrast, applied a model selection criterion, SIC, in a "bottom-up" procedure that increases the number of hidden nodes and input variables until the best FFNN model is found.

2.1 Incremental Contribution through R²

Kaashoek and Van Dijk [7] stated that a natural candidate for quantifying network performance is the square of the correlation coefficient of y and \hat{y},

    R^2 = \frac{\left[\sum_t (y_t - \bar{y})(\hat{y}_t - \bar{\hat{y}})\right]^2}{\sum_t (y_t - \bar{y})^2 \sum_t (\hat{y}_t - \bar{\hat{y}})^2} ,    (3)

where \hat{y} is the vector of network outputs. The performance of the network with one hidden cell deleted can be measured in the same way. For instance, if the contribution of hidden cell h is put to zero (\beta_h = 0), the network produces an output \hat{y}_h with errors

    e_h = y - \hat{y}_h .    (4)

This reduced network can be measured by the square of the correlation coefficient R_h^2 between y and \hat{y}_h,

    R_h^2 = \frac{\left[\sum_t (y_t - \bar{y})(\hat{y}_{h,t} - \bar{\hat{y}}_h)\right]^2}{\sum_t (y_t - \bar{y})^2 \sum_t (\hat{y}_{h,t} - \bar{\hat{y}}_h)^2} .    (5)

The R² incremental contribution of hidden cell h is then given as

    R_{(h)}^2 = R^2 - R_h^2 .    (6)

The same procedure can be applied to reduce the number of input layer cells. In this case \hat{y}_i(t) is the network output, given the network parameter estimates, without input cell i: the contribution of input cell i is put to zero (\gamma_{ih} = 0 for h = 1, 2, \ldots, q), and the reduced network is quantified by the square of the correlation coefficient R_i^2 between y and \hat{y}_i,

    R_i^2 = \frac{\left[\sum_t (y_t - \bar{y})(\hat{y}_{i,t} - \bar{\hat{y}}_i)\right]^2}{\sum_t (y_t - \bar{y})^2 \sum_t (\hat{y}_{i,t} - \bar{\hat{y}}_i)^2} .    (7)

The R² incremental contribution of input cell i is measured as

    R_{(i)}^2 = R^2 - R_i^2 .    (8)

The relative size of the R² incremental contributions can be used to evaluate whether an input or hidden cell can be omitted [7].
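The sketch below, reusing ffnn_forecast from the earlier sketch, computes the R² of equation (3) and the incremental contribution R²_(h) of equation (6) for every hidden cell. Deleting a cell is simulated by zeroing its β_h in the fitted weights with no re-estimation, as in [7]; the function names and the (n, p) data layout are again our own illustrative choices.

```python
import numpy as np

def r_squared(y, yhat):
    # Squared correlation coefficient of y and yhat, equation (3).
    return np.corrcoef(y, yhat)[0, 1] ** 2

def hidden_cell_contributions(y, X, beta0, beta, gamma0, gamma):
    """R^2 incremental contribution R^2_(h) of each hidden cell, eqs (5)-(6).

    y -- length-n array of targets; X -- (n, p) array of lagged inputs.
    """
    yhat = np.array([ffnn_forecast(x, beta0, beta, gamma0, gamma) for x in X])
    r2_full = r_squared(y, yhat)
    contrib = np.empty(len(beta))
    for h in range(len(beta)):
        beta_h = beta.copy()
        beta_h[h] = 0.0  # put the contribution of hidden cell h to zero
        yhat_h = np.array([ffnn_forecast(x, beta0, beta_h, gamma0, gamma)
                           for x in X])
        contrib[h] = r2_full - r_squared(y, yhat_h)  # R^2_(h) = R^2 - R^2_h
    return contrib
```

The same loop applied to columns of gamma (zeroing \gamma_{ih} for all h) would give the input-cell contributions of equations (7)-(8).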
2.2 Statistical Inference of the R² Incremental Contribution

In this paper we propose a new forward procedure based on statistical inference of the R² incremental contribution. The approach involves three basic steps, which we describe in turn.

(i) Simple or reduced model. We begin with the simple model considered appropriate for the data, called in this context the reduced or restricted model. We first evaluate the contribution of the hidden cells. In the simplest case, the reduced model is a linear model, i.e. an NN model without a hidden layer:

    y_t = \beta_0 + \sum_{j=1}^{p} \beta_j y_{t-j} + \varepsilon_t .    (Reduced model) (9)

We fit this reduced model and obtain its error sum of squares, denoted SSE(R).

(ii) Complex or full model. Next we consider the complex or full model, i.e. the NN model of equation (1). We start by fitting the NN model with a single hidden cell, q = 1. The error sum of squares of this full model is denoted SSE(F). Here, with \hat{y}_t the full-model fit, we have

    SSE(F) = \sum_t (y_t - \hat{y}_t)^2 .    (Full model) (10)

(iii) Test statistic. Kutner, Nachtsheim and Neter [8] state that when a large-sample test concerning several parameters simultaneously (here the \beta_j and \gamma_{ij} in equation (1)) is desired, we can use the same approach as in the general linear test. First we fit the reduced model and obtain SSE(R), then we fit the full model and obtain SSE(F), and finally we calculate the test statistic

    F^* = \frac{[SSE(R) - SSE(F)] / (df_R - df_F)}{SSE(F) / df_F} .    (11)

For large n, this test statistic is distributed approximately as F(v_1 = df_R - df_F, v_2 = df_F) when H_0 holds, i.e. when the additional parameters in the full model all equal zero. Gujarati [3] showed that equation (11) can be written in terms of the R² incremental contribution as

    F^* = \frac{(R_F^2 - R_R^2) / (df_R - df_F)}{(1 - R_F^2) / df_F}    (12.a)

or

    F^* = \frac{R_{Incremental}^2 / (df_R - df_F)}{(1 - R_F^2) / df_F} .    (12.b)

We repeat steps (i)-(iii) until the optimal number of hidden cells is found. The forward procedure then continues to find the optimal input cells, starting with the input that has the largest R². In this paper, we combine this test statistic with the SIC criterion for determining the optimal cells.
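As an illustration of steps (i)-(iii), the sketch below compares fitted reduced and full models through the F* statistic of equation (11) and reports an SIC value for each candidate. It is a minimal sketch under our own assumptions: SciPy supplies the F distribution, the degrees of freedom are taken as n minus the number of estimated parameters, and the SIC formula used, log(SSE/n) + k log(n)/n, is one common form rather than necessarily the exact one in [14, 15].

```python
import numpy as np
from scipy.stats import f as f_dist

def incremental_f_test(y, yhat_reduced, yhat_full, k_reduced, k_full):
    """F* of equation (11) for nested models with k_* estimated parameters."""
    n = len(y)
    sse_r = np.sum((y - yhat_reduced) ** 2)  # SSE(R) of the reduced model (9)
    sse_f = np.sum((y - yhat_full) ** 2)     # SSE(F) of the full model (10)
    df_r, df_f = n - k_reduced, n - k_full
    f_star = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
    # Approximate large-sample p-value under H0: extra parameters all zero.
    p_value = f_dist.sf(f_star, df_r - df_f, df_f)
    return f_star, p_value

def sic(y, yhat, k):
    """Schwarz Information Criterion, smaller is better (one common form)."""
    n = len(y)
    return np.log(np.sum((y - yhat) ** 2) / n) + k * np.log(n) / n
```

In the forward procedure, a candidate hidden or input cell would be added when F* is significant, with SIC then arbitrating between competing architectures.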
3 RESEARCH METHODOLOGY

The proposed forward procedure is implemented on simulated data. The simulation experiment is carried out to show how the proposed NN modeling procedure works. The data are generated from an ESTAR (Exponential Smooth Transition Autoregressive) model,

    y_t = 6.5\, y_{t-1} \exp(-0.25\, y_{t-1}^2) + u_t ,    (13)

where u_t ~ nid(0, 0.5²). Time series and lag plots of the simulated data are shown in Figure 2; the data clearly follow a nonlinear autoregressive pattern at lag 1.

[Figure 2. Time series and lag (y_{t-1} and y_{t-2}) plots of the simulated data]

4 EMPIRICAL RESULTS

In this section the empirical results of the proposed forward procedure are presented and discussed.

4.1 Hidden unit selection

First, we apply the proposed forward procedure, starting with an FFNN with six input variables (y_{t-1}, y_{t-2}, ..., y_{t-6}) and one constant input, to find the optimal number of hidden layer cells. The results of the optimization steps are reported in Table 1.

[Table 1. Results of the optimal hidden cell determination by the forward procedure]

Table 1 shows that two hidden layer cells are optimal and that further optimization runs are not needed. The network outputs obtained by adding one hidden cell at a time are presented in Figure 3. We then continue the optimization to find the optimal input cells.

[Figure 3. Network output after adding one hidden layer cell, compared with the actual data]

4.2 Input unit selection

The results of the optimization steps for determining the optimal input cells are reported in Table 2. They show that input 1, i.e. y_{t-1}, is the optimal input cell of the network. Hence, the forward procedure yields as the optimal network an FFNN with one input cell and two hidden layer cells, FFNN(1,2).

[Table 2. Results of the optimal input determination by the forward procedure]

In general, the results of this simulation study show that the optimal FFNN architecture yielded by this forward procedure is similar to that in Subanar and Suhartono [13].

5 CONCLUSION

Based on the results of the previous section, we conclude that the forward procedure combining inference of the R² incremental contribution with the SIC criterion is an effective and efficient procedure for determining the best NN model for time series forecasting. The results also show that the proposed forward procedure gives FFNN modeling an additional advantage: the model building process is not a black box.

REFERENCES

[1] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford: Clarendon Press, 1995.
[2] S. E. Fahlman and C. Lebiere, The Cascade-Correlation Learning Architecture, in D. S. Touretzky (ed.), Advances in Neural Information Processing Systems 2, Los Altos, CA: Morgan Kaufmann, 1990, pp. 524-532.
[3] D. N. Gujarati, Basic Econometrics, 5th edition, McGraw Hill International, New York, 1996.
[4] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd edition, Prentice-Hall, 1999.
[5] K. Hornik, M. Stinchcombe and H. White, Multilayer feedforward networks are universal approximators, Neural Networks, 2, 1989, pp. 359-366.
[6] K. Hornik, M. Stinchcombe and H. White, Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks, Neural Networks, 3, 1990, pp. 551-560.
[7] J. F. Kaashoek and H. K. Van Dijk, Neural Network Pruning Applied to Real Exchange Rate Analysis, Journal of Forecasting, 21, 2002, pp. 559-577.
[8] M. H. Kutner, C. J. Nachtsheim and J. Neter, Applied Linear Regression Models, McGraw Hill International, New York, 2004.
[9] A. Lapedes and R. Farber, Nonlinear Signal Processing Using Neural Networks: Prediction and System Modelling, Technical Report LA-UR-87-2662, Los Alamos National Laboratory, Los Alamos, NM, 1987.
[10] L. Prechelt, Investigation of the CasCor Family of Learning Algorithms, Neural Networks, 10, 1997, pp. 885-896.
[11] R. Reed, Pruning algorithms - A survey, IEEE Transactions on Neural Networks, 4, 1993, pp. 740-747.
[12] B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, 1996.
[13] Subanar and Suhartono, Model Selection in Neural Networks by Using Inference of R² Incremental and Principal Component Analysis for Time Series Forecasting, presented at the 2nd IMT-GT Regional Conference on Mathematics, Statistics and Their Applications, Universiti Sains Malaysia, 2006.
[14] N. R. Swanson and H. White, A model-selection approach to assessing the information in the term structure using linear models and artificial neural networks, Journal of Business and Economic Statistics, 13, 1995, pp. 265-275.
[15] N. R. Swanson and H. White, A model-selection approach to real-time macroeconomic forecasting using linear models and artificial neural networks, Review of Economics and Statistics, 79, 1997, pp. 540-550.
[16] H. White, Connectionist nonparametric regression: multilayer feedforward networks can learn arbitrary mappings, Neural Networks, 3, 1990, pp. 535-550.