A study of techniques for mining data from the Australian Stock Exchange

Mark B Barnes, Russell J Rimmer and Kai Ming Ting¹
School of Computing and Mathematics, Deakin University, Clayton, Victoria 3168, Australia

ABSTRACT
Next-day predictions of the All Ordinaries Index of the Australian Stock Exchange are generated using lagged values of the index. Three approaches are compared: exponential moving average, linear regression and neural network. The first predicts 'tomorrow' best when little account is taken of previous index values other than the value 'yesterday'. Sliding windows are used to do one-day-ahead prediction for the other approaches. In two periods when the All Ordinaries Index behaves differently, neural networks did not perform as well as the other techniques. Performance, as measured by mean absolute percentage error, depends on which day is being predicted. It appears that linear regression and exponential moving averages better account for relationships between index values on successive days.

1. INTRODUCTION
Technical analysts aim to predict values of financial instruments using only previous values or prices of the instrument. Proponents of technical systems 'know' they work and may rely on little else. Seasoned brokers and fundamentals analysts look at other information, but are often mindful of technical forecasts. Considerable effort is expended to predict changes in the All Ordinaries Index (All Ords) of the Australian Stock Exchange (ASX). Three approaches are investigated in this paper: exponential moving averages, linear regression and neural networks.

Neural networks can capture nonlinear relationships in financial markets. In [9] lagged values of the Kuala Lumpur Stock Exchange Index are augmented with values derived in technical analysis to identify important features. The researchers trained on data for 1984 to 1988 and forecast after 1991, having validated using intervening values. Predictions are nearly three years beyond the training period. In [5] neural-network forecasting of the Hang Seng index is augmented with autocorrelation analysis to decide the number of lagged values of the inputs. Training was on 500 weeks of data. Predictions were formed for the following 20 through to 50 weeks.

Researchers distinguish short- and long-term prediction. Short term is taken to mean one step ahead. For the Hang Seng analysis one step is a week. (In one exchange-rate study [8] the short term is one hour.) The approach below is short-term technical. That is, next-day predictions of the All Ords closing values are generated using only lagged values of the index. In Section II the All Ords is reviewed briefly. Section III contains a discussion of the three approaches to technical analysis. Short-term prediction results for two periods are reported in Section IV. Further research is discussed in Section V.

2. AUSTRALIA'S ALL ORDINARIES INDEX
All Ords closing values are investigated for trading days in 1986-8 and 1994-7. Over these years the All Ords was a weighted average of more than 250 share prices. It accounted for about 95 percent of total market capitalisation of listed entities. It is a widely cited indicator of activity on the ASX. The Sydney Futures Exchange hosts very active trading in contracts based on the All Ords.

Generally ASX activity was buoyant over 1994-7. The All Ordinaries grew from 1,846 to 2,881. There was increased activity in initial public offerings. Many fundamentals were favourable. Among them were five reductions in the official cash rate and the election of a Commonwealth government committed to budgetary restraint. The period 1986-8 encompassed a severe downturn. In many weeks during 1986-8 the All Ords closing value fell each day from Monday to Friday. Over October 1987 the All Ords lost approximately 40 percent of its value.

¹ The authors would like to thank Mr. Charles Arcovitch of Unica Technologies, Inc. for providing Pattern Recognition Workbench (PRW).
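Before turning to the techniques, the autocorrelation analysis used in [5] to choose the number of lagged inputs can be sketched as follows. This is an illustrative reconstruction only, not code from that study; the series and the threshold of "positive autocorrelation" are invented for the example.

```python
def autocorr(x, lag):
    """Sample autocorrelation of the series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return cov / var

# Hypothetical closing values; inspect autocorrelations at short lags
# to decide how many lagged inputs to feed the forecaster.
series = [100.0, 101.5, 103.0, 102.0, 104.5, 106.0, 105.0, 107.5, 109.0, 108.0]
for lag in range(1, 4):
    print(lag, round(autocorr(series, lag), 3))
```

In practice one would retain only those lags whose autocorrelation is clearly non-negligible; a trending index such as the one above shows strong low-order autocorrelation.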
3. THE PREDICTION TECHNIQUES

3.1 Exponential moving average (EMA)
This is the convex combination of the actual closing value and the prediction for today. That is,

    E_t = αX_{t-1} + (1 − α)E_{t-1},   0 ≤ α ≤ 1,

where E_t is tomorrow's estimated value, X_{t-1} is the actual closing value today and E_{t-1} is the prediction for today. E_0 was taken to be X_0. In this approach the prediction for tomorrow (time t) lies in the interval between the actual and predicted values today (time t − 1). This technique will therefore be in error whenever tomorrow's index is outside this interval. If α = 1 then no account is taken of earlier predictions; or to put this another way, the index tomorrow is best predicted using today's value. When α = 0, the actual value is of no importance: for each day the prediction is unchanged. EMA involves all of the available data when 0 < α < 1. However, as t advances, data from the 'distant' past have diminishing effect. In the other techniques a fixed number of earlier values of X_t are used.

3.2 Neural network
Backpropagation neural networks are capable of producing smooth nonlinear mappings between input and output variables. A simple neural network consists of three layers: input, output and hidden. Each layer contains one or more nodes. The numbers of nodes in the input and output layers are the same as the numbers of input and output variables in the problem under consideration.

A sliding window [4] is used to do one-step-ahead prediction. Data X_0, X_1, X_2, … are treated as a continuous stream, with training size, test size and slide size pre-defined. A window consists of a training set and a test set, and is obtained by sliding across the stream of data by the slide size. This process is repeated until the data is exhausted. We are interested in predicting the index value 'tomorrow' given the value 'today'. In this case the test and slide sizes are one. On day t − 1 (call it 'today') s pairs

    (X_{t−s−1}, X_{t−s}), (X_{t−s}, X_{t−s+1}), …, (X_{t−3}, X_{t−2}), (X_{t−2}, X_{t−1})

of the form (value one day, next day's value) are used to train a backpropagation neural network in Pattern Recognition Workbench (PRW) [4]. The network consists of one input layer with one node for the closing value 'today', X_{t−1}; a single hidden layer; and one output node for the closing value 'tomorrow', X_t.

For our data the PRW software suggests that 6 hidden nodes be used. The PRW documentation notes that the number of nodes in the hidden layer should be smaller than the square root of the number of training patterns. The Baum-Haussler rule suggests that the hidden layer in our configuration could have as few as two nodes [5]. One, two, three, six, seven and eight hidden nodes were tried for both 1986-8 and 1994-7.

Training sets of size s = 5, 10, 20, 65, 130 and 260 were tried. When selecting training sets, the final 450 observations in each period were reserved as test cases. To do this, initial training sets of size s were drawn as the s observations immediately before the first of the 450 test items. Thus for all training sizes, the rth prediction concerns the same day among the 450, for r = 1, 2, …, 450.

3.3 Linear regression
PRW also gives a linear forecasting process. To predict tomorrow's value, E_t, the same s pairs

    (X_{t−s−1}, X_{t−s}), (X_{t−s}, X_{t−s+1}), …, (X_{t−3}, X_{t−2}), (X_{t−2}, X_{t−1})

are first used to estimate the coefficients â and b̂ in X̂_j = â + b̂X_{j−1}. Then set X_{j−1} = X_{t−1} (the closing value today) and compute E_t = â + b̂X_{t−1}. An obvious misspecification is that the actual model is

    X_t = a + b_1 X_{t−1} + b_2 X_{t−2} + … + b_m X_{t−m} + u_t.

This was tried with our data for m = 2, 3, …, 10. However, predictions one day ahead were not improved. Our finding for EMA in Section IV supports this dependence on only one lagged value.

3.4 Comparing predictions
Prediction accuracy is measured with a single indicator, the mean absolute percentage error [6],

    MAPE = (100/n) Σ_{t=1}^{n} |X_t − E_t| / X_t,

where as before X_t is the true value and E_t is the predicted value at time t. Changing patterns in the data are assessed with the mean absolute difference between actual closing values on successive days,

    MAD = (1/n) Σ_{t=1}^{n} |X_t − X_{t−1}|.

MAPE and MAD are averages over n test sets. Specification of n requires care. For example, we report the mean of the absolute percentage errors in predicting the closing value each Monday. In the 450 items reserved as test data there are 90 Monday closing values. Hence n = 90. There are 90 Tuesday closing values also. So, like MAPE, MAD is a mean over 90 items.

4. RESULTS
Predictions of the All Ords with the three techniques are compared. Data for 1994-7 are examined first (Table 1 and Figure 1). Only the smallest MAPEs are given. For the regression the smallest error arose when s = 260. Minimum errors for the neural network consistently arose when s = 130. The number of hidden nodes producing the minima is shown in brackets in Table 1. The learning rate and momentum were set to 0.1. Sensitivity to reduction of these is reported below.

The EMA approach minimised MAPE when E_t = 0.99X_{t−1} + 0.01E_{t−1} ≈ X_{t−1}. That is, EMA performed best when tomorrow's expected value is derived nearly exclusively from today's value. On each day the MAPE is greatest for the neural net and (except for Monday) least for regression. Each approach returned the smallest MAPE when predicting the closing value on Thursdays. This is apparent in Figure 1. At worst (predicting Thursday's close) the neural-net MAPE is greater than the regression error by 11.5 percent. At best (predicting the Monday closing value) the neural-net MAPE is greater by 5.9 percent. Other research finds in favour of neural nets [2; 8].
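The two non-neural predictors of Section 3 can be sketched in a few lines. This is our illustrative reconstruction, not the PRW implementation: the EMA recursion E_t = αX_{t-1} + (1 − α)E_{t-1} with E_0 = X_0, and a rolling regression that refits X_j = a + bX_{j-1} by least squares on the s most recent pairs before each one-day-ahead prediction.

```python
def ema_predictions(x, alpha):
    """E_0 = X_0; thereafter E_t = alpha*X_{t-1} + (1-alpha)*E_{t-1}."""
    preds = [x[0]]
    for t in range(1, len(x)):
        preds.append(alpha * x[t - 1] + (1 - alpha) * preds[t - 1])
    return preds

def rolling_ols_prediction(x, t, s):
    """Fit X_j = a + b*X_{j-1} on the s pairs ending at (X_{t-2}, X_{t-1}),
    then predict E_t = a + b*X_{t-1}."""
    pairs = [(x[j - 1], x[j]) for j in range(t - s, t)]
    n = len(pairs)
    sx = sum(p for p, _ in pairs)
    sy = sum(q for _, q in pairs)
    sxx = sum(p * p for p, _ in pairs)
    sxy = sum(p * q for p, q in pairs)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a + b * x[t - 1]
```

With α near 1 the EMA prediction collapses to yesterday's close, which is exactly the minimising configuration reported above; the rolling regression re-estimates (a, b) at every step, so each day's prediction uses only the latest window of s pairs.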
Table 1 MADs and MAPEs for the All Ordinaries Index, 1994-7.

    Successive days        MAD      Neural net (m hidden    Linear regression   Exp'l moving
                                    nodes) (s = 130)        (s = 260)           average (α = 0.99)
    Friday & Monday        11.315   0.560 (2)               0.529               0.513
    Monday & Tuesday       11.388   0.528 (3)               0.496               0.513
    Tuesday & Wednesday    11.354   0.541 (8)               0.502               0.519
    Wednesday & Thursday    9.967   0.466 (6)               0.418               0.457
    Thursday & Friday      12.199   0.573 (7)               0.525               0.559

[Figure 1 MAPEs for the prediction of the All Ordinaries, 1994-7. Lines: NN min, Linear Regression, EMA; MAPE on the vertical axis (0.40-0.60) against day of week, Monday-Friday.]

The EMA returns a constant error for Monday and Tuesday, while the MAPEs for the other methods decrease over the two days. All three MAPEs decrease from Wednesday to Thursday and increase again on Friday. What explains this uniformity among the percentage errors in prediction? Consider Thursday. Recall that after training the neural net or estimating the regression, Thursday's closing value is predicted from Wednesday's value. Note in Table 1 that the MAD is minimised for Wednesday and Thursday. Index values on these days are usually clustered around the median value in any week [3]. That is, on these days the maximum or minimum weekly values are relatively unlikely compared with other pairs of successive trading days. On Wednesday the median value for the week is most likely, while the second-highest or second-lowest weekly values are most likely on Thursday. With six hidden nodes the neural net predicted about as well as EMA on Thursday.

Now consider Friday. The MAPEs increase from Thursday to Friday. MAD is greatest on these days. While closing values on Thursday tend not to fall at the weekly extremes, on Friday the weekly maximum or minimum is likely [3]. This may be one reason prediction on Friday seems difficult. The neural network does best with 7 hidden nodes.

The MAD for Friday and Monday is 7.8 per cent less than for Thursday and Friday. Yet the MAPE on Monday with the neural network is down by only 2.3 per cent. For EMA the error declined 8.2 per cent, close to the reduction in the MAD. As for Friday, weekly highs and lows are more likely on Monday. When α = 0.99, EMA effectively predicts Monday using only Friday's close. If a high (low) on Friday is generally followed by a high (low) on Monday, EMA will capture that. Whatever the pattern, with two nodes the neural network does not predict as well. (Nor does it do better on Monday with more hidden nodes and/or reductions in learning rate and momentum.)

Between Monday and Tuesday the MAD increases marginally. On Tuesday the commonest closing value is the second lowest for the week, followed by the minimum [3]. Regression and the neural network have improved prediction success compared with Monday. From generally low weekly values on Tuesday, Wednesday is characterised by values most commonly at the weekly median. Further, values either side of the median are less common, but about equally likely [3]. The MAPEs increased from the Tuesday errors, even though there is little change (and that a reduction) in the MAD. Reductions in learning rate and momentum did not markedly improve the neural-network predictions compared with linear regression and EMA.

Are the predictive outcomes replicated for 1986-8? Now EMA predicts best. MAPEs are again largest for the neural network. As in the earlier table and figure, neural-net results in Table 2 and Figure 2 are for learning rate and momentum set at 0.1. The largest MAPE in Table 1 is 0.573. In Table 2 it is 1.407, more than twice as large. The smallest MAPE in Table 2 is 0.715. This exceeds the largest error in Table 1. Given the rapidity and magnitudes of changes in the All Ords, it is probably not surprising that the errors are much greater than during the more ebullient times of 1994-7.

Now compare Figures 1 and 2. Over 1994-7 MAPEs are: (i) generally greater on Monday and Friday; and (ii) lowest on Thursday. But for 1986-8 MAPEs are: (i) low on Monday, Wednesday and Friday; and (ii) highest on Tuesday. Prediction success on Thursday for 1994-7 appears to be associated with features of the closing-value distributions on Wednesday and Thursday that are absent in the earlier data. Rather, in 1986-8 the pairs of distributions on Friday and Monday, Tuesday and Wednesday, and Thursday and Friday have common features. This further contrasts with 1994-7, where the first and third pairs of distributions differ in important respects. Greater detail for 1986-8 is available in [3].

5. CONCLUSION
Predictions one day ahead for the All Ords index of the ASX were obtained using a neural network, linear regression and exponential moving averages for the years 1986-8 and 1994-7. Broadly it seems the neural net responded to patterns in the closing values, but not as sensitively as either linear regression or EMA. Nevertheless, the neural-network percentage errors approximately tracked the profiles of the errors for the other techniques. This was accomplished by varying the number of nodes in a single hidden layer. The number of hidden nodes in 1986-8 was different on each day. For 1994-7 the number of hidden nodes coincided on only Monday and Thursday. Further, the number of hidden nodes for any day in 1986-8 differs from the number required on the same weekday in 1994-7. The networks were also fine-tuned with learning rate and momentum values ranging from 0.1 down to 0.01. In addition, inputs were pre-processed [1]. For none of these did the neural net outperform EMA in 1986-8 or linear regression in 1994-7 on the MAPE criterion.
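For concreteness, the MAPE and MAD indicators of Section 3.4, and the kind of rank-order (Spearman) correlation used to compare them, might be computed as follows. This is an illustrative sketch, not the code used in the study; the `spearman` helper assumes no tied ranks.

```python
def mape(actual, predicted):
    """Mean absolute percentage error over n test items, in per cent."""
    n = len(actual)
    return 100.0 / n * sum(abs(a - p) / a for a, p in zip(actual, predicted))

def mad(today, yesterday):
    """Mean absolute difference between closing values on successive days."""
    n = len(today)
    return sum(abs(a, ) if False else abs(a - b) for a, b in zip(today, yesterday)) / n

def spearman(u, v):
    """Spearman rank-order correlation (assumes no tied values)."""
    def ranks(w):
        order = sorted(range(len(w)), key=lambda i: w[i])
        r = [0] * len(w)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r
    ru, rv = ranks(u), ranks(v)
    n = len(u)
    d2 = sum((a - b) ** 2 for a, b in zip(ru, rv))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

Feeding the five per-day MAPEs and the five MADs from either table to `spearman` reproduces the style of rank-order comparison discussed below; a value near 1 says the days that are hardest to predict are also the days with the largest successive-day movements.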
In 1986-8 the ranked MAPEs for regression and EMA are highly correlated with the ranked MADs. Thus, as MAD increases there is always an increase in the MAPEs. For the neural net the rank-order correlation is 0.8. In 1994-7 the corresponding correlations are positive, but smaller. Moreover, the largest rank-order correlation is between MAD and the MAPE for the neural net. Thus what drives the MAPEs is not completely captured by the MADs. Therefore a direction for further inquiry is how robust our results are when the indicator of prediction success is changed from MAPE and the indicator of difference on successive days is no longer MAD. Another strategy is to employ multiple indicators of predictive success and of data patterns on successive days.

In the tradition of technical analysis, we have used as inputs for each technique only lagged values of the series we forecast. Some researchers derive additional time series from the index and use them as inputs also [9]. Others, including financial economists and many stock-market agents, use evidence from economic and company accounts, other financial markets and social, political and international news. These are avenues we will explore. There are precedents in neural science [8]. At another level, suppose that the data for the period 1986 to 1997 were amalgamated. Our results suggest that short-term prediction should be handled differently in the two sub-periods. This might be associated in part with different expectations and psychological outlooks in the two time intervals. Further research is required to investigate whether in longer time series these 'animal spirits' might improve forecasting accuracy. To an extent the judicious use of technical series derived from the original data may serve this purpose. Alternately, additional inputs might be sought from among the many surveys of business and investor confidence.

Table 2 MADs and MAPEs for the All Ordinaries, 1986-8.

    Successive days        MAD      Neural net (m hidden    Linear regression   Exp'l moving
                                    nodes) (s = 130)        (s = 260)           average (α = 0.99)
    Friday & Monday        11.638   1.176 (7)               0.921               0.794
    Monday & Tuesday       14.514   1.407 (6)               1.176               0.986
    Tuesday & Wednesday    11.065   1.051 (3)               0.822               0.755
    Wednesday & Thursday   12.456   1.147 (7)               0.937               0.866
    Thursday & Friday      10.453   1.057 (2)               0.786               0.715

[Figure 2 MAPEs for the All Ordinaries Index, 1986-8. Lines: NN min, Linear Regression, EMA; MAPE on the vertical axis (0.70-1.50) against day of week, Monday-Friday.]

6. REFERENCES
[1] E. Azoff, Neural Network Time Series Forecasting of Financial Markets, Chichester: John Wiley, 1994.
[2] S. Avouyi-Dovi and R. Caulet, "Using artificial neural networks to forecast prices of financial assets", in A. Refenes, Y. Abu-Mostafa, J. Moody and A. Weigend (eds), Neural Networks in Financial Engineering: Proceedings of the Third International Conference on Neural Networks in the Capital Markets, Singapore: World Scientific, 1996.
[3] M. Barnes, R. Rimmer and K. Ting, "Data patterns and estimating the All Ordinaries Index", Melbourne: Deakin University, 1999.
[4] R. Kennedy, Y. Lee, B. Roy, C. Reed and R. Lippman, Solving Data Mining Problems through Pattern Recognition, Upper Saddle River: Prentice Hall, 1998.
[5] F. Lin, H. Xing, S. Gregor and R. Irons, "Time series forecasting with neural networks", Complexity International, Vol. 2, 1995, pp. 1-12.
[6] S. Makridakis, S. Wheelwright and V. McGee, Forecasting: Methods and Applications, Chichester: John Wiley, 1983.
[7] T. Mills, The Econometric Modelling of Financial Time Series, Cambridge: Cambridge University Press, 1993.
[8] A. Refenes and M. Azema-Barac, "Neural network applications in financial asset management", Neural Computing and Applications, Vol. 2, 1994, pp. 13-39.
[9] J. Yao and H. Poh, "Equity forecasting: a case study on the KLSE index", in A. Refenes, Y. Abu-Mostafa, J. Moody and A. Weigend (eds), Neural Networks in Financial Engineering: Proceedings of the Third International Conference on Neural Networks in the Capital Markets, Singapore: World Scientific, 1996.