2009 - Stock Price Forecasting Using Exogenous Time Series and Combined Neural Networks

					          Stock Price Forecasting Using Exogenous Time Series and
                         Combined Neural Networks
                              Manoel C. Amorim Neto, Victor M. O. Alves, Gustavo Tavares,
                             Lenildo Aragão Junior, George D. C. Cavalcanti and Tsang Ing Ren

   Manoel C. Amorim Neto, Gustavo Tavares and Victor M. O. Alves are with the Facilit Technology Company, Brazil, {manoel,gustavotavares,victor}@facilit.com.br. Site: www.aistocktrend.com
   George D. C. Cavalcanti and Tsang Ing Ren are with the Center of Informatics, Federal University of Pernambuco, Brazil, {gdcc,tir}@cin.ufpe.br. Site: www.cin.ufpe.br/~viisar

   Abstract: Time series forecasting is useful in many research areas. Models that provide reliable predictions of financial time series may bring valuable profits to investors. An intelligent agent can be built from a suitable prediction model to trade in the stock market daily. Furthermore, even an investor who is cautious about letting an automatic agent trade can use the prediction model as valuable decision support. A methodology based on information obtained from exogenous series was used in combination with a neural network to predict stock series. Exogenous series were selected by analyzing the correlation between each candidate series and the stock series used. In this way, the prediction was obtained not just from the previous values of the series but also from information external to the main series. Additionally, the best trained neural networks were combined to improve on the prediction capacity of single networks. To evaluate the proposed prediction models, some known metrics were used plus a proposed one, Prediction in Direction and Accuracy (PDA), which determines whether a model has both high accuracy and correct trend prediction. With this novel metric, we have used an evolutionary algorithm to choose the best trained models in order to obtain better results. Experiments with the stock quotes of two of the most important Brazilian companies have shown the usefulness of the proposed prediction system to generate profits in investments.

                         I. INTRODUCTION

   Time series are sets of variables observed over a defined period of time. These observations may be discrete or continuous and they are taken at equal time intervals [1].
   There are many research areas involving time series analysis, such as economics, physics, engineering, social sciences, computing, biology, medicine, meteorology and others.
   Perhaps the most common application of time series analysis is prediction. The prediction can be made using past observations of the series that will be forecast or even other time series. These other series, used to predict the main one, are known as Exogenous Time Series.
   There are two types of models in time series prediction: linear and non-linear. A known linear method is ARIMA, proposed by Box and Jenkins [2]. Some examples of non-linear models are: bilinear, exponential autoregressive, threshold autoregressive, smooth transition autoregressive, autoregressive with time dependent coefficients [3], autoregressive conditional heteroscedasticity (ARCH) and generalized autoregressive conditional heteroscedasticity (GARCH) [1], among other models.
   Artificial neural networks (ANN) have been successfully used for time series prediction in recent years, because of some interesting features such as universality in function approximation, robustness and fault tolerance [4]. For these reasons, neural networks are considered useful to build models for prediction of non-stationary time series [4].
   Furthermore, ANNs handle noisy data well and are able to predict nonlinear systems, which is the type of system that we are interested in predicting: the stock market. Among the various ANN models, the most used in the literature is the multilayer perceptron (MLP) [5]. Radial basis function (RBF), wavelet-based and recurrent neural networks have also been applied with success [6].
   The stock market is a complex system composed of many investors selling and buying financial products in the form of securities. Here, we are interested in the prediction of the stocks of the biggest Brazilian oil company, Petrobras, and one of the biggest mining companies in the world, Vale do Rio Doce. The Petrobras stock is named PETR4 and the Vale do Rio Doce stock is named VALE5. These time series were analyzed between the years of 2003 and 2009.
   In this paper, a comparison between two ANN models, namely MLP and RBF networks, both with and without exogenous time series, is presented. Additionally, we propose a novel performance metric to select the best trained models, which aims to maximize trend prediction and accuracy. The proposed metric was used to select the best trained networks to be combined in a combination machine.
   This paper is organized as follows. Section II briefly describes the stock market and the exogenous time series used. Section III presents the performance metrics that were used and the novel metric introduced. Section IV presents the proposed methods for combining neural networks in a combination machine. Section V describes the experiments and the results obtained. Finally, Section VI presents the conclusions and final remarks.

          II. THE STOCK MARKET AND EXOGENOUS TIME SERIES

   The main function of the capital market is the trade of stocks with the purpose of financing development, which in turn produces and nourishes the market itself. In this way, a third function is attributed: the market of its own sources of income [7]. The monetary market, as a whole, is important for economic development. However, when the economy and the market develop, the market of the source of capital
emerges, which comprises the stock market, debt securities and the real estate market.
   Globalization is a trend that allows an intense interchange between countries. Consequently, it is common nowadays for the stock market of an emerging country like Brazil to attain increasing importance on the international scene. Today the stock market is not only an important source of corporate finance but also an individual capitalization resource. When investing in a portfolio, the investor wishes to obtain a large return in order to compensate for the associated risks; in other words, the objective is to minimize risk and maximize capital returns. Hence, a prediction method is most useful, and a neural network is well suited to this kind of optimization procedure.
   Currently, the Brazilian stock market, which is also known in the World Federation of Exchanges (WFE) as São Paulo SE, has global importance. Among the 51 stock exchanges monitored by the WFE, BOVESPA ranked eighth among the biggest stock markets in the world in terms of capitalization and stock values, in a ranking for developing countries. Two of the biggest companies in the BOVESPA stock market are the Petrobras oil company and Vale do Rio Doce, which makes them ideal stocks to be analyzed. For the professional investor to understand the behavior of a stock, at least five series are necessary:

  1) The highest value at which the stock was traded on a
     certain day.
  2) The lowest value at which the stock was traded during
     the same day.
  3) The value of the first trade of the day: opening
     price.
  4) The value of the last trade of the day: closing
     price.
  5) The business volume of the stock during the same day.

   The closing price is the series that is really important, since most professional investors and financial institutions take action based on its value.
   In time series forecasting methods, the choice of the input variables is an important step. In this work, we are interested in the prediction of the stock quotations of PETR4 and VALE5. To predict these stock values, we have used exogenous time series that were chosen based on autocorrelation analyses, similarly to work done previously [8].
   For the Petrobras Company (PETR4) the exogenous time series used were: Dollar, IBOV, CLF, NSY:PBR, DAX and SP500.
   The Dollar time series is the quotation of the Brazilian Real against the United States Dollar. IBOV is the BOVESPA quotation. CLF is the Crude Light Oil Future quotation. NSY:PBR is the quotation of Brazilian Oil. DAX is the German stock market index. SP500 is the S&P 500 index.
   For the Vale do Rio Doce Company (VALE5) the exogenous time series used were: Dollar and IBOV. These series were chosen based on economic analyses [8].

     III. PERFORMANCE MEASUREMENT OF PREDICTION MODELS

   There are several metrics used to evaluate time series forecasting models. In this paper we have employed five metrics that are commonly used in the literature: MSE, MAPE, POCID, THEIL (or NMSE) and ARV. Additionally, we used SLG, which was proposed by Amorim Neto [8], and a novel metric proposed in this work, named Prediction in Direction and Accuracy (PDA).
   A simple measure to evaluate the accuracy of a forecasting model is the difference between the expected value and the output value of the model. In Equation 1, T_t is the expected value, Y_t is the output of the forecast model, and e_t is the calculated error, all at time t. This measure is the basis for the others.

                  e_t = |T_t - Y_t|                    (1)

   The performance measurement metrics used in this work are briefly described here. For every metric, T_t is the desired output of the forecasting model at time t, Y_t is the output of the proposed model and N is the total number of available patterns.

A. MSE (Mean Squared Error)
   The Mean Squared Error is the best known metric to evaluate the performance of forecasting models. It is defined as:

                  \mathrm{MSE} = \frac{1}{N} \sum_{t=1}^{N} (e_t)^2                    (2)

B. MAPE (Mean Absolute Percent Error)
   The Mean Absolute Percent Error measures the accuracy of the model as a relative error. It is defined as:

                  \mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{e_t}{T_t} \right|                    (3)

   A lower value of MAPE is the desired result from a prediction method.

C. THEIL or NMSE (Normalized Mean Squared Error)
   The Normalized Mean Squared Error evaluates the relationship of the model with the random walk model. Equation 4 defines this value.

                  \mathrm{THEIL} = \frac{\sum_{t=1}^{N} (e_t)^2}{\sum_{t=1}^{N} (Y_t - Y_{t-1})^2}                    (4)

   When THEIL is equal to one, the proposed model is equivalent to the random walk model. The random walk model proposes that the future value of the time series is equal to the current value. If THEIL is lower than one, then the proposed model is better than the random walk model. If THEIL is greater than one, then the proposed model performs worse than the random walk model.
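   As an illustration of the error-based metrics of Section III (Equations 1 to 4), a minimal Python sketch follows. It is not the authors' implementation. Two points are assumptions: the MAPE summand is taken as |e_t / T_t| (the usual definition, reported as a fraction rather than a percentage), and the THEIL sums start at the second pattern because Y_{t-1} is not available for the first one. The function names are hypothetical.

```python
def abs_errors(target, output):
    """Equation 1: e_t = |T_t - Y_t| for every pattern."""
    return [abs(t - y) for t, y in zip(target, output)]


def mse(target, output):
    """Equation 2: mean of the squared errors."""
    errors = abs_errors(target, output)
    return sum(e ** 2 for e in errors) / len(errors)


def mape(target, output):
    """Equation 3, assuming the summand |e_t / T_t| (returned as a fraction)."""
    errors = abs_errors(target, output)
    return sum(abs(e / t) for e, t in zip(errors, target)) / len(errors)


def theil(target, output):
    """Equation 4 as printed: squared errors over squared output changes.

    Assumption: both sums start at the second pattern, since the first
    pattern has no previous output Y_{t-1}.
    """
    errors = abs_errors(target, output)[1:]
    numerator = sum(e ** 2 for e in errors)
    denominator = sum(
        (output[i] - output[i - 1]) ** 2 for i in range(1, len(output))
    )
    return numerator / denominator
```

   For example, with targets [10.0, 11.0, 12.0, 11.0] and outputs [10.0, 10.5, 12.5, 11.5], mse returns 0.1875 and theil returns 0.75 / 5.25, a value below one.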
D. POCID (Prediction On Change In Direction)
   POCID is the percentage of time steps in which the trend of the model matches the trend of the expected value. This metric is defined by Equation 5.

                  \mathrm{POCID} = 100 \, \frac{\sum_{t=1}^{N} D_t}{N}                    (5)

   The value of D_t is defined by Equation 6.

                  D_t = \begin{cases} 1, & \text{if } (T_t - T_{t-1})(Y_t - Y_{t-1}) > 0, \\ 0, & \text{otherwise.} \end{cases}                    (6)

E. ARV (Average Relative Variance)
   The Average Relative Variance evaluates the relationship of the model with a reference model which proposes that the future value of the time series is equal to the arithmetic mean of the past values. It is defined as:

                  \mathrm{ARV} = \frac{\sum_{t=1}^{N} (e_t)^2}{\sum_{t=1}^{N} (Y_t - \bar{T})^2}                    (7)

   When ARV is equal to one, the proposed model is equivalent to the mean of past values. If ARV is lower than one, then the proposed model is better than the mean of past values. If ARV is greater than one, the proposed model performs worse than the mean of past values.

F. SLG (Sum of Losses and Gains)
   The SLG was proposed by Amorim Neto [8] and was inspired by POCID. It is defined as the mean of the losses and gains of the model. The SLG measurement is defined by Equation 8:

                  \mathrm{SLG} = \frac{\sum_{t=1}^{N} L_t}{N}                    (8)

   In Equation 8, the value of L_t is defined by Equation 9.

                  L_t = \begin{cases} +|T_t - T_{t-1}|, & \text{if } (T_t - T_{t-1})(Y_t - Y_{t-1}) > 0, \\ -|T_t - T_{t-1}|, & \text{otherwise.} \end{cases}                    (9)

   An SLG less than zero indicates financial losses.

G. Prediction in Direction and Accuracy (PDA)
   PDA is the novel metric proposed in this paper. The objective is to favor forecasting models with the best behavior in both trend and accuracy. This is achieved by maximizing POCID and minimizing the relative error. The accuracy of a model is measured against a maximum relative error. The model with the best trend behavior and the highest accuracy will have a higher PDA value; in other words, a higher PDA implies a better model. It is an improvement of the SLG metric. The metric is mathematically described in Equations 10 and 11.

                  \mathrm{PDA} = \frac{1}{N} \sum_{t=1}^{N} G_t                    (10)

   where G_t is defined in Equation 11:

                  G_t = \begin{cases}
                     1 - \frac{re_t}{re_{max}}, & \text{if } (D_t = 1) \text{ and } re_t < re_{max}, \\
                     0, & \text{if } (D_t = 1) \text{ and } re_t \geq re_{max}, \\
                     -1 + \frac{re_t}{re_{max}}, & \text{if } (D_t = 0) \text{ and } re_t < re_{max}, \\
                     -1, & \text{if } (D_t = 0) \text{ and } re_t \geq re_{max}
                  \end{cases}                    (11)

   where D_t is defined by Equation 12, re_t = e_t / T_t and re_{max} = 0.02. This constant is the maximum relative error accepted for the prediction. In this case, the maximum tolerance is a 2% error.

                  D_t = \begin{cases} 1, & \text{if } (T_t - T_{t-1})(Y_t - Y_{t-1}) > 0, \\ 0, & \text{otherwise.} \end{cases}                    (12)

   If the model predicts the direction correctly (D_t = 1) and the relative error is lower than the maximum error, then 1 - re_t / re_{max} is added; if the model predicts the direction correctly (D_t = 1) but the relative error is greater than or equal to the maximum error, then nothing is added; if the model predicts the direction wrongly (D_t = 0) and the relative error is lower than the maximum error, then -1 + re_t / re_{max} is added; if the model predicts the direction wrongly (D_t = 0) and the relative error is greater than or equal to the maximum error, then -1 is added. After the summation, the mean is calculated.

                  IV. NEURAL NETWORKS COMBINATION

   A neural network is a stochastic mathematical model that aims to simulate the functionality of a biological neural network. A neural network is formed by a set of connected neurons organized in layers. Each neuron can be considered a computational processing unit. There are several kinds of neural networks [4]; the Multi-layer Perceptron (MLP) and the Radial Basis Function (RBF) network were used in this paper.
   Training MLPs with exogenous time series improves the forecasting ability of the model, as demonstrated by Amorim Neto [8]. Additionally, a combination of MLPs trained with exogenous time series improves on the performance of a single MLP [8].
   A combination of neural networks is an architecture which uses a set of trained models and combines the outputs of these models, for the same input, into a single system. The combination architecture used here is depicted in Figure 1.
   This paper presents two ways to choose the neural networks that will take part in the combination: (i) the selection of the best networks through PDA, and (ii) selection through an evolutionary algorithm. The details are in Section V.

                         V. EXPERIMENTS

   This section describes all experiments performed to evaluate the prediction metric and methods described above.
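   As a reference for the evaluation, the direction-based metrics of Section III (POCID, SLG and PDA, Equations 5 to 12) can be sketched in Python as below. This is a hedged illustration, not the authors' code. It assumes that the mean is taken over the N - 1 steps for which a previous value exists (the paper divides by N), that prices are positive so that re_t = e_t / T_t is well defined, and that the function names are free choices.

```python
def direction_hits(target, output):
    """D_t (Equations 6 and 12): 1 if target and output moved the same way."""
    return [
        1 if (target[i] - target[i - 1]) * (output[i] - output[i - 1]) > 0 else 0
        for i in range(1, len(target))
    ]


def pocid(target, output):
    """Equation 5: percentage of steps with a correct direction."""
    hits = direction_hits(target, output)
    return 100.0 * sum(hits) / len(hits)


def slg(target, output):
    """Equations 8 and 9: mean of signed absolute target changes."""
    hits = direction_hits(target, output)
    moves = [abs(target[i] - target[i - 1]) for i in range(1, len(target))]
    signed = [m if d == 1 else -m for d, m in zip(hits, moves)]
    return sum(signed) / len(signed)


def pda(target, output, re_max=0.02):
    """Equations 10 and 11: mean of G_t with a 2% default error tolerance."""
    hits = direction_hits(target, output)
    total = 0.0
    for i, d in enumerate(hits, start=1):
        # Relative error re_t = e_t / T_t (assumes positive prices).
        re_t = abs(target[i] - output[i]) / target[i]
        if d == 1:
            g = 1.0 - re_t / re_max if re_t < re_max else 0.0
        else:
            g = -1.0 + re_t / re_max if re_t < re_max else -1.0
        total += g
    return total / len(hits)
```

   With targets [100.0, 102.0, 101.0, 103.0] and outputs [100.0, 101.0, 101.5, 108.0], two of the three steps have the correct direction, so pocid returns about 66.67 and slg returns 1.0. The third step has the right direction but a relative error above 2%, so it adds nothing to PDA.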
         Fig. 1.    Combination architecture used in this work.

                           TABLE I
   DISTRIBUTION OF STOCK QUOTES PER PATTERN IN THE DATABASES
                      WITHOUT EXOGENOUS.

        PETR4 database            VALE5 database
        Stock           Lag       Stock           Lag
        PETR4 close     -1        VALE5 close     -1
        PETR4 close     -2        VALE5 close     -2
        PETR4 close     -3        VALE5 close     -3

                           TABLE II
   DISTRIBUTION OF STOCK QUOTES PER PATTERN IN THE DATABASES
                       WITH EXOGENOUS.

        PETR4 database            VALE5 database
        Stock           Lag       Stock           Lag
        PETR4 close     -1        VALE5 close     -1
        PETR4 close     -2        VALE5 close     -2
        PETR4 close     -3        VALE5 close     -3
        PETR4 open      -1        VALE5 open      -1
        PETR4 highest   -1        VALE5 highest   -1
        PETR4 lowest    -1        VALE5 lowest    -1
        Dollar close    -1        Dollar close    -1
        DAX close       -1        IBOV close      -1
        IBOV close      -1
        CLF close       -1
        SP500 close     -1
        NSY:PBR close   -1

A. Databases
   Two databases were used for the evaluation of the proposed methods: the PETR4 stock quotes dataset and the VALE5 stock quotes dataset. In addition, all exogenous time series described in Section II were used to complement the main series.
   The experiments were performed using two groups of datasets: datasets without exogenous time series and datasets with exogenous time series. Both PETR4 and VALE5 had the two dataset groups. Table I shows the stock quote distribution for each database without the exogenous time series and Table II shows the distribution with exogenous time series. The "lag" notation is relative to time t of the series, i.e., lag -1 corresponds to the stock quote on the previous day; lag -2 corresponds to the stock quote two days before; and lag 0 corresponds to the stock quote of the current day.

B. Experimental Setup
   Before the experiments, the databases were organized in three datasets: training, validation and test sets. The training set was used for the learning of the neural networks. The validation set was used for tuning some training parameters. The training was performed varying other parameters, such as the number of hidden neurons for the MLP and the spread of the RBF, resulting in a large number of experiments and trained neural networks. From 1,500 days of stock quotations in the database, 1,200 were used for training and validation and the last 300 days for testing. For each experiment, the training and validation sets were divided randomly, with 900 days for training and 300 for validation. However, the test dataset remained the same. The experiments were done as follows.
   First, two experiments using the main time series, PETR4 without exogenous series, were done. One used MLP and the other RBF. The MLP was trained with the number of hidden neurons varying in [20 ... 60], ten times for each number, generating 410 trained MLPs. The RBF was trained with a variation in [10 ... 100], generating 910 trained RBFs. We used the validation dataset to choose the best network configuration, and then applied it to the test set.
   Afterwards, two more experiments were done, now including the exogenous time series. Again, 410 trained MLPs and 910 trained RBFs were obtained.
   In the end, following Equation 13, where X is the total number of trained networks, the N best networks (trained with exogenous series) were chosen for combination (according to the validation dataset).

                  N = \mathrm{round}(\log_2(X))                    (13)

   We have seven metrics to choose the best networks for combining. There are many combination possibilities, and we have used two approaches: (i) the combination of the N best networks based on the PDA metric and (ii) a genetic algorithm (GA) for this task, where the variable to be maximized was PDA.
   A Genetic Algorithm is an evolutionary technique which aims to optimize by evolution, through operators such as mutation and crossover. In this method, each possible solution to the problem to be optimized is represented by a chromosome.
   In general, these results show that the RBF has a better performance than MLP according to the proposed metric.
   In the experiments with combination in the PETR4 database, the genetic algorithm found the following metric combination: POCID + THEIL + ARV + PDA for MLP and MSE + MAPE + THEIL + ARV + PDA for RBF. For the VALE5 database, the GA found this metric combination: POCID
+ PDA for MLP and MSE + MAPE + THEIL + ARV + PDA for RBF. Table III shows the combination results generated by the GA.
   In both databases, the combination of MLPs was better than that of RBFs, according to the PDA metric. In fact, for both databases, the MLP combination presented a better combination of accuracy and trend prediction than the RBF combination.
   The results of the combination of the best PDA networks can be seen in Table IV. As in the GA combination, these results show that MLP is the best choice. For both databases, the MLP combination presented the highest PDA value, indicating that this combination had better accuracy and trend prediction than the RBF combination. Even when RBF outperforms MLP in some metric, the performance of both is very close on that metric.

                           TABLE III

                          PETR4 database
   Metric    MLP                          RBF
   MSE       0.61812 (0.0063709)          0.62212 (0.014382)
   MAPE      0.015824 (7.8961e-005)       0.016028 (0.00015723)
   POCID     81.4094 (0.17329)            81.5436 (0.27399)
   SLG       0.70212 (0.010981)           0.70862 (0.003683)
   THEIL     0.27291 (0.0033154)          0.28875 (0.0052983)
   ARV       0.0021179 (2.153e-005)       0.0021358 (4.9668e-005)
   PDA       0.28546 (0.0054852)          0.26451 (0.0044274)

                          VALE5 database
   Metric    MLP                          RBF
   MSE       0.71341 (0.021186)           0.6008 (0.017369)
   MAPE      0.015662 (0.00017252)        0.015511 (0.00027798)
   POCID     83.1104 (0.47951)            81.0033 (0.56406)
   SLG       0.83871 (0.012075)           0.80091 (0.012646)
   THEIL     0.33196 (0.0093126)          0.26268 (0.0055378)
   ARV       0.0017654 (5.3089e-005)      0.0014873 (4.29e-005)
   PDA       0.27251 (0.0097393)          0.23946 (0.012403)

                           TABLE IV
   BEST RESULTS WITH COMBINATION BY THE BEST NETWORKS RANKED
                     BY PDA METRIC (X(σ)).

                          PETR4 database
   Metric    MLP                          RBF
   MSE       0.6056 (0.01061)             0.61335 (0.010417)
   MAPE      0.015732 (0.00015777)        0.016285 (0.000224)
   POCID     81.5436 (0.47457)            79.4295 (0.65319)
   SLG       0.72031 (0.021801)           0.69428 (0.017808)
   THEIL     0.26983 (0.0035511)          0.29633 (0.0038447)
   ARV       0.0020787 (3.5254e-005)      0.002107 (3.6346e-005)
   PDA       0.27865 (0.004667)           0.2365 (0.011121)

                          VALE5 database
   Metric    MLP                          RBF
   MSE       0.72078 (0.024093)           0.59518 (0.013024)
   MAPE      0.015687 (0.00027368)        0.015478 (0.00025494)
   POCID     82.6756 (0.41113)            80.9365 (0.49857)
   SLG       0.83816 (0.010863)           0.80828 (0.010803)
   THEIL     0.33828 (0.011696)           0.26519 (0.0069003)
   ARV       0.0017837 (6.0259e-005)      0.0014727 (3.2269e-005)
   PDA       0.26401 (0.0080416)          0.22843 (0.0074966)

                         VI. CONCLUSION

   This paper presented a comparison between MLP and RBF neural networks using the PETR4 and VALE5 time series with and without exogenous data. It also introduced a new performance metric for the selection of trained networks to combine in combination machines. Experiments were made to verify the usefulness of the proposed metric and the proposed combinations.
   The experiments showed that: (i) without combination, RBF outperforms MLP in general; (ii) with combination, MLP improves in performance and overcomes RBF; (iii) the proposed novel metric is useful for network selection based on the main metrics for financial investments, and it is especially suitable for minimization/maximization algorithms, such as a genetic algorithm.
   The two proposed combination methods had similar gains in prediction. However, the selection by the best PDA can be more suitable because genetic algorithms have a high computational cost.
   Also, the proposed metric is a natural evolution of SLG that aims to rank networks based on accuracy as well, instead of trend exclusively. The results obtained showed some relation between this metric and other accuracy/trend metrics. In other words, when PDA increases, the accuracy/trend also improves.

                           REFERENCES
[1] Brockwell, P. J. and Davis, R. A. Introduction to Time Series and
    Forecasting. New York, USA: Springer Verlag, 1996.
[2] Chu, Ching W., Ching Z. and Guoqiang P. A comparative study of
    linear and nonlinear models for aggregate retail sales forecasting.
    International Journal of Production Economics, pp. 217-23, 2003.
[3] De Gooijer, Jan G., Jan K. and Kuldeep. Some recent developments in
    non-linear times series modeling, testing and forecasting. Prentice Hall,
    1998.
[4] Haykin S. Neural Networks: a Comprehensive Foundation. Second
    Edition. International Journal of Forecasting, vol. 8, pp. 135-156, 1992.
[5] Charkha, Pritam R. Stock Prediction and Trend Prediction using Neural
    Network. First International Conference on Emerging Trends in
    Engineering and Technology, pp. 592-594, 2008.
[6] Ferreira T. A. E., Vasconcelos G. C. and Adeodato P. J. L. A New
    Intelligent System Methodology for Time Series Forecasting with Artificial
    Neural Networks. Neural Process Letters, vol. 28, pp. 113-129, 2008.
[7] Schumpeter J. A. The Theory of Economic Development: An Inquiry
    into Profits, Capital, Credit, Interest, and the Business Cycle.
    Transaction Publishers, 1982.
[8] Amorim Neto M. C., Cavalcanti G. D. C., Ren T. I. Financial time
    series prediction using exogenous series and combined neural networks.
    International Joint Conference on Neural Networks, pp. 2578-2585,