									      “Vector Quantization and entropy technique to predict high frequency data”




                         Independent study

 “Vector Quantization and entropy technique
       to predict high frequency data”
     Sponsor: Professor Campbell R. Harvey




                            Subject: high frequency data
                                      Credits: 1




                                                    James Krieger




Author : James Krieger             26.05.2011                         Page 1 sur 36



1     Introduction
2     Theory of information
   2.1      Modeling a non-discrete data stream
3     Use of historical data to predict the future
   3.1      Fixed path length with maximum entropy
   3.2      Multi path length or PPM
   3.3      Quantization
      3.3.1      VQ size
      3.3.2      Period used for the return
      3.3.3      Implementation algorithm
4     Data used
5     Redundancy
   5.1      Path length
   5.2      VQ size
   5.3      Return period
6     Forecast
   6.1      Algorithm
      6.1.1      Multi path length algorithm
      6.1.2      Fixed path length algorithm
      6.1.3      Return Period
   6.2      Result
      6.2.1      Daily data
      6.2.2      5 minutes data
7     Additional study
8     Conclusion
9     References
10    Appendix
   10.1     LBG design
   10.2     Update the LBG Algorithm into a geometric world
   10.3     Redundancy result
   10.4     Return result







1 Introduction
Entropy is a concept widely used in data compression.
As suggested in a draft paper by Professor Campbell R. Harvey, this technique seems a
reasonable way to predict future returns on high frequency data. Why? The basic
assumption is that if historical data contain some redundant information, in other words
if we could compress the historical data, then some repeatability must exist. This
repeatability could be used to predict future returns.

In this memo, I will try to implement and validate this assumption with a basic
implementation of this technique on FX rates, which are known to pass most
autocorrelation tests.

2 Theory of information
The basic theory of information is simple, but its general implementation is complex.
Consider the compression of a text book, treated as a source of symbols $X$, while
paying special attention to the case where $X$ is English text.

   Let $X^n$ represent the first $n$ symbols. The entropy rate in the general case is given by:

   $$H = \lim_{n \to \infty} -\frac{1}{n} \sum_{x^n} p(x^n) \log_2 p(x^n)$$

         where the sum is over all possible values $x^n$ of $X^n$ ($27^n$ sequences for a
         27-letter alphabet). It is virtually impossible to calculate the entropy rate
         according to the above equation. Using a prediction method, Shannon was able to
         estimate that the entropy rate of 27-letter English text is 2.3 bits/character.

So the goal of an implementation is to have a simplified model of lower order which
captures most of the achievable entropy reduction. For example, with a 3rd order model,
the entropy rate of 27-letter English text is 2.77 bits/character. This is already
substantially lower than the zero order entropy, which is 4.75 bits/character.
So the 3rd order redundancy of English text is already 42% (1 - 2.77/4.75). It means we
are able to guess pretty well which letter will follow two known letters.
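The figures above can be checked with a short computation. The following is a minimal sketch, assuming that redundancy is measured as one minus the ratio of the model entropy to the zero order entropy (log2 of the alphabet size); the function names `conditional_entropy` and `redundancy` are illustrative and do not come from the original study.

```python
import math
from collections import Counter


def conditional_entropy(symbols, context_len):
    """Empirical entropy (bits/symbol) of a symbol given the previous
    `context_len` symbols, estimated from observed frequencies."""
    joint = Counter()    # counts of (context, next symbol)
    context = Counter()  # counts of each context
    for i in range(context_len, len(symbols)):
        ctx = tuple(symbols[i - context_len:i])
        joint[(ctx, symbols[i])] += 1
        context[ctx] += 1
    total = sum(joint.values())
    return -sum(
        c / total * math.log2(c / context[ctx])
        for (ctx, _), c in joint.items()
    )


def redundancy(entropy_bits, alphabet_size):
    """Assumed definition: 1 - H / log2(alphabet size)."""
    return 1.0 - entropy_bits / math.log2(alphabet_size)


# The 3rd order figure quoted in the text: 2.77 bits on a 27-letter alphabet.
print(round(redundancy(2.77, 27), 3))
```

With a context of two symbols, `conditional_entropy` corresponds to the 3rd order model mentioned in the text (two known letters, one letter to guess).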

2.1 Modeling a non-discrete data stream
To compress a data stream that is not discrete, the typical approaches are:
      - Quantization (such as VQ), followed by a lossless compression technique.
      - A transformation technique (such as Fourier analysis), followed by a lossless
          compression technique.





In this paper we will use the first approach (VQ).

3 Use of historical data to predict the future
We will use the frequency of historical paths to determine future returns. Here is a tree
to illustrate the process:

   Period 1      Period 2      Period 3       frequency
   return x      return x'     return x''     0.01
   return x      return x'     return y''     0.1
   return x      return x'     ...
   return x      return x'     return z''     0.002
   return x      ...
   return x      return z'     return x''     0.0001
   return x      return z'     ...
   return x      return z'     return z''     0.2
   ...
   return z                                   0.1

Based on this tree, if a return x is followed by a return x’ (called the prefix x,x’), we
have a 0.01 probability of a return x’’, a 0.1 probability of a return y’’, and so on.
Based on these probabilities we will forecast the future return.
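The tree lookup can be sketched in a few lines. This is a minimal illustration, assuming the returns have already been quantized into a sequence of integer symbols; the function name `next_symbol_probabilities` and the sample sequence are hypothetical. Unlike the raw frequencies shown in the tree, the values returned here are normalized so that they sum to one for a given prefix.

```python
from collections import Counter


def next_symbol_probabilities(symbols, prefix):
    """Relative frequency of each symbol observed right after `prefix`
    in the historical sequence `symbols`."""
    prefix = tuple(prefix)
    n = len(prefix)
    followers = Counter(
        symbols[i + n]
        for i in range(len(symbols) - n)
        if tuple(symbols[i:i + n]) == prefix
    )
    total = sum(followers.values())
    if total == 0:
        return {}  # prefix never seen in the history
    return {s: c / total for s, c in followers.items()}


# Hypothetical quantized history with two symbols (0 = down, 1 = up).
history = [0, 1, 0, 1, 1, 0, 1, 0]
print(next_symbol_probabilities(history, (0, 1)))
```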


3.1 Fixed path length with maximum entropy
To determine the best path length to use, we will compute the redundancy (which rises
as entropy falls) on historical data and select the path length with the highest
redundancy. (High redundancy should let us forecast future returns with better success.)

Once the path length is determined, we will use it to compute the probability of the
future return. But to improve the result, we will use the geometric mean of the
historical returns that followed this particular path, instead of the probability of the
future path.
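A minimal sketch of this fixed path forecast follows. It assumes that `symbols[i]` is the quantized version of `returns[i]`, that returns are simple returns greater than -1, and that the geometric mean is taken on the growth factors 1 + r; none of these conventions is stated in the text, and the function name `path_forecast` is illustrative.

```python
import math


def path_forecast(symbols, returns, path):
    """Geometric mean of the returns that followed each past occurrence
    of `path`; None if the path was never observed."""
    path = tuple(path)
    n = len(path)
    following = [
        returns[i + n]
        for i in range(len(symbols) - n)
        if tuple(symbols[i:i + n]) == path
    ]
    if not following:
        return None
    growth = math.prod(1.0 + r for r in following)
    return growth ** (1.0 / len(following)) - 1.0


# Hypothetical data: symbol 0 for negative returns, 1 for positive ones.
symbols = [0, 1, 0, 1, 1, 0, 1, 0]
returns = [-0.01, 0.02, -0.01, 0.02, 0.01, -0.02, 0.03, -0.01]
print(path_forecast(symbols, returns, (0, 1)))
```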

3.2 Multi path length or PPM
PPM (prediction by partial matching) has set the performance standard in data
compression research since its introduction in 1984. PPM’s success stems from its ad hoc
probability estimator, which dynamically blends the distinct frequency distributions
contained in a single model into a probability estimate for each input symbol.

In other words, PPM is able to blend the frequency distributions of different path
lengths into one single probability.
At the same time, the algorithm proposes a solution to the zero probability problem.

Let’s define:
S(n) = prefix of length n (path of length n).
P(a¦S(n)) = probability of having “a” after the prefix S(n) (conditional probability).
count(X) = number of times X appeared in the past.
count(X,S(n)) = number of times X appeared in the past following the prefix S(n).
W(S(n)) = mixture weight.

We can write a recursive definition to compute the probability.
(Each recursion adds 1 to the length of the prefix.)

P(a¦S(n)) = W(S(n))*count(a,S(n))/count(S(n)) + (1-W(S(n)))*P(a¦S(n-1))

This equation can be applied recursively to increase the order. So from the probability
coming from a path of length 1, we can estimate the blended probability of paths of
length 1 and 2.

With the above equation, we are able to blend paths of multiple lengths into one single
probability. This probability will be used to compute the expected future return.

The critical part of this equation is W(). Different solutions are proposed in the
literature (known as PPMC or PPMD), and they are related to the zero probability problem.

The zero probability problem can be summarized as follows:
Given a prefix, which probability should be assumed for an event that has never happened
in the past? (In the compression literature, this is referred to as the escape mechanism.)

For example, suppose you have a prefix composed of the letters “P,R,O,B,A,B,L” and that
you have seen the letter “Y” follow this prefix 3 times, and no other letter. Should you
assume that in this context “Y” is 100% probable? Or should you assume a non-zero
probability for another letter such as “E”?

In compression, the different suggested solutions are:
W(S(n)) = count(S(n)) / (count(S(n)) + count(a)/DS)

and

P(a¦S(n)) = W(S(n))*(count(a,S(n))-K)/count(S(n)) + (1-W(S(n)))*P(a¦S(n-1))

With the following values of DS and K:





Algorithm                     DS                             K
PPM B                         1                              -1
PPM C                         1                              0
PPM D                         2                              -0.5

The parameter DS can be seen as setting the distribution of weight between long paths and
short paths. If DS equals 1, a bigger weight is put on the short paths; conversely, if DS
is big, the weight is put on the longer paths.

Based on experimental measurements, we will determine which values of DS and K seem the
most appropriate.
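The recursion and the weighting can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this study: the term count(a) in the weight is read as the number of distinct symbols seen after the prefix, the recursion is closed with a uniform distribution below the empty prefix, and a prefix that was never observed simply falls back to the shorter prefix. The function names `build_counts` and `ppm_probability` are hypothetical.

```python
from collections import Counter, defaultdict


def build_counts(symbols, max_order):
    """counts[prefix][a] = number of times symbol a followed `prefix`,
    for every prefix length from 0 to max_order."""
    counts = defaultdict(Counter)
    for i, a in enumerate(symbols):
        for n in range(max_order + 1):
            if i - n >= 0:
                counts[tuple(symbols[i - n:i])][a] += 1
    return counts


def ppm_probability(a, prefix, counts, alphabet_size, ds=1.0, k=0.0):
    """Blended P(a | prefix) following the recursive definition in the text."""
    prefix = tuple(prefix)
    if len(prefix) == 0:
        # Assumed base case: uniform distribution below the empty prefix.
        lower = 1.0 / alphabet_size
    else:
        # Drop the oldest symbol to get the prefix of length n-1.
        lower = ppm_probability(a, prefix[1:], counts, alphabet_size, ds, k)
    seen = counts.get(prefix)
    if not seen:
        return lower  # prefix never observed: all weight on the shorter path
    total = sum(seen.values())
    distinct = len(seen)  # assumed reading of count(a) in the weight formula
    w = total / (total + distinct / ds)
    return w * (seen[a] - k) / total + (1.0 - w) * lower


history = list("abracadabra")
counts = build_counts(history, max_order=2)
print(ppm_probability("a", ("b", "r"), counts, alphabet_size=5))
```

With K = 0 the blended values sum to one over the alphabet. With a non-zero K they generally do not, so a renormalization step would be needed before using them as probabilities.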


3.3 Quantization
To simplify our data stream, we are going to use a vector quantization (VQ) technique.
The VQ is going to capture the returns of our FX data. But we need to decide:
        - How precise the VQ needs to be (what size the VQ needs to be).
        - What period we are going to use for the return.

3.3.1 VQ size
This choice is driven by the amount of data available. For example, with a VQ of size 4
(4 vectors), a sequence of 3 VQ symbols generates 64 possibilities. And if we would like
to capture a path composed of a sequence of 10 VQ symbols, there are around 1 million
possibilities (4^10 = 1,048,576). But if you have only a few thousand samples available,
you will never be able to populate the tree enough to get statistically significant
probabilities on these possible paths.
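The arithmetic behind this argument is simple to reproduce. The sketch below uses a hypothetical sample count of 3000; the function name `average_samples_per_path` is illustrative.

```python
def average_samples_per_path(sample_count, vq_size, path_length):
    """Average number of historical samples per possible path, if the
    samples were spread evenly over all vq_size ** path_length paths."""
    return sample_count / vq_size ** path_length


print(4 ** 3)                                   # possible paths of length 3
print(4 ** 10)                                  # possible paths of length 10
print(average_samples_per_path(3000, 4, 3))     # many samples per path
print(average_samples_per_path(3000, 4, 10))    # far below one sample per path
```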

3.3.2 Period used for the return
The choice of the period used to compute the return is driven by:
       - The frequency of the data available. A higher frequency allows a shorter period.
       - The noise in the data. A longer period will reduce noise and improve the quality
           of the data.
       - The volatility between the data samples. If the volatility between the samples
           is high, a forecast based on these samples will be very noisy.

Based on experimental measurements, we will determine some possible periods to use in the
computation of the return.


3.3.3 Implementation algorithm
We will use the LBG-VQ algorithm to determine the VQ. This algorithm is iterative, and a
description is given in appendix I.






To take into consideration that we are working on returns and not on absolute gains, the
algorithm needs to be transposed into a geometric world. A solution is described in
appendix II.
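For orientation, here is a generic sketch of LBG on scalar samples. It is not the design described in the appendix: it uses squared error as the distortion, an additive split (so that a code vector at zero can still be split), and it leaves empty cells unchanged. The function names `lbg_codebook` and `quantize` and all parameter names are illustrative.

```python
def lbg_codebook(samples, size, epsilon=0.001, tol=1e-6, max_iter=100):
    """Generic LBG sketch for scalar samples; `size` must be a power of two."""
    if size < 1 or size & (size - 1):
        raise ValueError("size must be a power of two")
    codebook = [sum(samples) / len(samples)]  # start from the global centroid
    while len(codebook) < size:
        # Split every code vector into two slightly perturbed copies.
        codebook = [c + d for c in codebook for d in (-epsilon, epsilon)]
        previous = float("inf")
        for _ in range(max_iter):
            # Assign each sample to its nearest code vector.
            cells = [[] for _ in codebook]
            distortion = 0.0
            for x in samples:
                j = min(range(len(codebook)), key=lambda i: (x - codebook[i]) ** 2)
                cells[j].append(x)
                distortion += (x - codebook[j]) ** 2
            # Move each code vector to the centroid of its cell.
            codebook = [
                sum(cell) / len(cell) if cell else old
                for cell, old in zip(cells, codebook)
            ]
            # Stop when the distortion no longer improves noticeably.
            if previous - distortion <= tol * max(distortion, 1e-300):
                break
            previous = distortion
    return sorted(codebook)


def quantize(x, codebook):
    """Index of the code vector nearest to x."""
    return min(range(len(codebook)), key=lambda i: (x - codebook[i]) ** 2)


codebook = lbg_codebook([-0.02, -0.01, 0.01, 0.02], size=2)
print(codebook, quantize(0.03, codebook))
```

One possible way to move this sketch to a geometric setting, assumed here and not taken from the appendix, is to feed it log(1 + r) instead of the simple return r.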


4 Data used
We will use the following data:
      - daily FX rates for the pairs usd_chf and gbp_usd (from Oanda), from 1990 until
          2002.
      - a 5 minutes data stream for eur_usd and usd_chf, from January 1999 until February
          2002.

To test the forecasting and trading strategy, we are going to use out-of-sample data:
       - for the daily data, from 1997 until 2002.
       - for the 5 minutes data, from August 2001 until February 2002.




5 Redundancy
Appendix III gives the detailed redundancy values.

With a VQ of size two, we have the following redundancy:

[Figure: Daily data, VQ = 2. Redundancy (0% to 7%) versus path length (0 to 16), with one
curve each for the 1 day, 2 days and 7 days returns.]

It is interesting to see that the maximum redundancy is around the same path length for
each type of return. This suggests that the sample size is driving the shape of the above
curve.





With a VQ of size 4 and 8, we have the following redundancy:
[Figure: Daily data, VQ size = 4 (left) and VQ size = 8 (right). Redundancy (0% to 20%)
versus path length (0 to 8), with one curve each for the 1 day, 2 days and 7 days returns.]



Here again the redundancy curves for the 1 day, 2 days and 7 days returns have a very
similar shape, and the redundancy decreases almost immediately as the path length
increases.






On the 5 minutes data, we find similar results:

[Figure: 5 min data, VQ = 2. Redundancy (0% to 6%) versus path length (0 to 20), with one
curve each for the 3 hours and 20 hours returns.]

[Figure: 5 min data, VQ size = 4 (left) and VQ size = 8 (right). Redundancy (0% to 20%)
versus path length (0 to 8), with one curve each for the 3 hours and 20 hours returns.]




If we compute the average number of samples available for a given VQ size and a
particular path of fixed length, we find that reaching the maximum redundancy requires
more samples as the size of the VQ increases. (See appendix III)

If we compare the redundancy over time we find:
[Figure: Redundancy versus date (28-Oct-95 to 01-Sep-02), with legend entries 2 to 7.
Left: daily data, 2 days return (0% to 16%). Right: daily data, 7 days return, VQ size
of 4 (0% to 12%).]






[Figure: Redundancy versus date, with legend entries 2 to 7. Left: 5 minutes data, 3 hours
return, VQ size of 8 (0% to 20%, 08-Jul-01 to 25-Nov-01). Right: 5 minutes data, 20 hours
return, VQ size of 8 (0% to 12%, 08-Jun-01 to 04-Apr-02).]




(Disregard the abrupt changes, which are due to a recalibration of the VQ at the beginning
of every year.)

We can see that the redundancy is, not surprisingly, very stable. (This is normal, because
adding just a couple of samples to the whole data sample should not affect the overall
redundancy too much.)

But these charts are interesting because they tell a lot about how well a predictor model
will do. Assuming that there is no rolling effect in the redundancy and that patterns are
constant over time, there are 3 scenarios:
        - The redundancy is decreasing. In this case, your prediction is likely to get
           worse.
        - The redundancy is constant. In this case, your prediction should be good.
        - The redundancy is increasing. In this case, your prediction is likely to get
           better.

If you look at the daily data, you can see an increase in redundancy until 1999. This is
the date when the euro was introduced (the euro being tied to the other currencies in
Europe). So not surprisingly, after 1999 the redundancy is flat, which suggests the
introduction of new dynamics in the FX market. (The daily data are the FX pairs usd_chf
and gbp_usd.)


5.1 Path length
Based on the above experiment, we can conclude that the path length which generates the
highest redundancy is determined by the number of samples available.

5.2 VQ size
As for the path length, the size of the VQ which generates the highest redundancy is
determined by the number of samples available. As a rule of thumb you should have a
sample size equal to (VQ size)^(path length). Also, the redundancy seems to increase with
the size of the VQ, but it decreases for large VQs due to the limited sample size.
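The rule of thumb can be inverted to find the longest path a given sample can support. The sketch below is a direct reading of that rule; the sample count of 3000 is hypothetical and the function name `max_path_length` is illustrative.

```python
def max_path_length(sample_count, vq_size):
    """Largest path length L such that vq_size ** L <= sample_count."""
    if vq_size < 2:
        raise ValueError("vq_size must be at least 2")
    length = 0
    while vq_size ** (length + 1) <= sample_count:
        length += 1
    return length


for vq_size in (2, 4, 8):
    print(vq_size, max_path_length(3000, vq_size))
```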

5.3 Return period
In our experiment, the period has a significant impact on the redundancy. In general, the
redundancy increases as the period shortens. There is one exception, on the 5 minutes data
with a VQ of size 2, where the shorter period generated a smaller redundancy. This could
be explained by the fact that a VQ of size 2 captures only directional changes, and this
information becomes noisy if the return period is too small.

6 Forecast
6.1 Algorithm
For the forecast we are going to use the following algorithms:

6.1.1 Multi path length algorithm

6.1.1.1 PPM p
This is based on the PPM blending mechanism. Using the historical data we will find the
corresponding probability for each vector. With these probabilities we compute a forecast
by summing the vectors, each weighted by its probability. This forecast will determine the
amount we will bet for the period. We will evaluate this return for different values of ds
and k.
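The forecast step amounts to a probability-weighted sum. The sketch below assumes each code vector is a single scalar return; the function name `expected_return` and the sample numbers are hypothetical.

```python
def expected_return(probabilities, codebook):
    """Sum of the code vectors, each weighted by its blended probability."""
    if len(probabilities) != len(codebook):
        raise ValueError("one probability is needed per code vector")
    return sum(p * v for p, v in zip(probabilities, codebook))


# Hypothetical VQ of size 2: a -1% vector and a +2% vector.
print(expected_return([0.25, 0.75], [-0.01, 0.02]))
```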

6.1.1.2 PPM g
This is similar to the PPM p algorithm, but instead of using the probability, we use the
average historical return for each matching path length and blend these returns using the
PPM algorithm. This forecast will determine the amount we will bet for the period. We
will evaluate this return for different values of ds.

6.1.2 Fix path length algorithm
For the following algorithms we will use the path length that has the highest redundancy.

6.1.2.1 R sign
This is based on the historical direction of the return for identical paths. It is computed
as the number of positive returns minus the number of negative returns, divided by the
number of samples. R sign can take a value between -1 and 1. This value is used as the
bet we take.
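
As a sketch, R sign can be computed from the list of returns that followed identical paths in the history. The function name r_sign and the choice to return 0 (no bet) when the path has never been seen are assumptions made for this example.

```python
def r_sign(past_returns):
    """(positive count - negative count) / number of samples, in [-1, 1]."""
    n = len(past_returns)
    if n == 0:
        return 0.0  # assumption: no history for this path means no bet
    positives = sum(1 for r in past_returns if r > 0)
    negatives = sum(1 for r in past_returns if r < 0)
    return (positives - negatives) / n


# Hypothetical returns observed after the same path in the past.
print(r_sign([0.004, 0.001, -0.002, 0.003]))
```

Zero returns count as samples but as neither positive nor negative, so they pull the bet toward zero.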

6.1.2.2 R g
This is the average of the historical returns for identical paths. This average is used as
our bet.

6.1.2.3 R g * count
Similar to R g, but we also count the number of times the identical path has occurred.
This allows us to weight our forecast by how often this path happened in the past. This
computed value is used as our bet.

6.1.2.4 R g * count / stdev
Similar to the above, but this time we divide our average return by the standard deviation
of the historical returns for the identical path.





6.1.2.5 R g*count*sign ifsame
Similar to the above, but this time we take a bet only if the average return and R sign
point in the same direction.
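
The four R g variants above can be sketched together. This is one possible reading of the descriptions, not the exact implementation: the function names are hypothetical, the population standard deviation is an assumption (the text does not say which estimator is used), and the last rule is read as "bet R g * count when its sign agrees with R sign, otherwise bet nothing".

```python
from statistics import mean, pstdev


def r_sign(past):
    """(positive count - negative count) / number of samples."""
    if not past:
        return 0.0
    return (sum(r > 0 for r in past) - sum(r < 0 for r in past)) / len(past)


def r_g(past):
    """Average historical return after identical paths."""
    return mean(past) if past else 0.0


def r_g_count(past):
    """Average return weighted by how often the path occurred."""
    return r_g(past) * len(past)


def r_g_count_stdev(past):
    """Count-weighted average return divided by the standard deviation."""
    sd = pstdev(past) if past else 0.0  # assumption: population estimator
    return r_g_count(past) / sd if sd > 0 else 0.0


def r_g_count_sign_ifsame(past):
    """Bet only when the average return and R sign agree in direction."""
    return r_g_count(past) if r_g(past) * r_sign(past) > 0 else 0.0


history = [0.01, 0.03, -0.01]  # hypothetical returns after one path
print(r_g(history), r_g_count(history), r_g_count_stdev(history),
      r_g_count_sign_ifsame(history))
```

With a history such as [0.01, 0.01, -0.05] the average is negative while most returns are positive, so the last rule takes no bet.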

6.1.3 Return Period
For the return period we will use 1-, 2- and 7-day returns for the daily data, and 20-hour
and 3-hour returns for the 5-minute data.
For the daily data, we bet every day based on the latest trading information. This
means that for the 2- and 7-day returns we assume rolling bets (several bets open at the
same time, each waiting for its period to expire).

For the 5-minute data, we take inactive periods into consideration. So instead of using
exactly 3 hours or 20 hours as the return period, we use a number of samples. For
example, for the 3-hour period we compute the return over 12*3=36 samples (the data
has a 5-minute sample rate). All inactive periods are therefore skipped (there is no
sample when there is no activity).
We take a new bet at the end of each return period (every 36 or 120 samples in this
case).
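
The sample-count approach for the 5-minute data can be sketched as below. The function name period_returns is hypothetical, and the use of simple returns (rather than log returns) is an assumption, since the return definition is not restated here.

```python
def period_returns(prices, samples_per_period):
    """Non-overlapping returns between every `samples_per_period`-th sample.

    `prices` contains only recorded samples, so inactive periods (which
    produce no sample) are skipped automatically.
    """
    anchors = prices[::samples_per_period]
    return [later / earlier - 1.0 for earlier, later in zip(anchors, anchors[1:])]


SAMPLES_PER_HOUR = 12                      # 5-minute sample rate
three_hour_period = 3 * SAMPLES_PER_HOUR   # 36 samples

# Hypothetical price series with 73 samples: two full 36-sample periods.
prices = [1.50 + 0.0001 * i for i in range(73)]
print(period_returns(prices, three_hour_period))
```

Because a new bet is taken only at the end of each period, the returns do not overlap, unlike the rolling bets used for the daily data.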








6.2 Result
6.2.1 Daily data
For the daily data we found the following monthly Sharpe ratios over the out-of-sample period starting 1/1/1997 and ending 1/1/2002.
The Sharpe ratio was calculated on a daily basis and adjusted to reflect a monthly value (assuming 20 trades during a month). For the 7-day
return we assume that we take a bet every day and hold it until the end of the period (see appendix for details).

Monthly Sharpe ratio, daily data, VQ 4 (values in parentheses are negative)

                                1 day, VQ 4                2 day, VQ 4                7 day, VQ 4
Algorithm                 gbp_usd  usd_chf     both  gbp_usd  usd_chf     both  gbp_usd  usd_chf     both
PPM g 2                      2.43     1.30     2.16     2.79   (3.41)   (0.37)   (1.92)   (6.66)   (5.34)
PPM g 4                      3.00     1.54     2.63     3.14   (3.11)     0.02   (0.65)   (4.45)   (3.20)
PPM g 6                      3.30     1.68     2.88     3.35   (2.94)     0.25     0.06   (3.19)   (1.98)
PPM p k0_0:ds 2              2.96     1.70     2.65     3.58   (2.04)     0.91     3.25   (0.87)     1.39
PPM p k-0_5:ds 2             2.73     1.75     2.54     3.87   (1.74)     1.27     3.45   (0.60)     1.67
PPM p k-0_5:ds 4             2.58     2.22     2.72     3.86   (1.19)     1.60     3.40     1.14     2.65
PPM p k-0_5:ds 6             2.40     2.50     2.79     3.89   (0.95)     1.76     3.21     1.89     2.97
PPM p k-1_0:ds 2             2.49     1.77     2.41     4.10   (1.45)     1.58     3.62   (0.34)     1.91
r sign                       4.12     3.02     4.41     1.66   (2.38)   (0.44)   (3.59)     2.01   (1.02)
r g                          3.71     1.68     3.19     2.76   (1.41)     0.82     2.15   (0.50)     1.06
r g*count                    1.92     0.62     1.50     0.88   (2.51)   (0.96)     0.35     0.39     0.47
r g*count/stdev              1.66     0.78     1.45     0.58   (2.68)   (1.25)   (0.58)     0.44   (0.09)
r g*count*sign ifsame        4.19     1.19     3.31     0.48   (1.25)   (0.47)   (1.72)     1.30   (0.27)
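
The daily-to-monthly adjustment of the Sharpe ratio is detailed in the appendix. The sketch below shows only one common convention and may differ from the appendix: it assumes a zero risk-free rate, independent daily results, and scaling by the square root of the 20 trades per month. The function name monthly_sharpe and the sample numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def monthly_sharpe(daily_pnl, trades_per_month=20):
    """Daily Sharpe ratio scaled to a monthly value.

    Assumptions: zero risk-free rate and square-root-of-time scaling.
    """
    sd = stdev(daily_pnl)
    if sd == 0:
        return 0.0
    return mean(daily_pnl) / sd * sqrt(trades_per_month)


# Hypothetical daily results of a strategy.
print(monthly_sharpe([0.010, -0.005, 0.020, 0.000, 0.004]))
```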

Overall, gbp_usd outperforms usd_chf. This could mean that usd_chf is a leading indicator for the gbp_usd FX rate.

The “PPM g” algorithm gives disappointing results compared to “PPM p”. This suggests that the proposed solution to the “zero
statistics” problem improves the result. (“PPM g” does not assign a default probability to an event that has never occurred, but “PPM p” does.)

The “r sign” algorithm performs very well for the short period (1-day return), which is surprising given the simplicity of this
algorithm. It suggests that the direction of the return is much more important than the forecasted value of the return.






To better understand the risk taken, we should look at how the returns are distributed. For the 1-day return, we have the following
distribution of the returns:

[Figure: algorithm r sign. Two panels: "Return distribution gbp_usd" and "Return distribution usd_chf". Y axis: % of total return (gain + loss) for the range. X axis: standard deviation, in bins 0 to 1, 1 to 2, 2 to 3, 3 to 4, 4 to 5, 5 to 6, 6 to 7 and more. One series per half-year from 1/1/1997 to 1/1/2002, plus "Total: r sign".]


[Figure: algorithm r g*count*sign ifsame. Two panels: "Return distribution gbp_usd" and "Return distribution usd_chf". Y axis: % of total return (gain + loss) for the range. X axis: standard deviation, in bins 0 to 1, 1 to 2, 2 to 3, 3 to 4, 4 to 5, 5 to 6, 6 to 7 and more. One series per half-year from 1/1/1997 to 1/1/2002, plus "Total: r g*count*sign ifsame".]







For the 2-day return, we have the following distribution of the returns:
[Figure: algorithm PPM p k-0_5:ds 6. Two panels: "Return distribution gbp_usd" and "Return distribution usd_chf". Y axis: % of total return (gain + loss) for the range. X axis bins: 0 to 1, 1 to 2, 2 to 3, 3 to 4, 4 to 5, 5 to 6, 6 to 7 and more. One series per half-year from 1/1/1997 to 1/1/2002, plus "Total: PPM p k-0_5:ds 6".]
                                    -60%                                                                                              Total:PPM p k-0_5:ds 6
                                                                                                                                                                                                     -250%

                                                                standard deviation                                                                                                                                                    standard deviation


For the 7-day return, the returns are distributed as follows:
[Figure: return distribution for gbp_usd (left) and usd_chf (right), algorithm PPM p k-0_5:ds 6. x-axis: standard deviation (0 to 1, 1 to 2, ..., 6 to 7, more); y-axis: % of total return (gain & loss) for the range; one series per half-year period from 1/1/1997 to 1/1/2002, plus the total.]




The graphs above show that a significant share of the return is generated by the positive skewness. The appendix also shows that the
skew value is always positive and relatively high for short return periods (1-day return).

In addition, the algorithm “r g*count*sign ifsame” gives very good results for the gbp_usd 1-day return. Except for the range from 1
to 2 standard deviations, which is negative, all the other ranges are positive, with a very fat positive tail. Not surprisingly, the monthly
Sharpe ratio for this case is 4.19, which is very high.
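
The charts above break the total return down by standard-deviation range. The text does not spell out how the ranges are built, so the following is only a minimal sketch under stated assumptions: each return is assigned to a range by its absolute size in units of the sample standard deviation (measured from zero, not from the mean), and the signed returns in each range are summed and divided by the total return. The function name and the bucketing rule are hypothetical.

```python
from statistics import pstdev


def return_share_by_sigma_bucket(returns, n_buckets=7):
    """Share of the total return contributed by each |return| / stdev range.

    Ranges are "0 to 1", "1 to 2", ..., "6 to 7" and "more".
    Assumption: distance is measured from zero, in population-stdev units.
    """
    sigma = pstdev(returns)
    total = sum(returns)
    if sigma == 0 or total == 0:
        raise ValueError("need non-constant returns with a non-zero total")
    labels = [f"{i} to {i + 1}" for i in range(n_buckets)] + ["more"]
    sums = [0.0] * (n_buckets + 1)
    for r in returns:
        # Anything at or beyond n_buckets standard deviations goes to "more".
        idx = min(int(abs(r) / sigma), n_buckets)
        sums[idx] += r
    return {label: s / total for label, s in zip(labels, sums)}


shares = return_share_by_sigma_bucket([1.0, 1.0, -1.0, 4.0])
```

Because gains and losses are summed with their sign, a single range can contribute more than 100% or a negative share, which is consistent with the axis ranges of the charts.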




6.2.2 5 minutes data
For the 5-minute data, we found the following monthly Sharpe ratios. Due to the processing time required, only two scenarios were tested.
The Sharpe ratio was calculated on a per-period basis (20 hours or 3 hours) and adjusted to reflect a monthly duration (assuming 20 trades
per month for the 20-hour return and 100 trades per month for the 3-hour return).
Details of the returns are available in Appendix IV.
                           20 hours, VQ 8                3 hours, VQ 8
Algorithm                  eur_usd   usd_chf      both   gbp_usd   usd_chf      both
PPM g 12                    (4.13)    (4.68)    (4.56)   (14.38)     14.54      0.08
PPM g 4                     (2.33)    (2.65)    (2.56)   (16.80)      4.66    (6.16)
PPM g 8                     (3.44)    (4.00)    (3.83)   (14.19)     13.20    (0.50)
PPM p k0_0:ds 2            (10.46)   (11.86)   (11.40)    (6.03)      8.37      1.21
PPM p k-0_5:ds 12           (8.54)    (9.62)    (9.26)     14.70      8.38     11.59
PPM p k-0_5:ds 4           (10.68)   (11.92)   (11.53)      7.19     13.52     10.56
PPM p k-0_5:ds 8            (9.54)   (10.61)   (10.28)     12.57     10.43     11.61
r sign                        4.81    (0.56)      2.18     70.41     78.17     74.02
rg                          (4.96)    (2.91)    (4.19)     54.92     61.87     55.50
r g*count                     2.56      0.51      1.59   (18.17)      3.88    (7.41)
r g*count/stdev               2.80      0.39      1.65   (17.65)      3.84    (7.16)
r g*count*sign ifsame         2.64      0.24      1.48   (24.30)    (6.88)   (15.93)

Values in parentheses are negative.
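
The text states only the trade counts used for the monthly adjustment (20 trades per month for the 20-hour return, 100 for the 3-hour return), not the formula. The following is a minimal sketch assuming the common square-root-of-time rule, a zero risk-free rate and independent per-period returns. The function name `monthly_sharpe` is hypothetical.

```python
import math
from statistics import mean, stdev


def monthly_sharpe(period_returns, trades_per_month):
    """Per-period Sharpe ratio scaled to one month.

    Assumptions (not stated in the text): zero risk-free rate, independent
    returns, and scaling by the square root of the number of trades per month.
    """
    per_period = mean(period_returns) / stdev(period_returns)
    return per_period * math.sqrt(trades_per_month)


# Same per-period returns, scaled with the two trade counts quoted in the text.
example = [0.01, 0.03]
sharpe_20h = monthly_sharpe(example, trades_per_month=20)
sharpe_3h = monthly_sharpe(example, trades_per_month=100)
```

Under this rule, the same per-period ratio is multiplied by a factor sqrt(100 / 20), about 2.24 times larger for the 3-hour scenario than for the 20-hour one, so the two rows of results are not directly comparable period by period.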


The 20-hour period does not generate a good return. This is probably because the data covers only 2.5 years, which is probably not
enough for a 20-hour forecast.

The 3-hour period does generate an extraordinary return. The details in Appendix IV show that this return comes from
the first two weeks of September 2001 (September 11!).

A closer look confirms it:




[Figure: return distribution for eur_usd (left) and usd_chf (right), algorithm r sign. x-axis: standard deviation (0 to 1, 1 to 2, ..., 6 to 7, more); y-axis: % of total return (gain + loss) for the range; one series per half-month period from 8/1/2001 to 11/2/2001, plus the total.]


[Figure: return distribution for eur_usd (left) and usd_chf (right), algorithm PPM p k-1_0:ds 2. x-axis: standard deviation (0 to 1, 1 to 2, ..., 6 to 7, more); y-axis: % of total return (gain + loss) for the range; one series per half-month period from 8/1/2001 to 11/2/2001, plus the total.]




As you can see, most of the return comes from the first two weeks of September, and the returns are highly positively skewed.
Without the first two weeks of September, the returns remain positive but are no longer highly positively skewed: most of the return
falls within the first standard deviation.




7    Additional study
We could of course complete this study by trying different return periods and VQ sizes,
especially for the 5-minute data, where only 2 periods were tested, both of which are
probably too long (a 1-hour period would probably be much better).
It is also necessary to check the impact of transaction costs on the return.

But more interesting studies could be done on the following issues:
   1. Use more than 2 currencies. By covering all the high-volume currencies,
      we could capture all the important flows, which I believe will have high
      redundancy.
   2. Instead of using only currencies, we could add a combination of market indicators
      (S&P, CAC40, etc.).
   3. The “PPM g” algorithms are disappointing. But given the success of the “r
      sign” algorithm, we could try to combine the PPM smoothing approach with
      “r sign” and compute a multi-path-length directional return.
   4. The VQ LBG algorithm is known to be locally optimal (very good centroids) but
      not globally optimal (minimum error term). Other algorithms could be tested to
      check whether they improve the overall result.
   5. We could mix multiple periods in the path. For example, we can imagine building a
      path from a 2-week return followed by a 1-week return, followed by a 3-day return,
      followed by a 1-day return. This would allow us to capture longer-period patterns
      with the limited number of samples available.
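
The mixed-period path of item 5 can be sketched as follows. This is only an illustration under stated assumptions: the input is a list of daily log returns (so that a multi-day return is a plain sum), a week is 5 trading days, and the windows are contiguous and ordered from oldest to most recent. The function name `mixed_period_path` and its parameters are hypothetical.

```python
def mixed_period_path(daily_log_returns, t, windows=(10, 5, 3, 1)):
    """Aggregated returns for the windows ending just before index t.

    windows are in trading days, oldest first: 2 weeks, 1 week, 3 days, 1 day
    (assuming 5 trading days per week). Log returns are additive, so each
    aggregated return is the sum of the daily returns in its window.
    """
    span = sum(windows)
    if t < span or t > len(daily_log_returns):
        raise ValueError("not enough history before t")
    path = []
    start = t - span
    for w in windows:
        path.append(sum(daily_log_returns[start:start + w]))
        start += w
    return path


# Toy series 1, 2, ..., 19 so that the window sums are easy to check by hand.
toy = list(range(1, 20))
path = mixed_period_path(toy, t=19)
```

A path of 4 symbols then covers 19 trading days of history. Each element would still have to be quantized before being used as a path symbol, and since the windows have different scales a separate codebook per window length may be needed; that choice is left open here.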

8    Conclusion
The use of information theory to predict future returns seems promising. In this paper
we showed that, with daily FX returns, we have been able to generate returns with a high
Sharpe ratio (above 4). In some cases, over a 5-year period, only 2 half-years
generated a negative return.

We have also shown that the length of the path that generates high redundancy is
closely related to the amount of data available. In our example, the longest path was 16
periods for a VQ of size 2. This could reduce the complexity of the implementation, as a
path of 16 periods is relatively short and easy to handle.

In our experiment, we unfortunately used long return periods for the 5-minute data.
It is realistic to believe that a shorter period would generate a much higher return.

Overall the implementation was




9    References
Campbell R. Harvey, with Arman Glodjo, “Forecasting Foreign Exchange Market Returns via
Entropy Based Coding: The Framework”

http://faculty.fuqua.duke.edu/~charvey/Research/Working_Papers/W13_Forecasting_foreign_exchange.pdf


David J.C. MacKay, “Information Theory, Inference, and Learning Algorithms”

http://www.inference.phy.cam.ac.uk/mackay/itprnn/book.html


Suzanne Bunton, “On-Line Stochastic Processes in Data Compression”

ftp://ftp.cs.washington.edu/tr/1997/03/UW-CSE-97-03-02.PS.Z


Other online resources:

About compression, including a description of the PPM algorithm.
http://datacompression.info/index.shtml

Introduction to the theory of information and VQ (includes the LBG VQ).
http://datacompression.info/index.shtml




10 Appendix

Appendix I
10.1 LBG design
From: Nam Phamdo
Department of Electrical and Computer Engineering
State University of New York
Stony Brook, NY 11794-2350
phamdo@ieee.org


I. Introduction
  Vector quantization (VQ) is a lossy data compression method based on the principle of
block coding. It is a fixed-to-fixed length algorithm. In the earlier days, the design of a
vector quantizer (VQ) was considered to be a challenging problem due to the need for
multi-dimensional integration. In 1980, Linde, Buzo, and Gray (LBG) proposed a VQ
design algorithm based on a training sequence. The use of a training sequence bypasses
the need for multi-dimensional integration. A VQ that is designed using this algorithm
is referred to in the literature as an LBG-VQ.

II. Preliminaries
 A VQ is nothing more than an approximator. The idea is similar to that of ``rounding-
off'' (say to the nearest integer). An example of a 1-dimensional VQ is shown below:




Here, every number less than -2 is approximated by -3. Every number between -2 and 0
is approximated by -1. Every number between 0 and 2 is approximated by +1. Every
number greater than 2 is approximated by +3. Note that the approximate values are
uniquely represented by 2 bits. This is a 1-dimensional, 2-bit VQ. It has a rate of 2
bits/dimension.
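
The 1-dimensional example can be written down in a few lines of Python. This is only an illustrative sketch: the names CODEVECTORS, BOUNDARIES, encode, decode and quantize are hypothetical, and sending a value that lies exactly on a boundary to the lower region is an arbitrary tie-breaking choice.

```python
import math

# Codevectors and region boundaries of the 1-dimensional, 2-bit VQ described above.
CODEVECTORS = [-3.0, -1.0, 1.0, 3.0]
BOUNDARIES = [-2.0, 0.0, 2.0]


def encode(x):
    """Return the index (0..3, i.e. 2 bits) of the region containing x."""
    # Count the boundaries strictly below x; a value on a boundary
    # falls into the lower region (arbitrary tie-breaking rule).
    return sum(1 for b in BOUNDARIES if x > b)


def decode(index):
    """Return the codevector associated with a region index."""
    return CODEVECTORS[index]


def quantize(x):
    """Approximate x by the codevector of its region."""
    return decode(encode(x))


samples = [-2.7, -0.4, 0.9, 5.0]
approximations = [quantize(x) for x in samples]

# Rate in bits per dimension: log2(number of codevectors) / dimension.
rate = math.log2(len(CODEVECTORS)) / 1
```

With these sample values, approximations holds -3, -1, +1 and +3, and rate equals 2 bits/dimension.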

An example of a 2-dimensional VQ is shown below:








Here, every pair of numbers falling in a particular region is approximated by the red star
associated with that region. Note that there are 16 regions and 16 red stars, each of
which can be uniquely represented by 4 bits. Thus, this is a 2-dimensional, 4-bit VQ. Its
rate is also 2 bits/dimension.

In the above two examples, the red stars are called codevectors and the regions defined
by the blue borders are called encoding regions. The set of all codevectors is called the
codebook and the set of all encoding regions is called the partition of the space.

III. Design Problem
  The VQ design problem can be stated as follows. Given a vector source with its
statistical properties known, given a distortion measure, and given the number of
codevectors, find a codebook (the set of all red stars) and a partition (the set of blue lines)
which result in the smallest average distortion.

We assume that there is a training sequence consisting of $M$ source vectors:

$$T = \{x_1, x_2, \ldots, x_M\}.$$

This training sequence can be obtained from some large database. For example, if the
source is a speech signal, then the training sequence can be obtained by recording several
long telephone conversations. $M$ is assumed to be sufficiently large so that all the






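statistical properties of the source are captured by the training sequence. We assume that
the source vectors are $k$-dimensional, e.g.,

$$x_m = (x_{m,1}, x_{m,2}, \ldots, x_{m,k}), \quad m = 1, 2, \ldots, M.$$

Let $N$ be the number of codevectors and let

$$C = \{c_1, c_2, \ldots, c_N\}$$

represent the codebook. Each codevector is $k$-dimensional, e.g.,

$$c_n = (c_{n,1}, c_{n,2}, \ldots, c_{n,k}), \quad n = 1, 2, \ldots, N.$$

Let $S_n$ be the encoding region associated with codevector $c_n$ and let

$$P = \{S_1, S_2, \ldots, S_N\}$$

denote the partition of the space. If the source vector $x_m$ is in the encoding region $S_n$,
then its approximation (denoted by $Q(x_m)$) is $c_n$:

$$Q(x_m) = c_n \quad \text{if } x_m \in S_n.$$

Assuming a squared-error distortion measure, the average distortion is given by:

$$D_{ave} = \frac{1}{Mk} \sum_{m=1}^{M} \|x_m - Q(x_m)\|^2,$$

where $\|e\|^2 = e_1^2 + e_2^2 + \cdots + e_k^2$. The design problem can be succinctly stated as
follows: Given $T$ and $N$, find $C$ and $P$ such that $D_{ave}$ is minimized.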
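
As an illustration of the quantity being minimized, the following Python sketch evaluates the average squared-error distortion of a given codebook on a training sequence. The function names and the toy data are hypothetical, and each training vector is simply approximated by its closest codevector.

```python
def squared_error(u, v):
    """Squared Euclidean distance between two k-dimensional vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(x, codebook):
    """Q(x): the codevector closest to x in squared error."""
    return min(codebook, key=lambda c: squared_error(x, c))


def average_distortion(training, codebook):
    """Sum of squared errors divided by (M * k)."""
    M = len(training)      # number of training vectors
    k = len(training[0])   # dimension of each vector
    total = sum(squared_error(x, nearest(x, codebook)) for x in training)
    return total / (M * k)


# Toy data (hypothetical): four 2-dimensional vectors and two codevectors.
training = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0), (5.0, 5.0)]
codebook = [(0.5, 0.5), (4.5, 4.5)]
d_ave = average_distortion(training, codebook)
```

Here every training vector is at squared distance 0.5 from its codevector, so d_ave is 4 * 0.5 / (4 * 2) = 0.25.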

IV. Optimality Criteria
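 If $C$ and $P$ are a solution to the above minimization problem, then it must satisfy the
following two criteria.

         Nearest Neighbor Condition:

         $$S_n = \{x : \|x - c_n\|^2 \le \|x - c_{n'}\|^2 \quad \forall n' = 1, 2, \ldots, N\}$$

          This condition says that the encoding region $S_n$ should consist of all vectors that
          are closer to $c_n$ than any of the other codevectors. For those vectors lying on the
          boundary (blue lines), any tie-breaking procedure will do.

         Centroid Condition:

         $$c_n = \frac{\sum_{x_m \in S_n} x_m}{\sum_{x_m \in S_n} 1}, \quad n = 1, 2, \ldots, N$$
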


       This condition says that the codevector $c_n$ should be the average of all those training
       vectors that are in encoding region $S_n$. In implementation, one should ensure that
       at least one training vector belongs to each encoding region (so that the
       denominator in the above equation is never 0).




V. LBG Design Algorithm
 The LBG VQ design algorithm is an iterative algorithm which alternately solves the
above two optimality criteria. The algorithm requires an initial codebook $C^{(0)}$. This
initial codebook is obtained by the splitting method. In this method, an initial codevector
is set as the average of the entire training sequence. This codevector is then split into two.
The iterative algorithm is run with these two vectors as the initial codebook. The final
two codevectors are split into four and the process is repeated until the desired number
of codevectors is obtained. The algorithm is summarized below.

 LBG Design Algorithm

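   1. Given $T$. Fix $\epsilon > 0$ to be a ``small'' number.
   2. Let $N = 1$ and

       $$c_1^* = \frac{1}{M} \sum_{m=1}^{M} x_m.$$

       Calculate

       $$D_{ave}^* = \frac{1}{Mk} \sum_{m=1}^{M} \|x_m - c_1^*\|^2.$$

   3. Splitting: For $i = 1, 2, \ldots, N$, set

       $$c_i^{(0)} = (1 + \epsilon) c_i^*,$$
       $$c_{N+i}^{(0)} = (1 - \epsilon) c_i^*.$$

       Set $N = 2N$.

   4. Iteration: Let $D_{ave}^{(0)} = D_{ave}^*$. Set the iteration index $i = 0$.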




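          i.   For $m = 1, 2, \ldots, M$, find the minimum value of

               $$\|x_m - c_n^{(i)}\|^2$$

               over all $n = 1, 2, \ldots, N$. Let $n^*$ be the index which achieves the
               minimum. Set

               $$Q(x_m) = c_{n^*}^{(i)}.$$

         ii.   For $n = 1, 2, \ldots, N$, update the codevector

               $$c_n^{(i+1)} = \frac{\sum_{Q(x_m) = c_n^{(i)}} x_m}{\sum_{Q(x_m) = c_n^{(i)}} 1}.$$

        iii.   Set $i = i + 1$.
        iv.    Calculate

               $$D_{ave}^{(i)} = \frac{1}{Mk} \sum_{m=1}^{M} \|x_m - Q(x_m)\|^2.$$

         v.    If $(D_{ave}^{(i-1)} - D_{ave}^{(i)}) / D_{ave}^{(i-1)} > \epsilon$, go back to Step (i).

        vi.    Set $D_{ave}^* = D_{ave}^{(i)}$. For $n = 1, 2, \ldots, N$, set

               $$c_n^* = c_n^{(i)}$$

               as the final codevectors.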

   5. Repeat Steps 3 and 4 until the desired number of codevectors is obtained.
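
The splitting and iteration steps listed above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the target codebook size is assumed to be a power of two, an empty encoding region simply keeps its previous codevector, and the distortion is evaluated after the centroid update, which is a small simplification of the listing.

```python
def squared_error(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_vector(vectors):
    """Component-wise arithmetic mean of a non-empty list of vectors."""
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))


def average_distortion(training, codebook):
    total = sum(min(squared_error(x, c) for c in codebook) for x in training)
    return total / (len(training) * len(training[0]))


def lbg(training, target_size, eps=0.01):
    """Design a codebook with target_size codevectors (assumed a power of two)."""
    # Step 2: a single codevector, the average of the training sequence.
    codebook = [mean_vector(training)]
    d_star = average_distortion(training, codebook)
    while len(codebook) < target_size:
        # Step 3: split each codevector into a (1 + eps) and a (1 - eps) copy.
        codebook = ([tuple((1 + eps) * c for c in cv) for cv in codebook]
                    + [tuple((1 - eps) * c for c in cv) for cv in codebook])
        d_prev = d_star
        while True:
            # Step 4(i): assign every training vector to its nearest codevector.
            cells = [[] for _ in codebook]
            for x in training:
                n_star = min(range(len(codebook)),
                             key=lambda n: squared_error(x, codebook[n]))
                cells[n_star].append(x)
            # Step 4(ii): centroid update (assumption: empty cells are left unchanged).
            codebook = [mean_vector(cell) if cell else cv
                        for cell, cv in zip(cells, codebook)]
            # Steps 4(iv)-(v): stop when the relative improvement is at most eps.
            d_new = average_distortion(training, codebook)
            if d_prev == 0 or (d_prev - d_new) / d_prev <= eps:
                break
            d_prev = d_new
        d_star = d_new
    return codebook


# Toy 1-dimensional training sequence (hypothetical) with two obvious clusters.
training = [(0.0,), (1.0,), (10.0,), (11.0,)]
codebook = lbg(training, 2)
```

On this toy sequence the two codevectors converge to the cluster means 0.5 and 10.5. Note that multiplicative splitting leaves a codevector unchanged if it is exactly zero, so a real implementation would need to handle that case.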

VI. Performance
 The performance of VQ is typically given in terms of the signal-to-distortion ratio
(SDR):

$$SDR = 10 \log_{10} \frac{\sigma^2}{D_{ave}} \quad \text{(in dB)},$$





where $\sigma^2$ is the variance of the source and $D_{ave}$ is the average squared-error distortion.
The higher the SDR the better the performance. The following tables show the
performance of the LBG-VQ for the memoryless Gaussian source and the first-order
Gauss-Markov source with correlation coefficient 0.9. Comparisons are made with the
optimal performance theoretically attainable, SDRopt, which is obtained by evaluating the
rate-distortion function.


        Rate                               SDR (in dB)                            SDRopt
(bits/dimension)
         1           4.4    4.4     4.5    4.7     4.8    4.8     4.9      5.0       6.0
         2           9.3     9.6    9.9   10.2     10.3   ----    ----     ----      12.0
         3          14.6    15.3   15.7   ----     ----   ----    ----     ----      18.1
         4          20.2    21.1   ----    ----    ----   ----    ----     ----      24.1
         5          26.0    27.0  ----    ----   ----   ----      ----     ----      30.1
                               Memoryless Gaussian Source

        Rate                               SDR (in dB)                            SDRopt
(bits/dimension)
         1           4.4    7.8     9.4   10.2     10.7   11.0    11.4     11.6      13.2
         2           9.3    13.6   15.0   15.8     16.2   ----    ----     ----      19.3
         3          14.6    19.0   20.6    ----    ----   ----    ----     ----      25.3
         4          20.2    24.8   ----    ----    ----   ----    ----     ----      31.3
         5          26.0    30.7   ----    ----    ----   ----    ----     ----      37.3
                   First-Order Gauss-Markov Source with Correlation 0.9
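
The SDR figures can be related to the distortion with a short Python sketch. The function names are hypothetical. The second function assumes the standard rate-distortion function of a memoryless Gaussian source, D(R) = sigma^2 * 2^(-2R), which is not stated in the text above but reproduces the SDRopt column of the first table.

```python
import math


def sdr_db(source_variance, average_distortion):
    """Signal-to-distortion ratio in dB: 10 * log10(variance / distortion)."""
    return 10.0 * math.log10(source_variance / average_distortion)


def sdr_opt_memoryless_gaussian(rate, source_variance=1.0):
    """Best attainable SDR at `rate` bits/dimension (assumed Gaussian D(R))."""
    best_distortion = source_variance * 2.0 ** (-2.0 * rate)
    return sdr_db(source_variance, best_distortion)


# Hypothetical example: a unit-variance source quantized with distortion 0.1.
example_sdr = sdr_db(1.0, 0.1)

# SDRopt for rates 1..5 bits/dimension, rounded to one decimal place.
sdr_opt_column = [round(sdr_opt_memoryless_gaussian(r), 1) for r in range(1, 6)]
```

The resulting sdr_opt_column is 6.0, 12.0, 18.1, 24.1 and 30.1 dB, i.e. roughly 6.02 dB per bit, in line with the memoryless Gaussian table.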


VII. References
   1.    A. Gersho and R. M. Gray, Vector Quantization and Signal Compression.
   2.    H. Abut, Vector Quantization.
   3.    R. M. Gray, ``Vector Quantization,'' IEEE ASSP Magazine, pp. 4--29, April 1984.
   4.    Y. Linde, A. Buzo, and R. M. Gray, ``An Algorithm for Vector Quantizer
         Design,'' IEEE Transactions on Communications, pp. 702--710, January 1980.






Appendix II
10.2 Update the LBG Algorithm into a geometric world
To model financial information as changes in price, one needs to work with the geometric
mean instead of the arithmetic mean.

Below I have updated the LBG VQ algorithm described by Nam Phamdo to take into
consideration the use of the geometric mean.

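The training sequence:

       $T = \{x_1, x_2, \ldots, x_M\}$
       is now: $\{y_1, y_2, \ldots, y_M\}$

The codevectors:

       $C = \{c_1, c_2, \ldots, c_N\}$
       is now: $B = \{b_1, b_2, \ldots, b_N\}$

The distortion measure:

       $D_{ave} = \frac{1}{Mk} \sum_{m=1}^{M} \|x_m - Q(x_m)\|^2$
       is now: $D_{ave} = \frac{1}{Mk} \sum_{m=1}^{M} \|\ln(y_m) - \ln(Q(y_m))\|^2$

To obtain a distortion measure that is comparable in the geometric world, we
would use $e^{D_{ave}}$, but this last step is not strictly necessary as our goal is to minimize the
error. (Bringing $D_{ave}$ as close as possible to 0 is equivalent to bringing $e^{D_{ave}}$ as close as possible to 1.)

The Nearest Neighbor Condition:

       $S_n = \{x : \|x - c_n\|^2 \le \|x - c_{n'}\|^2 \quad \forall n' = 1, 2, \ldots, N\}$
       is now: $S_n = \{y : \|\ln(y) - \ln(b_n)\|^2 \le \|\ln(y) - \ln(b_{n'})\|^2 \quad \forall n' = 1, 2, \ldots, N\}$

The Centroid Condition:

       $c_n = \frac{\sum_{x_m \in S_n} x_m}{\sum_{x_m \in S_n} 1}$
       is now: $b_n = \exp\left( \frac{\sum_{y_m \in S_n} \ln(y_m)}{\sum_{y_m \in S_n} 1} \right)$

       This is the geometric mean, and it can be expressed as:

       $b_n = \left( \prod_{y_m \in S_n} y_m \right)^{1 / \sum_{y_m \in S_n} 1}$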
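
The geometric variant can be illustrated with a short Python sketch. It assumes that the training vectors contain strictly positive values such as price ratios (an assumption; the text only speaks of changes in price), and the function names are hypothetical.

```python
import math


def log_squared_error(u, v):
    """Squared distance between ln(u) and ln(v), component by component."""
    return sum((math.log(a) - math.log(b)) ** 2 for a, b in zip(u, v))


def nearest_index(y, codebook):
    """Nearest neighbor condition in the log domain."""
    return min(range(len(codebook)),
               key=lambda n: log_squared_error(y, codebook[n]))


def geometric_centroid(cell):
    """Centroid condition: exp of the mean of the logs (the geometric mean)."""
    count = len(cell)
    return tuple(math.exp(sum(math.log(c) for c in column) / count)
                 for column in zip(*cell))


# Hypothetical price ratios: a 10% rise followed by the exactly offsetting fall.
cell = [(1.10,), (1.0 / 1.10,)]
centroid = geometric_centroid(cell)

# Same centroid written as a product raised to the power 1/count.
product_form = (cell[0][0] * cell[1][0]) ** (1.0 / len(cell))

# The log-domain distance can pick a different codevector than the arithmetic one.
codebook = [(0.5,), (2.0,)]
chosen = nearest_index((1.2,), codebook)
```

The centroid of the two offsetting moves is 1 (no net change), whereas their arithmetic mean would be slightly above 1. In the last lines, 1.2 is arithmetically closer to 0.5 but geometrically closer to 2.0, so chosen is index 1.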







Appendix III
10.3 Redundancy result
                                  Daily data

                             Redundancy by return period
 VQ size   Path length   1 day               2 days              7 days
    2           2         5.8%                2.0%                0.0%
    2           3         5.9%                2.0%                0.1%
    2           4         6.0%                2.0%                0.1%
    2           5         6.1%                2.1%                0.2%
    2           6         6.2%                2.2%                0.5%
    2           7         6.4%                2.3%                0.8%
    2           8         6.6% 8 d.           2.6%                1.3%
    2           9         6.5%                3.0%                1.9%
    2          10         6.0%                3.2% 20 d.          2.5%
    2          11         4.8%                2.8%                2.5% 2.5 months
    2          12         3.6%                2.0%                2.1%
    2          13         2.4%                1.3%                1.6%
    2          14         1.5%                0.7%                1.3%

    4           2        17.1%               12.4%               10.3%
    4           3        17.3% 3 d.          12.6% 6 days        10.5% 3 weeks
    4           4        16.7%               12.4%               10.3%
    4           5        13.3%               10.2%                8.5%
    4           6         9.1%                6.7%                5.7%
    4           7         5.5%                3.7%                3.2%

    8           2        18.5% 2 d.          11.2% 4 days         6.7% 2 weeks
    8           3        16.2%               10.0%                6.6%
    8           4        10.5%                5.7%                4.0%
    8           5         5.0%                2.1%                1.6%
    8           6         2.0%                0.6%                0.9%
    8           7         0.7%                0.1%                0.6%








                              5 minutes data

                             Redundancy by return period
 VQ size   Path length   3 hours             20 hours
    2           2         1.1%                0.8%
    2           .           .                   .
    2           .           .                   .
    2          10         1.4%                2.5%
    2          11         1.5%                3.3%
    2          12         1.7%                4.2%
    2          13         1.9%                5.0%
    2          14         2.2%                5.4%
    2          15         2.5%                5.5% 12.5 days
    2          16         2.5% 2 days         5.5%
    2          17         2.3%                5.4%
    2          18         2.0%                5.3%
    2          19         1.7%                5.1%

    4           2        17.4%                8.6%
    4           3        17.5%                8.8%
    4           4        17.6% 12 hours       9.1%
    4           5        17.4%                9.4% 4.2 days
    4           6        16.0%                9.0%
    4           7        13.3%                7.9%

    8           2        17.7% 6 hours       10.4% 40 hours
    8           3        17.5%               10.1%
    8           4        15.4%                8.8%
    8           5        11.4%                6.8%
    8           6         6.9%                5.1%
    8           7         3.5%                4.1%








Based on the number of historical samples available, we can compute the average number
of samples available for each possible path, assuming the samples are spread evenly at random over all paths.

Highlighted in grey is the number of samples for the combination of path length and VQ size that generated the maximum redundancy.
We have 4000 samples for the daily data and 17000 for the 5-minute data.
               Daily data (4000 samples)                  5 minutes data (17000 samples)

                       size of VQ                                   size of VQ
 Path length        2        4        8       Path length        2        4        8
      2          1000      250       63            2          4250     1063      266
      3           500       63        8            3          2125      266       33
      4           250       16        1            4          1063       66        4
      5           125        4        0            5           531       17        1
      6            63        1        0            6           266        4        0
      7            31        0        0            7           133        1        0
      8            16        0        0            8            66        0        0
      9             8        0        0            9            33        0        0
     10             4        0        0           10            17        0        0
     11             2        0        0           11             8        0        0
     12             1        0        0           12             4        0        0
     13             0        0        0           13             2        0        0
     14             0        0        0           14             1        0        0
     15             0        0        0           15             1        0        0
     16             0        0        0           16             0        0        0
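
The figures in this table are consistent with dividing the number of samples by the number of possible paths, that is, the VQ size raised to the power of the path length. The Python sketch below reproduces them under that reading. The function name is hypothetical, and rounding halves upward is an assumption inferred from values such as 62.5 appearing as 63.

```python
import math


def average_samples_per_path(sample_count, vq_size, path_length):
    """Average samples per path if all vq_size**path_length paths are equally likely."""
    paths = vq_size ** path_length
    # Round half up (assumption inferred from the table, e.g. 62.5 -> 63).
    return math.floor(sample_count / paths + 0.5)


# Rebuild both halves of the table: keys are (VQ size, path length).
daily = {(v, n): average_samples_per_path(4000, v, n)
         for v in (2, 4, 8) for n in range(2, 17)}
five_minutes = {(v, n): average_samples_per_path(17000, v, n)
                for v in (2, 4, 8) for n in range(2, 17)}
```

For example, daily[(2, 2)] is 1000, daily[(8, 2)] is 63 and five_minutes[(4, 2)] is 1063, matching the corresponding cells.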








Appendix IV
10.4 Return result







           Daily data, 1-day return, VQ of size 4. Daily Sharpe ratio (return/stdev over the overall period)

Column key (columns are numbered 3 to 15 in the table below):
    3  PPM g 2               8  PPM p k-0_5:ds 4     13  r g*count
    4  PPM g 4               9  PPM p k-0_5:ds 6     14  r g*count/stdev
    5  PPM g 5 is not used; see column 5 below
stdev                    9.88E-07 1.46E-06 1.76E-06 1.81E-06 1.92E-06 2.43E-06 2.79E-06 2.06E-06 0.000579 4.05E-06 0.000433 0.105969 7.16E-05
     182.625 gbp_usd         1826
    1/1/1997 7/2/1997       0.086    0.091    0.093     0.070     0.062    0.057    0.054    0.054   0.065      0.082       0.068      0.065       0.083
    7/2/1997 1/1/1998       0.149    0.141    0.132     0.075     0.053    0.021    0.003    0.033   0.084      0.052       0.119      0.127       0.167
    1/1/1998 7/2/1998       0.092    0.091    0.089     0.027     0.019    0.003   (0.009)   0.011   0.100      0.162       0.031      0.024       0.141
    7/2/1998 1/1/1999       0.031    0.041    0.049     0.046     0.046    0.048    0.050    0.046   0.016      0.008       0.003      0.000     (0.040)
    1/1/1999 7/3/1999      (0.052)  (0.043)  (0.036)    0.023     0.027    0.027    0.023    0.030  (0.016)     0.003      (0.060)    (0.065)    (0.056)
    7/3/1999 1/1/2000       0.030    0.030    0.029     0.033     0.029    0.029    0.025    0.026  (0.008)     0.026       0.065      0.069       0.045
    1/1/2000 7/2/2000      (0.090)  (0.063)  (0.048)   (0.092)   (0.095)  (0.096)  (0.098)  (0.096)  0.058      0.033      (0.135)    (0.146)    (0.030)
    7/2/2000 1/1/2001       0.046    0.065    0.077     0.136     0.140    0.163    0.174    0.143   0.207      0.138       0.089      0.071       0.154
    1/1/2001 7/2/2001       0.025    0.015    0.010     0.007     0.015    0.020    0.028    0.023  (0.037)    (0.083)      0.046      0.054     (0.015)
    7/2/2001 1/1/2002      (0.053)  (0.043)  (0.037)   (0.009)   (0.010)  (0.002)   0.002   (0.011) (0.014)    (0.017)     (0.032)    (0.033)      0.014

    1/1/1997 1/1/1999    0.092    0.094    0.093    0.059    0.049    0.037    0.029    0.041    0.067    0.078    0.060                 0.059    0.089
    1/1/1997 1/1/2002    0.027    0.034    0.037    0.033    0.030    0.029    0.027    0.028    0.046    0.041    0.021                 0.019    0.047
skew                     1.076    1.221    1.325    1.621    1.672    1.898    1.986    1.691    0.938    5.620    0.906                0.888    2.454
stdev                 1.34E-06 1.99E-06 2.43E-06 3.49E-06 3.72E-06 4.71E-06 5.42E-06 3.99E-06 0.000701 4.95E-06 0.000626               0.11438 7.35E-05
     182.625 usd_chf       3652
    1/1/1997 7/2/1997    0.042    0.045    0.045   (0.003)  (0.009)  (0.012)  (0.014)  (0.014)  (0.058)  (0.010)   0.060                 0.064    (0.023)
    7/2/1997 1/1/1998   (0.058)  (0.069)  (0.076)  (0.084)  (0.090)  (0.100)  (0.103)  (0.095)  (0.012)  (0.020)  (0.034)               (0.037)   (0.019)
    1/1/1998 7/2/1998    0.028    0.033    0.035    0.040    0.034    0.030    0.026    0.028    0.011    0.033   (0.061)               (0.064)   (0.011)
    7/2/1998 1/1/1999    0.132    0.142    0.149    0.111    0.121    0.124    0.124    0.129    0.155    0.036    0.046                 0.051     0.104
    1/1/1999 7/3/1999   (0.038)  (0.028)  (0.021)   0.018    0.015    0.030    0.037    0.012    0.053    0.032   (0.045)               (0.052)   (0.026)
    7/3/1999 1/1/2000    0.013    0.015    0.017    0.037    0.044    0.059    0.069    0.050    0.030    0.063    0.026                 0.024     0.031
    1/1/2000 7/2/2000    0.096    0.101    0.103    0.110    0.105    0.101    0.097    0.100    0.080    0.046    0.054                 0.068     0.036
    7/2/2000 1/1/2001    0.081    0.088    0.091    0.096    0.098    0.109    0.114    0.098    0.084    0.111    0.115                 0.108     0.110
    1/1/2001 7/2/2001   (0.056)  (0.064)  (0.068)  (0.062)  (0.050)  (0.034)  (0.022)  (0.039)   0.001   (0.056)  (0.000)                0.017    (0.003)
    7/2/2001 1/1/2002   (0.096)  (0.093)  (0.090)  (0.079)  (0.077)  (0.064)  (0.052)  (0.075)  (0.013)  (0.052)  (0.094)               (0.095)   (0.072)

    1/1/1997 1/1/1998    (0.009)     (0.013)   (0.016)   (0.044)   (0.050)   (0.057)   (0.059)   (0.055)   (0.035)   (0.016)   0.012     0.013    (0.021)
    1/1/1997 1/1/2002     0.015       0.017     0.019     0.019     0.020     0.025     0.028     0.020     0.034     0.019    0.007     0.009     0.013
skew                      0.702       0.719     0.792     0.448     0.798     1.162     1.410     1.082     1.204     1.341    1.033     1.068     0.871
Both currencies (equally weighted)
    1/1/1997 1/1/2002     0.024      0.029     0.032     0.030     0.028     0.030     0.031     0.027     0.049     0.036     0.017     0.016     0.037
skew                      0.880      0.985     1.105     1.062     1.338     1.736     1.964     1.554     0.854     2.413     0.715     0.723     1.103
monthly Sharpe ratio      2.162      2.625     2.885     2.651     2.536     2.723     2.790     2.414     4.410     3.194     1.500     1.447     3.315







           Daily data, 2-day return, VQ of size 4. Daily Sharpe ratio (return/stdev over the overall period)

Column key (columns are numbered 3 to 15 in the table below):
    3  PPM g 2               8  PPM p k-0_5:ds 4     13  r g*count
    4  PPM g 4               9  PPM p k-0_5:ds 6     14  r g*count/stdev
    5  PPM g 6              10  PPM p k-1_0:ds 2     15  r g*count*sign ifsame
    6  PPM p k0_0:ds 2      11  r sign
    7  PPM p k-0_5:ds 2     12  r g

from        to                   3        4        5         6         7        8        9       10      11         12          13         14          15
stdev                    1.93E-06 2.72E-06 3.21E-06 3.78E-06 4.02E-06 5.28E-06 6.23E-06 4.3E-06 0.000894 7.06E-06 0.000736 0.121116 0.000141
     182.625 gbp_usd         1824
    1/1/1997 7/2/1997       0.007    0.005    0.002    (0.058)   (0.059)  (0.084)  (0.097)  (0.060) (0.017)    (0.028)      0.006      0.007       0.003
    7/2/1997 1/1/1998       0.096    0.096    0.094     0.030     0.025   (0.001)  (0.010)   0.020  (0.022)     0.029       0.056      0.046       0.022
    1/1/1998 7/2/1998       0.106    0.107    0.109     0.092     0.089    0.078    0.070    0.085   0.067      0.102       0.016      0.011       0.002
    7/2/1998 1/1/1999       0.048    0.057    0.063     0.086     0.096    0.119    0.132    0.103   0.076      0.120       0.092      0.090       0.122
    1/1/1999 7/3/1999      (0.028)  (0.034)  (0.037)    0.017     0.022    0.030    0.033    0.026  (0.056)    (0.043)     (0.079)    (0.078)    (0.086)
    7/3/1999 1/1/2000       0.195    0.198    0.197     0.171     0.162    0.136    0.120    0.153   0.089      0.159       0.183      0.174       0.120
    1/1/2000 7/2/2000       0.021    0.029    0.035     0.056     0.071    0.091    0.103    0.084   0.021      0.038       0.007      0.009       0.031
    7/2/2000 1/1/2001      (0.062)  (0.058)  (0.054)   (0.043)   (0.031)  (0.016)  (0.004)  (0.020) (0.057)    (0.065)     (0.139)    (0.142)    (0.156)
    1/1/2001 7/2/2001      (0.077)  (0.068)  (0.060)   (0.021)   (0.016)  (0.008)  (0.005)  (0.012)  0.099     (0.009)     (0.048)    (0.056)      0.023
    7/2/2001 1/1/2002      (0.023)  (0.015)  (0.010)    0.019     0.021    0.034    0.040    0.023  (0.038)    (0.031)     (0.022)    (0.020)    (0.045)

    1/1/1997 1/1/1999    0.068    0.070         0.071     0.044    0.044    0.035    0.030    0.044    0.029    0.061    0.048              0.044    0.040
    1/1/1997 1/1/2002    0.031    0.035         0.037     0.040    0.043    0.043    0.044    0.046    0.019    0.031    0.010              0.007    0.005
skew                     0.578    0.565         0.574    0.521    0.620    0.725    0.823    0.716     0.580   0.575     0.441             0.503    0.432
stdev                 1.99E-06 2.93E-06        3.6E-06 5.53E-06 5.77E-06 7.43E-06 8.67E-06 6.05E-06 0.000968 8.74E-06 0.000964            0.11583 9.39E-05
     182.625 usd_chf       3648
    1/1/1997 7/2/1997   (0.067)  (0.061)       (0.061)   (0.104)   (0.105)   (0.127)   (0.138)   (0.105)   (0.019)   (0.015)    (0.003)   (0.012)   (0.047)
    7/2/1997 1/1/1998   (0.030)  (0.029)       (0.029)   (0.037)   (0.034)   (0.034)   (0.033)   (0.031)   (0.078)   (0.029)    (0.037)   (0.043)   (0.054)
    1/1/1998 7/2/1998    0.005   (0.003)       (0.009)    0.028     0.033     0.034     0.035     0.037    (0.097)   (0.109)    (0.077)   (0.082)   (0.050)
    7/2/1998 1/1/1999    0.061    0.047         0.038     0.097     0.094     0.096     0.094     0.089    (0.002)    0.171      0.146     0.146     0.067
    1/1/1999 7/3/1999   (0.189)  (0.165)       (0.148)   (0.104)   (0.089)   (0.054)   (0.037)   (0.075)    0.035    (0.104)    (0.180)   (0.175)   (0.050)
    7/3/1999 1/1/2000    0.055    0.061         0.063     0.059     0.057     0.054     0.050     0.055     0.021     0.060      0.126     0.122     0.078
    1/1/2000 7/2/2000   (0.037)  (0.028)       (0.022)    0.059     0.066     0.079     0.082     0.072    (0.068)   (0.055)    (0.063)   (0.062)   (0.015)
    7/2/2000 1/1/2001   (0.100)  (0.102)       (0.103)   (0.104)   (0.103)   (0.090)   (0.083)   (0.102)   (0.136)   (0.096)    (0.130)   (0.138)   (0.112)
    1/1/2001 7/2/2001   (0.037)  (0.032)       (0.025)   (0.076)   (0.066)   (0.042)   (0.027)   (0.056)    0.159    (0.003)    (0.038)   (0.033)    0.035
    7/2/2001 1/1/2002   (0.040)  (0.033)       (0.030)   (0.045)   (0.046)   (0.049)   (0.050)   (0.046)   (0.074)    0.032     (0.022)   (0.023)    0.015

    1/1/1997 1/1/1998    (0.048)     (0.045)   (0.044)   (0.070)   (0.068)   (0.080)   (0.085)   (0.067)   (0.048)    (0.022)   (0.020)   (0.027)   (0.050)
    1/1/1997 1/1/2002    (0.038)     (0.035)   (0.033)   (0.023)   (0.019)   (0.013)   (0.011)   (0.016)   (0.027)    (0.016)   (0.028)   (0.030)   (0.014)
skew                      0.301       0.194     0.162     0.430     0.376     0.358     0.301     0.314     0.395    (0.210)     0.877     1.063     1.313
Both currencies (equally weighted)
    1/1/1997 1/1/2002    (0.004)     0.000     0.003     0.010     0.014     0.018     0.020     0.018     (0.005)    0.009     (0.011)   (0.014)   (0.005)
skew                      0.609      0.488     0.429     0.487     0.500     0.598     0.652     0.506      0.033    (0.124)     1.028     1.239     0.619
monthly Sharpe ratio     (0.367)     0.018     0.248     0.909     1.270     1.601     1.765     1.584     (0.442)    0.824     (0.964)   (1.247)   (0.474)







           Daily data, 7-day return, VQ of size 4. Daily Sharpe ratio (return/stdev over the overall period)

Column key (columns are numbered 3 to 15 in the table below):
    3  PPM g 2               8  PPM p k-0_5:ds 4     13  r g*count
    4  PPM g 4               9  PPM p k-0_5:ds 6     14  r g*count/stdev
    5  PPM g 6              10  PPM p k-1_0:ds 2     15  r g*count*sign ifsame
    6  PPM p k0_0:ds 2      11  r sign
    7  PPM p k-0_5:ds 2     12  r g

from        to                   3        4        5         6         7        8        9       10      11         12          13         14          15
stdev                    6.03E-06 9.14E-06 1.13E-05 1.33E-05 1.38E-05 1.83E-05 2.15E-05 1.44E-05 0.002292 3.17E-05 0.002644 0.217239 0.000627
     182.625 gbp_usd         1820
    1/1/1997 7/2/1997      (0.063)  (0.069)  (0.072)   (0.128)   (0.130)  (0.152)  (0.161)  (0.132) (0.104)    (0.092)     (0.090)    (0.081)    (0.192)
    7/2/1997 1/1/1998       0.095    0.108    0.107     0.222     0.216    0.174    0.148    0.210  (0.139)    (0.075)      0.027      0.033       0.008
    1/1/1998 7/2/1998       0.108    0.138    0.149     0.183     0.179    0.163    0.147    0.175   0.018      0.195       0.173      0.138       0.095
    7/2/1998 1/1/1999       0.062    0.082    0.096     0.137     0.140    0.159    0.168    0.142   0.079      0.168       0.184      0.161       0.175
    1/1/1999 7/3/1999      (0.152)  (0.128)  (0.110)   (0.062)   (0.049)  (0.019)  (0.006)  (0.036) (0.122)    (0.049)     (0.142)    (0.149)    (0.171)
    7/3/1999 1/1/2000      (0.069)  (0.066)  (0.064)    0.016     0.013    0.035    0.047    0.011   0.150      0.111       0.036      0.036       0.047
    1/1/2000 7/2/2000       0.052    0.085    0.104     0.128     0.144    0.135    0.129    0.157  (0.106)     0.036       0.046      0.042       0.000
    7/2/2000 1/1/2001      (0.115)  (0.105)  (0.103)   (0.107)   (0.098)  (0.108)  (0.114)  (0.090) (0.179)    (0.114)     (0.226)    (0.242)    (0.152)
    1/1/2001 7/2/2001      (0.112)  (0.107)  (0.094)    0.002     0.015    0.038    0.048    0.027  (0.088)    (0.010)     (0.043)    (0.069)    (0.092)
    7/2/2001 1/1/2002      (0.024)  (0.012)  (0.008)   (0.009)   (0.023)  (0.022)  (0.023)  (0.037)  0.083      0.074       0.078      0.068       0.094

    1/1/1997 1/1/1999       0.050    0.064    0.069    0.097    0.094    0.079    0.069    0.091    (0.035)   0.047    0.071    0.061                          0.019
    1/1/1997 1/1/2002      (0.021)  (0.007)   0.001    0.036    0.039    0.038    0.036    0.040    (0.040)   0.024    0.004   (0.006)                        (0.019)
skew                        0.491    0.581    0.603    0.695    0.767    0.496    0.347    0.824   (0.796)  (0.424)  (0.610)  (0.663)                        (0.318)
stdev                    6.54E-06 9.85E-06 1.24E-05 1.89E-05 1.95E-05 2.66E-05 3.18E-05 2.03E-05 0.002123 3.32E-05 0.003388 0.217726                        0.00062
     182.625 usd_chf          3640
    1/1/1997 7/2/1997      (0.380)  (0.305)  (0.262)  (0.258)  (0.251)  (0.223)  (0.209)  (0.244)   (0.145)  (0.227)  (0.110)  (0.114)                       (0.059)
    7/2/1997 1/1/1998      (0.044)  (0.023)  (0.011)   0.067    0.076    0.092    0.103    0.084    (0.070)   0.000   (0.007)  (0.002)                       (0.011)
    1/1/1998 7/2/1998      (0.053)  (0.036)  (0.030)  (0.026)  (0.016)  (0.006)  (0.006)  (0.007)    0.023   (0.006)  (0.058)  (0.053)                       (0.005)
    7/2/1998 1/1/1999       0.040    0.030    0.033    0.155    0.143    0.160    0.175    0.131     0.118   (0.089)  (0.022)  (0.020)                        0.005
    1/1/1999 7/3/1999      (0.152)  (0.084)  (0.049)  (0.126)  (0.116)  (0.073)  (0.054)  (0.107)   (0.049)  (0.078)  (0.082)  (0.091)                       (0.081)
    7/3/1999 1/1/2000      (0.030)  (0.010)   0.001    0.097    0.101    0.116    0.121    0.103     0.128    0.043    0.043    0.050                         0.059
    1/1/2000 7/2/2000       0.085    0.133    0.162    0.236    0.247    0.302    0.318    0.256     0.282    0.368    0.348    0.362                         0.297
    7/2/2000 1/1/2001      (0.068)  (0.047)  (0.038)  (0.038)  (0.036)  (0.029)  (0.027)  (0.034)   (0.029)   0.054    0.038    0.035                        (0.010)
    1/1/2001 7/2/2001       0.062    0.032    0.018   (0.056)  (0.050)  (0.043)  (0.041)  (0.044)    0.018   (0.046)   0.038    0.034                         0.023
    7/2/2001 1/1/2002      (0.210)  (0.191)  (0.183)  (0.147)  (0.161)  (0.168)  (0.170)  (0.173)   (0.059)  (0.075)  (0.146)  (0.153)                       (0.078)

    1/1/1997 1/1/1998     (0.213)    (0.165)    (0.138)    (0.096)    (0.088)    (0.066)   (0.054)    (0.081)   (0.108)     (0.114)    (0.060)    (0.059)    (0.036)
    1/1/1997 1/1/2002     (0.074)    (0.050)    (0.036)    (0.010)    (0.007)     0.013     0.021     (0.004)    0.022      (0.006)     0.004      0.005      0.015
skew                     (0.184)    (0.140)    (0.130)    (0.070)    (0.109)    (0.040)     0.039    (0.139)     0.281     (0.019)    (0.030)    (0.059)    (0.399)
Both currencies (equally weighted)
    1/1/1997 1/1/2002     (0.060)    (0.036)   (0.022)    0.016      0.019       0.030      0.033    0.021       (0.011)    0.012      0.005      (0.001)    (0.003)
skew                       0.093      0.056     0.039     0.061      0.050      (0.010)    (0.010)   0.042      (0.246)    (0.121)    (0.561)    (0.617)    (0.517)
monthly Sharpe ratio      (5.344)    (3.204)   (1.976)    1.388      1.666       2.647      2.967    1.913       (1.023)    1.057      0.473      (0.088)    (0.273)
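
The table captions describe each entry as a Sharpe ratio computed as return/stdev on the overall period. One possible reading of that description is sketched below in Python: the mean return of a sub-period is divided by the standard deviation of returns over the whole sample, and the "both currencies" line is formed by averaging the two return series. The function names, the use of the mean (rather than the sum) of returns, and the use of the population standard deviation are assumptions made for illustration only. The report does not state them, nor does it state the scaling behind the monthly Sharpe ratio row, so that row is not reproduced here.

```python
from statistics import mean, pstdev


def subperiod_sharpe(returns, start, end):
    """Mean return over returns[start:end] divided by the stdev of the whole series.

    Assumption: 'return/stdev on overall period' means the sub-period mean
    return scaled by the full-sample (population) standard deviation.
    """
    overall_sd = pstdev(returns)
    if overall_sd == 0:
        raise ValueError("overall standard deviation is zero")
    return mean(returns[start:end]) / overall_sd


def equally_weighted(series_a, series_b):
    """Average two return series element by element (hypothetical helper)."""
    return [(a + b) / 2 for a, b in zip(series_a, series_b)]


def skewness(returns):
    """Population skewness: third central moment divided by stdev cubed."""
    m = mean(returns)
    sd = pstdev(returns)
    if sd == 0:
        raise ValueError("standard deviation is zero")
    n = len(returns)
    return sum((r - m) ** 3 for r in returns) / (n * sd ** 3)
```

With two per-period strategy return lists, `equally_weighted(a, b)` would give the combined series, and `subperiod_sharpe(combined, i, j)` the entry for the sub-period covering indices `i` to `j - 1`.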







         5-minute data, 20-hour return, VQ of size 8. 20-hour Sharpe ratio (return/stdev over the overall period)
Columns: from, to, then (numbered as in the source):
   3 = PPM g 12;   4 = PPM g 2;   5 = PPM g 4;   6 = PPM g 8;
   7 = PPM p k0_0:ds 2;   8 = PPM p k-0_5:ds 12;   9 = PPM p k-0_5:ds 4;
  10 = PPM p k-0_5:ds 8;  11 = PPM p k-1_0:ds 2;
  12 = r sign;  13 = r g;  14 = r g*count;  15 = r g*count/stdev;  16 = r g*count*sign ifsame
stdev                       2.45E-06 2.05E-06 2.13E-06 2.29E-06 2.66E-06 4.63E-06 3.19E-06      4E-06 2.66E-06 0.000733 4.34E-06 0.026474 4.084671 0.004931
       30.5    eur_usd            129
   8/1/2001     9/1/2001      (0.398)  (0.353)  (0.381)  (0.398)   (0.469)   (0.296)  (0.419)  (0.342)  (0.465)  (0.231)    (0.260)     (0.262)    (0.258)    (0.322)
   9/1/2001    10/1/2001      (0.218)  (0.184)  (0.202)  (0.217)   (0.190)   (0.041)  (0.151)  (0.084)  (0.184)  (0.166)    (0.130)     (0.107)    (0.105)    (0.123)
  10/1/2001    11/1/2001       0.122    0.069    0.093    0.115     0.021    (0.050)  (0.008)  (0.038)   0.019    0.188      0.040       0.219      0.235       0.327
  11/1/2001    12/1/2001       0.024    0.153    0.113    0.058    (0.112)   (0.004)  (0.083)  (0.033)  (0.110)   0.235     (0.145)     (0.021)    (0.012)      0.019
  12/1/2001     1/1/2002      (0.048)  (0.052)  (0.052)  (0.050)    0.009    (0.126)  (0.031)  (0.089)   0.007    0.046      0.074       0.023      0.014     (0.018)
   1/1/2002    1/31/2002       0.247    0.209    0.233    0.249     0.083    (0.102)  (0.005)  (0.076)   0.077    0.279      0.227       0.382      0.362       0.313

    8/1/2001   1/31/2002      (0.046)   (0.016)  (0.026)  (0.038)  (0.117)  (0.095)  (0.119)  (0.107)  (0.117)  0.054    (0.055)   0.029   0.031                              0.030
skew                          (0.741)  (0.170)  (0.374)  (0.603)  (1.179)  (0.557)  (1.280)  (0.886)  (1.179)   0.191   (0.788)  (1.149)  (1.152)                            (1.841)
stdev                       2.15E-06 1.65E-06 1.77E-06 1.97E-06 2.51E-06 4.78E-06 3.13E-06 4.07E-06 2.51E-06 0.000969 4.87E-06 0.021877 3.414314                            0.00475
        30.5   usd_chf            258
    8/1/2001    9/1/2001      (0.385)   (0.307)  (0.345)  (0.375)  (0.455)  (0.320)  (0.420)  (0.358)  (0.453) (0.180)   (0.187)  (0.273) (0.278)                            (0.263)
    9/1/2001   10/1/2001      (0.184)   (0.177)  (0.184)  (0.188)  (0.235)  (0.068)  (0.191)  (0.115)  (0.231) (0.242)    0.033   (0.151) (0.150)                            (0.170)
   10/1/2001   11/1/2001       0.134     0.053    0.089    0.122    0.121    0.037    0.092    0.056    0.121   0.116    (0.022)   0.279   0.306                              0.299
   11/1/2001   12/1/2001       0.154     0.192    0.177    0.161   (0.127)   0.073   (0.057)   0.029   (0.123)  0.271     0.085    0.048   0.035                              0.082
   12/1/2001    1/1/2002      (0.116)   (0.007)  (0.048)  (0.093)  (0.032)  (0.250)  (0.115)  (0.203)  (0.038) (0.111)   (0.057)   0.003  (0.002)                            (0.077)
    1/1/2002   1/31/2002       0.051     0.097    0.087    0.068   (0.027)  (0.161)  (0.095)  (0.145)  (0.032)  0.104     0.003    0.157   0.137                              0.140

    8/1/2001 1/31/2002   (0.052)        (0.016)    (0.030)    (0.045)    (0.133)     (0.108)    (0.133)     (0.119)     (0.133)    (0.006)   (0.033)     0.006     0.004     0.003
skew                    (0.309)        (0.103)    (0.242)    (0.311)    (0.880)     (0.271)    (0.956)     (0.591)     (0.884)      0.848     0.858     (1.111)   (1.016)   (1.043)
Both currencies (equally weighted)
    8/1/2001 1/31/2002   (0.051)        (0.016)    (0.029)    (0.043)     (0.127)    (0.103)     (0.129)     (0.115)     (0.127)   0.024      (0.047)    0.018     0.018     0.017
skew                    (0.789)        (0.214)    (0.437)    (0.665)     (1.112)    (0.418)     (1.186)     (0.764)     (1.113)    0.557     (0.446)    (1.459)   (1.385)   (1.637)
monthly Sharpe ratio     (4.556)        (1.469)    (2.559)    (3.834)   (11.398)     (9.256)   (11.535)    (10.278)    (11.394)    2.179      (4.193)    1.590     1.647     1.478







           5-minute data, 3-hour return, VQ of size 8. 3-hour Sharpe ratio (return/stdev over the overall period)
Columns: from, to, then (numbered as in the source):
   3 = PPM g 12;   4 = PPM g 2;   5 = PPM g 4;   6 = PPM g 8;
   7 = PPM p k0_0:ds 2;   8 = PPM p k-0_5:ds 12;   9 = PPM p k-0_5:ds 4;
  10 = PPM p k-0_5:ds 8;  11 = PPM p k-1_0:ds 2;
  12 = r sign;  13 = r g;  14 = r g*count;  15 = r g*count/stdev;  16 = r g*count*sign ifsame
stdev                      2.14E-07 1.46E-07 1.66E-07 1.95E-07 2.81E-07 5.36E-07 3.52E-07 4.55E-07 2.79E-07 0.000305 1.48E-06 0.002171 0.876599 0.000194
       15.5 eur_usd              535
   8/1/2001 8/17/2001        (0.193)  (0.202)  (0.195)  (0.191)   (0.137)   (0.046)  (0.103)  (0.069)  (0.112) (0.078)    (0.036)     (0.277)    (0.280)    (0.313)
  8/17/2001   9/1/2001        0.131    0.098    0.119    0.129     0.068    (0.024)   0.029   (0.008)   0.063   0.116      0.032       0.186      0.183       0.158
   9/1/2001 9/17/2001        (0.013)  (0.038)  (0.025)  (0.016)    0.050     0.108    0.083    0.101    0.084   0.330      0.348       0.073      0.074       0.078
  9/17/2001 10/2/2001         0.031    0.036    0.038    0.035    (0.017)   (0.021)  (0.009)  (0.014)  (0.019)  0.098      0.021       0.045      0.052       0.050
  10/2/2001 10/18/2001        0.077    0.075    0.076    0.077     0.088     0.099    0.105    0.105    0.098   0.046      0.018       0.046      0.044       0.036
 10/18/2001 11/2/2001        (0.061)  (0.051)  (0.056)  (0.060)   (0.059)   (0.012)  (0.039)  (0.021)  (0.044) (0.030)    (0.022)     (0.143)    (0.141)    (0.106)

    8/1/2001 11/2/2001       (0.014)    (0.022)  (0.017)  (0.014)  (0.006)   0.015    0.007    0.013    0.007    0.070              0.055    (0.018)  (0.018)  (0.024)
skew                          1.959      1.302    1.719    1.942    1.853   0.765    1.594    1.087    1.768   11.909             19.887    (0.291)  (0.361)  (0.709)
stdev                      2.03E-07    1.2E-07 1.46E-07 1.81E-07 2.67E-07 5.67E-07 3.46E-07 4.68E-07 2.67E-07 0.000291            1.4E-06 0.002037 0.831939 0.000184
        15.5 usd_chf            1070
    8/1/2001 8/17/2001       (0.108)    (0.141)   (0.120)   (0.109)   (0.090)   (0.028)   (0.064)   (0.042)   (0.070)   (0.040)   (0.030)   (0.223)    (0.231)     (0.231)
   8/17/2001   9/1/2001       0.119      0.099     0.115     0.119     0.062    (0.029)    0.023    (0.013)    0.054     0.107     0.028     0.144      0.144       0.124
    9/1/2001 9/17/2001        0.005     (0.065)   (0.028)   (0.004)    0.014     0.075     0.044     0.063     0.055     0.302     0.378     0.100      0.101       0.091
   9/17/2001 10/2/2001        0.113      0.090     0.114     0.118     0.048    (0.022)    0.023    (0.007)    0.047     0.152     0.048     0.124      0.124       0.042
   10/2/2001 10/18/2001       0.126      0.094     0.107     0.119     0.137     0.109     0.146     0.128     0.145     0.062     0.022     0.080      0.080       0.091
  10/18/2001 11/2/2001       (0.090)    (0.079)   (0.082)   (0.086)   (0.099)   (0.053)   (0.079)   (0.061)   (0.087)   (0.044)   (0.034)   (0.152)    (0.147)     (0.095)

    8/1/2001 11/2/2001     0.015        (0.012)    0.005    0.013     0.008      0.008    0.014      0.010    0.021      0.078     0.062     0.004      0.004      (0.007)
skew                      1.666          0.940     1.383    1.600     0.782     (1.509)   0.079     (1.051)   0.757      9.323    18.807    (0.148)    (0.103)    (1.379)
Both currencies (equally weighted)
    8/1/2001 11/2/2001     0.000        (0.017)   (0.006)   (0.000)   0.001      0.012     0.011     0.012     0.015     0.074     0.056     (0.007)    (0.007)     (0.016)
skew                      2.092          1.283     1.769     2.032    1.528     (0.317)    0.991     0.111     1.441    11.151    19.482    (0.273)    (0.330)     (0.927)
monthly Sharpe ratio       0.082       (17.199)   (6.161)   (0.500)   1.206     11.588    10.556    11.611    14.575    74.016    55.502     (7.414)    (7.162)   (15.934)



