Statistical Arbitrage problem set 2

Aloke Mukherjee

           1. Linear models for share volume.

           a) Dates with anomalous volume

[Figure: CSCO share volume 2002-2005. y-axis: volume (shares), 0.00E+00 to 3.00E+08; x-axis: date, 20020102 to 20051130]


There are many peaks and troughs in this four-year sample of volume data, and the peaks
are much larger than the troughs. The largest volume day was October 8th, 2002, when
Cisco reached its lowest price ($8.60) of the sample period. More than 240 million shares
traded, more than seven standard deviations above the period's mean volume of
62 million. The lowest volume day was December 23rd, 2003, when only 7 million shares
traded, slightly more than two standard deviations below the mean.

What factors do high volume days have in common? Looking at high volume days in
2005 (see below) the best predictor for Cisco seems to be the release of an earnings
report. Comparing volume data with option expirations shows that expiries sometimes
coincide with high volume days but not in any consistent fashion. When the price makes
a big move it is often accompanied by large volume although there are many exceptions
and the relationship is not linear.

How about low volume days? Looking at the ten lowest volume days in the sample
reveals that most of them coincide with Christmas or American Thanksgiving. However,
looking at low volume days in 2005 (below) shows that there are many which cannot be
explained by holidays.
High volume days
(2005 days with volume greater than two standard deviations above the 2002-2005 mean)

Date       Possible Explanation
20050208   2q 2005 earnings
20050209   2q 2005 earnings
20050511   3q 2005 earnings
20050810   4q/FY2005 earnings
20050811   4q/FY2005 earnings
20051110   1q 2006 earnings
20051118   acquisition of Scientific-Atlanta announced, option expiry

Low volume days
(2005 days with volume more than one standard deviation below the 2002-2005 mean)

Date                Possible Explanation
20050520
20050527            Friday before Memorial Day weekend
20050602
20050610
20050614
20050620
20050701            Friday before July 4th weekend
20050718
20050727
20050803-08         Days preceding earnings report?
20050902            Friday before Labour Day weekend
20051007            Friday before Columbus Day weekend
20051014
20051017
20051114
20051125            Day after Thanksgiving (also a Friday)
20051222-23,27,29   Christmas
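
The two tables above come from a simple threshold rule on the volume series. A minimal MATLAB sketch of that rule follows, in the same language as the code at the end of this document. The `dates` and `vol` vectors hold made-up illustrative values, not the CSCO data, and all variable names are assumptions. In the tables the mean and standard deviation are taken over 2002-2005 and then applied to the 2005 days; here one short sample plays both roles.

```matlab
% Hypothetical sample: dates as yyyymmdd numbers and daily share volume.
dates = [20050103 20050104 20050105 20050106 20050107 20050110 20050111 20050112]';
vol   = [60e6 58e6 65e6 61e6 150e6 59e6 20e6 62e6]';

mu = mean(vol);          % sample mean volume
sd = std(vol);           % sample standard deviation (normalised by N-1)
z  = (vol - mu) / sd;    % distance from the mean in standard deviations

high_days = dates(z >  2);   % more than two standard deviations above the mean
low_days  = dates(z < -1);   % more than one standard deviation below the mean
```

With these toy numbers only 20050107 is flagged as a high volume day and only 20050111 as a low volume day.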
      b) Fit the series using AR(1), AR(2) and AR(3) models, and higher if needed.

To choose an appropriate AR model we look at three things: the magnitude of each
coefficient, its statistical significance, and the autocorrelation of the residuals. A
coefficient is accepted if it is large relative to the higher-lag coefficients and has a
t-statistic greater than two in absolute value. In addition, the residuals of the chosen
model should behave like white noise, in particular showing no significant autocorrelation.

 Coefficient   AR(1)      AR(2)      AR(3)      AR(4)      AR(5)      AR(10) (t-stat)    AR(1,5,10) (t-stat)
 Intercept     2.60E+07   2.36E+07   2.10E+07   1.87E+07   1.68E+07   1.30E+07 (5.25)    1.58E+07 (7.01)
 Φ1            0.5831     0.5273     0.5165     0.5038     0.491      0.4798 (15.20)     0.5142 (19.18)
 Φ2                       0.0947     0.0352     0.0308     0.0251     0.0231 (0.63)
 Φ3                                  0.1122     0.0541     0.0502     0.038 (1.17)
 Φ4                                             0.1116     0.0567     0.0466 (1.27)
 Φ5                                                        0.1082     0.0809 (2.27)      0.133 (4.72)
 Φ6                                                                   0.0255 (0.63)
 Φ7                                                                   -0.0137 (-0.36)
 Φ8                                                                   0.0074 (0.21)
 Φ9                                                                   0.0335 (0.91)
 Φ10                                                                  0.0701 (2.21)      0.0993 (3.66)

Φ1 is much larger than the coefficients that follow it. The intercept and the lower-lag
coefficients shrink as each term is added, which indicates that each added term has
some predictive power. The AR(1) residuals (below) show significant autocorrelation at
lags of five and ten days.
[Figure: correlogram for AR(1) residuals. y-axis: autocorrelation, -0.1 to 0.15; x-axis: lag, 1 to 10]



Fitting an AR(10) model confirms that, after the first lag, the five- and ten-day lag
coefficients are the largest, although they are still roughly six to seven times smaller
than the first-lag coefficient. The t-statistics computed in Excel for the AR(10) model
show that only the 1, 5 and 10 day lags are significantly different from zero. Based on
these observations we fit an AR(10) model with all but the 1, 5 and 10 day coefficients
forced to zero. This makes intuitive sense: lags of five and ten trading days correspond
to one and two weeks, which points to volume effects associated with the day of the
week. The residuals of this model fall within the confidence interval for zero
autocorrelation (see below).
[Figure: correlogram of AR(10) residuals - 1,5,10. y-axis: autocorrelation, -0.08 to 0.08; x-axis: lag, 1 to 10]
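
The restricted fit described above can be sketched as an ordinary least-squares regression on the three retained lags, followed by a residual correlogram. This is a minimal MATLAB sketch under stated assumptions, not the Excel calculation used for the table: the series `v` is synthetic with deterministic pseudo-noise, the "true" coefficients are made up, and the band of 2/sqrt(N) is the usual approximate 95% interval for zero autocorrelation, which may differ from the interval used in the figure.

```matlab
% Synthetic stand-in for the volume series (hypothetical, not CSCO volume).
n        = 600;
lags     = [1 5 10];              % lags kept in the restricted model
c_true   = 1.5;                   % made-up intercept
phi_true = [0.5 0.15 0.1];        % made-up coefficients on lags 1, 5, 10
e = sin((1:n)' .^ 2);             % deterministic pseudo-noise, reproducible
v = zeros(n, 1);
v(1:10) = c_true / (1 - sum(phi_true));   % start at the long-run mean
for k = 11:n
  v(k) = c_true + phi_true * v(k - lags) + e(k);
end

% Design matrix: a column of ones plus one column per retained lag.
rows = (max(lags) + 1:n)';
X = ones(numel(rows), 1);
for j = 1:numel(lags)
  X = [X, v(rows - lags(j))];
end
y = v(rows);

beta  = X \ y;            % least-squares estimates [intercept; phi1; phi5; phi10]
resid = y - X * beta;     % one-step-ahead prediction errors

% Sample autocorrelation of the residuals at lags 1..10.
r   = resid - mean(resid);
acf = zeros(10, 1);
for h = 1:10
  acf(h) = sum(r(1+h:end) .* r(1:end-h)) / sum(r .^ 2);
end
band = 2 / sqrt(numel(r));   % approximate 95% band for zero autocorrelation
```

Lags where `abs(acf)` exceeds `band` would suggest that the model is missing structure at that lag, which is how the five and ten day lags show up in the AR(1) residuals.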
   c) What is your predictor? What is the distribution of the residual?

Based on the above the predictor chosen was:

V_k = 1.58e7 + 0.5142 V_{k-1} + 0.1330 V_{k-5} + 0.0993 V_{k-10}
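
As a quick consistency check on these coefficients, a stationary AR model has long-run mean equal to the intercept divided by one minus the sum of the lag coefficients. The MATLAB sketch below evaluates that and a single one-step forecast. The coefficients are the ones quoted above; the three lagged volumes in `v_lag` are hypothetical values chosen only for illustration.

```matlab
c   = 1.58e7;                    % intercept of the predictor
phi = [0.5142 0.1330 0.0993];    % coefficients on lags 1, 5 and 10

% Long-run mean implied by the model: mu = c / (1 - sum(phi)).
mu = c / (1 - sum(phi));

% One-step forecast for hypothetical lagged volumes.
v_lag = [80e6 55e6 70e6];        % V_{k-1}, V_{k-5}, V_{k-10} (made up)
v_hat = c + phi * v_lag';
```

Here `mu` comes out near 62.3 million shares, in line with the mean volume of 62 million quoted in part a), and `v_hat` is about 71.2 million.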

The distribution of the residual is shown below.

[Figure: distribution of residuals (histogram). y-axis: count, 0 to 70; x-axis: residual, -1 to 1.5 (x 10^8)]
mean=-4.486772e-009 std=1.995927e+007 skewness=1.642015e+000 kurtosis=9.132899e+000 Nobs=998
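
The summary statistics in the figure title can be computed from central moments. The sketch below uses the population (biased) moment definitions, under which a normal distribution has kurtosis 3. The figure does not say which convention was used, so that choice is an assumption, and the vector `r` is a toy example rather than the actual residuals.

```matlab
% Toy residual vector with zero mean (hypothetical values).
r = [-2 -1 -1 0 0 0 0 1 3]';

d  = r - mean(r);        % centre the residuals
m2 = mean(d .^ 2);       % second central moment (population form)
m3 = mean(d .^ 3);       % third central moment
m4 = mean(d .^ 4);       % fourth central moment

skew = m3 / m2 ^ 1.5;    % 0 for a symmetric distribution
kurt = m4 / m2 ^ 2;      % 3 for a normal distribution (not excess kurtosis)
```

For this toy vector `skew` is 0.84375 and `kurt` is 3.515625: one large positive value is enough to produce both a right skew and a fat tail, the same pattern as in the volume residuals.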
   d) Try instead with log of volume.

Fitting AR models to the log series yields results similar to those described above. Here
the residuals are the difference between the log-volume and the predicted log-volume.
The residuals are better behaved: skewness falls from 1.64 to -0.07 and kurtosis from
9.13 to 5.36, although the remaining kurtosis is still considerable.

Predictor:
log(V_k) = 4.3404 + 0.5329 log(V_{k-1}) + 0.1327 log(V_{k-5}) + 0.0917 log(V_{k-10})

[Figure: distribution of residuals (histogram), log-volume model. y-axis: count, 0 to 60; x-axis: residual, -2 to 1.5]
mean=1.584126e-015 std=2.825476e-001 skewness=-6.802007e-002 kurtosis=5.359890e+000 Nobs=998
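
A forecast of log-volume has to be transformed back before it can be compared with volume itself. The sketch below does this for the long-run level implied by the log predictor. It assumes natural logarithms, which the size of the intercept suggests, and it uses the lognormal mean correction exp(s^2/2). That correction is exact only for normal residuals, so with the kurtosis reported above it should be read as a rough adjustment.

```matlab
c   = 4.3404;                    % intercept of the log-volume predictor
phi = [0.5329 0.1327 0.0917];    % coefficients on lags 1, 5 and 10
s   = 0.2825476;                 % residual standard deviation from the figure

mu_log = c / (1 - sum(phi));     % long-run mean of log-volume

v_median = exp(mu_log);              % back-transform, natural log assumed
v_mean   = exp(mu_log + s ^ 2 / 2);  % mean under a lognormal assumption
```

`v_median` is about 58 million shares and `v_mean` about 61 million. The plain back-transform sits below the arithmetic mean volume of 62 million, as expected for a right-skewed series.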
   2. Two-variable MA(1) process

γ_xy(0)/σ^2 =
Cov(X_n, X_n) = 1 + β_{11}^2 + β_{12}^2         Cov(X_n, Y_n) = β_{11} β_{21} + β_{12} β_{22}
Cov(Y_n, X_n) = Cov(X_n, Y_n)                   Cov(Y_n, Y_n) = 1 + β_{21}^2 + β_{22}^2

γ_xy(1)/σ^2 =
Cov(X_{n-1}, X_n) = β_{11} (1 + β_{12})         Cov(X_{n-1}, Y_n) = β_{21}
Cov(Y_{n-1}, X_n) = β_{12}                      Cov(Y_{n-1}, Y_n) = β_{21} (1 + β_{22})

γ_xy(-1) = γ_xy(1)^T since
Cov(X_{n+1}, X_n) = Cov(X_n, X_{n-1})
Cov(X_{n+1}, Y_n) = Cov(X_n, Y_{n-1})
Cov(Y_{n+1}, X_n) = Cov(Y_n, X_{n-1})

   3. Technical Trading Rules

The variable-length moving average (VMA) strategy applied to Cisco (CSCO) from
2003/05/23 to 2005/12/30 (658 trading days, ~2.6 years) generates a profit of $4.31.
The price was $15.69 on 2003/05/23 and $17.12 on 2005/12/30, so a buy-and-hold
strategy would have made only $1.43 over the same period.

[Figure: VMA strategy for CSCO - 2003/05/23 - 2005/12/30, s=1, l=200, b=0.01. Series shown: short-term avg, long-term avg, p&l, holdings. y-axis: $, 0 to 30; x-axis: day of strategy, 100 to 600]


   a) Handling dividends and splits
Dividends and splits cause a change in share price without an underlying change in the
value of a position. This means that post-event prices cannot be compared directly with
pre-event prices. The solution is to adjust the pre-event prices so that calculations with
the adjusted price series reflect the true return of the position. The strategy can then be
run on the adjusted price series.

The reference for the following is Yahoo’s discussion of its “adjusted close” data
(http://help.yahoo.com/help/us/fin/quote/quote-12.html):

For splits the prices before the split should be multiplied by the reciprocal of the split
ratio – e.g. a 3:2 split would mean multiplying pre-split prices by 2/3.

For dividends the prices before the ex-dividend date should be scaled down by the ratio
of the dividend to the last close before the ex-date, i.e. multiplied by
(1 - dividend / last close). Simply subtracting the dollar amount of the dividend from
earlier prices can result in negative historical prices.

(another useful page is this introduction to CRSP price data:
http://www.library.hbs.edu/helpsheets/wrdscrspstock.html)
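
The two adjustments can be sketched in a few lines of MATLAB. The price series, the split date and the dividend below are all hypothetical values, and the variable names are assumptions. The multipliers follow the rule described above: the reciprocal of the split ratio for splits, and one minus dividend over last close for dividends.

```matlab
% Hypothetical raw closing prices, oldest first.
close_px = [30 30.6 20.5 20.4 20.1 20.3]';

split_idx   = 3;      % first day trading at the post-split price
split_ratio = 3 / 2;  % 3:2 split
div_idx     = 5;      % ex-dividend date
div_amt     = 0.10;   % cash dividend per share

adj = close_px;

% Split: multiply every close before the split by the reciprocal of the ratio.
adj(1:split_idx-1) = adj(1:split_idx-1) / split_ratio;

% Dividend: scale every close before the ex-date by
% (1 - dividend / last close before the ex-date).
m = 1 - div_amt / close_px(div_idx - 1);
adj(1:div_idx-1) = adj(1:div_idx-1) * m;
```

Prices on and after the ex-date are left unchanged, and because both adjustments are multiplicative the adjusted series stays positive.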

     b) What value of bid-ask spread will cause your strategy to switch from profit to
        loss?

For this period any spread higher than roughly 180 basis points results in a loss. This
can be seen from the chart above: twelve shares were bought or sold in total, counting
the trade that closes out whatever is held at the end of the strategy. The average price
through the period was approximately $20, so the absolute value of all the trades is
roughly $240. The profit of $4.31 is about 1.8% of that amount, i.e. roughly 180 basis
points.
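
The arithmetic behind the 180 basis point figure, written out as a short MATLAB check. The inputs are the rounded numbers quoted in the text, so the result is approximate. The spread is treated as a cost proportional to the value of each trade, which is how `sprd` is computed in vma.m.

```matlab
profit    = 4.31;   % strategy profit quoted above, in dollars
n_shares  = 12;     % shares bought or sold, including the closing trade
avg_price = 20;     % approximate average price over the period

traded_value = n_shares * avg_price;            % roughly $240
breakeven_bp = 10000 * profit / traded_value;   % same formula as sprd in vma.m
```

`breakeven_bp` evaluates to about 179.6, which rounds to the 180 basis points stated above.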

Code

vma.m
function [pnl, y, h, sig, avgs, avgl, sprd] = vma(p, s, l, b);

%   function [pnl, y, h, sig, avgs, avgl, sprd] = vma(p, s, l, b);
%
%   Compute profit-and-loss for the Variable Length Moving
%   Average rule and strategy as described in stat arb HW 2:
%   http://www.math.nyu.edu/~almgren/statarb/hw2.pdf
%
%   We assume we start at zero and the strategy is not started
%   until the (l + 1)st day from the beginning of the price series.
%   The strategy basically amounts to being long one share when the
%   short-term average is above the long-term average and being
%   short one share when it is below.
%
%   inputs:
%    p - price series (should be adjusted for dividends and splits)
%    s - short period length
%    l - long period length
%    b - band parameter (debouncing the buy and sell triggers)
%
%   output:
%    pnl  - profit and loss
%    y    - running cash
%    h    - running holdings
%    sig  - signal value throughout the period
%    avgs - short-term moving average
%    avgl - long-term moving average
%    sprd - spread in basis points which makes strategy breakeven
%
%   2006 aloke mukherjee

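% rt will be positive when the short-term avg. goes above the
% long-term and negative when it goes below
% (mavg is a helper that is not listed here; from the indexing below
% it must return a moving average with the same length as p)
avgs = mavg(p, s); % short-term avg
avgl = mavg(p, l); % long-term avg
rt = log(avgs(l+1:end)./avgl(l+1:end));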

% create a vector of signals (same shape as rt)
%    1 - buy
%    0 - hold
%   -1 - sell
sig = zeros(size(rt));
sig(rt > b) = 1;
sig(rt < -b) = -1;

% encodes matrix from assignment describing change in holdings
% columns are current holdings (-1 0 1)
% rows are the current signal values (sell hold buy)
delta = [0 -1 -2;
         0  0  0;
         2  1  0];

lasth = 0; % current holding
lasty = 0; % running cash
cost = 0;  % absolute value of all trades made

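% preallocate running cash and holdings as row vectors
n = length(sig);
y = zeros(1, n);
h = zeros(1, n);

% iterate through signals updating holdings and cash
for i = 1:n
  d = delta(sig(i) + 2, lasth + 2); % trade implied by signal and current holding
  y(i) = lasty - p(i+l) * d;        % pay for shares bought, receive for shares sold
  lasty = y(i);
  h(i) = lasth + d;
  lasth = h(i);
  cost = cost + abs(d) * p(i+l);
end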

% add in cost of selling final holdings
cost = cost + abs(h(end)) * p(end);

% remove the initial part of the moving averages so that all
% the output arrays are the same length
avgs = avgs(l+1:end);
avgl = avgl(l+1:end);

pnl = y + h.*p(l+1:end)';
sprd = 10000 * pnl(end)/cost;
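
vma.m calls a helper `mavg` that is not listed. The sketch below shows one way such a helper could be written. It assumes a trailing simple moving average that returns a vector of the same length as its input, which is what the indexing in vma.m requires. It is a hypothetical stand-in, not the original helper.

```matlab
% Hypothetical stand-in for the unlisted mavg helper: trailing n-day simple
% moving average, same length as p. The first n-1 outputs average over an
% incomplete window padded with zeros; vma.m discards the first l outputs.
mavg = @(p, n) filter(ones(1, n) / n, 1, p);

p = [1 2 3 4 5 6]';   % toy price series
m = mavg(p, 3);       % 3-day trailing average
```

From the third element on, `m` holds the full-window averages 2, 3, 4, 5. With `n = 1` the helper returns the price series itself, matching the s=1 setting used in the chart. To be callable from vma.m the same one-line body would need to live in its own function file, mavg.m.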

						