Backward Elimination procedure

The scenario:

An investor wishes to track the NASDAQ 100 (QQQQ) index by purchasing up to 11
stocks which he has already pre-selected. He is only willing to tolerate volatility of 2.5%.
What is the minimum number of the 11 stocks that he must purchase in order to meet his
volatility requirement?

Solution:

Using backward elimination, the error and adjusted R2 of different sets of stocks can
be compared. Collinearity matters here, not because the effects of the individual
independent variables are being explored, but because it can cause more variables to
be kept in the model.
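
The elimination loop itself can be sketched in a few lines. This is a minimal illustration, not the SAS implementation: it fits ordinary least squares by solving the normal equations, computes each remaining variable's partial F statistic, and drops the weakest variable until every survivor reaches a threshold. The function names, the F-to-stay threshold of 5.0 (used here in place of a significance level), and the toy data are all assumptions made for the example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def sse(columns, y):
    """Residual sum of squares from regressing y on an intercept plus columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def backward_eliminate(data, y, f_to_stay):
    """Drop the variable with the smallest partial F until all reach f_to_stay."""
    kept = list(data)
    while kept:
        full = sse([data[v] for v in kept], y)
        mse = full / (len(y) - len(kept) - 1)
        # Partial F: increase in SSE when one variable is left out, over the MSE.
        partial_f = {
            v: (sse([data[u] for u in kept if u != v], y) - full) / mse
            for v in kept
        }
        weakest = min(partial_f, key=partial_f.get)
        if partial_f[weakest] >= f_to_stay:
            break
        kept.remove(weakest)
    return kept


# Toy data (hypothetical, not the stock returns): y depends on x1 and x2 only.
data = {
    "x1": [-1, -1, -1, -1, 1, 1, 1, 1],
    "x2": [-1, -1, 1, 1, -1, -1, 1, 1],
    "x3": [-1, 1, -1, 1, -1, 1, -1, 1],
}
noise = [0.1, -0.1, -0.1, 0.1, -0.1, 0.1, 0.1, -0.1]
y = [2 * a + 3 * b + e for a, b, e in zip(data["x1"], data["x2"], noise)]

print(backward_eliminate(data, y, f_to_stay=5.0))
```

On this toy data the sketch drops x3 and keeps x1 and x2. With the stock data, each column would be one stock's return series and y the QQQQ return series.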

Results:

The procedure removed three variables: GM, HAL, and WMI (surprisingly, GM was one of
them). The output is attached at the end of this analysis. A correlation matrix was run
on the remaining eight variables to check for high collinearity.

For each stock, the first row gives the correlation coefficients and the second row the p-values.

        MMM       ASD       EBAY      GS        IBM       MER       MEL       YHOO

MMM     1.00000   0.24255   0.10982   0.32939   0.30720   0.38582   0.34506   0.19755
                  <.0001    0.0205    <.0001    <.0001    <.0001    <.0001    <.0001

ASD     0.24255   1.00000   0.15376   0.32312   0.20317   0.32785   0.34731   0.23693
        <.0001              0.0011    <.0001    <.0001    <.0001    <.0001    <.0001

EBAY    0.10982   0.15376   1.00000   0.27219   0.17819   0.24400   0.19815   0.42391
        0.0205    0.0011              <.0001    0.0002    <.0001    <.0001    <.0001

GS      0.32939   0.32312   0.27219   1.00000   0.32283   0.70268   0.45924   0.37330
        <.0001    <.0001    <.0001              <.0001    <.0001    <.0001    <.0001

IBM     0.30720   0.20317   0.17819   0.32283   1.00000   0.37625   0.36774   0.19470
        <.0001    <.0001    0.0002    <.0001              <.0001    <.0001    <.0001

MER     0.38582   0.32785   0.24400   0.70268   0.37625   1.00000   0.56383   0.31523
        <.0001    <.0001    <.0001    <.0001    <.0001              <.0001    <.0001

MEL     0.34506   0.34731   0.19815   0.45924   0.36774   0.56383   1.00000   0.27715
        <.0001    <.0001    <.0001    <.0001    <.0001    <.0001              <.0001

YHOO    0.19755   0.23693   0.42391   0.37330   0.19470   0.31523   0.27715   1.00000
        <.0001    <.0001    <.0001    <.0001    <.0001    <.0001    <.0001

MER is highly correlated with both GS (0.70268) and MEL (0.56383). All other
correlations are below 0.46. To address this, a second regression was run with MER
removed. In both models the C(p) statistic equals k+1; however, R2 fell from .6809 to .6645.
Following the parameters of the question strictly, i.e. choosing the minimum number of
stocks that meets the 2.5% volatility requirement, it seems that keeping MER is prudent.
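
The collinearity check can also be done mechanically rather than by eye. The sketch below scans the upper triangle of the correlation matrix and lists every pair whose coefficient exceeds a cutoff. The coefficients are copied from the matrix in this analysis; the cutoff of 0.5 and the function name strong_pairs are assumptions chosen for illustration, not part of the original analysis.

```python
tickers = ["MMM", "ASD", "EBAY", "GS", "IBM", "MER", "MEL", "YHOO"]

# Correlation coefficients copied from the matrix in this analysis.
corr = [
    [1.00000, 0.24255, 0.10982, 0.32939, 0.30720, 0.38582, 0.34506, 0.19755],
    [0.24255, 1.00000, 0.15376, 0.32312, 0.20317, 0.32785, 0.34731, 0.23693],
    [0.10982, 0.15376, 1.00000, 0.27219, 0.17819, 0.24400, 0.19815, 0.42391],
    [0.32939, 0.32312, 0.27219, 1.00000, 0.32283, 0.70268, 0.45924, 0.37330],
    [0.30720, 0.20317, 0.17819, 0.32283, 1.00000, 0.37625, 0.36774, 0.19470],
    [0.38582, 0.32785, 0.24400, 0.70268, 0.37625, 1.00000, 0.56383, 0.31523],
    [0.34506, 0.34731, 0.19815, 0.45924, 0.36774, 0.56383, 1.00000, 0.27715],
    [0.19755, 0.23693, 0.42391, 0.37330, 0.19470, 0.31523, 0.27715, 1.00000],
]


def strong_pairs(names, matrix, threshold):
    """Return (name_a, name_b, r) for each pair above the diagonal with |r| > threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(matrix[i][j]) > threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs


for a, b, r in strong_pairs(tickers, corr, threshold=0.5):
    print(f"{a}-{b}: {r:.5f}")
```

With a cutoff of 0.5 this reports only GS-MER and MER-MEL, the two pairs that involve MER.
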
                                     The SAS System                  15:25 Friday, May 26, 2006   1

                                  The REG Procedure
                                    Model: MODEL1
                            Dependent Variable: QQQQ QQQQ

                       Number of Observations Read             445
                       Number of Observations Used             445

                             Backward Elimination: Step 0


            All Variables Entered: R-Square = 0.6809 and C(p) = 12.0000


                                   Analysis of Variance

                                         Sum of             Mean
Source                       DF         Squares           Square      F Value       Pr > F

Model                        11         0.02327          0.00212         84.01      <.0001
Error                       433         0.01090       0.00002518
Corrected Total             444         0.03418


                       Parameter      Standard
         Variable       Estimate         Error    Type II SS   F Value     Pr > F

         Intercept   -0.00002310    0.00024104    2.31243E-7        0.01   0.9237
         MMM             0.08373       0.02508    0.00028074       11.15   0.0009
         ASD             0.04881       0.01608    0.00023202        9.21   0.0025
         EBAY            0.07495       0.01093       0.00118       47.02   <.0001
         GM              0.00535       0.00961    0.00000781        0.31   0.5780
         GS              0.08354       0.02857    0.00021541        8.55   0.0036
         IBM             0.20059       0.02677       0.00141       56.15   <.0001
         HAL             0.02016       0.01201    0.00007095        2.82   0.0940
         MER             0.11100       0.03332    0.00027947       11.10   0.0009
         MEL             0.09640       0.02677    0.00032671       12.97   0.0004
         WMI             0.05392       0.02518    0.00011545        4.58   0.0328
         YHOO            0.09883       0.01450       0.00117       46.48   <.0001
                              Bounds on condition number: 2.5188, 178.19
------------------------------------------------------------------------------------------------------
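
The F Value column in the Step 0 table can be reproduced from the other printed figures, which shows why GM is the first variable to go. For a single variable, the partial F statistic is its Type II sum of squares divided by the model's error mean square. The sketch below is a hand check on a few rows, using the values as printed (so results agree only to rounding); the variable names in the code are illustrative.

```python
mse_full = 0.00002518  # error mean square from the Step 0 analysis of variance

# Type II sums of squares as printed in the Step 0 table (selected rows).
type2_ss = {
    "MMM": 0.00028074,
    "GM": 0.00000781,
    "HAL": 0.00007095,
    "WMI": 0.00011545,
}

# Partial F for one variable = Type II SS / error mean square.
partial_f = {name: ss / mse_full for name, ss in type2_ss.items()}

for name, f in partial_f.items():
    print(f"{name}: F = {f:.2f}")

weakest = min(partial_f, key=partial_f.get)
print("smallest partial F among these rows:", weakest)
```

GM's partial F of about 0.31 is the smallest in the whole Step 0 table, so it is the variable removed at Step 1.
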

                                        Backward Elimination: Step 1


                        Variable GM Removed: R-Square = 0.6807 and C(p) = 10.3100


                                              Analysis of Variance

                                                    Sum of             Mean
           Source                       DF         Squares           Square      F Value       Pr > F

           Model                        10         0.02326         0.00233           92.52     <.0001
           Error                       434         0.01091      0.00002514
           Corrected Total             444         0.03418


                                  Parameter      Standard
                    Variable       Estimate         Error    Type II SS    F Value    Pr > F

                    Intercept   -0.00002725    0.00024073    3.220941E-7       0.01   0.9099
                    MMM             0.08380       0.02506     0.00028122      11.18   0.0009
                    ASD             0.04951       0.01602     0.00024010       9.55   0.0021
                    EBAY            0.07507       0.01092        0.00119      47.26   <.0001
                    GS              0.08246       0.02848     0.00021085       8.39   0.0040
                    IBM             0.20161       0.02668        0.00144      57.09   <.0001
                    HAL             0.02006       0.01200     0.00007023       2.79   0.0954
                    MER             0.11438       0.03274     0.00030687      12.20   0.0005
                    MEL             0.09657       0.02674     0.00032789      13.04   0.0003
                    WMI             0.05475       0.02512     0.00011943       4.75   0.0298
                    YHOO            0.09892       0.01448        0.00117      46.65   <.0001
                              Bounds on condition number: 2.4354, 149.65
------------------------------------------------------------------------------------------------------

                                       Backward Elimination: Step 2


                        Variable HAL Removed: R-Square = 0.6786 and C(p) = 11.0986


                                             Analysis of Variance

                                                   Sum of             Mean
           Source                      DF         Squares           Square      F Value       Pr > F

           Model                        9         0.02319          0.00258       102.07       <.0001
           Error                      435         0.01098       0.00002525
           Corrected Total            444         0.03418


                                 Parameter      Standard
                    Variable      Estimate         Error    Type II SS    F Value    Pr > F

                    Intercept   0.00000999    0.00024019    4.371028E-8       0.00   0.9668
                    MMM            0.08454       0.02511     0.00028632      11.34   0.0008
                    ASD            0.05250       0.01595     0.00027337      10.83   0.0011
                    EBAY           0.07567       0.01094        0.00121      47.87   <.0001
                    GS             0.08588       0.02846     0.00022989       9.11   0.0027
                    IBM            0.20004       0.02672        0.00141      56.04   <.0001
                    MER            0.11849       0.03272     0.00033119      13.12   0.0003
                    MEL            0.09767       0.02679     0.00033558      13.29   0.0003
                    WMI            0.05537       0.02517     0.00012219       4.84   0.0283
                    YHOO           0.09995       0.01450        0.00120      47.52   <.0001
                              Bounds on condition number: 2.4217, 124.27
------------------------------------------------------------------------------------------------------

                                       Backward Elimination: Step 3


                        Variable WMI Removed: R-Square = 0.6751 and C(p) = 13.9506


                                             Analysis of Variance

                                                   Sum of             Mean
           Source                      DF         Squares           Square      F Value       Pr > F

           Model                        8         0.02307          0.00288       113.23       <.0001
           Error                      436         0.01111       0.00002547
           Corrected Total            444         0.03418


                                 Parameter      Standard
                    Variable      Estimate         Error    Type II SS    F Value    Pr > F

                    Intercept   0.00003384    0.00024100    5.020536E-7       0.02   0.8884
                    MMM            0.08798       0.02517     0.00031124      12.22   0.0005
                    ASD            0.05546       0.01597     0.00030727      12.06   0.0006
                    EBAY           0.07768       0.01095        0.00128      50.36   <.0001
                    GS             0.08991       0.02853     0.00025300       9.93   0.0017
                    IBM            0.20236       0.02682        0.00145      56.94   <.0001
                    MER            0.12354       0.03278     0.00036178      14.20   0.0002
                    MEL            0.10614       0.02663     0.00040466      15.89   <.0001
                    YHOO           0.09958       0.01456        0.00119      46.76   <.0001
                              Bounds on condition number: 2.4098, 100.13
------------------------------------------------------------------------------------------------------


                All variables left in the model are significant at the 0.0250 level.



                                   Summary of Backward Elimination

         Variable                  Number      Partial         Model
 Step    Removed      Label        Vars In     R-Square       R-Square    C(p)     F Value    Pr > F

   1     GM           GM              10        0.0002        0.6807     10.3100       0.31   0.5780
   2     HAL          HAL              9        0.0021        0.6786     11.0986       2.79   0.0954
   3     WMI          WMI              8        0.0036        0.6751     13.9506       4.84   0.0283
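
The listing reports R-Square and C(p) at each step but not adjusted R2, which the solution refers to. Both can be recomputed from the printed figures. The sketch below uses the standard formulas, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1) and C(p) = SSE/MSE_full - n + 2(k + 1), where k is the number of stocks in the model. Inputs are the rounded values printed above, so the recomputed C(p) differs slightly from the listing; the function and variable names are illustrative.

```python
n = 445                # observations used
mse_full = 0.00002518  # error mean square of the 11-variable model (Step 0)

# (variables in model, R-Square, error sum of squares, printed C(p)) per step
steps = [
    (11, 0.6809, 0.01090, 12.0000),
    (10, 0.6807, 0.01091, 10.3100),
    (9, 0.6786, 0.01098, 11.0986),
    (8, 0.6751, 0.01111, 13.9506),
]


def adjusted_r2(r2, n, k):
    """Adjusted R-square for a model with k regressors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def mallows_cp(sse, mse_full, n, k):
    """Mallows' C(p) with p = k + 1 parameters (regressors plus intercept)."""
    return sse / mse_full - n + 2 * (k + 1)


results = []
for k, r2, sse, printed_cp in steps:
    adj = adjusted_r2(r2, n, k)
    cp = mallows_cp(sse, mse_full, n, k)
    results.append((k, adj, cp))
    print(f"k={k:2d}  adj R2={adj:.4f}  C(p)={cp:.2f}  (printed {printed_cp})")
```

On these rounded inputs, adjusted R2 is highest for the 10-variable model (about 0.673) and lowest for the 8-variable model (about 0.669), which matches the ordering of the Error mean squares in the listing.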