Selection of predictor variables

Statement of problem
• A common problem is that there is a large
  set of candidate predictor variables.
• The goal is to choose a small subset from the
  larger set so that the resulting regression
  model is simple, yet has good predictive
  ability.
       Example: Cement data
• Response y: heat evolved in calories during
  hardening of cement on a per gram basis
• Predictor x1: % of tricalcium aluminate
• Predictor x2: % of tricalcium silicate
• Predictor x3: % of tetracalcium
  aluminoferrite
• Predictor x4: % of dicalcium silicate
Example: Cement data

[Figure: scatterplot matrix of the response y and the predictors x1, x2, x3, and x4]
Two basic methods of selecting predictors
• Stepwise regression: enter and remove
  variables, in a stepwise manner, until there
  is no justifiable reason to enter or remove
  any more.
• Best subsets regression: select the subset
  of variables that does best at meeting
  some well-defined objective criterion.
   Stepwise regression: the idea
• Start with no predictors in the model.
• At each step, enter or remove a variable
  based on partial F-tests.
• Stop when no more variables can be
  justifiably entered or removed.
   Stepwise regression: the steps
• Specify an Alpha-to-Enter (0.15) and an
  Alpha-to-Remove (0.15).
• Start with no predictors in the model.
• Put the predictor with the smallest P-value
  based on the partial F-statistic (equivalently,
  a t-statistic) in the model. If that P-value
  > 0.15, then stop: none of the predictors
  has good predictive ability. Otherwise …
Stepwise regression: the steps (continued)
• Add the predictor with the smallest P-value
  (below 0.15) based on the partial F-statistic
  (a t-statistic) to the model. If none of the
  remaining predictors yields a P-value < 0.15,
  stop.
• If the P-value of any partial F-statistic in
  the model exceeds 0.15, remove the violating
  predictor.
• Continue the above two steps until no more
  predictors can be entered or removed (a code
  sketch of this loop follows below).
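A minimal sketch of this enter/remove loop in Python with pandas and statsmodels; the function name `stepwise` and the 0.15 defaults mirror the slides, but this is an illustrative reimplementation, not Minitab's exact routine. (For a single term, the partial F-test P-value equals the t-test P-value that OLS reports.)

```python
import pandas as pd
import statsmodels.api as sm

def stepwise(X, y, alpha_enter=0.15, alpha_remove=0.15):
    """X: DataFrame of candidate predictors; y: response.
    Enter/remove predictors until neither step changes the model."""
    selected = []
    while True:
        changed = False
        # Entry: among predictors not yet in the model, find the one with
        # the smallest P-value; enter it if that P-value is below alpha_enter.
        candidates = [c for c in X.columns if c not in selected]
        if candidates:
            pvals = {}
            for c in candidates:
                fit = sm.OLS(y, sm.add_constant(X[selected + [c]])).fit()
                pvals[c] = fit.pvalues[c]
            best = min(pvals, key=pvals.get)
            if pvals[best] < alpha_enter:
                selected.append(best)
                changed = True
        # Removal: refit with the current terms and drop the predictor with
        # the largest P-value if it exceeds alpha_remove.
        if selected:
            fit = sm.OLS(y, sm.add_constant(X[selected])).fit()
            pvals = fit.pvalues.drop("const")
            worst = pvals.idxmax()
            if pvals[worst] > alpha_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            return selected
```

For the cement data below, this would be called as `stepwise(df[["x1", "x2", "x3", "x4"]], df["y"])`, assuming a DataFrame `df` holding the data.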
Stepwise Regression: y versus x1, x2, x3, x4
  Alpha-to-Enter: 0.15 Alpha-to-Remove: 0.15
 Response is    y     on 4 predictors, with N =   13

    Step         1        2        3        4
Constant    117.57   103.10    71.65    52.58

x4          -0.738   -0.614   -0.237
T-Value      -4.77   -12.62    -1.37
P-Value      0.001    0.000    0.205

x1                     1.44     1.45     1.47
T-Value               10.40    12.41    12.10
P-Value               0.000    0.000    0.000

x2                             0.416    0.662
T-Value                         2.24    14.44
P-Value                        0.052    0.000

S             8.96     2.73     2.31     2.41
R-Sq         67.45    97.25    98.23    97.87
R-Sq(adj)    64.50    96.70    97.64    97.44
C-p          138.7      5.5      3.0      2.7
  Drawbacks of stepwise regression

• The final model is not guaranteed to be
  optimal in any specified sense.
• The procedure yields a single final model,
  although in practice there are often several
  almost equally good models.
       Best subsets regression
• If there are P-1 possible predictors, then
  there are 2^(P-1) possible regression models
  containing the predictors.
• For example, 10 predictors yields 2^10 = 1024
  possible regression models.
• A best subsets algorithm determines the
  best subsets of each size, so that the choice
  of the final model can be made by the
  researcher (a code sketch of this enumeration
  follows below).
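Exhaustive enumeration is easy to sketch in Python with itertools and statsmodels; the helper below is illustrative (its name and structure are assumptions, not from the slides) and collects the same criteria Minitab prints: R-Sq, R-Sq(adj), C-p, and S.

```python
from itertools import combinations
import statsmodels.api as sm

def best_subsets(X, y):
    """Fit every non-empty subset of X's columns; return per-model criteria."""
    n = len(y)
    mse_full = sm.OLS(y, sm.add_constant(X)).fit().mse_resid  # full-model MSE
    results = []
    for size in range(1, X.shape[1] + 1):
        for subset in combinations(X.columns, size):
            fit = sm.OLS(y, sm.add_constant(X[list(subset)])).fit()
            p = size + 1                           # parameters, incl. intercept
            cp = fit.ssr / mse_full - (n - 2 * p)  # Mallows' Cp (formula below)
            results.append(
                (subset, fit.rsquared, fit.rsquared_adj, cp, fit.mse_resid ** 0.5)
            )
    return results
```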
    What is used to judge “best”?
•   R-square
•   Adjusted R-square
•   MSE (or S = square root of MSE)
•   Mallows' Cp
R-square

$$ R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO} $$

Use the R-square values to find the point where
adding more predictors is not worthwhile because
it leads to a very small increase in R-square.
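As a quick numeric illustration, a minimal sketch of this computation with NumPy (array names `y` and `yhat` are assumed):

```python
import numpy as np

def r_squared(y, yhat):
    """R^2 = 1 - SSE/SSTO."""
    sse = np.sum((y - yhat) ** 2)           # error sum of squares
    ssto = np.sum((y - np.mean(y)) ** 2)    # total sum of squares
    return 1 - sse / ssto
```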
Adjusted R-square or MSE

$$ R_a^2 = 1 - \left(\frac{n-1}{n-p}\right)\frac{SSE}{SSTO} = 1 - \left(\frac{n-1}{SSTO}\right)MSE $$

Adjusted R-square increases only if MSE decreases,
so adjusted R-square and MSE provide equivalent
information.
Find a few subsets for which MSE is smallest (or
adjusted R-square is largest) or so close to the
smallest (largest) that adding more predictors is not
worthwhile.
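A companion sketch for the adjusted version, where p counts all parameters including the intercept (again, names are assumed):

```python
import numpy as np

def adjusted_r_squared(y, yhat, p):
    """R_a^2 = 1 - ((n-1)/(n-p)) * SSE/SSTO, with p parameters incl. intercept."""
    n = len(y)
    sse = np.sum((y - yhat) ** 2)
    ssto = np.sum((y - np.mean(y)) ** 2)
    return 1 - (n - 1) / (n - p) * sse / ssto
```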
Mallows' Cp criterion

Mallows' Cp statistic:

$$ C_p = \frac{SSE_p}{MSE(X_1, \ldots, X_{P-1})} - (n - 2p) $$

is an estimator of the total standardized mean
square error of prediction:

$$ \Gamma_p = \frac{1}{\sigma^2} \sum_{i=1}^{n} E\left[\hat{Y}_{ip} - E(Y_i)\right]^2 $$

which equals:

$$ \Gamma_p = \frac{1}{\sigma^2} \left[ \sum_{i=1}^{n} \left( E(\hat{Y}_{ip}) - E(Y_i) \right)^2 + \sum_{i=1}^{n} \operatorname{Var}(\hat{Y}_{ip}) \right] $$
         Plots of Cp against p
• Models with little bias will tend to fall near
  the line Cp = p.
• Models with substantial bias will tend to fall
  considerably above the line Cp = p.
• Cp values below the line Cp = p are
  interpreted as showing no bias (being below
  the line due to sampling error).
        Using the Cp criterion
• Subsets with small Cp values have a small
  total (standardized) mean square error of
  prediction.
• When the Cp value is also near p, the bias of
  the regression model is small.
• So, identify subsets of predictors for which:
  – the Cp value is small, and
  – the Cp value is near p (if possible).
  A sketch of the Cp-versus-p plot follows below.
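A sketch of that diagnostic plot with matplotlib, assuming `results` holds (subset, R-Sq, R-Sq(adj), Cp, S) tuples like those produced by the `best_subsets` sketch above:

```python
import matplotlib.pyplot as plt

ps = [len(subset) + 1 for subset, *_ in results]  # parameters, incl. intercept
cps = [cp for *_, cp, _ in results]

plt.scatter(ps, cps, label="candidate models")
plt.plot([min(ps), max(ps)], [min(ps), max(ps)], "k--", label="Cp = p")
plt.xlabel("p (number of parameters)")
plt.ylabel("Mallows' Cp")
plt.legend()
plt.show()
```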
Best Subsets Regression: y versus x1, x2, x3, x4

Response is y

                                                   x x x x
Vars   R-Sq     R-Sq(adj)       C-p         S      1 2 3 4

   1   67.5         64.5      138.7    8.9639           X
   1   66.6         63.6      142.5    9.0771        X
   2   97.9         97.4        2.7    2.4063      X X
   2   97.2         96.7        5.5    2.7343      X     X
   3   98.2         97.6        3.0    2.3087      X X   X
   3   98.2         97.6        3.0    2.3121      X X X
   4   98.2         97.4        5.0    2.4460      X X X X
Example: Modeling PIQ

[Figure: scatterplot matrix of PIQ, MRI, Height, and Weight]
Stepwise Regression: PIQ versus MRI, Height, Weight
  Alpha-to-Enter: 0.15 Alpha-to-Remove: 0.15
 Response is   PIQ    on 3 predictors, with N =    38

    Step          1         2
Constant      4.652   111.276

MRI           1.18      2.06
T-Value       2.45      3.77
P-Value      0.019     0.001

Height                 -2.73
T-Value                -2.75
P-Value                0.009

S              21.2     19.5
R-Sq          14.27    29.49
R-Sq(adj)     11.89    25.46
C-p             7.3      2.0
Best Subsets Regression: PIQ versus MRI, Height, Weight
Response is PIQ

                                                  H   W
                                                  e   e
                                                  i   i
                                                M g   g
                                                R h   h
Vars   R-Sq    R-Sq(adj)        C-p         S   I t   t

   1   14.3         11.9        7.3    21.212   X
   1    0.9          0.0       13.8    22.810     X
   2   29.5         25.5        2.0    19.510   X X
   2   19.3         14.6        6.9    20.878   X   X
   3   29.5         23.3        4.0    19.794   X X X
The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height

Predictor        Coef        SE Coef           T       P
Constant       111.28          55.87        1.99   0.054
MRI            2.0606         0.5466        3.77   0.001
Height        -2.7299         0.9932       -2.75   0.009

S = 19.51         R-Sq = 29.5%       R-Sq(adj) = 25.5%

Analysis of Variance

Source       DF       SS           MS        F        P
Regression    2     5572.7       2786.4     7.32    0.002
Error        35    13321.8        380.6
Total        37    18894.6

Source       DF        Seq SS
MRI           1        2697.1
Height        1        2875.6
Example: Modeling BP

[Figure: scatterplot matrix of BP, Age, Weight, BSA, Duration, Pulse, and Stress]
Stepwise Regression: BP versus Age, Weight, BSA, Duration, Pulse, Stress
  Alpha-to-Enter: 0.15 Alpha-to-Remove: 0.15
 Response is    BP    on 6 predictors, with N =    20

    Step         1         2         3
Constant     2.205   -16.579   -13.667

Weight       1.201    1.033     0.906
T-Value      12.92    33.15     18.49
P-Value      0.000    0.000     0.000

Age                   0.708     0.702
T-Value               13.23     15.96
P-Value               0.000     0.000

BSA                               4.6
T-Value                          3.04
P-Value                         0.008

S             1.74    0.533     0.437
R-Sq         90.26    99.14     99.45
R-Sq(adj)    89.72    99.04     99.35
C-p          312.8     15.1       6.4
Best Subsets Regression: BP versus Age, Weight, ...
Response is BP
                                                      D
                                                      u
                                                  W   r       S
                                                  e   a   P   t
                                                  i   t   u   r
                                                A g B i   l   e
                                                g h S o   s   s
Vars   R-Sq    R-Sq(adj)        C-p         S   e t A n   e   s

   1   90.3         89.7     312.8     1.7405    X
   1   75.0         73.6     829.1     2.7903         X
   2   99.1         99.0      15.1    0.53269   X X
   2   92.0         91.0     256.6     1.6246     X           X
   3   99.5         99.4       6.4    0.43705   X X   X
   3   99.2         99.1      14.1    0.52012   X X       X
   4   99.5         99.4       6.4    0.42591   X X   X X
   4   99.5         99.4       7.1    0.43500   X X   X     X
   5   99.6         99.4       7.0    0.42142   X X   X   X X
   5   99.5         99.4       7.7    0.43078   X X   X X X
   6   99.6         99.4       7.0    0.40723   X X   X X X X
The regression equation is
BP = - 13.7 + 0.702 Age + 0.906 Weight + 4.63 BSA

Predictor        Coef     SE Coef          T        P
Constant      -13.667       2.647      -5.16    0.000
Age           0.70162     0.04396      15.96    0.000
Weight        0.90582     0.04899      18.49    0.000
BSA             4.627       1.521       3.04    0.008
S = 0.4370      R-Sq = 99.5%     R-Sq(adj) = 99.4%

Analysis of Variance

Source       DF      SS        MS        F       P
Regression    3   556.94    185.65   971.93   0.000
Error        16     3.06      0.19
Total        19   560.00

Source       DF        Seq SS
Age           1        243.27
Weight        1        311.91
BSA           1          1.77
  Stepwise regression in Minitab
• Stat >> Regression >> Stepwise …
• Specify response and all possible predictors.
• If desired, specify predictors that must be
  included in every model.
• Select OK. Results appear in session
  window.
Best subsets regression in Minitab
• Stat >> Regression >> Best subsets …
• Specify response and all possible predictors.
• If desired, specify predictors that must be
  included in every model.
• Select OK. Results appear in session
  window.
