Selection of predictor variables

Statement of problem
• A common problem is that there is a large set of candidate predictor variables.
• The goal is to choose a small subset from the larger set so that the resulting regression model is simple, yet has good predictive ability.

Example: Cement data
• Response y: heat evolved in calories during hardening of cement, on a per-gram basis
• Predictor x1: % of tricalcium aluminate
• Predictor x2: % of tricalcium silicate
• Predictor x3: % of tetracalcium alumino ferrite
• Predictor x4: % of dicalcium silicate

[Figure: scatterplot matrix of y, x1, x2, x3, x4]

Two basic methods of selecting predictors
• Stepwise regression: Enter and remove variables, in a stepwise manner, until there is no justifiable reason to enter or remove any more.
• Best subsets regression: Select the subset of variables that does the best at meeting some well-defined objective criterion.

Stepwise regression: the idea
• Start with no predictors in the model.
• At each step, enter or remove a variable based on partial F-tests.
• Stop when no more variables can be justifiably entered or removed.

Stepwise regression: the steps
• Specify an Alpha-to-Enter (here 0.15) and an Alpha-to-Remove (here 0.15).
• Start with no predictors in the model.
• Put the predictor with the smallest P-value, based on the partial F-statistic (equivalently, a t-statistic), in the model. If that P-value > 0.15, stop: none of the predictors have good predictive ability. Otherwise …
• Add the predictor with the smallest P-value (below 0.15) based on the partial F-statistic. If none of the remaining predictors yield P-values < 0.15, stop.
• If the P-value of any predictor already in the model now exceeds 0.15, remove that predictor.
• Continue the above two steps until no more predictors can be entered or removed.
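The entry/removal loop above can be sketched in code. The following is a minimal illustration of the idea, not Minitab's implementation: it uses NumPy least squares plus SciPy for the partial-F (t) p-values, and runs on synthetic data invented for the example.

```python
# Sketch of stepwise regression: forward entry with backward removal,
# Alpha-to-Enter = Alpha-to-Remove = 0.15.  Illustrative only.
import numpy as np
from scipy import stats

def pvalues(X, y):
    """Two-sided p-values for each coefficient in an OLS fit of y on X."""
    n, k = X.shape
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    mse = resid @ resid / (n - k)
    se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))
    return 2 * stats.t.sf(np.abs(beta / se), n - k)

def stepwise(X, y, alpha_enter=0.15, alpha_remove=0.15):
    n, m = X.shape
    selected = []                     # column indices currently in the model
    while True:
        # Entry step: candidate with the smallest partial-F (t) p-value.
        best_p, best_j = 1.0, None
        for j in set(range(m)) - set(selected):
            Xt = np.column_stack([np.ones(n), X[:, selected + [j]]])
            p = pvalues(Xt, y)[-1]    # p-value of the trial predictor
            if p < best_p:
                best_p, best_j = p, j
        if best_j is None or best_p >= alpha_enter:
            return selected           # nothing more can be entered: stop
        selected.append(best_j)
        # Removal step: drop predictors whose p-value now exceeds the cutoff.
        while len(selected) > 1:
            Xs = np.column_stack([np.ones(n), X[:, selected]])
            p = pvalues(Xs, y)[1:]    # skip the intercept
            worst = int(np.argmax(p))
            if p[worst] > alpha_remove:
                selected.pop(worst)
            else:
                break

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))          # four candidate predictors
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=50)
cols = stepwise(X, y)
# Columns 0 and 1 should be selected; a noise column may occasionally
# also enter, since Alpha-to-Enter = 0.15 is fairly permissive.
print(sorted(cols))
```

Note that, as on the slides, the procedure stops as soon as no candidate clears the Alpha-to-Enter cutoff.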
Stepwise Regression: y versus x1, x2, x3, x4

Alpha-to-Enter: 0.15  Alpha-to-Remove: 0.15
Response is y on 4 predictors, with N = 13

Step            1        2        3        4
Constant   117.57   103.10    71.65    52.58
x4         -0.738   -0.614   -0.237
T-Value     -4.77   -12.62    -1.37
P-Value     0.001    0.000    0.205
x1                    1.44     1.45     1.47
T-Value              10.40    12.41    12.10
P-Value              0.000    0.000    0.000
x2                            0.416    0.662
T-Value                        2.24    14.44
P-Value                       0.052    0.000
S            8.96     2.73     2.31     2.41
R-Sq        67.45    97.25    98.23    97.87
R-Sq(adj)   64.50    96.70    97.64    97.44
C-p         138.7      5.5      3.0      2.7

Drawbacks of stepwise regression
• The final model is not guaranteed to be optimal in any specified sense.
• The procedure yields a single final model, although in practice there are often several almost equally good models.

Best subsets regression
• If there are P−1 possible predictors, then there are 2^(P−1) possible regression models containing subsets of the predictors.
• For example, 10 predictors yield 2^10 = 1024 possible regression models.
• A best subsets algorithm determines the best subsets of each size, so that the choice of the final model can be made by the researcher.

What is used to judge "best"?
• R-square
• Adjusted R-square
• MSE (or S = square root of MSE)
• Mallows' Cp

R-square

  R² = SSR/SSTO = 1 − SSE/SSTO

Use the R-square values to find the point where adding more predictors is not worthwhile because it leads to a very small increase in R-square.

Adjusted R-square or MSE

  R²(adj) = 1 − ((n−1)/(n−p)) · (SSE/SSTO) = 1 − MSE/(SSTO/(n−1))

Adjusted R-square increases only if MSE decreases, so adjusted R-square and MSE provide equivalent information. Find a few subsets for which MSE is smallest (or adjusted R-square is largest), or so close to the smallest (largest) that adding more predictors is not worthwhile.
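For a handful of predictors, these criteria can be computed for every candidate subset by brute force, directly from the formulas above. A small sketch (NumPy only, synthetic data invented for the example):

```python
# Brute-force "best subsets": compute R^2, adjusted R^2, and S = sqrt(MSE)
# for every subset of predictors, using R^2 = 1 - SSE/SSTO and
# adjR^2 = 1 - ((n-1)/(n-p)) * SSE/SSTO, with p counting the intercept.
import itertools
import numpy as np

def fit_criteria(X, y):
    """Return (R2, adjusted R2, S) for an OLS fit with intercept."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    p = Xd.shape[1]                            # parameters, intercept included
    beta, _, _, _ = np.linalg.lstsq(Xd, y, rcond=None)
    sse = np.sum((y - Xd @ beta) ** 2)
    ssto = np.sum((y - y.mean()) ** 2)
    r2 = 1 - sse / ssto
    adj = 1 - (n - 1) / (n - p) * sse / ssto
    return r2, adj, np.sqrt(sse / (n - p))     # S = sqrt(MSE)

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))                   # three candidate predictors
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.8, size=30)

results = {}
for k in range(1, 4):
    for subset in itertools.combinations(range(3), k):
        results[subset] = fit_criteria(X[:, subset], y)

best = max(results, key=lambda s: results[s][1])   # largest adjusted R^2
print(best, results[best])
```

Running this shows the pattern described above: R-square never decreases as predictors are added, while adjusted R-square (equivalently, MSE) can get worse when a useless predictor enters.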
Mallows' Cp criterion

Mallows' Cp statistic:

  Cp = SSEp / MSE(X1, …, X(P−1)) − (n − 2p)

It is an estimator of the total standardized mean square error of prediction:

  Γp = (1/σ²) Σ(i=1..n) E[(Ŷip − E(Yi))²]

which equals:

  Γp = (1/σ²) [ Σ(i=1..n) (E(Ŷip) − E(Yi))² + Σ(i=1..n) Var(Ŷip) ]

Plots of Cp against p
• Models with little bias will tend to fall near the line Cp = p.
• Models with substantial bias will tend to fall considerably above the line Cp = p.
• Cp values below the line Cp = p are interpreted as showing no bias (being below the line due to sampling error).

Using the Cp criterion
• Subsets with small Cp values have a small total (standardized) mean square error of prediction.
• When the Cp value is also near p, the bias of the regression model is small.
• So, identify subsets of predictors for which:
  – the Cp value is small, and
  – the Cp value is near p (if possible)

Best Subsets Regression: y versus x1, x2, x3, x4

Response is y

Vars   R-Sq  R-Sq(adj)    C-p       S   Predictors
   1   67.5       64.5  138.7  8.9639   x4
   1   66.6       63.6  142.5  9.0771   x2
   2   97.9       97.4    2.7  2.4063   x1, x2
   2   97.2       96.7    5.5  2.7343   x1, x4
   3   98.2       97.6    3.0  2.3087   x1, x2, x4
   3   98.2       97.6    3.0  2.3121   x1, x2, x3
   4   98.2       97.4    5.0  2.4460   x1, x2, x3, x4

Example: Modeling PIQ

[Figure: scatterplot matrix of PIQ, MRI, Height, Weight]

Stepwise Regression: PIQ versus MRI, Height, Weight

Alpha-to-Enter: 0.15  Alpha-to-Remove: 0.15
Response is PIQ on 3 predictors, with N = 38

Step            1        2
Constant    4.652  111.276
MRI          1.18     2.06
T-Value      2.45     3.77
P-Value     0.019    0.001
Height              -2.73
T-Value             -2.75
P-Value             0.009
S            21.2     19.5
R-Sq        14.27    29.49
R-Sq(adj)   11.89    25.46
C-p           7.3      2.0

Best Subsets Regression: PIQ versus MRI, Height, Weight

Response is PIQ

Vars   R-Sq  R-Sq(adj)   C-p       S   Predictors
   1   14.3       11.9   7.3  21.212   MRI
   1    0.9        0.0  13.8  22.810   Height
   2   29.5       25.5   2.0  19.510   MRI, Height
   2   19.3       14.6   6.9  20.878   MRI, Weight
   3   29.5       23.3   4.0  19.794   MRI, Height, Weight

The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height

Predictor      Coef  SE Coef      T      P
Constant     111.28    55.87   1.99  0.054
MRI          2.0606   0.5466   3.77  0.001
Height      -2.7299   0.9932  -2.75  0.009

S = 19.51   R-Sq = 29.5%   R-Sq(adj) = 25.5%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       2   5572.7  2786.4  7.32  0.002
Error           35  13321.8   380.6
Total           37  18894.6

Source  DF  Seq SS
MRI      1  2697.1
Height   1  2875.6

Example: Modeling BP

[Figure: scatterplot matrix of BP, Age, Weight, BSA, Duration, Pulse, Stress]

Stepwise Regression: BP versus Age, Weight, BSA, Duration, Pulse, Stress

Alpha-to-Enter: 0.15  Alpha-to-Remove: 0.15
Response is BP on 6 predictors, with N = 20

Step            1        2        3
Constant    2.205  -16.579  -13.667
Weight      1.201    1.033    0.906
T-Value     12.92    33.15    18.49
P-Value     0.000    0.000    0.000
Age                  0.708    0.702
T-Value             13.23    15.96
P-Value              0.000    0.000
BSA                             4.6
T-Value                        3.04
P-Value                       0.008
S            1.74    0.533    0.437
R-Sq        90.26    99.14    99.45
R-Sq(adj)   89.72    99.04    99.35
C-p         312.8     15.1      6.4

Best Subsets Regression: BP versus Age, Weight, ...
Response is BP

Vars   R-Sq  R-Sq(adj)    C-p        S   Predictors
   1   90.3       89.7  312.8   1.7405   Weight
   1   75.0       73.6  829.1   2.7903   BSA
   2   99.1       99.0   15.1  0.53269   Age, Weight
   2   92.0       91.0  256.6   1.6246   Weight, BSA
   3   99.5       99.4    6.4  0.43705   Age, Weight, BSA
   3   99.2       99.1   14.1  0.52012
   4   99.5       99.4    6.4  0.42591
   4   99.5       99.4    7.1  0.43500
   5   99.6       99.4    7.0  0.42142
   5   99.5       99.4    7.7  0.43078
   6   99.6       99.4    7.0  0.40723   Age, Weight, BSA, Duration, Pulse, Stress

The regression equation is
BP = - 13.7 + 0.702 Age + 0.906 Weight + 4.63 BSA

Predictor       Coef   SE Coef      T      P
Constant     -13.667     2.647  -5.16  0.000
Age          0.70162   0.04396  15.96  0.000
Weight       0.90582   0.04899  18.49  0.000
BSA            4.627     1.521   3.04  0.008

S = 0.4370   R-Sq = 99.5%   R-Sq(adj) = 99.4%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       3  556.94  185.65  971.93  0.000
Error           16    3.06    0.19
Total           19  560.00

Source   DF  Seq SS
Age       1  243.27
Weight    1  311.91
BSA       1    1.77

Stepwise regression in Minitab
• Stat >> Regression >> Stepwise …
• Specify the response and all possible predictors.
• If desired, specify predictors that must be included in every model.
• Select OK. Results appear in the session window.

Best subsets regression in Minitab
• Stat >> Regression >> Best subsets …
• Specify the response and all possible predictors.
• If desired, specify predictors that must be included in every model.
• Select OK. Results appear in the session window.
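For readers without Minitab, a best-subsets run can be approximated in a few lines of Python. This is a rough sketch on synthetic data, not a reproduction of Minitab's algorithm; Cp is computed with the full-model MSE as the estimate of MSE(X1, …, X(P−1)), as in the formula Cp = SSEp / MSE(full) − (n − 2p).

```python
# Best subsets by enumeration: report R-Sq, R-Sq(adj), Cp, and S for the
# best model of each size, in the spirit of Minitab's output.
import itertools
import numpy as np

def ols_sse(X, y):
    """SSE and parameter count p (intercept included) for an OLS fit."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    beta, _, _, _ = np.linalg.lstsq(Xd, y, rcond=None)
    return np.sum((y - Xd @ beta) ** 2), Xd.shape[1]

rng = np.random.default_rng(2)
n, m = 40, 4                                   # synthetic data, 4 predictors
X = rng.normal(size=(n, m))
y = 1.5 * X[:, 0] - X[:, 2] + rng.normal(scale=0.7, size=n)

sse_full, p_full = ols_sse(X, y)
mse_full = sse_full / (n - p_full)             # MSE of the full model
ssto = np.sum((y - y.mean()) ** 2)

for k in range(1, m + 1):
    rows = []
    for subset in itertools.combinations(range(m), k):
        sse, p = ols_sse(X[:, subset], y)
        cp = sse / mse_full - (n - 2 * p)      # Mallows' Cp
        r2 = 1 - sse / ssto
        adj = 1 - (n - 1) / (n - p) * sse / ssto
        rows.append((r2, adj, cp, np.sqrt(sse / (n - p)), subset))
    r2, adj, cp, s, subset = max(rows)         # best R-Sq for this size
    print(f"{k}  R-Sq={100*r2:5.1f}  R-Sq(adj)={100*adj:5.1f}  "
          f"Cp={cp:6.1f}  S={s:.4f}  vars={subset}")
```

As a sanity check on the formula, the full model's Cp always equals its own p, since SSE(full)/MSE(full) = n − p.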