Dummy Variables in Regression
/****************************************
SAS Example -- Regression II
Dummy Variables
Polynomial Regression
Box-Cox Transformations
****************************************/
title;
options pageno=1;
libname labdata "c:\temp\labdata";
title "Descriptive Statistics for Each Age Group";
proc means data=labdata.werner;
class agegrp;
var age ht wt pill chol alb calc uric;
run;
Descriptive Statistics for Each Age Group
The MEANS Procedure
AGEGRP  N Obs  Variable   N         Mean     Std Dev      Minimum      Maximum
------------------------------------------------------------------------------
1          44  AGE       44   21.8181818   1.2440131   19.0000000   24.0000000
               HT        44   64.0000000   2.8119347   57.0000000   71.0000000
               WT        44  124.0454545  17.2504864   94.0000000  160.0000000
               PILL      44    1.5000000   0.5057805    1.0000000    2.0000000
               CHOL      43  218.4418605  33.8035658  155.0000000  290.0000000
               ALB       44    4.1363636   0.4160115    3.2000000    5.0000000
               CALC      43   10.0418605   0.4316262    9.2000000   10.8000000
               URIC      44    4.6931818   1.0404377    2.8000000    8.3000000
2          46  AGE       46   28.0434783   2.0758561   25.0000000   31.0000000
               HT        46   64.5000000   2.1265517   60.0000000   69.0000000
               WT        45  128.9555556  18.2233062   99.0000000  180.0000000
               PILL      46    1.5000000   0.5055250    1.0000000    2.0000000
               CHOL      46  227.3043478  48.0716454   50.0000000  330.0000000
               ALB       46    4.1456522   0.3998369    3.3000000    4.8000000
               CALC      46    9.9086957   0.4140993    9.1000000   11.1000000
               URIC      46    4.5847826   1.1356775    2.4000000    8.4000000
3          50  AGE       50   36.2400000   3.2486104   32.0000000   41.0000000
               HT        49   64.9183673   2.4565960   60.0000000   71.0000000
               WT        50  135.2600000  23.4745281  100.0000000  215.0000000
               PILL      50    1.5000000   0.5050763    1.0000000    2.0000000
               CHOL      50  235.6200000  42.9940148  160.0000000  324.0000000
               ALB       49    4.0816327   0.3276644    3.2000000    4.7000000
               CALC      50    9.9160000   0.5497161    9.0000000   11.1000000
               URIC      50    4.6460000   1.2162522    2.2000000    9.9000000
4          48  AGE       48   47.8333333   4.0070859   42.0000000   55.0000000
               HT        47   64.5744681   2.5086345   59.0000000   69.0000000
               WT        47  137.5957447  20.5232208   94.0000000  190.0000000
               PILL      48    1.5000000   0.5052912    1.0000000    2.0000000
               CHOL      48  257.1666667  43.4728251  160.0000000  390.0000000
               ALB       47    4.0851064   0.2858844    3.5000000    4.7000000
               CALC      46    9.9913043   0.5036869    8.6000000   10.8000000
               URIC      47    5.1574468   1.1642771    2.5000000    8.5000000
------------------------------------------------------------------------------
proc sort data=labdata.werner;
by age;
run;
title "Boxplot of Cholesterol By Age";
proc boxplot data=labdata.werner;
plot chol*age / boxstyle=schematic;
run;
[Figure: schematic boxplots of CHOL (vertical axis, 0 to 400) for each observed AGE value from 19 to 55 (horizontal axis)]
proc sort data=labdata.werner;
by agegrp;
run;
title "Boxplot of Cholesterol By Age Group";
proc boxplot data=labdata.werner;
plot chol*agegrp / boxstyle=schematic;
run;
[Figure: schematic boxplots of CHOL (vertical axis, 0 to 400) for AGEGRP 1 to 4 (horizontal axis)]
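/*
The model fitted next uses AGEDUM2, AGEDUM3, and AGEDUM4, but the step that
creates them is not shown here. A minimal sketch, assuming they are 0/1
indicators for AGEGRP 2, 3, and 4 with AGEGRP 1 as the reference group
(consistent with the model intercept matching the AGEGRP 1 mean of CHOL).
WERNER_DUM is a hypothetical WORK data set name, not part of the original program.
*/
data werner_dum;
  set labdata.werner;
  /* leave the dummies missing when AGEGRP is missing */
  if not missing(agegrp) then do;
    agedum2 = (agegrp = 2);
    agedum3 = (agegrp = 3);
    agedum4 = (agegrp = 4);
  end;
run;

/* check the coding: each AGEGRP level should map to exactly one dummy pattern */
title "Check Dummy Variable Coding";
proc freq data=werner_dum;
  tables agegrp*agedum2*agedum3*agedum4 / list missing;
run;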
title "Regression With Dummy Variables for Age";
proc reg data=labdata.werner;
model chol = agedum2 agedum3 agedum4;
plot rstudent.*predicted.;
output out=regdat1 p=predict r=resid rstudent=rstudent;
run; quit;
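/*
Assuming 0/1 dummies with AGEGRP 1 as the reference group, the intercept of
this model is the AGEGRP 1 mean of CHOL, and each dummy coefficient is that
group's mean minus the AGEGRP 1 mean. A sketch that computes these expected
estimates from the CHOL means in the PROC MEANS output above. The names B0,
B2, B3, and B4 are illustrative only.
*/
data _null_;
  b0 = 218.4418605;        /* mean CHOL, AGEGRP 1 (expected intercept)   */
  b2 = 227.3043478 - b0;   /* AGEGRP 2 mean minus AGEGRP 1 mean          */
  b3 = 235.6200000 - b0;   /* AGEGRP 3 mean minus AGEGRP 1 mean          */
  b4 = 257.1666667 - b0;   /* AGEGRP 4 mean minus AGEGRP 1 mean          */
  put b0= b2= b3= b4=;     /* written to the SAS log                     */
run;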
Regression With Dummy Variables for Age
The REG Procedure
Model: MODEL1
Dependent Variable: CHOL
Number of Observations Read 188
Number of Observations Used 187
Number of Observations with Missing Values 1
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 3 38114 12705 7.02 0.0002
Error 183 331383 1810.83492
Corrected Total 186 369497
Root MSE 42.55391 R-Square 0.1032
Dependent Mean 235.15508 Adj R-Sq 0.0884
Coeff Var 18.09610
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 218.44186 6.48941 33.66 <.0001
[...]
Source DF Sum of Squares Mean Square F Value Pr > F
Model 3 38114 12705 7.02 0.0002
Error 183 331383 1810.83492
Corrected Total 186 369497
Root MSE 42.55391 R-Square 0.1032
Dependent Mean 235.15508 Adj R-Sq 0.0884
Coeff Var 18.09610
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 257.16667 6.14213 41.87 <.0001
[...]
Source DF Sum of Squares Mean Square F Value Pr > F
Model 3 38113.7122 12704.5707 7.02 0.0002
Error 183 331382.7904 1810.8349
Corrected Total 186 369496.5027
R-Square Coeff Var Root MSE CHOL Mean
0.103150 18.09610 42.55391 235.1551
Source DF Type I SS Mean Square F Value Pr > F
AGEGRP 3 38113.71223 12704.57074 7.02 0.0002
Source DF Type III SS Mean Square F Value Pr > F
AGEGRP 3 38113.71223 12704.57074 7.02 0.0002
Standard
Parameter Estimate Error t Value Pr > |t|
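Intercept 257.1666667 B 6.14212728 41.87 <.0001
[...]
Source DF Sum of Squares Mean Square F Value Pr > F
Model 2 50733 25366 14.64 <.0001
[...]
Variable DF Parameter Estimate Standard Error t Value Pr > |t|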
Intercept 1 233.52222 4.48182 52.10 <.0001
centage 1 1.56664 0.33162 4.72 <.0001
centage_sq 1 0.01481 0.03251 0.46 0.6492
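/*
The quadratic model above uses CENTAGE and CENTAGE_SQ, whose construction is
not shown here. A minimal sketch, assuming CENTAGE is AGE minus its sample
mean and CENTAGE_SQ is its square. Centering typically reduces the
correlation between the linear and quadratic terms. AGEMEAN, MEANAGE, and
WERNER2_SKETCH are hypothetical names, not part of the original program.
*/
proc means data=labdata.werner noprint;
  var age;
  output out=agemean mean=meanage;
run;

data werner2_sketch;
  /* read the one-row mean once; variables read by SET are retained */
  if _n_ = 1 then set agemean(keep=meanage);
  set labdata.werner;
  centage    = age - meanage;
  centage_sq = centage**2;
run;

title "Polynomial Regression on Centered Age (sketch)";
proc reg data=werner2_sketch;
  model chol = centage centage_sq;
run; quit;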
title "Check Possible Box-Cox Transformations";
proc transreg data=werner2;
model boxcox(chol) = identity(age);
run;
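/*
The listing that follows evaluates lambda from -3 to 3 in steps of 0.25. A
sketch that states the grid explicitly and makes it finer, using the LAMBDA=
and CONVENIENT options of the BOXCOX transform. The step of 0.1 is an
illustrative choice, not part of the original analysis.
*/
proc transreg data=werner2;
  model boxcox(chol / lambda=-3 to 3 by 0.1 convenient) = identity(age);
run;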
Check Possible Box-Cox Transformations
The TRANSREG Procedure
Transformation Information
for BoxCox(CHOL)
Lambda R-Square Log Like
-3.00 0.01 -1178.40
-2.75 0.01 -1122.62
-2.50 0.01 -1068.26
-2.25 0.02 -1015.61
-2.00 0.02 -965.13
-1.75 0.03 -917.37
-1.50 0.04 -873.06
-1.25 0.05 -833.05
-1.00 0.06 -798.16
-0.75 0.08 -768.98
-0.50 0.09 -745.71
-0.25 0.11 -728.06
0.00 0.12 -715.32
0.25 0.12 -706.62
0.50 0.13 -701.11
0.75 0.13 -698.04 *
1.00 + 0.14 -696.85 <
1.25 0.14 -697.14 *
1.50 0.14 -698.62 *
1.75 0.14 -701.08
2.00 0.14 -704.39
2.25 0.14 -708.47
2.50 0.14 -713.24
2.75 0.14 -718.67
3.00 0.13 -724.73
< - Best Lambda
* - Confidence Interval
+ - Convenient Lambda
The TRANSREG Procedure
TRANSREG Univariate Algorithm Iteration History for BoxCox(CHOL)
Iteration Average Maximum Criterion
Number Change Change R-Square Change Note
-------------------------------------------------------------------------
1 0.04132 3.09421 0.13533
2 0.00000 0.00000 0.14267 0.00734 Converged
Algorithm converged.
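/*
The best lambda is 1.00 and the confidence interval covers 0.75 to 1.50, so
this listing gives no reason to transform CHOL. For cases where a
transformation is indicated, a sketch of applying the Box-Cox formula in a
DATA step, assuming the usual definition (y**lambda - 1)/lambda, with log(y)
at lambda = 0. LAMBDA = 0.5 and the names WERNER_BC and CHOL_BC are
illustrative, not results of this analysis.
*/
data werner_bc;
  set werner2;
  lambda = 0.5;
  /* Box-Cox requires a positive response; CHOL_BC stays missing otherwise */
  if chol > 0 then do;
    if lambda = 0 then chol_bc = log(chol);
    else chol_bc = (chol**lambda - 1) / lambda;
  end;
run;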