ANCOVA Examples Using SAS
/**************************************
ANCOVA Using Proc Reg and Proc GLM
FILENAME: ancova_example.sas
****************************************/
options pageno=1;
title;

/*Import data from SPSS portable file*/
filename file1 "C:\Documents and Settings\Kwelch\Desktop\b510\htwt.por";
proc convert spss=file1 out=htwt;
run;

title "Contents of HTWT Data Set";
proc contents data=htwt;
run;
Contents of HTWT Data Set
The CONTENTS Procedure

Data Set Name        WORK.HTWT                              Observations           237
Member Type          DATA                                   Variables              4
Engine               V9                                     Indexes                0
Created              Monday, January 30, 2006 04:03:49 PM   Observation Length     32
Last Modified        Monday, January 30, 2006 04:03:49 PM   Deleted Observations   0
Protection                                                  Compressed             NO
Data Set Type                                               Sorted                 NO
Label
Data Representation  WINDOWS_32
Encoding             wlatin1 Western (Windows)

Engine/Host Dependent Information

Data Set Page Size            4096
Number of Data Set Pages      3
First Data Page               1
Max Obs per Page              126
Obs in First Data Page        83
Number of Data Set Repairs    0
File Name                     c:\temp\_TD3292\htwt.sas7bdat
Release Created               9.0101M2
Host Created                  XP_PRO

Alphabetic List of Variables and Attributes

#    Variable    Type    Len    Format
2    AGE         Num       8    5.2
3    HEIGHT      Num       8    5.2
1    SEX         Char      8    8.
4    WEIGHT      Num       8    6.2
title "Descriptive Statistics for HTWT Data Set";
proc means data=htwt;
run;

Descriptive Statistics for HTWT Data Set
The MEANS Procedure

Variable      N           Mean        Std Dev        Minimum        Maximum
---------------------------------------------------------------------------
AGE         237     16.4430380      1.8425767     13.9000000     25.0000000
HEIGHT      237     61.3645570      3.9454019     50.5000000     72.0000000
WEIGHT      237    101.3080169     19.4406980     50.5000000    171.5000000
---------------------------------------------------------------------------
title "Oneway Frequency Tabulation for Sex for HTWT Data Set";
proc freq data=htwt;
  tables sex;
run;

Oneway Frequency Tabulation for Sex for HTWT Data Set
The FREQ Procedure

                                 Cumulative    Cumulative
SEX    Frequency     Percent      Frequency       Percent
---------------------------------------------------------
f            111       46.84            111         46.84
m            126       53.16            237        100.00
/*Create a new data set with new variables*/
data htwt2;
  set htwt;
  /*Create dummy variable for female*/
  if sex="f" then female=1;
  if sex="m" then female=0;
  /*Center age at 16.5 years*/
  centage = age - 16.5;
  /*Create interactions*/
  fem_age = female * age;
  fem_centage = female * centage;
run;

goptions reset = all;
goptions device=win target=winprtm;
symbol1 color=black interpol=rl value=dot;
symbol2 color=black interpol=rl value=star;
title "Scatterplot of Height by Age";
title2 "For Males and Females";
proc gplot data=htwt2;
  where age <= 19;
  plot height * age = sex;
run; quit;
/*ANCOVA using Proc Reg*/
title "ANCOVA for Males and Females";
title2 "Relationship of Height to Age";
proc reg data=htwt2;
  where age <=19;
  model height = age female fem_age;
  plot rstudent. * predicted.;
  output out=outreg1 p=predict1 r=resid1 rstudent=rstud1;
run;
The REG Procedure
Model: MODEL1
Dependent Variable: HEIGHT

Number of Observations Read    219
Number of Observations Used    219

Analysis of Variance

                             Sum of          Mean
Source              DF      Squares        Square    F Value    Pr > F
Model                3   1432.63813     477.54604      60.93    <.0001
Error              215   1684.95730       7.83701
Corrected Total    218   3117.59543

Root MSE             2.79947    R-Square    0.4595
Dependent Mean      61.00457    Adj R-Sq    0.4520
Coeff Var            4.58895

Parameter Estimates

                   Parameter     Standard
Variable     DF     Estimate        Error    t Value    Pr > |t|
Intercept     1     28.88281      2.87343      10.05      <.0001
AGE           1      2.03130      0.17764      11.44      <.0001
female        1     13.61231      4.01916       3.39      0.0008
fem_age       1     -0.92943      0.24782      -3.75      0.0002
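Reading the estimates above as Intercept 28.88281, AGE 2.03130, female 13.61231, and fem_age -0.92943, the fitted model implies one line per sex. The equations below sketch that arithmetic; the symbols \hat{H} (predicted height) and A (age in years) are notation introduced here for illustration and do not appear in the SAS output.

```latex
\begin{align*}
\hat{H} &= 28.88281 + 2.03130\,A + 13.61231\,\mathrm{female}
          - 0.92943\,(\mathrm{female} \times A) \\[4pt]
\text{Males } (\mathrm{female}=0):\quad
\hat{H} &= 28.88281 + 2.03130\,A \\[4pt]
\text{Females } (\mathrm{female}=1):\quad
\hat{H} &= (28.88281 + 13.61231) + (2.03130 - 0.92943)\,A \\
        &= 42.49512 + 1.10187\,A
\end{align*}
```

Both intercepts are predicted heights at age 0, far below the youngest observed age (13.9 years), so the female coefficient compares the sexes at a point well outside the data.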
proc univariate data=outreg1 normal;
  var rstud1;
  histogram / normal;
  probplot / normal (mu=est sigma=est);
run;

goptions reset = all;
goptions device=win target=winprtm;
symbol1 color=black interpol=rl value=dot;
symbol2 color=black interpol=rl value=star;
axis1 order = 0 to 20 by 2;
axis2 order = 0 to 80 by 10;
title "Scatterplot Showing Origin";
proc gplot data=htwt2;
  where age <= 19;
  plot height * age = sex / haxis = axis1 vaxis = axis2;
run; quit;
title "ANCOVA for Males and Females";
title2 "Relationship of Height to Centered Age";
proc reg data=htwt2;
  where age <=19;
  model height = centage female fem_centage;
  plot rstudent. * predicted.;
  output out=outreg2 p=predict2 r=resid2 rstudent=rstud2;
run;
Model: MODEL1
Dependent Variable: HEIGHT

Number of Observations Read    219
Number of Observations Used    219

Analysis of Variance

                             Sum of          Mean
Source              DF      Squares        Square    F Value    Pr > F
Model                3   1432.63813     477.54604      60.93    <.0001
Error              215   1684.95730       7.83701
Corrected Total    218   3117.59543

Root MSE             2.79947    R-Square    0.4595
Dependent Mean      61.00457    Adj R-Sq    0.4520
Coeff Var            4.58895

Parameter Estimates

                     Parameter     Standard
Variable       DF     Estimate        Error    t Value    Pr > |t|
Intercept       1     62.39929      0.26902     231.95      <.0001
centage         1      2.03130      0.17764      11.44      <.0001
female          1     -1.72336      0.38917      -4.43      <.0001
fem_centage     1     -0.92943      0.24782      -3.75      0.0002
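Centering changes only the intercept and the female coefficient; the slopes and the overall fit match the uncentered model. As a rough check, the centered estimates can be recovered from the uncentered ones by evaluating that model at age 16.5. The symbols \hat{H} (predicted height) and C (centage = age - 16.5) are notation introduced here for illustration.

```latex
\begin{align*}
\text{Males:}\quad   \hat{H} &= 62.39929 + 2.03130\,C \\
\text{Females:}\quad \hat{H} &= (62.39929 - 1.72336) + (2.03130 - 0.92943)\,C \\
                             &= 60.67593 + 1.10187\,C \\[6pt]
\text{Intercept check:}\quad & 28.88281 + 2.03130 \times 16.5 = 62.39926 \approx 62.39929 \\
\text{Female check:}\quad    & 13.61231 - 0.92943 \times 16.5 = -1.72329 \approx -1.72336
\end{align*}
```

The small differences in the last digits come from rounding in the printed estimates. In the centered model, the female coefficient is the estimated female-minus-male difference in height at age 16.5, which is why its sign and size differ from the uncentered fit.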
/*GLM model with interactions*/
title "ANCOVA model using GLM";
title2 "Relationship of Height to Centered AGE";
proc glm data=htwt2;
  where age <=19;
  class sex;
  model height = sex centage sex*centage / solution;
  estimate "intercept males"   intercept 1 sex 0 1;
  estimate "slope males"       centage 1 sex*centage 0 1;
  estimate "intercept females" intercept 1 sex 1 0;
  estimate "slope females"     centage 1 sex*centage 1 0;
run; quit;
ANCOVA for Males and Females
Relationship of Height to Centered Age
The GLM Procedure
Dependent Variable: HEIGHT

                               Sum of
Source              DF        Squares    Mean Square    F Value    Pr > F
Model                3    1432.638133     477.546044      60.93    <.0001
Error              215    1684.957300       7.837011
Corrected Total    218    3117.595434

R-Square    Coeff Var    Root MSE    HEIGHT Mean
0.459533     4.588945    2.799466       61.00457

Source          DF      Type I SS    Mean Square    F Value    Pr > F
SEX              1      89.225774      89.225774      11.39    0.0009
centage          1    1233.182326    1233.182326     157.35    <.0001
centage*SEX      1     110.230033     110.230033      14.07    0.0002

Source          DF    Type III SS    Mean Square    F Value    Pr > F
SEX              1     153.684358     153.684358      19.61    <.0001
centage          1    1252.650148    1252.650148     159.84    <.0001
centage*SEX      1     110.230033     110.230033      14.07    0.0002
/*Separate Regression Models for Females and Males*/
proc sort data = htwt2;
  by sex;
run;

title "Separate Regressions for Females and Males";
proc reg data = htwt2;
  by sex;
  model height = centage;
run; quit;
Separate Regressions for Females and Males

-------------------------------------------- SEX=f --------------------------------------------
The REG Procedure
Model: MODEL1
Dependent Variable: HEIGHT

Number of Observations Read    111
Number of Observations Used    111

Analysis of Variance

                             Sum of          Mean
Source              DF      Squares        Square    F Value    Pr > F
Model                1    366.93784     366.93784      45.78    <.0001
Error              109    873.65640       8.01520
Corrected Total    110   1240.59423

Root MSE             2.83111    R-Square    0.2958
Dependent Mean      60.52613    Adj R-Sq    0.2893
Coeff Var            4.67750

Parameter Estimates

                   Parameter     Standard
Variable     DF     Estimate        Error    t Value    Pr > |t|
Intercept     1     60.58603      0.26886     225.34      <.0001
centage       1      1.00754      0.14891       6.77      <.0001
-------------------------------------------- SEX=m --------------------------------------------
The REG Procedure
Model: MODEL1
Dependent Variable: HEIGHT

Number of Observations Read    126
Number of Observations Used    126

Analysis of Variance

                             Sum of          Mean
Source              DF      Squares        Square    F Value    Pr > F
Model                1   1274.23912    1274.23912     156.13    <.0001
Error              124   1012.01961       8.16145
Corrected Total    125   2286.25873

Root MSE             2.85682    R-Square    0.5573
Dependent Mean      62.10317    Adj R-Sq    0.5538
Coeff Var            4.60013

Parameter Estimates

                   Parameter     Standard
Variable     DF     Estimate        Error    t Value    Pr > |t|
Intercept     1     62.19639      0.25462     244.28      <.0001
centage       1      1.70220      0.13623      12.50      <.0001
Another ANCOVA Example
/**************************************
Another ANCOVA Model for the Cars Data Set
****************************************/
/*Import data from SPSS portable file*/
filename file2 "C:\Documents and Settings\Kwelch\Desktop\b510\cars.por";
proc convert spss=file2 out=cars;
run;
Contents of Cars Data Set
The CONTENTS Procedure

Data Set Name        WORK.CARS                              Observations           406
Member Type          DATA                                   Variables              8
Engine               V9                                     Indexes                0
Created              Monday, January 30, 2006 04:17:58 PM   Observation Length     64
Last Modified        Monday, January 30, 2006 04:17:58 PM   Deleted Observations   0
Protection                                                  Compressed             NO
Data Set Type                                               Sorted                 NO
Label
Data Representation  WINDOWS_32
Encoding             wlatin1 Western (Windows)

Engine/Host Dependent Information

Data Set Page Size            8192
Number of Data Set Pages      4
First Data Page               1
Max Obs per Page              127
Obs in First Data Page        96
Number of Data Set Repairs    0
File Name                     c:\temp\_TD3292\cars.sas7bdat
Release Created               9.0101M2
Host Created                  XP_PRO

Alphabetic List of Variables and Attributes

#    Variable    Type    Len    Format    Label
5    ACCEL       Num       8    4.        Time to Accelerate from 0 to 60 mph (sec)
8    CYLINDER    Num       8    1.        Number of Cylinders
2    ENGINE      Num       8    5.        Engine Displacement (cu. inches)
3    HORSE       Num       8    5.        Horsepower
1    MPG         Num       8    4.        Miles per Gallon
7    ORIGIN      Num       8    1.        Country of Origin
4    WEIGHT      Num       8    4.        Vehicle Weight (lbs.)
6    YEAR        Num       8    2.        Model Year (modulo 100)
title "Descriptive Statistics for Cars Data Set";
proc means data=cars;
run;

Descriptive Statistics for Cars Data Set
The MEANS Procedure

Variable    Label                                          N           Mean        Std Dev
------------------------------------------------------------------------------------------
MPG         Miles per Gallon                             398     23.5145729      7.8159843
ENGINE      Engine Displacement (cu. inches)             406    194.0406404    105.2073623
HORSE       Horsepower                                   400    104.8325000     38.5220627
WEIGHT      Vehicle Weight (lbs.)                        406        2969.56    849.8271661
ACCEL       Time to Accelerate from 0 to 60 mph (sec)    406     15.4950739      2.8209840
YEAR        Model Year (modulo 100)                      406     75.7487685      5.3074312
ORIGIN      Country of Origin                            405      1.5703704      0.7979622
CYLINDER    Number of Cylinders                          405      5.4691358      1.7096582
------------------------------------------------------------------------------------------

Variable    Label                                             Minimum        Maximum
------------------------------------------------------------------------------------
MPG         Miles per Gallon                                9.0000000     46.6000000
ENGINE      Engine Displacement (cu. inches)                4.0000000    455.0000000
HORSE       Horsepower                                     46.0000000    230.0000000
WEIGHT      Vehicle Weight (lbs.)                         732.0000000        5140.00
ACCEL       Time to Accelerate from 0 to 60 mph (sec)       8.0000000     24.8000000
YEAR        Model Year (modulo 100)                                 0     82.0000000
ORIGIN      Country of Origin                               1.0000000      3.0000000
CYLINDER    Number of Cylinders                             3.0000000      8.0000000
------------------------------------------------------------------------------------
title "Oneway Frequency Tabulation of Cylinder for Cars Data Set";
proc freq data=cars;
  tables cylinder;
run;

goptions reset = all;
goptions device=win target=winprtm;
symbol1 color=black interpol=rl value=dot;
symbol2 color=black interpol=rl value=star;
symbol3 color=black interpol=rl value=square;
title "Scatterplot of Horsepower by Weight";
title2 "For 4, 6, and 8 Cylinder Vehicles";
proc gplot data=cars;
  where cylinder in(4,6,8);
  plot horse * weight = cylinder;
run; quit;
/*Create New Data Set with Dummy Variables*/
data cars2;
  set cars;
  if cylinder not=. then do;
    fourcyl = 0;  if cylinder = 4 then fourcyl=1;
    sixcyl = 0;   if cylinder = 6 then sixcyl=1;
    eightcyl = 0; if cylinder = 8 then eightcyl=1;
  end;
  fourcyl_wt = fourcyl*weight;
  sixcyl_wt = sixcyl *weight;
  eightcyl_wt = eightcyl*weight;
  weight1000 = weight/1000;
run;

title "ANCOVA Model for Cars Data";
title2 "Using Proc Reg";
proc reg data=cars2;
  where cylinder in (4,6,8);
  model horse = sixcyl eightcyl weight sixcyl_wt eightcyl_wt;
run; quit;
*UPDATED* ANCOVA Model for Cars Data                    13:18 Thursday, March 9, 2006
The REG Procedure
Model: MODEL1
Dependent Variable: HORSE Horsepower

Number of Observations Read                   398
Number of Observations Used                   392
Number of Observations with Missing Values      6

Analysis of Variance

                             Sum of          Mean
Source              DF      Squares        Square    F Value    Pr > F
Model                5       483077         96615     350.80    <.0001
Error              386       106311     275.41774
Corrected Total    391       589389

Root MSE            16.59571    R-Square    0.8196
Dependent Mean     105.09184    Adj R-Sq    0.8173
Coeff Var           15.79163

Parameter Estimates

                                                 Parameter     Standard
Variable       Label                    DF        Estimate        Error    t Value    Pr > |t|
Intercept      Intercept                 1        10.82114      7.87288       1.37      0.1701
sixcyl                                   1        78.90431     19.41720       4.06      <.0001
eightcyl                                 1        30.35967     16.94254       1.79      0.0739
WEIGHT         Vehicle Weight (lbs.)     1         0.02929      0.00337       8.69      <.0001
sixcyl_wt                                1        -0.02561      0.00646      -3.96      <.0001
eightcyl_wt                              1     -0.00075853      0.00496      -0.15      0.8785
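With 4-cylinder vehicles as the reference group (no dummy variable for them is in the model), the estimates above (Intercept 10.82114, sixcyl 78.90431, eightcyl 30.35967, WEIGHT 0.02929, sixcyl_wt -0.02561, eightcyl_wt -0.00075853) give one line per cylinder group. The equations below sketch that arithmetic; \hat{P} (predicted horsepower) and W (weight in lbs.) are notation introduced here for illustration.

```latex
\begin{align*}
\text{4 cylinders:}\quad \hat{P} &= 10.82114 + 0.02929\,W \\[4pt]
\text{6 cylinders:}\quad \hat{P} &= (10.82114 + 78.90431) + (0.02929 - 0.02561)\,W \\
                                 &= 89.72545 + 0.00368\,W \\[4pt]
\text{8 cylinders:}\quad \hat{P} &= (10.82114 + 30.35967) + (0.02929 - 0.00075853)\,W \\
                                 &\approx 41.18081 + 0.02853\,W
\end{align*}
```

Under this reading, sixcyl_wt and eightcyl_wt test whether the 6- and 8-cylinder slopes differ from the 4-cylinder slope. Only the 6-cylinder slope differs significantly (p < .0001); the 8-cylinder difference is negligible (p = 0.8785).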
title "ANCOVA Model for Cars Data";
title2 "Using Proc GLM";
proc glm data=cars2;
  where cylinder in (4,6,8);
  class cylinder;
  model horse = cylinder weight cylinder*weight;
  estimate "Slope for 4 Cylinders" weight 1 cylinder*weight 1 0 0;
  estimate "Slope for 6 Cylinders" weight 1 cylinder*weight 0 1 0;
  estimate "Slope for 8 Cylinders" weight 1 cylinder*weight 0 0 1;
run; quit;
ANCOVA Model for Cars Data
Using Proc GLM
The GLM Procedure

Class Level Information
Class       Levels    Values
CYLINDER         3    4 6 8

Number of Observations Read    398
Number of Observations Used    392

Dependent Variable: HORSE Horsepower

                               Sum of
Source              DF        Squares    Mean Square    F Value    Pr > F
Model                5    483077.4467     96615.4893     350.80    <.0001
Error              386    106311.2472       275.4177
Corrected Total    391    589388.6939

R-Square    Coeff Var    Root MSE    HORSE Mean
0.819625     15.79163    16.59571      105.0918

Source              DF      Type I SS    Mean Square    F Value    Pr > F
CYLINDER             2    445231.4569    222615.7284     808.28    <.0001
WEIGHT               1     33027.7221     33027.7221     119.92    <.0001
WEIGHT*CYLINDER      2      4818.2677      2409.1339       8.75    0.0002

Source              DF    Type III SS    Mean Square    F Value    Pr > F
CYLINDER             2     4848.76600     2424.38300       8.80    0.0002
WEIGHT               1    18935.97011    18935.97011      68.75    <.0001
WEIGHT*CYLINDER      2     4818.26771     2409.13385       8.75    0.0002

                                           Standard
Parameter                    Estimate         Error    t Value    Pr > |t|
Slope for 4 Cylinders      0.02928755    0.00337073       8.69      <.0001
Slope for 6 Cylinders      0.00367899    0.00551378       0.67      0.5050
Slope for 8 Cylinders      0.02852902    0.00363869       7.84      <.0001
proc sort data=cars2;
  by cylinder;
run;

title "Separate Regression for Each Number of Cylinders";
proc reg data=cars2;
  where cylinder in (4,6,8);
  by cylinder;
  model horse = weight;
run; quit;
Separate Regression for Each Number of Cylinders

------------------------------------ Number of Cylinders=4 ------------------------------------
The REG Procedure
Model: MODEL1
Dependent Variable: HORSE Horsepower

Number of Observations Read                   207
Number of Observations Used                   202
Number of Observations with Missing Values      5

Analysis of Variance

                             Sum of          Mean
Source              DF      Squares        Square    F Value    Pr > F
Model                1        20793         20793     171.60    <.0001
Error              200        24234     121.16836
Corrected Total    201        45026

Root MSE            11.00765    R-Square    0.4618
Dependent Mean      78.47030    Adj R-Sq    0.4591
Coeff Var           14.02779

Parameter Estimates

                                           Parameter     Standard
Variable     Label                    DF    Estimate        Error    t Value    Pr > |t|
Intercept    Intercept                 1    10.82114      5.22194       2.07      0.0395
WEIGHT       Vehicle Weight (lbs.)     1     0.02929      0.00224      13.10      <.0001
------------------------------------ Number of Cylinders=6 ------------------------------------
The REG Procedure
Model: MODEL1
Dependent Variable: HORSE Horsepower

Number of Observations Read                    84
Number of Observations Used                    83
Number of Observations with Missing Values      1

Analysis of Variance

                             Sum of          Mean
Source              DF      Squares        Square    F Value    Pr > F
Model                1    122.61693     122.61693       0.60    0.4424
Error               81        16670     205.80407
Corrected Total     82        16793

Root MSE            14.34587    R-Square     0.0073
Dependent Mean     101.50602    Adj R-Sq    -0.0050
Coeff Var           14.13303

Parameter Estimates

                                           Parameter     Standard
Variable     Label                    DF    Estimate        Error    t Value    Pr > |t|
Intercept    Intercept                 1    89.72545     15.34326       5.85      <.0001
WEIGHT       Vehicle Weight (lbs.)     1     0.00368      0.00477       0.77      0.4424
------------------------------------ Number of Cylinders=8 ------------------------------------
The REG Procedure
Model: MODEL1
Dependent Variable: HORSE Horsepower

Number of Observations Read    107
Number of Observations Used    107

Analysis of Variance

                             Sum of          Mean
Source              DF      Squares        Square    F Value    Pr > F
Model                1        16931         16931      27.18    <.0001
Error              105        65407     622.92805
Corrected Total    106        82338

Root MSE            24.95853    R-Square    0.2056
Dependent Mean     158.13084    Adj R-Sq    0.1981
Coeff Var           15.78347

Parameter Estimates

                                           Parameter     Standard
Variable     Label                    DF    Estimate        Error    t Value    Pr > |t|
Intercept    Intercept                 1    41.18081     22.56209       1.83      0.0708
WEIGHT       Vehicle Weight (lbs.)     1     0.02853      0.00547       5.21      <.0001
title "Separate Regression for Each Number of Cylinders";
title2 "Using Weight/1000";
proc reg data=cars2;
  where cylinder in (4,6,8);
  by cylinder;
  model horse = weight1000;
run; quit;
Separate Regression for Each Number of Cylinders
Using Weight/1000

------------------------------------ Number of Cylinders=4 ------------------------------------
The REG Procedure
Model: MODEL1
Dependent Variable: HORSE Horsepower

Number of Observations Read                   207
Number of Observations Used                   202
Number of Observations with Missing Values      5

Analysis of Variance

                             Sum of          Mean
Source              DF      Squares        Square    F Value    Pr > F
Model                1        20793         20793     171.60    <.0001
Error              200        24234     121.16836
Corrected Total    201        45026

Root MSE            11.00765    R-Square    0.4618
Dependent Mean      78.47030    Adj R-Sq    0.4591
Coeff Var           14.02779

Parameter Estimates

                                  Parameter     Standard
Variable      Label        DF      Estimate        Error    t Value    Pr > |t|
Intercept     Intercept     1      10.82114      5.22194       2.07      0.0395
weight1000                  1      29.28755      2.23575      13.10      <.0001
------------------------------------ Number of Cylinders=6 ------------------------------------
The REG Procedure
Model: MODEL1
Dependent Variable: HORSE Horsepower

Number of Observations Read                    84
Number of Observations Used                    83
Number of Observations with Missing Values      1

Analysis of Variance

                             Sum of          Mean
Source              DF      Squares        Square    F Value    Pr > F
Model                1    122.61693     122.61693       0.60    0.4424
Error               81        16670     205.80407
Corrected Total     82        16793

Root MSE            14.34587    R-Square     0.0073
Dependent Mean     101.50602    Adj R-Sq    -0.0050
Coeff Var           14.13303

Parameter Estimates

                                  Parameter     Standard
Variable      Label        DF      Estimate        Error    t Value    Pr > |t|
Intercept     Intercept     1      89.72545     15.34326       5.85      <.0001
weight1000                  1       3.67899      4.76629       0.77      0.4424
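Dividing weight by 1000 rescales the slope and its standard error by the same factor and leaves the intercept, t value, p-value, and R-Square unchanged. The worked equations below sketch that relationship. The symbols W (weight in lbs.) and \hat{\beta} (estimated slope) are notation introduced here, and the unrounded per-pound slopes are taken from the Proc GLM ESTIMATE output earlier in this example.

```latex
\begin{align*}
\mathrm{weight1000} &= W / 1000 \\
\hat{\beta}_{\mathrm{weight1000}} &= 1000\,\hat{\beta}_{W},
\qquad
\mathrm{SE}\bigl(\hat{\beta}_{\mathrm{weight1000}}\bigr)
   = 1000\,\mathrm{SE}\bigl(\hat{\beta}_{W}\bigr) \\[4pt]
t &= \frac{1000\,\hat{\beta}_{W}}{1000\,\mathrm{SE}(\hat{\beta}_{W})}
   = \frac{\hat{\beta}_{W}}{\mathrm{SE}(\hat{\beta}_{W})} \\[6pt]
\text{4 cylinders:}\quad & 1000 \times 0.02928755 = 29.28755 \\
\text{6 cylinders:}\quad & 1000 \times 0.00367899 = 3.67899
\end{align*}
```

For 4-cylinder vehicles, the rescaled slope says that each additional 1000 lbs. of weight is associated with about 29.3 more horsepower, which is easier to read than 0.02929 horsepower per pound.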