# HW3sol F06


```
ST3900/4950 HOMEWORK/LAB 3 Solutions
Fall, 2006

Question 1

(a) We are testing the hypotheses: H0: the mean amounts of life insurance of 3 states are equal
vs. H1: H0 is false. Here is the Anova table:

Tests of Between-Subjects Effects

Dependent Variable: AMOUNT
Type III Sum
Source                           of Squares        df        Mean Square           F                Sig.
Corrected Model                  2859.524(a)            2       1429.762               .339            .717
Intercept                        655433.333             1     655433.333      155.424                 .000
STATE                               2859.524            2         1429.762             .339           .717
Error                             75907.143             18        4217.063
Total                            734200.000             21
Corrected Total                   78766.667             20
a R Squared = .036 (Adjusted R Squared = -.071)

Since the p-value (0.717) is larger than 0.05, we cannot reject H0. That is, we don't have
enough evidence to conclude that the means of the 3 states are significantly different.

Check normality:

Figure 1: normal probability plot of residuals

[Normal Q-Q plot of residual for AMOUNT. Axes: expected normal value vs. observed value.]

Table 1:

Tests of Normality

                        Kolmogorov-Smirnov(a)             Shapiro-Wilk
                        Statistic    df    Sig.      Statistic    df    Sig.
Residual for AMOUNT       .148       21    .200*       .938       21    .199
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

The points in Figure 1 are close to the line and the p-values of the tests of normality in
Table 1 are both bigger than 0.05, so the assumption of normality is reasonable.

Check equal variances:

Figure 2: residual plot; residual vs. predicted value

[Residual plot. Axes: residual for AMOUNT vs. predicted value for AMOUNT.]

The residual plot (Figure 2) suggests the variances are not equal, so the assumption of equal
variances appears to be violated. You should conduct Levene's test to support this.

(b) (c)
Based on the SPSS output in Table 2, the 3 states are all grouped into one set using both
SNK and Duncan tests. There is no significant difference between the mean amounts of any
pair of these three states.

Table 2: The SPSS output of SNK and Duncan tests

AMOUNT

                                                 Subset
                             STATE     N           1
Student-Newman-Keuls(a,b)    3         7       162.1429
                             2         7       177.1429
                             1         7       190.7143
                             Sig.                  .694
Duncan(a,b)                  3         7       162.1429
                             2         7       177.1429
                             1         7       190.7143
                             Sig.                  .447
Means for groups in homogeneous subsets are displayed.
Based on Type III Sum of Squares
The error term is Mean Square(Error) = 4217.063.
a. Uses Harmonic Mean Sample Size = 7.000.
b. Alpha = .05.

(d). Contrast: CA vs KA and CO

CA             KA             CO
Coefficient            2             -1             -1

F-test: the regular way to test contrast using GLM/ANOVA method.

The GLM Procedure

Dependent Variable: INSUR

Contrast                  DF     Contrast SS    Mean Square     F Value     Pr > F

CA VS THE OTHERS          1    2072.023810     2072.023810           0.49   0.4923

The contrast is not significantly different from 0 (p=0.4923>.05), meaning that the insurance
amount in CA is not significantly different from the average insurance amount in KA and CO.

Question 2

(a) We are testing the hypotheses: H0: the mean scores obtained for 4 programs are equal vs.
H1: H0 is false. Here is the Anova table:

Tests of Between-Subjects Effects

Dependent Variable: SCORE
                    Type III Sum
Source              of Squares        df     Mean Square          F        Sig.
Corrected Model        3938.679(a)     3        1312.893        2.449      .088
Intercept           7872621.750        1     7872621.750    14687.075      .000
PROGRAM                3938.679        3        1312.893        2.449      .088
Error                 12864.571       24         536.024
Total               7889425.000       28
Corrected Total       16803.250       27
a. R Squared = .234 (Adjusted R Squared = .139)

The p-value in the table is 0.088, which is marginal. At the 5% significance level we cannot
reject H0: the 4 programs do not differ significantly in college entrance exam scores. You
can decide for yourself whether you are willing to accept an 8.8% risk of a Type I error in
order to reject H0.

Check normality:

Figure 3: normal probability plot of residuals

[Normal Q-Q plot of residual for SCORE. Axes: expected normal value vs. observed value.]

Table 3:

Tests of Normality

                        Kolmogorov-Smirnov(a)             Shapiro-Wilk
                        Statistic    df    Sig.      Statistic    df    Sig.
Residual for SCORE        .183       28    .017        .889       28    .007
a. Lilliefors Significance Correction

Many points in Figure 3 are not close to the line and there is a possible outlier. In addition,
the p-values of the tests of normality in Table 3 are both smaller than 0.05. The assumption of
normality is therefore not reasonable.

Check equal variances:

Figure 4: residual plot; residual vs. predicted value

[Residual plot. Axes: residual for SCORE vs. predicted value for SCORE.]

The points in Figure 4 lie more or less in a horizontal band, so it is reasonable to
assume equal variances of the scores over the 4 programs. Again, you should conduct
Levene's test to support this.

(b) The output for the SNK test is shown in Table 4. The SNK procedure groups the 4 programs
together. There is no significant difference in exam scores between any pair of the 4
programs.

Table 4:
Student-Newman-Keuls
Subset
PROGRAM              N            1
D                       7        518.2857
C                       7        519.0000
B                       7        538.4286
A                       7        545.2857
Sig.                                   .157
Means for groups in homogeneous subsets are displayed.
Based on Type III Sum of Squares
The error term is Mean Square(Error) = 536.024.
a Uses Harmonic Mean Sample Size = 7.000.
b Alpha = .05.

(c) Create two contrasts: One to compare programs A and B to C and D, the other to compare
program D to the other three.

A                  B                 C             D
A,B to C,D                  1                 1             -1                -1
A,B,C to D                  1                 1                 1             -3

Conduct tests for the two contrasts.

The GLM Procedure

Dependent Variable: SCORES

Contrast                         DF         Contrast SS        Mean Square      F Value    Pr > F

COMPARE A TO B AND C TO D        1       3772.321429          3772.321429        7.04     0.0139
COMPARE D TO A AND B AND C       1       1336.011905          1336.011905        2.49     0.1275

The first contrast indicates that programs A and B are significantly different from
programs C and D (p=0.0139<.05). The second contrast shows that program D is not
significantly different from the other programs A, B, and C (p=0.1275>.05).

End of Solutions

Note that since the assumptions fail in both questions, we should not in fact rely on
the results from the ANOVA tables or on the SNK and Duncan procedures. An
alternative non-parametric method will be discussed in HW 4.

SAS Codes

Question 1

*** One way Anova: Insurance DATASET;

DATA INSURANCE;
INPUT STATE $ AMOUNT @@;

DATALINES;
CA 90 CA 200 CA 225 CA 100 CA 170 CA 300 CA 250
KA 80 KA 140 KA 150 KA 140 KA 150 KA 300 KA 280
CO 165 CO 160 CO 140 CO 160 CO 175 CO 155 CO 180
;
**ANALYSIS AND MULTIPLE COMPARISONS OF INSURANCE DATA;
PROC GLM DATA=INSURANCE;
TITLE "ANOVA OF INSURANCE DATA";
CLASS STATE;
MODEL AMOUNT = STATE;
MEANS STATE / SNK;
MEANS STATE / DUNCAN;
CONTRAST 'CA VS THE OTHERS' STATE 2 -1 -1;
OUTPUT OUT= GRAPH
P=PREDICTED
R=RESIDUAL;
RUN;

**GRAPHICAL SUMMARIES OF RESIDUALS TO CHECK ASSUMPTIONS;
PROC UNIVARIATE DATA=GRAPH NORMAL PLOT;
VAR RESIDUAL;
RUN;

SYMBOL VALUE=DOT COLOR=RED I=R;
PROC GPLOT DATA=GRAPH;
TITLE "RESIDUALS VERSUS PREDICTED";
PLOT RESIDUAL * PREDICTED;
RUN;

Question 2

***PROGRAM TO TEST DIFFERENCES AMONG FOUR METHODS OF PREPARING FOR EXAMS;

DATA EXAM;
INPUT PROGRAM $ SCORE @@;
DATALINES;
1 560 1 520 1 530 1 525 1 575 1 580 1 527
2 565 2 522 2 520 2 530 2 510 2 600 2 522
3 512 3 518 3 555 3 502 3 510 3 516 3 520
4 505 4 508 4 512 4 520 4 543 4 517 4 523
;

PROC FORMAT;
VALUE $GROUP '1'='A' '2'='B' '3'='C' '4'='D';
RUN;

**ANALYSIS AND MULTIPLE COMPARISONS OF EXAM DATA;
PROC GLM DATA=EXAM;
TITLE "ANALYSIS OF EXAM DATA";
CLASS PROGRAM;
MODEL SCORE = PROGRAM;
MEANS PROGRAM / SNK;
MEANS PROGRAM / DUNCAN;
CONTRAST 'A,B VS C,D' PROGRAM 1 1 -1 -1;
CONTRAST 'D VS THE OTHERS' PROGRAM 1 1 1 -3;
OUTPUT OUT=GRAPH
R=RESIDUAL
P=PREDICTED;
RUN;

**GRAPHICAL SUMMARIES OF RESIDUALS TO CHECK ASSUMPTIONS;
PROC UNIVARIATE DATA=GRAPH PLOT NORMAL;
VAR RESIDUAL;
RUN;

SYMBOL VALUE=DOT COLOR=RED I=R;
PROC GPLOT DATA=GRAPH;
TITLE "RESIDUALS VERSUS PREDICTED";
PLOT RESIDUAL * PREDICTED;
RUN;


```
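
As a cross-check on the output tables above, here is a minimal Python sketch (not part of the original SAS solution) that recomputes the one-way ANOVA F statistics and the single-degree-of-freedom contrast tests from the data in the DATALINES blocks. It also computes a Levene-type statistic for the equal-variance check that the solutions recommend. The helper names `one_way_anova`, `contrast_f`, and `levene_f` are hypothetical, introduced only for illustration. The Levene variant used here (an ANOVA on absolute deviations from the group means) is an assumption; SAS and SPSS may default to a different variant, so their values need not match this one.

```python
from statistics import mean

# Data copied from the DATALINES blocks in the SAS code.
# Exam programs 1-4 are labelled A-D, following the PROC FORMAT step.
insurance = {
    "CA": [90, 200, 225, 100, 170, 300, 250],
    "KA": [80, 140, 150, 140, 150, 300, 280],
    "CO": [165, 160, 140, 160, 175, 155, 180],
}
exam = {
    "A": [560, 520, 530, 525, 575, 580, 527],
    "B": [565, 522, 520, 530, 510, 600, 522],
    "C": [512, 518, 555, 502, 510, 516, 520],
    "D": [505, 508, 512, 520, 543, 517, 523],
}


def one_way_anova(groups):
    """Return (SS_between, SS_error, df_between, df_error, F)."""
    samples = list(groups.values())
    grand = mean([y for g in samples for y in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in samples)
    ss_error = sum((y - mean(g)) ** 2 for g in samples for y in g)
    df_between = len(samples) - 1
    df_error = sum(len(g) for g in samples) - len(samples)
    f_stat = (ss_between / df_between) / (ss_error / df_error)
    return ss_between, ss_error, df_between, df_error, f_stat


def contrast_f(groups, coeffs):
    """Return (contrast SS, F) for a 1-df contrast.

    coeffs maps each group name to its contrast coefficient.
    """
    estimate = sum(c * mean(groups[name]) for name, c in coeffs.items())
    scale = sum(c ** 2 / len(groups[name]) for name, c in coeffs.items())
    ss_contrast = estimate ** 2 / scale
    _, ss_error, _, df_error, _ = one_way_anova(groups)
    return ss_contrast, ss_contrast / (ss_error / df_error)


def levene_f(groups):
    """Assumed Levene variant: ANOVA F on |y - group mean|."""
    abs_dev = {name: [abs(y - mean(g)) for y in g] for name, g in groups.items()}
    return one_way_anova(abs_dev)[4]


print("Q1 ANOVA F:", round(one_way_anova(insurance)[4], 3))
print("Q1 contrast (SS, F):", contrast_f(insurance, {"CA": 2, "KA": -1, "CO": -1}))
print("Q1 Levene-type F:", round(levene_f(insurance), 3))
print("Q2 ANOVA F:", round(one_way_anova(exam)[4], 3))
print("Q2 A,B vs C,D (SS, F):", contrast_f(exam, {"A": 1, "B": 1, "C": -1, "D": -1}))
print("Q2 D vs others (SS, F):", contrast_f(exam, {"A": 1, "B": 1, "C": 1, "D": -3}))
```

The ANOVA F values and contrast sums of squares printed by this sketch can be compared directly with the ANOVA and contrast tables in the solutions. P-values are omitted because the Python standard library has no F distribution.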