# regression project

```					Chris Pencille
Regression Project
STAT 462
December 2, 2011

Research Question: Do temperature, number of manufacturing companies, population size,
average annual wind speed, average annual precipitation, and the average number of days with
precipitation per year, or any combination of those variables, predict how much air pollution
there is, as measured by the sulfur dioxide concentration in the air?

Response Variable: Air Pollution (measured in annual mean concentration of sulfur dioxide)

(Potential) Predictor Variables: Temperature (measured in degrees Fahrenheit), number of
manufacturing enterprises that employ 20 or more workers, population size (measured from the
1970 census in thousands), average annual wind speed (in mph), average annual precipitation (in
inches), average number of days with precipitation per year

First, an exploratory data analysis will be done to see if there are any oddities in the data and to
get a sense of early relationships to follow up on with a regression analysis. We also need to
code the data and build a working set of interaction terms so that the data can be analyzed
completely and so that no variable's effect on another makes the final model worse than the
alternatives.

Descriptive Statistics: Sulfur, Temp, company, popsize, wind, daysprecip, ...

Variable        N      Mean   SE Mean    StDev   Minimum       Q1   Median        Q3   Maximum
Sulfur        181     28.01      1.60    21.56      1.87    10.00    25.03     37.84    110.00
Temp          181    54.925     0.479    6.438    43.000   50.600   54.000    57.850    75.500
company       181     487.8      36.3    488.9       4.0    197.0    361.0     620.9    3344.0
popsize       181     601.6      35.7    480.1      54.6    295.8    520.0     757.0    3369.0
wind          181     9.388     0.106    1.427     6.000    8.450    9.300    10.304    12.700
daysprecip    181    113.67      1.92    25.89     36.00    95.36   115.00    132.00    166.00
precip        181    36.691     0.854   11.489     7.050   30.213   36.220    44.032    63.373

Looking at the descriptive statistics of the non-interaction terms, nothing major stands out.
Several means are higher than the medians (most noticeably for company and popsize), which hints
at some right skew; however, we will treat them as close enough to assume normal data and
continue on with the interaction terms.

To identify the interaction terms, each variable was turned into a categorical variable by using
its median as the separating point, putting observations into either a high category or a low
category. Once coded this way, the variables can be used to make multiple graphs that check for
interactions between two variables in their effect on sulfur, as seen below.

[Figure: Interaction Plot for Sulfur (Data Means). Panels: Coded_Temp, Coded_company, Coded_popsize, Coded_wind, Coded_precip; each variable coded 0 or 1.]

Looking at the interaction plot, we see that there are cases where a change in one variable
produces a big difference in sulfur depending on the level of the other variable. This is
considered an interaction and can be added to the list of interaction terms to analyze. From the
plot we would say that temperature and company, temperature and population size, temperature and
days of precipitation, population size and company, population size and wind, wind and
precipitation, wind and days of precipitation, and precipitation and days of precipitation are all
interaction terms to be used in the analysis. We can now look at their descriptive statistics to
see if anything unusual stands out.

Descriptive Statistics: Temp*company, Temp*popsize, Temp*dayspre, ...

Variable             N     Mean  SE Mean    StDev  Minimum      Q1  Median      Q3   Maximum
Temp*company       135    27524     2308    26820      230   11726   20229   34375    169206
Temp*popsize       135    35371     2416    28073     3229   17100   28938   43930    170471
Temp*daysprecip    135     6264      143     1664     2531    5047    6236    7409     12218
popsize*company    135   496695   121653  1413481     1848   69345  201249  454656  11265936
popsize*wind       135     6189      456     5303      446    2663    4991    7886     35038
wind*precip        135   350.29     9.73   113.01    42.30  285.03  347.61  398.74    606.99
wind*daysprecip    135   1075.3     25.8    300.2    216.0   913.0  1040.3  1244.4    2058.4
precip*daysprecip  135     4334      149     1729      254    3178    4363    5438      8273

Given the large magnitude of the data values, it is hard to tell if anything is out of the
ordinary. The means and medians differ for several of the terms; however, it is safe to go on with
the regression, as many of the interactions may not end up in the models. We will now look for
correlations among the x variables by graphing them against each other, and we will also look at
the scatterplots of each variable against the response variable, sulfur. We do this to get an
initial idea of the direction (positive, negative, or none) and magnitude of the individual
relationships, as the individual relationships could prove to be the best models in the end.

[Figure: Matrix Plot of Sulfur vs Temp, company, popsize, wind, precip, ...]

[Figure: Matrix Plot of Sulfur vs Temp*company, Temp*popsize, Temp*dayspre, ...]

[Figure: Matrix Plot of Temp, company, wind, popsize, precip, daysprecip]

Looking first at the scatterplots against sulfur, we see positive relationships in most of them.
Magnitude is a little harder to grasp in matrix plots, but a plot can be examined in more detail
if it turns out to belong to the optimal model. The plots between the x variables don’t show any
significant linear relationships, so correlations among the predictors should not affect any of
the models.

The next thing that needs to be done is to determine which variables should be used in a
regression model. If a non-interaction term is not selected but appears in an interaction term
that is selected, we need to include it in the model to be thorough. We will first use a stepwise
regression to find one very good candidate model, and then a best subset regression will be used
to find other models that could be good candidates.

Stepwise Regression: Sulfur versus company, Temp, ...
Alpha-to-Enter: 0.05 Alpha-to-Remove: 0.15
Response is Sulfur on 14 predictors, with N = 135

Step                    1         2         3         4
Constant            17.72     62.61     66.95     69.77

company            0.0222    0.0205    0.0110    0.0104
T-Value              6.86      6.44      1.95      1.87
P-Value             0.000     0.000     0.053     0.064

Temp                          -0.80     -0.83     -0.77
T-Value                       -3.29     -3.44     -3.24
P-Value                       0.001     0.001     0.002

popsize*company                       0.00000   0.00001
T-Value                                  2.00      3.14
P-Value                                 0.047     0.002

popsize*wind                                   -0.00124
T-Value                                           -2.47
P-Value                                           0.015

S                    19.3      18.6      18.4      18.1
R-Sq                26.15     31.75     33.78     36.74
Mallows Cp           35.1      24.6      22.0      17.3
PRESS             51242.2   48057.6   46577.9   45300.7
R-Sq(pred)          23.80     28.53     30.73     32.63

Looking at the stepwise regression, we see that the four-variable model from step 4 is a good
candidate and reasonably parsimonious. For the regression analysis of this model, however,
population size and wind will also need to be included, as they are components of the interaction
terms.

We will now run a regression analysis for this model and then run a best subset regression to find
comparison models to run it against. Before the regressions are run, a subset of the data (every
4th observation) will be extracted to be used as a validation data set for the final regression
model chosen.

Regression Analysis: Sulfur versus company, Temp, ...
The regression equation is
Sulfur = 32.9 + 0.00828 company - 0.684 Temp + 0.000010 popsize*company
- 0.00592 popsize*wind + 0.0421 popsize + 3.62 wind

Predictor                Coef        SE Coef        T       P
Constant                32.90          27.11     1.21   0.227
company              0.008279       0.005695     1.45   0.148
Temp                  -0.6836         0.2455    -2.78   0.006
popsize*company    0.00001023     0.00000291     3.51   0.001
popsize*wind        -0.005916       0.003650    -1.62   0.108
popsize               0.04205        0.03404     1.24   0.219
wind                    3.618          2.311     1.57   0.120

S = 18.0566   R-Sq = 37.9%   R-Sq(adj) = 35.0%
PRESS = 46122.4   R-Sq(pred) = 31.41%

Analysis of Variance
Source           DF      SS            MS        F       P
Regression        6 25510.4        4251.7    13.04   0.000
Residual Error 128 41733.2          326.0
Total           134 67243.6

The sum of squares for pure error is (nearly) zero.
Minitab cannot do the lack of fit test based on pure error.

Source             DF    Seq SS
company             1   17585.8
Temp                1    3762.5
popsize*company     1    1365.6
popsize*wind        1    1989.8
popsize             1       7.6
wind                1     799.1

Unusual Observations
Obs  company  Sulfur     Fit  SE Fit  Residual  St Resid
  8     3344  110.00  113.23   12.43     -3.23     -0.25 X
 23      343   94.00   36.85    3.50     57.15      3.23 R
 49       18   66.00   24.34    5.50     41.66      2.42 R
 75     3344  110.00  113.23   12.43     -3.23     -0.25 X
 90      343   94.00   36.85    3.50     57.15      3.23 R
100      482   79.09   32.65    3.08     46.44      2.61 R
R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 1.77421

The first thing to look at is the t-tests of the individual variables to see if their slopes are
important to the model. This is indicated by a t-value that is large in absolute value and a
correspondingly low p-value (below 0.05). The interaction terms need to be analyzed first because,
as stated earlier, a non-interaction term must stay in the model if it is involved in an
interaction term. Looking at the interaction terms, one of them, population size by wind, has a
non-significant t-test (p = 0.108) and should be removed before the regression is run again.
Because wind was only added for the sake of that interaction term, and its own t-test is also
non-significant (p = 0.120), it will be removed as well. The residual plots for this analysis are
below; however, they are not needed at this point because the model is already going to be changed.

[Figure: Residual Plots for Sulfur. Panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order.]

[Figure: Probability Plot of RESI1 (Normal). Mean 2.934278E-14, StDev 17.65, N 135, P-Value 0.011.]

Regression Analysis: Sulfur versus popsize, company, Temp, popsize*comp
The regression equation is
Sulfur = 68.6 - 0.0122 popsize + 0.0104 company - 0.743 Temp
+ 0.000008 popsize*company

Predictor                Coef        SE Coef        T       P
Constant                68.61          13.62     5.04   0.000
popsize             -0.012176       0.005050    -2.41   0.017
company              0.010415       0.005553     1.88   0.063
Temp                  -0.7432         0.2385    -3.12   0.002
popsize*company    0.00000787     0.00000254     3.09   0.002

S = 18.1074   R-Sq = 36.6%   R-Sq(adj) = 34.7%
PRESS = 45361.2   R-Sq(pred) = 32.54%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        4  24619.6  6154.9  18.77  0.000
Residual Error  130  42624.0   327.9
Total           134  67243.6

The sum of squares for pure error is (nearly) zero.
Minitab cannot do the lack of fit test based on pure error.

Source             DF    Seq SS
popsize             1    4752.9
company             1   13261.6
Temp                1    3468.7
popsize*company     1    3136.4

Unusual Observations
Obs  popsize  Sulfur     Fit  SE Fit  Residual  St Resid
  8     3369  110.00  113.46   12.45     -3.46     -0.26 X
 23      179   94.00   33.33    2.58     60.67      3.39 R
 44     1975   26.00   15.04    6.32     10.96      0.65 X
 75     3369  110.00  113.46   12.45     -3.46     -0.26 X
 90      179   94.00   33.33    2.58     60.67      3.39 R
100      181   79.09   30.05    2.47     49.04      2.73 R
111     1975   25.93   15.06    6.32     10.87      0.64 X
127     1174    2.69   39.08    3.38    -36.39     -2.05 R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 1.76421

Again, the t-tests of the interaction terms are examined first to see if their slopes are important
to the model. The interaction term popsize*company is significant (p = 0.002); the only
non-significant t-test belongs to company (p = 0.063), a non-interaction term that is part of the
interaction term and therefore needs to stay in the model. Next, the F-test is examined to see if
the model as a whole is useful. Just like the t-tests, a high F-value and a low corresponding
p-value (below 0.05) are needed before the model can be analyzed further. This model passes
(F = 18.77, p = 0.000). Finally, the residuals need to be analyzed to see if they meet the LINE
assumptions: linearity, independence, normal distribution, and equal variances.

[Figure: Residual Plots for Sulfur. Panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order.]

[Figure: Probability Plot of RESI2 (Normal). Mean 3.802719E-14, StDev 17.84, N 135, P-Value 0.016.]

The residual plots, along with an expanded version of the normal probability plot, show that the
normality of the residuals is not met by this model (normality test p-value 0.016), so before
analyzing the rest we will transform the y variable to fix the normality. To know which
transformation to use, a Box-Cox analysis will be done.

[Figure: Box-Cox Plot of Sulfur (StDev versus Lambda, using 95.0% confidence). Estimate 0.27, Lower CL 0.10, Upper CL 0.43, Rounded Value 0.27.]

Looking at the Box-Cox output, the lambda estimate is what determines the best transformation.
In this case the estimate of 0.27 falls between two standard transformations: the natural log,
which corresponds to a lambda of 0, and the square root, which corresponds to a lambda of 0.5.
The square root transformation will be tried first to see if it solves the problem of normality,
and another regression will be run.

Regression Analysis: sqrt(sulfur) versus popsize, company, ...
The regression equation is
sqrt(sulfur) = 9.28 - 0.000841 popsize + 0.00101 company - 0.0814 Temp
+ 0.000000 popsize*company

Predictor                Coef        SE Coef        T       P
Constant                9.277          1.319     7.03   0.000
popsize            -0.0008412      0.0004891    -1.72   0.088
company             0.0010072      0.0005377     1.87   0.063
Temp                 -0.08139        0.02310    -3.52   0.001
popsize*company    0.00000046     0.00000025     1.86   0.065

S = 1.75357   R-Sq = 28.3%   R-Sq(adj) = 26.1%
PRESS = 425.569   R-Sq(pred) = 23.71%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        4  158.067  39.517  12.85  0.000
Residual Error  130  399.749   3.075
Total           134  557.816

The sum of squares for pure error is (nearly) zero.
Minitab cannot do the lack of fit test based on pure error.

Source                             DF     Seq SS
popsize                             1     22.473
company                             1     84.938
Temp                                1     40.014
popsize*company                     1     10.642

Unusual Observations
Obs  popsize  sqrt(sulfur)     Fit  SE Fit  Residual  St Resid
  8     3369        10.488  10.856   1.206    -0.368     -0.29 X
 23      179         9.695   5.430   0.250     4.265      2.46 R
 39      375         1.732   5.567   0.290    -3.835     -2.22 R
 44     1975         5.099   3.949   0.612     1.150      0.70 X
 75     3369        10.488  10.856   1.206    -0.368     -0.29 X
 90      179         9.695   5.430   0.250     4.265      2.46 R
100      181         8.893   5.043   0.239     3.850      2.22 R
107      375         1.721   5.567   0.290    -3.846     -2.22 R
111     1975         5.092   3.950   0.612     1.142      0.69 X
127     1174         1.641   5.868   0.327    -4.227     -2.45 R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 1.78859

Again, the first thing to look at is the t-tests for each individual x variable, with the
interaction terms first. The p-values for popsize, company, and popsize*company (0.088, 0.063,
and 0.065) are slightly above the 0.05 mark we would like. However, part of the data set was held
out for validation and could influence the outcome of these tests, so we will treat these close
tests as inconclusive and analyze the rest of the model. We will then check the validation data
set in a regression and, if that is good, run the model on the full data set to confirm the
assumptions made here. The F-test is strongly significant (F = 12.85, p = 0.000); however, the
R-sq value of 28.3% is lower than we would like, as it tells us how much of the variability of y
is accounted for by the model. Most of the models analyzed so far are in the area of 20 to 30%,
so unless the other models can beat this R-sq value and still meet all the assumptions, this will
be the best model, as long as the LINE assumptions check out and the full model and validation
model are good.

Residual Plots for sqrt(sulfur)
[Figure: four-panel residual plot. Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).]

Probability Plot of RESI3
[Figure: normal probability plot (Percent vs. RESI3). Mean 3.907985E-15, StDev 1.727, N 135, P-Value 0.300.]

Test for Equal Variances for RESI1
[Figure: 95% Bonferroni Confidence Intervals for StDevs, by value of sqrt(sulfur).]
Bartlett's Test: Test Statistic 33.86, P-Value 0.051
Levene's Test: Test Statistic 2.46, P-Value 0.002

The LINE assumptions are now to be tested. In the residual plots it appears that the only problem that
may be faced is equal variances, but all the assumptions will be confirmed with their individual tests.
From the versus fits plot we see that linearity is great. Independence is tested using the Durbin-Watson
statistic from the regression analysis, which needs to be around 2 to say that the residuals are
independent. In this case the Durbin-Watson statistic (1.79) is close enough to 2 to assume that the
residuals are independent. The next assumption is normality, which is tested using the Anderson-Darling
test. The null hypothesis of the test is that the residuals are normal, so the desired outcome is a
failure to reject the null. In this case the A-D test has a good value and the corresponding p-value
(0.300) is well above 0.05, which means we fail to reject the null hypothesis and say the residuals are
normal. The final test for the LINE assumptions is to test equal variances using Bartlett’s test.
Bartlett’s test works the same way as the A-D test, so again we do not want to reject the null. In this
case the p-value (0.051) is close to 0.05 but still above it, so at this point it will be concluded that
the variances are equal; this will be verified and looked at closely in the validation and full models.
All LINE assumptions are met, and now the validation model will be analyzed for a confirmation of this
model.
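
The independence check described above rests on the Durbin-Watson statistic, which compares consecutive residuals in observation order. As a minimal sketch of how that number is formed (the function name and the example residuals are hypothetical and are not taken from the project data):

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Numerator: squared differences between consecutive residuals.
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    # Denominator: sum of squared residuals.
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residuals, used only to show the call.
example_residuals = [0.5, -0.3, 0.8, -1.1, 0.2, 0.4, -0.6]
dw = durbin_watson(example_residuals)
print(f"Durbin-Watson = {dw:.3f}")
```

The statistic always lies between 0 and 4. Values near 2 are consistent with uncorrelated residuals, values toward 0 suggest positive serial correlation, and values toward 4 suggest negative serial correlation.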

Regression Analysis: sqrt(sulfur s) versus popsize s, company s, ...
The regression equation is
sqrt(sulfur s) = 12.9 - 0.00232 popsize s + 0.00011 company s - 0.139 temp s
+ 0.000001 popsize*company s

Predictor                     Coef        SE Coef         T        P
Constant                    12.909          2.661      4.85    0.000
popsize s                -0.002324       0.001402     -1.66    0.105
company s                 0.000110       0.001559      0.07    0.944
temp s                    -0.13862        0.04893     -2.83    0.007
popsize*company s       0.00000122     0.00000183      0.66    0.511

S = 1.73709   R-Sq = 25.9%   R-Sq(adj) = 18.5%
PRESS = 148.418   R-Sq(pred) = 8.85%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       4   42.132  10.533  3.49  0.015
Residual Error  40  120.700   3.017
Total           44  162.832

There are no replicates.
Minitab cannot do the lack of fit test based on pure error.

Source                  DF   Seq SS
popsize s                1    3.632
company s                1    9.152
temp s                   1   28.019
popsize*company s        1    1.328

Unusual Observations
Obs  popsize s  sqrt(sulfur s)    Fit  SE Fit  Residual  St Resid
 11        181           8.888  4.802   0.442     4.086      2.43R
 21       1014           7.000  6.300   1.083     0.700      0.52 X
 27       1513           5.916  4.553   1.020     1.363      0.97 X
 38        277           1.891  5.352   0.385    -3.461     -2.04R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 2.21927

The validation model regression shows a slight problem in the t-tests, which is that the slope of
the interaction term is not significant (p = 0.511). However, that could just be the smaller size
of the validation data set affecting the model, so for now it will be assumed to be okay, as the
rest of the regression (the F-test and the R-sq value) is good. We will look at the residuals for
the validation model and make sure they also meet the LINE assumptions.
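
The F-test and R-sq values referred to above all follow from the Analysis of Variance table. A small sketch, assuming only the sums of squares, degrees of freedom and PRESS printed for the validation model (the helper name `anova_summary` is made up for illustration):

```python
def anova_summary(ss_reg, ss_err, df_reg, df_err, press=None):
    """Recompute R-Sq, R-Sq(adj), F and R-Sq(pred) from ANOVA quantities."""
    ss_total = ss_reg + ss_err
    df_total = df_reg + df_err
    ms_reg = ss_reg / df_reg          # mean square for regression
    ms_err = ss_err / df_err          # mean square error
    summary = {
        "r_sq": ss_reg / ss_total,
        "r_sq_adj": 1 - ms_err / (ss_total / df_total),
        "f": ms_reg / ms_err,
    }
    if press is not None:
        # Predicted R-Sq uses PRESS in place of the residual sum of squares.
        summary["r_sq_pred"] = 1 - press / ss_total
    return summary


# Values copied from the validation model output above.
validation = anova_summary(42.132, 120.700, 4, 40, press=148.418)
print(f"R-Sq       = {100 * validation['r_sq']:.1f}%")
print(f"R-Sq(adj)  = {100 * validation['r_sq_adj']:.1f}%")
print(f"F          = {validation['f']:.2f}")
print(f"R-Sq(pred) = {100 * validation['r_sq_pred']:.2f}%")
```

Run on the validation numbers, this reproduces the 25.9%, 18.5%, 3.49 and 8.85% shown in the output. The gap between R-Sq and R-Sq(pred) is one way to see how much the small validation sample limits the model's predictive value.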

Residual Plots for sqrt(sulfur s)
[Figure: four-panel residual plot. Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).]

Probability Plot of RESI1
[Figure: normal probability plot (Percent vs. RESI1). Mean -3.34547E-15, StDev 1.656, N 45, P-Value 0.494.]

Test for Equal Variances for RESI1
[Figure: 95% Bonferroni Confidence Intervals for StDevs, by value of sqrt(sulfur s).]
Bartlett's Test: Test Statistic 5.59, P-Value 0.471
Levene's Test: Test Statistic 0.81, P-Value 0.585

Looking at the residual plots, it appears that no assumption is violated, but this will be
confirmed using the same tests as before. Linearity and independence are both supported by the
regression analysis: the Durbin-Watson statistic (2.22) is slightly above 2 but close enough to it
to treat the residuals as independent. The Anderson-Darling test again shows a great value for the
test statistic, and the p-value (0.494) also suggests that the null would not be rejected, meaning
the residuals are normally distributed. The last assumption to be met is equal variances, again
tested using Bartlett’s test. The Bartlett’s test output (p-value 0.471) is exactly what is needed
for verification of equal variances. Since all LINE assumptions are met, a full data set
regression will be run to determine if the model is the one we want to compare to other models for
a chance to be the best model for this data set.
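
To make the equal-variance check above more concrete, here is a minimal sketch of the standard Bartlett test statistic. The groups are hypothetical, and how the residuals are split into groups for this project is not something the sketch assumes. The statistic is compared with a chi-square distribution on k - 1 degrees of freedom, and a large p-value means equal variances are not rejected.

```python
import math
from statistics import variance


def bartlett_statistic(groups):
    """Return (test statistic, degrees of freedom) for Bartlett's test."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    variances = [variance(g) for g in groups]  # sample variances
    # Pooled variance, weighting each group by its degrees of freedom.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (
        sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)
    ) / (3 * (k - 1))
    return numerator / correction, k - 1


# Hypothetical groups: the second has a much larger spread than the first.
stat, df = bartlett_statistic([[1, 2, 3], [10, 20, 30]])
print(f"Bartlett statistic = {stat:.3f} on {df} df")
```

When every group has the same sample variance the statistic is zero, and it grows as the group variances diverge.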

Regression Analysis: sqrt(sulfur_1) versus popsize_1, company_1, ...
The regression equation is
sqrt(sulfur_1) = 9.88 - 0.00101 popsize_1 + 0.000770 company_1 - 0.0911 Temp_1
+ 0.000001 popsize*company_1

Predictor                 Coef     SE Coef      T      P
Constant                 9.885       1.172   8.43  0.000
popsize_1           -0.0010139   0.0004410  -2.30  0.023
company_1            0.0007703   0.0004624   1.67  0.098
Temp_1                -0.09110     0.02073  -4.39  0.000
popsize*company_1   0.00000057  0.00000023   2.53  0.012

S = 1.74719   R-Sq = 26.1%   R-Sq(adj) = 24.4%
PRESS = 561.890   R-Sq(pred) = 22.70%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        4  189.595  47.399  15.53  0.000
Residual Error  176  537.268   3.053
Total           180  726.863

The sum of squares for pure error is (nearly) zero.
Minitab cannot do the lack of fit test based on pure error.

Source             DF  Seq SS
popsize_1           1  16.340
company_1           1  90.910
Temp_1              1  62.869
popsize*company_1   1  19.475

Unusual Observations
Obs  popsize_1  sqrt(sulfur_1)     Fit  SE Fit  Residual  St Resid
 11       3369          10.488  10.857   1.197    -0.369     -0.29 X
 31        179           9.695   5.447   0.211     4.248      2.45R
 44        181           8.888   4.966   0.206     3.922      2.26R
 53        375           1.732   5.615   0.249    -3.883     -2.25R
 59       1975           5.099   3.664   0.561     1.435      0.87 X
 80       1174           1.732   5.624   0.285    -3.892     -2.26R
101       3369          10.488  10.857   1.197    -0.369     -0.29 X
121        179           9.695   5.447   0.211     4.248      2.45R
134        181           8.893   4.966   0.206     3.927      2.26R
143        375           1.721   5.615   0.249    -3.894     -2.25R
149       1975           5.092   3.666   0.561     1.426      0.86 X
170       1174           1.641   5.625   0.285    -3.984     -2.31R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 1.77597

Looking at the regression of the full data set model, the t-tests that were in question no longer
are: the interaction slope is now significant (p = 0.012). The slope for company_1 has a p-value
of 0.098, but the term stays in the model because it is part of the interaction. The F-test also
indicates the model is good. Again there is a slightly lower R-sq value than would be preferred,
but it is around the same percentage as the rest of the models analyzed, and the residuals will be
examined to make sure the model meets the LINE assumptions.
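
Each T value in the coefficient table is the coefficient divided by its standard error. The short sketch below recomputes them from the printed full-model output. The interaction term is left out because its coefficient is printed to only two significant digits, so the ratio of the printed values would not reproduce the reported T exactly.

```python
# (Coef, SE Coef) pairs copied from the full data set regression output.
full_model = {
    "Constant": (9.885, 1.172),
    "popsize_1": (-0.0010139, 0.0004410),
    "company_1": (0.0007703, 0.0004624),
    "Temp_1": (-0.09110, 0.02073),
}


def t_statistic(coef, se_coef):
    """t statistic for testing whether a slope (or intercept) is zero."""
    return coef / se_coef


t_values = {name: t_statistic(c, se) for name, (c, se) in full_model.items()}
for name, t in t_values.items():
    print(f"{name:10s} T = {t:6.2f}")
```

The p-value next to each T comes from a t distribution with the residual degrees of freedom (176 for the full model).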

Residual Plots for sqrt(sulfur_1)
[Figure: four-panel residual plot. Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).]

Probability Plot of RESI1
[Figure: normal probability plot (Percent vs. RESI1). Mean 3.874126E-15, StDev 1.728, N 181, P-Value 0.384.]

Test for Equal Variances for RESI1
[Figure: 95% Bonferroni Confidence Intervals for StDevs, by value of sqrt(sulfur_1).]
Bartlett's Test: Test Statistic 48.12, P-Value 0.005
Levene's Test: Test Statistic 0.92, P-Value 0.608

From the residual plots, it appears that only one thing could be a possible issue as far as the
LINE assumptions are concerned, and that is equal variances. Linearity is met and independence is
met, as seen in the regression analysis by the regression equation and the Durbin-Watson statistic
(1.78) respectively. The next assumption is normality, again tested by the Anderson-Darling test,
and the test concludes that the residuals are normal for the same reasons as before (p-value
0.384). To test the final assumption of equal variances, Levene’s test is used in place of
Bartlett’s test. The two tests serve the same purpose, and Levene’s test (p-value 0.608) does not
reject equal variances, whereas Bartlett’s test (p-value 0.005) does. This model meets all LINE
assumptions, so a best subsets analysis will determine other models to compare this one to, as
validation that this model is the best.

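
For comparison with Bartlett's test, the following is a minimal sketch of a Levene-type statistic. It uses the median-centered (Brown-Forsythe) form, which is an assumption about what the software computes, and the groups are hypothetical.

```python
from statistics import mean, median


def levene_statistic(groups):
    """Median-centered Levene statistic for equality of variances."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations of each observation from its group median.
    z = [[abs(y - median(g)) for y in g] for g in groups]
    z_group_means = [mean(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / n_total
    # Between-group and within-group variation of the deviations.
    between = sum(
        len(zi) * (zm - z_grand_mean) ** 2 for zi, zm in zip(z, z_group_means)
    )
    within = sum(
        (v - zm) ** 2 for zi, zm in zip(z, z_group_means) for v in zi
    )
    return (n_total - k) / (k - 1) * between / within


# Hypothetical groups: the second is twice as spread out as the first.
w = levene_statistic([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]])
print(f"Levene statistic = {w:.4f}")
```

The statistic is referred to an F distribution with k - 1 and N - k degrees of freedom. Because it works with absolute deviations from the median, it is less sensitive to non-normal data than Bartlett's test, which is one reason the two tests can disagree on the same residuals.
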
Best Subsets Regression: Sulfur versus Temp, company, ...
Response is Sulfur
[Column headings for the 14 candidate predictors were printed vertically and are not legible in this copy. An X marks a predictor included in the model.]
                                                              Mallows
Vars                        R-Sq       R-Sq(adj)                    Cp                  S          Predictors included (X)
1                        26.2            25.6                  35.1             19.323                        X
1                        24.7            24.1                  38.4             19.512                                                        X
2                        31.8            30.8                  24.3             18.633                    X                                   X
2                        31.7            30.7                  24.6             18.646                    X X
3                        35.0            33.6                  19.1             18.260                    X                         X             X
3                        34.9            33.4                  19.5             18.281                    X   X                     X
4                        37.2            35.2                  16.4             18.029                    X   X                 X   X
4                        36.7            34.8                  17.3             18.090                    X X                       X             X
5                        39.2            36.9                  13.8             17.801                    X X X                 X   X
5                        39.0            36.7                  14.1             17.826                    X   X               X X   X
6                        41.7            39.0                  10.2             17.502                    X               X     X   X             X   X
6                        40.3            37.5                  13.3             17.707                    X   X           X         X             X   X
7                        43.4            40.3                   8.3             17.311                                      X   X X X             X   X   X
7                        43.4            40.2                   8.4             17.319                    X                 X   X   X             X   X   X
8                        44.7            41.2                   7.4             17.176                    X               X X   X   X             X   X X
8   44.3        40.8       8.3   17.237    X         X     X       X       X   X   X       X
9   46.1        42.2       6.4   17.035    X   X     X     X       X       X   X   X   X
9   46.0        42.1       6.5   17.043    X         X     X   X   X       X   X   X   X
10   46.6        42.2       7.2   17.024    X   X     X     X       X       X   X   X   X   X
10   46.5        42.2       7.4   17.033    X         X     X   X   X       X   X   X   X   X
11   46.6        41.8       9.1   17.083    X   X     X     X       X   X   X   X   X   X   X
11   46.6        41.8       9.1   17.086    X   X   X X     X       X       X   X   X   X   X
12   46.6        41.4      11.0   17.149    X   X     X     X   X   X   X   X   X   X   X   X
12   46.6        41.4      11.0   17.149    X   X X   X     X       X   X   X   X   X   X   X
13   46.7        40.9      13.0   17.218    X   X X   X     X   X   X   X   X   X   X   X   X
13   46.6        40.9      13.0   17.219    X   X   X X     X   X   X   X   X   X   X   X   X
14   46.7        40.4      15.0   17.289    X   X X X X     X   X   X   X   X   X   X   X   X

The best subsets regression gives us 2 models of each size, up to the model using all the
predictor variables. To figure out which ones are the best, the R-Sq value and the Mallows Cp will
be examined. The higher the R-Sq and the lower the Mallows Cp, the better the model is for the
data. The top five models will be examined further and compared to the model above for the best
model for the data set. The five models are both 9-variable models, both 10-variable models, and
the 8-variable model with the lower Mallows Cp and higher R-Sq.
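
The selection rule described above (low Mallows Cp first, then high R-Sq) can be written as a short ranking step. This is only a sketch: the tuples copy the Vars, R-Sq and Cp columns from the best subsets output, and breaking ties on Cp by R-Sq is an assumption made for illustration.

```python
# (number of variables, R-Sq in %, Mallows Cp) from the best subsets output.
candidates = [
    (1, 26.2, 35.1), (1, 24.7, 38.4), (2, 31.8, 24.3), (2, 31.7, 24.6),
    (3, 35.0, 19.1), (3, 34.9, 19.5), (4, 37.2, 16.4), (4, 36.7, 17.3),
    (5, 39.2, 13.8), (5, 39.0, 14.1), (6, 41.7, 10.2), (6, 40.3, 13.3),
    (7, 43.4, 8.3), (7, 43.4, 8.4), (8, 44.7, 7.4), (8, 44.3, 8.3),
    (9, 46.1, 6.4), (9, 46.0, 6.5), (10, 46.6, 7.2), (10, 46.5, 7.4),
    (11, 46.6, 9.1), (11, 46.6, 9.1), (12, 46.6, 11.0), (12, 46.6, 11.0),
    (13, 46.7, 13.0), (13, 46.6, 13.0), (14, 46.7, 15.0),
]

# Sort by Cp ascending, then by R-Sq descending to break ties.
ranked = sorted(candidates, key=lambda row: (row[2], -row[1]))
top_five = ranked[:5]

for n_vars, r_sq, cp in top_five:
    print(f"{n_vars:2d} variables  R-Sq = {r_sq:.1f}%  Cp = {cp:.1f}")
```

With these numbers the five lowest Cp values belong to the two 9-variable models, the two 10-variable models, and the 8-variable model with R-Sq of 44.7%, which matches the five models named in the text.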

The first model to be analyzed includes the variables below. The model will also have to include
any non-interaction terms that appear in the interactions.
1) Temp, precip, daysprecip, Temp*popsize, popsize*company, popsize*wind, wind*precip,
wind*daysprecip

Regression Analysis: Sulfur versus Temp, company, ...
The regression equation is
Sulfur = 82.8 - 1.27 Temp + 0.00934 company - 3.42 precip + 0.768 daysprecip
+ 0.00105 Temp*popsize + 0.0055 popsize + 0.000012 popsize*company
- 0.00839 popsize*wind + 0.30 wind + 0.362 wind*precip
- 0.0682 wind*daysprecip

Predictor                Coef       SE Coef           T       P
Constant                82.77         51.47        1.61   0.110
Temp                  -1.2653        0.5035       -2.51   0.013
company              0.009338      0.005501        1.70   0.092
precip                 -3.416         1.104       -3.09   0.002
daysprecip             0.7682        0.4001        1.92   0.057
Temp*popsize        0.0010451     0.0007652        1.37   0.175
popsize               0.00548       0.05590        0.10   0.922
popsize*company    0.00001215    0.00000297        4.09   0.000
popsize*wind        -0.008386      0.003570       -2.35   0.020
wind                    0.297         4.730        0.06   0.950
wind*precip            0.3625        0.1164        3.11   0.002
wind*daysprecip      -0.06822       0.04159       -1.64   0.104

S = 17.1725   R-Sq = 46.1%   R-Sq(adj) = 41.2%
PRESS = 43799.3   R-Sq(pred) = 34.86%

Analysis of Variance
Source           DF      SS           MS      F           P
Regression       11 30971.5       2815.6   9.55       0.000
Residual Error 123 36272.1         294.9
Total           134 67243.6

The sum of squares for pure error is (nearly) zero.
Minitab cannot do the lack of fit test based on pure error.

Source             DF    Seq SS
Temp                1    6911.2
company             1   14437.1
precip              1       7.9
daysprecip          1     612.1
Temp*popsize        1     111.5
popsize             1     363.1
popsize*company     1    4663.1
popsize*wind        1     222.2
wind                1     781.0
wind*precip         1    2068.9
wind*daysprecip     1     793.3

Unusual Observations
Obs  Temp  Sulfur     Fit  SE Fit  Residual  St Resid
  1  70.3   10.00   19.93   12.10     -9.93     -0.82 X
  8  50.6  110.00  112.15   11.82     -2.15     -0.17 X
 23  50.0   94.00   44.88    3.98     49.12      2.94R
 49  49.5   66.00   30.63    6.88     35.37      2.25R
 75  50.6  110.00  112.15   11.82     -2.15     -0.17 X
 90  50.0   94.00   44.88    3.98     49.12      2.94R
100  56.6   79.09   37.09    4.30     42.00      2.53R
109  60.3   16.72   17.48    9.67     -0.76     -0.05 X

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 1.74534

Looking at the regression analysis, the t-tests will again be analyzed first to determine whether
any interaction term slopes are not important to the model. From the t-tests, it can be concluded
that only three interaction terms should be in the regression analysis: popsize*company,
popsize*wind, and wind*precip. The regression will be rerun with the same non-interaction terms
and these three interaction terms.
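
The screening step above amounts to keeping the interaction terms whose t-test p-value falls below the significance level. A small sketch, assuming a 0.05 cutoff (the cutoff is not stated explicitly in this part of the report) and using the p-values printed in the regression output:

```python
ALPHA = 0.05  # assumed significance level

# P column for the interaction terms in the first best subsets model.
interaction_p_values = {
    "Temp*popsize": 0.175,
    "popsize*company": 0.000,
    "popsize*wind": 0.020,
    "wind*precip": 0.002,
    "wind*daysprecip": 0.104,
}

# Keep the interactions whose slope is significantly different from zero.
kept = [term for term, p in interaction_p_values.items() if p < ALPHA]
dropped = [term for term, p in interaction_p_values.items() if p >= ALPHA]

print("keep:", ", ".join(kept))
print("drop:", ", ".join(dropped))
```

Only the interaction terms are screened this way. The main effects stay in the model whenever they appear in a retained interaction, even if their own p-values are large, as with popsize and wind here.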

Regression Analysis: Sulfur versus Temp, company, ...
The regression equation is
Sulfur = 104 - 0.630 Temp + 0.00850 company - 2.80 precip + 0.119 daysprecip
+ 0.0553 popsize - 5.67 wind + 0.000011 popsize*company
- 0.00741 popsize*wind + 0.300 wind*precip

Predictor                Coef        SE Coef       T       P
Constant               103.75          38.11    2.72   0.007
Temp                  -0.6296         0.2462   -2.56   0.012
company              0.008501       0.005560    1.53   0.129
precip                -2.7962         0.8864   -3.15   0.002
daysprecip            0.11869        0.06137    1.93   0.055
popsize               0.05531        0.03323    1.66   0.098
wind                   -5.670          3.730   -1.52   0.131
popsize*company    0.00001059     0.00000282    3.75   0.000
popsize*wind        -0.007409       0.003563   -2.08   0.040
wind*precip           0.29964        0.09402    3.19   0.002

S = 17.4170   R-Sq = 43.6%   R-Sq(adj) = 39.5%
PRESS = 44558.4   R-Sq(pred) = 33.74%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        9  29324.5  3258.3  10.74  0.000
Residual Error  125  37919.1   303.4
Total           134  67243.6

The sum of squares for pure error is (nearly) zero.
Minitab cannot do the lack of fit test based on pure error.

Source           DF   Seq SS
Temp              1   6911.2
company           1  14437.1
precip            1      7.9
daysprecip        1    612.1
popsize           1    155.4
wind              1      3.6
popsize*company   1   3155.5
popsize*wind      1    960.6
wind*precip       1   3081.3

Unusual Observations
Obs   Temp   Sulfur      Fit   SE Fit   Residual   St Resid
  1   70.3    10.00    32.14    10.91     -22.14      -1.63 X
  8   50.6   110.00   112.90    11.99      -2.90      -0.23 X
 18   47.1    11.00    44.99     5.19     -33.99      -2.04R
 23   50.0    94.00    42.65     3.88      51.35       3.02R
 49   49.5    66.00    31.29     6.70      34.71       2.16R
 58   57.2     7.00    40.90     6.00     -33.90      -2.07R
 75   50.6   110.00   112.90    11.99      -2.90      -0.23 X
 86   47.1    11.00    44.99     5.19     -33.99      -2.04R
 90   50.0    94.00    42.65     3.88      51.35       3.02R
100   56.6    79.09    39.24     4.26      39.84       2.36R
109   60.3    16.72    21.73     9.23      -5.02      -0.34 X

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 1.80550

From the regression analysis, it can be concluded that all three interaction terms
(popsize*company, popsize*wind, and wind*precip) are significant at the 0.05 level and belong in
the model. The overall F-test is significant (F = 10.74, P = 0.000) and the R-Sq value of 43.6% is
about the same as in the previous models analyzed. The residuals now need to be analyzed to make
sure the LINE assumptions are met.
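
The R-Sq, R-Sq(adj), and F values quoted above follow directly from the Analysis of Variance
table. As a rough check, the sketch below recomputes them in Python from the sums of squares
reported by Minitab. The function name fit_summary and its argument names are illustrative
assumptions and are not part of the original analysis.

```python
def fit_summary(ss_regression, ss_error, ss_total, n_obs, n_predictors):
    """Recompute R-Sq, R-Sq(adj) and the overall F statistic from ANOVA sums of squares."""
    df_error = n_obs - n_predictors - 1   # residual degrees of freedom
    df_total = n_obs - 1                  # total degrees of freedom
    r_sq = ss_regression / ss_total
    r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)
    f_stat = (ss_regression / n_predictors) / (ss_error / df_error)
    return {"r_sq": r_sq, "r_sq_adj": r_sq_adj, "f_stat": f_stat}


# Values copied from the Minitab ANOVA table for the untransformed Sulfur model.
summary = fit_summary(
    ss_regression=29324.5,
    ss_error=37919.1,
    ss_total=67243.6,
    n_obs=135,
    n_predictors=9,
)

print(f"R-Sq      = {summary['r_sq'] * 100:.1f}%")
print(f"R-Sq(adj) = {summary['r_sq_adj'] * 100:.1f}%")
print(f"F         = {summary['f_stat']:.2f}")
```

The recomputed values agree with the Minitab output (43.6%, 39.5%, and 10.74), which confirms that
the table was transcribed consistently.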

Residual Plots for Sulfur
[Figure: four-panel residual plots (Normal Probability Plot, Versus Fits, Histogram, Versus Order)]

Probability Plot of RESI4
[Figure: normal probability plot of RESI4. Mean = 1.997415E-14, StDev = 16.82, N = 135, P-Value = 0.144]

Test for Equal Variances for RESI4
[Figure: 95% Bonferroni Confidence Intervals for StDevs, grouped by Sulfur]
Bartlett's Test: Test Statistic = 35.31, P-Value = 0.036
Levene's Test: Test Statistic = 2.19, P-Value = 0.007

From the regression analysis, linearity and independence appear to be met, judging by the
regression equation and the Durbin-Watson statistic of about 2 (1.81), respectively. The next
assumptions to be tested are normality and equal variances. The normality of the residuals is
again supported by the Anderson-Darling test (P = 0.144). Unfortunately, the variances of the
residuals are not equal according to Bartlett's test (P = 0.036), so a transformation will be
applied to the sulfur variable. The same square root transformation used above will be tested first.
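
As a small illustration of the square root transformation, the sketch below applies it to a few
Sulfur values taken from the Unusual Observations table of the untransformed model. The variable
names are assumptions made for this example only.

```python
import math

# Sulfur values for observations 1, 8, 18, 23 and 58, taken from the
# Unusual Observations table of the untransformed model.
sulfur = {1: 10.00, 8: 110.00, 18: 11.00, 23: 94.00, 58: 7.00}

# Apply the square root transformation to each response value.
sqrt_sulfur = {obs: math.sqrt(value) for obs, value in sulfur.items()}

for obs, value in sqrt_sulfur.items():
    print(f"Obs {obs}: sulfur = {sulfur[obs]:.2f}, sqrt(sulfur) = {value:.3f}")
```

The transformation compresses large responses far more than small ones (110 becomes about 10.5
while 10 becomes about 3.2), which is why it is a common first attempt at stabilizing the variance.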

Regression Analysis: sqrt(sulfur) versus Temp, company, ...
The regression equation is
sqrt(sulfur) = 13.2 - 0.0703 Temp + 0.000862 company + 0.00515 popsize
- 0.589 wind - 0.272 precip + 0.0110 daysprecip
+ 0.000001 popsize*company - 0.000657 popsize*wind
+ 0.0289 wind*precip

Predictor                Coef       SE Coef       T       P
Constant               13.166         3.706    3.55   0.001
Temp                 -0.07033       0.02394   -2.94   0.004
company             0.0008617     0.0005407    1.59   0.114
popsize              0.005147      0.003231    1.59   0.114
wind                  -0.5890        0.3628   -1.62   0.107
precip               -0.27248       0.08621   -3.16   0.002
daysprecip           0.011009      0.005969    1.84   0.067
popsize*company    0.00000069    0.00000027    2.52   0.013
popsize*wind       -0.0006567     0.0003465   -1.90   0.060
wind*precip          0.028947      0.009143    3.17   0.002

S = 1.69384   R-Sq = 35.7%   R-Sq(adj) = 31.1%
PRESS = 423.103   R-Sq(pred) = 24.15%

Analysis of Variance
Source           DF       SS       MS      F       P
Regression        9  199.179   22.131   7.71   0.000
Residual Error  125  358.637    2.869
Total           134  557.816

The sum of squares for pure error is (nearly) zero.
Minitab cannot do the lack of fit test based on pure error.

Source             DF   Seq SS
Temp                1   67.683
company             1   78.110
popsize             1    1.632
wind                1    0.000
precip              1    0.024
daysprecip          1    5.041
popsize*company     1   10.620
popsize*wind        1    7.313
wind*precip         1   28.757

Unusual Observations
Obs   Temp   sqrt(sulfur)      Fit   SE Fit   Residual   St Resid
  1   70.3          3.162    5.359    1.061     -2.196      -1.66 X
  8   50.6         10.488   10.800    1.166     -0.312      -0.25 X
 18   47.1          3.317    6.574    0.505     -3.257      -2.01R
 23   50.0          9.695    6.264    0.377      3.431       2.08R
 39   45.6          1.732    5.202    0.525     -3.470      -2.15R
 58   57.2          2.646    6.136    0.584     -3.490      -2.19R
 75   50.6         10.488   10.800    1.166     -0.312      -0.25 X
 86   47.1          3.317    6.574    0.505     -3.257      -2.01R
 90   50.0          9.695    6.264    0.377      3.431       2.08R
107   45.6          1.721    5.207    0.525     -3.486      -2.16R
109   60.3          4.089    4.171    0.898     -0.082      -0.06 X
127   53.3          1.641    5.401    0.378     -3.761      -2.28R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 1.82914

The regression analysis with the square root transformation of the sulfur variable shows a few
predictors with P-values above 0.05 that should technically be taken out of the model. However,
several of these tests are close to the cutoff, and because this fit uses only part of the data
set, the results could change in the full model. The rest of the analysis (the F-test and the
R-Sq value) is good, so the residuals will be analyzed to make sure the LINE assumptions are met.
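
To make the screening step concrete, the sketch below lists which predictors in the sqrt(sulfur)
model have P-values at or above a cutoff. The 0.05 cutoff and the variable names are assumptions
for illustration; the P-values themselves are copied from the Minitab coefficient table.

```python
ALPHA = 0.05  # conventional significance cutoff, assumed here

# P-values copied from the Minitab coefficient table for the sqrt(sulfur) model.
p_values = {
    "Temp": 0.004,
    "company": 0.114,
    "popsize": 0.114,
    "wind": 0.107,
    "precip": 0.002,
    "daysprecip": 0.067,
    "popsize*company": 0.013,
    "popsize*wind": 0.060,
    "wind*precip": 0.002,
}

# Predictors whose individual t-test does not reject a zero slope at ALPHA.
not_significant = [name for name, p in p_values.items() if p >= ALPHA]
print(not_significant)
```

Of the five predictors flagged, daysprecip (0.067) and popsize*wind (0.060) are the ones closest
to the cutoff.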

Residual Plots for sqrt(sulfur)
[Figure: four-panel residual plots (Normal Probability Plot, Versus Fits, Histogram, Versus Order)]

Probability Plot of RESI2
[Figure: normal probability plot of RESI2. Mean = 4.291218E-15, StDev = 1.636, N = 135, P-Value = 0.278]

Test for Equal Variances for RESI2
[Figure: 95% Bonferroni Confidence Intervals for StDevs, grouped by sqrt(sulfur)]
Bartlett's Test: Test Statistic = 36.74, P-Value = 0.025
Levene's Test: Test Statistic = 2.13, P-Value = 0.009

The LINE assumptions appear to be met in the residual plots, but this will again be confirmed
using their respective tests. The only possible issue is again the variances of the residuals.
Linearity and independence are acceptable, as seen from the regression equation and the
Durbin-Watson statistic (1.83), respectively. The normality of the residuals is again tested
using the Anderson-Darling test, which passes (P = 0.278), so the residuals are taken to be
normal. The final test suggests the variances are unequal (Bartlett's test, P = 0.025); however,
the result is close to the cutoff, so the validation model and full model will be examined to
check whether the variances are equal there. The validation regression will be run first.
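
For reference, the Durbin-Watson statistic used for the independence check is the sum of squared
successive differences of the residuals divided by the sum of squared residuals. The sketch below
is a minimal Python version of that formula. The function name and the two residual series are
made up for illustration and are not the RESI2 values from this analysis.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals for illustration only.
alternating = [1.0, -1.0, 1.0, -1.0]  # sign flips every step (negative autocorrelation)
constant = [1.0, 1.0, 1.0, 1.0]       # no change at all (positive autocorrelation)

print(durbin_watson(alternating))  # 3.0
print(durbin_watson(constant))     # 0.0
```

The statistic ranges from 0 to 4, and values near 2, such as the 1.83 reported above, suggest
little first-order autocorrelation in the residuals.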

Regression Analysis: sqrt(sulfur s) versus temp s, company s, ...
The regression equation is
sqrt(sulfur s) = 6.72 - 0.0596 temp s - 0.00176 company s + 0.00718 popsize s
- 0.080 wind s - 0.163 precip s + 0.0393 daysprecip s
+ 0.000004 popsize*company s - 0.00111 popsize*wind s
+ 0.0139 wind*precip s

Predictor                  Coef       SE Coef         T        P
Constant                  6.725         5.625      1.20    0.240
temp s                 -0.05959       0.05903     -1.01    0.320
company s             -0.001757      0.001868     -0.94    0.353
popsize s              0.007183      0.006964      1.03    0.309
wind s                  -0.0796        0.5269     -0.15    0.881
precip s                -0.1631        0.1432     -1.14    0.262
daysprecip s            0.03932       0.01211      3.25    0.003
popsize*company s    0.00000359    0.00000214      1.68    0.102
popsize*wind s       -0.0011055     0.0007830     -1.41    0.167
wind*precip s           0.01387       0.01446      0.96    0.344

S = 1.57438   R-Sq = 46.7%   R-Sq(adj) = 33.0%
PRESS = 158.996   R-Sq(pred) = 2.36%

Analysis of Variance
Source          DF       SS         MS      F       P
Regression       9   76.078      8.453   3.41   0.004
Residual Error 35    86.754      2.479
Total           44 162.832

There are no replicates.
Minitab cannot do the lack of fit test based on pure error.

Source               DF   Seq SS
temp s                1   31.624
company s             1    0.430
popsize s             1    8.750
wind s                1    5.183
precip s              1    0.879
daysprecip s          1   19.940
popsize*company s     1    3.729
popsize*wind s        1    3.265
wind*precip s         1    2.279

Unusual Observations
Obs   temp s   sqrt(sulfur s)     Fit   SE Fit   Residual   St Resid
 39     49.5            8.118   5.145    0.916      2.973       2.32R
 45     58.7            2.598   6.583    0.951     -3.984      -3.18R

R denotes an observation with a large standardized residual.

Durbin-Watson statistic = 1.93603
Looking at the regression analysis first, most of the t-tests are not significant. However, that
could just be because the particular subset picked out of the data set for validation is small
(45 observations) and may not behave like the data set as a whole, since the rest of the
regression appears to be good and the R-Sq is fine compared to the other analyses run so far. The
residuals need to be looked at to see if the full model will be an appropriate one to analyze.
Residual Plots for sqrt(sulfur s)
[Figure: four-panel residual plots (Normal Probability Plot, Versus Fits, Histogram, Versus Order)]

Probability Plot of RESI1
[Figure: normal probability plot of RESI1. Mean = -2.67440E-15, StDev = 1.404, N = 45, P-Value = 0.050]

Test for Equal Variances for RESI1
[Figure: 95% Bonferroni Confidence Intervals for StDevs, grouped by sqrt(sulfur s)]
Bartlett's Test: Test Statistic = 5.03, P-Value = 0.541
Levene's Test: Test Statistic = 3.42, P-Value = 0.048

All of the LINE assumptions appear to be met in the residual plots, with normality the only one
that might not be. Linearity and independence are again met, judging by the regression equation
and the Durbin-Watson statistic (1.94). Normality is on the borderline between pass and fail
(P = 0.050), which could just be due to the small size of the data set. The variances are equal
according to Bartlett's test (P = 0.541), although Levene's test is borderline (P = 0.048), so the
assumptions are considered adequately met and the full model should be run.
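
The borderline results for the validation residuals can be laid out explicitly. The sketch below
applies a simple reject/do-not-reject rule to the P-values read from the Minitab plots. The 0.05
level and the helper name rejects_null are assumptions for illustration.

```python
ALPHA = 0.05  # assumed significance level


def rejects_null(p_value, alpha=ALPHA):
    """Return True when the test rejects its null hypothesis at level alpha."""
    return p_value < alpha


# P-values read from the Minitab plots for the validation residuals (RESI1).
validation_tests = {
    "Anderson-Darling (normality)": 0.050,
    "Bartlett (equal variances)": 0.541,
    "Levene (equal variances)": 0.048,
}

decisions = {name: rejects_null(p) for name, p in validation_tests.items()}
for name, rejected in decisions.items():
    print(f"{name}: {'reject' if rejected else 'do not reject'}")
```

Because Minitab displays P-values rounded to three decimals, a displayed 0.050 could lie slightly
on either side of the cutoff, so the normality result is best read as borderline rather than as a
clear pass.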

Regression Analysis: sqrt(sulfur_ versus wind*precip_, popsize*wind, ...
The regression equation is
sqrt(sulfur_1) = 13.5 + 0.0271 wind*precip_1 - 0.000618 popsize*wind_1
+ 0.000001 popsize*company_1 + 0.0156 daysprecip_1
- 0.257 precip_1 - 0.616 wind_1 + 0.00467 popsize_1
+ 0.000654 company_1 - 0.0779 Temp_1

Predictor                  Coef       SE Coef       T       P
Constant                 13.453         2.952    4.56   0.000
wind*precip_1          0.027092      0.007558    3.58   0.000
popsize*wind_1       -0.0006184     0.0003051   -2.03   0.044
popsize*company_1    0.00000078    0.00000025    3.09   0.002
daysprecip_1           0.015627      0.005207    3.00   0.003
precip_1               -0.25748       0.07178   -3.59   0.000
wind_1                  -0.6159        0.2888   -2.13   0.034
popsize_1              0.004669      0.002841    1.64   0.102
company_1             0.0006535     0.0004663    1.40   0.163
Temp_1                 -0.07793       0.02121   -3.67   0.000
S = 1.66820   R-Sq = 34.5%   R-Sq(adj) = 31.1%
PRESS = 534.539   R-Sq(pred) = 26.46%

Analysis of Variance
Source           DF      SS            MS        F        P
Regression        9 250.990        27.888    10.02    0.000
Residual Error 171 475.873          2.783
Total           180 726.863

The sum of squares for pure error is (nearly) zero.
Minitab cannot do the lack of fit test based on pure error.

Source                DF    Seq SS
wind*precip_1          1     0.181
popsize*wind_1         1    17.201
popsize*company_1      1   100.163
daysprecip_1           1    23.446
precip_1               1    23.071
wind_1                 1    22.712
popsize_1              1    17.649
company_1              1     9.007
Temp_1                 1    37.561

Unusual Observations
Obs wind*precip_1 sqrt(sulfur_1)             Fit     SE Fit   Residual   St Resid
1             42         3.162           4.965      0.943     -1.802      -1.31 X
11            358        10.488          10.829      1.144     -0.341      -0.28 X
25            448         3.317           6.579      0.440     -3.262      -2.03R
31            453         9.695           6.184      0.329      3.512       2.15R
53            564         1.732           5.290      0.458     -3.558      -2.22R
56            183         4.123           3.870      0.704      0.253       0.17 X
67            415         5.916           2.504      0.496      3.412       2.14R
78            607         2.646           5.866      0.500     -3.220      -2.02R
80            350         1.732           5.051      0.332     -3.319      -2.03R
101            358        10.488          10.829      1.144     -0.341      -0.28 X
115            448         3.317           6.579      0.440     -3.262      -2.03R
121            453         9.695           6.184      0.329      3.512       2.15R
143            565         1.721           5.292      0.458     -3.571      -2.23R
146            182         4.089           3.888      0.699      0.201       0.13 X
157            413         5.905           2.520      0.489      3.386       2.12R
168            605         2.615           5.852      0.493     -3.238      -2.03R
170            350         1.641           5.055      0.332     -3.415      -2.09R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 1.89774

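The regression analysis of the full model shows that all three interaction terms and most of the
other slopes are significant; only popsize (P = 0.102) and company (P = 0.163) are not, and both
appear in significant interaction terms. The assumption from above is therefore confirmed, and the
rest of the analysis shows good results, so the residuals will be analyzed to see if this model
should be included in the final debate of best model.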
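
To show how the fitted full model would be used, the sketch below evaluates the regression
equation in Python and squares the result to return to the original sulfur dioxide scale. The
coefficients are copied from the Minitab coefficient table above; the function name and the choice
of input (the sample means from the Descriptive Statistics table at the start of the report) are
assumptions for illustration.

```python
# Coefficients copied from the Minitab coefficient table for the full model.
COEF = {
    "const": 13.453,
    "wind*precip": 0.027092,
    "popsize*wind": -0.0006184,
    "popsize*company": 0.00000078,
    "daysprecip": 0.015627,
    "precip": -0.25748,
    "wind": -0.6159,
    "popsize": 0.004669,
    "company": 0.0006535,
    "Temp": -0.07793,
}


def predict_sqrt_sulfur(temp, company, popsize, wind, precip, daysprecip):
    """Evaluate the fitted equation for sqrt(sulfur), including interaction terms."""
    terms = {
        "wind*precip": wind * precip,
        "popsize*wind": popsize * wind,
        "popsize*company": popsize * company,
        "daysprecip": daysprecip,
        "precip": precip,
        "wind": wind,
        "popsize": popsize,
        "company": company,
        "Temp": temp,
    }
    return COEF["const"] + sum(COEF[name] * value for name, value in terms.items())


# Illustrative input: the sample means from the Descriptive Statistics table.
sqrt_pred = predict_sqrt_sulfur(
    temp=54.925, company=487.8, popsize=601.6,
    wind=9.388, precip=36.691, daysprecip=113.67,
)
sulfur_pred = sqrt_pred ** 2  # back-transform to the sulfur dioxide scale

print(f"Predicted sqrt(sulfur): {sqrt_pred:.3f}")
print(f"Back-transformed sulfur: {sulfur_pred:.2f}")
```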
Residual Plots for sqrt(sulfur_1)
[Figure: four-panel residual plots (Normal Probability Plot, Versus Fits, Histogram, Versus Order)]

Probability Plot of RESI1
[Figure: normal probability plot of RESI1. Mean = 2.869405E-15, StDev = 1.626, N = 181, P-Value = 0.022]

Test for Equal Variances for RESI1
[Figure: 95% Bonferroni Confidence Intervals for StDevs, grouped by sqrt(sulfur_1)]
Bartlett's Test: Test Statistic = 48.26, P-Value = 0.005
Levene's Test: Test Statistic = 1.05, P-Value = 0.415

The LINE assumptions appear to be met in the residual plots, with the possible exception of
normality. Linearity and independence are good, as given by the regression equation and the
Durbin-Watson statistic (1.90) in the regression analysis. The variances are equal according to
Levene's test on the residuals (P = 0.415). Normality is slightly off (Anderson-Darling
P = 0.022); however, since this is the only thing wrong with the model, it will still be
considered in the final debate for best model.
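
One more quantity that may help in the final debate is R-Sq(pred), which Minitab computes as
1 - PRESS / SS Total. The sketch below recomputes it for the three sqrt(sulfur) fits and compares
it with the ordinary R-Sq. The labels "training", "validation", and "full" and the function name
are assumptions for illustration; the PRESS, SS Total, and R-Sq values are copied from the output
above.

```python
def r_sq_pred(press, ss_total):
    """Predicted R-squared: 1 - PRESS / SS Total."""
    return 1 - press / ss_total


# PRESS, total sum of squares and R-Sq copied from the three Minitab outputs.
fits = {
    "training (n = 135)": {"press": 423.103, "ss_total": 557.816, "r_sq": 0.357},
    "validation (n = 45)": {"press": 158.996, "ss_total": 162.832, "r_sq": 0.467},
    "full (n = 181)": {"press": 534.539, "ss_total": 726.863, "r_sq": 0.345},
}

predicted = {}
gaps = {}
for name, fit in fits.items():
    predicted[name] = r_sq_pred(fit["press"], fit["ss_total"])
    gaps[name] = fit["r_sq"] - predicted[name]  # drop from R-Sq to R-Sq(pred)
    print(f"{name}: R-Sq(pred) = {predicted[name] * 100:.2f}%, gap = {gaps[name] * 100:.1f} points")
```

The gap is about ten points or less for the training and full fits but more than forty points for
the validation fit, which may indicate that a ten-term model is unstable with only 45 observations.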

The second choice from the best subset regression will now be analyzed with the following
variables and the mono-x variables as needed:
2) Temp, company, precip, days precip, temp*popsize, popsize*company, popsize*wind,
wind*precip, wind*daysprecip

Regression Analysis: Sulfur versus Temp, precip, ...
The regression equation is
Sulfur = 56.0 - 0.729 Temp - 3.89 precip + 0.857 daysprecip + 0.0675 popsize
- 0.02 wind + 0.000157 Temp*company + 0.000011 popsize*company
- 0.00867 popsize*wind + 0.413 wind*precip - 0.0776 wind*daysprecip

Predictor                Coef       SE Coef       T       P
Constant                56.02         46.53    1.20   0.231
Temp                  -0.7293        0.2429   -3.00   0.003
precip                 -3.893         1.049   -3.71   0.000
daysprecip             0.8571        0.3945    2.17   0.032
popsize               0.06745       0.03334    2.02   0.045
wind                   -0.016         4.748   -0.00   0.997
Temp*company       0.00015651    0.00009800    1.60   0.113
popsize*company    0.00001096    0.00000268    4.09   0.000
popsize*wind        -0.008673      0.003570   -2.43   0.017
wind*precip            0.4135        0.1105    3.74   0.000
wind*daysprecip      -0.07756       0.04099   -1.89   0.061

S = 17.2306     R-Sq = 45.3%      R-Sq(adj) = 40.8%

PRESS = 43724.2       R-Sq(pred) = 34.98%

Analysis of Variance
Source           DF      SS            MS       F        P
Regression       10 30428.6        3042.9   10.25    0.000
Residual Error 124 36815.0          296.9
Total           134 67243.6

The sum of squares for pure error is (nearly) zero.
Minitab cannot do the lack of fit test based on pure error.

Source             DF   Seq SS
Temp                1   6911.2
precip              1    240.8
daysprecip          1    607.0
popsize             1   4628.4
wind                1      0.0
Temp*company        1   9248.5
popsize*company     1   3637.7
popsize*wind        1    988.6
wind*precip         1   3103.7
wind*daysprecip     1   1062.7

Unusual Observations

Obs   Temp   Sulfur      Fit     SE Fit   Residual   St Resid
1   70.3    10.00    21.48      12.06     -11.48      -0.93 X
8   50.6   110.00   112.51      11.86      -2.51      -0.20 X
23   50.0    94.00    43.70       3.84      50.30       2.99R
49   49.5    66.00    29.04       6.74      36.96       2.33R
50   66.2    35.00     1.61       5.67      33.39       2.05R
58   57.2     7.00    40.10       5.97     -33.10      -2.05R
75   50.6   110.00   112.51      11.86      -2.51      -0.20 X
90   50.0    94.00    43.70       3.84      50.30       2.99R
100   56.6    79.09    37.66       4.31      41.42       2.48R
109   60.3    16.72    16.41       9.71       0.31       0.02 X
117   66.2    34.87     1.83       5.59      33.04       2.03R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 1.73531

Looking at the regression, the t-tests show that some of the interaction term slopes are not
significant and should be excluded from the model. The remaining variables are the same as those
in the adjusted regression from the first choice, so the adjusted regression for this model has
already been done and does not need to be run again.

The third choice from the best subset will now be analyzed with the following variables through
a regression analysis with the mono-x variables included as needed:
3) Temp, precip, daysprecip, temp*company, temp*popsize, popsize*company,
popsize*wind, wind*precip, wind*daysprecip

Regression Analysis: Sulfur versus Temp, precip, ...
The regression equation is
Sulfur = 81.3 - 1.24 Temp - 3.41 precip + 0.772 daysprecip + 0.0043 popsize
+ 0.0133 company + 0.34 wind - 0.000071 Temp*company
+ 0.00106 Temp*popsize + 0.000012 popsize*company
- 0.00835 popsize*wind + 0.362 wind*precip - 0.0686 wind*daysprecip

Predictor               Coef       SE Coef       T       P
Constant               81.32         53.64    1.52   0.132
Temp                 -1.2445        0.5462   -2.28   0.024
precip                -3.413         1.109   -3.08   0.003
daysprecip            0.7718        0.4032    1.91   0.058
popsize              0.00431       0.05733    0.08   0.940
company              0.01329       0.03955    0.34   0.737
wind                   0.335         4.765    0.07   0.944
Temp*company      -0.0000709     0.0007031   -0.10   0.920
Temp*popsize       0.0010605     0.0007834    1.35   0.178
popsize*company   0.00001207    0.00000309    3.91   0.000
popsize*wind       -0.008352      0.003599   -2.32   0.022
wind*precip           0.3620        0.1170    3.09   0.002
wind*daysprecip     -0.06863       0.04196   -1.64   0.104

S = 17.2420   R-Sq = 46.1%   R-Sq(adj) = 40.8%
PRESS = 44500.2   R-Sq(pred) = 33.82%

Analysis of Variance
Source           DF      SS         MS      F        P
Regression       12 30974.6     2581.2   8.68    0.000
Residual Error 122 36269.0       297.3
Total           134 67243.6

The sum of squares for pure error is (nearly) zero.
Minitab cannot do the lack of fit test based on pure error.

Source            DF   Seq SS
Temp               1   6911.2
precip             1    240.8
daysprecip         1    607.0
popsize            1   4628.4
company            1   9736.1
wind               1      3.6
Temp*company       1    285.9
Temp*popsize       1    595.2
popsize*company    1   4149.4
popsize*wind       1    967.4
wind*precip        1   2054.1
wind*daysprecip    1    795.4

Unusual Observations
Obs   Temp   Sulfur      Fit   SE Fit   Residual   St Resid
  1   70.3    10.00    20.05    12.21     -10.05      -0.83 X
  8   50.6   110.00   112.14    11.87      -2.14      -0.17 X
 23   50.0    94.00    44.83     4.02      49.17       2.93R
 49   49.5    66.00    30.52     6.99      35.48       2.25R
 75   50.6   110.00   112.14    11.87      -2.14      -0.17 X
 90   50.0    94.00    44.83     4.02      49.17       2.93R
100   56.6    79.09    37.02     4.37      42.06       2.52R
109   60.3    16.72    17.25     9.97      -0.53      -0.04 X

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 1.74616

As with the second choice, the t-tests on the individual slopes show that the same interaction
terms would remain in a reduced regression model as in the first reduced model run from the best
subsets regression. That regression has already been run and does not need to be run again.

The fourth choice from the best subsets regression will now be analyzed with the following
variables and the necessary mono-x variables:
4) Temp, company, precip, daysprecip, temp*popsize, popsize*company, popsize*wind,
wind*precip, wind*daysprecip, precip*daysprecip

Regression Analysis: Sulfur versus Temp, company, ...
The regression equation is
Sulfur = 73.3 - 1.17 Temp + 0.00829 company + 0.0102 popsize - 1.37 wind
- 2.86 precip + 0.845 daysprecip + 0.00103 Temp*popsize
+ 0.000013 popsize*company - 0.00883 popsize*wind + 0.366 wind*precip
- 0.0541 wind*daysprecip - 0.00586 precip*daysprecip

Predictor                  Coef        SE Coef        T       P
Constant                  73.26          52.10     1.41   0.162
Temp                    -1.1696         0.5100    -2.29   0.024
company                0.008286       0.005573     1.49   0.140
popsize                 0.01021        0.05600     0.18   0.856
wind                     -1.372          4.951    -0.28   0.782
precip                   -2.862          1.208    -2.37   0.019
daysprecip               0.8448         0.4053     2.08   0.039
Temp*popsize          0.0010337      0.0007644     1.35   0.179
popsize*company      0.00001252     0.00000299     4.19   0.000
popsize*wind          -0.008827       0.003587    -2.46   0.015
wind*precip              0.3662         0.1164     3.15   0.002
wind*daysprecip        -0.05409        0.04340    -1.25   0.215
precip*daysprecip     -0.005859       0.005194    -1.13   0.261

S = 17.1535   R-Sq = 46.6%   R-Sq(adj) = 41.4%
PRESS = 44001.5   R-Sq(pred) = 34.56%

Analysis of Variance
Source           DF      SS           MS      F       P
Regression       12 31346.1       2612.2   8.88   0.000
Residual Error 122 35897.5         294.2
Total           134 67243.6

The sum of squares for pure error is (nearly) zero.
Minitab cannot do the lack of fit test based on pure error.

Source               DF    Seq SS
Temp                  1    6911.2
company               1   14437.1
popsize               1     134.9
wind                  1       3.3
precip                1       5.4
daysprecip            1     635.4
Temp*popsize          1     339.3
popsize*company       1    4640.8
popsize*wind          1    1001.9
wind*precip           1    2068.9
wind*daysprecip       1     793.3
precip*daysprecip     1     374.5

Unusual Observations
Obs   Temp   Sulfur      Fit   SE Fit   Residual    St Resid
1   70.3    10.00    16.09    12.56      -6.09       -0.52 X
8   50.6   110.00   112.31    11.81      -2.31       -0.19 X
23   50.0    94.00    44.38     4.00      49.62        2.97R
49   49.5    66.00    30.26     6.88      35.74        2.27R
67   58.7     7.00    40.66     7.30     -33.66       -2.17R
75   50.6   110.00   112.31    11.81      -2.31       -0.19 X
90   50.0    94.00    44.38     4.00      49.62        2.97R
100   56.6    79.09    38.62     4.51      40.46        2.44R
109   60.3    16.72    19.51     9.82      -2.79       -0.20 X

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 1.76816

Once again, as with the second choice, the t-tests on the individual slopes show that the same
interaction terms would remain in a reduced regression model. That regression has already been
run and does not need to be run again.

The final choice from the best subsets regression will now be analyzed with the following
variables and will include in the regression the mono-x variables that are needed:
5) Temp, precip, daysprecip, temp*company, temp*popsize, popsize*company,
popsize*wind, wind*precip, wind*daysprecip, precip*daysprecip

Regression Analysis: Sulfur versus Temp, company, ...
The regression equation is
Sulfur = 71.3 - 1.14 Temp + 0.0134 company + 0.0087 popsize - 1.33 wind
- 2.86 precip + 0.850 daysprecip - 0.000093 Temp*company
+ 0.00105 Temp*popsize + 0.000012 popsize*company
- 0.00879 popsize*wind + 0.366 wind*precip - 0.0546 wind*daysprecip
- 0.00588 precip*daysprecip

Predictor                   Coef      SE Coef          T       P
Constant                   71.34        54.30       1.31   0.191
Temp                     -1.1421       0.5531      -2.06   0.041
company                  0.01344      0.03951       0.34   0.734
popsize                  0.00869      0.05739       0.15   0.880
wind                      -1.327        4.983      -0.27   0.790
precip                    -2.856        1.213      -2.35   0.020
daysprecip                0.8496       0.4086       2.08   0.040
Temp*company          -0.0000927    0.0007026      -0.13   0.895
Temp*popsize           0.0010539    0.0007825       1.35   0.181
popsize*company       0.00001242   0.00000310       4.01   0.000
popsize*wind           -0.008785     0.003616      -2.43   0.017
wind*precip               0.3656       0.1169       3.13   0.002
wind*daysprecip         -0.05458      0.04373      -1.25   0.214
precip*daysprecip      -0.005878     0.005217      -1.13   0.262

S = 17.2230   R-Sq = 46.6%   R-Sq(adj) = 40.9%
PRESS = 44673.8   R-Sq(pred) = 33.56%

Analysis of Variance
Source           DF      SS            MS      F       P
Regression       13 31351.2        2411.6   8.13   0.000
Residual Error 121 35892.4          296.6
Total           134 67243.6

The sum of squares for pure error is (nearly) zero.
Minitab cannot do the lack of fit test based on pure error.

Source               DF    Seq SS
Temp                  1    6911.2
company               1   14437.1
popsize               1     134.9
wind                  1       3.3
precip                1       5.4
daysprecip            1     635.4
Temp*company          1     285.9
Temp*popsize          1     595.2
popsize*company       1    4149.4
popsize*wind          1     967.4
wind*precip           1    2054.1
wind*daysprecip       1     795.4
precip*daysprecip     1     376.7

Unusual Observations
Obs   Temp   Sulfur      Fit   SE Fit   Residual   St Resid
  1   70.3    10.00    16.23    12.66      -6.23      -0.53 X
  8   50.6   110.00   112.31    11.86      -2.31      -0.19 X
 23   50.0    94.00    44.31     4.04      49.69       2.97R
 49   49.5    66.00    30.11     6.99      35.89       2.28R
 67   58.7     7.00    40.69     7.34     -33.69      -2.16R
 75   50.6   110.00   112.31    11.86      -2.31      -0.19 X
 90   50.0    94.00    44.31     4.04      49.69       2.97R
100   56.6    79.09    38.54     4.57      40.54       2.44R
109   60.3    16.72    19.22    10.11      -2.50      -0.18 X

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 1.76859

Once again, as with the second choice, the t-tests on the individual slopes show that the same
interaction terms would remain in a reduced regression model. That regression has already been
run and does not need to be run again.

FINAL ANALYSIS AND BEST MODEL:

After analyzing the best models given by the stepwise and best subsets procedures, two final
models remain to be compared to determine which one best predicts the response variable, sulfur.
In the earlier analysis, the model from the stepwise regression, after being reduced and
simplified and then run on the full data set, met all of the LINE assumptions, whereas the best
subsets models did not completely meet all of them. Therefore, because the full-data model
resulting from the stepwise regression met all of the assumptions and the requirements of a good
model, it is the best model for this data set.

```
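The fit statistics that Minitab prints for the four best subsets models above (R-Sq, R-Sq(adj), and R-Sq(pred)) can be recomputed from the ANOVA and PRESS values in each printout. The following is a minimal sketch of that arithmetic, not part of the original analysis. It does not refit any model; it only reuses the sums of squares, degrees of freedom, and PRESS values shown above. The names `Anova`, `r_sq`, `r_sq_adj`, `r_sq_pred`, and `MODELS` are hypothetical and were chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Anova:
    """ANOVA summary values copied from one Minitab regression printout."""
    label: str
    ss_regression: float  # "Regression" SS
    ss_error: float       # "Residual Error" SS
    df_error: int         # "Residual Error" DF
    ss_total: float       # "Total" SS
    df_total: int         # "Total" DF
    press: float          # PRESS statistic


def r_sq(a: Anova) -> float:
    """R-Sq (%) = SS(Regression) / SS(Total)."""
    return 100 * a.ss_regression / a.ss_total


def r_sq_adj(a: Anova) -> float:
    """R-Sq(adj) (%) = 1 - MSE / (SS(Total) / DF(Total))."""
    mse = a.ss_error / a.df_error
    return 100 * (1 - mse / (a.ss_total / a.df_total))


def r_sq_pred(a: Anova) -> float:
    """R-Sq(pred) (%) = 1 - PRESS / SS(Total)."""
    return 100 * (1 - a.press / a.ss_total)


# Values transcribed from the four printouts above (best subsets choices 2 to 5).
MODELS = [
    Anova("choice 2", 30428.6, 36815.0, 124, 67243.6, 134, 43724.2),
    Anova("choice 3", 30974.6, 36269.0, 122, 67243.6, 134, 44500.2),
    Anova("choice 4", 31346.1, 35897.5, 122, 67243.6, 134, 44001.5),
    Anova("choice 5", 31351.2, 35892.4, 121, 67243.6, 134, 44673.8),
]

if __name__ == "__main__":
    for m in MODELS:
        print(f"{m.label}: R-Sq = {r_sq(m):.1f}%  "
              f"R-Sq(adj) = {r_sq_adj(m):.1f}%  "
              f"R-Sq(pred) = {r_sq_pred(m):.2f}%")
```

Listing the three statistics side by side may make it easier to see how little the four candidate models differ in fit before they are reduced.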