Worksheet 5 Key (2012)
Worksheet 5: Multiple Regression, Non-Linear Regression, Fixed and Random Factors
Answer Key

1) Multiple Regression
a. ii. The scatterplot matrix below is our first chance to investigate whether we might
have problems with collinearity between the independent variables. What we are looking
for, and hopefully will not find, are relationships between the independent variables, i.e.
positive or negative correlations. In this case, none of the scatterplots indicate any relationships
between the variables (note the "shot-gun" patterns) except for the relationship between our
dependent variable and one of the independent variables (LIMPET_A vs. FOOD). Of course it is
fine if there is a relationship between the dependent variable and one of the independent variables,
and we may expect such a relationship, since that is why we are analyzing the data with a multiple
regression in the first place! With this dataset we want to know whether the abundance of LIMPET_A
varies with any of the independent variables (food, tide height, other limpets, predators).
[Scatterplot matrix of FOOD, LIMPET_A, OTH_LIMPETS, TIDE_HT, and PREDS]


a. iii. Here are the results of the multiple regression analysis with the model
          LIMPET_A=constant+FOOD+TIDE_HT+OTH_LIMPETS+PREDS
Condition Indices
1       2       3       4       5
1.00000 3.40232 4.40920 6.67319 25.54445


Dependent Variable          LIMPET_A
N                           19
Multiple R                  0.99989
Squared Multiple R          0.99979
Adjusted Squared Multiple R 0.99973
Standard Error of Estimate 1.62951

Regression Coefficients B = (X'X)^-1 X'Y
Effect      Coefficient Standard Error Std. Coefficient Tolerance t         p-value
CONSTANT    -23.39511   3.31140        0.00000          .         -7.06503  0.00001
FOOD        1.00605     0.00483        0.94218          0.73872   208.35328 0.00000
TIDE_HT     0.97785     0.06544        0.06813          0.72654   14.94221  0.00000
OTH_LIMPETS -1.05361    0.02831        -0.15449         0.87693   -37.22302 0.00000
PREDS       -0.07318    0.13684        -0.00221         0.88323   -0.53480  0.60118


Confidence Interval for Regression Coefficients
Effect      Coefficient 95% CI Lower 95% CI Upper VIF
CONSTANT    -23.39511   -30.49735    -16.29288    .
FOOD        1.00605     0.99570      1.01641      1.35370
TIDE_HT     0.97785     0.83749      1.11821      1.37639
OTH_LIMPETS -1.05361    -1.11432     -0.99291     1.14034
PREDS       -0.07318    -0.36669     0.22032      1.13220


Analysis of Variance
Source     Type III SS   df Mean Squares F-ratio      p-value
Regression 175,741.77291 4  43,935.44323 16,546.20680 0.00000
Residual   37.17445      14 2.65532


Durbin-Watson D Statistic 2.55602
First Order Autocorrelation -0.28273


Information Criteria
AIC             78.67214
AIC (Corrected) 85.67214
Schwarz's BIC 84.33877


Plot of Residuals against Predicted Values
[Residuals vs. predicted values: RESIDUAL from -3 to 3 against ESTIMATE from 0 to 400]

a. iv. Assumptions:
         1. Normality - run probability plots for each of the variables in the model before you even
            run the analysis
         2. Homogeneity of variance - check the residual scatterplot for a "shot-gun" pattern in the
            residuals of the dependent variable, not a "wedge" pattern
         3. Independence of observations - is each observation of the dependent variable
            independent? e.g. from randomly chosen plots
         4. Linearity - if there are relationships between the dependent and any of the independent
            variables, are these relationships linear?
         5. Collinearity - three ways to check for collinearity:
                a. scatterplot matrix - no correlations between the independent variables
                b. condition indices - values <15 are fine; between 15 and 30 you need to worry (check
                   tolerance values too); >30 you definitely need to worry (you may want to exclude one
                   of the collinear (redundant) factors from the model, or first run a principal components
                   analysis to reduce the number of independent variables in the model - you will learn
                   more about this in Week 9)
                c. tolerance - values >0.20 indicate that collinearity is not a problem, but <0.20
                   indicates that the model is not "tolerant" of the collinearity that factor introduces




a. viii. [Figure omitted: partial residual plots]
a. ix. Yes, they are different from the first scatterplots. They show what part of the variance in the
dependent variable (y) each independent variable (x_i) explains while factoring out the effects of the
other independent variables.


a. x. The abundance of limpets = -23.395 + 1.006*(125) + 0.978*(55) - 1.054*(43) - 0.073*(5.3) = 110.44
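The arithmetic above can be checked directly from the fitted coefficients in the output, using the predictor values plugged in by the key:

```python
# Predicted LIMPET_A from the fitted coefficients, for the values
# plugged in by the key (FOOD=125, TIDE_HT=55, OTH_LIMPETS=43, PREDS=5.3).
coefs = {"CONSTANT": -23.395, "FOOD": 1.006, "TIDE_HT": 0.978,
         "OTH_LIMPETS": -1.054, "PREDS": -0.073}
values = {"FOOD": 125, "TIDE_HT": 55, "OTH_LIMPETS": 43, "PREDS": 5.3}

prediction = coefs["CONSTANT"] + sum(coefs[k] * v for k, v in values.items())
print(round(prediction, 2))  # 110.44
```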



2) a) The scatterplot below shows that the relationship between species and area is non-linear.
[Scatterplot of SPECIES (30-80) against AREA (0-20,000)]

2) a. i)
Running the first model: Y = (a*X^b)/(c+X) - this model has 3 parameters that we are fitting: a, b, and c.
The dependent variable is SPECIES.

Dependent Variable                  :SPECIES

Sum of Squares and Mean Squares
Source         SS            df Mean Squares
Regression     237,992.98220 3 79,330.99407
Residual       1,381.01780 54 25.57440
Total          239,374.00000 57
Mean corrected 7,945.50877 56

R-squares

Raw R-square (1-Residual/Total)                : 0.99423
Mean Corrected R-square (1-Residual/Corrected) : 0.82619
R-square(Observed vs Predicted)                : 0.82639


Parameter Estimates
Parameter Estimate ASE                 Parameter/ASE Wald 95% Confidence Interval
                                                     Lower       Upper
A                   49.75867 7.14859 6.96063         35.42661    64.09072
B                   1.04288 0.01587 65.70585         1.01106     1.07470
C                   106.09789 27.33661 3.88117       51.29129    160.90449
[Scatter plot of SPECIES (30-80) against AREA (0-20,000)]
Running the second model: Y = a*X^b - this model has 2 parameters that we are fitting: a and b.
Dependent Variable               :SPECIES




Sum of Squares and Mean Squares
Source         SS            df Mean Squares
Regression     237,335.15908 2 118,667.57954
Residual       2,038.84092 55 37.06983
Total          239,374.00000 57
Mean corrected 7,945.50877 56

R-squares

Raw R-square (1-Residual/Total)                : 0.99148
Mean Corrected R-square (1-Residual/Corrected) : 0.74340
R-square(Observed vs Predicted)                : 0.74387


Parameter Estimates
Parameter Estimate ASE              Parameter/ASE Wald 95% Confidence Interval
                                                  Lower        Upper
A                  27.05393 2.04930 13.20153      22.94703     31.16082
B                  0.10882 0.00911 11.94865       0.09057      0.12708
ii. Two-term model
[Scatterplot of observed SPECIES (30-80) against ESTIMATE (40-80)]
Dependent Variable          SPECIES
N                           57
Multiple R                  0.86248
Squared Multiple R          0.74387
Adjusted Squared Multiple R 0.73921
Standard Error of Estimate 6.08288
Regression Coefficients B = (X'X)^-1 X'Y
Effect   Coefficient Standard Error Std. Coefficient Tolerance t        p-value
CONSTANT -1.66917    5.23606        0.00000          .         -0.31878 0.75110
ESTIMATE 1.02556     0.08114        0.86248          1.00000   12.63863 0.00000


Confidence Interval for Regression Coefficients
Effect   Coefficient 95% CI Lower 95% CI Upper VIF
CONSTANT -1.66917    -12.16246    8.82413      .
ESTIMATE 1.02556     0.86294      1.18818      1.00000


Analysis of Variance
Source     SS          df Mean Squares F-ratio   p-value
Regression 5,910.42803 1  5,910.42803  159.73496 0.00000
Residual   2,035.08074 55 37.00147


Durbin-Watson D Statistic 1.15039
First Order Autocorrelation 0.39325


Information Criteria
AIC             371.54764
AIC (Corrected) 372.00047
Schwarz's BIC 377.67680


Three-term model
[Scatterplot of observed SPECIES (30-80) against ESTIMATE (30-80)]
Dependent Variable          SPECIES
N                           57
Multiple R                  0.90906
Squared Multiple R          0.82639
Adjusted Squared Multiple R 0.82323
Standard Error of Estimate 5.00803

Regression Coefficients B = (X'X)^-1 X'Y
Effect   Coefficient Standard Error Std. Coefficient Tolerance t        p-value
CONSTANT 0.99219     3.93310        0.00000          .         0.25227  0.80178
ESTIMATE 0.98486     0.06087        0.90906          1.00000   16.18028 0.00000


Confidence Interval for Regression Coefficients
Effect   Coefficient 95% CI Lower 95% CI Upper VIF
CONSTANT 0.99219     -6.88993     8.87430      .
ESTIMATE 0.98486     0.86288      1.10685      1.00000




Analysis of Variance
Source     SS          df Mean Squares F-ratio   p-value
Regression 6,566.08703 1  6,566.08703  261.80158 0.00000
Residual   1,379.42174 55 25.08040


Durbin-Watson D Statistic 1.58886
First Order Autocorrelation 0.18436


Information Criteria
AIC             349.38199
AIC (Corrected) 349.83482
Schwarz's BIC 355.51114



iii. The three-term model looks best: it has the higher mean-corrected R-square (0.826 vs. 0.743) and the lower AIC (349.4 vs. 371.5).

iv. Use an added-fit test: compare the improvement in fit gained by the extra parameter against
the improvement expected by chance, given the change in the number of parameters.
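Using the residual sums of squares reported in the output above (2,038.84 on 55 df for the two-term model, 1,381.02 on 54 df for the three-term model), the added-fit F-test can be computed directly; scipy supplies the F distribution's tail probability:

```python
# Added-fit (extra sum of squares) F-test comparing the two nested models,
# using the residual SS and df reported in the output above.
from scipy import stats

rss_2term, df_2term = 2038.84092, 55  # two-term model: Y = a*X^b
rss_3term, df_3term = 1381.01780, 54  # three-term model: Y = (a*X^b)/(c+X)

f_stat = ((rss_2term - rss_3term) / (df_2term - df_3term)) / (rss_3term / df_3term)
p_value = stats.f.sf(f_stat, df_2term - df_3term, df_3term)
print(f"F = {f_stat:.2f} on ({df_2term - df_3term}, {df_3term}) df, p = {p_value:.2g}")
```

This gives F ≈ 25.7 on (1, 54) df with p well below 0.05, so the third parameter significantly improves the fit, consistent with the answer to iii.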

v. Comparing slopes is much easier with linear models. For example, let's assume you had a second
treatment in which an antibiotic had been applied, and you want to see whether the species-area
relationship varies as a function of whether the antibiotic had been applied or not. Here you
could plot the two linear functions (estimate vs. species, with and without antibiotics) and
compare the slopes and intercepts.

3) a. Fixed – we want to know about these two fertilizer brands specifically, and do not want to
extrapolate our results beyond those two brands
   b. Random – we are randomly choosing batches to compare, not interested in specific batches,
but rather want to assess variation among all batches
   c. i. Any spatial variable will be fixed if you didn’t choose them randomly, i.e. you are asking
a question about these specific locations and not trying to infer something about variation at a
larger scale. For example, if you are looking at three sites in the north and three in the south but
they are not chosen randomly due to logistics (i.e. the only sites that are accessible) or other
hypotheses you are testing, this would be a fixed effect. If, however, there were a number of
sites in the north and in the south and you then randomly chose 3 to sample both in the north and
in the south, you could extrapolate beyond those specific sites to make more general conclusions
about the north vs. the south – then the site variable would be considered random.
   d. i. If you randomize your sampling effort in time, time can be considered a random effect. For
example, if you are doing pollinator observations but have a limited number of observations you
can make at any one time (i.e. only one population per day or week), you could randomize
when you observe each population during a specific time period (such as within a
season, when you do not expect your observations to vary because of any temporal effects such
as variation in conditions across seasons, storms, etc.) and then use time as a random effect.
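As a sketch of how a random factor enters a model in practice, here is a random-intercept fit with statsmodels' MixedLM. The data are simulated, and "batch" stands in for the randomly chosen batches of part (b); nothing here comes from the worksheet's datasets.

```python
# Sketch: a random batch effect fitted as a random intercept with
# statsmodels MixedLM. Data are SIMULATED; "batch" stands in for the
# randomly chosen batches of part (b).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_batches, per_batch = 8, 10
batch = np.repeat(np.arange(n_batches), per_batch)
batch_shift = rng.normal(0, 2, n_batches)[batch]  # batch-level random shifts
y = 10 + batch_shift + rng.normal(0, 1, n_batches * per_batch)

df = pd.DataFrame({"y": y, "batch": batch})
# "groups" declares batch as the random factor (a random intercept per batch)
result = smf.mixedlm("y ~ 1", df, groups=df["batch"]).fit()
print(result.summary())
```

The summary reports a "Group Var" term: the estimated variance among batches, which is exactly the quantity a random factor is meant to capture (variation among all batches, not the means of these particular eight).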

								