Worksheet 5: Multiple Regression, Non-Linear Regression, Fixed and Random Factors (Answer Key)

1) Multiple Regression

a. ii. The scatterplots below are our first chance to investigate whether we might have any problems with collinearity among the independent variables. What we are looking for, and hopefully will not find, are relationships between the independent variables, i.e. positive or negative correlations. In this case, none of the scatterplots indicate any relationships among the independent variables (note the "shot-gun" patterns); the only clear relationship is between our dependent variable and one of the independents (LIMPET_A vs. FOOD). It is of course fine if there is a relationship between the dependent variable and an independent variable, and we may well expect one, since that is why we are analyzing the data with a multiple regression in the first place! With this dataset we want to know whether the abundance of LIMPET_A varies with any of the independent variables (FOOD, TIDE_HT, OTH_LIMPETS, PREDS).

[Scatterplot matrix of LIMPET_A, FOOD, TIDE_HT, OTH_LIMPETS, and PREDS]

a. iii. Here are the results of the multiple regression analysis with the model:

LIMPET_A = constant + FOOD + TIDE_HT + OTH_LIMPETS + PREDS

Condition Indices
1: 1.00000   2: 3.40232   3: 4.40920   4: 6.67319   5: 25.54445

Dependent Variable: LIMPET_A
N: 19
Multiple R: 0.99989
Squared Multiple R: 0.99979
Adjusted Squared Multiple R: 0.99973
Standard Error of Estimate: 1.62951

Regression Coefficients B = (X'X)^-1 X'Y
Effect        Coefficient   Std. Error   Std. Coef.   Tolerance   t           p-value
CONSTANT      -23.39511     3.31140      0.00000      .           -7.06503    0.00001
FOOD          1.00605       0.00483      0.94218      0.73872     208.35328   0.00000
TIDE_HT       0.97785       0.06544      0.06813      0.72654     14.94221    0.00000
OTH_LIMPETS   -1.05361      0.02831      -0.15449     0.87693     -37.22302   0.00000
PREDS         -0.07318      0.13684      -0.00221     0.88323     -0.53480    0.60118

Confidence Intervals for Regression Coefficients
Effect        Coefficient   95% CI Lower   95% CI Upper   VIF
CONSTANT      -23.39511     -30.49735      -16.29288      .
FOOD          1.00605       0.99570        1.01641        1.35370
TIDE_HT       0.97785       0.83749        1.11821        1.37639
OTH_LIMPETS   -1.05361      -1.11432       -0.99291       1.14034
PREDS         -0.07318      -0.36669       0.22032        1.13220

Analysis of Variance
Source       Type III SS     df   Mean Squares   F-ratio        p-value
Regression   175,741.77291   4    43,935.44323   16,546.20680   0.00000
Residual     37.17445        14   2.65532

Durbin-Watson D Statistic: 2.55602
First Order Autocorrelation: -0.28273

Information Criteria
AIC: 78.67214
AIC (Corrected): 85.67214
Schwarz's BIC: 84.33877

[Plot of residuals against predicted values: no pattern in the residuals across the range of estimates]

a. iv. Assumptions:
1. Normality – you should do p-plots for each of the variables in the model before you even run the analysis.
2. Homogeneity of variance – check the residual scatterplot for a "shot-gun" pattern in the residuals of the dependent variable, not a "wedge" pattern.
3. Independence of observations – is each observation of the dependent variable independent, e.g. from randomly chosen plots?
4. Linearity – if there are relationships between the dependent variable and any of the independent variables, are those relationships linear?
5. Collinearity – there are three ways to check for collinearity:
   a. scatterplot matrix – no correlations between the independent variables;
   b. condition indices – values <15 are fine; values between 15 and 30 are cause for concern (check the tolerance values too); values >30 definitely need attention (you may want to exclude one of the collinear (redundant) factors from the model, or do a principal components analysis first to reduce the number of independent variables in the model – you will learn more about this in Week 9);
   c.
tolerance – values >0.20 indicate that collinearity is not a problem, while values <0.20 indicate that the model is not "tolerant" of the collinearity that factor introduces.

a. viii. [Partial regression plots not shown.]

a. ix. Yes, they are different from the first scatterplots. They show what part of the variance in the dependent variable (y) each independent variable (xi) explains while factoring out the effects of the other independent variables.

a. x. The abundance of limpets = -23.395 + 1.006*(125) + 0.978*(55) - 1.054*(43) - 0.073*(5.3) = 110.44

2) a. The scatterplot below shows that the relationship between species and area is non-linear.

[Scatterplot of SPECIES (30–80) against AREA (0–20,000)]

2) a. i. Running the first model: Y = (a*X^b)/(c+X) – this model has 3 parameters that we are fitting: a, b, and c.

Dependent Variable: SPECIES

Sum of Squares and Mean Squares
Source           SS              df   Mean Squares
Regression       237,992.98220   3    79,330.99407
Residual         1,381.01780     54   25.57440
Total            239,374.00000   57
Mean corrected   7,945.50877     56

R-squares
Raw R-square (1 - Residual/Total): 0.99423
Mean Corrected R-square (1 - Residual/Corrected): 0.82619
R-square (Observed vs Predicted): 0.82639

Parameter Estimates
Parameter   Estimate    ASE        Parameter/ASE   Wald 95% CI Lower   Upper
A           49.75867    7.14859    6.96063         35.42661            64.09072
B           1.04288     0.01587    65.70585        1.01106             1.07470
C           106.09789   27.33661   3.88117         51.29129            160.90449

[Scatterplot of SPECIES against AREA with the fitted curve]

Running the second model: Y = a*X^b – this model has 2 parameters that we are fitting: a and b.
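Both nonlinear fits can be reproduced in outline with SciPy's curve_fit. This is only a sketch: the AREA/SPECIES values below are simulated stand-ins, not the worksheet's data, so the parameter estimates will not match the output tables.

```python
# Sketch: fitting both species-area models with scipy.optimize.curve_fit.
# The data below are simulated stand-ins for the worksheet's AREA/SPECIES
# values, so the estimates will differ from the tables in this answer key.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
area = np.linspace(100, 20000, 57)
species = 27 * area**0.11 + rng.normal(0, 1, 57)   # toy power-law data

def model3(X, a, b, c):        # first model:  Y = a*X**b / (c + X)
    return a * X**b / (c + X)

def model2(X, a, b):           # second model: Y = a*X**b
    return a * X**b

p3, _ = curve_fit(model3, area, species, p0=[27, 1.1, 100], maxfev=20000)
p2, _ = curve_fit(model2, area, species, p0=[27, 0.1], maxfev=20000)

# Residual sums of squares: the quantity that the SS tables report
rss3 = np.sum((species - model3(area, *p3)) ** 2)
rss2 = np.sum((species - model2(area, *p2)) ** 2)
print("3-parameter RSS:", rss3)
print("2-parameter RSS:", rss2)
```

Comparing the drop in residual SS against the cost of the extra parameter is the "added fit" idea used later in this question to choose between the two models.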
Dependent Variable: SPECIES

Sum of Squares and Mean Squares
Source           SS              df   Mean Squares
Regression       237,335.15908   2    118,667.57954
Residual         2,038.84092     55   37.06983
Total            239,374.00000   57
Mean corrected   7,945.50877     56

R-squares
Raw R-square (1 - Residual/Total): 0.99148
Mean Corrected R-square (1 - Residual/Corrected): 0.74340
R-square (Observed vs Predicted): 0.74387

Parameter Estimates
Parameter   Estimate   ASE       Parameter/ASE   Wald 95% CI Lower   Upper
A           27.05393   2.04930   13.20153        22.94703            31.16082
B           0.10882    0.00911   11.94865        0.09057             0.12708

a. ii. – 1. Two-term model (regression of observed SPECIES on the model ESTIMATE):

[Scatterplot of SPECIES against ESTIMATE]

Dependent Variable: SPECIES
N: 57
Multiple R: 0.86248
Squared Multiple R: 0.74387
Adjusted Squared Multiple R: 0.73921
Standard Error of Estimate: 6.08288

Regression Coefficients B = (X'X)^-1 X'Y
Effect     Coefficient   Std. Error   Std. Coef.   Tolerance   t          p-value
CONSTANT   -1.66917      5.23606      0.00000      .           -0.31878   0.75110
ESTIMATE   1.02556       0.08114      0.86248      1.00000     12.63863   0.00000

Confidence Intervals for Regression Coefficients
Effect     Coefficient   95% CI Lower   95% CI Upper   VIF
CONSTANT   -1.66917      -12.16246      8.82413        .
ESTIMATE   1.02556       0.86294        1.18818        1.00000

Analysis of Variance
Source       SS            df   Mean Squares   F-ratio     p-value
Regression   5,910.42803   1    5,910.42803    159.73496   0.00000
Residual     2,035.08074   55   37.00147

Durbin-Watson D Statistic: 1.15039
First Order Autocorrelation: 0.39325

Information Criteria
AIC: 371.54764
AIC (Corrected): 372.00047
Schwarz's BIC: 377.67680

Three-term model:

[Scatterplot of SPECIES against ESTIMATE]

Dependent Variable: SPECIES
N: 57
Multiple R: 0.90906
Squared Multiple R: 0.82639
Adjusted Squared Multiple R: 0.82323
Standard Error of Estimate: 5.00803

Regression Coefficients B = (X'X)^-1 X'Y
Effect     Coefficient   Std. Error   Std. Coef.   Tolerance   t          p-value
CONSTANT   0.99219       3.93310      0.00000      .           0.25227    0.80178
ESTIMATE   0.98486       0.06087      0.90906      1.00000     16.18028   0.00000

Confidence Intervals for Regression Coefficients
Effect     Coefficient   95% CI Lower   95% CI Upper   VIF
CONSTANT   0.99219       -6.88993      8.87430        .
ESTIMATE   0.98486       0.86288       1.10685        1.00000

Analysis of Variance
Source       SS            df   Mean Squares   F-ratio     p-value
Regression   6,566.08703   1    6,566.08703    261.80158   0.00000
Residual     1,379.42174   55   25.08040

Durbin-Watson D Statistic: 1.58886
First Order Autocorrelation: 0.18436

Information Criteria
AIC: 349.38199
AIC (Corrected): 349.83482
Schwarz's BIC: 355.51114

a. iii. The three-term model looks best.

a. iv. Use the added-fit comparison: compare the improvement in fit against the improvement expected simply from the change in the number of parameters.

a. v. Comparing slopes is much easier with linear models. For example, suppose you had another treatment, such as after application of an antibiotic, and you wanted to see whether the species-area relationship varied depending on whether an antibiotic had been applied. You could plot the two linear functions (observed species vs. estimate, with and without antibiotics) and compare the slopes and intercepts.

3) a. Fixed – we want to know about these two fertilizer brands specifically, and do not want to extrapolate our results beyond those two brands.

b. Random – we are randomly choosing batches to compare; we are not interested in specific batches but rather want to assess variation among all batches.

c. i. Any spatial variable will be fixed if you did not choose its levels randomly, i.e. you are asking a question about these specific locations and not trying to infer something about variation at a larger scale. For example, if you are looking at three sites in the north and three in the south, but they were not chosen randomly, whether because of logistics (e.g. they are the only accessible sites) or because of other hypotheses you are testing, this would be a fixed effect.
If, however, there were a number of sites in the north and in the south and you then randomly chose three to sample in each region, you could extrapolate beyond those specific sites to make more general conclusions about the north vs. the south; the site variable would then be considered random.

d. i. If you randomize your sampling effort in time, time can be considered a random effect. For example, suppose you are doing pollinator observations but can only make a limited number of observations at any one time (say, one population per day or week). You could randomize when you observe each population within a specific time period (such as within a single season, when you do not expect your observations to vary because of temporal effects such as changing conditions across seasons, storms, etc.) and then use time as a random effect.
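The random-site design in 3c can be sketched in code. Below is a minimal, hypothetical illustration using statsmodels' mixed-model API: the region contrast (north vs. south) is the fixed effect of interest, and the randomly chosen sites enter as a random effect. All names and data here are invented for illustration, not taken from the worksheet.

```python
# Minimal sketch of a mixed model: region as a fixed effect, randomly
# chosen sites as a random (grouping) effect. All data are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
rows = []
for region, mean in [("north", 5.0), ("south", 8.0)]:
    for s in range(3):                    # 3 randomly chosen sites per region
        site_dev = rng.normal(0.0, 1.5)   # random site-to-site deviation
        for _ in range(10):               # 10 replicate plots per site
            rows.append({
                "region": region,
                "site": f"{region}_site{s}",
                "abundance": mean + site_dev + rng.normal(0.0, 1.0),
            })
df = pd.DataFrame(rows)

# groups= declares site as the random factor; region remains fixed, so the
# region estimate generalizes across sites rather than to these sites only.
fit = smf.mixedlm("abundance ~ region", df, groups=df["site"]).fit()
print(fit.summary())
```

If the sites had instead been chosen deliberately (the fixed-effect case in 3c), you would drop the grouping structure and treat site as an ordinary categorical predictor, giving conclusions that apply only to those specific sites.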