Stat 112, Spring 2004 Homework 8 Solutions
1. (a) Analysis using All Data Points
Bivariate Fit of StockRate By Handicap
[Scatterplot with fitted line: StockRate (vertical axis, 0 to 100) versus Handicap (horizontal axis, 0 to 35)]

Linear Fit
StockRate = 55.137337 - 0.1734305 Handicap

Summary of Fit
RSquare                       0.001741
RSquare Adj                   -0.01863
Root Mean Square Error        25.38184
Mean of Response              52.47059
Observations (or Sum Wgts)    51

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model       1           55.053        55.053    0.0855     0.7713
Error      49        31567.653       644.238
C. Total   50        31622.706

Parameter Estimates
Term        Estimate    Std Error   t Ratio   Prob>|t|
Intercept   55.137337    9.790429      5.63     <.0001
Handicap    -0.17343     0.593278     -0.29     0.7713

Analysis with 7 Points Removed
Bivariate Fit of StockRate By Handicap
[Scatterplot with fitted line: StockRate (vertical axis, 10 to 90) versus Handicap (horizontal axis, 5 to 25)]

Linear Fit
StockRate = 73.233833 - 1.7114614 Handicap

Summary of Fit
RSquare                       0.171794
RSquare Adj                   0.152075
Root Mean Square Error        19.09969
Mean of Response              48.09091
Observations (or Sum Wgts)    44

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model       1         3178.122       3178.12    8.7120     0.0052
Error      42        15321.515        364.80
C. Total   43        18499.636

Parameter Estimates
Term        Estimate    Std Error   t Ratio   Prob>|t|
Intercept   73.233833    8.991861      8.14     <.0001
Handicap    -1.711461    0.57984      -2.95     0.0052

Conclusions
When all data points are used, there is no evidence of a linear relationship between handicap and stock rating (the F-test and t-test both have p-value = 0.7713). However, when the 7 data points are removed, handicap appears to be a useful predictor of stock rating (the F-test and t-test both have p-value = 0.0052).
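As an arithmetic check on the two outputs above, the following sketch recomputes RSquare, RSquare Adj, the root mean square error, and the F ratio from the reported sums of squares. It also confirms that in simple linear regression the F ratio equals the square of the slope's t ratio, which is why the F-test and t-test share one p-value. The function name anova_summary is made up for this illustration; the numbers are copied from the JMP output.

```python
import math

def anova_summary(ss_model, ss_error, df_model, df_error):
    """Recompute fit statistics from the ANOVA sums of squares."""
    ss_total = ss_model + ss_error
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    ms_total = ss_total / (df_model + df_error)
    return {
        "r2": ss_model / ss_total,
        "r2_adj": 1 - ms_error / ms_total,
        "rmse": math.sqrt(ms_error),
        "f": ms_model / ms_error,
    }

# Values copied from the JMP output: all 51 points, then 7 points removed.
all_points = anova_summary(55.053, 31567.653, 1, 49)
seven_removed = anova_summary(3178.122, 15321.515, 1, 42)

# Squared t ratio for the Handicap slope (estimate / standard error).
t2_all = (-0.17343 / 0.593278) ** 2
t2_removed = (-1.711461 / 0.57984) ** 2

for label, fit, t2 in [("all points", all_points, t2_all),
                       ("7 removed", seven_removed, t2_removed)]:
    rounded = {name: round(value, 4) for name, value in fit.items()}
    print(label, rounded, "t^2 =", round(t2, 4))
```

Small differences in the last digit come from rounding in the printed output.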

(b) Analysis without Observation 45
Bivariate Fit of StockRate By Handicap
[Scatterplot with fitted line: StockRate (vertical axis, 0 to 100) versus Handicap (horizontal axis, 0 to 35)]

Linear Fit
StockRate = 49.388121 + 0.1403252 Handicap

Summary of Fit
RSquare                       0.001117
RSquare Adj                   -0.01969
Root Mean Square Error        24.81898
Mean of Response              51.58
Observations (or Sum Wgts)    50

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model       1           33.063        33.063    0.0537     0.8178
Error      48        29567.117       615.982
C. Total   49        29600.180

Parameter Estimates
Term        Estimate    Std Error   t Ratio   Prob>|t|
Intercept   49.388121    10.09088      4.89     <.0001
Handicap    0.1403252    0.605683      0.23     0.8178

Observation 45 is not highly influential. In terms of the distribution of handicaps, 3.2 is not an outlier. More importantly, removing that observation does not materially change the least squares regression line. The slope estimate changes sign, from slightly negative (-0.17) to slightly positive (0.14), but it is still not statistically significant (the F-test and t-test both have p-value = 0.82).

2. (a) None of the observations is highly influential by the rule of thumb, because each has a Cook's distance less than 1. Observation 49 has the largest, with a distance of 0.41.

(b) I disagree with the removal of the seven observations. The scatterplots, histograms, residual plots, and Cook's distances do not show that those seven points are unusual. Removing them makes the model significant, but there is no good reason to remove them.

(c) Because a majority of the CEOs in this study are included only because they chose to complete the survey, the sample is not random and there may be response bias in the analysis. Perhaps the CEOs with lower handicaps were eager to show off their skills and so filled out the survey. The additional 110 CEOs included from the research may introduce bias as well: only the better players will have official handicaps.
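To illustrate the rule of thumb in 2(a), here is a minimal sketch of how Cook's distance can be computed for a simple linear regression, using the standard formula D_i = e_i^2 / (p * MSE) * h_i / (1 - h_i)^2 with p = 2 parameters. The function name cooks_distances and the data are hypothetical; the values below are not the CEO data, which is not reproduced in this solution.

```python
def cooks_distances(x, y):
    """Cook's distance for each point in a simple linear regression of y on x."""
    n = len(x)
    p = 2  # parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    distances = []
    for xi, e in zip(x, residuals):
        # Leverage of a point in simple linear regression.
        leverage = 1 / n + (xi - x_bar) ** 2 / sxx
        distances.append(e ** 2 / (p * mse) * leverage / (1 - leverage) ** 2)
    return distances

# Hypothetical data (NOT the CEO data): the last point has high leverage
# and falls far from the trend of the others.
handicap = [5.0, 8.0, 10.0, 12.0, 15.0, 18.0, 20.0, 34.0]
stock_rate = [70.0, 55.0, 62.0, 48.0, 50.0, 41.0, 45.0, 95.0]

d = cooks_distances(handicap, stock_rate)
flagged = [i for i, di in enumerate(d) if di > 1]
print([round(di, 3) for di in d])
print("indices with Cook's distance > 1:", flagged)
```

In this made-up example only the last point exceeds 1. In the homework data the largest value is 0.41, so no point would be flagged.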

3. Use the Fit Model menu in JMP
Response HEART - Whole Model

Summary of Fit
RSquare                       0.223642
RSquare Adj                   0.150858
Root Mean Square Error        4.804986
Mean of Response              19.80556
Observations (or Sum Wgts)    36

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model       3        212.82642       70.9421    3.0727     0.0416
Error      32        738.81246       23.0879
C. Total   35        951.63889

Parameter Estimates
Term        Estimate    Std Error   t Ratio   Prob>|t|
Intercept   3.1786957    6.336946      0.50     0.6194
BANK        0.405217     0.197102      2.06     0.0480
WALK        0.4516011    0.200874      2.25     0.0316
TALK        -0.17961     0.222215     -0.81     0.4249

Effect Tests
Source   Nparm   DF   Sum of Squares   F Ratio   Prob > F
BANK         1    1         97.58367    4.2266     0.0480
WALK         1    1        116.69407    5.0543     0.0316
TALK         1    1         15.08326    0.6533     0.4249

(a) See the Estimate values under Parameter Estimates above.

(b) When bank and talk are held fixed, increasing walk by 10 has an estimated change in heart of 4.52 (= 10 × 0.452). When talk is held fixed, increasing bank by one and walk by one has an estimated change in heart of 0.857 (= 0.405 + 0.452).

(c) The predicted heart for Philadelphia = 3.179 + 0.405(31) + 0.452(12) - 0.180(19) = 17.74.
Residual = Observed - Predicted = 18 - 17.74 = 0.26.
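The arithmetic in (b) and (c) can be reproduced with a short sketch. It uses the coefficients rounded to three decimals, as in the solution text; the function name predict_heart is made up for this illustration.

```python
# Rounded coefficients, as used in the solution text.
coef = {"Intercept": 3.179, "BANK": 0.405, "WALK": 0.452, "TALK": -0.180}

def predict_heart(bank, walk, talk):
    """Predicted HEART from the fitted multiple regression equation."""
    return (coef["Intercept"] + coef["BANK"] * bank
            + coef["WALK"] * walk + coef["TALK"] * talk)

# (b) Estimated changes in HEART with the other predictors held fixed.
effect_walk_10 = 10 * coef["WALK"]
effect_bank_walk = coef["BANK"] + coef["WALK"]

# (c) Philadelphia values as plugged into the solution: BANK=31, WALK=12, TALK=19.
predicted = predict_heart(31, 12, 19)
residual = 18 - predicted  # observed minus predicted

print(round(effect_walk_10, 3), round(effect_bank_walk, 3))
print(round(predicted, 2), round(residual, 2))
```

Plugging in the unrounded estimates from the JMP output instead gives a prediction of about 17.75, so the second decimal depends on how the coefficients are rounded.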

4. No, the researchers should not conclude that coffee drinking causes heart disease. This is an observational study and there may be confounding variables. In particular, cigarette smoking is probably a confounding variable because it is known to be associated with heart disease and is probably also associated with coffee drinking (people who smoke cigarettes tend to drink more coffee).

We could use multiple regression to better address the question of whether coffee drinking causes heart disease by fitting a multiple regression of heart disease on coffee drinking and cigarette smoking. The coefficient on coffee drinking in the multiple regression would measure the mean change in heart disease that is associated with a one-cup increase in coffee drinking when cigarette smoking is held fixed. The multiple regression could not prove or disprove that coffee drinking causes heart disease, because there could be other confounding variables besides cigarette smoking, but it would provide more relevant evidence than the simple linear regression, which does not control for cigarette smoking.
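The effect of a confounder can be seen in a small simulation. The sketch below is purely illustrative: the data are simulated, and the effect sizes (0.8 and 2.0) are arbitrary assumptions. Smoking drives both coffee drinking and heart disease, while coffee has no direct effect on heart disease. The simple regression of heart disease on coffee still shows a clear positive slope, and the coffee coefficient falls to roughly zero once smoking is included.

```python
import random

def center(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

random.seed(112)
n = 5000

# Simulated, hypothetical data: smoking raises both coffee drinking and
# heart disease; coffee has NO direct effect on heart disease here.
smoking = [random.gauss(0, 1) for _ in range(n)]
coffee = [0.8 * s + random.gauss(0, 1) for s in smoking]
heart = [2.0 * s + random.gauss(0, 1) for s in smoking]

c, s, h = center(coffee), center(smoking), center(heart)

# Simple regression of heart on coffee alone.
simple_slope = dot(c, h) / dot(c, c)

# Multiple regression of heart on coffee and smoking: normal equations
# for two centered predictors, solved in closed form.
det = dot(c, c) * dot(s, s) - dot(c, s) ** 2
coffee_coef = (dot(s, s) * dot(c, h) - dot(c, s) * dot(s, h)) / det
smoking_coef = (dot(c, c) * dot(s, h) - dot(c, s) * dot(c, h)) / det

print("simple slope on coffee:", round(simple_slope, 3))
print("coffee coefficient, smoking held fixed:", round(coffee_coef, 3))
print("smoking coefficient:", round(smoking_coef, 3))
```

The simulation controls for the only confounder it contains. With real data, other unmeasured confounders could remain, which is why the multiple regression still cannot establish causation.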