Problem on Simple linear regression
Stat1222 4/16/09
To determine which students should receive scholarships, a university admissions
officer decided to study the relationship between a student’s score on the SAT verbal
test (taken in the final year of high school) and the student’s college GPA at the
end of the sophomore year. Ten student records were examined with the
following results. The reported exam scores are the actual scores divided by 100.
Student   SAT, x   GPA, y
1         4.8      2.4
2         6.6      3.5
3         5.9      3.0
4         7.4      3.8
5         3.8      2.7
6         5.2      2.4
7         6.6      3.0
8         5.0      2.8
9         7.2      3.4
10        6.0      3.2
Let x be the SAT score of a student and y be the student’s GPA. Then $\sum x^2 = 354.05$,
$\sum xy = 180.66$, $\sum y^2 = 93.14$, $\sum x = 58.5$, and $\sum y = 30.2$.
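The five sums quoted above can be rechecked directly from the table. The following is a minimal Python sketch; the variable names (`sum_x`, `sum_xy`, and so on) are illustrative and not part of the original handout.

```python
# SAT scores (divided by 100) and GPAs for the ten students in the table.
x = [4.8, 6.6, 5.9, 7.4, 3.8, 5.2, 6.6, 5.0, 7.2, 6.0]
y = [2.4, 3.5, 3.0, 3.8, 2.7, 2.4, 3.0, 2.8, 3.4, 3.2]

# The five summary sums used throughout the solution.
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_y2 = sum(yi * yi for yi in y)

# Round to two decimals to compare with the values quoted in the text.
print(round(sum_x, 2), round(sum_y, 2))
print(round(sum_x2, 2), round(sum_xy, 2), round(sum_y2, 2))
```

The printed values should agree with the sums stated in the problem, up to floating-point rounding.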
1. Find the best fit regression line relating y to x.
Solution: The best fit regression line is $\hat{y} = mx + b$.
The slope is
$$m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{10(180.66) - (58.5)(30.2)}{10(354.05) - (58.5)^2} = 0.3374.$$
The y-intercept is
$$b = \frac{\sum y}{n} - m\,\frac{\sum x}{n} = \frac{30.2}{10} - 0.3374 \cdot \frac{58.5}{10} = 1.0462.$$
ANSWER: The best fit line is $\hat{y} = 0.3374x + 1.0462$.
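The slope and intercept formulas used in part 1 can be sketched in Python as follows. The function name `fit_line` and its argument names are hypothetical, chosen only for this illustration; the inputs are the sums given in the problem statement.

```python
def fit_line(n, sum_x, sum_y, sum_xy, sum_x2):
    """Least-squares slope m and intercept b from the summary sums."""
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * sum_x / n
    return m, b

# Sums taken from the problem statement.
m, b = fit_line(n=10, sum_x=58.5, sum_y=30.2, sum_xy=180.66, sum_x2=354.05)
print(round(m, 4), round(b, 4))
```

Note that this sketch carries the unrounded slope into the intercept, whereas the hand calculation rounds m to 0.3374 first, so the intercept may differ from 1.0462 in the fourth decimal place.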
2. Calculate the correlation coefficient, r.
SOLUTION:
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \frac{10(180.66) - (58.5)(30.2)}{\sqrt{10(354.05) - (58.5)^2}\,\sqrt{10(93.14) - (30.2)^2}}$$
Calculations yield r = 0.8339.
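For readers who want to verify the arithmetic behind r, here is a short Python sketch of the same computational formula. The function name `correlation` is an assumption made for this example.

```python
import math

def correlation(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson correlation coefficient from the summary sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Sums taken from the problem statement.
r = correlation(10, 58.5, 30.2, 180.66, 354.05, 93.14)
print(round(r, 4))
```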
3. Test at α = 0.05 whether y and x have positive linear association.
Solution: Here we want to test for positive correlation, i.e.,
H0 : ρ ≤ 0 versus Ha : ρ > 0 at α = 0.05
Test statistic:
$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.8339}{\sqrt{\dfrac{1 - (0.8339)^2}{10 - 2}}} = 4.2735$$
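The critical point for a right-tailed test at α = 0.05 and d.f. = 10 − 2 = 8 is t0 = 1.860.
The rejection rule is: Reject H0 at α = 0.05 if t > 1.860.
Here t = 4.2735 > t0 = 1.860. So, the decision is to reject H0: at the 0.05 level, the
data support a positive linear association between SAT score and GPA.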
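The test in part 3 can be sketched in Python as below. The critical value 1.860 is copied from the text (a t-table lookup for 8 degrees of freedom), not computed; the names `t_statistic` and `T_CRITICAL` are illustrative.

```python
import math

def t_statistic(r, n):
    """Test statistic for H0: rho <= 0 based on the sample correlation r."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

T_CRITICAL = 1.860  # right-tailed, alpha = 0.05, d.f. = 8 (value from the text)

t = t_statistic(r=0.8339, n=10)
reject_h0 = t > T_CRITICAL
print(round(t, 4), reject_h0)
```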
4. Predict a student’s GPA when his SAT score was 5.2.
Solution: The predicted GPA $\hat{y}$ is
$$\hat{y} = 0.3374(5.2) + 1.0462 = 2.8007$$
ANSWER: $\hat{y} = 2.8007$
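The prediction in part 4 is a single evaluation of the fitted line. As a small sketch, assuming the rounded coefficients from part 1 (the function name `predict_gpa` is hypothetical):

```python
def predict_gpa(sat, m=0.3374, b=1.0462):
    """Predicted GPA for an SAT score (in hundreds), using the part 1 line."""
    return m * sat + b

y_hat = predict_gpa(5.2)
print(round(y_hat, 4))
```

Keep in mind that the fitted line is only supported by SAT values within the observed range, 3.8 to 7.4.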
5. Find $s_e$, the standard error in estimation.
SOLUTION:
$$s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$$
First create the following table to find $\sum (y - \hat{y})^2$.
SAT, x   GPA, y   ŷ         (y − ŷ)²
4.8      2.4      2.66572   (2.4 − 2.66572)²
6.6      3.5      3.27304   (3.5 − 3.27304)²
5.9      3.0      3.03686   (3.0 − 3.03686)²
7.4      3.8      3.54296   (3.8 − 3.54296)²
3.8      2.7      2.32832   (2.7 − 2.32832)²
5.2      2.4      2.80068   (2.4 − 2.80068)²
6.6      3.0      3.27304   (3.0 − 3.27304)²
5.0      2.8      2.73320   (2.8 − 2.73320)²
7.2      3.4      3.47548   (3.4 − 3.47548)²
6.0      3.2      3.07060   (3.2 − 3.07060)²
Sum                         0.58969
The standard error $s_e$ is
$$s_e = \sqrt{\frac{0.58969}{10 - 2}} = 0.2715$$
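The residual table and the standard error can be reproduced with a few lines of Python. This is a sketch that assumes the rounded coefficients m = 0.3374 and b = 1.0462 from part 1; the variable names are illustrative.

```python
import math

# Data from the problem statement.
x = [4.8, 6.6, 5.9, 7.4, 3.8, 5.2, 6.6, 5.0, 7.2, 6.0]
y = [2.4, 3.5, 3.0, 3.8, 2.7, 2.4, 3.0, 2.8, 3.4, 3.2]
m, b = 0.3374, 1.0462  # rounded coefficients from part 1

# Fitted values and the sum of squared residuals (the last table column).
y_hat = [m * xi + b for xi in x]
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

# Standard error uses n - 2 degrees of freedom.
s_e = math.sqrt(sse / (len(x) - 2))
print(round(sse, 5), round(s_e, 4))
```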
6. Find a 99% prediction interval for GPA of a student whose SAT score was 5.2.
Solution: The prediction interval is $(\hat{y} - E, \hat{y} + E)$,
where
$$E = t_c\, s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}}$$
Here for $x_0 = 5.2$, $\hat{y} = 2.8007$ (computed in part 4),
$s_e = 0.2715$ (from part 5), $n = 10$, $\bar{x} = \frac{\sum x}{n} = \frac{58.5}{10} = 5.85$, and $\sum x^2 = 354.05$.
For 99% confidence and d.f. $= n - 2 = 10 - 2 = 8$, $t_c = 3.355$.
Plugging in the values gives
$$E = 3.355 \cdot 0.2715 \sqrt{1 + \frac{1}{10} + \frac{10(5.2 - 5.85)^2}{10(354.05) - (58.5)^2}} = 0.9707$$
The prediction interval is (2.8007 − .9707, 2.8007 + .9707) = (1.8300, 3.7714).
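The margin E and the resulting interval can be checked with the following Python sketch. The critical value 3.355 is taken from the text rather than computed, and the function name `prediction_margin` is an assumption made for this example.

```python
import math

def prediction_margin(t_c, s_e, n, x0, sum_x, sum_x2):
    """Half-width E of the prediction interval at x0."""
    x_bar = sum_x / n
    spread = n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    return t_c * s_e * math.sqrt(1 + 1 / n + spread)

# Inputs from parts 4 and 5 and the problem statement.
y_hat = 2.8007
E = prediction_margin(t_c=3.355, s_e=0.2715, n=10, x0=5.2,
                      sum_x=58.5, sum_x2=354.05)
interval = (y_hat - E, y_hat + E)
print(round(E, 4), tuple(round(v, 4) for v in interval))
```

The leading 1 under the square root is what distinguishes a prediction interval for one student's GPA from a confidence interval for the mean GPA at that SAT score.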
7. What proportion of variation in y values is explained by the regression line
relating y, the student’s GPA, to x, the student’s SAT score?
Solution: The coefficient of determination $r^2$ is defined as the proportion of
variation in y values that is explained by the regression line.
We calculated r in part 2 to be 0.8339.
Here
$$r^2 = (0.8339)^2 = 0.6954$$
Alternate way to calculate $r^2$:
SAT, x   GPA, y   ŷ         (ŷ − ȳ)²            (y − ȳ)²
4.8      2.4      2.66572   (2.66572 − 3.02)²   (2.4 − 3.02)²
6.6      3.5      3.27304   (3.27304 − 3.02)²   (3.5 − 3.02)²
5.9      3.0      3.03686   (3.03686 − 3.02)²   (3.0 − 3.02)²
7.4      3.8      3.54296   (3.54296 − 3.02)²   (3.8 − 3.02)²
3.8      2.7      2.32832   (2.32832 − 3.02)²   (2.7 − 3.02)²
5.2      2.4      2.80068   (2.80068 − 3.02)²   (2.4 − 3.02)²
6.6      3.0      3.27304   (3.27304 − 3.02)²   (3.0 − 3.02)²
5.0      2.8      2.73320   (2.73320 − 3.02)²   (2.8 − 3.02)²
7.2      3.4      3.47548   (3.47548 − 3.02)²   (3.4 − 3.02)²
6.0      3.2      3.07060   (3.07060 − 3.02)²   (3.2 − 3.02)²
Sum                         1.3461              1.936
Total variation = 1.936 and Explained variation = 1.3461
So, coefficient of determination = $r^2$ = 1.3461/1.936 = 0.6953, which agrees
with 0.6954 up to rounding in the ŷ values.
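The explained-over-total decomposition can also be verified numerically. The sketch below refits the line with unrounded coefficients, so its explained variation may differ slightly in the fourth decimal from a hand-tabulated value built on rounded ŷ entries; all variable names are illustrative.

```python
# Data from the problem statement.
x = [4.8, 6.6, 5.9, 7.4, 3.8, 5.2, 6.6, 5.0, 7.2, 6.0]
y = [2.4, 3.5, 3.0, 3.8, 2.7, 2.4, 3.0, 2.8, 3.4, 3.2]
n = len(x)

# Least-squares fit using the same computational formulas as part 1.
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = sum_y / n - m * sum_x / n

# Explained and total variation about the mean GPA.
y_bar = sum_y / n
explained = sum((m * xi + b - y_bar) ** 2 for xi in x)
total = sum((yi - y_bar) ** 2 for yi in y)
r_squared = explained / total
print(round(explained, 4), round(total, 4), round(r_squared, 4))
```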