Problem on Simple linear regression


```							                             Problem on Simple linear regression
Stat1222                                                                                     4/16/09

[...] officer decided to study the relationship between a student’s score on the SAT verbal
test (taken in the final year of high school) and the student’s college GPA at the
end of the sophomore year. Ten student records were examined, with the
following results. The reported exam scores are the actual scores divided by 100.

Student   SAT, x   GPA, y
   1       4.8      2.4
   2       6.6      3.5
   3       5.9      3.0
   4       7.4      3.8
   5       3.8      2.7
   6       5.2      2.4
   7       6.6      3.0
   8       5.0      2.8
   9       7.2      3.4
  10       6.0      3.2

Let x be the SAT score of a student and y be the student’s GPA. Then Σx² = 354.05,
Σxy = 180.66, Σy² = 93.14, Σx = 58.5, and Σy = 30.2.
1. Find the best fit regression line relating y to x.
Solution: The best fit regression line is ŷ = mx + b.
The slope is
    m = [nΣxy − (Σx)(Σy)] / [nΣx² − (Σx)²]
      = [10 ∗ 180.66 − (58.5)(30.2)] / [10 ∗ 354.05 − (58.5)²] = 0.3374
The y-intercept is
    b = Σy/n − m ∗ Σx/n = 30.2/10 − 0.3374 ∗ 58.5/10 = 1.0462.
ANSWER: The best fit line is ŷ = 0.3374x + 1.0462.
2. Calculate the correlation coefficient, r.
SOLUTION:
    r = [nΣxy − (Σx)(Σy)] / [√(nΣx² − (Σx)²) ∗ √(nΣy² − (Σy)²)]
      = [10 ∗ 180.66 − (58.5)(30.2)] / [√(10 ∗ 354.05 − (58.5)²) ∗ √(10 ∗ 93.14 − (30.2)²)]
Calculations yield r = 0.8339.
3. Test at α = 0.05 whether y and x have a positive linear association.
Solution: Here we want to test for positive correlation, i.e.,
    H0 : ρ ≤ 0 versus Ha : ρ > 0 at α = 0.05
Test statistic:
    t = r / √[(1 − r²)/(n − 2)] = .8339 / √[(1 − (.8339)²)/(10 − 2)] = 4.2735
The critical point for a right-tailed test at α = 0.05 and d.f. = 10 − 2 = 8 is t0 = 1.860.
The rejection rule is: Reject H0 at α = 0.05 if t > 1.860.
Here t = 4.2735 > t0 = 1.860, so the decision is to reject H0: the data support a
positive linear association between SAT score and GPA.

4. Predict a student’s GPA when the student’s SAT score is 5.2.
Solution: The predicted GPA ŷ is

    ŷ = 0.3374 ∗ 5.2 + 1.0462 = 2.8007

5. Find se, the standard error of the estimate.
SOLUTION:
    se = √[Σ(yi − ŷi)² / (n − 2)]
First create the following table to find Σ(y − ŷ)².

SAT, x   GPA, y   ŷ         (y − ŷ)²
4.8      2.4      2.66572   (2.4 − 2.66572)²
6.6      3.5      3.27304   (3.5 − 3.27304)²
5.9      3.0      3.03686   (3.0 − 3.03686)²
7.4      3.8      3.54296   (3.8 − 3.54296)²
3.8      2.7      2.32832   (2.7 − 2.32832)²
5.2      2.4      2.80068   (2.4 − 2.80068)²
6.6      3.0      3.27304   (3.0 − 3.27304)²
5.0      2.8      2.73320   (2.8 − 2.73320)²
7.2      3.4      3.47548   (3.4 − 3.47548)²
6.0      3.2      3.07060   (3.2 − 3.07060)²
                  Total     0.58969

The standard error se is

    se = √[0.58969 / (10 − 2)] = 0.2715
6. Find a 99% prediction interval for the GPA of a student whose SAT score was 5.2.
Solution: The prediction interval is (ŷ − E, ŷ + E),
where
    E = tc ∗ se ∗ √[1 + 1/n + n(x0 − x̄)² / (nΣx² − (Σx)²)]
Here, for x0 = 5.2, ŷ = 2.8007 (computed in part 4),
se = .2715 (from part 5), n = 10, x̄ = Σx/n = 58.5/10 = 5.85, and Σx² = 354.05.
For 99% confidence and d.f. = n − 2 = 10 − 2 = 8, tc = 3.355.
Plugging in these values gives

    E = 3.355 ∗ .2715 ∗ √[1 + 1/10 + 10(5.2 − 5.85)² / (10 ∗ 354.05 − (58.5)²)] = .9707

The prediction interval is (2.8007 − .9707, 2.8007 + .9707) = (1.8300, 3.7714).

7. What proportion of the variation in y values is explained by the regression line
relating y, the student’s GPA, to x, the student’s SAT score?
Solution: The coefficient of determination r² is defined as the proportion of
variation in y values that is explained by the regression line.
We calculated r in part 2 to be 0.8339, so
    r² = (0.8339)² = 0.6954
Alternate way to calculate r² (here ȳ = Σy/n = 30.2/10 = 3.02):

SAT, x   GPA, y   ŷ         (ŷ − ȳ)²            (y − ȳ)²
4.8      2.4      2.66572   (2.66572 − 3.02)²   (2.4 − 3.02)²
6.6      3.5      3.27304   (3.27304 − 3.02)²   (3.5 − 3.02)²
5.9      3.0      3.03686   (3.03686 − 3.02)²   (3.0 − 3.02)²
7.4      3.8      3.54296   (3.54296 − 3.02)²   (3.8 − 3.02)²
3.8      2.7      2.32832   (2.32832 − 3.02)²   (2.7 − 3.02)²
5.2      2.4      2.80068   (2.80068 − 3.02)²   (2.4 − 3.02)²
6.6      3.0      3.27304   (3.27304 − 3.02)²   (3.0 − 3.02)²
5.0      2.8      2.73320   (2.73320 − 3.02)²   (2.8 − 3.02)²
7.2      3.4      3.47548   (3.47548 − 3.02)²   (3.4 − 3.02)²
6.0      3.2      3.07060   (3.07060 − 3.02)²   (3.2 − 3.02)²
                  Total     1.3466              1.936

Total variation = Σ(y − ȳ)² = 1.936 and explained variation = Σ(ŷ − ȳ)² = 1.3466.
So the coefficient of determination is r² = 1.3466/1.936 = .6955; the small difference
from 0.6954 is due to rounding.

```
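
The hand calculations in parts 1 through 7 can be cross-checked with a short script. The following is a minimal sketch in Python using only the standard library; the function name `regression_summary` and the dictionary keys are illustrative choices, not part of the original worksheet. The critical value 3.355 is taken from the worksheet's t-table lookup rather than computed, and because the script keeps full precision instead of rounding m and b first, its results may differ from the worksheet's in the fourth decimal place.

```python
import math

# Data from the worksheet (SAT scores are already divided by 100).
sat = [4.8, 6.6, 5.9, 7.4, 3.8, 5.2, 6.6, 5.0, 7.2, 6.0]
gpa = [2.4, 3.5, 3.0, 3.8, 2.7, 2.4, 3.0, 2.8, 3.4, 3.2]


def regression_summary(x, y, x0, t_crit):
    """Recompute the worksheet's quantities for simple linear regression.

    x0 is the x value to predict at; t_crit is the t-table critical value
    for the prediction interval (supplied by the caller, not computed here).
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))

    num = n * sxy - sx * sy        # n*Sum(xy) - Sum(x)*Sum(y)
    den_x = n * sxx - sx ** 2      # n*Sum(x^2) - (Sum x)^2
    den_y = n * syy - sy ** 2      # n*Sum(y^2) - (Sum y)^2

    m = num / den_x                              # slope (part 1)
    b = sy / n - m * sx / n                      # intercept (part 1)
    r = num / math.sqrt(den_x * den_y)           # correlation (part 2)
    t = r / math.sqrt((1 - r ** 2) / (n - 2))    # test statistic (part 3)

    y0 = m * x0 + b                              # prediction (part 4)
    sse = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2))                # standard error (part 5)

    # Half-width of the prediction interval (part 6).
    e = t_crit * se * math.sqrt(1 + 1 / n + n * (x0 - sx / n) ** 2 / den_x)

    return {"m": m, "b": b, "r": r, "t": t, "y0": y0,
            "se": se, "E": e, "interval": (y0 - e, y0 + e), "r2": r ** 2}


result = regression_summary(sat, gpa, x0=5.2, t_crit=3.355)
for name, value in result.items():
    print(name, value)
```

Passing `t_crit` in as an argument keeps the sketch free of third-party dependencies, since the standard library has no t-distribution quantile function.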