Problem on Simple linear regression

Stat1222                                                                                     4/16/09

To determine which students should receive scholarships, a university admissions
officer decided to study the relationship between a student’s score on the SAT verbal
test (taken in the final year of high school) and the student’s college GPA at the
end of the sophomore year. Ten student records were examined, with the following
results. The reported exam scores are the actual scores divided by 100.

 Student SAT, x          GPA, y
    1      4.8            2.4
    2      6.6            3.5
    3      5.9            3.0
    4      7.4            3.8
    5      3.8            2.7
    6      5.2            2.4
    7      6.6            3.0
    8      5.0            2.8
    9      7.2            3.4
   10      6.0            3.2

Let x be the SAT score of a student and y be the student’s GPA. Then
$\sum x^2 = 354.05$, $\sum xy = 180.66$, $\sum y^2 = 93.14$, $\sum x = 58.5$, and $\sum y = 30.2$.
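As a quick check on the sums quoted above, the following minimal Python sketch recomputes them from the table. The list names `sat` and `gpa` and the helper `summary_sums` are illustrative choices, not part of the original solution.

```python
# SAT scores (already divided by 100) and sophomore GPAs from the table above.
sat = [4.8, 6.6, 5.9, 7.4, 3.8, 5.2, 6.6, 5.0, 7.2, 6.0]
gpa = [2.4, 3.5, 3.0, 3.8, 2.7, 2.4, 3.0, 2.8, 3.4, 3.2]


def summary_sums(x, y):
    """Return (sum x, sum y, sum x^2, sum xy, sum y^2) for paired data."""
    return (
        sum(x),
        sum(y),
        sum(xi * xi for xi in x),
        sum(xi * yi for xi, yi in zip(x, y)),
        sum(yi * yi for yi in y),
    )


sx, sy, sxx, sxy, syy = summary_sums(sat, gpa)
# Round to hide floating-point noise in the last digits.
print(round(sx, 2), round(sy, 2), round(sxx, 2), round(sxy, 2), round(syy, 2))
# prints: 58.5 30.2 354.05 180.66 93.14
```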
  1. Find the best-fit regression line relating y to x.
     Solution: The best-fit regression line is $\hat{y} = mx + b$.
     The slope is
     $$ m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{10 \times 180.66 - (58.5)(30.2)}{10 \times 354.05 - (58.5)^2} = 0.3374 $$
     The y-intercept is
     $$ b = \frac{\sum y}{n} - m\,\frac{\sum x}{n} = \frac{30.2}{10} - 0.3374 \times \frac{58.5}{10} = 1.0462. $$
     ANSWER: The best-fit line is $\hat{y} = 0.3374x + 1.0462$.
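The slope and intercept formulas from part 1 can be sketched in Python as follows (the function name `fit_line` is hypothetical). Because it keeps full precision instead of rounding the slope to 0.3374 before computing b, it should report an intercept of about 1.0461 rather than 1.0462; the difference is only rounding.

```python
def fit_line(x, y):
    """Least-squares slope m and intercept b for y-hat = m*x + b,
    using the computational formulas from part 1."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = sy / n - m * sx / n  # b = mean(y) - m * mean(x)
    return m, b


sat = [4.8, 6.6, 5.9, 7.4, 3.8, 5.2, 6.6, 5.0, 7.2, 6.0]
gpa = [2.4, 3.5, 3.0, 3.8, 2.7, 2.4, 3.0, 2.8, 3.4, 3.2]
m, b = fit_line(sat, gpa)
print(f"y-hat = {m:.4f}x + {b:.4f}")  # prints: y-hat = 0.3374x + 1.0461
```

On Python 3.10 or later, `statistics.linear_regression(sat, gpa)` from the standard library returns a named tuple with `slope` and `intercept` fields and can serve as a cross-check.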
  2. Calculate the correlation coefficient, r.
     SOLUTION:
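     $$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \frac{10 \times 180.66 - (58.5)(30.2)}{\sqrt{10 \times 354.05 - (58.5)^2}\,\sqrt{10 \times 93.14 - (30.2)^2}} $$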
     Calculations yield r = 0.8339.
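A sketch of the correlation formula from part 2, using the same illustrative data lists; `correlation` here is a hypothetical helper, not a library function.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient r via the computational formula in part 2."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return numerator / denominator


sat = [4.8, 6.6, 5.9, 7.4, 3.8, 5.2, 6.6, 5.0, 7.2, 6.0]
gpa = [2.4, 3.5, 3.0, 3.8, 2.7, 2.4, 3.0, 2.8, 3.4, 3.2]
r = correlation(sat, gpa)
print(round(r, 4))  # prints: 0.8339
```

On Python 3.10 or later, the standard library's `statistics.correlation(sat, gpa)` computes the same Pearson coefficient.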
3. Test at α = 0.05 whether y and x have a positive linear association.
   Solution: Here we want to test for positive correlation, i.e.,
   $H_0: \rho \le 0$ versus $H_a: \rho > 0$ at α = 0.05.
   Test statistic:
   $$ t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}} = \frac{0.8339}{\sqrt{(1 - (0.8339)^2)/(10 - 2)}} = 4.2735 $$
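   The critical point for a right-tailed test at α = 0.05 and d.f. = 10 − 2 = 8 is $t_0 = 1.860$.
   The rejection rule is: reject $H_0$ at α = 0.05 if t > 1.860.
   Here t = 4.2735 > $t_0$ = 1.860, so the decision is to reject $H_0$: the data show a
   significant positive linear association between SAT score and sophomore GPA.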
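The test in part 3 reduces to computing t and comparing it with the tabled critical value. In this sketch the critical value 1.860 is copied from the t table rather than computed; if SciPy is available, `scipy.stats.t.ppf(0.95, 8)` returns approximately the same number. The helper name `correlation_t` is hypothetical.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: rho <= 0, computed as r / sqrt((1 - r^2) / (n - 2))."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


r, n = 0.8339, 10
t = correlation_t(r, n)
# Critical value copied from the t table (right tail, alpha = 0.05, d.f. = 8).
t_crit = 1.860
reject_h0 = t > t_crit
print(round(t, 4), reject_h0)  # prints: 4.2735 True
```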

4. Predict the GPA of a student whose SAT score was 5.2.
   Solution: The predicted GPA is
   $$ \hat{y} = 0.3374 \times 5.2 + 1.0462 = 2.8007 $$
   ANSWER: $\hat{y} = 2.8007$
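The point prediction is a single evaluation of the fitted line. A minimal sketch using the rounded coefficients from part 1 (the helper `predict` is hypothetical):

```python
def predict(m, b, x0):
    """Point prediction y-hat = m*x0 + b from a fitted line."""
    return m * x0 + b


# Rounded coefficients from part 1, SAT score 5.2.
y_hat = predict(0.3374, 1.0462, 5.2)
print(round(y_hat, 4))  # prints: 2.8007
```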

5. Find $s_e$, the standard error of estimate.
   SOLUTION:
   $$ s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} $$
   First create the following table to find $\sum (y - \hat{y})^2$.


                    SAT, x    GPA, y      ŷ           (y − ŷ)²
                      4.8      2.4      2.66572    (2.4 − 2.66572)²
                      6.6      3.5      3.27304    (3.5 − 3.27304)²
                      5.9      3.0      3.03686    (3.0 − 3.03686)²
                      7.4      3.8      3.54296    (3.8 − 3.54296)²
                      3.8      2.7      2.32832    (2.7 − 2.32832)²
                      5.2      2.4      2.80068    (2.4 − 2.80068)²
                      6.6      3.0      3.27304    (3.0 − 3.27304)²
                      5.0      2.8      2.73320    (2.8 − 2.73320)²
                      7.2      3.4      3.47548    (3.4 − 3.47548)²
                      6.0      3.2      3.07060    (3.2 − 3.07060)²
                                                  Sum = 0.58969

   The standard error $s_e$ is
   $$ s_e = \sqrt{\frac{0.58969}{10 - 2}} = 0.2715 $$
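The residual table and the standard error can be reproduced with a few lines of Python. This sketch assumes the rounded coefficients m = 0.3374 and b = 1.0462 from part 1, so its fitted values match the ŷ column of the table.

```python
import math

sat = [4.8, 6.6, 5.9, 7.4, 3.8, 5.2, 6.6, 5.0, 7.2, 6.0]
gpa = [2.4, 3.5, 3.0, 3.8, 2.7, 2.4, 3.0, 2.8, 3.4, 3.2]
m, b = 0.3374, 1.0462  # rounded coefficients from part 1

fitted = [m * x + b for x in sat]                       # the y-hat column
sse = sum((y - yh) ** 2 for y, yh in zip(gpa, fitted))  # sum of (y - y-hat)^2
se = math.sqrt(sse / (len(sat) - 2))                    # divide by n - 2, then take the root
print(round(sse, 5), round(se, 4))  # prints: 0.58969 0.2715
```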
6. Find a 99% prediction interval for the GPA of a student whose SAT score was 5.2.
   Solution: The formula for the prediction interval is $(\hat{y} - E, \hat{y} + E)$,
   where
   $$ E = t_c\, s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}} $$
  Here, for $x_0 = 5.2$, $\hat{y} = 2.8007$ (computed in part 4),
  $s_e = 0.2715$ (from part 5), $n = 10$, $\bar{x} = \sum x / n = 58.5/10 = 5.85$, and $\sum x^2 = 354.05$.
  For 99% confidence and d.f. = n − 2 = 10 − 2 = 8, $t_c = 3.355$.
  Plugging in the values gives
  $$ E = 3.355 \times 0.2715 \sqrt{1 + \frac{1}{10} + \frac{10(5.2 - 5.85)^2}{10 \times 354.05 - (58.5)^2}} = 0.9707 $$

  The prediction interval is (2.8007 − .9707, 2.8007 + .9707) = (1.8300, 3.7714).
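The interval computation can be sketched as below; `prediction_interval` is a hypothetical helper, and the inputs $s_e = 0.2715$ and $t_c = 3.355$ are taken from part 5 and the t table. Because it does not round ŷ and E to four decimals before subtracting, the lower endpoint prints as 1.8299 instead of 1.8300.

```python
import math


def prediction_interval(x, m, b, se, x0, t_c):
    """Prediction interval (y-hat - E, y-hat + E) for a single new y at x0,
    with E = t_c * se * sqrt(1 + 1/n + n(x0 - x_bar)^2 / (n*sum x^2 - (sum x)^2))."""
    n = len(x)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    x_bar = sx / n
    y_hat = m * x0 + b
    e = t_c * se * math.sqrt(1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sxx - sx ** 2))
    return y_hat - e, y_hat + e


sat = [4.8, 6.6, 5.9, 7.4, 3.8, 5.2, 6.6, 5.0, 7.2, 6.0]
# Inputs from parts 1 and 5 and the t table (99%, d.f. = 8).
low, high = prediction_interval(sat, 0.3374, 1.0462, 0.2715, 5.2, 3.355)
print(round(low, 4), round(high, 4))  # prints: 1.8299 3.7714
```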

7. What proportion of the variation in the y values is explained by the regression line
   relating y, the student’s GPA, to x, the student’s SAT score?
   Solution: The coefficient of determination $r^2$ is defined as the proportion of
   variation in the y values that is explained by the regression line.
   We calculated r in part 2 to be 0.8339.
   Here
   $$ r^2 = (0.8339)^2 = 0.6954 $$
  Alternate way to calculate $r^2$: divide the explained variation $\sum (\hat{y} - \bar{y})^2$
  by the total variation $\sum (y - \bar{y})^2$, where $\bar{y} = 30.2/10 = 3.02$.


           SAT, x   GPA, y      ŷ          (ŷ − ȳ)²            (y − ȳ)²
             4.8     2.4      2.66572   (2.66572 − 3.02)²   (2.4 − 3.02)²
             6.6     3.5      3.27304   (3.27304 − 3.02)²   (3.5 − 3.02)²
             5.9     3.0      3.03686   (3.03686 − 3.02)²   (3.0 − 3.02)²
             7.4     3.8      3.54296   (3.54296 − 3.02)²   (3.8 − 3.02)²
             3.8     2.7      2.32832   (2.32832 − 3.02)²   (2.7 − 3.02)²
             5.2     2.4      2.80068   (2.80068 − 3.02)²   (2.4 − 3.02)²
             6.6     3.0      3.27304   (3.27304 − 3.02)²   (3.0 − 3.02)²
             5.0     2.8      2.73320   (2.73320 − 3.02)²   (2.8 − 3.02)²
             7.2     3.4      3.47548   (3.47548 − 3.02)²   (3.4 − 3.02)²
             6.0     3.2      3.07060   (3.07060 − 3.02)²   (3.2 − 3.02)²
                                 Sum =      1.3461              1.936

  Total variation = 1.936 and explained variation = 1.3461.
  So, the coefficient of determination is $r^2 = 1.3461/1.936 = 0.6953$, which agrees with
  the value 0.6954 above up to the rounding of the slope to 0.3374.
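Both routes to $r^2$ in part 7 can be checked numerically. This sketch again assumes the rounded line ŷ = 0.3374x + 1.0462, and the variable names `explained` and `total` are illustrative; its ratio of explained to total variation should come out near 0.6953, agreeing with $(0.8339)^2 = 0.6954$ up to rounding.

```python
sat = [4.8, 6.6, 5.9, 7.4, 3.8, 5.2, 6.6, 5.0, 7.2, 6.0]
gpa = [2.4, 3.5, 3.0, 3.8, 2.7, 2.4, 3.0, 2.8, 3.4, 3.2]
m, b = 0.3374, 1.0462  # rounded coefficients from part 1

n = len(gpa)
y_bar = sum(gpa) / n                                 # mean GPA, 3.02
fitted = [m * x + b for x in sat]
explained = sum((yh - y_bar) ** 2 for yh in fitted)  # sum of (y-hat - y-bar)^2
total = sum((y - y_bar) ** 2 for y in gpa)           # sum of (y - y-bar)^2
r_squared = explained / total
print(round(explained, 4), round(total, 4), round(r_squared, 4))
# prints: 1.3461 1.936 0.6953
```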

						