
IT233: Applied Statistics                       TIHE 2005                   Lecture 07



                Simple Linear Regression Analysis
In this section we do an in-depth analysis of the linear association between two
variables X (called an independent variable or regressor) and Y (called a
dependent variable or response).

Simple Linear Regression makes three basic assumptions:

      1.    Given a value x₀ of X , the corresponding value of Y is a
            random variable whose mean μ_{Y|x₀} (the mean of Y given the
            value x₀ ) is a linear function of x₀ ,

                   i.e.   μ_{Y|x₀} = α + β x₀    or,   E(Y | X = x₀) = α + β x₀


      2.    For a given value x₀ , the variation of Y around its mean is Normal.

      3.    The variance of Y is the same for all given values of X ,

                   i.e.   σ²_{Y|x₀} = σ² ,   for any x₀


Example: In simple linear regression, suppose the variance of Y when X = 4 is
16. What is the variance of Y when X = 5?


Answer: The same, 16, by assumption 3 (the variance of Y does not depend on
the value of X ).
Simple Linear Regression Model:

Using the above assumptions, we can write the model as:

                          Y = α + β x + ε

Where  is a random variable (or error term) and follows normal distribution
with E ( )  0 and Var ( )   2

             i.e.         
                      N 0,  2   

Illustration of Linear Regression:

   Let X = the height of father
       Y = the height of son

For fathers whose height is x₀ , the heights of the sons will vary randomly.
Linear regression states that the mean height of the son is a linear function
of the height of the father, i.e.

                          μ_{Y|x} = α + β x
Scatter Diagrams:

A scatter diagram will suggest whether a simple linear regression model would
be a good fit.

[Figure: four scatter diagrams, (A)-(D)]
   Figure (A) suggests that a simple linear regression model seems OK, though
   the fit is not very good (wide scatter around the fitted line).

   Figure (B) suggests that a simple linear regression model fits well
   (points are close to the fitted line).

   In (C) a straight line could be fitted, but the relationship is not linear.

   In (D) there is no relationship between Y and X .
Fitting a Linear Regression Equation:

Keep in mind that there are two lines:

                          Y = α + β x        (True line)

                          Ŷ = α̂ + β̂ x        (Estimated line)

[Figure: the true line and the estimated line plotted together in the X-Y plane]

Notation:

    a (or α̂)  = Estimate of α
    b (or β̂)  = Estimate of β
    yᵢ        = The observed value of Y corresponding to xᵢ
    ŷᵢ        = The fitted value of Y corresponding to xᵢ
    eᵢ        = yᵢ - ŷᵢ = The residual

Residual: The Error in Fit:

The residual, denoted by eᵢ = yᵢ - ŷᵢ , is the difference between the observed
and fitted values of Y . It estimates εᵢ .

[Figure: the residual eᵢ shown as the vertical distance between the observed
point (xᵢ, yᵢ) and the fitted value ŷᵢ on the estimated line Ŷ = α̂ + β̂ x]
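As a concrete illustration of the definition eᵢ = yᵢ - ŷᵢ , here is a small
sketch; the data and the estimated coefficients are invented purely for
illustration:

```python
# Residuals e_i = y_i - yhat_i for a given estimated line yhat = a + b*x.
# The coefficients and data below are invented for illustration only.
a, b = 1.0, 2.0                      # assumed estimated intercept and slope
x = [1.0, 2.0, 3.0]
y = [3.5, 4.5, 7.5]                  # observed values

y_hat = [a + b * xi for xi in x]     # fitted values yhat_i
residuals = [yi - yhi for yi, yhi in zip(y, y_hat)]

print(residuals)                     # → [0.5, -0.5, 0.5]
```

Each residual is the signed vertical distance from the observed point to the
estimated line: positive when the point lies above the line, negative below.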
The Method of Least Squares:

We shall find a (or α̂) and b (or β̂) , the estimates of α and β , so that the
sum of the squares of the residuals is minimum. The residual sum of squares is
often called the sum of squares of the errors about the regression line and
denoted by SSE . This minimization procedure for estimating the parameters is
called the method of least squares. Hence, we shall find a and b so as to
minimize

                 SSE = Σᵢ eᵢ² = Σᵢ (yᵢ - ŷᵢ)² = Σᵢ (yᵢ - a - b xᵢ)² ,

                 where each sum runs over i = 1, …, n.

Differentiating SSE with respect to a and b and setting the partial derivatives
equal to zero, we obtain the following equations (called the normal equations):

                      n a + b Σᵢ xᵢ = Σᵢ yᵢ

                      a Σᵢ xᵢ + b Σᵢ xᵢ² = Σᵢ xᵢ yᵢ

which may be solved simultaneously for a and b .
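The normal equations form a 2×2 linear system in a and b , which can be solved
directly, e.g. by Cramer's rule. A minimal sketch (the sample data are
invented, chosen to lie exactly on the line y = 1 + 2x so the answer is easy
to check by hand):

```python
# Solve the normal equations for the least-squares estimates a and b:
#   n*a      + b*sum(x)   = sum(y)
#   a*sum(x) + b*sum(x^2) = sum(x*y)
# Sample data invented for illustration; they lie exactly on y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

n = len(x)
Sx = sum(x)                               # sum of x_i
Sy = sum(y)                               # sum of y_i
Sxx = sum(xi * xi for xi in x)            # sum of x_i^2
Sxy = sum(xi * yi for xi, yi in zip(x, y))  # sum of x_i * y_i

# Cramer's rule on the 2x2 system.
det = n * Sxx - Sx * Sx
a = (Sy * Sxx - Sx * Sxy) / det           # estimated intercept
b = (n * Sxy - Sx * Sy) / det             # estimated slope

print(a, b)                               # → 1.0 2.0
```

Since the points lie exactly on a line, every residual is zero and SSE = 0;
for real data the same formulas give the line with the smallest possible SSE.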
