IT233: Applied Statistics
TIHE 2005
Lecture 07

Simple Linear Regression Analysis

In this section we do an in-depth analysis of the linear association between two variables: X (called an independent variable or regressor) and Y (called a dependent variable or response).

Simple linear regression makes 3 basic assumptions:

1. Given a value x0 of X, the corresponding value of Y is a random variable whose mean (the mean of Y given the value x0), denoted mu_{Y|x0}, is a linear function of X, i.e.

       mu_{Y|x0} = alpha + beta*x0,   or   E(Y | X = x0) = alpha + beta*x0

2. The variation of Y around this mean, for the given value x0, is Normal.

3. The variance of Y is the same for all given values of X, i.e.

       sigma^2_{Y|x0} = sigma^2,   for any x0.

Example: In simple linear regression, suppose the variance of Y when X = 4 is 16. What is the variance of Y when X = 5?
Answer: The same, 16, by the equal-variance assumption (3).

Simple Linear Regression Model:
Using the above assumptions, we can write the model as

       Y = alpha + beta*x + epsilon

where epsilon is a random variable (the error term) that follows a normal distribution with E(epsilon) = 0 and Var(epsilon) = sigma^2, i.e. epsilon ~ N(0, sigma^2).

Illustration of Linear Regression:
Let X = the height of the father and Y = the height of the son. For fathers whose height is x0, the heights of the sons will vary randomly. Linear regression states that the height of the son is a linear function of the height of the father, i.e. Y = alpha + beta*x + epsilon.

Scatter Diagrams:
A scatter diagram will suggest whether a simple linear regression model would be a good fit. Figure (A) suggests that a simple linear regression model seems OK, though the fit is not very good (wide scatter around the fitted line). Figure (B) suggests that a simple linear regression model fits well (points are close to the fitted line). In (C) a straight line could be fitted, but the relationship is not linear. In (D) there is no relationship between Y and X.
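The model above can be illustrated with a short simulation. The snippet below is a sketch, not part of the lecture; the parameter values alpha = 2, beta = 0.5, sigma = 1 are illustrative assumptions. It draws Y values for a fixed x0 and checks that their average settles near the line value alpha + beta*x0, as assumption (1) says.

```python
import random

random.seed(0)

# Illustrative (assumed) true parameters of Y = alpha + beta*x + epsilon
alpha, beta, sigma = 2.0, 0.5, 1.0

def simulate_y(x):
    """Draw one Y for a given x: mean alpha + beta*x, Normal error term."""
    eps = random.gauss(0.0, sigma)  # epsilon ~ N(0, sigma^2), E(eps) = 0
    return alpha + beta * x + eps

# For a fixed x0, Y varies randomly around the mean mu_{Y|x0} = alpha + beta*x0.
x0 = 4.0
samples = [simulate_y(x0) for _ in range(10000)]
mean_y = sum(samples) / len(samples)
# mean_y should be close to alpha + beta*x0 = 4.0
```

The sample mean of the simulated Y values approximates E(Y | X = x0); increasing the number of draws tightens the approximation.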
Fitting a Linear Regression Equation:
Keep in mind that there are two lines:

       Y = alpha + beta*x           (true line)
       Y_hat = alpha_hat + beta_hat*x   (estimated line)

Notation:
       a (or alpha_hat) = estimate of alpha
       b (or beta_hat)  = estimate of beta
       y_i     = the observed value of Y corresponding to x_i
       y_hat_i = the fitted value of Y corresponding to x_i
       e_i = y_i - y_hat_i = the residual

Residual (the error in fit): The residual, denoted e_i = y_i - y_hat_i, is the difference between the observed and fitted values of Y. It estimates epsilon_i.

(Figure: the residual e_i shown as the vertical distance between the observed point (x_i, y_i) and the estimated line Y_hat = alpha_hat + beta_hat*x.)

The Method of Least Squares:
We shall find a (or alpha_hat) and b (or beta_hat), the estimates of alpha and beta, so that the sum of the squares of the residuals is a minimum. The residual sum of squares is often called the sum of squares of the errors about the regression line and is denoted by SSE. This minimization procedure for estimating the parameters is called the method of least squares. Hence, we shall find a and b so as to minimize

       SSE = sum_{i=1}^{n} e_i^2 = sum_{i=1}^{n} (y_i - y_hat_i)^2 = sum_{i=1}^{n} (y_i - a - b*x_i)^2

Differentiating SSE with respect to a and b and setting the partial derivatives equal to zero, we obtain the equations (called the normal equations):

       n*a + b * sum x_i            = sum y_i
       a * sum x_i + b * sum x_i^2  = sum x_i*y_i

which may be solved simultaneously for a and b.
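Solving the two normal equations simultaneously gives closed-form expressions for a and b, which the following sketch implements directly (the function name and sample data are illustrative, not from the lecture):

```python
def least_squares(xs, ys):
    """Solve the normal equations of simple linear regression.

    Returns (a, b): the least-squares intercept and slope.
    """
    n = len(xs)
    sx = sum(xs)                              # sum of x_i
    sy = sum(ys)                              # sum of y_i
    sxx = sum(x * x for x in xs)              # sum of x_i^2
    sxy = sum(x * y for x, y in zip(xs, ys))  # sum of x_i*y_i
    # From n*a + b*sx = sy and a*sx + b*sxx = sxy:
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

# Check on data lying exactly on y = 1 + 2x: the fit recovers a = 1, b = 2
# and every residual e_i = y_i - (a + b*x_i) is zero, so SSE = 0.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = least_squares(xs, ys)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
```

Because the method minimizes SSE, data falling exactly on a straight line yields SSE = 0; with real (noisy) data, SSE is positive but as small as any choice of a and b can make it.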