IT233: Applied Statistics    TIHE 2005    Lecture 07

Simple Linear Regression Analysis

In this section we do an in-depth analysis of the linear association between two variables X (called an independent variable or regressor) and Y (called a dependent variable or response).

Simple linear regression makes 3 basic assumptions:

1. Given a value x₀ of X, the corresponding value of Y is a random variable whose mean μ_{Y|x₀} (the mean of Y given the value x₀) is a linear function of x₀, i.e. μ_{Y|x₀} = α + β x₀, or E(Y | X = x₀) = α + β x₀.
2. The variation of Y around this mean μ_{Y|x₀} is Normal.
3. The variance of Y is the same for all given values of X, i.e. σ²_{Y|x₀} = σ² for any x₀.

Example: In simple linear regression, suppose the variance of Y when X = 4 is 16. What is the variance of Y when X = 5?
Answer: The same, 16, by assumption 3 (constant variance).

Simple Linear Regression Model: Using the above assumptions, we can write the model as

    Y = α + β x + ε,

where ε is a random variable (the error term) that follows a normal distribution with E(ε) = 0 and Var(ε) = σ², i.e. ε ~ N(0, σ²).

Illustration of Linear Regression: Let X = the height of the father and Y = the height of the son. For fathers whose height is x₀, the heights of the sons will vary randomly. Linear regression states that the mean height of the son is a linear function of the height of the father, i.e. Y = α + β x + ε.

Scatter Diagrams: A scatter diagram will suggest whether a simple linear regression model would be a good fit. Figure (A) suggests that a simple linear regression model seems OK, though the fit is not very good (wide scatter around the fitted line). Figure (B) suggests that a simple linear regression model fits well (points are close to the fitted line). In (C) a straight line could be fitted, but the relationship is not linear. In (D) there is no relationship between Y and X.
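The three assumptions can be checked by simulation. The sketch below (in Python, using illustrative parameter values α = 2, β = 0.5, σ = 4 that are not from the lecture) generates Y = α + βx + ε at two values of x and confirms that the mean of Y is linear in x while its variance stays at σ² = 16 regardless of x:

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
alpha, beta, sigma = 2.0, 0.5, 4.0   # true intercept, slope, error SD

rng = np.random.default_rng(0)
x = np.repeat([4.0, 5.0], 100_000)         # many observations at x = 4 and x = 5
eps = rng.normal(0.0, sigma, size=x.size)  # error term: eps ~ N(0, sigma^2)
y = alpha + beta * x + eps                 # the model Y = alpha + beta*x + eps

# Assumption 1: the mean of Y given x is the linear function alpha + beta*x.
print(y[x == 4.0].mean())   # close to alpha + beta*4 = 4.0
print(y[x == 5.0].mean())   # close to alpha + beta*5 = 4.5

# Assumption 3: the variance of Y is the same sigma^2 = 16 at every x.
print(np.var(y[x == 4.0]), np.var(y[x == 5.0]))  # both close to 16
```

This mirrors the worked example above: the variance of Y at X = 5 is the same as at X = 4.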
Fitting a Linear Regression Equation: Keep in mind that there are two lines:

    μ_{Y|x} = α + β x    (true line)
    ŷ = a + b x          (estimated line)

Notation:
    a (or α̂) = estimate of α
    b (or β̂) = estimate of β
    yᵢ = the observed value of Y corresponding to xᵢ
    ŷᵢ = the fitted value of Y corresponding to xᵢ
    eᵢ = yᵢ − ŷᵢ = the residual

Residual (the error in fit): The residual, denoted by eᵢ = yᵢ − ŷᵢ, is the difference between the observed and fitted values of Y. It estimates εᵢ.

The Method of Least Squares: We shall find a (or α̂) and b (or β̂), the estimates of α and β, so that the sum of the squares of the residuals is minimum. The residual sum of squares is often called the sum of squares of the errors about the regression line and is denoted by SSE. This minimization procedure for estimating the parameters is called the method of least squares. Hence, we shall find a and b so as to minimize

    SSE = Σᵢ eᵢ² = Σᵢ (yᵢ − ŷᵢ)² = Σᵢ (yᵢ − a − b xᵢ)²,   i = 1, …, n.

Differentiating SSE with respect to a and b and setting the partial derivatives equal to zero, we obtain the equations (called the normal equations):

    n a + b Σ xᵢ = Σ yᵢ
    a Σ xᵢ + b Σ xᵢ² = Σ xᵢ yᵢ

which may be solved simultaneously for a and b.
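The normal equations above are a 2×2 linear system in a and b, so solving them directly is straightforward. A minimal Python sketch, using a small made-up data set (the x and y values below are hypothetical, not from the lecture):

```python
import numpy as np

# Hypothetical data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

n = len(x)
# Normal equations as a 2x2 system in (a, b):
#   n*a      + b*sum(x)   = sum(y)
#   a*sum(x) + b*sum(x^2) = sum(x*y)
A = np.array([[n,       x.sum()],
              [x.sum(), (x**2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
a, b = np.linalg.solve(A, rhs)   # least-squares intercept and slope

y_hat = a + b * x     # fitted values
e = y - y_hat         # residuals e_i = y_i - yhat_i
sse = (e**2).sum()    # residual sum of squares, SSE
```

Note that the first normal equation forces the residuals to sum to zero, which is a quick sanity check on any least-squares fit.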