The Ordinary Least Squares (OLS) method by dib16550


									                                                                                                                              1/23/2010




The Ordinary Least Squares (OLS) method
Chapter 2




Overview
 Single independent variable models
 Multivariate (multiple) regression models
 Interpretation of coefficients
 Total, explained, and residual sum of squares
 The coefficient of determination (R²) and its (mis)use




Single independent variable model
 $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, \dots, n$
 Given n observations on the variables Y and X, we seek to estimate the
 population coefficients (β0, β1)
 The simplest and most popular estimation method is Ordinary Least Squares (OLS)




The OLS method
 The OLS method chooses the estimates of β0 and β1 such that the sum of
 squared residuals in the sample is minimized
 Minimize $\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_i)^2$
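To make the minimization idea concrete, the sketch below searches a coarse grid of candidate lines and keeps the one with the smallest sum of squared residuals. The data, the grid range, and the names `ssr`, `best_b0`, and `best_b1` are made up for illustration; a brute-force grid search is not how OLS is computed in practice, it only shows what "minimize" means here.

```python
def ssr(b0, b1, x, y):
    """Sum of squared residuals for the candidate line b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Illustrative data (an assumption of this sketch, not from the slides).
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

# Try every (b0, b1) pair on a grid with step 0.01:
# b0 from 0.00 to 2.00 and b1 from 1.00 to 3.00.
best_ssr, best_b0, best_b1 = min(
    (ssr(i / 100, j / 100, x, y), i / 100, j / 100)
    for i in range(0, 201)
    for j in range(100, 301)
)
print(best_b0, best_b1, best_ssr)
```

Any other pair on the grid gives a larger sum of squared residuals than the pair reported; a finer grid would approach the exact OLS solution more closely.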




The OLS estimators
 Solving the minimization problem gives the following two estimators:
 $\hat\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2}, \qquad \hat\beta_0 = \bar Y - \hat\beta_1 \bar X$
 where $\bar Y$ and $\bar X$ are the sample means of Y and X, respectively




Estimator vs. estimate
 An estimator is a formula or a method of approximating a parameter of a
 population, such as β0 and β1
 An estimate is a number found by applying the estimator (the formula) to a
 particular sample
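The distinction between estimator and estimate can be sketched in code: the function below plays the role of the estimator (the formula), and the numbers it returns for one sample are the estimate. The function name `ols_simple` and the sample are hypothetical, chosen so that the points lie exactly on Y = 1 + 2X.

```python
def ols_simple(x, y):
    """Closed-form OLS estimators for a single-regressor model.

    Returns (b0_hat, b1_hat) computed from the slope and intercept formulas.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1_hat = sxy / sxx
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat

# Made-up sample lying exactly on Y = 1 + 2X, so the result is easy to check.
b0_hat, b1_hat = ols_simple([1, 2, 3, 4], [3, 5, 7, 9])
print(b0_hat, b1_hat)  # 1.0 2.0
```

Applying the same function to a different sample would generally give different numbers: the estimator stays fixed while the estimate changes with the sample.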








Exercise
 Given a sample of 3 observations (n = 3), find the OLS estimates of the
 coefficients of the Weight-Height regression equation

   Obs. # (i = 1…3)   Height over 5' in inches (Xi)   Weight (Yi)
          1                        9                      165
          2                       12                      180
          3                       15                      190




Exercise
   i    Xi    Yi    Yi − Ȳ    Xi − X̄    (Xi − X̄)²    (Xi − X̄)(Yi − Ȳ)
   1     9   165
   2    12   180
   3    15   190
        X̄ =         Ȳ =                   Σ =           Σ =

 Estimates based on this sample:
 β̂1 =       β̂0 =




Exercise (continued)
   i    Xi    Yi    Yi − Ȳ    Xi − X̄    (Xi − X̄)²    (Xi − X̄)(Yi − Ȳ)
   1     9   165    -13.33      -3          9             40
   2    12   180      1.67       0          0              0
   3    15   190     11.67       3          9             35
        X̄ = 12.00   Ȳ = 178.33            Σ = 18        Σ = 75

 Estimates based on this sample:
 β̂1 = 4.17 and β̂0 = 128.3




OLS and the multivariate regression model
 $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i, \quad i = 1, \dots, n$
 The OLS estimators of the Betas are determined by minimizing the sum of
 squared residuals of the sample
 Coefficient βj shows the change in Y when variable Xj changes by one unit,
 $\beta_j = \frac{\Delta Y}{\Delta X_j}$,
 when all other variables in the model stay the same
 (Recommended reading: Wooldridge 1.4 and 3.2)
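The Weight-Height exercise can be checked numerically. The sketch below recomputes the deviation columns, the two sums, and the estimates from the three observations; the variable names are illustrative only.

```python
# Weight-Height sample from the exercise.
x = [9, 12, 15]        # Xi
y = [165, 180, 190]    # Yi

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Deviations from the sample means (the middle columns of the table).
dx = [xi - x_bar for xi in x]
dy = [yi - y_bar for yi in y]

sum_dx2 = sum(d ** 2 for d in dx)                  # sum of (Xi - X-bar)^2
sum_dxdy = sum(a * b for a, b in zip(dx, dy))      # sum of cross products

b1_hat = sum_dxdy / sum_dx2
b0_hat = y_bar - b1_hat * x_bar
print(round(x_bar, 2), round(y_bar, 2), sum_dx2, round(sum_dxdy, 2))
print(round(b1_hat, 2), round(b0_hat, 2))
```

The slope is 75 / 18, which rounds to 4.17, and the intercept follows from the two sample means.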




Example
 Given the estimated model
 $Y = 12.8 - .317 X_1 + 1.2 X_2$,
 explain the meaning of the numbers in this expression




More on the interpretation of Betas (based on W3.2)
 Consider the model:
 $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
 β1 can be determined in two steps, as follows
  Regress X1 on X2 and retain the residuals r12
  Regress Y on r12 and thus determine β1
 r12 is the part of X1 uncorrelated with X2
 If X1 is uncorrelated with X2, β1 is the same whether or not X2 is included
 in the model
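The two-step procedure can be sketched on simulated data. Everything in the block (the data-generating process, the seed, the helper `ols`) is an assumption made for illustration; the point is only that the coefficient on X1 from the full regression matches the slope from regressing Y on the residuals r12.

```python
import numpy as np

def ols(design, target):
    """Least-squares coefficients of target regressed on the design columns."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

# Simulated sample (hypothetical): X1 and X2 are correlated by construction.
rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)
ones = np.ones(n)

# One-step estimate: regress Y on a constant, X1, and X2.
beta_full = ols(np.column_stack([ones, x1, x2]), y)

# Step 1: regress X1 on a constant and X2, keep the residuals r12.
z = np.column_stack([ones, x2])
r12 = x1 - z @ ols(z, x1)

# Step 2: regress Y on r12 (r12 has mean zero, so no intercept is needed).
beta1_two_step = float(r12 @ y) / float(r12 @ r12)

print(beta_full[1], beta1_two_step)
```

The two printed numbers agree up to floating-point error, which is the sense in which β1 measures the effect of the part of X1 that is not explained by X2.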








Comparison of Simple and Multiple Regression Estimates (W3.2)
 We compare the simple regression
 $\tilde Y = \tilde\beta_0 + \tilde\beta_1 X_1$
 with the multiple regression
 $\hat Y = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2$
 and we want to see if $\tilde\beta_1$ and $\hat\beta_1$ are the same.




Comparison continued
 It can be proved that the relationship between the two Betas is
 $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2 \hat\delta_{21}$
 where $\hat\delta_{21}$ is the slope in the regression of X2 on X1
 Thus, the two Betas are the same if either
  X2 has no direct effect on Y ($\hat\beta_2 = 0$), or
  X2 and X1 are uncorrelated in the sample ($\hat\delta_{21} = 0$).
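The relationship between the simple and multiple regression slopes holds exactly in any sample, which a short simulation can confirm. The data and the helper `ols` below are hypothetical; only the identity being checked comes from the slides.

```python
import numpy as np

def ols(design, target):
    """Least-squares coefficients of target regressed on the design columns."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

# Simulated sample (an assumption of this sketch).
rng = np.random.default_rng(42)
n = 100
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 2.0 + 1.0 * x1 + 3.0 * x2 + rng.normal(size=n)
ones = np.ones(n)

tilde = ols(np.column_stack([ones, x1]), y)        # simple regression of Y on X1
hat = ols(np.column_stack([ones, x1, x2]), y)      # multiple regression
delta_21 = ols(np.column_stack([ones, x1]), x2)[1] # slope of X2 on X1

lhs = tilde[1]
rhs = hat[1] + hat[2] * delta_21
print(lhs, rhs)
```

Setting the coefficient 0.5 in the line that generates `x2` to zero would make X1 and X2 nearly uncorrelated, and the two slopes would then be close to each other.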




Comparison – Example (W3.2)
 Y = the participation rate in a pension plan
 X1 = employer’s contribution (%)
 X2 = for how long the plan has been in place (years)
 Based on a sample of about 1000 plans,
 $Y = 80.12 + 5.52 X_1 + .243 X_2$
 If X2 is omitted, the new equation is
 $Y = 83.08 + 5.86 X_1$
 Discussion:




Example I - Discussion
 How to interpret the coefficient of X2 in the multiple regression? Does it
 seem to be important?
 The coefficient of X1 did not change much when X2 was removed. How can you
 explain that?
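As a starting point for the discussion, the relation between the simple and multiple regression slopes of X1 can be rearranged to back out the implied slope of X2 on X1. The helper name `implied_delta` is hypothetical, and the result is only approximate because the reported coefficients are rounded.

```python
def implied_delta(b1_simple, b1_multiple, b2_multiple):
    """Slope of X2 on X1 implied by b1_simple = b1_multiple + b2_multiple * delta."""
    return (b1_simple - b1_multiple) / b2_multiple

# Coefficients reported for the pension plan example.
gap = 5.86 - 5.52                      # change in the X1 coefficient
delta_21 = implied_delta(5.86, 5.52, 0.243)
print(round(gap, 2), round(delta_21, 2))  # 0.34 1.4
```

The change in the coefficient of X1 equals the coefficient of X2 times this slope, so a small coefficient on X2 keeps the change small.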




Comparison – Example II (W3.2)
Achievement Test Score and college GPA
 Y = college GPA
 X1 = achievement test score (0 to 100)
 X2 = high school GPA
 Model 1: $Y = 2.40 + .0271 X_1$
 Model 2: $Y = 1.29 + .0094 X_1 + .453 X_2$




How good is an estimated regression?
 How much of the observed variation in Y is ‘explained’ by the model?
 The total variation in Y that is captured in a sample is measured by the
 total sum of squares
 $TSS = \sum_{i=1}^{n} (Y_i - \bar Y)^2$








Exercise
 Given Y = (2, 7, 4, 10), calculate TSS.




The explained sum of squares
 The explained sum of squares is the variation of the fitted values of Y
 around their mean
 $ESS = \sum_{i=1}^{n} (\hat Y_i - \bar Y)^2$
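The TSS exercise can be checked with a few lines of code. The function name `total_sum_of_squares` is illustrative; it simply applies the TSS definition to the four values given.

```python
def total_sum_of_squares(y):
    """Sum of squared deviations of y around its sample mean."""
    y_bar = sum(y) / len(y)
    return sum((yi - y_bar) ** 2 for yi in y)

tss = total_sum_of_squares([2, 7, 4, 10])
print(tss)  # 36.75
```

The sample mean is 5.75, so the squared deviations are 14.0625, 1.5625, 3.0625, and 18.0625.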




The residual sum of squares
 The residual sum of squares measures what portion of the total variation in
 Y around its mean (in the sample) is not explained by the regression model
 $RSS = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat Y_i)^2$




OLS and the sum of squares
 For an OLS regression,
 $TSS = ESS + RSS$
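The decomposition can be verified on the Weight-Height sample used earlier in the chapter. The sketch below fits the line with the closed-form OLS formulas and then computes the three sums of squares; note that the identity relies on the regression including an intercept, which is the case here.

```python
# Weight-Height sample from the earlier exercise.
x = [9, 12, 15]
y = [165, 180, 190]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS estimates for the single-regressor model.
b1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
b0_hat = y_bar - b1_hat * x_bar
y_hat = [b0_hat + b1_hat * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)
ess = sum((fi - y_bar) ** 2 for fi in y_hat)
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
print(round(tss, 2), round(ess, 2), round(rss, 2))
```

Up to rounding, the first printed number equals the sum of the other two.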




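Graphics of TSS, ESS, and RSS
 [Figure: Y plotted against X with the fitted regression line and a
 horizontal line at Ȳ. For the observation at X8, the vertical distance from
 Y8 to Ȳ is labeled TSS, the distance from Ŷ8 to Ȳ is labeled ESS, and the
 distance from Y8 to Ŷ8 is labeled RSS.]




Goodness of fit
 R², or the coefficient of determination, shows the explained portion of the
 total variation in Y:
 $R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$
 R² is always a number between 0 and 1
 It is a summary measure of how well the regression function approximates the
 observations in the sample (a measure of ‘goodness of fit’)
 It cannot decrease when an additional variable is introduced in the model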
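The claim that R² cannot decrease when a variable is added can be illustrated by adding a regressor that is pure noise. The simulated data, the seed, and the helper `r_squared` are assumptions of this sketch; both regressions include an intercept, which is what keeps R² between 0 and 1.

```python
import numpy as np

def r_squared(design, y):
    """R-squared = 1 - RSS/TSS for an OLS fit of y on the design columns."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ coef) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return float(1.0 - rss / tss)

# Simulated sample (hypothetical): y depends on x1 only.
rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
irrelevant = rng.normal(size=n)   # unrelated to y by construction
y = 3.0 + 1.5 * x1 + rng.normal(size=n)
ones = np.ones(n)

r2_small = r_squared(np.column_stack([ones, x1]), y)
r2_large = r_squared(np.column_stack([ones, x1, irrelevant]), y)
print(r2_small, r2_large)
```

The second value is at least as large as the first even though the added variable has no connection to Y, which is one reason a high R² on its own says little about model quality.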








Exercise
 For the data in the table below, calculate R² if the estimated regression
 equation is
 $\hat Y_i = 94.2 + 8.12 X_i$

    Yi     Xi
   140      5
   157      9
   205     13

 Draw a graph to show the actual points, the regression line, the residuals,
 and the Y-bar line.




Adjusted R²
 $\bar R^2 = 1 - \frac{\sum e_i^2 / (N - K - 1)}{\sum (Y_i - \bar Y)^2 / (N - 1)}$
 N−K−1 = the degrees of freedom of the model
 Adjusted R² is a better measure of goodness of fit because it takes into
 account the trade-off between the costs and benefits of introducing an
 additional variable
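A minimal sketch of the R² exercise, extended to the adjusted R² formula, is given below. It uses the estimated equation and the three observations from the exercise, with N = 3 and K = 1; the variable names are illustrative.

```python
# Data and estimated equation from the exercise.
y = [140, 157, 205]
x = [5, 9, 13]
y_hat = [94.2 + 8.12 * xi for xi in x]

n = len(y)   # N: number of observations
k = 1        # K: number of independent variables

y_bar = sum(y) / n
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
tss = sum((yi - y_bar) ** 2 for yi in y)

r2 = 1 - rss / tss
adj_r2 = 1 - (rss / (n - k - 1)) / (tss / (n - 1))
print(f"{r2:.3f} {adj_r2:.3f}")  # roughly 0.930 and 0.859
```

With only one degree of freedom left, the adjustment is large, which shows how strongly adjusted R² penalizes additional coefficients in a very small sample.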




A regression model is good if …
 It is based on theory
 Estimation fits the data relatively well (!!)
 The sample is sufficiently large and reliable
 The signs in the estimated coefficients are those expected
 There are no relevant variables left out
 The functional form is appropriate




Misuse of the R²
 R² is only one criterion of judging a model
 The other criteria are at least as important
 High R² does not necessarily indicate a good model
 Low R² could be acceptable, if theoretical explanation is provided




Example (S2.5)
 Estimate the demand for water in the LA county in a given year
 W = amount of water (millions of gallons)
 PR = price
 P = Population
 RF = Rainfall
 DF = degrees of freedom of the regression equation




Example (continued)
 Compare the following two equations. Which of them would you prefer?
 $W = 24000 + 48000\,PR + .4\,P - 370\,RF$
  Adj. R² = .847, DF = 25
 $W = 30000 + .62\,P - 400\,RF$
  Adj. R² = .847, DF = 26





								