regression

							               Regression Analysis
• Linear Regression Model
      –   Method of OLS
      –   Properties of OLS Estimators
      –   Goodness-of-Fit
      –   Inference
• Multiple Regression Model
      – Estimation
      – Goodness-of-Fit
      – Inference

Sisir Sarma                18.318: Introduction to Econometrics
     The Simple Regression Model
• Economic Model: y = b0 + b1x
• Examples: Consumption Function, Savings
  Function, Demand Function, Supply Function, etc.
• The parameters of interest in this model are
  b0 and b1, which we wish to estimate.
• A simple regression model can be written as
                    y = b0 + b1 x + ε


              Some Terminology
In the simple linear regression model, where y =
b0 + b1x + ε, we typically refer to y as the

 •   Dependent Variable, or
 •   Left-Hand Side Variable, or
 •   Explained Variable, or
 •   Regressand

              Some Terminology (cont.)
 In the simple linear regression of y on x, we
 typically refer to x as the
  • Independent Variable, or
  • Right-Hand Side Variable, or
  • Explanatory Variable, or
  • Regressor, or
  • Covariate, or
  • Control Variable
                A Simple Assumption
The average value of ε, the error term, in the
 population is 0. That is,

                      E(ε) = 0

This is not a restrictive assumption, since we can
 always use b0 to normalize E(ε) to 0.


               Zero Conditional Mean
• We need to make a crucial assumption about
  how ε and x are related
• We want it to be the case that knowing
  something about x does not give us any
  information about ε, so that they are
  completely unrelated. That is,
• E(ε|x) = E(ε) = 0, which implies
• E(y|x) = b0 + b1x

                Ordinary Least Squares
• The basic idea of regression is to estimate the
  population parameters from a sample.
• Let {(xi,yi): i = 1, …, n} denote a random
  sample of size n from the population.
• For each observation in this sample, it will be
  the case that
                  yi = b0 + b1xi + εi
This is the econometric model.
[Figure: Population regression line E(y|x) = b0 + b1x, sample data points (x1, y1), …, (x4, y4), and the associated error terms ε1, …, ε4]
                Deriving OLS Estimates
• To derive the OLS estimates we need to realize
  that our main assumption of E(ε|x) = E(ε) = 0
  also implies that

• Cov(x, ε) = E(xε) = 0

• Why? Remember from basic probability that
  Cov(X,Y) = E(XY) – E(X)E(Y). Since E(ε) = 0,
  Cov(x, ε) = E(xε), and by iterated expectations
  E(xε) = E[x E(ε|x)] = 0.
                Deriving OLS (cont.)
• We can write our 2 restrictions just in terms of
  x, y, b0 and b1, since ε = y – b0 – b1x

• E(y – b0 – b1x) = 0
• E[x(y – b0 – b1x)] = 0

• These are called moment restrictions

         Deriving OLS using M.O.M.
• The method of moments approach to estimation
  implies imposing the population moment
  restrictions on the sample moments.

• What does this mean? Recall that for E(X), the
  mean of a population distribution, a sample
  estimator of E(X) is simply the arithmetic mean
  of the sample.

                More Derivation of OLS
• We want to choose values of the parameters
  that will ensure that the sample versions of our
  moment restrictions are true.
• The sample versions are as follows:

$$\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{b}_0 - \hat{b}_1 x_i\right) = 0$$

$$\frac{1}{n}\sum_{i=1}^{n} x_i\left(y_i - \hat{b}_0 - \hat{b}_1 x_i\right) = 0$$
                More Derivation of OLS
• Given the definition of a sample mean, and
  properties of summation, we can rewrite the
  first condition as follows

$$\bar{y} = \hat{b}_0 + \hat{b}_1\bar{x}, \quad\text{or}\quad \hat{b}_0 = \bar{y} - \hat{b}_1\bar{x}$$
         The OLS estimated slope is

$$\hat{b}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \quad\text{provided that } \sum_{i=1}^{n}(x_i - \bar{x})^2 > 0$$
   Summary of OLS slope estimate
• The slope estimate is the sample covariance between x
  and y divided by the sample variance of x. [Note: if
  you divide both the numerator and the denominator by
  (n-1), we get the sample covariance and the sample
  variance formulas, respectively].
• If x and y are positively correlated, the slope will be
  positive.
• If x and y are negatively correlated, the slope will be
  negative.
• We only need x to vary in our sample, so that the
  denominator is not zero.
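The slope and intercept formulas can be sketched in a few lines of Python. This is only an illustration, not part of the lecture: the function name `ols_simple` and the five data points are made up, and only the standard library is used.

```python
def ols_simple(x, y):
    """Return (b0_hat, b1_hat) for the simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of cross-deviations of x and y.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: sum of squared deviations of x (must be positive).
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must vary in the sample")
    b1_hat = s_xy / s_xx
    b0_hat = y_bar - b1_hat * x_bar  # the line passes through (x_bar, y_bar)
    return b0_hat, b1_hat


# Made-up illustrative data (an assumption, not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0_hat, b1_hat = ols_simple(x, y)
print(b0_hat, b1_hat)
```

Here x and y are positively correlated in the sample, so the estimated slope comes out positive, as the summary above states.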

                   More OLS
• Intuitively, OLS is fitting a line through the
  sample points such that the sum of squared
  residuals is as small as possible, hence the term
  least squares.
• The residual, ε̂, is an estimate of the error
  term, ε, and is the difference between the
  sample point and the fitted line (sample
  regression function).

[Figure: Sample regression line ŷ = b̂0 + b̂1x, sample data points (x1, y1), …, (x4, y4), and the associated estimated error terms (residuals) ε̂1, …, ε̂4]
   Alternate approach to Derivation
           (The Textbook)
• Given the intuitive idea of fitting a line, we can
  set up a formal minimization problem.
• That is, we want to choose our parameters such
  that we minimize the SSR:

$$\sum_{i=1}^{n}\hat{\varepsilon}_i^2 = \sum_{i=1}^{n}\left(y_i - \hat{b}_0 - \hat{b}_1 x_i\right)^2$$
                Alternate approach (cont.)
• If one uses calculus to solve the minimization
  problem for the two parameters you obtain the
  following first order conditions, which are the
  same as we obtained before, multiplied by n

$$\sum_{i=1}^{n}\left(y_i - \hat{b}_0 - \hat{b}_1 x_i\right) = 0$$

$$\sum_{i=1}^{n} x_i\left(y_i - \hat{b}_0 - \hat{b}_1 x_i\right) = 0$$
         Algebraic Properties of OLS
• The sum of the OLS residuals is zero.
• Thus, the sample average of the OLS residuals
  is zero as well.
• The sample covariance between the regressors
  and the OLS residuals is zero.
• The OLS regression line always goes through
  the mean of the sample.


     Algebraic Properties (precise)

$$\sum_{i=1}^{n}\hat{\varepsilon}_i = 0 \quad\text{and thus,}\quad \frac{\sum_{i=1}^{n}\hat{\varepsilon}_i}{n} = 0$$

$$\sum_{i=1}^{n} x_i\hat{\varepsilon}_i = 0$$

$$\bar{y} = \hat{b}_0 + \hat{b}_1\bar{x}$$
                    More terminology
We can think of each observation as being made
up of an explained part, and an unexplained part,
yi = ŷi + ε̂i. We then define the following:

$$TSS = \sum_{i=1}^{n}(y_i - \bar{y})^2 \quad\text{is the total sum of squares}$$

$$ESS = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 \quad\text{is the explained sum of squares}$$

$$SSR = \sum_{i=1}^{n}\hat{\varepsilon}_i^2 \quad\text{is the sum of squared residuals}$$

Then TSS = ESS + SSR, or ESS = TSS – SSR
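       Proof that TSS = ESS + SSR

$$\begin{aligned}
\sum (y_i - \bar{y})^2 &= \sum\left[(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})\right]^2 \\
&= \sum\left[\hat{\varepsilon}_i + (\hat{y}_i - \bar{y})\right]^2 \\
&= \sum\hat{\varepsilon}_i^2 + 2\sum\hat{\varepsilon}_i(\hat{y}_i - \bar{y}) + \sum(\hat{y}_i - \bar{y})^2 \\
&= SSR + 2\sum\hat{\varepsilon}_i(\hat{y}_i - \bar{y}) + ESS
\end{aligned}$$

and we know that

$$\sum\hat{\varepsilon}_i(\hat{y}_i - \bar{y}) = 0$$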
                Goodness-of-Fit
• How do we think about how well our sample
  regression line fits our sample data?
• Can compute the fraction of the total sum of
  squares (TSS) that is explained by the model,
  call this the R-squared of regression
• R2 = ESS/TSS = 1 – SSR/TSS
• Since SSR lies between 0 and TSS, R2 will
  always lie between 0 and 1.
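As a rough numerical check of the decomposition and of R2, the sketch below fits a simple regression and computes TSS, ESS and SSR. The function name `fit_and_decompose` and the data are illustrative assumptions, not from the lecture.

```python
def fit_and_decompose(x, y):
    """Fit y on x by OLS and return TSS, ESS, SSR, R-squared and residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1_hat = s_xy / s_xx
    b0_hat = y_bar - b1_hat * x_bar
    fitted = [b0_hat + b1_hat * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    tss = sum((yi - y_bar) ** 2 for yi in y)      # total sum of squares
    ess = sum((fi - y_bar) ** 2 for fi in fitted)  # explained sum of squares
    ssr = sum(e ** 2 for e in resid)               # sum of squared residuals
    return {"tss": tss, "ess": ess, "ssr": ssr, "r2": ess / tss, "resid": resid}


# Made-up illustrative data (an assumption, not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
out = fit_and_decompose(x, y)
print(out["tss"], out["ess"], out["ssr"], out["r2"])
```

Up to floating-point rounding, TSS equals ESS + SSR and the residuals sum to zero, which are the algebraic properties of OLS stated earlier.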

                Unbiasedness of OLS
• Assume the population model is linear in
  parameters as y = b0 + b1x + ε
• Assume we can use a random sample of size n,
  {(xi, yi): i = 1, 2, …, n}, from the population
  model. Thus we can write the sample model yi
  = b0 + b1xi + εi
• Assume E(ε|x) = 0 and thus E(εi|xi) = 0
• Assume there is variation in the xi

         Unbiasedness of OLS (cont.)
• In order to think about unbiasedness, we need
  to rewrite our estimator in terms of the
  population parameter.
• Start with a simple rewrite of the formula as

$$\hat{b}_1 = \frac{\sum (x_i - \bar{x})\,y_i}{\sum (x_i - \bar{x})^2}$$
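       Unbiasedness of OLS (cont.)
Substituting yi = b0 + b1xi + εi, and using Σ(xi – x̄) = 0
and Σ(xi – x̄)xi = Σ(xi – x̄)², gives

$$\hat{b}_1 = b_1 + \frac{\sum (x_i - \bar{x})\,\varepsilon_i}{\sum (x_i - \bar{x})^2}$$

Conditional on the xi, each εi has mean zero, so the
second term has expectation zero. So,

$$E(\hat{b}_1) = b_1$$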
                Unbiasedness Summary
• The OLS estimates of b1 and b0 are unbiased
• Proof of unbiasedness depends on our 4
  assumptions – if any assumption fails, then
  OLS is not necessarily unbiased
• Remember unbiasedness is a description of the
  estimator – in a given sample we may be “near”
  or “far” from the true parameter
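The distinction between the estimator and a single estimate can be illustrated with a small Monte Carlo sketch. Everything here is an assumption made for illustration: the true parameters (b0 = 1, b1 = 2), the error standard deviation, the fixed x values, the number of replications and the seed.

```python
import random


def ols_slope(x, y):
    """OLS slope estimate for the simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


def simulate_slopes(b0, b1, sigma, x, reps, seed):
    """Draw `reps` samples from y = b0 + b1*x + error and estimate the slope in each."""
    rng = random.Random(seed)  # seeded generator so the run is reproducible
    slopes = []
    for _ in range(reps):
        # Errors are independent normal draws with mean zero, so E(error | x) = 0.
        y = [b0 + b1 * xi + rng.gauss(0.0, sigma) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes


x = [i / 10 for i in range(50)]  # fixed regressor values with variation
slopes = simulate_slopes(b0=1.0, b1=2.0, sigma=1.0, x=x, reps=2000, seed=0)
mean_slope = sum(slopes) / len(slopes)
print(mean_slope, min(slopes), max(slopes))
```

Individual estimates land on both sides of the true slope, while their average across replications is close to it, which is what unbiasedness describes.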


    Variance of the OLS Estimators
• Now we know that the sampling distribution of
  our estimate is centered around the true
  parameter.
• Want to think about how spread out this
  distribution is.
• Much easier to think about this variance under
  an additional assumption, so
• Assume Var(ε|x) = s² (Homoskedasticity)
                Variance of OLS (cont.)
• Var(ε|x) = E(ε²|x) – [E(ε|x)]²
• E(ε|x) = 0, so s² = E(ε²|x) = E(ε²) = Var(ε)
• Thus s² is also the unconditional variance,
  called the error variance.
• s, the square root of the error variance, is called
  the standard deviation of the error.
• Can say: E(y|x) = b0 + b1x and Var(y|x) = s².


[Figure: Homoskedastic Case. Conditional densities f(y|x) at x1 and x2, centered on the line E(y|x) = b0 + b1x]
[Figure: Heteroskedastic Case. Conditional densities f(y|x) at x1, x2 and x3, centered on the line E(y|x) = b0 + b1x]
                 Variance of OLS

$$Var(\hat{b}_1) = \frac{s^2}{\sum (x_i - \bar{x})^2}$$

$$Var(\hat{b}_0) = \frac{s^2\, n^{-1}\sum x_i^2}{\sum (x_i - \bar{x})^2}$$
           Variance of OLS Summary
• The larger the error variance, s², the larger the
  variance of the slope estimate, for a given Σ(xi – x̄)².
• The larger the variability in the xi, the smaller
  the variance of the slope estimate.
• As a result, a larger sample size should
  decrease the variance of the slope estimate.
• The problem is that the error variance is unknown.
       Estimating the Error Variance
• We don’t know what the error variance, s², is,
  because we don’t observe the errors, εi.

• What we observe are the residuals, ε̂i.
• We can use the residuals to form an estimate of
  the error variance.
   Error Variance Estimate (cont.)

   An unbiased estimator of s² is

$$\hat{s}^2 = \frac{1}{n-2}\sum \hat{\varepsilon}_i^2 = \frac{SSR}{n-2}$$
   Error Variance Estimate (cont.)

$$\hat{s} = \sqrt{\hat{s}^2} = \text{Standard error of the regression}$$

recall that

$$sd(\hat{b}_1) = \frac{s}{\left[\sum (x_i - \bar{x})^2\right]^{1/2}}$$

if we substitute ŝ for s then we have
the standard error of b̂1,

$$se(\hat{b}_1) = \frac{\hat{s}}{\left[\sum (x_i - \bar{x})^2\right]^{1/2}}$$
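A minimal sketch of these two steps, estimating the error variance from the residuals and then the standard error of the slope, is given below. The function name `slope_standard_error` and the data are illustrative assumptions.

```python
import math


def slope_standard_error(x, y):
    """Return (b1_hat, s2_hat, se_b1) for the simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0_hat = y_bar - b1_hat * x_bar
    # Residuals stand in for the unobserved errors.
    ssr = sum((yi - b0_hat - b1_hat * xi) ** 2 for xi, yi in zip(x, y))
    s2_hat = ssr / (n - 2)                  # two parameters estimated, so n - 2
    se_b1 = math.sqrt(s2_hat) / math.sqrt(s_xx)
    return b1_hat, s2_hat, se_b1


# Made-up illustrative data (an assumption, not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b1_hat, s2_hat, se_b1 = slope_standard_error(x, y)
print(b1_hat, s2_hat, se_b1)
```

For these data SSR is 2.4 with n = 5, so the error variance estimate is 2.4/3 = 0.8.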
       Multiple Regression Analysis


              y = b0 + b1x1 + b2x2 + . . . + bkxk + ε

                          Estimation
 Parallels with Simple Regression
•    b0 is still the intercept
•    b1 to bk are all called slope parameters
•    ε is still the error term
• Still need to make a zero conditional mean
  assumption, so now assume that
• E(ε|x1,x2, …,xk) = 0
• Still minimizing the sum of squared
  residuals, so have k+1 first order conditions
 Interpreting Multiple Regression

$$\hat{y} = \hat{b}_0 + \hat{b}_1 x_1 + \hat{b}_2 x_2 + \dots + \hat{b}_k x_k, \text{ so}$$

$$\Delta\hat{y} = \hat{b}_1\Delta x_1 + \hat{b}_2\Delta x_2 + \dots + \hat{b}_k\Delta x_k,$$

 so holding x2,...,xk fixed implies that

$$\Delta\hat{y} = \hat{b}_1\Delta x_1,$$

 that is, each b has
   a ceteris paribus interpretation
 Simple vs Multiple Reg Estimate
Compare the simple regression

$$\tilde{y} = \tilde{b}_0 + \tilde{b}_1 x_1$$

with the multiple regression

$$\hat{y} = \hat{b}_0 + \hat{b}_1 x_1 + \hat{b}_2 x_2$$

Generally, b̃1 ≠ b̂1 unless:
b̂2 = 0 (i.e. no partial effect of x2) OR
x1 and x2 are uncorrelated in the sample
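The comparison can be checked numerically. The sketch below uses the closed-form solution of the two-regressor normal equations written in deviations from means, and also verifies the standard identity b̃1 = b̂1 + b̂2·d1, where d1 is the slope from regressing x2 on x1. That identity is not stated on the slide but explains both exceptions. Function names and data are made-up assumptions.

```python
def compare_estimates(x1, x2, y):
    """Return (b1_tilde, b1_hat, b2_hat, d1) for simple vs two-regressor OLS."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1s = [v - m1 for v in x1]   # deviations of x1 from its mean
    d2s = [v - m2 for v in x2]   # deviations of x2 from its mean
    dys = [v - my for v in y]    # deviations of y from its mean
    s11 = sum(a * a for a in d1s)
    s22 = sum(b * b for b in d2s)
    s12 = sum(a * b for a, b in zip(d1s, d2s))
    s1y = sum(a * c for a, c in zip(d1s, dys))
    s2y = sum(b * c for b, c in zip(d2s, dys))
    # Simple regression of y on x1 only.
    b1_tilde = s1y / s11
    # Multiple regression: solve the 2x2 normal equations in deviations.
    det = s11 * s22 - s12 ** 2
    b1_hat = (s1y * s22 - s2y * s12) / det
    b2_hat = (s2y * s11 - s1y * s12) / det
    # Slope from regressing x2 on x1 (zero if uncorrelated in the sample).
    d1 = s12 / s11
    return b1_tilde, b1_hat, b2_hat, d1


# Made-up illustrative data in which x1 and x2 are correlated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.0, 4.0, 8.0, 7.0, 12.0, 11.0]
b1_tilde, b1_hat, b2_hat, d1 = compare_estimates(x1, x2, y)
print(b1_tilde, b1_hat, b2_hat, d1)
```

Because x1 and x2 are correlated here and b̂2 is not zero, the simple and multiple regression slopes on x1 differ.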
                   Goodness-of-Fit
We can think of each observation as being made
up of an explained part, and an unexplained part,
yi = ŷi + ε̂i. We then define the following:

$$TSS = \sum (y_i - \bar{y})^2 \quad\text{is the total sum of squares}$$

$$ESS = \sum (\hat{y}_i - \bar{y})^2 \quad\text{is the explained sum of squares}$$

$$SSR = \sum \hat{\varepsilon}_i^2 \quad\text{is the sum of squared residuals}$$

Then TSS = ESS + SSR
              Goodness-of-Fit (cont.)
  • How do we think about how well our
      sample regression line fits our sample data?

  • Can compute the fraction of the total sum
      of squares (TSS) that is explained by the
      model, call this the R-squared of regression

  • R2 = ESS/TSS = 1 – SSR/TSS
              More about R-squared
• R2 can never decrease when another
  independent variable is added to a
  regression, and usually will increase

• Because R2 will usually increase with the
  number of independent variables, it is not a
  good way to compare models

              Adjusted R-Squared
• Recall that the R2 never decreases as more
  variables are added to the model
• The adjusted R2 takes into account the number of
  variables in a model, and may decrease

$$\bar{R}^2 = 1 - \frac{SSR/(n-k-1)}{TSS/(n-1)} = 1 - \frac{\hat{s}^2}{TSS/(n-1)}$$
         Adjusted R-Squared (cont.)
• Most packages will give you both R2 and
  adj-R2
• You can compare the fit of 2 models (with
  the same y) by comparing the adj-R2
• You cannot use the adj-R2 to compare
  models with different y’s (e.g. y vs. ln(y))
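The contrast between R2 and adj-R2 can be sketched as follows. The SSR, TSS, n and k values are hypothetical numbers chosen so that adding a regressor lowers SSR only slightly.

```python
def r_squared(ssr, tss):
    """R-squared = 1 - SSR/TSS."""
    return 1.0 - ssr / tss


def adjusted_r_squared(ssr, tss, n, k):
    """Adjusted R-squared = 1 - [SSR/(n-k-1)] / [TSS/(n-1)]."""
    return 1.0 - (ssr / (n - k - 1)) / (tss / (n - 1))


# Hypothetical numbers: same y (same TSS and n), model B adds one regressor.
tss, n = 100.0, 30
r2_a = r_squared(40.0, tss)
r2_b = r_squared(39.5, tss)
adj_a = adjusted_r_squared(40.0, tss, n, k=2)
adj_b = adjusted_r_squared(39.5, tss, n, k=3)
print(r2_a, r2_b, adj_a, adj_b)
```

In this made-up case R2 rises from model A to model B while adj-R2 falls, because the small drop in SSR does not offset the lost degree of freedom.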


              Goodness of Fit
• Important not to fixate too much on adj-R2 and
  lose sight of theory and common sense
• If economic theory clearly predicts a variable
  belongs, generally leave it in
• Don’t want to include a variable that prohibits a
  sensible interpretation of the variable of interest
• Remember ceteris paribus interpretation of
  multiple regression

Classical Linear Model: Inference
• The 4 assumptions for unbiasedness, plus the
  homoskedasticity assumption, are known as the
  Gauss-Markov assumptions.
• If the Gauss-Markov assumptions hold, OLS is
  BLUE (the best linear unbiased estimator).
• In order to do classical hypothesis testing, we
  need to add another assumption (beyond the
  Gauss-Markov assumptions).
• Assume that ε is independent of x1, x2,…, xk and ε
  is normally distributed with zero mean and
  variance s²: ε ~ iid N(0, s²)
              CLM Assumptions (cont.)
• Under CLM, OLS is not only BLUE, but is
  the minimum variance unbiased estimator.
• We can summarize the population
  assumptions of CLM as follows
• y|x ~ Normal(b0 + b1x1 +…+ bkxk, s2)
• While for now we just assume normality, it is
  clear that this is sometimes not the case.
• Large samples will let us drop normality.
[Figure: The homoskedastic normal distribution with a single explanatory variable. Normal conditional densities f(y|x) at x1 and x2, centered on the line E(y|x) = b0 + b1x]
    Normal Sampling Distributions
 Under the CLM assumptions, conditional on
 the sample values of the independent variables

$$\hat{b}_j \sim \text{Normal}\left[b_j, Var(\hat{b}_j)\right], \text{ so that}$$

$$\frac{\hat{b}_j - b_j}{sd(\hat{b}_j)} \sim \text{Normal}(0, 1)$$

 b̂j is distributed normally because it
 is a linear combination of the errors
                   The t Test
  Under the CLM assumptions

$$\frac{\hat{b}_j - b_j}{se(\hat{b}_j)} \sim t_{n-k-1}$$

  Note this is a t distribution (vs normal)
  because we have to estimate s² by ŝ²

  Note the degrees of freedom: n – k – 1
              The t Test (cont.)
• Knowing the sampling distribution for the
  standardized estimator allows us to carry out
  hypothesis tests
• Start with a null hypothesis
• For example, H0: bj = 0
• If we fail to reject the null, we conclude that xj has
  no statistically significant effect on y, controlling
  for the other x’s.
              The t Test (cont.)
To perform our test we first need to form
"the" t statistic for b̂j:

$$t_{\hat{b}_j} \equiv \frac{\hat{b}_j}{se(\hat{b}_j)}$$

We will then use our t statistic along with
a rejection rule to determine whether to
reject the null hypothesis, H0
     t Test: One-Sided Alternatives
• Besides our null, H0, we need an alternative
  hypothesis, HA, and a significance level
• HA may be one-sided, or two-sided
• HA: bj > 0 and HA: bj < 0 are one-sided
• HA: bj ≠ 0 is a two-sided alternative.
• If we want to have only a 5% probability of
  rejecting H0 if it is really true, then we say
  our significance level is 5%.
    One-Sided Alternatives (cont.)
• Having picked a significance level, a, we
  look up the (1 – a)th percentile in a t
  distribution with n – k – 1 df and call this c,
  the critical value.
• We can reject the null hypothesis if the t
  statistic is greater than the critical value.
• If the t statistic is less than the critical value
  then we fail to reject the null.

One-Sided Alternatives (cont.)
[Figure: t density for yi = b0 + b1xi1 + … + bkxik + εi with H0: bj = 0 and HA: bj > 0. Fail to reject to the left of the critical value c (area 1 – a); reject to the right of c (area a)]
              One-sided vs Two-sided
• Because the t distribution is symmetric, testing
  HA: bj < 0 is straightforward. The critical value is
  just the negative of before
• We can reject the null if the t statistic < –c, and if
  the t statistic > –c then we fail to reject the
  null
• For a two-sided test, we set the critical value
  based on a/2 and reject H0: bj = 0 in favor of
  HA: bj ≠ 0 if the absolute value of the t statistic > c
         Two-Sided Alternatives
[Figure: t density for yi = b0 + b1xi1 + … + bkxik + εi with H0: bj = 0 and HA: bj ≠ 0. Fail to reject between –c and c (area 1 – a); reject to the left of –c and to the right of c (area a/2 in each tail)]
              Summary for H0: bj = 0
• Unless otherwise stated, the alternative is
  assumed to be two-sided
• If we reject the null, we typically say “xj is
  statistically significant at the a % level”
• If we fail to reject the null, we typically say
  “xj is statistically insignificant at the a %
  level”

              Testing other hypotheses
• A more general form of the t statistic
  recognizes that we may want to test
  something like H0: bj = aj
• In this case, the appropriate t statistic is

$$t = \frac{\hat{b}_j - a_j}{se(\hat{b}_j)}, \text{ where } a_j = 0 \text{ for the standard test}$$
              Confidence Intervals
• Another way to use classical statistical testing is
  to construct a confidence interval using the same
  critical value as was used for a two-sided test
• A 100(1 – a)% confidence interval is defined as

$$\hat{b}_j \pm c \cdot se(\hat{b}_j), \text{ where } c \text{ is the } \left(1 - \frac{a}{2}\right) \text{ percentile in a } t_{n-k-1} \text{ distribution}$$
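The t statistic, the two-sided rejection rule and the confidence interval fit together as in the sketch below. The estimate, standard error and degrees of freedom are hypothetical, and the critical value c = 2.000 is the approximate 97.5th percentile of a t distribution with 60 degrees of freedom, taken from a standard t table and supplied by hand rather than computed.

```python
def t_statistic(b_hat, se, a_j=0.0):
    """t statistic for H0: b_j = a_j (a_j = 0 gives the standard test)."""
    return (b_hat - a_j) / se


def reject_two_sided(t, c):
    """Two-sided rejection rule: reject H0 if |t| > c."""
    return abs(t) > c


def confidence_interval(b_hat, se, c):
    """Interval b_hat +/- c * se."""
    return b_hat - c * se, b_hat + c * se


# Hypothetical estimate and standard error; c is an assumed table value
# for a = 0.05 and n - k - 1 = 60 degrees of freedom.
b_hat, se, c = 0.5, 0.2, 2.000
t = t_statistic(b_hat, se)
reject = reject_two_sided(t, c)
low, high = confidence_interval(b_hat, se, c)
print(t, reject, (low, high))
```

The interval excludes zero exactly when the two-sided test rejects H0: bj = 0 at the same level, since both use the same critical value.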
     Computing p-values for t tests
• An alternative to the classical approach is
  to ask, “what is the smallest significance
  level at which the null would be rejected?”
• So, compute the t statistic, and then look up
  what percentile it is in the appropriate t
  distribution; the tail probability beyond it is
  the p-value
• The p-value is the probability, if the null were
  true, of observing a t statistic at least as extreme
  as the one we did
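A rough sketch of a two-sided p-value is shown below. Note the swap: it uses the standard normal distribution in place of the t distribution, which is only a large-sample approximation. Exact t-based p-values need the t distribution's CDF, which the Python standard library does not provide. The function names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def two_sided_p_value_normal(t):
    """Approximate two-sided p-value: P(|Z| >= |t|) for standard normal Z.

    Assumption: the degrees of freedom are large enough that the t
    distribution is close to the standard normal.
    """
    return 2.0 * (1.0 - normal_cdf(abs(t)))


p_at_196 = two_sided_p_value_normal(1.96)
p_at_zero = two_sided_p_value_normal(0.0)
print(p_at_196, p_at_zero)
```

A t statistic of 1.96 gives a p-value of about 0.05 under this approximation, so the null is rejected at any significance level above that and not at any level below it.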
     Testing a Linear Combination
• Suppose instead of testing whether b1 is equal to a
  constant, you want to test if it is equal to another
  parameter, that is H0 : b1 = b2
• Use same basic procedure for forming a t statistic

$$t = \frac{\hat{b}_1 - \hat{b}_2}{se(\hat{b}_1 - \hat{b}_2)}$$
       Testing Linear Comb. (cont.)
Since

$$se(\hat{b}_1 - \hat{b}_2) = \sqrt{Var(\hat{b}_1 - \hat{b}_2)}, \text{ then}$$

$$Var(\hat{b}_1 - \hat{b}_2) = Var(\hat{b}_1) + Var(\hat{b}_2) - 2\,Cov(\hat{b}_1, \hat{b}_2)$$

$$se(\hat{b}_1 - \hat{b}_2) = \left\{\left[se(\hat{b}_1)\right]^2 + \left[se(\hat{b}_2)\right]^2 - 2 s_{12}\right\}^{1/2}$$

where s12 is an estimate of Cov(b̂1, b̂2)
    Testing a Linear Comb. (cont.)
• So, to use formula, need s12, which standard
  output does not have
• Many packages will have an option to get it, or
  will just perform the test for you
• In EViews, after ls y c x1 x2 … xk, in the window
  with the regression results, select View then
  Coefficient Tests, then Wald tests to do a Wald
  test of the hypothesis that b1 = b2 (type c(2) = c(3))
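If the estimated covariance s12 is available, the formula can be applied by hand, as in this sketch. The coefficient estimates, standard errors and covariance are hypothetical numbers, and the function names are illustrative.

```python
import math


def se_of_difference(se1, se2, cov12):
    """Standard error of b1_hat - b2_hat from the two se's and their covariance."""
    var_diff = se1 ** 2 + se2 ** 2 - 2.0 * cov12
    if var_diff < 0:
        raise ValueError("inconsistent inputs: implied variance is negative")
    return math.sqrt(var_diff)


def t_for_equality(b1_hat, b2_hat, se1, se2, cov12):
    """t statistic for H0: b1 = b2."""
    return (b1_hat - b2_hat) / se_of_difference(se1, se2, cov12)


# Hypothetical regression output (assumed values, not from the slides).
se_diff = se_of_difference(se1=0.3, se2=0.4, cov12=0.045)
t_stat = t_for_equality(b1_hat=0.8, b2_hat=0.5, se1=0.3, se2=0.4, cov12=0.045)
print(se_diff, t_stat)
```

Ignoring the covariance term would give a standard error of 0.5 instead of 0.4 here, which is why s12 is needed.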
                  Example:
  • Suppose you are interested in the effect of
    campaign expenditures on outcomes
  • Model is voteA = b0 + b1log(expendA) +
    b2log(expendB) + b3prtystrA + ε
  • H0: b1 = – b2, or H0: q1 = b1 + b2 = 0
  • b1 = q1 – b2, so substitute in and rearrange
     voteA = b0 + q1log(expendA) +
    b2[log(expendB) – log(expendA)] + b3prtystrA + ε
              Example (cont.)
• This is the same model as originally, but
  now you get a standard error for b1 + b2 = q1
  directly from the basic regression
• Any linear combination of parameters
  could be tested in a similar manner
• Other examples of hypotheses about a
  single linear combination of parameters: b1
  = 1 + b2 ; b1 = 5b2 ; b1 = -1/2b2 ; etc

        Multiple Linear Restrictions
• Everything we’ve done so far has involved
  testing a single linear restriction, (e.g. b1 = 0
  or b1 = b2 )
• However, we may want to jointly test
  multiple hypotheses about our parameters
• A typical example is testing “exclusion
  restrictions” – we want to know if a group
  of parameters are all equal to zero

    Testing Exclusion Restrictions
• Now the null hypothesis might be
  something like H0: bk-r+1 = 0, ... , bk = 0
• The alternative is just HA: H0 is not true
• Can’t just check each t statistic separately,
  because we want to know if the r
  parameters are jointly significant at a given
  level – it is possible for none to be
  individually significant at that level

      Exclusion Restrictions (cont.)
• To do the test we need to estimate the “restricted
  model” without xk-r+1, …, xk included, as well as
  the “unrestricted model” with all x’s included
• Intuitively, we want to know if the change in SSR
  is big enough to warrant inclusion of xk-r+1, …, xk

$$F \equiv \frac{(SSR_R - SSR_{UR})/r}{SSR_{UR}/(n-k-1)},$$

where the subscript R refers to restricted and
the subscript UR refers to unrestricted
              The F statistic
• The F statistic is never negative, since the
  SSR from the restricted model can’t be less
  than the SSR from the unrestricted
• Essentially the F statistic is measuring the
  relative increase in SSR when moving from
  the unrestricted to restricted model
• r = number of restrictions, or dfR – dfUR
• n – k – 1 = dfUR
              The F statistic (cont.)
• To decide if the increase in SSR when we
  move to a restricted model is “big enough”
  to reject the exclusions, we need to know
  about the sampling distribution of our F stat
• Not surprisingly, F ~ Fr,n-k-1, where r is
  referred to as the numerator degrees of
  freedom and n – k – 1 as the denominator
  degrees of freedom
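The SSR form of the F statistic is simple to compute once both models have been estimated. In this sketch the SSR values, sample size and number of regressors are hypothetical.

```python
def f_statistic(ssr_r, ssr_ur, r, n, k):
    """F = [(SSR_R - SSR_UR) / r] / [SSR_UR / (n - k - 1)].

    r is the number of restrictions (numerator df); k is the number of
    regressors in the unrestricted model, so n - k - 1 is the denominator df.
    """
    if ssr_r < ssr_ur:
        raise ValueError("restricted SSR cannot be below unrestricted SSR")
    return ((ssr_r - ssr_ur) / r) / (ssr_ur / (n - k - 1))


# Hypothetical values: 2 exclusion restrictions, 55 observations,
# 4 regressors in the unrestricted model (so 50 denominator df).
f_stat = f_statistic(ssr_r=120.0, ssr_ur=100.0, r=2, n=55, k=4)
print(f_stat)
```

The resulting value would then be compared with the critical value c from an F distribution with (r, n – k – 1) degrees of freedom.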
               The F statistic (cont.)
[Figure: Density f(F) of the F distribution. Fail to reject to the left of the critical value c (area 1 – a); reject H0 at the a significance level if F > c (area a)]
       The    R2   form of the F statistic
• Because the SSR’s may be large and unwieldy, an
  alternative form of the formula is useful
• We use the fact that SSR = TSS(1 – R2) for any
  regression, so can substitute in for SSRR and
  SSRUR

$$F \equiv \frac{(R^2_{UR} - R^2_R)/r}{(1 - R^2_{UR})/(n-k-1)},$$

where again R is restricted and UR is unrestricted
                Overall Significance
• A special case of exclusion restrictions is to test
  H0: b1 = b2 =…= bk = 0
• Since the R2 from a model with only an intercept
  will be zero, the F statistic is simply

$$F = \frac{R^2/k}{(1 - R^2)/(n-k-1)}$$
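Both R2 forms can be sketched together, treating the overall significance test as the special case with restricted R2 equal to zero and r = k. The R2 values, n and k below are hypothetical.

```python
def f_from_r2(r2_ur, r2_r, r, n, k):
    """F = [(R2_UR - R2_R) / r] / [(1 - R2_UR) / (n - k - 1)]."""
    return ((r2_ur - r2_r) / r) / ((1.0 - r2_ur) / (n - k - 1))


def overall_f(r2, n, k):
    """Overall significance: all k slopes are zero, so R2_R = 0 and r = k."""
    return f_from_r2(r2, 0.0, k, n, k)


# Hypothetical values. With TSS = 200, SSR_R = 120 and SSR_UR = 100,
# the implied R-squareds are 0.4 (restricted) and 0.5 (unrestricted).
f_r2 = f_from_r2(r2_ur=0.5, r2_r=0.4, r=2, n=55, k=4)
f_ssr = ((120.0 - 100.0) / 2) / (100.0 / (55 - 4 - 1))  # SSR form, same inputs
f_all = overall_f(r2=0.5, n=55, k=4)
print(f_r2, f_ssr, f_all)
```

The R2 form and the SSR form agree up to rounding because both models share the same dependent variable and therefore the same TSS.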
         General Linear Restrictions
• The basic form of the F statistic will work
  for any set of linear restrictions
• First estimate the unrestricted model and
  then estimate the restricted model
• In each case, make note of the SSR
• Imposing the restrictions can be tricky –
  will likely have to redefine variables again

              F Statistic Summary
• Just as with t statistics, p-values can be
  calculated by looking up the percentile in
  the appropriate F distribution
• If only one exclusion is being tested, then F
  = t², and the p-values will be the same
• You can use a Wald test to test multivariate
  hypotheses as well
