# ECONOMETRICS


Lecture: Applying Gauss–Markov Modeling
Regression with One Explanator

(Chapter 3.1–3.5, 3.7
Chapter 4.1–4.4)

5-1
Agenda

• Finding a good estimator for a
straight line through the origin:
Chapter 3.1–3.5, 3.7

• Finding a good estimator for a straight
line with an intercept: Chapter 4.1–4.4

5-2
Where Are We? (Example)

• We wish to uncover quantitative features
of an underlying process, such as the
relationship between family income and
financial aid.
• More precisely: how much less aid will I receive, on average, for each additional dollar of family income?
• DATA: a sample of the process, for example observations on 10,000 students’ aid awards and family incomes.

5-3

• Other factors (e), such as number of siblings,
influence any individual student’s aid, so we
cannot directly observe the relationship
between income and aid.

• We need a rule for making a good guess
about the relationship between income and
financial aid, based on the data.

5-4
Guess

• A good guess is a guess which is right
on average.

• We also desire a guess which will have
a low variance around the true value.

5-5

• Our rule is called an “estimator.”
• We started by brainstorming a number
of estimators and then comparing their
performances in a series of computer
simulations.
• We found that the Ordinary Least Squares
estimator dominated the other estimators.
• Why is Ordinary Least Squares so good?

5-6

• To make more general statements, we
need to move beyond the computer and
into the world of mathematics.

• Last time, we reviewed a number of
mathematical tools: summations,
descriptive statistics, expectations,
variances, and covariances.

5-7
DGP

• As a starting place, we need to write down all
our assumptions about the way the
underlying process works, and about how that
process led to our data.
• These assumptions are called the “Data
Generating Process.”
• Then we can derive estimators that have
good properties for the Data Generating
Process we have assumed.

5-8
Model

• The DGP is a model to approximate
reality. We trade off realism to gain
parsimony and tractability.

• Models are to be used, not believed.

5-9
DGP assumptions

• Much of this course focuses on different
types of DGP assumptions that you can
make, giving you many options as you
model the process you are studying.

5-10
Two Ways to Screw Up in Econometrics

– Your Data Generating Process assumptions
missed a fundamental aspect of reality (your DGP
is not a useful approximation); or
– You chose a poor estimator for the DGP you assumed.

• Today we focus on picking a good estimator

5-11
The Gauss–Markov Theorem (GMT)

• Today, we will focus on deriving the
properties of an estimator for a simple
DGP: the Gauss–Markov Assumptions.
• First we will find the expectations and
variances of any linear estimator under
the DGP.
• Then we will derive the Best Linear
Unbiased Estimator (BLUE).

5-12
Our Baseline DGP: Gauss–Markov
(Chapter 3)

• Y = bX +e
• E(ei ) = 0
• Var(ei ) = s 2
• Cov(ei ,ej ) = 0, for i ≠ j
• X ’s fixed across samples (so we can
treat them like constants).
• We want to estimate b
5-13
A Strategy for Inference

•   The DGP tells us the assumed relationships
between the data we observe and the
underlying process of interest.

•   Using the assumptions of the DGP and the
algebra of expectations, variances, and
covariances, we can derive key properties
of our estimators, and search for estimators
with desirable properties.

5-14
An Example: bg1

Yᵢ = bXᵢ + eᵢ
E(eᵢ) = 0
Var(eᵢ) = s²
Cov(eᵢ, eⱼ) = 0, for i ≠ j
X's fixed across samples (so we can treat them as constants).

$$b_{g1} = \frac{1}{n}\sum_{i=1}^{n}\frac{Y_i}{X_i}$$

In our simulations, bg1 appeared to give estimates close to b.
Was this an accident, or does bg1 on average give us b?

5-15
An Example: bg1 (OK on average)
$$E(b_{g1}) = E\left(\frac{1}{n}\sum_{i=1}^{n}\frac{Y_i}{X_i}\right) = \frac{1}{n}\sum_{i=1}^{n}E\left(\frac{Y_i}{X_i}\right) = \frac{1}{n}\sum_{i=1}^{n}E\left(\frac{bX_i + e_i}{X_i}\right)$$

$$= \frac{1}{n}\sum_{i=1}^{n}E(b) + \frac{1}{n}\sum_{i=1}^{n}\frac{1}{X_i}E(e_i) = \frac{1}{n}\,nb + 0 = b$$

On average, bg1 equals b: E(bg1) = b.
Using the DGP and the algebra of expectations,
we conclude that bg1 is unbiased.
5-16
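As an illustrative aside (not part of the original slides), here is a minimal Monte Carlo sketch of the claim above: under the Gauss–Markov DGP, bg1 averages out to the true slope b. The slope, disturbance standard deviation, and X values below are made-up numbers chosen only for the demonstration.

```python
import numpy as np

# Minimal Monte Carlo sketch (illustrative values, not from the lecture):
# under Yi = b*Xi + ei with E(ei) = 0, the estimator
# b_g1 = (1/n) * sum(Yi / Xi) should average out to b.
rng = np.random.default_rng(0)
b_true, s = 0.7, 2.0                      # hypothetical slope and disturbance std. dev.
X = np.array([2.0, 4.0, 6.0, 8.0, 10.0])  # X's fixed across samples

estimates = []
for _ in range(100_000):
    e = rng.normal(0.0, s, size=X.size)   # E(e) = 0, Var(e) = s^2, independent draws
    Y = b_true * X + e
    estimates.append(np.mean(Y / X))      # b_g1 for this sample

print(np.mean(estimates))                 # close to 0.7: b_g1 is right on average
```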
Checking Understanding

$$E(b_{g1}) = E\left(\frac{1}{n}\sum_{i=1}^{n}\frac{Y_i}{X_i}\right) = \frac{1}{n}\sum_{i=1}^{n}E\left(\frac{Y_i}{X_i}\right) = \frac{1}{n}\sum_{i=1}^{n}E\left(\frac{bX_i + e_i}{X_i}\right)$$

$$= \frac{1}{n}\sum_{i=1}^{n}E(b) + \frac{1}{n}\sum_{i=1}^{n}\frac{1}{X_i}E(e_i) = \frac{1}{n}\,nb + 0 = b$$

$$E(b_{g1}) = b$$

Question: which DGP assumptions did we need to use?

5-17
Which Assumptions Did We Use?
$$E(b_{g1}) = E\left(\frac{1}{n}\sum_{i=1}^{n}\frac{Y_i}{X_i}\right) = \frac{1}{n}\sum_{i=1}^{n}E\left(\frac{Y_i}{X_i}\right) = \frac{1}{n}\sum_{i=1}^{n}E\left(\frac{bX_i + e_i}{X_i}\right)$$
Here we used Yᵢ = bXᵢ + eᵢ.

$$= \frac{1}{n}\sum_{i=1}^{n}E(b) + \frac{1}{n}\sum_{i=1}^{n}\frac{1}{X_i}E(e_i)$$
Here we used the assumption that the X's are fixed across samples.

$$= \frac{1}{n}\,nb + 0 = b$$
Here we used E(eᵢ) = 0.
5-18
Checking Point 2:

We did NOT use the assumptions about
the variance and covariances of e i .

We will use these assumptions when we
calculate the variance of the estimator.

5-19
Linear Estimators

•   bg1 is unbiased. Can we generalize?
• We will focus on linear estimators.
• Linear estimator: a weighted sum of the Y ’s.

$$\hat{b} = \sum w_i Y_i$$

5-20
Linear Estimators (weighted sum)

• Linear estimator:
$$\hat{b} = \sum w_i Y_i$$

• Example: bg1 is a linear estimator.
$$b_{g1} = \frac{1}{n}\sum\frac{Y_i}{X_i},\qquad w_i = \frac{1}{nX_i},\qquad b_{g1} = \sum w_i Y_i$$
5-21
A class of Linear Estimators
1) Mean of Ratios:
$$b_{g1} = \frac{1}{n}\sum\frac{Y_i}{X_i},\qquad w_i = \frac{1}{nX_i}$$

2) Ratio of Means:
$$b_{g2} = \frac{\sum Y_i}{\sum X_i},\qquad w_i = \frac{1}{\sum X_j}$$

3) Mean of Ratio of Changes:
$$b_{g3} = \frac{1}{n-1}\sum_{i=2}^{n}\frac{Y_i - Y_{i-1}}{X_i - X_{i-1}},\qquad w_i = \frac{1}{n-1}\left(\frac{1}{X_i - X_{i-1}} - \frac{1}{X_{i+1} - X_i}\right)$$

4) Ordinary Least Squares:
$$b_{g4} = \frac{\sum Y_i X_i}{\sum X_j^2},\qquad w_i = \frac{X_i}{\sum X_j^2}$$

• All of our “best guesses” are linear estimators!
5-22
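To make the four “best guesses” concrete, here is a short sketch (my own illustration, not from the text) that computes bg1–bg4 on one simulated sample and confirms that OLS really is a weighted sum of the Y's. The DGP parameters and X values are hypothetical.

```python
import numpy as np

# Sketch: the four estimators computed on one simulated sample
# (hypothetical b = 0.7, s = 2; the X values are made up).
rng = np.random.default_rng(1)
b_true, s = 0.7, 2.0
X = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
Y = b_true * X + rng.normal(0.0, s, size=X.size)

b_g1 = np.mean(Y / X)                    # 1) mean of ratios
b_g2 = Y.sum() / X.sum()                 # 2) ratio of means
b_g3 = np.mean(np.diff(Y) / np.diff(X))  # 3) mean of ratio of changes
b_g4 = (Y * X).sum() / (X ** 2).sum()    # 4) ordinary least squares

# Each is a linear estimator: for example, OLS as a weighted sum of the Y's.
w_g4 = X / (X ** 2).sum()
assert np.isclose(b_g4, (w_g4 * Y).sum())

print(b_g1, b_g2, b_g3, b_g4)
```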
Expectation of Linear Estimators

Yᵢ = bXᵢ + eᵢ,  E(eᵢ) = 0,  Var(eᵢ) = s²,  Cov(eᵢ, eⱼ) = 0 for i ≠ j
X's fixed across samples (so we can treat them as constants).

$$\hat{b} = \sum_{i=1}^{n} w_i Y_i$$

$$E(\hat{b}) = E\left(\sum_{i=1}^{n} w_i Y_i\right) = \sum_{i=1}^{n} w_i E(Y_i) = \sum_{i=1}^{n} w_i E(bX_i + e_i)$$

$$= \sum_{i=1}^{n} w_i\left[E(bX_i) + E(e_i)\right] = b\sum_{i=1}^{n} w_i X_i$$

5-23
Condition for Unbiasedness

$$\hat{b} = \sum_{i=1}^{n} w_i Y_i,\qquad E(\hat{b}) = b\sum_{i=1}^{n} w_i X_i$$

A linear estimator is unbiased if $\sum_{i=1}^{n} w_i X_i = 1$.

5-24
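A quick numerical check of the unbiasedness condition, using the weights listed earlier (my own sketch; the X values are made up):

```python
import numpy as np

# Sketch: verify sum(wi * Xi) = 1 for the weights of b_g1, b_g2 and b_g4.
X = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
n = X.size

weights = {
    "b_g1": 1.0 / (n * X),              # mean of ratios
    "b_g2": np.full(n, 1.0 / X.sum()),  # ratio of means
    "b_g4": X / (X ** 2).sum(),         # OLS
}
for name, w in weights.items():
    print(name, np.isclose((w * X).sum(), 1.0))   # True for all three
```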
Checking the Other Estimators

• A linear estimator is unbiased if Σwᵢ Xᵢ = 1.
• Are bg2 and bg4 unbiased?

2) Ratio of Means:
$$b_{g2} = \frac{\sum Y_i}{\sum X_i},\qquad w_i = \frac{1}{\sum X_j},\qquad \sum w_i X_i = \frac{1}{\sum X_j}\sum X_i = 1$$

4) Ordinary Least Squares:
$$b_{g4} = \frac{\sum Y_i X_i}{\sum X_j^2},\qquad w_i = \frac{X_i}{\sum X_j^2},\qquad \sum w_i X_i = \frac{1}{\sum X_j^2}\sum X_i^2 = 1$$
5-25
Better unbiased estimator

• Similar calculations hold for bg3

• All 4 of our “best guesses” are unbiased.

• But bg4 did much better than bg3. Not all
unbiased estimators are created equal.

• We want an unbiased estimator with a low
mean squared error.

5-26
First: A Puzzle…..

• Suppose n = 1
– Would you like a big X or a small X for
that observation?
– Why?

5-27
Which Observations Get the Most Weight?

1) Mean of Ratios:
$$b_{g1} = \frac{1}{n}\sum\frac{Y_i}{X_i},\qquad w_i = \frac{1}{nX_i}$$

2) Ratio of Means:
$$b_{g2} = \frac{\sum Y_i}{\sum X_i},\qquad w_i = \frac{1}{\sum X_j}$$

3) Mean of Ratio of Changes:
$$b_{g3} = \frac{1}{n-1}\sum_{i=2}^{n}\frac{Y_i - Y_{i-1}}{X_i - X_{i-1}},\qquad w_i = \frac{1}{n-1}\left(\frac{1}{X_i - X_{i-1}} - \frac{1}{X_{i+1} - X_i}\right)$$

4) Ordinary Least Squares:
$$b_{g4} = \frac{\sum Y_i X_i}{\sum X_j^2},\qquad w_i = \frac{X_i}{\sum X_j^2}$$

5-28
Which Observations Get the Most Weight? (cont.)

•   bg1 puts more weight on observations with low
values of X.
•   bg3 puts more weight on observations with low
values of X, relative to neighboring observations.
•   These estimators did very poorly in the simulations.

$$b_{g1} = \frac{1}{n}\sum\frac{Y_i}{X_i},\qquad w_i = \frac{1}{nX_i}$$

$$b_{g3} = \frac{1}{n-1}\sum_{i=2}^{n}\frac{Y_i - Y_{i-1}}{X_i - X_{i-1}},\qquad w_i = \frac{1}{n-1}\left(\frac{1}{X_i - X_{i-1}} - \frac{1}{X_{i+1} - X_i}\right)$$

5-29
Which Observations Get the Most Weight? (cont.)

• bg2 weights all observations equally.
• bg4 puts more weight on observations with high
values of X.
• These estimators did very well in the simulations.

$$b_{g2} = \frac{\sum Y_i}{\sum X_i},\qquad w_i = \frac{1}{\sum X_j}$$

$$b_{g4} = \frac{\sum Y_i X_i}{\sum X_j^2},\qquad w_i = \frac{X_i}{\sum X_j^2}$$

5-30
Why Weight Observations with High X's More Heavily?

• Under our Gauss–Markov DGP, the disturbances
are drawn from the same distribution for all
values of X.
• To compare a high X choice and a low X
choice, ask what effect a given disturbance
will have for each.

5-31
Figure 3.1 Effects of a Disturbance for
Small and Large X

5-32
Linear Estimators and Efficiency

• For our DGP, good estimators will place more
weight on observations with high values of X
• Inferences from these observations are less
sensitive to the effects of the same e
• Only one of our “best guesses” had this
property.
• bg4 (a.k.a OLS) dominated the other
estimators.
• Can we do even better?
5-33
Min. MSE

• Mean Squared Error = Variance + Bias2

• To have a low Mean Squared Error,
we want two things: a low bias and a
low variance.

5-34
Need Variance

• An unbiased estimator with a low variance
will tend to give answers close to the true
value of b

• Using the algebra of variances and our
DGP, we can calculate the variance of
our estimators.

5-35
Algebra of Variances

(1) Var(k) = 0
(2) Var(kY) = k²·Var(Y)
(3) Var(k + Y) = Var(Y)
(4) Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)
$$\text{(5)}\quad Var\left(\sum_{i=1}^{n} Y_i\right) = \sum_{i=1}^{n} Var(Y_i) + \sum_{i=1}^{n}\sum_{j\neq i} Cov(Y_i, Y_j)$$

• One virtue of independent observations is that
Cov( Yi ,Yj ) = 0, killing all the cross-terms in the
variance of the sum.
5-36
Back to Our Baseline DGP: Gauss–Markov

• Our benchmark DGP: Gauss–Markov
• Y = bX +e
• E(ei ) = 0
• Var(ei ) = s 2
• Cov(ei ,ej ) = 0, for i ≠ j
• X ’s fixed across samples
We will refer to this DGP (very) frequently.

5-37
Variance of OLS
$$Var(\hat{b}_{OLS}) = Var\left(\frac{\sum X_i Y_i}{\sum X_i^2}\right) = \sum_{i=1}^{n} Var\left(\frac{X_i Y_i}{\sum X_k^2}\right) + \sum_{i=1}^{n}\sum_{j\neq i} Cov\left(\frac{X_i Y_i}{\sum X_k^2},\frac{X_j Y_j}{\sum X_k^2}\right)$$

$$= \sum_{i=1}^{n}\left(\frac{X_i}{\sum X_k^2}\right)^2 Var(Y_i) + 0 = \sum_{i=1}^{n}\left(\frac{X_i}{\sum X_k^2}\right)^2 Var(bX_i + e_i)$$

5-38
Variance of OLS (cont.)
$$Var(\hat{b}_{OLS}) = \sum_{i=1}^{n}\left(\frac{X_i}{\sum X_k^2}\right)^2 Var(bX_i + e_i) = \sum_{i=1}^{n}\left(\frac{X_i}{\sum X_k^2}\right)^2\left(0 + Var(e_i) + 0\right)$$

$$= \sum_{i=1}^{n}\left(\frac{X_i}{\sum X_k^2}\right)^2 s^2 = s^2\,\frac{1}{\left(\sum X_k^2\right)^2}\sum X_i^2 = \frac{s^2}{\sum X_k^2}$$

• Note: the larger ΣXₖ², the lower the variance.
5-39
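A Monte Carlo sketch of the variance formula just derived (illustrative values of my own choosing, not from the lecture): the simulated variance of the OLS estimator should be close to s²/ΣXₖ².

```python
import numpy as np

# Sketch: check Var(b_OLS_hat) = s^2 / sum(Xk^2) by simulation
# (hypothetical b = 0.7, s = 2; X values are made up).
rng = np.random.default_rng(2)
b_true, s = 0.7, 2.0
X = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

draws = []
for _ in range(200_000):
    Y = b_true * X + rng.normal(0.0, s, size=X.size)
    draws.append((Y * X).sum() / (X ** 2).sum())   # OLS without an intercept

print(np.var(draws))              # simulated variance of the estimator
print(s ** 2 / (X ** 2).sum())    # theoretical variance: s^2 / sum(Xk^2)
```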
Variance of a Linear Estimator

• More generally:

$$Var\left(\sum w_i Y_i\right) = \sum Var(w_i Y_i) + 2\,(\text{covariance terms}) = \sum Var(w_i Y_i) + 0 = \sum w_i^2\,Var(Y_i)$$

$$= \sum w_i^2\,Var(bX_i + e_i) = \sum w_i^2\left(0 + Var(e_i) + 0\right) = s^2\sum w_i^2$$
5-40
Variance of a Linear Estimator (cont.)

• The algebra of expectations and variances
allows us to get exact results where the
Monte Carlo simulations gave only approximations.

• The exact results apply to ANY
model meeting our Gauss–Markov
assumptions.

5-41
Variance of a Linear Estimator (cont.)

• We now know mathematically that bg1–bg4
are all unbiased estimators of b under our
Gauss–Markov assumptions.
• We also think from our Monte Carlo models
that bg4 is the best of these four estimators,
in that it is more efficient than the others.
• They are all unbiased (we know from the
algebra), but bg4 appears to have a smaller
variance than the other 3.
5-42
Variance of a Linear Estimator (cont.)

• Is there an unbiased linear estimator
better (i.e., more efficient) than bg4?
– What is the Best, Linear, Unbiased
Estimator?
– How do we find the BLUE estimator?

5-43
BLUE Estimators

• Mean Squared Error = Variance + Bias2

• An unbiased estimator is right
“on average”

• In practice, we don’t get to average. We
see only one draw from the DGP.

5-44

• Some analysts would prefer an
estimator with a small bias, if it gave
them a large reduction in variance

• What good is being right on average if
you’re likely to be very wrong in your
one draw?

5-45
BLUE Estimators (cont.)

• Mean Squared Error = Variance + Bias2
• In a particular application, there may be
a favorable trade-off between accepting a
little bias in return for a lot less variance.
• We will NOT look for these trade-offs.
• Only after we have made sure our
estimator is unbiased will we try to make
the variance small.

5-46
BLUE Estimators (cont.)

A Strategy for Finding the Best Linear
Unbiased Estimator:
1. Focus on linear estimators: b̂ = Σwᵢ Yᵢ
2. Impose the unbiasedness condition: Σwᵢ Xᵢ = 1
3. Calculate the variance of a linear estimator, Var(Σwᵢ Yᵢ) = s²Σwᵢ², and use calculus to find the wᵢ that give the smallest variance subject to the unbiasedness condition.

Result: the BLUE Estimator for Our DGP

5-47
BLUE Estimators (cont.)

Using calculus, we would find
$$w_i = \frac{X_i}{\sum X_j^2}$$
This formula is OLS!
OLS is the Best Linear Unbiased Estimator for
the Gauss–Markov DGP.
This result is called the Gauss–Markov Theorem.

5-48
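The theorem can also be seen numerically. Since every unbiased linear estimator has variance s²Σwᵢ², comparing Σwᵢ² across the candidate weight schemes shows which is most efficient; this sketch (my own, with made-up X values) confirms that the OLS weights give the smallest value.

```python
import numpy as np

# Sketch: among unbiased linear estimators (sum(wi * Xi) = 1), the variance
# is s^2 * sum(wi^2), so a smaller sum(wi^2) means a more efficient estimator.
X = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
n = X.size

weights = {
    "b_g1 (mean of ratios)": 1.0 / (n * X),
    "b_g2 (ratio of means)": np.full(n, 1.0 / X.sum()),
    "b_g4 (OLS)":            X / (X ** 2).sum(),
}
for name, w in weights.items():
    assert np.isclose((w * X).sum(), 1.0)   # each scheme is unbiased
    print(name, (w ** 2).sum())             # OLS gives the smallest sum(wi^2)
```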
BLUE Estimators (cont.)

• OLS is a very good strategy for the
Gauss–Markov DGP.
• OLS is unbiased: our guesses are right
on average.
• OLS is efficient: it has a small variance
(or at least the smallest possible variance
for unbiased linear estimators).
• Our guesses will tend to be close to right (or
at least as close to right as we can get; the
minimum variance could still be pretty large!)
5-49
BLUE Estimator (cont.)

• According to the Gauss–Markov Theorem,
OLS is the BLUE Estimator for the
Gauss–Markov DGP.

• We will study other DGP’s. For any DGP,
we can follow this same procedure:
– Look at Linear Estimators
– Impose the unbiasedness conditions
– Minimize the variance of the estimator

5-50
Example: Cobb–Douglas Production
Functions (Chapter 3.7)

• A classic production function in economics is
the Cobb–Douglas function.

• Y = a·L^b·K^(1-b)

• If firms pay workers and capital their marginal
product, then worker compensation equals a
fraction b of total output (or national income).

5-51
Example: Cobb–Douglas

• To illustrate, we randomly pick 8 years
between 1900 and 1995. For each year,
we observe total worker compensation
and national income.
• We use bg1, bg2, bg3, and bg4 to
estimate
   Compensation = b·National Income +e

5-52
TABLE 3.6 Estimates of the Cobb–Douglas
Parameter b, with Standard Errors

5-53
TABLE 3.7 Outputs from a Regression* of Compensation on National Income

5-54
Example: Cobb–Douglas

• All 4 of our estimators give very
similar estimates.
• However, bg2 and bg4 have much smaller
standard errors. (We will see the value of
small standard errors when we cover
hypothesis tests.)
• Using our estimate from bg4, 0.738, a
1 billion dollar increase in National Income
is predicted to increase total worker
compensation by 0.738 billion dollars.
5-55
A New DGP

• Most lines do not go through the origin.
• Let’s add an intercept term and find the
BLUE Estimator (from Chapter 4).

5-56
Gauss–Markov with an Intercept

Yᵢ = b₀ + b₁Xᵢ + eᵢ  (i = 1…n)
E(eᵢ) = 0
Var(eᵢ) = s²
Cov(eᵢ, eⱼ) = 0, i ≠ j
X's fixed across samples.
All we have done is add a b₀.
5-57
Gauss–Markov with an Intercept (cont.)

• Example: let’s estimate the effect of income
on college financial aid.

• Students whose families have 0 income do not
necessarily receive 0 aid; the intercept b0 captures
their expected aid.
• E[financial aid | family income]
= b0 + b1(family income)

5-58
Gauss–Markov with an Intercept (cont.)

5-59
Gauss–Markov with an Intercept (cont.)

• How do we construct a BLUE Estimator?
• Step 1: focus on linear estimators.
• Step 2: calculate the expectation of a linear
estimator for this DGP, and find the condition
for the estimator to be unbiased.
• Step 3: calculate the variance of a linear
estimator. Find the weights that minimize this
variance subject to the unbiasedness
constraint.

5-60
Expectation of a Linear Estimator

$$E(\hat{b}) = E\left(\sum w_i Y_i\right) = \sum E(w_i Y_i) = \sum w_i E(Y_i) = \sum w_i E(b_0 + b_1 X_i + e_i)$$

$$= \sum\left[w_i E(b_0) + w_i E(b_1 X_i) + w_i E(e_i)\right] = \sum\left[b_0 w_i + b_1 w_i X_i + 0\right] = b_0\sum w_i + b_1\sum w_i X_i$$

5-61
Checking Understanding

$$E(\hat{b}) = b_0\sum w_i + b_1\sum w_i X_i$$

• Question: What are the conditions for an
estimator of b1 to be unbiased? What
are the conditions for an estimator of b0
to be unbiased?

5-62
Checking Understanding (cont.)

$$E(\hat{b}) = b_0\sum w_i + b_1\sum w_i X_i$$

• When is the expectation equal to b1?
– When Σwᵢ = 0 and Σwᵢ Xᵢ = 1

• What if we were estimating b0? When is the
expectation equal to b0?
– When Σwᵢ = 1 and Σwᵢ Xᵢ = 0

• To estimate 1 parameter, we needed 1 unbiasedness
condition. To estimate 2 parameters, we need 2
unbiasedness conditions.

5-63
Variance of a Linear Estimator

$$Var(\hat{b}) = Var\left(\sum w_i Y_i\right) = \sum Var(w_i Y_i) + 0 = \sum w_i^2\,Var(b_0 + b_1 X_i + e_i)$$

$$= \sum w_i^2\left(0 + 0 + Var(e_i) + 0\right) = s^2\sum w_i^2$$

• Adding a constant to the DGP does NOT
change the variance of the estimator.

5-64
BLUE Estimator

To find the BLUE estimator of b1, we minimize s²Σwᵢ²
subject to the constraints
Σwᵢ = 0 and Σwᵢ Xᵢ = 1.

Solution:
$$\hat{b}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{j=1}^{n}(X_j - \bar{X})^2}$$

5-65
BLUE Estimator of b1

$$\hat{b}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{j=1}^{n}(X_j - \bar{X})^2}$$

• This estimator is OLS for the DGP with
an intercept.
• It is the Best (minimum variance) Linear
Unbiased Estimator for the Gauss–Markov
DGP with an intercept.

5-66
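A small sketch (my own, with made-up data) computing the slope formula above directly and checking it against numpy's built-in least-squares fit:

```python
import numpy as np

# Sketch: OLS slope for the DGP with an intercept, straight from the formula.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up data
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

b1_hat = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
print(b1_hat)
print(np.polyfit(X, Y, 1)[0])   # same slope from numpy's least-squares fit
```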
BLUE Estimator of b1 (cont.)

$$\hat{b}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{j=1}^{n}(X_j - \bar{X})^2}$$

• This formula is very similar to the
formula for OLS without an intercept.
• However, now we subtract the mean
values from both X and Y.

5-67
BLUE Estimator of b1 (cont.)

$$\hat{b}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{j=1}^{n}(X_j - \bar{X})^2}$$

• OLS places more weight on observations with large values of $|X_i - \bar{X}|$.
• Observations are more valuable if X is far
away from its mean.

5-68
BLUE Estimator of b1 (cont.)

$$w_i = \frac{X_i - \bar{X}}{\sum (X_j - \bar{X})^2}$$

$$Var(\hat{b}_1) = s^2\sum w_i^2 = s^2\sum\left(\frac{X_i - \bar{X}}{\sum (X_j - \bar{X})^2}\right)^2 = s^2\,\frac{1}{\left[\sum (X_j - \bar{X})^2\right]^2}\sum (X_i - \bar{X})^2 = \frac{s^2}{\sum (X_j - \bar{X})^2}$$

5-69
BLUE Estimator of b0

• The easiest way to estimate the intercept:
$$\hat{b}_0 = \bar{Y} - \hat{b}_1\bar{X}$$
• Notice that the fitted regression line always
goes through the point

$(\bar{X}, \bar{Y})$
• Our fitted regression line passes through “the
middle of the data.”

5-70
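Continuing the sketch above (same made-up data), the intercept estimate and the "through the middle of the data" property look like this:

```python
import numpy as np

# Sketch: b0_hat = Y_bar - b1_hat * X_bar, and a check that the fitted
# line passes through the point (X_bar, Y_bar).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up data
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

b1_hat = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
b0_hat = Y.mean() - b1_hat * X.mean()

# Fitted value at X_bar equals Y_bar: the line goes through (X_bar, Y_bar).
print(np.isclose(b0_hat + b1_hat * X.mean(), Y.mean()))   # True
```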
Example: The Phillips Curve

• Phillips argued that nations face a trade-off
between wage inflation and unemployment.

• He used annual British data on wage
inflation and unemployment from
1861–1913 and 1914–1957 to regress
inflation on unemployment.

5-71
Example: The Phillips Curve (cont.)

• The fitted regression line for 1861–1913
did a good job predicting the data from
1914 to 1957.

• “Out of sample predictions” are a strong
test of an econometric model.

5-72
Example: The Phillips Curve (cont.)

• The US data from 1958–1969 also show a negative
relationship between inflation and unemployment:

Unemploymentₜ = 0.06 - 0.55·Inflationₜ

$\hat{b}_0 = 0.06,\qquad \hat{b}_1 = -0.55$
5-73
Example: The Phillips Curve (cont.)

Unemploymentₜ = 0.06 - 0.55·Inflationₜ

• How do we interpret these numbers?
• If Inflation were 0, our best guess of
Unemployment would be 0.06 (6 percent).
• A one percentage point increase of Inflation
decreases our predicted Unemployment level
by 0.55 percentage points.

5-74
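A tiny prediction sketch using the reported coefficients. I am assuming here that both rates are measured as decimals (so 0.02 means 2 percent); the inflation value is hypothetical.

```python
# Sketch: prediction from the fitted line Unemployment = 0.06 - 0.55 * Inflation.
b0_hat, b1_hat = 0.06, -0.55

inflation = 0.02                                   # hypothetical 2% inflation
predicted_unemployment = b0_hat + b1_hat * inflation
print(predicted_unemployment)                      # 0.049, i.e. about 4.9%
```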
Figure 4.2 U.S. Unemployment and
Inflation, 1958–1969

5-75
TABLE 4.1 The Phillips Curve

5-76
Example: The Phillips Curve

• We no longer need to assume our
regression line goes through the origin.

• We have learned how to estimate
an intercept.

• A straight line doesn’t seem to do a
great job here. Can we do better?

5-77
Review

• As a starting place, we need to write down all
our assumptions about the way the
underlying process works, and about how that
process led to our data.
• These assumptions are called the “Data
Generating Process.”
• Then we can derive estimators that have
good properties for the Data Generating
Process we have assumed.

5-78
Review: The Gauss–Markov DGP

• Y = bX +e
• E(ei ) = 0
• Var(ei ) = s 2
• Cov(ei ,ej ) = 0, for i ≠ j
• X ’s fixed across samples (so we can
treat them like constants).
• We want to estimate b
5-79
Review

• We will focus on linear estimators.
• Linear estimator: a weighted sum of the Y ’s.

$$\hat{b} = \sum w_i Y_i$$
5-80
Review (cont.)

Yᵢ = bXᵢ + eᵢ
E(eᵢ) = 0
Var(eᵢ) = s²
Cov(eᵢ, eⱼ) = 0, for i ≠ j
X's fixed across samples (so we can treat them as constants).

$$\hat{b} = \sum_{i=1}^{n} w_i Y_i,\qquad E(\hat{b}) = b\sum_{i=1}^{n} w_i X_i$$

A linear estimator is unbiased if $\sum_{i=1}^{n} w_i X_i = 1$.

5-81
Review (cont.)

Yᵢ = bXᵢ + eᵢ
E(eᵢ) = 0
Var(eᵢ) = s²
Cov(eᵢ, eⱼ) = 0, for i ≠ j
X's fixed across samples (so we can treat them as constants).

A linear estimator is unbiased if Σᵢ wᵢXᵢ = 1.

Many linear estimators will be unbiased. How do I pick the "best"
linear unbiased estimator (BLUE)?

5-82
Review: BLUE Estimators

A Strategy for Finding the Best Linear
Unbiased Estimator:
1. Focus on linear estimators: b̂ = Σwᵢ Yᵢ
2. Impose the unbiasedness condition: Σwᵢ Xᵢ = 1
3. Use calculus to find the wᵢ that give the smallest
variance subject to the unbiasedness condition.

Result: The BLUE Estimator for our DGP

5-83
Review: BLUE Estimators (cont.)

• Ordinary Least Squares (OLS) is BLUE
for our Gauss–Markov DGP.
• This result is called the “Gauss–Markov
Theorem.”

5-84
Review: BLUE Estimators (cont.)

• OLS is a very good strategy for the Gauss–
Markov DGP.
• OLS is unbiased: our guesses are right
on average.
• OLS is efficient: the smallest possible
variance for unbiased linear estimators.
• Our guesses will tend to be close to right
(or at least as close to right as we can get).
• Warning: the minimum variance could still be
pretty large!
5-85
Gauss–Markov with an Intercept

Yᵢ = b₀ + b₁Xᵢ + eᵢ  (i = 1…n)
E(eᵢ) = 0
Var(eᵢ) = s²
Cov(eᵢ, eⱼ) = 0, i ≠ j
X's fixed across samples.
All we have done is add a b₀.
5-86
Review: BLUE Estimator of b1

$$\hat{b}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{j=1}^{n}(X_j - \bar{X})^2}$$

• This estimator is OLS for the DGP with
an intercept.
• It is the Best (minimum variance) Linear
Unbiased Estimator for the Gauss–Markov
DGP with an intercept.

5-87
BLUE Estimator of b0

• The easiest way to estimate the intercept:
$$\hat{b}_0 = \bar{Y} - \hat{b}_1\bar{X}$$
• Notice that the fitted regression line always
goes through the point

$(\bar{X}, \bar{Y})$
• Our fitted regression line passes through “the
middle of the data.”

5-88
