Multiple Regression Analysis: Estimation

Multiple Regression Model
y = β0 + β1x1 + β2x2 + … + βkxk + u

- β0 is still the intercept
- β1 through βk are all called slope parameters
- u is still the error term (or disturbance term)
- Zero mean assumption: E(u) = 0
- Still minimize the sum of squared residuals
Multiple Regression Model: Example

Demand Estimation:
- Dependent variable: Q, tile sales (in thousands of cases)
- Right-hand side variables: tile price per case (P), income per capita I (in thousands of $), and advertising expenditure A (in thousands of $)

Regression: Q = β0 + β1P + β2I + β3A + u

Interpretation:
- β1 measures the effect of the tile price on tile consumption, holding all other factors fixed
- β2 represents the effect of income, holding all other factors fixed
- β3 represents the effect of advertising, holding all other factors fixed

Q = 17.513 − 0.296P + 0.066I + 0.036A

1. What is the impact of a price change on tile sales?
2. What is the impact of a change in income on tile sales?
3. What is the impact of a change in advertising expenditures on tile sales?

Calculation of own-price elasticity?
Calculation of income elasticity?
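The elasticity questions can be answered directly from the fitted equation above: an elasticity is the slope coefficient scaled by the ratio of the variable to the predicted quantity. A minimal sketch, where the evaluation point (P = 5, I = 30, A = 20) is a hypothetical assumption chosen purely for illustration:

```python
# Marginal effects and elasticities from the fitted demand equation
# Q = 17.513 - 0.296*P + 0.066*I + 0.036*A  (Q in thousands of cases).
# The evaluation point below is hypothetical, for illustration only.

b0, b1, b2, b3 = 17.513, -0.296, 0.066, 0.036
P, I, A = 5.0, 30.0, 20.0            # assumed price, income, advertising

Q = b0 + b1 * P + b2 * I + b3 * A    # predicted quantity at this point

own_price_elasticity = b1 * P / Q    # (dQ/dP) * (P/Q)
income_elasticity = b2 * I / Q       # (dQ/dI) * (I/Q)

print(round(Q, 3))
print(round(own_price_elasticity, 4))
print(round(income_elasticity, 4))
```

Because the model is linear, the elasticities vary with the evaluation point; in practice they are usually reported at the sample means.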
Random Sampling
- Collecting sales data of 23 tile stores in 2002 in the market
- For each observation i: Qi = β0 + β1Pi + β2Ii + β3Ai + ui
- Goal: estimate β0, β1, β2, β3

Data (one row per store):

Q1     P1     I1     A1
Q2     P2     I2     A2
Q3     P3     I3     A3
…      …      …      …
Q23    P23    I23    A23

Using OLS, estimate the coefficients so as to minimize the sum of squared errors:

min over (β̂0, β̂1, β̂2, β̂3) of  Σ(i=1 to n) ûi²  =  Σ(i=1 to n) (Qi − β̂0 − β̂1Pi − β̂2Ii − β̂3Ai)²
The Generic Multiple Regression Model

Yi = β0 + β1X1i + β2X2i + … + βkXki + εi,     i = 1, …, n

In matrix notation, Y = Xβ + ε, where

Y = | Y1 |         X = | 1  X11  X12  …  X1k |
    | Y2 |             | 1  X21  X22  …  X2k |
    | …  |             | …                   |
    | Yn | (n×1)       | 1  Xn1  Xn2  …  Xnk | (n×(k+1))

β = | β0 |          ε = | ε1 |
    | β1 |              | ε2 |
    | …  |              | …  |
    | βk | ((k+1)×1)    | εn | (n×1)

Estimation of regression parameters:
- Least squares (no knowledge of the distribution of the error or disturbance terms is required).
- The matrix notation gives a view of how the data are housed in software programs.
Components of the Model

-Endogenous Variables—dependent variables, values of
which are determined within the system.
-Exogenous Variables—determined outside the system
but influence the system by affecting the values of the
endogenous variables.
-Structural Parameters—estimated using statistical
techniques and relevant data.
-Lagged Endogenous Variables
-Lagged Exogenous Variables
-Predetermined Variables
The Disturbance (or Error) Term
Stochastic, a random variable.
Statistical distribution often normal.

Captures:
1. Omission of the influence of other variables.
2. Measurement error.

Recognition that any regression model is a parsimonious stochastic representation of reality: stochastic, not deterministic.
OLS Estimates Associated with the Multiple Regression Model

β̂ = (xᵀx)⁻¹ xᵀ y

where

x = | 1  x11  x21  …  xk1 |
    | 1  x12  x22  …  xk2 |
    | …                   |
    | 1  x1n  x2n  …  xkn | (n×(k+1))

y = | y1 |
    | y2 |
    | …  |
    | yn | (n×1)

xᵀ is the transpose of x ((k+1)×n), and

β̂ = | β̂0 |
    | β̂1 |
    | …  |
    | β̂k | ((k+1)×1)
 
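The matrix formula above can be sketched numerically. A minimal NumPy sketch on simulated data (the data, seed, and coefficient values are assumptions for illustration; solving the normal equations is numerically preferable to forming the inverse explicitly):

```python
import numpy as np

# OLS in matrix form: beta_hat = (x'x)^{-1} x'y.
# The 23 "observations" below are simulated for illustration; in practice X
# would hold the store data on P, I, and A with a leading column of ones.
rng = np.random.default_rng(0)
n, k = 23, 3
true_beta = np.array([17.5, -0.3, 0.07, 0.04])   # hypothetical values

X = np.column_stack([np.ones(n), rng.uniform(1, 10, size=(n, k))])
y = X @ true_beta + rng.normal(scale=0.5, size=n)

# Solve the normal equations (X'X) beta = X'y rather than inverting X'X.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Residual variance s^2 = SSE / (n - k - 1)
resid = y - X @ beta_hat
s2 = resid @ resid / (n - k - 1)

print(beta_hat.round(3), round(s2, 3))
```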
The Gauss-Markov Theorem
Given the assumptions below, it can be shown that the OLS
estimator is “BLUE.”
- Best
- Linear
- Unbiased
- Estimator

Assumptions:
- Linear in parameters
- Corr(εi, εj) = 0 for i ≠ j
- Zero mean
- No perfect collinearity
- Homoscedasticity
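Unbiasedness under these assumptions can be illustrated with a quick Monte Carlo sketch: averaging the OLS estimates over many simulated samples drawn from a model that satisfies the assumptions recovers the true coefficients. All parameter values below are hypothetical.

```python
import numpy as np

# Monte Carlo sketch of unbiasedness: E(beta_hat) = beta.
rng = np.random.default_rng(1)
n, reps = 50, 2000
true_beta = np.array([2.0, -0.5, 1.5])   # hypothetical parameters

# Fixed regressors with an intercept column
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])

estimates = np.empty((reps, 3))
for r in range(reps):
    # Zero-mean, homoskedastic, uncorrelated errors
    y = X @ true_beta + rng.normal(size=n)
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(estimates.mean(axis=0).round(2))   # approximately true_beta
```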
Communication and Aims for the Analyst

Communication
- A technician can run a program and get output.
- An analyst must interpret the findings from examination of this output.
- No bonus points are given for being a terrific hacker but a poor analyst.

Aims
1. Improve your ability to develop models to conduct structural analysis and to forecast with some accuracy.
2. Enhance your ability to interpret and communicate the results.

Bottom Line
1. The analyst transforms the economic model/idea into a mathematical/statistical one.
2. The technician estimates the model and obtains a mathematical/statistical answer.
3. The analyst transforms the mathematical/statistical answer into an economic one.
Goodness-of-Fit

yi = ŷi + ûi

Definitions:
- Σ(yi − ȳ)² is the total sum of squares (SST)
- Σ(ŷi − ȳ)² is the regression sum of squares (SSR)
- Σûi² is the residual (or error) sum of squares (SSE)

Then SST = SSR + SSE.
Goodness-of-Fit (continued . . .)
How well does our sample regression line fit our sample
data?

R-squared of regression is the fraction of the total sum
of squares (SST) that is explained by the model.

R² = SSR/SST = 1 – SSE/SST

R² can never decrease when another explanatory
or predetermined variable is added to a
regression; usually R² will increase.

Because R² will usually increase (or at least not
decrease) with increases in the number of right-
hand side or explanatory variables, it is not
necessarily a good way to compare alternative
models with the same dependent variable.
R² = (Explained sample variability)/(Total sample variability)
   = Σ(i=1 to n)(ŷi − ȳ)² / Σ(i=1 to n)(yi − ȳ)²
   = SSR/SST = 1 − SSE/SST

Adjusted R²:

R̄² = 1 − [SSE/(n − k − 1)] / [SST/(n − 1)]

Questions:
1. Is adjusted R² always better than R²?
2. What is the relationship between R² and adjusted R²?
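Both questions can be explored through the sum-of-squares decomposition. A minimal sketch on simulated data (the data and coefficients are assumptions for illustration); note the equivalent shortcut R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1), which shows that adjusted R² never exceeds R²:

```python
import numpy as np

# R-squared and adjusted R-squared via the sum-of-squares decomposition.
# The data are simulated purely for illustration.
rng = np.random.default_rng(2)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
sse = np.sum((y - y_hat) ** 2)         # residual sum of squares

r2 = ssr / sst
r2_adj = 1 - (sse / (n - k - 1)) / (sst / (n - 1))

# Shortcut form: the penalty means adjusted R^2 can fall when a weak
# regressor is added, even though R^2 itself never decreases.
shortcut = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(r2, 3), round(r2_adj, 3))
```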
Model Selection Criteria

MSE = Σ(t=1 to T) et² / T

s² = Σ(t=1 to T) et² / (T − p) = [T/(T − p)] · MSE

T/(T − p) is the "penalty factor"

Akaike Information Criterion (AIC):  AIC = e^(2p/T) · MSE

Schwarz Information Criterion (SIC), also called the Bayesian Information Criterion (BIC):  SIC (BIC) = T^(p/T) · MSE

p is the number of parameters to be estimated.
Model Selection Criteria Example

Model 1   Model 2     Model 3

AIC    19.35     15.83       17.15

SIC    19.37     15.86       17.17

Which model to choose? The model with the smallest AIC and SIC is preferred: here, Model 2.
Estimate of Error Variance

s² = σ̂² = Σ ûi² / (n − k − 1) = SSE/df

- df = n − (k + 1), or df = n − k − 1
- df (i.e., degrees of freedom) is the (number of observations) − (number of estimated parameters)
Variance of OLS Parameter Estimates

Var(β̂) = s² (xᵀx)⁻¹

This is the variance-covariance matrix of the OLS parameter estimates; it is a function of the residual variance s².
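A sketch of the variance-covariance computation on simulated data (the data and parameter values are assumptions for illustration); the square roots of the diagonal give the coefficient standard errors:

```python
import numpy as np

# Var(beta_hat) = s^2 (X'X)^{-1}, estimated from simulated data.
rng = np.random.default_rng(3)
n, k = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
s2 = resid @ resid / (n - k - 1)       # residual variance estimate

vcov = s2 * np.linalg.inv(X.T @ X)     # variance-covariance matrix, (k+1)x(k+1)
std_errors = np.sqrt(np.diag(vcov))    # coefficient standard errors

print(std_errors.round(3))
```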
Example: SAS Output of the Demand Function for Shrimp

[SAS regression output: quantity of shrimp sold regressed on the price of shrimp, the price of finfish, and the price of other shellfish.]
Model Selection Criteria for the QSHRIMP Problem

With SSE = 1580.90, T = 97, and p = 7:

MSE = SSE/T = 1580.90/97 = 16.29

s² = [T/(T − p)] · MSE = (97/90)(16.29) = 17.56

AIC = e^(2p/T) · MSE = e^(14/97)(16.29) = 18.82

SIC = T^(p/T) · MSE = 97^(7/97)(16.29) = 22.67
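The arithmetic above can be reproduced directly (small last-digit differences arise from rounding MSE to 16.29 before propagating):

```python
import math

# Model selection criteria for QSHRIMP: SSE = 1580.90, T = 97 observations,
# p = 7 estimated parameters (values taken from the slide above).
sse, T, p = 1580.90, 97, 7

mse = sse / T                      # mean squared error
s2 = (T / (T - p)) * mse           # penalty-adjusted variance estimate
aic = math.exp(2 * p / T) * mse    # Akaike information criterion
sic = T ** (p / T) * mse           # Schwarz (Bayesian) information criterion

print(round(mse, 2), round(s2, 2), round(aic, 2), round(sic, 2))
```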
