# Introduction to Multiple Regression Analysis


Statistics 3115/5315, Handout 7
A multiple linear regression model is a generalization of the simple linear regression model, which has one independent variable, to a model with $k$ independent variables:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + E. \tag{1}$$

The parameters $\beta_0, \beta_1, \ldots, \beta_k$ are called the regression coefficients and have to be estimated from the data:

$$(X_{11}, \ldots, X_{k1}, Y_1),\ (X_{12}, \ldots, X_{k2}, Y_2),\ \ldots,\ (X_{1n}, \ldots, X_{kn}, Y_n).$$
Special cases of the general multiple regression model given in Equation (1) include:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + E, \tag{2}$$

$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + E \tag{3}$$

and

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1^2 + \beta_4 X_2^2 + \beta_5 X_1 X_2 + E. \tag{4}$$

Model (3) is a second-order polynomial model in the single independent variable $X$, and model (4) is called a full (complete) second-order model.
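Models (2) through (4) are all linear in the regression coefficients, so the same fitting machinery applies to each; only the columns of the design matrix change. A minimal sketch in Python (NumPy assumed; the function name and data values are illustrative, not from the handout):

```python
import numpy as np

def design_full_second_order(x1, x2):
    """Design-matrix columns for the full second-order model (4):
    intercept, X1, X2, X1^2, X2^2, and the interaction X1*X2."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.column_stack([np.ones_like(x1), x1, x2,
                            x1**2, x2**2, x1 * x2])

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([4.0, 5.0, 6.0])
X = design_full_second_order(x1, x2)
print(X.shape)  # (3, 6): n = 3 cases, six coefficients beta_0, ..., beta_5
```

Fitting any of models (2) to (4) then reduces to ordinary least squares on the corresponding design matrix.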
Given the data, the regression coefficients are estimated using the least squares method by minimizing

$$\sum_{i=1}^{n} \left( Y_i - \widehat{Y}_i \right)^2 = \sum_{i=1}^{n} \left( Y_i - \widehat{\beta}_0 - \widehat{\beta}_1 X_{1i} - \widehat{\beta}_2 X_{2i} - \cdots - \widehat{\beta}_k X_{ki} \right)^2,$$

where

$$\widehat{Y}_i = \widehat{\beta}_0 + \widehat{\beta}_1 X_{1i} + \widehat{\beta}_2 X_{2i} + \cdots + \widehat{\beta}_k X_{ki}$$

is the predicted or fitted value of the dependent variable for the $i$th case $(X_{1i}, X_{2i}, \ldots, X_{ki})$. One can show that the $\widehat{\beta}_i$'s are linear functions of the $Y_i$'s and therefore have a normal distribution; moreover,

$$\widehat{\beta}_i \sim N\!\left( \beta_i,\ \sigma^2_{\widehat{\beta}_i} \right),$$

where the formula for $\sigma^2_{\widehat{\beta}_i}$ is quite complex. Its estimator, $S^2_{\widehat{\beta}_i}$, can be obtained from the SAS output. SAS will also perform the tests of the hypotheses that $\beta_i = 0$.
The $i$th residual is defined as follows:

$$e_i = \widehat{E}_i = Y_i - \widehat{Y}_i.$$
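The least squares minimization above can be carried out numerically. A sketch using NumPy's least squares solver (the data values are made up for illustration; the handout itself uses SAS for this step):

```python
import numpy as np

# Made-up data: n = 6 cases, k = 2 independent variables.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y  = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])

# Design matrix with a leading column of ones for the intercept beta_0.
X = np.column_stack([np.ones_like(X1), X1, X2])

# beta_hat minimizes sum_i (Y_i - Yhat_i)^2 over all coefficient vectors.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

Y_hat = X @ beta_hat   # fitted values Yhat_i
e = Y - Y_hat          # residuals e_i = Y_i - Yhat_i
print(beta_hat)
```

A defining property of the least squares fit with an intercept is that the residuals sum to zero and are orthogonal to every column of the design matrix.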

## Assumptions of the Multiple Regression Model

Existence: For given values of the independent variables in the model, the dependent variable $Y$ has a univariate distribution with a finite mean and variance.

Independence: The observations $Y_1, \ldots, Y_n$ are independent of each other.

Linearity: The mean value of the dependent variable $Y$, given specified values of $X_1, X_2, \ldots, X_k$, is a linear function of $X_1, X_2, \ldots, X_k$:

$$\mu_{Y \mid X_1, X_2, \ldots, X_k} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k.$$

This is equivalent to the linear model assumption together with the fact that the mean of the error is equal to 0.

Homoscedasticity: The variance of $Y$ is the same given any fixed values of the independent variables in the model:

$$\sigma^2_{Y \mid X_1, X_2, \ldots, X_k} = \mathrm{Var}(Y \mid X_1, X_2, \ldots, X_k) = \sigma^2.$$

Normality: For any given values of $X_1, X_2, \ldots, X_k$, the distribution of $Y$ is normal with mean $\mu_{Y \mid X_1, X_2, \ldots, X_k}$ and variance $\sigma^2$. This is equivalent to the fact that the random error, $E$, has a normal distribution with mean $0$ and variance $\sigma^2$.

## ANOVA for a Multiple Regression Model

As for the simple linear regression model, we have a decomposition of the total variability of the dependent variable $Y$:

Total sum of squares = Regression sum of squares + Residual sum of squares,

or

$$\sum_{i=1}^{n} \left( Y_i - \bar{Y} \right)^2 = \sum_{i=1}^{n} \left( \widehat{Y}_i - \bar{Y} \right)^2 + \sum_{i=1}^{n} \left( Y_i - \widehat{Y}_i \right)^2.$$

We define the multiple $R^2$ coefficient to be

$$R^2 = \frac{SSR}{SSY} = \frac{SSY - SSE}{SSY}.$$

The adjusted $R^2$ coefficient is given by:

$$\text{adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1} = R^2 - \frac{k(1 - R^2)}{n - k - 1}.$$

The variance of the errors, $\sigma^2$, can be estimated by:

$$\widehat{\sigma}^2 = MSE = \frac{SSE}{n - k - 1}.$$
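All of these quantities follow directly from the three sums of squares. A sketch computing them from a least squares fit (the data are simulated for illustration; in the handout's workflow SAS reports these values):

```python
import numpy as np

# Simulated data: n = 8 cases, k = 2 independent variables.
rng = np.random.default_rng(1)
n, k = 8, 2
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.3, size=n)

beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ beta_hat

SSY = np.sum((Y - Y.mean())**2)      # total sum of squares
SSR = np.sum((Y_hat - Y.mean())**2)  # regression sum of squares
SSE = np.sum((Y - Y_hat)**2)         # residual sum of squares

R2 = SSR / SSY
adj_R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)
MSE = SSE / (n - k - 1)              # estimates sigma^2
```

Note that the decomposition $SSY = SSR + SSE$ holds exactly (up to rounding) because the model includes an intercept, and the two expressions given above for adjusted $R^2$ agree algebraically.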
The ANOVA table for multiple regression is given by:

| Source     | df      | SS  | MS                | F       |
|------------|---------|-----|-------------------|---------|
| Regression | $k$     | SSR | MSR = SSR/$k$     | MSR/MSE |
| Residual   | $n-k-1$ | SSE | MSE = SSE/$(n-k-1)$ |       |
| Total      | $n-1$   | SSY |                   |         |

The above $F$ statistic is used for testing the null hypothesis

$$H_0: \beta_1 = \cdots = \beta_k = 0.$$

This null hypothesis is rejected at the $\alpha$ level if the computed value of the $F$ statistic exceeds $F_{k,\, n-k-1,\, 1-\alpha}$. This test by itself is not very informative unless we accept the above null hypothesis: rejecting it says only that at least one of the coefficients is nonzero, without indicating which.
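The overall $F$ statistic from the ANOVA table can also be written purely in terms of $R^2$, as $F = \dfrac{R^2/k}{(1-R^2)/(n-k-1)}$. A sketch verifying this identity on simulated data (values are illustrative):

```python
import numpy as np

# Simulated data: n = 20 cases, k = 3 independent variables.
rng = np.random.default_rng(2)
n, k = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = X @ np.array([1.0, 2.0, 0.0, -1.0]) + rng.normal(scale=0.5, size=n)

beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ beta_hat

SSY = np.sum((Y - Y.mean())**2)
SSE = np.sum((Y - Y_hat)**2)
SSR = SSY - SSE

MSR = SSR / k
MSE = SSE / (n - k - 1)
F = MSR / MSE                 # compare to the critical value F_{k, n-k-1, 1-alpha}

R2 = SSR / SSY
F_from_R2 = (R2 / k) / ((1 - R2) / (n - k - 1))
```

The two expressions agree because dividing SSR and SSE by SSY leaves the ratio MSR/MSE unchanged.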

## Remarks

1. All the regression diagnostics we have discussed for the simple linear regression model have to be adjusted for the general regression model. See Chapter 14.

2. Confidence intervals for the average response and for a future observation of the dependent variable can be obtained via SAS with the CLM and CLI options.
