# Simple Linear Regression Template - Download as PowerPoint by wys12714


COMPLETE BUSINESS STATISTICS
by Amir D. Aczel & Jayavel Sounderpandian
6th edition (SIE)

## Chapter 10: Simple Linear Regression and Correlation

- Using Statistics
- The Simple Linear Regression Model
- Estimation: The Method of Least Squares
- Error Variance and the Standard Errors of Regression Estimators
- Correlation
- Hypothesis Tests about the Regression Relationship
- How Good is the Regression?
- Analysis of Variance Table and an F Test of the Regression Model
- Residual Analysis and Checking for Model Inadequacies
- Use of the Regression Model for Prediction
- The Solver Method for Regression

## Learning Objectives

After studying this chapter, you should be able to:

- Determine whether a regression experiment would be useful in a given instance
- Formulate a regression model
- Compute a regression equation
- Compute the covariance and the correlation coefficient of two random variables
- Compute confidence intervals for regression coefficients
- Compute a prediction interval for the dependent variable

## Learning Objectives (continued)

After studying this chapter, you should be able to:

- Test hypotheses about regression coefficients
- Conduct an ANOVA experiment using regression results
- Analyze residuals to check whether the assumptions of the regression model are valid
- Solve regression problems using spreadsheet templates
- Apply the covariance concept to linear composites of random variables
- Use the LINEST function to carry out a regression

## 10-1 Using Statistics

- Regression refers to the statistical technique of modeling the relationship between variables.
- In simple linear regression, we model the relationship between two variables.
- One of the variables, denoted by Y, is called the dependent variable; the other, denoted by X, is called the independent variable.
- The model we will use to depict the relationship between X and Y is a straight-line relationship.
- A graphical sketch of the pairs (X, Y) is called a scatter plot.

## 10-1 Using Statistics (continued)

[Figure: Scatterplot of Advertising Expenditures (X) and Sales (Y)]

This scatterplot locates pairs of observations of advertising expenditures on the x-axis and sales on the y-axis. We notice that:

- Larger (smaller) values of sales tend to be associated with larger (smaller) values of advertising.
- The scatter of points tends to be distributed around a positively sloped straight line.
- The pairs of values of advertising expenditures and sales are not located exactly on a straight line.
- The scatter plot reveals a more or less strong tendency rather than a precise linear relationship.
- The line represents the nature of the relationship on average.

## Examples of Other Scatterplots

[Figure: six scatterplots of Y against X illustrating various patterns of association]

## Model Building

The inexact nature of the relationship between advertising and sales suggests that a statistical model might be useful in analyzing the relationship. A statistical model separates the systematic component of a relationship from the random component:

    Data = Systematic component + Random errors

In ANOVA, the systematic component is the variation of means between samples or treatments (SSTR) and the random component is the unexplained variation (SSE). In regression, the systematic component is the overall linear relationship, and the random component is the variation around the line.

## 10-2 The Simple Linear Regression Model

The population simple linear regression model:

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

where $\beta_0 + \beta_1 X$ is the nonrandom (systematic) component and $\varepsilon$ is the random component, and:

- Y is the dependent variable, the variable we wish to explain or predict;
- X is the independent variable, also called the predictor variable;
- $\varepsilon$ is the error term, the only random component in the model, and thus the only source of randomness in Y;
- $\beta_0$ is the intercept of the systematic component of the regression relationship;
- $\beta_1$ is the slope of the systematic component.

The conditional mean of Y: $E[Y \mid X] = \beta_0 + \beta_1 X$.
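As a quick illustration of the model's structure, here is a minimal Python sketch (not part of the original slides) that simulates data from $Y = \beta_0 + \beta_1 X + \varepsilon$; the parameter values are illustrative assumptions borrowed from Example 10-1 later in the chapter:

```python
import random

# Simulate the population model Y = beta0 + beta1*X + eps.
# Parameter values are illustrative (borrowed from Example 10-1).
random.seed(42)
beta0, beta1, sigma = 274.85, 1.2553, 318.16

xs = [1000 + 180 * i for i in range(25)]    # X values are fixed, not random
eps = [random.gauss(0, sigma) for _ in xs]  # the only random component
ys = [beta0 + beta1 * x + e for x, e in zip(xs, eps)]

# E[Y | X] lies exactly on the line; each observed Y differs by its eps:
line = [beta0 + beta1 * x for x in xs]
print(all(abs((y - m) - e) < 1e-9 for y, m, e in zip(ys, line, eps)))   # → True
```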

## Picturing the Simple Linear Regression Model

[Figure: regression plot showing the line $E[Y] = \beta_0 + \beta_1 X$, with intercept $\beta_0$, slope $\beta_1$ (the rise over a run of 1), and an error $\varepsilon_i$ separating an observed point $Y_i$ from $E[Y_i]$]

The simple linear regression model gives an exact linear relationship between the expected or average value of Y, the dependent variable, and X, the independent or predictor variable:

$$E[Y_i] = \beta_0 + \beta_1 X_i$$

Actual observed values of Y differ from the expected value by an unexplained or random error:

$$Y_i = E[Y_i] + \varepsilon_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

## Assumptions of the Simple Linear Regression Model

- The relationship between X and Y is a straight-line relationship.
- The values of the independent variable X are assumed fixed (not random); the only randomness in the values of Y comes from the error term $\varepsilon_i$.
- The errors $\varepsilon_i$ are normally distributed with mean 0 and variance $\sigma^2$, and are uncorrelated (not related) in successive observations. That is: $\varepsilon \sim N(0, \sigma^2)$.

[Figure: identical normal distributions of errors, all centered on the regression line $E[Y] = \beta_0 + \beta_1 X$]

## 10-3 Estimation: The Method of Least Squares

Estimation of a simple linear regression relationship involves finding estimated or predicted values of the intercept and slope of the linear regression line.

The estimated regression equation:

$$Y = b_0 + b_1 X + e$$

where $b_0$ estimates the intercept of the population regression line, $\beta_0$; $b_1$ estimates the slope of the population regression line, $\beta_1$; and $e$ stands for the observed errors, the residuals from fitting the estimated regression line $b_0 + b_1 X$ to a set of n points.

The estimated regression line:

$$\hat{Y} = b_0 + b_1 X$$

where $\hat{Y}$ (Y-hat) is the value of Y lying on the fitted regression line for a given value of X.

## Fitting a Regression Line

[Figure: four panels showing the data, three errors from an arbitrary fitted line, and three errors from the least squares regression line; errors from the least squares regression line are minimized]

## Errors in Regression

[Figure: an observed data point $Y_i$, the fitted regression line $\hat{Y} = b_0 + b_1 X$, and the predicted value $\hat{Y}_i$ for $X_i$]

The error for the i-th observation is the difference between the observed value and the predicted value:

$$e_i = Y_i - \hat{Y}_i$$

## Least Squares Regression

The sum of squared errors in regression is:

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

The least squares regression line is the one that minimizes the SSE with respect to the estimates $b_0$ and $b_1$.

The normal equations:

$$\sum_{i=1}^{n} y_i = n b_0 + b_1 \sum_{i=1}^{n} x_i$$

$$\sum_{i=1}^{n} x_i y_i = b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2$$

[Figure: SSE as a surface over $(b_0, b_1)$; at the least squares estimates, SSE is minimized with respect to both $b_0$ and $b_1$]

## Sums of Squares, Cross Products, and Least Squares Estimators

Sums of squares and cross products:

$$SS_x = \sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$

$$SS_y = \sum (y - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n}$$

$$SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}$$

Least squares regression estimators:

$$b_1 = \frac{SS_{xy}}{SS_x} \qquad b_0 = \bar{y} - b_1 \bar{x}$$

## Example 10-1

| Miles | Dollars | Miles² | Miles × Dollars |
|------:|--------:|-------:|----------------:|
| 1211 | 1802 | 1466521 | 2182222 |
| 1345 | 2405 | 1809025 | 3234725 |
| 1422 | 2005 | 2022084 | 2851110 |
| 1687 | 2511 | 2845969 | 4236057 |
| 1849 | 2332 | 3418801 | 4311868 |
| 2026 | 2305 | 4104676 | 4669930 |
| 2133 | 3016 | 4549689 | 6433128 |
| 2253 | 3385 | 5076009 | 7626405 |
| 2400 | 3090 | 5760000 | 7416000 |
| 2468 | 3694 | 6091024 | 9116792 |
| 2699 | 3371 | 7284601 | 9098329 |
| 2806 | 3998 | 7873636 | 11218388 |
| 3082 | 3555 | 9498724 | 10956510 |
| 3209 | 4692 | 10297681 | 15056628 |
| 3466 | 4244 | 12013156 | 14709704 |
| 3643 | 5298 | 13271449 | 19300614 |
| 3852 | 4801 | 14837904 | 18493452 |
| 4033 | 5147 | 16265089 | 20757851 |
| 4267 | 5738 | 18207289 | 24484046 |
| 4498 | 6420 | 20232004 | 28877160 |
| 4533 | 6059 | 20548089 | 27465447 |
| 4804 | 6426 | 23078416 | 30870504 |
| 5090 | 6321 | 25908100 | 32173890 |
| 5233 | 7026 | 27384289 | 36767058 |
| 5439 | 6964 | 29582721 | 37877196 |
| **79,448** | **106,605** | **293,426,946** | **390,185,014** |

$$SS_x = \sum x^2 - \frac{(\sum x)^2}{n} = 293{,}426{,}946 - \frac{79{,}448^2}{25} = 40{,}947{,}557.84$$

$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 390{,}185{,}014 - \frac{(79{,}448)(106{,}605)}{25} = 51{,}402{,}852.4$$

$$b_1 = \frac{SS_{xy}}{SS_x} = \frac{51{,}402{,}852.4}{40{,}947{,}557.84} = 1.255333776 \approx 1.26$$

$$b_0 = \bar{y} - b_1 \bar{x} = \frac{106{,}605}{25} - (1.255333776)\left(\frac{79{,}448}{25}\right) = 274.85$$
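The hand computation above can be checked with a short Python sketch (not part of the original slides) that recomputes the least squares estimates from the column totals:

```python
# Recompute the Example 10-1 estimates from the column totals.
n = 25
sum_x, sum_y = 79_448, 106_605
sum_x2, sum_xy = 293_426_946, 390_185_014

ss_x = sum_x2 - sum_x ** 2 / n        # SS_x  = 40,947,557.84
ss_xy = sum_xy - sum_x * sum_y / n    # SS_xy = 51,402,852.4

b1 = ss_xy / ss_x                     # slope
b0 = sum_y / n - b1 * sum_x / n       # intercept
print(round(b1, 4), round(b0, 2))     # → 1.2553 274.85
```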

## Template (partial output) that can be used to carry out a Simple Regression

[Template output spans several slides; the spreadsheet figures are not reproduced here.]

Residual analysis: the plot shows the absence of a relationship between the residuals and the X-values (miles).

Note: the normal probability plot is approximately linear. This would indicate that the normality assumption for the errors has not been violated.

[Figure: plot of Y against X]

## 10-4 Error Variance and the Standard Errors of Regression Estimators

Degrees of freedom in regression:

$$df = n - 2$$

(n total observations, less one degree of freedom for each parameter estimated, $b_0$ and $b_1$.)

Squaring and summing all regression errors gives the SSE:

$$SSE = \sum (Y - \hat{Y})^2 = SS_y - \frac{(SS_{xy})^2}{SS_x} = SS_y - b_1 SS_{xy}$$

An unbiased estimator of $\sigma^2$, denoted by $s^2$:

$$MSE = \frac{SSE}{n - 2}$$

Example 10-1:

$$SSE = SS_y - b_1 SS_{xy} = 66{,}855{,}898 - (1.255333776)(51{,}402{,}852.4) = 2{,}328{,}161.2$$

$$MSE = \frac{SSE}{n - 2} = \frac{2{,}328{,}161.2}{23} = 101{,}224.4$$

$$s = \sqrt{MSE} = \sqrt{101{,}224.4} = 318.158$$
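A small Python check (not from the text) of the error-variance computations, using the Example 10-1 sums of squares:

```python
import math

# Error variance estimates for Example 10-1.
n = 25
ss_x, ss_xy, ss_y = 40_947_557.84, 51_402_852.4, 66_855_898

b1 = ss_xy / ss_x
sse = ss_y - b1 * ss_xy        # SSE = SS_y - b1 * SS_xy
mse = sse / (n - 2)            # unbiased estimator of sigma^2
s = math.sqrt(mse)             # standard error of estimate
print(round(sse), round(mse), round(s))   # → 2328161 101224 318
```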

## Standard Errors of Estimates in Regression

The standard error of $b_0$ (intercept), where $s = \sqrt{MSE}$:

$$s(b_0) = s \sqrt{\frac{\sum x^2}{n \, SS_x}}$$

The standard error of $b_1$ (slope):

$$s(b_1) = \frac{s}{\sqrt{SS_x}}$$

Example 10-1:

$$s(b_0) = 318.158 \sqrt{\frac{293{,}426{,}946}{(25)(40{,}947{,}557.84)}} = 170.338$$

$$s(b_1) = \frac{318.158}{\sqrt{40{,}947{,}557.84}} = 0.04972$$
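These standard errors can be reproduced with a short sketch (illustrative, not from the text):

```python
import math

# Standard errors of the regression estimates for Example 10-1.
n = 25
ss_x = 40_947_557.84
sum_x2 = 293_426_946
s = 318.158                                # sqrt(MSE), from the error variance slide

s_b0 = s * math.sqrt(sum_x2 / (n * ss_x)) # standard error of the intercept b0
s_b1 = s / math.sqrt(ss_x)                # standard error of the slope b1
print(round(s_b0, 2), round(s_b1, 5))     # → 170.34 0.04972
```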

## Confidence Intervals for the Regression Parameters

A $(1-\alpha)100\%$ confidence interval for $\beta_0$:

$$b_0 \pm t_{\alpha/2,\,(n-2)} \, s(b_0)$$

A $(1-\alpha)100\%$ confidence interval for $\beta_1$:

$$b_1 \pm t_{\alpha/2,\,(n-2)} \, s(b_1)$$

Example 10-1, 95% confidence intervals:

$$b_0 \pm t_{0.025,\,23} \, s(b_0) = 274.85 \pm (2.069)(170.338) = 274.85 \pm 352.43 = [-77.58,\ 627.28]$$

$$b_1 \pm t_{0.025,\,23} \, s(b_1) = 1.25533 \pm (2.069)(0.04972) = 1.25533 \pm 0.10287 = [1.15246,\ 1.35820]$$

Since the interval for the slope does not contain it, 0 is not a plausible value of the regression slope at the 95% level.
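The interval arithmetic can be sketched in Python; the critical value t(0.025, 23) = 2.069 is taken from the text's t table:

```python
# 95% confidence intervals for beta0 and beta1 (Example 10-1).
t_crit = 2.069                        # t(0.025, 23), from the t table
b0, s_b0 = 274.85, 170.338
b1, s_b1 = 1.25533, 0.04972

lo0, hi0 = b0 - t_crit * s_b0, b0 + t_crit * s_b0
lo1, hi1 = b1 - t_crit * s_b1, b1 + t_crit * s_b1
print(round(lo0, 2), round(hi0, 2))   # → -77.58 627.28
print(round(lo1, 5), round(hi1, 5))   # → 1.15246 1.3582
```

The slope interval excludes 0, matching the slide's conclusion.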

## Template (partial output) that can be used to obtain Confidence Intervals for β0 and β1

[Template output omitted.]

## 10-5 Correlation

The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables.

The population correlation, denoted by $\rho$, can take on any value from -1 to 1:

- $\rho = -1$ indicates a perfect negative linear relationship
- $-1 < \rho < 0$ indicates a negative linear relationship
- $\rho = 0$ indicates no linear relationship
- $0 < \rho < 1$ indicates a positive linear relationship
- $\rho = 1$ indicates a perfect positive linear relationship

The absolute value of $\rho$ indicates the strength or exactness of the relationship.

## Illustrations of Correlation

[Figure: six scatterplots illustrating $\rho = -1$, $\rho = 0$, $\rho = 1$, $\rho = -0.8$, $\rho = 0$, and $\rho = 0.8$]

## Covariance and Correlation

The covariance of two random variables X and Y:

$$Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$

where $\mu_X$ and $\mu_Y$ are the population means of X and Y respectively.

The population correlation coefficient:

$$\rho = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$$

The sample correlation coefficient:

$$r = \frac{SS_{xy}}{\sqrt{SS_x SS_y}}$$

Example 10-1:

$$r = \frac{51{,}402{,}852.4}{\sqrt{(40{,}947{,}557.84)(66{,}855{,}898)}} = \frac{51{,}402{,}852.4}{52{,}321{,}943.29} = 0.9824$$

Note: if $\rho < 0$ then $b_1 < 0$; if $\rho = 0$ then $b_1 = 0$; if $\rho > 0$ then $b_1 > 0$.
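A one-line check of the sample correlation (illustrative, using the Example 10-1 sums of squares):

```python
import math

# Sample correlation coefficient for Example 10-1.
ss_x, ss_y, ss_xy = 40_947_557.84, 66_855_898, 51_402_852.4

r = ss_xy / math.sqrt(ss_x * ss_y)
print(round(r, 4))   # → 0.9824
```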

## Hypothesis Tests for the Correlation Coefficient

$H_0: \rho = 0$ (no linear relationship)
$H_1: \rho \neq 0$ (some linear relationship)

Test statistic:

$$t_{(n-2)} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

Example 10-1:

$$t = \frac{0.9824}{\sqrt{\dfrac{1 - 0.9651}{25 - 2}}} = 25.25$$

Since $t_{0.005,\,23} = 2.807 < 25.25$, $H_0$ is rejected at the 1% level.
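The same test statistic, computed at full precision from the sums of squares (an illustrative check, not part of the slides):

```python
import math

# t test for H0: rho = 0 (Example 10-1), at full precision.
n = 25
ss_x, ss_y, ss_xy = 40_947_557.84, 66_855_898, 51_402_852.4
r = ss_xy / math.sqrt(ss_x * ss_y)

t = r / math.sqrt((1 - r ** 2) / (n - 2))
print(round(t, 2))   # → 25.25
```

This exceeds t(0.005, 23) = 2.807, so H0 is rejected at the 1% level.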

## 10-6 Hypothesis Tests about the Regression Relationship

[Figure: three panels showing a constant Y, unsystematic variation, and a nonlinear relationship]

A hypothesis test for the existence of a linear relationship between X and Y:

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

Test statistic for the existence of a linear relationship between X and Y:

$$t_{(n-2)} = \frac{b_1}{s(b_1)}$$

where $b_1$ is the least squares estimate of the regression slope and $s(b_1)$ is the standard error of $b_1$. When the null hypothesis is true, the statistic has a t distribution with $n - 2$ degrees of freedom.

## Hypothesis Tests for the Regression Slope

Example 10-1:

$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$

$$t = \frac{b_1}{s(b_1)} = \frac{1.25533}{0.04972} = 25.25$$

Since $t_{0.005,\,23} = 2.807 < 25.25$, $H_0$ is rejected at the 1% level, and we may conclude that there is a relationship between charges and miles traveled.

Example 10-4:

$H_0: \beta_1 = 1$ vs. $H_1: \beta_1 \neq 1$

$$t = \frac{b_1 - 1}{s(b_1)} = \frac{1.24 - 1}{0.21} = 1.14$$

Since $t_{0.05,\,58} = 1.671 > 1.14$, $H_0$ is not rejected at the 10% level. We may not conclude that the beta coefficient is different from 1.
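Both slope tests can be sketched in a few lines (critical values are from the text's t table):

```python
# Slope hypothesis tests from the two examples.
# Example 10-1: H0: beta1 = 0, n = 25.
b1, s_b1 = 1.25533, 0.04972
t1 = (b1 - 0) / s_b1
print(round(t1, 2))   # → 25.25  (> t(0.005, 23) = 2.807: reject H0)

# Example 10-4: H0: beta1 = 1, from a different data set with s(b1) = 0.21.
t2 = (1.24 - 1) / 0.21
print(round(t2, 2))   # → 1.14   (< t(0.05, 58) = 1.671: do not reject H0)
```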

## 10-7 How Good is the Regression?

The coefficient of determination, $r^2$, is a descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data.

For each point, the total deviation splits into an unexplained (error) part and an explained (regression) part:

$$(y - \bar{y}) = (y - \hat{y}) + (\hat{y} - \bar{y})$$

Total deviation = Unexplained deviation (error) + Explained deviation (regression)

Squaring and summing over all points:

$$\sum (y - \bar{y})^2 = \sum (y - \hat{y})^2 + \sum (\hat{y} - \bar{y})^2$$

$$SST = SSE + SSR$$

$$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

$r^2$ is the percentage of total variation explained by the regression.

## The Coefficient of Determination

[Figure: three scatterplots illustrating $r^2 = 0$ (SST is all SSE), $r^2 = 0.50$ (SSE and SSR comparable), and $r^2 = 0.90$ (SSR dominates SST)]

Example 10-1:

$$r^2 = \frac{SSR}{SST} = \frac{64{,}527{,}736.8}{66{,}855{,}898} = 0.96518$$

[Figure: scatterplot of Dollars against Miles with the fitted regression line]
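A quick numeric check of the coefficient of determination (illustrative):

```python
# Coefficient of determination for Example 10-1.
ssr, sst = 64_527_736.8, 66_855_898
r2 = ssr / sst
print(round(r2, 5))   # → 0.96518
```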

## 10-8 Analysis-of-Variance Table and an F Test of the Regression Model

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F Ratio |
|---|---|---|---|---|
| Regression | SSR | 1 | MSR | MSR/MSE |
| Error | SSE | n - 2 | MSE | |
| Total | SST | n - 1 | MST | |

Example 10-1:

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F Ratio | p Value |
|---|---|---|---|---|---|
| Regression | 64,527,736.8 | 1 | 64,527,736.8 | 637.47 | 0.000 |
| Error | 2,328,161.2 | 23 | 101,224.4 | | |
| Total | 66,855,898.0 | 24 | | | |
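The ANOVA quantities above can be reproduced with a short sketch (not from the text):

```python
# ANOVA quantities for Example 10-1.
n = 25
ssr, sse = 64_527_736.8, 2_328_161.2
sst = ssr + sse                    # 66,855,898.0

msr = ssr / 1                      # regression degrees of freedom: 1
mse = sse / (n - 2)                # error degrees of freedom: n - 2 = 23
f = msr / mse
print(round(sst, 1), round(mse, 1), round(f, 2))   # → 66855898.0 101224.4 637.47
```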

## Template (partial output) that displays the Analysis of Variance and an F Test of the Regression Model

[Template output omitted.]

## 10-9 Residual Analysis and Checking for Model Inadequacies

[Figure: four residual plots illustrating the patterns below]

- Homoscedasticity: residuals appear completely random; no indication of model inadequacy.
- Heteroscedasticity: the variance of the residuals increases when x changes.
- Residuals exhibit a linear trend with time.
- A curved pattern in the residuals results from an underlying nonlinear relationship.

## Normal Probability Plots of the Residuals

[Figures: four normal probability plots illustrating residual distributions that are flatter than normal, more peaked than normal, positively skewed, and negatively skewed]

## 10-10 Use of the Regression Model for Prediction

- Point prediction: a single-valued estimate of Y for a given value of X, obtained by inserting the value of X in the estimated regression equation.
- Prediction interval for a value of Y given a value of X, accounting for:
  - variation in the regression line estimate
  - variation of points around the regression line
- Prediction interval for the average value of Y given a value of X, accounting for:
  - variation in the regression line estimate

## Errors in Predicting E[Y|X]

[Figure: two panels showing (1) uncertainty about the slope of the regression line, bounded by upper and lower limits on the slope, and (2) uncertainty about the intercept, bounded by upper and lower limits on the intercept]

## Prediction Interval for E[Y|X]

[Figure: regression line with its prediction band for E[Y|X]]

- The prediction band for E[Y|X] is narrowest at the mean value of X.
- The prediction band widens as the distance from the mean of X increases.
- Predictions become very unreliable when we extrapolate beyond the range of the sample itself.

## Additional Error in Predicting an Individual Value of Y

[Figure: regression line showing (3) variation around the regression line, and the prediction band for an individual Y, which is wider than the prediction band for E[Y|X]]

## Prediction Interval for a Value of Y

A $(1-\alpha)100\%$ prediction interval for Y:

$$\hat{y} \pm t_{\alpha/2} \, s \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{SS_x}}$$

Example 10-1 (X = 4,000):

$$\{274.85 + (1.2553)(4{,}000)\} \pm 2.069 \times 318.16 \sqrt{1 + \frac{1}{25} + \frac{(4{,}000 - 3{,}177.92)^2}{40{,}947{,}557.84}}$$

$$= 5{,}296.05 \pm 676.62 = [4{,}619.43,\ 5{,}972.67]$$

## Prediction Interval for the Average Value of Y

A $(1-\alpha)100\%$ prediction interval for $E[Y \mid X]$:

$$\hat{y} \pm t_{\alpha/2} \, s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{SS_x}}$$

Example 10-1 (X = 4,000):

$$\{274.85 + (1.2553)(4{,}000)\} \pm 2.069 \times 318.16 \sqrt{\frac{1}{25} + \frac{(4{,}000 - 3{,}177.92)^2}{40{,}947{,}557.84}}$$

$$= 5{,}296.05 \pm 156.48 = [5{,}139.57,\ 5{,}452.53]$$
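Both interval computations can be sketched together in Python (an illustrative check using the slides' rounded inputs; t(0.025, 23) = 2.069 is from the t table):

```python
import math

# 95% prediction intervals at X = 4,000 (Example 10-1).
n = 25
x_bar, ss_x = 3_177.92, 40_947_557.84
b0, b1 = 274.85, 1.2553
s, t_crit = 318.16, 2.069

x = 4_000
y_hat = b0 + b1 * x
lev = 1 / n + (x - x_bar) ** 2 / ss_x      # "leverage" of this X value

half_y = t_crit * s * math.sqrt(1 + lev)   # half-width for an individual Y
half_m = t_crit * s * math.sqrt(lev)       # half-width for E[Y | X]
print(round(y_hat, 2), round(half_y, 2), round(half_m, 2))
# → 5296.05 676.62 156.48
```

This gives [4,619.43, 5,972.67] for an individual value of Y and the much narrower [5,139.57, 5,452.53] for the mean, matching the slides.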

## Template Output with Prediction Intervals

[Template output omitted.]

## 10-11 The Solver Method for Regression

The Solver macro available in Excel can also be used to conduct a simple linear regression. See the text for instructions.

## 10-12 Linear Composites of Dependent Random Variables

The case of independent random variables: for independent random variables $X_1, X_2, \ldots, X_n$, the expected value of the sum is given by

$$E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n)$$

and the variance of the sum is given by

$$V(X_1 + X_2 + \cdots + X_n) = V(X_1) + V(X_2) + \cdots + V(X_n)$$

## 10-12 Linear Composites of Dependent Random Variables (continued)

The case of independent random variables with weights: for independent random variables $X_1, X_2, \ldots, X_n$ with respective weights $a_1, a_2, \ldots, a_n$, the expected value of the sum is given by

$$E(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n) = a_1 E(X_1) + a_2 E(X_2) + \cdots + a_n E(X_n)$$

and the variance of the sum is given by

$$V(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n) = a_1^2 V(X_1) + a_2^2 V(X_2) + \cdots + a_n^2 V(X_n)$$

## Covariance of Two Random Variables X1 and X2

The covariance between two random variables $X_1$ and $X_2$ is given by:

$$Cov(X_1, X_2) = E\{[X_1 - E(X_1)][X_2 - E(X_2)]\}$$

A simpler expression for the covariance is:

$$Cov(X_1, X_2) = \rho \, SD(X_1) \, SD(X_2)$$

where $\rho$ is the correlation between $X_1$ and $X_2$.

## 10-12 Linear Composites of Dependent Random Variables (continued)

The case of dependent random variables with weights: for dependent random variables $X_1, X_2, \ldots, X_n$ with respective weights $a_1, a_2, \ldots, a_n$, the variance of the sum is given by

$$V(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n) = a_1^2 V(X_1) + a_2^2 V(X_2) + \cdots + a_n^2 V(X_n) + 2 a_1 a_2 \, Cov(X_1, X_2) + \cdots + 2 a_{n-1} a_n \, Cov(X_{n-1}, X_n)$$
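A numeric sketch of the two-variable case of this formula. The weights, variances, and correlation below are illustrative assumptions, not values from the text:

```python
import math

# Variance of a weighted sum of two dependent random variables:
# V(a1*X1 + a2*X2) = a1^2 V(X1) + a2^2 V(X2) + 2*a1*a2*Cov(X1, X2).
# All numeric values here are illustrative assumptions.
a1, a2 = 0.6, 0.4
v1, v2 = 9.0, 16.0
rho = 0.5
cov = rho * math.sqrt(v1) * math.sqrt(v2)   # Cov = rho * SD(X1) * SD(X2)

v_sum = a1 ** 2 * v1 + a2 ** 2 * v2 + 2 * a1 * a2 * cov
print(cov, round(v_sum, 2))   # → 6.0 8.68
```

With a positive correlation, the covariance term (2 × 0.6 × 0.4 × 6 = 2.88) adds to the variance that the independent-variables formula alone would give.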
