# Linear Regression


```
Chapter 12
Simple Linear Regression
The Simple Linear Regression Model
The Least Squares Point Estimates
Testing Significance of Slope and y-Intercept (T Tests)
The Coefficient of Determination and Correlation
An F Test for the Simple Linear Regression Model
Multiple Regression ***

1
In this chapter we will explore the mathematical relationship between
two variables. Specifically, we will seek to determine the linear
relationship between the two variables.
One of the variables will be considered the independent variable,
which we will refer to as x.
The other variable we will term the dependent variable, y (because its
value depends upon the value of x).

2
Example 12.1
Suppose we wanted to explore the relationship between a person’s height
and their shoe size. To investigate, we asked ten individuals their height
and corresponding shoe size. We believe that a person’s shoe size depends
upon their height. Thus we will refer to height as our independent or
x variable, and to shoe size as our dependent or y variable.

3
The following data were collected:
             Height, x (inches)   Shoe size, y (men's sizes)
Person 1     69                   9.5
Person 2     67                   8.5
Person 3     71                   11.5
Person 4     65                   10.5
Person 5     72                   11
Person 6     68                   7.5
Person 7     74                   12
Person 8     65                   7
Person 9     66                   7.5
Person 10    72                   13

4
Example 12.1 Graph

Height vs. Shoe Size
[Figure: scatter plot of Shoe Size, y (vertical axis, 0 to 14) against
Height, x (horizontal axis, 64 to 76)]

5
These data points appear to be grouped around an imaginary line.
Thus these two variables (height and shoe size) are said to have a
linear relationship.
The equation that describes how y is related to x and an error term is
called the regression model.
The simple linear regression equation is:
    \mu_{y|x} = \beta_0 + \beta_1 x
The graph of the regression equation is a straight line.
\beta_0 is the y-intercept of the regression line.
\beta_1 is the slope of the regression line.
\mu_{y|x} is the expected value of y for a given x value.

6
The estimated simple linear regression equation is:

    \hat{y} = b_0 + b_1 x

The graph is called the estimated regression line.
b_0 is the y-intercept of the line.
b_1 is the slope of the line.
\hat{y} is the estimated value of y for a given x value.

7
In the graph for Example 12.1 (slide 5), the estimated regression line
would be the imaginary line that the data appear to be grouped around
(the dashed line).
From the previous slide we know that the equation for this line is
    \hat{y} = b_0 + b_1 x
where x represents height and \hat{y} is the estimated value of shoe size
(y) when the height equals x.
Regression analysis seeks to determine the values of b_0 and b_1 in this
regression equation that minimize the overall differences between
the actual values of y and the values \hat{y} given by the regression equation.

8
Least Squares Method
Regression employs the least squares method to
determine b_0 and b_1.
Least Squares Criterion
    \min \sum_i (y_i - \hat{y}_i)^2

where:
    y_i = observed value of the dependent variable
          for the ith observation
    \hat{y}_i = estimated value of the dependent variable
          for the ith observation

9
Least Squares Method
The following estimates minimize the least squares criterion:

    b_1 = SS_{xy} / SS_{xx}
    b_0 = \bar{y} - b_1 \bar{x}

where
    SS_{xy} = \sum x_i y_i - (\sum x_i)(\sum y_i) / n
    SS_{xx} = \sum x_i^2 - (\sum x_i)^2 / n
    x_i = value of independent variable for ith observation
    y_i = value of dependent variable for ith observation
    \bar{x} = mean value for independent variable
    \bar{y} = mean value for dependent variable
    n = total number of observations

10
Example 12.1 Revisited (Regression Equation)
         Height, x    Shoe size, y    x^2      xy
         69           9.5             4761     655.5
         67           8.5             4489     569.5
         71           11.5            5041     816.5
         65           10.5            4225     682.5
         72           11              5184     792
         68           7.5             4624     510
         74           12              5476     888
         65           7               4225     455
         66           7.5             4356     495
         72           13              5184     936
Total    689          98              47565    6800

\bar{y} = \sum y_i / n = 98 / 10 = 9.8
\bar{x} = \sum x_i / n = 689 / 10 = 68.9
11
Example 12.1
b_1 = [6800 - (689)(98)/10] / [47565 - (689)^2/10] = 47.8 / 92.9 = 0.5145

b_0 = 9.8 - 0.5145(68.9) = -25.65
Thus the least squares regression equation expressing the linear relationship
between height and shoe size for our data is:

    \hat{y} = -25.65 + 0.5145x


12
Example 12.1
Thus if a person is 5 feet tall (i.e., x = 60 inches), then we would estimate
their shoe size to be:

    \hat{y} = -25.65 + 0.5145(60) = 5.22

or size 5.5 when rounded up to the next half size.

13
Example 12.2
Suppose we wanted to know the linear relationship between the
average hourly temperature and the weekly fuel consumption. To
explore this relationship, data were collected over an 8-week period.
These data are shown below:
Week   Temperature, x (deg F)   Consumption, y (MMcf)
1      28.0                     12.4
2      28.0                     11.7
3      32.5                     12.4
4      39.0                     10.8
5      45.9                     9.4
6      57.8                     9.5
7      58.1                     8.0
8      62.5                     7.5

14
Example 12.2

Estimate the fuel consumption for a week when the average daily
temperature is 40 °F.

15
Example 12.2

         y       x        x^2         xy
         12.4    28.0     784.00      347.20
         11.7    28.0     784.00      327.60
         12.4    32.5     1056.25     403.00
         10.8    39.0     1521.00     421.20
         9.4     45.9     2106.81     431.46
         9.5     57.8     3340.84     549.10
         8.0     58.1     3375.61     464.80
         7.5     62.5     3906.25     468.75
Total    81.7    351.8    16874.76    3413.11

16
Example 12.2
SS_{xy} = \sum x_i y_i - (\sum x_i)(\sum y_i)/n
        = 3413.11 - (351.8)(81.7)/8 = -179.6475
SS_{xx} = \sum x_i^2 - (\sum x_i)^2/n
        = 16874.76 - (351.8)^2/8 = 1404.355
b_1 = SS_{xy} / SS_{xx} = -179.6475 / 1404.355 = -0.1279
\bar{y} = \sum y_i / n = 81.7 / 8 = 10.2125
\bar{x} = \sum x_i / n = 351.8 / 8 = 43.98
b_0 = \bar{y} - b_1 \bar{x}
    = 10.2125 - (-0.1279)(43.98)
    = 15.84
17
Thus the estimated regression equation is

    \hat{y} = 15.84 - 0.1279x

When x = 40, an estimate for y is

    \hat{y} = 15.84 - 0.1279(40) = 10.72

18
Example 12.3
Reed Auto periodically has a special week-long sale. As part of the
advertising campaign, Reed runs one or more television commercials
during the weekend preceding the sale. Data from a sample of 5 previous
sales are shown below:

Number of TV Ads, x    Number of Cars Sold, y
1                      14
3                      24
2                      18
1                      17
3                      27

19
b_1 = [220 - (10)(100)/5] / [24 - (10)^2/5] = 20 / 4 = 5

b_0 = 20 - 5(2) = 10

Thus the estimated regression equation is

    \hat{y} = 10 + 5x


20
Scatter Diagram for Example 12.3
[Figure: scatter diagram of Cars Sold (vertical axis, 0 to 30) against x
(horizontal axis, 0 to 4), with the line y = 10 + 5x]

21
MS Excel can be used to determine the least squares
regression equation.
The least squares regression method provides the equation of
the line that best fits the linear relationship between
the dependent and independent variables.
Sometimes, however, even the “best” line is not
reliable enough for estimation.
If you are estimating inventory, capacity, or other quantities behind
major business decisions, it is very costly to be inaccurate.
MS Excel gives various measures for determining whether
a regression line is “good” or reliable.

22
Measures for Evaluating a Regression Line
MS Excel provides the following measures for determining how
reliable a line is for estimation:
  The coefficient of determination, r^2
  The correlation coefficient, r
  Hypothesis tests for a significant relationship:
    t test for a significant regression relationship
    F test for the simple linear regression model

23
The Coefficient of Determination, r^2
The coefficient of determination gives a value between 0 and 1.
r^2 is the proportion of the total variation in y explained by the
simple linear regression model.
The closer this value is to 1, the more reliable the regression line is for
estimating y.

24
The Correlation Coefficient, r
The correlation coefficient gives a value between -1 and +1.
    r = (sign of b_1) \sqrt{r^2}
where:
    r^2 = the coefficient of determination
    b_1 = the slope of the estimated regression equation
          \hat{y} = b_0 + b_1 x
The closer that r is to -1 the stronger the negative linear
relationship is between your independent and dependent variables.
The closer that r is to +1 the stronger the positive linear
relationship is between your independent and dependent variables.
The closer that r is to 0, the weaker the linear relationship is
between your independent and dependent variables.

PLEASE NOTE: EXCEL DOES NOT PROVIDE NEGATIVE VALUES FOR r
(ITS "MULTIPLE R" IS NEVER NEGATIVE).
THEREFORE YOU SHOULD INTERPRET r AND MAKE IT
NEGATIVE IF b_1 IS NEGATIVE.

25
Testing for Significance: t Test
The t test for regression tests the following hypotheses:
H_0: \beta_1 = 0 (There is no linear relationship between the independent
variable and your dependent variable.)
H_a: \beta_1 \neq 0 (There is a linear relationship between the independent
variable and your dependent variable.)

Rejection Rule

Reject H_0 if the p-value < \alpha.

26
Example 12.1 Revisited
The following is the MS Excel regression output for Example 12.1:
SUMMARY OUTPUT

Regression Statistics
Multiple R        0.783155
R Square          0.613332
Standard Error    1.392183
Observations      10

ANOVA
             df    SS          MS          F           Significance F
Regression   1     24.59462    24.59462    12.68959    0.007377
Residual     8     15.50538    1.938173
Total        9     40.1

             Coefficient   Standard Error   t Stat      P-value     Lower 95%   Upper 95%   Lower 95%   Upper 95%
Intercept    -25.6512      9.96167          -2.57499    0.032871    -48.6229    -2.67957    -48.6229    -2.67957
Height, x    0.514532      0.14444          3.562245    0.007377    0.181452    0.847612    0.181452    0.847612

27
Discussion of Excel Regression Results for
Example 12.1
According to the output on the previous slide, the data in this example
reveal:
r^2 = 0.613332, which implies that 61.33% of the variation in shoe size
can be explained by the linear relationship between a
person’s height and their shoe size (i.e., this line is not very reliable).
r = 0.783155, which implies that there is a fairly strong positive linear
relationship between a person’s height and their shoe size.
b_0 = -25.6512 and b_1 = 0.514532, therefore our estimated regression
equation is \hat{y} = -25.6512 + 0.514532x.
Since the p-value for the slope, 0.007377, is less than \alpha = 0.05, we can
reject H_0 and conclude that there is a linear relationship between the
height of a person and their shoe size.

28
Example 12.2 Revisited
The following is the MS Excel regression output for Example 12.2:
SUMMARY OUTPUT

Regression Statistics
Multiple R        0.948413871
R Square          0.899488871
Standard Error    0.654208646
Observations      8

ANOVA
             df    SS             MS          F           Significance F
Regression   1     22.98081629    22.98082    53.69488    0.00033
Residual     6     2.567933713    0.427989
Total        7     25.54875

                  Coefficient    Standard Error   t Stat      P-value     Lower 95%   Upper 95%   Lower 95%   Upper 95%
Intercept         15.83785741    0.801773385      19.75353    1.09E-06    13.87599    17.79973    13.87599    17.79973
Temperature, x    -0.127922      0.01745733       -7.32768    0.00033     -0.17064    -0.08521    -0.17064    -0.08521

29
Discussion of Excel Regression Results for
Example 12.2
According to the output on the previous slide, the data in this example
reveal:
r^2 = 0.8995, which implies that 89.95% of the variation in weekly fuel
consumption can be explained by the linear relationship between the
average daily temperature and the weekly fuel consumption (i.e.,
based upon the available data, this line appears to be reliable).
r = -0.948413871 (Excel reports 0.948413871 as Multiple R; the sign is
negative because b_1 is negative), which implies that there is a very
strong negative linear relationship between the average daily temperature
and the weekly fuel consumption.
b_0 = 15.8378 and b_1 = -0.127922, therefore our estimated regression
equation is \hat{y} = 15.8378 - 0.127922x.
Since the p-value for the slope, 0.00033, is less than \alpha = 0.05, we can
reject H_0 and conclude that there is a linear relationship between
average daily temperature and the weekly fuel consumption.
30
The End

31

```
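The hand calculations in Examples 12.1 to 12.3 can be cross-checked with a short script. The following is a minimal Python sketch, not part of the original slides: the function name `fit_simple_linear_regression` and the keys of the returned dictionary are hypothetical, chosen only for illustration. It applies the slide formulas b1 = SS_xy / SS_xx and b0 = y-bar - b1 * x-bar. As an added assumption, it obtains r^2 from the standard identity r^2 = SS_xy^2 / (SS_xx * SS_yy), where SS_yy is the total variation in y (the "Total" sum of squares in the Excel ANOVA table); the slides themselves read r^2 from the Excel output instead.

```python
import math


def fit_simple_linear_regression(x, y):
    """Return least squares estimates for y-hat = b0 + b1*x.

    Hypothetical helper (not from the slides); it follows the
    SS_xy / SS_xx formulas on the Least Squares Method slides.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length and at least 2 points")

    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    # Assumption: SS_yy (total variation in y) is not defined on the slides;
    # it is used here only to obtain r^2.
    ss_yy = sum(yi * yi for yi in y) - sum_y ** 2 / n
    if ss_xx == 0 or ss_yy == 0:
        raise ValueError("x and y must each take at least two distinct values")

    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * (sum_x / n)
    r_squared = ss_xy ** 2 / (ss_xx * ss_yy)
    # r takes the sign of the slope b1, as the slides require.
    r = math.copysign(math.sqrt(r_squared), b1)
    return {"b0": b0, "b1": b1, "r_squared": r_squared, "r": r}


# Example 12.1 data: height in inches (x) and shoe size (y).
heights = [69, 67, 71, 65, 72, 68, 74, 65, 66, 72]
shoe_sizes = [9.5, 8.5, 11.5, 10.5, 11, 7.5, 12, 7, 7.5, 13]

fit = fit_simple_linear_regression(heights, shoe_sizes)
print(f"b0 = {fit['b0']:.4f}, b1 = {fit['b1']:.4f}")
print(f"r^2 = {fit['r_squared']:.4f}, r = {fit['r']:.4f}")

# Estimated shoe size for a height of 60 inches, using the fitted line.
print(f"estimate at x = 60: {fit['b0'] + fit['b1'] * 60:.2f}")
```

Run on the Example 12.1 data, the sketch should agree with the Excel output up to rounding (b0 near -25.6512, b1 near 0.5145, r^2 near 0.6133). Passing the Example 12.2 temperatures and consumption values instead yields a negative r, which is the sign adjustment the slides ask for when b1 is negative.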
Stats:
 views: 14 posted: 11/23/2011 language: English pages: 31