# Model Building – Quadratic Regression by zwt73245


```
Chapter 18

Model Building

1
Ch 18 Introduction
• Regression analysis is one of the most
commonly used techniques in statistics.
• It is considered powerful for several
reasons:
– It can cover a variety of mathematical models
• linear relationships.
• nonlinear relationships.
• nominal independent variables.
– It provides efficient methods for model
building.
2
18.1 Polynomial Models
• There are models where the independent
variables (xi) may appear as functions of a
smaller number of predictor variables.
• Polynomial models are one such example.

3
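Polynomial Models with One Predictor Variable

General model: y = b0 + b1x1 + b2x2 + … + bpxp + e

With a single predictor x and xj = x^j (j = 1, …, p),
the polynomial model of order p is:
y = b0 + b1x + b2x^2 + … + bpx^p + e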

4
First Order Model…
• When p = 1, we have our simple linear
regression model: y = b0 + b1x + e
• That is, we believe there is a straight-line
relationship between the dependent and
independent variables over the range of the
values of x.

5
Second Order Model…
• When p = 2, the polynomial model is a parabola:
y = b0 + b1x + b2x^2 + e

6
Third Order Model…
• When p = 3, our third order model looks like:
y = b0 + b1x + b2x^2 + b3x^3 + e

7
Polynomial Models with Two Predictor Variables
• First order model:
y = b0 + b1x1 + b2x2 + e
[Figure: response surface of y over the axes x1 and x2]
8
Polynomial Models with Two Predictor Variables
• First order model:
y = b0 + b1x1 + b2x2 + e
The effect of one predictor variable on y is
independent of the effect of the other predictor
variable on y.
[Figure: y vs. x1 for X2 = 1, 2, 3; parallel lines]
• First order model, two predictors, and interaction:
y = b0 + b1x1 + b2x2 + b3x1x2 + e
The two variables interact to affect the value of y.
[Figure: y vs. x1 for X2 = 1, 2, 3; lines with different slopes]
9
Polynomial Models with Two Predictor Variables
• Second order model:
y = b0 + b1x1 + b2x2 + b3x1^2 + b4x2^2 + e
Holding x2 fixed gives parabolas in x1 that differ only in the intercept:
X2 = 3: y = [b0 + b2(3) + b4(3^2)] + b1x1 + b3x1^2 + e
X2 = 2: y = [b0 + b2(2) + b4(2^2)] + b1x1 + b3x1^2 + e
X2 = 1: y = [b0 + b2(1) + b4(1^2)] + b1x1 + b3x1^2 + e
[Figure: y vs. x1 for X2 = 1, 2, 3]
• Second order model with interaction:
y = b0 + b1x1 + b2x2 + b3x1^2 + b4x2^2 + b5x1x2 + e
[Figure: y vs. x1 for X2 = 1, 2, 3]
10
Selecting a Model
• Several models have been introduced.
• How do we select the right model?
• Selecting a model:
– Use your knowledge of the problem (variables
involved and the nature of the relationship
between them) to select a model.
– Test the model using statistical techniques.

11
Selecting a Model
• In this chapter, we will concentrate on the
quadratic model.
• For any problem we solve, we will first check
the quadratic model. If it does not work,
we will try a linear model.

12
Selecting a Model
• As a general rule, if the p-value of the
squared term is < 0.05, we will keep the
quadratic model; otherwise, we will try a
linear model.

13
Selecting a Model; Example
• Example 18.1 The location of a new
restaurant
– A fast food restaurant chain tries to identify
new locations that are likely to be profitable.
– The primary market for such restaurants is
middle-income adults and their children
(between the ages of 5 and 12).
– Which regression model should be proposed
to predict the profitability of new locations?
14
Selecting a Model; Example
• Solution
– The dependent variable will be Gross Revenue.

Quadratic relationships between Revenue and each
predictor variable should be observed. Why?
– Members of middle-class families are more likely
to visit a fast food restaurant than members of
poor or wealthy families.
– Families with very young or older kids will not
visit the restaurant as frequently as families
with mid-range ages of kids.
[Figure: Revenue vs. Income (Low, Middle, High) and Revenue vs. Age (Low, Middle, High)]
15
Selecting a Model; Example
• Solution
– The proposed quadratic regression model is

SALES = b0 + b1INCOME + b2AGE
+ b3INCOME^2 + b4AGE^2 + b5(INCOME)(AGE) + e
Include the interaction term when in doubt,
and test its relevance later.

SALES = annual gross sales
INCOME = median annual household income in the
neighborhood
AGE = mean age of children in the neighborhood
16
Selecting a Model; Example
• Example 18.2
– To verify the validity of the model proposed in
example 18.1 for recommending the location of a
new fast food restaurant, 25 areas with fast food
restaurants were randomly selected.
– Each area included one of the firm's restaurants
and three competing restaurants.
– Data collected included (Xm18-02.xls):
• Previous year's annual gross sales.
• Mean annual household income.
• Mean age of children.

17
Selecting a Model; Example
Xm18-02

Collected data:
Revenue   Income   Age
1128      23.5     10.5
1005      17.6      7.2
1212      26.3      7.6
.         .        .
.         .        .

Added data:
Income sq   Age sq   (Income)(Age)
552.25      110.25   246.75
309.76       51.84   126.72
691.69       57.76   199.88
.           .        .
.           .        .

18
The Quadratic Relationships –
Graphical Illustration
[Figure: scatter plots of REVENUE vs. AGE and REVENUE vs. INCOME]
19
Example 18.2…
• You can take the original data collected
(revenues, household income, and age) and
plot y vs. x1 and y vs. x2 to get a feel for the
data; trend lines were added for clarity…

20
Regression Analysis: Revenue versus Income, Age, ...

The regression equation is
Revenue = - 1134 + 173 Income + 23.6 Age - 3.73 Income s - 3.87 Age sq
          + 1.97 (Income)

Predictor       Coef  SE Coef      T      P
Constant     -1134.0    320.0  -3.54  0.002
Income        173.20    28.20   6.14  0.000
Age            23.55    32.23   0.73  0.474
Income s     -3.7261   0.5422  -6.87  0.000
Age sq        -3.869    1.179  -3.28  0.004
(Income)      1.9673   0.9441   2.08  0.051

S = 44.6953   R-Sq = 90.7%   R-Sq(adj) = 88.2%

Analysis of Variance

Source          DF      SS     MS      F      P
Regression       5  368140  73628  36.86  0.000
Residual Error  19   37956   1998
Total           24  406096

(Minitab truncates long names: "Income s" is Income sq
and "(Income)" is (Income)(Age).)

This is a valid model that can be used to make predictions.
But…
21
Example 18.2…   INTERPRET

• Checking the regression tool's output…

The model fits the data well
and it is valid…

Uh oh:
multicollinearity.

22
Model Validation

The model can be used to make predictions...
…but multicollinearity (correlation between
two or more independent variables) is a problem!
The t-tests may be distorted; therefore,
do not interpret the coefficients or test them.

23
Model Building
• The problems you will be asked to do
involve only two variables, the
independent and the dependent.
• It is the independent variable that has
a squared term.
• To make things easier, we will start by
doing a fitted line plot to see if the
quadratic or linear model looks better.
• Then we’ll do the math to confirm this.
24
Model Building
• Problem 18.3 page 732
• Independent variable (x): shelf space
• Dependent variable (y): number of boxes
sold

25
Model Building
• Fitted Line Plot (Quadratic)

[Figure: Fitted Line Plot of Sales vs. Space, quadratic fit]
Sales = - 109.0 + 33.09 Space - 0.6655 Space**2
S = 41.1474   R-Sq = 40.7%
26
Model Building
• Fitted Line Plot (linear)

[Figure: Fitted Line Plot of Sales vs. Space, linear fit]
Sales = 239.7 + 1.144 Space
S = 51.5360   R-Sq = 2.7%
27
• Minitab residual plots for the quadratic model:
• The errors appear normal and independent.
[Figure: Histogram of the Residuals (response is Sales); Residuals Versus the Order of the Data (response is Sales)]

28
Regression Analysis: Sales versus Space, Space sq

The regression equation is
Sales = - 109 + 33.1 Space - 0.666 Space sq

Predictor      Coef  SE Coef      T      P
Constant    -108.99    97.24  -1.12  0.274
Space        33.089    8.590   3.85  0.001
Space sq    -0.6655   0.1774  -3.75  0.001

S = 41.1474   R-Sq = 40.7%   R-Sq(adj) = 35.3%

Analysis of Variance

Source          DF     SS     MS     F      P
Regression       2  25540  12770  7.54  0.003
Residual Error  22  37248   1693
Total           24  62788

29
• The quadratic model is valid because the
p-value of the squared term (0.001) is less
than 0.05.

30
• Testing the complete model
• H0: β1 = β2=0
• H1: At least one β is not equal to 0 (Y has
either a linear or quadratic relationship to X)
• Decision rule: accept H1 if p-value < α
• From Minitab: F = 7.54, p-value = 0.003
• There is overwhelming evidence that at least
one β is not equal to zero, thus there is a
relationship between sales and shelf space,
either linear or quadratic.

31
• Testing the quadratic portion (is there
significant curvature?)
• H0: β2 = 0          H1: β2 ≠ 0
• Accept H1 if p-value < α
• t = -3.75, p-value = 0.001
• There is overwhelming evidence to
conclude that sales has a quadratic
relationship with shelf space.
32
• R-sq = 40.7%. This is a rather weak
relationship: 40.7% of the variation in sales is
explained by its relationship with shelf space,
and the remaining 59.3% is unexplained.
• You could then compute prediction intervals (P.I.)
and confidence intervals (C.I.) if requested.

33
Model Building
• Identify the dependent variable, and clearly
define it.
• List potential predictors.
– Bear in mind the problem of multicollinearity.
– Consider the cost of gathering, processing,
and storing data.
– Be selective in your choice (try to use as few
variables as possible).

34
• Gather the required observations (have at least
six observations for each independent variable).

• Identify several possible models.
– A scatter diagram of the dependent variable
against each independent variable can be helpful
in formulating the right model.
– If you are uncertain, start with first order
and second order models, with and without
interaction.
– Try other relationships (transformations) if
the polynomial models fail to provide a
good fit.
• Use statistical software to estimate the model.
35
• Determine whether the required
conditions are satisfied. If not, attempt to
correct the problem.
• Select the best model.
– Use the statistical output.
– Use your judgment!!

36

```
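All of the polynomial models in Section 18.1 are linear in the coefficients b0, …, bp, so ordinary least squares can estimate them once the columns 1, x, x^2, …, x^p are built. The sketch below is a minimal illustration, assuming NumPy is installed; the helper name `fit_polynomial` and the data (generated exactly from y = 2 + 3x - 0.5x^2, with no error term) are hypothetical and not taken from the chapter.

```python
import numpy as np

def fit_polynomial(x, y, p):
    """Least-squares fit of y = b0 + b1*x + ... + bp*x**p + e.

    Returns (coefficients b0..bp, R-squared). Hypothetical helper.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with columns 1, x, x**2, ..., x**p
    X = np.vander(x, N=p + 1, increasing=True)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ b
    sse = float(residuals @ residuals)          # residual sum of squares
    sst = float(((y - y.mean()) ** 2).sum())    # total sum of squares
    return b, 1.0 - sse / sst

# Hypothetical data generated exactly from y = 2 + 3x - 0.5x^2
x = np.arange(0.0, 10.0)
y = 2 + 3 * x - 0.5 * x**2

b_quad, r2_quad = fit_polynomial(x, y, 2)   # second order model
b_lin, r2_lin = fit_polynomial(x, y, 1)     # first order model
print(np.round(b_quad, 6), round(r2_quad, 6))
print(round(r2_lin, 3))
```

Comparing R-squared for p = 1 and p = 2 mirrors the linear vs. quadratic fitted line plots in Problem 18.3; with real data the squared term should still be tested, as the slides do, rather than chosen on R-squared alone.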
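The two first order models with two predictor variables differ in how the slope of y in x1 behaves. For y = b0 + b1x1 + b2x2 + b3x1x2 + e, a one-unit increase in x1 changes y by b1 + b3x2, so the lines for X2 = 1, 2, 3 are parallel only when b3 = 0. A small sketch with hypothetical coefficients (not estimates from the chapter):

```python
def slope_in_x1(b1, b3, x2):
    """Change in y per unit increase in x1, holding x2 fixed.

    For y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + e the slope is b1 + b3*x2;
    with b3 = 0 (no interaction) it is b1 for every x2.
    """
    return b1 + b3 * x2

# Hypothetical coefficients, chosen only for illustration
b1, b3 = 2.0, 1.5
no_interaction = [slope_in_x1(b1, 0.0, x2) for x2 in (1, 2, 3)]
with_interaction = [slope_in_x1(b1, b3, x2) for x2 in (1, 2, 3)]
print(no_interaction)    # [2.0, 2.0, 2.0] -> parallel lines
print(with_interaction)  # [3.5, 5.0, 6.5] -> slopes change with x2
```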
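For the second order model y = b0 + b1x1 + b2x2 + b3x1^2 + b4x2^2 + e, fixing x2 only moves the bracketed intercept [b0 + b2x2 + b4x2^2]; the x1 terms b1x1 + b3x1^2 stay the same, so the curves for X2 = 1, 2, 3 have the same shape and are shifted vertically. The coefficient values below are hypothetical, chosen only to show the shift:

```python
def intercept_at(b0, b2, b4, x2):
    """Intercept of the curve in x1 when x2 is held fixed.

    Fixing x2 in y = b0 + b1*x1 + b2*x2 + b3*x1**2 + b4*x2**2 + e gives
    y = [b0 + b2*x2 + b4*x2**2] + b1*x1 + b3*x1**2 + e.
    """
    return b0 + b2 * x2 + b4 * x2 ** 2

# Hypothetical coefficients for illustration only
b0, b2, b4 = 10.0, 4.0, -0.5
intercepts = {x2: intercept_at(b0, b2, b4, x2) for x2 in (1, 2, 3)}
print(intercepts)  # {1: 13.5, 2: 16.0, 3: 17.5}
```

Adding the interaction term b5x1x2 also changes the linear coefficient in x1 to b1 + b5x2, so with interaction the curves are no longer simple vertical shifts of one another.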
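Before the Example 18.1 model can be estimated, the squared and cross-product columns have to be added to the collected data in Xm18-02. This sketch recomputes the "Added data" for the three rows shown in the table from Income and Age; the helper name `add_quadratic_terms` is hypothetical, and only the rows printed on the slide are used.

```python
# First three rows of Xm18-02 as shown on the slide: (Revenue, Income, Age)
rows = [
    (1128, 23.5, 10.5),
    (1005, 17.6, 7.2),
    (1212, 26.3, 7.6),
]

def add_quadratic_terms(income, age):
    """Return the added columns (Income sq, Age sq, (Income)(Age))."""
    return income ** 2, age ** 2, income * age

# Round to 2 decimals, as displayed in the table
added = [tuple(round(v, 2) for v in add_quadratic_terms(inc, age))
         for _, inc, age in rows]
for row in added:
    print(row)
```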
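The summary lines in both Minitab outputs follow from their Analysis of Variance tables: R-Sq = SSR/SST, R-Sq(adj) = 1 - MSE/(SST/(n - 1)), F = MSR/MSE and S = sqrt(MSE), where MSR = SSR/k and MSE = SSE/(n - k - 1). The sketch below checks that arithmetic with the sums of squares reported for Example 18.2 and Problem 18.3 (n = 25 in both, since Total DF = 24); `anova_summary` is a hypothetical helper name.

```python
def anova_summary(ssr, sse, n, k):
    """Summary statistics from an ANOVA table.

    ssr, sse: regression and residual (error) sums of squares
    n: number of observations, k: number of independent variables
    """
    sst = ssr + sse
    msr = ssr / k
    mse = sse / (n - k - 1)
    return {
        "R-Sq": ssr / sst,
        "R-Sq(adj)": 1 - mse / (sst / (n - 1)),
        "F": msr / mse,
        "S": mse ** 0.5,
    }

# Example 18.2: k = 5 terms (Income, Age, Income sq, Age sq, (Income)(Age))
ex_18_2 = anova_summary(ssr=368140, sse=37956, n=25, k=5)
# Problem 18.3: k = 2 terms (Space, Space sq)
pr_18_3 = anova_summary(ssr=25540, sse=37248, n=25, k=2)

for name, stats in (("Example 18.2", ex_18_2), ("Problem 18.3", pr_18_3)):
    print(name, {key: round(val, 3) for key, val in stats.items()})
```

After rounding, these match the printed R-Sq, R-Sq(adj), F and S; small differences in the last digit of S likely come from Minitab working with unrounded sums of squares.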
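Part of the multicollinearity in Example 18.2 is built into the model: Income and Income sq (and Age and Age sq) carry largely the same information over the observed range. The standard-library sketch below (Python 3.8+ for `statistics.fmean`) measures the correlation between a predictor and its square for hypothetical values from 10 to 35, then again after centering the predictor at its mean. Centering is a common remedy that the slides do not discuss; treat it as an aside, and `correlation` as a hypothetical helper.

```python
from statistics import fmean

def correlation(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = fmean(u), fmean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

# Hypothetical predictor values over a range like the shelf-space plot (10 to 35)
x = [float(v) for v in range(10, 36)]
x_sq = [v * v for v in x]

# Center x at its mean before squaring
mean_x = fmean(x)
xc = [v - mean_x for v in x]
xc_sq = [v * v for v in xc]

print(round(correlation(x, x_sq), 3))    # close to 1: x and x^2 nearly collinear
print(round(correlation(xc, xc_sq), 3))  # 0.0 for these symmetric values
```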
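The two tests for Problem 18.3 apply the decision rule "accept H1 if p-value < α" to the F-test of the complete model and to the t-test of the squared term; the second test is also the chapter's model-selection rule (keep the quadratic model if the squared term's p-value is below 0.05). The sketch below restates those decisions with the values from the Minitab output. The critical values F(0.05; 2, 22) = 3.44 and t(0.025; 22) = 2.074 come from standard tables and are an addition, included only as a cross-check of the p-value decisions.

```python
# Values read from the Minitab output for Problem 18.3 (n = 25, k = 2)
ALPHA = 0.05
F_STAT, F_P = 7.54, 0.003                      # H0: beta1 = beta2 = 0
COEF_SQ, SE_SQ, T_P = -0.6655, 0.1774, 0.001   # Space sq row

# Critical values from standard tables (an assumption, not from the slides)
F_CRIT = 3.44    # F(0.05; 2, 22)
T_CRIT = 2.074   # t(0.025; 22), two-tailed test at alpha = 0.05

def reject_h0(p_value, alpha=ALPHA):
    """Decision rule from the slides: accept H1 if p-value < alpha."""
    return p_value < alpha

t_stat = COEF_SQ / SE_SQ   # reproduces the T column for Space sq

complete_model_significant = reject_h0(F_P) and F_STAT > F_CRIT
curvature_significant = reject_h0(T_P) and abs(t_stat) > T_CRIT
chosen = "quadratic" if curvature_significant else "linear"

print(round(t_stat, 2), complete_model_significant, curvature_significant, chosen)
```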
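With the fitted equation Sales = -108.99 + 33.089 Space - 0.6655 Space sq from Problem 18.3, a point prediction is the equation evaluated at a chosen shelf space, and because the coefficient of Space sq is negative the predicted sales peak at Space = -b1/(2b2). The sketch below computes both; the shelf-space values are arbitrary examples. Prediction and confidence intervals depend on the original data, not just the coefficients, so they are not computed here.

```python
# Fitted coefficients from the Minitab output for Problem 18.3
B0, B1, B2 = -108.99, 33.089, -0.6655

def predicted_sales(space):
    """Point prediction of sales (boxes sold) for a given shelf space."""
    return B0 + B1 * space + B2 * space ** 2

# B2 < 0, so the parabola opens downward; its peak is at -B1 / (2 * B2)
peak_space = -B1 / (2 * B2)

for space in (15, 20, 25, 30):
    print(space, round(predicted_sales(space), 1))
print("peak at space =", round(peak_space, 2))
```

The peak (about 24.9) falls inside the 10 to 35 range of shelf space shown on the fitted line plot; predictions far outside that range would be extrapolations.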