Model Building – Quadratic Regression by zwt73245


									     Chapter 18

  Model Building –
Quadratic Regression


          Ch 18 Introduction
• Regression analysis is one of the most
  commonly used techniques in statistics.
• It is considered powerful for several
  reasons:
  – It can cover a variety of mathematical models
     • linear relationships.
     • non-linear relationships.
     • nominal independent variables.
  – It provides efficient methods for model
    building.
     18.1 Polynomial Models
• There are models where the independent
  variables (xi) may appear as functions of a
  smaller number of predictor variables.
• Polynomial models are one such example.




Polynomial Models with One Predictor
             Variable

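   General multiple regression model with p independent variables:
   $y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p + e$

   Polynomial model of order p with one predictor variable x:
   $y = b_0 + b_1 x + b_2 x^2 + \dots + b_p x^p + e$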
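
A polynomial model in one predictor is still a multiple regression: the "independent variables" are simply the powers of x. The sketch below builds those columns with NumPy. The function name `polynomial_columns` and the sample values are hypothetical and used only for illustration.

```python
import numpy as np

def polynomial_columns(x, p):
    """Return the columns x, x^2, ..., x^p as a 2-D array (one column per power)."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x ** k for k in range(1, p + 1)])

# Hypothetical predictor values, chosen only for illustration.
x = [1.0, 2.0, 3.0]
X = polynomial_columns(x, p=3)
print(X)  # each row holds x, x^2 and x^3 for one observation
```

Each column of `X` then plays the role of one independent variable in the general model.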




           First Order Model…
• When p = 1, we have our simple linear
  regression model:
  $y = b_0 + b_1 x + e$
• That is, we believe there is a straight-line
  relationship between the dependent and
  independent variables over the range of the
  values of x.
        Second Order Model…
• When p = 2, the polynomial model is a parabola:
  $y = b_0 + b_1 x + b_2 x^2 + e$
         Third Order Model…
• When p = 3, our third order model looks like:
  $y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + e$
 Polynomial Models with Two Predictor
             Variables
• First order model
  $y = b_0 + b_1 x_1 + b_2 x_2 + e$
  [Figure: axes y, x1 and x2]
   Polynomial Models with Two Predictor
               Variables
• First order model
  $y = b_0 + b_1 x_1 + b_2 x_2 + e$
  The effect of one predictor variable on y
  is independent of the effect of the other
  predictor variable on y.
• First order model, two predictors, and interaction
  $y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 + e$
  The two variables interact to affect the value of y.
  [Figure: y plotted against x1 for X2 = 1, 2, 3, one panel per model]
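
One way to see the difference between the two first order models is to look at how much the mean of y changes when x1 goes up by one unit, at several fixed values of x2. The sketch below does this; all coefficient values are hypothetical and chosen only for illustration.

```python
def first_order(x1, x2, b0, b1, b2, b3=0.0):
    """Mean response b0 + b1*x1 + b2*x2 + b3*x1*x2 (b3 = 0 means no interaction)."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

def slope_in_x1(x2, **coef):
    """Change in the mean response when x1 increases by one unit, holding x2 fixed."""
    return first_order(1.0, x2, **coef) - first_order(0.0, x2, **coef)

# Hypothetical coefficients, chosen only for illustration.
no_interaction = dict(b0=10.0, b1=2.0, b2=5.0)
with_interaction = dict(b0=10.0, b1=2.0, b2=5.0, b3=1.5)

slopes_without = [slope_in_x1(x2, **no_interaction) for x2 in (1, 2, 3)]
slopes_with = [slope_in_x1(x2, **with_interaction) for x2 in (1, 2, 3)]
print(slopes_without)  # same slope at every x2
print(slopes_with)     # slope equals b1 + b3*x2, so it changes with x2
```

Without the interaction term the slope in x1 is b1 at every value of x2; with it, the slope is b1 + b3·x2.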
    Polynomial Models with Two Predictor
                Variables
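• Second order model
  $y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1^2 + b_4 x_2^2 + e$
  For a fixed value of x2, y is a quadratic function of x1:
  – X2 = 3: $y = [b_0 + b_2(3) + b_4(3^2)] + b_1 x_1 + b_3 x_1^2 + e$
  – X2 = 2: $y = [b_0 + b_2(2) + b_4(2^2)] + b_1 x_1 + b_3 x_1^2 + e$
  – X2 = 1: $y = [b_0 + b_2(1) + b_4(1^2)] + b_1 x_1 + b_3 x_1^2 + e$
• Second order model with interaction
  $y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1^2 + b_4 x_2^2 + b_5 x_1 x_2 + e$
  [Figure: y plotted against x1 for X2 = 1, 2, 3, one panel per model]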
          Selecting a Model
• Several models have been introduced.
• How do we select the right model?
• Selecting a model:
  – Use your knowledge of the problem (variables
    involved and the nature of the relationship
    between them) to select a model.
  – Test the model using statistical techniques.


          Selecting a Model
• In this chapter, we will concentrate on the
  quadratic model.
• For every problem we solve, we will first fit
  the quadratic model. If it does not hold up,
  we will try a linear model.




          Selecting a Model
• As a general rule, if the p-value of the
  square term is < 0.05, we will keep the
  quadratic model, otherwise we will try a
  linear model.




     Selecting a Model; Example
• Example 18.1 The location of a new
  restaurant
  – A fast food restaurant chain tries to identify
    new locations that are likely to be profitable.
  – The primary market for such restaurants is
    middle-income adults and their children
    (between the ages of 5 and 12).
  – Which regression model should be proposed
    to predict the profitability of new locations?
      Selecting a Model; Example
• Solution
  – The dependent variable will be Gross Revenue.
  – Quadratic relationships between Revenue and each
    predictor variable should be observed. Why?
     • Income: members of middle-class families are more
       likely to visit a fast food restaurant than members
       of poor or wealthy families.
     • Age: families with very young or older kids will not
       visit the restaurant as frequently as families with
       mid-range ages of kids.
  [Figures: Revenue vs. Income and Revenue vs. Age, each horizontal axis marked Low, Middle, High]
       Selecting a Model; Example
• Solution
   – The quadratic regression model built is

     $\text{SALES} = b_0 + b_1\,\text{INCOME} + b_2\,\text{AGE} + b_3\,\text{INCOME}^2 + b_4\,\text{AGE}^2 + b_5\,(\text{INCOME})(\text{AGE}) + e$

     Include the interaction term when in doubt,
     and test its relevance later.

 SALES = annual gross sales
 INCOME = median annual household income in the
          neighborhood
 AGE = mean age of children in the neighborhood
    Selecting a Model; Example
• Example 18.2
  – To verify the validity of the model proposed in
    example 18.1 for recommending the location of a
    new fast food restaurant, 25 areas with fast food
    restaurants were randomly selected.
  – Each area included one of the firm’s restaurants
    and three competing restaurants.
  – Data collected included (Xm18-02.xls):
     • Previous year’s annual gross sales.
     • Mean annual household income.
     • Mean age of children.

    Selecting a Model; Example
Xm18-02
         Collected data        |              Added data
Revenue   Income    Age        |   Income sq    Age sq    (Income)(Age)
 1128      23.5    10.5        |    552.25      110.25       246.75
 1005      17.6     7.2        |    309.76       51.84       126.72
 1212      26.3     7.6        |    691.69       57.76       199.88
   .         .       .         |       .           .            .
   .         .       .         |       .           .            .
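
The added columns are computed directly from the collected ones. As a small sketch, the code below reproduces them for the three rows shown in the table; the function name `add_quadratic_columns` is hypothetical.

```python
# First three rows of the collected data shown in the table: (Revenue, Income, Age).
collected = [
    (1128, 23.5, 10.5),
    (1005, 17.6, 7.2),
    (1212, 26.3, 7.6),
]

def add_quadratic_columns(rows):
    """Append Income sq, Age sq and (Income)(Age) to each (Revenue, Income, Age) row."""
    expanded = []
    for revenue, income, age in rows:
        expanded.append((
            revenue, income, age,
            round(income ** 2, 2),   # Income sq
            round(age ** 2, 2),      # Age sq
            round(income * age, 2),  # (Income)(Age)
        ))
    return expanded

for row in add_quadratic_columns(collected):
    print(row)
```

The rounding to two decimals only hides floating-point noise; it matches the precision shown in the table.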
 The Quadratic Relationships –
     Graphical Illustration
[Figures: scatter plots of REVENUE vs. INCOME and REVENUE vs. AGE]
             Example 18.2…
• You can take the original data collected
  (revenues, household income, and age) and
  plot y vs. x1 and y vs. x2 to get a feel for the
  data; trend lines were added for clarity…




Regression Analysis: Revenue versus Income, Age, ...

The regression equation is
Revenue = - 1134 + 173 Income + 23.6 Age - 3.73 Income s - 3.87 Age sq
      + 1.97 (Income)

Predictor     Coef   SE Coef      T      P
Constant   -1134.0     320.0  -3.54  0.002
Income      173.20     28.20   6.14  0.000
Age          23.55     32.23   0.73  0.474
Income s   -3.7261    0.5422  -6.87  0.000
Age sq      -3.869     1.179  -3.28  0.004
(Income)    1.9673    0.9441   2.08  0.051

S = 44.6953   R-Sq = 90.7%   R-Sq(adj) = 88.2%

Analysis of Variance

Source          DF      SS     MS      F      P
Regression       5  368140  73628  36.86  0.000
Residual Error  19   37956   1998
Total           24  406096

This is a valid model that can be used to make predictions. But…
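
To illustrate what "used to make predictions" means here, the sketch below plugs an income and age into the fitted equation, using the coefficients from the Coef column of the Minitab output. The function name `predict_revenue` is hypothetical, and the example evaluates the first data row (Income 23.5, Age 10.5), whose observed revenue was 1128.

```python
# Coefficients copied from the Coef column of the Minitab output.
COEF = {
    "const": -1134.0,
    "income": 173.20,
    "age": 23.55,
    "income_sq": -3.7261,
    "age_sq": -3.869,
    "income_age": 1.9673,
}

def predict_revenue(income, age):
    """Point prediction from the fitted second order model with interaction."""
    return (
        COEF["const"]
        + COEF["income"] * income
        + COEF["age"] * age
        + COEF["income_sq"] * income ** 2
        + COEF["age_sq"] * age ** 2
        + COEF["income_age"] * income * age
    )

prediction = predict_revenue(23.5, 10.5)
print(f"predicted revenue: {prediction:.1f}")
print(f"residual for the first row: {1128 - prediction:.1f}")
```

This gives only a point prediction; a prediction interval would still have to come from the statistical software.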
             Example 18.2…

• Checking the regression tool’s output…
  – The model fits the data well and it is valid…
  – Uh oh: multicollinearity.

             Model Validation

The model can be used to make predictions...
…but multicollinearity (a relationship between
 two or more independent variables) is a problem!
The t-tests may be distorted; therefore,
do not interpret the coefficients or test them.
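
Part of the multicollinearity in a polynomial model comes from the model itself: a variable and its own square are strongly correlated over a positive range. The sketch below shows this with hypothetical income values. They are evenly spaced numbers chosen for illustration, NOT the 25 observations from the example.

```python
import numpy as np

# Hypothetical income values (evenly spaced, for illustration only);
# these are not the observations in Xm18-02.
income = np.linspace(15.0, 35.0, 25)
income_sq = income ** 2

# Pearson correlation between the predictor and its square.
r = np.corrcoef(income, income_sq)[0, 1]
print(f"correlation between Income and Income sq: {r:.3f}")
```

A correlation this close to 1 between two predictors is exactly the situation in which individual t-tests become unreliable.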




           Model Building
• The problems you will be asked to do
  involve only two variables, the
  independent and the dependent.
• It is the independent variable that has
  a square term.
• To make things easier, we will start by
  doing a fitted line plot to see if the
  quadratic or linear model looks better.
• Then we’ll do the math to confirm this.
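
The steps above (compare a linear and a quadratic fit, then confirm with the numbers) can be sketched in Python with NumPy instead of Minitab. The data below are synthetic and hypothetical, generated with built-in curvature; they are not the textbook data, and `r_squared` is a helper written for this sketch.

```python
import numpy as np

def r_squared(y, y_hat):
    """Share of the variation in y explained by the fitted values y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data with curvature (hypothetical, not the textbook data).
rng = np.random.default_rng(0)
x = np.linspace(10.0, 35.0, 25)
y = -100.0 + 30.0 * x - 0.6 * x ** 2 + rng.normal(0.0, 20.0, size=x.size)

# Least-squares fits; np.polyfit returns coefficients, highest power first.
linear_fit = np.polyfit(x, y, deg=1)
quadratic_fit = np.polyfit(x, y, deg=2)

r2_linear = r_squared(y, np.polyval(linear_fit, x))
r2_quadratic = r_squared(y, np.polyval(quadratic_fit, x))
print(f"R-sq linear:    {r2_linear:.3f}")
print(f"R-sq quadratic: {r2_quadratic:.3f}")
```

R-sq can never go down when the squared term is added, so a higher R-sq alone does not justify the quadratic model; the p-value of the squared term is still what decides.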
           Model Building
• Problem 18.3 page 732
• Independent variable (x): shelf space
• Dependent variable (y): number of boxes
  sold




                          Model Building
• Fitted Line Plot (Quadratic)

  Sales = - 109.0 + 33.09 Space - 0.6655 Space**2
  S = 41.1474   R-Sq = 40.7%   R-Sq(adj) = 35.3%
  [Figure: fitted line plot of Sales against Space]
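
Because the coefficient of Space**2 in the quadratic fitted equation (Sales = - 109.0 + 33.09 Space - 0.6655 Space**2) is negative, the fitted parabola opens downward and has a peak. As a rough sketch, the code below locates that peak; the function name `predicted_sales` is hypothetical, and the result is only an estimate that should not be read outside the range of shelf space in the data.

```python
# Coefficients of the quadratic fitted line plot:
# Sales = B0 + B1*Space + B2*Space**2
B0, B1, B2 = -109.0, 33.09, -0.6655

def predicted_sales(space):
    """Fitted sales for a given amount of shelf space."""
    return B0 + B1 * space + B2 * space ** 2

# A parabola b0 + b1*x + b2*x**2 with b2 < 0 peaks at x = -b1 / (2*b2).
peak_space = -B1 / (2 * B2)
peak_sales = predicted_sales(peak_space)
print(f"shelf space at the peak: {peak_space:.2f}")
print(f"fitted sales at the peak: {peak_sales:.1f}")
```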
                         Model Building
• Fitted Line Plot (linear)

  Sales = 239.7 + 1.144 Space
  S = 51.5360   R-Sq = 2.7%   R-Sq(adj) = 0.0%
  [Figure: fitted line plot of Sales against Space]
                                        Quadratic Model
            • Minitab printout for quadratic:
            • Errors normal and independent
              [Figures: Histogram of the Residuals (response is Sales) and
               Residuals Versus the Order of the Data (response is Sales)]
          Quadratic Model
Regression Analysis: Sales versus Space, Space sq

The regression equation is
Sales = - 109 + 33.1 Space - 0.666 Space sq

Predictor     Coef  SE Coef      T      P
Constant   -108.99    97.24  -1.12  0.274
Space       33.089    8.590   3.85  0.001
Space sq   -0.6655   0.1774  -3.75  0.001

S = 41.1474   R-Sq = 40.7%   R-Sq(adj) = 35.3%

Analysis of Variance

Source          DF     SS     MS     F      P
Regression       2  25540  12770  7.54  0.003
Residual Error  22  37248   1693
Total           24  62788
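
The summary statistics in the printout follow from the sums of squares and degrees of freedom in the Analysis of Variance table. The sketch below recomputes them for the Sales model; the function name `anova_summary` is hypothetical, and small differences from Minitab come from the rounded sums of squares.

```python
def anova_summary(ss_regression, ss_error, df_regression, df_error):
    """Recompute MS, F, S, R-Sq and R-Sq(adj) from an ANOVA table."""
    ms_regression = ss_regression / df_regression
    ms_error = ss_error / df_error
    ss_total = ss_regression + ss_error
    df_total = df_regression + df_error
    return {
        "MS regression": ms_regression,
        "MS error": ms_error,
        "F": ms_regression / ms_error,
        "S": ms_error ** 0.5,  # standard error of estimate
        "R-Sq": ss_regression / ss_total,
        "R-Sq(adj)": 1.0 - ms_error / (ss_total / df_total),
    }

# Values from the Analysis of Variance table for Sales versus Space, Space sq.
summary = anova_summary(ss_regression=25540, ss_error=37248,
                        df_regression=2, df_error=22)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

The same function applies to any regression ANOVA table with these four inputs.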
          Quadratic Model
• The quadratic model is kept because the
  p-value of the squared term (0.001) is less
  than 0.05.




           Quadratic Model
• Testing the complete model
• H0: β1 = β2=0
• H1: At least one β is not equal to 0 (Y has
  either a linear or quadratic relationship to X)
• Decision rule: reject H0 in favor of H1 if p-value < α
• From Minitab: F = 7.54, p-value = 0.003
• There is overwhelming evidence that at least
  one β is not equal to zero, thus there is a
  relationship between sales and shelf space,
  either linear or quadratic.

           Quadratic Model
• Testing the quadratic portion (is there
  significant curvature?)
• H0: β2 = 0          H1: β2 ≠ 0
• Reject H0 in favor of H1 if p-value < α
• t = -3.75, p-value = 0.001
• There is overwhelming evidence to
  conclude that sales has a quadratic
  relationship with shelf space.
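
The t statistic for the squared term is its coefficient divided by its standard error, and the chapter's rule then compares the p-value with 0.05. A minimal sketch follows, using the Space sq row of the Minitab output. The function names are hypothetical, and the p-value is taken from the printout rather than computed here.

```python
def t_ratio(coefficient, standard_error):
    """t statistic for H0: the coefficient equals zero."""
    return coefficient / standard_error

def keep_quadratic(p_value_squared_term, alpha=0.05):
    """Chapter rule: keep the quadratic model if the squared term is significant."""
    return p_value_squared_term < alpha

# Space sq row of the Minitab output: Coef = -0.6655, SE Coef = 0.1774, P = 0.001.
t_space_sq = t_ratio(-0.6655, 0.1774)
decision = "quadratic" if keep_quadratic(0.001) else "linear"
print(f"t = {t_space_sq:.2f}, model to use: {decision}")
```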
           Quadratic Model
• R-sq = 40.7%. This is a rather weak
  relationship. 40.7% of the variation in sales
  is explained by the model; the remaining 59.3%
  is unexplained.
• You could then compute a prediction interval (P.I.)
  and a confidence interval (C.I.) if requested.



             Model Building
• Identify the dependent variable, and clearly
  define it.
• List potential predictors.
  – Bear in mind the problem of multicollinearity.
  – Consider the cost of gathering, processing
    and storing data.
  – Be selective in your choice (try to use as few
    variables as possible).

• Gather the required observations (have at least
  six observations for each independent variable).
• Identify several possible models.
   – A scatter diagram of the dependent variable
     against each independent variable can be
     helpful in formulating the right model.
   – If you are uncertain, start with first order
     and second order models, with and without
     interaction.
   – Try other relationships (transformations) if
     the polynomial models fail to provide a
     good fit.
• Use statistical software to estimate the
  model.
• Determine whether the required
  conditions are satisfied. If not, attempt to
  correct the problem.
• Select the best model.
  – Use the statistical output.
  – Use your judgment!!



