POSTED ON: 5/3/2010 Public Domain
Chapter 18
Model Building – Quadratic Regression

Ch 18 Introduction
• Regression analysis is one of the most commonly used techniques in statistics.
• It is considered powerful for several reasons:
  – It can cover a variety of mathematical models:
    • linear relationships.
    • non-linear relationships.
    • nominal independent variables.
  – It provides efficient methods for model building.

18.1 Polynomial Models
• There are models where the independent variables ($x_i$) may appear as functions of a smaller number of predictor variables.
• Polynomial models are one such example.

Polynomial Models with One Predictor Variable
• The general model is
  $y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p + e$
• Setting $x_i = x^i$ gives the polynomial model of order p in a single predictor x:
  $y = b_0 + b_1 x + b_2 x^2 + \dots + b_p x^p + e$

First Order Model…
• When p = 1, we have our simple linear regression model:
  $y = b_0 + b_1 x + e$
• That is, we believe there is a straight-line relationship between the dependent and independent variables over the range of the values of x.

Second Order Model…
• When p = 2, the polynomial model is a parabola:
  $y = b_0 + b_1 x + b_2 x^2 + e$

Third Order Model…
• When p = 3, our third order model looks like:
  $y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + e$

Polynomial Models with Two Predictor Variables
• First order model:
  $y = b_0 + b_1 x_1 + b_2 x_2 + e$
  The effect of one predictor variable on y is independent of the effect of the other predictor variable on y.
  [Figure: y versus $x_1$ for $x_2$ = 1, 2, 3]
• First order model, two predictors, and interaction:
  $y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 + e$
  The two variables interact to affect the value of y.
  [Figure: y versus $x_1$ for $x_2$ = 1, 2, 3]

Polynomial Models with Two Predictor Variables
• Second order model:
  $y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1^2 + b_4 x_2^2 + e$
  – $x_2 = 3$: $y = [b_0 + b_2(3) + b_4(3^2)] + b_1 x_1 + b_3 x_1^2 + e$
  – $x_2 = 2$: $y = [b_0 + b_2(2) + b_4(2^2)] + b_1 x_1 + b_3 x_1^2 + e$
  – $x_2 = 1$: $y = [b_0 + b_2(1) + b_4(1^2)] + b_1 x_1 + b_3 x_1^2 + e$
  [Figure: y versus $x_1$ for $x_2$ = 1, 2, 3]
• Second order model with interaction:
  $y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1^2 + b_4 x_2^2 + b_5 x_1 x_2 + e$
  [Figure: y versus $x_1$ for $x_2$ = 1, 2, 3]

Selecting a Model
• Several models have been introduced.
• How do we select the right model?
• Selecting a model:
  – Use your knowledge of the problem (variables involved and the nature of the relationship between them) to select a model.
  – Test the model using statistical techniques.

Selecting a Model
• In this chapter, we will concentrate on the quadratic model.
• For every problem we solve, we will check the quadratic model first. If it doesn't work, we will try a linear model.
• As a general rule, if the p-value of the squared term is < 0.05, we will keep the quadratic model; otherwise we will try a linear model.

Selecting a Model; Example
• Example 18.1 The location of a new restaurant
  – A fast food restaurant chain tries to identify new locations that are likely to be profitable.
  – The primary market for such restaurants is middle-income adults and their children (between the ages of 5 and 12).
  – Which regression model should be proposed to predict the profitability of new locations?

Selecting a Model; Example
• Solution
  – The dependent variable will be Gross Revenue.
  – Quadratic relationships between Revenue and each predictor variable should be observed. Why?
    • Members of middle-class families are more likely to visit a fast food restaurant than members of poor or wealthy families.
    • Families with very young or older kids will not visit the restaurant as frequently as families with kids in the mid-range ages.
  [Figures: Revenue versus Income (Low, Middle, High); Revenue versus Age (Low, Middle, High)]

Selecting a Model; Example
• Solution
  – The quadratic regression model built is
    $\text{SALES} = b_0 + b_1\,\text{INCOME} + b_2\,\text{AGE} + b_3\,\text{INCOME}^2 + b_4\,\text{AGE}^2 + b_5\,(\text{INCOME})(\text{AGE}) + e$
  – Include the interaction term when in doubt, and test its relevance later.
SALES = annual gross sales
INCOME = median annual household income in the neighborhood
AGE = mean age of children in the neighborhood

Selecting a Model; Example
• Example 18.2
  – To verify the validity of the model proposed in Example 18.1 for recommending the location of a new fast food restaurant, 25 areas with fast food restaurants were randomly selected.
  – Each area included one of the firm's restaurants and three competing restaurants.
  – Data collected included (Xm19-02.xls):
    • Previous year's annual gross sales.
    • Mean annual household income.
    • Mean age of children.

Selecting a Model; Example
Xm18-02

  Collected data               Added data
  Revenue   Income   Age       Income sq   Age sq   (Income)(Age)
  1128      23.5     10.5      552.25      110.25   246.75
  1005      17.6     7.2       309.76      51.84    126.72
  1212      26.3     7.6       691.69      57.76    199.88
  .         .        .         .           .        .

The Quadratic Relationships – Graphical Illustration
[Figures: REVENUE vs. AGE; REVENUE vs. INCOME]

Example 18.2…
• You can take the original data collected (revenues, household income, and age) and plot y vs. x1 and y vs. x2 to get a feel for the data; trend lines were added for clarity…

Regression Analysis: Revenue versus Income, Age, ...

The regression equation is
Revenue = - 1134 + 173 Income + 23.6 Age - 3.73 Income s - 3.87 Age sq + 1.97 (Income)

  Predictor   Coef      SE Coef   T       P
  Constant    -1134.0   320.0     -3.54   0.002
  Income      173.20    28.20     6.14    0.000
  Age         23.55     32.23     0.73    0.474
  Income s    -3.7261   0.5422    -6.87   0.000
  Age sq      -3.869    1.179     -3.28   0.004
  (Income)    1.9673    0.9441    2.08    0.051

  S = 44.6953   R-Sq = 90.7%   R-Sq(adj) = 88.2%

Analysis of Variance

  Source           DF   SS       MS      F       P
  Regression       5    368140   73628   36.86   0.000
  Residual Error   19   37956    1998
  Total            24   406096

This is a valid model that can be used to make predictions. But…

Example 18.2… INTERPRET
• Checking the regression tool's output… The model fits the data well and it is valid… Uh oh.
multicollinearity

Model Validation
• The model can be used to make predictions...
• …but multicollinearity (a relationship between two or more independent variables) is a problem!
• The t-tests may be distorted; therefore, do not interpret the coefficients or test them.

Model Building
• The problems you will be asked to do involve only two variables, the independent and the dependent.
• It is the independent variable that has a squared term.
• To make things easier, we will start by doing a fitted line plot to see if the quadratic or linear model looks better.
• Then we'll do the math to confirm this.

Model Building
• Problem 18.3 page 732
• Independent variable (x): shelf space
• Dependent variable (y): number of boxes sold

Model Building
• Fitted Line Plot (Quadratic)
  Sales = - 109.0 + 33.09 Space - 0.6655 Space**2
  S = 41.1474   R-Sq = 40.7%   R-Sq(adj) = 35.3%
  [Figure: fitted line plot of Sales versus Space]

Model Building
• Fitted Line Plot (Linear)
  Sales = 239.7 + 1.144 Space
  S = 51.5360   R-Sq = 2.7%   R-Sq(adj) = 0.0%
  [Figure: fitted line plot of Sales versus Space]

Quadratic Model
• Minitab printout for quadratic:
• Errors normal and independent
  [Figures: Histogram of the Residuals (response is Sales); Residuals Versus the Order of the Data (response is Sales)]

Quadratic Model
Regression Analysis: Sales versus Space, Space sq

The regression equation is
Sales = - 109 + 33.1 Space - 0.666 Space sq

  Predictor   Coef      SE Coef   T       P
  Constant    -108.99   97.24     -1.12   0.274
  Space       33.089    8.590     3.85    0.001
  Space sq    -0.6655   0.1774    -3.75   0.001

  S = 41.1474   R-Sq = 40.7%   R-Sq(adj) = 35.3%

Analysis of Variance

  Source           DF   SS      MS      F      P
  Regression       2    25540   12770   7.54   0.003
  Residual Error   22   37248   1693
  Total            24   62788

Quadratic Model
• The quadratic model is valid because the p-value of the squared term, 0.001, is less than 0.05.
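The summary figures in the Minitab printout above can be cross-checked by hand. The sketch below is only an arithmetic check, not part of the original slides: it copies the rounded coefficients, standard errors, and sums of squares from the printout and recomputes the t statistics, F, R-Sq, R-Sq(adj), and S. The variable names are illustrative, and because the inputs are rounded, the results agree with Minitab only to the printed precision.

```python
import math

# Values copied from the Minitab output for the quadratic model
# (Sales versus Space, Space sq). They are rounded as printed.
coef = {"Space": 33.089, "Space sq": -0.6655}
se_coef = {"Space": 8.590, "Space sq": 0.1774}
ss_reg, ss_err, ss_total = 25540.0, 37248.0, 62788.0
df_reg, df_err, df_total = 2, 22, 24

# t statistic for each term: coefficient divided by its standard error.
t_stats = {name: coef[name] / se_coef[name] for name in coef}

# Mean squares and the F statistic for the complete model.
ms_reg = ss_reg / df_reg
ms_err = ss_err / df_err
f_stat = ms_reg / ms_err

# Coefficient of determination, its adjusted version, and the
# standard error of estimate S.
r_sq = ss_reg / ss_total
r_sq_adj = 1 - ms_err / (ss_total / df_total)
s = math.sqrt(ms_err)

print(f"t (Space)    = {t_stats['Space']:.2f}")
print(f"t (Space sq) = {t_stats['Space sq']:.2f}")
print(f"F            = {f_stat:.2f}")
print(f"R-Sq         = {100 * r_sq:.1f}%")
print(f"R-Sq(adj)    = {100 * r_sq_adj:.1f}%")
print(f"S            = {s:.2f}")
```

The p-values themselves are not recomputed here, since that would need the t and F distributions from a statistics library.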
Quadratic Model
• Testing the complete model
• H0: β1 = β2 = 0
• H1: At least one β is not equal to 0 (y has either a linear or quadratic relationship to x)
• Decision rule: reject H0 in favor of H1 if p-value < α
• From Minitab: F = 7.54, p-value = 0.003
• There is overwhelming evidence that at least one β is not equal to zero; thus there is a relationship between sales and shelf space, either linear or quadratic.

Quadratic Model
• Testing the quadratic portion (is there significant curvature?)
• H0: β2 = 0
• H1: β2 ≠ 0
• Reject H0 in favor of H1 if p-value < α
• t = -3.75, p-value = 0.001
• There is overwhelming evidence to conclude that sales has a quadratic relationship with shelf space.

Quadratic Model
• R-sq = 40.7%. This is a rather weak relationship: 40.7% of the variation in sales is explained by the model, and the remaining 59.3% is unexplained.
• You could then do P.I. and C.I. if requested.

Model Building
• Identify the dependent variable, and clearly define it.
• List potential predictors.
  – Bear in mind the problem of multicollinearity.
  – Consider the cost of gathering, processing and storing data.
  – Be selective in your choice (try to use as few variables as possible).
• Gather the required observations (have at least six observations for each independent variable).
• Identify several possible models.
  – A scatter diagram of the dependent variable against each predictor can be helpful in formulating the right model.
  – If you are uncertain, start with first order and second order models, with and without interaction.
  – Try other relationships (transformations) if the polynomial models fail to provide a good fit.
• Use statistical software to estimate the model.
• Determine whether the required conditions are satisfied. If not, attempt to correct the problem.
• Select the best model.
  – Use the statistical output.
  – Use your judgment!!
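The "identify several possible models, estimate, and select" steps can be sketched in a few lines of code. This is only an illustration under stated assumptions: the data are made up (they are not the Problem 18.3 data), the function names `fit_polynomial` and `adjusted_r_squared` are hypothetical, and the comparison uses adjusted R-squared rather than the chapter's p-value rule for the squared term, because computing that p-value needs a statistics package.

```python
def fit_polynomial(x, y, degree):
    """Least-squares fit of y = b0 + b1*x + ... + bp*x**p (normal equations)."""
    n_coef = degree + 1
    # X'X and X'y for a design matrix with columns 1, x, x**2, ...
    xtx = [[sum(xi ** (r + c) for xi in x) for c in range(n_coef)]
           for r in range(n_coef)]
    xty = [sum((xi ** r) * yi for xi, yi in zip(x, y)) for r in range(n_coef)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    aug = [row + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(n_coef):
        pivot = max(range(col, n_coef), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n_coef):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n_coef + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coefs = [0.0] * n_coef
    for r in reversed(range(n_coef)):
        tail = sum(aug[r][c] * coefs[c] for c in range(r + 1, n_coef))
        coefs[r] = (aug[r][n_coef] - tail) / aug[r][r]
    return coefs


def adjusted_r_squared(x, y, coefs):
    """Adjusted R-squared for a fitted polynomial with coefficients coefs."""
    n, k = len(y), len(coefs) - 1
    y_bar = sum(y) / n
    fitted = [sum(b * xi ** p for p, b in enumerate(coefs)) for xi in x]
    ss_err = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - (ss_err / (n - k - 1)) / (ss_total / (n - 1))


# Hypothetical data with a clear hump (NOT the data from the slides).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [17.5, 22.6, 26.9, 28.8, 30.2, 29.5, 26.7, 22.5, 17.3]

linear = fit_polynomial(x, y, 1)
quadratic = fit_polynomial(x, y, 2)

print("linear coefficients:   ", [round(b, 3) for b in linear])
print("quadratic coefficients:", [round(b, 3) for b in quadratic])
print("adjusted R-sq, linear:   ", round(adjusted_r_squared(x, y, linear), 3))
print("adjusted R-sq, quadratic:", round(adjusted_r_squared(x, y, quadratic), 3))
```

With data shaped like this, the straight line explains almost none of the variation while the parabola fits closely, which is the same pattern as the two fitted line plots for Sales versus Space. Adjusted R-squared can come out negative for a poor fit; Minitab reports such values as 0.0%.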