Multiple Linear Regression I by linxiaoqin


									DS 101 – Spring 2008               Regression Analysis and Model Building #2                                             February 19th
I.         Regression Analysis Review
We start with the idea that there is a relationship between two variables, given as a data set of paired
x’s and y’s. We measure the strength of this relationship with the correlation coefficient and the coefficient
of determination, and we can see it in a simple scatter diagram. This is what we did in the
first lab. We used the least squares method to develop the quantitative form of the model, with
a y-intercept and a slope denoted b0 and b1 respectively.
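
As a rough sketch of the least squares calculation described above, the following Python computes b0, b1, the correlation coefficient r, and the coefficient of determination r^2 for a set of paired x’s and y’s. The function name and the data values are made up for illustration; they are not the lab data set, and Statgraphics reports these same quantities without any hand calculation.

```python
from math import sqrt

def simple_least_squares(xs, ys):
    """Return (b0, b1, r) for the least squares line y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx               # slope
    b0 = y_bar - b1 * x_bar      # y-intercept
    r = sxy / sqrt(sxx * syy)    # correlation coefficient
    return b0, b1, r

# Made-up paired data, for illustration only (not the lab data set)
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
fuel = [12.9, 12.1, 11.2, 10.8, 9.7, 9.2, 8.3, 7.6]

b0, b1, r = simple_least_squares(weeks, fuel)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

The coefficient of determination is simply the square of r in the simple (one-predictor) case, which is why the sketch does not compute it separately.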

     Selection of x’s and y’s → Organize x’s and y’s into a data set → Enter data into Statgraphics → Run the simple regression analysis

     [Figure: Plot of Fitted Model: FuelCons = 13.5821 - 0.74881*Week_#]
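
To read the fitted model, plug a week number into the equation from the plot title. The short sketch below does that for a few week values; the function name is hypothetical, and the coefficients are copied from the fitted equation FuelCons = 13.5821 - 0.74881*Week_#.

```python
def predict_fuel_cons(week):
    """Fitted value from the equation FuelCons = 13.5821 - 0.74881*Week_#."""
    b0 = 13.5821     # y-intercept: fitted fuel consumption at week 0
    b1 = -0.74881    # slope: change in fitted fuel consumption per week
    return b0 + b1 * week

for week in (0, 2, 4, 6, 8):
    print(week, round(predict_fuel_cons(week), 4))
```

Each additional week lowers the fitted fuel consumption by the same amount, which is what the slope b1 means in a linear model.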


II.    Model Building Exercise: Before we get too far along, we need to remind ourselves of the purpose
       of regression analysis. The following are cases where you need to come up with the proposed linear
       regression model in the form y = f(X1, X2, …, Xn) → Ŷ = β0 + β1 X1 + β2 X2 + … + βn Xn. Write
       out the initial model in this format in the space after each case. Think about why and how you
       would want to use regression analysis for each case; this helps you focus on what would be the response
       variable.

      1.      You are a marketing manager for Home Depot. For the market areas of the 56 Home Depot stores
              in your district you have the assessed valuation of the houses, total store sales, square footage of the
              houses, age of the houses, population figures, and whether the area is considered urban or suburban.

      2.      You are the director of the Career Center at a large university. Your variables for a sample of recent
              graduates include overall GPA, major area of study GPA, starting salaries, industry code where they
              accepted their job, whether or not the graduate was a business major, age, years of work experience
              prior to graduation.

aeb579f3-3662-4161-932d-017c94407302.doc Page 1 of 5
  3.    You are a product designer with Black and Decker Tools. Your sample variables for your study of a
        new drill-bit design include average drill speed used in the test, hardness index of the material used
        to test the drill for a three-inch drill hole, length of time for the drill bit to break, whether or not
        water was used to cool the drill bit when being used.

  4.    You are a loan manager for a bank. Your sample variables from the mortgage loans made in the
        last year are the selling price of the houses, square footage of the houses, age of the houses, number of
        rooms in the houses, number of bathrooms in the houses, whether the garage is detached (not connected
        directly to the house), and whether or not it is a corner lot.

  5.    You are a managing director of a large consulting firm. Your sample includes the following
        information about your consulting project managers. These are consultants with between five and
        seven years of experience who are responsible for day-to-day management of your consulting projects.
        You have their most current ranking (top third, middle third, or bottom third); their field of study in
        college (business, mathematics, engineering, liberal arts, science); whether they have a master’s
        degree or not; upper-division college GPA; whether they have any prior work experience; and whether their
        major client base is private sector or public sector.
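
Several of the variables in these cases are yes/no or category variables rather than measurements. One common convention, assumed here and not stated in the cases themselves, is to code such a variable as one or more 0/1 indicator columns so that it fits the β form. The helper names and the sample values below are hypothetical.

```python
def indicator(value, yes_level):
    """Return 1 if value equals yes_level, else 0 (a 0/1 indicator variable)."""
    return 1 if value == yes_level else 0

def indicators(value, levels, baseline):
    """Code a categorical value as 0/1 indicators, omitting the baseline level."""
    return [indicator(value, level) for level in levels if level != baseline]

# Hypothetical values, for illustration only
area_type = "urban"
field = "engineering"
fields = ["business", "mathematics", "engineering", "liberal arts", "science"]

print(indicator(area_type, "urban"))                    # two-level predictor
print(indicators(field, fields, baseline="business"))   # five-level predictor
```

A predictor with k levels needs only k - 1 indicator columns under this convention; the omitted baseline level is absorbed into β0.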

III. Multiple Linear Regression.
     Example: We will expand our natural gas consumption model to include a chill factor.

     Menu Options: Relate → Multiple Factors → Multiple Regression
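
For readers who want to see what a multiple regression fit involves, here is a minimal sketch that finds the least squares coefficients for two predictors by solving the normal equations. It is a stand-in for illustration, not a description of how Statgraphics computes its results; the function name and the temperature, chill factor, and gas consumption values are all made up.

```python
def fit_multiple_regression(rows, ys):
    """Least squares coefficients [b0, b1, ..., bk] via the normal equations."""
    X = [[1.0] + list(row) for row in rows]   # leading 1 gives the intercept b0
    p = len(X[0])
    # Build the augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * y for x, y in zip(X, ys))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Made-up observations: (average temperature, chill factor) and gas consumption
rows = [(28, 18), (32, 14), (39, 9), (45, 12), (52, 6), (58, 3), (34, 20), (41, 15)]
gas = [12.4, 11.1, 9.6, 9.0, 7.2, 6.1, 11.5, 9.9]

b0, b1, b2 = fit_multiple_regression(rows, gas)
print(f"GasCons-hat = {b0:.4f} + {b1:.4f}*Temp + {b2:.4f}*Chill")
```

The sketch assumes the predictors are not perfectly correlated with each other; if they were, the normal equations would have no unique solution.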

IV. Hypothesis Testing in Multiple Regression Analysis: With More Than One Predictor Variable.

    Example: Gas Consumption = f (Average Temperature, Chill Factor)
    It is a three-step process:
     Step I. State the hypotheses:
     H0: All βi’s = 0 versus Ha: Not all βi’s = 0

     Step II. This is the decision rule:

         Reject H0 in favor of Ha if and only if the p-value for any of the predictor variables is less
         than your α-value (0.10, 0.05, 0.01 or 0.001); otherwise fail to reject H0.

     Step III. State the results:
     For example, we do not reject H0 in favor of Ha because the p-value is 0.456, which is greater than 0.05.
         Note: The hypothesis test is not a test of the size of the slope, but a test of the size of the
         variation of the observations around the regression line.
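
The decision rule in Step II can be sketched as a small function. The function name is hypothetical; the first call uses the p-value and α-value from the Step III example, and the second uses a made-up p-value to show the other outcome.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when the p-value is below alpha."""
    if p_value < alpha:
        return "reject H0 in favor of Ha"
    return "fail to reject H0"

print(decide(0.456))          # p-value from the Step III example, alpha = 0.05
print(decide(0.003, 0.01))    # hypothetical p-value, for illustration only
```

Note that α is chosen before looking at the p-value; the function takes it as an input rather than picking it to suit the result.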

V. Assumptions for Linear Regression – both simple and multiple

There are six assumptions for linear regression.

   [Figure: plot of our natural gas consumption study, to be used to see how the following assumptions apply to the standard model for regression analysis]

   1. Linearity: The regression function is linear.

   2. Constant Variance: Error terms (e or ε) have a constant variance.

   3. Independence of Error Terms: The error terms (e or ε) are independent.

   4. There are no outliers.

   5. Normal Distribution of Error Terms: The error terms are normally distributed for each set of values
      of xi, yi.

   6. Completeness: No important (significant) predictors have been omitted from the model.

These assumptions are important for residual analysis, which is used to assess the validity of the regression
model.
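
As a rough illustration of residual analysis, the sketch below fits a simple regression line, computes each residual (observed y minus fitted y), and flags residuals larger than two residual standard errors as possible outliers. The cutoff of 2 is a common rule of thumb assumed here, not a rule from this handout, and the function names and data are made up.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least squares intercept b0 and slope b1 for y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def residual_report(xs, ys, cutoff=2.0):
    """Return residuals, residual standard error, and indices of large residuals."""
    b0, b1 = fit_line(xs, ys)
    e = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    # Residual standard error with n - 2 degrees of freedom
    s = sqrt(sum(v * v for v in e) / (len(xs) - 2))
    flagged = [i for i, v in enumerate(e) if abs(v) > cutoff * s]
    return e, s, flagged

# Made-up data with one unusual observation, for illustration only
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.0, 4.1, 5.9, 8.0, 10.1, 18.0, 14.0, 16.1]

e, s, flagged = residual_report(xs, ys)
print("residuals:", [round(v, 3) for v in e])
print("residual standard error:", round(s, 3))
print("possible outliers at x =", [xs[i] for i in flagged])
```

In practice the residuals would also be plotted against the fitted values and against time order to check the constant variance and independence assumptions; this sketch covers only the outlier check.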

VI. Review Exercises

1.   Returns on common stocks in North American markets and overseas markets appear to be growing
     more closely correlated as economies become more interdependent. Suppose this regression formula
     connects total annual returns in percent for North American markets and overseas markets:

Mean overseas return in %-terms = 4.7% + 0.66 x (Mean US return in %-terms)

     a.   What do we mean by correlated and how do we see it in the above formula? Do a freehand
          sketch of the regression model/formula in the space provided below:

     b.   What is the β0 in this line: _____________

     c.   What is the β1 in this line: _____________

     d.   What is the b0 in this line: _____________

     e.   What is the b1 in this line: _____________

     f.   If the Mean US return is 6%, what is the mean overseas return: ____________

     g.   Give an interpretation of the 4.7%

2.   There is some evidence that drinking moderate amounts of red wine can prevent heart attacks. The
     following table gives the related data for 1998. Do a free-hand scatter plot of the data in the area
     provided below. This time put in the data points – hint, find the range for each variable:

Country                  Italy        Spain        Sweden       Ireland     Canada       France
Wine Consumption         7.9          6.5          1.6          0.7         2.4          9.1
Heart Disease Index      107          86           207          300         191          71

     a.   What is the dependent variable: _________

     b.   What is your initial regression model:

     c.   Indicate – do not calculate - one residual on the plot.

     d.   The formula for the residual is:

3.   Why is the p-value compared to the alpha (α) value?

4.   What is the expected value of the sum of the errors – either ei or εi in the regression formula or regression
     model: ___________ - hint – write out the regression model: ______ = ______ + ______ +/-

     Why is this?