Regression I: Introduction

					Introduction to Regression

        MSIT3000
        Lecture 18
                 Objectives

 Learn key terms and uses of regression.
 Describe the assumptions needed for simple
  Ordinary Least Squares regression.
 Estimate the parameters for a simple linear
  probabilistic model.
Text: 9.1, 9.2 & 9.3


               What is regression?

 To ‘regress’ one variable on another is to ‘fit’ a
  function.
 The simplest function to fit is:
      Y=A          (Not very useful).
 The second simplest function to fit is:
      Y = A + Bx    (Remarkably useful!)
 ‘Regression’ refers to finding values for A & B
  from values of X & Y.
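As a concrete illustration of what "finding values for A & B" produces, here is a minimal Python sketch; the data points and the use of numpy.polyfit are my own choices for illustration, not taken from the text:

```python
# Minimal sketch: regressing y on x, i.e. finding A and B in y = A + Bx.
# The (x, y) values below are invented purely for illustration.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2])

# polyfit with degree 1 returns the coefficients [B, A] (highest power first).
B, A = np.polyfit(x, y, deg=1)
print(f"fitted line: y = {A:.2f} + {B:.2f}x")
```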

                Fitting a line to data:

[Figure: scatter plot of Sales (vertical axis) against Advertising (horizontal axis); example from p. 455.]
Data with “regression line”

[Chart: 'Fitted' Line with fitted equation y = 0.7x - 0.1; series plotted against x: y, Predicted y, and Linear (Predicted y).]
     What is regression useful for?

 Marketing: advertising & sales models.
 Real estate: estimating the value of property and
  property attributes.
 Finance: Valuing assets. Modeling default risk.
  Establishing benchmarks.
 Accounting: Measuring financial performance –
  what is an appropriate benchmark?
 Organization Behavior: Relating performance to
  different kinds of pay or responsibilities.

                       Terminology

 Dependent variable.
      This is what you wish to model, explain and predict. In a
       sales-advertising model, you would want to predict sales
       based on how much you advertise.
 Independent variable (a.k.a. explanatory variable or
  predictor):
      This is the input to the model (advertising, in the sales
       advertising model).


                        Terminology

 Probabilistic vs deterministic models.
       Deterministic models have no room for ‘error’: if y = a + bx,
        then the equation must hold exactly for every pair of x and y.
      Probabilistic models recognize that there may be some
       ‘disturbance’ in our data. We therefore add noise to the
        model: y = a + bx + ε
 The noise term is denoted by ε and is also known as:
      Disturbance
      Random error
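A rough sketch of the difference between the two kinds of model, in Python; the values of a, b, and the noise standard deviation sigma below are assumed purely for illustration:

```python
# Sketch: deterministic vs. probabilistic model (illustrative parameter values).
import numpy as np

rng = np.random.default_rng(seed=42)
a, b, sigma = 1.0, 0.5, 0.3                      # assumed intercept, slope, noise sd

x = np.linspace(0.0, 6.0, 20)
y_deterministic = a + b * x                      # y = a + bx holds exactly, no error
epsilon = rng.normal(0.0, sigma, size=x.size)    # random disturbance (random error)
y_probabilistic = a + b * x + epsilon            # observations scatter around the line
```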

                    Terminology

 Ordinary Least Squares regression:
      Ordinary refers to the deterministic part of the
       model being linear. We will expand further on what
       “linear” means when we get to multiple regression.
     ‘Least Squares’ refers to how we find the
      regression line. More on that shortly.


                     Where are we?

 We have a few terms and definitions.
 We have a set of problems in business that
  regression is useful for.
 We have found that it is possible to ‘fit’ a
  regression line by sight.
      The main problem with this method: it is subjective.
 This was terminology & motivation; now we will
  examine a method to find the regression line
  objectively.

                   Assumptions

 In order to fit a linear regression line, we
  need the following assumptions (cf. text):
  1.   Y = β0 + β1x + ε (implied in text).
  2.   ε ~ N(0, σ²)
  3.   εi & εj are independent if i ≠ j




                          Fitting an OLS line

 The ‘fitted line’:
        ŷ = b0 + b1 x   (equivalently, ŷ = β̂0 + β̂1 x)
      We can find ‘errors’ [a.k.a. ‘prediction errors’ or
       ‘residuals’] for each pair of x & y:
           e = y - ŷ
 How can we use the errors to find a “best”
  line through our data?
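To make the residuals concrete, here is a short Python sketch; the (x, y) pairs are illustrative values chosen to be consistent with the fitted line y = 0.7x - 0.1 shown earlier, not data quoted from the text:

```python
# Sketch: prediction errors (residuals) e = y - yhat for the line yhat = 0.7x - 0.1.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 1.0, 2.0, 2.0, 4.0])   # illustrative data matching the slide's line

y_hat = 0.7 * x - 0.1      # predictions from the candidate line
e = y - y_hat              # one residual per (x, y) pair
print(e)                   # roughly [ 0.4 -0.3  0.0 -0.7  0.6]
print(e.sum())             # about 0: positive and negative errors cancel,
                           # which is why we square them before summing
```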

[Charts: the 'Fitted' Line plot (y = 0.7x - 0.1) shown again, together with the x Residual Plot, which plots the residuals against x.]
                 Using the error terms

 In order to minimize the error in some meaningful
  way, we must first measure the overall error. How?
      we square each error to make sure each component of the
       overall error term is positive.
      then we sum all the squared error terms in order to get a
       measure for all of the data.
       finally we minimize that function. With respect to which
        ‘variables’? The parameter estimates (the minimization is
        spelled out below).
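Written out (a standard derivation, not shown explicitly in the slides), the quantity we minimize is the sum of squared errors, and setting its partial derivatives with respect to the two parameter estimates to zero gives the formulas on the next slide:

```latex
\mathrm{SSE}(\hat{\beta}_0, \hat{\beta}_1)
  = \sum_{i=1}^{n} e_i^2
  = \sum_{i=1}^{n} \bigl( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \bigr)^2,
\qquad
\frac{\partial\,\mathrm{SSE}}{\partial \hat{\beta}_0} = 0,
\quad
\frac{\partial\,\mathrm{SSE}}{\partial \hat{\beta}_1} = 0 .
```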


                              Formulas

 When we minimize SSE with respect to the parameter
  estimates, we find that:
      the slope: β̂1 = SS_xy / SS_xx
      the intercept: β̂0 = ȳ - β̂1·x̄
          this is another way of saying that the OLS line passes
           through the point of sample means, (x̄, ȳ).
      Where:
        SS_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y}) = \sum_i x_i y_i - \frac{(\sum_i x_i)(\sum_i y_i)}{n}
        SS_{xx} = \sum_i (x_i - \bar{x})^2 = \sum_i x_i^2 - \frac{(\sum_i x_i)^2}{n}
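As a quick numerical check of these formulas, here is a Python sketch using the same illustrative data as before (chosen to be consistent with the line y = 0.7x - 0.1, not quoted from the text):

```python
# Sketch: the shortcut formulas for SS_xy and SS_xx, then the OLS estimates.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 1.0, 2.0, 2.0, 4.0])   # illustrative data matching the slide's line
n = x.size

SS_xy = (x * y).sum() - x.sum() * y.sum() / n    # = sum((xi - xbar) * (yi - ybar))
SS_xx = (x ** 2).sum() - x.sum() ** 2 / n        # = sum((xi - xbar) ** 2)

b1 = SS_xy / SS_xx                 # slope estimate: 0.7
b0 = y.mean() - b1 * x.mean()      # intercept estimate: -0.1
print(f"yhat = {b0:.1f} + {b1:.1f}x")   # the line passes through (xbar, ybar)
```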
                    Conclusion

 Objectives addressed:
     Terminology and some uses of regression.
      Assumptions needed for OLS regression.
     Estimating the OLS parameters.
 Problem: Example on page 479.


