Introduction to Regression Analysis by moti


Introduction to Regression
Dependent variable (response variable)
   Measures an outcome of a study
       Income
       GRE scores
   Dependent variable = Mean (expected value)
    + random error
       y = E(y) + ε
   If y is normally distributed and we know its mean
    and standard deviation, we can make a
    probability statement about y
Probability statement

   Let’s say the mean cholesterol level for
    graduate students= 250
   Standard deviation= 50 units
   What does this distribution look like?
   “The probability that ____’s cholesterol will fall
    within 2 standard deviations of the mean (between 150 and 350)
    is about .95.”
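If cholesterol level is normally distributed with mean 250 and standard deviation 50, this probability can be computed directly from the normal CDF. The sketch below uses Python's standard-library `statistics.NormalDist`; treating cholesterol levels as exactly normal is the assumption stated on the slide, not something checked against data.

```python
from statistics import NormalDist

# Cholesterol distribution from the example: mean 250, standard deviation 50.
cholesterol = NormalDist(mu=250, sigma=50)

# Bounds at 2 standard deviations below and above the mean.
lower = cholesterol.mean - 2 * cholesterol.stdev
upper = cholesterol.mean + 2 * cholesterol.stdev

# P(lower <= y <= upper) = F(upper) - F(lower), where F is the normal CDF.
p_within_2sd = cholesterol.cdf(upper) - cholesterol.cdf(lower)
print(f"P({lower:.0f} <= y <= {upper:.0f}) = {p_within_2sd:.4f}")
```

Changing the multiplier from 2 to 1 or 3 gives the other familiar benchmarks for a normal distribution (roughly .68 and .997).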
Independent variables (predictor variables)
   Explain or cause changes in the response variable
       (The effect of the IV on the DV)
       (Predicting the DV based on the IV)
   What independent variables might help us
    predict cholesterol levels?

   The effect of a reading intervention
    program on student achievement in
   Predict state revenues
   Predict GPA based on SAT
   Predict reaction time from blood alcohol content
Regression Analysis

   Build a model that can be used to predict one
    variable (y) based on other variables (x1, x2,
    x3, …, xk)
       Model: a prediction equation relating y to x1, x2,
        x3, …, xk
   Predict with a small amount of error
Typical Strategy for Regression Analysis

   1. Conduct exploratory data analysis
   2. Develop one or more tentative models
   3. Identify the most suitable model
   4. Make inferences based on the model

Fitting the Model: Least Squares Method

   Model: an equation that describes the
    relationship between variables
   Let’s look at the persistence example
Method of Least Squares

   Slope: $\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}}$

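Finding the Least Squares Line

   Slope: $\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}}$,
    where $SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
    and $SS_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$

   Intercept: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$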
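The slope and intercept formulas translate directly into code. This is a minimal sketch: the data are hypothetical toy values (not the persistence example, whose data are not reproduced here), and `least_squares_line` is an illustrative helper name.

```python
def least_squares_line(x, y):
    """Return (b0_hat, b1_hat) for the straight-line model fitted by least squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # SS_xy = sum of (x - x_bar)(y - y_bar); SS_xx = sum of (x - x_bar)^2
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    if ss_xx == 0:
        raise ValueError("x values must not all be equal")
    b1_hat = ss_xy / ss_xx           # slope
    b0_hat = y_bar - b1_hat * x_bar  # intercept
    return b0_hat, b1_hat

# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0_hat, b1_hat = least_squares_line(x, y)
print(f"y_hat = {b0_hat:.2f} + {b1_hat:.2f}x")
```

Here x̄ = 3, ȳ = 4, SS_xy = 6 and SS_xx = 10, so the slope is 0.6 and the intercept is 4 − 0.6(3) = 2.2.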

   The least squares line makes the vertical distances of the
    data points from the line as small as possible, in this sense:
       The SE [Sum of Errors (deviations from the line, i.e., residuals)]
        equals 0
       The SSE (Sum of Squared Errors) is smaller than for any
        other straight-line model with SE = 0
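Both properties can be checked numerically. The sketch below uses hypothetical toy data and compares the least squares line against a few other lines that also have SE = 0 (any line through the point (x̄, ȳ) has residuals summing to 0). The helper names and the alternative slopes are illustrative choices, not part of the original example.

```python
def fit_least_squares(x, y):
    """Return (b0_hat, b1_hat) using SS_xy / SS_xx and y_bar - b1_hat * x_bar."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1_hat = ss_xy / ss_xx
    return y_bar - b1_hat * x_bar, b1_hat

def residuals(x, y, b0, b1):
    """Deviations of each observed y from the line b0 + b1 * x."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def sse(x, y, b0, b1):
    """Sum of squared errors for the line b0 + b1 * x."""
    return sum(e ** 2 for e in residuals(x, y, b0, b1))

# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0_hat, b1_hat = fit_least_squares(x, y)

se_ls = sum(residuals(x, y, b0_hat, b1_hat))  # 0 up to floating-point rounding
sse_ls = sse(x, y, b0_hat, b1_hat)

# Other lines through (x_bar, y_bar) also have SE = 0, but a larger SSE.
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
other_sses = []
for other_b1 in [0.0, 0.3, 0.9, 1.2]:
    other_b0 = y_bar - other_b1 * x_bar
    other_sses.append(sse(x, y, other_b0, other_b1))

print(f"least squares: SE = {se_ls:.1e}, SSE = {sse_ls:.2f}")
print("other SE = 0 lines, SSE:", [round(s, 2) for s in other_sses])
```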
Regression Line

   Has the form y = a + bx
       b is the slope, the amount by which y changes
        when x increases by 1 unit
       a is the y-intercept, the value of y when x = 0 (the
        point at which the line crosses the y-axis)
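A small sketch of the two interpretations, using hypothetical values a = 2 and b = 0.5 and an illustrative helper `regression_line`: evaluating the line at x = 0 returns the intercept, and increasing x by 1 changes the prediction by exactly the slope.

```python
def regression_line(a, b):
    """Return a function computing y = a + b*x (hypothetical helper)."""
    def predict(x):
        return a + b * x
    return predict

line = regression_line(a=2.0, b=0.5)  # hypothetical intercept and slope

print(line(0))            # value of y at x = 0: the y-intercept a
print(line(5) - line(4))  # change in y when x increases by 1 unit: the slope b
```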
Simplest of the probabilistic models:
Straight-Line Regression Model
 First order linear model
 Equation: y = β0 + β1x + ε

 Where

  y = dependent variable
  x = independent variable
  β0 = y-intercept
  β1 = slope of the line
 ε = random error component
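To see the roles of β0, β1, and ε, one can simulate data from the straight-line model and then recover the parameters by least squares. In this sketch the true values β0 = 5 and β1 = 2, the error standard deviation σ = 1, normally distributed errors, and x drawn uniformly from 0 to 10 are all assumptions chosen for illustration.

```python
import random

def simulate_straight_line(beta0, beta1, sigma, xs, rng):
    """Draw y = beta0 + beta1*x + eps, with eps ~ Normal(0, sigma) (an assumption)."""
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

def least_squares(x, y):
    """Return (b0_hat, b1_hat) using SS_xy / SS_xx and y_bar - b1_hat * x_bar."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1_hat = ss_xy / ss_xx
    return y_bar - b1_hat * x_bar, b1_hat

rng = random.Random(42)  # fixed seed so the run is reproducible
xs = [rng.uniform(0, 10) for _ in range(2000)]
ys = simulate_straight_line(beta0=5.0, beta1=2.0, sigma=1.0, xs=xs, rng=rng)

b0_hat, b1_hat = least_squares(xs, ys)
print(f"beta0_hat = {b0_hat:.3f}, beta1_hat = {b1_hat:.3f}")
```

With 2,000 points the estimates should land close to 5 and 2; rerunning with a smaller sample or a larger σ shows how the random error component makes the fitted line vary from sample to sample.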
Let’s look at the relationship between two
variables and construct the line of best fit
   Minitab example: Beers and BAC
