# Introduction to Regression Analysis by moti

## Dependent Variable (Response Variable)

- Measures an outcome of a study
  - e.g., income, GRE scores
- Dependent variable = mean (expected value) + random error:
  - y = E(y) + ε
- If y is normally distributed and we know the mean and the standard deviation, we can make a probability statement.
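The decomposition y = E(y) + ε can be sketched with a quick simulation; the mean of 200 and SD of 50 below are made-up illustrative values, not from the slides:

```python
import random

random.seed(42)

# Hypothetical values: E(y) = 200, normally distributed error with SD = 50.
mean_y = 200
sigma = 50

# Each observation is the mean plus a random error: y = E(y) + ε
samples = [mean_y + random.gauss(0, sigma) for _ in range(10_000)]

# Averaging many observations recovers something close to E(y),
# because the random errors average out toward 0.
avg = sum(samples) / len(samples)
print(round(avg))  # close to 200
```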
## Probability Statement

- Suppose the mean cholesterol level is known for a population, with a standard deviation of 50 units.
- What does this distribution look like?
- "The probability that ____'s cholesterol will fall within 2 standard deviations of the mean is .95."
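As a sketch of such a probability statement, Python's standard library can compute the probability of falling within 2 standard deviations of the mean. The slide gives only the SD of 50; the mean of 200 here is a hypothetical value:

```python
from statistics import NormalDist

# Hypothetical mean of 200; the slide specifies SD = 50 units.
chol = NormalDist(mu=200, sigma=50)

# P(mean - 2*SD < X < mean + 2*SD)
p = chol.cdf(200 + 2 * 50) - chol.cdf(200 - 2 * 50)
print(round(p, 4))  # ≈ 0.9545, the ".95" quoted on the slide
```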
## Independent Variables (Predictor Variables)

- Explain or cause changes in the response variable
  - The effect of the IV on the DV
  - Predicting the DV based on the IV
- What independent variables might help us predict cholesterol levels?
## Examples

- The effect of a reading intervention program on student achievement
- Predicting state revenues
- Predicting GPA based on SAT scores
- Predicting reaction time from blood alcohol level
## Regression Analysis

- Build a model that can be used to predict one variable (y) based on other variables (x1, x2, x3, ..., xk)
- Model: a prediction equation relating y to x1, x2, x3, ..., xk
- Goal: predict with a small amount of error
## Typical Strategy for Regression Analysis

1. Start
2. Conduct exploratory data analysis
3. Develop one or more tentative models
4. Identify the most suitable model
5. Make inferences based on the model
6. Stop
## Fitting the Model: Least Squares Method

- Model: an equation that describes the relationship between variables
- Let's look at the persistence example
## Method of Least Squares

- Let's look at the persistence example

$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}}$$

## Finding the Least Squares Line

- Slope: $\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}}$
- Intercept: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
- The least squares line makes the vertical distances of the data points from the line as small as possible:
  - The SE [Sum of Errors (deviations from the line, i.e., residuals)] equals 0.
  - The SSE (Sum of Squared Errors) is smaller than for any other straight-line model with SE = 0.
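A minimal sketch of these formulas on made-up data (the x and y values below are purely illustrative, not the persistence example):

```python
# Illustrative data, invented for this sketch.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# SS_xy = sum((x - x_bar)(y - y_bar)); SS_xx = sum((x - x_bar)^2)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
ss_xx = sum((x - x_bar) ** 2 for x in xs)

b1 = ss_xy / ss_xx        # slope estimate, beta_1-hat
b0 = y_bar - b1 * x_bar   # intercept estimate, beta_0-hat

# For the least squares line, the residuals (SE) sum to 0.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(b0, b1, sum(residuals))
```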
## Regression Line

- Has the form y = a + bx
- b is the slope: the amount by which y changes when x increases by 1 unit
- a is the y-intercept: the value of y when x = 0 (the point at which the line cuts through the y-axis)
## Simplest of the Probabilistic Models: the Straight-Line Regression Model

- First-order linear model
- Equation: y = β0 + β1x + ε

where:

- y = dependent variable
- x = independent variable
- β0 = y-intercept
- β1 = slope of the line
- ε = random error component

## Line of Best Fit

Let's look at the relationship between two variables and construct the line of best fit.

- Minitab example: Beers and BAC
