Regression

Shared by: HC1112140287
posted:
12/13/2011
Regression

• Regression: Mathematical method for determining the

best equation that reproduces a data set

• Linear Regression: Regression method applied with

a linear model (straight line)

• Uses

– Prediction of new X,Y values

– Understanding data behavior

• Verification of hypotheses/physical laws

Regression

• The Linear Model

Y = mX + b

Y = Dependent variable

X = Independent variable

m = slope = ΔY/ΔX

b = y-intercept (point where line crosses y-axis at x=0)

[Figure: straight line through the points (X1=1, Y1=2.4) and (X2=20, Y2=10), with ΔX and ΔY marked. X axis: 0 to 25. Y axis: 0 to 12.]
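
As a small worked example of the linear model, the sketch below computes m and b from the two points given on the slide, (1, 2.4) and (20, 10). The variable names are chosen here for illustration only.

```python
# Two points taken from the slide's example: (X1, Y1) and (X2, Y2).
x1, y1 = 1.0, 2.4
x2, y2 = 20.0, 10.0

# Slope: change in Y divided by change in X.
m = (y2 - y1) / (x2 - x1)

# Intercept: solve Y = mX + b for b using either point.
b = y1 - m * x1

print(f"m = {m:.3f}, b = {b:.3f}")  # m = 0.400, b = 2.000
```

With only two points the line passes through both exactly, so there is nothing to "fit" yet; regression is needed once there are more points than a single line can pass through.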

Regression

• Fitting the data: finding the equation for the straight

line that does the best job of reproducing the data.

[Figure: Average Income versus % with a College Degree (by State). X axis: Percentage of Population with College Degree or Higher (10 to 35). Y axis: Average Income Level ($ per year) (15,000 to 40,000).]

Regression

• Residual: Difference between measured and

calculated Y-values



[Figure: Average Income versus % with a College Degree (by State). X axis: Percentage of Population with College Degree or Higher (15 to 20). Y axis: Average Income Level ($ per year) (22,000 to 26,000).]

Regression Analysis

• Use the least squares method to “best fit” a straight line through the data points.

• A straight line is described by its slope and “y”-intercept in an x-y plot.

• We need to determine the numerical values of the slope and the “y”-intercept from the data.

• This is equivalent to adding a trendline to your scatter plot in Excel.

Regression Analysis

• The least squares method starts by defining a difference, called the residual, between a data point's measured “y” value and the regression line's “y” value at the same measured “x” value.

• Then add up the squared residuals for all data points.

• Finally, adjust the slope and the “y”-intercept of the regression line so that the sum of squared residuals, called the regression error, has the smallest value.
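
The regression error described above can be sketched as a short function. The data values and the two candidate lines are hypothetical, invented here only to show that a better-fitting line gives a smaller sum of squared residuals.

```python
def sum_squared_residuals(xs, ys, m, b):
    """Sum of squared residuals for the candidate line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, not taken from the slides.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Two candidate lines; the one with the smaller error fits better.
err_a = sum_squared_residuals(xs, ys, m=2.0, b=0.0)
err_b = sum_squared_residuals(xs, ys, m=1.5, b=1.0)
print(err_a, err_b)
```

Least squares regression picks the single (m, b) pair for which this quantity is as small as possible.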

Regression Analysis

• The covariance appears in the calculation

of the correlation coefficient between the

measurements of two variables.

• Let us denote the two variables as “x” and

“y”.

• Their measurements are the “x” data set

and the “y” data set.
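
For readers who want to see the covariance and correlation coefficient computed, here is a minimal sketch. It assumes the sample (n − 1) form of the covariance and the Pearson correlation coefficient; the slides do not say which convention they use, and the data are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance of two equal-length data sets (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance over the product of standard deviations."""
    sx = math.sqrt(covariance(xs, xs))  # covariance of x with itself is its variance
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)

# Hypothetical data, not taken from the slides.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(covariance(xs, ys), correlation(xs, ys))
```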

Regression Analysis

• The slope of the regression line is given by the ratio of the covariance between the “x” and “y” data sets to the variance of the “x” data set.

• You then use the equation of the line to determine the y-intercept: b = (mean of y) − m × (mean of x). You MUST use the mean of x and the mean of y in this equation: the regression line always passes through that point, whereas your individual data points are likely not on the regression line.
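
The two steps above (slope from covariance over variance, intercept from the means) can be sketched as follows. The function name `fit_line` and the data are assumptions made for this illustration.

```python
def fit_line(xs, ys):
    """Least squares slope and intercept for y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/(n - 1) factors of covariance and variance cancel in the ratio,
    # so sums of deviation products are enough.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The regression line passes through (mean_x, mean_y).
    b = mean_y - m * mean_x
    return m, b

# Hypothetical data, not taken from the slides.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
m, b = fit_line(xs, ys)
print(f"m = {m:.2f}, b = {b:.2f}")  # m = 1.99, b = 0.05
```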

Regression Analysis

• Once we have determined the slope and the “y”-intercept of the regression line, we have a mathematical relation that ties the “x” variable to the “y” variable.

• We can use this relation to predict values of “y” for “x” values that are not in the data set.

Regression Analysis

• Interpolation – the process by which we

use the regression line to predict a value

of the “y” variable for a value of the “x”

variable that is not one of the data points

but is within the range of the data set.

• The “x” and “y” points will lie on the

regression line.

Regression Analysis

• Extrapolation – the process by which we

use the regression line to predict a value

of the “y” variable for a value of the “x”

variable that is outside of the range of the

data set.

• The “x” and “y” points also lie on the

regression line but outside of the range of

the data set.
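
The distinction between interpolation and extrapolation comes down to whether the “x” value falls inside the range of the measured data. A minimal sketch, using a hypothetical line and data range:

```python
def predict(x, m, b, x_min, x_max):
    """Predict y on the line y = m*x + b and label the kind of prediction."""
    y = m * x + b
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind

# Hypothetical regression line and data range, for illustration only.
m, b = 2.0, 1.0
x_min, x_max = 1.0, 5.0

inside = predict(3.5, m, b, x_min, x_max)
outside = predict(8.0, m, b, x_min, x_max)
print(inside)   # (8.0, 'interpolation')
print(outside)  # (17.0, 'extrapolation')
```

Both predictions lie on the same line; the label is a reminder that an extrapolated value is not supported by any nearby measurements.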

Tricks of the Trade

• A curve can be partitioned into sections, with a different curve “best” fitted in each section.

• Use scaling as a means to increase the accuracy of the “fitted” curve.
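
The first trick, partitioning, might look like the sketch below: the data are split at a breakpoint and a separate straight line is fitted to each section. The breakpoint, the data, and the helper names are all hypothetical; scaling is not shown.

```python
def fit_line(xs, ys):
    """Least squares slope and intercept for y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x

def fit_two_sections(xs, ys, breakpoint):
    """Fit one line to points with x < breakpoint and another to the rest."""
    left = [(x, y) for x, y in zip(xs, ys) if x < breakpoint]
    right = [(x, y) for x, y in zip(xs, ys) if x >= breakpoint]
    left_fit = fit_line([x for x, _ in left], [y for _, y in left])
    right_fit = fit_line([x for x, _ in right], [y for _, y in right])
    return left_fit, right_fit

# Hypothetical data with a visible change in slope at x = 4.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0.0, 1.0, 2.0, 3.0, 6.0, 9.0, 12.0, 15.0]
left_fit, right_fit = fit_two_sections(xs, ys, breakpoint=4.0)
print(left_fit, right_fit)
```

Here the breakpoint is chosen by eye; each section needs at least two distinct x values for the fit to be defined.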

Multivariate Analysis

Regression

• Prediction: Once the best fit line has been determined,

the equation can be used to predict new values of Y for

any given X and vice versa. (Interpolation/Extrapolation)

y = 772.03x + 10810



If a state's percentage of the population with a college degree is 20%, then it can expect an average income level of

y = 772.03(20) + 10810 ≈ $26,251

If a state's average income level is $30,000, then what % of its population has a college degree?

x = (30,000 − 10,810)/772.03 ≈ 24.9%
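
The two calculations above can be checked with a few lines of code. The slope and intercept come from the fitted line y = 772.03x + 10810 on the slide; the function names are made up for this sketch.

```python
SLOPE = 772.03       # dollars per percentage point, from the fitted line
INTERCEPT = 10810.0  # dollars, from the fitted line

def income_from_percent(percent):
    """Predict average income from % of population with a college degree."""
    return SLOPE * percent + INTERCEPT

def percent_from_income(income):
    """Invert the line: solve y = m*x + b for x."""
    return (income - INTERCEPT) / SLOPE

income = income_from_percent(20)
percent = percent_from_income(30000)
print(f"${income:,.0f}")  # $26,251
print(f"{percent:.1f}%")  # 24.9%
```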

Multivariate Analysis



• Excel Functions and Tools

– SLOPE() - Returns the slope when passed Y and X data (Excel expects the Y range first).

– INTERCEPT() - Returns the intercept when passed Y and X data.

– LINEST() - Returns the slope and intercept when passed Y and X data.

– TREND() - Returns predicted values along a linear trend when passed Y and X data.

– Trendline (from the Chart menu) - Returns the trendline, its equation, and the R-squared value (the square of the correlation coefficient) for a set of X,Y data.

