# Correlation

```					                                                   Overview

Correlation

Correlation is a (dimensionless) measure of the STRENGTH of the LINEAR
relationship between TWO variables. It takes values between -1 (a
perfectly-linear downward-sloping relationship) and +1 (perfectly-linear
upward-sloping).

Cautions:
Correlation measures the linearity of the relationship, not the slope.
Independent random variables are uncorrelated (i.e., have a correlation of
zero with one another), but uncorrelated variables may be related
nonlinearly, or linearly in a relationship including other variables.

Correlations show the two-dimensional shadows of a multi-dimensional
relationship (and shadows can be severe distortions of reality, either
concealing the truth, or even reversing it). Beware!

A correlation matrix:

Correlations

            Costs    Mileage        Age       Make
Costs     1.00000    0.77058    0.02346   -0.24036
Mileage   0.77058    1.00000   -0.49615   -0.47808
Age       0.02346   -0.49615    1.00000    0.16366
Make     -0.24036   -0.47808    0.16366    1.00000

The correlation between any variable and itself is 1; the correlation between
X and Y is equal to the correlation between Y and X. The positive correlation
between Costs and Mileage shows that the cars being driven further
typically are the cars with the higher maintenance costs. The negative
correlation between Mileage and Age shows that the newer cars are being
driven further (i.e., the lower values of Age are typically associated with the
higher values of Mileage, and vice versa). The negative correlation between
Costs and Make shows that the cars coded as Make = 1 currently are
experiencing lower costs, on average, than are the cars coded as Make = 0.
All of these observations should be viewed as "typical." For example,
it might well be that a regression analysis separating the effect of Make from
the effect of Mileage on Costs shows that, given cars of both makes driven
the same number of miles, you'd expect higher maintenance costs for the
car of Make = 1, and that the "typical" Make = 1 car has lower costs
currently only because the Make = 0 cars are being driven much further (as
indicated by the negative correlation between Mileage and Make).

Calculation:
Cov (X,Y) = E[XY] - E[X] E[Y]
Corr (X,Y) = Cov (X,Y) / (StdDev(X) • StdDev(Y))

In a SIMPLE (one independent variable) regression:
The (unadjusted) coefficient of determination is the square of the
correlation between the dependent and independent variables.

This is why the coefficient of determination is sometimes referred
to as the "R-squared" of the regression, where "R" represents the
correlation. (I dislike this terminology, since in the case of a
multiple regression, the coefficient of determination is not the
square of any meaningful quantity.)

Numerical Interpretation (1):
The magnitude of the correlation between two variables can be
interpreted by squaring it, and then viewing it as the coefficient of
determination in a simple regression of one of the variables onto
the other.

Again, in a SIMPLE (one independent variable) regression:
The slope of the regression line is:

b = Cov(X,Y) / Var(X) = Corr(X,Y) • ( StdDev(Y) / StdDev(X) )

When there is only one independent variable, the correlation
appears as the (otherwise meaningless) beta-weight. Indeed, the
following relationship (really just another way of writing the
prediction equation for a simple regression) helps to explain the
phenomenon known as "regression to the mean":
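(Ypred - Ybar) / sY  =  Corr(X,Y) • ( (X - Xbar) / sX )

where Xbar and Ybar are the means of X and Y, and sX and sY are their
standard deviations.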

Numerical Interpretation (2):
If one variable is z standard deviations above average, then
(with no other information available) one would expect the other to
be ( z • correlation ) standard deviations above average.

Demonstration chart

[Scatter plot of Y (vertical axis, 11 to 27) against X (horizontal axis,
1 to 19). Correlation(X,Y) = 0.884]

```
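
The Calculation formulas in the overview, and the first caution, can be checked on a few lines of data. The following is a minimal sketch, not part of the original handout: the helper names `cov`, `std_dev` and `corr` and all of the data are invented for illustration. It uses population (divide-by-n) moments so that it matches the E[XY] - E[X] E[Y] form directly.

```python
from statistics import mean

def cov(x, y):
    """Population covariance, written as E[XY] - E[X] E[Y]."""
    return mean([a * b for a, b in zip(x, y)]) - mean(x) * mean(y)

def std_dev(x):
    """Population standard deviation: the square root of Cov(X, X)."""
    return cov(x, x) ** 0.5

def corr(x, y):
    """Corr(X, Y) = Cov(X, Y) / (StdDev(X) * StdDev(Y))."""
    return cov(x, y) / (std_dev(x) * std_dev(y))

# Invented data, symmetric around zero.
x = [-2, -1, 0, 1, 2]

y_gentle = [0.1 * v + 1 for v in x]   # perfectly linear, shallow upward slope
y_steep = [30 * v + 1 for v in x]     # perfectly linear, steep upward slope
y_down = [-0.5 * v for v in x]        # perfectly linear, downward slope
y_curve = [v * v for v in x]          # fully determined by x, but not linearly

print(corr(x, y_gentle))  # close to +1, despite the shallow slope
print(corr(x, y_steep))   # also close to +1: linearity, not slope
print(corr(x, y_down))    # close to -1
print(corr(x, y_curve))   # zero, although y_curve depends entirely on x
```

The two upward-sloping lines have very different slopes and the same correlation, and the last pair is uncorrelated even though one variable is an exact function of the other.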
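
The simple-regression statements in the overview (the slope formula, the coefficient of determination as the squared correlation, and Numerical Interpretation (2)) are identities, so they can be verified numerically. This is a sketch under stated assumptions: the data are invented, the helper names are hypothetical, and population moments are used throughout.

```python
from statistics import mean

def cov(x, y):
    """Population covariance, E[XY] - E[X] E[Y]."""
    return mean([a * b for a, b in zip(x, y)]) - mean(x) * mean(y)

def std_dev(x):
    return cov(x, x) ** 0.5

def corr(x, y):
    return cov(x, y) / (std_dev(x) * std_dev(y))

# Invented data for illustration only.
x = [1, 3, 5, 7, 9, 11, 13]
y = [12, 15, 14, 19, 18, 24, 22]

r = corr(x, y)

# Slope and intercept of the simple regression of y on x.
slope = cov(x, y) / cov(x, x)
intercept = mean(y) - slope * mean(x)

# Coefficient of determination computed from the residuals.
pred = [intercept + slope * v for v in x]
sse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
sst = sum((yi - mean(y)) ** 2 for yi in y)
r_squared = 1 - sse / sst

# Regression to the mean: take an x that is z standard deviations above
# average and see how many standard deviations above average the
# predicted y is.
z = 2.0
x_new = mean(x) + z * std_dev(x)
y_new = intercept + slope * x_new
z_pred = (y_new - mean(y)) / std_dev(y)

print(slope, r * std_dev(y) / std_dev(x))  # the two slope formulas agree
print(r_squared, r ** 2)                   # R-squared equals corr squared
print(z_pred, z * r)                       # predicted z-score is z * corr
```

Because the correlation never exceeds 1 in magnitude, the predicted value is never further from its mean (in standard deviations) than the input was from its own mean.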
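
The warning about Costs and Make (a negative correlation that might reverse once Mileage is accounted for) can be reproduced with made-up numbers. The sketch below does not use the handout's data: the ten cars, the cost rule `10 * mileage + 50 * make`, and all helper names are assumptions chosen so that Make = 1 cars cost more at any given mileage but are driven less. To separate the two effects it uses the residual-on-residual approach (regress both Costs and Make on Mileage, then regress one set of residuals on the other), which gives the same Make coefficient as a multiple regression of Costs on Mileage and Make.

```python
from statistics import mean

def cov(x, y):
    """Population covariance, E[XY] - E[X] E[Y]."""
    return mean([a * b for a, b in zip(x, y)]) - mean(x) * mean(y)

def corr(x, y):
    return cov(x, y) / (cov(x, x) ** 0.5 * cov(y, y) ** 0.5)

def residuals(y, x):
    """Residuals from the simple regression of y on x."""
    slope = cov(x, y) / cov(x, x)
    intercept = mean(y) - slope * mean(x)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Invented fleet: five Make = 0 cars driven further, five Make = 1 cars
# driven less. Mileage is in arbitrary units.
make = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
mileage = [40, 50, 60, 70, 80, 10, 20, 30, 40, 50]

# Assumed cost rule: at equal mileage, a Make = 1 car costs 50 more.
costs = [10 * m + 50 * k for m, k in zip(mileage, make)]

# The raw correlation is the "two-dimensional shadow".
raw_corr = corr(costs, make)

# Effect of Make on Costs after removing what Mileage explains in each.
costs_res = residuals(costs, mileage)
make_res = residuals(make, mileage)
partial_make = cov(make_res, costs_res) / cov(make_res, make_res)

print(raw_corr)      # negative: Make = 1 cars have lower costs on average
print(partial_make)  # positive: at equal mileage, Make = 1 costs more
```

The sign flips because Make and Mileage are themselves negatively correlated in this invented fleet, which is the same pattern the correlation matrix in the overview hints at.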