# Correlation by benbenzhou

VIEWS: 5 PAGES: 17

• pg 1
```									              Covariance and Correlation

HAPINES2 3 4 5
Questions:
What does it mean to say that
two variables are associated
with one another?
How can we mathematically
formalize the concept of
association?
0 1
1   2   3   4   5

CO F F E
Limitation of covariance

• One limitation of the covariance is that the size of the
covariance depends on the variability of the
variables.
• As a consequence, it can be difficult to evaluate the
magnitude of the covariation between two variables.
– If the amount of variability is small, then the
highest possible value of the covariance will also
be small. If there is a large amount of variability,
the maximum covariance can be large.
Limitations of covariance

• Ideally, we would like to evaluate the magnitude of
the covariance relative to maximum possible
covariance
• How can we determine the maximum possible
covariance?
Go vary with yourself

• Let’s first note that, of all the variables a variable may
covary with, it will covary with itself most strongly
• In fact, the “covariance of a variable with itself” is an
alternative way to define variance:

  X  M  X  M   cov
X            X
XX
N

 X  M   X
2

 cov XX  varX
N
Go vary with yourself

• Thus, if we were to divide the covariance of a
variable with itself by the variance of the variable, we
would obtain a value of 1. This will give us a
standard for evaluating the magnitude of the
covariance.

  X  M  X  M 
X          X
Note: I’ve written
the variance of X as
Ns X s X                 sX  sX because the
variance is the SD
squared
Go vary with yourself

• However, we are interested in evaluating the
covariance of a variable with another variable (not with
itself), so we must derive a maximum possible
covariance for these situations too.
• By extension, the covariance between two variables
cannot be any greater than the product of the SD’s for
the two variables.
• Thus, if we divide by sxsy, we can evaluate the
magnitude of the covariance relative to 1.

  X  M Y  M 
X         Y

Ns X sY
Spine-tingling moment

• Important: What we’ve done is taken the covariance
and “standardized” it. It will never be greater than 1
(or smaller than –1). The larger the absolute value of
this index, the stronger the association between two
variables.
Spine-tingling moment
• When expressed this way, the covariance is called a
correlation
• The correlation is defined as a standardized
covariance.

  X  M Y  M   r
X         Y

Ns X sY
Correlation
z    X   zY
r
N
• It can also be defined as the average product of z-
scores because the two equations are identical.
• The correlation, r, is a quantitative index of the
association between two variables. It is the average
of the products of the z-scores.
• When this average is positive, there is a positive
correlation; when negative, a negative correlation
y0 1 2
•Mean of each variable is
zero
A

•A, D, & B are above the
D
mean on both variables
B       •E & C are below the mean
F
on both variables

E
•F is above the mean on x,
but below the mean on y
-2 -1

C

-2-10     1   2

x
y0 1 2   +=   ++=+
A

D

B

F

E
-2 -1

C

=+   +=

-2-10       1   2

x
Correlation
Person                Zx            Zy            ZxZy
A                   1.55          1.39           2.15
B                   0.15          0.28           0.04
C                  -0.75         -1.44           1.08
D                   0.48          0.64           0.31
E                  -1.34         -0.69           0.92
F                   0.08         -0.19          -0.02

 z ZxZy = .4.49
 z  4 49
x y
-2 -1 0y 1 2

A

 x y N .75
zzZxZy = .75
D

B

F

E
N
C

-2-10     1   2

x
Correlation

• The value of r can range between -1 and + 1.
• If r = 0, then there is no correlation between the two
variables.
• If r = 1 (or -1), then there is a perfect positive (or
negative) relationship between the two variables.
-2 -1 y02 1 2 3

2

r=+1
- - 0
11

y1
23
2
-3 -2 -1 y2 0 1 2
r=-1
- - 0
11

y1
23

-4 -3 -2 y-12 0 1 2
2

r=0
- - 0
11

y1
23
Correlation

• The absolute size of the correlation corresponds to
the magnitude or strength of the relationship
• When a correlation is strong (e.g., r = .90), then
people above the mean on x are substantially more
likely to be above the mean on y than they would be
if the correlation was weak (e.g., r = .10).
y02 1 2 3

y02 1 2
-2 -1 0 y2 1 2
-2 -1

-2 -1
2 11
- - 0 23                  2
- - 0
11 23                     2 11
- - 0 23

y1                         y1                           y1

r=+1                    r = + .70                   r = + .30
Correlation

• Advantages and uses of the correlation coefficient
– Provides an easy way to quantify the association
between two variables
– Employs z-scores, so the variances of each
variable are standardized & = 1
– Foundation for many statistical applications

```
To top