# Correlation Coefficient


```
The Correlation Coefficient
Social Security Numbers
A Scatter Diagram
The Point of Averages
• Where is the center of the cloud?
• Take the average of the x-values and the
average of the y-values; this is the point of
averages.
• It locates the center of the cloud.
• Similarly, take the SD of the x-values and
the SD of the y-values; these measure the
horizontal and vertical spread of the cloud.
Examples
The Correlation Coefficient
• An association can be stronger or weaker.
• Remember: a strong association means that
knowing one variable helps to predict the
other variable to a large extent.
• The correlation coefficient is a numerical
value expressing the strength of the
association.
The Correlation Coefficient
• We denote the correlation coefficient by r.
• If r = 0, the cloud is completely formless;
there is no linear association between the
variables.
• If r = 1, all the points lie exactly on an
upward-sloping line (not necessarily y = x)
and there is perfect correlation.
Strong and Weak
The Correlation Coefficient
• The correlation coefficient is always
between –1 and 1. A negative value indicates
negative association; a positive value
indicates positive association.
• Note that –0.90 shows the same degree of
association as +0.90, only negative instead
of positive.
Computing the Correlation Coefficient
1. Convert each variable to standard units.
2. Multiply the x and y standard units for
each point; the average of these products is
the correlation coefficient r.

r = average of
(x in standard units) × (y in standard units)
Example
 x    y
 1    5
 3    9
 4    7
 5    1
 7   13
We must first convert to standard units.
Find the average and the SD of the x-values:
average = 4, SD = 2.
Convert each value: subtract the average,
then divide by the SD.
Then do the same for the y-values:
average = 7, SD = 4.
Example
              Standard units
 x    y      x       y     product
 1    5    -1.5    -0.5     0.75
 3    9    -0.5     0.5    -0.25
 4    7     0.0     0.0     0.00
 5    1     0.5    -1.5    -0.75
 7   13     1.5     1.5     2.25
Example
• Finally, take the average of the products:

r = average of
(x in standard units) × (y in standard units)

• In this example,
r = (0.75 - 0.25 + 0.00 - 0.75 + 2.25) / 5
  = 2.00 / 5 = 0.40.
The SD line
• If there is some association, the points in the
scatter diagram cluster around a line. But around
which line?
• Generally, this is the SD line. It passes through
the point of averages.
• It climbs (or, for a negative correlation, falls) at
the rate of one vertical SD for each horizontal SD.
• Its slope is (SD of y) / (SD of x) in the case of a
positive correlation, and –(SD of y) / (SD of x) in
the case of a negative correlation.
Five-point Summary
• Remember the five-point summary of a data
set: minimum, lower quartile, median,
upper quartile, and maximum.
• A five-point summary for a scatter plot is:
the average of the x-values, the SD of the
x-values, the average of the y-values, the
SD of the y-values, and the correlation
coefficient r.

```
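
The computation described in the slides (convert to standard units, average the products, then read off the SD line) can be sketched in Python as follows. This is only an illustrative sketch using the standard library; the function names `standard_units`, `correlation`, `sd_line_slope`, and `scatter_summary` are made up for this example and do not come from the slides. It assumes the SD is computed by dividing by n (the population SD), which is what gives SD = 2 for the x-values in the worked example.

```python
from statistics import fmean, pstdev


def standard_units(values):
    """Convert each value to standard units: (value - average) / SD."""
    avg = fmean(values)
    sd = pstdev(values)  # assumption: population SD (divide by n)
    return [(v - avg) / sd for v in values]


def correlation(xs, ys):
    """r = average of (x in standard units) * (y in standard units)."""
    zx = standard_units(xs)
    zy = standard_units(ys)
    return fmean([a * b for a, b in zip(zx, zy)])


def sd_line_slope(xs, ys):
    """Slope of the SD line: +SD_y / SD_x for positive r, -SD_y / SD_x for negative r."""
    ratio = pstdev(ys) / pstdev(xs)
    # The slides do not cover r = 0; returning the positive slope there is an arbitrary choice.
    return ratio if correlation(xs, ys) >= 0 else -ratio


def scatter_summary(xs, ys):
    """The five summary statistics for a scatter plot listed in the slides."""
    return {
        "avg_x": fmean(xs),
        "sd_x": pstdev(xs),
        "avg_y": fmean(ys),
        "sd_y": pstdev(ys),
        "r": correlation(xs, ys),
    }


# Data from the worked example in the slides.
xs = [1, 3, 4, 5, 7]
ys = [5, 9, 7, 1, 13]

print(standard_units(xs))
print(standard_units(ys))
print(round(correlation(xs, ys), 2))
print(sd_line_slope(xs, ys))
print(scatter_summary(xs, ys))
```

With the example data this reproduces the table's standard units and r = 0.40; the SD line passes through the point of averages (4, 7) with slope 4 / 2 = 2.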
Posted: 8/13/2012. Language: English. Pages: 16.