Correlation Coefficient

					The Correlation Coefficient
[Figure: Social Security Numbers]
[Figure: A Scatter Diagram]
       The Point of Averages
• Where is the center of the cloud?
• Take the average of the x-values and the
  average of the y-values; this is the point of
  averages.
• It locates the center of the cloud.
• Similarly, take the SD of the x-values and
  the SD of the y-values.
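The two steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming the SD is computed with divisor n (the population SD, `statistics.pstdev`), which matches the convention used in the worked example in these slides; the x- and y-values are borrowed from that example.

```python
from statistics import mean, pstdev

# Data borrowed from the worked example in these slides.
xs = [1, 3, 4, 5, 7]
ys = [5, 9, 7, 1, 13]

# Point of averages: (average of the x-values, average of the y-values).
point_of_averages = (mean(xs), mean(ys))

# Spread of each variable. pstdev divides by n (not n - 1),
# which is the SD convention assumed here.
sd_x = pstdev(xs)
sd_y = pstdev(ys)

print("point of averages:", point_of_averages)
print("SD of x:", sd_x, "SD of y:", sd_y)
```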
  The Correlation Coefficient
• An association can be stronger or weaker.
• Remember: a strong association means that
  knowing one variable helps to predict the
  other variable to a large extent.
• The correlation coefficient is a numerical
  value expressing the strength of the
  association.
   The Correlation Coefficient
• We denote the correlation coefficient by r.
• If r = 0, the cloud is completely formless;
  there is no (linear) correlation between the
  variables.
• If r = 1, all the points lie exactly on a line
  (not necessarily x = y) and there is perfect
  correlation.
[Figure: Strong and Weak association]
  The Correlation Coefficient
• What about negative values?
• The correlation coefficient is always between
  –1 and 1: negative values indicate negative
  association, and positive values indicate
  positive association.
• Note that –0.90 shows the same degree of
  association as +0.90, only negative instead
  of positive.
         Computing the Correlation Coefficient
1. Convert each variable to standard units.
2. Multiply each x-value in standard units by the
   matching y-value in standard units; the average
   of these products is the correlation coefficient r.

                 r = average of
  (x in standard units) × (y in standard units)
         We must first convert to standard units
x   y
1   5
3   9
4   7
5   1
7   13
• Find the average and the SD of the x-values:
  average = 4, SD = 2.
• Subtract the average from each value (the
  deviation), then divide the deviation by the SD.
• Then do the same for the y-values:
  average = 7, SD = 4.
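The conversion to standard units for the example data in the table above can be sketched as follows. The helper name `to_standard_units` is hypothetical, and the SD is assumed to use divisor n, which gives SD = 2 for the x-values as stated.

```python
from statistics import mean, pstdev

xs = [1, 3, 4, 5, 7]
ys = [5, 9, 7, 1, 13]

def to_standard_units(values):
    """Hypothetical helper: subtract the average, then divide by the SD (divisor n)."""
    avg, sd = mean(values), pstdev(values)
    return [(v - avg) / sd for v in values]

zx = to_standard_units(xs)  # average 4, SD 2
zy = to_standard_units(ys)  # average 7, SD 4
print("x in standard units:", zx)
print("y in standard units:", zy)
```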
          Standard units
 x    y   x (SU)   y (SU)   product
 1    5    -1.5     -0.5      0.75
 3    9    -0.5      0.5     -0.25
 4    7     0.0      0.0      0.00
 5    1     0.5     -1.5     -0.75
 7   13     1.5      1.5      2.25
(SU = standard units; product = x (SU) × y (SU))
• Finally, take the average of the products:

                 r = average of
  (x in standard units) × (y in standard units)

• In this example, the products add up to 2.00,
  so r = 2.00 / 5 = 0.40.
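The last step can be checked in Python using the standard units from the table above. This sketch only reproduces the arithmetic on the slide: multiply each pair, add the products, and divide by the number of points.

```python
from statistics import mean

# Standard units taken from the table above.
zx = [-1.5, -0.5, 0.0, 0.5, 1.5]
zy = [-0.5, 0.5, 0.0, -1.5, 1.5]

# Product of each pair of standard units.
products = [a * b for a, b in zip(zx, zy)]
total = sum(products)

# r is the average of the products: total divided by the number of points.
r = mean(products)

print("products:", products)
print("sum:", total, "r:", r)
```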
                 The SD line
• If there is some association, the points in the
  scatter diagram cluster around a line. But around
  which line?
• Generally, this is the SD line: the line that
  passes through the point of averages.
• It rises (or, for a negative correlation, falls)
  one vertical SD for each horizontal SD.
• Its slope is (SD of y) / (SD of x) in the case of
  a positive correlation, and –(SD of y) / (SD of x)
  in the case of a negative correlation.
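A minimal Python sketch of the SD line, returning its slope and intercept. The function name `sd_line` is hypothetical; the SD again uses divisor n, and the sign of the association is taken from the sum of products of deviations, which always has the same sign as r. Treating a sum of exactly 0 as positive is an arbitrary choice made here, since the slides do not define the SD line when r = 0.

```python
from statistics import mean, pstdev

def sd_line(xs, ys):
    """Hypothetical helper: (slope, intercept) of the SD line."""
    avg_x, avg_y = mean(xs), mean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    # Sign of the association: same sign as r.
    # A sum of exactly 0 is treated as positive (arbitrary choice).
    cross = sum((x - avg_x) * (y - avg_y) for x, y in zip(xs, ys))
    sign = 1 if cross >= 0 else -1
    # Slope is +/- (SD of y) / (SD of x).
    slope = sign * sd_y / sd_x
    # The line passes through the point of averages.
    intercept = avg_y - slope * avg_x
    return slope, intercept

slope, intercept = sd_line([1, 3, 4, 5, 7], [5, 9, 7, 1, 13])
print("y =", slope, "* x +", intercept)
```

For the example data (r = 0.40, SD of x = 2, SD of y = 4) the slope is 4 / 2 = 2, and the line passes through the point of averages (4, 7).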
       Five-point Summary
• Remember the five-point summary of a data
  set: minimum, lower quartile, median,
  upper quartile, and maximum.
• A five-point summary for a scatter plot is:
  the average of the x-values, the SD of the
  x-values, the average of the y-values, the SD
  of the y-values, and the correlation
  coefficient r.
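The five summary numbers for a scatter plot can be collected in one function. This is a sketch under the same assumptions as before (SD with divisor n); the function name `scatter_summary` and its dictionary keys are hypothetical.

```python
from statistics import mean, pstdev

def scatter_summary(xs, ys):
    """Hypothetical helper: the five summary numbers for a scatter plot."""
    avg_x, sd_x = mean(xs), pstdev(xs)
    avg_y, sd_y = mean(ys), pstdev(ys)
    # r = average of (x in standard units) * (y in standard units).
    r = mean(((x - avg_x) / sd_x) * ((y - avg_y) / sd_y)
             for x, y in zip(xs, ys))
    return {"avg_x": avg_x, "sd_x": sd_x,
            "avg_y": avg_y, "sd_y": sd_y, "r": r}

summary = scatter_summary([1, 3, 4, 5, 7], [5, 9, 7, 1, 13])
print(summary)
```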

Shared By: