Correlation by benbenzhou

Covariance and Correlation




Questions:
What does it mean to say that two variables are associated with one another?
How can we mathematically formalize the concept of association?

[Figure: scatterplot of happiness (y-axis) against coffee (x-axis)]
            Limitation of covariance

• One limitation of the covariance is that the size of the
  covariance depends on the variability of the
  variables.
• As a consequence, it can be difficult to evaluate the
  magnitude of the covariation between two variables.
   – If the amount of variability is small, then the
     highest possible value of the covariance will also
     be small. If there is a large amount of variability,
     the maximum covariance can be large.
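A small numeric illustration of this limitation, under stated assumptions: the five scores below are hypothetical (not from the slides), and `covariance` is an illustrative helper that divides by N, as these slides do. Stretching one variable's scale by a factor of 10 leaves the pattern of association unchanged but multiplies the covariance by 10.

```python
def covariance(x, y):
    """Covariance with N in the denominator, as defined in these slides."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n


# Hypothetical scores for five people (illustrative only, not from the slides)
coffee = [1, 2, 3, 4, 5]
happiness = [2, 1, 4, 3, 5]

# Same people and same pattern, but coffee recorded on a scale 10 times wider
coffee_wide = [10 * c for c in coffee]

cov_original = covariance(coffee, happiness)
cov_wide = covariance(coffee_wide, happiness)
print(cov_original, cov_wide)  # the second value is 10 times the first
```

Nothing about the relationship changed, only the variability of one variable, which is why the raw covariance is hard to interpret on its own.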
           Limitations of covariance

• Ideally, we would like to evaluate the magnitude of
  the covariance relative to the maximum possible
  covariance.
• How can we determine the maximum possible
  covariance?
               Go vary with yourself

• Let’s first note that, of all the variables a variable may
  covary with, it will covary with itself most strongly.
• In fact, the “covariance of a variable with itself” is an
  alternative way to define variance:


               X  M  X  M   cov
                         X            X
                                               XX
                         N

              X  M   X
                             2

                                  cov XX  varX
                    N
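The identity above can be checked numerically. This is a minimal sketch assuming illustrative data and a hypothetical helper `covariance` that divides by N; it compares the result with Python's `statistics.pvariance`, which also divides by N. (`statistics.variance` divides by N − 1 and would not match.)

```python
import statistics


def covariance(x, y):
    """Covariance with N in the denominator, as defined in these slides."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n


x = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data

cov_xx = covariance(x, x)          # covariance of X with itself
var_x = statistics.pvariance(x)    # population variance (also divides by N)
print(cov_xx, var_x)               # the two agree
```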
              Go vary with yourself

• Thus, if we were to divide the covariance of a
  variable with itself by the variance of the variable, we
  would obtain a value of 1. This will give us a
  standard for evaluating the magnitude of the
  covariance.


      X  M  X  M 
               X          X
                                     Note: I’ve written
                                     the variance of X as
            Ns X s X                 sX  sX because the
                                     variance is the SD
                                     squared
              Go vary with yourself

• However, we are interested in evaluating the
  covariance of a variable with another variable (not with
  itself), so we must derive a maximum possible
  covariance for these situations too.
• By extension, the covariance between two variables
  cannot be any greater (in absolute value) than the
  product of the SDs of the two variables, s_X s_Y.
• Thus, if we divide the covariance by s_X s_Y, we can
  evaluate its magnitude relative to 1.

     X  M Y  M 
              X         Y

           Ns X sY
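A sketch of this standardization, assuming population (divide-by-N) SDs throughout and hypothetical data. The helper names `covariance` and `standardized_covariance` are illustrative. Dividing a variable's covariance with itself by s_X s_X gives 1, and dividing the covariance of two different variables by s_X s_Y gives a value on the same −1 to +1 scale.

```python
import statistics


def covariance(x, y):
    """Covariance with N in the denominator, as defined in these slides."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n


def standardized_covariance(x, y):
    """Covariance divided by s_X * s_Y, using population (divide-by-N) SDs."""
    return covariance(x, y) / (statistics.pstdev(x) * statistics.pstdev(y))


# Hypothetical scores (illustrative only)
coffee = [1, 2, 3, 4, 5]
happiness = [2, 1, 4, 3, 5]

print(standardized_covariance(coffee, coffee))     # a variable with itself: 1 (up to rounding)
print(standardized_covariance(coffee, happiness))  # about 0.8
```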
             Spine-tingling moment

• Important: What we’ve done is take the covariance
  and “standardize” it. The result will never be greater
  than 1 (or smaller than –1). The larger the absolute
  value of this index, the stronger the association
  between the two variables.
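One standard way to see why this index can never leave the range –1 to +1 is the Cauchy–Schwarz inequality applied to the deviation scores. The derivation below is added for completeness; it uses the same divide-by-N definitions as these slides and assumes s_X and s_Y are both greater than zero.

$$\left|\sum (X - M_X)(Y - M_Y)\right| \le \sqrt{\sum (X - M_X)^2}\,\sqrt{\sum (Y - M_Y)^2} = \sqrt{N s_X^2}\,\sqrt{N s_Y^2} = N s_X s_Y$$

Dividing both sides by N s_X s_Y gives |r| ≤ 1. Equality holds only when every Y − M_Y is the same multiple of X − M_X, that is, when the points fall exactly on a straight line.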
            Spine-tingling moment
• When expressed this way, the covariance is called a
  correlation.
• The correlation is defined as a standardized
  covariance.

$$r = \frac{\sum (X - M_X)(Y - M_Y)}{N s_X s_Y}$$
                    Correlation
$$r = \frac{\sum z_X z_Y}{N}$$
• It can also be defined as the average product of
  z-scores: because z_X = (X - M_X)/s_X and
  z_Y = (Y - M_Y)/s_Y, the two equations are identical.
• The correlation, r, is a quantitative index of the
  association between two variables.
• When this average is positive, there is a positive
  correlation; when it is negative, there is a negative
  correlation.
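To check that the two definitions agree, the sketch below computes r both ways on hypothetical data. The helpers `z_scores`, `r_from_z`, and `r_from_covariance` are illustrative names, and z-scores here use the population (divide-by-N) SD so that they match the N in both formulas.

```python
import statistics


def z_scores(values):
    """Convert raw scores to z-scores using the population (divide-by-N) SD."""
    mean = sum(values) / len(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def r_from_z(x, y):
    """r as the average product of z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def r_from_covariance(x, y):
    """r as the covariance divided by s_X * s_Y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return cov / (statistics.pstdev(x) * statistics.pstdev(y))


# Hypothetical scores (illustrative only)
coffee = [1, 2, 3, 4, 5]
happiness = [2, 1, 4, 3, 5]
print(r_from_z(coffee, happiness), r_from_covariance(coffee, happiness))
```

The two printed values agree up to floating-point rounding.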
[Figure: scatterplot of persons A–F in z-score units, with x and y each running from -2 to 2]
• The mean of each variable is zero
• A, D, & B are above the mean on both variables
• E & C are below the mean on both variables
• F is above the mean on x, but below the mean on y
      y0 1 2   +=   ++=+
                               A



                       D

                       B


                       F


                 E
-2 -1

                   C

               =+   +=

               -2-10       1   2

                       x
                                     Correlation
        Person                Zx            Zy            ZxZy
          A                   1.55          1.39           2.15
          B                   0.15          0.28           0.04
          C                  -0.75         -1.44           1.08
          D                   0.48          0.64           0.31
          E                  -1.34         -0.69           0.92
          F                   0.08         -0.19          -0.02

$$\sum z_X z_Y = 4.49, \qquad r = \frac{\sum z_X z_Y}{N} = \frac{4.49}{6} \approx .75$$

[Figure: the scatterplot of persons A–F, annotated with r = .75]
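The arithmetic in the table can be reproduced directly. This sketch copies the six pairs of z-scores from the table, multiplies them, sums the products, and divides by N = 6. Because it multiplies the two-decimal z-scores exactly (rather than adding the rounded products in the last column), the sum comes out as 4.49, matching the slide.

```python
# z-scores for persons A-F, copied from the table
z_x = {"A": 1.55, "B": 0.15, "C": -0.75, "D": 0.48, "E": -1.34, "F": 0.08}
z_y = {"A": 1.39, "B": 0.28, "C": -1.44, "D": 0.64, "E": -0.69, "F": -0.19}

# Product of z-scores for each person
products = {person: z_x[person] * z_y[person] for person in z_x}

total = sum(products.values())
r = total / len(products)  # average product of z-scores

for person, product in products.items():
    print(f"{person}: {product:.2f}")
print(f"sum = {total:.2f}, r = {r:.2f}")
```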
                     Correlation

• The value of r can range from -1 to +1.
• If r = 0, then there is no correlation (no linear
  relationship) between the two variables.
• If r = +1 (or -1), then there is a perfect positive (or
  negative) linear relationship between the two variables.
[Figure: three scatterplots illustrating r = +1, r = -1, and r = 0]
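The three cases can be generated from a tiny illustrative data set. In this sketch, `correlation` is a hypothetical helper using divide-by-N SDs (N cancels in the ratio, so r would be the same with N − 1). Points on a rising line give +1, points on a falling line give −1, and a symmetric U-shape gives exactly 0 even though y is completely determined by x; r measures only the linear part of an association.

```python
import math


def correlation(x, y):
    """Pearson r with N in the denominator (N cancels, so r is the same with N - 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return cov / (s_x * s_y)


x = [-2, -1, 0, 1, 2]
r_pos = correlation(x, [2 * v + 1 for v in x])  # points on a rising straight line
r_neg = correlation(x, [-v for v in x])         # points on a falling straight line
r_u = correlation(x, [v * v for v in x])        # U-shaped: y depends on x, but not linearly
print(r_pos, r_neg, r_u)
```

The first two values are +1 and −1 up to floating-point rounding.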
                    Correlation

• The absolute size of the correlation corresponds to
  the magnitude or strength of the relationship.
• When a correlation is strong (e.g., r = .90), people
  above the mean on x are substantially more likely to
  be above the mean on y than they would be if the
  correlation were weak (e.g., r = .10).
[Figure: three scatterplots illustrating r = +1, r = +.70, and r = +.30]
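A rough simulation of the strong-versus-weak claim, under an explicit assumption: the data are drawn from a standard bivariate normal distribution (a choice made only for this sketch; the slides do not specify one). The helper names are hypothetical. Among people above the mean on x, it measures the share who are also above the mean on y.

```python
import math
import random


def correlated_pairs(rho, n, seed):
    """Draw n (x, y) pairs from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1 - rho ** 2) * noise
        pairs.append((x, y))
    return pairs


def share_above_on_y_given_above_on_x(pairs):
    """Among pairs above the mean of x, the fraction also above the mean of y."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    ys_above_x = [y for x, y in pairs if x > mean_x]
    return sum(1 for y in ys_above_x if y > mean_y) / len(ys_above_x)


strong = share_above_on_y_given_above_on_x(correlated_pairs(0.90, 20000, seed=1))
weak = share_above_on_y_given_above_on_x(correlated_pairs(0.10, 20000, seed=2))
print(f"r = .90: {strong:.2f}")
print(f"r = .10: {weak:.2f}")
```

For a bivariate normal, the expected share is 1/2 + arcsin(ρ)/π, roughly .86 for ρ = .90 and .53 for ρ = .10, so the simulated values should land near those figures.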
                    Correlation

• Advantages and uses of the correlation coefficient
   – Provides an easy way to quantify the association
     between two variables
   – Employs z-scores, so the variance of each
     variable is standardized to 1
   – Foundation for many statistical applications
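A short check of the standardization point, assuming hypothetical height and weight data. The z-scores have mean 0 and variance 1, and because r is built from z-scores, re-expressing heights in centimeters instead of inches leaves r unchanged. The helpers `z_scores` and `correlation` are illustrative names.

```python
import statistics


def z_scores(values):
    """Convert raw scores to z-scores using the population (divide-by-N) SD."""
    mean = sum(values) / len(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def correlation(x, y):
    """r as the average product of z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


# Hypothetical data (illustrative only)
heights_in = [60, 62, 65, 68, 70, 73]
weights_lb = [115, 120, 140, 150, 165, 180]
heights_cm = [2.54 * h for h in heights_in]  # same heights, different units

z = z_scores(heights_in)
print(sum(z) / len(z), statistics.pvariance(z))  # mean near 0, variance near 1

r_inches = correlation(heights_in, weights_lb)
r_cm = correlation(heights_cm, weights_lb)
print(r_inches, r_cm)  # equal up to floating-point rounding
```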
