Posted on: 11/28/2011. Public Domain.
Covariance and Correlation

[Figure: plot of happiness (vertical axis) against coffee (horizontal axis)]

Questions:
• What does it mean to say that two variables are associated with one another?
• How can we mathematically formalize the concept of association?

Limitation of covariance
• One limitation of the covariance is that the size of the covariance depends on the variability of the variables.
• As a consequence, it can be difficult to evaluate the magnitude of the covariation between two variables.
– If the amount of variability is small, then the highest possible value of the covariance will also be small. If there is a large amount of variability, the maximum covariance can be large.

Limitations of covariance
• Ideally, we would like to evaluate the magnitude of the covariance relative to the maximum possible covariance.
• How can we determine the maximum possible covariance?

Go vary with yourself
• Let’s first note that, of all the variables a variable may covary with, it will covary with itself most strongly.
• In fact, the “covariance of a variable with itself” is an alternative way to define variance:

\[ \mathrm{cov}_{XX} = \frac{\sum (X - M_X)(X - M_X)}{N} \]

\[ \mathrm{cov}_{XX} = \frac{\sum (X - M_X)^2}{N} = \mathrm{var}_X \]

Go vary with yourself
• Thus, if we were to divide the covariance of a variable with itself by the variance of the variable, we would obtain a value of 1. This will give us a standard for evaluating the magnitude of the covariance.

\[ \frac{\sum (X - M_X)(X - M_X)}{N s_X s_X} = 1 \]

Note: I’ve written the variance of X as \( s_X s_X \) because the variance is the SD squared.

Go vary with yourself
• However, we are interested in evaluating the covariance of a variable with another variable (not with itself), so we must derive a maximum possible covariance for these situations too.
• By extension, the covariance between two variables cannot be any greater than the product of the SDs for the two variables.
• Thus, if we divide by \( s_X s_Y \), we can evaluate the magnitude of the covariance relative to 1.

\[ \frac{\sum (X - M_X)(Y - M_Y)}{N s_X s_Y} \]

Spine-tingling moment
• Important: What we’ve done is taken the covariance and “standardized” it. It will never be greater than 1 (or smaller than -1). The larger the absolute value of this index, the stronger the association between two variables.

Spine-tingling moment
• When expressed this way, the covariance is called a correlation.
• The correlation is defined as a standardized covariance.

\[ r = \frac{\sum (X - M_X)(Y - M_Y)}{N s_X s_Y} \]

Correlation

\[ r = \frac{\sum z_X z_Y}{N} \]

• It can also be defined as the average product of z-scores because the two equations are identical.
• The correlation, r, is a quantitative index of the association between two variables. It is the average of the products of the z-scores.
• When this average is positive, there is a positive correlation; when negative, a negative correlation.

[Figure: scatterplot of persons A to F, with z-scored x and y axes running from -2 to 2]
• Mean of each variable is zero
• A, D, & B are above the mean on both variables
• E & C are below the mean on both variables
• F is above the mean on x, but below the mean on y

[Figure: the same scatterplot with each quadrant labelled by the sign of the z-score product: positive in the upper-right and lower-left quadrants, negative in the upper-left and lower-right quadrants]

Correlation

Person    Zx      Zy      ZxZy
A         1.55    1.39    2.15
B         0.15    0.28    0.04
C        -0.75   -1.44    1.08
D         0.48    0.64    0.31
E        -1.34   -0.69    0.92
F         0.08   -0.19   -0.02

\[ \sum z_X z_Y = 4.49 \qquad r = \frac{\sum z_X z_Y}{N} = \frac{4.49}{6} \approx .75 \]

[Figure: scatterplot of persons A to F, annotated with r = .75]

Correlation
• The value of r can range between -1 and +1.
• If r = 0, then there is no linear association between the two variables.
• If r = 1 (or -1), then there is a perfect positive (or negative) relationship between the two variables.

[Figure: three scatterplots of y against x, with r = +1, r = -1, and r = 0]

Correlation
• The absolute size of the correlation corresponds to the magnitude or strength of the relationship.
• When a correlation is strong (e.g., r = .90), then people above the mean on x are substantially more likely to be above the mean on y than they would be if the correlation were weak (e.g., r = .10).
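
The six-person table of z-scores earlier in these slides can be checked with a few lines of Python. This is only a minimal sketch: the z-scores are copied from the table, while the variable names (`zx`, `zy`, `products`) are illustrative choices, not part of the original slides.

```python
# z-scores for persons A to F, copied from the table in the slides
zx = [1.55, 0.15, -0.75, 0.48, -1.34, 0.08]
zy = [1.39, 0.28, -1.44, 0.64, -0.69, -0.19]

# Product of the two z-scores for each person (the ZxZy column)
products = [a * b for a, b in zip(zx, zy)]

# r is the average of those products: sum(ZxZy) / N
sum_products = sum(products)
r = sum_products / len(products)

print(round(sum_products, 2))  # 4.49
print(round(r, 2))             # 0.75
```

Person F is the only one with a negative product, because F is above the mean on x but below the mean on y; the other five products are positive, which is why the average comes out positive.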

[Figure: three scatterplots of y against x, with r = +1, r = +.70, and r = +.30]

Correlation
• Advantages and uses of the correlation coefficient
– Provides an easy way to quantify the association between two variables
– Employs z-scores, so the variance of each variable is standardized to 1
– Foundation for many statistical applications
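
The argument of these slides can be sketched in Python, assuming the same divide-by-N definitions of variance and SD used in the formulas. The data below are hypothetical (invented purely for illustration, not taken from the slides), and the function names are illustrative assumptions. The sketch shows that the covariance changes when a variable is rescaled, that the covariance of a variable with itself equals its variance, and that the standardized covariance and the average product of z-scores give the same r.

```python
from statistics import fmean, pstdev


def covariance(x, y):
    """Average product of deviations from the means (divides by N)."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def correlation(x, y):
    """Covariance divided by the product of the two SDs."""
    return covariance(x, y) / (pstdev(x) * pstdev(y))


def z_scores(x):
    """Deviations from the mean, in SD units."""
    m, s = fmean(x), pstdev(x)
    return [(a - m) / s for a in x]


def correlation_from_z(x, y):
    """Average product of z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


# Hypothetical scores, invented for illustration only
coffee = [1, 2, 3, 4, 5]
happiness = [2, 1, 4, 3, 5]
coffee_x10 = [10 * c for c in coffee]  # same data on a larger scale

# Covariance depends on the scale of the variables ...
print(round(covariance(coffee, happiness), 2))      # 1.6
print(round(covariance(coffee_x10, happiness), 2))  # 16.0

# ... but the correlation does not
print(round(correlation(coffee, happiness), 2))      # 0.8
print(round(correlation(coffee_x10, happiness), 2))  # 0.8

# Covariance of a variable with itself is its variance (SD squared)
print(round(covariance(coffee, coffee), 2), round(pstdev(coffee) ** 2, 2))

# Both definitions of r agree
print(round(correlation_from_z(coffee, happiness), 2))  # 0.8
```

`pstdev` is used rather than `stdev` because the slides divide by N, not N - 1; mixing the two conventions would make the two definitions of r disagree slightly.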