# Correlation, Covariance, & Pearson's Coefficient

I. Basics:

A. CORRELATION-
1) Describes whether two variables follow a linear trend with each other. In other
words, as one goes up, the other tends to go up (or down) with it.
a) Negatively correlated - as one goes up, the other goes down
1) ex: hours spent watching television & grade on a statistics test
b) Positively correlated - as one goes up in value, so does the other
1) ex: hours spent studying & grade on a statistics test

B. COVARIANCE (Sxy)-

1) Does not measure the strength of a linear relationship, but will tell you whether one exists and in which direction.
a) usually just tells you if the trend exists
2) Formula for a sample:

Sxy = Σ(X − X̄)(Y − Ȳ) / (n − 1)

*denominator becomes N for a population*
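The formula above can be sketched as a short Python function (the function name and the toy data are my own choices, purely for illustration):

```python
# Sample covariance: sum of deviation products divided by n - 1
# (a population covariance would divide by N instead).
def sample_covariance(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

# Toy data with a positive trend, so the covariance comes out positive.
print(sample_covariance([1, 2, 3, 4], [2, 4, 6, 8]))  # 3.3333...
```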

C. CORRELATION COEFFICIENT (r = sample, ρ = population)-
1) A better indicator of the degree of linear relationship.
a) Range: [-1, 1]
b) r = -1 → perfectly negatively correlated → as one goes up, the other goes down
1) ex: hours spent watching television & grade on a statistics test
c) r = 1 → perfectly positively correlated → as one goes up in value, so does the other
1) ex: hours spent studying & grade on a statistics test
d) the closer to 1 or -1, the stronger the correlation
e) r = 0 → no linear correlation
2) Formula:

r = Sxy / (Sx · Sy) = Σ(X − X̄)(Y − Ȳ) / √[Σ(X − X̄)² · Σ(Y − Ȳ)²]

or, equivalently, in terms of raw sums:

r = [nΣXY − (ΣX)(ΣY)] / √{[nΣX² − (ΣX)²][nΣY² − (ΣY)²]}
3) Ex: A sample of 10 students in your class recorded the time they spent studying (in
minutes), and later looked at their grades to obtain the following data:

Table 1

| Time spent studying (X) | 40 | 42 | 47 | 47 | 25 | 44 | 41 | 48 | 35 | 28 | Σ = 397 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Grade (Y) | 78 | 63 | 80 | 85 | 50 | 79 | 84 | 92 | 64 | 48 | Σ = 723 |

Steps:
1) Calculate the means:
a) X̄ = ΣX / n = 397 / 10 = 39.7
b) Ȳ = ΣY / n = 723 / 10 = 72.3
2) Calculate the covariance:

Sxy = [ΣXY − (ΣX)(ΣY)/n] / (n − 1)

*Note - for the covariance of a population, the denominator changes to N.

Table 2

| X | X² | Y | Y² | XY |
|---|---|---|---|---|
| 40 | 1600 | 78 | 6084 | 3120 |
| 42 | 1764 | 63 | 3969 | 2646 |
| 47 | 2209 | 80 | 6400 | 3760 |
| 47 | 2209 | 85 | 7225 | 3995 |
| 25 | 625 | 50 | 2500 | 1250 |
| 44 | 1936 | 79 | 6241 | 3476 |
| 41 | 1681 | 84 | 7056 | 3444 |
| 48 | 2304 | 92 | 8464 | 4416 |
| 35 | 1225 | 64 | 4096 | 2240 |
| 28 | 784 | 48 | 2304 | 1344 |
| Σ = 397 | Σ = 16337 | Σ = 723 | Σ = 54339 | Σ = 29691 |
a) Method 1 for calculating Sxy:

Sxy = [ΣXY − (ΣX)(ΣY)/n] / (n − 1) = [29691 − (397)(723)/10] / 9 = (29691 − 28703.1) / 9 = 987.9 / 9 ≈ 109.77

or, if you prefer the other method,
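Both covariance methods can be checked with a few lines of Python (variable names are my own; the data are the X and Y columns of Table 2):

```python
x = [40, 42, 47, 47, 25, 44, 41, 48, 35, 28]
y = [78, 63, 80, 85, 50, 79, 84, 92, 64, 48]
n = len(x)

# Method 1: raw sums, Sxy = [SumXY - (SumX)(SumY)/n] / (n - 1)
sxy_raw = (sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n) / (n - 1)

# Method 2: deviation products, Sxy = Sum(X - Xbar)(Y - Ybar) / (n - 1)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy_dev = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

print(round(sxy_raw, 2), round(sxy_dev, 2))  # 109.77 109.77
```

Both routes reduce to the same quantity, 987.9 / 9, so they must agree.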

b) Method 2 for calculating Sxy:

Table 3

| X | X − X̄ | Y | Y − Ȳ | (X − X̄)(Y − Ȳ) |
|---|---|---|---|---|
| 40 | 0.3 | 78 | 5.7 | 1.71 |
| 42 | 2.3 | 63 | -9.3 | -21.39 |
| 47 | 7.3 | 80 | 7.7 | 56.21 |
| 47 | 7.3 | 85 | 12.7 | 92.71 |
| 25 | -14.7 | 50 | -22.3 | 327.81 |
| 44 | 4.3 | 79 | 6.7 | 28.81 |
| 41 | 1.3 | 84 | 11.7 | 15.21 |
| 48 | 8.3 | 92 | 19.7 | 163.51 |
| 35 | -4.7 | 64 | -8.3 | 39.01 |
| 28 | -11.7 | 48 | -24.3 | 284.31 |
| Σ = 397 | Σ = 0 | Σ = 723 | Σ = 0 | Σ = 987.9 |

Sxy = Σ(X − X̄)(Y − Ȳ) / (n − 1) = 987.9 / 9 ≈ 109.77

*So there is a positive correlation between X and Y; we just don't know to what
degree. To get a better idea of the strength of the correlation, Pearson's
coefficient is used. To obtain it, the individual standard deviations
must be calculated, and the covariance is then divided by their product.*
3) If asked to subsequently calculate r, calculate: Sx & Sy

Table 4:

| X | X − X̄ | (X − X̄)² | Y | Y − Ȳ | (Y − Ȳ)² | (X − X̄)(Y − Ȳ) |
|---|---|---|---|---|---|---|
| 40 | 0.3 | 0.09 | 78 | 5.7 | 32.49 | 1.71 |
| 42 | 2.3 | 5.29 | 63 | -9.3 | 86.49 | -21.39 |
| 47 | 7.3 | 53.29 | 80 | 7.7 | 59.29 | 56.21 |
| 47 | 7.3 | 53.29 | 85 | 12.7 | 161.29 | 92.71 |
| 25 | -14.7 | 216.09 | 50 | -22.3 | 497.29 | 327.81 |
| 44 | 4.3 | 18.49 | 79 | 6.7 | 44.89 | 28.81 |
| 41 | 1.3 | 1.69 | 84 | 11.7 | 136.89 | 15.21 |
| 48 | 8.3 | 68.89 | 92 | 19.7 | 388.09 | 163.51 |
| 35 | -4.7 | 22.09 | 64 | -8.3 | 68.89 | 39.01 |
| 28 | -11.7 | 136.89 | 48 | -24.3 | 590.49 | 284.31 |
| Σ = 397 | Σ = 0 | Σ = 576.1 | Σ = 723 | Σ = 0 | Σ = 2066.1 | Σ = 987.9 |

a) First way to calculate the standard deviations:

Sx = √[Σ(X − X̄)² / (n − 1)] = √(576.1 / 9) = √64.01 ≈ 8.00

Sy = √[Σ(Y − Ȳ)² / (n − 1)] = √(2066.1 / 9) = √229.57 ≈ 15.15

Therefore, r = Sxy / (Sx · Sy) = 109.77 / (8.00 × 15.15) ≈ 0.905.
This is a better indicator because it is close to the maximal value of 1 for r!
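As a check on the arithmetic, the standard deviations and r can be computed in a short Python sketch (variable names are mine):

```python
import math

x = [40, 42, 47, 47, 25, 44, 41, 48, 35, 28]
y = [78, 63, 80, 85, 50, 79, 84, 92, 64, 48]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sample standard deviations from the deviation sums (Table 4)
s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
# Sample covariance, then Pearson's r = Sxy / (Sx * Sy)
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
r = sxy / (s_x * s_y)

print(round(s_x, 2), round(s_y, 2), round(r, 3))  # 8.0 15.15 0.905
```

Note that carrying full precision gives r ≈ 0.9055; rounding Sx and Sy first (as done by hand above) can shift the last digit.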
b) If you prefer to use the other formula for standard deviation:

Table 5

| X | X² | Y | Y² | XY |
|---|---|---|---|---|
| 40 | 1600 | 78 | 6084 | 3120 |
| 42 | 1764 | 63 | 3969 | 2646 |
| 47 | 2209 | 80 | 6400 | 3760 |
| 47 | 2209 | 85 | 7225 | 3995 |
| 25 | 625 | 50 | 2500 | 1250 |
| 44 | 1936 | 79 | 6241 | 3476 |
| 41 | 1681 | 84 | 7056 | 3444 |
| 48 | 2304 | 92 | 8464 | 4416 |
| 35 | 1225 | 64 | 4096 | 2240 |
| 28 | 784 | 48 | 2304 | 1344 |
| Σ = 397 | Σ = 16337 | Σ = 723 | Σ = 54339 | Σ = 29691 |

Sx = √{[ΣX² − (ΣX)²/n] / (n − 1)} = √[(16337 − 397²/10) / 9] = √(576.1 / 9) = √64.01 ≈ 8.00

Sy = √{[ΣY² − (ΣY)²/n] / (n − 1)} = √[(54339 − 723²/10) / 9] = √(2066.1 / 9) = √229.57 ≈ 15.15

Then, r = 109.77 / (8.00 × 15.15) ≈ 0.905; this is a better indicator because it is
close to the maximal value of 1.
4) If you are asked to calculate r only:

a) The first method uses the raw sums from Table 5:

r = [nΣXY − (ΣX)(ΣY)] / √{[nΣX² − (ΣX)²][nΣY² − (ΣY)²]}
= [10(29691) − (397)(723)] / √{[10(16337) − 397²][10(54339) − 723²]}
= 9879 / √[(5761)(20661)] ≈ 0.905 ≈ 0.91

*As you would expect, the value is close to 1, indicating a fairly strong linear
relationship between minutes spent studying and the grade on the exam!

b) Method 2: refer back to Table 4 for the values:

r = Σ(X − X̄)(Y − Ȳ) / √[Σ(X − X̄)² · Σ(Y − Ȳ)²] = 987.9 / √[(576.1)(2066.1)] ≈ 0.905 ≈ 0.91

II. Other corresponding topics:

A. Coefficient of Determination (r²):

1. An expression that measures the proportion of the variation in Y that is explained by
the variation in X.

a) In our example, r² ≈ (0.905)² ≈ 0.82, indicating that roughly 82% of the
variation in your exam grades is accounted for by the minutes spent studying.

b) r² can also be calculated using the sum of squared errors: r² = 1 − SSE/SSy; this
will be discussed in greater detail in the regression analysis handout.
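The r² figure can be computed directly from the deviation sums in Table 4 (a quick sketch; names are mine):

```python
x = [40, 42, 47, 47, 25, 44, 41, 48, 35, 28]
y = [78, 63, 80, 85, 50, 79, 84, 92, 64, 48]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# r^2 = [Sum(X - Xbar)(Y - Ybar)]^2 / [Sum(X - Xbar)^2 * Sum(Y - Ybar)^2]
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
r2 = sxy ** 2 / (sum((a - x_bar) ** 2 for a in x)
                 * sum((b - y_bar) ** 2 for b in y))
print(round(r2, 2))  # 0.82 -> about 82% of the variation in Y is explained
```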

B. Hypothesis testing with r:

1. When testing for correlation, the null and research (alternative) hypotheses are:

a) i) Ho: ρ = 0                  *Note - the hypotheses are stated in terms of
the population correlation coefficient ρ,
as opposed to the sample statistic r.

HA: ρ ≠ 0            This is basically stating there is no correlation (ρ = 0)
versus there is a correlation (ρ ≠ 0).

ii) Use the following formula as the test statistic: t = r√[(n − 2) / (1 − r²)]

1. df = n − 2

2. So for our example, let's calculate t:

a) t = 0.905 × √[(10 − 2) / (1 − 0.82)] = 0.905 × √(8 / 0.18) ≈ 6.0

b) If α = .05, then α/2 = .025; using the t-score table with df = 8,
the cutoff is 2.306. Since the absolute value of our calculated t-score
is greater than the cutoff value of 2.306, we can reject the null in favor
of the alternative. In other words, there is a statistically significant
correlation. Even if α is decreased to .01, the cutoff is 3.355, which is
still well under our calculated value. You would expect a correlation between
time spent studying and the grade.
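The test statistic can be reproduced in a couple of lines (the value 2.306 is the two-tailed critical t for df = 8 at α = .05, as used above; r is taken from the earlier calculation):

```python
import math

n = 10
r = 0.905  # Pearson's r from the example above

# t = r * sqrt((n - 2) / (1 - r^2)), with df = n - 2
t = r * math.sqrt((n - 2) / (1 - r ** 2))
print(round(t, 1))  # 6.0 -> well past the 2.306 cutoff, so reject H0
```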
