```
STATISTICS
Chapter 5 – Correlation/Regression

MVS 250: V. Katch
Overview

Paired Data
• Is there a relationship?
• If so, what is the equation?
• Use the equation for prediction.

Correlation

Definition
• Correlation
exists between two variables
when one of them is related to
the other in some way.

Assumptions
1.  The sample of paired data (x,y) is a
random sample.
2.  The pairs of (x,y) data have a
bivariate normal distribution.

Definition
• Scatterplot (or scatter diagram)
is a graph in which the paired
(x,y) sample data are plotted with
a horizontal x axis and a vertical
y axis.  Each individual (x,y) pair
is plotted as a single point.
[Figure: Scatter diagram of paired data]

Scatter Plots
Positive linear correlation
[Figure: (a) positive, (b) strong positive, (c) perfect positive]

Negative linear correlation
[Figure: (d) negative, (e) strong negative, (f) perfect negative]

No linear correlation
[Figure: (g) no correlation, (h) nonlinear correlation]
Definition
• Linear correlation coefficient r
measures the strength of the linear relationship
between paired x and y values in a sample

r = \frac{\sum xy/n - (\sum x/n)(\sum y/n)}{SD_x \, SD_y}

where Σxy/n is the mean of the cross products; Σx/n is the mean
of the x variable; Σy/n is the mean of the y variable; SDx is the
standard deviation of the x variable and SDy is the standard
deviation of the y variable, both computed with divisor n.
Notation for the
Linear Correlation Coefficient
n      number of pairs of data presented

Σ      denotes the sum of the items indicated.

Σx/n   denotes the mean of all x values.
Σy/n   denotes the mean of all y values.
Σxy/n  denotes the mean of the cross products [each x
       times its paired y, summed, then divided by n]

r      linear correlation coefficient for a sample
ρ      linear correlation coefficient for a
       population
Rounding the
Linear Correlation Coefficient r

• Round to three decimal places.
• Use a calculator or computer if possible.

Properties of the
Linear Correlation Coefficient r

1.  -1 ≤ r ≤ 1
2.  The value of r does not change if all values of
    either variable are converted to a different
    scale (e.g., pounds to kilograms).
3.  r is not affected by which variable is labeled x and which y.
    Interchange x and y and the value of r will not
    change.
4.  r measures the strength of a linear relationship only.
Interpreting the Linear
Correlation Coefficient
• If the absolute value of r exceeds the critical
value in Table R, conclude that there is
a significant linear correlation.

• Otherwise, there is not sufficient
evidence to support the conclusion of a
significant linear correlation.

• Table R is entered with n, the number of pairs; a table
indexed by degrees of freedom uses df = n - 2.
Common Errors Involving Correlation

1.  Causation:  It is wrong to conclude that
correlation implies causality.

2.   Averages:  Averages suppress individual
variation and may inflate the correlation
coefficient.

3.   Linearity:  There may be some relationship
between x and y even when there is no
significant linear correlation.
[Figure: Distance (feet), 0 to 250, plotted against Time (seconds), 0 to 8]
Correlation is Not Causation
[Figure: diagram relating variables A, B and C]

Correlation Calculations

• Rank-order correlation: Rho
• Pearson’s r

Rank Order Correlation
(D = Hits rank - HR rank)

Hits   Hits rank   HR   HR rank    D    D²
  1       10        3      8       2     4
  2        9        4      7       2     4
  3        8        5      6       2     4
  4        7        1     10      -3     9
  5        6        7      4       2     4
  6        5        6      5       0     0
  7        4        2      9      -5    25
  8        3       10      1       2     4
  9        2        9      2       0     0
 10        1        8      3      -2     4

Rank Order Correlation, cont

Rho = 1 - \frac{6 \sum D^2}{N (N^2 - 1)}

N = 10, ΣD² = 58

Rho = 1 - [6(58) / 10(10² - 1)]
Rho = 1 - [348 / 10(100 - 1)]
Rho = 1 - [348 / 990]
Rho = 1 - 0.352
Rho = 0.648
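Pearson’s r

Hits (x)   HR (y)   xy
   1          3       3
   2          4       8
   3          5      15
   4          1       4
   5          7      35
   6          6      36
   7          2      14
   8         10      80
   9          9      81
  10          8      80

Σx/n = 5.5    Σy/n = 5.5    Σxy/n = 356/10 = 35.6
SDx = SDy = √8.25 = 2.872 (divisor n)

r = \frac{\sum xy/n - (\sum x/n)(\sum y/n)}{SD_x \, SD_y}

r = [35.6 - (5.5)(5.5)] / [(2.872)(2.872)]
r = (35.6 - 30.25) / 8.25
r = 5.35 / 8.25
r = 0.648

Because both variables are already ranks 1 to 10 with no ties,
Pearson’s r equals the rank-order Rho (0.648).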
Pearson’s r
Excel Demonstration

Is there a significant linear correlation?
Data from the Garbage Project
x  Plastic (lb)      0.27   1.41   2.19   2.83   2.19   1.81   0.85   3.05
y  Household size    2      3      3      6      4      2      1      5

r = 0.842
r² = 0.71

Is there a significant linear correlation?
n = 8      α = 0.05
H0: ρ = 0
H1: ρ ≠ 0
Test statistic is r = 0.842
Critical values are r = -0.707 and 0.707
(Table R with n = 8 and α = 0.05)

TABLE R  Critical Values of the Pearson Correlation Coefficient r
  n     α = .05   α = .01
  4      .950      .990
  5      .878      .959
  6      .811      .917
  7      .754      .875
  8      .707      .834
  9      .666      .798
 10      .632      .765
 11      .602      .735
 12      .576      .708
 13      .553      .684
 14      .532      .661
 15      .514      .641
 16      .497      .623
 17      .482      .606
 18      .468      .590
 19      .456      .575
 20      .444      .561
 25      .396      .505
 30      .361      .463
 35      .335      .430
 40      .312      .402
 45      .294      .378
 50      .279      .361
 60      .254      .330
 70      .236      .305
 80      .220      .286
 90      .207      .269
100      .196      .256

Is there a significant linear correlation?
0.842 > 0.707; that is, the test statistic falls within the
critical region.
Therefore, we REJECT H0: ρ = 0 (no correlation) and conclude
there is a significant linear correlation between the weights of
discarded plastic and household size.

[Figure: number line from -1 to 1 with critical values r = -0.707 and
r = 0.707; reject ρ = 0 below -0.707 or above 0.707, fail to reject
ρ = 0 in between; sample data r = 0.842]


Formal Hypothesis Test
• To determine whether there is a
significant linear correlation
between two variables
• Two methods
• Both methods let H0: ρ = 0
(no significant linear correlation)
H1: ρ ≠ 0
(significant linear correlation)
Method 2:   Test Statistic is r
(uses fewer calculations)

• Test statistic: r
• Critical values: refer to Table R
(entered with n; no degrees of freedom needed)

[Figure: number line from -1 to 1 with critical values r = -0.811 and
r = 0.811; reject ρ = 0 outside them, fail to reject ρ = 0 in between;
sample data r = 0.828]

Method 1:   Test Statistic is t
(follows format of earlier chapters)

Test statistic:
t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}}

Critical values:

use Table T with
degrees of freedom = n - 2
Testing for a Linear Correlation (flowchart)

Start
→ Let H0: ρ = 0 and H1: ρ ≠ 0
→ Select a significance level α
→ Calculate r using the formula for the linear correlation coefficient

METHOD 1
The test statistic is
t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}}
Critical values of t are from Table T (Table A-3)
with n - 2 degrees of freedom.

METHOD 2
The test statistic is r.
Critical values of r are from Table R (Table A-6).

If the absolute value of the test statistic exceeds the
critical values, reject H0: ρ = 0.
Otherwise, fail to reject H0.

If H0 is rejected, conclude that there
is a significant linear correlation.
If you fail to reject H0, then there is
not sufficient evidence to conclude
that there is a linear correlation.
Why does the critical value of r
increase as sample size decreases?

With fewer pairs, a large |r| is more likely to arise by chance
alone, so a larger |r| is needed before we call it significant.

Coefficient of Determination
(Effect Size)

r²
The proportion of the variance of one variable that can be
explained by its linear relationship with the other variable
(e.g., r² = 0.71 for the Garbage Project data: about 71%).

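Justification for r Formula

r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1) \, s_x \, s_y}

(x̄, ȳ) is the centroid of the sample points; s_x and s_y are the
sample standard deviations (divisor n - 1).

[Figure: scatterplot (x from 0 to 7, y from 0 to 24) with centroid
x̄ = 3, ȳ = 11; for the point (7, 23), x - x̄ = 7 - 3 = 4 and
y - ȳ = 23 - 11 = 12]

Points above and to the right of the centroid, or below and to the
left, give positive products (x - x̄)(y - ȳ); points in the other two
quadrants give negative products. The sign of the sum therefore
shows the direction of the linear relationship.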

```
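
The "Linear Correlation Coefficient r" formula and the Pearson’s r worked example on the Hits/HR data can be checked with a minimal Python sketch. It assumes, as that formula requires, that SDx and SDy are computed with divisor n; the helper names `mean`, `sd_n` and `pearson_r` are hypothetical, chosen only for this illustration.

```python
import math


def mean(values):
    """Arithmetic mean: (sum of values) / n."""
    return sum(values) / len(values)


def sd_n(values):
    """Standard deviation with divisor n, as the mean-of-cross-products formula needs."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson_r(x, y):
    """r = [Σxy/n - (Σx/n)(Σy/n)] / (SDx * SDy), SDs computed with divisor n."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_xy = mean([a * b for a, b in zip(x, y)])  # mean of the cross products
    return (mean_xy - mean(x) * mean(y)) / (sd_n(x) * sd_n(y))


hits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
hr = [3, 4, 5, 1, 7, 6, 2, 10, 9, 8]

print(round(mean([a * b for a, b in zip(hits, hr)]), 2))  # Σxy/n
print(round(sd_n(hits), 3))                               # SDx (divisor n)
print(round(pearson_r(hits, hr), 3))                      # r
```

Swapping in sample standard deviations (divisor n - 1, about 3.03 here) while keeping the n-based numerator mixes two conventions and understates r.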
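
Properties 1 to 3 of r on the "Properties of the Linear Correlation Coefficient r" slide can be spot-checked numerically. This Python sketch uses the Garbage Project data; the helper `pearson_r` is hypothetical, and 0.45359237 is the exact number of kilograms in one pound. Each printed check should be True. A check on one data set illustrates the properties; it does not prove them.

```python
import math


def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


plastic_lb = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]
household = [2, 3, 3, 6, 4, 2, 1, 5]
plastic_kg = [w * 0.45359237 for w in plastic_lb]  # same weights on a different scale

r = pearson_r(plastic_lb, household)
print(-1 <= r <= 1)                                       # property 1: bounded
print(math.isclose(r, pearson_r(plastic_kg, household)))  # property 2: change of scale
print(math.isclose(r, pearson_r(household, plastic_lb)))  # property 3: swap x and y
```

Note that property 2 refers to unit conversions with a positive factor; multiplying one variable by a negative number would flip the sign of r.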
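
Common error 3 (linearity) can be demonstrated with constructed values rather than real data: in this Python sketch y = x² depends on x exactly, yet r is 0 because the relationship is a symmetric curve, not a line. The values are made up for illustration, and `pearson_r` is a hypothetical helper.

```python
import math


def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # y is fully determined by x, but along a parabola

r = pearson_r(x, y)
print(r)  # 0.0
```

A scatterplot of these points would show the relationship immediately, which is why plotting the data comes before computing r.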
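
A minimal Python sketch of the rank-order (Rho) calculation from the "Rank Order Correlation" slides. It ranks each variable from largest (rank 1) to smallest, takes D as the difference of paired ranks, and applies Rho = 1 - 6ΣD²/(N(N² - 1)). The function names are hypothetical, and the sketch assumes there are no tied values (true for the Hits and HR columns); tied data would need average ranks, which it does not handle.

```python
def ranks_descending(values):
    """Rank 1 = largest value. Raises on ties, which this sketch does not handle."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need average ranks")
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Return (sum of D squared, Rho) with Rho = 1 - 6*sum(D^2) / (N(N^2 - 1))."""
    rank_x, rank_y = ranks_descending(x), ranks_descending(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


hits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
hr = [3, 4, 5, 1, 7, 6, 2, 10, 9, 8]

sum_d2, rho = spearman_rho(hits, hr)
print(ranks_descending(hr))  # HR ranks, as in the slide's table
print(sum_d2, round(rho, 3))
```

Ranking from smallest to largest instead gives the same ΣD², since reversing both rankings only changes the sign of each D.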
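
For the Garbage Project data, a short Python sketch (standard library only; the helper `pearson_r` is hypothetical) computes r and r² and compares |r| with the Table R critical value for n = 8 and α = 0.05. That critical value is typed in from the table rather than computed, and the y values are treated as household size, following the slide's "Household" label.

```python
import math

plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]  # x: plastic discarded (lb)
household = [2, 3, 3, 6, 4, 2, 1, 5]                          # y: household size


def pearson_r(x, y):
    """Pearson's r from sums of squared and cross-product deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(plastic, household)
r_squared = r ** 2
critical = 0.707  # Table R, n = 8, alpha = 0.05 (looked up, not computed here)

print(round(r, 3), round(r_squared, 2))
print("reject H0" if abs(r) > critical else "fail to reject H0")
```

The r² value is the coefficient of determination from the effect-size slide.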
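
Table R can be reconstructed from the t distribution: each entry is the two-tailed t critical value with n - 2 degrees of freedom, converted with r = t / √(t² + n - 2). This standard-library Python sketch does that, which also answers why the critical r grows as n shrinks. The function names, the Simpson's-rule integration of the t density and the bisection search are choices made for this illustration, not the course's method.

```python
import math


def two_tailed_p(t, df, steps=1000):
    """P(|T| >= t) for Student's t: 1 - 2 * integral of the density from 0 to t.

    The integral uses Simpson's rule, so steps must be even.
    """
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 1 - 2 * (h / 3) * total


def critical_t(alpha, df):
    """Two-tailed critical t, found by bisection on the tail probability."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid  # tail area still larger than alpha: move right
        else:
            hi = mid
    return (lo + hi) / 2


def critical_r(n, alpha):
    """Critical |r| for n pairs: r = t / sqrt(t^2 + (n - 2))."""
    if n < 4:
        raise ValueError("Table R starts at n = 4")
    df = n - 2
    t = critical_t(alpha, df)
    return t / math.sqrt(t * t + df)


for n in (4, 6, 8, 10, 30):
    print(n, round(critical_r(n, 0.05), 3), round(critical_r(n, 0.01), 3))
```

Printed tables round to three decimals, so an occasional entry may differ from a computed value in the last digit.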
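
The two methods in the "Testing for a Linear Correlation" flowchart can be compared on the Garbage Project result (r = 0.842, n = 8). In this Python sketch the critical values 2.447 (Table T, df = 6, two tails, α = 0.05) and 0.707 (Table R, n = 8, α = 0.05) are entered by hand as table look-ups; the function names are hypothetical.

```python
import math


def t_statistic(r, n):
    """Method 1 test statistic: t = r / sqrt((1 - r^2) / (n - 2))."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


def decision(statistic, critical_value):
    """Reject H0: rho = 0 when |test statistic| exceeds the critical value."""
    return "reject H0" if abs(statistic) > critical_value else "fail to reject H0"


r, n = 0.842, 8  # Garbage Project result
t_crit = 2.447   # Table T: df = n - 2 = 6, alpha = 0.05, two tails
r_crit = 0.707   # Table R: n = 8, alpha = 0.05

t = t_statistic(r, n)
print("Method 1:", round(t, 3), decision(t, t_crit))
print("Method 2:", r, decision(r, r_crit))

# The tables are linked: r_crit = t_crit / sqrt(t_crit^2 + n - 2)
print(round(t_crit / math.sqrt(t_crit ** 2 + n - 2), 3))
```

Both methods must reach the same decision, because t is an increasing function of r for fixed n; Method 2 simply moves that conversion into the table.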
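
The deviation-product formula on the "Justification for r Formula" slide and the mean-of-cross-products formula on the definition slide give the same r, provided each uses its own standard-deviation convention (divisor n - 1 for s_x and s_y; divisor n for SDx and SDy). This Python sketch checks that with the standard `statistics` module on the Hits/HR data, and computes the sign of the (7, 23) point's contribution using the centroid (3, 11) from the slide figure. The function names are hypothetical.

```python
import math
import statistics


def r_deviation_products(x, y):
    """r = Σ(x - x̄)(y - ȳ) / ((n - 1) s_x s_y), with sample SDs (divisor n - 1)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    total = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return total / ((len(x) - 1) * statistics.stdev(x) * statistics.stdev(y))


def r_mean_cross_products(x, y):
    """r = [Σxy/n - (Σx/n)(Σy/n)] / (SDx SDy), with SDs using divisor n."""
    n = len(x)
    mean_xy = sum(a * b for a, b in zip(x, y)) / n
    numerator = mean_xy - statistics.mean(x) * statistics.mean(y)
    return numerator / (statistics.pstdev(x) * statistics.pstdev(y))


hits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
hr = [3, 4, 5, 1, 7, 6, 2, 10, 9, 8]

r_dev = r_deviation_products(hits, hr)
r_mean = r_mean_cross_products(hits, hr)
print(round(r_dev, 3), round(r_mean, 3))

# Contribution sign for the slide's point (7, 23) about the centroid (3, 11):
x_bar, y_bar = 3, 11
product = (7 - x_bar) * (23 - y_bar)
print(product)  # 48: positive, since the point is above and to the right of the centroid
```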
