Chapter 9 Correlation and Regression

```
STATISTICS
Chapter 5 – Correlation/Regression

MVS 250: V. Katch                     1
Overview

Paired Data
• Is there a relationship?
• If so, what is the equation?
• Use the equation for prediction.

2
Correlation

3
Definition
• Correlation
exists between two variables
when one of them is related to
the other in some way.

4
Assumptions
1.  The sample of paired data (x,y) is a
random sample.
2.  The pairs of (x,y) data have a
bivariate normal distribution.

5
Definition
• Scatterplot (or scatter diagram)
is a graph in which the paired
(x,y) sample data are plotted with
a horizontal x axis and a vertical
y axis.  Each individual (x,y) pair
is plotted as a single point.
6
Scatter Diagram of Paired Data

7
Scatter Diagram of Paired Data

8
Positive Linear Correlation

[Figure: three scatter plots of y against x. (a) Positive, (b) Strong positive, (c) Perfect positive]
9
Negative Linear Correlation

[Figure: three scatter plots of y against x. (d) Negative, (e) Strong negative, (f) Perfect negative]
10
No Linear Correlation

[Figure: two scatter plots of y against x. (g) No Correlation, (h) Nonlinear Correlation]
11
Definition
• Linear Correlation Coefficient r
measures strength of the linear relationship
between paired x and y values in a sample

$r = \frac{\sum xy / n - (\sum x / n)(\sum y / n)}{(SD_x)(SD_y)}$

where Σxy/n is the mean of the cross products;
Σx/n is the mean of the x variable; Σy/n is the
mean of the y variable; SDx is the standard
deviation of the x variable and SDy is the
standard deviation of the y variable.
12
Notation for the
Linear Correlation Coefficient
n     number of pairs of data presented

Σ     denotes the addition of the items indicated.

Σx/n denotes the mean of all x values.
Σy/n denotes the mean of all y values.
Σxy/n denotes the mean of the cross products [x
times y, summed; divided by n]

r     linear correlation coefficient for a sample
ρ     linear correlation coefficient for a
population
13
Rounding the
Linear Correlation Coefficient r

• Round to three decimal places
• Use calculator or computer if possible

14
Properties of the
Linear Correlation Coefficient r

1.   -1 ≤ r ≤ 1
2.   Value of r does not change if all values of
either variable are converted to a different
scale.
3.  The r is not affected by the choice of x and y.
Interchange x and y and the value of r will not
change.
4.  r measures strength of a linear relationship.
15
Interpreting the Linear
Correlation Coefficient
• If the absolute value of r exceeds the
value in the significance table, conclude that
there is a significant linear correlation.

• Otherwise, there is not sufficient
evidence to support the conclusion of
significant linear correlation.

• Remember to use n-2
16
Common Errors Involving Correlation

1.  Causation:  It is wrong to conclude that
correlation implies causality.

2.   Averages:  Averages suppress individual
variation and may inflate the correlation
coefficient.

3.   Linearity:  There may be some relationship
between x and y even when there is no
significant linear correlation.
17
Common Errors Involving Correlation
[Figure: scatterplot of Distance (feet), 0 to 250, against Time (seconds), 0 to 8]
18
Correlation is Not Causation

[Diagram: variables A, B, and C]

19
Correlation Calculations

Rank Order Correlation - Rho
Pearson’s - r

20
Rank Order Correlation
Hits   Rank   HR   Rank    D    D²
   1     10    3      8    2     4
   2      9    4      7    2     4
   3      8    5      6    2     4
   4      7    1     10   -3     9
   5      6    7      4    2     4
   6      5    6      5    0     0
   7      4    2      9   -5    25
   8      3   10      1    2     4
   9      2    9      2    0     0
  10      1    8      3   -2     4

21
Rank Order Correlation, cont
$\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$

N = 10
ΣD² = 58

Rho = 1 - [6(58) / 10(10² - 1)]
Rho = 1 - [348 / 10(100 - 1)]
Rho = 1 - [348 / 990]
Rho = 1 - 0.352
Rho = 0.648
22
Pearson’s r
Hits   HR    xy
   1    3     3
   2    4     8
   3    5    15
   4    1     4
   5    7    35
   6    6    36
   7    2    14
   8   10    80
   9    9    81
  10    8    80

Σx/n = 5.5     Σy/n = 5.5     Σxy/n = 35.6
SDx = SDy = 2.872 (computed with n in the denominator)

$r = \frac{\sum xy / n - (\sum x / n)(\sum y / n)}{(SD_x)(SD_y)}$

r = [35.6 - (5.5)(5.5)] / [(2.872)(2.872)]
r = (35.6 - 30.25) / 8.25
r = 5.35 / 8.25
r = 0.648
23
Pearson’s r
Excel Demonstration

24
Is there a significant linear correlation?
Data from the Garbage Project
x  Plastic (lb)   0.27 1.41   2.19   2.83   2.19   1.81   0.85   3.05
y  Household       2    3      3      6      4      2      1      5

25
Is there a significant linear correlation?
Data from the Garbage Project
x  Plastic (lb)   0.27   1.41   2.19   2.83   2.19   1.81   0.85   3.05
y  Household      2      3      3      6      4      2      1      5

r = 0.842
R² = 0.71

27
Is there a significant linear correlation?
n = 8      α = 0.05
H0: ρ = 0
H1: ρ ≠ 0

Test statistic is r = 0.842

Critical values are r = -0.707 and 0.707
(Table R with n = 8 and α = 0.05)

TABLE R Critical Values of the Pearson Correlation Coefficient r
  n     α = .05   α = .01
  4      .950      .999
  5      .878      .959
  6      .811      .917
  7      .754      .875
  8      .707      .834
  9      .666      .798
 10      .632      .765
 11      .602      .735
 12      .576      .708
 13      .553      .684
 14      .532      .661
 15      .514      .641
 16      .497      .623
 17      .482      .606
 18      .468      .590
 19      .456      .575
 20      .444      .561
 25      .396      .505
 30      .361      .463
 35      .335      .430
 40      .312      .402
 45      .294      .378
 50      .279      .361
 60      .254      .330
 70      .236      .305
 80      .220      .286
 90      .207      .269
100      .196      .256

28
Is there a significant linear correlation?
0.842 > 0.707; that is, the test statistic falls within the
critical region.
Therefore, we REJECT H0: ρ = 0 (no correlation) and conclude
there is a significant linear correlation between the weights of
plastic and household size.

[Figure: number line for r from -1 to 1. Reject ρ = 0 for r < -0.707 or r > 0.707; fail to reject ρ = 0 in between. Sample data: r = 0.842]

29
Method 1:   Test Statistic is t
(follows format of earlier chapters)

30
Formal Hypothesis Test
• To determine whether there is a
significant linear correlation
between two variables
• Two methods
• Both methods let H0: ρ = 0
(no significant linear correlation)
H1: ρ ≠ 0
(significant linear correlation)
31
Method 2:   Test Statistic is r
(uses fewer calculations)

• Test statistic: r
• Critical values: Refer to Table R
(no degrees of freedom)

32
Method 2:   Test Statistic is r
(uses fewer calculations)

• Test statistic: r
• Critical values: Refer to Table A-6
(no degrees of freedom)

[Figure: number line for r from -1 to 1. Reject ρ = 0 for r < -0.811 or r > 0.811; fail to reject ρ = 0 in between. Sample data: r = 0.828]

33
Method 1:   Test Statistic is t
(follows format of earlier chapters)

Test statistic:

$t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$

Critical values:

use Table T with
degrees of freedom = n - 2
34
Testing for a Linear Correlation

Start
1.  Let H0: ρ = 0 and H1: ρ ≠ 0.
2.  Select a significance level α.
3.  Calculate r using Formula 9-1.
4.  Choose a method:
    METHOD 1: The test statistic is
    $t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$
    Critical values of t are from Table A-3
    with n - 2 degrees of freedom.
    METHOD 2: The test statistic is r.
    Critical values of r are from Table A-6.
5.  If the absolute value of the
    test statistic exceeds the
    critical values, reject H0: ρ = 0.
    Otherwise fail to reject H0.
6.  If H0 is rejected, conclude that there
    is a significant linear correlation.
    If you fail to reject H0, then there is
    not sufficient evidence to conclude
    that there is linear correlation.
35
Why does the critical value of r
increase as sample size decreases?

With a smaller sample, a large correlation is more likely to occur by
chance, so a larger r is needed for significance.

36
Coefficient of Determination
(Effect Size)

r²
The proportion of the variance of one variable that can be
explained by the variance of a related variable.

37
Justification for r Formula

$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1) s_x s_y}$

(x̄, ȳ) is the centroid of the sample points.

[Figure: scatterplot of four sample points with centroid (x̄, ȳ) = (3, 11). For the point (7, 23): x - x̄ = 7 - 3 = 4 and y - ȳ = 23 - 11 = 12]
38

```
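The calculations in the slides can be checked with a short script. What follows is a minimal Python sketch and is not part of the original course material. The function names (`pearson_r`, `ranks_descending`, `spearman_rho`, `t_statistic`) are illustrative assumptions. The ranking helper assumes no tied values, the standard deviations are computed with n in the denominator, and the critical value 0.707 is copied from Table R for n = 8 and α = 0.05.

```python
import math


def pearson_r(x, y):
    """Pearson r: mean of cross products minus product of means,
    divided by the product of the standard deviations (n in the denominator)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    mean_xy = sum(a * b for a, b in zip(x, y)) / n
    sd_x = math.sqrt(sum(a * a for a in x) / n - mean_x ** 2)
    sd_y = math.sqrt(sum(b * b for b in y) / n - mean_y ** 2)
    return (mean_xy - mean_x * mean_y) / (sd_x * sd_y)


def ranks_descending(values):
    """Rank 1 goes to the largest value. Assumption: no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Rank order correlation: 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(x)
    sum_d2 = sum((rx - ry) ** 2
                 for rx, ry in zip(ranks_descending(x), ranks_descending(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def t_statistic(r, n):
    """Method 1 test statistic, with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


# Hits / HR data from the rank order slides
hits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
home_runs = [3, 4, 5, 1, 7, 6, 2, 10, 9, 8]

# Garbage Project data from the significance slides
plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]
household = [2, 3, 3, 6, 4, 2, 1, 5]

rho = spearman_rho(hits, home_runs)
# Both columns are untied values 1..10, so Pearson's r equals rho here.
r_hits = pearson_r(hits, home_runs)

r_garbage = pearson_r(plastic, household)
t_garbage = t_statistic(r_garbage, len(plastic))

critical_r = 0.707  # copied from Table R: n = 8, alpha = 0.05
significant = abs(r_garbage) > critical_r

print(round(rho, 3))        # 0.648
print(round(r_garbage, 3))  # 0.842
print(round(t_garbage, 2))  # 3.83
print(significant)          # True
```

For data with tied values, the simple ranking helper above would need to assign average ranks, and the shortcut formula for Rho would only be approximate.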