					         STATISTICS
 Chapter 5 – Correlation/Regression




MVS 250: V. Katch                     1
            Overview

           Paired Data
• Is there a relationship?
• If so, what is the equation?
• Use the equation for prediction.



                                    2
Correlation



              3
          Definition
• Correlation
  exists between two variables
  when one of them is related to
  the other in some way.



                                   4
        Assumptions
1.  The sample of paired data (x,y) is a 
     random sample.
2.  The pairs of (x,y) data have a      
     bivariate normal distribution.




                                            5
           Definition
• Scatterplot (or scatter diagram)
  is a graph in which the paired
  (x,y) sample data are plotted with
  a horizontal x axis and a vertical
  y axis.  Each individual (x,y) pair
  is plotted as a single point.
                                        6
Scatter Diagram of Paired Data




                                 7
    Positive Linear Correlation

[Figure: three scatter plots of y against x: (a) Positive, (b) Strong positive, (c) Perfect positive]
                                                                           9
Negative Linear Correlation

[Figure: three scatter plots of y against x: (d) Negative, (e) Strong negative, (f) Perfect negative]
                                                                      10
    No Linear Correlation

[Figure: two scatter plots of y against x: (g) No Correlation, (h) Nonlinear Correlation]
                                                         11
                    Definition
• Linear Correlation Coefficient r
  measures the strength of the linear relationship
  between paired x and y values in a sample

$$ r = \frac{\sum xy / n - (\sum x / n)(\sum y / n)}{SD_x \, SD_y} $$

where Σxy/n is the mean of the cross products;
Σx/n is the mean of the x variable; Σy/n is the
mean of the y variable; SDx is the standard
deviation of the x variable and SDy is the
standard deviation of the y variable (both
computed with divisor n).
                                                      12
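The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the five data points are made up for the example, and the standard deviations are assumed to use divisor n so that they match the means taken over n.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from the mean of cross products minus the product of means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    # Standard deviations with divisor n (an assumption, see the lead-in)
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return (mean_xy - mean_x * mean_y) / (sd_x * sd_y)

# Made-up illustrative data lying exactly on the line y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [2 * x + 1 for x in xs]

r = pearson_r(xs, ys)
print(round(r, 3))  # 1.0, a perfect positive linear correlation
```

Points that fall exactly on a rising line give r = 1, which corresponds to the "perfect positive" scatter plot.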
               Notation for the 
        Linear Correlation Coefficient
n       number of pairs of data presented

Σ       denotes the addition of the items indicated.

Σx/n    denotes the mean of all x values.
Σy/n    denotes the mean of all y values.
Σxy/n   denotes the mean of the cross products [x
        times y, summed; divided by n]

r       linear correlation coefficient for a sample
ρ       linear correlation coefficient for a
        population
                                                    13
           Rounding the 
   Linear Correlation Coefficient r

• Round to three decimal places
• Use calculator or computer if possible




                                          14
        Properties of the 
 Linear Correlation Coefficient r

1.   -1 ≤ r ≤ 1
2.   The value of r does not change if all values of
     either variable are converted to a different
     scale.
3.   The value of r is not affected by the choice of x and y.
     Interchange x and y and the value of r will not
     change.
4.   r measures the strength of a linear relationship.
                                                     15
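Properties 1 to 3 can be checked numerically. The sketch below uses the Garbage Project pairs that appear later in these slides; the helper `pearson_r` and the pound-to-kilogram conversion are illustrative choices, not part of the original slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from sums of deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

plastic_lb = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]
household = [2, 3, 3, 6, 4, 2, 1, 5]

r_original = pearson_r(plastic_lb, household)

# Property 2: converting pounds to kilograms does not change r
plastic_kg = [w * 0.45359237 for w in plastic_lb]
r_rescaled = pearson_r(plastic_kg, household)

# Property 3: interchanging x and y does not change r
r_swapped = pearson_r(household, plastic_lb)

print(round(r_original, 3), round(r_rescaled, 3), round(r_swapped, 3))
```

All three printed values agree, and each lies between -1 and 1.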
        Interpreting the Linear 
        Correlation Coefficient
• If the absolute value of r exceeds the
  value in Sig. Table, conclude that there is
  a significant linear correlation.

• Otherwise, there is not sufficient
  evidence to support the conclusion of
  significant linear correlation.

• Remember to use n - 2
                                                16
Common Errors Involving Correlation

1.  Causation:  It is wrong to conclude that 
    correlation implies causality.

2.   Averages:  Averages suppress individual 
     variation and may inflate the correlation 
     coefficient.

3.   Linearity:  There may be some relationship 
     between x and y even when there is no 
     significant linear correlation.
                                                   17
Common Errors Involving Correlation

[Figure: plot of Distance (feet), 0 to 250, against Time (seconds), 0 to 8]
                                                          18
Correlation is Not Causation

[Figure: diagram relating variables A, B, and C]
                               19
Correlation Calculations

Rank Order Correlation - Rho
Pearson’s - r


                               20
       Rank Order Correlation
Hits   Rank   HR   Rank    D    D²
  1     10     3     8     2     4
  2      9     4     7     2     4
  3      8     5     6     2     4
  4      7     1    10    -3     9
  5      6     7     4     2     4
  6      5     6     5     0     0
  7      4     2     9    -5    25
  8      3    10     1     2     4
  9      2     9     2     0     0
 10      1     8     3    -2     4

                                      21
        Rank Order Correlation, cont
$$ \text{Rho} = 1 - \frac{6 \sum D^2}{N (N^2 - 1)} $$

N = 10,  ΣD² = 58

Rho = 1 - [6(58) / 10(10² - 1)]
Rho = 1 - [348 / 10(100 - 1)]
Rho = 1 - [348 / 990]
Rho = 1 - 0.352
Rho = 0.648
                                                                   22
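The rank-order calculation can be reproduced with a short Python sketch. The function name `spearman_rho` is illustrative, the ranks are copied from the table, and the shortcut formula assumes there are no tied ranks.

```python
# Ranks copied from the table (rank 1 = largest value)
hits_rank = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
hr_rank = [8, 7, 6, 10, 4, 5, 9, 1, 2, 3]

def spearman_rho(rank_x, rank_y):
    """Rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1)); valid when no ranks are tied."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

sum_d2 = sum((a - b) ** 2 for a, b in zip(hits_rank, hr_rank))
rho = spearman_rho(hits_rank, hr_rank)

print(sum_d2)         # 58
print(round(rho, 3))  # 0.648
```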
              Pearson’s r
Hits   HR    xy
 1     3      3
 2     4      8
 3     5     15
 4     1      4
 5     7     35
 6     6     36
 7     2     14
 8     10    80
 9     9     81
 10    8     80

Σx/n = 5.5     Σy/n = 5.5     Σxy/n = 35.6

r = [Σxy/n - (Σx/n)(Σy/n)] / [(SDx)(SDy)]
r = [35.6 - (5.5)(5.5)] / [(2.87)(2.87)]     (SD computed with divisor n)
r = (35.6 - 30.25) / 8.25
r = 5.35 / 8.25
r = 0.648
                                                        23
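The worked example can be recomputed step by step from the ten Hits/HR pairs. This is a sketch, not the Excel demonstration: the variable names are illustrative, and the standard deviations are assumed to use divisor n so that they are consistent with means taken over n.

```python
from math import sqrt

hits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
hr = [3, 4, 5, 1, 7, 6, 2, 10, 9, 8]
n = len(hits)

# Means of x, y, and the cross products
mean_x = sum(hits) / n
mean_y = sum(hr) / n
mean_xy = sum(x * y for x, y in zip(hits, hr)) / n

# Standard deviations with divisor n (assumption stated in the lead-in)
sd_x = sqrt(sum((x - mean_x) ** 2 for x in hits) / n)
sd_y = sqrt(sum((y - mean_y) ** 2 for y in hr) / n)

r = (mean_xy - mean_x * mean_y) / (sd_x * sd_y)

print(mean_x, mean_y, mean_xy)
print(round(sd_x, 2), round(sd_y, 2))
print(round(r, 3))
```

Because both columns are the integers 1 to 10 with no ties, Pearson's r computed this way coincides with the rank-order Rho for the same data.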
     Pearson’s r
Excel Demonstration




                      24
 Is there a significant linear correlation?
Data from the Garbage Project
x  Plastic (lb)      0.27   1.41   2.19   2.83   2.19   1.81   0.85   3.05
y  Household size    2      3      3      6      4      2      1      5




                                                                        25
 Is there a significant linear correlation?
Data from the Garbage Project (the same eight pairs)

r = 0.842
r² = 0.71




                                                                         27
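The values of r and r² for the Garbage Project data can be checked with the sums-based form of the formula, which is algebraically equivalent to the definition of r. The variable names below are illustrative.

```python
from math import sqrt

plastic_lb = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]  # x
household = [2, 3, 3, 6, 4, 2, 1, 5]                           # y
n = len(plastic_lb)

sum_x = sum(plastic_lb)
sum_y = sum(household)
sum_xy = sum(x * y for x, y in zip(plastic_lb, household))
sum_x2 = sum(x * x for x in plastic_lb)
sum_y2 = sum(y * y for y in household)

# r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 3), round(r ** 2, 2))  # 0.842 0.71
```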
 Is there a significant linear correlation?
n = 8      α = 0.05
H0: ρ = 0
H1: ρ ≠ 0

Test statistic is r = 0.842

Critical values are r = -0.707 and r = 0.707
(Table R with n = 8 and α = 0.05)

TABLE R  Critical Values of the Pearson Correlation Coefficient r

  n     α = .05   α = .01
  4      .950      .999
  5      .878      .959
  6      .811      .917
  7      .754      .875
  8      .707      .834
  9      .666      .798
 10      .632      .765
 11      .602      .735
 12      .576      .708
 13      .553      .684
 14      .532      .661
 15      .514      .641
 16      .497      .623
 17      .482      .606
 18      .468      .590
 19      .456      .575
 20      .444      .561
 25      .396      .505
 30      .361      .463
 35      .335      .430
 40      .312      .402
 45      .294      .378
 50      .279      .361
 60      .254      .330
 70      .236      .305
 80      .220      .286
 90      .207      .269
100      .196      .256


                                                                                              28
 Is there a significant linear correlation?
0.842 > 0.707; that is, the test statistic falls within the
critical region.
Therefore, we REJECT H0: ρ = 0 (no correlation) and conclude that
there is a significant linear correlation between the weights of
discarded plastic and household size.

[Figure: number line from -1 to 1 with critical values r = -0.707 and r = 0.707. Reject ρ = 0 below -0.707 and above 0.707; fail to reject ρ = 0 in between. Sample data: r = 0.842.]



                                                                         29
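The table-lookup decision can be sketched as a small Python helper. Only the first rows of Table R are copied here, and the names `CRITICAL_R` and `is_significant` are made up for the illustration.

```python
# Subset of Table R: n -> (critical r at alpha = .05, critical r at alpha = .01)
CRITICAL_R = {
    4: (0.950, 0.999),
    5: (0.878, 0.959),
    6: (0.811, 0.917),
    7: (0.754, 0.875),
    8: (0.707, 0.834),
    9: (0.666, 0.798),
    10: (0.632, 0.765),
}

def is_significant(r, n, alpha=0.05):
    """Reject H0: rho = 0 when |r| exceeds the tabled critical value."""
    column = {0.05: 0, 0.01: 1}[alpha]
    return abs(r) > CRITICAL_R[n][column]

# Garbage Project example: n = 8, r = 0.842
print(is_significant(0.842, 8))              # True, since 0.842 > 0.707
print(is_significant(0.842, 8, alpha=0.01))  # True, since 0.842 > 0.834
```

The helper only handles sample sizes and significance levels that are present in the dictionary; any other n or alpha raises a KeyError.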
Method 1:   Test Statistic is t
  (follows format of earlier chapters)




                                         30
      Formal Hypothesis Test
• To determine whether there is a
  significant linear correlation
  between two variables
• Two methods
• Both methods let H0: ρ = 0 (no significant linear correlation)
                   H1: ρ ≠ 0 (significant linear correlation)
                                                                   31
      Method 2:   Test Statistic is r
             (uses fewer calculations)

• Test statistic: r
• Critical values: Refer to Table R
                  (no degrees of freedom)




                                            32
      Method 2:   Test Statistic is r
                    (uses fewer calculations)

• Test statistic: r
• Critical values: Refer to Table A-6
                  (no degrees of freedom)

[Figure: number line from -1 to 1 with critical values r = -0.811 and r = 0.811. Reject ρ = 0 below -0.811 and above 0.811; fail to reject ρ = 0 in between. Sample data: r = 0.828.]

                                                                         33
       Method 1:   Test Statistic is t
          (follows format of earlier chapters)

     Test statistic:

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$

     Critical values:

     use Table T with
     degrees of freedom = n - 2
                                                 34
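Method 1 can be sketched in Python for the Garbage Project result (r = 0.842, n = 8). The function name is illustrative, and the critical value 2.447 is the standard two-tailed t value for α = 0.05 with 6 degrees of freedom, hard-coded here rather than looked up from Table T.

```python
from math import sqrt

def t_statistic(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2)), with n - 2 degrees of freedom."""
    return r / sqrt((1 - r ** 2) / (n - 2))

r, n = 0.842, 8
t = t_statistic(r, n)

# Two-tailed critical t for alpha = 0.05 and n - 2 = 6 degrees of freedom
T_CRITICAL = 2.447
reject_h0 = abs(t) > T_CRITICAL

print(round(t, 2))  # about 3.82
print(reject_h0)    # True: significant linear correlation
```

Both methods lead to the same decision for these data, because the test on t and the test on r are two forms of the same comparison.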
Testing for a Linear Correlation

1. Start: let H0: ρ = 0 and H1: ρ ≠ 0.
2. Select a significance level α.
3. Calculate r using Formula 9-1.
4. Choose a method:
   METHOD 1: The test statistic is t = r / sqrt((1 - r²) / (n - 2)).
             Critical values of t are from Table A-3 with n - 2 degrees of freedom.
   METHOD 2: The test statistic is r.
             Critical values of r are from Table A-6.
5. If the absolute value of the test statistic exceeds the critical value,
   reject H0: ρ = 0. Otherwise, fail to reject H0.
6. If H0 is rejected, conclude that there is a significant linear correlation.
   If you fail to reject H0, then there is not sufficient evidence to conclude
   that there is a linear correlation.
                                                                                                                         35
  Why does the critical value of r
increase as sample size decreases?


     With fewer pairs, a large value of r is more likely to occur by
     chance alone, so a larger |r| is needed to conclude significance.




                                               36
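A small simulation illustrates the point. The sketch below draws x and y independently (so the true correlation is zero) and counts how often |r| exceeds 0.5 for small and large samples. The threshold, the number of trials, the seed, and the function names are arbitrary choices for the illustration.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from sums of deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def share_of_large_r(n, threshold=0.5, trials=2000, seed=0):
    """Fraction of unrelated samples of size n whose |r| exceeds the threshold."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # drawn independently of xs
        if abs(pearson_r(xs, ys)) > threshold:
            count += 1
    return count / trials

small_sample = share_of_large_r(5)
large_sample = share_of_large_r(50)
print(small_sample, large_sample)
```

With n = 5 a sizeable share of unrelated samples shows |r| above 0.5, while with n = 50 this almost never happens, which is why the tabled critical value shrinks as n grows.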
  Coefficient of Determination
          (Effect Size)

                     r²
The proportion of the variance in one variable that can be
explained by the variance in a related variable.




                                                   37
           Justification for r Formula

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} $$

[Figure: scatterplot of sample points with x from 0 to 7 and y from 0 to 24. Lines through the centroid of the sample points, (x̄, ȳ) = (3, 11), divide the plane into Quadrants 1 to 4. For the point (7, 23): x - x̄ = 7 - 3 = 4 and y - ȳ = 23 - 11 = 12.]
                                                                                                        38
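The idea behind this form of the formula can be sketched in Python: points in Quadrants 1 and 3 (relative to the centroid) contribute positive products, and points in Quadrants 2 and 4 contribute negative ones. The sketch uses the Garbage Project pairs rather than the points in the figure, which are not fully listed; the variable names are illustrative.

```python
from math import sqrt

xs = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]  # plastic (lb)
ys = [2, 3, 3, 6, 4, 2, 1, 5]                          # household size
n = len(xs)

# Centroid of the sample points
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# One product (x - mean_x)(y - mean_y) per point
products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
positive_part = sum(p for p in products if p > 0)  # Quadrants 1 and 3
negative_part = sum(p for p in products if p < 0)  # Quadrants 2 and 4

# Sample standard deviations (divisor n - 1), as in this form of the formula
s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))

r = sum(products) / ((n - 1) * s_x * s_y)
print(round(positive_part, 3), round(negative_part, 3), round(r, 3))
```

Here the positive products far outweigh the negative ones, so r is positive and fairly close to 1.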

				