PY206: Statistical Methods for Psychology

Correlation and Regression

Regression II
Last updated: 9/15/04


Luiz Pessoa, Brown University © 2004


Accuracy of prediction: recap
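• Using knowledge of X and the regression equation, the variance of Y after taking into account the dependence of Y on X is

$$s^2_{Y \cdot X} = \frac{\sum (Y_i - \hat{Y}_i)^2}{N - 2} = \frac{SS_{residual}}{df}$$

• It can be shown that $E\{MSE\} = \sigma^2$, i.e., the residual mean square is an unbiased estimate of the error variance $\sigma^2$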
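A minimal Python sketch (stdlib only) of how $s^2_{Y \cdot X}$ is obtained from raw data: fit the least-squares line, sum the squared residuals, and divide by $N - 2$. The function names and the three-point toy data set are hypothetical, chosen only for illustration.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for Y-hat = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b = sp_xy / ss_x
    a = mean_y - b * mean_x
    return a, b


def residual_variance(x, y):
    """s^2_{Y.X} = SS_residual / (N - 2), i.e., the MSE of the regression."""
    a, b = fit_line(x, y)
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return ss_residual / (len(x) - 2)  # two df used up by a and b


# Hypothetical toy data: three points
print(residual_variance([1, 2, 3], [1, 3, 2]))  # 1.5
```

The divisor is $N - 2$ rather than $N - 1$ because two parameters (a and b) were estimated from the data.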


Confidence limits on Y
• Although the standard error of estimate ($s_{Y \cdot X} = \sqrt{MSE}$) is useful as an overall measure of error, it is not a good estimate of the error associated with any single prediction
  • e.g., predicting Y for someone who was not included in the original sample


Confidence limits on Y
• How well can we predict new values of Y for a given X?
• The standard error of the prediction $\hat{Y}$ depends on the residual error variance (MSE):

$$s^2_{Y \cdot X} = \frac{\sum (Y_i - \hat{Y}_i)^2}{N - 2} = \frac{SS_{residual}}{df}$$

• It can be shown to be

$$s'_{Y \cdot X} = s_{Y \cdot X}\sqrt{1 + \frac{1}{N} + \frac{(X_i - \bar{X})^2}{(N - 1)s_X^2}}$$

• where $X_i - \bar{X}$ is the deviation of $X_i$ from the mean $\bar{X}$ and $s_X^2$ is the variance of X
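The prediction standard error, written $s'_{Y \cdot X}$ here, is $s_{Y \cdot X}\sqrt{1 + 1/N + (X_i - \bar{X})^2/((N - 1)s_X^2)}$. A short Python sketch of that formula follows; the function name and the summary values in the usage lines are hypothetical, picked to show how the error grows as $X_i$ moves away from $\bar{X}$.

```python
import math


def prediction_se(s_yx, n, x_i, mean_x, var_x):
    """Standard error s'_{Y.X} for predicting a new Y at X = x_i.

    s_yx   : standard error of estimate (sqrt of MSE)
    n      : sample size used to fit the line
    mean_x : mean of X in the fitting sample
    var_x  : variance of X (N - 1 denominator)
    """
    distance_term = (x_i - mean_x) ** 2 / ((n - 1) * var_x)
    return s_yx * math.sqrt(1 + 1 / n + distance_term)


# Hypothetical summary statistics: s_yx = 2, N = 4, mean_x = 0, var_x = 1
for x_i in (0.0, 1.0, 3.0):
    print(x_i, round(prediction_se(2.0, 4, x_i, 0.0, 1.0), 3))
```

At $X_i = \bar{X}$ the distance term vanishes, so the error is smallest there; it never drops below $s_{Y \cdot X}$ because of the leading 1 (the error of a single new observation).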

Confidence limits on Y
ˆ  Note that the variability of Yi is affected by how far X i is from X
 The further Xi is from X , the larger the variance of Y ˆ i

 Note that near X1, the two estimates are near each other; at X2, situation is the opposite

Confidence limits on Y
• Confidence limits for a confidence level of $1 - \alpha$:

$$CI(Y) = \hat{Y} \pm t_{\alpha/2}\, s'_{Y \cdot X}$$

Confidence limits on Y
• Confidence limits are narrowest at $X = \bar{X}$ and become wider as $|X - \bar{X}|$ increases
  • The limits form curved (hyperbolic) bands around the regression line, and they are quite wide!

Confidence limits on Y
• Example: Number of symptoms (Y) experienced by a student with a stress score indicating a "low level of stress"
  • Predicted value: $\hat{Y} = 81.72$; $t_{.025} = 1.984$ ($N = 107$, 105 df); standard error: $s'_{Y \cdot X} = 17.708$

$$CI(Y) = \hat{Y} \pm t_{\alpha/2}\, s'_{Y \cdot X} = 81.72 \pm 1.984(17.708) = 81.72 \pm 35.13$$

• Thus: $46.59 \le Y \le 116.85$
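The arithmetic of this example as a small Python sketch. The predicted value, critical t, and standard error are the numbers quoted on the slide; the function name is hypothetical.

```python
def prediction_interval(y_hat, t_crit, se_pred):
    """Return (lower, upper) limits y_hat -/+ t_crit * se_pred."""
    half_width = t_crit * se_pred
    return y_hat - half_width, y_hat + half_width


# Values from the stress example: Y-hat = 81.72, t(.025, 105 df) = 1.984, s' = 17.708
low, high = prediction_interval(81.72, 1.984, 17.708)
print(f"{low:.2f} <= Y <= {high:.2f}")  # 46.59 <= Y <= 116.85
```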

Testing the significance of r
• Most common hypothesis for a sample correlation: the correlation ρ between X and Y in the population is zero
  • Under bivariate normality, this means X and Y are independent
• It can be shown that when ρ = 0, for large N, r will be approximately normally distributed around zero
  • Thus, to test $H_0: \rho = 0$, we can use a standard t test:

$$t = \frac{r}{s_r}$$


Testing the significance of r
• where

$$s_r = \sqrt{\frac{1 - r^2}{N - 2}}$$

• The ratio is distributed as t with N – 2 df
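A minimal Python sketch of this t test. As an illustration it borrows r = .506 and N = 107 from the stress example discussed later in these notes; the function name is hypothetical.

```python
import math


def t_for_r(r, n):
    """t = r / s_r with s_r = sqrt((1 - r^2) / (N - 2)); df = N - 2."""
    s_r = math.sqrt((1 - r ** 2) / (n - 2))
    return r / s_r


t = t_for_r(0.506, 107)
print(round(t, 2))  # 6.01
```

With 105 df the two-tailed critical value at α = .05 is about 1.98, so a t of about 6 would lead us to reject $H_0: \rho = 0$.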


Testing the significance of b
• If we can reject the null hypothesis that ρ = 0, we should be able to reject the null hypothesis that β = 0
• The general form of testing the significance of b will be important for the case of multiple regression

$$t = \frac{b - \beta}{s_b}$$

  • Distributed as t with N – 2 df
  • β is the population parameter of interest


Testing the significance of b
• Typically, β = 0, so we want to test whether b is significantly different from zero
  • The magnitude of b should be assessed relative to its standard error

$$t = \frac{b}{s_b}$$

• It can be shown that the standard error of b depends on the standard error of estimate of Y given X and on the standard deviation of X:

$$s_b = \frac{s_{Y \cdot X}}{s_X\sqrt{N - 1}}$$
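To see why rejecting ρ = 0 goes together with rejecting β = 0, the Python sketch below computes $t = b/s_b$ and $t = r/s_r$ from the same data; in simple regression the two statistics are algebraically identical. The function name and the five data points are hypothetical.

```python
import math


def slope_and_r_t(x, y):
    """Return (t for b, t for r) computed from the same simple-regression data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    b = sp_xy / ss_x
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(ss_res / (n - 2))      # standard error of estimate
    s_x = math.sqrt(ss_x / (n - 1))         # sample SD of X
    s_b = s_yx / (s_x * math.sqrt(n - 1))   # standard error of b
    t_b = b / s_b

    r = sp_xy / math.sqrt(ss_x * ss_y)
    t_r = r / math.sqrt((1 - r ** 2) / (n - 2))
    return t_b, t_r


t_b, t_r = slope_and_r_t([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(t_b, 4), round(t_r, 4))  # both about 2.3094
```

Both statistics are referred to the t distribution with N – 2 df.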

Testing the difference between two bs
• Suppose we knew the following:

                      Males    Females
  $b$                 -0.4     -0.2
  $s_{Y \cdot X}$      2.1      2.3
  $s_X^2$              2.5      2.8
  $N$                  101      101

• Is the difference between the slopes for males and females significant?
  • The test is directly analogous to the test of the difference between two means


Testing the difference between two bs
• We can test this by creating the following t statistic:

$$t = \frac{b_1 - b_2}{s_{b_1 - b_2}}$$

• How do we compute $s_{b_1 - b_2}$?
  • Remember that variances of independent variables sum:

$$\sigma^2\{X + Y\} = \sigma^2\{X - Y\} = \sigma^2\{X\} + \sigma^2\{Y\}$$

  • Standard error:

$$s_{b_1 - b_2} = \sqrt{s^2_{b_1} + s^2_{b_2}}$$


Testing the difference between two bs
• To test the difference, we need to assess it relative to its standard error:

$$t = \frac{b_1 - b_2}{s_{b_1 - b_2}} = \frac{b_1 - b_2}{\sqrt{s^2_{b_1} + s^2_{b_2}}}$$

• To compute $s_{b_1 - b_2}$, remember that

$$s_b = \frac{s_{Y \cdot X}}{s_X\sqrt{N - 1}}$$


Testing the difference between two bs
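• The test statistic

$$t = \frac{b_1 - b_2}{\sqrt{s^2_{b_1} + s^2_{b_2}}}$$

has $N_1 + N_2 - 4$ df

• $s^2_{Y \cdot X_1}$ and $s^2_{Y \cdot X_2}$ are the error variances for the two samples. Since $s_b = s_{Y \cdot X}/(s_X\sqrt{N - 1})$ and $s_{b_1 - b_2} = \sqrt{s^2_{b_1} + s^2_{b_2}}$,

$$s_{b_1 - b_2} = \sqrt{\frac{s^2_{Y \cdot X_1}}{s^2_{X_1}(N_1 - 1)} + \frac{s^2_{Y \cdot X_2}}{s^2_{X_2}(N_2 - 1)}}$$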
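A Python sketch of this unpooled standard error. The usage line plugs in the male/female values from the table (b = -0.4 and -0.2, $s_{Y \cdot X}$ = 2.1 and 2.3, $s^2_X$ = 2.5 and 2.8, N = 101 each), reading the second table row as standard errors that are squared inside the formula; that reading, and the function name, are assumptions made for illustration.

```python
import math


def se_slope_diff_unpooled(s_yx1, var_x1, n1, s_yx2, var_x2, n2):
    """sqrt(s^2_b1 + s^2_b2), each s^2_b = s^2_{Y.X} / (s^2_X (N - 1))."""
    var_b1 = s_yx1 ** 2 / (var_x1 * (n1 - 1))
    var_b2 = s_yx2 ** 2 / (var_x2 * (n2 - 1))
    return math.sqrt(var_b1 + var_b2)


se = se_slope_diff_unpooled(2.1, 2.5, 101, 2.3, 2.8, 101)
t = (-0.4 - (-0.2)) / se  # df = 101 + 101 - 4 = 198
print(round(se, 4), round(t, 2))
```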


Testing the difference between two bs
• As in the case of means, we can pool the two estimates (assuming homogeneity of variance), weighting each by its degrees of freedom:

$$s^2_{Y \cdot X} = \frac{(N_1 - 2)s^2_{Y \cdot X_1} + (N_2 - 2)s^2_{Y \cdot X_2}}{N_1 + N_2 - 4}$$

• The pooled estimate is then used for the standard error:

$$s_{b_1 - b_2} = \sqrt{\frac{s^2_{Y \cdot X}}{s^2_{X_1}(N_1 - 1)} + \frac{s^2_{Y \cdot X}}{s^2_{X_2}(N_2 - 1)}}$$


Testing the difference between two bs
• Testing for the difference between males and females:

$$t = \frac{b_1 - b_2}{s_{b_1 - b_2}} = -1.04$$

• Because $|t| = 1.04$ is less than $t_{.025}$(101 + 101 – 4 = 198 df) = 1.97, we fail to reject $H_0$
  • We have no reason to doubt that life expectancy decreases as a function of smoking at the same rate for males and females
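The pooled version of the test as a Python sketch, applied to the male/female values (b = -0.4 and -0.2, $s_{Y \cdot X}$ = 2.1 and 2.3, $s^2_X$ = 2.5 and 2.8, N = 101 each). Treating the 2.1 and 2.3 as standard errors (squared inside the code) is an assumption; it is the reading that reproduces the t of about -1.04 reported on the slide. The function name is hypothetical.

```python
import math


def pooled_slope_t(b1, s_yx1, var_x1, n1, b2, s_yx2, var_x2, n2):
    """t for H0: beta1 = beta2, pooling the error variances by their df."""
    df = n1 + n2 - 4
    pooled_var = ((n1 - 2) * s_yx1 ** 2 + (n2 - 2) * s_yx2 ** 2) / df
    se_diff = math.sqrt(pooled_var / (var_x1 * (n1 - 1))
                        + pooled_var / (var_x2 * (n2 - 1)))
    return (b1 - b2) / se_diff, df


t, df = pooled_slope_t(-0.4, 2.1, 2.5, 101, -0.2, 2.3, 2.8, 101)
print(round(t, 2), df)  # -1.04 198
```

Since |t| is well below the critical value of about 1.97, the sketch agrees with the slide's conclusion.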


Testing the difference between two rs
• When ρ ≠ 0, the sampling distribution of r is not approximately normal, and its standard error is not easily estimated
  • The same is true for $r_1 - r_2$

[Figure: sampling distributions of r for ρ = 0.6, N = 12 and for ρ = 0.9, N = 12]


Testing the difference between two rs
• Fisher's $z_r$ transform is approximately normally distributed:

$$z_r = 0.5\ln\left(\frac{1 + r}{1 - r}\right)$$

  • with standard error

$$s_{z_r} = \frac{1}{\sqrt{N - 3}}$$

• Note that the standard error does not rely on statistics computed from the sample, so the test statistic is a Z (normal) rather than a t
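Fisher's transform and its standard error in Python. Note that $0.5\ln((1 + r)/(1 - r))$ is the inverse hyperbolic tangent, so `math.atanh(r)` gives the same value; the function names here are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher's z_r = 0.5 * ln((1 + r) / (1 - r)), identical to math.atanh(r)."""
    return 0.5 * math.log((1 + r) / (1 - r))


def se_fisher_z(n):
    """Standard error of z_r: 1 / sqrt(N - 3); depends only on N."""
    return 1 / math.sqrt(n - 3)


print(round(fisher_z(0.5), 3), round(fisher_z(0.4), 3))  # 0.549 0.424
```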


Testing the difference between two rs
• We can test $H_0: \rho_1 - \rho_2 = 0$:

$$Z = \frac{z_{r_1} - z_{r_2}}{\sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}}}$$

• Note that Z is a normal deviate and $z_r$ is Fisher's transform


Example: Testing the difference between two rs

          Males    Females
  $r$     0.5      0.4
  $z_r$   0.549    0.424
  $N$     53       53

$$Z = \frac{0.549 - 0.424}{\sqrt{\frac{1}{53 - 3} + \frac{1}{53 - 3}}} = \frac{0.125}{\sqrt{2/50}} = \frac{0.125}{1/5} = 0.625$$

• Because Z = 0.625 is less than $Z_{.025} = 1.96$, we fail to reject $H_0$
  • With a two-tailed test at α = 0.05, we have no reason to doubt that the correlation between smoking and life expectancy is the same for males and females
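The same Z test as a Python sketch (function name hypothetical), using `math.atanh` for Fisher's transform and the r and N values from the example.

```python
import math


def z_test_two_rs(r1, n1, r2, n2):
    """Z for H0: rho1 = rho2 with two independent samples."""
    z1 = math.atanh(r1)  # Fisher's z_r
    z2 = math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se


print(round(z_test_two_rs(0.5, 53, 0.4, 53), 3))  # 0.628
```

The small difference from the slide's 0.625 comes only from rounding $z_r$ to three decimals before subtracting; the conclusion is unchanged.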

Testing the hypothesis that ρ = ρ₀
• This is not very common, but it will allow us to establish confidence limits on ρ
• For any value of ρ, the sampling distribution of $z_r$ is approximately normally distributed around $z_\rho$ with standard error $1/\sqrt{N - 3}$:

$$Z = \frac{z_r - z_\rho}{1/\sqrt{N - 3}}$$


Example: Testing the hypothesis that ρ = ρ₀
• Suppose we want to test the null hypothesis that a sample r = 0.30 (N = 103) came from a population where ρ = 0.5:

  r = .30      $z_r = .310$
  ρ = .50      $z_\rho = .549$
  N = 103      $s_{z_r} = 1/\sqrt{N - 3} = 0.10$

$$Z = \frac{.310 - .549}{0.10} = \frac{-.239}{0.10} = -2.39$$

• Because Z = –2.39 is more extreme than $\pm Z_{.025} = \pm 1.96$, we reject $H_0$ at α = 0.05 (two-tailed) and conclude that our sample did not come from a population where ρ = 0.5
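A Python sketch of this one-sample test (function name hypothetical), with the example's r = .30, ρ = .50, and N = 103.

```python
import math


def z_test_r_vs_rho(r, rho0, n):
    """Z for H0: rho = rho0, using Fisher's transform with SE = 1 / sqrt(N - 3)."""
    return (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)


print(round(z_test_r_vs_rho(0.30, 0.50, 103), 2))  # -2.4
```

Unrounded, Z is about -2.398; the slide's -2.39 reflects rounding $z_r$ and $z_\rho$ first.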

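Confidence limits on ρ
• Confidence limits: all we need to do is solve for $z_\rho$
  • In the end we must undo the $r \to z_r$ transformation

$$Z = \frac{z_r - z_\rho}{1/\sqrt{N - 3}} \quad\Rightarrow\quad z_\rho = z_r - Z\frac{1}{\sqrt{N - 3}}$$

$$CI(z_\rho) = z_r \pm Z_{\alpha/2}\frac{1}{\sqrt{N - 3}}$$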


Example: Confidence limits on ρ
• For the stress example, r = .506 and $z_r = .557$. With N = 107, the 95% confidence limits are:

$$CI(z_\rho) = .557 \pm 1.96\sqrt{\frac{1}{104}} = .557 \pm 1.96(0.098) = .557 \pm .192$$

$$.365 \le z_\rho \le .749 \quad\Rightarrow\quad .350 \le \rho \le .635$$

• Note that ρ = 0 is not included within our limits
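A Python sketch of the whole procedure: transform r with `math.atanh`, add and subtract $Z_{\alpha/2}/\sqrt{N - 3}$, then undo the transform with `math.tanh`. The function name and the default 1.96 (95% limits) are choices made for this illustration.

```python
import math


def rho_confidence_interval(r, n, z_crit=1.96):
    """Fisher-z confidence limits on rho, back-transformed to the r scale."""
    z_r = math.atanh(r)                    # r -> z_r
    half_width = z_crit / math.sqrt(n - 3)
    lower_z, upper_z = z_r - half_width, z_r + half_width
    return math.tanh(lower_z), math.tanh(upper_z)  # z -> r


low, high = rho_confidence_interval(0.506, 107)
print(f"{low:.3f} <= rho <= {high:.3f}")  # 0.350 <= rho <= 0.635
```

The interval is not symmetric around r = .506 on the r scale, because the back-transformation is nonlinear.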


Confidence limits on ρ
• Caveat: When Fisher's transformation is used to determine the confidence interval, the CI will generally be larger than with other methods (see Zar, p. 384), and its actual coverage may occasionally (and undesirably) be less than 1 – α (e.g., < 95%)
  • If this is important for your research question, check what your software is doing


Testing the difference between two nonindependent rs

• Tests so far have assumed that the rs come from independent samples
• Occasionally, one is interested in correlations that are not independent
  • Example: correlations involving test scores obtained 2 years apart on the same group of children
  • The lack of independence must be taken into account (see Howell); make sure this is done


Assumptions in correlation and regression
• Assumptions:
  • Linearity
  • Homogeneity of variance
  • Normality of Y at each level of $X_i$
• Interestingly, to assess the degree to which variability in Y is (linearly) attributable to variance in X, no assumptions are necessary:

$$r^2 = \frac{SS_{\hat{Y}}}{SS_Y}$$
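A Python sketch showing that $SS_{\hat{Y}}/SS_Y$ and the squared correlation are the same number; only sums of squares are involved, with no distributional assumption. The function name and the five data points are hypothetical.

```python
import math


def r_squared_two_ways(x, y):
    """Return (SS_Yhat / SS_Y, r**2) for a least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sp_xy / ss_x
    a = mean_y - b * mean_x
    ss_yhat = sum((a + b * xi - mean_y) ** 2 for xi in x)  # explained SS
    r = sp_xy / math.sqrt(ss_x * ss_y)
    return ss_yhat / ss_y, r ** 2


print(r_squared_two_ways([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # both about 0.64
```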

Assumptions: linearity


Assumptions: homogeneity of variance

Assumptions: normality
• Further assumption for hypothesis testing: normality
  • The population of $Y_i$ values corresponding to a given $X_i$ is normally distributed around $\hat{Y}_i$ (as in a t test)


Making inferences about b
• Departures from normality: if the probability distributions of Y are not exactly normal but do not depart seriously, the sampling distribution of b will be approximately normal
• Even if the distributions of Y are far from normal, the estimator of b generally has the property of asymptotic normality
  • Its sampling distribution approaches normality under general conditions as the sample size increases


Assumptions: errors
• Regression model:

$$Y_i = a + bX_i + \varepsilon_i$$

• Assumption: the error terms are uncorrelated
  • The outcome in any one trial has no effect on the error term for any other trial
• The errors $\varepsilon_i$ are normally distributed
  • Error terms represent the effects of factors omitted from the model that affect the response and vary at random without reference to the variable X, as well as random measurement error
  • Tests on b are sensitive only to large departures from normality

Factors that affect correlation
• The correlation coefficient can be substantially affected by characteristics of the sample
• Range restrictions
  • The usual effect of range restriction is to reduce the correlation
  • In some cases it might also increase it
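A small simulation sketch of the usual effect, in Python with the stdlib `random` module. All settings are hypothetical: X is standard normal, Y = X + independent normal noise (population ρ ≈ .71), and the "restricted" sample keeps only cases with |X| < 0.5. It illustrates the typical reduction only, not the less common case where restriction raises r.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)


def range_restriction_demo(n=5000, cutoff=0.5, seed=1):
    """Return (r in full sample, r after keeping only |X| < cutoff)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]  # population rho = 1/sqrt(2)
    full = pearson_r(x, y)
    kept = [(xi, yi) for xi, yi in zip(x, y) if abs(xi) < cutoff]
    xs, ys = zip(*kept)
    return full, pearson_r(xs, ys)


full, restricted = range_restriction_demo()
print(round(full, 2), round(restricted, 2))
```

The error variance is the same in both samples; what shrinks is the variance of X, so a smaller share of the variance in Y is linked to X.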


Factors that affect correlation
• Heterogeneous subsamples
  • Combining data from different subsamples may increase the correlation, but it may also eliminate it
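A simulation sketch of the first case, in Python with the stdlib `random` module. The setup is hypothetical: within each of two subsamples X and Y are generated independently (ρ = 0), but the second subsample has higher means on both variables. Pooling them produces a sizable correlation (about .69 in expectation for these settings) that exists in neither group. The opposite case, where pooling masks a within-group correlation, is not shown.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)


def heterogeneous_subsamples_demo(n=2000, shift=3.0, seed=2):
    """Return (r in group A, r in group B, r in combined data)."""
    rng = random.Random(seed)
    # X and Y independent within each group; group B is shifted on both
    x_a = [rng.gauss(0, 1) for _ in range(n)]
    y_a = [rng.gauss(0, 1) for _ in range(n)]
    x_b = [rng.gauss(shift, 1) for _ in range(n)]
    y_b = [rng.gauss(shift, 1) for _ in range(n)]
    return (pearson_r(x_a, y_a),
            pearson_r(x_b, y_b),
            pearson_r(x_a + x_b, y_a + y_b))


r_a, r_b, r_combined = heterogeneous_subsamples_demo()
print(round(r_a, 2), round(r_b, 2), round(r_combined, 2))
```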
