
PY206: Statistical Methods for Psychology
Correlation and Regression
Regression II
Last updated: 9/15/04
Luiz Pessoa, Brown University © 2004

Accuracy of prediction: recap
Using knowledge of X and the regression equation, the variance of Y after taking into account the dependence of Y on X is
$$s_{Y \cdot X}^2 = \frac{\sum (Y_i - \hat{Y}_i)^2}{N - 2} = \frac{SS_{residual}}{df}$$
It can be shown that $E\{MSE\} = \sigma^2$.

Confidence limits on Y
Although the standard error of estimate ($s_{Y \cdot X}$, the square root of MSE) is useful as an overall measure of error, it is not a good estimate of the error associated with any single prediction, for example a prediction for someone who was not included in the original sample.

How well can we predict new values of Y for a given X?
The standard error of the prediction $\hat{Y}$ depends on the residual error variance $s_{Y \cdot X}^2$ (MSE) and can be shown to be
$$s'_{Y \cdot X} = s_{Y \cdot X} \sqrt{1 + \frac{1}{N} + \frac{(X_i - \bar{X})^2}{(N - 1)\, s_X^2}}$$
where $X_i - \bar{X}$ is the deviation of $X_i$ from the mean of X and $s_X^2$ is the variance of X.

Note that the variability of $\hat{Y}_i$ is affected by how far $X_i$ is from $\bar{X}$: the further $X_i$ is from $\bar{X}$, the larger the variance of $\hat{Y}_i$.
Note that near $X_1$ the two estimates are close to each other; at $X_2$ the situation is the opposite.

Confidence limits at confidence level $1 - \alpha$:
$$CI(Y) = \hat{Y} \pm t_{\alpha/2}\, s'_{Y \cdot X}$$

Confidence limits are narrowest for $X = \bar{X}$ and become wider as $|X - \bar{X}|$ increases.
Elliptical confidence limits.
Confidence limits quite wide!
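
The standard error of a single prediction and the resulting limits can be sketched in Python as follows. This is only a minimal illustration of the two formulas above: the function and argument names are made up for the sketch, the input values in the usage lines are hypothetical rather than taken from the slides, and the critical t value is assumed to be looked up by the caller.

```python
import math


def prediction_se(s_yx, n, x_i, x_mean, var_x):
    """Standard error of a single predicted Y at x_i.

    s_yx   : standard error of estimate, sqrt(SS_residual / (N - 2))
    n      : sample size N
    x_mean : mean of X in the original sample
    var_x  : sample variance of X (computed with N - 1)
    """
    distance_term = (x_i - x_mean) ** 2 / ((n - 1) * var_x)
    return s_yx * math.sqrt(1.0 + 1.0 / n + distance_term)


def prediction_limits(y_hat, t_crit, se_pred):
    """Return (lower, upper) = y_hat -/+ t_crit * se_pred."""
    margin = t_crit * se_pred
    return y_hat - margin, y_hat + margin


# Hypothetical inputs (not from the slides): s_yx = 10, N = 50, mean X = 20, var X = 25.
se_at_mean = prediction_se(10.0, 50, 20.0, 20.0, 25.0)  # X_i equal to the mean
se_far = prediction_se(10.0, 50, 35.0, 20.0, 25.0)      # X_i far from the mean

# 2.011 is approximately the two-tailed .05 critical t for 48 df.
low, high = prediction_limits(100.0, 2.011, se_at_mean)
```

Comparing `se_at_mean` with `se_far` shows the widening of the limits as $X_i$ moves away from $\bar{X}$.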
Confidence limits on Y: example
Number of symptoms (Y) experienced by a student whose stress score corresponds to a “low level of stress”.
Predicted value: $\hat{Y} = 81.72$
$t_{.025} = 1.984$
N = 107
Standard error: $s'_{Y \cdot X} = 17.708$
$$CI(Y) = \hat{Y} \pm t_{\alpha/2}\, s'_{Y \cdot X} = 81.72 \pm 1.984(17.708) = 81.72 \pm 35.13$$
Thus: $46.59 \le Y \le 116.86$

Testing the significance of r
The most common hypothesis for a sample correlation is that the correlation between X and Y in the population is zero ($\rho = 0$), that is, X and Y are linearly unrelated (and independent, if they are bivariate normal).
It can be shown that when $\rho = 0$, for large N, r will be approximately normal around zero.
Thus, to test $H_0: \rho = 0$, we can use a standard t test:
$$t = \frac{r}{s_r}, \qquad s_r = \sqrt{\frac{1 - r^2}{N - 2}}$$
The ratio is distributed as t with N − 2 df.

Testing the significance of b
If we can reject the null hypothesis that $\rho = 0$, we should also be able to reject the null hypothesis that the slope is zero.
The general form of the test on b will be important for the case of multiple regression:
$$t = \frac{b - \beta}{s_b}$$
This is distributed as t with N − 2 df; $\beta$ is the population parameter of interest.
Typically $\beta = 0$, so that we want to test whether b is significantly different from zero. The magnitude of b should be assessed relative to its standard error:
$$t = \frac{b}{s_b}$$
It can be shown that the standard error of b depends on the standard error of Y given the regression information and on the standard deviation of X:
$$s_b = \frac{s_{Y \cdot X}}{s_X \sqrt{N - 1}}$$

Testing the difference between two bs
Suppose we knew the following:

                    Males    Females
  b                 -0.4     -0.2
  $s_{Y \cdot X}$    2.1      2.3
  $s_X^2$            2.5      2.8
  N                  101      101

Is the difference between males and females significant?
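
The t tests on r and on b described above can be sketched in Python as follows. This is a minimal illustration, not the course's own code: the function names are invented for the sketch, and the values passed to `t_for_r` are assumed for illustration only. The slope tests use the rows of the males/females table, and they test each slope against zero separately; they do not by themselves answer the question about the difference between the two groups.

```python
import math


def t_for_r(r, n):
    """t = r / s_r with s_r = sqrt((1 - r^2) / (N - 2)); returns (t, df)."""
    s_r = math.sqrt((1.0 - r ** 2) / (n - 2))
    return r / s_r, n - 2


def slope_se(s_yx, var_x, n):
    """s_b = s_yx / (s_x * sqrt(N - 1)), where s_x = sqrt(var_x)."""
    return s_yx / (math.sqrt(var_x) * math.sqrt(n - 1))


def t_for_b(b, s_yx, var_x, n, beta0=0.0):
    """t = (b - beta0) / s_b; returns (t, df) with df = N - 2."""
    return (b - beta0) / slope_se(s_yx, var_x, n), n - 2


# Rows of the males/females table: b, s_yx, variance of X, N.
t_males, df_males = t_for_b(-0.4, 2.1, 2.5, 101)
t_females, df_females = t_for_b(-0.2, 2.3, 2.8, 101)

# Assumed values for illustration only (not a slide example): r = 0.5, N = 27.
t_r, df_r = t_for_r(0.5, 27)
```

Each returned t would then be compared with the critical value of t for the returned degrees of freedom.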
The test is directly analogous to the test of the difference between two means.
We can test the difference by creating the following t statistic:
$$t = \frac{b_1 - b_2}{s_{b_1 - b_2}}$$
How do we compute $s_{b_1 - b_2}$? Remember that variances sum (for independent X and Y):
$$\sigma^2\{X + Y\} = \sigma^2\{X - Y\} = \sigma^2\{X\} + \sigma^2\{Y\}$$
Standard error:
$$s_{b_1 - b_2} = \sqrt{s_{b_1}^2 + s_{b_2}^2}$$
so the difference is assessed relative to its standard error:
$$t = \frac{b_1 - b_2}{s_{b_1 - b_2}} = \frac{b_1 - b_2}{\sqrt{s_{b_1}^2 + s_{b_2}^2}}$$
The test statistic has $N_1 + N_2 - 4$ df.
To compute $s_{b_1 - b_2}$, remember that
$$s_b = \frac{s_{Y \cdot X}}{s_X \sqrt{N - 1}}$$
With $s_{Y \cdot X_1}^2$ and $s_{Y \cdot X_2}^2$ the error variances for the two samples:
$$s_{b_1 - b_2} = \sqrt{s_{b_1}^2 + s_{b_2}^2} = \sqrt{\frac{s_{Y \cdot X_1}^2}{s_{X_1}^2 (N_1 - 1)} + \frac{s_{Y \cdot X_2}^2}{s_{X_2}^2 (N_2 - 1)}}$$
As in the case of means, we can pool the two estimates (assuming homogeneity of variance), weighting each by its degrees of freedom:
$$s_{Y \cdot X}^2 = \frac{(N_1 - 2)\, s_{Y \cdot X_1}^2 + (N_2 - 2)\, s_{Y \cdot X_2}^2}{N_1 + N_2 - 4}$$
The pooled estimate is then used for the standard error:
$$s_{b_1 - b_2} = \sqrt{\frac{s_{Y \cdot X}^2}{s_{X_1}^2 (N_1 - 1)} + \frac{s_{Y \cdot X}^2}{s_{X_2}^2 (N_2 - 1)}}$$

Testing for the difference between males and females: with the values given earlier for the two groups, the pooled error variance is 4.85 and $s_{b_1 - b_2} = 0.192$, so
$$t = \frac{b_1 - b_2}{s_{b_1 - b_2}} = \frac{-0.4 - (-0.2)}{0.192} = -1.04$$
Because $|t| = 1.04$ is less than $t_{.025}(101 + 101 - 4\ \text{df}) = 1.97$, we fail to reject $H_0$.
We have no reason to doubt that life expectancy decreases as a function of smoking at the same rate for males and females.

Testing the difference between two rs
When $\rho \ne 0$, the sampling distribution of r is not approximately normal, and its standard error is not easily estimated. The same is true for $r_1 - r_2$.
Figure panels: $\rho = 0.6$, N = 12; $\rho = 0.9$, N = 12.

Fisher's $z_r$ transform is approximately normally distributed:
$$z_r = 0.5 \ln\left(\frac{1 + r}{1 - r}\right)$$
with standard error
$$s_{z_r} = \frac{1}{\sqrt{N - 3}}$$
Note that the standard error does not rely on statistics computed from the sample, so the test statistic is a Z rather than a t.

We can test $H_0: \rho_1 - \rho_2 = 0$ with
$$Z = \frac{z_{r_1} - z_{r_2}}{\sqrt{\dfrac{1}{N_1 - 3} + \dfrac{1}{N_2 - 3}}}$$
Note that Z is a normal deviate and $z_r$ is Fisher's transform.

Example: testing the difference between two rs

           Males    Females
  r        0.5      0.4
  $z_r$    0.549    0.424
  N        53       53

$$Z = \frac{0.549 - 0.424}{\sqrt{\dfrac{1}{53 - 3} + \dfrac{1}{53 - 3}}} = \frac{0.125}{\sqrt{2/50}} = \frac{0.125}{1/5} = 0.625$$
Because Z = 0.625 is less than $Z_{.025} = 1.96$, we fail to reject $H_0$.
With a two-tailed test at $\alpha = 0.05$, we have no reason to doubt that the correlation between smoking and life expectancy is the same for males and females.

Testing the hypothesis that $\rho = \rho_0$
This is not very common, but it will allow us to establish confidence limits on $\rho$.
For any value of $\rho$, the sampling distribution of $z_r$ is approximately normal around $z_\rho$ with standard error $1/\sqrt{N - 3}$:
$$Z = \frac{z_r - z_\rho}{1/\sqrt{N - 3}}$$

Example: testing the hypothesis that $\rho = \rho_0$
Suppose we want to test the null hypothesis that a sample r = 0.30 (N = 103) came from a population where $\rho = 0.5$.
r = .30, $z_r = .310$
$\rho = .50$, $z_\rho = .549$
N = 103, $s_z = 1/\sqrt{N - 3} = 0.10$
$$Z = \frac{.310 - .549}{0.10} = \frac{-.239}{0.10} = -2.39$$
Because Z = −2.39 is more extreme than $Z_{.025} = \pm 1.96$, we reject $H_0$ at $\alpha = 0.05$ (two-tailed) and conclude that our sample did not come from a population where $\rho = 0.5$.

Confidence limits on $\rho$
All we need to do is solve the test statistic for $z_\rho$; in the end we must undo the r-to-$z_r$ transformation.
$$Z = \frac{z_r - z_\rho}{1/\sqrt{N - 3}}$$
$$\pm Z \frac{1}{\sqrt{N - 3}} = z_r - z_\rho$$
$$CI(z_\rho) = z_r \pm Z_{\alpha/2} \frac{1}{\sqrt{N - 3}}$$

Example: confidence limits on $\rho$
For the stress example, r = .506 and $z_r = .556$. With N = 107 the 95% confidence limits are:
$$CI(z_\rho) = .556 \pm 1.96 \frac{1}{\sqrt{104}} = .556 \pm 1.96(0.098) = .556 \pm .192$$
$$.364 \le z_\rho \le .748$$
$$.350 \le \rho \le .635$$
Note that $\rho = 0$ is not included within our limits.

Caveat: when Fisher's transformation is used to determine the confidence interval, the CI will generally be larger than with other methods (see Zar, p. 384), and the actual confidence level may occasionally (and undesirably) be less than $1 - \alpha$ (e.g., < 95%).
If this is important for your research question, check what your software is doing.

Testing the difference between two nonindependent rs
The tests so far have assumed that the rs come from independent samples.
Occasionally, one is interested in correlations that are not independent.
Example: the correlation between scores on tests given 2 years apart to the same group of children.
Lack of independence must be taken into account (see Howell).
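
The Fisher-transform procedures for independent correlations described in the preceding slides (the two-sample Z test, the test against a specified population value, and the confidence limits on the population correlation) can be sketched in Python as follows. This is a minimal illustration with invented function names; it does not cover the nonindependent case. Because the code works with unrounded $z_r$ values, its results differ slightly in the third decimal from the hand calculations in the slides.

```python
import math


def fisher_z(r):
    """Fisher's transform z_r = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def z_two_independent_rs(r1, n1, r2, n2):
    """Z for H0: rho1 - rho2 = 0, two independent samples."""
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se


def z_against_rho(r, n, rho0):
    """Z for H0: rho = rho0, using standard error 1 / sqrt(N - 3)."""
    se = 1.0 / math.sqrt(n - 3)
    return (fisher_z(r) - fisher_z(rho0)) / se


def rho_limits(r, n, z_crit=1.96):
    """Limits on rho: build the interval on the z scale, then undo the
    transform (tanh is the inverse of Fisher's z)."""
    se = 1.0 / math.sqrt(n - 3)
    z_r = fisher_z(r)
    return math.tanh(z_r - z_crit * se), math.tanh(z_r + z_crit * se)


# Inputs taken from the slide examples.
z_diff = z_two_independent_rs(0.5, 53, 0.4, 53)  # slides report 0.625 from rounded z values
z_one = z_against_rho(0.30, 103, 0.5)            # slides report -2.39
lower, upper = rho_limits(0.506, 107)            # slides report .350 and .635
```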
When correlations are nonindependent, make sure the analysis actually takes this into account.

Assumptions in correlation and regression
Assumptions:
Linearity
Homogeneity of variance
Normality for $X_i$ levels
Interestingly, to assess the degree to which variability in Y is (linearly) attributable to variance in X, no assumptions are necessary:
$$r^2 = \frac{SS_{\hat{Y}}}{SS_Y}$$

Assumptions: linearity

Assumptions: homogeneity of variance

Assumptions: normality
A further assumption for hypothesis testing is normality: the population of Y values corresponding to a given $X_i$ is normally distributed around $\hat{Y}_i$ (as in a t test).

Making inferences about b
Departures from normality: if the probability distributions of Y are not exactly normal but do not depart seriously from normality, the sampling distribution of b will be approximately normal.
Even if the distributions of Y are far from normal, the estimator for b generally has the property of asymptotic normality: it approaches normality under general conditions as the sample size increases.

Assumptions: errors
Regression model:
$$Y_i = a + bX_i + \varepsilon_i$$
Assumption: the error terms are uncorrelated, so the outcome in any one trial has no effect on the error term for any other trial.
The errors $\varepsilon_i$ are normally distributed.
Error terms represent the effects of factors omitted from the model that affect the response and vary at random without reference to the variable X; they also include random measurement errors.
Tests on b are sensitive only to large departures from normality.

Factors that affect correlation
The correlation coefficient can be substantially affected by characteristics of the sample.
Range restrictions: the usual effect of range restriction is to reduce the correlation. In some cases it might also increase it.
Heterogeneous subsamples: combining data may increase the correlation. It may also eliminate it.
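
The usual effect of range restriction can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the slides give no data for this point, so the population correlation of 0.6, the sample size, the random seed, and the cut at X > 0 are all arbitrary choices made for the example.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
rho = 0.6               # assumed population correlation

# Simulate bivariate normal data with correlation rho.
xs, ys = [], []
for _ in range(5000):
    x = rng.gauss(0.0, 1.0)
    y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    xs.append(x)
    ys.append(y)

r_full = pearson_r(xs, ys)

# Restrict the range: keep only cases with above-average X.
kept = [(x, y) for x, y in zip(xs, ys) if x > 0.0]
r_restricted = pearson_r([x for x, _ in kept], [y for _, y in kept])
```

In this setup `r_restricted` comes out noticeably smaller than `r_full`, which is the usual pattern; the sketch does not cover the less common case in which restriction increases the correlation.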
