
Overview
• Bivariate Regression
  – Purpose
  – Regression equation
  – Variance accounted for
  – Confidence intervals
  – Statistical inference
  – Practical exercise

Purpose
• Correlation treats X and Y as if they have equal status.
• Bivariate regression is used when:
  – One variable (X) is thought to cause the other (Y)
    • Hypothesis: obesity (X) causes diabetes (Y)
  – The goal is prediction
    • Do A-level results predict University grades?
    • Does personality predict hormonal response to stress?
    • Does animal behaviour predict snowfall?
    • This holds even if no causal inference is implied.
• Regression retains the original measurement scales of X and Y.

Bivariate Regression
• Predicting Y from X
  – How much change in winning percentage (Y) is associated with a given increase in payroll (X)?
• The regression line is determined by the least-squares criterion.

[Figure: winning percentage plotted against payroll with the least-squares regression line; r = .54]
Visually, the regression line is the straight line that minimises error, measured as squared deviations from the line. The Cubs have a large squared deviation and the Reds a small one, but the total of the squared deviations is minimised.

Bivariate Regression
• Algebraically, the regression line can be obtained from a linear conversion of X.
• In standardised form, it is just $z_X$ multiplied by the correlation: $\hat{z}_Y = r_{XY} z_X$. The "hat" indicates a predicted value.
• But we want to express the relation in raw scores: $\hat{Y} = B_0 + B_{YX} X$
  – This gives the predicted value of Y (winning) associated with a given value of X (payroll).

Regression Equation
$\hat{Y} = B_0 + B_{YX} X$
• $B_{YX}$ is the regression coefficient ("slope"): the amount of change in Ŷ associated with a one-unit change in X.
• $B_0$ is a constant ("intercept"): the value of Ŷ when X = 0.
  – Sometimes the intercept is of theoretical interest, if an X value of 0 is meaningful.
  – It is not of much interest in our baseball example.

Regression Equation
• $B_{YX} = r_{XY} \dfrac{sd_Y}{sd_X}$
  – baseball: $B_{YX} = .538 \left( \dfrac{6.22}{32.27} \right) = .104$
• $B_0 = M_Y - B_{YX} M_X$
  – baseball: $B_0 = 50 - .104(77.56) = 41.93$
• baseball: $\hat{Y} = 41.93 + .104X$
• Unlike correlations, regression coefficients differ depending on which variable is X and which is Y.
• The residual is the difference between Y and Ŷ ($e = Y - \hat{Y}$). Residuals sum to zero. The regression line minimises the sum of squared residuals.

• Do males and females differ in Body Mass Index?
• Bivariate regression can test differences between two independent means.

SEX   N    Mean (BMIACT)   Std. Deviation
M     29   28.2459         7.60284
F     12   22.7601         3.13329

BMIACT with SEX: Pearson Correlation = -.359, Sig. (2-tailed) = .021, N = 41

Regression     B        Std. Error   Beta    t        Sig.
(Constant)     33.732   3.130                10.778   .000
SEX            -5.486   2.284        -.359   -2.402   .021

[Figure: BMI plotted against SEX with the regression line $\hat{Y} = 33.73 - 5.49X$; mean difference = 28.25 - 22.76]
• The coefficients imply that SEX is coded 1 = M and 2 = F.
  – With that coding, $\hat{Y} = 33.732 - 5.486(1) = 28.246$, which is the male mean.
  – Likewise, $\hat{Y} = 33.732 - 5.486(2) = 22.760$, which is the female mean.
• The slope is therefore the difference between the two means.
• Its t test (t = -2.40, p = .021) is a test of the difference between the two independent means.

r² as "Variance Accounted For"
$Y = B_0 + B_{YX} X + e$
• To what degree do X and Y share variance, and to what degree are they independent?
• Variances are additive:
  $sd^2_Y = sd^2_{\hat{Y}} + sd^2_{Y - \hat{Y}} = sd^2_{\hat{Y}} + sd^2_e$
• So Y variance is "partitioned" into two parts:
  – the part accounted for by X ($sd^2_{\hat{Y}}$)
  – the part that is residual ($sd^2_e$)

r² as "Variance Accounted For"
• For standardised scores:
  $sd^2_{z_Y} = sd^2_{\hat{z}_Y} + sd^2_{z_Y - \hat{z}_Y}$, where $sd^2_{z_Y} = 1$
  $sd^2_{\hat{z}_Y} = \dfrac{\sum (r z_X)^2}{n - 1} = r^2 \dfrac{\sum z_X^2}{n - 1} = r^2$
  $sd^2_e = 1 - r^2$
• So r² is the proportion of Y variance accounted for by X (and vice versa).

In standard score form…
$1 = r^2 + sd^2_{z_Y - \hat{z}_Y}$
• $r^2$ = "shared variance"
• $sd^2_{z_Y - \hat{z}_Y}$ = "residual variance"

Visual Representation = "Ballantine"
[Figure: "Ballantine" diagram of two overlapping circles, ZPayroll and ZWin]
Baseball example: r = .538, r² = .289, $sd^2_{z_Y - \hat{z}_Y} = 1 - r^2 = .711$

Statistical Inference
• The regression coefficients ($B_0$ and $B_{YX}$) are sample statistics (parameter estimates).
• Because they are estimates, they are subject to sampling error.
• A given estimate falls somewhere on a hypothetical sampling distribution.

Confidence Intervals
• Place the sample statistic (estimate) within a confidence interval to indicate its margin of error.
• To compute a CI for $B_{YX}$, we need to know two things about the sampling distribution of $B_{YX}$:
  1. Its standard deviation: the SE of $B_{YX}$
  2. Its degrees of freedom: n - 2

Baseball Example
$\hat{Y} = 41.93 + .104X$
• The standard deviation of the sampling distribution of $B_{YX}$ (n = 30, so 28 df) is:
  $SE_{B_{YX}} = \dfrac{sd_Y}{sd_X} \sqrt{\dfrac{1 - r^2}{n - 2}} = \dfrac{6.22}{32.27} \sqrt{\dfrac{.711}{28}} = .031$
• The sampling distribution of $B_{YX}$ is a t distribution with n - 2 df.
• For a 95% CI, use the appropriate t value with 28 df (2.048).
• Margin of error = SE(t) = .031(2.048) = .063
• 95% CL = .104 ± .063
• 95% CI = .041 to .167

Confidence Intervals
• To compute a CI for Ŷi, we need to know two things about the sampling distribution of Ŷi:
  1. Its standard deviation:
     $SE_{\hat{Y}_i} = SE_{Y - \hat{Y}} \sqrt{\dfrac{1}{n} + \dfrac{(X_i - M_X)^2}{(n - 1)\, sd_X^2}}$
  2. Its degrees of freedom: n - 2
• Note that $SE_{Y - \hat{Y}}$ is the standard error of the estimate: the estimated population σ of the residuals.
• What happens when $X_i = M_X$? When $X_i = 0$ (the $B_0$ intercept)?

Standard Error of Estimate…
$SE_{Y - \hat{Y}} = \sqrt{\dfrac{\sum (Y - \hat{Y})^2}{n - 2}} = \sqrt{\dfrac{(1 - r^2) \sum (Y - M_Y)^2}{n - 2}}$
• This is the estimated population standard deviation (σ) of the residuals.

Baseball Example: SPSS output

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .538a   .289       .264                5.33958
a. Predictors: (Constant), paymil

Coefficients(a)
Model          B        Std. Error   Beta    t        Sig.   95% CI for B: Lower   Upper
1 (Constant)   41.957   2.575                16.296   .000   36.683                47.231
  paymil       .104     .031         .538    3.375    .002   .041                  .167
a. Dependent Variable: percent
(B and Std. Error are unstandardised coefficients; Beta is the standardised coefficient.)

These SPSS values match those we calculated, within rounding.

CI for Ŷi – Baseball Example
$\hat{Y} = 41.93 + .104X$
• 54.62 = 41.93 + .104(122) is the estimated winning percentage for a payroll of 122 million dollars (Red Sox).
• $SE_{\hat{Y}_i} = 5.34 \sqrt{\dfrac{1}{30} + \dfrac{(122 - 77.56)^2}{29(32.27)^2}} = 1.68$
• Margin of error = SE(t) = 1.68(2.048) = 3.44
• 95% CL = 54.62 ± 3.44
• 95% CI = 51.18 to 58.06

CI for Ŷi – Baseball Example
• Note that the CI for Ŷi widens as Xi deviates from MX.
• Using the sample $B_{YX}$ instead of the (unavailable) population coefficient introduces some sampling error.
• That error has more serious consequences for X values that are farther from the mean.

Null Hypothesis Significance Testing
• Is the statistic significant?
  – How likely is it that you would get a sample statistic of that magnitude if there really were no association between X and Y in the population?
• NHST for $B_{YX}$:
  $t = \dfrac{\text{sample value} - H_0\ \text{value}}{\text{standard error}} = \dfrac{B_{YX} - H_0}{SE_{B_{YX}}}$
• Baseball example: $t = \dfrac{.104 - 0}{.031} = 3.35$, p < .01
  – How was p determined?

Null Hypothesis Significance Testing
• The same basic strategy applies to $B_0$ (constant/intercept), if it is of interest:
  $t = \dfrac{\text{sample value} - H_0\ \text{value}}{\text{standard error}} = \dfrac{B_0 - H_0}{SE_{B_0}}$
• Baseball example (meaningless): $t = \dfrac{41.93 - 0}{2.57} = 16.3$, p < .001
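The least-squares claims in these slides can be checked numerically:
• the slope equals $r_{XY}(sd_Y/sd_X)$;
• the residuals sum to zero;
• $sd^2_Y = sd^2_{\hat{Y}} + sd^2_e$;
• a predictor coded 1/2 recovers the difference between two group means.

What follows is a minimal sketch using only the Python standard library. The `group`/`score` values are hypothetical numbers invented for illustration, not the BMI data above. The helper names `fit_bivariate` and `pearson_r` are also our own.

```python
from statistics import mean, stdev, variance

def fit_bivariate(x, y):
    """Least-squares intercept B0 and slope B_YX for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # sum of cross-products
    sxx = sum((xi - mx) ** 2 for xi in x)                       # sum of squares of x
    b_yx = sxy / sxx
    b0 = my - b_yx * mx                                          # B0 = M_Y - B_YX * M_X
    return b0, b_yx

def pearson_r(x, y):
    """Pearson correlation using n - 1 standard deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / ((len(x) - 1) * stdev(x) * stdev(y))

# Hypothetical data (not from the slides): predictor coded 1 or 2, outcome in arbitrary units.
group = [1, 1, 1, 1, 2, 2, 2]
score = [30, 26, 28, 24, 22, 20, 24]

b0, b_yx = fit_bivariate(group, score)
r = pearson_r(group, score)
y_hat = [b0 + b_yx * xi for xi in group]                # predicted values
resid = [yi - yh for yi, yh in zip(score, y_hat)]       # e = Y - Y-hat

print(f"Y-hat = {b0:.2f} + ({b_yx:.2f})X")
print("slope via r * sdY / sdX:", round(r * stdev(score) / stdev(group), 2))
print("sum of residuals:", round(sum(resid), 10))
print("var(Y):", round(variance(score), 4),
      "| var(Y-hat) + var(e):", round(variance(y_hat) + variance(resid), 4))
print("r^2:", round(r ** 2, 4),
      "| var(Y-hat) / var(Y):", round(variance(y_hat) / variance(score), 4))
```

The group means in this toy data are 27 (code 1) and 22 (code 2), so the fitted line is Ŷ = 32 - 5X. Its slope is the mean difference, as in the BMI example.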

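The inference formulas for the baseball example can also be reproduced from the summary statistics alone. This covers:
• the SE and 95% CI of $B_{YX}$;
• its t value;
• the standard error of the estimate;
• the CI for Ŷi.

A minimal Python sketch follows, under these assumptions:
• It takes the slide values r = .538, sd_Y = 6.22, sd_X = 32.27, M_Y = 50, M_X = 77.56 and n = 30.
• The critical t of 2.048 (28 df) is hard-coded from the slides, because the standard library has no t distribution.
• The helper name `ci_mean_prediction` is our own.
• $\sum (Y - M_Y)^2$ is rebuilt as $(n - 1) sd_Y^2$.

```python
from math import sqrt

# Summary statistics for the baseball example, as reported in the slides.
n = 30                    # teams
r = 0.538                 # correlation of payroll (X) with winning percentage (Y)
m_x, sd_x = 77.56, 32.27  # payroll
m_y, sd_y = 50.0, 6.22    # winning percentage
t_crit = 2.048            # two-tailed .05 critical t with n - 2 = 28 df (hard-coded, from a t table)

# Regression coefficients from summary statistics
b_yx = r * sd_y / sd_x
b0 = m_y - b_yx * m_x

# SE of the slope, its 95% CI, and the t test of H0: B_YX = 0
se_b = (sd_y / sd_x) * sqrt((1 - r ** 2) / (n - 2))
ci_b = (b_yx - t_crit * se_b, b_yx + t_crit * se_b)
t_b = b_yx / se_b

# Standard error of the estimate: sqrt((1 - r^2) * SS_Y / (n - 2)), with SS_Y = (n - 1) * sd_Y^2
se_est = sqrt((1 - r ** 2) * (n - 1) * sd_y ** 2 / (n - 2))

def ci_mean_prediction(x_i):
    """Predicted Y at X = x_i, its standard error, and a 95% CI (hypothetical helper)."""
    y_hat = b0 + b_yx * x_i
    se_yhat = se_est * sqrt(1 / n + (x_i - m_x) ** 2 / ((n - 1) * sd_x ** 2))
    return y_hat, se_yhat, (y_hat - t_crit * se_yhat, y_hat + t_crit * se_yhat)

print(f"Y-hat = {b0:.2f} + {b_yx:.3f}X")
print(f"SE(B_YX) = {se_b:.3f}, 95% CI = {ci_b[0]:.3f} to {ci_b[1]:.3f}, t = {t_b:.2f}")
print(f"SE of estimate = {se_est:.2f}")
for x_i in (m_x, 122, 0):
    y_hat, se_yhat, (lo, hi) = ci_mean_prediction(x_i)
    print(f"X = {x_i:6.2f}: Y-hat = {y_hat:.2f}, SE = {se_yhat:.2f}, 95% CI = {lo:.2f} to {hi:.2f}")
```

With these inputs:
• The slope's SE comes out near .031 and its CI near .041 to .167.
• The SE for the Red Sox prediction (X = 122) is about 1.68.
• At X = M_X the SE shrinks to $SE_{Y - \hat{Y}}/\sqrt{n}$.
• At X = 0 the prediction is B0 itself, so its SE (about 2.57) is close to the SPSS standard error of the constant (2.575).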