POSTED ON: 5/7/2013
Introduction to Probability and Statistics
Twelfth Edition

Chapter 3
Describing Bivariate Data

Bivariate Data
• When two variables are measured on a single experimental unit, the resulting data are called bivariate data.
• You can describe each variable individually, and you can also explore the relationship between the two variables.
• Bivariate data can be described with
  – Graphs
  – Numerical Measures

Graphs for Qualitative Variables
• When at least one of the variables is qualitative, you can use comparative pie charts or bar charts.
Do you think that men and women are treated equally in the workplace?
  Variable #1 = Opinion
  Variable #2 = Gender
[Figure: charts of Opinion by Gender (Men, Women)]

Comparative Bar Charts
[Figure: two bar charts of Percent by Opinion (Agree, Disagree, No Opinion) and Gender (Men, Women)]
• Stacked Bar Chart
• Side-by-Side Bar Chart
Describe the relationship between opinion and gender: More women than men feel that they are not treated equally in the workplace.

Line Charts
  Year    Income ($000)    Expenses ($000)
  1998    42               33
  1999    44               35
  2000    47               40
  2001    51               43
  2002    53               44
[Figure: line chart "Income vs Expenses", Income and Expense ($000) by Year, 1998 to 2002]

Two Quantitative Variables
When both of the variables are quantitative, call one variable x and the other y. A single measurement is a pair of numbers (x, y) that can be plotted using a two-dimensional graph called a scatterplot.
[Figure: the point (2, 5) plotted at x = 2, y = 5]

Describing the Scatterplot
• What pattern or form do you see?
  • Straight line upward or downward
  • Curve or no pattern at all
• How strong is the pattern?
  • Strong or weak
• Are there any unusual observations?
  • Clusters or outliers

Lines & Slopes (Recall)
• Line y = -1 + 0.5x
  • Upward (uphill)
  • Positive slope
  [Figure: graph of the line y = -1 + 0.5x]
• Line y = 1 - x
  • Downward (downhill)
  • Negative slope
  [Figure: graph of the line y = 1 - x]

Examples
[Figure: four scatterplots: Positive linear - strong; Negative linear - weak; Curved pattern; No relationship]

Numerical Measures for Two Quantitative Variables
• Assume that the two variables x and y exhibit a linear pattern, relationship or form. (All data points lying around a line.)
• There are two numerical measures to describe
  – The strength and direction of the linear relationship between x and y.
  – The form of the relationship.

The Correlation Coefficient
• The strength and direction of the linear relationship between x and y are measured using the correlation coefficient r.

\[ r = \frac{s_{xy}}{s_x s_y} \]

where the covariance \( s_{xy} \) is

\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{\sum x_i y_i - \dfrac{(\sum x_i)(\sum y_i)}{n}}{n - 1} \]

  \( s_x \) = Standard Deviation of the x's
  \( s_y \) = Standard Deviation of the y's

Example
• Living area x and selling price y of 5 homes.

  Residence            1     2     3     4     5
  x (hundred sq ft)    14    15    17    19    16
  y ($000)             178   230   240   275   200

• The scatterplot indicates a positive linear relationship.

Example
Calculate

  x      y       xy
  14     178     2492
  15     230     3450
  17     240     4080
  19     275     5225
  16     200     3200
  81     1123    18447   (sum)

  \( \bar{x} = 16.2 \)     \( s_x = 1.924 \)
  \( \bar{y} = 224.6 \)    \( s_y = 37.360 \)

\[ s_{xy} = \frac{\sum x_i y_i - \dfrac{(\sum x_i)(\sum y_i)}{n}}{n - 1} = \frac{18447 - \dfrac{(81)(1123)}{5}}{4} = 63.6 \]

\[ r = \frac{s_{xy}}{s_x s_y} = \frac{63.6}{1.924(37.36)} = .885 \]

Interpreting r
• \( -1 \le r \le 1 \): Sign of r indicates direction of the linear relationship.
• \( r \approx 0 \): Weak relationship; random scatter of points
• \( r \approx 1 \) or \( -1 \): Strong relationship; either positive or negative
• \( r = 1 \) or \( -1 \): All points fall exactly on a straight line.

Example
• r = -0.1: Very weak negative linear relationship
• r = 0.93: Very strong positive linear relationship
• r = -0.4: Weak negative linear relationship
• r = -1: All points fall exactly on a downward straight line.
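The covariance and correlation formulas can be checked numerically. The following is a minimal Python sketch, not part of the original slides, that recomputes the home-price example with both forms of the covariance formula. The function and variable names (sample_covariance, sample_covariance_shortcut, sample_std, correlation, living_area, price) are illustrative assumptions.

```python
import math

# Hypothetical helper names; the slides define only the formulas.

def sample_covariance(xs, ys):
    """Covariance from the definition: sum of deviation products over n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_covariance_shortcut(xs, ys):
    """Covariance from the computing formula that uses the column sums."""
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - sum(xs) * sum(ys) / n) / (n - 1)

def sample_std(values):
    """Sample standard deviation (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def correlation(xs, ys):
    """Correlation coefficient r = s_xy / (s_x * s_y)."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))

# Data from the home-price example: x in hundreds of sq ft, y in $000.
living_area = [14, 15, 17, 19, 16]
price = [178, 230, 240, 275, 200]

s_xy = sample_covariance(living_area, price)
r = correlation(living_area, price)
print(round(s_xy, 1), round(r, 3))  # 63.6 0.885
```

Both covariance functions should agree up to floating-point rounding; the shortcut form is the one used in the hand calculation because it needs only the column sums.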
Bivariate Data & Scatterplot

  x    y
  1    50
  2    80
  3    105
  4    120

• x - Number of Residents (independent variable)
• y - Monthly Utility (dependent variable)
• Strong Positive Linear
• Best fitting line?
[Figure: Scatterplot of x vs y, Utility (y) against Number of Residents (x)]
[Figure: Fitted Line Plot, y = 30.00 + 23.50 x, Utility (y) against Number of Residents (x)]

The Regression Line
• The form of the linear relationship between x and y can be described by fitting a line as well as we can through the points. This 'best' fitting line is the (least squares) Regression Line: y = a + b x. (Intercept-slope form)
  a is the y-intercept of the line
  b is the slope of the line

The Regression Line
• To find the slope and y-intercept of the best fitting line, use:

\[ b = r \frac{s_y}{s_x} \qquad a = \bar{y} - b\bar{x} \]

• The least squares regression line is y = a + bx

Example
Recall

  x     y
  14    178
  15    230
  17    240
  19    275
  16    200

  \( \bar{x} = 16.2 \)     \( s_x = 1.9235 \)
  \( \bar{y} = 224.6 \)    \( s_y = 37.3604 \)
  \( r = .885 \)

\[ b = r \frac{s_y}{s_x} = (.885) \frac{37.3604}{1.9235} = 17.189 \]

\[ a = \bar{y} - b\bar{x} = 224.6 - 17.189(16.2) = -53.86 \]

Regression Line: \( y = -53.86 + 17.189x \)

Example
• Predict the selling price for another residence with 1600 square feet of living area (x = 16, because x is measured in hundreds of square feet).

Predict: \( y = -53.86 + 17.189x = -53.86 + 17.189(16) = 221.16 \) or $221,160

Key Concepts
I. Bivariate Data
  1. Both qualitative and quantitative variables
  2. Describing each variable separately
  3. Describing the relationship between the variables
II. Describing Two Qualitative Variables
  1. Side-by-Side pie charts
  2. Comparative line charts
  3. Comparative bar charts
     Side-by-Side
     Stacked

Key Concepts
III. Describing Two Quantitative Variables
  1. Scatterplots
     Linear or nonlinear pattern
     Strength of relationship
     Unusual observations; clusters and outliers
  2. Covariance and correlation coefficient
  3.
     The best fitting line
     Calculating the slope and y-intercept
     Graphing the line
     Using the line for prediction

Question

  x    y
  1    50
  2    80
  3    105
  4    120

• x - Number of Residents
• y - Monthly Utility
[Figure: Scatterplot of x vs y, Utility (y) against Number of Residents (x)]
1. Find Covariance between x and y;
2. Find Correlation between x and y;
3. Find the Regression line.

Solution
Calculate

  x     y      xy
  1     50     50
  2     80     160
  3     105    315
  4     120    480
  10    355    1005   (sum)

  \( \bar{x} = 2.5 \)      \( s_x = 1.29 \)
  \( \bar{y} = 88.75 \)    \( s_y = 30.7 \)

\[ s_{xy} = \frac{\sum x_i y_i - \dfrac{(\sum x_i)(\sum y_i)}{n}}{n - 1} = \frac{1005 - \dfrac{(10)(355)}{4}}{4 - 1} = 39.17 \]

\[ r = \frac{s_{xy}}{s_x s_y} = \frac{39.17}{1.29(30.7)} = .99 \]

Solution
Calculate

  x     y      (x_i - x̄)(y_i - ȳ)
  1     50     58.125
  2     80     4.375
  3     105    8.125
  4     120    46.875
  10    355    117.5   (sum)

  \( \bar{x} = 2.5 \)      \( s_x = 1.29 \)
  \( \bar{y} = 88.75 \)    \( s_y = 30.7 \)

\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{117.5}{4 - 1} = 39.17 \]

\[ r = \frac{s_{xy}}{s_x s_y} = \frac{39.17}{1.29(30.7)} = .99 \]

Solution
• Regression Line y = a + bx

\[ b = r \frac{s_y}{s_x} = .99 \left( \frac{30.7}{1.29} \right) \approx 23.5 \]

\[ a = \bar{y} - b\bar{x} = 88.75 - 23.5(2.5) = 30 \]

\[ y = 30 + 23.5x \]

[Figure: Fitted Line Plot, y = 30.00 + 23.50 x, Utility (y) against Number of Residents (x)]
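As a check on the hand calculations, here is a minimal Python sketch, not part of the original slides, that fits the least squares line using b = r(s_y / s_x) and a = ȳ - b x̄. The names fit_line, residents, utility, living_area and price are illustrative assumptions.

```python
import math

def fit_line(xs, ys):
    """Return (a, b, r) for the least squares line y = a + b x.

    Hypothetical helper: follows the slide formulas
    b = r * s_y / s_x and a = y_bar - b * x_bar.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations and covariance (divisor n - 1).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    r = s_xy / (s_x * s_y)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b, r

# Utility example: number of residents (x) and monthly utility (y).
residents = [1, 2, 3, 4]
utility = [50, 80, 105, 120]
a, b, r = fit_line(residents, utility)
print(f"y = {a:.2f} + {b:.2f} x, r = {r:.2f}")  # y = 30.00 + 23.50 x, r = 0.99

# Home-price example: x in hundreds of sq ft, y in $000.
living_area = [14, 15, 17, 19, 16]
price = [178, 230, 240, 275, 200]
a_h, b_h, r_h = fit_line(living_area, price)
predicted = a_h + b_h * 16  # predicted price ($000) at x = 16
```

Because the code carries full precision, the fitted slope for the utility data comes out at 23.5, whereas multiplying the rounded values .99, 30.7 and 1.29 by hand gives roughly 23.56.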