# Chap3 by gegouzhen12

```
Introduction to Probability and Statistics
Twelfth Edition

Chapter 3
Describing Bivariate Data
Bivariate Data
• When two variables are measured on a single
experimental unit, the resulting data are
called bivariate data.
• You can describe each variable individually,
and you can also explore the relationship
between the two variables.
• Bivariate data can be described with
– Graphs
– Numerical Measures
Graphs for Qualitative Variables
• When at least one of the variables is
qualitative, you can use comparative pie charts
or bar charts.
Survey question: Do you think that men and women are treated equally in the workplace?
Variable #1 = Opinion (Agree, Disagree, No Opinion)
Variable #2 = Gender (Men, Women)
[Figure: comparative charts of Opinion for Men and Women]
Comparative Bar Charts
[Figure: two bar charts of Percent by Opinion (Agree, Disagree, No Opinion) and Gender (Men, Women)]
• Stacked Bar Chart
• Side-by-Side Bar Chart
Describe the relationship between opinion and
gender:
More women than men feel that they are
not treated equally in the workplace.
Line Charts
Year   Income (\$000)   Expenses (\$000)
1998   42               33
1999   44               35
2000   47               40
2001   51               43
2002   53               44
[Figure: line chart "Income vs Expenses"; x-axis Year (1998 to 2002), y-axis \$000; one line each for Income and Expense]
Two Quantitative Variables
When both of the variables are quantitative, call
one variable x and the other y. A single
measurement is a pair of numbers (x, y) that
can be plotted using a two-dimensional graph
called a scatterplot.
[Figure: scatterplot showing the single point (2, 5), plotted at x = 2 and y = 5]
Describing the Scatterplot
• What pattern or form do you see?
– Straight line upward or downward
– Curve or no pattern at all
• How strong is the pattern?
– Strong or weak
• Are there any unusual observations?
– Clusters or outliers
Lines & Slopes (Recall)
• Line y = -1 + 0.5x: upward (uphill), positive slope
[Figure: graph of y = -1 + 0.5x for x from 0 to 4]
• Line y = 1 - x: downward (downhill), negative slope
[Figure: graph of y = 1 - x for x from -1 to 3]
Examples

[Figure: four scatterplots: Positive linear (strong); Negative linear (weak); Curved pattern; No relationship]
Numerical Measures for Two Quantitative Variables
• Assume that the two variables x and y exhibit a
linear pattern or form (the data points lie
close to a line).
• Two numerical measures describe
– The strength and direction of the linear
relationship between x and y.
– The form of the relationship.
The Correlation Coefficient
• The strength and direction of the linear
relationship between x and y are measured using
the correlation coefficient r.
$r = \frac{s_{xy}}{s_x s_y}$
where the covariance $s_{xy}$ is
$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{n - 1}$
$s_x$ = Standard Deviation of the x’s
$s_y$ = Standard Deviation of the y’s
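• Note (a short check, not on the original slides): the definition and the
computing formula for the covariance have the same numerator. Using only
$\bar{x} = \sum x_i / n$ and $\bar{y} = \sum y_i / n$:
$\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \bar{y} \sum x_i - \bar{x} \sum y_i + n \bar{x} \bar{y}$
$= \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n} - \frac{(\sum x_i)(\sum y_i)}{n} + \frac{(\sum x_i)(\sum y_i)}{n}$
$= \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}$
Dividing by $n - 1$ gives the covariance in either form.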
Example
• Living area x and selling price y of 5 homes.
Residence            1       2        3      4      5
x (hundred sq ft)    14      15       17     19     16
y (\$000)             178     230      240    275    200

• The scatterplot indicates a positive linear relationship.
Example
Calculate
x     y      xy
14    178    2492
15    230    3450
17    240    4080
19    275    5225
16    200    3200
81    1123   18447   (sum)
$\bar{x} = 16.2$, $s_x = 1.924$
$\bar{y} = 224.6$, $s_y = 37.360$
$s_{xy} = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{n - 1} = \frac{18447 - \frac{(81)(1123)}{5}}{4} = 63.6$
$r = \frac{s_{xy}}{s_x s_y} = \frac{63.6}{1.924(37.36)} = .885$
Interpreting r
• $-1 \le r \le 1$: the sign of r indicates the direction of the linear relationship.
• $r \approx 0$: weak relationship; random scatter of points.
• $r \approx 1$ or $-1$: strong relationship; either positive or negative.
• $r = 1$ or $-1$: all points fall exactly on a straight line.
Example
• r = -0.1: very weak negative linear relationship
• r = 0.93: very strong positive linear relationship
• r = -0.4: weak negative linear relationship
• r = -1: all points fall exactly on a downward straight line
Bivariate Data & Scatterplot
x    y
1    50
2    80
3    105
4    120
• x - Number of Residents (independent variable)
• y - Monthly Utility (dependent variable)
[Figure: scatterplot of x vs y; x-axis Number of Residents (x), y-axis Utility (y)]
• Strong Positive Linear
• Best fitting line?
[Figure: fitted line plot, y = 30.00 + 23.50 x; x-axis Number of Residents (x), y-axis Utility (y)]
The Regression Line
• The form of the linear relationship between x
and y can be described by fitting a line as best as
we can through the points. This ‘best’ fitting line
is the (least squares) Regression Line:

y = a + b x. (Intercept-slope form)

a is the y-intercept of the line
b is the slope of the line
The Regression Line
• To find the slope and y-intercept of
the best fitting line, use:
$b = r \frac{s_y}{s_x}$
$a = \bar{y} - b\bar{x}$
• The least squares
regression line is y = a + bx
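• Note (an algebraic aside, not on the original slide): because
$r = \frac{s_{xy}}{s_x s_y}$, the slope can also be written directly in
terms of the covariance:
$b = r \frac{s_y}{s_x} = \frac{s_{xy}}{s_x s_y} \cdot \frac{s_y}{s_x} = \frac{s_{xy}}{s_x^2}$
For the five-home data, $s_{xy} = 63.6$ and $s_x^2 = 1.9235^2 \approx 3.70$,
so $b \approx 63.6 / 3.70 \approx 17.19$.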
Example
x    y
14   178
15   230
17   240
19   275
16   200
Recall
$\bar{x} = 16.2$, $s_x = 1.9235$
$\bar{y} = 224.6$, $s_y = 37.3604$
$r = .885$
$b = r \frac{s_y}{s_x} = (.885) \frac{37.3604}{1.9235} = 17.189$
$a = \bar{y} - b\bar{x} = 224.6 - 17.189(16.2) = -53.86$
Regression line: $y = -53.86 + 17.189x$
Example
• Predict the selling price for another
residence with 1,600 square feet of living
area (x = 16, since x is in hundreds of square feet).

Predict: $y = -53.86 + 17.189x$
$= -53.86 + 17.189(16) = 221.16$, or \$221,160
Key Concepts
I. Bivariate Data
1. Both qualitative and quantitative variables
2. Describing each variable separately
3. Describing the relationship between the variables

II. Describing Two Qualitative Variables
1. Side-by-Side pie charts
2. Comparative line charts
3. Comparative bar charts
– Side-by-Side
– Stacked
Key Concepts
III. Describing Two Quantitative Variables
1. Scatterplots
– Linear or nonlinear pattern
– Strength of relationship
– Unusual observations; clusters and outliers
2. Covariance and correlation coefficient
3. The best fitting line
– Calculating the slope and y-intercept
– Graphing the line
– Using the line for prediction
Question
x    y
1    50
2    80
3    105
4    120
• x - Number of Residents
• y - Monthly Utility
[Figure: scatterplot of x vs y; x-axis Number of Residents (x), y-axis Utility (y)]
1. Find the covariance between x and y;
2. Find the correlation between x and y;
3. Find the regression line.
Solution
x     y     xy
1     50    50
2     80    160
3     105   315
4     120   480
10    355   1005   (sum)
Calculate
$\bar{x} = 2.5$, $s_x = 1.29$
$\bar{y} = 88.75$, $s_y = 30.7$
$s_{xy} = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{n - 1} = \frac{1005 - \frac{(10)(355)}{4}}{4 - 1} = 39.17$
$r = \frac{s_{xy}}{s_x s_y} = \frac{39.17}{1.29(30.7)} = .99$
Solution
x     y     $(x_i - \bar{x})(y_i - \bar{y})$
1     50    58.125
2     80    4.375
3     105   8.125
4     120   46.875
10    355   117.5   (sum)
Calculate
$\bar{x} = 2.5$, $s_x = 1.29$
$\bar{y} = 88.75$, $s_y = 30.7$
$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{117.5}{4 - 1} = 39.17$
$r = \frac{s_{xy}}{s_x s_y} = \frac{39.17}{1.29(30.7)} = .99$
Solution
• Regression Line: $y = a + bx$, with $b = r \frac{s_y}{s_x}$ and $a = \bar{y} - b\bar{x}$
$b = r \frac{s_y}{s_x} = .99 \cdot \frac{30.7}{1.29} = 23.5$ (using unrounded values of r, $s_x$ and $s_y$)
$a = \bar{y} - b\bar{x} = 88.75 - 23.5(2.5) = 30$
$y = 30 + 23.5x$
[Figure: fitted line plot, y = 30.00 + 23.50 x; x-axis Number of Residents (x), y-axis Utility (y)]

```
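The hand calculations on the slides can be cross-checked with a short script. The sketch below is only an illustration, not part of the original deck: the helper names (`mean`, `covariance`, `std_dev`, `correlation`, `regression_line`) are made up for this example, and it uses plain Python rather than a statistics library. It applies the slide formulas $s_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) / (n - 1)$, $r = s_{xy} / (s_x s_y)$, $b = r s_y / s_x$ and $a = \bar{y} - b\bar{x}$ to both data sets.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance: sum((xi - xbar) * (yi - ybar)) / (n - 1)."""
    xbar, ybar = mean(x), mean(y)
    total = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return total / (len(x) - 1)


def std_dev(values):
    """Sample standard deviation (square root of the covariance of a variable with itself)."""
    return sqrt(covariance(values, values))


def correlation(x, y):
    """Correlation coefficient r = s_xy / (s_x * s_y)."""
    return covariance(x, y) / (std_dev(x) * std_dev(y))


def regression_line(x, y):
    """Least squares line y = a + b*x, with b = r * s_y / s_x and a = ybar - b * xbar."""
    b = correlation(x, y) * std_dev(y) / std_dev(x)
    a = mean(y) - b * mean(x)
    return a, b


# Housing example: living area (hundred sq ft) and selling price ($000).
area = [14, 15, 17, 19, 16]
price = [178, 230, 240, 275, 200]
a, b = regression_line(area, price)
print(f"s_xy = {covariance(area, price):.1f}")   # s_xy = 63.6
print(f"r    = {correlation(area, price):.3f}")  # r    = 0.885
print(f"line: y = {a:.2f} + {b:.3f} x")          # line: y = -53.86 + 17.189 x
print(f"prediction at x = 16: {a + b * 16:.2f}")  # prediction at x = 16: 221.16

# Utility example: number of residents and monthly utility.
residents = [1, 2, 3, 4]
utility = [50, 80, 105, 120]
a2, b2 = regression_line(residents, utility)
print(f"s_xy = {covariance(residents, utility):.2f}")   # s_xy = 39.17
print(f"r    = {correlation(residents, utility):.2f}")  # r    = 0.99
print(f"line: y = {a2:.2f} + {b2:.2f} x")               # line: y = 30.00 + 23.50 x
```

Because the script carries full precision through every step, the utility slope comes out as exactly 23.5, whereas multiplying the rounded values .99, 30.7 and 1.29 by hand gives a slightly different figure.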