Chap3 by gegouzhen12


     Introduction to Probability and Statistics
     Twelfth Edition

           Chapter 3
   Describing Bivariate Data
          Bivariate Data
• When two variables are measured on a single
  experimental unit, the resulting data are
  called bivariate data.
• You can describe each variable individually,
  and you can also explore the relationship
  between the two variables.
• Bivariate data can be described with
   – Graphs
   – Numerical Measures
         Graphs for Qualitative
              Variables
• When at least one of the variables is
  qualitative, you can use comparative pie charts
  or bar charts.
Do you think that men and women are treated equally in the workplace?
  Variable #1 = Opinion
  Variable #2 = Gender
[Figure: comparative charts of opinion, one panel for Men and one for Women]
          Comparative Bar Charts
[Figure: two bar charts of Percent by Opinion (Agree, Disagree, No Opinion) and Gender (Men, Women)]
• Stacked Bar Chart
• Side-by-Side Bar Chart
Describe the relationship between opinion and gender:
  More women than men feel that they are not treated equally in the workplace.
                 Line Charts
Year   Income ($000)   Expenses ($000)
1998   42              33
1999   44              35
2000   47              40
2001   51              43
2002   53              44
[Figure: line chart "Income vs Expenses"; x-axis Year (1998 to 2002), y-axis $000; one line each for Income and Expense]
Two Quantitative Variables
When both of the variables are quantitative, call
one variable x and the other y. A single
measurement is a pair of numbers (x, y) that
can be plotted using a two-dimensional graph
called a scatterplot.
[Figure: x-y axes showing the point (2, 5) plotted at x = 2, y = 5]
   Describing the Scatterplot
• What pattern or form do you see?
  • Straight line upward or downward
  • Curve or no pattern at all
• How strong is the pattern?
  • Strong or weak
• Are there any unusual observations?
  • Clusters or outliers
         Lines & Slopes (Recall)
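Line y = -1 + 0.5x
  • Upward (uphill)
  • Positive slope
[Figure: graph of the line y = -1 + 0.5x; x-axis from 0 to 4, y-axis from -1.0 to 1.0]
Line y = 1 - x
  • Downward (downhill)
  • Negative slope
[Figure: graph of the line y = 1 - x; x-axis from -1 to 3, y-axis from -2 to 2]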
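As a quick numeric check of the two recalled lines, the sketch below evaluates each one at the x-values shown on its axis and reads the slope off as the change in y for a one-unit step in x. The helper name `line` is an illustrative assumption, not something from the slides.

```python
def line(a, b):
    """Return the function x -> a + b*x (intercept a, slope b)."""
    return lambda x: a + b * x

uphill = line(-1, 0.5)    # y = -1 + 0.5x
downhill = line(1, -1)    # y = 1 - x

# The change in y for a one-unit step in x equals the slope.
rise_up = uphill(1) - uphill(0)
rise_down = downhill(1) - downhill(0)

print([uphill(x) for x in range(0, 5)])     # x = 0, 1, 2, 3, 4
print([downhill(x) for x in range(-1, 4)])  # x = -1, 0, 1, 2, 3
print(rise_up, rise_down)
```

A positive step (0.5) corresponds to the uphill line and a negative step (-1) to the downhill line.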
                 Examples



[Figure: four example scatterplots]
  • Positive linear - strong
  • Negative linear - weak
  • Curved pattern
  • No relationship
  Numerical Measures for
 Two Quantitative Variables
• Assume that the two variables x and y exhibit a
  linear pattern, relationship or form. (All
  data points lying around a line.)
• There are two numerical measures to describe
   – The strength and direction of the linear
     relationship between x and y.
   – The form of the relationship.
   The Correlation Coefficient
• The strength and direction of the linear
  relationship between x and y are measured using
  the correlation coefficient r.
\[ r = \frac{s_{xy}}{s_x s_y} \]
 where the covariance s_xy is
\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{n - 1} \]
 sx = Standard Deviation of the x’s
 sy = Standard Deviation of the y’s
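The definition of r can be sketched directly in code. The block below is a minimal illustration using the deviation form of the covariance; the function names (`mean`, `covariance`, `std_dev`, `correlation`) are assumptions chosen for this sketch. It is exercised on points taken from the line y = 1 - x recalled earlier, for which r should be -1 up to floating-point rounding.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance: sum((x - xbar)(y - ybar)) / (n - 1)."""
    xbar, ybar = mean(xs), mean(ys)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (len(xs) - 1)

def std_dev(values):
    """Sample standard deviation (divisor n - 1)."""
    return sqrt(covariance(values, values))

def correlation(xs, ys):
    """Correlation coefficient r = s_xy / (s_x * s_y)."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

# Points lying exactly on the downward line y = 1 - x.
xs = [-1, 0, 1, 2, 3]
ys = [1 - x for x in xs]
r = correlation(xs, ys)
print(round(r, 6))
```

Because every point sits on a downward straight line, the covariance is negative and r comes out at the lower limit of its range.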
                     Example
• Living area x and selling price y of 5 homes.
 Residence            1       2        3      4      5
 x (hundred sq ft)    14      15       17     19     16
 y ($000)             178     230      240    275    200



• The scatterplot indicates a positive linear relationship.
                     Example
 x      y       xy
 14     178     2492
 15     230     3450
 17     240     4080
 19     275     5225
 16     200     3200
 81     1123    18447   (sum)

 Calculate
\[ \bar{x} = 16.2, \quad s_x = 1.924 \]
\[ \bar{y} = 224.6, \quad s_y = 37.360 \]
\[ s_{xy} = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{n - 1} = \frac{18447 - \frac{(81)(1123)}{5}}{4} = 63.6 \]
\[ r = \frac{s_{xy}}{s_x s_y} = \frac{63.6}{1.924(37.36)} = .885 \]
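The hand calculation for the five homes can be replayed with the computing (shortcut) formula for the covariance. This is a sketch for checking the arithmetic; the variable names and the `sample_sd` helper are assumptions made for the illustration.

```python
from math import sqrt

x = [14, 15, 17, 19, 16]       # living area, hundreds of sq ft
y = [178, 230, 240, 275, 200]  # selling price, $000
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Computing formula: (sum(xy) - sum(x) * sum(y) / n) / (n - 1)
s_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)

def sample_sd(values):
    """Sample standard deviation (divisor n - 1)."""
    m = sum(values) / len(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

s_x, s_y = sample_sd(x), sample_sd(y)
r = s_xy / (s_x * s_y)

print(sum_x, sum_y, sum_xy)
print(round(s_xy, 1), round(s_x, 3), round(s_y, 3), round(r, 3))
```

The column totals and the rounded values of the covariance, standard deviations, and r should agree with the figures worked out by hand.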
         Interpreting r
• \( -1 \le r \le 1 \)     Sign of r indicates direction
                    of the linear relationship.

• \( r \approx 0 \)        Weak relationship; random
                    scatter of points

• \( r \approx 1 \) or –1  Strong relationship; either
                    positive or negative

• r = 1 or –1       All points fall exactly on a
                    straight line.
            Example
•r = -0.1   Very weak negative linear
            relationship

•r = 0.93   Very strong positive linear
            relationship


•r = -0.4   Weak negative linear
            relationship

•r = –1     All points fall exactly on a
            downward straight line.
       Bivariate Data & Scatterplot
 x     y
 1     50
 2     80
 3     105
 4     120
• x - Number of Residents (independent variable)
• y - Monthly Utility (dependent variable)
• Strong Positive Linear
• Best fitting line?
[Figure: Scatterplot of x vs y; x-axis Number of Residents (x), y-axis Utility (y)]
[Figure: Fitted Line Plot, y = 30.00 + 23.50 x; x-axis Number of Residents (x), y-axis Utility (y)]
           The Regression Line
• The form of the linear relationship between x
  and y can be described by fitting a line as best as
  we can through the points. This ‘best’ fitting line
  is the (least squares) Regression Line:

  y = a + b x. (Intercept-slope form)

  a is the y-intercept of the line
  b is the slope of the line
  The Regression Line
 • To find the slope and y-intercept of
   the best fitting line, use:

\[ b = r \frac{s_y}{s_x} \]
\[ a = \bar{y} - b\bar{x} \]

• The least squares
  regression line is y = a + bx
                    Example
 x    y
 14   178
 15   230
 17   240
 19   275
 16   200

 Recall
\[ \bar{x} = 16.2, \quad s_x = 1.9235 \]
\[ \bar{y} = 224.6, \quad s_y = 37.3604 \]
\[ r = .885 \]

\[ b = r \frac{s_y}{s_x} = (.885)\frac{37.3604}{1.9235} = 17.189 \]
\[ a = \bar{y} - b\bar{x} = 224.6 - 17.189(16.2) = -53.86 \]
    Regression Line: \( y = -53.86 + 17.189x \)
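The slope and intercept can also be computed from the summary statistics alone. The sketch below plugs the recalled values into b = r(s_y/s_x) and a = ȳ - b·x̄; the function name `slope_intercept` is an assumption for illustration. Note that the intercept is negative, and that carrying b at full precision instead of rounding it to 17.189 first moves the last digit of a slightly.

```python
def slope_intercept(r, s_x, s_y, x_bar, y_bar):
    """Least squares slope b and intercept a from summary statistics."""
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b

# Summary values recalled for the five homes.
a, b = slope_intercept(r=0.885, s_x=1.9235, s_y=37.3604, x_bar=16.2, y_bar=224.6)

print(round(b, 3))
print(round(a, 2))
```

Any difference from the hand-computed intercept in the second decimal place is a rounding effect, not a different line.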
                 Example
• Predict the selling price for another
  residence with 1600 square feet of living
  area (x = 16, since x is measured in
  hundreds of square feet).




 Predict: \( y = -53.86 + 17.189x \)
    \( = -53.86 + 17.189(16) = 221.16 \), or $221,160
                 Key Concepts
I. Bivariate Data
   1. Both qualitative and quantitative variables
  2. Describing each variable separately
  3. Describing the relationship between the variables

II. Describing Two Qualitative Variables
   1. Side-by-Side pie charts
   2. Comparative line charts
   3. Comparative bar charts
     Side-by-Side
     Stacked
              Key Concepts
III. Describing Two Quantitative Variables
   1. Scatterplots
    Linear or nonlinear pattern
    Strength of relationship
    Unusual observations; clusters and outliers
  2. Covariance and correlation coefficient
  3. The best fitting line
    Calculating the slope and y-intercept
    Graphing the line
    Using the line for prediction
                  Question
 x    y
 1    50
 2    80
 3    105
 4    120
•    x - Number of Residents
•    y - Monthly Utility
[Figure: Scatterplot of x vs y; x-axis Number of Residents (x), y-axis Utility (y)]

1.   Find Covariance between x and y;
2.   Find Correlation between x and y;
3.   Find the Regression line.
                  Solution
 x    y     xy
 1    50    50
 2    80    160
 3    105   315
 4    120   480
 10   355   1005   (sum)

 Calculate
\[ \bar{x} = 2.5, \quad s_x = 1.29 \]
\[ \bar{y} = 88.75, \quad s_y = 30.7 \]
\[ s_{xy} = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{n - 1} = \frac{1005 - \frac{(10)(355)}{4}}{4 - 1} = 39.17 \]
\[ r = \frac{s_{xy}}{s_x s_y} = \frac{39.17}{1.29(30.7)} = .99 \]
                  Solution
 x    y     (x_i - x̄)(y_i - ȳ)
 1    50    58.125
 2    80    4.375
 3    105   8.125
 4    120   46.875
 10   355   117.5   (sum)

 Calculate
\[ \bar{x} = 2.5, \quad s_x = 1.29 \]
\[ \bar{y} = 88.75, \quad s_y = 30.7 \]
\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{117.5}{4 - 1} = 39.17 \]
\[ r = \frac{s_{xy}}{s_x s_y} = \frac{39.17}{1.29(30.7)} = .99 \]
                  Solution
• Regression Line  \( y = a + bx \)
\[ b = r \frac{s_y}{s_x} = .99 \cdot \frac{30.7}{1.29} = 23.5 \]
\[ a = \bar{y} - b\bar{x} = 88.75 - 23.5(2.5) = 30 \]
\[ y = 30 + 23.5x \]
[Figure: Fitted Line Plot, y = 30.00 + 23.50 x; x-axis Number of Residents (x), y-axis Utility (y)]

								