Summary Descriptive Measures by rogerholland




[Figure: "Projects Completed Early" histogram. Vertical axis: Percentage; horizontal axis: Percent.]



Location is an indicator of where the data are centered.

[Figure: "Projects Completed Early" for Plant A and Plant B. Vertical axis: %; horizontal axis: Percent.]



Scale is a measure of how “spread out” data is.
  Criteria for Measures of Location and Scale

Must be well defined for:   Raw Data

                            Grouped Data

                            Theoretical Curves


For Business Purposes:      Must be arithmetic
                Measures of Location


                            Mode
            Simply the most frequent value in a data set.


Problems:

Raw Data:   Many data sets have no repeated values, so the mode does not
            exist.
Grouped Data:    Mode is taken as midpoint of the bin with the greatest
                 frequency.

                 But consider the data discussed in the last lecture.




[Figure: Histogram of Labor Costs. Vertical axis: Frequency; horizontal axis: Labor Cost, with ticks from 20 to 80.]

[Figure: Histogram of Labor Costs. Vertical axis: Frequency; horizontal axis: Labor Costs, with ticks from 25 to 75.]

Theoretical Data: Mode may not exist; consider the theoretical distribution of
random numbers which should look like:




[Figure: Uniform Density Function. Vertical axis: f(x); horizontal axis: x = random number, from 0 to 1.]

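
The raw-data and grouped-data problems with the mode can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own computation: the data values are invented (they are not the labor costs from the last lecture), the function names raw_modes and grouped_mode are hypothetical, and it assumes the point of the two labor-cost histograms is that the modal bin depends on where the bin edges are drawn.

```python
from collections import Counter

def raw_modes(values):
    """Return the most frequent value(s), or [] when no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no repeated values: the mode does not exist
    return sorted(v for v, c in counts.items() if c == top)

def grouped_mode(values, start, width, n_bins):
    """Midpoint of the fullest bin [start + k*width, start + (k+1)*width)."""
    freq = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_bins:
            freq[k] += 1
    k_max = freq.index(max(freq))  # first fullest bin if there is a tie
    return start + (k_max + 0.5) * width

# Hypothetical data, invented for illustration only.
costs = [31, 33, 36, 38, 39, 41, 42, 44, 47, 52, 58, 63]

print(raw_modes(costs))                # [] (no value repeats)
print(grouped_mode(costs, 20, 10, 6))  # 35.0 (bins 20-30, 30-40, ...)
print(grouped_mode(costs, 25, 10, 6))  # 40.0 (bins 25-35, 35-45, ...)
```

With the same data, shifting the bin edges by 5 moves the grouped mode from 35 to 40, so the grouped mode is not well defined until the bins are fixed.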
                          Measures of Location

                                      Median
       The median is that data value which has approximately the same percentage
of observations below it as above it (for large data sets this proportion will approach
50%).

      The word “median” comes from the Latin word “medius”, meaning
“middle”.


       Raw Data:

              Finding the median from raw data is a two step process. First you
must put the data in order, then you need to find the middle value.

       Example:      Data = 3, -1, 6, 10, 11

                     Ordered Data = -1, 3, 6, 10, 11

                     Median = 6

                     If sample size is odd then median will be the value occupying
                     position (n+1)/2 in the ordered data.


       Example:      Data = 3, -1, 6, 10, 11, 7

                     Ordered Data= -1, 3, 6, 7, 10, 11

                     Median = any value between 6 and 7. Usually the two middle
                     points are averaged to get 6.5.

                     If sample size is even then median is the arithmetic average of
                     the values occupying positions (n/2) and (n/2) +1 in the ordered
                     data.

Notice: The median is not computed; it is found. For example, replace the value of 11
in the above example by 12,000. The median remains 6.5.

              Cannot be manipulated algebraically.
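
The two position rules and the 12,000 replacement can be checked with a short Python sketch. The function name median_by_position is an assumption made for illustration; the data are the two examples from the text.

```python
def median_by_position(data):
    """Find the median by sorting and picking the middle position(s)."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        # odd n: value at 1-based position (n + 1) / 2
        return ordered[(n + 1) // 2 - 1]
    # even n: average of the values at 1-based positions n/2 and n/2 + 1
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median_by_position([3, -1, 6, 10, 11]))        # 6
print(median_by_position([3, -1, 6, 10, 11, 7]))     # 6.5
print(median_by_position([3, -1, 6, 10, 12000, 7]))  # 6.5
```

The last call replaces 11 by 12,000 and returns the same 6.5, since only the values in the middle positions matter.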
      Finding the Median of Raw Data Using EXCEL


Open the file “thickdat.xls” in the MBA Mod 1 folder.

Find an empty cell and type in =median(

Then highlight the range of the data. You should see something that looks like the
following:




Finally, type in the right parenthesis.

The result is 355 which is the average of the 30th and 31st values, both of which
happen to be 355.
            Finding the Median from Grouped Data


Suppose you did not have the raw data for steel thickness, but only had the data
grouped as shown below:


                        Interval           Midpoint   Frequency   Cumulative
                    Lower      Upper         m(i)        f(i)     frequency F

                    341.5      344.5         343           1           1
                    344.5      347.5         346           3           4
                    347.5      350.5         349           8          12
                    350.5      353.5         352           8          20
                    353.5      356.5         355          20          40
                    356.5      359.5         358          13          53
                    359.5      362.5         361           5          58
                    362.5      365.5         364           2          60


Using the column labeled “F”, it is clear that the 30th and 31st observations lie in the
interval [353.5 to 356.5].

Altogether there are 20 observations in the interval [353.5 to 356.5].

Since there are 20 observations below 353.5, we need 10 more to get to the 30th
value.

ASSUMPTION:           The data points in the interval are equi-spaced throughout the
                      interval

To get the 30th value, we need to go 10/20ths (or .5) into the interval. Since the bin is
3 units wide, we need to go a distance of (10/20)*3 = 1.5 into the interval. Therefore
we estimate the 30th value as 353.5 + 1.5 = 355

To get the 31st value, we need to go 11/20ths (or .55) into the interval. Since the bin
is 3 units wide, we need to go a distance of (11/20)*3 = 1.65 into the interval.
Therefore we estimate the 31st value as 353.5 + 1.65 = 355.15.

The median is estimated as median = (355 + 355.15)/2 = 355.075.
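
The interpolation above can be sketched in Python as follows. The function names grouped_value and grouped_median are hypothetical, and the sketch relies on the same assumption as the text: observations are equally spaced within each interval.

```python
def grouped_value(bins, position):
    """Estimate the value at a 1-based position in the ordered data.

    bins: list of (lower, upper, frequency) tuples in increasing order.
    Assumes observations are equally spaced within each bin.
    """
    below = 0  # cumulative frequency F of all earlier bins
    for lower, upper, freq in bins:
        if position <= below + freq:
            fraction = (position - below) / freq  # e.g. 10/20 for the 30th value
            return lower + fraction * (upper - lower)
        below += freq
    raise ValueError("position exceeds the number of observations")

def grouped_median(bins):
    """Apply the odd/even position rule to the interpolated values."""
    n = sum(freq for _, _, freq in bins)
    if n % 2 == 1:
        return grouped_value(bins, (n + 1) // 2)
    return (grouped_value(bins, n // 2) + grouped_value(bins, n // 2 + 1)) / 2

# Grouped steel-thickness data from the table above.
thickness = [
    (341.5, 344.5, 1), (344.5, 347.5, 3), (347.5, 350.5, 8), (350.5, 353.5, 8),
    (353.5, 356.5, 20), (356.5, 359.5, 13), (359.5, 362.5, 5), (362.5, 365.5, 2),
]

print(round(grouped_value(thickness, 30), 3))
print(round(grouped_value(thickness, 31), 3))
print(round(grouped_median(thickness), 3))
```

Up to floating-point rounding, the three printed values should agree with the 355, 355.15, and 355.075 worked out by hand.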
     Finding the Median From Theoretical Probability
                      Distributions

        If f(x) is the probability density function of x, the median is that value med
satisfying the integral equation:



$$\int_{-\infty}^{\mathrm{med}} f(x)\,dx = 0.5$$
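
One way to solve this equation numerically is to bisect on the upper limit until the area to its left equals 0.5. The sketch below is only an illustration under stated assumptions: the density is taken to be zero below a known point `lower`, the median is assumed to lie between `lower` and `upper`, and the helper names area_left and median_of_density, as well as the two example densities, are not from the lecture.

```python
import math

def area_left(f, lower, x, steps=2000):
    """Approximate the integral of f from lower to x with the trapezoid rule."""
    h = (x - lower) / steps
    total = 0.5 * (f(lower) + f(x))
    for i in range(1, steps):
        total += f(lower + i * h)
    return total * h

def median_of_density(f, lower, upper, tol=1e-6):
    """Bisect for the point med where the area to the left of med is 0.5."""
    lo, hi = lower, upper
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if area_left(f, lower, mid) < 0.5:
            lo = mid  # too little area: the median is to the right
        else:
            hi = mid  # enough area: the median is at or to the left
    return (lo + hi) / 2

def uniform(x):
    return 1.0  # uniform density on [0, 1]

def exponential(x):
    return math.exp(-x)  # exponential density with rate 1 on [0, infinity)

print(median_of_density(uniform, 0.0, 1.0))       # close to 0.5
print(median_of_density(exponential, 0.0, 50.0))  # close to ln 2, about 0.693
```

For the uniform density of random numbers shown earlier, the mode does not exist but the median is well defined at 0.5.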
                     Problems with the Median
      Suppose you had two groups of people. In Group 1 you had 50 people with a
median hourly wage of $15.00 per hour. In Group 2 you had 100 people with a
median hourly wage of $17.00 per hour. Given this information can you determine
the median hourly wage of all 150 people?
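
A small Python experiment suggests why the group medians alone are not enough. The wage lists below are invented for illustration; both scenarios have 50 people with median $15 and 100 people with median $17, yet the combined medians differ.

```python
from statistics import median

# Scenario A (hypothetical): everyone in a group earns the same wage.
a1 = [15] * 50
a2 = [17] * 100

# Scenario B (hypothetical): same sizes and same group medians, different spread.
b1 = [10] * 24 + [15] * 2 + [40] * 24
b2 = [5] * 49 + [17] * 2 + [18] * 49

print(median(a1), median(a2), median(a1 + a2))  # 15.0 17.0 17.0
print(median(b1), median(b2), median(b1 + b2))  # 15.0 17.0 16.0
```

The combined median is 17 in one scenario and 16 in the other, so it cannot be determined from the two group medians and group sizes.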




      Consider the following data:


                               Time 1     Time 2     Change

                                  5          4         -1
                                 10         12          2
                                 15         18          3
                                 20         19         -1
                                 25         23         -2

                     Median      15         18         -1

                     Change in median: 18 - 15 = 3

                     Median change: -1
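
The same comparison can be reproduced in Python from the table's data. The variable names are chosen for illustration.

```python
from statistics import median

time1 = [5, 10, 15, 20, 25]
time2 = [4, 12, 18, 19, 23]
change = [after - before for before, after in zip(time1, time2)]

print(change)                         # [-1, 2, 3, -1, -2]
print(median(time2) - median(time1))  # 3  (change in median)
print(median(change))                 # -1 (median change)
```

The change in the median and the median of the changes disagree, even in sign, which is one consequence of the median not being algebraically manipulable.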

								