
Measures of Central Tendency
By Rahul Jain

The Motivation
• Measures of central tendency are used to describe the typical member of a population.
• Depending on the type of data, "typical" could have a variety of "best" meanings.
• We will discuss four of these possible choices.

4 Measures of Central Tendency
• Mean – the arithmetic average. This is used for continuous data.
• Median – a value that splits the data into two halves: one half of the data is smaller than that number, the other half larger. May be used for continuous or ordinal data.
• Mode – the category that has the most data. As the description implies, it is used for categorical data.
• Midrange – not used as often as the other three; it is found by taking the average of the lowest and highest numbers in the data set. Also primarily used for continuous data.

Measures of Central Tendency
• The central tendency is measured by averages. These describe the point about which the various observed values cluster.
• In mathematics, an average, or central tendency, of a data set refers to a measure of the "middle" or "expected" value of the data set.

Mean
• To find the mean, add all of the values, then divide by the number of values.
• The lowercase Greek letter mu is used for the population mean: $\mu = \frac{\sum x}{N}$ (population).
• An "x" with a bar over it, read x-bar, is used for the sample mean: $\bar{x} = \frac{\sum x}{n}$ (sample).
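The rule "add all of the values, then divide by the number of values" can be sketched in a few lines of Python. This is only an illustration: the function name `arithmetic_mean` is an assumption made for this sketch, and the data are the 15 values from the Mean Example that follows.

```python
from typing import Sequence


def arithmetic_mean(values: Sequence[float]) -> float:
    """Add all of the values, then divide by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# The 15 observations listed in the Mean Example.
x = [14, 17, 31, 28, 42, 43, 51, 51, 66, 70, 67, 70, 78, 62, 47]

x_bar = arithmetic_mean(x)
print(sum(x), len(x))    # 737 15
print(round(x_bar, 5))   # 49.13333
```

The same function serves for a population mean or a sample mean, since the two use an identical calculation and differ only in notation.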
Mean Example

Listing   X
1         14
2         17
3         31
4         28
5         42
6         43
7         51
8         51
9         66
10        70
11        67
12        70
13        78
14        62
15        47
n = 15    total 737

x-bar = 737/15 = 49.13333

Arithmetic Mean of Grouped Data
• If $z_1, z_2, z_3, \ldots, z_k$ are the mid-values and $f_1, f_2, f_3, \ldots, f_k$ are the corresponding frequencies, where the subscript k stands for the number of classes, then the mean is $\bar{z} = \frac{\sum f_i z_i}{\sum f_i}$.

Exercise-1: Find the Arithmetic Mean

Class    Frequency (f)   x      fx
20-29    3               24.5   73.5
30-39    5               34.5   172.5
40-49    20              44.5   890
50-59    10              54.5   545
60-69    5               64.5   322.5
Sum      N = 43                 2003.5

Median
• The median is a number chosen so that half of the values in the data set are smaller than that number, and the other half are larger.
• To find the median:
– List the numbers in ascending order.
– If there is a number in the middle (odd number of values), that is the median.
– If there is not a middle number (even number of values), take the two in the middle; their average is the median.

Median Example

Odd number of values (n = 15):
14 17 28 31 42 43 47 51 51 62 66 67 70 70 78
The middle (8th) value is 51, so the median is 51.

Even number of values (n = 16):
14 17 28 31 42 43 47 51 53 57 62 66 67 70 70 78
The two middle values are 51 and 53, so the median is (51 + 53)/2 = 52.

Median
• The implication of this definition is that a median is the middle value of the observations such that the number of observations above it is equal to the number of observations below it.
• If n is odd: $M_e = X_{\frac{n+1}{2}}$
• If n is even: $M_e = \frac{1}{2}\left( X_{\frac{n}{2}} + X_{\frac{n}{2}+1} \right)$

Median of Grouped Data
$M_e = L_0 + \frac{h}{f_0}\left( \frac{n}{2} - F \right)$
• L0 = Lower class boundary of the median class
• h = Width of the median class
• f0 = Frequency of the median class
• F = Cumulative frequency of the pre-median class
• n = Total number of cases (written N in the steps)

Steps to find Median of grouped data
1. Compute the less-than type cumulative frequencies.
2. Determine N/2, one-half of the total number of cases.
3. Locate the median class: the first class for which the cumulative frequency is more than N/2.
4. Determine the lower boundary of the median class. This is L0.
5. Sum the frequencies of all classes prior to the median class. This is F.
6. Determine the frequency of the median class. This is f0.
7. Determine the class width of the median class. This is h.

Example: Find Median

Age in years   Number of births   Cumulative number of births
14.5-19.5      677                677
19.5-24.5      1908               2585
24.5-29.5      1737               4322
29.5-34.5      1040               5362
34.5-39.5      294                5656
39.5-44.5      91                 5747
44.5-49.5      16                 5763
All ages       5763               -

Mode
• The mode is simply the category or value which occurs the most in a data set.
• If a category has radically more than the others, it is a mode.
• Generally speaking, we do not consider more than two modes in a data set.
• No clear guideline exists for deciding how many more entries a category must have than the others to constitute a mode.

Obvious Example
[Bar chart: Beach Ball Production, in thousands, for blue, red and yellow]
• There is obviously more yellow than red or blue.
• Yellow is the mode.
• The mode is the class, not the frequency.

Bimodal
[Bar chart: Geometry Scores For TASP, with categories very bad, bad, neutral, good, very good]

No Mode

Category   Frequency
1          51
2          51
3          66
4          62
5          65
6          57
7          47
8          43
9          64

[Bar chart of the frequencies for categories 1 to 9]
• Although the third category is the largest, it is not sufficiently different to be called the mode.
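Returning to the grouped-median steps, the procedure can be sketched in Python and applied to the births table from the median example. The function name `grouped_median` and its argument layout are assumptions made for this sketch; the formula is the one given for the median of grouped data, L0 + (h / f0)(N/2 − F).

```python
from typing import Sequence


def grouped_median(lower_boundaries: Sequence[float],
                   widths: Sequence[float],
                   freqs: Sequence[int]) -> float:
    """Median of grouped data: L0 + (h / f0) * (N/2 - F)."""
    half = sum(freqs) / 2          # step 2: N/2
    cumulative = 0                 # F: cumulative frequency before the current class
    for lower, width, freq in zip(lower_boundaries, widths, freqs):
        # Step 3: the median class is the first class whose cumulative
        # frequency is more than N/2.
        if cumulative + freq > half:
            return lower + (width / freq) * (half - cumulative)
        cumulative += freq
    raise ValueError("no median class found (are all frequencies zero?)")


# Births table: lower class boundaries, class widths, number of births.
lowers = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5]
widths = [5, 5, 5, 5, 5, 5, 5]
births = [677, 1908, 1737, 1040, 294, 91, 16]

median_age = grouped_median(lowers, widths, births)
print(round(median_age, 2))  # 25.35
```

Here N/2 = 2881.5, so the median class is 24.5-29.5, with L0 = 24.5, h = 5, f0 = 1737 and F = 2585.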
Example-2: Find Mean, Median and Mode of Ungrouped Data
The weekly pocket money for 9 first-year pupils was found to be: 3, 12, 4, 6, 1, 4, 2, 5, 8

Mean   Median   Mode
5      4        4

Mode of Grouped Data
$M_0 = L_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times h$
• L1 = Lower boundary of modal class
• Δ1 = Difference of frequency between modal class and class before it
• Δ2 = Difference of frequency between modal class and class after it
• h = Class interval

Steps of Finding Mode
• Find the modal class, which has the highest frequency
• L1 = Lower class boundary of modal class
• h = Interval of modal class
• Δ1 = Difference of frequency of modal class and class before modal class
• Δ2 = Difference of frequency of modal class and class after modal class

Example-4: Find Mode

Slope Angle (°)   Midpoint (x)   Frequency (f)   Midpoint × frequency (fx)
0-4               2              6               12
5-9               7              12              84
10-14             12             7               84
15-19             17             5               85
20-24             22             0               0
Total                            n = 30          ∑(fx) = 265

Midrange
• The midrange is the average of the lowest and highest value in the data set.
• This measure is not often used since it is based strictly on the two extreme values in the data.

Midrange Example
X: 14 (min) 17 28 31 42 43 47 51 51 62 66 67 70 70 78 (max)
midrange = (14 + 78)/2 = 46

Measures of Variation
[Figure: plot of x and y. Same mean, but y varies more than x.]

Three Measures of Variation
• While there are other measures, we will look at only three:
– Variance
– Standard deviation
– Coefficient of variation
• Population mean and sample mean use an identical formula for calculation.
• There is a minor difference in the formulas for variation.

Population Variance
• The population variance, σ², is found using either of two equivalent formulas. The definitional form is $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$.
• The differences are squared to prevent the sum from being zero for all cases.
• N is the size of the population, μ is the population mean. The equivalent computational form is $\sigma^2 = \frac{\sum x^2}{N} - \mu^2$.
• Note that variance is always positive if x can take on more than one value.

Population Standard Deviation
• The standard deviation can be thought of as the average amount we could expect the x's in the population to differ from the mean value of the population.
• To get the standard deviation, simply take the square root of the variance: $\sigma = \sqrt{\sigma^2}$.

Sample Variance
• The sample variance, s², is found using either of two equivalent formulas: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ or $s^2 = \frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}$.
• The differences are squared to prevent the sum from being zero for all cases.
• The sample size is n, x-bar is the sample mean.
• Note that n − 1 is used rather than n. This adjustment prevents bias in the estimate.

Sample Standard Deviation
• Just like the standard deviation of a population, to find the standard deviation of a sample, take the square root of the sample variance: $s = \sqrt{s^2}$.

Coefficient of Variation
• The measures discussed so far are primarily useful when comparing members from the same population, or comparing similar populations.
• When looking at two or more dissimilar populations, it doesn't make any more sense to compare standard deviations than it does to compare means.

Coefficient of Variation Cont.
• Example 1: Weight loss programs A and B.

                               A     B
Mean (weight loss per month)   20    25
Standard deviation             15    30

• Two different programs with the same goal and target population.
• While program B averages more weight loss, it also has less consistent results.

Coefficient of Variation Cont.
• Example 2: Weight loss program A and tax refund B.

                     A     B
Mean                 20    650
Standard deviation   15    30

• Two different programs with different goals and different target populations.
• We know that average weight loss and average tax refund are not comparable. Are the standard deviations comparable?

Coefficient of Variation Cont.
• In the last example we can see an argument that standard deviation does not give the complete picture.
• The coefficient of variation addresses this issue by establishing a ratio of the standard deviation to the mean. This ratio is expressed as a percentage: $CV = \frac{100 s}{\bar{x}}$ (sample) or $CV = \frac{100 \sigma}{\mu}$ (population).

Coefficient of Variation Cont.
• Looking at the two examples, we see that in both cases the standard deviation for B is twice that of A.

                 A      B
Example 1   CV   75%    120%
Example 2   CV   75%    4.6%

• In the first example B has about 1.6 times the relative variation of A.
• In the second example, A has a little over 16 times as much relative variation as B.

Measures of Position
The dot on the left is at about -1, the dot on the right is at approximately 0.8. But where are they relative to the rest of the values in this distribution?

Quartiles, Percentiles and Other Fractiles
• We will only consider the quartile, but the same concept is often extended to percentages or other fractions.
• The median is a good starting point for finding the quartiles.
• Recall that to find the median, we wanted to locate a point so that half of the data was smaller, and the other half larger than that point.

Quartile
• For quartiles, we want to divide our data into 4 equal pieces. Suppose we had the following data set (already in order):
2 3 7 8 8 8 9 13 17 20 21 21
Choosing the numbers 7.5, 8.5, and 18.5 as markers would divide the data into 4 groups, each with three elements. These numbers would be the three quartiles for this data set.

Quartiles Continued
• Conceptually, this is easy: simply find the median, then treat the left-hand side as if it were a data set and find its median; then do the same to the right-hand side.
• This is not always simple. Consider the following data set.
• 3 3 3 3 3 5 6 8 8 8 8 8 9
• The first difficulty is that the data set does not divide nicely.
• Using the rules for finding a median, we would get quartiles of 3, 6 and 8.
• The second difficulty is how many of the 3's are in the first quartile, and how many in the second?
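The "median of each half" idea can be sketched in Python as follows. The function names are assumptions made for this sketch, and so is one convention: when the number of values is odd, the middle value is left out of both halves. Other conventions exist and can give slightly different quartiles.

```python
from typing import Sequence, Tuple


def median(sorted_values: Sequence[float]) -> float:
    """Median of values that are already in ascending order."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (Q1, Q2, Q3) using the median-of-halves approach."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values to form two halves")
    half = n // 2
    lower = data[:half]
    # Assumption: with an odd count, the middle value belongs to neither half.
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return median(lower), median(data), median(upper)


first = [2, 3, 7, 8, 8, 8, 9, 13, 17, 20, 21, 21]
second = [3, 3, 3, 3, 3, 5, 6, 8, 8, 8, 8, 8, 9]

print(quartiles(first))   # (7.5, 8.5, 18.5)
print(quartiles(second))  # (3.0, 6, 8.0)
```

Under this convention both data sets from the slides reproduce the quartiles quoted in the text: 7.5, 8.5 and 18.5 for the first, and 3, 6 and 8 for the second.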
Quartiles Continued
• For this course, let's pretend that this is not an issue.
• I will give you the quartiles.
• I will not ask how many are in a quartile.

Interquartile Range
• One method for identifying outliers involves the use of quartiles.
• The interquartile range (IQR) is Q3 – Q1.
• All numbers less than Q1 – 1.5(IQR) are probably too small.
• All numbers greater than Q3 + 1.5(IQR) are probably too large.

Measures of Variation: Variance & Standard Deviation for GROUPED DATA
• The grouped variance is $s^2 = \frac{n \sum f X_m^2 - \left(\sum f X_m\right)^2}{n(n - 1)}$, or equivalently $s^2 = \frac{\sum f (X_m - \bar{X})^2}{n - 1}$.
• The grouped standard deviation is $s = \sqrt{s^2}$.

Example 3-24: Miles Run per Week (p130)
Find the variance and the standard deviation for the frequency distribution below. The data represent the number of miles that 20 runners ran during one week.

Class         f    Xm   f·Xm          f·(Xm – X̄)²
5.5 – 10.5    1    8    1·8 = 8       1(8 – 24.5)² = 272.25
10.5 – 15.5   2    13   2·13 = 26     2(13 – 24.5)² = 264.50
15.5 – 20.5   3    18   3·18 = 54     3(18 – 24.5)² = 126.75
20.5 – 25.5   5    23   5·23 = 115    5(23 – 24.5)² = 11.25
25.5 – 30.5   4    28   4·28 = 112    4(28 – 24.5)² = 49.00
30.5 – 35.5   3    33   3·33 = 99     3(33 – 24.5)² = 216.75
35.5 – 40.5   2    38   2·38 = 76     2(38 – 24.5)² = 364.50
Total         20        Σf·Xm = 490   Σf·(Xm – X̄)² = 1305.00

$\bar{X} = \frac{\sum f X_m}{n} = \frac{490}{20} = 24.5$
$s^2 = \frac{\sum f (X_m - \bar{X})^2}{n - 1} = \frac{1305.00}{20 - 1} \approx 68.68$
$s = \sqrt{s^2} = \sqrt{68.68} \approx 8.3$

Mean Deviation
• The mean deviation is an average of absolute deviations of individual observations from the central value of a series. The average deviation about the mean is $MD_{\bar{x}} = \frac{\sum_{i=1}^{k} f_i \left| x_i - \bar{x} \right|}{n}$
• k = Number of classes
• xi = Midpoint of the i-th class
• fi = Frequency of the i-th class

Coefficient of Mean Deviation
• The third relative measure is the coefficient of mean deviation.
As the mean deviation can be computed from the mean, median, mode, or from any arbitrary value, a general formula for computing the coefficient of mean deviation may be put as follows:
$\text{Coefficient of mean deviation} = \frac{\text{Mean deviation}}{\text{Mean}} \times 100$

Coefficient of Range
• The coefficient of range is a relative measure corresponding to range and is obtained by the following formula:
$\text{Coefficient of range} = \frac{L - S}{L + S} \times 100$
• where "L" and "S" are respectively the largest and the smallest observations in the data set.

Coefficient of Quartile Deviation
• The coefficient of quartile deviation is computed from the first and the third quartiles using the following formula:
$\text{Coefficient of quartile deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100$

Assignment-1
• Find the following measures of dispersion from the data set given below:
– Range, Percentile range, Quartile range
– Quartile deviation, Mean deviation, Standard deviation
– Coefficient of variation, Coefficient of mean deviation, Coefficient of range, Coefficient of quartile deviation

Data for Assignment-1

Marks    No. of students   Cumulative frequencies
40-50    6                 6
50-60    11                17
60-70    19                36
70-80    17                53
80-90    13                66
90-100   4                 70
Total    70
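The three relative measures defined above (coefficient of mean deviation, coefficient of range, coefficient of quartile deviation) can be sketched in Python for ungrouped data. The function names are assumptions made for this sketch, and the mean deviation here is taken about the mean of raw values rather than class midpoints. To leave the assignment untouched, the example reuses the ordered data set from the Quartile slide, whose quartiles Q1 = 7.5 and Q3 = 18.5 are given there.

```python
from typing import Sequence


def coefficient_of_range(values: Sequence[float]) -> float:
    """(L - S) / (L + S) * 100, with L the largest and S the smallest value."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest) * 100


def coefficient_of_quartile_deviation(q1: float, q3: float) -> float:
    """(Q3 - Q1) / (Q3 + Q1) * 100."""
    return (q3 - q1) / (q3 + q1) * 100


def coefficient_of_mean_deviation(values: Sequence[float]) -> float:
    """Mean deviation about the mean, divided by the mean, times 100."""
    n = len(values)
    mean = sum(values) / n
    mean_deviation = sum(abs(x - mean) for x in values) / n
    return mean_deviation / mean * 100


# Ordered data set from the Quartile slide; Q1 and Q3 are quoted from it.
data = [2, 3, 7, 8, 8, 8, 9, 13, 17, 20, 21, 21]

print(round(coefficient_of_range(data), 1))                      # 82.6
print(round(coefficient_of_quartile_deviation(7.5, 18.5), 1))    # 42.3
print(round(coefficient_of_mean_deviation(data), 1))
```

All three results are percentages, which is what makes them usable for comparing data sets measured in different units.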

Posted: 9/14/2012. Language: English. Pages: 49.
