EC 303 Descriptive Statistics
What are descriptive statistics?
Numerical values that help measure the centrality (location), dispersion [variability], and shape of a population or a sample.

There are three measures of central location. For each measure, the mechanics of the calculation are the same for a population parameter and a sample estimate. However, the distinction between a parameter and an estimate is significant. Consequently, the equations use different symbols.

What are the three measures and their equations?

Mean
The average value of the population or sample
Population Mean = $\mu = \sum x_i / N$
Sample Mean = $\bar{x} = \sum x_i / n$

Median
The middle of an ascending [or descending] array of numbers
Population Median = Center of N observations in ascending order
Sample Median = Center of n observations in ascending order
[If there is an even number of observations, use the average of the two numbers in the middle.]

Mode
The most frequently observed number
Could have none or more than one (bimodal, multimodal)
Population Mode = Most frequent of N observations
Sample Mode = Most frequent of n observations
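As a quick check on the three definitions, here is a minimal Python sketch. The data values and the helper name `modes` are made up for illustration and do not come from the course materials. The helper follows the convention in these notes that a data set in which no value repeats has no mode.

```python
from collections import Counter
from statistics import mean, median


def modes(data):
    """Return every most-frequent value, or an empty list if no value repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # every value appears once: no mode
    return [value for value, count in counts.items() if count == top]


# Hypothetical sample with an even number of observations.
scores = [1, 2, 3, 4, 4, 6]

print(mean(scores))            # sum of the x_i divided by n
print(median(scores))          # average of the two middle values: 3.5
print(modes(scores))           # [4]
print(modes([1, 1, 2, 2, 3]))  # bimodal: [1, 2]
print(modes([1, 2, 3]))        # no repeated value: []
print(modes(["blue", "pink", "blue", "purple"]))  # non-numerical data: ['blue']
```

The mean and median need numbers, but the mode only needs counts, which is why the last call works on labels.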
Why do we consider 3 different measures of Central Tendency?
It is based on the user's need.

For example, grading
The SAT scores are curved such that the mean score is 500 on each section. In the last forty years the SAT scores had to be re-centered so that the mean would be 500 on each section.

For example, competition
Suppose only half of the teams at a district cheerleading competition move on to regional competition. The median score determines who moves forward.
For example, marketing
The tan “M&M” was considered undesirable. Mars Candy held a competition to determine the color to replace the tan color. Blue had the most observations. [Mode can be applied to non-numerical data.]

What is dispersion?
Dispersion is a measure of how far the observations are spread out. Dispersion is traditionally called variability. It describes how the observations fall relative to the “center.” Although there are more measures, the two that we are going to discuss are range and variance [standard deviation]. Skip percentiles and quartiles on pages 73 – 76.

Range measures the distance between the two extreme observations.
Population Range = Highest of N Observations – Lowest of N Observations
Sample Range = Highest of n Observations – Lowest of n Observations

Variance is based on the squared distances between each observation and the mean. The computation of variance is different for samples and populations.

For a population
Theoretical: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
Practical: $\sigma^2 = \frac{\sum x_i^2}{N} - \mu^2$

For a sample
Theoretical: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Practical: $s^2 = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1}$

Quick Look
Sample 1: {1, 2, 3, 4, 5}
Sample 2: {2, 3, 3, 3, 4}
Calculate the sample means. Calculate the sample variances. What can be said about these samples?

What is the implication of a higher variance?
As the variance increases, the amount of uncertainty in the data, and in any inference made from the data, increases.
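The Quick Look calculations can be checked with a short Python sketch. The function names are chosen here for illustration. The code applies both the theoretical and the practical sample-variance formulas to the two samples {1, 2, 3, 4, 5} and {2, 3, 3, 3, 4}, and also reports the range.

```python
def sample_mean(data):
    """x-bar: the sum of the observations divided by n."""
    return sum(data) / len(data)


def sample_variance_theoretical(data):
    """Sum of squared deviations from x-bar, divided by n - 1."""
    xbar = sample_mean(data)
    return sum((x - xbar) ** 2 for x in data) / (len(data) - 1)


def sample_variance_practical(data):
    """(Sum of squared observations - n * x-bar squared), divided by n - 1."""
    n = len(data)
    xbar = sample_mean(data)
    return (sum(x * x for x in data) - n * xbar ** 2) / (n - 1)


def sample_range(data):
    """Highest observation minus lowest observation."""
    return max(data) - min(data)


sample_1 = [1, 2, 3, 4, 5]
sample_2 = [2, 3, 3, 3, 4]

for sample in (sample_1, sample_2):
    print(
        sample_mean(sample),
        sample_variance_theoretical(sample),
        sample_variance_practical(sample),
        sample_range(sample),
    )
# prints: 3.0 2.5 2.5 4
# prints: 3.0 0.5 0.5 2
```

Both samples have the same mean of 3, but Sample 1 has a variance of 2.5 and Sample 2 has a variance of 0.5, so the observations in Sample 1 are spread further from the center. The two formulas agree for each sample, as they should.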
What is the standard deviation?
The standard deviation expresses dispersion [variability] in the original units by eliminating the squared terms. The population standard deviation is the positive square root of the population variance.
$\sigma = \sqrt{\sigma^2}$
Similarly, the sample standard deviation is the positive square root of the sample variance.
$s = \sqrt{s^2}$

Coefficient of Variation: Expresses how large the standard deviation is, relative to the mean.
$CV = \frac{\text{st. dev.}}{\text{mean}} \times 100$
As such, the CV is a percentage.

Examples:
CV1 = [1.58/3]100 = 52.67 “Sample 1’s standard deviation is 52.67% of its mean.” (a little more than half)
CV2 = [.707/3]100 = 23.57 “Sample 2’s standard deviation is 23.57% of its mean.” (almost one-quarter)
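The standard deviation and CV for the two Quick Look samples can be reproduced with Python's standard `statistics` module, whose `stdev` function uses the n - 1 sample formula. This is a sketch; the function name `coefficient_of_variation` is made up for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(data):
    """Sample standard deviation as a percentage of the sample mean."""
    return stdev(data) / mean(data) * 100


sample_1 = [1, 2, 3, 4, 5]
sample_2 = [2, 3, 3, 3, 4]

cv_1 = coefficient_of_variation(sample_1)
cv_2 = coefficient_of_variation(sample_2)

print(round(stdev(sample_1), 3), round(cv_1, 2))  # 1.581 52.7
print(round(stdev(sample_2), 3), round(cv_2, 2))  # 0.707 23.57
```

The unrounded standard deviation of Sample 1 gives a CV of about 52.70. The hand calculation gives 52.67 because it rounds the standard deviation to 1.58 before dividing.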