Module 1 – Homework
Problems 1, 2, 3, 6
A. What is the difference between a sample and a population?
B. List three example populations.
C. List three example samples.
D. What is the goal of sampling?
E. What is an inference?
F. How can we improve an inference?
G. What is sampling error?
H. How can we reduce sampling error?
I. What is bias?
Problem 100 (use the following 11 classes)
Use the data in the spreadsheet AUTOS2005.xls.
Sort the data by MPG.
Run descriptive statistics and interpret the results we have discussed. Put the
descriptive statistics in a new sheet named “Descriptive Stats”: select Tools,
Data Analysis, Descriptive Statistics; provide the Input Range; give the New
Worksheet Name; and select the Summary Statistics box.
Group the data in a new sheet (Insert, Worksheet) titled “Distributions”
(use a class width of 5).
Class   Interval        f     F     p         P
1       10 <= X < 15    6     6     4.96%     4.96%
2       15 <= X < 20    45    51    37.19%    42.15%
(f = frequency, F = cumulative frequency, p = percentage, P = cumulative percentage)
Construct a frequency distribution. Plot/graph it, interpret it.
Construct a cumulative percentage distribution. Be able to interpret.
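If you want to check your Excel grouping another way, the following is a minimal Python sketch of the same table layout (f, F, p, P) with a class width of 5. The MPG values and the function name frequency_table are made up for illustration; they are not taken from AUTOS2005.xls.

```python
from itertools import accumulate

def frequency_table(values, start, width, n_classes):
    """Return rows of (class number, lower, upper, f, F, p, P)."""
    counts = [0] * n_classes
    for x in values:
        k = int((x - start) // width)  # class index for lower <= x < upper
        if 0 <= k < n_classes:
            counts[k] += 1
    n = sum(counts)
    cumulative = list(accumulate(counts))
    rows = []
    for i, (f, F) in enumerate(zip(counts, cumulative)):
        lower = start + i * width
        rows.append((i + 1, lower, lower + width, f, F, 100 * f / n, 100 * F / n))
    return rows

# Hypothetical MPG values, NOT the AUTOS2005.xls data
mpg = [12, 14, 16, 17, 18, 19, 21, 22, 22, 24, 26, 31]
rows = frequency_table(mpg, start=10, width=5, n_classes=5)
for num, lo, hi, f, F, p, P in rows:
    print(f"{num:>2}  {lo} <= X < {hi}  {f:>3} {F:>3} {p:6.2f}% {P:6.2f}%")
```

The last row's cumulative percentage should always come out to 100%, which is a quick check that no value fell outside the classes.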
Problems 56, 57, 62, 66abc, 67
3Add1. You are advising your mother on how she can invest her money in the safest way possible.
The investment consultant has narrowed the decision to two investment funds, which have been
in existence for five years. He gives you the following data on the ROI to help you in making the
decision:
YEAR    FUND A    FUND B
2003    12%       13%
2002    10%       12%
2001    13%       14%
2000     9%       10%
1999    11%        6%
Which fund is safest for your mother and why?
3Add2. The Los Angeles Times regularly reports the air quality index for various areas of
Southern California. A sample of the air quality index for Pomona provided the following data:
28, 42, 58, 48, 45, 55, 60, 49, 50 (measured amount of pollutants in the air)
Compute the range and interquartile range and explain the meaning of the IQR.
Compute the variance and standard deviation.
3Add3. Using the ordered array of three-year annualized returns for 14 funds below:
9.77 11.35 12.46 13.80 15.47 17.48 18.37 18.47 18.61 20.72
21.49 22.47 31.50 38.16
Compute the IQR.
Where is the second quartile (Q2)?
Module 1 – Answers
1. A population contains all the items of interest whereas a sample contains
only a portion of the items in the population.
2. A statistic is a summary measure describing a sample whereas a
parameter is a summary measure describing an entire population.
3. Descriptive statistical methods deal with the collection, presentation,
summarization, and analysis of data whereas inferential statistical methods
deal with decisions arising from the projection of sample information to the
characteristics of a population.
6. a. all the full-time first-year students at the University
b. those 2821 students who responded to the survey
c. proportion in the population of all the full-time first-year students who
studied with other students
d. 90.1% of students in the sample who indicated they had studied with
other students
A. A population is an entire collection of observations. We are interested in
finding out characteristics of a population. A sample is a representative
portion of the population used for making inferences about a population.
B. All students at a university. All U.S. families. All employees at a
C. A representative sample of . . .
D. To make good inferences about a population. We use samples to try to
describe a population.
E. An inference is the use of a sample to make some conclusion about a
population.
F. We can improve our results by taking large and random samples from a
population.
G. The difference between our guess (from using a sample) and the truth.
The difference between a statistic and the corresponding parameter value.
H. By taking large, representative, and random samples.
I. Sampling bias is the tendency to take or favor certain sample elements
over others. A good, random sample would not have bias. We want
samples with minimal (or no) bias.
(c) The alcohol % is concentrated between 4 and 6, with more between 4 and 5. The
calories are concentrated between 140 and 160. The carbohydrates are
concentrated between 12 and 15. There are outliers in the percentage of alcohol
in both tails. The outlier in the lower tail is the non-alcoholic beer
O'Doul's, with only a 0.4% alcohol content. There are a few beers with alcohol
content as high as about 10.5%. There are a few beers with calorie content as
high as about 302.5 and carbohydrates as high as 31.5.
There is a strong positive relationship between percentage alcohol and calories
and between calories and carbohydrates, and a moderately positive relationship
between percentage alcohol and carbohydrates.
Chapter 2 – Add
3.56 The range is a simple measure, but only measures the difference between the extremes.
The interquartile range measures the range of the center fifty percent of the data. The
standard deviation measures variation around the mean while the variance measures the
squared variation around the mean, and these are the only measures that take into account
each observation. The coefficient of variation measures the variation around the mean
relative to the mean. The range, standard deviation, variance and coefficient of variation
are all sensitive to outliers while the interquartile range is not.
3.57 The empirical rule relates the mean and standard deviation to the percentage of values
that will fall within a certain number of standard deviations of the mean.
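As a rough illustration of the rule: for bell-shaped data, about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean. The sketch below checks this on simulated data. The mean of 100, standard deviation of 15, and the function name share_within are arbitrary choices for the demonstration, not values from the textbook.

```python
import random
import statistics

random.seed(0)
# Simulated bell-shaped data (hypothetical mean 100, standard deviation 15)
values = [random.gauss(100, 15) for _ in range(100_000)]
mean = statistics.mean(values)
sd = statistics.pstdev(values)

def share_within(values, mean, sd, k):
    """Fraction of values no more than k standard deviations from the mean."""
    return sum(abs(x - mean) <= k * sd for x in values) / len(values)

for k in (1, 2, 3):
    print(f"within {k} sd: {100 * share_within(values, mean, sd, k):.1f}%")
```

The percentages will be close to, but not exactly, 68 / 95 / 99.7, and the rule should not be expected to hold for strongly skewed data.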
Position    Cals      Fat
1            70       0.5
2            90       1.5
3            90       1.5
4           100       2.5
5           100       3.0
6           110       3.0
7           110       3.5
8           120       3.5
9           120       4.5
10          120       5.0
11          130       6.0
12          140       6.0
Mean        108.33    3.38
Median      110.00    3.25
Q1 Value     90.00    1.50
Q3 Value    120.00    5.00
IQR          30.00    3.50
Range        70.00    5.50
Stdev        19.46    1.76
CV           17.97%   52.14%
The distribution of total fat is quite symmetrical; calories are slightly left-skewed.
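The mean, median, range, standard deviation, and CV in the table above can be checked with a short Python sketch like the one below. It assumes the table's Stdev is the sample standard deviation (dividing by n - 1), which is what reproduces 19.46 and 1.76. The function name summarize is just for illustration.

```python
import statistics

cals = [70, 90, 90, 100, 100, 110, 110, 120, 120, 120, 130, 140]
fat = [0.5, 1.5, 1.5, 2.5, 3.0, 3.0, 3.5, 3.5, 4.5, 5.0, 6.0, 6.0]

def summarize(values):
    """Summary measures for one column of the table."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "stdev": stdev,
        "cv_percent": 100 * stdev / mean,  # variation relative to the mean
    }

for name, data in (("Cals", cals), ("Fat", fat)):
    stats = summarize(data)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Note how the CV makes the two columns comparable: fat varies far more relative to its mean than calories do, even though its standard deviation is much smaller in absolute terms.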
3Add1. Fund A: standard deviation = 1.41% (safest, least variability in returns)
3Add2. IQR = size of the middle 50% = 13
3Add3. Q2 location is 7.5, value is 18.42
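The three short answers above can be reproduced with the sketch below. Two assumptions are worth stating: the 1.41% figure matches the population standard deviation formula (dividing by n), and the quartiles are located at position q(n + 1)/4, averaging the two neighbors when the position ends in .5 and otherwise rounding to the nearest position. The helper name quartile is hypothetical, and other software may use a different quartile rule and give slightly different values.

```python
import statistics

def quartile(sorted_vals, q):
    """q-th quartile (q = 1, 2, 3) located at position q * (n + 1) / 4."""
    pos = q * (len(sorted_vals) + 1) / 4
    whole = int(pos)
    frac = pos - whole
    if frac == 0:
        return sorted_vals[whole - 1]
    if frac == 0.5:
        # Halfway between two positions: average the neighbors
        return (sorted_vals[whole - 1] + sorted_vals[whole]) / 2
    return sorted_vals[round(pos) - 1]

# 3Add1: variability of each fund's returns (in %)
fund_a = [12, 10, 13, 9, 11]
fund_b = [13, 12, 14, 10, 6]
sd_a = statistics.pstdev(fund_a)
sd_b = statistics.pstdev(fund_b)
print(f"Fund A sd = {sd_a:.2f}%, Fund B sd = {sd_b:.2f}%")

# 3Add2: interquartile range of the air quality sample
air = sorted([28, 42, 58, 48, 45, 55, 60, 49, 50])
iqr_air = quartile(air, 3) - quartile(air, 1)
print("IQR =", iqr_air)  # 13.0

# 3Add3: second quartile of the 14 fund returns
funds = [9.77, 11.35, 12.46, 13.80, 15.47, 17.48, 18.37,
         18.47, 18.61, 20.72, 21.49, 22.47, 31.50, 38.16]
q2 = quartile(funds, 2)
print(f"Q2 = {q2:.2f}")  # 18.42
```

Both funds have the same mean return, so the smaller standard deviation is what makes Fund A the safer choice.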
Descriptive Statistics Workshop
A multitude of statistics are kept and analyzed for almost every sport. The 2004
salaries for major league baseball are archived on our class website. The idea of
this workshop is for you to perform a basic statistical analysis of the salary data.
You are interested in “explaining” the characteristics of this data set in a
language that anyone could understand. The data set includes the salaries for
all the players.
1. Would you consider this data to be continuous or discrete? Why?
2. What is the shape of the distribution of the data? Sketch it.
3. What are the mean and standard deviation for all the players? Use
appropriate symbols. What are these computations telling us about the
4. A random sample of 15 players was taken. Determine the mean and
standard deviation for the sample. Use appropriate symbols/notation.
Determine the sampling error.
Produce Excel output for your analysis.
Be able to clearly explain your analysis and results.
Potentially helpful formulas:
=average(cell range)    mean
=stdevp(cell range)     population standard deviation
=stdev(cell range)      sample standard deviation
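For comparison, here is a minimal Python sketch of the same workshop steps: population mean and standard deviation, a random sample of 15, the sample statistics, and the sampling error (sample mean minus population mean). The salary figures are hypothetical placeholders, not the 2004 baseball data. statistics.mean, statistics.pstdev, and statistics.stdev play the roles of =average, =stdevp, and =stdev.

```python
import random
import statistics

# Hypothetical salaries in $ thousands, NOT the 2004 major league data
population = [300, 305, 310, 320, 325, 340, 350, 375, 400, 450,
              500, 550, 600, 750, 900, 1000, 1250, 1500, 2000, 2500,
              3000, 3500, 4000, 5000, 6000, 7500, 9000, 12000, 15000, 22000]

mu = statistics.mean(population)       # population mean
sigma = statistics.pstdev(population)  # population standard deviation (divide by N)

random.seed(1)  # fixed seed so the sample is reproducible
sample = random.sample(population, 15)
x_bar = statistics.mean(sample)        # sample mean
s = statistics.stdev(sample)           # sample standard deviation (divide by n - 1)

sampling_error = x_bar - mu            # statistic minus parameter
print(f"mu = {mu:.1f}, sigma = {sigma:.1f}")
print(f"x_bar = {x_bar:.1f}, s = {s:.1f}")
print(f"sampling error = {sampling_error:.1f}")
```

Changing the seed draws a different sample and gives a different sampling error, which is a useful way to see that the error belongs to the particular sample, not to the population.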