STAT 3321 Mod1 HW

					                                STAT 3321
                            Module 1 – Homework
Chapter 1
Problems 1, 2, 3, 6

      What is the difference between a sample and a population?
      List three example populations.
      List three example samples.
      What is the goal of sampling?
      What is an inference?
      How can we improve an inference?
      What is sampling error?
      How can we reduce sampling error?
      What is bias?

Chapter 2
Problem 100 (use the following 11 classes)


      For the data in the spreadsheet AUTOS2005.xls.

      Sort the data by MPG.

      Run descriptive statistics. Interpret the results we have discussed. Put the
      descriptive statistics in a new sheet named “Descriptive Stats”. In Excel, select
      Tools, Data Analysis, Descriptive Statistics; provide the input range, give the
      new worksheet name, and check the Summary Statistics box.

      Group the data in a new sheet (Insert, Worksheet) titled “Distributions”.

      (use a class width of 5)
      Class   Interval        f     F        p        P
          1   10<= X <15      6     6    4.96%    4.96%
          2   15<= X <20     45    51   37.19%   42.15%

      (f = frequency, F = cumulative frequency, p = percentage, P = cumulative percentage)

      Construct a frequency distribution. Plot/graph it, interpret it.

      Construct a cumulative percentage distribution. Be able to interpret.
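
      The grouping step can also be sketched outside Excel. The Python below is only
      an illustration: the MPG values are hypothetical placeholders (not the
      AUTOS2005.xls data), and the function name frequency_table is made up for this
      sketch. It builds the same f, F, p, P columns using classes of width 5 that
      include the lower limit and exclude the upper limit.

```python
# Hypothetical MPG values for illustration only; NOT the AUTOS2005.xls data.
mpg = [12, 14, 16, 17, 18, 19, 21, 22, 24, 27]

def frequency_table(values, start, width, n_classes):
    """Return one row per class: (class no., lower, upper, f, F, p, P)."""
    n = len(values)
    rows = []
    cumulative = 0
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width
        # Each class is lower <= X < upper, as in the handout's table.
        f = sum(1 for x in values if lower <= x < upper)
        cumulative += f
        rows.append((k + 1, lower, upper, f, cumulative,
                     100 * f / n, 100 * cumulative / n))
    return rows

table = frequency_table(mpg, start=10, width=5, n_classes=4)
for cls, lower, upper, f, F, p, P in table:
    print(f"{cls:>5}  {lower}<= X <{upper}  {f:>3}  {F:>3}  {p:6.2f}%  {P:6.2f}%")
```

      The last value in the P column should always be 100%, which is a quick check
      that every observation landed in some class.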

Chapter 3
Problems 56, 57, 62, 66abc, 67
3Add1. You are advising your mother how she can invest her money in the safest way possible.
The investment consultant has narrowed the decision to two investment funds which have been
in existence for five years. He gives you the following data on the ROI to help you in making the
decision:

                                    YEAR    FUND A    FUND B
                                    2003      12%       13%
                                    2002      10%       12%
                                    2001      13%       14%
                                    2000       9%       10%
                                    1999      11%        6%

        Which fund is safest for your mother and why?

3Add2. The Los Angeles Times regularly reports the air quality index for various areas of
Southern California. A sample of the air quality index for Pomona provided the following data:

        28, 42, 58, 48, 45, 55, 60, 49, 50 (measured amount of pollutants in the air)

        Compute the range and interquartile range and explain the meaning of the IQR.

        Compute the variance and standard deviation.

3Add3. Using the ordered array of three-year annualized returns for 14 funds below:

                9.77 11.35 12.46 13.80 15.47 17.48 18.37 18.47 18.61 20.72
                21.49 22.47 31.50 38.16

        Compute the IQR.
        Where is the 2nd quartile?

Module 1 – Answers
Chapter 1
   1. A population contains all the items of interest whereas a sample contains
   only a portion of the items in the population.
   2. A statistic is a summary measure describing a sample whereas a
   parameter is a summary measure describing an entire population.
   3. Descriptive statistical methods deal with the collection, presentation,
   summarization, and analysis of data whereas inferential statistical methods
   deal with decisions arising from the projection of sample information to the
   characteristics of a population.
   6. a. all the full-time first-year students at the University
      b. those 2821 students who responded to the survey
      c. proportion in the population of all the full-time first-year students who
   studied with other students
      d. 90.1% of students in the sample who indicated they had studied with
   other students
   A. A population is an entire collection of observations. We are interested in
      finding out characteristics of a population. A sample is a representative
      portion of the population used for making inferences about a population.
   B. All students at a university. All U.S. families. All employees at a
   C. A representative sample of . . .
   D. To make good inferences about a population. We use samples to try to
      describe a population.
   E. An inference is the use of a sample to make some conclusion about a population.
   F. We can improve our results by taking large and random samples from a population.
   G. The difference between our guess (from using a sample) and the truth.
      The difference between a statistic and the corresponding parameter value.
   H. By taking large, representative, and random samples.
   I. Sampling bias is the tendency to take or favor certain sample elements
      over others. A good, random sample would not have bias. We want
      samples with minimal (or no) bias.

Chapter 2
2.100   (a)

2.100   (b)

        (c)   The alcohol % is concentrated between 4 and 6, with more between 4 and 5. The
              calories are concentrated between 140 and 160. The carbohydrates are
              concentrated between 12 and 15. There are outliers in the percentage of alcohol
              in both tails. The outlier in the lower tail is due to the non-alcoholic beer
              O'Doul's with only a 0.4% alcohol content. There are a few beers with alcohol
              content as high as around 10.5%. There are a few beers with calorie content as
              high as around 302.5 and carbohydrates as high as 31.5.
              There is a strong positive relationship between percentage alcohol and calories
              and between calories and carbohydrates, and a moderately positive relationship
              between percentage alcohol and carbohydrates.

Chapter 2 – Add

Chapter 3
3.56   The range is a simple measure, but only measures the difference between the extremes.
       The interquartile range measures the range of the center fifty percent of the data. The
       standard deviation measures variation around the mean while the variance measures the
       squared variation around the mean, and these are the only measures that take into account
       each observation. The coefficient of variation measures the variation around the mean
       relative to the mean. The range, standard deviation, variance and coefficient of variation
       are all sensitive to outliers while the interquartile range is not.

3.57   The empirical rule relates the mean and standard deviation to the percentage of values
       that will fall within a certain number of standard deviations of the mean.

  Position      Cals            Fat
     1                70               0.5
     2                90               1.5
     3                90               1.5
     4               100               2.5
     5               100               3.0
     6               110               3.0
     7               110               3.5
     8               120               3.5
     9               120               4.5
    10               120               5.0
    11               130               6.0
    12               140               6.0

Mean              108.33              3.38
Median            110.00              3.25
Q1 Value           90.00              1.50
Q3 Value          120.00              5.00
IQR                30.00              3.50

Range              70.00              5.50
Stdev              19.46              1.76
CV                 17.97%            52.14%

             The distribution of total fat is quite symmetrical; calories are slightly left-skewed.
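
             As a cross-check on the summary rows above, the sketch below recomputes
             the mean, sample standard deviation, and coefficient of variation
             (CV = standard deviation / mean) from the twelve listed values. The helper
             name cv_percent is made up for this illustration.

```python
from statistics import mean, stdev

# The twelve observations from the Cals and Fat columns above.
cals = [70, 90, 90, 100, 100, 110, 110, 120, 120, 120, 130, 140]
fat = [0.5, 1.5, 1.5, 2.5, 3.0, 3.0, 3.5, 3.5, 4.5, 5.0, 6.0, 6.0]

def cv_percent(values):
    """Coefficient of variation: sample standard deviation relative to the mean, in %."""
    return 100 * stdev(values) / mean(values)

for label, values in (("Cals", cals), ("Fat", fat)):
    print(f"{label}: mean={mean(values):.2f}  stdev={stdev(values):.2f}  "
          f"CV={cv_percent(values):.2f}%")
```

             Because the CV is unit-free, it lets us say that fat is relatively much
             more variable than calories even though its standard deviation is smaller.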

3Add1.   A=B=11%
         A=1.41% (safest, least variability in returns)
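
         The comparison can be checked with a short Python sketch using the five
         yearly returns from the problem. The 1.41% figure matches the population
         formula (divide by n), so pstdev is used here as an assumption about how the
         answer was computed; the sample formula (divide by n - 1) is printed as well
         and leads to the same ranking of the two funds.

```python
from statistics import mean, pstdev, stdev

# Yearly ROI in percent, 2003 back to 1999, from the problem statement.
fund_a = [12, 10, 13, 9, 11]
fund_b = [13, 12, 14, 10, 6]

for name, returns in (("A", fund_a), ("B", fund_b)):
    print(f"Fund {name}: mean={mean(returns)}%  "
          f"population sd={pstdev(returns):.2f}%  "
          f"sample sd={stdev(returns):.2f}%")

# Same mean, so the fund with the smaller standard deviation is the safer one.
safest = "A" if pstdev(fund_a) < pstdev(fund_b) else "B"
print("Safest fund:", safest)
```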


3Add2.   Range=32
         IQR=13 (the spread of the middle 50% of the values)
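
         The variance and standard deviation asked for in this problem can be
         sketched in Python as below. Since the problem describes the data as a
         sample, the sample formulas (divide by n - 1) are assumed; that choice is an
         assumption of this sketch rather than something the answer key states.

```python
from statistics import mean, variance, stdev

# Air quality index readings for Pomona, from the problem statement.
aqi = [28, 42, 58, 48, 45, 55, 60, 49, 50]

data_range = max(aqi) - min(aqi)   # largest value minus smallest value
sample_var = variance(aqi)         # sum of squared deviations / (n - 1)
sample_sd = stdev(aqi)             # square root of the sample variance

print(f"n={len(aqi)}  mean={mean(aqi):.2f}")
print(f"range={data_range}  variance={sample_var:.2f}  sd={sample_sd:.2f}")
```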


3Add3.   IQR=7.69%
         Q2 location is 7.5, value is 18.42
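
         The quartile locations used in these answers are consistent with the
         position rule q(n + 1)/4: use the value at that position if it is a whole
         number, average the two neighbors if it ends in .5, and otherwise round to
         the nearest position. The sketch below assumes that rule (other software may
         use a different convention and give slightly different quartiles); the
         function name quartile is made up for this illustration.

```python
def quartile(sorted_values, q):
    """q-th quartile (q = 1, 2, 3) using the position rule q(n + 1)/4."""
    n = len(sorted_values)
    pos = q * (n + 1) / 4          # 1-based position in the ordered array
    whole = int(pos)
    frac = pos - whole
    if frac == 0:                  # whole-number position: take that value
        return sorted_values[whole - 1]
    if frac == 0.5:                # halfway: average the two neighbors
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    # Otherwise round to the nearest whole position.
    return sorted_values[int(pos + 0.5) - 1]

# Ordered three-year annualized returns for the 14 funds in 3Add3.
funds = [9.77, 11.35, 12.46, 13.80, 15.47, 17.48, 18.37,
         18.47, 18.61, 20.72, 21.49, 22.47, 31.50, 38.16]

q1, q2, q3 = (quartile(funds, q) for q in (1, 2, 3))
print(f"Q1={q1:.2f}  Q2={q2:.2f}  Q3={q3:.2f}  IQR={q3 - q1:.2f}")

# The same rule applied to the sorted air quality data from 3Add2.
aqi = sorted([28, 42, 58, 48, 45, 55, 60, 49, 50])
print("3Add2 IQR:", quartile(aqi, 3) - quartile(aqi, 1))
```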

                        Descriptive Statistics Workshop

A multitude of statistics are kept and analyzed for almost every sport. The 2004
salaries for major league baseball are archived on our class website. The idea of
this workshop is for you to perform a basic statistical analysis of the salary data.
You are interested in “explaining” the characteristics of this data set in a
language that anyone could understand. The data set includes the salaries for
all the players.

   1. Would you consider this data to be continuous or discrete? Why?
   2. What is the shape of the distribution of the data? Sketch it.
   3. What is the mean and standard deviation for all the players? Use
      appropriate symbols. What are these computations telling us about the
   4. A random sample of 15 players was taken. Determine the mean and
      standard deviation for the sample. Use appropriate symbols/notation.
      Determine the sampling error.

Produce Excel output for your analysis.
Be able to clearly explain your analysis and results.

Potentially helpful formulas:

=average(cell range)           mean
=stdevp(cell range)            population standard deviation
=stdev(cell range)             sample standard deviation
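
For anyone checking the Excel results another way, the same three calculations
have counterparts in Python's standard statistics module. The sketch below uses
a small hypothetical salary list (in thousands of dollars), not the actual 2004
salary file, and treats the sampling error as the sample mean minus the
population mean.

```python
import random
from statistics import mean, pstdev, stdev

# Hypothetical salaries in $ thousands; NOT the class data set.
salaries = [300, 320, 350, 400, 450, 500, 650, 800, 1000, 1500,
            2000, 2500, 3200, 4000, 5500, 7000, 9000, 12000, 16000, 22000]

mu = mean(salaries)        # like =average(cell range) over all players
sigma = pstdev(salaries)   # like =stdevp(cell range): divides by N

rng = random.Random(0)     # fixed seed so the sketch is repeatable
sample = rng.sample(salaries, 15)   # random sample of 15 players

x_bar = mean(sample)       # like =average(cell range) over the sample
s = stdev(sample)          # like =stdev(cell range): divides by n - 1

sampling_error = x_bar - mu
print(f"mu={mu:.1f}  sigma={sigma:.1f}")
print(f"x_bar={x_bar:.1f}  s={s:.1f}  sampling error={sampling_error:.1f}")
```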

