Environmental Health: Science, Policy and Social Justice
Fall quarter 2008
Sampling and Estimation
(due Friday Nov. 14)
A discussion of some simulation experiments appears in Chapter 7 of your custom statistics text
(Kuzma-Bohnenblust). You need to go through the assignment individually. As it is a simulation
using a random number generator, the odds of two results being the same are astonishingly low…
By using Excel’s Random Number Generation Function and do this as a simulation, we can
generate many independent random samples, which would be difficult with actual data sets. You
will generate 100 samples with n = 10 are taken from a population with = 100 and = 25.
Before you start Random Number generation, you need to make sure that Excel’s recalculation
option is disabled. If it is not, ever time you hit the Enter key, Excel will generate a new set of
random digits. Go to Tools Options Calculation, and toggle on the Manual option. (For
Macs: got to Excel menu Preferences Calculation check on “Manual”) As indicated in
that dialog box, if you want a new set of random numbers, all you need to do is hit the F9 key
(“apple” “+” “t” in Macs). Make sure that the Recalculate before save option is also turned off.
If you see a check mark in the box, click on the box to turn off the feature. Note: The manual
calculation mode also affects the auto fill function. When you click and drag, the new cells will
look the same as the cells you are copying. Simply hit F9 (“apple” “+” “t” in Macs) to complete
the auto fill.
From the menu bar, go to Tools, then select Data Analysis, then Random Number Generation.
In the window that opens, first select the type of Distribution: for this exercise select “Normal”.
In the Number of Variables enter 10 (this is the number of observations per trial, or sample
size), and in Number of Random Variables enter 100 (this is the number of trials, or samples).
You will need to add a random seed, or everyone’s results will come out the same (if you
remember the number, and reinsert it, EXCEL will generate the same series of random numbers,
providing you keep all of the other options the same). Lastly, you need to give the mean and
standard deviation of the distribution you are trying to simulate; remember that your “samples”
are taken from a population with = 100 and = 25.
The numbers that are generated will likely have a “General” format. You should go back and
reformat them as “Number,” with 2 or 3 decimal places – number of decimal places isn’t that
critical. Reformatting is done in the Format→Cells menu. Each sample of 10 observations will
occupy a row on your EXCEL worksheet, utilizing columns A through J. After EXCEL
generates 100 samples of size 10, insert a row or two at the top of the page, so you can add
column labels. Or, you can tell EXCEL to start generating the numbers in cell A3. You should
complete this assignment using the same block of random numbers for all three parts of this
simulation of sampling from the normal distribution.
This part shows that the sample mean x is an unbiased estimator of the population mean, .
1. At the end of each row (use column L, leave column K blank), use the Average function
to determine the mean, x, of each of the 100 samples. Don’t forget the Fill handle!
2. Make a histogram of your 100 sample means: See instructions at the end of the
document. How does you histogram compare to statement #1 of the Central Limit
3. Now at the bottom of your list of means in column L, find the mean of your 100 sample
means: use the average function for the mean of all sample means. How does it compare
to the population mean, ? How do the results of your simulation compare to statement
#2 of the Central Limit Theorem (p.82)?
4. What conclusion can you draw from this experiment?
This part demonstrates that the sample variance, s2, is an unbiased estimator of the population
1. For ease of interpretation, we will use both the =STDEV command and the =VAR
command to obtain information on variability. For each of the 100 samples, calculate the
standard deviation (s) in column M and the variance (s2) in column N, if you are using
the proposed cell-location format.
2. Calculate the average standard deviation and average variance of the 100 samples that
you generated. How does it compare to the population standard deviation, and
variance, 2, (i.e. the value used in the Random Number Generation)?
3. Calculate the standard deviation and variance of the 100 sample means. How do the
results of your simulation compare to statement #3 of the Central Limit Theorem (p.82)?
4. What is the standard deviation and variance for all 1000 observations? How do they
compare to the values based on averaging the 100 sample standard deviations and
5. Create a histogram of all 1000 observations following the instructions at the end of the
document. Compare with the histogram of the means you created in step 2 of Part I. Pay
attention to the scale of the x-axis! You may re-do the histogram of the means using the
same range of bins that you used to make the histogram of all 1000 observations: now the
difference should be clear!
6. What conclusion can you draw from this experiment?
Part III (optional, but highly recommended!)
This part demonstrates one of the fundamental concepts of statistical inference: the interval
estimate based on a sample does not always include the value of the parameter we are
trying to estimate! The objective of this simulation is to show that the interval estimates are
correct most of the time. We will use a 90% confidence interval estimate for our samples. The
90% confidence interval estimator work should be in columns O, P, and Q.
1. In column O, calculate the lower confidence limit (LCL) = x – 1.645(/n), e.g. “=M3-
1.645*(25/SQRT(10))”. Remember that x (x bar) is the mean of the sample for any
given row, not the population mean. In this experiment, however, we assume that we
know the population variance (2), so the standard deviation () is known, and is thus a
constant, = 25.
2. In column P, calculate the upper confidence limit (UCL) = x + 1.645(/n), e.g.
3. In column Q, use the =AND command e.g. “=AND($M$107>P3,$M$107<Q3)”,
(instructor or stats aid can demonstrate) to have EXCEL determine if the population
mean (i.e. µ = 100) is within the 90% confidence interval.
4. Use the =COUNTIF command to have EXCEL count the number of 90% confidence
intervals that contain µ=100 (TRUE), e.g. “=COUNTIF(R3:R102, TRUE)”. Also count
the number of times that µ doesn’t fall within the 90% confidence interval (FALSE), e.g.
“=COUNTIF(R3:R102, FALSE)” to make sure that the sum of those that do and those
that don’t = 100.
5. What conclusions can you draw from this experiment using the 90 % confidence
In the Workspace folder for our program, you will find a folder labeled Results. In that folder,
there is an EXCEL file Estimation Simulation-program results. After you have completed
your simulations, add your name and place your results in the appropriate columns.
Creating a histogram in Excel
a. You may let Excel create a column chart for you, using its default settings. From
the menu bar go to Tools - Data Analysis - Histogram. In the dialog box, enter the
input range for the “means” values. Leave the bin range blank. Make sure that you also
select the Output range (the place where excel will display the frequency table of the
histogram) or select New Worksheet Ply if you want to display these in a separate
worksheet), and also check the box for Chart Output, so that Excel displays the
histogram. In some situations, a chart like this could be useful, but you can improve the
formatting by creating your own categories (bins), as described below.
b. If you want to have more control over the quality of the histogram, you may want to
create your own categories (bins). Use the =max and =min functions to determine the
range of values of your data. This should aid in setting up your bins. Once again,
you should have approximately n bins. In one column (name it “bins”) create a series
of bins of 5, starting with the next “5” number above the minimum of your range of data
(eg, if your minimum is 32, start at 35) and going up to the maximum of your range of
data. When you make the histogram now, you can specify the cell range for the bins that
you created in the “Bins” cell of the dialogue box, and the same data range for the
“Input” cell as above (Excel sorts them for you!). Select Output range or New
Worksheet Ply and check the box for Chart Output, as above.