Document Sample

Environmental Health: Science, Policy and Social Justice Fall quarter 2008 CAL workshop Week 6 Sampling and Estimation (due Friday Nov. 14) Normal Distribution A discussion of some simulation experiments appears in Chapter 7 of your custom statistics text (Kuzma-Bohnenblust). You need to go through the assignment individually. As it is a simulation using a random number generator, the odds of two results being the same are astonishingly low… By using Excel’s Random Number Generation Function and do this as a simulation, we can generate many independent random samples, which would be difficult with actual data sets. You will generate 100 samples with n = 10 are taken from a population with = 100 and = 25. Before you start Random Number generation, you need to make sure that Excel’s recalculation option is disabled. If it is not, ever time you hit the Enter key, Excel will generate a new set of random digits. Go to Tools Options Calculation, and toggle on the Manual option. (For Macs: got to Excel menu Preferences Calculation check on “Manual”) As indicated in that dialog box, if you want a new set of random numbers, all you need to do is hit the F9 key (“apple” “+” “t” in Macs). Make sure that the Recalculate before save option is also turned off. If you see a check mark in the box, click on the box to turn off the feature. Note: The manual calculation mode also affects the auto fill function. When you click and drag, the new cells will look the same as the cells you are copying. Simply hit F9 (“apple” “+” “t” in Macs) to complete the auto fill. Simulation From the menu bar, go to Tools, then select Data Analysis, then Random Number Generation. In the window that opens, first select the type of Distribution: for this exercise select “Normal”. In the Number of Variables enter 10 (this is the number of observations per trial, or sample size), and in Number of Random Variables enter 100 (this is the number of trials, or samples). You will need to add a random seed, or everyone’s results will come out the same (if you remember the number, and reinsert it, EXCEL will generate the same series of random numbers, providing you keep all of the other options the same). Lastly, you need to give the mean and standard deviation of the distribution you are trying to simulate; remember that your “samples” are taken from a population with = 100 and = 25. The numbers that are generated will likely have a “General” format. You should go back and reformat them as “Number,” with 2 or 3 decimal places – number of decimal places isn’t that critical. Reformatting is done in the Format→Cells menu. Each sample of 10 observations will occupy a row on your EXCEL worksheet, utilizing columns A through J. After EXCEL generates 100 samples of size 10, insert a row or two at the top of the page, so you can add column labels. Or, you can tell EXCEL to start generating the numbers in cell A3. You should 1 complete this assignment using the same block of random numbers for all three parts of this simulation of sampling from the normal distribution. Part 1 This part shows that the sample mean x is an unbiased estimator of the population mean, . 1. At the end of each row (use column L, leave column K blank), use the Average function to determine the mean, x, of each of the 100 samples. Don’t forget the Fill handle! 2. Make a histogram of your 100 sample means: See instructions at the end of the document. How does you histogram compare to statement #1 of the Central Limit Theorem (p.82)? 3. Now at the bottom of your list of means in column L, find the mean of your 100 sample means: use the average function for the mean of all sample means. How does it compare to the population mean, ? How do the results of your simulation compare to statement #2 of the Central Limit Theorem (p.82)? 4. What conclusion can you draw from this experiment? Part II This part demonstrates that the sample variance, s2, is an unbiased estimator of the population variance 2. 1. For ease of interpretation, we will use both the =STDEV command and the =VAR command to obtain information on variability. For each of the 100 samples, calculate the standard deviation (s) in column M and the variance (s2) in column N, if you are using the proposed cell-location format. 2. Calculate the average standard deviation and average variance of the 100 samples that you generated. How does it compare to the population standard deviation, and variance, 2, (i.e. the value used in the Random Number Generation)? 3. Calculate the standard deviation and variance of the 100 sample means. How do the results of your simulation compare to statement #3 of the Central Limit Theorem (p.82)? 4. What is the standard deviation and variance for all 1000 observations? How do they compare to the values based on averaging the 100 sample standard deviations and variances? 5. Create a histogram of all 1000 observations following the instructions at the end of the document. Compare with the histogram of the means you created in step 2 of Part I. Pay attention to the scale of the x-axis! You may re-do the histogram of the means using the same range of bins that you used to make the histogram of all 1000 observations: now the difference should be clear! 6. What conclusion can you draw from this experiment? Part III (optional, but highly recommended!) This part demonstrates one of the fundamental concepts of statistical inference: the interval estimate based on a sample does not always include the value of the parameter we are trying to estimate! The objective of this simulation is to show that the interval estimates are 2 correct most of the time. We will use a 90% confidence interval estimate for our samples. The 90% confidence interval estimator work should be in columns O, P, and Q. 1. In column O, calculate the lower confidence limit (LCL) = x – 1.645(/n), e.g. “=M3- 1.645*(25/SQRT(10))”. Remember that x (x bar) is the mean of the sample for any given row, not the population mean. In this experiment, however, we assume that we know the population variance (2), so the standard deviation () is known, and is thus a constant, = 25. 2. In column P, calculate the upper confidence limit (UCL) = x + 1.645(/n), e.g. “=M3+1.645*(25/SQRT(10))”. 3. In column Q, use the =AND command e.g. “=AND($M$107>P3,$M$107<Q3)”, (instructor or stats aid can demonstrate) to have EXCEL determine if the population mean (i.e. µ = 100) is within the 90% confidence interval. 4. Use the =COUNTIF command to have EXCEL count the number of 90% confidence intervals that contain µ=100 (TRUE), e.g. “=COUNTIF(R3:R102, TRUE)”. Also count the number of times that µ doesn’t fall within the 90% confidence interval (FALSE), e.g. “=COUNTIF(R3:R102, FALSE)” to make sure that the sum of those that do and those that don’t = 100. 5. What conclusions can you draw from this experiment using the 90 % confidence interval? In the Workspace folder for our program, you will find a folder labeled Results. In that folder, there is an EXCEL file Estimation Simulation-program results. After you have completed your simulations, add your name and place your results in the appropriate columns. Creating a histogram in Excel a. You may let Excel create a column chart for you, using its default settings. From the menu bar go to Tools - Data Analysis - Histogram. In the dialog box, enter the input range for the “means” values. Leave the bin range blank. Make sure that you also select the Output range (the place where excel will display the frequency table of the histogram) or select New Worksheet Ply if you want to display these in a separate worksheet), and also check the box for Chart Output, so that Excel displays the histogram. In some situations, a chart like this could be useful, but you can improve the formatting by creating your own categories (bins), as described below. b. If you want to have more control over the quality of the histogram, you may want to create your own categories (bins). Use the =max and =min functions to determine the range of values of your data. This should aid in setting up your bins. Once again, you should have approximately n bins. In one column (name it “bins”) create a series of bins of 5, starting with the next “5” number above the minimum of your range of data (eg, if your minimum is 32, start at 35) and going up to the maximum of your range of data. When you make the histogram now, you can specify the cell range for the bins that you created in the “Bins” cell of the dialogue box, and the same data range for the “Input” cell as above (Excel sorts them for you!). Select Output range or New Worksheet Ply and check the box for Chart Output, as above. 3

DOCUMENT INFO

Shared By:

Categories:

Tags:
research methods, quantitative research, quantitative methods, qualitative research, qualitative and quantitative, qualitative methods, data analysis, social sciences, qualitative data, content analysis, statistical analysis, how to, quantitative analysis, quantitative data analysis, factor analysis

Stats:

views: | 12 |

posted: | 3/31/2010 |

language: | English |

pages: | 3 |

OTHER DOCS BY otm40420

How are you planning on using Docstoc?
BUSINESS
PERSONAL

By registering with docstoc.com you agree to our
privacy policy and
terms of service, and to receive content and offer notifications.

Docstoc is the premier online destination to start and grow small businesses. It hosts the best quality and widest selection of professional documents (over 20 million) and resources including expert videos, articles and productivity tools to make every small business better.

Search or Browse for any specific document or resource you need for your business. Or explore our curated resources for Starting a Business, Growing a Business or for Professional Development.

Feel free to Contact Us with any questions you might have.