EPPL 612 – Bitto & Goff

What’s yer handle, thar, good buddy? ________________________

What’s Normal About Normality?
Lab: The Gaussian Distribution & the Central Limit Theorem

As we have seen, some populations exhibit random or “Poisson” distributions. Others are patchy. Still others are uniform. One of the most common distributions for biological populations, however, is a so-called normal distribution (also known as a “Gaussian” distribution). A normal distribution has a bell shape. It typically shows up when one measures a single trait from all the members of a sample or a population of plants or animals.

For example, imagine a small population of 100 male fiddler crabs from a local saltmarsh. Male fiddlers have one ordinary claw and one oversized claw or “fiddle,” which they use to attract females and for territorial battle with other males. If you measured the width of the fiddle of all 100 males in our imaginary population, you’d find that not all crabs have the same size claw. Most would have fiddles of a more or less “average” size, a few would have unusually small fiddles, and a few would have unusually large fiddles. When graphed, the size distribution would take the shape of a bell.

Normal (Gaussian) distributions are extremely important. Let’s explore.

Procedure

1. Do this with a partner. The table below lists the fiddle widths for an imaginary population of 100 male fiddler crabs. The claws range in size from 19 mm to 49 mm, with a mean width of 34 mm. Keep in mind, however, that the scientist does not normally know these values! (…unless she were to capture and measure every last crab, which she usually wouldn’t; instead she’d take a subsample from the population!)

Fiddle Widths (mm) for a Population of 100 Fiddler Crabs

19  26  28  31  32  34  36  37  39  43
20  26  29  31  33  34  36  38  40  43
22  26  29  31  33  35  36  38  40  43
23  27  29  31  33  35  36  38  40  44
23  27  29  31  33  35  36  38  40  44
23  27  30  31  33  35  36  38  41  45
24  27  30  32  33  35  37  39  41  46
24  28  30  32  34  35  37  39  42  47
25  28  30  32  34  35  37  39  42  48
25  28  31  32  34  36  37  39  42  49

EPPL 612 Unit: Epistemology – How Do We Know What We Know? Bitto & Goff

2. To see a frequency distribution (also known as a histogram) of these 100 fiddles, launch Excel and open the file “Hypo Testing Lab.” Click on the worksheet tab named “Fiddle Frequency.” Notice the bell shape. The claws of most crabs are clustered near the mean size of 34 mm, while a few crabs have claw sizes toward the extremes or “tails” of the bell curve. In a true normal (or Gaussian) distribution, the mean (average), mode (most common value), and median (the centermost value) all coincide with one another, and the distribution is symmetrically balanced to the left and right of this mean/mode/median. By this definition, do the fiddler crabs appear (roughly) to belong to a true normal distribution? Explain.

3. With scissors cut out all 100 fiddle widths from the table above (between you and your partner, you only need one set of these 100 numbers). Crease each in half and dump them into a bucket or bowl.

4. Without looking, reach into your bucket or bowl, thoroughly shuffle and stir to randomize the contents, and then remove any 3 “crabs” from the population. You have just taken a sample of size N=3. In Excel, click on the worksheet tab named “Fiddle Calculations.” In column B, record the fiddle widths of your 3 crabs. Notice that Excel automatically calculates your estimate of the mean fiddle width. Now write this estimated mean in the table below.

5. Once again we are going to do some “hypersampling.” Toss your 3 crabs back in the bucket or bowl, stir it up, and again draw 3 crabs from the population.
Now in Excel, highlight cells B3 to B5 and key in your 3 new fiddle widths (notice that these automatically replace the former values). Record your estimated mean below. Repeat the process 8 more times. (Keep in mind, once again, that normally a scientist would NOT repeat the sampling procedure in this manner. We are just doing this to gain an understanding of the sorts of estimates one gets with random sampling at different sample sizes.)

6. Finally, take 10 more random samples, but this time with a size of N=6. Record estimated means below.

Sample   Estimated Mean at N = 3   Estimated Mean at N = 6
1        ______                    ______
2        ______                    ______
3        ______                    ______
4        ______                    ______
5        ______                    ______
6        ______                    ______
7        ______                    ______
8        ______                    ______
9        ______                    ______
10       ______                    ______

7. Round your estimated means to the nearest EVEN whole number, and tally them on the chalkboard along with the estimates from all your classmates. Once all your classmates have put their tallies on the board, fill in the frequency columns in the table below.

Estimated Means with N = 3                Estimated Means with N = 6
Estimate   Frequency   % Frequency        Estimate   Frequency   % Frequency
22         ______      ______             22         ______      ______
24         ______      ______             24         ______      ______
26         ______      ______             26         ______      ______
28         ______      ______             28         ______      ______
30         ______      ______             30         ______      ______
32         ______      ______             32         ______      ______
34         ______      ______             34         ______      ______
36         ______      ______             36         ______      ______
38         ______      ______             38         ______      ______
40         ______      ______             40         ______      ______
42         ______      ______             42         ______      ______
44         ______      ______             44         ______      ______
46         ______      ______             46         ______      ______

8. Convert each raw frequency to a Percent Frequency (that is, a percentage of the total) using the formula below. You can get Excel to do these calculations for you very quickly. Just follow the instructions that come after it.

Percent Frequency = (Frequency of Estimated Mean / Total Number of Estimates Made) × 100%

How to get Excel to do all these calcs for you in a jiffy-jiff-jiff:

(a) Open a fresh spreadsheet by going to Insert… Worksheet…

(b) In column A, enter the estimate intervals 22 to 46 by twos. Here’s a nifty trick for doing this lickety-split: type 22 in the first cell, Enter, 24, Enter. Now highlight both of those first two cells, and release. Finally, grab the little black square in the lower right hand corner of the highlighted area, and drag it downward. When you release, Excel fills in the rest of the sequence for you!

(c) In column B, enter the N=3 frequencies from the class.

(d) Click in cell C1 (or whatever is your first empty cell). Type an = sign. Notice an = sign also appears in the Formula Window up top. Now tell Excel that this cell is going to equal 100 times the value in cell B1 (or whatever is your first cell with raw data) divided by the total number of estimates made. For example, if your class took a total of 140 samples, the formula would read =100*B1/140. Hit Enter.

(e) Now grab the little black square in the cell’s lower right hand corner, and drag it downward. Excel repeats the calculation for the rest of the frequencies!

(f) Repeat the process in columns E and F for the N=6 frequencies (raw frequencies in column E, Percent Frequencies in column F).

9. Finally, graph these Percent Frequencies as follows:

(a) Highlight the Percent Frequencies in column C (if you typed headings at the top of each column, do NOT include those), then create a graph either by clicking the Chart Wizard icon up top or by selecting Chart from the Insert menu. Select XY (Scatter), and click on the style with smooth lines but no markers. Hit Finish. Drag the graph off to one side so that it doesn’t cover your numbers.

(b) Now you want to add your N=6 tallies as a second plot. The easiest way to do this is to click on the chart, go up to the Chart menu, select Source Data… and click on the Series tab. Click the Add button. Now find the Y Values window and click the red, white, and blue icon to its right. This frees you to manually highlight the N=6 Percent Frequencies (column F, if you followed step 8) that you wish to graph. Enter. OK.

(c) Notice that your x-axis does not have the correct values (it’s just 1 to whatever). That’s because you haven’t told Excel what to use as the x-values. To do so, again go to the Chart menu… Source Data… Series tab. In the left hand window make sure “Series 1” is selected, then click the red, white, and blue icon beside the X Values window.
Highlight the numbers in column A (no labels!). Enter. Now in the left hand window select “Series 2” and repeat the process. OK.

(d) For better visibility, now, grab a corner of the graph and resize it to fill most of the spreadsheet area.

(e) At this point, if desired, you can make some cosmetic changes to your graph. You can get rid of excess empty space by double clicking the x-axis and changing the Scale to a narrower range like 20 to 50. You can also change fonts on either axis. You can change the thickness or colors of the plots by double clicking on either curve. You can change the color or pattern of the background. And so on.

10. You should also add titles. But before you do, let’s pause and take a moment to consider just what we’ve done here. What exactly have we graphed? The width of crab claws? No. We have graphed our repeated estimates of the mean claw width, based on random samples from the same crab population. Hypersampling! Therefore your graph does NOT represent a distribution of fiddle widths in the wild (you saw that graph earlier, remember?). Rather, it’s a Distribution of the Estimated Means. Now go to the Chart menu and select Chart Options… Titles tab. Entitle your graph “Distribution of the Means.”

11. While here, entitle the x-axis “Estimated Mean Fiddle Width” and entitle the y-axis “Percent Frequency.” Hit OK. You will also want to label your Legend with something other than “Series 1” and “Series 2.” To do so, go to Chart… Source Data… Series tab. Click on Series 1 in the left hand window, then click inside the “Name” window and type “N = 3.” Now select Series 2 and type “N = 6” in the “Name” window.

12. When your graph is finished, shout a loud “Whoop!” to call Mr. Goof over for his glowing, slobbering approval.
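
The hypersampling and tallying in steps 4 through 8 can also be sketched in code, as a cross-check on the class tallies. The Python below is only an illustration and not part of the lab's Excel workflow: the function names, the fixed seed, and the choice of 140 repeats (borrowed from the example class total in step 8) are assumptions, and so is the decision to round an exact odd mean upward.

```python
import math
import random
from collections import Counter

# The 100 fiddle widths (mm), copied row by row from the population table.
POPULATION = [
    19, 26, 28, 31, 32, 34, 36, 37, 39, 43,
    20, 26, 29, 31, 33, 34, 36, 38, 40, 43,
    22, 26, 29, 31, 33, 35, 36, 38, 40, 43,
    23, 27, 29, 31, 33, 35, 36, 38, 40, 44,
    23, 27, 29, 31, 33, 35, 36, 38, 40, 44,
    23, 27, 30, 31, 33, 35, 36, 38, 41, 45,
    24, 27, 30, 32, 33, 35, 37, 39, 41, 46,
    24, 28, 30, 32, 34, 35, 37, 39, 42, 47,
    25, 28, 30, 32, 34, 35, 37, 39, 42, 48,
    25, 28, 31, 32, 34, 36, 37, 39, 42, 49,
]


def round_to_even(mean):
    """Round to the nearest even whole number.

    Assumption: a mean that is exactly odd (a tie) rounds upward.
    """
    return 2 * math.floor(mean / 2 + 0.5)


def hypersample(n, repeats, rng):
    """Draw `repeats` samples of n crabs each and return the estimated means."""
    # rng.sample draws without replacement within one sample, like pulling
    # slips from the bucket; each new call starts from the full population.
    return [sum(rng.sample(POPULATION, n)) / n for _ in range(repeats)]


def percent_frequencies(means):
    """Tally the rounded means and convert each count to a percent of the total."""
    tally = Counter(round_to_even(m) for m in means)
    return {est: 100 * count / len(means) for est, count in sorted(tally.items())}


if __name__ == "__main__":
    rng = random.Random(612)  # assumed seed, only so that reruns match
    for n in (3, 6):
        table = percent_frequencies(hypersample(n, 140, rng))
        print(f"N = {n}")
        for estimate, pct in table.items():
            print(f"  {estimate:2d} mm  {pct:5.1f}%")
```

Each printed row corresponds to one row of the step 7 table, so the output can be compared with the chalkboard tallies. The percentages will not match the class exactly, since every run of random sampling gives somewhat different estimates.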

A point so important, it bears repeating: Your Distribution of the Means is NOT a graph of raw fiddle widths, but a graph of how frequently each estimated mean WOULD appear if you took repeated samples from the same crab population, over and over again (which you would never really do…). Do not tackle the questions below until you completely understand this!!!

Final Questions

The central limit theorem is one of the fundamental laws of statistics. It says that the distribution of the means tends toward a normal shape as the sample size grows, whatever the shape of the population’s own distribution (so long as its variance is finite). In particular, when a natural population of crabs (or whatever) belongs to a normal distribution, the distribution of the means will ALSO be normal. Regard your two plots of the distributions of the means for N=3 and N=6. Do they in fact appear to be roughly normal (bell-shaped and symmetrical, with mean/mode/median all coinciding)? Despite this similarity, in what respects are their shapes different?

Remember (for the umpteenth time) that a scientist does NOT know the ACTUAL distribution that a natural population belongs to. Nor does she ever know the population’s “true” mean. But she DOES know that the bigger her sample size (N), the more CONFIDENT she can be that her ESTIMATED mean is reasonably accurate. The bigger the sample size, the more probable it is that the “true” mean falls somewhere “close by” the estimated mean. Let’s call this the “Probability of Closeness” rule. Consider your two graphs of the distribution of the means at N=3 and N=6. Does the difference in shape of these two plots obey the “Probability of Closeness” rule? Justify your answer.

Now, a distribution of the means is really a probability distribution. It reflects the “likelihood” of coming up with such and such an estimate on any given sample of size N. It predicts the percent probability that you will get an estimate that is close to the “true” mean, and it predicts the percent probability that you could get a wild “fluke” estimate that is way off the mark.
If you were to go back to your population of fiddler crabs, now, and randomly select six crabs, what would be the odds (approximately) that you would hit the true mean of 34 mm? What are the odds of getting a reasonably close estimate of, say, 36 mm? What are the odds of getting a fluke estimate of 40 mm?

If instead you were to measure only three crabs, what are the odds of hitting the true mean of 34 mm? What are the odds of being close with 36 mm? What are the odds of getting a fluke of 40 mm?
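
One way to check answers read off the class graph is to approximate the same odds by brute force. The Python sketch below is an assumption-laden illustration rather than the lab's own method: `estimate_odds`, the 20,000 trials, the seed, and the 1 mm window (which mirrors rounding to the nearest even number, with ties rounded up) are all choices made here for the example.

```python
import random

# Fiddle width (mm) -> number of crabs, tallied from the population table.
WIDTH_COUNTS = {
    19: 1, 20: 1, 22: 1, 23: 3, 24: 2, 25: 2, 26: 3, 27: 4, 28: 4, 29: 4,
    30: 4, 31: 7, 32: 5, 33: 6, 34: 5, 35: 7, 36: 7, 37: 5, 38: 5, 39: 5,
    40: 4, 41: 2, 42: 3, 43: 3, 44: 2, 45: 1, 46: 1, 47: 1, 48: 1, 49: 1,
}
POPULATION = [w for w, count in WIDTH_COUNTS.items() for _ in range(count)]


def estimate_odds(n, target, trials=20000, seed=0):
    """Approximate the percent chance that n random crabs give an estimated
    mean that would be tallied as `target`.

    Assumed window: target - 1 <= mean < target + 1.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.sample(POPULATION, n)) / n
        if target - 1 <= mean < target + 1:
            hits += 1
    return 100 * hits / trials


if __name__ == "__main__":
    for n in (6, 3):
        for target in (34, 36, 40):
            odds = estimate_odds(n, target)
            print(f"N = {n}, estimate of {target} mm: about {odds:.1f}%")
```

Comparing the N = 6 lines with the N = 3 lines gives a quick check on the “Probability of Closeness” rule. The figures are simulation estimates, so they shift slightly with the seed and the number of trials.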
