Unit_ lesson 8_ normality_ lab guide

					EPPL 612 – Bitto & Goff                          What’s yer handle, thar, good buddy? ________________________

                             What’s Normal About Normality?
                 Lab: The Gaussian Distribution & the Central Limit Theorem
        As we have seen, some populations exhibit random or “Poisson” distributions. Others
        are patchy. Still others are uniform. One of the most common distributions for
        biological populations, however, is a so-called normal distribution (also known as a
        “Gaussian” distribution). A normal distribution has a bell shape. It typically shows
        up when one measures a single trait from all the members of a sample or a population
        of plants or animals. For example, imagine a small population of 100 male fiddler
        crabs from a local saltmarsh. Male fiddlers have one ordinary claw and one oversized
        claw or “fiddle,” which they use to attract females and for territorial battle with
        other males. If you measured the width of the fiddle of all 100 males in our imaginary
        population, you’d find that not all crabs have the same size claw. Most would have
        fiddles of a more or less “average” size, a few would have unusually small fiddles,
        and a few would have unusually large fiddles. When graphed, the size distribution
        would take the shape of a bell. Normal (Gaussian) distributions are extremely
        important. Let’s explore.
        1. Do this with a partner. The table below lists the fiddle widths for an imaginary
           population of 100 male fiddler crabs. The claws range in size from 19 mm to 49
           mm, with a mean width of 34 mm. Keep in mind, however, that the scientist
           does not normally know these values! (…unless she were to capture and
           measure every last crab, which she usually wouldn’t; instead she’d take a sub-
           sample from the population!)

        Fiddle Widths (mm) for a Population of 100 Fiddler Crabs
        19         26      28        31       32        34        36        37         39        43
        20         26      29        31       33        34        36        38         40        43
        22         26      29        31       33        35        36        38         40        43
        23         27      29        31       33        35        36        38         40        44
        23         27      29        31       33        35        36        38         40        44
        23         27      30        31       33        35        36        38         41        45
        24         27      30        32       33        35        37        39         41        46
        24         28      30        32       34        35        37        39         42        47
        25         28      30        32       34        35        37        39         42        48
        25         28      31        32       34        36        37        39         42        49
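
        If you would like to double-check the table by computer, here is a minimal Python
        sketch. It is not part of the original lab (which uses Excel), and the name
        FIDDLE_WIDTHS is just a label chosen here. It computes the range, mean, and median
        of the 100 widths copied from the table.

```python
import statistics

# Fiddle widths (mm), copied row by row from the table above.
FIDDLE_WIDTHS = [
    19, 26, 28, 31, 32, 34, 36, 37, 39, 43,
    20, 26, 29, 31, 33, 34, 36, 38, 40, 43,
    22, 26, 29, 31, 33, 35, 36, 38, 40, 43,
    23, 27, 29, 31, 33, 35, 36, 38, 40, 44,
    23, 27, 29, 31, 33, 35, 36, 38, 40, 44,
    23, 27, 30, 31, 33, 35, 36, 38, 41, 45,
    24, 27, 30, 32, 33, 35, 37, 39, 41, 46,
    24, 28, 30, 32, 34, 35, 37, 39, 42, 47,
    25, 28, 30, 32, 34, 35, 37, 39, 42, 48,
    25, 28, 31, 32, 34, 36, 37, 39, 42, 49,
]

population_size = len(FIDDLE_WIDTHS)
population_mean = statistics.mean(FIDDLE_WIDTHS)
population_median = statistics.median(FIDDLE_WIDTHS)
smallest, largest = min(FIDDLE_WIDTHS), max(FIDDLE_WIDTHS)

print(f"crabs:  {population_size}")
print(f"range:  {smallest} to {largest} mm")
print(f"mean:   {population_mean:.2f} mm")
print(f"median: {population_median} mm")
```

        The mean comes out a hair under 34 mm, which rounds to the 34 mm quoted in step 1.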
EPPL 612                  Unit: Epistemology – How Do We Know What We Know?               Bitto & Goff

      2. To see a frequency distribution (graphed as a histogram) of these 100
         fiddles, launch Excel and open the file “Hypo Testing Lab.” Click on the worksheet
         tab named “Fiddle Frequency.” Notice the bell shape. The claws of most crabs are
         clustered near the mean size of 34 mm, while a few crabs have claw sizes toward
         the extremes or “tails” of the bell curve. In a true normal (or Gaussian)
         distribution, the mean (average), mode (most common value), and median (the
         centermost value) all coincide with one another, and the distribution is symmetrically
         balanced to the left and right of this mean/mode/median.
             By this definition, do the fiddler crabs appear (roughly) to belong to a true
             normal distribution? Explain.

      3. With scissors cut out all 100 fiddle widths from the table above (between you and
         your partner, you only need one set of these 100 numbers). Crease each in half and
         dump them into a bucket or bowl.
      4. Without looking, reach into your bucket or bowl, thoroughly shuffle and stir to
         randomize the contents, and then remove any 3 “crabs” from the population. You
         have just taken a sample of size N=3. In Excel, click on the worksheet tab named
         “Fiddle Calculations.” In column B, record the fiddle widths of your 3 crabs. Notice
         that Excel automatically calculates your estimate of the mean fiddle width. Now
         write this estimated mean in the table below.
      5. Once again we are going to do some “hypersampling.” Toss your 3 crabs back in
         the bucket or bowl, stir it up, and again draw 3 crabs from the population. Now in
         Excel, highlight cells B3 to B5 and key in your 3 new fiddle widths (notice that these
         automatically replace the former values). Record your estimated mean below.
         Repeat the process 8 more times. (Keep in mind, once again, that normally a
         scientist would NOT repeat the sampling procedure in this manner. We are just
         doing this to gain an understanding of the sorts of estimates one gets with random
         sampling at different sample sizes.)
      6. Finally, take 10 more random samples, but this time with a size of N=6. Record
         estimated means below.
                        Estimated Mean at N = 3               Estimated Mean at N = 6
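
      The bucket-and-scissors procedure in steps 4 to 6 can also be mimicked in code. The
      sketch below is an illustration only, not a replacement for the hands-on draw: the
      function name hypersample and the seed value are arbitrary choices made here. It
      draws 10 samples of 3 crabs and 10 samples of 6 crabs, without replacement within
      each sample, and prints the estimated mean of each.

```python
import random
import statistics

# Fiddle widths (mm), copied row by row from the population table.
FIDDLE_WIDTHS = [
    19, 26, 28, 31, 32, 34, 36, 37, 39, 43,
    20, 26, 29, 31, 33, 34, 36, 38, 40, 43,
    22, 26, 29, 31, 33, 35, 36, 38, 40, 43,
    23, 27, 29, 31, 33, 35, 36, 38, 40, 44,
    23, 27, 29, 31, 33, 35, 36, 38, 40, 44,
    23, 27, 30, 31, 33, 35, 36, 38, 41, 45,
    24, 27, 30, 32, 33, 35, 37, 39, 41, 46,
    24, 28, 30, 32, 34, 35, 37, 39, 42, 47,
    25, 28, 30, 32, 34, 35, 37, 39, 42, 48,
    25, 28, 31, 32, 34, 36, 37, 39, 42, 49,
]


def hypersample(population, sample_size, repeats, rng):
    """Draw `repeats` random samples and return the mean of each one."""
    means = []
    for _ in range(repeats):
        # rng.sample draws without replacement, like pulling slips from the
        # bowl; every crab is "tossed back" before the next sample is drawn.
        sample = rng.sample(population, sample_size)
        means.append(statistics.mean(sample))
    return means


rng = random.Random(612)  # fixed seed (arbitrary) so reruns give the same draws
means_n3 = hypersample(FIDDLE_WIDTHS, 3, 10, rng)
means_n6 = hypersample(FIDDLE_WIDTHS, 6, 10, rng)

print("  N = 3    N = 6")
for m3, m6 in zip(means_n3, means_n6):
    print(f"{m3:7.2f}  {m6:7.2f}")
```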


      7. Round your estimated means to the nearest EVEN whole number, and tally them on
         the chalkboard along with the estimates from all your classmates. Once all your
         classmates have put their tallies on the board, fill in the frequency columns in the
         table below.

               Estimated Means with N = 3                         Estimated Means with N = 6
       Estimate      Frequency      % Frequency          Estimate      Frequency      % Frequency
          22                                                22
          24                                                24
          26                                                26
          28                                                28
          30                                                30
          32                                                32
          34                                                34
          36                                                36
          38                                                38
          40                                                40
          42                                                42
          44                                                44
          46                                                46

      8. Convert each raw frequency to a Percent Frequency (that is, a fraction of the total)
         using this formula. You can get Excel to do these calculations for you very quickly.
         Just follow the instructions below.
           Percent Frequency = (Frequency of Estimated Mean / Total Number of Estimates Made) × 100%

      How to get Excel to do all these calcs for you in a jiffy-jiff-jiff:
         (a) Open a fresh spreadsheet by going to Insert… Worksheet…
         (b) In column A, enter the estimate intervals 22 to 46 by twos. Here’s a nifty trick
              for doing this lickety-split: type 22 in the first cell, Enter, 24, Enter. Now
              highlight both of those first two cells, and release. Finally, grab the little black
              square in the lower right hand corner of the highlighted area, and drag it
              downward. When you release, Excel fills in the rest of the sequence for you!
         (c) In column B, enter the N=3 frequencies from the class.
         (d) Click in cell C1 (or whatever is your first empty cell). Type an = sign. Notice
              an = sign also appears in the Formula Window up top. Now tell Excel that this
              cell is going to equal 100 times the value in cell B1 (or whatever is your first
              cell with raw data) divided by the total number of estimates made. For
              example, if your class took a total of 140 samples, the formula would read 100
              * B1 / 140. Hit Enter.
         (e) Now grab the little black square in the cell’s lower right hand corner, and drag
              it downward. Excel repeats the calculation for the rest of the frequencies!
         (f)    Repeat the process in columns E and F for the N=6 frequencies.
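
      For anyone who prefers code to spreadsheets, the rounding, tallying, and Percent
      Frequency steps might be sketched in Python as follows. This is an illustration
      only: the function names are made up here, the example estimates are invented
      numbers rather than class data, and sending exact ties (an estimate such as 35.0)
      upward is an assumption, since the handout does not give a tie rule.

```python
import math
from collections import Counter


def round_to_even(value):
    """Round to the nearest even whole number; exact ties go up (an assumption)."""
    return 2 * math.floor(value / 2 + 0.5)


def percent_frequencies(estimated_means):
    """Tally rounded estimates and convert each tally to a percent of the total."""
    tallies = Counter(round_to_even(m) for m in estimated_means)
    total = len(estimated_means)
    return {estimate: 100 * count / total
            for estimate, count in sorted(tallies.items())}


# Invented estimates, for illustration only (not real class results).
example_means = [33.3, 35.0, 31.7, 34.3, 36.7, 33.0, 29.3, 34.0, 38.3, 35.7]
example_percents = percent_frequencies(example_means)

for estimate, percent in example_percents.items():
    print(f"{estimate:>3} mm  {percent:5.1f}%")
```

      The percentages always add up to 100, which makes a handy check on the tallies.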


      9. Finally, graph these Percent Frequencies as follows:
           (a) Highlight the Percent Frequencies in column C (if you typed headings at the top
               of each column, do NOT include those), then create a graph either by clicking
               the Chart Wizard icon up top or by selecting Chart from the Insert menu.
               Select XY (Scatter), and click on the style with smooth lines but no markers.
               Hit Finish. Drag the graph off to one side so that it doesn’t cover your data.
           (b) Now you want to add your N=6 tallies as a second plot. The easiest way to do
               this is to click on the chart, go up to the Chart menu, select Source Data…
               and click on the Series tab. Click the Add button. Now find the Y Values
               window and click the red, white, and blue icon to its right. This frees you to
               manually highlight the Percent Frequencies in column F that you wish to graph.
               Enter. OK.
           (c) Notice that your x-axis does not have the correct values (it’s just 1 to
               whatever). That’s because you haven’t told Excel what to use as the x-values.
               To do so, again go to the Chart menu… Source Data… Series tab. In the
               left hand window make sure “Series 1” is selected, then click the red, white,
               and blue icon beside the X Values window. Highlight the numbers in column
               A (no labels!). Enter. Now in the left hand window select “Series 2” and
               repeat the process. OK.
           (d) For better visibility, now, grab a corner of the graph and resize it to fill most of
               the spreadsheet area.
           (e) At this point, if desired, you can make some cosmetic changes to your graph.
               You can get rid of excess empty space by double clicking the x-axis and
               changing the Scale to a more narrow range like 20 to 50. You can also
               change fonts on either axis. You can change the thickness or colors of the
               plots by double clicking on either curve. You can change the color or pattern
               of the background. And so on.
      10. You should also add titles. But before you do, let’s pause and take a moment
          to consider just what we’ve done here. What exactly have we graphed? The
          width of crab claws? No. We have graphed our repeated estimates of the mean
          claw width, based on random samples from the same crab population.
          Hypersampling! Therefore your graph does NOT represent a distribution of fiddle
          widths in the wild (you saw that graph earlier, remember?). Rather, it’s a
          Distribution of the Estimated Means. Now go to the Chart menu and select
          Chart Options… Titles tab. Entitle your graph “Distribution of the Means.”
      11. While here, entitle the x-axis “Estimated Mean Fiddle Width” and entitle the y-axis
          “Percent Frequency.” Hit OK. You will also want to label your Legend with
          something other than “Series 1” and “Series 2.” To do so, go to Chart… Source
          Data… Series tab. Click on Series 1 in the left hand window, then click inside the
          “Name” window and type “N = 3.” Now select Series 2 and type “N = 6” in the
          “Name” window.
      12. When your graph is finished, shout a loud “Whoop!” to call Mr. Goof over for his
          glowing, slobbering approval.


                         A point so important, it bears repeating:
           Your Distribution of the Means is NOT a graph of raw fiddle widths,
           but a graph of how frequently each estimated mean WOULD appear
            if you took repeated samples from the same crab population, over
           and over again (which you would never really do…). Do not tackle
                the questions below until you completely understand this!!!

      Final Questions
      The central limit theorem is one of the fundamental laws of statistics. It says that
      the distribution of the means of repeated random samples approaches a normal
      distribution as the sample size grows, whatever the shape of the population itself
      (so long as its variance is finite). When a natural population of crabs (or whatever)
      already belongs to a normal distribution, the distribution of the means will ALSO be
      normal, even at small sample sizes. Regard your two plots of the distributions of the
      means for N=3 and N=6. Do they in fact appear to be roughly normal (bell-shaped and
      symmetrical, with mean/mode/median all coinciding)? Despite this similarity, in what
      respects are their shapes different?

      Remember (for the umpteenth time) that a scientist does NOT know the ACTUAL
      distribution that a natural population belongs to. Nor does she ever know the
      population’s “true” mean. But she DOES know that the bigger her sample size (N), the
      more CONFIDENT she can be that her ESTIMATED mean is reasonably accurate. The
      bigger the sample size, the more probable it is that the “true” mean falls somewhere
      “close by” the estimated mean. Let’s call this the “Probability of Closeness” rule.
      Consider your two graphs of the distribution of the means at N=3 and N=6. Does the
      difference in shape of these two plots obey the “Probability of Closeness” rule? Justify
      your answer.
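
      One way to check your reasoning is to let a computer do far more hypersampling than
      a class can do by hand. The sketch below is an illustration under assumptions made
      here (5,000 simulated samples per sample size, an arbitrary seed, and a made-up
      function name). It reports the center and the spread (standard deviation) of the
      simulated means at N=3 and N=6.

```python
import random
import statistics

# Fiddle widths (mm), copied row by row from the population table.
FIDDLE_WIDTHS = [
    19, 26, 28, 31, 32, 34, 36, 37, 39, 43,
    20, 26, 29, 31, 33, 34, 36, 38, 40, 43,
    22, 26, 29, 31, 33, 35, 36, 38, 40, 43,
    23, 27, 29, 31, 33, 35, 36, 38, 40, 44,
    23, 27, 29, 31, 33, 35, 36, 38, 40, 44,
    23, 27, 30, 31, 33, 35, 36, 38, 41, 45,
    24, 27, 30, 32, 33, 35, 37, 39, 41, 46,
    24, 28, 30, 32, 34, 35, 37, 39, 42, 47,
    25, 28, 30, 32, 34, 35, 37, 39, 42, 48,
    25, 28, 31, 32, 34, 36, 37, 39, 42, 49,
]


def center_and_spread(population, sample_size, repeats, rng):
    """Return (mean, standard deviation) of many simulated sample means."""
    means = [sum(rng.sample(population, sample_size)) / sample_size
             for _ in range(repeats)]
    return statistics.mean(means), statistics.stdev(means)


rng = random.Random(612)  # arbitrary fixed seed so reruns match
center_n3, spread_n3 = center_and_spread(FIDDLE_WIDTHS, 3, 5000, rng)
center_n6, spread_n6 = center_and_spread(FIDDLE_WIDTHS, 6, 5000, rng)

print(f"N = 3: center {center_n3:.2f} mm, spread {spread_n3:.2f} mm")
print(f"N = 6: center {center_n6:.2f} mm, spread {spread_n6:.2f} mm")
```

      Compare the two spreads with the widths of the two curves on your own graph.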

      Now, a distribution of the means is really a probability distribution. It reflects the
      “likelihood” of coming up with such and such an estimate on any given sample of size N.
      It predicts the percent probability that you will get an estimate that is close to the
      “true” mean, and it predicts the percent probability that you could get a wild “fluke”
      estimate that is way off the mark. If you were to go back to your population of fiddler
      crabs, now, and randomly select six crabs, what would be the odds (approximately) that
      you would hit the true mean of 34 mm? What are the odds of getting a reasonably
      close estimate of, say, 36 mm? What are the odds of getting a fluke estimate of 40 mm?

      If instead you were to measure only three crabs, what are the odds of hitting the true
      mean of 34 mm? What are the odds of being close with 36 mm? What are the odds of
      getting a fluke of 40 mm?
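
      Your class graph gives these odds directly as Percent Frequencies. If you want a
      second opinion, the Python sketch below estimates the same odds by simulation. It
      rests on assumptions made here, not on anything in the lab itself: 20,000 simulated
      samples per sample size, an arbitrary seed, made-up function names, and exact ties
      rounded upward when rounding to the nearest even whole number.

```python
import math
import random

# Fiddle widths (mm), copied row by row from the population table.
FIDDLE_WIDTHS = [
    19, 26, 28, 31, 32, 34, 36, 37, 39, 43,
    20, 26, 29, 31, 33, 34, 36, 38, 40, 43,
    22, 26, 29, 31, 33, 35, 36, 38, 40, 43,
    23, 27, 29, 31, 33, 35, 36, 38, 40, 44,
    23, 27, 29, 31, 33, 35, 36, 38, 40, 44,
    23, 27, 30, 31, 33, 35, 36, 38, 41, 45,
    24, 27, 30, 32, 33, 35, 37, 39, 41, 46,
    24, 28, 30, 32, 34, 35, 37, 39, 42, 47,
    25, 28, 30, 32, 34, 35, 37, 39, 42, 48,
    25, 28, 31, 32, 34, 36, 37, 39, 42, 49,
]


def round_to_even(value):
    """Round to the nearest even whole number; exact ties go up (an assumption)."""
    return 2 * math.floor(value / 2 + 0.5)


def estimate_odds(population, sample_size, targets, repeats, rng):
    """Fraction of simulated samples whose rounded mean lands on each target."""
    hits = {target: 0 for target in targets}
    for _ in range(repeats):
        sample = rng.sample(population, sample_size)
        rounded = round_to_even(sum(sample) / sample_size)
        if rounded in hits:
            hits[rounded] += 1
    return {target: count / repeats for target, count in hits.items()}


rng = random.Random(612)  # arbitrary fixed seed so reruns match
targets = (34, 36, 40)
odds_n6 = estimate_odds(FIDDLE_WIDTHS, 6, targets, 20000, rng)
odds_n3 = estimate_odds(FIDDLE_WIDTHS, 3, targets, 20000, rng)

for target in targets:
    print(f"{target} mm:  N=6 {odds_n6[target]:.1%}   N=3 {odds_n3[target]:.1%}")
```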
