# LAB #1c INTRODUCTION TO STATISTICAL TESTING

By Rebecca C. Jann, 2000 (unpublished manuscript), in case you want to cite this in your lab report.

BACKGROUND THEORY. The point of most statistical testing in biology is to calculate the likelihood that
differences in experimental results are caused by chance alone. For example, a typical "If..., then...." statement for a
genetics hypothesis would be "If these offspring are the result of a dihybrid cross with Mendelian dominance, then
the phenotype ratio will be 9:3:3:1." However, the actual count of the numbers of individuals in each phenotype
class rarely computes to an EXACT ratio of 9:3:3:1. Thus, the experimenter must judge whether the actual results
differ significantly [i.e., for reasons other than chance, luck, and sampling error] from those predicted by the
hypothesis. (If the differences are NOT just due to chance, then maybe something biologically interesting is causing the differences.)
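To make the dihybrid example concrete, here is a minimal sketch (in Python rather than EXCEL; the script is an illustration added here, not part of the handout) of a chi-square goodness-of-fit check against a 9:3:3:1 ratio. The counts used are Mendel's classic dihybrid pea data (315:108:101:32), chosen only as a worked example:

```python
# Chi-square goodness-of-fit test for a 9:3:3:1 dihybrid ratio.
# Illustrative counts (Mendel's classic dihybrid pea data).

observed = [315, 108, 101, 32]
ratio = [9, 3, 3, 1]

total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value for df = 4 - 1 = 3 at P = .05, from a chi-square table.
CRITICAL_05 = 7.815

print(f"chi-square = {chi2:.3f}")
if chi2 <= CRITICAL_05:
    print("Fail to reject Ho: data are consistent with 9:3:3:1")
else:
    print("Reject Ho: data differ significantly from 9:3:3:1")
```

Here chi-square comes out to about 0.47, far below the critical value of 7.815, so we fail to reject the null hypothesis: these counts are consistent with a 9:3:3:1 ratio.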
Here's another example: Your hypothesis is that microscopic plants are more abundant than microscopic
animals in RibbonWalk ponds. Your "if..., then....": If microscopic plants are more abundant than microscopic
animals in RibbonWalk ponds, then Bio103 students will count more green critters than non-green critters in their
slides. Then you collect their numbers so that you have data sort of like this:
Green critters per slide          Non-green critters per slide
         0                                     1
        10                                     2
         3                                     5
         2                                    10
         4                                     4
         0                                     1
         1                                     6
         1                                     5
         2                                    11
         6                                     8
Overall, it looks like the average slide had more non-green critters, but not in every individual slide. How do you
interpret these results? Are the two averages different enough to be considered significant? We need statistics to
help us decide. So read on to figure out ....
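As a preview of the t-test used later in this lab, here is a minimal sketch (Python rather than EXCEL; the script is an illustration added here, not part of the handout) that computes the two means and a pooled two-sample t statistic for the critter data above, then compares it with the two-tailed critical value for 18 degrees of freedom:

```python
import math
from statistics import mean, variance

# Critter counts per slide, copied from the table above.
green = [0, 10, 3, 2, 4, 0, 1, 1, 2, 6]
non_green = [1, 2, 5, 10, 4, 1, 6, 5, 11, 8]

n1, n2 = len(green), len(non_green)

# Pooled variance (assumes the two groups have similar spread).
pooled_var = ((n1 - 1) * variance(green) +
              (n2 - 1) * variance(non_green)) / (n1 + n2 - 2)

# Standard error of the difference between the two means.
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t = (mean(green) - mean(non_green)) / se_diff

# Two-tailed critical value for df = 18 at P = .05, from a t table.
CRITICAL_05 = 2.101

print(f"mean green = {mean(green):.1f}, mean non-green = {mean(non_green):.1f}")
print(f"t = {t:.3f}")
print("Reject Ho" if abs(t) > CRITICAL_05 else "Fail to reject Ho")
```

With these numbers t is about -1.61, which does not exceed 2.101 in magnitude, so the difference between the means (2.9 vs. 5.3) is NOT statistically significant at P = .05, even though the non-green average looks larger.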

For all statistical testing, you must convert your common-sense biological hypothesis into a precise statistical
null hypothesis (Ho) suitable for testing-- like "the average IQ of seniors does not differ from the average IQ of
sophomores" or "the ratio of x does not differ from 9:3:3:1" or "the average green critters per
slide______________________________________." What you are really trying to show may be the opposite
of the null hypothesis-- "sophomores are really smarter than seniors" or "this was not a dihybrid cross and the ratio is
not 9:3:3:1" or "The average green critter number per slide _____________________." Most scientists discuss
their common-sense biological hypotheses in the body (usually introduction and conclusion) of research reports
(grownup lab reports); but they formally state the statistical null hypothesis parenthetically or on the table, appendix,
or end-note where they present the results of the statistical analysis. A common student mistake is to insult the
reader by expounding on the meaning of the statistics themselves, information that real scientists have already
mastered or at least would pretend to know. The grader will be able to tell whether you understand the statistics
from how you use them, not from how you define them.

Most statistical tests of null hypotheses involve three statistics, all calculated by computers these days:
1. P
2. df (also written ν, or DF = degrees of freedom)
3. and one other, either t or r or Z or χ² (chi-square) or F or some other statistic.

"P" in statistics represents the probability that the experimental differences could have been caused by chance alone.
If the P is .05 (=5%=1/20) or less, most scientists consider the results significantly different from the results
predicted by a null hypothesis. When the P is .05 or less, most scientists will REJECT the null hypothesis. When
the P is greater than .05, good scientists will FAIL TO REJECT (some arrogant or ignorant scientists will "accept")
the null hypothesis. Another way to think about P is that it represents the odds that you will be wrong if you reject
the null hypothesis. [Note: the probability that you will be right if you accept the Ho is unknowable; this is one
reason that cautious scientists say they "fail to reject the hypothesis.] In biological research publications
"significantly different" or * usually means P.05; "highly significant" or ** usually means P.01.

Here’s how Richard Dawkins describes “P” in his book Unweaving the Rainbow, pages 170-1, following
several pages describing an “experiment” in which he guesses whether a male or female has written each of a series
of papers and then explains how to evaluate whether he’s just guessing or whether he actually has some talent for
identifying sex differences by handwriting:
We have just performed what is technically called a test of statistical significance. We reasoned from first
principles, which made it rather tedious. In practice, research workers can call upon tables of probabilities and
distributions that have been previously calculated. We therefore don't literally have to write down all the possible ways
in which things could have happened. But the underlying theory, the basis upon which the tables were calculated,
depends, in essence, upon the same fundamental procedure. Take the events that could have been obtained and throw
them down repeatedly at random. Look at the actual way the events occurred and measure how extreme it is, on the
spectrum of all possible ways in which they could have been thrown down.
Notice that a test of statistical significance does not prove anything conclusively. It can’t rule out luck as the
generator of the result that we observe. The best it can do is place the observed result on a par with a specified amount
of luck. In our particular hypothetical example, it was on a par with two out of 10,000 random guessers. When we say
that an effect is statistically significant, we must always specify a so-called p-value. This is the probability that a purely
random process would have generated a result at least as impressive as the actual result. A p-value of 2 in 10,000 is
pretty impressive, but it is still possible that there is no genuine pattern there. The beauty of doing a proper statistical
test is that we know how probable it is that there is no genuine pattern there.
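Dawkins's "throw them down repeatedly at random" procedure can be sketched directly as a permutation test. The following Python script (an illustration added here, not part of the handout) shuffles the critter counts from the earlier table many times and asks how often chance alone produces a difference between group means at least as large as the one actually observed; that fraction is an estimate of P:

```python
import random
from statistics import mean

random.seed(1)  # make the shuffles reproducible

# Critter counts per slide from the earlier table.
green = [0, 10, 3, 2, 4, 0, 1, 1, 2, 6]
non_green = [1, 2, 5, 10, 4, 1, 6, 5, 11, 8]

observed_diff = abs(mean(green) - mean(non_green))  # 5.3 - 2.9 = 2.4

pooled = green + non_green
n_shuffles = 20_000
at_least_as_extreme = 0

for _ in range(n_shuffles):
    # "Throw them down repeatedly at random": reassign counts to groups.
    random.shuffle(pooled)
    fake_green, fake_non_green = pooled[:10], pooled[10:]
    if abs(mean(fake_green) - mean(fake_non_green)) >= observed_diff:
        at_least_as_extreme += 1

p_estimate = at_least_as_extreme / n_shuffles
print(f"estimated P = {p_estimate:.3f}")
```

For these data the estimate comes out well above .05, matching the t-test's verdict that a difference this size could plausibly be luck, so we fail to reject the null hypothesis.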

We usually get the "P" from a computer (or from a formula or a table in a statistics book), which requires two
other values-- usually the degrees of freedom (DF or ν) and a statistic (such as the value of t, Z, r, F, or Chi-square).
The computer will calculate both these values for you, too. In fact, EXCEL does not even report these values; it just
goes from your raw data directly to “P” without showing you the other values. All you have to know is which is the
correct statistic for your data and how to find it on the spreadsheet toolbar.
The correct statistic (t, F, or Chi-square) needed for determining the P depends on what kind of data and
hypothesis are involved. Even though statistics teachers in math departments recommend the Z-test, the t-test is one
of the most popular methods of biologists, psychologists, and opinion pollsters for comparing two groups
statistically. For example, "Are Queens seniors smarter than Queens juniors?" or "Is the population growth rate
higher without rats?" If you want to compare more than two groups [like seniors, juniors, and sophomores][or like
your critter, the suspected competitor rats, and another suspected competitor species], you cannot estimate P values
from repeated t-tests. Instead, good statisticians commonly use analysis of variance [ANOVA, ANOV, AOV] to
determine whether there is a difference among the groups. Then they use another test [Dunnett's or S-N-K] to
determine which specific group(s) is(are) different from (or less than) which other group(s). Later this semester,
we'll use other statistical tests, like chi-square (χ²) (for ratios and other numerical patterns) and correlation (for
variables which change together, like diet and heart disease or biodiversity and productivity).

During the semester, we will use several different methods; you will be expected to learn to judge when to
use which statistic. You will be expected to develop your own practical understanding of what "P" means, and you

will be asked to decide whether to reject many hypotheses. You will also have to interpret the statistics used by
professional ecologists in their publications.

In addition to the three main statistics (P, DF, and χ² or t or F or whatever) scientists use to test their null
hypotheses, we use many descriptive statistics to report our data. Additional types of statistics for a sample often
reported are the range, the variance [s² for a sample, σ² for a population], the standard deviation [s or S.D. or
STDEV], and the standard error of the mean [S.E., written s with a subscript x-bar], which is smaller than the
standard deviation (S.E. = s/√n), markedly so when the sample size is large. These all describe the "scatter" or
variation among different sub-samples. Very often data are reported as a "mean ± S.E." or a "mean ± S.D." [The
mean of an entire statistical population (see Lab 1a) is μ or MU; the mean of a sample from this population is
"x-bar": x with a line over it.] If one mean and its standard deviation
overlap another mean with its S.D., they're probably NOT significantly different. Most statistical tests work by
estimating the variation; when the variation is very high, the P rises. Often scientists report confidence limits (like
95% C.L.), an estimate of the probable range of possible values of the true difference if there is a statistically
"significant difference." Read your textbook pages 14-17 to make sure you understand variation, sample size,
standard errors, confidence limits, and sampling bias.
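As a sketch of how these descriptive statistics relate to one another (Python here; the script is an illustration added to this text, not part of the handout), the following computes the mean, standard deviation, standard error, and approximate 95% confidence limits for the green-critter counts from the earlier table:

```python
import math
from statistics import mean, stdev

# Green critter counts per slide from the earlier table.
sample = [0, 10, 3, 2, 4, 0, 1, 1, 2, 6]

n = len(sample)
xbar = mean(sample)      # sample mean (x-bar); the population mean is mu
s = stdev(sample)        # sample standard deviation (s)
se = s / math.sqrt(n)    # standard error of the mean; smaller than s for n > 1

# Approximate 95% confidence limits: mean +/- t * S.E.,
# using the two-tailed t value for df = 9 at P = .05 (from a t table).
T_05_DF9 = 2.262
lower, upper = xbar - T_05_DF9 * se, xbar + T_05_DF9 * se

print(f"mean = {xbar:.2f}, S.D. = {s:.2f}, S.E. = {se:.2f}")
print(f"95% C.L.: {lower:.2f} to {upper:.2f}")
```

Note how the standard error (about 0.98 here) is much smaller than the standard deviation (about 3.11): the S.E. describes the uncertainty in the mean itself, not the scatter of individual slides.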

When you read professional research reports, be on the lookout for statistical terms and symbols like * and ** and P
and Ho and C.L. and SE. Finally, it's important to remember that terms like significant and correlated have precise
statistical meanings which are much narrower than their "lay" uses.

If you need more help, consult a professor or a statistics textbook. Here are some virtual references:
1. For explanations and examples with animated demos: http://www.ruf.rice.edu/~lane/stat_sim/
2. For explanations and really good links: http://davidmlane.com/hyperstat/hypothesis_testing_se.html
3. For textbook details: http://www.statsoft.com/textbook/stathome.html
4. For an explanation of the T-test: http://trochim.human.cornell.edu/kb/stat_t.htm

THIS WEEK'S LAB IS FOR ANALYZING AND INTERPRETING YOUR DATA FROM LAST WEEK'S
LAB. We will use the t-test this week. The computer will do all the calculations, but you will do all the real
thinking and interpretation and judgement. An example of how to use the EXCEL spreadsheet for the critter
data on the previous page is at www.queens.edu/faculty/jannr/ecology/TtestExample.xls


Here is a simple hypothesis: "Team 1's estimates of coverage are higher than Team 2's estimates."
Our "If, then....." logic is "if Team 1's estimates of coverage are higher than Team 2's estimates, then their mean
quadrat estimates will be higher than the mean from Team 2's quadrat estimates." This "if..., then...." statement
summarizes our experimental procedure, and it suggests a null hypothesis for statistical testing:
"The average estimate of coverage per quadrat by team 1 is the same as the average estimate per quadrat by team 2."
Or, in statistical language,
Ho: 1 - 2 = 0 [or 1 = 2].
[= The mean number for coverage in all possible quadrats by team 1 equals the mean number for coverage in all
possible quadrats by team 2. There is no statistically significant difference in the estimates made by team 1 and team

The actual statistical test of this null hypothesis is            t = (x1 - x2) / (sx - x ). ......., but we will use EXCEL
spreadsheet to determine "P." Remember that the "P" represents the probability that you will be wrong if you reject
the Ho; most scientists will reject the Ho when P.05.
This statistical test, like all statistical tests, is a test of the Ho (the hypothesis that there is no difference....). The
alternative hypothesis is one which you "accept" if your statistical test leads you to reject Ho. The HA often is very
much like your "real" biological hypothesis; for example, "this pattern is different from a 9:3:3:1 ratio." The HA
should be stated before you do the experiment; for example, μ1 > μ2. Technical details only for those who really
want to know: If you predict in advance (a priori) that one mean will be greater [μ1 > μ2], then you can use a
"one-tailed" statistical analysis of your data. A one-tailed test gives you a better chance to find a significant
difference. [This practice is often abused--another way to lie with statistics.]

What you should do is examine the raw data sheets to see if you can find an interesting pattern. You want to
compare the variation between two groups, like quadrats vs. line-intercept sampling or team 1 vs. team 2. Just pick
something, but no more than two simple groups. You can copy the numbers you need to two columns on your own
spreadsheet during the lab period. Then you will get help on how to make EXCEL help you analyze the data and
maybe even make graphs for your report.

sample report:
Variation in Estimates of Vegetation Coverage
Whoever U. Are
September 25, 2000

INTRODUCTION. Here you should state your "real" biological hypothesis and probably some background material
about why anyone would be interested in finding out about the accuracy (?) of sampling methods, or whatever.
(Textbook chapter 1 could be a useful resource.)

PROCEDURE. Here you should tell how you did the experiment in a way that another scientist could repeat it
exactly. You should describe [past tense, active voice, indicative mood] how you (or whoever) collected the
samples and gathered your data and which statistical test (one- or two-tailed t-test) and computer program
(Microsoft Excel in this case) you used for analyzing the data. Specify the methods for estimating the coverage.
Assume that you are writing for other professional ecologists who do not need to be told how you calculated the
statistical values or what they mean. (However, since actually you are a mere student, attach your raw data or any

calculations or computer print-outs as an appendix so that I may point out your errors.)

RESULTS. The best thing is to have a summary of the results in a sentence which refers to Table 1 or Figure 1
(whichever has the details of the results). Don't forget titles and labels. Be sure to report all appropriate statistics in
the text or at least say in the text which table, end-note, or whatever contains the results of the statistical analysis.
For this report, be sure to include the null hypothesis, means, standard deviations, and P. Attach raw data in an
appendix only. Don't screw up the flow of ideas with unimportant detail.

CONCLUSION. Most importantly, state whether you reject your null hypothesis and restate this conclusion in
terms of your "real" hypothesis. (For example, "I cannot reject the null hypothesis that the .... I conclude that the
offspring did result from a dihybrid cross.") Discuss some biological explanations for the results you found.

REFERENCES AND NOTES. List all bibliographic citations, technical specifications, acknowledgements,
footnotes, miscellany, and anything you don't want cluttering up the flowing prose of your text. Use an "end-note"
style with each numbered item corresponding to a subscripted or parenthetical numeral in your text.

For help with graphs, check

GRADING. Report grades will be weighted approximately as follows:
1. Logic and clarity, relationship of hypothesis to experimental results. The “if-then” summary. 50%
2. Style: proper format, grammar, spelling, clarity, like English composition grades. Good graphing. 25%
3. Other evidence of understanding. 15%
4. Whether you did the experiment right or at least noticed that you didn’t. 10%
5. Whether your results were what they should have been. less than 1%

POLICY STATEMENT ON LAB REPORTS AND THE HONOR CODE
1. DO NOT fudge or invent results.
2. If you did not personally collect all the data yourself or if somebody helped you analyze or interpret your data,
give proper credit to your associates by name.
3. Similarly, if you did not personally invent a procedure, a statistical test or table, a computer program, an idea, a
concept, a definition, or a conclusion, cite the source in a proper bibliographic end-note (or other approved citation
style).
4. Do not ask any living humans [computers are o.k.] to check or proof-read your report for errors of spelling,
grammar, style, form, or content. However, it is o.k. to get somebody to help you with statistical tests or computer
programs needed to analyze your data.
