# Statistics

```					    Statistics
Chemistry
Tuesday January 3, 2006
Statistics
   Statistics are a way to test that any differences
in your data are a result of the variables you are
testing and not a result of random chance.
   There are different statistical tests that you can
run. The test you need depends on the type of
data that you collected. See “What Statistics Test Do I Use?”
to find which statistical test you will need to run on
your data.
What does everything mean?
   Null hypothesis: This is a statement that is the antithesis (opposite) of
the hypothesis you are testing in your experiment.
   Confidence level: This is the amount of certainty that you require in
your experiment. For our purposes this should be set at 95%, which leaves a
5% (0.05) significance level. In other words, you accept at most a 5% chance
of concluding that the variable you tested caused a difference when the
difference was really due to random errors.
   The results of your statistical test will generate either a p-value or a t-value (others
are possible but these are the 2 most common). Your p-value should be
compared to the amount of error (significance level) you set at the beginning of the experiment. For
our purposes this means 5% or 0.05. The t-value should be compared to a
critical value (looked up on a table of values).
   If the p-value is <= 0.05, the null hypothesis is rejected (the differences in the
data are unlikely to be due to random chance alone).
   The absolute value of the calculated t-value must be larger than the critical value for there to be a
significant difference between the data sets.

   In your appendix you can list the raw data, and you should include a
box-and-whisker plot.
Sample Statistics Section
   Your statistics discussion should go in the results section
   …A one-way analysis of variance test was used to determine the
significance of the differences in measured absorbance among the
cobalt(III) ion concentrations. Statistical analyses were calculated using
StatView™ SE+ (Abacus Concepts, Inc., Berkeley, CA)
statistical software. The ANOVA test returned a p-value = 0.023,
indicating that there was a significant difference between the data.
Since the p-value was less than 0.05, the null hypothesis,
cobalt(III) ion concentrations do not affect absorbance, can be
rejected. Bonferroni paired t-tests were performed to determine
which cobalt(III) ion concentrations produced significantly
different results. The Bonferroni paired t-tests showed that
there was a significant difference between the 0.1 M and 0.001 M
solutions only.
What Statistics Test Do I Use?
| Data comparisons you are making | When your data are normally distributed | When your data are not normally distributed, or are ranks or scores | When your data possess 2 possible values |
|---|---|---|---|
| You are studying one set of data | Find the mean, standard deviation | Find the median, interquartile range (Q3-Q1) | Calculate a proportion |
| Compare one set of data to a hypothetical value | Run a one-sample t-test | Run a Wilcoxon Test | Run a χ² (chi-square) test |
| Compare 2 sets of independently-collected data | Run a 2-sample t-test | Run a Mann-Whitney Test | Run a Fisher test, or a χ² (chi-square) test |
| Compare 2 sets of data from the same subjects under different circumstances | Run a t-test on the differences between the data values (a matched-pairs t-test) | Run a Wilcoxon Test | Run a McNemar's test |
| Compare 3 or more sets of data | Run a one-way ANOVA test | Run a Kruskal-Wallis test | Run a chi-square test |
| Look for a relationship between 2 variables | Calculate the Pearson Correlation coefficient | Calculate the Spearman Correlation coefficient | Calculate Contingency Correlation coefficients |
| Look for a linear relationship between 2 variables | Run a linear regression | Run a nonparametric linear regression | Run a simple logistic regression |
| Look for a non-linear relationship between 2 variables | Run a power, exponential, or quadratic regression | Run a nonparametric power, exponential, or quadratic regression | |
| Look for linear relationships between 1 dependent variable and 2 or more independent variables | Run a multiple linear regression | | Run a multiple logistic regression |
t-test
You read of a survey that claims that the average teenager
watches 25 hours of TV a week and you want to check
whether or not this is true in your school (too simple a
project!).
   Predicted value of the variable: the predicted 25 hours of
TV
   Variable under study: actual hours of TV watched
   Statistical test you would use: t-test
   Use this test to compare the mean values (averages) of
one set of data to a predicted mean value.
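   A worked sketch of the calculation, using made-up numbers (the sample
size, mean, and standard deviation below are hypothetical, chosen only
to illustrate the arithmetic):
      t = (sample mean - predicted mean) / (standard deviation / sqrt(n))
      n = 16 students, sample mean = 22 hours, standard deviation = 8 hours
      t = (22 - 25) / (8 / sqrt(16)) = -3 / 2 = -1.5
   With n - 1 = 15 degrees of freedom, the two-tailed critical value at the
0.05 level is 2.131. Since |-1.5| = 1.5 is smaller than 2.131, these
hypothetical data would not show a significant difference from 25 hours.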
2 Sample t-test
You grow 20 radish plants in pH=10.0 water and 20 plants in pH=3.0
water and measure the final mass of the leaves of the plants (too
simple an experiment!) to see if they grew better in one fluid than
in the other fluid.
   Independent variable: pH of the fluid in which the plants were
grown
   Dependent variable: plant biomass
   Statistical test you would use: 2-sample t-test
   Use this test to compare the mean values (averages) of two sets
of data.
   A Mann-Whitney test is the counterpart of the 2-sample t-test for data that
are given rank numbers, rather than quantitative values. For
example, you want to compare the overall letter-grade GPA of
students in one class with the overall letter-grade GPA of students in
another class. You rank the data from low to high according to the
letter grade (here, A = 1, B = 2, C = 3, D = 4, E = 5 might be your
rankings; you could also have set A = 5, B = 4, ...).
Matched Pairs t-test
You give a math test to a group of students. Afterwards
you tell half of the students a method of analyzing the
problems, then re-test all the students to see if use of
the method led to improved test scores.
   Independent variable: test-taking method (your method
vs. no imparted method)
   Dependent variable: (test scores after method - test
scores before method)
   Statistical test you would use: matched-pairs t-test
   Use this test to compare data from the same subjects
under two different conditions.
ANalysis Of VAriance
You grow radish plants given pesticide-free water every other day, radish plants given a
5% pesticide solution every other day, and radish plants given a 10% pesticide
solution every other day, then measure the biomass of the plants after 30 days to
find whether there was any difference in plant growth among the three groups of
plants.
   Independent variable: pesticide dilution
   Dependent variable: plant biomass
   Statistical test you would use: ANOVA
   Use this test to compare the mean values (averages) of more than two sets of
data where there is one independent variable with three or more levels (groups) but only one
dependent variable. If you find that your data differ significantly, this says only
that at least two of the data sets differ from one another, not that all of your tested
data sets differ from one another.
   If your ANOVA test indicates that there is a statistical difference in your
data, you should also run Bonferroni paired t-tests to see which groups
produce significantly different results. This test essentially penalizes you
more and more as you add more and more groups, making it more
difficult to reject the null hypothesis than if you had tested fewer
groups.
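   A rough illustration of this penalty, assuming the simple Bonferroni
correction (divide the 0.05 significance level by the number of paired
comparisons; your software may apply the correction differently):
      number of comparisons among k groups = k(k - 1) / 2
      3 groups: 3 comparisons, so each needs p <= 0.05 / 3 = 0.0167 (approx.)
      4 groups: 6 comparisons, so each needs p <= 0.05 / 6 = 0.0083 (approx.)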
   One assumption in the ANOVA test is that your data are normally distributed (plot as
a bell curve, approximately). If this is not true, you must use the Kruskal-Wallis
test below.
Kruskal-Wallis Test
You ask children, teens, and adults to rate their response to a set of statements,
where 1 = strongly agree with the statement, 2 = agree with the statement, 3
= no opinion, 4 = disagree with the statement, 5 = strongly disagree with the
statement, and you want to see if the answers are dependent on the age group
of the tested subjects.
   Independent variable: age group of subject
   Dependent variable: responses of members of those age groups to your
statements
   Statistical test you would use: Kruskal-Wallis Test. Use this test to compare
more than two sets of data where the data
are chosen from some limited set of values or if your data otherwise don't form
a normal (bell-curve) distribution. The test works on the ranks of the data
rather than on the mean values. This example could also be done using a
two-way chi-square test.
An example of the Kruskal-Wallis Test for non-normal data is: You compare scores
of students on Math and English tests under different circumstances: no music
playing, Mozart playing, rock music playing. When you score the tests, you
find in at least one case that the average score is a 95 and the data do not form
a bell-shaped curve because there are no scores above 100, many scores in the
90s, a few in the 80s, and fewer still in the 70s, for example.
   Independent variable: type of background music
   Dependent variable: score on the tests, with at least one set of scores not
normally distributed
Wilcoxon Signed Rank Test
You think that student grades are dependent on the
number of hours a week students study. You collect
letter grades from students and the number of hours
each student studies in a week.
   Independent variable: hours studied
   Dependent variable: letter grade in a specific class
   Statistical test you would use: Wilcoxon Signed Rank
Test. Use this test to compare the typical values
(medians) of two sets of data from the same subjects, or the typical value of
one data set to a hypothetical value, where the data are
ranked from low to high (here, A = 1, B = 2, C = 3, D =
4, E = 5 might be your rankings; you could also have set
A = 5, B = 4, ...).
Chi-Square Test
You ask subjects to rate their response to a set of statements that are provided with a
set of possible responses such as: strongly agree with the statement, agree with the
statement, no opinion, disagree with the statement, strongly disagree with the
statement.
   Independent variable: each statement asked
   Dependent variable: response to each statement
   Statistical test you would use: χ² (chi-square) test (the 'chi' is pronounced like the
'chi' in 'chiropractor') for within-age-group variations.
   For this test, typically, you assume that all choices are equally likely and test to find
whether this assumption was true. You would assume that, for 50 subjects tested, 10
chose each of the five options listed in the example above. In this case, your
observed values (O) would be the number of subjects who chose each response, and
your expected values (E) would be 10.
   The chi-square statistic is the sum, over all response categories, of:
      χ² = Σ (O - E)² / E
where O is the observed value and E is the expected value for each category.
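   A worked sketch with made-up counts (the observed values below are
hypothetical, for 50 subjects and five responses with E = 10 each):
      O = 14, 12, 10, 8, 6
      (O - E)^2 / E = 1.6, 0.4, 0.0, 0.4, 1.6
      chi-square = 1.6 + 0.4 + 0.0 + 0.4 + 1.6 = 4.0
   With 5 - 1 = 4 degrees of freedom, the critical value at the 0.05 level
is 9.488. Since 4.0 is smaller than 9.488, these hypothetical responses
would not differ significantly from the equally-likely assumption.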
   Use this test when your data consist of a limited number of possible values that your
data can have. Example 2: you ask subjects which toy they like best from a group
of toys that are identical except that they come in several different colors.
Independent variable: toy color; dependent variable: toy choice.
   McNemar's test is used when you are comparing some aspect of the subject with
that subject's response (e.g., answer to the survey compared to whether or not the
student went to a particular middle school). McNemar's test is basically the same as a
chi-square test in calculation and interpretation.
Correlation Test
You look for a relationship between the size of a letter that
a subject can read at a distance of 5 meters and the
score that the subject achieves in a game of darts
(having had them write down their experience previously
at playing darts).
   Independent variable #1: vision-test result (letter size)
   Independent variable #2: darts score
   Statistical test you would use: Correlation (statistics:
r² and r). The closer r² is to 1 (that is, the closer r is to +1 or -1),
the stronger the correlation.
   Use this statistic to identify whether changes in one
independent variable are matched by changes in a
second independent variable. Notice that you didn't
change any conditions of the test, you only made two
separate sets of measurements.
Linear Regression
You load weights on four different lengths of the same type
and cross-sectional area of wood to see if the maximum
weight a piece of the wood can hold is directly
dependent on the length of the wood.
   Independent variable: length of wood
   Dependent variable: weight that causes the wood to
break
   Statistical test you would use: Linear regression
(statistics: r² and r). The closer r² is to 1, the
stronger the correlation.
   Fit a line to data having only one independent variable
and one dependent variable.
Multiple Linear Regression
You load weights on four different lengths and four different
thicknesses of the same type of wood to see if the maximum weight
a piece of the wood can hold is directly dependent on the length
and thickness of the wood, and to find which is more important,
length or thickness.
   Independent variables: length of wood, thickness of wood
   Dependent variable: weight that causes the wood to break
   Statistical test you would use: Multiple Linear regression
(statistics: r² and r). The closer r² is to 1, the stronger the
correlation.
   Fit a linear model to data having two or more independent variables and one
dependent variable.
Power Regression
You load weights on strips of plastic trash bags to find how much the
plastic stretches from each weight. Research that you do indicates
that plastics stretch more and more as the weight placed on them
increases; therefore the data do not plot along a straight line.
   Independent variable: weight loaded on the plastic strip
   Dependent variable: length of the plastic strip
   Statistical test you would use: Power regression of the form y =
ax^b, or Exponential regression of the form y = ab^x, or
Quadratic regression of the form y = a + bx + cx^2
(statistics: r² and r)
   Fit a curve to data having only one independent variable and one
dependent variable.
   There are numerous polynomial regressions of this form, found on