ESM 206A
Statistics and Data Analysis for Environmental Science & Management
Winter 2010
NOTE: ESM 206 is a single 4-unit course spread across two quarters: five weeks/two units in
each of winter and spring. You will receive “IP” grades in winter. You need to complete all ten
weeks of the course to receive credit; the final grade you receive in spring will retroactively
apply to each of 206A and 206B.
Course objectives – ESM 206A
Students will learn how to use quantitative analysis of data to
1. Make decisions regarding compliance with environmental standards
2. Make predictions about the likely outcome of proposed policy or management actions
3. Assess the impact of past management actions or development projects
Students will learn to use the following tools:
1. Simple hypothesis testing, including t-tests
2. Ordinary least squares regression
3. Analysis of variance (ANOVA)
Prerequisites
I expect you to have a working knowledge of the material covered in most undergraduate
statistics courses and reviewed in the Fall statistics workshop:
Populations vs. samples
Population parameters vs. sample statistics
Descriptive statistics (mean, median, variance, range, covariance, correlation)
Basic plots of data (histograms, scatterplots, boxplots)
Basics of probability (random experiment, sample space, event, ways of combining
events, how to calculate joint probabilities of events, permutations, combinations,
conditional probability)
Random variables (continuous vs. discrete vs. categorical; probability density function
[PDF]; cumulative distribution function [CDF]; expected value)
Meaning and properties of the normal distribution
Meaning and properties of the binomial distribution
Meaning and properties of the t distribution
The basic process of hypothesis testing
How to calculate and interpret confidence intervals
How to compare means using a t test
If you did not attend the full workshop (or if you did but the above material looks unfamiliar),
please review the material posted at R:\Fall2009\stats_review, and for any concepts that you do
not feel thoroughly comfortable with, review the relevant sections of your undergraduate stats
text and/or the introductory chapters of the course texts (see below).
Logistics
Instructors
Bruce Kendall
  Office: Bren 4514
  Phone: x7539
  Email: kendall@bren
  Office hours: Winter weeks 1-6: Mon 11-12, Tues 3-4, Wed 4-5, or by appointment*
Shannon Hanna
  Office: Bren 2045
  Email: skhanna@umail.ucsb.edu
  Office hours: Mon 10-11, Fri 1:30-2:30
* To make an appointment, find an open time on my Corporate Time schedule, add a meeting, and send me an email
so I’m aware of it. Note that if you schedule something for the immediate future, I may not find out about it in time.
Class format
Lectures meet twice per week:
Winter quarter: MW 2:00-3:15, Weeks 1-5. No class Mon Jan. 18 (MLK holiday); instead
we meet Tues. Jan. 19, 12:30-1:45
In addition to pontificating from the front of the room, I will ask you questions, so come
prepared to think!
Labs meet once per week, in the SCF (Wednesday) or GIS lab (Thursday). There are three
sections:
Winter quarter: W 10:30-12:20; R 1:30-3:20; R 3:30-5:20, weeks 1-5
These give you the opportunity to learn the nuts and bolts of running the analyses and to
discuss concepts in a more interactive setting.
The lab sections are quite full; if you need to switch sections on a continuing basis, please find
someone to swap with.
Assignments and grading
Learning to actually do statistical analysis requires practice, so you will be doing biweekly
homework assignments, due Fridays at 5 PM in weeks 2, 4, and 6 (note: this may change if labs
start on week 2 instead of week 1). These homeworks will involve both conceptual questions
and quantitative analyses. Similar problems will be worked and discussed in lab.
There will be a total of six homeworks: three each in winter and spring. Your grade will be
based on your best five scores.
You may work on the homeworks in groups of two.
Readings
The primary text for this course is Statistical Methods in Water Resources by Helsel and Hirsch.
This will be available online, with a link on the ESM 206A GauchoSpace page. Unlike most
commercial textbooks (which are also mostly very expensive), it is pitched at a useful level: it
has a higher information density than most undergraduate texts but is not as overwhelmingly
technical as most graduate texts. I have also provided links to some other online textbooks.
These will only occasionally have assigned reading, but they can provide a different perspective,
and their earlier chapters offer a background refresher (Dekker et al. provides a particularly
detailed treatment of the background material listed in the prerequisites section above). These
are available through a UCSB subscription, so they can only be accessed from the ucsb domain, but
pdfs of individual chapters are downloadable.
I will provide additional links to useful web resources, as well as supplemental readings to
address topics not included in the text.
Software
You can do a lot of basic statistics in Excel. However, working with larger datasets or complex
models can be awkward, some of the techniques we will be using aren’t available, and sometimes
the answers it gives are wrong (unfortunately it’s hard to predict when that will happen)!
There are many commercial statistics programs that are both comprehensive and robust. These
include JMP and SAS (favored by biologists), STATA (favored by economists), SPSS (favored
by many other social scientists), and S-PLUS (formerly favored by statisticians). Unfortunately
these are all expensive, and, while all have some sort of GUI, they are not exactly intuitive to use.
Thus, I will not force you to learn a difficult software package that you may never use again.
The third option is the open-source program R (insert pirate joke here). This is robust and
comprehensive, and is the standard for professional statisticians. It is a command-line program,
and learning the syntax and remembering all the commands can be challenging. However,
once you have learned it, you can use it wherever you are: it is freely downloadable and runs on
Windows, Mac, and Linux computers. Also, you can save your analysis as a script file, so that the
analysis is repeatable (generally not possible in Excel!). I have provided links to a number of
online texts that describe how to use the software.
There are also many online tools available. These typically perform just one procedure (or a
family of procedures), and of course you have no way of knowing whether the programmer has
created them in a way that is correct and robust. But if you need to do something quickly and
don’t have access to anything else, they can be handy. See a good list at http://statpages.org/
Topics covered during lecture in 206A & 206B
Data and data management
Using hypothesis testing
Statistical decision making: power, confidence intervals, and acceptance regions
Projecting outcomes: OLS regression
More on regression
Impact assessment: BACI, ANOVA, paired t-tests
Analyzing discrete dependent variables
Analyzing data when the residuals are not normally distributed
Survey design
Effectively reading literature
Bayesian Decision Theory
Multi-criteria Decision Analysis
Spatial statistics