ESM 206 Data Analysis for Environmental Science _ Management
ESM 206A
Data Analysis for Environmental
Science & Management
Winter 2009
Hunter Lenihan and Nick Parker
Course Objectives
• Learn how to use quantitative data analysis to:
– Make decisions regarding compliance with environmental standards or performance of new management strategies
– Assess the impact of past management actions or development projects
– Make predictions about the likely outcome of proposed policy or management actions
• In ESM 206A, learn to use the following tools
– hypothesis formulation and testing,
– t-tests, Analysis of Variance (ANOVA)
– Ordinary Least Squares regression
• Single variable
• Multiple variable
– Regression with discrete dependent variables (time permitting)
• Probit and Logit
Instructors
Name            Office     Phone  email         Office hours
Hunter Lenihan  Bren 3428  x8629  Leniha@bren   M 1-2, F 12-2, or by appointment*
Nick Parker     Bren 4508  x8574  dparker@bren  TBA
* To make an appointment, find an open time on my Corporate Time schedule,
add a meeting, and send me an email so I’m aware of it. Note that if you
schedule something for the immediate future, I may not find out about it in time.
Class format
• Lectures meet twice per week
• Winter quarter:
– T-TR 2:00-3:15
– Feb. 10,12,17,19, 24, 26
– Mar. 3, 6, 10, 12
• Labs meet once per week, in the GIS lab.
• There are 3 sections in Winter quarter:
- W 2:00-2:50, W 4:30-5:20, TR 8:30-9:20
- Weeks 7-10 (No lab the first week!)
Class format
• Labs
• These provide you the opportunity to learn the nuts and
bolts of running the analyses
• Will work on problem sets in lab
• The lab sections are usually quite full; if you need to
switch sections on a continuing basis, please find
someone to swap with.
Micro-exam
• A relatively short assignment that you will turn in
for a grade.
• Typically one extended problem that involves
both conceptual and technical aspects
• Treat as a take-home exam: once you open it,
no help from peers or instructors (but can use
notes, books, online resources)
• One micro-exam this quarter – due 4 PM,
Friday, 12 March
Text
• Statistical Methods in Water Resources by
Helsel and Hirsch
– PDF available on course webpage
– Individual chapters (smaller files) at
http://pubs.usgs.gov/twri/twri4a3/html/pdf_new.html
• Other readings/links will be posted as
appropriate
Computing
• Excel can be used for simple analysis, using the
Analysis ToolPak
• We will mostly use JMP
– JMP is comprehensive and reliable, but expensive (but UCSB pays!)
– Based on SAS, the world's most powerful stat program/system
– JMP provides a somewhat user-friendly interface to SAS
• The course webpage is at
http://www.bren.ucsb.edu/academics/course.asp?number=206A
– We will use this page for the whole year
Definition of statistics
• Mathematical science pertaining to the
collection, analysis, interpretation or
explanation, and presentation of data
• Provides tools for prediction and
forecasting based on data
• Applicable to all fields of science
Definition of statistics
• Descriptive statistics = methods used to
summarize or describe a collection of data
• Inferential statistics = patterns in data are
modeled in a way that accounts for
randomness and uncertainty in the
observations. Then used to draw
inferences about the process or population
being studied.
Definition of statistics
• Descriptive and inferential statistics = applied
statistics
– Three basic types
• Monte Carlo analysis: makes minimal assumptions about the data;
uses randomizations of the observed data as the basis for inference
• Parametric analysis: assumes the data were sampled from an
underlying distribution of known form (e.g., “normal”); estimates the
parameters of the distribution from the data; estimates probabilities
from observed frequencies of events and uses these probabilities as a
basis for inference (frequentist inference)
• Bayesian analysis: also assumes the data were sampled
from an underlying distribution of known form; estimates
parameters not only from data but also from prior knowledge, and
assigns probabilities to these parameters
Definition of statistics
• Mathematical statistics = concerned with
the theoretical basis of the subject
Applied statistics
• Common goal: investigate causality
• Draw conclusions on the effect of changes in the values
of predictors (independent variables) on dependent
variables (response)
• Y = α + βX
• There are two major types of investigations (studies):
experimental and observational
• Difference between the two types = how the study is
conducted. Each can be very effective.
Tools to learn
Experimental studies and observational studies use the same tools:
• Hypothesis formulation and testing
• t-tests
• Analysis of Variance (ANOVA)
• Ordinary Least Squares regression
• Multiple regression
• Probit and Logit regression
Some notation and formulas
• For a random variable called x, the sample statistics are:
– Mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
– Variance: $s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
– Standard deviation: $s_x = \sqrt{s_x^2}$
• The population statistics are called
– Mean: $\mu_x$
– Variance: $\sigma_x^2$
– Standard deviation: $\sigma_x$
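As a minimal sketch of the sample statistics above, the following Python computes the mean, the variance (with the n - 1 denominator), and the standard deviation directly from their definitions. The function names and the trap counts are hypothetical, made up only for illustration; they are not course data.

```python
import math

def sample_mean(xs):
    """Sample mean: sum of the observations divided by n."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_sd(xs):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

# Hypothetical data (an assumption): number of lobsters in each of six traps.
lobsters_per_trap = [2, 4, 4, 5, 7, 8]

print("mean:", sample_mean(lobsters_per_trap))
print("variance:", sample_variance(lobsters_per_trap))
print("standard deviation:", sample_sd(lobsters_per_trap))
```

Python's standard `statistics` module provides the same quantities as `statistics.mean`, `statistics.variance`, and `statistics.stdev`; they are written out here only to mirror the formulas.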
Data for examples in lectures
CA spiny lobster
Panulirus interruptus
Data from traps used by lobster fishery
• Number of lobster per trap
• Length of lobsters in trap
www.calobtser.org
Matt Kay – Bren PhD student (Kay@lifesci.ucsb.edu)
Fishery
[Figure: Commercial Lobster Landings. Lobster in tonnes (0 to 600) by season, 1935-36 to 2005-06; series: Santa Barbara and Total California.]
Hypothesis formulation and
testing
Karl Raimund Popper (1902-1994)
• One of the most influential philosophers of science of the 20th century
• Professor at the London School of Economics
• Repudiated classical observationalist / inductivist
approach to science as the only method
• Advanced empirical falsification
• Vigorous defense of liberal democracy & principles of social
criticism
The Scientific Method - from a Popperian perspective
1. Conception - Inductive reasoning
a. Observations
b. Theory
c. Problem
d. Belief
2. Leads to Insight and a General Hypothesis
3. Assessment is done by
a. Formulating Specific hypotheses
b. Comparison with new observations
4. Which leads to:
a. Falsification - and rejection of insight, and specific and general hypotheses, or
b. Confirmation - and retesting of alternative hypotheses
[Diagram: Conception - Inductive reasoning (Perceived Problem, Previous Observations, Belief, Existing Theory) leads to INSIGHT, then General hypothesis, then Specific hypotheses (and predictions), then Comparison with new observations, ending in Confirmation or Falsification. The second half is labeled Assessment - Deductive reasoning.]
Absolute vs. measured differences
Example - Specific hypothesis – number of Oak seedlings is higher in
areas outside oil polluted (impacted) sites than inside polluted sites
      Observation 1:                  Observation 2:
      Number inside  Number outside   Number inside  Number outside
      0              10               3              10
      0              15               5              7
      0              18               2              9
      0              12               8              12
      0              13               7              8
Mean  0              13.6             5              9.2
What counts as a difference?
Are these different?
Statistical analysis - cause, probability, and effect
Specific hypothesis HA: Number of oak seedlings is greater in areas outside impact sites than inside impact sites
II) Statistical Methodology – General
A) Null hypotheses – Ho
1) In most sciences, we are faced with:
a) NOT whether something is true or false (Popperian decision)
b) BUT rather the degree to which an effect exists (if at all) - a statistical decision.
B) Therefore 2 competing statistical hypotheses are posed:
a) HA: there is a difference in effect between (usually posed as < or >)
b) HO: there is no difference in effect between
[Diagram: Conception - Inductive reasoning (Perceived Problem, Previous Observations, Belief, Existing Theory) leads to INSIGHT, then General hypothesis, then Specific hypotheses (and predictions), then Comparison with new observations. Confirmation: HA supported (accepted), HO rejected. Falsification: HA rejected, HO supported (accepted). The second half is labeled Assessment - Deductive reasoning.]
Statistical analysis - cause, probability, and effect
The logic of statistical tests – how they are performed
1) Assume the Null hypothesis (Ho) is true (e.g., no difference in number of
oak seedlings in impact and non-impact sites).
2) Compare measurements - generally this means comparing two sample
distributions (determined from the experiment or survey)
A) Comparison of distribution – generally by comparing means and the
estimate of error associated with the sampling of the means. Simplest
case is the Standard Error of the Mean (SE or SEM):
$s_{\bar{x}} = s_x / \sqrt{n}$ = standard deviation / square root of level of replication
3) Determine the probability that distributions are similar/different
4) Compare with a critical p-value to assign significance
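The four steps above can be sketched in Python using the Observation 2 oak seedling counts from the earlier example. This sketch swaps in an exact randomization test (the Monte Carlo style of inference mentioned earlier) rather than a parametric test, because it needs only the standard library; the function names and the critical value of 0.05 are assumptions made for illustration.

```python
import math
from itertools import combinations

def mean(xs):
    return sum(xs) / len(xs)

def sem(xs):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    m = mean(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))
    return sd / math.sqrt(len(xs))

def exact_randomization_p(inside, outside):
    """One-sided p-value for HA: mean(outside) > mean(inside).

    Step 1: assume Ho is true, so the inside/outside labels are arbitrary
    and every relabeling of the pooled data is equally likely.
    Steps 2-3: count the relabelings whose difference in means is at least
    as large as the observed difference.
    """
    pooled = inside + outside
    total = sum(pooled)
    n_in, n_out = len(inside), len(outside)
    observed = mean(outside) - mean(inside)
    as_extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_out):
        out_sum = sum(pooled[i] for i in idx)
        diff = out_sum / n_out - (total - out_sum) / n_in
        n_splits += 1
        if diff >= observed - 1e-9:  # small tolerance for floating point
            as_extreme += 1
    return as_extreme / n_splits

# Observation 2 from the oak seedling example.
inside = [3, 5, 2, 8, 7]
outside = [10, 7, 9, 12, 8]

p_value = exact_randomization_p(inside, outside)
alpha = 0.05  # assumed critical p-value

print("SEM inside:", sem(inside), "SEM outside:", sem(outside))
print("one-sided p-value:", p_value)
# Step 4: compare with the critical p-value.
print("reject Ho" if p_value < alpha else "fail to reject Ho")
```

With five sites per group there are only 252 possible relabelings, so the test enumerates all of them instead of drawing random permutations.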
What is a distribution (around a mean)?
Calculation of statistical distribution
Distribution of Oak Seedlings - pre-impact or non-polluted site
[Figure legend: VALUES, scale 0 to 50]
Sites = 100
Mean per site = 25
Total seedlings = 2500
Evaluate the effect of sample size on the calculation of,
and confidence in, the mean
Compare sample sizes of 5, 10, 20, 50, and 99 cells
Iterate 50 times
Example: sample 10 sites to determine the mean, x fifty iterations
[Figure: two example samples of 10 sites each, with sample means X = 21.5 and X = 25.8; True Mean = 25. Histogram of the resulting sample means (21.5, 22.3, 23.0, 23.9, 24.9, 25.1, 25.8, 26.5, 27.8, 29.9, etc.); axes: Estimate of Mean, Number of cases.]
Effect of number of observations on estimate of Mean
[Figure: Frequency distributions of sample means, for means based on 5, 10, 20, 50, and 99 observations. Axes: Estimate of Mean (15 to 35), Number of cases, Proportion per Bar. 2.5% of the sample means fall in each tail beyond 2 SE.]
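The resampling exercise described above (sample sizes of 5, 10, 20, 50, and 99, each iterated 50 times) can be sketched as follows. The population here is a hypothetical stand-in, constructed only so that it matches the stated summary (100 sites, mean 25, total 2500, values 0 to 50); the actual site values from the slide are not recoverable. The seed and names are likewise assumptions.

```python
import random
import statistics

# Hypothetical population (an assumption): 100 sites, counts 0-50,
# total 2500 seedlings, mean 25 per site.
population = [(i % 50) + (1 if i >= 50 else 0) for i in range(100)]

def sample_means(pop, n, iterations, rng):
    """Draw `iterations` samples of n sites (without replacement)
    and return the mean of each sample."""
    return [statistics.mean(rng.sample(pop, n)) for _ in range(iterations)]

rng = random.Random(206)  # fixed seed so the run is repeatable
spread = {}
for n in (5, 10, 20, 50, 99):
    means = sample_means(population, n, 50, rng)
    # The standard deviation of the sample means estimates the standard error.
    spread[n] = statistics.stdev(means)
    print(f"n={n:2d}  mean of means={statistics.mean(means):6.2f}  "
          f"spread of means={spread[n]:5.2f}")
```

The spread of the 50 sample means should shrink as the sample size grows, which is the narrowing of the frequency distributions that the figure illustrates.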
Statistical comparison of two sample distributions
Ho: $\bar{X}_1 = \bar{X}_2$
HA: $\bar{X}_1 \neq \bar{X}_2$
[Figure: two sample distributions with means $\bar{X}_1$ and $\bar{X}_2$]
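One common way to compare two sample means is a t statistic: the difference in means divided by the standard error of that difference. The sketch below uses the unequal-variance (Welch) form, which is an assumption here and not necessarily the form used later in the course, and applies it to the Observation 2 oak seedling counts. Turning the statistic into a p-value requires the t distribution, which a statistics package supplies; that step is omitted.

```python
import math
import statistics

def welch_t(sample1, sample2):
    """t statistic for the difference in means, without assuming equal variances."""
    se1_sq = statistics.variance(sample1) / len(sample1)  # squared SEM of sample 1
    se2_sq = statistics.variance(sample2) / len(sample2)  # squared SEM of sample 2
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    return diff / math.sqrt(se1_sq + se2_sq)

# Observation 2 from the oak seedling example.
outside = [10, 7, 9, 12, 8]
inside = [3, 5, 2, 8, 7]

t_stat = welch_t(outside, inside)
print("t statistic:", round(t_stat, 2))
```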
How to estimate optimal sample size
1) Do a preliminary study of variables that will be evaluated in project
2) Plot the mean and some estimate of variability of data as a function of sample size
3) Look for “sweet spot” where estimates of mean and variance (or standard deviation) converge on a stable value
4) Calculate a “bang for buck” relationship to determine if a robust design (sufficient sampling) can be paid for
[Figure: Mean and Standard Deviation plotted against number of SAMPLES (0 to 100); y-axis: Value (0 to 30).]
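Steps 2 and 3 can be sketched as a running calculation: recompute the mean and standard deviation each time one more observation is added, then look for where both settle down. The pilot data below are hypothetical random counts (an assumption standing in for a real preliminary study), and the function name is made up for illustration.

```python
import random
import statistics

def running_estimates(observations):
    """Return (n, mean, sd) computed from the first n observations,
    for n = 2 up to the full sample."""
    rows = []
    for n in range(2, len(observations) + 1):
        subset = observations[:n]
        rows.append((n, statistics.mean(subset), statistics.stdev(subset)))
    return rows

# Hypothetical preliminary study (an assumption): 100 site counts between 0 and 50.
rng = random.Random(1)  # fixed seed so the run is repeatable
pilot = [rng.randint(0, 50) for _ in range(100)]

rows = running_estimates(pilot)
# Print every tenth sample size; these are the values one would plot.
for n, m, sd in rows[8::10]:
    print(f"n={n:3d}  mean={m:6.2f}  sd={sd:6.2f}")
```

Plotting the mean and sd columns against n gives the kind of curve in the figure; the "sweet spot" is the sample size beyond which neither estimate changes much.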
Trade off between accuracy and cost
[Figure: Cost, Accuracy, and “Bang per buck” plotted against Sample Size (e.g. number of replicate sites).]