ESM 206A
Data Analysis for Environmental Science & Management

Winter 2009
Hunter Lenihan and Nick Parker
             Course Objectives
• Learn how to use quantitative data analysis to:
   – Make decisions regarding compliance with environmental standards or
     performance of new management strategies
   – Assess the impact of past management actions or development projects
   – Make predictions about the likely outcome of proposed policy or
     management actions

• In ESM 206A, learn to use the following tools:
   – hypothesis formulation and testing
   – t-tests, Analysis of Variance (ANOVA)
   – Ordinary Least Squares regression
      • Single variable
      • Multiple variable
   – Regression with discrete dependent variables (time permitting)
      • Probit and Logit
                             Instructors

                 Office      Phone   Email          Office hours
Hunter Lenihan   Bren 3428   x8629   Leniha@bren    M 1-2, F 12-2, or by appointment*
Nick Parker      Bren 4508   x8574   dparker@bren   TBA

 * To make an appointment, find an open time on my Corporate Time schedule,
 add a meeting, and send me an email so I’m aware of it. Note that if you
 schedule something for the immediate future, I may not find out about it in time.
                Class format
• Lectures meet twice per week

• Winter quarter:
  – T-TR 2:00-3:15
  – Feb. 10, 12, 17, 19, 24, 26
  – Mar. 3, 6, 10, 12

• Labs meet once per week, in the GIS lab.

• There are 3 sections in Winter quarter:
   – W 2:00-2:50, W 4:30-5:20, TR 8:30-9:20
   – Weeks 7-10 (No lab the first week!)
                   Class format
• Labs

• These provide you the opportunity to learn the nuts and
  bolts of running the analyses

• You will work on problem sets in lab

• The lab sections are usually quite full; if you need to
  switch sections on a continuing basis, please find
  someone to swap with.
                 Micro-exam
• A relatively short assignment that you will turn in
  for a grade.
• Typically one extended problem that involves
  both conceptual and technical aspects
• Treat as a take-home exam: once you open it,
  no help from peers or instructors (but you may use
  notes, books, and online resources)
• One micro-exam this quarter – due 4 PM,
  Friday, 12 March
                        Text
• Statistical Methods in Water Resources by
  Helsel and Hirsch
  – PDF available on course webpage
  – Individual chapters (smaller files) at
    http://pubs.usgs.gov/twri/twri4a3/html/pdf_new.html
• Other readings/links will be posted as
  appropriate
                     Computing
• Excel can be used for simple analyses, using the
  Analysis ToolPak

• We will mostly use JMP
   – JMP is comprehensive and reliable, but expensive (but UCSB pays!)
   – JMP is based on SAS, one of the world's most powerful statistics programs/systems
   – JMP provides a somewhat user-friendly interface to SAS

• The course webpage is at
  http://www.bren.ucsb.edu/academics/course.asp?number=206A
   – We will use this page for the whole year
        Definition of statistics
• Mathematical science pertaining to the
  collection, analysis, interpretation or
  explanation, and presentation of data

• Provides tools for prediction and
  forecasting based on data

• Applicable to all fields of science
        Definition of statistics
• Descriptive statistics = methods used to
  summarize or describe a collection of data

• Inferential statistics = patterns in data are
  modeled in a way that accounts for
  randomness and uncertainty in the
  observations; these models are then used to draw
  inferences about the process or population
  being studied.
          Definition of statistics
• Descriptive and inferential statistics = applied
  statistics
  – Three basic types
     • Monte Carlo analysis: minimal assumptions about data;
       uses randomizations of observed data as the basis for inference

     • Parametric analysis: Assumes data were sampled from an
       underlying distribution of known form (e.g., “normal”); Estimates the
       parameters of the distribution from the data; Estimates probabilities
       from observed frequencies of events and uses probabilities as a
       basis for inference (frequentist inference)

     • Bayesian analysis: Also assumes the data were sampled
       from an underlying distribution of known form. Estimates
       parameters not only from data but also from prior knowledge, and
       assigns probabilities to these parameters
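Below is a minimal sketch of the Monte Carlo (randomization) approach in Python. It uses the Observation 2 oak seedling counts from the "Absolute vs. measured differences" slide. With only 10 observations, every relabelling of the pooled data into two groups of five can be enumerated, so this is an exact randomization test rather than a random-sample approximation. With larger data sets, one would draw a few thousand random permutations instead. The function name is illustrative, not part of any course software.

```python
from itertools import combinations

# Oak seedling counts, Observation 2 from the "Absolute vs. measured differences" slide
inside = [3, 5, 2, 8, 7]
outside = [10, 7, 9, 12, 8]


def exact_randomization_test(group_a, group_b):
    """One-sided exact randomization test of H_A: mean(group_b) > mean(group_a).

    Every way of relabelling the pooled observations into two groups of the
    original sizes is enumerated; the p-value is the fraction of relabellings
    whose group_b mean is at least as large as the observed one.
    """
    pooled = group_a + group_b
    n_b = len(group_b)
    observed = sum(group_b) / n_b
    n_extreme = 0
    n_total = 0
    # Choosing positions (not values) keeps tied counts distinct, as they are in the data
    for idx in combinations(range(len(pooled)), n_b):
        mean_b = sum(pooled[i] for i in idx) / n_b
        n_total += 1
        if mean_b >= observed:
            n_extreme += 1
    return n_extreme / n_total


p = exact_randomization_test(inside, outside)
print(f"one-sided p = {p:.4f}")
```

No distributional form is assumed here. The only assumption is that, under Ho, the "inside" and "outside" labels are exchangeable.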
       Definition of statistics

• Mathematical statistics = concerned with
  the theoretical basis of the subject
               Applied statistics
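• A common goal: investigate causality

• Draw conclusions on the effect of changes in the values
  of predictors (independent variables) on dependent
  variables (response)

• Y = α + βX, where Y is the dependent variable (response), X is the
  predictor, α is the intercept, and β is the effect of a one-unit change in X on Y

• There are two major types of investigations (studies):
  experimental and observational

• The two types differ in how the study is conducted: in an experimental
  study the investigator manipulates the predictor, while in an observational
  study the predictor's values are recorded as they occur. Each can be very effective.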
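To make Y = α + βX concrete, here is a minimal Python sketch that estimates α and β from paired observations by ordinary least squares, one of the course tools. The data are hypothetical and were chosen to lie exactly on Y = 1 + 2X, so the result is easy to check. Real data would scatter around the line.

```python
def fit_simple_ols(x, y):
    """Estimate alpha (intercept) and beta (slope) in Y = alpha + beta * X by ordinary least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # beta = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta = s_xy / s_xx
    # The fitted line passes through (x_bar, y_bar)
    alpha = y_bar - beta * x_bar
    return alpha, beta


# Hypothetical predictor values and responses
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
alpha, beta = fit_simple_ols(x, y)
print(alpha, beta)  # 1.0 2.0
```

The estimate of β describes association. Whether it can be read as a causal effect depends on the study design (experimental vs. observational), not on the arithmetic.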
                        Tools to learn

Experimental studies                  Observational studies

• Hypothesis formulation             • Hypothesis formulation
   and testing                         and testing

• t-tests                           • t-tests

• Analysis of Variance (ANOVA)      • Analysis of Variance (ANOVA)

• Ordinary Least Squares regression • Ordinary Least Squares regression

• Multiple regression               • Multiple regression

• Probit and Logit regression       • Probit and Logit regression
   Some notation and formulas
• For a random variable called x, the sample statistics are:
  – Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
  – Variance: $s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
  – Standard deviation: $s_x = \sqrt{s_x^2}$

• The population statistics are called
  – Mean: $\mu_x$
  – Variance: $\sigma_x^2$
  – Standard deviation: $\sigma_x$
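The sample formulas can be checked numerically with a short Python sketch. It uses the Observation 2 "inside" oak seedling counts from the later example and compares the hand-coded formulas with Python's standard `statistics` module, which uses the same n - 1 denominator for `variance` and `stdev`. The population versions with denominator n are `statistics.pvariance` and `statistics.pstdev`.

```python
import math
import statistics

x = [3, 5, 2, 8, 7]  # Observation 2, "inside" counts from the oak seedling example

n = len(x)
x_bar = sum(x) / n                                    # sample mean
s2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)     # sample variance, n - 1 denominator
s = math.sqrt(s2)                                     # sample standard deviation

# The standard library uses the same n - 1 definitions
assert math.isclose(x_bar, statistics.mean(x))
assert math.isclose(s2, statistics.variance(x))
assert math.isclose(s, statistics.stdev(x))
print(x_bar, s2, round(s, 3))  # 5.0 6.5 2.55
```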
Data for examples in lectures

             CA spiny lobster
             Panulirus interruptus
Data from traps used by lobster fishery:

                        • Number of lobsters per trap
                        • Length of lobsters in trap
                                                 www.calobtser.org

Matt Kay – Bren PhD student (Kay@lifesci.ucsb.edu)
                                               Fishery

[Figure: Commercial Lobster Landings (lobster in tonnes, 0–600) by season, 1935-36 to 2005-06, for Santa Barbara and Total California]
Hypothesis formulation and testing
Karl Raimund Popper (1902-1994)

     • One of the most influential philosophers of science of the 20th century

     • Professor at the London School of Economics

     • Repudiated the classical observationalist / inductivist
       approach to science as the only method

     • Advanced empirical falsification

     • Vigorously defended liberal democracy and the principles of social
       criticism
The Scientific Method - from a Popperian perspective
1.   Conception - Inductive reasoning
          a. Observations
          b. Theory
          c. Problem
          d. Belief

2.   Leads to Insight and a General Hypothesis

3.   Assessment is done by
          a. Formulating specific hypotheses
          b. Comparison with new observations

4.   Which leads to:
          a. Falsification - and rejection of the insight, and of the specific
             and general hypotheses, or
          b. Confirmation - and retesting of alternative hypotheses

[Figure: flow diagram. Conception (inductive reasoning): previous observations, existing theory, a perceived problem, and belief lead to INSIGHT and a general hypothesis. Assessment (deductive reasoning): specific hypotheses (and predictions) are compared with new observations, leading to confirmation or falsification.]
                         Absolute vs. measured differences

   Example - Specific hypothesis – the number of oak seedlings is higher in
   areas outside oil-polluted (impacted) sites than inside polluted sites

                  Observation 1                      Observation 2
          Number inside   Number outside     Number inside   Number outside
               0               10                  3               10
               0               15                  5                7
               0               18                  2                9
               0               12                  8               12
               0               13                  7                8
Mean           0               13.6                5                9.2

                           What counts as a difference?
                              Are these different?
                   Statistical analysis - cause, probability, and effect

Specific hypothesis HA: Number of oak seedlings is greater in areas outside
impact sites than inside impact sites

II) Statistical Methodology – General
     A) Null hypotheses – Ho
         1) In most sciences, we are faced with:
             a) NOT whether something is true or false (Popperian decision)
             b) BUT rather the degree to which an effect exists (if at all) - a
                statistical decision.

     B) Therefore 2 competing statistical hypotheses are posed:
             a) HA: there is a difference in effect between groups (usually posed as < or >)
             b) HO: there is no difference in effect between groups

[Figure: the Popperian flow diagram from conception (inductive reasoning) to assessment (deductive reasoning), with the confirmation outcome labelled "HA supported (accepted), HO rejected" and the falsification outcome labelled "HA rejected, HO supported (accepted)"]
Statistical analysis - cause, probability, and effect

The logic of statistical tests – how they are performed

1) Assume the Null hypothesis (Ho) is true (e.g., no difference in number of
   oak seedlings in impact and non-impact sites).
2) Compare measurements - generally this means comparing two sample
   distributions (determined from the experiment or survey)
      A) Comparison of distributions – generally by comparing means and the
         estimate of error associated with the sampling of the means. The simplest
         case is the Standard Error of the Mean (SE or SEM), $s_{\bar{x}}$:

            $s_{\bar{x}} = s_x / \sqrt{n}$ = standard deviation / square root of the level of replication (n)

3) Determine the probability (p-value) of obtaining a difference at least as
   large as the one observed if Ho were true

4) Compare the p-value with a critical value (e.g., 0.05) to assign significance

               What is a distribution (around a mean)?
Calculation of statistical distribution

     Distribution of Oak Seedlings - pre-impact or non-polluted site

[Figure: number of seedlings per site (values 0–50)]

            Sites = 100
            Mean per site = 25
            Total seedlings = 2500
      Evaluate the effect of sample size on the calculation of,
      and confidence in, the Mean

                      Compare sample sizes of 5, 10, 20, 50, 99
                        cells

                      Iterate 50 times

[Figure: example of sampling 10 sites to determine the mean, repeated fifty times. Two example draws give means of 21.5 and 25.8; the true mean is 25. The fifty estimates (21.5, 22.3, 23.0, 23.9, 24.9, 25.1, 25.8, 26.5, 27.8, 29.9, etc.) are plotted as a histogram of number of cases against estimate of mean.]
Effect of number of observations on estimate of Mean

[Figure: frequency distributions of sample means (number of cases vs. estimate of mean, 15–35) for means based on 5, 10, 20, 50, and 99 observations; about 95% of the estimates fall within ±2 SE of the mean, with 2.5% in each tail. Inset: histogram of estimates for one sample size, with proportion per bar.]
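The resampling experiment above (100 sites, true mean 25, sample sizes 5 to 99, 50 iterations each) can be reproduced approximately with a short Python simulation. The notes do not list the per-site counts, so the population here is hypothetical. It uses counts 0–50 (skipping 25), each used twice, which gives exactly 100 sites, a total of 2500 seedlings, and a mean of 25. The standard deviation of the 50 sample means plays the role of the standard error in the figure.

```python
import random
import statistics

# Hypothetical population of 100 sites: counts 0-50 (skipping 25), each used twice,
# so the true mean is exactly 25 and the total is 2500, as on the slide.
population = [v for v in range(51) if v != 25] * 2
true_mean = statistics.mean(population)

rng = random.Random(206)  # fixed seed so the run is reproducible
iterations = 50

spread = {}
for n in (5, 10, 20, 50, 99):
    # Draw n of the 100 sites without replacement, 50 times, and keep each sample mean
    means = [statistics.mean(rng.sample(population, n)) for _ in range(iterations)]
    spread[n] = statistics.stdev(means)
    print(f"n={n:3d}  mean of means={statistics.mean(means):5.2f}  SD of means={spread[n]:5.2f}")

print("true mean:", true_mean)
```

Because sampling is without replacement from only 100 sites, the spread at n = 99 is far smaller than $\sigma/\sqrt{n}$ alone would suggest. Each such sample omits just one site.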
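Statistical comparison of two sample distributions

                  Ho: $\mu_1 = \mu_2$

                  HA: $\mu_1 \neq \mu_2$

   (population means, estimated by the sample means $\bar{X}_1$ and $\bar{X}_2$)

[Figure: two overlapping sample distributions centred on $\bar{X}_1$ and $\bar{X}_2$]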
                  How to estimate optimal sample size

1) Do a preliminary study of variables that will be evaluated in the project

2) Plot the mean and some estimate of variability of the data as a function
   of sample size

3) Look for the “sweet spot” where estimates of mean and variance (or standard
   deviation) converge on a stable value

4) Calculate a “bang for buck” relationship to determine if a robust design
   (sufficient sampling) can be paid for

[Figure: mean and standard deviation (Value, 0–30) plotted against number of samples (0–100)]
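Steps 2 and 3 can be sketched in Python. The sketch computes the running mean and standard deviation of pilot data as observations accumulate, then reports the smallest sample size after which both estimates stay within a tolerance of their full-sample values. The pilot data (100 normal draws, mean 25, SD 8) and the 5% tolerance are hypothetical. The notes leave the definition of "stable" to judgment, so this rule is only one possible way to formalize the sweet spot.

```python
import random
import statistics

# Hypothetical pilot data: 100 samples from a normal distribution (mean 25, SD 8).
rng = random.Random(42)
pilot = [rng.gauss(25, 8) for _ in range(100)]


def running_estimates(data):
    """Mean and standard deviation of the first n observations, for n = 2..len(data)."""
    return [(n, statistics.mean(data[:n]), statistics.stdev(data[:n]))
            for n in range(2, len(data) + 1)]


def sweet_spot(data, tol=0.05):
    """Smallest n after which both estimates stay within tol (as a fraction) of their
    values at the full sample size. The 5% tolerance is an illustrative choice."""
    estimates = running_estimates(data)
    _, final_mean, final_sd = estimates[-1]
    stable_from = len(data)
    # Walk backwards; stop at the first n (from the end) that leaves the tolerance band
    for n, m, s in reversed(estimates):
        if abs(m - final_mean) > tol * abs(final_mean) or abs(s - final_sd) > tol * final_sd:
            break
        stable_from = n
    return stable_from


print("estimates stabilise from n =", sweet_spot(pilot))
```

Plotting the output of `running_estimates` gives the kind of mean and SD curves shown in the figure. With real pilot data, the order in which observations are added can affect where the curves settle.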
           Trade-off between accuracy and cost

[Figure: accuracy and cost plotted against sample size (e.g., number of replicate sites), with the resulting "bang per buck" curve]
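The accuracy-versus-cost trade-off can be made explicit with a small Python calculation. In this sketch, accuracy is measured as the half-width of an approximate 95% confidence interval for the mean (±2 SE, as in the earlier figure). Cost grows linearly with the number of replicate sites. The pilot standard deviation, fixed cost, and per-site cost are all hypothetical planning numbers. The calculation shows the diminishing "bang per buck" of each additional site.

```python
import math

# Hypothetical planning numbers, not from the course notes
pilot_sd = 8.0         # standard deviation estimated from a pilot study
fixed_cost = 2000.0    # set-up cost of the survey
cost_per_site = 150.0  # cost of sampling one additional replicate site


def half_width(n, sd=pilot_sd):
    """Approximate 95% confidence half-width for the mean: 2 standard errors."""
    return 2 * sd / math.sqrt(n)


rows = []
for n in range(2, 41):
    cost = fixed_cost + cost_per_site * n
    # Precision bought by the next site, per dollar spent on it
    gain_per_dollar = (half_width(n) - half_width(n + 1)) / cost_per_site
    rows.append((n, cost, half_width(n), gain_per_dollar))

for n, cost, hw, gain in rows[::6]:
    print(f"n={n:2d}  cost=${cost:7.0f}  ±{hw:5.2f}  gain/$={gain:.5f}")
```

Because the half-width shrinks only as $1/\sqrt{n}$ while cost rises linearly, each extra site buys less precision than the one before. The design decision is where that marginal gain stops being worth the budget.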

						