Google Earth and Statistical Trends Analysis Tools
Brandon Bergenroth, Jay Rineer, Breda Munoz, and William Cooter (RTI); Dwane Young (EPA OW); Dwight Atkinson (EPA OW/AWPD)
RTI International is a trade name of Research Triangle Institute
3040 Cornwallis Road ■ P.O. Box 12194 ■ Research Triangle Park, North Carolina, USA 27709
Phone 919-316-3537 ■ e-mail bbergenroth@rti.org
Statistical Trend Analysis for STORET Data
New STORET Tools (Services) Simplify Pulling Data for Trend Analysis
• Trend analysis helps identify degrading waters that warrant protection to avoid 303(d) listing
• Trend analysis also helps document incremental improvements, showing progress in restoring impaired waters
Seasonal Kendall tests: a common tool to help confirm apparent trend patterns
• Seasonal Kendall tests are favored by the USGS, EPA ORD, and many university researchers
• Valuable where parameters show variability related to seasonal changes in temperature or flow
• Can accommodate some degree of “censored” observations (values below detection limits) and missing values
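The seasonal Kendall idea can be sketched as follows: compute the Mann-Kendall statistic S (defined on the Mann-Kendall slides) separately within each season, so that, for example, January values are compared only with other January values across years, then sum the per-season statistics and variances. This is a minimal sketch, not the USGS ESTREND or R implementation. It assumes no tied values within a season, treats seasons as independent (no covariance correction), and uses a continuity correction of 1 in the normal approximation; the `(year, season, value)` input layout is a hypothetical choice.

```python
import math
from collections import defaultdict
from statistics import NormalDist


def mann_kendall_s(values):
    """S = (# of pairs that increase over time) - (# that decrease); values in time order."""
    s = 0
    for k in range(len(values) - 1):
        for j in range(k + 1, len(values)):
            diff = values[j] - values[k]
            s += (diff > 0) - (diff < 0)  # sgn(x_j - x_k)
    return s


def seasonal_kendall(observations):
    """Seasonal Kendall sketch.

    observations: iterable of (year, season, value) tuples (hypothetical layout),
    e.g. season = month 1..12. Assumes no ties and independent seasons.
    Returns (S, Z, two-sided p-value).
    """
    by_season = defaultdict(list)
    # Sorting by (year, season, value) puts each season's values in year order.
    for _year, season, value in sorted(observations):
        by_season[season].append(value)

    s_total, var_total = 0, 0.0
    for values in by_season.values():
        n = len(values)
        s_total += mann_kendall_s(values)
        var_total += n * (n - 1) * (2 * n + 5) / 18  # no-ties Var(S) for this season

    if var_total == 0:
        return s_total, 0.0, 1.0
    # Normal approximation with a continuity correction of 1 (an assumption).
    if s_total > 0:
        z = (s_total - 1) / math.sqrt(var_total)
    elif s_total < 0:
        z = (s_total + 1) / math.sqrt(var_total)
    else:
        z = 0.0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return s_total, z, p_value


# Synthetic example: four sampling seasons per year, values rising year over year.
obs = [(y, m, 10 + (y - 2000) + 3 * (m == 7))
       for y in range(2000, 2008) for m in (1, 4, 7, 10)]
s, z, p = seasonal_kendall(obs)
print(s)  # 112: each of 4 seasons contributes 8*7/2 = 28 increasing pairs
```

The seasonal offset in July (`m == 7`) does not affect S, because July values are only compared with other July values.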
Trend analysis functions/modules similar to ESTREND (USGS) and Kendall (S-PLUS) are already implemented in the open-source R environment.
R is supported by EPA through EMAP and through initiatives such as NCEA’s CADDIS
R-based Trend Analysis using STORET river/stream station data
• Scatter plots for data series of conventional and toxic parameters
• Seasonal Kendall tests can be used to assess seasonal trends
Nonparametric Statistical Tests
Nonparametric statistical tests are tests that do not require assumptions about the specific distribution of the data. In the statistical literature they are also known as distribution-free or distribution-independent tests.
Furthermore, nonparametric tests have few underlying assumptions and tend to focus on the relative values of the observations (e.g., their ranks) rather than their magnitudes. Most nonparametric tests were designed to assess the presence or absence of a given statistical characteristic (e.g., a trend) and therefore do not provide its magnitude. For this reason, some researchers classify them as exploratory data analysis procedures. However, because they are often used in hypothesis testing (e.g., testing for the existence of a trend), they are also considered confirmatory data analysis tools.
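To illustrate what "relative values instead of magnitudes" means in practice, the sketch below ranks a short series and its log10 transform. Because a monotone transformation never changes the ordering of observations, the ranks are identical, so any statistic built only from ranks (or, like Mann-Kendall, only from the signs of pairwise differences) gives the same answer on either scale. The concentration values are hypothetical, and average ranks for ties are one common convention, assumed here.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            result[idx] = avg_rank
        i = j + 1
    return result


concentrations = [0.8, 2.5, 1.1, 40.0, 3.2]  # hypothetical values
log_concentrations = [math.log10(c) for c in concentrations]

print(ranks(concentrations))      # [1.0, 3.0, 2.0, 5.0, 4.0]
print(ranks(log_concentrations))  # [1.0, 3.0, 2.0, 5.0, 4.0]
```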
Mann-Kendall
Let $x_1, \dots, x_n$ be a sequence of measurements over time. The test compares the null hypothesis
$H_0$: $x_1, \dots, x_n$ come from a population where the random variables are independent and identically distributed,
against the alternative
$H_1$: $x_1, \dots, x_n$ follow a monotonic (increasing or decreasing) trend over time.
The Mann-Kendall test statistic is calculated as
$$S = \sum_{k=1}^{n-1} \sum_{j=k+1}^{n} \operatorname{sgn}(x_j - x_k),$$
where
$$\operatorname{sgn}(x_j - x_k) = \begin{cases} 1 & \text{if } x_j - x_k > 0 \\ 0 & \text{if } x_j - x_k = 0 \\ -1 & \text{if } x_j - x_k < 0 \end{cases}$$
S is asymptotically normally distributed. The mean and variance of S are given by
$$E[S] = 0$$
$$\operatorname{Var}(S) = \begin{cases} \dfrac{n(n-1)(2n+5)}{18} & \text{no ties} \\[2ex] \dfrac{n(n-1)(2n+5) - \sum_{j=1}^{p} t_j(t_j-1)(2t_j+5)}{18} & \text{if ties} \end{cases}$$
where $p$ is the number of tied groups in the data set and $t_j$ is the number of data points in the $j$th tied group.
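The definitions of S and Var(S) above translate directly into code. This is a minimal O(n²) sketch for a series already in time order; the function names and the sample values are hypothetical. Tied groups are found by counting repeated values, and groups of size 1 contribute nothing to the tie correction because $t_j - 1 = 0$.

```python
from collections import Counter


def mann_kendall_s(x):
    """S = sum over k < j of sgn(x_j - x_k), with x in time order."""
    n = len(x)
    s = 0
    for k in range(n - 1):
        for j in range(k + 1, n):
            diff = x[j] - x[k]
            s += (diff > 0) - (diff < 0)  # sgn(x_j - x_k)
    return s


def mann_kendall_var(x):
    """Var(S) under H0, subtracting the tie term for each tied group of size t_j."""
    n = len(x)
    var = n * (n - 1) * (2 * n + 5)
    for t in Counter(x).values():
        var -= t * (t - 1) * (2 * t + 5)
    return var / 18


x = [5.1, 5.3, 5.3, 6.0, 5.9]  # hypothetical pH readings in time order
print(mann_kendall_s(x))    # 7
print(mann_kendall_var(x))  # (300 - 18) / 18, about 15.67
```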
Mann-Kendall
A positive value of S indicates an upward (increasing) trend, i.e., observations tend to increase with time. A negative value of S indicates a downward (decreasing) trend. If S is significantly different from zero, then, based on the data, $H_0$ can be rejected at a pre-selected significance level and the existence of a monotonic trend can be accepted. Note that S is the number of pairs $(k, j)$ with $j > k$ for which $x_j - x_k > 0$, minus the number of pairs for which $x_j - x_k < 0$. The maximum value of S, denoted $D$, occurs when $x_1 < x_2 < \dots < x_n$.
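Kendall’s tau is defined as
$$\tau = \frac{S}{D},$$
where
$$D = \begin{cases} \dfrac{n(n-1)}{2} & \text{no ties} \\[2ex] \left[ \dfrac{n(n-1)}{2} \cdot \dfrac{n(n-1) - \sum_{j=1}^{p} t_j(t_j-1)}{2} \right]^{1/2} & \text{if ties} \end{cases}$$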
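Kendall's tau as defined above can be sketched as S divided by its maximum attainable value D. This sketch assumes the tie-adjusted denominator $D = \left[\frac{n(n-1)}{2} \cdot \frac{n(n-1) - \sum_j t_j(t_j-1)}{2}\right]^{1/2}$, which reduces to $n(n-1)/2$ when there are no ties; function names and data are hypothetical.

```python
import math
from collections import Counter


def mann_kendall_s(x):
    """S = sum over k < j of sgn(x_j - x_k), with x in time order."""
    s = 0
    for k in range(len(x) - 1):
        for j in range(k + 1, len(x)):
            diff = x[j] - x[k]
            s += (diff > 0) - (diff < 0)
    return s


def kendall_tau(x):
    """tau = S / D, with D adjusted for tied groups in x."""
    n = len(x)
    n_pairs = n * (n - 1) / 2
    # Number of pairs that fall inside a tied group: sum of t_j (t_j - 1) / 2.
    tie_pairs = sum(t * (t - 1) / 2 for t in Counter(x).values())
    d = math.sqrt(n_pairs * (n_pairs - tie_pairs))
    if d == 0:
        raise ValueError("tau is undefined for fewer than 2 distinct values")
    return mann_kendall_s(x) / d


print(kendall_tau([1, 2, 3, 4]))  # 1.0 (strictly increasing)
print(kendall_tau([4, 3, 2, 1]))  # -1.0 (strictly decreasing)
```

Tau always lies between -1 and 1, which makes it easier than S to compare across stations with different numbers of samples.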
Mann-Kendall
The distribution of Kendall’s tau can be easily obtained from the distribution of S.
A positive value of tau indicates an upward (increasing) trend, i.e., observations tend to increase with time.
A negative value of tau indicates a downward (decreasing) trend. If tau is significantly different from zero (i.e., its p-value is less than 0.05 at the 5% significance level, or less than 0.01 at the 1% significance level), then, based on the data, $H_0$ can be rejected at the pre-selected significance level (e.g., alpha = 5%) and the existence of a monotonic trend can be accepted.
Note that the test only allows the user to draw conclusions about the existence of a trend, not about its magnitude.
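The decision rule above can be sketched end to end: compute S and Var(S), standardize S using its asymptotic normal distribution, and compare the two-sided p-value with alpha. The continuity-corrected standardization, $Z = (S - 1)/\sqrt{\operatorname{Var}(S)}$ for $S > 0$ and $(S + 1)/\sqrt{\operatorname{Var}(S)}$ for $S < 0$, is a commonly used convention that the slides do not state, so treat it as an assumption. The function name and the test series are hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist


def mann_kendall_test(x, alpha=0.05):
    """Two-sided Mann-Kendall test via the normal approximation for S.

    x must be in time order. Returns (S, Z, p_value, reject_h0).
    """
    n = len(x)
    s = 0
    for k in range(n - 1):
        for j in range(k + 1, n):
            diff = x[j] - x[k]
            s += (diff > 0) - (diff < 0)  # sgn(x_j - x_k)

    # Var(S) with the tie correction over tied groups.
    var_s = n * (n - 1) * (2 * n + 5)
    for t in Counter(x).values():
        var_s -= t * (t - 1) * (2 * t + 5)
    var_s /= 18

    # Continuity-corrected standardization (assumed convention).
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, z, p_value, p_value < alpha


print(mann_kendall_test(list(range(10))))  # S = 45, p well below 0.05 -> reject H0
print(mann_kendall_test([1, 3, 2, 3, 1]))  # S = 0, p = 1.0 -> do not reject H0
```

As the slides note, rejecting $H_0$ only establishes that a monotonic trend exists; the test itself says nothing about how large it is.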
Getting Results
Using the STORET Data Warehouse: STORET station descriptions
• Stations by geographic location: http://iaspub.epa.gov/stormoda/DW_stationcriteria
• Stations by organization and station ID: http://iaspub.epa.gov/stormoda/DW_stationcriteria_STN
Visualizing Results
Transform text results to KML.
Keyhole Markup Language (KML) is an XML-based language for describing three-dimensional geospatial data and its display in application programs.
KML is supported in Google Earth, Google Maps, and Microsoft Virtual Earth.
http://code.google.com/apis/kml/documentation
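Transforming a text trend result into KML can be sketched with Python's standard `xml.etree.ElementTree`: one `Placemark` per station, with the trend statistics in its `description` and the station location as a `Point`. This assumes the KML 2.2 namespace; the function name, station ID, coordinates, and description layout are hypothetical, and the slides do not say how the authors generated their KML files.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # serialize KML elements without a prefix


def trend_placemark_kml(station_id, lon, lat, parameter, tau, p_value):
    """Build a minimal KML document with one Placemark for a station's trend result."""
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    pm = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
    ET.SubElement(pm, f"{{{KML_NS}}}name").text = station_id
    ET.SubElement(pm, f"{{{KML_NS}}}description").text = (
        f"{parameter}: Kendall tau = {tau:.3f}, p = {p_value:.4f}"
    )
    point = ET.SubElement(pm, f"{{{KML_NS}}}Point")
    # KML coordinates are written as longitude,latitude[,altitude].
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Hypothetical station and result, for illustration only.
kml_text = trend_placemark_kml("STATION-001", -78.9, 35.9, "pH", 0.21, 0.034)
print(kml_text)
```

Writing `kml_text` to a `.kml` file produces something Google Earth can open; a real export would add one `Placemark` per station, and possibly styles that color stations by trend direction.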
Report Results
• http://iaspub.epa.gov/storpubl/storet_wme_pkg.Display_Station?p_station_id=SP-64&p_org_id=MWRD
• http://iaspub.epa.gov/stormoda/DW_resultcriteria_station
• http://iaspub.epa.gov/webservices/StoretResultService/index.html?operation=getResults
Kendall Trend Analysis for pH
Kendall Trend Analysis for Temperature
Kendall Trend Analysis for Dissolved Oxygen
Kendall Trend Analysis for Total Suspended Solids
Kendall Trend Analysis for Turbidity
Kendall Trend Analysis for Cadmium
Kendall Trend Analysis for Zinc
Indexing STORET stations to the NHD can support more sophisticated trend analyses
• Group sites relative to upstream NPDES discharges
• Group sites by Horton-Strahler stream order
• Group sites by land-cover patterns using NHDPlus LU/LC raster data
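Grouping sites by Horton-Strahler stream order can be sketched as two steps: assign an order to every reach, then bucket stations by the order of the reach they are indexed to. In the Strahler scheme, headwater reaches are order 1, and a reach takes the highest order among its tributaries, plus one if two or more tributaries share that highest order. This sketch assumes the network is available as a hypothetical dict of reach ID to directly upstream reach IDs and that stations are already indexed to reaches; if the stream-order attribute is already available from the NHD data, only the grouping step is needed.

```python
from collections import defaultdict


def strahler_orders(upstream):
    """Horton-Strahler order for each reach.

    upstream: dict mapping reach id -> list of reach ids that flow directly into it.
    Uses recursion, so very deep networks may need an iterative version.
    """
    orders = {}

    def order_of(reach):
        if reach in orders:
            return orders[reach]
        child = sorted((order_of(u) for u in upstream.get(reach, [])), reverse=True)
        if not child:
            result = 1  # headwater reach
        elif len(child) > 1 and child[0] == child[1]:
            result = child[0] + 1  # two tributaries of equal top order meet
        else:
            result = child[0]
        orders[reach] = result
        return result

    for reach in upstream:
        order_of(reach)
    return orders


def group_stations_by_order(station_reach, orders):
    """Map stream order -> list of station ids indexed to reaches of that order."""
    groups = defaultdict(list)
    for station, reach in station_reach.items():
        groups[orders[reach]].append(station)
    return dict(groups)


# Hypothetical network: A+B -> C; C+D -> E; H+I -> G; E+G -> F.
network = {"A": [], "B": [], "C": ["A", "B"], "D": [], "E": ["C", "D"],
           "H": [], "I": [], "G": ["H", "I"], "F": ["E", "G"]}
orders = strahler_orders(network)
stations = {"STN-1": "A", "STN-2": "E", "STN-3": "F"}  # hypothetical station index
print(group_stations_by_order(stations, orders))
```

Trend results (for example, the share of stations with a significant upward trend) can then be summarized within each order group rather than across the whole basin.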
Indexing and combining station results
Next Steps
• Additional work on “pre-processing” STORET station data to focus attention on stations with enough “data density” to support trend analyses
• Develop a “data mart” of R trend analysis results, including saved images of scatter plots over time from R
• Consider ways trend analyses can support pro-active study of anti-degradation effects [Goal: detect degradation trends early and consider management steps to avoid additional 303(d) listings]
• Also, use trend analyses as a tool to document incremental progress in meeting targets established under WQ Standards or the TMDL program
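The "data density" pre-processing step could start as a simple filter: keep only stations with enough non-missing results spread over enough distinct years to make a trend test meaningful. The record layout and both thresholds (`min_years=5`, `min_obs=40`) are hypothetical placeholders for illustration, not STORET conventions or values recommended by the authors.

```python
from collections import defaultdict
from datetime import date


def dense_stations(records, min_years=5, min_obs=40):
    """Return sorted station ids that meet both data-density thresholds.

    records: iterable of (station_id, sample_date, value) tuples (hypothetical layout).
    Missing results (value is None) are ignored.
    """
    years = defaultdict(set)
    counts = defaultdict(int)
    for station, sample_date, value in records:
        if value is None:  # skip missing results
            continue
        years[station].add(sample_date.year)
        counts[station] += 1
    return sorted(
        s for s in counts
        if counts[s] >= min_obs and len(years[s]) >= min_years
    )


# Hypothetical records: A is sampled monthly for 6 years, B for 10 months,
# and C has only missing values.
records = (
    [("A", date(2000 + y, m, 1), 7.0) for y in range(6) for m in range(1, 13)]
    + [("B", date(2005, m, 1), 6.5) for m in range(1, 11)]
    + [("C", date(2001 + y, 1, 1), None) for y in range(10)]
)
print(dense_stations(records))  # ['A']
```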