Google Earth and Statistical Trends Analysis Tools

Brandon Bergenroth, Jay Rineer, Breda Munoz and William Cooter (RTI)

Dwane Young (EPA OW)

Dwight Atkinson (EPA OW/AWPD)

RTI International is a trade name of Research Triangle Institute

3040 Cornwallis Road P.O. Box 12194 Research Triangle Park, North Carolina, USA 27709
Phone 919-316-3537 e-mail

Statistical Trend Analysis for STORET DATA

New STORET Tools (Services) Simplify Pulling Data for Trend Analysis

Trend analysis helps identify degradation trends for waters that warrant protection to avoid 303(d) listing
Trend analysis also helps document incremental improvements showing progress in restoring impaired waters

Seasonal Kendall tests are a common tool to help confirm apparent trend patterns
Seasonal Kendall tests are favored by the USGS, EPA ORD, and many university researchers
Valuable where parameters show variability related to seasonal changes in temperature or changes in flows
Can accommodate some degree of censored observations (below detection limits) and missing values
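
As a rough illustration of the idea behind the seasonal Kendall test, the sketch below computes a Mann-Kendall S statistic separately within each season and then sums the per-season values, so that a January value is only ever compared with other January values. The record layout `(year, season, value)`, the function names, and the sample numbers are assumptions made for this example; missing values are represented as `None` and simply skipped, and censored values are not treated specially here.

```python
from collections import defaultdict


def sign(v):
    """Return 1, 0, or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def mann_kendall_s(values):
    """Sum of sgn(x_j - x_k) over all pairs k < j; None entries are skipped."""
    xs = [v for v in values if v is not None]
    return sum(
        sign(xs[j] - xs[k])
        for k in range(len(xs) - 1)
        for j in range(k + 1, len(xs))
    )


def seasonal_kendall_s(records):
    """records: iterable of (year, season, value) tuples (hypothetical layout).

    Returns (total S, dict of per-season S).
    """
    by_season = defaultdict(list)
    # Order by season, then year, so each season's series is in time order.
    for year, season, value in sorted(records, key=lambda r: (r[1], r[0])):
        by_season[season].append(value)
    per_season = {s: mann_kendall_s(v) for s, v in by_season.items()}
    return sum(per_season.values()), per_season


# Made-up illustrative data, not STORET results.
records = [
    (2001, "winter", 8.0), (2002, "winter", 8.4), (2003, "winter", 9.1),
    (2001, "summer", 5.0), (2002, "summer", 5.6), (2003, "summer", None),
    (2004, "summer", 6.2),
]
total_s, per_season = seasonal_kendall_s(records)
print(total_s, per_season)
```

A full seasonal Kendall test also combines the per-season variances to judge significance; that step is omitted from this sketch.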

Trend analysis functions/modules similar to ESTREND (USGS) and Kendall (S-PLUS) are already implemented in open-source R.

R is supported by EPA through EMAP and through initiatives such as NCEA's CADDIS

R-based Trend Analysis using STORET river/stream station data

Scatter plots for data series of conventional and toxic parameters

Seasonal Kendall test can be used to assess seasonal trends

Nonparametric Statistical Tests

Nonparametric statistical tests are the collection of statistical tests that do not require any assumption about the distribution of the data. They are also known in the statistical literature as distribution-free tests and distribution-independent tests.

Furthermore, nonparametric tests have few underlying assumptions and tend to concentrate on the relative values (e.g. ranks) of the observations instead of the magnitude of the observations.

Most nonparametric tests were designed to assess the presence or absence of a given statistical characteristic (e.g. trend) and therefore do not provide the magnitude of the statistical characteristic of interest. For this reason, some researchers classify them as exploratory data analysis tools.

They are often used in hypothesis testing (e.g. existence of trends) and are therefore also considered confirmatory data analysis tools.

Let $x_1, \ldots, x_n$ be a sequence of measurements over time. The test compares the null hypothesis
$H_0$: $x_1, \ldots, x_n$ come from a population where the random variables are independent and identically distributed,
against the alternative
$H_1$: $x_1, \ldots, x_n$ follow a monotonic (e.g. increasing or decreasing) trend over time.

The Mann-Kendall test statistic is calculated as

$$S = \sum_{k=1}^{n-1} \sum_{j=k+1}^{n} \operatorname{sgn}(x_j - x_k)$$

where

$$\operatorname{sgn}(x_j - x_k) = \begin{cases} 1 & \text{if } x_j - x_k > 0 \\ 0 & \text{if } x_j - x_k = 0 \\ -1 & \text{if } x_j - x_k < 0 \end{cases}$$
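
The definition of S above translates almost directly into code. The following is a minimal sketch, assuming the measurements are already in time order and contain no missing values; the function names and the sample series are made up for illustration.

```python
def sgn(v):
    """Return 1 if v > 0, 0 if v == 0, and -1 if v < 0."""
    return (v > 0) - (v < 0)


def mann_kendall_s(x):
    """Mann-Kendall S: sum of sgn(x[j] - x[k]) over all pairs with k < j."""
    n = len(x)
    return sum(sgn(x[j] - x[k]) for k in range(n - 1) for j in range(k + 1, n))


# Illustrative series (not real monitoring data): mostly increasing.
series = [7.1, 7.3, 7.2, 7.6, 7.8]
print(mann_kendall_s(series))  # 8 of a possible 10
```

The double loop compares every pair once, so the cost grows with the square of the series length, which is usually acceptable for station records of a few hundred observations.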

S is asymptotically normally distributed.

The mean and variance of S are given by

$$E[S] = 0$$

$$\operatorname{Var}[S] = \begin{cases} \dfrac{n(n-1)(2n+5) - \sum_{j=1}^{p} t_j (t_j - 1)(2 t_j + 5)}{18} & \text{if ties} \\ \dfrac{n(n-1)(2n+5)}{18} & \text{no ties} \end{cases}$$

where p is the number of tied groups in the data set and $t_j$ is the number of data points in the jth tied group.
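
The variance of S, including the correction for tied groups, can be sketched as follows. This assumes the standard Mann-Kendall form with a divisor of 18 and treats exactly equal values as ties; the function name is hypothetical.

```python
from collections import Counter


def mann_kendall_variance(x):
    """Var[S] for a series x, with the tie correction applied when needed."""
    n = len(x)
    # Sizes t_j of each group of tied (equal) values; groups of size 1 add nothing.
    tie_sizes = [t for t in Counter(x).values() if t > 1]
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)
    return (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0


print(mann_kendall_variance([1, 2, 3, 4, 5]))     # no ties
print(mann_kendall_variance([1, 2, 2, 3, 3, 3]))  # two tied groups
```

When there are no ties the tie term is zero, so a single expression covers both cases of the formula.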


A positive value of S indicates that there is an upward (increasing) trend (e.g. observations increase with time).

A negative value of S means that there is a downward (decreasing) trend.

If S is significantly different from zero, then based on the data H 0 can be rejected at a pre-selected
significance level and the existence of a monotonic trend can be accepted.
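
One common way to judge whether S is significantly different from zero uses the normal approximation. The sketch below is an illustration under stated assumptions, not necessarily the procedure used in the tools described here: it applies the customary continuity correction (subtracting or adding 1 before standardizing) and a two-sided test. The function names are hypothetical.

```python
import math


def mann_kendall_z(s, var_s):
    """Standardized test statistic with continuity correction (assumed form)."""
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def two_sided_p_value(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def reject_null(s, var_s, alpha=0.05):
    """True if H0 (no trend) is rejected at significance level alpha."""
    return two_sided_p_value(mann_kendall_z(s, var_s)) < alpha


# n = 10 strictly increasing values give S = 45 and Var[S] = 125.
print(reject_null(45, 125.0))
# n = 5 with S = 8 and Var[S] = 300/18: too little evidence at alpha = 0.05.
print(reject_null(8, 300 / 18.0))
```

The normal approximation is rough for very short series, where exact tables for S are generally preferred.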

Note that S is the number of times $x_j - x_k > 0$ for $j > k$, minus the number of times $x_j - x_k < 0$.
The maximum value of S (call it D) occurs when $x_1 < x_2 < \cdots < x_n$.

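Kendall's tau is defined as

$$\tau = \frac{S}{D}$$

where

$$D = \begin{cases} \left[\dfrac{n(n-1)}{2} - \dfrac{1}{2}\sum_{j=1}^{p} t_j (t_j - 1)\right]^{1/2} \left[\dfrac{n(n-1)}{2}\right]^{1/2} & \text{if ties} \\ \dfrac{n(n-1)}{2} & \text{no ties} \end{cases}$$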
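
Kendall's tau can be sketched as S divided by D. The version below assumes the tie-adjusted (tau-b style) denominator, in which only the measurements can be tied because the time index has no ties; that choice of D, along with the function names, is an assumption of this example.

```python
import math
from collections import Counter


def sgn(v):
    """Return 1, 0, or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(x):
    """Kendall's tau of a time-ordered series x against its time index."""
    n = len(x)
    s = sum(sgn(x[j] - x[k]) for k in range(n - 1) for j in range(k + 1, n))
    total_pairs = n * (n - 1) / 2.0
    # Pairs lost to ties: sum over tied groups of t_j (t_j - 1) / 2.
    tied_pairs = sum(t * (t - 1) / 2.0 for t in Counter(x).values())
    d = math.sqrt((total_pairs - tied_pairs) * total_pairs)
    if d == 0:
        raise ValueError("tau is undefined for constant or too-short series")
    return s / d


print(kendall_tau([1, 2, 3, 4]))  # strictly increasing: 1.0
print(kendall_tau([1, 2, 2, 3]))  # one tied pair: slightly below 1
```

With no ties the denominator reduces to n(n-1)/2, so tau ranges from -1 for a strictly decreasing series to 1 for a strictly increasing one.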


The distribution of Kendall's tau can be easily obtained from the distribution of S.

A positive value of tau indicates that there is an upward (increasing) trend (e.g. observations increase with time).

A negative value of tau means that there is a downward (decreasing) trend.

If tau is significantly different from zero (e.g. p-value less than 0.05 at the 5% significance level, or less than 0.01 at the 1% significance level), then based on the data, $H_0$ can be rejected at a pre-selected significance level (e.g. alpha = 5%) and the existence of a monotonic trend can be accepted.

Note that the test only allows the software user to draw conclusions about the existence of the trend, not about its magnitude.

Getting Results

Using STORET Data Warehouse

STORET Station Descriptions

Stations by Geographic Location
Stations by Organization and Station ID

Visualizing Results

Transform text results to KML

Keyhole Markup Language (KML) is an XML based language for describing three-dimensional
geospatial data and its display in application programs.

KML is supported in Google Earth, Google Maps, and Microsoft Virtual Earth
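
To illustrate what transforming text results to KML can look like, the sketch below writes one Placemark per station using Python's standard XML library. The dictionary keys (`station_id`, `parameter`, `tau`, `p_value`, `lon`, `lat`), the function name, and the sample station are hypothetical and do not reflect the actual STORET output format; only the KML element names and namespace come from the KML standard.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def stations_to_kml(stations):
    """Build a KML document string with one Placemark per station result."""
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for st in stations:
        pm = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(pm, f"{{{KML_NS}}}name").text = st["station_id"]
        ET.SubElement(pm, f"{{{KML_NS}}}description").text = (
            f"{st['parameter']}: Kendall tau = {st['tau']:.2f}, "
            f"p = {st['p_value']:.3f}"
        )
        point = ET.SubElement(pm, f"{{{KML_NS}}}Point")
        # KML coordinates are written as longitude,latitude,altitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = (
            f"{st['lon']},{st['lat']},0"
        )
    return ET.tostring(kml, encoding="unicode")


# Made-up example station, for illustration only.
example = [{
    "station_id": "EXAMPLE-001", "parameter": "pH",
    "tau": 0.31, "p_value": 0.012, "lon": -78.9, "lat": 35.9,
}]
print(stations_to_kml(example))
```

The resulting string can be saved with a `.kml` extension and opened in a KML-aware viewer.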

Report Results

Kendall Trend Analysis for pH

Kendall Trend Analysis for Temperature

Kendall Trend Analysis for Dissolved Oxygen

Kendall Trend Analysis for Total Suspended Solids

Kendall Trend Analysis for Turbidity

Kendall Trend Analysis for Cadmium

Kendall Trend Analysis for Zinc

Indexing STORET stations to the NHD can help increase sophistication of trend analyses
Group sites relative to upstream NPDES
Group using Horton-Strahler stream orders
Group in terms of landcover patterns using NHDPlus LU/LC raster data

Indexing and combining station results

Next Steps
Additional work on pre-processing STORET station data to focus attention on stations with enough data density to support trend analyses
Develop a data mart of R trend analysis results, including saved images of scatter plots over time from R
Consider ways trend analyses can support proactive study of anti-degradation effects [Goal: detect degradation trends early on and consider management steps to avoid winding up with additional 303(d) listings]
Also use trend analyses as a tool to document incremental progress in meeting targets established under WQ Standards or the TMDL program
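
As one possible shape for the pre-processing step, the sketch below screens stations by how many usable observations they have and how many distinct years those observations span. The record layout `(station_id, year, value)`, the function name, and the default thresholds are assumptions chosen for illustration, not established criteria.

```python
from collections import defaultdict


def stations_with_enough_data(observations, min_count=20, min_years=5):
    """Return sorted station IDs meeting both (hypothetical) density thresholds.

    observations: iterable of (station_id, year, value); None values are ignored.
    """
    counts = defaultdict(int)
    years = defaultdict(set)
    for station_id, year, value in observations:
        if value is None:
            continue
        counts[station_id] += 1
        years[station_id].add(year)
    return sorted(
        s for s in counts
        if counts[s] >= min_count and len(years[s]) >= min_years
    )


# Made-up records: station A has three values over three years, B only one.
obs = [
    ("A", 2001, 7.0), ("A", 2002, 7.2), ("A", 2003, 7.1),
    ("B", 2001, 6.5), ("B", 2002, None),
]
print(stations_with_enough_data(obs, min_count=3, min_years=3))
```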