Multivariate dependence in complex systems
Auroop R. Ganguly*, Shiraj Khan**, David J. Erickson III*, Rick W. Katz***, George Ostrouchov*, Vladimir A. Protopopescu*, Sharba Bandyopadhyay****, and Sunil Saigal**
* Oak Ridge National Laboratory, Oak Ridge, TN
** University of South Florida, Tampa, FL
*** National Center for Atmospheric Research, Boulder, CO
**** Johns Hopkins University, Baltimore, MD
5th Symposium on Understanding Complex Systems, University of Illinois at Urbana-Champaign, May 16-19, 2005
OAK RIDGE NATIONAL LABORATORY
U. S. DEPARTMENT OF ENERGY
Definitions
“Multivariate Dependence”
Generalized (linear or nonlinear) dependence within one or among multiple variables, in spatial, temporal or other dimensions
“Complex Systems”
Nonlinear, multidimensional, multi-scale, component processes
Summary
Correlation and Dependence
Linear and Nonlinear Dependence
  Simulated System (Time series)
  Real Systems (Time series and Spatial)
Extremal Dependence
Next Steps
Dependence
Linear and Nonlinear
Correlation and Dependence
Linear Correlation
Linear relationship among variables, r
Time series: autocorrelation and cross-correlation functions
Nonlinear Dependence
Linear and nonlinear relationships, by using information theoretic concepts like mutual information
Multiple dimensions
Temporal, spatial, spatio-temporal
Information Entropy and Mutual Information
Information entropy: $H(X) = -\sum_i p_i \ln p_i$
Joint entropy: $H(X,Y) = -\sum_i \sum_j p_{ij} \ln p_{ij}$
Mutual information: $I(X;Y) = I_{XY} = H(X) + H(Y) - H(X,Y)$
$I(X;Y)$ is a “distance” between the joint distribution F(X,Y) and the product distribution F(X)F(Y)
Independence implies F(X,Y) = F(X)F(Y), and hence $I(X;Y) = 0$
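The definitions on this slide can be checked on a small discrete example. The sketch below is illustrative only: the function names and the two 2x2 joint probability tables are assumptions made for the example, not material from the talk.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability cells contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint probability table."""
    px = [sum(row) for row in joint]           # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]     # marginal distribution of Y
    pxy = [p for row in joint for p in row]    # flattened joint distribution
    return entropy(px) + entropy(py) - entropy(pxy)

# Hypothetical tables: X and Y independent, then X and Y identical.
independent = [[0.25, 0.25], [0.25, 0.25]]
dependent = [[0.5, 0.0], [0.0, 0.5]]

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
```

For the independent table the joint equals the product of the marginals, so the mutual information vanishes; for the identical pair it equals H(X) = ln 2.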
Nonlinear dependence measure
$I(X;Y)$ goes from 0 to $\infty$; $r_{XY}^2$ from 0 to 1
For bivariate normal, $I_{XY} = -\tfrac{1}{2} \ln(1 - r_{XY}^2)$
Granger defined $\lambda = 1 - \exp(-2 I_{XY})$
This new quantity is like a “nonlinear correlation” measure that goes from 0 to 1
Can be extended for the multivariate case
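A quick numerical check of the two formulas on this slide: substituting the bivariate normal mutual information into the transform as written here gives back the squared correlation. The sketch assumes natural logarithms throughout; the function names and the sample value r = 0.5 are hypothetical.

```python
import math

def gaussian_mutual_information(r):
    """Mutual information (nats) of a bivariate normal with correlation r."""
    return -0.5 * math.log(1.0 - r * r)

def dependence_lambda(mutual_info):
    """Map mutual information in [0, inf) to a dependence measure in [0, 1)."""
    return 1.0 - math.exp(-2.0 * mutual_info)

r = 0.5  # hypothetical correlation, chosen only for illustration
lam = dependence_lambda(gaussian_mutual_information(r))
```

With the formula as stated on the slide, `lam` equals r squared (0.25 here) in the Gaussian case, and zero mutual information maps to zero.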
Nonlinear dependence: Space-time
Nonlinear dependence measure
Some applications in time series
Almost nothing for spatial statistics
Nothing for spatio-temporal
Significant
Linear is only one of several types
Nonlinear is general
Linear may not predominate in all situations
Bounds on Predictability
MSE from linear regression
$\mathrm{MSE} = (1 - r_{XY}^2)\,\sigma_Y^2$
Minimum MSE from nonlinear methods
Maximum bound on predictability
Theorem: $E\{[Y - g(X)]^2\} \ge \frac{1}{2\pi e} \exp\{2(H_Y - I_{XY})\}$
The left-hand side is the MSE; the right-hand side is its lower bound
g(X) is the best possible function of X that explains or predicts Y
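The two bounds can be compared numerically. The sketch below assumes that H_Y is a differential entropy in nats and uses a Gaussian Y as a sanity check, for which the nonlinear bound coincides with the linear-regression MSE. The function names and the sample values (variance 4, r = 0.6) are hypothetical.

```python
import math

def linear_mse(r, var_y):
    """MSE of linear regression of Y on X: (1 - r^2) * var(Y)."""
    return (1.0 - r * r) * var_y

def nonlinear_mse_bound(h_y, i_xy):
    """Lower bound on MSE: exp(2 * (H_Y - I_XY)) / (2 * pi * e)."""
    return math.exp(2.0 * (h_y - i_xy)) / (2.0 * math.pi * math.e)

def gaussian_entropy(var_y):
    """Differential entropy (nats) of a normal variable with variance var_y."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var_y)

# Hypothetical Gaussian example: both quantities reduce to (1 - r^2) * var_y.
var_y, r = 4.0, 0.6
i_xy = -0.5 * math.log(1.0 - r * r)
bound = nonlinear_mse_bound(gaussian_entropy(var_y), i_xy)
mse = linear_mse(r, var_y)
```

In the Gaussian case no nonlinear predictor can beat the linear one, so `bound` and `mse` agree (2.56 for these values); a gap between the two in real data indicates room for nonlinear methods.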
Simulated System
Lorenz Equations
Dynamical Time Series: Simple nonlinear system
Parameters: $\sigma = 10$, $r = 28$, $b = 8/3$
Initial conditions: X(0) = 1.1, Y(0) = 5.0, Z(0) = 1.1
Courtesy: Mathworld
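The equations themselves did not survive in the slide text, so the sketch below assumes the standard Lorenz form (dX/dt = sigma (Y - X), dY/dt = X (r - Z) - Y, dZ/dt = X Y - b Z) with the parameter values and initial conditions listed above. The fourth-order Runge-Kutta integrator, the step size, and the series length are assumptions made for illustration, not the settings used in the talk.

```python
def lorenz_rhs(state, sigma=10.0, r=28.0, b=8.0 / 3.0):
    """Right-hand side of the (assumed standard) Lorenz equations."""
    x, y, z = state
    return (sigma * (y - x), x * (r - z) - y, x * y - b * z)

def rk4_step(state, dt):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(base, slope, h):
        return tuple(s + h * k for s, k in zip(base, slope))
    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2.0))
    k3 = lorenz_rhs(shifted(state, k2, dt / 2.0))
    k4 = lorenz_rhs(shifted(state, k3, dt))
    return tuple(s + dt / 6.0 * (a + 2.0 * p + 2.0 * q + d)
                 for s, a, p, q, d in zip(state, k1, k2, k3, k4))

def simulate(n_steps=5000, dt=0.01, start=(1.1, 5.0, 1.1)):
    """Return the list of (X, Y, Z) states, including the initial state."""
    states = [start]
    for _ in range(n_steps):
        states.append(rk4_step(states[-1], dt))
    return states

trajectory = simulate()
lorenz_x = [s[0] for s in trajectory]
lorenz_y = [s[1] for s in trajectory]
lorenz_z = [s[2] for s in trajectory]
```

The three component series can then be fed to any correlation or dependence estimator.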
Lorenz Equations
Lorenz Equations: Lorenz Y vs. Lorenz X
r = 0.8606
Linear MSE (theoretical): 22.5026
Linear MSE (validation): 21.3181
$\lambda$ = 0.9192
Nonlinear MSE (theoretical bound): 9.6031
Nonlinear MSE (validation with ANN): 21.2995
Impact of noise & Seasonality:
Lower r and $\lambda$ imply higher MSE
Autocorrelation – Lorenz X
Cross-Correlation Function: r as a function of lag, r(lag)
Cross-Dependence Function: $\lambda$ as a function of lag, $\lambda(\mathrm{lag})$
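Lagged correlation and dependence functions of this kind can be sketched as follows. The mutual information estimator here is a plain equal-width histogram (plug-in) estimate, swapped in for simplicity; the talk does not say which estimator was used. The bin count, function names, and the synthetic test series are all assumptions. The dependence measure is the transform 1 - exp(-2 I) defined earlier in the deck.

```python
import math
import random
from collections import Counter

def pearson_r(x, y):
    """Sample linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bin_index(values, n_bins):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def histogram_mi(x, y, n_bins=8):
    """Plug-in mutual information estimate (nats) from a 2D histogram."""
    n = len(x)
    bx, by = bin_index(x, n_bins), bin_index(y, n_bins)
    cx, cy, cxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    return sum(c / n * math.log(c * n / (cx[i] * cy[j]))
               for (i, j), c in cxy.items())

def dependence_functions(x, y, max_lag, n_bins=8):
    """Return (lag, r, lambda) with x[t] paired against y[t + lag]."""
    rows = []
    for lag in range(max_lag + 1):
        xs, ys = x[:len(x) - lag], y[lag:]
        lam = 1.0 - math.exp(-2.0 * histogram_mi(xs, ys, n_bins))
        rows.append((lag, pearson_r(xs, ys), lam))
    return rows

# Hypothetical series: y depends on x nonlinearly (absolute value) at lag 3.
random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(4000)]
y = [0.0] * 3 + [abs(v) for v in x[:-3]]
table = dependence_functions(x, y, max_lag=5)
```

In this synthetic case the linear correlation stays near zero at every lag, while the dependence measure peaks at lag 3, which is the kind of contrast the two functions are meant to expose.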
Cross-correlation – X vs. Y & X vs. Z
Panels: X vs. Y; X vs. Z
Cross-Correlation Function: r as a function of lag, r(lag)
Cross-Dependence Function: $\lambda$ as a function of lag, $\lambda(\mathrm{lag})$
Lagged Cross-correlation – X vs. Y
Panels: $X'_t = X_{t+10}$ vs. Y; $X'_t = X_{t+25}$ vs. Y
Cross-Correlation Function: r as a function of lag, r(lag)
Cross-Dependence Function: $\lambda$ as a function of lag, $\lambda(\mathrm{lag})$
Cross-correlation with Noise – X vs. Z
Panels: X vs. $Z' = Z + N(0,1)$; X vs. $Z' = Z + N(0,5)$
Cross-Correlation Function: r as a function of lag, r(lag)
Cross-Dependence Function: $\lambda$ as a function of lag, $\lambda(\mathrm{lag})$
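For the linear measure, the effect of additive noise can be written down directly: if the noise is independent of both series, the correlation of X with Z + N shrinks by the factor sqrt(var(Z) / (var(Z) + var(N))). The sketch below assumes the second argument of N(0, .) on this slide is a variance, which the slide does not state; the function name and sample values are hypothetical.

```python
import math

def attenuated_r(r, var_signal, var_noise):
    """Correlation of X with Z + N, given corr(X, Z) = r and independent noise N."""
    return r * math.sqrt(var_signal / (var_signal + var_noise))

# Hypothetical values: a clean correlation of 0.8 and a signal variance of 3.
r_clean = 0.8
r_noise_1 = attenuated_r(r_clean, var_signal=3.0, var_noise=1.0)
r_noise_5 = attenuated_r(r_clean, var_signal=3.0, var_noise=5.0)
```

More noise gives a smaller correlation, and, through the MSE formula for linear regression, a larger prediction error.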
Real Systems
1. Time Series (Linear & Nonlinear)
   a. Hydro-climatology
2. Spatial (Linear)
   a. Wind velocity
   b. High-resolution population
   c. Wind velocity (potentially, spatio-temporal)
Hydro-climatology
El Niño Southern Oscillation (ENSO) Index
Variability in river flows around the world
Figure: Correlation Coefficient (y-axis) vs. Year (x-axis); l = linear, nl = nonlinear
River panels: Ganges (l, nl); Amazon (linear, nonlinear); Parana (l, nl); Nile (l, nl); Congo (l, nl)
Combined panel: Et & ENSO (l, nl); Nile & ENSO (l, nl); Et & Nile (l, nl); Et, ENSO & Nile (l, nl)
High-resolution Population
“LandScan”: Developed at ORNL, used globally by mapping agencies & for disaster management
30 arc seconds for global; 3 arc seconds for USA
Census counts allocated to higher resolutions: Re-distribution model
Input variables like proximity to roads, slopes, night-time lights, land cover, etc.
Correlations can be directional
Directional Spatial Correlations: Autocorrelation, Aggregated LandScan USA
Figure: nine numbered panels, 1 to 8 plus 9 (9: Identical)
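One plausible reading of the nine panels is a 3x3 set of offsets: a grid is correlated with a copy shifted by one cell in each of eight directions, plus the unshifted ("identical") case. The sketch below implements that reading; it is an assumption, since the slide text does not describe the procedure, and the function names and the synthetic grid are hypothetical.

```python
import math

def pearson_r(a, b):
    """Sample linear correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

def directional_correlations(grid_a, grid_b, step=1):
    """Correlate grid_a[i][j] with grid_b[i + di][j + dj] for the 3x3 offsets."""
    rows, cols = len(grid_a), len(grid_a[0])
    result = {}
    for di in (-step, 0, step):
        for dj in (-step, 0, step):
            a, b = [], []
            # Only cells whose shifted partner stays inside the grid are used.
            for i in range(max(0, -di), min(rows, rows - di)):
                for j in range(max(0, -dj), min(cols, cols - dj)):
                    a.append(grid_a[i][j])
                    b.append(grid_b[i + di][j + dj])
            result[(di, dj)] = pearson_r(a, b)
    return result

# Hypothetical grid that varies along columns only, so shifts along rows
# leave the field unchanged while shifts along columns do not.
grid = [[math.sin(j) for j in range(20)] for _ in range(10)]
auto = directional_correlations(grid, grid)
```

Passing the same grid twice gives directional autocorrelations; passing two different co-registered grids gives directional cross-correlations.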
Directional Spatial Correlations: Cross-correlation, LandScan USA vs. Global
Figure: nine numbered panels, 1 to 8 plus 9 (9: Identical)
Directional Spatial Correlations: Cross-correlation, LandScan Global vs. Lights
Figure: nine numbered panels, 1 to 8 plus 9 (9: Identical)
Wind Velocity
U and V components
1013 millibars
Entire globe, 1 degree lat-long coverage
Note: Projections do not consider spherical nature of the data
Correlations can be directional
Spatial auto-correlation: U
Spatial auto-correlation: V
Spatial cross-correlation: U & V
Extremal Dependence
Emerging Literature
Ongoing Work
Applications
Statistics of Extremes: State of the Art
Law of large numbers & probability model
Regular values ~ CLT ~ Normal distribution
Extreme values (rate of occurrence) ~ Poisson distribution
Severity ~ Generalized Pareto (GP) distribution
Where occurrences are rare, probability models help to model and predict
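The Poisson-plus-GP description above corresponds to a peaks-over-threshold analysis, which can be sketched as follows. This is a simplified illustration: the GP parameters are fitted by the method of moments (shape = (1 - m^2/s^2)/2, scale = m (1 + m^2/s^2)/2, for excess mean m and variance s^2), which is swapped in for brevity and is not necessarily the fitting method used in the talk. The synthetic data, threshold, and function names are assumptions.

```python
import math
import random

def excesses_over(series, threshold):
    """Amounts by which observations exceed the threshold."""
    return [v - threshold for v in series if v > threshold]

def occurrence_rate(n_exceedances, n_observations):
    """Estimated Poisson rate of exceedances per observation."""
    return n_exceedances / n_observations

def fit_gp_moments(excesses):
    """Method-of-moments GP fit; valid only when the shape is below 1/2."""
    n = len(excesses)
    mean = sum(excesses) / n
    var = sum((e - mean) ** 2 for e in excesses) / (n - 1)
    ratio = mean * mean / var
    return 0.5 * (1.0 - ratio), 0.5 * mean * (1.0 + ratio)

def gp_survival(y, shape, scale):
    """Pr{excess > y} under the Generalized Pareto distribution."""
    if abs(shape) < 1e-9:
        return math.exp(-y / scale)
    base = 1.0 + shape * y / scale
    return base ** (-1.0 / shape) if base > 0 else 0.0

# Hypothetical data: exponential values with mean 2, for which the excesses
# over any threshold are again exponential (GP with shape 0, scale 2).
random.seed(1)
series = [random.expovariate(0.5) for _ in range(50000)]
u = 3.0
excesses = excesses_over(series, u)
rate = occurrence_rate(len(excesses), len(series))
shape, scale = fit_gp_moments(excesses)
# Probability that an observation exceeds u + 2: occurrence times severity.
p_exceed = rate * gp_survival(2.0, shape, scale)
```

The occurrence rate describes how often the threshold is crossed, and the GP survival function describes how severe a crossing is; their product estimates the probability of exceeding a higher level.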
Time series extremes
Declustering to identify extremes / events – Last decade
Probability models (probability of exceedance and probability given exceedance) – 2000 to 2004
ACF-like time-lagged univariate dependence – 2003
Multivariate normalizations – 2004
Extremal Dependence: Multiple Time Series
Which set of conditions in indicator variables produces extremes in a time series? Prob{ Y > u | X = x }
Poisson for occurrence of exceedances of a high threshold
Poisson parameter: $P(\lambda)$, with $\lambda = f(x, s, t)$; s: space; t: time
Generalized Pareto distribution for Pr{ Y > u }
GP parameters: $GP(\theta)$, with $\theta = f(x, s, t)$
Can we develop a new measure for quantifying the dependence in extremes? $\Pr\{ Y(t + \tau) > u \mid X(t) > v \}$
Autocorrelation-like measures exist for a single series (2003)
Multivariate extremal transformations exist (2004)
Develop cross-correlation measures for multiple series
A Method
Dependent variable, Y
Regional Precipitation
2D non-homogeneous point process
Fit Poisson, $P(\lambda)$, for occurrence of extremes
Fit Generalized Pareto, $GP(\theta)$, for extremes
Independent Variable, X
Ocean Temperature
Express $\theta$ as $\theta(x, s, t)$ and $\lambda$ as $\lambda(x, s, t)$
Find {x} which trigger {Y > u} for any s, t
Pr{ Y > u | X = x } as a function of (s, t)
A Measure
Time-lagged extremal associations
New method gives Pr{ Y > u | X = x }
Find Y > u given X > v
Develop CCF-like measure
Ledford and Tawn (2003): Extremal ACF: $\Pr\{ Y(t + \tau) > u \mid Y(t) > u \}$
Heffernan & Tawn (2004): Multivariate extremal transforms: $\Pr\{ Y(t + \tau) > u \mid X(t) > v \}$
CCF-like measure
Time-lagged multivariate extremal dependence
First step to space-time extremal dependence
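A simple empirical counterpart of such a CCF-like measure is the fraction of times Y exceeds u a given number of steps after X exceeds v. The sketch below is only that counting estimate, not the model-based approaches of Ledford and Tawn or Heffernan and Tawn; the function names and the toy series are hypothetical.

```python
def conditional_exceedance(x, y, u, v, lag):
    """Empirical Pr{ Y(t + lag) > u | X(t) > v } for a non-negative lag."""
    hits = total = 0
    for t in range(len(x) - lag):
        if x[t] > v:
            total += 1
            if y[t + lag] > u:
                hits += 1
    return hits / total if total else float("nan")

def extremal_ccf(x, y, u, v, max_lag):
    """Conditional exceedance probability for lags 0, 1, ..., max_lag."""
    return [conditional_exceedance(x, y, u, v, lag)
            for lag in range(max_lag + 1)]

# Toy series: every extreme in x is followed one step later by one in y.
x = [0, 5, 0, 0, 5, 0, 0, 0]
y = [0, 0, 6, 0, 0, 6, 0, 0]
profile = extremal_ccf(x, y, u=4, v=4, max_lag=2)
```

For the toy series the measure is 1 at lag 1 and 0 at lags 0 and 2. With real data, high thresholds leave few exceedances, so the raw counts become noisy, which is one motivation for the probability models discussed in this section.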
Applications: Climate Extremes
Abrupt Changes in the Paleo Climate
Science: August, 2004
Heat Waves in the 21st century
Higher: Intensity, frequency, duration
Science: March, 2003
Sudden regional change in past climate
1920s, northern latitudes
NRC (2002): Abrupt Climate Change Panel “Current use of statistics needs to be re-examined, as one cannot treat abrupt climate change in the same manner as one would treat the occurrence of a 100-year floods”
Next Steps
Regular Dependence
Linear correlation for space-time
Nonlinear dependence in space & space-time
Extremal dependence
Multivariate extremal dependence measures
Space and space-time
Linear versus nonlinear?
Real applications
Thank you!