# Variational Gaussian-process factor analysis for modeling spatio-temporal data


```
Variational Gaussian-process factor analysis for modeling spatio-temporal data

Jaakko Luttinen and Alexander Ilin
NIPS 2009

Presented by Bo Chen
February 26, 2010

Outline
•   Introduction: Factor Analysis (FA)
•   Introduction: Gaussian Process (GP)
•   Spatio-Temporal Factor Analysis
•   Factor Analysis with GP prior (GPFA)
•   Variational Bayesian Inference
•   Speeding up GPFA
•   Experiments
The Applications of Factor Analysis

• 1. Dimensionality Reduction
• 2. Dictionary Learning (Denoising and Inpainting)
• 3. Feature Selection (Gene Analysis)
• 4. Matrix Completion (Regression)
• 5. Spatial Dynamic Data Analysis
• …. Uncover the prominent structure from
the data
Gaussian Process
A Gaussian process defines a joint Gaussian distribution over the function
values f_x at any finite set of n inputs x:
f_x ~ N(μ(x), K(x, x'))
It is a probability distribution over functions, and it introduces extra
information from the input space.

Pros:
• Utilize the extra information from the input space
• Nonlinearity
Cons:
• Computational Complexity
Spatio-Temporal Factor Analysis

Time information
xd:: the time series of factor d

Spatial information
w:d: a spatially distributed factor vector
(M. N. Schmidt, ICML 2009)

The m-th row of Y corresponds to a spatial location lm (e.g., a location on a
two-dimensional map), and the n-th column corresponds to a time instance tn.
Introduce Gaussian Process Prior
Each time signal xd: contains values of a latent function X(t)
computed at time instances tn.

Each spatial signal w:d contains measurements of a function W(l)
at different locations lm.

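The likelihood function of the observed data, with O the set of observed
indices (m, n):
p(Y | W, X, σ) = ∏_{(m,n) ∈ O} N(y_mn | w_m:^T x_:n, σ_mn^2)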
Variational Bayesian Inference
The approximation of the true posterior:

The lower bound of the marginal log-likelihood:

Maximizing the lower bound, we can get
Inferred Posterior

Where Z: is a DNx1 vector formed by concatenation of vectors:

U is a DNxDN block-diagonal matrix with the following DxD
matrices on the diagonal:

In the paper, the authors assume isotropic noise: σ_mn = σ
Speeding Up GPFA (1)
• Component-Wise Factorization
Speeding Up GPFA (2)
• Inducing inputs
A set of auxiliary variables contains the values of the latent functions
Wd(l) and Xd(t) at a set of inducing input locations.

If the inducing inputs summarize the data well,

The approximate posterior:

Maximizing the new variational lower bound

We will get

Further VB update details can be found in the paper and in M. K. Titsias, AISTATS 2009.
Computational Complexity
Artificial Experiments
M=30 sensors (two-dimensional spatial locations)
N=200 time instances
D=4 temporal signals xd: generated by taking samples from GP priors with
different covariance kernels (see the next slide).

The spatial signals w:d were generated from GP priors over the two-dimensional
space using the squared exponential covariance kernel.

Data Y: 452 points are selected as observed and the remaining ones are treated
as missing.

The hyperparameters of the Gaussian processes were initialized randomly close
to the values used for data generation, assuming that a good guess about the
hidden signals can be obtained by exploratory analysis of the data.
Covariance Kernels
• Squared exponential function to model a slowly changing
component:

• Periodic function with decay to model a quasi-periodic component:

• Compactly supported piecewise polynomial function to model two
fast changing components with different time scales

• Squared exponential to model the spatial information
Results
Reconstruction of Global SST Using
the MOHSST5 Dataset
The authors demonstrate how the presented model can be used to reconstruct
global sea surface temperatures (SST) from historical measurements.

Data Description:
1. The U.K. Meteorological Office historical SST data set contains monthly
SST anomalies for the 1856-1991 period in 5°×5° longitude-latitude bins.

2. The dataset contains in total approximately 1600 time instances and 1700
spatial locations.

3. The dataset is sparse, especially during the 19th century and the World
Wars: 55% of the values are missing, which leaves more than 10^6 observations
in total.
Available at http://iridl.ldeo.columbia.edu/SOURCES/.KAPLAN/.RSA_MOHSST5.cuf/.OS/.ssta/?help+datafiles
Experimental Methodology
Factor number: D=80
Training set: 20%; Testing set: 80%
Covariance Kernels:
1. Five time signals xd: to describe climate trends: squared exponential
kernel.
2. Five temporal components to capture periodic signals: quasi-periodic kernel.
3. Five components to model prominent interannual phenomena such as El Niño:
squared exponential kernel.
4. The remaining 65 time signals: piecewise polynomial kernel.
5. Spatial patterns w:d: scaled squared exponential kernel. The distance r
between the locations li and lj was measured on the surface of the Earth using
the spherical law of cosines.
Inducing inputs:
1. Each spatial function wd(l): 500 inducing inputs.
2. The 15 temporal functions X(t) that model slow climate variability:
(1) the slowest: 80; (2) quasi-periodic: 300; (3) interannual: 300.
3. The remaining temporal components: priors with sparse covariance matrices,
which therefore allow efficient computation.
4. The inducing inputs were taken as a random subset of the original inputs
and kept fixed throughout learning.
Results
[Figure: two reconstruction panels, each marking the El Niño event]
First panel: reconstruction error 0.5714
Second panel: reconstruction error 0.6180
Conclusions
• 1. Gaussian-process factor analysis models spatio-temporal phenomena on
different scales by using properly selected GPs.

• 2. The parameters are inferred with variational Bayesian inference, which
takes into account the uncertainty about the unknown parameters.

• 3. The method uses all available data and combines all modeling assumptions
in one estimation procedure.

```
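The artificial experiment in the slides (M=30 sensors, N=200 time instances, D=4 temporal signals, 452 observed entries) can be sketched roughly as below. This is a minimal illustration and not the authors' code. The kernel formulas are standard textbook forms standing in for the equations that are missing from the slides, and every length scale, the period, the noise level, the seed, and the helper names (`sq_exp`, `quasi_periodic`, `piecewise_poly`, `sample_gp`, `make_data`) are assumptions chosen only for illustration.

```python
import numpy as np

def sq_exp(r, length):
    """Squared exponential kernel evaluated on distances r."""
    return np.exp(-0.5 * (r / length) ** 2)

def quasi_periodic(r, period, length_periodic, length_decay):
    """Periodic kernel multiplied by a squared exponential decay."""
    periodic = np.exp(-2.0 * np.sin(np.pi * r / period) ** 2 / length_periodic ** 2)
    return periodic * sq_exp(r, length_decay)

def piecewise_poly(r, length, input_dim=1):
    """Compactly supported piecewise polynomial kernel (assumed q=2 textbook form).

    The value is exactly zero for r >= length, so the covariance matrix is sparse.
    """
    j = input_dim // 2 + 3
    s = np.clip(r / length, 0.0, 1.0)
    poly = (j ** 2 + 4 * j + 3) * s ** 2 + (3 * j + 6) * s + 3
    return (1.0 - s) ** (j + 2) * poly / 3.0

def sample_gp(cov, n_samples, rng, jitter=1e-6):
    """Draw zero-mean GP samples; returns an array of shape (n_samples, n)."""
    n = cov.shape[0]
    chol = np.linalg.cholesky(cov + jitter * np.eye(n))
    return (chol @ rng.standard_normal((n, n_samples))).T

def make_data(M=30, N=200, noise_std=0.1, n_observed=452, seed=0):
    """Generate Y = W X + noise and hide all but n_observed entries."""
    rng = np.random.default_rng(seed)

    # Temporal signals: one kernel per component (hyperparameters are assumed).
    t = np.arange(N, dtype=float)
    rt = np.abs(t[:, None] - t[None, :])
    temporal_covs = [
        sq_exp(rt, 40.0),                     # slowly changing component
        quasi_periodic(rt, 25.0, 1.0, 80.0),  # quasi-periodic component
        piecewise_poly(rt, 12.0),             # fast component, longer time scale
        piecewise_poly(rt, 4.0),              # fast component, shorter time scale
    ]
    X = np.vstack([sample_gp(c, 1, rng) for c in temporal_covs])  # (D, N)

    # Spatial signals: squared exponential over random 2-D sensor locations.
    locs = rng.uniform(0.0, 10.0, size=(M, 2))
    rl = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    W = sample_gp(sq_exp(rl, 3.0), X.shape[0], rng).T  # (M, D)

    Y = W @ X + noise_std * rng.standard_normal((M, N))

    # Keep n_observed randomly chosen entries; mark the rest as missing (NaN).
    mask = np.zeros(M * N, dtype=bool)
    mask[rng.choice(M * N, size=n_observed, replace=False)] = True
    mask = mask.reshape(M, N)
    return np.where(mask, Y, np.nan), mask, W, X

Y_obs, mask, W, X = make_data()
print(Y_obs.shape, int(mask.sum()))
```

Calling `make_data()` returns the partially observed data matrix, the boolean observation mask, and the ground-truth factors `W` and `X`, which could then be compared against whatever a GPFA implementation recovers.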