        Variational Gaussian-process factor analysis for modeling spatio-temporal data

  Jaakko Luttinen and Alexander Ilin
              NIPS 2009

                          Presented by Bo Chen
                                 2.26, 2010
                  Outline
•   Introduction: Factor Analysis (FA)
•   Introduction: Gaussian Process (GP)
•   Spatio-Temporal Factor Analysis
•   Factor Analysis with GP prior (GPFA)
•   Variational Bayesian Inference
•   Speeding up GPFA
•   Experiments
Applications of Factor Analysis


• 1. Dimensionality reduction
• 2. Dictionary learning (denoising and
  inpainting)
• 3. Feature selection (gene analysis)
• 4. Matrix completion (regression)
• 5. Spatial dynamic data analysis
• …: in general, uncovering the prominent structure
  in the data
                 Gaussian Process
   A GP is a probability distribution over functions: the function
   values {f(x)} at any finite set of n inputs x are jointly Gaussian,
                 $f(x) \sim \mathcal{GP}\big(\mu(x), K(x, x')\big)$
   The covariance function K(x, x') brings in extra information
   from the input space.
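To make "a distribution over functions" concrete, the sketch below draws one function from a zero-mean GP at a finite set of inputs: build K from a kernel, factor it as K = L Lᵀ, and return f = L z with z standard normal, so f has covariance K. The squared exponential kernel, the helper names, and the small diagonal jitter (added for numerical stability) are illustrative assumptions, not taken from the slides.

```python
import math
import random


def se_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def sample_gp(inputs, kernel, rng, jitter=1e-6):
    """Draw f ~ N(0, K) with K[i][j] = kernel(x_i, x_j): a zero-mean GP at `inputs`."""
    n = len(inputs)
    K = [[kernel(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(inputs)]
         for i, a in enumerate(inputs)]
    L = cholesky(K)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # f = L z has covariance L L^T = K
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


times = [0.5 * i for i in range(20)]
f = sample_gp(times, se_kernel, random.Random(0))
```

Exact sampling needs the Cholesky factor of an n × n matrix, which is the cubic cost listed under "Cons" below.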


    Pros:
    • Utilizes extra information from the input space
    • Nonlinearity
    Cons:
    • Computational complexity: exact inference costs O(n³) for n inputs
     Spatio-Temporal Factor Analysis



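[Figure: the data matrix Y factorized as Y ≈ WX (M. N. Schmidt, ICML 2009);
 the rows of Y carry spatial information and the columns carry time information.]

 • w:d (column d of W): a factor vector distributed over space
 • xd: (row d of X): the time series of factor d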

 The m-th row of Y corresponds to a spatial location lm (e.g., a location on a two
 dimensional map) and the n-th column corresponds to a time instance tn
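In this factor model each entry is y_mn ≈ Σ_d w_md x_dn, i.e. Y ≈ WX with W of size M × D and X of size D × N. A minimal sketch with plain Python lists (the helper names `reconstruct` and `rmse_observed` are hypothetical) shows the product and how missing entries are simply skipped when measuring fit:

```python
def reconstruct(W, X):
    """Y_hat = W X: W is M x D (column w_:d is a spatial pattern), X is D x N (row x_d: is a time series)."""
    M, D, N = len(W), len(X), len(X[0])
    return [[sum(W[m][d] * X[d][n] for d in range(D)) for n in range(N)]
            for m in range(M)]


def rmse_observed(Y, Y_hat, observed):
    """Root-mean-square error over the observed entries only; missing entries are skipped."""
    errs = [(Y[m][n] - Y_hat[m][n]) ** 2
            for m in range(len(Y)) for n in range(len(Y[0])) if observed[m][n]]
    return (sum(errs) / len(errs)) ** 0.5


W = [[1.0], [2.0]]          # M = 2 locations, D = 1 factor
X = [[1.0, -1.0, 0.5]]      # N = 3 time instances
Y = [[1.0, -1.0, 0.5],
     [2.0, -2.0, 4.0]]      # the last entry disagrees with W X ...
observed = [[True, True, True],
            [True, True, False]]  # ... but it is missing, so it does not count
print(rmse_observed(Y, reconstruct(W, X), observed))  # 0.0
```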
  Introduce Gaussian Process Prior
Each time signal xd: contains the values of a latent function Xd(t)
evaluated at the time instances tn.



Each spatial signal w:d contains the values of a latent function Wd(l)
at the locations lm.
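The prior equations on this slide were lost in extraction. Under the usual zero-mean GP convention they can be written as below; this is our rendering, and the kernel symbols k^x_d and k^w_d are notation introduced here.

```latex
\begin{aligned}
p(\mathbf{X}) &= \prod_{d=1}^{D} \mathcal{N}\!\left(\mathbf{x}_{d:} \mid \mathbf{0}, \mathbf{K}_d\right),
  & [\mathbf{K}_d]_{ij} &= k^{x}_d(t_i, t_j), \\
p(\mathbf{W}) &= \prod_{d=1}^{D} \mathcal{N}\!\left(\mathbf{w}_{:d} \mid \mathbf{0}, \boldsymbol{\Sigma}_d\right),
  & [\boldsymbol{\Sigma}_d]_{ij} &= k^{w}_d(l_i, l_j).
\end{aligned}
```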




The likelihood function of the observed data:
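A standard way to write this likelihood for the factor model Y ≈ WX, with 𝒪 the set of observed index pairs (m, n) and a noise variance σ²_mn per observation (our reconstruction of the lost formula):

```latex
p(\mathbf{Y} \mid \mathbf{W}, \mathbf{X})
  = \prod_{(m,n) \in \mathcal{O}} \mathcal{N}\!\left(y_{mn} \mid \mathbf{w}_{m:}^{\top}\mathbf{x}_{:n},\; \sigma_{mn}^{2}\right)
```

Missing entries do not appear in the product, so no imputation is needed before inference.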
 Variational Bayesian Inference
The approximation of the true posterior:



The lower bound of the marginal log-likelihood:
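The two formulas on this slide were lost. A standard way to write them, assuming the mean-field split q(W)q(X) (consistent with the separate posterior over Z for X that follows, but our assumption here), is:

```latex
\begin{aligned}
p(\mathbf{W}, \mathbf{X} \mid \mathbf{Y}) &\approx q(\mathbf{W}, \mathbf{X}) = q(\mathbf{W})\, q(\mathbf{X}), \\
\log p(\mathbf{Y}) &\ge \mathcal{L}(q)
  = \big\langle \log p(\mathbf{Y}, \mathbf{W}, \mathbf{X}) \big\rangle_{q}
  - \big\langle \log q(\mathbf{W}, \mathbf{X}) \big\rangle_{q}.
\end{aligned}
```

The gap log p(Y) − 𝓛(q) equals KL(q ‖ p(W, X | Y)), so maximizing the bound drives q toward the true posterior.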




Maximizing the lower bound yields:
                Inferred Posterior

 where Z is a DN×1 vector formed by concatenating the vectors:




U is a DN×DN block-diagonal matrix with the following D×D
matrices on the diagonal:




In the paper, the authors assume isotropic noise, i.e. a single variance shared by all observations: σ²_mn = σ².
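Under isotropic noise the diagonal blocks follow from standard linear-Gaussian algebra. Assuming Z stacks x:1, …, x:N (so the n-th block of U belongs to time n) and writing O_n for the locations observed at time n, our reconstruction is U_n = σ⁻² Σ_{m∈O_n} ⟨w_m: w_m:ᵀ⟩ and u_n = σ⁻² Σ_{m∈O_n} ⟨w_m:⟩ y_mn, giving q(Z) = N(Z | (K⁻¹ + U)⁻¹ u, (K⁻¹ + U)⁻¹), where K is the prior covariance of Z. The sketch below (hypothetical helper `likelihood_blocks`, plain Python) computes U_n and u_n from the current q(W); it does not form the DN × DN solve.

```python
def likelihood_blocks(Y, observed, W_mean, W_cov, sigma2):
    """Per-time-instance terms entering q(Z) under isotropic noise sigma2.

    Y[m][n]     data value (read only where observed[m][n] is True)
    W_mean[m]   posterior mean <w_m:> of row m of W (length D)
    W_cov[m]    posterior covariance of w_m: (D x D)
    Returns U (N blocks U_n, each D x D) and u (N vectors u_n).
    """
    M, N, D = len(Y), len(Y[0]), len(W_mean[0])
    U, u = [], []
    for n in range(N):
        U_n = [[0.0] * D for _ in range(D)]
        u_n = [0.0] * D
        for m in range(M):
            if not observed[m][n]:
                continue  # a missing y_mn contributes nothing
            mean, cov = W_mean[m], W_cov[m]
            for i in range(D):
                # second moment <w w^T> = <w><w>^T + Cov(w)
                for j in range(D):
                    U_n[i][j] += (mean[i] * mean[j] + cov[i][j]) / sigma2
                u_n[i] += mean[i] * Y[m][n] / sigma2
        U.append(U_n)
        u.append(u_n)
    return U, u


Y = [[1.0, 2.0], [3.0, 0.0]]
observed = [[True, True], [True, False]]      # y_11 is missing
W_mean = [[1.0, 0.0], [0.0, 1.0]]             # M = 2, D = 2
W_cov = [[[0.1, 0.0], [0.0, 0.1]], [[0.1, 0.0], [0.0, 0.1]]]
U, u = likelihood_blocks(Y, observed, W_mean, W_cov, sigma2=0.5)
```

Because each U_n is only D × D, the block-diagonal structure is what keeps U cheap to assemble; the expensive part is combining it with K⁻¹.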
     Speeding Up GPFA (1)
• Component-Wise Factorization
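Component-wise factorization can be read as the following further restriction on the approximating distribution (our rendering of the slide's lost formula):

```latex
q(\mathbf{X}) = \prod_{d=1}^{D} q(\mathbf{x}_{d:}), \qquad
q(\mathbf{W}) = \prod_{d=1}^{D} q(\mathbf{w}_{:d})
```

Each factor then needs only an N × N (or M × M) matrix inversion instead of one joint DN × DN (or DM × DM) inversion, at the price of ignoring posterior correlations between components.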
                 Speeding Up GPFA (2)
  • Inducing inputs
 Introduce a set of auxiliary variables that contain the values of the latent
 functions Wd(l), Xd(t) at some locations and time instances (the inducing inputs).


 If the inducing inputs summarize the data well,


 The approximate posterior:



 Maximizing the new variational lower bound



We obtain:

   Details of the VB updates can be found in the paper and in M. K. Titsias, AISTATS 2009.
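Whether the inducing inputs "summarize the data well" can be checked numerically: the prior variance of Xd(t) left unexplained by its values at the inducing inputs is diag(K − K_xz K_zz⁻¹ K_zx), which appears (summed) in the trace term of Titsias-style bounds. The sketch below computes that residual in pure Python; the squared exponential kernel, the two inducing grids, and all helper names are illustrative assumptions, not the authors' code.

```python
import math


def se_kernel(a, b, lengthscale=1.0):
    """Squared exponential covariance between scalar inputs."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L v = b for lower-triangular L."""
    v = []
    for i in range(len(b)):
        v.append((b[i] - sum(L[i][k] * v[k] for k in range(i))) / L[i][i])
    return v


def nystrom_residual_variance(inputs, inducing, lengthscale=1.0, jitter=1e-8):
    """diag(K - K_xz K_zz^{-1} K_zx): prior variance left unexplained by the inducing values."""
    Kzz = [[se_kernel(a, b, lengthscale) + (jitter if i == j else 0.0)
            for j, b in enumerate(inducing)] for i, a in enumerate(inducing)]
    L = cholesky(Kzz)
    residual = []
    for t in inputs:
        k = [se_kernel(t, z, lengthscale) for z in inducing]
        v = forward_solve(L, k)  # v . v = k^T Kzz^{-1} k
        residual.append(se_kernel(t, t, lengthscale) - sum(vi * vi for vi in v))
    return residual


grid = [0.1 * i for i in range(101)]                                    # t in [0, 10]
dense = nystrom_residual_variance(grid, [float(z) for z in range(11)])  # one inducing input per unit
sparse = nystrom_residual_variance(grid, [0.0, 5.0, 10.0])              # one per five units
print(round(max(dense), 4), round(max(sparse), 4))
```

With inducing inputs spaced at the kernel's lengthscale the residual stays small everywhere; with wide spacing most of the prior variance is unexplained, so the approximation would be poor.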
Computational Complexity
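The slide's complexity comparison did not survive extraction. A rough way to see the gains is to count only the dominant terms, up to constant factors: a joint posterior over all DN temporal values needs one DN × DN factorization, the component-wise factorization needs D factorizations of size N × N, and with K_d inducing inputs we assume the usual O(N K_d²) cost of Titsias-style sparse approximations. The numbers below reuse the SST experiment's sizes (N ≈ 1600, D = 80, and the inducing counts of its 15 slow temporal components); the helper names are hypothetical.

```python
def full_cost(D, N):
    """Joint Gaussian over all DN latent values: one DN x DN factorization."""
    return (D * N) ** 3


def componentwise_cost(D, N):
    """q(X) = prod_d q(x_d:): D independent N x N factorizations."""
    return D * N ** 3


def inducing_cost(N, inducing_counts):
    """Sparse approximation: about N * K_d^2 per component with K_d inducing inputs."""
    return sum(N * k ** 2 for k in inducing_counts)


D, N = 80, 1600
print(full_cost(D, N) // componentwise_cost(D, N))  # 6400, i.e. D**2
slow = [80] * 5 + [300] * 10  # inducing inputs of the 15 slow temporal components
print(componentwise_cost(len(slow), N), inducing_cost(N, slow))  # 61440000000 1491200000
```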
               Artificial Experiments
 M=30 sensors (two-dimensional spatial locations)
 N=200 time instances
 D=4 temporal signals xd: generated by sampling from GP
 priors with different covariance kernels (see next page).

 The loadings were generated from GPs over the two-dimensional
 space using the squared exponential covariance kernel.

 Data Y: 452 entries are selected as observed and the remaining
 ones are treated as missing.
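With M = 30 and N = 200, Y has 6000 entries, so only about 7.5% of it is observed. The sketch below builds such an observation mask, assuming the 452 entries are chosen uniformly at random (the slide does not say how they were selected); the fixed seed is for reproducibility only.

```python
import random

M, N = 30, 200          # sensors and time instances
n_observed = 452        # observed entries of Y
rng = random.Random(0)  # fixed seed; uniform random selection is an assumption

chosen = rng.sample(range(M * N), n_observed)
observed = [[False] * N for _ in range(M)]
for idx in chosen:
    observed[idx // N][idx % N] = True  # row = sensor, column = time instance

fraction = n_observed / (M * N)
print(f"{fraction:.1%} of Y is observed")  # 7.5% of Y is observed
```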

The hyperparameters of the Gaussian processes were initialized randomly near
the values used for data generation, assuming that a good guess about the
hidden signals can be obtained by exploratory analysis of the data.
             Covariance Kernels
• Squared exponential function to model a slowly changing
  component:



• Periodic function with decay to model a quasi-periodic component:



• Compactly supported piecewise polynomial function to model two
  fast-changing components with different time scales




• Squared exponential to model the spatial information
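The kernel formulas on this slide were lost. The sketch below gives common parameterizations of the three temporal kernels as functions of the distance r; the exact forms and parameter names are assumptions, and the piecewise polynomial follows the q = 2 compactly supported form from Rasmussen and Williams' GP textbook. All kernels equal 1 at r = 0 (unit variance).

```python
import math


def squared_exponential(r, lengthscale):
    """Slowly changing component (also usable over space)."""
    return math.exp(-0.5 * (r / lengthscale) ** 2)


def periodic_with_decay(r, period, decay_lengthscale, smoothness):
    """Quasi-periodic component: a periodic kernel damped by a squared exponential envelope."""
    return math.exp(-0.5 * (r / decay_lengthscale) ** 2
                    - 2.0 * math.sin(math.pi * r / period) ** 2 / smoothness ** 2)


def piecewise_polynomial(r, lengthscale, input_dim=1):
    """Compactly supported piecewise polynomial (q = 2): exactly zero once |r| >= lengthscale."""
    s = abs(r) / lengthscale
    if s >= 1.0:
        return 0.0
    j = input_dim // 2 + 3
    return (1.0 - s) ** (j + 2) * ((j * j + 4 * j + 3) * s * s + (3 * j + 6) * s + 3.0) / 3.0


# Monthly data, lags in months; all parameter values are made up for illustration
for lag in (0, 6, 12, 24):
    print(lag,
          round(squared_exponential(lag, 120.0), 3),
          round(periodic_with_decay(lag, 12.0, 240.0, 1.0), 3),
          round(piecewise_polynomial(lag, 18.0), 3))
```

The compact support is what makes the piecewise polynomial attractive for fast components: beyond its lengthscale the covariance is exactly zero, so the covariance matrix is sparse.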
Results
         Reconstruction of Global SST Using
              the MOHSST5 Dataset
The authors demonstrate how the presented model can be used to reconstruct
global sea surface temperatures (SST) from historical measurements.

  Data Description:
  1. U.K. Meteorological Office historical SST data set (MOHSST5), which
  contains monthly SST anomalies for the 1856-1991 period in
  5°×5° longitude-latitude bins.

  2. The dataset contains in total approximately 1600 time
  instances and 1700 spatial locations.

  3. The dataset is sparse, especially during the 19th century
  and the World Wars: 55% of the values are missing, which
  still leaves more than 10^6 observations in total.
Available at http://iridl.ldeo.columbia.edu/SOURCES/.KAPLAN/.RSA_MOHSST5.cuf/.OS/.ssta/?help+datafiles
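The "more than 10^6 observations" figure follows from the approximate sizes quoted above; a quick check in integer arithmetic (the sizes are approximate, so the result is too):

```python
times, locations = 1600, 1700   # approximate sizes quoted above
missing_percent = 55

total = times * locations
observed = total * (100 - missing_percent) // 100
print(total, observed, observed > 10 ** 6)  # 2720000 1224000 True
```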
          Experimental Methodology
      Number of factors: D = 80;   training set: 20%;   test set: 80%
    Covariance Kernels:
1. Five time signals xd: describing climate trends: squared exponential
kernel.
2. Five temporal components capturing periodic signals: quasi-periodic kernel.
3. Five components modeling prominent interannual phenomena such as
El Niño: squared exponential kernel.
4. The remaining 65 time signals: piecewise polynomial kernel.
5. Spatial patterns w:d: scaled squared exponential kernel. The distance r between
locations li and lj was measured on the surface of the Earth using the spherical
law of cosines.
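The spherical law of cosines gives the great-circle distance d = R·arccos(sin φ₁ sin φ₂ + cos φ₁ cos φ₂ cos Δλ) for latitudes φ and longitude difference Δλ. The sketch below pairs it with a scaled squared exponential kernel; the Earth radius in kilometres, the reading of "scaled" as a multiplicative amplitude, and the helper names are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the unit choice is an assumption


def great_circle_distance(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Distance along the sphere via the spherical law of cosines (inputs in degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    c = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dlon)
    # Clamp against round-off before acos
    return radius * math.acos(max(-1.0, min(1.0, c)))


def scaled_se(lat1, lon1, lat2, lon2, scale, lengthscale_km):
    """Squared exponential in great-circle distance, with amplitude scale**2."""
    r = great_circle_distance(lat1, lon1, lat2, lon2)
    return scale ** 2 * math.exp(-0.5 * (r / lengthscale_km) ** 2)


print(round(great_circle_distance(0.0, 0.0, 0.0, 90.0)))  # a quarter of the equator, in km
```

For very short distances the arccos form loses precision (its argument is close to 1); at the 5° bin spacing of this dataset that effect is negligible.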
Inducing inputs:
 1. Each spatial function Wd(l): 500 inducing inputs.
 2. The 15 temporal functions Xd(t) modeling slow climate variability:
     (1) the slowest: 80; (2) quasi-periodic: 300; (3) interannual: 300.
 3. The remaining temporal components: their priors have sparse covariance
 matrices, which already allow efficient computation.
 4. The inducing inputs were taken as a random subset of the original inputs and
 then kept fixed throughout learning.
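Why compactly supported priors need no inducing inputs: on a regular monthly grid, only pairs of time instances closer than the kernel's support have nonzero covariance. The sketch counts those entries for N = 1600 using a hypothetical 12-month support; since the kernel is stationary, every pair at the same lag shares one value, so it suffices to loop over lags.

```python
def pp_kernel(s, j=3):
    """Compactly supported piecewise polynomial in scaled distance s (1-D input, so j = 3)."""
    s = abs(s)
    if s >= 1.0:
        return 0.0
    return (1.0 - s) ** (j + 2) * ((j * j + 4 * j + 3) * s * s + (3 * j + 6) * s + 3.0) / 3.0


def covariance_nonzeros(N, support):
    """Count nonzero entries of the N x N covariance on the grid t = 0, 1, ..., N-1.

    Lag 0 occurs N times and lag k > 0 occurs 2 * (N - k) times.
    """
    nnz = 0
    for lag in range(N):
        if pp_kernel(lag / support) != 0.0:
            nnz += N if lag == 0 else 2 * (N - lag)
    return nnz


N, support = 1600, 12  # monthly grid; the 12-month support is a hypothetical value
nnz = covariance_nonzeros(N, support)
print(nnz, nnz / N ** 2)
```

Here only about 1.4% of the entries are nonzero, so sparse linear algebra can exploit the banded structure instead of working with a dense 1600 × 1600 matrix.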
                 Results


[Figure: two El Niño reconstructions, with reported reconstruction errors of 0.5714 and 0.6180.]
                 Conclusions
• 1. Gaussian process factor analysis models
  spatio-temporal phenomena on different scales
  through properly selected GP priors.

• 2. The parameters are inferred with variational
  Bayes, which takes into account the uncertainty
  about the unknown parameters.

• 3. Use all available data and combine all modeling
  assumptions in one estimation procedure