# A Global Geometric Framework for Nonlinear Dimensionality

## Nonlinear Dimensionality Reduction Approaches

## Dimensionality Reduction

The goal: discover the meaningful low-dimensional structures hidden in high-dimensional observations.

Classical techniques:
- Principal Component Analysis (PCA): preserves the variance
- Multidimensional Scaling (MDS): preserves inter-point distances

Nonlinear approaches:
- Isomap
- Locally Linear Embedding (LLE)
## Common Framework

Algorithm:
- Given data D = {x_1, ..., x_n}, construct an n×n affinity matrix M.
- Normalize M, yielding M̃.
- Compute the m largest eigenvalues λ_i and eigenvectors v_i of M̃. Only positive eigenvalues should be considered.
- The embedding of each example x_j is the vector y_j, whose i-th element y_ij is the j-th component of the i-th principal eigenvector v_i of M̃. Alternatively (MDS and Isomap), the embedding is e_j, with e_ij = sqrt(λ_i) y_ij. If the first m eigenvalues are positive, then e_i · e_j is the best approximation of M̃_ij using only m coordinates, in the sense of squared error.
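The shared recipe above can be sketched in NumPy. This is a minimal illustration, not code from any of the papers; the function name `spectral_embedding` and the synthetic affinity matrix in the test are assumptions for demonstration:

```python
import numpy as np

def spectral_embedding(M_tilde, m):
    """Embed examples using the m principal eigenvectors of a
    normalized affinity matrix M_tilde (illustrative helper name).

    Returns (Y, E): column i of Y is the i-th principal eigenvector,
    so y_ij is its j-th component; E scales each eigenvector by
    sqrt(eigenvalue), as in MDS and Isomap."""
    eigvals, eigvecs = np.linalg.eigh(M_tilde)   # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:m]          # m largest eigenvalues
    lam, V = eigvals[idx], eigvecs[:, idx]
    keep = lam > 0                               # only positive eigenvalues
    lam, V = lam[keep], V[:, keep]
    Y = V                                        # plain eigenvector embedding
    E = V * np.sqrt(lam)                         # e_ij = sqrt(lam_i) * y_ij
    return Y, E
```

With the scaled embedding, the dot products e_i · e_j reproduce M̃ up to the best rank-m squared-error approximation, which the test below checks on a rank-2 matrix.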
## Linear Dimensionality Reduction

PCA:
- Finds a low-dimensional embedding of the data points that best preserves their variance as measured in the high-dimensional input space.

MDS:
- Finds an embedding that preserves the inter-point distances; equivalent to PCA when the distances are Euclidean.
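A minimal PCA sketch under this definition (project centered data onto the top eigenvectors of the covariance matrix; the helper name `pca` is illustrative):

```python
import numpy as np

def pca(X, d):
    """Project centered data onto the d directions of maximal variance,
    i.e. the top-d eigenvectors of the covariance matrix."""
    Xc = X - X.mean(axis=0)                  # center the data
    C = Xc.T @ Xc / len(X)                   # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # ascending eigenvalues
    W = eigvecs[:, ::-1][:, :d]              # top-d principal directions
    return Xc @ W                            # low-dimensional embedding
```

For data lying exactly on a line in 2D, a one-dimensional PCA embedding preserves all of the variance.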
## Multidimensional Scaling

- MDS starts from a notion of distance or affinity that is computed between each pair of training examples.
- The normalizing step converts distances to dot products using the "double-centering" formula:

  M̃_ij = -1/2 (M_ij - S_i/n - S_j/n + T/n²),  where S_i = Σ_j M_ij and T = Σ_jk M_jk.

- The embedding e_ik of example x_i is given by e_ik = sqrt(λ_k) v_ik, where v_k is the k-th eigenvector of M̃ and v_ik its i-th component.
- Note that if M_ij = ||y_i - y_j||², then M̃_ij = (y_i - ȳ)·(y_j - ȳ), where ȳ is the average of the y_i.
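The double-centering step is short in code. A minimal sketch (the name `double_center` is illustrative), with a test of the stated identity that squared Euclidean distances turn into centered inner products:

```python
import numpy as np

def double_center(M):
    """Convert a matrix of squared distances into inner products via
    the double-centering formula."""
    n = len(M)
    S = M.sum(axis=1)                        # row sums S_i
    T = M.sum()                              # grand total
    return -0.5 * (M - S[:, None] / n - S[None, :] / n + T / n**2)
```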
## Nonlinear Dimensionality Reduction

- Many data sets contain essential nonlinear structures that are invisible to PCA and MDS.
- This motivates nonlinear dimensionality reduction approaches.
## A Global Geometric Framework for Nonlinear Dimensionality Reduction (Isomap)

Joshua B. Tenenbaum, Vin de Silva, John C. Langford
## Example

- 64×64 input images form 4096-dimensional vectors.
- Intrinsically, three dimensions are enough to represent them: two pose parameters and the azimuthal lighting angle.

- Isomap combines the major algorithmic features of PCA and MDS (computational efficiency, global optimality, and asymptotic convergence guarantees) with the flexibility to learn a broad class of nonlinear manifolds.
## Example of Nonlinear Structure

- Swiss roll: only the geodesic distances reflect the true low-dimensional geometry of the manifold.
## Intuition

- Built on top of MDS.
- Captures the geodesic manifold path between any two points by concatenating shortest paths in between.
- Approximates these in-between shortest paths given only input-space distances.
## Algorithm Description

- Step 1: Determine neighboring points within a fixed radius based on the input-space distance d_X(i, j). These neighborhood relations are represented as a weighted graph G over the data points.
- Step 2: Estimate the geodesic distances d_M(i, j) between all pairs of points on the manifold M by computing their shortest-path distances d_G(i, j) in the graph G.
- Step 3: Construct an embedding of the data in d-dimensional Euclidean space Y that best preserves the manifold's geometry.
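Steps 1 and 2 can be sketched as follows. This is a minimal illustration assuming the fixed-radius (epsilon-neighborhood) rule and Floyd-Warshall shortest paths; the function name `geodesic_distances` is not from the paper:

```python
import numpy as np

def geodesic_distances(X, eps):
    """Step 1: connect points within radius eps in input space.
    Step 2: approximate geodesic distances by shortest paths in the
    resulting graph (Floyd-Warshall)."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # input-space distances
    G = np.where(D <= eps, D, np.inf)        # epsilon-neighborhood graph
    np.fill_diagonal(G, 0.0)
    for k in range(n):                       # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, [k]] + G[[k], :])
    return G
```

On an L-shaped set of points, the graph distance between the endpoints follows the chain of neighbors rather than the shorter straight-line distance, which is exactly the behavior the Swiss-roll example relies on.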
## Construct Embeddings

The coordinate vectors y_i for points in Y are chosen to minimize the cost function

  E = ||τ(D_G) - τ(D_Y)||_{L²}

where D_Y denotes the matrix of Euclidean distances d_Y(i, j) = ||y_i - y_j||, the L² matrix norm is ||A||_{L²} = sqrt(Σ_ij A_ij²), and the τ operator converts distances to inner products.
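Step 3 then reduces to classical MDS on the geodesic distances. A minimal sketch, assuming τ(D) = -H D² H / 2 with the centering matrix H (the function name `isomap_embedding` is illustrative):

```python
import numpy as np

def isomap_embedding(D_G, d):
    """Apply the tau operator (double-center the squared geodesic
    distances) and embed with the top d scaled eigenvectors, as in
    classical MDS."""
    n = len(D_G)
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    tau = -0.5 * H @ (D_G ** 2) @ H          # distances -> inner products
    eigvals, eigvecs = np.linalg.eigh(tau)
    idx = np.argsort(eigvals)[::-1][:d]      # d largest eigenvalues
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))
```

When the input distances are already Euclidean distances of one-dimensional points, the one-dimensional embedding reproduces them exactly.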
## Dimension

- The true dimensionality of the data can be estimated from the decrease in error as the dimensionality of Y is increased.
## Manifold Recovery Guarantee

- Isomap is guaranteed asymptotically to recover the true dimensionality and geometric structure of nonlinear manifolds.
- As the number of sample points increases, the graph distances d_G(i, j) provide increasingly better approximations to the intrinsic geodesic distances d_M(i, j).
## Examples

- Interpolations between distant points in the low-dimensional coordinate space.
## Summary

- Isomap handles nonlinear manifolds.
- Isomap keeps the advantages of PCA and MDS:
  - Non-iterative procedure
  - Polynomial-time procedure
  - Guaranteed convergence
- Isomap represents the global structure of a data set within a single coordinate system.
## Nonlinear Dimensionality Reduction by Locally Linear Embedding

Sam T. Roweis and Lawrence K. Saul
## LLE

- Neighborhood-preserving embeddings
- Mapping to a global coordinate system of low dimensionality
- No need to estimate pairwise distances between widely separated points
- Recovers global nonlinear structure from locally linear fits
## Algorithm Description

- We expect each data point and its neighbors to lie on or close to a locally linear patch of the manifold.
- We reconstruct each point from its neighbors by minimizing the error

  ε(W) = Σ_i |X_i - Σ_j W_ij X_j|²

  where W_ij summarizes the contribution of the j-th data point to the reconstruction of the i-th, and is what we estimate by optimizing this error.
- Each point is reconstructed from only its neighbors.
- The weights for each point sum to one: Σ_j W_ij = 1.
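The constrained least-squares problem above has a closed form per point via the local Gram matrix. A minimal sketch assuming neighbor indices are given (the function name `lle_weights` and the regularization constant are illustrative choices, not from the paper):

```python
import numpy as np

def lle_weights(X, neighbors):
    """For each point i, solve for weights over its neighbors that
    minimize |x_i - sum_j W_ij x_j|^2 subject to the weights summing
    to 1, using the shifted local Gram matrix."""
    W = np.zeros((len(X), len(X)))
    for i, nbrs in enumerate(neighbors):
        Z = X[nbrs] - X[i]                   # shift neighbors to the origin
        C = Z @ Z.T                          # local Gram (covariance) matrix
        C += 1e-9 * np.eye(len(nbrs))        # regularize if nearly singular
        w = np.linalg.solve(C, np.ones(len(nbrs)))
        W[i, nbrs] = w / w.sum()             # enforce the sum-to-1 constraint
    return W
```

For collinear points the reconstruction is exact: the midpoint of two neighbors gets weights (0.5, 0.5).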
## Algorithm Description

- A linear mapping transforms the high-dimensional coordinates of each neighbor to global internal coordinates on the manifold, found by minimizing the embedding cost

  min_Y Φ(Y),  where Φ(Y) = Σ_i |Y_i - Σ_j W_ij Y_j|²

- Note that the cost defines a quadratic form

  Φ(Y) = Σ_ij M_ij (Y_i · Y_j)

  where M_ij = δ_ij - W_ij - W_ji + Σ_k W_ki W_kj.

- The optimal embedding is found by computing the bottom d eigenvectors of M (discarding the bottommost, constant eigenvector), where d is the dimension of the embedding.
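In matrix form the quadratic above is M = (I - W)ᵀ(I - W), whose entries match δ_ij - W_ij - W_ji + Σ_k W_ki W_kj term by term. A minimal sketch of the final step (the function name `lle_embedding` is illustrative):

```python
import numpy as np

def lle_embedding(W, d):
    """Build M = (I - W)^T (I - W) and embed with its bottom d
    eigenvectors, skipping the bottommost (constant) eigenvector,
    which has eigenvalue 0 because rows of W sum to 1."""
    n = len(W)
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    eigvals, eigvecs = np.linalg.eigh(M)     # ascending eigenvalues
    return eigvecs[:, 1:d + 1]               # skip the constant mode
```

Because M is symmetric, the returned coordinates are orthonormal and orthogonal to the constant vector, which centers the embedding.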
## Illustration

## Examples

- Two-dimensional embeddings of faces

## Thank you
