
Nonlinear Dimensionality Reduction Approaches

Dimensionality Reduction
The goal: recover the meaningful low-dimensional structures hidden in high-dimensional observations.
Classical techniques:
- Principal Component Analysis (PCA): preserves the variance
- Multidimensional Scaling (MDS): preserves inter-point distances
Nonlinear techniques: Isomap and Locally Linear Embedding (LLE).

Common Framework
Algorithm, given data D = {x_1, ..., x_n}:
1. Construct an n x n affinity matrix M.
2. Normalize M, yielding \tilde{M}.
3. Compute the m largest eigenvalues \lambda_j and eigenvectors v_j of \tilde{M}. Only positive eigenvalues should be considered.
4. The embedding of example x_j is the vector y_j, whose i-th coordinate y_{ij} is the j-th element of the i-th principal eigenvector v_i of \tilde{M}. Alternatively (MDS and Isomap), the embedding is e_j, with e_{ij} = \sqrt{\lambda_i} \, y_{ij}. If the first m eigenvalues are positive, then e_i \cdot e_j is the best approximation of \tilde{M}_{ij} using only m coordinates, in the sense of squared error.

Linear Dimensionality Reduction
- PCA finds a low-dimensional embedding of the data points that best preserves their variance as measured in the high-dimensional input space.
- MDS finds an embedding that preserves the inter-point distances; it is equivalent to PCA when those distances are Euclidean.

Multi-Dimensional Scaling
MDS starts from a notion of distance or affinity that is computed between each pair of training examples. The normalizing step is equivalent to converting distances to dot products via the "double-centering" formula:

\tilde{M}_{ij} = -\frac{1}{2} \left( M_{ij} - \frac{1}{n} S_i - \frac{1}{n} S_j + \frac{1}{n^2} \sum_k S_k \right), \qquad S_i = \sum_j M_{ij}.

The embedding of example x_i is given by e_{ik} = \sqrt{\lambda_k} \, v_{ki}, where v_k is the k-th eigenvector of \tilde{M}. Note that if M_{ij} = \| y_i - y_j \|^2, then \tilde{M}_{ij} = (y_i - \bar{y}) \cdot (y_j - \bar{y}), where \bar{y} is the average of the y_i.

Nonlinear Dimensionality Reduction
Many data sets contain essential nonlinear structures that are invisible to PCA and MDS; these call for nonlinear dimensionality reduction approaches.

A Global Geometric Framework for Nonlinear Dimensionality Reduction (Isomap)
Joshua B. Tenenbaum, Vin de Silva, and John C. Langford

Example
64 x 64 input images form 4096-dimensional vectors, yet intrinsically three dimensions are enough to represent them: two pose parameters and the azimuthal lighting angle.

Isomap Advantages
- Combines the major algorithmic features of PCA and MDS: computational efficiency, global optimality, and asymptotic convergence guarantees
- Flexibility to learn a broad class of nonlinear manifolds

Example of Nonlinear Structure
The Swiss roll: only the geodesic distances reflect the true low-dimensional geometry of the manifold.

Intuition
- Build on top of MDS.
- Capture the geodesic path on the manifold between any two points by concatenating shortest paths in between.
- Approximate these in-between shortest paths given only input-space distances.

Algorithm Description
Step 1: Determine neighboring points within a fixed radius, based on the input-space distances d_X(i, j). These neighborhood relations are represented as a weighted graph G over the data points.
Step 2: Estimate the geodesic distances d_M(i, j) between all pairs of points on the manifold M by computing their shortest-path distances d_G(i, j) in the graph G.
Step 3: Construct an embedding of the data in d-dimensional Euclidean space Y that best preserves the manifold's geometry.

Construct Embeddings
The coordinate vectors y_i for points in Y are chosen to minimize the cost function

E = \| \tau(D_G) - \tau(D_Y) \|_{L^2},

where D_Y denotes the matrix of Euclidean distances d_Y(i, j) = \| y_i - y_j \|, \| A \|_{L^2} = \sqrt{\sum_{i,j} A_{ij}^2} is the L^2 matrix norm, and the operator \tau converts distances to inner products.

Dimension
The true dimensionality of the data can be estimated from the decrease in error as the dimensionality of Y is increased.
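The three Isomap steps, together with the double-centering MDS eigendecomposition described under the Common Framework, can be sketched in a few lines of NumPy. This is an illustrative toy implementation, not the authors' code: the function name isomap_sketch, the k-nearest-neighbor rule (the paper also allows a fixed-radius neighborhood), and the O(n^3) Floyd-Warshall shortest-path step are simplifications chosen here for brevity.

```python
import numpy as np

def isomap_sketch(X, k=5, m=2):
    """Toy Isomap: k-NN graph -> shortest-path geodesics -> classical MDS."""
    n = X.shape[0]
    # Step 1: neighborhood graph G from input-space distances d_X(i, j).
    d_X = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        for j in np.argsort(d_X[i])[1:k + 1]:   # k nearest neighbors of i
            G[i, j] = G[j, i] = d_X[i, j]
    # Step 2: geodesic estimates d_G(i, j) via Floyd-Warshall shortest paths.
    for v in range(n):
        G = np.minimum(G, G[:, v:v + 1] + G[v:v + 1, :])
    # Step 3: classical MDS on d_G -- the double-centering formula, then the
    # top m eigenvectors scaled by sqrt(lambda).
    J = np.eye(n) - np.ones((n, n)) / n
    M = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(M)
    idx = np.argsort(vals)[::-1][:m]
    lam = np.clip(vals[idx], 0.0, None)         # only positive eigenvalues count
    return vecs[:, idx] * np.sqrt(lam)

# A 1-D arc embedded in 2-D: Isomap should unroll it onto a line whose
# length matches the arc length (about pi), not the straight-line chord.
t = np.linspace(0.0, np.pi, 30)
X = np.column_stack([np.cos(t), np.sin(t)])
Y = isomap_sketch(X, k=3, m=1)
print(Y.shape, abs(Y[0, 0] - Y[-1, 0]))
```

On this arc the input-space distance between the two endpoints is 2, while the recovered one-dimensional embedding places them roughly pi apart, which is exactly the point of using geodesic rather than input-space distances.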
Manifold Recovery Guarantee
Isomap is guaranteed asymptotically to recover the true dimensionality and geometric structure of nonlinear manifolds: as the number of sample points increases, the graph distances d_G(i, j) provide increasingly better approximations to the intrinsic geodesic distances d_M(i, j).

Examples
Interpolations between distant points in the low-dimensional coordinate space.

Summary
- Isomap handles nonlinear manifolds.
- Isomap keeps the advantages of PCA and MDS: a non-iterative, polynomial-time procedure with guaranteed convergence.
- Isomap represents the global structure of a data set within a single coordinate system.

Nonlinear Dimensionality Reduction by Locally Linear Embedding
Sam T. Roweis and Lawrence K. Saul

LLE
- Neighborhood-preserving embeddings
- Mapping to a global coordinate system of low dimensionality
- No need to estimate pairwise distances between widely separated points
- Recovers global nonlinear structure from locally linear fits

Algorithm Description
We expect each data point and its neighbors to lie on, or close to, a locally linear patch of the manifold, so we reconstruct each point from its neighbors by minimizing

\varepsilon(W) = \sum_i \| x_i - \sum_j W_{ij} x_j \|^2,

where W_{ij} summarizes the contribution of the j-th data point to the reconstruction of the i-th and is estimated by optimizing this error subject to two constraints: each point is reconstructed only from its neighbors (W_{ij} = 0 if x_j is not a neighbor of x_i), and the weights for each point sum to 1 (\sum_j W_{ij} = 1).

A linear mapping then transforms the high-dimensional coordinates of each neighborhood to global internal coordinates on the manifold:

\min_Y \Phi(Y) = \sum_i \| y_i - \sum_j W_{ij} y_j \|^2.

Note that this cost defines a quadratic form \Phi(Y) = \sum_{ij} M_{ij} (y_i \cdot y_j), where

M_{ij} = \delta_{ij} - W_{ij} - W_{ji} + \sum_k W_{ki} W_{kj}.

The optimal embedding is found by computing the bottom d eigenvectors of M (discarding the bottom-most, constant eigenvector), where d is the dimension of the embedding.

Examples
Two-dimensional embeddings of faces.
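The two LLE steps can likewise be sketched in NumPy. This is a hedged toy version, not Roweis and Saul's released implementation: the function name lle_sketch, the fixed neighbor count k, and the small regularizer added to the local Gram matrix (needed whenever k exceeds the input dimension, which makes that matrix singular) are choices made here for illustration.

```python
import numpy as np

def lle_sketch(X, k=5, m=2):
    """Toy LLE: reconstruction weights W, then bottom eigenvectors of M."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        # k nearest neighbors of x_i (excluding x_i itself).
        nbrs = np.argsort(np.linalg.norm(X - X[i], axis=1))[1:k + 1]
        # Minimize |x_i - sum_j W_ij x_j|^2 subject to sum_j W_ij = 1:
        # solve the local Gram system C w = 1, then normalize.
        Z = X[nbrs] - X[i]
        C = Z @ Z.T
        C += 1e-3 * np.trace(C) * np.eye(k)   # regularizer, an assumption here
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs] = w / w.sum()
    # M = (I - W)^T (I - W), i.e. M_ij = delta_ij - W_ij - W_ji + sum_k W_ki W_kj.
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    vals, vecs = np.linalg.eigh(M)
    # The bottom eigenvector is the constant vector (eigenvalue 0); discard it
    # and keep the next m eigenvectors as the embedding coordinates.
    return vecs[:, 1:m + 1]

# A noisy line in 3-D should map to a single intrinsic coordinate.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 40)
X = np.column_stack([t, 2 * t, -t]) + 0.01 * rng.standard_normal((40, 3))
Y = lle_sketch(X, k=4, m=1)
print(Y.shape)
```

Because the rows of W sum to 1, (I - W) annihilates the constant vector, which is why M always has a zero eigenvalue whose eigenvector must be discarded before reading off the embedding.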
