Canonical Correlation Analysis for Feature Reduction

Outline of lecture
• Overview of feature reduction
• Canonical Correlation Analysis (CCA)
• Nonlinear CCA using kernels
• Applications

Overview of feature reduction
• Feature reduction refers to the mapping of the original high-dimensional data onto a lower-dimensional space.
  – The criterion for feature reduction differs with the problem setting:
    • Unsupervised setting: minimize the information loss.
    • Supervised setting: maximize the class discrimination.
• Given a set of n data points of p variables, x_1, x_2, …, x_n ∈ R^p, compute a linear transformation (projection)
    G ∈ R^{p×d}:  x ∈ R^p  →  y = G^T x ∈ R^d,  d < p.

Overview of feature reduction
• The original data X ∈ R^{p×n} is mapped to the reduced data Y ∈ R^{d×n} by the linear transformation G ∈ R^{p×d}:
    Y = G^T X.

Overview of feature reduction
• Unsupervised
  – Latent Semantic Indexing (LSI): truncated SVD
  – Principal Component Analysis (PCA)
  – Canonical Correlation Analysis (CCA)
  – Independent Component Analysis (ICA)
• Supervised
  – Linear Discriminant Analysis (LDA)
• Semi-supervised
  – An open research topic

Outline of lecture
• Overview of feature reduction
• Canonical Correlation Analysis (CCA)
• Nonlinear CCA using kernels
• Applications

Canonical Correlation Analysis (CCA)
• CCA was first developed by H. Hotelling.
  – H. Hotelling. Relations between two sets of variates. Biometrika, 28:321–377, 1936.
• CCA measures the linear relationship between two multidimensional variables.
• CCA finds two bases, one for each variable, that are optimal with respect to correlation.
• Applications in economics, medical studies, bioinformatics, and other areas.
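The linear projection Y = G^T X above can be illustrated with a short NumPy sketch. The matrix G here is an arbitrary orthonormal basis chosen at random purely for illustration; PCA, LDA, or CCA would each choose G according to their own criterion.

```python
import numpy as np

# Illustrative sketch of feature reduction via Y = G^T X.
# G is an arbitrary orthonormal basis here; PCA, LDA, or CCA
# would each pick G according to their own criterion.
rng = np.random.default_rng(0)
p, d, n = 5, 2, 100

X = rng.standard_normal((p, n))                    # columns are data points in R^p
G, _ = np.linalg.qr(rng.standard_normal((p, d)))   # G in R^{p x d}, orthonormal columns

Y = G.T @ X                                        # reduced data in R^{d x n}
```

Each column of X (a point in R^5) is mapped to the corresponding column of Y (a point in R^2); the choice of G is all that distinguishes the methods listed below.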
Canonical Correlation Analysis (CCA)
• Two multidimensional variables
  – Two different measurements on the same set of objects:
    • Web images and associated text
    • Protein (or gene) sequences and related literature (text)
    • Protein sequences and corresponding gene expression
    • In classification: feature vector and class label
  – Two measurements on the same object are likely to be correlated.
    • The correlation may not be obvious in the original measurements.
    • Find the maximum correlation in a transformed space.

Canonical Correlation Analysis (CCA)
• Given measurements X and Y on the same objects, CCA seeks transformations W_X and W_Y such that the transformed data W_X^T X and W_Y^T Y are maximally correlated.

Problem definition
• Find two sets of basis vectors, one for x and the other for y, such that the correlations between the projections of the variables onto these basis vectors are maximized.
• Given x and y, compute two basis vectors w_x and w_y, and project: x → w_x^T x, y → w_y^T y.

Problem definition
• Compute the two basis vectors so that the correlation of the projections onto these vectors is maximized:
    ρ = max_{w_x, w_y} corr(w_x^T x, w_y^T y).

Algebraic derivation of CCA
• The optimization problem is equivalent to
    max_{w_x, w_y}  (w_x^T C_xy w_y) / sqrt( (w_x^T C_xx w_x) (w_y^T C_yy w_y) ),
  where C_xy = XY^T, C_xx = XX^T, C_yx = YX^T, and C_yy = YY^T.

Algebraic derivation of CCA: the geometry of CCA
• Let a_j = X^T w_x and b_j = Y^T w_y denote the projected data, normalized so that
    ||a_j||^2 = w_x^T XX^T w_x = 1  and  ||b_j||^2 = w_y^T YY^T w_y = 1.
• Then
    ||a_j − b_j||^2 = ||a_j||^2 + ||b_j||^2 − 2 a_j^T b_j = 2 − 2 corr(a_j, b_j),
  so maximization of the correlation is equivalent to minimization of the distance.

Algebraic derivation of CCA
• Since the objective is invariant to rescaling of w_x and w_y, the problem is equivalent to the constrained problem
    max  w_x^T C_xy w_y
    s.t. w_x^T C_xx w_x = 1,  w_y^T C_yy w_y = 1.
• Note that C_yx = C_xy^T.

Algebraic derivation of CCA
• Setting the derivatives of the Lagrangian to zero gives
    C_xy w_y = λ C_xx w_x,
    C_yx w_x = λ C_yy w_y.
• This can be rewritten as
    [ 0     C_xy ] [ w_x ]     [ C_xx  0    ] [ w_x ]
    [ C_yx  0    ] [ w_y ] = λ [ 0     C_yy ] [ w_y ],
  a generalized eigenvalue problem of the form A w = λ B w.

Algebraic derivation of CCA
• The first pair of basis vectors solves
    max  w_{x1}^T C_xy w_{y1}
    s.t. w_{x1}^T C_xx w_{x1} = 1,  w_{y1}^T C_yy w_{y1} = 1.
• The second pair solves the same problem with additional orthogonality constraints:
    max  w_{x2}^T C_xy w_{y2}
    s.t. w_{x2}^T C_xx w_{x2} = 1,  w_{y2}^T C_yy w_{y2} = 1,
         w_{x2}^T C_xx w_{x1} = 0,  w_{y2}^T C_yy w_{y1} = 0,
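The generalized eigenvalue formulation above can be sketched in a few lines of NumPy/SciPy. This is a minimal implementation, assuming the data matrices hold paired observations in columns; the small ridge term `reg` is my addition (not part of the derivation) to keep C_xx and C_yy positive definite for the solver.

```python
import numpy as np
from scipy.linalg import eigh

def cca(X, Y, reg=1e-8):
    """Minimal CCA via the generalized eigenvalue problem

        [ 0    Cxy ] [wx]         [ Cxx  0   ] [wx]
        [ Cyx  0   ] [wy] = lam * [ 0    Cyy ] [wy].

    X is p x n, Y is q x n, columns are paired observations.
    `reg` is a small ridge (an assumption of this sketch, not part
    of the derivation) keeping Cxx and Cyy positive definite."""
    X = X - X.mean(axis=1, keepdims=True)
    Y = Y - Y.mean(axis=1, keepdims=True)
    p, q = X.shape[0], Y.shape[0]
    Cxx = X @ X.T + reg * np.eye(p)
    Cyy = Y @ Y.T + reg * np.eye(q)
    Cxy = X @ Y.T

    A = np.zeros((p + q, p + q))
    A[:p, p:] = Cxy
    A[p:, :p] = Cxy.T
    B = np.zeros((p + q, p + q))
    B[:p, :p] = Cxx
    B[p:, p:] = Cyy

    lam, W = eigh(A, B)              # eigenvalues in ascending order
    # The largest eigenvalue is the first canonical correlation; its
    # eigenvector holds wx and wy stacked (up to a joint rescaling,
    # so an extra normalization would enforce wx^T Cxx wx = 1 exactly).
    return lam[-1], W[:p, -1], W[p:, -1]

# Demo on data with a strong linear relationship between the two views.
rng = np.random.default_rng(1)
X = rng.standard_normal((3, 300))
Y = X[:2] + 0.05 * rng.standard_normal((2, 300))
rho, wx, wy = cca(X, Y)
```

Because the two views are linearly related up to small noise, the leading canonical correlation comes out close to 1.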
  i.e., the second pair is the second generalized eigenvector.

Algebraic derivation of CCA
• In general, the k-th pair of basis vectors is given by the k-th generalized eigenvector of A w = λ B w.
• The two transformations are given by
    W_X = [ w_{x1}, w_{x2}, …, w_{xd} ],   W_Y = [ w_{y1}, w_{y2}, …, w_{yd} ].

Outline of lecture
• Overview of feature reduction
• Canonical Correlation Analysis (CCA)
• Nonlinear CCA using kernels
• Applications

Nonlinear CCA using kernels
• Key idea: rewrite the CCA formulation in terms of inner products.
• With C_xx = XX^T and C_xy = XY^T, write w_x = Xα and w_y = Yβ. The objective becomes
    ρ = max_{α, β}  (α^T X^T X Y^T Y β) / sqrt( (α^T X^T X X^T X α) (β^T Y^T Y Y^T Y β) ),
  in which only inner products appear.

Nonlinear CCA using kernels
• Apply a nonlinear transformation to x and y:
    Φ_x : x_i → Φ_x(x_i),   Φ_y : y_i → Φ_y(y_i).
• Define the two kernels
    K_x(x_1, x_2) = Φ_x(x_1)^T Φ_x(x_2),   K_y(y_1, y_2) = Φ_y(y_1)^T Φ_y(y_2).

Nonlinear CCA using kernels
• Define the Lagrangian of the kernelized problem, take the derivatives with respect to α and β, and set them to zero; this again yields a generalized eigenvalue problem, now in the kernel matrices K_x and K_y.

Nonlinear CCA using kernels
• Two limitations: overfitting and the singularity problem.
• Solution: apply a regularization technique to both x and y.
• The solution is then given by computing the eigen-decomposition of the regularized generalized eigenvalue problem.

Outline of lecture
• Overview of feature reduction
• Canonical Correlation Analysis (CCA)
• Nonlinear CCA using kernels
• Applications

Applications in bioinformatics
• CCA can be extended to multiple views of the data
  – Multiple (more than two) data sources
• Two different ways to combine different data sources:
  – Multiple CCA: consider all pairwise correlations.
  – Integrated CCA: divide the sources into two disjoint groups.

Applications in bioinformatics
• Source: Extraction of Correlated Gene Clusters from Multiple Genomic Data by Generalized Kernel Canonical Correlation Analysis.
  ISMB'03. http://cg.ensmp.fr/~vert/publi/ismb03/ismb03.pdf

Applications in bioinformatics
• It is crucial to investigate the correlation that exists between multiple biological attributes, and eventually to use this correlation to extract biologically meaningful features from heterogeneous genomic data.
• A correlation detected between multiple datasets is likely to be due to some hidden biological phenomenon. Moreover, by selecting the genes responsible for the correlation, one can expect to select groups of genes that play a special role in, or are affected by, the underlying biological phenomenon.

Next class
• Topic
  – Manifold learning
• Reading
  – A global geometric framework for nonlinear dimensionality reduction. Tenenbaum JB, de Silva V, and Langford JC. Science, 290:2319–2323, 2000.
  – Nonlinear dimensionality reduction by locally linear embedding. Roweis ST and Saul LK. Science, 290:2323–2326, 2000.
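As a closing sketch of the regularized kernel CCA described earlier: the dual formulation below is one common variant, and the RBF kernel, the centering step, and the parameter names are my additions for illustration rather than details taken from the slides.

```python
import numpy as np
from scipy.linalg import eigh

def rbf_kernel(Z, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||z_i - z_j||^2); rows of Z are samples.
    sq = np.sum(Z ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T))

def center(K):
    # Double-center a Gram matrix (the kernel analogue of mean-subtraction).
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kernel_cca(Kx, Ky, reg=0.1):
    """Regularized kernel CCA in one common dual formulation:

        [ 0      KxKy ] [a]         [ Kx^2 + reg*I   0            ] [a]
        [ KyKx   0    ] [b] = lam * [ 0              Ky^2 + reg*I ] [b].

    The ridge `reg` addresses the overfitting/singularity problem
    noted in the lecture."""
    n = Kx.shape[0]
    A = np.zeros((2 * n, 2 * n))
    A[:n, n:] = Kx @ Ky
    A[n:, :n] = Ky @ Kx
    B = np.zeros((2 * n, 2 * n))
    B[:n, :n] = Kx @ Kx + reg * np.eye(n)
    B[n:, n:] = Ky @ Ky + reg * np.eye(n)
    lam, W = eigh(A, B)
    # Largest eigenvalue = leading (regularized) kernel canonical correlation.
    return lam[-1], W[:n, -1], W[n:, -1]

# Demo: two views nonlinearly related through a shared latent variable t.
rng = np.random.default_rng(2)
n = 60
t = rng.uniform(-1.0, 1.0, size=(n, 1))
X = np.hstack([t, t ** 2]) + 0.01 * rng.standard_normal((n, 2))
Y = np.hstack([np.sin(3 * t), np.cos(3 * t)]) + 0.01 * rng.standard_normal((n, 2))
rho, alpha, beta = kernel_cca(center(rbf_kernel(X)), center(rbf_kernel(Y)))
```

The regularizer keeps the leading eigenvalue strictly below 1, illustrating how it suppresses the trivial perfect correlations that unregularized kernel CCA produces when the kernel matrices are invertible.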