
Canonical Correlation Analysis for Feature Reduction





                   Outline of lecture

•   Overview of feature reduction
•   Canonical Correlation Analysis (CCA)
•   Nonlinear CCA using Kernels
•   Applications




             Overview of feature reduction

• Feature reduction refers to the mapping of the original high-
  dimensional data onto a lower-dimensional space.
    – The criterion for feature reduction depends on the problem setting:
        • Unsupervised setting: minimize the information loss
        • Supervised setting: maximize the class discrimination


• Given a set of data points of $p$ variables, $\{x_1, x_2, \ldots, x_n\}$,
  compute the linear transformation (projection)

      $$G \in \mathbb{R}^{p \times d}: \quad x \in \mathbb{R}^p \mapsto y = G^T x \in \mathbb{R}^d$$
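As a quick illustration of this projection (a minimal sketch in Python with synthetic data and an arbitrary projection matrix G, not any particular feature-reduction method):

```python
import numpy as np

# Synthetic data: n = 100 points with p = 10 variables, stored as columns of X.
p, d, n = 10, 2, 100
X = np.random.randn(p, n)

# An arbitrary projection matrix G in R^{p x d}; a real method (e.g. PCA or CCA)
# would choose G according to its own criterion.
G = np.random.randn(p, d)

# Project every point: y = G^T x, giving the reduced d x n data matrix Y.
Y = G.T @ X
print(Y.shape)  # (2, 100)
```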
            Overview of feature reduction
  [Figure: the original data $X \in \mathbb{R}^{p}$ is mapped by the linear
  transformation $G^T \in \mathbb{R}^{d \times p}$ to the reduced data $Y \in \mathbb{R}^{d}$.]

      $$G \in \mathbb{R}^{p \times d}: \quad X \in \mathbb{R}^p \mapsto Y = G^T X \in \mathbb{R}^d$$
           Overview of feature reduction

• Unsupervised
  –   Latent Semantic Indexing (LSI): truncated SVD
  –   Principal Component Analysis (PCA)
  –   Canonical Correlation Analysis (CCA)
  –   Independent Component Analysis (ICA)


• Supervised
  – Linear Discriminant Analysis (LDA)


• Semi-supervised
  – Research topic

                   Outline of lecture

•   Overview of feature reduction
•   Canonical Correlation Analysis (CCA)
•   Nonlinear CCA using Kernels
•   Applications




    Canonical Correlation Analysis (CCA)

• CCA was first developed by H. Hotelling.
   – H. Hotelling. Relations between two sets of variates.
     Biometrika, 28:321-377, 1936.


• CCA measures the linear relationship between two
  multidimensional variables.

• CCA finds two bases, one for each variable, that are
  optimal with respect to correlations.

• Applications in economics, medical studies,
  bioinformatics and other areas.
      Canonical Correlation Analysis (CCA)

• Two multidimensional variables
   – Two different measurements on the same set of objects
      •   Web images and associated text
      •   Protein (or gene) sequences and related literature (text)
      •   Protein sequence and corresponding gene expression
      •   In classification: feature vector and class label


   – Two measurements on the same object are likely to be correlated.
      • The correlation may not be obvious in the original measurements.
      • CCA finds the maximum correlation in a transformed space.




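A minimal usage sketch of linear CCA on two views of the same objects, using the scikit-learn implementation with random placeholder data (the views and the number of components are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Two different measurements (views) of the same 200 objects,
# e.g. image features (X) and associated text features (Y).
rng = np.random.RandomState(0)
X = rng.randn(200, 20)   # view 1: 20 features per object
Y = rng.randn(200, 15)   # view 2: 15 features per object

# Find pairs of basis vectors (one per view) whose projections are maximally correlated.
cca = CCA(n_components=2)
X_c, Y_c = cca.fit_transform(X, Y)

# Correlation of the first pair of canonical variates.
print(np.corrcoef(X_c[:, 0], Y_c[:, 0])[0, 1])
```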
Canonical Correlation Analysis (CCA)




  [Figure: each measurement ($X^T$ and $Y^T$) is projected by its own
  transformation ($W_X$ and $W_Y$) to give transformed data; the correlation
  is then computed between the two sets of transformed data.]
                   Problem definition

• Find two sets of basis vectors, one for x and the other
  for y, such that the correlations between the projections
  of the variables onto these basis vectors are maximized.

   Given paired observations of two variables $x$ and $y$,

   compute two basis vectors $w_x$ and $w_y$ and the projections

      $$x \mapsto \langle w_x, x \rangle, \qquad y \mapsto \langle w_y, y \rangle$$
                 Problem definition

• Compute the two basis vectors so that the correlation of the projections
  onto these vectors is maximized:

      $$\rho = \max_{w_x, w_y} \operatorname{corr}\left(\langle w_x, x \rangle, \langle w_y, y \rangle\right)
             = \max_{w_x, w_y} \frac{E\left[\langle w_x, x \rangle \langle w_y, y \rangle\right]}
                                    {\sqrt{E\left[\langle w_x, x \rangle^2\right] E\left[\langle w_y, y \rangle^2\right]}}$$

  (assuming $x$ and $y$ are zero-mean).
             Algebraic derivation of CCA




The optimization problem is equivalent to

      $$\max_{w_x, w_y} \frac{w_x^T C_{xy} w_y}
                             {\sqrt{\left(w_x^T C_{xx} w_x\right)\left(w_y^T C_{yy} w_y\right)}}$$

      where $C_{xy} = X Y^T$, $C_{xx} = X X^T$, $C_{yx} = Y X^T$, $C_{yy} = Y Y^T$
      (with $X$ and $Y$ the centered data matrices).
               Algebraic derivation of CCA

• The Geometry of CCA




      $$a_j = X^T w_x, \qquad b_j = Y^T w_y$$

      $$\|a_j\|^2 = w_x^T X X^T w_x = 1, \qquad \|b_j\|^2 = w_y^T Y Y^T w_y = 1$$

      $$\|a_j - b_j\|^2 = \|a_j\|^2 + \|b_j\|^2 - 2\,\operatorname{corr}(a_j, b_j)$$

   Maximization of the correlation is equivalent to the minimization of the distance.
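A small numerical check of this geometric view (a sketch with synthetic projection vectors, centered and normalized to mirror the constraints above):

```python
import numpy as np

rng = np.random.RandomState(0)
n = 500

# Two correlated projection vectors a and b (playing the roles of a_j = X^T w_x, b_j = Y^T w_y).
a = rng.randn(n)
b = 0.8 * a + 0.6 * rng.randn(n)

# Center and normalize so that ||a||^2 = ||b||^2 = 1, mirroring the CCA constraints.
a = a - a.mean();  a /= np.linalg.norm(a)
b = b - b.mean();  b /= np.linalg.norm(b)

# For centered unit-norm vectors, corr(a, b) equals the inner product a.b, so
# ||a - b||^2 = ||a||^2 + ||b||^2 - 2*corr(a, b) = 2 - 2*corr(a, b).
print(np.sum((a - b) ** 2))              # squared distance
print(2 - 2 * np.corrcoef(a, b)[0, 1])   # 2 - 2*correlation: same value
```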
              Algebraic derivation of CCA

The optimization problem is equivalent to

      $$\max_{w_x, w_y} \; w_x^T C_{xy} w_y$$
      $$\text{s.t.} \quad w_x^T C_{xx} w_x = 1, \quad w_y^T C_{yy} w_y = 1$$

(the correlation is invariant to rescaling of $w_x$ and $w_y$, so the two
denominator terms can be fixed to 1).
               Algebraic derivation of CCA




Introducing Lagrange multipliers $\lambda_x$ and $\lambda_y$ for the two constraints and
setting the derivatives of the Lagrangian to zero gives

      $$C_{xy} w_y = \lambda_x C_{xx} w_x, \qquad C_{yx} w_x = \lambda_y C_{yy} w_y$$

Multiplying the first equation by $w_x^T$, the second by $w_y^T$, and using the
constraints together with $C_{yx} = C_{xy}^T$ shows that $\lambda_x = \lambda_y = \lambda$.
         Algebraic derivation of CCA

      $$C_{xy} w_y = \lambda\, C_{xx} w_x, \qquad C_{yx} w_x = \lambda\, C_{yy} w_y$$

It can be rewritten as follows:

      $$\begin{pmatrix} 0 & C_{xy} \\ C_{yx} & 0 \end{pmatrix}
        \begin{pmatrix} w_x \\ w_y \end{pmatrix}
        = \lambda
        \begin{pmatrix} C_{xx} & 0 \\ 0 & C_{yy} \end{pmatrix}
        \begin{pmatrix} w_x \\ w_y \end{pmatrix}$$

      Generalized eigenvalue problem:  $A w = \lambda B w$
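A minimal sketch of solving this generalized eigenvalue problem with SciPy on synthetic centered data (the tiny ridge added to B is an assumption made only for numerical positive definiteness; it is not part of the derivation above):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.RandomState(0)
p, q, n = 5, 4, 300

# Centered data matrices with variables as rows (X: p x n, Y: q x n).
X = rng.randn(p, n); X -= X.mean(axis=1, keepdims=True)
Y = rng.randn(q, n); Y -= Y.mean(axis=1, keepdims=True)

Cxx, Cyy = X @ X.T, Y @ Y.T
Cxy = X @ Y.T

# Block matrices A and B of the generalized eigenvalue problem A w = lambda B w.
A = np.block([[np.zeros((p, p)), Cxy], [Cxy.T, np.zeros((q, q))]])
B = np.block([[Cxx, np.zeros((p, q))], [np.zeros((q, p)), Cyy]])
B += 1e-8 * np.eye(p + q)   # tiny ridge so B is safely positive definite

# eigh solves the symmetric-definite generalized problem; the largest eigenvalue
# is the first canonical correlation.
vals, vecs = eigh(A, B)
w = vecs[:, -1]
wx, wy = w[:p], w[p:]

# Correlation of the first pair of canonical variates matches the eigenvalue.
a, b = X.T @ wx, Y.T @ wy
print(np.corrcoef(a, b)[0, 1], vals[-1])
```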
                Algebraic derivation of CCA
First pair of basis vectors:

      $$\max_{w_{x1}, w_{y1}} \; w_{x1}^T C_{xy} w_{y1}$$
      $$\text{s.t.} \quad w_{x1}^T C_{xx} w_{x1} = 1, \quad w_{y1}^T C_{yy} w_{y1} = 1$$

Next consider the second set of basis vectors:

      $$\max_{w_{x2}, w_{y2}} \; w_{x2}^T C_{xy} w_{y2}$$
      $$\text{s.t.} \quad w_{x2}^T C_{xx} w_{x2} = 1, \quad w_{y2}^T C_{yy} w_{y2} = 1,$$
      $$\quad w_{x2}^T C_{xx} w_{x1} = 0, \quad w_{y2}^T C_{yy} w_{y1} = 0 \qquad \text{(additional constraints)}$$

The solution is given by the second (generalized) eigenvector.
           Algebraic derivation of CCA

• In general, the k-th pair of basis vectors is given by the k-th eigenvector of

      $$\begin{pmatrix} 0 & C_{xy} \\ C_{yx} & 0 \end{pmatrix}
        \begin{pmatrix} w_x \\ w_y \end{pmatrix}
        = \lambda
        \begin{pmatrix} C_{xx} & 0 \\ 0 & C_{yy} \end{pmatrix}
        \begin{pmatrix} w_x \\ w_y \end{pmatrix}$$

• The two transformations are given by

      $$W_X = \left[\, w_{x1}, w_{x2}, \ldots, w_{xp} \right], \qquad
        W_Y = \left[\, w_{y1}, w_{y2}, \ldots, w_{yp} \right]$$
                   Outline of lecture

•   Overview of feature reduction
•   Canonical Correlation Analysis (CCA)
•   Nonlinear CCA using Kernels
•   Applications




              Nonlinear CCA using Kernels

 Key: rewrite the CCA formulation in terms of inner products.

      $$C_{xx} = X X^T, \qquad C_{xy} = X Y^T$$

Substituting $w_x = X\alpha$ and $w_y = Y\beta$ gives

      $$\max_{\alpha, \beta} \;
        \frac{\alpha^T X^T X\, Y^T Y\, \beta}
             {\sqrt{\left(\alpha^T X^T X\, X^T X\, \alpha\right)\left(\beta^T Y^T Y\, Y^T Y\, \beta\right)}}$$

Only inner products appear.
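A quick numerical check, with synthetic data and arbitrary coefficient vectors α and β, that this dual objective (written purely in terms of the inner-product matrices $X^T X$ and $Y^T Y$) equals the primal objective evaluated at $w_x = X\alpha$, $w_y = Y\beta$:

```python
import numpy as np

rng = np.random.RandomState(0)
p, q, n = 6, 4, 50
X, Y = rng.randn(p, n), rng.randn(q, n)   # variables as rows, samples as columns
alpha, beta = rng.randn(n), rng.randn(n)  # arbitrary dual coefficients

# Primal objective at w_x = X alpha, w_y = Y beta.
wx, wy = X @ alpha, Y @ beta
primal = (wx @ (X @ Y.T) @ wy) / np.sqrt((wx @ (X @ X.T) @ wx) * (wy @ (Y @ Y.T) @ wy))

# Dual objective: only the Gram (inner-product) matrices X^T X and Y^T Y are needed.
Kx, Ky = X.T @ X, Y.T @ Y
dual = (alpha @ Kx @ Ky @ beta) / np.sqrt((alpha @ Kx @ Kx @ alpha) * (beta @ Ky @ Ky @ beta))

print(primal, dual)  # identical up to floating-point error
```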
        Nonlinear CCA using Kernels

Recall that

      $$\max_{\alpha, \beta} \;
        \frac{\alpha^T X^T X\, Y^T Y\, \beta}
             {\sqrt{\left(\alpha^T X^T X\, X^T X\, \alpha\right)\left(\beta^T Y^T Y\, Y^T Y\, \beta\right)}}$$

Apply the following nonlinear transformations to x and y:

      $$\phi_x : x_i \mapsto \phi_x(x_i), \qquad \phi_y : y_i \mapsto \phi_y(y_i)$$

Define the following two kernels:

      $$K_x(x_1, x_2) = \phi_x(x_1) \cdot \phi_x(x_2), \qquad
        K_y(y_1, y_2) = \phi_y(y_1) \cdot \phi_y(y_2)$$
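As an illustration, the kernel trick replaces the inner-product matrices with kernel matrices evaluated directly on the raw inputs. A minimal sketch using Gaussian (RBF) kernels from scikit-learn; the kernel choice, bandwidth, and the centering step are assumptions for illustration, not specified in the slides:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
n = 100
X = rng.randn(n, 8)   # view 1: n samples, 8 raw features
Y = rng.randn(n, 5)   # view 2: n samples, 5 raw features

# n x n kernel matrices: entry (i, j) is phi(x_i) . phi(x_j) in the implicit feature space.
Kx = rbf_kernel(X, gamma=0.1)
Ky = rbf_kernel(Y, gamma=0.1)

# Center the kernel matrices so the implicit features are zero-mean.
H = np.eye(n) - np.ones((n, n)) / n
Kx, Ky = H @ Kx @ H, H @ Ky @ H
print(Kx.shape, Ky.shape)  # (100, 100) (100, 100)
```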
            Nonlinear CCA using Kernels

Define the Lagrangian as follows (in the dual variables, the Gram matrices are
replaced by the kernel matrices $K_x$ and $K_y$):

      $$L(\lambda_\alpha, \lambda_\beta, \alpha, \beta)
        = \alpha^T K_x K_y \beta
          - \frac{\lambda_\alpha}{2}\left(\alpha^T K_x^2 \alpha - 1\right)
          - \frac{\lambda_\beta}{2}\left(\beta^T K_y^2 \beta - 1\right)$$

Take the derivatives and set them to 0:

      $$K_x K_y \beta - \lambda_\alpha K_x^2 \alpha = 0, \qquad
        K_y K_x \alpha - \lambda_\beta K_y^2 \beta = 0$$

As in the linear case, it follows that $\lambda_\alpha = \lambda_\beta = \lambda$.
              Nonlinear CCA using Kernels

Two limitations: overfitting and the singularity of the kernel matrices.
Solution: apply a regularization technique to both x and y (e.g., add a small
multiple of $K_x$ and $K_y$ to the respective constraint terms).

The solution is then given by computing the eigen-decomposition of the resulting
regularized generalized eigenvalue problem in $(\alpha, \beta)$, analogous to the
linear case.
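A minimal sketch of regularized kernel CCA under the assumptions above (Gaussian kernels, and one common regularization that adds κ·K to each constraint block before solving the generalized eigenvalue problem; the exact regularized form is an assumption, since the slides do not give the equations):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
n = 100
X, Y = rng.randn(n, 8), rng.randn(n, 5)

# Centered kernel matrices for the two views.
H = np.eye(n) - np.ones((n, n)) / n
Kx = H @ rbf_kernel(X, gamma=0.1) @ H
Ky = H @ rbf_kernel(Y, gamma=0.1) @ H

kappa = 0.1  # regularization parameter (assumed value)

# Generalized eigenvalue problem in (alpha, beta), with regularized constraint blocks.
A = np.block([[np.zeros((n, n)), Kx @ Ky], [Ky @ Kx, np.zeros((n, n))]])
B = np.block([[Kx @ Kx + kappa * Kx, np.zeros((n, n))],
              [np.zeros((n, n)), Ky @ Ky + kappa * Ky]])
B += 1e-8 * np.eye(2 * n)  # tiny ridge for numerical positive definiteness

vals, vecs = eigh(A, B)
alpha, beta = vecs[:n, -1], vecs[n:, -1]  # coefficients of the top canonical pair

# Canonical variates in the kernel-induced feature space.
a, b = Kx @ alpha, Ky @ beta
print(np.corrcoef(a, b)[0, 1])
```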
                   Outline of lecture

•   Overview of feature reduction
•   Canonical Correlation Analysis (CCA)
•   Nonlinear CCA using Kernels
•   Applications




           Applications in bioinformatics

• CCA can be extended to multiple views of the data
   – Multiple (more than two) data sources



• Two different ways to combine different data sources
   – Multiple CCA
       • Consider all pairwise correlations
   – Integrated CCA
       • Divide into two disjoint sources




             Applications in bioinformatics




Source: Extraction of Correlated Gene Clusters from Multiple Genomic Data by
Generalized Kernel Canonical Correlation Analysis. ISMB'03.
http://cg.ensmp.fr/~vert/publi/ismb03/ismb03.pdf
          Applications in bioinformatics

• It is crucial to investigate the correlations that exist between multiple
  biological attributes, and eventually to use these correlations to extract
  biologically meaningful features from heterogeneous genomic data.

• A correlation detected between multiple datasets is likely to be due to some
  hidden biological phenomenon. Moreover, by selecting the genes responsible for
  the correlation, one can expect to find groups of genes that play a special
  role in, or are affected by, the underlying biological phenomenon.
                             Next class

• Topic
   – Manifold learning



• Reading
   – A global geometric framework for nonlinear dimensionality reduction
       • Tenenbaum JB, de Silva V., and Langford JC
       • Science, 290: 2319–2323, 2000

   – Nonlinear dimensionality reduction by locally linear embedding
       • Roweis and Saul
       • Science, 290: 2323–2326, 2000

				