# Canonical Correlation Analysis for Feature Reduction

## Outline of lecture

• Overview of feature reduction
• Canonical Correlation Analysis (CCA)
• Nonlinear CCA using Kernels
• Applications
## Overview of feature reduction

• Feature reduction refers to the mapping of the original high-dimensional data onto a lower-dimensional space.
  – The criterion for feature reduction differs across problem settings:
    • Unsupervised setting: minimize the information loss
    • Supervised setting: maximize the class discrimination

• Given a set of data points of $p$ variables $x_1, x_2, \ldots, x_n$, compute the linear transformation (projection)

$$G \in \mathbb{R}^{p \times d}: \quad x \in \mathbb{R}^{p} \mapsto y = G^{T} x \in \mathbb{R}^{d}$$
## Overview of feature reduction

[Figure: the original data $X \in \mathbb{R}^{p \times n}$ is mapped to the reduced data $Y \in \mathbb{R}^{d \times n}$ by the linear transformation $G^T$]

$$G \in \mathbb{R}^{p \times d}: \quad X \in \mathbb{R}^{p \times n} \mapsto Y = G^{T} X \in \mathbb{R}^{d \times n}$$
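As a concrete illustration, the transformation above takes only a couple of lines of NumPy. This is a minimal sketch: a random $G$ stands in for a learned projection, and the point is only how the shapes change.

```python
import numpy as np

rng = np.random.default_rng(0)
p, d, n = 10, 3, 100                  # original dim, reduced dim, sample count
X = rng.standard_normal((p, n))       # original data, one sample per column
G = rng.standard_normal((p, d))       # projection matrix (random here; CCA/PCA/LDA learn it)

Y = G.T @ X                           # reduced data
print(X.shape, "->", Y.shape)         # (10, 100) -> (3, 100)
```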
## Overview of feature reduction

• Unsupervised
  – Latent Semantic Indexing (LSI): truncated SVD
  – Principal Component Analysis (PCA)
  – Canonical Correlation Analysis (CCA)
  – Independent Component Analysis (ICA)

• Supervised
  – Linear Discriminant Analysis (LDA)

• Semi-supervised
  – An open research topic
## Canonical Correlation Analysis (CCA)

• CCA was first developed by H. Hotelling.
  – H. Hotelling. Relations between two sets of variates. Biometrika, 28:321–377, 1936.

• CCA measures the linear relationship between two multidimensional variables.

• CCA finds two bases, one for each variable, that are optimal with respect to correlations.

• Applications in economics, medical studies, bioinformatics, and other areas.
## Canonical Correlation Analysis (CCA)

• Two multidimensional variables
  – Two different measurements on the same set of objects:
    • Web images and associated text
    • Protein (or gene) sequences and related literature (text)
    • Protein sequences and corresponding gene expression
    • In classification: feature vector and class label
  – Two measurements on the same object are likely to be correlated.
    • The correlation may not be obvious in the original measurements.
    • CCA finds the maximum correlation in a transformed space.
## Canonical Correlation Analysis (CCA)

[Figure: the measurements $X$ and $Y$ are transformed to $X^T W_X$ and $Y^T W_Y$, and the correlation is computed on the transformed data]
## Problem definition

• Find two sets of basis vectors, one for $x$ and the other for $y$, such that the correlations between the projections of the variables onto these basis vectors are maximized.

Given samples of the pair of variables $(x, y)$, compute two basis vectors $w_x$ and $w_y$ that define the projections

$$x \mapsto \langle w_x, x \rangle, \qquad y \mapsto \langle w_y, y \rangle$$
## Problem definition

• Compute the two basis vectors so that the correlation of the projections onto these vectors is maximized:

$$\rho = \max_{w_x, w_y} \operatorname{corr}\bigl(w_x^T x,\; w_y^T y\bigr)$$
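For fixed $w_x$ and $w_y$, this objective is straightforward to evaluate. The sketch below assumes centered data matrices with samples in columns; the helper name `proj_corr` is my own.

```python
import numpy as np

def proj_corr(X, Y, wx, wy):
    """Correlation of the projections wx^T X and wy^T Y (X, Y centered, samples in columns)."""
    a, b = wx @ X, wy @ Y
    return (a @ b) / np.sqrt((a @ a) * (b @ b))

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 200))
Y = np.vstack([2.0 * X[0] + 0.05 * rng.standard_normal(200),   # correlated with x_1
               rng.standard_normal(200)])
X -= X.mean(axis=1, keepdims=True)
Y -= Y.mean(axis=1, keepdims=True)

rho = proj_corr(X, Y, np.array([1.0, 0, 0, 0]), np.array([1.0, 0]))
print(rho > 0.95)                     # True: these directions are strongly correlated
```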
## Algebraic derivation of CCA

The optimization problem is equivalent to

$$\max_{w_x, w_y} \; \frac{w_x^T C_{xy} w_y}{\sqrt{w_x^T C_{xx} w_x}\,\sqrt{w_y^T C_{yy} w_y}}$$

where (for centered data matrices $X$ and $Y$)

$$C_{xy} = XY^T, \quad C_{xx} = XX^T, \quad C_{yx} = YX^T, \quad C_{yy} = YY^T$$
## Algebraic derivation of CCA

• The geometry of CCA

$$a = X^T w_x, \qquad b = Y^T w_y$$

$$\|a\|^2 = w_x^T X X^T w_x = 1, \qquad \|b\|^2 = w_y^T Y Y^T w_y = 1$$

$$\|a - b\|^2 = \|a\|^2 + \|b\|^2 - 2\operatorname{corr}(a, b) = 2 - 2\operatorname{corr}(a, b)$$

Maximization of the correlation is therefore equivalent to minimization of the distance.
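The identity above is easy to verify numerically; random unit-norm vectors stand in for the normalized projections:

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.standard_normal(100); a /= np.linalg.norm(a)   # unit-norm projection of X
b = rng.standard_normal(100); b /= np.linalg.norm(b)   # unit-norm projection of Y

dist_sq = np.sum((a - b) ** 2)        # ||a - b||^2
identity = 2.0 - 2.0 * (a @ b)        # 2 - 2 corr(a, b), since ||a|| = ||b|| = 1
print(np.isclose(dist_sq, identity))  # True
```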
## Algebraic derivation of CCA

Since the objective is invariant to the scaling of $w_x$ and $w_y$, the optimization problem is equivalent to

$$\max_{w_x, w_y} \; w_x^T C_{xy} w_y \qquad \text{s.t.} \quad w_x^T C_{xx} w_x = 1, \quad w_y^T C_{yy} w_y = 1$$
## Algebraic derivation of CCA

Introducing Lagrange multipliers $\lambda_x$ and $\lambda_y$ for the two constraints and setting the derivatives to zero shows that the multipliers coincide:

$$\lambda_x = \lambda_y = \lambda, \qquad C_{yx} = C_{xy}^T$$
## Algebraic derivation of CCA

$$C_{xy} w_y = \lambda\, C_{xx} w_x, \qquad C_{yx} w_x = \lambda\, C_{yy} w_y$$

It can be rewritten as follows:

$$\begin{pmatrix} 0 & C_{xy} \\ C_{yx} & 0 \end{pmatrix} \begin{pmatrix} w_x \\ w_y \end{pmatrix} = \lambda \begin{pmatrix} C_{xx} & 0 \\ 0 & C_{yy} \end{pmatrix} \begin{pmatrix} w_x \\ w_y \end{pmatrix}$$

This is a generalized eigenvalue problem:

$$A w = \lambda B w$$
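This generalized eigenvalue problem can be solved directly with `scipy.linalg.eigh`. The sketch below uses synthetic data sharing one latent signal; a practical implementation would also regularize $C_{xx}$ and $C_{yy}$ (see the kernel-CCA discussion later).

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)
n, p, q = 200, 5, 4
Z = rng.standard_normal((1, n))                 # shared latent signal
X = np.vstack([Z, rng.standard_normal((p - 1, n))])
Y = np.vstack([Z + 0.1 * rng.standard_normal((1, n)),
               rng.standard_normal((q - 1, n))])
X -= X.mean(axis=1, keepdims=True)
Y -= Y.mean(axis=1, keepdims=True)

Cxx, Cyy, Cxy = X @ X.T, Y @ Y.T, X @ Y.T

# A w = lambda B w with A = [[0, Cxy], [Cyx, 0]], B = diag(Cxx, Cyy)
A = np.block([[np.zeros((p, p)), Cxy], [Cxy.T, np.zeros((q, q))]])
B = np.block([[Cxx, np.zeros((p, q))], [np.zeros((q, p)), Cyy]])
vals, vecs = eigh(A, B)                         # symmetric generalized eigenproblem
wx, wy = vecs[:p, -1], vecs[p:, -1]             # top eigenvalue = first canonical pair

a, b = wx @ X, wy @ Y
rho = (a @ b) / np.sqrt((a @ a) * (b @ b))
print(round(rho, 3))                            # close to 1: the shared signal is recovered
```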
## Algebraic derivation of CCA

The first set of basis vectors solves

$$\max_{w_{x1}, w_{y1}} \; w_{x1}^T C_{xy} w_{y1} \qquad \text{s.t.} \quad w_{x1}^T C_{xx} w_{x1} = 1, \quad w_{y1}^T C_{yy} w_{y1} = 1$$

Next consider the second set of basis vectors:

$$\max_{w_{x2}, w_{y2}} \; w_{x2}^T C_{xy} w_{y2} \qquad \text{s.t.} \quad w_{x2}^T C_{xx} w_{x2} = 1, \quad w_{y2}^T C_{yy} w_{y2} = 1,$$

with the additional orthogonality constraints

$$w_{x2}^T C_{xx} w_{x1} = 0, \qquad w_{y2}^T C_{yy} w_{y1} = 0.$$

The solution is the second eigenvector of the generalized eigenvalue problem.
## Algebraic derivation of CCA

• In general, the $k$-th basis vectors are given by the $k$-th eigenvector of the generalized eigenvalue problem $A w = \lambda B w$ above.

• The two transformations are given by

$$W_X = \bigl[ w_{x1}, w_{x2}, \ldots, w_{xp} \bigr], \qquad W_Y = \bigl[ w_{y1}, w_{y2}, \ldots, w_{yp} \bigr]$$
## Nonlinear CCA using Kernels

Key: rewrite the CCA formulation in terms of inner products. Express the basis vectors as linear combinations of the samples,

$$w_x = X\alpha, \qquad w_y = Y\beta$$

Substituting into the objective (recall $C_{xx} = XX^T$ and $C_{xy} = XY^T$) gives

$$\max_{\alpha, \beta} \; \frac{\alpha^T X^T X Y^T Y \beta}{\sqrt{\alpha^T X^T X X^T X \alpha}\;\sqrt{\beta^T Y^T Y Y^T Y \beta}}$$

Only inner products appear.
## Nonlinear CCA using Kernels

Recall that

$$\max_{\alpha, \beta} \; \frac{\alpha^T X^T X Y^T Y \beta}{\sqrt{\alpha^T X^T X X^T X \alpha}\;\sqrt{\beta^T Y^T Y Y^T Y \beta}}$$

Apply the following nonlinear transformations to $x$ and $y$:

$$\phi_x : x_i \mapsto \phi_x(x_i), \qquad \phi_y : y_i \mapsto \phi_y(y_i)$$

Define the following two kernels:

$$K_x(x_1, x_2) = \langle \phi_x(x_1), \phi_x(x_2) \rangle, \qquad K_y(y_1, y_2) = \langle \phi_y(y_1), \phi_y(y_2) \rangle$$
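For instance, the Gram matrix of an RBF kernel, one common choice for $K_x$, can be built as follows (the `gamma` value is an arbitrary choice for illustration):

```python
import numpy as np

def rbf_gram(A, B, gamma=0.5):
    """Gram matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2), samples in columns."""
    sq = (np.sum(A ** 2, axis=0)[:, None]
          + np.sum(B ** 2, axis=0)[None, :]
          - 2.0 * A.T @ B)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(4)
X = rng.standard_normal((3, 6))       # 6 samples in R^3, one per column
Kx = rbf_gram(X, X)
print(Kx.shape)                       # (6, 6): one entry per pair of samples
```

Note that $K_x$ is symmetric with unit diagonal, since $k(x, x) = \exp(0) = 1$.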
## Nonlinear CCA using Kernels

With Gram matrices $K_x$ and $K_y$, define the Lagrangian as follows:

$$L = \alpha^T K_x K_y \beta - \frac{\lambda_x}{2}\bigl(\alpha^T K_x^2 \alpha - 1\bigr) - \frac{\lambda_y}{2}\bigl(\beta^T K_y^2 \beta - 1\bigr)$$

Take the derivatives and set them to zero (again $\lambda_x = \lambda_y = \lambda$):

$$K_x K_y \beta = \lambda K_x^2 \alpha, \qquad K_y K_x \alpha = \lambda K_y^2 \beta$$
## Nonlinear CCA using Kernels

Two limitations: overfitting and the singularity problem (the Gram matrices are typically singular).
Solution: apply a regularization technique to both $x$ and $y$.

The solution is given by computing the following eigen-decomposition (with regularization parameter $\kappa$):

$$\begin{pmatrix} 0 & K_x K_y \\ K_y K_x & 0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \lambda \begin{pmatrix} (K_x + \kappa I)^2 & 0 \\ 0 & (K_y + \kappa I)^2 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix}$$
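Putting the pieces together, here is a sketch of the regularized kernel CCA eigenproblem. Linear kernels keep the toy example simple; the helper name and the `kappa` value are my own choices, not part of the original formulation.

```python
import numpy as np
from scipy.linalg import eigh

def kernel_cca(Kx, Ky, kappa=0.1):
    """First canonical pair (alpha, beta, lam) of regularized kernel CCA.

    Solves A v = lam B v with A = [[0, Kx Ky], [Ky Kx, 0]] and
    B = diag((Kx + kappa I)^2, (Ky + kappa I)^2); the regularization keeps
    B positive definite and curbs overfitting.
    """
    n = Kx.shape[0]
    I = np.eye(n)
    A = np.block([[np.zeros((n, n)), Kx @ Ky], [Ky @ Kx, np.zeros((n, n))]])
    Rx = (Kx + kappa * I) @ (Kx + kappa * I)
    Ry = (Ky + kappa * I) @ (Ky + kappa * I)
    B = np.block([[Rx, np.zeros((n, n))], [np.zeros((n, n)), Ry]])
    vals, vecs = eigh(A, B)               # symmetric generalized eigenproblem
    return vecs[:n, -1], vecs[n:, -1], vals[-1]

rng = np.random.default_rng(5)
n = 60
Z = rng.standard_normal(n)                       # shared latent signal
X = np.vstack([Z, rng.standard_normal(n)])       # view 1, shape (2, n)
Y = np.vstack([Z + 0.1 * rng.standard_normal(n), rng.standard_normal(n)])
Kx, Ky = X.T @ X, Y.T @ Y                        # linear kernels for simplicity
alpha, beta, lam = kernel_cca(Kx, Ky)
print(round(lam, 2))                             # high: the shared signal is found
```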
## Applications in bioinformatics

• CCA can be extended to multiple views of the data
  – Multiple (more than two) data sources

• Two different ways to combine different data sources
  – Multiple CCA: consider all pairwise correlations
  – Integrated CCA: divide the sources into two disjoint groups
## Applications in bioinformatics

Source: Extraction of Correlated Gene Clusters from Multiple Genomic Data by Generalized Kernel Canonical Correlation Analysis. ISMB'03.
http://cg.ensmp.fr/~vert/publi/ismb03/ismb03.pdf
## Applications in bioinformatics

• It is crucial to investigate the correlation that exists between multiple biological attributes, and eventually to use this correlation to extract biologically meaningful features from heterogeneous genomic data.

• A correlation detected between multiple datasets is likely to be due to some hidden biological phenomenon. Moreover, by selecting the genes responsible for the correlation, one can expect to select groups of genes that play a special role in, or are affected by, the underlying biological phenomenon.
## Next class

• Topic: manifold learning
  – A Global Geometric Framework for Nonlinear Dimensionality Reduction. Tenenbaum JB, de Silva V, and Langford JC. Science, 290:2319–2323, 2000.
  – Nonlinear Dimensionality Reduction by Locally Linear Embedding. Roweis and Saul. Science, 290:2323–2326, 2000.