# Independent Component Analysis - PowerPoint

Document Sample

```					Independent Component
Analysis

CMPUT 466/551
Nilanjan Ray
The Origin of ICA: Factor Analysis

• Multivariate data are often thought to be indirect measurements
arising from some underlying sources, which cannot be directly
measured/observed.
• Examples
– Educational and psychological tests use the answers to questionnaires
to measure the underlying intelligence and other mental abilities of
subjects
– EEG brain scans measure the neuronal activity in various parts of the
brain indirectly via electromagnetic signals recorded at sensors placed
at various positions on the head.
• Factor analysis is a classical technique developed in statistical
literature that aims at identifying these latent sources.
• Independent component analysis (ICA) is a kind of factor analysis
that can uniquely identify the latent variables.
Latent Variables and Factor Analysis
X 1  a11S1  a12 S 2    a1 p S p
X 2  a21S1  a22 S 2    a2 p S p
Latent variable model:

or,        X  AS
X p  a p1S1  a p 2 S 2    a pp S p

Observed variable
Latent components

Mixing matrix

Factor analysis attempts to find out both the mixing coefficients and the
latent components given some instances of observed variables
Latent Variables and Factor Analysis…
Typically we require the latent variables to have unit variance and to be uncorrelated.
Thus, in the following model, cov(S) = I.

X  AS

This representation has an ambiguity. Consider, for example an orthogonal matrix R:

X  ART RS  A*S *

cov(S * )  R cov(S ) RT  RIR T  RR T  I

So,   X  A*S * is also a factor model with unit variance, uncorrelated latent variables.

Classical factor analysis cannot remove this ambiguity; ICA can remove this ambiguity.
Classical Factor Analysis

X 1  a11S1  a12 S 2    a1q S p  1
Model:     X 2  a21S1  a22 S 2    a2 q S p   2

X p  a p1S1  a p 2 S 2    a pq S p   p

’s are zero mean, uncorrelated Gaussian noise.
q < p, i.e., the number of underlying latent factor is assumed less than
the number of observed components.
Diagonal matrix

The covariance matrix takes this form:               AAT  D

Maximum likelihood estimation is used to estimate A.

However, still the previous problem of ambiguity remains here too…
Independent Component Analysis
N
1
• Step 1: Center data:       x
N
x ; x
i 1
i   i    (x i  x)
PCA

• Step 2: Whiten data: compute SVD of the centered data
matrix
[x1  x N ]  UDV T ; x i  UD 1/ 2U T x i

– After whitening in the factor model, X  AS the covariance of
x, cov(x) = I, and A become orthogonal

• Step 3: Find out orthogonal A and unit variance, non-
Gaussian and independent S
Example: PCA and ICA
 x1   a11 a12   s1 
Model:     x   a        
 2   21 a22   s2 

Blind source separation (cocktail party problem)
PCA vs. ICA
PCA:                          ICA:
1. Find projections to        1. Find “interesting”
minimize                      projections
reconstruction error         – Projected data look as
•   Variance of projected       non-Gaussian,
data is as large as         independent as possible
possible                2. Higher-order statistics
2. 2nd-order statistics          needed to measure
needed (cov(x))               degree of
independence
Computing ICA
Model:     X  AS

Step 3: Find out orthogonal A and unit variance, non-Gaussian and independent S.

The computational approaches are mostly based on information theoretic criterion.

• Kullback-Leibler (KL) divergence

• Negentropy

Another different approach emerged recently is called “Product Density Approach”
ICA: KL Divergence Criterion
• x is zero-mean and whitened
• KL divergence measures “distance” between
two probability densities
– Find A such that KL(.) is minimized:

Joint density

Independent density

H is differential entropy: H ( y )      f ( y ) log( f ( y ))dy   E[log( f ( y ))]
ICA: KL Divergence Criterion…
• Theorem for random variable transformation says:

So,

Hence,

Minimize with respect to orthogonal A
ICA: Negentropy Criterion
• Differential entropy H(.) is not invariant to scaling
of variable
• Negentropy is a scale-normalized version of H(.):

• Negentropy measures the departure of a r.v. s
from a Gaussian r.v. with same variance
• Optimization criterion:
ICA: Negentropy Criterion…
• Approximate the negentropy from data by:

• FastICA
(http://www.cis.hut.fi/projects/ica/fastica/) is
based on negentropy. Free software in Matlab,
C++, Python…
ICA Filter Bank for Image Processing
An image patch is modeled as a weighted sum of basis images (basis functions):

Image patch                                     Basis functions (a.k.a. ICA filter bank)

x  [a1 a 2  a N ]s  As

s  A1x  AT x                           Rows of AT are filters

Columns of A are filters
Filter responses

Jenssen and Eltoft, “ICA filter bank for segmentation of textured images,” 4th International symposium on ICA and BSS,
Nara, Japan, 2003
Texture and ICA Filter Bank

Training textures                             12x12 ICA basis functions or ICA filters

Jenssen and Eltoft, “ICA filter bank for segmentation of textured images,” 4th International symposium on ICA and BSS,
Nara, Japan, 2003
Segmentation By ICA FB
ICA Filter Bank                          I1, I2,…, In              These
Image, I                                                                                       are filter
With n filters                                                     responses

Segmented image, C                             Clustering

Above is an unsupervised setting.

Segmentation (i.e., classification in this context) can also be performed by
a supervised method on the output feature images I1, I2 , …, In.

A texture image                              Segmentation
Jenssen and Eltoft, “ICA filter bank for segmentation of textured images,” 4th International symposium on ICA and BSS,
Nara, Japan, 2003
On PCA and ICA

• PCA & ICA differ in choosing projection
directions:
– Different principle: least-square (PCA),
independence (ICA)
• For data compression, PCA would be a
good choice
• For discovering structures of data, ICA
would be a reasonable choice

```
DOCUMENT INFO
Shared By:
Categories:
Stats:
 views: 118 posted: 8/28/2010 language: English pages: 17
How are you planning on using Docstoc?