# Statistical perturbation theory for spectral clustering

```
Statistical perturbation theory for spectral clustering

Harrachov, 2007

A. Spence and Z. Stoyanov
Plan of the Talk
A. Clustering (Brief overview).

B. Deterministic Perturbation Theory.

C. Statistical Perturbation Theory.
Graph Clustering
[Figure: example graph on seven nodes, numbered 1 to 7]
Graph Clustering + Perturbation
[Figure: the same seven-node graph with a "?" label added]
An Application

Gene Expression Data → Clustering

1. Genes in same cluster behave similarly?

2. Genes in different clusters behave differently?

Issues:
• There are over 10 000 genes expressed in any one tissue;

• DNA arrays typically produce very noisy data.
Bi-partite Graphs
[Figure: bipartite graph with numbered nodes on each side]
Matrix Form
A Real Data Matrix (Leukemia)
Spectral Clustering: General Idea

Discrete Optimisation Problem
Exact - Impractical
(NP-hard)

Approximation

Real Optimisation Problem
Heuristic - Practical
(Tractable)
Discrete Optimisation  SVD

Active    Inactive

Solution:   Singular
Inactive   Active                 Value
Decomposition of Wscaled
Clustering Algorithm: Summary
[Figure: 2 x 2 block layout, ACTIVE | INACTIVE over INACTIVE | ACTIVE]
Literature
Types of Graph Matrices
How we Cluster
Leukemia Data
Clustered Leukemia Data
Inaccuracies in the Data
(Perturbation Theory)
Perturbation Theory
(Deterministic Noise)
Deterministic Perturbation
(Symmetric Matrix)
Linear Solve
Taylor Expansions
Rectangular Case  Symmetric
Random Perturbations
(plan)
• The Model

• Issues with the Theory

• A Possible Solution via Simulations?

• Experiments
The Model
[Figure: example graph on seven nodes, numbered 1 to 7]
Difficulties with Random Matrix Theory (RMT)
Deterministic Perturbation → Stochastic Perturbation
(simple eigenvector)
Deterministic Perturbation → Stochastic Perturbation
(simple eigenvalues)
PP Plot - Test for Normality
(Largest eigenvalue of a Symmetric Matrix)
Simulated Random Perturbation
(Largest eigenvalue of a Symmetric Matrix)
Deterministic Perturbation → Stochastic Perturbation
(simple eigenvectors)
Results for Laplacian Matrices
Functional of the Eigenvector
Results for h^T v_2
PP Plot of h^T v'(0) - Test for Normality
(h = e_j)
Histogram of h^T v'(0) - Simulations
(h = e_j)
PP Plot of Simulated v[j]()
(Distribution close to Normal)
Histogram of Simulated v[j]()
(Distribution close to Normal)
Extension to the Rectangular Case
Probability of “Wrong Clustering”
Issues with Numerics
Efficient Simulations
Solution via Simulations?
Solution via Simulations?
(Algorithm)
Comparing: Direct Calculation vs. Repeated Linear Solve

```
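The slides on deterministic perturbation of a symmetric matrix, Taylor expansions, and simulated random perturbations of the largest eigenvalue survive only as titles, so the formulas the talk used are not recoverable from this extraction. As a minimal sketch of the general idea, the Python below uses the textbook first-order expansions for a simple eigenpair of a symmetric matrix A + εE: λ'(0) = vᵀEv, and v'(0) as a sum over the other eigenvectors. Everything concrete here is an assumption made for illustration and not taken from the talk: the path-graph adjacency matrix, the Gaussian noise model in `random_symmetric`, the function names, and the use of an eigenvector sum where the slides mention a linear solve.

```python
import numpy as np

def first_order_terms(A, E, k):
    """Derivatives at eps = 0 of the k-th eigenpair of A + eps * E.

    A and E are symmetric and the k-th eigenvalue of A is assumed simple.
    """
    lam, V = np.linalg.eigh(A)            # ascending eigenvalues, orthonormal columns
    v = V[:, k]
    dlam = v @ E @ v                      # lambda'(0) = v^T E v
    coeffs = V.T @ (E @ v)                # v_j^T E v for every j
    others = np.arange(len(lam)) != k     # exclude the k-th eigenvector itself
    # v'(0) = sum over j != k of v_j (v_j^T E v) / (lambda_k - lambda_j)
    dv = V[:, others] @ (coeffs[others] / (lam[k] - lam[others]))
    return lam[k], v, dlam, dv

def random_symmetric(n, rng):
    """Symmetric Gaussian noise matrix (an assumed noise model)."""
    G = rng.standard_normal((n, n))
    return (G + G.T) / 2.0

# Hypothetical example: adjacency matrix of a path graph on 7 nodes.
n = 7
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
k = n - 1                                 # index of the largest eigenvalue
eps = 1e-4
rng = np.random.default_rng(0)

# Deterministic check: first-order prediction against a direct eigensolve.
E = random_symmetric(n, rng)
lam0, v0, dlam, dv = first_order_terms(A, E, k)
lam_eps, V_eps = np.linalg.eigh(A + eps * E)
v_eps = V_eps[:, k] * np.sign(V_eps[:, k] @ v0)   # fix the arbitrary sign
lam_err = abs(lam_eps[k] - (lam0 + eps * dlam))
vec_err = np.linalg.norm(v_eps - (v0 + eps * dv))
print("eigenvalue error: ", lam_err)
print("eigenvector error:", vec_err)

# Stochastic view: lambda'(0) is linear in E, so under this Gaussian model
# it is normally distributed; simulate it and inspect mean and spread.
samples = np.array([v0 @ random_symmetric(n, rng) @ v0 for _ in range(2000)])
print("sample mean:", samples.mean(), "sample std:", samples.std())
```

Under this particular noise model, vᵀEv has mean 0 and variance 1 for any unit vector v, which gives a simple sanity check on the simulated values. A different noise model, such as one that perturbs only existing edges, would change that distribution.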