# TOPOLOGY-BASED FUZZY CLUSTERING ALGORITHM

by
Abhishek Jaiantilal

Objective
• Primary Goal:
 - Design an algorithm that can find clusters of any shape, in an unsupervised manner, in linearized time.
• Secondary Goals:
 - Learn new patterns as they are encountered.
 - Be adaptive, taking supervised expert feedback to increase the rate of accuracy.
Parts of Presentation
•   Introduction to Clustering/Classification
•   Background Methods
•   Topology-Based Fuzzy Clustering (TFC)
•   Results and Conclusion
Introduction
• Clustering/Classification
• Grouping of samples
• Fields of Interest (multiple):
• Pattern Recognition
• Linguistics
• Etc.
Introduction
• Contemporary Clustering/Classification
Methods
• C-Means Family.
• Neural Networks (Radial Basis Function,
Multi-Layered Perceptrons, etc).
• Hierarchical Clustering.
Introduction
• Analysis of C-Means Family
Hard C-Means:

 Optimization function:  J = Σ_{j=1}^{k} Σ_{i=1}^{n} ‖ x_i^(j) − c_j ‖²

 Membership:  u_ik = 1 if D_ik ≤ D_ij for all j ≠ i, else 0   (for all i, k)

 Center update:  v_i = ( Σ_{k=1}^{n} u_ik · x_k ) / ( Σ_{k=1}^{n} u_ik )

 Iterated until the centers v_i stop moving.
Introduction
• Most C-Means-based algorithms use a distance measure; the distance metric is the most varied component.
• Newer algorithms:
 - Gath-Geva: includes a priori and a posteriori probability.
 - Gustafson-Kessel: includes probability.
 - Fuzzy C-Means: includes fuzzy memberships.
Introduction
• Needs the number of clusters a priori.
• Dependence on initial cluster centers (very sensitive for the better-performing algorithms like Gath-Geva) causes non-unique solutions.
• Will converge, but the number of iterations required is not fixed!
• Exponential increase in run time with a linear increase in samples.
• No way to add supervised information.
• => Cannot learn new patterns.
Introduction
• Neural Networks:
• Radial Basis Function Neural Networks
(RBFNN)
• Multi Layered Perceptron (MLP)
• Requires Training (normally supervised)
• Requires Learning (Computationally
intensive)
Introduction
• Example: RBFNN

[Figure: RBFNN architecture. Input X feeds Layer-1 radial basis units (each with a variance parameter) whose outputs u1…un feed the output layer; multiple output units can be used for multiple clusters/patterns.]
Introduction
• Requires extensive supervised training.
• Learning causes modification of the weights and relearning.
• It is very rigid, as newer patterns cannot be incorporated without retraining.
• => Cannot learn new patterns.
Background Methods
• Growing Neural Gas (GNG)
• Topology Learning Method
• The idea:
 - Start with 2 nodes representing vector positions.
 - For each new sample, find the nearest node and the 2nd-nearest node, and create a link between them if one does not exist.
 - Make the winner node and its neighbors learn on the basis of the Euclidean distance, with learning rates eb and en respectively.
 - Insert a new node after every 'n' iterations.
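The per-sample loop above can be sketched as follows. This is a minimal illustration of standard GNG (after Fritzke), not the thesis code; the parameter values (eb, en, a_max, the insertion interval lam, and the error factors alpha and d) are illustrative defaults chosen here:

```python
import math, random

def gng(data, max_nodes=15, lam=100, eb=0.2, en=0.006,
        alpha=0.5, d=0.995, a_max=50, seed=0):
    """Growing Neural Gas sketch: link winner and runner-up, move them
    toward each sample, age/prune edges, insert a node every lam samples."""
    rng = random.Random(seed)
    W = [list(data[0]), list(data[1])]      # start with 2 nodes
    E = [0.0, 0.0]                          # accumulated squared error per node
    age = {}                                # sorted (i, j) pair -> edge age
    for t in range(1, 20 * lam + 1):
        x = data[rng.randrange(len(data))]
        # Nearest and 2nd-nearest nodes to the sample.
        order = sorted(range(len(W)), key=lambda i: math.dist(x, W[i]))
        s1, s2 = order[0], order[1]
        E[s1] += math.dist(x, W[s1]) ** 2
        # Move the winner (rate eb) and its neighbors (rate en); age the winner's edges.
        W[s1] = [w + eb * (xi - w) for w, xi in zip(W[s1], x)]
        for e in list(age):
            if s1 in e:
                n = e[0] if e[1] == s1 else e[1]
                W[n] = [w + en * (xi - w) for w, xi in zip(W[n], x)]
                age[e] += 1
        # Link winner and runner-up (resetting that edge's age); prune old edges.
        age[tuple(sorted((s1, s2)))] = 0
        for e in [e for e, a in age.items() if a > a_max]:
            del age[e]
        # Every lam samples, insert a node between the highest-error node
        # and its highest-error neighbor.
        if t % lam == 0 and len(W) < max_nodes:
            q = max(range(len(W)), key=E.__getitem__)
            nbrs = [e[0] if e[1] == q else e[1] for e in age if q in e]
            if nbrs:
                f = max(nbrs, key=E.__getitem__)
                W.append([(u + v) / 2 for u, v in zip(W[q], W[f])])
                E[q] *= alpha; E[f] *= alpha
                E.append(E[q])
                r = len(W) - 1
                age.pop(tuple(sorted((q, f))), None)
                age[tuple(sorted((q, r)))] = age[tuple(sorted((f, r)))] = 0
        E = [e * d for e in E]              # decay all errors
    return W, set(age)

# Noisy ring data: GNG should grow a chain of nodes around the ring.
rng = random.Random(1)
data = [(math.cos(a) + 0.05 * rng.random(), math.sin(a) + 0.05 * rng.random())
        for a in (rng.uniform(0, 2 * math.pi) for _ in range(2000))]
nodes, edges = gng(data)
```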
Background Methods
• Growing Neural Gas (GNG) -Growth

From [2]
Background Methods
• Growing Neural Gas (GNG)
• If the number of nodes generated is n and the dimension of the dataset is 'd':
• Variables involved:
 - Node position 'vector' matrix W (n×d)
 - Edge matrix C (n×n)
 - Error (M.S.E.) matrix E (n×1)
 - Age matrix Age (n×n)
Background Methods
• Comparison of GNG with Self Organization
 Fixed rate of learning in GNG.
 Growth present in GNG.
 Winner and Neighbors take all strategy!
 Edge list helps in finding nearest
neighbors.
Background Methods
• Growing Neural Gas (GNG) –Growth
• Learning rate is constant.
• New edges are created if no edge exists between the winner and the 2nd winner.
• Old edges are deleted based on an age factor:
 - The age factor is initially set to '0' on edge creation.
 - The age factor is incremented for the winner and the neighbors.
 - The age factor can be directly incorporated into C.
Background Methods
• Depends on parameters (7 of them) for learning, edge deletion, and node creation.
• Just generates the topology; cannot be used directly for clustering.
• Used mainly for viewing data topology in 3D, or via Supervised GNG.
Background Methods
• Supervised GNG

[Figure: Supervised GNG. Input X feeds a GNG layer whose outputs drive multiple output units; multiple output units can be used for multiple clusters/patterns. Learning uses the difference between expected and calculated output, rather than the Euclidean distance.]
Background Methods
• What is missing in Supervised GNG?
 - Unsupervised clustering and growth.
 - The number of clusters has to be decided a priori.
 - Relearning is still required.
• What we want instead:
 - Single pass.
 - Resistance to noise.
 - Can learn patterns of all shapes and sizes.
 - Relearning is very much local.
Topology-Based Fuzzy Clustering
(TFC)
• Proposed Idea:
 - Based partly on GNG.
 - Adaptive topology, aided by an expert.
TFC
What GNG still needs:
 - Differences in learning between nodes.
 - Noting which nodes belong to which cluster.
 - Incorporating topology to ascertain the distribution of data in a cluster.
TFC
• Difference in Learning between nodes
• Learning should depend on the data
distribution and the cluster probability in
the area.
• Cannot use a diminishing learning rate as
that would cause the network to NOT-
LEARN new patterns.
TFC
• Noting what nodes belong to which
cluster.
• Placeholders for the Clusters.
• Incorporating Topology to ascertain the
distribution of data in a cluster.
• Cluster distributed in form of nodes and
edges.
TFC
• Placeholder for the Clusters

[Figure: two-level layout. Level-1 nodes A, B, C, D, X, Y (with edges as connected by GNG) each carry a symbolic reference saying which Level-2 node (cluster) the Level-1 node lies in: ØA:C1, ØB:C1, ØC:C1, ØD:C1 point to cluster C1; ØX:C2, ØY:C2 point to cluster C2.]
TFC
• Finding the centers of the clusters
 - Proposed: "Reference and Fuzzy Finding Algorithm"
 - Finds the center of the cluster.
 - How? Use the topology.
TFC
• What is the center in the following
figure?

Should we use vector to find the center?
Should we use Distance to find the center?
TFC
• Works on the same theory, using the link structure to find the center. For each node, do the following update:

 Ø_i ← ( Σ_{n=1}^{k} Ø_n · R_n ) / ( Σ_{n=1}^{k} R_n )

where the sum runs over the k linked neighbors of node i.

Let the nodes be represented by N1, N2, N3, …, Nm, each with corresponding reference values Ø1, Ø2, Ø3, …, Øm and radii (the average distance between a node and its neighbors) R1, R2, R3, …, Rm.
TFC
• Iterate until the values converge:

 Ø_i ← ( Σ_{n=1}^{k} Ø_n · R_n ) / ( Σ_{n=1}^{k} R_n )
• Independent of Vectors.
• Iterations are cumulative.
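Reading the update literally as a radius-weighted average over each node's linked neighbors, the iteration can be sketched as below. This is an assumption-laden sketch: the exact normalization in the original slides is partly lost, and including the node itself in its own neighborhood (so the averaging cannot oscillate), along with the example graph and radii, are choices made here for illustration:

```python
def reference_iteration(phi, neighbors, R, max_iter=100, tol=1e-6):
    """Iterate the reference values: each node's new value is the
    radius-weighted average over its neighborhood, repeated until the
    vector stops changing. The node itself is included in its own
    neighborhood here (an assumption, see the lead-in)."""
    phi = list(phi)
    for _ in range(max_iter):
        new = []
        for i in range(len(phi)):
            idx = [i] + neighbors[i]                 # self + linked neighbors
            num = sum(phi[n] * R[n] for n in idx)    # sum of phi_n * R_n
            den = sum(R[n] for n in idx)             # sum of R_n
            new.append(num / den)
        if max(abs(a - b) for a, b in zip(phi, new)) < tol:
            return new
        phi = new
    return phi

# 4 nodes on a path 0-1-2-3; radii are made-up average neighbor distances.
neighbors = [[1], [0, 2], [1, 3], [2]]
R = [1.0, 1.5, 1.5, 1.0]
phi = reference_iteration([0.5, 0.25, 0.3, 0.3], neighbors, R)
```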
TFC
[Figure: nodes represented by blue diamonds, edges in red, average radii drawn as blue circles; the values shown are the fuzzy memberships.]
Iteration   1:   referenceM =[   0.3333    0.3333   0.3333   0.3333] (Initial value)
Iteration   2:   referenceM =[   0.5000    0.2500   0.3333   0.3333]
Iteration   3:   referenceM =[   0.5141    0.2539   0.3349   0.3349]
Iteration   4:   referenceM =[   0.5144    0.2540   0.3348   0.3348] (Converged value)
fuzzyM =[   0.4937    1.0000   0.7586   0.7586]
TFC
•    Fuzzy membership is found as:

 FuzzyMembership(i) = min(ReferenceValue) / ReferenceValue(i),   1 ≤ i ≤ size(ReferenceValue)

•    As the reference value of the center is the lowest, normalize all other values using the center's value.
•    The center will now have a membership of 1.

referenceM = [ 0.5144   0.2540   0.3348   0.3348 ] (converged value)
fuzzyM    = [ 0.4937   1.0000   0.7586   0.7586 ]
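The normalization itself is a one-liner; applied to the converged reference values from the trace, it reproduces the fuzzy memberships shown (to rounding):

```python
def fuzzy_membership(reference):
    """FuzzyMembership(i) = min(reference) / reference(i): the center
    (lowest reference value) gets membership 1, all others less."""
    m = min(reference)
    return [m / r for r in reference]

referenceM = [0.5144, 0.2540, 0.3348, 0.3348]   # converged values from the trace
fuzzyM = fuzzy_membership(referenceM)
```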
TFC
• Effects on Topology

[Figure: path of convergence (internal) marked on the topology.]

Iteration 1: referenceM =[ 0.3333        0.3333      0.3333      0.3333] (Initial value)
Iteration 2: referenceM =[ 0.5000        0.2500      0.3333      0.3333]
TFC
• Effects on Topology

[Figure: path of convergence (external) marked on the topology.]

Iteration 2: referenceM =[ 0.5000        0.2500     0.3333   0.3333]
Iteration 3: referenceM =[ 0.5141        0.2539     0.3349   0.3349]
Iteration 4: referenceM =[ 0.5144        0.2540     0.3348   0.3348] (Converged value)
TFC
• Proof of the "Reference and Fuzzy Finding Algorithm"
• Standard power iteration: given a unit 2-norm q(0) ∈ R^n, the power method produces a sequence of vectors q(k) as follows:

 for k = 1, 2, …
  z(k) = A · q(k−1)
  q(k) = z(k) / ‖z(k)‖₂
  λ(k) = q(k)ᵀ · A · q(k)
 end

If q(0) is not "deficient" and A's eigenvalue of maximum modulus is unique, then q(k) converges to an eigenvector.

Deficiency occurs only when the exterior nodes are 0.
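The quoted power method (Golub & Van Loan) takes only a few lines of code. The example matrix below is chosen here so the dominant eigenvalue (3) and its eigenvector ([1, 1]/√2) are known in closed form:

```python
import math

def power_iteration(A, q0, iters=50):
    """Power method: z(k) = A q(k-1); q(k) = z(k)/||z(k)||_2;
    lambda(k) = q(k)^T A q(k). Converges to the dominant eigenvector when
    q0 is not deficient and the dominant eigenvalue is unique."""
    q = q0[:]
    lam = 0.0
    for _ in range(iters):
        z = [sum(a * qi for a, qi in zip(row, q)) for row in A]   # z = A q
        norm = math.sqrt(sum(zi * zi for zi in z))
        q = [zi / norm for zi in z]                               # normalize
        Aq = [sum(a * qi for a, qi in zip(row, q)) for row in A]
        lam = sum(qi * aqi for qi, aqi in zip(q, Aq))             # Rayleigh quotient
    return lam, q

# Symmetric 2x2 with eigenvalues 3 (eigenvector [1,1]/sqrt(2)) and 1.
A = [[2.0, 1.0], [1.0, 2.0]]
lam, q = power_iteration(A, [1.0, 0.0])
```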
TFC 2-Nodes 2-Dimensional
TFC -Examples
TFC
• No!
• Trace
referM = 0.4953 0.3335 0.5045
fuzzyM = 0.6734 1.0000 0.6612
TFC –Examples
• Effects of Edges on the Reference and Fuzzy Algorithm
TFC
[Figure: the same two-level layout as before, now annotated with the fuzzy memberships found by the discussed algorithm. Level-1 nodes A, B, C, D carry ØA:C1, ØB:C1, ØC:C1, ØD:C1 for cluster C1; X, Y carry ØX:C2, ØY:C2 for cluster C2; edges as connected by GNG.]

System layout with fuzzy membership.
TFC
• So how is the Fuzzy Area decided?

[Figure: overlapping fuzzy hyperspheres around nodes, with fuzzy memberships FMem = 1.0, 0.8, 0.75, 0.65.]

Overlapping fuzzy hyperspheres.
TFC
• Projection of hyperspheres onto the axes

[Figure: hyperspheres with fuzzy memberships FMem = 1.0, 0.75, 0.65 projected onto an axis; in overlapping regions the maximal membership is taken.]

Fuzzy membership projection with maximal membership.
TFC

Hyperspheres in a 3 Dimensional topology.
TFC
[Figure: two overlapping cluster regions, one represented by a black outline and one by a gray outline; nodes A and B, and data samples X and Y, fall in the overlap.]

Overlapping cluster regions.
TFC
[Figure: overlapping hyperspheres with fuzzy memberships FMem = 1.0, 0.75, 0.65; in overlapping regions the maximum membership is taken. Note: the fuzzy membership is decreased as a function of the radius and the error.]

Figure 4.28: Fuzzy membership projection with smoothed-out error.
TFC – Test( ) function

Input: X, cluster matrix
Output: UNKNOWN / ClusterIndex

1. Find all clusters that have a distance between X and a Level-1 node less than radius + √error associated with that node; name them 'XClusters'.
2. If size(XClusters) = 0, mark X as UNKNOWN and return.
3. Otherwise, find the fuzzy memberships of X in each of the XClusters; name them XFuzzyMemberships. Also use the error-based smoothing in the outer region.
4. Find the cluster having the maximum membership in XFuzzyMemberships, name it ClusterIndex, and return it.
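The flowchart can be sketched as below. The node record used here (position, radius, accumulated error, fuzzy membership, cluster id) is an assumption about what the cluster matrix holds, the linear membership decay is a simple stand-in for the hypersphere projection, and the error-based smoothing step is omitted:

```python
import math

def tfc_test(x, nodes):
    """Test(): find clusters whose Level-1 nodes lie within
    radius + sqrt(error) of x; if none, return 'UNKNOWN'; otherwise
    return the cluster in which x has the maximum fuzzy membership."""
    best_cluster, best_mem = None, -1.0
    for node in nodes:
        dist = math.dist(x, node["pos"])
        reach = node["radius"] + math.sqrt(node["error"])
        if dist < reach:
            # Membership decays linearly from the node's own fuzzy
            # membership at its center down to 0 at its reach.
            mem = node["fmem"] * (1.0 - dist / reach)
            if mem > best_mem:
                best_cluster, best_mem = node["cluster"], mem
    return best_cluster if best_cluster is not None else "UNKNOWN"

# Two toy clusters, one Level-1 node each (hypothetical values).
nodes = [
    {"pos": (0.0, 0.0), "radius": 1.0, "error": 0.04, "fmem": 1.0, "cluster": "C1"},
    {"pos": (5.0, 5.0), "radius": 1.0, "error": 0.04, "fmem": 0.8, "cluster": "C2"},
]
```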
TFC-Results (Iris)

[Figure: scatter plot of the IRIS dataset, with Iris Setosa, Iris Versicolor, and Iris Virginica marked.]
TFC-Results (Iris)
[Figure: GNG node-data display in 3-dimensional topology with the discriminant (cluster-separation) line. The figure shows the positions of the nodes in a 3D plot; note that the cluster node for Versicolor is not at the right position as desired. To be taken figuratively, not mathematically.]
TFC-Results (Iris)
[Figure: node positions according to TFC versus GNG. The Versicolor cluster node is now at the right place; also notice the much more extensive growth in this area.]
TFC-Results (Iris)
Misclassified data for the IRIS dataset, out of 150:

| Classes | FCM (m=1.5) | FCM (m=2) | FCM (m=3.5) | Gath-Geva (m=1.5) | Gath-Geva (m=2) | Gath-Geva (m=3.5) | TFC |
|---|---|---|---|---|---|---|---|
| Iris Setosa | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Iris Versicolor | 3 | 3 | 3 | 3 | 3 | 4 | 10 |
| Iris Virginica | 12 | 13 | 11 | 12 | 12 | 9 | 1 |
| Total misclassified | 15 | 16 | 14 | 15 | 15 | 13 | 11 |

TFC parameters: 10, eb = 0.04, en = 0.0006, 0.5, 0.0005, a_max = 12.
•   Insertion of the misclassified data as a node.
•   Done by adding a misclassified data point as a node in the cluster.

[Figure: Setosa, Versicolor, and Virginica clusters, with a newly inserted Virginica node; the data points misclassified by ATFC are denoted in blue.]
ATFC-Results (Iris)
| Classes | TFC: Misclassified | TFC: Classification rate | ATFC-SSAA: Misclassified | ATFC-SSAA: Classification rate |
|---|---|---|---|---|
| Iris Setosa | 0 | 100% | 0 | 100% |
| Iris Versicolor | 10 | 80% | 4 | 92% |
| Iris Virginica | 1 | 98% | 1 | 98% |
| Total misclassified | 11 | 92.7% | 5 | 96.7% |

(ATFC-SSAA applied for Iris Virginica only.)
ATFC
• Expert ability to split/merge clusters

[Figure: pictorial view in 2 dimensions of a topology with nodes 1-7, before the expert's input (a single Cluster 1) and after the expert's input (split into Cluster 1 and Cluster 2).]
ATFC
ATFC –Simulated Dataset

| Shape | Center | Radius | Width | Min | Max | Start | End | Size |
|---|---|---|---|---|---|---|---|---|
| Circle | (0,0) | 13 | 1.5 | 11.5 | 14.5 | 0 | 2π | 10000 |
| Semicircle | (0,0) | 9 | 1 | 8 | 10 | 0 | π | 5000 |
| Arc | (0,0) | 5 | 1 | 4 | 6 | 4 | 5 | 3000 |
ATFC –Simulated Dataset

Node Positions   Test Data Points
ATFC –Simulated Dataset
• Expert‟s modification of Cluster set

[Figure: simulated dataset topological node positions with the breakage introduced by the expert.]
ATFC –Simulated Dataset
Calculated classes from ATFC:

| Class | Real class | Total classification | Misclassified | Not classified |
|---|---|---|---|---|
| Circle | 10 | 10 | 0 | 0 |
| Semicircle | 10 | 12 | 2 | 0 |
| Arc | 10 | 8 | 0 | 2 |
NASA Dataset
•   Used by Dai [3].
•   Consists of a signal from the fuel tank.
•   Signal used for event detection.
•   Clustering/classification is not as important as finding the events.

[Figure: MPRE301P signal (50 Hz), with the events of interest (valve open, valve close) marked.]
NASA Dataset

[Figure: wavelet tree / wavelet decomposition of the signal.]
NASA Dataset
By Unsupervised TFC
•No Events detected.
•Blue diamonds are
unclassified events.
NASA Dataset
Supervised Insertion

[Figure: insertion of a data point as a node, steps (1) and (2). X = data point; the dashed line is a virtual edge to the nearest node.]
NASA Dataset
Supervised Insertion - Result

[Figure: insertion of a data point as a node, step (3). X = data point; the dashed line is a virtual edge to the nearest node.]
NASA Dataset
By Supervised ATFC

[Figure: the expert points out events, and the system subsequently finds new events (shown in red).]
NASA Dataset
2 Events marked by an Expert

• Default parameters don't work that well.
• Too many events are detected.
• Need to modify the size of the clusters.
NASA Dataset
[Figure: overlapping clusters with high memberships; nodes represented by circles, labeled with memberships 1, 1, 0.7, 0.7, 0.6, 1, 1.]

Overlapping clusters with high memberships.
Conclusions(1)
Choice of ATFC in different circumstances:

| # | Factor | Choice |
|---|---|---|
| 1 | Execution time | Yes, very good |
| 2 | Dataset size: large (num. of clusters <<< size of dataset) | Yes, very good |
| 3 | Dataset size: small (num. of clusters < size of dataset) | Yes, but better algorithms are available |
| 4 | Topology shape | Yes, very good |
| 5 | Misclassification | … be required |
| 6 | Accuracy of classification | Yes, but for medium to high accuracy. It's not always the best performing. |
| 7 | Adaptiveness to learning new data | Yes, very good |
Conclusions(2)
Its advantages can be cited as:
1. TFC has a linearized time for calculating the clusters in an unsupervised manner.

2. TFC has the advantage of being able to learn different cluster shapes through the use of topology.

3. Through ATFC, the algorithm becomes adaptive to data by allowing an expert's feedback on the type of data encountered.
References
•   Fritzke, B. (1996). Automatic construction of radial basis function networks with the growing neural gas model and its relevance for fuzzy logic. Proceedings of the 1996 ACM Symposium on Applied Computing, 624-627.
•   Golub, G. H., & Van Loan, C. F. Matrix Computations. Johns Hopkins University Press.
•   Dai, S., & Dhawan, A. P. (2004). Adaptive learning for event modeling and pattern classification. PhD Dissertation, NJIT, njit-etd2004-023.
Thank You!
