TOPOLOGY-BASED FUZZY CLUSTERING ALGORITHM
by Abhishek Jaiantilal
Advisor: Dr. Atam P. Dhawan

Objective
• Primary Goal: To design an algorithm that can find clusters of any shape, in an unsupervised manner, in linearized time.
• Secondary Goals: The algorithm needs to be adaptive, learning more clusters as they are encountered, and needs to accept supervised expert feedback to increase its accuracy.

Parts of Presentation
• Introduction to Clustering/Classification
• Background Methods
• Topology-Based Fuzzy Clustering (TFC)
• Adaptive TFC
• Results and Conclusion

Introduction
• Clustering/Classification: the grouping of samples.
• Fields of interest (multiple): pattern recognition, linguistics, etc.

Introduction
• Contemporary clustering/classification methods:
• C-Means family.
• Neural networks (Radial Basis Function, Multi-Layered Perceptrons, etc.).
• Hierarchical clustering.

Introduction
• Analysis of the C-Means family:
• Optimization function: J = Σ_{j=1..k} Σ_{i=1..n} || x_i^(j) − c_j ||²
• Hard C-Means membership: u_ik = 1 if D_ik ≤ D_ij for all j ≠ i (for all i, k); u_ik = 0 otherwise.
• Center update: v_i = Σ_{k=1..n} u_ik x_k / Σ_{k=1..n} u_ik, with x_k ∈ X, iterated until each v_i stops moving.

Introduction
• Most C-Means-based algorithms differ mainly in the distance measure used.
• Newer algorithms: Gath-Geva includes prior and posterior probability; Gustafson-Kessel includes probability; Fuzzy C-Means includes fuzzy memberships.

Introduction
• Comments on C-Means:
• Needs the number of clusters a priori.
• Dependence on the initial cluster centers (very sensitive in better-performing algorithms like Gath-Geva) causes non-unique solutions.
• Will converge, but the number of iterations required is not fixed.
• Exponential increase of the time factor with a linear increase in samples.
• No way to add supervised information => cannot learn new patterns.
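The Hard C-Means alternation above (assign each sample to its nearest center, then recompute each center as the mean of its samples, until the centers stop moving) can be sketched in plain Python. This is a minimal illustration of the update rules on toy data, not the thesis implementation:

```python
import random

def hard_c_means(points, k, iters=100, seed=0):
    """Hard C-Means sketch: alternate the membership step (u_ik) and the
    center step (v_i) until the centers stop moving."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k random samples
    for _ in range(iters):
        # u_ik = 1 for the nearest center, 0 otherwise
        groups = [[] for _ in range(k)]
        for x in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
            groups[i].append(x)
        # v_i = mean of the samples assigned to center i
        new_centers = []
        for i, g in enumerate(groups):
            if g:
                new_centers.append(tuple(sum(col) / len(g) for col in zip(*g)))
            else:
                new_centers.append(centers[i])
        if new_centers == centers:   # centers stopped moving -> converged
            break
        centers = new_centers
    return centers

# Two well-separated blobs; the two centers should land near (0,0) and (10,10).
data = [(0.0, 0.0), (0.1, 0.2), (-0.2, 0.1),
        (10.0, 10.0), (10.2, 9.9), (9.8, 10.1)]
centers = hard_c_means(data, 2)
```

As the slides note, the result depends on the random initial centers, and the iteration count to convergence is not fixed.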
Introduction
• Neural Networks:
• Radial Basis Function Neural Networks (RBFNN)
• Multi-Layered Perceptron (MLP)
• Require training (normally supervised).
• Require learning, which is computationally intensive.

Introduction
• Example: RBFNN. Multiple output units can be used for multiple clusters/patterns. (Figure: input X feeding radial basis units u1, u2, …, un; the labeled parameter is the variance of the radial basis function at Layer 1.)

Introduction
• Disadvantages:
• Requires extensive supervised training.
• Learning new data modifies the weights and forces relearning.
• Very rigid, as newer patterns cannot be added to an already-learnt network.
• => Cannot learn new patterns.

Background Methods
• Growing Neural Gas (GNG): a topology-learning method.
• The idea:
• Start with 2 nodes representing vector positions.
• With each new sample, find the nearest node and the 2nd-nearest node, and create a link between them if one does not exist.
• Make the winner node and its neighbors learn, on the basis of the Euclidean distance, at learning rates e_b and e_n respectively.
• Insert a new node after every 'n' iterations.

Background Methods
• Growing Neural Gas (GNG) - Growth. (Figure from [2].)

Background Methods
• Growing Neural Gas (GNG): if the number of nodes generated is n and the dimension of the dataset is d, the variables involved are:
• Node position (vector) matrix W (n×d)
• Edge matrix C (n×n)
• Error (M.S.E.) matrix E (n×1)
• Age matrix Age (n×n)

Background Methods
• Comparison of GNG with self-organization:
• Fixed rate of learning in GNG.
• Growth is present in GNG.
• Winner-and-neighbors-take-all strategy.
• The edge list helps in finding nearest neighbors.

Background Methods
• Growing Neural Gas (GNG) - Growth:
• The learning rate is constant.
• New edges are created if no edge exists between the winner and the 2nd winner.
• Old edges are deleted based on an age factor:
• The age factor is initially set to 0 on edge creation.
• The age factor is incremented for the winner and its neighbors.
• The age factor can be incorporated directly into C.
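The GNG adaptation step described above can be sketched as follows. This covers only the winner/runner-up linking and the e_b/e_n learning moves; node insertion every 'n' iterations, the error matrix E, and age-based edge deletion are omitted, and the e_b, e_n values here are illustrative:

```python
import random

def gng_step(W, C, x, eb=0.2, en=0.006):
    """One GNG adaptation step: find the nearest and 2nd-nearest nodes to
    sample x, link them if not already linked, then move the winner (rate eb)
    and its edge-connected neighbors (rate en) toward x."""
    d2 = lambda w: sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    order = sorted(range(len(W)), key=lambda i: d2(W[i]))
    s1, s2 = order[0], order[1]          # winner and 2nd winner
    C[s1][s2] = C[s2][s1] = 1            # create the edge if not existing
    W[s1] = [wi + eb * (xi - wi) for wi, xi in zip(W[s1], x)]   # winner learns
    for j in range(len(W)):              # neighbors learn at the smaller rate
        if j != s1 and C[s1][j]:
            W[j] = [wi + en * (xi - wi) for wi, xi in zip(W[j], x)]
    return s1

# Start with 2 nodes (the GNG initial state) and feed samples near (1, 1).
W = [[0.0, 0.0], [0.5, 0.0]]             # node position matrix W (n x d)
C = [[0, 0], [0, 0]]                     # edge matrix C (n x n)
rng = random.Random(1)
for _ in range(200):
    gng_step(W, C, [1.0 + rng.uniform(-0.1, 0.1),
                    1.0 + rng.uniform(-0.1, 0.1)])
```

After feeding samples from one region, the winner node sits near that region and an edge exists between the two nodes, matching the single-pass behavior the slides describe.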
Background Methods
• GNG - Comments:
• Depends on 7 parameters for learning, edge deletion, and node creation.
• Just generates the topology; cannot be used directly for clustering.
• Used mainly for viewing data topology in 3D, or via Supervised GNG.

Background Methods
• Supervised GNG: multiple output units on top of GNG can be used for multiple clusters/patterns. Learning uses the difference between the expected and calculated outputs rather than the Euclidean distance.

Background Methods
• What is missing in Supervised GNG?
• Unsupervised clustering and growth.
• The number of clusters has to be decided a priori.
• Relearning is still required.
• Advantages of GNG:
• Single pass.
• Resistance to noise.
• Can learn patterns of all shapes and sizes.
• Relearning is very much local.

Topology-Based Fuzzy Clustering (TFC)
• Proposed idea, based partly on GNG:
• Unsupervised learning (already present).
• Unsupervised cluster formation (added).
• Cluster evaluation/testing (added).
• Single pass (already present).
• Adaptive topology aided by an expert (added).

TFC
• What more is needed beyond GNG:
• A difference in learning between nodes.
• Noting which nodes belong to which cluster.
• Incorporating the topology to ascertain the distribution of data in a cluster.

TFC
• Difference in learning between nodes:
• Learning should depend on the data distribution and the cluster probability in the area.
• Cannot use a diminishing learning rate, as that would cause the network to NOT learn new patterns.

TFC
• Noting which nodes belong to which cluster: placeholders for the clusters.
• Incorporating the topology: a cluster is distributed in the form of nodes and edges.
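The "placeholder" idea, where each Level-1 GNG node is noted as belonging to a Level-2 cluster, can be sketched as labeling the connected components of the edge graph. This is one reading of the slides (nodes joined by GNG edges share a cluster placeholder); the names and structures are illustrative, not the thesis code:

```python
def level2_clusters(nodes, edges):
    """Assign each Level-1 node a Level-2 cluster label by flood-filling
    connected components of the undirected edge graph."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    assignment, cluster_id = {}, 0
    for n in nodes:
        if n in assignment:
            continue
        cluster_id += 1
        stack = [n]
        while stack:                           # flood-fill one component
            m = stack.pop()
            if m in assignment:
                continue
            assignment[m] = "C%d" % cluster_id  # e.g. A -> C1, X -> C2
            stack.extend(neighbors[m])
    return assignment

# Toy topology: A-B form one component, X-Y another.
level1_nodes = {"A": [0.1, 0.2], "B": [0.3, 0.1], "X": [5.0, 5.1], "Y": [5.2, 4.9]}
level1_edges = {("A", "B"), ("X", "Y")}        # edges as connected by GNG
assignment = level2_clusters(level1_nodes, level1_edges)
```

Under this reading, an expert breaking a link (as in the ATFC slides later) splits one Level-2 cluster into two.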
TFC
• Placeholders for the clusters: adding a Level-2 structure.
• Symbolic representation: ØA:C1 says that Level-1 node A lies in a particular Level-2 node (cluster) C1. Level-1 nodes A, B, C, D (with edges as connected by GNG) map to cluster C1; nodes X and Y map to cluster C2.

TFC
• Finding the centers of the clusters: the proposed "Reference and Fuzzy Finding Algorithm".
• Finds the center of the cluster. How? Use the topology.

TFC
• What is the center in the following figure?
• Should we use the vectors to find the center?
• Should we use distances to find the center?

TFC
• The algorithm works on the same theory, using the link structure to find the center.
• Let the nodes be represented by N1, N2, N3, …, Nm, with corresponding reference values Ø1, Ø2, Ø3, …, Øm and radii (the average distance between a node and its neighbors) R1, R2, R3, …, Rm.
• For each node, the reference value Øn^(k+1) is updated from the reference values Ø^(k) and the radii R of its linked neighbors.

TFC
• Iterate until the reference values converge.
• Advantages:
• Independent of vectors.
• Iterations are cumulative.

TFC
• Example (nodes represented by blue diamonds, edges in red, the average radius drawn as blue circles; the values shown are the fuzzy memberships):
Iteration 1: referenceM = [0.3333 0.3333 0.3333 0.3333] (initial value)
Iteration 2: referenceM = [0.5000 0.2500 0.3333 0.3333]
Iteration 3: referenceM = [0.5141 0.2539 0.3349 0.3349]
Iteration 4: referenceM = [0.5144 0.2540 0.3348 0.3348] (converged value)
fuzzyM = [0.4937 1.0000 0.7586 0.7586]

TFC
• The fuzzy membership is found as FuzzyMembership(i) = min(ReferenceValue) / ReferenceValue(i), for 1 ≤ i ≤ size(ReferenceValue).
• As the reference value of the center is the lowest, all other values are normalized using the center value; the center will then have a membership of 1.
referenceM = [0.5144 0.2540 0.3348 0.3348] (converged value)
fuzzyM = [0.4937 1.0000 0.7586 0.7586]

TFC
• Effects on Topology: Path of Convergence (Internal).
Iteration 1: referenceM = [0.3333 0.3333 0.3333 0.3333] (initial value)
Iteration 2: referenceM = [0.5000 0.2500 0.3333 0.3333]

TFC
• Effects on Topology: Path of Convergence (External).
Iteration 2: referenceM = [0.5000 0.2500 0.3333 0.3333]
Iteration 3: referenceM = [0.5141 0.2539 0.3349 0.3349]
Iteration 4: referenceM = [0.5144 0.2540 0.3348 0.3348] (converged value)

TFC
• Proof of the "Reference and Fuzzy Finding Algorithm" via standard power iterations. Given a unit 2-norm q^(0) ∈ R^n, the power method produces a sequence of vectors q^(k) as follows:
for k = 1, 2, …
  z^(k) = A q^(k−1)
  q^(k) = z^(k) / || z^(k) ||₂
  λ^(k) = (q^(k))ᵀ A q^(k)
end
• If q^(0) is not "deficient" and A's eigenvalue of maximum modulus is unique, q^(k) converges to an eigenvector.
• Deficiency occurs only when the exterior nodes are 0.

TFC - Examples
• 2 nodes, 2-dimensional.

TFC
• Is radius normalization required? No.
• Trace:
referM = [0.4953 0.3335 0.5045]
fuzzyM = [0.6734 1.0000 0.6612]
Radius = [1.6401 1.8062 1.9723]

TFC - Examples
• Effects of edges on the Reference and Fuzzy Algorithm.

TFC
• The discussed algorithm helps in finding the fuzzy memberships of these links. (System layout with fuzzy membership: Level-1 nodes A, B, C, D, with edges as connected by GNG, mapping to cluster C1; nodes X and Y mapping to cluster C2.)

TFC
• So how is the fuzzy area decided? Each node carries a fuzzy membership (FMem), e.g. FMem = 1.0 at the center and 0.8, 0.75, 0.65 further out: overlapping fuzzy hyperspheres.

TFC
• Projection of the hyperspheres on the axes (FMem = 1.0, 0.75, 0.65): the fuzzy membership projection, and the projection with maximal membership.

TFC
• Hyperspheres in a 3-dimensional topology.

TFC
• Overlapping cluster regions: a data sample (X, Y) can fall inside the outlines of two clusters (around node A and node B).
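The fuzzy-membership step introduced earlier, FuzzyMembership(i) = min(ReferenceValue) / ReferenceValue(i), can be checked directly against the converged reference values from the four-node example:

```python
def fuzzy_membership(reference):
    """Normalize the reference values by the minimum (the cluster center),
    so the center gets membership 1 and every other node gets less."""
    lo = min(reference)
    return [lo / r for r in reference]

referenceM = [0.5144, 0.2540, 0.3348, 0.3348]  # converged values from the slides
fuzzyM = fuzzy_membership(referenceM)          # close to the slides' fuzzyM
```

Node 2, with the lowest reference value, comes out with membership 1, and the remaining values agree with the slides' fuzzyM = [0.4937 1.0000 0.7586 0.7586] to display precision.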
TFC
• In overlapping regions (FMem = 1.0, 0.75, 0.65), the maximum membership is taken.
• Note: the fuzzy membership is decreased as a function of the radius and the error. (Figure 4.28: Fuzzy Membership Projection with Smoothed-Out Error.)

TFC - Test()
• Input: X, cluster matrix. Output: UNKNOWN / ClusterIndex.
• Find all clusters for which the distance between X and a Level-1 node is less than the radius + √error associated with that node; name them XClusters.
• If size(XClusters) = 0: mark X as UNKNOWN.
• Otherwise: find the fuzzy memberships of X in each of the XClusters, named XFuzzyMemberships, using the error-based smoothing in the region beyond the radius but within radius + √error.
• Find the cluster having the maximum membership in XFuzzyMemberships and return it as ClusterIndex.

TFC - Results (Iris)
• Scatter plot for the IRIS dataset (Iris Setosa, Iris Versicolor, Iris Virginica).

TFC - Results (Iris)
• GNG node-data display in 3-dimensional topology with the discriminant (cluster separation) line, shown in a 3D plot. Note that the cluster node for Versicolor is not at the desired position (to be taken figuratively, not mathematically).

TFC - Results (Iris)
• Node positions according to TFC: the Versicolor node is now at the right place. Also notice the much more extensive growth in this area by GNG.

TFC - Results (Iris)
• Misclassified data for the IRIS dataset, out of 150:

Classes             | FCM m=1.5 | FCM m=2 | FCM m=3.5 | Gath-Geva m=1.5 | Gath-Geva m=2 | Gath-Geva m=3.5 | TFC
Iris Setosa         | 0  | 0  | 0  | 0  | 0  | 0 | 0
Iris Versicolor     | 3  | 3  | 3  | 3  | 3  | 4 | 10
Iris Virginica      | 12 | 13 | 11 | 12 | 12 | 9 | 1
Total Misclassified | 15 | 16 | 14 | 15 | 15 | 13 | 11

(TFC parameters: 10, e_b = 0.04, e_n = 0.0006, 0.5, 0.0005, a_max = 12.)

Adaptive TFC - Results (Iris)
• Insertion of the misclassified data as a node: done by adding a misclassified data point as a node in the cluster.
(Figure: misclassified data points by ATFC denoted in blue; a newly inserted node in Virginica; Setosa, Versicolor, and Virginica shown.)

ATFC - Results (Iris)

Classes             | TFC Misclassified | TFC Classification Rate | ATFC-SSAA Misclassified | ATFC-SSAA Classification Rate
Iris Setosa         | 0  | 100%  | 0 | 100%
Iris Versicolor     | 10 | 80%   | 4 | 92%
Iris Virginica      | 1  | 98%   | 1 | 98%
Total Misclassified | 11 | 92.7% | 5 | 96.7%

(ATFC-SSAA applied for Iris Virginica only.)

ATFC
• Expert ability to split/merge clusters. (Figures: pictorial view in 2 dimensions of a topology with nodes 1-7, before the expert's input, and split into Cluster 1 and Cluster 2 after the expert's input.)

ATFC
• The expert asked for cluster splitting.

ATFC - Simulated Dataset

Distribution | Center | Radius size | Width | Min | Max | Variation start (rad) | Variation end (rad) | Distribution size
Circle       | (0,0)  | 13 | 1.5 | 11.5 | 14.5 | 0 | 2π | 10000
Semicircle   | (0,0)  | 9  | 1   | 8    | 10   | 0 | π  | 5000
Arc          | (0,0)  | 5  | 1   | 4    | 6    | 4 | 5  | 3000

ATFC - Simulated Dataset
• Node positions and test data points.

ATFC - Simulated Dataset
• Expert's modification of the cluster set: a link is broken. (Figure: simulated-dataset topological node positions with breakage.)

ATFC - Simulated Dataset
• Calculated classes from ATFC:

Class      | Real Class Total | Classified | Misclassified | Not Classified
Circle     | 10 | 10 | 0 | 0
Semicircle | 10 | 12 | 2 | 0
Arc        | 10 | 8  | 0 | 2

NASA Dataset
• Used by Dai [3].
• Consists of a signal from the fuel tank (MPRE301P signal, 50 Hz); the signal is used for event detection (valve open / valve close events of interest).
• Clustering/classification is not as important as finding events.

NASA Dataset
• Wavelet tree and wavelet decomposition.

NASA Dataset
• By unsupervised TFC: no events detected; blue diamonds are unclassified events.

NASA Dataset
• Supervised insertion: a data point X is inserted as a node, with a virtual edge to the nearest node. (Figures: Insertion of a Data Point as a Node (1)-(3), including the result.)
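The supervised insertion shown for both Iris and the NASA data, where a misclassified or expert-marked data point becomes a new node with an edge to its nearest node, can be sketched as follows. The field names and the default radius are assumptions for illustration, not the thesis data structures:

```python
import math

def insert_as_node(x, W, C, cluster_of, radius, default_radius=1.0):
    """Supervised-insertion sketch: add data point x as a new Level-1 node,
    make the 'virtual edge' to its nearest node a real edge, and place the
    new node in that node's Level-2 cluster."""
    nearest = min(range(len(W)), key=lambda i: math.dist(x, W[i]))
    W.append(list(x))                    # new node sits at the data point
    for row in C:                        # grow the edge matrix C to (n+1)x(n+1)
        row.append(0)
    C.append([0] * len(W))
    C[-1][nearest] = C[nearest][-1] = 1  # edge to the nearest node
    cluster_of.append(cluster_of[nearest])
    radius.append(default_radius)        # assumed starting radius
    return len(W) - 1                    # index of the inserted node

# Two existing nodes in cluster C1; insert an expert-marked point near node 1.
W = [[0.0, 0.0], [2.0, 0.0]]
C = [[0, 1], [1, 0]]
cluster_of = ["C1", "C1"]
radius = [1.0, 1.0]
new = insert_as_node([2.5, 0.5], W, C, cluster_of, radius)
```

Because the insertion only touches the nearest node's neighborhood, relearning stays local, matching the GNG advantage cited earlier.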
NASA Dataset
• By supervised ATFC: the expert points out events, and the algorithm subsequently finds new events (shown in red).

NASA Dataset
• 2 events marked by an expert.
• The default parameters don't work that well: too many events are detected, and the size of the clusters needs to be modified.

NASA Dataset
• Overlapping clusters with high memberships (nodes represented by circles and radii by dotted circles; memberships of 1, 0.7, and 0.6).

Conclusions (1)
Choice of ATFC in different circumstances:
1. Execution time: yes, very good.
2. Dataset size, large (num. of clusters <<< size of dataset): yes, very good.
3. Dataset size, small (num. of clusters < size of dataset): yes, but better algorithms are available.
4. Topology shape: yes, very good.
5. Adaptiveness of correcting misclassification: yes, but modification/tweaking of the node radius might be required.
6. Accuracy of classification: yes, but for medium to high accuracy; it is not always the best performing.
7. Adaptiveness of learning new data: yes, very good.

Conclusions (2)
Its advantages can be summarized as:
1. TFC calculates the clusters in linearized time in an unsupervised manner.
2. TFC can learn different cluster shapes through the use of topology.
3. Through ATFC, the algorithm becomes adaptive to data by allowing an expert's feedback on the type of data encountered.

References
• Fritzke, B. (1996). Automatic construction of radial basis function networks with the growing neural gas model and its relevance for fuzzy logic. Proceedings of the 1996 ACM Symposium on Applied Computing, 624-627.
• Golub, G. H., & Van Loan, C. F. Matrix Computations.
• Dai, S., & Dhawan, A. P. (2004). Adaptive learning for event modeling and pattern classification. PhD dissertation, NJIT, njit-etd2004-023.

Thank You!