
TOPOLOGY-BASED FUZZY CLUSTERING ALGORITHM
by Abhishek Jaiantilal
Advisor: Dr. Atam P. Dhawan
Objective
• Primary Goal:
  To design an algorithm that can find clusters of any shape, in an
  unsupervised manner, in linearized time.
• Secondary Goals:
   Needs to be adaptive and learn more clusters as they are encountered.
   Needs to be adaptive to take supervised Expert feedback to improve
    classification accuracy.
         Parts of Presentation
•   Introduction to Clustering/Classification
•   Background Methods
•   Topology-Based Fuzzy Clustering (TFC)
•   Adaptive TFC
•   Results and Conclusion
               Introduction
• Clustering/Classification
  • Grouping of samples
  • Fields of interest (multiple):
     • Pattern Recognition
     • Linguistics
     • Etc.
             Introduction
• Contemporary Clustering/Classification
  Methods
  • C-Means Family.
  • Neural Networks (Radial Basis Function,
    Multi-Layered Perceptrons, etc).
  • Hierarchical Clustering.
                     Introduction
• Analysis of the C-Means family

  Optimization function:

      J = Σ_{j=1}^{k} Σ_{i=1}^{n} ‖x_i^(j) − c_j‖²

  Hard C-Means membership assignment:

      u_ik = 1 if D_ik ≤ D_ij for all j ≠ i;  u_ik = 0 otherwise

  Center update, iterated till v_i stops moving (the mean of the n_i
  samples x_k ∈ X_i):

      v_i = ( Σ_{k=1}^{n} u_ik x_k ) / ( Σ_{k=1}^{n} u_ik )
              Introduction
• Most C-Means-based algorithms differ in:
  • The distance measure (the most varied component).
  • Newer algorithms:
    • Gath-Geva: includes prior and posterior probability.
    • Gustafson-Kessel: includes probability; Fuzzy C-Means: fuzzy
      memberships.
               Introduction
• Comments on C-Means:
  • Needs the number of clusters a priori.
  • Dependence on the initial cluster centers (very sensitive in the case
    of better-performing algorithms like Gath-Geva) causes non-unique
    solutions.
  • Will converge, but the number of iterations required is not fixed!
  • Exponential increase in computation time with a linear increase in
    samples.
  • No way to add supervised information.
  • => Cannot learn new patterns.
             Introduction
• Neural Networks:
  • Radial Basis Function Neural Networks
    (RBFNN)
  • Multi Layered Perceptron (MLP)
• Requires Training (normally supervised)
• Requires Learning (Computationally
  intensive)
Introduction
• Example: RBFNN
[Figure: RBFNN architecture. Input X feeds Layer-1 radial basis units
u1 … un, each with its own variance; multiple output units can be used
for multiple clusters/patterns.]
             Introduction
• Disadvantages:
  • Requires extensive Supervised training.
  • Learning would cause modification of the
    weights and relearning.
  • It's very rigid, as newer patterns cannot be
    added to an already learnt network.
  • =>Cannot learn new patterns.
       Background Methods
• Growing Neural Gas (GNG)
  • Topology Learning Method
  • The idea is:
     • Take 2 nodes representing vector positions.
     • For each new sample, find the nearest node and the 2nd-nearest
       node, and create a link between them if one does not exist.
     • Make the winner node and its neighbors learn on the basis of the
       Euclidean distance and the learning rates eb and en, respectively.
     • Insert a new node after every 'n' iterations.
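A minimal sketch of one GNG adaptation step as described above (simplified for illustration: error accumulation, edge aging, and periodic node insertion are omitted, and the names are hypothetical):

```python
import math

def gng_step(W, edges, x, eb=0.2, en=0.006):
    """One GNG adaptation step. W is a list of node position vectors,
    edges is a set of frozenset pairs {i, j}, x is the new sample."""
    # find the nearest (winner) and 2nd-nearest node to sample x
    order = sorted(range(len(W)), key=lambda i: math.dist(W[i], x))
    s1, s2 = order[0], order[1]
    # create a link between them if one does not already exist
    edges.add(frozenset((s1, s2)))
    # move the winner toward x by rate eb, its topological neighbors by en
    W[s1] = [w + eb * (xi - w) for w, xi in zip(W[s1], x)]
    for e in list(edges):
        if s1 in e:
            (n,) = e - {s1}
            W[n] = [w + en * (xi - w) for w, xi in zip(W[n], x)]
    return s1, s2
```

In the full algorithm this step also increments edge ages, accumulates the winner's error, and inserts a node every 'n' samples near the highest-error node.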
      Background Methods
• Growing Neural Gas (GNG) - Growth
[Figure: GNG network growth, from [2].]
      Background Methods
• Growing Neural Gas (GNG)
  • If the number of nodes generated is n and the dimension of the
    dataset is d.
  • Variables involved:
      Node position vector matrix W (n×d)
      Edge matrix C (n×n)
      Error (M.S.E.) matrix E (n×1)
      Age matrix Age (n×n)
       Background Methods
• Comparison of GNG with Self-Organizing Maps
    Fixed rate of learning in GNG.
    Growth present in GNG.
    Winner-and-neighbors-take-all strategy!
    The edge list helps in finding nearest neighbors.
       Background Methods
• Growing Neural Gas (GNG) - Growth
  • The learning rate is constant.
  • New edges are created if no edge exists between the winner and the
    2nd winner.
  • Old edges are deleted based on an age factor.
       The age factor is initially set to 0 on edge creation.
       The age factor is incremented for the winner and its neighbors.
       The age factor can be directly incorporated into C.
      Background Methods
• GNG - Comments:
  • Depends on seven parameters for learning, edge deletion, and node
    creation.
  • Only generates the topology; cannot be used directly for clustering.
  • Used mainly for viewing data topology in 3D, or via Supervised GNG.
      Background Methods
• Supervised GNG
[Figure: Supervised GNG architecture. Input X feeds a GNG layer; multiple
output units can be used for multiple clusters/patterns. Learning uses
the difference between expected and calculated output rather than the
Euclidean distance.]
      Background Methods
• What is missing in Supervised-GNG?
    Unsupervised clustering and growth.
    The number of clusters has to be decided a priori.
    Relearning is still required.
       Advantages of GNG
• Single Pass.
• Resistance to Noise.
• Can learn patterns of all shapes and
  sizes.
• Relearning is very much local.
Topology-Based Fuzzy Clustering
            (TFC)
• Proposed Idea:
  • Based partly on GNG
    • Unsupervised Learning (Already Present)
    • Unsupervised Cluster formation (Added)
    • Cluster Evaluation/Testing (Added)
    • Single Pass (Already Present)
  • Adaptive topology aided by an Expert
    (Added).
                   TFC
What more is needed in GNG?
    Differences in learning between nodes.

    Noting which nodes belong to which cluster.

    Incorporating topology to ascertain the distribution of data in a
     cluster.
                    TFC
• Difference in Learning between nodes
  • Learning should depend on the data
    distribution and the cluster probability in
    the area.
  • Cannot use a diminishing learning rate as
    that would cause the network to NOT-
    LEARN new patterns.
                    TFC
• Noting which nodes belong to which
  cluster.
  • Placeholders for the Clusters.
• Incorporating Topology to ascertain the
  distribution of data in a cluster.
  • Cluster distributed in form of nodes and
    edges.
                                     TFC
• Placeholder for the Clusters - adding a Level-2 structure
[Figure: Two-level structure. Level-1 nodes A, B, C, D (with edges as
connected by GNG) carry symbolic references ØA:C1 … ØD:C1 to the Level-2
node (cluster) C1; Level-1 nodes X and Y carry references ØX:C2, ØY:C2
to Level-2 node C2. The reference Ø is a symbolic representation saying
that the Level-1 node lies in a particular Level-2 node (cluster).]
                    TFC
• Finding the Centers of the Clusters
  -Proposed
  “Reference and Fuzzy Finding Algorithm”

  -Finds the center of the Cluster

  -How? Use the Topology
                   TFC
• What is the center in the following
  figure?



                  Should we use vector to find the center?
                  Should we use Distance to find the center?
                             TFC
• Works on the same theory, using the link structure to find the center.
  So for each node i, do the following (the sums run over node i and its
  k linked neighbors):

      Ø_i ← ( Σ_{n=1}^{k} Ø_n R_n ) / ( ( Σ_{n=1}^{k} R_n ) · ( Σ_{n=1}^{k} Ø_n ) )

  Let the nodes be represented by N1, N2, N3, …, Nm, each having
  corresponding reference values Ø1, Ø2, Ø3, …, Øm and radius (the
  average distance between a node and its neighbors) R1, R2, R3, …, Rm.
                        TFC
• Iterate till the values converge:

      Ø_i ← ( Σ_{n=1}^{k} Ø_n R_n ) / ( ( Σ_{n=1}^{k} R_n ) · ( Σ_{n=1}^{k} Ø_n ) )
• Advantages:
   • Independent of Vectors.
   • Iterations are cumulative.
                                           TFC
[Figure: Nodes represented by blue diamonds, edges in red; the average
radius is drawn as blue circles; the values shown are the fuzzy
memberships.]

Iteration 1: referenceM = [0.3333  0.3333  0.3333  0.3333] (initial value)
Iteration 2: referenceM = [0.5000  0.2500  0.3333  0.3333]
Iteration 3: referenceM = [0.5141  0.2539  0.3349  0.3349]
Iteration 4: referenceM = [0.5144  0.2540  0.3348  0.3348] (converged value)
                 fuzzyM = [0.4937  1.0000  0.7586  0.7586]
                                     TFC
• Fuzzy membership is found as:

      FuzzyMembership(i) = min(ReferenceValue) / ReferenceValue(i),
      1 ≤ i ≤ size(ReferenceValue)

• As the reference value of the center is the lowest, normalize all other
  values using the center value.
• The center will now have a membership of 1.

referenceM = [0.5144  0.2540  0.3348  0.3348] (converged value)
    fuzzyM = [0.4937  1.0000  0.7586  0.7586]
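The normalization step can be written directly; this short sketch reproduces the converged values quoted above:

```python
def fuzzy_membership(reference):
    """Normalize converged reference values by the minimum (the cluster
    center), so the center gets membership 1 and all others less than 1."""
    m = min(reference)
    return [m / r for r in reference]

# converged reference values from the slide
referenceM = [0.5144, 0.2540, 0.3348, 0.3348]
fuzzyM = fuzzy_membership(referenceM)
# fuzzyM ≈ [0.4938, 1.0000, 0.7586, 0.7586] (the slide rounds the first to 0.4937)
```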
                                      TFC
• Effects on Topology
[Figure: Path of convergence (internal), drawn over the node topology.]

Iteration 1: referenceM = [0.3333  0.3333  0.3333  0.3333] (initial value)
Iteration 2: referenceM = [0.5000  0.2500  0.3333  0.3333]
                                     TFC
• Effects on Topology
[Figure: Path of convergence (external), drawn over the node topology.]

Iteration 2: referenceM = [0.5000  0.2500  0.3333  0.3333]
Iteration 3: referenceM = [0.5141  0.2539  0.3349  0.3349]
Iteration 4: referenceM = [0.5144  0.2540  0.3348  0.3348] (converged value)
                                                TFC
• Proof of the "Reference and Fuzzy Finding Algorithm"
• Standard power iteration: given a unit 2-norm q(0) ∈ R^n, the power
  method produces a sequence of vectors q(k) as follows:

      for k = 1, 2, …
          z(k) = A q(k−1)
          q(k) = z(k) / ‖z(k)‖₂
          λ(k) = [q(k)]^T A q(k)
      end

  If q(0) is not "deficient" and A's eigenvalue of maximum modulus is
  unique, then q(k) converges to an eigenvector.

 Deficiency occurs only when the exterior nodes are 0.
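A pure-Python sketch of the power method exactly as written above (a dense list-of-lists matrix is used for illustration):

```python
def power_iteration(A, q0, iters=200):
    """Standard power method: q(k) = A q(k-1) / ||A q(k-1)||_2, with the
    Rayleigh quotient lambda(k) = q(k)^T A q(k). Converges to the dominant
    eigenvector when the eigenvalue of maximum modulus is unique and q(0)
    is not deficient."""
    q = q0[:]
    lam = 0.0
    for _ in range(iters):
        z = [sum(a * qi for a, qi in zip(row, q)) for row in A]   # z = A q
        norm = sum(zi * zi for zi in z) ** 0.5
        q = [zi / norm for zi in z]                               # normalize
        Aq = [sum(a * qi for a, qi in zip(row, q)) for row in A]
        lam = sum(qi * aqi for qi, aqi in zip(q, Aq))             # q^T A q
    return q, lam
```

For example, on A = [[2, 0], [0, 1]] with q(0) = [0.6, 0.8], the iterates converge to the dominant eigenvector [1, 0] with λ = 2.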
TFC 2-Nodes 2-Dimensional
TFC -Examples
                         TFC
• Is Radius Normalization required?
  • No!
  • Trace:
    referM = [0.4953  0.3335  0.5045]
    fuzzyM = [0.6734  1.0000  0.6612]
    Radius = [1.6401  1.8062  1.9723]
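Assuming the update Ø_i ← (Σ Ø_n R_n) / ((Σ R_n)·(Σ Ø_n)), with the sums taken over node i and its linked neighbors, the trace on this slide can be reproduced for three nodes with these radii (a chain topology 1-2-3 is an assumption for illustration; the slide does not state the edge structure):

```python
def reference_iteration(radius, edges, iters=50):
    """Iterate the reference values until convergence. For each node i,
    sum over i and its linked neighbors n:
        phi_i <- (sum phi_n * R_n) / ((sum R_n) * (sum phi_n))
    The update is scale-invariant, so any uniform start works; the node
    with the lowest converged reference value is the cluster center."""
    m = len(radius)
    phi = [1.0 / 3.0] * m                      # uniform start, as in the trace
    nbrs = [{i} for i in range(m)]
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    for _ in range(iters):
        phi = [sum(phi[n] * radius[n] for n in nbrs[i])
               / (sum(radius[n] for n in nbrs[i]) * sum(phi[n] for n in nbrs[i]))
               for i in range(m)]
    return phi

# three nodes in an assumed chain 1-2-3, with the radii from this slide
radius = [1.6401, 1.8062, 1.9723]
phi = reference_iteration(radius, [(0, 1), (1, 2)])
fuzzy = [min(phi) / p for p in phi]
# phi   ≈ [0.4953, 0.3335, 0.5045]   (referM on this slide)
# fuzzy ≈ [0.6734, 1.0000, 0.6612]   (fuzzyM on this slide)
```

Under the same assumed update, a uniform start on a hub topology also reproduces the earlier 4-node trace (0.5 for a one-neighbor node, 0.25 for the three-neighbor hub, 0.3333 for two-neighbor nodes at iteration 2).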
              TFC –Examples
• Effects of Edges on the Reference and Fuzzy Algorithm
                                  TFC
[Figure: System layout with fuzzy membership. The same two-level
structure as before: Level-1 nodes A, B, C, D, Y, X (edges as connected
by GNG) with references ØB:C1 … ØD:C1 to Level-2 cluster C1 and ØX:C2,
ØY:C2 to cluster C2. The discussed algorithm helps find the fuzzy
memberships of these links.]
                         TFC
• So how is the Fuzzy Area decided?
[Figure: Overlapping fuzzy hyperspheres with memberships FMem = 1.0,
0.8, 0.75, and 0.65 (FMem = fuzzy membership).]
                                    TFC
• Projection of Hyperspheres on Axes
[Figure: Fuzzy membership projection with maximal membership.
Hyperspheres with FMem = 1.0, 0.75, and 0.65 are projected onto an axis
(FMem = fuzzy membership).]
TFC
[Figure: Hyperspheres in a 3-dimensional topology.]
                     TFC
[Figure: Overlapping cluster regions. One cluster is represented by a
black outline and another by a gray outline; nodes A and B lie in the
overlap, and (X, Y) represents a data sample.]
                            TFC
[Figure 4.28: Fuzzy membership projection with smoothed-out error.
Hyperspheres with FMem = 1.0, 0.75, and 0.65 (FMem = fuzzy membership);
in overlapping regions the maximum membership is taken. Note: the fuzzy
membership is decreased as a function of radius and error.]
TFC - Test()

Test() function. Input: X, cluster matrix. Output: UNKNOWN / ClusterIndex.

1. Find all clusters that have the distance between X and a Level-1 node
   less than radius + √error associated with that node, and name them
   XClusters.
2. If size(XClusters) = 0, mark X as UNKNOWN and return.
3. Otherwise, find the fuzzy memberships of X in each of the XClusters,
   and name them XFuzzyMemberships. Also use the smoothing based on error
   in the region outside the radius but within radius + √error.
4. Find the cluster having the maximum membership in XFuzzyMemberships,
   and name it ClusterIndex.
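The Test() flow can be sketched in code. The per-node data layout (position, radius, error, cluster id, node fuzzy membership) and the linear fall-off in the radius + √error smoothing band are assumptions for illustration, not the thesis's exact formulation:

```python
import math

def tfc_test(x, nodes):
    """Sketch of TFC Test(). `nodes` is a hypothetical list of dicts:
    {'pos': vector, 'radius': R, 'error': E, 'cluster': id, 'fmem': m}.
    Returns the cluster id with maximum membership, or 'UNKNOWN' when no
    node is within radius + sqrt(error) of x."""
    best_cluster, best_mem = "UNKNOWN", 0.0
    for nd in nodes:
        d = math.dist(x, nd['pos'])
        reach = nd['radius'] + math.sqrt(nd['error'])
        if d >= reach:
            continue                      # this node's cluster is not reachable
        mem = nd['fmem']
        if d > nd['radius']:
            # smoothing band between radius and radius + sqrt(error):
            # decrease membership linearly (one possible smoothing choice)
            mem *= (reach - d) / (reach - nd['radius'])
        if mem > best_mem:
            best_cluster, best_mem = nd['cluster'], mem
    return best_cluster
```

A sample inside some node's radius simply inherits that node's membership; a sample beyond every node's reach is marked UNKNOWN.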
TFC-Results (Iris)
[Figure: Scatter plot for the IRIS dataset, showing Iris Virginica,
Iris Versicolor, and Iris Setosa.]
TFC-Results (Iris)
[Figure: GNG node-data display in 3-dimensional topology with the
discriminant (cluster separation) line; to be taken figuratively, not
mathematically. Note that the cluster node for Versicolor is not at the
right position as desired.]
TFC-Results (Iris)
[Figure: Node positions according to TFC (left) vs. by GNG (right). The
Versicolor node is now at the right place; also notice the much more
extensive growth in this area.]
TFC-Results (Iris)

Misclassified data for the IRIS dataset (out of 150):

Classes                    FCM                 Gath-Geva              TFC
                     m=1.5  m=2  m=3.5    m=1.5  m=2  m=3.5   (10, eb=0.04, en=0.0006,
                                                               0.5, 0.0005, amax=12)
Iris Setosa            0     0     0        0     0     0              0
Iris Versicolor        3     3     3        3     3     4             10
Iris Virginica        12    13    11       12    12     9              1
Total Misclassified   15    16    14       15    15    13             11
Adaptive TFC-Results (Iris)
• Insertion of the misclassified data as a node.
• Done by adding a misclassified data point as a node in the cluster.
[Figure: Iris topology showing the Setosa, Versicolor, and Virginica
regions, with a newly inserted node in Virginica; data points
misclassified by ATFC denoted in blue.]
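A sketch of this insertion step; the data layout (a node-position list plus an edge set and a node-to-cluster list) is assumed for illustration, and the thesis may track more per-node state:

```python
import math

def insert_misclassified(x, W, edges, node_cluster, cluster_id):
    """Insert a misclassified data point x as a new Level-1 node assigned
    to the (expert-given) cluster, linked to the nearest existing node.
    W: list of node positions; edges: set of frozenset pairs;
    node_cluster: list mapping node index -> cluster id."""
    # find the nearest existing node to the misclassified point
    nearest = min(range(len(W)), key=lambda i: math.dist(W[i], x))
    W.append(list(x))                          # the point itself becomes a node
    new_idx = len(W) - 1
    edges.add(frozenset((nearest, new_idx)))   # edge to the nearest node
    node_cluster.append(cluster_id)            # place it in the expert's cluster
    return new_idx
```

Because the change is purely local (one node, one edge), no global relearning is required, which is the advantage claimed for ATFC.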
ATFC-Results (Iris)

Classes                        TFC                   ATFC-SSAA (for Iris Virginica only)
                     Misclassified  Class. rate      Misclassified  Class. rate
Iris Setosa                0           100%                0           100%
Iris Versicolor           10            80%                4            92%
Iris Virginica             1            98%                1            98%
Total Misclassified       11          92.7%                5          96.7%
ATFC
• Expert ability to split/merge clusters
[Figure: Pictorial view in 2 dimensions of a topology with nodes 1 to 7,
before the Expert's input (a single cluster) and after (split into
Cluster 1 and Cluster 2).]
ATFC
[Figure: Topology after the Expert requested a cluster split.]
ATFC - Simulated Dataset

Distribution   Center   Radius   Variation width   Min    Max   Start (rad)   End (rad)   Distribution size
Circle         (0,0)      13          1.5          11.5   14.5       0            2π           10000
Semicircle     (0,0)       9           1             8     10        0            π             5000
Arc            (0,0)       5           1             4      6        4            5             3000
ATFC - Simulated Dataset
[Figure: Node positions (left) and test data points (right) for the
simulated dataset.]
ATFC - Simulated Dataset
• Expert's modification of the cluster set
[Figure: Simulated dataset topological node positions with breakage
(a link broken by the Expert).]
ATFC - Simulated Dataset

                              Calculated classes from ATFC
Class        Real class   Total classification   Misclassified   Not classified
Circle           10               10                   0               0
Semicircle       10               12                   2               0
Arc              10                8                   0               2
NASA Dataset
• Used by Dai [3].
• Consists of a signal from the fuel tank.
• Signal used for event detection.
• Clustering/classification is not as important as finding events.
[Figure: MPRE301P signal (50 Hz), events of interest: valve open and
valve close.]
NASA Dataset
[Figure: Wavelet decomposition of the signal and the corresponding
wavelet tree.]
NASA Dataset
By unsupervised TFC:
• No events detected.
• Blue diamonds are unclassified events.
NASA Dataset
Supervised Insertion
[Figures: Insertion of a data point as a node (1) and (2). X = data
point; dashed line = virtual edge to the nearest node.]
NASA Dataset
Supervised Insertion - Result
[Figure: Insertion of a data point as a node (3). X = data point;
dashed line = virtual edge to the nearest node.]
NASA Dataset
By supervised ATFC:
[Figure: The Expert points out events; subsequently, new events are
found (shown in red).]
NASA Dataset
2 events marked by an Expert:
• Default parameters don't work that well.
• Too many events are detected.
• Need to modify the size of the clusters.
NASA Dataset
[Figure: Overlapping clusters with high memberships. Nodes are
represented by circles and radii by dotted circles; the memberships
shown range from 0.6 to 1.]
Conclusions(1)

Choice of ATFC in Different Circumstances:

    Factors                                    Choice
1   Execution time                             Yes, very good
2   Dataset size: large
    (num. of clusters <<< size of dataset)     Yes, very good
3   Dataset size: small
    (num. of clusters < size of dataset)       Yes, but better algorithms are available
4   Topology shape                             Yes, very good
5   Adaptiveness of correcting                 Yes, but modifying/tweaking the node
    misclassification                          radius might be required
6   Accuracy of classification                 Yes, but for medium to high accuracy;
                                               it is not always the best performing
7   Adaptiveness of learning new data          Yes, very good
               Conclusions(2)
Its advantages can be summarized as:
1. TFC has linearized time for calculating the clusters in an
   unsupervised manner.

2. TFC is able to learn different cluster shapes through the use of
   topology.

3. Through ATFC, the algorithm becomes adaptive to data by allowing the
   expert's feedback on the type of data encountered.
                     References
•   Fritzke, B. (1996). Automatic construction of radial basis function
    networks with the growing neural gas model and its relevance for
    fuzzy logic. Proceedings of the 1996 ACM Symposium on Applied
    Computing, 624-627.
•   Golub, G. H., & Van Loan, C. F. Matrix Computations.
•   Dai, S., & Dhawan, A. P. (2004). Adaptive learning for event modeling
    and pattern classification. PhD Dissertation at NJIT,
    njit-etd2004-023.
Thank You!