
Recent Trends in Fuzzy Clustering:
From Data to Knowledge




    pedrycz@ee.ualberta.ca

                             Shenyang, August 2009
  Agenda

Introduction: clustering, information granulation and paradigm shift

Key challenges in clustering

Fuzzy objective-based clustering

Knowledge-based augmentation of fuzzy clustering

Collaborative fuzzy clustering

Concluding comments
   Clustering

Areas of research and applications:

•Data analysis
•Modeling
•Structure determination



Google Scholar: 2,190,000 hits for “clustering”
(as of August 6, 2009)
          Clustering as a
     conceptual and algorithmic
framework of information granulation

   Data → information granules (clusters)
          abstraction of data

   Formalism of:
                   set theory (K-Means)
                   fuzzy sets (FCM)
                   rough sets
                   shadowed sets
      Main categories of clustering


Graph-oriented and hierarchical
(single linkage, complete linkage, average linkage, ...)

Objective function-based clustering

Diversity of formalisms and optimization tools
(e.g., methods of Evolutionary Computing)
               Key challenges of
                  clustering

Data-driven methods

Selection of distance function (geometry of clusters)

Number of clusters

Quality of clustering results
The dichotomy and the shift of
          paradigm
             Fuzzy Clustering:
           Fuzzy C-Means (FCM)

Given data x1, x2, …, xN, determine its structure by
forming a collection of information granules – fuzzy sets


     Objective function

$$Q = \sum_{i=1}^{c}\sum_{k=1}^{N} u_{ik}^{m}\,\|x_k - v_i\|^2$$

     Minimize Q; structure in data (partition matrix and prototypes)
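
As a small illustration of the objective function, the sketch below evaluates Q for given prototypes and a given partition matrix. It assumes the squared Euclidean norm; the function name and the toy values are illustrative only.

```python
def fcm_objective(data, prototypes, u, m=2.0):
    """Q = sum_i sum_k u[i][k]**m * ||x_k - v_i||^2 (squared Euclidean norm assumed)."""
    q = 0.0
    for i, v in enumerate(prototypes):
        for k, x in enumerate(data):
            dist2 = sum((xj - vj) ** 2 for xj, vj in zip(x, v))
            q += (u[i][k] ** m) * dist2
    return q

# Toy values (made up): three 1-D points, two clusters.
data = [[0.0], [1.0], [4.0]]
prototypes = [[0.0], [4.0]]
u = [[1.0, 0.5, 0.0],   # memberships in cluster 1
     [0.0, 0.5, 1.0]]   # memberships in cluster 2
q = fcm_objective(data, prototypes, u)
```

Only the middle point contributes here, since the other two coincide with a prototype or have zero membership in the distant cluster.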
              Fuzzy Clustering:
            Fuzzy C-Means (FCM)

v_i – prototypes

U – partition matrix
  FCM – optimization


$$Q = \sum_{i=1}^{c}\sum_{k=1}^{N} u_{ik}^{m}\,\|x_k - v_i\|^2 \quad \text{Minimize}$$

with respect to

       (a) prototypes

       (b) partition matrix
Optimization - details

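   Partition matrix – the use of Lagrange multipliers

$$V = \sum_{i=1}^{c} u_{ik}^{m} d_{ik}^{2} - \lambda\left(\sum_{i=1}^{c} u_{ik} - 1\right)$$

            $d_{ik}^{2} = \|x_k - v_i\|^2$
            $\lambda$ – Lagrange multiplier

$$\frac{\partial V}{\partial u_{st}} = 0, \qquad \frac{\partial V}{\partial \lambda} = 0$$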
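        Optimization – partition matrix (1)

$$V = \sum_{i=1}^{c} u_{ik}^{m} d_{ik}^{2} - \lambda\left(\sum_{i=1}^{c} u_{ik} - 1\right), \qquad \frac{\partial V}{\partial u_{st}} = 0, \quad \frac{\partial V}{\partial \lambda} = 0$$

$$\frac{\partial V}{\partial u_{st}} = m\,u_{st}^{m-1} d_{st}^{2} - \lambda = 0 \quad\Rightarrow\quad u_{st} = \left(\frac{\lambda}{m}\right)^{\frac{1}{m-1}} \frac{1}{d_{st}^{2/(m-1)}}$$

$$\sum_{j=1}^{c} u_{jt} = 1 \quad\Rightarrow\quad \left(\frac{\lambda}{m}\right)^{\frac{1}{m-1}} \sum_{j=1}^{c} \frac{1}{d_{jt}^{2/(m-1)}} = 1$$

$$\left(\frac{\lambda}{m}\right)^{\frac{1}{m-1}} = \frac{1}{\sum_{j=1}^{c} d_{jt}^{-2/(m-1)}}, \qquad u_{st} = \frac{1}{\sum_{j=1}^{c}\left(\dfrac{d_{st}^{2}}{d_{jt}^{2}}\right)^{\frac{1}{m-1}}}$$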
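
The closed-form membership obtained from the Lagrange-multiplier argument can be checked numerically. The sketch below computes the memberships of one data point from its squared distances to the prototypes and verifies that the quantity m·u^(m-1)·d² is the same for every cluster, as the stationarity condition requires. It assumes strictly positive distances; the names are illustrative.

```python
def memberships(sq_dists, m=2.0):
    """u_s = 1 / sum_j (d_s^2 / d_j^2)^(1/(m-1)) for a single data point.

    sq_dists holds the squared distances d_j^2 to each of the c prototypes.
    Assumption: every distance is strictly positive.
    """
    power = 1.0 / (m - 1.0)
    return [1.0 / sum((ds / dj) ** power for dj in sq_dists) for ds in sq_dists]

m = 2.0
sq_dists = [1.0, 4.0]
u = memberships(sq_dists, m)

# Stationarity: m * u_s^(m-1) * d_s^2 takes one common value for all clusters s.
multipliers = [m * us ** (m - 1.0) * ds for us, ds in zip(u, sq_dists)]
```

For these values the memberships come out as 0.8 and 0.2, and both entries of `multipliers` agree.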
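  Optimization – prototypes (2)

$$Q = \sum_{i=1}^{c}\sum_{k=1}^{N} u_{ik}^{m} \sum_{j=1}^{n} (x_{kj} - v_{ij})^{2}$$

Euclidean distance

Gradient of Q with respect to v_s, set to zero for each coordinate t:

$$\sum_{k=1}^{N} u_{sk}^{m}\,(x_{kt} - v_{st}) = 0$$

$$v_{st} = \frac{\sum_{k=1}^{N} u_{sk}^{m}\, x_{kt}}{\sum_{k=1}^{N} u_{sk}^{m}}$$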
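
The prototype formula is a weighted mean of the data, with weights given by the memberships raised to the power m. A minimal sketch, assuming the Euclidean distance used on this slide (the function name is illustrative):

```python
def update_prototype(data, u_s, m=2.0):
    """v_st = sum_k u_sk^m x_kt / sum_k u_sk^m, for every coordinate t."""
    weights = [u ** m for u in u_s]
    total = sum(weights)
    n = len(data[0])
    return [sum(w * x[t] for w, x in zip(weights, data)) / total for t in range(n)]

data = [[0.0, 0.0], [2.0, 4.0]]
v_equal = update_prototype(data, u_s=[1.0, 1.0])    # equal weights: the plain mean
v_skewed = update_prototype(data, u_s=[0.5, 1.0])   # pulled toward the second point
```

With equal memberships the prototype is the ordinary mean; lowering the membership of the first point moves the prototype toward the second.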
Fuzzy C-Means (FCM): An overview
      procedure FCM-CLUSTERING (x) returns prototypes and partition matrix

      input:   data x = {x1, x2, ..., xN}

      local:   number of clusters: c
               fuzzification parameter: m
               threshold: ε
               norm: ||.||

               INITIALIZE-PARTITION-MATRIX

               t ← 0

               repeat

                        for i = 1:c do

                                compute prototypes
                                $v_i(t) = \dfrac{\sum_{k=1}^{N} u_{ik}^{m}(t)\, x_k}{\sum_{k=1}^{N} u_{ik}^{m}(t)}$

                        for i = 1:c do

                             for k = 1:N do

                                update partition matrix
                                $u_{ik}(t+1) = \dfrac{1}{\sum_{j=1}^{c}\left(\dfrac{\|x_k - v_i(t)\|}{\|x_k - v_j(t)\|}\right)^{2/(m-1)}}$

                        t ← t+1

               until ||U(t) − U(t−1)|| ≤ ε

               return U, V
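
The procedure above can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: it assumes the Euclidean norm, a random initial partition matrix, the maximum absolute change in U as the stopping norm, and a small floor on squared distances to avoid division by zero. The names `fcm`, `eps`, `max_iter`, and `seed` are choices made for this sketch.

```python
import random

def sq_dist(x, v):
    return sum((a - b) ** 2 for a, b in zip(x, v))

def fcm(data, c, m=2.0, eps=1e-6, max_iter=200, seed=0):
    rng = random.Random(seed)
    N, n = len(data), len(data[0])
    # Random partition matrix (c rows, N columns); each column sums to 1.
    u = [[rng.random() + 1e-3 for _ in range(N)] for _ in range(c)]
    for k in range(N):
        col = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= col
    v = []
    for _ in range(max_iter):
        # Prototypes: weighted means with weights u_ik^m.
        v = []
        for i in range(c):
            w = [u[i][k] ** m for k in range(N)]
            v.append([sum(w[k] * data[k][t] for k in range(N)) / sum(w)
                      for t in range(n)])
        # Partition matrix update from the current prototypes.
        new_u = [[0.0] * N for _ in range(c)]
        for k in range(N):
            d2 = [max(sq_dist(data[k], v[i]), 1e-12) for i in range(c)]
            for i in range(c):
                new_u[i][k] = 1.0 / sum((d2[i] / d2[j]) ** (1.0 / (m - 1.0))
                                        for j in range(c))
        change = max(abs(new_u[i][k] - u[i][k])
                     for i in range(c) for k in range(N))
        u = new_u
        if change <= eps:   # ||U(t) - U(t-1)|| <= eps, max-norm assumed
            break
    return u, v

# Two well-separated groups of 1-D points (made-up data).
data = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
U, V = fcm(data, c=2)
labels = [max(range(2), key=lambda i: U[i][k]) for k in range(len(data))]
```

The `labels` list assigns each point to its highest-membership cluster; which cluster receives index 0 depends on the random initialization.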
Geometry of information granules

[Figure: membership functions for n = 1, shown for m = 1.2, m = 2.0, and m = 3.5]
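
The effect of the fuzzification coefficient on the shape of the membership functions can be explored numerically. The sketch below evaluates the FCM membership of one scalar point for the three values of m named on the slide; the prototype positions and the test point are hypothetical.

```python
def membership_1d(x, prototypes, m):
    """FCM memberships of scalar x for the given scalar prototypes."""
    d2 = [(x - v) ** 2 for v in prototypes]
    p = 1.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** p for dj in d2) for di in d2]

prototypes = [1.0, 5.0]   # hypothetical prototype positions
x = 2.0                   # point closer to the first prototype

u1 = {}
for m in (1.2, 2.0, 3.5):
    u1[m] = membership_1d(x, prototypes, m)[0]
    print(f"m = {m}: membership in cluster 1 = {u1[m]:.3f}")
```

Values of m close to 1 push memberships toward 0 or 1 (set-like clusters), while larger values flatten them toward 1/c.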
           Domain Knowledge:
     Category of knowledge-oriented
                guidance
Partially labeled data: some data are provided with labels
(classes)


Proximity knowledge: some pairs of data are quantified in
terms of their proximity (closeness)


Viewpoints: some structural information is provided


Context-based guidance: clustering realized in a certain context
specified with regard to some attribute
Clustering with domain knowledge
  (Knowledge-based clustering)
[Figure: two schemes side by side.
 Data-driven: Data → CLUSTERING → Information granules (structure).
 Data- and knowledge-driven: Data and Domain knowledge → CLUSTERING → Information granules (structure).]
              Context-based clustering


To align the agenda of fuzzy clustering with the principles of fuzzy
modeling, the following features are considered:

Active role of the designer [customization of the model]

The structural backbone of the model is fully reflective of relationships
between information granules in the input and output space


      Clustering : construct clusters in input space X



  Context-based Clustering : construct clusters in input space X
                             given some context expressed in
                             output space Y
Context-based clustering:
Computing considerations
[Figure: Data → structure, compared with Data and context → structure]
             •computationally more efficient,
             •well-focused,
             •designer-guided clustering process
      Context-based clustering:
           Context design


Context – hint (piece of domain knowledge)
         provided by designer who actively impacts the
          development of the model. As such, context
          is imposed by the designer at the beginning



Realization of context

Designer → focus → information granule (fuzzy set)

(a) Designer, and (b) clustering of scalar data in output space



Context – fuzzy set (set) formed in the output space
   Context-based clustering:
           Modeling

                     Determine structure in input space
                     given the output is high




                     Determine structure in input space
                     given the output is medium




                     Determine structure in input space
                     given the output is low



Input space (data)
         Context-based clustering:
                examples

Find a structure of customer data [clustering]: no context

Find a structure of customer data considering customers making weekly
purchases in the range [$1,000, $3,000]: context

Find a structure of customer data considering customers making weekly
purchases at the level of around $2,500: context

Find a structure of customer data considering customers making significant
weekly purchases who are young: context (compound)
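
The contexts in these examples can be written down as membership functions over the output variable. The sketch below is one possible encoding: the interval context is a crisp set, "around $2,500" is modeled as a triangular fuzzy set, and the compound context combines two fuzzy sets with the minimum. All shapes, spreads, and break points (for instance what counts as "young" or "significant") are assumptions made for illustration.

```python
def interval_context(low, high):
    """Crisp (set-based) context: 1 inside [low, high], 0 outside."""
    return lambda y: 1.0 if low <= y <= high else 0.0

def around_context(center, spread):
    """Triangular fuzzy set 'around center'; the spread is an assumed design choice."""
    return lambda y: max(0.0, 1.0 - abs(y - center) / spread)

def and_context(*contexts):
    """Compound context; the minimum is used as the 'and' operator (an assumption)."""
    return lambda *values: min(ctx(v) for ctx, v in zip(contexts, values))

in_range = interval_context(1000.0, 3000.0)
around_2500 = around_context(2500.0, 500.0)
# Hypothetical fuzzy sets for "significant purchases" and "young".
significant = lambda y: max(0.0, min(1.0, (y - 1000.0) / 2000.0))
young = lambda age: max(0.0, min(1.0, (40.0 - age) / 15.0))
compound = and_context(significant, young)

w = [in_range(2000.0), around_2500(2250.0), compound(2500.0, 28.0)]
```

Each entry of `w` is the degree to which one customer record belongs to the corresponding context; these degrees are what a context-based clustering run would receive as input.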
      Context-oriented FCM

Data (xk, targetk), k=1,2,…,N

Contexts: fuzzy sets W1, W2, …, Wp

          wjk = Wj(targetk): membership of the k-th data point in the j-th context


   Context-driven partition matrix

$$U(W_j) = \left\{ u_{ik} \in [0,1] \;\middle|\; \sum_{i=1}^{c} u_{ik} = w_{jk}\ \ \forall k \ \text{ and } \ 0 < \sum_{k=1}^{N} u_{ik} < N\ \ \forall i \right\}$$
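     Context-oriented FCM:
       Optimization flow

Objective function

$$Q = \sum_{i=1}^{c}\sum_{k=1}^{N} u_{ik}^{m}\,\|x_k - v_i\|^2$$

Subject to constraint                         U in U(Wj)

Iterative adjustment of partition matrix and prototypes
(l runs over the clusters; j is the fixed context)

$$u_{ik} = \frac{w_{jk}}{\sum_{l=1}^{c}\left(\dfrac{\|x_k - v_i\|}{\|x_k - v_l\|}\right)^{\frac{2}{m-1}}}, \qquad v_i = \frac{\sum_{k=1}^{N} u_{ik}^{m}\, x_k}{\sum_{k=1}^{N} u_{ik}^{m}}$$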
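
The only change relative to the standard FCM membership update is the numerator: the memberships of a data point now sum to its context membership instead of 1. A minimal sketch for one data point, assuming strictly positive distances (names are illustrative):

```python
def context_memberships(sq_dists, w, m=2.0):
    """u_i = w / sum_l (d_i^2 / d_l^2)^(1/(m-1)); the memberships sum to w.

    sq_dists: squared distances of one data point to the c prototypes.
    w: membership of that data point in the current context.
    """
    p = 1.0 / (m - 1.0)
    return [w / sum((di / dl) ** p for dl in sq_dists) for di in sq_dists]

u_full = context_memberships([1.0, 4.0], w=1.0)   # reduces to standard FCM
u_half = context_memberships([1.0, 4.0], w=0.5)   # point only half in the context
u_none = context_memberships([1.0, 4.0], w=0.0)   # point excluded by the context
```

A point with zero context membership receives zero membership in every cluster and therefore has no influence on the prototypes.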
   Viewpoints: definition

Description of entity (concept) which is deemed essential
in describing phenomenon (system) and helpful in casting
an overall analysis in a required setting



“external” , “reinforced” clusters
[Figure: viewpoint (a, b) and viewpoint (a, ?) shown in the (x1, x2) plane, followed by a plot with a horizontal axis from 0 to 500 and a vertical axis from -150 to 200]
    Viewpoints in fuzzy clustering
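 B – Boolean matrix characterizing structure:
    viewpoints
    prototypes (induced by data)

$$b_{ij} = \begin{cases} 1, & \text{if the } j\text{-th feature of the } i\text{-th row of } B \text{ is determined by the viewpoint} \\ 0, & \text{otherwise} \end{cases}$$

Example: viewpoint (a, b) in the (x1, x2) plane

$$B = \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}, \qquad F = \begin{bmatrix} a & b \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$$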
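Viewpoints in fuzzy clustering

$$Q = \sum_{k=1}^{N}\sum_{i=1}^{c}\sum_{\substack{j=1 \\ i,j:\, b_{ij}=0}}^{n} u_{ik}^{m}\,(x_{kj} - v_{ij})^{2} \;+\; \sum_{k=1}^{N}\sum_{i=1}^{c}\sum_{\substack{j=1 \\ i,j:\, b_{ij}=1}}^{n} u_{ik}^{m}\,(x_{kj} - f_{ij})^{2}$$

$$g_{ij} = \begin{cases} v_{ij}, & \text{if } b_{ij} = 0 \\ f_{ij}, & \text{if } b_{ij} = 1 \end{cases}$$

$$Q = \sum_{k=1}^{N}\sum_{i=1}^{c}\sum_{j=1}^{n} u_{ik}^{m}\,(x_{kj} - g_{ij})^{2}$$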
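
The role of the matrices B and F can be illustrated with a short sketch that builds the effective prototypes g_ij and evaluates the objective function with them. The numeric values of the viewpoint and of the data-driven prototypes are made up; the function names are illustrative.

```python
def effective_prototypes(V, F, B):
    """g_ij = f_ij where b_ij == 1 (fixed by a viewpoint), else v_ij (data-driven)."""
    return [[f if b == 1 else v for v, f, b in zip(v_row, f_row, b_row)]
            for v_row, f_row, b_row in zip(V, F, B)]

def viewpoint_objective(data, U, G, m=2.0):
    """Q = sum_k sum_i sum_j u_ik^m (x_kj - g_ij)^2."""
    return sum((U[i][k] ** m) * sum((xj - gj) ** 2 for xj, gj in zip(x, g))
               for k, x in enumerate(data) for i, g in enumerate(G))

a, b = 3.0, 7.0                            # illustrative viewpoint coordinates
B = [[1, 1], [0, 0], [0, 0]]
F = [[a, b], [0.0, 0.0], [0.0, 0.0]]
V = [[9.9, 9.9], [1.0, 2.0], [5.0, 6.0]]   # data-driven prototypes (made up)
G = effective_prototypes(V, F, B)

# A single point sitting exactly at the viewpoint, fully assigned to cluster 1.
q = viewpoint_objective([[3.0, 7.0]], [[1.0], [0.0], [0.0]], G)
```

In an iterative scheme only the entries with b_ij = 0 would be updated from the data; the entries fixed by the viewpoint stay at f_ij.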
Labelled data and their
description




 Characterization in terms of membership degrees:

 F = [fik], i=1,2,…,c, k=1,2,…,N

 and supervision indicator

 b = [bk], k=1,2,…, N
 Augmented objective function

$$Q = \sum_{i=1}^{c}\sum_{k=1}^{N} u_{ik}^{2}\,\|x_k - v_i\|^2 \;+\; \alpha \sum_{i=1}^{c}\sum_{k=1}^{N} (u_{ik} - f_{ik})^{2}\, b_k\, \|x_k - v_i\|^2, \qquad \alpha > 0$$
Proximity hints

[Figure: pairs of data points annotated with proximity hints Prox(k,l) and Prox(s,t)]

 Characterization in terms of proximity degrees:

 Prox(k, l), k, l=1,2,…,N

 and supervision indicator matrix

 B = [bkl], k, l=1,2,…,N
Proximity measure
 Properties of proximity:

 (a)Prox(k, k) =1

 (b)Prox(k,l) = Prox(l,k)




Proximity induced by partition matrix U:

$$\mathrm{Prox}(k, l) = \sum_{i=1}^{c} \min(u_{ik}, u_{il})$$
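
The proximity induced by a partition matrix is straightforward to compute, and the two listed properties can be checked directly. A minimal sketch with a made-up partition matrix whose columns sum to 1:

```python
def proximity(U, k, l):
    """Prox(k, l) = sum_i min(u_ik, u_il) for partition matrix U (c rows, N columns)."""
    return sum(min(row[k], row[l]) for row in U)

U = [[0.9, 0.8, 0.1],
     [0.1, 0.2, 0.9]]

p_self = proximity(U, 0, 0)    # reflexivity: equals the column sum, i.e. 1
p_close = proximity(U, 0, 1)   # similar membership profiles
p_far = proximity(U, 0, 2)     # dissimilar membership profiles
```

Reflexivity relies on the columns of U summing to 1, and symmetry follows from the symmetry of the minimum.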
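   Augmented objective function

$$Q = \sum_{i=1}^{c}\sum_{k=1}^{N} u_{ik}^{2}\,\|x_k - v_i\|^2 \;+\; \alpha \sum_{i=1}^{c}\sum_{k_1=1}^{N}\sum_{k_2=1}^{N} \left[\mathrm{Prox}(k_1, k_2) - \mathrm{Prox}(U)(k_1, k_2)\right]^{2} b(k_1, k_2)\, \|x_{k_1} - x_{k_2}\|^2, \qquad \alpha > 0$$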
Two general development
strategies

     SELECTION OF A “MEANINGFUL” SUBSET OF
     INFORMATION GRANULES
        Two general development
        strategies
                 (1) HIERARCHICAL DEVELOPMENT OF INFORMATION
                 GRANULES (INFORMATION GRANULES OF HIGHER
                 TYPE)




  Information granules
  Type -2



Information granules
Type -1
        Two general development
        strategies
                 (2) HIERARCHICAL DEVELOPMENT OF INFORMATION
                 GRANULES AND THE USE OF VIEWPOINTS




                                                       viewpoints
  Information granules
  Type -2



Information granules
Type -1
Two general development
strategies
   (3) HIERARCHICAL DEVELOPMENT OF INFORMATION
   GRANULES – A MODE OF SUCCESSIVE CONSTRUCTION
     Information granules and
     their representatives
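Represent vk[ii] with the use of z1, z2, …, zc

$$u_i(v_k[ii]) = \frac{1}{\sum_{j=1}^{c}\left(\dfrac{\|v_k[ii] - z_i\|_{F_{ii} \cap F}}{\|v_k[ii] - z_j\|_{F_{ii} \cap F}}\right)^{\frac{2}{m-1}}}$$

[Figure: prototype v1[ii] in feature space Fii and prototypes z1, z2, …, zc in feature space F]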
Representation of fuzzy sets:
two performance measures


Entropy measure


Reconstruction criterion (error)
Expressing performance through
entropy measure

$$\sum_{ii=1}^{p}\sum_{i=1}^{c}\sum_{k=1}^{c[ii]} H\big(u_i(v_k[ii])\big)$$
 Reconstruction error

$$Q = \sum_{ii=1}^{p}\sum_{k=1}^{c[ii]} \left\|\hat{v}(v_k[ii]) - v_k[ii]\right\|_{F_{ii}}^{2}$$

where

$$\hat{v}(v_k[ii]) = \frac{\sum_{i=1}^{c} u_i^{m}(v_k[ii])\, z_i}{\sum_{i=1}^{c} u_i^{m}(v_k[ii])}$$

Requirement of “coverage” condition

$$\bigcup_{k=1}^{c} F_{i_k} = \bigcup_{i=1}^{p} F_i$$
Optimization problem
Form a collection of prototypes Z = {z1, z2, …, zc} such that

        entropy (or reconstruction error)

is minimized while satisfying coverage criterion

$$\min_{Z} Q \quad \text{subject to} \quad \bigcup_{k=1}^{c} F_{i_k} = \bigcup_{i=1}^{p} F_i$$

Optimization of fuzzification coefficient (m)

$$\min_{Z} Q \quad \text{subject to} \quad m > 1 \ \text{ and } \ \bigcup_{k=1}^{c} F_{i_k} = \bigcup_{i=1}^{p} F_i$$
              Collaborative
        structure development (2)
[Figure: data sets data-1, data-2, …, data-P describing the same phenomenon, process, or system; information granules formed for each data set; information granules of higher type formed on top of them]
     Collaborative structure determination:
      Information granules of higher order



[Figure: prototypes obtained for D[1], D[2], …, D[P] are clustered to form prototypes of higher order]
Determining correspondence between
             clusters (3)



[Figure: higher-order prototype zj obtained by clustering the prototypes of the individual data sets]

Select the prototypes in D[1], D[2], …, D[P] associated with zj
with the highest degree of membership
     Determining correspondence between
                  clusters (4)


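[Figure: prototype vi[ii] of data set D[ii] and higher-order prototype zj]

$$\lambda_{ij}[ii] = \frac{1}{\sum_{k=1}^{c[ii]}\left(\dfrac{\|v_i[ii] - z_j\|}{\|v_k[ii] - z_j\|}\right)^{2}}$$

Prototype i0 associated with prototype zj:

$$\lambda_{i_0 j}[ii] = \max_{i=1,2,\ldots,c[ii]} \lambda_{ij}[ii]$$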
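
The selection of the associated prototype can be sketched as follows: compute the degree of association of every local prototype of one data set with a given higher-order prototype, then take the one with the largest degree. The sketch assumes the Euclidean norm and nonzero distances; the prototype values are made up.

```python
def association_degrees(local_prototypes, z):
    """lambda_i = 1 / sum_k (||v_i - z|| / ||v_k - z||)^2 for each local prototype v_i."""
    d2 = [sum((a - b) ** 2 for a, b in zip(v, z)) for v in local_prototypes]
    return [1.0 / sum(di / dk for dk in d2) for di in d2]

local = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]   # prototypes of one data set D[ii]
z = [3.0, 3.0]                                  # one higher-order prototype z_j

lam = association_degrees(local, z)
i0 = max(range(len(local)), key=lambda i: lam[i])   # index of the associated prototype
```

Repeating this for every data set yields one associated prototype per data set for each higher-order prototype.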
Family of associated prototypes


Prototype i1 in D[1] associated with prototype zj

Prototype i2 in D[2] associated with prototype zj

              …
Prototype ip in D[P] associated with prototype zj

$$v_{i_1}[1],\ v_{i_2}[2],\ \ldots,\ v_{i_p}[P]$$

each with its degree of membership (one value for each of i1, i2, …, ip)
    From numeric prototypes to
       granular prototypes

$$v_{i_1}[1],\ v_{i_2}[2],\ \ldots,\ v_{i_p}[P]$$

each with its degree of membership (one value for each of i1, i2, …, ip)

individual coordinate of the associated prototypes:

         a1, a2, …, ap ∈ R

         m1, m2, …, mp ∈ [0,1]

         → Information granule
The principle of justifiable granularity:
       Interval representation


         a1, a2, …, ap
         m1, m2, …, mp

[Figure: interval [b, d] around a0 on a membership scale from 0 to 1]

         if ai ∈ [b,d] then elevate the membership grade to 1;
                            required change: 1 − mi
The principle of justifiable granularity:
       Interval representation


         a1, a2, …, ap
         m1, m2, …, mp

[Figure: interval [b, d] around a0 on a membership scale from 0 to 1]

         if ai ∉ [b,d] then reduce the membership grade to 0;
                            required change: mi
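     The principle of justifiable granularity:
             optimization criterion

[Figure: membership scale from 0 to 1 with points z1 and z2]

$$\min_{b, d \in \mathbb{R}:\ b \le d} \left\{ \sum_{a_i \in [b,d]} (1 - m_i) \;+\; \sum_{a_i \notin [b,d]} m_i \right\}$$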
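
For a handful of points the criterion can be minimized by exhaustive search. The sketch below tries every interval whose bounds coincide with data points, which is enough because the cost depends only on which points fall inside. It assumes the interval must contain at least one point; the names and values are illustrative.

```python
def best_interval(a, grades):
    """Minimize sum_{a_i in [b,d]} (1 - m_i) + sum_{a_i not in [b,d]} m_i.

    Candidate bounds b <= d are taken from the data points themselves.
    Returns (cost, b, d).
    """
    candidates = sorted(set(a))
    best = None
    for idx, b in enumerate(candidates):
        for d in candidates[idx:]:
            cost = sum((1.0 - g) if b <= ai <= d else g
                       for ai, g in zip(a, grades))
            if best is None or cost < best[0]:
                best = (cost, b, d)
    return best

a = [1.0, 2.0, 2.5, 6.0]          # coordinates of the associated prototypes (made up)
grades = [0.3, 0.9, 0.8, 0.2]     # their membership grades (made up)
cost, b, d = best_interval(a, grades)
```

The interval ends up enclosing the two points with high membership grades, since including a point pays off exactly when its grade exceeds 0.5.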





          Hyperbox prototypes




[Figure: hyperbox prototypes Hi and Hj]

$$\forall\, i \ne j:\ H_i \cap H_j = \emptyset$$ (the number of clusters at the aggregation level)
Interval-valued fuzzy sets
 and granular prototypes



[Figure: data point x and hyperbox prototypes Hi and Hj]
               Interval-valued fuzzy sets
                and granular prototypes



[Figure: data point x and granular prototype vi, with the shortest distance ||x − vi||min and the longest distance ||x − vi||max]

Bounds of distances determined coordinate-wise
Interval-valued fuzzy sets:
   membership function

Upper bound

$$\overline{u}_i(x) = \frac{1}{\sum_{j=1}^{c}\left(\dfrac{\|x - v_i\|_{\min}}{\|x - v_j\|_{\max}}\right)^{\frac{2}{m-1}}}$$

Lower bound

$$\underline{u}_i(x) = \frac{1}{\sum_{j=1}^{c}\left(\dfrac{\|x - v_i\|_{\max}}{\|x - v_j\|_{\min}}\right)^{\frac{2}{m-1}}}$$
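
A sketch of how the membership bounds could be computed for hyperbox prototypes. Distance bounds are obtained coordinate-wise, as stated on the earlier slide. Two assumptions are made here and are not taken from the slides: the point x lies outside every hyperbox, and the j = i term of each sum is set to 1 (a distance divided by itself) so that both bounds stay within [0, 1]. The literal formulas above use the min/max ratio for that term as well.

```python
def distance_bounds(x, box):
    """Min and max squared Euclidean distance from x to a hyperbox [(lo, hi), ...]."""
    d_min = d_max = 0.0
    for xj, (lo, hi) in zip(x, box):
        nearest = min(max(xj, lo), hi)                      # closest point of [lo, hi]
        farthest = lo if abs(xj - lo) >= abs(xj - hi) else hi
        d_min += (xj - nearest) ** 2
        d_max += (xj - farthest) ** 2
    return d_min, d_max

def membership_bounds(x, boxes, i, m=2.0):
    """Return (lower, upper) membership of x in granular prototype i."""
    p = 1.0 / (m - 1.0)                 # squared distances, hence 1/(m-1)
    bounds = [distance_bounds(x, box) for box in boxes]
    d_min_i, d_max_i = bounds[i]
    upper_sum = lower_sum = 1.0         # assumption: the j == i term equals 1
    for j, (d_min_j, d_max_j) in enumerate(bounds):
        if j != i:
            upper_sum += (d_min_i / d_max_j) ** p
            lower_sum += (d_max_i / d_min_j) ** p
    return 1.0 / lower_sum, 1.0 / upper_sum

boxes = [[(0.0, 1.0)], [(4.0, 5.0)]]    # two 1-D hyperbox prototypes (made up)
lower, upper = membership_bounds([2.0], boxes, i=0)
```

The width of the resulting interval reflects the granularity of the prototypes: point-like hyperboxes make the two bounds coincide with the ordinary FCM membership.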
    Collaborative structure determination:
            Structure refinement




  Feedback
and structure
 refinement
Collaborative structure determination:
        Structure refinement




Iterate
          Clustering at the local level

          Sharing findings and clustering at the higher (global) level

          Assessment of quality of clusters in light of the global structure
          γi(U)[ii] formed at the higher level

          Refinement of clustering

$$Q[ii] = \sum_{i=1}^{c[ii]} \sum_{x_k \in X[ii]} \gamma_i(U)[ii]\, \|x_k - v_i[ii]\|^2$$

Until termination criterion satisfied
    Concluding comments

Paradigm shift from data-based clustering to knowledge-based
clustering


Accommodation of knowledge in augmented objective functions


Emergence of type-2 (higher type) information granules when
working with collaborative clustering

								