GAD: General Activity Detection for Fast Clustering on Large Data

Xin Jin, Sangkyum Kim, Jiawei Han, Liangliang Cao and Zhijun Yin
SDM’09
Outline
   Introduction
   GAD
   GAD for very large clustering
   Experimental Evaluation
   Conclusion
   My thoughts
Introduction
 The paper focuses on developing fast core clustering
  algorithms.
 Contributions
     Exploits activity detection for fast clustering in
      different scenarios
     Achieves much higher speed than k-means
GAD
 Notations (i is the iteration number)
    NC(i, p, j)
       pattern p’s jth nearest center at iteration i
    D-NC(i, p, j)
       distance from pattern p to its jth nearest center at iteration i
    Dist(i, p, Cj)
       distance between pattern p and center Cj at iteration i
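The notation above can be made concrete with a small sketch. It assumes Euclidean distance and represents "the centers at iteration i" as a plain list; the function names nc, d_nc and dist are illustrative and do not come from the paper.

```python
import math

def ranked_centers(p, centers):
    """Center indices sorted by distance from pattern p (ties broken by index)."""
    return sorted(range(len(centers)), key=lambda c: (math.dist(p, centers[c]), c))

def dist(p, centers, c):
    """Dist(i, p, Cc): distance between pattern p and center c."""
    return math.dist(p, centers[c])

def nc(p, centers, j):
    """NC(i, p, j): index of pattern p's j-th nearest center (j starts at 1)."""
    return ranked_centers(p, centers)[j - 1]

def d_nc(p, centers, j):
    """D-NC(i, p, j): distance from pattern p to its j-th nearest center."""
    return dist(p, centers, nc(p, centers, j))

# Made-up centers at some iteration i, and one pattern p.
centers_at_i = [(3.0, 4.0), (1.0, 0.0), (0.0, 2.0)]
p = (0.0, 0.0)
print(nc(p, centers_at_i, 1), d_nc(p, centers_at_i, 2))  # prints: 1 2.0
```

Here center 1 is the nearest to p (distance 1.0) and center 2 is the second nearest (distance 2.0).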
Definition and Concepts
 The GAD framework function
   GAD(S, A, m, B)
    S: search method, A: activity states, m: the number of
     nearest centers, B: boundary
 Search methods
      Full search - find a pattern’s m nearest centers among all centers
      Whole full search - perform a full search for every pattern
      Partial search - search only among the active centers
      m-search - search only among a pattern’s previous m nearest centers
      0-search - a special case of m-search
     m-boundary
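One way to read the search methods is that they differ only in which candidate centers are examined. The sketch below follows that reading, assuming Euclidean distance; all function names are illustrative. It does not show when each method is safe to use, since that depends on the activity states and boundary, which the slides do not detail.

```python
import math

def search(p, centers, candidates, m):
    """Rank the candidate center indices by distance from p and keep the m nearest."""
    ranked = sorted(candidates, key=lambda c: (math.dist(p, centers[c]), c))
    return ranked[:m]

def full_search(p, centers, m):
    """Examine every center."""
    return search(p, centers, range(len(centers)), m)

def whole_full_search(patterns, centers, m):
    """Run a full search for every pattern."""
    return [full_search(p, centers, m) for p in patterns]

def partial_search(p, centers, active, m):
    """Examine only the centers currently flagged as active."""
    return search(p, centers, active, m)

def m_search(p, centers, previous_nearest, m):
    """Examine only the pattern's previous m nearest centers."""
    return search(p, centers, previous_nearest, m)

# Made-up one-dimensional example: six centers and a pattern at the origin.
centers = [(2.0,), (1.0,), (4.0,), (3.0,), (5.0,), (6.0,)]
p = (0.0,)
print(full_search(p, centers, 3))              # prints: [1, 0, 3]
print(m_search(p, centers, [0, 1, 2], 3))      # prints: [1, 0, 2]
print(partial_search(p, centers, [2, 4, 5], 3))  # prints: [2, 4, 5]
```

The restricted searches compute fewer distances, at the cost of possibly missing a center outside the candidate set (center 3 in the m-search call above).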
GAD algorithm
 General algorithm
    Step 1. Initialization
    Step 2. Decide which search method to use for pattern p
    Step 3. Update pattern p’s nearest centers using the search
     method chosen in step 2
    Step 4. Get the next pattern
    Step 5. Assign each pattern to its nearest center
    Step 6. Repeat from step 2 until all the centers have
     converged
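The loop above can be illustrated with a much simplified sketch. This is not the paper's full GAD(S, A, m, B) framework: it assumes m = 1, Euclidean distance, and a single rule for step 2, namely that a pattern whose current center did not move in the last update only needs to be compared against its own center and the centers that did move (the active ones). The function kmeans and the flag use_activity are illustrative names.

```python
import math

def kmeans(points, centers, use_activity=True, max_iter=100):
    """Lloyd-style k-means, optionally skipping static centers.

    Returns (assignments, centers, number_of_distance_computations).
    """
    centers = [tuple(c) for c in centers]
    k = len(centers)
    assign = [None] * len(points)
    active = set(range(k))            # Step 1: every center starts as active
    n_dist = 0
    for _ in range(max_iter):
        for idx, p in enumerate(points):
            cur = assign[idx]
            # Step 2: choose the search method for this pattern.
            if not use_activity or cur is None or cur in active:
                candidates = range(k)                  # full search
            else:
                candidates = sorted(active | {cur})    # partial search
            # Steps 3 and 5: find and record the nearest candidate center.
            best = None
            for j in candidates:
                d = math.dist(p, centers[j])
                n_dist += 1
                if best is None or (d, j) < best:
                    best = (d, j)
            assign[idx] = best[1]
        # Recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                dim = len(members[0])
                new_centers.append(tuple(
                    sum(mem[t] for mem in members) / len(members)
                    for t in range(dim)))
            else:
                new_centers.append(centers[j])   # keep an empty cluster's center
        # A center is active if the update moved it.
        active = {j for j in range(k) if new_centers[j] != centers[j]}
        centers = new_centers
        if not active:                # Step 6: stop once no center moves
            break
    return assign, centers, n_dist

# Made-up data: three well separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0),
          (10.0, 11.0), (11.0, 10.0), (20.0, 0.0), (21.0, 0.0)]
init = [(0.0, 0.0), (1.0, 0.0), (20.0, 0.0)]
fast = kmeans(points, init, use_activity=True)
plain = kmeans(points, init, use_activity=False)
print(fast[0] == plain[0], fast[2], plain[2])
```

Under this rule the assignments match plain k-means, because a static center that was not the nearest before cannot become nearer than another static center; only the number of distance computations changes.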
Exact GAD algorithm
 Example with m=3 (left-to-right order of pattern P and the centers on the slide)
    i=1:            P  C1  C2  C3  C4  C5  C6
    i=2, case (1):  P  C1  C2  C3  C4  C5  C6
    i=2, case (2):  P  C2  C1  C4  C3  C5  C6
    i=2, result 1:  P  C1  C2  C3  C6  C5  C4
    i=2, result 2:  P  C2  C1  C4  C3  C5  C6
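 Second example with m=3 (left-to-right order of pattern P and the centers on the slide)
    i=1:            P  C1  C2  C3  C4  C5  C6
    i=2, case (1):  P  C1  C2  C3  C4  C5  C6
    i=2, case (2):  P  C2  C3  C1  C4  C5  C6
    i=2, result 1:  P  C1  C2  C3  C4  C5  C6
    i=2, result 2:  P  C2  C3  C1  C4  C5  C6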
Full Search
 Example with m=3 (left-to-right order of pattern P and the centers on the slide)
    i=1:          P  C1  C2  C3  C4  C5  C6
    i=2:          P  C2  C1  C3  C5  C4  C6
    i=2, result:  P  C2  C1  C3  C5  C4  C6
GAD for Very Large Clusters
 H-GAD
   Hierarchical GAD
 KD-GAD
   Kd-tree GAD
    Builds two kd-trees
      Full kd-tree
      Active kd-tree
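The slide does not say what the two kd-trees index. The sketch below assumes that the full kd-tree holds all centers and the active kd-tree holds only the active centers, so that a full search and a partial search each become a nearest-neighbor query on the matching tree. This is an assumption, not the paper's stated design, and build_kdtree and nearest are illustrative names.

```python
import math

def build_kdtree(items, depth=0):
    """Build a kd-tree from a list of (index, point) pairs; returns None if empty."""
    if not items:
        return None
    axis = depth % len(items[0][1])
    items = sorted(items, key=lambda it: it[1][axis])
    mid = len(items) // 2
    return (items[mid], axis,
            build_kdtree(items[:mid], depth + 1),
            build_kdtree(items[mid + 1:], depth + 1))

def nearest(node, p, best=None):
    """Return (distance, index) of the stored point closest to p."""
    if node is None:
        return best
    (idx, point), axis, left, right = node
    cand = (math.dist(p, point), idx)
    if best is None or cand < best:
        best = cand
    diff = p[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, p, best)
    if abs(diff) <= best[0]:      # the far side may still hold a closer point
        best = nearest(far, p, best)
    return best

# Made-up centers; centers 1 and 3 are assumed to be the active ones.
centers = [(0.0, 0.0), (5.0, 5.0), (9.0, 1.0), (2.0, 8.0), (7.0, 7.0)]
active = [1, 3]
full_tree = build_kdtree(list(enumerate(centers)))
active_tree = build_kdtree([(j, centers[j]) for j in active])

q = (6.0, 6.0)
print(nearest(full_tree, q)[1], nearest(active_tree, q)[1])  # prints: 1 1
```

Querying the smaller active tree touches fewer nodes, which is the point of keeping it separate when most centers are static.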
Experimental Evaluation
Conclusion
 Proposes a General Activity Detection (GAD)
  framework for fast clustering.
 GAD is several times faster than k-means, and the
  best speedup can be as high as 10 times.
My thoughts
 Although this paper provides a new core
  clustering algorithm, it is unclear whether it can be
  applied to streaming data.
 Does the GAD algorithm give the same result for
  different initial centers?

				