Clustering Algorithm of High-dimensional Data Based on

Computer Engineering, Vol.34, No.10, May 2008
Article ID: 1000-3428(2008)10-0101-02    Document code: A    CLC number: TP311.13








 Clustering Algorithm of High-dimensional Data Based on Unit Region
                                                                      XIE Kun-wu, HU Jun-peng
                                          (School of Information Engineering, Hubei Institute for Nationalities, Enshi 445000)

     Abstract     This paper proposes a Clustering Algorithm of High-dimensional Data (CAHD). The algorithm first uses a two-way search
 strategy to find unit regions that are dense in data points in the designated n-dimensional space or its subspaces, and then clusters these dense
 unit regions with a bitwise AND operation. The two-way search strategy effectively reduces the search space and improves the efficiency of the
 algorithm, and clustering the dense unit regions needs only two machine instructions, bitwise AND and shift. Experimental results show that
 CAHD takes 30% less running time than other algorithms to find the same number of clusters.
     Key words          clustering algorithm; high-dimensional data; unit region


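Foundation items: projects 2002BA901A02 and 2004AA210B01
Author biography: (1970- )
Received: 2007-06-28    E-mail: xiekunwu@163.com

1

2

Consider a k-dimensional space with dimensions (i1, i2, ..., ik). Let P = {p1, p2, ..., pn} be a set of n data points in the k-dimensional space S = {1×2×...×k}, where pi = (d1, d2, ..., dk). S is divided into unit regions ur = {ur1, ur2, ..., urm}, and each unit region is written uri = ([ai1, Li1, Ri1], [ai2, Li2, Ri2], ..., [aik, Lik, Rik]), where aij is the j-th dimension of uri and [Lij, Rij] is the interval that uri covers on dimension j. A point pi falls in uri if Lij ≤ dj ≤ Rij holds on every dimension j. P_num(uri) is the number of data points in uri; if P_num(uri) ≥ T for a threshold T, uri is called a dense unit region (dur).

Two unit regions uri = ([ai1, Li1, Ri1], [ai2, Li2, Ri2], ..., [aik, Lik, Rik]) and urn = ([an1, Ln1, Rn1], [an2, Ln2, Rn2], ..., [ank, Lnk, Rnk]) are adjacent if, on one dimension j, [Lij] = [Rnj] or [Lnj] = [Rij], while on the other k-1 dimensions [Lij, Rij] = [Lnj, Rnj]. Two dur, uri and urj, are connected if they are adjacent or are linked by a chain of adjacent dur, and a cluster in the k-dimensional space is a set of connected dur.

This paper proposes a Clustering Algorithm of High-dimensional Data (CAHD) that borrows the idea of the Apriori algorithm [2] and rests on 2 properties. Property 1 and Property 2 both relate a region S in the k-dimensional space to a region S' in a (k-1)-dimensional subspace.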
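The definitions of unit region, P_num and dur can be illustrated with a minimal Python sketch. It assumes an equal-width grid over a common range [lo, hi] on every dimension, which the text does not prescribe, and it encodes a unit region by its interval index on each dimension instead of explicit [a, L, R] triples. The names `unit_of`, `dense_units` and `adjacent` are hypothetical helpers introduced for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

# Assumed encoding: one interval index per dimension on an equal-width grid.
Unit = Tuple[int, ...]


def unit_of(point: Sequence[float], lo: float, hi: float, bins: int) -> Unit:
    """Map a point to the unit region that contains it."""
    width = (hi - lo) / bins
    # A point on the upper boundary hi is clamped into the last interval,
    # so every point of [lo, hi] belongs to exactly one unit region.
    return tuple(min(int((d - lo) / width), bins - 1) for d in point)


def dense_units(points: Iterable[Sequence[float]], lo: float, hi: float,
                bins: int, threshold: int) -> Dict[Unit, int]:
    """Return the unit regions with P_num(ur) >= T, with their counts."""
    counts = Counter(unit_of(p, lo, hi, bins) for p in points)
    return {u: c for u, c in counts.items() if c >= threshold}


def adjacent(u: Unit, v: Unit) -> bool:
    """True if the intervals touch on exactly one dimension and coincide
    on the other k-1 dimensions."""
    return sum(abs(a - b) for a, b in zip(u, v)) == 1


points = [(0.1, 0.1), (0.2, 0.3), (0.15, 0.05),
          (0.6, 0.2), (0.7, 0.1), (0.9, 0.9)]
dur = dense_units(points, lo=0.0, hi=1.0, bins=2, threshold=2)
print(dur)                        # {(0, 0): 3, (1, 0): 2}
print(adjacent((0, 0), (1, 0)))   # True
print(adjacent((0, 0), (1, 1)))   # False
```

Integer indices keep the adjacency test to a single sum; with explicit [L, R] bounds the same test would compare Lij with Rnj and Rij with Lnj as in the definition.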
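3  CAHD algorithm

3.1

Let I = {i1, i2, ..., im} be a set of m items and D a transaction database in which every transaction T satisfies T ⊆ I. An association rule has the form X⇒Y, where X, Y ⊂ I and X∩Y = ∅. A set of items is called an itemset, and an itemset with k items is called a k-itemset. The support of an itemset X ⊂ I, written sup(X), is the fraction of transactions in D that contain X; if that fraction is s, then sup(X) = s. The support of the rule X⇒Y is sup(X∪Y) and its confidence is sup(X∪Y)/sup(X).

Mining association rules takes 2 steps: (1) find all itemsets whose support is not less than minsup; (2) generate from these itemsets the rules whose confidence is not less than minconf. The search for dur corresponds to step (1).

3.1.1

CAHD searches in 2 directions and keeps 2 candidate sets, C and RC. Ck holds the candidate k-dimensional unit regions and RCk holds the candidate (n-k)-dimensional unit regions, where n is the number of dimensions of the data. Each pass handles the candidates Ck and Cn-k, whose sizes are |Ck| and |Cn-k|, and the search then moves on to dimensions k+1 and n-k-1.

3.1.2

Candidates of the next dimension (from k to k+1) are generated by Gen_bu:

       Gen_bu(Dk-1)
       INSERT INTO Ck
       SELECT u1.[L1, R1], u1.[L2, R2], ..., u1.[Lk-1, Rk-1], u2.[Lk-1, Rk-1]
       FROM Dk-1 u1, Dk-1 u2
       WHERE u1.a1 = u2.a1, u1.L1 = u2.L1, u1.R1 = u2.R1,
                    u1.a2 = u2.a2, u1.L2 = u2.L2, u1.R2 = u2.R2, ...,
                    u1.ak-2 = u2.ak-2, u1.Lk-2 = u2.Lk-2, u1.Rk-2 = u2.Rk-2,
                    u1.ak-1 < u2.ak-1
       return Ck

Here Ck is the set of candidate k-dimensional dur, Dk-1 is the set of (k-1)-dimensional dur, and ai is a dimension of a dur. Candidates of dimension n-k-1 are generated from dimension n-k by Gen_ub:

       Gen_ub(RCk)
       for i = 1 to n-k
            INSERT INTO RCk-1
            SELECT u1.[L1, R1], u1.[L2, R2], ..., u1.[Li-1, Ri-1], u1.[Li+1, Ri+1], ..., u1.[Lk-1, Rk-1], u2.[Ln-k, Rn-k]
            FROM RCk
       return RCk

Here RCk-1 is the candidate set of dimension n-k-1 and RCk is the set of dimension n-k.

3.2

       CAHD(D)
       1     DUR1 = 1-dimensional dur, URn = {ur1, ur2, ..., urm}, k = 1, S = {}
       2     repeat
       3          DURk+1 = {}
       4          DURk+1 = Gen_bu(DURk)
       5          URn-k = Gen_ub(DURn-k+1)
       6          Prunning(ref DURk+1, ref URn-k)
       7          Mark(S)
       8          k = k+1
       9     until (n-k = k) or (DURk+1 = NULL) or (URn-k        dur)
       10    S'' = NULL
       11    for i = 1 to |S| {
       12         S' = NULL
       13         for row = i+1 to |S|
       14              S'[row] = And(S[i], S[row])
       15              if S'[row] <> S'[row-1] then
       16                   S'' = S'' ∪ S'[row-1]
       17              if S'[row]        '0' then continue
       18    }
       19    return Merge(S'')

The input of CAHD is a data set D in an n-dimensional space. Line 1 initializes DUR1 with the 1-dimensional dur, URn with the m unit regions of the n-dimensional space, and S as empty. Lines 2~9 run the two-way search for dur: Prunning() prunes the candidates and Mark records the result in S. Lines 10~19 cluster the marked regions: the And results are collected in S'' and combined by Merge.

4

The experiments compare the 2 algorithms CLIQUE and CAHD. The test data are generated as in [3] with the following parameters:

       (1) |D| (size of the data set): 100 000
       (2) n (number of dimensions): 10~50
       (3) Cn: 5
       (4) Dom(A): 50
       (5) C: 5 ( 107 )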
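The self-join performed by Gen_bu can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes each dur is stored as a tuple of (a, L, R) triples sorted by dimension a, and `gen_bu` is a hypothetical name mirroring the pseudocode. Two (k-1)-dimensional dur are joined when their first k-2 triples are identical and the last dimension of the first is smaller than that of the second, as in the WHERE clause.

```python
from typing import List, Tuple

Triple = Tuple[int, float, float]   # (a, L, R): a dimension and its interval
Unit = Tuple[Triple, ...]           # assumed encoding: triples sorted by a


def gen_bu(d_prev: List[Unit]) -> List[Unit]:
    """Join (k-1)-dimensional dur into candidate k-dimensional units."""
    candidates: List[Unit] = []
    for u1 in d_prev:
        for u2 in d_prev:
            same_prefix = u1[:-1] == u2[:-1]        # first k-2 triples equal
            ordered = u1[-1][0] < u2[-1][0]         # u1.a(k-1) < u2.a(k-1)
            if same_prefix and ordered:
                # Keep all of u1 and append the last triple of u2.
                candidates.append(u1 + (u2[-1],))
    return candidates


# Three 1-dimensional dur on dimensions 1, 1 and 2.
d1 = [((1, 0.0, 10.0),), ((1, 10.0, 20.0),), ((2, 0.0, 10.0),)]
c2 = gen_bu(d1)
print(c2)
```

The nested loop is quadratic in the number of dur; grouping the units by their shared prefix first would avoid comparing unrelated pairs. The sketch covers only the join, and the candidates would still have to be checked against the data and pruned.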

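The support and confidence measures used in Section 3.1, sup(X) and sup(X∪Y)/sup(X), can be checked on a toy transaction database. The sketch below is illustrative only: the function names `sup` and `confidence` and the sample transactions are assumptions, and exact fractions are used so that the values are not affected by rounding.

```python
from fractions import Fraction
from typing import Iterable, List, Set


def sup(itemset: Iterable[str], transactions: List[Set[str]]) -> Fraction:
    """Fraction of the transactions in D that contain every item of X."""
    x = frozenset(itemset)
    hits = sum(1 for t in transactions if x <= t)
    return Fraction(hits, len(transactions))


def confidence(x: Iterable[str], y: Iterable[str],
               transactions: List[Set[str]]) -> Fraction:
    """Confidence of the rule X => Y, i.e. sup(X u Y) / sup(X).

    Raises ZeroDivisionError if no transaction contains X.
    """
    x_set, y_set = set(x), set(y)
    return sup(x_set | y_set, transactions) / sup(x_set, transactions)


D = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
print(sup({"a"}, D))                # 3/4
print(sup({"a", "b"}, D))           # 1/2
print(confidence({"a"}, {"b"}, D))  # 2/3
```

With minsup = 1/2 both {a} and {a, b} would pass step (1), and with minconf = 2/3 the rule a ⇒ b would pass step (2).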
