
Computer Engineering, Vol.35, No.5, March 2009

Article ID: 1000-3428(2009)05-0044-02    Document code: A    CLC number: TP301.6








Text Mining Algorithm Based on Fuzzy Clustering
LIU Zhi-yong$^{1,2}$, GENG Xin-qing$^{2}$
(1. School of Management, Dalian University of Technology, Dalian 116024;
2. Department of Mathematics, Anshan Normal University, Anshan 114005)

Abstract: The traditional FCM algorithm has two main defects: it is sensitive to isolated data, and the number of clusters must be known in advance. This paper presents a fuzzy clustering algorithm, NSFCM, and applies it to text mining. The algorithm adds a weight to the membership of each data point in order to reduce the influence of such data on the cluster centers. It uses the average information entropy to find the number of clusters and a density function algorithm to find the initial cluster centers. Experiments show that both the precision and the efficiency of NSFCM clustering are higher than those of FCM.
Key words: number of clusters; text clustering; fuzzy clustering




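Foundation item: ... (70671016)
Biography: ... (1967- ) ...
Received: 2008-06-11    E-mail: brove67@yahoo.com.cn

1

... [1-2] ... Dunn ... [3] ... Bezdek ... C (FCM) ... n ... C ... [4] ... FCM ... c ... m ... [5] ... [6] ... IFCM [7] ...

2  NSFCM

... NSFCM ...

2.1

... [8] ... $x_i$ ...

$$D_i^{(0)} = \sum_{j=1}^{N} \frac{1}{1 + f_d \, d_{ij}^{2}} \tag{1}$$

where $f_d = 4 / r_d^{2}$, and $r_d$ is given by

$$r_d = \frac{1}{2} \sqrt{\frac{1}{N(N-1)} \sum_{j=1}^{N} \sum_{i=1}^{N} d_{ij}^{2}} \tag{2}$$

With $r_d$ from Eq. (2), the density $D_i^{(0)}$ of each $x_i$ follows from Eq. (1). Let $D_1^{*} = \max\{D_i^{(0)},\ i = 1, 2, \ldots, N\}$; the corresponding $x_1^{*}$ is the 1st cluster center. The densities are then updated by

$$D_i^{(k)} = D_i^{(k-1)} - D_k^{*} \, \frac{1}{1 + f_d \, \|x_i - x_k^{*}\|^{2}}, \quad k = 1, 2, \ldots, c-1 \tag{3}$$

where $c$ is the number of clusters, $\|x_i - x_k^{*}\|$ is the distance between $x_i$ and $x_k^{*}$, $D_k^{*} = \max\{D_i^{(k-1)},\ i = 1, 2, \ldots, N\}$, and the corresponding $x_k^{*}$ is the $k$-th cluster center.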
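The density-function initialization of Eqs. (1)-(3) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `density_initial_centers` is hypothetical, Euclidean distance is assumed for $d_{ij}$, and the square root in Eq. (2) is taken as part of the formula. It also assumes at least two distinct samples, so that $r_d > 0$.

```python
import numpy as np

def density_initial_centers(X, c):
    """Pick c initial cluster centers from the rows of X using Eqs. (1)-(3)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Squared Euclidean distances d_ij^2 between all pairs (assumed metric).
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff ** 2, axis=2)
    # Eq. (2): r_d = 1/2 * sqrt(sum_ij d_ij^2 / (N (N - 1))), then f_d = 4 / r_d^2.
    r_d = 0.5 * np.sqrt(d2.sum() / (n * (n - 1)))
    f_d = 4.0 / r_d ** 2
    # Eq. (1): initial density of every sample.
    density = np.sum(1.0 / (1.0 + f_d * d2), axis=1)
    chosen = []
    for _ in range(c):
        k = int(np.argmax(density))  # sample with the highest remaining density
        chosen.append(k)
        # Eq. (3): subtract the influence of the chosen center from all densities.
        density = density - density[k] / (1.0 + f_d * d2[:, k])
    return X[chosen], chosen
```

Because Eq. (3) drives the density of a chosen sample to zero and strongly reduces the density of its neighbours, the next maximum tends to fall in a different dense region.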

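2.2

$$H = -\sum_{j=1}^{C} \sum_{i=1}^{N} \left\{ \left[ u_{ij} \log_2 u_{ij} + (1 - u_{ij}) \log_2 (1 - u_{ij}) \right] / N \right\} \tag{4}$$

where $u_{ij}$ is the membership relating class $j$ and sample $i$ ... $H$ ... $c$ ...

2.3

... [6] ...

$$N_{ij} = \gamma u_{ij} + (1 - \gamma) u_{ij}^{2} \tag{5}$$

where $\gamma$ takes values in $[0, 1]$; when $\gamma = 1$, $N_{ij} = u_{ij}$. With $N_{ij} = f(u_{ij})$: as $u_{ij} \to 0$, $N_{ij} \to 0$; as $u_{ij} \to 1$, $N_{ij} \to 1$; the values lie in $[0, 1]$. ... $\gamma$ ... $N_{ij}$ ...

2.4

... FCM ... $m$ ...

(1) Initialization: set $c = 2$ and $b = 0$, and choose $m$ and $\varepsilon$.

(2) Use Eqs. (1)~(3) to determine the initial cluster centers.

(3) Compute the memberships:

$$u_{ij}^{(b)} = 1 \Big/ \sum_{k=1}^{c} \left( \frac{d_{ij}^{(b)}}{d_{kj}^{(b)}} \right)^{\frac{2}{m-1}} \tag{6}$$

If $\exists\, i, r$ such that $d_{ir}^{(b)} = 0$, where $d_{ir}$ is the distance between sample $r$ and center $i$, then $u_{ir}^{(b)} = 1$ and, for $j \neq i$, $u_{jr}^{(b)} = 0$.

(4) Compute $N_{ij}$ by Eq. (5).

(5) From Eqs. (5) and (6), update the cluster centers:

$$P_i^{(b+1)} = \frac{\sum_{j=1}^{N} \left( N_{ij}^{(b+1)} \right)^{m} x_j}{\sum_{j=1}^{N} \left( N_{ij}^{(b+1)} \right)^{m}}, \quad i = 1, 2, \ldots, c \tag{7}$$

(6) If $\|P^{(b)} - P^{(b+1)}\| < \varepsilon$, stop the iteration; otherwise set $b = b + 1$ and go to step (3).

(7) Use Eq. (4) to compute $H(b)$. If $H(b+1) < H(b)$, set $c = c + 1$ and go to step (2); otherwise ... $H(x)$ ... $c = c - 1$.

3  NSFCM ...

3.1

... (Vector Space Model, VSM) ...

$$V(d) = (t_1, w_1(d); \ldots; t_i, w_i(d); \ldots; t_n, w_n(d))$$

where $t_i$ is a term and $w_i(d)$ is the weight of $t_i$ in document $d$. $w_i(d)$ is defined as a function of $tf_i(d)$, the frequency of $t_i$ in $d$: $w_i(d) = \psi(tf_i(d))$. With TF-IDF:

$$\psi = tf_i(d) \times \log_2\!\left(\frac{N}{n_i}\right) \tag{8}$$

where $N$ is the total number of documents and $n_i$ is the number of documents containing $t_i$.

3.2

... $N_{ij}$ ...

3.3

... FCM ... $c$ ... NSFCM ... NSFCM ... $U$ ...

4

... 1 000 ... 3 658 ... NSFCM ... 5 ... 200 ... Matlab6.5 ... pdist(), squareform() ... NSFCM ... 4 ... 1 min ... FCM ... c ... 5 ... 8 ... 7 min ... NSFCM ... FCM ... γ ... [0, 1] ... ε = 0.01 ... m ... 2 ... 5 ... γ = 0.2 ... γ = 0.8 ... γ = 0.2 ... γ = 0.8 ... γ = 0.2, ε = 0.01, m = 2 ... Table 1 ...

Table 1  NSFCM ...

  ...     ...     ...     ... /(%)    ... /(%)
    6     194     209       93          97
    3     197     212       92          99
   20     180     185       97          90
   15     185     191       97          93
   20     180     196       92          90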
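Steps (3)-(6) of Section 2.4, together with the entropy of Eq. (4), can be sketched for a fixed number of clusters as follows. This is an illustrative sketch under stated assumptions, not the authors' Matlab code: all function names are hypothetical, Euclidean distance is assumed, `max_iter` and the clipping inside `entropy` are numerical guards that do not appear in the paper, and a sample that coincides with several centers has its membership split evenly among them. The defaults $\gamma = 0.2$, $\varepsilon = 0.01$, $m = 2$ are the values quoted in Section 4.

```python
import numpy as np

def memberships(X, P, m=2.0):
    """Eq. (6): u[i, j] is the membership of sample j in cluster i."""
    d = np.linalg.norm(P[:, None, :] - X[None, :, :], axis=2)  # d[i, j]
    u = np.zeros_like(d)
    for j in range(X.shape[0]):
        zero = d[:, j] == 0
        if zero.any():
            # Sample coincides with a center: membership 1 there, 0 elsewhere
            # (split evenly if several centers coincide; an assumption).
            u[zero, j] = 1.0 / zero.sum()
        else:
            ratio = d[:, j][:, None] / d[:, j][None, :]  # d_ij / d_kj
            u[:, j] = 1.0 / np.sum(ratio ** (2.0 / (m - 1.0)), axis=1)
    return u

def weighted(u, gamma):
    """Eq. (5): N_ij = gamma * u_ij + (1 - gamma) * u_ij^2."""
    return gamma * u + (1.0 - gamma) * u ** 2

def centers(X, n_w, m=2.0):
    """Eq. (7): centers as means of the samples weighted by N_ij^m."""
    w = n_w ** m
    return (w @ X) / w.sum(axis=1, keepdims=True)

def entropy(u):
    """Eq. (4): average information entropy of a membership matrix (log base 2)."""
    n = u.shape[1]
    p = np.clip(u, 1e-12, 1.0 - 1e-12)  # avoids log2(0); not part of the paper
    return -np.sum(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p)) / n

def nsfcm_fixed_c(X, P0, gamma=0.2, m=2.0, eps=0.01, max_iter=100):
    """Steps (3)-(6) for a fixed c, starting from the initial centers P0."""
    X = np.asarray(X, dtype=float)
    P = np.asarray(P0, dtype=float)
    for _ in range(max_iter):
        u = memberships(X, P, m)                         # step (3)
        P_new = centers(X, weighted(u, gamma), m)        # steps (4)-(5)
        converged = np.linalg.norm(P_new - P) < eps      # step (6)
        P = P_new
        if converged:
            break
    u = memberships(X, P, m)
    return P, u, entropy(u)
```

Step (7) would wrap `nsfcm_fixed_c` in an outer loop that starts at $c = 2$, recomputes the initial centers for each $c$, and stops increasing $c$ once the returned entropy no longer decreases.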
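The TF-IDF weighting of Eq. (8) in Section 3.1 can be illustrated with a few lines of code. This is a minimal sketch that assumes documents are already tokenized into lists of terms and that $tf_i(d)$ is the raw count of $t_i$ in $d$; the function name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return the vocabulary and one TF-IDF weight vector per tokenized document."""
    n_docs = len(docs)
    # n_i: number of documents that contain term t_i at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(doc_freq)
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # tf_i(d): raw count of t_i in d (assumed definition)
        # Eq. (8): w_i(d) = tf_i(d) * log2(N / n_i)
        vectors.append([tf[t] * math.log2(n_docs / doc_freq[t]) for t in vocab])
    return vocab, vectors
```

A term that occurs in every document gets weight 0, since $\log_2(N / n_i) = 0$ when $n_i = N$, so it does not contribute to the distances used for clustering.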

								