
Document Categorization Using Fuzzy Clustering

Masoud Makrehchi
Maryam Shokri

Submitted to: Prof. M. Kamel
Term paper of SD-622, Machine Intelligence Course
University of Waterloo, SYDE


Table of Contents

• Document Categorization
• Pattern Recognition Systems
• Data Clustering
• Fuzzy Clustering
• Proposed Method
• References




Document Categorization

• Introduction
• Architecture of a Document Classification System


Introduction

• This paper presents the results of an experimental study of document clustering techniques.
• This report also surveys the state of the art in text categorization.
• Because the Internet presents people with a vast amount of information, an appropriate method for organizing this variety of information is needed.
• Our goal is to organize a large set of documents using a clustering method.
Architecture of a Document Classification System

[Figure: block diagram of a document classification system. Unlabeled documents pass through preprocessing (supported by a thesaurus) into clustering and a labeling process, producing classified/labeled data. The labeled data drive a training (classification) stage that builds a knowledge base; a decision (recall) stage then uses the knowledge base to assign a class to new preprocessed documents.]


Pattern Recognition Systems

• Pattern Recognition Systems Life Cycle
• Classification or Clustering?
• Classification Methods




Pattern Recognition Systems Life Cycle

[Figure: the life cycle spans three layers. Knowledge engineer / field engineer layer (real world): data sampling → data conditioning → feature extraction → feature selection. AI engineer layer (computer world): training (the learning process), i.e. classification/clustering with supervised or unsupervised learning, including internal feedback (learning). Customer layer (real world): recall, the pattern recognition system in use on true data.]


Classification or Clustering?

[Figure: decision tree. Is labeled data available? If yes: classification (supervised learning). If no: is the number of groups known? If yes: clustering (semi-supervised learning); if no: clustering (unsupervised learning).]
Classification Methods

• Rocchio's algorithm
• Naïve Bayes
• K-nearest neighbor
• Decision Trees
• Support Vector Machines
• Voted Classification
• Neural Networks
• Fuzzy Logic Based Learning
• Aggregation of multiple classifiers


Data Clustering

• Introduction
• A Clustering System
• A Taxonomy of Clustering Techniques
• Hard C-means Clustering




Introduction

• Cluster analysis partitions a collection of data points into subgroups, where the objects inside a cluster (a subgroup) show a certain degree of closeness or similarity.
• Hard clustering assigns each data point (or feature vector) to one and only one of the clusters.
• In hard clustering, the degree of membership for each data point is either one or zero; that is, we assume well-defined boundaries between the clusters (no overlapping).
• Clustering is an unsupervised learning approach.
• A similarity measure is required, generally taken as the Euclidean distance in feature space.


A Clustering System

[Figure: pipeline of a clustering system. Patterns pass through feature extraction/selection and an interpattern similarity stage into grouping, which produces the clusters. A design feedback loop returns from similarity to feature extraction, and a learning feedback loop returns from the clusters to grouping.]
A Taxonomy of Clustering Techniques

[Figure: taxonomy tree, from A.K. Jain, M.N. Murty, P.J. Flynn. Clustering splits into hierarchical and partitional methods. Hierarchical: single link, complete link. Partitional: square error (e.g. C-means), graph theoretic, mixture resolving (e.g. expectation maximization), and mode seeking.]


Hard C-means Clustering

• Initialize cluster centers: choose a value of c representing the number of desired clusters (centroids).
   – Each cluster is represented by a centroid (the mean of all cluster members).
   – The number of centroids is equal to the number of final clusters.
• Assign each data point to the nearest cluster center.
• Evaluate the clusters (the average distance from the centers); if it is within a given limit, stop.
   – We need a similarity or distance measure, for example Euclidean distance.
• Otherwise, update the cluster centers and reassign the data points (a code sketch of this loop follows).
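To make the loop concrete, here is a minimal NumPy sketch of hard c-means. The function name, random initialization, and tolerance are our own choices; the slides specify only the steps above.

```python
import numpy as np

def hard_c_means(X, c, n_iter=100, tol=1e-6, seed=0):
    """Hard c-means: X is an (n, d) array of feature vectors, c the number of clusters."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking c distinct random data points.
    V = X[rng.choice(len(X), size=c, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each point to the nearest centroid (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)  # (n, c) distances
        labels = d.argmin(axis=1)
        # Update each centroid as the mean of its members (keep empty clusters fixed).
        V_new = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else V[i]
                          for i in range(c)])
        # Stop when the centroids no longer move.
        if np.linalg.norm(V_new - V) < tol:
            V = V_new
            break
        V = V_new
    return labels, V
```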




Hard C-means Algorithm

[Figure: example 2D data set of points x1 through x21, from Ekkasit Tiamkaew and Jirakhom Ruttanavakul. A second panel shows the same points after clustering, partitioned around four centroids k1 through k4.]
Hard C-means Algorithm

Centroids:

$$V_i = \frac{\sum_{k=1}^{n} u_{ik} \, x_k}{\sum_{k=1}^{n} u_{ik}}$$

Membership:

$$u_{ik} = \begin{cases} 1 & \text{if } \|X_k - V_i\| \le \|X_k - V_j\| \text{ for all } j \\ 0 & \text{otherwise} \end{cases}$$


Fuzzy Clustering

• Why Fuzzy Clustering?
• What Is Fuzzy Clustering?
• Types of Fuzzy Clustering
• Objective function-based fuzzy clustering algorithms
• Bottlenecks in Fuzzy C-Means
• Fuzzy Clustering in Document Categorization
• Fuzzy c-means




Why Fuzzy Clustering?

• In several applications the clusters have no clear, well-defined boundaries, for example in document categorization.


What Is Fuzzy Clustering?

• Fuzzification of hard c-means.
• Each example belongs to multiple clusters, to different degrees.
• The membership matrix U takes on real values between 0 and 1.
• U sums to 1 across the rows.
• If an example does not clearly fit into either of two clusters, this knowledge can be captured:

      Hard C-Means              Fuzzy C-Means
      Example   C1   C2         Example   C1    C2
      1         0    1          1         .2    .8
      2         0    1          2         .01   .99
      3         0    1          3         .45   .55
      4         1    0          4         .9    .1
What Is Fuzzy Clustering?

• In fuzzy clustering, the result is represented by grades of membership of every pattern in the established classes.
• Unlike the binary evaluation of crisp clustering, the membership grades in fuzzy clustering are evaluated within the [0, 1] interval.
• The necessity of fuzzy clustering lies in the reality that a pattern could be assigned to different classes (categories).
• The objective function method is one of the major techniques in fuzzy clustering.


Types of Fuzzy Clustering

1. Fuzzy clustering based on fuzzy relations.
2. Fuzzy clustering based on an objective function and a fuzzy covariance matrix.
3. Nonparametric classifiers, that is, the fuzzy generalized k-nearest neighbor rule.
4. Neuro-fuzzy clustering:
   – Self-Organizing Maps
   – Fuzzy Learning Vector Quantization
   – Fuzzy Adaptive Resonance Theory
   – Growing Neural Gas
   – Fully Self-Organizing Simplified Adaptive Resonance Theory
   – Fuzzy Competitive Learning




Objective function-based fuzzy clustering algorithms

• Fuzzy c-means algorithm: spherical clusters of approximately the same size.
• Gustafson-Kessel algorithm: ellipsoidal clusters of approximately the same size; there are also axis-parallel variants; can also be used to detect lines (to some extent).
• Gath-Geva algorithm / Gaussian mixture decomposition: ellipsoidal clusters with varying size; there are also axis-parallel variants; can also be used to detect lines (to some extent).
• Fuzzy c-varieties algorithm: detection of linear manifolds (infinite lines in 2D).
• Adaptive fuzzy c-varieties algorithm: detection of line segments in 2D data.
• Fuzzy c-shells algorithm: detection of circles (no closed-form solution for prototypes).
• Fuzzy c-spherical shells algorithm: detection of circles.
• Fuzzy c-rings algorithm: detection of circles.
• Fuzzy c-quadric shells algorithm: detection of ellipsoids.
• Fuzzy c-rectangular shells algorithm: detection of rectangles.


Bottlenecks in Fuzzy C-Means

There are three major bottlenecks in fuzzy clustering of real data:
• The number of clusters: in most cases it is not defined a priori, so we need a criterion for stopping the algorithm. When the target number of clusters is known, we have semi-supervised learning.
• The centroids: the location and character of a centroid, the representative of its cluster, are not necessarily predefined. We have to make an initial estimate of the centroid locations.
• There is large variability in cluster shapes, cluster densities, and the number of data points in each cluster.
Fuzzy Clustering in Document Categorization

G. Keswani, L.O. Hall
• Application: text categorization
• Method: semi-supervised fuzzy c-means (ssFCM) algorithm
   – A combination of ssFCM and naïve Bayes methods (feeding a naïve Bayes classifier with data labeled via ssFCM clustering).
   – ssFCM is used to estimate the class labels of the unlabeled data using the labeled data. The results of this clustering are used with NBC to classify unseen documents.
• The result has been compared with a combination of NBC and expectation-maximization clustering.

O. Nasraoui, et al.
• Application: mining web access logs
• Method: relational competitive fuzzy clustering
   – A fuzzy version of agglomerative clustering,
   – using a new, non-Euclidean dissimilarity function.

R. Krishnapuram, et al.
• Application: web document clustering
• Method: k-medoids fuzzy clustering
   – A relational clustering method in the same family as SAHN, CLARA, PAM and CLARANS.
• Results: an 83.05% recognition rate is reported.




Fuzzy Clustering in Document Categorization

M.E.S. Mendes, L. Sacks
• Application: RFC documents
• Method: fuzzy c-means, with a proposed similarity function instead of Euclidean distance
• Results: an 85% recognition rate is reported.

K.S. Leung, et al.
• Application: content-based indexing
• Method: fuzzy competitive clustering
• Results: a 79% recognition rate (average efficiency) is reported.


Fuzzy c-means

• The fuzzy c-means algorithm is based on minimization of the following objective function:

$$J_q(U, V) = \sum_{j=1}^{N} \sum_{i=1}^{K} (u_{ij})^q \, d^2(X_j, V_i)$$

where
• $U$ is the fuzzy K-partition of the data set, and $u_{ij}$ is the degree of membership of $X_j$ in the ith cluster;
• $V$ is the set of K prototypes (centroids), and $V_i$ is the centroid of the ith cluster;
• $q$ is any real number greater than 1;
• $d^2(X_j, V_i)$ is any inner-product metric (the distance between $X_j$ and $V_i$);
• $X_j$ is the jth m-dimensional feature vector;
• K is the number of clusters and N is the number of data points.
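As a sanity check on the notation, here is a minimal sketch of evaluating $J_q(U, V)$ for given memberships and centroids, assuming Euclidean distance (the function and variable names are ours, not the slides'):

```python
import numpy as np

def fcm_objective(X, V, U, q=2.0):
    """J_q(U, V) = sum_j sum_i (u_ij)^q * d^2(X_j, V_i), with Euclidean d.
    X: (N, d) data, V: (K, d) centroids, U: (N, K) memberships summing to 1 per row."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)  # (N, K) squared distances
    return ((U ** q) * d2).sum()
```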
Step 1: Choose primary random centroids (v_i), the prototypes.

[Figure: a 2D data set with many randomly placed candidate centroids z1 through z25 scattered among the data points. From E. Tiamkaew, J. Ruttanavakul.]


Eliminate all centroid vectors that are too close.

[Figure: the same data set after merging candidates that lie too close together, leaving a reduced set of centroids. From E. Tiamkaew, J. Ruttanavakul.]


Step 2: Assign feature vectors (X) to the nearest centroid vector.

[Figure: each data point connected to its nearest remaining centroid. From E. Tiamkaew, J. Ruttanavakul.]


Eliminate all clusters that have fewer than p vectors.

[Figure: clusters with fewer than p member vectors removed, leaving the final prototypes. From E. Tiamkaew, J. Ruttanavakul.]
• Step 3: Compute the degree of membership of all feature vectors in all clusters:

$$u_{ij} = \frac{\left( \dfrac{1}{d^2(x_j, v_i)} \right)^{1/(q-1)}}{\sum_{k=1}^{K} \left( \dfrac{1}{d^2(x_j, v_k)} \right)^{1/(q-1)}}$$

• Step 4: Compute new centroids:

$$\hat{V}_i = \frac{\sum_{j=1}^{N} (u_{ij})^q \, X_j}{\sum_{j=1}^{N} (u_{ij})^q}$$

• Step 5: Recalculate the degrees of membership, $\hat{u}_{ij}$.

• Step 6: Do the termination test:

$$\text{if } \max_{ij} \left| \hat{u}_{ij} - u_{ij} \right| < \varepsilon \text{, then stop;}$$

otherwise go to Step 4 and compute new centroid vectors.
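Putting Steps 3 through 6 together, here is a minimal NumPy sketch of the fuzzy c-means loop. The random initialization and default parameters are our own choices; the update and termination rules follow the equations above.

```python
import numpy as np

def fuzzy_c_means(X, K, q=2.0, eps=1e-5, n_iter=100, seed=0):
    """Fuzzy c-means on an (N, d) data matrix X, with K clusters and fuzzifier q > 1."""
    rng = np.random.default_rng(seed)
    N = len(X)
    # Random fuzzy K-partition: rows of U sum to 1.
    U = rng.random((N, K))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Step 4: new centroids, weighted by memberships raised to q.
        W = U ** q
        V = (W.T @ X) / W.sum(axis=0)[:, None]
        # Steps 3/5: recompute memberships from inverse squared distances.
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2) + 1e-12
        inv = (1.0 / d2) ** (1.0 / (q - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        # Step 6: terminate when the largest membership change is below eps.
        if np.abs(U_new - U).max() < eps:
            return U_new, V
        U = U_new
    return U, V
```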




Proposed Method

• Problem Statement
• Modeling and Data Representation
• Feature Selection Strategy
• Clustering Approach


Problem Statement

• Our data set is a set of documents from the Reuters corpus.
• All the documents have been parsed and processed.
• Each term has been mapped to an integer word ID.
• Each category tag has been mapped to an integer category ID.
• Preprocessing (sketched in the code below):
   – Remove HTML (or other) tags
   – Remove stop words
   – Perform word stemming
• Dimensions:
   – 445 primary categories
   – 29,108 primary terms (feature space)
   – 18,551 items in the data collection
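A toy sketch of the three preprocessing steps. The regex-based tag stripping, the tiny stop-word list, and the suffix-chopping "stemmer" are crude stand-ins for real tools such as a Porter stemmer; the slides do not specify implementations.

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in"}  # toy stop-word list

def preprocess(text):
    """Tag removal, stop-word removal, and a crude suffix-stripping stemmer."""
    text = re.sub(r"<[^>]+>", " ", text)           # remove HTML (or other) tags
    tokens = re.findall(r"[a-z]+", text.lower())    # tokenize
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [re.sub(r"(ing|ed|s)$", "", t) for t in tokens]  # stand-in for real stemming

print(preprocess("<p>The clustered documents</p>"))  # -> ['cluster', 'document']
```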
Data Collection

• doc_w_db.dat: database of documents, holding all the documents and the categories for each document.
• category.dic: dictionary of categories; maps between the category IDs and the category names.
• word.dic: dictionary of words; maps between the word IDs and the words.


Modeling and Data Representation

• Documents are represented by vectors of words.
• The collection of documents is represented in a word-by-document matrix.
• This matrix is usually sparse.
• The number of rows of the matrix corresponds to the number of words in the dictionary.
• Each component of the matrix is the weight of word i in document k.
• The more times a word occurs in a document, the more relevant it is to the topic of the document.
• The more times the word occurs throughout all documents in the collection, the more poorly it discriminates between documents.
• A major characteristic of text categorization problems is the high dimensionality of the feature space.




Document Vector

• Each document is represented by a vector.
• Each dimension of the vector is associated with a word/term.
• For each document, the value of each dimension is the frequency of that word in the document (a sketch of building such vectors follows the weighting list below).


Weighting

• Boolean weighting
• Word frequency weighting
• tf x idf weighting
• tfc weighting
• ltc weighting
• Entropy weighting
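A minimal sketch of building the document vectors described above as a word-by-document count matrix. The two toy documents are our own; a collection of the size described earlier would use a sparse representation.

```python
import numpy as np

docs = [["fuzzy", "cluster", "document"],
        ["document", "cluster", "cluster"]]            # toy preprocessed documents
vocab = sorted({w for d in docs for w in d})           # dictionary of words
index = {w: i for i, w in enumerate(vocab)}

# Word-by-document matrix: rows = words, columns = documents, entries = counts.
A = np.zeros((len(vocab), len(docs)), dtype=int)
for k, doc in enumerate(docs):
    for w in doc:
        A[index[w], k] += 1

print(vocab)  # ['cluster', 'document', 'fuzzy']
print(A)      # [[1 2] [1 1] [1 0]]
```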
tf x idf-Weighting

This is a well-known approach to computing word weights. It assigns a weight to each word in a document in proportion to the number of occurrences of the word in the document, and in inverse proportion to the number of documents in the collection in which the word occurs at least once (a code sketch follows the next slide):

$$a_{ik} = f_{ik} \times \log\left(\frac{N}{n_i}\right)$$

where
• $f_{ik}$ is the frequency of word i in document k;
• $N$ is the number of documents in the collection;
• $n_i$ is the number of documents in the collection in which word i occurs at least once.


Feature Selection Strategy

Goals:
• Remove non-informative words from documents.
• Improve categorization effectiveness.
• Reduce computational complexity.
Result: dimensionality reduction.

[Figure: a simple 2D feature space.]
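Applied to the word-by-document count matrix sketched earlier, the tf x idf formula above takes a few lines. This assumes every row's word occurs in at least one document, so no $n_i$ is zero; the function name is ours.

```python
import numpy as np

def tfidf(A):
    """tf-idf weights a_ik = f_ik * log(N / n_i) for a word-by-document count matrix A."""
    N = A.shape[1]                       # number of documents in the collection
    n = (A > 0).sum(axis=1)              # n_i: documents containing word i at least once
    return A * np.log(N / n)[:, None]    # broadcast the idf factor across documents
```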




Why dimension reduction?

• Pattern recognition = dimension reduction. Why?

[Figure: sample space (order of n) → feature space (order of m) → class space (order of k), with n >> m >> k.]


Dimensionality Reduction

1. Reducing the number of target categories (classes), with a threshold of at least 50 documents belonging to the category (<< 0.1 of the total documents).

[Figure: distribution of documents in categories, before and after thresholding.]
Dimensionality Reduction

2. Term selection (feature space reduction). In the first step, we remove those terms with no information content:
   – terms seen in all categories (such as "Reuter"): 234 terms;
   – terms seen in none of the categories: 669 terms.
   We still have 28,205 terms! More reduction is needed.


Feature Selection Methods

• Document frequency thresholding
• Information gain
• χ² statistic
• Mutual information
• Term strength




Document Frequency Thresholding

• The document frequency of a word is the number of documents in which the word occurs.
• In document frequency thresholding, the document frequency of each word in the training corpus is computed.
• Words whose document frequency is less than a predetermined threshold are removed.
• The assumption is that rare words are either non-informative for category prediction or not influential in global performance.


Information Gain

Information gain measures the number of bits of information obtained for category prediction by knowing the presence or absence of a word in a document. The information gain of a word w is:

$$IG(w) = -\sum_{j=1}^{k} P(c_j) \log P(c_j) + P(w) \sum_{j=1}^{k} P(c_j \mid w) \log P(c_j \mid w) + P(\bar{w}) \sum_{j=1}^{k} P(c_j \mid \bar{w}) \log P(c_j \mid \bar{w})$$

where $c_1, \ldots, c_k$ denote the set of possible categories.
• Information gain is computed for each word of the training set.
• Words whose information gain is less than some predetermined threshold are removed.
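A minimal sketch of computing IG(w) from a boolean word-presence vector and per-document category labels. The names and the choice of log base 2 (so the result is in bits) are ours.

```python
import numpy as np

def information_gain(presence, labels, categories):
    """IG(w) from a boolean 'word present' vector and per-document category labels."""
    def cond_term(mask):
        # sum_j P(c_j | mask) log2 P(c_j | mask), over the documents selected by mask
        p = np.array([(labels[mask] == c).mean() for c in categories])
        p = p[p > 0]
        return (p * np.log2(p)).sum()

    pc = np.array([(labels == c).mean() for c in categories])
    pc = pc[pc > 0]
    pw = presence.mean()                    # P(w); P(not w) = 1 - pw
    ig = -(pc * np.log2(pc)).sum()          # category entropy term
    if pw > 0:
        ig += pw * cond_term(presence)
    if pw < 1:
        ig += (1 - pw) * cond_term(~presence)
    return ig

labels = np.array(["earn", "earn", "crude", "crude"])
presence = np.array([True, True, False, False])  # word perfectly predicts the category
print(information_gain(presence, labels, ["earn", "crude"]))  # 1.0 bit
```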
Clustering Approach

• Apply the fuzzy c-means (FCM) clustering technique and its variants, and examine different similarity functions to find a more efficient strategy.
• When the number of final clusters is known, the algorithm becomes semi-supervised.
• With labeled data, we can evaluate the clustering accuracy.


References

• K.S. Leung, I. King, H.Y. Yue. Fuzzy Clustering Method for Content-based Indexing.
• M.E.S. Mendes, L. Sacks. Assessment of the Performance of Fuzzy Cluster Analysis in the Classification of RFC Documents.
• G. Keswani, L.O. Hall. Text Classification with Enhanced Semi-supervised Fuzzy Clustering.
• A. Baraldi, P. Blonda. A Survey of Fuzzy Clustering for Pattern Recognition.
• I. Gath, A.B. Geva. Unsupervised Optimal Fuzzy Clustering.
• R. Krishnapuram, A. Joshi, L. Yi. A Fuzzy Relative of the K-Medoids Algorithm with Application to Web Document and Snippet Clustering.
• K. Aas, L. Eikvil. Text Categorisation: A Survey. June 1999.



