A Fuzzy Ontological Knowledge Document Clustering Methodology

IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics,
Vol. 39, No. 3, June 2009
Amy J. C. Trappey, Charles V. Trappey, Fu-Chiang Hsu, and David W. Hsiao
Advisor: Prof. 陳彥良
Presenter: 詹子銘 (984403003)
Presentation date: 2010.12.3


Outline
   • Abstract
   • Introduction
   • Literature Review
   • System Methodology
   • Case Examples and Experiment
   • Conclusion






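Abstract (1/2)
   • This paper proposes a new method for clustering knowledge
     documents, especially patents
   • Current keyword-based methods for managing document content are
     often inconsistent and ineffective
      - This is especially so when only part of the meaning of the
        technical content is used for cluster analysis
   • The proposed method
      - Uses an ontology schema to automatically interpret and cluster
        knowledge documents
      - Uses fuzzy logic control to assign patent documents to suitable
        document clusters, based on the ontological semantic webs
        derived from the documents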



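Abstract (2/2)
   • Three cases are used to test the method:
      - Case 1: 100 chemical mechanical polishing (CMP) patents obtained
        from the World Intellectual Property Organization (WIPO) are
        analyzed and clustered
      - Case 2: 100 patent news releases obtained from online websites
        are analyzed and clustered
      - Case 3: 100 RFID patents obtained from WIPO are analyzed and
        clustered
   • The results show that the fuzzy ontological document clustering
     method outperforms the K-means approach, particularly on the
     following measures:
      - Precision
      - Recall
      - F-measure
      - Shannon's entropy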


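Introduction (1/3)
   • Companies developing products face great uncertainty and tight
     schedules because:
      - Designs have become more complex
      - Product life cycles have become shorter
      - Competitors hold strong patent portfolios
   • Countermeasures:
      - Continuously analyze patent knowledge to avoid infringement
      - Define the scope of intellectual property (IP)
      - Direct R&D toward filing new patents that enrich and expand
        the strategic patent portfolio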




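Introduction (2/3)
   • In patent knowledge management, patent clustering plays an
     important role in helping set future R&D directions
   • Current research on patent clustering relies on statistical
     methods based on keywords and phrases
      - Such methods cannot represent the knowledge contained in
        patent documents
   • This paper adopts ontological knowledge representation and
     fuzzy logic control
      - Ontological representation lets domain experts define knowledge
        consistently and, by using standard formats (e.g., XML, RDF,
        OWL), improves the efficiency of knowledge exchange
      - Fuzzy logic is then applied to the linguistic representation to
        derive similarity measures between documents for clustering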



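Introduction (3/3)
   • With these two techniques (ontology + fuzzy logic), the deeper
     knowledge content of patents can be derived and the similarity
     between patents can be defined more faithfully
   • The paper also reviews the literature on text mining, ontology,
     fuzzy logic control, and clustering methodology
   • Finally, it proposes the fuzzy ontological document clustering
     (FODC) methodology and uses three cases to demonstrate its
     efficiency and effectiveness:
      - 100 CMP patents
      - 100 patent news releases
      - 100 RFID patents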


Literature Review (1/8)
   • Areas of research for document clustering
      - Text Mining: Information Retrieval, Natural Language Processing
      - Fuzzy Logic: Fuzzy Inference, Similarity derivation
      - Mathematical Clustering: K-means Algorithm, Levenshtein Distance
        Algorithm, Fuzzy C-means Algorithm, Hierarchical Clustering
        Algorithm
      - Ontology Creation: Rule-based, Logic-based (RDF triples),
        Frame-based


Literature Review (2/8)
   • Text mining
      - Information retrieval frequently uses key phrases to index
        and retrieve documents
         - For example, Hou and Chan [3] present a methodology to extract
           document key phrases, calculate their frequencies, and derive
           relationships between the phrases
         - Nevill-Manning et al. [4] present an interactive means to infer a
           document's hierarchical structure
         - Witten [5] presents an algorithm, SEQUITUR, to extract a
           hierarchical phrase structure from text. The algorithm uses Naïve
           Bayes statistics, text term frequency, inverse document frequency,
           and placement distance to identify key-phrase sequences and infer
           the document structure
         - Sanchez et al. [6] use a feature data-mining algorithm
         - Feng and Croft [7] use a Markov model and the Viterbi algorithm
           for phrase extraction



Literature Review (3/8)
   • Text mining
      - Natural language processing
         - The common approach is to analyze natural language using grammar
           and semantics
         - Computer programs parse the natural language of a sentence using
           the rules of grammar
         - However, determining the meaning of a sentence is a difficult and
           complicated problem that tends to be domain and language specific






Literature Review (4/8)
   • Ontology
      - A body of knowledge in an area of interest is represented by
        the objects, concepts, entities, and the relationships among
        them
      - The World Wide Web can be thought of as having an ever-expanding
        body of knowledge that requires a structured framework, i.e.,
        an ontology, to describe it and make it available for use.
        Thus, the RDF was created
      - The base element of the RDF is a triple: a resource (the
        subject) is linked to another resource (the object) through an
        arc labeled with a third resource (the predicate)
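
As a minimal sketch of the triple structure described above, the snippet below models statements as Python named tuples. The `Triple` class, the `objects_of` helper, and the two sample statements are illustrative assumptions; the concept names are borrowed from the CMP domain used later in these slides, but the arcs are not copied from the paper's schema.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject --predicate--> object."""
    subject: str
    predicate: str
    object: str


# Illustrative statements only (assumed, not taken from the paper).
statements = [
    Triple("Slurry", "is_a", "CMP_consumable"),
    Triple("Slurry", "comprise", "Particle"),
]


def objects_of(subject: str, predicate: str, triples: List[Triple]) -> List[str]:
    """Return every object linked to `subject` through `predicate`."""
    return [t.object for t in triples
            if t.subject == subject and t.predicate == predicate]


print(objects_of("Slurry", "comprise", statements))
```

Storing each statement as a flat (subject, predicate, object) record keeps the graph easy to filter on any of the three positions.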




Literature Review (5/8)
   • Ontology
      - Wu and Palmer [8] present a distance-based algorithm to
        compute the similarity values of pairwise keywords in the
        ontology
      - Kung [9] presents an algorithm that automatically generates
        the ontology and classifies information using fuzzy neural
        networks
      - Kao [10] presents a document classification methodology
        using an automatically constructed ontology but also uses
        document key term frequencies for classification





Literature Review (6/8)
   • Fuzzy logic
      - Gruninger and Fox [11] proposed a methodology to
        facilitate ontology design and evaluation and implemented it
        via the TOVE (TOronto Virtual Enterprise) modeling project
      - The rules and conventions of experts are transformed into
        mathematics, so that the computer can be programmed to
        mimic experts and process knowledge with consistency
      - Lee et al. [12] use a predefined ontology to extract news
        content and apply a fuzzy inference model to derive the
        similarity of the news and generate news summaries





Literature Review (7/8)
   • Clustering
      - A general method to create sets that are fairly homogeneous
        within groups but significantly heterogeneous between
        groups
      - Runkler and Bezdek [13] clustered the text of web pages
        and the sequences of web pages visited by users (web logs).
        The Levenshtein distance algorithm and the fuzzy c-means
        algorithm were jointly applied to generate the clusters
      - Hsu et al. [14] used the K-means approach for
        clustering patent documents






Literature Review (8/8)
   • As shown by the previous research, phrases extracted
     from documents are frequently used to establish
     similarity relationships between document texts, and
     these similarity relationships are used as the basis to
     group documents
   • However, the statistical analysis of key phrases cannot
     fully represent the underlying knowledge
   • This correspondence presents a method to analyze
     and cluster patents and related knowledge documents
     using a domain ontology schema rather than a
     key-phrase text-mining approach



System Methodology (1/26)
   • Fuzzy Ontological Document Clustering (FODC)
      A. Building a Patent Ontology
      B. Natural Language Processing and Terminology Training
      C. Terminology Analyzer
      D. Knowledge Extraction
      E. Patent Similarity Match
      F. Defuzzification and Patent Clustering




System Methodology (2/26)
A.    Building a Patent Ontology
   • Domain experts define the "Ontology Schema"
   • A knowledge-engineering methodology (7 steps) is used for
     ontology building
   • Protégé is used as the ontology building and RDF editing tool
      - Graphical user interface
      - A framework to which other software plug-ins can easily be
        added and linked
   • The ontological web can be automatically transformed into
     standard data formats (XML, RDF, or OWL) for further
     manipulation and interpretation for knowledge analysis and
     synthesis



System Methodology (3/26)
   • CMP (Chemical Mechanical Polishing) Terminology
   [Figure: CMP apparatus diagram labeling Carrier Speed (CS), Platen Speed (PS),
    wafer / substrate, film, conduit, slurry (containing particles), and polish pad]


Ontology Schema for the CMP Domain
   [Figure: ontology schema graph for the CMP domain, divided into main
    concepts (upper part) and detailed descriptions (lower part)]
   • Concepts shown: CMP_consumable, CMP_method, Transport_apparatus,
     Film, Slurry, CMP_equipment, Substrate, Wafer_transfer_mechanism,
     CS_PS, Particle, Electromagnetic_radiation, Conduit, Polish_pad
   • Arc labels shown: is a, polish, determine, clean, control, utilize,
     comprise, absorb, carry, position
   • Triples: Subject, Predicate, Object
   • The ontology schema is also called an ontological web or
     semantic network
   • Note: the red arrows differ from Fig. 10 of the paper


System Methodology (4/26)
B.    Natural Language Processing and Terminology Training
   • The system is trained using a set of patent documents
   • Sentences from the training documents are tagged to extract the
     parts of speech, chunks, and lemmas using the MontyLingua
     natural language processing tool
   [Figure: part-of-speech tagging example]





System Methodology (5/26)
   • MontyLingua (an English natural language processing tool)
      - Commonsense-enriched, end-to-end natural language
        understander for English
      - Feed raw English text into MontyLingua, and the output
        will be a semantic interpretation of that text
      - Suited to information retrieval and extraction, request
        processing, and question answering
      - From English sentences, it extracts subject/verb/object tuples,
        adjectives, noun phrases and verb phrases, people's names,
        places, events, dates and times, and other semantic information
      - Free for non-commercial use



System Methodology (6/26)
B.    Natural Language Processing and Terminology Training
   • Knowledge engineers manually map the extracted words to the
     concepts of the ontology
   • Example:
      "A chemical mechanical polishing apparatus and method
       for polishing semiconductor wafers. . .,"
      - chemical mechanical polishing apparatus -> concept CMP_method (n.)
      - method                                  -> concept CMP_method (n.)
      - polishing                               -> concept polish (v.)
      - semiconductor wafers                    -> concept substrate (n.)


System Methodology (7/26)
B.    Natural Language Processing and Terminology Training
   • Chunk style example:
      - "The Fulton County Grand Jury said Friday an
        investigation of Atlanta's recent primary election produced
        "no evidence" that any irregularities took place."
      - [NX The Fulton County Grand Jury] [VX said] [NX Friday]
        [NX an investigation] of [NX Atlanta] 's [NX recent primary
        election] [VX produced] " [NX no evidence] " that [NX any
        irregularities] [VX took] [NX place].
      - NX: noun chunk, VX: verb chunk
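
To make the bracket notation concrete, here is a small sketch that reads a chunk-tagged string back into (chunk type, text) pairs. It only parses the notation shown on this slide; it is not MontyLingua's API, and the names `CHUNK_RE` and `parse_chunks` are hypothetical.

```python
import re

# Matches "[NX some words]" or "[VX some words]" and captures type and text.
CHUNK_RE = re.compile(r"\[(NX|VX) ([^\]]+)\]")


def parse_chunks(tagged):
    """Return a list of (chunk_type, text) pairs from a chunk-tagged string."""
    return CHUNK_RE.findall(tagged)


tagged = ("[NX The Fulton County Grand Jury] [VX said] [NX Friday] "
          "[NX an investigation] of [NX Atlanta]")
chunks = parse_chunks(tagged)
noun_chunks = [text for kind, text in chunks if kind == "NX"]
print(chunks)
```

Words outside any bracket, such as "of", are simply skipped, which matches how the slide leaves them untagged.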




System Methodology (8/26)
B.    Natural Language Processing and Terminology Training
   • The system records the probabilities of the concepts that a word
     (lemma) implies in the patent
   • Conditional probability:
      - P(The patent concept | The word W in chunk C of the corpora)
      - It is derived during the training session
   • Example: 10 training patents contain the word "polishing" in chunk NX
   • To map "polishing" to an ontology concept:
      - CMP_method: 5 patents
        P(The concept is CMP_method | The word polishing is in the NX corpora chunk) = 0.5
      - Polish_pad: 5 patents
        P(The concept is polish_pad | The word polishing is in the NX corpora chunk) = 0.5
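
The conditional probability above can be sketched as a relative-frequency estimate over the training mappings. This is an assumption about how the counts are turned into probabilities, chosen because it reproduces the 5/10 = 0.5 figures on this slide; the function name and the tuple layout of `observations` are hypothetical.

```python
from collections import Counter, defaultdict


def train_concept_probabilities(observations):
    """Estimate P(concept | word, chunk type) by relative frequency.

    observations: iterable of (word, chunk_type, concept) tuples, one per
    training patent in which an engineer mapped the word to a concept.
    Returns {(word, chunk_type): {concept: probability}}.
    """
    counts = defaultdict(Counter)
    for word, chunk_type, concept in observations:
        counts[(word, chunk_type)][concept] += 1

    table = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        table[key] = {concept: n / total for concept, n in counter.items()}
    return table


# The slide's example: 10 patents contain "polishing" in an NX chunk.
observations = ([("polishing", "NX", "CMP_method")] * 5
                + [("polishing", "NX", "polish_pad")] * 5)
probs = train_concept_probabilities(observations)
print(probs[("polishing", "NX")])
```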


System Methodology (9/26)
B.    Natural Language Processing and Terminology Training
   • To maintain the completeness of the FODC system, the
     research also includes an iterative relearning mechanism to
     include new words
   • The system manager assigns a corresponding ontological concept
     to the new term
   • The system automatically recalculates and updates the
     terminology-ontological concept knowledge base
   [Figure: relearning applied to the example from the previous slide]



System Methodology (10/26)
C.    Terminology Analyzer
   • After the previous step (Natural Language Processing and
     Terminology Training), all of the sentence concepts are
     inferred
   • Probabilities of the concepts for each chunk (Table II) are
     computed
   • Computing the probabilities helps resolve the main concept that
     each sentence represents (see the example on the next slide)
   [Table: concept probabilities by lemma and chunk]


System Methodology (11/26)
C.    Terminology Analyzer
   • Example: parsing and analyzing the phrase
     "chemical mechanical polishing apparatus and method"
   • Computing the ontology concept probabilities:
      - CMP_method = (1+1+0.5+1+1)/5 = 0.9
      - polish_pad = 0.5/5 = 0.1
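
The arithmetic above can be read as averaging per-lemma concept probabilities over the five lemmas of the chunk. The sketch below follows that reading. Which lemma carries which probability is an assumption here (the per-lemma table is not in the extracted text); only the split for "polishing" comes from the earlier training example.

```python
from collections import defaultdict


def chunk_concept_probabilities(lemma_probs):
    """Average per-lemma concept probabilities over all lemmas in a chunk."""
    totals = defaultdict(float)
    for distribution in lemma_probs:
        for concept, p in distribution.items():
            totals[concept] += p
    n = len(lemma_probs)
    return {concept: total / n for concept, total in totals.items()}


# Assumed per-lemma distributions for
# "chemical mechanical polishing apparatus and method".
lemma_probs = [
    {"CMP_method": 1.0},                     # chemical
    {"CMP_method": 1.0},                     # mechanical
    {"CMP_method": 0.5, "polish_pad": 0.5},  # polishing
    {"CMP_method": 1.0},                     # apparatus
    {"CMP_method": 1.0},                     # method
]
result = chunk_concept_probabilities(lemma_probs)
print(result)
```

The two averages sum to 1, so the result can be treated as a distribution over the concepts the chunk may represent.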






System Methodology (12/26)
D.    Knowledge Extraction
   • The chunks implying concepts as predicates are the first to
     enter into the ontology (e.g., chunk5 in the figure, which
     contains concepts c1 and c2)
   • Chunks that imply concepts as the subject in the ontology are
     selected from the previous sentence to the next sentence
   • The same process is used to determine the object candidates
   • Probability: [equation shown as an image in the original slide]
   • The document is transformed into a set of statements
     (subject, predicate, object) in the ontology
   • Statements become the basis for comparing similarity between
     documents


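System Methodology (13/26)
E.    Patent Similarity Match
   • The previous step finds all statements (subject + predicate +
     object) in a document
   • The document is compared against the ontology schema to separate
     the main concepts and detail descriptions it contains
   [Figure: ontology schema annotated with subject, predicate, and object
    labels; one node is labeled both as the object of one statement and
    as the subject of the next]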



System Methodology (14/26)
E.    Patent Similarity Match
   • Similarity measure of document 1 and document 2:
      - Main concepts: TTm = 4, STm = 2, Xm = 2/4 = 0.5
      - Detail descriptions: TTd = 5, STd = 2, Xd = 2/5 = 0.4
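
The worked numbers suggest that each match score is a simple ratio X = ST / TT. The sketch below assumes that reading, taking ST as the count of items the two documents share and TT as the total count considered. The slide's own formula did not survive extraction, so both the interpretation and the function name `match_ratio` are assumptions.

```python
def match_ratio(shared, total):
    """Return shared / total, the fraction of items that match."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= shared <= total:
        raise ValueError("shared must lie between 0 and total")
    return shared / total


x_m = match_ratio(2, 4)  # main concepts: STm = 2, TTm = 4
x_d = match_ratio(2, 5)  # detail descriptions: STd = 2, TTd = 5
print(x_m, x_d)
```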


System Methodology (15/26)
E.    Patent Similarity Match
   [Table: Fuzzy Rules for Patent Document Similarity Derivation]






System Methodology (16/26)
E.   Patent Similarity Match
   • Overall Similarity Matrix (derived from Fuzzy Rules)
      - Similarity: H = High, M = Medium, L = Low
      - Rows are Detail Descriptions; columns are Main Concepts

                                     Main Concepts
      Detail Descriptions    Many match    Some match    Few match
      Many match                 H             H             M
      Some match                 H             M             L
      Few match                  M             M             L






System Methodology (17/26)
E.    Patent Similarity Match
   • Fuzzy logic representations of "many matches," "some matches,"
     and "few matches" are defined by membership functions
   [Figure: membership functions for many matches, some matches, and
    few matches]






System Methodology (18/26)
E.    Patent Similarity Match
   • The Mamdani fuzzy inference model applies legacy if-then rules to
     fuzzify the input and output, using the min-min-max operation
   • All 9 rules (Table IV) are considered simultaneously
   • Steps:
      1) Calculate the similarity of the documents matched in main concepts (Xmc)
         and the similarity of the documents matched in detailed descriptions (Xdd)
      2) Evaluate Xmc and Xdd using the rules to derive the corresponding
         memberships
      3) Compare the memberships and select the minimum membership from these
         two sets to represent the membership of the corresponding concept (high
         similarity, medium similarity, and low similarity) for each rule
      4) Collect memberships which represent the same concept in one set
      5) Derive the maximum membership for each set, and compute the final
         inference result
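
The steps above can be sketched as a min-max pass over the nine rules. The rule table is taken from the overall similarity matrix on slide 16/26 (rows are detail descriptions, columns are main concepts). The membership functions, however, are shown only as a figure in the original, so the piecewise-linear shapes below are purely hypothetical stand-ins, and the resulting numbers should not be read as the paper's.

```python
def clamp01(value):
    """Clip a number to the interval [0, 1]."""
    return max(0.0, min(1.0, value))


# Hypothetical membership functions on [0, 1] (assumed shapes).
MEMBERSHIP = {
    "few": lambda x: clamp01((0.5 - x) / 0.5),
    "some": lambda x: clamp01(1.0 - abs(x - 0.5) / 0.5),
    "many": lambda x: clamp01((x - 0.5) / 0.5),
}

# (detail descriptions, main concepts) -> overall similarity, per slide 16/26.
RULES = {
    ("many", "many"): "H", ("many", "some"): "H", ("many", "few"): "M",
    ("some", "many"): "H", ("some", "some"): "M", ("some", "few"): "L",
    ("few", "many"): "M", ("few", "some"): "M", ("few", "few"): "L",
}


def infer(x_mc, x_dd):
    """Return the membership of H, M, and L for inputs (Xmc, Xdd)."""
    mu_main = {label: f(x_mc) for label, f in MEMBERSHIP.items()}
    mu_detail = {label: f(x_dd) for label, f in MEMBERSHIP.items()}
    result = {"H": 0.0, "M": 0.0, "L": 0.0}
    for (detail, main), output in RULES.items():
        strength = min(mu_detail[detail], mu_main[main])  # step 3: minimum
        result[output] = max(result[output], strength)    # steps 4-5: maximum
    return result


print(infer(0.6, 0.6))
```

Under these assumed shapes the input (0.6, 0.6) is mostly "some match" on both axes, so the Medium rule fires most strongly; different membership functions would shift that balance.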


System Methodology (19/26)
E.   Patent Similarity Match
   [Figure: procedure for calculating the membership of "High Similarity"
    given the input (Xmc, Xdd) = (0.6, 0.6)]






System Methodology (20/26)
E.   Patent Similarity Match
   [Figure: procedure for calculating the membership of "Medium Similarity"
    given the input (Xmc, Xdd) = (0.6, 0.6)]






System Methodology (21/26)
E.   Patent Similarity Match
   [Figure: procedure for calculating the membership of "Low Similarity"
    given the input (Xmc, Xdd) = (0.6, 0.6)]






System Methodology (22/26)
F.   Defuzzification and Patent Clustering
   • Decide which similarity ("High Similarity," "Medium Similarity,"
     or "Low Similarity") best represents the relationship between
     the two documents
   • Defuzzification focuses on transforming the similarity memberships
     into a single value (see the example on the next slide)


System Methodology (23/26)






System Methodology (24/26)
F.    Defuzzification and Patent Clustering
   • After all measures of similarity between patents are calculated,
     the similarity matrix is generated






System Methodology (25/26)
F.    Defuzzification and Patent Clustering
   • Agglomerative hierarchical clustering algorithm; proximity is
     computed with the average-linkage method
   • Steps:
      1) Find the max(rij) in the matrix, and group the documents i
         and j into a new cluster.
      2) Calculate the relationship between the new cluster and
         other documents by using the average-linkage method.
      3) Go to Step 1), until there is only one cluster left.
   • The elements rij of the document similarity matrix lie between 0
     and 1, and more similar documents have larger values, which is the
     opposite of a distance. Replacing each element with dij = (1 - rij)
     converts similarities to distances, so that bottom-up hierarchical
     clustering can be applied in terms of distance
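
A minimal pure-Python sketch of the three steps above, assuming the average linkage between two clusters is the mean of the pairwise similarities rij of their member documents. The 4x4 similarity matrix is made-up sample data, and the function names are hypothetical.

```python
from itertools import combinations


def average_linkage(sim, a, b):
    """Mean pairwise similarity between the documents of clusters a and b."""
    return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))


def agglomerate(sim):
    """Merge clusters until one is left; return (cluster_a, cluster_b, similarity)."""
    clusters = [(i,) for i in range(len(sim))]
    merges = []
    while len(clusters) > 1:
        # Step 1: pick the pair of clusters with the highest similarity.
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: average_linkage(sim, *pair))
        merges.append((a, b, average_linkage(sim, a, b)))
        # Step 2: replace the pair with the merged cluster; linkage to the
        # remaining clusters is recomputed from sim on the next pass.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Made-up symmetric similarity matrix for four documents.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
merges = agglomerate(sim)
heights = [1 - s for _, _, s in merges]  # dij = 1 - rij, as on the slide
print(merges)
```

The `heights` list gives the merge levels on the distance scale, which is what a dendrogram would plot on its vertical axis.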


System Methodology (26/26)
F.    Defuzzification and Patent Clustering
   [Figure: dendrogram of six documents in the order 3, 6, 4, 1, 2, 5 on the
    horizontal axis (documents); vertical axis dij from 0 to 0.25]




Case Examples and Experiment (1/15)
   • Case 1: CMP patents
      - 50 CMP patent documents were collected and downloaded
        from the World Intellectual Property Organization (WIPO)
        patent pool as the training documents for ontology building
        and terminology training
      - An additional 50 CMP patent documents were collected from
        the WIPO as the test set
         - Type 1 focuses on the mechanical aspects of CMP machines
         - Type 2 considers new chemical compositions for the polishing
           slurry
         - Type 3 covers innovative cleaning methods for CMP machinery





Case Examples and Experiment (2/15)
   • Case 1: CMP patents
   [Figure: Ontology Schema]




Case Examples and Experiment (3/15)
   • Case 1: CMP patents
   • Results (number of documents per cluster):

                          Cluster 1       Cluster 2         Cluster 3
                          (Mechanical     (Composition      (Clean
                          design)         of Slurry)        Method)
      Actual                 20              15                15
      FODC                   18              18                14
      K-means + TF*IDF       15              21                14






Case Examples and Experiment (4/15)
   • Case 1: CMP patents
   • The higher degree of error in clustering with key-phrase-based
     K-means is due to the inclusion of insignificant key phrases
     (e.g., structure, method, substrate, and wafer) in the basis of
     the clustering criteria.
   • These key phrases often appear in CMP patent documents. For
     example, some CMP patents belonging to mechanical control
     (Type 1) may contain less significant key phrases which cause
     the K-means approach to place the patents into the wrong
     clusters.




Case Examples and Experiment (5/15)
   • Case 2: Patent news
      - 100 patent news-related documents were collected and
        downloaded as the training set for ontology building and
        terminology training
      - An additional 100 documents were collected as the test set
         - Type 1 focuses on patent infringement
         - Type 2 covers patent trade
         - Type 3 concerns the application of new patents to make products






Case Examples and Experiment (6/15)
   • Case 2: Patent news
   [Figure: Ontology Schema]



Case Examples and Experiment (7/15)
   • Case 2: Patent news
   • Results (number of documents per cluster):

                          Type 1           Type 2          Type 3
                          (patent          (patent         (new patents to
                          infringement)    trade)          make products)
      Actual                 51              18               31
      FODC                   49              16               35
      K-means + TF*IDF       45              23               32




Case Examples and Experiment (8/15)
   • Case 3: RFID patents
      - 100 RFID-related patents were collected and analyzed as the
        training set for ontology building and terminology training
      - An additional 100 documents were collected as the test set
         - Type 1 focuses on data detecting and data presenting
         - Type 2 covers digital data processing
         - Type 3 concerns the application of signal devices and
           interaction devices






Case Examples and Experiment (9/15)
   • Case 3: RFID patents
   [Figure: Ontology Schema]



Case Examples and Experiment (10/15)
   • Case 3: RFID patents
   • Results (number of documents per cluster):

                          Type 1             Type 2           Type 3
                          (data detecting    (digital data    (signal devices
                          and data           processing)      and interaction
                          presenting)                         devices)
      Actual                 40                30                30
      FODC                   37                35                28
      K-means + TF*IDF       30                43                27




Case Examples and Experiment (11/15)
   The following measures are used to compare the clustering performance of
    the FODC and K-means approaches:
     Precision and recall
     F-measure
   Shannon’s entropy measures the clustering capability (smaller is better)
     P_ij is the probability that a member of cluster j belongs to class i
     n_j is the size of cluster j, m is the number of clusters,
      and n is the total number of patent documents
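
A minimal sketch of how these measures can be computed, assuming the standard textbook definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), F-measure = 2PR / (P + R), and total entropy E = sum over clusters j of (n_j / n) * E_j, where E_j = -sum over classes i of P_ij * log P_ij. These forms agree with the symbols defined on this slide, but whether the paper uses exactly them (for example, the logarithm base, taken here as 2) is an assumption. The function names and the sample counts are hypothetical and are not results from the paper.

```python
import math


def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) for one class.

    tp: documents correctly assigned to the class
    fp: documents wrongly assigned to the class
    fn: documents of the class assigned elsewhere
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def clustering_entropy(contingency):
    """Size-weighted Shannon entropy of a clustering (smaller is better).

    contingency[j][i] is the number of documents in cluster j whose
    true class is i. Base-2 logarithm is an assumption.
    """
    n = sum(sum(row) for row in contingency)  # total number of documents
    total = 0.0
    for row in contingency:
        n_j = sum(row)  # size of cluster j
        if n_j == 0:
            continue
        # P_ij = count / n_j; empty cells contribute nothing to the sum
        e_j = -sum((c / n_j) * math.log2(c / n_j) for c in row if c > 0)
        total += (n_j / n) * e_j
    return total


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    print(precision_recall_f(tp=8, fp=2, fn=2))
    print(clustering_entropy([[9, 1], [2, 8]]))
```

A clustering in which every cluster contains documents of a single class has entropy 0, while a cluster that mixes two classes evenly contributes one full bit.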



Case Examples and Experiment (12/15)
   Case 1-CMP case results comparing this research and the
    K-means approach






Case Examples and Experiment (13/15)
   Case 2-Patent news case results comparing this research and
    the K-means approach






Case Examples and Experiment (14/15)
   Case 3-RFID case results comparing this research and the
    K-means approach






Case Examples and Experiment (15/15)
   The differences between the FODC and the K-means
    approach are summarized






Conclusion
   We analyze the grammar of the sentences and derive the
    ontology of documents. Then, the relationships between
    documents are inferred, and the document similarities and
    differences are compared
   A fuzzy ontology-based methodology for clustering
    knowledge documents (the FODC methodology) is
    presented and compared to the frequently used key-phrase
    K-means approach
   The benchmarking results demonstrate that the FODC
    approach outperforms the K-means clustering approach
    and provides R&D managers with a new and beneficial
    approach for IP and innovation management.


Thank you for listening!


 Q&A


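Appendix
   From corpus statistics, the bigram probabilities of part-of-speech
    sequences are represented as a Markov chain:

   [Figure: Markov chain over the states Φ (start), ART (article), N (noun),
    V (verb), and P (preposition); edge labels: .71, .29, 1, .43, .13, .44,
    .65, .35, .74, .26]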
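
As a hedged illustration of how such a bigram Markov chain is used, the probability of a part-of-speech sequence is the product of the transition probabilities along it. The figure's layout is not recoverable from the text, so the mapping of values to edges below is an assumption, chosen so that the outgoing probabilities of every state sum to 1. The names `BIGRAM` and `sequence_probability` are hypothetical.

```python
# Assumed edge assignment; only the numeric values come from the figure.
BIGRAM = {
    "START": {"ART": 0.71, "N": 0.29},       # Φ, the start state
    "ART": {"N": 1.0},                        # article
    "N": {"V": 0.43, "N": 0.13, "P": 0.44},   # noun
    "V": {"ART": 0.65, "N": 0.35},            # verb
    "P": {"ART": 0.74, "N": 0.26},            # preposition
}


def sequence_probability(tags):
    """Probability of a tag sequence under the bigram chain.

    Transitions that are absent from the chain have probability 0.
    """
    prob = 1.0
    prev = "START"
    for tag in tags:
        prob *= BIGRAM.get(prev, {}).get(tag, 0.0)
        prev = tag
    return prob


if __name__ == "__main__":
    # 0.71 * 1.0 * 0.43 * 0.35 for article, noun, verb, noun
    print(sequence_probability(["ART", "N", "V", "N"]))
```

Under this assumed chain, a sequence that starts with a verb has probability 0, because there is no edge from the start state to V.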



