計算語言學: 理論,技術,與應用
(Computational Linguistics: Theory, Techniques, and Applications)
             Summarization Technologies and Evaluation
                     自動摘要技術與評估

                                Hsin-Hsi Chen
          Department of Computer Science and Information Engineering
                          National Taiwan University
                                   陳信希
                         國立台灣大學資訊工程學系




                      Outline
• Introduction
• Architecture of a Summarization System
• An Evaluation Model using Question
  Answering System
• Multilingual News Summarizer
• Conclusion

                      Introduction




  Why is Summarization Needed?
• Owing to the widespread use of the Internet, a
  large volume of multilingual information can be
  obtained
• Summarization can eliminate some of the
  bottlenecks on the information highway
• Issues
     – How to absorb and employ information effectively
     – How to tackle the problem of multilingual document
       clustering
          Where is Summarization?
•   Headline
•   Table of contents
•   Preview
•   Digest
•   Highlights
•   Abstract
•   Bulletin
•   Biography
•   ...
     What is Text Summarization?
• The process of distilling the most important
  information from a source (or sources) to produce
  an abridged version for a particular user (or users)
  and task (or tasks)
• Extract vs. Abstract
     – An extract is a summary consisting entirely of material
       copied from the input
     – An abstract is a summary at least some of whose
       material is not present in the input, e.g., subject
       categories, paraphrase of content, etc.

     Characteristics of Summaries
• Reduction of information content
   – Compression Rates
   – Target Length
• Informativeness
   – Fidelity to Source
   – Relevance to User’s Interest
• Well-formedness
   – Extracts: need to avoid gaps, dangling anaphora, etc.
   – Abstracts: need to produce grammatical, plausible output

          Evaluation of Summaries
• Intrinsic methods test the system in itself
     – Criteria
          • Coherence
          • Informativeness
     – Methods
          • Comparison against reference output
          • Comparison against summary input
• Extrinsic methods test the system in relation to
  some other task
      – Time to perform tasks, accuracy of tasks, ease of use
     – Expert assessment of usefulness in task
       Summarization Approaches
• Research on text summarization began very
  early (Luhn, 1958; Edmundson, 1964, 1969)
• Single Document Summarization
     – Chen, Lin, Huang, and Chen, 1998; Kupiec, Pedersen,
       and Chen, 1995; Lin and Hovy, 1997; Brunn, Chali, and
       Pinchak, 2001, etc.
• Multiple Document Summarization
     – Chen and Huang, 1999; McKeown and Radev, 1995;
       Mani and Bloedorn, 1997; Radev and McKeown, 1998;
       Lin and Hovy, 2002, etc.
• SUMMAC-1 (1998) evaluation tasks
• Document Understanding Conference (2000)
              SUMMAC Evaluation
• Extrinsic measures
     – Those that ignore the content of the summary and assess it solely
       according to how useful it is in enabling an agent to perform some
       measurable task
          • Ad Hoc Task
               – Support a relevance judgment
          • Categorization Task
               – Support a categorization judgment

• Intrinsic measures
     – Those that examine the content of the summary and attempt to pass
       some judgment on it directly
          • Question-Answering Task
          • Acceptability Task

                          Issues
• In multi-document summarization
      – Deciding which documents deal with the same
        topic and which sentences touch on the same event is
        indispensable
      – How to measure similarity at different levels (i.e.,
        words, sentences, and documents) is an open issue
• In multilingual multi-document summarization
      – Because of multilinguality, the similarity of concepts,
        themes, and topics must be measured across
        different languages
• Because human assessors are involved, large-scale
  evaluation is nearly impossible
               Architecture of
           a Summarization System




                 System Architecture

   [Figure: overall system architecture]
                  A News Clusterer
• The tasks for the clusterer are listed below
     – Employing a segmentation system to identify Chinese
       words
     – Extracting named entities such as people, places,
       organizations, and time, date, and monetary expressions
     – Applying a tagger to determine the part of speech for
       each word
     – Clustering the news stream based on the named entities
       and other signatures, such as speech-act and locative
       verbs
                A News Summarizer
• The tasks for the news summarizer are
  shown as follows:
     – Partitioning a Chinese text into several
       meaningful units (MUs)
           • Chinese writers often assign punctuation marks rather
             freely, so sentence boundaries are not clear-cut; MUs
             are therefore used for clustering instead of sentences
           • An MU composed of several sentence segments
             denotes a complete meaning

    – Three kinds of linguistic knowledge are used to identify
      the MUs
         • Punctuation marks (Yang, 1984)
         • Linking elements (Li and Thompson, 1981)
               –因為天氣不好,飛機改在明天起飛。
                 (Because the weather was bad, the flight was rescheduled for tomorrow.)
               –我想早一點來,可是我沒趕上公車。
                 (I wanted to come earlier, but I missed the bus.)
               –他一邊走路,一邊唱歌。
                 (He sang while he walked.)
         • Topic chains (Chen, 1994)
               –國民黨是靠組織起家的政黨,現在的組織體質卻很虛弱,所
                以選戰最後也要仰仗文宣,實在很可惜。
                 (The KMT is a party built on organization, yet its organization
                 is now very weak, so the campaign must ultimately rely on
                 publicity as well, which is a real pity.)

• For example:
     (A) 儘管警方大肆鎮壓與逮捕,無數反對自由貿易的
       示威群眾今天仍繼續向西雅圖市中心前進,他們發
       動和平集會,以抗議世界各國貿易部長即將在西雅
       圖舉行討論全球貿易自由化的會議。
       (Despite massive police suppression and arrests, countless
       anti-free-trade demonstrators continued to march toward downtown
       Seattle today; they staged peaceful rallies to protest the meeting
       on global trade liberalization that the world's trade ministers
       are about to hold in Seattle.)
     =>
     (A1) 儘管警方大肆鎮壓與逮捕,無數反對自由貿易的
       示威群眾今天仍繼續向西雅圖市中心前進
     (A2) 他們發動和平集會,以抗議世界各國貿易部長即
       將在西雅圖舉行討論全球貿易自由化的會議
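
A minimal sketch of this partitioning step (the splitting heuristics and
linking-element lists below are illustrative assumptions, not the authors'
exact rules):

```python
import re

# Forward-linking elements tie a segment to the NEXT one (e.g., 儘管, 因為);
# backward-linking elements tie a segment to the PREVIOUS one (e.g., 所以, 以).
# Real lists would be larger, and one-character cues such as 以 are too greedy
# for production use.
FORWARD = ("因為", "儘管", "雖然", "如果")
BACKWARD = ("所以", "可是", "但是", "以", "一邊", "卻")

def partition_mus(text):
    """Split a Chinese passage into meaningful units (MUs)."""
    mus = []
    for sent in (s for s in re.split(r"[。!?!?]", text) if s):
        segs = [g for g in re.split(r"[,,;;]", sent) if g]
        cur = ""
        for i, seg in enumerate(segs):
            if not cur:
                cur = seg
            elif seg.startswith(BACKWARD) or segs[i - 1].startswith(FORWARD):
                cur += "," + seg          # same meaningful unit
            else:
                mus.append(cur)            # close the current MU
                cur = seg
        if cur:
            mus.append(cur)
    return mus
```

Applied to example (A) above, the 儘管 clause links forward and the 以 clause
links backward, yielding the two MUs (A1) and (A2).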
 • Linking meaningful units that denote the same
   thing across different news reports
      – The similarity of two MUs is measured in terms of
        noun similarity and verb similarity
                 noun-sim(A, B) = m / (a + b)

                 verb-sim(A, B) = n / (c + d)

                 m (n): the number of matched nouns (verbs)
                 a, b: the total numbers of nouns in MUs A and B
                 c, d: the total numbers of verbs in MUs A and B
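
A sketch of this measure, assuming the reconstructed m/(a+b) normalization
and MUs given as POS-tagged (word, tag) pairs whose tags begin with "N" for
nouns and "V" for verbs:

```python
def words_with_tag(mu, prefix):
    return [w for w, tag in mu if tag.startswith(prefix)]

def count_matches(xs, ys):
    # each term may be matched at most once (cf. strategy S4 below)
    ys = list(ys)
    m = 0
    for x in xs:
        if x in ys:
            ys.remove(x)
            m += 1
    return m

def mu_similarity(mu_a, mu_b):
    na, nb = words_with_tag(mu_a, "N"), words_with_tag(mu_b, "N")
    va, vb = words_with_tag(mu_a, "V"), words_with_tag(mu_b, "V")
    noun_sim = count_matches(na, nb) / max(1, len(na) + len(nb))
    verb_sim = count_matches(va, vb) / max(1, len(va) + len(vb))
    return noun_sim, verb_sim
```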
                    A News Summarizer
• Several strategies in the similarity model
   (S1) Nouns in one MU are matched to nouns in another MU, and so are the verbs.
   (S2) The operations in (S1) are exact matches.
   (S3) A Chinese thesaurus is employed during matching. That is, the operations
        in (S1) may be relaxed to inexact matches.
   (S4) Each term specified in (S1) is matched only once.
   (S5) The order of nouns and verbs in an MU is not taken into consideration.
   (S6) The order of nouns and verbs in an MU is critical, but it is relaxed within a
        window (see the sketch below).
   (S7) When continuous terms are matched, an extra score is added to the similarity
        measure.
   (S8) When the objects of transitive verbs are not matched, a score is subtracted
        from the similarity measure.
   (S9) When date/time expressions and monetary and percentage expressions are
        matched, an extra score is added to the similarity measure.
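
For illustration, strategies (S4) and (S6) might combine as in the sketch
below; the window size is an assumption:

```python
WINDOW = 3  # assumed window size; the slides do not specify one

def windowed_matches(terms_a, terms_b, window=WINDOW):
    # Order-sensitive matching relaxed within a window (S6), where each
    # position in terms_b may be occupied at most once (S4).
    used = set()
    matched = 0
    for i, t in enumerate(terms_a):
        for j, u in enumerate(terms_b):
            if j not in used and t == u and abs(i - j) <= window:
                used.add(j)
                matched += 1
                break
    return matched
```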

   • Displaying the summarization results in two
     kinds of modes:
        – Focusing summarization
             • A sequence of news articles presented with information decay
        – Browsing summarization
             • MUs that are reported more than twice are selected
             • For each set of similar MUs, only the longest
               sentence is used in the summary

          Browsing Summarization

     [Screenshots: browsing-model displays of the first and third
      articles (images not recoverable)]
                      Experiment
   • Preparation of Test Corpus
        – Nine events, which occurred between 1998/11/7
          and 1998/12/8, were manually selected from
          Central Daily News, China Daily Newspaper,
          China Times Interactive, and FTV News online in
          Taiwan
        – Each event was composed of more than two
          articles, which were reported on the same day

        – (1) 社會役的實施 (military service): 6 articles
        – (2) 老丙建建築 (construction permit): 4 articles
        – (3) 三芝鄉土石流 (landslide in Shan Jr): 6 articles
        – (4) 總統布希之子 (Bush's sons): 4 articles
        – (5) 芭比絲颱風侵台 (Typhoon Babis): 3 articles
        – (6) 股市穩定基金 (stabilization fund): 5 articles
        – (7) 國父墨寶失竊案 (theft of Dr Sun Yat-sen's calligraphy): 3
          articles
        – (8) 央行調降利率 (interest rate of the Central Bank): 3 articles
        – (9) 內閣總辭問題 (the resignation issue of the Cabinet): 4 articles


       – An annotator reads all the news articles and connects
         the MUs that discuss the same story
 • Five models shown below are constructed under
   various combinations of the strategies specified
   above:
      –   (M1)        strategies (S1)+(S3)+(S4)+(S5)
      –   (M2)        strategies (S1)+(S3)+(S4)+(S6)
      –   (M3)        strategies (S1)+(S3)+(S4)+(S5)+(S7)+(S8)
      –   (M4)        strategies (S1)+(S3)+(S4)+(S5)+(S7)+(S8)+(S9)
      –   (M5)        strategies (S1)+(S2)+(S4)+(S5)+(S7)+(S8)+(S9)
• Performance of MU similarity

                Model                 Precision                Recall
                 M1                    0.5000                  0.5434
                 M2                    0.4871                  0.3905
                 M3                    0.5080                  0.5888
                 M4                    0.5164                  0.6198
                 M5                    0.5243                  0.5579

         The thresholds of noun similarity and verb similarity are both set to 0.3
                      Discussion
• Some issues
     – The compression rate is fixed by the system
     – The presentation order of sentences in a summary is
       based on their relative positions in the original documents
       instead of their importance
     – The voting strategy gives a shorter summary,
       which might miss unique information that is reported
       only once


          Generating Summaries with
             Informative Words
• The concepts of topic words and event
  words were applied successfully to topic
  ranking (Fukumoto and Suzuki, 2000)
• An event word associated with a story
  appears across paragraphs, but a topic word
  does not
• A topic word frequently appears across
  all the documents

 • We define words that have both high term
   frequency and high document frequency as
   informative words (salient words)
 • Sentences that contain more informative words
   will be extracted to generate summaries
 • The more informative words an MU has, the
   more important the MU is


 • The score function for deciding the
   informative words:

        IW(Wid) = Ntf(Wid) × (1 + DF(Wid))

        Ntf(Wid) = (tf(Wid) − mtf_d) / (tf(Wid) + mtf_d)

        DF(Wid) = D(Wid) / N

    Ntf: normalized term frequency;  DF: document frequency;
    tf: term frequency;  mtf_d: mean term frequency in document d;
    D(Wid): number of documents containing Wid;  N: total number of documents
• Only the 10 terms with the highest IW scores
  are chosen as informative words in a
  document
• The score of each MU is the total number of
  informative words in it, and the MUs with the
  highest scores are selected (see the sketch below)
• Moreover, the selected MUs in a summary
  are arranged in descending order
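
A minimal sketch of this selection step, assuming the IW formula
reconstructed above; tokenization and POS filtering are abstracted away, and
all_docs is the event's documents as token lists:

```python
from collections import Counter

def informative_words(doc_tokens, all_docs, top_k=10):
    tf = Counter(doc_tokens)
    mtf = sum(tf.values()) / len(tf)            # mean term frequency in d
    n = len(all_docs)
    def iw(w):
        ntf = (tf[w] - mtf) / (tf[w] + mtf)     # normalized term frequency
        df = sum(w in d for d in all_docs) / n  # document frequency
        return ntf * (1 + df)                   # IW score, as reconstructed above
    return set(sorted(tf, key=iw, reverse=True)[:top_k])

def mu_score(mu_tokens, iw_set):
    # the score of an MU is the number of informative words it contains
    return sum(tok in iw_set for tok in mu_tokens)
```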

• Experiment result (QA task)

  [Table: QA-task results (not recoverable)]
• Experiment results
     – Data
          • Collected from 6 news sites in Taiwan
           • 17,877 documents (nearly 13 MB) from 1/1/2001 to
             5/1/2001
           • After clustering, there are 3,146 events
           • 12 events were selected at random for the experiment,
             and 60 questions (5 for each event) were made manually,
             with answers tied to their related documents
      – 12 members of our laboratory, all graduate students
        majoring in computer science, were selected to
        conduct the experiments below
           •   Full text (FULL)
           •   Chen and Huang's system (BASIC)
           •   Term frequency with vote strategy (TFWV)
           •   Informative words with vote strategy (PSWV)
           •   Term frequency without vote strategy (TFNV)
           •   Informative words without vote strategy (PSNV)

• Experiment results

  [Table: summary size and precision of each model (not recoverable)]
• Discussion
   – Observation
         • The summary size of TFNV and PSNV is larger than that of BASIC
           (by nearly 15%), but the precision rates of TFNV and PSNV are
           lower than that of BASIC
         • The summary size of TFWV and PSWV is smaller than that of BASIC,
           and their precision rates are still lower than that of BASIC
         • The precision rates of both TFWV and PSWV are higher than
           those of TFNV and PSNV




• Discussion
     – Due to the limitations and drawbacks of human
       assessment, the evaluation results below may be misleading
          • Because human assessors have different backgrounds, the
            evaluation cannot be fully objective
          • Fatigue and limited working time may cause assessors to quit
            reading or to read too fast and thus miss information
          • Due to the high cost of assessors, large-scale evaluation is
            nearly impossible


      An Evaluation Model using
      Question Answering System




              Model Using Question
               Answering System

   [Figure: evaluation model using a question answering system]
• Question Answering System (Lin et al., 2001)
     – Three major modules
          • Preprocessing the question sentences
               – Part-of-speech tagging, stop-word removal
               – Canonical form transformation and keyword expansion
          • Retrieving the documents containing answers (sketched below)

                 score(D) = Σ t∈D weight(t)

          • Retrieving the sentences containing answers
               – The sentences that contain the most words of the expanded
                 question sentence are retrieved
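
A minimal sketch of the document retrieval step; the slides do not define
weight(t), so an idf-style weight is assumed here:

```python
import math

def make_weights(docs):
    # docs: list of term sets; weight(t) is assumed to be idf-like
    n = len(docs)
    vocab = set().union(*docs)
    return {t: math.log(n / sum(t in d for d in docs)) for t in vocab}

def score(doc, question_terms, weight):
    # score(D) = sum of weight(t) over expanded question terms t found in D
    return sum(weight.get(t, 0.0) for t in question_terms if t in doc)

def retrieve(docs, question_terms, k=5):
    weight = make_weights(docs)
    return sorted(docs, key=lambda d: score(d, question_terms, weight),
                  reverse=True)[:k]
```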




                      [Tables 2 and 3: evaluation results of each model
                       (values not recoverable)]

                      MRR: Mean Reciprocal Rank (Voorhees, 2000)
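
MRR averages, over all questions, the reciprocal rank of the first correct
answer (0 when none is returned); a small sketch with assumed dict-based
inputs:

```python
def mean_reciprocal_rank(ranked_answers, correct):
    # ranked_answers: {question_id: [answers, best first]}
    # correct: {question_id: set of acceptable answers}
    total = 0.0
    for qid, answers in ranked_answers.items():
        for rank, answer in enumerate(answers, start=1):
            if answer in correct[qid]:
                total += 1.0 / rank
                break
    return total / len(ranked_answers)
```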


                      Discussion
• The difference between Table 2 and Table 3
     – The QA_MRR values of TFNV and PSNV are larger than
       those of the corresponding TFWV and PSWV
     – The QA_MRR values of PSWV and PSNV are larger than
       those of the corresponding TFWV and TFNV
     – Comparing the precision of the QA task with the
       corresponding precision of the best-5 strategy, the Q&A
       system is better than the QA task (i.e., 0.576 > 0.502 and
       0.559 > 0.513, respectively)


          Experiments Using Large
           Documents and Results
• Data set
     – 140 new questions were made, and 93 of them
       have been answered
     – Sample questions

       [Figure: sample questions (not recoverable)]
• Results

                      [Table 4. Results with Large-Scale Data
                       (values not recoverable)]
• Discussion
     – As the document set grows, the QA_MRR
       values of all models decrease
     – Due to the noise in FULL, its QA_MRR drops
       drastically; however, the other models' QA_MRR
       values increase compared with Table 3
     – The QA_MRR values of TFWV, PSWV,
       TFNV and PSNV are also larger than the value
       of BASIC

     – The QA_MRR values of PSWV and PSNV are
       also larger than those of TFWV and TFNV,
       respectively
     – Since the performance of each model is
       consistent with the results shown in Table 4, it is
       feasible to introduce a Q&A system into the
       evaluation of summarization

  Multilingual News Summarizer




                      Basic Architecture

                 [Figure: processing pipeline]

                 Internet multilingual document sources
                               |
                   Document Preprocessing        (source documents)
                               |
                     Document Clustering         (documents clustered by events)
                               |
                 Document Content Analysis
                               |
                  kinds of summaries for readers

• The major issues behind the system
      – How to represent documents in different
        languages
      – How to measure the similarity among document
        representations in different languages
      – The granularity of similarity computation
      – Visualization of summarization


            Similarity Measurement
• A predicate and its surrounding arguments form
  the basic skeleton of a sentence, so verbs
  and nouns are considered the basic
  features for similarity measurement
• Relaxation with WordNet-like resources
  postulates that words in the same synset are
  similar; this facilitates inexact
  matching among documents or sentences
• The similarity of two monolingual sentences
  is defined as follows:

                 SIM(Si, Sj) = |Si ∩ Sj| / |Si ∪ Sj|

• When computing the similarity of two
  sentences in different languages, translation
  ambiguity problems surface
• Five strategies are proposed
    – Position-free (sketched after the diagram below)
         • For each word in Si, find its translation equivalents in a bilingual
           dictionary and merge all the equivalents into Si′

                 SIM(Si, Sj) = |Si′ ∩ Sj| / |Si ∪ Sj|

    – First-match-first-occupy
         • When a word in Sj is matched, it is removed from Sj and the
           score count SC is incremented by 1

                 SIM(Si, Sj) = SC / |Si ∪ Sj|

    – First-match-first-occupy and position dependent within a
      window
    – Unambiguous-word-first and position dependent within a
      window
    – Unambiguous-word-first and position dependent within a
      range

           Si          E1 E2 E3 E4 E5 E6 E7 …

           Sj          C1 C2 C3 C4 C5 C6 C7 …
                                 (window size or range)
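
A sketch of the position-free strategy, using the reconstructed
normalization above; the toy bilingual dictionary is purely illustrative:

```python
BILINGUAL_DICT = {            # assumed toy dictionary: Chinese word -> equivalents
    "颱風": ["typhoon"],
    "侵襲": ["hit", "strike"],
}

def position_free_sim(si_words, sj_words):
    si, sj = set(si_words), set(sj_words)
    si_prime = set()          # Si': all translation equivalents of words in Si
    for w in si:
        si_prime.update(BILINGUAL_DICT.get(w, []))
    overlap = len(si_prime & sj)
    # |Si ∪ Sj| is taken as |Si| + |Sj| − overlap, following the
    # reconstructed formula above
    return overlap / (len(si) + len(sj) - overlap)
```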
• Experiments
   – Test corpus
       • 81 pairs of English and Chinese news stories from the web site
         of United Daily News in Taiwan, mixed with another 80 unrelated
         news stories (40 English, 40 Chinese)
       • 43 pairs of sentences were extracted at random from the 81
         pairs
       • For each English document (sentence), find the best match
         among the remaining documents (sentences)
   – Correct rate

         CorrectRate = CorrectPairsSystemFind / TotalCorrectPairs
                         Table 1. Performance of Document Alignment
            Strategy 1       Strategy 2     Strategy 3     Strategy 4    Strategy 5
Best 1        0.951            0.839          0.506           0.320        0.320
Best 2        0.987            0.925          0.604           0.432        0.444
Best 3        1.000            0.925          0.666           0.469        0.469
Best 4        1.000            0.950          0.740           0.518        0.518
Best 5        1.000            0.975          0.740           0.530        0.530
                         Table 2. Performance of Sentence Alignment
            Strategy 1       Strategy 2    Strategy 3       Strategy 4   Strategy 5
Best 1        0.883            0.767          0.441           0.255        0.255
Best 2        0.930            0.813          0.674           0.279        0.279
Best 3        0.976            0.860          0.697           0.325        0.325
Best 4        1.000            0.930          0.790           0.372        0.372
Best 5        1.000            0.930          0.790           0.372        0.372
• Discussion
     – Strategies 1 and 2 are better than the other three
       strategies; strategy 1 is also superior to strategy 2
     – Position dependence seems not to be useful in either the
       first-match-first-occupy or the unambiguous-word-first
       model
          • The word order differs between Chinese and English
            sentences
          • Although the average number of translation equivalents per
            lexical item in our bilingual dictionary is 2.17, only 841 of the
            9,636 words in the test corpus are unambiguous


                      Event Clustering
• Three strategies
      – Translation BEFORE document clustering (one-phase)
      – Translation AFTER document clustering (two-phase; sketched below)
      – Translation DEFERRED to sentence clustering
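
A hedged sketch of the two-phase idea (translation AFTER monolingual
clustering); cluster, translate, and similarity stand in for the real
components, and the merge threshold is an assumption:

```python
def two_phase_clustering(zh_docs, en_docs, cluster, translate, similarity,
                         merge_threshold=0.5):
    zh_clusters = cluster(zh_docs)      # phase 1: monolingual clustering
    en_clusters = cluster(en_docs)
    merged = [list(c) for c in en_clusters]
    for zc in zh_clusters:              # phase 2: translate, then merge
        rep = translate(zc)             # translated cluster representative
        best = max(merged, key=lambda ec: similarity(rep, ec), default=None)
        if best is not None and similarity(rep, best) >= merge_threshold:
            best.extend(zc)             # same event across languages
        else:
            merged.append(list(zc))     # event reported in one language only
    return merged
```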
     • Experiments
             – Test corpus
                • English and Chinese news articles on May 8, 2001
                  from the news sites in Taiwan
             – Manual clustering result
                  Articles    Clusters    Clusters with 1 article    Clusters with >1 article

   Chinese          360          265               230                       35
   English           91           75                65                       10
     CE             460          318               276                       42


   [Figure: event clustering results (not recoverable)]
 • Discussion
      – Because the document features differ (e.g., the
        number of documents), three different thresholds
        are used in the two-phase scheme
      – The performance of the two-phase scheme is better
        than that of the one-phase scheme; the major
        reason is that translation is performed after
        monolingual clustering

                 Sentence Clustering

   [Figure: sentence clustering (not recoverable)]
     • Three alternatives
          – Complete link using all sentences
          – Complete link within a cluster
          – Subsumption-based clustering
               • A total of 25 words with the highest document frequency in a
                 cluster are considered topic words
               • The centroid sentence in each cluster is determined using the
                 following formula, where Sn, Sv, and St are the numbers of
                 nouns, verbs, and topic words in S:

                                 inf(S) = Sn + Sv + St

               • Sentence similarity is in terms of a subsumption score
                 (sketched below):

                                 SIM(Si, Sj) = |Si ∩ Sj| / min(|Si|, |Sj|)
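
A sketch of the subsumption score and a greedy grouping built on it; the
grouping procedure and its threshold are assumptions:

```python
def subsumption(si_words, sj_words):
    # overlap normalized by the SMALLER set: a sentence fully contained
    # in another one gets a score of 1.0
    si, sj = set(si_words), set(sj_words)
    return len(si & sj) / min(len(si), len(sj))

def cluster_sentences(sentences, threshold=0.5):
    groups = []
    for s in sentences:
        for g in groups:
            if subsumption(s, g[0]) >= threshold:  # compare with representative
                g.append(s)
                break
        else:
            groups.append([s])
    return groups
```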
• Experiments
    – Test corpus
         • Five events were selected from the materials used in event
           clustering, and the related sentences in those documents were
           manually clustered
                   Total          Total           Total            Total
                  Chinese        English         Chinese          English
                 Documents      Documents       Sentences        Sentences
   Event 1            4           3               69               25

   Event 2            5           2               87               39

   Event 3            5           3               92               40

   Event 4            5           2               82               16

   Event 5            2           3               23               46

   [Tables 7 and 8: sentence clustering performance (values not recoverable)]
   • Discussion
         – Tables 7 and 8 show that the performance of
           strategy 2 is better than that of strategy 1
         – Although the performance of strategy 3 is a little
           worse than that of strategy 2, its time complexity is
           much lower; if the score function can be further
           improved to obtain a more representative sentence,
           this strategy is competitive




                      Visualization
     • Focus model
           – For each set of similar sentences, only the longest
             sentence is displayed
           – The display order is determined by the relative
             position in the original news article




    • Browsing model
         – The news articles are listed by information decay and
           in chronological order
         – In a later news article, sentences that have
           higher similarity scores with sentences in earlier
           news articles are shaded
         – The reader can thus focus on novel information




                      Conclusion
• A multi-document summarization system using
  informative words is a promising approach
• According to the score of each MU, the
  summaries can be compressed to any desired
  length without losing much information
• The Q&A system can play an important role in
  conducting large-scale evaluation and makes the
  results more objective than human assessment

• We also present a multilingual multi-document
  summarization system
• For the similarity computation between sentences
  in two languages, five strategies are proposed
     – The position-free strategy is better than the position-
       dependent strategies
• For document clustering, two strategies are
  proposed
     – The two-phase strategy (translation after clustering) is
       better than the one-phase strategy (translation before
       clustering)
• For sentence clustering, three strategies are
  proposed
     – Complete link within a cluster has the best performance
     – The subsumption method has the advantage of low
       computational complexity and similar performance
• Two visualization models (focus and browsing) are
  proposed, which also consider the users'
  language preference

                      References
• Hsin-Hsi Chen, June-Jei Kuo, Sheng-Jie Huang, Chuan-Jie
  Lin and Hung-Chia Wung (2003). "A Summarization
  System for Chinese News from Multiple Sources." Journal
  of the American Society for Information Science and
  Technology.
• Hsin-Hsi Chen, June-Jei Kuo and Tsei-Chun Su (2003).
  "Clustering and Visualization in a Multi-Lingual Multi-
  Document Summarization System." Proceedings of the 25th
  European Conference on Information Retrieval Research,
  Lecture Notes in Computer Science, LNCS 2633, April 14-16,
  Pisa, Italy, 2003, 266-280.

