

   Finding High-frequent Synonyms of a Domain-specific Verb
   in English Sub-language of MEDLINE Abstracts Using WordNet

            Chun Xiao and Dietmar Rösner
                 Institut für Wissens-
            und Sprachverarbeitung (IWS),
             Faculty of Computer Science,
              University of Magdeburg,
             39016 Magdeburg, Germany
          Introduction — MEDLINE Abstract

   – Domain: clinical medicine, biomedicine, biological and
      physical sciences;
   – Source: articles from over 4,600 journals published
      throughout the world;
   – Coverage: abstracts are included for about 52% of the
• PubMed®, an application of UMLS (Unified Medical
  Language System), provides links within MEDLINE® to the
  full text of 15 clinical medical journals.
   – Available at: http://www.ncbi.nlm.nih.gov/PubMed/
        Available Resources in the
• The test corpus consists of 800 MEDLINE
  abstracts extracted from the GENIA Corpus
  V3.0p and V3.01.
  - Available at: http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/

• WordNet 1.7.1
        Extraction of a Specific Relation
•   Inhibitory relation
    –   Example: Secreted from activated T cells and
        macrophages, bone marrow-derived MIP-1
        alpha/GOS19 inhibits primitive hematopoietic stem
        cells and appears to be involved in the homeostatic
        control of stem cell proliferation.
•   Semantic annotations in the GENIA corpus:
     protein_molecule
     cell_type
High-frequent Verbs in the Test Corpus

  Synonym Sets (Synsets) of Verb inhibit
• Synset in WordNet
  Sense 1
  suppress, stamp down, inhibit, subdue, conquer, curb
      => control, hold in, hold, contain, check, curb,
  Sense 2
      => restrict, restrain, trammel, limit, bound, confine,
• Synset in test corpus of MEDLINE abstracts
  inhibit, block, prevent, etc.

• Occurrences of verbs in the two synsets in the test
  corpus of MEDLINE abstracts
   – WN synonyms: suppress (69), limit (16), restrict (5)
   – non-WN synonyms: block (124), reduce (119), prevent (53)

• How can WordNet synsets and information from the
  corpus be combined to create domain-specific verb

              Three Definitions
• Language unit: a text segment (a sentence,
  several sentences, a paragraph, etc.) that
  expresses one semantic topic.
• Core word: the verb whose synset in the
  test corpus is to be found. E.g., in this test,
  inhibit is the core word.
• Keyword: a word whose corresponding
  verb base form is the core word. E.g., in this test,
  inhibitor, inhibiting, and so on are keywords.

    We performed an analysis of the mechanisms by which two
    PKC inhibitors, Calphostin C and Staurosporine, prevent
    the FN-induced IL-1beta response. Both inhibitors blocked
    the secretion of IL-1beta protein into the media of
    peripheral blood mononuclear cells exposed to FN.
•   Language unit: two sentences
•   Core word: inhibit
•   Keyword: inhibitor (2 times)
•   Local context: search window size >= 3
•   Verbs around the first keyword: perform, prevent, block, expose
•   Verbs around the second keyword: prevent, perform, block, expose
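
The example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `verbs_around_keywords`, the pre-lemmatized `(lemma, is_verb)` token format, and the reading of the window size as "the nearest verbs on each side of a keyword" are all hypothetical and not taken from the slides.

```python
def verbs_around_keywords(tokens, keywords, window):
    """Collect verb lemmas near each keyword occurrence.

    tokens   -- list of (lemma, is_verb) pairs for one language unit
    keywords -- set of keyword lemmas (e.g. {"inhibitor"})
    window   -- assumed meaning: number of nearest verbs taken on each
                side of a keyword (must be >= 1)
    """
    # Positions of all verbs that are not themselves keywords.
    verb_idx = [i for i, (lemma, is_verb) in enumerate(tokens)
                if is_verb and lemma not in keywords]
    contexts = []
    for k, (lemma, _) in enumerate(tokens):
        if lemma not in keywords:
            continue
        left = [tokens[i][0] for i in verb_idx if i < k][-window:]
        right = [tokens[i][0] for i in verb_idx if i > k][:window]
        # Nearest verb first on the left side, then the right side.
        contexts.append(left[::-1] + right)
    return contexts


# Hand-reduced version of the two example sentences (lemmas only).
unit = [
    ("perform", True), ("analysis", False), ("inhibitor", False),
    ("prevent", True), ("response", False), ("inhibitor", False),
    ("block", True), ("secretion", False), ("expose", True),
]
around = verbs_around_keywords(unit, {"inhibitor"}, window=3)
```

With a window of 3, both keyword occurrences reach all four verbs of the language unit, which matches the lists given above; a smaller window drops the more distant verbs.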
   In the following test, the language unit is selected to be the whole
              Idea Description
• Assumption:
   The synonyms of a verb co-occur much more frequently
   together with the keywords of the verb than together
   with other words in the language unit.
• Method:
    The verb chunks around the keywords are collected;
    from these, the synonyms of the core word are
    selected and filtered using WordNet synset information.
  - One resource:
  WordNet synset information
  - The other resource:
  Local context information in the test corpus
Distribution of Keywords of inhibit in the Test Corpus
Verbs around the Keywords in the Test Corpus
           Method Description I
• Expansion of WordNet Synsets (Si)
  – S1 : the verb collection of synonyms of all synonyms of
    the core word;
  – S2 : the verb collection of synonyms of all verbs in S1;
  – …
• Expansion of Stoplist (STOPk)
  – STOP0: 15 stop-verbs manually selected from the
    high-frequent verbs in the test corpus (e.g., suggest,
    indicate), including the high-frequent antonyms of the
    core word;
  – STOP1: the verb collection of synonyms of all verbs in
  – …
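
The stepwise expansion can be sketched as repeated lookups in a synonym map. The sketch below makes several assumptions that the slides do not state: S0 is taken to be the direct synonyms of the core word, each level is cumulative (it keeps the verbs of the previous level), and `TOY_SYNONYMS` is a small made-up map used for illustration only, not actual WordNet 1.7.1 content. The same hypothetical helper `expand` could be applied to a stoplist.

```python
def expand(seed, synonyms, levels):
    """Expand a verb set by `levels` rounds of synonym lookup.

    seed     -- starting set of verbs (S0 or STOP0)
    synonyms -- dict mapping a verb to the set of its synonyms
    levels   -- number of expansion rounds (0 returns the seed)
    """
    current = set(seed)
    for _ in range(levels):
        expanded = set(current)  # cumulative: keep earlier verbs (assumption)
        for verb in current:
            expanded |= synonyms.get(verb, set())
        current = expanded
    return current


# Made-up synonym map for illustration only (not real WordNet data).
TOY_SYNONYMS = {
    "inhibit": {"suppress", "restrict"},
    "suppress": {"inhibit", "curb"},
    "restrict": {"limit"},
    "curb": {"check"},
}

s0 = TOY_SYNONYMS["inhibit"]           # direct synonyms of the core word
s1 = expand(s0, TOY_SYNONYMS, levels=1)
s2 = expand(s0, TOY_SYNONYMS, levels=2)
```

Each extra level can only add verbs, so the sets grow quickly on a real lexicon; this is why the number of expansion levels has to be limited.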
        Method Description II
• Verb list from the corpus (Vj)
    Verbs around the keywords in a local context of
    searching window size of j are collected.
• Synonym candidate list (Sg)
    If a verb is in Vj and also in Si, but not in
    STOPk, then add it to Sg.
• Gold standard list (SG)
  – A manually created synonym list, which is extracted
    from the test corpus.
  – Consists of the 10 verbs with the most frequent occurrences,
    of which 3 verbs come directly from the WordNet
    synset of "inhibit"; the remaining 7 verbs come from its
    hypernym set or the expanded list of its synonyms.
• Recall & Precision


 60% recall of SG <=> 93.05% of the occurrences in the test corpus
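
The candidate filter and its evaluation reduce to set operations. The following is a minimal sketch: the function names are hypothetical, all verb sets are toy data rather than the lists used in the experiment, and recall and precision are computed with the standard set-based definitions, which the slides do not spell out.

```python
def candidates(v_j, s_i, stop_k):
    """Sg: verbs found near keywords (Vj) that are also in the
    expanded synset (Si) but not in the stoplist (STOPk)."""
    return (set(v_j) & set(s_i)) - set(stop_k)


def recall_precision(found, gold):
    """Standard set-based recall and precision of `found` against `gold`."""
    found, gold = set(found), set(gold)
    hits = found & gold
    recall = len(hits) / len(gold) if gold else 0.0
    precision = len(hits) / len(found) if found else 0.0
    return recall, precision


# Toy sets for illustration only (not the experimental data).
v_j = {"block", "prevent", "suggest", "suppress", "show", "limit"}
s_i = {"suppress", "limit", "block", "prevent", "suggest", "curb"}
stop_k = {"suggest", "show"}
gold = {"suppress", "limit", "restrict", "block", "reduce"}

s_g = candidates(v_j, s_i, stop_k)
recall, precision = recall_precision(s_g, gold)
```

In this toy run, a larger Si can only add candidates and a larger STOPk can only remove them, which is the recall/precision trade-off that the expansion levels control.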
         Conclusions and Future Work
• Conclusions
  –   English sublanguage of MEDLINE abstracts;
  –   The core word and its keywords were high-frequent;
  –   Multiword verb structures have not been considered yet;
  –   Balance between recall and precision: expansion of Si and
      STOPk should be limited.
• Future work
  –   Consideration of other WordNet information besides synsets;
  –   Automatic creation of stoplists;
  –   Extraction of multiword verb structures;
  –   Utilization of syntactic information.
Looking forward to your questions!
                Possible Errors
• POS tagging errors between
   adjectives <=> past participles

• Errors in the manual selection of stop-verbs

          Question or Hope

Can WordNet provide access to
multiword expressions?

