SIMS 290-2: Applied Natural Language Processing: Marti Hearst

					SIMS 290-2:
Applied Natural Language Processing

Marti Hearst
November 17, 2004


 Using WordNet in QA
 Other Resources in QA
 Semantic Reasoning in QA
 Definition Questions
 Complex Questions

    Question Answering
        Captures the semantics of the question by recognizing
            expected answer type (i.e., its semantic category)
            relationship between the answer type and the question
        The Q/A process:
            Question processing – Extract concepts/keywords from question
            Passage retrieval – Identify passages of text relevant to query
            Answer extraction – Extract answer words from passage
        Relies on standard IR and IE Techniques
            Proximity-based features
              – Answer often occurs in text near to question keywords
            Named-entity Recognizers
              – Categorize proper names into semantic types (persons, locations,
                organizations, etc)
              – Map semantic types to question types (“How long”, “Who”, “What

Adapted from slide by Shauna Eggers                                                3
    The Importance of NER
           The results of the past 5 TREC evaluations of QA systems indicate
           that current state-of-the-art QA is determined by the recognition
           of Named Entities
           In TREC 2003 the LCC QA system extracted 289 correct answers
           for factoid questions
           The Named Entity Recognizer was responsible for 234 of them
    QUANTITY                  55    ORGANIZATION       15   PRICE              3
    NUMBER                    45    AUTHORED WORK      11   SCIENCE NAME       2
    DATE                      35    PRODUCT            11   ACRONYM            1
    PERSON                    31    CONTINENT           5   ADDRESS            1
    COUNTRY                   21    PROVINCE            5   ALPHABET           1
    OTHER LOCATIONS           19    QUOTE               5   URI                1
    CITY                      19    UNIVERSITY          3

Adapted from slide by Harabagiu and Narayanan                                      4
    The Special Case of Names
       Questions asking for names of authored works

       1934: What is the play “West Side Story” based on?
       Answer: Romeo and Juliet

       1976: What is the motto for the Boy Scouts?
       Answer: Be prepared.

       1982: What movie won the Academy Award for best picture in 1989?
       Answer: Driving Miss Daisy

       2080: What peace treaty ended WWI?
       Answer: Versailles

       2102: What American landmark stands on Liberty Island?
       Answer: Statue of Liberty

Adapted from slide by Harabagiu and Narayanan                             5
        NE assumes all answers are named entities
             Oversimplifies the generative power of language!
             What about: “What kind of flowers did Van Gogh paint?”
        Does not account well for morphological, lexical, and
        semantic alternations
             Question terms may not exactly match answer terms;
             connections between alternations of Q and A terms
             often not documented in flat dictionary
             Example: “When was Berlin’s Brandenburger Tor
             erected?” → no guarantee to match “built”
             Recall suffers

Adapted from slide by Shauna Eggers                               6
    LCC Approach:
    WordNet to the rescue!
      WordNet can be used to inform all three steps of the
      Q/A process
       1. Answer-type recognition (Answer Type Taxonomy)
       2. Passage Retrieval (“specificity” constraints)
       3. Answer extraction (recognition of keyword alternations)
      Using WN’s lexico-semantic info: Examples
           “What kind of flowers did Van Gogh paint?”
             – Answer-type recognition: need to know (a) answer is a
               kind of flower, and (b) sense of the word flower
             – WordNet encodes 470 hyponyms of flower sense #1,
               flowers as plants
             – Nouns from retrieved passages can be searched against
               these hyponyms
           “When was Berlin’s Brandenburger Tor erected?”
             – Semantic alternation: erect is a hyponym of sense #1 of

Adapted from slide by Shauna Eggers                                      7
WN for Answer Type Recognition
 Encodes 8707 English concepts to help recognize expected answer type
 Mapping to parts of WordNet done by hand
    Can connect to Noun, Adj, and/or Verb subhierarchies
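
The hyponym lookup described for “What kind of flowers did Van Gogh paint?” can be sketched roughly as follows. The tiny `HYPONYMS` table is a hypothetical stand-in for WordNet, and the function names are illustrative rather than taken from the LCC system.

```python
# Toy stand-in for WordNet's hyponym links (hypothetical data).
HYPONYMS = {
    "flower": ["sunflower", "iris", "rose"],
    "rose": ["tea rose"],
}

def all_hyponyms(concept):
    """Collect every concept below `concept`, following links transitively."""
    found = set()
    stack = [concept]
    while stack:
        for child in HYPONYMS.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def candidates_of_type(nouns, answer_type):
    """Keep the passage nouns that are hyponyms of the expected answer type."""
    allowed = all_hyponyms(answer_type)
    return [n for n in nouns if n in allowed]

passage_nouns = ["painter", "sunflower", "vase", "tea rose"]
matches = candidates_of_type(passage_nouns, "flower")
```

Here only the nouns that fall under the answer type survive as answer candidates; a real system would also need to pick the right sense of the answer-type word first.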

     WN in Passage Retrieval
         Identify relevant passages from text
              Extract keywords from the question, and
              Pass them to the retrieval module
         “Specificity” – filtering question concepts/keywords
              Focuses search, improves performance and precision
              Question keywords can be omitted from the search if
              they are too general
              Specificity calculated by counting the hyponyms of a
              given keyword in WordNet
                – Count ignores proper names and same-headed concepts
                – Keyword is thrown out if count is above a given threshold
                  (currently 10)
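
A minimal sketch of the specificity filter, assuming hyponym counts have already been computed. The counts in `HYPONYM_COUNTS` are invented for illustration; only the threshold of 10 comes from the slide.

```python
# Hypothetical hyponym counts; a real system would count them in WordNet,
# ignoring proper names and same-headed concepts as noted above.
HYPONYM_COUNTS = {"country": 150, "capital": 4, "person": 400, "treaty": 6}

THRESHOLD = 10  # value quoted on the slide

def filter_keywords(keywords, counts=HYPONYM_COUNTS, threshold=THRESHOLD):
    """Drop keywords whose hyponym count exceeds the threshold (too general).

    Words missing from the table (for example proper names) count as 0
    and are therefore always kept.
    """
    return [k for k in keywords if counts.get(k, 0) <= threshold]

kept = filter_keywords(["Tokyo", "capital", "country"])
```

With these made-up counts, "country" is discarded as too general while "Tokyo" and "capital" are passed on to retrieval.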

Adapted from slide by Shauna Eggers                                           9
     WN in Answer Extraction
     If keywords alone cannot find an acceptable answer, look for
     alternations in WordNet!

    Q196: Who wrote “Hamlet”?
    Morphological Alternation: wrote → written
    Answer: before the young playwright has written Hamlet – and Shakespeare seizes the

    Q136: Who is the queen of Holland?
    Lexical Alternation: Holland → Netherlands
    Answer: Princess Margrit, sister of Queen Beatrix of the Netherlands, was also present

    Q196: What is the highest mountain in the world?
    Semantic Alternation: mountain → peak
    Answer: first African country to send an expedition to Mount Everest, the world’s
    highest peak
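
The three kinds of alternation can be illustrated with a small lookup table. This is only a sketch: the `ALTERNATIONS` dictionary is hand-written from the examples above, whereas the actual system derives alternations from WordNet.

```python
# Hypothetical alternation table modeled on the three examples above.
ALTERNATIONS = {
    "wrote": {"written"},        # morphological
    "Holland": {"Netherlands"},  # lexical
    "mountain": {"peak"},        # semantic
}

def expand(keyword):
    """Return the keyword together with its known alternations."""
    return {keyword} | ALTERNATIONS.get(keyword, set())

def matches(keyword, passage):
    """True if the keyword or any alternation occurs as a token in the passage."""
    tokens = {t.strip(".,?!\"'") for t in passage.split()}
    return bool(expand(keyword) & tokens)

passage = "Queen Beatrix of the Netherlands was also present"
hit = matches("Holland", passage)
```

The passage never mentions "Holland", but the lexical alternation lets the keyword match anyway, which is where the recall gain comes from.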

Adapted from slide by Shauna Eggers                                                         10
          Paşca/Harabagiu (NAACL’01 Workshop) measured
          approach using TREC-8 and TREC-9 test collections
          WN contributions to Answer Type Recognition
               Count number of questions for which acceptable
               answers were found; 3GB text collection, 893 questions

      Method                                        # questions with correct answer
                                                    All             What only

      Flat dictionary (baseline)                    227 (32%)       48 (13%)
      A-type taxonomy (static)                      445 (64%)       179 (50%)
      A-type taxonomy (dynamic)                     463 (67%)       196 (56%)
      A-type taxonomy (dynamic + answer patterns)   533 (76%)       232 (65%)

Adapted from slide by Shauna Eggers                                                   11
          WN contributions to Passage Retrieval
      Impact of keyword alternations

       No alternations enabled                            55.3% precision
       Lexical alternations enabled                       67.6%
       Lexical + semantic alternations enabled            73.7%
       Morphological expansions enabled                   76.5%

      Impact of specificity knowledge

        Specificity knowledge             # questions with correct answer in
                                             first 5 documents returned
                                         TREC-8                     TREC-9
       Not included               133 (65%)               463 (67%)
       Included                   151 (76%)               515 (74%)

Adapted from slide by Shauna Eggers                                            12
Going Beyond Word Matching

 Use techniques from artificial intelligence to try to
 draw inferences from the meanings of the words
 This is a highly unusual and ambitious approach.
    Surprising it works at all!
    Requires huge amounts of hand-coded information
 Uses notions of proofs and inference from logic
    All birds fly. Robins are birds. Thus, robins fly.
    forall(X): bird(X) -> fly(X)
    forall(X,Y): student(X), enrolled(X,Y) -> school(Y)
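
The bird/robin syllogism can be run mechanically with a few lines of forward chaining. This is a generic textbook sketch restricted to single-premise rules, not the LCC prover; the constant `tweety` is invented for the example.

```python
def forward_chain(facts, rules):
    """Apply single-premise rules of the form premise(X) -> conclusion(X)
    until no new facts appear. Facts are (predicate, argument) pairs."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for pred, arg in list(derived):
                if pred == premise and (conclusion, arg) not in derived:
                    derived.add((conclusion, arg))
                    changed = True
    return derived

# robin(X) -> bird(X); bird(X) -> fly(X)
rules = [("robin", "bird"), ("bird", "fly")]
facts = {("robin", "tweety")}  # "tweety" is an invented constant
closure = forward_chain(facts, rules)
```

Starting from the single fact robin(tweety), the loop derives bird(tweety) and then fly(tweety), mirroring the two steps of the syllogism.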

    Inference via a Logic Prover
        The LCC system attempts inference to justify an
        Its inference engine is a kind of funny middle ground
        between logic and pattern matching
        But quite effective: 30% improvement
             Q: When was the internal combustion engine
             A: The first internal-combustion engine was built in
        invent -> create_mentally -> create -> build

Adapted from slides by Manning, Harabagiu, Kushmerick, ISI           14
       World knowledge from:
            WordNet glosses converted to logic forms in the eXtended
            WordNet (XWN) project
            Lexical chains
              – game:n#3 → HYPERNYM → recreation:n#1 → HYPONYM →
              – Argentine:a#1 → GLOSS → Argentina:n#1
            NLP axioms to handle complex NPs, coordinations,
            appositions, equivalence classes for prepositions etcetera
            Named-entity recognizer
              – John Galt → HUMAN
       A relaxation mechanism is used to iteratively uncouple
       predicates, remove terms from LFs. The proofs are penalized
       based on the amount of relaxation involved.

Adapted from slide by Surdeanu and Pasca                                 15
    Logic Inference Example
        “How hot does the inside of an active volcano get?”
             get(TEMPERATURE, inside(volcano(active)))
             “lava fragments belched out of the mountain were as
             hot as 300 degrees Fahrenheit”
             fragments(lava, TEMPERATURE(degrees(300)),
             belched(out, mountain))
             volcano ISA mountain
             lava ISPARTOF volcano -> lava inside volcano
             fragments of lava HAVEPROPERTIESOF lava
        The needed semantic information is in WordNet definitions,
        and was successfully translated into a form that was used
        for rough ‘proofs’

Adapted from slides by Manning, Harabagiu, Kushmerick, ISI            16
                           Axiom Creation
   XWN Axioms
     A major source of world knowledge is a general purpose
     knowledge base of more than 50,000 parsed and disambiguated
     WordNet glosses that are transformed into logical form for use
     during the course of a proof.

       Kill is to cause to die

   Logical Form:
       kill_VB_1(e1,x1,x2) -> cause_VB_1(e1,x1,x3) & to_TO(e1,e2) &

Adapted from slide by Harabagiu and Narayanan                         17
                               Lexical Chains
     Lexical Chains
         Lexical chains provide an improved source of world knowledge by
         supplying the Logic Prover with much needed axioms to link
         question keywords with answer concepts.

       How were biological agents acquired by bin Laden?

         On 8 July 1998 , the Italian newspaper Corriere della Sera indicated that
         members of The World Front for Fighting Jews and Crusaders , which
         was founded by Bin Laden , purchased three chemical and
         biological_agent production facilities in

     Lexical Chain:
         ( v - buy#1, purchase#1 ) HYPERNYM ( v - get#1, acquire#1 )
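
One way to picture lexical-chain discovery is as a shortest-path search over WordNet relations. The sketch below assumes a hand-built graph whose edges copy chains quoted in these slides; the `LINKS` structure and `find_chain` function are hypothetical, not the LCC implementation.

```python
from collections import deque

# Hypothetical relation graph; each edge is a (relation, target) pair.
LINKS = {
    "purchase": [("HYPERNYM", "acquire")],
    "game": [("HYPERNYM", "recreation")],
    "suicide": [("GLOSS", "kill")],
    "kill": [("GLOSS", "die")],
}

def find_chain(start, goal):
    """Breadth-first search for the shortest relation path from start to goal."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        word, path = queue.popleft()
        if word == goal:
            return path
        for relation, target in LINKS.get(word, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [relation, target]))
    return None  # no chain connects the two words

chain = find_chain("purchase", "acquire")
```

The returned path alternates words and relation labels, so a question keyword ("acquire") can be linked to the term that actually appears in the passage ("purchase").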

Adapted from slide by Harabagiu and Narayanan                                         18
  Axiom Selection
      Lexical chains and the XWN knowledge base work together to select and
      generate the axioms needed for a successful proof when not all the
      keywords in the question are found in the answer.

      How did Adolf Hitler die?
     … Adolf Hitler committed suicide …

  The following Lexical Chain is detected:
      ( n - suicide#1, self-destruction#1, self-annihilation#1 ) GLOSS ( v -
      kill#1 ) GLOSS ( v - die#1, decease#1, perish#1, go#17, exit#3,
      pass_away#1, expire#2, pass#25 ) 2

  The following axioms are loaded into the Prover:
      exists x2 all e1 x1 (suicide_nn(x1) -> act_nn(x1) & of_in(x1,e1) &

      exists x3 x4 all e2 x1 x2 (kill_vb(e2,x1,x2) -> cause_vb_2(e1,x1,x3) &
      to_to(e1,e2) & die_vb(e2,x2,x4)).

Adapted from slide by Harabagiu and Narayanan                                  19
LCC System References
The previous set of slides drew information from these sources:
   The Informative Role of WordNet in Open-Domain Question Answering,
   Pasca and Harabagiu, WordNet and Other Lexical Resources, NAACL 2001
   Pasca and Harabagiu, High Performance Question/Answering, SIGIR’01
   Moldovan, Clark, Harabagiu, Maiorano: COGEX: A Logic Prover for Question
   Answering. HLT-NAACL 2003
   Moldovan, Pasca, Harabagiu, and Surdeanu: Performance issues and error
   analysis in an open-domain question answering system. ACM Trans. Inf.
   Syst. 21(2): 133-154 (2003)
   Harabagiu and Maiorano, Abductive Processes for Answer Justification, AAAI
   Spring Symposium on Mining Answers from Texts and Knowledge Bases,

Incorporating Resources

 For 29% of TREC questions the LCC QA system relied
 on an off-line taxonomy with semantic classes such

    Incorporating Resources
        How well can we do just using existing resources?
            Used the 2,393 TREC questions and answer keys
            Determined if it would be possible for an algorithm to
            locate the target answer from the resource.
            So a kind of upper bound.

Lita L.V., Hunt W., Nyberg E. Resource Analysis for Question Answering. ACL 2004.   22

            CIA World Factbook
            “Can be directly answered”
            (Not explained further)
            High precision, low recall

            WordNet glosses,
            synonyms, hypernyms,
            Question terms and phrases
            extracted and looked up.
            If answer key matched any of
            these WordNet resources, then
            considered found.
            Thus, measuring an upper bound
            About 27% can in principle
            be answered from WN alone

     Definition Resources

            Google’s define operator
            Formulate a query from n-
            grams extracted from each
            Encyclopedia most
            TREC-12 had fewer
            define q’s so less benefit.

     Web Pages

            The Web
            Questions tokenized and
            stopwords removed.
            Keywords “used” (no
            further details) to retrieve
            100 docs via Google API.
            A relevant doc is found
            somewhere in the results
            for nearly 50% of the q’s.

     Web N-grams
             The Web
             Retrieved top 5, 10, 15, 20,
             50, and 100 docs via web
             query (no details provided)
             via Google API
             Extracted the most frequent
             50 n-grams (up to trigrams)
             (not clear if using full text or
             summaries only)
             The correct answer is found in
             the top 50 n-grams more than
             50% of the time.
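
A rough sketch of the n-gram extraction step, assuming plain whitespace tokenization and lowercasing (the paper's exact preprocessing is not stated on the slide). The two sample documents are made up.

```python
from collections import Counter

def top_ngrams(documents, max_n=3, k=50):
    """Return the k most frequent word n-grams (n = 1..max_n) across documents."""
    counts = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return [gram for gram, _ in counts.most_common(k)]

docs = [
    "mount everest is the highest mountain",
    "the highest mountain is mount everest",
]
grams = top_ngrams(docs)
answer_found = "mount everest" in grams
```

The upper-bound question is then simply whether the answer key appears anywhere in the returned list.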

Using Machine Learning in QA

 The following slides are based on:
   Ramakrishnan, Chakrabarti, Paranjpe, Bhattacharyya, Is
   Question Answering an Acquired Skill? WWW’04

  Learning Answer Type Mapping
    Idea: use machine learning techniques to
    automatically determine answer types and query
    terms from questions.
    Two types of answer types:
        Surface patterns
          –   Infinite set, so can’t be covered by a lexicon
          –   “at DD:DD” “in the ‘DDs” “in DDDD” “Xx+ said”
          –   Can also associate with synset [date#n#7]
        WordNet synsets
          – Consider: “name an animal that sleeps upright”
          – Answer: “horse”
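
The surface patterns listed above translate naturally into regular expressions. This sketch assumes D stands for a digit and Xx+ for a capitalized word; the regexes are one plausible reading, not the paper's definitions.

```python
import re

# One regex per surface pattern listed above (assumed reading of the notation).
SURFACE_PATTERNS = {
    "at DD:DD": re.compile(r"\bat \d{2}:\d{2}\b"),
    "in the 'DDs": re.compile(r"\bin the ['\u2018\u2019]\d{2}s\b"),
    "in DDDD": re.compile(r"\bin \d{4}\b"),
    "Xx+ said": re.compile(r"\b[A-Z][a-z]+ said\b"),
}

def matching_patterns(text):
    """Return the names of all surface patterns found in the text."""
    return [name for name, rx in SURFACE_PATTERNS.items() if rx.search(text)]

found = matching_patterns("Smith said the tower was built in 1791")
```

Because the set of strings matching such patterns is unbounded, they have to be recognized by shape like this rather than listed in a lexicon.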

Ramakrishnan et al., Is Question Answering an Acquired Skill? WWW’04   29
 Determining Answer Types
     The hard ones are “what” and “which” questions.
     Two useful heuristics:
         If the head of the NP appearing before the auxiliary
         or main verb is not a wh-word, mark this as an atype
         clue
         Otherwise, the head of the NP appearing after the
         auxiliary/main verb is an atype clue.

 Learning Answer Types
   Given a QA pair (q, a):
        – (“name an animal that sleeps upright”, “horse”)
      (1a)   See which atype(s) “horse” can map to
      (1b)   Look up the hypernyms of “horse” -> S
      (2a)   Record the k words to the right of the q-word
      (2b)   For each of these k words, look up their synsets
        – An, animal, that
      (2c) Increment the counts for those synsets that also
      appear in S
   Do significance testing
      Compare synset frequencies against a background set
      Retain only those that are significantly associated with the
      question word more so than in general (chi-square)
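
Steps (1a) to (2c) amount to a counting loop, sketched below under simplifying assumptions: `HYPERNYMS` and `SYNSETS` are tiny hypothetical lookups standing in for WordNet, and synsets are represented by plain strings. The chi-square filtering step is not shown.

```python
from collections import Counter

# Hypothetical lookups standing in for WordNet.
HYPERNYMS = {"horse": {"equine", "mammal", "animal", "organism"}}
SYNSETS = {"an": set(), "animal": {"animal"}, "that": set()}

def update_counts(counts, question, answer, q_word, k=3):
    """Credit synsets of the k words after q_word that are hypernyms of the answer."""
    tokens = question.lower().split()
    start = tokens.index(q_word) + 1
    s = HYPERNYMS.get(answer, set())          # step (1b): the set S
    for word in tokens[start:start + k]:      # step (2a): k words to the right
        for synset in SYNSETS.get(word, set()):   # step (2b)
            if synset in s:                        # step (2c)
                counts[(q_word, synset)] += 1
    return counts

counts = update_counts(Counter(), "name an animal that sleeps upright", "horse", "name")
```

After many QA pairs, the accumulated counts are what the significance test compares against the background set.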

            Learning Answer Types

  Learning to Choose Query Terms

     Which words from the question to use in the query?
     A tradeoff between precision and recall.
        “Tokyo is the capital of which country?”
        Want to use “Tokyo” verbatim
        Probably “capital” as well
        But maybe not “country”; maybe “nation” or maybe
        this word won’t appear in the retrieved passage at all.
          – Also, “country” corresponds to the answer type, so
            probably we don’t want to require it to be in the answer

  Learning to Choose Query Terms
        POS assigned to word and immediate neighbors
        Starts with uppercase letter
        Is a stopword
        IDF score
        Is an answer-type for this question
        Ambiguity indicators:
          – # of possible WordNet senses (NumSense)
          – # of other WordNet synsets that describe this sense
               E.g., for “buck”: stag, deer, doe
               (NumLemma)
        J48 decision tree worked best

Learning to Choose Query Terms
  WordNet ambiguity indicators were very helpful
    – Raised accuracy from 71-73% to 80%
  Atype flag improved accuracy by 1-3%

  Learning to Score Passages
     Given a question, an answer, and a passage (q, a, r)
        Assign +1 if r contains a
        Assign –1 otherwise
        Do selected terms s from q appear in r?
        Does r have an answer zone a that does not overlap s?
        Are the distances between tokens in a and s small?
        Does a have a strong WordNet similarity with q’s answer type?
        Use logistic regression, since it produces a ranking rather
        than a hard classification into +1 or –1
          – Produces a continuous estimate between 0 and 1
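
To make the ranking idea concrete, here is a small logistic scorer. The weights and feature values are invented for illustration (in the paper's setting they would be learned from the +1/–1 labeled triples), and the three features are only loosely modeled on the questions listed above.

```python
import math

def score(features, weights, bias=0.0):
    """Logistic model: map a weighted feature sum to a value in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Invented weights and feature values, for illustration only.
# Features: [fraction of selected terms present,
#            closeness of answer zone to those terms,
#            WordNet similarity of answer zone to the answer type]
weights = [2.0, 1.5, 1.0]
passages = {
    "p1": [1.0, 0.8, 0.7],
    "p2": [0.5, 0.1, 0.0],
}
ranked = sorted(passages, key=lambda p: score(passages[p], weights), reverse=True)
```

Sorting by the continuous score gives a full ordering of passages, which is what reranking needs and what a hard +1/–1 classifier would not provide.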

Learning to Score Passages
   F-scores are low (.33 - .56)
   However, reranking greatly improves the rank of the
   corresponding passages.
   Eliminates many non-answers, pushing better
   passages towards the top.

        Learning to Score Passages

 Computing WordNet Similarity
    Path-based similarity measures are not all that good in WordNet
        3 hops from entity to artifact
        3 hops from mammal to elephant
    An alternative:
        Given a target synset t and an answer synset a:
         – Measure the overlap of nodes on the path
              from t to all noun roots and
              from a to all noun roots
         – Algorithm for computing similarity of t to a:
              If t is not a hypernym of a: assign 0
              Else collect the set of hypernym synsets of t and a
              Call them Ht and Ha
              Compute the Jaccard overlap
                  » |Ht Intersect Ha| / |Ht Union Ha|
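
The algorithm above fits in a few lines. This sketch uses a hypothetical single-parent hierarchy (`PARENT`) instead of real WordNet, so its path lengths are assumptions, and it treats a synset as a hypernym of itself.

```python
# Hypothetical single-parent hierarchy; real WordNet paths differ by version.
PARENT = {
    "elephant": "proboscidean", "proboscidean": "placental mammal",
    "placental mammal": "mammal", "mammal": "vertebrate",
    "vertebrate": "chordate", "chordate": "animal",
    "animal": "living thing", "living thing": "object", "object": "entity",
}

def hypernyms(synset):
    """Return the synset plus all of its ancestors up to the root."""
    path = {synset}
    while synset in PARENT:
        synset = PARENT[synset]
        path.add(synset)
    return path

def similarity(t, a):
    """Jaccard overlap of hypernym sets; 0 unless t is a hypernym of a."""
    ht, ha = hypernyms(t), hypernyms(a)
    if t not in ha:
        return 0.0
    return len(ht & ha) / len(ht | ha)

sim = similarity("mammal", "elephant")
```

Unlike a hop count, the score depends on how deep the shared ancestry is, so a specific target type close to the answer scores higher than a general one the same number of hops away.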

Computing WordNet Similarity
    Algorithm for computing similarity of t to a:
            – |Ht Intersect Ha| / |Ht Union Ha|

    Hypernym chain shown in the figure (general to specific):
        living thing > animal > vertebrate > mammal > placental mammal > proboscidean

    Worked examples:
        t = mammal, a = elephant:    7/10 = .7
        t = animal, a = elephant:    5/10 = .5
        t = animal, a = mammal:      4/7  = .57
        t = mammal, a = fox:         7/11 = .63

    System Extension: Definition Questions

        Definition questions ask about the definition or description of a
             Who is John Galt?
             What is anorexia nervosa?
        Many “information nuggets” are acceptable answers
             Who is George W. Bush?
               – … George W. Bush, the 43rd President of the United States…
               – George W. Bush defeated Democratic incumbent
                 Ann Richards to become the 46th Governor of the State of
             Any information nugget is acceptable
             Precision score over all information nuggets

Adapted from slide by Surdeanu and Pasca                                      41
      Definition Detection with Pattern Matching

      Q386: What is anorexia nervosa?      cause of anorexia nervosa, an eating
      Q358: What is a meerkat?             the meerkat, a type of mongoose, thrives
      Q340: Who is Zebulon Pike?           in 1806, explorer Zebulon Pike sighted

       Question patterns:
          What <be> a <QP> ?
          Who <be> <QP> ?
             example: “Who is Zebulon Pike?”
       Answer patterns:
          <QP>, the <AP>
          <QP> (a <AP>)
          <AP HumanConcept> <QP>
             example: “explorer Zebulon Pike”
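
The answer patterns can be approximated with regular expressions once the question phrase (QP) is known. This is a sketch under assumptions: the first pattern also accepts "a"/"an" so that it covers the meerkat example, and the third pattern does not check that the preceding word is a human concept, which the real pattern requires.

```python
import re

def answer_patterns(qp):
    """Build regexes for the three answer patterns, given the question phrase."""
    q = re.escape(qp)
    return [
        re.compile(rf"{q}, (?:the|an|a) (?P<ap>[\w\s-]+?)[,.]"),  # <QP>, the <AP>
        re.compile(rf"{q} \((?:an|a) (?P<ap>[^)]+)\)"),           # <QP> (a <AP>)
        re.compile(rf"(?P<ap>\b[a-z]+) {q}"),                     # <AP> <QP>
    ]

def find_definition(qp, text):
    """Return the first answer phrase matched by any pattern, else None."""
    for rx in answer_patterns(qp):
        m = rx.search(text)
        if m:
            return m.group("ap")
    return None

d1 = find_definition("meerkat", "the meerkat, a type of mongoose, thrives in")
d2 = find_definition("Zebulon Pike", "in 1806, explorer Zebulon Pike sighted")
```

Patterns are tried in order, so an appositive after the question phrase is preferred over a bare preceding noun.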

Adapted from slide by Surdeanu and Pasca                                              42
       Answer Detection with Concept Expansion

           Enhancement for Definition questions
           Identify terms that are semantically related to the phrase
           to define
           Use WordNet hypernyms (more general concepts)

      Question                 WordNet hypernym         Detected answer candidate

      What is a shaman?        {priest, non-Christian   Mathews is the priest or
                               priest}                  shaman

      What is a nematode?      {worm}                   nematodes, tiny worms in

      What is anise?           {herb, herbaceous plant} anise, rhubarb and other

Adapted from slide by Surdeanu and Pasca                                            43
    Online QA Examples

           Examples (none work very well)

Adapted from slides by Manning, Harabagiu, Kushmerick, and ISI       44
What about Complex Questions?

