CLEF 2012, Rome

QA4MRE: Question Answering for Machine Reading Evaluation

Anselmo Peñas (UNED, Spain)
Eduard Hovy (USC-ISI, USA)
Pamela Forner (CELCT, Italy)
Álvaro Rodrigo (UNED, Spain)
Richard Sutcliffe (U. Limerick, Ireland)
Roser Morante (U. Antwerp, Belgium)
Walter Daelemans (U. Antwerp, Belgium)
Caroline Sporleder (U. Saarland, Germany)
Corina Forascu (UAIC, Romania)
Yassine Benajiba (Philips, USA)
Petya Osenova (Bulgarian Academy of Sciences)
Question Answering Track at CLEF

[Timeline, 2003-2012. Main tasks: Multiple Language QA Main Task (2003-2008), ResPubliQA (2009-2010), QA4MRE (2011-2012). Pilots and side tasks along the years: temporal restrictions and lists, Answer Validation Exercise (AVE), real-time QA, WiQA, QA over Speech Transcriptions (QAST), GikiCLEF, WSD QA; within QA4MRE: Negation and Modality, Biomedical.]
[Diagram: the classic QA pipeline. Question → Question analysis → Passage Retrieval → Answer Extraction → Answer Ranking → Answer, with per-module accuracies multiplying: 0.8 × 0.8 × 1.0 = 0.64]


Over the years we have learnt that this pipeline architecture is one of the main limitations on improving QA technology, so we bet on a reformulation:
Hypothesis generation + validation

[Diagram: from the Question, hypothesis generation functions open a search space of candidate answers; answer validation functions then select the final Answer from that space]
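A minimal sketch of this reformulation (the function names and the stub scorer are hypothetical, not the task's actual components): generate candidate answers, then let a validation score pick among them.

    from typing import Callable, Iterable

    # Hypothetical generate-and-validate skeleton. In the QA4MRE setting the
    # candidates are given (multiple choice), so validation does all the work.
    def answer(question: str,
               generate: Callable[[str], Iterable[str]],
               validate: Callable[[str, str], float]) -> str:
        candidates = generate(question)                    # hypothesis generation
        scored = [(validate(question, c), c) for c in candidates]
        return max(scored)[1]                              # best-validated candidate

    # Toy usage: the generator just returns the five given options.
    choices = ["Paris", "Rome", "Berlin", "Madrid", "Vienna"]
    print(answer("Where was CLEF 2012 held?",
                 generate=lambda q: choices,
                 validate=lambda q, c: float(c == "Rome")))  # stub scorer -> Rome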
We focus on validation …

Is the candidate answer correct?

QA4MRE setting: Multiple-Choice Reading Comprehension Tests
  - Measure progress in two reading abilities:
    - Answer questions about a single text
    - Capture knowledge from text collections
… and knowledge

Why capture knowledge from text collections?

We need knowledge to understand language:
  - The ability to make inferences about a text correlates with the amount of knowledge considered
  - Texts always omit information that we need to recover
    - to build the complete story behind the document
    - and to be sure about the answer
Text as a source of knowledge

Background collection: the set of documents that contextualizes the one under reading (20,000-100,000 docs)
  - We can imagine the machine assembling it on the fly, by retrieval
  - Big and diverse enough to acquire knowledge from
  - Defines a scalable strategy: topic by topic
  - One reference collection per topic
Background Collections

They must serve to acquire:
  - General facts (with categorization and relevant relations)
  - Abstractions (such as …)

Acquisition is sensitive to occurrence in texts, and thus also to the way we create the collection.

Key: retrieve all relevant documents, and only them
  - Classical IR
  - Interdependence with the topic definition: the topic is defined by the set of queries that produce the collection
Example: Biomedical

Alzheimer's Disease Literature Corpus: search PubMed about Alzheimer's.

Query: (((((("Alzheimer Disease"[Mesh] OR "Alzheimer's disease antigen"[Supplementary Concept] OR "APP protein, human"[Supplementary Concept] OR "PSEN2 protein, human"[Supplementary Concept] OR "PSEN1 protein, human"[Supplementary Concept]) OR "Amyloid beta-Peptides"[Mesh]) OR "donepezil"[Supplementary Concept]) OR ("gamma-secretase activating protein, human"[Supplementary Concept] OR "gamma-secretase activating protein, mouse"[Supplementary Concept])) OR "amyloid beta-protein (1-42)"[Supplementary Concept]) OR "Presenilins"[Mesh]) OR "Neurofibrillary Tangles"[Mesh] OR "Alzheimer's disease"[All Fields] OR "Alzheimer's Disease"[All Fields] OR "Alzheimer s disease"[All Fields] OR "Alzheimers disease"[All Fields] OR "Alzheimer's dementia"[All Fields] OR "Alzheimer dementia"[All Fields] OR "Alzheimer-type dementia"[All Fields] NOT "non-Alzheimer"[All Fields] NOT ("non-AD"[All Fields] AND "dementia"[All Fields]) AND (hasabstract[text] AND English[lang])

Result: 66,222 abstracts
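A sketch of how such a collection could be pulled programmatically, using Biopython's Entrez interface (the email address and retmax value are placeholders, and the query string is abbreviated here; the slide's full Boolean expression would be passed as one string):

    from Bio import Entrez

    Entrez.email = "you@example.org"  # NCBI requires a contact address

    # Abbreviated form of the slide's query, for illustration only.
    query = ('("Alzheimer Disease"[Mesh] OR "Alzheimer\'s disease"[All Fields]) '
             'AND hasabstract[text] AND English[lang]')

    # esearch returns matching PubMed IDs; efetch would then download abstracts.
    handle = Entrez.esearch(db="pubmed", term=query, retmax=100)
    record = Entrez.read(handle)
    print(record["Count"], "matching abstracts")
    print(record["IdList"][:5])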
Questions (Main Task)

Distribution of question types:
  27  PURPOSE
  30  METHOD
  36  CAUSAL
  36  FACTOID
  31  WHICH-IS-TRUE

Distribution of answer types:
  75  REQUIRE NO EXTRA KNOWLEDGE
  46  REQUIRE BACKGROUND KNOWLEDGE
  21  REQUIRE INFERENCE
  20  REQUIRE GATHERING INFORMATION FROM DIFFERENT SENTENCES
Questions (Biomedical Task)

Question types:
  1. Experimental evidence/qualifier
  2. Protein-protein interaction
  3. Gene synonymy relation
  4. Organism source relation
  5. Regulatory relation
  6. Increase (higher expression)
  7. Decrease (reduction)
  8. Inhibition

Answer types (involving a predefined set of entity types):
  - Simple: the answer is found almost verbatim in the paper
  - Medium: the answer is rephrased
  - Complex: requires combining pieces of evidence and inference
Main Task

16 test documents, 160 questions, 800 candidate answers

4 topics:
  1. AIDS
  2. Music and Society
  3. Climate Change
  4. Alzheimer (new this year; popular-science sources: blogs, web, news, …)

4 reading tests per topic:
  - 1 document + 10 questions
  - 5 choices per question

6 languages (Arabic is new):
  English, German, Spanish, Italian, Romanian, Arabic
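The numbers multiply out: 4 topics × 4 tests × 10 questions = 160 questions, and 160 × 5 choices = 800 candidate answers. A hypothetical representation of one reading test, assuming Python 3.10+ (the field names are illustrative, not the official test-set schema):

    from dataclasses import dataclass

    @dataclass
    class Question:
        text: str
        choices: list[str]          # exactly 5 options
        answer: int | None = None   # gold index, hidden at test time

    @dataclass
    class ReadingTest:
        topic: str                  # e.g. "Alzheimer"
        language: str               # e.g. "en"
        document: str               # the single text to read
        questions: list[Question]   # exactly 10 questions

    # 4 topics x 4 tests x 10 questions = 160; x 5 choices = 800 candidates
    assert 4 * 4 * 10 == 160 and 160 * 5 == 800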
Biomedical Task

Same setting, but scientific language, and a focus on one disease: Alzheimer's
  - Alzheimer's Disease Literature Corpus (ADLC)
  - 66,222 abstracts from PubMed
  - 9,500 full articles
  - Most of them preprocessed with:
    - the GDep dependency parser (Sagae and Tsujii, 2007)
    - a UMLS-based NE tagger (CLiPS)
    - the ABNER NE tagger (Settles, 2005)
Task on Modality and Negation

Given an event in the text, decide whether it is:
  1. Asserted (NONE: no negation and no speculation)
  2. Negated (NEG: negation and no speculation)
  3. Speculated and negated (NEGMOD)
  4. Speculated and not negated (MOD)

Decision tree:
  Is the event presented as certain?
    Yes → Did it happen?  Yes → NONE / No → NEG
    No  → Is it negated?  Yes → NEGMOD / No → MOD
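The same decision tree as a sketch (label names from the slide; the two boolean features are assumed to come from upstream negation and speculation detectors):

    def modality_label(negated: bool, speculated: bool) -> str:
        """Map negation/speculation flags to the four task labels."""
        if not speculated:                        # event presented as certain
            return "NEG" if negated else "NONE"
        return "NEGMOD" if negated else "MOD"     # speculated event

    # All four cases from the slide's decision tree:
    assert modality_label(False, False) == "NONE"    # asserted
    assert modality_label(True,  False) == "NEG"     # negated
    assert modality_label(True,  True)  == "NEGMOD"  # speculated and negated
    assert modality_label(False, True)  == "MOD"     # speculated, not negated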
Participation

Task                     Registered groups   Participant groups   Submitted runs
Main                            25                  11                 43
Biomedical                      23                   7                 43
Modality and Negation            3                   3                  6
Total                           51                  21                 92

[Bar chart: participants and runs, 2011 vs. 2012, roughly a 100% increase]
Evaluation and results

QA perspective: c@1 over all questions (random baseline: 0.2)

  Top systems, Main task    Top systems, Biomedical task
        0.65                        0.55
        0.40                        0.47

Reading perspective: aggregating results test by test (a test is passed if c@1 > 0.5)

  Top systems, Main task: 12 / 16 and 6 / 16 tests passed
  Top system, Biomedical task: 3 / 4 tests passed
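c@1 (Peñas and Rodrigo, 2011) rewards leaving a question unanswered over answering it wrongly: c@1 = (n_R + n_U · n_R/n) / n, where n_R is the number of correctly answered questions, n_U the number left unanswered, and n the total. A minimal sketch:

    def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
        """c@1: unanswered questions earn the system's observed accuracy
        instead of counting as plain failures (Peñas and Rodrigo, 2011)."""
        return (n_correct + n_unanswered * n_correct / n_total) / n_total

    # A system that answers everything scores plain accuracy:
    print(c_at_1(4, 0, 10))         # 0.4
    # Withholding doubtful answers beats guessing wrongly:
    print(c_at_1(4, 5, 10))         # 0.6
    # Reading perspective: a test is passed if c@1 > 0.5
    print(c_at_1(4, 5, 10) > 0.5)   # True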
More details during the workshop

Monday 17th Sep.
  17:00 - 18:00  Poster session
Tuesday 18th Sep.
  10:40 - 12:40  Invited talk + overviews
  14:10 - 16:10  Reports from participants (Main + Bio)
  16:40 - 17:15  Reports from participants (Mod&Neg)
  17:15 - 18:10  Breakout session

Thanks!

				