CLEF 2011, Amsterdam
QA4MRE, Question Answering for Machine Reading Evaluation

Question Answering Track Overview

Main Task: Anselmo Peñas, Eduard Hovy, Pamela Forner, Álvaro Rodrigo,
Richard Sutcliffe, Corina Forascu, Caroline Sporleder

Modality and Negation: Roser Morante, Walter Daelemans
QA Tasks & Time at CLEF (2003–2011)

- 2003–2008: Multiple Language QA Main Task
- 2009–2010: ResPubliQA
- 2011: QA4MRE, with the Negation and Modality pilot
- Exercises along the way: temporal restrictions and list questions,
  Answer Validation Exercise (AVE), GikiCLEF, Real Time QA, QA over
  Speech Transcriptions (QAST), WiQA, WSD QA
New setting

QA over a single document: multiple-choice reading comprehension tests
- Forget about the IR step (for a while)
- Focus on answering questions about a single text
- Choose the correct answer

Why this new setting?
Systems performance

Upper bound of 60% accuracy

- Overall best result: <60%
- Best result on definition questions: >80% (these do NOT use an IR approach)

Pipeline upper bound: Question analysis → Passage Retrieval → Answer
Extraction → Answer Ranking → Answer

Even with 80% accuracy at passage retrieval, 80% at answer extraction, and
perfect answer ranking, the pipeline tops out at 0.8 x 0.8 x 1.0 = 0.64.
Not enough evidence.

We need SOMETHING to break the pipeline: answer validation instead of
re-ranking.
Multi-stream upper bound

- Best single system: 52.5%
- Perfect combination, taking the best stream per answer type
  (ORGANIZATION, PERSON, TIME): 81%
Multi-stream architectures

Different systems answer different types of questions better
- Specialization
- Collaboration

Question → QA sys1 … QA sysn → candidate answers → SOMETHING for
combining / selecting → Answer (a sketch of the selection step follows)
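
A minimal sketch of that selection step, assuming each stream is a callable
returning one candidate and a validate function scores a (question, answer)
pair; all names here are hypothetical, not a system from the track:

    # Hypothetical sketch: select across QA streams with a validation score
    # instead of trusting any single pipeline's internal ranking.
    def combine_streams(question, streams, validate):
        candidates = [qa_system(question) for qa_system in streams]
        # The "SOMETHING" box: keep the candidate the validator trusts most.
        return max(candidates, key=lambda answer: validate(question, answer))

    # Toy usage with stub streams and a stub validator:
    streams = [lambda q: "Easternwell", lambda q: "Transfield Services"]
    validate = lambda q, a: {"Easternwell": 0.9}.get(a, 0.4)
    print(combine_streams("Which company drills the wells?", streams, validate))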
AVE 2006-2008

Answer Validation: decide whether to return the candidate answer or not

Answer Validation should help to improve QA
- Introduce more content analysis
- Use Machine Learning techniques
- Be able to break pipelines and combine streams
Hypothesis generation + validation

Question → hypothesis generation functions (build the search space of
candidate answers) → answer validation functions → Answer
ResPubliQA 2009 - 2010

Transfer the AVE results to the QA main task in 2009 and 2010
- Promote QA systems with better answer validation

QA evaluation setting assuming that leaving a question unanswered has more
value than giving a wrong answer

Evaluation measure: reward systems that maintain accuracy but reduce the
number of incorrect answers by leaving some questions unanswered

c@1 = (nR + nU * (nR / n)) / n

n: Number of questions
nR: Number of correctly answered questions
nU: Number of unanswered questions
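
A minimal sketch of the measure in plain Python; the formula is as defined
above, the example numbers are illustrative only:

    def c_at_1(n, n_r, n_u):
        """c@1 = (nR + nU * (nR / n)) / n

        n   -- total number of questions
        n_r -- correctly answered questions
        n_u -- unanswered questions
        Unanswered questions earn credit at the system's observed accuracy,
        so abstaining beats a wrong answer but never beats a right one.
        """
        return (n_r + n_u * (n_r / n)) / n

    # A system that answers 60 of 120 correctly and abstains on 20 scores
    # higher than one that guesses wrongly on those 20:
    print(c_at_1(120, 60, 20))  # ~0.583, versus plain accuracy 60/120 = 0.5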
Conclusions of ResPubliQA 2009 – 2010

- This was not enough
- We expected a bigger change in systems architecture
- Validation is still in the pipeline: bad IR -> bad QA
- No qualitative improvement in performance
- Need for space to develop the technology
2011 campaign

Promote a bigger change in QA systems architecture

QA4MRE: Question Answering for Machine Reading Evaluation

Measure progress in two reading abilities:
- Answer questions about a single text
- Capture knowledge from text collections
Reading test

Text:

  Coal seam gas drilling in Australia's Surat Basin has been halted by
  flooding. Australia's Easternwell, being acquired by Transfield Services,
  has ceased drilling because of the flooding. The company is drilling coal
  seam gas wells for Australia's Santos Ltd. Santos said the impact was
  minimal.

Multiple choice test. According to the text, what company owns wells in the
Surat Basin?
a) Australia
b) Coal seam gas wells
c) Easternwell
d) Transfield Services
e) Santos Ltd.
f) Ausam Energy Corporation
g) Queensland
h) Chinchilla
Knowledge gaps

Answering the test requires relations the text never states:
- Company A drills Well C for Company B → Company B owns Well C (own, P=0.8)
- Surat Basin is part of Queensland, which is part of Australia

Acquire this knowledge from the reference collection (a toy sketch follows)
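
A toy sketch of storing and applying such background knowledge; the facts
and the P=0.8 ownership rule come from the slide, while the representation
and all names are assumptions made for illustration:

    # Toy knowledge base for the slide's example.
    part_of = {"Surat Basin": "Queensland", "Queensland": "Australia"}

    def located_in(place, region):
        # Follow transitive "is part of" links.
        while place in part_of:
            place = part_of[place]
            if place == region:
                return True
        return False

    def infer_owner(driller, well, client, p=0.8):
        # Background rule mined from the collection: if A drills well C
        # for B, then B owns C with some confidence (P=0.8 on the slide).
        return (client, "own", well, p)

    print(located_in("Surat Basin", "Australia"))               # True
    print(infer_owner("Easternwell", "Well C", "Santos Ltd."))  # Santos owns Well C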
Knowledge-Understanding dependence

We "understand" because we "know"
We need a little more of both to answer questions

Reading cycle: capture 'knowledge' expressed in texts ↔ 'understand' language
Control the variable of knowledge

The ability to make inferences about texts is correlated with the amount of
knowledge considered
- This variable has to be taken into account during evaluation
- Otherwise it is very difficult to compare methods

How do we control the variable of knowledge in a reading task?
Texts as sources of knowledge

Text collection
- Big and diverse enough to acquire knowledge
  - Impossible for all possible topics
- Define a scalable strategy: topic by topic
- Reference collection per topic (20,000-100,000 docs)

Several topics
- Narrow enough to limit the knowledge needed
  - AIDS
  - CLIMATE CHANGE
  - MUSIC & SOCIETY
Evaluation tests

- 12 reading tests (4 docs per topic)
- 120 questions (10 questions per test)
- 600 choices (5 options per question)

Translated into 5 languages: English, German, Spanish, Italian, Romanian
Evaluation tests

- 44 questions required background knowledge from the reference collection
- 38 required combining information from different paragraphs

Textual inferences:
- Lexical: acronyms, synonyms, hypernyms…
- Syntactic: nominalizations, paraphrasing…
- Discourse: coreference, ellipsis…
Evaluation

QA perspective evaluation: c@1 over all 120 questions

Reading perspective evaluation: aggregating results by test

Task   | Registered groups | Participant groups | Submitted runs
QA4MRE | 25                | 12                 | 62
Workshop QA4MRE

Tuesday
  10:30 – 12:30
    Keynote: Text Mining in Biograph (Walter Daelemans)
    QA4MRE methodology and results (Álvaro Rodrigo)
    Report on the Modality and Negation pilot (Roser Morante)
  14:00 – 16:00
    Reports from participants
Wednesday
  10:30 – 12:30
    Breakout session
CLEF 2011, Amsterdam
QA4MRE, Question Answering for Machine Reading Evaluation

Question Answering Track Breakout Session

Main Task: Anselmo Peñas, Eduard Hovy, Pamela Forner, Álvaro Rodrigo,
Richard Sutcliffe, Corina Forascu, Caroline Sporleder

Modality and Negation: Roser Morante, Walter Daelemans
QA4MRE breakout session

Task
- Questions are more difficult and realistic
- 100% reusable test sets

Languages and participants
- No participants for some languages
- But a valuable resource for evaluation
- Good balance for developing tests in other languages (even without
  participants)
  - The problem is finding parallel translations for the tests
QA4MRE breakout session

Background collections
- Good balance of quality and noise
- The methodology to build them is sound

Test documents (TED)
- Not ideal, but parallel
- Open audience and no copyright issues
- Consider other possibilities
  - CafeBabel
  - BBC News
QA4MRE breakout session

Evaluation
- Encourage participants to test previous systems on new campaigns
- Ablation tests: what happens if you remove a component?
- Runs with and without background knowledge, with and without external
  resources
- Processing time measurements
QA4MRE 2012

Topics
- Previous
  1. AIDS
  2. Music and Society
  3. Climate Change
- Add
  4. Alzheimer's disease (popular sources: blogs, web, news, …)
QA4MRE 2012 Pilots

Modality and Negation
- Move to a three-value setting: given an event in the text, decide whether
  it is (see the sketch after this list)
  1. Asserted (no negation and no speculation)
  2. Negated (negation and no speculation)
  3. Speculated
- Roadmap
  1. 2012: run as a separate pilot
  2. 2013: integrate modality and negation into the main task tests
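
A sketch of that three-value decision, assuming negation and speculation
cues for the event have already been detected upstream (cue detection, the
hard part, is not shown):

    def modality_label(negated, speculated):
        # Both "asserted" and "negated" require the absence of speculation
        # per the task definition, so a speculation cue takes precedence.
        if speculated:
            return "SPECULATED"
        return "NEGATED" if negated else "ASSERTED"

    print(modality_label(negated=False, speculated=False))  # ASSERTED
    print(modality_label(negated=True,  speculated=False))  # NEGATED
    print(modality_label(negated=True,  speculated=True))   # SPECULATED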
QA4MRE 2012 Pilots

Biomedical domain
- Focus on one disease: Alzheimer's (59,000 Medline abstracts)
- Scientific language
- Give participants the background collection already processed:
  tokenization, lemmatization, POS tagging, NER, dependency parsing
  (a sketch of these layers follows)
- Development set
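
For intuition, a sketch producing the same annotation layers with spaCy;
this is my choice of tool and example sentence, not necessarily the
organizers' actual pipeline:

    import spacy

    nlp = spacy.load("en_core_web_sm")  # requires the small English model
    doc = nlp("Donepezil is used to treat mild to moderate Alzheimer's disease.")

    for token in doc:
        # token, lemma, POS tag, dependency label, and syntactic head
        print(token.text, token.lemma_, token.pos_, token.dep_, token.head.text)

    for ent in doc.ents:
        # named entities; a biomedical model (e.g. scispaCy) would fit better
        print(ent.text, ent.label_)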
QA4MRE 2012 in summary

Main task
- Multiple-choice reading comprehension tests
- Same format
- Additional topic: Alzheimer's
- English, German (maybe Spanish, Italian, Romanian, others)

Two pilots
- Modality and negation
  - Asserted, negated, speculated
- Biomedical domain, focused on Alzheimer's disease
  - Same format as the main task
     Thanks!





				