					        Advanced Question Answering:
         The “Right Problem at the Right Time”
                 or a “Bridge Too Far”




                Dr. John D. Prange, Technical Director
                          jprange@nsa.gov
                        http://www.ic-arda.org

Open Domain QA Workshop
ACL - 6 July 2001
                               Outline

         • A Tale of Two Bridges

         • Advanced Question Answering
                – A Vision
                – One Plan of Attack
                – Challenges

         • Some Final Thoughts


                             A Tale of Two Bridges



                                           Rhine River Bridge
                                        Arnhem, The Netherlands




   The Golden Gate Bridge
   San Francisco, California



                             Golden Gate Bridge:
                                      A Vision
                                Joseph B. Strauss
                         Strauss’s Initial Design -- 1921

 In the late 1800s and early 1900s, the growing population in the
 San Francisco Bay Area encouraged speculation about building a
 bridge across the Golden Gate Channel.

 $35M Bond Issue Passed by 3 to 1 Margin in Nov 1930
                             Golden Gate Bridge:
                                     The Plan of Attack
                 Joseph Strauss’s Final Architectural Site Plan




      Factors in Plan:
             –      Plan was sensitive to the natural beauty of the Bridge’s location
             –      First Bridge across a major port entrance
             –      First use of “Hard Hats” & “Safety Nets”

                             Golden Gate Bridge:
                                   The Challenges
    Significantly longer and higher than any previous suspension
    bridge. Overcame many difficult engineering problems.

    First time a bridge pier was constructed in the open sea,
    100 feet offshore. The fender around the south pier extends
    about 100 feet below the surface of the water.

    The Golden Gate Bridge has two 7,650 ft long cables (36” in
    diameter), each consisting of 27,572 wires. The cables had to
    be spun on site over a period of 6 months.
                                   Golden Gate Bridge:
                       Final Result: The Right Plan at the Right Time!

                             An engineering, architectural, and aesthetic wonder!
                         Strong enough to easily withstand earthquakes, heavy gale
                           force winds, and the strong, constant currents and tides.
                       Over 1.6 Billion cars have crossed (May 1937 through March 2000).




                     Arnhem Rhine River Bridge:
                                 A Vision
                              Montgomery’s Plan: Lightning-Fast
                              Strike to Arnhem, Secure Bridgehead
                              across the Rhine River. Then Attack
                              the German Industrial Heart -- the Ruhr

               Europe in
             September 1944




If Plan Succeeded . . .
1. Bypass the
    Siegfried Line.
2. Reach Berlin
    before Christmas.
3. END THE WAR !!

                     Arnhem Rhine River Bridge:
                                       The Plan of Attack

       OPERATION MARKET GARDEN -- 17-26 September 1944


       1st Polish Parachute Brigade

       Operation Market
         35,000 Allied Paratroopers
         Seize & Hold River and Canal Bridges all the way
         to Arnhem -- 64 miles behind German Front Lines!

       Operation Garden
         British XXX Corps (2nd British Army)
         Push forward along a single road (Hell’s Highway) having very
         limited off-road mobility, & reach Arnhem in 48 Hours!
                     Arnhem Rhine River Bridge:
                                             The Challenges
     [Map labels: 9th & 10th SS Armor Div; 15th Army;
      Allied Plan: Operation Market Garden]

     • 15th German Army Escapes; 9th and 10th SS Armor Div on
       R&R in vicinity of Arnhem
     • Complete Allied Plan Falls into German Hands in first 24 Hours
     • Not enough aircraft and gliders for single airborne drop.
       Terrain forced 1st landings far from bridges. Later drops
       delayed by weather. Only 700 British Troops make it to Arnhem
       Bridge under command of Lt Colonel John Frost
     • Germans able to destroy Son Bridge & force “2nd Omaha Beach”
       Landing to capture Nijmegen Bridge. Delays Advance of XXX Corps
       for more than 36 hours
                     Arnhem Rhine River Bridge:
                             Final Result: “A Bridge Too Far”
     Estimates of Killed, Wounded, Missing, Captured among Allies,
     Germans and Civilians run as high as 35,000

     Arnhem Bridge: After the Battle

     “Out of Ammunition. God Save the King.”

     Operation Market Garden immortalized in Cornelius Ryan’s Book
     and MGM Movie

     RAF destroys Arnhem Bridge - Oct 1944
                   A Quick Look Back into the History of
                      Human Language Technologies

                                          “Right Problem at Right Time”     “A Bridge Too Far”

  1950’s & 60’s: Scientific               --                                ALPAC Report 1966
  Machine Translation

  Early 1980’s: Automatic                 Introduction of Noisy Channel     --
  Speech Transcription                    Model, HMM’s, & Stat. Language
                                          Modeling

  Mid-Late 1980’s: Expert                 --                                “AI Winter”
  Systems & Promise of AI

  Early 1990’s: Information               TIPSTER Text Program,             --
  Extraction & Retrieval                  MUC & TREC Workshops
                                              Question?
              Advanced Question Answering: Which is it?




                             “Right Problem at the Right Time”

                                    “A Bridge Too Far”
                               Outline

         • A Tale of Two Bridges

         • Advanced Question Answering
                – A Vision
                – One Plan of Attack
                – Challenges

         • Some Final Thoughts


                               How Do We Find
                             Information Today?



                                    Question
                                     ????



                             Let’s Start with a
                             Simple, Factual Question ---

                                   Where is the Taj Mahal?


                                   Traditional Information
                                   Retrieval (IR) Approach

     [Diagram: Question ? -> System Specific Query (e.g. Boolean Key
      Word Equation) -> Traditional Information Retrieval over a Data
      Source (e.g. Large Text Archive) -> Ranked List of Hopefully
      “Relevant” Documents]
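
The traditional IR flow on this slide (a question turned into a Boolean keyword query, run against a text archive, returning a ranked list of documents) can be sketched roughly as follows. This is only an illustration: the toy archive, the function names, and the term-count ranking are assumptions made for the example and are not taken from the talk.

```python
from collections import Counter

# Toy archive; the document texts are invented for illustration.
ARCHIVE = {
    "doc1": "The Taj Mahal is a mausoleum in Agra India",
    "doc2": "The Taj Mahal hotel in Mumbai overlooks the harbour",
    "doc3": "Agra is a city in India on the Yamuna river",
}

def tokenize(text):
    """Lower-case a string and split it on whitespace."""
    return text.lower().split()

def boolean_and_search(required_terms, archive):
    """Return ids of documents containing every term, ranked by term count."""
    terms = [t.lower() for t in required_terms]
    scored = []
    for doc_id, text in archive.items():
        counts = Counter(tokenize(text))
        if all(counts[t] > 0 for t in terms):
            scored.append((sum(counts[t] for t in terms), doc_id))
    # Highest score first; ties are broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]

# "Where is the Taj Mahal?" reduced by hand to the query: taj AND mahal
ranked = boolean_and_search(["taj", "mahal"], ARCHIVE)
```

Note that the result is a list of documents, not an answer: the user still has to read them to find out where the Taj Mahal is, and which Taj Mahal each document means.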




                      Use Your Favorite Search Engine
       Where is the Taj Mahal?




                                      Answer: Agra, India
                                      Or Is It ??? It Depends !!!
                             Alternative Answer #1
 Where is the Taj Mahal (“Hotel”)?




                                            Answer: Bombay
                                             (Mumbai), India
                             Alternative Answer #2

Where is the (“Trump”) Taj Mahal?




                                         Answer: Atlantic City, NJ

                             Alternative Answer #3

  Where is the Taj Mahal (“Restaurant”)?




                                   Answer: Utrecht, The Netherlands
                               Next Generation Approaches:
                             Question Answering (QA) Systems
     [Diagram: Single, Factoid Question ? -> System Specific Query;
      often Tailored to Question Type -> Traditional Information
      Retrieval over a Single Data Source -> Ranked List of Hopefully
      “Relevant” Documents -> QA Shallow Analysis -> “Answer”]

     • Move Closer to the Question, e.g. Question Classification
     • Move Closer to the Answer, e.g. Passage Retrieval
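
The two moves named on this slide, question classification and passage retrieval, can be illustrated with a minimal sketch. Everything here is an assumption made for the example: the question-word table, the stopword list, and the keyword-overlap scoring are deliberately crude stand-ins, not the method of any particular QA system.

```python
import re

# Hypothetical mapping from leading question word to expected answer type.
QUESTION_TYPES = {"where": "LOCATION", "who": "PERSON", "when": "DATE"}
STOPWORDS = {"where", "who", "when", "what", "is", "the", "a", "of", "in"}

def words_of(text):
    """Return the lower-cased alphabetic words of a string."""
    return re.findall(r"[a-z]+", text.lower())

def classify_question(question):
    """Guess the expected answer type from the first word of the question."""
    words = words_of(question)
    first = words[0] if words else ""
    return QUESTION_TYPES.get(first, "OTHER")

def best_passage(question, document):
    """Return the sentence that shares the most keywords with the question."""
    keys = {w for w in words_of(question) if w not in STOPWORDS}
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    # max() keeps the first sentence among ties; "" is returned for empty input.
    return max(sentences, key=lambda s: len(keys & set(words_of(s))), default="")

question = "Where is the Taj Mahal?"
document = ("Agra is a city in India. The Taj Mahal stands in Agra. "
            "It was built in the 1600s.")
answer_type = classify_question(question)
passage = best_passage(question, document)
```

Knowing that the answer type is a location lets a later step look inside the selected passage for a place name, rather than handing the whole document back to the user.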
                             TREC QA Track Approach

      • ARDA & DARPA are co-sponsoring the Question Answering Track in
        the NIST-organized Text Retrieval Conference (TREC) Program
        (starting with TREC-8 in Nov 1999).
      • TREC-9 Results (Nov 2000):
             – 693 factual questions; answer guaranteed to be found
               within at least one news story
             – Data source: approx. 2 GByte database of 500K+ news
               stories
             – 28 US & international organizations participated;
               78 separate runs evaluated
             – System output: top 5 regions (50 bytes or 250 bytes) in a
               single story believed to contain the answer to the given
               question

      [Chart: QA Track Results - TREC-9 (Nov 2000). Q's with Correct
       Answer (Within Top 5 Responses - 250 Byte Region), by system:
       1: 598, 2: 429, 3: 428, 4: 430, 5: 399, 6: 386, 7: 394, 8: 345]
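
The measure plotted on this slide, the number of questions with a correct answer somewhere in the top 5 responses, can be sketched as below. The sample questions and responses are invented, and the substring match is a simplifying assumption standing in for whatever correctness judgment the evaluation actually applied.

```python
def count_answered(responses, answer_key, top_n=5):
    """Count questions whose first top_n responses contain the known answer."""
    answered = 0
    for question_id, ranked_regions in responses.items():
        answer = answer_key[question_id].lower()
        # Assumption: a response is "correct" if it contains the answer string.
        if any(answer in region.lower() for region in ranked_regions[:top_n]):
            answered += 1
    return answered

# Invented sample data: q1 is answered at rank 1, q2 only at rank 6.
answer_key = {"q1": "Agra", "q2": "1937"}
responses = {
    "q1": ["the Taj Mahal in Agra, India", "a hotel in Mumbai"],
    "q2": ["opened in 1936", "designed in 1921", "a bond issue in 1930",
           "two main cables", "a suspension bridge", "opened in May 1937"],
}
score = count_answered(responses, answer_key)

# Using the slide's own figures: the top system answered 598 of 693 questions.
top_system_fraction = 598 / 693
```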
                             “Ask Jeeves” Approach

                                              • Start with Your Question

                                              • Identify Key Words &
                                                Classify the Type of
                                                Question

                                              • Respond with rephrased
                                                “Questions” for which
                                                “Ask Jeeves” knows the
                                                Answer

                                              • Provide Additional Web
                                                Sites as a fallback position
                                                (a la a more traditional
                                                web search engine)
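
The four steps listed above can be mimicked with a small sketch. It is not a description of how “Ask Jeeves” is actually built: the stored questions, the leading-word type rule, the overlap threshold, and all function names are hypothetical and chosen only to make the steps concrete.

```python
# Hypothetical store of questions the system "knows" how to answer.
KNOWN_QUESTIONS = {
    "Where is the Taj Mahal mausoleum?": "Agra, India",
    "Where is the Taj Mahal hotel?": "Mumbai, India",
}
STOPWORDS = {"where", "who", "is", "the", "a", "of"}

def question_type(text):
    """Classify a question by its leading word (a deliberately crude rule)."""
    words = text.lower().split()
    return words[0] if words else ""

def key_words(text):
    """Return the lower-cased content words of a question."""
    words = {w.strip("?.,!").lower() for w in text.split()}
    return words - STOPWORDS - {""}

def respond(question, known=KNOWN_QUESTIONS, min_overlap=2):
    """Offer known rephrasings of the question, else fall back to search."""
    asked_type, asked_keys = question_type(question), key_words(question)
    rephrased = [
        q for q in known
        if question_type(q) == asked_type
        and len(asked_keys & key_words(q)) >= min_overlap
    ]
    # Fall back to a conventional keyword search when nothing matches.
    fallback = [] if rephrased else ["keyword search: " + " ".join(sorted(asked_keys))]
    return {"rephrased": rephrased, "fallback": fallback}

reply = respond("Where is the Taj Mahal?")
```

For the Taj Mahal question the sketch returns both stored rephrasings and leaves the choice between mausoleum and hotel to the user, which mirrors the "It Depends" point made on the earlier slides.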

                      Structured Knowledge-Base Approach
      • Create comprehensive Knowledge Base(s) or other Structured
        Data Base(s)
      • At the 10K Axiom Level -- Capable of answering factual
        questions within domain
      • At the 100K Axiom Level -- Answer cause & effect/capability
        questions
      • At the 1000K Axiom Level -- Answer novel questions; ID
        alternatives

      Rapid Knowledge Formation (RKF)

      [Diagram: Direct Knowledge Entry by Domain Experts and Parallel
       Development by Distributed Teams -> Rapid Knowledge Formation ->
       Comprehensive (Million-Axiom) Knowledge Bases. Layers: Upper
       Ontology; Mid-Level Theories; Domain-Specific Theories. Chart
       labels: Required; Development Time (6 Months, 12 Months); HPKB;
       scale 10 K, 100 K, 1,000 K]

      Need to create new knowledge at a rate of 400 axioms per hour
      (With HPKB technology, a 5-person team can create knowledge at a
      rate of 40 axioms per hour)

      Capabilities by knowledge base size:

        1,000 K   Crisis Understanding
                    – Generate plausible crisis scenarios
                    – Uncover connected activities, threats
                    – Reason about novel crisis situations
                    – Monitor and interpret massive data streams
                  Commander’s Associate
                    – Generate possible courses of actions
                    – Perform vulnerability analyses
                    – Reason about novel battlefield situations
                    – Monitor and interpret changing battlefield events
          100 K     – Answer cause & effect questions about events
                    – Answer questions about force capabilities
           10 K     – Retrieve facts relevant to a crisis
                    – Answer questions about terrain

      Biological Weapons (BW) Knowledge
        • Basic knowledge of space, time, causality, general physics
        • Biology, & biological threats
        • BW R&D, produce, weaponize
        • Geo-political behavior & terrorism

                Deepest QA but Limited to Given Subject Domain
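
The two knowledge-creation rates quoted on this slide (40 axioms per hour with HPKB technology, 400 axioms per hour needed) can be turned into a quick back-of-the-envelope comparison. The sketch below assumes a constant rate and counts raw authoring hours only; it says nothing about calendar time, team size, or the 6 and 12 month marks on the chart.

```python
HPKB_RATE = 40      # axioms per hour for a 5-person team (figure from the slide)
TARGET_RATE = 400   # axioms per hour called for on the slide

def hours_to_build(axioms, rate_per_hour):
    """Authoring hours needed to reach a knowledge base size at a fixed rate."""
    return axioms / rate_per_hour

# The three knowledge base sizes named on the slide: 10K, 100K, 1000K axioms.
for size in (10_000, 100_000, 1_000_000):
    at_hpkb = hours_to_build(size, HPKB_RATE)
    at_target = hours_to_build(size, TARGET_RATE)
    print(f"{size:>9,} axioms: {at_hpkb:>8,.0f} h at 40/h, {at_target:>7,.0f} h at 400/h")

speedup = TARGET_RATE / HPKB_RATE
```

Under these assumptions a million-axiom knowledge base takes 25,000 authoring hours at the HPKB rate and 2,500 hours at the target rate, a tenfold difference.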

                         Advanced Question Answering
        In a foreign news broadcast a team of analysts observes a previously
     unknown individual conferring with the Foreign Minister. They suspect that
                        he/she is really a new senior advisor.

          • Who is this advisor?
          • What do we know about him/her?
          • What are his/her views?
          • What influence does he/she have on FM?
          • Does this signal that other policy changes are coming?
          • And still more questions ???

          [Diagram labels: Information Analysts; Overarching Context /
           Operational Requirement]
                         Advanced Question Answering

  [Diagram: Advanced QA]

     • Interpreting Complex QA Scenario within a Larger Context
            – Information Analysts: Factoid Questions, Why Questions, Interpretive
              Questions, Judgement Questions, Predictive Questions, Other Questions
            – Overarching Context / Operational Requirement
     • Deeper Automated Understanding
            – System Specific Queries; Fully Tailored to Series of Questions
            – Extract & Analyze Results
     • Extend Traditional Information Retrieval
            – Multiple Heterogeneous Data Sources: Voice, Text, Multi-Media,
              Structured, Other
            – Ranked Lists of “Relevant” Data Objects
     • Provide Answers in a Form Analysts Want
            – Interpret Results & Formulate the Answers
            – Answers
                                             Advanced Question Answering Is
                                             Skipping Ahead Two Generations

  Multiple Key Barriers to Content Understanding
  Will Be Aggressively Attacked

  Commercial World & Current R&D Efforts
  Are Addressing the Next Generation
  But Only Selected Content Understanding
  Barriers Are Being Aggressively Attacked
                                   Outline

         • A Tale of Two Bridges

         • Advanced Question Answering
                – A Vision
                – One Plan of Attack:
                     ARDA’s newest R&D Program:
                     Advanced QUestion Answering for
                     INTelligence (AQUAINT)
                – Challenges

         • Some Final Thoughts
                                     AQUAINT:
                                    One Plan of Attack

       • ARDA’s newest R&D Program
              – Jointly designed / managed by multiple US Government agencies
              – Envisioned as a high risk, long term R&D Program consisting of
                three 2-year phases
       • Focus on Final Objective from start
              – Incrementally add media, data sources, & complexity of
                questions & answers during each phase
       • Each of AQUAINT’s 3 Phases will:
              – Use Zero-Based, Open BAA-styled Solicitations
              – Focus on Key Research Objectives
              – Be Closely Linked to Parallel System Integration/Testbed Efforts
       • BAA for AQUAINT Phase I
              – Issued by ARDA in April 2001
              – Solicitation Period ended 6 June 2001
              – Contract awards anticipated in early Fall 2001.
                                     AQUAINT:
                          R&D Focused on Three Functional Components

     • Question Understanding and Interpretation
            – Inputs: QUESTION; Natural Statement of Question; Use of Multimedia
              Examples; Question & Requirement Context; Analyst Background Knowledge
            – Clarification; Question & Answer Context
            – Query Assessment, Advisor, Collaboration (Other Analysts;
              Knowledge Bases; Technical Databases)
            – Query Refinement based on Analyst Feedback
            – Outputs: Queries; KB Queries
     • Determine the Answer
            – Translate Queries into Source Specific Retrieval Languages
              (Multiple Source Specific Queries)
            – Multiple Sources; Multiple Media; Multi-Lingual; Multiple Agencies
            – Partially Annotated & Structured Data; Automatic Metadata Creation
              (Supplemental Use)
            – Multiple Ranked Lists; Single, Merged Ranked List of Relevant
              “Documents”
            – Relevant “Knowledge” and Relevant “Documents”:
                   • Relevant information extracted and combined where possible;
                   • Accumulation of Knowledge across “Documents”
                   • Cross “Document” Summaries created;
                   • Language/Media Independent Concept Representation
                   • Inconsistencies noted;
                   • Proposed Conclusions and Inferences Generated
            – Results of Analysis; Iterative Refinement of Results based on
              Analyst Feedback
     • Answer Formulation
            – Formulate Answer for Analyst in form they want
            – Multimedia Navigation Tools for Analyst Review
            – Proposed Answer; Analyst Feedback; FINAL ANSWER

                                           AQUAINT:
                             Cross Cutting/Enabling Technologies R&D Areas


     Specifically Solicited Research Areas include:
     1) Advanced Reasoning for Question Answering
     2) Sharable Knowledge Sources
     3) Content Representation
     4) Interactive Question Answering Sessions
     5) Role of Context
     6) Role of Knowledge
     7) Deep, Human Language Processing and Understanding


                                              AQUAINT:
                                     Separate, Coordinated Activities

  [Diagram]

     • Top band:
            – Component Integration and System Architecture Issues
            – Component Level / End-to-End Testing & Evaluation
     • Center (functional components):
            – QUESTION
            – Question Understanding and Interpretation
            – Determine the Answer: Information Retrieval Process;
              Analysis & Synthesis Process
            – Answer Formulation
            – FINAL ANSWER
     • Bottom band:
            – Cross Cutting/Enabling Technologies Research Issues
            – Annotated and ‘Ground Truthed’ Data
     • Labels: AQUAINT Phase I Solicitation; Separate Coordinated Activities
                                                   AQUAINT:
                                                Intermediate Goals

                        FULL COMPLEXITY OF QUESTIONS & ANSWERS RANGES:
           FROM:                                                TO:
           Questions: Simple Facts                           Questions: Complex; Uses Judgement Terms
                                                                        Knowledge of User Context Needed;
                                                                        Broad Scope

           Answers: Simple Answers found in                  Answers: Search Multiple Sources (in multiple
                    Single Document                                   Media/languages); Fusion of
                                                                      information; Resolution of conflicting
                                                                      data; Multiple Alternatives; Adding
                                                                      Interpretation; Drawing Conclusions



                             Increasing Complexity Levels of Questions & Answers


             Level 1    “Simple Factual QA’s”                    Current
             Level 2    “Template & Multi-valued QA’s”           Near Term
             Level 3    “Cross Media & Cross Document QA’s”      Mid Term
             Level 4    “Context-Based QA Scenarios”             Long Term
                                               AQUAINT:
                                                 Data Types

        Structured / Semi-Structured:
               – KB’s
               – DB’s
               – “Tagged Data” (e.g. Web Data)

        Unstructured:
               – Human Language
                      Media:     Text Documents; Speech; Multi-Media
                      Language:  English; Foreign Language 1; Foreign Language 2;
                                 Foreign Language N
                      Genre:     Newswire / News Broadcast; Technical;
                                 Formal / Informal Communication; Other
               – Visual Data: Video; Still Images
               – Technical / Abstract: Sensor; Geospatial; Economic; Other

        DATA FOCUS OF RELATED QA PROGRAMS / ACTIVITIES:
               – Commercial “Ask Jeeves”
               – DARPA’s DAML
               – DARPA’s RKF
               – DARPA’s TIDES & TDT
               – TREC QA Track
               – ARDA’s VACE
               – ARDA’s GI2Vis
                                            AQUAINT:
                                      Phase I Data Dimensions

  1. Focused
       Requirement:  Single media, single language, and single genre in an
                     unstructured data source
       Example:      English newspaper/newswire articles (text)

  2. Multiple Media
       Requirement:  Two or more of the following: text (clean, degraded, and
                     speech recognition produced), raw speech, still imagery,
                     video data, abstract data (technical, geospatial), and
                     related media
       Example:      Question where the answer is a summarization of information
                     found in video clips & may contain a table of technical data
                     extracted from various sources (geospatial, text, etc.)

  3. Cross Lingual
       Requirement:  English questions with foreign language references and
                     passages. Foreign languages could be expressed using any
                     number of foreign character scripts and encoding schemes.
       Example:      English question with answer derived from single media
                     (newswire) material in Chinese, Arabic, or another language.
                                           AQUAINT:
                                      Phase I Data Dimensions

  4. Multiple Genre
       Requirement:  Formal and informal correspondence (various media), formal
                     dialog, informal conversations or discussions,
                     technical/journal articles, newswire/broadcast news,
                     advertisements, product and technical descriptions,
                     government reports, public databases
       Example:      Question with answer derived from formal correspondence and
                     journal articles

  5. Structured & Unstructured
       Requirement:  Tables, charts and maps, diagrams, linked data or directed
                     graph data, structured databases, structured transactions,
                     large knowledge bases, linked web pages, and HTML/XML
                     documents PLUS unstructured data from one of the media,
                     lingual or genre dimensions.
       Example:      Question with answer derived from knowledge base and
                     substantiated with information from technical journal.


                                  AQUAINT:
                             System Integration/Testbed


      • Government-led effort:
             – Directly Linked into Sponsoring Agency’s Technology Insertion
               Organizations
             – Close, working relationship with working Analysts
             – Provide external system development support
             – Utilize external researchers as Consultants / Advisors
      • Pull together best available system components
        emerging from AQUAINT Program research efforts
             – Couple AQUAINT components with existing GOTS and COTS
               software
      • Develop end-to-end AQUAINT prototype(s) aimed at
        specific Operational QA environments

                               Outline

         • A Tale of Two Bridges

         • Advanced Question Answering
                – A Vision
                – One Plan of Attack
                – Challenges

         • Some Final Thoughts


                             Top 10 Challenges

         1) Satisfy QA requirements of the “Professional”
            Information Analyst




                Professional Information Analysts:
                       Target Audience for AQUAINT -- Who are They?


     • For ARDA they are:
            – Government and Military Analysts

     • But they could also be:
            –    Investigative / “CNN-type” Reporters
            –    Financial Industry Analysts / Investors
            –    Historians / Biographers
            –    Lawyers / Law Clerks
            –    Law Enforcement Detectives
            –    And Others

                Professional Information Analysts:
                              What do They have in Common?

  • They are not casual users of data and information
  • They work in an information rich environment where they already
    have access to large quantities of heterogeneous data
  • They are almost always subject matter experts within their
    assigned task areas
  • They track and follow a given event, scenario, problem, or
    situation for an extended period of time
  • They are focused on their assigned task or mission
    and will do whatever it takes to accomplish it
  • The end product that results from their analysis is often judged
    against the standards of: Timeliness, Accuracy, Usability,
    Completeness, Relevance
                             Top 10 Challenges

         1) Satisfy QA requirements of the “Professional”
            Information Analyst
         2) Pursue QA Scenarios and not just isolated,
            factually based QA




                         Implications of QA Scenarios
  • Requires handling a Full Range of Complexity & Continuity of
    Questions
  • Need to understand & track the analysts’ line of reasoning
    and flow of argument
  • QA System requires significantly greater insight into
    knowledge, desires, past experiences, likes and dislikes of
    “Questioner”
  • Place much higher value on recognizing and capturing
    “background” information
  • Questioner/System dialogue is now more than just a
    means for clarification

  [Diagram: Information Analysts; Factoid Questions, Why Questions,
   Interpretive Questions, Judgement Questions, Predictive Questions,
   Other Questions; Overarching Context / Operational Requirement]
                             Top 10 Challenges

         1) Satisfy QA requirements of the “Professional”
            Information Analyst
         2) Pursue QA Scenarios and not just isolated,
            factually based QA
         3) Support a collaborative, multiple analyst
            environment




                                   Collaboration within QA
    • Standard Collaboration (From an Analyst Perspective)
           – Who else is working all or a portion of my task?
           – What do they know that I don’t and vice versa?
           – Can we share/work together?
    • Non-Standard Discovery (From a System Perspective)
           – Identify previous QA Scenarios that have “similarity” to current QA
             Scenario. Compare & Contrast
           – Use / Build-on / Update previous results
           – Uncover new data sources
           – Borrow a successful “line of reasoning” or “argument flow”
           – Alert analyst to different interpretations or to overlooked /
             undervalued data

    [Figure labels: QUESTION ????; Question Understanding and Interpretation (Natural Statement of Question; Use of Multimedia Examples; Clarification); Question & Requirement Context; Analyst Background Knowledge; Query Assessment, Advisor, Collaboration; Focus; Other Analysts; Knowledge Bases; Technical Databases]
                             Top 10 Challenges

         1) Satisfy QA requirements of the “Professional”
            Information Analyst
         2) Pursue QA Scenarios and not just isolated,
            factually based QA
         3) Support a collaborative, multiple analyst
            environment
         4) Sometimes SMALL things really matter and
            other times BIG things don’t



       “Small & Big” - Can we tell the difference?

      • Sometimes SMALL differences can produce
        significantly different results/interpretations:
             – Stop Words
                • “Books {by; for} kids”
             – Attachments
                • “The man saw the woman in the park with the telescope.”
             – Co-reference
                • “John {persuaded; promised} Bill to go. He just left.”
                • “Mary took the pill from the bottle. She swallowed it.”
      • Other times BIG differences can produce the
        same/similar results:
             – “Name the films in which Jude Law starred.”
             – “Jude Law played a leading role in which movies?”
             – “In what Hollywood productions did Jude Law receive top billing?”
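
A minimal Python sketch of the contrast above, assuming a naive bag-of-words comparison. The stop-word list, the function names and the Jaccard overlap measure are illustrative assumptions, not something taken from the slides. It shows how stop-word removal makes two questions with different meanings look identical, while three paraphrases of one question barely overlap.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"by", "for", "the", "in", "with", "to"}


def content_terms(text):
    """Return the set of lower-cased words left after stop-word removal."""
    words = re.findall(r"[a-z]+", text.lower())
    return frozenset(w for w in words if w not in STOP_WORDS)


def jaccard(a, b):
    """Word-overlap similarity between two strings (1.0 means identical term sets)."""
    ta, tb = content_terms(a), content_terms(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


# SMALL difference, different meaning: the term sets are identical.
small = jaccard("Books by kids", "Books for kids")

# BIG difference, same meaning: the term sets share only "jude" and "law".
big = jaccard(
    "Name the films in which Jude Law starred.",
    "In what Hollywood productions did Jude Law receive top billing?",
)

print(f"small-difference pair overlap: {small:.2f}")
print(f"big-difference pair overlap:   {big:.2f}")
```

Under these assumptions a purely lexical matcher gets both cases backwards, which is the point of the slide.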

                             Top 10 Challenges

         1) Satisfy QA requirements of the “Professional”
            Information Analyst
         2) Pursue QA Scenarios and not just isolated,
            factually based QA
         3) Support a collaborative, multiple analyst
            environment
         4) Sometimes SMALL things really matter and
            other times BIG things don’t
         5) Advanced QA must attack the “Data Chasm”

                   AQUAINT: Attacking the Data Chasm

   Questions
        Today:              Single, Factual, Isolated Questions
        Level I:            Multi-Valued Factual Questions
        Level II:           Cross Media; Cross Document; Simple Judgement
        Future Level III:   Full Context-Based Question Scenario

   Data Chasm
        Missing Data
        Contradictory Data
        MANY Heterogeneous Data Sources; All Types, Sizes, Locations
        Increasing Volumes (Petabyte & up)
        Synthesis Across Media/“Documents”

   Answers
        Today:              50/250 Byte Passage from Single Text Document
        Level I:            Fixed Templates or Tabular Lists
        Level II:           Variable Narrative Summary; Multi-Media Presentations;
                            Simple Interpreted Results
        Future Level III:   Fully Intersected; Automatically Generated;
                            Variable Structure/Format; Full Context Responses
                             Top 10 Challenges

         1) Satisfy QA requirements of the “Professional”
            Information Analyst
         2) Pursue QA Scenarios and not just isolated,
            factually based QA
         3) Support a collaborative, multiple analyst
            environment
         4) Sometimes SMALL things really matter and
            other times BIG things don’t
         5) Advanced QA must attack the “Data Chasm”
         6) Time is of the Essence
                             Time: Our Achilles Heel?
  • Real Difficulties Exist in:
         – Extracting, correctly interpreting time references
           & then creating manageable timelines
         – Estimating & updating changing reliability
           of information over time
         – Processing information in time sequence
         – Tracking the details of an evolving event over
           time -- A whole different set of problems
  • And of course:
         – We can’t forget all of the issues related to the
           timeliness of the system’s response to our
           question(s) -- we’ll need at least “near real
           time responses”
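
As a rough illustration of the first difficulty listed above, the Python sketch below extracts explicit dates from short text snippets and orders them into a timeline. The snippets, the regular expression and the function name are hypothetical. It handles only fully specified "day month year" references, so it deliberately sidesteps the hard cases the slide has in mind (relative, vague or conflicting time references).

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}

# Matches only explicit references such as "14 May 2001" (an assumed format).
DATE_RE = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")


def build_timeline(snippets):
    """Return (date, snippet) pairs sorted chronologically."""
    events = []
    for text in snippets:
        for day, month, year in DATE_RE.findall(text):
            events.append((date(int(year), MONTHS[month], int(day)), text))
    return sorted(events, key=lambda event: event[0])


# Invented example snippets, not taken from any real source.
snippets = [
    "Talks resumed on 14 May 2001.",
    "The advisor was first seen on 3 March 2001.",
    "A statement followed on 2 April 2001 and again on 20 June 2001.",
]

timeline = build_timeline(snippets)
for when, text in timeline:
    print(when.isoformat(), "|", text)
```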


                    March      April   May   June   July      August
                             Top 10 Challenges

         7) Must extract, represent and preserve information
            uncovered when searching for answers




             QA Scenarios: A Different Paradigm?
     • Current Analytic Paradigm:
          – Sequentially “Filter Down” to the final result
          – Works when QA’s are independent, isolated activities
          [Figure labels: Data; Processing & Analysis; Results]
     • A Different Paradigm may be useful when handling QA Scenarios:
          – Cast a “wider net” while searching for “golden nuggets” (Answers)
          – Automatically Extract, Represent, and Preserve “closely related”
            background information within context of the QA Scenario
          [Figure labels: Space of Data Objects and Sources; Answers; Background; Discarded; How Wide to Cast the “Net”?; What Info to Retain? In what form? For how long?]
                             Top 10 Challenges

         7) Must extract, represent and preserve information
            uncovered when searching for answers

         8) Rapidly increasing importance of Knowledge of
            all types -- regardless of the approach




                                Complex QA:
          The Need for Ever Increasing Knowledge -- Of All Types

   DIMENSIONS OF THE QUESTION PART OF THE QA PROBLEM
        Dimensions: Scope; Judgement; Context
        From Simple Factual Question to the Advanced QA R&D Program:
        Increasing Knowledge Requirements **

   DIMENSIONS OF THE ANSWER PART OF THE QA PROBLEM
        Dimensions: Multiple Sources; Interpretation; Fusion
        From Simple Answer, Single Source to the Advanced QA R&D Program:
        Increasing Knowledge Requirements **

   ** Knowledge Requirement would be better represented with a
      whole “quiver of arrows” of different sizes, lengths and types
                             Top 10 Challenges

         7) Must extract, represent and preserve information
            uncovered when searching for answers

         8) Rapidly increasing importance of Knowledge of all
            types -- regardless of the approach

         9) Expanding requirements for more advanced
            learning and reasoning methods/approaches




                 Improved Reasoning & Learning
   In a foreign news broadcast a team of analysts observes a previously
   unknown individual conferring with the Foreign Minister. They suspect that
   he/she is really a new senior advisor.

   FOCUS: questions raised by the Information Analysts within the
   Overarching Context / Operational Requirement
        – Who is this advisor?
        – What do we know about him/her?
        – What are his/her views?
        – What influence does he/she have on FM?
        – Does this signal that other policy changes are coming?
        – And still more questions ???
              Improved Reasoning & Learning
   Advanced Reasoning:
   • Use Multi-level Plans
   • Create and evaluate complex chains of reasoning
   • Reason across heterogeneous data sources
   • Infer answers from data extracted from multiple sources when
     the answer is not explicitly stated
   • Utilize Link Analysis & Evidence Discovery
   • Plus other strategies

   Advanced Learning:
   • Automatically learn new or modify existing reasoning strategies

   [Figure labels: New Senior Advisor; Collected Views (TV & Radio Broadcasts, Newspapers & Other Archives); Raw “Bio” Information (Education, Past Positions, Family, Travels, Other Activities); Cross Fertilization; Follow-up Leads; Summarized Results (“Views: Past & Present” and “Bio”)]
                             Top 10 Challenges

         7) Must extract, represent and preserve information
            uncovered when searching for answers
         8) Rapidly increasing importance of Knowledge of all
            types -- regardless of the approach
         9) Expanding requirements for more advanced
            learning and reasoning methods/approaches
         10) Discovering the correct answer will be hard
           enough; but crafting an appropriate, articulate,
           succinct, explainable response will be even harder


           Difficulties in Generating Answers
      • Natural Language Generation continues to be a difficult, open
        research area.
             – Adding the requirement to generate multimedia answers makes this
               problem even harder.
      • Providing the ability to explain and/or justify answers also
        continues to be a difficult, open research area.
             – The more complex the line or chain of reasoning, the more complex
               the explanation and/or justification
      • In addition, QA Scenarios add another level of complexity. The
        same question asked by different end users within different
        scenarios could produce substantially different:
             – Answer content because of different analysts’ backgrounds, needs,
               & desires
             – Answer format, structure, depth and/or breadth of coverage
             – Or even both
                               Outline

         • A Tale of Two Bridges

         • Advanced Question Answering
                – A Vision
                – One Plan of Attack
                – Challenges

         • Some Final Thoughts


                         Returning to our Question
              Advanced Question Answering: Which is it?




                             “Right Problem   “A Bridge
                                 at the
                              Right Time”      Too Far”




                             Is it “A Bridge Too Far?”

        Are The Top Ten Challenges Overwhelming?
      • “… [Advanced Question Answering] outlines an interesting
        set of goals, … bordering on “AI-complete””
      • “Advances in natural language understanding/
        processing (computational linguistics) will make only
        limited contributions during the next decade to
        application areas in which it could be a powerful
        enabling technology. . . .
            “This reflects the inherent difficulty of the problems
            involved, the limited state of the current base of
            knowledge/understanding within the field.”
              So Some Clearly Feel Advanced QA is
                   FAR TOO HARD of a problem!
                            Or is it the
                “Right Problem at the Right Time?”
            I Believe There Are Reasons For Optimism
   • Recent significant advances and progress in many of
     the foundational technology areas for Advanced QA:
          – Human Language Technology (HLT) research programs
                (e.g. DARPA’s TIDES, TDT, Communicator, EELD (future), ARDA’s VACE,
               DoD’s ACE, etc.)

          – Knowledge-Based (KB) research programs
                 (e.g. DARPA’s HPKB, RKF etc.)

          – Intelligent Agent (IA) research programs
                 (e.g. DARPA’s DAML, CoABS, TASK etc.)

          – Planning/Collaboration research programs
                 (e.g. DARPA’s GENOA etc.)

                            Or is it the
                “Right Problem at the Right Time?”
   • Recent significant advances and progress (continued)
          – Artificial Intelligence (AI)
                • IBM’s Deep Blue Chess Program beating World Chess Champion Garry
                  Kasparov in 1997
                • Artist Harold Cohen’s Aaron Program that produces museum quality
                  paintings
                • Charles Schwab Discount Brokerage’s program allowing more Intelligent
                  Search of its web pages
                • Belgium’s Starlab Artificial Brain Project to build an artificial brain to run a
                  life-sized cat robot (~ 75 million artificial neurons)
                • Robocup, the annual robotic team soccer challenge
                • MIT AI Lab’s Cog Project to develop human-like behaviors in robots

   • Net Result: Real confluence of progress across a
     broad range of information technology areas that
     are highly applicable to the Advanced QA problem
                     Yet More Reasons for Optimism
      •    Positive enthusiasm and significant success generated in the IR / IE
           Communities for the QA Track in TREC
             – QA Track in TREC is continuing to evolve its tasks under a QA Roadmap
               developed in response to a published Vision statement for Advanced Q&A
               (http://www-nlpir.nist.gov/projects/duc/)

      •    Many Research Communities are actively searching for new,
           challenging problem areas. The Advanced QA Problem is an application
           area that clearly:
             – Embraces multiple technology areas
             – Provides an environment that is very conducive to collaboration across
               technology areas, and
             – Allows researchers to test their theories and ideas against a higher, broader,
               more inclusive level application

                 I Believe that we are POISED AND READY
                   to take on the Advanced QA Problem!
                             So “My Final Answer”
              Advanced Question Answering: Which is it?




                             “Right Problem   “A Bridge
                                 at the
                              Right Time”      Too Far”




                              So “My Final Answer”
                             I Believe That Advanced Question Answering Is The
                         “Right Problem at the Right Time!”



                                                                “A Bridge
                                                                 Too Far”


                               “Right Problem
                                   at the
                                Right Time”




                             Thank You



