Large-Scale Repositories of Highly Expressive Reusable Knowledge

Document Sample
Large-Scale Repositories of Highly Expressive Reusable Knowledge Powered By Docstoc
Ontology Development to Support AQUA Question-Answering
Richard Fikes, Jessica Jenkins, Bill MacCartney, Rob McCool, Deborah McGuinness



                  Knowledge Systems Laboratory
                       Stanford University
                      www.ksl.stanford.edu
       KSL and the WMD Coalition
 Tools for ontology creation, evolution, and maintenance
    Coalition teams have adopted KSL’s Ontolingua and Chimaera as
     a standard for ontology development, maintenance, and analysis
          (Additionally, we use other internal tools – JTP, DQL, IW)

 Initial evaluation and ongoing support
    KSL evaluated the initial stage of the WMD ontology using its Chimaera
      tools, reviewed the findings with the other teams, and taught them how
      to use the tools themselves




 Knowledge representation and consulting work
       KSL is providing some new core KB development for CNS (Russian naval
        facilities and Newly Independent States facilities) and is augmenting
        it with extraction results
       Providing consultation on sources and merging opportunities:
        counter-terrorism KBs built for DARPA (HPKB, ISX-Cyladian-HORUS,
        SAIC-Cycorp, …), semantic web for the military (SWMU ontology
        tutorial), ontologies for information fusion, etc.




 Knowledge extraction
    KSL knowledge extraction tools for RNF and NIS
 New focus on using other important KB sources
    SUMO (Suggested Upper Merged Ontology) is a core ontology for ontology
     sharing: ~3,900 axioms; relations and sets; processes and objects;
     temporal, spatial, and mereological relations; agents, etc.
    Domain ontologies: WMDs, terrorism, biological viruses, …
    Written in SUO-KIF, a nonstandard variant of KIF
    Published by Teknowledge under the GNU General Public License as part of
     the IEEE SUO working group (ontology.teknowledge.com/)

                                   SUMO
 SUMO requires translation to be used with any reasoner

 KSL has successfully translated SUMO to plain-vanilla KIF
       Full translation: complete semantic content retained
       Highly portable: should be fully compatible with most FOL reasoners
       Accurate: most test queries demonstrably answerable from translated
        axioms
 However, the result is not yet fully usable
       Most test queries are not answered from the full axiom set in reasonable time
       SUMO was not designed for efficient automated reasoning
       Solution: a smarter translator, and some SUMO “brain surgery”
 Translation to DAML may provide another path…
       The existing translation is quite lossy, but enables some query-answering
       Further work will enable a more complete and accurate translation



  DAML Versions of SUMO, WMD, and Terrorism

 DAML translations of SUMO, WMD, and 5 terrorism-related ontologies
  and knowledge bases (KBs) provided by Teknowledge

                        Classes   Properties   Instances   Triples   Dropped Axioms
  SUMO                      577          188          68      3524             ~800
  WMD                       186            9          69      1021               59
  Terrorism Ontology         84            0           0       200               44
  Terrorism KBs               2            1        2892      9668             2919
  Total                     849          198        3029     14413             3822
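The Total row is the column-wise sum of the four source rows. A quick consistency check, sketched in Python; the SUMO dropped-axiom count is approximate (~800) on the slide, so 800 is assumed here.

```python
# Counts transcribed from the DAML translation table.
# ASSUMPTION: the SUMO "Dropped Axioms" entry is "~800" on the slide; 800 is used.
COLUMNS = ["Classes", "Properties", "Instances", "Triples", "Dropped Axioms"]
ROWS = {
    "SUMO": [577, 188, 68, 3524, 800],
    "WMD": [186, 9, 69, 1021, 59],
    "Terrorism Ontology": [84, 0, 0, 200, 44],
    "Terrorism KBs": [2, 1, 2892, 9668, 2919],
}
REPORTED_TOTAL = [849, 198, 3029, 14413, 3822]


def column_totals(rows):
    """Sum each column across all source rows (dict order is preserved)."""
    return [sum(column) for column in zip(*rows.values())]


if __name__ == "__main__":
    for name, computed, reported in zip(COLUMNS, column_totals(ROWS), REPORTED_TOTAL):
        status = "ok" if computed == reported else "MISMATCH"
        print(f"{name:<15} computed={computed:<6} reported={reported:<6} {status}")
```

Every column sums to the reported total, so the table is internally consistent.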



   DAML Versions of SUMO, WMD, and Terrorism

 Teknowledge used a few simple translation rules to convert the
  original KIF content to DAML
 Example: subrelation
    KIF:   (subrelation father parent)
    DAML:  <daml:ObjectProperty rdf:ID="father">
             <rdfs:subPropertyOf rdf:resource="#parent"/>
           </daml:ObjectProperty>

 Example: KIF triples to RDF triples
    KIF:   (part MadridSpain Spain)
    DAML:  <rdf:Description rdf:ID="MadridSpain">
             <part rdf:resource="#Spain"/>
           </rdf:Description>
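The two rules above can be sketched as a tiny translator. This is a minimal illustration only, assuming flat (non-nested) KIF literals and treating every binary predicate other than subrelation as an RDF property; it is not Teknowledge's actual translator, and the function names are hypothetical.

```python
def parse_flat_literal(kif: str) -> list[str]:
    """Split a flat KIF literal such as '(part MadridSpain Spain)' into tokens.

    Nested expressions (rules, quantifiers) are out of scope for this sketch.
    """
    text = kif.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError(f"not a KIF literal: {kif!r}")
    tokens = text[1:-1].split()
    if any("(" in tok or ")" in tok for tok in tokens):
        raise ValueError(f"nested expressions are not supported: {kif!r}")
    return tokens


def kif_to_daml(kif: str) -> str:
    """Translate one flat binary KIF literal using the two simple rules."""
    predicate, *args = parse_flat_literal(kif)
    if len(args) != 2:
        raise ValueError(f"no simple rule for {kif!r}")
    subject, obj = args
    if predicate == "subrelation":
        # (subrelation child parent) -> rdfs:subPropertyOf
        return (f'<daml:ObjectProperty rdf:ID="{subject}">\n'
                f'  <rdfs:subPropertyOf rdf:resource="#{obj}"/>\n'
                f'</daml:ObjectProperty>')
    # Any other binary KIF predicate becomes an RDF property on the subject.
    return (f'<rdf:Description rdf:ID="{subject}">\n'
            f'  <{predicate} rdf:resource="#{obj}"/>\n'
            f'</rdf:Description>')


print(kif_to_daml("(subrelation father parent)"))
print(kif_to_daml("(part MadridSpain Spain)"))
```

The subrelation rule is tested first so that it does not fall through to the generic triple rule; ternary literals such as capability raise an error, which is where the nontrivial cases begin.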




   DAML Versions of SUMO, WMD, and Terrorism

 Teknowledge’s DAML files provide a great starting point, but there are
   a few problems:
        Syntactic errors and issues with resolving references across files; these
         problems are easy to fix.
        A large amount of the original KIF content was dropped in the translation
         to DAML. Reincorporating some of this content is trivial, but in general
         it is nontrivial.



                        Example (trivial): TransitiveRelation
    KIF:   (instance part TransitiveRelation)
    DAML:  <daml:TransitiveProperty rdf:ID="part">
             …
           </daml:TransitiveProperty>
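Declaring part a daml:TransitiveProperty lets a DAML-aware reasoner chain part facts. The sketch below computes that closure by naive fixpoint iteration; the fact (part Spain Europe) is a hypothetical addition used only to show the chaining.

```python
from collections import defaultdict


def transitive_closure(pairs):
    """Return every (a, c) pair entailed by treating the relation as transitive.

    Naive fixpoint iteration: keep composing known pairs until nothing new
    appears. Adequate for small examples; not tuned for large KBs.
    """
    closure = set(pairs)
    while True:
        successors = defaultdict(set)
        for a, b in closure:
            successors[a].add(b)
        derived = {(a, c) for a, b in closure for c in successors.get(b, ())}
        if derived <= closure:
            return closure
        closure |= derived


# (part MadridSpain Spain) is from the slide; (part Spain Europe) is a
# HYPOTHETICAL fact added only to illustrate chaining.
facts = {("MadridSpain", "Spain"), ("Spain", "Europe")}
print(sorted(transitive_closure(facts)))
# -> [('MadridSpain', 'Europe'), ('MadridSpain', 'Spain'), ('Spain', 'Europe')]
```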




    KIF -> DAML Example (nontrivial)
                        Original KIF
       (=> (and (instance ?SUBSTANCE BiochemicalAgent)
                 (possesses ?AGENT ?SUBSTANCE))
            (capability BiochemicalAttack agent ?AGENT))
     Translation of capability (ternary relation re-expressed as binary relations)
            <rdfs:Class rdf:ID="Capability"/>
            <rdf:Property rdf:ID="capabilityRole">
              <rdfs:domain rdf:resource="#Capability"/>
              <rdfs:range rdf:resource="#CaseRole"/>
            </rdf:Property>
            <rdf:Property rdf:ID="capabilityProcess">
              <rdfs:domain rdf:resource="#Capability"/>
              <rdfs:range>
                <daml:Restriction>
                  <daml:onProperty rdf:resource="&rdfs;#subClassOf"/>
                  <daml:hasValue rdf:resource="#Process"/>
                </daml:Restriction>
              </rdfs:range>
            </rdf:Property>
            <rdf:Property rdf:ID="capability">
              <rdfs:range rdf:resource="#Capability"/>
            </rdf:Property>
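The reification pattern above can be sketched as a round trip: a ternary capability fact becomes a Capability node plus binary triples, and the ternary fact can be recovered from them. The node-naming scheme and the function names are assumptions modelled on the BiochemicalAttackAgentCapability example, not part of the published translation.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def reify_capability(process: str, role: str, agent: str) -> list[Triple]:
    """Encode the ternary fact (capability PROCESS ROLE AGENT) as binary triples.

    HYPOTHETICAL helper following the slide's pattern: a Capability node ties a
    process class to a case role, and the binary 'capability' property links
    the agent to that node. The node name is an assumed naming convention.
    """
    node = f"{process}{role[:1].upper()}{role[1:]}Capability"
    return [
        Triple(node, "rdf:type", "Capability"),
        Triple(node, "capabilityProcess", process),
        Triple(node, "capabilityRole", role),
        Triple(agent, "capability", node),
    ]


def recover_ternary(triples: list[Triple]) -> set[tuple[str, str, str]]:
    """Invert the encoding, showing that no information is lost."""
    process = {t.subject: t.obj for t in triples if t.predicate == "capabilityProcess"}
    role = {t.subject: t.obj for t in triples if t.predicate == "capabilityRole"}
    return {
        (process[t.obj], role[t.obj], t.subject)
        for t in triples
        if t.predicate == "capability"
    }


# "SomeAgent" stands in for the rule variable ?AGENT.
encoded = reify_capability("BiochemicalAttack", "agent", "SomeAgent")
for triple in encoded:
    print(triple)
```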

     KIF -> DAML Example (nontrivial, cont.)
                                Original KIF
             (=> (and (instance ?SUBSTANCE BiochemicalAgent)
                      (possesses ?AGENT ?SUBSTANCE))
                  (capability BiochemicalAttack agent ?AGENT))
     Translation of “?AGENT has BiochemicalAttack capability if it
                    possesses a BiochemicalAgent”
              <sumo:Capability rdf:ID="BiochemicalAttackAgentCapability">
                <sumo:capabilityRole rdf:resource="&sumo;#agent"/>
                <sumo:capabilityProcess rdf:resource="#BiochemicalAttack"/>
              </sumo:Capability>
              <rdfs:Class rdf:ID="AgentsWithBiochemicalAttackCapability">
                <rdfs:subClassOf>
                  <daml:Restriction>
                    <daml:onProperty rdf:resource="&sumo;#capability"/>
                    <daml:hasValue rdf:resource="#BiochemicalAttackAgentCapability"/>
                  </daml:Restriction>
                </rdfs:subClassOf>
                <daml:unionOf rdf:parseType="daml:collection">
                  <daml:Restriction>
                    <daml:onProperty rdf:resource="&sumo;#possesses"/>
                    <daml:hasClass rdf:resource="#BiochemicalAgent"/>
                  </daml:Restriction>
                  …


              Query-Answering Example 1
 “What has the capability of being the agent of a biochemical attack?”
        Query pattern: (capability ?agt Biochemical-Attack-Agent-Capability)
 Knowledge in the ontology:
        A thing is an “Agent-With-Biochemical-Attack-Capability” if and only if it –
          > Has a capability “Biochemical-Attack-Agent-Capability” or
          > Possesses a “Biochemical-Agent”
        An “Agent-With-Biochemical-Attack-Capability” has capability “Biochemical-Attack-
         Agent-Capability”
        A “Nerve-Agent” is a “Biochemical-Agent”
        If AGT has capability “Biochemical-Attack-Agent-Capability”, then AGT is capable of
         being an “agent” in a “Biochemical-Attack”
        If C is the capability of playing role R in processes of type PT, and AGT is known to
         have played role R in a process of type PT, then AGT has capability C
 Knowledge from documents:
        “Al-Qaida” is a “Foreign-Terrorist-Organization” that possesses a “Nerve-Agent”
        “Aum-Supreme-Truth-Chemical-Attack-27-Jun-94” is a “Chemical-Attack” whose
         agent is “Aum-Supreme-Truth”
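A naive forward-chaining sketch of this example follows. It hand-codes the ontology rules listed above as Python and is not JTP; only the rule directions the query needs are implemented. Two assumptions are labelled in the code: the unnamed nerve agent that Al-Qaida possesses gets a hypothetical identifier, and "Chemical-Attack" is assumed to be a subclass of "Biochemical-Attack", which the second answer needs but the slides do not state.

```python
Fact = tuple[str, str, str]  # (subject, predicate, object)

# Ontology knowledge, hand-coded from the slide.
SUBCLASS = {
    ("Nerve-Agent", "Biochemical-Agent"),
    # ASSUMPTION: not stated on the slide, but needed to connect a
    # "Chemical-Attack" to the "Biochemical-Attack" role requirement.
    ("Chemical-Attack", "Biochemical-Attack"),
}
CAPABLE_CLASS = "Agent-With-Biochemical-Attack-Capability"
CAPABILITY = "Biochemical-Attack-Agent-Capability"
# Playing role R in a process of type PT requires capability C.
ROLE_REQUIRES = {("agent", "Biochemical-Attack"): CAPABILITY}

# Knowledge from documents. "Nerve-Agent-Sample-1" is a HYPOTHETICAL name for
# the unnamed nerve agent that Al-Qaida is said to possess.
DOCUMENT_FACTS: set[Fact] = {
    ("Al-Qaida", "type", "Foreign-Terrorist-Organization"),
    ("Al-Qaida", "possesses", "Nerve-Agent-Sample-1"),
    ("Nerve-Agent-Sample-1", "type", "Nerve-Agent"),
    ("Aum-Supreme-Truth-Chemical-Attack-27-Jun-94", "type", "Chemical-Attack"),
    ("Aum-Supreme-Truth-Chemical-Attack-27-Jun-94", "agent", "Aum-Supreme-Truth"),
}


def derive(facts: set[Fact]) -> set[Fact]:
    """Apply every rule once to the current facts."""
    new: set[Fact] = set()
    for s, p, o in facts:
        if p == "type":
            # Class membership is inherited along subclass links.
            new |= {(s, "type", sup) for sub, sup in SUBCLASS if sub == o}
            # Members of the capable class have the capability.
            if o == CAPABLE_CLASS:
                new.add((s, "capability", CAPABILITY))
        # Possessing a Biochemical-Agent puts s in the capable class.
        if p == "possesses" and (o, "type", "Biochemical-Agent") in facts:
            new.add((s, "type", CAPABLE_CLASS))
        # Playing a role in a process of the required type confers the capability.
        for (role, process_type), capability in ROLE_REQUIRES.items():
            if p == role and (s, "type", process_type) in facts:
                new.add((o, "capability", capability))
    return new


def saturate(facts: set[Fact]) -> set[Fact]:
    """Forward-chain until no rule produces a new fact."""
    facts = set(facts)
    while True:
        new = derive(facts) - facts
        if not new:
            return facts
        facts |= new


def query_capability(facts: set[Fact], capability: str) -> list[str]:
    """Answer (capability ?agt CAPABILITY) over the saturated facts."""
    return sorted(s for s, p, o in saturate(facts) if p == "capability" and o == capability)


print(query_capability(DOCUMENT_FACTS, CAPABILITY))  # ['Al-Qaida', 'Aum-Supreme-Truth']
```

Al-Qaida is reached through possession and class membership; Aum-Supreme-Truth is reached only through the role-requirement rule, which is why it depends on the Chemical-Attack assumption.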

              Query-Answering Example 1

 “What has the capability of being the agent of a biochemical attack?”
        Query pattern: (capability ?agt Biochemical-Attack-Agent-Capability)

 Answer: “Al-Qaida”

        “Al-Qaida” is a “Foreign-Terrorist-Organization” that possesses a “Nerve-Agent”
         [from documents]
        A thing is an “Agent-With-Biochemical-Attack-Capability” if and only if it –
          > Has a capability “Biochemical-Attack-Agent-Capability” or
          > Possesses a “Biochemical-Agent”       [from ontology]
        An “Agent-With-Biochemical-Attack-Capability” has capability “Biochemical-Attack-
         Agent-Capability”   [from ontology]
        If AGT has capability “Biochemical-Attack-Agent-Capability”, then AGT is capable of
         being an “agent” in a “Biochemical-Attack”      [from ontology]
        A “Nerve-Agent” is a “Biochemical-Agent”        [from ontology]




              Query-Answering Example 1

 “What has the capability of being the agent of a biochemical attack?”
        Query pattern: (capability ?agt Biochemical-Attack-Agent-Capability)

 Answer: “Aum-Supreme-Truth”

        “Aum-Supreme-Truth-Chemical-Attack-27-Jun-94” is a “Chemical-Attack” whose
         agent is “Aum-Supreme-Truth” [from documents]
        Playing the role “agent” in a “Biochemical-Attack” requires the capability
         “Biochemical-Attack-Agent-Capability” [from ontology]
        If playing role R in a process of type PT requires capability C, and AGT plays role R in
         a process of type PT, then AGT has capability C [from ontology]
        “Aum-Supreme-Truth” has capability “Biochemical-Attack-Agent-Capability”




          Query-Answering Example 2

 “Who are the agents of attacks that used the same type of weapons as
     “Recent-Attack-001”?”
      Query pattern: (type Recent-Attack-001 ?res) (onProperty ?res instrument)
       (hasClass ?res ?inst-type) (type ?attack ?res) (agent ?attack ?agt)
      Must-bind variables: ?agt ?attack

 Knowledge in the ontology:
      A “Mortar-Attack” has an instrument of type “Mortar”

 Knowledge from documents:

      “Recent-Attack-001” is a Thing that has an instrument of type “Mortar”
      “Revolutionary-Armed-Forces-Of-Colombia-Mortar-Attack-1-Jul-00” is a “Mortar-
       Attack” that has agent “Revolutionary-Armed-Forces-Of-Colombia”.

 Answer: “Revolutionary-Armed-Forces-Of-Colombia”
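The query pattern reads ?res as a DAML restriction (onProperty instrument, hasClass ?inst-type). The sketch below simplifies this by treating a restriction as a (property, class) pair and returning the must-bind (?agt, ?attack) pairs for attacks that satisfy the same instrument restriction as Recent-Attack-001, either directly or through their class. The data structures and function names are hypothetical, not the DQL/JTP implementation.

```python
Restriction = tuple[str, str]  # (onProperty, hasClass)

# Ontology: a Mortar-Attack has an instrument of type Mortar.
CLASS_RESTRICTIONS: dict[str, set[Restriction]] = {
    "Mortar-Attack": {("instrument", "Mortar")},
}

# Knowledge from documents.
FARC_ATTACK = "Revolutionary-Armed-Forces-Of-Colombia-Mortar-Attack-1-Jul-00"
TYPES = {
    "Recent-Attack-001": {"Thing"},
    FARC_ATTACK: {"Mortar-Attack"},
}
DIRECT_RESTRICTIONS = {
    "Recent-Attack-001": {("instrument", "Mortar")},
}
AGENTS = {
    FARC_ATTACK: {"Revolutionary-Armed-Forces-Of-Colombia"},
}


def restrictions_of(individual: str) -> set[Restriction]:
    """Restrictions an individual satisfies, directly or through its classes."""
    found = set(DIRECT_RESTRICTIONS.get(individual, set()))
    for cls in TYPES.get(individual, set()):
        found |= CLASS_RESTRICTIONS.get(cls, set())
    return found


def agents_of_similar_attacks(target: str, prop: str = "instrument") -> set[tuple[str, str]]:
    """Return (?agt, ?attack) bindings for attacks sharing target's `prop` restriction."""
    wanted = {r for r in restrictions_of(target) if r[0] == prop}
    return {
        (agent, attack)
        for attack, agents in AGENTS.items()
        if attack != target and wanted & restrictions_of(attack)
        for agent in agents
    }


print(agents_of_similar_attacks("Recent-Attack-001"))
```

The match succeeds only because the Mortar-Attack class axiom from the ontology supplies the instrument restriction that the document fact about the FARC attack lacks.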




        AQUA Program Plan
 Overview of the project
   The goal is to create a system that can answer
    complex questions
   With plus-up funding, we now have an end-to-end
    system. It uses KSL’s Ontolingua Knowledge
    Server and Java Theorem Prover (JTP) to derive
    answers to queries
   Uses SAIC and other technology to
    automatically populate KBs with information
    from new text sources
   Uses multiple extractors from multiple sources
    to answer queries
      > KSL extractor
      > UMBC/NMSU extractor
      > IBM extractor
                AQUA Current Plans
 [Architecture diagram; components and flows recoverable from the slide labels]
 Question path: NL question -> MOQA Query Processor (NL -> TMR, item B)
   -> SAIC Mapper/Translator (TMR -> KIF, item C) -> KIF-formatted question
   -> Ontolingua Knowledge Server / Java Theorem Prover
 Answer path: KIF answer / proof tree -> SAIC Mapper/Translator (KIF -> TMR, item D)
   -> MOQA Answer Processor (TMR -> NL, item E) -> NL answer, plus a KSL-generated explanation
 Data and extraction: SAIC-team CNS data generation (item F), MOQA Text -> TMR
   Translator (item A), IBM Text -> KIF Translator, KSL Extractor, CNS test data
 Legend: DATA, SAIC (items A-F), MOQA, KSL, IBM
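One way to sanity-check the question and answer paths of this architecture is to record the input and output representation of each stage and confirm that adjacent stages agree. A sketch, with stage names and formats taken from the diagram labels; the Stage type and the helper function are hypothetical.

```python
from typing import NamedTuple


class Stage(NamedTuple):
    name: str
    consumes: str  # representation accepted
    produces: str  # representation emitted


# Question-to-answer path as labelled in the "AQUA Current Plans" diagram.
AQUA_PATH = [
    Stage("MOQA Query Processor (B)", "NL", "TMR"),
    Stage("SAIC Mapper/Translator (C)", "TMR", "KIF"),
    Stage("Ontolingua Knowledge Server / JTP", "KIF", "KIF"),
    Stage("SAIC Mapper/Translator (D)", "KIF", "TMR"),
    Stage("MOQA Answer Processor (E)", "TMR", "NL"),
]


def interface_mismatches(stages: list[Stage]) -> list[tuple[str, str]]:
    """List adjacent stage pairs whose output and input representations differ."""
    return [
        (upstream.name, downstream.name)
        for upstream, downstream in zip(stages, stages[1:])
        if upstream.produces != downstream.consumes
    ]


print(interface_mismatches(AQUA_PATH))  # []
print(AQUA_PATH[0].consumes, "->", AQUA_PATH[-1].produces)  # NL -> NL
```

Dropping a translator from the list (for example, feeding TMR straight to the theorem prover) shows up immediately as a mismatch, which is the kind of interface check the year-two goals call for.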
          AQUA Initial Concept
 [Pipeline diagram; stages recoverable from the slide labels]
 Question path: QUESTION (NL query) -> NMSU Query Processor -> interlingua query
   -> SAIC Interlingua/KIF Translator -> KIF query -> KSL Java Theorem Prover
 Answer path: KIF answer -> SAIC KIF/Interlingua Translator -> interlingua answer
   -> NMSU NL Generator -> NL answer (ANSWER)
            Key Tasks - SAIC
Perform translation of Onyx/UMBC extracted
 TMRs to KIF (Item A)
  Align two disparate ontologies
  Translate terms once aligned
Both formalized queries and extracted text
 need to be translated
Develop CNS WMD ontology
Co-ordinate subcontractors and develop
 system interfaces
           Key Tasks - Onyx
Provide formalized translation of NL queries
 (MOQA, item B)
Perform extraction of CNS data into text
 (MOQA, item A)
            Key Tasks - IBM
Assist in relation extraction from text into the
 WMD ontology
     KSL’s Current Activities
 JTP – Hybrid reasoning for query answering
   Includes a temporal reasoner
   Is a DQL (DAML Query Language) server

 Knowledge Base Partitioning – Enabling Q-A from
 large scale KBs using parallel heterogeneous
 reasoners
 Inference Web – Providing understandable
 explanations for derived query answers
 Knowledge extraction from semi-structured
 documents
   Tables, lists, outlines, property-value pairs, etc.
        SAIC Current Activities
SAIC
 In-house Ontolingua server with JTP now
  installed and in use in development efforts
 The ontology is available as part of the demonstration
  in the demo rooms
 Please visit the SAIC/KSL demo stand
   SAIC Current Activities (cont.)
SAIC is spearheading a federated WMD
 ontology development effort, assisted by
 Stanford KSL
Has begun development of the CNS ontology.
 The ontology currently has 700 terms and is
 viewable in our in-house version of
 Ontolingua (demo available)
   SAIC Current Activities (cont.)
 Discussions are underway with Sergei to put Onyx
  under subcontract to SAIC. The subcontract is to go out
  as soon as possible.
    Labor division is defined and agreed to
    Major issue: because of subcontracting issues, Onyx is still not
     under subcontract. This slows Q-A system
     development, since this task is on the critical path for
     system development.
 Distributed the ontology to KSL and IBM.
 Development of the ontology is critical in order to
  allow the extractors to function appropriately
   WMD Ontology Creation: Initial Confederation
               Assignments
Stanford/KSL: NIS-Facilities (439 terms) and
 Russian-Naval-Facilities (365 terms)
IBM: MPT-Topic (771 terms)
Xerox PARC: Missiles-Topic (765 terms)
Teknowledge: NIS-Nuclear-Weapons-
 Aggregate (219 terms)
Battelle: Nuclear-Safety-Assistance (36
 terms)
            Year Two Project Goals
Complete CNS ontology development
Participate in TREC
      System is still immature
      Novel approach
      Significant potential for further development
Refine interfaces and determine system
 metrics to ensure maximum performance in
 future system iterations


            TREC participation
 SAIC is signed up for TREC participation this year.
 A multi-pronged approach is possible with the
  current architecture
 The SAIC/Onyx route and NL interface give
  the initial capability for an end-to-end system with
  restricted domain and range
 Formatted queries possible for IBM extraction
 System will be very immature in year 1 and will likely
  achieve poor TREC scores, but will mature in
  multiple and novel directions over time


              Future Plans
Continue multi-pronged approach (running
 multiple extractors over a uniform KB)
Plan further enhancements (possibly adding
 more extractors or reasoners)
Leverage multiple KB approach to optimize
 research in multi-partition reasoning
Develop effective metrics to determine
 efficacy of this approach and which
 pathways are optimal

          Future Plans (cont.)
Work on implementing the later proof-tree-to-NL
 MOQA interface (the reverse of the TMR-to-KIF
 translation)
Transition from KIF to DAML format where
 possible
Extend range and capabilities of question
 answering. Initial participation will be
 limited in terms of domain and range of
 questions.
