     Question-Answering Based on Virtually Integrated Lexical Knowledge Base
     Key-Sun Choi   Jae-Ho Kim      Masaru         Jun Goto    Yeun-Bae Kim
    KAIST,Korterm KAIST,Korterm     Miyazaki      NHK STRL       NHK STRL
        Daejeon       Daejeon      NHK STRL     Human Science Human Science
     305-701 Korea 305-701 Korea Tokyo 157-8510 Tokyo 157-8510 Tokyo 157-8510
    kschoi@cs.ka jjaeh@world.        Japan           Japan         Japan          miyazaki.m-    goto.j-      kimu.y-

                    Abstract
    This paper proposes an algorithm for causality inference based on a
    set of lexical knowledge bases that contain information about such
    items as event role, is-a hierarchy, relevant relation, antonymy, and
    other features. These lexical knowledge bases mainly make use of the
    lexical features and symbols in HowNet. Several types of questions
    are tested to evaluate the effectiveness of the proposed algorithm.
    In particular, the “why” question form is examined to show how
    causality inference works.

1   Introduction
A virtually linked knowledge base is designed to utilize a pre-constructed
knowledge base in a dynamic mode when it is in actual use.
    An open-domain question answering architecture must consist of various
components and processes (Pasça, 2001) that include WordNet-like
resources, part of speech tagging, parsing, named entity recognition,
question processing, passage retrieval, answer extraction, and answer
justification. Consider a question like the following: “Why do doctors
cure patients?”
    The answer may be obtained by commonsense knowledge as follows:

    1. A patient suffered from a disease.
    2. A doctor cures the disease.
    3. The doctor cures at hospital.
    4. Doctor is an occupation.
    5. So the doctor cures the patient.

    These sentences are transformed into propositional forms, as
illustrated below:

    6. sufferFrom(patient,disease)
    7. cure(doctor,disease)
    8. cure(doctor,at-hospital)
    9. occupation(doctor)
    10. cure(doctor,patient)

    Linguistic knowledge bases like WordNet (Miller, 1995), EDR dictionary
(Yokoi, 1995) and HowNet (Dong, 1999) have been used to interpret these
sentences.
    Moldovan et al. (2002) generated lexical chains from WordNet in order
to trace these topically related paths and thereby to search for causal
explanations. A conceptual word Cj inside of a gloss under a synset Ci is
linked to the synset Cj.
    HowNet (Dong et al. 1999) is a linguistic knowledge base that is
designed to have the definition of words and concepts as well as event
role and role-filling entities. Commonsense knowledge like naive physics
is also built up through event role relations, like the relation of
sufferFrom requiring cure.
    HowNet is modularized into separate knowledge spaces for entity
hierarchy, event hierarchy, antonymy, syntax, attributes, etc. Relations
between various concepts (e.g., part-of, relevance, location) are defined
implicitly in the definition of each concept.
    This paper will focus on building an algorithm that allows for
searching for some topical paths in order to find causal explanations for
questions like “Why do doctors cure patients?” or “Why do patients pay
money?” as illustrated in Figure 1.
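As a rough illustration of what searching for a topical path can look like, the sketch below runs a plain breadth-first search over a tiny concept graph. It is not the algorithm proposed in this paper: the graph is an assumption built by hand from a subset of the feature lists shown in Figure 1 (patient, doctor, occupation, money), the HowNet pointer symbols are simply stripped, and the names `DEFINITIONS`, `build_graph`, and `trace` are hypothetical.

```python
from collections import deque

# Hand-written feature lists for the entries of Figure 1 (a subset only).
# The pointer symbols (#, *, $) are HowNet markers; this sketch ignores
# their meaning and keeps only the concept names.
DEFINITIONS = {
    "patient": ["$cure"],
    "doctor": ["*cure", "#occupation"],
    "occupation": ["earn"],
    "money": ["$earn"],
}


def build_graph(definitions):
    """Link every entry to the concepts named in its feature list (undirected)."""
    graph = {}
    for entry, features in definitions.items():
        for feature in features:
            concept = feature.lstrip("#*$")
            graph.setdefault(entry, set()).add(concept)
            graph.setdefault(concept, set()).add(entry)
    return graph


def trace(graph, start, goal):
    """Breadth-first search; returns one shortest path or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


path = trace(build_graph(DEFINITIONS), "patient", "money")
print(" ~ ".join(path))  # patient ~ cure ~ doctor ~ occupation ~ earn ~ money
```

On this toy graph the only chain from patient to money passes through cure, doctor, occupation, and earn, which is the kind of topical path the question “Why do patients pay money?” calls for.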
[Figure 1 diagram. Labels: (1) entity; (2) hypernym links; (3) patient,
doctor, occupation, money; (4) #occupation, *cure, $cure, earn, $earn,
$pay; converse; (8) agent=patient, possession=money, target=?;
(9) agent=?, possession=money, source=patient.]
Figure 1: A snapshot of a virtually integrated knowledge base for the
question: “Why do patients pay money to doctors?”

In the following sections, issues on the virtual integration of knowledge
bases, their algorithms and experimentations are presented.

2    Underlying Knowledge Bases and Virtual Integration

In Figure 1, each marked numbering has the following meaning:

    (1) Entity hierarchy: entity is the top node in the hierarchy of
        entities.
    (2) entity is the hypernym of patient, doctor, occupation, and money
        in the line (3).
    (3) Concepts or word entries are listed in this line. All concepts and
        word entries represent their definition by a list of concepts and
        marked pointers.
    (4) A concept (or word) in (3) features definitional relations to a
        list of concepts. For example, a doctor definition is composed of
        two concepts and their marking pointers: #occupation and *cure.
        Pointers in HowNet represent relations between two concepts or
        word entries, e.g., “#” means “relevant” and “*” means “agent”.
    (5) syn refers to the syntactic relation in the question “Why do
        patients pay money to doctors?”
    (6) converse refers to the converse relation between events, e.g.,
        give and take.
    (7) Event hierarchy: For example, the hypernym for pay is give and the
        hypernym of give is event.
    (8) Event role: Now, event roles are partially filled with entities,
        e.g., patient and money.
    (9) Event role shift: The agent of give is equated with the source of
        take.

    An overview of each component of the knowledge base is in Figure 2,
where three word entries why, patient, and money are in the dictionary.
The four concept facets of entity, role, event, and converse are described
in this example, mainly as part of linguistic knowledge.

[Figure 2 diagram. Labels: dictionary (why, pay); human, doctor,
occupation, money; #occupation, *cure, earn, $earn; role (cause,
question); event (Alter-possession, give, take, cure, pay, earn); agent=,
target=, source=.]
Figure 2: HowNet Architecture in Example.

    Some issues on ontology integration have been discussed from various
points of view. Pinto et al. (1999) classified the notions of ontology
integration into three types: integration, merging and use/application.
The term virtually integrated means the view of ontology-based
use/application.
    This paper presents issues on and arguments for linguistic knowledge
base and commonsense knowledge in (Lenat, Miller and Yokoi, 1995). One of
the arguments was whether linguistic knowledge could be separated from
commonsense knowledge, but it was agreed that both types of knowledge were
essentially required for natural language processing.
    This paper was motivated by the desire to make inferences using a
lexical knowledge base, thus successfully carrying out a kind of
commonsense
3   Interpretation of Lexical Knowledge
Consider the following three sentences:

   One major concern is finding connectability
among words and concepts. As shown in Figure 2,
the following facts are derived:
       ponym for the act of human, one of whose
       hyponym is patient.
   Consider again the match between the tracing
sequences of concepts and the knowledge base.
Going into more details, notations with footnotes
will be given to each example. At this point, we
will give
    D) Using the inheritance property in the concept hierarchy, relations
       between hypernyms of concepts X and Y are inherited by X and Y in a
       way that X and Y are similar if there exist X’ and Z such that
       X ≺ X’, Z ⊃ θX’, and Y ≺ Z, where θ is a pointer or null. This
       inheritance tracing can be determined by how similar X and Y are in
       terms of their path upward based on the relation of hypernym. We
       will define path similar. But tracing the path upward following
       hypernym links is to be described later according to the algorithm.
    A measure called similar will be defined based on the discussion in
this section. Then an algorithm is introduced through this measure with an
example.

5     Measures
In the last section, we discussed four kinds of the measure similar.
    • path similar,
    • feature similar,
    • inverse similar,
    • sister similar.
    For feature, inverse, and sister similar functions, path similar is
used as a basis of calculation. They differ with respect to both their
search method and the depth of expanding features. feature similar finds
similar features by using path similar. inverse similar(X,Y) searches for
entries that contain X and Y as features and then uses path similar. In
the same way, sister similar finds sister concepts, expands them, and
finally measures using path similar.
    Since path similar plays a key role in all these search and measure
processes, its role will be explained in the next subsection. Other
measures are only dealt with as part of the algorithm.

5.1    Similarity Based on Hierarchy and Feature
The mission of the measuring function similar(X,Y) is to calculate the
relevancy between two concepts or words, whether they are of type entity,
event, or of some other type.
    If X and Y belong to different types of knowledge plane (e.g., entity
and event), it is hard to compare their hypernym path upward to the top
concept. However, if different types of concepts have any relevance to
(connect) causality, we will use feature similar or inverse similar after
finding the same type of concepts to calculate the path similar. Now we
will explain the above by using two pairs of concept type: entity-entity
and entity-event, without loss of generality.
    First, pathsimilar(entity X, entity Y) is defined as follows:

    \[
    \mathrm{pathsimilar}(X, Y) =
      \frac{2 \times |\mathrm{path}^{+}(X) \cap \mathrm{path}^{+}(Y)|}
           {|\mathrm{path}^{+}(X)| + |\mathrm{path}^{+}(Y)|}
    \]

where path+(X) is the ordered list of hypernyms for X in descending order
from the top concept. For example,
   = []
   path+(patient)
   = [entity...animate...human.patient]
Because |path+(X)| counts the number of nodes on the path,
pathsimilar(doctor,patient) = 2 × 6/(7+7) = 0.857.
    Second, pathsimilar(entity N, event V) is defined as follows:
   pathsimilar(N,V)
   = Max pathsimilar(N.feature,V)
where N.feature means the feature list in the definition of N. The
following is an illustrative example for the definition:
     money ⊃ $earn,*buy,#sell, $setAside,
it is equivalent to the following:
   money.feature=[$earn,*buy,#sell,$setAside].
So pathsimilar(money,earn)=pathsimilar(earn,earn)=1. According to this Max
function, the selection priorities for the path can be specified.
    Third, pathsimilar(event V, entity N) is defined by inverse similar as
follows: pathsimilar(V,N) = Max pathsimilar(V.inverse, N). For example,
pathsimilar(cure, doctor) = Max pathsimilar(cure.inverse, doctor) = Max
pathsimilar({doctor, medical worker, medicine, patient}, doctor).
    Fourth, pathsimilar(event X, event Y) shares the same formula with
pathsimilar(entity X, entity Y) shown before. But, we can give another
inverse pathsimilar(event X, event Y) = Max
pathsimilar(X.inverse, Y.inverse).
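The entity-entity and entity-event cases above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the intermediate hypernym nodes (`x1`, `x2`, `x3`) are placeholders chosen only so that both paths have seven nodes and share six, as in the worked example; the event paths in `EVENT_PATHS` are hypothetical; and the function names are illustrative.

```python
from typing import Dict, List


def path_similar(path_x: List[str], path_y: List[str]) -> float:
    """2 * |path+(X) ∩ path+(Y)| / (|path+(X)| + |path+(Y)|)."""
    common = len(set(path_x) & set(path_y))
    return 2 * common / (len(path_x) + len(path_y))


# Placeholder hypernym paths from the top concept down to the entry.
# Only entity, animate, human, and the leaf names come from the text.
PATIENT = ["entity", "x1", "animate", "x2", "x3", "human", "patient"]
DOCTOR = ["entity", "x1", "animate", "x2", "x3", "human", "doctor"]

# Hypothetical event hierarchy used only for this illustration.
EVENT_PATHS: Dict[str, List[str]] = {
    "earn": ["event", "take", "earn"],
    "buy": ["event", "take", "buy"],
    "sell": ["event", "give", "sell"],
    "setAside": ["event", "setAside"],
}


def entity_event_similar(features: List[str], event: str,
                         paths: Dict[str, List[str]]) -> float:
    """Max of path_similar over the entity's features (pointer symbols stripped)."""
    names = (feature.lstrip("#*$") for feature in features)
    scores = [path_similar(paths[n], paths[event]) for n in names if n in paths]
    return max(scores, default=0.0)


MONEY_FEATURES = ["$earn", "*buy", "#sell", "$setAside"]

print(round(path_similar(DOCTOR, PATIENT), 3))                     # 0.857
print(entity_event_similar(MONEY_FEATURES, "earn", EVENT_PATHS))   # 1.0
```

The second call returns 1 because earn itself appears in money.feature, which mirrors pathsimilar(money,earn)=pathsimilar(earn,earn)=1 in the text.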

5.2   Logical Implication and Expansion Depth

All of the relations in Figure 2 are translated into logical form (see
below). As shown in “Interpretation as Abduction” (Hobbs et al. 1988),
“abductive inference is inference to the best explanation”. These
relations showed “the interpretation of a text is the minimal explanation
of why the text would be true” based on the abductive inference. By the
same token, “the interpretation of a question is the minimal explanation
of why the question would be true” based on a set of lexical knowledge
bases.
    Before proceeding to our algorithm, an example will be applied to
abductive inference briefly as a set of logical forms as well as a diagram
in Figure

   16. doctor ⊃ human, #occupation, *cure, medical.
   17. medicine ⊃ *cure.
   18. disease ⊃ $cure.
   19. cure ⊃ medical,
   20. medical ⊃ #cure.
   21. converse(pay,earn) ⊃ target=agent.
   22. patient ⊃ human,$cure.
   23. occupation ⊃ affairs, earn.
   24. cause(cure,sufferFrom) ⊃ patient=experiencer, content=content.
   25. possibleConsequence(cure, beRecovered) ⊃ patient=experiencer,
       content=stateIni.
    While pursuing the path tracing enabling minimal explanation, now we
are going to propose a connectability measure similar such as “weighted
abduction” (Hobbs et al. 1988). As “likelihood estimation” is useful to
consider a “bounded conditioning” (Russell & Norvig, 1995) in a belief
network, the “expansion depth” of similar will be useful for the
explanation path tracing for the purpose of the minimal explanation of the

[Figure 3 diagram. Labels: why, patient, pay, money; payer*, converse,
give, take, hypernym; patient (human, *sufferFrom, $cure); pay (agent,
content, source); money (commercial, $earn, *buy, #sell, $setAside);
human, #occupation, medical; occupation, earn.]
Figure 3: Virtual Linking for Causality

    The “expansion depth level” of similar has two kinds of utilities: one
is to find the minimal explanation, and the other is to be dynamically
adaptable to the level of interaction. This level of similar is defined as
a function similar(Level)(X,Y) for X and Y, concepts or words in the
following manner:
    • similar(0)=pathsimilar: they use only themselves and their hypernym
      path from X and Y.
    • similar(1)=feature_similar: they use their features that are
      expanded one more than similar(0).
    • similar(2)=inverse_similar
    • similar(3)=sister_similar
      =inverse_similar ∘ feature_similar.
    Depending on what level of similar is chosen, the search paths may be
changed. A snapshot up to similar(2) is given in Figure 4.

[Figure 4 diagram. Labels: why, doctor, cure, patient; medicine*,
disease$; doctor (human, #occupation, *cure, medical); cure (agent,
patient, content, medical); patient (human, *sufferFrom, $cure).]
Figure 4: Snapshot for similar(2).
6       Tracing Algorithms

6.1       Algorithm Crossover
The overall algorithm flow depends on similar(Level) as in the next
program. This algorithm will be called “crossover”.

      Algorithm Crossover

For Level=0...N until stopping
condition is satisfied:
     Expand the trace
           by similar(Level)

For example, when Level=1, the algorithm crossover finds a very primitive
answer to the question “Why do doctors cure patients?” We will expand
other features of doctor except for cure because cure has a syntactic
relation between doctor and patient.
    As shown in the logical forms (16~24) introduced in the previous
section, this algorithm in Level=1 can find the following concepts as a
result: medical, human, cure ($cure, *cure).
    When Level=2, the algorithm crossover will seek higher-order relations
(like the hypothesis) from the concept (by inverse_similar),
converse/antonymy relations (by feature_similar), and event relations (if
any, for use in knowing the cause or consequence relation). Consider again
our example "Why do doctors cure patients?" by using the previous
section's logical forms. The results are as follows:

      *cure = {doctor, medicine}
      $cure = {patient, disease}
      *sufferFrom = {patient}
      $sufferFrom = {disease}

Its generated meaning may be “If a doctor cures a patient, the patient is
recovered from disease. Because patients suffer from diseases, doctors
cure the patients. Patients are recovered after getting cured.”

6.2       Stopping Condition
Stopping conditions for the algorithm crossover are as follows:
   (1) Event roles are filled up.
   (2) If no event is found in the feature definition, increase similar
       level.
   (3) [weak stopping condition] When there is no event, one of the other
       features is commonly shared between two concepts. For example,
       medical is a common feature between doctor and cure.

6.3    Hypernym Climbing
In section 4.2, inheritance was discussed for the purpose of finding a
relation among pay ~ patient. After trying to make Level=2 in section 5.2,
we have been motivated to find the interrelation between hypernyms. The
algorithm crossover is updated.

      Algorithm Crossover+

For Level=0..N until stopping
condition is satisfied:
     Expand the trace
           by similar(Level)
     If Level >= 2, then
     repeat climb up hypernym
     until it matches with
           the higher relation.

6.4    Algorithm Crossover++
Consider again the question "Why do patients pay money to doctors?" As
shown in Figure 1, the best trace is $cure ~ *cure ~ *earn ~ $pay. It
provides an explanation for the statement that “patients are cured by
doctors ~ doctors earn money ~ patients pay money to doctors”. This
minimal explanation is observed by switching over the role pointers θ
whenever tracing is performed. For example, $cure was switched over to
*cure. This extended version of the algorithm is called Crossover++.

7     Evaluation
Using the Crossover algorithms, the behavior of “why”-type questions is
investigated by extracting the answer paths as follows.
Q: Why does patient pay money?
Path: patient ~ $cure ~ doctor ~ #occupation ~ $earn ~ money
Q: Why does researcher read textbook?
Path: researcher ~ #knowledge ~ #information ~ readings ~ textbook

   Paths between two concepts can now be found by simply checking the
presence of a path among the concepts reached from an initial concept.
Table
1 and Table 2 show examples of the number of paths as a function of path
size.

Source       Reached concepts by path size
concept          1          2          3
cure           275        593      24854
eat            268        605      24903
study          276        358      23172
food           532        650      18066
human         6713       3686      51171
money          328       1312      19827
Table 1: Examples of destination concepts reached starting from one source
concept

                         Number of paths by length
Concept1    Concept2         1          2          3
cure        human            0         78         26
pay         money            0          7          3
human       money            0          3          7
food        human            0          0         28
read        write            0          4          6
earn        pay              0          0          7
Table 2: The number of paths between pairs of concepts

8     Discussion
HowNet (Dong et al. 1999-2003) has already defined the words and concepts
using the features of concepts. Each event role is also defined under the
notion of feature. On the other hand, WordNet (Miller, 1995) consists of
synsets and their glosses. Moldovan et al. (2002) showed a lexical chain
to use words in glosses in order to trace the topically related paths.
    Their search boundary is restricted to the shapes: V, W, VW, and WW.
In this paper, crossover* is shown to be flexible and to search for a more
probable explanation.

9     Conclusion
In this paper, we have attempted to show how to link pre-existing lexical
knowledge bases to one another. The major issue was to generate
explanation paths for answering the “why”-type question. While observing
the causality path behavior, we proposed the measure similar and also the
algorithm crossover. It is compared with the “weighted abduction” (Hobbs
et al. 1988) and “lexical chain” (Moldovan et al. 2002).
    With the ability to provide explanations depending on the level of the
measure similar, our proposed algorithm adapts itself to the user
knowledge level as well as to the type of interactive questions to enable
a more detailed level of question-answering.

References
Zhen Dong and Q. Dong. 1999-2003. Hownet,
Jerry R. Hobbs, Mark Stickel, Douglas Appelt and Paul Martin. 1988.
   Interpretation as Abduction, Proceedings of the Conference on 26th
   Annual Meeting of the Association for Computational Linguistics.
Doug Lenat, George Miller, and Toshio Yokoi. 1995. CYC, WordNet, and EDR:
   Critiques and Responses, Communications of the ACM, 38(11):45-48.
Bernardo Magnini and Manuela Speranza. 2002. Merging Global and
   Specialized Linguistic Ontologies, Proceedings of Ontolex 2002
   (Workshop held in conjunction with LREC-2002), Las Palmas.
George Miller. 1995. WordNet: a lexical database. Communications of the
   ACM, 38(11):39-41.
Dan Moldovan and Adrian Novischi. 2002. Lexical Chains for Question
   Answering, Proceedings of COLING 2002, Taipei.
Takanoa Ogino and Masahiro Kobayashi. 2000. Verb Patterns extracted from
   EDR Concept Description, IPSJ SIGNotes Natural Language Abstract,
   No.138 – 006:39-46.
Alexandru Marius Pasça. 2001. High-Performance, Open-Domain Question
   Answering from Large Text Collections. Ph.D Dissertation, Southern
   Methodist University.
H. Sofia Pinto, Asunción Gómez-Pérez and João P. Martins. 1999. Some
   Issues on Ontology Integration, Proceedings of the IJCAI-99 workshop on
   Ontologies and Problem-Solving Methods (KRR5), Stockholm.
Stuart Russell and Peter Norvig. 1995. Artificial Intelligence: A Modern
   Approach. Prentice-Hall.
Toshio Yokoi. 1995. The EDR Electronic Dictionary. Communications of the
   ACM, 38(11).
