

Question-Answering Based on Virtually Integrated Lexical
Knowledge Base
Key-Sun Choi (KAIST, Korterm, Daejeon 305-701, Korea; kschoi@cs.kaist.ac.kr)
Jae-Ho Kim (KAIST, Korterm, Daejeon 305-701, Korea; jjaeh@world.kaist.ac.kr)
Masaru Miyazaki (NHK STRL, Tokyo 157-8510, Japan; miyazaki.m-fk@nhk.or.jp)
Jun Goto (NHK STRL, Human Science, Tokyo 157-8510, Japan; goto.j-fw@nhk.or.jp)
Yeun-Bae Kim (NHK STRL, Human Science, Tokyo 157-8510, Japan; kimu.y-go@nhk.or.jp)
Abstract

This paper proposes an algorithm for causality inference based on a set of lexical knowledge bases that contain information about such items as event role, is-a hierarchy, relevant relation, antonymy, and other features. These lexical knowledge bases mainly make use of the lexical features and symbols in HowNet. Several types of questions are used in experiments to test the effectiveness of the proposed algorithm. In particular, this paper deals with the question form “why” to show how causality inference works.

1 Introduction

A virtually linked knowledge base is designed to utilize a pre-constructed knowledge base in a dynamic mode when it is in actual use.
An open-domain question answering architecture must consist of various components and processes (Pasça, 2001) that include WordNet-like resources, part of speech tagging, parsing, named entity recognition, question processing, passage retrieval, answer extraction, and answer justification. Consider a question like the following: “Why do doctors cure patients?”
The answer may be obtained by commonsense knowledge as follows:

1. A patient suffered from a disease.
2. A doctor cures the disease.
3. The doctor cures at hospital.
4. Doctor is an occupation.
5. So the doctor cures the patient.

These sentences are transformed into propositional forms, as illustrated below:

6. sufferFrom(patient,disease)
7. cure(doctor,disease)
8. cure(doctor,at-hospital)
9. occupation(doctor)
10. cure(doctor,patient)

Linguistic knowledge bases like WordNet (Miller, 1995), EDR dictionary (Yokoi, 1995) and HowNet (Dong, 1999) have been used to interpret these sentences.
Moldovan et al. (2002) generated lexical chains from WordNet in order to trace these topically related paths and thereby to search for causal explanations. A conceptual word Cj inside of a gloss under a synset Ci is linked to the synset Cj.
HowNet (Dong et al. 1999) is a linguistic knowledge base that is designed to hold the definitions of words and concepts as well as event roles and role-filling entities. Commonsense knowledge like naive physics is also built up through event role relations, such as the relation of sufferFrom requiring cure.
HowNet is modularized into separate knowledge spaces for entity hierarchy, event hierarchy, antonymy, syntax, attributes, etc. Relations between various concepts (e.g., part-of, relevance, location) are defined implicitly in the definition of each concept.
This paper will focus on building an algorithm that searches for topical paths in order to find causal explanations for questions like “Why do doctors cure patients?” or “Why do patients pay money?” as illustrated in Figure 1.
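As a small illustration, the propositional forms (6) to (10) in the Introduction can be held as predicate-argument tuples and queried by term. This is only a sketch: the tuple layout and the helper name `facts_about` are assumptions made here, not part of the system described in the paper.

```python
# Propositional forms (6)-(10) from the text, stored as (predicate, arguments).
# The tuple layout is an assumption made for illustration only.
FACTS = [
    ("sufferFrom", ("patient", "disease")),
    ("cure", ("doctor", "disease")),
    ("cure", ("doctor", "at-hospital")),
    ("occupation", ("doctor",)),
    ("cure", ("doctor", "patient")),
]


def facts_about(term):
    """Return every fact whose arguments mention the given term."""
    return [fact for fact in FACTS if term in fact[1]]


if __name__ == "__main__":
    # List the propositions that involve the patient.
    for predicate, args in facts_about("patient"):
        print(f"{predicate}({','.join(args)})")
```

Looking up "patient" returns the sufferFrom and cure propositions, which are the two ends that a causal explanation for “Why do doctors cure patients?” has to connect.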
[Figure 1: A snapshot of a virtually integrated knowledge base for the question: “Why do patients pay money to doctors?” The diagram places patient, doctor, occupation, and money under entity, and pay, give, and take under event, with give and take linked by converse. The give frame has agent=patient, possession=money, target=?; the take frame has agent=?, possession=money, source=patient. The marks (1) to (9) are explained in Section 2.]

In the following sections, issues on the virtual integration of knowledge bases, their algorithms, and experiments are presented.

2 Underlying Knowledge Bases and Virtual Integration

In Figure 1, each numbered mark has the following meaning:
(1) Entity hierarchy: entity is the top node in the hierarchy of entities.
(2) entity is the hypernym of patient, doctor, occupation, and money in line (3).
(3) Concepts or word entries are listed in this line. All concepts and word entries represent their definition by a list of concepts and marked pointers.
(4) A concept (or word) in (3) has definitional relations to a list of concepts. For example, the definition of doctor is composed of two concepts and their marking pointers: #occupation and *cure. Pointers in HowNet represent relations between two concepts or word entries, e.g., “#” means “relevant” and “*” means “agent”.
(5) syn refers to the syntactic relation in the question “Why do patients pay money to doctors?”
(6) converse refers to the converse relation between events, e.g., give and take.
(7) Event hierarchy: For example, the hypernym of pay is give and the hypernym of give is event.
(8) Event role: Event roles are partially filled with entities, e.g., patient and money.
(9) Event role shift: The agent of give is equated with the source of take.

An overview of each component of the knowledge base is given in Figure 2, where three word entries, why, patient, and money, are in the dictionary. The four concept facets of entity, role, event, and converse are described in this example, mainly as part of linguistic knowledge.

[Figure 2: HowNet Architecture in Example. The figure shows the dictionary entries why, patient, and money together with the concept facets entity, role, event, and converse.]

Some issues on ontology integration have been discussed from various points of view. Pinto et al. (1999) classified the notions of ontology integration into three types: integration, merging, and use/application. The term virtually integrated refers to the view of ontology-based use/application.
Issues on and arguments for linguistic knowledge bases and commonsense knowledge are presented in (Lenat, Miller and Yokoi, 1995). One of the arguments was whether linguistic knowledge could be separated from commonsense knowledge, but it was agreed that both types of knowledge are essential for natural language processing.
This paper was motivated by the desire to make inferences using a lexical knowledge base, thus carrying out a kind of commonsense reasoning.
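The pointer notation described for Figure 1 can be made concrete with a short sketch that splits a definition into (relation, concept) pairs. The text glosses only "#" (relevant) and "*" (agent); the reading given to "$" below, the label used for unmarked features, and the function names are all assumptions made for illustration.

```python
# Pointer symbols glossed in the text: "#" = relevant, "*" = agent.
# "$" appears in the figures but is not glossed in this section; reading it
# as the entity an event acts upon is an assumption of this sketch.
POINTERS = {"#": "relevant", "*": "agent", "$": "undergoer (assumed)"}


def parse_feature(token):
    """Split a feature such as '*cure' into (relation, concept)."""
    if token and token[0] in POINTERS:
        return POINTERS[token[0]], token[1:]
    # Features without a pointer are labeled "unmarked" (an assumption).
    return "unmarked", token


def parse_definition(features):
    """Parse a whole definition, given as a list of feature tokens."""
    return [parse_feature(token) for token in features]


if __name__ == "__main__":
    # The text defines doctor by the two features #occupation and *cure.
    print(parse_definition(["#occupation", "*cure"]))
```

Keeping the pointer separate from the concept name lets the same concept (here cure) be found regardless of the role under which an entry mentions it.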
3 Interpretation of Lexical Knowledge
Consider the following three sentences:
One major concern is finding connectability
among words and concepts. As shown in Figure 2,
the following facts are derived:
ponym for the act of human, one of whose
hyponym is patient.
Consider again the match between the tracing
sequences of concepts and the knowledge base.
Going into more details, notations with footnotes
will be given to each example. At this point, we
will give
D) Using the inheritance property of the concept hierarchy, relations between hypernyms of concepts X and Y are inherited by X and Y, in such a way that X and Y are similar if there exist X’ and Z such that X ≺ X’, Z ⊃ θX’, and Y ≺ Z, where θ is a pointer or null. This inheritance tracing can be determined by how similar X and Y are in terms of their upward paths along the hypernym relation. We will define path similar for this purpose. Tracing the path upward along hypernym links is described later, together with the algorithm.

A measure called similar will be defined based on the discussion in this section. Then an algorithm is introduced through this measure with an example.

5 Measures

In the last section, we discussed four kinds of the measure similar:
• path similar,
• feature similar,
• inverse similar,
• sister similar.
For the feature, inverse, and sister similar functions, path similar is used as the basis of calculation. They differ in both their search method and the depth to which features are expanded. feature similar finds similar features by using path similar. inverse similar(X,Y) searches for entries that contain X and Y as features and then uses path similar. In the same way, sister similar finds sister concepts, expands them, and finally measures using path similar.
Since path similar plays a key role in all these search and measure processes, it is explained in the next subsection. The other measures are dealt with only as part of the algorithm.

5.1 Similarity Based on Hierarchy and Feature

The purpose of the measuring function similar(X,Y) is to calculate the relevancy between two concepts or words, whether they are of type entity, event, or some other type.
If X and Y belong to different types of knowledge plane (e.g., entity and event), it is hard to compare their hypernym paths upward to the top concept. However, if concepts of different types have any relevance to (connect) causality, we use feature similar or inverse similar to find concepts of the same type first and then calculate path similar. We explain this by using two pairs of concept types, entity-entity and entity-event, without loss of generality.
First, pathsimilar(entity X, entity Y) is defined as follows:

\[ \mathrm{pathsimilar}(X, Y) = \frac{2 \times |\mathrm{path}^{+}(X) \cap \mathrm{path}^{+}(Y)|}{|\mathrm{path}^{+}(X)| + |\mathrm{path}^{+}(Y)|} \]

where path+(X) is the ordered list of hypernyms of X in descending order from the top concept. For example,
path+(doctor) = [entity...animate...human.doctor]
path+(patient) = [entity...animate...human.patient]
Because |path+(X)| counts the number of nodes on the path, pathsimilar(doctor,patient) = 2×6/(7+7) = 0.857.
Second, pathsimilar(entity N, event V) is defined as follows:
pathsimilar(N,V) = Max pathsimilar(N.feature,V)
where N.feature means the feature list in the definition of N. The following is an illustrative example of the definition:
money ⊃ $earn,*buy,#sell,$setAside
is equivalent to the following:
money.feature=[$earn,*buy,#sell,$setAside].
So pathsimilar(money,earn) = pathsimilar(earn,earn) = 1. Through this Max function, the selection priorities for the path can be specified.
Third, pathsimilar(event V, entity N) is defined by inverse similar as follows: pathsimilar(V,N) = Max pathsimilar(V.inverse, N). For example, pathsimilar(cure, doctor) = Max pathsimilar(cure.inverse, doctor) = Max pathsimilar({doctor, medical worker, medicine, patient}, doctor).
Fourth, pathsimilar(event X, event Y) shares the same formula as pathsimilar(entity X, entity Y) shown before. However, an alternative, inverse-based definition can also be given: pathsimilar(event X, event Y) = Max pathsimilar(X.inverse, Y.inverse).
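The first two pathsimilar cases can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the intersection of two ordered paths is read as their common prefix, the intermediate hypernym names (`h1`, `e1`, and so on) are placeholders because the text elides them, and the function names are chosen here for illustration.

```python
def common_prefix_len(path_x, path_y):
    """Number of leading nodes shared by two top-down hypernym paths."""
    count = 0
    for a, b in zip(path_x, path_y):
        if a != b:
            break
        count += 1
    return count


def pathsimilar(path_x, path_y):
    """2 * |path+(X) ∩ path+(Y)| / (|path+(X)| + |path+(Y)|)."""
    shared = common_prefix_len(path_x, path_y)
    return 2 * shared / (len(path_x) + len(path_y))


def pathsimilar_entity_event(features, event_path, paths):
    """Entity-event case: Max of pathsimilar over the entity's features."""
    scores = []
    for feature in features:
        name = feature.lstrip("#*$")  # drop the pointer symbol
        if name in paths:  # features without a known path are skipped
            scores.append(pathsimilar(paths[name], event_path))
    return max(scores, default=0.0)


if __name__ == "__main__":
    # Seven-node paths sharing six nodes, as in the doctor/patient example.
    # h1..h3 are placeholder names for the hypernyms the text elides.
    shared = ["entity", "h1", "animate", "h2", "h3", "human"]
    print(round(pathsimilar(shared + ["doctor"], shared + ["patient"]), 3))

    # Placeholder event paths for the money example.
    paths = {"earn": ["event", "e1", "earn"], "buy": ["event", "e2", "buy"]}
    money = ["$earn", "*buy", "#sell", "$setAside"]
    print(pathsimilar_entity_event(money, paths["earn"], paths))
```

With these inputs the first call reproduces the 0.857 of the worked example, and the second returns 1.0 because earn is itself one of the features of money.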
5.2 Logical Implication and Expansion Depth

All of the relations in Figure 2 are translated into logical form (see below). As stated in “Interpretation as Abduction” (Hobbs et al. 1988), “abductive inference is inference to the best explanation”. That work showed that “the interpretation of a text is the minimal explanation of why the text would be true” based on abductive inference. By the same token, “the interpretation of a question is the minimal explanation of why the question would be true” based on a set of lexical knowledge bases.
Before proceeding to our algorithm, an example of abductive inference is given briefly as a set of logical forms as well as a diagram in Figure 3.

16. doctor ⊃ human, #occupation, *cure, medical.
17. medicine ⊃ *cure.
18. disease ⊃ $cure.
19. cure ⊃ medical, {agent,patient,content}.
20. medical ⊃ #cure.
21. converse(pay,earn) ⊃ agent=source, target=agent.
22. patient ⊃ human,$cure.
23. occupation ⊃ affairs, earn.
24. cause(cure,sufferFrom) ⊃ patient=experiencer, content=content.
25. possibleConsequence(cure, beRecovered) ⊃ patient=experiencer, content=stateIni.

To trace paths that yield a minimal explanation, we propose a connectability measure, similar, in the spirit of “weighted abduction” (Hobbs et al. 1988). Just as “likelihood estimation” is useful when considering “bounded conditioning” (Russell & Norvig, 1995) in a belief network, the “expansion depth” of similar is useful for tracing explanation paths toward the minimal explanation of the question.

[Figure 3: Virtual Linking for Causality. The diagram links why, patient, pay, and money to their features (for example *sufferFrom, $cure, $earn, *buy, #sell, $setAside), to the doctor entry (human, #occupation, *cure, medical), and to the converse pair give and take through hypernym and inverse links.]

The “expansion depth level” of similar has two uses: one is to find the minimal explanation, and the other is to adapt dynamically to the level of interaction. This level is defined as a function similar(Level)(X,Y) for concepts or words X and Y in the following manner:
• similar(0)=pathsimilar: uses only X and Y themselves and their hypernym paths.
• similar(1)=feature_similar: uses the features of X and Y, expanded one step further than similar(0).
• similar(2)=inverse_similar
• similar(3)=sister_similar, which combines inverse_similar and feature_similar.

Depending on which level of similar is chosen, the search paths may change. A snapshot up to similar(2) is given in Figure 4.

[Figure 4: Snapshot for similar(2). The diagram shows why, doctor, cure, and patient with their features, together with the inverse entries medicine*, disease$, and medical# attached to cure.]
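The level-dependent expansion can be illustrated with a small sketch built on a subset of the logical forms (16, 17, 18, 20, and 22). It covers only levels 0 to 2; the dictionary layout, the function names, and the reading of “minimal explanation” as “lowest level at which the target is reached” are assumptions made here for illustration.

```python
from collections import defaultdict

POINTERS = "#*$"

# Definitions copied from logical forms 16, 17, 18, 20 and 22 in the text.
DEFINITIONS = {
    "doctor": ["human", "#occupation", "*cure", "medical"],
    "medicine": ["*cure"],
    "disease": ["$cure"],
    "medical": ["#cure"],
    "patient": ["human", "$cure"],
}


def features_of(concept):
    """Level-1 expansion: concepts named in the entry's own definition."""
    return {f.lstrip(POINTERS) for f in DEFINITIONS.get(concept, [])}


def inverse_of(concept):
    """Level-2 expansion: entries that mention `concept`, grouped by pointer."""
    grouped = defaultdict(set)
    for entry, features in DEFINITIONS.items():
        for feature in features:
            if feature.lstrip(POINTERS) == concept:
                pointer = feature[0] if feature[0] in POINTERS else ""
                grouped[pointer + concept].add(entry)
    return dict(grouped)


def minimal_level(x, y, max_level=2):
    """Lowest expansion level at which y is reached from x, or None."""
    if x == y:
        return 0
    if max_level >= 1 and y in features_of(x):
        return 1
    if max_level >= 2 and any(y in entries for entries in inverse_of(x).values()):
        return 2
    return None


if __name__ == "__main__":
    print(features_of("doctor"))
    print(inverse_of("cure"))
    print(minimal_level("doctor", "cure"), minimal_level("cure", "patient"))
```

Here cure is found among the features of doctor at level 1, whereas patient is reached from cure only at level 2, through the entries that list $cure in their definitions.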
6 Tracing Algorithms

6.1 Algorithm Crossover

The overall flow of the algorithm, which will be called “crossover”, depends on similar(Level), as in the following program.

Algorithm Crossover
  For Level=0..N until stopping condition is satisfied:
    Expand the trace by similar(Level)

For example, when Level=1, the algorithm crossover finds a very primitive answer to the question “Why do doctors cure patients?” We expand the features of doctor other than cure, because cure already holds the syntactic relation between doctor and patient.
As shown in the logical forms (16~24) introduced in the previous section, the algorithm at Level=1 finds the following concepts as a result: medical, human, cure ($cure, *cure).
When Level=2, the algorithm crossover seeks higher-order relations (like the hypothesis) from the concept (by inverse_similar), converse/antonymy relations (by feature_similar), and event relations (if any, for use in knowing the cause or consequence relation). Consider again our example "Why do doctors cure patients?" using the previous section's logical forms. The results are as follows:

*cure = {doctor, medicine}
$cure = {patient, disease}
*sufferFrom = {patient}
$sufferFrom = {disease}

Its generated meaning may be “If a doctor cures a patient, the patient is recovered from disease. Because patients suffer from diseases, doctors cure the patients. Patients are recovered after getting cured.”

6.2 Stopping Condition

Stopping conditions for the algorithm crossover are as follows:
(1) Event roles are filled up.
(2) If no event is found in the feature definition, increase the similar level.
(3) [weak stopping condition] When there is no event, one of the other features is commonly shared between two concepts. For example, medical is a common feature between doctor and cure.

6.3 Hypernym Climbing

In section 4.2, inheritance was discussed for the purpose of finding a relation among pay ~ patient. After trying Level=2 in section 5.2, we were motivated to find the interrelation between hypernyms. The algorithm crossover is updated accordingly.

Algorithm Crossover+
  For Level=0..N until stopping condition is satisfied:
    Expand the trace by similar(Level)
    If Level >= 2, then
      repeat climb up hypernym until it matches with the higher relation.

6.4 Algorithm Crossover++

Consider again the question "Why do patients pay money to doctors?" As shown in Figure 1, the best trace is $cure ~ *cure ~ *earn ~ $pay. It provides an explanation for the statement that “patients are cured by doctors ~ doctors earn money ~ patients pay money to doctors”. This minimal explanation is obtained by switching over the role pointers θ whenever tracing is performed. For example, $cure was switched over to *cure. This extended version of the algorithm is called Crossover++.

7 Evaluation

Using the Crossover algorithms, the behavior of “why”-type questions is investigated by extracting the answer paths as follows.

Q: Why does patient pay money?
Path: patient ~ $cure ~ doctor ~ #occupation ~ $earn ~ money
Q: Why does researcher read textbook?
Path: researcher ~ #knowledge ~ #information ~ readings ~ textbook

Paths between two concepts can now be found by simply checking the presence of a path among the concepts reached from an initial concept. Table
1 and Table 2 show examples of the number of paths as a function of path size.

Source     Reached concepts by path size
concept        1       2        3
cure         275     593    24854
eat          268     605    24903
study        276     358    23172
food         532     650    18066
human       6713    3686    51171
money        328    1312    19827
Table 1: Examples of destination concepts reached starting from one source concept

                       Number of paths by length
Concept1   Concept2       1      2      3
cure       human          0     78     26
pay        money          0      7      3
human      money          0      3      7
food       human          0      0     28
read       write          0      4      6
earn       pay            0      0      7
Table 2: The number of paths between pairs of concepts

8 Discussion

HowNet (Dong et al. 1999-2003) has already defined words and concepts using the features of concepts. Each event role is also defined under the notion of feature. On the other hand, WordNet (Miller, 1995) consists of synsets and their glosses. Moldovan et al. (2002) showed a lexical chain that uses words in glosses in order to trace topically related paths.
Their search boundary is restricted to the shapes V, W, VW, and WW. In this paper, crossover* is shown to be flexible and to search for a more probable explanation.

9 Conclusion

In this paper, we have attempted to show how to link pre-existing lexical knowledge bases to one another. The major issue was to generate explanation paths for answering the “why”-type question. While observing the behavior of causality paths, we proposed the measure similar and the algorithm crossover. It is compared with “weighted abduction” (Hobbs et al. 1988) and “lexical chain” (Moldovan et al. 2002).
With the ability to provide explanations depending on the level of the measure similar, our proposed algorithm adapts itself to the user's knowledge level as well as to the type of interactive question, enabling a more detailed level of question-answering.

References

Zhen Dong and Q. Dong. 1999-2003. Hownet, http://www.keenage.com/
Jerry R. Hobbs, Mark Stickel, Douglas Appelt and Paul Martin. 1988. Interpretation as Abduction, Proceedings of the Conference on 26th Annual Meeting of the Association for Computational Linguistics.
Doug Lenat, George Miller, and Toshio Yokoi. 1995. CYC, WordNet, and EDR: Critiques and Responses, Communications of the ACM, 38(11):45-48.
Bernardo Magnini and Manuela Speranza. 2002. Merging Global and Specialized Linguistic Ontologies, Proceedings of Ontolex 2002 (Workshop held in conjunction with LREC-2002), Las Palmas.
George Miller. 1995. WordNet: a lexical database. Communications of the ACM, 38(11):39-41.
Dan Moldovan and Adrian Novischi. 2002. Lexical Chains for Question Answering, Proceedings of COLING 2002, Taipei.
Takanoa Ogino and Masahiro Kobayashi. 2000. Verb Patterns extracted from EDR Concept Description, IPSJ SIGNotes Natural Language Abstract, No.138 – 006:39-46.
Alexandru Marius Pasça. 2001. High-Performance, Open-Domain Question Answering from Large Text Collections. Ph.D Dissertation, Southern Methodist University.
H. Sofia Pinto, Asunción Gómez-Pérez and João P. Martins. 1999. Some Issues on Ontology Integration, Proceedings of the IJCAI-99 workshop on Ontologies and Problem-Solving Methods (KRR5), Stockholm.
Stuart Russell and Peter Norvig. 1995. Artificial Intelligence: A Modern Approach. Prentice-Hall.
Toshio Yokoi. 1995. The EDR Electronic Dictionary. Communications of the ACM, 38(11).