Awareness of Implicativity and Downward Monotonicity in EPILOG

Karl L. Stratos (a.k.a. Jang Sun Lee)

Abstract

Recent advances in Natural Logic (NL) emphasize making inferences at a shallow level by utilizing certain linguistic properties. In particular, implicativity and monotonicity allow for immediate and effective entailments when combined with a proper lexical semantic database. Consequently, simple pattern matching is all that is needed to yield such entailments, and many researchers have taken advantage of this simplicity by building specialized systems for NL. In this paper, we show that the EPILOG system, a general inference engine for Episodic Logic (EL), can trivially absorb these features by dictating rules at the meta-level, although computational intractability imposes a cap on its abilities. We also note the limitations of this shallow methodology for making inferences, and the need for a more involved framework in order to move beyond mere question-answering trickery.

1 Motivation

David was amazed that Chloe had managed to decline to dance.

From this piece of text only, let us try to evaluate the following assertions.

1. Chloe did not dance.
2. Chloe declined to tango.

We cannot say that they are logically entailed, since nothing prevents them from being false (for instance, Chloe might have been eventually persuaded to dance, or she could have assented to tango because tango happens to be her favorite type of dance). Yet we can convince ourselves that these conclusions are very plausible, almost true. We say that the text implicates those assertions to acknowledge this flexibility.

To trace the steps of reasoning for the first assertion, we know that Chloe did manage to decline to dance because the dyadic predicate amaze, when used passively to form an adjective phrase, implicates the attached sentence. Similarly, we know that Chloe did decline to dance because the predicate manage establishes the infinitival verb phrase complement on the subject. Finally, we presume that Chloe did not dance since decline implicates the negation of the associated action on the subject. We will call this linguistic property implicativity. We assume that a predicate carries one of three types of implicativity: positive ('+', implication of the truth of the action), negative ('−', implication of the falsity of the action), and neutral ('◦', no information). It is not true that the direction of implicativity is simply reversed when the predicate is negated. For instance, even though forbid has '−' in a positive environment (Chloe forbade Jack to move implicates that Jack did not move), it has '◦' in a negative environment (Chloe didn't forbid Jack to move does not tell whether Jack moved or not). We say forbid has implicativity −/◦. The predicates with +/+ and −/− are often called factives and antifactives, respectively. They are also called presuppositional predicates, since their implicit connotation remains valid even when they are negated.

The second assertion further uses the fact that decline is a downward-entailing operator (DEO). We call the phenomenon of being able to consistently substitute an argument with another that is either more general or more specific than the original monotonicity, also called polarity. Note that most phrases have upward monotonicity for their argument slots (unless negated); this is a fact on which KNEXT, a system that mines general knowledge from texts, heavily depends when it generalizes to form factoids, e.g., from Robert has a gigantic head to A man may have a head. There are only a few that have downward monotonicity.1 In other words, we only need to be aware of DEOs in order to fully exploit the phenomenon of monotonicity. The downward monotonicity of decline on its verb phrase argument allows us to say that Chloe declined to tango, because tangoing is a special case of dancing: ∀x.tango(x) ⊃ dance(x).

From this example, we can see why one might want to incorporate implicativity and monotonicity in a reasoning system. They are conceptually simple and easy to implement. Indeed, a substantial amount of work has been put into developing this NL stance.

   1 In fact, the semantics of this phenomenon can be explained by quantification only; the appendix contains a closer investigation.

2 Previous Work

Nairn et al. greatly influenced the computational linguistics community with their work on implicativity and monotonicity, presenting an algorithm to recursively compute these properties embedded in a sentence (2006) [5]. They have also constructed a list of implicative verbs, which I have expanded significantly by clustering synonyms and antonyms. Following up on this endeavor, MacCartney et al. attempted to relate monotonicity effects to implicativity, and to place them into a more generalized system of projectivity properties (2008) [4]. Danescu-Niculescu-Mizil employed Ladusaw's Hypothesis, that Negative Polarity Items (NPIs) appear only within the scope of DEOs, to automatically discover words with negative polarity [1]. I have also expanded his list of DEOs by using the relation between monotonicity and implicativity.

Schubert et al. have already shown that EPILOG is able to perform NL-like entailment inference by simulating MacCartney's example (2010) [3]. My paper can be seen as an extension of this effort, although I was largely unaware of it.
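The signature notation just introduced lends itself to a direct table lookup. The following sketch (a hypothetical illustration in Python, not part of any NL system; the table contains only verbs discussed in this paper) shows how a signature such as −/◦ for forbid determines what a sentence implicates about its complement action:

```python
# Each entry is (implied polarity in a positive environment,
#                implied polarity in a negative environment);
# None stands for '◦' (no information). Entries follow the text above.
SIGNATURES = {
    "manage": ("+", "-"),   # +/-: managed to do A => did A; didn't manage => didn't
    "forbid": ("-", None),  # -/◦: forbade A => A not done; didn't forbid => unknown
}

def implied_polarity(verb, negated):
    """Return what the sentence implicates about the complement action:
    '+', '-', or None (no information)."""
    positive, negative = SIGNATURES[verb]
    return negative if negated else positive
```

For example, `implied_polarity("forbid", negated=True)` returns None, capturing that Chloe didn't forbid Jack to move tells us nothing about whether Jack moved.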

3 Episodic Logic and EPILOG

The systems that implement implicativity and monotonicity suffer from the fact that they are inevitably shallow, unable to perform more advanced reasoning. MacCartney, for example, relies almost entirely on lining up a pair of sentences and inspecting each pair of aligned words. This narrow focus results in a mechanism that is only good at one task, i.e., making NL inferences based on word relations.

EPILOG, in contrast, is a full-fledged inference engine based on EL, a superset of first-order logic developed by Hwang and Schubert that is specifically intended to capture the expressiveness of the English language [2]. This means that the NL derivations are but a particular instance of the inference of which EPILOG is capable.

In order to implement this NL inference capability, we have to 'dumb down' and make a few simplifying assumptions:

1. A given sentence is already in EL.
2. Complicating elements, such as time and events, are ignored.
3. Borderline cases (i.e., those with questionable implication) are tolerated without a proof.

It is worth noting several problems arising from these assumptions. First, there is no 'standard' English-EL translator yet, and we sometimes run into an English syntax whose EL form we have not completely agreed on. Second, in the long run, assumption 2 is intolerable, because we believe that time is a crucial component of reasoning. The failure to account for temporal complications illustrates a huge defect of the simple NL approach (at least as it stands now). Third, it is a permanent conundrum in NLP how to handle the fluidity of language. Implicativity, by its very nature (as seen in section 1), is very uncertain and to some extent debatable. Until we come up with a precise mathematical model to capture this fluidity, we must appeal to our intuition and follow the language phenomena we observe in the world. These issues are further discussed in the last section.

4 Implementation Detail

As mentioned, one of the strengths of NL is its ease of implementation. I have manually constructed a list of around 250 implicatives with their semantics.2 I acknowledge here that about half of the items came from Nairn's work (via personal correspondence with Cleo Condoravdi at PARC). I have also handcrafted a list of around 80 DEOs. Again, I acknowledge that nearly 60 of them came from the collection made by Danescu-Niculescu-Mizil with the clever use of Ladusaw's Hypothesis.

Given the lexical semantic database, we can incorporate it into EPILOG by transcribing the operators and stating meta-rules to dictate inference. This is a rather technical issue of no theoretical importance, so you may skip this section if you are not interested in the EPILOG system implementation.

We illustrate the process with the implicative dare, which has +/−. We must first declare it as an operator with implicativity +/−, denoted by imp-pn:

(store (x-is-predicate 'dare) *lexicon-kb*)
(s '('dare imp-pn)).

The single quote before the word indicates quasi-quotation of the word. The operator imp-pn would have to be declared as a predicate itself in the KB (as in (store (x-is-predicate 'imp-pn) *lexicon-kb*)). There are two rules relevant to dare. The first is that an implicative p that has '+' in the positive environment, when used with the subcategorization format p to q (where q is any predicate), implicates that the subject of p does the action q. The action is denoted by the 'ka' operator, standing for 'kind of action'. We can state this rule at the meta-level as follows:

(all_pred p (('p imp-pp) or
            (('p imp-pn) or ('p imp-po)))
(all_pred q
 (all x ((x p (ka q)) =>
        (x q))))).

The second handles a similar inference in the negative environment: an implicative p that has '−' in the negative environment, when used with the subcategorization format p to q, implicates that the subject of p does not do the action q.

(all_pred p (('p imp-nn) or
            (('p imp-pn) or ('p imp-on)))
(all_pred q
 (all x ((not (x p (ka q))) =>
        (not (x q)))))).

Now we can derive (John climb Himalayas) from (John dare (ka (climb Himalayas))) and (not (Sue climb Himalayas)) from (not (Sue dare (ka (climb Himalayas)))).

The semantics of a DEO can be implemented likewise. For instance, the downward monotonicity of decline on its infinitival complement can be captured by the rule "If x declines to do p, then x declines to do any q that involves p":

(all x
 (all_pred p (all_pred q
  (((x decline (ka p)) and (all z ((z q) => (z p))))
   => (x decline (ka q)))))).

Note that we need knowledge of the relations between predicates for this to work, e.g., the fact (all x ((x tango) ⇒ (x dance))) must be known to EPILOG. This information may be partially obtained from lexical resources like WordNet, which provides a rough hierarchy of predicates using relations such as part-of, but it still poses formidable technical difficulties.

In addition, note that we do not have to worry about semantically wrong inputs, for which our rules will generate absurd inferences, because we are assuming the typical logical stance that if the premise is false, then any conclusion is valid. Therefore, even though we can "infer" (Sasquatch exist) from ((that (Sasquatch exist)) manage Tom),3 this inference is vacuously sound because the premise is semantically flawed (in any reasonable sense).

   2 I say 'around' because the number may change depending on how we decide to further modify the list.
   3 This is because manage, meant to be used with an infinitive complement, is a predicate p with '+' in the positive environment, and that is all the subcategorization format ((that X) p Y) requires to derive X.
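For readers who want a concrete picture of what these meta-rules do, here is a toy re-implementation in Python. The tuple encoding, the mini-KB, and the helper names are all illustrative assumptions; EPILOG's own storage and retrieval machinery works quite differently:

```python
# Facts are tuples: (subject, verb, ("ka", action)) for "subject verb to action",
# optionally wrapped in ("not", ...). Hypothetical mini-KB, not EPILOG syntax.
IMP_PN = {"dare", "manage"}          # implicatives with signature +/- (imp-pn)
HYPERNYM = {"tango": "dance"}        # stands in for (all x ((x tango) => (x dance)))
DEOS = {"decline", "refuse"}

def implicative_infer(fact):
    """The two dare-style meta-rules: return the implicated fact, or None."""
    negated = fact[0] == "not"
    subj, verb, (_, action) = fact[1] if negated else fact
    if verb not in IMP_PN:
        return None
    # '+' in the positive environment, '-' in the negative environment.
    return ("not", (subj, action)) if negated else (subj, action)

def deo_infer(fact):
    """Downward substitution: x declines p entails x declines any q below p."""
    subj, verb, (_, action) = fact
    if verb not in DEOS:
        return []
    return [(subj, verb, ("ka", q)) for q, p in HYPERNYM.items() if p == action]
```

With these definitions, `implicative_infer(("John", "dare", ("ka", "climb")))` yields `("John", "climb")`, and `deo_infer(("Chloe", "decline", ("ka", "dance")))` yields the more specific `("Chloe", "decline", ("ka", "tango"))`, mirroring the two EL rules above.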

5 Performance

Having done similar encoding for all items in the database, we are capable of any inference allowed by the implicatives and DEOs we possess. In particular, we can go back to the first question of this paper. From the sentence

(Karl (pasv amaze) (that
 (Chloe manage (ka (decline (ka dance))))))

('pasv' is a passive operator), EPILOG answers no to the query (Chloe dance) and yes to the query (Chloe decline (ka tango)).

However, the main bottleneck for performance is (in addition to the plausibility of the inference) speed. EPILOG is, by the standard of Lisp programs, very efficiently implemented, credit to Fabrizio Morbini, a main developer of the current version. Yet as the number of lexical items and axioms grows, the search space which EPILOG linearly goes through grows exponentially. Consequently, when we have all of the 250 implicatives and all of the 80 DEOs with their accompanying axioms scribed into the knowledge base, an easy question such as deriving (Bill sign document) from (Tim rejoice (that (Bill sign document))) can feel alarmingly sluggish, and a nested question such as the one above quickly becomes computationally intractable. The queries for the example above took only 1–2 seconds, but that is because I used only the necessary portion of the database to answer them. They would take an unacceptable amount of time if they were asked against a database that contains all of the implicatives and DEOs.

One step toward improving performance is to eliminate axioms that are redundant or unused, and to reduce the number of lexical items, since many are mere synonyms of others. Also, we can try to tweak the retrieval mechanism to focus the search space, a long-term solution suggested by Morbini. Yet we cannot get around this computational explosion without addressing the problem's fundamental computational hardness, which is a huge subject on its own. Therefore, we will for now leave this bottleneck as a grave problem for the future.

6 Discussion and Future Direction

As convenient and efficacious as this NL approach is in making inferences, it is merely scratching the surface of deep water. Specifically, we see in the appendix that upward/downward monotonicity is but an obvious behavior of quantification (when presented in First-Order Logic), and that implicativity is a statistical phenomenon which we may capture automatically from language interactions. Therefore, we cannot say we have 'solved' the problem of NLU as long as we rely on NL, because to a certain extent it is mere cheap trickery that happens to explain what we observe.

But then, we do not know what an ideal 'deep' solution looks like. In physics, for instance, Newtonian mechanics turned out to be a special case of general relativity, and some suspect there may be a single theory of which every other theory is a special case (in that case, language would also be one!).4 Yet even if such a grand theory exists, it would be of no help until we have it, and the "special cases" are extremely successful on their own in (at least partially) explaining the world in which we reside, and are thus constantly used even today. Hence, it is too early to tell how significant NL is in the wider scheme of NLU.

Another interesting point to note is that the ways to solve the problems of monotonicity and implicativity employ the logical and probabilistic stances toward NLU, respectively. Monotonicity is best explained in a logical framework, and implicativity in a statistical framework. This illustrates the need to ultimately merge the two methodologies in NLU, whose gap runs deep at the current moment.5

EPILOG is a wonderful inference engine that has the potential to be used for more advanced reasoning. However, there are several obstacles that must be overcome in order for EPILOG to wield its full potential:

1. There is a dire need for an official manual. However superior EPILOG version 2 is compared to version 1, as long as it lacks a manual, one can only be hesitant about using it for a serious, large-scale project.

2. We also need a standard method of translation between English and EL. Without a complete syntax format of EL for English at hand, EPILOG can only handle 'toy problems' that we handcraft. An automated English-EL translator is a frustrating yet necessary component for fulfilling EL's original purpose of transcribing English into an inference-capable form.

Incorporating events and time into the NL methodology developed in this paper would be an interesting next step, although it would violate the NL spirit that stresses shallow reasoning. Alleviating the computational intractability, although largely outside the domain of NLU, is also an important part of solving the problem. In the long run, it may be a good idea to focus on the heart of the problem, quantifiers and statistical/machine learning approaches, whose advances might immediately grant us NL inference as a trivial corollary.

   4 Others hold that there is no such theory, and that we simply get better and better results with infinitely many discrete theories that explain different phenomena.
   5 Here, we can use another physics analogy: we may associate the logical stance with relativity and the probabilistic stance with quantum mechanics in terms of their ideology and their historical contention. (Einstein, for one, refused to adopt quantum mechanics.) But most now believe that they must be unified into one coherent theory.

7 Acknowledgements

The author thanks Len Schubert and Jonathan Gordon for helpful comments and advice, and is sincerely grateful to Fabrizio Morbini for his prompt assistance with the EPILOG system.

References

[1] Cristian Danescu-Niculescu-Mizil, Lillian Lee, and Richard Ducott. Without a 'doubt'? Unsupervised discovery of downward-entailing operators. In Proceedings of NAACL HLT, pages 137–145, 2009.

[2] C. H. Hwang and L. K. Schubert. EL: A formal, yet natural, comprehensive knowledge representation. In Proceedings of AAAI-93, pages 676–682, 1993.

[3] L. K. Schubert, B. Van Durme, and M. Bazrafshan. Entailment inference in a natural logic-like general reasoner. In Proceedings of AAAI-10, 2010.

[4] Bill MacCartney and Christopher D. Manning. Modeling semantic containment and exclusion in natural language inference. In Proceedings of COLING, 2008.

[5] Rowan Nairn, Cleo Condoravdi, and Lauri Karttunen. Computing relative polarity for textual inference. In Proceedings of ICoS-5, 2006.

Appendix

   When utilizing the monotonicity and implicativity properties, we cannot help but wonder why and how they work. We attempt to give a theoretical explanation for them in this appendix, adopted from my previous work (available on my website).

Monotonicity

Jumping straight to the conclusion, we can say that monotonicity arises from how quantifiers are represented in logic. The reason most arguments seem to have upward monotonicity is that we use them to satisfy a minimal qualification of what we try to say, i.e., the quantifier at least. Therefore, we know from

                 Robert has a gigantic head.

that at least Robert has a gigantic head, which is at least a head. In First-Order Logic (FOL), this would be represented as ∃x.(∃y.(man(x) ∧ name(x, Robert) ∧ head(y) ∧ gigantic(head, y) ∧ has(x, y))). So it of course follows that ∃x.(∃y.(man(x) ∧ head(y) ∧ has(x, y))), i.e., A man may have a head. Negation (itself a DEO) flips monotonicity because it flips conjunction to disjunction (and vice versa). For instance, if we say Robert does not have a head, its FOL form would be ∃x.(∃y.(man(x) ∧ name(x, Robert) ∧ (¬head(y) ∨ ¬has(x, y)))), so it will entail ∃x.(∃y.(man(x) ∧ name(x, Robert) ∧ (¬head(y) ∨ ¬gigantic(head, y) ∨ ¬has(x, y)))), or Robert does not have a gigantic head.

Any DEO operates essentially by negating a conjunction of items into a disjunction of their negations, so that we can be more "specific" by attaching any disjunctive item. The quantifier all, which is a DEO on its first argument, is no exception, since ∀x.(P(x) ⇒ Q(x)) is the same as ∀x.(¬P(x) ∨ Q(x)), which implies the truth of ∀x.(¬P(x) ∨ ¬R(x) ∨ Q(x)) and so forth. The sentence Tom refused to move implies Tom refused to dance because the former is ∀x.(move(x) ⇒ refuse(Tom, x)), which entails ∀x.(dance(x) ⇒ refuse(Tom, x)) with the knowledge ∀x.(dance(x) ⊃ move(x)).

Thus, we see that monotonicity is a surface appearance of logical operations, which means we can automatically obtain the monotonicity inference capability once we are proficient in language-logic translation. However, it is well known that such a translation is too difficult to be expected in the near future, so the NL stance remains valid.6

Implicativity

Unlike monotonicity, implicativity seems to have no clear-cut account in terms of logical operations. Rather, it seems to come from the patterns created in language. Consider the implicative refuse. In most cases when X refuses to do A, unless there is additional information that contradicts the general pattern, X does not do A. Having statistically learned this pattern through our daily language interactions in text and speech, we have come to view refuse as having implicativity −/+. In contrast, the implicative manage has come to carry +/−, because in most (probably all) cases when X manages to do A, X does do A.

Therefore, the phenomenon of implicativity can be truly captured only by a machine learning approach that computes the frequency of an action A when A is modified by a predicate p. However, this solution would be very hard to implement through textual processing alone, because much of our knowledge of implicativity comes from other sources, such as daily interactions with the people around us (e.g., when someone at work says "I managed to finish the job", it is invariably the case that he did indeed finish the job). Yet it may be a worthwhile effort to try this out on textual resources, which many believe are a good enough foundation for human-level intelligence.

   6 But we must keep in mind that this NL approach is not what humans employ in making inferences. We keep neither a list of DEOs nor a set of axioms at the shallow level, but rather 'reason out' in a manner similar to the one explored in this appendix.
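The appendix's claim that upward entailment falls out of dropping a conjunct under ∃ can be checked mechanically on a finite model. The sketch below (the two-element model, the fact set, and the simplification of gigantic to a one-place predicate are all illustrative assumptions) brute-forces the Robert example:

```python
from itertools import product

# A finite model: the set of ground atoms that are true.
FACTS = {("man", "robert"), ("name", "robert", "Robert"),
         ("head", "h1"), ("gigantic", "h1"), ("has", "robert", "h1")}
DOMAIN = ["robert", "h1"]

def exists_xy(conjuncts):
    """∃x.∃y. (every conjunct holds), checked over the finite domain."""
    return any(all(c(x, y) in FACTS for c in conjuncts)
               for x, y in product(DOMAIN, repeat=2))

specific = [lambda x, y: ("man", x), lambda x, y: ("head", y),
            lambda x, y: ("gigantic", y), lambda x, y: ("has", x, y)]
general = specific[:2] + specific[3:]  # drop the 'gigantic' conjunct

# Upward monotonicity: whenever the specific claim holds, the general
# claim holds too, since it asks for strictly fewer conjuncts.
assert exists_xy(specific) and exists_xy(general)
```

Dropping a conjunct can never turn a satisfying assignment into a failing one, which is exactly the KNEXT-style generalization from Robert has a gigantic head to A man may have a head.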
