					Approximating Textual Entailment
with LFG and FrameNet Frames

Aljoscha Burchardt and Anette Frank

Computational Linguistics Department, Saarland University, Saarbrücken
Language Technology Lab, DFKI GmbH, Saarbrücken


SALSA Workshop, Saarbrücken, June 27-28, 2006
Multilingual semantic annotation: theory and applications
                               Overview
The PASCAL Recognizing Textual Entailment task (RTE):
  What is it, and how to approach it?
The SALSA RTE System:
  A baseline system for approximating Textual Entailment
    – Building on LFG-based syntactic analysis and frame semantics
    – Computing structural and semantic overlap
      as an approximation of textual entailment in a learning architecture
    – Open architecture for future extensions towards deeper modelling
Linguistic analysis: LFG and FrameNet frames
Approximating Textual Entailment
    – Computing a match graph for structural and semantic overlap
    – Feature extraction and machine learning
Results of this year’s RTE task
    – Discussion, error analysis and perspectives
Conclusion
   The PASCAL RTE Task: What is it?

A recently established Challenge for the NLP/AI community
    Testing a system's capacity to recognize "Textual Entailment"

    Text: Sunday's earthquake was felt in the southern Indian city of Madras on the
          mainland, as well as other parts of south India.
    Hypothesis: The city of Madras is located in Southern India.
    TASK: Entailed? – Yes



"Realistic", open-domain data set
   drawn from system outputs in NLP applications: IR, IE, QA, SUM
Controlled set-up: balanced training and test sets
   800/800 text-hypothesis pairs
               Taking a look at the data
Fine-grained linguistic analysis
    T: Oscar-winning actor Nicolas Cage's new son and Superman have sth. in common ...
    H: Nicolas Cage's new son was awarded an Oscar. — No (IE)
Lexical semantics and paraphrases (nominalisation, synonymy)
    T: [o]n December 10th 1936 King Edward VIII gave up his right to the British throne.
    H: King Edward VIII abdicated on the 10th of December, 1936. — Yes (QA)
Inference and world knowledge
    T: Olson, 62, previously worked as a partner at Ernst & Young LLP, before joining the
       Fed board in 2001, to serve a term ending in 2010.
    H: Olson is a member of the Fed board. — Yes (IE)
Modality
    T: U.S. Secretary of State Condoleezza Rice said Thursday that North Korea should
       return to nuclear disarmament talks and ...
    H: North Korea says it will rejoin nuclear talks.  — No (SUM)

Temporal and local restrictions (monotonicity)
    T: In most Pacific countries there are very few women in parliament.
    H: Women are poorly represented in parliament.     — Yes (!) (IR)
                  Textual Entailment

"We say that T entails H if the meaning of H can be inferred from the
meaning of T, as would typically be interpreted by people. This
somewhat informal definition is based on (and assumes) common
human understanding of language as well as common background
knowledge."

"Cases in which inference is very probable (but not completely certain)
are still judged True."

(Dagan, Glickman, Magnini, RTE 2005 Workshop Proceedings)



"Circumscribing Textual Entailment"?
See discussions in: Zaenen, Karttunen and Crouch (2005),
                    Manning (2006),
                   Crouch, Karttunen and Zaenen (2006).
                  A Challenge, ... in fact

   T: Hundreds of divers and treasure hunters, including the Duke of Argyll,
    have risked their lives in the dangerous waters of the Isle of Mull trying to
    discover the reputed 30,000,000 pounds in Gold carried by this vessel--
    the target of the most enduring treasure hunt in British history.
    H: Shipwreck salvaging was attempted. (Yes, IR)

   T: The 26-member International Energy Agency said, Friday, that
    member countries would release oil to help relieve the U.S. fuel crisis
    caused by Hurricane Katrina.
    H: Responding to a plea from the International Energy Agency for
    member countries to release reserves, Canada is prepared to help.
    (No, SUM)
    Approximating Textual Entailment

How to reconcile obvious complexity and required depth?
   – Parsing complexity
   – Semantic analysis
      • Argument structure, anaphora, lexical meaning, semantic and
        discourse relations, presupposition, ...
      • Inferences based on linguistic meaning and world knowledge

 Statistical/ML approximation of Textual Entailment
   – Based on state-of-the-art syntactic and shallow semantic analysis
   – Measuring structural and semantic overlap
 With possibilities for extensions towards deeper modelling
   – Inference on partial structures (lexical entailment)
   – Targeted modelling of specific aspects, e.g. modality contexts …
 A baseline system for approximating
          Textual Entailment

Fine-grained LFG-based syntactic analysis
   – English LFG grammar (Riezler et al. 2002)
     broad-coverage with high-quality probabilistic disambiguation
Frame Semantics
   – Coarse-grained lexical-semantic classification of predicates with role-
     based argument structure encoding
   – Extended semantic representations: WordNet senses, SUMO concepts
Computing structural and semantic overlap
   – Hypothesis: high/low ratio of H/T overlap => entailment: yes/no

[Figure: text and hypothesis graphs; H/T matching for TE = match graph size / hypothesis graph size]
 A baseline system for approximating
          Textual Entailment

Fine-grained LFG-based syntactic analysis
   – English LFG grammar (Riezler et al. 2002)
     broad-coverage with high-quality probabilistic disambiguation
Frame Semantics
   – Coarse-grained lexical-semantic classification of predicates with role-
     based argument structure encoding
   – Extended semantic representations: WordNet senses, SUMO concepts
Computing structural and semantic overlap
   – A learning problem: measures of overlap, weighted entailment decision

[Figure: text and hypothesis graphs; H/T matching for TE = match graph size / hypothesis graph size]
                     The SALSA RTE System
[Architecture diagram]
Linguistic analysis components and integration:
   – XLE parsing: LFG f-structure
   – Fred/Detour + Rosy: frames & roles
   – WordNet-based WSD: WordNet & SUMO
   – Integration via the XLE term rewriting system (Crouch 2005)
     => f-structure with (extended) frame-semantic projection
Recognizing Textual Entailment: graph matching & statistical approximation:
   – Text and hypothesis: f-structure with frames & concepts
   – Text-hypothesis match graph:
       • matching nodes and edges
       • different match types (similarity types)
       • extensions for deeper modelling (modality, lexical entailment)
   – Feature extraction
   – Model training & classification
            Linguistic Components
LFG analysis combined with FrameNet frames

Deep syntactic LFG analysis
   – Broad-coverage grammar with probabilistic disambiguation
   – Fine-grained grammatical function analysis with integrated NER
   – Performance on RTE-II development and test set:
        • Coverage: 99% (86% full parses, 13% partial parses)
        • On RTE H/T pairs: 76% fully analysed pairs – 2% single analysis only

Frame semantic analysis
   – Focusing on lexical semantic classes and role-based argument structure
   – Disregarding aspects of "deep" semantics: modality, quantification, ...
   – Normalisation over syntactic and lexical alternations
     (diatheses, lexicalisation, PoS)

      Determine semantic similarity based on lexical meaning,
      combined with similarity of argument structure, at
      a high level of abstraction
              Linguistic Components
               Frame and role assignment

Shalmaneser (Erk & Pado, 2006)
    – Shallow semantic parser for FrameNet frame and role assignment
    – Fred: statistical frame assignment
       • WSD system for predicates, in terms of frames
    – Rosy: semantic role assignment
       • Argument recognition and argument labelling
       • Using state-of-the-art features from robust syntactic parsing

Detour (to FrameNet via WordNet) (Burchardt et al., 2005)
    – Aim: overcome lexical gaps in FrameNet
    – A rule-based frame assignment system that takes a
      "detour to FrameNet via WordNet"
         Determine similarity of "unknown LUs" to existing frames (their LUs)
          based on WordNet-similarity measures
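
A minimal sketch of the Detour idea (not the actual Detour implementation): an unknown predicate is assigned the frame whose known lexical units (LUs) it is most WordNet-similar to. The frame inventory, the wn_similarity callback and the threshold are illustrative placeholders.

    # Hypothetical illustration: assign a frame to an "unknown" predicate by
    # comparing it to the LUs already listed for each frame, using some
    # WordNet-based similarity measure supplied by the caller.
    def detour_frame(predicate, frame_lus, wn_similarity, threshold=0.5):
        """frame_lus: {frame_name: [lu_lemma, ...]}; wn_similarity: (a, b) -> [0, 1]."""
        best_frame, best_score = None, 0.0
        for frame, lus in frame_lus.items():
            # Score a frame by its most similar lexical unit.
            score = max((wn_similarity(predicate, lu) for lu in lus), default=0.0)
            if score > best_score:
                best_frame, best_score = frame, score
        return best_frame if best_score >= threshold else None

    # Toy frame inventory and toy similarity function:
    toy_frames = {"Killing": ["kill", "murder"], "Motion": ["move", "travel"]}
    toy_sim = lambda a, b: 1.0 if a == b else (0.7 if {a, b} == {"assassinate", "murder"} else 0.1)
    print(detour_frame("assassinate", toy_frames, toy_sim))   # -> Killing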
         Linguistic Components
         Frame and role assignment

[Figure: example sentence annotated with frames & roles by Fred & Rosy, and by Fred, Detour & Rosy]
              Linguistic Components
              Frame and role assignment

Fred & Detour – different sense assignments (FN coverage)
                      Linguistic Components
       Integration and extended semantics projection

      Porting frame and role assignments to LFG f-structure
           – Defining a frame semantics projection using head lemmata as
             interface layer (accounts for parser discrepancies)
           – Using XLE rewrite system (Crouch 2005)




[Figure: head-indexed frame & role assignments projected onto the LFG f-structure]
             Linguistic Components
 Integration and extended semantics projection

Rule-based extensions of LFG-frame structures
   – Frames corresponding to LFG NE classes
       • Locations, companies, dates, …
   – Extra-thematic roles, based on LFG adjunct classes, etc.
       • Time, Reason, Location, Concessive, …
         +adjunct(Z,Y), ntype_sem(Y,time)
         ==> s::(Z,SemZ), s::(Y,SemY), time(SemZ,SemY).




Extended semantics projection: WordNet and SUMO classes
   – WSD: Banerjee & Pedersen, 2003
   – WordNet – SUMO/MILO mapping: Niles and Pease (2001)
             Linguistic Components
 Integration and extended semantics projection

Normalisations of syntactic structure
   – Passive: Mapping SUBJ and OBJ to dsubj and dobj argument slots (see the sketch below)
   – Coindexing relative pronouns and relativised head, appositives, etc.
   – Heuristic rules collect antecedent candidate sets for pronominals
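
A minimal sketch of the passive normalisation above, assuming f-structures are plain dictionaries, that passives carry a PASSIVE feature, and that the agent by-phrase appears as OBL-AG; these representational details are assumptions for illustration, not the actual XLE rewrite rules.

    def normalise_voice(fstr):
        """Map grammatical functions to deep argument slots dsubj/dobj."""
        deep = {}
        if fstr.get("PASSIVE"):
            # Surface subject of a passive is the deep object;
            # the by-phrase (OBL-AG), if present, is the deep subject.
            deep["dobj"] = fstr.get("SUBJ")
            if "OBL-AG" in fstr:
                deep["dsubj"] = fstr["OBL-AG"]
        else:
            deep["dsubj"] = fstr.get("SUBJ")
            if "OBJ" in fstr:
                deep["dobj"] = fstr["OBJ"]
        return deep

    # "A soldier was killed by a sniper" vs. "A sniper killed a soldier"
    passive = {"PRED": "kill", "PASSIVE": True, "SUBJ": "soldier", "OBL-AG": "sniper"}
    active  = {"PRED": "kill", "SUBJ": "sniper", "OBJ": "soldier"}
    assert normalise_voice(passive) == normalise_voice(active)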
FEF: Frame-Exchange-Format
   – (Partial) Visualisation of extended syntactic-semantic graph
     structures in FEFViewer (Alexander Koller, Coli Saarbrücken)




[Figure: FEFViewer visualisation of "A shark attacked a human being."]
     A walk-through example from RTE 2006

Pair 716

Text
   In 1983, Aki Kaurismäki directed his first full-time feature.

Hypothesis
   Aki Kaurismäki directed a film.
 LFG F-Structures
in XLE graphical display
      Automatic Frame Annotation for Text
                     in SALTO Viewer



[Figure: SALTO Viewer – Collins parse of the text annotated with Fred & Rosy frames & roles (statistical) and Detour frames (via WordNet)]
Automatic Frame Annotation for Hypothesis

716_h: Aki Kaurismäki directed a film.
           LFG and Frames for Hypothesis
                             in FEFViewer


[Figure: FEFViewer display for "Aki Kaurismäki directed a film.", including rule-based frame assignment from LFG-NER]
                    The SALSA RTE System
[Architecture diagram]
Linguistic analysis components and integration:
   – XLE parsing: LFG f-structure
   – Fred/Detour + Rosy: frames & roles
   – WordNet-based WSD: WordNet & SUMO
     => f-structure with (extended) frame-semantic projection
Recognizing Textual Entailment: graph matching & statistical approximation:
   – Text and hypothesis: f-structure with frames & concepts
   – Text-hypothesis match graph:
       • matching nodes and edges
       • different match types (similarity types)
       • extensions for deeper modelling (modality, lexical entailment)
   – Feature extraction
   – Model training & classification
         Hypothesis-Text-Match Graphs
       Computing structural and semantic overlap

Computing structural and semantic overlap
   – Computing a "match graph" from text and hypothesis graphs
   – Matches are established by different aspects and degrees of “similarity”
Approximating textual entailment
   – High/low overlap ratio of hypothesis and match graph
     => entailment: yes/no


[Figure: text and hypothesis graphs; H/T matching for TE = match graph size / hypothesis graph size]
            Hypothesis-Text-Match Graphs
                     Different matching strategies

I.   Match graph/Text overlap:
     Ratio of matched material to the total material in the Text





II. Match graph/Hypothesis overlap:
     Ratio of matched material to the total material in the Hypothesis




     T: Leo Fender invented the first electric guitar and the electric bass guitar.
     H: Leo Fender invented the first electric guitar.
     I: 7/12 = 58% – II: 7/7 = 100%
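
The two ratios can be illustrated in a few lines of Python, treating the graphs simply as sets of node/edge identifiers; the counts below mirror the Leo Fender example (7 matched items, 12 in the text graph, 7 in the hypothesis graph).

    def overlap_ratios(text_items, hyp_items, match_items):
        """Return (match/text, match/hypothesis) overlap ratios."""
        return (len(match_items) / len(text_items),
                len(match_items) / len(hyp_items))

    text_graph  = set(range(12))    # 12 nodes/edges in the text graph
    hyp_graph   = set(range(7))     # 7 nodes/edges in the hypothesis graph
    match_graph = set(range(7))     # all hypothesis material is matched

    r_text, r_hyp = overlap_ratios(text_graph, hyp_graph, match_graph)
    print(f"I: {r_text:.0%}  II: {r_hyp:.0%}")    # I: 58%  II: 100%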
          Hypothesis-Text-Match Graphs
       Computing structural and semantic overlap

Graph matching using XLE rewrite system
   – Defining different types of match conditions on t- and h-graph,
     triggering new nodes and edges in m-graph, with match-type info
        text-hypothesis              ==>  text-hypothesis-match

        frame(h:x1,killing)          ==>  frame(m:(z1,x1,y1),killing),
        frame(t:y1,killing)               match_type(m:(z1,x1,y1),killing,frame)


      Rewrite rule
          +frame(h:X1,Frame), +frame(t:Y1,Frame)
      ==> frame(m:(Z1,X1,Y1),Frame), match_type(m:(Z1,X1,Y1),Frame,frame).


   – Matching algorithm tied to rewrite-logic
      • Locally defined matches (no graph traversal)
      • Starting with (multiple) node matches
      • Edge matches: restricted to connect matched nodes
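
A much-simplified sketch of this matching regime (the real system uses XLE packed rewriting): node matches are established first, on identical labels, and an edge match is only admitted if it connects two already-matched node pairs. The dictionary-based graph encoding and the toy labels are invented for illustration.

    def build_match_graph(t_nodes, t_edges, h_nodes, h_edges):
        """Nodes: {node_id: label}; edges: {(src, tgt): edge_label}."""
        # 1. Node matches: every (hypothesis, text) node pair with identical label.
        node_matches = [(h, t) for h, hl in h_nodes.items()
                               for t, tl in t_nodes.items() if hl == tl]
        matched_h = {h for h, _ in node_matches}
        matched_t = {t for _, t in node_matches}
        # 2. Edge matches: same edge label, both endpoints already matched.
        edge_matches = [((hs, ht), (ts, tt), lbl)
                        for (hs, ht), lbl in h_edges.items()
                        for (ts, tt), lb2 in t_edges.items()
                        if lbl == lb2 and hs in matched_h and ht in matched_h
                        and ts in matched_t and tt in matched_t]
        return node_matches, edge_matches

    h_nodes = {"h1": "frame:F1", "h2": "pred:Kaurismaeki"}
    t_nodes = {"t1": "frame:F1", "t2": "pred:Kaurismaeki"}
    h_edges = {("h1", "h2"): "role:R1"}
    t_edges = {("t1", "t2"): "role:R1"}
    print(build_match_graph(t_nodes, t_edges, h_nodes, h_edges))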
          Hypothesis-Text-Match Graphs
       Computing structural and semantic overlap

Aspects of similarity
   – Syntax-based (i.e. lexical and structural) similarity
       Identical PREDs and attribute values trigger node matches
       Identical ATTRIBUTES (GF, morph. features) trigger edge matches
   – Semantics-based similarity
       Identical FRAMES and CONCEPTS trigger node matches
       Identical ROLES trigger edge matches
    Match graph consists of identical partial syntactic & semantic graphs
Degrees of similarity (strict vs. weak matching)
   – Non-identical, but "structurally related" PREDs
       – coreferentially related (relative clauses, appositives, pronominals)
   – Non-identical, but "semantically related" PREDs (WN-related, path < 3)
   – Non-identical, but "semantically related" FRAMES (FN-/Detour-related)
    Match graph establishes overlapping partial graphs (marked by match
     types)
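
As a rough illustration of strict versus weak matching, the sketch below decides a match type for a pair of predicates; wn_path_length and frames_related are hypothetical stand-ins for the WordNet and FrameNet/Detour resources used in the system.

    def match_type(h_pred, t_pred, wn_path_length, frames_related,
                   h_frame=None, t_frame=None):
        if h_pred == t_pred:
            return "pred_identity"          # strict match
        if wn_path_length(h_pred, t_pred) < 3:
            return "wn_related"             # weak: WordNet path < 3
        if h_frame and t_frame and frames_related(h_frame, t_frame):
            return "frame_related"          # weak: FN-/Detour-related frames
        return None                         # no match

    # Toy resources for pair 716 ("film" vs. "feature"):
    toy_path   = lambda a, b: 1 if {a, b} == {"feature", "film"} else 99
    toy_frames = lambda f1, f2: f1 == f2
    print(match_type("film", "feature", toy_path, toy_frames))   # -> wn_related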
[Figure: match graph for pair 716 – t: "In 1983, Aki Kaurismäki directed his first full-time feature." / h: "Aki Kaurismäki directed a film."; matches include grammatically related and WordNet-related predicates]
       Approximating Textual Entailment
        Extensions for deeper modelling: Modality

Detecting indicators of inconsistent modality types
   – T: A pet must have rabies protection confirmed by a blood test.
     H: A case of rabies was confirmed.
Marking modal contexts in text and hypothesis
   – 5 modality types: conditional, future, diamond, box, negation
Handling inconsistent modality types in matching process
   – Introducing negatively marked match nodes
   – Blocking embedded structures for similarity-based matches
   – Thus, reducing the size of the match graph
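
A toy sketch of the modality check: each clause is assigned one of the five modality types (or none) from a marker lexicon, and similarity-based matches are blocked when the text and hypothesis contexts disagree. The marker list below is made up for illustration and far smaller than any real resource.

    MODALITY_MARKERS = {          # toy lexicon, not the system's resource
        "if": "conditional", "will": "future", "must": "box",
        "may": "diamond", "not": "negation",
    }

    def modal_context(tokens):
        """Return the first modality type signalled in a clause, else None."""
        for tok in tokens:
            if tok.lower() in MODALITY_MARKERS:
                return MODALITY_MARKERS[tok.lower()]
        return None

    def modality_consistent(t_tokens, h_tokens):
        """Matches inside inconsistently marked contexts should be blocked."""
        return modal_context(t_tokens) == modal_context(h_tokens)

    t = "A pet must have rabies protection confirmed by a blood test".split()
    h = "A case of rabies was confirmed".split()
    print(modality_consistent(t, h))    # False -> block similarity-based matches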
        Approximating Textual Entailment
  Extensions for deeper modelling: Lexical Entailments

Bridging partially non-matching text and hypothesis pairs
    – T: Olson, 62, previously worked as a partner at Ernst & Young LLP, as a
      Minnesota bank president and as a congressional aide, before joining the
      Fed board in 2001, to serve a term ending in 2010.
      H: Olson is a member of the Fed board.

Lexically induced inferences, defined as rewrite rules on h/t/m graphs
     t: (X1) joins X2
     h: (Y1) member-of Y2
     m:(Z2,Y2,X2)
     => match_type(heuristic_entailment_match).


Similar: non-lexical heuristic inferences
    – Appositions: prime minister X => X is prime minister
    – Possessive constructions: X's Y => the Y of X
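
The rule schema above could be operationalised roughly as follows; the triple encoding, the rule table and the argument-match set are simplifications for illustration, not the system's actual rewrite-rule format.

    HEURISTIC_RULES = [
        ("join", "member-of"),    # text predicate => hypothesis predicate
    ]

    def heuristic_matches(t_triples, h_triples, arg_matches):
        """Triples: (arg1, pred, arg2); arg_matches: set of matched (h_arg, t_arg) pairs."""
        out = []
        for (tx1, tpred, tx2) in t_triples:
            for (hy1, hpred, hy2) in h_triples:
                if (tpred, hpred) in HEURISTIC_RULES \
                   and (hy1, tx1) in arg_matches and (hy2, tx2) in arg_matches:
                    out.append(((hy1, hpred, hy2), "heuristic_entailment_match"))
        return out

    t_triples = [("Olson", "join", "Fed_board")]
    h_triples = [("Olson", "member-of", "Fed_board")]
    args = {("Olson", "Olson"), ("Fed_board", "Fed_board")}
    print(heuristic_matches(t_triples, h_triples, args))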
         Approximating Textual Entailment
                               Machine learning

Feature selection with WEKA Classifiers
    – Many learners select intuitively important features, but also "idiosyncratic" ones

Selected learners and models
    – Model 1 Simple Conjunctive Rule classifier: generated a single rule

            preds_m_relto_h  <= 0.485294
          & frames_m_relto_h <= 0.954546      =>   rte_entails = 0

       Medium/high threshold on pred/frame matches as criterion for rejection
       High degree of frame similarity with medium predicate similarity models entailment
    – Model 2 Meta-classifier LogitBoost (additive logistic regression)
      Features (1.-4.) used in iteration; final feature set: 1.,2.,4.

         1.   No. of predicate matches relative to hypothesis
         2.   No. of frame (Fred,Detour) matches relative to hypothesis
         3.   No. of roles (Rosy) matches relative to hypothesis
         4.   Match graph size rel. to hypothesis, incl. syn, sem, ontological info
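
The original models were trained with WEKA; as a rough stand-in, the sketch below fits a scikit-learn logistic regression (not LogitBoost) on the four relative-overlap features listed above. All feature values are invented, not taken from the RTE data.

    from sklearn.linear_model import LogisticRegression

    FEATURES = ["preds_m_relto_h", "frames_m_relto_h",
                "roles_m_relto_h", "matchgraph_size_relto_h"]   # column order of X_train

    # Each row: the four ratios for one text-hypothesis pair (made-up numbers).
    X_train = [[0.90, 1.00, 0.80, 0.85],    # high overlap  -> entailed
               [0.30, 0.50, 0.20, 0.35],    # low overlap   -> not entailed
               [0.75, 0.95, 0.60, 0.70],
               [0.20, 0.40, 0.10, 0.25]]
    y_train = [1, 0, 1, 0]

    clf = LogisticRegression().fit(X_train, y_train)
    print(clf.predict([[0.85, 0.97, 0.70, 0.80]]))    # expected: [1] (entailed)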
                          Results in RTE-II

SALSA RTE system results
     RTE-II test set (accuracy %):   all tasks    IE     IR     QA     SUM
        Model 1                        59.0      49.5   59.5   54.5   72.5
        Model 2                        57.8      48.5   58.5   57.0   67.0
     Dev set (accuracy %):           all tasks
        Model 1                        61.1
        Model 2                        59.8
   – Both models score SUM > IR > QA > IE
   – Refined model better on QA – simple model better on SUM
Overall RTE-II results
   – Average accuracy: 60% (Median: 59%)
        Accuracy range (in %)      53-56        58-61      62-64       74-75
        No. of groups                7            11         3           2

   – Shallow overlap measures vary considerably between data sets,
     whereas "deeper" approaches remain more stable
   – Tendency towards deeper, knowledge-rich methods
                       Discussion of Results
                                     True positives

High ratio of matching predicates, frames, and f-structure
Typical phenomena
   – Non-identical predicates compensated by matching frames (626)
   – Missing frame assignments compensated by WN relatedness
      • die – pass away (wn-related, 103)
   – Active-passive diathesis resolved by f-structure normalisation (129)
Relative overlap measures also work for longer hypotheses

     T: Everest summiter David Hiddleston has passed away in an avalanche of Mt. Tasman.
     H: A person died in an avalanche. (103)
     T: An earthquake has hit the east coast of Hokkaido, Japan, with a magnitude of 7.0 Mw.
     H: An earthquake occurred on the east coast of Hokkaido, Japan. (626)
     T: In one of the latest attacks, a US soldier on patrol was killed by a single shot from a
     sniper in northern Baghdad, the military said yesterday.
     H: A sniper killed a U.S. soldier on patrol in Baghdad with a single shot. (129)
                      Discussion of Results
                                  True negatives

Modal context marking seems to be effective
   – 27% of all true negatives involved modality mismatches,
     while only 11.9% of all sentences involve marked modal contexts

      T: The goal of preserving indigenous culture can hardly be achieved by a handful of
      researchers and curators at museums of ethnology and folk culture.
      H: Indigenous folk art is preserved. (233)
      T: Even today, within the deepest recesses of our mind, lies a primordial fear that will not
      allow us to enter the sea without thinking about the possibility of being attacked by a shark.
      H: A shark attacked a human being. (322)


Future plans
   – Extend to lexically induced modality/factivity indicators
   – Testing for non-monotonicity contexts
                        Error analysis
                          False positives

Typical cases

Semantic dissimilarity
    – Non-matching predicates, which are in fact semantically dissimilar,
       embedded within otherwise large match graphs
Structural distance
    – Matching nodes that are neighbours in the match graph can correspond
       to nodes that are far apart in the text graph
                                   Error analysis
                                     False positives




[Figure: match graph for pair 198 – unconnected nodes matched with distant nodes in the text graph]

 T: Some 420 people have been hanged in Singapore since 1991, mostly for drug
 trafficking, an Amnesty International 2004 report said. That gives the country of 4.4
 million people the highest execution rate in the world relative to population.
 H: 4.4 million people were executed in Singapore. (198) – False positive
                           Error analysis
                            False positives

Graph matching process
   – Not a top-down process
   – Starts by relating any nodes, and builds growing clusters by finding
     matching edges
   – This allows criss-cross matching of nodes in the match graph



                    [Figure: criss-cross matching of nodes between text and hypothesis graphs]




   Introduce weighted edges that reflect the relative distance of pairs
    of match nodes in text and hypothesis (path distance)
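
One way to realise the proposed weighting, sketched under the assumption that the text graph is available as a plain adjacency list: each matched node pair is down-weighted by the path distance between its counterparts in the text graph. The 1/(1+d) weighting scheme is illustrative, not the system's.

    from collections import deque

    def path_distance(adj, start, goal):
        """Shortest path length between two nodes of an adjacency-list graph (BFS)."""
        seen, queue = {start}, deque([(start, 0)])
        while queue:
            node, dist = queue.popleft()
            if node == goal:
                return dist
            for nxt in adj.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
        return float("inf")               # unconnected nodes

    def match_weight(adj_text, t_node_a, t_node_b):
        """Down-weight match pairs whose text counterparts are far apart."""
        return 1.0 / (1.0 + path_distance(adj_text, t_node_a, t_node_b))

    # Matched nodes that are neighbours vs. three hops apart in the text graph:
    adj = {"t1": ["t2"], "t2": ["t1", "t3"], "t3": ["t2", "t4"], "t4": ["t3"]}
    print(match_weight(adj, "t1", "t2"), match_weight(adj, "t1", "t4"))   # 0.5 0.25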
                        Conclusions

A medium-depth approach: Approximating Textual Entailment
   – Lexical and syntactic overlap, semantic similarity (WordNet)
   – Frame semantics: lexical semantic classes & argument structure
   – Flexible graph matching method with extensions to deeper processing
       • Modality contexts, lexical inferences
Perspectives for future extensions
   – Engineering and fine-tuning
       • Combination with shallow (and deeper) methods in voting architecture
   – Frame and role assignment
       • Sense discrimination: outlier detection (Erk, 2006)
       • Coverage: integration with other resources (VerbNet, NomBank)
   – Modelling dissimilarity
       • Semantic distance measures and distance-weighted graph edges
   – Acquisition of lexical modality indicators and (lexical) entailment rules
                              References

   RTE Proceedings
     – RTE Challenge Homepage: http://www.pascal-network.org/Challenges/RTE2
      – I. Dagan, O. Glickman, and B. Magnini (2005): "The PASCAL recognising textual
        entailment challenge". In Proceedings of the RTE-1 Workshop, Southampton,
        UK.
     – B. Magnini and I. Dagan, editors (2006). Proceedings of the Second PASCAL
       Recognising Textual Entailment Challenge, Venice, Italy.
     – Electronic proceedings and slides:
       http://ir-srv.cs.biu.ac.il:64080/RTE2/proceedings/
   Discussion about RTE Task:
      – Zaenen, Karttunen and Crouch (2005): "Local Textual Inference: can it be
        defined or circumscribed?", In ACL 2005 Workshop on Empirical Modelling of
        Semantic Equivalence and Entailment, Ann Arbor, Michigan.
      – Manning (2006): "Local Textual Inference: It's hard to circumscribe, but you
        know it when you see it - and NLP needs it", MS. Stanford University.
      – Crouch, Karttunen and Zaenen (2006): "Circumscribing is not excluding: A reply
        to Manning", MS. Palo Alto Research Center.
     – All papers: http://www2.parc.com/istl/members/zaenen/
                               References

   A. Burchardt and A. Frank (2006): "Approximating Textual Entailment with LFG and
    FrameNet Frames". In Proceedings of the Second Recognising Textual Entailment
    Workshop, Venice, Italy.
    http://www.coli.uni-saarland.de/projects/salsa/page.php?id=publications
   K. Erk and S. Pado (2006): "Shalmaneser - a flexible toolbox for semantic role
    assignment." In Proceedings of LREC-06, Genoa.
    http://www.coli.uni-saarland.de/projects/salsa/page.php?id=publications
   A. Burchardt, K. Erk, and A. Frank (2005): "A WordNet Detour to FrameNet." In
    Proceedings of the GLDV 2005 Workshop GermaNet II, Bonn.
    http://www.coli.uni-saarland.de/projects/salsa/page.php?id=publications
   R. Crouch (2005): "Packed Rewriting for Mapping Semantics to KR." In Proceedings
    of the Sixth International Workshop on Computational Semantics, Tilburg.
    http://www2.parc.com/istl/groups/nltt/papers/iwcs05_crouch.pdf
              Approximating Textual Entailment
        Similarity/Entailment measures and feature extraction


Features per graph (text / hypothesis / match) and proportional features (h/t and m/h ratios):

  lexical:            lex_id in text, hypothesis and match graph        -> ratio_lexid
  syntactic:          node_m (pred, coref, pro)                         -> ratio_nodes
                      edge_syn_m (all, gf, subc)                        -> ratio_edges
  semantic (strict):  (lfg_)frames_t / (lfg_)frames_h / (lfg_)frames_m  -> ratio_(lfg_)frames
                      (lfg_)roles_t / (lfg_)roles_h / (lfg_)roles_m     -> ratio_(lfg_)roles
  semantic (weak):    node_frameFN/derived_m, node_framerel/detour/wnrel_m,
                      node_heuristic_entailment_m, node_modal_ctxt_mismatch_m
  connectedness:      clusters_no, clusters_avg_size                    -> clusters_avgsize_rel_h,
                                                                           clusters_abssize_rel_h
  other:              fragmentary (text, hypothesis), rte_task
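
As a small illustration of how the proportional features in the table can be derived, the sketch below computes m/h ratios from raw counts; the count names follow the table, but the exact normalisation (m/h vs. h/t) and the guard against empty hypotheses are simplifications.

    def proportional_features(counts):
        """counts: raw feature counts for hypothesis (_h) and match (_m) graphs."""
        return {
            "ratio_lexid":  counts["lex_id_m"] / max(counts["lex_id_h"], 1),
            "ratio_nodes":  counts["node_m"]   / max(counts["node_h"], 1),
            "ratio_frames": counts["frames_m"] / max(counts["frames_h"], 1),
            "ratio_roles":  counts["roles_m"]  / max(counts["roles_h"], 1),
        }

    toy = {"lex_id_m": 5, "lex_id_h": 6, "node_m": 4, "node_h": 5,
           "frames_m": 2, "frames_h": 2, "roles_m": 1, "roles_h": 2}
    print(proportional_features(toy))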
                       Error analysis
                        Sparse features

Feature set
   – High-frequency features that measure similarity
   – Few, low-frequency features that model dissimilarity
   – Bias towards similarity
       • 29.5% false positives
       • 12.75% false negatives


Plans for further development
   – Introducing distance measures (semantic and structural)
   – Getting a grip on remaining differences, i.e. non-matched edges
     between matching clusters

				