Textual Entailment Using Univariate Density Model and Maximizin
Recognizing Textual Entailment
Progress towards RTE 4
Scott Settembre
University at Buffalo, SNePS Research Group
ss424@cse.buffalo.edu
Recognizing Textual Entailment Challenge (RTE) -
Overview
• The task is to develop a system that determines whether the first
sentence of a given pair “entails” the second sentence
• The pair of sentences is called the Text-Hypothesis pair (or
T-H pair)
• Participants are provided with 800 sample T-H pairs annotated
with the correct entailment answers
• The final testing set consists of 800 non-annotated samples
Development set examples
• Example of a YES result
<pair id="28" entailment="YES" task="IE" length="short">
<t>As much as 200 mm of rain have been recorded in portions of British
Columbia , on the west coast of Canada since Monday.</t>
<h>British Columbia is located in Canada.</h>
</pair>
• Example of a NO result
<pair id="20" entailment="NO" task="IE" length="short">
<t>Blue Mountain Lumber is a subsidiary of Malaysian forestry transnational
corporation, Ernslaw One.</t>
<h>Blue Mountain Lumber owns Ernslaw One.</h>
</pair>
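A minimal sketch of reading T-H pairs like the two above, using only Python's standard library. The wrapping `<entailment-corpus>` root element and the `load_pairs` helper are assumptions made for illustration; the slides show only the individual `<pair>` elements.

```python
import xml.etree.ElementTree as ET

# The two development-set examples from the slide, wrapped in an assumed root element.
SAMPLE = """<entailment-corpus>
<pair id="28" entailment="YES" task="IE" length="short">
<t>As much as 200 mm of rain have been recorded in portions of British
Columbia , on the west coast of Canada since Monday.</t>
<h>British Columbia is located in Canada.</h>
</pair>
<pair id="20" entailment="NO" task="IE" length="short">
<t>Blue Mountain Lumber is a subsidiary of Malaysian forestry transnational
corporation, Ernslaw One.</t>
<h>Blue Mountain Lumber owns Ernslaw One.</h>
</pair>
</entailment-corpus>"""


def load_pairs(xml_text):
    """Return one dict per <pair>, with the annotation turned into a boolean."""
    root = ET.fromstring(xml_text)
    pairs = []
    for pair in root.iter("pair"):
        pairs.append({
            "id": pair.get("id"),
            "task": pair.get("task"),
            "entailed": pair.get("entailment") == "YES",
            # Collapse the line breaks inside <t> and <h> into single spaces.
            "text": " ".join(pair.findtext("t").split()),
            "hypothesis": " ".join(pair.findtext("h").split()),
        })
    return pairs


PAIRS = load_pairs(SAMPLE)
for p in PAIRS:
    print(p["id"], p["task"], p["entailed"], "|", p["hypothesis"])
```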
Entailment Task Types
• There are 4 different entailment tasks:
– “IE” or Information Extraction
• Text: “An Afghan interpreter, employed by the United States,
was also wounded.”
• Hypothesis: “An interpreter worked for Afghanistan.”
– “IR” or Information Retrieval
• Text: “Catastrophic floods in Europe endanger lives and cause
human tragedy as well as heavy economic losses”
• Hypothesis: “Flooding in Europe causes major economic
losses.”
Entailment Task Types - continued
• The two remaining entailment tasks are:
– “SUM” or Multi-document summarization
• Text: “Sheriff's officials said a robot could be put to use in
Ventura County, where the bomb squad has responded to more
than 40 calls this year.”
• Hypothesis: “Police use robots for bomb-handling.”
– “QA” or Question Answering
• Text: “Israel's prime Minister, Ariel Sharon, visited Prague.”
• Hypothesis: “Ariel Sharon is the Israeli Prime Minister.”
RTE3 - 2007 Results
• Our two runs submitted this year (2007) scored:
– 62.62% (501 correct out of 800)
– 61.00% (488 correct out of 800)
• In the 3rd RTE Challenge (2007), 62.62% ties for 12th out
of 26 teams.
– Top scores were 80%, 72%, 69%, and 67%.
– Median: 61.75%
– Range: 49% to 80% (top score up from 75.38% last year)
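The two run scores are plain accuracy over the 800 test pairs. A small sketch of that arithmetic, where `accuracy_percent` is an illustrative helper name rather than anything from the RTE tooling:

```python
def accuracy_percent(correct, total):
    """Accuracy as a percentage of correctly classified T-H pairs."""
    return 100.0 * correct / total


# Correct-answer counts for the two submitted runs, as given on the slide.
RUNS = {"run 1": 501, "run 2": 488}
for name, correct in RUNS.items():
    print(f"{name}: {accuracy_percent(correct, 800):.3f}%")
```

The first run works out to 62.625%, which the slide reports as 62.62%.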
RTE3 - 2007 Results
• Category breakdown consistent with last year
– QA (question answering) average was 71% [75%]
– IR (information retrieval) average was 66% [63%]
– Summary average was 58% [61.5%]
– IE (information extraction) average was 52% [51%]
• This relationship between the entailment categories was
consistent between the groups as well.
Best Performers
• Hickl, one of the top performers, used techniques like these:
– Lexical relationships, using Wordnet *
– N-gram, word similarity *
– Anaphora resolution
– Machine learning techniques *
• Entailment corpora, more than provided by RTE
– Logical inference
• Using background knowledge
*Also used by our submission
Best Performers
• Another top performer, Tatu (72%), focused mainly on these
techniques
– Lexical relationships, using Wordnet
– Anaphora resolution
– Logical inference
• Using background knowledge
• A good performance also came out of LSA, Latent Semantic
Analysis
– A 67% score came out of using LSA (top 4 performer)
– Only 3 teams used LSA; the other 2 scored low (58%, 55%)
List of Techniques Used
• Lexical similarity, using a dictionary/thesaurus source
– Wordnet, DIRT, and MSOffice dictionary used
• n-gram, word similarity (also “bag of words”)
• Syntactic matching and aligning
• Semantic role labeling
– FrameNet, PropBank, VerbNet
• Corpus (web-based) statistics
– LSA – Latent Semantic Analysis
• Machine Learning Classification
– ANNs (Neural networks), HMMs, SVM (Support Vector machines)
• Anaphora resolution
• Entailment corpora, background knowledge
• Logical Inference
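As a rough illustration of the simplest item on this list, the “bag of words” comparison, here is a sketch that measures how many hypothesis words also appear in the text. The tokenizer and the `hypothesis_coverage` name are assumptions for illustration, not the feature actually used in any submission.

```python
import re


def tokens(sentence):
    """Lower-cased set of alphanumeric word tokens (a deliberately crude tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def hypothesis_coverage(text, hypothesis):
    """Fraction of distinct hypothesis words that also occur in the text."""
    h = tokens(hypothesis)
    if not h:
        return 0.0
    return len(h & tokens(text)) / len(h)


# The YES and NO development-set examples shown earlier.
yes_score = hypothesis_coverage(
    "As much as 200 mm of rain have been recorded in portions of British "
    "Columbia , on the west coast of Canada since Monday.",
    "British Columbia is located in Canada.",
)
no_score = hypothesis_coverage(
    "Blue Mountain Lumber is a subsidiary of Malaysian forestry transnational "
    "corporation, Ernslaw One.",
    "Blue Mountain Lumber owns Ernslaw One.",
)
print(f"YES pair: {yes_score:.2f}  NO pair: {no_score:.2f}")
```

On these two pairs the NO example covers 5 of 6 hypothesis words while the YES example covers only 4 of 6, which is one reason word overlap is usually combined with the other techniques listed.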
Logical Inference Techniques Used
• SNePS should be here!
• Extended Wordnet or Wordnet 3.0
– Expresses word relationships in logic rules
• DIRT – a paraphrase database of world knowledge
– Expresses equivalent paraphrases in terms of rules
– e.g. X kills Y ↔ X attacks Y
– note: this rule did not contain (“and Y dies”)
• Framenet
– Uses a Frame to express a relationship between “objects” in a script along with
other “objects”, like roles, situations, events
• Use specifically developed semantic inference modules
• Oddly, no one used OpenCyc
New Technique for our RTE 4 Submission
• Latent Semantic Analysis – LSA
• LSI technique developed back in 1988, addressing search indexing
• LSA improved upon LSI in the 1990s, applied to summarization and evaluation
• Important for us because result can be expressed as a metric or a feature
vector, fits right into the RTE Tool
• Helps overcome the “poverty of the stimulus” problem, by
“accommodating a very large number of local co-occurrence relations
simultaneously” [Landauer, T. K., Foltz, P. W., & Laham, D. 1998]
How LSA Works
• The process includes
– Setting up a matrix of words to words or words to documents
– Performing a Singular Value Decomposition (SVD) on that matrix
– Reducing the three resulting matrices by keeping only the largest singular values
(the rest are set to 0 and their rows and columns removed)
– We then reconstruct an approximation of the original matrix, which essentially relates
words (cells) that had not been directly related to each other initially, and redistributes
the correlation between them
– Then, depending on what relationship one is trying to find, we can extract the
feature vectors we wish to compare and calculate the cosine between them
– Uh huh, so what does this all mean…
• Let’s look at an oversimplified example
LSA - Oversimplified Example – part 1
• We have two documents: D1 is about dogs, D2 cats
• D1 contains the words “dog” “pet” “leash” “walk” “bark”
• D2 contains the words “cat” “roam” “jump” “purr” “pet”
• At this level, we may not know if any of these words are related, especially
if we have many documents and many words
• But, we can see that the word “pet” is in both documents
• This “may” imply that there is a relationship between some words in D1
and D2, simply because “pet” occurred in both
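The dog/cat example can be sketched in a few lines. To stay within the standard library, this sketch swaps a full SVD for power iteration and keeps only the single largest singular value (a rank-1 reduction). That is an extreme setting chosen for a two-document toy; a real LSA setup would keep many more dimensions and use a linear algebra package. All function names here are illustrative.

```python
import math

DOCS = {
    "D1": ["dog", "pet", "leash", "walk", "bark"],
    "D2": ["cat", "roam", "jump", "purr", "pet"],
}
WORDS = sorted({w for ws in DOCS.values() for w in ws})
DOC_IDS = list(DOCS)

# Word-by-document count matrix: one row per word, one column per document.
A = [[float(DOCS[d].count(w)) for d in DOC_IDS] for w in WORDS]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_right_singular_vector(matrix, iterations=200):
    """Power iteration on A^T A; converges to the dominant right singular vector."""
    n_cols = len(matrix[0])
    v = [1.0 / (j + 1) for j in range(n_cols)]  # arbitrary non-degenerate start
    for _ in range(iterations):
        av = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
        atav = [sum(matrix[i][j] * av[i] for i in range(len(matrix)))
                for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in atav))
        v = [x / norm for x in atav]
    return v


def rank_one_approximation(matrix):
    """Best rank-1 reconstruction: (A v) v^T, which equals sigma * u * v^T."""
    v = top_right_singular_vector(matrix)
    av = [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]
    return [[a * vj for vj in v] for a in av]


REDUCED = rank_one_approximation(A)
dog, cat = WORDS.index("dog"), WORDS.index("cat")
before = cosine(A[dog], A[cat])
after = cosine(REDUCED[dog], REDUCED[cat])
print(f"cosine(dog, cat) before reduction: {before:.3f}, after: {after:.3f}")
```

Before the reduction “dog” and “cat” never share a document, so their cosine is 0. After it, the shared word “pet” has pulled the two documents onto one dimension and the two word vectors point the same way.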
LSA – How I Plan to Apply
• I will be creating two matrices
– One matrix will contain data ONLY from entailed sentence pairs
– Second matrix will be for non-entailed pairs
• Each matrix vector will contain word to word comparisons
– Each row will contain a word that has been used in successful entailment
– Each column will contain the passage to be entailed from
– SVD performed on each, reduced, then combined again
• Now, to determine if a new pair is entailed
– We calculate the feature vector associated with each word and each matrix
– We then calculate the COS between each word/matrix vector
– Then “combine” the COS for each vector set (vector set from each matrix)
– Perform a linear discriminant function to classify entailment (from the RTE Tool of
RTE3)
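A minimal sketch of the last two steps, under loud assumptions: the slide leaves “combine” open, so this sketch simply averages the cosines from each matrix, and the weights and bias are hypothetical placeholders (in practice they would be fitted on the annotated development set). None of these names come from the RTE Tool.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def linear_discriminant(features, weights, bias):
    """Signed score w . x + b; the sign decides the class."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def classify(cos_entailed, cos_non_entailed, weights=(1.0, -1.0), bias=0.0):
    """Combine per-word cosines from each matrix, then apply the discriminant.

    cos_entailed / cos_non_entailed: cosines computed against the
    entailed-pairs matrix and the non-entailed-pairs matrix respectively.
    The default weights and bias are hypothetical, not learned values.
    """
    features = [mean(cos_entailed), mean(cos_non_entailed)]
    return "YES" if linear_discriminant(features, weights, bias) > 0 else "NO"


# Made-up cosine values, purely to exercise the function.
print(classify([0.8, 0.6], [0.3, 0.1]))
print(classify([0.2], [0.5]))
```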
LSA – Progress
• Developing LSA in ACL
• Using Matrix package from http://matlisp.sourceforge.net/
• Benefits of using LSA
– No need to program rules or compare sentence structures
– Mimics performance that humans have [see ref from before]
– No need to consider all information, since correlations can be created between
words even if a specific word has not been seen before
• Drawbacks of using LSA
– I need a lot more data, unsure how much (I may be able to calculate later)
– Linear algebra is complicated for my small symbolic brain
– I’m at the mercy of the literature, though I made some innovation in LSA use
RTE Challenge - Final Notes
• See the continued progress at:
http://www.cse.buffalo.edu/~ss424/rte3_challenge.html
• RTE Web Site:
http://www.pascal-network.org/Challenges/RTE3/
• Textual Entailment resource pool:
http://aclweb.org/aclwiki/index.php?title=Textual_Entailment_Resource_Pool
• Actual ranking released in June 2007 at:
http://acl.ldc.upenn.edu/W/W07/W07-14.pdf
November 15, SNeRG Meeting Scott