Statistical Relational Learning for NLP

Shared by: qinmei liao
Posted: 11/16/2011

Ray Mooney & Razvan Bunescu

Statistical Relational Learning



Presented by Michele Banko

Outline

• Quick intro to NLP
  – Problems, History
  – Why SRL?
• SRL for two NLP tasks
  – Information Extraction using RMNs
  – Semantic Parsing using ILP

Quick Tour of NLP

• Systems
  – Translation, Question-Answering/NL Search, Reading Comprehension
• Sub-tasks
  – Part-of-Speech Tagging, Phrase Finding, Parsing [syntax, semantics], Named-entity Recognition

Quick Tour of NLP [2]

• Knowledge engineering/linguistics vs. statistics
• Recently, many NLP tasks have been treated as sequence labeling problems
  – POS-tagging: The/DT cat/NN can/MD walk/VB
  – NP-finding: Stanford/B-NP University/I-NP is/O in/O California/B-NP
  – SRL with 1 relation, adjacency
• HMMs and CRFs to model, Viterbi to label
  – Find the most probable assignment of labels
  – States = part-of-speech tags
  – Compute P(w|t): the probability of emitting word w in the state for tag t
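
The HMM view of POS-tagging above can be sketched in a few lines. This is a minimal illustration, not the tagger used in any of the cited work: the function name `viterbi` and every probability in the toy tables are assumptions, chosen only so that the slide's example sentence has an ambiguous word ("can") to resolve.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Most probable tag sequence for `words` under a first-order HMM."""
    def lp(p):
        # Log-probability with a small floor so unseen events are unlikely, not impossible.
        return math.log(p) if p > 0 else math.log(1e-12)

    # best[i][t] = (log-prob of the best path ending in tag t at position i, previous tag)
    best = [{t: (lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)), None)
             for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max((best[-1][p][0] + lp(trans_p[p].get(t, 0)), p) for p in tags)
            column[t] = (score + lp(emit_p[t].get(word, 0)), prev)
        best.append(column)

    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]

# Toy model (all numbers are made up for illustration).
TAGS = ["DT", "NN", "MD", "VB"]
START = {"DT": 0.8, "NN": 0.2}
TRANS = {
    "DT": {"NN": 1.0},
    "NN": {"MD": 0.6, "VB": 0.3, "NN": 0.1},
    "MD": {"VB": 0.9, "NN": 0.1},
    "VB": {"DT": 0.5, "NN": 0.5},
}
EMIT = {
    "DT": {"the": 1.0},
    "NN": {"cat": 0.5, "can": 0.2, "walk": 0.3},
    "MD": {"can": 1.0},
    "VB": {"walk": 0.7, "can": 0.3},
}

tagged = viterbi(["the", "cat", "can", "walk"], TAGS, START, TRANS, EMIT)
print(tagged)
```

Working in log space avoids underflow on longer sentences; the floor for zero probabilities is a crude stand-in for proper smoothing.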

Why SRL?

NLP involves...

• Complex reasoning about entities and relationships between them
• Predicate logic
• Resolving & integrating ambiguities on multiple levels (morphology, syntax, semantics...)
  – More than just adjacency!
• Bayesian methods, graphical models

Intro to Information Extraction

• Early 90s, DARPA Message Understanding Conference
  – Identify references to named-entities (people, companies, locations...)
  – Multi-lingual, multi-document
  – Attributes, relationships, events

Fletcher Maddox, former Dean of the UCSD Business School, announced the formation of La Jolla Genomatics together with his two sons. Dr. Maddox will be the firm's CEO. His son, Oliver, is the Chief Scientist and holds patents on many of the algorithms used in Geninfo.

Attributes

NAME         Fletcher Maddox, Dr. Maddox
DESCRIPTORS  former Dean of the UCSD Business School, his father, the firm's CEO
CATEGORY     PERSON

Facts

PERSON           Employee_Of
Fletcher Maddox  UCSD Business School, La Jolla Genomatics

IE for Protein Identification

• Medline DB: 750K abstracts, 3700 proteins

Production of nitric oxide ( NO ) in endothelial cells is regulated by direct interactions of endothelial nitric oxide synthase ( eNOS ) with effector proteins such as Ca2+ - calmodulin . ... which avidly binds to the carboxyl terminal region of the eNOS oxygenase domain.

• Rule-based, HMM, SVM, MaxEnt
• CRFs outperform rest (Ramani et al., 2005)
  – May fail to capture long-distance dependencies

Collective IE using RMNs (Bunescu & Mooney, 2004)

• Typical IE: extractions in isolation
• Want to consider influences between extractions
  – If the context surrounding one occurrence strongly indicates a protein, it should affect future taggings in different contexts
  – Acronyms & their long forms
• Use sequence labeling treatment to get all substrings that make up protein names
  – Start, End, Continue, Unique, Other
• Classification of sequence types, not solo tokens
• RMNs (Relational Markov Networks)

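One plausible reading of the Start/End/Continue/Unique/Other tag set is that a multi-token name runs from a Start tag to the next End tag, and a single-token name is tagged Unique. The sketch below decodes such a tag sequence back into protein-name substrings. The function name `tags_to_names` and the example tagging are assumptions made for illustration.

```python
def tags_to_names(tokens, tags):
    """Recover name substrings from Start/Continue/End/Unique/Other tags."""
    names, start = [], None
    for i, tag in enumerate(tags):
        if tag == "Unique":
            names.append(tokens[i])          # single-token name
            start = None
        elif tag == "Start":
            start = i                        # open a multi-token name
        elif tag == "End" and start is not None:
            names.append(" ".join(tokens[start:i + 1]))
            start = None
        elif tag != "Continue":
            start = None                     # "Other" or an ill-formed sequence closes any open name
    return names

tokens = ["interactions", "of", "endothelial", "nitric", "oxide", "synthase",
          "(", "eNOS", ")", "with", "calmodulin"]
tags = ["Other", "Other", "Start", "Continue", "Continue", "End",
        "Other", "Unique", "Other", "Other", "Unique"]
names = tags_to_names(tokens, tags)
print(names)
```

Note that each tag here is still decided per token; the collective approach on this slide instead labels whole candidate sequences.
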
RMNs (Taskar et al., 2002)

• For each document d in the collection
  – Associate d with a set of candidate entities d.E
    • Entities = token sequences = too many possible phrases
    • Either: constrain length or form (baseNPs)
  – Characterize each entity e in d.E with a predefined set of boolean features e.F
    • e.label = 1 if e is a valid extraction
    • e.HeadWord, e.POS_Left, e.BigramRight, e.Prefix
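
A minimal sketch of the candidate-entity step, assuming the length constraint (not the baseNP one) and a reduced feature set. The helper names, the choice of the last token as head word, and the two-character prefix are all illustrative assumptions; boolean features are represented as the set of feature strings that are true for an entity.

```python
def candidate_entities(tokens, max_len=3):
    """All token spans of at most max_len tokens, as (start, end) with end exclusive."""
    return [(i, j)
            for i in range(len(tokens))
            for j in range(i + 1, min(i + max_len, len(tokens)) + 1)]

def entity_features(tokens, span):
    """Boolean features of one candidate entity, as the set of features that fire."""
    start, end = span
    left = tokens[start - 1] if start > 0 else "<s>"   # sentence-start marker (assumed)
    return {
        "HeadWord=" + tokens[end - 1].lower(),         # assumption: head = last token
        "WordLeft=" + left.lower(),
        "Prefix=" + tokens[start][:2],
    }

tokens = ["the", "enzyme", "SOD1", "binds"]
spans = candidate_entities(tokens, max_len=2)
feats = entity_features(tokens, (2, 3))
print(len(spans), sorted(feats))
```

Even with a short length limit the number of candidates grows linearly with document length times `max_len`, which is why the slide stresses constraining length or form.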

Clique Templates

• Find all subsets of entities satisfying a given constraint, then for each subset, form a clique c
• Local Templates
  – Number of hidden labels = 1
  – Model correlations between an entity's observed features and its label
• Global Templates
  – Number of hidden labels > 1
  – Model influences among multiple entities
    • Overlap Template, Repeat Template, Acronym Template

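Matching a global template amounts to finding the groups of candidate entities that satisfy its constraint. The sketch below does this for two of the named templates under assumed constraints: "overlap" means two spans share at least one token, and "repeat" means two or more spans have the same (case-insensitive) text. Function names and the example are hypothetical.

```python
from itertools import combinations

def overlap_cliques(spans):
    """Pairs of (start, end) spans that share at least one token."""
    return [(a, b) for a, b in combinations(spans, 2)
            if a[0] < b[1] and b[0] < a[1]]

def repeat_cliques(tokens, spans):
    """Groups of spans whose token text is identical, ignoring case."""
    groups = {}
    for start, end in spans:
        key = tuple(t.lower() for t in tokens[start:end])
        groups.setdefault(key, []).append((start, end))
    return [group for group in groups.values() if len(group) > 1]

tokens = ["SOD1", "binds", "SOD1"]
spans = [(0, 1), (0, 2), (2, 3)]
overlaps = overlap_cliques(spans)
repeats = repeat_cliques(tokens, spans)
print(overlaps, repeats)
```

Each returned pair or group would become one clique whose potential ties the hidden labels of its members together.
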
Using Local Templates

Variable Nodes: labels of all candidate entities in the document

Potential Nodes: represent correlations between >1 entity attributes by linking to variable nodes
– Edges: created by matching clique templates against d.E

[Figure: "RMN by Local Templates" (e.label connected to feature nodes E.f1=vi, E.f2=vj, ..., E.fh=vk) next to "Factor Graph by Local Templates" (e.label connected to potentials φ.PREFIX=A0, φ.HEAD=enzyme, φ.POSLEFT=NOUN, φ.WORDLEFT=the)]

Using Global Templates

• Connect label nodes of two or more entities
• Acronym Factor Graph

[Figure: acronym factor graph for "The antioxidant superoxide dismutase-1 (SOD1)". Node labels: φAT (acronym potential), uOR, V (the entity), φOR, and the candidate entities "antioxidant superoxide dismutase-1", "superoxide dismutase-1", "dismutase-1"]

Inference & Learning in RMNs

• Inference
  – Labels are the only hidden features
  – Probability distribution over hidden entity labels is computed using a Gibbs distribution
  – Find the most probable assignment of values to labels using the max-product algorithm
• Learning
  – Structure is defined by the clique templates
  – Find clique potentials that maximize likelihood over training data using Voted Perceptron
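
To make the Gibbs distribution concrete, the sketch below builds P(y) = (1/Z) ∏ φ_c(y_c) over two binary entity labels and finds the most probable assignment by brute-force enumeration. Enumeration stands in for max-product here (it is exponential in the number of labels and only workable for toy cases), and all potential values are invented for illustration.

```python
from itertools import product
import math

def gibbs_distribution(n_labels, potentials):
    """P(y) = (1/Z) * product of clique potentials, over all binary labelings y.

    potentials: list of (label_indices, function) pairs, one per clique.
    """
    scores = {}
    for y in product([0, 1], repeat=n_labels):
        scores[y] = math.prod(phi(*(y[i] for i in idx)) for idx, phi in potentials)
    z = sum(scores.values())                      # partition function Z
    return {y: s / z for y, s in scores.items()}

# Two occurrences of the same string (toy numbers).
local_0 = ((0,), lambda l: 4.0 if l == 1 else 1.0)       # context strongly suggests a protein
local_1 = ((1,), lambda l: 1.0 if l == 1 else 1.5)       # context weakly suggests not
repeat = ((0, 1), lambda a, b: 3.0 if a == b else 1.0)   # repeated strings prefer equal labels

local_only = gibbs_distribution(2, [local_0, local_1])
collective = gibbs_distribution(2, [local_0, local_1, repeat])
map_local = max(local_only, key=local_only.get)
map_collective = max(collective, key=collective.get)
print(map_local, map_collective)
```

With local potentials alone the second occurrence is rejected; adding the repeat potential lets the confident first occurrence pull the second one to the same label, which is the collective effect the slides describe.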

Experiments

• Datasets
  – Yapex, Aimed: ~200 Medline abstracts, ~4000 protein references each
• Systems
  – LT-RMN: RMN with local + overlap templates
  – GLT-RMN: RMN with local + global templates
  – CRF: McCallum 2002
• Evaluation
  – Position-based precision, recall
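
A sketch of position-based scoring, assuming it means that an extraction counts as correct only when its document and exact token offsets match a gold annotation. The tuple layout `(doc_id, start, end)` and the function name are assumptions.

```python
def position_based_scores(predicted, gold):
    """Precision and recall over exact (doc_id, start, end) matches."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

gold = [(0, 2, 6), (0, 7, 8), (0, 10, 11)]
predicted = [(0, 2, 6), (0, 7, 8), (0, 3, 6), (1, 0, 1)]   # (0, 3, 6) has a wrong left boundary
precision, recall = position_based_scores(predicted, gold)
print(precision, recall)
```

Under this reading a name with one wrong boundary costs both precision and recall, which makes the metric stricter than counting distinct protein strings.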

Results

Yapex

Method    Precision  Recall  F-Measure
LT-RMN    70.79      53.81   61.14
GLT-RMN   69.71      65.76   67.68
CRF       72.45      58.64   64.81

Aimed

Method    Precision  Recall  F-Measure
LT-RMN    81.33      72.79   76.82
GLT-RMN   82.79      80.04   81.39
CRF       85.37      75.90   80.36

Intro to Semantic Parsing

• NL input → logical form
• Assignment of semantic roles
  “John gives the book to Mary”
  gives1(subj: John2, dobj: book3, iobj: Mary4)
• Ambiguities
  “Every man loves a woman”
  – Is there one woman loved by all or...
    ∃Y ∀X man(X) ^ woman(Y) -> loves(X,Y)
    ∀X ∃Y man(X) ^ woman(Y) -> loves(X,Y)
  – LF just says
    loves1(every m1:man(m1), a w1:woman(w1))
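
The two scope readings can be told apart by evaluating them in a small model. The sketch below uses the standard formalization of each reading (the existential restricted by woman, the universal restricted by man), which is an assumption about what the slide's formulas intend; the names and the toy world are hypothetical.

```python
def one_woman_for_all(men, women, loves):
    """Wide-scope existential: some woman y is loved by every man x."""
    return any(all((x, y) in loves for x in men) for y in women)

def some_woman_for_each(men, women, loves):
    """Wide-scope universal: every man x loves some (possibly different) woman y."""
    return all(any((x, y) in loves for y in women) for x in men)

men, women = {"al", "bo"}, {"cy", "di"}
loves = {("al", "cy"), ("bo", "di")}          # each man loves a different woman

reading_exists_forall = one_woman_for_all(men, women, loves)
reading_forall_exists = some_woman_for_each(men, women, loves)
print(reading_exists_forall, reading_forall_exists)
```

In this world the sentence is true under one reading and false under the other, while the underspecified LF on the slide commits to neither.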

Previous Work in Semantic Parsing

• Hand-built/Linguistic systems
  – Grammar development
  – NLPwin (MSR)
• Data-driven approaches
  – Tagging problem: for each NP, tag role
    • Gildea & Jurafsky (2002): NB classifier combination using lexical and syntactic features, previous labels
    • Jurafsky et al. (2004): SVMs
  – Sentence + LF → Parser
    • CHILL: ILP to learn a generalized Prolog parser

ILP in CHILL

• Parser induction = learn control rules of a shift-reduce parser
  – Shift symbols from the input string onto a stack
  – Reduce items on the stack by applying a matching grammar rule
• Can be encoded as a Prolog program:
  – parse(S,Parse) :- parse([],S,[Parse],[]).
• Start with 3 generic operators
  – INTRODUCE pushes a predicate onto the stack based on word input
    • Lexicon: 'capital' -> capital(_,_)
  – COREF_VARS unifies two variables under consideration
  – DROP_CONJ embeds one predicate as an argument of another
  – SHIFT pushes word input onto the stack

What is the capital of Texas?

Each step lists the action(s) taken and the resulting stack and input buffer.

(initial state)
  Stack:        [answer(_,_):[]]
  Input Buffer: [what,is,the,capital,of,texas,?]

SHIFT, SHIFT, SHIFT
  Stack:        [answer(_,_):[the,is,what]]
  Input Buffer: [capital,of,texas,?]

INTRODUCE
  Stack:        [capital(_,_):[], answer(_,_):[the,is,what]]
  Input Buffer: [capital,of,texas,?]

COREF_VARS
  Stack:        [capital(C,_):[], answer(C,_):[the,is,what]]
  Input Buffer: [capital,of,texas,?]

SHIFT, SHIFT
  Stack:        [capital(C,_):[of,capital], answer(C,_):[the,is,what]]
  Input Buffer: [texas,?]

INTRODUCE, COREF_VARS
  Stack:        [const(S,stateid(texas)):[], capital(C,S):[of,capital], answer(C,_):[the,is,what]]
  Input Buffer: [texas,?]

DROP_CONJ, SHIFT, SHIFT
  Stack:        [answer(C, (capital(C,S), const(S,stateid(texas)))):[?,texas,of,capital,the,is,what]]
  Input Buffer: []
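
The trace for "What is the capital of Texas?" can be replayed with a small simulation. This is a sketch under several assumptions, not CHILL itself: the term encoding (Python lists), the `Var` class, and the operator signatures are invented for illustration, the operator sequence is hard-coded rather than chosen by learned control rules, and DROP_CONJ is applied twice (once for capital, once for const) because that is what the final stack in the trace requires.

```python
class Var:
    """A logic variable; a bound variable forwards to the variable it was unified with."""
    def __init__(self):
        self.name, self.ref = None, None
    def deref(self):
        return self if self.ref is None else self.ref.deref()

def show(term):
    """Render a term; lists are [functor, arg, ...] and "," marks a conjunction."""
    if isinstance(term, Var):
        return term.deref().name or "_"
    if isinstance(term, list):
        args = ",".join(show(a) for a in term[1:])
        return "(" + args + ")" if term[0] == "," else term[0] + "(" + args + ")"
    return term

# Hypothetical lexicon mapping words to fresh predicate instances.
LEXICON = {"capital": lambda: ["capital", Var(), Var()],
           "texas": lambda: ["const", Var(), ["stateid", "texas"]]}

def shift(stack, buf):
    stack[0][1].insert(0, buf.pop(0))          # move next word onto the top item's word list

def introduce(stack, buf):
    stack.insert(0, [LEXICON[buf[0]](), []])   # push predicate for the next word (word stays)

def coref_vars(term_a, pos_a, term_b, pos_b, name):
    a, b = term_a[pos_a].deref(), term_b[pos_b].deref()
    if a is not b:
        a.ref = b                              # unify the two variables
    b.name = name                              # name is for display only

def drop_conj(stack, src, dst):
    host = stack[dst]
    term, words = stack.pop(src)
    last = host[0][-1]
    if isinstance(last, Var) and last.deref().name is None:
        host[0][-1] = term                     # fill the empty argument slot
    else:
        host[0][-1] = [",", last, term]        # conjoin with what is already there
    host[1][:0] = words                        # carry the embedded item's words along

buf = ["what", "is", "the", "capital", "of", "texas", "?"]
stack = [[["answer", Var(), Var()], []]]       # index 0 is the top of the stack
for _ in range(3):
    shift(stack, buf)
introduce(stack, buf)
coref_vars(stack[0][0], 1, stack[1][0], 1, "C")
shift(stack, buf); shift(stack, buf)
introduce(stack, buf)
coref_vars(stack[0][0], 1, stack[1][0], 2, "S")
drop_conj(stack, 1, 2)                         # capital into answer
drop_conj(stack, 0, 1)                         # const into answer
shift(stack, buf); shift(stack, buf)
final_query = show(stack[0][0])
print(final_query, stack[0][1], buf)
```

The point of the exercise is that the operators themselves are generic; what CHILL has to learn is when each one may fire.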

Learning Control Rules

• Operator Generation
  – Initial parser is too general and will produce spurious parses
  – Use training data to extend the program
• Example Analysis
  – Use the general parser, recording parse states
  – Positive Examples
    • Parse states to which the operator should be applied
    • Find the 1st correct parse of a training pair; the ops used to achieve subgoals become positive examples
  – Negative Examples for single-parse systems
    • S is a negative example for the current action if S is a positive example for a previous action

Induction in CHILL

• Control-Rule Induction
  – Cover all positive examples and no negative examples
  – Bottom-Up: compact the rule set by forming Least-General Generalizations of clause pairs
  – Top-Down: overly-general rules are specialized by the addition of literals
  – Invention of new predicates
      op([X,[Y,det:the]], [the|Z],A,B) :-
          animate(Y).
      animate(man). animate(boy). animate(girl). ...
• Fold constraints back into general parser

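The bottom-up step relies on least-general generalization (anti-unification): two terms are merged by keeping what they share and replacing each differing pair of subterms with a variable. The sketch below works on single terms rather than full clauses, and the term encoding (tuples with the functor first, variables written as strings `V0`, `V1`, ...) is an assumption for illustration.

```python
def lgg(s, t, table=None):
    """Least-general generalization of two terms.

    Terms are atoms (str) or tuples (functor, arg1, ...). Generated variables
    are the strings "V0", "V1", ...; this assumes no atom uses those names.
    """
    if table is None:
        table = {}
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(lgg(a, b, table) for a, b in zip(s[1:], t[1:]))
    # Differing subterms become a variable; the same pair always maps to the same variable.
    if (s, t) not in table:
        table[(s, t)] = "V%d" % len(table)
    return table[(s, t)]

state_a = ("op", ("np", "man", ("det", "the")), "shift")
state_b = ("op", ("np", "boy", ("det", "the")), "shift")
general = lgg(state_a, state_b)
print(general)
```

The generalized term accepts any noun in the variable position, which is over-general; an invented predicate such as animate(Y) on the slide is one way to restrict the variable back to the values actually seen.
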
ProbCHILL: Parsing via ILP Ensembles

• ILP + Statistical Parsing
  – ILP: not forced to make decisions based on predetermined features
  – SP: handle multiple sources of uncertainty
• Ensemble classifier
  – Combine outputs of > 1 classifiers
  – Bagging, boosting
• TABULATE: generate several ILP hypotheses
  – Use SP to estimate a probability for each potential operator
  – Find the most probable semantic parse (beam search)
    • P(parse state) = product of probs of operators to reach state
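
A sketch of the beam search over parse states, where a state's score is the product of the probabilities of the operators used to reach it (summed in log space). The state names, operator probabilities, and function signature are all invented for illustration; TABULATE and the probability estimation are not modeled, and the transition table is assumed to be acyclic so the loop terminates.

```python
import heapq
import math

def beam_search(start, successors, is_final, beam_width=2):
    """Return (probability, operators) of the best final state found, or None.

    successors(state) yields (operator, probability, next_state) triples.
    """
    beam = [(0.0, start, [])]                  # (log-probability, state, operators so far)
    finished = []
    while beam:
        candidates = []
        for logp, state, ops in beam:
            if is_final(state):
                finished.append((logp, ops))
            else:
                for op, p, nxt in successors(state):
                    candidates.append((logp + math.log(p), nxt, ops + [op]))
        # Keep only the beam_width most probable partial parses.
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    if not finished:
        return None
    logp, ops = max(finished, key=lambda f: f[0])
    return math.exp(logp), ops

# Toy transition table: the locally best first operator leads to the worse parse.
TABLE = {
    "s0": [("SHIFT", 0.6, "s1"), ("INTRODUCE", 0.4, "s2")],
    "s1": [("DROP_CONJ", 0.5, "parse_a")],
    "s2": [("COREF_VARS", 0.9, "parse_b")],
}
successors = lambda state: TABLE.get(state, [])
is_final = lambda state: state.startswith("parse")

wide = beam_search("s0", successors, is_final, beam_width=2)
greedy = beam_search("s0", successors, is_final, beam_width=1)
print(wide, greedy)
```

With a beam of one the search commits to the locally best operator and ends at the less probable parse; widening the beam recovers the better one.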

Experiments

• U.S. Geography DB with 800 Prolog facts
• 250 questions from 50 humans, annotated with Prolog queries

What is the capital of the state with the highest population?
answer(C, (capital(S,C), largest(P, (state(S), population(S,P)))))

• 10-fold CV, measuring
  Recall = (# correct queries produced) / (# test sentences)
  Precision = (# correct queries produced) / (# complete parses produced)

Results

System     Recall  Precision  F-measure
Geobase    56.00   96.40      70.85
CHILL      68.50   97.65      80.52
ProbCHILL  80.40   88.16      84.10
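
The F-measure column is consistent with the harmonic mean of precision and recall, F = 2PR / (P + R). The short check below recomputes it from the recall and precision values in the table; the assumption is only that this balanced form of the F-measure is the one reported.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (recall, precision) as listed in the results table.
RESULTS = {"Geobase": (56.00, 96.40), "CHILL": (68.50, 97.65), "ProbCHILL": (80.40, 88.16)}
f_scores = {system: f_measure(p, r) for system, (r, p) in RESULTS.items()}
for system, f in f_scores.items():
    print(system, round(f, 2))
```

ProbCHILL gives up some precision relative to CHILL but gains enough recall to come out ahead on the combined measure.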


