Markov Logic in Natural
Language Processing
Hoifung Poon
Dept. of Computer Science & Eng.
University of Washington
Overview
Motivation
Foundational areas
Markov logic
NLP applications
Basics
Supervised learning
Unsupervised learning
2
Languages Are Structural
governments
lm$pxtm
(according to their families)
3
Languages Are Structural
Morphology: govern-ment-s; l-m$px-t-m (according to their families)
Syntax: S → NP VP, VP → V NP ("IL-4 induces CD11B")
Semantics: "Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 ......"
involvement: Theme = up-regulation, Cause = activation
up-regulation: Theme = IL-10, Cause = gp41, Site = human monocyte
activation: Theme = p70(S6)-kinase
Discourse: "George Walker Bush was the 43rd President of the United States. …… Bush was the eldest son of President G. H. W. Bush and Barbara Bush. ……. In November 1977, he met Laura Welch at a barbecue."
4
Languages Are Structural
Objects are not just feature vectors
They have parts and subparts
Which have relations with each other
They can be trees, graphs, etc.
Objects are seldom i.i.d.
(independent and identically distributed)
They exhibit local and global dependencies
They form class hierarchies (with multiple inheritance)
Objects’ properties depend on those of related objects
Deeply interwoven with knowledge
6
First-Order Logic
Main theoretical foundation of computer science
General language for describing
complex structures and knowledge
Trees, graphs, dependencies, hierarchies, etc.
easily expressed
Inference algorithms (satisfiability testing,
theorem proving, etc.)
7
Languages Are Statistical
Syntactic ambiguity: "I saw the man with the telescope" (competing bracketings: NP vs. ADVP attachment)
Paraphrase:
Microsoft buys Powerset
Microsoft acquires Powerset
Powerset is acquired by Microsoft Corporation
The Redmond software giant buys Powerset
Microsoft’s purchase of Powerset, …
……
Entity type:
Here in London, Frances Deek is a retired teacher …
In the Israeli town …, Karen London says …
Now London says …
London: PERSON or LOCATION?
Coreference: G. W. Bush …… Laura Bush …… Mrs. Bush …… Which one?
8
Languages Are Statistical
Languages are ambiguous
Our information is always incomplete
We need to model correlations
Our predictions are uncertain
Statistics provides the tools to handle this
9
Probabilistic Graphical Models
Mixture models
Hidden Markov models
Bayesian networks
Markov random fields
Maximum entropy models
Conditional random fields
Etc.
10
The Problem
Logic is deterministic, requires manual coding
Statistical models assume i.i.d. data,
objects = feature vectors
Historically, statistical and logical NLP
have been pursued separately
We need to unify the two!
Burgeoning field in machine learning:
Statistical relational learning
11
Costs and Benefits of
Statistical Relational Learning
Benefits
Better predictive accuracy
Better understanding of domains
Enable learning with less or no labeled data
Costs
Learning is much harder
Inference becomes a crucial issue
Greater complexity for user
12
Progress to Date
Probabilistic logic [Nilsson, 1986]
Statistics and beliefs [Halpern, 1990]
Knowledge-based model construction
[Wellman et al., 1992]
Stochastic logic programs [Muggleton, 1996]
Probabilistic relational models [Friedman et al., 1999]
Relational Markov networks [Taskar et al., 2002]
Etc.
This talk: Markov logic [Domingos & Lowd, 2009]
13
Markov Logic:
A Unifying Framework
Probabilistic graphical models and
first-order logic are special cases
Unified inference and learning algorithms
Easy-to-use software: Alchemy
Broad applicability
Goal of this tutorial:
Quickly learn how to use Markov logic and Alchemy
for a broad spectrum of NLP applications
14
Overview
Motivation
Foundational areas
Probabilistic inference
Statistical learning
Logical inference
Inductive logic programming
Markov logic
NLP applications
Basics
Supervised learning
Unsupervised learning
15
Markov Networks
Undirected graphical models
Nodes: Smoking, Cancer, Asthma, Cough
Potential functions defined over cliques
P(x) = \frac{1}{Z} \prod_c \phi_c(x_c)
Z = \sum_x \prod_c \phi_c(x_c)
Smoking   Cancer   Ф(S,C)
False     False    4.5
False     True     4.5
True      False    2.7
True      True     4.5
16
Markov Networks
Undirected graphical models
Nodes: Smoking, Cancer, Asthma, Cough
Log-linear model:
P(x) = \frac{1}{Z} \exp\left( \sum_i w_i f_i(x) \right)
w_i: weight of feature i;  f_i: feature i
f_1(Smoking, Cancer) = 1 if ¬Smoking ∨ Cancer, 0 otherwise
w_1 = 1.5
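A minimal sketch of the log-linear model above, assuming a network reduced to the two variables Smoking and Cancer with the single feature f_1 and weight 1.5 (Asthma and Cough are left out for brevity; the function names are illustrative only). It enumerates the four worlds and normalizes by Z.

```python
import itertools
import math

W1 = 1.5  # weight w_1 from the slide


def f1(smoking: bool, cancer: bool) -> int:
    """Feature from the slide: 1 if (not Smoking) or Cancer, else 0."""
    return int((not smoking) or cancer)


def joint_distribution(w: float = W1) -> dict:
    """Enumerate all four worlds and normalize exp(w * f1) by Z."""
    worlds = list(itertools.product([False, True], repeat=2))
    scores = {world: math.exp(w * f1(*world)) for world in worlds}
    z = sum(scores.values())  # partition function
    return {world: score / z for world, score in scores.items()}


dist = joint_distribution()
```

The only world that violates the feature, (Smoking=True, Cancer=False), ends up less probable than the other three by a factor of exp(1.5).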
17
Markov Nets vs. Bayes Nets
Property          Markov Nets         Bayes Nets
Form              Prod. potentials    Prod. potentials
Potentials        Arbitrary           Cond. probabilities
Cycles            Allowed             Forbidden
Partition func.   Z = ?               Z = 1
Indep. check      Graph separation    D-separation
Indep. props.     Some                Some
Inference         MCMC, BP, etc.      Convert to Markov
18
Inference in Markov Networks
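Goal: compute marginals & conditionals of
P(X) = \frac{1}{Z} \exp\left( \sum_i w_i f_i(X) \right), \quad Z = \sum_X \exp\left( \sum_i w_i f_i(X) \right)
Exact inference is #P-complete
Conditioning on Markov blanket is easy:
P(x | MB(x)) = \frac{\exp\left( \sum_i w_i f_i(x) \right)}{\exp\left( \sum_i w_i f_i(x=0) \right) + \exp\left( \sum_i w_i f_i(x=1) \right)}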
Gibbs sampling exploits this
19
MCMC: Gibbs Sampling
state ← random truth assignment
for i ← 1 to num-samples do
    for each variable x
        sample x according to P(x|neighbors(x))
        state ← state with new value of x
P(F) ← fraction of states in which F is true
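The pseudocode above can be sketched in Python as follows. This is an illustrative version under stated assumptions: binary variables, a log-linear model given as a list of feature functions with weights, and the hypothetical helper name `gibbs_marginals`. For brevity it rescores the whole state instead of only the features in the Markov blanket, which gives the same conditional.

```python
import math
import random


def gibbs_marginals(features, weights, num_vars, num_samples=5000, seed=0):
    """Estimate P(x_j = 1) for each binary variable by Gibbs sampling.

    features: list of functions mapping a state (list of 0/1) to a number.
    weights:  list of floats, one per feature.
    """
    rng = random.Random(seed)
    state = [rng.randint(0, 1) for _ in range(num_vars)]
    counts = [0] * num_vars

    def score(s):
        return sum(w * f(s) for w, f in zip(weights, features))

    for _ in range(num_samples):
        for j in range(num_vars):
            # Conditional of x_j given all other variables (same as given
            # its Markov blanket): compare the scores with x_j = 1 and x_j = 0.
            state[j] = 1
            s1 = score(state)
            state[j] = 0
            s0 = score(state)
            p1 = 1.0 / (1.0 + math.exp(s0 - s1))
            state[j] = 1 if rng.random() < p1 else 0
        for j in range(num_vars):
            counts[j] += state[j]
    return [c / num_samples for c in counts]


# Variables: index 0 = Smoking, index 1 = Cancer; one feature "not Smoking or Cancer".
features = [lambda s: int((not s[0]) or s[1])]
marginals = gibbs_marginals(features, [1.5], num_vars=2)
```

For this two-variable model the exact marginals can be computed by enumeration (about 0.38 for Smoking and 0.62 for Cancer), which gives a quick sanity check on the sampler.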
20
Other Inference Methods
Belief propagation (sum-product)
Mean field / Variational approximations
21
MAP/MPE Inference
Goal: Find most likely state of world given
evidence
\arg\max_y P(y | x)    (y: query, x: evidence)
22
MAP Inference Algorithms
Iterated conditional modes
Simulated annealing
Graph cuts
Belief propagation (max-product)
LP relaxation
23
Overview
Motivation
Foundational areas
Probabilistic inference
Statistical learning
Logical inference
Inductive logic programming
Markov logic
NLP applications
Basics
Supervised learning
Unsupervised learning
24
Generative Weight Learning
Maximize likelihood
Use gradient ascent or L-BFGS
No local maxima
\frac{\partial}{\partial w_i} \log P_w(x) = n_i(x) - E_w[n_i(x)]
n_i(x): no. of times feature i is true in data
E_w[n_i(x)]: expected no. of times feature i is true according to model
Requires inference at each step (slow!)
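A small sketch of this gradient in action, assuming a model tiny enough that the expected counts can be computed by exact enumeration (the "inference at each step"). The names `expected_counts` and `fit_weights` are illustrative, and the gradient is averaged over the data rather than summed, which only rescales the step size.

```python
import itertools
import math


def expected_counts(features, weights, num_vars):
    """E_w[f_i(x)] by enumerating every world (feasible only for tiny models)."""
    worlds = list(itertools.product([0, 1], repeat=num_vars))
    scores = [
        math.exp(sum(w * f(x) for w, f in zip(weights, features))) for x in worlds
    ]
    z = sum(scores)
    return [sum(s * f(x) for s, x in zip(scores, worlds)) / z for f in features]


def fit_weights(features, data, num_vars, lr=0.5, steps=2000):
    """Gradient ascent on the average log-likelihood."""
    weights = [0.0] * len(features)
    empirical = [sum(f(x) for x in data) / len(data) for f in features]
    for _ in range(steps):
        model = expected_counts(features, weights, num_vars)
        # Gradient component i = empirical count - expected count.
        weights = [w + lr * (e - m) for w, e, m in zip(weights, empirical, model)]
    return weights


# One binary variable that is true in 3 of 4 observations.
learned = fit_weights([lambda x: x[0]], [(1,), (1,), (1,), (0,)], num_vars=1)
```

In this toy case the optimum is the log odds of the variable, log(0.75 / 0.25) = log 3.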
25
Pseudo-Likelihood
PL(x) \equiv \prod_i P(x_i | neighbors(x_i))
Likelihood of each variable given its
neighbors in the data
Does not require inference at each step
Widely used in vision, spatial statistics, etc.
But PL parameters may not work well for
long inference chains
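As a rough illustration of why no inference is needed, the log pseudo-likelihood of one observed state can be computed with two score evaluations per variable. This sketch assumes binary variables and a log-linear model given as feature functions and weights; `log_pseudo_likelihood` is a hypothetical helper name. Conditioning on all other variables is used here, which is equivalent to conditioning on the neighbors.

```python
import math


def log_pseudo_likelihood(features, weights, x):
    """Sum over variables of log P(x_j | all other variables fixed as in x)."""

    def score(s):
        return sum(w * f(s) for w, f in zip(weights, features))

    total = 0.0
    s_obs = score(x)
    for j in range(len(x)):
        flipped = list(x)
        flipped[j] = 1 - x[j]
        s_flip = score(flipped)
        # log P(x_j | rest) = s_obs - log(exp(s_obs) + exp(s_flip))
        total += -math.log1p(math.exp(s_flip - s_obs))
    return total


features = [lambda s: int((not s[0]) or s[1])]
lpl_uniform = log_pseudo_likelihood(features, [0.0], [1, 0])
lpl_weighted = log_pseudo_likelihood(features, [1.5], [1, 1])
```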
26
Discriminative Weight Learning
Maximize conditional likelihood of query (y)
given evidence (x)
\frac{\partial}{\partial w_i} \log P_w(y | x) = n_i(x, y) - E_w[n_i(x, y)]
n_i(x, y): no. of true groundings of clause i in data
E_w[n_i(x, y)]: expected no. of true groundings according to model
Approximate expected counts by counts in
MAP state of y given x
27
Voted Perceptron
Originally proposed for training HMMs
discriminatively
Assumes network is linear chain
Can be generalized to arbitrary networks
wi ← 0
for t ← 1 to T do
    yMAP ← Viterbi(x)
    wi ← wi + η [counti(yData) – counti(yMAP)]
return Σt wi,t / T    (average of the weights over the T iterations)
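A minimal sketch of the update loop, with brute-force MAP search standing in for Viterbi so that it works for arbitrary (tiny) networks. The names `map_state` and `voted_perceptron`, the feature signature `f(x, y)`, and the toy data are assumptions made for illustration.

```python
import itertools


def map_state(features, weights, x, num_query):
    """Brute-force MAP: the query assignment y with the highest score."""
    return max(
        itertools.product([0, 1], repeat=num_query),
        key=lambda y: sum(w * f(x, y) for w, f in zip(weights, features)),
    )


def voted_perceptron(features, data, num_query, eta=1.0, iterations=20):
    """Return the weights averaged over all iterations."""
    weights = [0.0] * len(features)
    totals = [0.0] * len(features)
    for _ in range(iterations):
        for x, y_true in data:
            y_map = map_state(features, weights, x, num_query)
            # Move each weight toward the counts in the data and away from
            # the counts in the current MAP state.
            weights = [
                w + eta * (f(x, y_true) - f(x, y_map))
                for w, f in zip(weights, features)
            ]
        totals = [t + w for t, w in zip(totals, weights)]
    return [t / iterations for t in totals]


# Toy task: the single query bit should copy the single evidence bit.
copy_feature = [lambda x, y: int(x[0] == y[0])]
toy_data = [((1,), (1,)), ((0,), (0,))]
avg_weights = voted_perceptron(copy_feature, toy_data, num_query=1)
```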
28
Overview
Motivation
Foundational areas
Probabilistic inference
Statistical learning
Logical inference
Inductive logic programming
Markov logic
NLP applications
Basics
Supervised learning
Unsupervised learning
29
First-Order Logic
Constants, variables, functions, predicates
E.g.: Anna, x, MotherOf(x), Friends(x, y)
Literal: Predicate or its negation
Clause: Disjunction of literals
Grounding: Replace all variables by constants
E.g.: Friends (Anna, Bob)
World (model, interpretation):
Assignment of truth values to all ground
predicates
30
Inference in First-Order Logic
Traditionally done by theorem proving
(e.g.: Prolog)
Propositionalization followed by model
checking turns out to be faster (often by a lot)
Propositionalization:
Create all ground atoms and clauses
Model checking: Satisfiability testing
Two main approaches:
Backtracking (e.g.: DPLL)
Stochastic local search (e.g.: WalkSAT)
31
Satisfiability
Input: Set of clauses
(Convert KB to conjunctive normal form (CNF))
Output: Truth assignment that satisfies all clauses,
or failure
The paradigmatic NP-complete problem
Solution: Search
Key point:
Most SAT problems are actually easy
Hard region: Narrow range of
#Clauses / #Variables
32
Stochastic Local Search
Uses complete assignments instead of partial
Start with random state
Flip variables in unsatisfied clauses
Hill-climbing: Minimize # unsatisfied clauses
Avoid local minima: Random flips
Multiple restarts
33
The WalkSAT Algorithm
for i ← 1 to max-tries do
    solution = random truth assignment
    for j ← 1 to max-flips do
        if all clauses satisfied then
            return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip variable in c that maximizes # satisfied clauses
return failure
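The pseudocode above can be sketched in Python like this. The clause encoding is an assumption borrowed from the DIMACS convention (a positive integer k means variable k is true, a negative one means it is false), and the default parameter values are arbitrary.

```python
import random


def walksat(clauses, num_vars, p=0.5, max_tries=10, max_flips=1000, seed=0):
    """Return a satisfying assignment {var: bool}, or None on failure."""
    rng = random.Random(seed)

    def satisfied(clause, assign):
        return any(assign[abs(lit)] == (lit > 0) for lit in clause)

    def num_satisfied(assign):
        return sum(satisfied(c, assign) for c in clauses)

    for _ in range(max_tries):
        assign = {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}
        for _ in range(max_flips):
            unsat = [c for c in clauses if not satisfied(c, assign)]
            if not unsat:
                return assign
            clause = rng.choice(unsat)
            if rng.random() < p:
                # Random-walk step: flip any variable of the chosen clause.
                var = abs(rng.choice(clause))
            else:
                # Greedy step: flip the variable that satisfies the most clauses.
                def gain(v):
                    assign[v] = not assign[v]
                    n = num_satisfied(assign)
                    assign[v] = not assign[v]
                    return n

                var = max((abs(lit) for lit in clause), key=gain)
            assign[var] = not assign[var]
    return None


cnf = [[1, 2], [-1, 3], [-2, -3], [2, 3]]
model = walksat(cnf, num_vars=3)
```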
34
Overview
Motivation
Foundational areas
Probabilistic inference
Statistical learning
Logical inference
Inductive logic programming
Markov logic
NLP applications
Basics
Supervised learning
Unsupervised learning
35
Rule Induction
Given: Set of positive and negative examples of
some concept
Example: (x1, x2, … , xn, y)
y: concept (Boolean)
x1, x2, … , xn: attributes (assume Boolean)
Goal: Induce a set of rules that cover all positive
examples and no negative ones
Rule: xa ^ xb ^ … ⇒ y (xa: Literal, i.e., xi or its negation)
Same as Horn clause: Body ⇒ Head
Rule r covers example x iff x satisfies body of r
Eval(r): Accuracy, info gain, coverage, support, etc.
36
Learning a Single Rule
head ← y
body ← Ø
repeat
    for each literal x
        rx ← r with x added to body
        Eval(rx)
    body ← body ^ best x
until no x improves Eval(r)
return r
37
Learning a Set of Rules
R ← Ø
S ← examples
repeat
    learn a single rule r
    R ← R ∪ { r }
    S ← S − positive examples covered by r
until S contains no positive examples
return R
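The two procedures (single-rule learning and sequential covering) can be sketched together as below. This is an illustrative version that assumes Boolean attributes, uses accuracy as Eval(r), and represents a rule body as a list of (attribute index, required value) literals; all function names are hypothetical.

```python
def covers(body, example):
    """body: list of (attribute_index, required_value) literals."""
    return all(example[i] == v for i, v in body)


def accuracy(body, examples):
    """Fraction of covered examples that are positive (0.0 if none covered)."""
    covered = [y for x, y in examples if covers(body, x)]
    return sum(covered) / len(covered) if covered else 0.0


def learn_rule(examples, num_attrs):
    """Greedily add the literal that most improves accuracy."""
    body = []
    while True:
        used = {i for i, _ in body}
        candidates = [
            (i, v) for i in range(num_attrs) if i not in used for v in (False, True)
        ]
        best = max(
            candidates, key=lambda lit: accuracy(body + [lit], examples), default=None
        )
        if best is None or accuracy(body + [best], examples) <= accuracy(body, examples):
            return body
        body.append(best)


def learn_rule_set(examples, num_attrs):
    """Sequential covering: learn rules until no positive example is left."""
    rules, remaining = [], list(examples)
    while any(y for _, y in remaining):
        rule = learn_rule(remaining, num_attrs)
        if not any(y and covers(rule, x) for x, y in remaining):
            break  # no progress; avoid looping forever on noisy data
        rules.append(rule)
        remaining = [(x, y) for x, y in remaining if not (y and covers(rule, x))]
    return rules


# Target concept: y = x0 and x1.
examples = [
    ((False, False), False),
    ((False, True), False),
    ((True, False), False),
    ((True, True), True),
]
rules = learn_rule_set(examples, num_attrs=2)
```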
38
First-Order Rule Induction
y and xi are now predicates with arguments
E.g.: y is Ancestor(x,y), xi is Parent(x,y)
Literals to add are predicates or their negations
Literal to add must include at least one variable
already appearing in rule
Adding a literal changes # groundings of rule
E.g.: Ancestor(x,z) ^ Parent(z,y) ⇒ Ancestor(x,y)
Eval(r) must take this into account
E.g.: Multiply by # positive groundings of rule
still covered after adding literal
39
Overview
Motivation
Foundational areas
Markov logic
NLP applications
Basics
Supervised learning
Unsupervised learning
40
Markov Logic
Syntax: Weighted first-order formulas
Semantics: Feature templates for Markov
networks
Intuition: Soften logical constraints
Give each formula a weight
(Higher weight ⇒ Stronger constraint)
P(world) ∝ \exp\left( \sum \text{weights of formulas it satisfies} \right)
41
Example: Coreference Resolution
Mentions of Obama are often headed by "Obama"
Mentions of Obama are often headed by "President"
Appositions usually refer to the same entity
Barack Obama, the 44th President
of the United States, is the first
African American to hold the office.
……
42
Example: Coreference Resolution
∀x MentionOf(x, Obama) ⇒ Head(x, "Obama")
∀x MentionOf(x, Obama) ⇒ Head(x, "President")
∀x,y,c Apposition(x, y) ⇒ (MentionOf(x, c) ⇔ MentionOf(y, c))
43
Example: Coreference Resolution
1.5   ∀x MentionOf(x, Obama) ⇒ Head(x, "Obama")
0.8   ∀x MentionOf(x, Obama) ⇒ Head(x, "President")
100   ∀x,y,c Apposition(x, y) ⇒ (MentionOf(x, c) ⇔ MentionOf(y, c))
Two mention constants: A and B
Ground atoms (nodes of the ground Markov network):
Apposition(A,B), Apposition(B,A)
Head(A,“President”), Head(B,“President”)
Head(A,“Obama”), Head(B,“Obama”)
MentionOf(A,Obama), MentionOf(B,Obama)
45
Markov Logic Networks
MLN is template for ground Markov nets
Probability of a world x:
P(x) = \frac{1}{Z} \exp\left( \sum_i w_i n_i(x) \right)
w_i: weight of formula i;  n_i(x): no. of true groundings of formula i in x
Typed variables and constants greatly reduce size of
ground Markov net
Functions, existential quantifiers, etc.
Can handle infinite domains [Singla & Domingos, 2007]
and continuous domains [Wang & Domingos, 2008]
46
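To make the formula concrete, here is a small sketch that grounds the three coreference formulas over the mention constants A and B, counts true groundings n_i(x), and normalizes by enumeration. The evidence (which heads and appositions hold) is hypothetical, only the entity Obama is considered, and the helper names are illustrative.

```python
import itertools
import math

MENTIONS = ["A", "B"]
# Hypothetical evidence: A is headed by "Obama", B by "President",
# and A stands in apposition to B.
HEAD = {("A", "Obama"), ("B", "President")}
APPOSITION = {("A", "B")}
WEIGHTS = [1.5, 0.8, 100.0]  # weights of the three formulas on the slide


def implies(a, b):
    return (not a) or b


def true_groundings(mention_of):
    """Return [n_1, n_2, n_3]; mention_of maps mention -> MentionOf(m, Obama)."""
    n1 = sum(implies(mention_of[m], (m, "Obama") in HEAD) for m in MENTIONS)
    n2 = sum(implies(mention_of[m], (m, "President") in HEAD) for m in MENTIONS)
    n3 = sum(
        implies((x, y) in APPOSITION, mention_of[x] == mention_of[y])
        for x in MENTIONS
        for y in MENTIONS
    )
    return [n1, n2, n3]


def world_distribution():
    """P(x) = exp(sum_i w_i n_i(x)) / Z over the four query worlds."""
    worlds = [
        dict(zip(MENTIONS, vals))
        for vals in itertools.product([False, True], repeat=len(MENTIONS))
    ]
    log_scores = [
        sum(w * n for w, n in zip(WEIGHTS, true_groundings(wd))) for wd in worlds
    ]
    top = max(log_scores)  # subtract the maximum for numerical stability
    scores = [math.exp(s - top) for s in log_scores]
    z = sum(scores)
    return [(wd, s / z) for wd, s in zip(worlds, scores)]


dist = world_distribution()
```

With this evidence the two worlds in which A and B agree carry essentially all of the probability mass, which is the effect of the weight-100 apposition formula.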
Relation to Statistical Models
Special cases:
Markov networks
Markov random fields
Bayesian networks
Log-linear models
Exponential models
Max. entropy models
Gibbs distributions
Boltzmann machines
Logistic regression
Hidden Markov models
Conditional random fields
Obtained by making all predicates zero-arity
Markov logic allows objects to be interdependent (non-i.i.d.)
47
Relation to First-Order Logic
Infinite weights ⇒ First-order logic
Satisfiable KB, positive weights ⇒
Satisfying assignments = Modes of distribution
Markov logic allows contradictions between
formulas
48
MLN Algorithms:
The First Three Generations
Problem              First generation          Second generation   Third generation
MAP inference        Weighted satisfiability   Lazy inference      Cutting planes
Marginal inference   Gibbs sampling            MC-SAT              Lifted inference
Weight learning      Pseudo-likelihood         Voted perceptron    Scaled conj. gradient
Structure learning   Inductive logic progr.    ILP + PL (etc.)     Clustering + pathfinding
49
MAP/MPE Inference
Problem: Find most likely state of world
given evidence
\arg\max_y P(y | x)    (y: query, x: evidence)
50
MAP/MPE Inference
Problem: Find most likely state of world
given evidence
\arg\max_y \frac{1}{Z_x} \exp\left( \sum_i w_i n_i(x, y) \right)
51
MAP/MPE Inference
Problem: Find most likely state of world
given evidence
\arg\max_y \sum_i w_i n_i(x, y)
This is just the weighted MaxSAT problem
Use weighted SAT solver
(e.g., MaxWalkSAT [Kautz et al., 1997])
53
The MaxWalkSAT Algorithm
for i ← 1 to max-tries do
    solution = random truth assignment
    for j ← 1 to max-flips do
        if weights(sat. clauses) > threshold then
            return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip variable in c that maximizes
                weights(sat. clauses)
return failure, best solution found
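A Python sketch of the same loop follows. It assumes positive clause weights, DIMACS-style integer literals, and a threshold equal to the total weight (so an early return only happens when every clause is satisfied); otherwise the best assignment seen is returned. These choices and the name `max_walksat` are illustrative, not Alchemy's implementation.

```python
import random


def max_walksat(clauses, num_vars, p=0.5, max_tries=5, max_flips=500, seed=0):
    """clauses: list of (weight, [literals]). Returns (best_assignment, best_weight)."""
    rng = random.Random(seed)
    total = sum(w for w, _ in clauses)

    def satisfied(lits, assign):
        return any(assign[abs(lit)] == (lit > 0) for lit in lits)

    def sat_weight(assign):
        return sum(w for w, lits in clauses if satisfied(lits, assign))

    best, best_weight = None, float("-inf")
    for _ in range(max_tries):
        assign = {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}
        for _ in range(max_flips):
            weight = sat_weight(assign)
            if weight > best_weight:
                best, best_weight = dict(assign), weight
            if weight == total:  # every clause satisfied
                return best, best_weight
            unsat = [lits for _, lits in clauses if not satisfied(lits, assign)]
            lits = rng.choice(unsat)
            if rng.random() < p:
                var = abs(rng.choice(lits))
            else:
                # Greedy step: flip the variable giving the largest satisfied weight.
                def gain(v):
                    assign[v] = not assign[v]
                    w = sat_weight(assign)
                    assign[v] = not assign[v]
                    return w

                var = max((abs(lit) for lit in lits), key=gain)
            assign[var] = not assign[var]
    return best, best_weight


weighted_cnf = [(2.0, [1]), (1.0, [-1]), (3.0, [1, 2]), (1.5, [-2])]
best_assign, best_w = max_walksat(weighted_cnf, num_vars=2)
```

The first two clauses contradict each other, so no assignment satisfies everything; the search settles on the assignment that sacrifices the lighter clause.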
54
Computing Probabilities
P(Formula|MLN,C) = ?
MCMC: Sample worlds, check formula holds
P(Formula1|Formula2,MLN,C) = ?
If Formula2 = Conjunction of ground atoms
First construct min subset of network necessary to
answer query (generalization of KBMC)
Then apply MCMC
55
But … Insufficient for Logic
Problem:
Deterministic dependencies break MCMC
Near-deterministic ones make it very slow
Solution:
Combine MCMC and WalkSAT
→ MC-SAT algorithm [Poon & Domingos, 2006]
56
Auxiliary-Variable Methods
Main ideas:
Use auxiliary variables to capture dependencies
Turn difficult sampling into uniform sampling
Given distribution P(x)
f(x, u) = 1 if 0 \le u \le P(x), 0 otherwise
\int f(x, u) \, du = P(x)
Sample from f (x, u), then discard u
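A minimal sketch of the auxiliary-variable idea on a discrete state space, assuming the target is given as a dictionary of unnormalized probabilities (the function name and the example distribution are illustrative). Each step samples u uniformly below P(x), then samples x uniformly from the states whose probability is at least u, and u is discarded.

```python
import random
from collections import Counter


def slice_sample(p, num_samples=20000, seed=0):
    """p: dict mapping state -> unnormalized probability (all > 0).

    Returns the empirical frequency of each state.
    """
    rng = random.Random(seed)
    states = list(p)
    x = states[0]
    counts = Counter()
    for _ in range(num_samples):
        u = rng.uniform(0.0, p[x])  # u | x: uniform on [0, P(x)]
        in_slice = [s for s in states if p[s] >= u]  # states with P(s) >= u
        x = rng.choice(in_slice)  # x | u: uniform over the slice
        counts[x] += 1
    return {s: counts[s] / num_samples for s in states}


target = {"a": 1.0, "b": 2.0, "c": 5.0}
freqs = slice_sample(target)
```

The empirical frequencies should approach 1/8, 2/8 and 5/8 even though only uniform draws are used.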
57
Slice Sampling [Damien et al. 1999]
[Figure: slice sampling. Axes: X (horizontal), U (vertical); curve P(x); slice at height u(k); successive samples x(k), x(k+1)]
58
Slice Sampling
Identifying the slice may be difficult
P(x) = \frac{1}{Z} \prod_i \phi_i(x)
Introduce an auxiliary variable u_i for each \phi_i
f(x, u_1, \ldots, u_n) = 1 if 0 \le u_i \le \phi_i(x) for all i, 0 otherwise
59
The MC-SAT Algorithm
Select random subset M of satisfied clauses
With probability 1 – exp(–wi)
Larger wi ⇒ Ci more likely to be selected
Hard clause (wi → ∞): Always selected
Slice ≡ States that satisfy clauses in M
Uses SAT solver to sample x | u
Orders of magnitude faster than Gibbs sampling,
etc.
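The steps above can be sketched as follows. This is a toy version: brute-force enumeration of all worlds stands in for the SAT-based sampler, so it only illustrates the clause-selection logic on very small problems. Non-negative weights, `math.inf` for hard clauses, DIMACS-style literals, and the name `mc_sat` are all assumptions.

```python
import itertools
import math
import random


def mc_sat(clauses, num_vars, num_samples=5000, seed=0):
    """clauses: list of (weight, [literals]), weight >= 0 or math.inf (hard).

    Returns the estimated marginal P(x_j = True) for each variable.
    """
    rng = random.Random(seed)
    worlds = list(itertools.product([False, True], repeat=num_vars))

    def satisfied(lits, world):
        return any(world[abs(lit) - 1] == (lit > 0) for lit in lits)

    hard = [lits for w, lits in clauses if math.isinf(w)]
    # Start from any world that satisfies the hard clauses.
    state = next(w for w in worlds if all(satisfied(l, w) for l in hard))
    counts = [0] * num_vars
    for _ in range(num_samples):
        # Select each currently satisfied clause with probability 1 - exp(-w);
        # hard clauses (w = inf) are always selected.
        m = [
            lits
            for w, lits in clauses
            if satisfied(lits, state) and rng.random() < 1.0 - math.exp(-w)
        ]
        # Sample uniformly among the worlds that satisfy every clause in M.
        candidates = [w for w in worlds if all(satisfied(l, w) for l in m)]
        state = rng.choice(candidates)
        for j, value in enumerate(state):
            counts[j] += value
    return [c / num_samples for c in counts]


# One variable with a unit clause of weight log 3, so P(x = True) should be 3/4.
estimate = mc_sat([(math.log(3), [1])], num_vars=1)
```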
60
But … It Is Not Scalable
1000 researchers
Coauthor(x,y): 1 million ground atoms
Coauthor(x,y) ^ Coauthor(y,z) ⇒ Coauthor(x,z):
1 billion ground clauses
Exponential in arity
61
Sparsity to the Rescue
1000 researchers
Coauthor(x,y): 1 million ground atoms
But … most atoms are false
Coauthor(x,y) ^ Coauthor(y,z) ⇒ Coauthor(x,z):
1 billion ground clauses
Most trivially satisfied if most atoms are false
No need to explicitly compute most of them
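A back-of-the-envelope sketch of the sparsity argument, assuming Coauthor atoms default to false and only the true pairs are stored. A grounding of the transitivity clause needs attention only when both antecedents are true, so it suffices to join the true pairs instead of enumerating all 1000^3 groundings. The function name and the tiny example are illustrative.

```python
from collections import defaultdict

NUM_RESEARCHERS = 1000
ground_atoms = NUM_RESEARCHERS ** 2    # Coauthor(x,y): 1 million
ground_clauses = NUM_RESEARCHERS ** 3  # transitivity: 1 billion


def active_transitivity_groundings(true_pairs):
    """Yield (x, y, z) where Coauthor(x,y) and Coauthor(y,z) are both true.

    All other groundings have a false antecedent and are trivially satisfied.
    """
    by_first = defaultdict(set)
    for x, y in true_pairs:
        by_first[x].add(y)
    for x, y in true_pairs:
        for z in by_first.get(y, ()):
            yield (x, y, z)


true_pairs = {("a", "b"), ("b", "c"), ("b", "a")}
active = set(active_transitivity_groundings(true_pairs))
```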
62
Lazy Inference
LazySAT [Singla & Domingos, 2006a]
Lazy version of WalkSAT [Selman et al., 1996]
Grounds atoms/clauses as needed
Greatly reduces memory usage
The idea is much more general
[Poon & Domingos, 2008a]
63
General Method for Lazy Inference
If most variables assume the default value,
wasteful to instantiate all variables / functions
Main idea:
Allocate memory for a small subset of
“active” variables / functions
Activate more if necessary as inference proceeds
Applicable to a diverse set of algorithms:
Satisfiability solvers (systematic, local-search), Markov chain Monte
Carlo, MPE / MAP algorithms, Maximum expected utility algorithms,
Belief propagation, MC-SAT, Etc.
Reduce memory and time by orders of magnitude
64
Lifted Inference
Consider belief propagation (BP)
Often in large problems, many nodes are
interchangeable:
They send and receive the same messages
throughout BP
Basic idea: Group them into supernodes,
forming lifted network
Smaller network → Faster inference
Akin to resolution in first-order logic
65
Belief Propagation
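Nodes (x), Features (f)
\mu_{x \to f}(x) = \prod_{h \in n(x) \setminus \{f\}} \mu_{h \to x}(x)
\mu_{f \to x}(x) = \sum_{\sim\{x\}} \left( e^{w f(\mathbf{x})} \prod_{y \in n(f) \setminus \{x\}} \mu_{y \to f}(y) \right)
66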
Lifted Belief Propagation
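Nodes (x), Features (f)
\mu_{x \to f}(x) = \prod_{h \in n(x) \setminus \{f\}} \mu_{h \to x}(x)
\mu_{f \to x}(x) = \sum_{\sim\{x\}} \left( e^{w f(\mathbf{x})} \prod_{y \in n(f) \setminus \{x\}} \mu_{y \to f}(y) \right)
α, β: Functions of edge counts (exponents on the messages)
68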
Learning
Data is a relational database
Closed world assumption (if not: EM)
Learning parameters (weights)
Learning structure (formulas)
69
Parameter Learning
Parameter tying: Groundings of same clause
\frac{\partial}{\partial w_i} \log P_w(x) = n_i(x) - E_w[n_i(x)]
n_i(x): no. of times clause i is true in data
E_w[n_i(x)]: expected no. of times clause i is true according to MLN
Generative learning: Pseudo-likelihood
Discriminative learning: Conditional likelihood,
use MC-SAT or MaxWalkSAT for inference
70
Parameter Learning
Pseudo-likelihood + L-BFGS is fast and
robust but can give poor inference results
Voted perceptron:
Gradient descent + MAP inference
Scaled conjugate gradient
71
Voted Perceptron for MLNs
HMMs are special case of MLNs
Replace Viterbi by MaxWalkSAT
Network can now be arbitrary graph
wi ← 0
for t ← 1 to T do
    yMAP ← MaxWalkSAT(x)
    wi ← wi + η [counti(yData) – counti(yMAP)]
return Σt wi,t / T    (average of the weights over the T iterations)
72
Problem: Multiple Modes
Not alleviated by contrastive divergence
Alleviated by MC-SAT
Warm start: Start each MC-SAT run at
previous end state
73
Problem: Extreme Ill-Conditioning
Solvable by quasi-Newton, conjugate gradient, etc.
But line searches require exact inference
Solution: Scaled conjugate gradient
[Lowd & Domingos, 2008]
Use Hessian to choose step size
Compute quadratic form inside MC-SAT
Use inverse diagonal Hessian as preconditioner
74
Structure Learning
Standard inductive logic programming optimizes
the wrong thing
But can be used to overgenerate for L1 pruning
Our approach:
ILP + Pseudo-likelihood + Structure priors
For each candidate structure change:
Start from current weights & relax convergence
Use subsampling to compute sufficient statistics
75
Structure Learning
Initial state: Unit clauses or prototype KB
Operators: Add/remove literal, flip sign
Evaluation function:
Pseudo-likelihood + Structure prior
Search: Beam search, shortest-first search
76
Alchemy
Open-source software including:
Full first-order logic syntax
Generative & discriminative weight learning
Structure learning
Weighted satisfiability, MCMC, lifted BP
Programming language features
alchemy.cs.washington.edu
77
                 Alchemy                           Prolog            BUGS
Representation   F.O. logic + Markov nets          Horn clauses      Bayes nets
Inference        Model checking, MCMC, lifted BP   Theorem proving   MCMC
Learning         Parameters & structure            No                Params.
Uncertainty      Yes                               No                Yes
Relational       Yes                               Yes               No
78
Constrained Conditional Model
Representation: Integer linear programs
Local classifiers + Global constraints
Inference: LP solver
Parameter learning: None for constraints
Weights of soft constraints set heuristically
Local weights typically learned independently
Structure learning: None to date
But see latest development in NAACL-10
79
Running Alchemy
Programs: Infer, Learnwts, Learnstruct
Options
MLN file: Types (optional), Predicates, Formulas
Database files
80
Overview
Motivation
Foundational areas
Markov logic
NLP applications
Basics
Supervised learning
Unsupervised learning
81
Uniform Distribn.: Empty MLN
Example: Unbiased coin flips
Type: flip = { 1, … , 20 }
Predicate: Heads(flip)
P(Heads(f)) = \frac{\frac{1}{Z} e^0}{\frac{1}{Z} e^0 + \frac{1}{Z} e^0} = \frac{1}{2}
82
Binomial Distribn.: Unit Clause
Example: Biased coin flips
Type: flip = { 1, … , 20 }
Predicate: Heads(flip)
Formula: Heads(f)
Weight: Log odds of heads: w = \log \frac{p}{1 - p}
P(Heads(f)) = \frac{\frac{1}{Z} e^w}{\frac{1}{Z} e^w + \frac{1}{Z} e^0} = \frac{1}{1 + e^{-w}} = p
By default, MLN includes unit clauses for all predicates
(captures marginal distributions, etc.)
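As a quick numerical check of the unit-clause derivation, the two directions of the mapping between the weight w and the probability p can be written as a pair of helpers (the names are illustrative).

```python
import math


def weight_from_probability(p: float) -> float:
    """Unit-clause weight as the log odds of p (requires 0 < p < 1)."""
    return math.log(p / (1.0 - p))


def probability_from_weight(w: float) -> float:
    """P(Heads(f)) = e^w / (e^w + e^0) = 1 / (1 + e^-w)."""
    return 1.0 / (1.0 + math.exp(-w))


w_biased = weight_from_probability(0.7)
p_back = probability_from_weight(w_biased)
```

A weight of 0 recovers the unbiased coin of the empty MLN.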
83
Multinomial Distribution
Example: Throwing die
Types: throw = { 1, … , 20 }
face = { 1, … , 6 }
Predicate: Outcome(throw,face)
Formulas: Outcome(t,f) ^ f != f’ => !Outcome(t,f’).
Exist f Outcome(t,f).
Too cumbersome!
84
Multinomial Distrib.: ! Notation
Example: Throwing die
Types: throw = { 1, … , 20 }
face = { 1, … , 6 }
Predicate: Outcome(throw,face!)
Formulas:
Semantics: Arguments without “!” determine arguments with “!”.
Also makes inference more efficient (triggers blocking).
85
Multinomial Distrib.: + Notation
Example: Throwing biased die
Types: throw = { 1, … , 20 }
face = { 1, … , 6 }
Predicate: Outcome(throw,face!)
Formulas: Outcome(t,+f)
Semantics: Learn weight for each grounding of args with “+”.
86
Logistic Regression (MaxEnt)
Logistic regression: \log \frac{P(C = 1 | F = f)}{P(C = 0 | F = f)} = a + \sum_i b_i f_i
Type: obj = { 1, ... , n }
Query predicate: C(obj)
Evidence predicates: Fi(obj)
Formulas: a    C(x)
          bi   Fi(x) ^ C(x)
Resulting distribution: P(C = c, F = f) = \frac{1}{Z} \exp\left( a c + \sum_i b_i f_i c \right)
Therefore: \log \frac{P(C = 1 | F = f)}{P(C = 0 | F = f)} = \log \frac{\exp\left( a + \sum_i b_i f_i \right)}{\exp(0)} = a + \sum_i b_i f_i
Alternative form: Fi(x) => C(x)
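The equivalence can be checked numerically with a short sketch. It assumes a single object, binary features, and the two weighted formulas listed on the slide (a for C(x), b_i for Fi(x) ^ C(x)); the function names and the example numbers are illustrative.

```python
import math


def mln_conditional(a, b, f):
    """P(C=1 | F=f) from the MLN with formulas  a: C(x)  and  b_i: Fi(x) ^ C(x)."""

    def score(c):
        # Weighted count of true formulas when C = c and the features are f.
        return a * c + sum(bi * fi * c for bi, fi in zip(b, f))

    e1, e0 = math.exp(score(1)), math.exp(score(0))
    return e1 / (e1 + e0)


def logistic(a, b, f):
    """Standard logistic regression with intercept a and coefficients b."""
    return 1.0 / (1.0 + math.exp(-(a + sum(bi * fi for bi, fi in zip(b, f)))))


a, b, f = -0.5, [1.2, -0.7, 0.3], [1, 0, 1]
p_mln = mln_conditional(a, b, f)
p_lr = logistic(a, b, f)
```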
87
Hidden Markov Models
obs = { Red, Green, Yellow }
state = { Stop, Drive, Slow }
time = { 0, ..., 100 }
State(state!,time)
Obs(obs!,time)
State(+s,0)
State(+s,t) ^ State(+s',t+1)
Obs(+o,t) ^ State(+s,t)
Sparse HMM:
State(s,t) => State(s1,t+1) v State(s2, t+1) v ... .
88
Bayesian Networks
Use all binary predicates with same first argument
(the object x).
One predicate for each variable A: A(x,v!)
One clause for each line in the CPT and
value of the variable
Context-specific independence:
One clause for each path in the decision tree
Logistic regression: As before
Noisy OR: Deterministic OR + Pairwise clauses
89
Relational Models
Knowledge-based model construction
Allow only Horn clauses
Same as Bayes nets, except arbitrary relations
Combin. function: Logistic regression, noisy-OR or external
Stochastic logic programs
Allow only Horn clauses
Weight of clause = log(p)
Add formulas: Head holds ⇒ Exactly one body holds
Probabilistic relational models
Allow only binary relations
Same as Bayes nets, except first argument can vary
90
Relational Models
Relational Markov networks
SQL → Datalog → First-order logic
One clause for each state of a clique
+ syntax in Alchemy facilitates this
Bayesian logic
Object = Cluster of similar/related observations
Observation constants + Object constants
Predicate InstanceOf(Obs,Obj) and clauses using it
Unknown relations: Second-order Markov logic
S. Kok & P. Domingos, “Statistical Predicate Invention”, in
Proc. ICML-2007.
91
Overview
Motivation
Foundational areas
Markov logic
NLP applications
Basics
Supervised learning
Unsupervised learning
92
Text Classification
The 56th quadrennial United States presidential
election was held on November 4, 2008. Outgoing
Republican President George W. Bush's policies and Topic = politics
actions and the American public's desire for change
were key issues throughout the campaign. ……
The Chicago Bulls are an American professional
basketball team based in Chicago, Illinois, playing in
the Central Division of the Eastern Conference in the Topic = sports
National Basketball Association (NBA). ……
……
93
Text Classification
page = {1, ..., max}
word = { ... }
topic = { ... }
Topic(page,topic)
HasWord(page,word)
Topic(p,t)
HasWord(p,+w) => Topic(p,+t)
If topics mutually exclusive: Topic(page,topic!)
94
Text Classification
page = {1, ..., max}
word = { ... }
topic = { ... }
Topic(page,topic)
HasWord(page,word)
Links(page,page)
Topic(p,t)
HasWord(p,+w) => Topic(p,+t)
Topic(p,t) ^ Links(p,p') => Topic(p',t)
Cf. S. Chakrabarti, B. Dom & P. Indyk, “Hypertext Classification
Using Hyperlinks,” in Proc. SIGMOD-1998.
95
Entity Resolution
AUTHOR: H. POON & P. DOMINGOS
TITLE: UNSUPERVISED SEMANTIC PARSING
VENUE: EMNLP-09
SAME?
AUTHOR: Hoifung Poon and Pedro Domings
TITLE: Unsupervised semantic parsing
VENUE: Proceedings of the 2009 Conference on Empirical Methods in
Natural Language Processing
AUTHOR: Poon, Hoifung and Domings, Pedro
TITLE: Unsupervised ontology induction from text
VENUE: Proceedings of the Forty-Eighth Annual Meeting of the
Association for Computational Linguistics
SAME?
AUTHOR: H. Poon, P. Domings
TITLE: Unsupervised ontology induction
VENUE: ACL-10
96
Entity Resolution
Problem: Given database, find duplicate records
HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)
HasToken(+t,+f,r) ^ HasToken(+t,+f,r’)
=> SameField(f,r,r’)
SameField(f,r,r’) => SameRecord(r,r’)
97
Entity Resolution
Problem: Given database, find duplicate records
HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)
HasToken(+t,+f,r) ^ HasToken(+t,+f,r’)
=> SameField(f,r,r’)
SameField(f,r,r’) => SameRecord(r,r’)
SameRecord(r,r’) ^ SameRecord(r’,r”)
=> SameRecord(r,r”)
Cf. A. McCallum & B. Wellner, “Conditional Models of Identity Uncertainty
with Application to Noun Coreference,” in Adv. NIPS 17, 2005.
98
Entity Resolution
Can also resolve fields:
HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)
HasToken(+t,+f,r) ^ HasToken(+t,+f,r’)
=> SameField(f,r,r’)
SameField(f,r,r’) <=> SameRecord(r,r’)
SameRecord(r,r’) ^ SameRecord(r’,r”)
=> SameRecord(r,r”)
SameField(f,r,r’) ^ SameField(f,r’,r”)
=> SameField(f,r,r”)
More: P. Singla & P. Domingos, “Entity Resolution with Markov
Logic”, in Proc. ICDM-2006.
99
Information Extraction
Unsupervised Semantic Parsing, Hoifung Poon and Pedro Domingos.
Proceedings of the 2009 Conference on Empirical Methods in Natural
Language Processing. Singapore: ACL.
UNSUPERVISED SEMANTIC PARSING. H. POON & P. DOMINGOS.
EMNLP-2009.
100
Information Extraction
Author Title Venue
Unsupervised Semantic Parsing, Hoifung Poon and Pedro Domingos.
Proceedings of the 2009 Conference on Empirical Methods in Natural
Language Processing. Singapore: ACL.
SAME?
UNSUPERVISED SEMANTIC PARSING. H. POON & P. DOMINGOS.
EMNLP-2009.
101
Information Extraction
Problem: Extract database from text or
semi-structured sources
Example: Extract database of publications
from citation list(s) (the “CiteSeer problem”)
Two steps:
Segmentation:
Use HMM to assign tokens to fields
Entity resolution:
Use logistic regression and transitivity
102
Information Extraction
Token(token, position, citation)
InField(position, field!, citation)
SameField(field, citation, citation)
SameCit(citation, citation)
Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) ^ InField(i+1,+f,c)
Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’)
^ InField(i’,+f,c’) => SameField(+f,c,c’)
SameField(+f,c,c’) <=> SameCit(c,c’)
SameField(f,c,c’) ^ SameField(f,c’,c”) => SameField(f,c,c”)
SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”)
103
Information Extraction
Token(token, position, citation)
InField(position, field!, citation)
SameField(field, citation, citation)
SameCit(citation, citation)
Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) ^ !Token(“.”,i,c) ^ InField(i+1,+f,c)
Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’)
^ InField(i’,+f,c’) => SameField(+f,c,c’)
SameField(+f,c,c’) <=> SameCit(c,c’)
SameField(f,c,c’) ^ SameField(f,c’,c”) => SameField(f,c,c”)
SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”)
More: H. Poon & P. Domingos, “Joint Inference in Information
Extraction”, in Proc. AAAI-2007.
104
Biomedical Text Mining
Traditionally, named entity recognition or
information extraction
E.g., protein recognition, protein-protein identification
BioNLP-09 shared task: Nested bio-events
Much harder than traditional IE
Top F1 around 50%
Naturally calls for joint inference
105
Bio-Event Extraction
Involvement of p70(S6)-kinase activation in IL-10
up-regulation in human monocytes by gp41 envelope
protein of human immunodeficiency virus type 1 ...
involvement
Theme Cause
up-regulation activation
Theme Cause Site Theme
human
IL-10 gp41 p70(S6)-kinase
monocyte
106
Bio-Event Extraction
Token(position, token)
DepEdge(position, position, dependency)
IsProtein(position)
EvtType(position, evtType)
InArgPath(position, position, argType!)
Logistic regression:
Token(i,+w) => EvtType(i,+t)
Token(j,w) ^ DepEdge(i,j,+d) => EvtType(i,+t)
DepEdge(i,j,+d) => InArgPath(i,j,+a)
Token(i,+w) ^ DepEdge(i,j,+d) => InArgPath(i,j,+a)
…
107
Bio-Event Extraction
Token(position, token)
DepEdge(position, position, dependency)
IsProtein(position)
EvtType(position, evtType)
InArgPath(position, position, argType!)
Token(i,+w) => EvtType(i,+t)
Token(j,w) ^ DepEdge(i,j,+d) => EvtType(i,+t)
DepEdge(i,j,+d) => InArgPath(i,j,+a)
Token(i,+w) ^ DepEdge(i,j,+d) => InArgPath(i,j,+a)
…
InArgPath(i,j,Theme) => IsProtein(j) v
(Exist k k!=i ^ InArgPath(j, k, Theme)).
Adding a few joint inference rules doubles the F1
…
More: H. Poon and L. Vanderwende, “Joint Inference for Knowledge
Extraction from Biomedical Literature”, 10:40 am, June 4, Gold Room.
108
Temporal Information Extraction
Identify event times and temporal relations
(BEFORE, AFTER, OVERLAP)
E.g., who is the President of U.S.A.?
Obama: 1/20/2009 – present
G. W. Bush: 1/20/2001 – 1/19/2009
Etc.
109
Temporal Information Extraction
DepEdge(position, position, dependency)
Event(position, event)
After(event, event)
DepEdge(i,j,+d) ^ Event(i,p) ^ Event(j,q) => After(p,q)
After(p,q) ^ After(q,r) => After(p,r)
110
Temporal Information Extraction
DepEdge(position, position, dependency)
Event(position, event)
After(event, event)
Role(position, position, role)
DepEdge(i,j,+d) ^ Event(i,p) ^ Event(j,q) => After(p,q)
Role(i,j,ROLE-AFTER) ^ Event(i,p) ^ Event(j,q) => After(p,q)
After(p,q) ^ After(q,r) => After(p,r)
More:
K. Yoshikawa, S. Riedel, M. Asahara and Y. Matsumoto, “Jointly
Identifying Temporal Relations with Markov Logic”, in Proc. ACL-2009.
X. Ling & D. Weld, “Temporal Information Extraction”, in Proc. AAAI-2010.
111
Semantic Role Labeling
Problem: Identify arguments for a predicate
Two steps:
Argument identification:
Determine whether a phrase is an argument
Role classification:
Determine the type of an argument (agent, theme,
temporal, adjunct, etc.)
112
Semantic Role Labeling
Token(position, token)
DepPath(position, position, path)
IsPredicate(position)
Role(position, position, role!)
HasRole(position, position)
Token(i,+t) => IsPredicate(i)
DepPath(i,j,+p) => Role(i,j,+r)
HasRole(i,j) => IsPredicate(i)
IsPredicate(i) => Exist j HasRole(i,j)
HasRole(i,j) => Exist r Role(i,j,r)
Role(i,j,r) => HasRole(i,j)
Cf. K. Toutanova, A. Haghighi, C. Manning, “A global joint model for
semantic role labeling”, in Computational Linguistics 2008.
113
Joint Semantic Role Labeling
and Word Sense Disambiguation
Token(position, token)
DepPath(position, position, path)
IsPredicate(position)
Role(position, position, role!)
HasRole(position, position)
Sense(position, sense!)
Token(i,+t) => IsPredicate(i)
DepPath(i,j,+p) => Role(i,j,+r)
Sense(i,s) => IsPredicate(i)
HasRole(i,j) => IsPredicate(i)
IsPredicate(i) => Exist j HasRole(i,j)
HasRole(i,j) => Exist r Role(i,j,r)
Role(i,j,r) => HasRole(i,j)
Token(i,+t) ^ Role(i,j,+r) => Sense(i,+s)
More: I. Meza-Ruiz & S. Riedel, “Jointly Identifying Predicates,
Arguments and Senses using Markov Logic”, in Proc. NAACL-2009.
114
Practical Tips: Modeling
Add all unit clauses (the default)
How to handle uncertain data:
R(x,y) ^ R’(x,y) (the “HMM trick”)
Implications vs. conjunctions
For soft correlation, conjunctions often better
Implication: A => B is equivalent to !(A ^ !B)
It shares cases with other implications such as A => C,
which makes learning unnecessarily harder
115
Practical Tips: Efficiency
Open/closed world assumptions
Low clause arities
Low numbers of constants
Short inference chains
116
Practical Tips: Development
Start with easy components
Gradually expand to full task
Use the simplest MLN that works
Cycle: Add/delete formulas, learn and test
117
Overview
Motivation
Foundational areas
Markov logic
NLP applications
Basics
Supervised learning
Unsupervised learning
118
Unsupervised Learning: Why?
Virtually unlimited supply of unlabeled text
Labeling is expensive (Cf. Penn-Treebank)
Often difficult to label with consistency and
high quality (e.g., semantic parses)
Emerging field: Machine reading
Extract knowledge from unstructured text with
high precision/recall and minimal human effort
Check out LBR-Workshop (WS9) on Sunday
119
Unsupervised Learning: How?
I.i.d. learning: Sophisticated model requires
more labeled data
Statistical relational learning: Sophisticated
model may require less labeled data
Relational dependencies constrain problem space
One formula is worth a thousand labels
Small amount of domain knowledge
large-scale joint inference
120
Unsupervised Learning: How?
Ambiguities vary among objects
Joint inference: Propagate information from
unambiguous objects to ambiguous ones
E.g.: Mentions G. W. Bush …, He …, Mrs. Bush …
Are they coreferent?
G. W. Bush and He: Should be coreferent
G. W. Bush: So must be singular male!
Mrs. Bush: Must be singular female!
Verdict: G. W. Bush and Mrs. Bush are not coreferent!
125
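The Bush example can be sketched as a tiny propagation routine. This is a simplification that treats agreement as a hard constraint, whereas Markov logic would handle it softly through joint inference. The function name `propagate_and_check`, the attribute names, and the verdict strings are all hypothetical.

```python
def propagate_and_check(attrs, coreferent, candidates):
    """attrs: mention -> dict of known attributes (missing keys are unknown)
    coreferent: (a, b) pairs already accepted as coreferent
    candidates: (a, b) pairs to judge
    Returns (attributes after propagation, {pair: verdict})."""
    attrs = {m: dict(a) for m, a in attrs.items()}
    changed = True
    while changed:
        changed = False
        for a, b in coreferent:
            # Copy known attributes in both directions along the link.
            for src, dst in ((a, b), (b, a)):
                for key, value in attrs[src].items():
                    if key not in attrs[dst]:
                        attrs[dst][key] = value
                        changed = True
    verdicts = {}
    for a, b in candidates:
        shared = set(attrs[a]) & set(attrs[b])
        clash = any(attrs[a][k] != attrs[b][k] for k in shared)
        verdicts[(a, b)] = "not coreferent" if clash else "possibly coreferent"
    return attrs, verdicts

mentions = {
    "G. W. Bush": {},  # ambiguous: nothing known from the string alone
    "He": {"number": "singular", "gender": "male"},
    "Mrs. Bush": {"number": "singular", "gender": "female"},
}
attrs, verdicts = propagate_and_check(
    mentions,
    coreferent=[("G. W. Bush", "He")],
    candidates=[("G. W. Bush", "Mrs. Bush")],
)
```

The unambiguous pronoun fixes the gender of the ambiguous name, and that in turn rules out the link to "Mrs. Bush" even though the two share a head word.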
Parameter Learning
Marginalize out hidden variables
\[
\frac{\partial}{\partial w_i} \log P(x) = E_{z|x}[n_i(x,z)] - E_{x,z}[n_i(x,z)]
\]
First term: Sum over z, conditioned on observed x
Second term: Summed over both x and z
Use MC-SAT to approximate both expectations
May also combine with contrastive estimation
[Poon & Cherry & Toutanova, NAACL-2009]
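To make the gradient concrete, here is a minimal Python sketch on a toy model with one observed bit x and one hidden bit z. It computes both expectations by exact enumeration, which is only feasible at this size; the slide's method approximates them with MC-SAT instead. The two features and all function names are assumptions chosen for illustration.

```python
import math
from itertools import product

def features(x, z):
    """Hypothetical feature counts n_i(x, z)."""
    return [1.0 if x == z else 0.0,   # n_0: x and z agree
            1.0 if z == 1 else 0.0]   # n_1: hidden bit is on

def score(w, x, z):
    """Unnormalized probability exp(sum_i w_i * n_i(x, z))."""
    return math.exp(sum(wi * ni for wi, ni in zip(w, features(x, z))))

def log_marginal(w, x):
    """log P(x) with the hidden variable z summed out."""
    numer = sum(score(w, x, z) for z in (0, 1))
    denom = sum(score(w, xx, zz) for xx, zz in product((0, 1), repeat=2))
    return math.log(numer / denom)

def gradient(w, x):
    """E_{z|x}[n_i(x,z)] - E_{x,z}[n_i(x,z)] by exact enumeration."""
    zs = (0, 1)
    joint = list(product((0, 1), repeat=2))
    cond_norm = sum(score(w, x, z) for z in zs)
    joint_norm = sum(score(w, xx, zz) for xx, zz in joint)
    grad = []
    for i in range(len(w)):
        e_cond = sum(score(w, x, z) * features(x, z)[i] for z in zs) / cond_norm
        e_joint = sum(score(w, xx, zz) * features(xx, zz)[i]
                      for xx, zz in joint) / joint_norm
        grad.append(e_cond - e_joint)
    return grad

w = [0.5, -0.3]
g = gradient(w, x=1)
```

A finite-difference check of `log_marginal` against `gradient` is a cheap way to confirm the two expectations have the right signs before swapping in a sampler.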
126
Unsupervised Coreference Resolution
Head(mention, string)
Type(mention, type)
MentionOf(mention, entity)
Mixture model:
MentionOf(+m,+e)
Type(+m,+t)
Head(+m,+h) ^ MentionOf(+m,+e)
Joint inference formulas: Enforce agreement
MentionOf(a,e) ^ MentionOf(b,e) => (Type(a,t) <=> Type(b,t))
… (similarly for Number, Gender etc.)
127
Unsupervised Coreference Resolution
Head(mention, string)
Type(mention, type)
MentionOf(mention, entity)
Apposition(mention, mention)
MentionOf(+m,+e)
Type(+m,+t)
Head(+m,+h) ^ MentionOf(+m,+e)
Joint inference formulas:
MentionOf(a,e) ^ MentionOf(b,e) => (Type(a,t) <=> Type(b,t))
… (similarly for Number, Gender etc.)
Leverage apposition:
Apposition(a,b) => (MentionOf(a,e) <=> MentionOf(b,e))
More: H. Poon and P. Domingos, “Joint Unsupervised Coreference
Resolution with Markov Logic”, in Proc. EMNLP-2008.
128
Relational Clustering:
Discover Unknown Predicates
Cluster relations along with objects
Use second-order Markov logic
[Kok & Domingos, 2007, 2008]
Key idea: Cluster combination determines
likelihood of relations
InClust(r,+c) ^ InClust(x,+a) ^ InClust(y,+b)
=> r(x,y)
Input: Relational tuples extracted by
TextRunner [Banko et al., 2007]
Output: Semantic network
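The key idea, that the cluster combination determines the likelihood of a relation, can be illustrated with a short Python sketch. Given fixed cluster assignments, it computes for each combination (c, a, b) the fraction of ground atoms r(x,y) that are true. This is only a counting illustration, not the learning algorithm of Kok & Domingos; the function name, the cluster labels, and the toy tuples are hypothetical.

```python
from collections import Counter
from itertools import product

def combination_rates(rel_cluster, obj_cluster, true_tuples):
    """Fraction of ground atoms r(x, y) that are true,
    per cluster combination (c, a, b)."""
    total = Counter()
    true = Counter()
    for r, x, y in product(rel_cluster, obj_cluster, obj_cluster):
        combo = (rel_cluster[r], obj_cluster[x], obj_cluster[y])
        total[combo] += 1
        if (r, x, y) in true_tuples:
            true[combo] += 1
    return {combo: true[combo] / total[combo] for combo in total}

# Hypothetical cluster assignments and extracted tuples.
rel_cluster = {"induces": "R0", "up-regulates": "R0"}
obj_cluster = {"IL-4": "A", "IL-10": "A", "CD11b": "B"}
true_tuples = {
    ("induces", "IL-4", "CD11b"),
    ("up-regulates", "IL-4", "CD11b"),
    ("induces", "IL-10", "CD11b"),
}
rates = combination_rates(rel_cluster, obj_cluster, true_tuples)
```

A good clustering pushes these rates toward 0 or 1, which is what makes the cluster combination predictive of unseen tuples.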
129
Recursive Relational Clustering
Unsupervised semantic parsing
[Poon & Domingos, EMNLP-2009]
Text → Knowledge
Start directly from text
Identify meaning units + Resolve variations
Use high-order Markov logic (variables over
arbitrary lambda forms and their clusters)
End-to-end machine reading:
Read text, then answer questions
130
Semantic Parsing
Sentence: IL-4 protein induces CD11b
Logical form: INDUCE(e1) ^ INDUCER(e1,e2) ^ INDUCED(e1,e3) ^ IL-4(e2) ^ CD11B(e3)
Structured prediction: Partition + Assignment
[Figure: dependency tree induces -nsubj-> protein -nn-> IL-4 and
induces -dobj-> CD11b, partitioned into parts that are assigned to
INDUCE, INDUCER, INDUCED, IL-4 and CD11B]
131
Challenge:
Same Meaning, Many Variations
IL-4 up-regulates CD11b
Protein IL-4 enhances the expression of CD11b
CD11b expression is induced by IL-4 protein
The cytokine interleukin-4 induces CD11b expression
IL-4’s up-regulation of CD11b, …
……
132
Unsupervised Semantic Parsing
USP: Recursively cluster arbitrary expressions
composed with / by similar expressions
IL-4 induces CD11b
Protein IL-4 enhances the expression of CD11b
CD11b expression is enhanced by IL-4 protein
The cytokine interleukin-4 induces CD11b expression
IL-4’s up-regulation of CD11b, …
133
Unsupervised Semantic Parsing
USP: Recursively cluster arbitrary expressions
composed with / by similar expressions
IL-4 induces CD11b
Protein IL-4 enhances the expression of CD11b
CD11b expression is enhanced by IL-4 protein
The cytokine interleukin-4 induces CD11b expression
IL-4’s up-regulation of CD11b, …
Cluster same forms at the atom level
134
Unsupervised Semantic Parsing
USP: Recursively cluster arbitrary expressions
composed with / by similar expressions
IL-4 induces CD11b
Protein IL-4 enhances the expression of CD11b
CD11b expression is enhanced by IL-4 protein
The cytokine interleukin-4 induces CD11b expression
IL-4’s up-regulation of CD11b, …
Cluster forms in composition with same forms
135
Unsupervised Semantic Parsing
Exponential prior on number of parameters
Event/object/property cluster mixtures:
InClust(e,+c) ^ HasValue(e,+v)
Object/Event Cluster: INDUCE
  induces 0.1, enhances 0.4, …
Property Cluster: INDUCER
  Dependency: nsubj 0.5, agent 0.4, …
  Argument: IL-4 0.2, IL-8 0.1, …
  Number: None 0.1, One 0.8, …
139
But … State Space Too Large
Coreference: #-clusters can be as large as #-mentions
USP: #-clusters can be as large as exp(#-tokens)
Also, meaning units are often small, with
many singleton clusters
Use combinatorial search
140
Inference: Hill-Climb Probability
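Initialize: every node of the dependency tree
(induces, nsubj, dobj, protein, nn, IL-4, CD11B) starts with an
unknown (?) assignment
Search operator: Lambda reduction
[Figure: the fragment protein -nn-> IL-4 before and after the
search operator is applied]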
141
Learning: Hill-Climb Likelihood
Initialize: one cluster per form
{induces 1}, {enhances 1}, {IL-4 1}, {protein 1}, …
Search operators:
MERGE: {induces 1} + {enhances 1} -> {induces 0.2, enhances 0.8}
COMPOSE: {IL-4 1} + {protein 1} -> {IL-4 protein 1}
142
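The two search operators can be sketched in Python on clusters stored as form counts. This only shows what MERGE and COMPOSE do to the cluster contents; a learner would additionally score each candidate operation by the change in likelihood and keep those that improve it. The counts (2 and 8, chosen to reproduce the 0.2 / 0.8 split on the slide) and the function names are hypothetical.

```python
from collections import Counter

def merge(cluster_a, cluster_b):
    """MERGE: pool the form counts of two clusters into one cluster."""
    return cluster_a + cluster_b

def compose(arg_form, head_form, count):
    """COMPOSE: treat an argument and its head as a single meaning unit
    that gets a cluster of its own."""
    return Counter({f"{arg_form} {head_form}": count})

def distribution(cluster):
    """Normalize form counts into the cluster's mixture probabilities."""
    total = sum(cluster.values())
    return {form: count / total for form, count in cluster.items()}

# Initially every form sits in its own cluster with probability 1.
induces = Counter({"induces": 2})
enhances = Counter({"enhances": 8})

merged = distribution(merge(induces, enhances))
composed = distribution(compose("IL-4", "protein", 3))
```

Keeping raw counts rather than probabilities makes both operators a matter of adding or moving counts, with normalization deferred until the mixture is needed.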
Unsupervised Ontology Induction
Limitations of USP:
No ISA hierarchy among clusters
Little smoothing
Limited capability to generalize
OntoUSP [Poon & Domingos, ACL-2010]
Extends USP to also induce ISA hierarchy
Joint approach for ontology induction, population,
and knowledge extraction
To appear in ACL (see you in Uppsala :-)
143
OntoUSP
Modify the cluster mixture formula:
InClust(e,c) ^ ISA(c,+d) ^ HasValue(e,+v)
Hierarchical smoothing + clustering
New operator in learning: ABSTRACTION
[Figure: two clusters, INDUCE (induces 0.6, up-regulates 0.2, …) and
INHIBIT (inhibits 0.4, suppresses 0.2, …).
MERGE with REGULATE? One flat cluster
(induces 0.3, enhances 0.1, inhibits 0.2, suppresses 0.1, …).
ABSTRACTION: ISA links from INDUCE (induces 0.6, up-regulates 0.2, …)
and INHIBIT (inhibits 0.4, suppresses 0.2, …) to a shared parent cluster.]
144
End of The Beginning …
Not merely a user guide of MLN and Alchemy
Statistical relational learning:
Growth area for machine learning and NLP
145
Future Work: Inference
Scale up inference
Cutting-planes methods (e.g., [Riedel, 2008])
Unify lifted inference with sampling
Coarse-to-fine inference
Alternative technology
E.g., linear programming, Lagrangian relaxation
146
Future Work: Supervised Learning
Alternative optimization objectives
E.g., max-margin learning [Huynh & Mooney, 2009]
Learning for efficient inference
E.g., learning arithmetic circuits [Lowd & Domingos, 2008]
Structure learning:
Improve accuracy and scalability
E.g., [Kok & Domingos, 2009]
147
Future Work: Unsupervised Learning
Model: Learning objective, formalism, etc.
Learning: Local optima, intractability, etc.
Hyperparameter tuning
Leverage available resources
Semi-supervised learning
Multi-task learning
Transfer learning (e.g., domain adaptation)
Human in the loop
E.g., interactive ML, active learning, crowdsourcing
148
Future Work: NLP Applications
Existing application areas:
More joint inference opportunities
Additional domain knowledge
Combine multiple pipeline stages
A “killer app”: Machine reading
Many, many more awaiting YOU to discover
149
Summary
We need to unify logical and statistical NLP
Markov logic provides a language for this
Syntax: Weighted first-order formulas
Semantics: Feature templates of Markov nets
Inference: Satisfiability, MCMC, lifted BP, etc.
Learning: Pseudo-likelihood, VP, PSCG, ILP, etc.
Growing set of NLP applications
Open-source software: Alchemy
alchemy.cs.washington.edu
Book: Domingos & Lowd, Markov Logic,
Morgan & Claypool, 2009.
150
References
[Banko et al., 2007] Michele Banko, Michael J. Cafarella, Stephen
Soderland, Matt Broadhead, Oren Etzioni, "Open Information
Extraction From the Web", In Proc. IJCAI-2007.
[Chakrabarti et al., 1998] Soumen Chakrabarti, Byron Dom, Piotr Indyk,
"Hypertext Classification Using Hyperlinks", in Proc. SIGMOD-1998.
[Damien et al., 1999] Paul Damien, Jon Wakefield, Stephen Walker,
"Gibbs sampling for Bayesian non-conjugate and hierarchical
models by auxiliary variables", Journal of the Royal Statistical
Society B, 61:2.
[Domingos & Lowd, 2009] Pedro Domingos and Daniel Lowd, Markov
Logic, Morgan & Claypool.
[Friedman et al., 1999] Nir Friedman, Lise Getoor, Daphne Koller, Avi
Pfeffer, "Learning probabilistic relational models", in Proc. IJCAI-1999.
[Halpern, 1990] Joe Halpern, "An analysis of first-order logics of
probability", Artificial Intelligence 46.
[Huynh & Mooney, 2009] Tuyen Huynh and Raymond Mooney, "Max-
Margin Weight Learning for Markov Logic Networks", In Proc.
ECML-2009.
[Kautz et al., 1997] Henry Kautz, Bart Selman, Yuejun Jiang, "A general
stochastic approach to solving problems with hard and soft
constraints", In The Satisfiability Problem: Theory and Applications.
AMS.
[Kok & Domingos, 2007] Stanley Kok and Pedro Domingos, "Statistical
Predicate Invention", In Proc. ICML-2007.
[Kok & Domingos, 2008] Stanley Kok and Pedro Domingos, "Extracting
Semantic Networks from Text via Relational Clustering", In Proc.
ECML-2008.
[Kok & Domingos, 2009] Stanley Kok and Pedro Domingos, "Learning
Markov Logic Network Structure via Hypergraph Lifting", In Proc.
ICML-2009.
[Ling & Weld, 2010] Xiao Ling and Daniel S. Weld, "Temporal
Information Extraction", In Proc. AAAI-2010.
[Lowd & Domingos, 2007] Daniel Lowd and Pedro Domingos, "Efficient
Weight Learning for Markov Logic Networks", In Proc. PKDD-2007.
[Lowd & Domingos, 2008] Daniel Lowd and Pedro Domingos,
"Learning Arithmetic Circuits", In Proc. UAI-2008.
[Meza-Ruiz & Riedel, 2009] Ivan Meza-Ruiz and Sebastian Riedel,
"Jointly Identifying Predicates, Arguments and Senses using Markov
Logic", In Proc. NAACL-2009.
[Muggleton, 1996] Stephen Muggleton, "Stochastic logic programs", in
Proc. ILP-1996.
[Nilsson, 1986] Nils Nilsson, "Probabilistic logic", Artificial Intelligence
28.
[Page et al., 1998] Lawrence Page, Sergey Brin, Rajeev Motwani, Terry
Winograd, "The PageRank Citation Ranking: Bringing Order to the
Web", Tech. Rept., Stanford University, 1998.
[Poon & Domingos, 2006] Hoifung Poon and Pedro Domingos, "Sound
and Efficient Inference with Probabilistic and Deterministic
Dependencies", In Proc. AAAI-06.
[Poon & Domingos, 2007] Hoifung Poon and Pedro Domingos, "Joint
Inference in Information Extraction", In Proc. AAAI-07.
[Poon & Domingos, 2008a] Hoifung Poon, Pedro Domingos, Marc
Sumner, "A General Method for Reducing the Complexity of
Relational Inference and its Application to MCMC", In Proc. AAAI-
08.
[Poon & Domingos, 2008b] Hoifung Poon and Pedro Domingos, "Joint
Unsupervised Coreference Resolution with Markov Logic", In Proc.
EMNLP-08.
[Poon & Domingos, 2009] Hoifung Poon and Pedro Domingos,
"Unsupervised Semantic Parsing", In Proc. EMNLP-09.
[Poon & Cherry & Toutanova, 2009] Hoifung Poon, Colin Cherry,
Kristina Toutanova, "Unsupervised Morphological Segmentation
with Log-Linear Models", In Proc. NAACL-2009.
[Poon & Vanderwende, 2010] Hoifung Poon and Lucy Vanderwende,
"Joint Inference for Knowledge Extraction from Biomedical
Literature", In Proc. NAACL-10.
[Poon & Domingos, 2010] Hoifung Poon and Pedro Domingos,
"Unsupervised Ontology Induction From Text", In Proc. ACL-10.
[Riedel, 2008] Sebastian Riedel, "Improving the Accuracy and Efficiency
of MAP Inference for Markov Logic", In Proc. UAI-2008.
[Riedel et al., 2009] Sebastian Riedel, Hong-Woo Chun, Toshihisa
Takagi and Jun'ichi Tsujii, "A Markov Logic Approach to Bio-
Molecular Event Extraction", In Proc. BioNLP 2009 Shared Task.
[Selman et al., 1996] Bart Selman, Henry Kautz, Bram Cohen, "Local
search strategies for satisfiability testing", In Cliques, Coloring, and
Satisfiability: Second DIMACS Implementation Challenge. AMS.
[Singla & Domingos, 2006a] Parag Singla and Pedro Domingos,
"Memory-Efficient Inference in Relational Domains", In Proc. AAAI-
2006.
[Singla & Domingos, 2006b] Parag Singla and Pedro Domingos, "Entity
Resolution with Markov Logic", In Proc. ICDM-2006.
[Singla & Domingos, 2007] Parag Singla and Pedro Domingos,
"Markov Logic in Infinite Domains", In Proc. UAI-2007.
[Singla & Domingos, 2008] Parag Singla and Pedro Domingos, "Lifted
First-Order Belief Propagation", In Proc. AAAI-2008.
[Taskar et al., 2002] Ben Taskar, Pieter Abbeel, Daphne Koller,
"Discriminative probabilistic models for relational data", in Proc. UAI-
2002.
[Toutanova & Haghighi & Manning, 2008] Kristina Toutanova, Aria
Haghighi, Chris Manning, "A global joint model for semantic role
labeling", Computational Linguistics.
[Wang & Domingos, 2008] Jue Wang and Pedro Domingos, "Hybrid
Markov Logic Networks", In Proc. AAAI-2008.
[Wellman et al., 1992] Michael Wellman, John S. Breese, Robert P.
Goldman, "From knowledge bases to decision models", Knowledge
Engineering Review 7.
[Yoshikawa et al., 2009] Katsumasa Yoshikawa, Sebastian Riedel,
Masayuki Asahara and Yuji Matsumoto, "Jointly Identifying
Temporal Relations with Markov Logic", In Proc. ACL-2009.
158