Markov Logic in Natural Language Processing



Hoifung Poon

Dept. of Computer Science & Eng.

University of Washington

Overview

 Motivation

 Foundational areas

 Markov logic

 NLP applications

 Basics

 Supervised learning

 Unsupervised learning





2

Languages Are Structural

governments

lm$pxtm

(according to their families)









3

Languages Are Structural

[Figure: examples of linguistic structure]

Morphological segmentation: govern-ment-s; l-m$px-t-m (according to their families)

Parse tree: [S [NP IL-4] [VP [V induces] [NP CD11B]]]

Nested events: "Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 ..."
involvement (Theme: up-regulation, Cause: activation)
up-regulation (Theme: IL-10, Cause: gp41, Site: human monocyte)
activation (Theme: p70(S6)-kinase)

Coreference: "George Walker Bush was the 43rd President of the United States. ... Bush was the eldest son of President G. H. W. Bush and Barbara Bush. ... In November 1977, he met Laura Welch at a barbecue."

4

Languages Are Structural

 Objects are not just feature vectors

 They have parts and subparts

 Which have relations with each other

 They can be trees, graphs, etc.

 Objects are seldom i.i.d.

(independent and identically distributed)

 They exhibit local and global dependencies

 They form class hierarchies (with multiple inheritance)

 Objects’ properties depend on those of related objects

 Deeply interwoven with knowledge

6

First-Order Logic

 Main theoretical foundation of computer science

 General language for describing

complex structures and knowledge

 Trees, graphs, dependencies, hierarchies, etc.

easily expressed

 Inference algorithms (satisfiability testing,

theorem proving, etc.)





7

Languages Are Statistical

I saw the man with the telescope Microsoft buys Powerset

NP Microsoft acquires Powerset

I saw the man with the telescope Powerset is acquired by Microsoft Corporation

NP The Redmond software giant buys Powerset

ADVP

I saw the man with the telescope Microsoft’s purchase of Powerset, …

……



Here in London, Frances Deek is a retired teacher … G. W. Bush ……

In the Israeli town …, Karen London says … …… Laura Bush ……

Now London says … Mrs. Bush ……



London  PERSON or LOCATION? Which one?

8

Languages Are Statistical

 Languages are ambiguous

 Our information is always incomplete

 We need to model correlations

 Our predictions are uncertain

 Statistics provides the tools to handle this









9

Probabilistic Graphical Models

 Mixture models

 Hidden Markov models

 Bayesian networks

 Markov random fields

 Maximum entropy models

 Conditional random fields

 Etc.



10

The Problem

 Logic is deterministic, requires manual coding

 Statistical models assume i.i.d. data,

objects = feature vectors

 Historically, statistical and logical NLP

have been pursued separately

 We need to unify the two!

 Burgeoning field in machine learning:

Statistical relational learning

11

Costs and Benefits of

Statistical Relational Learning

 Benefits

 Better predictive accuracy

 Better understanding of domains

 Enable learning with less or no labeled data

 Costs

 Learning is much harder

 Inference becomes a crucial issue

 Greater complexity for user



12

Progress to Date

 Probabilistic logic [Nilsson, 1986]

 Statistics and beliefs [Halpern, 1990]

 Knowledge-based model construction

[Wellman et al., 1992]

 Stochastic logic programs [Muggleton, 1996]

 Probabilistic relational models [Friedman et al., 1999]

 Relational Markov networks [Taskar et al., 2002]

 Etc.

 This talk: Markov logic [Domingos & Lowd, 2009]



13

Markov Logic:

A Unifying Framework

 Probabilistic graphical models and

first-order logic are special cases

 Unified inference and learning algorithms

 Easy-to-use software: Alchemy

 Broad applicability

 Goal of this tutorial:

Quickly learn how to use Markov logic and Alchemy

for a broad spectrum of NLP applications



14

Overview

 Motivation

 Foundational areas

 Probabilistic inference

 Statistical learning

 Logical inference

 Inductive logic programming

 Markov logic

 NLP applications

 Basics

 Supervised learning

 Unsupervised learning

15

Markov Networks

 Undirected graphical models

Smoking Cancer



Asthma Cough

 Potential functions defined over cliques

$$P(x) = \frac{1}{Z} \prod_c \Phi_c(x_c), \qquad Z = \sum_x \prod_c \Phi_c(x_c)$$

Smoking   Cancer   Φ(S,C)
False     False    4.5
False     True     4.5
True      False    2.7
True      True     4.5
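
As a rough illustration of the two formulas above, the sketch below normalizes the Φ(Smoking, Cancer) table into a joint distribution. It assumes, for simplicity, that this single clique is the whole network (Asthma and Cough are left out); the function names are hypothetical.

```python
from itertools import product

# Potential table Φ(Smoking, Cancer) copied from the slide.
phi = {
    (False, False): 4.5,
    (False, True): 4.5,
    (True, False): 2.7,
    (True, True): 4.5,
}

# Partition function: sum of the single clique potential over all states.
Z = sum(phi[state] for state in product([False, True], repeat=2))


def joint(smoking, cancer):
    """P(Smoking, Cancer) = Φ(S, C) / Z for this one-clique network."""
    return phi[(smoking, cancer)] / Z


def p_cancer_given_smoking(smoking):
    """Conditional P(Cancer = True | Smoking = smoking)."""
    return joint(smoking, True) / (joint(smoking, True) + joint(smoking, False))
```

Under this assumption, smokers get a higher conditional probability of cancer than non-smokers, which is all the lower potential for (True, False) encodes.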

16

Markov Networks

 Undirected graphical models

Smoking Cancer



Asthma Cough

 Log-linear model:

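$$P(x) = \frac{1}{Z} \exp\Big(\sum_i w_i f_i(x)\Big)$$

w_i: weight of feature i; f_i(x): feature i

$$f_1(\text{Smoking}, \text{Cancer}) = \begin{cases} 1 & \text{if } \neg\text{Smoking} \lor \text{Cancer} \\ 0 & \text{otherwise} \end{cases}$$

$$w_1 = 1.5$$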

17

Markov Nets vs. Bayes Nets

Property          Markov Nets        Bayes Nets
Form              Prod. potentials   Prod. potentials
Potentials        Arbitrary          Cond. probabilities
Cycles            Allowed            Forbidden
Partition func.   Z = ?              Z = 1
Indep. check      Graph separation   D-separation
Indep. props.     Some               Some
Inference         MCMC, BP, etc.     Convert to Markov

18

Inference in Markov Networks

 Goal: compute marginals & conditionals of

$$P(X) = \frac{1}{Z} \exp\Big(\sum_i w_i f_i(X)\Big), \qquad Z = \sum_X \exp\Big(\sum_i w_i f_i(X)\Big)$$

 Exact inference is #P-complete

 Conditioning on Markov blanket is easy:



$$P(x \mid MB(x)) = \frac{\exp\big(\sum_i w_i f_i(x)\big)}{\exp\big(\sum_i w_i f_i(x=0)\big) + \exp\big(\sum_i w_i f_i(x=1)\big)}$$





 Gibbs sampling exploits this

19

MCMC: Gibbs Sampling



state ← random truth assignment
for i ← 1 to num-samples do
    for each variable x
        sample x according to P(x|neighbors(x))
        state ← state with new value of x
P(F) ← fraction of states in which F is true
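
The pseudocode above can be sketched in Python for a small log-linear model over binary variables. This is a minimal sketch, not a tuned sampler: the function name and the representation of features as `(weight, function)` pairs are assumptions, and for simplicity it rescores every feature when resampling a variable (features outside the variable's Markov blanket cancel in the ratio).

```python
import math
import random


def gibbs_marginal(variables, features, query, num_samples=5000, seed=0):
    """Estimate P(query is true) by Gibbs sampling.

    variables: list of names of binary variables
    features:  list of (weight, f) pairs; f maps a state dict to 0 or 1
    query:     function from a state dict to bool
    """
    rng = random.Random(seed)
    # state <- random truth assignment
    state = {v: rng.random() < 0.5 for v in variables}

    def score(s):
        return sum(w * f(s) for w, f in features)

    hits = 0
    for _ in range(num_samples):
        for v in variables:
            # Conditional of v given everything else, from the two
            # unnormalized scores exp(sum_i w_i f_i) with v = True / False.
            state[v] = True
            s1 = math.exp(score(state))
            state[v] = False
            s0 = math.exp(score(state))
            state[v] = rng.random() < s1 / (s0 + s1)
        hits += query(state)
    # P(F) <- fraction of sampled states in which F is true
    return hits / num_samples
```

For example, with the single feature ¬Smoking ∨ Cancer at weight 1.5, `gibbs_marginal(["Smoking", "Cancer"], [(1.5, lambda s: int((not s["Smoking"]) or s["Cancer"]))], lambda s: s["Cancer"])` should approach the exact marginal obtained by enumerating the four states.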







20

Other Inference Methods

 Belief propagation (sum-product)

 Mean field / Variational approximations









21

MAP/MPE Inference

 Goal: Find most likely state of world given

evidence



$$\max_y P(y \mid x)$$

y: query; x: evidence







22

MAP Inference Algorithms

 Iterated conditional modes

 Simulated annealing

 Graph cuts

 Belief propagation (max-product)

 LP relaxation









23

Overview

 Motivation

 Foundational areas

 Probabilistic inference

 Statistical learning

 Logical inference

 Inductive logic programming

 Markov logic

 NLP applications

 Basics

 Supervised learning

 Unsupervised learning

24

Generative Weight Learning

 Maximize likelihood

 Use gradient ascent or L-BFGS

 No local maxima

$$\frac{\partial}{\partial w_i} \log P_w(x) = n_i(x) - E_w[n_i(x)]$$

n_i(x): no. of times feature i is true in data

E_w[n_i(x)]: expected no. of times feature i is true according to model





 Requires inference at each step (slow!)

25

Pseudo-Likelihood



$$PL(x) = \prod_i P(x_i \mid \text{neighbors}(x_i))$$



 Likelihood of each variable given its

neighbors in the data

 Does not require inference at each step

 Widely used in vision, spatial statistics, etc.

 But PL parameters may not work well for

long inference chains

26

Discriminative Weight Learning

 Maximize conditional likelihood of query (y)

given evidence (x)

$$\frac{\partial}{\partial w_i} \log P_w(y \mid x) = n_i(x, y) - E_w[n_i(x, y)]$$

n_i(x, y): no. of true groundings of clause i in data

E_w[n_i(x, y)]: expected no. of true groundings according to model



 Approximate expected counts by counts in

MAP state of y given x

27

Voted Perceptron

 Originally proposed for training HMMs

discriminatively

 Assumes network is linear chain

 Can be generalized to arbitrary networks



wi ← 0
for t ← 1 to T do
    yMAP ← Viterbi(x)
    wi ← wi + η [counti(yData) – counti(yMAP)]
return Σt wi / T
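
A minimal Python sketch of the update loop above follows. To stay self-contained it replaces Viterbi with a brute-force argmax over all label sequences, which is only workable for toy inputs; `averaged_perceptron`, `feature_counts`, and the argument layout are hypothetical names chosen for illustration.

```python
from itertools import product


def averaged_perceptron(data, feature_counts, labels, num_feats, T=10, eta=1.0):
    """data: list of (x, y_true) pairs, y_true a tuple of labels.
    feature_counts(x, y) returns a list of num_feats feature counts."""
    w = [0.0] * num_feats
    w_sum = [0.0] * num_feats
    for _ in range(T):
        for x, y_true in data:
            # MAP step: brute-force argmax stands in for Viterbi here.
            y_map = max(
                product(labels, repeat=len(x)),
                key=lambda y: sum(wi * c for wi, c in zip(w, feature_counts(x, y))),
            )
            c_true = feature_counts(x, y_true)
            c_map = feature_counts(x, y_map)
            # w_i <- w_i + eta * [count_i(y_data) - count_i(y_MAP)]
            w = [wi + eta * (a - b) for wi, a, b in zip(w, c_true, c_map)]
        w_sum = [s + wi for s, wi in zip(w_sum, w)]
    # Average of the weight vectors reached after each of the T passes.
    return [s / T for s in w_sum]
```

Averaging the weights over iterations, rather than returning the last vector, is what the final line of the pseudocode expresses; it tends to damp oscillation between updates.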

28

Overview

 Motivation

 Foundational areas

 Probabilistic inference

 Statistical learning

 Logical inference

 Inductive logic programming

 Markov logic

 NLP applications

 Basics

 Supervised learning

 Unsupervised learning

29

First-Order Logic

 Constants, variables, functions, predicates

E.g.: Anna, x, MotherOf(x), Friends(x, y)

 Literal: Predicate or its negation

 Clause: Disjunction of literals

 Grounding: Replace all variables by constants

E.g.: Friends (Anna, Bob)

 World (model, interpretation):

Assignment of truth values to all ground

predicates

30

Inference in First-Order Logic

 Traditionally done by theorem proving

(e.g.: Prolog)

 Propositionalization followed by model

checking turns out to be faster (often by a lot)

 Propositionalization:

Create all ground atoms and clauses

 Model checking: Satisfiability testing

 Two main approaches:

 Backtracking (e.g.: DPLL)

 Stochastic local search (e.g.: WalkSAT)

31

Satisfiability

 Input: Set of clauses

(Convert KB to conjunctive normal form (CNF))

 Output: Truth assignment that satisfies all clauses,

or failure

 The paradigmatic NP-complete problem

 Solution: Search

 Key point:

Most SAT problems are actually easy

 Hard region: Narrow range of

#Clauses / #Variables

32

Stochastic Local Search

 Uses complete assignments instead of partial

 Start with random state

 Flip variables in unsatisfied clauses

 Hill-climbing: Minimize # unsatisfied clauses

 Avoid local minima: Random flips

 Multiple restarts







33

The WalkSAT Algorithm

for i ← 1 to max-tries do
    solution = random truth assignment
    for j ← 1 to max-flips do
        if all clauses satisfied then
            return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip variable in c that maximizes # satisfied clauses
return failure
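
The following is a small Python sketch of the WalkSAT loop above. It assumes clauses are given as lists of nonzero integers in the DIMACS convention (k means variable k, -k its negation); that encoding and the function name are choices made here, not part of the slides.

```python
import random


def walksat(clauses, num_vars, max_tries=10, max_flips=1000, p=0.5, seed=0):
    """Return a satisfying assignment {var: bool}, or None on failure."""
    rng = random.Random(seed)

    def satisfied(clause, assign):
        return any(assign[abs(l)] == (l > 0) for l in clause)

    for _ in range(max_tries):
        assign = {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}
        for _ in range(max_flips):
            unsat = [c for c in clauses if not satisfied(c, assign)]
            if not unsat:
                return assign
            clause = rng.choice(unsat)
            if rng.random() < p:
                # Random-walk move: flip a random variable in the clause.
                var = abs(rng.choice(clause))
            else:
                # Greedy move: flip the variable that satisfies most clauses.
                def num_sat_after_flip(v):
                    assign[v] = not assign[v]
                    n = sum(satisfied(c, assign) for c in clauses)
                    assign[v] = not assign[v]
                    return n

                var = max((abs(l) for l in clause), key=num_sat_after_flip)
            assign[var] = not assign[var]
    return None
```

For instance, `walksat([[1, 2], [-1, 3], [-2, -3], [1, -3]], num_vars=3)` searches for an assignment satisfying all four clauses. A `None` result only means the search gave up, not that the formula is unsatisfiable.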





34

Overview

 Motivation

 Foundational areas

 Probabilistic inference

 Statistical learning

 Logical inference

 Inductive logic programming

 Markov logic

 NLP applications

 Basics

 Supervised learning

 Unsupervised learning

35

Rule Induction

 Given: Set of positive and negative examples of

some concept

 Example: (x1, x2, … , xn, y)

 y: concept (Boolean)

 x1, x2, … , xn: attributes (assume Boolean)

 Goal: Induce a set of rules that cover all positive

examples and no negative ones

 Rule: xa ^ xb ^ … ⇒ y (xa: Literal, i.e., xi or its negation)

 Same as Horn clause: Body ⇒ Head

 Rule r covers example x iff x satisfies body of r

 Eval(r): Accuracy, info gain, coverage, support, etc.

36

Learning a Single Rule

head ← y
body ← Ø
repeat
    for each literal x
        rx ← r with x added to body
        Eval(rx)
    body ← body ^ best x
until no x improves Eval(r)
return r

37

Learning a Set of Rules

R ← Ø
S ← examples
repeat
    learn a single rule r
    R ← R ∪ {r}
    S ← S − positive examples covered by r
until S contains no positive examples
return R
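
The two procedures (learning a single rule, then a set of rules) can be sketched together in Python for Boolean attributes. This is an illustrative sketch under stated assumptions: Eval(r) is taken to be the fraction of covered examples that are positive, a literal is a hypothetical `(attribute index, required value)` pair, and the outer loop also stops when a new rule covers no remaining positive example, so it cannot loop forever on inconsistent data.

```python
def covers(body, x):
    """A rule covers x iff x satisfies every literal (index, value) in its body."""
    return all(x[i] == val for i, val in body)


def precision(body, examples):
    """Eval(r): fraction of covered examples that are positive."""
    covered = [y for x, y in examples if covers(body, x)]
    return sum(covered) / len(covered) if covered else 0.0


def learn_rule(examples, num_attrs):
    """Greedily add the best literal until no literal improves Eval."""
    body = []
    while True:
        candidates = [
            body + [(i, val)]
            for i in range(num_attrs)
            for val in (True, False)
            if all(i != j for j, _ in body)
        ]
        best = max(candidates, key=lambda b: precision(b, examples), default=None)
        if best is None or precision(best, examples) <= precision(body, examples):
            return body
        body = best


def learn_rule_set(examples, num_attrs):
    """Sequential covering: learn rules until no positive example remains."""
    rules, remaining = [], list(examples)
    while any(y for _, y in remaining):
        body = learn_rule(remaining, num_attrs)
        if not any(y and covers(body, x) for x, y in remaining):
            break  # no progress on the remaining positives
        rules.append(body)
        remaining = [(x, y) for x, y in remaining if not (y and covers(body, x))]
    return rules
```

Each returned body is read as a conjunction implying the concept y; negative examples stay in the working set so later rules are still penalized for covering them.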



38

First-Order Rule Induction

 y and xi are now predicates with arguments

E.g.: y is Ancestor(x,y), xi is Parent(x,y)

 Literals to add are predicates or their negations

 Literal to add must include at least one variable

already appearing in rule

 Adding a literal changes # groundings of rule

E.g.: Ancestor(x,z) ^ Parent(z,y) ⇒ Ancestor(x,y)

 Eval(r) must take this into account

E.g.: Multiply by # positive groundings of rule

still covered after adding literal

39

Overview

 Motivation

 Foundational areas

 Markov logic

 NLP applications

 Basics

 Supervised learning

 Unsupervised learning





40

Markov Logic

 Syntax: Weighted first-order formulas

 Semantics: Feature templates for Markov

networks

 Intuition: Soften logical constraints

 Give each formula a weight

(Higher weight ⇒ Stronger constraint)

$$P(\text{world}) \propto \exp\Big(\sum \text{weights of formulas it satisfies}\Big)$$

41

Example: Coreference Resolution



Mentions of Obama are often headed by "Obama"

Mentions of Obama are often headed by "President"

Appositions usually refer to the same entity





Barack Obama, the 44th President

of the United States, is the first

African American to hold the office.

……







42

Example: Coreference Resolution

∀x MentionOf(x, Obama) ⇒ Head(x, "Obama")

∀x MentionOf(x, Obama) ⇒ Head(x, "President")

∀x,y,c Apposition(x, y) ∧ MentionOf(x, c) ⇒ MentionOf(y, c)









43

Example: Coreference Resolution

1.5 ∀x MentionOf(x, Obama) ⇒ Head(x, "Obama")

0.8 ∀x MentionOf(x, Obama) ⇒ Head(x, "President")

100 ∀x,y,c Apposition(x, y) ∧ MentionOf(x, c) ⇒ MentionOf(y, c)

Two mention constants: A and B

[Figure: ground Markov network over the atoms Apposition(A,B), Apposition(B,A), Head(A,“President”), Head(B,“President”), MentionOf(A,Obama), MentionOf(B,Obama), Head(A,“Obama”), Head(B,“Obama”)]

45

Markov Logic Networks

 MLN is template for ground Markov nets

 Probability of a world x:

$$P(x) = \frac{1}{Z} \exp\Big(\sum_i w_i n_i(x)\Big)$$

w_i: weight of formula i; n_i(x): no. of true groundings of formula i in x



 Typed variables and constants greatly reduce size of

ground Markov net

 Functions, existential quantifiers, etc.

 Can handle infinite domains [Singla & Domingos, 2007]

and continuous domains [Wang & Domingos, 2008] 46
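
To make the formula for P(x) concrete, the sketch below scores worlds of the coreference example with two mentions A and B by counting true groundings. Several things here are assumptions made for illustration: the evidence values are invented, the third formula is read as Apposition(x,y) ∧ MentionOf(x,c) ⇒ MentionOf(y,c), groundings with x = y are skipped, and Z sums only over the two query atoms with the evidence held fixed.

```python
import math
from itertools import product

MENTIONS = ["A", "B"]

# Hypothetical evidence, chosen only for illustration.
head = {("A", "Obama"): True, ("A", "President"): False,
        ("B", "Obama"): False, ("B", "President"): True}
apposition = {("A", "B"): True, ("B", "A"): False}


def implies(p, q):
    return (not p) or q


def log_weight(mention_of):
    """sum_i w_i * n_i(x), where mention_of[m] is MentionOf(m, Obama)."""
    n1 = sum(implies(mention_of[m], head[(m, "Obama")]) for m in MENTIONS)
    n2 = sum(implies(mention_of[m], head[(m, "President")]) for m in MENTIONS)
    n3 = sum(implies(apposition[(x, y)] and mention_of[x], mention_of[y])
             for x in MENTIONS for y in MENTIONS if x != y)
    return 1.5 * n1 + 0.8 * n2 + 100 * n3


# All worlds over the two query atoms; the evidence stays fixed.
worlds = [dict(zip(MENTIONS, vals)) for vals in product([False, True], repeat=2)]
Z = sum(math.exp(log_weight(w)) for w in worlds)


def prob(world):
    """P(x) = exp(sum_i w_i n_i(x)) / Z."""
    return math.exp(log_weight(world)) / Z
```

With these numbers, the world where A is a mention of Obama but its apposition B is not violates one grounding of the weight-100 formula and receives a vanishingly small probability, which is how a large weight approximates a hard constraint.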

Relation to Statistical Models

 Special cases:

 Markov networks

 Markov random fields

 Bayesian networks

 Log-linear models

 Exponential models

 Max. entropy models

 Gibbs distributions

 Boltzmann machines

 Logistic regression

 Hidden Markov models

 Conditional random fields

 Obtained by making all predicates zero-arity

 Markov logic allows objects to be interdependent (non-i.i.d.)

47

Relation to First-Order Logic

 Infinite weights ⇒ First-order logic

 Satisfiable KB, positive weights ⇒ Satisfying assignments = Modes of distribution

 Markov logic allows contradictions between formulas









48

MLN Algorithms:

The First Three Generations

Problem              First generation          Second generation   Third generation
MAP inference        Weighted satisfiability   Lazy inference      Cutting planes
Marginal inference   Gibbs sampling            MC-SAT              Lifted inference
Weight learning      Pseudo-likelihood         Voted perceptron    Scaled conj. gradient
Structure learning   Inductive logic progr.    ILP + PL (etc.)     Clustering + pathfinding

49

MAP/MPE Inference

 Problem: Find most likely state of world

given evidence



$$\max_y P(y \mid x)$$

y: query; x: evidence







50

MAP/MPE Inference

 Problem: Find most likely state of world

given evidence

$$\max_y \frac{1}{Z_x} \exp\Big(\sum_i w_i n_i(x, y)\Big)$$









51

MAP/MPE Inference

 Problem: Find most likely state of world

given evidence



$$\max_y \sum_i w_i n_i(x, y)$$









52

MAP/MPE Inference

 Problem: Find most likely state of world

given evidence

$$\max_y \sum_i w_i n_i(x, y)$$





 This is just the weighted MaxSAT problem

 Use weighted SAT solver

(e.g., MaxWalkSAT [Kautz et al., 1997] )





53

The MaxWalkSAT Algorithm

for i ← 1 to max-tries do
    solution = random truth assignment
    for j ← 1 to max-flips do
        if Σ weights(sat. clauses) > threshold then
            return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip variable in c that maximizes Σ weights(sat. clauses)
return failure, best solution found
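
A compact Python sketch of MaxWalkSAT, mirroring the pseudocode above, is given below. It assumes positive clause weights, the same integer literal encoding as DIMACS (k for a variable, -k for its negation), and an optional `threshold`; it always returns the best assignment seen together with its satisfied weight. The names are hypothetical.

```python
import random


def maxwalksat(clauses, num_vars, max_tries=5, max_flips=500, p=0.5,
               threshold=None, seed=0):
    """clauses: list of (weight, literals). Returns (best assignment, weight)."""
    rng = random.Random(seed)

    def sat(lits, a):
        return any(a[abs(l)] == (l > 0) for l in lits)

    def sat_weight(a):
        return sum(w for w, lits in clauses if sat(lits, a))

    best, best_w = None, float("-inf")
    for _ in range(max_tries):
        a = {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}
        for _ in range(max_flips):
            cur = sat_weight(a)
            if cur > best_w:
                best, best_w = dict(a), cur
            if threshold is not None and cur > threshold:
                return best, best_w
            unsat = [lits for w, lits in clauses if not sat(lits, a)]
            if not unsat:
                return best, best_w
            lits = rng.choice(unsat)
            if rng.random() < p:
                # Random-walk move.
                var = abs(rng.choice(lits))
            else:
                # Greedy move: maximize total weight of satisfied clauses.
                def weight_after_flip(v):
                    a[v] = not a[v]
                    w = sat_weight(a)
                    a[v] = not a[v]
                    return w

                var = max((abs(l) for l in lits), key=weight_after_flip)
            a[var] = not a[var]
    return best, best_w
```

Because the clauses of an MLN typically conflict, the search rarely reaches a state with no unsatisfied clause; keeping the best state seen is what makes the result usable as an approximate MAP assignment.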

54

Computing Probabilities

 P(Formula|MLN,C) = ?

 MCMC: Sample worlds, check formula holds

 P(Formula1|Formula2,MLN,C) = ?

 If Formula2 = Conjunction of ground atoms

 First construct min subset of network necessary to

answer query (generalization of KBMC)

 Then apply MCMC





55

But … Insufficient for Logic

 Problem:

Deterministic dependencies break MCMC

Near-deterministic ones make it very slow



 Solution:

Combine MCMC and WalkSAT

→ MC-SAT algorithm [Poon & Domingos, 2006]





56

Auxiliary-Variable Methods

 Main ideas:

 Use auxiliary variables to capture dependencies

 Turn difficult sampling into uniform sampling

 Given distribution P(x)

$$f(x, u) = \begin{cases} 1 & \text{if } 0 \le u \le P(x) \\ 0 & \text{otherwise} \end{cases}$$

$$\int f(x, u)\, du = P(x)$$

 Sample from f (x, u), then discard u







57

Slice Sampling [Damien et al. 1999]

[Figure: slice sampling. Axes X and U; curve P(x); the slice at height u(k); successive samples x(k) and x(k+1)]







58

Slice Sampling

 Identifying the slice may be difficult



$$P(x) = \frac{1}{Z} \prod_i \Phi_i(x)$$

 Introduce an auxiliary variable ui for each Фi

$$f(x, u_1, \ldots, u_n) = \begin{cases} 1 & \text{if } 0 \le u_i \le \Phi_i(x) \text{ for all } i \\ 0 & \text{otherwise} \end{cases}$$



59

The MC-SAT Algorithm

 Select random subset M of satisfied clauses

 With probability 1 – exp ( – wi )

 Larger wi ⇒ Ci more likely to be selected

 Hard clause (wi → ∞): Always selected

 Slice ≡ States that satisfy clauses in M

 Uses SAT solver to sample x | u.

 Orders of magnitude faster than Gibbs sampling, etc.
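
The steps above can be sketched on a toy problem as follows. This sketch departs from the real algorithm in one labeled way: instead of calling a SAT solver to sample near-uniformly from the slice, it enumerates all states and picks uniformly among those satisfying M, which is only feasible for a handful of variables. It also assumes positive, finite weights; the function name and clause encoding (integers, negative for negated literals) are hypothetical.

```python
import math
import random
from itertools import product


def mcsat_marginals(clauses, num_vars, num_samples=5000, seed=0):
    """clauses: list of (weight, literals). Returns estimated P(var = True)."""
    rng = random.Random(seed)
    states = list(product([False, True], repeat=num_vars))

    def sat(lits, x):
        return any(x[abs(l) - 1] == (l > 0) for l in lits)

    x = rng.choice(states)
    counts = [0] * num_vars
    for _ in range(num_samples):
        # Select each currently satisfied clause with probability 1 - exp(-w).
        M = [lits for w, lits in clauses
             if sat(lits, x) and rng.random() < 1 - math.exp(-w)]
        # Slice: states satisfying every clause in M (never empty, since the
        # current state x satisfies all of M). Sample uniformly from it.
        slice_states = [s for s in states if all(sat(lits, s) for lits in M)]
        x = rng.choice(slice_states)
        for i, v in enumerate(x):
            counts[i] += v
    return [c / num_samples for c in counts]
```

For the single clause ¬x1 ∨ x2 with weight 1.5, the estimates should be close to the exact marginals from enumerating the four states. Unlike Gibbs sampling, each step can move to any state in the slice, so the chain does not get stuck behind near-deterministic clauses.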





60

But … It Is Not Scalable

 1000 researchers

 Coauthor(x,y): 1 million ground atoms

 Coauthor(x,y) ∧ Coauthor(y,z) ⇒ Coauthor(x,z): 1 billion ground clauses

 Exponential in arity









61

Sparsity to the Rescue

 1000 researchers

 Coauthor(x,y): 1 million ground atoms

But … most atoms are false

 Coauthor(x,y) ∧ Coauthor(y,z) ⇒ Coauthor(x,z): 1 billion ground clauses

Most trivially satisfied if most atoms are false

 No need to explicitly compute most of them





62

Lazy Inference

 LazySAT [Singla & Domingos, 2006a]

 Lazy version of WalkSAT [Selman et al., 1996]

 Grounds atoms/clauses as needed

 Greatly reduces memory usage

 The idea is much more general

[Poon & Domingos, 2008a]









63

General Method for Lazy Inference

 If most variables assume the default value,

wasteful to instantiate all variables / functions

 Main idea:

 Allocate memory for a small subset of

“active” variables / functions

 Activate more if necessary as inference proceeds

 Applicable to a diverse set of algorithms:

Satisfiability solvers (systematic, local-search), Markov chain Monte

Carlo, MPE / MAP algorithms, Maximum expected utility algorithms,

Belief propagation, MC-SAT, Etc.

 Reduce memory and time by orders of magnitude

64

Lifted Inference

 Consider belief propagation (BP)

 Often in large problems, many nodes are

interchangeable:

They send and receive the same messages

throughout BP

 Basic idea: Group them into supernodes,

forming lifted network

 Smaller network → Faster inference

 Akin to resolution in first-order logic

65

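Belief Propagation

[Figure: bipartite graph of nodes (x) and features (f)]

Message from node to feature:
$$\mu_{x \to f}(x) = \prod_{h \in n(x) \setminus \{f\}} \mu_{h \to x}(x)$$

Message from feature to node:
$$\mu_{f \to x}(x) = \sum_{\sim\{x\}} \Big( e^{w f(\mathbf{x})} \prod_{y \in n(f) \setminus \{x\}} \mu_{y \to f}(y) \Big)$$

66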

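Lifted Belief Propagation

[Figure: bipartite graph of nodes (x) and features (f)]

Message from node to feature:
$$\mu_{x \to f}(x) = \prod_{h \in n(x) \setminus \{f\}} \mu_{h \to x}(x)$$

Message from feature to node:
$$\mu_{f \to x}(x) = \sum_{\sim\{x\}} \Big( e^{w f(\mathbf{x})} \prod_{y \in n(f) \setminus \{x\}} \mu_{y \to f}(y) \Big)$$

67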

Lifted Belief Propagation

, :

Functions 

of edge  x  f ( x)     h  x ( x)

counts hn ( x ) \{ f }









Nodes Features

(x) (f)









 wf ( x ) 

 f  x ( x)    e  \{}y f ( y) 



~{ x}  yn ( f ) x  68

Learning

 Data is a relational database

 Closed world assumption (if not: EM)

 Learning parameters (weights)

 Learning structure (formulas)









69

Parameter Learning

 Parameter tying: Groundings of same clause

$$\frac{\partial}{\partial w_i} \log P_w(x) = n_i(x) - E_x[n_i(x)]$$

n_i(x): no. of times clause i is true in data

E_x[n_i(x)]: expected no. of times clause i is true according to MLN





 Generative learning: Pseudo-likelihood

 Discriminative learning: Conditional likelihood,

use MC-SAT or MaxWalkSAT for inference

70

Parameter Learning

 Pseudo-likelihood + L-BFGS is fast and

robust but can give poor inference results

 Voted perceptron:

Gradient descent + MAP inference

 Scaled conjugate gradient









71

Voted Perceptron for MLNs

 HMMs are special case of MLNs

 Replace Viterbi by MaxWalkSAT

 Network can now be arbitrary graph

wi ← 0
for t ← 1 to T do
    yMAP ← MaxWalkSAT(x)
    wi ← wi + η [counti(yData) – counti(yMAP)]
return Σt wi / T

72

Problem: Multiple Modes

 Not alleviated by contrastive divergence

 Alleviated by MC-SAT

 Warm start: Start each MC-SAT run at

previous end state









73

Problem: Extreme Ill-Conditioning







 Solvable by quasi-Newton, conjugate gradient, etc.

 But line searches require exact inference

 Solution: Scaled conjugate gradient

[Lowd & Domingos, 2008]

 Use Hessian to choose step size

 Compute quadratic form inside MC-SAT

 Use inverse diagonal Hessian as preconditioner

74

Structure Learning

 Standard inductive logic programming optimizes

the wrong thing

 But can be used to overgenerate for L1 pruning

 Our approach:

ILP + Pseudo-likelihood + Structure priors

 For each candidate structure change:

Start from current weights & relax convergence

 Use subsampling to compute sufficient statistics





75

Structure Learning

 Initial state: Unit clauses or prototype KB

 Operators: Add/remove literal, flip sign

 Evaluation function:

Pseudo-likelihood + Structure prior

 Search: Beam search, shortest-first search









76

Alchemy

Open-source software including:

 Full first-order logic syntax



 Generative & discriminative weight learning



 Structure learning



 Weighted satisfiability, MCMC, lifted BP



 Programming language features



alchemy.cs.washington.edu

77

                 Alchemy                           Prolog            BUGS
Representation   F.O. logic + Markov nets          Horn clauses      Bayes nets
Inference        Model checking, MCMC, lifted BP   Theorem proving   MCMC
Learning         Parameters & structure            No                Params.
Uncertainty      Yes                               No                Yes
Relational       Yes                               Yes               No

78

Constrained Conditional Model

 Representation: Integer linear programs

Local classifiers + Global constraints

 Inference: LP solver

 Parameter learning: None for constraints

 Weights of soft constraints set heuristically

 Local weights typically learned independently

 Structure learning: None to date

 But see latest development in NAACL-10

79

Running Alchemy

 Programs: Infer, Learnwts, Learnstruct

 Options

 MLN file: Types (optional), Predicates, Formulas

 Database files









80

Overview

 Motivation

 Foundational areas

 Markov logic

 NLP applications

 Basics

 Supervised learning

 Unsupervised learning





81

Uniform Distribn.: Empty MLN

Example: Unbiased coin flips



Type: flip = { 1, … , 20 }

Predicate: Heads(flip)





$$P(\text{Heads}(f)) = \frac{\frac{1}{Z} e^{0}}{\frac{1}{Z} e^{0} + \frac{1}{Z} e^{0}} = \frac{1}{2}$$







82

Binomial Distribn.: Unit Clause

Example: Biased coin flips

Type: flip = { 1, … , 20 }

Predicate: Heads(flip)

Formula: Heads(f)

Weight: Log odds of heads: $w = \log\left(\frac{p}{1-p}\right)$

$$P(\text{Heads}(f)) = \frac{\frac{1}{Z} e^{w}}{\frac{1}{Z} e^{w} + \frac{1}{Z} e^{0}} = \frac{1}{1 + e^{-w}} = p$$







By default, MLN includes unit clauses for all predicates

(captures marginal distributions, etc.)
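
The relationship between the unit-clause weight and the coin bias can be checked numerically. The short sketch below, with hypothetical function names, converts a probability p into the log-odds weight and back through the formula for P(Heads(f)).

```python
import math


def heads_weight(p):
    """Weight of the unit clause Heads(f): the log odds of heads."""
    return math.log(p / (1 - p))


def prob_heads(w):
    """P(Heads(f)) = e^w / (e^w + e^0); the 1/Z factors cancel."""
    return math.exp(w) / (math.exp(w) + math.exp(0))
```

A weight of 0 recovers the unbiased coin of the empty MLN, and `prob_heads(heads_weight(p))` returns p up to floating-point error.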

83

Multinomial Distribution

Example: Throwing die



Types: throw = { 1, … , 20 }

face = { 1, … , 6 }

Predicate: Outcome(throw,face)

Formulas: Outcome(t,f) ^ f != f’ => !Outcome(t,f’).

Exist f Outcome(t,f).



Too cumbersome!







84

Multinomial Distrib.: ! Notation

Example: Throwing die



Types: throw = { 1, … , 20 }

face = { 1, … , 6 }

Predicate: Outcome(throw,face!)

Formulas:



Semantics: Arguments without “!” determine arguments with “!”.

Also makes inference more efficient (triggers blocking).







85

Multinomial Distrib.: + Notation

Example: Throwing biased die



Types: throw = { 1, … , 20 }

face = { 1, … , 6 }

Predicate: Outcome(throw,face!)

Formulas: Outcome(t,+f)



Semantics: Learn weight for each grounding of args with “+”.









86

Logistic Regression (MaxEnt)

Logistic regression: $$\log\left(\frac{P(C=1 \mid F=f)}{P(C=0 \mid F=f)}\right) = a + \sum_i b_i f_i$$

Type: obj = { 1, ... , n }

Query predicate: C(obj)

Evidence predicates: Fi(obj)

Formulas: a C(x)

bi Fi(x) ^ C(x)

Resulting distribution: $$P(C=c, F=f) = \frac{1}{Z} \exp\Big(a c + \sum_i b_i f_i c\Big)$$

Therefore: $$\log\left(\frac{P(C=1 \mid F=f)}{P(C=0 \mid F=f)}\right) = \log\left(\frac{\exp\big(a + \sum_i b_i f_i\big)}{\exp(0)}\right) = a + \sum_i b_i f_i$$



Alternative form: Fi(x) => C(x)
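
The derivation can be verified numerically with a few lines of Python. The sketch below scores C = 0 and C = 1 using the two weighted formulas (a for C(x), b_i for Fi(x) ^ C(x)) and checks that the resulting log odds equal a + Σ b_i f_i; the function names and the example numbers are hypothetical.

```python
import math


def joint_score(c, f, a, b):
    """Exponent a*c + sum_i b_i*f_i*c contributed by the two formulas."""
    return a * c + sum(bi * fi * c for bi, fi in zip(b, f))


def p_c_given_f(f, a, b):
    """P(C=1 | F=f); Z and all terms not involving C cancel."""
    num = math.exp(joint_score(1, f, a, b))
    return num / (num + math.exp(joint_score(0, f, a, b)))


def log_odds(f, a, b):
    p = p_c_given_f(f, a, b)
    return math.log(p / (1 - p))
```

For example, with a = -1.0, b = [2.0, 0.5] and f = [1, 0], `log_odds` returns a value equal to a + b_1 up to rounding, as the derivation predicts.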

87

Hidden Markov Models

obs = { Red, Green, Yellow }

state = { Stop, Drive, Slow }

time = { 0, ..., 100 }



State(state!,time)

Obs(obs!,time)



State(+s,0)

State(+s,t) ^ State(+s',t+1)

Obs(+o,t) ^ State(+s,t)



Sparse HMM:

State(s,t) => State(s1,t+1) v State(s2, t+1) v ... .



88

Bayesian Networks

 Use all binary predicates with same first argument

(the object x).

 One predicate for each variable A: A(x,v!)

 One clause for each line in the CPT and

value of the variable

 Context-specific independence:

One clause for each path in the decision tree

 Logistic regression: As before

 Noisy OR: Deterministic OR + Pairwise clauses



89

Relational Models

 Knowledge-based model construction

 Allow only Horn clauses

 Same as Bayes nets, except arbitrary relations

 Combin. function: Logistic regression, noisy-OR or external

 Stochastic logic programs

 Allow only Horn clauses

 Weight of clause = log(p)

 Add formulas: Head holds ⇒ Exactly one body holds

 Probabilistic relational models

 Allow only binary relations

 Same as Bayes nets, except first argument can vary

90

Relational Models

 Relational Markov networks

 SQL → Datalog → First-order logic

 One clause for each state of a clique

 + syntax in Alchemy facilitates this

 Bayesian logic

 Object = Cluster of similar/related observations

 Observation constants + Object constants

 Predicate InstanceOf(Obs,Obj) and clauses using it

 Unknown relations: Second-order Markov logic

S. Kok & P. Domingos, “Statistical Predicate Invention”, in

Proc. ICML-2007.



91

Overview

 Motivation

 Foundational areas

 Markov logic

 NLP applications

 Basics

 Supervised learning

 Unsupervised learning





92

Text Classification



The 56th quadrennial United States presidential election was held on November 4, 2008. Outgoing Republican President George W. Bush's policies and actions and the American public's desire for change were key issues throughout the campaign. ……

Topic = politics

The Chicago Bulls are an American professional basketball team based in Chicago, Illinois, playing in the Central Division of the Eastern Conference in the National Basketball Association (NBA). ……

Topic = sports

……









93

Text Classification

page = {1, ..., max}

word = { ... }

topic = { ... }



Topic(page,topic)

HasWord(page,word)





Topic(p,t)

HasWord(p,+w) => Topic(p,+t)





If topics mutually exclusive: Topic(page,topic!)





94

Text Classification

page = {1, ..., max}

word = { ... }

topic = { ... }



Topic(page,topic)

HasWord(page,word)

Links(page,page)



Topic(p,t)

HasWord(p,+w) => Topic(p,+t)

Topic(p,t) ^ Links(p,p') => Topic(p',t)





Cf. S. Chakrabarti, B. Dom & P. Indyk, “Hypertext Classification

Using Hyperlinks,” in Proc. SIGMOD-1998.

95

Entity Resolution

AUTHOR: H. POON & P. DOMINGOS

TITLE: UNSUPERVISED SEMANTIC PARSING

VENUE: EMNLP-09

SAME?

AUTHOR: Hoifung Poon and Pedro Domings

TITLE: Unsupervised semantic parsing

VENUE: Proceedings of the 2009 Conference on Empirical Methods in

Natural Language Processing

AUTHOR: Poon, Hoifung and Domings, Pedro

TITLE: Unsupervised ontology induction from text

VENUE: Proceedings of the Forty-Eighth Annual Meeting of the

Association for Computational Linguistics

SAME?

AUTHOR: H. Poon, P. Domings

TITLE: Unsupervised ontology induction

VENUE: ACL-10 96

Entity Resolution

Problem: Given database, find duplicate records



HasToken(token,field,record)

SameField(field,record,record)

SameRecord(record,record)



HasToken(+t,+f,r) ^ HasToken(+t,+f,r’)

=> SameField(f,r,r’)

SameField(f,r,r’) => SameRecord(r,r’)









97

Entity Resolution

Problem: Given database, find duplicate records



HasToken(token,field,record)

SameField(field,record,record)

SameRecord(record,record)



HasToken(+t,+f,r) ^ HasToken(+t,+f,r’)

=> SameField(f,r,r’)

SameField(f,r,r’) => SameRecord(r,r’)

SameRecord(r,r’) ^ SameRecord(r’,r”)

=> SameRecord(r,r”)





Cf. A. McCallum & B. Wellner, “Conditional Models of Identity Uncertainty

with Application to Noun Coreference,” in Adv. NIPS 17, 2005.

98

Entity Resolution

Can also resolve fields:



HasToken(token,field,record)

SameField(field,record,record)

SameRecord(record,record)



HasToken(+t,+f,r) ^ HasToken(+t,+f,r’)

=> SameField(f,r,r’)

SameField(f,r,r’) <=> SameRecord(r,r’)

SameRecord(r,r’) ^ SameRecord(r’,r”)

=> SameRecord(r,r”)

SameField(f,r,r’) ^ SameField(f,r’,r”)

=> SameField(f,r,r”)



More: P. Singla & P. Domingos, “Entity Resolution with Markov

Logic”, in Proc. ICDM-2006. 99

Information Extraction



Unsupervised Semantic Parsing, Hoifung Poon and Pedro Domingos.

Proceedings of the 2009 Conference on Empirical Methods in Natural

Language Processing. Singapore: ACL.









UNSUPERVISED SEMANTIC PARSING. H. POON & P. DOMINGOS.

EMNLP-2009.







100

Information Extraction

Author Title Venue



Unsupervised Semantic Parsing, Hoifung Poon and Pedro Domingos.

Proceedings of the 2009 Conference on Empirical Methods in Natural

Language Processing. Singapore: ACL.





SAME?





UNSUPERVISED SEMANTIC PARSING. H. POON & P. DOMINGOS.

EMNLP-2009.







101

Information Extraction

 Problem: Extract database from text or

semi-structured sources

 Example: Extract database of publications

from citation list(s) (the “CiteSeer problem”)

 Two steps:

 Segmentation:

Use HMM to assign tokens to fields

 Entity resolution:

Use logistic regression and transitivity

102

Information Extraction

Token(token, position, citation)

InField(position, field!, citation)

SameField(field, citation, citation)

SameCit(citation, citation)



Token(+t,i,c) => InField(i,+f,c)

InField(i,+f,c) ^ InField(i+1,+f,c)





Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’)

^ InField(i’,+f,c’) => SameField(+f,c,c’)

SameField(+f,c,c’) <=> SameCit(c,c’)

SameField(f,c,c’) ^ SameField(f,c’,c”) => SameField(f,c,c”)

SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”)









103

Information Extraction

Token(token, position, citation)

InField(position, field!, citation)

SameField(field, citation, citation)

SameCit(citation, citation)



Token(+t,i,c) => InField(i,+f,c)

InField(i,+f,c) ^ !Token(“.”,i,c) ^ InField(i+1,+f,c)





Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’)

^ InField(i’,+f,c’) => SameField(+f,c,c’)

SameField(+f,c,c’) <=> SameCit(c,c’)

SameField(f,c,c’) ^ SameField(f,c’,c”) => SameField(f,c,c”)

SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”)



More: H. Poon & P. Domingos, “Joint Inference in Information

Extraction”, in Proc. AAAI-2007.

104

Biomedical Text Mining

 Traditionally, named entity recognition or information extraction

E.g., protein recognition, protein-protein identification

 BioNLP-09 shared task: Nested bio-events

 Much harder than traditional IE

 Top F1 around 50%

 Naturally calls for joint inference







105

Bio-Event Extraction

Involvement of p70(S6)-kinase activation in IL-10

up-regulation in human monocytes by gp41 envelope

protein of human immunodeficiency virus type 1 ...

involvement

Theme Cause



up-regulation activation

Theme Cause Site Theme



human

IL-10 gp41 p70(S6)-kinase

monocyte

106

Bio-Event Extraction

Token(position, token)
DepEdge(position, position, dependency)
IsProtein(position)
EvtType(position, evtType)
InArgPath(position, position, argType!)

Logistic regression:

Token(i,+w) => EvtType(i,+t)
Token(j,w) ^ DepEdge(i,j,+d) => EvtType(i,+t)
DepEdge(i,j,+d) => InArgPath(i,j,+a)
Token(i,+w) ^ DepEdge(i,j,+d) => InArgPath(i,j,+a)











107

Bio-Event Extraction

Token(position, token)

DepEdge(position, position, dependency)

IsProtein(position)

EvtType(position, evtType)

InArgPath(position, position, argType!)



Token(i,+w) => EvtType(i,+t)

Token(j,w) ^ DepEdge(i,j,+d) => EvtType(i,+t)

DepEdge(i,j,+d) => InArgPath(i,j,+a)

Token(i,+w) ^ DepEdge(i,j,+d) => InArgPath(i,j,+a)

…

Adding a few joint inference rules doubles the F1:

InArgPath(i,j,Theme) => IsProtein(j) v

(Exist k k!=i ^ InArgPath(j, k, Theme)).
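
The joint inference rule above can be read as a consistency check on predicted Theme arguments. The Python sketch below is only an illustration under stated assumptions: it treats the rule as a hard constraint (in an MLN it would normally carry a weight), it uses strings in place of token positions, and the function name `theme_violations` and the protein set are hypothetical.

```python
def theme_violations(theme_edges, proteins):
    """Return Theme edges (i, j) that break the rule
    InArgPath(i,j,Theme) => IsProtein(j) v (Exist k k!=i ^ InArgPath(j,k,Theme)).
    """
    violations = []
    for i, j in theme_edges:
        # Does j have a Theme argument k of its own, with k != i?
        has_own_theme = any(src == j and k != i for src, k in theme_edges)
        if j not in proteins and not has_own_theme:
            violations.append((i, j))
    return violations


# Theme edges from the example sentence (strings stand in for positions).
edges = [
    ("involvement", "up-regulation"),
    ("up-regulation", "IL-10"),
    ("activation", "p70(S6)-kinase"),
]
# Assumed protein annotations for this illustration.
proteins = {"IL-10", "gp41", "p70(S6)-kinase"}

consistent = theme_violations(edges, proteins)
# Dropping the Theme of "up-regulation" leaves the first edge unsupported.
broken = theme_violations(edges[:1], proteins)
```

With the full edge list no violation is reported; once "up-regulation" loses its own Theme, the edge from "involvement" violates the rule, which is the kind of dependency between event and argument predictions that the joint formula captures.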





More: H. Poon and L. Vanderwende, “Joint Inference for Knowledge

Extraction from Biomedical Literature”, 10:40 am, June 4, Gold Room.

108

Temporal Information Extraction

 Identify event times and temporal relations

(BEFORE, AFTER, OVERLAP)

 E.g., who is the President of U.S.A.?

 Obama: 1/20/2009 → present

 G. W. Bush: 1/20/2001 → 1/19/2009

 Etc.









109

Temporal Information Extraction

DepEdge(position, position, dependency)

Event(position, event)

After(event, event)





DepEdge(i,j,+d) ^ Event(i,p) ^ Event(j,q) => After(p,q)





After(p,q) ^ After(q,r) => After(p,r)









110

Temporal Information Extraction

DepEdge(position, position, dependency)

Event(position, event)

After(event, event)

Role(position, position, role)



DepEdge(i,j,+d) ^ Event(i,p) ^ Event(j,q) => After(p,q)

Role(i,j,ROLE-AFTER) ^ Event(i,p) ^ Event(j,q) => After(p,q)



After(p,q) ^ After(q,r) => After(p,r)
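
As a rough illustration of what the transitivity formula contributes, the following Python sketch closes a set of After pairs under After(p,q) ^ After(q,r) => After(p,r). It assumes the rule is applied as a hard constraint, whereas in an MLN it would typically be weighted and resolved by probabilistic inference; the event names and the helper `transitive_closure` are hypothetical.

```python
from itertools import product


def transitive_closure(after):
    """Close a set of (p, q) pairs under After(p,q) ^ After(q,r) => After(p,r)."""
    closed = set(after)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (p, q), (q2, r) in product(list(closed), repeat=2):
            if q == q2 and (p, r) not in closed:
                closed.add((p, r))
                changed = True
    return closed


# Two locally predicted relations (hypothetical events e1, e2, e3).
local = {("e3", "e2"), ("e2", "e1")}
closed = transitive_closure(local)
```

The closure adds After(e3, e1), a relation that no local dependency-edge formula predicted directly.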



More:

K. Yoshikawa, S. Riedel, M. Asahara and Y. Matsumoto, “Jointly

Identifying Temporal Relations with Markov Logic”, in Proc. ACL-2009.



X. Ling & D. Weld, “Temporal Information Extraction”, in Proc. AAAI-2010.



111

Semantic Role Labeling

 Problem: Identify arguments for a predicate

 Two steps:

 Argument identification:

Determine whether a phrase is an argument

 Role classification:

Determine the type of an argument (agent, theme,

temporal, adjunct, etc.)







112

Semantic Role Labeling

Token(position, token)

DepPath(position, position, path)

IsPredicate(position)

Role(position, position, role!)

HasRole(position, position)





Token(i,+t) => IsPredicate(i)

DepPath(i,j,+p) => Role(i,j,+r)





HasRole(i,j) => IsPredicate(i)

IsPredicate(i) => Exist j HasRole(i,j)

HasRole(i,j) => Exist r Role(i,j,r)

Role(i,j,r) => HasRole(i,j)
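
The four structural formulas above can be sketched as a consistency check over a candidate labeling. This is a minimal Python illustration, assuming the formulas act as hard constraints; the function name `srl_violations` and the data layout (a dict from position pairs to roles, which also enforces the one-role-per-pair reading of `role!`) are assumptions made for the example.

```python
def srl_violations(predicates, has_role, role):
    """Check a labeling against the four structural formulas.

    predicates: set of positions i with IsPredicate(i)
    has_role:   set of (i, j) pairs with HasRole(i, j)
    role:       dict mapping (i, j) to a single role r
    """
    problems = []
    for i, j in sorted(has_role):
        if i not in predicates:  # HasRole(i,j) => IsPredicate(i)
            problems.append(f"HasRole({i},{j}) but not IsPredicate({i})")
        if (i, j) not in role:  # HasRole(i,j) => Exist r Role(i,j,r)
            problems.append(f"HasRole({i},{j}) has no Role")
    for i in sorted(predicates):  # IsPredicate(i) => Exist j HasRole(i,j)
        if not any(p == i for p, _ in has_role):
            problems.append(f"IsPredicate({i}) has no argument")
    for (i, j), r in sorted(role.items()):  # Role(i,j,r) => HasRole(i,j)
        if (i, j) not in has_role:
            problems.append(f"Role({i},{j},{r}) without HasRole({i},{j})")
    return problems


# Positions for "IL-4 induces CD11B": 0 = IL-4, 1 = induces, 2 = CD11B.
ok = srl_violations({1}, {(1, 0), (1, 2)}, {(1, 0): "agent", (1, 2): "theme"})
bad = srl_violations({1}, {(1, 0)}, {(1, 0): "agent", (3, 2): "theme"})
```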



Cf. K. Toutanova, A. Haghighi, C. Manning, “A global joint model for

semantic role labeling”, in Computational Linguistics 2008.

113

Joint Semantic Role Labeling

and Word Sense Disambiguation

Token(position, token)

DepPath(position, position, path)

IsPredicate(position)

Role(position, position, role!)

HasRole(position, position)

Sense(position, sense!)



Token(i,+t) => IsPredicate(i)

DepPath(i,j,+p) => Role(i,j,+r)

Sense(i,s) => IsPredicate(i)



HasRole(i,j) => IsPredicate(i)

IsPredicate(i) => Exist j HasRole(i,j)

HasRole(i,j) => Exist r Role(i,j,r)

Role(i,j,r) => HasRole(i,j)

Token(i,+t) ^ Role(i,j,+r) => Sense(i,+s)



More: I. Meza-Ruiz & S. Riedel, “Jointly Identifying Predicates,

Arguments and Senses using Markov Logic”, in Proc. NAACL-2009.

114

Practical Tips: Modeling

 Add all unit clauses (the default)

 How to handle uncertain data:

R(x,y) ^ R’(x,y) (the “HMM trick”)

 Implications vs. conjunctions

For soft correlation, conjunctions often better

 Implication: A => B is equivalent to !(A ^ !B)

 Shares satisfying cases with other formulas such as A => C

 Makes learning unnecessarily harder
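
One way to see the point about shared cases is to enumerate truth assignments. The Python sketch below is an illustration, not part of the original tutorial: it counts the worlds over three propositions A, B, C that satisfy each formula and how many of them two formulas have in common.

```python
from itertools import product


def satisfying_worlds(formula):
    """Return the set of (a, b, c) truth assignments that satisfy the formula."""
    return {w for w in product([False, True], repeat=3) if formula(*w)}


implies_ab = satisfying_worlds(lambda a, b, c: (not a) or b)  # A => B
implies_ac = satisfying_worlds(lambda a, b, c: (not a) or c)  # A => C
conj_ab = satisfying_worlds(lambda a, b, c: a and b)          # A ^ B
conj_ac = satisfying_worlds(lambda a, b, c: a and c)          # A ^ C

# A => B is the same formula as !(A ^ !B).
same_as_negation = implies_ab == satisfying_worlds(lambda a, b, c: not (a and not b))

shared_implications = implies_ab & implies_ac
shared_conjunctions = conj_ab & conj_ac
```

Each implication holds in 6 of the 8 worlds, and the two implications share 5 of them (every world where A is false satisfies both). Each conjunction holds in only 2 worlds and the two share just 1, so their weights are tied to more distinct evidence.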





115

Practical Tips: Efficiency

 Open/closed world assumptions

 Low clause arities

 Low numbers of constants

 Short inference chains









116

Practical Tips: Development

 Start with easy components

 Gradually expand to full task

 Use the simplest MLN that works

 Cycle: Add/delete formulas, learn and test









117

Overview

 Motivation

 Foundational areas

 Markov logic

 NLP applications

 Basics

 Supervised learning

 Unsupervised learning





118

Unsupervised Learning: Why?

 Virtually unlimited supply of unlabeled text

 Labeling is expensive (Cf. Penn-Treebank)

 Often difficult to label with consistency and

high quality (e.g., semantic parses)

 Emerging field: Machine reading

Extract knowledge from unstructured text with

high precision/recall and minimal human effort

Check out LBR-Workshop (WS9) on Sunday

119

Unsupervised Learning: How?

 I.i.d. learning: Sophisticated model requires

more labeled data

 Statistical relational learning: Sophisticated

model may require less labeled data

 Relational dependencies constrain problem space

 One formula is worth a thousand labels

 Small amount of domain knowledge 

large-scale joint inference



120

Unsupervised Learning: How?

 Ambiguities vary among objects

 Joint inference → Propagate information from unambiguous objects to ambiguous ones

 E.g.:

G. W. Bush …

He …

Mrs. Bush …

Are they coreferent?

121

Unsupervised Learning: How

 Ambiguities vary among objects

 Joint inference → Propagate information from unambiguous objects to ambiguous ones

 E.g.:

G. W. Bush …

He …

Mrs. Bush …

Should be coreferent

122

Unsupervised Learning: How

 Ambiguities vary among objects

 Joint inference → Propagate information from unambiguous objects to ambiguous ones

 E.g.:

G. W. Bush …

He …

Mrs. Bush …

So must be singular male!

123

Unsupervised Learning: How

 Ambiguities vary among objects

 Joint inference → Propagate information from unambiguous objects to ambiguous ones

 E.g.:

G. W. Bush …

He …

Mrs. Bush …

Must be singular female!

124

Unsupervised Learning: How

 Ambiguities vary among objects

 Joint inference → Propagate information from unambiguous objects to ambiguous ones

 E.g.:

G. W. Bush …

He …

Mrs. Bush …

Verdict: Not coreferent!
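
The propagation in this example can be mimicked with a deterministic toy in Python. It is only a sketch of the reasoning pattern, not MLN inference: attribute values are assumed to be known exactly for "He" and "Mrs. Bush", and the helper `compatible` is hypothetical.

```python
def compatible(a, b):
    """Two attribute dicts agree if no attribute known on both sides differs."""
    return all(a[key] == b[key] for key in a.keys() & b.keys())


mentions = {
    "G. W. Bush": {},  # gender and number not known from the string alone
    "He": {"gender": "male", "number": "singular"},
    "Mrs. Bush": {"gender": "female", "number": "singular"},
}

# In isolation, nothing rules out linking "G. W. Bush" and "Mrs. Bush".
isolated = compatible(mentions["G. W. Bush"], mentions["Mrs. Bush"])

# Linking "G. W. Bush" with "He" propagates the pronoun's attributes to the entity.
entity = {**mentions["G. W. Bush"], **mentions["He"]}

# With the propagated attributes, "Mrs. Bush" no longer fits the entity.
verdict = compatible(entity, mentions["Mrs. Bush"])
```

Here `isolated` is true while `verdict` is false: the decision about "Mrs. Bush" changes only because information flowed from the unambiguous pronoun to the ambiguous name.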

125

Parameter Learning

 Marginalize out hidden variables



$$\frac{\partial}{\partial w_i} \log P_w(x) = E_{z|x}\left[n_i(x,z)\right] - E_{x,z}\left[n_i(x,z)\right]$$

First term: sum over z, conditioned on observed x

Second term: summed over both x and z





 Use MC-SAT to approximate both expectations

 May also combine with contrastive estimation

[Poon & Cherry & Toutanova, NAACL-2009]
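
To make the gradient concrete, here is a small Python sketch that computes both expectations by exact enumeration for a toy model with one observed binary variable x and one hidden binary variable z. The two feature counts and the weights are made up for illustration, and exact enumeration stands in for the MC-SAT approximation mentioned above, which is what one would need at realistic scale.

```python
import math
from itertools import product


def features(x, z):
    # Two hypothetical formula counts n_i(x, z).
    return (float(x == z), float(z))


def score(w, x, z):
    return sum(wi * ni for wi, ni in zip(w, features(x, z)))


def log_marginal(w, x):
    """log P_w(x), with the hidden z summed out."""
    numerator = sum(math.exp(score(w, x, z)) for z in (0, 1))
    partition = sum(math.exp(score(w, xx, z)) for xx, z in product((0, 1), repeat=2))
    return math.log(numerator) - math.log(partition)


def gradient(w, x):
    """E_{z|x}[n_i(x,z)] - E_{x,z}[n_i(x,z)] for every weight w_i."""
    cond = [(z, math.exp(score(w, x, z))) for z in (0, 1)]
    cond_total = sum(p for _, p in cond)
    joint = [((xx, z), math.exp(score(w, xx, z))) for xx, z in product((0, 1), repeat=2)]
    joint_total = sum(p for _, p in joint)
    grad = []
    for i in range(len(w)):
        e_cond = sum(p / cond_total * features(x, z)[i] for z, p in cond)
        e_joint = sum(p / joint_total * features(xx, z)[i] for (xx, z), p in joint)
        grad.append(e_cond - e_joint)
    return grad


w = [0.5, -0.3]
grad = gradient(w, x=1)
```

A finite-difference check on `log_marginal` is a convenient way to confirm that the two-expectation form matches the derivative of the marginal log-likelihood.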

126

Unsupervised Coreference Resolution

Head(mention, string)

Type(mention, type)

MentionOf(mention, entity)

Mixture model:

MentionOf(+m,+e)

Type(+m,+t)

Head(+m,+h) ^ MentionOf(+m,+e)

Joint inference formulas: Enforce agreement

MentionOf(a,e) ^ MentionOf(b,e) => (Type(a,t) <=> Type(b,t))

… (similarly for Number, Gender etc.)









127

Unsupervised Coreference Resolution

Head(mention, string)

Type(mention, type)

MentionOf(mention, entity)

Apposition(mention, mention)



MentionOf(+m,+e)

Type(+m,+t)

Head(+m,+h) ^ MentionOf(+m,+e)

MentionOf(a,e) ^ MentionOf(b,e) => (Type(a,t) <=> Type(b,t))

… (similarly for Number, Gender etc.)

Joint inference formulas: Leverage apposition

Apposition(a,b) => (MentionOf(a,e) <=> MentionOf(b,e))



More: H. Poon and P. Domingos, “Joint Unsupervised Coreference

Resolution with Markov Logic”, in Proc. EMNLP-2008.

128

Relational Clustering:

Discover Unknown Predicates

 Cluster relations along with objects

 Use second-order Markov logic

[Kok & Domingos, 2007, 2008]

 Key idea: Cluster combination determines

likelihood of relations

InClust(r,+c) ^ InClust(x,+a) ^ InClust(y,+b)

=> r(x,y)

 Input: Relational tuples extracted by

TextRunner [Banko et al., 2007]

 Output: Semantic network

129

Recursive Relational Clustering

 Unsupervised semantic parsing

[Poon & Domingos, EMNLP-2009]

 Text → Knowledge

 Start directly from text

 Identify meaning units + Resolve variations

 Use high-order Markov logic (variables over

arbitrary lambda forms and their clusters)

 End-to-end machine reading:

Read text, then answer questions

130

Semantic Parsing

IL-4 protein induces CD11b

INDUCE(e1) ^ INDUCER(e1,e2) ^ INDUCED(e1,e3) ^ IL-4(e2) ^ CD11B(e3)

Structured prediction: Partition + Assignment

[Figure: the dependency tree of the sentence (induces with nsubj protein and dobj CD11b; protein with nn IL-4) is partitioned into parts, which are assigned to INDUCE, INDUCER, INDUCED, IL-4, and CD11B.]

131

Challenge:

Same Meaning, Many Variations

IL-4 up-regulates CD11b

Protein IL-4 enhances the expression of CD11b

CD11b expression is induced by IL-4 protein

The cytokine interleukin-4 induces CD11b expression

IL-4’s up-regulation of CD11b, …

……







132

Unsupervised Semantic Parsing

 USP → Recursively cluster arbitrary expressions composed with / by similar expressions

IL-4 induces CD11b

Protein IL-4 enhances the expression of CD11b

CD11b expression is enhanced by IL-4 protein

The cytokine interleukin-4 induces CD11b expression

IL-4’s up-regulation of CD11b, …









133

Unsupervised Semantic Parsing

 USP → Recursively cluster arbitrary expressions composed with / by similar expressions

IL-4 induces CD11b

Protein IL-4 enhances the expression of CD11b

CD11b expression is enhanced by IL-4 protein

The cytokine interleukin-4 induces CD11b expression

IL-4’s up-regulation of CD11b, …



Cluster same forms at the atom level



134

Unsupervised Semantic Parsing

 USP → Recursively cluster arbitrary expressions composed with / by similar expressions

IL-4 induces CD11b

Protein IL-4 enhances the expression of CD11b

CD11b expression is enhanced by IL-4 protein

The cytokine interleukin-4 induces CD11b expression

IL-4’s up-regulation of CD11b, …



Cluster forms in composition with same forms



135


Unsupervised Semantic Parsing

 Exponential prior on number of parameters

 Event/object/property cluster mixtures:

InClust(e,+c) ^ HasValue(e,+v)



Object/Event Cluster: INDUCE

induces 0.1, enhances 0.4, …

Property Cluster: INDUCER

nsubj 0.5, agent 0.4

IL-4 0.2, IL-8 0.1

None 0.1, One 0.8

139

But … State Space Too Large

 Coreference: #-clusters ≤ #-mentions

 USP: #-clusters ≥ exp(#-tokens)



 Also, meaning units often small and

many singleton clusters

 Use combinatorial search









140

Inference: Hill-Climb Probability

[Figure: Initialize with the dependency tree of "IL-4 protein induces CD11B", each node carrying an unknown (?) cluster assignment. Search operator: lambda reduction, shown combining "protein" and its nn dependent "IL-4" into one unit.]

141

Learning: Hill-Climb Likelihood



Initialize: induces 1, enhances 1, IL-4 1, protein 1, …

Search operators:

MERGE: (induces 1) + (enhances 1) → (induces 0.2, enhances 0.8)

COMPOSE: (IL-4 1) + (protein 1) → (IL-4 protein 1)
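
The MERGE step can be sketched in a few lines of Python. This is a simplified illustration rather than the USP implementation: it assumes each cluster stores raw counts of its forms and that the probabilities shown are relative frequencies; the counts (2 and 8) are invented so that the result matches the 0.2 / 0.8 split in the example.

```python
from collections import Counter


def merge(cluster_a, cluster_b):
    """MERGE two clusters by pooling the counts of their forms."""
    return cluster_a + cluster_b


def distribution(cluster):
    """Relative frequency of each form within a cluster."""
    total = sum(cluster.values())
    return {form: count / total for form, count in cluster.items()}


# Initially each form sits alone in its own cluster with probability 1.
induces = Counter({"induces": 2})    # hypothetical count
enhances = Counter({"enhances": 8})  # hypothetical count

before = distribution(induces)
after = distribution(merge(induces, enhances))
```

In a hill-climbing learner, a merge such as this would be kept only if it increases the objective (the likelihood combined with the prior on the number of parameters); that scoring step is omitted here.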









142

Unsupervised Ontology Induction

 Limitations of USP:

 No ISA hierarchy among clusters

 Little smoothing

 Limited capability to generalize

 OntoUSP [Poon & Domingos, ACL-2010]

 Extends USP to also induce ISA hierarchy

 Joint approach for ontology induction, population,

and knowledge extraction

 To appear in ACL (see you in Uppsala :-)

143

OntoUSP

 Modify the cluster mixture formula

InClust(e,c) ^ ISA(c,+d) ^ HasValue(e,+v)

 Hierarchical smoothing + clustering

 New operator in learning: ABSTRACTION

[Figure: ABSTRACTION example. Before: separate clusters INDUCE (induces 0.6, up-regulates 0.2) and INHIBIT (inhibits 0.4, suppresses 0.2). After: a new parent cluster (induces 0.3, enhances 0.1, inhibits 0.2, suppresses 0.1), annotated "MERGE with REGULATE?", with ISA links from INDUCE and INHIBIT.]

144

End of The Beginning …

 Not merely a user guide of MLN and Alchemy

 Statistical relational learning:

Growth area for machine learning and NLP









145

Future Work: Inference

 Scale up inference

 Cutting-planes methods (e.g., [Riedel, 2008])

 Unify lifted inference with sampling

 Coarse-to-fine inference

 Alternative technology

E.g., linear programming, Lagrangian relaxation









146

Future Work: Supervised Learning



 Alternative optimization objectives

E.g., max-margin learning [Huynh & Mooney, 2009]

 Learning for efficient inference

E.g., learning arithmetic circuits [Lowd & Domingos, 2008]

 Structure learning:

Improve accuracy and scalability

E.g., [Kok & Domingos, 2009]





147

Future Work: Unsupervised Learning



 Model: Learning objective, formalism, etc.

 Learning: Local optima, intractability, etc.

 Hyperparameter tuning

 Leverage available resources

 Semi-supervised learning

 Multi-task learning

 Transfer learning (e.g., domain adaptation)

 Human in the loop

E.g., interactive ML, active learning, crowdsourcing

148

Future Work: NLP Applications

 Existing application areas:

 More joint inference opportunities

 Additional domain knowledge

 Combine multiple pipeline stages

 A “killer app”: Machine reading

 Many, many more awaiting YOU to discover







149

Summary

 We need to unify logical and statistical NLP

 Markov logic provides a language for this

 Syntax: Weighted first-order formulas

 Semantics: Feature templates of Markov nets

 Inference: Satisfiability, MCMC, lifted BP, etc.

 Learning: Pseudo-likelihood, VP, PSCG, ILP, etc.

 Growing set of NLP applications

 Open-source software: Alchemy

alchemy.cs.washington.edu

 Book: Domingos & Lowd, Markov Logic,

Morgan & Claypool, 2009.

150

References

[Banko et al., 2007] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, Oren Etzioni, "Open Information Extraction From the Web", in Proc. IJCAI-2007.

[Chakrabarti et al., 1998] Soumen Chakrabarti, Byron Dom, Piotr Indyk, "Hypertext Classification Using Hyperlinks", in Proc. SIGMOD-1998.

[Damien et al., 1999] Paul Damien, Jon Wakefield, Stephen Walker, "Gibbs sampling for Bayesian non-conjugate and hierarchical models by auxiliary variables", Journal of the Royal Statistical Society B, 61:2.

[Domingos & Lowd, 2009] Pedro Domingos and Daniel Lowd, Markov Logic, Morgan & Claypool.

[Friedman et al., 1999] Nir Friedman, Lise Getoor, Daphne Koller, Avi Pfeffer, "Learning probabilistic relational models", in Proc. IJCAI-1999.

[Halpern, 1990] Joe Halpern, "An analysis of first-order logics of probability", Artificial Intelligence 46.

[Huynh & Mooney, 2009] Tuyen Huynh and Raymond Mooney, "Max-Margin Weight Learning for Markov Logic Networks", in Proc. ECML-2009.

[Kautz et al., 1997] Henry Kautz, Bart Selman, Yuejun Jiang, "A general stochastic approach to solving problems with hard and soft constraints", in The Satisfiability Problem: Theory and Applications. AMS.

[Kok & Domingos, 2007] Stanley Kok and Pedro Domingos, "Statistical Predicate Invention", in Proc. ICML-2007.

[Kok & Domingos, 2008] Stanley Kok and Pedro Domingos, "Extracting Semantic Networks from Text via Relational Clustering", in Proc. ECML-2008.

[Kok & Domingos, 2009] Stanley Kok and Pedro Domingos, "Learning Markov Logic Network Structure via Hypergraph Lifting", in Proc. ICML-2009.

[Ling & Weld, 2010] Xiao Ling and Daniel S. Weld, "Temporal Information Extraction", in Proc. AAAI-2010.

[Lowd & Domingos, 2007] Daniel Lowd and Pedro Domingos, "Efficient Weight Learning for Markov Logic Networks", in Proc. PKDD-2007.

[Lowd & Domingos, 2008] Daniel Lowd and Pedro Domingos, "Learning Arithmetic Circuits", in Proc. UAI-2008.

[Meza-Ruiz & Riedel, 2009] Ivan Meza-Ruiz and Sebastian Riedel, "Jointly Identifying Predicates, Arguments and Senses using Markov Logic", in Proc. NAACL-2009.

[Muggleton, 1996] Stephen Muggleton, "Stochastic logic programs", in Proc. ILP-1996.

[Nilsson, 1986] Nils Nilsson, "Probabilistic logic", Artificial Intelligence 28.

[Page et al., 1998] Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd, "The PageRank Citation Ranking: Bringing Order to the Web", Tech. Rept., Stanford University, 1998.

[Poon & Domingos, 2006] Hoifung Poon and Pedro Domingos, "Sound and Efficient Inference with Probabilistic and Deterministic Dependencies", in Proc. AAAI-2006.

[Poon & Domingos, 2007] Hoifung Poon and Pedro Domingos, "Joint Inference in Information Extraction", in Proc. AAAI-2007.

[Poon & Domingos, 2008a] Hoifung Poon, Pedro Domingos, Marc Sumner, "A General Method for Reducing the Complexity of Relational Inference and its Application to MCMC", in Proc. AAAI-2008.

[Poon & Domingos, 2008b] Hoifung Poon and Pedro Domingos, "Joint Unsupervised Coreference Resolution with Markov Logic", in Proc. EMNLP-2008.

[Poon & Domingos, 2009] Hoifung Poon and Pedro Domingos, "Unsupervised Semantic Parsing", in Proc. EMNLP-2009.

[Poon & Cherry & Toutanova, 2009] Hoifung Poon, Colin Cherry, Kristina Toutanova, "Unsupervised Morphological Segmentation with Log-Linear Models", in Proc. NAACL-2009.

[Poon & Vanderwende, 2010] Hoifung Poon and Lucy Vanderwende, "Joint Inference for Knowledge Extraction from Biomedical Literature", in Proc. NAACL-2010.

[Poon & Domingos, 2010] Hoifung Poon and Pedro Domingos, "Unsupervised Ontology Induction From Text", in Proc. ACL-2010.

[Riedel, 2008] Sebastian Riedel, "Improving the Accuracy and Efficiency of MAP Inference for Markov Logic", in Proc. UAI-2008.

[Riedel et al., 2009] Sebastian Riedel, Hong-Woo Chun, Toshihisa Takagi and Jun'ichi Tsujii, "A Markov Logic Approach to Bio-Molecular Event Extraction", in Proc. BioNLP 2009 Shared Task.

[Selman et al., 1996] Bart Selman, Henry Kautz, Bram Cohen, "Local search strategies for satisfiability testing", in Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge. AMS.

[Singla & Domingos, 2006a] Parag Singla and Pedro Domingos, "Memory-Efficient Inference in Relational Domains", in Proc. AAAI-2006.

[Singla & Domingos, 2006b] Parag Singla and Pedro Domingos, "Entity Resolution with Markov Logic", in Proc. ICDM-2006.

[Singla & Domingos, 2007] Parag Singla and Pedro Domingos, "Markov Logic in Infinite Domains", in Proc. UAI-2007.

[Singla & Domingos, 2008] Parag Singla and Pedro Domingos, "Lifted First-Order Belief Propagation", in Proc. AAAI-2008.

[Taskar et al., 2002] Ben Taskar, Pieter Abbeel, Daphne Koller, "Discriminative probabilistic models for relational data", in Proc. UAI-2002.

[Toutanova & Haghighi & Manning, 2008] Kristina Toutanova, Aria Haghighi, Chris Manning, "A global joint model for semantic role labeling", Computational Linguistics.

[Wang & Domingos, 2008] Jue Wang and Pedro Domingos, "Hybrid Markov Logic Networks", in Proc. AAAI-2008.

[Wellman et al., 1992] Michael Wellman, John S. Breese, Robert P. Goldman, "From knowledge bases to decision models", Knowledge Engineering Review 7.

[Yoshikawa et al., 2009] Katsumasa Yoshikawa, Sebastian Riedel, Masayuki Asahara and Yuji Matsumoto, "Jointly Identifying Temporal Relations with Markov Logic", in Proc. ACL-2009.

