Markov Logic in Natural Language Processing

                 Hoifung Poon
  Dept. of Computer Science & Eng.
            University of Washington
Overview
   Motivation
   Foundational areas
   Markov logic
   NLP applications
       Basics
       Supervised learning
       Unsupervised learning


                                2
Holy Grail of NLP:
Automatic Language Understanding

      Natural language search
      Answer questions
      Knowledge discovery
      ……

      Text  →  Meaning



                                   3
Reality: Increasingly Fragmented

     Parsing                      Semantics

     Tagging                      Information Extraction

     Morphology
                                                           4
Time for a New Synthesis?
   Speed up progress
   New opportunities to improve performance
   But we need a new tool for this …




                                               5
Languages Are Structural
         governments
           lm$pxtm
   (according to their families)




                                   6
 Languages Are Structural
   Morphology:  govern-ment-s ;  l-m$px-t-m  (according to their families)

   Syntax:  parse tree for "IL-4 induces CD11B"  (S → NP VP;  VP → V NP)

   Semantics:  event structure for "Involvement of p70(S6)-kinase activation
   in IL-10 up-regulation in human monocytes by gp41 ……":
       involvement:    Theme = up-regulation,  Cause = activation
       up-regulation:  Theme = IL-10,  Cause = gp41,  Site = human monocyte
       activation:     Theme = p70(S6)-kinase

   Discourse:  "George Walker Bush was the 43rd President of the United States.
   …… Bush was the eldest son of President G. H. W. Bush and Barbara Bush.
   ……. In November 1977, he met Laura Welch at a barbecue."
                                                                            7
Processing Is Complex

  Morphology      POS Tagging        Chunking


 Semantic Role Labeling         Syntactic Parsing



  Coreference Resolution   Information Extraction

                      ……

                                                    9
Pipeline Is Suboptimal

  Morphology      POS Tagging        Chunking


 Semantic Role Labeling         Syntactic Parsing



 Coreference Resolution    Information Extraction

                      ……

                                                    10
First-Order Logic
   Main theoretical foundation of computer science
   General language for describing
    complex structures and knowledge
   Trees, graphs, dependencies, hierarchies, etc.
    easily expressed
   Inference algorithms (satisfiability testing,
    theorem proving, etc.)


                                                11
  Languages Are Statistical
I saw the man with the telescope    Microsoft buys Powerset
  [with the telescope] attaches     Microsoft acquires Powerset
  to "the man" (NP) …               Powerset is acquired by Microsoft Corporation
I saw the man with the telescope    The Redmond software giant buys Powerset
  … or to "saw" (ADVP)?             Microsoft’s purchase of Powerset, …
                                    ……

Here in London, Frances Deek is a retired teacher …   G. W. Bush ……
In the Israeli town …, Karen London says …            …… Laura Bush ……
Now London says …                                     Mrs. Bush ……

      London  PERSON or LOCATION?                           Which one?
                                                                            12
Languages Are Statistical
   Languages are ambiguous
   Our information is always incomplete
   We need to model correlations
   Our predictions are uncertain
   Statistics provides the tools to handle this




                                                   13
Probabilistic Graphical Models
   Mixture models
   Hidden Markov models
   Bayesian networks
   Markov random fields
   Maximum entropy models
   Conditional random fields
   Etc.

                                 14
The Problem
   Logic is deterministic, requires manual coding
   Statistical models assume i.i.d. data,
    objects = feature vectors
   Historically, statistical and logical NLP
    have been pursued separately
   We need to unify the two!



                                                 15
Also, Supervision Is Scarce
   Supervised learning needs training examples
   Tons of texts … but most are not annotated
   Labeling is expensive (Cf. Penn-Treebank)
 Need to leverage indirect supervision




                                              16
A Promising Solution:
Statistical Relational Learning
   Emerging direction in machine learning
   Unifies logical and statistical approaches
   Principal way to leverage direct and indirect
    supervision




                                                    17
Key: Joint Inference
   Models complex interdependencies
   Propagates information from more certain
    decisions to resolve ambiguities in others
   Advantages:
       Better and more intuitive models
       Improve predictive accuracy
       Compensate for lack of training examples
   SRL can have even greater impact when
    direct supervision is scarce
                                                   18
Challenges in Applying
Statistical Relational Learning
   Learning is much harder
   Inference becomes a crucial issue
   Greater complexity for user




                                        19
Progress to Date
   Probabilistic logic [Nilsson, 1986]
   Statistics and beliefs [Halpern, 1990]
   Knowledge-based model construction
    [Wellman et al., 1992]
   Stochastic logic programs [Muggleton, 1996]
   Probabilistic relational models [Friedman et al., 1999]
   Relational Markov networks [Taskar et al., 2002]
   Etc.
   This talk: Markov logic [Domingos & Lowd, 2009]

                                                              20
Markov Logic:
A Unifying Framework
   Probabilistic graphical models and
    first-order logic are special cases
   Unified inference and learning algorithms
   Easy-to-use software: Alchemy
   Broad applicability
   Goal of this tutorial:
    Quickly learn how to use Markov logic and Alchemy
    for a broad spectrum of NLP applications

                                                  21
Overview
   Motivation
   Foundational areas
       Probabilistic inference
       Statistical learning
       Logical inference
       Inductive logic programming
   Markov logic
   NLP applications
       Basics
       Supervised learning
       Unsupervised learning
                                      22
Markov Networks
   Undirected graphical models
         Smoking            Cancer

                   Asthma             Cough
   Potential functions defined over cliques:

     P(x) = (1/Z) ∏c φc(xc)          Z = Σx ∏c φc(xc)

       Smoking   Cancer   φ(S,C)
       False     False    4.5
       False     True     4.5
       True      False    2.7
       True      True     4.5
                                                        23
Markov Networks
   Undirected graphical models
        Smoking                      Cancer

                    Asthma                         Cough
   Log-linear model:

     P(x) = (1/Z) exp( Σi wi fi(x) )

       wi: weight of feature i        fi: feature i

     f1(Smoking, Cancer) = 1 if ¬Smoking ∨ Cancer,  0 otherwise
     w1 = 1.5
                                                           24
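
A tiny worked example may help here. The following Python snippet (my own illustration, assuming the reconstructed feature f1 = ¬Smoking ∨ Cancer with weight 1.5 from the slide above) computes P(x) for all four states by brute-force enumeration:

import itertools, math

w1 = 1.5
def f1(smoking, cancer):
    # reconstructed feature: satisfied unless Smoking is true and Cancer is false
    return 1.0 if (not smoking) or cancer else 0.0

states = list(itertools.product([False, True], repeat=2))
Z = sum(math.exp(w1 * f1(s, c)) for s, c in states)
for s, c in states:
    print(f"Smoking={s!s:5} Cancer={c!s:5}  P = {math.exp(w1 * f1(s, c)) / Z:.3f}")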
Markov Nets vs. Bayes Nets
Property       Markov Nets        Bayes Nets
Form           Prod. potentials   Prod. potentials
Potentials     Arbitrary          Cond. probabilities
Cycles         Allowed            Forbidden
Partition func. Z = ?             Z=1
Indep. check   Graph separation D-separation
Indep. props. Some                Some
Inference      MCMC, BP, etc.     Convert to Markov
                                                      25
Inference in Markov Networks
   Goal: compute marginals & conditionals of

     P(X) = (1/Z) exp( Σi wi fi(X) )          Z = ΣX exp( Σi wi fi(X) )

   Exact inference is #P-complete
   Conditioning on the Markov blanket is easy:

     P(x | MB(x)) = exp( Σi wi fi(x) ) / [ exp( Σi wi fi(x=0) ) + exp( Σi wi fi(x=1) ) ]

   Gibbs sampling exploits this
                                                                               26
MCMC: Gibbs Sampling

state ← random truth assignment
for i ← 1 to num-samples do
   for each variable x
     sample x according to P(x|neighbors(x))
     state ← state with new value of x
P(F) ← fraction of states in which F is true



                                               27
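
The pseudocode above can be made concrete with a small Python sketch. The toy network, factors, and weights below are my own illustrative choices, not part of the tutorial; the point is that resampling each variable only needs the factors in its Markov blanket:

import math, random

# Toy pairwise Markov network given as log-potentials (sum_i w_i f_i per clique).
factors = {
    ("Smoking", "Cancer"): lambda s, c: 1.5 * ((not s) or c),
    ("Smoking", "Asthma"): lambda s, a: 0.8 * ((not s) or (not a)),
}
variables = ["Smoking", "Cancer", "Asthma"]

def local_score(state, var):
    # only factors touching var matter for its conditional (its Markov blanket)
    return sum(f(*[state[v] for v in scope]) for scope, f in factors.items() if var in scope)

def gibbs(num_samples, burn_in=100):
    state = {v: random.choice([0, 1]) for v in variables}   # random truth assignment
    counts = {v: 0 for v in variables}
    for t in range(burn_in + num_samples):
        for v in variables:
            scores = []
            for val in (0, 1):                              # score both values of v
                state[v] = val
                scores.append(math.exp(local_score(state, v)))
            p1 = scores[1] / (scores[0] + scores[1])        # P(v = 1 | neighbors(v))
            state[v] = 1 if random.random() < p1 else 0
        if t >= burn_in:
            for v in variables:
                counts[v] += state[v]
    return {v: counts[v] / num_samples for v in variables}  # fraction of samples with v true

print(gibbs(5000))   # approximate marginals P(v = 1)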
Other Inference Methods
   Belief propagation (sum-product)
   Mean field / Variational approximations




                                              28
MAP/MPE Inference
   Goal: Find most likely state of world given
    evidence

                  maxy P(y | x)

                  (y: query,  x: evidence)



                                                  29
MAP Inference Algorithms
   Iterated conditional modes
   Simulated annealing
   Graph cuts
   Belief propagation (max-product)
   LP relaxation




                                       30
Overview
   Motivation
   Foundational areas
       Probabilistic inference
       Statistical learning
       Logical inference
       Inductive logic programming
   Markov logic
   NLP applications
       Basics
       Supervised learning
       Unsupervised learning
                                      31
Generative Weight Learning
   Maximize likelihood
   Use gradient ascent or L-BFGS
   No local maxima
          
     ∂/∂wi log Pw(x) = ni(x) − Ew[ni(x)]

       ni(x): no. of times feature i is true in data
       Ew[ni(x)]: expected no. of times feature i is true according to model


   Requires inference at each step (slow!)
                                                                             32
Pseudo-Likelihood

        PL(x) = ∏i P(xi | neighbors(xi))

   Likelihood of each variable given its
    neighbors in the data
   Does not require inference at each step
   Widely used in vision, spatial statistics, etc.
   But PL parameters may not work well for
    long inference chains
                                                      33
Discriminative Weight Learning
   Maximize conditional likelihood of query (y)
    given evidence (x)
     
     ∂/∂wi log Pw(y | x) = ni(x, y) − Ew[ni(x, y)]

       ni(x, y): no. of true groundings of clause i in data
       Ew[ni(x, y)]: expected no. of true groundings according to model

   Approximate expected counts by counts in
    MAP state of y given x
                                                                           34
Voted Perceptron
   Originally proposed for training HMMs
    discriminatively
   Assumes network is linear chain
   Can be generalized to arbitrary networks

    wi ← 0
    for t ← 1 to T do
       yMAP ← Viterbi(x)
       wi ← wi + η [counti(yData) – counti(yMAP)]
     return Σt wi / T   (average over iterations)
                                                    35
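
As a rough illustration of the update rule above, here is a minimal Python sketch of the voted (averaged) perceptron on a toy sequence-labeling task. The data, features, and brute-force MAP search are all invented for illustration; in the MLN setting, Viterbi or MaxWalkSAT would supply the MAP state and the counts would be clause counts:

import itertools
from collections import Counter

def feature_counts(x, y):
    """Count emission (obs, tag) and transition (tag, tag) features."""
    counts = Counter()
    for obs, tag in zip(x, y):
        counts[("emit", obs, tag)] += 1
    for a, b in zip(y, y[1:]):
        counts[("trans", a, b)] += 1
    return counts

def map_labeling(w, x, tags):
    """Exact MAP by brute force over all labelings (fine for tiny examples)."""
    return max(itertools.product(tags, repeat=len(x)),
               key=lambda y: sum(w[f] * c for f, c in feature_counts(x, y).items()))

def voted_perceptron(data, tags, T=10, eta=1.0):
    w, w_sum = Counter(), Counter()
    for _ in range(T):
        for x, y_gold in data:
            y_map = map_labeling(w, x, tags)
            diff = feature_counts(x, y_gold)
            diff.subtract(feature_counts(x, y_map))   # count_i(yData) - count_i(yMAP)
            for f, c in diff.items():
                w[f] += eta * c
        w_sum.update(w)                               # accumulate weights for averaging
    return Counter({f: v / T for f, v in w_sum.items()})   # return sum_t(w) / T

data = [(("the", "dog", "runs"), ("D", "N", "V")),
        (("a", "cat", "sleeps"), ("D", "N", "V"))]
print(voted_perceptron(data, tags=("D", "N", "V")))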
Overview
   Motivation
   Foundational areas
       Probabilistic inference
       Statistical learning
       Logical inference
       Inductive logic programming
   Markov logic
   NLP applications
       Basics
       Supervised learning
       Unsupervised learning
                                      36
First-Order Logic
   Constants, variables, functions, predicates
    E.g.: Anna, x, MotherOf(x), Friends(x, y)
   Literal: Predicate or its negation
   Clause: Disjunction of literals
   Grounding: Replace all variables by constants
    E.g.: Friends (Anna, Bob)
   World (model, interpretation):
    Assignment of truth values to all ground
    predicates
                                              37
Inference in First-Order Logic
   Traditionally done by theorem proving
    (e.g.: Prolog)
   Propositionalization followed by model
    checking turns out to be faster (often by a lot)
   Propositionalization:
    Create all ground atoms and clauses
   Model checking: Satisfiability testing
   Two main approaches:
       Backtracking (e.g.: DPLL)
       Stochastic local search (e.g.: WalkSAT)
                                                   38
Satisfiability
   Input: Set of clauses
    (Convert KB to conjunctive normal form (CNF))
   Output: Truth assignment that satisfies all clauses,
    or failure
   The paradigmatic NP-complete problem
   Solution: Search
   Key point:
    Most SAT problems are actually easy
   Hard region: Narrow range of
    #Clauses / #Variables
                                                           39
Stochastic Local Search
   Uses complete assignments instead of partial
   Start with random state
   Flip variables in unsatisfied clauses
   Hill-climbing: Minimize # unsatisfied clauses
   Avoid local minima: Random flips
   Multiple restarts



                                                40
The WalkSAT Algorithm
for i ← 1 to max-tries do
  solution = random truth assignment
  for j ← 1 to max-flips do
      if all clauses satisfied then
         return solution
      c ← random unsatisfied clause
      with probability p
         flip a random variable in c
      else
         flip variable in c that maximizes # satisfied clauses
return failure


                                                                 41
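
A minimal Python sketch of the WalkSAT loop above (an illustrative toy, not a tuned solver; the DIMACS-style clause encoding is my own choice):

import random

def satisfied(clause, assign):
    # literals are signed ints: 3 means variable 3 true, -3 means variable 3 false
    return any(assign[abs(l)] == (l > 0) for l in clause)

def walksat(clauses, n_vars, max_tries=10, max_flips=1000, p=0.5):
    for _ in range(max_tries):
        assign = {v: random.choice([True, False]) for v in range(1, n_vars + 1)}
        for _ in range(max_flips):
            unsat = [c for c in clauses if not satisfied(c, assign)]
            if not unsat:
                return assign                      # all clauses satisfied
            clause = random.choice(unsat)
            if random.random() < p:
                var = abs(random.choice(clause))   # random walk move
            else:                                  # greedy move: maximize # satisfied clauses
                def score(v):
                    assign[v] = not assign[v]
                    s = sum(satisfied(c, assign) for c in clauses)
                    assign[v] = not assign[v]
                    return s
                var = max((abs(l) for l in clause), key=score)
            assign[var] = not assign[var]
    return None                                    # failure

# Example: (x1 v ~x2) ^ (x2 v x3) ^ (~x1 v ~x3)
print(walksat([(1, -2), (2, 3), (-1, -3)], n_vars=3))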
Overview
   Motivation
   Foundational areas
       Probabilistic inference
       Statistical learning
       Logical inference
       Inductive logic programming
   Markov logic
   NLP applications
       Basics
       Supervised learning
       Unsupervised learning
                                      42
Rule Induction
   Given: Set of positive and negative examples of
    some concept
       Example: (x1, x2, … , xn, y)
       y: concept (Boolean)
       x1, x2, … , xn: attributes (assume Boolean)
   Goal: Induce a set of rules that cover all positive
    examples and no negative ones
       Rule: xa ^ xb ^ … ⇒ y   (xa: Literal, i.e., xi or its negation)
       Same as Horn clause: Body ⇒ Head
       Rule r covers example x iff x satisfies body of r
   Eval(r): Accuracy, info gain, coverage, support, etc.
                                                                        43
Learning a Single Rule
 head ← y
 body ← Ø
 repeat
    for each literal x
      rx ← r with x added to body
      Eval(rx)
    body ← body ^ best x
 until no x improves Eval(r)
 return r
                                    44
Learning a Set of Rules
 R←Ø
 S ← examples
 repeat
    learn a single rule r
    R←RU{r}
   S ← S − positive examples covered by r
 until S = Ø
 return R

                                            45
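
The two procedures above (learn a single rule, then sequential covering over the remaining positives) can be sketched in a few lines of Python. Everything here is a toy illustration with Boolean attributes and accuracy as Eval(r); that evaluation choice is an assumption, since the slides also allow info gain, coverage, and support:

def covers(rule, x):
    return all(x[a] == v for a, v in rule)

def accuracy(rule, examples):
    covered = [y for x, y in examples if covers(rule, x)]
    return (sum(covered) / len(covered)) if covered else 0.0

def learn_one_rule(examples, attrs):
    rule = frozenset()
    while True:
        candidates = [rule | {(a, v)} for a in attrs for v in (True, False)
                      if not any(a == b for b, _ in rule)]
        if not candidates:
            return rule
        best = max(candidates, key=lambda r: accuracy(r, examples))
        if accuracy(best, examples) <= accuracy(rule, examples):
            return rule                      # no literal improves Eval(r)
        rule = best

def learn_rule_set(examples, attrs):
    rules, remaining = [], list(examples)
    while any(y for _, y in remaining):      # until no positive examples remain
        r = learn_one_rule(remaining, attrs)
        if not any(y and covers(r, x) for x, y in remaining):
            break                            # rule covers no positives: stop
        rules.append(r)
        remaining = [(x, y) for x, y in remaining if not (y and covers(r, x))]
    return rules

examples = [({"a": True, "b": False}, True),
            ({"a": True, "b": True}, True),
            ({"a": False, "b": True}, False),
            ({"a": False, "b": False}, False)]
print(learn_rule_set(examples, attrs=["a", "b"]))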
First-Order Rule Induction
   y and xi are now predicates with arguments
    E.g.: y is Ancestor(x,y), xi is Parent(x,y)
   Literals to add are predicates or their negations
   Literal to add must include at least one variable
    already appearing in rule
   Adding a literal changes # groundings of rule
    E.g.: Ancestor(x,z) ^ Parent(z,y) ⇒ Ancestor(x,y)
   Eval(r) must take this into account
    E.g.: Multiply by # positive groundings of rule
          still covered after adding literal
                                                        46
Overview
   Motivation
   Foundational areas
   Markov logic
   NLP applications
       Basics
       Supervised learning
       Unsupervised learning


                                47
Markov Logic
   Syntax: Weighted first-order formulas
   Semantics: Feature templates for Markov
    networks
   Intuition: Soften logical constraints
   Give each formula a weight
    (Higher weight ⇒ Stronger constraint)

 P(world) ∝ exp( Σ weights of formulas it satisfies )
                                               48
Example: Coreference Resolution

   Mentions of Obama are often headed by "Obama"
   Mentions of Obama are often headed by "President"
   Appositions usually refer to the same entity


             Barack Obama, the 44th President
             of the United States, is the first
             African American to hold the office.
             ……



                                                    49
Example: Coreference Resolution
  ∀x  MentionOf(x, Obama) ⇒ Head(x, "Obama")
  ∀x  MentionOf(x, Obama) ⇒ Head(x, "President")
  ∀x,y,c  Apposition(x, y) ⇒ (MentionOf(x, c) ⇔ MentionOf(y, c))




                                                                 50
Example: Coreference Resolution
1.5  ∀x  MentionOf(x, Obama) ⇒ Head(x, "Obama")
0.8  ∀x  MentionOf(x, Obama) ⇒ Head(x, "President")
100  ∀x,y,c  Apposition(x, y) ⇒ (MentionOf(x, c) ⇔ MentionOf(y, c))




                                                                   51
  Example: Coreference Resolution
 1.5  ∀x  MentionOf(x, Obama) ⇒ Head(x, "Obama")
 0.8  ∀x  MentionOf(x, Obama) ⇒ Head(x, "President")
 100  ∀x,y,c  Apposition(x, y) ⇒ (MentionOf(x, c) ⇔ MentionOf(y, c))

  Two mention constants: A and B
                             Apposition(A,B)
Head(A,“President”)                                    Head(B,“President”)

                MentionOf(A,Obama)      MentionOf(B,Obama)


Head(A,“Obama”)                                         Head(B,“Obama”)
                              Apposition(B,A)                         52
Markov Logic Networks
   MLN is template for ground Markov nets
   Probability of a world x:

     P(x) = (1/Z) exp( Σi wi ni(x) )

       wi: weight of formula i      ni(x): no. of true groundings of formula i in x

   Typed variables and constants greatly reduce size of
    ground Markov net
   Functions, existential quantifiers, etc.
   Can handle infinite domains [Singla & Domingos, 2007]
    and continuous domains [Wang & Domingos, 2008]                         53
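
To make the formula concrete, here is a small Python illustration (my own example, not from the slides): two constants {A, B}, one weighted formula 1.5 Smokes(x) => Cancer(x), and P(x) computed by enumerating all 16 worlds:

import itertools, math

constants = ["A", "B"]
atoms = [f"{p}({c})" for p in ("Smokes", "Cancer") for c in constants]
w = 1.5

def n_true_groundings(world):
    # groundings of Smokes(x) => Cancer(x): one per constant
    return sum((not world[f"Smokes({c})"]) or world[f"Cancer({c})"] for c in constants)

worlds = [dict(zip(atoms, vals))
          for vals in itertools.product([False, True], repeat=len(atoms))]
Z = sum(math.exp(w * n_true_groundings(x)) for x in worlds)

for x in worlds[:4]:   # show a few worlds and their probabilities
    print({a: int(v) for a, v in x.items()},
          round(math.exp(w * n_true_groundings(x)) / Z, 4))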
Relation to Statistical Models
   Special cases (obtained by making all predicates zero-arity):
       Markov networks
       Markov random fields
       Bayesian networks
       Log-linear models
       Exponential models
       Max. entropy models
       Gibbs distributions
       Boltzmann machines
       Logistic regression
       Hidden Markov models
       Conditional random fields
   Markov logic allows objects to be interdependent (non-i.i.d.)
                                                                  54
Relation to First-Order Logic
   Infinite weights ⇒ First-order logic
   Satisfiable KB, positive weights ⇒
    Satisfying assignments = Modes of distribution
   Markov logic allows contradictions between
    formulas




                                                55
MLN Algorithms:
The First Three Generations
Problem              First generation          Second generation   Third generation
MAP inference        Weighted satisfiability   Lazy inference      Cutting planes
Marginal inference   Gibbs sampling            MC-SAT              Lifted inference
Weight learning      Pseudo-likelihood         Voted perceptron    Scaled conj. gradient
Structure learning   Inductive logic progr.    ILP + PL (etc.)     Clustering + pathfinding
                                                     56
MAP/MPE Inference
   Problem: Find most likely state of world
    given evidence

                  maxy P(y | x)

                  (y: query,  x: evidence)



                                               57
MAP/MPE Inference
   Problem: Find most likely state of world
     given evidence

            maxy (1/Zx) exp( Σi wi ni(x, y) )




                                               58
MAP/MPE Inference
   Problem: Find most likely state of world
    given evidence

               maxy Σi wi ni(x, y)




                                               59
MAP/MPE Inference
   Problem: Find most likely state of world
    given evidence
               maxy Σi wi ni(x, y)


   This is just the weighted MaxSAT problem
   Use weighted SAT solver
    (e.g., MaxWalkSAT [Kautz et al., 1997] )


                                               60
The MaxWalkSAT Algorithm
for i ← 1 to max-tries do
  solution = random truth assignment
  for j ← 1 to max-flips do
      if Σ weights(sat. clauses) > threshold then
          return solution
      c ← random unsatisfied clause
      with probability p
         flip a random variable in c
      else
          flip variable in c that maximizes
             Σ weights(sat. clauses)
return failure, best solution found
                                                    61
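
A minimal Python sketch of MaxWalkSAT as described above, reusing the clause encoding from the earlier WalkSAT sketch; the default threshold (the total clause weight) and the tie-breaking details are my own assumptions:

import random

def sat(clause_lits, assign):
    return any(assign[abs(l)] == (l > 0) for l in clause_lits)

def sat_weight(clauses, assign):
    return sum(w for w, lits in clauses if sat(lits, assign))

def maxwalksat(clauses, n_vars, max_tries=10, max_flips=1000, p=0.5, threshold=None):
    target = threshold if threshold is not None else sum(w for w, _ in clauses)
    best, best_w = None, float("-inf")
    for _ in range(max_tries):
        assign = {v: random.choice([True, False]) for v in range(1, n_vars + 1)}
        for _ in range(max_flips):
            cur = sat_weight(clauses, assign)
            if cur > best_w:
                best, best_w = dict(assign), cur   # remember best solution found
            if cur >= target:
                return best
            unsat = [lits for w, lits in clauses if not sat(lits, assign)]
            if not unsat:
                return best
            lits = random.choice(unsat)
            if random.random() < p:
                var = abs(random.choice(lits))     # random walk move
            else:                                  # greedy: maximize weight of satisfied clauses
                def score(v):
                    assign[v] = not assign[v]
                    s = sat_weight(clauses, assign)
                    assign[v] = not assign[v]
                    return s
                var = max((abs(l) for l in lits), key=score)
            assign[var] = not assign[var]
    return best   # threshold not reached: return best solution found

clauses = [(100.0, (1, -2)), (1.5, (2, 3)), (0.8, (-1, -3))]
print(maxwalksat(clauses, n_vars=3))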
Computing Probabilities
   P(Formula|MLN,C) = ?
   MCMC: Sample worlds, check formula holds
   P(Formula1|Formula2,MLN,C) = ?
   If Formula2 = Conjunction of ground atoms
       First construct min subset of network necessary to
        answer query (generalization of KBMC)
       Then apply MCMC


                                                         62
But … Insufficient for Logic
   Problem:
    Deterministic dependencies break MCMC
    Near-deterministic ones make it very slow

   Solution:
    Combine MCMC and WalkSAT
    → MC-SAT algorithm [Poon & Domingos, 2006]


                                                 63
Auxiliary-Variable Methods
   Main ideas:
       Use auxiliary variables to capture dependencies
       Turn difficult sampling into uniform sampling
   Given distribution P(x)
     f(x, u) = 1 if 0 ≤ u ≤ P(x),  0 otherwise

     ∫ f(x, u) du = P(x)
   Sample from f (x, u), then discard u



                                                                   64
Slice Sampling [Damien et al. 1999]
   [Figure: the curve P(x) with a horizontal slice at height u(k); given the
    current sample x(k), u(k) is drawn uniformly from [0, P(x(k))], and the
    next sample x(k+1) is drawn uniformly from the slice {x : P(x) ≥ u(k)}]
                                                65
Slice Sampling
   Identifying the slice may be difficult

                 P(x) = (1/Z) ∏i φi(x)

   Introduce an auxiliary variable ui for each φi:

        f(x, u1, …, un) = 1 if 0 ≤ ui ≤ φi(x) for all i,  0 otherwise

                                                        66
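
For intuition, here is a minimal Python sketch of slice sampling on a one-dimensional unnormalized density. The density and the fixed bracketing interval are my own choices, and the uniform-with-rejection step stands in for the usual stepping-out procedure:

import math, random

def phi(x):
    # unnormalized density: a mixture of two bumps
    return math.exp(-(x - 1) ** 2) + 0.5 * math.exp(-(x + 2) ** 2)

def slice_sample(x0, n_samples, lo=-10.0, hi=10.0):
    samples, x = [], x0
    for _ in range(n_samples):
        u = random.uniform(0.0, phi(x))          # auxiliary variable: height under the curve
        while True:                              # draw x uniformly, keep it if it lies in the slice
            x_new = random.uniform(lo, hi)
            if phi(x_new) >= u:                  # inside the slice {x : phi(x) >= u}
                x = x_new
                break
        samples.append(x)
    return samples

samples = slice_sample(0.0, 2000)
print("mean:", sum(samples) / len(samples))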
The MC-SAT Algorithm
   Select random subset M of satisfied clauses
       With probability 1 – exp ( – wi )
        Larger wi ⇒ Ci more likely to be selected
        Hard clause (wi → ∞): Always selected
    Slice = states that satisfy all clauses in M
   Uses SAT solver to sample x | u.
   Orders of magnitude faster than Gibbs sampling,
    etc.


                                                    67
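
The clause-selection step described above is easy to sketch. The following Python fragment (a toy illustration; a real MC-SAT implementation would follow it with a near-uniform SAT sampler over M) selects each currently satisfied clause with probability 1 − exp(−wi):

import math, random

def select_clauses(clauses, assign):
    """clauses: list of (weight, literals); literals are signed ints as before."""
    M = []
    for w, lits in clauses:
        if any(assign[abs(l)] == (l > 0) for l in lits):      # clause satisfied by current state
            if w == float("inf") or random.random() < 1.0 - math.exp(-w):
                M.append(lits)                                # hard clauses are always selected
    return M

assign = {1: True, 2: False, 3: True}
clauses = [(float("inf"), (1, -2)), (1.5, (2, 3)), (0.8, (-1, -3))]
print(select_clauses(clauses, assign))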
But … It Is Not Scalable
   1000 researchers
   Coauthor(x,y): 1 million ground atoms
   Coauthor(x,y) ∧ Coauthor(y,z) ⇒ Coauthor(x,z):
    1 billion ground clauses
   Exponential in arity




                                                     68
    Sparsity to the Rescue
   1000 researchers
   Coauthor(x,y): 1 million ground atoms
    But … most atoms are false
   Coauthor(x,y) ∧ Coauthor(y,z) ⇒ Coauthor(x,z):
    1 billion ground clauses
    Most trivially satisfied if most atoms are false
   No need to explicitly compute most of them


                                                 69
Lazy Inference
   LazySAT [Singla & Domingos, 2006a]
       Lazy version of WalkSAT [Selman et al., 1996]
       Grounds atoms/clauses as needed
       Greatly reduces memory usage
   The idea is much more general
    [Poon & Domingos, 2008a]




                                                        70
General Method for Lazy Inference
   If most variables assume the default value,
    wasteful to instantiate all variables / functions
   Main idea:
       Allocate memory for a small subset of
        “active” variables / functions
       Activate more if necessary as inference proceeds
   Applicable to a diverse set of algorithms:
    Satisfiability solvers (systematic, local-search), Markov chain Monte
    Carlo, MPE / MAP algorithms, Maximum expected utility algorithms,
    Belief propagation, MC-SAT, Etc.
   Reduce memory and time by orders of magnitude
                                                                        71
Lifted Inference
   Consider belief propagation (BP)
   Often in large problems, many nodes are
    interchangeable:
    They send and receive the same messages
    throughout BP
   Basic idea: Group them into supernodes,
    forming lifted network
   Smaller network → Faster inference
   Akin to resolution in first-order logic
                                              72
Belief Propagation

     μx→f(x) = ∏ h∈n(x)\{f} μh→x(x)

  Nodes                                                    Features
  (x)                                                      (f)

     μf→x(x) = Σ~{x} ( e^(wf(x)) ∏ y∈n(f)\{x} μy→f(y) )
                                                                 73
Lifted Belief Propagation

     μx→f(x) = ∏ h∈n(x)\{f} μh→x(x)

  Nodes                                                    Features
  (x)                                                      (f)

     μf→x(x) = Σ~{x} ( e^(wf(x)) ∏ y∈n(f)\{x} μy→f(y) )
                                                                 74
Lifted Belief Propagation
α, β: functions of edge counts in the lifted network
(exponents applied to the corresponding messages)

     μx→f(x) = ∏ h∈n(x)\{f} μh→x(x)

  Nodes                                                    Features
  (x)                                                      (f)

     μf→x(x) = Σ~{x} ( e^(wf(x)) ∏ y∈n(f)\{x} μy→f(y) )
                                                                 75
Learning
   Data is a relational database
   Closed world assumption (if not: EM)
   Learning parameters (weights)
   Learning structure (formulas)




                                           76
Parameter Learning
   Parameter tying: Groundings of same clause
             
     ∂/∂wi log P(x) = ni(x) − Ew[ni(x)]

       ni(x): no. of times clause i is true in data
       Ew[ni(x)]: expected no. of times clause i is true according to MLN


   Generative learning: Pseudo-likelihood
   Discriminative learning: Conditional likelihood,
    use MC-SAT or MaxWalkSAT for inference
                                                                             77
Parameter Learning
   Pseudo-likelihood + L-BFGS is fast and
    robust but can give poor inference results
   Voted perceptron:
    Gradient descent + MAP inference
   Scaled conjugate gradient




                                                 78
Voted Perceptron for MLNs
   HMMs are special case of MLNs
   Replace Viterbi by MaxWalkSAT
   Network can now be arbitrary graph
    wi ← 0
    for t ← 1 to T do
       yMAP ← MaxWalkSAT(x)
       wi ← wi + η [counti(yData) – counti(yMAP)]
     return Σt wi / T   (average over iterations)
                                                    79
Problem: Multiple Modes
   Not alleviated by contrastive divergence
   Alleviated by MC-SAT
   Warm start: Start each MC-SAT run at
    previous end state




                                               80
Problem: Extreme Ill-Conditioning



   Solvable by quasi-Newton, conjugate gradient, etc.
   But line searches require exact inference
   Solution: Scaled conjugate gradient
    [Lowd & Domingos, 2008]
   Use Hessian to choose step size
   Compute quadratic form inside MC-SAT
   Use inverse diagonal Hessian as preconditioner
                                                         81
Structure Learning
   Standard inductive logic programming optimizes
    the wrong thing
   But can be used to overgenerate for L1 pruning
   Our approach:
    ILP + Pseudo-likelihood + Structure priors
    [Kok & Domingos 2005, 2008, 2009]
   For each candidate structure change:
    Start from current weights & relax convergence
   Use subsampling to compute sufficient statistics

                                                       82
Structure Learning
   Initial state: Unit clauses or prototype KB
   Operators: Add/remove literal, flip sign
   Evaluation function:
    Pseudo-likelihood + Structure prior
   Search: Beam search, shortest-first search




                                                  83
Alchemy
Open-source software including:
 Full first-order logic syntax

 Generative & discriminative weight learning

 Structure learning

 Weighted satisfiability, MCMC, lifted BP

 Programming language features

       alchemy.cs.washington.edu
                                                84
                Alchemy                   Prolog            BUGS

Represent-      F.O. logic +              Horn clauses      Bayes nets
ation           Markov nets
Inference       Model checking, MCMC,     Theorem proving   MCMC
                lifted BP
Learning        Parameters & structure    No                Parameters
Uncertainty     Yes                       No                Yes
Relational      Yes                       Yes               No
                                            85
Running Alchemy
   Programs             MLN file
       Infer                Types (optional)
       Learnwts             Predicates
       Learnstruct          Formulas
   Options              Database files




                                                 86
Overview
   Motivation
   Foundational areas
   Markov logic
   NLP applications
       Basics
       Supervised learning
       Unsupervised learning


                                87
Uniform Distribn.: Empty MLN
Example: Unbiased coin flips

Type:      flip = { 1, … , 20 }
Predicate: Heads(flip)


          P(Heads(f)) = [ (1/Z) e^0 ] / [ (1/Z) e^0 + (1/Z) e^0 ] = 1/2



                                         88
Binomial Distribn.: Unit Clause
Example: Biased coin flips
Type:      flip = { 1, … , 20 }
Predicate: Heads(flip)
Formula:    Heads(f)
Weight:     Log odds of heads:  w = log( p / (1 − p) )

          P(Heads(f)) = [ (1/Z) e^w ] / [ (1/Z) e^w + (1/Z) e^0 ] = e^w / (e^w + 1) = p


By default, MLN includes unit clauses for all predicates
(captures marginal distributions, etc.)
                                                           89
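
A quick numeric check of the claim above (my own illustration): with w = log(p / (1 − p)), the resulting probability of heads is exactly p.

import math

p = 0.3
w = math.log(p / (1 - p))
print(math.exp(w) / (math.exp(w) + 1))   # prints 0.3 (up to floating point)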
Multinomial Distribution
Example: Throwing die

Types:     throw = { 1, … , 20 }
           face = { 1, … , 6 }
Predicate: Outcome(throw,face)
Formulas: Outcome(t,f) ^ f != f’ => !Outcome(t,f’).
           Exist f Outcome(t,f).

Too cumbersome!



                                                 90
Multinomial Distrib.: ! Notation
Example: Throwing die

Types:     throw = { 1, … , 20 }
           face = { 1, … , 6 }
Predicate: Outcome(throw,face!)
Formulas:

Semantics: Arguments without “!” determine arguments with “!”.
Also makes inference more efficient (triggers blocking).



                                                             91
Multinomial Distrib.: + Notation
Example: Throwing biased die

Types:     throw = { 1, … , 20 }
           face = { 1, … , 6 }
Predicate: Outcome(throw,face!)
Formulas: Outcome(t,+f)

Semantics: Learn weight for each grounding of args with “+”.




                                                               92
Logistic Regression (MaxEnt)
Logistic regression:  log [ P(C=1 | F=f) / P(C=0 | F=f) ] = a + Σi bi fi

Type:                obj = { 1, ... , n }
Query predicate:     C(obj)
Evidence predicates: Fi(obj)
Formulas:             a    C(x)
                      bi   Fi(x) ^ C(x)

Resulting distribution:  P(C=c, F=f) = (1/Z) exp( a·c + Σi bi fi c )

Therefore:  log [ P(C=1 | F=f) / P(C=0 | F=f) ]
            = log [ exp( a + Σi bi fi ) / exp(0) ] = a + Σi bi fi

Alternative form:       Fi(x) => C(x)
                                                                                   93
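
The equivalence above can be checked numerically. In this small Python example (the weights and evidence values are my own choices), enumerating C ∈ {0, 1} under the MLN distribution gives the same conditional as the sigmoid of a + Σi bi fi:

import math

a, b = 0.5, [1.2, -0.7]
f = [1, 1]   # observed evidence: F1(x) = 1, F2(x) = 1

def unnorm(c):
    # exp( a*c + sum_i b_i * f_i * c ): the MLN's unnormalized weight for C(x) = c
    return math.exp(a * c + sum(bi * fi * c for bi, fi in zip(b, f)))

p_mln = unnorm(1) / (unnorm(0) + unnorm(1))
p_logistic = 1.0 / (1.0 + math.exp(-(a + sum(bi * fi for bi, fi in zip(b, f)))))
print(p_mln, p_logistic)   # identical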
Hidden Markov Models
obs = { Red, Green, Yellow }
state = { Stop, Drive, Slow }
time = { 0, ..., 100 }

State(state!,time)
Obs(obs!,time)

State(+s,0)
State(+s,t) ^ State(+s',t+1)
Obs(+o,t) ^ State(+s,t)

Sparse HMM:
State(s,t) => State(s1,t+1) v State(s2, t+1) v ... .

                                                  94
Bayesian Networks
   Use all binary predicates with same first argument
    (the object x).
   One predicate for each variable A: A(x,v!)
   One clause for each line in the CPT and
    value of the variable
   Context-specific independence:
    One clause for each path in the decision tree
   Logistic regression: As before
   Noisy OR: Deterministic OR + Pairwise clauses

                                                         95
Relational Models
   Knowledge-based model construction
       Allow only Horn clauses
       Same as Bayes nets, except arbitrary relations
       Combin. function: Logistic regression, noisy-OR or external
   Stochastic logic programs
       Allow only Horn clauses
       Weight of clause = log(p)
       Add formulas: Head holds  Exactly one body holds
   Probabilistic relational models
       Allow only binary relations
       Same as Bayes nets, except first argument can vary
                                                                  96
Relational Models
   Relational Markov networks
       SQL → Datalog → First-order logic
       One clause for each state of a clique
       + syntax in Alchemy facilitates this
   Bayesian logic
       Object = Cluster of similar/related observations
       Observation constants + Object constants
       Predicate InstanceOf(Obs,Obj) and clauses using it
   Unknown relations: Second-order Markov logic
    S. Kok & P. Domingos, “Statistical Predicate Invention”, in
    Proc. ICML-2007.

                                                                  97
Overview
   Motivation
   Foundational areas
   Markov logic
   NLP applications
       Basics
       Supervised learning
       Unsupervised learning


                                98
Text Classification

The 56th quadrennial United States presidential
election was held on November 4, 2008. Outgoing
Republican President George W. Bush's policies and       Topic = politics
actions and the American public's desire for change
were key issues throughout the campaign. ……

The Chicago Bulls are an American professional
basketball team based in Chicago, Illinois, playing in
the Central Division of the Eastern Conference in the    Topic = sports
National Basketball Association (NBA). ……
                         ……




                                                                      99
Text Classification
page = {1, ..., max}
word = { ... }
topic = { ... }

Topic(page,topic)
HasWord(page,word)


Topic(p,t)
HasWord(p,+w) => Topic(p,+t)


If topics mutually exclusive: Topic(page,topic!)


                                                   100
Text Classification
page = {1, ..., max}
word = { ... }
topic = { ... }

Topic(page,topic)
HasWord(page,word)
Links(page,page)

Topic(p,t)
HasWord(p,+w) => Topic(p,+t)
Topic(p,t) ^ Links(p,p') => Topic(p',t)


Cf. S. Chakrabarti, B. Dom & P. Indyk, “Hypertext Classification
Using Hyperlinks,” in Proc. SIGMOD-1998.
                                                                   101
  Entity Resolution
AUTHOR: H. POON & P. DOMINGOS
TITLE: UNSUPERVISED SEMANTIC PARSING
VENUE: EMNLP-09
                                                                    SAME?
AUTHOR: Hoifung Poon and Pedro Domings
TITLE: Unsupervised semantic parsing
VENUE: Proceedings of the 2009 Conference on Empirical Methods in
Natural Language Processing
AUTHOR: Poon, Hoifung and Domings, Pedro
TITLE: Unsupervised ontology induction from text
VENUE: Proceedings of the Forty-Eighth Annual Meeting of the
Association for Computational Linguistics
                                                                    SAME?
AUTHOR: H. Poon, P. Domings
TITLE: Unsupervised ontology induction
VENUE: ACL-10                                                        102
Entity Resolution
Problem: Given database, find duplicate records

HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)

HasToken(+t,+f,r) ^ HasToken(+t,+f,r’)
      => SameField(f,r,r’)
SameField(f,r,r’) => SameRecord(r,r’)




                                                  103
Entity Resolution
Problem: Given database, find duplicate records

HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)

HasToken(+t,+f,r) ^ HasToken(+t,+f,r’)
      => SameField(f,r,r’)
SameField(f,r,r’) => SameRecord(r,r’)
SameRecord(r,r’) ^ SameRecord(r’,r”)
      => SameRecord(r,r”)


Cf. A. McCallum & B. Wellner, “Conditional Models of Identity Uncertainty
with Application to Noun Coreference,” in Adv. NIPS 17, 2005.
                                                                    104
Entity Resolution
Can also resolve fields:

HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)

HasToken(+t,+f,r) ^ HasToken(+t,+f,r’)
      => SameField(f,r,r’)
SameField(f,r,r’) <=> SameRecord(r,r’)
SameRecord(r,r’) ^ SameRecord(r’,r”)
      => SameRecord(r,r”)
SameField(f,r,r’) ^ SameField(f,r’,r”)
      => SameField(f,r,r”)

More: P. Singla & P. Domingos, “Entity Resolution with Markov
Logic”, in Proc. ICDM-2006.                                     105
Information Extraction

Unsupervised Semantic Parsing, Hoifung Poon and Pedro Domingos.
Proceedings of the 2009 Conference on Empirical Methods in Natural
Language Processing. Singapore: ACL.




UNSUPERVISED SEMANTIC PARSING. H. POON & P. DOMINGOS.
EMNLP-2009.



                                                                     106
Information Extraction
                                     Author        Title      Venue

Unsupervised Semantic Parsing, Hoifung Poon and Pedro Domingos.
Proceedings of the 2009 Conference on Empirical Methods in Natural
Language Processing. Singapore: ACL.


                                                  SAME?


UNSUPERVISED SEMANTIC PARSING. H. POON & P. DOMINGOS.
EMNLP-2009.



                                                                     107
Information Extraction
   Problem: Extract database from text or
    semi-structured sources
   Example: Extract database of publications
    from citation list(s) (the “CiteSeer problem”)
   Two steps:
       Segmentation:
        Use HMM to assign tokens to fields
       Entity resolution:
        Use logistic regression and transitivity
                                                     108
Information Extraction
Token(token, position, citation)
InField(position, field!, citation)
SameField(field, citation, citation)
SameCit(citation, citation)

Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) ^ InField(i+1,+f,c)


Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’)
   ^ InField(i’,+f,c’) => SameField(+f,c,c’)
SameField(+f,c,c’) <=> SameCit(c,c’)
SameField(f,c,c’) ^ SameField(f,c’,c”) => SameField(f,c,c”)
SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”)




                                                        109
Information Extraction
Token(token, position, citation)
InField(position, field!, citation)
SameField(field, citation, citation)
SameCit(citation, citation)

Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) ^ !Token(“.”,i,c) ^ InField(i+1,+f,c)


Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’)
   ^ InField(i’,+f,c’) => SameField(+f,c,c’)
SameField(+f,c,c’) <=> SameCit(c,c’)
SameField(f,c,c’) ^ SameField(f,c’,c”) => SameField(f,c,c”)
SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”)

More: H. Poon & P. Domingos, “Joint Inference in Information
Extraction”, in Proc. AAAI-2007.
                                                               110
Biomedical Text Mining
   Traditionally, named entity recognition or
    information extraction
    E.g., protein recognition, protein-protein identification
   BioNLP-09 shared task: Nested bio-events
       Much harder than traditional IE
       Top F1 around 50%
       Naturally calls for joint inference



                                                           111
Bio-Event Extraction
 Involvement of p70(S6)-kinase activation in IL-10
 up-regulation in human monocytes by gp41 envelope
 protein of human immunodeficiency virus type 1 ...
                        involvement
                Theme                  Cause

           up-regulation                activation
   Theme      Cause      Site                  Theme

                          human
IL-10         gp41                    p70(S6)-kinase
                         monocyte
                                                       112
Bio-Event Extraction
Token(position, token)
DepEdge(position, position, dependency)
IsProtein(position)
EvtType(position, evtType)                     Logistic
InArgPath(position, position, argType!)      regression

Token(i,+w) => EvtType(i,+t)
Token(j,w) ^ DepEdge(i,j,+d) => EvtType(i,+t)
DepEdge(i,j,+d) => InArgPath(i,j,+a)
Token(i,+w) ^ DepEdge(i,j,+d) => InArgPath(i,j,+a)
…




                                                          113
Bio-Event Extraction
Token(position, token)
DepEdge(position, position, dependency)
IsProtein(position)
EvtType(position, evtType)
InArgPath(position, position, argType!)

Token(i,+w) => EvtType(i,+t)
Token(j,w) ^ DepEdge(i,j,+d) => EvtType(i,+t)
DepEdge(i,j,+d) => InArgPath(i,j,+a)
Token(i,+w) ^ DepEdge(i,j,+d) => InArgPath(i,j,+a)
…

Adding a few joint inference rules doubles the F1:

InArgPath(i,j,Theme) => IsProtein(j) v
       (Exist k k!=i ^ InArgPath(j, k, Theme)).
…

More: H. Poon and L. Vanderwende, “Joint Inference for Knowledge
Extraction from Biomedical Literature”, NAACL-2010.
                                                                   114
Temporal Information Extraction
   Identify event times and temporal relations
    (BEFORE, AFTER, OVERLAP)
   E.g., who is the President of U.S.A.?
       Obama: 1/20/2009 – present
       G. W. Bush: 1/20/2001 – 1/19/2009
       Etc.




                                                  115
Temporal Information Extraction
DepEdge(position, position, dependency)
Event(position, event)
After(event, event)


DepEdge(i,j,+d) ^ Event(i,p) ^ Event(j,q) => After(p,q)


After(p,q) ^ After(q,r) => After(p,r)




                                                          116
Temporal Information Extraction
DepEdge(position, position, dependency)
Event(position, event)
After(event, event)
Role(position, position, role)

DepEdge(i,j,+d) ^ Event(i,p) ^ Event(j,q) => After(p,q)
Role(i,j,ROLE-AFTER) ^ Event(i,p) ^ Event(j,q) => After(p,q)

After(p,q) ^ After(q,r) => After(p,r)

More:
K. Yoshikawa, S. Riedel, M. Asahara and Y. Matsumoto, “Jointly
Identifying Temporal Relations with Markov Logic”, in Proc. ACL-2009.

X. Ling & D. Weld, “Temporal Information Extraction”, in Proc. AAAI-2010.

                                                                    117
Semantic Role Labeling
   Problem: Identify arguments for a predicate
   Two steps:
       Argument identification:
        Determine whether a phrase is an argument
       Role classification:
        Determine the type of an argument (agent, theme,
        temporal, adjunct, etc.)



                                                      118
Semantic Role Labeling
Token(position, token)
DepPath(position, position, path)
IsPredicate(position)
Role(position, position, role!)
HasRole(position, position)


Token(i,+t) => IsPredicate(i)
DepPath(i,j,+p) => Role(i,j,+r)


HasRole(i,j) => IsPredicate(i)
IsPredicate(i) => Exist j HasRole(i,j)
HasRole(i,j) => Exist r Role(i,j,r)
Role(i,j,r) => HasRole(i,j)

Cf. K. Toutanova, A. Haghighi, C. Manning, “A global joint model for
semantic role labeling”, in Computational Linguistics 2008.
                                                                       119
Joint Semantic Role Labeling
and Word Sense Disambiguation
Token(position, token)
DepPath(position, position, path)
IsPredicate(position)
Role(position, position, role!)
HasRole(position, position)
Sense(position, sense!)

Token(i,+t) => IsPredicate(i)
DepPath(i,j,+p) => Role(i,j,+r)
Sense(i,s) => IsPredicate(i)

HasRole(i,j) => IsPredicate(i)
IsPredicate(i) => Exist j HasRole(i,j)
HasRole(i,j) => Exist r Role(i,j,r)
Role(i,j,r) => HasRole(i,j)
Token(i,+t) ^ Role(i,j,+r) => Sense(i,+s)

More: I. Meza-Ruiz & S. Riedel, “Jointly Identifying Predicates,
Arguments and Senses using Markov Logic”, in Proc. NAACL-2009.
                                                                   120
Practical Tips: Modeling
   Add all unit clauses (the default)
   How to handle uncertain data:
    R(x,y) ^ R’(x,y) (the “HMM trick”)
   Implications vs. conjunctions
    For soft correlation, conjunctions often better
      Implication: A => B is equivalent to !(A ^ !B)
      Shares satisfying cases with other implications like A => C
      Makes learning unnecessarily harder


                                                       121
Practical Tips: Efficiency
   Open/closed world assumptions
   Low clause arities
   Low numbers of constants
   Short inference chains




                                    122
Practical Tips: Development
   Start with easy components
   Gradually expand to full task
   Use the simplest MLN that works
   Cycle: Add/delete formulas, learn and test




                                             123
Overview
   Motivation
   Foundational areas
   Markov logic
   NLP applications
       Basics
       Supervised learning
       Unsupervised learning


                                124
Unsupervised Learning: Why?
   Virtually unlimited supply of unlabeled text
   Labeling is expensive (Cf. Penn-Treebank)
   Often difficult to label with consistency and
    high quality (e.g., semantic parses)
   Emerging field: Machine reading
    Extract knowledge from unstructured text with
    high precision/recall and minimal human effort
    (More in tomorrow’s talk at 4PM)
                                                125
Unsupervised Learning: How?
   I.i.d. learning: Sophisticated model requires
    more labeled data
   Statistical relational learning: Sophisticated
    model may require less labeled data
       Relational dependencies constrain problem space
       One formula is worth a thousand labels
   Small amount of domain knowledge ⇒
    large-scale joint inference

                                                     126
Unsupervised Learning: How?
   Ambiguities vary among objects
   Joint inference ⇒ Propagate information from
    unambiguous objects to ambiguous ones
   E.g.:                  Are they
    G. W. Bush …          coreferent?
    He …
    …
    Mrs. Bush …
                                              127
Unsupervised Learning: How
   Ambiguities vary among objects
   Joint inference ⇒ Propagate information from
    unambiguous objects to ambiguous ones
   E.g.:                  Should be
    G. W. Bush …           coreferent
    He …
    …
    Mrs. Bush …
                                              128
Unsupervised Learning: How
   Ambiguities vary among objects
   Joint inference ⇒ Propagate information from
    unambiguous objects to ambiguous ones
   E.g.:                  So must be
    G. W. Bush …         singular male!
    He …
    …
    Mrs. Bush …
                                              129
Unsupervised Learning: How
   Ambiguities vary among objects
   Joint inference ⇒ Propagate information from
    unambiguous objects to ambiguous ones
   E.g.:                    Must be
    G. W. Bush …         singular female!
    He …
    …
    Mrs. Bush …
                                              130
Unsupervised Learning: How
   Ambiguities vary among objects
   Joint inference ⇒ Propagate information from
    unambiguous objects to ambiguous ones
   E.g.:                   Verdict:
    G. W. Bush …         Not coreferent!
    He …
    …
    Mrs. Bush …
                                              131
Parameter Learning
   Marginalize out hidden variables
     
     ∂/∂wi log P(x) = Ez|x[ ni(x, z) ] − Ex,z[ ni(x, z) ]

       Ez|x: expectation over z, conditioned on observed x
       Ex,z: expectation over both x and z


   Use MC-SAT to approximate both expectations
   May also combine with contrastive estimation
    [Poon & Cherry & Toutanova, NAACL-2009]
                                                                       132
Unsupervised Coreference Resolution
Head(mention, string)
Type(mention, type)
MentionOf(mention, entity)
                                        Mixture model

MentionOf(+m,+e)
Type(+m,+t)                                    Joint inference formulas:
Head(+m,+h) ^ MentionOf(+m,+e)                    Enforce agreement
MentionOf(a,e) ^ MentionOf(b,e) => (Type(a,t) <=> Type(b,t))

… (similarly for Number, Gender etc.)




                                                                       133
Unsupervised Coreference Resolution
Head(mention, string)
Type(mention, type)
MentionOf(mention, entity)
Apposition(mention, mention)

MentionOf(+m,+e)
Type(+m,+t)
Head(+m,+h) ^ MentionOf(+m,+e)

MentionOf(a,e) ^ MentionOf(b,e) => (Type(a,t) <=> Type(b,t))
                                         Joint inference formulas:
… (similarly for Number, Gender etc.)      Leverage apposition

Apposition(a,b) => (MentionOf(a,e) <=> MentionOf(b,e))

More: H. Poon and P. Domingos, “Joint Unsupervised Coreference
Resolution with Markov Logic”, in Proc. EMNLP-2008.              134
USP: End-to-End Machine Reading
    [Poon & Domingos, EMNLP-2009, ACL-2010]
   Read text, extract knowledge, answer
    questions, all without any training examples
   Recursively clusters expressions composed
    with or by similar expressions
   Compared to state of the art like TextRunner,
    five-fold increase in recall, precision from
    below 60% to 91%
    (More in tomorrow’s talk at 4PM)
                                                135
End of The Beginning …
   Let’s not forget our grand goal:
    Computers understand natural languages
   Time to think about a new synthesis
       Integrate previously fragmented subfields
       Adopt 80/20 rule
       End-to-end evaluations
   Statistical relational learning offers a
    promising new tool for this
   Growth area of machine learning and NLP
                                                    136
Future Work: Inference
   Scale up joint inference
       Cutting-planes methods (e.g., [Riedel, 2008])
       Unify lifted inference with sampling
       Coarse-to-fine inference [Kiddon & Domingos, 2010]
   Alternative technology
    E.g., linear programming, Lagrangian relaxation




                                                             137
Future Work: Supervised Learning

   Alternative optimization objectives
    E.g., max-margin learning [Huynh & Mooney, 2009]
   Learning for efficient inference
    E.g., learning arithmetic circuits [Lowd & Domingos, 2008]
   Structure learning:
    Improve accuracy and scalability
    E.g., [Kok & Domingos, 2009]


                                                             138
Future Work: Unsupervised Learning

   Model: Learning objective, formalism, etc.
   Learning: Local optima, intractability, etc.
   Hyperparameter tuning
   Leverage available resources
       Semi-supervised learning
       Multi-task learning
       Transfer learning (e.g., domain adaptation)
   Human in the loop
    E.g., interactive ML, active learning, crowdsourcing
                                                          139
Future Work: NLP Applications
   Existing application areas:
       More joint inference opportunities
       Additional domain knowledge
       Combine multiple pipeline stages
   A “killer app”: Machine reading
   Many, many more awaiting YOU to discover



                                               140
Summary
   We need to unify logical and statistical NLP
   Markov logic provides a language for this
       Syntax: Weighted first-order formulas
       Semantics: Feature templates of Markov nets
       Inference: Satisfiability, MCMC, lifted BP, etc.
       Learning: Pseudo-likelihood, VP, PSCG, ILP, etc.
   Growing set of NLP applications
   Open-source software: Alchemy
               alchemy.cs.washington.edu
   Book: Domingos & Lowd, Markov Logic,
    Morgan & Claypool, 2009.                           141
References
[Banko et al., 2007] Michele Banko, Michael J. Cafarella, Stephen
   Soderland, Matt Broadhead, Oren Etzioni, "Open Information
   Extraction From the Web", In Proc. IJCAI-2007.

[Chakrabarti et al., 1998] Soumen Chakrabarti, Byron Dom, Piotr Indyk,
   "Hypertext Classification Using Hyperlinks", in Proc. SIGMOD-1998.

[Damien et al., 1999] Paul Damien, Jon Wakefield, Stephen Walker,
   "Gibbs sampling for Bayesian non-conjugate and hierarchical
   models by auxiliary variables", Journal of the Royal Statistical
   Society B, 61:2.

[Domingos & Lowd, 2009] Pedro Domingos and Daniel Lowd, Markov
   Logic, Morgan & Claypool.

[Friedman et al., 1999] Nir Friedman, Lise Getoor, Daphne Koller, Avi
    Pfeffer, "Learning probabilistic relational models", in Proc. IJCAI- 142
    1999.
References
[Halpern, 1990] Joseph Halpern, "An analysis of first-order logics of
   probability", Artificial Intelligence 46.

[Huynh & Mooney, 2009] Tuyen Huynh and Raymond Mooney, "Max-
   Margin Weight Learning for Markov Logic Networks", In Proc.
   ECML-2009.

[Kautz et al., 1997] Henry Kautz, Bart Selman, Yuejun Jiang, "A general
   stochastic approach to solving problems with hard and soft
   constraints", In The Satisfiability Problem: Theory and Applications.
   AMS.

[Kok & Domingos, 2007] Stanley Kok and Pedro Domingos, "Statistical
   Predicate Invention", In Proc. ICML-2007.

[Kok & Domingos, 2008] Stanley Kok and Pedro Domingos, "Extracting
   Semantic Networks from Text via Relational Clustering", In Proc.
   ECML-2008.                                                         143
References
[Kok & Domingos, 2009] Stanley Kok and Pedro Domingos, "Learning
   Markov Logic Network Structure via Hypergraph Lifting", In Proc.
   ICML-2009.

[Ling & Weld, 2010] Xiao Ling and Daniel S. Weld, "Temporal
    Information Extraction", In Proc. AAAI-2010.

[Lowd & Domingos, 2007] Daniel Lowd and Pedro Domingos, "Efficient
   Weight Learning for Markov Logic Networks", In Proc. PKDD-2007.

[Lowd & Domingos, 2008] Daniel Lowd and Pedro Domingos,
   "Learning Arithmetic Circuits", In Proc. UAI-2008.

[Meza-Ruiz & Riedel, 2009] Ivan Meza-Ruiz and Sebastian Riedel,
  "Jointly Identifying Predicates, Arguments and Senses using Markov
  Logic", In Proc. NAACL-2009.
                                                                      144
References
[Muggleton, 1996] Stephen Muggleton, "Stochastic logic programs", in
  Proc. ILP-1996.

[Nilsson, 1986] Nils Nilsson, "Probabilistic logic", Artificial Intelligence
    28.

[Page et al., 1998] Lawrence Page, Sergey Brin, Rajeev Motwani, Terry
   Winograd, "The PageRank Citation Ranking: Bringing Order to the
   Web", Tech. Rept., Stanford University, 1998.

[Poon & Domingos, 2006] Hoifung Poon and Pedro Domingos, "Sound
   and Efficient Inference with Probabilistic and Deterministic
   Dependencies", In Proc. AAAI-06.

[Poon & Domingos, 2007] Hoifung Poon and Pedro Domingo, "Joint
   Inference in Information Extraction", In Proc. AAAI-07.
                                                                              145
References
[Poon & Domingos, 2008a] Hoifung Poon, Pedro Domingos, Marc
   Sumner, "A General Method for Reducing the Complexity of
   Relational Inference and its Application to MCMC", In Proc. AAAI-
   08.

[Poon & Domingos, 2008b] Hoifung Poon and Pedro Domingos, "Joint
   Unsupervised Coreference Resolution with Markov Logic", In Proc.
   EMNLP-08.

[Poon & Domingos, 2009] Hoifung and Pedro Domingos,
   "Unsupervised Semantic Parsing", In Proc. EMNLP-09.

[Poon & Cherry & Toutanova, 2009] Hoifung Poon, Colin Cherry,
   Kristina Toutanova, "Unsupervised Morphological Segmentation
   with Log-Linear Models", In Proc. NAACL-2009.

                                                                       146
References
[Poon & Vanderwende, 2010] Hoifung Poon and Lucy Vanderwende,
   "Joint Inference for Knowledge Extraction from Biomedical
   Literature", In Proc. NAACL-10.

[Poon & Domingos, 2010] Hoifung and Pedro Domingos,
   "Unsupervised Ontology Induction From Text", In Proc. ACL-10.

[Riedel 2008] Sebastian Riedel, "Improving the Accuracy and Efficiency
   of MAP Inference for Markov Logic", In Proc. UAI-2008.

[Riedel et al., 2009] Sebastian Riedel, Hong-Woo Chun, Toshihisa
   Takagi and Jun'ichi Tsujii, "A Markov Logic Approach to Bio-
   Molecular Event Extraction", In Proc. BioNLP 2009 Shared Task.

[Selman et al., 1996] Bart Selman, Henry Kautz, Bram Cohen, "Local
   search strategies for satisfiability testing", In Cliques, Coloring, and
   Satisfiability: Second DIMACS Implementation Challenge. AMS. 147
References
[Singla & Domingos, 2006a] Parag Singla and Pedro Domingos,
   "Memory-Efficient Inference in Relational Domains", In Proc. AAAI-
   2006.

[Singla & Domingos, 2006b] Parag Singla and Pedro Domingos, "Entity
   Resolution with Markov Logic", In Proc. ICDM-2006.

[Singla & Domingos, 2007] Parag Singla and Pedro Domingos,
   "Markov Logic in Infinite Domains", In Proc. UAI-2007.

[Singla & Domingos, 2008] Parag Singla and Pedro Domingos, "Lifted
   First-Order Belief Propagation", In Proc. AAAI-2008.

[Taskar et al., 2002] Ben Taskar, Pieter Abbeel, Daphne Koller,
   "Discriminative probabilistic models for relational data", in Proc. UAI-
   2002.
                                                                         148
References
[Toutanova & Haghighi & Manning, 2008] Kristina Toutanova, Aria
   Haghighi, Chris Manning, "A global joint model for semantic role
   labeling", Computational Linguistics.

[Wang & Domingos, 2008] Jue Wang and Pedro Domingos, "Hybrid
  Markov Logic Networks", In Proc. AAAI-2008.

[Wellman et al., 1992] Michael Wellman, John S. Breese, Robert P.
  Goldman, "From knowledge bases to decision models", Knowledge
  Engineering Review 7.

[Yoshikawa et al., 2009] Katsumasa Yoshikawa, Sebastian Riedel,
   Masayuki Asahara and Yuji Matsumoto, "Jointly Identifying
   Temporal Relations with Markov Logic", In Proc. ACL-2009.


                                                                      149

				