Corpus Annotation for Computational Linguistics

Name Tagging: Maximum Entropy Model



                         Heng Ji
                  hengji@cs.qc.cuny.edu
                      February 11, 2009




Acknowledgement: some slides from Fei Xia and Radu Florian
Deadlines
 Assignment 1: Feb 18
 Assignment 2: Feb 25
 Assignment 3 (today’s): March 4
 Questions?




Introduction
 More about BIO Format
 F-Measure Scoring (cont’d)
 Maximum Entropy Model
     Basic Idea
     Feature Encoding
     Tool
 Homework Preview



BIO Format
 CoNLL-2002 Shared Task
 http://www.cnts.ua.ac.be/conll2002/ner/




Scoring: F-Measure



  [Venn diagram: A = the set of names output by the system, C = the set of
   names in the reference, B = their overlap, i.e. the correct names]
  Precision (P) = #B / #A
  Recall (R) = #B / #C
  F-Measure (F) = 2*P*R / (P+R)

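The formulas above can be applied directly once system and reference names are represented as sets; a minimal sketch (the names below are toy data, not the slide’s example):

```python
# F-measure for name tagging: compare system output against the reference.
# A name counts as correct only if both its type and its span match exactly.

reference = {("PER", "John Smith"), ("LOC", "Paris"), ("ORG", "UN")}
system    = {("PER", "John Smith"), ("ORG", "Paris")}  # "Paris" mistyped; "UN" missed

B = system & reference                  # correct names (the overlap)
precision = len(B) / len(system)        # #B / #A  -> 0.5
recall    = len(B) / len(reference)     # #B / #C  -> 1/3
f_measure = 2 * precision * recall / (precision + recall)  # -> 0.4
```

Encoding each name as a (type, string) pair makes the intersection do the span-and-type match in one step.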
Name Tagging Exercise: Data
 Reference
  The performance at <ORG>Kent State</ORG> features soloists
  <PER>Michael Todd Simpson </PER> (baritone) and
  <PER>Kate Lindsey </PER> (mezzo-soprano), as well as
  narrator <PER>Dorothy Silver</PER>, one of
  <LOC>Cleveland</LOC>’s most significant stage actors.

 System
  The performance at <LOC>Kent State</LOC> features soloists
  <PER>Michael Todd Simpson </PER> (baritone) and
  <PER>Kate Lindsey (mezzo-soprano) </PER>, as well as
  narrator Dorothy Silver, one of <LOC>Cleveland</LOC>’s most
  significant stage actors.



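Scoring this exercise with the previous slide’s formulas, counting a name as correct only when both type and exact span match (the (type, string) encoding is one reasonable choice):

```python
# Reference vs. system names from the exercise above.
reference = {("ORG", "Kent State"), ("PER", "Michael Todd Simpson"),
             ("PER", "Kate Lindsey"), ("PER", "Dorothy Silver"),
             ("LOC", "Cleveland")}
system = {("LOC", "Kent State"),                    # wrong type
          ("PER", "Michael Todd Simpson"),          # correct
          ("PER", "Kate Lindsey (mezzo-soprano)"),  # wrong span
          ("LOC", "Cleveland")}                     # correct
          # "Dorothy Silver" is missed entirely

correct = system & reference            # 2 names
p = len(correct) / len(system)          # 2/4 = 0.5
r = len(correct) / len(reference)       # 2/5 = 0.4
f = 2 * p * r / (p + r)                 # = 4/9, about 0.444
```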
  Name Tagging by Maximum Entropy
  Model

 Maximum Entropy is a technique for learning probability
   distributions from data

 “Don’t assume anything about your probability distribution other
   than what you have observed.”

 Always choose the most uniform distribution subject to the
   observed constraints.




   The basic idea
 Goal: estimate p


 Choose p with maximum entropy (or “uncertainty”) subject
  to the constraints (or “evidence”).



             H(p) = − Σ p(x) log p(x),   summing over x ∈ A × B

             x = (a, b), where a ∈ A and b ∈ B
Setting
 From training data, collect (a, b) pairs:
    a: thing to be predicted (e.g., a class in a
     classification problem)
    b: the context
    Ex: Name tagging:
         a=person
         b=the words in a window and previous two tags



 Learn the probability of each (a, b): p(a, b)

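Collecting the (a, b) events from tagged data can be sketched as follows; the sentence, tags, and feature names below are illustrative, not from the slide:

```python
# Each training event pairs a = the tag to predict with b = its context
# (here: a small word window plus the previous two tags).

words = ["John", "Smith", "lives", "in", "Paris"]
tags  = ["B-PER", "I-PER", "O", "O", "B-LOC"]

events = []
for i, tag in enumerate(tags):
    context = {
        "w-1": words[i - 1] if i > 0 else "<s>",
        "w0":  words[i],
        "w+1": words[i + 1] if i + 1 < len(words) else "</s>",
        "t-2": tags[i - 2] if i > 1 else "<s>",
        "t-1": tags[i - 1] if i > 0 else "<s>",
    }
    events.append((tag, context))
```

Each event is one (a, b) pair; the model then estimates p(a, b) from the whole collection.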
Maximum Entropy
 Why maximum entropy?
 Maximize entropy = Minimize commitment

 Model all that is known and assume nothing about
  what is unknown.
      Model all that is known: satisfy a set of constraints that
       must hold

      Assume nothing about what is unknown:
       choose the most “uniform” distribution
        choose the one with maximum entropy



Why Try to be Uniform?

  Most Uniform = Maximum Entropy

  By making the distribution as uniform as possible, we don’t make
    any assumptions beyond what is supported by the data
  Matches the intuition of how probability distributions should be
    estimated from data

  Abides by the principle of Occam’s Razor
 (fewest assumptions = simplest explanation)




    Ex1: Coin-flip example
    (Klein & Manning 2003)
       Toss a coin: p(H)=p1, p(T)=p2.
       Constraint: p1 + p2 = 1
       Question: what’s your estimation of p=(p1, p2)?
       Answer: choose the p that maximizes H(p)

          H(p) = − Σ p(x) log p(x),   summing over x




[Figure: entropy H plotted as a function of p1, with the point p1 = 0.3 marked]
    Coin-flip example (cont)



[Figure: entropy surface H over (p1, p2), restricted to the constraint line
 p1 + p2 = 1; the point p1 = 0.3 on that line is marked]
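The coin-flip example can be checked numerically: under only the constraint p1 + p2 = 1, entropy peaks at the uniform distribution p1 = 0.5, and a biased point such as p1 = 0.3 has strictly lower entropy.

```python
import math

# Entropy of a coin with p(H) = p1 and p(T) = 1 - p1, in bits.
def H(p1):
    return -sum(p * math.log2(p) for p in (p1, 1 - p1) if p > 0)

# Search a grid of candidate values for p1.
grid = [i / 100 for i in range(1, 100)]
best = max(grid, key=H)   # the maximum-entropy choice: 0.5, with H = 1 bit
```

H(0.3) is roughly 0.881 bits, below the 1 bit reached at p1 = 0.5.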
  Maximum Entropy Model (cont’d)


 An expert can classify mention pairs into two classes: Coreferent and
   Not-Coreferent

 The training data is a set of tokens; each token is represented by a
   vector of features

 We want to construct a probability distribution that represents the
   tokens




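The distribution that maximum entropy produces has a log-linear form: p(a | b) is proportional to exp(Σi λi fi(a, b)), where each feature fi fires on a class/context pair and λi is its learned weight. A sketch with made-up (not learned) weights:

```python
import math

CLASSES = ["PER", "LOC", "O"]

# A feature fires as a (context-attribute, class) pair.
def features(a, b):
    return [(name + "=" + str(val), a) for name, val in b.items()]

# Illustrative weights; in practice these are estimated from training data.
weights = {("w0=Heavenly", "LOC"): 1.5,
           ("t-1=O", "LOC"): 0.2,
           ("w0=Heavenly", "PER"): 0.1}

def p(a, b):
    def score(c):
        return math.exp(sum(weights.get(f, 0.0) for f in features(c, b)))
    return score(a) / sum(score(c) for c in CLASSES)

context = {"w0": "Heavenly", "t-1": "O"}
probs = {c: p(c, context) for c in CLASSES}  # LOC gets the highest probability
```

The normalizer (the sum over classes) guarantees the probabilities sum to 1 for every context.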
 Possible Features
• Words and lemmas in a 5-word window
Sonny Bono was an advanced skier on an intermediate run , the Orion
  trail . And he was familiar with this popular South Lake TAHOE
  resort , having skied at Heavenly for more than 20 years .




 {Sonny-1, Bono0, was+1, be+1, an+2}          PER-NAM-s

 {skied-2, ski-2, at-1, Heavenly0, for+1, more+2}   LOC-NAM-s


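These window features can be generated as below; the tiny lemma table stands in for a real lemmatizer:

```python
# Words (and lemmas, where they differ) in a 5-word window: positions -2..+2
# around the target token, each suffixed with its offset.

lemmas = {"was": "be", "skied": "ski", "having": "have"}

def window_features(tokens, i, size=2):
    feats = []
    for off in range(-size, size + 1):
        j = i + off
        if not 0 <= j < len(tokens):
            continue
        suffix = "%+d" % off if off else "0"
        feats.append(tokens[j] + suffix)
        if tokens[j].lower() in lemmas:
            feats.append(lemmas[tokens[j].lower()] + suffix)
    return feats

sentence = ["Sonny", "Bono", "was", "an", "advanced", "skier"]
# window_features(sentence, 1) reproduces the slide's feature set for "Bono":
# ['Sonny-1', 'Bono0', 'was+1', 'be+1', 'an+2']
```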
   Possible Features
  • Prefixes/suffixes of length up to 4
  Sonny Bono was an advanced skier on an intermediate run , the Orion
     trail . And he was familiar with this popular South Lake TAHOE
     resort , having skied at Heavenly for more than 20 years .




{s_, sk_, ski_, skie_, _r, _er, _ier, ..}




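A sketch of generating the prefix/suffix features of length up to 4, using the slide’s underscore notation:

```python
# Prefixes are written "s_", "sk_", ...; suffixes "_r", "_er", ...

def affix_features(word, max_len=4):
    n = min(max_len, len(word))
    prefixes = [word[:k] + "_" for k in range(1, n + 1)]
    suffixes = ["_" + word[-k:] for k in range(1, n + 1)]
    return prefixes + suffixes

# affix_features("skier") ->
# ['s_', 'sk_', 'ski_', 'skie_', '_r', '_er', '_ier', '_kier']
```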
   Possible Features
 • Word flags
  Sonny Bono was an advanced skier on an intermediate run , the Orion
    trail . And he was familiar with this popular South Lake TAHOE resort ,
    having skied at Heavenly for more than 20 years .



[Figure: flags over the sentence, e.g. "20" → number, "Sonny" → firstCap,
 "TAHOE" → allCaps, "skier" → allLower]

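The flags shown can be computed from word shape alone; a minimal sketch:

```python
# Word-shape flags: number, allCaps, firstCap, allLower.

def word_flags(word):
    flags = []
    if word.isdigit():
        flags.append("number")
    if word.isupper():
        flags.append("allCaps")
    elif word[:1].isupper() and word[1:].islower():
        flags.append("firstCap")
    elif word.islower():
        flags.append("allLower")
    return flags

# "TAHOE" -> ["allCaps"], "Sonny" -> ["firstCap"],
# "skier" -> ["allLower"], "20" -> ["number"]
```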
  Possible Features
 • Gazetteer information
 Sonny Bono was an advanced skier on an intermediate run , the Orion
    trail . And he was familiar with this popular South Lake TAHOE resort ,
    having skied at Heavenly for more than 20 years .




[Figure: gazetteer hits over the sentence, e.g. "Sonny Bono" → Person and
 two place names → Location]


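Gazetteer features are simple lookups against precompiled name lists; the tiny lists below are illustrative, not a real gazetteer:

```python
# Each gazetteer maps a label to a set of known (lowercased) names.
gazetteers = {
    "Location": {"south lake tahoe", "heavenly", "orion"},
    "Person": {"sonny bono"},
}

def gazetteer_features(phrase):
    key = phrase.lower()
    return [label for label, names in gazetteers.items() if key in names]

# gazetteer_features("Sonny Bono") -> ['Person']
# gazetteer_features("Heavenly")   -> ['Location']
```

Real systems usually match the longest phrase at each position rather than single tokens, so "South Lake TAHOE" fires as one Location hit.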
   Possible Features
  • Affix Information
  Sonny Bono was an advanced skier on an intermediate run , the Orion
     trail . And he was familiar with this popular South Lake TAHOE resort ,
     having skied at Heavenly for more than 20 years .




year+~s      advance + ~ed             have + ~ing




 Possible Features
• Wordnet-based features
Sonny Bono was an advanced skier on an intermediate run ,
 the Orion trail . And he was familiar with this popular South
 Lake TAHOE resort , having skied at Heavenly for more than
 20 years .

[Figure: WordNet hypernym chains for words in the sentence:
   geographical area → region → location → object → entity
   cardinal compass point;  confederacy
   athlete → contestant → person → life-form → entity]

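A real system would query WordNet for these chains; the sketch below uses a hand-copied chain table (the table itself is an assumption, not WordNet output):

```python
# Walk up a hypernym table until no more-general term exists.
hypernym = {"skier": "athlete", "athlete": "contestant",
            "contestant": "person", "person": "life-form",
            "life-form": "entity"}

def hypernym_chain(word):
    chain = [word]
    while chain[-1] in hypernym:
        chain.append(hypernym[chain[-1]])
    return chain

# hypernym_chain("skier") ->
# ['skier', 'athlete', 'contestant', 'person', 'life-form', 'entity']
```

The useful feature is usually a high-level ancestor (e.g. person vs. location), which the next slide maps onto name types.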
 Possible Features
• Wordnet-based features mapped to name types
Sonny Bono was an advanced skier on an intermediate run , the Orion
  trail . And he was familiar with this popular South Lake TAHOE resort ,
  having skied at Heavenly for more than 20 years .

[Figure: WordNet-derived type hints over the sentence: one PER label
 (e.g. over "Sonny Bono") and several LOC labels over place names]


 Possible Features
• Syntactic features: POS tag and text chunk label
Sonny Bono was an advanced skier on an intermediate run , the Orion
  trail . And he was familiar with this popular South Lake TAHOE
  resort , having skied at Heavenly for more than 20 years .


[Figure: labels over the sentence, e.g. "Sonny" = NNP / B-NP,
 "was" = VBD / B-VP, "skied" = VBN / I-VP]




Maximum Entropy Tool: OpenNLP
 http://maxent.sourceforge.net/
 Training Demo
 Test Demo




Homework Preview
 http://nlp.cs.nyu.edu/Assignment3.doc
 Data Format





								