# Corpus Annotation for Computational Linguistics

Name Tagging: Maximum Entropy Model

Heng Ji
hengji@cs.qc.cuny.edu
February 11, 2009

Acknowledgement: some slides from Fei Xia and Radu Florian
 Assignment 1: Feb 18
 Assignment 2: Feb 25
 Assignment 3 (today’s): March 4
 Questions?

Introduction
 F-Measure Scoring (cont'd)
 Maximum Entropy Model
   Basic Idea
   Feature Encoding
   Tool
 Homework Preview

BIO Format
 http://www.cnts.ua.ac.be/conll2002/ner/
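The CoNLL-2002 data linked above uses the BIO tagging scheme: B-TYPE opens an entity, I-TYPE continues it, and O marks tokens outside any entity. A minimal sketch (the `to_bio` helper and the example annotation are illustrative, not part of the CoNLL distribution; the shared task itself has minor variants of the scheme):

```python
def to_bio(tokens, entities):
    """Convert entity spans to per-token BIO tags.

    entities: list of (start, end, type) token spans, end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, etype in entities:
        tags[start] = "B-" + etype          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + etype          # remaining tokens inside it
    return list(zip(tokens, tags))

print(to_bio(["Wolff", "visited", "Argentina"], [(0, 1, "PER"), (2, 3, "LOC")]))
# [('Wolff', 'B-PER'), ('visited', 'O'), ('Argentina', 'B-LOC')]
```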

Scoring: F-Measure

[Diagram: system output A and reference C shown as overlapping sets; B is their intersection, the correctly tagged entities.]

 Precision (P) = #B / #A
 Recall (R) = #B / #C
 F-Measure (F) = 2*P*R / (P+R)
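The three formulas can be wrapped in a small helper; this is a sketch where the counts #A (system entities), #C (reference entities), and #B (correct entities) are passed in directly:

```python
def f_measure(num_sys, num_ref, num_correct):
    precision = num_correct / num_sys   # P = #B / #A
    recall = num_correct / num_ref      # R = #B / #C
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

p, r, f = f_measure(num_sys=10, num_ref=8, num_correct=6)
print(p, r, round(f, 3))  # 0.6 0.75 0.667
```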
Name Tagging Exercise: Data
 Reference
The performance at <ORG>Kent State</ORG> features soloists
<PER>Michael Todd Simpson </PER> (baritone) and
<PER>Kate Lindsey </PER> (mezzo-soprano), as well as
narrator <PER>Dorothy Silver</PER>, one of
<LOC>Cleveland</LOC>’s most significant stage actors.

 System
The performance at <LOC>Kent State</LOC> features soloists
<PER>Michael Todd Simpson </PER> (baritone) and
<PER>Kate Lindsey (mezzo-soprano) </PER>, as well as
narrator Dorothy Silver, one of <LOC>Cleveland</LOC>’s most
significant stage actors.
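Scoring the system output above against the reference by exact match (an entity counts as correct only if both its type and its extent agree) can be sketched as set intersection; the entity sets below are read off the two annotations:

```python
reference = {("ORG", "Kent State"), ("PER", "Michael Todd Simpson"),
             ("PER", "Kate Lindsey"), ("PER", "Dorothy Silver"),
             ("LOC", "Cleveland")}
system = {("LOC", "Kent State"), ("PER", "Michael Todd Simpson"),
          ("PER", "Kate Lindsey (mezzo-soprano)"), ("LOC", "Cleveland")}

correct = reference & system            # B: Kent State fails on type,
                                        # Kate Lindsey fails on extent
p = len(correct) / len(system)          # 2/4 = 0.5
r = len(correct) / len(reference)       # 2/5 = 0.4
f = 2 * p * r / (p + r)
print(len(correct), p, r, round(f, 3))  # 2 0.5 0.4 0.444
```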

Name Tagging by Maximum Entropy Model

 Maximum Entropy is a technique for learning probability
distributions from data.

 The guiding principle: assume nothing more
than what you have observed.

 Always choose the most uniform distribution subject to the
observed constraints.
The basic idea
 Goal: estimate p

 Choose p with maximum entropy (or “uncertainty”) subject
to the constraints (or “evidence”).

H(p) = − Σ_{x ∈ A×B} p(x) log p(x),  where x = (a, b), a ∈ A, b ∈ B
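The entropy formula can be computed directly; a minimal sketch using base-2 logs (entropy in bits), with the `entropy` helper name my own:

```python
import math

def entropy(p):
    """H(p) = -sum_x p(x) log p(x); zero-probability outcomes contribute 0."""
    return -sum(px * math.log(px, 2) for px in p if px > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 -- uniform, maximal for 4 outcomes
print(entropy([0.7, 0.1, 0.1, 0.1]))      # lower: this distribution commits more
```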
Setting
 From training data, collect (a, b) pairs:
 a: thing to be predicted (e.g., a class in a
classification problem)
 b: the context
 Ex: Name tagging:
   a=person
   b=the words in a window and previous two tags

 Learn the probability of each (a, b): p(a, b)
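The empirical joint distribution over (a, b) pairs comes from simple counting over the training data; a toy sketch with hypothetical (class, context-word) pairs:

```python
from collections import Counter

# Hypothetical training pairs: a = class to predict, b = a context word.
pairs = [("person", "mr."), ("person", "mr."), ("location", "in"),
         ("person", "said"), ("location", "in"), ("other", "the")]

counts = Counter(pairs)
total = len(pairs)
p = {ab: c / total for ab, c in counts.items()}  # empirical p(a, b)

print(round(p[("person", "mr.")], 3))  # 0.333
```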

Maximum Entropy
 Why maximum entropy?
 Maximize entropy = Minimize commitment

 Model all that is known and assume nothing about
what is unknown.
   Model all that is known: satisfy a set of constraints that
must hold

   Assume nothing about what is unknown:
choose the most “uniform” distribution
 choose the one with maximum entropy

Why Try to be Uniform?

 Most Uniform = Maximum Entropy

 By making the distribution as uniform as possible, we don't make
any assumptions beyond what is supported by the data
 Matches intuition of how probability distributions should be
estimated from data

 Abides by the principle of Occam's Razor
(fewest assumptions = simplest explanation)

Ex1: Coin-flip example
(Klein & Manning 2003)
   Toss a coin: p(H)=p1, p(T)=p2.
   Constraint: p1 + p2 = 1
   Question: what’s your estimation of p=(p1, p2)?
   Answer: choose the p that maximizes H(p)

H(p) = − Σ_x p(x) log p(x)

[Plot: entropy H as a function of p1; the maximum is at the uniform distribution p1 = 0.5.]
Coin-flip example (cont'd)

[Plot: entropy H over (p1, p2) restricted to the constraint line p1 + p2 = 1; adding the constraint p1 = 0.3 fixes the solution at p1 = 0.3, p2 = 0.7.]
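A quick numeric check of the coin-flip example (a sketch; entropy in bits, and the sweep over p1 is just a brute-force illustration):

```python
import math

def H(p1):
    """Entropy of a coin with p(H) = p1, p(T) = 1 - p1."""
    return -sum(p * math.log(p, 2) for p in (p1, 1 - p1) if p > 0)

# Sweep p1 over [0, 1]: under only the constraint p1 + p2 = 1,
# entropy peaks at the uniform coin p1 = 0.5 ...
best = max((H(p1 / 100), p1 / 100) for p1 in range(101))
print(best)  # (1.0, 0.5)

# ... while the extra constraint p1 = 0.3 leaves a single choice:
print(round(H(0.3), 3))  # 0.881
```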
Maximum Entropy Model (cont'd)

 An expert can classify mention pairs into 2 classes: Coreferent,
Not-Coreferent

 The training data is a set of tokens; each token is represented by a
vector of features

 We want to construct a probability distribution that represents the
tokens
Possible Features
• Words and lemmas in a 5-word window
Sonny Bono was an advanced skier on an intermediate run , the Orion
trail . And he was familiar with this popular South Lake TAHOE
resort , having skied at Heavenly for more than 20 years .

{Sonny-1, Bono0, was+1, be+1, an+2} → PER-NAM-s

{skied-2, ski-2, at-1, Heavenly0, for+1, more+2} → LOC-NAM-s
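The windowed word features can be generated mechanically: for the token at position i, emit the surrounding words tagged with their relative offsets. A sketch (the `window_features` name is my own; lemma features would additionally need a lemmatizer, omitted here):

```python
def window_features(tokens, i, size=2):
    """Words in a (2*size + 1)-token window around position i,
    suffixed with their offset relative to i."""
    feats = []
    for off in range(-size, size + 1):
        j = i + off
        if 0 <= j < len(tokens):
            label = f"+{off}" if off > 0 else str(off)  # -1, 0, +1, ...
            feats.append(tokens[j] + label)
    return feats

tokens = "Sonny Bono was an advanced skier".split()
print(window_features(tokens, 1))  # ['Sonny-1', 'Bono0', 'was+1', 'an+2']
```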
Possible Features
• Prefixes/suffixes of length up to 4
Sonny Bono was an advanced skier on an intermediate run , the Orion
trail . And he was familiar with this popular South Lake TAHOE
resort , having skied at Heavenly for more than 20 years .

{s_, sk_, ski_, skie_, _r, _er, _ier, ..}
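These affix features are just string slices; a sketch reproducing the slide's example for "skier" (the `affix_features` name and the `_` marker convention follow the slide's notation):

```python
def affix_features(word, max_len=4):
    """Prefixes and suffixes of length 1..max_len, marked with '_'."""
    n = min(max_len, len(word))
    prefixes = [word[:k] + "_" for k in range(1, n + 1)]
    suffixes = ["_" + word[-k:] for k in range(1, n + 1)]
    return prefixes + suffixes

print(affix_features("skier"))
# ['s_', 'sk_', 'ski_', 'skie_', '_r', '_er', '_ier', '_kier']
```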

Possible Features
• Word flags
Sonny Bono was an advanced skier on an intermediate run , the Orion
trail . And he was familiar with this popular South Lake TAHOE resort ,
having skied at Heavenly for more than 20 years .

[Flags on sentence words: 20 → number, skier → allLower, Sonny → firstCap, TAHOE → allCaps]
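The word-shape flags follow directly from character tests; a minimal sketch (the `word_flags` helper is my own, and real systems typically add more shapes, e.g. mixed case or digits-with-punctuation):

```python
def word_flags(word):
    """Simple orthographic flags for one token."""
    flags = []
    if word.isdigit():
        flags.append("number")
    if word.isalpha():
        if word.isupper():
            flags.append("allCaps")
        elif word.islower():
            flags.append("allLower")
        elif word[0].isupper():
            flags.append("firstCap")
    return flags

for w in ["20", "skier", "Sonny", "TAHOE"]:
    print(w, word_flags(w))
# 20 ['number'], skier ['allLower'], Sonny ['firstCap'], TAHOE ['allCaps']
```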
Possible Features
• Gazetteer information
Sonny Bono was an advanced skier on an intermediate run , the Orion
trail . And he was familiar with this popular South Lake TAHOE resort ,
having skied at Heavenly for more than 20 years .

[Gazetteer matches: Sonny Bono → Person; South Lake TAHOE → Location; Heavenly → Location]
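Gazetteer features amount to matching known (possibly multi-word) names against the token stream. A sketch; the tiny `GAZETTEER` dictionary below is illustrative, not a real resource:

```python
GAZETTEER = {("Sonny", "Bono"): "Person",
             ("South", "Lake", "TAHOE"): "Location",
             ("Heavenly",): "Location"}

def gazetteer_matches(tokens):
    """Return (position, matched name, type) for every gazetteer hit."""
    hits = []
    for entry, etype in GAZETTEER.items():
        n = len(entry)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == entry:
                hits.append((i, " ".join(entry), etype))
    return hits

tokens = "And he skied at Heavenly with Sonny Bono".split()
print(sorted(gazetteer_matches(tokens)))
# [(4, 'Heavenly', 'Location'), (6, 'Sonny Bono', 'Person')]
```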
Possible Features
• Affix Information
Sonny Bono was an advanced skier on an intermediate run , the Orion
trail . And he was familiar with this popular South Lake TAHOE resort ,
having skied at Heavenly for more than 20 years .

years → year + ~s,  advanced → advance + ~ed,  having → have + ~ing
Possible Features
• Wordnet-based features
Sonny Bono was an advanced skier on an intermediate run ,
the Orion trail . And he was familiar with this popular South
Lake TAHOE resort , having skied at Heavenly for more than
20 years .

[WordNet hypernym features for sentence words: geographical area, cardinal compass point, athlete, confederacy, contestant, region, person, location, life-form, object, entity]
Possible Features
• Wordnet-based features (mapped to name types)
Sonny Bono was an advanced skier on an intermediate run , the Orion
trail . And he was familiar with this popular South Lake TAHOE resort ,
having skied at Heavenly for more than 20 years .

[WordNet-derived name-type labels: PER on the person mention, LOC on the location mentions]
Possible Features
• Syntactic features: POS tag and text chunk label
Sonny Bono was an advanced skier on an intermediate run , the Orion
trail . And he was familiar with this popular South Lake TAHOE
resort , having skied at Heavenly for more than 20 years .

[POS and chunk labels on sentence words, e.g. Sonny → NNP / B-NP, was → VBD / B-VP, skied → VBN / I-VP]
Maximum Entropy Tool: OpenNLP
 http://maxent.sourceforge.net/
 Training Demo
 Test Demo

Homework Preview
 http://nlp.cs.nyu.edu/Assignment3.doc
 Data Format

