SIMS Applied Natural Language Processing Marti Hearst

I256: Applied Natural Language Processing







Marti Hearst

Oct 9, 2006










Today



• Finish Conditional Probabilities and Bayesian Learning

• Intro to Classification; Identification of:

  – Language

  – Author










Conditional Probability



A way to reason about the outcome of an experiment based on partial information

In a word guessing game the first letter for the word is a “t”. What is the likelihood that the second letter is an “h”?

How likely is it that a person has a disease given that a medical test was negative?

A spot shows up on a radar screen. How likely is it that it corresponds to an aircraft?









Slide adapted from Dan Jurafsky
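The word-guessing question can be answered empirically by counting: restrict attention to words whose first letter is “t” (event B), then see what fraction of them have “h” as the second letter (event A). A minimal sketch, assuming a small hypothetical word list stands in for a real lexicon:

```python
def cond_prob_second_given_first(words, first, second):
    """Estimate P(second letter = second | first letter = first) by counting."""
    # Condition on event B: keep only words whose first letter is `first`
    given = [w for w in words if len(w) >= 2 and w[0] == first]
    if not given:
        return 0.0
    # Among those, count event A: the second letter is `second`
    both = sum(1 for w in given if w[1] == second)
    return both / len(given)

# Hypothetical toy lexicon, for illustration only
words = ["the", "this", "that", "to", "tree", "tip", "cat", "hat"]
print(cond_prob_second_given_first(words, "t", "h"))
```

With this toy list, 3 of the 6 words starting with “t” continue with “h”, so the estimate is 0.5; a real lexicon (ideally weighted by word frequency) would give a different number.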

Conditional Probability

Conditional probability specifies the probability of a variable's value given that the values of some other random variables are known.

P(Sneeze | Cold) = 0.8

P(Cold | Sneeze) = 0.6

The probability of a sneeze given a cold is 80%.

The probability of a cold given a sneeze is 60%.









Slides adapted from Mary Ellen Califf

More precisely



Given an experiment, a corresponding sample space S, and the probability law

Suppose we know that the outcome is within some given event B

The first letter was “t”

We want to quantify the likelihood that the outcome also belongs to some other given event A.

The second letter will be “h”

We need a new probability law that gives us the conditional probability of A given B

P(A|B) “the probability of A given B”









Slide adapted from Dan Jurafsky

Joint Probability Distribution



The joint probability distribution for a set of random variables X1…Xn gives the probability of every combination of values:

P(X1,...,Xn)

           Sneeze    ¬Sneeze
Cold       0.08      0.01
¬Cold      0.01      0.9

The probability of any event can be calculated by summing the appropriate subset of values from the joint distribution.

All conditional probabilities can therefore also be calculated, e.g. P(Cold | ¬Sneeze)









Slides adapted from Mary Ellen Califf
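To make “summing the appropriate subset” concrete, here is a small sketch that stores the Cold/Sneeze table above as a dictionary, computes the marginal P(¬Sneeze) by summing over both values of Cold, and then divides to get P(Cold | ¬Sneeze). With these numbers, P(¬Sneeze) = 0.01 + 0.9 = 0.91 and P(Cold | ¬Sneeze) = 0.01 / 0.91 ≈ 0.011.

```python
# Joint distribution P(Cold, Sneeze) from the table; True means the variable holds.
joint = {
    (True, True): 0.08,    # Cold, Sneeze
    (True, False): 0.01,   # Cold, ¬Sneeze
    (False, True): 0.01,   # ¬Cold, Sneeze
    (False, False): 0.90,  # ¬Cold, ¬Sneeze
}

def marginal_sneeze(sneeze):
    """P(Sneeze = sneeze): sum the joint over both values of Cold."""
    return sum(p for (cold, s), p in joint.items() if s == sneeze)

def p_cold_given_sneeze(cold, sneeze):
    """P(Cold = cold | Sneeze = sneeze) = P(cold, sneeze) / P(sneeze)."""
    return joint[(cold, sneeze)] / marginal_sneeze(sneeze)

print(marginal_sneeze(False))            # P(¬Sneeze)
print(p_cold_given_sneeze(True, False))  # P(Cold | ¬Sneeze)
```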

An intuition



• Let’s say A is “it’s raining”.

• Let’s say P(A) in dry California is .01

• Let’s say B is “it was sunny ten minutes ago”

• P(A|B) means

• “what is the probability of it raining now if it was sunny 10 minutes ago”

• P(A|B) is probably way less than P(A)

• Perhaps P(A|B) is .0001

• Intuition: The knowledge about B should change our estimate of the probability of A.









Slide adapted from Dan Jurafsky

Conditional Probability

Let A and B be events

P(A,B) and P(A ∩ B) both mean “the probability that BOTH A and B occur”

P(B|A) = the probability of event B occurring given that event A occurs

Definition:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A, B)}{P(B)}$$

P(A, B) = P(A|B) * P(B) (simple arithmetic)

P(A, B) = P(B, A)





Slide adapted from Dan Jurafsky

Bayes Theorem



We start with the conditional probability definition:

$$P(A \mid B) = \frac{P(A, B)}{P(B)}$$

So say we know how to compute P(A|B). What if we want to figure out P(B|A)? We can rearrange the formula using Bayes' Theorem:

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$
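Bayes' Theorem applied to the medical-test question from the start of the lecture. The prevalence and test error rates below are hypothetical numbers chosen for illustration, not figures from the slides. Here A is “the test is negative” and B is “has the disease”; the denominator P(A) is obtained by summing P(A | B) P(B) over both values of B.

```python
def bayes(p_a_given_b, p_b, p_a):
    """Bayes' Theorem: P(B | A) = P(A | B) P(B) / P(A)."""
    return p_a_given_b * p_b / p_a

# Hypothetical numbers (assumptions, not from the lecture)
p_disease = 0.01             # P(D): prior prevalence
p_neg_given_disease = 0.10   # P(neg | D): the test misses 10% of true cases
p_neg_given_healthy = 0.95   # P(neg | ¬D)

# P(neg) by summing over both values of D
p_neg = p_neg_given_disease * p_disease + p_neg_given_healthy * (1 - p_disease)

p_disease_given_neg = bayes(p_neg_given_disease, p_disease, p_neg)
print(f"P(D | neg) = {p_disease_given_neg:.5f}")
```

Even with a test that misses one case in ten, a negative result lowers the probability of disease from 0.01 to roughly 0.001 under these assumed numbers.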


Deriving Bayes Rule

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \qquad P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

$$P(A \mid B)\,P(B) = P(A \cap B) \qquad P(B \mid A)\,P(A) = P(A \cap B)$$

$$P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$$

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Slide adapted from Dan Jurafsky

How to compute probabilities?

We don’t have the probabilities for most NLP problems

We can try to estimate them from data (that’s the learning part)

Usually we can’t directly estimate the probability that something belongs to a given class given the information about it

BUT we can estimate the probability that something in a given class has particular values.









Slides adapted from Mary Ellen Califf

Simple Bayesian Reasoning

If we assume there are n possible disjoint tags, t1 … tn:

$$P(t_i \mid w) = \frac{P(w \mid t_i)\,P(t_i)}{P(w)}$$

We want to know the probability of the tag given the word.

P(w | ti) = the number of times we see this tag with this word, divided by how often we see the tag:

P(w | ti) = (count of word w with tag i) / (count of tag i in corpus)

P(ti) = (count of tag i in corpus) / (count of all tags)

P(w) = (count of word w in corpus) / (count of all words)



Slides adapted from Mary Ellen Califf
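The counting recipe above, sketched on a tiny hypothetical tagged corpus (the sentences and tags are made up for illustration). Note that the Bayes route gives exactly count(w, t) / count(w): the tag counts and corpus size cancel, since (c(w,t)/c(t)) · (c(t)/N) / (c(w)/N) = c(w,t)/c(w).

```python
from collections import Counter

# Tiny hypothetical tagged corpus of (word, tag) pairs, for illustration only
corpus = [("i", "PRP"), ("can", "MD"), ("open", "VB"), ("the", "DT"),
          ("can", "NN"), ("a", "DT"), ("can", "NN")]

pair_counts = Counter(corpus)                # count of word w with tag t
tag_counts = Counter(t for _, t in corpus)   # count of tag t in corpus
word_counts = Counter(w for w, _ in corpus)  # count of word w in corpus
n = len(corpus)                              # one tag per token: all tags = all words

def p_word_given_tag(w, t):
    return pair_counts[(w, t)] / tag_counts[t]

def p_tag(t):
    return tag_counts[t] / n

def p_word(w):
    return word_counts[w] / n

def p_tag_given_word(t, w):
    """Bayes: P(t | w) = P(w | t) P(t) / P(w)."""
    return p_word_given_tag(w, t) * p_tag(t) / p_word(w)

for tag in ("MD", "NN"):
    print(tag, p_tag_given_word(tag, "can"))
```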

Some notation

$$\prod_{i=1}^{n} P(f_i \mid \text{Sentence})$$

This means that you multiply the conditional probabilities of all the features together:

P(f1 | S) * P(f2 | S) * … * P(fn | S)

There is a similar notation, Σ, for summation.








Naïve Bayes Classifier

The simpler version of Bayes was:

P(B|A) = P(A|B) P(B) / P(A)

P(Sentence | feature) = P(feature | S) P(S) / P(feature)

Using Naïve Bayes, we expand the number of features by defining a joint probability distribution, assuming the features are conditionally independent given Sentence:

$$P(\text{Sentence}, f_1, f_2, \ldots, f_n) = P(\text{Sentence}) \prod_{i=1}^{n} P(f_i \mid \text{Sentence})$$

We learn P(Sentence) and P(fi | Sentence) in training

Test: we need to compute P(Sentence | f1, f2, … fn)

$$P(\text{Sentence} \mid f_1, f_2, \ldots, f_n) = \frac{P(\text{Sentence}, f_1, f_2, \ldots, f_n)}{P(f_1, f_2, \ldots, f_n)}$$
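A minimal sketch of the train/test split described above, assuming binary (present/absent) features and add-alpha smoothing; the smoothing, feature names, and training examples are assumptions for illustration, not part of the slides. Training counts P(class) and P(feature | class); testing forms the joint P(class) · Π P(fi | class) and divides by P(f1, …, fn), obtained by summing the joint over all classes.

```python
from collections import defaultdict

def train(examples, alpha=1.0):
    """Estimate P(label) and P(feature present | label) from labelled examples.

    examples: list of (set_of_active_features, label).
    Add-alpha smoothing is an assumption here, not something the slides specify.
    """
    label_counts = defaultdict(int)
    feat_counts = defaultdict(lambda: defaultdict(int))
    for feats, label in examples:
        label_counts[label] += 1
        for f in feats:
            feat_counts[label][f] += 1
    total = sum(label_counts.values())
    priors = {c: n / total for c, n in label_counts.items()}

    def p_feat(f, c):
        # Each feature is binary (present/absent), hence 2 * alpha in the denominator
        return (feat_counts[c][f] + alpha) / (label_counts[c] + 2 * alpha)

    return priors, p_feat

def posterior(feats, vocab, priors, p_feat):
    """P(label | f_1..f_n): the joint P(label) * prod_i P(f_i | label),
    divided by P(f_1..f_n), which is the joint summed over all labels."""
    joint = {}
    for c, prior in priors.items():
        p_joint = prior
        for f in vocab:
            p = p_feat(f, c)
            p_joint *= p if f in feats else 1.0 - p
        joint[c] = p_joint
    z = sum(joint.values())
    return {c: v / z for c, v in joint.items()}

# Hypothetical training data: which sentences ended up in a summary
examples = [
    ({"fixed_phrase", "para_initial"}, "summary"),
    ({"para_initial"}, "summary"),
    (set(), "other"),
    ({"fixed_phrase"}, "other"),
    (set(), "other"),
]
vocab = {"fixed_phrase", "para_initial"}
priors, p_feat = train(examples)
print(posterior({"fixed_phrase", "para_initial"}, vocab, priors, p_feat))
```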


Bayes Independence Example

If there are many kinds of evidence, we need to combine them

By assuming independence, we ignore the possible interactions:



Imagine there are diagnoses ALLERGY, COLD, and WELL

Symptoms SNEEZE, COUGH, and FEVER



Prob             Well    Cold    Allergy
P(d)             0.9     0.05    0.05
P(sneeze | d)    0.1     0.9     0.9
P(cough | d)     0.1     0.8     0.7
P(fever | d)     0.01    0.7     0.4









Slides adapted from Mary Ellen Califf

Bayes Independence Example

If symptoms are: sneeze & cough & no fever (call this evidence e):

P(well | s, c, not(f)) = P(e | well) P(well) / P(e)

= P(s | well) * P(c | well) * (1 - P(f | well)) * P(well) / P(e)

= (0.1)(0.1)(0.99)(0.9)/P(e) = 0.0089/P(e)

P(cold | e) = (.05)(0.9)(0.8)(0.3)/P(e) = 0.0108/P(e)

P(allergy | e) = (.05)(0.9)(0.7)(0.6)/P(e) = 0.0189/P(e)

The three diagnoses are exhaustive, so P(e) is the sum of the three numerators:

P(e) = .0089 + .0108 + .0189 = .0386

P(well | e) = .23

P(cold | e) = .28

P(allergy | e) = .49



Diagnosis: allergy





Slides adapted from Mary Ellen Califf
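The diagnosis calculation can be reproduced directly from the table. This sketch multiplies the prior by P(symptom | d) for each observed symptom (or 1 − P(symptom | d) when the symptom is absent), which is exactly the independence assumption, and normalises by the sum over diagnoses. Carrying full precision (no intermediate rounding), the posteriors come out to about 0.23 (well), 0.28 (cold) and 0.49 (allergy); the diagnosis is still allergy.

```python
# Table from the slide: priors P(d) and P(symptom | d)
prior = {"well": 0.9, "cold": 0.05, "allergy": 0.05}
p_symptom = {
    "sneeze": {"well": 0.1, "cold": 0.9, "allergy": 0.9},
    "cough":  {"well": 0.1, "cold": 0.8, "allergy": 0.7},
    "fever":  {"well": 0.01, "cold": 0.7, "allergy": 0.4},
}

def diagnose(observed):
    """observed maps symptom -> True/False. Assumes symptoms are
    conditionally independent given the diagnosis (naive Bayes)."""
    unnorm = {}
    for d, p in prior.items():
        for s, present in observed.items():
            p *= p_symptom[s][d] if present else 1 - p_symptom[s][d]
        unnorm[d] = p
    p_e = sum(unnorm.values())  # P(e): the diagnoses are exhaustive
    return {d: v / p_e for d, v in unnorm.items()}

post = diagnose({"sneeze": True, "cough": True, "fever": False})
for d, p in post.items():
    print(d, round(p, 2))
print("Diagnosis:", max(post, key=post.get))
```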

Kupiec et al. Feature Representation

Fixed-phrase feature

Certain phrases indicate summary, e.g. “in summary”

Paragraph feature

Paragraph initial/final more likely to be important.

Thematic word feature

Repetition is an indicator of importance

Uppercase word feature

Uppercase often indicates named entities. (Taylor)

Sentence length cut-off

Summary sentence should be > 5 words.
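A rough sketch of how the feature list above might be turned into binary features for one sentence. The encoding is hypothetical: the cue-phrase list contains only the slide's “in summary”, paragraph position and the set of thematic (frequently repeated) words are assumed to be computed elsewhere, and the uppercase test is a crude stand-in for named-entity detection. It is not Kupiec et al.'s exact feature definition.

```python
import re

# Cue phrases: the slide gives "in summary"; a real list would be longer (assumption)
FIXED_PHRASES = ("in summary",)
MIN_WORDS = 5  # "summary sentence should be > 5 words"

def sentence_features(sentence, para_position, thematic_words):
    """Binary Kupiec-style features for one sentence (a rough, hypothetical encoding).

    para_position: "initial", "medial" or "final" position within the paragraph.
    thematic_words: lower-cased set of the document's most repeated content words.
    """
    words = re.findall(r"[A-Za-z]+", sentence)
    lower = sentence.lower()
    return {
        "fixed_phrase": any(p in lower for p in FIXED_PHRASES),
        "paragraph": para_position in ("initial", "final"),
        "thematic": any(w.lower() in thematic_words for w in words),
        # Crude proxy for named entities: a capitalised word after the first word
        "uppercase": any(w[0].isupper() for w in words[1:]),
        "length": len(words) > MIN_WORDS,
    }

print(sentence_features("In summary, Taylor proposed a new method.", "final", {"method"}))
```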








Details: Bayesian Classifier

$$P(s \in S \mid F_1, F_2, \ldots, F_k) = \frac{P(F_1, F_2, \ldots, F_k \mid s \in S)\,P(s \in S)}{P(F_1, F_2, \ldots, F_k)}$$

Assuming statistical independence:

$$P(s \in S \mid F_1, F_2, \ldots, F_k) = \frac{\prod_{j=1}^{k} P(F_j \mid s \in S)\,P(s \in S)}{\prod_{j=1}^{k} P(F_j)}$$

P(s ∈ S | F1, F2, … Fk): probability that sentence s is included in summary S, given that sentence’s feature-value pairs

P(Fj | s ∈ S): probability of feature-value pair occurring in a source sentence which is also in the summary

P(s ∈ S): compression rate

P(Fj): probability of feature-value pair occurring in a source sentence
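A sketch of how the independence formula could be used to rank sentences. The probability tables and the 0.2 compression rate are hypothetical values for illustration; in practice P(Fj | s ∈ S) and P(Fj) would be estimated from training documents with known summaries. Because the denominator also assumes independence, the result is best treated as a ranking score rather than a guaranteed probability.

```python
def kupiec_score(features, p_f_given_s, p_f, compression_rate):
    """Score for P(s in S | F_1..F_k) under the independence assumption:
    P(s in S) * prod_j P(F_j | s in S) / prod_j P(F_j).

    features: dict feature -> observed value for one sentence.
    p_f_given_s, p_f: dict (feature, value) -> probability, estimated in training.
    """
    score = compression_rate                # P(s in S)
    for feat, val in features.items():
        score *= p_f_given_s[(feat, val)]   # P(F_j | s in S)
        score /= p_f[(feat, val)]           # P(F_j)
    return score

# Hypothetical probabilities, for illustration only
p_f_given_s = {("para", True): 0.6, ("para", False): 0.4,
               ("long", True): 0.9, ("long", False): 0.1}
p_f = {("para", True): 0.3, ("para", False): 0.7,
       ("long", True): 0.7, ("long", False): 0.3}
sentences = {
    "s1": {"para": True, "long": True},
    "s2": {"para": False, "long": True},
    "s3": {"para": False, "long": False},
}
scores = {s: kupiec_score(f, p_f_given_s, p_f, 0.2) for s, f in sentences.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

A summarizer would then keep the top-ranked sentences until the desired summary length is reached.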


Language Identification










Language identification

Tutti gli esseri umani nascono liberi ed eguali

in dignità e diritti. Essi sono dotati di

ragione e di coscienza e devono agire gli uni

verso gli altri in spirito di fratellanza.



Alle Menschen sind frei und gleich an Würde und

Rechten geboren. Sie sind mit Vernunft und

Gewissen begabt und sollen einander im Geist

der Brüderlichkeit begegnen.



Universal Declaration of Human Rights, UN, in 363 languages

http://www.unhchr.ch/udhr/navigate/alpha.htm








Language identification

égaux

eguali

iguales



edistämään



Ü

¿

How do we determine, for a stretch of text, which language it is from?




Language Identification

Turns out to be really simple

Just a few character bigrams can do it (Sibun & Reynar 96)

Used Kullback-Leibler distance (relative entropy)

Compare the probability distribution of the test set to those for the languages trained on

Smallest distance determines the language

Using special character sets helps a bit, but barely
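A minimal sketch of the approach described above: build a character-bigram distribution per language, then pick the language whose distribution has the smallest KL divergence from the test text's distribution. The add-alpha smoothing (so that unseen bigrams do not give infinite divergence), the alpha value, and the use of the two UDHR passages quoted earlier as the only training data are assumptions for illustration; Sibun & Reynar's actual setup may differ, and real systems train on far more text.

```python
import math
from collections import Counter

def bigrams(text):
    """Character bigrams of lower-cased text."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]

def kl_divergence(test_counts, model_counts, vocab_size, alpha=0.5):
    """KL(P_test || Q_model), summed over the bigrams seen in the test text.
    Q_model is add-alpha smoothed (an assumption) so unseen bigrams get nonzero mass."""
    n_test = sum(test_counts.values())
    n_model = sum(model_counts.values())
    kl = 0.0
    for bg, c in test_counts.items():
        p = c / n_test
        q = (model_counts[bg] + alpha) / (n_model + alpha * vocab_size)
        kl += p * math.log(p / q)
    return kl

def identify(text, models):
    """Return the language whose bigram model is closest (smallest KL) to the text."""
    test_counts = Counter(bigrams(text))
    vocab = set(test_counts)
    for counts in models.values():
        vocab |= set(counts)
    return min(models, key=lambda lang: kl_divergence(test_counts, models[lang], len(vocab)))

# Training text: the two UDHR passages quoted earlier (far too little data for real use)
models = {
    "italian": Counter(bigrams(
        "Tutti gli esseri umani nascono liberi ed eguali in dignità e diritti. "
        "Essi sono dotati di ragione e di coscienza e devono agire gli uni "
        "verso gli altri in spirito di fratellanza.")),
    "german": Counter(bigrams(
        "Alle Menschen sind frei und gleich an Würde und Rechten geboren. "
        "Sie sind mit Vernunft und Gewissen begabt und sollen einander im Geist "
        "der Brüderlichkeit begegnen.")),
}
print(identify("in spirito di fratellanza", models))
print(identify("im Geist der Brüderlichkeit", models))
```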










Language Identification

(Sibun & Reynar 96)










Confusion Matrix



A table that shows, for each class, which items your algorithm got right and which it got wrong

[Figure: confusion matrix; axes labeled “Gold standard” and “Algorithm’s guess”]
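A small sketch of building a confusion matrix from parallel lists of gold-standard labels and the algorithm's guesses. Putting gold labels on rows and guesses on columns is one common convention, chosen here as an assumption; the labels and predictions are hypothetical.

```python
from collections import Counter

def confusion_matrix(gold, guess, labels):
    """Rows = gold-standard class, columns = algorithm's guess.
    Diagonal cells count correct answers; off-diagonal cells count errors."""
    counts = Counter(zip(gold, guess))
    return [[counts[(g, p)] for p in labels] for g in labels]

# Hypothetical gold labels and predictions, for illustration only
labels = ["en", "de", "it"]
gold = ["en", "en", "de", "de", "it", "it"]
guess = ["en", "de", "de", "de", "it", "en"]
for label, row in zip(labels, confusion_matrix(gold, guess, labels)):
    print(label, row)
```

Reading along a row shows where items of that true class ended up; here one English item was mistaken for German and one Italian item for English.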


Author Identification

(Stylometry)










Author Identification



Also called Stylometry in the humanities



An example of a Classification Problem



Classifiers:

Decide which of N buckets to put an item in

(Some classifiers allow for multiple buckets)










The Disputed Federalist Papers

In 1787-1788, Jay, Madison, and Hamilton wrote a series of anonymous essays to convince the voters of New York to ratify the new U.S. Constitution.

Scholars have consensus that:

5 authored by Jay

51 authored by Hamilton

14 authored by Madison

3 jointly by Hamilton and Madison



12 remain in dispute … Hamilton or Madison?








Author identification



Federalist papers

In 1963 Mosteller and Wallace solved the problem



They identified function words as good candidates for authorship analysis

Using statistical inference, they concluded the author was Madison

Since then, other statistical techniques have supported this conclusion.






Function vs. Content Words









High rates for “by” favor M, low favor H

High rates for “from” favor M, low says little

High rates for “to” favor H, low favor M
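One simple way to compute features like these is the rate of each function word per 1,000 tokens of a text; a classifier would then compare a disputed paper's rates against those of papers known to be by Hamilton or by Madison. This is a sketch under that assumption, not Mosteller and Wallace's actual procedure, and the sample sentence is made up.

```python
import re
from collections import Counter

FUNCTION_WORDS = ("by", "from", "to")  # the words discussed on this slide

def rates_per_1000(text, words=FUNCTION_WORDS):
    """Occurrences of each word per 1,000 tokens of text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (1000 * counts[w] / total if total else 0.0) for w in words}

sample = "To the people and to the states, power flows by consent."  # hypothetical input
print(rates_per_1000(sample))
```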


Function vs. Content Words









No consistent pattern for “war”


Federalist Papers Problem









Fung, The Disputed Federalist Papers: SVM Feature Selection Via Concave Minimization, ACM TAPIA’03

Discussion



Can Pseudonymity Really Guarantee Privacy?

Rao and Rohatgi, 2000










Next Time



Guest lecture by Elizabeth Charnock and Steve Roberts of Cataphora








