The University of Texas at Dallas

CS 6320

Natural Language Processing

Fall 2006



Class Project

A Maximum Likelihood Approach to Phrase Chunking









[Figure: Large Text Corpus → Chunked Phrases]









By:



Mahbubur Rahman Haque

Yusuf Bhagat









Course Instructor: Dr. Sanda Harabagiu






Table of Contents



1. The Problem
    1.1 Background Information
    1.2 Train and Test Data
2. Methodology:
3. Implementation:
    3.1 Training
    3.2 Testing
4. Experimental Results
5. Discussion
6. Conclusion
7. References










1. The Problem



Phrase chunking consists of dividing a text into its constituent phrases. For example, the sentence

“He reckons the current account deficit will narrow to only # 1.8 billion in September .”

can be divided as follows:

[NP He] [VP reckons] [NP the current account deficit] [VP will narrow] [PP to] [NP only # 1.8 billion] [PP in] [NP September] .



Phrase chunking is an intermediate step towards full parsing, and a considerable amount of research has therefore been done on it. The main task in phrase chunking is to train a learning algorithm on a training corpus and then use the information learned during training to chunk test data and measure the accuracy. Several machine learning approaches have been employed for phrase chunking, including SVM (Support Vector Machine), Maximum Entropy and MLE (Maximum Likelihood Estimation). For our project, we selected the MLE approach.





1.1 Background Information



In 1991, Steven Abney proposed to approach parsing by first finding correlated chunks of words [Abn91]. Lance Ramshaw and Mitch Marcus approached chunking with a machine learning method [RM95]. Their work has inspired many others to study the application of learning methods to noun phrase chunking. Other chunk types have not received the same attention as NP chunks. The most complete work is [BVD99], which presents results for NP, VP, PP, ADJP and ADVP chunks. [Vee99] works with NP, VP and PP chunks. [RM95] recognized arbitrary chunks but classified every non-NP chunk as a VP chunk. [Rat98] recognized arbitrary chunks as part of a parsing task but did not report on the chunking performance.










1.2 Train and Test Data



For our project, we used the training and test data provided for the “Conference on Computational Natural Language Learning” (CoNLL-2000). The training and test data consist of three columns separated by spaces. Each word is put on a separate line and there is an empty line after each sentence. The first column contains the current word, the second its part-of-speech tag as derived by the Brill tagger and the third its chunk tag as derived from the WSJ corpus. The chunk tags contain the name of the chunk type, for example I-NP for noun phrase words and I-VP for verb phrase words. Most chunk types have two types of chunk tags: B-CHUNK for the first word of the chunk and I-CHUNK for every other word in the chunk.



Here is an example of the file format:



He PRP B-NP
reckons VBZ B-VP
the DT B-NP
current JJ I-NP
account NN I-NP
deficit NN I-NP
will MD B-VP
narrow VB I-VP
to TO B-PP
only RB B-NP
# # I-NP
1.8 CD I-NP
billion CD I-NP
in IN B-PP
September NNP B-NP
. . O
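The three-column format above can be read with a few lines of Python. The following is only a minimal sketch: the function name `read_sentences` and the shortened two-sentence sample are illustrative assumptions and are not taken from the project's own code.

```python
from typing import Iterable, List, Tuple

Token = Tuple[str, str, str]  # (word, POS tag, chunk tag)


def read_sentences(lines: Iterable[str]) -> List[List[Token]]:
    """Group space-separated 'word POS chunk' lines into sentences."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in lines:
        fields = line.split()
        if not fields:  # an empty line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        word, pos, chunk = fields
        current.append((word, pos, chunk))
    if current:  # the last sentence may lack a trailing empty line
        sentences.append(current)
    return sentences


# Shortened, made-up sample in the same format (two tiny "sentences").
sample = """He PRP B-NP
reckons VBZ B-VP
the DT B-NP

in IN B-PP
September NNP B-NP
. . O
"""
sentences = read_sentences(sample.splitlines())
print(len(sentences), sentences[0][1])
```

For the real data, the same function could be applied to an open file handle, since iterating over a file also yields one line at a time.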





The O chunk tag is used for tokens which are not part of any chunk. Instead of using the part-of-speech tags of the WSJ corpus, the data set uses tags generated by the Brill tagger. Performance with the corpus tags would be better, but it would be unrealistic, since no perfect part-of-speech tags are available for novel text [2].
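To make the relation between the B-/I-/O tags and the bracketed notation of Section 1 concrete, the sketch below converts a tagged sentence into bracketed text. The function name `to_brackets` is an illustrative assumption, and treating an I- tag without a matching open chunk as the start of a new chunk is a design choice of this sketch, not something the data format prescribes.

```python
from typing import List, Optional, Tuple


def to_brackets(tokens: List[Tuple[str, str]]) -> str:
    """Render (word, chunk_tag) pairs as '[TYPE w1 w2] ...' text."""
    # Each group is (chunk type or None for O tokens, list of words).
    groups: List[Tuple[Optional[str], List[str]]] = []
    for word, tag in tokens:
        if tag == "O":
            groups.append((None, [word]))
            continue
        prefix, chunk_type = tag.split("-", 1)
        continues = (
            prefix == "I"
            and len(groups) > 0
            and groups[-1][0] == chunk_type
        )
        if continues:
            groups[-1][1].append(word)
        else:  # a B- tag, or an I- tag with no matching open chunk
            groups.append((chunk_type, [word]))
    rendered = []
    for chunk_type, words in groups:
        text = " ".join(words)
        rendered.append(text if chunk_type is None else f"[{chunk_type} {text}]")
    return " ".join(rendered)


example = [
    ("He", "B-NP"), ("reckons", "B-VP"), ("the", "B-NP"),
    ("current", "I-NP"), ("account", "I-NP"), ("deficit", "I-NP"),
    ("will", "B-VP"), ("narrow", "I-VP"), ("to", "B-PP"),
    ("only", "B-NP"), ("#", "I-NP"), ("1.8", "I-NP"),
    ("billion", "I-NP"), ("in", "B-PP"), ("September", "B-NP"),
    (".", "O"),
]
bracketed = to_brackets(example)
print(bracketed)
```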








2. Methodology:



For our project we followed the Maximum Likelihood approach to predict the phrase chunks for the sentences in the test corpus. The approach simply finds the chunk label that is most likely in a given context. A label is assigned to each word of the test corpus, and a script then computes the accuracy, precision, recall and F values to show how good the implementation is. More about the methodology we used can be found in the paper “A Context Sensitive Maximum Likelihood Approach to Chunking” by Christer Johansson, published in the proceedings of CoNLL-2000 [1].
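The four measures mentioned above can be illustrated with a short sketch. It is not the evaluation script used for the project; it is a simplified stand-in that assumes accuracy is counted per token, while precision, recall and F are counted over whole chunks (a predicted chunk is correct only if its boundaries and type both match). The names `chunks` and `score` are hypothetical.

```python
from typing import List, Set, Tuple


def chunks(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Return (start, end_exclusive, type) for every chunk in a tag list."""
    found = set()
    start, kind = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a final chunk
        inside = tag.startswith("I-") and tag[2:] == kind
        if inside:
            continue
        if kind is not None:  # the open chunk ends just before position i
            found.add((start, i, kind))
        if tag == "O":
            start, kind = None, None
        else:  # a B- tag, or an I- tag that does not continue the chunk
            start, kind = i, tag[2:]
    return found


def score(gold: List[str], pred: List[str]) -> Tuple[float, float, float, float]:
    """Token accuracy plus chunk-level precision, recall and F."""
    gold_chunks, pred_chunks = chunks(gold), chunks(pred)
    correct = len(gold_chunks & pred_chunks)
    precision = correct / len(pred_chunks) if pred_chunks else 0.0
    recall = correct / len(gold_chunks) if gold_chunks else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return accuracy, precision, recall, f


# Made-up example: one wrong tag splits a two-word NP into two chunks.
gold = ["B-NP", "B-VP", "B-NP", "I-NP", "O"]
pred = ["B-NP", "B-VP", "B-NP", "B-NP", "O"]
accuracy, precision, recall, f = score(gold, pred)
print(accuracy, precision, recall, f)
```

The example shows why chunk-level scores are stricter than token accuracy: a single wrong tag costs one gold chunk and produces two wrong predicted chunks.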





3. Implementation:



The Maximum Likelihood Estimation method allows us to predict the most likely chunk label for each word of a sentence. However, this prediction is not always correct, and hence the accuracy needs to be measured. We can divide the chunking task into two steps: (a) training and (b) testing.







3.1 Training

The main task of the training process is to let the learning algorithm find the most frequent label for a word in a particular context. Only the POS (part-of-speech) tag information was used for this. During the training phase, we created a list of the most likely chunk labels in each context, using the POS tag information of the training file. This list was created as follows:





We constructed a symmetric n-context for each of the words in the training corpus. A 1-context simply maps each tag to its most frequent chunk label. A 3-context consists of the tag of the word under consideration, the tag of the preceding word and the tag of the following word, and is kept as “[t-1 t0 t+1] → label” in the list. Similarly, a 5-context is kept as “[t-2 t-1 t0 t+1 t+2] → label” in the list. When we assigned a label to a word in some context, we used the most frequent label for that context.





For our project, we used only three context sizes for determining chunk labels: 1-context, 3-context and 5-context. We also added a “” tag at the beginning and end of each sentence so that sentence boundaries are considered in the context. In brief, the following steps were followed in the training phase:





1. Obtain the 5-context, 3-context and 1-context information from the training file.

2. For each set of n-contexts, associate the most frequent label with the context information as the predicted label for any word occurring in the same context.
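The two training steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the names `context` and `train` are hypothetical, the boundary tag `<pad>` is a placeholder chosen for this sketch (the tag actually used in the project is not shown in the text), and all context sizes are stored in one dictionary keyed by tuples of POS tags.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Sentence = List[Tuple[str, str]]  # (POS tag, chunk label) for each word
PAD = "<pad>"  # hypothetical boundary tag, an assumption of this sketch


def context(tags: List[str], i: int, n: int) -> Tuple[str, ...]:
    """Symmetric n-context (n odd) of POS tags centred on position i."""
    half = n // 2
    padded = [PAD] * half + tags + [PAD] * half
    return tuple(padded[i:i + n])


def train(sentences: List[Sentence], sizes=(5, 3, 1)) -> Dict[Tuple[str, ...], str]:
    """Map every observed n-context to its most frequent chunk label."""
    counts = defaultdict(Counter)  # context -> label frequencies
    for sentence in sentences:
        tags = [tag for tag, _ in sentence]
        for i, (_, label) in enumerate(sentence):
            for n in sizes:
                counts[context(tags, i, n)][label] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}


# Tiny made-up training set.
training = [
    [("DT", "B-NP"), ("NN", "I-NP"), ("VBZ", "B-VP")],
    [("NN", "B-NP"), ("VBZ", "B-VP")],
    [("DT", "B-NP"), ("NN", "I-NP")],
]
model = train(training)
print(model[("NN",)], model[(PAD, "NN", "VBZ")])
```

In the toy data, NN is most often I-NP on its own, but the 3-context with a sentence boundary on the left maps it to B-NP, which is the kind of distinction the longer contexts are meant to capture.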



3.2 Testing

Once training is completed, the information learnt in the training phase is applied to the test file. For each word, the 5-context is considered first. If the same 5-context is found in the list of 5-contexts created during the training phase, the label given by that list is assigned to the word. If the 5-context is not found, we check whether the 3-context is available for the word; if it is, the 3-context label is assigned. If the 3-context is unavailable as well, the 1-context is used. In the testing phase, we therefore follow the steps below:





1. Compute the longest context used in the training phase (the 5-context in our case) for each word in the test corpus.

2. For each context computed in step 1, look it up in the n-context lists created in the training phase and return the label that corresponds to the longest surviving n-context. That is, look up [t-2 t-1 t0 t+1 t+2] …… [t0] in the lists obtained in the training phase and use the longest context available to predict the label.
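The back-off lookup described in these steps can be sketched as below. Again this is an illustration rather than the project's implementation: `predict`, `context` and the `<pad>` boundary tag are hypothetical names, the small `model` dictionary is hand-written for the example, and falling back to the O label when even the 1-context is unknown is an assumption of this sketch, since the text does not say what happens in that case.

```python
from typing import Dict, List, Tuple

PAD = "<pad>"  # hypothetical boundary tag, assumed to match training


def context(tags: List[str], i: int, n: int) -> Tuple[str, ...]:
    """Symmetric n-context (n odd) of POS tags centred on position i."""
    half = n // 2
    padded = [PAD] * half + tags + [PAD] * half
    return tuple(padded[i:i + n])


def predict(
    tags: List[str],
    model: Dict[Tuple[str, ...], str],
    sizes=(5, 3, 1),
    default: str = "O",
) -> List[str]:
    """Label each word with the longest context found in the model."""
    labels = []
    for i in range(len(tags)):
        for n in sizes:  # try the longest context first
            label = model.get(context(tags, i, n))
            if label is not None:
                break
        else:
            label = default  # assumption: unseen POS tags get the O label
        labels.append(label)
    return labels


# Hand-written toy model: three 1-contexts and one 3-context.
model = {
    ("DT",): "B-NP",
    ("NN",): "I-NP",
    ("VBZ",): "B-VP",
    (PAD, "NN", "VBZ"): "B-NP",
}
first = predict(["NN", "VBZ"], model)
second = predict(["DT", "NN", "UH"], model)
print(first, second)
```

In the first call the sentence-initial NN is resolved by the 3-context, while VBZ falls back to its 1-context; in the second call the tag UH is unknown to the toy model and receives the default label.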










4. Experimental Results



We used the training and test files provided for CoNLL-2000. Since running on the entire training file takes a lot of time, we trained our program on portions of the CoNLL-2000 training file (“train.txt”) and then used the test file (“test.txt”) to measure the accuracy of our program. We trained on increasing sizes of the training corpus; the following tables report the accuracy, precision, recall and F values achieved for two different training file sizes. In both cases the test file was the same (“test.txt” from CoNLL-2000).









Table 4-1: Test Results-1

Training file size: 160 KB
Accuracy on test file: 86.66%
Processed 47377 tokens with 23852 phrases; found: 25149 phrases; correct: 20207

Chunk type   Precision   Recall    F
ADJP         34.35%      25.80%    29.47
ADVP         56.33%      62.70%    59.34
CONJP         0.00%       0.00%     0.00
INTJ          0.00%       0.00%     0.00
LST           0.00%       0.00%     0.00
NP           82.85%      87.55%    85.14
PP           85.52%      91.33%    88.33
PRT          28.87%      26.42%    27.59
SBAR         40.26%      23.18%    29.42
VP           80.75%      88.64%    84.52
All          80.35%      84.72%    82.48










Table 4-2: Test Results-2

Training file size: 442 KB
Accuracy on test file: 90.63%
Processed 47377 tokens with 23852 phrases; found: 24871 phrases; correct: 20642

Chunk type   Precision   Recall    F
ADJP         48.99%      44.29%    46.52
ADVP         65.31%      74.36%    69.55
CONJP         0.00%       0.00%     0.00
INTJ          0.00%       0.00%     0.00
LST           0.00%       0.00%     0.00
NP           84.58%      88.67%    86.58
PP           86.51%      92.54%    89.42
PRT          39.13%      33.96%    36.36
SBAR         55.56%      21.50%    31.00
VP           83.49%      89.87%    86.56
All          83.00%      86.54%    84.73
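The “All” rows of both tables follow directly from the phrase counts reported with each table: precision is correct/found, recall is correct/total phrases in the test file, and F is their harmonic mean. The short check below recomputes them; the helper name `prf` is an illustrative assumption.

```python
from typing import Tuple


def prf(correct: int, found: int, gold: int) -> Tuple[float, float, float]:
    """Precision, recall and F (in percent) from chunk counts."""
    precision = correct / found   # share of predicted phrases that are right
    recall = correct / gold       # share of true phrases that were found
    f = 2 * precision * recall / (precision + recall)
    return 100 * precision, 100 * recall, 100 * f


# Counts taken from the headers of Table 4-1 and Table 4-2.
table_1 = prf(correct=20207, found=25149, gold=23852)
table_2 = prf(correct=20642, found=24871, gold=23852)

for name, (p, r, f) in (("Table 4-1", table_1), ("Table 4-2", table_2)):
    print(f"{name}: precision {p:.2f}%  recall {r:.2f}%  F {f:.2f}")
```

Rounded to two decimals, the recomputed values agree with the “All” rows of both tables.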









5. Discussion



The resulting accuracy shows that Maximum Likelihood Estimation works well for chunking. In our test results, the overall accuracy, precision, recall and F-score all increase with the size of the training file (accuracy rose from 86.66% to 90.63% and overall F from 82.48 to 84.73), which suggests that we would have achieved a higher accuracy had we trained our program on the whole training corpus provided for CoNLL-2000. Due to time constraints, we could not train our algorithm on the whole of that training file. Accuracy can also be increased further by human intervention. It is easy to check where the “inside phrase” chunk labels do not match the “beginning phrase” chunk labels, and correcting these labels raises the accuracy. The mismatches will not be many, so this requires very little human work. Checking by a human can also help in discovering new rules to add to the chunking algorithm, which may improve performance.
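The mismatch check mentioned above is easy to automate so that a human only has to look at the flagged positions. The sketch below is one possible way to do it; the function name `find_mismatches` and the rule used (an I-X label is suspicious unless the previous label is B-X or I-X) are assumptions of this sketch.

```python
from typing import List


def find_mismatches(tags: List[str]) -> List[int]:
    """Return positions of I-X labels not preceded by B-X or I-X."""
    suspicious = []
    previous = "O"  # treat the sentence start like an O label
    for i, tag in enumerate(tags):
        if tag.startswith("I-"):
            kind = tag[2:]
            if previous not in ("B-" + kind, "I-" + kind):
                suspicious.append(i)
        previous = tag
    return suspicious


# Made-up predicted labels with two inconsistent "inside" labels.
predicted = ["B-NP", "I-NP", "I-VP", "O", "I-PP", "B-NP"]
flagged = find_mismatches(predicted)
print(flagged)
```

Here positions 2 (I-VP directly after an NP chunk) and 4 (I-PP after an O label) are flagged for review.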

Sentence chunks generally contain up to five words, so 5-context chunking was enough to give a sufficiently high accuracy. Depending on the data, adding a 7-context might improve the accuracy. However, it does not seem to add much to the present accuracy.





6. Conclusion



Maximum Likelihood Estimation is a simple but effective technique for text chunking. It creates straightforward mappings from word tags to chunk labels. Looking up these mappings to chunk new text requires very little computation and gives good accuracy. The accuracy could be increased further if all the groups of labels between “beginning phrase” labels were recorded and the label occurring most often in each group were taken to identify the phrase.
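One possible reading of this suggestion is sketched below: each run of labels that starts at a “beginning phrase” (B-) label and continues over the following “inside phrase” (I-) labels is treated as one group, and the chunk type that occurs most often in the group replaces the others. This was not implemented or evaluated in the project; the function name `relabel_by_majority` and the grouping rule are assumptions of this sketch.

```python
from collections import Counter
from typing import List


def relabel_by_majority(tags: List[str]) -> List[str]:
    """Give every B-/I- group the chunk type that is most frequent in it."""
    result = list(tags)
    i = 0
    while i < len(tags):
        if not tags[i].startswith("B-"):
            i += 1
            continue
        j = i + 1  # extend the group over the following I- labels
        while j < len(tags) and tags[j].startswith("I-"):
            j += 1
        kinds = Counter(tag[2:] for tag in tags[i:j])
        winner = kinds.most_common(1)[0][0]
        result[i] = "B-" + winner
        for k in range(i + 1, j):
            result[k] = "I-" + winner
        i = j
    return result


# Made-up prediction: the middle label disagrees with its group.
noisy = ["B-NP", "I-VP", "I-NP", "O", "B-VP", "I-VP"]
cleaned = relabel_by_majority(noisy)
print(cleaned)
```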





7. References

1. Christer Johansson. A Context Sensitive Maximum Likelihood Approach to Chunking. In: Proceedings of CoNLL-2000 and LLL-2000, Lisbon, Portugal, 2000.

2. Papers, train and test data from the “Conference on Computational Natural Language Learning” (CoNLL-2000). Web address: http://www.cnts.ua.ac.be/conll2000/chunking/








