The MITLL/AFRL MT System





Wade Shen, Brian Delaney, and Tim Anderson



23 October 2005









This work is sponsored by the United States Air Force Research Laboratory under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.





MIT Lincoln Laboratory




Statistical Translation System

Experimental Architecture



• Standard Statistical Architecture

• Developed in-house to support SMT experiments
– Framework for experiments with low-resource languages
– Test-bed for S2S MT system

• Most components are home-grown
– Phrase Training/Minimum Error Rate Training
– Pharaoh used for decoding in IWSLT, comparable performance with new Viterbi Decoder

• Participated in Chinese → English Supplied Data track

[Figure: architecture diagram with two columns. Model Training: Ch/En Training Bitext, GIZA++ Word Alignment, Alignment Expansion, Phrase Extraction, Minimum Error Rate Training (Ch/En Dev Set). Translation: Ch Test Set, Decode, Rescore, En Translated Output.]










The MITLL/AFRL MT System

Overview



• Translation Model



• Minimum Error Rate Training



• Decoder



• Development Experiments

– Segmentation

– Distortion



• Evaluation Results

– Manual Transcription

– ASR Transcription



• Next Steps




Translation Model

Phrase Extraction



• Basic Alignment Template Model, proposed by Och & Ney 2000
– Expand word alignments, interpolating between the intersection and union of bidirectional GIZA++ alignments
– Extract consistent phrase pairs from expanded alignments

• Modifications
1. Add points to intersection that are unaligned in both source and target language sentences before iterative expansion
2. Allow target phrases to be longer/shorter than source phrases by a fixed factor (target phrase factor)
3. 1+2 results in +2 BLEU points
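
The "extract consistent phrase pairs" step can be illustrated with a minimal sketch. It is not the in-house trainer: the function name `extract_phrases`, the `max_len` limit, and the way `target_phrase_factor` is applied as a symmetric length-ratio bound are all assumptions made for illustration. Alignment expansion and the extension of phrases over unaligned boundary words are left out.

```python
from typing import List, Set, Tuple

Alignment = Set[Tuple[int, int]]  # (source index, target index) links


def extract_phrases(src: List[str], tgt: List[str], alignment: Alignment,
                    max_len: int = 4,
                    target_phrase_factor: float = 2.0) -> Set[Tuple[str, str]]:
    """Return phrase pairs that are consistent with the word alignment.

    A pair is consistent when no word inside the target span is linked to a
    source word outside the source span (and the source span has a link).
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span [i1, i2].
            linked = [t for (s, t) in alignment if i1 <= s <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            # Reject spans where a target word in [j1, j2] links outside [i1, i2].
            if any(j1 <= t <= j2 and not (i1 <= s <= i2)
                   for (s, t) in alignment):
                continue
            src_len, tgt_len = i2 - i1 + 1, j2 - j1 + 1
            # Assumed reading of the target phrase factor: bound the length
            # ratio in both directions.
            if (tgt_len > target_phrase_factor * src_len
                    or src_len > target_phrase_factor * tgt_len):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs
```

Under this reading, a larger `target_phrase_factor` admits more lopsided pairs (for example one source word against several target words), while a smaller one prunes them.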






Translation Model

Distortion, Lexical and Language Models



• Distortion

– We used Pharaoh’s simple model (unlimited):







• Lexical Weighting
– Both Model 4 and expanded-alignment lexical translation models were tried
– Expanded alignments → 1.5 BLEU point gain

• Language Model
– Trained with SRILM
– Interpolated trigram model with Kneser-Ney discounting used for decoding
– 4-gram LM and 5-gram class-based LM used during rescoring
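
The distortion formula itself did not survive on this slide. As a rough illustration only, the sketch below assumes the exponential-decay form described in the Pharaoh documentation, where each phrase pays a cost proportional to the distance between its source start position and the end of the previously translated source phrase. The function name `distortion_log_score` and the `alpha` value are hypothetical, and "unlimited" is read here as "no cap on jump distance".

```python
import math
from typing import List, Tuple


def distortion_log_score(spans: List[Tuple[int, int]], alpha: float = 0.5) -> float:
    """Log distortion score for source spans listed in target order.

    spans: (start, end) source positions, 1-based and inclusive.
    Assumed cost per phrase: alpha ** |start_i - end_(i-1) - 1|.
    """
    prev_end = 0
    total = 0.0
    for start, end in spans:
        # A monotone step has start == prev_end + 1 and costs nothing.
        total += abs(start - prev_end - 1) * math.log(alpha)
        prev_end = end
    return total
```

With this form a monotone translation scores 0 in the log domain, and every jump forward or backward lowers the score in proportion to its length.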










Minimum Error Rate Training



• Log-linear Model Combination

• Additional Language models applied during rescoring

• N-best lists of 2k and 8k used
– Minor gain with 8k n-best

• 5-7% relative improvement over hand optimized parameters

• Insignificant differences from beam-width relaxation

Model Weight Parameters
 1  P(f|e)     Forward Translation Model
 2  P(e|f)     Backward Translation Model
 3  LexW(f|e)  Forward Lexical Weight
 4  LexW(e|f)  Backward Lexical Weight
 5  PPen       Constant, per-phrase Penalty
 6  WPen       Constant, per-word Penalty
 7  Dist       Distortion Model
 8  Tri-LM     Trigram Language Model
 9  4-LM       Four-gram Language Model
10  ClassLM    Five-gram class-based LM
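
A log-linear combination scores each hypothesis as a weighted sum of its model log-scores, and rescoring picks the highest-scoring entry of the n-best list. The sketch below shows that idea under stated assumptions: the feature names (`tm`, `lm`, `wpen`), the function names, and the one-weight grid search are illustrative only. The grid search is a simple stand-in for minimum error rate training, not the line-search procedure normally used for it.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Dict[str, float]          # feature name -> log-domain score
NBest = List[Tuple[str, Features]]   # (hypothesis text, features)


def loglinear_score(features: Features, weights: Dict[str, float]) -> float:
    """Weighted sum of feature log-scores."""
    return sum(weights[name] * value for name, value in features.items())


def rerank(nbest: NBest, weights: Dict[str, float]) -> str:
    """Return the hypothesis with the highest log-linear score."""
    return max(nbest, key=lambda item: loglinear_score(item[1], weights))[0]


def grid_search_weight(dev: List[Tuple[NBest, str]], weights: Dict[str, float],
                       name: str, candidates: Sequence[float],
                       error: Callable[[str, str], float]) -> float:
    """Pick the value of one weight that minimizes total error on a dev set."""
    best_value, best_err = weights[name], float("inf")
    for value in candidates:
        trial = dict(weights, **{name: value})
        err = sum(error(rerank(nbest, trial), ref) for nbest, ref in dev)
        if err < best_err:
            best_value, best_err = value, err
    return best_value
```

The key point is that the weights are chosen by the error they produce on held-out translations rather than by likelihood, which is why a larger n-best list (8k instead of 2k) can change the outcome at all.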




Decoder Development

• A phrase-based Viterbi beam search decoder has been implemented

• Decoder can account for word movement between source and target languages (distortion)
– With distortion, search complexity approaches O(2^n)

• Decoding speed:
– Monotone search (without distortion) can exceed 500 words per second
– With distortion, search slows to 10 words per second but can be improved with limits on distortion

• Decoder can produce word lattice output for optional second pass rescoring with higher order language models
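
The gap between monotone and reordering search comes from the search state. Without distortion, a hypothesis is fully described by how many source words it has covered, so a left-to-right Viterbi pass suffices. With distortion, the state must record which subset of the n source words is covered, and there are 2^n such subsets. The sketch below covers only the monotone case, with no language model and no beam. The phrase-table layout and the function name `monotone_decode` are assumptions for illustration and do not describe the in-house decoder.

```python
import math
from typing import Dict, List, Optional, Tuple

# Source phrase (tuple of words) -> list of (target phrase, log probability).
PhraseTable = Dict[Tuple[str, ...], List[Tuple[str, float]]]


def monotone_decode(src: List[str], table: PhraseTable,
                    max_len: int = 3) -> Optional[Tuple[str, float]]:
    """Best monotone segmentation and translation of src, or None."""
    n = len(src)
    # best[j] = (score, backpointer, target phrase) for the first j source words.
    best: List[Tuple[float, Optional[int], Optional[str]]] = (
        [(-math.inf, None, None)] * (n + 1))
    best[0] = (0.0, None, None)
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            if best[i][0] == -math.inf:
                continue  # prefix of length i cannot be translated
            for target, logp in table.get(tuple(src[i:j]), []):
                score = best[i][0] + logp
                if score > best[j][0]:
                    best[j] = (score, i, target)
    if best[n][0] == -math.inf:
        return None
    # Follow backpointers to recover the target phrases in order.
    out, j = [], n
    while j > 0:
        _, i, target = best[j]
        out.append(target)
        j = i
    return " ".join(reversed(out)), best[n][0]
```

This runs in time linear in sentence length for a fixed `max_len`, which is consistent with monotone search being much faster than search with reordering.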




Development Experiments I

Dev Sets and Results



• Code development experiments summary (on IWSLT04 devset)

Implementation Summary                       BLEU
Basic Phrase Extraction                      ~36
+ Enhancements to Phrase Extraction          37.7
+ Lexical Weights from expanded alignments   39.1

• Dev Set Design
– Dev1: CSTAR 2003 (supplied)
– Dev2: IWSLT 2004 (supplied)
– Dev3: ½ Dev1 + ½ Dev2 (first half)
– Dev4: ½ Dev1 + ½ Dev2 (second half)
– Dev5: Dev1 + Dev2

• Manual Transcription Results (BLEU)
– Full Evaluation System

Manual Transcription Dev Results (BLEU)
Grid 1 (Dev 1, 2):  Test 1: 36.64   Test 2: 42.00
Grid 2 (Dev 3, 4):  Test 3: 42.44   Test 4: 33.84






Development Experiments II

Phrase Extraction/MER Experiments

Parameters Varied
Segmentation (word or character)
Additional Language Models (4-gram and 5-gram)
Lexical Back-off
Minimum Error Rate Training

Configurations                                       BLEU
Base: CharSeg, UTF-8, 4x TPF, hand-tuned weights     39.12
+ lexbackoff                                         40.32
+ lexbackoff + 2x TPF                                40.76
+ lexbackoff + 2x TPF + WordSeg                      34.12
+ lexbackoff + 2x TPF + MER                          40.99
+ lexbackoff + 2x TPF + extra LMs                    41.45
+ lexbackoff + 2x TPF + extra LMs + MER              42.00


Development Experiments III

Details and ASR

• ASR
– Compared 1-best vs. N-best
  Using N-best → 7-10% relative improvement
– Scored N-best without weighting acoustic model or ASR language model parameters
– Used system trained/optimized with manual transcription

ASR   N-best   N-best Correct %   BLEU
1     1        68.7               26.15
2     1        80.9               32.30
3     1        87.3               35.08
1     20       80.1               28.37
2     20       91.8               36.90
3     20       94.5               37.68






IWSLT 2005 Results

MT Evaluation Metrics



• Metrics Used for IWSLT-2005
– WER: word error rate – the edit distance between output and closest reference translation
– PER: position independent WER – same as WER but disregards word ordering
– BLEU: geometric mean of n-gram precision between output and all references
– NIST: a variant of BLEU - arithmetic mean of weighted n-gram precision
– GTM: general text matcher – measures similarity between output and reference in terms of precision and recall using a unigram based F-measure
– METEOR: uses natural language processing tools including word stemmer and synonym matching to find unigram matches
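
The two error-rate metrics are simple enough to sketch directly. The code below is an illustration, not the official IWSLT scoring tool: the function names are hypothetical, tokenization is plain whitespace splitting, and the PER formula is one common bag-of-words formulation chosen as an assumption. Both functions assume non-empty references.

```python
from collections import Counter
from typing import List


def edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,              # delete h
                           cur[j - 1] + 1,           # insert r
                           prev[j - 1] + (h != r)))  # match or substitute
        prev = cur
    return prev[-1]


def wer(hyp: str, refs: List[str]) -> float:
    """Edit distance to the closest reference, divided by that reference's length."""
    h = hyp.split()
    dist, ref_len = min((edit_distance(h, r.split()), len(r.split())) for r in refs)
    return dist / ref_len


def per(hyp: str, refs: List[str]) -> float:
    """Position-independent error rate: compares bags of words, ignoring order."""
    h = hyp.split()

    def one(r: List[str]) -> float:
        matches = sum((Counter(h) & Counter(r)).values())
        return (max(len(h), len(r)) - matches) / len(r)

    return min(one(r.split()) for r in refs)
```

A pure reordering such as "c b a" against the reference "a b c" is penalized by WER but not by PER, which is the difference the slide describes.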




IWSLT 2005 Results

Manual Transcription



• Participated in supplied data track, Chinese → English Translation Task
– Manual and ASR transcription
– 20,000 sentence pair training
– Used in-house trainer and freely available Pharaoh decoder from ISI (in-house decoder was not ready at submission time)

System      BLEU4   NIST    METEOR  WER     PER     GTM
ITC         0.528   9.060   0.689   0.414   0.346   0.620
RWTH        0.511   9.567   0.665   0.428   0.358   0.601
EDINBURGH   0.465   6.492   0.632   0.453   0.398   0.599
TALP        0.452   7.974   0.663   0.459   0.380   0.609
MIT         0.450   9.311   0.709   0.464   0.355   0.619
CMU         0.444   6.188   0.564   0.513   0.459   0.524
IBM         0.440   8.436   0.642   0.469   0.391   0.588
ATR-C3      0.394   8.000   0.629   0.523   0.428   0.553
USC         0.332   5.566   0.567   0.544   0.469   0.526
NTT         0.278   7.519   0.593   0.653   0.521   0.492

MIT New Decoder




IWSLT 2005 Results

ASR Transcription

• Used ASR n-best lists as input to MT

• Decode and merge resulting MT output

• Rescore combined output and select best output



• Results





System      BLEU4   NIST    METEOR  WER     PER     GTM
RWTH        0.383   7.389   0.540   0.565   0.472   0.488
CMU         0.363   6.533   0.520   0.581   0.499   0.483
MIT         0.360   7.556   0.593   0.560   0.455   0.000
IBM         0.336   7.083   0.533   0.598   0.504   0.481
NTT         0.274   6.519   0.522   0.643   0.535   0.458
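
The three processing steps listed on this slide (decode each ASR hypothesis, merge the MT output, rescore and select) can be sketched as follows. This is an assumption-laden illustration: `translate_asr_nbest`, `decode` and `rescore` are hypothetical names, and `decode` stands in for whatever MT decoder produces scored translations. ASR scores are deliberately ignored here, on the assumption that the evaluation system treated the n-best entries the same way as in the development experiments.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A decoder maps one source sentence to scored translation candidates.
Decoder = Callable[[str], List[Tuple[str, float]]]


def translate_asr_nbest(
        asr_nbest: List[str],
        decode: Decoder,
        rescore: Callable[[str, float], float] = lambda text, score: score,
) -> Optional[str]:
    """Translate every ASR hypothesis, merge the outputs, and pick the best."""
    merged: Dict[str, float] = {}
    for source in asr_nbest:
        for translation, score in decode(source):
            # Merge duplicates, keeping the best MT score per translation.
            if translation not in merged or score > merged[translation]:
                merged[translation] = score
    if not merged:
        return None
    # Rescore the combined list and select the top candidate.
    return max(merged, key=lambda text: rescore(text, merged[text]))
```

Passing a `rescore` function that adds higher-order language model scores would correspond to the second-pass rescoring mentioned elsewhere in the deck.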










IWSLT 2005 Results

Example Output



Translation of MT output vs. reference transcription

Chinese Input                        System Output                            Reference Translation
Manual Transcription, Sentence #1    i'd like to take a group tour            i'd like to take a sightseeing tour
ASR output, Sentence #1              to take a tour group                     i'd like to take a sightseeing tour
Manual Transcription, Sentence #2    request wear formal dress night ?        is formal dress required
ASR output, Sentence #2              there's been feel and formal dresses ?   is formal dress required










Summary







• The MIT/AFRL MT system is capable of state-of-the-art performance on a Chinese-English task with a limited training set

• Many in-house components were built, but we also rely on the existence of freely available components such as Pharaoh and GIZA++ to accelerate development

• Further research into error mitigation techniques for speech-to-speech machine translation is needed










Next Steps





• ASR Lattice rescoring and joint optimization



• Decoder development and evaluation



• Scale to large vocabulary tasks



• Hybrid Interlingual efforts with MIT/CSAIL








