Oasis
A Beam Search Decoder for Phrase-Based
Statistical Machine Translation Models
MACHINE INTELLIGENCE AND TRANSLATION LAB
HARBIN INSTITUTE OF TECHNOLOGY
Oasis
• A user manual for Oasis
• The files needed by Oasis
• Some explanations for the parameters
• A summary for the parameters
• The details for the decoding process
• Oasis for Phrase-based SMT
• Core algorithm
A User Manual for Oasis
• The Designers
– Li Jun, Liang Huashen, Jiang Hongfei, Sun Jiadong
• CPU: P4 2.0GHz or higher
• RAM: 512 MB or larger
• Compiler: Visual C++ 6.0 on Windows, or gcc 3.4.2 on Linux
The Files Needed by Oasis
• Phrase-Table File
• Language-Model File
– ARPA format
• The Configuration File for Oasis
(The User Manual for Oasis)
Some Explanations for the Parameters (1)
• [ttable-limit]: top n
– Translation, Language Model, English Phrase Length
• [stack] & [threshold]
– -s: maximum size of the beam (default 100)
– -b: minimum threshold of the beam (default 0.00001)
Some Explanations for the Parameters (2)
• [distortion]: distortion score $P_d$ over the distances $d_i$
$d_i$ = abs(last word position of the previously translated phrase + 1 - first word position of the newly translated phrase)
– -dl: maximum distance between two input phrases that are translated to two neighboring output phrases
– -d: minimum of the distortion score
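The distance definition above can be illustrated with a minimal Python sketch. The function names are hypothetical, positions are assumed to be 0-based, and treating -dl as an inclusive upper bound on d is an assumption, not something the slides state.

```python
def distortion_distance(prev_last: int, new_first: int) -> int:
    """d = abs(last word position of the previous phrase + 1
               - first word position of the new phrase)."""
    return abs(prev_last + 1 - new_first)


def within_distortion_limit(prev_last: int, new_first: int, limit: int) -> bool:
    """Assumed reading of -dl: allow the expansion only if d <= limit."""
    return distortion_distance(prev_last, new_first) <= limit


# Monotone continuation: previous phrase ended at word 1, new one starts at 2.
print(distortion_distance(1, 2))          # d = 0
# Jump forward over three words.
print(distortion_distance(1, 5))          # d = 3
print(within_distortion_limit(1, 5, 2))   # False under the assumed limit check
```

With the previous end position set to -1 for the empty hypothesis (also an assumption), a first phrase starting at word 0 gets d = 0, which matches the "distortion cost 0" line in the worked example later in the slides.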
Some Explanations for the Parameters (3)
• [lm-limit]
– -m: minimum language model score of the phrase
• [nbest]
– -l: generate n-best lattice
A summary for the parameters
• -f Specify the Configuration file
• -in Specify the input data
• -out Specify the output data
• -s Specify the maximum size of the beam, default 100
• -b Specify the beam threshold, default 0.00001
• -l Specify the N-best output, default 1
• -d Specify the min distortion score, default -2.30259
• -dl Specify the distortion distance, -9
• -m Specify
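As an illustration only, the flags listed above could be mirrored with Python's `argparse`. This is not how Oasis itself parses its command line; the program name, the `dest` names, and the file names in the usage line are hypothetical. Defaults are set only where the slides state one, and reading -2.30259 as the default of -d is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # "oasis" as the program name is a placeholder.
    p = argparse.ArgumentParser(prog="oasis")
    p.add_argument("-f", dest="config", help="configuration file")
    p.add_argument("-in", dest="input", help="input data")
    p.add_argument("-out", dest="output", help="output data")
    p.add_argument("-s", dest="stack", type=int, default=100,
                   help="maximum size of the beam")
    p.add_argument("-b", dest="threshold", type=float, default=0.00001,
                   help="beam threshold")
    p.add_argument("-l", dest="nbest", type=int, default=1,
                   help="N-best output")
    p.add_argument("-d", dest="min_distortion", type=float, default=-2.30259,
                   help="minimum distortion score")
    # No defaults are assumed for the two options below.
    p.add_argument("-dl", dest="distortion_limit", type=int,
                   help="distortion distance")
    p.add_argument("-m", dest="lm_limit", type=float,
                   help="language model limit")
    return p


# Hypothetical invocation; the file names are placeholders.
args = build_parser().parse_args(
    ["-f", "oasis.cfg", "-in", "test.txt", "-out", "result.txt", "-s", "50"]
)
print(args.stack, args.threshold, args.nbest)
```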
The details for the decoding process
• Translation options
• Future cost
• Hypothesis element
An example for the details
• creating hypothesis 1 from 0
• base score 0
• translation cost -1.28215
• distortion cost 0
• language model cost for 'same' -2.57302
• language model cost for 'the' -1.91582
• word penalty 2
• score -3.77099 + futureCost -15.8278 = -19.5988
• new best estimate for this stack
• merged hypothesis on stack 1, now size 1
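The arithmetic in this trace can be checked with a short Python sketch. It simply adds the components in the order they are listed; that the hypothesis score is a plain sum of these log-domain terms is inferred from the numbers, not stated on the slide.

```python
# Components of hypothesis 1, copied from the trace above.
components = {
    "base score": 0.0,
    "translation cost": -1.28215,
    "distortion cost": 0.0,
    "language model cost for 'same'": -2.57302,
    "language model cost for 'the'": -1.91582,
    "word penalty": 2.0,
}

score = sum(components.values())
future_cost = -15.8278
estimate = score + future_cost  # value used to rank the hypothesis on its stack

print(f"score {score:.5f} + futureCost {future_cost} = {estimate:.4f}")
```

The sum reproduces the score -3.77099 and the estimate -19.5988 shown in the trace.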
Oasis for Phrase-based SMT
• Translation options
– [ 同样 ]
  the same, -1.28215, -4.05183
– [ 的 ]
  in, -1.8011, -2.94581
  right, -2.71656, -4.88024
  of a, -2.82411, -4.96468
  flight, -2.62181, -4.90627
  's, -2.5144, -4.40283
  of the, -2.21918, -3.63043
– [ 东西 ]
  anything, -2.03706, -4.64327
  came, -2.2911, -5.35805
Future cost (1)
• future costs from 0 to 0 is -4.05183
• future costs from 0 to 1 is -4.95449
• ……
• future costs from 0 to 7 is -22.5568
• future costs from 0 to 8 is -19.8797
• future costs from 1 to 1 is -2.29265
• future costs from 1 to 2 is -4.18843
• ……
• future costs from 6 to 7 is -5.82631
• future costs from 6 to 8 is -1.82436
• future costs from 7 to 7 is -3.53366
• future costs from 7 to 8 is -0.85651
• future costs from 8 to 8 is -0.68513
Future cost (2)
• The calculation of future cost
– translation option cost
– language model cost
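A common way to build such a table of span costs is dynamic programming over span lengths, taking for each span the better of its best single translation option and the best split into two smaller spans. The sketch below assumes that scheme; the slides do not spell out the recurrence Oasis uses. The function name is hypothetical, and in the toy input only the (0, 0) and (1, 1) values come from the listing above, while the other two entries are made up for illustration.

```python
from typing import Dict, Tuple

Span = Tuple[int, int]  # inclusive (start, end) word positions


def future_costs(n: int, best_option: Dict[Span, float]) -> Dict[Span, float]:
    """Fill cost[(i, j)] for every span of an n-word sentence.

    best_option[(i, j)] is the best estimate (translation option cost plus
    language model cost, in the log domain) of any single option covering
    exactly words i..j. Higher, i.e. closer to zero, is better.
    """
    cost: Dict[Span, float] = {}
    for length in range(1, n + 1):
        for start in range(0, n - length + 1):
            end = start + length - 1
            best = best_option.get((start, end), float("-inf"))
            # Try every way of splitting the span into two covered parts.
            for split in range(start, end):
                best = max(best, cost[(start, split)] + cost[(split + 1, end)])
            cost[(start, end)] = best
    return cost


toy_options = {
    (0, 0): -4.05183,   # from the listing above
    (1, 1): -2.29265,   # from the listing above
    (2, 2): -1.5,       # hypothetical
    (0, 1): -4.95449,   # assumed to come from one option spanning words 0..1
}
table = future_costs(3, toy_options)
for span in sorted(table):
    print(f"future costs from {span[0]} to {span[1]} is {table[span]:.5f}")
```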
Core algorithm
• The generation of a phrase table
• Future cost
• The hypothesis and state expansion
• Beam search
• The generation of the English sentence
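How these steps fit together can be sketched with a heavily simplified stack decoder in Python. It is a toy under stated assumptions, not the Oasis implementation: it has no language model and no future cost, it prunes only by stack size, and it charges a hypothetical penalty of 1.0 per unit of distortion distance. The three-word input reuses a few translation costs from the slides but is otherwise made up.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]                          # inclusive foreign span
Options = Dict[Span, List[Tuple[str, float]]]   # span -> (English, log score)


def decode(n: int, options: Options, stack_size: int = 100) -> Tuple[str, float]:
    # stacks[k] holds hypotheses covering k foreign words.
    # key = (covered positions, end of last covered phrase)
    # value = (score, English phrases so far)
    stacks: List[dict] = [dict() for _ in range(n + 1)]
    stacks[0][(frozenset(), -1)] = (0.0, [])
    for k in range(n):
        ranked = sorted(stacks[k].items(), key=lambda kv: kv[1][0], reverse=True)
        for (covered, last_end), (score, words) in ranked[:stack_size]:
            for (start, end), choices in options.items():
                span = frozenset(range(start, end + 1))
                if span & covered:
                    continue  # overlaps words that are already translated
                distortion = abs(last_end + 1 - start)
                new_key = (covered | span, end)
                target = stacks[len(new_key[0])]
                for english, cost in choices:
                    new_score = score + cost - distortion  # assumed penalty
                    # Recombination: keep the best hypothesis per key.
                    if new_key not in target or new_score > target[new_key][0]:
                        target[new_key] = (new_score, words + [english])
    if not stacks[n]:
        raise ValueError("no complete translation found")
    score, words = max(stacks[n].values(), key=lambda v: v[0])
    return " ".join(words), score


toy: Options = {
    (0, 0): [("the same", -1.28215)],
    (1, 1): [("in", -1.8011), ("of the", -2.21918), ("'s", -2.5144)],
    (2, 2): [("anything", -2.03706), ("came", -2.2911)],
}
translation, total = decode(3, toy)
print(translation, total)
```

Organising hypotheses into one stack per number of covered foreign words means only hypotheses that have translated the same amount of input compete during pruning.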
The structure of the hypothesis data
• The present cost
• The translation cost for the new translation phrase
• The distortion cost
• The penalty for the new translation option
• Language cost
• The future cost
• The new phrase positions in the foreign sentence
• The new phrase
• The last two English words generated
• ID
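The fields listed above could be held in a record like the following Python sketch. The field names are hypothetical, and the back-pointer `prev_id` is an assumption added so that the English sentence can be read off at the end; it is not in the slide's list. The example values are taken from the "creating hypothesis 1 from 0" trace.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Hypothesis:
    hyp_id: int                       # ID
    score: float                      # the present cost
    translation_cost: float           # cost of the new translation phrase
    distortion_cost: float
    word_penalty: float               # penalty for the new translation option
    lm_cost: float                    # language cost
    future_cost: float
    span: Tuple[int, int]             # new phrase positions in the foreign sentence
    phrase: str                       # the new phrase
    last_two_words: Tuple[str, ...]   # last two English words generated
    prev_id: Optional[int] = None     # assumed back-pointer to the parent

    @property
    def estimate(self) -> float:
        """Score plus future cost, the value used to rank hypotheses."""
        return self.score + self.future_cost


h1 = Hypothesis(
    hyp_id=1,
    score=-3.77099,
    translation_cost=-1.28215,
    distortion_cost=0.0,
    word_penalty=2.0,
    lm_cost=-2.57302 + -1.91582,
    future_cost=-15.8278,
    span=(0, 0),
    phrase="the same",
    last_two_words=("the", "same"),
    prev_id=0,
)
print(round(h1.estimate, 4))
```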
Recombining hypotheses
• The foreign words covered so far
• The last two English words generated
• The last word of the last foreign phrase covered
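A minimal Python sketch of recombination based on these three properties follows. Two hypotheses that agree on all three are treated as interchangeable for future expansion, and only the better-scoring one is kept. The function names and the dictionary-based stack are assumptions for illustration.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Key = Tuple[FrozenSet[int], Tuple[str, ...], int]


def recombination_key(covered: Iterable[int],
                      last_two_words: Iterable[str],
                      last_foreign_end: int) -> Key:
    """Build the state that decides whether two hypotheses can be merged."""
    return (frozenset(covered), tuple(last_two_words), last_foreign_end)


def add_hypothesis(stack: Dict[Key, float], key: Key, score: float) -> bool:
    """Keep only the best score per key; return True if the new one was stored."""
    if key not in stack or score > stack[key]:
        stack[key] = score
        return True
    return False


stack: Dict[Key, float] = {}
k1 = recombination_key({0, 1}, ("the", "same"), 1)
k2 = recombination_key({1, 0}, ["the", "same"], 1)  # same state, other path
add_hypothesis(stack, k1, -5.0)
add_hypothesis(stack, k2, -4.0)  # better score replaces the earlier entry
print(len(stack), stack[k1])
```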
Beam search
• The fixed and relative threshold
• The stack
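The two kinds of pruning can be sketched in Python as follows. This assumes that the fixed limit is the stack size (-s) and that the relative threshold (-b) is a probability ratio against the best hypothesis, applied in the natural-log domain; both readings are assumptions, and the function name is hypothetical.

```python
import math
from typing import List


def prune(scores: List[float], max_size: int = 100,
          threshold: float = 0.00001) -> List[float]:
    """Return the log scores that survive both pruning steps, best first."""
    if not scores:
        return []
    ranked = sorted(scores, reverse=True)
    # Relative threshold: drop anything worse than best * threshold,
    # which is best + log(threshold) for log scores.
    cutoff = ranked[0] + math.log(threshold)
    survivors = [s for s in ranked if s >= cutoff]
    # Fixed limit: keep at most max_size hypotheses.
    return survivors[:max_size]


print(prune([-5.0, -1.0, -20.0], max_size=2))
```

In the example, -20.0 falls more than log(0.00001), about -11.51, below the best score of -1.0 and is removed by the threshold before the size limit is applied.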
The Result of the Experiment(1)
The Result of the Experiment(2)
Thank you!