
                   Oasis
A Beam Search Decoder for Phrase-Based
  Statistical Machine Translation Models

      MACHINE INTELLIGENCE AND TRANSLATION LAB
          HARBIN INSTITUTE OF TECHNOLOGY
                   Oasis
•   A user manual for Oasis
•   The files needed by Oasis
•   Some explanations for the parameters
•   A summary for the parameters
•   The details for the decoding process
•   Oasis for Phrase-based SMT
•   Core algorithm
     A User Manual for Oasis
• The Designers
  – Li Jun, Liang Huashen, Jiang Hongfei, Sun Jiadong
• CPU: P4 2.0 GHz or higher
• RAM: 512 MB or more
• Compiler: Visual C++ 6.0 on Windows, or gcc 3.4.2 on Linux
   The Files Needed by Oasis

• Phrase-Table File

• Language-Model File (ARPA format)
• The Configuration File for Oasis
  (The User Manual for Oasis)
    Some Explanations for the
         Parameters (1)
• [ttable-limit]: keep only the top n translation options for each
  source phrase, ranked using the translation score, the language
  model score, and the English phrase length
• [stack] & [threshold]
  -s: maximum size of the beam (default 100)
  -b: minimum threshold of the beam (default 0.00001)
     Some Explanations for the
          Parameters (2)
• [distortion]: the distortion score P is computed from the
  distortion distances d_i of the translated phrases, where
   d_i = abs(last word position of previously translated phrase
         + 1 - first word position of newly translated phrase)
  -dl: maximum distance between two input phrases that are
       translated to two neighboring output phrases
  -d: minimum distortion score
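
The distance defined above can be written as a one-line function. This is a minimal sketch, not code from Oasis: the function name, the 0-based word positions, and the convention that the empty hypothesis has "last word position" -1 are all assumptions made for illustration.

```python
def distortion_distance(prev_last: int, new_first: int) -> int:
    """d = abs(last word position of the previously translated phrase
    + 1 - first word position of the newly translated phrase)."""
    return abs(prev_last + 1 - new_first)


# Monotone step: previous phrase ended at word 2, new phrase starts at word 3.
monotone = distortion_distance(2, 3)

# Jump ahead: previous phrase ended at word 2, new phrase starts at word 5.
jump = distortion_distance(2, 5)

# First phrase of the sentence (assumed convention: prev_last = -1).
first = distortion_distance(-1, 0)

print(monotone, jump, first)
```

A monotone continuation gives a distance of 0, so only reordering contributes to the distortion score.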
        Some Explanations for the
            Parameters (3)
• [lm-limit]
  -m: minimum language model score of the phrase

• [nbest]
  -l: generate an n-best lattice
    A summary for the parameters
•   -f     Specify the Configuration file
•   -in    Specify the input data
•   -out   Specify the output data
•   -s     Specify the maximum size of the beam, default 100
•   -b     Specify the beam threshold, default 0.00001
•   -l     Specify the N-best output, default 1
•   -d     Specify the minimum distortion score, default -2.30259
•   -dl    Specify the distortion distance, default -9
•   -m     Specify
   The details for the decoding
             process
• Translation options
• Future cost
• Hypothesis element
     An example for the details
• creating hypothesis 1 from 0
•     base score 0
•     translation cost -1.28215
•     distortion cost 0
•     language model cost for 'same' -2.57302
•     language model cost for 'the' -1.91582
•     word penalty 2
•     score -3.77099 + futureCost -15.8278 = -19.5988
•     new best estimate for this stack
•     merged hypothesis on stack 1, now size 1
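
The numbers in the trace above can be checked by hand. The short script below reproduces the arithmetic; the variable names are illustrative, and the way the components are combined (a plain sum, with the word penalty added as +2) is inferred from the fact that it reproduces the logged totals.

```python
# Values copied from the trace for hypothesis 1.
base_score = 0.0
translation_cost = -1.28215
distortion_cost = 0.0
lm_costs = [-2.57302, -1.91582]  # language model cost for 'same' and 'the'
word_penalty = 2.0
future_cost = -15.8278

# Score of the partial translation so far.
score = (base_score + translation_cost + distortion_cost
         + sum(lm_costs) + word_penalty)

# Estimate used to rank the hypothesis on its stack.
estimate = score + future_cost

print(round(score, 5))     # -3.77099
print(round(estimate, 4))  # -19.5988
```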
  Oasis for Phrase-based SMT
• Translation options, listed as: English phrase, translation cost,
  future cost estimate
  [同样]
      the same, -1.28215, -4.05183
  [的]
      in, -1.8011, -2.94581
      right, -2.71656, -4.88024
      of a, -2.82411, -4.96468
      flight, -2.62181, -4.90627
      's, -2.5144, -4.40283
      of the, -2.21918, -3.63043
  [东西]
      anything, -2.03706, -4.64327
      came, -2.2911, -5.35805
                Future cost (1)
•   future costs from 0 to 0 is -4.05183
•   future costs from 0 to 1 is -4.95449
•   ……
•   future costs from 0 to 7 is -22.5568
•   future costs from 0 to 8 is -19.8797
•   future costs from 1 to 1 is -2.29265
•   future costs from 1 to 2 is -4.18843
•   ……
•   future costs from 6 to 7 is -5.82631
•   future costs from 6 to 8 is -1.82436
•   future costs from 7 to 7 is -3.53366
•   future costs from 7 to 8 is -0.85651
•   future costs from 8 to 8 is -0.68513
            Future cost (2)

• The calculation of future cost combines:
   – translation option cost
   – language model cost
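
A table of "future costs from i to j" like the one on the previous slide is commonly filled by dynamic programming over spans: the cost of a span is the better of its cheapest single translation option and the best way of splitting it into two smaller spans. The sketch below assumes that approach; it is not taken from the Oasis source, and the function name and the toy option costs are hypothetical.

```python
def future_cost_table(n, option_cost):
    """Return cost[i][j], the best estimated log score for words i..j.

    option_cost maps an inclusive span (start, end) to the best
    single-option estimate for that span (translation option cost
    plus language model cost, both as log scores).
    """
    neg_inf = float("-inf")
    cost = [[neg_inf] * n for _ in range(n)]
    for length in range(1, n + 1):
        for start in range(0, n - length + 1):
            end = start + length - 1
            # Cheapest way to cover the span with one phrase, if any.
            best = option_cost.get((start, end), neg_inf)
            # Or combine two already-computed shorter spans.
            for split in range(start, end):
                best = max(best, cost[start][split] + cost[split + 1][end])
            cost[start][end] = best
    return cost


# Toy three-word input with made-up costs.
toy_options = {(0, 0): -4.0, (1, 1): -2.0, (2, 2): -3.0, (1, 2): -1.5}
table = future_cost_table(3, toy_options)
print(table[0][1], table[1][2], table[0][2])
```

In the toy input the two-word phrase for span 1..2 is cheaper than translating word 1 alone, so the cost from 0 to 2 is better than the cost from 0 to 1. The same effect would explain why the slide's cost from 0 to 8 is better than its cost from 0 to 7.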
             Core algorithm
• The generation of a phrase table
• Future cost
• The hypothesis and state expansion
• Beam search
• The generation of the English sentence
    The structure of the hypothesis
                 data
•   The present cost
•   The translation cost for the new translation phrase
•   The distortion cost
•   The penalty for the new translation option
•   The language model cost
•   The future cost
•   The new phrase positions in the foreign sentence
•   The new phrase
•   The last two English words generated
•   ID
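
The fields listed above map naturally onto a small record type. The sketch below is one possible layout, not the actual Oasis structure: all field names are invented, and the back-pointer to the parent hypothesis is an added assumption, included because reading off the English sentence at the end needs some link to earlier hypotheses.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Hypothesis:
    hyp_id: int                       # ID
    score: float                      # the present cost
    translation_cost: float           # cost of the new translation phrase
    distortion_cost: float
    word_penalty: float               # penalty for the new translation option
    lm_cost: float                    # language model cost
    future_cost: float
    source_span: Tuple[int, int]      # new phrase positions in the foreign sentence
    phrase: str                       # the new English phrase
    last_two_words: Tuple[str, str]   # last two English words generated
    back: Optional["Hypothesis"] = None  # assumption: link to the parent

    @property
    def estimate(self) -> float:
        """Score plus future cost, used to compare hypotheses on a stack."""
        return self.score + self.future_cost


def read_translation(hyp: Optional[Hypothesis]) -> str:
    """Follow back-pointers to recover the English output so far."""
    phrases = []
    while hyp is not None:
        if hyp.phrase:
            phrases.append(hyp.phrase)
        hyp = hyp.back
    return " ".join(reversed(phrases))


# Empty hypothesis (placeholder values), then hypothesis 1 from the trace.
h0 = Hypothesis(0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, (-1, -1), "", ("<s>", "<s>"))
h1 = Hypothesis(1, -3.77099, -1.28215, 0.0, 2.0, -4.48884, -15.8278,
                (0, 0), "the same", ("the", "same"), back=h0)
print(round(h1.estimate, 4), read_translation(h1))
```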
     Recombining hypotheses
• The foreign words covered so far
• The last two English words generated
• The last word of the last foreign phrase
  covered
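
Two hypotheses that agree on the three items above can be treated as the same search state, so only the better-scoring one needs to be kept. The sketch below illustrates this with a dictionary keyed on that state. The function names and the choice to store only a score per state are simplifications assumed for the example.

```python
def recombination_key(covered, last_two_words, last_foreign_pos):
    """Build a hashable state from the three items used for recombination."""
    return (frozenset(covered), tuple(last_two_words), last_foreign_pos)


def add_with_recombination(stack, key, score):
    """Insert a hypothesis score into a stack (dict: state -> best score).

    Returns True if the stack changed, False if an equal or better
    hypothesis with the same state was already present.
    """
    old = stack.get(key)
    if old is None or score > old:
        stack[key] = score
        return True
    return False


stack = {}
k1 = recombination_key({0, 1}, ("the", "same"), 1)
k2 = recombination_key({0, 1}, ("same", "of"), 1)

added_first = add_with_recombination(stack, k1, -5.0)
added_worse = add_with_recombination(stack, k1, -7.0)   # same state, worse score
added_other = add_with_recombination(stack, k2, -6.0)   # different last words
print(added_first, added_worse, added_other, len(stack))
```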
              Beam search

• The fixed and relative threshold

• The stack
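
The two pruning criteria can be sketched as follows. This is an illustration under stated assumptions, not the Oasis code: it treats -s as a fixed cap on the number of hypotheses per stack, and -b as a probability ratio relative to the best hypothesis, applied as a natural-log offset because the scores in this deck are log scores. The function name is hypothetical; the defaults are the ones given for -s and -b.

```python
import math


def prune(scores, max_size=100, threshold=0.00001):
    """Prune one stack of hypothesis estimates (log scores).

    Relative threshold: drop anything worse than best + log(threshold).
    Fixed threshold: keep at most max_size of the survivors.
    """
    if not scores:
        return []
    ranked = sorted(scores, reverse=True)
    cutoff = ranked[0] + math.log(threshold)
    kept = [s for s in ranked if s >= cutoff]
    return kept[:max_size]


# log(0.00001) is about -11.51, so -20.0 falls outside the beam of -1.0.
by_threshold = prune([-5.0, -1.0, -20.0])

# With a beam size of 2, only the two best survive.
by_size = prune([-3.0, -1.0, -2.0], max_size=2)
print(by_threshold, by_size)
```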
The Result of the Experiment(1)
The Result of the Experiment(2)
Thank you!

								