The Moses Decoder for Statistical Machine Translation

Ahmed El Kholy
                    Outline
•   Phrase Based Models (review)
•   Training a New Model
•   Decoding
•   Tuning Model Weights
•   Factored Translation Models
         Phrase-Based Models

• The foreign input is segmented into phrases
  – any sequence of words, not necessarily
    linguistically motivated
• Each phrase is translated into English
• Phrases are reordered
                   from Koehn et al., 2003, NAACL
           Phrase-Based Models
• The source sentence f is decomposed into J phrases: f_1^J = f_1,...,f_j,...,f_J
• The target sentence e is decomposed into I phrases: e_1^I = e_1,...,e_i,...,e_I
• We choose the translation with the highest probability:

  ê = argmax_e Pr(e | f)
            Phrase-Based Models
• Model the posterior probability using a log-linear combination
  of feature functions.
• We have a set of M feature functions h_m(e_1^I, f_1^J), m = 1,...,M.
  For each feature function there is a model parameter (weight) λ_m,
  m = 1,...,M.
• The decision rule is:

  ê = argmax_e Σ_{m=1}^{M} λ_m h_m(e_1^I, f_1^J)

• The features cover the main components:
   • Phrase-Translation Model
   • Reordering Model
   • Language Model
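The decision rule above can be sketched in a few lines of Python. This is a toy illustration only: the feature names, values, and weights are invented, not Moses internals.

```python
def log_linear_score(features, weights):
    """Combine feature values h_m with weights lambda_m (log domain)."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical log-probability features for two candidate translations.
weights = {"tm": 1.0, "lm": 0.5, "reordering": 0.3}

candidates = {
    "mary did not slap the green witch": {"tm": -2.1, "lm": -4.0, "reordering": -1.0},
    "mary not slap the witch green":     {"tm": -1.8, "lm": -7.5, "reordering": -0.2},
}

# Decision rule: pick the candidate e maximizing sum_m lambda_m * h_m(e, f).
best = max(candidates, key=lambda e: log_linear_score(candidates[e], weights))
```

Here the fluent candidate wins because its much better language-model score outweighs its slightly worse translation-model score, which is exactly the trade-off the weights control.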
            Reordering Model

• Distortion cost: D(e,f) = -Σ_i d_i, where for each phrase i the
  distance d_i is defined as:

  d_i = abs( last word position of previously translated phrase + 1
             - first word position of newly translated phrase )

• Example:

  Phrase   Translates   Distance
  1        1-3          0
  2        6            2
  3        4-5          3
  4        7            1

                     http://www.statmt.org/
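The distance computation above can be checked with a small Python sketch, using the phrase spans from the example table:

```python
def distortion_distances(spans):
    """For each phrase, d = abs(end of previous phrase + 1 - start of this phrase).

    spans: list of (first, last) source-word positions, in translation order.
    """
    distances = []
    prev_end = 0  # before the first phrase, the "previous" phrase ends at position 0
    for first, last in spans:
        distances.append(abs(prev_end + 1 - first))
        prev_end = last
    return distances

# Source spans covered by phrases 1-4, in the order they are translated.
spans = [(1, 3), (6, 6), (4, 5), (7, 7)]
distances = distortion_distances(spans)   # [0, 2, 3, 1], as in the table
total_cost = -sum(distances)              # D(e,f) = -6
```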
   Training a Translation Model
• Input
  –Parallel Training Data
  –Monolingual Data of the Target
   Language
• Output
  –Phrase Table
  –Language Model
  –Initial Translation Model parameters
Training a Translation Model

[Flow chart:]
  Source + Target (parallel data) → Preprocess → Alignment → Phrase Extraction → Phrase Table (PT)
  Monolingual (target data)       → Create LM  → Language Model (LM)
   Training a Translation Model in
                Moses
There are nine steps:
1) Prepare data
2) Run GIZA++
3) Align words
4) Get lexical translation table
5) Extract phrases
6) Score phrases
7) Build lexicalized reordering model
8) Build generation models
9) Create configuration file
                   http://www.statmt.org/
          Step 1: Prepare Data
• Clean Corpora
  – Delete extra white space and sentences that are too short or too long
  – Delete sentence pairs that do not satisfy the 1:9 length-ratio
    constraint of the GIZA++ toolkit

• The parallel corpus is converted into a format
  suitable for the GIZA++ toolkit

                      http://www.statmt.org/
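A minimal sketch of this cleaning step. The exact length limits here are illustrative defaults, not the ones Moses ships with (Moses provides its own corpus-cleaning script):

```python
def keep_pair(src_tokens, tgt_tokens, min_len=1, max_len=100, max_ratio=9.0):
    """Return True if a sentence pair passes the length and ratio filters."""
    ls, lt = len(src_tokens), len(tgt_tokens)
    if not (min_len <= ls <= max_len and min_len <= lt <= max_len):
        return False
    # GIZA++-style 1:9 length-ratio constraint between the two sides.
    return max(ls, lt) / min(ls, lt) <= max_ratio

pairs = [
    ("das haus ist klein".split(), "the house is small".split()),
    ("ja".split(), "yes yes yes yes yes yes yes yes yes yes".split()),  # 1:10 ratio
]
kept = [p for p in pairs if keep_pair(*p)]  # only the first pair survives
```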
Steps 2 & 3: Word Alignment

         from Koehn et al., 2003, NAACL

Symmetrizing Word Alignments
Step 4: Get lexical translation table
• Estimate the word translation table w(e|f) as well as the inverse w(f|e).
• Here are the top English translations for the word europa:
              europe               europa       0.8874152
              european             europa       0.0542998
              union                europa       0.0047325
              it                   europa       0.0039230
              we                   europa       0.0021795
              eu                   europa       0.0019304
              europeans            europa       0.0016190
              euro-mediterranean   europa       0.0011209
              europa               europa       0.0010586
              continent            europa       0.0008718

                             http://www.statmt.org/
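A toy sketch of how such a table can be estimated by relative frequency over word-aligned pairs. Real Moses derives the counts from the GIZA++ alignments; the link counts below are invented for illustration:

```python
from collections import Counter, defaultdict

def lexical_table(aligned_word_pairs):
    """Estimate w(e|f) = count(e, f) / count(f) from (e, f) alignment links."""
    pair_counts = Counter(aligned_word_pairs)
    f_counts = Counter(f for _, f in aligned_word_pairs)
    table = defaultdict(dict)
    for (e, f), c in pair_counts.items():
        table[f][e] = c / f_counts[f]
    return table

# Hypothetical alignment links extracted from a word-aligned corpus.
links = ([("europe", "europa")] * 8
         + [("european", "europa")]
         + [("union", "europa")])
w = lexical_table(links)   # w["europa"]["europe"] == 0.8
```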
         Step 5: Extract Phrases
• Start with the word alignment
• Collect all phrase pairs that are consistent with
  the word alignment




                   from Koehn et al., 2003, NAACL
         Step 5: Extract Phrases
• A phrase pair is consistent with the word alignment if it
  contains all alignment points of all the words it covers
        Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt

          Word Alignment Induced Phrases

  [Alignment grid: source "Maria no dió una bofetada a la bruja verde",
   target "Mary did not slap the green witch"]

Extracted phrase pairs, from single words up to the full sentence:

(Maria, Mary) (no, did not) (slap, dió una bofetada) (la, the) (bruja, witch) (verde, green)
(a la, the) (dió una bofetada a, slap the)
(Maria no, Mary did not) (no dió una bofetada, did not slap) (dió una bofetada a la, slap the)
(bruja verde, green witch) (Maria no dió una bofetada, Mary did not slap)
(a la bruja verde, the green witch) …
(Maria no dió una bofetada a la bruja verde, Mary did not slap the green witch)
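The consistency check behind this extraction can be written out directly. This is a simplified version of the standard phrase-extraction algorithm (the extension over unaligned boundary words is omitted), run on the Maria example with 0-based word indices:

```python
def extract_phrases(n_src, n_tgt, alignment):
    """Return (src_span, tgt_span) pairs consistent with the alignment.

    alignment: set of (src_idx, tgt_idx) links, 0-based.
    A span pair is consistent iff no covered word aligns outside it.
    """
    pairs = []
    for s1 in range(n_src):
        for s2 in range(s1, n_src):
            # Target positions linked to any source word in [s1, s2].
            tgt = [j for (i, j) in alignment if s1 <= i <= s2]
            if not tgt:
                continue  # span covers only unaligned words
            t1, t2 = min(tgt), max(tgt)
            # Consistent iff every word in the target span aligns back inside [s1, s2].
            if all(s1 <= i <= s2 for (i, j) in alignment if t1 <= j <= t2):
                pairs.append(((s1, s2), (t1, t2)))
    return pairs

# Maria=0 no=1 dió=2 una=3 bofetada=4 a=5 la=6 bruja=7 verde=8
# Mary=0 did=1 not=2 slap=3 the=4 green=5 witch=6   ("a" is unaligned)
links = {(0, 0), (1, 1), (1, 2), (2, 3), (3, 3), (4, 3), (6, 4), (7, 6), (8, 5)}
pairs = extract_phrases(9, 7, links)
```

For example, ((5, 6), (4, 4)) appears in the output, which is exactly the pair (a la, the) from the slide, while (a,) alone yields nothing because it is unaligned.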
           Step 6: Score Phrases

• Five different phrase translation scores are
  computed:
  – phrase translation probability φ(f|e)
  – lexical weighting lex(f|e)
  – phrase translation probability φ(e|f)
  – lexical weighting lex(e|f)
  – phrase penalty (always exp(1) = 2.718)


                     http://www.statmt.org/
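The two phrase translation probabilities are relative frequencies over the extracted phrase pairs. A minimal sketch, with invented counts:

```python
from collections import Counter

def phrase_probs(phrase_pairs):
    """phi(f|e) = count(f,e)/count(e) and phi(e|f) = count(f,e)/count(f)."""
    pair_c = Counter(phrase_pairs)
    f_c = Counter(f for f, e in phrase_pairs)
    e_c = Counter(e for f, e in phrase_pairs)
    phi_f_given_e = {(f, e): c / e_c[e] for (f, e), c in pair_c.items()}
    phi_e_given_f = {(f, e): c / f_c[f] for (f, e), c in pair_c.items()}
    return phi_f_given_e, phi_e_given_f

# Toy extraction counts: "in europe" was extracted 3 times with "in europa"
# and once with "europas".
pairs = [("in europa", "in europe")] * 3 + [("europas", "in europe")]
fe, ef = phrase_probs(pairs)   # fe gives phi(f|e), ef gives phi(e|f)
```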
                   Step 6: Score Phrases
Snap shot of a Phrase Table:

in europa                      ||| in europe ||| 0.829 0.207 0.801 0.492 2.718
 europas                       ||| in europe ||| 0.025 0.066 0.034 0.007 2.718
 in der europaeischen union    ||| in europe ||| 0.018 0.001 0.031 0.019 2.718
 in europa ,                   ||| in europe ||| 0.011 0.207 0.207 0.492 2.718
 europaeischen                 ||| in europe ||| 0.006 0.075 0.008 0.046 2.718
 im europaeischen              ||| in europe ||| 0.005 0.009 0.024 0.016 2.718
 fuer europa                   ||| in europe ||| 0.004 0.013 0.037 0.051 2.718
 in europa zu                  ||| in europe ||| 0.004 0.207 0.714 0.492 2.718
 an europa                     ||| in europe ||| 0.003 0.011 0.352 0.118 2.718
 der europaeischen             ||| in europe ||| 0.003 0.001 0.009 0.005 2.718



                                    http://www.statmt.org/
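Each line of the table can be parsed by splitting on the `|||` separator. The fields follow the snapshot above: source phrase, target phrase, then the five scores:

```python
def parse_phrase_table_line(line):
    """Split a phrase-table line into (source, target, [scores])."""
    src, tgt, scores = (field.strip() for field in line.split("|||"))
    return src, tgt, [float(s) for s in scores.split()]

line = "in europa ||| in europe ||| 0.829 0.207 0.801 0.492 2.718"
src, tgt, scores = parse_phrase_table_line(line)
# scores[-1] is the constant phrase penalty exp(1) = 2.718
```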
Step 7: Build Lexicalized Reordering Model




               from Koehn et al., 2003, NAACL
Lexicalized reordering models
 Steps 8 & 9: Build generation models
     and create a configuration file
• The generation model is built from the target
  side of the parallel corpus
• As a final step, a configuration file for the
  decoder is generated with the correct paths
  for the generated models and a number of
  default parameter settings.
• This file is called model/moses.ini
                   Decoding
What do we need?
• Language model
• Translation models
  – Phrase translation table
  – Lexicalized reordering table
• Configuration file
  – Tells the decoder where to find the model files
    and the values of the feature weights
             Decoding Process

• Look up possible phrase translations
  – many different ways to segment words into
    phrases
  – many different ways to translate each phrase

                   from Koehn et al., 2003, NAACL
              Decoding Process
• Explosion of the search space
  – The number of hypotheses is exponential in the
    sentence length
• Decoding is NP-complete [Knight, 1999]
• Need to reduce the search space
  – Use heuristics and pruning techniques
     • Histogram pruning & threshold pruning
• Set a reordering limit
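Histogram and threshold pruning can be sketched on a single hypothesis stack. The scores below are invented log-probabilities, and the stack is just a list of (score, hypothesis) pairs:

```python
def prune_stack(hypotheses, max_size=100, threshold=1.0):
    """Histogram pruning: keep at most max_size hypotheses.
    Threshold pruning: drop any hypothesis whose score falls more than
    `threshold` below the best score on the stack.

    hypotheses: list of (score, hypothesis_id), higher score is better.
    """
    survivors = sorted(hypotheses, reverse=True)[:max_size]      # histogram
    best = survivors[0][0]
    return [(s, h) for s, h in survivors if s >= best - threshold]  # threshold

stack = [(-1.0, "h1"), (-1.5, "h2"), (-3.0, "h3"), (-1.2, "h4")]
pruned = prune_stack(stack, max_size=3, threshold=1.0)
# "h3" is cut by histogram pruning; the rest are within 1.0 of the best.
```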
          Tuning Model Weights
• Input
  – Parallel Tune Data Set
  – Decoder
  – Phrase Table + Language Model + Initial
    Translation Model parameters
• Output
  – Improved Translation Model parameters
      Tuning Model Weights

[Flow chart:]
  Tune Set → Preprocess → Decoder (using the LM, the PT, and the
  current weights) → Output → Metric → Score → Optimal?
    – No:  adjust the weights and decode again
    – Yes: done
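The loop above can be sketched as a toy tuning procedure. Real Moses tuning uses MERT; here a hypothetical random hill-climb stands in for the optimizer, and `decode`/`metric` are stand-in callables, not Moses APIs:

```python
import random

def tune(decode, metric, weights, iterations=20, step=0.1, seed=0):
    """Hill-climb a weight vector against a quality metric.

    decode(weights) -> translations; metric(translations) -> score (higher is better).
    """
    rng = random.Random(seed)
    best_score = metric(decode(weights))
    for _ in range(iterations):
        # Propose a small random adjustment to every weight.
        candidate = {k: v + rng.uniform(-step, step) for k, v in weights.items()}
        score = metric(decode(candidate))
        if score > best_score:          # keep the adjustment only if it helps
            weights, best_score = candidate, score
    return weights, best_score

# Toy stand-ins: the "decoder" just returns the weights, and the "metric"
# rewards weights close to a hypothetical optimum of {"tm": 1.0, "lm": 0.5}.
target = {"tm": 1.0, "lm": 0.5}
decode = lambda w: w
metric = lambda w: -sum((w[k] - target[k]) ** 2 for k in target)
tuned, score = tune(decode, metric, {"tm": 0.5, "lm": 0.5})
```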
     Factored Translation Models
• Statistical machine translation today
  – The best performing methods are based on phrases
     •   short sequences of words
     •   no use of explicit syntactic information
     •   no use of morphological information
       Factored Translation Models
One motivation: morphology
• Models treat ‘car’ and ‘cars’ as completely different words
   – training occurrences of ‘car’ have no effect on learning the translation
     of ‘cars’
   – if we only see ‘car’, we do not know how to translate ‘cars’
   – rich morphology (German, Arabic, Finnish, Czech, ...) → many word
     forms
• Better approach
   – analyze surface word forms into lemma and morphology, e.g.: car
     + plural
   – translate lemma and morphology separately
   – generate the target surface form
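The analyze-translate-generate idea can be sketched on a toy English-to-German example. All the tiny dictionaries here are invented for illustration; real factored models learn these mappings from data:

```python
# 1) Analysis: surface form -> (lemma, morphology) factors.
ANALYZE = {"houses": ("house", "plural"), "house": ("house", "singular")}

# 2) Lemma translation, independent of morphology.
LEMMA_DICT = {"house": "haus"}

# 3) Generation: (target lemma, morphology) -> target surface form.
GENERATE = {("haus", "plural"): "häuser", ("haus", "singular"): "haus"}

def translate_factored(word):
    lemma, morph = ANALYZE[word]        # analyze into factors
    tgt_lemma = LEMMA_DICT[lemma]       # translate the lemma only
    return GENERATE[(tgt_lemma, morph)] # generate the surface form

# Even if "houses" never occurred in the parallel data, the lemma
# translation plus target-side generation still covers it.
```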
       Factored Translation Models
• Factored representation of words
   – e.g. surface form | lemma | part-of-speech | morphology

• Goals
   – Generalization, e.g. by translating lemmas, not surface forms
   – Richer models, e.g. using syntax for reordering or language modeling
    Factored Translation Models
• Decomposing translation: example
  – Translate lemma and syntactic information
    separately

  – Generate surface form on target side
                 Moses Advantages
• Represents the state-of-the-art in decoding
   – Moses outperforms other decoders such as Pharaoh and Phramer
• Integration of word-level factors
   – Augments the source and target sentences with factors such as
     POS tags and lemmas
• Modularity
   – Extensible components and flexible integration with other tools
• Adaptability
   – Operating-system and compiler neutral
• Openness
   – Use and extend Moses with minimal effort
• Ongoing support

                         from Koehn et al., 2003, NAACL
Thank You