Statistical Machine Translation

Machine Translation
Phrase-Based Statistical MT

Jörg Tiedemann
jorg.tiedemann@lingfil.uu.se
Department of Linguistics and Philology
Uppsala University
Probabilistic view on MT (E = target language, F = source language):

    Ê = argmax_E P(E|F)
      = argmax_E P(F|E) P(E)
Jörg Tiedemann 1/69




Word-based Translation Models

    Example: translate this: "a house"
    Translation candidate 1: "ett hus"
    Question: What is P("ett hus"|"a house")?

    Simplest model: context-independent lexical probabilities
    t(word_english|word_swedish) (no NULL alignments)

    P("a house"|"ett hus") = sum over all possible ways to generate
    "a house" from "ett hus" given our model (a table of lexical
    probabilities):

    P("a house"|"ett hus") = 1/2² ∗ t("a"|"ett") ∗ t("house"|"hus")
                           + 1/2² ∗ t("a"|"hus") ∗ t("house"|"ett")
                           + 1/2² ∗ t("a"|"ett") ∗ t("house"|"ett")
                           + 1/2² ∗ t("a"|"hus") ∗ t("house"|"hus")

    According to this model: What is P("a house"|"hus ett")?

Word-based Alignment Models (IBM 1)

    Where do we get the lexical probabilities from?
    → Automatic word alignment!

    Example corpus and EM (see chapter 4.2!):

        ett hus        ett barn        mitt barn
        a house        a child         my child

    Basic question: How often do we link certain words together in
    all possible alignments (relative to other possible links)?

    We don't have fixed links; we only know the likelihood of an
    alignment! → count link likelihoods instead!

    Initially: all links have the same probability (t(e|f) = 0.25)!
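The four-term sum above can be checked with a few lines of toy Python (my own sketch, not code from the slides): enumerate every alignment of the English words to the Swedish candidate under IBM 1 with uniform lexical probabilities t(e|f) = 0.25. It also answers the closing question: IBM 1 assigns the same probability to the reordered candidate.

```python
from itertools import product

# Uniform lexical probabilities t(english | swedish), as on the slide
t = {("a", "ett"): 0.25, ("a", "hus"): 0.25,
     ("house", "ett"): 0.25, ("house", "hus"): 0.25}

def ibm1_prob(english, swedish):
    """Sum over all alignments: each English word picks one Swedish word."""
    total = 0.0
    for alignment in product(range(len(swedish)), repeat=len(english)):
        p = 1.0
        for j, i in enumerate(alignment):
            p *= t[(english[j], swedish[i])]
        total += p
    # each alignment has uniform probability 1 / |F|^|E| (here 1/2²)
    return total / len(swedish) ** len(english)

p1 = ibm1_prob(["a", "house"], ["ett", "hus"])
p2 = ibm1_prob(["a", "house"], ["hus", "ett"])
print(p1, p2)  # 0.0625 0.0625: IBM 1 is blind to word order
```

Both candidates score 0.0625 = 4 ∗ (1/4 ∗ 0.25 ∗ 0.25), which is exactly why position parameters are introduced later.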
IBM 1: Initialization

    ett hus        ett barn        mitt barn
    a house        a child         my child

    e       f       total(f)   count   t
    house   mitt    0.000      0.000   0.250
    house   ett     0.000      0.000   0.250
    house   barn    0.000      0.000   0.250
    house   hus     0.000      0.000   0.250
    a       mitt    0.000      0.000   0.250
    a       ett     0.000      0.000   0.250
    ...     ...     ...        ...     ...

    Example count: in sentence pair 1, "ett" and "a" are linked once
    with likelihood 0.25, out of two possible links for "a"
    → relative count = 0.25/(0.25+0.25) = 0.25/0.5 = 0.5
    The same in sentence pair 2 → total count = 0.5 + 0.5 = 1.0!

IBM 1: Iteration 1

    e       f       total(f)   count   t
    house   ett     2.000      0.500   0.250
    house   hus     1.000      0.500   0.500
    a       ett     2.000      1.000   0.500
    a       barn    2.000      0.500   0.250
    a       hus     1.000      0.500   0.500
    my      mitt    1.000      0.500   0.500
    my      barn    2.000      0.500   0.250
    child   mitt    1.000      0.500   0.500
    child   ett     2.000      0.500   0.250
    child   barn    2.000      1.000   0.500
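The count, total(f) and t columns of these tables can be reproduced with a small EM loop. A sketch (toy code, not the slides' implementation) on the three-sentence corpus, starting from the uniform t(e|f) = 0.25:

```python
from collections import defaultdict

corpus = [("ett hus", "a house"), ("ett barn", "a child"), ("mitt barn", "my child")]
corpus = [(f.split(), e.split()) for f, e in corpus]

# uniform initialization: 4 English word types -> t(e|f) = 0.25
t = defaultdict(lambda: 0.25)

def em_iteration(t):
    count = defaultdict(float)   # expected link counts count(e, f)
    total = defaultdict(float)   # normalizer total(f)
    for fs, es in corpus:
        for e in es:             # E-step: distribute e over its link candidates
            norm = sum(t[(e, f)] for f in fs)
            for f in fs:
                frac = t[(e, f)] / norm
                count[(e, f)] += frac
                total[f] += frac
    # M-step: renormalize the expected counts
    return {(e, f): c / total[f] for (e, f), c in count.items()}

t = em_iteration(t)          # iteration 1: t("a"|"ett") = 0.5, as in the table
for _ in range(12):          # iterations 2..13
    t = em_iteration(t)
print(round(t[("house", "hus")], 3))  # close to 1 after 13 iterations
```

Iteration 1 matches the table exactly (e.g. count("a","ett") = 1.0, total("ett") = 2.0, t = 0.5), and further iterations drive the correct links towards probability 1.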




IBM 1: Iteration 2

    e       f       total(f)   count   t
    house   ett     1.833      0.333   0.182
    house   hus     1.167      0.667   0.571
    a       ett     1.833      1.167   0.636
    a       barn    1.833      0.333   0.182
    a       hus     1.167      0.500   0.429
    my      mitt    1.167      0.667   0.571
    my      barn    1.833      0.333   0.182
    child   mitt    1.167      0.500   0.429
    child   ett     1.833      0.333   0.182
    child   barn    1.833      1.167   0.636

IBM 1: Iteration 3

    e       f       total(f)   count   t
    house   ett     1.839      0.241   0.131
    house   hus     1.161      0.759   0.653
    a       ett     1.839      1.375   0.748
    a       barn    1.839      0.222   0.121
    a       hus     1.161      0.402   0.347
    my      mitt    1.161      0.759   0.653
    my      barn    1.839      0.241   0.131
    child   mitt    1.161      0.402   0.347
    child   ett     1.839      0.222   0.121
    child   barn    1.839      1.375   0.748
IBM 1: Iteration 4

    e       f       total(f)   count   t
    house   ett     1.851      0.167   0.090
    house   hus     1.149      0.833   0.724
    a       ett     1.851      1.544   0.834
    a       barn    1.851      0.139   0.075
    a       hus     1.149      0.317   0.276
    my      mitt    1.149      0.833   0.724
    my      barn    1.851      0.167   0.090
    child   mitt    1.149      0.317   0.276
    child   ett     1.851      0.139   0.075
    child   barn    1.851      1.544   0.834

IBM 1: Iteration 5

    e       f       total(f)   count   t
    house   ett     1.863      0.111   0.060
    house   hus     1.137      0.889   0.782
    a       ett     1.863      1.669   0.896
    a       barn    1.863      0.083   0.044
    a       hus     1.137      0.248   0.218
    my      mitt    1.137      0.889   0.782
    my      barn    1.863      0.111   0.060
    child   mitt    1.137      0.248   0.218
    child   ett     1.863      0.083   0.044
    child   barn    1.863      1.669   0.896




IBM 1: Iteration 13

    e       f       total(f)   count   t
    house   ett     1.942      0.002   0.001
    house   hus     1.058      0.998   0.944
    a       ett     1.942      1.940   0.999
    a       barn    1.942      0.001   0.000
    a       hus     1.058      0.059   0.056
    my      mitt    1.058      0.998   0.944
    my      barn    1.942      0.002   0.001
    child   mitt    1.058      0.059   0.056
    child   ett     1.942      0.001   0.000
    child   barn    1.942      1.940   0.999

Word-based Translation Models

    What happens if we introduce position parameters?
    a(1|2) = probability that the word at position 1 is generated by
    the word at position 2
    What is P("a house"|"ett hus") now?

    P("a house"|"ett hus") = t("a"|"ett") ∗ a(1|1) ∗ t("house"|"hus") ∗ a(2|2)
                           + t("a"|"hus") ∗ a(2|1) ∗ t("house"|"ett") ∗ a(1|2)
                           + t("a"|"ett") ∗ a(1|1) ∗ t("house"|"ett") ∗ a(1|2)
                           + t("a"|"hus") ∗ a(2|1) ∗ t("house"|"hus") ∗ a(2|2)

    According to this model: What is P("a house"|"hus ett")?

    Probably: P("a house"|"ett hus") > P("a house"|"hus ett")
    This model has more information about the translation process!
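The effect of the position parameters can be illustrated with toy numbers. In this sketch the t and a values are made up for illustration (they are not from the slides): t prefers the correct word pairs and a prefers diagonal alignments, so the in-order candidate now wins.

```python
from itertools import product

# Made-up illustrative parameters: trained-looking lexical probabilities
# and position probabilities a(i|j) that prefer the diagonal.
t = {("a", "ett"): 0.9, ("a", "hus"): 0.05,
     ("house", "ett"): 0.05, ("house", "hus"): 0.9}
a = {(1, 1): 0.7, (2, 2): 0.7, (1, 2): 0.3, (2, 1): 0.3}

def ibm2_prob(english, swedish):
    """Sum over alignments; each factor is t(e|f) * a(i|j),
    where i is the source position generating target position j."""
    total = 0.0
    for alignment in product(range(1, len(swedish) + 1), repeat=len(english)):
        p = 1.0
        for j, i in enumerate(alignment, start=1):
            p *= t[(english[j - 1], swedish[i - 1])] * a[(i, j)]
        total += p
    return total

p1 = ibm2_prob(["a", "house"], ["ett", "hus"])
p2 = ibm2_prob(["a", "house"], ["hus", "ett"])
print(p1 > p2)  # True: the swapped candidate is penalized
```

Unlike the IBM 1 sketch earlier, the two candidates no longer tie: the reordering information breaks the symmetry.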
Word-based Translation Models

    Increasing complexity:
        IBM 1: lexical translation probabilities
        IBM 2: add absolute reordering
        IBM 3: add fertility
        IBM 4: relative reordering & word classes

    Why do we need the lower models with less information?

Statistical word alignment

    What are the issues with EM?

        expensive procedure, especially with many open variables
        (especially the E-step)
        no guarantee to find the global optimum (if local optima
        exist) → good initialization is necessary!
        IBM 1 has only one (global) optimum → good!
        from IBM 3 & 4: the E-step needs to be approximated

    → start with simple models to initialize more complex ones




Summary on statistical word alignment

    In word-based SMT:
        the translation model is based on probabilistic parameters
            lexical translation model
            reordering model (distortion)
            fertility model
        statistical word alignment with EM
            cascaded training procedure
            by-product of parameter estimation: word alignment
            (for mathematical details: see chapter 4)

    Something is still missing in our SMT system ...

Statistical Machine Translation: Language Modeling

    Remember:

        Ê = argmax_E P(F|E) P(E)

    Now we have the translation model P(F|E).
    We still need the language model P(E).

    Easy! → Use standard N-gram language models
Statistical Machine Translation: Language Modeling

    Language modeling:
        (probabilistic) LM = predict the likelihood of any given string
        What is the likelihood P(E) of observing sentence E?

    P_LM(the house is small) > P_LM(small the is house)
    P_LM(ett hus) > P_LM(en hus)

    Estimate probabilities from corpora:
        P(E) = P(e1, e2, e3, ..., ej)
        P(E) = P(e1) ∗ P(e2|e1) ∗ P(e3|e1, e2) ∗ ... ∗ P(ej|e1, ..., ej−1)

    What is the problem here again?

Statistical Machine Translation: Language Modeling

    Remember: MLE for conditional probabilities

        P(ej|e1, ..., ej−1) = count(e1, e2, ..., ej) / count(e1, e2, ..., ej−1)

    Again: What is the problem?
    → sparse counts for large N-grams!

    → Markov assumption! (bigram model: P(e3|e1, e2) ≈ P(e3|e2))

        unigram model: P(E) = P(e1) ∗ P(e2) ∗ ... ∗ P(en)
        bigram model:  P(E) = P(e1) ∗ P(e2|e1) ∗ P(e3|e2) ∗ ... ∗ P(en|en−1)
        trigram model: P(E) = P(e1) ∗ P(e2|e1) ∗ P(e3|e1, e2) ∗ ... ∗ P(en|en−2, en−1)
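A bigram MLE model is a few lines of Python. This sketch (with a made-up two-sentence toy corpus) shows both properties claimed above: the well-ordered sentence gets a higher score, and a reordered sentence hits an unseen bigram.

```python
from collections import Counter

# Tiny made-up corpus with sentence boundary markers
sentences = ["<s> the house is small </s>", "<s> the house is big </s>"]
tokens = [s.split() for s in sentences]

unigrams = Counter(w for s in tokens for w in s)
bigrams = Counter(pair for s in tokens for pair in zip(s, s[1:]))

def p_sentence(sentence):
    """Bigram model: P(E) = P(e2|e1) * P(e3|e2) * ... with MLE counts."""
    words = sentence.split()
    p = 1.0
    for w1, w2 in zip(words, words[1:]):
        p *= bigrams[(w1, w2)] / unigrams[w1]
    return p

print(p_sentence("<s> the house is small </s>"))  # 0.5, since P(small|is) = 1/2
print(p_sentence("<s> house the is small </s>"))  # 0.0, unseen bigram (<s>, house)
```

The 0.0 result is exactly the zero-count problem discussed on the next slide: one unseen bigram kills the whole product.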




Statistical Machine Translation: Language Modeling

    Another problem: zero counts!
        some N-grams are never observed (→ count(e1, e2) = 0)
        ... but appear in real data (e.g. as a translation candidate)
        → multiplying with one factor = 0 → everything is zero
        → BAD!

    → Smoothing! (reserve probability mass for unseen events)

    ... there would be so much more to say about LMs (see ch. 7)

Statistical Machine Translation: Decoding

    Decoding = search for a solution Ê given F using:

        Ê = argmax_E P(F|E) P(E)

    Far too many possible E's to search globally!
    → Approximate search using good partial candidates!
    → more about this later ...
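Add-one (Laplace) smoothing is the simplest way to reserve probability mass for unseen events; chapter 7 covers better methods. A sketch on the same kind of made-up toy corpus (assumption: this is my illustration, not the slides' recipe):

```python
from collections import Counter

sentences = ["<s> the house is small </s>", "<s> the house is big </s>"]
tokens = [s.split() for s in sentences]
unigrams = Counter(w for s in tokens for w in s)
bigrams = Counter(pair for s in tokens for pair in zip(s, s[1:]))
V = len(unigrams)  # vocabulary size

def p_add_one(w2, w1):
    """Add-one smoothing: every bigram gets a pseudo-count of 1."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

print(p_add_one("house", "<s>"))  # > 0 although (<s>, house) was never observed
```

The unseen bigram now gets a small non-zero probability, and for any history w1 the smoothed probabilities still sum to 1 over the vocabulary.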
Motivation for Phrase-based SMT

    Word-based SMT:
        statistical word alignment → P(F|E)
        language modeling → P(E)
        global decoding: argmax_E P(F|E) P(E)

    Word-by-word translation is too weak!
        contextual dependencies, local reordering
        non-compositional constructions
        n:m relations

    → look at larger chunks!

Phrase-based SMT

    Motivation:
        phrases = word N-grams
        less ambiguity, more context in the translation table
        handles non-compositional expressions
        local reorderings covered by phrase translations
        "distortion": reordering on the phrase level

    → Moses toolkit: (http://www.statmt.org/moses/)




Phrase-based SMT

    Translation model in PSMT:

        P(F|E) = ∏_{i=1..I} φ(f_i|e_i) d(start_i, end_{i−1})

        phrases are extracted from word-aligned parallel corpora
        phrase translation probabilities (MLE):

            φ(f|e) = count(f, e) / Σ_f count(f, e)

        distance-based reordering (d)

Phrase-based SMT

    Phrase translation probabilities:
        need phrase alignments in the parallel corpus
        induce them from word alignments (IBM models)
        score extracted phrases (MLE)
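The MLE phrase score φ(f|e) is just relative frequency over extracted phrase pair instances. A minimal sketch with made-up extracted pairs (the data is invented for illustration):

```python
from collections import Counter

# Made-up extracted phrase pair instances (f, e) from a word-aligned corpus
extracted = [("ett hus", "a house"), ("ett hus", "a house"),
             ("huset", "a house"), ("ett hus", "one house")]

pair_counts = Counter(extracted)              # count(f, e)
e_counts = Counter(e for _, e in extracted)   # Σ_f count(f, e)

def phi(f, e):
    """MLE phrase translation probability φ(f|e)."""
    return pair_counts[(f, e)] / e_counts[e]

print(phi("ett hus", "a house"))  # 2/3: "a house" was extracted 3 times
```

Note the conditioning: the normalizer sums over all source phrases f seen with the same target phrase e.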
Statistical word alignment

    Standard models:

        IBM models 1–5 (cascaded), EM training
        final parameters:
            word translation probabilities (lexical model)
            fertility probabilities
            distortion probabilities (reordering)

    Viterbi alignment → assign the most likely links between words
    according to the statistical word alignment model from above

Viterbi Word Alignment

    special NULL word (NULL → la)
    EMPTY alignment possible (did)
    only 1:many (slap); not many:1
    → depending on alignment direction

    → Alignment tool: GIZA++
    (http://code.google.com/p/giza-pp/)




Viterbi Word Alignment from GIZA++

    From the German-English Europarl corpus:

    # Sentence pair (5) source length 12 target length 11 alignment score : 2.14036e-24
    ich bitte sie , sich zu einer schweigeminute zu erheben .
    NULL ({ }) please ({ 1 2 3 }) rise ({ }) , ({ 4 }) then ({ 5 })
    , ({ }) for ({ 6 }) this ({ 7 }) minute ({ 8 }) ' ({ }) s ({ })
    silence ({ 9 10 }) . ({ 11 })

    # Sentence pair (6) source length 12 target length 10 alignment score : 3.38628e-15
    ( das parlament erhebt sich zu einer schweigeminute . )
    NULL ({ }) ( ({ 1 }) the ({ 2 }) house ({ 3 }) rose ({ 4 5 })
    and ({ }) observed ({ 6 }) a ({ 7 }) minute ({ 8 }) ' ({ })
    s ({ }) silence ({ 9 }) ) ({ 10 })

Viterbi Word Alignment

    Asymmetric alignment!
        no n:1 alignments
        we can run the IBM models in both directions!
        different links in source-to-target and target-to-source
        best alignment = merge both directions (?!)

    How? → Symmetrization heuristics!
Word Alignment Symmetrization

    start with the intersection, add adjacent links (from the union) ...
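The "intersection plus adjacent union links" idea can be sketched in a few lines of Python. This is a simplified version of the grow heuristic (my own sketch; the grow-diag-final variants in Moses add further conditions, e.g. only linking words that are not yet aligned):

```python
def symmetrize(e2f, f2e):
    """Start from the intersection of the two directional alignments and
    repeatedly add union links adjacent to an already accepted link."""
    inter, union = e2f & f2e, e2f | f2e
    aligned = set(inter)
    changed = True
    while changed:
        changed = False
        for (i, j) in sorted(union - aligned):
            if any((i + di, j + dj) in aligned
                   for di in (-1, 0, 1) for dj in (-1, 0, 1)):
                aligned.add((i, j))
                changed = True
    return aligned

e2f = {(0, 0), (1, 1), (2, 2)}   # source-to-target Viterbi links
f2e = {(0, 0), (1, 1), (1, 2)}   # target-to-source Viterbi links
print(symmetrize(e2f, f2e))      # intersection plus the adjacent union links
```

The result always lies between the high-precision intersection and the high-recall union, which is exactly the trade-off the next slide describes.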




Word Alignment Symmetrization

    Many symmetrization heuristics exist!

        intersection (→ high precision, low recall)
        union (→ low precision, high recall)
        grow, grow-diag, grow-diag-final, ...
        (→ different balances between precision & recall)

    → better still would be: symmetric alignment models!

Phrase extraction

    How do we get phrase alignments from word-aligned data?

        phrases = contiguous word sequences
        short phrases & long phrases are important
        (general vs. specific translation units)
        they should conform to the word alignment

    → phrase extraction algorithms
Phrase extraction

    Get ALL phrase pairs that are consistent with the word alignment

    What is a phrase pair that is consistent with the word
    alignment?

        all alignment points of words in the source and target phrase
        are within the phrase pair
        no word inside the phrase pair is aligned to any word outside
        of the phrase pair
        words may be unaligned

    (algorithm: see Figure 5.5 in chapter 5)




Phrase extraction

    [Figure: word alignment matrix for "Maria no daba una bofetada a la
    bruja verde" vs. "Mary did not slap the green witch", with the minimal
    consistent phrase pairs marked]

    (Maria, Mary), (no, did not), (slap, daba una bofetada), (a la, the), (bruja,
    witch), (verde, green)

Phrase extraction

    [Figure: the same alignment matrix, with larger consistent blocks marked]

    (Maria, Mary), (no, did not), (slap, daba una bofetada), (a la, the), (bruja,
    witch), (verde, green), (Maria no, Mary did not), (no daba una bofetada, did
    not slap), (daba una bofetada a la, slap the), (bruja verde, green witch)
Phrase extraction                                                                                   Phrase extraction
                                                 bofetada                       bruja                                                                    bofetada         bruja
                                                                                                                                         Maria no daba una       a   la       verde
                                 Maria no daba una       a                 la       verde
                                                                                                                                 Mary
                         Mary
                                                                                                                                  did

                          did                                                                                                     not

                          not                                                                                                    slap

                                                                                                                                  the
                         slap
                                                                                                                                 green
                          the
                                                                                                                                 witch

                         green

                         witch
                                                                                                         (Maria, Mary), (no, did not), (slap, daba una bofetada), (a la, the), (bruja,
                                                                                                         witch), (verde, green), (Maria no, Mary did not), (no daba una bofetada, did
     (Maria, Mary), (no, did not), (slap, daba una bofetada), (a la, the), (bruja,                       not slap), (daba una bofetada a la, slap the), (bruja verde, green witch),
     witch), (verde, green), (Maria no, Mary did not), (no daba una bofetada, did                        (Maria no daba una bofetada, Mary did not slap), (no daba una bofetada a la,
     not slap), (daba una bofetada a la, slap the), (bruja verde, green witch),                          did not slap the), (a la bruja verde, the green witch), (Maria no daba una
     (Maria no daba una bofetada, Mary did not slap), (no daba una bofetada a la,                        bofetada a la, Mary did not slap the), (daba una bofetada a la bruja verde,
     did not slap the), (a la bruja verde, the green witch)                                              slap the green witch)
Jörg Tiedemann                                                                              46/69   Jörg Tiedemann                                                                    47/69
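The extraction step can be sketched in code: enumerate source spans, find the target span their alignment links cover, and accept the pair only if no link crosses the boundary. A minimal sketch (the alignment links are assumed from the matrix above; Koehn's full algorithm additionally expands over unaligned target words, which this example does not need):

```python
def extract_phrases(src, tgt, links, max_len=9):
    """Collect all phrase pairs consistent with the word alignment:
    no alignment link may connect a word inside the pair to one outside."""
    pairs = set()
    for s1 in range(len(src)):
        for s2 in range(s1, min(len(src), s1 + max_len)):
            # target positions linked to the source span [s1, s2]
            ts = [t for (s, t) in links if s1 <= s <= s2]
            if not ts:
                continue
            t1, t2 = min(ts), max(ts)
            # reject if any target word in [t1, t2] is linked outside [s1, s2]
            if any(t1 <= t <= t2 and not s1 <= s <= s2 for (s, t) in links):
                continue
            pairs.add((" ".join(src[s1:s2 + 1]), " ".join(tgt[t1:t2 + 1])))
    return pairs

src = "Maria no daba una bofetada a la bruja verde".split()
tgt = "Mary did not slap the green witch".split()
# assumed alignment links as (source index, target index) pairs
links = {(0, 0), (1, 1), (1, 2), (2, 3), (3, 3), (4, 3),
         (5, 4), (6, 4), (7, 6), (8, 5)}
pairs = extract_phrases(src, tgt, links)
```

Note that single words inside many-to-many links are correctly rejected: "a" alone never becomes a phrase pair, because "the" is also linked to "la" outside the span.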




Phrase extraction

                 Maria no daba una bofetada a la bruja verde
     [word alignment matrix; English rows: Mary, did, not, slap, the, green, witch]

     All phrase pairs consistent with the word alignment:

     (Maria, Mary), (no, did not), (slap, daba una bofetada), (a la, the), (bruja,
     witch), (verde, green), (Maria no, Mary did not), (no daba una bofetada, did
     not slap), (daba una bofetada a la, slap the), (bruja verde, green witch),
     (Maria no daba una bofetada, Mary did not slap), (no daba una bofetada a la,
     did not slap the), (a la bruja verde, the green witch), (Maria no daba una
     bofetada a la, Mary did not slap the), (daba una bofetada a la bruja verde,
     slap the green witch), (no daba una bofetada a la bruja verde, did not slap
     the green witch), (Maria no daba una bofetada a la bruja verde, Mary did not
     slap the green witch)


Scoring phrases

     Simple maximum likelihood estimation:

                            count(f, e)
         φ(f|e)  =  ─────────────────────────
                        Σ_f' count(f', e)

     → A huge phrase table! (with a lot of garbage?)
Jörg Tiedemann                                                                              48/69   Jörg Tiedemann                                                                    49/69
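The relative-frequency estimate is a few lines of code. A sketch with invented extraction counts (chosen so the toy data reproduces the 0.6 and 0.1 scores for "bara"/"bara vi" → "just" seen in the phrase-table examples; "endast" is a made-up filler phrase):

```python
from collections import Counter

def score_phrases(extracted):
    """phi(f|e) = count(f, e) / sum over f' of count(f', e):
    how often e was extracted together with f, relative to all
    extractions of e."""
    pair_count = Counter(extracted)
    e_count = Counter(e for (_, e) in extracted)
    return {(f, e): c / e_count[e] for (f, e), c in pair_count.items()}

# toy extraction counts (assumed numbers, not from a real corpus)
extracted = ([("bara", "just")] * 6 + [("bara vi", "just")] * 1
             + [("endast", "just")] * 3 + [("bara", "only")] * 3)
phi = score_phrases(extracted)
```

By construction the estimates for a fixed English phrase sum to one, which is exactly what the normalization over f' guarantees.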
Phrase tables

     Examples from a phrase table (Pirates of the Caribbean):

                          Swedish           English                         Score
                          , det är          , it' s                         0.666667
                          , det är          , that' s                       1
                  att bli besvikna          be disappointed                 1
                  att bli en själv          to becoming one                 1
                           bara vi          just                            0.1
                              bara          just                            0.6
                              bara          only                            0.375
      barbossa och hans besättning          barbossa and his crew           1
                 barbossa och hans          barbossa and his                1
       barbossa tänker göra . allt          barbossa is up to ... ... all   1

     (The training set was too small to get reasonable counts!)


The final model for Phrase-Based SMT

     Ê  =  argmax_E P(E|F)
        =  argmax_E  Π_i φ(f_i|e_i) · d(start_i, end_{i-1})  ·  P(E)  ·  ω^length(E)

     Distortion d: chance to move phrases to other positions
            fixed distortion limit (e.g. 6)
            simple penalty for moving: α^|start_i - end_{i-1} - 1|   OR
            lexicalized distortion (learned from alignment)
     Word cost: ω^length(E) = bias for longer output
Jörg Tiedemann                                                                 50/69   Jörg Tiedemann                                                        51/69
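The product can be computed directly for one segmented hypothesis. A minimal sketch; the phrase scores, the language-model probability, and the values α = 0.9, ω = 1.1 are toy assumptions (source positions are 1-based, with end_0 = 0):

```python
def hypothesis_score(phrases, lm_prob, alpha=0.9, omega=1.1):
    """prod_i phi(f_i|e_i) * d(start_i, end_{i-1})  *  P(E) * omega^length(E),
    with the simple distortion penalty d = alpha^|start_i - end_{i-1} - 1|.
    phrases: list of (phi, e_phrase, start_i, end_i) in target order."""
    score = lm_prob
    prev_end = 0   # end_0 = 0: starting at source position 1 costs nothing
    e_length = 0
    for phi, e, start, end in phrases:
        score *= phi * alpha ** abs(start - prev_end - 1)
        prev_end = end
        e_length += len(e.split())
    return score * omega ** e_length

# monotone: "Maria" -> "Mary" (source span 1-1), "no" -> "did not" (span 2-2)
mono = hypothesis_score([(0.8, "Mary", 1, 1), (0.7, "did not", 2, 2)],
                        lm_prob=0.01)
# same phrases emitted in swapped order: distortion penalties now apply
swapped = hypothesis_score([(0.7, "did not", 2, 2), (0.8, "Mary", 1, 1)],
                           lm_prob=0.01)
```

The swapped hypothesis pays α^1 for jumping over "Maria" and α^2 for jumping back, so it scores lower than the monotone one by a factor of α^3.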




PB-SMT extension: Log-linear Models

                                            ˆ
     Instead of the noisy-channel model     E = argmax_E P(F|E) P(E):

            model the posterior directly:  Ê = argmax_E P(E|F)
            many feature functions h_m(E, F) may influence P(E|F)

                 phrase translation model E → F
                 phrase translation model F → E
                 lexical weights from the underlying word alignment
                 a language model P(E)
                 lexicalized reordering model
                 length features (word/phrase costs/penalties)

     → P(E|F) = weighted combination of feature functions!


PB-SMT extension: Log-linear Models

     P(E|F) = weighted (λ_m) combination of feature functions (h_m):

         P(E|F)  =  (1/Z) · exp( Σ_{m=1..M} λ_m h_m(E, F) )

         Ê  =  argmax_E P(E|F)  =  argmax_E log P(E|F)
            =  argmax_E Σ_{m=1..M} λ_m h_m(E, F)

     How to learn the weights λ_m?
            Minimum error rate training (MERT) on a development set!
            Measure error in terms of BLEU scores (n-best list)
            Iterative adjustment of the model parameters
            (slow but effective!)
Jörg Tiedemann                                                                 52/69   Jörg Tiedemann                                                        53/69
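Since Z is constant for a given F, picking the best translation only needs the weighted feature sum. A sketch where the candidate set, the feature values (log probabilities plus a length feature), and the weights are all invented for illustration:

```python
def loglinear_score(h, lam):
    """sum over m of lambda_m * h_m(E, F); maximizing this is equivalent
    to maximizing P(E|F), because Z does not depend on E."""
    return sum(lam[m] * h[m] for m in lam)

# toy candidates with assumed feature values
candidates = {
    "Mary did not slap the green witch": {"tm": -1.2, "lm": -3.0, "len": 7},
    "Mary not slap the witch green":     {"tm": -0.8, "lm": -6.0, "len": 6},
}
lam = {"tm": 1.0, "lm": 1.0, "len": 0.1}
best = max(candidates, key=lambda e: loglinear_score(candidates[e], lam))
```

MERT would now tune the λ values so that the candidate with the best BLEU score also gets the best model score; here the weights are simply fixed by hand.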
Phrase table with multiple scores

     That's what you will get from Moses:

         Swedish             English            Scores
         , det är            , it' s            0.6667 0.0959975 0.6667 0.0263227 2.718
         att bli besvikna    be disappointed    1 0.0221815 1 0.105472 2.718
         att bli en själv    to becoming one    1 0.00896375 1 0.00157689 2.718
         bara vi             just               0.1 0.0102041 1 0.128968 2.718
         bara                just               0.6 0.285714 0.6 0.25 2.718
         bara                naught but         1 0.268518 0.1 0.00195312 2.718
         bara                only               0.375 0.222222 0.3 0.125 2.718

             phrase translation probability φ(f|e)
             lexical weighting lex(f|e)
             phrase translation probability φ(e|f)
             lexical weighting lex(e|f)
             phrase penalty (always exp(1) ≈ 2.718)


Translation = "decoding"

     Global search:  Ê = argmax_E P(E|F)

            many translation alternatives (huge phrase table)
            many ways to segment sentences into phrases
            re-ordering makes it even more complex
            Very expensive! → need search heuristics
                   pruning (discard weak hypotheses early)
                   stack decoding (histograms & thresholds)
                   reordering limits
Jörg Tiedemann                                                              54/69           Jörg Tiedemann                                                            55/69
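On disk, each row of this table is one `|||`-separated text line. A small reader, assuming the plain-text layout shown above with five scores per entry:

```python
def parse_phrase_table_line(line):
    """'swedish ||| english ||| s1 s2 s3 s4 s5' ->
    (source phrase, target phrase,
     [phi(f|e), lex(f|e), phi(e|f), lex(e|f), phrase penalty])"""
    fields = [f.strip() for f in line.split("|||")]
    return fields[0], fields[1], [float(s) for s in fields[2].split()]

src, tgt, scores = parse_phrase_table_line(
    "bara ||| just ||| 0.6 0.285714 0.6 0.25 2.718")
```

In a log-linear decoder each of the five scores becomes one feature function, each with its own tunable weight.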




Decoding Process

  Maria    no    dio    una    bofetada    a    la    bruja    verde

  Mary

            build translation left-to-right
            select foreign word to be translated
            select translation in phrase table
            add translation to partial translation (hypothesis)


Decoding Process

  Maria    no    dio    una    bofetada    a    la    bruja    verde

  Mary   did not

            mark first (foreign) word as translated
            new example: one-to-many translation
Jörg Tiedemann                                                              56/69           Jörg Tiedemann                                                            57/69
Decoding Process

  Maria    no    dio una bofetada    a    la    bruja    verde

  Mary   did not   slap

            many-to-one translation


Decoding Process

  Maria    no    dio una bofetada    a la    bruja    verde

  Mary   did not   slap   the

            many-to-one translation
Jörg Tiedemann                                                 58/69           Jörg Tiedemann                                         59/69




Decoding Process

  Maria    no    dio una bofetada    a la    bruja    verde

  Mary   did not   slap   the   green

            example of re-ordering


Decoding Process

  Maria    no    dio una bofetada    a la    bruja    verde

  Mary   did not   slap   the   green   witch

            translation finished
Jörg Tiedemann                                                 60/69           Jörg Tiedemann                                         61/69
Decoding Process: Lattice of translation options

Maria      no            daba   una   bofetada    a         la    bruja          verde

Mary       not           give   a     slap        to        the   witch          green
           did not              a slap            by              green witch
           no                   slap              to the
           did not give                           to
                                                  the
                                slap                    the witch


Hypothesis expansion

     [figure: partial hypotheses expanded with translation options from the lattice]
Jörg Tiedemann                                                             62/69           Jörg Tiedemann                                  63/69




Hypothesis expansion

     [figure: hypothesis expansion continued]

     ... and continue adding more hypotheses
     → exponential explosion of the search space!
Jörg Tiedemann                                                             64/69           Jörg Tiedemann                                  65/69
Hypothesis Stacks

     [figure: hypothesis stacks indexed by the number of translated foreign words]

            here: based on number of foreign words translated
            expand all hypotheses from one stack during translation
            place expanded hypotheses into appropriate stacks
     → get n-best list of translations


Phrase-based SMT

     More information:

     → Homepage of the Moses toolkit
     http://www.statmt.org/moses/
Jörg Tiedemann                                                  66/69    Jörg Tiedemann                                         67/69
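The stack organisation can be sketched as a toy monotone decoder (no language model, no reordering, no future-cost estimate; the tiny phrase table is invented): stack k holds hypotheses covering the first k foreign words, and each stack is pruned to its best entries before expansion.

```python
def stack_decode(src, table, max_phrase_len=4, stack_size=10):
    """Simplified monotone stack decoding with histogram pruning.
    Returns an n-best list of (score, translation)."""
    stacks = [[] for _ in range(len(src) + 1)]
    stacks[0].append((1.0, ""))                 # empty hypothesis
    for k in range(len(src)):
        # histogram pruning: keep only the stack_size best hypotheses
        stacks[k].sort(reverse=True)
        del stacks[k][stack_size:]
        for score, partial in stacks[k]:
            # expand by translating the next source phrase src[k:j]
            for j in range(k + 1, min(len(src), k + max_phrase_len) + 1):
                f = " ".join(src[k:j])
                for e, phi in table.get(f, []):
                    stacks[j].append((score * phi,
                                      (partial + " " + e).strip()))
    return sorted(stacks[len(src)], reverse=True)

# invented toy phrase table mapping f to (e, score) options
table = {
    "Maria": [("Mary", 1.0)],
    "no": [("did not", 0.7), ("not", 0.3)],
}
nbest = stack_decode("Maria no".split(), table)
```

A real decoder also recombines hypotheses with identical states and adds threshold pruning and reordering limits, but the stack-per-coverage idea is the same.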




Summary PB-SMT

            phrase-based SMT = state-of-the-art in data-driven MT (?!)
            based on standard word alignment models
            phrase extraction heuristics & simple scoring
            simplistic re-ordering model
            huge phrase table = big memory of fragment translations
            heuristics for efficient decoding

     → Active research area! New developments all the time!


What's next?

     Next lab session:
            build your own SMT models
            run different setups and evaluate

     Lecture:
            a quick look at other topics
            course summary & more info about project
Jörg Tiedemann                                                  68/69    Jörg Tiedemann                                         69/69

				