

Reordering via N-Best Lists for Spanish-Basque Translation

Germán Sanchis, Francisco Casacuberta
Instituto Tecnológico de Informática
Universidad Politécnica de Valencia
Camino de Vera, s/n. 46022 Valencia, Spain

Abstract

In Statistical Machine Translation (SMT), one of the main problems systems are confronted with stems from the different word order that different languages imply. Most works addressing this issue centre their effort on pairs of languages involving Arabic, Japanese or Chinese, because of their utmost different origin with respect to western languages. However, Basque is also a language with an extremely different word order with respect to most other European languages, linguists being unable to determine its origins with certainty. Hence, SMT systems which do not tackle the reordering problem in any way are mostly unable to yield satisfactory results. In this work, a novel source sentence reordering technique is presented, based on monotonized alignments and n-best lists, endorsed by very promising results obtained from a Basque-Spanish translation task.

1 Introduction

SMT systems have proved in the last years to be an important alternative to rule-based machine translation systems, being even able to outperform commercial machine translation systems in the tasks they have been trained on. Moreover, the development effort behind a rule-based machine translation system and an SMT system is dramatically different, the latter being able to adapt to new language pairs with little or no human effort, whenever suitable corpora are available.

The grounds of modern SMT were established in (Brown et al., 1993), where the problem of machine translation was defined as follows: given a sentence s from a certain source language, an adequate sentence t̂ that maximises the posterior probability is to be found. Such a statement can be specified with the following formula:

    t̂ = argmax_t Pr(t|s)

Applying Bayes' theorem to this definition, one can easily reach the next formula

    t̂ = argmax_t ( Pr(t) · Pr(s|t) ) / Pr(s)

and, since we are maximising over t, the denominator can be neglected, arriving at

    t̂ = argmax_t Pr(t) · Pr(s|t)

where Pr(t|s) has been decomposed into two different probabilities: the statistical language model of the target language, Pr(t), and the (inverse) translation model, Pr(s|t).

Although it might seem odd to model the probability of the source sentence given the target sentence, this decomposition has a very intuitive interpretation: the translation model Pr(s|t) will capture the word relations between both input and output languages, whereas the language model Pr(t) will ensure that the output sentence is a well-formed sentence belonging to the target language.
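This decomposition amounts to reranking candidate translations by the product Pr(t) · Pr(s|t). A minimal sketch in log space, where the hypothetical toy probability tables stand in for a real n-gram language model and an IBM-style translation model:

```python
def lm_logprob(t):
    # Hypothetical toy target language model log Pr(t)
    table = {"we have to go": -1.0, "go have we to": -4.0}
    return table.get(t, -10.0)

def tm_logprob(s, t):
    # Hypothetical toy inverse translation model log Pr(s|t)
    table = {("joan behar dugu", "we have to go"): -0.5,
             ("joan behar dugu", "go have we to"): -0.7}
    return table.get((s, t), -10.0)

def best_translation(s, candidates):
    # t_hat = argmax_t Pr(t) * Pr(s|t), computed as a sum in log space
    return max(candidates, key=lambda t: lm_logprob(t) + tm_logprob(s, t))

print(best_translation("joan behar dugu",
                       ["we have to go", "go have we to"]))  # -> we have to go
```

In a real system the candidate set is produced by the decoder's search rather than enumerated, but the scoring rule is the same.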

In the last years, SMT systems have evolved to become the present state of the art, two of the most representative techniques being the phrase-based models (Koehn et al., 2003; Och and Ney, 2004) and the Weighted Finite State Transducers for Machine Translation (Casacuberta and Vidal, 2004; Kumar and Byrne, 2003). Both of these frameworks typically rely on word-aligned corpora, which often leads them to incur word-ordering errors. Although there have been different efforts aimed at enabling them to deal with non-monotonicity, the algorithms developed often only account for very limited reorderings, being unable to tackle the more complex reorderings that e.g. some Asian languages introduce with respect to European languages. Because of this, not only will monotone systems produce incorrectly ordered translations, but, in addition, the parameters of such models will be incorrectly estimated whenever a certain input phrase is erroneously assumed to be the translation of a certain output phrase at training time.

Although no efficient solution has yet been found, this problem has been well known since the origin of what is known as statistical machine translation: (Berger et al., 1996) already introduced into their alignment models what they called distortion models, in an effort to include in their SMT system a solution for the reordering problem. However, these distortion models are usually implemented within the decoding algorithms and imply serious computational problems, ultimately leading to restrictions being applied to the set of possible permutations of the output sentence. Hence, the search performed turns sub-optimal, and an important loss in the representational power of the distortion models takes place.

On the other hand, dealing with arbitrary word reorderings and choosing the one which best scores given a translation model has been shown not to be a viable solution, since when allowing all possible word permutations the search is NP-hard (Knight, 1999).

In the present work we develop a new approach to the problem, based on the work of Zens, Matusov and Kanthak (Zens et al., 2004; Matusov et al., 2005; Kanthak et al., 2005), who introduced the idea of monotonizing a corpus. A very preliminary result of our work was published in a Spanish workshop (Sanchis and Casacuberta, 2006). The key idea behind this concept is to use the IBM alignment models to efficiently reorder the input sentence s and produce a new bilingual, monotone pair, composed of the reordered input sentence s′ and the output sentence t. Hence, once this new bilingual pair has been produced, the translation model to be applied will not have to tackle the problems derived from different word reorderings, since this problem will not be present any more. Still, there is one more problem to be solved: at search time, only the input sentence is available, and hence the pair cannot be monotonized. To solve this, a very simple reordering model will be introduced, together with a reordered language model and n-best hypothesis generation. In this work, a phrase-based model is trained using these monotone pairs.

In the following section, a brief overview of the latest efforts made towards solving the reordering problem will be given. In section 3, the approach presented in this work will be described, and in section 4 the experiments performed with this system will be shown. Finally, in section 5 the conclusions from this work will be presented, as well as the work that is still to be done.

2 Brief overview of existing approaches

Three main possibilities exist when trying to solve the reordering problem: input sentence reordering, output sentence reordering, or reordering both. The latter is, to the best of our knowledge, as yet unexplored.

Vilar et al. (1996) tried to partially solve the problem by monotonizing the most probable non-monotone alignment patterns and

adding a mark in order to be able to remember the original word order. This being done, a new output language has been defined and a new language and translation model can be trained, making the translation process monotone.

More recently, Kumar and Byrne (2005) learned weighted finite state transducers accounting for local reorderings of two or three positions. These models were applied to phrase reordering, but the training of the models did not yield statistically significant results with respect to the introduction of the models with fixed probabilities.

When dealing with input sentence reordering (Zens et al., 2004; Matusov et al., 2005; Kanthak et al., 2005), the main idea is to reorder the input sentence in such a way that the translation model will not need to account for possible word reorderings. To achieve this, alignment models are used in order to establish which word order is appropriate for the translation to be monotone, and then the input sentence is reordered in such a manner that the alignment is monotone.

However, this approach has an obvious problem, since the output sentence is not available at search time and the sentence pair cannot be made monotone.

The naïve solution, testing all possible permutations of the input sentence, has already been discussed earlier: it is NP-hard (Knight, 1999), as J! permutations can be obtained from a sentence of length J. Hence, the search space must be restricted, and such restrictions are bound to yield sub-optimal results. In their work, Kanthak et al. present four types of constraints: IBM, inverse IBM, local and ITG constraints. Although these restrictions did yield interesting results, the search space still remained huge, and the computational price paid for a relatively small benefit was far too high.

  • Let:
      – s be a source sentence, and sj its j-th word
      – t be a target sentence, and ti its i-th word
  • Let C be a cost matrix, with cij = cost(align(sj, ti))
  • Let {sr} = {all possible permutations of s}.

  1. Compute the alignment AD(j) = argmin_i cij
  2. s′ = {sr | ∀j : AD(j) ≤ AD(j + 1)}
  3. Recompute (reorder) C, obtaining C′.
  4. Set A′(i) = argmin_j c′ij.
  5. Optional: compute the minimum-cost monotonic path through the cost matrix C′.

Figure 1: Algorithm for obtaining a monotonic alignment by reordering the source sentence.

3 The reordering model and N-Best reorderings

An important motivation behind the approach in this work is that the reordering constraints presented by Kanthak et al. (Kanthak et al., 2005) do not take into account extremely significant information that can be extracted from monotonized corpora: while reordering the input sentence in such a fashion that the alignment turns monotone, we are performing the very reordering step that will later need to be taken on the input test set. Hence, what we would ideally want to do is learn a model from this information that will be capable of reordering a given, unseen input sentence in the same way that the monotonization procedure would have done, in the hope that the benefits introduced will be greater than the error that an additional model will add to the translation procedure.

Once the alignments have been made monotonic according to the algorithm shown in Figure 1 (Kanthak et al., 2005), a new source "language" has been established, meaning that a reordered language model can be trained with the reordered input sentences s′. Such a language will have the words of the original source language, but the distinctive ordering of the target language.
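The core of the monotonization algorithm of Figure 1 (steps 1 and 2) can be sketched as follows; a toy cost matrix stands in for the costs that, in the paper, come from the IBM alignment models:

```python
def monotonize(source, cost):
    """Reorder the source sentence so that its minimum-cost word
    alignment to the target becomes monotone.

    cost[j][i] is the cost of aligning source word j to target word i.
    """
    # Step 1: align each source word to its cheapest target position,
    # A_D(j) = argmin_i c_ij
    A = [min(range(len(cost[j])), key=lambda i: cost[j][i])
         for j in range(len(source))]
    # Step 2: pick the permutation s' whose alignment targets are in
    # non-decreasing order (a stable sort keeps ties in source order)
    order = sorted(range(len(source)), key=lambda j: A[j])
    return [source[j] for j in order]

# Toy example: word "a" aligns to target 2, "b" to 0 and "c" to 1,
# so the monotonized source reads "b c a".
cost = [[9, 9, 1],
        [1, 9, 9],
        [9, 1, 9]]
print(monotonize(["a", "b", "c"], cost))  # -> ['b', 'c', 'a']
```

Steps 3–5 of Figure 1, which recompute the cost matrix and the alignment over the reordered source, are omitted from this sketch.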

Figure 2: Alignment produced by GIZA (top) and alignment after the monotonization procedure (bottom). This is an example extracted from the Spanish→Basque corpus (i.e. Spanish is the source language). Although these sentences mean "We have to go day 10 in the evening.", the reordered Spanish sentence would mean something like "Day ten in the evening go to we have.".

An example of this procedure is shown in Figure 2. Hence, a reordering model can be learnt from the monotonized corpus, which will most likely not depend on the output sentence, whenever the word-by-word translation is accurate enough.

Hence, the reordering problem can be defined as:

    s′ = argmax_{sr} Pr(sr) · Pr(s|sr)

where Pr(sr) is the reordered language model, and Pr(s|sr) is the reordering model. This problem being very similar to the translation problem, but with a very constrained translation table, it seems only natural to face it with the same methods developed to solve the translation problem. Hence, in this paper we will be using an exponential model as reordering model, defined as:

    Pr(s|s′) ≈ exp(−Σ_i d_i)

where d_i is the distance between the last reordered word position and the current candidate position.

                        Spanish   Basque
  Training  Sentences          38940
            Different pairs    20318
            Words             368314   290868
            Vocabulary           722      884
            Average length       9.5      7.5
  Test      Sentences           1000
            Independent          434
            Words               9507     7453
            Average length       9.5      7.5

Table 1: Characteristics of the Tourist corpus.

However, in order to reduce the error that a reordering model will introduce into the system, we found it very useful to compute an n-best list of reordering hypotheses and translate them all, then selecting as final output sentence the one which obtains the highest probability according to the models Pr(t) · Pr(sr|t). Ultimately, what we are actually doing with this procedure is to constrain the search space of permutations of the source sentence as well, but taking into account the information that monotonized alignments entail. In addition, this technique implies a much stronger restriction of the search space than previous approaches, significantly reducing the computational effort needed.

4 Translation experiments

4.1 Corpus characteristics

Our system has been tested on a Basque-Spanish translation task, a tough machine translation problem in which reordering plays a crucial role.

The corpus chosen for this experiment is the Tourist corpus (Pérez et al., 2005), which is an adaptation of a set of Spanish-German grammars generating bilingual sentence pairs (Vidal, 1997) in such languages. Hence, the corpus is semi-synthetic. In this task, the sentences describe typical human dialogues at the reception desk of a hotel, being mainly extracted from tourist guides. However, because of its design, there is some asymmetry between both languages, and a concept being expressed in several manners

in the source language will always be translated in the same manner in the target language. Because of this, the target language is meant to be simpler than the source language. Since the input language during the design of the corpus was Spanish, the vocabulary size of Basque should be smaller. In spite of this fact, the vocabulary size of Basque is bigger than that of Spanish, due to the agglutinative nature of the Basque language. The corpus has been divided into two separate subsets, a bigger one for training and a smaller one for test. The characteristics of this corpus can be seen in Table 1.

4.2 System evaluation

The SMT system developed has been automatically evaluated by measuring the following rates:

WER (Word Error Rate): The WER criterion computes the minimum number of edit operations (substitutions, insertions and deletions) needed to convert the translated sentence into the sentence considered ground truth. Because of its nature, this measure is a pessimistic one when applied to Machine Translation.

PER (position-independent WER): This criterion is similar to WER, but word order is ignored, accounting for the fact that an acceptable (and even grammatically correct) translation may be produced that differs only in word order.

BLEU (Bilingual Evaluation Understudy) score: This score measures the precision of unigrams, bigrams, trigrams and 4-grams with respect to a set of reference translations, with a penalty for too-short sentences (Papineni et al., 2001). BLEU is not an error rate, i.e. the higher the BLEU score, the better.

4.3 Experimental setup and translation results

We used the reordering technique described above to obtain an n-best reordering hypothesis list and translate all the hypotheses, keeping the best-scoring one.

[Plot: BLEU score and WER rate vs. size of n-best list, for the reordered system and the baseline.]

Figure 3: Evolution of translation quality when increasing n for Basque to Spanish.

            Baseline   Reordered, n = 5
  WER        20.7%          16.2%
  BLEU       77.9%          79.8%
  PER        12.6%          11.0%

Table 2: Results for Basque to Spanish translation.

First, the bilingual pairs were aligned using IBM model 4 by means of the GIZA++ toolkit (Och and Ney, 2000). After this, the alignments were made monotone in the way described in Figure 1 and a new alignment was recalculated, determining the new monotone alignment between the reordered source sentence and the target, and a reordered source sentence language model was built. In addition, a phrase-based model involving reordered source sentences and target sentences was learned by using the Thot toolkit (Ortiz et al., 2005).

For the next step, the reordering model, we used the reordering model built into the Pharaoh toolkit. This was done by including in the translation table only the words contained in the vocabulary of the desired source language, and allowing the toolkit to reorder the words by taking into account the language model and the phrase-reordering model it implements, which is an exponential model. Since in this case the phrases are just words, what results is an effective implementation of an exponential word-reordering model, just as we wanted.
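Such a distance-based exponential scorer can be sketched as follows. Reading d_i as the absolute jump from the position following the previously covered source position is one plausible interpretation of the model defined in section 3, not necessarily Pharaoh's exact implementation:

```python
def reordering_score(permutation):
    """Unnormalised log Pr(s|s'): the negated sum of jump distances d_i,
    where d_i is taken here as the distance between the position
    following the last covered source position and the current one."""
    score = 0
    prev = -1  # position before the first source word
    for pos in permutation:
        score -= abs(pos - (prev + 1))  # d_i = 0 for a monotone step
        prev = pos
    return score

# A monotone reordering costs nothing; long jumps are penalised.
print(reordering_score([0, 1, 2]))  # -> 0
print(reordering_score([2, 0, 1]))  # -> -5
```

Higher (less negative) scores thus favour reorderings that stay close to the original word order, which is then traded off against the reordered language model when producing the n-best list.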

[Plot: BLEU score and WER rate vs. size of n-best list, for the reordered system and the baseline.]

Figure 4: Evolution of translation quality when increasing n for Spanish to Basque.

            Baseline   Reordered, n = 5
  WER        19.5%          10.9%
  BLEU       81.0%          87.1%
  PER         6.2%           4.9%

Table 3: Results for Spanish to Basque translation.

Once the n best reordering hypotheses had been calculated, we translated them all by using Pharaoh once again, and kept the best-scoring translation, the score being determined as the product of the (inverse) translation model and the language model.

As a baseline, we took the results of translating the same test set without the reordering pipeline, i.e. just using GIZA++ for aligning, Thot for phrase extraction and Pharaoh for translating. The results of this setup can be seen in Table 3 and Table 2, with the n-best list size set to 5. At this point, it must be noted that Pharaoh by itself also performs some reordering of the output sentence, but only on a per-phrase basis.

These results show that the reordering pipeline established does have significant benefits on the overall quality of the translation, almost achieving a relative improvement of 50% in WER. Furthermore, it is interesting to point out that even in the case of the PER criterion the results obtained are better. At first sight, this might seem odd, since the PER criterion does not take into account word order errors within a sentence, which is the main problem reordering techniques try to solve. However, this improvement is explained by the fact that reordering the source sentence allows better phrases to be extracted.

It is also interesting to point out that the translation quality when translating from Spanish to Basque is much higher than in the opposite direction. This is due to the corpus characteristics described in the previous section: Spanish being the input language of the corpus, it is only natural that the translation quality will worsen when reversing the intended translation direction. In addition, it can also be observed that the reordering pipeline has less beneficial effects when translating from Basque to Spanish.

Lastly, in Figure 4 and Figure 3, the result of increasing the size of the n-best reordering hypothesis list can be seen. In the case of Spanish-Basque translation, the translation quality still increases until size 20, whereas in the case of Basque-Spanish the translation quality already reaches its maximum with the first 5 best hypotheses. However, it can also be seen that just using the best reordering hypothesis already yields better results than not introducing the reordering pipeline at all. Hence, these figures also show that the phrase extraction process obtains better quality phrases when the monotonization procedure has been applied before the extraction takes place.

5 Conclusions and Future Work

A reordering technique has been implemented, taking advantage of the information that monotonized corpora provide. By doing so, better quality phrases can be extracted and the overall performance of the system improves significantly in the case of a pair of languages with heavy reordering complications.

This technique has been applied to translate a semi-synthetic corpus which deals with the task of Spanish-Basque translation, and

the results obtained prove to be statistically significant and very promising, especially taking into account that Basque is an extremely complex language that poses many problems for state-of-the-art systems.

Moreover, the technique we propose in this paper is learnt automatically, without any need for linguistic annotation or manually specified syntactic reordering rules, which means that our technique can be applied to any language pair without any additional development effort.

Both reordered corpora and reordering techniques seem to have a very important potential for the case of very different language pairs, which are the most difficult translation tasks.

As future work, we are planning on obtaining results with other non-synthetic, richer and more complex corpora, such as other Spanish-Basque corpora or corpora involving language pairs such as Arabic, Chinese or Japanese. In addition, we are planning on developing more specific reordering models, which will be more suitable for this task than the exponential model described here, as well as investigating integrated approaches to solving the reordering problem.

Acknowledgements

This work has been partially supported by the EC (FEDER) and the Spanish MEC under grant TIN2006-15694-CO2-01 and by the MEC scholarship AP2005-4023.

References

A.L. Berger, P.F. Brown, S.A. Della Pietra, V.J. Della Pietra, J.R. Gillet, A.S. Kehler, and R.L. Mercer. 1996. Language translation apparatus and method of using context-based translation models. United States Patent 5510981.

P.F. Brown, S.A. Della Pietra, V.J. Della Pietra, and R.L. Mercer. 1993. The mathematics of machine translation. Computational Linguistics, 19:263–311.

F. Casacuberta and E. Vidal. 2004. Machine translation with inferred stochastic finite-state transducers. Computational Linguistics.

S. Kanthak, D. Vilar, E. Matusov, R. Zens, and H. Ney. 2005. Novel reordering approaches in phrase-based statistical machine translation. In Proceedings of the ACL Workshop on Building and Using Parallel Texts, pages 167–174, Ann Arbor, Michigan.

K. Knight. 1999. Decoding complexity in word-replacement translation models. Computational Linguistics, 25(4):607–615.

P. Koehn, F.J. Och, and D. Marcu. 2003. Statistical phrase-based translation. In Proceedings of the 2003 Conference of the NAACL on Human Language Technology, volume 1, pages 48–54, Edmonton, Canada.

S. Kumar and W. Byrne. 2003. A weighted finite state transducer implementation of the alignment template model for statistical machine translation. In Proceedings of the 2003 Conference of the NAACL on Human Language Technology, volume 1, pages 63–70, Edmonton, Canada.

S. Kumar and W. Byrne. 2005. Local phrase reordering models for statistical machine translation. In Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 161–168, Vancouver, Canada.

E. Matusov, S. Kanthak, and H. Ney. 2005. Efficient statistical machine translation with constrained reordering. In Proceedings of EAMT 2005 (10th Annual Conference of the European Association for Machine Translation), pages 181–188, Budapest, Hungary.

F.J. Och and H. Ney. 2000. Improved statistical alignment models. In ACL 2000, pages 440–447, Hong Kong, China, October.

F.J. Och and H. Ney. 2004. The alignment template approach to statistical machine translation. Computational Linguistics, 30(4):417–449.

D. Ortiz, I. García-Varea, and F. Casacuberta. 2005. Thot: a toolkit to train phrase-based statistical translation models. In Tenth Machine Translation Summit, pages 141–148, Phuket, Thailand, September.

K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. 2001. BLEU: a method for automatic evaluation of machine translation. Technical Report RC22176 (W0109-022), IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY.

A. Pérez, F. Casacuberta, M.I. Torres, and V. Guijarrubia. 2005. Finite-state transducers based on k-TSS grammars for speech translation. In Proceedings of Finite-State Methods and Natural Language Processing (FSMNLP 2005), pages 270–272, Helsinki, Finland, September.

G. Sanchis and F. Casacuberta. 2006. N-best reordering in statistical machine translation. In IV Jornadas en Tecnología del Habla, pages 99–104, Zaragoza, Spain, November.

E. Vidal. 1997. Finite-state speech-to-speech translation. In Proceedings of ICASSP-97, volume 1, pages 111–114, Munich, Germany.

J.M. Vilar, E. Vidal, and J.C. Amengual. 1996. Learning extended finite-state models for language translation. In Proceedings of the Extended Finite State Models Workshop (of ECAI'96), pages 92–96, Budapest, August.

R. Zens, H. Ney, T. Watanabe, and E. Sumita. 2004. Reordering constraints for phrase-based statistical machine translation. In COLING '04: The 20th International Conference on Computational Linguistics, pages 205–211, Geneva, Switzerland.

