An Integrated Architecture for Example-Based Machine Translation


                               Alexander Franz, Keiko Horiguchi, Lei Duan,
                             Doris Ecker, Eugene Koontz, and Kazami Uchida
                          Spoken Language Technology, Sony US Research Labs
                                                   3300 Zanker Road
                                             San Jose, CA 95134, USA
                          {amf, keiko, lei, doris, ekoontz, kuchida}@sit.sel.so

Abstract

This paper describes a machine translation architecture that integrates the use of examples for flexible, idiomatic translations with the use of linguistic rules for broad coverage and grammatical accuracy. We have implemented a prototype for English-to-Japanese translation, and our evaluation shows that the system has good translation quality, and only requires reasonable computational resources.

1 Introduction

Machine translation by analogy to pairs of corresponding expressions in the source and target languages, or "example-based translation", was first proposed by (Nagao 1984). Recent work in the example-based framework includes memory-based translation (Sato & Nagao 1990), similarity-driven translation (Watanabe 1992), transfer-driven machine translation (Furuse & Iida 1996), and pattern-based machine translation (Watanabe & Takeda 1998).

The example-based approach promises easy translation knowledge acquisition, more flexible transfer than brittle rule-based approaches, and idiomatic translations. At the same time, the use of linguistic rules offers a number of important benefits. Detailed linguistic analysis can allow an example-based machine translation system to handle a wide variety of input, since rules can be used to factor out all linguistic variations that do not influence the example-based transfer. Rule-based language generation from detailed linguistic representations can lead to higher grammatical output quality. Finally, a modular system architecture that uses domain-independent linguistic regularities in separate linguistic modules allows extending the system to much broader domains. The HARMONY architecture for hybrid analogical and rule-based machine translation of naturally-occurring colloquial language combines the advantages of both these approaches.

2 The Travel Domain

Our prototype implementation of the HARMONY architecture was designed to cover the "travel domain". This is composed of words, phrases, expressions, and sentences related to international travel, similar to what is covered by typical travel phrase books.

Two principles guided our detailed definition of the translation domain. First, the translation domain should not be limited to a narrow sub-domain, such as appointment scheduling or hotel reservations. Second, the expressions considered in the domain should reflect the fact that people quickly adapt to limitations in human-machine or machine-mediated communication by simplifying the input. For example, (Sugaya et al. 1999) found that the average length of actual human utterances in a hotel reservation task using speech translation was only 6.1 words, much shorter than some of the data that has been used in previous work on speech translation.

The current vocabulary of 7,500 words is divided into a group of general words, a number of extensible word groups (such as names of food items or diseases), and a number of area-specific word groups (such as names of cities or tourist destinations). The travel domain is divided into eight "situations": a general situation (including everyday conversation); transportation; accommodation; sightseeing;

shopping; wining, dining, and nightlife; banking and postal; and doctor and pharmacy.

We created a corpus for this domain, and divided it into a development set of 7,000 expressions, and a separate, unseen test set of 5,000 expressions. The development set is used for creation and refinement of the translation knowledge sources, and the test set is only used for evaluations. (Each evaluation uses a new, random 500-expression sample from the 5,000-expression test set.)

The corpus was balanced to illustrate the widest possible variety of types of words, phrases, syntactic structures, semantic patterns, and pragmatic functions. The average length of the expressions in the corpus is 6.5 words. Some examples from the development corpus are shown below. Even though this domain might seem rather limited, it still contains many challenges for machine translation.

•     Can I have your last name, please?
•     Is this the bus for Shinagawa station?
•     I would like to make a reservation for two people for eight nights.
•     Can you tell us where we can see some Buddhist temples?
•     Most supermarkets sell liquor.
•     Can you recommend a good Chinese restaurant in this area?
•     I'd like to change 500 dollars in traveller's checks into yen.
•     Are there any English-speaking doctors at the hospital?

3 NLP Infrastructure

The prototype implementation is constructed out of components that are based on a powerful infrastructure for natural language processing and language engineering. The three main aspects of this infrastructure are the Grammar Programming Language (GPL), the GPL compiler, and the GPL run-time environment.

3.1 The Grammar Programming Language

The Grammar Programming Language (GPL) is an imperative programming language for feature-structure-based rewrite grammars. GPL is a formalism that allows the direct expression of linguistic algorithms for parsing, transfer, and generation. Some ideas in GPL can be traced back to Tomita's pseudo-unification formalism (Tomita 1988), and to Lexical-Functional Grammar (Dalrymple et al. 1995). GPL includes variables, simple and complex tests, and various manipulation operators. GPL also includes control flow statements including if-then-else, switch, iteration over sub-feature-structures, and other features. An example of a simplified GPL rule for English generation is shown in Figure 1.

      WH_SENT --> NP YN_SENT {
            !exist[$m VP SUBJ WH];
            local-variable WH_VP = [$m VP];
            local-variable WH_PHRASE;
            $WH_PHRASE = find-subfstruct in $WH_VP
                         where (?exist[$x WH]);
            $d1 = [$WH_PHRASE SLOT-VALUE];
            [$WH_PHRASE SLOT-VALUE TRACE] = '+';
            $d2 = $m;
      }

      Figure 1 Example of a GPL Generation Rule

3.2 The GPL Compiler

GPL grammars are compiled into C code by the GPL compiler. The GPL compiler was created using the Unix tools lex and yacc (Levine et al. 1990). For each rewrite rule, the GPL compiler creates a main action function, which carries out most of the tests and manipulations specified by the GPL statements.

The GPL compiler handles disjunctive feature structures in an efficient manner by keeping track of sub-feature-structure references within each GPL rule, and by generating an expansion function that is called once before the action function. The compiler also tracks variable references, and generates and tracks separate test functions for nested test expressions.

3.3 The GPL Run-time Environment

The result of compiling a GPL grammar is an encapsulated object that can be accessed via a public interface function. This interface function serves as the link between the compiled GPL grammars and the various language-independent and domain-independent software engines for parsing, transfer, generation, and others. This is illustrated in Figure 2.
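
As a rough illustration of the kinds of operations that the rule in Figure 1 expresses, the following Python sketch models feature structures as nested dictionaries. It is not GPL, and it does not claim to reproduce the semantics of the GPL compiler or run-time environment. The names get_path, exists, sub_structures, and find_subfstruct, as well as the sample structure m, are hypothetical and were chosen only to mirror the slot paths, the existence tests, and the find-subfstruct statement that appear in the figure.

```python
from typing import Any, Callable, Iterator, Optional

FS = dict  # a feature structure: slot name -> atomic value or nested FS


def get_path(fs: FS, *path: str) -> Optional[Any]:
    """Follow a slot path such as ("VP", "SUBJ", "WH"); None if absent."""
    node: Any = fs
    for slot in path:
        if not isinstance(node, dict) or slot not in node:
            return None
        node = node[slot]
    return node


def exists(fs: FS, *path: str) -> bool:
    """Hypothetical analogue of an existence test on a slot path."""
    return get_path(fs, *path) is not None


def sub_structures(fs: FS) -> Iterator[FS]:
    """Yield every nested feature structure below fs, depth-first."""
    for value in fs.values():
        if isinstance(value, dict):
            yield value
            yield from sub_structures(value)


def find_subfstruct(fs: FS, test: Callable[[FS], bool]) -> Optional[FS]:
    """Return the first nested structure that satisfies test, if any."""
    return next((sub for sub in sub_structures(fs) if test(sub)), None)


# Hypothetical sentence structure, invented for this illustration only.
m = {
    "VP": {
        "ROOT": "see",
        "SUBJ": {"ROOT": "we"},
        "LOC": {"WH": "+", "SLOT-VALUE": {"ROOT": "where"}},
    }
}

# Locate the sub-structure of the VP that carries a WH slot.
wh_phrase = find_subfstruct(m["VP"], lambda x: exists(x, "WH"))
```

Because find_subfstruct returns a reference to the nested dictionary rather than a copy, an assignment such as wh_phrase["SLOT-VALUE"]["TRACE"] = "+" modifies the structure m in place, which is loosely analogous to the slot assignment in the figure.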

[Figure 2 diagram. Recoverable labels: GPL Grammar; GPL Compiler; Compiled GPL Grammar (Interface Function, Action Functions, Expansion Functions, Test Functions); Software Engine (parsers, transfers, generators, ...); Feature Structure Library (representation, testing code).]

           Figure 2 GPL Run-time Environment

The compiled GPL grammars use the feature structure library, which provides services for efficiently representing, testing, manipulating, and managing memory for feature structures. A special-purpose memory manager maintains separate stacks of memory pages for each object size. This scheme allows garbage collection that is so fast that it can be performed after every attempted GPL rule execution. In our experiments with Japanese and English parsing, we found that per-rule garbage collection reduced the overall read/write memory requirements by as much as a factor of four to six.

4 Source Language Analysis

Translation is divided into the steps of analysis, transfer, and generation. Source-language analysis is illustrated in Figure 3.

English analysis begins with tokenization and morphological analysis, which creates a lattice that contains lexical feature structures. During multi-word matching, expressions from the multi-word lexicon (such as White House or take on) are detected in the word lattice, and new arcs with the appropriate lexical feature structures are added to the lattice.

[Figure 3 diagram. Recoverable labels: English Input; Lattice with Single-word Lexical Feature Structures; Multi-word Matching; Lattice with Single and Multi-word Lexical Feature Structures; Lexical Ambiguity Reduction; Reduced Lexical Feature Structure Lattice; Thesaurus; Parsing; Compiled Parsing Grammar; Sentential Feature Structure.]

           Figure 3 Architecture of the Analysis Module

Lexical ambiguity reduction reduces the number of arcs in the word lattice. This module carries out part-of-speech tagging over the lattice, and reduces the lattice to those lexical feature structures that lie on the n best paths, where n is the number of paths that gives the best speed/accuracy trade-off (currently two). This calculation is based on the usual lexical and contextual bigram probabilities that were estimated from a training corpus, but it also takes into account manual costs that can be added to lexicon entries, or to individual part-of-speech bigrams.

The resulting reduced lattice with lexical single-word and multi-word feature structures is parsed using the GLR parsing algorithm extended to lattice input (Tomita 1986). The English parsing grammar consists of 540 GPL rules. The output is a sentential feature structure that represents the input to the transfer component.
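
The lexical ambiguity reduction step described in this section can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the system's implementation: the Arc type, the reduce_lattice function, the start marker "<s>", the default bigram cost, and all numbers are hypothetical. Costs are assumed to be non-negative (for example, negative log probabilities with any manual costs added), and the n best paths are enumerated with a simple uniform-cost search, which is one possible technique and not necessarily the one used in the prototype.

```python
import heapq
from typing import NamedTuple


class Arc(NamedTuple):
    start: int    # lattice node where the arc begins
    end: int      # lattice node where the arc ends
    word: str
    tag: str      # part-of-speech tag of this lexical reading
    cost: float   # lexical cost, e.g. -log P(word | tag) plus any manual cost


def reduce_lattice(arcs, final, bigram_cost, n_best=2, default=10.0):
    """Keep only the arcs on the n_best cheapest paths from node 0 to final."""
    outgoing = {}
    for i, arc in enumerate(arcs):
        outgoing.setdefault(arc.start, []).append(i)

    # Uniform-cost search over partial paths. With non-negative costs,
    # complete paths are popped in order of increasing total cost.
    heap = [(0.0, 0, "<s>", ())]  # (cost so far, node, previous tag, arc indices)
    kept, found = set(), 0
    while heap and found < n_best:
        cost, node, prev_tag, path = heapq.heappop(heap)
        if node == final:
            kept.update(path)
            found += 1
            continue
        for i in outgoing.get(node, []):
            arc = arcs[i]
            step = arc.cost + bigram_cost.get((prev_tag, arc.tag), default)
            heapq.heappush(heap, (cost + step, arc.end, arc.tag, path + (i,)))
    return [arc for i, arc in enumerate(arcs) if i in kept]


# Hypothetical lattice for "I can go" with ambiguous readings.
arcs = [
    Arc(0, 1, "I", "PRON", 0.1),
    Arc(1, 2, "can", "AUX", 0.5),
    Arc(1, 2, "can", "NOUN", 2.0),
    Arc(1, 2, "can", "VERB", 3.0),
    Arc(2, 3, "go", "VERB", 0.3),
    Arc(2, 3, "go", "NOUN", 3.0),
]
# Hypothetical contextual (tag bigram) costs; unseen bigrams get the default.
bigram_cost = {
    ("<s>", "PRON"): 0.2, ("PRON", "AUX"): 0.3, ("AUX", "VERB"): 0.2,
    ("PRON", "VERB"): 1.0, ("NOUN", "VERB"): 1.0, ("PRON", "NOUN"): 3.0,
    ("VERB", "VERB"): 2.5,
}
reduced = reduce_lattice(arcs, final=3, bigram_cost=bigram_cost, n_best=2)
```

With n_best=2, the sketch keeps the arcs of the two cheapest tag sequences and discards the remaining readings, so the parser would see a smaller lattice. The search enumerates partial paths explicitly, which is acceptable for short utterances but would need a more careful n-best algorithm for large lattices.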

5 Transfer

Transfer from the source-language sentential feature structure to the target-language sentential feature structure is accomplished with a hybrid rule-based and example-based method. This is illustrated in Figure 4.

[Figure 4 diagram. Recoverable labels: English Sentential Feature Structure; Compiled Transfer Grammar; Example Database; Linguistic Transfer; Example Matching.]

           Figure 4 Architecture of the Transfer Module

The input feature structure is passed to the linguistic transfer procedure. This consists of a rule-rewriting software engine that executes the compiled English-to-Japanese transfer grammar. The transfer grammar consists of 140 GPL rules, and its job is to specify linguistic constraints on examples, combine multiple examples, transfer information that is beyond the scope of the example database, and perform various other transformations. The overall effect is to broaden the linguistic coverage, and to raise the grammatical accuracy far beyond the level of a traditional example-based transfer procedure.

The linguistic transfer procedure operates on the input feature structure in a recursive manner, and it invokes the example matching procedure to find the best translation example for various parts of the input. The example matching procedure retrieves the best translation examples from the example database, which contains 14,000 example pairs ranging from individual words to entire sentences. In an off-line step, the example pairs are parsed, disambiguated, and indexed for corresponding constituents using a Treebanking tool.

At each invocation of the example matching procedure, linguistic constraints from the transfer grammar are used to limit the search space to appropriate examples. In an off-line step, these constraints are pre-compiled into a complex index that allows a preliminary fast match. Examples that survive the fast match are matched and aligned with the input feature structure (or sub-feature-structure, during recursive invocations) using the thesaurus to calculate word similarity, and using various other constraints and costs for inserting, deleting, or altering slots and features. Rather than rely on the exact distance in the thesaurus to calculate lexical similarity, we use a scheme that is based on the information content of thesaurus nodes, similar to (Resnik 1995).

6 Target-language Generation

The Japanese target-language feature structure forms the input to the generation module, which is summarized in Figure 5 below. This module also consists of a rule-rewriting software engine, executing the compiled GPL Japanese generation grammar, which consists of 200 GPL rules. The generator uses the Japanese lexicon to create the Japanese target-language expression.

[Figure 5 diagram. Recoverable labels: Feature Structure; Compiled Grammar; Generation Module.]

           Figure 5 Architecture of the Generation Module

7 Evaluation and Conclusions

We evaluated the translation system using a random 500-expression sample from the unseen test set (see Section 2 above). The translations were manually assigned to one of the following categories of translation quality:

Failure. Complete translation failure, due to lack of coverage of a rule-based component.

Wrong. A translation that is completely wrong, or that has major errors in an important part, such as in the main clause.
Major Problem. A translation that has a missing, extra, or incorrect constituent, such as a subject, object, or adjectival/prepositional predicate.
Minor Problem. A translation that has a missing, extra, or incorrect minor part, such as an intensifier, tense, aspect, temporal or locative adjunct, adverb, adjective or other prenominal modifier, prepositional phrase, verb conjugation form, adjective form, or required word or constituent order.
Stylistic Problem. Stylistic problems include awkward but tolerable word order, incorrect Japanese particles, incorrect idioms, and similar.
Flawless. A translation that does not exhibit any of the above problems is considered flawless.

The results of the evaluation are shown in Table 1 below. Overall, 84% of the translations (the Flawless, Stylistic Problem, Minor Problem, and Acceptable with OOV rows combined) convey the meaning in an acceptable manner.

We also evaluated the computational resource requirements of the system. On a Pentium III running at 500 MHz, the average translation speed was 0.44 seconds. The memory requirements are summarized in Table 2 below.

 Flawless                                     60%
 Stylistic Problem                             9%
 Minor Problem                                14%
 Acceptable with OOV                           1%
 Major Problem                                 9%
 Wrong Translation                             5%
 Translation Failure                           3%
              Table 1 Translation Quality

 Read-only Memory for Code and Data           6MB
 Read-only Memory for Dictionary,
 Examples, Fast Match Index, etc.            23MB
 Read/Write Memory for Feature Structures    14MB
 Read/Write Memory for Software Engines       4MB
             Table 2 Memory Requirements

Our plans for further work include extending the size of the input vocabulary, and developing mechanisms for closer integration with speech recognition and speech synthesis components for speech-to-speech translation. We are also working on the Japanese-to-English translation direction, and we plan to report results on this in the future.

Acknowledgements

Our thanks go to Robert Bowen, Benjamin Hartwell, Chigusa Inaba, Kaori Shibatani, Hirono Stonelake, and Kazue Watanabe for their language engineering efforts, and to Edward tto for user interface and application development.

References

Dalrymple, M., R.M. Kaplan, J.T. Maxwell III, and A. Zaenen, eds. (1995) Formal Issues in Lexical-Functional Grammar. CSLI Lecture Notes 47, Stanford, CA.
Furuse, O. and H. Iida (1996) "Incremental translation utilizing constituent-boundary patterns", in Proceedings of COLING-96, pages 412-417.
Levine, J.R., T. Mason, and D. Brown (1990) lex & yacc (Second Edition), O'Reilly and Associates, Sebastopol, CA.
Nagao, M. (1984) "A framework of a Machine Translation between Japanese and English by analogy principle", in Artificial and Human Intelligence, A. Elithorn and R. Banerji (eds.), North Holland, pages 173-180.
Resnik, P. (1995) "Using information content to evaluate semantic similarity in a taxonomy", in Proceedings of IJCAI-95.
Sato, S. and M. Nagao (1990) "Toward memory-based translation", in Proceedings of COLING-90, vol. 3, Helsinki, Finland, pages 247-252.
Sugaya, F., T. Takezawa, A. Yokoo, and S. Yamamoto (1999) "End-to-end evaluation in ATR-Matrix", in Proceedings of Eurospeech-99, Budapest, Hungary, pages 2431-2434.
Tomita, M. (1988) The Generalized LR Parser/Compiler (Version 8.1): User's Guide. Technical Memorandum CMU-CMT-88-Memo, Center for Machine Translation, Carnegie Mellon University.
Tomita, M. (1986) "An efficient word lattice parsing algorithm for continuous speech recognition", in Proceedings of ICASSP-86, Tokyo, Japan.
Watanabe, H. (1992) "A similarity-driven transfer system", in Proceedings of COLING-92, Nantes, France, pages 770-776.
Watanabe, H. and K. Takeda (1998) "A pattern-based Machine Translation system extended by example-based processing", in Proceedings of ACL-COLING-98, pages 1369-1373.

