Making machine translation work

By Stefan, Simon, Lisa, Nina and
   Introduction
   Human versus Machine Translation
   Methods in Machine Translation
   Example-Based Machine Translation
   Group work: HT vs. MT
   Try to translate the following proverb:

→ “Wer A sagt, muss auch B sagen.”

   HT: use your language knowledge
   MT: Use Babel Fish
   Possible solution:

     HT: In for a penny, in for a pound.
     MT: Who says A, also B must say.

    In how far is such a translation
Human and Machine
       HT and MT differ in two main points:
         1. Mode of process
         2. Mode of product

       based on different specifications and
        theoretical positions
       both modes are used for comparison
                    Mode of process
        By comparing the modes of process you:
    1.     gain knowledge about the respective stages
           and intersections
    2.     can make decisions about choices of alternative
    3.     … and about new designs of translation
                    Mode of product
        By comparing the modes of product you:
    1.     check the appropriateness of the translation
    2.     figure out the most efficient method

→ the MT product must be usable in the same
   way as the human product
→ secure a basis of equality
Another criterion for comparison:

  -   text input must be a constant so that the products
      are comparable
  -   → help to formulate guidelines for HT or MT texts
Human and Machine Translation
- translation processes -
           Translation as problem solving
       Four major steps:

         (a) SL linguistic de-composition
         (b) Problem identification at the SL linguistic
          and cognitive level
         (c) Problem solution at the cognitive and TL
          linguistic level (knowledge base)
         (d) TL linguistic re-composition
   Characteristics of HT:
       Knowledge base is flexible
       Problems can be transferred
       Intuition/experience of the translator
       Knowledge base expands constantly
       MT model of problem solving
   Characteristics of MT:
       Knowledge base is relatively limited and rigid
       Has fixed and pre-established connections
       Limited possibility of transferring problems
       Less experience at the semantic and pragmatic levels
       Lacks essential world knowledge
            Major levels of comparison

    Human modules           Machine modules

    Comprehension           Analysis
    Matching                Transfer
    Writing                 Generation/Synthesis
   Comprehension vs. Analysis

             Human                    Machine

    adapts to innovations     works retrospectively
    high amount of            limited amount of
    interpretative capacity   interpretative capacity

   Matching vs. Transfer

            Human                    Machine

    compensation of items     equivalents cannot be
    which cannot be           pre-planned or
    matched in conventional   incorporated
   Writing vs. Generation/Synthesis

          Human                       Machine

    can respond to syntactic    works prospectively
    or lexical innovations or
    can create equivalences
Human and Machine Translation
- translation products -
   Products can be compared with regard to
       the nature of the output language
       the produced text
             The nature of MT language
 MT language is constructed and artificial (the
  computer can't produce sentences on its own)
 It corresponds to the designer's perception of SL
  and TL
 It has no creative potential (it is not as flexible and
  multifunctional as HT language)
 It excludes emotive, aesthetic or other meanings
→ each MT system produces its own language (e.g.
  Weidner English or Atlas English)
   MT systems are one-way converters (they only
    recognize words that belong to the system)
   MT language often needs post-editing
     Flexibility vs. rigidity in text types
 MT language is conceived on the sentence level

→ no distinctions on the text type possible
→ MT systems can only handle text types they
  have been programmed for
→ unknown text types cause unacceptable
           Challenge for MT language
   construction of user-friendly artificial
   optimum transfer of information from SL/NL
    to AL
   to convince users that AL is equally efficient
    as NL
The Pragmatic Circumstances
of Automation in Translation
   Methods of MT
   Linguistic approach
   Semantic approach
   Users of MT systems
   Some MT systems
   Functional types of MT
Methods of MT
Linguistic approach

        three strategies:
    1.     Analysis of the source text
    2.     Mode of transfer
    3.     Generation of target text
Linguistic approach
Three main subtypes
 •   a) Language-pair-specific “direct” systems
        Earliest type of system
        Reflects the design philosophy of the 1950s and
        Exploited direct correspondences between two
 b) Interlingual systems
      SL text transformed into a semantic and syntactic
       representation (equivalent of the transfer phase) which
       is common to at least two languages
      Text in another language can then be generated from
       this representation
      “transform from a source language A into a target
       language B, using rules expressed in a third language
       C” (Cherry, 1966)
      Two phases: 1. Analysing in terms of the interlingual
       representation 2. TL sentences are produced from this
 c) Transfer systems
     Analysis phase: SL text is processed to the depth
      required by the rules of its grammar
     Transfer phase: the result of the analysis is transformed,
      based on the target language, into a representation for
      the generation of a target language text
     Generation phase: the transfer representation is then
      transformed into a text in the TL without any further
      back-reference to the results of analysis.
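
The three phases of a transfer system can be sketched as a toy pipeline. This is only an illustration under strong assumptions: the three-word English-French lexicon, the part-of-speech tags and the single reordering rule (adjective before noun becomes noun before adjective) are invented for this sketch and are not taken from any of the systems discussed here.

```python
# Toy lexicon (hypothetical data): SL word -> (part of speech, TL word).
LEXICON = {
    "the": ("DET", "la"),
    "red": ("ADJ", "rouge"),
    "car": ("NOUN", "voiture"),
}


def analyse(sentence):
    """Analysis: split the SL text and tag each word with its part of speech."""
    return [(word, LEXICON[word][0]) for word in sentence.lower().split()]


def transfer(analysed):
    """Transfer: swap in TL words and reorder ADJ NOUN into NOUN ADJ."""
    out = [(LEXICON[word][1], pos) for word, pos in analysed]
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just reordered
        else:
            i += 1
    return out


def generate(transferred):
    """Generation: build the TL string from the transfer representation only."""
    return " ".join(word for word, _ in transferred)


def translate(sentence):
    return generate(transfer(analyse(sentence)))


print(translate("the red car"))
```

Note that generate() never looks back at the output of analyse(), which mirrors the point that generation works without back-reference to the results of analysis.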
The semantic approach
   Semantic processes only operate after the
    identification of syntactic structures.
   The chief component is semantic parsing, i.e.
    analysis of semantic features instead of, or in
    addition to, grammatical categories.
   The system does “understand” the SL text,
    before translation begins.
Users of MT systems
The translator as producer
   Machine to provide cheaper, faster and a
    larger volume of production, without
    significant loss of quality
   Clearly seen as an industrial product
The writer as translation producer
   Writers gain a certain degree of
    independence from translators, who
    exclusively determined form and quality of
    the end product
   Writers may want to develop bi- or
    multilingual texts directly rather than write a
    text for subsequent translation
Readers of translation
 to be able to by-pass the time-consuming and
  costly human translation circuit, and instead
  obtain instant translations produced by an MT
The information supplier
 possibilities of providing translated versions
  automatically as part of the general
  information supply, e.g. multilingual versions
  of electronic journals or databases
Some MT systems
       Japanese system, based on structural transfer, for
        specialised technical texts
   CULT
       Interactive system, for on-line translation of texts
        in the field of mathematics from Chinese into
       The Canadian Federal Government system for the
        production of bilingual French-English weather reports
       Oldest commercially available MT system, of un-edited
        output, for post-editing use, for restricted-language
        document input and for general use in the French Minitel
       Largest number of language pairs, all EC languages
Function types of machine
        Two possible modes of viewing automatic
    1.     See the computer as an aid to human
    2.     Accept that the computer provides a translation
           service sui generis which is not comparable to
           the human variety
MT as human translation aid
   MT as aids to translators
   Intended to accelerate the human process of
   Output is artificial to the extent that it does
    not conform to certain expectations
   End user still wants a human product, but will
    accept MT as long as it is either cheaper or
    produced more quickly
   Systems are greatly improved by
    concentrating on particular text types and
    ranges of vocabulary
   Systems offer subject-specific modules of
    vocabulary and phraseology that can be
    switched into the process
Machine assisted human translation
   Check text against an automated dictionary
   Ignores common words and function words
   Looks up translation equivalents for special
    vocabulary items
   Speed up the process
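
A minimal sketch of such an automated dictionary check, assuming a tiny hand-made English-German term list and stop-word list (both hypothetical, for illustration only):

```python
# Common words and function words that the check ignores (toy list).
STOPWORDS = {"the", "a", "an", "of", "is", "and", "to", "in"}

# Special vocabulary with translation equivalents (hypothetical entries).
TERM_DICT = {
    "gearbox": "Getriebe",
    "crankshaft": "Kurbelwelle",
}


def look_up_terms(text):
    """Return the special-vocabulary items found in text with their equivalents."""
    found = {}
    for raw in text.lower().split():
        word = raw.strip(".,;:!?")
        if word in STOPWORDS:
            continue  # common and function words are skipped
        if word in TERM_DICT:
            found[word] = TERM_DICT[word]
    return found


print(look_up_terms("The gearbox is connected to the crankshaft."))
```

The translator still writes the translation; the machine only saves the dictionary look-ups.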
   Text is pre-translated automatically
   Output not adequate for direct use or post-
   Offers words and expressions
   Translator reduce the time for dictionary look-
   Save the time of actually typing the found
    translation equivalents
   MT produces artificial language (AL2)
   Post-editing effort must be less than that
    required for a full human translation
Three-stage machine
   Text is prepared for MT by human pre-editing
   System produces output in AL2 which post-
    editors can convert into a NL2 document
   Final document is not distinguishable from a
    human translation
Machine assisted translation
   These models of MT hide the true nature of
   Rather an aid than an alternative to human
   Application is limited
   simplest and the most difficult types of MT
    systems to design
   Examples: ALPS, ATLAS, WEIDNER,
Translation by reference to
existing models
   System scans existing documents by text-
    deconstruction method of text comparison
   Identifies similar passages and offers these to the
    translator as models for the new task
MT as text-type specific
independent systems
        'automatic' in the sense that human
         intervention is not required between input
         and output
        Is used
    1.    Without the intervention of a human translator
    2.    As a text-production system for previously edited
        Three forms of output:
    1.    Raw translation in AL2 suitable for post-editing
          and possible conversion to NL2
    2.    A final AL2 version which can be used almost in
          same way as natural language text, has been
    3.    Unedited final translation, i.e. an artificial
          language, which is acceptable for readers
Reader-oriented MT
   Readers accept 'difficult-to-read' texts if they
    are cheap and above all fast
   Output is machine-produced and therefore by
    definition an artificial product which may be
    easier or more difficult to understand than a
    NL text
   Not comparable to a human translation
   L2 readers receive a text in L1
   Submit the text to MT in full knowledge that
    the output is a machine-translated text
Writer-oriented MT
   Writers know better than anybody else what
    they want to say
   Translators have to interpret what writers
    have said
   Machine asks questions about elements
    which it cannot analyse
Writer oriented editing of pre-
translated text
   System offers menus of existing SL text
    segments which are pre-translated
   E.g. business letters: choice of type of letter,
    separate menus within the types
   “Man does not translate a simple sentence by doing
    deep linguistic analysis, rather, man does
    translation, first, by properly decomposing an input
    sentence into certain fragmental phrases, then by
    translating these phrases into other language phrases,
    and finally by properly composing these fragmental
    translations into one long sentence. The translation of
    each fragmental phrase will be done by the analogy
    translation principle with proper examples as its
    reference.” (Nagao)
Model EBMT
   EBMT does not presuppose an analytic
    it is an analog translation system

   Founded on:
   1) translation by decomposing
   2) translation of phrases
   3) composing fragments into one long sentence
   EBMT consists of a bilingual corpus:

   1) a fixed corpus (“how much is…”) of
   Example: How much is the bread? – Wie teuer ist das Brot?
   How much is the car? – Wie teuer ist das Auto?

   question varies by just one element
    (minimal pair: the bread/the car)
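
The minimal pair above can be read as one fixed frame with one variable slot. The sketch below assumes exactly that reading; the frame strings and the slot table are built from the two example pairs on this slide, and the function name is made up for illustration.

```python
# Fixed part of the examples, with {x} marking the element that varies.
FRAME_SL = "How much is {x}?"
FRAME_TL = "Wie teuer ist {x}?"

# Slot fillers observed in the bilingual examples.
SLOT_FILLERS = {
    "the bread": "das Brot",
    "the car": "das Auto",
}


def translate_question(question):
    """Translate a question that fits the fixed frame, or return None."""
    prefix, suffix = FRAME_SL.split("{x}")
    if not (question.startswith(prefix) and question.endswith(suffix)):
        return None  # input does not match the stored frame
    filler = question[len(prefix):len(question) - len(suffix)]
    target = SLOT_FILLERS.get(filler)
    if target is None:
        return None  # no example covers this slot filler
    return FRAME_TL.format(x=target)


print(translate_question("How much is the car?"))
```

An input whose slot filler has never been seen (for example "the house") returns None here, which is where a real system would need more examples or another engine.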
   Often linked with translation memory (TM)
   “…it must in fact be possible to produce a
    programme, which would enable the word
    processor to 'remember' whether any part of
    a new text typed into it had already been

    T9-typing within mobile phones
   Until the 80s → rule-based translations

   Research dominated by corpus-based approaches:

   1) statistical machine translation
   2) EBMT

   first suggested by Nagao Makoto in 1984
   soon attracted the attention of scientists in the field of natural
    language processing.
   First task in an EBMT system
   Searched for: a stored word or phrase that closely
    matches the source language input

    Example: Where is the plate
    Correct translation: Wo ist der Teller
    and not: Wo ist die Platte
   most appropriate word is inserted
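
The matching step can be sketched with Python's standard difflib module. This is an assumption made for illustration: real EBMT systems use their own similarity measures, and the small example store below is invented apart from the plate sentence.

```python
import difflib

# Toy example base: SL sentence -> TL sentence (hypothetical entries
# except for the "plate" pair taken from the slide).
EXAMPLES = {
    "Where is the plate": "Wo ist der Teller",
    "Where is the station": "Wo ist der Bahnhof",
    "How much is the bread": "Wie teuer ist das Brot",
}


def closest_example(source, cutoff=0.6):
    """Return (matched example, its translation) or None if nothing is close."""
    matches = difflib.get_close_matches(source, list(EXAMPLES), n=1, cutoff=cutoff)
    if not matches:
        return None
    best = matches[0]
    return best, EXAMPLES[best]


print(closest_example("Where is the plate?"))
```

Because the stored pair already carries "Teller", the retrieved translation avoids the wrong sense "Platte" without any explicit disambiguation rule.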
   Long passages → low probability of complete

   Short passages → probability of ambiguity

Sentences are not translated completely but
 are divided into smaller sections
 → often incoherent translation results
 Problem of the size of the example database
 Some of the systems are more experimental
  than others
 Adding examples improves translation
 Beyond a certain number of examples, adding
  more brings no further improvement
Problem: suitability of
   Some examples have identical translation

   Same phrase may have two different translations
    caused by inconsistency

   Too great a variety of examples may cause problems
    with the choice of the exact word
   Ambiguity
   Can lead to overgeneralization
Problem: storage of examples
   Normally words are stored with no further
   To avoid ambiguity and to limit the choice:

Expansion of examples by adding contextual
Context is taken into account to help find
 the right word
Suitable translation problems
   EBMT best suited for sublanguage translation

   EBMT is often more suitable than MT

   Antidote to: structure-preserving translation
    as first choice
 Most difficult step in EBMT process:
 Appropriate fragments have to be extracted
  from the text
 Problem: 1) find the portions of the translation
  that correspond to the matched words
             2) find the correct recombination,
  which is appropriate and grammatical
Boundary Friction: Problem of
Example: I ate the apple
 Translation: Ich aß der Apfel (ungrammatical; correct: Ich aß den Apfel)
 Example II: The apple is on the table
 Translation: Der Apfel liegt auf dem Tisch.

To solve the problem, the translation system
  would have to contain a grammar of the
  target language
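
Boundary friction can be reproduced in a few lines. The sketch assumes two stored fragments, with "the apple" taken from the subject position of the table sentence, and a one-entry case table standing in for target-language grammar; the rule that any fragment after the first is a direct object is a deliberate oversimplification.

```python
# Fragment translations extracted from the examples (toy data).
FRAGMENTS = {
    "I ate": "Ich aß",
    "the apple": "der Apfel",  # nominative form, from "Der Apfel liegt ..."
}

# Minimal stand-in for TL grammar: nominative -> accusative forms.
ACCUSATIVE = {"der Apfel": "den Apfel"}


def recombine(parts, use_grammar=False):
    """Join fragment translations; optionally fix case on non-initial fragments."""
    out = []
    for position, part in enumerate(parts):
        target = FRAGMENTS[part]
        # Toy assumption: a fragment following the verb is a direct object.
        if use_grammar and position > 0:
            target = ACCUSATIVE.get(target, target)
        out.append(target)
    return " ".join(out)


print(recombine(["I ate", "the apple"]))
print(recombine(["I ate", "the apple"], use_grammar=True))
```

The first call shows the friction at the fragment boundary; the second shows how even minimal grammatical knowledge of the target language removes it.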
   Examples should be similar in internal and external

   Example-retrieval can be scored on two counts:
    closeness of the match between the input text
    and the example
    the adaptability of the example, on the basis of
    the relationship between the representations of the
    example and its translations
   New generation of the target text
   Last action in translation-process

   Often not possible to put translated phrases together
Example: It's raining outside – Es regnet nach draußen

Recombination has to make sure that the phrases are
  put together correctly
Example: It's raining outside – Es regnet draußen
Computational Problems
Huge costs in terms of
matching/retrieval algorithms

SPEED as a main issue:
 A computer-translation has to be as fast as a
Flavours of EBMT
   Used as a component in an MT system

EBMT can be used
with other engines
for certain problems
when some other component cannot deliver a result

EBMT = bitter rival to the existing engines
Example-based transfer
   Examples are stored as trees or other complex
    structures in “example-based transfer” systems.

 “In these systems, source language input strings
 are analysed into structured representations in a
 conventional manner, only transfer is on the basis of
 examples rather than rules, and then generation of
 the target language output is again done in a
 traditional way.” (H. Somers)
 Syntactic category:
 Example:
play baseball – yakyu o suru
play tennis – tenisu o suru
play the piano – piano o hiku
play the violin – baiorin o hiku

Different vocabulary for “play” in Japanese; the engine has to
   distinguish whether an instrument or a sport is meant:

Play x (NP/sport) – x (NP) o suru
Play x (NP/instrument) – x (NP) o hiku
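
The two generalised rules can be sketched as a lookup keyed on the category of the noun. The category table below is an assumption standing in for a thesaurus; only the four nouns from the examples are covered.

```python
# Semantic category of each noun (stand-in for a thesaurus, toy data).
CATEGORY = {
    "baseball": "sport",
    "tennis": "sport",
    "piano": "instrument",
    "violin": "instrument",
}

# Japanese nouns as romanised on the slide.
NOUN_JA = {
    "baseball": "yakyu",
    "tennis": "tenisu",
    "piano": "piano",
    "violin": "baiorin",
}

# Rule table: play x (NP/sport) -> x o suru; play x (NP/instrument) -> x o hiku.
PLAY_VERB = {"sport": "suru", "instrument": "hiku"}


def translate_play(phrase):
    """Translate 'play (the) NOUN' using the category-based rules, else None."""
    words = phrase.lower().split()
    if not words or words[0] != "play":
        return None
    noun = words[-1]
    category = CATEGORY.get(noun)
    if category is None:
        return None  # noun not covered by the examples
    return f"{NOUN_JA[noun]} o {PLAY_VERB[category]}"


print(translate_play("play the piano"))
print(translate_play("play tennis"))
```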
   Semantic category:

   A word must be chosen first
   Word is generated
   Word-level rule is made up

   The quality of the translation rules depends on the
    quality of the thesaurus
   Works best with non-idiomatic texts
   Automatic category

   A simpler approach
   Less initial analysis of the corpora

I am coming – geliyorum
I am going – gidiyorum

I am come+ing – gel+Hyor+yHm
I am go+ing – gid+Hyor+yHm

- “I am” stays fixed, while “come” and “go” differ
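
How a template might be derived automatically from the two pairs can be sketched as a token-by-token comparison. This is one possible reading, not the method of any particular system: the morphemes are written space-separated here (an assumption about the input format), and both sentences of a pair set must have the same number of tokens.

```python
def make_template(first, second):
    """Replace every token that differs between two sentences by the variable X."""
    tokens_a, tokens_b = first.split(), second.split()
    if len(tokens_a) != len(tokens_b):
        raise ValueError("sentences must have the same number of tokens")
    template, differences = [], []
    for a, b in zip(tokens_a, tokens_b):
        if a == b:
            template.append(a)
        else:
            template.append("X")
            differences.append((a, b))
    return " ".join(template), differences


def generalise(pair_a, pair_b):
    """Derive SL template, TL template and a small lexicon from two example pairs."""
    (src_a, tgt_a), (src_b, tgt_b) = pair_a, pair_b
    src_template, src_diff = make_template(src_a, src_b)
    tgt_template, tgt_diff = make_template(tgt_a, tgt_b)
    lexicon = {}
    for src_items, tgt_items in zip(src_diff, tgt_diff):
        for src_word, tgt_word in zip(src_items, tgt_items):
            lexicon[src_word] = tgt_word
    return src_template, tgt_template, lexicon


print(generalise(
    ("I am come +ing", "gel +Hyor +yHm"),
    ("I am go +ing", "gid +Hyor +yHm"),
))
```

The fixed parts become the template and the differing parts become lexicon entries, with no prior linguistic analysis of the corpus.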
Multi-engine system
 EBMT + two other techniques: knowledge-based
  MT and a lexical transfer engine
 Multi-engine system: combines EBMT with
  rule-based and corpus-based approaches
 User can

modify the results
intervene in the choice of translation
edit the output
   What counts as EBMT?
   Use of a bilingual corpus
   Use of a reference corpus

-What is the aim of EBMT?
to generalize the examples as much as possible

-What is the problem of EBMT?
 Some translations are suitable, some are not
   Advantages of EBMT

   Examples are real language data – overgeneration is reduced

Linguistic knowledge can be more easily enriched by adding more

can be quickly developed

not as a rival but as an alternative
   Somers, H. (2003). An overview of EBMT. In: Michael
       Carl and Andy Way (eds), Recent Advances in
       Example-Based Machine Translation. Dordrecht:
       Kluwer, 3-57.
   Sager, J. C. (1994). Language Engineering and Translation:
       Consequences of Automation. Amsterdam: John Benjamins, 267-292.
