
Problems of Machine Translation

					                                                                 SB Program




                          Machine Translation




               Research Seminar on Software Business 21.5.2003
                                  Antti Ilmo




University of Jyväskylä

     Outline

        Introduction
        Translation and Machine Translation Techniques
        The Early Machine Translation Systems
        Problems of Machine Translation
        Proposed Solutions to the Problems
        Summary





     Introduction

        The Internet and globalisation have increased the
         need for localization of documentation and
         interaction between different nationalities
        Localization is expensive and time-consuming
        Machine Translation (MT) is a potential solution



        But…




     Introduction (2)

        MT quality is not good enough
          – language works on many levels
              • interpretation
                  – a dictionary may give a meaning, but not how it is
                     interpreted
                       » the competence, experience and internal models of
                          language users are important
              • local usage (e.g. Canadian French vs. European French)
                  – a translation may sound “wrong” in a dialect
              • typos
                  – source texts contain spelling and syntactic errors








     What is translation?

        Preservation of the original text
          – stylistic and semantic characteristics
              • word-for-word
              • meaning-for-meaning
        Rules of language
          – e.g. the letters “c”, “a” and “t” form a word only in the right order
        Translation process (translating) and translation product (translated
         text)
          – the concept of translation covers both
        The translator re-codes the message into a different language





     MT Technology

        Machine Translation (MT)
          – machine takes care of translation process
        Machine Aided Translation (MAT)
          – Machine-Assisted Human Translation (MAHT)
             • humans translate, machine assists
          – Human-Assisted Machine Translation (HAMT)
             • machine translates, humans assist
                  – e.g. choosing a correct word from a dictionary
        Terminology Databanks (TD)
          – store technical terminology
              • the most commonly used kind of translation aid nowadays





     Linguistic Techniques

        Direct vs. indirect
          – direct uses word replacement
          – indirect tries to express a meaning
        Interlingua vs. transfer
          – interlingua: a single language-independent representation that does
            not take into account variations in target languages
          – transfer: language-pair-specific representations and rules
        Local vs. global
          – local scope uses word-level analysis
          – global scope analyses sentences or even more
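
A minimal sketch of the direct technique with local scope. The toy English-to-French lexicon and the function name are assumptions for illustration, not taken from any real system:

```python
# Toy English-to-French lexicon; the entries are illustrative assumptions.
LEXICON = {
    "the": "le",
    "black": "noir",
    "cat": "chat",
    "sleeps": "dort",
}


def translate_direct(sentence):
    """Direct technique, local scope: replace each word independently."""
    words = sentence.lower().split()
    # Words missing from the lexicon are passed through unchanged.
    return " ".join(LEXICON.get(word, word) for word in words)


print(translate_direct("The black cat sleeps"))  # le noir chat dort
```

The output keeps the English word order. Correct French would be "le chat noir dort", which word-level replacement cannot produce because it never looks beyond a single word.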










     Early Systems (GAT)

        Georgetown Automatic Translation
          – one of the earliest MT projects
              • development began in 1952, in use 1964-1979
          – physics texts from Russian to English
          – replacement of words
          – no real linguistic theory
              • often-cited (possibly apocryphal) example: “The spirit is
                willing, but the flesh is weak” translated into Russian and
                then back into English. The result: “The wine is agreeable,
                but the meat has spoiled”





     Early Systems (CETA)

        Centre d’Etudes pour la Traduction Automatique
          – launched in 1961 in Grenoble
          – in use 1967-71
              • approximately 400,000 words translated
          – Russian to French
          – sentence-based analysis
          – interlingua and transfer mixed
              • interlingua at the grammatical level, transfer at the dictionary level
          – Lesson learned: the interlingua approach did not work well





     Early Systems (SYSTRAN)

        one of the first systems marketed
        installed in 1970 (US Air Force Foreign Technology Division)
        used also at NASA and EURATOM
        semantic features ad hoc
        negative feedback at first
        post-editing found to be a good approach
          – GM of Canada claimed the system made human translators three to four
            times faster (3000-4000 words a day, approximately what a human
            translator now produces with the help of a translation workbench)





     Early Systems (TAUM-METEO)

        TAUM-METEO was the first truly automatic MT system
        developed in the mid-1970s by the TAUM group (founded in the 1960s)
        used by the Canadian Meteorological Centre
          – scanned the network for English weather reports and translated them
            into French
        detected its own failures, so no routine post-editing was needed
          – forwarded offending content to human translators
        24,000 words/day
        problems
          – communication noise
          – misspellings
          – words missing from the dictionary
        specialised language made translations possible
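
The forwarding behaviour described above can be sketched as follows. This is a toy illustration under stated assumptions, not the actual TAUM-METEO design: the lexicon entries, the function name and the all-or-nothing rule are hypothetical.

```python
# Tiny English-to-French weather lexicon (hypothetical entries).
WEATHER_LEXICON = {
    "cloudy": "nuageux",
    "sunny": "ensoleillé",
    "tonight": "ce soir",
    "tomorrow": "demain",
}


def route_report(report):
    """Translate a report if every word is known, otherwise forward it."""
    words = report.lower().split()
    unknown = [word for word in words if word not in WEATHER_LEXICON]
    if unknown:
        # Misspellings and missing dictionary entries both end up here.
        return ("forward_to_human", unknown)
    return ("translated", " ".join(WEATHER_LEXICON[word] for word in words))


print(route_report("cloudy tonight"))  # ('translated', 'nuageux ce soir')
print(route_report("clowdy tonight"))  # ('forward_to_human', ['clowdy'])
```

A closed vocabulary is what makes the "known or not" check practical, which matches the point that the specialised language made the translations possible.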










     Problems

        Translation is not straightforward
          –   it is not just replacing words with words
          –   word order differs between languages
          –   the text is rewritten in another language
          –   the right words have to be chosen
          –   e.g. an imperative in English instructions often becomes an infinitive in French





     Problems (2)

        Automation of translation not easy
          – quality is poor
          – homographs
              • “fan”: a ventilator or an enthusiast
              • different word classes
                    – e.g. “love” is both a verb and a noun
              • “you” can be both singular and plural
          – idioms
              • e.g. “country music” is a type of music, not the music of a country
          – personal pronouns
              • second person pronouns differ between familiar and formal situations
                (e.g. French “tu” and “vous”)
          – post-editing can also take more time than translating from scratch





     Problems (3)

        Morphological analysis
          – e.g. Chinese and Japanese do not put spaces between words
              • word boundaries have to be found before analysis can start
        Syntactic analysis
          – modifiers are a problem
             • “The boy saw a girl with a telescope”
                   – the girl had a telescope vs. the boy used a telescope to see the girl
        Analysis of context
          – a sentence of 20-40 words
              • can have around 100 million possible translations
        There are always going to be problem cases
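
The slide gives the figure without a derivation. One way to see how numbers of this size can arise, assuming each word is chosen independently from a small set of candidates, is to multiply the candidate counts. The function name and the two-candidates-per-word figure are illustrative assumptions:

```python
import math


def count_combinations(candidates_per_word):
    """Number of word-by-word combinations when every choice is independent."""
    return math.prod(candidates_per_word)


# Assumption: a 27-word sentence in which every word has only two candidates.
print(count_combinations([2] * 27))  # 134217728
```

Even with only two candidates per word, a sentence in the 20-40 word range passes 100 million combinations, which is why context has to be used to prune the choices rather than enumerate them.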










     AI-Based Approach

        Raman & Alwar 1990
        Domain: conversations at enquiry counters in railway stations
         in India
        The system should understand a text before translating it
          – the text is analysed for meaning, which is stored in a language-free
            semantic map
          – semantic maps are used to generate translations
        The analyzer processes one sentence at a time
          –   unnecessary adjectives are not taken into account
          –   morphological analysis first
          –   building of the semantic map second
          –   the stages work concurrently
          –   a large dictionary is needed





     AI-Based Approach (2)

        Natural language generator builds a sentence in target
         language
          – analyzer’s result fed into the generator
          – translate everything vs. leave something out
          – definition of structure
              • words in right order and inflected correctly
          – minimal importance to style

        Successful in a specific application with a restricted set of
         sentences
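
The analyzer and generator pipeline can be sketched as below. This is a toy illustration, not the system from the cited work: the slot names, the concept labels, the English-to-French language pair and the fixed sentence template are all assumptions.

```python
# Analysis side: English surface words -> (slot, language-free concept).
# All entries are hypothetical.
EN_ANALYSIS = {
    "train": ("subject", "TRAIN"),
    "leaves": ("action", "DEPART"),
    "departs": ("action", "DEPART"),
    "delhi": ("destination", "DELHI"),
}

# Generation side: language-free concept -> French surface form.
FR_WORDS = {"TRAIN": "le train", "DEPART": "part", "DELHI": "Delhi"}


def analyse(sentence):
    """Build a language-free semantic map; words without a concept are dropped."""
    semantic_map = {}
    for word in sentence.lower().strip(".?!").split():
        if word in EN_ANALYSIS:
            slot, concept = EN_ANALYSIS[word]
            semantic_map[slot] = concept
    return semantic_map


def generate_french(semantic_map):
    """Put the concepts into a fixed French word order."""
    subject = FR_WORDS[semantic_map["subject"]]
    action = FR_WORDS[semantic_map["action"]]
    destination = FR_WORDS[semantic_map["destination"]]
    return subject + " " + action + " pour " + destination


semantic_map = analyse("The fast train leaves for Delhi.")
print(semantic_map)
print(generate_french(semantic_map))  # le train part pour Delhi
```

The adjective "fast" never reaches the semantic map, which mirrors the points about unnecessary adjectives and minimal importance to style. The fixed template also shows why such a design only suits a restricted set of sentences.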





     Interactive Approach

        Sen, Zhaoxiong and Heyan 1997
        Knowledge of MT systems incomplete -> incorrect translations
        Possibility for an MT system to learn
          – quality should improve
        Interaction starts when a sentence is found that the system cannot
         analyse properly
          – message to the user
          – user responds with a coded message
              • updates the system's knowledge base
          – interaction limited to three stages
              • lexical analysis
              • uncertain modifiers
              • multiple translations
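
A minimal sketch of the learning loop, covering only the lexical stage. The function name, the callback interface and the English-to-Finnish entries are assumptions for illustration, and the coded user message is reduced to a plain string:

```python
def translate_interactively(sentence, knowledge_base, ask_user):
    """Translate word by word, asking the user about unknown words."""
    output = []
    for word in sentence.lower().split():
        if word not in knowledge_base:
            # Interaction: the system asks, the answer updates the knowledge base.
            knowledge_base[word] = ask_user(word)
        output.append(knowledge_base[word])
    return " ".join(output)


knowledge_base = {"good": "hyvää", "morning": "huomenta"}
scripted_user = {"evening": "iltaa"}  # stands in for a human answering

print(translate_interactively("good evening", knowledge_base, scripted_user.__getitem__))
# hyvää iltaa
print("evening" in knowledge_base)  # True
```

Because the answer is stored, the same question is not asked again, which is the sense in which quality should improve over time.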




     Multiple Translation Engines & Sentence
     Partitioning

        Ren, Shi and Kuroiwa 2000
        Multiple MT systems running in parallel
          –   all use different MT techniques
          –   controller coordinates translating
          –   each engine translates a sentence independently
          –   controller chooses the best translation
                • if no translation is acceptable, the sentence is partitioned
                • the process starts again from the beginning for each part
                • in the end the partitioned sentence is put back together
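
The controller logic above can be sketched as follows. This is a hypothetical illustration, not the design from the cited work: the phrase-table engines, the numeric scores and the split at the midpoint are assumptions, and the engines are called one after another rather than in parallel.

```python
def make_engine(phrase_table):
    """Return an engine that maps a known phrase to (translation, score)."""
    def engine(text):
        return phrase_table.get(text)  # None when the engine fails
    return engine


def controller(sentence, engines):
    """Pick the best engine output; partition the sentence if all engines fail."""
    results = [engine(sentence) for engine in engines]
    candidates = [result for result in results if result is not None]
    if candidates:
        best_translation, _score = max(candidates, key=lambda c: c[1])
        return best_translation
    words = sentence.split()
    if len(words) <= 1:
        return "<" + sentence + ">"  # mark what no engine could translate
    middle = len(words) // 2
    left = controller(" ".join(words[:middle]), engines)
    right = controller(" ".join(words[middle:]), engines)
    return left + " " + right  # put the partitioned sentence back together


engines = [
    make_engine({"good morning": ("hyvää huomenta", 0.9)}),
    make_engine({"good morning": ("hyvä aamu", 0.4), "my friend": ("ystäväni", 0.8)}),
]
print(controller("good morning my friend", engines))  # hyvää huomenta ystäväni
```

No engine knows the whole sentence, so it is split in two. The first half has two candidates and the higher-scoring one wins, while the second half is covered by only one engine.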




     Multiple Translation Engines & Sentence
     Partitioning (2)

        Parallel processing should improve success rate
          – correct translation preserved through procedures
          – combining the best translations should improve quality
        Morphological analysis
          –   the analysis results are used as inputs for the engines
          –   the engines are then run in parallel
          –   if there is more than one result, the number of engines increases
          –   if there are no results, the sentence is partitioned
                 • partitioning a sentence is itself a problem, e.g. in Chinese and Japanese

        In a test with four engines the results improved markedly
          – the time consumed doubled
          – a single MT system translated 45.6 % of sentences correctly;
            with multiple engines the figure was 74.2 % (Japanese to Chinese)
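
To put the two reported percentages side by side, the gain can be worked out directly. Only the two figures come from the slide; the variable names are illustrative:

```python
single_engine = 45.6  # % of sentences translated correctly by one MT system
multi_engine = 74.2   # % translated correctly with multiple engines

gain_points = round(multi_engine - single_engine, 1)
relative_gain = round(100 * gain_points / single_engine, 1)

print(gain_points)    # 28.6 (percentage points)
print(relative_gain)  # 62.7 (% more correct sentences than a single engine)
```

The gain of roughly 63 % in correct sentences came at the cost of twice the processing time.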










     Summary

        A definitive solution is still to be found
        The biggest problems of MT are linguistic
           – it is very hard to cover all the rules and adjust them to all
             possible languages and variations
           – misspellings cause problems, so a very good
             proof-reading function is needed
        There is a long way to go before MT systems replace human
         translators
        Machine Translation can be used in applications where the
         language is very specific



