Problems of Machine Translation

							                                                                 SB Program




                          Machine Translation




               Research Seminar on Software Business 21.5.2003
                                  Antti Ilmo




University of Jyväskylä

     Outline

        Introduction
        Translation and Machine Translation Techniques
        The Early Machine Translation Systems
        Problems of Machine Translation
        Proposed Solutions to the Problems
        Summary





     Introduction

        The Internet and globalisation have increased the
         need for localization of documentation and
         interaction between different nationalities
        Localization is expensive and time consuming
        Machine Translation a potential solution



        But…




     Introduction (2)

        MT quality is not good enough
          – language works on many levels
              • interpretation
                  – dictionary may tell a meaning, but not how it is
                     interpreted
                       » competence, experience and internal models of
                          language users important
              • local usage etc. (Canadian French and French French)
                  – translation may sound ”wrong” in a dialect
              • typos
                  – syntactic errors occur



     What is translation?

        Preservation of the original text
          – stylistic and semantic characteristics
              • word-for-word
              • meaning-for-meaning
        Rules of language
          – e.g. letters ”c”, ”a” and ”t” form a word only in the right order
        Translation process (translating) and translation product (translated
         text)
          – translation concept consists of both of the above
        Translator re-codes the message into a different language





     MT Technology

        Machine Translation (MT)
          – machine takes care of translation process
        Machine Aided Translation (MAT)
          – Machine-Assisted Human Translation (MAHT)
             • humans translate, machine assists
          – Human-Assisted Machine Translation (HAMT)
             • machine translates, humans assist
                  – e.g. choosing a correct word from a dictionary
        Terminology Databanks (TD)
          – technical terminology
              • most commonly used nowadays





     Linguistic Techniques

        Direct vs. indirect
          – direct uses word replacement
          – indirect tries to express a meaning
        Interlingua vs. transfer
          – Interlingua does not take into account variations in target
            languages
          – transfer approach uses language-specific meaning
        Local vs. global
          – local scope uses word-level analysis
          – global scope analyses sentences or even more
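
A minimal sketch of the direct (word replacement) technique, assuming a hypothetical three-word English-to-French dictionary. The names `TOY_DICTIONARY` and `direct_translate` are illustrative only and do not come from any system discussed here. Each word is replaced in isolation, so the source word order is carried over unchanged:

```python
# Hypothetical toy dictionary, for illustration only.
TOY_DICTIONARY = {"the": "le", "black": "noir", "cat": "chat"}

def direct_translate(sentence, dictionary):
    """Replace each word with its dictionary entry; unknown words pass through."""
    words = sentence.lower().split()
    return " ".join(dictionary.get(word, word) for word in words)

print(direct_translate("The black cat", TOY_DICTIONARY))
```

French would place the adjective after the noun ("le chat noir"), which is the kind of information a local, word-level analysis does not have.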





     Early Systems (GAT)

        Georgetown Automatic Translation
          – one of the earliest MT projects
              • development began in 1952, in use 1964-1979
          – physics texts from Russian to English
          – replacement of words
          – no real linguistic theory
              • ”The spirit is willing, but the flesh is weak” translated to
                Russian and then back to English. The result: ”The wine
                is agreeable, but the meat has spoiled”





     Early Systems (CETA)

        Centre d’Etudes pour la Traduction Automatique
          – launched in 1961 in Grenoble
          – in use 1967-71
              • approximately 400,000 words translated
          – Russian to French
          – sentence based analysis
          – Interlingua and transfer mixed
              • grammatical level vs. dictionary level
          – Realization: Interlingua approach not a good one





     Early Systems (SYSTRAN)

        one of the first systems marketed
        installed in 1970 (US Air Force Foreign Technology Division)
        used also at NASA and EURATOM
        semantic features ad hoc
        negative feedback at first
        post-editing found to be a good approach
          – GM of Canada claimed the system speeded up the work of human
            translators three to four times (3000-4000 words a day, approximately the
            same as a human translator now translates with the help of translation
            workbenches)





     Early Systems (TAUM-METEO)

        TAUM-METEO was the first truly automatic MT system
        developed in the 1960s
        used by Canadian Meteorological Center
          – scanned network for English weather reports and translated them to
            French
        detected its own failures, so no post-editors were needed
          – forwarded sentences it could not translate to human translators
        24,000 words/day
        problems
          – communication noise
          – misspellings
          – words missing from the dictionary
        specialised language made translations possible





     Problems

        Translation is not straightforward
          –   it is not word-for-word replacement
          –   word orders differ between languages
          –   rewriting of text into another language
          –   choosing the right words
          –   e.g. imperative mood in English becomes infinitive in French





     Problems (2)

        Automation of translation not easy
          – quality is poor
          – homographs
              • ”fan” a ventilator or an enthusiast
              • different word classes
                    – e.g. ”love” both a verb and a noun
                    – ”you” can be both singular and plural
          – idioms
              • e.g. ”country music” meaning type of music
          – personal pronouns
              • second person pronouns may vary in familiar and formal situations
          – post-editing can also take more time than translating from scratch
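
The homograph problem can be sketched as a lexicon lookup that returns more than one reading for a word. This is only an illustration: `TOY_LEXICON` and `ambiguous_words` are hypothetical names, and the readings are the ones listed on this slide.

```python
# Hypothetical mini-lexicon: each word maps to the readings it can have.
TOY_LEXICON = {
    "fan": ["ventilator", "enthusiast"],
    "love": ["verb", "noun"],
    "you": ["singular", "plural"],
    "music": ["noun"],
}

def ambiguous_words(sentence, lexicon):
    """Return the words of the sentence that have more than one reading."""
    words = sentence.lower().split()
    return [word for word in words if len(lexicon.get(word, [])) > 1]

print(ambiguous_words("You love music", TOY_LEXICON))
```

A dictionary alone can list the readings, but choosing between them needs context that a word-level lookup does not provide.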





     Problems (3)

        Morphological analysis
          – e.g. Chinese and Japanese do not put spaces between words
              • words are not separated by anything
        Syntactic analysis
          – modifiers a problem
             • ”The boy saw a girl with a telescope”
                   – the girl had a telescope vs. the boy used a telescope to see a girl
        Analysis of context
          – 20-40 words in a sentence
              • on the order of 100 million possible translations
        There are always going to be problem cases
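
The slides do not say how the figure of 100 million was derived. As a rough illustration only, assume every word has a fixed number of independent readings; the number of combinations is then the product of the per-word counts. `count_readings` is a hypothetical helper:

```python
from math import prod

def count_readings(readings_per_word):
    """Number of combined readings if each word is chosen independently."""
    return prod(readings_per_word)

# Assumption: two possible readings for every word in the sentence.
print(count_readings([2] * 20))
print(count_readings([2] * 27))
```

Under this assumption, a 27-word sentence with two readings per word already has 134,217,728 combinations, so sentences of 20-40 words easily reach the order of magnitude quoted above.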





     AI-Based Approach

        Raman & Alwar 1990
        Conversations carried out across enquiry counters at railway stations
         in India
        System should understand a text before translating it
          – analysis of text to understand the meaning and storing it in a language-free
            semantic map
          – semantic maps used to generate translations
        Analyzer analyses one sentence at a time
          –   unnecessary adjectives not taken into account
          –   morphological analysis first
          –   building of semantic map second
          –   stages work concurrently
          –   large dictionary needed





     AI-Based Approach (2)

        Natural language generator builds a sentence in target
         language
          – analyzer’s result fed into the generator
          – translate everything vs. leave something out
          – definition of structure
              • words in right order and inflected correctly
          – minimal importance to style

        Successful in specific application and a restricted set of
         sentences
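
The analyzer/generator split can be sketched as follows. This is not the system of Raman & Alwar: the sentence pattern, the adjective list, the templates and the names `analyze` and `generate` are all assumptions made for illustration. The analyzer drops adjectives it treats as unnecessary and builds a language-free semantic map, and the generator turns that map into a target-language sentence.

```python
import re

# Hypothetical list of adjectives the analyzer treats as unnecessary.
IGNORED_ADJECTIVES = {"fast", "big", "nice"}

# Hypothetical generation templates, one per target language and intent.
TEMPLATES = {
    "en": {"ask_departure_time": "When does the train to {destination} leave?"},
    "fr": {"ask_departure_time": "À quelle heure part le train pour {destination} ?"},
}

def analyze(sentence):
    """Build a language-free semantic map from one English sentence."""
    words = [w for w in re.findall(r"[a-z]+", sentence.lower())
             if w not in IGNORED_ADJECTIVES]
    match = re.fullmatch(r"when does the train to (\w+) leave", " ".join(words))
    if match is None:
        return None  # outside the restricted set of sentences
    return {"intent": "ask_departure_time",
            "destination": match.group(1).capitalize()}

def generate(semantic_map, language):
    """Fill the target-language template for the intent in the semantic map."""
    template = TEMPLATES[language][semantic_map["intent"]]
    return template.format(destination=semantic_map["destination"])

semantic_map = analyze("When does the fast train to Delhi leave?")
print(semantic_map)
print(generate(semantic_map, "fr"))
```

Because the map carries no source-language wording, adding a target language only needs a new template set, but any sentence outside the known pattern is rejected, which matches the restriction to a specific application noted above.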





     Interactive Approach

        Sen, Zhaoxiong and Heyan 1997
        Knowledge of MT systems incomplete -> incorrect translations
        Possibility for an MT system to learn
          – quality should improve
        Interaction starts when a sentence is found that the system cannot
         analyse properly
          – message to the user
          – user responds with a coded message
              • updates the system’s knowledge base
          – interaction limited to three stages
              • lexical analysis
              • uncertain modifiers
              • multiple translations
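
A minimal sketch of the learning loop, covering only the lexical stage and assuming the knowledge base is a plain word dictionary. The names `translate` and `ask_user` are hypothetical, and the user's coded message is reduced to a callback that returns the missing translation:

```python
def translate_word(word, knowledge_base, ask_user):
    """Translate one word, asking the user when the knowledge base has no entry."""
    if word not in knowledge_base:
        # Interaction step: the answer is stored, so the question is asked once.
        knowledge_base[word] = ask_user(word)
    return knowledge_base[word]

def translate(sentence, knowledge_base, ask_user):
    """Word-by-word translation with interactive knowledge-base updates."""
    words = sentence.lower().split()
    return " ".join(translate_word(w, knowledge_base, ask_user) for w in words)

# Usage: a canned answer stands in for the human user.
knowledge_base = {"the": "le", "cat": "chat"}
answers = {"black": "noir"}
print(translate("the black cat", knowledge_base, answers.__getitem__))
print(knowledge_base)
```

After the first call the knowledge base contains the new entry, so the same sentence no longer triggers a question.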




     Multiple Translation Engines & Sentence
     Partitioning

        Ren, Shi and Kuroiwa 2000
        Multiple MT systems running in parallel
          –   all use different MT techniques
          –   controller coordinates translating
          –   each engine translates a sentence independently
          –   controller chooses the best translation
                • if no proper translation is found, the sentence is partitioned
                • the process then starts again from the beginning
                • in the end the partitioned sentence is put back together
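
The controller logic can be sketched as below. This is an illustration under several assumptions, not the implementation of Ren, Shi and Kuroiwa: the engines are toy phrase tables with a fixed confidence, they are called one after another rather than in parallel, and sentences are partitioned at commas. All names are hypothetical.

```python
def make_engine(phrase_table, confidence):
    """Build a toy engine that only knows the phrases in its table."""
    def engine(sentence):
        translation = phrase_table.get(sentence.strip().lower())
        return None if translation is None else (translation, confidence)
    return engine

def split_at_comma(sentence):
    """Assumed partitioning rule: split the sentence at commas."""
    return [part.strip() for part in sentence.split(",") if part.strip()]

def controller(sentence, engines, split=split_at_comma):
    """Pick the best engine output; partition and retry if no engine succeeds."""
    candidates = [r for r in (engine(sentence) for engine in engines) if r is not None]
    if candidates:
        return max(candidates, key=lambda result: result[1])[0]
    parts = split(sentence)
    if len(parts) < 2:
        return None  # cannot be partitioned any further
    translated = [controller(part, engines, split) for part in parts]
    if any(t is None for t in translated):
        return None
    return ", ".join(translated)  # put the partitioned sentence back together

engine_a = make_engine({"good morning": "bonjour"}, 0.9)
engine_b = make_engine({"good morning": "bon matin", "thank you": "merci"}, 0.5)
print(controller("Good morning, thank you", [engine_a, engine_b]))
```

Neither toy engine knows the full sentence, so the controller partitions it, takes the higher-confidence result for the first part and the only available result for the second, and joins them.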




     Multiple Translation Engines & Sentence
     Partitioning (2)

        Parallel processing should improve success rate
          – correct translation preserved through procedures
          – combining the best translations should improve quality
        Morphological analysis
          –   analysis gives results that are used as inputs for the engines
          –   engines are then run in parallel
          –   if there is more than one result, the number of engines increases
          –   if there are no results, the sentence is partitioned
                 • problem of partitioning a sentence e.g. Chinese & Japanese

        In a test situation with four engines the results improved dramatically
          – consumed time doubled
          – a single MT system translated 45.6 % of sentences correctly;
            with multiple engines the result was 74.2 % (Japanese to Chinese)
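
The reported figures can be put side by side with a little arithmetic. The sketch assumes that "consumed time doubled" means exactly a factor of two, which the slides do not state precisely; the variable names are illustrative.

```python
single_engine_accuracy = 45.6   # % of sentences translated correctly
multi_engine_accuracy = 74.2    # % with four engines
time_factor = 2.0               # assumption: time exactly doubled

# Absolute gain in percentage points and gain relative to the single engine.
gain_points = round(multi_engine_accuracy - single_engine_accuracy, 1)
relative_gain = round(gain_points / single_engine_accuracy * 100, 1)

# Accuracy divided by the time factor, as a crude throughput-adjusted figure.
accuracy_per_time = round(multi_engine_accuracy / time_factor, 1)

print(gain_points, relative_gain, accuracy_per_time)
```

On these numbers accuracy rises by 28.6 percentage points (about 63 % relative to the single engine), while the throughput-adjusted figure drops from 45.6 to 37.1, so the approach trades speed for quality.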





     Summary

        A definitive solution is still to be found
        Biggest problems of MT are linguistic
           – it is very hard to cover all the rules and adjust them to all
             possible languages and variations
           – misspellings cause problems which means a very good
             proof-reading function is needed
        There is a long way to go before MT systems replace human
         translators
        Machine Translation can be used in applications where the
         language is very specific



