Problems of Machine Translation
SB Program
Machine Translation
Research Seminar on Software Business 21.5.2003
Antti Ilmo
University of Jyväskylä
Outline
Introduction
Translation and Machine Translation Techniques
The Early Machine Translation Systems
Problems of Machine Translation
Proposed Solutions to the Problems
Summary
Introduction
The Internet and globalisation have increased the
need for localization of documentation and
interaction between different nationalities
Localization is expensive and time-consuming
Machine Translation is a potential solution
But…
Introduction (2)
MT quality is not good enough
– language works on many levels
• interpretation
– dictionary may tell a meaning, but not how it is
interpreted
» competence, experience and internal models of
language users important
• local usage etc. (Canadian French and French French)
– translation may sound ”wrong” in a dialect
• typos
– syntactic errors occur
What is translation?
Preservation of the original text
– stylistic and semantic characteristics
• word-for-word
• meaning-for-meaning
Rules of language
– e.g. letters ”c”, ”a” and ”t” form a word only in the right order
Translation process (translating) and translation product (translated
text)
– translation concept consists of both of the above
Translator re-codes the message into a different language
MT Technology
Machine Translation (MT)
– machine takes care of translation process
Machine Aided Translation (MAT)
– Machine-Assisted Human Translation (MAHT)
• humans translate, machine assists
– Human-Assisted Machine Translation (HAMT)
• machine translates, humans assist
– e.g. choosing a correct word from a dictionary
Terminology Databanks (TD)
– technical terminology
• most commonly used nowadays
Linguistic Techniques
Direct vs. indirect
– direct uses word replacement
– indirect tries to express a meaning
Interlingua vs. transfer
– Interlingua does not take into account variations in target
languages
– transfer approach uses language-specific meaning
Local vs. global
– local scope uses word-level analysis
– global scope analyses sentences or even more
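The direct, local approach can be illustrated with a minimal sketch. The lexicon entries and the function name `direct_translate` are illustrative assumptions, not taken from any of the systems discussed here.

```python
# Toy English-to-French lexicon; the entries are illustrative assumptions.
LEXICON = {"the": "le", "cat": "chat", "black": "noir", "sleeps": "dort"}


def direct_translate(sentence: str, lexicon: dict[str, str]) -> str:
    """Replace each word independently; unknown words pass through unchanged."""
    return " ".join(lexicon.get(word, word) for word in sentence.lower().split())


result = direct_translate("The black cat sleeps", LEXICON)
print(result)  # le noir chat dort
```

Every word is translated correctly in isolation, yet the output keeps the English word order (French would say "le chat noir dort"). This is the kind of error that word-level analysis cannot detect.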
Early Systems (GAT)
Georgetown Automatic Translation
– one of the earliest MT projects
• development began in 1952, in use 1964-1979
– physics texts from Russian to English
– replacement of words
– no real linguistic theory
• ”The spirit is willing, but the flesh is weak” translated to
Russian and then back to English. The result: ”The wine
is agreeable, but the meat has spoiled”
Early Systems (CETA)
Centre d’Etudes pour la Traduction Automatique
– launched in 1961 in Grenoble
– in use 1967-71
• approximately 400,000 words translated
– Russian to French
– sentence based analysis
– Interlingua and transfer mixed
• grammatical level vs. dictionary level
– Realization: Interlingua approach not a good one
Early Systems (SYSTRAN)
one of the first systems marketed
installed in 1970 (US Air Force Foreign Technology Division)
used also at NASA and EURATOM
semantic features ad hoc
negative feedback at first
post-editing found to be a good approach
– GM of Canada claimed the system sped up the work of human
translators three to four times (3000-4000 words a day, approximately the
same as a human translator now achieves with the help of translation
workbenches)
Early Systems (TAUM-METEO)
TAUM-METEO was the first truly automatic MT system
developed in the 1960s
used by the Canadian Meteorological Center
– scanned the network for English weather reports and translated them into
French
corrected its own errors without post-editors
– forwarded offending content to human translators
24,000 words/day
problems
– communication noise
– misspellings
– words missing from the dictionary
specialised language made translations possible
Problems
Translation is not straightforward
– it is not simply replacing words with words
– word orders
– rewriting of text into another language
– choosing the right words
– e.g. an imperative in English may be rendered as an infinitive in French
Problems (2)
Automation of translation not easy
– quality is poor
– homographs
• ”fan” a ventilator or an enthusiast
• different word classes
– e.g. ”love” both a verb and a noun
– ”you” can be both singular and plural
– idioms
• e.g. ”country music” meaning type of music
– personal pronouns
• second person pronouns may vary in familiar and formal situations
– post-editing can also take more time than translating from scratch
Problems (3)
Morphological analysis
– e.g. Chinese and Japanese do not put spaces between words
• words are not separated by anything, so the text must be segmented first
Syntactic analysis
– modifiers a problem
• ”The boy saw a girl with a telescope”
– the girl had a telescope vs. the boy used a telescope to see a girl
Analysis of context
– a sentence can contain 20-40 words
• up to 100 million possible translations
There are always going to be problem cases
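One way to see how a number of that size can arise is to multiply the choices available for each word. The sketch below assumes, purely for illustration, that every word has the same number of candidate translations and that the choices are independent. Neither assumption comes from the slides, and `candidate_count` is a hypothetical helper.

```python
def candidate_count(words: int, senses_per_word: int) -> int:
    """Number of word-by-word combinations if every word is equally ambiguous."""
    return senses_per_word ** words


# With only two candidate translations per word:
for n in (20, 27, 40):
    print(n, candidate_count(n, 2))
# 20 1048576
# 27 134217728
# 40 1099511627776
```

Under these assumptions, a 27-word sentence already exceeds 100 million combinations, which is why analysis of the wider context is needed to prune the candidates.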
AI-Based Approach
Raman & Alwar 1990
Conversations carried out across enquiry counters on railway stations
in India
System should understand a text before translating it
– analysis of text to understand the meaning and storing it in a language-free
semantic map
– semantic maps used to generate translations
Analyzer analyses one sentence at a time
– unnecessary adjectives not taken into account
– morphological analysis first
– building of semantic map second
– stages work concurrently
– large dictionary needed
AI-Based Approach (2)
Natural language generator builds a sentence in target
language
– analyzer’s result fed into the generator
– translate everything vs. leave something out
– definition of structure
• words in right order and inflected correctly
– minimal importance to style
Successful in specific application and a restricted set of
sentences
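The analyse-then-generate idea can be sketched as follows. This is not Raman & Alwar's implementation: the concept lexicon, the slot names, the French template and the functions `analyze` and `generate_fr` are all hypothetical, chosen only to show a language-free semantic map sitting between the two stages.

```python
import re

# Hypothetical lexicon mapping English surface words to language-free concepts.
CONCEPTS = {
    "when": ("query", "time"),
    "leave": ("event", "departure"),
    "leaves": ("event", "departure"),
    "train": ("vehicle", "train"),
}


def analyze(sentence: str) -> dict[str, str]:
    """Build a flat semantic map from one sentence; unknown words are ignored."""
    semantic_map: dict[str, str] = {}
    tokens = re.findall(r"[A-Za-z]+", sentence)
    for i, token in enumerate(tokens):
        word = token.lower()
        if word in CONCEPTS:
            slot, value = CONCEPTS[word]
            semantic_map[slot] = value
        elif word == "to" and i + 1 < len(tokens):
            # Assumption: the word after "to" names the destination.
            semantic_map["destination"] = tokens[i + 1]
    return semantic_map


# Hypothetical target-language resources: one template per (query, event) pair.
TEMPLATES_FR = {("time", "departure"): "Quand part le {vehicle} pour {destination} ?"}
VEHICLES_FR = {"train": "train"}


def generate_fr(semantic_map: dict[str, str]) -> str:
    """Generate a French sentence from the semantic map; style is ignored."""
    template = TEMPLATES_FR[(semantic_map["query"], semantic_map["event"])]
    return template.format(
        vehicle=VEHICLES_FR[semantic_map["vehicle"]],
        destination=semantic_map["destination"],
    )


meaning = analyze("When does the train to Delhi leave?")
print(generate_fr(meaning))  # Quand part le train pour Delhi ?
```

The generator never sees the English sentence, only the map. The sketch also shows why the approach suits a restricted domain: every new kind of question needs a new concept entry and a new template.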
Interactive Approach
Sen, Zhaoxiong and Heyan 1997
Knowledge of MT systems incomplete -> incorrect translations
Possibility for an MT system to learn
– quality should improve
Interaction starts when a sentence is found that the system cannot
analyse properly
– message to the user
– user responds with a coded message
• updates the system's knowledge base
– interaction limited to three stages
• lexical analysis
• uncertain modifiers
• multiple translations
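A minimal sketch of the learning loop, covering only the lexical-analysis stage, might look like this. The function `translate_interactively`, the `ask_user` callback and the toy knowledge base are assumptions made for illustration; the coded message format of the original system is not reproduced.

```python
from typing import Callable


def translate_interactively(
    sentence: str,
    knowledge_base: dict[str, str],
    ask_user: Callable[[str], str],
) -> str:
    """Translate word by word, asking the user about words the system lacks."""
    output = []
    for word in sentence.lower().split():
        if word not in knowledge_base:
            # Interaction starts only when the system cannot analyse a word;
            # the answer is stored so the same question is not asked twice.
            knowledge_base[word] = ask_user(word)
        output.append(knowledge_base[word])
    return " ".join(output)


# Toy English-to-French knowledge base and a scripted "user" (both assumptions).
kb = {"the": "le", "train": "train"}
asked = []


def scripted_user(word: str) -> str:
    asked.append(word)
    return {"leaves": "part"}[word]


print(translate_interactively("The train leaves", kb, scripted_user))  # le train part
print(translate_interactively("The train leaves", kb, scripted_user))  # le train part
print(asked)  # ['leaves']
```

The second call produces the same translation without consulting the user, which is the sense in which quality should improve over time.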
Multiple Translation Engines & Sentence
Partitioning
Ren, Shi and Kuroiwa 2000
Multiple MT systems running in parallel
– all use different MT techniques
– controller coordinates translating
– each engine translates a sentence independently
– controller chooses the best translation
• if no proper translation is found, the sentence is partitioned
• the process then starts again from the beginning for each part
• in the end the partitioned sentence is put back together
Multiple Translation Engines & Sentence
Partitioning (2)
Parallel processing should improve success rate
– correct translation preserved through procedures
– combining the best translations should improve quality
Morphological analysis
– analysis gives results that are used as inputs for the engines
– engines are then run in parallel
– if there is more than one result, the number of engines increases
– if there are no results, the sentence is partitioned
• problem of partitioning a sentence e.g. Chinese & Japanese
In a test situation with four engines the results improved dramatically
– consumed time doubled
– a single MT system translated 45.6 % of sentences correctly;
with multiple engines the result was 74.2 % (Japanese to Chinese)
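The controller logic can be sketched under some simplifying assumptions. Here each engine is a hypothetical phrase-table lookup that returns a translation with a confidence score, the engines are called one after another rather than in parallel, and a sentence is partitioned by splitting it in half. None of these details are taken from Ren, Shi and Kuroiwa's system.

```python
from typing import Callable, Optional

# An engine returns (translation, score), or None if it cannot translate.
Engine = Callable[[str], Optional[tuple[str, float]]]


def make_engine(table: dict[str, tuple[str, float]]) -> Engine:
    """Build a toy engine from a phrase table (illustrative assumption)."""
    return lambda sentence: table.get(sentence)


def controller(sentence: str, engines: list[Engine]) -> Optional[str]:
    """Pick the best engine output; partition the sentence if all engines fail."""
    results = [r for r in (engine(sentence) for engine in engines) if r is not None]
    if results:
        return max(results, key=lambda r: r[1])[0]
    words = sentence.split()
    if len(words) < 2:
        return None  # nothing left to partition
    middle = len(words) // 2
    parts = [
        controller(" ".join(half), engines)
        for half in (words[:middle], words[middle:])
    ]
    if any(part is None for part in parts):
        return None
    # Put the partitioned sentence back together.
    return " ".join(parts)


engines = [
    make_engine({"good morning": ("bonjour", 0.9)}),
    make_engine({"good morning": ("bon matin", 0.4), "my friend": ("mon ami", 0.8)}),
]
print(controller("good morning my friend", engines))  # bonjour mon ami
```

No engine can translate the full sentence, so it is split. The first half is taken from the higher-scoring engine and the second half from the only engine that knows it.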
Summary
A definitive solution is still to be found
Biggest problems of MT are linguistic
– it is very hard to cover all the rules and adjust them to all
possible languages and variations
– misspellings cause problems which means a very good
proof-reading function is needed
There is a long way to go before MT systems replace human
translators
Machine Translation can be used in applications where the
language is very specific