Direct Translation

• no complete intermediate sentence structure is built
• translation proceeds in a number of steps, each
  step dedicated to a specific task
• the most important component is the bilingual
  dictionary
• typically aimed at general language
• problems with
  – ambiguity
  – inflection
  – word order and other structural shifts
          Simplistic approach
• sentence splitting
• tokenisation
• handling capital letters
• dictionary look-up and lexical substitution, including
  some heuristics for handling ambiguities
• copying of unknown words, digits, punctuation
  marks, etc.
• formal editing
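The simplistic steps above can be sketched in a few lines of Python. This is a minimal illustration, not a real system. The glosses in the hypothetical `TOY_DICT` are one-sense entries chosen only for the Assignment 1 sentence, and the sentence-splitting and tokenisation rules are deliberately naive.

```python
import re

# Hypothetical toy Swedish-English glosses (one sense per word); a real
# direct system would use a full bilingual dictionary instead.
TOY_DICT = {
    "ytterst": "ultimately", "handlar": "deals", "kampen": "the fight",
    "för": "for", "sysselsättning": "employment", "om": "about",
    "att": "to", "hålla": "hold", "samman": "together", "sverige": "Sweden",
}

def split_sentences(text):
    """Naive sentence splitting: break after . ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenise(sentence):
    """Words (including å, ä, ö), numbers and punctuation become separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def substitute(tokens):
    """Lexical substitution; unknown words, digits and punctuation are copied."""
    # Capital letters are handled by looking up the lower-cased form.
    return [TOY_DICT.get(tok.lower(), tok) for tok in tokens]

def formal_edit(tokens):
    """Join tokens, attach punctuation to the preceding word, capitalise."""
    text = ""
    for tok in tokens:
        if text and not re.fullmatch(r"[^\w\s]", tok):
            text += " "
        text += tok
    return text[:1].upper() + text[1:]

def translate(text):
    """Run the simplistic pipeline sentence by sentence."""
    return " ".join(formal_edit(substitute(tokenise(s)))
                    for s in split_sentences(text))
```

Applied to the Swedish sentence from Assignment 1, `translate` returns "Ultimately deals the fight for employment about to hold together Sweden." The output shows the problems listed earlier. The verb-second word order is kept, and the particle verbs *handlar om* and *hålla samman* are translated word by word.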
    Advanced classical approach
                (Tucker 1987)
•   Source text dictionary look-up and
    morphological analysis
•   Identification of homographs
•   Identification of compound nouns
•   Identification of noun and verb phrases
•   Processing of idioms
    Advanced approach, cont.
•   processing of prepositions
•   subject-predicate identification
•   syntactic ambiguity identification
•   synthesis and morphological processing
    of target text
•   rearrangement of words and phrases in
    target text
     Feasibility of the direct
      translation strategy

Is it possible to carry out the direct
translation steps as suggested by Tucker
with sufficient precision without relying on
a complete sentence structure?
   Assignment 1: manual direct translation
Sv. Ytterst handlar kampen för sysselsättning om att hålla
  samman Sverige.
En. Ultimately, the fight for full employment concerns the
  cohesion of Swedish society.
  (from Statement of Government Policy 1996)

• Define an algorithm and a dictionary (based on
  Norstedts) for simplistic translation of the
  example.
• Present the model and the result.
          Assignment 1, cont.
• Improve the result stepwise in accordance with
  the advanced direct translation strategy
  – Specify each step carefully and demonstrate its effect
    on the translation.
• Evaluate and discuss the final result.
• Translate the example using Systran and
  discuss the differences in an evaluative way.
• Write up the assignment and upload the report to the web.
        Current trends in direct translation
• re-use of translations
   – translation memories of sentences and sub-sentence
     units such as words, phrases and larger units
   – lexicalistic translation
   – example-based translation
   – statistical translation
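A translation memory reuses earlier translations by retrieving the stored segment most similar to the new input. The sketch below assumes a hypothetical in-memory list of sentence pairs. It uses word-level similarity from Python's `difflib.SequenceMatcher` as a stand-in for whatever fuzzy-match score a real TM tool computes. The 0.75 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: previously translated sentence pairs.
MEMORY = [
    ("Enskilda företagare som inte bildat bolag klassificeras hit.",
     "Individual entrepreneurs that have not formed companies are classified here."),
    ("Extrapoleringen går till så här.",
     "The extrapolation is done as follows."),
]

def tm_lookup(source, memory=MEMORY, threshold=0.75):
    """Return (score, stored source, stored target) for the best fuzzy match,
    or None if no stored segment reaches the threshold."""
    best = None
    query = source.lower().split()
    for src, tgt in memory:
        # Similarity is computed over word sequences, not characters.
        score = SequenceMatcher(None, query, src.lower().split()).ratio()
        if best is None or score > best[0]:
            best = (score, src, tgt)
    return best if best is not None and best[0] >= threshold else None
```

For example, `tm_lookup("Interpoleringen går till så här.")` retrieves the stored extrapolation sentence with a score of 0.8. The translator then only has to edit the one word that differs. A sentence with no similar stored segment returns `None` and must be translated from scratch.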

Will re-use of translations overcome the problems with the
  direct translation approach that were discussed above?

If so, how can they be handled?
                   Systran
• name short for System Translation
• developed in the US by Peter Toma
• first version 1969 (Ru-En)
• EC bought the rights to Systran in 1976
• currently 18 language pairs
• demo version sv-en in 2003
              Systran, cont.
• more than 1,600,000 dictionary units
• 20 domain dictionaries
• used daily by EC translators and administrators
  of the European institutions
• originally a direct translation strategy
  – see H&S
• today more of a transfer-based strategy
   Ex. 1: fairly good translation
           /Systran sv-en
• "Enskilda företagare som inte bildat bolag
  klassificeras hit."

• "Individual entrepreneurs that have not formed
  companies are classified here."

• The system has recognised "bildat" as a
  perfect form and correctly translates the tense
  as "have formed", with the negation "not" in the right position.
    Ex. 2: word order problem/
           Systran sv-en
• "När byarna kontaktades hade de inte ens
  utsatts för influensa."

• "When the villages were contacted had
  they not even been exposed to flu."

• The system has not identified the subject and
  predicate and therefore produces the wrong word order.
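Fixing Ex. 2 requires the subject-predicate identification step from the advanced approach. The sketch below assumes that an analysis step has already tagged each translated unit. The tags `SUB`, `VFIN`, `SUBJ` and so on are hypothetical. Given those tags, a single reordering rule can turn Swedish verb-second order into English subject-verb order after a fronted subordinate clause.

```python
# Each unit is (english_text, tag); the tags are hypothetical output of an
# earlier analysis step: SUB = subordinating conjunction, VFIN = finite verb,
# SUBJ = subject, PUNCT = punctuation; other tags are left alone.
SYSTRAN_ORDER = [
    ("When", "SUB"), ("the", "DET"), ("villages", "N"),
    ("were contacted", "VFIN"), ("had", "VFIN"), ("they", "SUBJ"),
    ("not", "NEG"), ("even", "ADV"), ("been exposed", "VPART"),
    ("to", "P"), ("flu", "N"), (".", "PUNCT"),
]

def fix_inversion(units):
    """After a fronted subordinate clause, Swedish puts the finite verb before
    the subject (verb-second). English wants a comma and subject-verb order."""
    units = list(units)
    if not units or units[0][1] != "SUB":
        return units
    finite = [i for i, (_, tag) in enumerate(units) if tag == "VFIN"]
    if len(finite) < 2:
        return units
    i = finite[1]  # first finite verb of the main clause
    if i + 1 < len(units) and units[i + 1][1] == "SUBJ":
        return units[:i] + [(",", "PUNCT"), units[i + 1], units[i]] + units[i + 2:]
    return units

def surface(units):
    """Join units with spaces, attaching punctuation to the previous word."""
    text = ""
    for word, tag in units:
        text += word if tag == "PUNCT" or not text else " " + word
    return text
```

`surface(SYSTRAN_ORDER)` reproduces the faulty output. `surface(fix_inversion(SYSTRAN_ORDER))` gives "When the villages were contacted, they had not even been exposed to flu." The rule only fires when the tags are correct. A direct system that fails to find subject and predicate therefore cannot apply it.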
    Ex. 3: ambiguity problem/
          Systran sv-en
• "Vad kan vi lära av Arrawetestammen?"

• "What can we faith of the Arawete?"

• The system does not find the connection between "kan"
  and "lära" and therefore does not see that "lära" is a verb.
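One way to resolve the homograph in Ex. 3 is a local context rule. A modal verb such as *kan* shortly before *lära* signals an infinitive, so the verb reading applies. The sketch below is a hypothetical heuristic with toy entries. The two-token look-back window is an assumption, chosen so that a subject pronoun such as *vi* can stand between the modal and the verb.

```python
# Hypothetical homograph entries: "lära" is a noun (doctrine) or a verb (learn).
HOMOGRAPHS = {"lära": {"NOUN": "doctrine", "VERB": "learn"}}
# Common Swedish modal verbs (present tense forms).
MODALS = {"kan", "ska", "vill", "måste", "bör"}

def disambiguate(tokens, window=2):
    """Translate homographs only; other tokens are passed through unchanged."""
    out = []
    for i, tok in enumerate(tokens):
        readings = HOMOGRAPHS.get(tok.lower())
        if readings is None:
            out.append(tok)
            continue
        # A modal within `window` tokens to the left signals an infinitive.
        left = {t.lower() for t in tokens[max(0, i - window):i]}
        out.append(readings["VERB" if left & MODALS else "NOUN"])
    return out
```

In "Vad kan vi lära av Arrawetestammen?" the modal *kan* is found two tokens before *lära*, so the rule selects "learn". In a phrase like "en ny lära" no modal precedes it, and the noun reading is kept.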
     Ex. 4: ambiguity problem/
           Systran sv-en
• "Extrapoleringen går till så här."

• "The extrapolation goes to so here."

• The system does not know the particle verb
  "gå till" ("be done", "proceed") and therefore translates
  it incorrectly word for word.
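Ex. 4 calls for multiword lookup. The dictionary must contain the particle verb as a unit, and the lookup must try longer units before single words. Below is a minimal longest-match sketch. The entries in `PHRASES` and `WORDS` are hypothetical, and the inflected form *går till* is listed directly instead of being derived by morphological analysis.

```python
# Hypothetical dictionary entries. Multiword units are keyed by token tuples;
# the inflected form "går till" is listed directly rather than derived.
PHRASES = {("går", "till"): "is done", ("så", "här"): "as follows"}
WORDS = {"extrapoleringen": "the extrapolation", "går": "goes",
         "till": "to", "så": "so", "här": "here"}

def lookup(tokens, phrases=PHRASES, words=WORDS, max_len=3):
    """Longest-match lookup: try multiword units before single words."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in phrases:
                out.append(phrases[key])
                i += n
                break
        else:
            # No multiword unit starts here: single-word lookup, else copy.
            out.append(words.get(tokens[i].lower(), tokens[i]))
            i += 1
    return out
```

With the phrase entries, the tokens of Ex. 4 give "the extrapolation is done as follows". Calling `lookup(..., phrases={})` on the same tokens reproduces the word-for-word "the extrapolation goes to so here".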
  Systran Linguistic Resources
• Dictionaries
  – POS Definitions
  – Inflection Tables
  – Decomposition Tables
  – Segmentation Dictionaries
• Disambiguation Rules
• Analysis Rules
       Systran Processing Steps
• Analysis
   –   Lookup
   –   Compound Decomposition
   –   Disambiguation
   –   Syntactic Analysis
   –   Compound Expansion
• Sentence Transfer
   –   Initial Target Structure
   –   Lookup
   –   Default Transfer of Attributes
   –   Structure Transformation
     Systran Processing Steps, cont.
• Sentence Synthesis
  – Structure Transformation
  – Inflection lookup
  – Surface Transformation
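Compound decomposition, listed under Analysis above, matters for Swedish because compounds are written as one word. The following is a generic dictionary-based splitter, not Systran's actual algorithm. It assumes a hypothetical stem lexicon, models only the linking *-s-*, and prefers the split with the fewest parts.

```python
# Hypothetical lexicon of known stems; a real system would use full
# decomposition tables and would also handle inflected heads.
LEXICON = {"sysselsättning", "politik", "influensa", "vaccin"}
LINKERS = ("", "s")  # linking elements; only Swedish "-s-" is modelled here

def decompose(word, lexicon=LEXICON):
    """Split a compound into known stems, preferring the fewest parts.
    Returns None if no complete split is found."""
    word = word.lower()
    if word in lexicon:
        return [word]
    best = None
    for i in range(1, len(word)):
        head, rest = word[:i], word[i:]
        if head not in lexicon:
            continue
        for link in LINKERS:
            # Skip the linking element if present, then split the remainder.
            if rest.startswith(link) and rest[len(link):]:
                tail = decompose(rest[len(link):], lexicon)
                if tail and (best is None or 1 + len(tail) < len(best)):
                    best = [head] + tail
    return best
```

`decompose("sysselsättningspolitik")` returns `["sysselsättning", "politik"]`, which a transfer step could then translate part by part (employment + policy). A word with no complete split, such as "bolag" with this lexicon, returns `None` and would be passed on unchanged.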
