Documents
Resources
Learning Center
Upload
Plans & pricing Sign in
Sign Out

Machine translation

VIEWS: 0 PAGES: 55

									Introduction to MT

     Ling 580
      Fei Xia
   Week 1: 1/03/06
                 Outline
• Course overview

• Introduction to MT
  – Major challenges
  – Major approaches
  – Evaluation of MT systems


• Overview of word-based SMT
Course overview
                      General info
• Course website:
   – Syllabus (incl. slides and papers): updated every week.
   – Message board
   – ESubmit

• Office hour: Fri: 10:30am-12:30pm.

• Prerequisites:
   – Ling570 and Ling571.
   – Programming: C or C++, Perl is a plus.
   – Introduction to probability and statistics
                 Expectations
• Reading:
  – Papers are online
  – Finish reading before class. Bring your questions to
    class.


• Grade:
  –   Leading discussion (1-2 papers): 50%
  –   Project: 40%
  –   Class participation: 10%
  –   No quizzes, exams
         Leading discussion
• Indicate your choice via EPost by Jan 8.
• You might want to read related papers.
• Make slides with PowerPoint.
• Email me your slides by 3:30am on the
  Monday before your presentation.
• Present the paper in class and lead the
  discussion: 40-50 minutes.
                  Project
• Details will be available soon.
• Project presentation: 3/7/06
• Final report: due on 3/12/06

• Pongo account will be ready soon.
Introduction to MT
        A brief history of MT
 (Based on work by John Hutchins)
• Before the computer: In the mid 1930s, a French-
  Armenian Georges Artsrouni and a Russian Petr
  Troyanskii applied for patents for ‘translating machines’.

• The pioneers (1947-1954): the first public MT demo was
  given in 1954 (by IBM and Georgetown University).

• The decade of optimism (1954-1966): ALPAC
  (Automatic Language Processing Advisory Committee)
  report in 1966: "there is no immediate or predictable
  prospect of useful machine translation."
   A brief history of MT (cont)
• The aftermath of the ALPAC report (1966-
  1980): a virtual end to MT research
• The 1980s: Interlingua, example-based
  MT
• The 1990s: Statistical MT
• The 2000s: Hybrid MT
          Where are we now?
• Huge potential/need due to the internet,
  globalization and international politics.

• Quick development time due to SMT, the
  availability of parallel data and computers.

• Translation is reasonable for language pairs with
  a large amount of resource.

• Start to include more “minor” languages.
            What is MT good for?
•   Rough translation: web data
•   Computer-aided human translation
•   Translation for limited domain
•   Cross-lingual IR

• Machine is better than human in:
    – Speed: much faster than humans
    – Memory: can easily memorize millions of word/phrase
      translations.
    – Manpower: machines are much cheaper than humans
    – Fast learner: it takes minutes or hours to build a new system.
      Erasable memory 
    – Never complain, never get tired, …
Major challenges in MT
           Translation is hard
•   Novels
•   Word play, jokes, puns, hidden messages
•   Concept gaps: go Greek, bei fen
•   Other constraints: lyrics, dubbing, poem,
    …
           Major challenges
• Getting the right words:
  – Choosing the correct root form
  – Getting the correct inflected form
  – Inserting “spontaneous” words

• Putting the words in the correct order:
  – Word order: SVO vs. SOV, …
  – Unique constructions:
  – Divergence
               Lexical choice
• Homonymy/Polysemy: bank, run

• Concept gap: no corresponding concepts in
  another language: go Greek, go Dutch, fen sui,
  lame duck, …

• Coding (Concept  lexeme mapping)
  differences:
  – More distinction in one language: e.g., kinship
    vocabulary.
  – Different division of conceptual space:
Choosing the appropriate inflection
• Inflection: gender, number, case, tense, …

• Ex:
  – Number: Ch-Eng: all the concrete nouns:
      ch_book  book, books
  – Gender: Eng-Fr: all the adjectives
  – Case: Eng-Korean: all the arguments
  – Tense: Ch-Eng: all the verbs:
     ch_buy  buy, bought, will buy
     Inserting spontaneous words
•   Function words:
     –   Determiners: Ch-Eng:
         ch_book  a book, the book, the books, books

     –   Prepositions: Ch-Eng:
         … ch_November  … in November

     –   Relative pronouns: Ch-Eng:
         … ch_buy ch_book de ch_person  the person who bought /book/

     –   Possessive pronouns: Ch-Eng:
         ch_he ch_raise ch_hand  He raised his hand(s)

     –   Conjunction: Eng-Ch:
         Although S1, S2  ch_although S1, ch_but S2

     –   …
Inserting spontaneous words (cont)
• Content words:
  – Dropped argument: Ch-Eng:
     ch_buy le ma  Has Subj bought Obj?

  – Chinese First name: Eng-Ch:
     Jiang …  ch_Jiang ch_Zemin …

  – Abbreviation, Acronyms: Ch-Eng:
     ch_12 ch_big  the 12th National Congress of the
    CPC (Communist Party of China)

  – …
            Major challenges
• Getting the right words:
  – Choosing the correct root form
  – Getting the correct inflected form
  – Inserting “spontaneous” words

• Putting the words in the correct order:
  – Word order: SVO vs. SOV, …
  – Unique construction:
  – Structural divergence
              Word order
• SVO, SOV, VSO, …
• VP + PP  PP VP
• VP + AdvP  AdvP + VP

• Adj + N  N + Adj
• NP + PP  PP NP
• NP + S  S NP

• P + NP  NP + P
      “Unique” Constructions
• Overt wh-movement: Eng-Ch:
  – Eng: Why do you think that he came yesterday?
  – Ch: you why think he yesterday come ASP?
  – Ch: you think he yesterday why come?

• Ba-construction: Ch-Eng
  – She ba homework finish ASP  She finished her
    homework.
  – He ba wall dig ASP CL hole  He digged a hole in
    the wall.
  – She ba orange peel ASP skin  She peeled the
    orange’s skin.
       Translation divergences
• Source and target parse trees
  (dependency trees) are not identical.

• Example: I like Mary  S: Marta me
  gusta a mi (‘Mary pleases me’)

• More discussion next time.
Major approaches
   How humans do translation?
• Learn a foreign language:              Training stage

  – Memorize word translations           Translation lexicon

  – Learn some patterns:                 Templates, transfer rules

  – Exercise:                            Reinforced learning?
     • Passive activity: read, listen    Reranking?
     • Active activity: write, speak
• Translation:                           Decoding stage
  – Understand the sentence              Parsing, semantics analysis?
  – Clarify or ask for help (optional)   Interactive MT?
  – Translate the sentence               Word-level? Phrase-level?
                                         Generate from meaning?
      What kinds of resources are
           available to MT?
• Translation lexicon:
   – Bilingual dictionary

• Templates, transfer rules:
   – Grammar books

• Parallel data, comparable data
• Thesaurus, WordNet, FrameNet, …

• NLP tools: tokenizer, morph analyzer, parser, …

 More resources for major languages, less for “minor”
 languages.
          Major approaches
•   Transfer-based
•   Interlingua
•   Example-based (EBMT)
•   Statistical MT (SMT)
•   Hybrid approach
       The MT triangle
                Meaning
               (interlingua)




              Transfer-based




          Phrase-based SMT, EBMT


          Word-based SMT, EBMT
word                               Word
                Transfer-based MT
•       Analysis, transfer, generation:
    1.    Parse the source sentence
    2.    Transform the parse tree with transfer rules
    3.    Translate source words
    4.    Get the target sentence from the tree


•       Resources required:
    –     Source parser
    –     A translation lexicon
    –     A set of transfer rules

•       An example: Mary bought a book yesterday.
      Transfer-based MT (cont)
• Parsing: linguistically motivated grammar or formal
  grammar?
• Transfer:
   – context-free rules? A path on a dependency tree?
   – Apply at most one rule at each level?
   – How are rules created?
• Translating words: word-to-word translation?
• Generation: using LM or other additional knowledge?

• How to create the needed resources automatically?
                 Interlingua
• For n languages, we need n(n-1) MT systems.
• Interlingua uses a language-independent
  representation.
• Conceptually, Interlingua is elegant: we only
  need n analyzers, and n generators.

• Resource needed:
  – A language-independent representation
  – Sophisticated analyzers
  – Sophisticated generators
            Interlingua (cont)
• Questions:
  – Does language-independent meaning representation
    really exist? If so, what does it look like?
  – It requires deep analysis: how to get such an
    analyzer: e.g., semantic analysis
  – It requires non-trivial generation: How is that done?
  – It forces disambiguation at various levels: lexical,
    syntactic, semantic, discourse levels.
  – It cannot take advantage of similarities between a
    particular language pair.
          Example-based MT
• Basic idea: translate a sentence by using the
  closest match in parallel data.
• First proposed by Nagao (1981).
• Ex:
  – Training data:
     • w1 w2 w3 w4  w1’ w2’ w3’ w4’
     • w5 w6 w7  w5’ w6’ w7’
     • w8 w9  w8’ w9’
  – Test sent:
     • w1 w2 w6 w7 w9  w1’ w2’ w6’ w7’ w9’
                  EMBT (cont)
• Types of EBMT:
  – Lexical (shallow)
  – Morphological / POS analysis
  – Parse-tree based (deep)


• Types of data required by EBMT systems:
  –   Parallel text
  –   Bilingual dictionary
  –   Thesaurus for computing semantic similarity
  –   Syntactic parser, dependency parser, etc.
                    EBMT (cont)
• Word alignment: using dictionary and heuristics
   exact match

• Generalization:
   – Clusters: dates, numbers, colors, shapes, etc.
   – Clusters can be built by hand or learned automatically.

• Ex:
   – Exact match: 12 players met in Paris last Tuesday 
                 12 Spieler trafen sich letzen Dienstag in Paris

   – Templates: $num players met in $city $time 
               $num Spieler trafen sich $time in $city
                    Statistical MT
• Basic idea: learn all the parameters from parallel data.

• Major types:
   – Word-based
   – Phrase-based

• Strengths:
   – Easy to build, and it requires no human knowledge
   – Good performance when a large amount of training data is
     available.

• Weaknesses:
   – How to express linguistic generalization?
             Comparison of resource requirement

                Transfer-   Interlingua   EBMT         SMT
                based
dictionary      +           +             +

Transfer        +
rules
parser          +           +             + (?)

semantic                    +
analyzer
parallel data                             +            +

others                      Universal      thesaurus
                            representation
                                 Hybrid MT
•   Basic idea: combine strengths of different approaches:
     –   Syntax-based: generalization at syntactic level
     –   Interlingua: conceptually elegant
     –   EBMT: memorizing translation of n-grams; generalization at various level.
     –   SMT: fully automatic; using LM; optimizing some objective functions.

•   Types of hybrid HT:
     – Borrowing concepts/methods:
          • SMT from EBMT: phrase-based SMT; Alignment templates
          • EBMT from SMT: automatically learned translation lexicon
          • Transfer-based from SMT: automatically learned translation lexicon, transfer rules;
            using LM
          • …

     – Using two MTs in a pipeline:
          • Using transfer-based MT as a preprocessor of SMT

     – Using multiple MTs in parallel, then adding a re-ranker.
Evaluation of MT
                       Evaluation
• Unlike many NLP tasks (e.g., tagging, chunking, parsing,
  IE, pronoun resolution), there is no single gold standard
  for MT.

• Human evaluation: accuracy, fluency, …
   – Problem: expensive, slow, subjective, non-reusable.


• Automatic measures:
   –   Edit distance
   –   Word error rate (WER), Position-independent WER (PER)
   –   Simple string accuracy (SSA), Generation string accuracy (GSA)
   –   BLEU
             Edit distance
• The Edit distance (a.k.a. Levenshtein
  distance) is defined as the minimal cost of
  transforming str1 into str2, using three
  operations (substitution, insertion,
  deletion).

• Use DP and the complexity is O(m*n).
            WER, PER, and SSA
• WER (word error rate) is edit distance, divided by |Ref|.
• PER (position-independent WER): same as WER but
  disregards word ordering
• SSA (Simple string accuracy) = 1 - WER

• Previous example:
   –   Sys: w1 w2 w3 w4
   –   Ref: w1 w3 w2
   –   Edit distance = 2
   –   WER=2/3
   –   PER=1/3
   –   SSA=1/3
 Generation string accuracy (GSA)

            Move  Ins  Del  Sub
  GSA  1 
                   | Re f |
Example:
  Ref: w1 w2 w3 w4
 Sys: w2 w3 w4 w1

Del=1, Ins=1  SSA=1/2
Move=1, Del=0, Ins=0  GSA=3/4
                          BLEU
• Proposal by Papineni et. al. (2002)
• Most widely used in MT community.
• BLEU is a weighted average of n-gram precision
  (pn) between system output and all references,
  multiplied by a brevity penalty (BP).
                    N
     BLEU  BP *  p n n
                     w

                   n 1

                                                  1
            BP * p1 * p 2 * ... p N
                   N                   ( when wn  )
                                                  N
              N-gram precision
• N-gram precision: the percent of n-grams in
  the system output that are correct.

• Clipping:
  –   Sys: the the the the the the
  –   Ref: the cat sat on the mat
  –   Unigram precision:
  –   Max_Ref_count: the max number of times a
      ngram occurs in any single reference translation.
      Countclip  min( count, Max _ Re f _ Count)
            N-gram precision

             Count
          SSys ngramS
                             clip   (ngram)
   pn 
              Count (ngram)
           SSys ngramS



i.e. the percent of n-grams in the system output
   that are correct (after clipping).
                    Brevity Penalty
• For each sent si in system output, find closest matching
  reference ri (in terms of length).

    Let   c   | s i |,   r   | ri |
                i               i



         1                if c  r
   BP   1r / c
        e                 otherwise

• Longer system output is already penalized by the n-gram
  precision measure.
                 An example
• Sys: The cat was on the mat
• Ref1: The cat sat on a mat
• Ref2: There was a cat on the mat

• Assuming N=3
• p1=5/6, p2=3/5, p3=1/4, BP=1  BLEU=0.50

• What if N=4?
                 Summary
• Course overview

• Major challenges in MT
  – Choose the right words (root form, inflection,
    spontaneous words)
  – Put them in right positions (word order, unique
    constructions, divergences)
              Summary (cont)
• Major approaches
  –   Transfer-based MT
  –   Interlingua
  –   Example-based MT
  –   Statistical MT
  –   Hybrid MT

• Evaluation of MT systems
  – Edit distance
  – WER, PER, SSA, GSA
  – BLEU
Additional slides
       Translation divergences
    (based on Bonnie Dorr’s work)
• Thematic divergence: I like Mary 
  S: Marta me gusta a mi (‘Mary pleases me’)

• Promotional divergence: John usually goes home 
  S: Juan suele ira casa (‘John tends to go home’)

• Demotional divergence: I like eating G: Ich esse gern
  (“I eat likingly)

• Structural divergence: John entered the house 
  S: Juan entro en la casa (‘John entered in the house’)
 Translation divergences (cont)
• Conflational divergence: I stabbed John 
  S: Yo le di punaladas a Juan (‘I gave knife-
  wounds to John’)

• Categorial divergence: I am hungry 
  G: Ich habe Hunger (‘I have hunger’)

• Lexical divergence: John broke into the room 
  S: Juan forzo la entrada al cuarto (‘John forced
  the entry to the room’)
       Calculating edit distance
• D(0, 0) = 0
• D(i, 0) = delCost * i
• D(0, j) = insCost * j

• D(i+1, j+1) =
   min( D(i,j) + sub,
         D(i+1, j) + insCost,
         D(i, j+1) + delCost)

 sub = 0           if str1[i+1]=str2[j+1]
     = subCost     otherwise
               An example
• Sys: w1 w2 w3 w4
• Ref: w1 w3 w2
• All three costs are 1.        w1   w3   w2


                            0   1    2    3
                       w1   1   0    1    2
                       w2   2   1    1    1
                       w3   3   2    1    2
                       w4   4   3    2    2
• Edit distance=2

								
To top