									Machine Translation

   A Presentation by:
    Julie Conlonova,
          Rob Chase,
  and Eric Pomerleau
Overview

  Language Alignment System
  Datasets
    Sentence-aligned sets for training (e.g., the
     Hansards Corpus, European Parliament
     Proceedings Parallel Corpus)
    A word-aligned set for testing and evaluation
     to measure accuracy and precision
  Decoding
Language Alignment

  Goal: Produce a word-aligned set from
   a sentence-aligned dataset
  First step on the road toward Statistical
   Machine Translation
  Example Problem:
    The motion to adjourn the House is now
     deemed to have been adopted.
    La motion portant que la Chambre s'ajourne
     maintenant est réputée adoptée.
IBM Models 1 and 2
-Kevin Knight, A Statistical MT Tutorial Workbook, 1999




            Each capable of being used to produce a
             word-aligned dataset separately.
            EM Algorithm
            Model 1 produces T-values based on
             normalized fractional counting of
             corresponding words.
            Additionally, Model 2 uses A-values for
             “reverse distortion probabilities” –
             probabilities based on the positions of the
             words
Training Data
 European Parliament Proceedings Parallel
  Corpus 1996-2003
 Aligned Languages:
  English - French
  English - Dutch
  English - Italian
  English - Finnish
  English - Portuguese
  English - Spanish
  English - Greek
Training Data cont.

Eliminated
  Misaligned sentences
  Sentences with 50 or more words
  XML tags
  Symbols and numerical characters other than
   commas and periods
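
A minimal sketch of the filtering listed above. The regular expressions and
the names clean and filter_pairs are assumptions made for illustration; the
slides do not show the actual cleaning code.

```python
import re

MAX_WORDS = 50  # sentences with 50 or more words are dropped, per the slide

TAG = re.compile(r"<[^>]+>")
# Assumption: "symbols and numerical characters" means anything that is not
# a letter, whitespace, a comma, or a period.
JUNK = re.compile(r"[^\w\s,.]|[\d_]")

def clean(sentence):
    """Strip XML tags, digits, and symbols other than commas and periods."""
    no_tags = TAG.sub(" ", sentence)
    return " ".join(JUNK.sub(" ", no_tags).split())

def filter_pairs(pairs):
    """Yield cleaned (source, target) pairs that pass the length filter."""
    for src, tgt in pairs:
        src, tgt = clean(src), clean(tgt)
        if not src or not tgt:
            continue  # assumption: an empty side signals a misaligned pair
        if len(src.split()) >= MAX_WORDS or len(tgt.split()) >= MAX_WORDS:
            continue
        yield src, tgt
```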
Ideally…




           http://www.cs.berkeley.edu/~klein/cs294-5
 Bypassing Interlingua: Models I-III

Variables contributing to the probability
 of a sentence:
 Correlation between words in the
   source/target languages
 Fertility of a word
 Correlation between order of words in
   source sentence and order of words
   in target
A Translation Matrix
        Rob    Cat     is   Dog

Rob     1      0       0    0

Gato    0      1       0    0

es      0      0       .5   0

esta    0      0       .5   0

Perro   0      0       0    1
Building the Translation Matrix: Starting
from alignments


Find the sentence alignment
If a word in the source aligns with a word
 in the target, then increment the
 translation matrix.
Normalize the translation matrix
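
The three steps above can be sketched as follows, assuming the word
alignments are already known. The input layout (lists of index pairs) and
the function name are hypothetical.

```python
from collections import defaultdict

def build_translation_matrix(aligned_pairs):
    """aligned_pairs: iterable of (source_words, target_words, links), where
    links is a list of (source_index, target_index) pairs.
    Returns t[source_word][target_word] = P(target_word | source_word)."""
    counts = defaultdict(lambda: defaultdict(float))
    for src, tgt, links in aligned_pairs:
        for i, j in links:
            counts[src[i]][tgt[j]] += 1.0  # increment the matrix cell
    t = {}
    for s, row in counts.items():
        total = sum(row.values())
        t[s] = {w: c / total for w, c in row.items()}  # normalize each row
    return t

data = [
    (["Rob", "is", "tall"], ["Rob", "es", "alto"], [(0, 0), (1, 1), (2, 2)]),
    (["the", "cat", "is", "here"], ["el", "gato", "esta", "aqui"],
     [(0, 0), (1, 1), (2, 2), (3, 3)]),
]
t = build_translation_matrix(data)
print(t["is"])  # "is" splits evenly between "es" and "esta"
```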
Can’t find alignments




Most sentences in the Hansards corpus
 are about 60 words long. Many
 run over 100.
100^100 possible alignments
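
The count above follows if each of the 100 target words may align to any one
of the 100 source words. A quick check of its size (the function name is
illustrative only):

```python
def alignment_count(source_len, target_len):
    """Each target word picks one source word independently."""
    return source_len ** target_len

n = alignment_count(100, 100)
print(len(str(n)))  # number of decimal digits; prints 201
```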
Counting

Rob is a boy.     Rob es nino.
Rob is tall.      Rob es alto.
Eric is tall.     Eric es alto.
   …                    …
Base counts on co-occurrence, weighting
 based on sentence length.
Iterative Convergence
 Use the Expectation Maximization (EM)
  algorithm
 Creates translation matrix

        Rob    is     tall   boy
Rob     .66    .33    .25    .25
es      .30    .66    .25    .25
alto    .2     .05    .5     0
nino    .2     .05    0      .5
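
A minimal sketch of EM training for IBM Model 1 with normalized fractional
counts, in the style of Knight's workbook. It assumes no NULL word and a
uniform starting table; the name train_model1 is hypothetical, and the
numbers it produces will not match the illustrative table above.

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """pairs: list of (english_words, foreign_words).
    Returns t[(f, e)], an estimate of P(f | e)."""
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for es, fs in pairs:
            for f in fs:
                # E-step: split one count for f among the e's in the sentence
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / z
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize the fractional counts per English word
        t = defaultdict(float, {(f, e): c / total[e]
                                for (f, e), c in count.items()})
    return t

pairs = [
    ("Rob is a boy".split(), "Rob es nino".split()),
    ("Rob is tall".split(), "Rob es alto".split()),
    ("Eric is tall".split(), "Eric es alto".split()),
]
t = train_model1(pairs)
```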
Distorting the Sentence

Word order changes between languages
How is a sentence with 2 words distorted?
How is a sentence with 3 words distorted?
How is a sentence with       …

To keep track of this information we use…
A tesseract!

(A quadruply nested default
 dictionary)
This could be a problem if there
 are more than 100 words in a
 sentence.
100x100x100x100 = too big for
 RAM and takes too much time
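
A sketch of the quadruply nested default dictionary, assuming the four keys
are source length, target length, target position, and source position. The
key order is an assumption; the slides do not specify it.

```python
from collections import defaultdict

def make_a_table():
    """a[l][m][j][i]: probability that target position j aligns to source
    position i, given source length l and target length m."""
    return defaultdict(
        lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(float))))

a = make_a_table()
a[3][3][1][1] += 0.5  # only entries that are touched are stored

# A dense table for sentences of up to 100 words would need this many cells:
dense_cells = 100 ** 4  # 100,000,000
```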
Broad Look at MT

 “The translation process can be
  described simply as:
  1. Decoding the meaning of the source text, and
  2. Re-encoding this meaning in the target
     language.”
  - “Translation Process”, Wikipedia, May 2006
Decoding

How to go from the T-matrix and A-matrix
 to a word alignment?




There are several approaches…
Viterbi

   If only doing alignment, much smaller
    memory and time requirements.
   Returns optimal path.

   T-Matrix probabilities function as the
    “emission” matrix
   A-Matrix probabilities concerned with
    the positioning of words
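
For Models 1 and 2 the most probable alignment can be picked one target word
at a time, since each word's link is scored independently. The sketch below
relies on that property rather than a full Viterbi lattice; the dictionary
layout for the T-values and the callable for the A-values are hypothetical
interfaces.

```python
def best_alignment(src, tgt, t, a):
    """For each target position j, return the source position i that
    maximizes t(tgt[j] | src[i]) * a(i | j, l, m).
    t: dict keyed by (target_word, source_word)
    a: function (i, j, l, m) -> probability (hypothetical interface)"""
    l, m = len(src), len(tgt)
    alignment = []
    for j, f in enumerate(tgt):
        scores = [t.get((f, e), 0.0) * a(i, j, l, m)
                  for i, e in enumerate(src)]
        alignment.append(max(range(l), key=scores.__getitem__))
    return alignment

t = {("Rob", "Rob"): 0.66, ("es", "Rob"): 0.30,
     ("es", "is"): 0.66, ("alto", "tall"): 0.5}
uniform = lambda i, j, l, m: 1.0 / l  # Model 1 behaves like a uniform A-table
print(best_alignment(["Rob", "is", "tall"], ["Rob", "es", "alto"], t, uniform))
# prints [0, 1, 2]
```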
Decoding as a Translator

   Without supplying a translated sentence
    to the program, it is capable of being a
    stand-alone translator instead of a word
    aligner.

   However, while the Viterbi algorithm with pruning
    runs quickly for alignment, its run time
    skyrockets for translation.
Greedy Hill Climbing
Knight & Koehn, What’s New in Statistical Machine Translation, 2003




Best first search
2-step look ahead to avoid getting stuck in
 most probable local maxima
Beam Search
Knight & Koehn, What’s New in Statistical Machine Translation, 2003




Optimization of Best First Search with
 heuristics and “beam” of choices
Exponential tradeoff when increasing the
 “beam” width
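
A toy sketch of beam search for translation, heavily simplified: it decodes
word by word with no reordering or fertility, and it scores hypotheses with a
bigram language model passed in as a function. The candidates table, the lm
function, and the name beam_translate are all hypothetical.

```python
import math

def beam_translate(src, candidates, lm, width=3):
    """candidates: dict source_word -> {target_word: t-probability}
    lm: function (previous_word, word) -> bigram probability"""
    beam = [(0.0, [])]  # (log-probability, partial translation)
    for s in src:
        expanded = []
        for logp, words in beam:
            prev = words[-1] if words else "<s>"
            for w, p in candidates[s].items():
                score = logp + math.log(p) + math.log(lm(prev, w))
                expanded.append((score, words + [w]))
        # keep only the `width` best partial translations
        beam = sorted(expanded, key=lambda h: h[0], reverse=True)[:width]
    return beam[0][1]

candidates = {"Rob": {"Rob": 1.0},
              "is": {"es": 0.5, "esta": 0.5},
              "tall": {"alto": 1.0}}
lm = lambda prev, w: 0.9 if (prev, w) == ("es", "alto") else 0.1
print(beam_translate(["Rob", "is", "tall"], candidates, lm))
# prints ['Rob', 'es', 'alto']
```

A wider beam keeps more hypotheses alive at each step, which costs time but
lowers the chance of discarding the eventual best translation early.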
Other Decoding Methods
Knight & Koehn, What’s New in Statistical Machine Translation, 2003




Finite State Transducer
      Mapping between languages based on a finite
       automaton
Parsing
      String to Tree Model
Problem: One to Many

Necessary to take all alignments over a
 certain probability in order to capture the
 “probability that e has fertility at least a
 given value”




  Al-Onaizan, Curin, Jahr, et al., Statistical Machine Translation, 1999
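
One way to keep every link over a certain probability is sketched here. The
threshold value and the function name are assumptions, not taken from the
cited report.

```python
def links_above(src, tgt, t, threshold=0.2):
    """Return every (source_index, target_index) link whose t-probability is
    at least `threshold`, so one source word may link to several targets.
    t: dict keyed by (target_word, source_word)"""
    return [(i, j)
            for i, e in enumerate(src)
            for j, f in enumerate(tgt)
            if t.get((f, e), 0.0) >= threshold]

t = {("ne", "not"): 0.4, ("pas", "not"): 0.5}
print(links_above(["not"], ["ne", "pas"], t))  # prints [(0, 0), (0, 1)]
```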
Results

   Study done in 2003 on word alignment
    error rates in Hansards corpus:
       Model 2 –
           29.3% on 8K training sentence pairs
           19.5% on 1.47M training sentence pairs
       Optimized Model 6 –
           20.3% on 8K training sentence pairs
           8.7% on 1.47M training sentence pairs
   Och and Ney, A Systematic Comparison of Various Statistical Alignment
     Models, 2003
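
The error rates above are alignment error rates (AER) as defined by Och and
Ney: AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|), where A is the predicted link
set, S the sure reference links, and P the possible reference links. A small
sketch, with hypothetical argument names:

```python
def alignment_error_rate(predicted, sure, possible):
    """Each argument is a collection of (source_index, target_index) links."""
    a, s = set(predicted), set(sure)
    p = set(possible) | s  # sure links also count as possible links
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))

print(alignment_error_rate({(0, 0), (1, 2)}, {(0, 0), (1, 1)}, set()))
# prints 0.5
```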
Expected Accuracy


   70% overall
   Language performance:
      Dutch
         French
            Italian, Spanish, Portuguese
                  Greek
                       Finnish
Possible Future Work

    Given more time, we would’ve implemented IBM
     Model 3
    Additionally uses n, d, and p parameters for weighted
     alignments:
       n, fertility: number of words produced by one word
       d, distortion
       p, probability of spurious words that no source word produces directly
    Invokes Model 2 for scoring
Another Possible Translation Scheme

Example-Based Machine Translation
  Translation-by-Analogy
  Can sometimes achieve better than the “gist”
   translations from other models
Why Is Improving Machine
  Translation Necessary?
A Chinese to English Translation
         The End
        Are there any
questions/comments?

								