Machine Translation
A presentation by Julie Conlonova, Rob Chase, and Eric Pomerleau

Overview
- Language Alignment System
- Datasets
  - Sentence-aligned sets for training (e.g., the Hansards Corpus, the European Parliament Proceedings Parallel Corpus)
  - A word-aligned set for testing and evaluation, to measure accuracy and precision
- Decoding

Language Alignment
- Goal: produce a word-aligned set from a sentence-aligned dataset
- First step on the road toward Statistical Machine Translation
- Example problem:
    The motion to adjourn the House is now deemed to have been adopted.
    La motion portant que la Chambre s'ajourne maintenant est réputée adoptée.

IBM Models 1 and 2
(Kevin Knight, A Statistical MT Tutorial Workbook, 1999)
- Each can be used on its own to produce a word-aligned dataset.
- EM algorithm
  - Model 1 produces T-values based on normalized fractional counting of corresponding words.
  - Model 2 additionally uses A-values for "reverse distortion probabilities": probabilities based on the positions of the words.

Training Data
European Parliament Proceedings Parallel Corpus, 1996-2003
Aligned languages:
- English - French
- English - Dutch
- English - Italian
- English - Finnish
- English - Portuguese
- English - Spanish
- English - Greek

Training Data (cont.)
Eliminated:
- Misaligned sentences
- Sentences with 50 or more words
- XML tags
- Symbols and numerical characters other than commas and periods

Ideally…
http://www.cs.berkeley.edu/~klein/cs294-5

Bypassing Interlingua: Models I-III
Variables contributing to the probability of a sentence:
- Correlation between words in the source and target languages
- Fertility of a word
- Correlation between the order of words in the source sentence and the order of words in the target

A Translation Matrix

            Rob    Cat    is     Dog
    Rob     1      0      0      0
    Gato    0      1      0      0
    es      0      0      .5     0
    está    0      0      .5     0
    Perro   0      0      0      1

Building the Translation Matrix: Starting from Alignments
- Find the sentence alignment.
- If a word in the source aligns with a word in the target, increment the corresponding entry of the translation matrix.
- Normalize the translation matrix.

Can't Find Alignments
- Most sentences in the Hansards corpus are 60 words long.
- Many can be over 100 words.
- 100^100 possible alignments

Counting

    Rob is a boy.     Rob es niño.
    Rob is tall.      Rob es alto.
    Eric is tall.     Eric es alto.
    …                 …

- Base counts on co-occurrence, weighting based on sentence length.

Iterative Convergence
- Use the Expectation-Maximization (EM) algorithm
- Creates the translation matrix

            Rob    Is     Tall   boy
    Rob     .66    .33    .25    .25
    es      .30    .66    .25    .25
    alto    .2     .05    .5     0
    niño    .2     .05    0      .5

Distorting the Sentence
- Word order changes between languages
- How is a sentence with 2 words distorted?
- How is a sentence with 3 words distorted?
- How is a sentence with …
To keep track of this information we use…

A tesseract!
(A quadruply nested default dictionary)
- This could be a problem if there are more than 100 words in a sentence.
- 100 x 100 x 100 x 100 = too big for RAM, and it takes too much time

Broad Look at MT
"The translation process can be described simply as: 1. Decoding the meaning of the source text, and 2. Re-encoding this meaning in the target language."
("Translation Process", Wikipedia, May 2006)

Decoding
- How to go from the T-matrix and A-matrix to a word alignment?
- There are several approaches…

Viterbi
- If only doing alignment, memory and time requirements are much smaller.
- Returns the optimal path.
- T-matrix probabilities function as the "emission" matrix.
- A-matrix probabilities are concerned with the positioning of words.

Decoding as a Translator
- When no translated sentence is supplied, the program can act as a stand-alone translator instead of a word aligner.
- However, while the Viterbi algorithm runs quickly with pruning for decoding, for translating the run time skyrockets.
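
A minimal sketch of the "Counting" and "Iterative Convergence" slides, assuming IBM Model 1 only: T-values, no A-values, no NULL word, and a uniform starting table. The names `CORPUS`, `train_model1`, and `align` are hypothetical and are not taken from the presenters' system. Under Model 1 each target word is aligned independently, so the best alignment reduces to a per-word argmax over the T-values; Model 2 would also weigh the A-values at this step.

```python
from collections import defaultdict

# Toy sentence pairs from the "Counting" slide: (English, Spanish).
CORPUS = [
    ("Rob is a boy", "Rob es niño"),
    ("Rob is tall", "Rob es alto"),
    ("Eric is tall", "Eric es alto"),
]


def train_model1(pairs, iterations=20):
    """Estimate T-values t[(f, e)] = P(f | e) with EM (assumed setup)."""
    pairs = [(e.lower().split(), f.lower().split()) for e, f in pairs]
    f_vocab = {f for _, fs in pairs for f in fs}
    # Uniform start: every target word is equally likely for every source word.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # fractional counts for (f, e) pairs
        total = defaultdict(float)  # fractional counts per source word e
        for es, fs in pairs:
            for f in fs:
                # E-step: split one count for f among the co-occurring
                # source words, in proportion to the current T-values.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: normalize so that the T-values for each e sum to 1.
        new_t = defaultdict(float)
        for (f, e), c in count.items():
            new_t[(f, e)] = c / total[e]
        t = new_t
    return t


def align(t, e_sentence, f_sentence):
    """Pair each target word with its most probable source word."""
    es = e_sentence.lower().split()
    fs = f_sentence.lower().split()
    return [(f, max(es, key=lambda e: t[(f, e)])) for f in fs]


if __name__ == "__main__":
    t_values = train_model1(CORPUS)
    for f_word, e_word in align(t_values, "Eric is tall", "Eric es alto"):
        print(f_word, "->", e_word)
```

Storing the T-values in a flat dictionary keyed by word pair keeps only the pairs that actually co-occur, which is the same reason a nested default dictionary is attractive for the larger A-value table.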

Greedy Hill Climbing
(Knight & Koehn, What's New in Statistical Machine Translation, 2003)
- Best-first search
- 2-step look-ahead to avoid getting stuck in most probable local maxima

Beam Search
(Knight & Koehn, What's New in Statistical Machine Translation, 2003)
- Optimization of best-first search with heuristics and a "beam" of choices
- Exponential tradeoff when increasing the "beam" width

Other Decoding Methods
(Knight & Koehn, What's New in Statistical Machine Translation, 2003)
- Finite State Transducer
  - Mapping between languages based on a finite automaton
- Parsing
  - String to Tree Model

Problem: One to Many
- Necessary to take all alignments over a certain probability in order to capture the "probability that e has fertility at least a given value"
(Al-Onaizan, Curin, Jahr, et al., Statistical Machine Translation, 1999)

Results
Study done in 2003 on word alignment error rates in the Hansards corpus:

                         8K training       1.47M training
                         sentence pairs    sentence pairs
    Model 2              29.3%             19.5%
    Optimized Model 6    20.3%             8.7%

(Och and Ney, A Systematic Comparison of Various Statistical Alignment Models, 2003)

Expected Accuracy
- 70% overall
- Language performance:
  - Dutch
  - French
  - Italian, Spanish, Portuguese
  - Greek
  - Finnish

Possible Future Work
- Given more time, we would have implemented IBM Model 3.
- Additionally uses n, p, and d parameters for weighted alignments:
  - n: fertility, the number of words produced by one word
  - d: distortion
  - p: a parameter involving words that aren't involved directly
- Invokes Model 2 for scoring

Another Possible Translation Scheme
- Example-Based Machine Translation
- Translation-by-Analogy
- Can sometimes achieve better than the "gist" translations from other models

Why Is Improving Machine Translation Necessary?
A Chinese to English Translation

The End
Are there any questions/comments?