2nd summary Caput (track C)                                                  Jeroen Latour (0021202)

Learning Rules to Improve a Machine Translation System
This article takes a different approach to machine translation than most articles in its field. Instead of
building a translation system from the ground up, it proposes an algorithm that improves the output of
an existing machine translation system, in particular one that operates at the word level. Word-level
MT systems translate by going through the sentence and looking each word up in their lexicon.
Although these systems often provide bidirectional translation, the lexicons for the two directions are
often built separately, so the two directions can disagree. The algorithm described in this article
generates both context-independent and context-dependent rules that compensate for this discrepancy.

Basically, the algorithm works as follows:
   1. Find mistakes: find a word in the sentence that is translated incorrectly
   2. Find corrections: find the correct translation for the incorrectly translated word
   3. Learn correction rules: generate a correction function that produces the correct translation
        given the input word, the incorrect translation and the context

The first two steps can be considered data generation steps. In the implementation described, they are
performed by letting the translation system translate an entire word list. The algorithm works under the
assumption that words that are copied unchanged to the translated sentence could not be translated
by the system. Using f(w) to signify the result of translating word w from L1 to L2 and f’(w) for the
reverse translation, the following cases can be identified:
    1. w = f’(f(w)) ≠ f(w): the word w is translated to a different string, f(w), and translated back to the
         original word. In general, this means that the translation is correct.
    2. w = f(w) ≠ f’(f(w)): the word w could apparently not be translated to L2, but could be translated
         back to L1. This happens when w is a word in both languages, which the translator f is unable
         to translate. This only tells us that f’(f(w)) should be translated to f(w), which may or may not
         be already known.
    3. w ≠ f(w) = f’(f(w)): the word w could be translated to L2, but apparently not back to L1. This can
         occur in two situations:
              a. the translator f’ is unable to translate f(w) to L1. However, we now know that f(w)
                  should be translated to w, which we can add as a rule.
              b. f’(f(w)) is actually the correct translation for f(w). This tells us that f(w) is an ambiguous
                  word that can be translated as either w or f’(f(w)).
    4. w = f(w) = f’(f(w)): the word w is left unchanged. This again can occur in two situations:
              a. f(w) is actually the correct translation for w, and vice versa. No information is gained.
              b. the system is unable to translate w in either direction. This means that the translation
                  will be incorrect, but we are unable to improve it.
    5. w ≠ f(w) ≠ f’(f(w)) ≠ w: none of the words are the same. This happens when:
              a. w is a synonym for f’(f(w)). In this case, both w and f’(f(w)) are appropriate
                  translations for f(w). These alternatives can be disambiguated using context.
              b. there is at least one error in the translation. No information can be extracted.
Finally, we can conclude that if f(w) ≠ f’(f(w)), then f’(f(w)) is apparently a word in L1, and can be added
to the word list.
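
The five cases above can be sketched as a small classification function. This is only an illustration, not the article's implementation: the function names, the helper make_translator and the toy English/Dutch lexicons are all invented here, and the translators are assumed to return a word unchanged when they cannot translate it.

```python
def make_translator(lexicon):
    """Build a toy word-level translator from a dict (hypothetical helper).

    Words missing from the lexicon are copied unchanged, mirroring the
    assumption that an untranslatable word is left as it is.
    """
    return lambda word: lexicon.get(word, word)


def classify_round_trip(w, f, f_back):
    """Return the case number (1-5) for word w.

    f stands for the L1 -> L2 translator and f_back for the reverse
    translator f'. Both are assumed to be plain callables on single words.
    """
    fw = f(w)            # f(w)
    back = f_back(fw)    # f'(f(w))
    if w == back and w != fw:
        return 1  # round trip returns to w: translation is probably correct
    if w == fw and fw != back:
        return 2  # not translated to L2, but translated back to L1
    if w != fw and fw == back:
        return 3  # translated to L2, but not back to L1
    if w == fw and fw == back:
        return 4  # left unchanged in both directions
    return 5      # w, f(w) and f'(f(w)) all differ


# Invented toy lexicons, for illustration only.
en_nl = make_translator({"house": "huis", "car": "auto", "sofa": "bank"})
nl_en = make_translator({"huis": "house", "bank": "bench"})

for word in ["house", "bank", "car", "laptop", "sofa"]:
    print(word, classify_round_trip(word, en_nl, nl_en))
```

With these toy lexicons the five words fall into cases 1 to 5 in that order. The sub-cases (a and b) cannot be told apart from the strings alone, which is why the function only returns the main case number.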

The information extracted during the data generation steps can be used to generate rules. If there is
only one possible translation, or only one of the alternatives occurs in a text and it occurs at least k
times (in other words, it is k-dominant), then a context-independent rule can be produced for this
translation. Otherwise, likelihood ratio tests have to be used to find the context (currently stored as a
bag of words) in which each of the alternatives occurs, and a context-dependent rule is added that
takes this context into account when determining the correct translation.
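
The choice between the two kinds of rules can be sketched as follows. This is a minimal sketch under assumptions: the names k_dominant and rule_kind are invented, the text is assumed to be already split into tokens, and every situation that does not qualify for a context-independent rule is simply labelled context-dependent. The likelihood ratio tests themselves are not shown.

```python
from collections import Counter


def k_dominant(alternatives, tokens, k):
    """Return the alternative that is k-dominant in tokens, or None.

    An alternative is treated as k-dominant when it is the only one of
    the alternatives that occurs in the text and it occurs at least k times.
    """
    counts = Counter(t for t in tokens if t in alternatives)
    if len(counts) == 1:
        word, n = next(iter(counts.items()))
        if n >= k:
            return word
    return None


def rule_kind(alternatives, tokens, k):
    """Decide which kind of rule to learn for one ambiguous word.

    Simplifying assumption: anything that is not eligible for a
    context-independent rule is reported as context-dependent.
    """
    if len(alternatives) == 1:
        return "context-independent"
    if k_dominant(alternatives, tokens, k) is not None:
        return "context-independent"
    return "context-dependent"


# Invented example: two alternative translations and two short texts.
alternatives = {"bank", "bench"}
finance_text = "the bank raised the bank rate at the bank".split()
mixed_text = "he left the bank and sat on a bench".split()

print(rule_kind(alternatives, finance_text, k=2))
print(rule_kind(alternatives, mixed_text, k=2))
```

In the first text only "bank" occurs, and it occurs three times, so a context-independent rule is enough. In the second text both alternatives occur, so the context has to decide.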

One of the advantages of the described rule learning method is that it is robust to erroneous words in
the list, since these words do not cause a rule to be generated. This allows us to take all the words in
a corpus and add them to the word list, producing a word list containing more domain-specific words.

Experiments with this algorithm showed the generated context-independent rules to have a precision
of 99% and the context-dependent rules a precision of 79%. The precision can be further improved
using alternate rule representations and alternate collocation techniques.

Analysis of argumentation
The argumentation I found was mostly in the interpretation of the various cases, as described
above and in section 3 of the article. My main problem is with the assumption given at the
bottom of page 3:
       (...), we will assume that equality implies that the system could not translate the word.

This assumption does not take into account the possibility of a word having the same meaning
in both languages, which is certainly possible with languages that are closely related.
Throughout the discussion of the various cases, the authors do sometimes mention this
possibility, but not consistently. Moreover, when making this assumption, they did not explain
why they are ignoring this possibility, and why it will not affect their results.

As for the further analysis of the various cases, I have no comment about the validity of these
conclusions. Finally, the article leaves it up to the user to draw a conclusion from the test
results, and does not evaluate them. However, the methods used to evaluate the algorithm
are justified.
