
Grammatical Machine Translation


Stefan Riezler & John Maxwell

1.   Introduction
2.   Extracting F-Structure Snippets
3.   Parsing-Transfer-Generation
4.   Statistical Models and Training
5.   Experimental Evaluation
6.   Discussion
 Section 1:
Introduction
Recent approaches to SMT use
• Phrase-based SMT
• Syntactic knowledge

Phrase-based SMT is great for
• Local ordering
• Short idiomatic expressions

But not so good for
• Learning long-distance dependencies (LDDs)
• Generalising to unseen phrases that share non-overt
  linguistic information
            Statistical Parsers
Statistical Parsers can provide information to
• Resolve LDDs
• Generalise to unseen phrases that share non-overt
  linguistic info

• Xia & McCord 2004
• Collins et al. 2005

• Lin 2004
• Ding & Palmer 2005
• Quirk et al. 2005
     Grammar-based Generation
      Could grammar-based generation be useful for MT?

Quirk et al. 2005
• A simple statistical model outperforms the grammar-based generator of
  Menezes & Richardson 2001 on BLEU score

Charniak et al. 2003
• Parsing-based language modelling can improve grammaticality of
  translations while not improving BLEU score

  Perhaps the BLEU score alone is not a sufficient test of grammaticality.
  Further investigation is needed.
  Grammatical Machine Translation
   Investigate incorporating a grammar-based generator into a
   dependency-based SMT system

The authors present:
• A dependency-based SMT model
• Statistical components that are modelled on the phrase-based system
   of Koehn et al. 2003

Also used:
• Component weights adjusted using MER training (Och 2003)
• Grammar-based generator
• N-gram and distortion models
      Section 2:
Extracting F-Structure Snippets
  SL and TL sentences of bilingual corpus parsed using
  LFG grammars

  For each English and German f-structure pair
  • The two f-structures that most preserve dependencies
     are selected
  • Many-to-many word alignments used to create many-to-
     many correspondences between the substructures
  • Correspondences are the basis for deciding what goes
     into the basic transfer rule
Transfer Rule Extraction: Example
Dafür bin ich zutiefst dankbar      I have a deep appreciation for that
<for that> <am> <I> <deepest> <thankful>

[Figure: many-to-many bidirectional word alignment]
[Figure: substructure correspondences derived from the aligned words]

From the correspondences two kinds of transfer rules are extracted:
1. Primitive Transfer Rules
2. Complex Transfer Rules

Transfer Contiguity Constraint
1. Source and target f-structures are each connected.
2. F-structures in the transfer source can only be aligned
   with f-structures in the transfer target and vice versa.
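
The two conditions of the contiguity constraint can be checked mechanically. The following is a minimal sketch, not the authors' implementation: f-structures are reduced to node names, dependencies to undirected edges, and the word alignment to a set of (source, target) links. The function names and the toy alignment are assumptions made for illustration only.

```python
from collections import deque


def is_connected(nodes, edges):
    """Return True if `nodes` form a single connected piece under `edges`."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for a, b in edges:
            # Treat every dependency edge as undirected.
            for here, there in ((a, b), (b, a)):
                if here == current and there in nodes and there not in seen:
                    seen.add(there)
                    queue.append(there)
    return seen == nodes


def satisfies_contiguity(src_nodes, tgt_nodes, src_edges, tgt_edges, links):
    """Check both conditions of the transfer contiguity constraint."""
    src_nodes, tgt_nodes = set(src_nodes), set(tgt_nodes)
    # Condition 1: source and target sides are each connected.
    if not is_connected(src_nodes, src_edges):
        return False
    if not is_connected(tgt_nodes, tgt_edges):
        return False
    # Condition 2: no alignment link leaves the candidate rule on one side only.
    return all((s in src_nodes) == (t in tgt_nodes) for s, t in links)


# Toy data loosely based on the example sentence (alignment links are assumed).
src_edges = [("sein", "ich"), ("sein", "dankbar"), ("dankbar", "zutiefst")]
tgt_edges = [("have", "I"), ("have", "appreciation"), ("appreciation", "deep")]
links = {("sein", "have"), ("ich", "I"), ("dankbar", "appreciation"),
         ("zutiefst", "deep"), ("zutiefst", "appreciation")}

ok = satisfies_contiguity({"dankbar", "zutiefst"}, {"appreciation", "deep"},
                          src_edges, tgt_edges, links)
too_small = satisfies_contiguity({"dankbar"}, {"appreciation"},
                                 src_edges, tgt_edges, links)
```

With these assumed links, the pair {dankbar, zutiefst} / {appreciation, deep} passes, while {dankbar} / {appreciation} fails because zutiefst is also linked to appreciation.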
Primitive Rule 1:
    pred( X1, sein)          ==>   pred( X1, have)
    subj( X1, X2)                  subj( X1, X2)
    xcomp( X1, X3)                 obj( X1, X3)

Primitive Rule 2:
    pred( X1, ich)           ==>   pred( X1, I)

Primitive Rule 3:
    pred( X1, dafür)         ==>   pred( X1, for)
                                   obj( X1, X2)
                                   pred( X2, that)

Primitive Rule 4:
    pred( X1, dankbar)       ==>   pred( X1, appreciation)
    adj( X1, X2)                   spec( X1, X2)
    in_set( X3, X2)                pred( X2, a)
    pred( X3, zutiefst)            adj( X1, X3)
                                   in_set( X4, X3)
                                   pred( X4, deep)
 Complex Transfer Rules
 •  Primitive transfer rules that are adjacent in the f-structure are
    combined to form more complex rules
 Example (rules 1 & 2 above):
    pred( X1, sein)          ==>   pred( X1, have)
    subj( X1, X2)                  subj( X1, X2)
    pred( X2, ich)                 pred( X2, I)
    xcomp( X1, X3)                 obj( X1, X3)

In the worst case there can be an exponential number of combinations of
primitive transfer rules. The number of primitive rules used to form a
complex rule is therefore restricted to 3, which keeps the number of
extracted transfer rules to O(n^2) in the worst case.
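
The combination step can be pictured as enumerating connected groups of at most three primitive rules. The sketch below is one possible reading, not the authors' code. Rules are identified by their numbers, and the adjacency between rules (1-2, 1-4, 3-4) is an assumption made for the example; only the 1-2 adjacency is given in the slides.

```python
from itertools import combinations


def connected(subset, adjacent):
    """Return True if the rules in `subset` are linked through adjacency."""
    subset = set(subset)
    start = next(iter(subset))
    seen, stack = {start}, [start]
    while stack:
        rule = stack.pop()
        for other in subset - seen:
            if frozenset((rule, other)) in adjacent:
                seen.add(other)
                stack.append(other)
    return seen == subset


def combine_rules(rule_ids, adjacent_pairs, max_size=3):
    """List every connected group of at most `max_size` primitive rules."""
    adjacent = {frozenset(pair) for pair in adjacent_pairs}
    groups = []
    for size in range(1, max_size + 1):
        for subset in combinations(sorted(rule_ids), size):
            if connected(subset, adjacent):
                groups.append(subset)
    return groups


# Assumed adjacency between the four primitive rules of the example.
groups = combine_rules([1, 2, 3, 4], [(1, 2), (1, 4), (3, 4)])
```

Groups of size one are the primitive rules themselves; larger groups such as (1, 2) correspond to complex rules like the one shown above.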
         Section 3:
Parsing-Transfer-Generation
• LFG grammars used to parse source and
  target text

• A FRAGMENT grammar augments the standard
  grammar to increase robustness
• Correct parse determined by the fewest-chunk method
• Rules applied to source f-structure non-
  deterministically and in parallel
• Each fact of German f-structure translated
  by exactly one transfer rule
• Default rule included that allows any fact
  to be translated as itself
• Chart used to encode translations
• Beam search decoding used to select the
  most probable translations
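
The role of the default rule can be shown with a deliberately simplified sketch. Unlike the system described above, which applies rules non-deterministically and in parallel and encodes the results in a chart, this version is greedy: it takes the first match of each rule and passes every leftover fact through unchanged. The fact tuples, the node names f1 to f3, and all function names are assumptions for illustration.

```python
def is_var(term):
    """Variables are written X1, X2, ... as in the transfer rules above."""
    return term.startswith("X") and term[1:].isdigit()


def match(patterns, facts, binding=None, used=()):
    """Yield (binding, used_facts) for each way to match all patterns."""
    binding = binding or {}
    if not patterns:
        yield binding, used
        return
    (attr, a, b), rest = patterns[0], patterns[1:]
    for fact in facts:
        if fact in used or fact[0] != attr:
            continue
        new = dict(binding)
        ok = True
        for term, value in ((a, fact[1]), (b, fact[2])):
            if is_var(term):
                if new.setdefault(term, value) != value:
                    ok = False
            elif term != value:
                ok = False
        if ok:
            yield from match(rest, facts, new, used + (fact,))


def transfer(facts, rules):
    """Greedy transfer: first match per rule, default rule for the rest."""
    remaining = list(facts)
    output = []
    for source, target in rules:
        for binding, used in match(source, remaining):
            for pattern in target:
                output.append(tuple(binding.get(t, t) for t in pattern))
            remaining = [f for f in remaining if f not in used]
            break  # simplification: take only the first match
    output.extend(remaining)  # default rule: translate a fact as itself
    return output


rule1 = ([("pred", "X1", "sein"), ("subj", "X1", "X2"), ("xcomp", "X1", "X3")],
         [("pred", "X1", "have"), ("subj", "X1", "X2"), ("obj", "X1", "X3")])
rule2 = ([("pred", "X1", "ich")], [("pred", "X1", "I")])

german = [("pred", "f1", "sein"), ("subj", "f1", "f2"), ("xcomp", "f1", "f3"),
          ("pred", "f2", "ich"), ("pred", "f3", "dankbar")]
english = transfer(german, [rule1, rule2])
```

Here the fact for dankbar has no matching rule, so it is copied to the output as is, which is how the default rule keeps the transfer step from failing on unseen material.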
Method of generation has to be fault tolerant
• Transfer system can be given a fragmentary
  parse as input
• Transfer system can output a non-valid f-structure
• Unknown predicates
  – Default morphology used to inflect source stem for
    the target language
• Unknown structures
  – Default grammar used that allows any attribute to be
    generated in any order with any category
          Section 4:
Statistical Models & Training
        Statistical Components
Modelled on statistical components of Pharaoh

Pharaoh integrates 8 statistical models
1. Relative frequency of phrase translations in source-to-target direction
2. Relative frequency of phrase translations in target-to-source direction
3. Lexical weighting in source-to-target
4. Lexical weighting in target-to-source
5. Phrase count
6. Language model probability
7. Word count
8. Distortion probability
        Statistical Components
The following statistics are computed for each translation:
1. Log-probability of source-to-target transfer rules,
     where the probability r(e|f) of a rule that transfers
     source snippet f into target snippet e is estimated by
     the relative frequency

 2. Log-probability of target-to-source rules
3. Log-probability of lexical translations from
     source to target snippets, estimated from Viterbi
     alignments â between source word positions i = 1, …,
     n and target word positions j = 1, …, m for stems f_i
     and e_j in snippets f and e with relative word
     translation frequencies t(e_j|f_i)

4. Log-probability of lexical translations from target to
     source snippets
5. Number of transfer rules
6. Number of transfer rules with frequency 1
7. Number of default transfer rules
8. Log-probability of strings of predicates from root to
   frontier of target f-structure, estimated from predicate
   trigrams of English
9. Number of predicates in target language
10. Number of constituent movements during generation
   based on the original order of the head predicates of the
   constituents (for example, AP[2] BP[3] CP[1] counts as
   two movements since the head predicate of CP moved
   from first to third position)
11. Number of generation repairs
12. Log-probability of target string as computed by trigram language
    model
13. Number of words in target string
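
Features 1 and 2 rely on relative-frequency estimates of rule probabilities. The formula itself is not reproduced in these notes, so the sketch below assumes the usual reading: the count of a rule f ==> e divided by the count of all extracted rules with the same source snippet f. The counts and the function name are invented for illustration.

```python
import math
from collections import Counter


def rule_log_probs(rule_instances):
    """Map each (source, target) rule to log r(target | source)."""
    rule_instances = list(rule_instances)
    pair_counts = Counter(rule_instances)
    source_counts = Counter(source for source, _ in rule_instances)
    return {
        (source, target): math.log(count / source_counts[source])
        for (source, target), count in pair_counts.items()
    }


# Invented counts: "sein" was extracted 3 times as "have" and once as "be".
observed = [("sein", "have")] * 3 + [("sein", "be")] + [("ich", "I")] * 2
log_probs = rule_log_probs(observed)
```

The target-to-source direction of feature 2 follows from the same function by swapping the two elements of each pair before counting.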

• 1 – 10 are used to choose the most probable parse from the transfer
  chart
• 1 – 7 are tests on source and target f-structure snippets related
  via transfer rules
• 8 – 10 are language model and distortion features on the target c-
  and f-structures
• 11 – 13 are computed on the strings that are generated from the
  target f-structure

   The statistics are combined into a log-linear model whose
   parameters are adjusted by minimum error rate training.
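
A log-linear model scores a candidate as a weighted sum of its feature values and prefers the candidate with the highest score. The sketch below shows only this scoring step. The feature names, values, and weights are invented, and the weights would in practice come from minimum error rate training, which is not shown.

```python
def loglinear_score(features, weights):
    """Weighted sum of feature values (log-probabilities and counts)."""
    return sum(weights[name] * value for name, value in features.items())


def best_candidate(candidates, weights):
    """Return the candidate translation with the highest model score."""
    return max(candidates, key=lambda c: loglinear_score(c["features"], weights))


# Invented weights and candidates for illustration only.
weights = {"rule_logprob": 1.0, "lm_logprob": 1.0, "num_words": 0.1}
candidates = [
    {"text": "candidate A",
     "features": {"rule_logprob": -2.0, "lm_logprob": -10.0, "num_words": 7}},
    {"text": "candidate B",
     "features": {"rule_logprob": -3.0, "lm_logprob": -8.0, "num_words": 7}},
]
best = best_candidate(candidates, weights)
```

Changing the weights changes which candidate wins, which is why the choice of training criterion for the weights matters.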
      Section 5:
        Experimental Evaluation
• Europarl German to English
• Sentences of length 5 – 15 words

Training set:    163,141 sentences
Development set:   1,967 sentences
Test set:          1,755 sentences (same as Koehn et al. 2003)

• Bidirectional word alignment created from the word alignments of IBM
  Model 4 as implemented by GIZA++ (Och et al. 1999)
• Grammars achieve 100% coverage on unseen data
   – 80% as full parses
   – 20% as fragment parses
• 700,000 transfer rules extracted
• For language modelling, the trigram model of Stolcke 2002 is used
For translating the test set
• 1 parse for each German sentence was used
• 10 transferred f-structures
• 1,000 generated strings for each transferred f-structure

• Most probable target f-structure is obtained by a
  beam search on the transfer chart using features
  1-10 above, with a beam size of 20.
• Features 11-13 are computed on the strings that
  are generated
• Promising results for examples that are in-coverage of LFG
  grammars 
• However, back-off to robustness techniques for parsing and
  generation results in loss of translation quality 

Rule Extraction Problems
• 20% of the parses are fragmentary
• Errors occur in rule extraction process resulting in ill-formed transfer
  rules

Parsing-Transfer-Generation Problems
• Parsing errors → errors in transfer → generation errors
• In-coverage → disambiguation errors in parsing and transfer →
  suboptimal translation
• Despite use of minimum error rate training
  and n-gram language models, the system
  cannot be used to maximize n-gram
  scores on reference translations in the
  same way as phrase-based systems since
  statistical ordering models are employed in
  the framework after generation
• This gives preference to grammaticality
  over similarity to reference translations
• SMT model that marries phrase-based SMT with
  traditional grammar-based MT
• NIST measure showed that results achieved are
  comparable with the phrase-based SMT system of
  Koehn et al. 2003 for in-coverage examples
• Manual evaluation showed significant
  improvements in both grammaticality and
  translational adequacy for in-coverage examples
• The system can determine whether or not a
  source sentence is in-coverage
• Possibility for hybrid system that achieves
  improved grammaticality at state-of-the-art
  translation quality

Future Work:
• Improvement of translation of in-coverage
  source sentences e.g. stochastic generation
• Apply system to other language pairs and data

References:
Miriam Butt, Helge Dyvik, Tracy King, Hiroshi Masuichi and Christian Rohrer. 2002. The Parallel Grammar Project.
Eugene Charniak, Kevin Knight and Kenji Yamada. 2003. Syntax-based Language Models for Statistical Machine Translation.
Michael Collins, Philipp Koehn and Ivona Kucerova. 2005. Clause Restructuring for Statistical Machine Translation.
Philipp Koehn, Franz Och and Daniel Marcu. 2003. Statistical Phrase-based Translation.
Philipp Koehn. 2004. Pharaoh: a beam search decoder for phrase-based statistical machine translation models.
Arul Menezes and Stephen Richardson. 2001. A best-first alignment algorithm for automatic extraction of transfer mappings from bilingual corpora.
Franz Och, Christoph Tillmann and Hermann Ney. 1999. Improved Alignment Models for Statistical Machine Translation.
Franz Och. 2003. Minimum error rate training in statistical machine translation.
Kishore Papineni, Salim Roukos, Todd Ward and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation.
Stefan Riezler, Tracy King, Ronald Kaplan, Richard Crouch, John Maxwell and Mark Johnson. 2002. Parsing the Wall Street Journal using LFG and Discriminative Estimation Techniques.
Stefan Riezler and John Maxwell. 2006. Grammatical Machine Translation.
Fei Xia and Michael McCord. 2004. Improving a statistical MT system with automatically learned rewrite patterns.
