Soft Syntactic Constraint for
Statistical Machine Translation

                            Zhongqiang Huang
                 Spring Intern, S2S Translation
                         University of Maryland


     Joint work with Martin Cmejrek and Bowen Zhou
Overview

   Enrich hierarchical phrase-based translation models with a
    syntactic constraint

   Soft constraint: represent syntax by feature vectors, not symbols

   Consistent improvements observed on English-to-German and
    English-to-Chinese translation
Hierarchical Phrase-based MT
(Chiang ACL05)

  Source:  give  the  pen  to  me  .
  Target:   把   钢笔   给   我   。

  Phrase pairs:
         give the pen || 钢笔 给
                 give || 给
              the pen || 钢笔
   give the pen to me || 钢笔 给 我
                    …

  Hierarchical rules:
         X → give X1 || X1 给
         X → give X1 to me || X1 给 我
         X → give X1 to X2 || X1 给 X2
         X → X1 the pen to me || 钢笔 X1 我
                    …
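The synchronous substitution behind these rules can be made concrete with a small sketch. This is not the deck's implementation, just a minimal illustration of how a hierarchical rule pairs a source and a target side that share nonterminals; the Rule class and apply_rule helper are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    src: list   # e.g. ["give", "X1", "to", "me"]
    tgt: list   # e.g. ["X1", "给", "我"]

def apply_rule(rule, fillers):
    """fillers maps a nonterminal such as "X1" to a (source, target) phrase pair."""
    def fill(side, idx):
        out = []
        for tok in side:
            out.extend(fillers[tok][idx] if tok in fillers else [tok])
        return out
    return fill(rule.src, 0), fill(rule.tgt, 1)

rule = Rule(src=["give", "X1", "to", "me"], tgt=["X1", "给", "我"])
src, tgt = apply_rule(rule, {"X1": (["the", "pen"], ["钢笔"])})
print(" ".join(src), "||", " ".join(tgt))   # give the pen to me || 钢笔 给 我
```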
Problem
   Formal syntax, not linguistically inspired
       X can be anything

      From the phrase pair   give the pen to me || 钢笔 给 我
      we extract the rule    X → give X1 to me || X1 给 我

      Should that rule also apply here?
         … give a talk and come back to me …
Previous work (Zollmann & Venugopal, NAACL06)

   Relabel nonterminals with categories derived from the source-side
    parse tree: NP, VP, or composite labels such as VBP+NP and VP/PP

   [Parse tree of "give the pen to me ." aligned to "把 钢笔 给 我 。"]

               VP → give NP1 to me || NP1 给 我

         NP → the pen || 钢笔        OK: the label matches NP1
      DT+NN → the book || 书本       ✗: the label does not match NP1
Previous work (cont.)
   Tree-to-String

      VP( VBP(give)  NP1  PP( TO(to) PRP(me) ) )   →   把 NP1 给 我

   String-to-Tree

      把 NP1 给 我   →   VP( VBP(give)  NP1  PP( TO(to) PRP(me) ) )
Previous work (cont.)
(Marton & Resnik, ACL08)

   Add features that record whether the phrases used in a derivation
    match or cross constituents of the source-side parse tree

   [Parse tree of "give the pen to me ."]

      the pen || …    NP match  (the phrase matches the NP constituent)
      pen to || …     NP cross  (the phrase crosses the NP boundary)

                 No syntax on rules themselves!
Previous work (cont.)
(Zhou et al., SSST-2)

   Add purity of syntax as a feature



                  X → give X1 to me || X1 给 我


                          Use as Prior
         Soft constraint on rules, but not between rules
This Work
   Based on hierarchical phrase-based models
   Encode a sequence of treebank tags with a feature vector
   Encode the syntax of each nonterminal (parent X, X1, X2) of a rule
    with a feature vector
   Use soft syntactic constraints
Encode Syntax in Feature Vectors

   Each nonterminal of a rule carries a feature vector:

         F(X)  F(X1)  F(X2)
          X → a X1 b X2 c || …

   The feature vector encodes the treebank tag sequence the nonterminal
    spans, e.g. for the parse of "give the pen to me ." / "把 钢笔 给 我 。":

         F(VP)     = [0.11 0.99 0.0  0.0 ]
         F(NP)     = [0.0  0.0  1.0  0.0 ]
         F(VBP NP) = [0.28 0.97 0.14 0.0 ]

   The similarity of two tag sequences, e.g. NP and DT NN, is the
    dot product of their feature vectors: F(NP) · F(DT NN)
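A minimal sketch of this dot-product similarity, using the vectors shown above (the lookup table F and the similarity helper are illustrative, not the deck's implementation; the DT NN vector is illustrative):

```python
import numpy as np

F = {  # F(TS) for a few tag sequences; numbers copied from the slides
    "VP":     np.array([0.11, 0.99, 0.0,  0.0]),
    "NP":     np.array([0.0,  0.0,  1.0,  0.0]),
    "VBP NP": np.array([0.28, 0.97, 0.14, 0.0]),
    "DT NN":  np.array([0.0,  0.0,  0.97, 0.24]),
}

def similarity(ts_a, ts_b):
    """Dot product of the two tag sequences' feature vectors."""
    return float(F[ts_a] @ F[ts_b])

print(similarity("NP", "DT NN"))  # 0.97: DT NN is distributed much like NP
print(similarity("NP", "VP"))     # 0.0:  very different syntactic behavior
```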
Soft Constraint

   Between source sides of rules

              X → give   X1   to me || X1 给 我

      The rule's X1 slot carries F(X1) = F(NP) = [0.0 0.0 1.0 0.0]

      Candidate sub-derivations that could fill X1:

         X → the pen || 钢笔
         X → the book || 书本
         X → a talk and come back || 报告并回来

      Their source-side tag sequences score very differently against the slot:

         NP             [0.67 0.0  1.0  0.0 ]    dot-product = 1
         DT NN          [0.0  0.0  0.97 0.24]    dot-product = 0.97
         NP CC VB RB    [0.0  0.67 0.33 0.0 ]    dot-product = 0.33

           … give a talk and come back to me …
Soft Constraint

   Each rule stores feature vectors for its nonterminals; when one rule
    fills the X1 slot of another, the vectors are compared by dot product

              X → give   X1   to me || X1 给 我

   Between source sides of rules: compare the source-side vectors
    F(X) of the filling rule and F(X1) of the slot

   Between target sides of rules: compare the target-side vectors
    F'(X) and F'(X1)

   Between both sides of rules: use both F and F'
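A hedged sketch of the three variants for a single slot/filler pair. The function name, the returned feature names, and the target-side vectors are assumptions; the deck only states that dot products between the rules' feature vectors act as soft constraints, and does not show how the scores enter the translation model.

```python
import numpy as np

def constraint_features(slot_F, slot_Fp, filler_F, filler_Fp):
    """Dot-product similarities when a sub-derivation fills a rule's X1 slot."""
    src_sim = float(slot_F @ filler_F)     # between source sides of rules
    tgt_sim = float(slot_Fp @ filler_Fp)   # between target sides of rules
    return {"src": src_sim, "tgt": tgt_sim, "both": (src_sim, tgt_sim)}

# X1 of "give X1 to me || X1 给 我" carries an NP-like source-side vector:
slot_F    = np.array([0.0, 0.0, 1.0, 0.0])    # F(X1), from the slides
slot_Fp   = np.array([0.0, 0.0, 1.0, 0.0])    # F'(X1), illustrative
filler_F  = np.array([0.0, 0.67, 0.33, 0.0])  # "a talk and come back", from the slides
filler_Fp = np.array([0.2, 0.5, 0.3, 0.0])    # illustrative target-side vector

print(constraint_features(slot_F, slot_Fp, filler_F, filler_Fp))
# {'src': 0.33, 'tgt': 0.3, 'both': (0.33, 0.3)}
```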
Soft Constraint
   Between the source-side parse tree and the source side of a rule

             … give the nice book to me …

   [Source-side parse tree of the input sentence; the span filled by X1 is
    an NP constituent, and the span covered by the whole rule is a VP]

      parse NP over the X1 span:   [0.0 0.0 1.0 0.0]
      rule slot F(X1) = F(NP):     [0.0 0.0 1.0 0.0]
                                    dot-product = 1
      (likewise for the VP over the whole rule span: dot-product = 1)

               X → give X1 to me || X1 给 我
Soft Constraint
   Between the source-side parse tree and the source side of a rule

             … give a talk and come back to me …

   [Source-side parse tree of the input sentence]

      parse over the whole rule span (VP):              dot-product = 1

      parse over the X1 span ("a talk and come back",
        tags NP CC VB RB):         [0.0 0.67 0.33 0.0]
      rule slot F(X1) = F(NP):     [0.0 0.0  1.0  0.0]
                                    dot-product = 0.33

               X → give X1 to me || 做 X1 我
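A small sketch of this tree-based constraint under assumed data structures (the span_tags chart and the helper function are hypothetical): look up the tag/label sequence the source parse assigns to the span a rule nonterminal covers, and take its dot product with the vector stored on the rule.

```python
import numpy as np

F = {
    "NP":          np.array([0.0, 0.0,  1.0,  0.0]),
    "NP CC VB RB": np.array([0.0, 0.67, 0.33, 0.0]),
}

# Tag/label sequence the source parse assigns to each word span (end exclusive);
# span indices refer to "give a talk and come back to me".
span_tags = {
    (1, 6): "NP CC VB RB",   # "a talk and come back"
}

def tree_rule_similarity(rule_slot_vec, span):
    """Dot product between a rule slot's vector and the parse's vector over the span."""
    tags = span_tags.get(span)
    if tags is None or tags not in F:
        return 0.0               # no usable sequence for this span
    return float(rule_slot_vec @ F[tags])

slot_NP = F["NP"]                # X1 of "give X1 to me" carries F(NP)
print(tree_rule_similarity(slot_NP, (1, 6)))   # 0.33: poor syntactic fit
```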
At what cost?
   Worst case: for each nonterminal in a rule
       Storage: a feature vector
       Computation: a dot product of two feature vectors
   Best case:
       Storage: a pointer to a tag sequence
       Computation: a lookup of precomputed dot products
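The best case can be sketched as a table of dot products computed once up front and shared by every rule that points at the same tag sequences; the cache layout below is an assumption, not the deck's implementation.

```python
from itertools import combinations_with_replacement
import numpy as np

F = {
    "NP":    np.array([0.0,  0.0,  1.0,  0.0]),
    "DT NN": np.array([0.0,  0.0,  0.97, 0.24]),
    "VP":    np.array([0.11, 0.99, 0.0,  0.0]),
}

# Precompute every pairwise dot product once, before decoding.
DOT = {}
for a, b in combinations_with_replacement(F, 2):
    DOT[(a, b)] = DOT[(b, a)] = float(F[a] @ F[b])

def cached_similarity(ts_a, ts_b):
    return DOT[(ts_a, ts_b)]     # O(1) lookup instead of a dot product

print(cached_similarity("NP", "DT NN"))   # 0.97
```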
What Feature Vector?
   Learn a set of linguistically guided subcategories {X1, X2, X3, …}
   For each tag sequence TS, compute its distribution over the
    subcategories Xi:
     D(TS) = [P(X1), P(X2), P(X3), …]
   Convert the distribution to a feature vector:
     F(TS) = D(TS) / ||D(TS)||
Inducing Subcategories
   Cannot use treebank categories
       Phrase pairs do not align well with syntactic constituents
   Make use of the hierarchy of phrases induced from the alignment
   Learn their interdependencies
   Use parsing as guidance
Phrase Hierarchy Based on Alignment

   [Word-aligned sentence pair "give the pen to me ." / "把 钢笔 给 我 。"
    with its source-side parse (S, VP, NP, PP). The hierarchy of phrases
    built from the alignment labels aligned phrases X and unaligned ones B;
    the treebank constituents (S, VP, NP) that X nodes correspond to are
    shown beside them as guidance.]
Model Parameters

   [Binarization: the phrase-hierarchy tree, with nodes X and B and their
    treebank labels (S, VP, NP, PP), is binarized before training]

   Tree likelihood
      =   P(ROOT → X)
        × P(X → S)
        × P(X → X X)
        × …

   Learning with ML (maximum likelihood)
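A minimal sketch of that product, assuming the annotated tree is given as the list of productions it uses and that emitted treebank labels (e.g. X → S) are treated like any other production; the probability values are placeholders, not estimates from data.

```python
import math

# Placeholder probabilities: each key is (parent, children-as-a-tuple).
rule_prob = {
    ("ROOT", ("X",)):     0.9,
    ("X",    ("S",)):     0.4,   # X emits the observed treebank label S
    ("X",    ("X", "X")): 0.3,
    ("X",    ("X", "B")): 0.2,
}

def tree_log_likelihood(productions):
    """productions: the list of (parent, children) pairs one annotated tree uses."""
    return sum(math.log(rule_prob[p]) for p in productions)

derivation = [("ROOT", ("X",)), ("X", ("S",)), ("X", ("X", "X"))]
print(math.exp(tree_log_likelihood(derivation)))   # 0.9 * 0.4 * 0.3 = 0.108
```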
Subcategories
   X and B are coarse representations of aligned and unaligned phrases,
    and cannot distinguish syntax
   Split X and B into subcategories:
       X → {X1, X2, X3, …}
       B → {B1, B2, B3, …}
   Train with the EM algorithm
       Sum over all possible annotations
       Inside-Outside

   [Example annotated tree: ROOT → X1(S) → X2(VP) → …, with leaves such as
    X1/VBP, B2/DT, B1/NN, B2/TO, X1/PRP, X3/.]
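A compact sketch of the E-step this slide describes, under assumed data structures (not the authors' code): every coarse label is split into K latent subcategories, and inside/outside passes over one fixed binarized tree yield, for each node, the posterior over its subcategories. These posteriors are what the next slide tallies into D(TS).

```python
import numpy as np

K = 4  # number of latent subcategories per coarse label (an assumption)

class Node:
    def __init__(self, label, children=(), emit=None):
        self.label = label              # coarse label: "X" or "B"
        self.children = list(children)
        self.emit = emit                # leaf emission probs, shape (K,)

def inside(node, rp):
    """rp[(parent, left, right)] is a (K, K, K) tensor of split-rule probabilities."""
    if not node.children:
        node.alpha = node.emit
    else:
        l, r = node.children
        inside(l, rp); inside(r, rp)
        t = rp[(node.label, l.label, r.label)]
        node.alpha = np.einsum('kij,i,j->k', t, l.alpha, r.alpha)
    return node.alpha

def outside(node, rp, beta):
    node.beta = beta
    if node.children:
        l, r = node.children
        t = rp[(node.label, l.label, r.label)]
        outside(l, rp, np.einsum('kij,k,j->i', t, beta, r.alpha))
        outside(r, rp, np.einsum('kij,k,i->j', t, beta, l.alpha))

def subcat_posteriors(root, rp, root_prior):
    """Posterior over subcategories at every node; root_prior[k] = P(ROOT -> X_k)."""
    inside(root, rp)
    outside(root, rp, root_prior)
    z = float(root.alpha @ root_prior)          # likelihood of the annotated tree
    post, stack = {}, [root]
    while stack:
        n = stack.pop()
        post[n] = n.alpha * n.beta / z
        stack.extend(n.children)
    return post

# Tiny usage example: one X node over two B leaves, with made-up numbers.
rng = np.random.default_rng(0)
rp = {("X", "B", "B"): rng.random((K, K, K))}
root = Node("X", children=[Node("B", emit=rng.random(K)), Node("B", emit=rng.random(K))])
print(subcat_posteriors(root, rp, root_prior=np.ones(K) / K)[root])
```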
Feature Vector of Tag Sequences
   For each tag sequence TS, compute its distribution over the X
    subcategories:
     D(TS) = [P(X1), P(X2), P(X3), …]
       Tally the posterior probabilities of the X subcategories for each
        instance of the tag sequence, then normalize

       Example:  VBP NP   P(X1)=0.1  P(X2)=0.1  P(X3)=0.8  P(X4)=0.0

   Convert the distribution to a feature vector:
     F(TS) = D(TS) / ||D(TS)||
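A short sketch of that tallying step, with hypothetical instance data (the first VBP NP posterior matches the numbers on the slide; the other instances are made up):

```python
import numpy as np

def tag_sequence_vectors(instances):
    """instances: iterable of (tag_sequence, posterior-over-X-subcategories) pairs."""
    tally = {}
    for ts, post in instances:
        tally[ts] = tally.get(ts, 0.0) + post
    F = {}
    for ts, total in tally.items():
        d = total / total.sum()          # D(TS): distribution over X1, X2, X3, ...
        F[ts] = d / np.linalg.norm(d)    # F(TS) = D(TS) / ||D(TS)||
    return F

instances = [
    ("VBP NP", np.array([0.1, 0.1, 0.8, 0.0])),   # values from the slide
    ("VBP NP", np.array([0.0, 0.2, 0.8, 0.0])),   # made-up second instance
    ("DT NN",  np.array([0.0, 0.0, 0.9, 0.1])),   # made-up
]
print(tag_sequence_vectors(instances)["VBP NP"])
```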
What tag sequences are similar, dissimilar?

              Most similar to         Most dissimilar to

  DT JJ NN    DT NN NN                TO
              NP NP NML               WHADJP
              DT JJ NN NN             WHNP
              EDITED DT NN            VB JJ NN NN ADVP
              NP TO AUX VB ADVP NML   VP CC RB VB NP
              DT NP                   VP RB WHNP PP
              NP EDITED JJ NN         VP CONJP SQ

  NP VP       NP ADVP VP              VP EDITED CC VB
              JJ JJ NN CC NP VP       MD VP CC JJ
              NN S NP VP              MD ADVP VB PP DT
              NN EDITED VP            VB EDITED IN IN DT
              UH VP                   MD VB RB IN NP CC
              JJ NN S VP              MD VB NN DT
              NN CONJP VP CC          VB NP CC RB DT
              S ADVP VP               VB RB NP CC
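A sketch of how such a table can be produced: rank every tag sequence by the dot product of its feature vector with the query's. The vectors below are placeholders; the real ones are the F(TS) vectors from the previous slides.

```python
import numpy as np

F = {   # placeholder vectors; real ones come from F(TS)
    "DT JJ NN":  np.array([0.0,  0.05, 0.99, 0.05]),
    "DT NN NN":  np.array([0.0,  0.0,  1.0,  0.05]),
    "NP NP NML": np.array([0.0,  0.1,  0.98, 0.0]),
    "TO":        np.array([0.9,  0.1,  0.0,  0.4]),
    "WHADJP":    np.array([0.85, 0.0,  0.1,  0.5]),
}

def ranked_by_similarity(query):
    sims = {ts: float(F[query] @ vec) for ts, vec in F.items() if ts != query}
    return sorted(sims.items(), key=lambda kv: kv[1], reverse=True)

print(ranked_by_similarity("DT JJ NN"))   # most similar first, most dissimilar last
```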
Experiment
   Baseline
       ForSyn: hierarchical phrase-based MT
           P(γ|α), P(α|γ), Pw(γ|α), Pw(α|γ), rule frequencies
           PLM(e), word counts
           Rule counts, glue rule counts, nonterminal counts
       GIZA++ for alignment
       MERT
   Soft syntactic constraint
       Between rules and the source-side parse
       Uses 16 automatically induced subcategories
Data
   English-to-German
       Europarl
       ~300k parallel bitext (~4.5M words)
       1k dev, 1k test
       1 reference
   English-to-Chinese
       Travel domain
       ~500k parallel bitext (~3M words)
       ~1.3k dev, ~1.3k test
       2 references
Result
   English-to-German
                         dev         test
      ForSyn            16.41       16.26
      +Syntax           17.01       17.06
                        (+0.6)      (+0.8)

   English-to-Chinese
                         dev         test
      ForSyn            46.47       45.45
      +Syntax           47.39       45.86
                        (+0.92)     (+0.41)
Future Work
   Revisit the soft constraint between rules
       An earlier investigation was not successful
   Better feature vectors
       The current ones are not trained to model similarity or to
        differentiate between tag sequences

                  Questions & Comments?
                         Thanks!

								