
                       CS60057
   Speech & Natural Language Processing

                       Autumn 2007

                        Lecture 14b

                       24 August 2007



      LING 180 / SYMBSYS 138
   Intro to Computer Speech and Language Processing

          Lecture 9: Machine Translation (I)
                 November 7, 2006
                    Dan Jurafsky
                       Thanks to Bonnie Dorr for some of these slides!

Outline for MT Week

p   Intro and a little history
p   Language Similarities and Divergences
p   Three classic MT Approaches
     n Transfer
     n Interlingua
     n Direct
p   Modern Statistical MT
p   Evaluation



What is MT?

p   Translating a text from one language to another
    automatically.




Machine Translation

p   dai yu zi zai chuang shang gan nian bao chai you ting
    jian chuang wai zhu shao xiang ye zhe shang, yu sheng
    xi li, qing han tou mu, bu jue you di xia lei lai.
p   Dai-yu alone on bed top think-of-with-gratitude Bao-chai again listen
    to window outside bamboo tip plantain leaf of on-top rain sound sigh
    drop clear cold penetrate curtain not feeling again fall down tears
    come
p   As she lay there alone, Dai-yu’s thoughts turned to Bao-chai… Then
    she listened to the insistent rustle of the rain on the bamboos and
    plantains outside her window. The coldness penetrated the curtains
    of her bed. Almost without noticing it she had begun to cry.




Machine Translation
p   The Story of the Stone
p   = The Dream of the Red Chamber (Cao Xueqin, 1792)
p   Issues:
     n Word segmentation
     n Sentence segmentation: 4 English sentences to 1 Chinese
     n Grammatical differences
             p   Chinese rarely marks tense:
                   § As, turned to, had begun,
                   § tou -> penetrated
             p   Zero anaphora
             p   No articles
      n   Stylistic and cultural differences
             p   Bamboo tip plantain leaf -> bamboos and plantains
             p   Ma ‘curtain’ -> curtains of her bed
             p   Rain sound sigh drop -> insistent rustle of the rain


Not just literature

p   Hansards: Canadian parliamentary proceedings




What is MT not good for?

p   Really hard stuff
     n Literature
     n Natural spoken speech (meetings, court reporting)


p   Really important stuff
     n Medical translation in hospitals, 911




What is MT good for?

p   Tasks for which a rough translation is fine
     n Web pages, email
p   Tasks for which MT can be post-edited
     n MT as first pass
     n “Computer-aided human translation”
p   Tasks in sublanguage domains where high-quality MT is
    possible
     n FAHQT (Fully Automatic High-Quality Translation)




Sublanguage domain
p   Weather forecasting
     n “Cloudy with a chance of showers today and Thursday”
     n “Low tonight 4”
p   Can be modeled completely enough to use raw MT output
p   Word classes and semantic features like MONTH, PLACE,
    DIRECTION, TIME POINT




MT History
p   1946 Booth and Weaver discuss MT at the Rockefeller Foundation in
    New York
p   1947-48 idea of dictionary-based direct translation
p   1949 Weaver memorandum popularized idea
p   1952 all 18 MT researchers in world meet at MIT
p   1954 IBM/Georgetown Demo Russian-English MT
p   1955-65 lots of labs take up MT




History of MT: Pessimism
p   1959/1960: Bar-Hillel “Report on the state of MT in US and GB”
     n Argued FAHQT too hard (semantic ambiguity, etc)
     n Should work on semi-automatic instead of automatic
     n His argument
       Little John was looking for his toy box. Finally, he found it. The
       box was in the pen. John was very happy.
     n Only human knowledge lets us know that ‘playpens’ are bigger
       than boxes, but ‘writing pens’ are smaller
     n His claim: we would have to encode all of human knowledge




History of MT: Pessimism
p   The ALPAC report
     n Headed by John R. Pierce of Bell Labs
     n Conclusions:
             p   Supply of human translators exceeds demand
             p   All the Soviet literature is already being translated
             p   MT has been a failure: all current MT work had to be post-edited
             p   Sponsored evaluations which showed that intelligibility and
                 informativeness were worse than for human translations
      n   Results:
             p   MT research suffered
                  § Funding loss
                  § Number of research labs declined
                  § Association for Machine Translation and Computational
                     Linguistics dropped MT from its name

History of MT
p   1976 Meteo, weather forecasts from English to French
p   Systran (Babelfish) has been used for 40 years
p   1970’s:
     n European focus in MT; mainly ignored in US
p   1980’s
     n ideas of using AI techniques in MT (KBMT, CMU)
p   1990’s
     n Commercial MT systems
     n Statistical MT
     n Speech-to-speech translation




Language Similarities and Divergences

p   Some aspects of human language are universal or near-
    universal, others diverge greatly.
p   Typology: the study of systematic cross-linguistic
    similarities and differences
p   What are the dimensions along which human languages
    vary?




Morphological Variation
p   Isolating languages
     n Cantonese, Vietnamese: each word generally has one
        morpheme
p   Vs. Polysynthetic languages
     n Siberian Yupik (`Eskimo’): single word may have very many
        morphemes
p   Agglutinative languages
     n Turkish: morphemes have clean boundaries
p   Vs. Fusion languages
     n Russian: a single affix may conflate many morphemes (e.g., case and number)




Syntactic Variation

p   SVO (Subject-Verb-Object) languages
     n English, German, French, Mandarin
p   SOV Languages
     n Japanese, Hindi




p    VSO languages
       n Irish, Classical Arabic
p   SVO lgs generally have prepositions: to Yuriko
p   SOV lgs generally have postpositions: Yuriko ni
Segmentation Variation

p   Not every writing system has word boundaries marked
     n Chinese, Japanese, Thai, Vietnamese
p   Some languages tend to have sentences that are quite
    long, closer to English paragraphs than sentences:
     n Modern Standard Arabic, Chinese




Inferential Load: cold vs. hot lgs

p   Some ‘cold’ languages require the hearer to do more
    “figuring out” of who the various actors in the various
    events are:
     n Japanese, Chinese
p   Other ‘hot’ languages are pretty explicit about saying
    who did what to whom.
     n English




Inferential Load (2)
p   (Figure caption) All the noun phrases in blue do not appear in the
    Chinese text, but they are needed for a good translation.
Lexical Divergences

p   Word to phrases:
     n English “computer science” = French “informatique”
p   POS divergences
     n Eng. ‘she likes/VERB to sing’
     n Ger. ‘Sie singt gerne/ADV’ (lit. ‘she sings gladly’)
     n Eng. ‘I’m hungry/ADJ’
     n Sp. ‘tengo hambre/NOUN’ (lit. ‘I have hunger’)




  Lexical Divergences: Specificity
p   Grammatical constraints
     n English has gender on pronouns; Mandarin does not.
             p   So translating “3rd person” from Chinese to English, need to figure
                 out gender of the person!
             p   Similarly from English “they” to French “ils/elles”
p   Semantic constraints
     n English `brother’
     n Mandarin ‘gege’ (older) versus ‘didi’ (younger)
     n English ‘wall’
     n German ‘Wand’ (inside) ‘Mauer’ (outside)
     n German ‘Berg’
     n English ‘hill’ or ‘mountain’




 Lexical Divergence: many-to-many




Lexical Divergence: lexical gaps

p   Japanese: no word for privacy
p   English: no word for Cantonese ‘haauseun’ or Japanese
    ‘oyakoko’ (something like `filial piety’)

p   English ‘cow’ versus ‘beef’, Cantonese ‘ngau’




Event-to-argument divergences
p   English
     n The bottle floated out.
p   Spanish
     n La botella salió flotando.
     n The bottle exited floating
p   Verb-framed lg: mark direction of motion on verb
     n Spanish, French, Arabic, Hebrew, Japanese, Tamil, Polynesian,
       Mayan, Bantu families
p   Satellite-framed lg: mark direction of motion on satellite
     n Crawl out, float off, jump down, walk over to, run after
     n Rest of Indo-European, Hungarian, Finnish, Chinese




Structural divergences

p   G: Wir treffen uns am Mittwoch (lit. ‘we meet ourselves on Wednesday’)
p   E: We’ll meet on Wednesday




Head Swapping
p   E: X swim across Y
p   S: X cruzar Y nadando (lit. ‘X cross Y swimming’)

p   E: I like to eat
p   G: Ich esse gern

p   E: I’d prefer vanilla
p   G: Mir wäre Vanille lieber




Thematic divergence

p   S: Y me gusta (lit. ‘Y pleases me’)
p   E: I like Y

p   G: Mir fällt der Termin ein (lit. ‘to me occurs the date’)
p   E: I remember the date




Divergence counts from Bonnie Dorr
p   32% of sentences in a UN Spanish/English corpus (5K sentences)
    contain a divergence

 Divergence type     Spanish / English example                 Frequency
 Categorial          X tener hambre / Y have hunger                98%
 Conflational        X dar puñaladas a Z / X stab Z                83%
 Structural          X entrar en Y / X enter Y                     35%
 Head swapping       X cruzar Y nadando / X swim across Y           8%
 Thematic            X gustar a Y / Y likes X                       6%
MT on the web

p   Babelfish:
     n http://babelfish.altavista.com/
p   Google:
     n http://www.google.com/search?hl=en&lr=&client=safari&rls=en&q="1+taza+de+jugo"+%28zumo%29+de+naranja+5+cucharadas+de+azucar+morena&btnG=Search




3 methods for MT

p   Direct
p   Transfer
p   Interlingua




Three MT Approaches:
Direct, Transfer, Interlingual




Direct Translation



   p    Proceed word-by-word through the text, translating each word
        (a toy sketch of this approach follows after this slide)
   p    No intermediate structures except morphology
   p    Knowledge is in the form of
         n Huge bilingual dictionary
         n word-to-word translation information
   p    After word translation, can do simple reordering
         n Adjective ordering English -> French/Spanish
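A minimal sketch of this word-by-word idea in Python, assuming a tiny hypothetical
English-to-Spanish dictionary and a single adjective-noun reordering pass; real direct
systems used far richer dictionary entries and many more reordering heuristics:

    # Toy direct-MT sketch: dictionary lookup plus one local reordering rule.
    # The dictionary and word classes below are invented for illustration.
    BILINGUAL_DICT = {"the": "la", "green": "verde", "witch": "bruja", "sings": "canta"}
    ADJECTIVES = {"green"}
    NOUNS = {"witch"}

    def direct_translate(sentence):
        words = sentence.lower().split()
        # 1) word-by-word lexical substitution
        out = [BILINGUAL_DICT.get(w, w) for w in words]
        # 2) simple reordering: English Adjective+Noun -> Spanish Noun+Adjective
        for i in range(len(words) - 1):
            if words[i] in ADJECTIVES and words[i + 1] in NOUNS:
                out[i], out[i + 1] = out[i + 1], out[i]
        return " ".join(out)

    print(direct_translate("the green witch sings"))   # la bruja verde canta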
Direct MT Dictionary entry




Direct MT




Problems with direct MT

p   German




p   Chinese




The Transfer Model

p   Idea: apply contrastive knowledge, i.e., knowledge about
    the difference between two languages
p   Steps:
     n Analysis: Syntactically parse Source language
     n Transfer: Rules to turn this parse into parse for Target
       language
     n Generation: Generate Target sentence from parse
       tree




English to French
p   Generally
     n English: Adjective Noun
     n French: Noun Adjective
     n Note: not always true
             p   Route mauvaise ‘bad road, badly-paved road’
              p   Mauvaise route ‘wrong road’
             p   But is a reasonable first approximation
      n   Rule:
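              p   NP → Adjective Noun   ⇒   NP → Noun Adjective
                  (a toy sketch of applying this rule follows after this slide)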




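A minimal sketch of applying such a reordering rule during transfer, assuming a toy
parse encoded as nested Python tuples; the tree format and the single rule are
illustrative only, and lexical transfer (the bilingual dictionary of the following
slides) would still have to translate the words themselves:

    # Toy syntactic transfer: rewrite NP -> Adjective Noun as NP -> Noun Adjective.
    def transfer_np(tree):
        label, children = tree
        # transfer the subtrees first, then reorder at this node if the rule applies
        children = [transfer_np(c) if isinstance(c, tuple) else c for c in children]
        if label == "NP" and len(children) == 2:
            first, second = children
            if first[0] == "Adj" and second[0] == "Noun":
                children = [second, first]
        return (label, children)

    english_np = ("NP", [("Adj", ["green"]), ("Noun", ["witch"])])
    print(transfer_np(english_np))
    # ('NP', [('Noun', ['witch']), ('Adj', ['green'])])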
Transfer rules




Lexical transfer

p   Transfer-based systems also need lexical transfer rules
p   Bilingual dictionary (like for direct MT)
p   English home:
p   German
     n nach Hause (going home)
     n Heim (home game)
     n Heimat (homeland, home country)
     n zu Hause (at home)
p   Can list “at home <-> zu Hause”
p   Or do Word Sense Disambiguation
Systran: combining direct and transfer

p    Analysis
       n Morphological analysis, POS tagging
       n Chunking of NPs, PPs, phrases
       n Shallow dependency parsing
p Transfer
       n Translation of idioms
       n Word sense disambiguation
       n Assigning prepositions based on governing verbs
p Synthesis
       n Apply rich bilingual dictionary
       n Deal with reordering
       n Morphological generation
Transfer: some problems

p   N² sets of transfer rules: one for each ordered pair of languages!
p   Grammar and lexicon full of language-specific stuff
p   Hard to build, hard to maintain




 Interlingua
p   Intuition: instead of language-to-language transfer rules, use the
    meaning of the sentence to help
p   Steps:
     n 1) translate source sentence into meaning
       representation
     n 2) generate target sentence from meaning.




Interlingua for
Mary did not slap the green witch




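A rough sketch of the kind of event frame such an interlingua typically uses for this
sentence; the attribute names here are illustrative, not necessarily those on the
original slide:

    EVENT        SLAPPING
    TENSE        PAST
    POLARITY     NEGATIVE
    AGENT        MARY
    PATIENT      WITCH
        DEFINITENESS   DEFINITE
        ATTRIBUTE      GREEN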
Interlingua

p   Idea is that some of the MT work that we need to do is
    part of other NLP tasks
p   E.g., disambiguating E:book S:‘libro’ from E:book
    S:‘reservar’
p   So we could have concepts like BOOKVOLUME and
    RESERVE and solve this problem once for each
    language




Direct MT: pros and cons (Bonnie Dorr)
p   Pros
     n Fast
     n Simple
     n Cheap
     n No translation rules hidden in lexicon
p   Cons
     n Unreliable
     n Not powerful
     n Rule proliferation
     n Requires lots of context
     n Major restructuring after lexical substitution




Interlingual MT: pros and cons (B. Dorr)
 p   Pros
      n Avoids the N² problem
      n Easier to write rules
 p   Cons:
      n Semantics is HARD
      n Useful information lost (paraphrase)




The impossibility of translation

p   Hebrew “adonoi roi” (‘the Lord is my shepherd’) for a culture
    without sheep or shepherds
     n Something fluent and understandable, but not faithful:
             p   “The Lord will look after me”
      n   Something faithful, but not fluent and natural
             p   “The Lord is for me like somebody who looks after animals
                 with cotton-like hair”




What makes a good translation

p   Translators often talk about two factors we want to
    maximize:
p   Faithfulness or fidelity
     n How close is the meaning of the translation to the
       meaning of the original
     n (Even better: does the translation cause the reader to
       draw the same inferences as the original would have)
p   Fluency or naturalness
     n How natural the translation is, just considering its
       fluency in the target language

Statistical MT:
Faithfulness and Fluency formalized!
p   Best-translation of a source sentence S:
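    Written out (a reconstruction based on the fluency and faithfulness slides that follow):

        T-hat = argmax over T of  faithfulness(S, T) × fluency(T)
              = argmax over T of  P(S|T) P(T)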



p   Developed by researchers who were originally in speech
    recognition at IBM
p   Called the IBM model




The IBM model

p   Hmm, those two factors might look familiar…



p   Yup, it’s Bayes rule:
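    Spelled out, with S the source sentence and T the target sentence (a reconstruction
    of the slide's equations):

        P(T|S) = P(S|T) P(T) / P(S)
        argmax over T of P(T|S)  =  argmax over T of P(S|T) P(T)     (P(S) is fixed for a given S)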




  More formally

p   Assume we are translating from a foreign language
    sentence F to an English sentence E:
     n F = f1, f2, f3,…, fm
p   We want to find the best English sentence
     n E-hat = e1, e2, e3,…, en
     n E-hat = argmaxE P(E|F)
     n       = argmaxE P(F|E)P(E)/P(F)
     n       = argmaxE P(F|E)P(E)
              where P(F|E) is the translation model and P(E) is the language model


The noisy channel model for MT
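p   Idea: pretend the foreign sentence F started out as an English
    sentence E that was garbled by a noisy channel; translation is
    decoding, i.e., searching for the E most likely to have produced
    the observed F.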




Fluency: P(T)
p   How to measure that this sentence
     n That car was almost crash onto me
p   is less fluent than this one:
     n That car almost hit me.
p   Answer: language models (N-grams!)
     n For example P(hit|almost) > P(was|almost)
p   But can use any other more sophisticated model of
    grammar
p   Advantage: this is monolingual knowledge!



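A minimal sketch of the bigram comparison above, with invented counts just to make
P(hit|almost) > P(was|almost) and the sentence-level comparison concrete:

    # Toy bigram language model for fluency scoring; all counts are invented.
    from collections import defaultdict

    bigram_counts = {("almost", "hit"): 20, ("almost", "was"): 1}
    unigram_counts = defaultdict(int, {"almost": 100})

    def p_bigram(prev, word):
        # add-one smoothing over an assumed vocabulary of 1000 words
        return (bigram_counts.get((prev, word), 0) + 1) / (unigram_counts[prev] + 1000)

    def fluency(sentence):
        words = sentence.lower().split()
        score = 1.0
        for prev, word in zip(words, words[1:]):
            score *= p_bigram(prev, word)
        return score

    print(p_bigram("almost", "hit") > p_bigram("almost", "was"))       # True
    print(fluency("that car almost hit me") >
          fluency("that car was almost crash onto me"))                # True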
Faithfulness: P(S|T)
p   French: ça me plait [that me pleases]
p   English:
     n that pleases me - most fluent
     n I like it
     n I’ll take that one
p   How to quantify this?
p   Intuition: degree to which words in one sentence are
    plausible translations of words in other sentence
     n Product of probabilities that each word in target
       sentence would generate each word in source
       sentence.

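A minimal sketch of that intuition, using a hypothetical table of word-translation
probabilities; the numbers are invented, and real models sum over word alignments
rather than pairing words by position:

    # Toy faithfulness model: product of word-translation probabilities,
    # t_prob[(source_word, target_word)] ~ P(source_word | target_word).
    t_prob = {
        ("ça", "that"): 0.6, ("me", "me"): 0.7, ("plait", "pleases"): 0.5,
        ("ça", "it"): 0.3, ("me", "i"): 0.2, ("plait", "like"): 0.1,
    }

    def faithfulness(source_words, target_words, floor=1e-4):
        score = 1.0
        for s, t in zip(source_words, target_words):
            score *= t_prob.get((s, t), floor)
        return score

    src = ["ça", "me", "plait"]
    print(faithfulness(src, ["that", "me", "pleases"]))   # 0.21  (higher)
    print(faithfulness(src, ["i", "like", "it"]))         # 1e-12 (lower)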
Faithfulness P(S|T)
p   Need to know, for every target language word,
    probability of it mapping to every source language word.
p   How do we learn these probabilities?
p   Parallel texts!
     n Lots of times we have two texts that are translations
       of each other
     n If we knew which word in Source Text mapped to
       each word in Target Text, we could just count!




Faithfulness P(S|T)

p   Sentence alignment:
     n Figuring out which source language sentence maps to
       which target language sentence
p   Word alignment
     n Figuring out which source language word maps to
       which target language word




Big Point about Faithfulness and
Fluency

p   Job of the faithfulness model P(S|T) is just to model the “bag of
    words”: which words map to which when going from, say, English to
    Spanish.
p   P(S|T) doesn’t have to worry about internal facts about
    Spanish word order: that’s the job of P(T)
p   P(T) can do Bag generation: put the following words in
    order (from Kevin Knight)
     n have programming a seen never I language better
           -actual the hashing is since not collision-free usually
            the is less perfectly the of somewhat capacity table


P(T) and bag generation:
the answer

p   “Usually the actual capacity of the table is somewhat
    less, since the hashing is not collision-free”

p   How about:
     n loves Mary John




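A minimal sketch of bag generation for the three-word example above, scoring every
permutation with an invented bigram model; brute force is only feasible because the
bag is tiny:

    # Toy bag generation: pick the word order the language model likes best.
    from itertools import permutations

    # invented bigram probabilities; anything unseen gets a small floor value
    p = {("<s>", "john"): 0.4, ("john", "loves"): 0.5, ("loves", "mary"): 0.5,
         ("<s>", "mary"): 0.4, ("mary", "loves"): 0.5, ("loves", "john"): 0.5}

    def lm_score(words, floor=0.01):
        score = 1.0
        for prev, w in zip(("<s>",) + words, words):
            score *= p.get((prev, w), floor)
        return score

    bag = ("loves", "mary", "john")
    print(max(permutations(bag), key=lm_score))
    # ('mary', 'loves', 'john') -- 'john loves mary' scores exactly the same,
    # so the language model alone cannot decide who loves whom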
Summary

p   Intro and a little history
p   Language Similarities and Divergences
p   Three classic MT Approaches
     n Transfer
     n Interlingua
     n Direct
p   Modern Statistical MT
p   Evaluation




								