Lecture-25-Statistical-Machine-Translation by xiangpeng


									An Overview of Machine Translation

    A Presentation by:
    Mahsa Mohaghegh

     Introduction
     A brief introduction to Translation technology
     Interest in MT
     Problems Involved in Machine Translation
    Translation Technology
     Knowledge-based systems
     Statistical machine translation systems
     Rule-Based vs. Statistical MT
     Current State of Machine Translation in Use
    Personal Speech-to-Speech Translators

                                     Machine Translation

•These factors have increased both the
demand for translation services and interest in
computerized translation technology.

•Some industry observers say machine
translation, a largely experimental technology
that has been around since the late 1950s, is
now ready to become commercially viable.


NLP: The sub-domain of artificial intelligence concerned with the
task of developing programs possessing some capability of
“understanding” a natural language in order to achieve some specific
goal.

Understanding: A transformation from one representation (the input
text) to another (internal representation).


Machine Translation :

The use of computers to
translate from one language
to another.

One of the oldest dreams of
NLP, AI, and CS
(first system in 1954).


    Why Machine Translation?

    •Cheap, universal access to
    world’s online information
    regardless of original language.

    (That’s the goal)

     Interest in MT

    Commercial interest
      •U.S. has invested in MT
      •MT is popular on the web
      •EU spends more than $1 billion on translation

    Academic interest
      •Challenging problems in NLP research
      •Requires knowledge from many NLP sub-areas: lexical semantics,
       parsing, morphological analysis, statistical modeling
      •Transferring resources from one language to another


        Problems Involved in Machine Translation

    are the main problems faced by MT systems.

    A classic example is illustrated in the following pair of sentences:

    Time flies like an arrow.

    Fruit flies like an apple.
    How can a machine understand these differences?

    Get the cat with the gloves.


     •There are two kinds of machine translation:

          •Knowledge-based systems
          •Statistical machine translation

          •Knowledge-based systems

     Traditional translation technology takes a knowledge-based approach.
     These expert systems—used by vendors such as Fujitsu, Logos,
     and Systran—translate documents by converting words and
     grammar directly from one language into another.

      Knowledge-based systems

How they work.

Knowledge-based systems rely on programmers to enter various
languages’ vocabulary and syntax information into databases.

The programmers then write lists of rules that describe the possible
relationships between a language’s parts of speech.

The software, which can run on a high-powered PC, analyzes a
document and examines the rules for both the text’s language and
the target language to translate material.

[Figure: cartoon with the captions “Man, this is so boring.” and
“Hmm, every time he sees “banco”, he either types “bank” or “bench” …
but if he sees “banco de…”, he always types “bank”, never “bench”…”;
output labeled “Translated documents”]
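
As a rough illustration of the knowledge-based approach described above, the sketch below pairs a hand-entered lexicon with one hand-written structural rule. Everything in it is a hypothetical toy: the `LEXICON` entries, the part-of-speech tags, the function name `translate_rule_based`, and the single Spanish-to-English reordering rule are assumptions made for illustration, not the method of any vendor named in these slides.

```python
# Hypothetical toy lexicon entered by hand: Spanish word -> (English word, part of speech)
LEXICON = {
    "el": ("the", "DET"),
    "gato": ("cat", "NOUN"),
    "negro": ("black", "ADJ"),
    "duerme": ("sleeps", "VERB"),
}

def translate_rule_based(sentence):
    """Translate word by word, applying one hand-written reordering rule."""
    tagged = [LEXICON[word] for word in sentence.lower().split()]
    output = []
    i = 0
    while i < len(tagged):
        # Structural transfer rule (assumed): Spanish NOUN ADJ -> English ADJ NOUN
        if (i + 1 < len(tagged)
                and tagged[i][1] == "NOUN"
                and tagged[i + 1][1] == "ADJ"):
            output.extend([tagged[i + 1][0], tagged[i][0]])
            i += 2
        else:
            output.append(tagged[i][0])
            i += 1
    return " ".join(output)

print(translate_rule_based("el gato negro duerme"))  # the black cat sleeps
```

Every new word and every new word-order pattern needs another manual entry or rule, which is why this style of system is costly to build for each language pair.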
       Statistical machine translation systems

       Rather than using the knowledge-based system’s direct
       word-by-word translation techniques, statistical
       approaches translate documents by statistically analyzing
       entire phrases and, over time, “learning” how various
       languages work.

     How it works. Statistical systems
     start with minimal dictionary and language
     resources. Users must then train the system
     before they can use it extensively.
     During the training, researchers feed the system
     documents for which they already have
     accurate human translations.
     The system then uses its resources to guess at
     the documents’ meanings.
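
The training step described above can be sketched with a simplified version of IBM Model 1, one classic word-alignment model that the slides do not name. It is used here only as an assumed stand-in for "training on documents with known translations". The three-sentence Spanish-English corpus and the function name `train_ibm_model1` are hypothetical, and the model omits the usual NULL word.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate word-translation probabilities t(f | e) with EM.

    pairs: list of (foreign_tokens, english_tokens) sentence pairs.
    Simplified IBM Model 1: uniform start, no NULL word.
    """
    foreign_vocab = {f for fs, _ in pairs for f in fs}
    uniform = 1.0 / len(foreign_vocab)
    t = defaultdict(lambda: uniform)
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to any f
        # E-step: distribute each foreign word over the English words it co-occurs with
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise the expected counts into probabilities
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

# Hypothetical toy parallel corpus (foreign sentence, human English translation)
corpus = [
    ("la casa".split(), "the house".split()),
    ("la mesa".split(), "the table".split()),
    ("una mesa".split(), "a table".split()),
]
t = train_ibm_model1(corpus)
print(t[("casa", "house")] > t[("casa", "the")])  # True
```

No dictionary is supplied: the system only sees which words co-occur across sentence pairs, and repeated re-estimation shifts probability toward the pairings that explain the whole corpus best.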


 Statistical systems generally work by dividing
 documents into N-grams, with N the number of
 words, usually three, in a phrase. N-grams are
 statistical translation’s building blocks.

 Analyzing N-grams helps improve translation
 accuracy and performance because, while a
 word by itself may have many definitions, it has
 far fewer potential meanings when used as part
 of a phrase.
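
A minimal sketch of splitting a sentence into N-grams, assuming simple whitespace tokenization; the helper name `ngrams` is made up for this example.

```python
def ngrams(tokens, n=3):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "time flies like an arrow".split()
print(ngrams(tokens, 3))
# [('time', 'flies', 'like'), ('flies', 'like', 'an'), ('like', 'an', 'arrow')]
```

A system can then count how often each N-gram in one language appears alongside each candidate N-gram in the other, rather than counting isolated words.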



        [Figure: books in English and the same books in Farsi are used to build a P(F|E) model]

     Statistical machine translation (SMT) can be defined as the process of maximizing
     the probability of a sentence s in the source language matching a sentence t in
     the target language. We call collections stored in two languages parallel corpora
     or parallel texts.
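
The definition above is commonly written in the noisy-channel form below. The slides do not spell this out, so treat it as the standard textbook formulation, offered on the assumption that it is what the definition intends. Here s is the source sentence, t a candidate target sentence, and the hat marks the chosen translation.

```latex
\hat{t} \;=\; \arg\max_{t} P(t \mid s)
        \;=\; \arg\max_{t} \frac{P(s \mid t)\, P(t)}{P(s)}
        \;=\; \arg\max_{t} P(s \mid t)\, P(t)
```

The denominator P(s) is dropped because it does not depend on t. P(s | t) is a translation model estimated from parallel corpora, and P(t) is a model of the target language on its own.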

 Statistical machine translation systems, which statistically
 analyze entire phrases and “learn” how various languages
 work, frequently work with other types of systems to improve
 output quality.

 The lexicon system provides translated words and their

 The alignment system assures that phrases from the source
 language are converted to the proper phrases and
 presented in the proper order in the target language.

 The language system performs a morphological analysis of
 individual words or a syntactic analysis of sentences and
 thereby produces translations that read properly.

     Rule-Based vs. Statistical MT

    Rule-based MT:
        Rules are written by hand
        Rules can be based on lexical or structural transfer
        Each program must be customized for each language pair it works with.

        Pro: firm grip on complex translation phenomena
        Con: Often very labor-intensive, time-consuming, and expensive -> lack of

    Statistical MT
        Mainly word- or phrase-based translations
        Translations are learned from actual data
        In general, the more data provided for learning, the higher the
         quality of translation.

        Pro: Translations are learned automatically
        Con: Difficult to model complex translation phenomena
     Current State of Machine Translation in Use

     Google Translate is a service provided by Google
     Inc. to translate a section of text, or a webpage, into
     another language, with limits to the number of
     paragraphs, or range of technical terms, translated.
     For some languages, users are asked for alternative
     translations, such as for technical terms, to be
     included for future updates to the translation
     process. Google Translate is based on an approach
     called statistical machine translation.


             SYSTRAN's methodology is a sentence-by-sentence approach,
             concentrating on individual words and their dictionary data, then on the
             parse of the sentence unit, followed by the translation of the parsed
             sentence.

            AltaVista’s Babel Fish
            Babel Fish is a web-based application developed by AltaVista (now
            part of Yahoo!) which automatically translates text or web pages
            from one of several languages into another. The translation
            technology for Babel Fish is provided by SYSTRAN, whose technology
            also powers a number of other sites and portals.


            Language Weaver is a Los Angeles, California-based company that was founded in
            2002 by the University of Southern California's Kevin Knight and Daniel Marcu to
            commercialize a statistical approach to automatic language translation and
            natural language processing, now known globally as statistical machine
            translation software (SMTS).
            Language Weaver’s statistically based translation software is an instance of a
            recent advance in automated translation.

               Microsoft's translation service, part of its Windows Live
               services, allows users to translate texts or entire web
               pages into different languages. Computer-related texts are
               translated by Microsoft's own statistical machine translation
               technology for eight supported languages.

Personal Speech-to-Speech Translators

•One of the newest research areas in machine translation is the personal speech-to-speech
translator. People on business or personal trips could use these devices to translate on the fly.
•Speech-to-speech translation, which is still in the experimental stage, is a complex process
requiring speech-recognition technology that converts speech to text, machine translation of
the text, and then text-to-speech conversion.
•IBM is working on the handheld multilingual automatic speech-to-speech translator (Mastor),
which uses a hybrid statistical/knowledge-based engine to translate the content. Mastor tries to
determine the general meaning of a phrase, rather than its exact translation. This approach
requires less database capacity, which makes it more suitable for small devices.


     •Because of ongoing demand
     for better translation systems,
     research money will continue
     to flow into the field. In
     addition, companies are likely
     to develop and release more
     commercial products.

     Questions?
