Some issues in Vietnamese Language Processing


                   Vietnamese participants (1)

           (1) Vietnam National University, Hanoi, Vietnam


Workshop of Asean Applied NLP for Linguistics Diversity &
            Language Resource Development
                  Bangkok, 08/2006




             Vietnamese participants   Some issues in Vietnamese Language Processing   1/49
Outline of Part I



  1   Vietnamese amongst other languages


  2   Specificities of Vietnamese
        Alphabet and tones
        Syllable
        Grammar
        Specificities




Outline of Part II
  3   Activities in organization and experts
        Recent events
        Main NLP groups
        Research content
  4   Dictionary and Corpora
        Overview
        Some corpora
        Development of supporting tools
  5   Status of machine translation research
        Overview
        Rule-based approach
        BTL approach
        Statistical approach
  6   Conclusion
                                 Part I

         Introduction to Vietnamese




Vietnamese amongst other languages

     Viet-Muong group, Mon-Khmer branch, Austroasiatic family
     Characteristics: tonal and isolating (monosyllabic, uninflected)
     Historical hypothesis:
         Origin: Mon-Khmer, non-tonal
         Cultural exchanges: Thai, tonal
         Chinese influence: ideographic writing, vocabulary
         Colonial impact: Latin script, vocabulary, grammar




Alphabet

     derived from the Latin alphabet
     23 consonants with 24 pronunciations, represented by 19
     characters
     16 vowels (13 single vowels & 3 double vowels) with 17
     pronunciations, represented by 12 characters
     written horizontally, left to right, with syllables separated by spaces
     phonetic transcription: the written form records the pronunciation
     of a word independently of its meaning (like English, unlike Chinese) →
     like Thai




Tones

        Vietnamese is a tonal language: each word has a certain pitch
        characteristic with which it must be spoken to be properly
        understood (like Chinese, unlike English) → like Thai
        6 tones: mid, low, high, rising, tilde, and falling
                                 1  mid      (ngang)   a
                                 2  low      (huyền)   à
                                 3  high     (hỏi)     ả
                                 4  rising   (sắc)     á
                                 5  tilde    (ngã)     ã
                                 6  falling  (nặng)    ạ
        each syllable carries one tone (a syllable without a tone mark takes the mid tone)
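
The tone of a written syllable can be read off its diacritic. The following is a minimal sketch, assuming Unicode input; the helper name `tone_of` is illustrative and not from the slides. It decomposes the syllable (NFD) and looks for one of the five combining tone marks. The labels follow the slide's table, and vowel-quality marks (circumflex, breve, horn) are deliberately ignored.

```python
import unicodedata

# Combining marks that encode the five marked tones once text is in NFD form.
# Labels follow the slide's table; Vietnamese tone names are in the comments.
TONE_MARKS = {
    "\u0300": "low",      # huyền, grave accent: à
    "\u0309": "high",     # hỏi, hook above: ả
    "\u0301": "rising",   # sắc, acute accent: á
    "\u0303": "tilde",    # ngã, tilde: ã
    "\u0323": "falling",  # nặng, dot below: ạ
}


def tone_of(syllable: str) -> str:
    """Return the tone label of one written syllable (illustrative helper)."""
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return "mid"  # ngang: no tone mark present


if __name__ == "__main__":
    for s in ["công", "cụ", "phức", "tạp", "đẹp"]:
        print(s, tone_of(s))
```

Normalizing first means the function behaves the same whether the input uses precomposed characters or combining sequences.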



Syllable

      syllable ≡ morpheme
      consisting of one or more consonants and a simple or double
      vowel
      components of a single syllable
                                                     Tone
                                                            Rhyme
           Onset                Glide /w/           Nucleus  Coda
           (consonant)                              (vowel)  (consonant/semi-vowel)

  3 types of syllable:
      meaningful and usable as a word on its own (lexeme)
      meaningful, but used as a constituent of polysyllabic words (especially
      Sino-Vietnamese words)
      without a meaning of its own
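
The onset/rhyme split can be approximated from spelling alone. This is a rough sketch under stated assumptions: the onset inventory below is an orthographic list chosen for illustration, `qu` and `gi` are treated as onsets, the glide /w/ is not separated from the rhyme, and the names `strip_tone` and `split_syllable` are hypothetical.

```python
import unicodedata

# The five combining tone marks (grave, acute, tilde, hook above, dot below).
TONE_MARKS = {"\u0300", "\u0301", "\u0303", "\u0309", "\u0323"}

# Orthographic onsets, longest first so that "ngh" wins over "ng" and "n".
# This inventory is an assumption made for the sketch.
ONSETS = sorted(
    ["ngh", "ng", "nh", "ch", "gh", "gi", "kh", "ph", "qu", "th", "tr",
     "b", "c", "d", "đ", "g", "h", "k", "l", "m", "n", "p", "r", "s",
     "t", "v", "x"],
    key=len,
    reverse=True,
)


def strip_tone(syllable: str) -> str:
    """Remove the tone mark but keep vowel-quality marks (â, ă, ơ, ...)."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def split_syllable(syllable: str) -> tuple[str, str]:
    """Split one written syllable into (onset, toneless rhyme)."""
    base = strip_tone(syllable.lower())
    for onset in ONSETS:
        # Require a non-empty remainder so a rhyme is always returned.
        if base.startswith(onset) and len(base) > len(onset):
            return onset, base[len(onset):]
    return "", base  # syllable starts with a vowel


if __name__ == "__main__":
    for s in ["trong", "nghiêng", "đẹp", "ông"]:
        print(s, split_syllable(s))
```

A phonological analysis would go further (separating glide, nucleus, and coda); the sketch only shows that the outermost split is recoverable by longest-prefix matching.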

Word

       one or several syllables separated by spaces
       morphologically invariable: words are not inflected or
       conjugated for tense, number, gender, or subject-verb
       agreement → like Thai

 Examples
                          Vietnamese               English
                          công cụ                  tool
                          phức tạp                 complicated
                          đẹp                      beautiful, pretty




Sentence

     basic form: subject + predicate
     word order is important (S–V–O)
     flexible composition of sentences
     tense, politeness, nominalization and other language
     phenomena are expressed simply by adding
     various function words




Phenomena

    reduplicated words, expressions
    difficulty of Vietnamese text tokenization
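
Because spaces separate syllables rather than words, a tokenizer has to regroup syllables into words. A common baseline for this is greedy longest matching against a lexicon. The sketch below is illustrative only: the function name `segment`, the `max_len` parameter, and the toy lexicon are assumptions, and the slides do not say that any of the groups uses this method.

```python
def segment(text: str, lexicon: set[str], max_len: int = 3) -> list[str]:
    """Greedy longest-match word segmentation over space-separated syllables."""
    syllables = text.split()
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, then shrink down to one syllable.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            # A single syllable is always accepted as a fallback.
            if n == 1 or candidate.lower() in lexicon:
                words.append(candidate)
                i += n
                break
    return words


if __name__ == "__main__":
    # Toy lexicon of polysyllabic words (hypothetical, for illustration).
    lexicon = {"công cụ", "phức tạp"}
    print(segment("công cụ phức tạp", lexicon))
    # → ['công cụ', 'phức tạp']
```

Greedy matching commits to the first long match it finds, so it cannot recover when two lexicon entries overlap, which is one reason segmentation is listed as difficult.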




Categories of Vietnamese
  No.   Part-of-speech         Notation
  0.    Noun                      N
  1.    Verb                      V
  2.    Adjective                 A
  3.    Pronoun                   P
  4.    Adverb                    R
  5.    Preposition               O
  6.    Conjunction               C
  7.    Determiner                D
  8.    Numeral                   M
  9.    Interjection              I
  10.   Particle                  T

  Phrases:
       nominal phrase : NP
       verbal phrase : VP
       adjectival phrase : AP
       prepositional phrase : OP
       sentence : S




Specificities

       a rich class of classifier words

  một bài thơ = a poem (một "one" + classifier bài + thơ "poem")

       a complex system of pronouns

  nó, ông ấy, bà ấy, cô ấy, chị ấy, hắn, lão... = he/she/it

       frequent category shifts: the same word form serves as several parts of speech
                 Word        Category                          Meaning
                 trên        adjective                         upper, above
                             adverb, preposition               upper, on, over
                             noun                              the superior
                 trong       adjective                         in, inside, internal
                             preposition, conjunction          within
                             noun                              the interior

An example of ambiguity

  Ông già đi nhanh quá!
      The man goes quickly!
      The man died quickly!
      The man gets old quickly!
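
Part of this ambiguity can be read as a segmentation problem: on our reading, "ông già" (old man) and "già đi" (to grow old) overlap on the syllable "già". The sketch below enumerates every segmentation licensed by a lexicon. The function name and the two-entry toy lexicon are assumptions made for illustration. The "died" reading depends on a second sense of "đi" rather than on segmentation, so it is not distinguished here.

```python
def all_segmentations(syllables: list[str], lexicon: set[str]) -> list[list[str]]:
    """Return every way to group syllables into lexicon words or single syllables."""
    if not syllables:
        return [[]]
    results = []
    for n in range(1, len(syllables) + 1):
        head = " ".join(syllables[:n])
        # Single syllables are always allowed; longer units must be in the lexicon.
        if n == 1 or head in lexicon:
            for tail in all_segmentations(syllables[n:], lexicon):
                results.append([head] + tail)
    return results


if __name__ == "__main__":
    # Toy lexicon (hypothetical): two overlapping two-syllable entries.
    lexicon = {"ông già", "già đi"}
    sentence = "ông già đi nhanh quá"
    for seg in all_segmentations(sentence.split(), lexicon):
        print(" | ".join(seg))
```

With this lexicon the sentence has three segmentations, including "ông già | đi | nhanh | quá" and "ông | già đi | nhanh | quá". Choosing among them needs context or a statistical model, not just a dictionary.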




Thank you for your attention!




                           Question and answer




                                   Part II

                  The NLP in Vietnam




Recent events

  Vietnamese Language and Speech Processing (VLSP) workshops:
         VLSP workshop, 29 March 2005, Hanoi
         VLSP workshop, 21 May 2005, Hanoi
         VLSP workshop, July 2005, Hanoi
         VLSP meeting, 21-25 Nov. 2005, JAIST (Japan Advanced Institute
         of Science and Technology), Japan
VLSP national project 2006-2010

         National project with the participation of more than 10 research groups
         (all groups active in VLSP)
         Leaders: Prof. Ho Tu Bao (JAIST) and Assoc. Prof. Luong
         Chi Mai (IOIT, Vietnam National Institute of Technology)
         Objectives:
           1   Build and develop several typical VLSP products for public
               end users
           2   Build and develop indispensable resources and tools for
               VLSP development
Main NLP groups (1/3)

    No.   Group                               Experience
    1.    National Center for                 Rule-based approach to English-Vietnamese
          Technology Progress                 MT systems. These are the only commercial
                                              MT systems in Vietnam (EVTRAN3.0,
                                              VETRAN3.0)
    2.    Univ. of Natural Sciences,          Transfer-based MT using BTL (Bitext
          VNU HCM                             Transfer Learning) for an
                                              English-Vietnamese MT system. Experience
                                              in building dictionaries and a bilingual
                                              corpus
    3.    HCM Univ. of Technology,            Active since 1989, with various trials.
          VNU HCM                             Statistical approach to Vietnamese-English
                                              translation (since 2002); phrase-based
                                              approach to English-Vietnamese translation
                                              and phrase extraction from the Penn
                                              Treebank (since 2003)


Main NLP groups (2/3)

    No.   Group                               Experience
    4.    JAIST                               Previously: rule-based approach to an
                                              English-Vietnamese MT system. The system
                                              is complete but not yet published.
                                              Currently: focus on statistical MT, and
                                              improving the rule-based MT system with
                                              statistical techniques
    5.    Hanoi University of                 Text alignment, bilingual corpus, building
          Science, VNU HN                     tools: POS tagging, chunking, parsing
    6.    College of Technology,              Previously: rule-based approach to an
          VNU HN                              English-Vietnamese MT system. Currently:
                                              focus on statistical MT, and improving
                                              the rule-based MT system with
                                              statistical techniques
    7.    Vietnam Lexicography                Experience in building dictionaries and
          Center                              corpora

Main NLP groups (3/3)

    No.   Group                               Experience
    8.    Hanoi Univ. of Technology           Developing tools: POS tagging, chunking,
                                              parsing
    9.    Danang Univ.                        Developing tools; building
                                              French-Vietnamese-French dictionaries
                                              (Papillon Project)
    10.   IOIT                                Automatic speech recognition and synthesis,
                                              optical character recognition, building
                                              speech corpora (VnVoice, VnDOCR)
    11.   International Research              Automatic speech recognition and synthesis,
          Centre MICA, HUT                    optical character recognition, building
                                              speech corpora




Research content
   1   Basic research: computational methods for VLSP
   2   Typical products for the end-users
   3   Resources and tools for VLSP




Basic research

      Basic research on methods for processing Vietnamese
      language and speech
      Applied research adapting methods and technologies developed for
      other languages, as well as other advanced techniques, to
      Vietnamese language and speech




Products for end users
    1   VnVoice system for Vietnamese speech synthesis
    2   Embedded speech synthesis and recognition system
    3   Large lexicon-based speech recognizer
    4   Domain-specific English-Vietnamese translation system
    5   IREST system for information retrieval, extraction,
        summarization, and translation
    6   Vietnamese spelling checker




Resources and tools
    1   Basic resources for speech
    2   Corpus for speech synthesis and recognition
    3   Three basic resources for language
          1   Vietnamese MRD (machine-readable dictionary)
          2   Annotated corpora (monolingual, multilingual)
          3   Entities (rules of Vietnamese grammar)
    4   Five basic tools for language
          1   Spelling checker
          2   Vietnamese word segmentation
          3   Vietnamese POS tagger
          4   Vietnamese chunker
          5   Vietnamese syntax analyzer




Overview

     Each group has developed dictionaries and corpora
     according to its own needs and capabilities.
     E-V dictionaries are well developed; V-E dictionaries are still under debate
     Corpora
           Some work in the past
           New plan for corpora




Japanese EDR-based Dictionary (JAIST)

     Model for such a dictionary (in NLP project 2001-2003)
     Can benefit from Japanese EDR
         English word dictionary
         Concept dictionary with concept primary illustration and
         concept explication in Vietnamese
         English co-occurrence dictionary
         EDR Corpus (English Corpus)
     Components to be newly built
         Vietnamese word dictionary
         English co-occurrence dictionary
         Bilingual dictionary English-Vietnamese, Vietnamese-English
         EDR Corpus (Vietnamese Corpus)




Dictionary for machine translation (from JAIST group)




  95,000 words; 15,000 phrases; 18,000 translation patterns



The JAIST group's MT system on the web




Some corpora

     Monolingual corpora for Vietnamese: VLC (Vietnam Lexicography Centre),
     UNS-VNUHCM, etc.
     Bilingual corpora: the EVC corpus (UNS-VNUHCM)
     consists of 400,000 English-Vietnamese sentence pairs (approx. 5,500,000
     words) in the fields of science and technology. The EVC is
     being partially annotated, semi-automatically, with morphological
     information (word boundaries, lemmas), POS tags, and sense tags.




Development of supporting tools




Capacity and realization
         Available tools
              Word segmentation, POS tagging
              Deep parsing in the TAG (Tree-Adjoining Grammar) formalism
              Syllable list and morpho-syntactic lexicon
              Editor for segmentation and tagging revision
              Some utilities for corpus exploration
         Ongoing work
              Improvement of available tools
              Improvement of the tagset for POS tagging
              Building syntactic lexicon based on morpho-syntactic lexicon
              Collection of a balanced corpus following the above criteria




Overview

     Four main MT groups, each taking a different approach:
           Rule-based approach to English-Vietnamese MT systems
           Transfer-based MT using BTL (Bitext Transfer Learning) for
           English-Vietnamese MT systems
           Statistical approach to Vietnamese-English translation
           Example-based, phrase-based approach






Rule-based approach






Rule-based approach: current status
      The MT research group was established in 1990, starting with an
      English-to-Vietnamese MT system
          Transfer technology
          Dictionary with 12,000 entries, 500 grammar rules
      1997: EVTRAN 1.0
          2,000 grammar rules, 60,000 entries
      1999: EVTRAN 2.0
          3,000 grammar rules, 250,000 entries
          Commercial software in Vietnam
          Listed in Compendium of Translation Software (EAMT)
      2005: EVTRAN 3.0
          Automatic source language identification
          10,000 grammar rules, 530,000 entries




BTL approach
     BTL (Bitext Transfer Learning) model for English-Vietnamese
     MT, starting from the annotated EVC (bitext):
         automatically extract “transfer rules” (lexical and structural)
         with a learning algorithm
         then apply those rules to tag the target-language (Vietnamese)
         sentence
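
The two steps above can be sketched roughly as follows. The slides do not specify the learning algorithm, so this sketch substitutes simple frequency counting over a word-aligned toy bitext. It also assumes the rules are applied by reordering the tagged English tokens and mapping them to Vietnamese words, which is one possible reading of the second step. All data, function names and tag names are invented for illustration.

```python
from collections import Counter, defaultdict

# Toy annotated bitext (invented): POS-tagged English tokens, Vietnamese
# words, and word alignments as (english_index, vietnamese_index) pairs.
BITEXT = [
    ([("I", "PRP"), ("read", "VB"), ("books", "NNS")],
     ["tôi", "đọc", "sách"], [(0, 0), (1, 1), (2, 2)]),
    ([("new", "JJ"), ("book", "NN")],
     ["sách", "mới"], [(0, 1), (1, 0)]),
    ([("good", "JJ"), ("book", "NN")],
     ["sách", "hay"], [(0, 1), (1, 0)]),
]


def learn_lexical_rules(bitext):
    """Map each (English word, POS) to its most frequent aligned word."""
    counts = defaultdict(Counter)
    for en, vi, alignment in bitext:
        for i, j in alignment:
            word, pos = en[i]
            counts[(word.lower(), pos)][vi[j]] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def learn_reorder_rules(bitext):
    """Collect adjacent POS pairs whose aligned targets are mostly swapped."""
    votes = defaultdict(Counter)
    for en, _vi, alignment in bitext:
        target_of = dict(alignment)
        for i in range(len(en) - 1):
            if i in target_of and i + 1 in target_of:
                swapped = target_of[i] > target_of[i + 1]
                votes[(en[i][1], en[i + 1][1])][swapped] += 1
    return {pair for pair, c in votes.items() if c[True] > c[False]}


def transfer(tagged_en, lexical, reorder):
    """Apply structural (swap) rules, then lexical rules."""
    tokens = list(tagged_en)
    i = 0
    while i < len(tokens) - 1:
        if (tokens[i][1], tokens[i + 1][1]) in reorder:
            tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    # Unknown words are passed through unchanged.
    return [lexical.get((w.lower(), p), w) for w, p in tokens]


LEXICAL = learn_lexical_rules(BITEXT)
REORDER = learn_reorder_rules(BITEXT)
print(transfer([("I", "PRP"), ("read", "VB"), ("good", "JJ"), ("book", "NN")],
               LEXICAL, REORDER))
```

In this toy data the only structural rule learned is the adjective-noun swap, reflecting that Vietnamese places the adjective after the noun.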






BTL approach






BTL approach: current status

     Two machine translation systems: EVT 1.0 and VCLEVT 2.0
     EVT 1.0
         A rule-based MT system
         Evaluated by PC World Vietnam magazine in 1998: 65% for
         simple sentences; 50% for normal sentences; and 35% for
         complex sentences
     VCLEVT 2.0
         Uses the BTL model
         Learns automatically from a bilingual corpus
         Achieves better translation quality on informatics documents






Statistical approach






Statistical approach: current status
      History: 1999-2003
          Developed an English-Vietnamese MT system at an
          information company in Vietnam. The system was based on the
          transfer approach
      2004-present
          Research on modern MT techniques: example-based MT,
          SMT, phrase-based SMT






Directions for MT improvement

     Develop a new MT system that combines the advantages of rule-based,
     example-based, and statistical machine translation
     Apply advances in English language processing to improve the current MT
     systems
     Build powerful and intuitive tools that help users
     modify and edit the dictionary








Current and future demands

     Current development needs: tourism, economy,
     communication, etc.
     Demand is increasing for both human and
     automatic translation, especially translation on the
     Internet
     Lack of translation experts, especially for foreign languages
     other than English that are important for Vietnam,
     such as Japanese and Chinese
     Demand for translation will increase in the future because of
     growing global integration






Thank you for your attention!




                            Question and answer



