Arabic Natural Language Processing - University of Balamand

									Arabic Natural Language Processing:
   State of the Art and Prospects

              Rached Zantout, Ph.D.
Electrical and Computer Engineering Department
            Hariri Canadian University
            Mechref, Chouf, Lebanon
                                Outline
   •     What is NLP?
   •     Why NLP?
   •     MT as a case study!
       –     Problems solved by MT.
       –     Main players in MT.
   •     How does Arabic compare to other Languages as
         far as NLP is concerned?
       –     MT as a case study.
   •     What kind of research is being conducted in
         ANLP?
   •     Recommendations!


11/2006UOB                    Zantout ANLP: State of the Art and Prospects   2
                                Tracing the history of NLP

Mid-90s-present
                1. Coming together of Symbolic and Statistical traditions;
                2. Increased focus on functionality and less on representation; NL and Speech Applications
                3. Availability of large corpora, large disk space, commoditization of computing resources.
                4. Emergence of the World Wide Web, continues to change the field...

Mid-80s; mid-90s
                1. Revival of Finite-state Models for NL processing; XEROX, AT&T
                2. Computational implementation of large NL Grammars in different grammatical frameworks and
                3. Development of Penn Treebank; a parse-annotated corpus
                4. Machine learning coming of age;

1970s; mid-1980s
                1. Symbolic tradition: (CS)
                   a. How much representational power is needed for NL? Grammars of increasing power to describe NL:
                      Joshi, Gazdar, Bresnan, Kaplan, Pereira, Warren, Shieber.
                   b. AI Researchers: Winograd, Schank, Wilks, Lehnert, Woods: Understanding systems; SHRDLU, scripts,
                      plans and goals, LUNAR
                   c. Discourse and Dialog Structure: Grosz, Sidner, Hobbs, Perrault, Cohen
                2. Statistical tradition: (EE) Hidden Markov models for speech recognition;

1960s
                1. Symbolic tradition: (CS) Generative grammar; parsing algorithms;
                   Newell, Simon, Shannon, McCarthy, Minsky, Rochester: Birth of AI; pattern matching based NL
                   understanding system.
                2. Statistical tradition: (EE) Probabilistic inferences for OCR; representationally-light models

1950s
                1. Kleene's and Shannon's probabilistic finite automaton; Chomsky's context-free grammars;
                   programming languages; formal language theory
                2. Shannon's information theory: information can be measured; decoding paradigm

1939-1945
                World War II; Need for code-breaking algorithms; ENIAC

1936
                Turing's model of computation; theoretical basis for computer science

                  NL and NLP definitions
             adapted from http://www.cs.bham.ac.uk/~pxc/nlpa/index02.htm
       • 'natural language' (NL):
             – Any of the languages naturally used by humans,
             – not an artificial or man-made language such as a
               programming language.
             – (Arabic, English, Chinese, Swahili, etc.)
             – evolved over thousands of years.
             – efficient vehicles for human to human communication.
       • 'Natural language processing' (NLP):
             – attempts to use computers to process a NL.
             – Enter computers.
                • What's the connection?


                                    Why?
              adapted from http://www.cs.utexas.edu/users/ear/cs378NLP/
       • Is there any reason a computer should know
         English or Chinese or Swahili?
       • Yes. There are several "killer apps" for
         NLP:
             – retrieving information from the web,
             – translating documents from one language to
               another, and
             – spoken front ends to all kinds of application
               programs.


                             NLP includes
             adapted from http://www.cs.bham.ac.uk/~pxc/nlpa/index02.htm
       • Speech synthesis:
             – is this very 'intelligent'?
             – synthesis of natural-sounding speech is technically complex:
                 • requires some 'understanding' of what is being spoken to ensure, for
                   example, correct intonation. (bear vs. dear)

       • Speech recognition:
             – reduction of continuous sound waves to discrete words.

       • Natural language understanding:
             – moving from isolated words (written or via speech recognition)
             – to 'meaning'.

       • Natural language generation:
             – generating appropriate NL responses to unpredictable inputs.

       • Machine translation (MT): translating one NL into another
                     Areas Related to NLP
       • Input:
             – Speech Recognition.
             – Natural Language Understanding.
                 • Lip Reading ?
       • Processing:
             – Information Retrieval:
                 • Finding where textual resources reside.
             – Information Extraction:
                 • Extracting pertinent facts from textual resources.
             – Inference: Drawing conclusions based on known facts.
             – Spelling Correction.
             – Grammar Checking.
       • Output:
             – Natural Language Generation.
             – Speech Synthesis.
       • Machine Translation.
       • Conversational Agents.
                                                   NLP
                            taken from http://tangra.si.umich.edu/~radev/NLP/notes/1.ppt
    •   Information extraction
    •   Named entity recognition
    •   Trend analysis
    •   Subjectivity analysis
    •   Text classification
    •   Anaphora resolution, alias resolution
    •   Cross-document cross-reference
    •   Parsing
    •   Semantic analysis
    •   Word sense disambiguation
    •   Word clustering
    •   Question answering
    •   Summarization
    •   Document retrieval (filtering, routing)
    •   Structured text (relational tables)
    •   Paraphrasing and paraphrasing/entailment ID
    •   Text generation
    •   Machine translation

                            Sample projects
       •     Noun phrase parser
       •     Paraphrase identification
       •     Question answering
       •     NL access to databases
       •     Named entity tagging
       •     Rhetorical parsing
       •     Anaphora resolution, entity crossreference
       •     Document and sentence alignment
       •     Using bioinformatics methods
       •     Encyclopedia
       •     Information extraction
       •     Speech processing
       •     Sentence normalization
       •     Text summarization
       •     Sentence compression
       •     Definition extraction
       •     Crossword puzzle generation
       •     Prepositional phrase attachment
       •     Machine translation
       •     Generation
       •     Semi-structured document parsing
       •     Semantic analysis of short queries
       •     User-friendly summarization
       •     Number classification
       •     Domain-specific PP attachment
       •     Time-dependent fact extraction



             Main research forums and other pointers
    • Conferences: ACL/NAACL, SIGIR, AAAI/IJCAI, ANLP,
      Coling, HLT, EACL/NAACL, AMTA/MT Summit,
      ICSLP/Eurospeech
    • Journals: Computational Linguistics, Natural Language
      Engineering, Information Retrieval, Information Processing and
      Management, ACM Transactions on Information Systems,
      ACM TALIP, ACM TSLP
    • University centers: Columbia, CMU, JHU, Brown, UMass,
      MIT, UPenn, USC/ISI, NMSU, Michigan, Maryland,
      Edinburgh, Cambridge, Saarland, Sheffield, and many others
    • Industrial research sites: IBM, SRI, BBN, MITRE, MSR,
      (AT&T, Bell Labs, PARC)
    • Startups: Language Weaver, Ask.com, LCC
    • The Anthology: http://www.aclweb.org/anthology


                                  NLP Sources
       • Journals:
             –   Artificial Intelligence.
             –   Computational Intelligence.
             –   IEEE Transactions on Intelligent Systems.
             –   Journal of Artificial Intelligence Research.
             –   Cognitive Science.
             –   Machine Translation.
       • Conferences:
             –   AAAI: American Association for Artificial Intelligence.
             –   IJCAI: International Joint Conference on Artificial Intelligence.
             –   Cognitive Science Society Conferences.
             –   DARPA Speech and Natural Language Processing Workshop.
             –   ARPA Workshop on Human Language Technology.
             –   Machine Translation Summit series of conferences.
             –   TALN series of conferences.
             –   COLING series of conferences.
       • Collection of papers:
             – Readings in Natural Language Processing.

                            Why NLP?
                            Numbers
       Information age! Information revolution!
   •   Cheaper PCs
   •   Advances in networking
   •   Internet/www central pillar of modern societies
   •   Massive production of information
   • Growth of www?
        Year       1992    1993    1994    1996     Sep. 1999
        # Sites    50      250     2000    >100K    43 M
   • 800 Million Documents as of Sep. 1999
   • People?
     US:         6.5 M new adult users between 2/99 & 5/99
     World:      26 Million in 1995
                 163.25 Million as of 9/99
             More Recent Statistics (2006)




                   Web Characterization: Country Statistics
               http://www.oclc.org/research/projects/archive/wcp/stats/intnl.htm

   1999                                        2002
   Country       Percent of public sites       Country        Percent of public sites
   US            49%                           US             55%
   Germany       5%                            Germany        6%
   UK            5%                            Japan          5%
   Canada        4%                            UK             3%
   Japan         3%                            Canada         3%
   Australia     2%                            Italy          2%
   Brazil        2%                            France         2%
   Italy         2%                            Netherlands    2%
   France        2%                            Others         18%
   Others        16%                           Unknown        4%
   Unknown       10%




                     Web Characterization: Language Statistics
              http://www.oclc.org/research/projects/archive/wcp/stats/intnl.htm
 1999                                          2002
 Language        Percent of public sites       Language       Percent of public sites
 English         72%                           English        72%
 German          7%                            German         7%
 French          3%                            Japanese       6%
 Japanese        3%                            Spanish        3%
 Spanish         3%                            French         3%
 Chinese         2%                            Italian        2%
 Italian         2%                            Dutch          2%
 Portuguese      2%                            Chinese        2%
 Dutch           1%                            Korean         1%
 Finnish         1%                            Portuguese     1%
 Russian         1%                            Russian        1%
 Swedish         1%                            Polish         1%

        What's the Use of the Numbers?
       • Prove that there is a “Linguistic Problem”:
             – Domination of the English Language.
             – Alienates non-English Speakers.
             – Computers are our interface to the internet:
                • Computers do not understand a Natural Language.
                • We do not have enough time to guide computers to
                  do what is required of them
                   – E.g. Search for all presentations about NLP on the
                     internet.
                   – Digest them and produce one presentation appropriate for
                     my talk at UOB ;-)



        What's the Use of the Numbers?
       • Middle-East is a growing internet market:
             – Growing very fast.
             – Lots of Arabs (read non-English speakers).
             – Need to communicate in my own language.
             – Need computer to save time for me while
               searching for information.
             – Dream: computer could do most of my work
               and I can just relax :-)
       • Introducing the A into ANLP.

                   The Linguistic Problem
        Machine Translation (MT) a Case Study
       English: the de-facto international language
             •   Internet and www (“CyberEnglish”!)
             •   Science and Technology
             •   Trade and Industry
             •   Politics and Media
             •   Tourism
             •   Etc.
    English = key to accessing Knowledge
        in all walks of life!
    Alienation of the HUGE majority of world population
    Impoverishment of world cultures
             The Linguistic Challenge
   France:
   • 1997:      7% French presence on www
   • Legislation introduced (forcing Internet content providers to
     translate web sites into French)
   • Pres. Chirac: “If in the new media, our language, our
     programs, our creations, are not strongly present, the
     young generation of our country will be economically
     and culturally marginalized”
   • “I do not want to see the European Culture sterilized or
     obliterated by the American Culture”
   French is stronger than Arabic on the internet and the PC.

                           If not General NLP!
                          How about at least MT?

  Languages in the world
        • 6,800 living languages
        • 600 with written tradition
        • 95% of world population
            speaks 100 languages

  Translation Market
        • $8 Billion Global Market
        • Doubling every five years



  (Donald Barabé, invited talk, MT Summit 2003)
                   The Problem
    • Coping with the huge amount of articles, books,
      patents in all disciplines (Assimilation)
    • Coping with the www massive volume
    • Exporting economic products (Dissemination)
    • Facing the Omnipresence of English
      50% of all scientific and technical references
    Linguistic, cultural, social, educational,
      economic, and political factors

    Human Translation too limited → MT


        Translation Cost in EU is $1 Billion
             Official Languages: from 11 to 20
         1600 Human Translators
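
A rough way to see why the jump from 11 to 20 official languages strains human translation is to count translation directions. The sketch below assumes every language must be translated directly into every other one, with no pivot language; the function name `translation_directions` is illustrative and not part of the original talk.

```python
def translation_directions(n_languages: int) -> int:
    """Count ordered (source, target) pairs among n languages.

    Assumption: each language is translated directly into every other
    language, so the count is n * (n - 1).
    """
    return n_languages * (n_languages - 1)


# The two figures quoted on the slide: 11 and 20 official languages.
for n in (11, 20):
    print(f"{n} languages -> {translation_directions(n)} translation directions")
```

Under this assumption the number of directions grows roughly with the square of the number of languages, so it more than triples while the number of languages does not even double.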




              Why Machine Translation?
       • Full Translation
             – Domain specific
                • Weather reports
       • Machine-aided Translation
             – Translation dictionaries
             – Translation memories
             – Requires post-editing
       • Cross-lingual NLP applications
             – Cross-language IR
             – Cross-language Summarization
             MT: A Strategic Choice
       • USA: FCCSET report on MT (1993) at the
         president's request.
       • Japan: $200 Million over 15 years, up to
         1991. (Asian Multilingual MTS since 87)
       • EU: since 1991, 220 projects on Language
         Technology ($30 million on Eurotra!)
              1996 report on the state of MT



                          MT Players
       • Governments:
         US, European, Japan, Canada, ex-USSR
         (cold war), Korea, Malaysia, Indonesia,
         Thailand, etc.
       • International institutions:
             – UN, European Commission (12 languages; soon to be
               22/23!!), etc.
       • Companies (R&D): Microsoft, Siemens,
             Fujitsu, Hitachi, Toshiba, Oki, NEC,
             Mitsubishi, Sharp…
                     MT Market
       • World: estimated at $20 billion in 1991
       • MT Tools Market: $20 million in 1994
       • > 160 language pairs
       • > 60 MTSs being developed (as of 98)
       • Globalink claims 600 K users of its MTS
       • Lang. Eng. Corp. income (LogoVista): $2M
       • Smart Communications (Smart Translator):
         $6M
       • Systran (12 languages): 60,000 pages/year
                  AMT




                        ANLP
             AsharqAlawsat (الشرق الأوسط) 09.10.03




     ANLP State Compared to General NLP
       • Script problem:
             – Arabic characters are nowhere near Latin-
               Based Characters.
       • Lack of funding:
             –   Governments.
             –   Pan-Arab Organizations.
             –   Industry ?! Private Sector.
             –   Research ???
             –   Infrastructure !


                    Progress in Western MT
                          Statistical MT example

   2002:
     insistent Wednesday may recurred her trips to Libya tomorrow for flying
     Cairo 6-4 ( AFP ) - an official announced today in the Egyptian lines
     company for flying Tuesday is a company " insistent for flying " may
     resumed a consideration of a day Wednesday tomorrow her trips to Libya
     of Security Council decision trace international the imposed ban comment .

   2003:
     Egyptair Has Tomorrow to Resume Its Flights to Libya
     Cairo 4-6 (AFP) - said an official at the Egyptian Aviation Company today
     that the company egyptair may resume as of tomorrow, Wednesday its flights
     to Libya after the International Security Council resolution to the
     suspension of the embargo imposed on Libya.

   Human Translation:
     Egypt Air May Resume its Flights to Libya Tomorrow
     Cairo, April 6 (AFP) - An Egypt Air official announced, on Tuesday, that
     Egypt Air will resume its flights to Libya as of tomorrow, Wednesday, after
     the UN Security Council had announced the suspension of the embargo imposed
     on Libya.

   From a talk by Charles Wayne, DARPA
                        A First taste of
                  Arabic Machine Translation
       • English Text:
              – Before more than 30,000 fans who headed to the Cite
                Sportive from all Lebanese region on Sunday Nejmeh
                drew 1-1 with their traditional rivals Ansar in a
                breathtaking showdown, which saw both teams
                performing their best.
       • Human Translation:
              – أمام أكثر من 30.000 متفرج زحفوا إلى ملعب المدينة الرياضية نهار الأحد تعادل النجمة و الانصار 1-1 في مباراة مثيرة قدم خلالها الفريقان عرضاً طيّباً افتقدته الملاعب اللبنانية منذ فترة طويلة.
       • Ajeeb Translation:
              – قبل أكثر من 30،000 معجب الّذين اتجهوا إلى اليذكر لعوب من كل المنطقة اللّبنانيّة يوم الأحد نجمة رسم 1-1 مع ترادي

             A 1st Taste of Arabic MT
       • A sample of sentences to be translated:




       • Quite disappointing!
       • But, need for a more formal assessment and
         closer scrutiny
                  Multilingual Challenges
                     Morphological Variations

       • Affixation vs. Root+Pattern
             write    written           ‫كتب‬                ‫مكتوب‬
             kill     killed            ‫قتل‬                ‫مقتول‬
             do       done              ‫فعل‬                ‫مفعول‬
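
The contrast in the table above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: roots are exactly three consonants, the single pattern shown is the one visible in the table (م + C1 + C2 + و + C3), and the names `apply_pattern`, `add_suffix`, and `PASSIVE_PARTICIPLE` are hypothetical. Real Arabic morphology also involves diacritics and weak roots, which are ignored here.

```python
from typing import List, Union

# A pattern is a sequence of literal letters (str) and root-slot indices (int).
Pattern = List[Union[str, int]]

# Pattern seen in the table: meem + C1 + C2 + waw + C3.
PASSIVE_PARTICIPLE: Pattern = ["م", 0, 1, "و", 2]


def apply_pattern(root: str, pattern: Pattern) -> str:
    """Interleave the three root consonants with the pattern's fixed letters."""
    if len(root) != 3:
        raise ValueError("this sketch only handles three-consonant roots")
    return "".join(root[p] if isinstance(p, int) else p for p in pattern)


def add_suffix(stem: str, suffix: str) -> str:
    """Affixation in the English style: the stem stays intact, material is appended."""
    return stem + suffix


# Root + pattern: the root letters are spread through the derived word.
for root in ("كتب", "قتل", "فعل"):
    print(root, "->", apply_pattern(root, PASSIVE_PARTICIPLE))

# Affixation: the stem is a contiguous substring of the derived word.
print("kill ->", add_suffix("kill", "ed"))
```

The practical consequence for NLP tools is that stripping a suffix recovers an English stem, whereas recovering an Arabic root requires matching the word against patterns.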




                     Translation Divergences
                                 conflation

     Language   Head    Dependents         Sentence               Literal gloss
     Arabic     ليس     انا, هنا           لست هنا                I-am-not here
     English    be      I, not, here       I am not here
     French     etre    Je, ne pas, ici    Je ne suis pas ici     I not be not here

                    Translation Divergences
                categorial, thematic and structural

     Language   Head    Dependents      Sentence       Literal gloss
     Arabic     *       انا, بردان      انا بردان      I cold
     English    be      I, cold         I am cold

              Translation Divergences
                  head swap and categorial

     Language   Head           Dependents
     English    swim (swam)    I, across (-> river), quickly
     Arabic     اسرع           انا, عبور (-> نهر), سباحة

     I swam across the river quickly
     اسرعت عبور النهر سباحة
     I-sped crossing the-river swimming
               Translation Divergences
                 head swap and categorial

     [Figure: dependency trees for Arabic (اسرع -> انا, عبور (-> نهر), سباحة) and
      English (swim -> I, across (-> river), quickly), with nodes labelled by
      category ("verb").]

              Fluency vs. Accuracy

     [Figure: MT types plotted by Fluency (vertical axis) against Accuracy
      (horizontal axis): conMT, Info. MT, Prof. MT, FAHQ MT.]

                    Evaluation of MTSs
       • Various methodologies put forward
       • Various aspects considered:
         Intelligibility, Fidelity, and other software
         engineering features
       • Mostly human-centered:
             – Get users to compare human and machine translation
             – Get users to evaluate MT output on a scale (e.g.
               1-5)
       • Subjective to a large extent

                Automatic Evaluation Example
                                Bleu Metric

             Test Sentence                       Gold Standard References
  colorless green ideas sleep furiously          all dull jade ideas sleep irately
                                             drab emerald concepts sleep furiously
                                            colorless immature thoughts nap angrily


     Unigram precision = 4 / 5 = 0.8
     Bigram precision = 2 / 4 = 0.5
     Bleu Score = (a1 × a2 × … × an)^(1/n), where ai is the i-gram precision
                = (0.8 × 0.5)^(1/2) = 0.6325 → 63.25
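
The arithmetic on this slide can be reproduced with a short Python sketch. It follows the simplified formula shown here, the geometric mean of clipped n-gram precisions up to bigrams, and deliberately leaves out the brevity penalty and the 4-gram default of the full Bleu metric. The function names (`ngrams`, `clipped_precision`, `simple_bleu`) are illustrative and do not come from any library.

```python
from collections import Counter
from math import prod


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n):
    """Fraction of candidate n-grams found in any reference.

    Each n-gram is credited at most as often as it occurs in the single
    reference where it is most frequent (the usual "clipping").
    """
    cand_counts = Counter(ngrams(candidate, n))
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    matched = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return matched / total if total else 0.0


def simple_bleu(candidate, references, max_n=2):
    """Geometric mean of the 1..max_n clipped precisions (no brevity penalty)."""
    precisions = [clipped_precision(candidate, references, n) for n in range(1, max_n + 1)]
    return prod(precisions) ** (1 / len(precisions))


candidate = "colorless green ideas sleep furiously".split()
references = [
    "all dull jade ideas sleep irately".split(),
    "drab emerald concepts sleep furiously".split(),
    "colorless immature thoughts nap angrily".split(),
]

print("Unigram precision:", clipped_precision(candidate, references, 1))
print("Bigram precision: ", clipped_precision(candidate, references, 2))
print("Bleu (n <= 2):    ", round(simple_bleu(candidate, references), 4))
```

Only "green" is missing from every reference, and only the bigrams "ideas sleep" and "sleep furiously" occur in a reference, which gives the 4/5 and 2/4 figures above.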
                 Evaluating AMT Systems
       • 3 Arabic MT systems tested:
         - Al-Mutarjim Al-Arabey (ATA Software Tech.)
         - Al-Wafi (by ATA Software Tech.)
         - Arabtrans (by Arab.Net Tech.)
       • Sample texts translated.
       • Scoring by a human (1 or 0.5 or 0 )
       • Results:
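
As a side illustration of the 1 / 0.5 / 0 scoring scheme mentioned above, the sketch below averages per-sentence judgements into one score per system. It is not the study's procedure or its results: the unweighted mean is an assumption, and the system names and scores are invented placeholders used only to exercise the code.

```python
from statistics import mean

# Scores a human judge may assign to one translated test item.
ALLOWED_SCORES = {0, 0.5, 1}


def system_score(judgements):
    """Average a list of per-item scores (assumed: simple unweighted mean)."""
    if not judgements:
        raise ValueError("no judgements given")
    for score in judgements:
        if score not in ALLOWED_SCORES:
            raise ValueError(f"score {score!r} is not one of 1, 0.5, 0")
    return mean(judgements)


# Invented placeholder data, NOT results from the evaluation described here.
placeholder_judgements = {
    "system_A": [1, 0.5, 0, 1],
    "system_B": [0.5, 0.5, 0, 0],
}

for name, scores in placeholder_judgements.items():
    print(name, system_score(scores))
```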




                   Analysis of the results
  • Poor AMT systems overall
  • Good Lexicon coverage in the domain
    “Internet and Arabisation”
  • Very Poor Grammatical results:
       – detailed analysis focuses on bad areas.
       – Pronoun resolution and semantic correctness
             • barely above average
                – (almost 1 error out of each 2 cases!)

  • The technology used in AMT systems is “outdated”

                           Future Work

   • Develop awareness of the importance of MT and
     NLP for Arabic.

   • Develop our own MT system based on all that
     we have learned from the evaluation
       – Focus on Statistical techniques:
             • Speed of Implementation.
             • Obtaining better results.


                        AMT and Lebanon
                            ECOMLEB, no.2, 1st Quarter 2005
       • “How can you explain why so many in the IT Field can't find a job in
         Lebanon when we keep hearing that we are the best in the region?”,
         Reader's Comments, p. 02.
       • “Khan Al-Saboun”, a local soap maker in Tripoli, now sells soaps all
         over the world. University Series, p. 05.
       • “… Lebanon has one of the highest rates of internet usage in the area,
         a good PC penetration, abundant human talent and resources in IT and
         particularly software and web design, and no money transfer
         restrictions” Interview with Minister of Economy and Trade, H.E.
         Adnan Kassar, p. 16.
       • “…[Lebanon needs to] reduce brain drain” Interview with Minister of
         Economy and Trade, H.E. Adnan Kassar, p. 17.
       • “…[Lebanon has] a multilingual and highly educated human resource
         [base]” Interview with Minister of Economy and Trade, H.E. Adnan
         Kassar, p. 17.
       • “B2C e-commerce is expected to cross US$ 1 Billion mark by 2008 in
         GCC countries … particularly in e-shopping … mainly in Saudi
         Arabia and the UAE … compound average growth of 22% over 5 years
         … > 33.33% of transactions are booking for airline and hotels.”

                             Recommendations
   • Develop Arab acceptance of the strategic nature of ANLP/AMT
   • Establish an Arab Centre for Arabic language processing and
     AMT
       – Gather Arab researchers
       – Host and sponsor research:
             • Morphology
             • Parsing
             • Speech
             • Semantics, pragmatics
       – Build a central repository:
             • Software
             • Lexicons
             • Corpora
             • Tools
             • Archive (literature)


                Recommendations (cont.)
   • Strengthen ties between Academia, research centers, and
     industry
   • Sponsor Pan-Arab projects (ESPRIT-like)
   • Sponsor conferences, exhibitions, and trade shows:
       – Coordinate Different Conferences:
             • 2 upcoming ANLP conferences AT THE SAME TIME in 2 Different
               places (KSA and Morocco)
             • Plan for a third (UAE).
   • Strengthen links with western institutions (on NLP/MT):
       – Already western researchers are active in ANLP:
       – A workshop in London in the same time frame as both
         conferences in KSA and Morocco.

              Thank you for your patience!
•   References:
     – Ahmed Guessoum, Rached Zantout, A Methodology for Evaluating Arabic Machine Translation
       Systems, Machine Translation, Volume 18, Issue 4, Dec 2004, Pages 299 - 335
     – R. Zantout and A. Guessoum, An Automatic English-Arabic HTML Page Translation System,
       Journal of Network and Computer Applications, vol. 24, no. 4, October 2001.
     – A. Guessoum and R. Zantout, A Methodology for a semi-automatic evaluation of the language
       coverage of machine translation system lexicons, The Journal of Machine Translation, Kluwer
       Academic Publishers, The Netherlands, vol. 16, October 2001.
     – Zantout, Rached and Guessoum, Ahmed, Arabic Machine Translation: A Strategic Choice for the
       Arab World, Journal of King Saud University, Vol. 12, Computer and Information Sciences, pp.
       117-144, A.H. 1420-2000.
     – Ahmed Guessoum, Rached Zantout, Machine Translation, A Strategic Dimension for the Arab
       World, University Forum, University of Sharjah, Issue 41, Year 6, Muharram 1427, February 2006,
       pp. 32-37.
     – Guessoum, Ahmed and Zantout, Rached, Arabizing the Internet and its effect on the development of
       the Kingdom of Saudi Arabia, The 100 years symposium of the King Saud University, Riyadh,
       Saudi Arabia, 18-19/10/1999.
     – Guessoum, Ahmed and Zantout, Rached, Towards a Strategic Effort, with a Central Theme of
       Machine Translation, to meet the challenges of the Information Revolution, 1998 Symposium of
       Proliferation of Arabization and Development of Translation in the Kingdom of Saudi Arabia, King
       Saud University, Riyadh.
     – “Machine Translation: Challenges and Approaches,” Invited Lecture, CS 4705: Introduction to
       Natural Language Processing Fall 2004, Nizar Habash
       Post-doctoral Fellow, Center for Computational Learning Systems, Columbia University.

