Infrastructures and plans boosting Language Technology Research and Innovation


                                   Stelios Piperidis
                                    Athena RC, Greece
                                               spip@ilsp.gr




Co-funded by the 7th Framework Programme of
the European Commission through the contract
T4ME, grant agreement no.: 249119.
Outline

 Introduction
 Current challenges and META-NET
 Language White Paper Series
 Strategic Research Agenda
 META-SHARE
 Next steps




http://www.meta-net.eu              2
Multilingual Europe

 Challenge: Providing each language community with the most
  advanced technologies for communication and information so that
  maintaining their mother tongue does not turn into a disadvantage.

 While research has made considerable progress in recent years, the
  pace of progress is not fast enough to meet the challenge within the
  next 10-20 years.

 All stakeholders – researchers, LT user and provider industries,
  language communities, funding programmes, policy makers –
  should team up for a major dedicated push.




Objectives

 META-NET is a network of excellence dedicated to fostering the
  technological foundations of the European multilingual information society.




Four EU-Funded Projects

 Initial project: T4ME (FP7;
  13 partners, 10 countries)
 Three ICT-PSP consortia
  since Feb. 2011: CESAR,
  METANET4U, META-NORD
 All EU member states and
  several non-member states
  covered.
 META-NET in Nov. 2012:
  60 members in 34 countries.

                                http://www.meta-net.eu/members



   META-VISION

   Language White Paper Series


Language White Paper Series

 Reports on the state of our languages in
  the digital age and the level of support through
  language technology.
 Series covers 30 languages.
 Key communication instruments to
  address decision makers and journalists.
 Inform about societal and technological
  problems and challenges as well as economic
  opportunities.
 >2 years in the making.
 >200 national experts as contributors.
 >8,000 copies printed and distributed to
  politicians and journalists.



30 Languages Covered

 Basque                     Galician       Norwegian
 Bulgarian*                 German*        Polish*
 Catalan                    Greek*         Portuguese*
 Czech*                     Hungarian*     Romanian*
 Danish*                    Icelandic      Serbian
 Dutch*                     Irish*         Slovak*
 English*                   Italian*       Slovene*
 Estonian*                  Latvian*       Spanish*
 Finnish*                   Lithuanian*    Swedish*
 French*                    Maltese*       Croatian

 * = Official EU language

Cross-Lingual Ranking

 In four application areas, each language is assigned to one of five
  clusters, ranging from excellent LT support to weak/no support:
      1. Machine Translation
      2. Speech Processing
      3. Text Analysis
      4. Resources
 Results finalised at a meeting
  in Berlin with representatives
  of all 30 languages
  (October 21/22, 2011).




Machine Translation
      excellent:           (none)
      good:                English
      moderate:            French, Spanish
      fragmentary:         Catalan, Dutch, German, Hungarian, Italian, Polish,
                           Romanian
      weak or no support:  Basque, Bulgarian, Croatian, Czech, Danish, Estonian,
                           Finnish, Galician, Greek, Icelandic, Irish, Latvian,
                           Lithuanian, Maltese, Norwegian, Portuguese, Serbian,
                           Slovak, Slovene, Swedish

Text Analysis
      excellent:           (none)
      good:                English
      moderate:            Dutch, French, German, Italian, Spanish
      fragmentary:         Basque, Bulgarian, Catalan, Czech, Danish, Finnish,
                           Galician, Greek, Hungarian, Norwegian, Polish,
                           Portuguese, Romanian, Slovak, Slovene, Swedish
      weak or no support:  Croatian, Estonian, Icelandic, Irish, Latvian,
                           Lithuanian, Maltese, Serbian

Speech Processing
      excellent:           (none)
      good:                English
      moderate:            Czech, Dutch, Finnish, French, German, Italian,
                           Portuguese, Spanish
      fragmentary:         Basque, Bulgarian, Catalan, Danish, Estonian,
                           Galician, Greek, Hungarian, Irish, Norwegian, Polish,
                           Serbian, Slovak, Slovene, Swedish
      weak or no support:  Croatian, Icelandic, Latvian, Lithuanian, Maltese,
                           Romanian

Resources
      excellent:           (none)
      good:                English
      moderate:            Czech, Dutch, French, German, Hungarian, Italian,
                           Polish, Spanish, Swedish
      fragmentary:         Basque, Bulgarian, Catalan, Croatian, Danish,
                           Estonian, Finnish, Galician, Greek, Norwegian,
                           Portuguese, Romanian, Serbian, Slovak, Slovene
      weak or no support:  Icelandic, Irish, Latvian, Lithuanian, Maltese

Europe’s Languages and LT



 Five clusters, ordered from good support through Language Technology
  to weak or no support:
      1. English
      2. Dutch, French, German, Italian, Spanish
      3. Catalan, Czech, Finnish, Hungarian, Polish, Portuguese, Swedish
      4. Basque, Bulgarian, Danish, Galician, Greek, Norwegian, Romanian,
         Slovak, Slovene
      5. Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese,
         Serbian

Key Observations

 When it comes to Language Technology support, there are massive
  differences between Europe’s languages and technology areas.
 LT support for English is ahead of any other language.
 Even support for English is far from being perfect.
 The gap between English and the other languages keeps widening!
 Several languages – Icelandic, Latvian, Lithuanian, Maltese – receive
  the weakest score in all four areas!
 At least 21 European languages are in danger of digital
    extinction! (Languages put into the “weak or no support” category at
    least once.)
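
The two counts quoted above can be rechecked from the ranking itself. The following is a minimal Python sketch, assuming the "weak or no support" clusters are transcribed correctly from the cross-lingual ranking slide; the variable names are illustrative and not part of any META-NET tool.

```python
# Languages in the "weak or no support" cluster of each application area,
# transcribed from the cross-lingual ranking (an assumption of this sketch).
WEAK = {
    "Machine Translation": {
        "Basque", "Bulgarian", "Croatian", "Czech", "Danish", "Estonian",
        "Finnish", "Galician", "Greek", "Icelandic", "Irish", "Latvian",
        "Lithuanian", "Maltese", "Norwegian", "Portuguese", "Serbian",
        "Slovak", "Slovene", "Swedish",
    },
    "Text Analysis": {
        "Croatian", "Estonian", "Icelandic", "Irish", "Latvian",
        "Lithuanian", "Maltese", "Serbian",
    },
    "Speech Processing": {
        "Croatian", "Icelandic", "Latvian", "Lithuanian", "Maltese",
        "Romanian",
    },
    "Resources": {
        "Icelandic", "Irish", "Latvian", "Lithuanian", "Maltese",
    },
}

# Languages rated "weak or no support" in at least one area (set union).
weak_at_least_once = set().union(*WEAK.values())

# Languages rated "weak or no support" in all four areas (set intersection).
weak_everywhere = set.intersection(*WEAK.values())

print(len(weak_at_least_once))   # 21
print(sorted(weak_everywhere))   # ['Icelandic', 'Latvian', 'Lithuanian', 'Maltese']
```

Romanian is the only language in the union that is not already weak for Machine Translation; it enters through Speech Processing.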



   META-VISION

   Strategic Research Agenda


Three Ingredients



 Appropriate Programme: Vision & Agenda
 Appropriate Actors: Research & Commercialisation
 Appropriate Support: Funding

Strategic Research Agenda

 META-NET Strategic Research Agenda
    for Multilingual Europe 2020.
 Addresses the problems we found during
  the white paper study.
 Three priority research themes and
  application/innovation scenarios.
 Can put Europe ahead of its competitors
  in this technology area.
 190+ contributors.
 Final version ready today!
 SRA will be presented to the EC and
  national bodies.



Priority Themes: 3 + 2

 Three Priority Research Themes:
      § Translation Cloud
      § Social Intelligence and e-Participation
      § Socially-Aware Interactive Assistant
 Two additional themes:
      § European Language Technology
        Platform
      § Core Technologies for Language
        Analysis and Production




   META-SHARE

   Open Resource Infrastructure


The power of data
 Scientific data has the potential to transform and drastically improve
  our lives

 Evidence from many domains – geo & earth sciences, biotechnology
  – shows data & tools become valuable through opening and sharing
   § Both for research and technology development & evaluation
   § Supporting innovative applications

 Making the Human Genome Project results accessible leveraged a
  ~€3 billion R&D investment into ~€500 billion in economic activity

 “Alzheimer’s researchers recently pooled genetic data and
  discovered 5 new genes and important evidence about the disease”

 “Data is too valuable to be locked away”

Language data? Tools?

 And what about language, one of the most complex cognitive
  faculties, about which we know so little?

 What about language technology which we unconsciously use on a
  daily basis to process content, interact with people and machines?

 All language and language technology related fora and lists carry
  requests for
   § English-X parallel corpora at any annotation level
   § information on whether they are aligned, validated, ...
   § language identification in Twitter streams
   § similarity measures and related tools (e.g. auditory and visual
      data similarity)


LRs in the SRA




LRs Discovery? Availability?


 According to past and recent studies, only a portion of language
  resources (LRs) is known / announced / shared / traded / ...

 … despite the fact that data collection, cleaning, annotation, curation
  and maintenance are very costly

 To make any progress and enable the development of useful applications,
  we need all those scientific, technical, legal, organisational and societal
  mechanisms that enable the necessary resources to be shared,
  recycled and repurposed




META-SHARE rationale

 Language resources (data and tools) are dynamic living entities
   § they evolve over time in various dimensions (quantity,
      annotation levels, conversion to new formats, addition of new
      languages)
   § they are usually the product of collaborative work
   § they may come with varying restrictions, ...
 Need solutions that enable every language resource provider, at
  any granularity level (individual/lab/organisation), to
   § Create their own repository of LRs
   § Describe and document LRs and update their descriptions
   § Link to a network of repositories of other providers
   § Keep track of the use of their LRs, trade LRs, …
 Need solutions that enable every language resource consumer to
   § Discover which LRs suitable for their purposes exist
   § Get information about them and download / acquire them
META-SHARE: what it is

 META-SHARE tries to match LR providers’ and consumers’
  needs and expectations by enhancing the visibility,
  documentation, identification, availability and preservation of
  language data and (basic language processing) tools

 It launches a long-term multidimensional endeavour by which
  language resources will contribute to boosting research,
  technology and innovation through wide availability, pooling,
  openness and sharing




META-SHARE architecture

 META-SHARE portal
 User oriented and support services
   § Registration, authentication, authorisation
   § Search / browse, licence, download, statistics
   § Mappings, reporting, recommenders, billing / payment
 Resources provision services
   § META-SHARE inventories
   § Metadata harvesting
 Repositories, each with its own inventory and LR repo
 External repos
META-SHARE provider side

 All facilities for creating your
  own META-SHARE-compliant
  repository and linking to the
  META-SHARE network:
        § Open source repository software
        § Functionalities for
           documenting, updating
           descriptions, storing/linking
           LRs
        § Provider support services
           (helpdesks, forum, knowledge
           base)
        § Each repository maintains an
           inventory with the metadata (MD)
           of all its LRs and exports the MD
           for harvesting
        § Harvested MD are stored in
           synchronised central servers
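
The inventory, export and harvesting steps listed above can be pictured with a small Python sketch. Everything in it is hypothetical: the class names, the five metadata fields and the sample records are illustrative assumptions, not the META-SHARE metadata schema or the API of the META-SHARE software.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical, heavily simplified LR description (not the real schema)."""
    identifier: str
    name: str
    language: str
    resource_type: str
    licence: str


@dataclass
class Repository:
    """A provider-side node that keeps its own inventory of LR metadata."""
    name: str
    inventory: dict = field(default_factory=dict)

    def describe(self, record):
        # Adding or updating a description replaces the record with the same id.
        self.inventory[record.identifier] = record

    def export_metadata(self):
        # Only metadata is exported; the resources stay with the provider.
        return list(self.inventory.values())


def harvest(repositories):
    """Merge the exported metadata of all repositories into one central inventory."""
    central = {}
    for repo in repositories:
        for record in repo.export_metadata():
            # Key by (repository, identifier) so ids may repeat across providers.
            central[(repo.name, record.identifier)] = record
    return central


# Made-up sample data for two hypothetical repositories.
repo_a = Repository("repo-a")
repo_a.describe(MetadataRecord("lr-1", "Example parallel corpus", "Greek", "corpus", "CC-BY"))
repo_a.describe(MetadataRecord("lr-2", "Example tokenizer", "Greek", "tool", "BSD"))
repo_b = Repository("repo-b")
repo_b.describe(MetadataRecord("lr-1", "Example speech corpus", "Maltese", "corpus", "CC-BY-NC"))

central = harvest([repo_a, repo_b])
print(len(central))  # 3
```

In this sketch the central inventory holds descriptions only, which mirrors the slide's point that users are sent to the respective repositories for the resources themselves.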
META-SHARE user side

 Users (LR consumers) can
     § search the central inventory
     § browse using multiple facets
     § access the actual resources by
       visiting the respective repositories
       to get legally interoperable
       licence(s) to download and use
       them
     § get support through an online
       user forum and helpdesks
       dedicated to technical,
       metadata and legal issues
     § access a knowledge base
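
Searching and faceted browsing over a central inventory can be sketched in a few lines of Python. The record fields, the sample entries and the two helper functions are assumptions made for illustration only; they do not reproduce the META-SHARE portal's actual search interface.

```python
from collections import Counter

# Made-up metadata records standing in for a harvested central inventory.
RECORDS = [
    {"name": "Example parallel corpus", "language": "Greek",
     "resource_type": "corpus", "licence": "CC-BY"},
    {"name": "Example tokenizer", "language": "Greek",
     "resource_type": "tool", "licence": "BSD"},
    {"name": "Example speech corpus", "language": "Maltese",
     "resource_type": "corpus", "licence": "CC-BY-NC"},
]


def search(records, **facets):
    """Return the records whose fields match every requested facet value."""
    return [r for r in records
            if all(r.get(key) == value for key, value in facets.items())]


def facet_counts(records, facet):
    """Count how many records fall under each value of one facet."""
    return Counter(r[facet] for r in records)


greek_corpora = search(RECORDS, language="Greek", resource_type="corpus")
print([r["name"] for r in greek_corpora])      # ['Example parallel corpus']
print(facet_counts(RECORDS, "resource_type"))  # Counter({'corpus': 2, 'tool': 1})
```

Combining several facets narrows the result set, while the per-facet counts are what a browsing interface would typically show next to each filter.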



Join META-SHARE as ...


 Third Party Consumers
 Associate members
 Depositing-only Members
 Repository Service Providers
   § Local repositories
   § Hosting (non-local) repositories
 Core and User Support Service Providers

Legal provisions for LR sharing
 Language Resources Sharing Charter – high level principles

 Memorandum of Understanding – aka membership agreement

 Licensing templates and deposition agreements
      § Inclusive mix of open and openness-inspired models
            - Creative Commons licences (starting with Creative Commons Zero (CC-0)
              and all possible combinations along the CC differentiation of rights of use)
            - META-SHARE Commons licences, a fully developed CC-based licensing
              tool that allows META-SHARE members to make their resources available
              inside the network only
            - META-SHARE “No Redistribution” licences, allowing use and
              exploitation of the Resources while permitting the LR Owner to have full
              control over the Resource distribution
            - Software tools and web services are provided either through one of the
              standard Open Source licences or under a custom commercial licence
META-SHARE legal features

 Rights based on type of use rather than type of user

 Differentiation along the following axes

      §   Attribution vs. No Attribution
      §   Open: share with everybody or within the network only
      §   Redistribution vs. No Redistribution
      §   Commercial vs. Non-Commercial
      §   Derivative vs. Non-Derivative
      §   Share alike
            - Re-deposition of derivatives, as a soft norm in the membership agreement,
              to act as a driver for collaborative LR building
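
One way to picture "rights based on type of use" is to model each axis as a flag and check an intended use against it. The Python sketch below is an illustration under that assumption; the class names, field names and the sample terms are hypothetical and do not reproduce the wording of any META-SHARE or Creative Commons licence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenceTerms:
    """Hypothetical encoding of the differentiation axes as boolean flags."""
    attribution_required: bool
    network_only: bool
    redistribution_allowed: bool
    commercial_use_allowed: bool
    derivatives_allowed: bool
    share_alike: bool


@dataclass(frozen=True)
class IntendedUse:
    """What someone wants to do with a resource, regardless of who they are."""
    within_network: bool
    redistributes: bool
    commercial: bool
    makes_derivatives: bool


def is_permitted(terms, use):
    """Check the intended use against each restricting axis in turn."""
    if terms.network_only and not use.within_network:
        return False
    if use.redistributes and not terms.redistribution_allowed:
        return False
    if use.commercial and not terms.commercial_use_allowed:
        return False
    if use.makes_derivatives and not terms.derivatives_allowed:
        return False
    return True


def obligations(terms):
    """Attribution and share-alike add duties rather than forbid a use."""
    duties = []
    if terms.attribution_required:
        duties.append("attribute the LR owner")
    if terms.share_alike:
        duties.append("share derivatives under the same terms")
    return duties


# Illustrative terms only: attribution required, no redistribution.
terms = LicenceTerms(
    attribution_required=True, network_only=False,
    redistribution_allowed=False, commercial_use_allowed=True,
    derivatives_allowed=True, share_alike=False,
)
in_house_product = IntendedUse(within_network=False, redistributes=False,
                               commercial=True, makes_derivatives=True)
mirror_site = IntendedUse(within_network=False, redistributes=True,
                          commercial=False, makes_derivatives=False)

print(is_permitted(terms, in_house_product))  # True
print(is_permitted(terms, mirror_site))       # False
print(obligations(terms))                     # ['attribute the LR owner']
```

The check never asks whether the user is a company or a university, only what is done with the resource, which is the design choice the slide describes.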




META-SHARE today…

 A network of 24 language resources repositories in 19 EU
  countries, with >1550 LRs

 META-SHARE software, open source, under a permissive licence
  (BSD), to set up a language resource repository

 Legal instruments catering for a range of uses

 Software-based services for both LR providers and LR consumers

 User support services
     § User Forum
     § helpdesks

 Mapping services to big resource inventories (CLARIN, OLAC, …)
In the immediate future…

 More META-SHARE nodes and respective language resources will be
  integrated – integration of ELRA supported initiatives, LRE Map,
  Language Library

 Adoption of the META-SHARE platform and framework by ELRA

 Full deployment of the services of META-SHARE members – from
  software availability, maintenance and technical assistance to
  language resources storage and preservation as well as support
  related to metadata and legal issues

 Coordination with upcoming initiatives (iCordi, Research Data
  Alliance, …)

 Official launch: 25 January 2013
   META-NET

   Conclusions


Conclusions

 Our white paper press campaign shows that Europe is extremely
  interested in and passionate about its languages.
 Two Parliamentary Questions in the European Parliament on the
  “digital extinction of languages” topic.
 Now is the time to move forward with a continent-wide, systematic
  push and to invest in strategic research.
 A modest investment is required.
 This push will generate countless opportunities.
 Horizon 2020 and Connecting Europe Facility can provide sufficient
  resources to make our visions for Europe’s citizens and economy a
  reality.


Q/A




Thank you very much!

office@meta-net.eu

http://www.meta-net.eu
http://www.facebook.com/META.Alliance
