CLARIN Overview - CLARIN-ES

CLARIN: The common language resources and technology infrastructure
Steven Krauwer
CLARIN Coordinator
Utrecht institute of Linguistics UiL-OTS (NL)
Overview
•    Problem & Mission
•    Some why-questions
•    Some who-questions
•    Overall plan
•    What CLARIN is NOT about
•    How we work
•    Funding
•    Structure
•    Where we stand
•    Some dreams
•    To conclude
    Steven Krauwer   CLARIN - Barcelona 06-02-2009   2
The problem
 • Much of the data in digital archives is language based
 • Existence often only known to insiders
 • Archives mostly unconnected, even at the national
   level
 • Every archive has its own standards for storage and
   access
 • Normally only simple retrieval of files (text, audio or
   video documents)
 • Other tools exist but are hard to use for non-specialists
 • Social sciences and humanities researchers are not
   language or speech technologists
 • They are often not aware of the potential benefits of
   using language and speech technology
The CLARIN Mission
 What:
 • Create an infrastructure that makes language
   resources and technology (LRT)
   available to scholars of all disciplines, especially
   social sciences and humanities (SSH)
 How:
 • Unite existing digital archives into a European
   federation of archives with unified web access
 • Provide existing language and speech technology
   tools as web services operating on language data
   in archives
Towards strong and
persistent centers




• need to add a persistent infrastructure layer on top of the existing
  landscape which is formed by accidental and temporary collaborations
• should be easily accessible for everyone
• should offer high availability (always on-line) so that people can rely on it
• there will be different types of centers, depending on the service
• need strong national support for many years
Why a European
infrastructure?
 •   too much fragmentation
 •   lack of coordination across countries
 •   lack of visibility
 •   lack of interoperability
 •   lack of sustainability
 •   expertise exists but not in all countries
 •   language independent tools can be shared
 •   language dependent tools can often be ported
 •   most countries not able to bear the cost

Why now?
• Exponential growth of digital data
• Increasing maturity of language and speech
  technology:
    – high speed
    – large volumes
    – new research questions
• Growing interest at EU level in Research
  Infrastructures (RI), also for soft sciences
• RI Roadmap published in 2006 by ESFRI
• includes 35 accepted proposals for RIs
• CLARIN is one of them and has EC funding for a
  1-3 year preparatory phase
Who we are and where we
come from
 • The CLARIN consortium now has 32 partners from
   22 EU and associated countries (and more on the
   waiting list)
 • The CLARIN community has 148 members in 32
   countries (Feb 2009)
 • CLARIN is based on 4 earlier broad European
   initiatives with many participants:
     –   LangWeb
     –   EARL
     –   TELRI
     –   (and later) DAM-LR

Who else do we need?
 • Both our membership and our consortium are
   quite unbalanced:
     –   Written language technology over-represented
     –   Speech & multimodality under-represented
     –   Humanities other than linguistics under-represented
     –   Social sciences under-represented
     –   Some countries and languages (national and regional)
         still missing
 • There is no money to extend the consortium but
   we have to fill these gaps to ensure balanced
   coverage

Overall plan for CLARIN
 •       Preparatory phase (2008-2010): Put everything
         in place
 •       Construction phase (2011-2015): Build and
         populate with tools and resources
 •       Exploitation phase (2016-….): CLARIN in full
         service
 •       Budget for the preparatory phase:
             •   4.1 M€ from EC
             •   ??? from countries (process still ongoing)
 •       Estimated budget until 2020: ca 200 M€
             •   mostly from national and regional funding agencies
             •   max 20% from EC (not yet formally decided)
4-dimensional approach
in the preparatory phase
 First 3 years dedicated to the design:
 • The technical dimension
 • The language dimension
 • The user dimension
 • The governance and legal dimension




Technical
 • Technical specification of the infrastructure
 • Construction of a prototype
 • Validation on rich variety of
     – languages (>20)
     – resources
     – services
 •   Federation of existing archives
 •   Based on existing resources, tools
 •   Strong focus on interoperability standards
 •   Conversion of existing resources
 •   Encapsulation of existing tools

Languages
 • Cover all languages spoken or studied in
   participating countries, including regional
   languages
 • Representational and descriptive standards
   should be adequate and validated for all
   languages
 • Same minimal coverage of basic resources and
   tools for all (living) languages
 • BLARK (Basic Language Resources Toolkit) to
   be defined and implemented (funds from other
   sources needed)

Language technology
activities
 Activities during preparatory phase
     – survey of resources and tools, including:
          • encoding and annotation data
          • quality indicators
     – developing taxonomies and ontologies
     – agreeing on common standards
 Focus on
     –   integration of tools
     –   interoperability
     –   usage scenarios
     –   creating missing essential resources
     –   validating specifications and prototype
User

 • Users are SSH scholars (including linguists,
   translation experts)
 • Do WE know what they need?
 • Do THEY know what they need?
 • Actions:
     – analyze past and ongoing SSH projects
     – user consultation
     – launch typical example projects to show potential (see
       Call for Humanities Projects)
     – expertise centers
     – awareness actions
Legal and ethical
 IPR and ethical issues
 • aim at open source, but IPR for existing and
   future non-open resources must be
   accommodated
 • federation of archives requires authentication,
   authorization and trust between archives
 • aim at limited number of template license
   agreements for most common cases
 • respect national legislation
 • address ethical issues
Governance and
Funding
 Agree on e.g.:
 • Who is going to pay for the construction and
   exploitation of the infrastructure
 • How will it be managed
 • How will it be coordinated with national policies
 Actions:
 • Analyse best practice in funding and
   management of transnational projects
 • Prepare agreement between (now) 22 countries
   about long term joint funding of CLARIN

What CLARIN is NOT
(yet) about
 • building the infrastructure – during this phase we
   are just preparing it
 • creating new resources – at this stage we want to
   use what is there and adapt it if necessary
 • creating new applications – except maybe some
   essential tools or demonstrators
 • focusing on the big languages – we find all
   languages equally important
 • strengthening European industry – our target
   audience is SSH researchers, but we don’t want
   to exclude anyone
How we work (1)
 Work packages:
 • WP1: Management and coordination
 • WP2: Designing the infrastructure and building a
   prototype
 • WP3: Humanities overview
 • WP5: Language resources and technology
   overview
 • WP6: Dissemination
 • WP7: IPR and business models
 • WP8: Construction and exploitation agreement

How we work (2)


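 [Figure: work package interaction diagram. WP2 (Infrastructure Prototype) is in the
 center, surrounded by WP8 (Org & Legal Framework), WP7 (IPR, A&A, licensing),
 WP5 (LRT Exploration) and WP3 (Humanities Projects); the links are numbered 1 to 8.]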




How we work (3)
 • Most tasks executed in Working Groups (WGs)
 • WGs consist of project partners & other experts
   (CLARIN is open!)
 • Some WGs do work (e.g. build prototype),
   others collect data or create consensus
 • Participation by others essential as e.g.
   standards cannot be imposed
   by a small group
 • Unfortunately no EC funding available for WG
   participation – only reward is influence!
Funding &
what to use it for
 • From EC: 4.1 M€, used for generic, language independent
   tasks
 • From countries: ??? M€, to be used for preparing CLARIN
   at the national or regional level in every country:
    – build and organize local national CLARIN communities
    – support for participation in working groups (e.g. travel)
    – validation tasks for own language(s)
    – creation or adaptation of essential resources
    – pilots and demonstrators & humanities projects
    – (co-)organisation of local or international events
    – preparing for future role (expertise centers,
       repositories)

Structure
 • Executive Board, consisting of the 7 WP leaders
   plus a special representative to liaise with the
   humanities community (among others through the DARIAH
   sister project)
 • Boards:
    – Scientific Board
    – Strategic Coordination Board
    – International Advisory Board
 • Meetings (virtual or face to face):
    – Consortium meetings
    – Member meetings
    – Working group meetings

Where we stand
 • We have just finished the 1st year (still 2 to go)
 • Various working groups have been set up and are
   already active – but you can still join:
   http://www.clarin.eu/join-a-working-group
 • We have regular workshops on various topics:
   see http://www.clarin.eu/all_events
 • Public documents are published on
   http://www.clarin.eu/documents
 • We have just launched a Call for Humanities Projects:
   http://www.clarin.eu/wp3/wp3-documents/call_final-version
Our dreams
An example:
   – Ethnologists have a recording of a dance with singing,
     and a transcription; they want to search for certain
     textual patterns and then return to the corresponding
     recorded dance fragments
   – For a 3-minute recording no problem
   – 30 minutes might just be doable
   – … but what about 3, 30 or 300 hours of video?
   – To do this and to save time they would need to align
     media and transcriptions
   – There are “aligner tools”
   – But who is able to use them and will they work on the
     transcription format?
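
A minimal sketch of the workflow in this example, assuming media and transcription have already been aligned so that every transcript segment carries a start and end time in seconds. The `Segment` type, the `find_fragments` helper and the sample data are hypothetical illustrations, not part of any CLARIN tool or existing aligner.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript segment aligned to the recording (times in seconds)."""
    start: float
    end: float
    text: str


def find_fragments(segments, pattern):
    """Return (start, end, text) for every segment whose text matches pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [(s.start, s.end, s.text) for s in segments if regex.search(s.text)]


# Invented sample data: three aligned segments of a sung dance recording.
transcript = [
    Segment(0.0, 4.5, "the dancers enter the circle"),
    Segment(4.5, 9.0, "they sing of the harvest moon"),
    Segment(9.0, 15.0, "the circle turns and the song ends"),
]

# Search for a textual pattern, then jump back to the matching media fragments.
hits = find_fragments(transcript, r"\bcircle\b")
for start, end, text in hits:
    print(f"{start:6.1f}s - {end:6.1f}s  {text}")
```

Once the alignment exists, the search itself is simple; the hard part named on this slide is producing the alignment and reading the transcription format in the first place.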
… more dreams …
Another example:
• Historians want to access all material from physics, politics
  and sociology to understand the reasons for the marine
  dominance of the Serene Republic of Venice
• to do this they need to search for concepts in all material,
  extract summaries, relate fragments, add and exchange
  comments etc
• they need to do this collaboratively
• currently this involves a huge amount of handwork to
  overcome institutional, linguistic (morphological
  normalization, translation), semantic boundaries
• but who is able to carry out such work, and who can operate the
  tools?
… and more
 One day any SSH scholar should be able to
   ask without any difficulty:
 • “List all uses of enthusiasm in 19th century
   English novels written by women”
 • “Find all video clips of Prince Charles
   talking about architecture in 2007”
 • “Summarize the inaugural speech of
   Obama - in Catalan”
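
The first query above could, under strong simplifying assumptions, reduce to a metadata filter followed by a keyword-in-context search. The sketch below is illustrative only: the `Document` fields, the `uses_of` function and the sample texts are all invented, and it assumes a federation that already exposes uniform metadata for every text.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    """A text plus the metadata the query needs (all fields are assumptions)."""
    title: str
    year: int
    language: str
    genre: str
    author_gender: str
    text: str


def uses_of(word, docs, *, language, genre, author_gender, years):
    """Yield (title, context) for each occurrence of word in matching documents."""
    regex = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    for doc in docs:
        if (doc.language, doc.genre, doc.author_gender) != (language, genre, author_gender):
            continue
        if doc.year not in years:
            continue
        for m in regex.finditer(doc.text):
            # Keep up to 20 characters of context on each side of the match.
            lo, hi = max(0, m.start() - 20), min(len(doc.text), m.end() + 20)
            yield doc.title, doc.text[lo:hi]


# Invented sample corpus; titles and sentences are placeholders.
docs = [
    Document("Sample Novel A", 1847, "en", "novel", "female",
             "She spoke with great enthusiasm of the journey."),
    Document("Sample Novel B", 1923, "en", "novel", "female",
             "His enthusiasm faded."),
    Document("Sample Essay C", 1850, "en", "essay", "male",
             "Enthusiasm is no argument."),
]

results = list(uses_of("enthusiasm", docs, language="en", genre="novel",
                       author_gender="female", years=range(1801, 1901)))
for title, context in results:
    print(f"{title}: ...{context}...")
```

The video and summarization queries on this slide would need speech recognition, speaker identification and machine translation on top of such a filter, which this sketch does not attempt.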

To conclude (1)
 • CLARIN is a long term endeavour with lots of
   challenges of very different types
 • For the medium and longer term I see the
   following main challenges (where we could really
   fail):
     – Agreeing on standards and actually using them
     – Persuading users to formulate requirements and to use
       the infrastructure
     – Making the CLARIN infrastructure resistant to
       technological developments
     – Securing long term funding
 • In CLARIN there is room for all languages
 • If it succeeds it will give a boost to SSH research
To conclude (2)
 More information:
 • CLARIN Website: http://www.clarin.eu
 • CLARIN Office: clarin@clarin.eu
 • CLARIN Newsletter (issue 4 just out):
   http://www.clarin.eu/newsletter
 • CLARIN Members & how to join:
   http://www.clarin.eu/members

                        Thanks!

						