    Exploring and Enriching an LR Archive
    via the Web

                       Marc Kemps-Snijders,
                       Alex Klassmann, Claus Zinn,
                       Peter Berck, Albert Russel,
                       Peter Wittenburg
                       MPI for Psycholinguistics
                       DOBES Endangered Languages Project
    What is a digital archive?

    Two essential dimensions
       • Long-term Preservation of all resources and relations
       • Accessibility and Interpretability
Why preserve?
• face the loss of our cultural memory on electronic media
       UNESCO: 80% of the recordings about languages and cultures
                are highly endangered

There are no guarantees for preservation but we can increase chances of survival
      • store everything in a well-organized repository
      • take care of redundancy, migration and curation on various dimensions
      • establish organizations that take responsibility

          Digital Archives are living Entities!
Live Archives concept: allow enrichments (stand-off), relations, etc.
         What is in MPI’s archive?

• Endangered Language Documentation resources
    –    Representative record of a language in its cultural context
    –    Crucial is the active involvement of the community
    –    May help in maintaining and revitalizing languages
    –    Therefore: trend towards complementing linguistic information with
         ontological one in collaborative spaces

• Child language, bilingualism, gesture, sign language, corpus
  spoken Dutch, sound corpora, second learner corpora, etc.

Mostly annotated audio/video recordings
   30 terabytes, 53,000 AV resources, 24,000 annotation files,
   60 million annotations, lexicons, sketch grammars, etc.

All from a large number of depositors
     DOBES Languages

40 language teams from the DOBES program documenting about
 60 languages and working independently
          Language Archiving Technology

[Diagram: LAT component stack]

• Shoebox/CHAT, XML
• Annotation + Lexicon
• IMDI: Data Organization, Metadata
• Data Uploading and Management
• Access Management
• Data Archiving and Copying
• IMDI / GIS: Metadata Browsing & Searching
• ANNEX/LEXUS/IMEX/ADDIT/VICOS/MEL: Complex Access via Web

Side labels: integration, Archive Grid Federation, utilization framework

LAT is to support operations during the resource life-time
and to support standards where possible
             LAT Dimensions: Management & Upload

   • take care of consistency
   • check uploaded formats
   • convert where possible
   • create presentation formats
   • create indexes
   • allow access rights definition
   • add unique & persistent IDs
   • take care of distribution

   • basis is a robust repository system with reliable mechanisms

   [Diagram labels: repository, resources, metadata, metadata editing]
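
A minimal sketch of three of the upload steps listed above (check the uploaded format, convert where possible, add a unique ID). It is not LAT code: the format tables, the `ingest` function, and the `hypothetical-pid:` prefix are all assumptions made for illustration, and the conversion itself is only recorded, not performed.

```python
import uuid
from dataclasses import dataclass
from pathlib import PurePath
from typing import Optional

# Assumed tables; the slides do not say which formats the archive accepts.
ACCEPTED_FORMATS = {"wav", "mp4", "xml"}
CONVERSIONS = {"aiff": "wav", "avi": "mp4"}


@dataclass
class ArchivedResource:
    name: str
    fmt: str
    pid: str                          # unique identifier assigned at upload
    converted_from: Optional[str] = None


def ingest(filename: str) -> ArchivedResource:
    """Check the format, note a conversion if one is needed, assign an ID."""
    path = PurePath(filename)
    fmt = path.suffix.lstrip(".").lower()
    original = None
    if fmt not in ACCEPTED_FORMATS:
        if fmt not in CONVERSIONS:
            raise ValueError(f"unsupported format: {fmt!r}")
        # A real system would run a converter here; this sketch only records it.
        original, fmt = fmt, CONVERSIONS[fmt]
    # Hypothetical ID scheme; a real archive would use a persistent-ID service.
    pid = f"hypothetical-pid:{uuid.uuid4()}"
    return ArchivedResource(path.stem, fmt, pid, original)
```

Rejecting unknown formats at upload time, rather than storing them, is one way to read "take care of consistency"; other policies are possible.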
    LAT Dimensions: Complex Access

                                     • access to annotated
                                       media or multimedia
                                     • callable via any other
                                       web application
         LAT Dimensions: Customized views

• fostering the creation of special web-sites by REST interfaces and templates
• fostering GIS presentations by special converters
    Who are our users?

     Stakeholder       Interest
     archivists        easy management, easy discovery, consistency,
                       statistics, versioning, ..
     researchers       easy visualization, easy discovery, virtual
                       collections, extensions, permissions, ..
     communities       semantic exploration, extensions, permissions, ..
     journalists       appetizers, easy inspection, ..
     students          curiosity, navigation, inspection, ..

      Still in a download first paradigm – not cyberinfrastructure usage
          (result of an ESF/NSF workshop)
      ‘Download first…’ problems and disadvantages

• Tool and format updates are propagated to users at a slow rate
    – ‘Legacy’ formats offered to archives pose an increasing
      burden on archives or tool builders (conversion/migration)
    – New techniques spread slowly through the community

• Orchestration between tools becomes much more difficult, if not impossible
• Users need to install tools locally

        Can we provide more incentives on the tools side?
          How to extend LAT?
• Paper dictionaries’ limited usefulness in language maintenance &
   language revival (Manning et al., 2000)
• “Linear” lexicons not at all interesting except for linguists
• Speech community may prefer explicit semantic access and links, possibly
   of a wide variety of types (i.e. beyond formal systems)
• Semantic view not limited to lexicons, but should include all fragments

Therefore: introduce conceptual spaces, in which concepts are related
 to one another, anchored in language, and illustrated with multimedia

Extension of LAT with ADDIT and VICOS
 towards cyberspace paradigm

• ADDIT: comments on and relations between arbitrary fragments
• VICOS (Visualizing Conceptual Spaces):
  relations within and across lexicons and easy visualization
• make VICOS a collaborative tool
          ADDIT: Commentary & Relations

• allow authorized people to make arbitrary comments on and relations between
  object fragments
• visualize them in tools and via VICOS
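
The idea of stand-off comments and relations can be sketched as a small data structure kept apart from the archived resources themselves. This is an illustration only, not ADDIT's actual data model: `Fragment`, `StandoffStore`, the free-form `locator` string, and the simple set of authorized user names are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fragment:
    resource_id: str   # which archived object
    locator: str       # assumed free-form pointer, e.g. a time span or entry id


@dataclass
class StandoffStore:
    """Holds comments and relations separately from the resources."""
    authorized: set
    comments: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def _check(self, user):
        if user not in self.authorized:
            raise PermissionError(f"{user!r} may not add enrichments")

    def comment(self, user, fragment, text):
        self._check(user)
        self.comments.append((user, fragment, text))

    def relate(self, user, source, target, label):
        self._check(user)
        self.relations.append((user, source, target, label))

    def about(self, resource_id):
        """Return the comments and relations that touch one resource."""
        cs = [c for c in self.comments if c[1].resource_id == resource_id]
        rs = [r for r in self.relations
              if resource_id in (r[1].resource_id, r[2].resource_id)]
        return cs, rs
```

Because the enrichments only point at fragments, the archived resource is never modified, which is what makes this kind of extension compatible with long-term preservation.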
      VICOS: Lexical relations & navigation

• Allow users to create relations within and across lexicons
  across: cognate sets etc
• Visualize and allow easy navigation in conceptual spaces
• Empower community members to actively describe their language and culture
  and to learn from such resources
   – Decide which words offer key access to cultural concepts
   – Technology needed to link words (and the associations they
     evoke) to other words and to all sorts of relevant fragments
• Conceptual Spaces = informal ontology of fuzzily-defined
  concepts and relationships
       • But where “concepts” are anchored in corresponding
         formal lexicon entries
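
A conceptual space of this kind can be pictured as a graph whose nodes are concepts, each anchored in a lexicon entry, and whose edges carry free-form relation labels. The sketch below assumes that picture; the `ConceptualSpace` class and its method names are hypothetical and do not describe how VICOS is implemented.

```python
from collections import defaultdict


class ConceptualSpace:
    """Informal ontology: free-form labelled links between anchored concepts."""

    def __init__(self):
        self.anchors = {}               # concept -> (lexicon, entry id)
        self.edges = defaultdict(set)   # concept -> {(label, other concept)}

    def add_concept(self, concept, lexicon, entry_id):
        # Every concept must be anchored in a formal lexicon entry.
        self.anchors[concept] = (lexicon, entry_id)

    def relate(self, a, label, b):
        for concept in (a, b):
            if concept not in self.anchors:
                raise KeyError(f"unanchored concept: {concept!r}")
        # Simplification: the same label is stored in both directions.
        self.edges[a].add((label, b))
        self.edges[b].add((label, a))

    def neighbours(self, concept):
        """Concepts reachable in one step, for navigation."""
        return sorted(self.edges.get(concept, ()))
```

Labels are plain strings rather than a fixed vocabulary, matching the point that relations may go beyond formal systems. Concepts anchored in different lexicons can be linked in the same way, which covers the "across lexicons" case such as cognate sets.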
        Team and Acknowledgements

                                 LAT Team
                                 • System Managers
                                 • Archive Managers & Digitization
                                 • Software Developers

The work was funded by the Volkswagen Foundation, the European
Commission, the Dutch Science Organization, the Dutch Institute for
Lexicology, the Max Planck Society and the Max Planck Institute for

