									LAN Vol. 1, Nr. 2 April 2004        ISSN: 1573-4315              Contact:       
                            Editors: Hennie Brugman, Romuald Skiba (responsible), Peter Wittenburg

New DoBeS Teams
    Loretta O’Connor & Peter Kröfges - Lowland Chontal of Oaxaca
    Dagmar Jung - Beaver knowledge systems: documentation of a Canadian First Nation language from a placenames' perspective
    Frank Seifart, Nikolaus Himmelmann, Doris Fagua, Jürg Gasché and Edmundo Pereira - Documenting the Languages of the People of the Center, Especially Bora and Ocaina (North West Amazon)
    Anna Margetts - Towards the documentation of Saliba/Logea, an endangered language of Papua New Guinea

New developments
    Herbert Baumann, Reiner Dirksmeyer, Peter Wittenburg - Long-Term Archiving
    Daniel Broeder, Freddy Offenga - IMDI Metadata Set 3.0
    Hennie Brugman – ELAN Releases 2.0.2 and 2.1
    Romuald Skiba, Florian Wittenburg & Paul Trilsbeek - New DOBES web site: contents & functions.

News in brief
    Andreas Claus - Access Management System
    Jost Gippert - DoBeS conference and summer school
    Asifa Majid – Data elicitation methods
    Paul Trilsbeek - DOBES Training Course
    Peter Wittenburg - Training Course in Lithuania
    Peter Wittenburg - New Person at Archiving Team

New DoBeS Teams

Loretta O’Connor & Peter Kröfges - Lowland Chontal of Oaxaca

The Volkswagen Foundation has funded a three-year project to document Lowland Chontal of Oaxaca, an unclassified and highly endangered language spoken near the Pacific coast of southern Mexico. Principal investigators are linguist Loretta O’Connor (University of California, Santa Barbara, and Max Planck Institute for Psycholinguistics, Nijmegen) and anthropologist Peter Kröfges (State University of New York, Albany), with project director Prof. Dr. Ortwin Smailus (University of Hamburg).

The ethnic designation ‘Chontal’ derives from the Nahuatl term Chontalli, meaning ‘stranger’, which the Aztecs used to refer to various unfamiliar ethnic groups in ancient Mesoamerica. After the Spanish conquest, the Oaxaca Chontales were long perceived as barbaric cave-dwellers, a stereotype that hampered anthropological and linguistic scholarship. In many respects, the area still represents a land of strangers.

This project builds on data and analysis begun during the investigators’ doctoral research. Primary linguistic results will include a series of thematically-based Chontal-Spanish dictionaries, a comprehensive grammatical description of the language, and an archive of digitized recordings and annotated texts. The anthropological component will focus on the documentation of local knowledge of landmarks, settlements and territorial boundaries, soil classification and agriculture, and sacred sites and religious practices. Members of the Chontal community will participate in all activities and will share in the results.

Dagmar Jung - Beaver knowledge systems: documentation of a Canadian First Nation language from a placenames' perspective

Beaver is an endangered Northern Athabaskan language spoken in several communities in the Canadian provinces of British Columbia and Alberta. Our team estimates the present number of speakers at ca. 60-80 fluent speakers in British Columbia and ca. 20-30 fluent speakers in Alberta. The documentation aspect of our study focuses on narratives of place, thereby creating a 'conceptual map' of the Beaver territory that has been defined by a traditional hunter-gatherer society. The heavy textual component provides the opportunity to collect 'rich' data, i.e. contextual data relating the significance of places to individuals and the overall community.

The core team members: Dagmar Jung (assistant professor at the University of Cologne, Linguistics) has a background of research and fieldwork on Southern Athabaskan languages, especially Jicarilla Apache. Julia Miller (PhD student with Sharon Hargus at the University of Washington in Seattle) started phonetic research on Beaver tone two years ago. Olga Müller (PhD student at the University of Cologne) currently works as a research assistant on a dictionary project of Tanacross Athabaskan. Patrick Moore (assistant professor of Anthropology at the University of British Columbia) has extensive experience documenting Kaska, one of the neighbouring Athabaskan languages.

Frank Seifart, Nikolaus Himmelmann, Doris Fagua, Jürg Gasché & Edmundo Pereira - Documenting the Languages of the People of the Center, Especially Bora and Ocaina (North West Amazon)

This project aims at documenting the endangered languages of the People of the Center, a culturally relatively uniform, but linguistically diverse group in the Peruvian part of the North West Amazon. Speaking seven mutually unintelligible languages, the People of the Center are characterized by some unique cultural practices, including completely memorized ritual discourses that may last up to three hours, repertoires of thousands of songs performed at festivals, as well as efficient systems of drum communication that build on the structures of the individual languages. Two of the seven languages, Bora and Ocaina, will be the subject of exemplary and comprehensive documentations, consisting of fully annotated video recordings of a representative sample of each major type of communicative event, including formal and informal discourses as well as drum communication. In addition, specimens of the moribund language Resígaro (three native speakers) will be included, as well as old audio recordings of Witoto which document types of ritual speech no longer practiced today. Taken together, this data set will be a representative documentation of the linguistic and cultural practices of the People of the Center as a whole.

Anna Margetts - Towards the documentation of Saliba/Logea, an endangered language of Papua New Guinea

Saliba and Logea are two closely related dialects spoken on neighboring islands in Milne Bay Province, Papua New Guinea. The estimated number of speakers is 2,500. The dialects belong to the Papuan Tip Cluster of the Western Oceanic language group.

Given that the community of speakers is traditionally small, Saliba/Logea must be considered highly endangered as English is encroaching on many aspects of daily life. While the degree of endangerment is serious, the documentation capacity is still very good. The Saliba and Logea people continue to lead a traditional life of fishing and subsistence farming, and it is still possible to work with the last generation of old speakers who have essentially no knowledge of English, as well as with children who are growing up as monolingual speakers, at least in the first few years of their life.

The languages of the Papuan Tip Cluster are of special typological interest as they show features not found elsewhere in the Oceanic language group. Some of these features may be explained by early contact with Papuan languages.

The project aims at a multimodal documentation of the language in its cultural context. The main investigator will be Anna Margetts (Monash University), who wrote her Ph.D. thesis on aspects of Saliba grammar. The German host of the project will be Ulrike Mosel at the University of Kiel.

The team also includes John Hajek from Melbourne University, who will work on phonetics and phonology, Rhys Gardner from the Auckland Museum, working on ethnobotany, and Andrew Margetts, documenting the building and use of sailing canoes. The team is also seeking a German Ph.D. student in Linguistics to join the documentation project.

2   Language Archive Newsletter Vol.1, Nr. 2

New developments

Herbert Baumann, Reiner Dirksmeyer, Peter Wittenburg - Long-Term Archiving

It has frequently been reported that two measures increase the chance of survival of the bit-stream representations of the material we are storing about languages and music traditions that will soon be extinct. (1) The data has to be migrated frequently to guarantee that state-of-the-art storage media are used that are fully supported by hardware and software. (2) The data has to be copied and distributed to cope with all kinds of risks, even political ones, that could destroy the storage media used.

The MPI team has completed its work to keep at least 5 copies of the DOBES data. Two copies are automatically created in the MPI storage system (RAID Disk Array and Tape Library). A third copy is stored on a standard PC system with a large RAID Disk Array that is under the control of the MPI team, but located in another building.

A fourth copy is transferred to the computer center of the Max-Planck-Society in Göttingen (GWDG) using the RSYNC protocol, which is provided, for example, in standard UNIX systems. The transfer is initiated by the GWDG; the protocol is efficient but lacks modern encryption capabilities. To achieve the full transmission speed of 5 MByte/sec, five sessions are started in parallel. A fifth copy has meanwhile been generated at the other computer center of the Max-Planck-Society in Munich (RZG). Here the well-known Andrew File System (AFS) is used as protocol. At the MPI an AFS client was installed that establishes connections with the AFS server in Munich, i.e., the transfer is initiated by the archivist. AFS makes use of state-of-the-art authentication and encryption. Here, too, several channels are opened in parallel to achieve the full 2.5 MByte/sec exchange speed.

Both procedures guarantee that at regular intervals the changes in our DOBES archive are synchronized with the two computer centers. At both centers, GWDG and RZG, local strategies are applied to maintain several copies of all stored data in different buildings, i.e., all DOBES data is now stored in at least 7 different storage systems. The DOBES archivist sees it as an advantage that two different protocols are applied and that the two centers use different storage technologies. Currently, the Max-Planck-Society is discussing at a high level what kind of guarantees can be given to the institutions for long-term storage support for their data, since many disciplines share this fundamental problem.

In the committees dealing with this question of long-term preservation there is consensus that guaranteeing the interpretation of the bit-streams is a task of the community and not the centers. Adherence to open standards and organizational, encoding and format coherence will be relevant criteria in determining the chance that the data will be migrated in time to state-of-the-art representation standards.

Daniel Broeder, Freddy Offenga - IMDI Metadata Set 3.0

Based on experience and on a broad discussion process including field linguists, corpus linguists and language engineers, the IMDI set 3.0 [1] was designed as part of the INTERA and DOBES projects and is available as an XML Schema. It was adapted to simplify the content description, and the artificial distinction between collectors and other participants, probably influenced by Dublin Core, was removed.

Three major extensions were applied. First, it is now possible to describe written resources that are not annotations or descriptions. This was necessary, since most language collections contain written resources in the form of field notes, sketch grammars, phoneme descriptions and more. Second, as a consequence of long discussions with participants of the MILE lexicon initiative, it is now possible to describe lexicons with a specialized set of descriptor elements. Third, it is now possible to define and add project-specific profiles. The earlier version of IMDI already supported extensions at various levels in the form of user-defined category–value pairs, i.e., the user was able to define a private category and associate values with it. This feature was used by individuals and also by projects to include special descriptors. However, these descriptors were not fully supported by the IMDI tools. In the new version, projects or sub-domains such as the Dutch Spoken Corpus or the Sign Language community can define a set of important categories, and these are supported while editing or searching.

IMDI therefore consists of its core definitions, which have to be stable to assure users that their work will be exploitable even after many years, and of sub-community specific extensions, which are nevertheless the result of discussion processes.

[1] Detailed description of the IMDI 3.0 metadata elements: .0.4.pdf
[2] IMDI Web-site:
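
As a rough illustration of the category–value extension mechanism described in the IMDI report above, the following Python sketch models a record with free-form category–value pairs and a project-specific profile that names the agreed categories. It does not use the IMDI XML Schema or any IMDI tool; the names `Profile`, `Session`, `allowed_categories` and `unsupported_categories`, as well as the sample categories, are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical project-specific profile: a named set of agreed categories."""
    name: str
    allowed_categories: frozenset


@dataclass
class Session:
    """Hypothetical metadata record: a stable core field plus extensions."""
    title: str
    extensions: dict = field(default_factory=dict)  # category -> list of values

    def add_pair(self, category, value):
        # Free-form category-value pair, as a user could define it privately.
        self.extensions.setdefault(category, []).append(value)


def unsupported_categories(session, profile):
    """Return the extension categories that the profile does not define."""
    return sorted(
        c for c in session.extensions if c not in profile.allowed_categories
    )


# Illustrative data only; these categories are not taken from IMDI.
profile = Profile("example-profile", frozenset({"Handedness", "SignStyle"}))
session = Session("demo session")
session.add_pair("Handedness", "left")
session.add_pair("MyPrivateNote", "check tone marks")
print(unsupported_categories(session, profile))
```

In this sketch a tool could fully support the categories listed in the profile while still carrying private pairs along unchanged, which is one possible reading of the split between stable core definitions and sub-community extensions.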

Hennie Brugman – ELAN Releases 2.0.2 and 2.1

Version 2 is a major upgrade. Elan’s viewer and media handling internals have been completely re-engineered, as has the handling of user commands. The user interface has been completely redesigned, including shortcut keys.

Main new features and changes:
• All viewers for one annotation document are now shown in one document window. The video panel can be detached into a second window. This can, for example, be useful to display MPEG-2 video on a separate monitor.
• Several new and/or revised viewers (for details see:
• ‘Save As’ is now supported.
• Time selections are now made or modified in a completely new way. In addition to dragging and shift-click in the time line viewer or wave form panel, Elan now has a special ‘selection mode’: all time navigation and playback buttons modify either the begin or the end of the selection when in selection mode.
• Two time-synchronized video panels are now supported. The user can specify the begin time for each of the two separately.
• Media files no longer have to have the same name as the matching .eaf file, and do not have to be in the same directory either. When files cannot be found at the locations stored in the .eaf file, first the eaf file’s directory is checked, then the user is prompted to specify a location.
• Elan’s user interface can be localized on the fly. Currently supported languages are English and Dutch. It is now easy and straightforward to support other languages. Volunteers for translation of English user interface texts to some other language are welcome.
• Formats (for details see:
• Time accuracy: all times in all viewers are correct, and synchronized at all times. There is one annoying issue that cannot be fixed in the short term: when playing a time selection, playback of the video continues a few frames after the end of the selection. How much depends on the computer or operating system running Elan. Right after this ‘overshoot’ the media time is set to the exact end time of the selection, resulting in a little jump in the video playback. Audio does NOT have this problem.
• Support for template documents to make reuse of tier setups easier.
• New Unicode input methods for Korean, Georgian and Turkish.
• Preferences are now stored between Elan working sessions. These are both preferences for Elan (like last used directories for eaf files, media files, shoebox type files) and preferences for individual documents (like media time, selection, active tier, etcetera).
• Even if media files for some eaf file are completely missing, the document can still be opened for inspection and modification.
• A ‘shift’ mode is added to help alignment of imported data. Unlike in the already existing ‘bulldozer’ mode, gaps between annotations are maintained.

Romuald Skiba, Florian Wittenburg & Paul Trilsbeek - New DOBES web site: contents & functions.

The new DOBES web site combines the information that was available on the old site with an adaptation of the DOBES-DEMO that was created for the VW-endorsed exposition “Science + fiction”. The latter part is created in such a way that it is informative for the general public and not only for specialists.

The layout of the site allows for navigation in different ways. On the left side you find a traditional navigation panel for accessing different parts of the site quickly, e.g. using the Site Map. The main section on the right side starts off with a graphics-based presentation that is intended for exploration rather than quick access. By slowly moving over the interactive dots, which are marked white, different topics of the site can be accessed.

The main page has three sections: Documentation, Endangerment and Languages. Each section is again subdivided into a number of topics:

Languages
Under this section you can find information on the following topics:
- projects: contains links to the websites of the individual DOBES projects
- data-types: gives an overview of the different sorts of data contained in the database (arts & handcraft, religion & medicine, dance, music, everyday work, environment)
- field work locations: shows the worldwide locations where fieldwork for the DOBES project is done
- transcripts: contains several examples of modern and traditional transcriptions
- annotations: contains examples for grammatical analysis and translation of language samples
  including direct links to the underlying media (videos)
- meta data: illustrates among other things how the IMDI Editor works (a tool for entering and organizing metadata)

Endangerment
Under the section Endangerment you can find some explanations of the reasons for endangerment (e.g. death of speech communities, religious education, cultural dominance, industrialization, social reputation). Selected quotations from David Crystal’s book “Language death” are presented. The following submenu points are accessible:
- endangerment
- revitalization
The crucial points of revitalization are: raising awareness and creating a positive image of the language, availability of material on the internet and use of new technologies, culture-specific teaching methods, teaching material (examples from DOBES are given), regional centers for language instruction and minority rights.

Documentation
Under the section Documentation you can find information on the following topics:
- goals (e.g. scientific analysis, archiving, material for teaching)
- stages: shows the different stages that a prototypical piece of recorded data has to pass: recording - digitization - editing - metadescription - annotation - integration.
- tools (for some of the steps mentioned under
- archive (illustrates how the data are organized in the archive, what is done for the security of the data etc.).
At the moment the site is written in such a way that it works well with Internet Explorer on Windows, with Windows Media Player to play the audio and video files. You can still view the site with other browsers and operating systems, but some things may not work. We will try to make the site more platform and browser independent in future versions.
We hope you will enjoy using the site!

News in brief

Andreas Claus - Access Management System

We have released the first version of the Access Management System (AMS) for the corpora housed by the Max-Planck-Institute in Nijmegen. The AMS can be accessed by clicking on the "set access rights" link at the URL

As of this release only the project coordinators have accounts. The coordinators (definers) can create groups and accounts. There are two kinds of accounts: users with read permission and account managers (definers). Account managers can have the same rights as the project coordinators themselves: they can create accounts, groups and rules for the ARM.
Optionally, the account managers can associate an acceptance declaration that pertains to the data in the archive. All users must agree to this declaration the first time they log in. The inclusion of the acceptance declaration is the first step towards a more elaborate AMS in the second version.
We also see the need for users to be able to enter feedback on the results of the usage (e.g. references).
By default, the resources are not accessible to everybody. They can be made accessible to a certain group or to the world. Access can be defined for all video, audio, image, info and annotation files which are linked to the metadata. You can define different rights for each of these types of data. By default only the metadata files are accessible to all.
The access rights are hierarchically organised. A change at a higher point in the corpus structure will be handed down to the ‘child records’.

Jost Gippert - DoBeS conference and summer school

The Volkswagen Foundation has confirmed the funding of both the conference on "A World of Many Voices" (Frankfurt, Sep. 4-5th, 2004) and the summer school on Language documentation (Frankfurt, Sep. 1-11th, 2004).
The ten-day summer school is intended to introduce promising students (max. 50 persons) of linguistics and adjacent disciplines (ethnology, anthropology, African Studies, Asian Studies, etc.) to the aims, objectives and methods of fieldwork with a view to the documentation of endangered languages. The participants will be taught and trained by members of the DoBeS programme and other internationally renowned specialists. The teaching will take the form of lectures, lecture tutorials, and seminars; the application of fieldwork methods will be practised in fieldwork tutorials. Please note that the deadline for applications is May 15th, 2004.
More details under:
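
Returning to the Access Management System note above: the statement that access rights are organised hierarchically and handed down to child records can be sketched in Python as follows. This is not the AMS implementation; the class `Node`, its methods and the group names are hypothetical, and modelling "handing down" as a lookup through the ancestors of a node is an assumption chosen for brevity.

```python
RESOURCE_TYPES = ("video", "audio", "image", "info", "annotation")


class Node:
    """Hypothetical corpus node; rights map a resource type to a set of groups."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.own_rights = {}  # only the rights set explicitly at this node

    def grant(self, resource_type, group):
        if resource_type not in RESOURCE_TYPES:
            raise ValueError(f"unknown resource type: {resource_type}")
        self.own_rights.setdefault(resource_type, set()).add(group)

    def groups_for(self, resource_type):
        # Walk upwards so that a grant on an ancestor also applies here.
        groups = set()
        node = self
        while node is not None:
            groups |= node.own_rights.get(resource_type, set())
            node = node.parent
        return groups

    def can_read(self, resource_type, user_groups):
        if resource_type == "metadata":
            return True  # metadata is accessible to all by default
        allowed = self.groups_for(resource_type)
        return "world" in allowed or bool(allowed & set(user_groups))


# Illustrative corpus: one grant at the top is visible on the child record.
corpus = Node("corpus")
session = Node("session-1", parent=corpus)
corpus.grant("audio", "team-a")
print(session.can_read("audio", ["team-a"]))
print(session.can_read("video", ["team-a"]))
print(session.can_read("metadata", []))
```

Resolving rights at lookup time means a later change at a higher node takes effect for all child records without copying anything; a system that stores rights per record would instead have to propagate each change downwards.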

Asifa Majid – Data elicitation methods

The Language & Cognition group of the Max-Planck Institute for Psycholinguistics is involved in language documentation, i.e., describing previously under-described languages; linguistic typology, i.e., establishing how similar and different languages are from one another; and investigating the relationship between language and thought. To this end, the group maintains about a dozen fieldsites around the world at any one time in which research can be conducted in a sustained way, using a full range of anthropological, linguistic and psychological methods.
In order to conduct comparative research, the Language & Cognition group publishes a field manual annually. The field manual consists of a series of tasks to help researchers in different fieldsites to collect data in a standardised way. The tasks belong to one of the core projects of the group, such as Space, Event Representation, or Multimodal Interaction. Each task addresses a specific research question about language documentation, linguistic typology or the relationship between language and thought. For further details please contact the Language & Cognition group, and see the website LAC/index.html

Paul Trilsbeek - DOBES Training Course 2004

A new training course is scheduled for the second week of May (10 to 14 May). This course is devoted to very practical matters relevant to keeping the documentation work within DOBES at a maximum level of coherence. It is therefore aimed at participants who primarily come from existing and new DOBES teams. In contrast to the DOBES summer school, which is directed at a broader range of topics relevant for the documentation work and at interested young people, the coming training course also has to cover topics such as the concrete agreements within DOBES and the necessary workflow aspects. We invite everyone to comment on the suggested schedule (see

Peter Wittenburg - Training Course in Lithuania

Due to his experience gathered within the DOBES and ECHO projects, Peter Wittenburg was invited by UNESCO to carry out a 5-day workshop about “Digital Archiving” for the major cultural heritage institutions in Vilnius (Lithuania) together with a colleague from Lund University. The members of the various institutions participated with great enthusiasm and the workshop turned into an interactive seminar about ongoing developments. The program was modified almost every day to fit the expectations of the participants as closely as possible. The major topics were metadata, metadata interoperability, archiving standards, container models, architectures for long-term preservation of digital data, the difference between presentation and representation formats, and management issues. The final agenda of the workshop that took place at the Lithuanian Folklore Center is available under

Most of the presentations were developed online on flip charts, however, and are now owned by the Folkcenter in Vilnius, which provided an excellent and creative environment and atmosphere.

Peter Wittenburg - New Person at the Archiving Team

The MPI team in DOBES realized that more conversions will have to be carried out to arrive at a coherent archive of language resources. In the area of textual material in particular, people are obviously using different tools and mixing various character sets. All this material has to be converted to proper XML and, in the case of annotations, to EAF. To better cope with these needs, Paul Trilsbeek has joined the team. He will take care of technical archive matters and will interact with DOBES members about these aspects. Paul has a musical background and will also become active in the ethnomusicology working group.

Send contributions for the next issue to:

before June 31, 2004
