Web software for participant-driven language archiving
Existing language archives have made a wealth of endangered language
materials available to scholars, community members, and the general public. Materials
are uploaded, managed, and controlled by data depositors, under arrangements
negotiated between depositors and archives. This model, implemented by archives such
as ELAR and DOBES, highlights and privileges the role of the data depositor. But there
are other important roles in language documentation projects. In this paper, I report on
work in progress to create web software highlighting the role of participants in language
documentation.
The paper uses two language documentation projects as case studies: the
Tibetan and Himalayan Library's (THL) corpus of Tibetan language learning videos, and
UCLA's Bolivian Quechua corpus. Both sets of documentation consist of a corpus of
video recordings along with associated ELAN annotation files. Not only does this
represent a common type of language documentation project, but it also presents a
genuine opportunity for participant-driven language archiving, given that speaker
participants are encoded as distinct data tiers in the ELAN annotations.
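Because each speaker's annotations sit on tiers tagged with that speaker, per-participant views can be derived directly from the annotation files. The Python sketch below uses only the standard library to group the time-aligned annotations in an ELAN .eaf file by each tier's PARTICIPANT attribute. It illustrates the idea under stated assumptions and is not the archive's actual code. It handles only ALIGNABLE_ANNOTATION elements, so dependent tiers such as translations are skipped. When a tier records no participant, it falls back to the tier ID. The function name and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def annotations_by_participant(eaf_xml: str) -> dict[str, list[tuple[int, int, str]]]:
    """Group time-aligned ELAN annotations by speaker (hypothetical helper).

    Returns {participant: [(start_ms, end_ms, text), ...]}, sorted by start time.
    Only ALIGNABLE_ANNOTATION elements are read; dependent tiers are skipped.
    """
    root = ET.fromstring(eaf_xml)
    # TIME_ORDER maps time-slot IDs to offsets in milliseconds; slots without
    # a TIME_VALUE (unaligned) are left out.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    grouped = defaultdict(list)
    for tier in root.iter("TIER"):
        # Use the tier's PARTICIPANT attribute, falling back to its TIER_ID.
        speaker = tier.get("PARTICIPANT") or tier.get("TIER_ID")
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # skip annotations attached to unaligned slots
            text = ann.findtext("ANNOTATION_VALUE", default="").strip()
            grouped[speaker].append((start, end, text))
    for anns in grouped.values():
        anns.sort()
    return dict(grouped)


# Hypothetical, heavily trimmed .eaf content (real files carry more header data).
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="2100"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="3800"/>
  </TIME_ORDER>
  <TIER TIER_ID="A-transcription" PARTICIPANT="Speaker A">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>utterance 1</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="B-transcription" PARTICIPANT="Speaker B">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>utterance 2</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

print(annotations_by_participant(SAMPLE_EAF))
# {'Speaker A': [(0, 1500, 'utterance 1')], 'Speaker B': [(2100, 3800, 'utterance 2')]}
```

Keying on the participant rather than the tier name means several tiers for one speaker, such as separate transcription tiers, would merge into a single view of that person.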
The web software consists of several interacting components, from a back-end
search index to a front-end user authentication and management system. However, the
main tool to be demonstrated is the Speech Bubble Player. This online video player
allows users to view ELAN annotations as superimposed speech bubbles on a video.
Each bubble is positioned next to the participant who is speaking, and a participant's
bubbles can be turned on and off by clicking on that participant.
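At its core, such a player performs a lookup at each playback time. It finds the annotations currently being spoken and draws each one beside its speaker, unless that speaker has been switched off. The sketch below shows only that logic, not the player's actual rendering code. Everything in it is hypothetical: the class names, the fixed per-participant anchor points, and the bubble offset are assumptions.

```python
from dataclasses import dataclass, field

# Annotations per participant as (start_ms, end_ms, text), e.g. extracted
# from an ELAN file's participant tiers.
Annotations = dict[str, list[tuple[int, int, str]]]


@dataclass
class Bubble:
    participant: str
    text: str
    x: float  # normalized frame coordinates in [0, 1]
    y: float


@dataclass
class SpeechBubbleState:
    # Assumption: one fixed anchor point per participant, given in normalized
    # frame coordinates (for instance entered by an editor).
    anchors: dict[str, tuple[float, float]]
    hidden: set[str] = field(default_factory=set)

    def toggle(self, participant: str) -> None:
        """Clicking on a participant switches their bubbles off or back on."""
        self.hidden ^= {participant}

    def bubbles_at(self, annotations: Annotations, t_ms: int) -> list[Bubble]:
        """Bubbles to draw at playback time t_ms, at most one per participant."""
        bubbles = []
        for participant, anns in annotations.items():
            if participant in self.hidden:
                continue
            for start, end, text in anns:
                if start <= t_ms < end:
                    # Participants without an anchor default to the frame center.
                    x, y = self.anchors.get(participant, (0.5, 0.5))
                    # Draw the bubble slightly above the speaker's anchor.
                    bubbles.append(Bubble(participant, text, x, max(0.0, y - 0.15)))
                    break
        return bubbles


annotations = {
    "Speaker A": [(0, 1500, "utterance 1")],
    "Speaker B": [(1000, 3800, "utterance 2")],
}
player = SpeechBubbleState(anchors={"Speaker A": (0.25, 0.6), "Speaker B": (0.75, 0.55)})
print([b.participant for b in player.bubbles_at(annotations, 1200)])  # ['Speaker A', 'Speaker B']
player.toggle("Speaker A")
print([b.participant for b in player.bubbles_at(annotations, 1200)])  # ['Speaker B']
```

Overlapping speech simply yields two bubbles at once. This is one practical benefit of keeping each speaker on a separate tier rather than merging everything into a single subtitle track.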
Far from being a comic-book novelty, the Speech Bubble Player establishes a direct
link between archive users and participants. Instead of relying
on object-level metadata to retrieve information about the various participants in a
recording, a user can go directly to a participant's archive page by clicking on the
participant's speech bubble.
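A bubble can be made clickable by resolving the ELAN participant name to a stable archive identifier. The sketch below assumes a curated registry keyed by corpus and participant name, because the same label (for example "Speaker A") can denote different people in different corpora. The registry, the identifiers, and the base URL are all hypothetical placeholders.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical placeholder; the real archive's URL scheme is not specified.
ARCHIVE_BASE = "https://archive.example.org/participant/"

# Hypothetical curated registry: (corpus, ELAN PARTICIPANT value) -> archive ID.
# Labels such as "Speaker A" are not unique across corpora, hence the corpus key.
PARTICIPANT_IDS = {
    ("quechua", "Speaker A"): "p0001",
    ("thl", "Speaker A"): "p0002",
}


def participant_page(corpus: str, participant: str) -> Optional[str]:
    """URL to open when a user clicks a participant's speech bubble.

    Returns None when the participant has not been linked to an archive
    record, in which case the bubble would simply not be clickable.
    """
    pid = PARTICIPANT_IDS.get((corpus, participant))
    if pid is None:
        return None
    return ARCHIVE_BASE + quote(pid)


print(participant_page("quechua", "Speaker A"))  # https://archive.example.org/participant/p0001
print(participant_page("quechua", "Speaker Z"))  # None
```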
If a speaker participant has an account with the archive, the Speech Bubble
Player and associated tools offer even greater potential by opening a communication
channel between users of the archive and that participant. This is made possible by an
additional software component that allows an archive user to identify themselves as a
particular participant in the corpus.
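An identity claim of this kind could be modeled as a small record whose status gates the communication channel. In the hypothetical sketch below, messages about a participant are routed to an account only after its claim has been verified. The review step by a curator, such as the depositor, is an assumption added to prevent impersonation and is not stated in the text. All class and field names are illustrative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ClaimStatus(Enum):
    PENDING = "pending"
    VERIFIED = "verified"
    REJECTED = "rejected"


@dataclass
class IdentityClaim:
    user: str          # archive account name
    participant: str   # archive-wide participant ID (hypothetical)
    status: ClaimStatus = ClaimStatus.PENDING


class ClaimRegistry:
    """Hypothetical store of claims that an account *is* a corpus participant.

    Assumption: a curator confirms each claim before it takes effect.
    """

    def __init__(self) -> None:
        self._claims: list[IdentityClaim] = []

    def submit(self, user: str, participant: str) -> IdentityClaim:
        claim = IdentityClaim(user, participant)
        self._claims.append(claim)
        return claim

    def review(self, claim: IdentityClaim, approve: bool) -> None:
        claim.status = ClaimStatus.VERIFIED if approve else ClaimStatus.REJECTED

    def account_for(self, participant: str) -> Optional[str]:
        """Account that messages about this participant should be routed to."""
        for claim in self._claims:
            if claim.participant == participant and claim.status is ClaimStatus.VERIFIED:
                return claim.user
        return None


registry = ClaimRegistry()
claim = registry.submit("user42", "p0001")
print(registry.account_for("p0001"))  # None (claim still pending)
registry.review(claim, approve=True)
print(registry.account_for("p0001"))  # user42
```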
In the long run, treating participants as participants opens many doors. For
example, it might help language consultants to gain additional consultancy work from
new linguists, thus helping language documentation projects to "give back" to the
community in a very practical way. Moreover, it might help in those cases where an
individual or community wants to request some level of control over the materials that
concern them: for example, to restrict or track access.
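Such control could be represented as per-participant preferences that are checked whenever a recording is requested. In the hypothetical sketch below, a restricted participant limits viewing to approved users. A participant who asks for tracking gets a log entry for each successful view. The sketch assumes that every participant in a recording must permit access; an archive might instead negotiate such rules case by case.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessPolicy:
    """Hypothetical preferences a participant holds over recordings of them."""
    restricted: bool = False                               # only approved users may view
    approved_users: set[str] = field(default_factory=set)
    track: bool = False                                    # log successful views


@dataclass
class AccessControl:
    policies: dict[str, AccessPolicy] = field(default_factory=dict)
    # Log entries: (participant, user, recording, time of view).
    log: list[tuple[str, str, str, datetime]] = field(default_factory=list)

    def _policy(self, participant: str) -> AccessPolicy:
        # Participants without stated preferences get the open default.
        return self.policies.get(participant, AccessPolicy())

    def can_view(self, user: str, recording: str, participants: list[str]) -> bool:
        """Allow a view only if every participant in the recording permits it.

        Successful views are logged for participants who asked for tracking.
        """
        policies = [self._policy(p) for p in participants]
        allowed = all(not pol.restricted or user in pol.approved_users for pol in policies)
        if allowed:
            now = datetime.now(timezone.utc)
            for name, pol in zip(participants, policies):
                if pol.track:
                    self.log.append((name, user, recording, now))
        return allowed


acl = AccessControl(policies={
    "p0001": AccessPolicy(restricted=True, approved_users={"linguist7"}, track=True),
})
print(acl.can_view("guest", "rec-12", ["p0001", "p0002"]))      # False
print(acl.can_view("linguist7", "rec-12", ["p0001", "p0002"]))  # True
print(len(acl.log))                                             # 1
```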
The software I present is based on an open-source, standards-based
development stack, including the ELAN annotation tool, the Apache Solr search
platform, and the Drupal content management system. It can be installed on most
server configurations and can be used in any Drupal site.
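To connect the annotations to the search index, one option is to index each annotation as its own Solr document, so that a search can return a participant's utterances together with time offsets the player can seek to. The sketch below only builds the documents. As a JSON array, they could then be sent to the core's /update handler. The field names are hypothetical and assume the `*_s`, `*_i`, and `*_t` dynamic-field suffixes of Solr's example schemas, so the actual core would need a matching configuration.

```python
import json


def solr_docs(recording_id: str,
              annotations: dict[str, list[tuple[int, int, str]]]) -> list[dict]:
    """One Solr document per annotation (hypothetical field names).

    Assumes a schema with string (*_s), integer (*_i) and text (*_t)
    dynamic fields, as in Solr's example configurations.
    """
    docs = []
    for participant, anns in annotations.items():
        for n, (start, end, text) in enumerate(anns):
            docs.append({
                "id": f"{recording_id}/{participant}/{n}",  # unique per annotation
                "recording_s": recording_id,
                "participant_s": participant,
                "start_ms_i": start,
                "end_ms_i": end,
                "text_t": text,
            })
    return docs


annotations = {
    "Speaker A": [(0, 1500, "utterance 1")],
    "Speaker B": [(2100, 3800, "utterance 2")],
}
# ensure_ascii=False keeps non-Latin scripts readable in the UTF-8 payload.
payload = json.dumps(solr_docs("rec-12", annotations), ensure_ascii=False)
print(payload)
```

Indexing at the level of the annotation rather than the recording is what allows a search hit to point at a specific speaker and moment, rather than only at a whole video.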
Digital Resources for the Study of Quechua: http://wp.cdh.ucla.edu/quechua/
Tibetan and Himalayan Library Audio & Video Archive: http://www.thlib.org/avarch/mediaflowcat/