Current Trends in Documentation of Endangered Languages




Peter K. Austin
ELAP, Department of Linguistics
SOAS
Thanks to Lise Dobrin, Lenore Grenoble and David
 Nathan for discussion of the ideas in this talk;
 they are absolved of responsibility for errors
Outline


• Documentary linguistics and language
  documentation
• Components and skills for documentation
• Some current issues and future concerns
• Conclusions
 Documentary linguistics

• a new field of linguistics “concerned with the methods,
  tools, and theoretical underpinnings for compiling a
  representative and lasting multipurpose record of a
  natural language or one of its varieties” (Himmelmann
  1998, 2006)
• has developed over the last decade in large part in
  response to the urgent need to make an enduring record
  of the world’s many endangered languages and to
  support speakers of these languages in their desire to
  maintain them, fuelled also by developments in
  information and communication technologies
• essentially concerned with the role of language speakers
  and their rights and needs
  Features of documentary linguistics
Himmelmann (2006:15) identifies important new features of documentary
linguistics:
• Focus on primary data – language documentation concerns the collection and
  analysis of an array of primary language data to be made available for a wide
  range of users;
• Explicit concern for accountability – access to primary data and
  representations of it makes evaluation of linguistic analyses possible and
  expected;
• Concern for long-term storage and preservation of primary data – language
  documentation includes a focus on archiving in order to ensure that
  documentary materials are made available to potential users into the distant
  future;
• Work in interdisciplinary teams – documentation requires input and expertise
  from a range of disciplines and is not restricted to linguistics alone;
• Close cooperation with and direct involvement of the speech community –
  language documentation requires active and collaborative work with
  community members both as producers of language materials and as
  co-researchers.
 A contrast

• language documentation: activity of systematic
  recording, transcription, translation and analysis
  of the broadest possible variety of spoken (and
  written) language samples collected within their
  appropriate social and cultural context
• language description: activity of writing a
  grammar, dictionary or text collection, typically for
  linguists
• Ref: Himmelmann 1998, Woodbury 2003
  Uses of documentation

• documentation outputs are multifunctional for:
   • linguistic research - phonology, grammar, discourse,
     sociolinguistics, typology, historical reconstruction
   • folklore - oral literature and folklore
   • poetics - metrical and musical aspects of oral literature
   • anthropology - cultural aspects, kinship, interaction styles,
     ritual
   • oral history
   • education - applications in teaching
   • language revitalisation
Users of documentation

• collection, analysis and presentation of data
   • useful not only for linguistics but also for research into the
     socio-cultural life of the community
   • analysed and processed so it can be understood by
     researchers of other disciplines and does not require any
     prior knowledge of the language in question
   • usable by members of the speaker community
   • respects intellectual property rights, moral rights, individual
     and cultural sensitivities about access and use and is done
     in the most ethical manner possible
 The documentation record

• core of a documentation is a corpus of audio and/or video
  materials with transcription, multi-tier annotation, translation
  into a language of wider communication, and relevant
  metadata on context and use of the materials
• the corpus will ideally be large, cover a diverse range of
  genres and contexts, be expandable, opportunistic, portable,
  transparent, ethical and preservable
• as a result documentation is increasingly done by teams
  rather than ‘lone wolf linguists’
• need to see grammatical analysis and description as a
  tertiary-level activity contingent on and emergent from the
  documentation corpus
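
One way to picture the record described above is as a small data structure. The following is only an illustrative sketch: the class names, field names and the two "required" tiers are assumptions made for this example, and they do not reproduce the schema of any particular archive or annotation tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Annotation:
    start: float  # offset into the recording, in seconds
    end: float
    value: str


@dataclass
class Tier:
    name: str  # e.g. "transcription", "translation", "gloss"
    annotations: List[Annotation] = field(default_factory=list)


@dataclass
class Session:
    """One recorded event: media file, metadata and time-aligned tiers."""
    media_file: str
    metadata: Dict[str, str]  # context and use: speaker, place, genre, access
    tiers: Dict[str, Tier] = field(default_factory=dict)

    def add(self, tier_name: str, start: float, end: float, value: str) -> None:
        # Create the tier on first use, then append the annotation to it.
        tier = self.tiers.setdefault(tier_name, Tier(tier_name))
        tier.annotations.append(Annotation(start, end, value))

    def missing_tiers(self, required=("transcription", "translation")) -> List[str]:
        # Report which of the (assumed) core tiers are absent or empty.
        return [n for n in required
                if n not in self.tiers or not self.tiers[n].annotations]
```

A session with a transcription but no translation would then be flagged by `missing_tiers()`, which is one simple way to check that recordings have the entry points (transcription, translation, metadata) that make them usable.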
Phases in documentation project

• Project conceptualisation and design
• Establishment of field site and permissions
• Funding application
• Data collecting and processing
• Creation of outputs
• Monitoring, evaluation and reporting
Phases in data collection and analysis

• Recording – of media and text (including
  metadata)
• Capture – analogue to digital transfer
• Analysis – transcription, translation, annotation,
  notation of metadata
• Archiving – creating archival objects, assigning
  access and usage rights
• Mobilisation – publication and distribution of
  materials
Documentation projects

• Hans Rausing Endangered Languages
  Project
• Volkswagen Foundation DoBeS project
• NSF-NEH DEL project
• NWO
• others
HRELP

• Hans Rausing Endangered Languages Project
  (HRELP) funded by Lisbet Rausing Charitable Fund
• ELDP – distributes £1 million per year in 5 types of
  grants; funds 70 teams of researchers around the world
  documenting languages and cultures
• ELAR – digital archive at SOAS
• ELAP – academic programme for training MA, PhD,
  post-doctoral researchers
• Publishing books, newsletter, CD-ROMs, website,
  workshops, training courses
ELDP projects 2003-2005
Volkswagen DoBeS project

• funded by Volkswagen Stiftung (Euro 30 million), archive
  based in Max Planck Institute, Nijmegen, Netherlands
• 40 teams of researchers around the world documenting
  languages and cultures in a wide range of community
  contexts
• Major archive in Netherlands with new software tools
  (ELAN, IMDI) and research methods
• Volkswagen supports conferences, summer schools (eg.
  Endangered Iranian Languages, Kiel University, August
  2007), local training courses (eg. Bali, June 2007),
  theoretical research and book publication (eg. Fundamentals
  of Language Documentation)
DoBeS
Other projects

• Documentation of Endangered Languages (DEL)
  interagency project of NSF, NEH, Smithsonian in USA - $4
  million in 2004, $2 million in 2005, $2 million in 2006
• NWO Netherlands - research funds, conferences
• Smaller projects - Foundation for Endangered Languages,
  Endangered Languages Fund, GfBS
• UNESCO - intangible cultural heritage, holds conferences,
  distributes some grants, developing handbook of language
  documentation and good practice in language revitalisation
Some current issues and challenges

•   Documentation versus description
•   The ‘representative’ record
•   Quality of language documentation
•   Commodification
•   Interdisciplinarity
•   Training for language documentation
•   Communicating with the wider world
Documentation vs description

Himmelmann and others have tried to distinguish language
     documentation from language description, but it is unclear whether
     such a separation is truly meaningful and, even if it is, where the
     boundaries between the two might lie.
Documentation projects must rely on application of theoretical and
     descriptive linguistic techniques, if only to ensure that they are
     usable (i.e. have accessible entry points via transcription, translation
     and annotation) as well as to ensure that they are comprehensive.
It is only through linguistic analysis that we can discover that some
     crucial speech genre, lexical form, grammatical paradigm or
     sentence construction is missing or under-represented in the
     documentary record.
Without good analysis, recorded audio and video materials do not
     serve as data for any community of potential users. Similarly,
     linguistic description without documentary support is sterile, opaque
     and untestable.
The “representative” record

• On a theoretical level, one can define “representative”
  documentation as the collection of sample texts of all discourse
  types, all registers and genres, from speakers representing all ages,
  generations, socioeconomic classes, and so on. On a practical level,
  however, there are concrete limitations to the range and number of
  texts which can be collected, transcribed and analysed. Most
  linguists cannot devote their entire careers to time in the field, which
  would be required for a truly thorough collection and analysis of
  data.
• A solution (proposed by Seifart in LDD 5) is sampling, ie.
  identification of some subset of types that is representative of the
  language as a whole – but how do we do this in a meaningful way:
  (i) for an individual language (ii) cross-linguistically in a comparable
  manner?
Sampling criteria

Criteria for differentiation of communicative events:

• “Ways of speaking” as distinguished in specific culture
  / speech community (Ethnography of Communication)
• Medium: spoken / written
• Plannedness: unplanned / planned
• Register: formal / informal
• Manner of obtaining data: spontaneous (‘natural’) vs.
  elicitation vs. stimulated
• Target: child-directed / adult-directed /
  foreigner-directed
• It is clear that the success of a documentation project rests on
  intimate collaboration with community members. In the ideal, they
  can be trained to be engaged in data collection themselves,
  thereby expediting the process (eg. Florey 2004). Even if this is
  not possible, community members can direct (external) linguists to
  varying discourse types and to differing speech patterns.
• Note however that this could result in a focus on
  rare/unusual/unique discourse types that are in no sense
  ‘representative’
• Himmelmann (2006:66) identifies five major types of
  communicative events ranged along a continuum from unplanned
  to planned (next slide). However, it is not clear that this typology is
  applicable to all languages and all speech communities – just
  what counts as a ‘representative’ account of language in use remains
  unclear, and perhaps the notion should be abandoned
Himmelmann genres
Parameter   Major Types      Examples

Unplanned   exclamative      Ouch! Fire! Jishin da!
            directive        Scalpel! Sit! Achi ike!
            conversational   greetings, small talk, chat, discussion, interview
            monological      narrative, description, speech, formal address
Planned     ritual           prayer, ceremonial address
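
As a rough illustration of how the sampling criteria listed earlier might be applied to an existing corpus, the sketch below counts sessions per combination of criteria and lists the combinations that are not yet covered. The criteria chosen, the value labels, the session dictionaries and the function name `coverage_gaps` are all assumptions made for this example; a real project would define its own categories with the speech community.

```python
from collections import Counter
from itertools import product

# Hypothetical subset of the sampling criteria, with assumed value labels.
CRITERIA = {
    "medium": ["spoken", "written"],
    "plannedness": ["unplanned", "planned"],
    "register": ["formal", "informal"],
}


def coverage_gaps(sessions, criteria=CRITERIA, minimum=1):
    """Return criteria combinations with fewer than `minimum` sessions."""
    keys = list(criteria)
    # Count how many sessions fall into each combination of values.
    counts = Counter(tuple(s[k] for k in keys) for s in sessions)
    return [
        dict(zip(keys, combo))
        for combo in product(*(criteria[k] for k in keys))
        if counts[combo] < minimum
    ]


# Made-up session metadata, for illustration only.
sessions = [
    {"medium": "spoken", "plannedness": "unplanned", "register": "informal"},
    {"medium": "spoken", "plannedness": "planned", "register": "formal"},
]
gaps = coverage_gaps(sessions)
```

A tally like this only shows which cells of a chosen grid are empty. It does not settle whether the grid itself is appropriate for a given language or speech community.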
Quality of documentation

• There is a tendency among some researchers to equate
  documentation outcomes with archival objects (part of what David
  Nathan has termed ‘archivism’), that is, the number and volume of
  recorded digital audio and/or video files and their related
  transcription, annotation, translation and metadata.
• Mere quantity of objects is not a good proxy for quality of research.
• Equally, some would argue that outcomes which contribute to
  language maintenance and revitalization are the true measure of the
  quality of a documentation project (what better success of an
  endangered language project than that the language continues to be
  used?).
• So how could we measure ‘quality’ of a documentary corpus? What
  parameters might be included?
Possible metrics

• volume (quantity) as a proxy
• form
  • media – audio, video, stills – how measured?
  • text – explicit, transparent, well-structured,
    standardised, richly detailed, machine-readable
  • links (relations, hypertext, multimedia) – explicit,
    well-structured, machine-readable
More possible metrics

• content:
   • new – never inscribed before
   • unique – not readily replicable
   • interesting
   • …

• organisation and management (workflow,
  transformations, archiving)
• relevance and use of outputs for stakeholders
• impact on community of speakers (or other
  stakeholders)
• impact on future of language
Commodification

• reduction of languages to things and their treatment as if they
  were a tradeable commodity
• reflected in language documentation through the transformation of
  languages into bounded objects, indices, technical encodings, and
  exchangeable goods
• results from forces of objectification, standardisation and audit that
  shape the management of information in contemporary Western
  culture, especially academic culture with its focus on outputs and
  counting (eg. RAE, RQF, citation indices, research impact
  statements etc)
• also reflects a theoretical and methodological vacuum that has
  been filled not by linguistics but by preservationists, archives and
  technologists
Languages as bounded objects

• selections of phenomena crystallised into a
  singular “language”
• languages placed within boundaries, on maps
  etc.
Languages as indices

• language vitality indicators: UNESCO defines 9 criteria
  with 6 scoring levels; SIL uses 8 indicators
• these objectify languages: the vitality of an individual
  language can be quantified, and languages can be
  ranked according to degree of endangerment
• UNESCO presents a deterministic relationship between
  the 9 factors and the vitality and function of languages:
  “taken together, these nine factors can determine the
  viability of a language, its function in society and the
  type of measures required for its maintenance or
  revitalization”
Languages as exchangeable goods

• goal of research is for languages to be ‘preserved’ as
  ‘resources’ that ‘consumers’ (linguists et al) discover and
  access via ‘service providers’ (OLAC publicity)
• linguists’ professional obligations to speaker communities
  now often formulated in grant applications and elsewhere
  in terms of transacted objects (language primers, CDs,
  books) rather than knowledge sharing, joint engagement
  in language maintenance activities or other interactions
• granting agencies require linguist’s bona fides to be
  distilled into a ‘letter of support’ from ‘an appropriate
  representative of the language community’ thus turning a
  complex of social and political dynamics into an object that
  is used to legitimise the research
Languages as technical encodings

• quantifiable properties (recording hours, data volume, file
  parameters) and technical desiderata (‘archival quality’,
  ‘portability’, standardised ontologies) have become
  reference points in discussing and assessing the methods
  and goals of documentation
• results in grant applications by formula: 100 hours of 16 bit
  44.1 kHz audio, 25 hours MPEG-2 video, 10% ELAN .eaf
  files and Toolbox annotations
• technical parameters replace balanced discussion of
  documentation methods; eg. video recordings proposed
  without reference to hypotheses, goals or methodology;
  avoidance of data compression substitutes for knowledge
  of art of audio recording; file formats named rather than
  corpus structure described
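
To make the 'data volume' figures in such formulas concrete, here is a small worked calculation of the size of uncompressed PCM audio. It is a sketch under stated assumptions: the sampling rate is taken to be the standard CD rate of 44.1 kHz, the channel count (stereo by default) is not given in the talk, and container overhead such as WAV headers is ignored. The function name `pcm_bytes` is hypothetical.

```python
def pcm_bytes(hours, sample_rate_hz=44_100, bit_depth=16, channels=2):
    """Approximate size in bytes of uncompressed PCM audio.

    size = seconds x samples per second x bytes per sample x channels
    """
    seconds = hours * 3600
    bytes_per_sample = bit_depth // 8  # 16-bit audio uses 2 bytes per sample
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


stereo = pcm_bytes(100)             # 63_504_000_000 bytes, about 63.5 GB
mono = pcm_bytes(100, channels=1)   # 31_752_000_000 bytes, about 31.8 GB
```

The arithmetic is trivial, which is part of the point made on this slide: a figure like this describes storage requirements, not the structure or content of the corpus.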
   Interdisciplinarity

• Himmelmann and others have pointed to the importance of taking a
  multidisciplinary perspective in language documentation and drawing in
  researchers, theories and methods from a wide range of areas,
  including anthropology, musicology, psychology, ecology, applied
  linguistics etc (see Harrison 2005, Coelho 2005, Eisenbeiss 2005).
• True interdisciplinary research is difficult to achieve, both because of
  different theoretical orientations and because of practical differences
  in approach (ranging from linguists’ and anthropologists’ traditionally
  different practices concerning payments for consultants to more
  significant differences in academic paradigm that make communication
  and understanding fraught).
• Mainstream linguistics has tended to turn away from other disciplines
  and to emphasise its ‘independence’ by concentrating on theoretical
  concerns that are of internal interest to linguists only (minimalism, OT
  phonology – see Liberman 2007).
• Documentary linguistics opens new doors to interdisciplinary
  collaboration but we need to work out how to achieve it.
Training for language documentation

 • At SOAS some experiments:
 • Postgraduate courses – MA in Language
   Documentation and Description and PhD in
   Field Linguistics
 • Training courses for ELDP grantees
 • Specialised training days (recording, XML)
 • Seminars and workshops to supplement usual
   taught programme
Reaching the wider world

 • There are great opportunities for
   communicating about language and language
   issues to the general community
 • At SOAS we have run “Endangered
   Languages Week”, film showings, public
   lectures, exhibitions (“Disappearing Voices”),
   David Crystal’s play (“Living On”)
Films
Exhibition
New opportunities

• New initiatives like the World Language Centre in Reykjavik and
  Linguamón Casa de les Llengües in Barcelona offer the possibility of
  reaching out to the wider community to engage their interest in
  languages, language endangerment and the urgent task of
  language documentation
• By working together and collaborating with active researchers and
  museum specialists we can open up new communication
  possibilities
• But historically linguists have been only partially effective in
  communicating with these constituencies (cf. Liberman) – endangered
  languages and language documentation give us a powerful ‘hook’ to
  market our wares but we must adopt professional strategies to
  compete in this marketplace
Identifying the gaps

• The discourse of endangered languages and language documentation
  has a strong moral and emotional power which has not been matched
  by conceptual guidance on what linguistics and linguists can do in
  response
• publications and debates about effective and appropriate documentary
  methodologies for linguists have been slow to develop, resulting in
  many unanswered questions:
   • are the goals of documentary linguistics social or formal?
   • are its data symbolic or digital recordings of events?
   • what role(s) should archives play?
   • how could we decide between competing interests?
• we lack a framework for assessing quality, value, effectiveness and
  progress of our work so documentary linguists fall back on established
  patterns like quantifiable indices and technical standards
 Setting some agendas

• recognising that some of the challenges described here
  derive from bureaucratic and technological contexts and
  should not be taken for granted as defining the discipline
• we need to develop a new approach to language
  documentation that implements the moral and ethical vision
  that has attracted new participants
• replacing the rhetoric that documentation is a separate
  discipline from descriptive linguistics with a better
  understanding of their respective goals, methodologies and
  evaluative criteria
• and locating documentation within a wide range of
  interdisciplinary approaches to human language
• with development of appropriate training and outreach
The end