					           Archiving
               David Nathan
      Endangered Languages Archive
    Hans Rausing Endangered Languages
                  Project
        SOAS, University of London


1
        Topics

       Introducing ELAR and digital language archives
       Preservation
       Archive interactions with documentation
       What and how to archive
       Protocol
       Metadata
       Evaluation of audio
       Archives and revitalisation
       Archivism : mobilisation
       Video
       Conclusions
2
    Introducing ELAR and digital language
    archives




3
     Endangered Languages ARchive (ELAR)

     one of 3 semi-autonomous programs of the
      Hans Rausing Endangered Languages
      Project
     staff of 3: archivist, software developer,
      technician (plus research assistants etc.)
     develop preservation infrastructure,
      cataloguing and dissemination; policies;
      facilities; training and advice; materials
      development and publishing


4
     What is a digital language archive?

     a trusted repository created and
      maintained by an institution with a
      commitment to the long-term preservation
      of archived material
     will have policies and processes for
      materials acquisition, cataloguing,
      preservation, dissemination, migration to
      new digital formats
     a collection of managed materials


5
     What is archiving of language materials?

     preparing materials in a structured form
      suitable for long-term preservation
     creating long-term relationships
     it is not backup
     it is not dissemination/publication
     it should not impinge on good linguistic
      practice



6
     What can a language archive offer?

     Security - keep your electronic materials safe
     Preservation - store your materials for the long
      term
     Discovery - help others to find out about your
      materials
     Protocols - respect and implement sensitivities,
      restrictions
     Sharing - share results of your work, if appropriate
     Acknowledgement - create citable
      acknowledgement
     Mobilisation - create usable language materials for
      communities
     Quality and standards - advice for assuring your
      materials are of the highest quality and robust
      standards
7
     Kinds of language archives

     many cross-cutting classifications:
       Indigenous vs outsider, eg. Squamish Nation
       regional vs international, eg. AILLA, Paradisec;
        DoBeS, ELAR
       associated with research institute, eg. AIATSIS,
        ANLC
       granter-funded, eg. DoBeS, ELAR, OTA
       digital vs physical vs mixed, eg. DoBeS vs
        Vienna Sound Archive, ANLC



8
     Potential users

     speakers and their descendants - up to
      95% of users of UCB are community
      members
     depositors - to create or renew materials
     other researchers - comparative/historical
      linguists, typologists, theoreticians,
      anthropologists, historians, musicologists
      etc etc
     other “stakeholders”, eg educationalists
     journalists and the wider public
9
      Archives networks and bodies

      Digital Endangered Languages and Musics
       Archives Network (DELAMAN)
        ELAR, DOBES, ANLC, Paradisec, EMELD,
         LACITO, AIATSIS, AMPM (Maori)
      Open Language Archives Community
       (OLAC)
      others, eg. D-LIB
        http://www.dlib.org/
        Open Archives Initiative


10
                  Digital archive architectures

                  OAIS archives define three types of
                   ‘packages’
                   ingestion, archive, dissemination:




     [Diagram: Producers -> Ingestion -> Archive -> Dissemination -> Designated communities]
11
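The three package types can be sketched as plain data classes. This is a minimal illustration, not ELAR's implementation: OAIS itself calls the packages Submission, Archival and Dissemination Information Packages, and the class fields and the `ingest` and `disseminate` functions below are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SubmissionPackage:
    """What a producer (depositor) hands over at ingestion."""
    depositor: str
    files: Dict[str, bytes]
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class ArchivalPackage:
    """What the archive stores and preserves."""
    identifier: str
    files: Dict[str, bytes]
    metadata: Dict[str, str]


@dataclass
class DisseminationPackage:
    """What a member of a designated community receives."""
    identifier: str
    files: Dict[str, bytes]


def ingest(sip: SubmissionPackage, identifier: str) -> ArchivalPackage:
    # Record who deposited the material alongside the depositor's own metadata.
    metadata = dict(sip.metadata, depositor=sip.depositor)
    return ArchivalPackage(identifier, dict(sip.files), metadata)


def disseminate(aip: ArchivalPackage, wanted: List[str]) -> DisseminationPackage:
    # Only the requested files that actually exist are copied out.
    selected = {name: aip.files[name] for name in wanted if name in aip.files}
    return DisseminationPackage(aip.identifier, selected)
```

In a 'live archive' the same people may sit at both ends, so `ingest` and `disseminate` would be called repeatedly for the same deposit rather than once each.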
                  ‘Live Archives’ - architecture

                  Boundary between depositors, users and
                   archive:
                    users add, update content; customise outputs




     [Diagram: Producers -> Ingestion -> Archive -> Dissemination -> Designated communities]
12
      The way we were ...

      eg 1993: ASEDA Aboriginal Studies
       Electronic Data Archive at AIATSIS
       Canberra (modelled on Oxford Text
       Archive)
      opportunistically collect and catalogue
       electronic materials that were at risk or not
       accessible
           lexica
           grammars
           texts
           etc

13
         How things have changed ..

        types of data (modalities and some genres)
        means of storage
        standardisation and metadata
        dissemination
        (most explosive) expanded into practice
         and workflow of linguists




14
      ELAR’s holdings

      ELAR currently holds about 45 deposits
       with a total volume of approx. 1.1 TB
      the average deposit is about 25 GB, but
       sizes vary widely: a few deposits are much
       larger, and the median size is around 10 GB
      we expect the volume to nearly double over
       the next year
      see next slides for distribution of data types



15
            ELAR holdings by data type

      data types for a representative sample
       (70%) of holdings
      data type by volume (MB) and number of
       files, sorted by volume

                               Data type   Volume (MB)    Files
                               audio           360,411    6,312
                               video           208,995      895
                               image            28,592    2,221
                               msword              223      404
                               pdf                 196      134
                               eaf                  33      176
                               text                 32      781
                               lex                   9       29
                               trs                   5      246
                               xls                   1       19
                               imdi                  1       26
16
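The holdings table can be summarised with a little arithmetic. The sketch below copies the figures from the slide and computes each data type's share of the sampled volume and its average file size; the variable names are only illustrative.

```python
# data type: (volume in MB, number of files), copied from the holdings table
holdings = {
    "audio": (360411, 6312),
    "video": (208995, 895),
    "image": (28592, 2221),
    "msword": (223, 404),
    "pdf": (196, 134),
    "eaf": (33, 176),
    "text": (32, 781),
    "lex": (9, 29),
    "trs": (5, 246),
    "xls": (1, 19),
    "imdi": (1, 26),
}

total_mb = sum(mb for mb, _ in holdings.values())
total_files = sum(files for _, files in holdings.values())

for name, (mb, files) in holdings.items():
    share = 100 * mb / total_mb
    print(f"{name:7s} {share:5.1f}% of volume, {mb / files:8.2f} MB per file")
```

On these figures audio and video together make up roughly 95% of the sampled volume, although the many small text, annotation and metadata files are what make the media usable.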
         If you are a depositor, ELAR will

        preserve your deposited materials
        provide for making changes where possible
        provide web-based metadata management
        implement your access restrictions etc
        give feedback about materials
        provide advice, general and specific
        assistance, eg data conversion
        provide some equipment and services
        on a case by case basis, develop
         resources
17
     Preservation




18
         Preservation issues

        making materials robust
        making storage robust
        organisational, ownership and policy issues
        changing technologies
           refreshing
           migrating




19
      Changing technologies

      advantages of digital preservation
         primarily: copying
         items no longer unique
         also transmission, dissemination
      other implications
         robust formats (standard, open, explicit)
         formats with long horizons
         formats easy to refresh
         formats that don’t require particular software
          (sometimes software is intrinsic!)
         may have to describe software or even archive
          the software
20
      Two preservation models

      “preserve the bytestream”
        keep the exact original at all costs


      LOCKSS
        “lots of copies keep stuff safe”
        http://lockss.stanford.edu/
        guess which community it came from!
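Both models depend on being able to tell whether a copy still holds the original bytestream. A minimal sketch, assuming several copies of one file are available as bytes: compute a checksum for each and treat the most common value as the reference. This is a simplified illustration of the idea of comparing many copies, not the LOCKSS protocol itself, and `check_copies` is a hypothetical name.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """SHA-256 checksum of a bytestream, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def check_copies(copies):
    """copies maps a location name to the bytes stored there.

    Returns the majority checksum and the sorted names of copies that differ from it.
    """
    digests = {name: digest(data) for name, data in copies.items()}
    majority, _count = Counter(digests.values()).most_common(1)[0]
    damaged = sorted(name for name, d in digests.items() if d != majority)
    return majority, damaged
```

A copy flagged as damaged would then be replaced from one of the agreeing copies; with only two copies a disagreement cannot be resolved this way, which is one argument for keeping more.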




21
      Some backup issues

      risk management
      undetected problems and useless backups
      aspects of professional backup:
        scheduled frequencies, eg monthly, weekly,
         daily
        retention
        media and locations
        naming/versions
        proven restoration


22
         Top 10 worst ways to collect/manage data

        1. No backup
        2. Divergent versions of same data
        3. Unlabeled disks/media
        4. Non-standard or undocumented filenames
        5. Master recordings used to review/analyse data
        6. Don’t know how characters are encoded
        7. Never tried to convert/export data
        8. Unprocessed or unedited audio and video
        9. Inconsistent recording
        10. Unmonitored recording


23
     Archive interactions with documentation




24
         Documenter and archive interactions

        grant formulation and application
        communications, questions, advice
        training
        archiving services




25
     Documenter & archive interactions




26
      Query/interaction topics

      analysis of approx 150 queries from
       documenters/linguists over nearly 2 years




27
     What and how to archive




29
      What can you archive (at ELAR)?

      media - sound, video
      graphics - images, scans
      text - fieldnotes, grammars, description,
       analysis
      structured data - aligned and annotated
       transcriptions, databases, lexica
      metadata - structured, standardised
       contextual information about the materials


30
      Archive objects

      informed by traditions, eg document archives
      sometimes called “resources”, bundles
      it could be a file, a set of files, a directory, a
       “session” or a coherent item with many parts
      should have archival qualities eg Bird & Simons
       “7 Dimensions” (or see Thieberger in LDD2)
      may impose standard structures or formats
      need deposit event and processes
           legal and protocol
           verification
           accession
           ongoing processes

31
      Archive objects should be selected

      example: video: How much volume
       allocated?
      answer: ...

      however, e.g.:
        unlikely that linguist is in position to plan and
         consistently create excellent video, so selection
         is unavoidable


      data has always been selected!
32
      (... selection)

      in your typical work you also:
        selected
        labeled
        transformed/processed/edited
        added, corrected, expanded
        made links
        made or assumed relationships between
         “whole” and processed units; invented labels,
         IDs, scope etc
        imposed formats

33
      Data portability

      Bird and Simons 2003:

       (for language documentation) our data
       should have integrity, flexibility, longevity
       and utility




34
         Data portability

        complete
        explicit
        documented
        preservable
        transferable
        accessible
        adaptable
        not technology-specific
        (also appropriate, accurate, useful etc!!)

35
      Formats - media - preferred

      sound - WAV
      image - BMP, TIFF, JPEG
      video - MPEG2




36
      Formats - documents - preferred

      plain text, with or without markup
      PDF (PDF/A)
      XML, other systematic markup (with description of
       markup system)
      well-structured documents in common Office
       formats - ELAR will eventually convert them to
       archive formats
      character encoding :
         preferred encoding is ASCII or Unicode
         clearly document any other encodings used, e.g. ISO
          8859-5
         discuss with us if you use font substitution to handle non-
          Roman characters
37
      Formats - characters - preferred

      character encoding :
        ASCII or Unicode (UTF-8)
        you must clearly document any other encodings
         used, e.g. ISO 8859-9
        discuss with us if you use font substitution to
         handle non-Roman characters
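Why the encoding must be documented can be shown in a few lines. The sketch below, using only the Python standard library, checks whether bytes are valid UTF-8 and converts text from a documented legacy encoding; the function names and the Turkish sample word are illustrative assumptions.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(raw: bytes, documented_encoding: str) -> bytes:
    """Decode with the encoding the depositor documented, then re-encode as UTF-8."""
    return raw.decode(documented_encoding).encode("utf-8")


# A Turkish word stored in ISO 8859-9 (Latin-5).
legacy = "ağaç".encode("iso8859_9")

print(is_valid_utf8(legacy))                        # the legacy bytes are not UTF-8
print(to_utf8(legacy, "iso8859_9").decode("utf-8")) # correct when the encoding is known
print(legacy.decode("iso8859_1"))                   # a wrong guess silently changes a letter
```

The last line is the dangerous case: guessing ISO 8859-1 raises no error, it simply yields a different character, so the mistake can go unnoticed.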




38
      Filenames and directories

      characters [A-Z], [a-z], [0-9], underscore
       and a single full stop before the extension
      correct MIME extension
      favour lower case letters
      maximum length 30 characters
      maximum directory depth 8
      = ASCII only, no spaces
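These rules are easy to check automatically before sending a deposit. The following is a minimal sketch, not an ELAR tool. It assumes that the 30-character limit includes the extension, that directory names follow the same character rule as filenames, and that depth means the number of directories above the file; `filename_problems` is a hypothetical name.

```python
import re
from pathlib import PurePosixPath

# Allowed characters, then a single full stop, then the extension.
FILE_RE = re.compile(r"[A-Za-z0-9_]+\.[A-Za-z0-9]+")
DIR_RE = re.compile(r"[A-Za-z0-9_]+")
MAX_NAME_LENGTH = 30  # assumption: counted including the extension
MAX_DEPTH = 8         # assumption: number of directories above the file


def filename_problems(relative_path):
    """Return a list of rule violations for a path such as 'session01/rec_001.wav'."""
    *dirs, name = PurePosixPath(relative_path).parts
    problems = []
    if not FILE_RE.fullmatch(name):
        problems.append(f"disallowed characters or full stops in {name!r}")
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"{name!r} is longer than {MAX_NAME_LENGTH} characters")
    if len(dirs) > MAX_DEPTH:
        problems.append(f"directory depth {len(dirs)} exceeds {MAX_DEPTH}")
    problems += [f"disallowed characters in directory {d!r}"
                 for d in dirs if not DIR_RE.fullmatch(d)]
    return problems
```

The sketch does not check that the extension matches the file's real MIME type, and it treats the preference for lower case as advice rather than an error.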



39
      Semantics of filenames


      don’t stuff meaningful information into
       filenames - use metadata instead
      versions
      use directory structures wisely




40
                Data format duty cycle examples


                   Raw           Working            Interchange   Archive    Dissemination

     Video         DVI           software-specific  MPEG-2        MPEG-2     MPEG-2, AVI, QT
     Fieldnotes    Shoebox page  Shoebox            FOSF          XML        WWW, print dictionary
     Audio         ATRAC         WAV                WAV           BWF        MP3
     Complex data  multiple      FM Pro database    RTF, XML      XML        Interactive application
     Multi-modal   multiple      multiple           as above      as above   Multimedia application



41
     Evaluation and conversion examples




42
      Characters

      did my characters come through?
      answer: ...
      however:
        perhaps ELAR should do it?

                             há pa ki hená mázaska
                             wikcémna nú pa iyóphe-wa-ye kst DBW

                             wóz?az?a-s?ni yeló DB OK
                             wash things-NEG ASS.M
                             'he didn't do the wash'

                             wózaza-sni yeló DB OK
                             wash things-NEG ASS.M
                             'he didn't do the wash'
43
      Preservation

      Is my file preservable?


      Note:
          characters?
          inconsistent segmentation
          data as comments
          conventions/metadata

          Text transcription: “Korimáka”
          Language: Choguita Rarámuri
          Language used for transcription: Spanish
          Consultant: Luz Elena León Ramírez
          Linguist: Gabriela Caballero
          Transcription: Bertha Fuentes & Gabriela Caballero
          Date recorded: 11/02/2006
          Date tranbscribed: 11/02/2006
          Recording: rec6-LEL.wav

44
              Knowledge representation 1 - before



     wama momol chi naron mon chayako (LB) / wama momol chi naron chayako
     (MD)
     wama momol chi nan mon chayako (more emphatic(LB) / wama momol chi nan
     chayako (MD)
     Why don't you and him do it?
     + Notes have both of these sentences without the negator mon.

     OK runon naynangkroy ile ri
     He ate their sago.

     * kipin kannangkroy ngolu
     intended: We ate their cassowary.

     OK kipin kanangkroy ngolu
     We ate their cassowary.
45
                 Knowledge representation 1 - after



 * kipin kannangkroy ngolu
 intended: We ate their cassowary.

 OK kipin kanangkroy ngolu
 We ate their cassowary.

 <sentence.set num="75">
    <version>
       <walman>Kipin kannangkroy ngolu</walman>
       <judgement>*</judgement>
    </version>
    <english>We ate their cassowary. </english>
 </sentence.set>
 <sentence.set num="76">
    <version>
       <walman>Kipin kanangkroy ngolu</walman>
       <judgement>OK</judgement>
    </version>
    <english>We ate their cassowary.</english>
 </sentence.set>
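A conversion like the one above can be scripted once the conventions of the original notes are known. The sketch below assumes the notes come as pairs of lines (a judgement mark plus sentence, then a translation that may start with 'intended:'), which is only one of the patterns visible in the 'before' slide. The element names follow the slide; the wrapper element `sentences` and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_sentence_sets(lines, first_num=1):
    """Turn alternating 'judgement sentence' / 'translation' lines into XML."""
    root = ET.Element("sentences")  # hypothetical wrapper element
    pairs = zip(lines[0::2], lines[1::2])
    for num, (vernacular, english) in enumerate(pairs, start=first_num):
        # The first token is the judgement ('*' or 'OK'), the rest is the sentence.
        judgement, text = vernacular.split(maxsplit=1)
        sentence_set = ET.SubElement(root, "sentence.set", num=str(num))
        version = ET.SubElement(sentence_set, "version")
        ET.SubElement(version, "walman").text = text
        ET.SubElement(version, "judgement").text = judgement
        if english.startswith("intended:"):
            english = english[len("intended:"):]
        ET.SubElement(sentence_set, "english").text = english.strip()
    return root


notes = [
    "* kipin kannangkroy ngolu",
    "intended: We ate their cassowary.",
    "OK kipin kanangkroy ngolu",
    "We ate their cassowary.",
]
print(ET.tostring(to_sentence_sets(notes, first_num=75), encoding="unicode"))
```

Real fieldnotes are rarely this regular (the 'before' slide also has speaker initials, alternatives separated by '/', and free-text remarks), so a script like this usually handles the bulk and flags the rest for manual work.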
46
             Knowledge representation 2

            avoid generic software “convert to XML”
     <?xml version="1.0" encoding="UTF-8"?>
     <FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
       <PRODUCT BUILD="06/26/2002" NAME="FileMaker Pro" VERSION="6.0v2"/>
       <DATABASE DATEFORMAT="M/d/yyyy" LAYOUT="" NAME="Videos"
              RECORDS="13" TIMEFORMAT="h:mm:ss a"/>
       <METADATA>
           <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Index name" TYPE="TEXT"/>
           <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Image desc" TYPE="TEXT"/>
           <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Date" TYPE="TEXT"/>
           <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Content" TYPE="TEXT"/>
       </METADATA>
       <RESULTSET FOUND="13">
           <ROW MODID="16" RECORDID="40">
              <COL><DATA>Morly Beeta</DATA></COL>
              <COL><DATA>Interview with Morly Beeta</DATA></COL>
              <COL><DATA>Jan/13/05</DATA></COL>
              <COL><DATA>Obu history by Morly Beeta</DATA></COL>
           </ROW>
47
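The weakness of this generic export is that field names and values are linked only by column position. A minimal sketch of re-attaching them with the Python standard library follows; `SAMPLE` is a shortened copy of the slide's fragment with closing tags added so that it parses, and `rows_as_records` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = {"fmp": "http://www.filemaker.com/fmpxmlresult"}

# Shortened from the slide; the closing tags are added here so the sample is well-formed.
SAMPLE = """<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
  <METADATA>
    <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Index name" TYPE="TEXT"/>
    <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Image desc" TYPE="TEXT"/>
    <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Date" TYPE="TEXT"/>
    <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Content" TYPE="TEXT"/>
  </METADATA>
  <RESULTSET FOUND="1">
    <ROW MODID="16" RECORDID="40">
      <COL><DATA>Morly Beeta</DATA></COL>
      <COL><DATA>Interview with Morly Beeta</DATA></COL>
      <COL><DATA>Jan/13/05</DATA></COL>
      <COL><DATA>Obu history by Morly Beeta</DATA></COL>
    </ROW>
  </RESULTSET>
</FMPXMLRESULT>"""


def rows_as_records(xml_text):
    """Pair each column value with its field name, by position."""
    root = ET.fromstring(xml_text)
    names = [f.get("NAME") for f in root.findall("fmp:METADATA/fmp:FIELD", NS)]
    records = []
    for row in root.findall("fmp:RESULTSET/fmp:ROW", NS):
        values = [col.findtext("fmp:DATA", default="", namespaces=NS)
                  for col in row.findall("fmp:COL", NS)]
        records.append(dict(zip(names, values)))
    return records


print(rows_as_records(SAMPLE))
```

Even after this step the values remain untyped strings (the date is still 'Jan/13/05'), which is part of why a purpose-designed representation is preferable to a generic export.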
       ELAR conversion - original

     Language                    Unangam Tunuu [Aleut Language]
     Dialects                    Qawalangin [Eastern Aleut]
                                 Nii}u}i{ [Western Aleut]
     Speakers                    Maria Turnpaugh, Nick Lekanoff, Clara Golodoff
     Place recorded              Unalaska, AK. Ray Hudson Room, Unalaska Public Library.
     Date recorded               7.21.04
     Recording name              UNAK2trk1
     Duration                    16:21 min.
     Recorded by                 Alice Taff
     Recording equipment         Marantz CDR 300 recorder with one flat filtered table-mounted cardiod
                                 microphone. Also audio/video miniDV - Canon GL2.
     Translated by               Alice Taff with Maria Turnpaugh 000-493sec. Millie Prokopeuff 455-499sec.
     Transcribed by              Alice Taff
     Reviewed and corrected by   Moses Dirks

     129        ET   Kamagala, afternoon
                     afternoon

     135        CG   Aang
                     yes

     136        ET   Sla{chxisaada{, ii? Nice weather.
                     nice weather

     140        CG   Yeah. Maku{
                     that's all right

     143        ET   Alqutaadaltxichin? How are you?
                     How are you all?

48
          ELAR conversion - XHTML


     <?xml version="1.0" encoding="UTF-8"?>
     <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
     <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
     <head><title>ANC14trk1</title>
     <link href="taff.css" type="text/css" rel="stylesheet"></link></head><body>
     <table class="metadata">
     <tr><td>Language</td><td class="language">Unangax̌ (Aleut)</td></tr>
     <tr><td>Dialect</td><td class="dialect">Niiĝuĝix̌ (Western Aleut)</td></tr>
     <tr><td>Speakers</td><td class="speaker">Alice Petrivelli, Vera Snigaroff, Mary Snigaroff, Vivian Koenig</td></tr>
     <tr><td>Place recorded</td><td class="place">Anchorage, Alaska </td></tr>
     <tr><td>Date recorded</td><td class="date">Mar. 15, 2005</td></tr>
     <tr><td>Recording name</td><td class="rec_name">ANC14trk1</td></tr>
     <tr><td>Recorded by</td><td class="rec_by">Alice Taff, Piama Oleyer</td></tr>
     <tr><td>Recording equipment</td><td class="rec_equip">Marantz CDR300 CD recorder with one flat-filtered, table-mounted cardioid microphone. </td></tr>
     <tr><td>Translated/Transcribed by</td><td>Simeon L. Snigaroff, December 2005</td></tr>
     </table>
49
           ELAR conversion - XHTML


     <table class="transcript">
     <tr><td class="time">1</td><td class="speaker">ap</td><td class="transcription">Uqlaĝiix̌, x̌aayax̌, uqlaĝil agach aliguutax̌ ax̌.</td></tr>
     <tr><td>&nbsp;</td><td>&nbsp;</td><td class="translation">To take a bath, Steam bath, to take a bath is the one that is Aleut</td></tr>
     <tr><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
     <tr><td class="time">5</td><td class="speaker">vs</td><td class="transcription">uhmm</td></tr>
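Rows in this shape can be generated from structured transcript data rather than typed by hand. The sketch below is an illustration only and not the script ELAR used: it assumes each utterance is a (time, speaker, transcription, translation) tuple, reuses the class names from the slide, and omits the blank spacer rows.

```python
from html import escape


def transcript_rows(utterances):
    """Build an XHTML transcript table from (time, speaker, transcription, translation) tuples."""
    rows = []
    for time, speaker, transcription, translation in utterances:
        rows.append(
            f'<tr><td class="time">{escape(str(time))}</td>'
            f'<td class="speaker">{escape(speaker)}</td>'
            f'<td class="transcription">{escape(transcription)}</td></tr>'
        )
        if translation:
            # The translation goes in its own row, under the transcription column.
            rows.append(
                '<tr><td>&nbsp;</td><td>&nbsp;</td>'
                f'<td class="translation">{escape(translation)}</td></tr>'
            )
    return '<table class="transcript">\n' + "\n".join(rows) + "\n</table>"


print(transcript_rows([(5, "vs", "uhmm", "")]))
```

Escaping every text value matters because transcriptions may contain characters such as '<' or '&' that would otherwise break the markup.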




50
     ELAR conversion - in browser


      Language                          Unangax̌ (Aleut)
      Dialect                           Niiĝuĝix̌ (Western Aleut)
      Speakers                          Alice Petrivelli, Vera Snigaroff, Mary
                                        Snigaroff, Vivian Koenig
      Place recorded                    Anchorage, Alaska
      Date recorded                     Mar. 15, 2005
      Recording name                    ANC14trk1
      Recorded by                       Alice Taff, Piama Oleyer
      Recording equipment               Marantz CDR300 CD recorder with
                                        one flat-filtered, table-mounted
                                        cardioid microphone.
      Translated/Transcribed by         Simeon L. Snigaroff, December 2005

      1 ap    Uqlaĝiix̌, x̌aayax̌, uqlaĝil agach aliguutax̌ ax̌.
              To take a bath, Steam bath, to take a bath is the one that is Aleut



51
      Delivery of materials

      mostly we expect to receive copies on
       computer-readable media such as hard disks
       or CD/DVD
      DVDs seem consistently unreliable
      some digitisation of media may be possible




52
     Protocol




53
      Protocol

      sensitivities, restrictions: identification,
       description and implementation




54
      Protocol grows naturally with documentation

      focus on recorded data » more people, more
       genres, less researcher knowledge
      focus on revitalisation » which language to teach?
       who to host and teach? who can learn? etc
      community participation » framework for speakers
       to shape documentation process and products
      mobilisation » selecting, juxtaposing; community
       participation
      time » significance and sensitivities change over
       time
      access » increasing scope for dissemination,
       control of IP
55
      ELAR Deposit Form “Section C”

      ELAR pays careful attention to any
       sensitivities or restrictions that apply to any
       part of your deposit. There are four ways
       that Access Protocol is implemented:
         you define permissions for the whole deposit or
          for individual files (or parts of files)
         we provide defaults to protect your data if you
          do not define permissions
         you/we keep permissions up to date
         you list other rights holders

56
       ELAR Deposit Form “Section C”

     P1. Anyone                                                                           [ ]
         Any person may view/listen to or receive a digital copy of any part of the deposit
     P2. Certain people or groups
         Choose any combination of P2A, P2B, and P2C:
         P2A. Research community members
                What level of access (choose one only)?
                   P2A1. They can receive a digital copy of requested material          [ ]
                   P2A2. They can view/listen but cannot receive a digital copy         [ ]
         P2B. Language community members
                See below regarding identifying members
                What level of access (choose one only)?
                   P2B1. They can receive a digital copy of requested material          [ ]
                   P2B2. They can view/listen but cannot receive a digital copy         [ ]
         P2C. Particular named people or bodies                                         [ ]
                See below regarding identifying people/bodies
     P3. Depositor is asked permission for each request
                You will be contacted and asked for permission on each request.
                How do you want to be contacted?
                   P3A. Requester is given address to contact you directly              [ ]
                   P3B. ELAR will relay requests to you                                 [ ]
     P4. Only the depositor has access                                                   [ ]
                Persons other than the depositor will not be able to request access.
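The choices on this form can be read as a small decision procedure. The sketch below is one possible reading, not ELAR's implementation: it assumes the depositor always has full access, that a requester who qualifies under several P2 options gets the most permissive one, and that named people or bodies (P2C) each carry their own access level, which the form does not specify. All names in the code are hypothetical.

```python
from dataclasses import dataclass, field

RANK = {"view": 1, "copy": 2}  # "copy" is more permissive than "view"


@dataclass
class AccessProtocol:
    level: str                 # "P1", "P2", "P3" or "P4"
    research: str = ""         # P2A: "copy" (P2A1), "view" (P2A2), "" if not chosen
    community: str = ""        # P2B: "copy" (P2B1), "view" (P2B2), "" if not chosen
    named: dict = field(default_factory=dict)  # P2C: name -> "copy" or "view" (assumption)


def decide(protocol, requester, groups=(), is_depositor=False):
    """Return 'copy', 'view', 'ask depositor' or 'deny' for one request."""
    if is_depositor or protocol.level == "P1":
        return "copy"
    if protocol.level == "P3":
        return "ask depositor"
    if protocol.level == "P4":
        return "deny"
    # P2: collect every access level this requester qualifies for.
    granted = []
    if "research" in groups and protocol.research:
        granted.append(protocol.research)
    if "community" in groups and protocol.community:
        granted.append(protocol.community)
    if requester in protocol.named:
        granted.append(protocol.named[requester])
    return max(granted, key=RANK.get, default="deny")
```

Deciding who counts as a member of the research or language community (the `groups` argument) is the hard part in practice, and the form's M1 to M3 options exist to settle that question.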

57
       ELAR Deposit Form “Section C”

     Identifying people/bodies
        If you chose P2B or P2C, tell us how ELAR should determine who is a
        member of a group (e.g. language community, educational body). Choose
        one of the following:
        M1. You tell ELAR how to determine membership (tell us in Part D) [ ]
        M2. ELAR will ask you on each occasion                            [ ]
        M3. ELAR will make a judgement about membership                   [ ]
        If you chose P2C, then list the names of the people or bodies in Part D.
     Contacting you
        If you choose P3A or P3B, you will be able to decide about each particular
        request. If the choice is P3A, we will send your address to the requester, who
        can then ask you directly for permission. You then send us your decision. If
        the choice is P3B, ELAR will act as an intermediary, and pass on the request
        to you, so that your privacy is maintained. However, if you chose one of P3A
        or P3B and you (or your delegate) are not contactable, ELAR will need to
        make the decision or change the access permissions.
        Similarly, if we need to contact you to ask about group membership, and you
        (or your delegate) are not contactable, we will need to make the decision or
        change the access permissions.



58
      Other

      deposit, file or object-level protocol
      depositor-oriented
      we will provide means to change/manage
       protocol
      delegate
      other rights holders
      sunset clause



59
     Metadata




60
      Metadata

      Metadata
         the data about data that enables the
          management, identification, retrieval and
          understanding of that data
         reflects the knowledge and practice of
          data providers
         defines and constrains audiences and
          usages for data
      documentation’s data orientation heightens
       the importance of metadata
61
      Metadata

      ELAR metadata set =
        selection from IMDI*, OLAC*, EAD, TEI
        ELAR-specific (e.g. protocol, geographical)
        depositor metadata
       * ie. a set of metadata elements that maps onto both IMDI and OLAC




         Archive
           Deposit
             ELAR metadata set
             Your metadata
             All other files

62
      Types of metadata


      depositor's / delegates' details
      descriptive metadata
      administrative metadata
         preservation metadata
      access protocols
      metadata for individual files




63
         Depositors and delegates


        name
        address
        contact details (telephone, fax, email, URL)
        role
        affiliation
        date of birth
        nationality



64
         Descriptive metadata


        title, description, subject, summary
        keywords
        subject Language, Community
        location
        time span




65
      Administrative metadata


      project details
         funding and hosting institutions
      details of external copies
      modifications and status
      details of accession agreement
         cf. deposit form




66
         Preservation metadata


        carrier media
        formats, size
        provenance (source)
        access
           access protocols (see elsewhere)
           group membership identification




67
      File-level metadata

      media files
         duration, file size
         MIME type, content type
      text files
         font, character set, encoding
         format, markup
      metadata files
         schema
         scope
         validity
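Some file-level metadata for media files can be read from the files themselves. A minimal sketch for WAV audio using only the Python standard library follows; the function name and the dictionary keys are hypothetical and do not reflect ELAR's metadata schema.

```python
import io
import mimetypes
import wave


def wav_file_metadata(name, data):
    """Collect basic file-level metadata for a WAV file given its name and bytes."""
    mime_type, _encoding = mimetypes.guess_type(name)  # guessed from the extension only
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "name": name,
            "file_size": len(data),             # bytes
            "mime_type": mime_type,
            "duration_s": frames / rate,        # seconds
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
        }
```

For a file on disk this could be called as `wav_file_metadata(path.name, path.read_bytes())` with a `pathlib.Path`. Content type, and the font and encoding details for text files, cannot be derived this way and still have to come from the depositor.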
68
      Metadata formats

      common or standard:
        IMDI (‘ISLE Meta Data Initiative’, from DoBeS)
        OLAC (Open Language Archives Community)
        EAD, and others
      ELAR: has created its own set, currently in
       implementation
        deposit-scope metadata in deposit form
        file level metadata (will be) by web form
        also, depositor’s own metadata


69
      Metadata formats

      each depositor can also have different
       metadata!
      our goal: to maximise the amount and
       quality of metadata
      quality and extent are more important than
       standards and comparability
      many depositors are sending extensive
       metadata in a variety of formats including
       spreadsheets - see examples

70
      What’s missing from metadata?

      pedagogy has typically been left out of the
       documentation agenda
      linguists are better at problematising
       languages than teaching them
      we should mobilise informed, effective and
       accountable pedagogy
      a Hippocratic imperative




71
      Relationships

      relationships between documenters/
       documentation and pedagogy
         nonexistent/poor cousin
         by-product

        documentation is a vector of language
         transmission!



72
         Who could be documenters?

        community members
        audio recordists
        videographers (documentary filmmakers)
        educators
        ethnobotanists
        anthropologists
        computer experts
        activists, missionaries
        linguists
73
         Multipurpose documentation?

      linguists of various specialisations
      anthropologists, historians, botanists ...
      do any have priority?
      who are documentation’s main
       beneficiaries?
      can we tell?




74
       ... yes ...

      Metadata
        the data about data that enables the
         management, identification, retrieval and
         understanding of that data
        reflects the knowledge and practice of
         data providers
        defines and constrains audiences and
         usages for data


75
      The key is metadata


      examples: IMDI, tiered morphological
       glossing etc
      standard (or “best practice”) metadata is
       strongly oriented to descriptive linguistics
       and typology (“aggregators”)
      How could metadata serve pedagogy?




76
         Pedagogically oriented metadata


         demarcation, names and descriptions of
          socially/culturally relevant events such as songs
          (great interest to community members, and
          valuable teaching materials)
             should enormous amounts of time be spent providing
             morpheme-by-morpheme glosses if we cannot simply
             retrieve a song?

         phenomena that provide learning domains, such
          as “numbers”, “kinship”, “greetings”, “tense”
         socially important phenomena such as register,
          code switching

77
      Pedagogically oriented metadata

      notes on learner levels
      links to associated materials that have
       explanations, examples
      notes on the previous selection and use of
       material for teaching
      notes on how to use the material for teaching
      notes and warnings about restricted materials or
       materials which are inappropriate for young or
       certain classes of people (e.g. profane, archaic etc)
      and of course easily findable basic information
       such as name of language or variety, speaker,
       gender, speaker’s country etc
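
A minimal sketch of how the pedagogically oriented fields listed above might sit in one record per demarcated segment, so that "simply retrieving a song" becomes a one-line query. The field names, the `SegmentRecord` class, the `find_segments` helper and all sample values are hypothetical illustrations, not an ELAR or IMDI schema.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentRecord:
    """One demarcated stretch of a recording (hypothetical field names)."""
    language: str
    speaker: str
    event_type: str            # e.g. "song", "greeting", "elicitation"
    title: str
    start_s: float             # start time within the recording, in seconds
    end_s: float
    learning_domains: list = field(default_factory=list)  # e.g. ["kinship"]
    learner_level: str = ""    # e.g. "beginner"
    teaching_notes: str = ""   # how the material has been or could be used
    restricted: bool = False   # True if the audience for the material is limited


def find_segments(records, event_type=None, domain=None, include_restricted=False):
    """Return records matching an event type and/or a learning domain."""
    hits = []
    for r in records:
        if r.restricted and not include_restricted:
            continue  # respect warnings about restricted materials by default
        if event_type is not None and r.event_type != event_type:
            continue
        if domain is not None and domain not in r.learning_domains:
            continue
        hits.append(r)
    return hits


# Placeholder data, invented for illustration only.
records = [
    SegmentRecord("Example language", "Speaker A", "song", "Example song",
                  0.0, 90.0, ["kinship"], "beginner",
                  "Sing along, then discuss the kin terms"),
    SegmentRecord("Example language", "Speaker B", "elicitation", "Counting to ten",
                  90.0, 140.0, ["numbers"], "beginner"),
    SegmentRecord("Example language", "Speaker A", "song", "Restricted song",
                  140.0, 200.0, restricted=True),
]

for r in find_segments(records, event_type="song"):
    print(r.title, r.start_s, r.end_s)
```

The restricted flag is checked first so that a default query never returns material carrying a warning; a teacher with the right permissions would pass `include_restricted=True`.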

78
     Evaluating audio




79
      Dobbin

      software for audio evaluation, processing
       and reporting
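
The slides do not say which checks Dobbin performs. As a generic illustration of what "audio evaluation and reporting" can involve, the sketch below reads a 16-bit PCM WAV file and reports sample rate, duration, peak level and the number of samples at full scale. The function names, the 44100 Hz threshold and the test tone are assumptions made for this example, not Dobbin's behaviour.

```python
import io
import math
import struct
import wave


def evaluate_wav(fileobj, min_rate=44100, clip_level=32767):
    """Report basic properties of a 16-bit PCM WAV file (generic sketch)."""
    with wave.open(fileobj, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        width = w.getsampwidth()
        n_frames = w.getnframes()
        raw = w.readframes(n_frames)
    if width != 2:
        raise ValueError("this sketch only handles 16-bit PCM")
    # Unpack little-endian signed 16-bit samples (all channels interleaved).
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    peak = max((abs(s) for s in samples), default=0)
    clipped = sum(1 for s in samples if abs(s) >= clip_level)
    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": n_frames / rate,
        "peak": peak,
        "clipped_samples": clipped,
        "rate_ok": rate >= min_rate,   # assumed minimum, not an ELAR rule
    }


def make_tone(rate=44100, seconds=1.0, amplitude=16000):
    """Build an in-memory mono 440 Hz test tone so the example is self-contained."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        n = int(rate * seconds)
        frames = b"".join(
            struct.pack("<h", int(amplitude * math.sin(2 * math.pi * 440 * i / rate)))
            for i in range(n)
        )
        w.writeframes(frames)
    buf.seek(0)
    return buf


report = evaluate_wav(make_tone())
for key, value in report.items():
    print(key, value)
```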




80
     Dobbin




81
     Dobbin




82
     Dobbin




83
     Dobbin




84
     Dobbin




85
     Dobbin




86
     Archives and revitalisation




87
      Keeping ‘means of transmission’ alive

      Romaine: co-ordinated efforts at
       revitalisation mean that institutions
       increasingly become the vector of
       language transmission, cf intergenerational
       transmission (Fishman)
      at the limit, documentations, and archives
       that foster, preserve, and disseminate
       them, become the means of transmission



88
      Archives and revitalisation

      Penfield: toward a theory of documentation
           collaborative efforts
           onsite training
           document for revitalisation
           community-based protocols for the use of
            materials
      these have implications for the lifecycle of
       ‘data’




89
     Archivism




90
      What have we missed?

      Woodbury: most developments are "what's been
       happening around the emergence of a
       documentary linguistics", particularly technology,
       which has raised expectations more than changed
       practices




91
      What have we missed?

      Contact with wisdom and
       experience of established
       fields e.g.
         radio/broadcasting (eg mics,
          MD)
         cinematography (eg quality
          and specialisation)
         journalism (eg equipment
          handling)
         audio archives (linguists had
          input to IASA before 80s
          or so)


92
      What did we get?

      advice about formats, parameters, what to
       avoid
      'silver bullet' equipment and formats
      fundamentalism and format wars




93
      Archivism

      Archivism: capitulation of language documenters
       to the agenda and priorities of archives and
       information technology
      why did this happen?
         for historical reasons
         rapid changes in technology
         we left a vacuum




94
     Mobilisation




95
      Mobilisation

      use of documentation resources to make
       relevant, useful, effective resources for
       language support and revitalisation




96
      Gamilaraay/Yuwaalaraay song player

      uses ‘familiar’ data such as from Shoebox,
       Transcriber
      adds genre, functionalities, design etc




97
     Song player data

     <?xml version="1.0"
     encoding="ISO-8859-1"?>
     <!DOCTYPE Trans SYSTEM
     "trans-14.dtd">
     <Trans scribe="elar"
     filename="YugalTrack33"
     version="1"
     version_date="050608">
     <Episode>
     <Section type="report"
     startTime="0"
     endTime="87.445">
     <Turn startTime="0"
     endTime="87.445">
     <Sync time="0"/>
     \newsong14 [track33] music
     <Sync time="2.588"/>
     verse 1 line1
     <Sync time="5.619"/>
     verse 1 line2
     <Sync time="8.339"/>
98
     Song player data

     \song 34 [track28]
     \ti Gugan gaaynggul /Brown-skin baby
     \co Words and music: (c) Bob Randall
     \s Roger Knox
     \ln Gamilaraay

     \verse1
     Dhayndalmuu ngaya dhurriyawaanhi
     dhayndalmuu ngaya dhurriya-y -waa-y -nhi
     priest      I     ride,      -moving -Past
     s20148      m1590 m721       -m1733 -m1699
     As a preacher I used to ride

     Yarraamanda         binaal     nhama      wagibaaga.
     yarraaman -ga       binaal     nhama      wagibaa -ga
     horse     -in,at,on peaceful   that,the   plain    -in,at,on
     m2020     -m755     m244       m1686      s20467   -m755
     A quiet horse on the plains.

     Walaaybaaga               gamila ngaya muurr gigi
     walaay -baa      -ga      gamila ngaya muurr gi-gi       -gi
99
       Song player data

       Chunking data:

         verses etc: [2,4,6,8,10,12,14,16,18,20,22,24]

         labels: [1:"Verse 1", 3:"Chorus", 4:"Verse 2",
          6:"Chorus", 7:"Verse 3", 9:"Chorus", 10:"Verse
          4", 12:"Chorus"]
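
A minimal sketch of how timed lines from a Transcriber-style file could be combined with chunking data like the above. It rests on two assumptions that the slides do not confirm: that the "verses etc" list gives the number of the last line in each chunk, and that a label applies from its chunk number until the next label. The sample XML is a shortened, closed-off miniature with placeholder line texts, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Miniature Transcriber-style sample, closed off so that it parses.
# The line texts are placeholders, not the real song.
SAMPLE = """<Trans><Episode><Section type="report" startTime="0" endTime="12.0">
<Turn startTime="0" endTime="12.0">
<Sync time="0"/>
verse 1 line1
<Sync time="2.588"/>
verse 1 line2
<Sync time="5.619"/>
chorus line1
<Sync time="8.339"/>
chorus line2
</Turn></Section></Episode></Trans>"""


def read_timed_lines(xml_text):
    """Return [(start_seconds, text), ...], one entry per <Sync/> element."""
    root = ET.fromstring(xml_text)
    lines = []
    for sync in root.iter("Sync"):
        # The transcribed text follows the empty <Sync/> element as its tail.
        text = (sync.tail or "").strip()
        lines.append((float(sync.get("time")), text))
    return lines


def chunk_lines(lines, boundaries, labels):
    """Group timed lines into labelled chunks.

    boundaries: 1-based number of the last line in each chunk (assumed reading).
    labels: chunk number -> label; an unlabelled chunk keeps the previous
            label (assumed reading).
    """
    chunks = []
    start = 0
    current_label = None
    for chunk_no, end in enumerate(boundaries, start=1):
        current_label = labels.get(chunk_no, current_label)
        members = lines[start:end]
        if members:
            chunks.append({
                "label": current_label,
                "start": members[0][0],          # where playback would begin
                "lines": [text for _, text in members],
            })
        start = end
    return chunks


timed = read_timed_lines(SAMPLE)
chunks = chunk_lines(timed, boundaries=[2, 4], labels={1: "Verse 1", 2: "Chorus"})
for c in chunks:
    print(c["label"], c["start"], c["lines"])
```

Keeping the chunking separate from the transcription means the 'familiar' data stays untouched, while the player gets the extra structure it needs to jump to a verse or chorus.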




                                    Play it
100
       Other examples of ‘mobilisation’

       Simple or conventional games etc can take
        on new significance

         Memory game play
         Crossword play




101
       Video in documentation and archiving

       “Questioning the role of video in
         language documentation & archiving:
          is a moving picture worth 1,000 texts?”




102
       The rise and rise of video

       increase in claims about video
       rise from about 25% to 75% of ELDP
        applicants
       funders have been demanding that some
        applicants make video




103
       One size fits all?

       Himmelmann:
        the core of a language documentation, then, is
          constituted by a comprehensive and
          representative sample of communicative events
          as natural as possible. Given the holistic view
          of linguistic behaviour, the ideal recording
          device is video recording.




104
       Goals and methodology of documentation

       cultural and cognitive aspects can be documented
        or augmented by video (examples from Harrison)
          counting methods/systems
          locative expressions
          behaviours or appearances of plants, animals etc that are
           described as part of language-encoded knowledge:
            • information about plant toxicity and preparation could
              usefully be videoed
            • swimming formations (eg Marovo people of Solomon
              Islands, who have a rich set of terms for fish behaviour and its
              relationships to the calendar and hunting)
            • Gila Pima (Arizona) name a plum tree "dog's testicles", and
              an edible banana "looks like an erection" (umm, what will
              the videos show?)
           However, David Crystal estimates that such
           culturally/environmentally specific aspects are only about 10%
           of any language’s content

105
       Goals and methodology of documentation

       discourse and genre
          distinguishing participants (McConvell)
          transparently capturing “stories” (Wittenburg)
       adding or enhancing methodology
            stimulus materials
            the camera adds theatricality (Jukes)
            the camera as a participant (Atkins)
            enhance transcription through motivating community
             participation
       sign language work
          treat video as inscription
          cameras, lighting, orientation, clothing etc
       appreciated by communities
106
       Goals and methodology of documentation

       documentation can’t aim to capture everything
        (Austin)
       and the video camera cannot either!
       argument for accountability has caused confusion
        between events and recordings. Result: fantasy
        that video is what happened and provides
        empirical evidence for all kinds of claims
       argument:
          video can do X => we should do video
          fails without goals and methodology for X
       many pro-video arguments could be equally
        applied to capturing other phenomena:
          e.g. palatography
          collecting other text-based metadata eg on social setting

107
       Goals and methodology of documentation

       there must be different methodologies
        (linguistic AND video) for different
        purposes (cf. sign)
       Himmelmann:
        [each potential discipline’s usages] influence the
          recording and presentation of the data
          inasmuch as certain kinds of information are
          indispensable for a given analytical procedure
          (no phonetic analysis is possible without some
          high-quality sound recording, no analysis of
          gestures is possible without videotaping, etc.)

108
       Goals and methodology of documentation

       so if there are distinct methodologies for
        different purposes
          how adequate could a generic video be?
          how can video serve purposes that
           documenters don’t have?




109
       Goals and methodology of documentation

       explicit claimed purposes for video:
         in ELDP applications, many applicants request
          funds for video equipment but have no video-
          related documentation goals
                  and
         video exponents describe the potential of video
          but few documenters actually have these goals




110
       Goals and methodology of documentation

       many phenomena can't be represented on
        video:
         complex family structures and their
          terminologies
         changes in moon shape and phase (better as
          still photos or diagrams); other calendric and
          geographic expressions
         time and distance eg Tofa (Siberia) have words
          for the distance you can cover in a day on
          reindeer back
         morphological, grammatical and most lexical
          information
         (also relationships, staging, motivations,
          histories...)
111
       Video: a community oriented technology

       video is good for:
         community oriented content
         community involvement
         members will best know what/how to shoot
         skills transfer
         creating directly usable materials, including for
          revitalisation
         why should a linguist shoot video at all?




112
       Video workflow and workload

      a disorder of magnitudes ...
       skills, workload, intrusion, volumes - all
        increase by orders of magnitude
            skills - equipment, shooting, editing, production
            equipment - choice, usage, maintenance
            power supplies
            capturing, conversion
            annotation
            editing, production
            data volumes


113
       Workflow and workload

       annotation:
         could easily involve a time ratio of up to 100 (1
          hour of video may take 100 hours to process)
         in practice, most documenters do not annotate
          the phenomena that they did (or didn’t) identify
         fallacy that annotation etc can be done later
           • video amplifies the value of event-participant
             knowledge
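
A small worked estimate of what the time ratio above implies for a whole project. Only the ratio of up to 100 comes from the slide; the 20 hours of video and the 40-hour working week are assumptions chosen for illustration.

```python
def annotation_hours(video_hours, time_ratio=100):
    """Hours of processing implied by a given annotation time ratio."""
    return video_hours * time_ratio


def working_weeks(hours, hours_per_week=40):
    """Convert hours to working weeks (40-hour week is an assumption)."""
    return hours / hours_per_week


video_hours = 20  # hypothetical size of a modest video collection
total = annotation_hours(video_hours)
print(total, "hours, or about", working_weeks(total), "working weeks")
```

At the upper end of the ratio, even this modest collection would occupy one person for roughly a year, which is why annotation deferred to "later" tends not to happen.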




114
       Video: conclusions

       video can:
         add to the representational methods used by
          linguistics
         encourage us to look at diverse phenomena
         challenge our methodologies
         provide new and effective ways of
          disseminating language and cultural events and
          knowledge




115
       Video: conclusions

       video and multimedia
         little encouragement to produce multimedia
         multimedia:
            • distinguishes medium from mode of
               knowledge representation
            • richer and more explicit interleaving of
               various types of knowledge
            • imposes its costs in more appropriate areas




116
       Video: conclusions

       generic, amateur video fails to respect
        participants by not recognising linguistic
        specialisation, complexity or expertise to
        the same degree as “real” linguistic work
       naive video achieves “authenticity” mainly
        by not editing (and thereby not producing
        usable products!)




117
       Video: conclusions

       there is a lot of tradition in evaluating the
        descriptive value of linguistic work, but little
        in defining the documentation value of
        video
       if video really represents the claimed range
        of linguistic phenomena, it is a key mode of
        documentation: documenters (and their
        teachers) need to pay much closer
        attention to its methodologies!
       it is not clear that it is linguists who should
        be making video
118
      Conclusions




119
          Conclusion: we ask depositors to

       manage materials well
       collect and provide protocol information
       deliver materials, metadata
       send trial samples etc
       not withhold materials
       share/manage/delegate custodianship of
        materials
       maintain relationships with language
        stakeholders and ELAR

120
       Conclusion

       digital language archives combine
        traditional preservation with new ways of
        supporting creators and users of materials
       an archive can be more effective if
        materials are prepared as “portable”
       ultimately it is up to documenters to define
        what good documentation is
       ELAR welcomes you to discuss your
        archiving goals

121

				