

    Current design issues for digital archives


           Robert Munro
           (presented by David Nathan)

           Endangered Languages
           Archive (ELAR), School of
           Oriental and African Studies,
           London
    Outline

    1. Introduction
    2. Archive architectures
    3. Current Issues
      1. value-adding interaction from ‘end’ users
      2. flexibility in access to materials
      3. granularity of description of materials
    4. Conclusions


    Introduction – ELAR

    Part of the Hans Rausing Endangered
     Languages Project (HRELP).
    Open for deposits since October 2005.
    In the process of designing and implementing
     key systems.




    Introduction – ELAR

    ELAR will be the first language archive that allows
     users to:
      add metadata in the language of their choice
      add new metadata (comments, descriptions, links) to
       existing materials
      translate metadata into a language of their choice
      select language preference(s) for viewing existing
       metadata
      add metadata to archived materials at different levels of
       granularity
    Introduction – current issues

    ‘End’ users adding value to archive materials
      who will moderate such additions?
    Flexible support of access
      can an archive explicitly support multilingual users?
    Metadata – comments / description of materials:
      should the granularity of description be at the level of:
        files,
        collections of files,
        and/or subsections of a file?
    Archive architectures

    The classic ‘silo’ view of an archive:
        little more than disaster-proof backup




    [Diagram: Producers → Silo]
    Archive architectures

    The producers are not the only users:
      different dissemination formats are required…




    [Diagram: Producers → Silo → Dissemination]
    Archive architectures

    The producers are not the only users:
      different dissemination formats are required…
                             …for different user communities




    [Diagram: Producers → Silo → Dissemination → Designated communities]
    Archive architectures

    Working formats are not preservation formats:
        materials may need to be transformed on ingest




    [Diagram: Producers → Ingestion → Silo → Dissemination → Designated communities]
     Archive architectures

     You cannot rigidly preserve digital data:
         files need to be refreshed and migrated to current formats



     [Diagram: Producers → Ingestion → Archive → Dissemination → Designated communities]
     Archive architectures

     …but the objects, metadata and structures are
      still backed up in disaster-proof silos.



     [Diagram: Producers → Ingestion → Archive → Dissemination → Designated communities]
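Refreshing a file, as mentioned above, amounts to copying it to new storage and confirming that nothing changed on the way. The following is a minimal sketch of that check, assuming SHA-256 checksums are an acceptable test of identity; the function names are illustrative and do not describe ELAR's actual systems.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(src: Path, dst: Path) -> str:
    """Copy src to dst and confirm the copy is bit-identical.

    Returns the checksum so it can be stored alongside the object's metadata.
    """
    before = sha256(src)
    shutil.copyfile(src, dst)
    if sha256(dst) != before:
        raise IOError(f"refresh of {src} failed: checksum mismatch")
    return before
```

Migration to a new format is a different operation: the bytes change by design, so a checksum comparison like this one cannot verify it.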
     Archive architectures

     Archives need to define three types of ‘packages’:
         ingestion, archive and dissemination



     [Diagram: Producers → Ingestion → Archive → Dissemination → Designated communities]
     Ingestion (Accession) packages

     Formats & structures that can be converted to
      archive formats with minimal effort:
       open file formats
       well-documented structures: XML with a schema is ideal
     The content needs to take into account the many
      potential uses of the materials:
       high quality sound and video
       a variety of genres
       detailed metadata and structural information
     Dissemination packages

     Many potential users of archived materials:
       researchers
       speakers
       educators
       publishers
     With many different requirements:
       access to materials by various methods
       archive services
       continuation of ownership of language materials
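The flow between the three package types can be sketched in a few lines. This is an illustration only: the `Package` class, the function names and the format labels (`working-audio`, `preservation-audio`, `streaming-audio`) are hypothetical and are not taken from ELAR's design.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Package:
    kind: str                 # "ingestion", "archive" or "dissemination"
    files: Dict[str, str]     # file name -> format label
    metadata: Dict[str, str]


def convert(files: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Relabel each file's format; formats not in the mapping pass through unchanged."""
    return {name: mapping.get(fmt, fmt) for name, fmt in files.items()}


def ingest(pkg: Package, to_archive: Dict[str, str]) -> Package:
    """Turn an ingestion package into an archive package."""
    if pkg.kind != "ingestion":
        raise ValueError("expected an ingestion package")
    return Package("archive", convert(pkg.files, to_archive), dict(pkg.metadata))


def disseminate(pkg: Package, community: str, to_delivery: Dict[str, str]) -> Package:
    """Derive a dissemination package for one designated community."""
    if pkg.kind != "archive":
        raise ValueError("expected an archive package")
    metadata = dict(pkg.metadata, community=community)
    return Package("dissemination", convert(pkg.files, to_delivery), metadata)


# Hypothetical deposit: one recording in a working format.
deposit = Package("ingestion", {"story": "working-audio"}, {"title": "untitled"})
archived = ingest(deposit, {"working-audio": "preservation-audio"})
for_speakers = disseminate(archived, "speakers", {"preservation-audio": "streaming-audio"})
```

Keeping the archive package separate from what is delivered means one archived object can feed several dissemination packages, one per community, without being altered itself.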
     Current issues – value adding

     The current model is fairly uni-directional
         but users can/should add value to archive materials



     [Diagram: Producers → Ingestion → Archive → Dissemination → Designated communities]
     Current issues – value adding

     Users should be able to add to existing materials:
         speakers’ comments on content
         results of recent research


     [Diagram: Producers → Ingestion → Archive → Dissemination → Designated communities]
     Current issues – value adding

     The archive needs to trust certain users to add
      metadata to existing materials:
       should the identity of users be recorded / open?
       should users be able to challenge existing metadata?
     Who to trust?
       depositors cannot moderate all comments on objects,
        especially if comments can be in any language
       but can an archive deny a speaker’s request to add
        comments to a recording of them speaking?
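Whatever answers an archive gives to these questions, it helps to record who added each piece of metadata and what state it is in. The sketch below shows one possible record structure. The roles, the status values and the rule that depositors and speakers are accepted without moderation are all assumptions made for illustration; the questions above remain open policy choices.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"        # waiting for a moderator
    ACCEPTED = "accepted"      # visible to other users
    CHALLENGED = "challenged"  # another user disputes it


@dataclass
class Contribution:
    object_id: str   # the archived item the comment is about
    author: str      # recorded identity of the contributor
    role: str        # e.g. "depositor", "speaker", "researcher"
    language: str    # language the comment is written in
    text: str
    status: Status = Status.PENDING


# Assumed policy: these roles are trusted without moderation.
TRUSTED_ROLES = {"depositor", "speaker"}


def submit(contribution: Contribution) -> Contribution:
    """Accept trusted contributors immediately; leave the rest pending."""
    if contribution.role in TRUSTED_ROLES:
        contribution.status = Status.ACCEPTED
    return contribution


def challenge(contribution: Contribution) -> Contribution:
    """Mark existing metadata as disputed without deleting it."""
    contribution.status = Status.CHALLENGED
    return contribution


from_speaker = submit(Contribution("rec-01", "user-17", "speaker", "qu", "comment text"))
from_researcher = submit(Contribution("rec-01", "user-42", "researcher", "en", "comment text"))
```

Because a challenge changes only the status, the disputed comment and its author stay on record for a moderator to review.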

     Current issues – flexibility of access

     The archive cannot create different dissemination
      packages for every language and/or user:



     [Diagram: Producers → Ingestion → Archive → Dissemination → Designated communities]
     Current issues – flexibility of access

     Users should be able to personalize access:
         language preference(s) for metadata
         preference on type of materials


     [Diagram: Producers → Ingestion → Archive → Dissemination → Designated communities]
     Current issues – flexibility of access

     Flexibility of search / browse:
       keyword ‘search engine’ type search
       rich relationships between objects for browsing
       geographic searches
       research-community-specific search




     Current issues – flexibility of access

     Flexibility of language:
       most metadata in most archives is in English
       should metadata be multilingual?




     Current issues – flexibility of access

     If a user prefers to speak Quechua, then
      Spanish, then English:
       rather than accessing via one interface per
        language…


        Spanish interface:   pan
                             Fotografía tomada por Juan Pérez Martínez
                             Enero 2006
        OR
        English interface:   bread
                             Photograph by Juan Pérez Martínez
                             January 2006
        OR …

     Current issues – flexibility of access

     If a user prefers to speak Quechua, then
      Spanish, then English:
       …users should get all languages at once, according
        to availability of data and their preferences
          label in Quechua:          t’anta
          photographer in Spanish:   Fotografía tomada por Juan Pérez Martínez
          date in English:           January 2006
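The per-field fallback described here can be sketched as a lookup over the user's ordered language preferences. The record layout, the function names and the choice of which translations exist are assumptions for illustration. In particular, the date is assumed to be available only in English, as in the example above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record: each metadata field maps language codes to values.
photo: Dict[str, Dict[str, str]] = {
    "label": {"qu": "t’anta", "es": "pan", "en": "bread"},
    "photographer": {
        "es": "Fotografía tomada por Juan Pérez Martínez",
        "en": "Photograph by Juan Pérez Martínez",
    },
    "date": {"en": "January 2006"},
}


def pick(values: Dict[str, str], preferences: List[str]) -> Optional[Tuple[str, str]]:
    """Return (language, value) for the first preferred language that has data."""
    for lang in preferences:
        if lang in values:
            return lang, values[lang]
    return None  # nothing available in any preferred language


def view(record: Dict[str, Dict[str, str]], preferences: List[str]):
    """Resolve every field independently against the same preference list."""
    return {field: pick(values, preferences) for field, values in record.items()}


# Quechua first, then Spanish, then English.
display = view(photo, ["qu", "es", "en"])
```

Resolving each field on its own is what lets a single page mix languages: the label comes back in Quechua, the photographer line in Spanish and the date in English.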

     Current issues – granularity

     Archives tend to treat archived files as ‘atomic’
       metadata only refers to files as a whole
     What about
       a specific comment about a 20 second subsection of
        the file?
       a general comment applying to many files?




     Current issues – granularity

     For example, suppose we have an annotated
      sound recording of some event:




     Current issues – granularity

     Some metadata is about the file as a whole:
       date recorded, speakers, title




     Current issues – granularity

     Some metadata is about sub-segments:
       name of a significant person or place
       specific linguistic phenomena




     Current issues – granularity

     It is likely that users will want to:
        add comments to such subsections
        richly link subsections to other items
        make unambiguous reference to subsections
     At the time of deposit, no one can predict which
      subsections of files will later be significant:
        users need to be able to explicitly define subsections
         of archive objects
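One way to support all three levels of granularity is to let every note point at a target that names one or more objects and, optionally, a time range within them. The following is a sketch under that assumption; the class names, the identifiers `rec-01` and `rec-02`, and the use of seconds as the unit are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Target:
    """What a note is about: whole objects, or a time range within them."""
    object_ids: Tuple[str, ...]        # one file, or several for a collection-level note
    start_s: Optional[float] = None    # segment start in seconds; None means whole object
    end_s: Optional[float] = None      # segment end in seconds (exclusive)

    def __post_init__(self):
        if (self.start_s is None) != (self.end_s is None):
            raise ValueError("give both start_s and end_s, or neither")
        if self.start_s is not None and not self.start_s < self.end_s:
            raise ValueError("start_s must be before end_s")


@dataclass
class Note:
    target: Target
    text: str


def notes_at(notes: List[Note], object_id: str, time_s: float) -> List[Note]:
    """Notes applying to object_id at time_s: whole-object notes plus covering segments."""
    hits = []
    for note in notes:
        t = note.target
        if object_id not in t.object_ids:
            continue
        if t.start_s is None or t.start_s <= time_s < t.end_s:
            hits.append(note)
    return hits


notes = [
    Note(Target(("rec-01",)), "applies to the file as a whole"),
    Note(Target(("rec-01",), 40.0, 60.0), "applies to a 20 second subsection"),
    Note(Target(("rec-01", "rec-02")), "applies to many files"),
]
at_45s = notes_at(notes, "rec-01", 45.0)
```

Since a `Target` is created by whoever writes the note, subsections do not have to be fixed at the time of deposit, and the same structure gives other items something unambiguous to link to.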


     Conclusions

     Archives are not static repositories:
        an archive supports materials for multiple different
         user communities in parallel
     Value-adding interaction:
        archived materials can be further enriched by users
     Flexibility in access to materials:
        personalizable interaction with archive materials
     Granularity of description of materials:
        user-defined granularity of materials
     Thank you



