Introduction to Digital Archives

					                                                                a centre of expertise in data curation and preservation




       Introduction to Digital Archives
                                            Maureen Pennock

          EAOLUG Spring/Summer Meeting 2006

                                                                                                               Funded by:
          This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK:
          Scotland License. To view a copy of this license, visit
          http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ or send a letter to Creative Commons,
          543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.


EAOLUG :: RSC :: Cambridge                                                                                      23 May 2006




                    Today’s talk
    • The DCC
        • Background & Context
        • What We Do
    • Digital Archives & Archiving
        • Definitions
        • Main Issues
        • OAIS
        • Systems








     UK Digital Curation Centre
    • JISC Circular 6/03 called for bids in digital curation
    • JISC and the e-Science Core Programme funding
       • for development, services and outreach in digital
         curation
       • for a research programme
    • Impetus to action
       • Growth in e-Science activity and data creation
       • Recognition that continuing access to digital
         information is needed






                     Partners
    • University of Edinburgh (lead site)
        • Chris Rusbridge, Prof Peter Buneman
    • University of Glasgow - HATII
        • Prof Seamus Ross, Director of HATII and ERPANET
    • University of Bath - UKOLN
        • Dr Liz Lyon, Director of UKOLN
    • Council for the Central Laboratory of the
      Research Councils (CCLRC)
        • Dr David Giaretta






                         Objectives
    •   Lead a vibrant international research programme to improve quality in
        data curation and digital preservation
    •   Deliver effective, efficient and high-demand services
         • undertake evaluation of tools, methods, standards and policies
         • work with the community to establish registries of tools and
           technical information
    •   Create an active, innovative and collaborative Associates Network
    •   Connect communities
         • Universities and Research institutions
         • Scientific data and documents
         • International & cross-sector








                     Research
    •   Annotation in Databases
    •   Data archiving
    •   Socio-economic and legal issues
    •   Metadata extraction and curation
    •   Provenance and databases
    •   Data transformation, integration and publishing
    •   Security
    •   Supporting technologies
    •   Organisational and cultural challenges to digital
        curation






                  Development
    • DCC Approach to Digital Curation (white paper) –
      sets out the path for development activities:
        • Monitoring international standards
        • Development of a Representation Information
          Registry/Repository (DCC RIR)
        • Development of recommendations for tools and methods for
          generating Representation Information
        • Creating testbeds for digital curation tools
        • Creating auditing and certification processes for trusted
          repositories








                        Services
    • Information Services
        •   Community-developed Digital Curation Manual
        •   Briefing Papers & FAQs
        •   Technology Watch
        •   Case Studies
        •   Best Practice Checklists
    • Advisory Services
        • Events: information days, workshops, training, conferences
        • Helpdesk
    • Audit and Certification Services






                    Summary
    • Support and promote continuing improvement in the
      quality of data curation and preservation activity
    • Nurture strong community relationships between
      practitioners, researchers, and curators
    • Address digital curation from all aspects of the
      records life-cycle
    • Develop and promote curation knowledge, tools and
      techniques
    • Identify and research new organisational, technical,
      and supporting curation challenges





               Digital Curation
    • Digital curation is all about maintaining and adding
      value to a trusted body of digital information for
      current and future use; specifically, we mean the
      active management and appraisal of data over the
      life-cycle of scholarly and scientific materials.

    • Digital Curation brings a whole host of challenges
    • The range of stakeholders that affect the survival of
      digital material cuts across the whole life-cycle
    • Everyone plays an important role





                      Digital Archiving
    • Digital archiving is a curation activity
    • Ensures that
       • Data is properly selected
       • Data is properly stored
       • Data can be accessed
       • The logical and physical integrity of the data is
         maintained over time
       • Data is secure and authentic *

    * Lord & MacDonald, e-Science Data Curation Report, 2003
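
    One common way to check that the physical integrity of stored data is
    maintained over time is a fixity check: record a checksum when a file is
    stored, then recompute and compare it later. The sketch below is only an
    illustration under stated assumptions; the function names (sha256_of,
    record_fixity, verify_fixity) and the choice of SHA-256 are hypothetical
    and are not taken from the talk or from any DCC tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(paths):
    """Build a manifest mapping each file path to its checksum at storage time."""
    return {str(p): sha256_of(Path(p)) for p in paths}


def verify_fixity(manifest):
    """Return the paths that are missing or whose checksum no longer matches."""
    failures = []
    for name, expected in manifest.items():
        path = Path(name)
        if not path.is_file() or sha256_of(path) != expected:
            failures.append(name)
    return failures
```

    In this sketch an archive would call record_fixity once at storage time,
    keep the manifest alongside its other metadata, and run verify_fixity on a
    schedule; an empty list means no file has changed or gone missing.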







                 Digital Preservation
    • Digital preservation is an archiving activity
    • Ensures that specific items of data are maintained
      over time so that they can still be accessed and
      understood through changes in technology *
    • Includes content files and associated metadata
    • Combats digital obsolescence
    • Keeps data authentic despite technological change
    • Has technical, organisational, and cultural challenges

    * Lord & MacDonald, e-Science Data Curation Report, 2003







      What is a Digital Archive?
    • Inconsistency in use of the terms digital archive,
      digital repository, and digital library
    • Task Force on Archiving Digital Information, 1996:
       “Defines digital archives strictly in functional terms as
       repositories of digital information that are collectively
       responsible for ensuring, through the exercise of various
       migration strategies, the integrity and long-term accessibility of
       the nation’s social, economic, cultural and intellectual heritage
       instantiated in digital form.”
    • Provide reliable solutions for life-cycle and long-term
      management of digital archival materials
    • System driver is Preservation, leading to Access




   What is a Digital Repository?
    • Collections of digital objects: content + metadata
    • Cross-domain implementation
    • Offer minimum set of basic services – Get, Search,
      Access control
    • Sustainable & trusted; well-supported and managed
    • Policies, processes, services, people
    • Overall commitment to stewardship of digital
      materials
    • Enables quick & remote access to digital materials
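
    The minimum service set named above (Get, Search, Access control) can be
    sketched as a small in-memory store of content plus metadata. This is a
    toy illustration only: the class names, the restricted flag and the
    authorised parameter are hypothetical and do not reflect the API of
    DSpace, EPrints, Fedora or any other repository system.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """A digital object: content plus descriptive metadata."""
    identifier: str
    content: bytes
    metadata: dict = field(default_factory=dict)
    restricted: bool = False  # hypothetical access-control flag


class Repository:
    """Toy repository offering Get, Search and a simple access check."""

    def __init__(self):
        self._objects = {}

    def deposit(self, obj):
        self._objects[obj.identifier] = obj

    def _may_read(self, obj, authorised):
        # Access control: restricted objects need an authorised caller.
        return authorised or not obj.restricted

    def get(self, identifier, authorised=False):
        obj = self._objects[identifier]  # raises KeyError if unknown
        if not self._may_read(obj, authorised):
            raise PermissionError(f"access to {identifier} is restricted")
        return obj

    def search(self, term, authorised=False):
        """Return sorted identifiers whose metadata values contain the term."""
        term = term.lower()
        return sorted(
            obj.identifier
            for obj in self._objects.values()
            if self._may_read(obj, authorised)
            and any(term in str(value).lower() for value in obj.metadata.values())
        )
```

    Applying the same access check in both get and search means a restricted
    object is neither retrievable nor discoverable by an unauthorised caller,
    which is one possible design choice among several.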






 Main Issues for Digital Archives
    •   User Requirements
    •   Transfer & Ingest
    •   Metadata
    •   Standards
    •   Digital preservation strategies
    •   Linkage
    •   Audit and Certification
    •   Legal Issues
    •   Access restrictions






                        OAIS
   • Open Archival Information System Reference Model
   • ISO 14721:2003
   • "An archive, consisting of an organisation of people
     and systems, that has accepted the responsibility to
     preserve information and make it available for a
     Designated Community"
   • Establishes a common framework of terms and
     concepts
   • Defines an Information Model
   • Identifies basic Functions of an OAIS







           OAIS Functional Model
   • Functional model has six entities:
       •   Ingest;
       •   Archival Storage;
       •   Data Management;
       •   Administration;
       •   Preservation Planning;
       •   Access
   • Described using UML diagrams
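
   As a rough illustration of how four of these entities interact, the
   sketch below models Ingest, Archival Storage, Data Management and Access
   using the OAIS package types: Submission (SIP), Archival (AIP) and
   Dissemination (DIP) Information Packages. Administration and
   Preservation Planning are left out. All class, method and field names
   are hypothetical simplifications, not definitions from ISO 14721.

```python
from dataclasses import dataclass


@dataclass
class SIP:
    """Submission Information Package, as sent by a Producer."""
    content: bytes
    descriptive_info: dict


@dataclass
class AIP:
    """Archival Information Package, as held in Archival Storage."""
    aip_id: str
    content: bytes


@dataclass
class DIP:
    """Dissemination Information Package, as delivered to a Consumer."""
    aip_id: str
    content: bytes
    descriptive_info: dict


class Archive:
    """Toy archive covering Ingest, Archival Storage, Data Management, Access."""

    def __init__(self):
        self.archival_storage = {}  # aip_id -> AIP
        self.data_management = {}   # aip_id -> descriptive info
        self._next_number = 1

    def ingest(self, sip):
        """Ingest: turn a SIP into an AIP and register its descriptive info."""
        aip_id = f"aip-{self._next_number}"
        self._next_number += 1
        self.archival_storage[aip_id] = AIP(aip_id, sip.content)
        self.data_management[aip_id] = dict(sip.descriptive_info)
        return aip_id

    def query(self, term):
        """Access: answer a Consumer query with a result set of AIP ids."""
        term = term.lower()
        return sorted(
            aip_id
            for aip_id, info in self.data_management.items()
            if any(term in str(value).lower() for value in info.values())
        )

    def order(self, aip_id):
        """Access: fulfil a Consumer order by building a DIP from the AIP."""
        aip = self.archival_storage[aip_id]
        return DIP(aip_id, aip.content, dict(self.data_management[aip_id]))
```

   A Producer would call ingest with a SIP; a Consumer would call query to
   get a result set and then order to receive a DIP.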








       OAIS Functional Entities
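    • Diagram: OAIS Functional Entities (Figure 4-1)
        • Producer submits SIPs to Ingest
        • Ingest passes AIPs to Archival Storage and descriptive info to
          Data Management
        • Access receives queries and orders from the Consumer and returns
          result sets and DIPs, drawing on AIPs from Archival Storage and
          descriptive info from Data Management
        • Preservation Planning and Administration sit above and below
          these entities; Management is shown outside the archive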








                         DSpace
    • DSpace: “DSpace is a groundbreaking digital
      repository system that captures, stores, indexes,
      preserves, and redistributes an organization's
      research data [...] the DSpace software platform
      serves a variety of digital archiving needs.”
    • Open source software
    • Example uses:
        •   American Museum of Natural History Research Library
        •   Chapel Hill, SILS, Theses & Dissertations
        •   University of Cambridge – Academic & related content
        •   Edinburgh Research Archive (ERA)





                          EPrints
    • EPrints: “GNU EPrints is generic archive software
      under development by the University of
      Southampton. It is intended to create a highly
      configurable web-based archive.”
    • Open source software
    • Example uses:
        •   Southampton Crystal Structure Report Archive
        •   Central Connecticut State University Digital Archive
        •   Central European University – Preprint Archive
        •   Curtin University of Technology Institutional Repository
        •   DLIST – Digital Library of Information Science & Technology





                        Fedora
    • Fedora: “Open source software that gives
      organisations a flexible service-oriented architecture
      for managing and delivering their digital content.”
    • Open source software
    • Example uses:
        • Digital Case, Case Western Reserve University's electronic
          repository and archive: stores, disseminates, and preserves
          faculty research in digital formats (both born digital and
          digitised)
        • University of Queensland eSpace – research digital
          repository with published articles and conference papers,
          book chapters, theses and other forms of written research





                      Others
    • Other systems, such as the Digital Commons institutional
      repository service
    • Other, custom-built systems
       • NARA Electronic Records Archives (ERA) project
       • UK National Archives
       • Public Record Office, Victoria
       • KB eDepot, Netherlands
       • Several other large bodies whose archives pre-date
         the development of the aforementioned repository
         software
    • Commercial systems




                 In conclusion
    • There is much in common between digital archives,
      libraries, and repositories
    • Intention and subsequent functionality are the key to
      defining digital storage systems
    • Digital Archives offer a framework for maintaining &
      preserving the authenticity and integrity of records
      over time
    • Several software solutions are available
    • Development is ongoing
    • Implementation requires technical know-how
    • There is still a lot of work to do...




                    Thank you.
                    Questions?
                     Maureen Pennock
                  m.pennock@ukoln.ac.uk

           Join the DCC Associates Network at
                   http://www.dcc.ac.uk

