Repositories for and preservation of digital record of science

The Alliance: stakeholders join forces to help create a European Digital Information Infrastructure
Peter Tindemans, Acting Chair, Alliance for Permanent Access
Brussels, 15 November 2007


• Accessing digital records of science: serious strategic problem; publications and data
• Principles, new legal regulations?
• Technical, organisational, financial, legal implementation issues
• Key characteristics of a European Digital Information Infrastructure
• Alliance for Permanent Access: key stakeholders offer to accelerate and assist in creating this infrastructure
• Major role for EU

Storing and accessing the digital records of science: a serious strategic problem
• ESA: more than 30% of requests for earth observation data are for old data, and the number is increasing; ESA also offers its capabilities to the tens of repositories with earth observation data to arrive at common solutions
• CERN's LHC: how to store PBytes in the first place, and then how to keep them accessible
• KB backs up electronic journals for several large publishers: only sustainable if long-term preservation becomes 'industrialised'
• European Bioinformatics Institute, a vital facility for life sciences: still no sustainable funding mechanism; needs full interoperability with chemistry databases etc
• Combining, searching, mining and re-using many different datasets is increasingly at the heart of innovation and advances in science: requires a more structured, transparent, interoperable world of repositories
• CESSDA: how to ensure funding for distributed data archives; how to widen coverage in sciences that are traditionally, and maybe inherently, more dispersed

Publications and primary data
• Research and applications require access to publications and to primary (experimental, observational, survey…) and processed (cf EBI) data
• Both are digital, and increasingly multimedia
• Publications and primary data are different in many respects, but some important principles are very similar
• The 'infrastructure' for storage, preservation, and access in accordance with these principles is in fact one 'virtual' infrastructure, though for the time being probably with separate repositories for data and documents
• Collections of data are (planned) to be made available not only in, say, space science/astronomy: see the ESFRI Road Map projects in social sciences, arts and humanities, and biodiversity. However, there is too little focus on preservation and interoperability
• In addition, links are increasing (e.g. reference in an article to primary or processed data; end user products refer to ancillary data sources,…)
• Building archives or repositories, and incorporating long-term preservation and access, is very largely independent of the route taken on open access (OA)

Differences primary data, documents
Primary data
• No copyright, but data policy limitations for commercial use
• Highly non-standardised
• Need special Representation Information (structure, semantics, software) to be understood and processed, but machine-readable
• Digital volume huge
• Very dispersed at present
• Business models that include storage and preservation almost absent

Documents
• Copyright
• Much more standardised
• Human readable when displayed, but not machine-understandable
• Digital volume modest or even small
• In journals or archives with publishers, libraries
• Several models for storage, access and preservation

Storing and accessing: work in four dimensions
• Principles, and the need for legal or other regulations or guidelines
• Implementation issues (technical, financial, organisational,..)
• An 'infrastructure': coherent set of organisations, tools and procedures to provide storage and (permanent) access, and therefore to present solutions for the implementation issues
• Approach and steps to establish 'infrastructure': Alliance for a European Digital Information Infrastructure

Principles, regulations, guidelines
• Open access to primary publicly funded research data
• Access to publications (at cost, versions of OA, …)
• Compliance with intellectual property rights
• Quality of content
• Quality of repositories
• Policies on quality assurance of repositories will emerge
• Publications: peer review etc
• Repositories: test with Draft Audit Checklist
• Funding/business models
• Custody models (e.g. national libraries backing up publishers for electronic journals)

New regulations, guidelines?
• OECD Ministerial Declaration, 2004; OECD Recommendation end 2006
• Mandatory deposit by funding agencies increasing

Implementation examples

• SCOAP3
• Digital Access and Rights Management
• Deposit and selection
• Ensuring long-term sustainability

• Publications: adapt legal deposit to digital world; ongoing
• Data: for specific areas (e.g. clinical trials) international guidelines; for general research data increasing deposit policies of funding agencies in designated repositories. US Interagency Group prepares plan for public infrastructure as permanent home for researchers' data.


Some implementation issues 1
• Who takes responsibility for the data: national libraries, research institutes, subject repositories, institutional repositories? What is best for optimal long-term storage: thematic (i.e. community-based) solutions or nationally based ones? Will storage be central or distributed?
• Will there be separate repositories for data and documents? What links will need to be created back to the primary literature, who will do this, and how will links be monitored and repaired?
• What standards will need to be created and adopted for metadata, storage and retrieval?
• What best practices are emerging, such as the cross-linking of data repositories and document repositories to Current Research Information Systems (to have a semantically homogeneous and stable reference)?
• R&D&D into tools for preservation, for managing complex dynamic datasets, etc

Some implementation issues 2
• Digital rights and access management policies and tools
• Develop life-cycle costings, value-chain analysis
• Funding models: maybe some income from (re)use; largely part of normal science funding mechanisms: awareness needed!
• International coordination per field (~ SCOAP3 for transitioning PP publications to OA journals) or across fields?


[Diagram: Communities A, B and C, each with its own 'labs', special data providers, special publishers and special research libraries. Shared across communities: general scientific publishers, general open archives, university libraries, deposit libraries, conventional archives. Also shown: research funding bodies, scholarly/professional societies, universities, ICT industry, national competence networks/coalitions. The whole is tied together by community-based and cross-community standards and enabling mechanisms.]

A European Digital Information Infrastructure to take care of these issues
1. Identify core physical digital archives/repositories in several initial communities and among cross-community organisations. Do this for documents and for data
2. These must be OAIS-compliant to ensure proper archiving, interoperability and long-term preservation
3. Framework for metadata, framework for persistent identifiers, and a number of registries, possibly other standards
4. Cost-effective preservation methods and services must be available
5. Common framework of principles and guidelines for management of access and rights (underlying the technical tools to implement this framework)
6. Create financial mechanism for developing and testing implementation tools, techniques and services, and for strengthening collaboration and training
7. a. Certification service providers, accredited according to
   b. Common European accreditation mechanism.

Source: Task Force Permanent Access, December 2005

ALLIANCE for PERMANENT ACCESS wants to help realise this infrastructure
Key Stakeholders from Science and Science Information committed at Board level to develop coherent European solution from which they will benefit themselves
Research organisations
• ESA, CERN, Max Planck Gesellschaft/Max Planck Digital Library, STFC, CNET
Funding agencies
• ESF (representing all national funding agencies), JISC (UK)
National libraries and archives
• British Library, Koninklijke Bibliotheek, Deutsche Bibliothek, Swedish National Archive
Publishers
• International Association of STM Publishers
National ‘coalitions’
• DPC, NESTOR

Alliance aims to:
• establish wide consensus at strategic level among major stakeholders as to main characteristics of a European Digital Information Infrastructure (including long-term preservation and access). Initial focus on records of science;
• accelerate creation of main building blocks of this infrastructure;
• work with industry vendors, national competence networks and coalitions, and international partners to ensure that technology, skills, and standards are in place;
• work with science funding agencies on realistic funding/business models;
• be key enabling mechanism for national governments and the EU to strengthen European strategies and policies for preservation and access, and their implementation; contribute to Europe as Information Society;
• offer platform for effective coordination of individual technical projects and communities (see e.g. ESFRI projects in social sciences, arts & humanities, biodiversity), and nucleus for small eventual body to facilitate permanent coordination;
• strengthen role of European parties in world-wide efforts in preservation and access;
• build, articulate and maintain continuing R&D programme.

Alliance Work Programme
Positioning the Alliance

Annual conference: launching conference 15 November, Brussels, for governments, EU, funding agencies, research organisations, FP7 and ESFRI projects, publishers, national libraries, archives etc
Among other things: getting the European Digital Information Infrastructure on the updated ESFRI Road Map

Working with communities
• Start consultation with a few communities (PP, Space, Social Sciences, Life Sciences, Global Change,..), gradually adding more
– what to store, in which repositories, standards for metadata, sharing of technical tools, testbeds…
• Manual of good practice etc (FP7 support likely: PARSE-INSIGHT)

Coordination and stimulation of R&D
– Revisiting outline R&D programme from Task Force Permanent Access
– Support from FP7 likely: PARSE-INSIGHT

Developing accreditation mechanism
Relations with US, Japan, Australia etc
Developing funding/business model
Connects to Digital Cultural Heritage as a whole, as libraries and archives are members

EU to play major role
• Support throughout EU OECD Recommendations for open access to primary data
• No particular need yet for other guidelines on preservation and access; await consultation with communities
• EU to prescribe any eventual guidelines for EU-funded R&D projects
• Alliance is prepared to lead efforts of working with communities, strengthening coordination between individual projects, and building European Digital Information Infrastructure
• FP7 (“Digital Libraries and Content”; “Research Infrastructures”) and i2010 Digital Libraries eContentplus provide useful support for individual projects
• Need for coordination of R&D
• (Financial) support for starting implementation of European Digital Information Infrastructure:
– Help pilot communities establish requirements
– Promote standards and establish European accreditation mechanism
– Carry out a few large-scale pilots, including benchmarking and evaluation to validate the building blocks for individual repositories as well as their interoperability as part of the European Digital Information Infrastructure
– Outreach activities: training, information exchange, knowledge transfer to member states with so far fewer activities in preservation of Records of Science


EU High Level Group on Digital Libraries will likely mark major milestone