DataCite Infrastructure

Technical Highlights

   25th August 2011
   Sebastian Peters
   German National Library of
   Science and Technology

• Developer Core Group was founded in September 2010 to
  build a central registration infrastructure for DataCite

• Implementation of services in cooperation with DataCite’s
  Service Group

• Developers from BL, CDL, CISTI, TIB

• All projects are open source

• Source Code is hosted on GitHub


Production state:
• Metadata Store (MDS, DOI/Metadata management)

Beta state:
• Metadata Export (OAI)
• Search

Future development:
• Content Negotiation
Metadata Store (MDS)
Overview I

MDS is DataCite’s central infrastructure for DOI Management:
•   Minting/Updating of DOIs
•   Storing of metadata

•   December 2010 (public beta)
•   June 1st 2011 (v2, production state)

MDS is only accessible to DataCite members (and their datacentres).
Metadata Store (MDS)
Overview II

User Roles:
• Allocators (= DataCite members) can create and manage
  accounts for associated datacentres
• Datacentres can mint/update DOIs and store metadata

• User interface (UI)
• Programmatic interface (API)
Metadata Store (MDS)
Metadata Handling

• We only accept XML
• Metadata must validate against one of the DataCite Metadata
  Schemas
• Every version is stored (a dataset can have multiple versions
  of metadata; the latest one is the authoritative one)
• XML is stored as received (no transformation)
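
The handling rules above can be sketched as an append-only store. This is a minimal illustration, not the MDS implementation: the class and method names are hypothetical, and a well-formedness check with the Python standard library stands in for validation against a DataCite Metadata Schema, which would need a schema-aware XML library.

```python
import xml.etree.ElementTree as ET


class MetadataVersions:
    """Hypothetical append-only store: every submitted document is kept as received."""

    def __init__(self):
        self._versions = {}  # DOI -> list of raw XML byte strings, oldest first

    def store(self, doi: str, raw_xml: bytes) -> int:
        # Well-formedness check only; this stands in for schema validation.
        ET.fromstring(raw_xml)  # raises ET.ParseError if the input is not XML
        # The bytes are stored untouched (no transformation).
        self._versions.setdefault(doi, []).append(raw_xml)
        return len(self._versions[doi])  # 1-based version number

    def current(self, doi: str) -> bytes:
        # The most recently stored version is the authoritative one.
        return self._versions[doi][-1]

    def history(self, doi: str) -> list:
        # All versions ever stored for this DOI, oldest first.
        return list(self._versions[doi])
```

Storing a second document for the same DOI does not overwrite the first; `current()` simply returns the newest entry while `history()` keeps every earlier one.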
Metadata Store (MDS)
Quality Assurance

• Restrictions per datacentre:
   • List of valid domain names of landing pages
   • DOI prefix(es)
• Metadata must be valid
• Periodic checks that landing pages exist (coming soon)
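
The per-datacentre restrictions can be illustrated with a small check. This is a sketch under assumptions: the function name and its parameters are hypothetical, and treating subdomains of an allowed domain as valid is a guess about how such a rule might work, not a description of MDS behaviour.

```python
from urllib.parse import urlparse


def check_registration(doi, landing_url, allowed_prefixes, allowed_domains):
    """Return a list of problems; an empty list means both checks pass."""
    problems = []

    # The DOI prefix is everything before the first slash, e.g. "10.5072".
    prefix = doi.split("/", 1)[0]
    if prefix not in allowed_prefixes:
        problems.append(f"prefix {prefix} not allowed")

    # Compare the landing page host with the allowed domain names.
    # Assumption: subdomains of an allowed domain are accepted as well.
    host = (urlparse(landing_url).hostname or "").lower()
    if not any(host == d or host.endswith("." + d) for d in allowed_domains):
        problems.append(f"domain {host} not allowed")

    return problems


ok = check_registration(
    "10.5072/TEST", "http://data.example.org/set/1", {"10.5072"}, {"example.org"}
)
bad = check_registration(
    "10.9999/TEST", "http://other.example.com/set/1", {"10.5072"}, {"example.org"}
)
```

Here `ok` is an empty list, while `bad` reports both the prefix and the domain.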
Metadata Store (MDS)
User Interface

• Best for low volume operations
• Maintain DOIs and metadata with simple web forms
• Optimal for datacentres with only a small number of DOIs
Metadata Store (MDS)
Programmatic Interface (API)

• Best for bulk operations
• Can be integrated in existing infrastructures
• Simple RESTful API
• All the traffic goes via HTTPS
• All requests require HTTP Basic authentication
• API documentation:
             PUT /metadata/10.5072/TEST HTTP/1.1
             Authorization: Basic Rk9PLkJBUjoxMjM0NTY3OA==
             Content-Type: application/xml;charset=UTF-8

             <?xml version="1.0" encoding="UTF-8"?>
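
The request above can be assembled in a few lines of Python. This is a sketch, not official client code: the base URL is a placeholder because the real MDS host is not given here, the function names are hypothetical, and the `<resource/>` body is a stand-in rather than valid DataCite metadata. The symbol FOO.BAR and password 12345678 are what the example Authorization header decodes to.

```python
import base64
import urllib.request

# Hypothetical placeholder; the real MDS host is not given in the slides.
MDS_BASE_URL = "https://mds.example.org"


def basic_auth_header(symbol: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{symbol}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_metadata_put(doi, xml, symbol, password):
    """Build (but do not send) a PUT /metadata/<doi> request."""
    return urllib.request.Request(
        url=f"{MDS_BASE_URL}/metadata/{doi}",
        data=xml,
        method="PUT",
        headers={
            "Authorization": basic_auth_header(symbol, password),
            "Content-Type": "application/xml;charset=UTF-8",
        },
    )


request = build_metadata_put(
    "10.5072/TEST",
    b'<?xml version="1.0" encoding="UTF-8"?>\n<resource/>',  # placeholder body
    "FOO.BAR",
    "12345678",
)
# Sending would be urllib.request.urlopen(request); it is left out here
# because the host above does not exist.
```

Building the request separately from sending it makes it easy to inspect the headers before any traffic goes over HTTPS.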
Metadata Store (MDS)
Direct Access or Custom Frontend?

• Direct access: datacentres can access MDS directly if allowed
  by their allocator
   • Favored e.g. by BL, TIB
• Custom frontend: allocators can use MDS via the API and develop
  a custom frontend for their datacentres
   • Favored e.g. by CDL
Metadata Export (OAI)

• Service for third parties to harvest metadata stored in the
  DataCite Metadata Store (MDS), using the Open Archives Initiative
  Protocol for Metadata Harvesting (OAI-PMH)

• Metadata formats so far: Dublin Core, DataCite Metadata

• Sets for each allocator and datacentre for easy harvesting

• Beta version is available at
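
A harvesting request for one set could be built as follows. The verb and the parameter names (`verb`, `metadataPrefix`, `set`) and the `oai_dc` prefix for Dublin Core come from the OAI-PMH specification; the base URL and the set name "TIB" are hypothetical placeholders, since neither the endpoint nor the actual set names are given here.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; the real beta endpoint is not given in the slides.
OAI_BASE_URL = "https://oai.example.org/oai"


def list_records_url(metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords URL, optionally restricted to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # One set per allocator or datacentre; "TIB" below is an assumed name.
        params["set"] = set_spec
    return OAI_BASE_URL + "?" + urlencode(params)


print(list_records_url(set_spec="TIB"))
```

A harvester would fetch this URL and follow resumption tokens in the responses until the list is exhausted.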

Metadata Search

• Open service to search metadata stored in the DataCite Metadata
  Store (MDS)

• Based on Apache Solr (Lucene)

• User Interface

   • Complex boolean query language

   • Facets (Drilldown)


• Beta version available at

Metadata Search
Home Page
Metadata Search
Result List
Metadata Search
Advanced Search

• Special form for advanced search

• We also support complex Lucene query
  syntax, e.g.
   • title:laser OR subject:laser

   • publicationYear:[1990 TO 1995]
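
Queries like these contain colons, brackets and spaces, so they need URL encoding when sent programmatically. The sketch below assumes the search service exposes a Solr-style select handler with the standard `q` and `rows` parameters; that assumption and the base URL are not confirmed by the slides.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical placeholder; the real search endpoint is not given in the slides.
SEARCH_BASE_URL = "https://search.example.org/select"


def search_url(query, rows=10):
    """Build a URL for a Lucene-syntax query, encoding special characters."""
    return SEARCH_BASE_URL + "?" + urlencode({"q": query, "rows": rows})


url = search_url("publicationYear:[1990 TO 1995]")

# Decoding the query string again recovers the original Lucene expression.
decoded = parse_qs(urlsplit(url).query)["q"][0]
print(decoded)
```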
Content Negotiation for DOIs
Inspired By CrossRef

• Service for getting metadata of a DOI

• Uses the DOI proxy

• First implemented by CNRI and CrossRef:

   “but the beauty of the setup is that from now on, any DOI
    registration agency can enable content negotiation for their
     constituencies as well. DataCite- we're looking at you ;-) .”
                    (Geoffrey Bilder on CrossTech blog)

• Prototype (for some selective prefixes) is running at

Content Negotiation for DOIs
What is Conneg?

• HTTP Content Negotiation is a simple way for HTTP clients to get
  different representations of the same resource.
• Client only needs to know the internet media type (MIME type)
• We will expose metadata in DataCite Format via
• Other media types will follow (Dublin Core, RDF, BibTeX, etc.)
• Could also be used to link directly to the dataset

        Resolving DOI:

        GET /10.5072/TEST HTTP/1.1
        Host:

        Getting Metadata:

        GET /10.5072/TEST HTTP/1.1
        Host:
        Accept: application/x-datacite+xml
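
The difference between the two requests is only the Accept header, which the following sketch makes explicit. The proxy host is a hypothetical placeholder because the Host value is not given above, and the function names are illustrative.

```python
import urllib.request

# Hypothetical placeholder; the Host value is elided in the slides.
DOI_PROXY = "https://doi-proxy.example.org"


def resolve_request(doi):
    """Plain request: the proxy would redirect to the landing page."""
    return urllib.request.Request(f"{DOI_PROXY}/{doi}")


def metadata_request(doi, media_type="application/x-datacite+xml"):
    """Same URL, but the Accept header asks for a metadata representation."""
    return urllib.request.Request(
        f"{DOI_PROXY}/{doi}", headers={"Accept": media_type}
    )


plain = resolve_request("10.5072/TEST")
meta = metadata_request("10.5072/TEST")
# Both requests target the same URL; only the headers differ.
```

Other representations would be requested the same way by passing a different media type once the service supports it.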
Thank you for your attention!
