Go-Geo! – the Geo-Data Portal Project
Use of OAI to disclose metadata as an output from the portal
Date: 14 August 2003
Status of document: Final
Prepared by: The Project Team

As part of the Go-Geo! project, a local catalogue of metadata records was created. These records describe data held at EDINA, MIMAS and the Archaeology Data Service [1]. The records are stored internally using the FGDC (Federal Geographic Data Committee) metadata standard and were created in XML format.

One of the objectives of the project was to investigate the use of OAI to disclose metadata as an output from the portal. To do this, the suitability of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) was investigated. The main OAI website is at: http://www.openarchives.org.

The OAI protocol specifies that, at a minimum, records MUST be available in (unqualified) Dublin Core (DC) format. This meant that the Go-Geo! records had to be mapped, with some loss of data, from the FGDC standard to DC using a standard mapping at: http://cuadra.nwrc.gov/pubs/crosswalk/dc7casestudy.htm

Other metadata formats can be supported, although for this investigation only DC was used. A client chooses among the formats a server supports through the metadataPrefix parameter of its requests. Separately, the protocol supports 'Sets', which allow an OAI server to collect its records into logical groups that a client can download separately.

For implementers coming from a Z39.50 background, this is an important difference between OAI and Z39.50. There is no concept in the OAI protocol of a result set of records created by a search query, merely a server-defined mechanism for grouping related records. This is because OAI is concerned with harvesting records rather than searching for them. Whilst this is more limited in scope than Z39.50, it is much simpler to implement and use, and it suits a model in which harvested records are searched locally better than the distributed, 'over-the-internet' model of Z39.50.
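The FGDC-to-DC mapping step can be sketched in Python using only the standard library. This is a minimal, hypothetical illustration, not the published crosswalk the project used: the handful of FGDC element paths in `CROSSWALK` are an assumed subset, and the string format used to express the bounding box as `dc:coverage` is invented for the example. Any FGDC element not listed in the mapping is simply dropped, which is where the loss of data comes from. The sketch also omits the `xsi:schemaLocation` attribute that a production `oai_dc` record would normally carry.

```python
import xml.etree.ElementTree as ET

# Namespaces of the OAI-PMH unqualified Dublin Core container and DC elements.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# Assumed, illustrative subset of an FGDC-to-DC crosswalk (not the full
# published mapping). Paths are relative to the FGDC <metadata> root element.
CROSSWALK = [
    ("idinfo/citation/citeinfo/title", "title"),
    ("idinfo/citation/citeinfo/origin", "creator"),
    ("idinfo/citation/citeinfo/pubdate", "date"),
    ("idinfo/descript/abstract", "description"),
    ("idinfo/keywords/theme/themekey", "subject"),
]


def fgdc_to_oai_dc(fgdc_xml: str) -> ET.Element:
    """Map selected FGDC elements onto an oai_dc:dc element.

    FGDC elements with no entry in CROSSWALK are discarded.
    """
    root = ET.fromstring(fgdc_xml)
    dc = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for path, dc_name in CROSSWALK:
        # A path may match several nodes (e.g. repeated theme keywords).
        for node in root.findall(path):
            if node.text and node.text.strip():
                ET.SubElement(dc, f"{{{DC_NS}}}{dc_name}").text = node.text.strip()
    # Hypothetical encoding: flatten the bounding box into one dc:coverage string.
    bounding = root.find("idinfo/spdom/bounding")
    if bounding is not None:
        parts = [bounding.findtext(tag, "").strip()
                 for tag in ("westbc", "eastbc", "northbc", "southbc")]
        ET.SubElement(dc, f"{{{DC_NS}}}coverage").text = "W={} E={} N={} S={}".format(*parts)
    return dc
```

Calling `ET.tostring(fgdc_to_oai_dc(record_xml), encoding="unicode")` on an FGDC record string yields the DC XML that would be placed inside each OAI record's metadata section.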
The OAI protocol works over HTTP, so it was necessary to develop a web-based OAI server. This enables OAI clients to download the records via a web browser or other HTTP-compliant software (such as a web spider or robot).

After a search of the tools listed at http://www.openarchives.org/tools/tools.html, it was decided to use software provided by DLRL, the Digital Library Research Laboratory at Virginia Polytechnic Institute and State University (http://www.dlib.vt.edu). The DLRL OAI server package "OAI-PMH2 XMLFile file-based data provider", version 2.1 (http://www.dlib.vt.edu/projects/OAI/software/xmlfile/xmlfile.html), was used. This consists of a simple suite of files containing Perl modules and an XML configuration file. A supplied Perl script that uses the OAI server module was installed as a CGI script on a standard Apache web server running on a Unix platform, and the configuration file and the DC XML record files were installed alongside it. Nothing further was needed to set up the OAI server.

To retrieve Go-Geo! catalogue records from this OAI server, the CGI script is called as a URL with parameters. Entered into a browser, this URL retrieves ALL the Go-Geo! catalogue records:

http://nevis.ed.ac.uk:9200/cgi-bin/OAI/OAI-XMLFile-2.1/XMLFile/GoGeo/oai.pl?verb=ListRecords&metadataPrefix=oai_dc

The OAI protocol defines six 'verbs' (request types), of which 'ListRecords' is one. The records are returned wrapped in OAI-conformant XML, each with a 'header' containing an identifier and a server-assigned date stamp. A harvester client can use these date stamps, via the protocol's 'from' parameter, to download only the records that have been added or changed since its previous harvest.

[1] In the longer term it is assumed that the geo-data portal would cross-search nodes provided by MIMAS and the Archaeology Data Service. Records were held locally for demonstration purposes.
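The request and response shapes of a ListRecords call can be illustrated with a short Python sketch. It builds a ListRecords URL, optionally adding the OAI-PMH `from` argument for an incremental harvest, and reads the identifier and date stamp from each record header. The function names are hypothetical, the standard OAI-PMH 2.0 namespace is assumed, and the code only builds strings and parses XML; it does not contact any server.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    from_date (e.g. "2003-08-01") asks the server only for records whose
    date stamp is on or after that date, i.e. an incremental harvest.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def record_headers(response_xml: str) -> List[Tuple[str, str]]:
    """Return (identifier, datestamp) for each record in a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [
        (header.findtext("oai:identifier", namespaces=OAI_NS),
         header.findtext("oai:datestamp", namespaces=OAI_NS))
        for header in root.iterfind("oai:ListRecords/oai:record/oai:header", OAI_NS)
    ]
```

With the Go-Geo! script path as `base_url`, `list_records_url` with its defaults reproduces the ListRecords URL given in the text; a harvester would store the latest date stamp it saw and pass it as `from_date` next time.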
DLRL also offers a Perl client harvester (http://www.dlib.vt.edu/projects/OAI/software/harvester/harvester.html), although it would be quite simple to build a customised harvester client. DLRL also offers an OAI Repository Explorer for testing OAI servers at http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai, which tests a server's support for the various OAI options.

In conclusion, OAI was found to be a simple, lightweight, easy-to-implement standard for harvesting web-based XML data. Tools are available to set up a basic client-server system conforming to the protocol in a relatively short time and with minimal configuration.

In this investigation, only the local Go-Geo! database was made accessible to OAI harvesting. Although this could be extended to include other records within the Geo-data Network and within nodes making up the GIgateway network, those responsible for these services expressed reservations about permitting it. These services have to show that they are of value to the community, which they do partly by reporting the number of records retrieved by searches of their databases. Once records have been harvested, the services have no way of obtaining information about how their records are used. They were also concerned about retaining intellectual property rights (IPR) in records once harvested and then, possibly, harvested again by others.
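A customised harvester client could look something like the following Python sketch. It is hypothetical and is not DLRL's harvester: it issues ListRecords requests and follows the OAI-PMH resumptionToken mechanism, by which a server may split a long result list into pages (whether the XMLFile provider pages its responses is not assumed). The `fetch` parameter is an added assumption so that the HTTP transport can be swapped out, for example in tests.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(url: str) -> str:
    """Default transport: a plain HTTP GET returning the body as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def harvest(base_url: str, metadata_prefix: str = "oai_dc",
            from_date: Optional[str] = None,
            fetch: Callable[[str], str] = http_fetch) -> Iterator[ET.Element]:
    """Yield every <record> element from ListRecords, following resumptionTokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        error = root.find(f"{OAI}error")
        if error is not None:
            # noRecordsMatch is a normal outcome of an incremental harvest.
            if error.get("code") == "noRecordsMatch":
                return
            raise RuntimeError(f"OAI error {error.get('code')}: {error.text}")
        list_records = root.find(f"{OAI}ListRecords")
        yield from list_records.iterfind(f"{OAI}record")
        token = list_records.findtext(f"{OAI}resumptionToken")
        if not token:
            return  # a missing or empty token marks the last page
        # resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

For example, `for record in harvest(base_url, from_date="2003-08-01"): ...` would process only records stamped on or after that date. Injecting `fetch` keeps the paging logic independent of the network, which is what makes it easy to test against canned responses.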