									Go-Geo! – the Geo-Data Portal Project

Use of OAI to disclose metadata as an output from the portal




Date: 14 August 2003




Status of document: Final

Prepared by: The Project Team

As part of the Go-Geo! project, a local catalogue of metadata records was created.
These records describe data held at EDINA, MIMAS and the Archaeology Data
Service [1]. The records are stored internally using the FGDC (Federal Geographic
Data Committee) metadata standard and were created in XML format.

One of the objectives of the project was to investigate the use of OAI (the Open
Archives Initiative) to disclose metadata as an output from the portal. To do this,
the suitability of the OAI Protocol for Metadata Harvesting (OAI-PMH) was assessed.
The main OAI website is at: http://www.openarchives.org.

The OAI protocol specifies that, at a minimum, records MUST be available in
(unqualified) Dublin Core (DC) format. This meant that the Go-Geo! records had to be
mapped (with some loss of data) from the FGDC standard to DC, using a standard
mapping at:
http://cuadra.nwrc.gov/pubs/crosswalk/dc7casestudy.htm
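As an illustration of what such a mapping involves, the Python sketch below copies a handful of FGDC elements into their Dublin Core counterparts and wraps the result in an oai_dc record. The element pairs are an assumed, illustrative subset, not the full published crosswalk, and anything not listed (here, the spatial bounding coordinates) is simply dropped, which shows in miniature how data is lost in the mapping. The function name fgdc_to_oai_dc is hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# Illustrative (assumed) subset of an FGDC -> Dublin Core crosswalk.
# Paths are relative to the FGDC <metadata> root element.
FGDC_TO_DC = [
    ("idinfo/citation/citeinfo/title", "title"),
    ("idinfo/citation/citeinfo/origin", "creator"),
    ("idinfo/citation/citeinfo/pubdate", "date"),
    ("idinfo/citation/citeinfo/pubinfo/publish", "publisher"),
    ("idinfo/descript/abstract", "description"),
    ("idinfo/keywords/theme/themekey", "subject"),
    ("idinfo/keywords/place/placekey", "coverage"),
]


def fgdc_to_oai_dc(fgdc_xml: str) -> ET.Element:
    """Map an FGDC record to an oai_dc:dc element; unmapped FGDC content is dropped."""
    fgdc = ET.fromstring(fgdc_xml)
    dc_root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for path, dc_name in FGDC_TO_DC:
        # An FGDC element may repeat (e.g. several theme keywords).
        for node in fgdc.findall(path):
            text = (node.text or "").strip()
            if text:
                ET.SubElement(dc_root, f"{{{DC_NS}}}{dc_name}").text = text
    return dc_root
```

Serialising the result with `ET.tostring(fgdc_to_oai_dc(record), encoding="unicode")` gives the XML that would be stored as one DC record file.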

Other metadata formats can be supported, although for this investigation only DC
was used.
The OAI protocol supports different metadata formats through metadata prefixes: a
server lists the formats it offers in response to the 'ListMetadataFormats' request,
and a client selects one with the 'metadataPrefix' parameter. Separately, the
protocol allows a server to organise its records into 'Sets', logical groups that a
client can harvest separately. For implementers coming from a Z39.50 background,
this marks an important difference between OAI and Z39.50: there is no concept in the
OAI protocol of a result set of records created by a search query, merely a
server-defined mechanism for grouping related records. This is because OAI is
concerned with harvesting records rather than searching for them. Whilst this is more
limited in scope than Z39.50, it is much simpler to implement and use, and it is
better suited than Z39.50, with its distributed 'over-the-internet' searching, to a
model in which records are harvested and then searched locally.
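A client addresses these features through request parameters. The sketch below, which uses only the Python standard library, builds OAI-PMH request URLs against the Go-Geo! server address given later in this report: 'ListMetadataFormats' and 'ListSets' ask the server what it offers, and 'metadataPrefix' plus the optional 'set' argument then select a format and a group of records. The helper name oai_request is hypothetical, and the set name 'edina' is a placeholder; real set names come from the server's ListSets response.

```python
from urllib.parse import urlencode

# Go-Geo! OAI server address as given in this report (not assumed to be live).
BASE_URL = ("http://nevis.ed.ac.uk:9200/cgi-bin/OAI/OAI-XMLFile-2.1/"
            "XMLFile/GoGeo/oai.pl")


def oai_request(verb: str, **args: str) -> str:
    """Build an OAI-PMH GET request URL; keyword arguments become query parameters."""
    params = {"verb": verb}
    params.update(args)
    return f"{BASE_URL}?{urlencode(params)}"


# Ask which metadata formats and which sets the server offers...
formats_url = oai_request("ListMetadataFormats")
sets_url = oai_request("ListSets")
# ...then harvest Dublin Core records from one set only.
# "edina" is a hypothetical setSpec used purely for illustration.
set_url = oai_request("ListRecords", metadataPrefix="oai_dc", set="edina")
```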

The OAI protocol works over HTTP, so it was necessary to set up a web-based OAI
server. This enables OAI clients to download the records using a web browser or
other HTTP-capable software (such as a web spider or robot). After reviewing the
tools listed at http://www.openarchives.org/tools/tools.html, it was decided to use
software provided by DLRL, the Digital Library Research Laboratory at Virginia
Polytechnic Institute and State University (http://www.dlib.vt.edu).

The DLRL OAI server package "OAI-PMH2 XMLFile file-based data provider",
version 2.1 (http://www.dlib.vt.edu/projects/OAI/software/xmlfile/xmlfile.html), was
used. It consists of a small set of Perl modules and an XML configuration file. A
supplied Perl script that uses the OAI server module was installed as a CGI script on
a standard Apache web server running on a Unix platform, and the configuration file
and the DC XML record files were installed alongside it. No further setup was needed
to run the OAI server.
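To show roughly what a file-based data provider of this kind does, here is a minimal Python sketch. It is not the DLRL Perl code and handles only ListRecords, with no sets, resumption tokens or error responses. It assumes one oai_dc record per *.xml file, identifiers built from file names under a hypothetical 'oai:gogeo:' prefix, and datestamps taken from file modification times.

```python
import datetime as dt
import pathlib
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
ET.register_namespace("", OAI_NS)  # serialise OAI elements without a prefix
ET.register_namespace("oai_dc", "http://www.openarchives.org/OAI/2.0/oai_dc/")


def list_records(record_dir: str, base_url: str) -> str:
    """Wrap every *.xml record in record_dir in an OAI-PMH ListRecords response."""
    root = ET.Element(f"{{{OAI_NS}}}OAI-PMH")
    now = dt.datetime.now(dt.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    ET.SubElement(root, f"{{{OAI_NS}}}responseDate").text = now
    request = ET.SubElement(root, f"{{{OAI_NS}}}request",
                            verb="ListRecords", metadataPrefix="oai_dc")
    request.text = base_url
    listing = ET.SubElement(root, f"{{{OAI_NS}}}ListRecords")
    for path in sorted(pathlib.Path(record_dir).glob("*.xml")):
        record = ET.SubElement(listing, f"{{{OAI_NS}}}record")
        header = ET.SubElement(record, f"{{{OAI_NS}}}header")
        # Hypothetical identifier scheme: oai:gogeo:<file name without .xml>
        ET.SubElement(header, f"{{{OAI_NS}}}identifier").text = f"oai:gogeo:{path.stem}"
        # Datestamp from the file's modification time, at day granularity.
        mtime = dt.datetime.fromtimestamp(path.stat().st_mtime, dt.timezone.utc)
        ET.SubElement(header, f"{{{OAI_NS}}}datestamp").text = mtime.strftime("%Y-%m-%d")
        # The stored oai_dc record goes inside <metadata> unchanged.
        metadata = ET.SubElement(record, f"{{{OAI_NS}}}metadata")
        metadata.append(ET.parse(path).getroot())
    return ET.tostring(root, encoding="unicode")
```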




[1] In the longer term it is assumed that the geo-data portal would cross-search nodes
provided by MIMAS and the Archaeology Data Service. Records were held locally for
demonstration purposes.

To retrieve Go-Geo! catalogue records using this OAI server, it is necessary to call
the CGI script as a URL with parameters. This URL, entered into a browser, will
retrieve ALL the Go-Geo! catalogue records:
http://nevis.ed.ac.uk:9200/cgi-bin/OAI/OAI-XMLFile-2.1/XMLFile/GoGeo/oai.pl?verb=ListRecords&metadataPrefix=oai_dc
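Requesting that URL returns a single XML document. The following sketch, assuming the standard OAI-PMH 2.0 and Dublin Core namespaces, extracts the identifier, datestamp and titles from each record in such a response; the function name parse_list_records is illustrative.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_list_records(xml_text: str) -> list[dict]:
    """Return identifier, datestamp and DC titles for each record in a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        header = rec.find("oai:header", NS)
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            # Deleted records carry a header but no metadata, so titles may be empty.
            "titles": [t.text for t in rec.iterfind("oai:metadata//dc:title", NS)],
        })
    return records
```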

The OAI protocol defines six request types, called 'verbs', which are passed in the
'verb' parameter; 'ListRecords' is one of these. The records are returned wrapped in
OAI-conformant XML, and each record carries a 'header' containing its unique
identifier and a server-assigned datestamp. A harvester client can use these
datestamps (via the 'from' argument) to download only those records that have been
added or changed since its previous harvest.
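The datestamp rule a server applies in such a selective harvest can be sketched as follows, assuming day-granularity datestamps (YYYY-MM-DD) and the OAI-PMH convention that the 'from' and 'until' bounds are both inclusive. The function select_records is hypothetical. Because the bounds are inclusive, a harvester that passes the date of its last harvest as 'from' may receive a few records it already holds, and should simply replace them by identifier.

```python
from datetime import date


def select_records(headers, from_date=None, until_date=None):
    """Keep identifiers whose datestamp lies within [from_date, until_date], both inclusive.

    headers is a sequence of (identifier, "YYYY-MM-DD") pairs; a bound of None is open.
    """
    selected = []
    for identifier, stamp in headers:
        day = date.fromisoformat(stamp)
        if from_date is not None and day < from_date:
            continue  # unchanged since the harvester's previous visit
        if until_date is not None and day > until_date:
            continue
        selected.append(identifier)
    return selected
```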

DLRL also offer a Perl harvester client at:
http://www.dlib.vt.edu/projects/OAI/software/harvester/harvester.html

Alternatively, it would be quite simple to build a customised harvester client.
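A minimal customised harvester might look like the Python sketch below (standard library only; this is not the DLRL harvester). It requests oai_dc records, optionally only those changed since a given date, and follows the 'resumptionToken' that OAI-PMH servers return when they split a long list across several responses. The 'fetch' parameter is an assumption added so that the HTTP layer can be swapped out, for example in testing.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def http_get(url: str) -> str:
    """Fetch a URL over HTTP and return the body as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def harvest(base_url: str, from_date: Optional[str] = None,
            fetch: Callable[[str], str] = http_get) -> Iterator[ET.Element]:
    """Yield every <record> from ListRecords, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if from_date:
        params["from"] = from_date  # selective harvest: changed on or after this date
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        error = root.find(f"{OAI}error")
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # nothing new since from_date
            raise RuntimeError(f"OAI error {error.get('code')}: {error.text}")
        yield from root.iter(f"{OAI}record")
        token = root.findtext(f"{OAI}ListRecords/{OAI}resumptionToken")
        if not token:
            return  # an absent or empty token marks the last page
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

For example, `harvest(url, from_date="2003-08-01")` would yield only records added or changed on or after 1 August 2003, provided the server supports day granularity.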

DLRL also offer an OAI repository explorer at
http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai, which can be used to test a
server's support for the various OAI options.
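As a far smaller, hypothetical check in the same spirit (not the explorer's own code), the sketch below inspects an Identify response for the elements that OAI-PMH 2.0 requires: repositoryName, baseURL, protocolVersion, earliestDatestamp, deletedRecord, granularity and at least one adminEmail. The function name check_identify is illustrative.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
REQUIRED = ["repositoryName", "baseURL", "protocolVersion", "earliestDatestamp",
            "deletedRecord", "granularity", "adminEmail"]


def check_identify(xml_text: str) -> list[str]:
    """Return a list of problems found in an Identify response (empty if none)."""
    root = ET.fromstring(xml_text)
    ident = root.find(f"{OAI}Identify")
    if ident is None:
        return ["no Identify element"]
    # findtext gives None for a missing element and "" for an empty one.
    problems = [f"missing {name}" for name in REQUIRED
                if not ident.findtext(f"{OAI}{name}")]
    version = ident.findtext(f"{OAI}protocolVersion")
    if version and version != "2.0":
        problems.append(f"unexpected protocolVersion {version}")
    return problems
```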

In conclusion, OAI was found to be a simple, lightweight, easy-to-implement standard
for harvesting XML metadata over the web. Tools are available to set up a basic
client-server system conforming to the protocol in a relatively short time and with
minimal configuration. In this investigation, only the local Go-Geo! database was
made available for OAI harvesting. Although this could be extended to include other
records within the Geo-data Network and within the nodes making up the GIgateway
network, those responsible for these services expressed reservations about
permitting it. These services have to show that they are of value to the community,
and they do this partly by reporting the number of records retrieved by searches of
their databases. Once records have been harvested, the services have no way of
obtaining information about how those records are used. They also had concerns about
retaining intellectual property rights (IPR) once records were harvested and then,
possibly, harvested again by others.

								