Implementing SRU in Perl

As part of a grant sponsored by the National Science Foundation (NSF) called Ockham, the University Libraries of Notre Dame implemented a set of SRU modules and scripts written in Perl. This text describes this process in more detail.

Ockham

Ockham is a sponsored NSF Digital Library grant with co-PIs at Emory University, Virginia Tech, Oregon State University, and the University of Notre Dame. The primary purpose of the grant was to explore and implement programmatic methods of better integrating NSF digital library content into traditional library settings through the use of "light-weight" protocols. In general, light-weight protocols were characterized as non-proprietary, modular in design, and Web Services-based methods for data exchange and display. We used the name Ockham to denote our desire to be not overly complicated. Our example implementations included, among other things, a registry of digital library collections and services, a Find More Like This One service, and an Alerting service. For more information about Ockham see http://ockkham.org/.

Alerting service

Notre Dame was charged with implementing the Alerting service. This service, analogous to a current awareness service, is intended to provide the means of learning "What's new?" from the NSF Digital Library. In a nutshell, this is how it works:

1. The last thirty days of metadata content available from the NSF OAI Repository is harvested and saved locally.

2. The content is indexed.

3. Searches against the index are returned as HTML pages, email messages, or RSS feeds.

4. Every day, content that is one day old is harvested from the Repository.

5. Every day, content older than thirty days is deleted from the local cache.

6. Go to Step #2.

Through such an algorithm, the user is expected to articulate one or more searches against the index and then save useful queries as RSS feeds in their RSS news reader. Using this approach the user should be able to read their news feeds on a daily basis and view an ever-changing set of results.

Implementation

With the help of a very able and expert Perl programmer, the Alerting service was implemented through a set of object-oriented Perl modules and accompanying scripts:

Ockham::Alert supports the harvesting/caching/indexing process. Given a set of one or more OAI URLs and associated dates, this module allows the developer to harvest OAI content, save it to a local cache (relational database), and dump the data from the cache to an indexer. Since traditional library content also manifests itself as MARC data, the module supports the incorporation of this data into the cache as well.

SRU::Request and SRU::Response facilitate the implementation of an SRU server. The Request module is used to read the SRU operation parameter and create an associated Request object. Based on the type of the Request object, the SRU::Response module initializes and builds Response objects. The result of this build process is an XML stream in compliance with the SRU schema.

CQL-Parser provides the ability to read CQL statements and convert them into queries supported by an underlying indexer. The indexer used in this implementation is swish-e. CQL-Parser is essentially a port of Mike Taylor and IndexData's cql-java package, and we are appreciative of their support.

Links to source code and our implementation are available at

Discussion

The implementation more or less does what it was designed to do. For example, through a cron job the content of the cache is successfully updated and indexed on a daily basis. Free-text, rudimentary Boolean, and fielded queries are accurately supported by the SRU client and server. Since search results are returned to the client as XML, it is easy to transform the results into HTML, email messages, or RSS news feeds.

The downside includes problems with the indexer. Swish-e only supports 7-bit characters, and consequently non-ASCII characters are indexed and displayed poorly. Just as much of a problem is the regular harvesting of the data. While harvests are completed smoothly, items that are new to an OAI repository were not necessarily written recently. Instead, new items in an OAI repository are defined as items recently added. Things written in 1996 are not necessarily new, but they are returned in harvests because they are new to the repository. This can be confusing and frustrating to users.

Thus, we consider our implementation a qualified success. It is light-weight, standards-compliant, and non-proprietary. Libraries or other content providers could take the tools we have created and apply them to their own settings for their own purposes. A different indexer could be used, and an institution could make an effort to add only truly new items to their repository.

Eric Lease Morgan
University Libraries of Notre Dame
June 14, 2005
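
The rolling thirty-day cache described in steps 1 to 6 of the Alerting service might be sketched as follows. This is only an illustration written in Python, not the Ockham::Alert API, which this text does not show. The function names (prune, daily_update), the record fields, the example identifiers, and the in-memory dictionary standing in for the relational cache are all assumptions made for the sketch.

```python
from datetime import date, timedelta

WINDOW_DAYS = 30  # size of the rolling window named in the numbered steps


def prune(cache, today, window_days=WINDOW_DAYS):
    """Step 5: keep only records whose datestamp falls inside the window."""
    cutoff = today - timedelta(days=window_days)
    return {rid: rec for rid, rec in cache.items() if rec["datestamp"] >= cutoff}


def daily_update(cache, harvested, today):
    """Steps 4 and 5: merge the day's harvest into the cache, then prune."""
    merged = dict(cache)  # copy so the caller's cache is not mutated
    for rec in harvested:
        merged[rec["id"]] = rec  # a re-harvested record replaces the old copy
    return prune(merged, today)


if __name__ == "__main__":
    today = date(2005, 6, 14)
    # Hypothetical cache contents; a dict stands in for the relational database.
    cache = {
        "oai:example:1": {"id": "oai:example:1", "datestamp": date(2005, 5, 1)},
        "oai:example:2": {"id": "oai:example:2", "datestamp": date(2005, 6, 10)},
    }
    # Hypothetical result of the daily harvest of one-day-old content.
    harvested = [{"id": "oai:example:3", "datestamp": date(2005, 6, 13)}]
    cache = daily_update(cache, harvested, today)
    print(sorted(cache))  # prints ['oai:example:2', 'oai:example:3']
```

Note that the datestamp used here is the date an item was added to the repository, so the sketch shares the limitation raised in the Discussion: an item written long ago still counts as new if it was only recently added.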