Implementing SRU in Perl

As a part of a sponsored National Science Foundation (NSF) grant called
Ockham, the University Libraries of Notre Dame implemented a set of SRU
modules and scripts written in Perl. This text describes this process in
more detail.

Ockham

Ockham is a sponsored NSF Digital Library grant with co-PIs at Emory
University, Virginia Tech, Oregon State University, and the University
of Notre Dame. The primary purpose of the grant has been to explore and
implement programmatic methods of better integrating NSF digital library
content into traditional library settings through the use of
“light-weight” protocols. In general, light-weight protocols were
characterized as non-proprietary, modular in design, and Web
Services-based methods for data exchange and display. We used the name
Ockham to denote our desire to be not overly complicated. Our example
implementations included, among other things, a registry of digital
library collections and services, a Find More Like This One service, and
an Alerting service. For more information about Ockham see
http://ockham.org/.

Alerting service

Notre Dame was charged with implementing the Alerting service. This
service, analogous to a current awareness service, is intended to
provide the means of learning “What’s new?” from the NSF Digital
Library. In a nutshell, this is how it works:

   1. The last thirty days of metadata content available from the NSF
      OAI Repository is harvested and saved locally.

   2. The content is indexed.

   3. Searches against the index are returned as HTML pages, email
      messages, or RSS feeds.

   4. Every day, content that is one day old is harvested from the
      Repository.

   5. Every day, content older than thirty days is deleted from the
      local cache.

   6. Go to Step #2.

Through such an algorithm, the user is expected to articulate one or
more searches against the index and then save useful queries as RSS
feeds in their RSS news reader. Using this approach, the user should be
able to read their news feeds on a daily basis and view an ever-changing
set of results.

Implementation

With the help of a very able and expert Perl programmer, the Alerting
service was implemented through a set of object-oriented Perl modules
and accompanying scripts:

   - Ockham::Alert supports the harvesting/caching/indexing process.
     Given a set of one or more OAI URLs and associated dates, this
     module allows the developer to harvest OAI content, save it to a
     local cache (relational database), and dump the data from the cache
     to an indexer. Since traditional library content also manifests
     itself as MARC data, the module supports the incorporation of this
     data into the cache as well.

   - SRU::Request and SRU::Response facilitate the implementation of an
     SRU server. The Request module is used to read the SRU operation
     parameter and create an associated Request object. Based on the
     type of the Request object, the SRU::Response module initializes
     and builds Response objects. The result of this build process is an
     XML stream in compliance with the SRU schema.

   - CQL-Parser provides the ability to read CQL statements and convert
     them into queries supported by an underlying indexer. The indexer
     used in this implementation is Swish-e. CQL-Parser is essentially a
     port of Mike Taylor and IndexData’s cql-java package, and we are
     appreciative of their support.

Links to source code and our implementation are available at
http://alert.ockham.org/.

Discussion

The implementation more or less does what it was designed to do. For
example, through a cron job the content of the cache is successfully
updated and indexed on a daily basis. Freetext, rudimentary Boolean, and
fielded queries are accurately supported by the SRU client and server.
Since search results are returned to the client as XML, it is easy to
transform the results into HTML, email messages, or RSS news feeds.

The downside includes problems with the indexer. Swish-e only supports
7-bit characters, and consequently non-ASCII characters are indexed and
displayed poorly. Just as much of a problem is the regular harvesting of
the data. While harvests are completed smoothly, items that are new to
an OAI repository were not necessarily written recently. Instead, new
items in an OAI repository are defined as items recently added. Things
written in 1996 are not necessarily new, but they are returned in
harvests because they are new to the repository. This can be confusing
and frustrating to users.

Thus, we consider our implementation a qualified success. It is
light-weight, standards-compliant, and non-proprietary. Libraries or
other content providers could take the tools we have created and apply
them to their own settings for their own purposes. A different indexer
could be used, and an institution could make an effort to only add truly
new items to their repository.

                                                   Eric Lease Morgan
                                  University Libraries of Notre Dame
                                                       June 14, 2005
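
As an illustration of the daily cycle listed under Alerting service
(harvest the newest records, then delete anything older than thirty
days), here is a minimal sketch. It is written in Python purely for
illustration and is not the Ockham::Alert code. The function name
daily_update, the dictionary standing in for the relational cache, and
the record identifiers are all hypothetical.

```python
from datetime import date, timedelta

WINDOW_DAYS = 30  # size of the rolling window named in the Alerting steps


def daily_update(cache, harvested, today):
    """Apply one daily cycle and return the pruned cache.

    cache     -- dict mapping a record identifier to the date the record
                 was added to the repository (hypothetical stand-in for
                 the relational database cache)
    harvested -- iterable of (identifier, date_added) pairs from the harvest
    today     -- the date on which the cycle runs
    """
    # Step 4: add the newly harvested records to a copy of the cache.
    updated = dict(cache)
    for identifier, added in harvested:
        updated[identifier] = added

    # Step 5: drop records older than the window; a record that is
    # exactly thirty days old is kept.
    cutoff = today - timedelta(days=WINDOW_DAYS)
    return {i: d for i, d in updated.items() if d >= cutoff}


if __name__ == "__main__":
    today = date(2005, 6, 14)
    cache = {
        "oai:example:1": date(2005, 5, 10),  # outside the window, dropped
        "oai:example:2": date(2005, 6, 1),   # inside the window, kept
    }
    harvested = [("oai:example:3", today - timedelta(days=1))]
    cache = daily_update(cache, harvested, today)
    print(sorted(cache))
```

Re-indexing (Step 2) would then run against whatever the cache holds
after each cycle. Note that the dates here are dates of addition to the
repository, which is why, as the Discussion points out, an item written
years earlier can still appear as new.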