Clumps as catalogues: virtual success or failure?
Peter Stubley
If the concept of parallel searching of catalogues via Z39.50 is stimulating, the initial
manifestation is truly exciting. Maybe not exactly Alexander Graham Bell or Archimedes
territory but life-enhancing nevertheless: to have been working on the implementation of an
idea for over twelve months, as the UK eLib clumps projects have, and suddenly see
bibliographic records returned simultaneously from a search across multiple library
catalogues, makes it seem that all the arguments, stress and technical tinkerings have finally
been worthwhile. Only as the first exhilaration subsides does reality kick in, the questions start
and systematic testing take over.

There is little doubt that Z39.50 in its current state of development is not the panacea for all
the interoperability and resource discovery ills assaulting the conjoined international bodies in
librarianship and information science. Many of the problems – and the reasons for them –
have been well documented and, accordingly, are common knowledge in the Z39.50
community: three recent papers have appeared in Ariadne alone [1], [2], [3] and this present
contribution re-visits some of the issues and considers their potential effect on end users.

At the same time, the prospect of using clumps to kick-start large-scale resource discovery was
a major incentive behind their initiation, particularly as it opened up the
possibility of creating a UK virtual union catalogue. Of course, in parallel with the
clumps, JISC, the Joint Information Systems Committee, has also been funding COPAC, the
CURL OPAC [4], as a physical union catalogue and a study comparing these two approaches
to a national holdings database is expected during 2000 (a study on the implications of Z39.50
on COPAC was conducted on behalf of CURL in 1998 [5]). So, with current Z39.50
limitations, will the clumps projects – bearing in mind that they are not yet complete – be
consigned to the dustbin of eternity, seen as an interesting side road that inadvertently ended up
as a cul-de-sac? Against this, the projects have always insisted that they were not purely
technical exercises and, in considering success or failure, not only the impact of the technical
limitations needs to be considered – with the knowledge that improvements here will definitely
be forthcoming – but also wider issues relating to service. This paper is an attempt to provide a
brief, critical review of the current state of clumping in the UK. It begins by making some
comparisons between clumps and the alternative of a physical union catalogue, a comparison
that should not be seen, in any way, as a criticism of COPAC but simply as ‘a view from the
clumps’. Having established that clumps are ‘a good thing’, the paper then goes on to outline
ways of focusing searching via collection descriptions and dynamic clumping, considers some
technical concerns about Z39.50, and ends by looking at the possibilities for inter-connecting


Catalogues as collections

Apart from perhaps those sectors of society where a lingering idealism remains, the concept of
perfection, certainly of the perfect creation, has long since been overtaken by sheer
pragmatism, lack of time and economics. The idea of the perfect national union catalogue, a
comprehensive, monolithic edifice within which can be located quickly and precisely all items
for research, education or pleasure falls somewhere down the possibility stakes, though,
sensibly, that does not stop librarians re-invigorating the idea from time to time and updating it
to take account of the latest technologies. The key to a useful national union catalogue is
comprehensiveness, making it the ideal jumping-off point for ‘known item, only-one-in-the-
country searches’; the key to a useable national union catalogue mixes comprehensiveness
with other characteristics such as a good user interface, ease of access, speed of response and

Comprehensiveness in this context might be easy to define – all the books, monographs and
other items published by the nation, together with all foreign publications purchased by the
nation’s libraries – but creating (the perfection of!) the national union catalogue presents a
number of practical hurdles. Not least is the fact that significant collections of interest to staff
and students in higher education are held in non-HE libraries and, while cross-sectoral
initiatives are increasingly encouraged, the funding required for such a large-scale project is
unlikely to be forthcoming in the near future, in spite of the aspirations created by the
possibilities of joined-up government. In other words, a physical national union catalogue will
inevitably be manifest as a partial national union catalogue. Whether or not a physical
national union catalogue restricted to HE holdings alone would be doomed to failure remains
open to speculation. The sheer size of such an undertaking would appear daunting: CURL
currently has around 20 full members, representing approximately 20% of HE institutions in
the UK, thus leaving COPAC significantly short on the comprehensiveness front. Furthermore,
as the collections of the major UK research libraries are the foundation of COPAC, the law of
diminishing returns would be expected to kick in before too long, with each new library
representing a significant additional implementation work load for the data centre while
providing a decreasing number of unique items for the union database.

The importance of COPAC as a major tool used internationally by researchers, librarians and
others in checking the existence of books and their locations and when searching for
bibliographic records is beyond doubt. This pre-eminence will remain, whether or not COPAC
expands. As stated above, COPAC owes its importance to the significance of the collections of
its founding libraries but, even within the overall database, these collections could be viewed
as related but defined (and sometimes overlapping) groups: the legal deposit libraries;
universities having research excellence in particular subject areas; universities who have
gained excellent teaching quality marks in particular subject areas; and university libraries
with special collections. From this perspective, a physical union catalogue like COPAC is a
‘clump of clumps’ offering centralised access through a common interface, though in this
scenario there is still a risk that the ‘unique item’ search will be unsuccessful due to the
economic forces limiting comprehensiveness. A further limitation on comprehensiveness is the
retrospective cataloguing backlog in many university libraries, though this will equally affect
physical and virtual union catalogues.

The library collections in the eLib clumps reflect a similar – if currently smaller – spread.
RIDING [6], for example, contains a deposit library (BLDSC) and two CURL members
(Leeds and Sheffield) together with other strong research and teaching institutions and a broad
mix of special collections: a strong clump with a lower national comprehensiveness factor than
COPAC, though with the advantage that it includes catalogues not in COPAC. A combination
of clumps would increase the comprehensiveness factor while bringing in even more non-
CURL libraries. And, as COPAC itself is Z39.50-compliant, this could be searched as an
additional database in parallel with the clumps: at this point, the comprehensiveness factor
increases significantly, as, consequently, do the chances of locating that one-off unique item.
The opportunities for a further improvement in comprehensiveness lie with those clumps that
form strong links with public libraries, providing access to a myriad of special collections that
might otherwise be lost to higher education. RIDING and CAIRNS [7] are expecting to be
exemplars of cross-sectoral clumps though, as funding becomes available, there is the strong
possibility that public libraries will form their own clumps which will, in turn, link to those
from HE. The idea of considering a national union catalogue (physical or virtual) as a group of
collections that can be combined in different ways depending on the needs of the searcher
provides some potential for the unique item search, which could be ‘fired off’ to those
collections most likely to match the topic of the item, then broadened out from there if the
focused search failed. The ways of achieving this in a clumped environment through the use
of collection descriptions and dynamic clumping are indicated in the next main section.
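
As a rough illustration of the focus-then-broaden idea, the sketch below ranks catalogues by how well a set of subject keywords matches the topic of the item, searches the best matches first and widens the search only when nothing is found. Everything in it is an assumption made for illustration: the catalogue names, the keyword sets standing in for collection descriptions, the batch size and the `fake_search` stand-in for a real Z39.50 search are all hypothetical, and none of it is taken from the RIDING or CAIRNS gateways.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical collection descriptions: catalogue name -> subject keywords.
COLLECTIONS: Dict[str, Set[str]] = {
    "catalogue_a": {"engineering", "metallurgy"},
    "catalogue_b": {"history", "theology"},
    "catalogue_c": {"engineering", "medicine"},
    "catalogue_d": {"music"},
}


def rank_collections(topics: Set[str], collections: Dict[str, Set[str]]) -> List[str]:
    """Order catalogues by the number of query topics they cover, best first.

    Ties are broken alphabetically so that the order is reproducible.
    """
    return sorted(collections, key=lambda name: (-len(topics & collections[name]), name))


def staged_search(
    query: str,
    topics: Set[str],
    collections: Dict[str, Set[str]],
    search: Callable[[str, str], List[str]],
    batch_size: int = 2,
) -> Tuple[List[str], List[str]]:
    """Search the most promising catalogues first; broaden only if nothing is found.

    Returns the records found and the list of catalogues actually searched.
    """
    ranked = rank_collections(topics, collections)
    searched: List[str] = []
    for start in range(0, len(ranked), batch_size):
        batch = ranked[start:start + batch_size]
        searched.extend(batch)
        hits = [record for name in batch for record in search(name, query)]
        if hits:
            return hits, searched
    return [], searched


def fake_search(name: str, query: str) -> List[str]:
    """Stand-in for a real catalogue search (hypothetical holdings)."""
    holdings = {"catalogue_b": ["A rare treatise"]}
    return [title for title in holdings.get(name, []) if query.lower() in title.lower()]


if __name__ == "__main__":
    hits, searched = staged_search("rare treatise", {"engineering"}, COLLECTIONS, fake_search)
    print(hits, searched)
```

The design choice worth noting is that the less likely catalogues are contacted only after the focused batch has failed, which is the behaviour that would spare networks and library systems from undirected searching.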

Union catalogues and services

One of the key differences between virtual and physical union catalogues would appear to
revolve around service. The four UK clumps projects are being developed by consortia and a
consortium implies a group of libraries working together to provide, as far as possible,
extended co-operative services that benefit all members. A consortium implies trust, the
willingness to test new ideas and services with others, while offering degrees of protection for
one’s own institution and users. This is just what CURL, as the originator of COPAC, has done,
and it has explored – though not yet initiated – interlending possibilities as an extension of its
database. As far as the clumps are concerned, extended services – including access,
interlending and sometimes borrowing – have been closely related to each Z39.50 gateway.
For example, RIDING has created its RAP, the RIDING Access Policy, negotiated during the
project and allowing all ‘accredited researchers’ (postgraduate students, researchers, academic
staff) to borrow from any member library on production of a RIDING Access Card;
interlending services associated with the clump gateway are also being piloted among sites.

Services such as these necessitate a wide involvement at all staffing levels and across many
functional areas before being put into effect, with the consequence that the clumps have not
developed as isolated projects but have been integrated into existing services to real users at all
member libraries. A key advantage of this approach is that staff across each consortium know
each other well, understand and support the aims of the clump and accordingly respond to
users’ needs in a pro-active manner. By comparison, it is suspected that as the physical
catalogue model grows to improve its comprehensiveness it will become monolithic,
necessarily separated from real services, real users and individual libraries, and interpose a
layer of impersonality and, potentially, bureaucracy between the end user and the item
required. Certainly in comparison with an integrated resource discovery and request service
such as that being developed with Fretwell-Downing Informatics for RIDING, where an item
identified in a catalogue search can be dropped into an interloan request form and dispatched
to the home library for mediation, the relationship between a bibliographical record of an item
and its actual use is more remote in the physical union catalogue model.

Closely associated with this higher degree of service orientation from the virtual catalogue is
the ability to customise the gateway interface so that it more closely reflects the needs of
librarians and users of the home consortium. This might include information on consortium
services, details of individual libraries and their collections (though see below), descriptions of
limitations of the responses of individual Z-targets, and authorisation and authentication
procedures. The ways in which targets are presented to users may be changed to reflect
consortial circumstances, together with links to other clumps, some of which may have inter-
consortial relationships, some of which may not. Similarly, with greater ‘home’ control will
come the possibility of adding other, non-catalogue or non-bibliographical Z-compliant
resources, widening the range of the gateway and its potential usefulness.

Moving from the concept of the virtual union catalogue – the clump as ‘a good thing’ – to its
practical manifestation free of technical problems, is not a seamless operation and some of the
issues surrounding the transition are covered in the next two sections. But even if the
technology is flawless and even if clumps can themselves readily be clumped, how can users
navigate through the jungle to find the tree they need? Little enough is known about the links
between user needs, search behaviour and results, though the present generation of web search
engines does not always give cause for celebration, encouraging as it does undirected
searching. This approach does not help the user, particularly the increasing numbers with
limited time, and it potentially puts increased loads on networks and servers, neither of which –
this being emergent technology – has been tested. It could, however, be reasonably predicted
that a virtual national union catalogue that ground networks to a halt, slowed to a crawl
individual library management systems through external load, and wasted the time of users
would not be an unqualified success. From a different perspective, extremely high demand
could also blight the performance of the network connections to, and the machine performance
of, a physical union catalogue. Methods that enable the user to refine searches of the virtual
union catalogue should alleviate the problems and current attempts are focused on the
identification of collections. The use of collection level descriptions is being investigated by
RIDING and the rather different approach of dynamic clumping being pioneered by CAIRNS
is described by Dennis Nicholson in this issue of Ariadne.

Through the use of collection descriptions, it is hoped that users will discover collections of
interest and then search just the relevant databases rather than all those available. Furthermore,
it is hoped that users will be able to perform searches across multiple collections in a controlled
way and that software will perform such tasks based on known user preferences. The eLib
Collection Description Scheme has been developed as 29 metadata elements [8] and is being
implemented in RIDING; ways of encouraging use prior to a clump search, particularly
through interface design, are under discussion.

Several papers have dealt with the technical issues surrounding Z39.50, considering, in
particular, some of the deficiencies and difficulties of implementing the protocol across a range
of library management systems, in much the same way as the clump projects. Results of testing
the clumps in their present developing state indicate similar inconsistencies in the bibliographic
data retrieved, and the potential for confusion when these records are returned to end users
provides some cause for concern.

The RIDING experience of analysing Z39.50 search results is fairly typical. It suggests that
substantial inconsistency occurs in the response of targets to author searches: while a simple
surname search invariably delivers the expected goods, though with some data overload if the
surname is non-specific, there are pronounced differences between targets on the introduction
of first names into the search string. In these cases, some targets respond with zero hits
irrespective of the format of the first name, others produce a reduced sub-set using first name
initial, while others produce yet a different sub-set with first name in full. Attempting to
increase search specificity by combining keywords – say author and title – similarly works
well with some targets and completely fails with others.
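
To make the variation concrete, the sketch below builds the three forms of author term described above and wraps them in Bib-1 attributes, written in PQF, the prefix query notation used by the YAZ toolkit. The Use values 1003 (Author) and 4 (Title), the Structure values 1 (phrase) and 6 (word list) and the Truncation values 1 (right truncation) and 100 (do not truncate) come from the Bib-1 attribute set. The target names and the per-target settings are purely hypothetical, and nothing here is claimed to reproduce the queries sent by any clump gateway.

```python
from typing import Dict, List

AUTHOR = 1003  # Bib-1 Use attribute: Author
TITLE = 4      # Bib-1 Use attribute: Title

# Hypothetical per-target settings, keyed by Bib-1 attribute type:
# type 4 = Structure, type 5 = Truncation.
TARGET_PROFILES: Dict[str, Dict[int, int]] = {
    "target_a": {4: 1, 5: 100},  # phrase, do not truncate
    "target_b": {4: 1, 5: 1},    # phrase, right truncation
    "target_c": {4: 6},          # word list, truncation left to the target's default
}


def author_forms(surname: str, first_name: str) -> List[str]:
    """Return surname only, surname with initial, and surname with full first name."""
    return [surname, f"{surname}, {first_name[0]}", f"{surname}, {first_name}"]


def pqf_term(use: int, term: str, extra: Dict[int, int]) -> str:
    """Render one search term with its attributes in PQF.

    Assumes the term itself contains no double quotes.
    """
    attributes = {1: use, **extra}  # attribute type 1 is Use
    prefix = " ".join(f"@attr {kind}={value}" for kind, value in sorted(attributes.items()))
    return f'{prefix} "{term}"'


def author_title_query(target: str, author: str, title: str) -> str:
    """Combine author and title with a Boolean AND, using the target's settings."""
    extra = TARGET_PROFILES[target]
    return f"@and {pqf_term(AUTHOR, author, extra)} {pqf_term(TITLE, title, extra)}"


if __name__ == "__main__":
    for form in author_forms("Smith", "John"):
        for target in TARGET_PROFILES:
            print(target, pqf_term(AUTHOR, form, TARGET_PROFILES[target]))
```

Keeping the settings in a table per target, rather than in the query-building code, means that a target which fails on one attribute combination can be given another without touching the rest of the gateway.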

With a knowledge of the operation of the Z39.50 attributes, their combinations and order, the
methods behind the construction of the indexes of individual library management systems, and
other reactions of each target, responses such as that described above can be explained. It is
also possible that with ‘query adaptation’ results can be matched more closely to expectations
by adjusting the value of attributes and order in which they are transmitted to targets. If such
query adaptations do not work, then analytical explanations of the performance of targets
serve little purpose if the signals sent to end users result in a lack of confidence. One view –
already expressed by more than one RIDING librarian – is that the users of our library systems
possess well-honed searching skills and that any move away from the excellent capabilities of
the existing systems will result in negative feedback and bad clump publicity. Alternatively,
the overwhelming use of Web search engines, with an acceptance of high recall and low
relevance, must also have affected the expectations of users of library OPACs. It is extremely
difficult to predict reactions to new systems and it is just possible that users will bring a
tolerance to Z39.50 gateways not currently shown by some expert searchers of OPACs. This
tolerance, coupled with a little experience, familiarity and understanding – the qualities needed
in life – may elicit an enthusiasm born of the additional possibilities and services provided, in
spite of inherent limitations. The clumps intend to obtain feedback on these issues during

This is not to say that clump projects should (or will) rest on their laurels and fail to develop
better searching possibilities within the current restrictions of the Z39.50 standard: all the
projects are committed to implementing the best gateway possible. But when all the technical
steps have been taken, there will probably remain a need for pop-up boxes, help screens and
other devices to provide explanations on use and limitations, at least in the immediate future.
RIDING, for one, is developing these as an active service to users and Ridley [2] has indicated
similar support mechanisms in BOPAC (the Bradford [or ‘Better’] OPAC), to clarify the
retrieval of items that don’t, at first sight, appear to match the search criteria. Other help is also
at hand: there is little doubt that the formerly fragmented world of Z39.50 implementations has
benefited tremendously from the internationalisation of co-operation that has occurred over the
last few years and the recently announced Bath Profile [9] is the latest manifestation designed
to bring a greater harmony to the interpretation and use of attribute combinations and
catalogue interoperability.

One of the other current limitations with Z39.50 systems – the general lack of holdings and
circulation data – will hopefully have a platform for resolution in the near future. The ZIG
(Z39.50 Implementors Group) Holdings Schema [10] is expected to be ratified in January
2000 and some vendors are committed to implementing this within 6 months though, of
course, others may take longer (while some targets can already return at least holdings details).
Interpreting holdings information is further complicated by variations in cataloguing practice
between (and sometimes within) institutions.

While the previous section indicates a generally optimistic outlook for future developments in
Z39.50, the clumps projects have spent much of their two years or so of existence resolving and
clarifying technical issues to ensure that gateways to their groups of resources are up and
running. To move these individual clumps towards the beginnings of a virtual national union
catalogue by clumping the clumps ramps up the complexity and is something that has yet to be
explored in practice. It is hoped that the comments in the section below will contribute to that

The addition of an external resource to an existing clump for either one-off or regular
searching is straightforward: a secondary list of potential targets – COPAC, Library of
Congress, etc. – can be provided and users select those of relevance (possibly guided by
collection descriptions). For the duration of the search, the additional resources become, in
effect, part of the original clump. In exactly the same way, the secondary list could incorporate
other clumps, each of which would behave as a single target to the originating gateway. It is
assumed, in this scenario, that the originating gateway would not need to know the attribute
settings used by the secondary clumps for their individual targets: the gateway would be
treated just like any other end user. There is also the question of resource and network
efficiency which may lead to forcing a search of the home clump first. From the way the
existing clumps have developed, it is clear that there is good reason for this: a clump such as
RIDING has been established to offer services to the combined clientele of its institutions. For
example, non-RIDING members may search library catalogues via the RIDING Gateway but
they will not be eligible for added value services such as ILL. In this way, for members, the
home clump may be considered an access tool while for non-members it is a resource discovery

The practicalities of this scenario need testing clump to clump but, on the surface, would seem
to offer few problems. The most effective way of presenting other clumps to end users needs
addressing, in addition to a consideration of the requirements of end users for searching
secondary clumps. These might include: distance from study base (‘I’m prepared to travel x
miles or for y hours’); a geographical zone (‘I’m going to London for the weekend; which
libraries will offer…’); and resource strengths. Resolving these questions will raise user
interface issues and the possibility of extending collection descriptions by incorporating
geographic and travel information. In terms of technical efficiency, this approach would put
less load on any one node (e.g. the home clump) by ‘stacking’ servers, though it does not
reduce the overall network traffic or load on the resources being searched. Indeed it could
increase this by encouraging users to search the whole of a remote clump or even to search the
same resource twice, for example where the same catalogue is included in both a geographic
clump and a subject clump.
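
The risk of searching the same resource twice can be illustrated with a short sketch that expands a user's selection of clumps into individual catalogues and keeps each catalogue only once. It assumes that every clump publishes its list of members, for instance through extended collection descriptions. That is an assumption made here for illustration, since a secondary clump that behaves as a single opaque target would not expose this information. The clump and catalogue names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical membership lists. A member is either a catalogue identifier
# or the name of another clump (any name that is itself a key below).
CLUMPS: Dict[str, List[str]] = {
    "home_clump": ["catalogue_1", "catalogue_2"],
    "geographic_clump": ["catalogue_3", "catalogue_4"],
    "subject_clump": ["catalogue_4", "catalogue_5"],
    "national": ["home_clump", "geographic_clump", "subject_clump"],
}


def expand_targets(selection: List[str], clumps: Dict[str, List[str]]) -> List[str]:
    """Flatten the selected clumps into catalogues, each listed once, in first-seen order."""
    catalogues: List[str] = []
    seen_catalogues: Set[str] = set()
    visited_clumps: Set[str] = set()

    def visit(name: str) -> None:
        if name in clumps:
            if name in visited_clumps:
                return  # already expanded; also guards against clumps that include each other
            visited_clumps.add(name)
            for member in clumps[name]:
                visit(member)
        elif name not in seen_catalogues:
            seen_catalogues.add(name)
            catalogues.append(name)

    for name in selection:
        visit(name)
    return catalogues


if __name__ == "__main__":
    print(expand_targets(["geographic_clump", "subject_clump"], CLUMPS))
```

Because the selection is flattened before any query is sent, a catalogue that belongs to both a geographic and a subject clump receives the search once rather than twice.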

Of course, the straightforward scenario is simplistic. Few users will need to search all the
catalogues of a secondary clump, particularly with the implementation of effective collection
descriptions. Their needs will preferably be met by a pick-and-mix approach, choosing
catalogues from the home clump, adding a further resource from a secondary clump and yet a
further database from another clump location. The question then arises, ‘How do you do this
technically?’: in the absence of a universal adoption of the Bath Profile, can the attribute
settings, carefully adjusted to provide efficient searching from one clump gateway, be utilised
‘on the fly’ to external gateways? A further question is, ‘How do you do this in user interface
terms?’: what is the best way of presenting clumps to be used in a pick-and-mix manner? The
relevance of these issues to national clumping discussions has become apparent during the
existing projects, but current timescales and budgets will not allow their exploration in detail.
Additional work is required to take this further.

At present, both the physical and the virtual routes to a national union catalogue have
limitations. In terms of today’s technology, the physical model predominates but would require
the resolution of a number of organisational and political issues to provide the
comprehensiveness expected. For example, if the physical model were to be based on COPAC,
CURL would have to agree a way forward for a significant expansion of the database in
addition to discussing methods for incorporating non-HE records. Technology for the virtual
national union catalogue is moderately sound as far as it goes – basic search and retrieve –
while tomorrow’s developments will lead to improved interoperability and services to users via
initiatives such as the Bath Profile, the ZIG Holdings Schema, further refinements to collection
descriptions and the use of dynamic clumping. A considerable amount of work remains to be
done to develop inter-clumping and the user interface associated with it but the clumps do
indicate an attractive way forward, based as they are on services, customisation for their home
client groups and the possibility of extensions to include non-bibliographic databases. What is
clear is the tremendous enthusiasm on the part of both physical and virtual clumpers for
moving the technology forward and an awareness that they need each other to achieve real
changes in services for end users.

[1]. Dye, Juliet and Harrington, Jane (1999), Clumps in the real world: what do users need?
     Ariadne, no. 20. <>
[2]. Ridley, Mick (1999), Practical clumping. Ariadne, no. 20.
[3]. Miller, Paul (1999), Z39.50 for all. Ariadne, no. 21.
[4]. COPAC: <>
[5]. CURL feasibility study to investigate potential applications and strategic implications of
     Z39.50 technology on the COPAC service: <>
[6]. RIDING: <>
[7]. CAIRNS: <>
[8]. Brack, Verity (2000), The eLib Collection Description Scheme. Vine, no. 116 (due
[9]. The Bath Group, October 15 1999. The Bath Profile: an international Z39.50
specification for library applications and resource discovery. <
[10]. Z39.50 Holdings Schema: <>
