Sabbatical Report
Linda Newman
March 2008

This document reports on a sabbatical that was undertaken in four two-month segments:

January 2007: Stanford University Libraries (Palo Alto, California)

February 2007: California Digital Library (Oakland, California)

May 2007: University of Wyoming (Laramie, Wyoming) with visits to the University of
Colorado, the Alliance (formerly CARL), and on the return trip, the University of Kansas

June 2007: OhioLINK (Columbus, Ohio)

September/October 2007: development work from home

January/February 2008: development work from home

Stanford University Libraries

Attached is a report (which itself has several Stanford documents attached) about the
breadth and scope of digital projects at Stanford University Libraries and Information Resources
(SULAIR). Perhaps the most eye-opening discovery was the amount of resources – especially
staff resources – devoted to developing repositories that users would never directly touch.
Digital preservation is being pursued as a major research effort that may prove to benefit all
digital repositories. SULAIR seems to have made a qualitative leap: digital library work no
longer consists of extra projects on the side but is fully incorporated into the primary mission,
as reflected in the organizational structure and the deployment of resources.

I spent a significant amount of time working with Glen Worthy, head of the Humanities Digital
Information Service (HDIS), comparing notes on how we administer our Luna Imaging
installations. I observed the methods Stanford uses to build collections while bypassing the
Luna Studio client, and I also saw these methods, which are complex, break at times. See the
attached document entitled “Building Collections for Luna Insight – source tables versus
Studio/Inscribe.” We also consulted about different metadata standards, and I imported a small
collection of records that we had built using our metadata standards, based on VRA 3 (Visual
Resources Association), into a demonstration collection at Stanford built using a VRA 4 schema.
I gave a presentation to a group of Stanford librarians about the emerging VRA 4 metadata
standard and about how to build collections using the clients Luna provides.

California Digital Library

I worked with Rosemary Lack (Director of Digital Special Collections), Lena Zentall (Design
Analyst) and Brian Tingle (Technical Lead). My visit happened to coincide with a major effort
to produce an internal report that would recommend future directions for the Image Service, a
program targeted specifically at Art and Design libraries and visual resources curators. A
prototype system using Luna software had been offered to selected libraries but had proven
difficult to manage. (The ‘thick’ Luna client – something Luna Imaging is actively working to
eliminate with the version 6.0 platform – is not well suited to a consortium of multiple

At the same time, the CDL had spent considerable time developing ‘Calisphere’, which is
described as the University of California's free public gateway to a world of primary sources,
with more than 150,000 digitized items, including photographs, documents, newspaper pages,
political cartoons, works of art, diaries, transcribed oral histories, advertising, and other unique
cultural artifacts revealing the diverse history and culture of California and its role in national
and world history. One possibility was developing the Calisphere platform so that it could
support the Image Service. However, the input mechanism for Calisphere was based on METS
(Metadata Encoding and Transmission Standard), and an ingest method using METS that library
practitioners could directly support had not yet been developed.

A third possibility was using ARTStor, which had just announced the possibility of hosting
institution-specific (and access-limited) collections. However, these collections would be
physically hosted by ARTStor and it was unclear how the same images and metadata could be
preserved by CDL in its preservation system.

Although it was recognized that no solution was perfect, the decision was made to give ARTStor
the go-ahead for trial collections. It is this author’s opinion that eventually CDL will enhance its
open source and self-developed platforms such as Calisphere to the point that proprietary
software will not be desirable, but in the meantime, the hosted ARTStor solution appeared to be
more manageable than Luna for the multiple Art, Design and Visual Resources collections
within the University of California system.

Along with Brian Tingle, I also spent one afternoon meeting with Martin Haye, Publishing
Systems Architect, who is a primary developer of XTF, the “eXtensible Text Framework”, an
open source search and retrieval engine for texts, including texts presented using TEI (“Text
Encoding Initiative”), PDFs and other structured texts. Calisphere makes use of XTF as well.
The discussion convinced me that XTF could be a good fit as the University of Cincinnati
libraries start to support more text resources online.

University of Wyoming

The University of Wyoming proved to be perhaps the closest fit to the circumstances at
University of Cincinnati, both in staff resources and environmental factors such as membership
in a consortium.

Included with this report are my notes about the University of Wyoming (“UWYO.doc”), a
document describing the mission and standards used for digital collections at the University of
Wyoming (“AHC Digitization.pdf”), and a document describing the projects underway at
Wyoming (“ADR Testbed Overview.Doc”). In this last document, note the number of collections
slated both for the Alliance Digital Repository (ADR), a consortium of libraries with a mission
somewhat similar to OhioLINK's, and for Luna, which Wyoming also uses. UWYO libraries
anticipate using Luna’s XML gateway to export some collections to the ADR, supporting them in
both places; other collections, typically scholarly work by faculty, staff, or students, may go
only to the ADR.

Although the University of Wyoming’s student population is roughly 13,000, as the state's only
graduate-level university it carries many mandated roles and responsibilities: supporting the
research needs of students and faculty, cultural heritage responsibilities for the state (for
example, supporting the Wyoming State Historical Society's web site and digital projects), and
State Archives responsibilities. Thus, the mission of the University and the University Libraries
can fairly be compared to that of much larger institutions. The amount of work accomplished by
a small Library Systems staff was impressive given the scope and breadth of their stakeholders'
expectations.

University of Colorado

I met for half a day with Holley Long, Systems Librarian for Digital Initiatives at the University
of Colorado at Boulder. Colorado also uses Luna Imaging for collections arising from Library
special collections, and anticipates using the ADR for faculty and student work. More details are
provided in my notes (“University of Colorado at Boulder.doc”).

ADR – Alliance Digital Repository

I met for half a day with Jessica Branco Colati, Project Director of the ADR. Details are in the
included “ADR.doc” report, but it is notable that the ADR had made progress with FEDORA,
integrating ‘FEZ’, an interface to FEDORA from Australia. However, using FEZ meant that some
FEDORA enhancements could not be implemented until FEZ was first updated, and this was
proving difficult; in particular, the ADR was waiting for FEZ to support custom collection- or
institution-specific ‘skins’ (branding). In the meantime, the ADR was supporting two access
interfaces to largely the same set of materials: one based on XTF from the California Digital
Library and one using FEZ. Each search engine did things the other did not, and it was felt that
one would eventually win out. XTF searched within the text documents themselves, while FEZ
did a better job of searching the metadata.

University of Kansas

I had a one-day layover in Lawrence, Kansas on the return trip and met with John Miller, Special
Projects Librarian. The University of Kansas also uses Luna software for images arising from
library special collections, and John was a key manager in that effort. The Libraries also had a
search interface to TEI and PDF texts using XTF, and an institutional repository (managed at
the Computing Center) based on dSpace. There was also a separate GIS and Data Services unit;
a future reorganization combining Digital Initiatives and the GIS service was possible but not
definite. Perhaps most surprising to me was that there appeared to be no discussion about
merging the areas of functionality maintained separately by Luna, XTF and dSpace.

At Kansas they have been clear all along that Luna Insight is not the ‘repository of record’, since
until the most recent release it did not manage original images, only derivative (JPEG 2000)
images. Kansas also has a digital master machine which they call BeSafe (“Big Electronic Safe”).
Metadata is exported from BeSafe to Luna. (“Vireocat” has also been used as the cataloging tool
for some collections.) They have found that Luna’s XML gateway is the best way to get
metadata out of Luna.

OhioLINK

I spent June 2007 working next door to John Davidson (Assistant Director, DRC Development)
and Peter Murray (Assistant Director, New Service Development) at OhioLINK, and also spent
time talking with other staff, in particular Thomas Dowling (Assistant Director of Library Systems,
Client/Server Applications), Meg Spernoga (Assistant Director of Library Systems, User
Services), Sheila Yeh (Senior Systems Developer, working with XTF), and Anita Cook (Director
of Library Systems).

I shared the information I had obtained about the state of the Colorado/Wyoming Alliance
FEDORA project with John Davidson, confirming a suspicion we both had that development
with FEDORA was not coming along at anything close to the pace that OhioLINK users
required, although much interesting and high-potential work was in progress. I was able to
observe the committee meetings where OhioLINK made the decision to switch from FEDORA
to dSpace for a trial institutional repository. At the same time, the eBooks project was moving
forward, migrating from OhioLINK’s earlier platform to XTF (from the California Digital
Library). The DMC had recently been migrated from proprietary software to another open
source platform, DLXS Image Class, but this was still seen as a transitional move, not the
platform for an OhioLINK institutional repository.

OhioLINK also faced the dilemma that many of its journal indexes (the “EJC”) that were hosted
in-house were sitting on something called the ‘Science server’. This platform, which had recently
been picked up by Endeavor, was proving to be too elaborate in its latest release (from Endeavor)
and was referred to by Thomas Dowling as an ‘orphaned site’. It was thought that FEDORA
might still prove a replacement for this content, but this was also proving not to be a quick
development process. (It appears from email correspondence after my sabbatical visit that work
is now continuing with Postgres and SOLR.)

Electronic Theses and Dissertations (ETDs) reside on a system developed at OhioLINK.
Thomas Dowling continues to enhance this system and brought up an enhancement while I was
there. It is unclear whether OhioLINK will eventually merge the Dissertations and Theses
system with the dSpace institutional repository.

Still another platform was anticipated for EADs (Encoded Archival Descriptions) specifically for

Given that OhioLINK will continue in the near future to support collections on several
platforms (DLXS, SOLR, dSpace, XTF, and other in-house-developed systems and legacy
platforms), the need for a master search engine was apparent. Thomas Dowling is, I believe,
now developing such a search engine (the ‘All-in-One’ index) at OhioLINK.

Library practitioners will face practical challenges, in that specific formats may be well
supported on some platforms while still needing access integrated through the institutional
repository, the master index, or both. For example, if libraries have TEI texts or PDFs with
machine-readable text, it will be useful to be able to search within the texts, perhaps using the
XTF platform that is used for the eBooks collection. But, unlike the eBooks collection, these
texts may be part of a multi-format (videos, images, etc.) collection that should also be part of an
institutional community in the dSpace DRC repository. (This is similar to the dilemma that led
the Colorado-based Alliance ADR project to use two interfaces – FEZ/FEDORA and XTF – for
the same set of documents.)

While OhioLINK has for the time being moved away from its earlier, expansive plans to use
FEDORA as an underlying and unifying platform for all delivery systems, it is nevertheless
leveraging the open source platforms at hand to provide real and more immediate functionality
for the OhioLINK community. The desire for a unifying yet flexible platform may become a
priority again, as these topics are often cyclical. It may be that, after developing a robust
institutional repository with diverse content, OhioLINK will be better positioned to develop,
with the help of its library community, the next generation of repositories.

Development work

While continuing committee assignments and other production support work, I spent much of
my remaining sabbatical time installing a prototype XTF system on my personal laptop and
confirming that it could indeed index the UC News Record collection and would be an
appropriate platform for other texts developed for the University of Cincinnati Digital Press
publications, as well as for the still-proposed Pacific Railroad project, where we will use TEI to
deliver text. With some help from Martin Haye on the XTF Developers Forum, I got an XTF
system installed and functioning, and in the process uncovered a minor bug in the Windows
environment (see the included “XTF Users List.pdf”).


No Metadata Schema exists to fit all needs: Unlike digital projects at the University of
Cincinnati, none of the library systems and consortia I visited had attempted to describe all
collections with the same metadata schema, although Stanford did have a master schema from
which they created different sets of metadata elements for different collections.

No One Software Platform exists to fit all needs: Institutions with a wide range of available
resources are still facing the same problems of access, integration and preservation, without
finding that any one system, whether proprietary or open source, meets all three needs well.
Most of the library systems and consortia I visited are not expecting to invest in just one platform
for digital projects in the near future, but instead anticipate different systems for repositories of
record and for user access, and are also making distinctions between platforms for library
collections and the scholarly work of students and faculty. Instead of trying to accommodate
these needs on one platform, these libraries and consortia are looking for ways to provide master
search engines and integrated presentations of different approaches to the same materials.

Digital Preservation is Outpacing Digital Access as an Emerging Area: Digital preservation,
including building what are sometimes called ‘dark archives’, or at least repositories of record
that are not accessible to users and the public, is a topic of increasing interest, and at some
institutions, for example the California Digital Library and Stanford, it is demanding as many
resources as, if not more than, the user-accessible systems. Although this was not within the
original scope of this sabbatical, I felt that I had learned enough about digital preservation to
make a presentation at the October 2007 Academic Library Association of Ohio annual
conference. Please see the PowerPoint presentation and the paper on the ALAO site. (Peter
Murray graciously agreed to co-present on OhioLINK’s digital preservation plans, although his
presentation is not archived on the ALAO site.)

Digital Projects are Central to Libraries’ Missions: Digital projects as a whole are no longer
seen as boutique efforts to highlight collections of interest, but are becoming one of libraries'
most important and central missions. While few institutions will be able in the next decade to
share Stanford’s current goal of digitizing its entire print collection, all institutions and consortia
are looking at digitization efforts as requiring a comprehensive and consistent plan, not an ad hoc
