					                          XML: A New Technology for Libraries

        XML, or eXtensible Markup Language, has had subtle but pervasive and positive
effects on libraries. It is non-proprietary, relatively simple, and extensible. These and
other features make it valuable to librarians in many different ways and contexts. It is
integral to web interfaces and to linking interlibrary loan systems and consortia, and it is
exceedingly helpful for indexing and other uses of full-text documents.

                                           What XML Is

       XML stands for Extensible Markup Language. What this means is that the
document is “marked up” with tags. A tag is a bracketed label that is placed before and
after a piece of content. For example: <tag>example</tag>. Another markup language
you may be familiar with is HTML. HTML uses predefined tags to control the display of
the content. For example, you can make a piece of text display in bold by wrapping it in a
bold tag: <b>text</b> gives you “text” in bold on a web page.

        However, XML is more than that. Kyle Banerjee says, “On a related note, it is
better to think of XML as a grammar than as a language. XML establishes rules for
defining new formats.” (Banerjee 2002a) Whereas HTML has an exclusive list of
predefined tags that can be used to do only a certain list of predefined actions, XML also
gives a standard for creating your own, making it customizable to specific needs.
       “XML differs fundamentally from HTML in that it specifies neither a tag set (vocabulary) nor a
       system governing the meaning of particular tags (semantics). Instead, XML provides a set of
       specifications within which different publishers, authors, and document producers can create their
       own tags as a means to describe and organize their own particular content. As long as certain rules
       are obeyed, these tags can be read and processed correctly by any web browser no matter what
       computer system or software was used to create them.” (Miller and Clarke 2004)
This is one of the features that has made XML widely accepted and applied as a standard
for use on the web. It is highly customizable for each user’s needs, and that
customizability does not take anything away from its compatibility.

        What makes it particularly useful to librarians is that XML separates content from
display. XML tags are meant to define or label the content in ways that can be accessed
later on. “Many humans recognize the meaning embedded in various character strings,
but computers do not. In order to make selected strings of text addressable for machine
processing, a markup convention is necessary to distinguish one group of strings from
another.” (Miller & Clarke 2004) For example, a number can have many meanings in
many contexts. It can be a date, a page number, a dollar amount, or a count of objects.
On its own, the computer doesn’t know the difference. With a markup language like XML,
you can tell it the difference with tags: I was born in <year>1978</year> which
makes me <age>28</age> years old and I live at <address>1978</address>
<street>Sample Street</street>. With this information stored in an XML document, I can
instruct the computer to act on it. If I search this document for birth years, I won’t also
be given the parts of the document that relate to my address.
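
        The search described above can be sketched in a few lines of Python using the
standard library’s xml.etree.ElementTree module. This is only an illustration: the
<person> root element and the find_values helper are assumptions added here so that the
sample sentence forms a well-formed document, and they are not part of any library system.

```python
import xml.etree.ElementTree as ET

# The sample sentence from the text, wrapped in a hypothetical <person>
# root element so that it is a well-formed XML document.
DOCUMENT = (
    "<person>I was born in <year>1978</year> which makes me "
    "<age>28</age> years old and I live at <address>1978</address> "
    "<street>Sample Street</street>.</person>"
)


def find_values(xml_text, tag):
    """Return the text of every element named `tag`, in document order."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(tag)]


# Both values are the string "1978", but the tags keep them apart.
print(find_values(DOCUMENT, "year"))     # ['1978']
print(find_values(DOCUMENT, "address"))  # ['1978']
```

A plain text search for “1978” would return both numbers; searching by tag returns only
the one that was labeled as a birth year.
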
        The universality of XML is another one of its prized features, and one that makes
it so versatile and valuable, especially to libraries. Libraries are not tied down to some
particular type of software or hardware to continue to enjoy the benefits afforded them by
integration of XML into their various projects. They are not tied to a product that will
become obsolete in future upgrades of their own or others’ systems. “Due to Unicode
support and platform neutrality, XML offers the greatest promise of data longevity (or
future-proofing) as hardware, software, and network protocols continue to change.”
(Miller 2000)

         These features of XML, its ability to separate display from content, its extensible
nature, and its universality, make it a valuable tool for libraries and librarians. It can
help with many library challenges, including broader and easier communication
within consortia and interlibrary loan, full-text document markup for indexing and
searching, and richer metadata resources.

                              What XML Can Do for Libraries

        One of the major steps libraries have taken is to combine catalogs and access to
information in the form of consortia and the extensive use of interlibrary loan. This has
been a revolution for libraries. Patrons of one library can have access to information kept
in a wide range of cooperating libraries. They can use one search engine to reach all of
this information. However, this process has been limited by the proprietary software
used to link catalogs. The universality of XML allows it to be used
for creating more widespread and direct communication between such cooperating
libraries. It can make the cooperation easier and more expansive.

         XML is also very useful in marking up full-text documents. This enables quicker
and more relevant searching of these documents for needed information. Conventional
indexing is greatly enhanced by the use of XML. “Document-centric XML formats
promote the consistent markup of full-text articles, archival finding aids, books, etc., in
digital repositories or local websites. … To avoid problems of impermanence, metadata
embedded as content may best represent a source of uniform and reliable content by
which to support the automated extraction/harvesting of metadata maintained separately.”
(Miller & Clarke 2004)

        Metadata resources, bibliographic databases, and especially online catalogs
benefit from the use of XML. It allows a more direct interface with web browsers in a
format that is accepted by all sorts of web-related software. It is more forgiving of
new types of documents, such as web resources and electronic documents, than the rigid
structure of MARC. One advantage of XML in this area is its handling of hierarchical
structure, which allows better handling of relationships between items.
       “A system of discrete bibliographic entities coupled with a consistent linking mechanism would be
       very powerful and enable more sophisticated retrieval. Cataloging is laden with relationships that
       are not covered by or that do not utilize existing linking entry fields. XML offers sophisticated
       linking techniques, and related records need not even reside in the same system to be directly
       linkable.” (Miller and Clarke 2004)

                                 What XML Has Done for Libraries

       Many libraries have created specific projects that experiment with the many
benefits of XML. Most of these combine XML with other programs, languages, and ways
of handling data. None of them, however, would have been possible without the use of
XML technology.

        Interlibrary loan programs have been greatly advanced through the use of new
XML-based applications. Applications can be created to make internal handling of
interlibrary loan requests easier and faster, as well as assist communication among
systems working together in consortia. Oregon State University is one example of a place
that has taken advantage of this technology to advance its internal handling of interlibrary
loan requests.
       “Since 1998, Oregon State University has been using an application called Interlibrary Loan
       Automated Search And Print (ILL ASAP) to automatically search interlibrary loan requests and
       print request forms sorted by location and call number, complete with availability information,
       scannable Ariel addresses, shipping labels (if no Ariel address is present), and billing data
       customized to the borrowing library or consortium involved. This free application has been
       adopted by dozens of libraries around the country.” (Banerjee 2002b)
This application was originally telnet-based, until the limitations of that technology
became too much to surmount. It crashed with certain types of updates, and it was
becoming incompatible with current operating systems, making itself obsolete. OSU
needed to redesign the project so that it would be more flexible and cross-platform. They
found the best way to do this was to design an XML-based ILL ASAP. The conversion to
XML also made the application easy to modify for those with more modest technical
training; advanced programmers were not needed to make common modifications. The
effect XML had on OSU’s automated ILL system was to make it more resistant to problems
with upgrades, easier to maintain, and generally easier to run. As a result, OSU has been
able to save time and money on interlibrary loan. (Banerjee 2002b)

        The Washington Research Library Consortium had a different sort of need that
XML technology was able to address. Rather than internal automation of request
handling, they needed better, faster, and simpler ways to maintain communication
among the different sources and users in the Consortium. “The Washington Research
Library Consortium uses XML to provide access to subscription databases, digital
collections, materials requested via interlibrary loan, and library catalogs that run on a
combination of commercial, open source, and locally developed platforms. This system,
known as ALADIN (Access to Library And Database Information Network) not only
delivers content to seven academic research libraries, but also performs critical related
tasks such as patron authentication using XML messages transmitted between
applications over the Web.” (Banerjee 2002a) ALADIN is a service that is spread over
several applications in several areas of the consortium’s systems. In order to bring it into
an integrated whole, they built XML middleware that made communication between the
different parts simple and universal. It allowed the systems to communicate fluently
without being too closely tied together, which would have caused difficulties with
maintenance. (Gourley 2002)
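
        To give a rough sense of what passing XML messages between loosely tied
applications can look like, here is a minimal sketch in Python. It is not ALADIN’s actual
message format, which the sources quoted here do not describe; the element names
(authRequest, authResponse, patronId, library, status) and both functions are
hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def build_auth_request(patron_id, library_code):
    """Serialize a hypothetical patron authentication request as XML text."""
    request = ET.Element("authRequest")
    ET.SubElement(request, "patronId").text = patron_id
    ET.SubElement(request, "library").text = library_code
    return ET.tostring(request, encoding="unicode")


def handle_auth_request(xml_text, known_patrons):
    """Parse a request and answer with a hypothetical XML response.

    `known_patrons` is a set of (patron_id, library_code) pairs that stands
    in for whatever patron database a real service would consult.
    """
    request = ET.fromstring(xml_text)
    patron_id = request.findtext("patronId")
    library = request.findtext("library")

    response = ET.Element("authResponse")
    ET.SubElement(response, "patronId").text = patron_id
    allowed = (patron_id, library) in known_patrons
    ET.SubElement(response, "status").text = "ok" if allowed else "denied"
    return ET.tostring(response, encoding="unicode")


# The sender and the receiver share only the message format, not any code.
message = build_auth_request("p123", "GW")
reply = handle_auth_request(message, {("p123", "GW")})
print(ET.fromstring(reply).findtext("status"))  # ok
```

Because each side only reads and writes text in an agreed format, either application
could be replaced or upgraded without changing the other, which is the kind of loose
coupling described above.
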
        Where the index was once the one great resource for researchers exploring
nonfiction work, the capabilities of searching with full-text electronic documents allow us
much faster and easier research. It only makes sense to make as much use of this
advantage of electronic documents as we can. With XML, we can go beyond the benefits
of searching for terms and work on finding the relevant, desired information, without
being inundated with unnecessary, unrelated information. In a project at Halton Hills
Public Library, staff expedited their indexing work with XML by deciding to
“…automate the initial stages of markup, flag ambiguous terms, and then follow up with
a review by someone familiar with the work.” (Lewis 2002) This greatly reduced the work
of the indexer while making the results much more useful for the researcher. When
working with full text, XML is an invaluable tool for managing and locating needed
information.
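
        The general idea of an automated first pass that flags ambiguous terms can be
sketched as follows. This is not the Halton Hills workflow itself, whose details are not
given here; the term lists, the <passage> and <term> elements, and the review attribute
are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical term lists; a real project would draw on its own authority files.
KNOWN_TERMS = {"Acton": "place", "Georgetown": "place"}
AMBIGUOUS_TERMS = {"Milton"}  # assumed ambiguous: a place name or a personal name


def first_pass_markup(text):
    """Wrap known terms in <term> tags and flag ambiguous ones for review."""
    terms = sorted(set(KNOWN_TERMS) | AMBIGUOUS_TERMS, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")

    pieces = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untagged text so the result stays well-formed XML.
        pieces.append(escape(text[last:match.start()]))
        term = match.group(1)
        if term in AMBIGUOUS_TERMS:
            pieces.append('<term review="yes">' + escape(term) + "</term>")
        else:
            pieces.append(
                '<term type="' + KNOWN_TERMS[term] + '">' + escape(term) + "</term>"
            )
        last = match.end()
    pieces.append(escape(text[last:]))
    return "<passage>" + "".join(pieces) + "</passage>"


marked = first_pass_markup("The road from Acton to Milton opened in 1850.")
root = ET.fromstring(marked)
needs_review = [t.text for t in root.iter("term") if t.get("review") == "yes"]
print(needs_review)  # ['Milton']
```

The reviewer then only has to look at the flagged terms rather than marking up the whole
text by hand.
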

        Another large area where XML impacts libraries is in work with metadata. The
introduction of XML technology may change the face of bibliographic records
permanently. Machine Readable Cataloging, known as MARC, is a data format that has
been the standard for bibliographic record-keeping in libraries since the
beginning of automated cataloging. As it has grown and developed over time, many
difficulties have emerged, and many of them can be solved through modernization
involving XML. (Miller 2000) MARC includes inconsistencies, unnecessary complexities,
useless information, and a lack of support for hierarchies. (Johnson 2001) Stanford
University’s Lane Medical Library is just one place that has decided to tackle this
challenge by converting MARC records into XML. However, while developing a straight
conversion, XMLMARC, they realized that they could solve many of the problems with
MARC through a more extensive adjustment and conversion to an XML structure.
       “In looking for a data structure that would support library information on the web and also address
       what were seen as problems in current bibliographic description, Lane Library chose XML. Mr.
       Miller observes that the “significant aspect of XML may be its separation of content, presentation,
       and linking, so that each may be handled optimally.” Additionally, XML was seen as “inherently
       hierarchical” and “advanced and web-oriented.”” (Johnson 2001)
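
        As a rough illustration of what a “straight conversion” of a MARC-style record
into XML involves, the sketch below turns a simplified record into nested elements. It
does not reproduce Lane Library’s XMLMARC or any official schema; the sample record,
the element names (record, field, subfield), and the record_to_xml function are
assumptions made for the example. Tags 100 and 245 are the MARC fields for the main
personal name entry and the title statement.

```python
import xml.etree.ElementTree as ET

# A simplified, made-up record: each field is (tag, {subfield code: value}).
RECORD = [
    ("100", {"a": "Doe, Jane"}),
    ("245", {"a": "Sample title", "b": "a subtitle"}),
]


def record_to_xml(fields):
    """Convert a simplified MARC-style record into a nested XML element."""
    record = ET.Element("record")
    for tag, subfields in fields:
        field = ET.SubElement(record, "field", {"tag": tag})
        for code, value in subfields.items():
            subfield = ET.SubElement(field, "subfield", {"code": code})
            subfield.text = value
    return record


root = record_to_xml(RECORD)
title = root.find("field[@tag='245']/subfield[@code='a']")
print(title.text)  # Sample title
```

The nesting of subfields inside fields inside a record is the kind of hierarchy that
XML expresses directly.
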

         Libraries all over are coming to the same realization. MARC is limited, and XML
can provide just the type of modernization that they need to overcome their current
issues. Lane Library is not alone in developing tools to bring MARC into XML. Rebecca
Guenther, in an article describing MODS, another XML-based conversion of MARC
data, developed by the Library of Congress’ Network Development and MARC Standards
Office, concludes that “The emergence of XML as a standard encoding language
necessitates rethinking MARC for use in a new environment.” (Guenther 2003) XML and
the design of XML-based technologies for modernizing cataloging data have motivated
librarians everywhere to reconsider the structure of current cataloging standards. It is not
just a simple conversion task, but a trigger to take a new look at an old system, with the
opportunity to recreate it so that it is not only friendlier to new technology but also more
efficient and effective at its job.

        XML is not the kind of technology that catches patrons’ attention and makes them
marvel at the great new things their libraries are finding the funds to provide. From the
viewpoint of the end receiver of its benefits, it is subtle, almost invisible, as it makes
familiar services increasingly effective, efficient, and pervasive.
XML is making integration with the web, with online catalogs, with interlibrary
communication easier and more thorough. Librarians have become more familiar with
the multiplicity of ways in which XML can make their lives easier and their jobs more
effective. This technology is creating new ways of managing old information, as well as
new ways of containing and accessing information. In the end, it may revolutionize not
only the way in which we treat and interact with data; it may also allow us to reinvent
the way we organize and access data for the better. It has already done so through many
libraries’ experiments, including those mentioned above as well as many others not
mentioned here. It will continue to become more pervasive and more effective in libraries
around the world, and patrons will likely only notice that their service is just getting
better.


Banerjee, K., (2002a). How Does XML Help? Computers in Libraries, September 2002,

Banerjee, K., (2002b). Improving Interlibrary Loan with XML. In Roy Tennant
   (Ed.), XML in Libraries (pp. 31-41). New York: Neal-Schuman.

Gourley, D., (2002). Integrating Systems with XML-based Web Services. In Roy
   Tennant (Ed.), XML in Libraries (pp. 181-195). New York: Neal-Schuman.

Guenther, R. S., (2003). MODS: The Metadata Object Description Schema. Information
   Technology Perspectives, 3(1), 137-150.

Johnson, B. C., (2001). XML and MARC: Which is “Right”? Cataloging &
   Classification Quarterly, 32(1), 81-90.

Lewis, W., Richardson, G., & Cannon, G., (2002). Expediting the Work of the Indexer
   with XML. In Roy Tennant (Ed.), XML in Libraries (pp. 77-86). New York:
   Neal-Schuman.

Miller, D. R., (2000). XML: Libraries’ Strategic Opportunity. netConnect, Summer 2000.

Miller, D. R., & Clarke, K. S., (2004). Putting XML to Work in the Library: Tools for
   Improving Access and Management. Chicago, IL: American Library Association.
