Tone Merete Bruvik
HIT Centre, University of Bergen
XML in MALVINE and LEAF
MALVINE and LEAF. Gateways to Europe's Cultural Heritage
4th of December 2000 in the Staatsbibliothek zu Berlin
XML is changing from a buzzword for Internet freaks to a basic standard for the encoding of
document structures and the interchange of information. In the MALVINE project we have
made a converter that translates archive catalogues held in various local formats into one
common format: EAD (Encoded Archival Description). EAD is a DTD (Document Type
Definition) for XML documents, developed by the Society of American Archivists. Once
catalogues are held in EAD, we demonstrate how easily they can be transformed into other
formats, for instance into HTML or other local cataloguing formats. In the upcoming LEAF
project we will work, together with other groups, to develop and use a general DTD for the
encoding of biographical information in XML. A DTD in this field will make it easier to interchange and
harvest biographical information from various sources, which is one of the main goals of the
LEAF project.
Contents
What is XML?
How to use XML?
Why is XML important?
So what is XML really?
Demonstration of XML in the MALVINE project
XML in LEAF
What is XML?
Before talking about the use of XML in the MALVINE project, I will give a short
introduction to XML.
XML (Extensible Markup Language) is a W3C (World Wide Web Consortium)
Recommendation from February 1998 [XML 2000]. XML was developed from SGML
(Standard Generalized Markup Language) [ISO 8876].
XML looks similar to HTML, but XML is a metalanguage: a language for describing markup
languages.
Diagram 1. The relationship between SGML, XML, HTML and XHTML.
A markup language defines four things (from A Gentle Introduction to SGML, p. 13 in [TEI
P3]):
What markup is allowed
What markup is required
How markup is to be distinguished from the content of the document
What the markup means
Why is XML important?
Separates data from the software.
Software provider independent.
Simple and readable for both humans and computers.
Easy to transform into other formats.
The same information might be presented in many different ways.
Robustness over time.
So what is XML really?
This is a very simple XML document:
]>
Markup languages
There are several text markup languages, for
example:
HTML (HyperText Markup Language)
XML (eXtensible Markup Language)
SGML (Standard Generalized Markup Language)
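A document with this content might be marked up as in the following sketch. Only the text and the stylesheet name "apage.xsl" are taken from the example; the element names (page, title, para, list, item) and the small internal DTD are assumptions made for illustration. Python's standard xml.etree.ElementTree module is used here only to show that a program can read the structure back.

```python
import xml.etree.ElementTree as ET

# Assumed markup: only the text content and the stylesheet name "apage.xsl"
# come from the example; every element name below is hypothetical.
SIMPLE_DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="apage.xsl"?>
<!DOCTYPE page [
  <!ELEMENT page (title, para, list)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT para (#PCDATA)>
  <!ELEMENT list (item+)>
  <!ELEMENT item (#PCDATA)>
]>
<page>
  <title>Markup languages</title>
  <para>There are several text markup languages, for example:</para>
  <list>
    <item>HTML (HyperText Markup Language)</item>
    <item>XML (eXtensible Markup Language)</item>
    <item>SGML (Standard Generalized Markup Language)</item>
  </list>
</page>
"""


def describe(xml_text):
    """Parse the document and return its content as plain Python data."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title"),
        "intro": root.findtext("para"),
        "items": [item.text for item in root.iter("item")],
    }


if __name__ == "__main__":
    print(describe(SIMPLE_DOC))
```

Note that ElementTree checks only that the document is well-formed; it does not validate it against the DTD.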
This document does not contain any information about the layout; that information is kept in a
stylesheet. In this case an XSLT (Extensible Stylesheet Language Transformations) [XSLT]
stylesheet called “apage.xsl” is used. Displayed on the Web with the given XSLT stylesheet,
this XML document will look like this:
Demonstration of XML in the MALVINE project
Our SGML/XML feasibility study in the MALVINE project has shown that the various
catalogue formats used by the libraries and archives can be very well represented in XML or
SGML, and that the translation can be done with a rather simple computer programme. After
considering various DTDs we decided to use the EAD - Encoded Archival Description
developed by the Society of American Archivists [EAD 1998]. Our use of EAD has shown
that this DTD is very well suited for this kind of material. We have made the Local catalogue
format to EAD converter available on the web at
http://helmer.hit.uib.no/malvine/EADconverter.html. It works as shown in diagram 3:
[Diagram: Local catalogue (export) and conversion table -> The EAD converter -> XML]
Diagram 3. The EAD converter.
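The idea of a table-driven converter can be sketched as follows. This is not the MALVINE converter itself: the local field names and the conversion table are hypothetical, and the output is a simplified fragment (an archdesc with a did) rather than a complete, valid EAD instance. The element names unitid, unittitle, unitdate, origination and physdesc are taken from the EAD tag library.

```python
import xml.etree.ElementTree as ET

# Hypothetical conversion table: local field name -> EAD element inside <did>.
CONVERSION_TABLE = {
    "shelfmark": "unitid",
    "title": "unittitle",
    "date": "unitdate",
    "author": "origination",
    "extent": "physdesc",
}


def record_to_ead(record, table=CONVERSION_TABLE, level="item"):
    """Convert one local catalogue record (a dict) into an <archdesc> element."""
    archdesc = ET.Element("archdesc", {"level": level})
    did = ET.SubElement(archdesc, "did")
    for field, value in record.items():
        element_name = table.get(field)
        if element_name is None:
            continue  # fields without a mapping are skipped in this sketch
        ET.SubElement(did, element_name).text = value
    return archdesc


if __name__ == "__main__":
    # Made-up sample record, for illustration only.
    local_record = {
        "shelfmark": "MS 0001",
        "title": "Letter to a friend",
        "date": "1890",
        "internal_note": "not exported",
    }
    print(ET.tostring(record_to_ead(local_record), encoding="unicode"))
```

Keeping the mapping in a separate table means that a new local format needs only a new table, not a new programme.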
Samples of manuscript catalogues from some of the MALVINE data providers are given on
the page “Catalogues of manuscripts and letters, encoded in XML using EAD” at
http://helmer.hit.uib.no/malvine/EADpage.html. On this page we also demonstrate the power
of XSLT (Extensible Stylesheet Language Transformations), which is used to show different
views of the same XML encoded catalogue.
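The principle of several views over one encoded catalogue can be illustrated with a small sketch. It does not use XSLT: Python's standard library has no XSLT processor, so two plain functions stand in for two stylesheets. The catalogue fragment is made up and simplified, using the EAD elements archdesc, did, dsc, c, unittitle and unitdate.

```python
import xml.etree.ElementTree as ET
from html import escape

# Made-up, simplified EAD-like fragment used only for this illustration.
CATALOGUE = """<archdesc level="collection">
  <did><unittitle>Sample collection</unittitle></did>
  <dsc>
    <c level="item"><did><unittitle>First letter</unittitle><unitdate>1890</unitdate></did></c>
    <c level="item"><did><unittitle>Second letter</unittitle><unitdate>1891</unitdate></did></c>
  </dsc>
</archdesc>"""


def items(xml_text):
    """Yield (title, date) for every component <c> in the catalogue."""
    root = ET.fromstring(xml_text)
    for component in root.iter("c"):
        yield (component.findtext("did/unittitle", ""),
               component.findtext("did/unitdate", ""))


def html_view(xml_text):
    """First view: an HTML list of titles with dates."""
    rows = "".join(f"<li>{escape(title)} ({escape(date)})</li>"
                   for title, date in items(xml_text))
    return f"<ul>{rows}</ul>"


def text_view(xml_text):
    """Second view: tab-separated plain text, date first."""
    return "\n".join(f"{date}\t{title}" for title, date in items(xml_text))


if __name__ == "__main__":
    print(html_view(CATALOGUE))
    print(text_view(CATALOGUE))
```

Both views read the same source; nothing in the catalogue changes when a new presentation is added.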
The EAD converter we have made is used in the MALVINE project whenever a conversion
from one format to another is needed. For instance, some of the collections available in the
MALVINE cluster do not support the Z39.50 protocol. To make these catalogues
available through the MALVINE Search Engine, they are converted with our EAD converter
from the locally kept catalogue into a new catalogue hosted by an institution that has
Z39.50 available.
[Diagram: Local catalogue without Z39.50 -> The EAD converter -> EAD file -> XSLT processor (with XSLT stylesheet) -> copy of catalogue hosted with Z39.50; the MALVINE search engine searches this copy together with the local catalogues with Z39.50.]
Diagram 4. The EAD converter used in the MALVINE project.
XML in LEAF
In the upcoming LEAF project, we at the HIT Centre at the University of Bergen will develop
and use a general DTD for the encoding of biographical information in XML. This work will
be done together with other groups working in the same field. We hope to work together with
the community behind the development of EAD, especially Daniel V. Pitti at the University of
Virginia, who is the main architect of EAD. He pointed out the need for a new DTD to cover
biographical information in his paper “Encoded Archival Description, An Introduction and
Overview” [Pitti 1999]. This DTD will be based on ICA’s International Standard Archival
Authority Record for Corporate Bodies, Persons, and Families (ISAAR(CPF))
[ISAAR(CPF)].
A DTD like this will be a grammar for a general language to express the information used in
any kind of biographical record. This DTD will not replace the various local formats; it will
only make communication between the various systems easier.
A DTD in this field will make it easier to interchange and harvest biographical information
from various sources, which is one of the main goals of the LEAF project. In the same way as
we have used XML to interchange archival records in the MALVINE project, we will use
XML to interchange information about persons, families and corporate bodies in the LEAF
project.
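What such an interchange could look like is sketched below. The DTD does not exist yet, so every element and attribute name here (records, authorityrecord, authorityentry, existdates, source, type) is purely hypothetical, as are the two sources and their contents. The sketch only shows why a shared encoding makes harvesting simple: records from different providers can be read and merged by the same few lines.

```python
import xml.etree.ElementTree as ET

# Hypothetical records from two providers, encoded with the same assumed
# element names. All names and values are invented for illustration.
SOURCE_A = """<records source="library-a">
  <authorityrecord type="person">
    <authorityentry>Example, Person</authorityentry>
    <existdates>1850-1920</existdates>
  </authorityrecord>
</records>"""

SOURCE_B = """<records source="archive-b">
  <authorityrecord type="person">
    <authorityentry>Example, Person</authorityentry>
  </authorityrecord>
  <authorityrecord type="corporatebody">
    <authorityentry>Example Society</authorityentry>
  </authorityrecord>
</records>"""


def harvest(*xml_texts):
    """Merge records from several sources, keyed by (type, authority entry)."""
    merged = {}
    for xml_text in xml_texts:
        root = ET.fromstring(xml_text)
        source = root.get("source", "unknown")
        for record in root.iter("authorityrecord"):
            entry = record.findtext("authorityentry", "").strip()
            key = (record.get("type"), entry)
            merged.setdefault(key, []).append(source)
    return merged


if __name__ == "__main__":
    for key, sources in harvest(SOURCE_A, SOURCE_B).items():
        print(key, sources)
```

Matching on the exact form of the name is a simplification; real authority data would need a more careful way of deciding that two records describe the same person.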
Conclusion
The use of XML might look very technical. In one way it is, and in another it is not. Although
documents encoded in XML might look very technical, they are really very simple
compared to, for example, the document format used in Microsoft Word. XML is about having
control over your own documents, independent of the software provider.
References
[TEI P3] TEI Guidelines for Electronic Text Encoding and Interchange, edited by C. M.
Sperberg-McQueen and Lou Burnard, Chicago, Oxford 1994. Also available at
http://www.tei-c.org/Guidelines2/
[ISO 8876] International Organization for Standardization, ISO 8879: Information
processing - Text and office systems - Standard Generalized Markup Language
(SGML), 1986.
[XSLT] XSL Transformations (XSLT) Version 1.0, W3C Recommendation 16 November
1999, http://www.w3.org/TR/1999/REC-xslt-19991116
[XML 2000] Extensible Markup Language (XML) 1.0 (Second Edition),
http://www.w3.org/TR/2000/REC-xml-20001006
[Pitti 1999] Pitti, Daniel V.: “Encoded Archival Description, An Introduction and Overview”,
D-Lib Magazine, Vol. 5 No. 11, November 1999, available at
http://www.dlib.org/dlib/november99/11pitti.html
[EAD 1998] EAD Encoded Archival Description Tag Library, Version 1.0, The Society of
American Archivists, Chicago 1998. Also available at http://lcweb.loc.gov/ead/eadtlweb.html
[ISAAR(CPF)] International Council on Archives, International Standard Archival Authority
Record for Corporate Bodies, Persons, and Families (ISAAR(CPF)), Ottawa
1996. Available at http://dobc.unipv.it/obc/add/infap/archdes/isaar_e.html