MARC format and its rivals
Juha Hakala
Director, Information Technology
Helsinki University Library – The National Library of Finland
P.O.Box 26
FIN – 00014 Helsinki University
Finland
juha.hakala@helsinki.fi


This text gives an overview of how the technical infrastructure used in libraries
may change and what kind of impact this change might have on the metadata and
metadata formats we will use. The next section describes a framework for the
evaluation of metadata formats and analyses the MARC and Dublin Core formats as
practical examples. MODS and ONIX, two metadata initiatives related to Dublin
Core, are briefly discussed before the summary, in which we finally have some
tools to analyse the question of whether MARC has rivals and, if so, in which
areas.

Introduction

The MARC format was developed at the Library of Congress in 1966-1968. Since then
it has been actively used by libraries, first in simple applications which
enabled the maintenance of bibliographic records and the production of printed
catalogues and bibliographies, and later in what became known as integrated
library systems (ILS). These systems came into existence in the early 80s and
evolved steadily thereafter, until in the late 90s some kind of saturation point
was reached. There were many ILSs on the market that were able to satisfy the
needs of most libraries reasonably well. No single vendor dominated the market.
As in the 80s, the favourites changed from one year to the next, and libraries
used a wide variety of different systems.


By 2003, this equilibrium has all but broken down. There are two main reasons for
this: the need to provide efficient and unified access to Web content, including
electronic resources licensed by the libraries such as databases and e-journals,
and the need to handle electronic resources created and held by the libraries
themselves. To satisfy these needs, our systems must offer a host of new
features. We can group these features into two functional areas: digital object
management and information retrieval portal services.

Many library system vendors have started developing applications for these new
areas.

The practical implementations vary; some vendors (e.g., Innovative) provide all
functionality in one system, some make two systems available (e.g., Voyager and
ENCompass from Endeavor), while at least one vendor has three applications. The
Ex Libris product family consists of the ILS Aleph, the portal system MetaLib,
and DigiTool, which is a digital object management system (DOMS).



In the future we shall see how even the present monolithic integrated library
systems are split into smaller pieces; alongside the traditional ILS, libraries
may use a Z39.50 client from one vendor, the ILL module from another, and the
acquisitions system from yet another software house. This will, of course,
require seamless technical interoperability, secured by a large set of
international, national or, in a pinch, industry standards. Integrated systems
will be replaced by modular ones, which by definition are a perfect fit for
networked or consortia environments, and which allow libraries to combine the
best features available in diverse products.

As of this writing we are still far from the vision presented above, but the
systems used by libraries are going through a very active phase of development.
In fact, I have never seen anything like the present speed of change during my
15-year involvement with library automation. As is usually the case when things
change quickly, there are a lot of interesting problems we have not yet solved.

One of these issues is metadata. MARC – and cataloguing rules such as AACR2,
which provide the semantics for the syntactic representation of data defined in
the format – have dominated our thinking since the late 60s. However, it seems
likely that in the future libraries will use multiple formats, and probably also
different rules, in order to describe different kinds of things and cater for
different needs. For instance, basic bibliographic description can and probably
will still be done with AACR2 and some MARC format such as MARC21, but the
description of a library's collections or preservation metadata may be based on
another rule set and format, especially if this metadata is not stored in the
integrated library system. The actual choice will largely depend on future
format development and the systems used by the library.


Evolution of systems used by libraries

Library system providers have chosen different strategies to deliver the new
services needed by their customers and to support their own operation. Some
extend their present system to include all functional areas of the "triangle":
IR portal, ILS and digital object management system, while others have built
two, three and in the future perhaps even more applications. Some vendors may
lack the resources to provide the new functionality, or their present technical
infrastructure does not allow them to build the required services unless the
systems are rewritten entirely. Thus one outcome of the present change may be
that the number of vendors will continue to diminish. The companies left will be
the few which can develop the new software in house or license it from
elsewhere.

Compared with the ILS, IR portals and especially digital object management
systems are still in their infancy. There is nothing surprising in this. It took
quite a long time before ILSs became mature products, but portals have only been
available since 1998, and the first digital object management systems from
library system vendors were released as late as 2001.

As portals and digital object management systems remain somewhat incomplete, so
do our expectations regarding them. When Helsinki University Library launched
the project to select the portal for Finnish university libraries, there were
very few portal RFPs that we could use as models. During the project's lifetime
(1999-2003) portals evolved from rather modest tools for federated searching
into versatile applications that contain a lot of functionality – for instance,
context-sensitive linking – not available at all when our project began. Thus
our original RFP did not contain nearly all the functionality we eventually got
with MetaLib. It is difficult to demand services you are not familiar with, and
quite impossible to define them well enough that they can be implemented.

Parallel to the gradual maturation of portal applications, our expectations of
them have become much better defined and more extensive. Any library launching a
portal project now can find a number of RFPs – but would be wise to use only
those that are recent. The art of writing a portal RFP is still evolving quickly.

Compared with ILSs and even portals, digital object management systems
(especially those developed by the library system vendors) are still in the
early stages of development. It is somewhat surprising that these systems are
emerging only now. After all, Clifford Lynch postulated already in the previous
millennium that being able to manage electronic resources efficiently would be
the next great step for library automation. However, integrated library system
vendors did not quickly take up this market opportunity. Building an entirely
new system requires a lot of funding and human resources, and few ILS vendors
were able to make the investment while at the same time maintaining their
traditional ILS. Moreover, the demand for these systems among libraries is still
modest, although growing all the time.

One may also claim with at least some justification that building a digital
object management system is an entirely different challenge from building an
ILS. A library can tell the vendor in detail what an ILS must do; no such
consensus exists on the functionality of digital object management systems. It
follows that there is no agreement on the metadata formats and communications
protocols these systems should support either. But we may assume that all these
things will emerge during the next 10-15 years; this is what happened with the
ILS.

Digital object management systems from library system vendors seem to some
extent bound to the workflows of a traditional library system. While in an ILS
much of the data input is still manual, in digital object management systems
most processing is done automatically. For instance, any library establishing a
large collection of electronic newspaper articles will not process the incoming
texts manually, but will build tools which automate the process of loading the
new articles into the system. Yet, while digital object management systems do
have some batch import and export capabilities, these are so far relatively
unsophisticated.

From the point of view of metadata, the situation is very unclear, if not chaotic.

In order to function properly, a portal must contain many kinds of metadata. The
system must contain accurate descriptions of how to connect to remote databases,
using either the Z39.50 search and retrieval standard, the Z39.50 International
Next Generation (ZING) standard draft, or some proprietary means. These resource
descriptions may be very complex and require a lot of time to create, especially
if there are no automated means of collecting the access parameters (such as
which Z39.50 Bib-1 attributes and attribute combinations are supported) from the
remote database.
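
As a rough illustration of what such a resource description contains, a single
Z39.50 target might be recorded along the following lines. This is only a
sketch: the field names and values are invented for the example, whereas the
Bib-1 use attribute numbers (4 = title, 7 = ISBN, 21 = subject heading, 1003 =
author) come from the actual attribute set.

    # A hypothetical, simplified access description for one Z39.50 target.
    # The dictionary keys and values are invented; only the Bib-1 use
    # attribute numbers are taken from the real attribute set.
    target_description = {
        "name": "Example union catalogue",
        "host": "z3950.example.org",
        "port": 210,
        "database": "default",
        "record_syntax": "USMARC",
        # Bib-1 use attributes the target is known to support:
        # 4 = title, 7 = ISBN, 21 = subject heading, 1003 = author
        "use_attributes": [4, 7, 21, 1003],
    }

A real description would be considerably longer, covering character sets,
supported attribute combinations and authentication, which is one reason why
creating one can take so much time.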

When the number of databases available via a portal grows, the importance of
describing the important collections embedded in these databases [1] grows as
well. No user can make sense of a system providing access to 500 databases
unless the system can tell the user something about the contents of these
systems. Up to now this information has been unstructured and sparse; in the
future we must be able to provide structured and exhaustive descriptions – in
short, use some kind of format and rules, and a lot of human effort, to create
this metadata.

Often portals provide access to electronic serials, using the OpenURL protocol.
Licence information is then vital; the system must know which instance of a
networked article, if any, the user is allowed to access.

Digital object management systems have the same need for collection description
as portals; objects stored in a single system may belong to a large number of
collections which may have different access rights and search parameters. A user
must be aware of these differences in order to search diverse collections
efficiently. There is also an urgent need to develop a standard for preservation
metadata. [2] Libraries – especially national libraries – need to preserve the
published digital cultural heritage for future generations, and this is not
possible unless we have appropriate and sufficient preservation metadata about
the archived objects.

Thus, even a superficial analysis implies that in order to use portals and
digital object management systems efficiently, these applications must contain a
lot of different kinds of high quality metadata, which in some cases is volatile
– for instance, access parameters to a database or licences tend to change
often. I believe that there is no way individual libraries or system vendors
could keep this information up to date by themselves; we need international
cooperation between libraries and the other users of portal and digital asset
management systems in order to provide the metadata.

There is nothing new in this; libraries have shared each others' bibliographic
metadata since the late 60s. But this was made possible by the adoption of the
MARC format and sufficiently similar cataloguing rules. For the time being,
nothing like MARC or AACR2 exists for portal and digital object management
system metadata. Thus this information can only be exchanged between the users
of the same system, which has led to duplication of effort in metadata creation.

[1] As defined here, any OPAC or union catalogue may contain one or several
catalogued collections; for instance, Helsinki University Library's OPAC
contains part of the library's famous Slavic collection, approximately 60,000
records.

[2] There is no universally agreed specification of preservation metadata, but
we may say that it is data about digital objects and the technical environments
in which these objects were created and used. Preservation metadata should
enable continuous access to resources (via emulation, if needed) or the
migration of these resources into new document formats. Preservation metadata
can be split into two levels: properties belonging to a certain class of
documents (the technical requirements of every Word 2000 document) and
properties of an instance (specific properties of this Word 2000 document, such
as the possibility of incorporating footnotes into the proper place in the
text). Preservation metadata is not stable; any migration or technical change in
emulation will have an impact on preservation metadata at both the document and
document class levels.


Since a thorough description of a remote database in a portal may take up to one
week, this is a serious problem. We need metadata standards for resource and
collection description, licence information and the preservation of electronic
resources. If we do not have these standards, we cannot exchange this metadata,
which will lead to duplication of work. This would be frustrating, given our
long tradition of sharing bibliographic metadata. This tradition also gives us
some ideas of how to resolve the present, somewhat chaotic situation.


Standards and interoperability

Technical interoperability between library systems was for a long time limited
to the possibility of exchanging bibliographic data via ISO 2709, the
international exchange format. The Internet, and the subsequent creation of
library networks and consortia, changed this. It became necessary to pass
queries, result sets and ILL messages between systems. The ANSI/NISO Z39.50
Information Retrieval standard (ISO 23950) and the ISO Interlibrary Lending
(ILL) standard were developed in the late 1980s as a response to these needs.

Nowadays Z39.50 is supported by every major library system, although usually only a
small subset of the standard has been implemented. Demand for ISO ILL has been
much more limited, not least because important legacy systems have been slow in
building support for the standard.

Supporting only ISO 2709 and the two well-established protocols mentioned above
will, however, not be sufficient when co-operation within and between library
consortia becomes more versatile as the libraries acquire IR portals and digital
object management systems. Moreover, research libraries must also be able to use
the systems built by academic publishers for the dissemination of scientific
content. New interoperability requirements include support for the following:

   •   OpenURL. This protocol, which enables context-sensitive linking, is
       gaining acceptance very quickly. Shortly after the testing of OpenURL 1.0
       began in May 2003 there were already 20 organisations on the implementer
       list. With Z39.50 it took a long time to get this far, and all companies
       working with the protocol were from the library domain. This is not the
       case with OpenURL or with some other protocols presently under
       development in NISO or ISO TC46, bodies which have traditionally been
       seen as belonging to the library domain.
   •   ZING, or Z39.50 International Next Generation. This protocol, currently
       under development, combines the best features of the traditional Z39.50
       search and retrieve protocol and the Web, and will hopefully extend the
       use of Z39.50 semantics well beyond the library domain. As of this
       writing the applications remain few and far between, but they are certain
       to become more numerous in the near future. My personal wish is that the
       popularity of Explain will grow; as the data is in machine-readable form,
       via Explain ZING clients will be able to learn about remote databases and
       other services – which is one of the many prerequisites for the Semantic
       Web.
   •   NCIP. The NISO Circulation Interchange Protocol will enable libraries to
       exchange data about patrons, items and circulation transactions. In a
       consortium setting, this functionality is urgently needed.
   •   LDAP and Shibboleth. Authentication of patrons will be essential in the
       future, when most of the content purchased, at least by research
       libraries, will be electronic. With Shibboleth, queries about a patron's
       right to access a resource can be made efficiently, without violating the
       user's privacy.
   •   OAI. The Open Archives Initiative Protocol for Metadata Harvesting allows
       national and/or international cooperation in collecting metadata about
       dissertations, preprints or, in principle, any other resources, be they
       digital or printed, and in the creation and maintenance of central
       indexes (a harvesting sketch follows this list).
   •   Dublin Core, ONIX, MODS, and other bibliographic metadata formats. Our
       future systems must be able to deal with different kinds of metadata,
       because we will receive data from and send it to many organisations, and
       not all of these will be using MARC. Although libraries will rely on
       AACR2 for a long time, and may still for a few years primarily use the
       MARC21 syntax, we must be prepared for other choices. It is quite
       unlikely that portal and digital object management systems will rely on
       MARC21, and I do not think that AACR2 will be extended to the required
       new areas in the near future. As it seems very likely that multiple
       formats will be used in parallel, we must understand the implications of
       this, for instance for semantic interoperability, and take this into
       account when designing the new formats. This task is not made simpler by
       the fact that in the future a significant part of metadata creation will
       be automatic. For instance, a good digital object management system must
       be able to glean whatever embedded metadata the resources loaded into it
       have, and then index this metadata in a meaningful way.
   •   Future formats for diverse other needs, such as the long-term
       preservation of electronic resources, digital rights management,
       licensing information, and the description of learning objects and
       government publications. In the best case these formats will be built
       using existing formats. For instance, we may have a Dublin Core
       application profile for collection description, or MARC, Dublin Core and
       ONIX specifications for preservation metadata, hopefully each applying
       similar semantics.
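
The OAI-PMH harvesting mentioned above is technically very light; a harvester is
essentially an HTTP client that asks a repository for records in batches. The
sketch below shows the minimal request, assuming a hypothetical repository
address; the verb and metadataPrefix parameters are part of the actual OAI-PMH
specification, while everything else is illustrative.

    # A minimal OAI-PMH harvesting sketch. The repository base URL is
    # hypothetical; verb=ListRecords and metadataPrefix=oai_dc are
    # genuine OAI-PMH parameters.
    from urllib.parse import urlencode
    from urllib.request import urlopen

    BASE_URL = "http://repository.example.org/oai"

    def list_records(metadata_prefix="oai_dc"):
        query = urlencode({"verb": "ListRecords",
                           "metadataPrefix": metadata_prefix})
        with urlopen(f"{BASE_URL}?{query}") as response:
            return response.read()  # XML payload, to be parsed and indexed

A central index would then parse the returned Dublin Core records, merge them
into its own database, and repeat the request with the resumptionToken the
repository returns until the batch is exhausted.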

NISO and ISO TC46 will have a full agenda in the years to come, developing new
standards, revising existing ones and providing guidelines on how to use them.
It is to be hoped that libraries will continue to participate actively in
standardisation work. There is no better basis for efficient technical
interoperability than a good standard, and good guidelines on how to implement
it.

Standards are of little use if they are not implemented. Libraries must demand
that their vendors rely on standards instead of proprietary solutions. Otherwise
our future modular systems will not be able to communicate with each other
properly, and our vendors will be forced to do a lot of duplicate work creating
application-specific interfaces to multiple other systems. This would endanger
the shift from the present single-vendor-dominated environments to a more
versatile future.

At present, portals and digital object management systems are proprietary from
the metadata point of view. MetaLib users cannot exchange metadata with the
users of other portals. Moreover, they cannot even pass metadata directly to
each other; all metadata sharing is based on a database maintained by Ex Libris
Ltd. This leads to duplication of effort; there are already at least two other
databases containing access information for a large selection of freely
accessible Z39.50 databases, one maintained by Index Data in Denmark, and
another by the SeaChange company in Canada. How many such systems, maintained
either by libraries or by commercial vendors, will emerge before we can agree on
an exchange standard, and on an organisation to maintain a union catalogue of
this type of data?


Resource description

The Anglo-American cataloguing rules and other well-established rules for
bibliographic description based on ISBD, together with the MARC formats (MARC21,
UNIMARC etc.), have given us a solid basis for resource description since the
early 1970s. These standards have provided us with semantics and syntax for
bibliographic data and, more generally, defined the limits of what can and
should be catalogued into an integrated library system. Due to the ubiquitous
support of ISO 2709 in library systems, we nowadays take for granted the
possibility of exchanging MARC records between ILSs.

No such solid background yet exists for portals and digital object management
systems. There are neither cataloguing rules nor an exchange format for portal
metadata. In spite of diverse local attempts there is as yet no universal
agreement on a preservation metadata element set for electronic resources. There
is no standard for the description of (electronic serial) licence information.
The scope of resources that require description has widened very fast, due to
the implementation of digital object management systems and portals, and the
need to exchange metadata between these systems.

Let us take a quick look into the crystal ball and speculate a little about what
is going to happen.

Rules and formats

The present chaotic state of the art will, if it continues for long, lead to a
lot of duplicate work. It may take up to one week to accurately encode, map, and
troubleshoot the metadata needed to connect a remote database accessible via
Z39.50 to a portal application. If many libraries in different parts of the
world do this for the same database, over and over again, the result is a huge
waste of human effort. To make things worse, access parameters change often,
especially for non-Z39.50 systems. This means that portal metadata is not only
difficult to encode and map; there is also a need for frequent updates of this
information. It is difficult to see how this information could be properly
maintained without co-operation between libraries, co-operation which hopefully
also involves the database hosts.

The library community has developed the cataloguing rules and the MARC formats
in order to be able to share the burden of bibliographic description. In the
future, just as we now share bibliographic information, we will also share the
new kinds of metadata used in portals, as well as the mappings that make one
type of metadata interoperable with another. There will be new rules and
metadata formats for this metadata (and for the metadata mappings), and there
will also be union catalogues which will make it easy to share this metadata on
a global scale. Libraries have developed a co-operative model which works very
well for bibliographic data; I see no reason why we could not and should not
transfer the same basic ideas to the new areas of resource description in order
to improve the efficiency of library operations.


Although the general direction is reasonably clear, many details will only be
solved as part of the actual development effort. But it seems obvious that in
the future there will be multiple sets of rules for resource description, and
consequently many metadata formats used in parallel. We may even see cases where
one set of rules is applied in different environments to provide, for instance,
a MARC preservation format and a Dublin Core application profile, plus XML and
RDF syntaxes, for preservation metadata.

Given the variety of tasks ahead, there is no way a single team of developers
could supply everything we need. One group of experts will develop the rules and
exchange format for, say, the description of resources in a portal; another
group will be responsible for defining the rules and syntax for preservation
metadata, and so on and so forth. At least at this point it is hard to see any
organisation that could co-ordinate this work in such a way that conflicts
between the present and future specifications could be avoided. There may be
significant semantic overlap and various conflicts between them, especially
concerning what kind of metadata should go into which systems.

One example of such a conflict has already emerged. The traditional means of
providing links from bibliographic records to networked resources is the
provision of URLs in the 856 field of the MARC record. This static-linking
approach has numerous shortcomings, including difficult maintenance in a
consortium setting (when URLs change, and this happens frequently, the data must
be updated in every MARC database) and the lack of a standard approach to
handling authenticated access: every user who sees the record, including those
who are not entitled to access the resource, sees the link.

OpenURL-based dynamic linking is functionally superior to static linking, but it
requires that the linking information be removed from the 856 tag and stored in
the resolver system instead. Thus data about an electronic resource will be
split between two systems, one containing the bibliographic information, the
other holding the information about access rights and the resource's physical
location. If and when preservation information is placed into digital object
management systems, there will be three systems which all contain information
about the same resource, possibly each in a different format. There will be an
urgent need for linking the information in these three systems.
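
To make the dynamic linking idea concrete: an OpenURL is simply a resolver
address followed by a structured description of the wanted resource, and the
resolver decides which instance, if any, the user may access. In the sketch
below the resolver address and the article data are invented, while keys such as
genre, issn, volume and spage come from the OpenURL 0.1 specification.

    # Constructing an OpenURL 0.1-style link. The resolver address and
    # article data are hypothetical; the query keys are defined in the
    # OpenURL 0.1 specification.
    from urllib.parse import urlencode

    RESOLVER = "http://resolver.example-library.fi/openurl"

    article = {
        "genre": "article",
        "issn": "1234-5678",   # illustrative ISSN
        "date": "2003",
        "volume": "12",
        "issue": "3",
        "spage": "45",
    }

    print(f"{RESOLVER}?{urlencode(article)}")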

Ownership

In the early stages of the portal business at least some vendors were tempted to
see the metadata stored in their application as a strategic asset of the
company. In the long run such an approach may not be viable, because:

    1. As more and more remote resources are made accessible through portals –
       there are thousands of bibliographic databases out there – it becomes
       impossible to maintain the metadata about these systems centrally.
    2. Metadata for resource access is often highly volatile. A central
       maintenance organisation should react very quickly to customer feedback
       whenever a remote database becomes inaccessible via the portal. This may
       happen due to a very small, innocent-looking change in the target system,
       such as the fixing of a spelling error (especially if the target is not
       compliant with Z39.50 or any other generally known information retrieval
       standard). Practical experience of using portals has already shown that
       there are problems in keeping resource descriptions in portals up to
       date.
    3. A centralised maintenance organisation will not be able to access all
       target systems, due to a lack of access rights or simply because it lacks
       the language skills. And even if its staff can connect to a system and
       use it, they do not necessarily know how to use the system efficiently.
       Only the people who use a database frequently are capable of providing
       thorough and complete access information.
    4. If and when centrally provided metadata is incomplete or contains faults,
       the data must be edited locally. The combination of local and centralised
       maintenance poses some interesting logistical problems, especially when
       the same record has been modified in two places.

If and when the portal metadata and mappings are created and maintained by
libraries, there should be no question about the ownership of this data. As of
this writing this issue is still not clear, partly due to the major investments
some vendors have made in developing portal metadata. While we acknowledge this
effort, in the long run it is clear that libraries must maintain and own this
metadata, just as we (usually) own the metadata in integrated library systems.


Evaluation of formats

We tend to look at metadata formats as technical tools. This is also the primary
point of analysis in this text; in the following chapter I will largely ignore
the cultural, political and organisational aspects of format development and
maintenance. These issues have, however, had a major impact on how the MARC
formats have developed over time.

Any thorough technical evaluation of metadata formats should pay attention to at least
the following aspects:

Pragmatics

What kinds of resources or things can be described with the format, and for what
purpose? Most metadata formats initially had limited aims, which have little by
little been broadened. For instance, MARC initially enabled only bibliographic
description, and even in that area provided only limited means (the original
MARC covered only a limited set of materials). But the format was designed in
such a way that extending its scope was easy. In its present form MARC can be
used to describe a diverse set of things, ranging from printed books to maps to
music (and even museum collections) to holdings and authority data. Likewise,
Dublin Core has been extended to new areas via the usage of application
profiles.

Metadata was traditionally created and used for resource discovery, but that is
by no means the only reason why e.g. libraries spend a lot of money on resource
description. For instance, the long-term preservation of electronic resources
depends on high quality metadata, and the usage of these materials may require
the encoding of licence information in the libraries' portal applications.


Semantics

Metadata is structured data about data. Unless there are rules which describe
how the metadata is extracted from the described resources or from other
sources, there is little chance that metadata created in different organisations
will be interoperable, even if the same format is applied. This is why Dublin
Core-based projects generally have a relatively low level of semantic
interoperability.

Thus any proper evaluation of a format should also investigate the underlying
rules. Sometimes there is a close connection between a format and rules; MARC21
and AACR2 are a good example of this, although in the beginning no such link
existed: when the MARC standard was under development, the people responsible
for the work had to rely on the Library of Congress cataloguing rules, since
AACR was at that time still under development.

Dublin Core is an example of a format which has no relation to any existing
formal rules for resource description. Any project or community using the format
should and probably will develop its own "cataloguing rules", relying on the
general guidelines provided by the Dublin Core Metadata Initiative. These
community- or project-based rules may be either very generic or extremely
complex, depending on the aims of the work.

The semantics of metadata is often based on the existing practices of a
community in its cultural and geographical setting. Thus archives have a
different view of their resources than libraries, even if both organisations are
dealing with the same kind of material, for instance personal archives.
Moreover, even within one community, differing histories and traditions have
generated variety. German libraries developed different cataloguing rules than
American ones, due to cultural differences between these countries. It is
interesting that the need for sharing bibliographic data has now overruled the
previous practices; in Germany a decision has been made to discontinue the
development and use of both the national cataloguing rules (RAK) and format
(MAB) and to start using AACR2 and MARC21. If and when AACR2 and its future
versions become the global standard, there will be a major need to build the
system in such a way that it truly suits all countries. The ongoing
harmonisation of the major cataloguing rules (ISBD and AACR, triggered also by
fundamental changes in cataloguing such as FRBR) will help the library community
achieve this aim.

The complexity of formats, and consequently the difficulty of using them, varies
a lot. For instance, MARC21 has about 2000 different data elements (fields,
sub-fields and codes) while Dublin Core in its basic form has only 15 elements.
This difference can be explained by the different backgrounds and uses these
formats have. Some critics with a library background have claimed that Dublin
Core is just simplified and badly done MARC. MARC was built for the library
community and has to (or should) satisfy all our resource description needs. But
Dublin Core is a generic tool, independent in its basic form of any community's
requirements (although Dublin Core application profiles are usually optimised
for some smaller group of users or for a certain resource description need).
Therefore, if Dublin Core were suitable for libraries – and libraries only –
this would be a fundamental problem in the system. Because formats have often
been built for different communities, any one-to-one comparison between them is
useless.

A fruitful approach for the semantic analysis of formats is the creation of
crosswalks. A crosswalk gives the user communities of the formats involved a
basic understanding of the semantic interoperability between the two systems.
Any deeper understanding of this issue requires, however, the actual conversion
of data. Even if two formats have a similar data element (such as title in MARC
and Dublin Core), conversion may be difficult because the actual syntax of the
data is different, or because the two communities have different pragmatics –
different uses of the metadata element in question.
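
As a minimal sketch of what a crosswalk looks like in practice, the mapping
below relates a few MARC 21 fields to unqualified Dublin Core elements. The
field choices (245 $a to title, 100 $a to creator, 260 $b to publisher) follow
the general lines of the published MARC-to-Dublin-Core crosswalks, but the code
itself, including the simplified record representation, is only an illustration.

    # A sketch of a MARC 21 -> Dublin Core crosswalk. The mapping follows
    # the general lines of published crosswalks; the record is represented
    # here as a plain dictionary for simplicity.
    MARC_TO_DC = {
        ("245", "a"): "title",
        ("100", "a"): "creator",
        ("260", "b"): "publisher",
        ("650", "a"): "subject",
    }

    def crosswalk(marc_record):
        """Convert {(tag, subfield): value} into a Dublin Core dict."""
        dc = {}
        for (tag, code), value in marc_record.items():
            element = MARC_TO_DC.get((tag, code))
            if element:
                # Everything the target format lacks is silently dropped,
                # which is exactly the loss discussed above.
                dc.setdefault(element, []).append(value)
        return dc

    example = {("245", "a"): "MARC format and its rivals",
               ("100", "a"): "Hakala, Juha"}
    print(crosswalk(example))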

Syntax

Even a very detailed semantics for metadata, such as AACR2, does not make the
data computer readable. Syntax is the means of representing the metadata in such
a form that it can be processed automatically.

While the semantics of a format is usually – but not always – independent of
technology and relatively stable once it has been defined, its syntax is
dictated by the existing information technology (e.g. ISO 2709 was originally
developed for magnetic tape). As regards the encoding of data, during the last
decades there have been at least three standard solutions for exchanging
information in a computer-independent form. These are ASN.1/BER, SGML/XML and
RDF.

One of the main aims of the MARC pilot project was to enable the exchange of
metadata between libraries. This can only be done if the data can be represented
in a machine-independent form. In the 1960s this was a major challenge, since
none of the standards mentioned above existed at that time. Moreover, the means
by which computers stored data varied even more then than they do nowadays.
There were no character set standards, and relational databases had not even
been invented yet.

In this situation, the development of MARC as a generic means of exchanging
bibliographic information was a major achievement. As far as I know, no other
community has been able to develop anything like the MARC format; its popularity
and longevity are exceptional. We will analyse MARC in more detail later; the
next chapters are dedicated to the three systems mentioned above.

ASN.1/BER, or Abstract Syntax Notation One / Basic Encoding Rules, was developed
in the 1980s as part of ISO's Open Systems Interconnection (OSI) standardisation
work. Like many other OSI standards it never became very popular, because rival
standards developed elsewhere turned out to be more practical. Just as the OSI
X.400 email standard lost the competition to the Internet email protocol SMTP,
ASN.1/BER has been replaced by another standard – also prepared in ISO.

In library automation there is, however, one important ASN.1/BER implementation.
In the Z39.50 information retrieval standard, most messages exchanged between
client and server applications are encoded in ASN.1. The only exception to this
is the result set of a query; bibliographic records are encoded in the
traditional MARC syntax, specified in ISO 2709. The benefits of replacing this
encoding with ASN.1/BER would in any case have been minimal, due to the limited
popularity of the standard.


The requirement for ASN.1/BER support has lately become a stumbling block for
Z39.50, because since the late 90s the most popular data encoding standard has
been XML, the Extensible Markup Language (http://www.w3.org/XML/), which is a
simplified version of SGML, the Standard Generalized Markup Language (ISO
8879:1986). The new version of Z39.50, called ZING (Z39.50 International Next
Generation), is XML-based and therefore easier to implement even outside the
library domain.

The need to replace the traditional MARC syntax (ISO 2709) with a more modern,
XML-based solution has been recognised by the maintenance organisation of the
standard, the Library of Congress. The library's Network Development and MARC
Standards Office is developing a framework for working with MARC data in an XML
environment. The framework includes many components, such as schemas,
stylesheets and software tools (http://www.loc.gov/standards/marcxml/). MARCXML,
in its present form, is an XML schema for representing MARC 21 data in XML. If a
library using e.g. FINMARC wishes to convert to XML, it should first convert to
MARC 21 and then to MARCXML.
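
As a small sketch of what the MARCXML representation looks like, the code below
builds a one-field record with Python's standard library. The namespace
http://www.loc.gov/MARC21/slim and the record, datafield and subfield element
names are those of the actual MARCXML schema; the bibliographic content itself
is a made-up example.

    # Building a minimal MARCXML record with the standard library.
    # Element and attribute names follow the MARCXML (MARC 21 slim)
    # schema; the record content is invented.
    import xml.etree.ElementTree as ET

    NS = "http://www.loc.gov/MARC21/slim"
    ET.register_namespace("", NS)

    record = ET.Element(f"{{{NS}}}record")
    leader = ET.SubElement(record, f"{{{NS}}}leader")
    leader.text = "00000nam a2200000 a 4500"

    field_245 = ET.SubElement(record, f"{{{NS}}}datafield",
                              {"tag": "245", "ind1": "1", "ind2": "0"})
    subfield_a = ET.SubElement(field_245, f"{{{NS}}}subfield", {"code": "a"})
    subfield_a.text = "MARC format and its rivals"

    print(ET.tostring(record, encoding="unicode"))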

It would be possible to develop an XML schema which encompassed every MARC
format in use today. Whether the resulting schema would make any sense is a
different issue. MARC formats put the same data elements into different fields
(a part title in 245 or in 248, for instance), and a top-level XML schema would
either have to replicate all this redundancy, or try to eliminate it without
losing the granularity of the originals. If and when the development of an
XML-based exchange format begins in ISO TC 46, these things must be decided. One
option is to standardise just MARCXML, and thus mandate the use of MARC 21
semantics.

For the time being we do not yet have a MARC syntax for the Resource Description
Framework (RDF), which is the cornerstone of the W3 Consortium's Semantic Web
initiative. The aim of the initiative is to make things on the Web
machine-understandable, instead of just machine-readable.

The RDF model and syntax specification
(http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/) contains the following
statement:

       Resource Description Framework (RDF) is a foundation for processing
       metadata; it provides interoperability between applications that exchange
       machine-understandable information on the Web. RDF emphasizes facilities
       to enable automated processing of Web resources.

From the libraries' point of view, RDF may in the future provide an excellent
means of exchanging metadata with other domains (publishers, booksellers, other
memory organisations). However, the future of RDF is tied to the future of the
Semantic Web initiative, and as of this writing it is not clear whether this W3C
project will ever become truly popular. Thus the lack of an RDF syntax for MARC
data is not a problem, for the time being.
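
To give an impression of what RDF-encoded metadata looks like in practice, the
sketch below states two Dublin Core properties about a resource and serialises
them as RDF/XML. It assumes the third-party rdflib library; the resource URL is
invented, while the Dublin Core element set namespace
(http://purl.org/dc/elements/1.1/) is the real one.

    # A sketch of bibliographic metadata as RDF, assuming the rdflib
    # package is installed. The described URL is hypothetical; the
    # Dublin Core namespace is real.
    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DC

    g = Graph()
    resource = URIRef("http://www.example.org/article/marc-rivals")
    g.add((resource, DC.title, Literal("MARC format and its rivals")))
    g.add((resource, DC.creator, Literal("Hakala, Juha")))

    print(g.serialize(format="xml"))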




We may well ask which aspect of formats – pragmatics, semantics or syntax – is
the most stable one. The answer is not as obvious as it might seem, at least in
the case of MARC.

It would be natural to think that syntax, being the aspect most dependent on
technology, is also the most transitory part of a format. But such a view
underestimates the difficulty of developing and maintaining software. Generally,
application development seems to follow the principle "if it ain't broke, don't
fix it". Software is frequently used much longer than originally intended; this
was one reason for the so-called Year 2000 problem.

In the case of MARC, the original syntax developed in 1968 is still in use
without changes. Of course the semantics has changed a lot over the years in
each MARC format, but this has had no impact on the syntax. Pragmatics and
semantics have undergone much more fundamental changes, as the usage of MARC has
been extended to new areas. Although MARCXML, and before it MARCSGML, has been
available for a few years, library system vendors have not been in a hurry to
support the standard. Libraries already have a workable system for exchanging
metadata between themselves; who needs another option?

However, it is almost certain that MARC and other metadata formats will in the
future rely on XML-based, or probably XML/RDF-based, syntaxes. Actually, most
formats other than MARC already do. MODS and ONIX rely solely on XML, and Dublin
Core has both XML and RDF/XML syntaxes.

From an internal point of view, moving from the ISO 2709 syntax to XML would not
bring any major benefits. MARC is used just for exchange; internally, if the
database can process XML data more easily, MARC records can be converted to XML
and then back to MARC when a record needs to be exchanged. The strong point of
XML is that it may enable the exchange of metadata across domains. Supporting a
community standard such as ISO 2709 is difficult for anyone outside the domain
in which the standard was developed; at least in theory XML provides a simpler
implementation path. However, the real problem is not reading the data, but
understanding it. In this respect, there is no difference between ISO 2709 and
XML. Whether RDF can provide some real help here remains to be seen.

MARC

Although the Library of Congress was not the first library to implement
information technology, its MARC pilot project is the only 1960s initiative
which has had a lasting impact on library automation. This is largely due to the
farsightedness of the people who were involved with the project. The impact of
Henriette Avram as the project manager was especially remarkable.

The main design criteria for MARC format were (Avram et al., p. 10):

             … the achievement of a structure that would have wide applicability to
             all types of bibliographic data. The assumption was that it was not
             possible to completely analyze and categorize data elements for all
             kinds of material at this time. Therefore, the major goal was the design


             of a format structured in such a way that it would be hospitable to all
             kinds of bibliographic information. That is – the structure of all machine
             records will be identical, but for any given type of item, e.g. serials,
             maps, music, etc., the components of the format may have specific
             meanings and unique characteristics.

             Another goal was to develop a format which could be used in a wide
             variety of computers to manipulate machine-readable bibliographic
             data. Thus, any user library would have to contend with only one format
             even though it might receive and/or transmit from or to many sources.

Since 1968 MARC has indeed proven to be hospitable to all kinds of bibliographic
and other information, and its suitability for data exchange is obvious. Thus
the criteria listed above have been fulfilled very well.

MARC syntax

The development of the MARC syntax took place in three phases. The original
version, MARC I, was completed in April 1966, while MARC II was built in
1967-1968. Finally, MARC II was modified quite substantially due to feedback
from the UK; the revisions are listed in Supplement One to the MARC II format.

MARC I and II are quite different from one another; the former was based on
fixed-length records and had many other shortcomings, whereas MARC II,
especially in its revised form, was remarkably close to perfect: the basic
syntax specified in it has remained unchanged for 35 years. The MARC 21 format,
the direct descendant of MARC II, is available in its entirety at
http://www.loc.gov/marc/.

MARC is the only metadata format which has been in use for decades. Because of
this, the library community usually tends to take MARC for granted. But creating
the format was not easy; compared with MARC II, MARC I had some serious
limitations. The maximum record size was limited to 1396 characters, since all
records had to fit into 1400-character blocks on the IBM System/360 mainframe
(the missing four characters were used to indicate the block size). More
importantly, MARC I did not have the leader and directory which make every MARC
record from MARC II onwards to some extent self-explanatory: each record
contains the encoding information needed for reading the bibliographic data it
carries. This makes the MARC syntax very robust.

Every MARC record consists of three parts: leader, directory and data content
fields. Such a structure was not common in the 60s; on the contrary, the MARC
syntax was very different from the application data formats used in the
mainstream of computing at the time MARC was developed.
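
To illustrate how the leader and directory make a record self-describing, the
sketch below reads the directory of an ISO 2709 record. The byte positions used
(record length in leader positions 0-4, base address of the data in positions
12-16, and twelve-character directory entries holding a tag, a field length and
a starting position) are those of the actual standard as applied in MARC; the
parsing itself is deliberately simplified and, for example, ignores indicators
and subfield delimiters.

    # A simplified reader for the ISO 2709 record structure. Leader
    # positions 12-16 hold the base address of the data; each 12-byte
    # directory entry holds a 3-byte tag, a 4-byte field length and a
    # 5-byte starting offset relative to the base address.
    FIELD_TERMINATOR = b"\x1e"

    def read_directory(record: bytes):
        base_address = int(record[12:17])
        directory = record[24:base_address - 1]  # ends with \x1e
        fields = []
        for i in range(0, len(directory), 12):
            entry = directory[i:i + 12]
            if len(entry) < 12:
                break  # ignore a trailing partial entry
            tag = entry[0:3].decode("ascii")
            length = int(entry[3:7])
            start = int(entry[7:12])
            data = record[base_address + start:
                          base_address + start + length]
            fields.append((tag, data.rstrip(FIELD_TERMINATOR)))
        return fields

Because all of this bookkeeping travels inside the record itself, a receiving
system needs no external description to locate the fields, which is what makes
the syntax so robust.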

There were at least three factors which contributed to the changes in MARC II:

   •   Experiences from usage of MARC I
   •   The change of scope: instead of delivering data from one library, a network of
       libraries exchanging data with each other was foreseen




   •   Hardware improvements: the library's IBM System/360 Model 30 mainframe,
       which initially had only 16 kilobytes of memory and a limited amount of
       disk space, got another 16 kilobytes of memory and two more disk drives.

It is difficult to estimate the relative importance of the above factors. What
is clear is that the format was not changed lightly in the middle of the
project, since the change required extensive reprogramming of applications not
only in the Library of Congress, but also in all the pilot libraries which had
been experimenting with the usage of the data. Since many of the changes were
related to record size (a MARC II record is much longer than the same record
encoded in MARC I) we may assume that the impact of the hardware extension was
not trivial.

The emphasis the MARC pilot project put on syntax is understandable; the project
had to invent one by itself. Since the MARC pilot project was completed almost
two decades before ASN.1/BER and SGML, it was truly ahead of its time. Character
sets were another domain in which the MARC pilot project was a pioneer: the
project extended the recently introduced 7-bit ASCII character set with more
than 50 additional characters and diacritics, which covered most languages
written with the Latin alphabet. There was neither the time nor the resources to
attempt exhaustive encoding; it has taken a large number of experts years to
develop Unicode.

The impact of the MARC syntax on library automation has been fundamental. With
two notable exceptions – PicaMARC in the Netherlands and MAB in Germany – most
countries use formats which are compliant with ISO 2709. In order to understand
the proliferation of national MARC formats we need to look at semantics.

MARC Semantics

The content of the MARC II record was largely based on the cataloguing rules
used in the Library of Congress. Although the first version of AACR was being
developed at the same time as MARC II, the MARC pilot project staff was not
given a copy of the AACR draft. In spite of this the two efforts were broadly in
line, since AACR was not in conflict with MARC II.

However, some countries were not satisfied with MARC II, because it was not
fully compliant with AACR. It was also felt that it was not hospitable to local
variations in cataloguing practices. Against this background there was a general
feeling that semantic diversity is actually desirable. This opinion was well
formulated by R. E. Coward (Coward, p. 19):

       The question of establishing international standards for use in the exchange of
       material between national centres is not the same as trying to agree a
       universal MARC system down to the last subfield code. This, I think, is neither
       possible nor desirable. National systems will have their own national
       characteristics and one must be careful not to overstep the mark and substitute
       a dull uniformity for the present anarchy. Indeed a reasonable diversity should
       surely be encouraged.




Views such as this led to the development of UKMARC, FINMARC and many other
national MARC formats. Now that CANMARC, AUSMARC and UKMARC are no longer
developed, and Canada, Australia and the United Kingdom – and several other
countries – have decided to implement MARC 21, we may ask what has changed the
minds of the format developers. Why is semantic diversity no longer desirable,
but a thing to get rid of?

Probably the most important single factor encouraging libraries to use MARC 21
is the harmonisation of cataloguing rules. Once this process is finished, and
the resulting changes have been encoded into MARC 21, there should be no need to
continue the maintenance of any domestic rules or formats. But the adaptation
process may be painful; for instance, in Finland the cataloguing of music has
traditionally been done differently than in the U.S.A., and this difference was
reflected both in our national cataloguing rules and in our MARC format,
FINMARC. Giving up such local traditions is not easy.

The second reason for harmonisation is the difficulty of MARC conversion.
Advocates of the national MARC formats assumed that converting from one MARC
format to another would be easy. In practice, it was not. Even when the semantic
content is the same, record conversion may be difficult if the structure of the
data varies. It is almost always hard to split one field or sub-field in the
source format into multiple ones in the target format. Even worse are the
situations when the target format lacks a data element which exists in the
source; this happens when there are semantic differences between formats – and
that is not rare.
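
A small sketch of the splitting problem: suppose a source format stores a whole
title statement in one sub-field, while the target format expects separate
sub-fields for the title proper, part number and part name (as MARC 21 does with
245 $a, $n and $p). The converter below guesses the split from ISBD-style
punctuation; the example data is invented, and the heuristic itself illustrates
why such conversions tend to be approximate and lossy.

    # Splitting a single title string into MARC 21 245 $a / $n / $p
    # style parts, guessing from ISBD punctuation. A heuristic sketch;
    # real titles break such conventions constantly.
    import re

    def split_title(title: str):
        # Expected pattern: "Title proper. Part number, Part name"
        match = re.match(r"^(?P<a>[^.]+)\.\s*(?P<n>[^,]+),\s*(?P<p>.+)$",
                         title)
        if match:
            return {"a": match["a"].strip(),
                    "n": match["n"].strip(),
                    "p": match["p"].strip()}
        return {"a": title.strip()}  # no safe split found

    print(split_title("Annual report. Part 2, Statistics"))
    # {'a': 'Annual report', 'n': 'Part 2', 'p': 'Statistics'}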

The complexity of dealing with semantics in conversion on the one hand, and
simplified copying due to Z39.50, Unicode and Internet support on the other,
have created a situation in which the pressure towards adopting not only the
same syntax (ISO 2709) but also the same semantics (AACR2, as implied by MARC
21) is constantly growing. It is quite certain that in 10-15 years there will be
very few national MARC formats left. This, in the words of R. Coward, will be
the ultimate victory of dull uniformity. National variations will be sacrificed
for greater efficiency, but also for greater uniformity, which will make it
easier for our patrons to create virtual union catalogues and seek information
from many different places at the same time.

The ultimate victory of MARC 21 has its pros and cons. On the negative side, all
national arrangements, such as the Finnish way of dealing with music, are
doomed. But the negative effect this may have can be alleviated by the careful
design of cataloguing rules, so that they can accommodate different national
practices. Nowhere is the impact of this more evident than in authority control,
where no single form of a term is nowadays seen as superior; all 100+ ways of
saying "United Nations" in different languages are equal.

A positive aspect of the shift to MARC 21 is that conversion to MARCXML would
then be easy. Universal acceptance of MARC 21 would also change the library
system market. Domestic vendors who were protected by a national format which no
foreign vendor was able or willing to support would be exposed to competition
from US-based and other vendors who are able and willing to market their
products internationally. This would eventually lead to a diminishing number of
companies selling library systems.


The most promising aspect of the one-syntax, one-semantics approach for the
libraries is, however, the improvement in copy cataloguing. But we must keep in
mind that technical development alone is not sufficient; we also need contracts
which enable libraries to utilise each other's work. Whether there is
willingness for such data exchange on equal terms remains to be seen.

MARC Pragmatics

There are five MARC Concise formats: Bibliographic, Authority, Holdings,
Classification and Community Information. The MARC pilot project dealt only with
the first one; all the others are later additions by different interest groups
(and even MARC Bibliographic has been substantially expanded since the 1960s).

Are all these five formats relevant, and do we need more of them? And if new
formats are developed, should we first have cataloguing rules?

From the point of view of a Finnish user, the answers to these questions are no
and yes. While the first three formats are actively used in Finland, the
Classification and Community Information formats have not been relevant to the
Finnish library community. I doubt whether the latter will ever be needed. On
the other hand, there is an urgent need for new specifications, such as a MARC
format for collection description.

As of this writing there are many active working groups developing and
implementing new kinds of metadata. One of these groups is PREMIS (PREservation
Metadata: Implementation Strategies;
http://www.oclc.org/research/projects/pmwg/). This group will eventually develop
a proven element set for preservation metadata, which will hopefully become an
ANSI/NISO or ISO standard.

An obvious next step would be the implementation of the preservation metadata
semantics in diverse formats and syntaxes. In order to enable the creation and
storage of preservation metadata in integrated library systems, in digital
object management systems and possibly also in portals, we should have a MARC
preservation data format, a Dublin Core application profile for preservation
metadata, and possibly even other specifications in e.g. the ONIX environment.
In addition to the formats and the syntaxes, such as ISO 2709 and MARCXML, for
this information, we should also have crosswalks which simplify the conversion
of data from one format to another.

Another group working on a specification relevant to the MARC community is the
Dublin Core Collection Description Working Group
(http://www.dublincore.org/groups/collections/). In August 2003 it published a
proposal for a DC-based metadata schema (i.e. a set of attributes/properties and
their semantics) that can be used for the collection-level description of a wide
range of collections. Since this proposal summarises much of the effort made in
numerous previous collection description initiatives, it might form a good
starting point for the development of a MARC collection description format. The
fact that the proposal is based on Dublin Core is not a problem; the proposed
elements can also be mapped to the MARC format.



It remains to be seen whether the Library of Congress and MARBI (see
http://lcweb.loc.gov/marc/marbi.html) will be willing and able to extend MARC
from its present scope to the above-mentioned and other new areas. What is
certain is that libraries will start creating preservation metadata and
collection descriptions; whether this will be done using MARC and/or other
metadata formats remains to be seen. What we know for sure is that the basic
structure of the format is flexible enough for such an extension. We can only
hope that the user community will show equal flexibility.

One possible threat to MARC 21 is that its growing popularity may lead to stagnation. Countries using the format as their national cataloguing format should be able to participate in the future development of MARC 21, but this will lead to more complex decision making. At the same time, technical development requires increasing flexibility. Five years ago my library lacked both sophisticated tools and an urgent need for collection-level description; now both the tools and the need are evident, due to the implementation of the MetaLib portal. Many other libraries in different parts of the world are in a similar situation.

MARC - Summary

The MARC format is, in 2003, more popular than ever before. Harmonisation of cataloguing rules and the global movement towards MARC 21 will further strengthen the status of this already 35-year-old system. Most criticism of MARC concentrates on the proprietary syntax specified in ISO 2709; this problem can be resolved by the implementation of MARCXML-based systems.
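
The following sketch, written with the Python standard library only, shows roughly what a MARC field looks like in MARCXML; the record content is invented, and only one field is shown:

    import xml.etree.ElementTree as ET

    NS = "http://www.loc.gov/MARC21/slim"   # the MARCXML namespace
    ET.register_namespace("", NS)

    # Build a minimal MARCXML record with a single 245 (title) field.
    record = ET.Element("{%s}record" % NS)
    field = ET.SubElement(record, "{%s}datafield" % NS,
                          tag="245", ind1="0", ind2="0")
    subfield = ET.SubElement(field, "{%s}subfield" % NS, code="a")
    subfield.text = "MARC format and its rivals"

    print(ET.tostring(record, encoding="unicode"))
    # <record xmlns="http://www.loc.gov/MARC21/slim">
    #   <datafield tag="245" ind1="0" ind2="0">
    #     <subfield code="a">MARC format and its rivals</subfield>
    #   </datafield>
    # </record>        (output reformatted here for readability)

In ISO 2709 the same field would be buried in a record directory with character offsets; in MARCXML any XML-capable tool can process it.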

A more fundamental problem for MARC is that of participation, which is twofold. First, how do we guarantee that those countries which use MARC 21 as their national format will be heard during standard development? And second, how do we get the communities developing standards for e.g. preservation metadata involved in MARC standards development?

If new MARC 21 users feel that their needs are not taken into account, this may at some point lead to alienation from the central maintenance organisation, and to the development of diverse local MARC 21 dialects. While some local diversity is unavoidable (in Finland we need to be able to use the Finnish General Subject Headings), it is to be hoped that all such data elements are properly encoded: in the 650 tag we can indicate, with the second indicator and subfield $2, when the Finnish headings are used. If libraries adopt sub-optimal practices, such as putting domestic subject headings into 650 as if they were LCSH terms, data exchange will be difficult even if there is syntactically nothing wrong with the records.

Lack of relevant MARC formats will force libraries to use separate systems for creating e.g. preservation metadata, even if they wanted to use their integrated library systems for storing this kind of information as well. Of course, any library system vendor may extend its product with a preservation module even if the metadata were non-MARC or even non-standard, but a library selecting such an application may find it difficult to convert the metadata into a next generation library system.




Dublin Core

Compared with MARC, the Dublin Core Metadata Element Set is a recent invention. The work on the format began in 1995 in the first DC workshop; the first, 13-element version of DC was published in 1997 and the final, 15-element version in 1998. The current version, 1.1, which still has the same 15 elements, was released in June 2003 (http://www.dublincore.org/documents/dces/).

Although OCLC has an important role in the maintenance of the format, Dublin Core was never intended to serve just libraries or any other single domain interested in resource description. Dublin Core is indeed a core: a group of key metadata elements which should be relevant in any domain. On the other hand, Dublin Core will never match the needs of e.g. libraries as well as MARC, which has been built for libraries: it contains those metadata elements we need, and implicitly has a “library-like” view on resource description.

DC Semantics

As the Dublin Core developers could not rely on any single domain, they could not rely on cataloguing rules or existing resource description practices in the way that, for instance, the MARC developers were able to do. The 15 Dublin Core elements, and the 70+ qualifiers which specify the semantics of these elements in greater detail, were decided upon via community consensus. Looked at from within – I have been closely involved with the Dublin Core metadata initiative (DCMI) since the second DC workshop in Warwick in 1996 – this process was sometimes painfully slow, and there is some support for the argument that the end result may not be entirely logical: for instance, why separate Source from other Relations, and why not merge Creator and Contributor?

A claim could also be made that one or more of the present elements do not actually belong to the core metadata, or that some core metadata elements are missing. This argumentation is, however, futile, because the present element set is an axiomatic system for DCMI. Because there is nothing external from which a domain-independent set of core metadata could be derived, the consensus of the DCMI either has to be accepted without further proof, or rejected on the basis of some existing community practice, as some cataloguing experts in libraries have done. However, library cataloguing rules and practices should only be used to check whether the MARC format (the metadata format of the library community) is extensive enough. Measuring with AACR2 a format that was not designed to be compliant with it makes no sense; Dublin Core is no more compliant with AACR2 than it is with the EAD format used by archives.

Because reaching consensus on the core metadata elements was difficult within DCMI, it is very unlikely that this particular Pandora’s box will be reopened via the introduction of new core elements. There are other means by which DCMI can extend the semantics of its format. In addition to the core elements, there are other elements and qualifiers (terms) which have been endorsed by the Dublin Core community (see http://www.dublincore.org/documents/2003/03/04/dcmi-terms/). Thus Dublin Core has been extended well beyond its original limits, although the core itself remains more or less the same (only the definitions of the 15 elements have been slightly modified, when consensus has been reached on some proposals). But any proposal for adding a new key element or removing an existing one would most likely be regarded as an abomination. Thus, for instance, a merger of Source and Relation is not going to happen, unless a miracle happens (and sometimes they do happen just when we would have been able to manage without one).

Dublin Core has already been extended to a number of new areas. At present there are seven working groups (Administrative metadata, Agents, Citation, Collection description, Education, Government and Libraries) which are building domain-specific extensions to the format. These extensions typically consist of elements and qualifiers (terms), some of them belonging to basic DC, some only to the application profile. For instance, the latest draft of the Dublin Core Collection Description Application Profile (http://www.ukoln.ac.uk/metadata/dcmi/collection-ap-summary/2003-08-25/) consists of 8 Dublin Core elements, 7 DC qualifiers and 17 metadata elements specified within the profile only. These include accumulationDateRange (the range of dates over which the collection was accumulated) and contentsDateRange. In practice, the scope and specificity of DC semantics in the application profiles may exceed those of MARC. The 15 DC elements provide only a starting point on which various interest groups can build their own special formats, or in DCMI terminology, application profiles.
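
As a rough sketch of how basic DC and profile-specific terms combine in a collection-level description, consider the following; the record itself is invented, and only the two DateRange term names are taken from the draft profile:

    # An invented collection-level description. "title" and "description"
    # are basic Dublin Core elements; the DateRange terms exist only in
    # the Collection Description Application Profile.
    collection_description = {
        "title":       "Doctoral dissertations of the University of Helsinki",
        "description": "Printed and electronic doctoral dissertations.",
        "accumulationDateRange": "1990-",  # when the collection was accumulated
        "contentsDateRange":     "1990-",  # dates covered by its contents
    }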

Because there are no technical (syntax-based) or cataloguing-rules-based limitations on how DC application profiles are built, and because the bureaucracy needed for creating a profile is light, Dublin Core is an ideal platform for building new formats. Indeed, it would be relatively easy, although a bit tedious, to build a DC MARC application profile which would cover all the data elements in the MARC format. But this has never been the aim of the DC Library application profile. Few projects have been interested in lossless metadata conversion between MARC and Dublin Core (OCLC’s CORC is the only such project familiar to the author; the project used a proprietary extension of Dublin Core which enabled all MARC data to be converted to DC and back to MARC without any loss of information).

The first step in the creation of a new application profile is the establishment of a working group; the DC Advisory Board approves new WGs. Once the working group is ready, its proposal for a new application profile is reviewed by the DC Usage Board, which is primarily concerned with semantics. If, for instance, a qualifier for an existing DC element does not define the semantics of that element in more detail but in fact extends it, the proposal will most likely be turned down. But once the Usage Board has approved the working group’s proposal, no further steps are necessary.

DC Syntax

If the MARC developers paid a lot of attention to syntax and relatively little to semantics (since the idea was to add new MARC tags in the future), the Dublin Core builders chose the opposite approach. The key issue was semantics, because the core elements had to be fixed; syntax was the easy part of the project, because HTML was the one and only encoding used on the Web in 1997. There was no question of the DC syntax being a proprietary solution such as the one defined in the MARC II format.

Alas, the syntax issue was not quite that easy for Dublin Core, because the syntactical basis of the Web was a moving target. HTML was before long replaced by XML, and eventually XML was complemented by RDF, the Resource Description Framework. To make things even more difficult, the details of how to do the HTML (or XML) encoding tended to vary. Early implementers occasionally suffered quite a lot from these changes. It is hard to tell whether many of us began to see benefits in proprietary but stable syntaxes.

At present, simple (15-element) Dublin Core can be expressed both in HTML (http://www.ietf.org/rfc/rfc2731.txt) and in RDF/XML (http://www.dublincore.org/documents/2002/07/31/dcmes-xml/). The HTML specification, published as the Internet standard RFC 2731, also encompasses the encoding of qualified DC. The specification for encoding qualified Dublin Core in RDF/XML (http://www.dublincore.org/documents/2002/05/15/dcq-rdf-xml/) is as of this writing a proposed recommendation of the DCMI, but changes – if any – before the final approval are likely to be small.
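
As an illustration of the HTML encoding, the short Python sketch below generates RFC 2731 style meta tags for a couple of elements; the record content is invented, and value escaping is omitted for brevity:

    def dc_to_html_meta(dc_record):
        """Render simple Dublin Core as RFC 2731 style HTML meta tags."""
        lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
        for element, value in dc_record.items():
            # Real code would escape quotes and other special characters.
            lines.append('<meta name="DC.%s" content="%s">' % (element, value))
        return "\n".join(lines)

    print(dc_to_html_meta({"title": "MARC format and its rivals",
                           "creator": "Hakala, Juha"}))
    # <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
    # <meta name="DC.title" content="MARC format and its rivals">
    # <meta name="DC.creator" content="Hakala, Juha">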

Google and other search engines could easily extract and index Dublin Core metadata embedded in documents, yet few engines do this. The reason is apparently the small amount of truly useful metadata available on the Internet: many documents do contain some metadata, but more often than not this data is useless or even counterproductive.

DC Pragmatics

When MARC was built, it was immediately obvious who would use the format and for what purpose: libraries, for bibliographic description. This was not the case for Dublin Core. It took only a short while to come up with the core semantics and a few alternative syntaxes, but even at that point the DC community did not know who was using the format and for what ends. Most likely the DC metadata initiative will never be aware of even all its major implementers.

For libraries, and especially for cataloguing experts, DC posed an interesting problem. If it was seen as an alternative to MARC for bibliographic description, then obviously it was a duplicate effort which, adding insult to injury, did not serve libraries nearly as well as the existing MARC format. In spite of this potential conflict of interests, many librarians tested Dublin Core, and the library community as a whole has contributed a lot to the initiative. Our role especially during the early stages of the DC work was important.

As of this writing there are many user communities actively involved with Dublin Core, and libraries are just one of these domains. This widening interest in Dublin Core is very evident in the activities around application profiles. From the libraries’ point of view some of them are not that interesting, but for instance the DC Collection Description Application Profile may provide some ideas as to how our community could use Dublin Core, in addition to or instead of basic bibliographic description.

If libraries have had problems trying to think of appropriate uses for Dublin Core, library system vendors have been at a loss as well. Few ILSs allow cataloguing in Dublin Core, or loading of data in that format. But ENCompass and DigiTool, digital object management systems built by the ILS vendors Endeavor and Ex Libris, do support Dublin Core, and for the former system it is the only metadata format supported.

DC - Summary

There is no reason why libraries should replace MARC with DC for routine cataloguing; after all, MARC was optimised for this work in a way basic DC or even its Library application profile will never be. But there are new areas, such as collection description metadata, in which, in the absence of an appropriate MARC format or other viable choice, Dublin Core is probably the best option. Dublin Core may also be used for the exchange of metadata between domains; this is one of the tasks it was designed for, and one for which it is eminently suitable – while MARC is not.

Although there are a lot of metadata formats, there are only a few on which the library community has a strong impact. MARC, being “our” format, is also managed by us. In DCMI, libraries are one of the interest groups fostering the system, and we are quite well represented in the different groups managing it, including the Usage Board and the Board of Trustees. Many other formats, such as ONIX, are beyond our control.


DC – Related activities

There are two metadata formats which are closely related to Dublin Core: Metadata Object Description Schema (MODS; http://www.loc.gov/standards/mods/) and Online Information eXchange (ONIX). There are two ONIX versions, one for books and one for serials; the standard is developed by EDItEUR (http://www.editeur.org/) jointly with Book Industry Communication and the Book Industry Study Group.

MODS has 19 elements. Some of them are identical to DC core elements, some (for instance, name) merge multiple DC core elements together, and in some cases a single DC core element (for instance, Date) is represented by multiple elements in MODS. According to the MODS – simple Dublin Core crosswalk developed by the Library of Congress, all Dublin Core metadata in the 15 basic elements can be converted to MODS, and vice versa.
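
A few illustrative correspondences convey the flavour of the crosswalk; the sketch below is simplified and omits the attribute and role information the real Library of Congress crosswalk carries:

    # Simplified, illustrative Dublin Core -> MODS correspondences.
    DC_TO_MODS = {
        "creator":     "name",                      # Creator and Contributor merge
        "contributor": "name",                      # into a single MODS element
        "title":       "titleInfo/title",
        "date":        ["originInfo/dateIssued",    # one DC element, several
                        "originInfo/dateCreated"],  # possible MODS elements
    }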

The Library of Congress describes the format in this way:

       The Library of Congress' Network Development and MARC Standards Office,
       with interested experts, has developed a schema for a bibliographic element
       set that may be used for a variety of purposes, and particularly for library
       applications. As an XML schema, the "Metadata Object Description Schema"
       (MODS) is intended to be able to carry selected data from existing MARC 21
       records as well as to enable the creation of original resource description
       records. It includes a subset of MARC fields and uses language-based tags
       rather than numeric ones, in some cases regrouping elements from the MARC
       21 bibliographic format.




To a person familiar with Dublin Core, MODS looks like a format in which the semantic overlap between the 15 core elements has been eliminated, while at the same time the format has been made much more hospitable to some key MARC data elements.

Since MODS does not include the axiomatic semantic kernel of Dublin Core, it would have been impossible for the DCMI to approve it as the Library application profile. Thus MODS had to be developed by the Library of Congress without any formal links to the Dublin Core community. This may create a situation where MODS is generally regarded as a competitor of Dublin Core, intended primarily for libraries. Due to its limited semantics MODS is certainly not a serious alternative to MARC 21, unless cataloguing rules undergo great simplification. The MODS syntax is XML-based; this is the obvious choice for all new formats, including ONIX.

ONIX has only a very thin relation to Dublin Core. In 1998 the book industry did investigate the possibility of aligning Dublin Core with its own requirements (see Bearman et al.). Although neither side lacked good will or eagerness to co-operate, the result was that the book trade decided to build its own format. At that point Dublin Core still concentrated on resource discovery (by now the scope of the initiative is much broader, due to application profiles), and the book industry may have wanted to build a system it can administer by itself.

The ONIX Web site says:

       The ONIX for Books Product Information Message is the international
       standard for representing and communicating book industry product
       information in electronic form, incorporating the core content which has been
       specified in national initiatives

ONIX version 1.0 had 211 elements. Version 2.1 was released in June 2003, and has more elements than its predecessors. From the complexity point of view ONIX is thus much closer to MARC than to Dublin Core. The ONIX 1.0 elements were mapped to both MARC 21 and UNIMARC in 2000; unfortunately, more up-to-date crosswalks are missing. Since many old elements have been deprecated and many new elements added in ONIX 2.1, updating the crosswalk will not be easy. On the other hand, even a quick review of the existing crosswalk indicates that the format conversion is in this case not an easy one.

Many ONIX elements closely related to the needs of the book trade do not have a matching MARC element, which means that some ONIX data is lost in the conversion. The same will happen if the conversion is done from MARC: many library-related MARC elements, and basically everything not related to books or serials, do not map well to ONIX. But the crosswalk indicates that, in spite of these problems, core metadata can be translated both ways.
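
The following sketch illustrates this asymmetry on an invented fragment; the element names are loosely based on ONIX for Books 2.1 reference names, and the mapping shown is not the official crosswalk:

    import xml.etree.ElementTree as ET

    # An invented product record, loosely based on ONIX 2.1 reference names.
    onix = ET.fromstring("""
    <Product>
      <Title><TitleText>MARC format and its rivals</TitleText></Title>
      <SupplyDetail><Price><PriceAmount>25.00</PriceAmount></Price></SupplyDetail>
    </Product>""")

    # The title maps comfortably to MARC field 245.
    print(("245", "a", onix.findtext("Title/TitleText")))

    # Trade data such as the price has no MARC 21 counterpart and would
    # simply be dropped by a converter.
    print("dropped:", onix.findtext("SupplyDetail/Price/PriceAmount"))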

While MODS may have some problems finding implementers, since it falls between MARC and Dublin Core, ONIX has not suffered from identity problems. The book industry in many countries, including Finland, has built ONIX implementations. Thus, for instance, national bibliographic agencies may soon receive from publishers metadata that could assist them in creating the national bibliography.



But it remains to be seen how well conversion from ONIX to the MARC formats works in practice; success depends on accurate and up-to-date crosswalks and on efficient converters using them, but also on the quality of the metadata the cataloguers are producing. At least in Finland some publishers have had quality problems. But even if everything were in order, there is some reason for concern, given the library community’s collective experience of the difficulty of converting records from one MARC format to another. How could it be any easier to carry out conversions between entirely different metadata formats? Given the high quality requirements of national bibliography cataloguing, we may end up continuing our old practice of primary cataloguing, even if ONIX metadata were available in MARC form.

Summary

Our title was MARC format and its rivals. Now it is time to ask: does MARC have any true rivals?

Well, yes and no. From the syntax point of view, there is some internal competition: MARCXML is a rival of the traditional MARC syntax as codified in ISO 2709, and there is competition within the MARC family, for instance between MARC 21 and UNIMARC. In this area MARC 21 is pushing its rivals out of business. But from the semantics point of view, there is nothing out there that could challenge the so-called Paris principles and their practical implementation in the ISBD specifications, which in turn have been defined in more detail in AACR2 and basically every other modern set of cataloguing rules. Metadata formats which are not based on library cataloguing rules do not have the slightest chance of being used for serious bibliographic description in libraries. Of course a library can modify these formats in order to make them fully compliant with e.g. AACR2, but doing so makes little sense.

The most interesting challenge for MARC is a pragmatic one: does the MARC community deem it useful to extend the scope of the standard to new areas such as preservation metadata and collection description metadata (or product information metadata, encoded in a detailed manner in ONIX)? In many of these new areas MARC does contain some of the required metadata elements, but not all of them; the question is, how far should we go down these rabbit holes?

Anyway, it seems clear that in basic bibliographic description libraries will rely on the so-called Paris principles and AACR/ISBD for quite a long time, although these cornerstones will gradually evolve in order to accommodate new requirements set by the changing socio-technical environment of libraries. The implementation of FRBR during the next 5-10 years may, however, force us to modernise heavily the fundamental principles of bibliographic description and their implementation in cataloguing rules, even if the principles themselves remain unbroken.

As regards the exchange syntax of our data, ISO 2709 will be complemented and eventually replaced by an XML-based alternative such as MARCXML, since this may make sharing our metadata with other domains such as museums and archives easier. Our systems must also be able to read and write bibliographic metadata in other formats, including ONIX and Dublin Core.




In spite of the continuing central role of AACR/MARC in integrated library systems, we must prepare ourselves for other choices in other systems. The role of MARC in IR portals and digital object management systems is unclear; certainly, as of this writing, the metadata requirements for encoding the access parameters of a remote database go beyond the scope of AACR/MARC.

Using many metadata formats and cataloguing rules (which may not even exist yet) in parallel will pose interesting challenges for library staff and patrons. This multiplicity also has implications for the design of our information systems: systems that are based on different metadata must maintain some level of semantic interoperability in order not to alienate patrons. It remains to be seen what role Dublin Core will have in defining the core semantics that must be shared by our future metadata formats, and perhaps even in providing a platform on which to develop them.




References

Avram, Henriette; Freitag, Ruth & Guiles, Kay: A proposed format for a standardized
machine-readable catalog record. A preliminary draft. (ISS planning memorandum
number 3). [Washington, DC]: Library of Congress, 1965.

Avram, Henriette: The MARC pilot project. Final report on a project sponsored by the
Council on Library Resources, Inc. Washington, DC: Library of Congress, 1968.

Coward, R.E.: MARC: national and international co-operation. In: The exchange of
bibliographic data and the MARC format = Austausch bibliographischer daten und
das MARC format. Proceedings of the international seminar on the MARC format
and the exchange of bibliographic data in machine readable form, sponsored by the
Volkswagen Foundation, in Berlin, June 14th – 16th 1971. Berlin: Verlag
Dokumentation, 1972. p. 17-26.

McCallum, Sally: MARC: Keystone for library automation. IEEE Annals of the
History of Computing. April-June 2002.


Web sites

MARC: http://www.loc.gov/marc/

Dublin Core: http://www.dublincore.org/

MODS: http://www.loc.gov/standards/mods/

ONIX: http://www.editeur.org/

AACR2: http://www.nlc-bnc.ca/jsc/

IFLA Cataloguing section (incl. ISBDs): http://www.ifla.org/VII/s13/sc.htm



