									Deliverable D3.5
Title: DESIRE metadata registry framework




                 DESIRE II: PROJECT DELIVERABLE
Project Number:                  RE 4004 (RE)

Project Title:                   DESIRE II - Development of a European Service for Information on
                                 Research and Education II

Deliverable Type:                PU



Deliverable Number:              D3.5

Contractual Date of Delivery:    31.12.99

Actual Date of Delivery:         31.03.00

Title of Deliverable:            DESIRE metadata registry framework

Workpackage(s) contributing      WP3
to the Deliverable:

Nature of the Deliverable:       RE

Author:                          Rachel Heery, Tracy Gardner, Michael Day, and Manjula Patel

Contact Details:                 UKOLN: the UK Office for Library and Information Networking
                                 University of Bath
                                 Bath BA2 7AY, UK

                                 Tel: +44 1225 826580
                                 Fax: +44 1225 826838
                                 E-mail: r.heery@ukoln.ac.uk
                                 Web: http://www.ukoln.ac.uk/

URL                              http://www.desire.org/html/research/deliverables/D3.5/



Abstract                         Metadata registries enable authoritative information about metadata
                                 schemes to be declared and thus support the extensibility and
                                 evolution of element sets and provide some basis for interoperability.
                                 The DESIRE metadata registry demonstrates how a metadata
                                 registry might work. Elements from several different metadata
                                 element sets, including Dublin Core, have been added. This report
                                 gives a detailed technical overview of the DESIRE metadata registry
                                 implementation and its data model, additional information on the
                                 element sets (namespaces) included in the registry and some
                                 comments on metadata mappings and cross-walks.

Keywords                         Metadata registries
                                 Dublin Core
                                 ISO/IEC 11179




Project RE 4004 (RE)                                                                        Page 1 of 31


Distribution List:               DESIRE Project Team, European Commission, DESIRE project
                                 Web site.
Issue:                           1.0

Reference:                       registry-v10.doc

Total Number of Pages:           33







TABLE OF CONTENTS
PART I: TITLE PAGE

DESIRE II: PROJECT DELIVERABLE

PART II

Document Control
Executive Summary
Scope Statement

PART III

Glossary
1. Introduction
     1.1 Metadata registries
     1.2 Examples of metadata registries
          1.2.1 National Health Information Knowledgebase (Australia)
          1.2.2 Environmental Data Registry (USA)
          1.2.3 ROADS Template Registry (UK)
     1.3 Metadata registries and the DESIRE Registry framework
2. Technical Overview
     2.1 ISO/IEC 11179
     2.2 Usage of the term Metadata
     2.3 Multiple Namespace Coverage
     2.4 Semantic Mapping via BSR
3. Data Model
     3.1 Namespaces and Versions
     3.2 Semantic Layer
          3.2.1 Data Elements
          3.2.2 Application Profiles
          3.2.3 Schemes
     3.3 Qualified Dublin Core
     3.4 BIBLINK Application Profile
     3.5 DC 1.0 / ROADS Cross-walk
4. Usage of Prototype Implementation
     4.1 Navigation
          4.1.1 Index
          4.1.2 Browse
          4.1.3 Search
     4.2 Viewing Registered Entities
          4.2.1 Registration Authorities
          4.2.2 Namespace Concepts
          4.2.3 Namespaces
          4.2.4 Semantic Units
          4.2.5 Elements
          4.2.6 Schemes
          4.2.7 Application Profiles
     4.3 Generating a Cross-walk
     4.4 Registry Glossary
5. Element Sets in Demonstrator
     5.1 BIBLINK
     5.2 Dublin Core
          5.2.1 Dublin Core 1.0
          5.2.2 Dublin Core 1.1
          5.2.3 Qualified Dublin Core
     5.3 eLib Simple Collection Level Description
     5.4 ISO Basic Semantics Register
     5.5 ROADS
6. Metadata Mappings and Cross-walks
7. Prototype Implementation Details
     7.1 Database
     7.2 User Interface
     7.3 Admin Interface

PART IV

8. References






PART II

DOCUMENT CONTROL
    Issue Number      Issue Date      Reason for Change

    0.1               17-02-00        Version sent to peer-reviewers

    1.0               31-03-00        Slightly revised in accordance with peer-reviewers' comments.



EXECUTIVE SUMMARY
Metadata registries are formal systems that can disclose authoritative information about the
semantics and structure of the data elements that are included within a particular metadata scheme.
Registries would typically define the semantics of metadata elements, give information on any local
extensions in use, and provide mappings to other metadata schemes.

This information would ideally need to be stored in a syntax that is machine-readable (e.g. in
XML/RDF) as well as in human readable form so that information can be disclosed to both:

   • Humans who want to create metadata according to defined standards; those who want to
     discover whether appropriate metadata element sets already exist for their purpose; those who
     wish to align their metadata sets with those that exist for other purposes.

   • Software that wants to manipulate metadata and needs to know its structure and semantics;
     metadata creation tools which need to validate and present helpful user interfaces; conversion
     tools that need reference to mapping tables.

In order to demonstrate the feasibility of creating useful and scalable metadata registries, the DESIRE
project has developed a prototype metadata registry. The registry has been implemented using a
relational database (MySQL), and a standard Web browser provides both the administrative and user
interfaces. Access to the registry is available from:

http://desire.ukoln.ac.uk/registry/

The DESIRE registry implementation follows the general principles of the ISO/IEC 11179 standard for
the specification and standardisation of data elements. Unlike most ISO/IEC 11179 based registries,
however, the DESIRE registry implementation has been designed to present data elements from
multiple namespaces in a consistent manner, rather than for the maintenance of authoritative
definitions of data elements under a single namespace. This means that in addition to providing basic
registry functions, the DESIRE registry implementation can provide mappings between different
metadata schemes. Within the registry, data elements are mapped onto a single semantic layer - in
this case the semantic units defined in the ISO Basic Semantics Register (BSR) - so that the mapping
process is simplified if and when new metadata vocabularies are added to the registry.

The registry is based on a data model intended to be rich enough to support the registration of
elements from multiple namespaces. This data model is influenced by existing data models (e.g. that
developed for Dublin Core) but is not based on them directly.

The prototype registry implementation currently presents information in human-readable format only,
due to constraints on time and effort available. The registry is accessible via a Web interface. An
index page gives access to browse and search interfaces for all of the entities that can be registered,
and to the page that generates crosswalks. For each registered namespace, information is available
on its registration authority (who registered or is responsible for it) and the namespace concept to
which it belongs (e.g. Dublin Core). Each data element registered is defined within a particular
namespace so that elements that share the same name but belong to different namespaces can be
identified separately. An application profile groups together sets of elements for use in a particular
context. Within the registry, cross-walks between namespaces are automatically generated via the
BSR.

Several different element sets have been included within the demonstrator. The Dublin Core is
represented by three namespaces that correspond to Dublin Core 1.0 (RFC 2413, 1998), Dublin Core
1.1 and a form of qualified DC. Examples of what might be seen as forms of extended DC are the
BIBLINK Core elements used by the BIBLINK project for metadata conversion, and the simple
description elements developed as part of a UK eLib supporting study on collection level description.
The registry also includes elements from selected ROADS/IAFA Template-Types used by ROADS-
based information gateway services.


SCOPE STATEMENT
This report introduces the concept of metadata registries and describes the development and
implementation of a prototype registry as part of the DESIRE project. The objective of the deliverable
is to demonstrate the usefulness of metadata registries to the developers of metadata schemes and
their implementers. The deliverable should also be of interest to those with a research interest in
topics like metadata registries, the ISO/IEC 11179 standard, format conversion and metadata
cross-walks.

Several different groups concerned with metadata, e.g. the EU-NSF Working Group on Metadata
(1998), have suggested that the development of metadata registries will be important to authoritatively
define the semantics of metadata schemes, to promote their use, their extensibility and their
interoperability with other schemes. The prototype registry described in this report is important
because it demonstrates the implementation of an ISO/IEC 11179 compliant registry for the Dublin
Core elements, ROADS Templates and other resource discovery metadata schemes. It also tests the
automatic generation of cross-walks using an underlying semantic layer provided by the ISO Basic
Semantics Register (BSR). This deliverable relates to other DESIRE deliverables, in particular to the
chapter on interoperability in the DESIRE Information Gateways Handbook (D3.4). That chapter suggests that
registries should be developed to provide canonical definitions of all metadata elements within a
particular scheme, to disclose information on local usage, and to publish mappings to other schemes.

http://www.desire.org/handbook/3-7.html

The initial input into the prototype DESIRE metadata registry has been the Dublin Core and BIBLINK
Core element sets, the eLib simple collection level description elements and selected ROADS
templates. This has allowed the developers to test the structure of the registry database as regards
elements, qualifiers, local usage, and permitted values. It will now be possible to populate the registry
with other schemes in use within the DESIRE project, e.g. the LDAP directory schema used in D3.3.

This report is best read alongside some hands-on exploration of the registry itself; without use of
the registry, it may prove difficult to understand fully. The registry is accessible at:

http://desire.ukoln.ac.uk/registry/






PART III

GLOSSARY
AIHW
Australian Institute of Health and Welfare.


Application Profile
A set of elements with associated descriptions of usage for use in a particular context, e.g. in a project,
service or group of collaborating services. An application profile may register elements or schemes
that are valid for use with particular elements.


BIBLINK
European project providing a flow of metadata between publishers and national bibliographic
services.


BSR
Basic Semantics Register. An ISO standard that identifies and defines semantic components for use
in data exchange.


BSU
Basic Semantic Unit.


Cedars
CURL Exemplars in Digital Archives. Project run by CURL and funded by eLib to investigate the
problems of digital preservation.


Cross Walk
A mapping from the elements of one namespace to the elements of another namespace.


CSS
Cascading Style Sheets.


CURL
Consortium of University Research Libraries.


Data Element / Element
The realisation of a semantic unit in a particular namespace.


DC
Dublin Core.


DCMI
Dublin Core Metadata Initiative.


DDC
Dewey Decimal Classification.






Element
A data element - in ISO/IEC 11179 terms, a "unit of data for which the definition, identification,
representation and permissible values are specified by means of a set of attributes."


Element Usage
A description of the interpretation of a particular element for use in specific contexts. Unlike an
element definition, an element usage definition does not introduce a new element name - it describes
a local usage of an existing element.


eLib
The Electronic Libraries Programme. A series of UK digital library research projects funded by the
JISC.


Enumerated List Scheme
A scheme which specifies a set of valid values - scheme elements. Scheme elements may be
registered within the registry, or they may be indicated via a reference to an external definition.


EPA
Environmental Protection Agency.


GILS
Global Information Locator Service.


IAFA
Internet Anonymous FTP Archive.


IEC
International Electrotechnical Commission.


IETF
Internet Engineering Task Force.


IMS
IMS Global Learning Consortium, an international consortium of over 200 educational, commercial
and governmental organisations with the aim of promoting technical specifications for management
tools and educational content supporting distributed learning. The abbreviation once stood for
"Instructional Management Systems."


Indecs
An international initiative of rights owners, creating metadata standards for e-commerce.


ISO
International Organization for Standardization.


JISC
Joint Information Systems Committee of the UK Higher Education Funding Councils.


LCSH
Library of Congress Subject Headings.






LDAP
Lightweight Directory Access Protocol.


MARC
Machine-Readable Cataloguing.


MPEG
Moving Picture Experts Group.


MPEG-7
Formally named "Multimedia Content Description Interface", MPEG-7 aims to create a standard for
describing multimedia content data.


Namespace
A scoping device used for uniquely identifying registered entities. Identically named entities in
different namespaces can be distinguished. A namespace is identified via a URL including a name
and version identifier.


Namespace Concept
The shared basis for different versions of a namespace. A namespace is introduced as a version of a
particular namespace concept.


NEDLIB
European project attempting to create the infrastructure underlying a networked European deposit
library.


NHIK
National Health Information Knowledgebase. A registry run by the Australian Institute of Health and
Welfare.


NSF
National Science Foundation.


PHP
An open-source, cross-platform, HTML-embedded scripting language used to create dynamic web
pages.


Qualifier
A term that helps to refine the meaning of an element or attribute value. These are sometimes
separated into 'element qualifiers' (that refine the semantics of elements) and 'value qualifiers'
(contextual information about an element value, e.g. a 'scheme').


RDF
Resource Description Framework.


Registration Authority
Any organisation authorised to register data elements.






ROADS
Resource Organisation and Discovery in Subject-based Services. Software toolkit for Internet
information gateways initially funded as part of eLib.


Rule Set Scheme
A scheme specified by a set of rules that define or describe valid values. The rule set is indicated via
a reference to an external definition. The semantics of rule sets cannot be captured in any way within
the registry at present.


Schema
Detailed formal descriptions of metadata element sets, e.g. as in RDF Schemas.


Scheme
A description or specification of valid values, e.g. a type of qualifier. One or more schemes may be
associated with an element to specify valid values.


Semantic Unit
An informational element described independently of a specific namespace.


Semantic Layer
The set of semantic units registered in a registry.


SQL
Structured Query Language.


Sub Element
An Element which refines the definition of an existing element. The sub element inherits the definition
associated with the element it refines.


Value Components Scheme
A scheme which splits a value domain into multiple value components. A valid value is then made up
of a tuple of valid values from the value components. Note that it is the tuple that is a valid value - not
each of the values associated with value components.
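
As an illustration only, the following Python sketch checks a value against a hypothetical value
components scheme with two components. The component names and their permitted values are
assumptions made for this example; they are not taken from the registry.

```python
# Hypothetical scheme with two value components (names and values are invented).
COMPONENTS = (
    ("language", {"en", "fr"}),
    ("country", {"GB", "US", "FR"}),
)


def is_valid_value(value):
    """Accept only a tuple supplying one permitted value per component."""
    if not isinstance(value, tuple) or len(value) != len(COMPONENTS):
        return False
    return all(part in allowed for part, (_name, allowed) in zip(value, COMPONENTS))


print(is_valid_value(("en", "GB")))  # the full tuple is checked as one value
print(is_valid_value("en"))          # a single component value on its own
```

The first call returns True and the second returns False: "en" is permitted for the language
component, but on its own it is not a valid value of the scheme.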


Vocabulary
A list of element terms used by a particular metadata namespace. For example, in RDF terms, an
RDF Schema will describe a vocabulary developed to suit specific needs, e.g. the Dublin Core RDF
vocabulary.








1. INTRODUCTION

1.1 Metadata registries

New patterns for managing metadata are emerging in relation to the various processes of metadata
creation, maintenance of metadata repositories and inter-working between services. Both humans
and software are involved in these processes and need to be able to locate information about the
metadata schemas that exist. One way of declaring information about the structure and
semantics of metadata element sets is through the development of metadata registries.

A metadata registry has been defined as "a formal system that records the semantics, structure, and
interchange formats of any type of data" (Bargmeyer, et al., 1997). The EU-NSF Working Group on
Metadata (1998) has elaborated on the purpose of such registries:

                 The metadata schemas available on the Web will form a global
                 collection of namespaces that will effectively function as a distributed
                 registry. These registries will need to be managed, coordinated, and
                 ultimately connected. Registries will define the elements of metadata
                 schemas in a machine-readable syntax (e.g., RDF) and offer
                 authoritative listings of legal values, local extensions, mappings to
                 other schemas, and guidelines for good usage. They will serve both
                 humans, with readable text, and programs, with structured content
                 that can automatically be parsed. Their role will be both to promote
                 and to inform, thereby encouraging the use of standard formats.

Metadata registries, therefore, are systems that are designed to disclose authoritative information
about the structure and semantics of metadata element sets for both:

   • Humans who want to create metadata according to defined standards; those who want to
     discover whether appropriate metadata element sets already exist for their purpose; those who
     wish to align their metadata sets with those that exist for other purposes.

   • Software that wants to manipulate metadata and needs to know its structure and semantics;
     metadata creation tools which need to validate and present helpful user interfaces; conversion
     tools that need reference to mapping tables.

In short, metadata registries permit both the extensibility and interoperability of metadata element sets
(Heery, 1997). This is particularly important because a growing number of metadata schemes are
now under development. Most have been designed for particular purposes but will need to
interoperate with metadata from other schemes. To give an idea of the breadth of this
development work, metadata schemes have been (or are being) developed for quite different domains
and with quite different functional requirements. For example:

   • Internet resource discovery (e.g. Dublin Core, ROADS templates)

   • Managing learning resources (e.g. the Meta-Data Specification developed by the IMS
     Consortium)

   • Rights management (e.g. that being developed as part of Indecs)

   • Multimedia (e.g. MPEG-7)

   • Digital preservation metadata (e.g. draft scheme being developed for the Cedars and NEDLIB
     projects and the National Library of Australia)

   • Administrative metadata


1.2 Examples of metadata registries

A small number of metadata registries have been developed. Some are structured according to the
ISO/IEC 11179 standard, "Specification and Standardization of Data Elements."


1.2.1 National Health Information Knowledgebase (Australia)

The Australian Institute of Health and Welfare (AIHW) maintains its National Health Information
Knowledgebase (NHIK) as an electronic storage site for Australian health metadata. The registry
provides information about the use of particular data elements as well as definitions and information
about permitted values. The Knowledgebase has been constructed according to the ISO/IEC 11179
standard.

http://www.aihw.gov.au/services/health/nhik.html


1.2.2 Environmental Data Registry (USA)

The US Environmental Protection Agency (EPA) has established an ISO/IEC 11179 compliant
Environmental Data Registry that permits the retrieval of information about data elements and data
concepts found in selected EPA systems. The context of the registry is that of data surveys and data
collection, with an acknowledged hierarchy of authority for formulating definitions and permitted
values.

http://www.epa.gov/edr/


1.2.3 ROADS Template Registry (UK)

Although not compliant with ISO/IEC 11179, the ROADS Template Registry provides authoritative
information about existing ROADS Template-Types and enables users of the ROADS software to
define new data elements and/or Template-Types.

http://www.ukoln.ac.uk/metadata/roads/templates/


1.3 Metadata registries and the DESIRE Registry framework
As part of the DESIRE project, UKOLN has built a demonstrator of a metadata registry. Its
development was intended to help investigate a registry's functionality, in particular with regard to the
authoritative disclosure of metadata usage. For example:

   • Definition of elements

   • Element usage

   • Allowed schemes

   • Mappings to other namespaces

The registry is not designed for the purpose of managing a single namespace, but is intended to
provide information across a range of metadata schemes. In this way it differs from work taking place
within the Dublin Core Metadata Initiative (DCMI), although we believe it is relevant to questions of
data modelling in this context. In time we would expect a variety of registries to evolve. For example,
each namespace, such as the Dublin Core (DC), might be registered authoritatively by a registry
owned by its own maintenance agency, with 'implementation level' registries linking into such
registries as appropriate.





Our initial input into the DESIRE registry has been different variants of Dublin Core, the BIBLINK Core
element set, ROADS templates and a simple collection level description scheme. This has allowed
the developers to test the structure of the database as regards elements, qualifiers, local usage, and
permitted values.

We have used ISO/IEC 11179 as a guide to constructing the registry and the chosen data model has
been strongly influenced by this standard. We are using units from the Basic Semantics Register
(BSR) as the basis for mapping between schemes. We hope this will ensure that our work fits in with
parallel activity in the wider forum of data registries.


2. TECHNICAL OVERVIEW

2.1 ISO/IEC 11179

The DESIRE Metadata registry follows the principles of the ISO/IEC 11179 standard for metadata
registries. ISO/IEC 11179 provides standards for the informational and organisational structure for
metadata registries. This approach ensures that the DESIRE registry builds on existing best practice
and is consistent with other work in the area. Where possible, the DESIRE registry uses terminology
from the ISO/IEC 11179 standard.

The key concept in ISO/IEC 11179 is the Data Element:

Data element
A unit of data for which the definition, identification, representation and permissible values are
specified by means of a set of attributes.

The ISO/IEC 11179 standard consists of six parts:

11179-1: Framework for the Specification and Standardization of Data Elements - Part 1 of the
standard provides an overview of registry structure and has influenced the design of the DESIRE
registry.

11179-2: Classification for Data Elements - This part of the standard is concerned with the structure
of the data model and supports the discovery of data elements. The data model and prototype
implementation have been influenced by this part of the standard.

11179-3: Basic Attributes of Data Elements - This part of the standard influenced the attribute sets
used for defining entities that can be registered within the DESIRE Registry. For the prototype a
subset of the required attributes was chosen; this subset was sufficient for a proof-of-concept
application.

11179-4: Rules and Guidelines for the Formulation of Data Definitions - Part 4 of the standard
discusses best practice for definition writing. This part of the standard was not directly relevant to the
DESIRE registry, which was concerned with registering existing definitions.

11179-5: Naming and Identification Principles for Data Elements - Part 5 of the standard
influenced the construction of identifiers within the DESIRE registry, but as with definitions, element
names were taken from existing metadata vocabularies.

11179-6: Registration of Data Elements - Part 6 provides details of the organisational infrastructure
for metadata registries. Terminology from this part of the standard has been employed within the
DESIRE registry but this part of the standard is not fully implemented within the prototype.






2.2 Usage of the term Metadata

It is useful to highlight the different uses of the term metadata in ISO/IEC 11179 and within the
resource discovery community. In ISO/IEC 11179 the term metadata refers to data that defines the
structure of data. For example, a Letter might consist of an Address, Salutation, Body, etc.
Descriptions of the Address and other components form the metadata for the Letter.

The elements registered in the DESIRE registry describe the structure of data, which is metadata in
the context of resource discovery. In other words we are registering metadata (in the ISO/IEC 11179
sense) about metadata (in the resource discovery sense).

In the DESIRE Metadata Registry data elements represent the elements or attributes within a
metadata vocabulary or element set.


2.3 Multiple Namespace Coverage

The DESIRE registry differs in intent from the majority of ISO/IEC 11179 metadata registries. Typically
a registry is responsible for maintaining definitions of data elements under a particular namespace, for
which the registering organisation has control. In the case of the DESIRE registry, the aim is to
present data elements from multiple namespaces in a consistent manner. This distinction means that
within the DESIRE registry, namespaces can also be registered, and each data element is associated
with a particular namespace.

In the future, the DESIRE registry could be expected to obtain definitions of elements from various
namespaces from their associated registration authorities. Currently, those authorities do not maintain
machine-accessible metadata registries (although, for example, in the case of the Dublin Core
Metadata Initiative, there are plans to develop such a registry). Since metadata cannot currently be
obtained from external registries, the current approach has been to directly register elements from
multiple namespaces.


2.4 Semantic Mapping via BSR

In addition to basic metadata registry functionality, the DESIRE metadata registry aims to provide
mappings between different metadata vocabularies.

This can be achieved via hard-coded mapping tables detailing the relationships between the elements
in a source vocabulary and those in a target vocabulary. This approach has a high development and
maintenance cost due to the large number of mappings that must be developed to provide full
coverage.

The DESIRE registry takes an alternative approach. Instead of mapping between every pair of
vocabularies, every vocabulary is mapped onto an underlying semantic layer. The aim is that instead
of mapping from vocabulary A to vocabulary B, we map from A onto the underlying semantic layer,
and then back on to B. The result is that instead of having to create mappings between every pair of
vocabularies, it is only necessary to map between each vocabulary and the semantic layer. This
means that when introducing a new vocabulary into a registry with 20 vocabularies, it is only
necessary to add a single mapping (between the new vocabulary and the underlying semantic layer)
rather than 20 mappings, to support translation between the new vocabulary and those already
registered.
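
The saving can be made concrete with a small calculation. The sketch below is illustrative only: the
function names are hypothetical, and it assumes that a single mapping between two vocabularies (or
between a vocabulary and the semantic layer) serves both directions of translation.

```python
from typing import Tuple


def pairwise_mappings(n: int) -> int:
    """Mappings needed when every pair of n vocabularies is mapped directly."""
    return n * (n - 1) // 2


def hub_mappings(n: int) -> int:
    """Mappings needed when each vocabulary is mapped once onto the semantic layer."""
    return n


def added_by_new_vocabulary(existing: int) -> Tuple[int, int]:
    """Extra mappings (pairwise, hub) when one vocabulary joins `existing` others."""
    pairwise = pairwise_mappings(existing + 1) - pairwise_mappings(existing)
    hub = hub_mappings(existing + 1) - hub_mappings(existing)
    return pairwise, hub


print(added_by_new_vocabulary(20))  # (20, 1)
```

Under the same assumption, full coverage of 21 vocabularies needs 210 direct mappings but only 21
mappings onto the semantic layer.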




This approach does have potential disadvantages. Since the mappings are generated automatically
rather than hand-crafted, their quality may be lower. To counteract this it will be
necessary to build up a detailed and complex semantic layer so that vocabulary elements can be
precisely explained. If the semantic layer is not detailed enough then the translation will suffer from
information loss.

The DESIRE registry is a pilot application to trial this approach and provides a platform for future
research in this area.


3. DATA MODEL
The data model of the DESIRE registry is influenced by the models used for existing namespaces
(such as Dublin Core and BSR) but does not aim to reproduce them directly. It is intended to be
generic enough to support the registration of elements from multiple namespaces and detailed
enough to support advanced functionality such as the automatic generation of cross-walks.


3.1 Namespaces and Versions
A namespace is a scoping construct that supports the definition of unique identifiers. Identifier x in
namespace A is distinct from identifier x in namespace B.

Namespaces can be registered in the DESIRE registry, and registered elements can be assigned to a
namespace where appropriate. For example, the fifteen Dublin Core elements can be registered as
belonging to a Dublin Core namespace. The data elements associated with a particular namespace
form a metadata vocabulary.

Since metadata vocabularies evolve over time, it is necessary to provide a mechanism for recording
this within the registry. To support this, namespaces have an associated version, so Dublin Core
version 1.0 and Dublin Core version 1.1 can both be registered. An underlying 'Namespace Concept' is
also registered to tie the versions together; in this case Dublin Core is registered as a namespace concept.

In the initial version of the registry, versioning is supported only at the namespace level. Individual
elements cannot be versioned.
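
The relationship between namespace concepts, versioned namespaces and element identifiers might be
sketched as follows. This is a minimal illustration rather than the registry's actual schema: the class
and method names are assumptions, and the identifier layout simply follows the form of the examples
given later in this report (such as dc/1.1/title).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamespaceConcept:
    id: str    # e.g. "dc"
    name: str  # e.g. "Dublin Core"


@dataclass(frozen=True)
class Namespace:
    concept: NamespaceConcept
    version: str  # versioning exists only at this level, not per element

    @property
    def id(self) -> str:
        return f"{self.concept.id}/{self.version}"

    def element_id(self, local_name: str) -> str:
        """Qualify a local element name with this namespace."""
        return f"{self.id}/{local_name}"


dc = NamespaceConcept("dc", "Dublin Core")
dc_1_0 = Namespace(dc, "1.0")
dc_1_1 = Namespace(dc, "1.1")

# The same local name yields distinct identifiers in distinct namespaces.
print(dc_1_0.element_id("title"))  # dc/1.0/title
print(dc_1_1.element_id("title"))  # dc/1.1/title
```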


3.2 Semantic Layer

The purpose of the semantic layer is to provide an underlying set of concepts onto which registered
vocabularies can be mapped - for example, notions of author and abstract will be needed. The
registered concepts must be precisely defined and unique. There should never be a situation where
multiple registered concepts have the same semantics. If concepts are not unique then the quality of
mappings between vocabularies will be reduced: elements may have the same semantics, but if they
map to different registered concepts then this information will not be available for use in
auto-generation of mappings. The semantic layer must therefore be managed, or be based on a
managed namespace with limited scope for extension.

The Basic Semantics Register provides a set of elements suitable for use in the DESIRE registry.
Mappings already exist between BSR and the Dublin Core and GILS metadata vocabularies. For the
initial prototype DESIRE registry, BSR elements corresponding to the Dublin Core have been
registered.

Note that the BSR only provides data for the DESIRE Registry; the registry is not limited to BSR for its
semantic layer. The data model allows concepts from other namespaces to be registered. However, it
should be emphasised that the concepts must come from a managed namespace. If BSR is found to
be appropriate for this purpose then any concepts that need to be added to support registry
functionality should either be added to the BSR standard or managed as a namespace extension.






The data model of the BSR also introduces a layer of 'Basic Semantic Units' (BSUs) between vocabulary
elements and concepts. This layer allows a 'representation class' to be attached to a concept to
further refine its description. A representation class describes the data type associated with the value
space of a concept - Name, Text and Code are examples of representation classes.

For some concepts, there is only one appropriate representation class. In this case the BSR
introduces only a BSU (since there is a 1-1 mapping between concepts and BSUs in such cases, the
introduction of a separate concept is superfluous). In other cases, multiple representation classes are
possible - for example a subject classification may be expressed as Text or as a Code.

The DESIRE registry currently has only a single semantic layer that combines concepts and BSUs
from BSR. This approach offers a reduction in complexity and has provided sufficient modelling power
to meet the requirements of the DESIRE registry.


3.2.1 Data Elements

Data elements are the units from which metadata vocabularies are built. A metadata vocabulary
consists of all the data elements in a particular namespace, or all the elements associated with a
particular application profile. For example, Dublin Core version 1.1 can be registered as a namespace
with the fifteen DC elements as data elements of that namespace.

A data element is a realisation of a BSU in a specific context.


3.2.2 Application Profiles

Application profiles describe data element usage for a particular application. The application may be a
specific project, a piece of software, an interchange format, etc.

Application profiles cannot introduce new data elements: every data element must have an associated
namespace. Application profiles can group together data elements from multiple vocabularies. An
application profile can also associate a scheme with a data element to specify valid values for that
data element in a specific application.
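
These rules might be expressed in code along the following lines. The sketch is hypothetical: the class,
its methods and the set of registered identifiers are assumptions made for illustration, and
biblink/1.0/example-element is a placeholder rather than a real BIBLINK element. The pairing of the
Dublin Core Format element with the MIME scheme is taken from the BIBLINK example in section 3.4.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical identifiers of elements already registered under a namespace.
REGISTERED_ELEMENTS: Set[str] = {
    "dc/1.0/Title",
    "dc/1.0/Format",
    "biblink/1.0/example-element",  # placeholder name, not a real element
}


@dataclass
class ApplicationProfile:
    id: str
    elements: Set[str] = field(default_factory=set)
    # element id -> scheme id constraining its values in this application
    schemes: Dict[str, str] = field(default_factory=dict)

    def include(self, element_id: str) -> None:
        # Profiles may only reuse elements that already belong to a namespace.
        if element_id not in REGISTERED_ELEMENTS:
            raise ValueError(f"{element_id} is not a registered data element")
        self.elements.add(element_id)

    def use_scheme(self, element_id: str, scheme_id: str) -> None:
        if element_id not in self.elements:
            raise ValueError(f"{element_id} is not part of profile {self.id}")
        self.schemes[element_id] = scheme_id


profile = ApplicationProfile("biblinkcore")
profile.include("dc/1.0/Format")                # element from the DC namespace
profile.include("biblink/1.0/example-element")  # element from another namespace
profile.use_scheme("dc/1.0/Format", "MIME")
```

Attempting to include an identifier that is not in the registered set raises an error, which reflects the
rule that a profile cannot introduce elements of its own.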


3.2.3 Schemes

Schemes provide a mechanism for attaching information about valid values for a particular data
element.

There are three kinds of scheme possible in the DESIRE registry:

   Enumerated List - The scheme specifies a set of valid values - scheme elements. Scheme
    elements may be registered within the registry, or they may be indicated via a reference to an
    external definition.

   Rule Set - The scheme is specified by a set of rules that define or describe valid values. The rule
    set is indicated via a reference to an external definition. The semantics of rule sets cannot be
    captured in any way within the registry at present.

   Value Components - The scheme splits a value domain into multiple value components. A valid
    value is then made up of a tuple of valid values from the value components. Note that it is the
    tuple that is a valid value - not each of the values associated with value components.

Recommended schemes may be associated with data elements. Additionally, schemes may be
associated with application profiles to reflect actual usage (strictly, a relationship can be introduced
between an application profile, a data element, and a scheme).
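
The three kinds of scheme could be modelled roughly as shown below. This is a sketch under stated
assumptions, not the registry's implementation: the class names, the example values and the
example.org reference are all hypothetical, and value components are restricted here to enumerated
lists for brevity.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class EnumeratedList:
    values: FrozenSet[str]

    def accepts(self, value: str) -> bool:
        return value in self.values


@dataclass(frozen=True)
class RuleSet:
    reference: str  # pointer to the external definition of the rules

    def accepts(self, value: str) -> Optional[bool]:
        # No machine-readable rules are held, so validity cannot be decided.
        return None


@dataclass(frozen=True)
class ValueComponents:
    components: Tuple[EnumeratedList, ...]

    def accepts(self, value: Tuple[str, ...]) -> bool:
        # The whole tuple is the value; each part is checked against its component.
        return len(value) == len(self.components) and all(
            part.accepts(v) for part, v in zip(self.components, value)
        )


# Hypothetical example values.
languages = EnumeratedList(frozenset({"en", "fr"}))
countries = EnumeratedList(frozenset({"GB", "FR"}))
locale = ValueComponents((languages, countries))
external = RuleSet("http://example.org/rules")

print(languages.accepts("en"))        # True
print(locale.accepts(("en", "GB")))   # True
print(locale.accepts(("en",)))        # False
print(external.accepts("anything"))   # None
```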






Where multiple schemes are permitted for a data element in a specific application it must be possible
to specify the scheme that has been used along with a particular value.

Where an element is repeated within a record with the same value component scheme, it must be
possible to distinguish the value components associated with a particular element from the value
components associated with another. That is, there must be some grouping mechanism that
combines value components into a tuple, which is the value of the element.


3.3 Qualified Dublin Core

Qualified Dublin Core provides a rich set of elements and schemes that can be represented within the
framework of the DESIRE registry.

Notes:

   Only three DC elements are shown.

   Schemes are also associated with a registration authority although this is not shown. DCAgent
    would be associated with the DC registration authority and ISO8601 would be associated with an
    (externally managed) ISO registration authority.

   The registry permits refined elements to be further refined.

   The registry permits schemes to be associated with sub elements (that is, elements that refine
    another element).




3.4 BIBLINK Application Profile

The BIBLINK project defines a vocabulary, BIBLINK Core, which incorporates Dublin Core elements
as well as BIBLINK-specific elements. Vocabularies such as BIBLINK that describe specific usage,
and potentially draw elements from multiple namespaces, are referred to as Application Profiles in the
DESIRE registry.

The relationship between BIBLINK Core and the Dublin Core is relatively complex. BIBLINK Core
elements are one of:

   Element from the BIBLINK namespace.

   Element from the DC namespace.

   Element from the DC namespace with BIBLINK scheme.

   Element from the BIBLINK namespace which refines an element from the DC namespace.

The following diagram illustrates how this is represented within the DESIRE registry.

Notes:

   Not all elements and schemes are shown.

   Schemes also have a registration authority.






   The dotted notation indicates that a scheme is associated with an element in the BIBLINK
    application profile. For example, BIBLINK Core includes the Dublin Core element Format with the
    scheme MIME.




3.5 DC 1.0 / ROADS Cross-walk

The registry automatically generates cross-walks between namespaces based on relationships with
the underlying semantic layer that is currently based on BSR.

BSR includes a mapping to Dublin Core (assumed to be DC 1.0). A mapping from ROADS/IAFA
elements to Dublin Core (also DC 1.0) is also available:

http://www.ukoln.ac.uk/metadata/interoperability/dc_iafa.html

The relationships between DC 1.0 elements and the BSR semantic units were registered. Rather than
registering relationships between DC 1.0 elements and ROADS elements, relationships between
ROADS elements and the semantic units were deduced.

A cross-walk from ROADS to DC 1.0 can now be generated based on relationships with the semantic
layer.




4. USAGE OF PROTOTYPE IMPLEMENTATION
The DESIRE metadata registry currently presents information in a human-readable format (via a
web interface). However, it also provides the basis for future work into machine-accessible metadata
registries.

The prototype implementation does not provide support for end-user registration of new elements.


4.1 Navigation

The DESIRE registry offers both search and browse interfaces for navigating the registry.


4.1.1 Index

The index is the first registry page that the user is presented with after viewing the introductory
explanation text. The index can be returned to from any page within the registry by clicking the
DESIRE Registry logo that appears at the top of each page.

The index provides access to the browse and search interfaces for each of the entities that can be
registered, and to the page for generating cross-walks.






4.1.2 Browse

All of the registered entities of a particular type, for example namespaces, can be viewed via the
browse interface. The listings for each entity type can be accessed from the Index, or, for the most
widely used entities, from the menu that appears at the foot of each page.

The browse listings show a short description of each registered entity. For entities with further
associated information, the full description can be accessed by selecting 'Detail' for a particular item.


4.1.3 Search

The search interface for a particular entity type can be accessed from the Index page. Equality and
substring (contains) searches are supported for each entity type.

For example, via the Namespace search, it is possible to search for namespaces where the id
contains dc.
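
The two kinds of search can be illustrated with a few lines of code. The function below is a hypothetical
sketch, not the registry's own search code; it assumes case-sensitive matching, which this report does
not specify, and uses namespace identifiers that appear elsewhere in this document.

```python
from typing import Iterable, List

# Namespace identifiers that appear elsewhere in this report.
NAMESPACE_IDS = ["dc/1.0", "dc/1.1", "bsr/1.0", "roads/2.0", "biblink/1.0"]


def search(ids: Iterable[str], term: str, mode: str = "contains") -> List[str]:
    """Return the ids matching `term` by equality or by substring."""
    if mode == "equals":
        return [i for i in ids if i == term]
    if mode == "contains":
        return [i for i in ids if term in i]
    raise ValueError(f"unsupported search mode: {mode}")


print(search(NAMESPACE_IDS, "dc"))            # ['dc/1.0', 'dc/1.1']
print(search(NAMESPACE_IDS, "dc", "equals"))  # []
```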


4.2 Viewing Registered Entities

For each major entity type that can be registered within the registry, a short (browse) form is
displayed when browsing all entities of a particular type or viewing search results. Where further
information is associated with an entity, a full description can be accessed by clicking on 'Detail' in the
short form. Minor entities (such as the value elements associated with a scheme) are only accessible
through navigating links from major entities.

The registry supports navigation through the data model via hyperlinks. Where one entity refers to
another, for example where an element refers to its registration authority, the related entity can be
accessed by clicking on its identifier.


4.2.1 Registration Authorities

According to ISO/IEC 11179 a registration authority is 'any organization authorized to register data
elements'. ISO/IEC 11179 provides detailed instructions for the setup and conduct of a registration
authority. The registration authorities referred to in the DESIRE registry prototype are not registration
authorities in this formal sense although they play the same role. The formal registration of authorities
was not considered necessary for the registry prototype although it would be appropriate for a real
service.

Within the DESIRE registry, registration authorities are used to indicate the source of the definition of
a data element or other entity that is registered. For example, the Dublin Core Metadata Initiative is
the registration authority for elements in the dc/1.0 and dc/1.1 namespaces.

For each registration authority a name and a URL associated with the authority are given. A short
identifier is also introduced; this is used for cross-referencing.

Example:

ID     Name                                                      URL

DC     Dublin Core Metadata Initiative                           http://purl.org/dc/






4.2.2 Namespace Concepts

Namespace concepts relate different versions of the same namespace. A namespace is formed by
associating a version number with a namespace concept. For example dc/1.0 and dc/1.1 are both
versions of the Dublin Core (dc) namespace concept.

The full view for namespace concepts includes the following information:

Name                                           Dublin Core

ID                                             dc

Registration Authority                         DC

Description                                    The Dublin Core is a simple metadata element
                                               set intended to facilitate discovery of electronic
                                               resources.

Status                                         Community Consensus - RFC

URL                                            http://purl.org/dc/

Comment



The namespaces derived from this namespace concept are also listed:

Namespace

dc/1.0

dc/1.1




4.2.3 Namespaces

Every element within the DESIRE registry is defined within a namespace. This means that elements
that share the same name, but belong to different namespaces, can be uniquely identified. For
example, the element 'Title' appears in both dc/1.0 and dc/1.1.

ID                           dc/1.0

Version                      1.0

Description                  Version 1.0 of the Dublin Core Element Set. 15 elements are
                             defined.

Registration Authority       DC

Namespace Concept            dc

Status                       RFC

URL                          http://purl.org/dc/documents/rec-dces-199809.htm








The elements and/or semantic units that are defined within the displayed namespace are also shown.
This means that the Namespaces provide a useful navigation point for the registry.


4.2.4 Semantic Units

Semantic Units define concepts independently of a specific context. The definitions of semantic units
are taken from the Basic Semantics Register (BSR). Only definitions that correspond to the 15 Dublin
Core elements have been registered for the purposes of the prototype.

Example:

Name                         InformationResource.Name

ID                           bsr/1.0/2043

Namespace                    bsr/1.0

Definition                   The name given as the distinctive designation of the information
                             resource. Note: GILS name: title, Dublin Core name: title; MARC
                             245$a

Status                       Under Review

Comment



The status 'Under Review' corresponds to the status of ISO BSR.


4.2.5 Elements

Elements (also referred to as data elements or metadata elements) are the central items in the
registry. All elements should have a corresponding semantic unit in order to support the automatic
generation of cross-walks.

Example:

ID                        dc/1.1/title

Name                      Title

Definition                A name given to the resource.

Datatype                  Character String

Obligation                Optional

Namespace                 dc/1.1

URL                       http://purl.org/DC/documents/rec-dces-19990702.htm



The fields datatype and obligation are required for all elements under ISO/IEC 11179.






The following items are also displayed and can be used for navigation:

    Usage - Usage details for this element in specific contexts.

    Schemes - Schemes that specify constraints on valid values for this element.

    Semantic Unit - The semantic unit, if any, to which this element corresponds.

    Super Element - The element, if any, which this element refines.

    Sub Elements - The elements, if any, which refine this element.


4.2.6 Schemes

Schemes specify valid values for data elements. Schemes encapsulate representation details that are
required for ISO/IEC 11179. Specifying schemes separately from data elements allows multiple
schemes to be associated with the same element. This is appropriate when the registered elements
are metadata elements with multiple valid schemes. For example, there may be multiple valid
classification schemes associated with a 'subject' data element; any one of these schemes could be
used provided the scheme is specified along with the value.

Example:

ID                          RFC1766

Description                 Language codes

Registration Authority      IETF

URL                          http://www.ietf.org/rfc/rfc1766.txt



The following related items are also shown:

    Applies to Elements - The elements within the registry to which the scheme applies.

    Application Profile Usage - The elements, if any, that this scheme is associated with via an
     application profile.

    Components - The value components associated with any data element using this scheme (only
     appropriate if the current scheme is a 'value component scheme').

    Values - The values that can be used for a data element using this scheme (only appropriate if
     the current scheme is an 'enumerated list scheme').

Note that the same scheme may be associated with multiple data elements - for example the
ISO8601 scheme for dates may be associated with any data element that can take a date value.


4.2.7 Application Profiles

An application profile groups together a set of elements for use in a particular context. For example,
the BIBLINK application profile describes the metadata element set used within the BIBLINK project.

An application profile may use registered elements directly, or may introduce context-specific details.
Schemes can also be associated with elements via application profiles. The schemes that apply to
data elements directly are assumed to be valid within the application profile (a possible extension to
the data model would be to support the overriding of inherited schemes with those within an
application profile).

Example:

ID                           biblinkcore

Name                         BIBLINK Core

Version                      1.0

Description                  The 19 BIBLINK Core fields.

Status                       Deployed

Registration Authority       BIBLINK

Constraints



The constraints field allows constraints that apply to more than one element to be specified. For
example, it would be possible to specify that at least one of the 'publication date' and 'version' fields
must appear in a metadata record.
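
A constraint of this kind could be checked mechanically. The sketch below is an assumption about how
that might be done: this report does not say how the constraints field is expressed, and the function
name, record layout and example values are hypothetical.

```python
from typing import Dict, Iterable


def at_least_one_of(record: Dict[str, str], fields: Iterable[str]) -> bool:
    """True if the record has a non-empty value for any of the named fields."""
    return any(record.get(name) for name in fields)


required = ["publication date", "version"]

with_version = {"title": "Example resource", "version": "2"}
with_neither = {"title": "Example resource"}

print(at_least_one_of(with_version, required))  # True
print(at_least_one_of(with_neither, required))  # False
```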

The following information is also presented:

Elements - The elements that are included in the application profile.

Usage - The usage descriptions associated with any included elements.

Schemes - The schemes that apply to elements included in the profile.


4.3 Generating a Cross-walk

The DESIRE registry supports the automatic generation of cross-walks between namespaces, via the
Basic Semantics Register (BSR).

The namespaces to map from and to are selected and clicking on 'Go' will generate a cross-walk.

For example, generating a cross-walk from dc/1.0 (Dublin Core version 1 elements) to roads/2.0
(ROADS 2.0 template attributes) gives the following result:

dc/1.0                  via                     roads/2.0

dc/1.0/title            bsr/1.0/2043            roads/2.0/Title
dc/1.0/creator          bsr/1.0/2044            roads/2.0/Author-Name
dc/1.0/date             bsr/1.0/2046            roads/2.0/Creation-Date
dc/1.0/Subject          bsr/1.0/2050            roads/2.0/Keywords
dc/1.0/Description      bsr/1.0/2049            roads/2.0/Description
dc/1.0/Publisher        bsr/1.0/2071            roads/2.0/Publisher-Name
dc/1.0/Contributor
dc/1.0/Type             bsr/1.0/2069            roads/2.0/Category
dc/1.0/Format           bsr/1.0/2094            roads/2.0/Format
dc/1.0/Identifier       bsr/1.0/2095            roads/2.0/URI
dc/1.0/Source           bsr/1.0/2096            roads/2.0/Source
dc/1.0/Language         bsr/1.0/2048            roads/2.0/Language
dc/1.0/Relation
dc/1.0/Coverage
dc/1.0/Rights



All elements of the 'from' namespace (dc/1.0 in this case) are listed in the left-hand column. For all
cases where there is a corresponding element in the 'to' namespace (roads/2.0 in this case), the 'to'
element is listed together with the semantic unit via which the mapping was made.

Note that direct mappings from dc/1.0 to roads/2.0 are not stored in the registry. The mapping is
possible because both namespaces are mapped to semantic units.
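
The generation step amounts to joining the two namespaces on their shared semantic units. The sketch
below illustrates the idea using a few rows from the table above; it is not the registry's own code. The
function and variable names are assumptions, and it further assumes that at most one element in the
target namespace corresponds to each semantic unit.

```python
from typing import Dict, List, Optional, Tuple

# Element -> semantic unit relationships for a few rows of the table above.
DC_TO_BSR: Dict[str, Optional[str]] = {
    "dc/1.0/title": "bsr/1.0/2043",
    "dc/1.0/creator": "bsr/1.0/2044",
    "dc/1.0/date": "bsr/1.0/2046",
    "dc/1.0/Rights": None,  # semantic unit not shown in the table above
}

ROADS_TO_BSR: Dict[str, str] = {
    "roads/2.0/Title": "bsr/1.0/2043",
    "roads/2.0/Author-Name": "bsr/1.0/2044",
    "roads/2.0/Creation-Date": "bsr/1.0/2046",
}

Row = Tuple[str, Optional[str], Optional[str]]


def crosswalk(source: Dict[str, Optional[str]], target: Dict[str, str]) -> List[Row]:
    """Join two namespaces on the semantic unit each element maps to."""
    by_unit: Dict[str, str] = {unit: element for element, unit in target.items()}
    rows: List[Row] = []
    for element, unit in source.items():
        match = by_unit.get(unit) if unit is not None else None
        # Every source element is listed; unmatched ones get an empty row.
        rows.append((element, unit if match else None, match))
    return rows


for row in crosswalk(DC_TO_BSR, ROADS_TO_BSR):
    print(row)
```

No direct dc/1.0 to roads/2.0 relationship appears anywhere in the input; each row is derived purely
from the two sets of relationships with the semantic units.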


4.4 Registry Glossary

The glossary within the registry provides definitions of terms used within the DESIRE registry which
do not correspond to entities to be registered. These terms include the standards used (such as
ISO/IEC 11179), terms used in the data model (such as Element) and terms associated with
registered namespaces (such as Dublin Core).

A list of all glossary terms can be viewed via the browse interface (accessible from the Index or the
Menu at the foot of each page). Alternatively, specific terms can be searched for via the search
interface (accessible from the Index).

For each glossary term an associated definition and URL is provided. The URL links to the external
source of the definition, if appropriate, and to the registry itself for terms that have a particular
meaning within the DESIRE registry.

The glossary section of this document includes the terms defined in the DESIRE registry glossary but
is extended to expand acronyms used within this document.


5. ELEMENT SETS IN DEMONSTRATOR
The DESIRE metadata registry demonstrator does not manage a single namespace but provides
information across a range of metadata schemas. Several different element sets have been included
within the registry so that it can demonstrate how an application profile can group together a set of
elements for use in a particular context. For example, the BIBLINK Core application profile would
include all 19 BIBLINK Core elements, made up of elements taken from both the dc/1.0 and
biblink/1.0 namespaces.

The following sub-sections give general background information on all of the element sets currently
included in the demonstrator.





5.1 BIBLINK

BIBLINK was a multi-partner project that was funded by European Commission DG XIII/E-4 under the
Telematics Applications Programme of the European Union's Fourth Framework Programme. The
project commenced in 1996 and was concerned with the establishment of electronic links between
national bibliographic agencies and publishers (Day, Heery and Powell, 1999). The project developed
a demonstrator system that enabled publishers of digital objects to submit metadata to a BIBLINK
Workspace (BW) where it could be converted into the (mostly) MARC-based formats used by the
participating national bibliographic agencies. These agencies could then enhance the records (e.g. by
the application of authority control for proper names or the addition of subject information) for
inclusion in a national bibliography or for returning to the publisher.

In order to help demonstrate the feasibility of the metadata conversion process and to provide a
relatively simple format in which publishers could submit metadata, Project BIBLINK defined the
semantics of 19 metadata elements - the BIBLINK Core (BC). The BC elements were based on
Dublin Core and constituted an extended form of DC (as it was then defined). BC included 12 of the
basic 15 DC elements (some qualified), with an additional 7 elements derived from the participating
national libraries' own metadata requirements. The semantics of the BC elements are described at:

http://hosted.ukoln.ac.uk/biblink/wp8/fs/bc-semantics.html


5.2 Dublin Core

The Dublin Core Metadata Initiative (DCMI) is an international and interdisciplinary attempt to define a
'core' set of descriptive metadata elements for resource discovery. The element set was initially
developed through a series of workshops sponsored by the Online Computer Library Center (OCLC)
and other organisations, the first workshop being held at OCLC's US headquarters in Dublin, Ohio in
March 1995. This arrangement has since become more formalised with the creation of a Dublin Core
Directorate (hosted by the OCLC Office of Research) with an Executive Committee and an Advisory
Committee.

The Dublin Core Metadata Element Set currently consists of 15 elements. These elements were first
formally defined in RFC 2413 (1998), but the most recent definition is contained in the Reference
Description of the Dublin Core Metadata Element Set Version 1.1 (1999). Both versions have been
included in the DESIRE metadata registry.

The Dublin Core home page can be found at:

http://purl.org/dc


5.2.1 Dublin Core 1.0

RFC 2413 (1998) provided definitions of the semantics of the fifteen Dublin Core elements. These
definitions are known as the Dublin Core Metadata Element Set Version 1.0. DC Version 1.0 has
since been superseded by DC Version 1.1. A Reference Description of DC Version 1.0 can be found
at:

http://purl.org/DC/documents/rec-dces-199809.htm


5.2.2 Dublin Core 1.1

The Dublin Core Metadata Element Set Version 1.1 contains updated definitions for the metadata
elements originally defined in RFC 2413 (1998). In Version 1.1 the element definitions are formally
described using ten attributes taken from ISO/IEC 11179. A Reference Description of DC Version 1.1
can be found at:

http://purl.org/DC/documents/rec-dces-19990702.htm





5.2.3 Qualified Dublin Core

From an early stage of the development of Dublin Core it has been envisaged that the basic 15 DC
elements would need to be refined by the use of qualifiers. In DC terms, qualifiers are attributes that
may be used to further refine (but not extend) the meaning of a DC element. The DC-4 workshop in
Canberra defined three types of qualifier known as TYPE (or sub-element), SCHEME and
LANGUAGE (Weibel, Iannella and Cathro, 1997). Work on implementing DC in terms of the Resource
Description Framework (RDF) revealed, however, that the use of these terms could at times be
confusing, as similar tasks can often be "tackled by means of different solutions" (Miller, Miller and
Brickley, 1999). As a result, the DC Data Model Working Group began to reconsider DC qualifiers
from first principles and then evolved the following structure:

•   Element qualifiers. Element qualifiers refine the semantics of a DC element. For example, the
    term 'photographer' is a narrower one than 'creator'. Therefore, the term 'photographer' refines
    'creator' and is a valid element qualifier. Element qualifiers must conform to the 'dumb-down
    principle' which states that the value of a qualified element must also be a valid value of the
    unqualified element.

•   Value qualifiers. Value qualifiers provide contextual information about the value of a DC element.
    Some value qualifiers explain how the value should be parsed. For example, a value qualifier of
    'ISO 8601' indicates that the string '1999-10-02' should be parsed as the second day of October
    1999. Other value qualifiers indicate that the value is taken from a particular controlled
    vocabulary. For example, a value qualifier of 'LCSH' would indicate that the value is taken from
    the list of Library of Congress Subject Headings; 'DDC' would indicate a classification code taken
    from the Dewey Decimal Classification system.

•   Value components. Some values are composed of multiple components. Components may be
    implicit in a value that is formatted in a particular way. For example, an ISO 8601 date may be
    made up of a year, a month and a day. Other components may be explicitly named within a value,
    for example in a value encoded using XML. A value made up of value components is known as a
    'structured value'. The DC Data Model Working Group has reached no firm consensus yet about
    how such value components should be represented.
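
The three kinds of qualifier described above can be sketched in a few lines of Python. This is only
an illustration under stated assumptions: the REFINES table, the function names and the sample
creator value are hypothetical, and the parser handles just the full YYYY-MM-DD form of ISO 8601.

```python
from datetime import date
from typing import Optional

# Hypothetical table of element qualifiers: qualifier -> element it refines.
REFINES = {"photographer": "creator"}


def dumb_down(element: str) -> str:
    """Map a qualified element to the unqualified DC element it refines."""
    return REFINES.get(element, element)


def parse_value(value: str, value_qualifier: Optional[str] = None):
    """Interpret a value according to its value qualifier, if any."""
    if value_qualifier == "ISO 8601":
        # Handles only the full calendar-date form YYYY-MM-DD.
        return date.fromisoformat(value)
    return value  # unknown or absent qualifier: keep the plain string


# Element qualifier: the value stays valid for the unqualified element.
statement = ("photographer", "A. N. Other")
simple = (dumb_down(statement[0]), statement[1])

# Value qualifier: tells the reader how to parse the string.
parsed = parse_value("1999-10-02", "ISO 8601")

# Value components: implicit in the formatted value.
print(simple, parsed.year, parsed.month, parsed.day)
```

A client that does not recognise 'photographer' can still treat the statement as a plain 'creator'
statement, which is the behaviour the dumb-down principle is meant to guarantee.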

DCMI has set up working groups, each identifying the qualifiers that pertain to its own focus - typically a
single element. General information on the DCMI Working Group Qualifier Proposals can be found at:

http://purl.org/DC/groups/qualifierlist.htm


5.3 eLib Simple Collection Level Description

In 1997, following a workshop on "Integrating access to resources across domains" (MODELS 4), the
UK Electronic Libraries Programme (eLib) commissioned a 'supporting study' on collection level
description. Some phase 3 eLib projects had noted the need for some cross-domain collection
description standard to aid large-scale resource discovery services (or clumps). A Collection
Description Working Group was set up and produced a review of existing practice and a proposal for
a core set of collection level description attributes (Powell, 1999). The proposal contained 23
elements (12 of them taken from DC) grouped into those that describe a collection itself and those
that describe a service that provides access to a collection.

The proposed set of collection description metadata elements was intended to allow:

•   Users to discover, locate and access collections of interest;

•   Users to perform searches across multiple collections in a controlled way;

•   Software to perform such tasks on behalf of users based on known user preferences.

Definitions of the semantics of these simple collection level description elements can be found at:





http://www.ukoln.ac.uk/metadata/cld/simple/


5.4 ISO Basic Semantics Register

A working group (WG 1) investigating the creation of a Basic Semantics Register (BSR) was set up in
1998 by ISO Technical Committee 154 (TC 154) to "act as a central reference to assist in the
universal, multilingual understanding of data across commerce, industry and administration." The
BSR has been defined (Chapdaniel, 1999) as an "official ISO register of non-ambiguously defined
semantic data." BSR data are identified by numbers (so they do not depend on any particular language)
and are intended to describe concepts independently of any particular context. Because BSR
concepts are context-independent, BSR-defined semantic units are a useful way to share semantics
in a "neutral" way (Chapdaniel, 1999). BSR can also act as a tool for establishing bridges between
different data dictionaries. The terms in the register have been defined using the
rules laid down in ISO/IEC 11179 (Bryan and Li, 1999).

Both semantic components and semantic units have been proposed as BSR content. Basic Semantic
Units (BSUs) have been defined as (Premenos, 1995):

                 ... a concept unambiguously defined and applicable in one or more
                 contexts in an EDI environment. It may be part of a broader concept
                 in which case it shall possess at least all the characteristics of that
                 concept.

A semantic unit description in BSR includes its identification number, its type (in this context, usually
a BSU), a definition (including notes of corresponding fields in GILS, DC and MARC) and its name in
both English and French. For example, the proposed BSU for language codes takes the following
form:

ID      TYPE      DEFINITION                              NAME (ENGLISH)                       NAME (FRENCH)

2048    BSU       The code identifying the language of    InformationResource.Language.Code    GILS name: langue de la
                  the information resource. Note: GILS                                         resource
                  name: language of resource; Dublin
                  Core name: language; MARC 041$a

Within the DESIRE metadata registry, cross-walks between element sets (namespaces) are produced
by mapping all elements (where possible) to the BSR.

More information on BSR and the work of ISO TC 154 WG 1 can be found at:

http://forum.afnor.fr/afnor/WORK/AFNOR/GPN2/TC154WG1/index.htm

More information on the work of ISO TC 154 can be found at:

http://www.iso.ch/meme/TC154.html


5.5 ROADS

The ROADS software is a suite of programs intended to aid in the setting up and day-to-day running
of World Wide Web based catalogues of on-line resources. The UK Electronic Libraries Programme
(eLib) initially funded the development of ROADS as part of its Access to Network Resources (ANR)
strand. eLib also funded a number of gateway-type services, some of which implemented the ROADS
software tools and contributed to its development. A number of Internet subject guides or gateways
are currently based on ROADS.

ROADS-based services use a metadata format known as ROADS templates. They are based on the
Internet Anonymous FTP Archive (IAFA) templates that were published in an Internet-Draft in 1994
(Deutsch et al., 1994). For this reason they are sometimes referred to as ROADS/IAFA templates.
The templates themselves are text (ASCII) based and take the form of simple attribute-value pairs
separated by a colon and a space. ROADS templates are currently defined for 15 different
resource-types. These are known as Template-Types. Some of these Template-Types (e.g.
DOCUMENT, MAILARCHIVE and SERVICE) existed in the original IAFA template specification.
Others have been developed specifically for ROADS-based services (e.g. PROJECT). At least one of
the others (TRAINMAT) was independently developed and has been published as RFC 2007.
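
Because the templates are plain attribute-value pairs, they are simple to read programmatically. The
Python sketch below is illustrative only: the function name and the sample record are made up, and
it handles only the one-pair-per-line form described above, not every feature of the template
specification.

```python
def parse_template(text: str) -> list:
    """Split each non-blank line at the first ': ' into (attribute, value)."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        attribute, sep, value = line.partition(": ")
        if not sep:
            raise ValueError(f"not an attribute-value pair: {line!r}")
        pairs.append((attribute.strip(), value.strip()))
    return pairs


# Made-up sample record in the attribute-value style described in the text.
sample = """\
Template-Type: DOCUMENT
Title: An example resource
Language-v1: en
"""

record = parse_template(sample)
print(record)
```

A list of pairs is used rather than a dictionary so that repeated attributes, if present, are not
silently overwritten.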

The DESIRE metadata registry contains attributes from the ROADS SERVICE and DOCUMENT
Template-Types.

More information about the different Template-Types available and a template registry can be found
at:

http://www.ukoln.ac.uk/metadata/roads/templates/

More information about the ROADS project can be found at:

http://www.ilrt.bris.ac.uk/roads/


6. METADATA MAPPINGS AND CROSS-WALKS
In order to promote semantic interoperability between metadata formats, metadata registries need to
contain mappings and cross-walks between metadata standards. In a resource discovery context,
these mappings and cross-walks can be used both for format conversion (converting one metadata
format into another) and for supporting the development of systems that search across
heterogeneous data sources.

The EU-NSF Working Group on Metadata (1998) has attempted to define what is meant by these
terms:

                 Mappings … represent relationships that are unambiguous; they
                 support transparent searching across domains. Crosswalks are more
                 complex frameworks that establish the relationship between
                 schemas that have significantly different syntaxes or semantics.

A number of metadata mappings and cross-walks have been published (e.g. Day, 1996). Perhaps the
best known of these is the Dublin Core, MARC 21 and GILS cross-walk produced by the Library
of Congress Network Development and MARC Standards Office (1999).

Producing accurate mappings and cross-walks is not particularly easy. St. Pierre and LaPlant (1998)
comment that:

                 Unfortunately, the specification of a crosswalk is a difficult and
                 error-prone task requiring in-depth knowledge and specialized
                 expertise in the associated metadata standards. Obtaining the
                 expertise to develop a crosswalk is particularly problematic because
                 the metadata standards themselves are often developed
                 independently, and specified differently using specialized
                 terminology, methods and processes.

For this reason, once an authoritative mapping or cross-walk has been developed, a metadata
registry provides a logical place to declare the information.

The prototype DESIRE metadata registry implementation does not contain direct mappings between
the element sets included within the registry, even where these are available, e.g. for ROADS
Templates to Dublin Core (Day, 1996). Instead, all elements should be mapped onto a semantic unit
taken from the ISO Basic Semantics Register. For example:

 Dublin Core 1.1       BSR Semantic Unit                                   ROADS Template

 Language              2048 InformationResource.Language.Code              Language-v1

Using this information, the registry can automatically generate a cross-walk between two
namespaces. This approach means that it will be easier to keep the registry up to date and to add
new metadata vocabularies.
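
The generation step can be sketched as follows. This is a minimal Python illustration, not the
registry's own code: the TO_BSR table and the crosswalk function are hypothetical names, and only
the Language row shown above is used as data.

```python
# Each namespace maps its element names to BSR semantic unit identifiers.
# Only the Language row is taken from the text; a real registry holds many more.
TO_BSR = {
    "Dublin Core 1.1": {"Language": 2048},
    "ROADS Template": {"Language-v1": 2048},
}


def crosswalk(source: str, target: str) -> dict:
    """Pair source elements with target elements that share a BSR unit."""
    by_unit = {}
    for element, unit in TO_BSR[target].items():
        by_unit.setdefault(unit, []).append(element)
    return {
        element: by_unit[unit]
        for element, unit in TO_BSR[source].items()
        if unit in by_unit
    }


print(crosswalk("Dublin Core 1.1", "ROADS Template"))
```

With this design each new namespace needs only one mapping (to the BSR) rather than a separate
mapping to every namespace already in the registry.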

However, there are disadvantages as well. If the semantic layer itself is not very detailed or consistent,
inaccuracies could creep into the generated cross-walks. In particular, if the semantic layer is not
detailed enough, the translation will begin to suffer from information loss. On the other hand, if the
semantic layer is not general enough, 'coarse-grained' navigation and discovery could begin to suffer.
In principle, these problems could be addressed by defining hierarchical relationships within the
intermediate layer and defining more detailed relationships between this layer and the elements in a
particular scheme.
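
One way to picture such hierarchical relationships is a lookup that falls back to a broader semantic
unit when the target scheme has no element for the exact one. The Python sketch below is purely
illustrative: the unit names, the BROADER table and the target element 'Author' are all
hypothetical and do not come from the BSR or the prototype.

```python
# Hypothetical semantic units: child -> broader parent (None at the top).
BROADER = {"creator.photographer": "creator", "creator": None}

# Hypothetical target scheme that only has an element for the broad unit.
TARGET_ELEMENTS = {"creator": "Author"}


def map_unit(unit):
    """Walk up the hierarchy until the target scheme has a matching element.

    Returns (target_element, exact); exact is False when a broader unit was
    used, which signals that some detail was lost in translation.
    """
    exact = True
    while unit is not None:
        if unit in TARGET_ELEMENTS:
            return TARGET_ELEMENTS[unit], exact
        unit = BROADER.get(unit)
        exact = False
    return None, False


print(map_unit("creator.photographer"))
```

Returning the 'exact' flag alongside the element lets a generated cross-walk record where
information loss has occurred instead of hiding it.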


7. PROTOTYPE IMPLEMENTATION DETAILS
The prototype DESIRE Registry, described here, is accessible from:

http://desire.ukoln.ac.uk/registry/


7.1 Database

The DESIRE registry prototype is built on top of a relational database. This approach is well
understood and straightforward to implement. Potential alternatives would have been an XML
database or a database storing RDF triples. Alternative approaches to storing metadata are being
considered elsewhere in the DESIRE project and the outcomes of that research will influence future
development of metadata registries. For the DESIRE metadata registry prototype implementation a
simple relational database was sufficient to implement the data model and provide a proof of concept
application.

The freely available and widely used MySQL database was used on a Solaris system.
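
To give a feel for how a simple relational design can support the registry, the Python sketch below
builds a two-table schema and runs a cross-walk query. It is an assumption-laden illustration, not
the prototype's actual schema: the table and column names and the namespace labels are
hypothetical, and an in-memory SQLite database stands in for the database used by the prototype.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE namespace (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE element (
    id INTEGER PRIMARY KEY,
    namespace_id INTEGER NOT NULL REFERENCES namespace(id),
    name TEXT NOT NULL,
    bsr_unit INTEGER              -- BSR semantic unit, if mapped
);
""")

# Hypothetical namespace labels; the Language elements and unit 2048 are from the text.
conn.executemany("INSERT INTO namespace (id, name) VALUES (?, ?)",
                 [(1, "dc/1.1"), (2, "roads")])
conn.executemany(
    "INSERT INTO element (namespace_id, name, bsr_unit) VALUES (?, ?, ?)",
    [(1, "Language", 2048), (2, "Language-v1", 2048)],
)

# Cross-walk query: join the element table to itself on the shared BSR unit.
rows = conn.execute("""
    SELECT s.name, t.name
    FROM element s
    JOIN element t ON s.bsr_unit = t.bsr_unit
    WHERE s.namespace_id = 1 AND t.namespace_id = 2
""").fetchall()
print(rows)
```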


7.2 User Interface

The user interface for the registry was implemented as a web application. The user interface for the
prototype provides read-only access to registry data.

PHP was chosen as a scripting language for its rapid development cycle and integrated access to
MySQL. The user interface is accessed via a web browser. CSS1 style sheets have been used for
layout but the registry is still accessible via older browsers without support for style sheets.


7.3 Admin Interface

The admin interface to the DESIRE registry prototype was provided by the phpMyAdmin tool, which
provides a web interface for managing MySQL databases. This interface provides read/write access
to the database and was used for data entry.

The admin interface requires users to understand the underlying relational database schema and is
not suitable for general end users. The prototype does not provide an interface for end-users to
update the registry.






PART IV

8. REFERENCES
Bargmeyer, B., McCarthy, J., Olken, F. and Miller, N., 1997, Joint Workshop on Metadata Registries:
workshop report. Draft 1.6, 23 December. http://pueblo.lbl.gov/~olken/Workshop/report.html

Bryan, M. and Li, M.-S., 1999, Electronic Data Interchange (EDI) Standards. Luxembourg: European
Commission Information Society DG. http://www2.echo.lu/oii/en/edi.html

Chapdaniel, A., 1999, Basic Semantics Register - tools for federation. Recent Developments in
Standards for Electronic Publishing, Paris, 21-22 January. http://inf2.pira.co.uk/agenda9.htm

Day, M., 1996, Mapping between metadata formats. Bath: UKOLN: the UK Office for Library and
Information Networking. http://www.ukoln.ac.uk/metadata/interoperability/

Day, M., Heery, R. and Powell, A., 1999, National bibliographic records in the digital information
environment: metadata, links and standards. Journal of Documentation, 55 (1), 16-32.

Deutsch, P., Emtage, A., Koster, M. and Stumpf, M., 1994, Publishing information on the Internet with
Anonymous FTP. Internet-Draft. http://info.webcrawler.com/mak/projects/iafa/iafa.txt

EU-NSF Working Group on Metadata, 1998, Metadata for digital libraries: a research agenda. Draft
10. Le Chesnay: ERCIM. http://www.ercim.org/publication/ws-proceedings/EU-NSF/metadata.html

Heery, R., 1997, Naming names: metadata registries. Ariadne (Web version), No. 11, September.
http://www.ariadne.ac.uk/issue11/metadata/

ISO 8601:1988, Data elements and interchange formats -- Information interchange -- Representation
of dates and times. Geneva: International Organization for Standardization.

ISO/IEC 11179 (Parts 1 to 6), Information technology -- Specification and standardisation of data
elements. Geneva: International Organization for Standardization.

Miller, E., Miller, P. and Brickley, D., 1999, Guidance on expressing the Dublin Core within the
Resource Description Framework (RDF). Dublin Core Metadata Initiative, Working Draft. Bath:
UKOLN. http://www.ukoln.ac.uk/metadata/resources/dc/datamodel/WD-dc-rdf/

Network Development and MARC Standards Office, 1999, Dublin Core/MARC/GILS Crosswalk.
Washington, D.C.: Library of Congress, 14 October. http://lcweb.loc.gov/marc/dccross.html

Powell, A., ed., 1999, Simple Collection Description. Draft Version. Bath: UKOLN: the UK Office for
Library and Information Networking, 2 August. http://www.ukoln.ac.uk/metadata/cld/simple/

Premenos, 1995, Re: BSR specs? Mail to EDI-L mailing-list, 23 January.
http://mlarchive.ima.com/edi-l/1995/0187.html

RFC 2007, 1996, Catalogue of network training materials. Internet Engineering Task Force.
http://www.ietf.org/rfc/rfc2007.txt

RFC 2413, 1998, Dublin Core metadata for resource discovery. Internet Engineering Task Force,
September. http://www.ietf.org/rfc/rfc2413.txt

St. Pierre, M. and LaPlant, W.P., 1998, Issues in crosswalking content metadata standards.
Bethesda, Md.: NISO, 15 October. http://www.niso.org/crsswalk.html






Weibel, S., Iannella, R. and Cathro, W., 1997, The 4th Dublin Core Metadata Workshop report. D-Lib
Magazine, June. http://www.dlib.org/dlib/june97/metadata/06weibel.html



