The Role Of Metadata
Brian Kelly
UKOLN
University of Bath
Bath, BA2 7AY
Email
B.Kelly@ukoln.ac.uk
URL
http://www.ukoln.ac.uk/web-focus/presentations
UKOLN is supported by:
A centre of expertise in digital information management www.ukoln.ac.uk
Introduction Contents
• Introduction
• Background To Metadata
• Metadata Standards
• Metadata Management
• Metadata And Quality
• Conclusions
The Brief
"I know from conversations … I have had with customers,
that metadata poses some really difficult questions …"
The talk addresses the questions:
What is metadata and why is it important? What's this Dublin
Core I've heard about (and why Dublin?) What benefits will I
get if I use metadata? How should I do it? What will it cost
me?
Introduction About UKOLN / Web Focus
UKOLN:
• A national centre of expertise in digital information
management (including metadata)
• Based at University of Bath
• Funded by JISC and Resource to support the
Higher & Further / cultural heritage sectors
UK Web Focus:
• Provides advice and support on Web issues,
especially standards and best practices
• Provided by Brian Kelly
• Funded by JISC from Nov 1996 - August 2003.
Now jointly funded by JISC & Resource
QA Focus:
• Developing QA methodology to support JISC
digital library programmes
Introduction About You
How many are:
• Librarians
• Software / systems developers (techies)
• Commercial vendors
• Others
What is the extent of your knowledge of metadata?
Novice: ???
Average: MARC, Dublin Core, …
Expert: RDF, OAI, CLD, …
Background What is Metadata?
"This metadata you've been talking about …. isn't it
just catalogue records?"
Question at metadata seminar, 1998
Metadata can be regarded as:
• Catalogue records for the Web
• Data about data
• Structured information suitable for automated
processing
Metadata Demystified
In current practice, the term has come to mean structured
information that feeds into automated processes, and this
is currently the most useful way to think about metadata
http://www.niso.org/standards/resources/Metadata_Demystified.pdf
Background The Problem
Back in mid-1990s:
• Size of Web growing exponentially
• Web being used for both scholarly and
non-scholarly (!) purposes
• Need for better searching mechanisms
• Search engines seemed promising, but
concerns over abuse (e.g. porn index
spammers) and difficulties in finding
quality information
• Various sectors came together to develop
a core set of metadata attributes for
resource discovery
Dublin Core Dublin Core
In mid-1990s:
• Meeting held in Dublin, Ohio in 1995
• Involvement from several sectors
(libraries, museums, science, IT, …)
• Agreement reached on a core set of
metadata attributes for resource discovery
• Given the name Dublin Core (DC)
• DCMI organisation later formed
• DC Working parties established to
coordinate development of DC
• Regular annual conferences held
See <http://dublincore.org/>
Dublin Core Why So Complex?
Why is there a need for working groups,
annual events, etc. for developing a standard
for catalogue records?
• It's not just documents: an Author record is
inappropriate for a painting, a piece of music, etc.
• It's not just for humans: the DC records will be
processed by software, for which unambiguity is
essential
• It needs to be integrated: with a rapidly-developing
Web architecture
• It needs to be future-proofed: so we don't have
to do it all again when a new technology emerges
Dublin Core Using Dublin Core
Note that DCMI defined a core set of elements:
Title A name given to the resource.
Creator An entity primarily responsible for
making the content of the
resource.
Publisher An entity responsible for making
the resource available.
Date A date of an event in the lifecycle
of the resource.
… …
How this format could be represented was not
defined initially
Dublin Core Representing Dublin Core
Initially many people thought that DC would be
embedded in HTML pages:
<META NAME="DC.Creator" CONTENT="Brian Kelly">
but how are multiple authors represented?
<META NAME="DC.Creator" CONTENT="Brian Kelly">
<META NAME="DC.Creator" CONTENT="John Smith">
or
<META NAME="DC.Creator" CONTENT="Brian Kelly,
John Smith">
It is not possible to describe the potential complexities
of DC in the HTML language
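The two candidate representations above can be compared with a small Python sketch. The helper names (`dc_meta`, `creators_repeated`, `creators_packed`) are hypothetical and exist only for this illustration; they are not part of any Dublin Core tool.

```python
from html import escape


def dc_meta(element, content):
    """Return one HTML <meta> element for a Dublin Core element (illustrative helper)."""
    return f'<meta name="DC.{escape(element)}" content="{escape(content)}">'


def creators_repeated(names):
    """Option 1: one DC.Creator element per author."""
    return [dc_meta("Creator", name) for name in names]


def creators_packed(names, sep=", "):
    """Option 2: all authors packed into a single content value."""
    return [dc_meta("Creator", sep.join(names))]


authors = ["Brian Kelly", "John Smith"]
repeated = creators_repeated(authors)
packed = creators_packed(authors)
print("\n".join(repeated))
print("\n".join(packed))

# With names in "Surname, Initial" form the packed value becomes ambiguous:
# splitting on the separator no longer recovers the two authors.
ambiguous = ", ".join(["Kelly, B.", "Smith, J."])
pieces = ambiguous.split(", ")
print(pieces)
```

In this sketch the repeated form keeps each author as a separate value, while the packed form relies on a separator that may also occur inside a name, which is one reason software cannot process it reliably.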
Dublin Core Dublin Core Is Too Simple!
Dublin Core was designed as a core set of metadata
elements for resource discovery. However:
• The benefits of the standard became apparent
and DC became used in many areas
• There was a need to be able to represent richer
metadata content and relationships, e.g.
Multiple authors and contact details
Alternative titles
Use of controlled vocabularies from particular
schemes
A mechanism known as Qualified Dublin Core was
developed to address this.
Dublin Core Use In HTML
Dublin Core's potential was recognised and the W3C's
release of HTML 4.0 included a mechanism for
defining schemes in the <meta> element:
<meta name = "DC.Subject"
content = "heart attack">
<meta name = "DC.Subject"
scheme = "MeSH"
content = "Myocardial Infarction; Pericardial Effusion">
<meta name = "DC.Type"
scheme = "DCMIType"
content = "Dataset">
<meta name = "DC.Type"
scheme = "DCMIType"
content = "Event">
See
<http://dublincore.org/documents/2001/
04/12/usageguide/qualified-html.shtml>
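As a rough illustration of how software might read such elements, the following Python sketch extracts Dublin Core <meta> elements, including the scheme attribute, using the standard library's html.parser module. The class name `DCMetaParser` and the sample page are assumptions made for this example.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect (name, scheme, content) for every <meta> whose name starts with 'DC.'."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.lower().startswith("dc."):
            self.records.append(
                (name, attributes.get("scheme"), attributes.get("content"))
            )


# Sample page built from the values shown on the slide, plus one non-DC element.
page = """
<meta name="DC.Subject" content="heart attack">
<meta name="DC.Subject" scheme="MeSH" content="Myocardial Infarction; Pericardial Effusion">
<meta name="DC.Type" scheme="DCMIType" content="Dataset">
<meta name="keywords" content="not Dublin Core">
"""

parser = DCMetaParser()
parser.feed(page)
records = parser.records
for name, scheme, content in records:
    print(name, scheme, content)
```

A value with no scheme comes back with `None` in the second position, so an application can tell free-text subjects apart from terms drawn from a controlled vocabulary such as MeSH.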
W3C Developments XML
XML (Extensible Markup Language):
• Developed by W3C
• A meta-language used to create other languages
• Addresses HTML's lack of extensibility
• A family of standards which form the foundations
for a richer and more interoperable Web:
XML XML Namespaces
XSLT XML Schemas
…
• A proven success
Rather than slowly tweaking HTML to allow rich DC to
be embedded, XML allows new metadata applications
to be developed which can be integrated with existing
Web services
W3C Developments Beyond Use In HTML
In parallel with the release of HTML 4.0, W3C was working on:
• A rich metadata framework which could be used
for any metadata application:
Content filtering (this resource contains
nudity)
Defining collections of related resources
(Web site maps)
Digital signatures
…
• Development of the Semantic Web - An
ambitious attempt to allow data from distributed
services to be integrated
RDF (Resource Description Framework) was
developed as W3C's solution to both problems
W3C Developments RDF
RDF:
• An XML application
• Richer than conventional XML applications: a
mathematical model which describes
relationships is embedded in the RDF
• This richness comes with a price - increased
complexity
RDF applications are being developed. However at present
it may be advisable to leave RDF to the research
community or well-funded pilot studies to prove its benefits
before committing to use in a service environment
(However note that metadata in PDF documents is stored
as RDF)
Using Metadata Beyond Resource Discovery
Metadata has a role to play beyond item-level resource
discovery
Other metadata applications include:
• Metadata for digitised objects: about the object
and about the digitisation process
• Management / administrative metadata: review
this resource by xx; delete this resource on …;
this resource is managed by the XYZ group; …
• Metadata about collections (physical and online)
• …
Using Metadata Metadata Modelling (1)
You want to use Dublin Core metadata. How do you
choose how to model your metadata?
• Do you use simple Dublin Core (the basic 15
elements)?
• Do you use qualified Dublin Core to enable richer
metadata to be described?
• If the latter, how do you decide which qualified
DC metadata to use?
These are key issues to address.
In some cases answers may be provided for you.
In other cases, you must answer these
questions for yourself.
Using Metadata Metadata Modelling (2)
Why do you wish to use metadata?
• Because it is fashionable?
• Because you're a librarian and librarians 'do'
metadata?
• Because you want your Web site to be no. 1 in
Google?
• Because you are developing an application which
requires use of metadata?
Please remember:
• Developing applications which make use of
metadata can be expensive.
• Creating and managing metadata can be expensive
• Search engines such as Google typically make little
or no use of metadata
Using Metadata Metadata Modelling (3)
Exploit Interactive case study:
• EU-funded ejournal
• Requirement to provide
local searching better than
simple free text searching:
• Search by title, author and
keywords
• Search by funding stream
• Search by issue and article
type
• The end-user interface is
illustrated
See <http://www.ukoln.ac.uk/qa-focus/
documents/case-studies/case-study-01/>
Metadata Modelling (4)
How did we manage and model the metadata?
Article metadata:
doc_title = "The XHTML Interview"
author = "Kelly, B."
description = "In this issue's Web Technologies column we ask Brian
Kelly to tell us more about XHTML."
article_type = "regular"
Issue metadata:
issue_num = "6"
pub_date = "25 Oct 2002"
title = "WebWatching National Node Sites"
Site metadata:
name = "Exploit Interactive"
publisher = "UKOLN"
Processed by server-side script
<meta name="DC.Title" content="The XHTML Interview">
<meta name="DC.Creator" content="Kelly, B.">
<meta name="DC.Description" content="In this issue's Web Technologies ….">
<meta name="DC.Relation.IsPartOf" content="http://www.exploit-lib.org/issue6/">
<meta name="DC.Type" content="text.article.regular" scheme="Exploit-categories">
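The slide does not show the server-side script itself. As a minimal sketch of the idea, the Python below maps locally stored fields onto Dublin Core <meta> elements. The field names and values come from the slide; the function names, the `SITE_URL` constant and the exact mapping rules are assumptions made for illustration.

```python
from html import escape

# Base URL assumed from the DC.Relation.IsPartOf value shown on the slide.
SITE_URL = "http://www.exploit-lib.org"

issue = {"issue_num": "6"}
article = {
    "doc_title": "The XHTML Interview",
    "author": "Kelly, B.",
    "description": "In this issue's Web Technologies column we ask Brian "
                   "Kelly to tell us more about XHTML.",
    "article_type": "regular",
}


def meta(name, content, scheme=None):
    """Build one <meta> element, escaping attribute values."""
    tag = f'<meta name="{escape(name)}" content="{escape(content)}"'
    if scheme is not None:
        tag += f' scheme="{escape(scheme)}"'
    return tag + ">"


def article_to_dc(article, issue, site_url):
    """Map the local article and issue fields onto Dublin Core elements (assumed mapping)."""
    return [
        meta("DC.Title", article["doc_title"]),
        meta("DC.Creator", article["author"]),
        meta("DC.Description", article["description"]),
        meta("DC.Relation.IsPartOf", f"{site_url}/issue{issue['issue_num']}/"),
        meta("DC.Type", f"text.article.{article['article_type']}",
             scheme="Exploit-categories"),
    ]


tags = article_to_dc(article, issue, SITE_URL)
print("\n".join(tags))
```

Keeping the metadata in simple local fields and generating the DC elements on output means the mapping can be changed in one place if the chosen representation changes.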
Metadata Management Storing DC Metadata
It is up to you how you store your metadata. Your
choice will be affected by the use which will be made of
your metadata and how it will be created and managed.
You may wish to store your metadata in a database
and make it available according to its use.
Author     Book               Pub. Date
G. Orwell  1984               1948
I. Rankin  Question Of Blood  2003
You may wish to:
• Embed HTML metadata in HTML pages
• Link to HTML metadata from HTML
• Embed RDF
• Store metadata in application
(home-grown scripts, CMS,
metadata repository, image
management system, …)
[Diagram labels: HTML, RDF, Metadata management tool]
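One way to picture "store in a database and make it available according to its use" is the Python sketch below, which uses the standard library's sqlite3 module with an in-memory database. The table and column names are assumptions; the two rows reuse the values given on the slide.

```python
import sqlite3
from html import escape

# In-memory database; a real service would use a persistent store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book (author TEXT NOT NULL, title TEXT NOT NULL, pub_year INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO book (author, title, pub_year) VALUES (?, ?, ?)",
    [("G. Orwell", "1984", 1948), ("I. Rankin", "Question Of Blood", 2003)],
)

rows = conn.execute(
    "SELECT author, title, pub_year FROM book ORDER BY pub_year"
).fetchall()


def to_meta(author, title, year):
    """Render one record as Dublin Core <meta> elements for embedding in HTML."""
    return [
        f'<meta name="DC.Creator" content="{escape(author)}">',
        f'<meta name="DC.Title" content="{escape(title)}">',
        f'<meta name="DC.Date" content="{year}">',
    ]


# The same stored records, exposed in two forms according to use.
html_view = [to_meta(*row) for row in rows]
dict_view = [{"creator": a, "title": t, "date": str(y)} for a, t, y in rows]

for tags in html_view:
    print("\n".join(tags))
print(dict_view)
conn.close()
```

The point of the sketch is only that the stored form and the published form can differ: the database holds one copy, and each output format is generated from it.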
Metadata Management A Simple DC Management Tool
DC-dot:
• Simple Web-based
DC creation and
management tool
• Output in range of
formats (HTML,
XHTML, RDF, …)
• Provides validation
• Useful for small-scale
metadata creation
See <http://www.ukoln.ac.uk/metadata/dcdot/>
But:
• Not ideal for large-scale usage
• Doesn't provide rich
management capabilities
Metadata Management Management Tools
Many types of metadata tools:
• Type the metadata by hand
• Use File -> Properties menu in MS Office
applications and export data
• Home-grown database systems
• Home-grown scripting solutions
• Use of commercial systems:
• Library management systems
• Image management systems
• …
There is no single ideal solution.
The solution you choose should reflect your needs,
expertise, organisational culture, …
Quality Assurance Quality Assurance
The Need for QA:
• Metadata is the 'glue' for integration of services
• If the metadata quality is poor, services will not be
able to be interoperable
• There is therefore a need for quality assurance
procedures to ensure fitness for purpose
What Can Go Wrong?
• Things that can go wrong include:
• Metadata is out-of-date or incorrect
• Metadata is used inconsistently within service
• Metadata is used inconsistently across services
• Metadata is not modelled correctly
• Metadata is not compliant with storage standard
• …
Quality Assurance Think About The Implementation
It is important that when you deploy metadata systems
you can manage and maintain the metadata. For
example:
• Details of the person maintaining the data change
(name change due to marriage, person leaves,
…)
• Organisational details change (mergers,
takeovers, …)
• Technology changes
Prepare for change! People change, organisations
change, responsibilities change, technologies change,
…
Ensure that you can manage the metadata which
reflects such changes
Metadata Management Need For Cataloguing Rules
Your Cataloguing Rules
• You will need cataloguing rules to support your
metadata creation
• You will need to provide necessary training and
support (especially if you are dependent on
cataloguing by non-professionals)
Interoperability
• How will you interoperate with services which
deploy different cataloguing rules:
04/07/03 – what date is this?
LSC – what does this stand for?
• Humans use context; software products don't
• There is a need to define the standards you're
applying (in a machine understandable way)
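The "04/07/03" question can be made concrete with a short Python sketch using the standard datetime module. The two format strings stand for two possible cataloguing conventions (day first and month first); which one a given service uses is exactly what software cannot guess.

```python
from datetime import date, datetime

raw = "04/07/03"

# The same string read under two different cataloguing conventions.
day_first = datetime.strptime(raw, "%d/%m/%y").date()    # day/month/year
month_first = datetime.strptime(raw, "%m/%d/%y").date()  # month/day/year

print(day_first.isoformat())
print(month_first.isoformat())

# Recording the date in ISO 8601 form removes the ambiguity on exchange.
iso_value = day_first.isoformat()
round_trip = date.fromisoformat(iso_value)
print(round_trip == day_first)
```

The two readings differ by almost three months, and nothing in the string itself says which is right. Declaring the convention, or exchanging dates in an unambiguous form such as ISO 8601, is one way of defining the standard in a machine understandable way.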
Quality Assurance Need For QA Procedures
So we have:
• Tools for managing metadata
• Cataloguing rules
But:
• People make mistakes
• Software may have bugs
• Our rules may be ambiguous
• The standards may be ambiguous
• The metadata may be correct but confusing in
other contexts
• …
Although humans can adapt to errors and ambiguities, software
typically can't. We therefore need quality assurance procedures to
ensure that metadata applications will be interoperable.
Quality Assurance Approaches To QA
We may wish to consider:
• Systematic checking at data creation
• Systematic checking of output
• Semi-automated checking (e.g. duplication,
common misspellings, out-of-range checks, …)
• Automated checking
• …
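A few of the checks listed above (missing values, out-of-range values, duplication) can be sketched in Python as follows. The record layout, the required fields and the year limits are assumptions chosen for the example, not rules taken from any standard.

```python
from collections import Counter

# Fields every record is expected to carry (an assumption for this sketch).
REQUIRED = ("title", "creator", "date")


def check_records(records, min_year=1990, max_year=2003):
    """Return a list of (record index, problem description) pairs."""
    problems = []
    id_counts = Counter(r.get("identifier") for r in records)
    for index, record in enumerate(records):
        # Missing or empty required fields.
        for field in REQUIRED:
            if not (record.get(field) or "").strip():
                problems.append((index, f"missing {field}"))
        # Out-of-range check on the year part of an ISO-style date.
        date_value = record.get("date") or ""
        year = date_value[:4]
        if year and not (year.isdigit() and min_year <= int(year) <= max_year):
            problems.append((index, f"date out of range: {date_value}"))
        # Duplication check on the identifier.
        identifier = record.get("identifier")
        if identifier and id_counts[identifier] > 1:
            problems.append((index, f"duplicate identifier: {identifier}"))
    return problems


records = [
    {"identifier": "a1", "title": "The XHTML Interview",
     "creator": "Kelly, B.", "date": "2002-10-25"},
    {"identifier": "a2", "title": "",
     "creator": "Kelly, B.", "date": "2002-10-25"},
    {"identifier": "a1", "title": "Copy",
     "creator": "Kelly, B.", "date": "2902-10-25"},
]

problems = check_records(records)
for index, message in problems:
    print(index, message)
```

Checks of this kind only flag candidates for review; a person still has to decide whether a flagged record is actually wrong.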
Worst Case Scenario:
Your service is fine, and quality metadata is provided. Your data is
integrated with other services to provide an international portal
to quality resources. However the other service providers have
poor quality metadata. The poor quality of the final service brings
your contribution into disrepute.
Pulling It Together
Conclusions
To conclude:
• Metadata can provide richer searching and other services
within a service and the glue for integration across several
services
• There are several key standards: Dublin Core, HTML, XML, …
• You will need to select the standards appropriate to your
service requirements
• You will need to choose the metadata according to your
service requirements
• You will need to choose the architectural framework and
applications for managing your metadata according to your
service requirements
• You will need to ensure that you have appropriate quality
assurance mechanisms in place – otherwise the above work
will have been wasted!
• It can be worth it!