            An introduction to metadata
             for libraries, museums and archives
            Metadata in Digital Libraries, DELOS meeting,
                     Riga, Latvia, 16 April 2003




Pete Johnston                      p.johnston@ukoln.ac.uk
UKOLN, University of Bath          http://www.ukoln.ac.uk/
Bath, BA2 7AY


UKOLN is supported by:
         Section 1 :
An Introduction to Metadata
    An introduction to Metadata

      • Memory institutions, network services
        and metadata
      • What is metadata?
      • Exposing/sharing metadata
      • Exposing/sharing metadata :
        semantics
         – the Dublin Core Metadata Initiative




3       Metadata in Digital Libraries, DELOS meeting, Riga, Latvia, 16 April 2003
Memory institutions,
network services and
     metadata
       Memory institutions
       Museums, libraries and archives—often called
       memory institutions—are trusted organizations that
       collectively document the entire range of human
       experience and expression.
       Memory institutions are engaged in the important
       work of:

          • Capturing, authenticating, and making sense of
            cultural memory;
          • Preserving the human record for future generations;
            and
          • Sharing knowledge to support education and learning.

    http://www.ukoln.ac.uk/interop-focus/ccs/positions/


    Delivering services

      • Memory institutions provide services to
        users
         – (At least some of) these services provide access
           to resources
      • Emergence of services built on global networks
         – remote access to digital resources for all
           (potentially…)
         – resources available “round the clock”
         – resources comparable to other digital resources
           from elsewhere
      • Investment in
         – digitisation of cultural content
         – network services providing access to digitised
           content


    Delivering services

      • Potential for new types of service
         – “digital libraries”, “virtual museums” etc
         – integrated access to resources from multiple
           remote content providers
         – services defined by theme/subject/activity/audience
           etc, not by location/source
         – “packaging” and re-purposing of content
         – user-oriented rather than provider-oriented
      • Changing user expectations
         – user wants information relevant to task/activity
               – may see structural/organisational boundaries of
                 content providers as unimportant!
         – user wants access from any location
         – user wants access at any time


    Delivering services

      • Move from web sites to “portals”
         – “A network service that provides a personalised,
           single point of access to a range of heterogeneous
           network services, local and remote, structured and
           unstructured”
                                                          – Andy Powell, 2002
      • Content providers exposing content for
        delivery through multiple services, channels
      • Presentation services “surfacing” content
        from multiple (distributed) sources
      • Memory institutions may perform both roles
      • Move away from “silo mentality” towards
        more “joined-up” approaches


    Resource discovery on the Web

      • Broadly two approaches to providing
        discovery services
         – software indexing of resource content
         – human description of resources
      • Web search engines
         – software agents (robots) retrieve documents by
           following hyperlinks (crawling)
         – index text of documents
         – make index available as searchable database
         – some clever ranking algorithms
               – e.g. Google’s “PageRank” ranks a document based on
                 links to it
         – “find pages which link to page X”
         – “find pages similar to X”


     Resource discovery on the Web

       • Web search engines
          – tend to generate many results
                – and may suffer from “spamming”
                – ranking algorithms may help
          – don’t support “structured search”
                – search on author name
                – search on document type (“journal article”)
          – limited to textual resources
                – generally, poor support for search for multimedia
                  objects
       • “The hidden Web”
          – robots may not crawl documents dynamically
            generated from databases/CMS




     Resource discovery on the Web

       • But automated indexing
          – is low cost
                – At least compared to human resource
                  description
          – (usually) scales to large numbers of
            resources
          – can be a useful tool!
       • Challenge of finding appropriate
         balance of approaches for context



     Metadata for services

       • Metadata has been important to
         “traditional” service provision…
       • … is essential component of effective
         network services




What is metadata?
     What is metadata?

       • Simple definitions…
       • ‘Structured data about data’.
                – Dublin Core Metadata Initiative FAQ, 2003
       • Machine-understandable information
         about Web resources or other things.
                – Tim Berners-Lee, W3C, 1997




     Towards a “functional” view of metadata

       • Data associated with objects which
         relieves their potential users of having
         to have full advance knowledge of their
         existence or characteristics.
         A user might be a program or a
         person.
                           – Lorcan Dempsey & Rachel Heery, 1998
       • Structured data about resources that
         can be used to help support a wide
         range of operations
                                                        – Michael Day, 2001



     What resources, objects, things?
      • Metadata might exist for almost anything
         – digital, physical, “abstract” resources

       •   HTML documents                            •   Web sites
       •   digital images                            •   collections
       •   databases                                 •   services
       •   books                                     •   physical places
       •   museum objects                            •   people
       •   archival records                          •   institutions
       •   metadata records                          •   abstract “works”
                                                     •   concepts
                                                     •   events


     What resources, objects, things?

       • Metadata records include
          – bibliographic records in library catalogues or from
            abstracting & indexing services
          – descriptions of archival material in archival finding
            aids
          – object records in museum documentation /
            collection management systems
          – entries in directories of organisations, individuals
            and services
          – descriptions of digital objects (documents, images,
            software)
          – descriptions of collections of digital objects
          – descriptions of network services
          – descriptions of metadata records



     What operations?

       • Operations by human users, software tools
       • Metadata might be used to support many
         different functions
          –   resource disclosure & discovery
          –   resource management, including preservation
          –   intellectual property rights management
          –   commerce
          –   authentication and authorisation
          –   personalisation and localisation of services
       • Different functions require different
         types/classes of metadata
          – No “one size fits all solution”
          – Need to specify “functional requirements”


     Metadata elements & element sets

       • Metadata describes attributes or properties of
         a resource
       • Each attribute or property is described by a
         metadata element
          – Can be identified, formally documented/defined
          – May be represented in different forms
       • A metadata element set
          – coherent bounded set of elements formulated as
            basis for metadata creation
          – created for purpose, as a unit
       • Schema
          – structured representation of an element set



         Metadata for resource discovery

     •    User wishes to
          1. discover resources according to some criteria
          2. (optionally) identify a specific resource
             – confirm that resource described is resource sought
             – distinguish similar resources
          3. select
             – evaluate, choose resource appropriate to needs
          4. locate resource
          5. obtain/access resource
          6. use resource
             – open, read, display, run, play, copy,
               unpackage/repackage
             – interpret content
     •    Resource discovery metadata supporting
          (primarily) operations 1 - 4
        Metadata for resource discovery
    Continuum of complexity/functionality

    Form                    Characteristics                        Functions supported

    full-text indexes       might not be classed as “metadata”     discovery (by content),
                            by some!                               location
                            generated by software tools

    semantically simple     typically covering description of      discovery, identification,
    forms                   broad range of resources               selection, location
    (e.g. Dublin Core)      may be partly generated
                            automatically, partly human authored

    richer complex forms    typically covering specific types      discovery, identification,
    (e.g. MARC, EAD,        of resources                           selection, location,
    CIMI-SPECTRUM,          often associated with particular       access, use (which may be
    AMICO etc)              community/domain                       type specific)
                            creation may involve relatively high
                            degree of human expertise

      Association of resource and metadata (1)

        Metadata embedded in resource
           – e.g. meta elements in HTML docs; summary properties in
             word processor docs

        [Diagram: Resource1 with the metadata embedded in it]
             Creator = J Smith
             Date = 2001-11-05
             Title = Report

        • Can resource support embedding of metadata?
        • Does metadata creator have write access to resource?
        • Can service extract embedded metadata?
        • Metadata about aggregates of resources?
        • Metadata about people, places, concepts?

      Association of resource and metadata (2)

        Metadata record as separate object
        Record identifier embedded in resource
           – e.g. link elements in HTML docs

        [Diagram: Resource1 holds the link “Metadata rec = 1”, which points to Metadata rec 1]
             Metadata rec 1:
               Creator = J Smith
               Date = 2001-11-05
               Title = Report

        • Metadata record may be remote from resource
        • Can resource support embedding of link?
        • Does metadata creator have write access to resource?
        • Can service follow link to metadata record?
        • What happens when resource deleted?
        • Metadata about aggregates of resources?
        • Metadata about people, places, concepts?

      Association of resource and metadata (3)

        Metadata record as separate object
        Resource identifier in metadata record

        [Diagram: Metadata rec 1 carries the identifier “Doc = 1”, which points to Resource1]
             Metadata rec 1:
               Doc = 1
               Creator = J Smith
               Date = 2001-11-05
               Title = Report

        • Metadata record may be remote from resource
        • Does not require embedding of metadata or link
        • Does not require metadata creator to have write access to resource
        • Metadata record created independently of resource – possibly
          multiple records
        • Service uses metadata records independently of resource
        • Metadata record may persist after resource deleted
        • Metadata record can describe anything (with identifier…)

        Metadata as managed resource
       • Metadata record is used separately from resource described
       • Recognition that metadata is resource to be managed, separately
         from resource described
       • Metadata content stored in “database”, exposed in form(s)
         appropriate for service(s)

         Doc     Creator        Date            Title
         1       J Smith        2001-11-05      Report

Exposing/sharing metadata
     How is metadata exposed/shared?

       • Resource description “communities”
          – characterised by consensus on conventions for
            internal exchange of metadata
       • Metadata for resource discovery
          – is used beyond its creator community
          – is combined/compared with metadata from other
            communities
          – is aggregated or cross-searched by services
       • How does a content provider make metadata
         records available in a commonly
         understood form?
       • How does a service provider obtain these
         metadata records from data providers?

     How is metadata exposed/shared?

       • Effective sharing of information expressed in
         metadata record requires agreement on
          – metadata semantics
                – what metadata elements mean
          – metadata structure
                – data model, relationships of component parts
          – metadata syntax
                – rules of expression
          – protocols
                – how metadata records transmitted between
                  content provider and service provider
       • Agreements formalised as specifications and
         standards (ideally…)

Exposing/sharing metadata :
semantics
Introducing the Dublin Core
     Introducing the Dublin Core

       • Initiative to improve resource discovery
         on Web
          – not for complex resource description
          – based on description of simple “document-
            like objects”
          – extended to other classes of resource
       • International, cross-disciplinary
         consensus on simple element set
          – 15 elements
          – all optional
          – all repeatable

         http://dublincore.org/


     Introducing the Dublin Core (2)

       •   Title                                    •   Type
       •   Subject                                  •   Format
       •   Description                              •   Identifier
       •   Creator                                  •   Source
       •   Publisher                                •   Language
       •   Contributor                              •   Relation
       •   Date                                     •   Coverage
                                                    •   Rights



     Dublin Core: creator
       • Term Name: creator
       • Label: Creator
       • Definition: An entity primarily responsible for making
         the content of the resource.
       • Comment: Examples of a Creator include a person, an
         organisation, or a service. Typically, the name of a
         Creator should be used to indicate the entity.
       • Type of Term: element
       • Status: recommended
       • Date issued: 1999-07-02
       • URI: http://purl.org/dc/elements/1.1/creator




     Dublin Core: date
       • Term Name: date
       • Label: Date
       • Definition: A date associated with an event in the life
         cycle of the resource.
       • Comment: Typically, Date will be associated with the
         creation or availability of the resource. Recommended
         best practice for encoding the date value is defined in
         a profile of ISO 8601 [W3CDTF] and follows the
         YYYY-MM-DD format.
       • Type of Term: element
       • Status: recommended
       • Date issued: 1999-07-02
       • URI: http://purl.org/dc/elements/1.1/date
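
The recommended date encoding can be checked mechanically. The following is a minimal sketch, assuming only the date-only granularities of the W3CDTF profile (YYYY, YYYY-MM, YYYY-MM-DD) need to be accepted; the function name is hypothetical and time-of-day forms are deliberately left out.

```python
import re
from datetime import date

# Date-only granularities of W3CDTF: YYYY, YYYY-MM, YYYY-MM-DD.
# Forms with a time of day and zone offset are not handled in this sketch.
_W3CDTF_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_w3cdtf_date(value: str) -> bool:
    """Return True if value is a valid YYYY, YYYY-MM or YYYY-MM-DD string."""
    match = _W3CDTF_DATE.fullmatch(value)
    if match is None:
        return False
    year, month, day = match.groups()
    try:
        # A missing month or day defaults to 1 so the calendar check still runs.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True
```

A service might apply such a check when a Date value declares the W3CDTF encoding scheme, and fall back to treating the value as free text otherwise.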



     Standardisation of Dublin Core

     CEN Workshop Agreement (EU)
        • 2000: Dublin Core elements endorsed as
          CWA13874
        • Usage guidelines for European industry
     NISO Z39.85 (USA)
        • 2001: National Information Standards
          Organization, an ANSI affiliate
     ISO
        • 2002: Dublin Core Metadata Element Set
          approved as ISO 15836



     Using the Dublin Core

       • Tom Baker, “A Grammar of Dublin Core”,
         D-Lib Magazine, October 2000
       • Metaphor of metadata as language
       • DC as a simple “pidgin” language for use by
         “tourists on the Internet commons”
       • Small vocabulary, simple grammar/structure
          – This Resource has Title “An introduction to
            metadata”
          – This Resource has Subject “Resource discovery”
       • Not subtly expressive, but easy to learn and
         deploy - “good enough” to work


     Using the Dublin Core

       • Designed for simplicity of semantics,
         ease of use
       • Provides basic semantic
         interoperability
          – semantics sufficiently general to be useful
            across domains
       • Can provide 15 “windows” into richer
         resource descriptions
          – disclose rich description in simple form
          – semantic cross-walks, mappings



      Using the Dublin Core



     [Diagram: a rich description is viewed through the simple DC
      elements title, creator, date, description and rights, which
      together form the simple DC description]
     Qualifying Dublin Core

       • Allows for controlled extensibility through
         “qualifiers”
          – Element refinements
                – make element meanings narrower, more specific:
                – a Date Created versus Date Modified
                – an IsReplacedBy versus Replaces Relation
          – Encoding schemes
                – provide contextual information or parsing rules that
                  aid in the interpretation of a value
                – may specify that a value is drawn from a controlled
                  vocabulary (e.g. LCSH, TGN etc)
                – may specify that a value is formatted in accordance
                  with a specified notation (e.g. date formats)




     Qualifying Dublin Core

       • Qualifiers make elements more specific
          – Element Refinements narrow meanings, never
            extend
          – Encoding Schemes give context to element values
       • The “dumb-down” rule
          – Application should be able to use the value as if it
            were unqualified
          – Ignore unknown Encoding Schemes
          – Resolve (semantically more specific) Element
            Refinements to (more generic) Elements
       • Some loss of specificity, but still generally
         correct and useful for discovery
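
The “dumb-down” rule can be sketched in a few lines. This is a simplified illustration, assuming a qualified statement is given as a term name, a value and an optional encoding scheme; the function name and the small refinement table are hypothetical, although the terms listed are real DC element refinements.

```python
# The fifteen elements of simple Dublin Core.
DC_ELEMENTS = {
    "title", "subject", "description", "creator", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# A few element refinements and the element each one refines.
REFINES = {
    "created": "date",
    "modified": "date",
    "issued": "date",
    "valid": "date",
    "replaces": "relation",
    "isReplacedBy": "relation",
}


def dumb_down(term: str, value: str, scheme: str = None) -> tuple:
    """Reduce a qualified statement to an unqualified (element, value) pair."""
    # Resolve a refinement to its more generic element.
    element = REFINES.get(term, term)
    if element not in DC_ELEMENTS:
        raise ValueError("not a known DC element or refinement: " + term)
    # The encoding scheme is ignored; the value is used as if unqualified.
    return element, value
```

For example, a statement with term created and scheme W3CDTF becomes a plain date statement: less specific, but still correct.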


     Dublin Core: valid
       • Term Name: valid
       • Label: Valid
       • Definition: Date (often a range) of validity of a
         resource.
       • Type of Term: element-refinement
       • Status: recommended
       • Date issued: 2000-07-11
       • URI: http://purl.org/dc/terms/valid




     Using the Dublin Core

       • Not a replacement for richer descriptive
         standards
       • But useful
          – If you wish to disclose community-specific metadata
            to other communities using commonly
            understood semantics
          – If you wish to provide integrated access to your
            own metadata databases with different underlying
            semantics
          – If you only need simple metadata semantics




     Using the Dublin Core

       • Inherent tensions in DC
          – Broad, fuzzy “search buckets” or rigidly prescribed
            usage?
          – Generic applicability across domains or intra-
            domain precision?
          – One-size-fits-all or customise-as-you-please?
          – Simply discovering resources (a few typical search
            attributes) or describing them fully (lots of detail)?
          – Dublin Core primarily as a native record format or
            extracted from richer metadata?
          – Broad-brush minimalism or comprehensive
            structuralism?




     Summary

      • Emergence of global networks enables new
        approaches to providing access to resources
         – Increasing requirement to provide resource
           discovery across boundaries
      • Metadata supports many functions, including
        resource discovery
      • DC as simple, cross-disciplinary metadata
        element set
      • Next:
         – How metadata records are represented:
           syntax/structure
         – How metadata records are exposed/shared/used in
           resource discovery services



       Section 2 :
  Sharing metadata: XML and
the OAI Protocol for Metadata
         Harvesting
     Sharing metadata : XML and OAI

       • Exposing/sharing metadata: syntax
         and structure
          – Extensible Markup Language (XML)
          – XML Schema
       • Metadata harvesting
          – The Open Archives Initiative Protocol for
            Metadata Harvesting
       • Some OAI-based services
       • Developing metadata-based services


Exposing/sharing metadata :
syntax and structure
XML & XML Schema
     Embedding DC metadata in (X)HTML

       • Dublin Core metadata can be embedded into
         (X)HTML documents
          – Simple to deploy but may be difficult to manage,
            maintain
       • But almost none of the Web search engine
         services index it
       • Lack of trust in “open” Web context
          – Abuse by content providers seeking to improve the
            ranking of their documents
       • However, may be useful technique in “closed”
         context
          – e.g. single Web site or where control over which
            documents indexed



          Embedding DC metadata in (X)HTML
     <html xmlns="http://www.w3.org/1999/xhtml">
     <head>
       <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
       <meta name="DC.Title" lang="en"
             content="Expressing Qualified Dublin Core in HTML/XHTML meta elements" />
       <meta name="DC.Creator"
             content="Andy Powell, UKOLN, University of Bath" />
       <meta name="DC.Date.Issued" scheme="W3CDTF" content="2002-09-09" />
       <meta name="DC.Identifier" scheme="URI"
             content="http://dublincore.org/documents/dcq-html/" />
       <meta name="DC.Format" scheme="IMT" content="text/html" />
       <meta name="DC.Type" scheme="DCMIType" content="Text" />
     </head>
     <body>
     </body>
     </html>
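
A service that wants to use embedded metadata first has to extract it. The following is a minimal sketch, assuming the metadata follows the meta-element convention shown above (names beginning with "DC."); the class and function names are hypothetical, and only Python's standard html.parser module is used.

```python
from html.parser import HTMLParser


class DCMetaExtractor(HTMLParser):
    """Collect (name, scheme, content) from meta elements named DC.*"""

    def __init__(self):
        super().__init__()
        self.statements = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.upper().startswith("DC."):
            self.statements.append(
                (name, attributes.get("scheme"), attributes.get("content"))
            )


def extract_dc(html_text: str) -> list:
    """Return the embedded Dublin Core statements found in an (X)HTML page."""
    parser = DCMetaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.statements
```

Whether the extracted values can be trusted is a separate question, which is why this technique fits “closed” contexts better than the open Web.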
     Introducing XML
      • Extensible Markup Language
         – Recommendation of W3C, 1998, 2000
      • Defines means of describing tree-structured
        data in text-based format
         – embedded markup delimits and describes data
      • Simple, platform-independent syntax
      • Standard programming interfaces
         – reusable software components
      • Support from major software vendors
      • Widely adopted for transferring data between
        programs, systems


     Doc   Creator   Date         Title
     1     J Smith   2001-11-05   Report

     [Diagram: the same data as a tree. A root “table” node has
      “record” children; each “record” has the children doc (1),
      creator (J Smith), date (2001-11-05) and title (Report)]

     <table>
       <record>
         <doc>1</doc>
         <creator>J Smith</creator>
         <date>2001-11-05</date>
         <title>Report</title>
       </record>
     </table>
     [Diagram: a table row (Doc, Creator, Date, Title) is turned into
      <record> ... </record> by serialisation, sent by transmission,
      and turned back into data for a remote application by
      de-serialisation]

     XML and interoperability

       • “Meta-language”
          – language for describing markup languages
          – can define unlimited number of markup languages
       • But….
          – XML says nothing about what your names mean
          – will a software agent process my <doc> XML
            element correctly?
       • Interoperability requires consensus on
          – the names of components (XML elements and
            attributes)
          – the structural model of a class of document
          – the semantics represented by the components and
            the structure
       • Shared use of common XML “schemas”

     XML schemas

      • Means to codify syntax/structure rules for
        class of XML document
         – what markup is allowed
         – structural constraints on use of markup
      • Document Type Definition (DTD)
         – part of XML Recommendation
      • W3C XML Schema
         –   W3C recommendation
         –   data-typing i.e. tighter control on element content
         –   support for XML Namespaces
         –   uses XML syntax
      • Software can validate instance against
        DTD/schema

Metadata harvesting:
The Open Archives Initiative Protocol for
Metadata Harvesting
     Searching & harvesting
      • Resource discovery services operating
        across the resources of multiple distributed
        content providers
      • Possible strategies
         – Distributed search
               – submit parallel queries to multiple metadata
                 databases
               – collate multiple result sets for presentation to user
         – Harvest
               – gather metadata records from multiple providers into
                 single database
               – (periodic re-gathering to refresh data)
               – query central database
      • Performance issues in cross-searching


     Introducing OAI
      • Open Archives Initiative
         – develops/promotes interoperability standards to
           facilitate dissemination of content
         – roots in “e-prints” community seeking to improve
           access to scholarly publications
               – Deposit pre-prints – for quicker dissemination
               – Deposit post-prints – to reduce institutional costs,
                 maximise impact
         – e-print “archives”
               – institutional
               – federated subject/discipline-based
         – required simple low-cost interface to expose
           metadata for reuse

        http://www.openarchives.org/
     Introducing OAI (2)
       • Terminology
          – “Archive” = repository of stored information, not an archive in
            the archival/record-keeping sense
          – “Open” in terms of architecture, not free/unlimited
            access to repository
       • Protocol for Metadata Harvesting (OAI-PMH)
          – Developed by international technical committee,
            1999-2002
          – Shift from “optimising discovery of e-prints” to more
            generic resource discovery
          – OAI “committed to version 2.0 as a production
            release”




     Introducing OAI PMH
      • Lightweight, low-cost protocol which allows
        data providers to expose metadata records
        for retrieval by service providers
      • Service providers can say “give me all/some
        of your metadata records”
      • Built on HTTP, XML
         – Six verbs: requests from service provider to data
           provider sent using HTTP GET/POST
         – responses from data provider to service provider
           as XML documents
      • Not a distributed search protocol
      • Not limited to e-print archives



     Introducing OAI PMH (2)
      • Supports transfer of metadata records
         – resources made available separately
         – identifier/locator of resources typically included in
           metadata record
      • Data provider must provide
        simple/unqualified DC metadata record
         – may provide metadata records in other “formats”
         – metadata formats must be associated with a W3C
           XML Schema
      • Extensible framework for metadata about
         – repository, sets, records
      • Metadata and resources often freely available
         – but not a requirement


     Introducing OAI PMH (3)
      • Supports selective harvesting
         – by sets
         – by datestamps
      • Example
         – Service Provider: List all records added or changed since
           1 Jan 2002 in simple DC format (oai_dc)
               –   verb = ListRecords
               –   from = 2002-01-01
               –   metadataPrefix = oai_dc
               –   http://www.myarchive.org/cgi-bin/oai?verb=ListRecords&from=2002-01-01&metadataPrefix=oai_dc
         – Data Provider: Returns XML document containing
           records
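
A selective-harvesting request of this kind can be assembled as in the sketch below. The base URL is the example address from the slide; the helper name list_records_url is an assumption made for illustration. The argument names verb, from, set and metadataPrefix are those defined by OAI-PMH.

```python
from urllib.parse import urlencode

# Example base URL from the slide; a real data provider publishes its own.
BASE_URL = "http://www.myarchive.org/cgi-bin/oai"


def list_records_url(base_url, metadata_prefix, from_date=None, set_spec=None):
    """Build a ListRecords request URL (hypothetical helper, HTTP GET form)."""
    params = {"verb": "ListRecords"}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    if set_spec:
        params["set"] = set_spec  # selective harvesting by set
    params["metadataPrefix"] = metadata_prefix
    return base_url + "?" + urlencode(params)


url = list_records_url(BASE_URL, "oai_dc", from_date="2002-01-01")
print(url)
```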



     [Diagram: two Web sites, each holding resources and metadata, expose
      DC metadata records via OAI-PMH to several portal Web sites.]
       OAI DC metadata record
       (from Library of Congress Repository 1)
<oai_dc:dc>
<dc:title>Empire State Building. [View from], to Central Park</dc:title>
<dc:creator>Gottscho, Samuel H. 1875-1971, photographer.</dc:creator>
<dc:date>1932 Jan. 19</dc:date>
<dc:type>image</dc:type>
<dc:type>two-dimensional nonprojectible graphic</dc:type>
<dc:type>Cityscape photographs.</dc:type>
<dc:type>Acetate negatives.</dc:type>
<dc:identifier>http://hdl.loc.gov/loc.pnp/gsc.5a18067</dc:identifier>
<dc:coverage>United States--New York (State)--New York.</dc:coverage>
<dc:rights>No known restrictions on publication.</dc:rights>
</oai_dc:dc>
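
A service provider might read values out of such a record roughly as follows. This is a minimal sketch using Python's standard library; the record is abridged from the one above, with the oai_dc and dc namespace declarations (omitted on the slide) added so that it parses on its own.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the oai_dc container and the DC elements.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Abridged copy of the record, with namespace declarations added.
RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Empire State Building. [View from], to Central Park</dc:title>
  <dc:creator>Gottscho, Samuel H. 1875-1971, photographer.</dc:creator>
  <dc:type>image</dc:type>
  <dc:type>Cityscape photographs.</dc:type>
  <dc:identifier>http://hdl.loc.gov/loc.pnp/gsc.5a18067</dc:identifier>
</oai_dc:dc>"""

root = ET.fromstring(RECORD)
title = root.findtext("dc:title", namespaces=NS)
# DC elements are repeatable, so collect every dc:type value.
types = [element.text for element in root.findall("dc:type", NS)]
identifier = root.findtext("dc:identifier", namespaces=NS)

print(title)
print(types)
print(identifier)
```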




Some OAI based services
     Resource Discovery Network (RDN)

       • Co-operative network of “subject gateways”
          – Funded by JISC for HE and FE
       • Seven “hubs”
          –   ALTIS: Hospitality, Leisure, Sport and Tourism
          –   BIOME: Health and Life Sciences
          –   EEVL: Engineering, Mathematics and Computing
          –   GESource: Geography and Environment
          –   Humbul: Humanities
          –   PSIgate: Physical Sciences
          –   SOSIG: Social Sciences, Business and Law
       • Databases of metadata records describing
         Internet resources selected for high quality
              http://www.rdn.ac.uk/

     Resource Discovery Network (RDN)

       • Hubs as subject communities
          – metadata creators are subject specialists
          – good links with users
          – separate metadata schemas
       • Hubs provide their own Web interfaces
          – search databases
          – other services: tutorials, guides, alerting etc
       • But operate within a shared policy framework
          –   collection development
          –   cataloguing guidelines
          –   technical standards
          –   agreements on IPR



     Resource Discovery Network (RDN)

       • RDN Resource Finder
          – Cross-search of Hubs’ metadata records
          – Initially distributed search using Z39.50
                – Performance issues
                – Difficult to build flexible browse interface
          –   Now using OAI PMH to harvest records
          –   Currently harvesting simple DC
          –   Basic keyword searching
          –   Exploring harvesting some richer record formats for
              additional functionality
       • Also some sharing of metadata
          – between Hubs (DC plus extensions)
          – between Hubs and other similar services (LOM)
          – but Hubs’ metadata not freely available for harvest


                                                   Resource Discovery Network
                                                     http://www.rdn.ac.uk/




      e-Prints UK

         • JISC-funded project, 2002-2004
         • Provide access to e-prints via subject-based
           RDN services
         • Harvest metadata from e-print archives
            – institutional, non-institutional, personal
         • Automatically enhance harvested metadata
           (using Web Services)
            – Add (or validate) authoritative forms of author
              names (OCLC)
            – Assign subject classification (based on analysis of
              full-text of resource) (OCLC)
            – Generate OpenURLs from citations (based on
              analysis of full-text of resource) (Univ of
              Southampton/UKOLN)

     http://www.rdn.ac.uk/projects/eprints-uk/
     e-Prints UK

       • Provide search services
          – across all metadata
          – subject-partitioned search services for Hubs
       • Enhanced metadata records made available
         to originating e-print archive
       • Note
          – service provider enhancing harvested metadata to
            provide more functionality
          – some of enhancement process requires access to
            resource as well as metadata record
          – two-way flow of metadata records
          – recommendations for how to use simple DC to
            describe e-prints to maximise benefits of
            metadata disclosure


     e-Prints UK
     [Diagram: e-Prints UK harvests metadata via OAI-PMH from institutional,
      non-institutional and personal e-print archives. It calls Web services
      via SOAP: a subject classification service and a name authority service
      offered by OCLC, and a citation analysis service offered by Southampton.
      End-user services are delivered through the RDN gateway/portal services
      via SOAP, Javascript/HTTP and Z39.50.]



Developing metadata-based
         services
     Developing services

       • Consensus on metadata semantics/syntax,
         transport protocols etc as minimal
         requirements
       • Resource selection
          – collections policies
       • Metadata quality assurance
          – “cataloguing rules”
                – mandatory elements, minimum-level records
                – guidance on content of values of elements: formats,
                  controlled vocabularies, identifiers etc
          – Maintenance, currency of metadata
       • Agreements on IPR, usage rights, “branding”
          – for metadata records as well as resources
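
Part of such quality assurance can be automated. The sketch below checks a record against a set of "cataloguing rules"; the choice of mandatory elements and the controlled list of types are hypothetical, as is the representation of a record as a mapping from element name to a list of values.

```python
# Hypothetical local rules; a real service would define its own.
REQUIRED = ("title", "identifier")
CONTROLLED_TYPES = {"image", "text", "sound"}


def check_record(record: dict) -> list:
    """Return a list of problems found; an empty list means the record passes."""
    problems = []
    for element in REQUIRED:
        if not record.get(element):  # absent, or present with no values
            problems.append(f"missing mandatory element: {element}")
    for value in record.get("type", []):
        if value not in CONTROLLED_TYPES:
            problems.append(f"type not in controlled vocabulary: {value}")
    return problems


good = {"title": ["A"], "identifier": ["http://example.org/1"], "type": ["image"]}
bad = {"type": ["painting"]}
print(check_record(good))
print(check_record(bad))
```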



     Developing services

       • DCMES intended to be simple enough for
         creation by untrained creators
          – assumption that metadata creation
            straightforward?
       • Recognition that precision in services
         depends on quality of metadata
       • Subject terms/classification difficult for non-
         expert
       • Different services providing different
         functionality to different audiences may
         require different metadata



     Developing services

       • Human creation of metadata is not cheap!
       • Where possible, use automated methods to
          – Generate metadata
          – Normalise/enhance metadata
       • Service providers as well as data providers
         can contribute (e.g. e-Prints UK)
       • Reuse/repurpose metadata
       • Where human creation required, provide
         support
          – Education, guidelines
          – Appropriate software tools



     Developing services

       • Service developers use/implement metadata
         standards in pragmatic way
       • Standards creators concerned with
          – Consensus, commonality, interoperability
          – e.g. DCMES
       • Implementers concerned with
          – Functionality, specificity, localisation
          – e.g. “Using simple DC to describe e-Prints”
       • “Application profile”
          – A metadata element set optimised for a particular
            application



     Summary

      • Standards for metadata semantics
      • XML as syntax for metadata exchange, but
        requires consensus on structures
      • Harvesting model as alternative to distributed
        search
         – OAI PMH
      • Service provision
         – metadata quality
         – rights issues
         – application profiles
      • Next:
         – A common framework for metadata?
         – Towards the “Semantic Web”?


     Section 3 :
Sharing metadata: RDF and
    the Semantic Web
     Sharing metadata: RDF & the
     Semantic Web
       •   Is there a problem?
       •   The vision of the “Semantic Web”
       •   Introducing RDF
       •   Some RDF applications




     The problem with XML?

       • XML as a mechanism for expressing tree-
         structured data
       • Different communities make different design
         choices for the meaning of their trees
          – All “good” (and valid against an XML DTD/Schema)
       • Within resource description community,
         meaning(s) of structure(s) may be limited
       • But applications working across communities
         have to work with multiple XML trees
          –   potentially unlimited
          –   not scalable in an “open” Web environment?
          –   how to manage ever increasing set of conventions
          –   always encountering new structures/schemas


     The “Semantic Web”

       • Activity of World Wide Web Consortium
         (W3C)
       • To make data available on the Web in a form
         which is easier for machines to process
          – Machine-processable statements about all kinds
            of things (Web pages, organisations, people,
            concepts, products, etc) and the relationships/links
            between them
       • To share data between programs and
         systems designed independently
          – Unlock the data held in databases
          – Link data from different sources
          – To enable richer, more flexible services

         http://www.w3.org/2001/sw/
     The “Semantic Web”

       • Builds on
          – use of Uniform Resource Identifiers
            (URIs) to uniquely identify resources
          – the Resource Description Framework
            (RDF) as a common model for expressing
            information about resources
          – an XML syntax for representing RDF data
          – existing Web protocols (HTTP) for
            transferring data




Introducing RDF
     Introducing RDF

       • Resource Description Framework
          – Model & Syntax, W3C Recommendation, 1999
          – RDF Core WG activity, 2001-2003
       • Set of revised/expanded specifications
         currently (April 2003) in “last call”
          – Semantics: formal model
          – Concepts: abstract syntax (graph)
          – RDF/XML syntax: conventions for encoding
            statements using XML
          – Test Cases
          – Vocabulary Description Language
          – Primer: introduction

               http://www.w3.org/RDF/

     Introducing RDF (2)

       • Provides generic framework for representing
         information about resources
          – set of conventions/infrastructure for applications
            exchanging metadata
          – allows semantics to be defined by different
            resource description communities
          – accommodates mixing of information from diverse
            sources
       • Resource : any object identified by URI
          – not necessarily accessible via Web
       • Property : “attribute” to describe resource
          – properties also uniquely identified by URI
       • Statement : “triple” of specific resource,
         property, and value
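
As a minimal sketch, statements can be held as plain (subject, predicate, object) tuples. The example.org URIs are illustrative, and the sketch simplifies by not distinguishing a literal value from a resource used as a value; the Statement type and the values helper are assumptions made for this example.

```python
from typing import List, NamedTuple


class Statement(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property
    obj: str        # value: a literal, or the URI of another resource


statements = [
    Statement("http://example.org/doc/1",
              "http://example.org/author",
              "http://example.org/person/john"),
    Statement("http://example.org/person/john",
              "http://example.org/name",
              "John"),
    Statement("http://example.org/person/john",
              "http://example.org/email",
              "john@example.org"),
]


def values(data: List[Statement], subject: str, predicate: str) -> List[str]:
    """Return every value of one property of one resource."""
    return [s.obj for s in data
            if s.subject == subject and s.predicate == predicate]


author = values(statements, "http://example.org/doc/1",
                "http://example.org/author")[0]
print(values(statements, author, "http://example.org/name"))
```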

     The RDF model

     A resource has some property whose value is
     either (i) a simple string value (literal)…


       [Graph: http://example.org/doc/1 --author--> "John"]



        • The resource identified by the URI
          http://example.org/doc/1 has a
          property “author” whose value is “John”
        • Or, “John” is the “author” of the resource
          identified by http://example.org/doc/1

     The RDF model (2)

     … or (ii) another resource...

        [Graph: http://example.org/doc/1 --author--> (unnamed resource)
                (unnamed resource) --name--> "John"
                (unnamed resource) --email--> "john@example.org"]


         • The value of property “author” is another
           resource which has a property “name” with
           value “John” and a property “email” with
           value “john@example.org”
     The RDF model (3)

     … which may itself have a URI
        [Graph: http://example.org/doc/1 --author--> http://example.org/person/john
                http://example.org/person/john --name--> "John"
                http://example.org/person/john --email--> "john@example.org"]




     The RDF model (4)

     Properties themselves are identified by URIs
        [Graph: http://example.org/doc/1 --http://example.org/author--> http://example.org/person/john
                http://example.org/person/john --http://example.org/name--> "John"
                http://example.org/person/john --http://example.org/email--> "john@example.org"]




     The power of the RDF model

       • Extensible model
          – supports any vocabularies
       • Supports arbitrary complexity of description
       • URIs as unique “fixed points” to identify
          – resources
          – properties
       • Descriptions created independently can be
         “merged” using URIs as “anchors”
          – i.e. supports distributed metadata
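
Merging can be sketched as a set union of triples. The three "sources" below are hypothetical descriptions created independently; short property names stand in for full property URIs to keep the example readable.

```python
DOC = "http://example.org/doc/1"
JOHN = "http://example.org/person/john"

# Three descriptions created independently of one another.
first = {
    (DOC, "author", JOHN),
    (JOHN, "name", "John"),
    (JOHN, "email", "john@example.org"),
}
second = {(DOC, "subject", "XML")}
third = {(JOHN, "organisation", "JS Foundation")}

# Because the sources use the same URIs, merging is simply a union.
merged = first | second | third


def describe(graph, subject):
    """Collect the properties asserted about one resource."""
    return {p: o for (s, p, o) in graph if s == subject}


# A question no single source can answer: which organisation does
# the author of the document belong to?
author = describe(merged, DOC)["author"]
print(describe(merged, author)["organisation"])
```

The shared URIs act as the "anchors": nothing else about the three sources needs to be coordinated in advance.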




      First source


     [Graph: http://example.org/doc/1 --author--> http://example.org/person/john
             http://example.org/person/john --name--> "John"
             http://example.org/person/john --email--> "john@example.org"]




     Second source




      [Graph: http://example.org/doc/1 --subject--> "XML"]




       Third source



     [Graph: http://example.org/person/john --organisation--> "JS Foundation"]




       Three descriptions merged
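     [Graph: http://example.org/doc/1 --subject--> "XML"
             http://example.org/doc/1 --author--> http://example.org/person/john
             http://example.org/person/john --organisation--> "JS Foundation"
             http://example.org/person/john --name--> "John"
             http://example.org/person/john --email--> "john@example.org"]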




     A simple DC metadata record
     (the “hedgehog”)


     [Graph: http://example.org/doc/1 at the centre, with one arc for each
      of the fifteen simple DC properties: dc:title, dc:creator, dc:subject,
      dc:description, dc:publisher, dc:contributor, dc:date, dc:type,
      dc:format, dc:identifier, dc:source, dc:language, dc:relation,
      dc:coverage, dc:rights]




     The RDF XML syntax

       • XML representation of model
          – to store/exchange descriptions
       • Use of XML Qualified Names and XML
         Namespaces to represent URIs in RDF/XML
       • Conventions for the meaning of structures
         in RDF/XML document
       • Service can “know in advance” the meaning
         of structures in RDF/XML document
          – i.e. always represents RDF graphs
          – even if unanticipated vocabularies used
          – can read multiple descriptions into store and
            “merge” on URIs


         A simple DC metadata record (RDF/XML)

     <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
       <rdf:Description rdf:about="http://example.org/doc/1">
         <dc:creator>a</dc:creator>
         <dc:contributor>b</dc:contributor>
         <dc:publisher>c</dc:publisher>
         <dc:subject>d</dc:subject>
         <dc:description>e</dc:description>
         <dc:identifier>f</dc:identifier>
         <dc:relation>g</dc:relation>
         <dc:source>h</dc:source>
         <dc:rights>i</dc:rights>
         <dc:format>j</dc:format>
         <dc:type>k</dc:type>
         <dc:title>l</dc:title>
         <dc:date>m</dc:date>
         <dc:coverage>n</dc:coverage>
         <dc:language>o</dc:language>
       </rdf:Description>
     </rdf:RDF>
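
A document in this simple form can be turned back into statements with a few lines of code. The sketch below uses only Python's standard XML parser and handles only this flat pattern (rdf:Description elements with literal-valued properties); it is not a general RDF/XML parser, and the function name simple_triples is an assumption.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Abridged version of the record above.
DOCUMENT = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc/1">
    <dc:creator>a</dc:creator>
    <dc:title>l</dc:title>
  </rdf:Description>
</rdf:RDF>"""


def simple_triples(rdf_xml: str) -> list:
    """Extract (subject, property URI, literal) triples from flat RDF/XML."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree reports tags as "{namespace}local"; the property
            # URI is the namespace name followed by the local name.
            predicate = prop.tag[1:].replace("}", "", 1)
            triples.append((subject, predicate, prop.text))
    return triples


triples = simple_triples(DOCUMENT)
for triple in triples:
    print(triple)
```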

     RDF Vocabulary Description Language
     (RDF Schema)
       • Provides mechanisms to describe
          – terms used in RDF statements
          – relationships between terms
          – e.g. Dublin Core metadata element set described
            using RDF(S)
       • Defines type system
          – resources grouped into classes
          – classes may be related hierarchically (subClassOf)
          – properties may be related hierarchically
            (subPropertyOf)
          – use of properties may be constrained (domain,
            range)
       • More RDF statements
          – i.e. metadata about metadata elements
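
Because the schema is itself a set of RDF statements, software can follow the hierarchy. A minimal sketch, assuming a small hypothetical vocabulary (the ex: names are invented for illustration); only rdfs:subClassOf is handled, and subPropertyOf could be treated the same way.

```python
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Hypothetical schema statements: Photograph is a kind of Image,
# and Image is a kind of Resource.
schema = {
    ("ex:Photograph", SUBCLASS_OF, "ex:Image"),
    ("ex:Image", SUBCLASS_OF, "ex:Resource"),
}


def superclasses(cls: str, triples: set) -> set:
    """Follow subClassOf statements upwards, collecting every ancestor."""
    found = set()
    todo = [cls]
    while todo:
        current = todo.pop()
        for (s, p, o) in triples:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                todo.append(o)
    return found


print(superclasses("ex:Photograph", schema))
```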


        Description of Dublin Core Creator
     [Graph: http://purl.org/dc/elements/1.1/creator
               --rdfs:label--> "Creator"
               --rdfs:comment--> "An entity …"
               --dc:description--> "Examples of a …"
               --rdf:type--> http://www.w3.org/1999/02/22-rdf-syntax-ns#Property]



     Description of Dublin Core Creator (RDF/XML)

     <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/creator">
        <rdfs:label xml:lang="en-US">Creator</rdfs:label>
        <rdfs:comment xml:lang="en-US">An entity primarily responsible
          for making the content of the resource.</rdfs:comment>
        <dc:description xml:lang="en-US">Examples of a Creator include a
          person, an organisation, or a service. Typically, the name of a
          Creator should be used to indicate the entity.</dc:description>
        <rdfs:isDefinedBy rdf:resource="http://purl.org/dc/elements/1.1/"/>
        <dcterms:issued>1999-07-02</dcterms:issued>
        <dc:type rdf:resource="http://dublincore.org/usage/documents/principles/#element"/>
     </rdf:Property>




      Simplicity, contradiction, trust

        • In RDF, meaning is expressed by simple
          statements:
           – Subject-Predicate-Object
        • Anyone on Web can assert (in RDF sense)
          anything about anything
           – software agents navigating Web of statements
           – may be able to process some of these statements
             but not all
           – ignore the statements you don't understand
           – tolerance of inconsistency and errors
        • Establishing trust as fundamental part of
          Semantic Web infrastructure
           – Who said this (and when etc)


      Metadata and the Semantic Web

        • Argued that the Semantic Web principles fit
          the nature of metadata
           – Metadata supports many different functions
                 – Metadata is inherently "modular"
           – Metadata creation is not a one-off act, but an
             ongoing, distributed process
                 – the metadata creator can't predict how users may
                   want to use resources and query metadata
                 – new uses of resources result in new metadata
           – Metadata is not (or at least not only) "objective",
             "authoritative" information
                 – Some attributes represent interpretations
                 – Some attributes are context-dependent
                 – Multiple (even conflicting) descriptions can co-exist




Some RDF applications
      RDF Site Summary (RSS) 1.0

        • Simple RDF metadata vocabulary designed
          to support syndication of "news" items
        • An RSS "channel" is published as an
          RDF/XML document
        • Provides metadata about
           – The channel itself
                 – A summary of its scope and purpose
           – A sequence of items
                 – Summary descriptions of Web documents
        • Content of channel regularly updated by
          provider
        • Wide, simple, automated distribution
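
A minimal sketch of how a news reader or aggregator might read a channel. The channel below is a hypothetical example written for illustration (example.org URLs, invented titles); it follows the RSS 1.0 structure of a channel description plus separate item descriptions.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

# Hypothetical channel, for illustration only.
CHANNEL = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/news.rss">
    <title>Example news</title>
    <link>http://example.org/</link>
    <description>Hypothetical channel for illustration.</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/item/1"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/item/1">
    <title>First item</title>
    <link>http://example.org/item/1</link>
  </item>
</rdf:RDF>"""

root = ET.fromstring(CHANNEL)
# Metadata about the channel itself.
channel_title = root.findtext("rss:channel/rss:title", namespaces=NS)
# Summary descriptions of the items in the channel.
items = [
    (item.findtext("rss:title", namespaces=NS),
     item.findtext("rss:link", namespaces=NS))
    for item in root.findall("rss:item", NS)
]

print(channel_title)
print(items)
```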

             http://purl.org/rss/1.0/
      RDF Site Summary (RSS) 1.0

        • Typical applications
           – Web sites: render content of specific channels as
             part of their own Web sites
           – Online aggregator services: harvest numerous
             channels and provide search/filtering services
             across the items
                 – e.g. Meerkat
           – Desktop news readers: allow users to "subscribe"
             to list of channels, regularly download content for
             user to browse
                 – e.g. Amphetadesk
        • RSS also generated from some Weblog
          management systems
           – SWAD(E) activity on "semantic weblogging"



http://www.ukoln.ac.uk/
      Metadata schema registries

        • How to encourage convergence and reuse of
          metadata vocabularies
        • Implementers
           – may be unaware of existing vocabularies
           – adapt/customise "standard" terms for application-
             specific use
           – may combine terms from multiple "standard"
             sources
           – coin application-specific terms or extensions
        • Application profile
           – A metadata element set optimised for a particular
             application



      Metadata schema registries

        • A publication context for
           – "standard" metadata vocabularies and their terms
           – (depending on scope of registry) also implementer
             usages/adaptations of those vocabularies and their
             terms
           – To provide a "dictionary" function
           – To highlight relationships, encourage
             reuse/convergence
        • Based on indexing RDF data distributed on
          Web?
        • Requires shared conventions for describing
           – metadata vocabularies
           – and their usages and adaptations



http://dublincore.org/dcregistry/
      Summary

       • RDF provides a common framework for
         making machine-processable statements
         about resources
       • The “Semantic Web” provides a vision of
         metadata as
          – modular, extensible
          – distributed, devolved
          – dynamic, evolving
       • Seeks to address (some of) the challenges of
         cross-domain, cross-community
         interoperability
       • Fundamental role of trust on the Semantic
         Web


      Overall summary

        • Global networks have created a new context
          for the delivery of services
        • Metadata fundamental to service provision
        • Services being built (successfully!)
           – OAI PMH as a low-barrier technology
        • No one-size-fits-all solution
        • Debates, tensions, balances….
           – automated processes v human labour
           – domain-specific richness v cross-domain (over-?)
             simplicity
           – standards v their implementation
           – objectivity v subjectivity
           – centralisation v distribution
        • Emergence of a Semantic Web?
        Acknowledgements
      Parts of the content of this presentation are
      adapted from earlier presentations by:
      Tom Baker (Fraunhofer-Gesellschaft, Berlin),
      Michael Day, Rachel Heery, Paul Miller, and
      Andy Powell (UKOLN)




        Acknowledgements
      UKOLN is funded by Resource: the Council for
      Museums, Archives and Libraries, the Joint Information
      Systems Committee (JISC) of the UK higher and further
      education funding councils, as well as by project funding
      from the JISC and the European Union.
      UKOLN also receives support from the University of
      Bath where it is based.

      http://www.ukoln.ac.uk/





								