Workshop_Metadata by xiangpeng

Metadata issues and DOI
overview of presentation...


Background
      Three <indecs> conclusions
      The metadata landscape: which schemes
      matter most to DOI?

DOI metadata - practical implications
     DOI applications: sets of metadata for a use
     DOI Kernel
     Handle and metadata
     Conclusion
Definitions of metadata
popular...
Metadata is data about data.
Everyone


logical...
An item of metadata is a relationship that someone
claims exists between two entities*.
<indecs> framework


functional...
Metadata is the life-blood of e-commerce.
John Erickson (HP)




             *entity = something which has identity
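
The logical definition above can be sketched as a small data structure. This is only an illustration, not part of the <indecs> framework itself: the class name Claim, its field names, the claimant value and the relation name are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One item of metadata: a relationship that someone claims
    exists between two entities (hypothetical field names)."""
    claimant: str   # who makes the claim
    subject: str    # identifier of the first entity
    relation: str   # the claimed relationship
    obj: str        # identifier (or name) of the second entity


def describe(c: Claim) -> str:
    """Render a claim as a readable sentence."""
    return f"{c.claimant} claims: {c.subject} {c.relation} {c.obj}"


claim = Claim(
    claimant="hypothetical-registration-agency",  # assumption
    subject="doi:10.1000/ISBN0141255559",
    relation="has_primary_agent",                  # assumption
    obj="Janet Evanovich",
)
print(describe(claim))
```

Recording the claimant alongside the relationship keeps the point that metadata is a claim made by someone, not a neutral fact.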
#1: All metadata is just a view

e.g. Views of a “person”: some (generic) ways in which
you might be identified in metadata schemes...
Son                  Hospital patient       Husband
Legal person         Citizen                Charity giver
Agent                Car driver             Hotel guest
Alien                Rights owner           Speeding ticket recipient
Scholar              Marathon runner        DisneyWorld visitor
Library user         Software licensee      Frequent Flyer
Composer             Parent                 Concert-goer
Credit card holder   Tax payer              Passenger
Shoe purchaser       Club member            Employee
Author               e-consumer             Voter
Lottery entrant      Bank account holder    Dog owner


In each of these roles “you” will have different
IDs and attributes.

                                           Three <indecs> conclusions
#1: All metadata is just a view

Creations are the same. An identifier for a
published article may refer to...
 A manuscript
 The abstract work
 A draft
 A (class of) physical copy in a publication
 A (class of) digital copy (not in a publication)
 A (class of) digital copy in a publication
 A (class of) digital format
 A specific digital copy
 A (class of) paper copy
 A specific paper copy
 An edition
 A reprint
 A translation
 etc…and many combinations of the above

Similar views apply to other types of creations.
#1: All metadata is just a view

In digital content and rights management, views must
not be confused. Mistaken identity can be catastrophic.
Increasingly, views need to be interoperable
(e.g. production workflow, rights, marketing within
one business; supply chain transfer; etc.).
The need for automated, interoperable views in
d-commerce will be enormous.




#2: (Almost) all terms need identifiers

Each of the values of a view must be defined and
identified if other views are to recognize them (what do
you mean by an abstract work? an edition? a format? a
scholar? a name?)
So views need comprehensive controlled vocabularies
(nb our reliance on ISO language, territory, currency,
time codes).
Automation needs unambiguous terms.
Terms of rights must be unambiguous. Anything may be
a term of an agreement.
Emergence of the value of structured ontologies for
commerce (like the indecs model).
#3: Events are the key to interoperability

Most metadata is “thing” or “people” based.
   • static views e.g. “a creation”


In the net future, metadata interoperability will
be achieved by describing “events”; relating things
and people
   • dynamic views e.g. “A created B”


Event descriptions will also be the key to rights
metadata (transactions are events)
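
A minimal sketch of the event-based idea, assuming a very simple event record: each event has a type and a set of roles filled by identified entities, and a static, thing-based view is derived from the events an entity took part in. The names Event and static_view, the role names and the identifiers are hypothetical and are not taken from the <indecs> model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    event_type: str         # e.g. "create", "license"
    roles: Dict[str, str]   # role name -> entity identifier


def static_view(events: List[Event], entity: str) -> Dict[str, List[str]]:
    """Derive a static view of one entity from the events it took part in."""
    view: Dict[str, List[str]] = {}
    for ev in events:
        if entity not in ev.roles.values():
            continue  # the entity played no role in this event
        for role, other in ev.roles.items():
            if other != entity:
                view.setdefault(f"{ev.event_type}:{role}", []).append(other)
    return view


events = [
    # "A created B"
    Event("create", {"agent": "person:A", "output": "creation:B"}),
    # a rights transaction is also an event
    Event("license", {"licensor": "person:A",
                      "licensee": "org:C",
                      "subject": "creation:B"}),
]

print(static_view(events, "creation:B"))
```

The same event list yields a different view when asked about person:A, which is one way to read the claim that events relate things and people.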

The metadata landscape

These conclusions are being reached increasingly
often elsewhere.


There is an explosion of metadata activity:
   • Models, Identifiers, Vocabularies,
      Dictionaries, Ontologies.
   • XML/RDF schemas.
   • Registries/Repositories/“crosswalks”.
   • Technical standards.
The metadata landscape for “creations”

[Diagram: the sectors that produce metadata for creations,
arranged around a central “Standards” area: Libraries,
Archives, Museums, Education, Newspapers, Magazines,
Journals, Books, Texts, Copyright, Music, Audio,
Audiovisual, Technology.]
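The metadata landscape for “creations”: 1980s

[Diagram: sector map (Libraries, Archives, Museums,
Education, Newspapers, Magazines, Journals, Books, Texts,
Copyright, Music, Audio, Audiovisual, Technology, with
Standards at the centre) showing the schemes in place:
MARC (near Libraries), EAN, UPC and ISO codes (near
Standards), ISSN (near Journals), ISBN (near Books),
CAE (near Copyright).]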
The metadata landscape for “creations”: mid 90’s

[Diagram: sector map (Libraries, Archives, Museums,
Education, Newspapers, Magazines, Journals, Books, Texts,
Copyright, Music, Audio, Audiovisual, Multimedia,
Technology, with Standards at the centre) showing the
schemes in place: MARC, FRBR, IMS, IIM, Dublin Core, EAN,
UPC, url, urn, Handle, DOI, ISO codes, ISSN, ISBN, ISAN,
ISWC, ISMN, CIS, CAE, ISRC.]
The metadata landscape for “creations”: today

[Diagram: sector map (Libraries, Archives, Museums,
Education, Newspapers, Magazines, Journals, Books, Texts,
Copyright, Music, Audio, Audiovisual, Multimedia, eBooks,
Technology, with Standards at the centre) showing the
schemes in place: MARC, FRBR, CIDOC, IMS, LOM, IIM, NITF,
Dublin Core, RDF, XML schema, abc, ISO11179, EAN, UPC,
PRISM, url, uri, urn, Handle, DOI, CROSSREF, ISO codes,
ISSN, SICI, MPEG7, MPEG21, EPICS, <indecs>, ONIX, ISBN,
BICI, P/META, UMID, SMPTE, XrML, ISTC, ISAN, IPDA, ISWC,
DMCS, ISMN, CIS, CAE, ISRC.]
Convergence

All serious schemes are becoming...
•Granular (parts and versions)
•Modular (creations within creations)
•Multimedia
•Multinational
•Multilingual
•Multipurpose

Schemes:
 EPICS/ONIX (text)
 SMPTE (audiovisual)
 SDMI/DCMS (audio/music)
 eBooks
 DOI genres
 CIDOC (museums/archives)
 FRBR (libraries)
 Dublin Core
 CIS (copyright societies)
 PRISM (magazines)
 NITF (newspapers)
 MPEG21 (multimedia)

Result: major “sector” schemes are now trying to
define metadata with broadly the same scope, only
different emphases.
Which initiatives matter most to DOI?

 MPEG21
 SMPTE data dictionary
 ONIX
 XrML


Criteria...
 Strong underlying data model
 Multi-purpose
 Extensive, structured vocabulary
 Commercial critical mass
 Outward-looking
 MPEG21

Began 2000 (ISO Moving Picture Experts Group).
Possible umbrella for digital multimedia standards. Place
to bring technology and content standards together.
MPEG track record of disciplined standards development.
Most major players getting involved.
Not many lawyers (yet).
Short-term perception problem: “MPEG is audiovisual”.
Is the challenge too great?
 SMPTE Data Dictionary/UMID

Began 1998 (Society of Motion Picture and Television
Engineers).
Well-structured multimedia technically-oriented data
dictionary.
ISO 11179 metadata registry based, good governance and
update procedure.
SMPTE track record of disciplined standards
development.
UMID (Unique Material Identifier) for digital material -
complementary to “editorial” identifiers like DOI.
Guaranteed implementation in “home” sector.
Start point for MPEG7 metadata work.
EPICS & ONIX International

EDItEUR (EPICS) and AAP (ONIX) convergence (May
2000).
Substantial and extensible EPICS metadata dictionary,
<indecs>-model based, from which “ONIX” XML-tagged
subset(s) are taken.
Commerce-driven (Amazon etc) with transatlantic
industry support and International Steering Group.
Likely to be used by eBooks, ISTC.
ONIX for video (Amazon initiative)? ONIX for audio?
Best chance of e-commerce multimedia vocabulary and
schema (and maybe d-commerce?).
XrML and Rights metadata

DRM (Digital Rights Management) systems at present are
for “unitary” rights: they do not deal with modularity.
Holdup 1: Rights vocabularies need descriptive
vocabularies - not yet ready.
Holdup 2: Events model needed to integrate descriptions
and rights - event-based tools not yet developed.
XrML likely focal point for next stage.
2001+ before more mature interoperable developments
start to emerge.
DOI-R? Interested partners in a prototype?
Standard controlled vocabularies

Existing…
Territories, Language, Currency, Date/Time (ISO)
Measures (Unified Code for Units of Measure)

Needed…
Creation types
Derivation types (adaptation, sample, compilation…)
Contributor roles (author, translator, cameraman…)
Title types (abbreviated, inverted, formal... etc)
Media types (formats)
Name types
Identifier types
Encoding types
Tools/instruments
User roles
etc...and many identifiers need establishing or creating (Parties,
Agreements, ISWC, ISTC, ISAN, UMID etc)
Metadata issues and DOI




DOI metadata - practical implications
DOI “Application Profiles” and “User Communities”
                  (was “Genres”)
DOI Kernel
Handle and metadata
Conclusion
DOI Application Profile

A DOI Application Profile is a DOI view: a mechanism for
“unity in diversity”.
Based on any interest group’s view of a type of creation (a
DOI User Community). Functional granularity: create a
genre when you need it.
DOI-APs can overlap: creations can be in multiple DOI-APs.
Each DOI-AP has a metadata kernel, a Registration Agency
and a Governance/Development Group.
Base Set for new, unplaced DOIs.
Zero Set = “initial implementation” DOIs (just a single
URL redirection; zero additional metadata).
[Diagram: stages of DOI implementation]
 Initial implementation   Single redirection (persistent identifier)
 Full implementation      Metadata; multiple resolution
 Activity tracking        W3C, WIPO, NISO, ISO, UDDI, etc

[Diagram: the same stages in terms of Application Profiles]
 Zero App Profile         Single redirection (persistent identifier)
 Defined App Profiles     Metadata; multiple resolution
DOI Kernel

Each DOI-AP starts from the Base kernel (8 elements) and
may add whatever else it needs, as defined by the DOI User
Community.
A kernel extension model is being developed.
DOI metadata vocabulary to be developed - in tandem
with EPICS/ONIX?
Can/should coincide with or provide sector requirements
(e.g. ISBN, ISRC, ISWC etc).
Different DOI-APs’ metadata will interoperate if
vocabularies are developed within an indecs-based model.
DOI Kernel

Contains critical minimum metadata for basic
recognition (but not complete disambiguation).
Standard base vocabulary (e.g. manifestation, version)
means all DOI applications can expect base genre
metadata.
A DOI-AP entity (e.g. “book”) must be analysable in terms
of other attributes (e.g. media, mode, content, subject).

 DOI             10.1000/ISBN0141255559
 DOI Genre       Book
 Identifier      ISBN 0141255559
 Title           Two for the dough
 Type            Manifestation
 Mode            Visual
 Primary Agent   Janet Evanovich
 Agent Role      Author
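
The eight kernel elements in the example above could be held in a record such as the following. This is a sketch only: the class name KernelRecord, the Python field names and the helper missing_elements are assumptions for illustration and do not represent a published DOI schema.

```python
from dataclasses import dataclass, asdict, fields


@dataclass(frozen=True)
class KernelRecord:
    """The eight base kernel elements (hypothetical field names)."""
    doi: str
    doi_genre: str
    identifier: str
    title: str
    creation_type: str
    mode: str
    primary_agent: str
    agent_role: str


def missing_elements(declared: dict) -> list:
    """Return kernel element names that are absent or empty in a declaration."""
    return [f.name for f in fields(KernelRecord) if not declared.get(f.name)]


# Values taken from the example on the slide.
record = KernelRecord(
    doi="10.1000/ISBN0141255559",
    doi_genre="Book",
    identifier="ISBN 0141255559",
    title="Two for the dough",
    creation_type="Manifestation",
    mode="Visual",
    primary_agent="Janet Evanovich",
    agent_role="Author",
)

print(missing_elements(asdict(record)))             # complete declaration
print(missing_elements({"doi": "10.1000/123456"}))  # incomplete declaration
```

A check of this kind is one way a Registration Agency might enforce that the base kernel is declared before a profile adds its own elements.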
DOI Kernel Extensions

IDF to develop an extended “catalogue” for all
extended metadata requirements from indecs-based
models and vocabulary, along these lines...
 DOI
 UASet
 Identifier(s)
 Title(s) + Types, Languages
 Primary type
 Mode
 Media
 Encoding
 Form(s)
 Subject(s)
 Content Language + Use Type
 Measures + Units of Measure
 Content Creations
     Content Link Sequence, Measure
 Related Creations + Link Type
 Creation Event + Type
     Primary Agent + Agent Role + Tool
     Source Creation
     Date(s)
     Location(s)
 Availability Event + Type
     Agent + Agent Role
     Date(s)
     Location(s)
     Price + Type
  DOI Kernel as the basis of each app. profile

  Each Profile can be thought of as built from the kernel +
  extensions:




[Diagram: a DOI AP drawn as the compulsory kernel for any
DOI plus the metadata for the application.]
DOI Kernel as the basis of each application set

Each DOI-AP can be thought of as built from the kernel +
extensions…
...But the kernel is actually what several APs have in
common (compare the different views of a person):



 Son                  Hospital patient      Husband
 Legal person         Citizen               Charity giver
 Agent                Car driver            Hotel guest
 Alien                Rights owner          Speeding ticket recipient
 Scholar              Marathon runner       DisneyWorld visitor
 Library user         Software licensee     Frequent Flyer
 Composer             Parent                Concert-goer
 Credit card holder   Tax payer             Passenger
 Shoe purchaser       Club member           Employee
 Author               e-consumer            Voter
 Lottery entrant      Bank account holder   Dog owner
DOI Kernel as the basis of each Application




This kernel cannot be logically defined from first
principles.


In the absence of existing Application Profiles to define
this overlap (the kernel), we have made a reasonable estimate
from the logical analysis of <indecs>.
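[Diagram: three overlapping Application Profiles (DOI AP 1,
DOI AP 2, DOI AP 3). The area they share is the kernel for
any DOI; the rest of each is the metadata for that AP.]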
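
The idea that the kernel is the overlap between profiles can be sketched as a set intersection. The three profiles and their extra elements below are invented for illustration; only the eight shared element names come from the kernel example in this presentation.

```python
KERNEL_ELEMENTS = {
    "DOI", "DOI Genre", "Identifier", "Title",
    "Type", "Mode", "Primary Agent", "Agent Role",
}

# Hypothetical profiles: the extra elements are assumptions.
profiles = {
    "hypothetical-book-AP": KERNEL_ELEMENTS | {"Edition", "Page count"},
    "hypothetical-recording-AP": KERNEL_ELEMENTS | {"Duration", "Encoding"},
    "hypothetical-article-AP": KERNEL_ELEMENTS | {"Journal title", "Encoding"},
}


def common_kernel(aps: dict) -> set:
    """Elements that every profile declares: the overlap of all profiles."""
    return set.intersection(*aps.values())


def extensions(aps: dict) -> dict:
    """Elements each profile declares beyond the common kernel."""
    kernel = common_kernel(aps)
    return {name: elems - kernel for name, elems in aps.items()}


print(sorted(common_kernel(profiles)))
print(extensions(profiles)["hypothetical-book-AP"])
```

Note that "Encoding" is shared by two of the three invented profiles but is not in the kernel, because the kernel is only what all profiles have in common.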
DOI-APs: all metadata in well-formed structure
<indecs> analysis and DOI

[Diagram: the creation identified by a DOI at the centre,
with its metadata grouped under the labels below. Kernel
elements are marked in the original figure.
 Labels: DOI, identifiers, names (titles), labels,
   measures, quantities, qualities, types
 Attributes: DOI-AP, format, infixion, genre, continuity,
   audience, language, origination, mode, IP type
 Events: creating events, using events (each with agent,
   time, place, tool); Primary agent, Agent role;
   price (number, currency)
 Relations: source IP entity, content creations,
   related creations
 Situations: IP Rights statement]
Metadata declarations

WHAT:
• Base kernel metadata must be declared.
• DOI-AP-specific metadata is a matter for the DOI
User Community (Governance Group/Registration
Agency) to decide.
HOW:
• Either local webpage or central repository or both (as
decided by User Community rules).
• Automated access to metadata declaration via Handle
data types?
• XML schemas.
Roles of declared metadata

= Functional specification of the DOI kernel

(a) to assign a unique DOI to the creation [DOI]

(b) to link the DOI to the principal local identifier of
  a creation (if any) to enable the integration of
  DOI-related applications and metadata with
  others [Identifier]

(c) to enable a searcher or application to identify the
   creation by its most common name and the
   party(ies) responsible for its creation or
   publication [Title, Primary Agent, Agent Role]
Roles of declared metadata (continued)

(d) to enable a searcher or application to distinguish
  the fundamental type of creation (abstract,
  physical, digital or spatio-temporal), and thereby
  also to distinguish between creations of different
  types with the same names and creators. [Type]

(e) to enable a searcher or application to distinguish
   the mode of the creation (visual, audio, etc.)
   [Mode]

(f) to enable a searcher or application to determine
   to which DOI user/application set the creation
   belongs [DOI-AP].
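
Role (d) can be illustrated with a small lookup, assuming kernel declarations are available as simple records. The two records, their DOIs and the function name find are hypothetical; the type values "abstract" and "digital" are taken from the list in (d).

```python
# Two hypothetical declarations with the same title and creator
# but different fundamental types.
records = [
    {"doi": "10.1000/hypothetical-work",
     "title": "Two for the dough",
     "primary_agent": "Janet Evanovich",
     "creation_type": "abstract"},
    {"doi": "10.1000/hypothetical-ebook",
     "title": "Two for the dough",
     "primary_agent": "Janet Evanovich",
     "creation_type": "digital"},
]


def find(recs: list, **criteria: str) -> list:
    """Return the DOIs of records whose elements match every criterion."""
    return [r["doi"] for r in recs
            if all(r.get(key) == value for key, value in criteria.items())]


# Title and agent alone are ambiguous...
print(find(records, title="Two for the dough",
           primary_agent="Janet Evanovich"))
# ...adding the Type element separates the two creations.
print(find(records, title="Two for the dough", creation_type="digital"))
```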
Handle and metadata

Handle data types could create a way of processing
metadata as a “distributed database” of services: e.g.

                 metadata@10.1000/123456
                 rights@10.1000/123456
                 abstract@10.1000/123456
                 sample@10.1000/123456
                 buy@10.1000/123456
                 license@10.1000/123456
                 pdf@10.1000/123456
                 etc.


Data types (and results) must be consistent, so the
Handle data type vocabulary must be developed with
great care within indecs-based model. Some data types
could be application specific.
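
A sketch of how a client might interpret the "type@DOI" notation used above. The notation is the slide's own illustration, not an actual Handle System interface; the function name parse_typed_request and the fixed vocabulary KNOWN_TYPES are assumptions, and a real data type vocabulary would be governed as the slide describes.

```python
from typing import NamedTuple


class TypedRequest(NamedTuple):
    data_type: str   # which service or view is requested
    prefix: str      # DOI prefix, e.g. "10.1000"
    suffix: str      # DOI suffix, e.g. "123456"


# Controlled vocabulary: the types listed on the slide (assumed closed set).
KNOWN_TYPES = {"metadata", "rights", "abstract", "sample",
               "buy", "license", "pdf"}


def parse_typed_request(text: str) -> TypedRequest:
    """Split 'type@prefix/suffix' and reject types outside the vocabulary."""
    data_type, at, doi = text.partition("@")
    if not at or not data_type:
        raise ValueError(f"expected 'type@doi', got {text!r}")
    if data_type not in KNOWN_TYPES:
        raise ValueError(f"unknown data type {data_type!r}")
    prefix, slash, suffix = doi.partition("/")
    if not slash or not prefix or not suffix:
        raise ValueError(f"malformed DOI {doi!r}")
    return TypedRequest(data_type, prefix, suffix)


print(parse_typed_request("rights@10.1000/123456"))
```

Rejecting unknown types up front reflects the point that data types and their results must be consistent across applications.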
Metadata tasks for DOI
• Mapping ONIX to <indecs>
  – reconcile any differences
• <indecs> data dictionary
  – elements and iids tested in depth; for mappings
• maintaining iid registry
  – database
  – available to anyone building application schema,
    but need not be public
• applications based on iid registry
  – technology tools to ease application set building
The DOI model: future extension

Developing rights management aspects of dictionary:

[Diagram: doi> at the centre, linked to Identifier,
Description and Action, with Rights added as a fourth
element.]

DOI for parties and events in future?
Conclusion: DOI as the Integrator

“DOI is the most ambitious identifier in the history of
the world”. (G. Rust 1998)
But now several things are becoming established...
…it has a persistent, granular, flexible, unique identifier
which can be a “wrapper” for other IDs. Not competitive:
it enhances legacy identifiers’ functionality in
d-commerce. DOI as the integrating digital identifier?
…a strong, established metadata model and vocabulary.
…a controlled but flexible development structure.
…it does not confuse names with addresses.
…it allows multiple, standardised automated actions.
Nothing else comes close...