Persistent identifiers

					   Persistent identifiers:
the 7 levels of identification
           Juha Hakala
     Helsinki University Library
  ELAG 2005 1-3 June 2005, CERN
                    Persistence?
   Is not dependent on the identifier itself, but on
    legal, organisational and technical infrastructure
       ISSN would collapse without the ISSN standard, a
        community using it according to the generally
        accepted principles, ISSN International Centre
        governing the system and the ISSN database linking
        the non-semantic (that is, dumb) identifiers to serials
   Even a technically brilliant system may be
    discontinued if its mission breaks apart
    ”Normal” identifiers and resolution
                services
   Resolution services are a new brand of identifiers which
    render traditional identifier systems actionable in the
    Internet (Web) environment
        Resolve: provide a link from reference to the resource
   Prime examples: DOI and URN
        Both may encompass, at least in principle, any existing identifier
         (URN namespaces have been defined for e.g. ISSN and ISBN)
        Both are useless without an existing identifier adding flesh to the
         DOI/URN bones
   From now on, only ”normal” identifiers will be discussed
        Complex enough topic for 35 minutes…
      Seven levels of identifiers
 After the collapse of the integrated library
 system paradigm, and implementation of
 IR portals, digital asset management
 systems, digital archives, e-resource
 management systems, what do we need
 to identify?
     This can be analysed from top to bottom, from
      organisations to search attributes
     Such analysis may show gaps and help in
      design of identifier systems
                 Top level: libraries
   Identifier system must cover at least other (memory)
    organisations
   National level (union catalogue codes) exists; due to the
    Internet / Web it became necessary to develop an
    international system
   ISIL, International Standard Identifier for Libraries and
    Related Organisations; ISO 15511
       Consists of ISO country code, hyphen and UC code
         • FI-H (Helsinki University Library)
   Danish Library Authority hosts the ISIL IC; national
    centres have been established in some countries but the
    system needs wider acceptance
 2nd level: collections and services
 These identifiers are important for IR
  portals; international exchange of
  collection & service (e.g. a Z39.50 server)
  metadata is cumbersome unless there is
  an efficient means for duplicate control
 These identifiers do not exist yet
     Helsinki University Library is writing a New
      Work Item proposal for ISO TC 46 on ISCI;
      International Standard Collection Identifier
     No on-going efforts to develop service ID
         ISCI: design principles
 Will be based on ISIL in order to allow
  efficient decentralization of the ISCI
  assignment and creation of Internet-wide
  resolution service without a global ISCI DB
 Will consist of three parts: ISIL, delimiting
  character (colon) and the actual (colon-
  less) collection identifier
     FI-H:Slavica (Slavic collection in HUL)
 Need for an international support center?
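
The three-part structure described above (ISIL, colon, colon-less collection identifier) can be sketched in a few lines. This is only an illustration of the stated design principles, since ISCI is still a proposal; the function names are hypothetical, and the split-on-last-colon rule is an assumption that follows from the collection part being colon-less.

```python
def make_isil(country_code: str, local_code: str) -> str:
    """Build an ISIL from an ISO country code, a hyphen and a union catalogue code."""
    return f"{country_code.upper()}-{local_code}"


def make_isci(isil: str, collection_id: str) -> str:
    """Build a (hypothetical) ISCI: ISIL, colon, colon-less collection identifier."""
    if not collection_id or ":" in collection_id:
        raise ValueError("collection identifier must be non-empty and colon-less")
    return f"{isil}:{collection_id}"


def split_isci(isci: str) -> tuple[str, str]:
    """Split an ISCI into (ISIL, collection identifier).

    Splitting on the LAST colon relies only on the collection part
    being colon-less, whatever characters the ISIL itself contains.
    """
    isil, sep, collection_id = isci.rpartition(":")
    if not sep or not isil or not collection_id:
        raise ValueError(f"not a well-formed ISCI: {isci!r}")
    return isil, collection_id


isil = make_isil("fi", "H")          # Helsinki University Library
isci = make_isci(isil, "Slavica")    # Slavic collection in HUL
print(isci, split_isci(isci))
```

Because the ISIL prefix names the assigning library, a resolver could in principle route a lookup to that library's own service, which is what makes a global ISCI database unnecessary.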
               3rd level: authors
 International exchange of authority records can
  be made more efficient with persistent and
  unique identification
 ISADN, International Standard Authority Data
  Number, has been discussed for quite a few
  years, but it is not yet formally under
  development
 Retrospective assignment may create interesting
  ”ownership” problems, especially if the future
  ISADN contains country of origin
       Is Franz Liszt German or Hungarian?
    4th level: identifiers for works
   ISWC: International Standard Musical Work Code
       T-034524680-1
         • Letter T, 9-digit unique number and check digit
   ISAN: International Standard Audiovisual Number
       ISAN 006A-15FA-002B-C95F-A
         • 12-digit (hexadecimal) root segment + 4-digit segment for episode
           identification and check character
   ISTC: International Standard Text Code
       ISTC OA9 2005 12B4A105 6
         • agency code, year, work element & check digit
   These systems were developed at the same time, but
    their syntax and terminology vary
       This should not complicate usage too much
         ISTC/ISWC/ISAN issues
 Many library system vendors are investigating
  the possibility of implementing FRBR, but few
  have been capable of doing it (VTLS, OCLC)
 Once an ILMS is frbrized, implementing work
  identifiers is essential, but there is more than
  technology to consider here:
       Do we need to pay for these identifiers, even when
        retrospectively generating them for old works?
       Who will establish the national centers and create the
        identifiers (and work level records they require)?
       5th level: manifestations
 This used to be familiar terrain for us
     ISBN, ISSN, NBN belong here
 E-publishing has destroyed the old status
 quo:
     Systems that worked well for decades have
      adaptation problems for different reasons
     It is not yet entirely clear if the revisions done
      (or planned) are sufficient
E-problems with manifestations
   It is increasingly difficult to define valid ”targets”
       ISSN could be assigned to any Web site out there
       Publishers want to give ISBNs to anything that can in
        principle be sold separately (e-book chapters, images
        within a book, teddy bears on sale in book stores)
   The number of things to be identified is growing
    fast; this will cause syntax problems (ISBN
    revision was done to make more room) and staff
    issues in ISSN/ISBN national centers
       There is no point in giving a persistent identifier to a
        non-persistent resource; therefore resources must be
        identified, described & archived, which is a
        labour-intensive process
                         Case ISBN
   The old ISBN was running out of number space
   Several extension options were discussed:
       13, 16, even 32-digit ISBNs
       The idea to make ISBN a ”dumb” number such as ISSN was
        voted down (for this the librarians in the WG are to blame)
   The new ISBN will be compliant with the EAN system
       13 digits, starting with 978, 979 or in the future with something
        else to extend the scope of the system further
       New check digit calculation algorithm adopted from EAN
       It is possible to convert from an old ISBN to the new (starting
        with 978) and back
   Publishers retroconvert to new ISBNs; libraries will keep
    the old ones
       ILMS need to do sophisticated things with old/new ISBNs
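
The conversion between old and new ISBNs mentioned above is mechanical: prefix 978, drop the old check digit and recompute it with the EAN algorithm (and the reverse for the way back). A minimal sketch, assuming input strings contain only digits, hyphens, spaces and a possible final X; the function names and the sample ISBN are illustrative and not taken from the slides.

```python
def ean13_check_digit(first12: str) -> str:
    """EAN-13 check digit: weights alternate 1, 3 from the left, modulo 10."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_check_digit(first9: str) -> str:
    """Old ISBN check digit: weights 10 down to 2, modulo 11, with X for 10."""
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an old ISBN to the new 13-digit form with the 978 prefix."""
    core = isbn10.replace("-", "").replace(" ", "")[:9]
    first12 = "978" + core
    return first12 + ean13_check_digit(first12)


def isbn13_to_isbn10(isbn13: str) -> str:
    """Convert a 978-prefixed ISBN back to the old 10-digit form."""
    digits = isbn13.replace("-", "").replace(" ", "")
    if not digits.startswith("978"):
        raise ValueError("only 978-prefixed ISBNs have an old-style equivalent")
    core = digits[3:12]
    return core + isbn10_check_digit(core)


old = "0-306-40615-2"                 # sample value, not from the slides
new = isbn10_to_isbn13(old)
print(new, isbn13_to_isbn10(new))
```

The round trip only works for the 978 prefix; an ISBN starting with 979 has no old-style counterpart, which is one reason a library system has to store and index both forms.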
        6th level: component parts
   Libraries have not done too well in this area in
    the past due to staff limitations
       We catalogue serials but not the articles
 E-publishing may force us to change tactics
  since now even component parts are separate
  items accessible directly
 Manual processing must be partially or fully
  replaced by automated processes; this will also
  have an impact on identifiers
       Automated ID generation solves the staff bottleneck
    SICI: still alive, but not kicking
   Serial Item and Component Identifier, 1991-
       NISO standard; has never really taken off
       Can be generated programmatically provided that the
        article is structured enough
         • 0095-4403(199502/03)21:3<>1.0.TX;2-Y
       Complex; consists of ISSN and stuff identifying the
        issue and article within it
       Publishers have their own systems like PII which
        have been easier to create and maintain (for them)
       Still not clear how popular SICI will eventually be
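
To make the "ISSN and stuff" structure concrete, the example SICI above can be taken apart with a regular expression. This is a rough sketch, not a full implementation of the NISO standard: the pattern covers only the serial form shown on this slide, and the segment names used as group names are this sketch's labels. The ISSN check (weights 8 down to 2, modulo 11) is included because the ISSN is the one part that can be validated on its own.

```python
import re

# Simplified pattern for the SICI form shown on the slide:
# ISSN (chronology) enumeration <contribution> control-segment
SICI_PATTERN = re.compile(
    r"^(?P<issn>\d{4}-\d{3}[\dX])"
    r"\((?P<chronology>[^)]*)\)"
    r"(?P<enumeration>[^<]*)"
    r"<(?P<contribution>[^>]*)>"
    r"(?P<control>.+)$"
)


def split_sici(sici: str) -> dict:
    """Split a SICI string into its main segments (simplified)."""
    match = SICI_PATTERN.match(sici)
    if match is None:
        raise ValueError(f"not recognised as a SICI: {sici!r}")
    return match.groupdict()


def issn_is_valid(issn: str) -> bool:
    """Verify the ISSN check digit: weights 8..2, modulo 11, X for 10."""
    digits = issn.replace("-", "")
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    value = (11 - total % 11) % 11
    expected = "X" if value == 10 else str(value)
    return digits[7] == expected


parts = split_sici("0095-4403(199502/03)21:3<>1.0.TX;2-Y")
print(parts)
print(issn_is_valid(parts["issn"]))
```

In this example the contribution segment between the angle brackets is empty, so the string identifies an issue (volume 21, number 3) rather than a single article.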
    BICI: Dead On Arrival, or conflict
      between theory and practice
 Book Item and Contribution Identifier
 NISO draft standard, never completed
 Consists of ISBN and extra stuff to identify the
  relevant section within the book; may be
  automatically generated
 Publishers & book stores prefer to rely solely on
  ISBN in their systems
       Using ISBN only is not a neat solution (uses a lot of
        ISBNs, and giving ISBN both for the thing as a whole
        and its component parts is messy)
 7th level: search attributes etc.
 Within Z39.50, sets (e.g. attribute and
 diagnostic), record syntaxes etc. are
 identified by ISO Object Identifiers
     MARC21: 1.2.840.10003.5.10
     Bib-1: 1.2.840.10003.3.1; term examples:
       •   Author: 1.2.840.10003.3.1.1.1003
       •   Name: 1.2.840.10003.3.1.1.1002
       •   Author-name personal: 1.2.840.10003.3.1.1.1004
       •   Personal name: 1.2.840.10003.3.1.1.1
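
The OIDs above are hierarchical, so a program can tell from the leading arcs whether an identifier belongs to Z39.50 at all and, for Bib-1, read the attribute from the trailing arcs. A small sketch, assuming the dotted notation used on this slide, where the two arcs after the Bib-1 prefix are read as attribute type and attribute value; the helper names are hypothetical.

```python
Z3950_ARC = (1, 2, 840, 10003)   # common prefix of all OIDs listed above
BIB1_ARC = Z3950_ARC + (3, 1)    # Bib-1 attribute set


def parse_oid(oid: str) -> tuple[int, ...]:
    """Turn a dotted OID string into a tuple of integer arcs."""
    return tuple(int(part) for part in oid.split("."))


def is_under(oid: tuple[int, ...], arc: tuple[int, ...]) -> bool:
    """True if `oid` lies at or below `arc` in the OID tree."""
    return oid[:len(arc)] == arc


def bib1_attribute(oid: str) -> tuple[int, int]:
    """Return (attribute type, attribute value) for a Bib-1 term OID.

    Assumes the slide's notation: Bib-1 prefix + type + value.
    """
    arcs = parse_oid(oid)
    if not is_under(arcs, BIB1_ARC) or len(arcs) != len(BIB1_ARC) + 2:
        raise ValueError(f"not a Bib-1 term OID: {oid!r}")
    return arcs[-2], arcs[-1]


print(bib1_attribute("1.2.840.10003.3.1.1.1003"))                 # Author
print(is_under(parse_oid("1.2.840.10003.5.10"), BIB1_ARC))        # MARC21 syntax
```

The same prefix test works for any other branch of the tree, which is what makes OIDs applicable "to anything", provided someone maintains the registry under each arc.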
                   OID problems
 Bib-1 attribute set is not quite as coherent as it
  should be, there are lots of (domestic) search
  attributes missing from it, and sometimes there
  are too many alternatives
 Attempt to develop Bib-2 failed, and even if we
  succeed in the future, co-existence of Bib-1 and
  Bib-2 may cause trouble
 ISO OIDs can be applied to anything
       Not clear how to use them in ”bibliographic context” to
        e.g. identify government publications or parts of them;
        this is currently being investigated in Finland
                Conclusion
 E-publishing and new applications (and their
  novel metadata) have expanded both the scope
  of identifiers needed and the requirements
  towards existing systems, especially on
  manifestation & component parts levels
 Standards developers have reacted to these
  needs, but progress has been slow; still, in
  some areas system builders have been even
  slower
                  Conclusion (2)
   Identifier is more than just a string of characters
       There must be an agent which assigns the identifier
        to a resource, and (usually) describes it
 As long as all parts in this picture are stable,
  identification is a routine process
 Agent breakdowns have been the most common
  reason for problems in the past
       A number of national ISSN agencies are inactive
   E-resources have destroyed the balance, and it
    may take a while before the identification system
    works again in ”business as usual” style

				