U.S. Government Use of the OAI-PMH

             Michael L. Nelson
            Old Dominion University
              Norfolk Virginia, USA

        Indo-US Workshop on
Open Digital Libraries and Interoperability
    Arlington, VA - June 23-25, 2003
•   ODU: K. Maly, M. Zubair, J. Bollen, X. Liu
•   LANL: R. Luce, X. Liu
•   NASA: G. Roncaglia, J. Rocker
•   MAGiC (UK): P. Needham
• Review:
    – OAI-PMH
    – data provider / service provider model
        • including “aggregators”
•   Role of registration for repositories
•   NASA projects
•   OSTI demo project
•   Technical Report Interchange (TRI)
    – NASA, DOE, DOD
Scientific and Technical Information (STI)

• This talk will cover US Government
  focused / sponsored STI only
• This talk will not cover American Memory
  – a cultural history project from the Library of
    Congress (LoC)
  – the LoC played a significant role in the
    definition and early adoption of the OAI-PMH
                                 Acronym Review
• NASA
    – CASI = Center for AeroSpace Information
    – LaRC = Langley Research Center
• Department of Energy
    – OSTI = Office of Scientific and Technical Information
    – LANL = Los Alamos National Laboratory
    – Sandia = Sandia National Laboratory
• Department of Defense
    – DTIC = Defense Technical Information Center
    – AFRL = Air Force Research Laboratory
       The Rise and Fall of
      Distributed Searching
• wholesale distributed searching, popular at
  the time, is attractive in theory but
  troublesome in practice
  – Davis & Lagoze, JASIS 51(3), pp. 273-80
  – Powell & French, Proc 5th ACM DL, pp. 264-265
• distributed searching of N nodes still
  viable, but only for small values of N
      • NCSTRL: N > 100; bad
      • NTRS/NIX: N<=20; ok (but could be better)
                 resource – item – record
• item = identifier; all available metadata about the resource (here, David)
• records: Dublin Core metadata, MARC metadata, SPECTRUM metadata
• record = identifier + metadata format + datestamp
• set-membership is item-level property
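The item/record distinction above can be sketched as a small data model. This is a minimal illustration only: the class names, the identifier `oai:example.org:david`, the set name and the datestamps are all hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """One view of an item: identifier + metadata format + datestamp."""
    identifier: str       # same identifier as the item it belongs to
    metadata_prefix: str  # e.g. "oai_dc" or "marc"
    datestamp: str        # date of last change, ISO 8601
    metadata: str = ""    # serialized metadata (omitted in this sketch)


@dataclass
class Item:
    """All available metadata about one resource, under one identifier."""
    identifier: str
    sets: set = field(default_factory=set)        # set membership lives on the item
    records: dict = field(default_factory=dict)   # metadata_prefix -> Record

    def add_record(self, metadata_prefix, datestamp, metadata=""):
        rec = Record(self.identifier, metadata_prefix, datestamp, metadata)
        self.records[metadata_prefix] = rec
        return rec


# Hypothetical item with two records in different metadata formats.
item = Item("oai:example.org:david", sets={"sculpture"})
dc = item.add_record("oai_dc", "2003-06-23")
marc = item.add_record("marc", "2003-06-20")
```

Both records share the item's identifier and differ only in metadata format and datestamp, while set membership is stored once, on the item.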
             Overview of OAI-PMH Verbs

Group                   Verb                  Function
about the repository    Identify              description of repository
about the repository    ListMetadataFormats   metadata formats supported by repository
about the repository    ListSets              sets defined by repository
harvesting verbs        ListIdentifiers       OAI unique ids contained in repository
harvesting verbs        ListRecords           listing of N records
harvesting verbs        GetRecord             listing of a single record

Most verbs take arguments: dates, sets, ids, metadata formats,
and a resumption token (for flow control).
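A minimal sketch of how a harvester might issue these verbs and follow resumption tokens. The function names and the `fetch` callable are assumptions made for illustration; the argument names (`verb`, `metadataPrefix`, `set`, `resumptionToken`) and the XML namespace are those of OAI-PMH 2.0.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def build_request(base_url, verb, **args):
    """Return the GET URL for one OAI-PMH request; None-valued args are dropped."""
    params = {"verb": verb}
    params.update({k: v for k, v in args.items() if v is not None})
    return base_url + "?" + urlencode(params)


def harvest_identifiers(base_url, fetch, metadata_prefix="oai_dc", set_spec=None):
    """Yield every identifier from ListIdentifiers, following resumption tokens.

    `fetch` is any callable that maps a URL to the response XML as text,
    so the loop can be exercised without a network connection.
    """
    url = build_request(base_url, "ListIdentifiers",
                        metadataPrefix=metadata_prefix, set=set_spec)
    while url:
        root = ET.fromstring(fetch(url))
        for header in root.iter(OAI_NS + "header"):
            yield header.findtext(OAI_NS + "identifier")
        token = root.find(".//" + OAI_NS + "resumptionToken")
        if token is not None and token.text:
            # A resumption token is an exclusive argument: send only verb + token.
            url = build_request(base_url, "ListIdentifiers",
                                resumptionToken=token.text)
        else:
            url = None  # missing or empty token: the list is complete
```

For a live repository, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read().decode("utf-8")`; passing it in keeps the flow-control logic separate from the transport.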
   Data Providers / Service Providers

data providers             service providers
(repositories)             (harvesters)

                               aggregators allow for:
                                  • scalability for OAI-PMH
                                   • load balancing
                                   • community building
                                   • discovery

data providers    aggregator         service providers
(repositories)                       (harvesters)
• Frequently interchangeable terms:
   – aggregators: likely to be community / institutionally
   – caches: stores a copy, less likely to be community-
   – proxies: less likely to store a copy, may gateway
     between OAI-PMH and other protocols
       • Dienst / OAI Gateway; Harrison, Nelson, Zubair, JCDL 03
• To learn more about aggregators, caches &
         Example Aggregators
• Arc -
   – first described “hierarchical harvesting” in D-
     Lib Magazine, 7(4) 2001
• Celestial -
   – among other services, it provides a history of
     harvests (successful vs. errors)
                 OAI-PMH 2.0 Registration
• 75 repositories
• ??? unregistered repositories; unregistered because:
    – testing / development
    – not for public harvesting
    – public, but “low-profile”
    – never got around to it…
    – ???
• Data Providers:
• Service Providers:
• DP:SP ~= 5:1
          Registration is Nice…
           …But Not Required
• OAI-PMH is (becoming) the “http” for digital libraries
   – there is no central registry of http servers
      • remember the NCSA “What’s New” page? (ca. 1994)
• There will never be “registration support” in OAI-PMH
   – registries are a type of service provider, built on top of OAI-PMH
   – registration will be an integral part of community
   – friends…
• A lightweight, optional, DP-centric
  method to communicate the existence of

 <friends ..namespace stuff..>
               NASA <friends> example



                                  Use of <friends>

Slide from S. Warner, Cornell University
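A minimal sketch of how a harvester could read a <friends> container out of an Identify response. The function name and the example.org URLs are hypothetical; the friends namespace and the `baseURL` child element follow the OAI implementation guidelines.

```python
import xml.etree.ElementTree as ET

FRIENDS_NS = "{http://www.openarchives.org/OAI/2.0/friends/}"


def friend_base_urls(identify_xml):
    """Return the baseURLs listed in any <friends> container of the response."""
    root = ET.fromstring(identify_xml)
    return [el.text.strip() for el in root.iter(FRIENDS_NS + "baseURL") if el.text]


# Hypothetical, heavily trimmed Identify response.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Report Server</repositoryName>
    <description>
      <friends xmlns="http://www.openarchives.org/OAI/2.0/friends/">
        <baseURL>http://a.example.org/oai</baseURL>
        <baseURL>http://b.example.org/oai</baseURL>
      </friends>
    </description>
  </Identify>
</OAI-PMH>"""

friends = friend_base_urls(SAMPLE)
```

Each discovered baseURL can then be sent its own Identify request, which lets a harvester find repositories without relying on a central registry.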
   Langley Technical Report Server
                                       • publicly available
                                           – began as an anonymous ftp
                                             server in 1992; http access
                                             in 1993
                                           – model for other technical
                                             report servers at other
                                             NASA centers
                                                • details in NASA TM-
                                       • mostly LaTeX, MS Word,
                                         other systems
                                           – some scanned reports
     NACA Technical Report Server
                                    • publicly available
                                       – began in 1996
                                       – details in NASA TM-1999-
                                    • scanned reports from
                                       – NACA = predecessor to
                                    • contents mirrored with the
                                      MAGiC project
                                       – a UK-based grey-literature
                                         preservation project
                                       – OAI-PMH used to mirror contents
NACA Report 1345, as seen through:
• its native DL
• MAGiC
• Scirus
• OAIster
• my.OAI (FS Consulting)
                NTRS OAI Architecture
[Figure: a user searches NTRS (search for “cfd); NTRS holds metadata harvested from the nodes LTRS, ATRS, GTRS, ..., CASITRS]
• all searching, browsing, etc. performed on the metadata here (at NTRS)
• local copy of metadata harvested offline, through OAI interface
• individual nodes can still support direct user
• each node independently
• content (reports) remain archived at the local sites
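The harvest-then-search arrangement above can be sketched with a local table of harvested metadata. This is an illustration under stated assumptions: it uses Python's built-in sqlite3 as a stand-in for the production database, and the table layout, function names and sample rows are all hypothetical.

```python
import sqlite3


def build_index(harvested):
    """Load harvested (identifier, node, title, url) rows into a local table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE records ("
               "identifier TEXT PRIMARY KEY, node TEXT, title TEXT, url TEXT)")
    db.executemany("INSERT OR REPLACE INTO records VALUES (?, ?, ?, ?)", harvested)
    return db


def search(db, term, nodes=None):
    """Search titles in the local copy only; no remote node is contacted."""
    sql = "SELECT node, title, url FROM records WHERE title LIKE ?"
    params = ["%" + term + "%"]
    if nodes:  # optionally restrict the query to selected repositories
        sql += " AND node IN (%s)" % ",".join("?" * len(nodes))
        params += list(nodes)
    return db.execute(sql + " ORDER BY identifier", params).fetchall()


# Hypothetical metadata; each url points back to the node that keeps the report.
rows = [
    ("oai:ltrs:1", "LTRS", "CFD analysis of a wing", "http://example.org/ltrs/1"),
    ("oai:atrs:7", "ATRS", "Rotorcraft CFD validation", "http://example.org/atrs/7"),
    ("oai:gtrs:3", "GTRS", "Turbine blade cooling", "http://example.org/gtrs/3"),
]
db = build_index(rows)
hits = search(db, "cfd")
```

The search touches only the harvested metadata, so response time does not depend on how many nodes exist or whether they are reachable; the result URLs still lead to the site where the report is archived.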
NASA Technical Report Server
                         • publicly available
                         • replacement for the former
                           distributed searching
                           version of NTRS
                            –   MySQL
                            –   Va Tech harvester
                            –   modified “bucket”
                            –   details in Nelson, Rocker,
                                Harrison, Library Hi-Tech,
                                21(2) (July 2003)
                         • a service provider &
                           aggregator
                            – same OAI-PMH baseURL
                              as used for interactive
NASA Technical Report Server
                • advanced, fielded
                • explicit query routing
                   – 12 NASA repositories
                   – 4 non-NASA
                      • turned “off” by default

> 0.5M records
 NASA DLs in the Larger STI Realm
[Figure: NTRS, above LTRS, ATRS, …, CASITRS, linked to Publishers, Universities, DOD, International and DOE DLs; this could be a fully connected graph]
• NTRS could also be a data provider from the point of view of other
  DLs, allowing the harvesting of NASA report metadata.
• NTRS could also harvest metadata from other DLs, and provide access
  to non-NASA content.
• We hope to influence the direction of the
  effort to use
   OSTI Energy Citations Database
                                       • OAI-PMH support just
                                         recently added (Feb
                                          – not yet officially
                                            announced or
                                          – 20k records, 8k full-
                                       • other OSTI collections
   Technical Report Interchange
• Goal: share technical reports between 4 US
  government labs without creating new digital
  libraries for users to learn!
   –   NASA Langley Research Center
   –   Air Force Research Laboratory
   –   Los Alamos National Laboratory (DOE)
   –   Sandia National Laboratory (DOE)
• Solution: use cooperating OAI-PMH caches at
  each site to
   – export local contents
   – ingest remote contents
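A toy model of cooperating caches, offered as a sketch only: the class and method names are hypothetical, everything runs in memory with no network, and the real TRI system is not claimed to work this way internally. It shows the export/ingest pairing and incremental harvesting by datestamp (the OAI-PMH `from` argument, which is inclusive).

```python
class TriCache:
    """In-memory stand-in for one site's OAI-PMH cache."""

    def __init__(self, name):
        self.name = name
        self.local = {}         # identifier -> datestamp, published at this site
        self.remote = {}        # identifier -> (source site, datestamp)
        self.last_harvest = {}  # peer name -> latest datestamp seen from it

    def publish(self, identifier, datestamp):
        self.local[identifier] = datestamp

    def export_since(self, from_date=None):
        """Export local contents changed on or after from_date."""
        return sorted((i, d) for i, d in self.local.items()
                      if from_date is None or d >= from_date)

    def ingest_from(self, peer):
        """Ingest a peer's exported contents, incrementally after the first run."""
        since = self.last_harvest.get(peer.name)
        batch = peer.export_since(since)
        for identifier, datestamp in batch:
            self.remote[identifier] = (peer.name, datestamp)
        if batch:
            self.last_harvest[peer.name] = max(d for _, d in batch)
        return len(batch)


# Hypothetical identifiers and dates.
larc, lanl = TriCache("LaRC"), TriCache("LANL")
larc.publish("oai:larc:1", "2003-05-01")
lanl.ingest_from(larc)
larc.publish("oai:larc:2", "2003-06-01")
lanl.ingest_from(larc)
```

Only locally published records are exported, so a record ingested from one site is not passed on again as if it were local. Because `from` is inclusive, a record bearing the last-seen datestamp may be fetched twice; ingesting by identifier makes that harmless.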
                     TRI Production System - Status
[Figure: TRI systems at LaRC, LANL, Sandia, AFRL and ODU, with arrows marking records coming in from other TRI systems and records going out to other TRI systems]


Slide from M. Zubair, ODU
                   Mappings in TRI

Laboratory   Native Metadata Format   Native Source Commercial DL System   Native Destination Commercial DL
LaRC         MARC                     BASIS+                               (TBD)
LANL         MARC + local fields      Geac ADVANCE                         Science Server
AFRL         COSATI                   Sirsi STILAS                         Sirsi STILAS
Sandia       MARC                     Horizon                              Verity

              Details in Liu, et al. ECDL 2002; the above table also taken from the same paper
                                  A Single TRI Module
[Figure: data flow inside one TRI module, top to bottom]
• OAI Harvester Control: connect to remote DL by OAI protocol
    – read new data from remote DL
    – write new data published in local DL
• Local DB
    – remote data in DC format
    – local data in DC format
• Local DL Manager
    – write remote data to local format (Input Directory)
    – read local data and convert to DC format (output Directory)
• Legend: common modules in all three DLs; specific module for each DL
Slide from M. Zubair, ODU
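The "convert to DC format" step of such a module can be sketched as a crosswalk from native fields to Dublin Core elements. This is an assumption-laden illustration: the function name, the tiny mapping table (a few common MARC-to-DC pairings) and the input shape are hypothetical, and real native formats carry far more structure.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Hypothetical, deliberately small crosswalk: MARC tag -> Dublin Core element.
MARC_TO_DC = {"245": "title", "100": "creator", "650": "subject", "260c": "date"}


def marc_to_dc(fields):
    """Convert a list of (tag, value) pairs into an oai_dc XML string.

    Tags without a mapping (e.g. local fields) are silently dropped.
    """
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)
    root = ET.Element("{%s}dc" % OAI_DC_NS)
    for tag, value in fields:
        element = MARC_TO_DC.get(tag)
        if element:
            ET.SubElement(root, "{%s}%s" % (DC_NS, element)).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical local record, including one unmapped local field ("999").
dc_xml = marc_to_dc([("245", "A sample report title"),
                     ("100", "Doe, J."),
                     ("999", "local shelf code")])
```

Keeping the crosswalk in a table confines the per-laboratory differences to data, which matches the slide's split between common modules and a specific module for each DL; the reverse direction (DC to local format) would need its own mapping.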
The Future: Community Building

• Ultimately, protocols and metadata formats are not what
  makes a difference
• Rather, the critical mass afforded by a common set of
  utilities (cf. http, Dublin Core, XML)
• The best current example: The Open Language Archives Community
• OAI-PMH provides the basis for communication between
  strangers, but allows even richer communication between
             STI Communities
• Government produced/sponsored STI
• Academia
  – self-archiving vs. institutional archives
• Commercial publishers
  – e.g. BioMed Central
