U.S. Government Use of the OAI-PMH

             Michael L. Nelson
            Old Dominion University
              Norfolk, Virginia, USA
                 mln@cs.odu.edu
            http://www.cs.odu.edu/~mln/


        Indo-US Workshop on
Open Digital Libraries and Interoperability
    Arlington, VA - June 23-25, 2003
           Acknowledgements
•   ODU: K. Maly, M. Zubair, J. Bollen, X. Liu
•   LANL: R. Luce, X. Liu
•   NASA: G. Roncaglia, J. Rocker
•   MAGiC (UK): P. Needham
                           Outline
• Review:
    – OAI-PMH
    – data provider / service provider model
        • including “aggregators”
•   Role of registration for repositories
•   NASA projects
•   OSTI demo project
•   Technical Report Interchange (TRI)
    – NASA, DOE, DOD
                Disclaimer:
Scientific and Technical Information (STI)

• This talk will cover US Government
  focused / sponsored STI only
• This talk will not cover American Memory
  – a cultural history project from the Library of
    Congress (LoC)
     • http://memory.loc.gov/
  – the LoC played a significant role in the
    definition and early adoption of the OAI-PMH
                                 Acronym Review
• NASA
    – LaRC = Langley Research Center
    – CASI (Center for AeroSpace Information): http://www.sti.nasa.gov/
• Department of Energy
    – LANL = Los Alamos National Laboratory
    – Sandia = Sandia National Laboratories
    – OSTI (Office of Scientific and Technical Information): http://www.osti.gov/
• Department of Defense
    – AFRL = Air Force Research Laboratory
    – DTIC (Defense Technical Information Center): http://www.dtic.mil/
       The Rise and Fall of
      Distributed Searching
• wholesale distributed searching, popular at
  the time, is attractive in theory but
  troublesome in practice
  – Davis & Lagoze, JASIS 51(3), pp. 273-80
  – Powell & French, Proc 5th ACM DL, pp. 264-265
• distributed searching of N nodes still
  viable, but only for small values of N
      • NCSTRL: N > 100; bad
      • NTRS/NIX: N<=20; ok (but could be better)
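One rough way to see why large N hurts: if a query has to wait on every node, and each node answers independently with probability p, the chance that all N answer is p^N. The sketch below is illustrative only. The independence assumption and the 0.99 per-node figure are not taken from the cited papers.

```python
def all_nodes_respond(p: float, n: int) -> float:
    """Probability that all n nodes answer, assuming each answers
    independently with probability p (an illustrative assumption)."""
    return p ** n


# Compare a small federation (NTRS/NIX scale) with a large one (NCSTRL scale).
for n in (20, 100):
    print(n, round(all_nodes_respond(0.99, n), 3))
```

Under these assumptions a 20-node search usually completes, while a 100-node search loses at least one node more often than not.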
                 resource – item – record
• resource: what the metadata is about (in the figure, “David”)
• item = identifier: all available metadata about the resource
    – set-membership is an item-level property
• records: the item's metadata in one format (Dublin Core metadata, MARC metadata, SPECTRUM metadata)
    – record = identifier + metadata format + datestamp
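The item/record distinction can be sketched as a small data structure. This is a minimal illustration, not part of the protocol: the class name, field names, and the `oai:example.org:...` identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    identifier: str       # unique identifier of the item
    metadata_prefix: str  # metadata format, e.g. "oai_dc"
    datestamp: str        # date the record was created or last changed
    metadata: str = ""    # the metadata itself (XML in practice)


def records_for_item(records, identifier):
    """All records (one per metadata format) that belong to one item."""
    return [r for r in records if r.identifier == identifier]


records = [
    Record("oai:example.org:1", "oai_dc", "2003-06-23"),
    Record("oai:example.org:1", "marc21", "2003-06-23"),
    Record("oai:example.org:2", "oai_dc", "2003-06-24"),
]
print([r.metadata_prefix for r in records_for_item(records, "oai:example.org:1")])
```

One item (identifier) yields several records, one for each metadata format in which it is available.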
             Overview of OAI-PMH Verbs
Verb                    Function
metadata about the repository:
  Identify              description of repository
  ListMetadataFormats   metadata formats supported by repository
  ListSets              sets defined by repository
harvesting verbs:
  ListIdentifiers       OAI unique ids contained in repository
  ListRecords           listing of N records
  GetRecord             listing of a single record

most verbs take arguments: dates, sets, ids, metadata formats
and resumption token (for flow control)
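A minimal sketch of how a harvester might issue these verbs and follow resumption tokens. The function names and the injectable `fetch` callable are assumptions made so the example runs without a network. The verb and argument names (`ListIdentifiers`, `metadataPrefix`, `resumptionToken`) are those of OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def request_url(base_url, verb, **args):
    """Build an OAI-PMH request URL from a verb and its arguments."""
    return base_url + "?" + urlencode({"verb": verb, **args})


def list_identifiers(base_url, fetch, metadata_prefix="oai_dc"):
    """Yield every identifier, following resumption tokens until exhausted.

    `fetch` is any callable that maps a URL to the response XML text.
    """
    url = request_url(base_url, "ListIdentifiers", metadataPrefix=metadata_prefix)
    while url:
        root = ET.fromstring(fetch(url))
        for header in root.iter(OAI_NS + "header"):
            yield header.findtext(OAI_NS + "identifier")
        token = root.findtext(f".//{OAI_NS}resumptionToken")
        # An empty or absent token marks the final chunk of the list.
        if token:
            url = request_url(base_url, "ListIdentifiers", resumptionToken=token)
        else:
            url = None
```

In real use `fetch` could wrap `urllib.request.urlopen`. Passing it in keeps the flow-control loop separate from the transport.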
   Data Providers / Service Providers
[Figure: data providers (repositories) and service providers (harvesters)]
                 Aggregators
• aggregators allow for:
    – scalability for OAI-PMH
    – load balancing
    – community building
    – discovery
[Figure: data providers (repositories), an aggregator, and service providers (harvesters)]
                      Aggregators
• Frequently interchangeable terms:
   – aggregators: likely to be community / institutionally
     focused
   – caches: store a copy, less likely to be community-oriented
   – proxies: less likely to store a copy, may gateway
     between OAI-PMH and other protocols
       • Dienst / OAI Gateway; Harrison, Nelson, Zubair, JCDL 03
• To learn more about aggregators, caches &
  proxies:
   – http://www.openarchives.org/OAI/2.0/guidelines-aggregator.htm
   – http://www.cs.odu.edu/~mln/jcdl03/
         Example Aggregators
• Arc - http://arc.cs.odu.edu/
   – first described “hierarchical harvesting” in D-Lib Magazine, 7(4) 2001
      • http://www.dlib.org/dlib/april01/liu/04liu.html
• Celestial - http://celestial.eprints.org/
   – among other services, it provides a history of
     harvests (successful vs. errors)
      • http://celestial.eprints.org/cgi-bin/status
                 OAI-PMH 2.0 Registration
• 75 repositories registered
• ??? unregistered repositories; unregistered because:
    – testing / development
    – not for public harvesting
    – public, but “low-profile”
    – never got around to it…
    – ???
• Data Providers: http://www.openarchives.org/Register/BrowseSites.pl
• Service Providers: http://www.openarchives.org/service/listproviders.html
• DP:SP ~= 5:1
          Registration is Nice…
           …But Not Required
• OAI-PMH is (becoming) the “http” for digital
  libraries
   – there is no central registry of http servers
      • remember the NCSA “What’s New” page? (ca. 1994)
• There will never be “registration support” in OAI-PMH
   – registries are a type of service provider, built on top of
     OAI-PMH
   – registration will be an integral part of community
     building
   – friends…
                      <friends>
• A light weight, optional, DP-centric
  method to communicate the existence of
  “others”
http://techreports.larc.nasa.gov/ltrs/oai2.0/?verb=Identify

..
<description>
 <friends ..namespace stuff..>
   <baseURL>http://naca.larc.nasa.gov/oai2.0</baseURL>
   <baseURL>http://ntrs.nasa.gov/oai2.0</baseURL>
   <baseURL>http://horus.riacs.edu/perl/oai/</baseURL>
   <baseURL>http://ston.jsc.nasa.gov/collections/TRS/oai/</baseURL>
 </friends>
</description>
..
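A minimal sketch of pulling the `<friends>` baseURLs out of an Identify response. Because the slide elides the namespace declarations, the code matches on local element names only. The function names and the sample document (including its `urn:example:friends` namespace) are placeholders, not the real schema.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix from a tag, if present."""
    return tag.rsplit("}", 1)[-1]


def friends_of(identify_xml):
    """Return the baseURLs listed in any <friends> container."""
    urls = []
    for element in ET.fromstring(identify_xml).iter():
        if local_name(element.tag) == "friends":
            urls += [child.text.strip() for child in element
                     if local_name(child.tag) == "baseURL" and child.text]
    return urls


# Placeholder response; the namespace is hypothetical.
SAMPLE = """<Identify><description>
  <friends xmlns="urn:example:friends">
    <baseURL>http://naca.larc.nasa.gov/oai2.0</baseURL>
    <baseURL>http://ntrs.nasa.gov/oai2.0</baseURL>
  </friends>
</description></Identify>"""
print(friends_of(SAMPLE))
```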
               NASA <friends> example
[Figure: a harvester sends Identify to http://techreports.larc.nasa.gov/ltrs/oai2.0/ and receives <friends>…</friends>, which leads it to:]
• http://naca.larc.nasa.gov/oai2.0/
• http://ston.jsc.nasa.gov/collections/TRS/oai/
• http://ntrs.nasa.gov/oai2.0/
• http://horus.riacs.edu/perl/oai/
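Following `<friends>` links from one repository to the next amounts to a graph traversal. Below is a minimal breadth-first sketch. The `friends_lookup` callable stands in for "send Identify and read the `<friends>` container", and the example graph assumes, purely for illustration, that the four friends list no friends of their own.

```python
from collections import deque


def discover(start_url, friends_lookup):
    """Breadth-first walk over <friends> links, visiting each baseURL once."""
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for friend in friends_lookup(url):
            if friend not in seen:  # guards against cycles (A lists B, B lists A)
                seen.add(friend)
                queue.append(friend)
    return order


LTRS = "http://techreports.larc.nasa.gov/ltrs/oai2.0/"
graph = {
    LTRS: [
        "http://naca.larc.nasa.gov/oai2.0",
        "http://ntrs.nasa.gov/oai2.0",
        "http://horus.riacs.edu/perl/oai/",
        "http://ston.jsc.nasa.gov/collections/TRS/oai/",
    ],
}
found = discover(LTRS, lambda url: graph.get(url, []))
print(len(found))
```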
                                  Use of <friends>




Slide from S. Warner, Cornell University
   Langley Technical Report Server
• publicly available
    – began as an anonymous ftp server in 1992; http access in 1993
    – model for other technical report servers at other NASA centers
        • details in NASA TM-109162
• mostly LaTeX, MS Word, other systems
    – some scanned reports

http://techreports.larc.nasa.gov/ltrs/
http://techreports.larc.nasa.gov/ltrs/oai2.0/
     NACA Technical Report Server
• publicly available
    – began in 1996
    – details in NASA TM-1999-209127
• scanned reports from 1917-1958
    – NACA = predecessor to NASA
• contents mirrored with the MAGiC project
    – a UK-based grey-literature preservation project
    – OAI-PMH used to mirror contents

http://naca.larc.nasa.gov/
http://naca.larc.nasa.gov/oai2.0/
NACA Report 1345

as seen through its native DL
http://naca.larc.nasa.gov/
NACA Report 1345

as seen through MAGiC
http://www.magic.ac.uk/
NACA Report 1345

as seen through Scirus
(Elsevier)
http://www.scirus.com/
NACA Report 1345

as seen through OAIster

http://oaister.umdl.umich.edu/
NACA Report 1345

as seen through my.OAI
(FS Consulting)
http://www.myoai.com/
                NTRS OAI Architecture
[Figure: a user searches NTRS (e.g., for “cfd applications”); NTRS holds a local copy of metadata harvested from the nodes LTRS, ATRS, GTRS, ..., CASITRS]
• all searching, browsing, etc. performed on the local copy of metadata at NTRS
• metadata harvested offline, through OAI interface
• each node independently maintained
• individual nodes can still support direct user interaction
• content (reports) remain archived at the local sites
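The key point of this architecture is that a user query never touches the remote nodes; it runs against metadata that was harvested earlier. A minimal sketch, assuming a hypothetical in-memory index (the slides say only that NTRS keeps a local copy of the metadata):

```python
def search(index, query):
    """Return identifiers whose harvested metadata contains every query term.

    `index` maps identifier -> metadata text, filled offline by a harvester.
    """
    terms = query.lower().split()
    return [identifier for identifier, text in index.items()
            if all(term in text.lower() for term in terms)]


# Hypothetical harvested metadata; identifiers and titles are made up.
index = {
    "oai:example:ltrs-1": "CFD applications in wing design",
    "oai:example:atrs-7": "Wind tunnel calibration procedures",
}
print(search(index, "cfd applications"))
```

A result points back to the node that holds the report, so the content itself stays archived at the local site.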
NASA Technical Report Server
• publicly available
• replacement for the former distributed searching version of NTRS
    – MySQL
    – Va Tech harvester
    – modified “bucket”
    – details in Nelson, Rocker, Harrison, Library Hi-Tech, 21(2) (July 2003)
• a service provider & aggregator
    – same OAI-PMH baseURL as used for interactive searching

http://ntrs.nasa.gov/
NASA Technical Report Server
• advanced, fielded search
• explicit query routing
    – 12 NASA repositories
    – 4 non-NASA repositories
        • turned “off” by default
• > 0.5M records
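Explicit query routing with non-NASA repositories “off” by default can be sketched as below. The class, the function, and the non-NASA repository name are hypothetical; only the default behaviour comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repository:
    name: str
    is_nasa: bool


def route(repositories, selected=None):
    """Choose the repositories a query is sent to.

    With no explicit selection only NASA repositories are searched;
    otherwise exactly the repositories the user named are used.
    """
    if selected is not None:
        return [r for r in repositories if r.name in selected]
    return [r for r in repositories if r.is_nasa]


repositories = [
    Repository("LTRS", True),
    Repository("ATRS", True),
    Repository("ExampleNonNASA", False),  # hypothetical non-NASA repository
]
print([r.name for r in route(repositories)])
print([r.name for r in route(repositories, {"LTRS", "ExampleNonNASA"})])
```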
 NASA DLs in the Larger STI Realm
[Figure: NTRS (above LTRS, ATRS, …, CASITRS) linked to Publishers, Universities, DOD, International, ..., DOE; this could be a fully connected graph]
• NTRS could also be a data provider from the point of view of other DLs; allowing the harvesting of NASA report metadata.
• NTRS could also harvest metadata from other DLs, and provide access to non-NASA content.
• We hope to influence the direction of the science.gov effort to use OAI-PMH
   OSTI Energy Citations Database
• OAI-PMH support just recently added (Feb 2003)
    – not yet officially announced or registered
    – 20k records, 8k full-text
• other OSTI collections planned

http://www.osti.gov/energycitations/
   Technical Report Interchange
• Goal: share technical reports between 4 US
  government labs without creating new digital
  libraries for users to learn!
   –   NASA Langley Research Center
   –   Air Force Research Laboratory
   –   Los Alamos National Laboratory (DOE)
   –   Sandia National Laboratories (DOE)
• Solution: use cooperating OAI-PMH caches at
  each site to
   – export local contents
   – ingest remote contents
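A minimal sketch of two cooperating caches that export local contents and ingest remote ones incrementally. The class and method names are hypothetical, and ISO 8601 datestamps are assumed so that string comparison orders them correctly.

```python
class TriCache:
    """One site's cache: exports local records, ingests remote ones."""

    def __init__(self, site):
        self.site = site
        self.local = {}         # identifier -> datestamp
        self.remote = {}        # identifier -> (origin site, datestamp)
        self.last_harvest = {}  # origin site -> newest datestamp ingested

    def publish(self, identifier, datestamp):
        self.local[identifier] = datestamp

    def export(self, since=""):
        """Local records strictly newer than `since`.

        Note: OAI-PMH's own `from` argument is inclusive; strict comparison
        is used here so that a repeated ingest returns nothing.
        """
        return {i: d for i, d in self.local.items() if d > since}

    def ingest(self, other):
        """Pull records from another cache that are new since the last harvest."""
        new = other.export(self.last_harvest.get(other.site, ""))
        for identifier, datestamp in new.items():
            self.remote[identifier] = (other.site, datestamp)
        if new:
            self.last_harvest[other.site] = max(new.values())
        return len(new)


larc, lanl = TriCache("LaRC"), TriCache("LANL")
larc.publish("oai:example:larc-1", "2003-06-01")
print(lanl.ingest(larc), lanl.ingest(larc))
```

Users at each site keep searching their own library system; only the metadata moves between sites.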
                     TRI Production System - Status
[Figure: TRI systems at LaRC, LANL, Sandia, and AFRL, plus an ODU TRI system (listener); arrows mark records coming in from other TRI systems and records going out to other TRI systems; legend: In Production / Proposed]

Slide from M. Zubair, ODU
                   Mappings in TRI

Laboratory   Native Metadata Format   Native Source Commercial DL System   Native Destination Commercial DL System
LaRC         MARC                     BASIS+                               (TBD)
LANL         MARC + local fields      Geac ADVANCE                         Science Server
AFRL         COSATI                   Sirsi STILAS                         Sirsi STILAS
Sandia       MARC                     Horizon                              Verity

Details in Liu, et al. ECDL 2002; the above table also taken from the same paper
                                  A Single TRI Module
[Figure: one TRI module, made up of a Local DB, OAI Harvester Control, Scheduler, and Local DL Manager, with an input directory and an output directory]
• OAI Harvester Control: connect to remote DL by OAI protocol
• read new data from remote DL; write new data published in local DL
• remote data in DC format; local data in DC format
• Local DL Manager:
    – write remote data to local format (input directory)
    – read local data and convert to DC format (output directory)
• legend: common modules in all three DLs; specific module for each DL

Slide from M. Zubair, ODU
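The “convert to DC format” step can be sketched as a field crosswalk. The four MARC tags below follow the widely used MARC-to-Dublin-Core crosswalk (245 title, 100 creator, 650 subject, 520 description). The dictionary-based record layout and function name are assumptions for illustration, not the TRI implementation.

```python
# Partial crosswalk from MARC field tags to unqualified Dublin Core elements.
MARC_TO_DC = {
    "245": "title",
    "100": "creator",
    "650": "subject",
    "520": "description",
}


def marc_to_dc(marc_fields):
    """Convert {MARC tag: [values]} to {DC element: [values]}.

    Tags without a mapping (e.g. site-specific local fields) are dropped.
    """
    dc = {}
    for tag, values in marc_fields.items():
        element = MARC_TO_DC.get(tag)
        if element:
            dc.setdefault(element, []).extend(values)
    return dc


# Hypothetical record; "999" stands in for a local field.
record = {"245": ["A Sample Report"], "100": ["Doe, J."], "999": ["local"]}
print(marc_to_dc(record))
```

A site-specific module of this kind is the only part that has to change from one laboratory's library system to another.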
The Future: Community Building

• Ultimately, protocols and metadata formats are not what
  makes a difference
• Rather, the critical mass afforded by a common set of
  utilities (cf. http, Dublin Core, XML)
• The best current example: The Open Language Archives
  Community
   – http://www.language-archives.org/
• OAI-PMH provides the basis for communication between
  strangers, but allows even richer communication between
  friends
             STI Communities
• Government produced/sponsored STI
     • http://ntrs.nasa.gov/
     • http://www.osti.gov/energycitations/
     • http://dlib.cs.odu.edu/tri/
• Academia
  – self-archiving vs. institutional archives
     • http://www.soros.org/openaccess/
     • http://www.ecs.soton.ac.uk/~harnad/Tp/resolution.htm
• Commercial publishers
  – e.g. BioMed Central
     • http://www.biomedcentral.com/

				