      ALA/CLA Annual Meeting
      22 June 2003
      Toronto, CA




      Using OAI-PMH to Aggregate Metadata
      Describing Cultural Heritage Resources


Timothy W. Cole (t-cole3@uiuc.edu)
University of Illinois at Urbana-Champaign

http://dli.grainger.uiuc.edu/Publications/TWCole/ALA2003OAI/
    Order of Presentation

   Perspectives on OAI-PMH
   Illinois OAI metadata harvesting project
        Goals & objectives
        Findings regarding metadata
        Findings regarding search & discovery
   New OAI projects at Illinois
        IMLS digital collections & content
        CIC OAI metadata harvesting project
    22 June 2003       ALA 2003 / OAI-PMH   Tim Cole (t-cole3@uiuc.edu)
     OAI Protocol
     for Metadata Harvesting
Harvesting approach
  to interoperability
  at metadata level
Divides world into
  Metadata Providers
  & Service Providers
Builds on HTTP,
  XML, & Dublin Core

                 http://www.openarchives.org/
    OAI Antecedents
   Call to other E-Print archives (July 1999)
    Paul Ginsparg, Rick Luce, & Herbert Van de Sompel:
        "…mobilize core group to work towards achieving a
        universal service for author self-archived scholarly literature."
   Santa Fe Mtgs. (Oct. 1999 & June 2000)
   OAI-PMH version history:
       First Alpha Release, Sept. 2000
       1.0 (Beta) Release January 2001
       1.1 (Beta 2) Release July 2001
       2.0 (Production) Release June 2002


    Original OAI Organization
   OAI Executive:
       Carl Lagoze & Herbert Van de Sompel
   OAI Steering Committee:
        Co-Chairs: Dan Greenstein, Cliff Lynch
   OAI Technical Committee
   Funded by NSF, DLF & CNI
   Seeks to be user community driven

OAI-PMH as a tool
   All about moving metadata around
   Designed to be a building block, useable by
    many different communities
   Can facilitate (in some cases enable)
    services & functions
   Assumes widely distributed content, but
    centralized indexing(!) & services
   Build once, use for many applications
   Focus of OAI is interoperability
    Harvesting vs. Broadcast
   Competing approaches to interoperability
        Distributed/Broadcast searching: search and
         discovery over remote services and data

        Harvesting is when data/metadata is
         transferred from the remote source to the
         destination where search & discovery
         services are located (e.g. Union catalogs)

   OAI-PMH is a harvesting protocol
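
A toy contrast of the two approaches, as a sketch only. The two "providers" are hypothetical in-memory lists rather than real services, and the function names are made up for illustration. It shows why harvested metadata can lag behind the source while broadcast searching always sees current data.

```python
# Toy contrast between broadcast searching and harvesting.
# The providers below are hypothetical in-memory lists, not real services.
providers = {
    "museum": ["woven coverlet", "silver coin"],
    "library": ["sheet music", "cotton coverlet"],
}

def broadcast_search(term):
    """Query every provider at search time; results reflect current data."""
    return sorted(t for titles in providers.values() for t in titles if term in t)

def harvest():
    """Copy all metadata into one central index (a snapshot in time)."""
    return [t for titles in providers.values() for t in titles]

def harvested_search(index, term):
    """Search only the central copy; no provider is contacted."""
    return sorted(t for t in index if term in t)

index = harvest()
providers["library"].append("wool coverlet")  # added after the harvest

live = broadcast_search("coverlet")           # sees the new record
central = harvested_search(index, "coverlet")  # does not, until re-harvested
```

The central index answers from one place, at the cost of needing periodic re-harvesting.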
  As Compared to Z39.50
                        Z39.50              OAI
Content (objects)       Distributed         Distributed
World view              Bibliographic       Bibliographic
Object presentation     Data provider       Data provider
Searching is            Distributed         Centralized
Search done by          Data provider       Service provider
Metadata searched is    Up to date          Stale
Semantic mapping        When searching      Metadata delivery

Metadata vs. Resources

   Resource refers to information objects or
    digital representations of information objects
   Metadata item is a collection of properties
    about a resource (e.g. title, author, etc.)
   Metadata record is a metadata item expressed
    in a specific syntax according to an XSD
   OAI focuses on metadata, with the implicit
    understanding that metadata contains useful
    links to the source information object(s)
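
To make the distinction concrete, here is a minimal sketch of a metadata record in the oai_dc format and how a harvester might pull out the properties and the link back to the resource. The record content and the example.org URL are made up for illustration; only the two namespace URIs come from the OAI and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

# Made-up record for illustration; the identifier URL is a placeholder.
RECORD = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>American woven coverlet</dc:title>
  <dc:type>physical object</dc:type>
  <dc:identifier>http://example.org/objects/coverlet-1</dc:identifier>
</oai_dc:dc>"""

def dc_values(xml_text, element):
    """Return every value of one Dublin Core element in a record."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.iter(DC + element)]

title = dc_values(RECORD, "title")
# The link to the resource itself is carried inside the metadata.
links = [v for v in dc_values(RECORD, "identifier") if v.startswith("http")]
```

The harvester never touches the coverlet image or object itself; it only learns where to send the user.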

    When to use OAI-PMH
   Metadata is sufficient for services desired
   Normalization, deduplication, or metadata
    augmentation is desired
   Content is widely distributed across small,
    non-Z39.50 enabled repositories
        OAI-PMH is more lightweight than Z39.50

   Portals can use BOTH Z39.50 & OAI-PMH

    What OAI-PMH Is Not

    Not search & discovery on its own

    Not a database management system

    Not a single metadata schema

    Not OAIS

  How OAI Works
OAI "verbs": Identify, ListMetadataFormats, ListSets,
  ListIdentifiers, ListRecords, GetRecord

  Service Provider (harvester)  --- HTTP request (OAI verb) --->   Metadata Provider (repository)
  Service Provider (harvester)  <-- HTTP response (valid XML) ---  Metadata Provider (repository)
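
The request/response exchange can be sketched from the harvester's side as follows. This is an illustration under assumptions, not a production harvester: the base URL is a placeholder, the function names are hypothetical, and a canned response stands in for a live repository so the example runs without a network. The verb names, the `metadataPrefix` and `resumptionToken` arguments, and the response namespace are those of OAI-PMH 2.0.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "http://example.org/oai"  # placeholder base URL, not a real repository

def request_url(verb, **args):
    """Build an OAI-PMH request: the base URL plus the verb and its arguments."""
    return BASE_URL + "?" + urlencode({"verb": verb, **args})

def harvest_identifiers(fetch):
    """Collect identifiers, following resumptionTokens until none remains.

    `fetch` is any callable mapping a URL to response XML text, so a real
    HTTP client or a canned stand-in (as below) can be plugged in.
    """
    url = request_url("ListIdentifiers", metadataPrefix="oai_dc")
    identifiers = []
    while url:
        root = ET.fromstring(fetch(url))
        identifiers += [e.text for e in root.iter(OAI + "identifier")]
        token = root.find(".//" + OAI + "resumptionToken")
        if token is not None and token.text:
            # A resumed request carries only the verb and the token.
            url = request_url("ListIdentifiers", resumptionToken=token.text)
        else:
            url = None
    return identifiers

def page(identifier, token):
    """Canned ListIdentifiers response used in place of a live repository."""
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
        f"<header><identifier>{identifier}</identifier></header>"
        f"<resumptionToken>{token}</resumptionToken>"
        "</ListIdentifiers></OAI-PMH>"
    )

CANNED = {
    request_url("ListIdentifiers", metadataPrefix="oai_dc"): page("oai:example.org:1", "t1"),
    request_url("ListIdentifiers", resumptionToken="t1"): page("oai:example.org:2", ""),
}

harvested = harvest_identifiers(CANNED.__getitem__)
```

Passing `fetch` in as a parameter keeps the paging logic separate from the HTTP client, which also makes it easy to test.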

 OAI Provider Architectures
   Descriptive metadata sources: HTML <meta>, XML, DBMS
   OAI administrative metadata: e.g., IDs, datestamps, sets, formats
   Both feed the OAI application (CGI, ASP, PHP, etc.)
   Webserver (HTTP) exposes the OAI application to OAI harvesters
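
On the provider side, the OAI application's core job is to wrap locally stored descriptive metadata in an OAI record with its administrative header. A minimal sketch, assuming a hypothetical database row represented as a dict; the row layout, identifier, and set name are invented, while the element names and namespaces follow OAI-PMH 2.0 and oai_dc.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

def q(ns, tag):
    """Qualified tag name in ElementTree's {namespace}tag notation."""
    return "{%s}%s" % (ns, tag)

def build_record(row):
    """Wrap one row of local metadata in an OAI-PMH <record> element."""
    record = ET.Element(q(OAI_NS, "record"))
    # Administrative metadata: identifier, datestamp, set membership.
    header = ET.SubElement(record, q(OAI_NS, "header"))
    ET.SubElement(header, q(OAI_NS, "identifier")).text = row["id"]
    ET.SubElement(header, q(OAI_NS, "datestamp")).text = row["datestamp"]
    ET.SubElement(header, q(OAI_NS, "setSpec")).text = row["set"]
    # Descriptive metadata: simple Dublin Core elements.
    metadata = ET.SubElement(record, q(OAI_NS, "metadata"))
    dc = ET.SubElement(metadata, q(OAI_DC_NS, "dc"))
    for name, value in row["dc"].items():
        ET.SubElement(dc, q(DC_NS, name)).text = value
    return record

# Hypothetical row, as it might come back from a DBMS query.
row = {
    "id": "oai:example.org:42",
    "datestamp": "2003-06-22",
    "set": "textiles",
    "dc": {"title": "American woven coverlet", "type": "physical object"},
}
xml_text = ET.tostring(build_record(row), encoding="unicode")
```

A full application would also wrap this in the `OAI-PMH` response envelope and handle each verb; that part is omitted here.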


A few projects using OAI-PMH
   Basic building block of the
    National Science Digital Library

   Large-scale implementations in
    E-Prints, OLAC, NDLTD, …

   Built into ENCompass,
    ContentDM, Michigan’s DLXS,
    DSpace, and other products

   Open Archives Forum in Europe;
    will be part of federation
    activities in the UK and EU

     Univ. of Illinois OAI Metadata
     Harvesting Project

   Funded by Andrew W. Mellon Foundation
    (July 2001 – May 2003)
   Primary objectives:
       Develop & make available OAI harvesting tools
       Build search services for aggregated metadata in
        the domain of cultural heritage
       Examine metadata aggregation issues, including
        use of EAD in OAI context
       Investigate utility of aggregated metadata,
        including preliminary testing with end-users

    Type of resources

   39 data providers
       academic libraries
       museums / cultural orgs
       digital libraries
       public library
   1.1 million original DC records
       + 1.5 million derived from EAD
   Resource types (pie chart): Text & Sheet Music 50%, Images 25%,
    Artifact 20%, Other 5%

       Variations in DC element usage
   Records containing subject & description elements
                                                     SUBJECT    DESCRIPTION
      Digital libraries                              78%        36%
        (10 total, 122,719 records)
      Museums, hist. societies, etc.                 93%        93%
        (6 total, 255,800 records)
      Academic libraries                             15%        13%
        (7 total, 235,294 records)

   Many different controlled and local vocabularies in use
   Granularity: a record may describe a collection
    of coins — or one coin
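
Percentages like those in the table can be computed over harvested records with a short function. A sketch only: the record structure (a dict mapping element name to a list of values) and the four sample records are assumptions for illustration, not the project's data or tooling.

```python
def element_coverage(records, element):
    """Percentage of records with at least one non-empty value for `element`."""
    if not records:
        return 0.0
    having = sum(
        1 for r in records if any(v.strip() for v in r.get(element, []))
    )
    return 100.0 * having / len(records)

# Made-up sample: each record maps a DC element name to its list of values.
records = [
    {"title": ["Coverlet"], "subject": ["Textiles"], "description": ["Blue wool"]},
    {"title": ["Coin"], "subject": ["Numismatics"]},
    {"title": ["Letter"], "description": [""]},  # present but empty
    {"title": ["Map"]},
]

subject_pct = element_coverage(records, "subject")
description_pct = element_coverage(records, "description")
```

Counting empty elements as missing matters in practice, since some providers emit an element with no content.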

     Excerpt of a metadata record
     describing a cotton coverlet
Description: Digital image of a single-sized cotton coverlet for a bed with embroidered
   butterfly design. Handmade by Anna F. Ginsberg Hayutin.
Source: Materials: cotton and embroidery floss. Dimensions: 71 in. x 86 in. Markings:
   top right hand corner has 1 1/2 in. x 1/2 in. label cut outs at upper left and right
   hand side for head board; fabric is woven in a variation of a rib weave; color each of
   yellow and gray; hand-embroidered cotton butterflies and flowers from two shades
   of each color of embroidery floss - blue, pink, green and purple and single top 20 in.
   bordered with blue and black cotton embroidery thread; stitches used for
   embroidery: running stitch, chain stitch, French knot and back stitches; selvage
   edges left unfinished; lower edges turned under and finished with large gray running
   stitches made with embroidery floss.
Format: Epson Expression 836 XL Scanner with Adobe Photoshop version 5.5; 300 dpi;
   21-53K bytes. Available via the World Wide Web.
Coverage: —
Date Created: 2001-09-19 09:45:18; Updated: 20011107162451; Created: 2001-04-05;
   Created: 1912-1920?
Type: Image


Excerpt of a metadata record
describing "American woven coverlet"
Description: Materials: Textile--Multi, Pigment—Dye; Manufacturing Process:
   Weaving--Hand, Spinning, Dyeing, Hand-loomed blue wool and white linen
   coverlet, worked in overshot weave in plain geometric variant of a checkerboard
   pattern.Coverlet is constructed from finely spun, indigo-dyed wool and undyed
   linen, woven with considerable skill. Although the pattern is simpler, the overall
   craftsmanship is higher than 1934.01.0094A. - D. Schrishuhn, 11/19/99 This
   coverlet is an example of early "overshot" weaving construction, probably dating
   to the 1820's and is not attributable to any particular weaver. -- Georgette
   Meredith, 10/9/1973

Source: —
Format: 228 x 169 x 1.2 cm (1,629 g)
Coverage: Euro-American; America, North; United States; Indiana? Illinois?
Date: Early 19th c. CE
Type: cultural; physical object; original



    Implications
   Service providers
       Automatically normalize metadata encoding
        where possible (e.g., dates)
       Normalize for and co-locate by type / format
        where possible


   Metadata providers
       Create metadata for interoperability
       Consider more expressive schema – e.g.,
        Qualified DC, MARC
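
As an example of automatic normalization on the service-provider side, the date values from the cotton coverlet record can be reduced to ISO 8601 dates where the format is recognizable. A minimal sketch: the list of accepted formats is an assumption chosen to fit that one record, and anything unrecognized is left for review rather than guessed.

```python
from datetime import datetime

# Formats seen in the cotton coverlet record; a real service would need more.
FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y%m%d%H%M%S", "%Y-%m-%d"]

def normalize_date(value):
    """Return an ISO 8601 date (YYYY-MM-DD), or None if the format is unknown."""
    value = value.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # e.g. "1912-1920?" is a range with uncertainty, not one date

raw = ["2001-09-19 09:45:18", "20011107162451", "2001-04-05", "1912-1920?"]
normalized = [normalize_date(v) for v in raw]
```

Note that normalizing the encoding does not resolve the deeper ambiguity in that record, where digitization dates and the object's creation date share one element.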
Original interface
   Portal had two search pages: simple (keyword)
    and advanced.

Pilot study with student teachers

   23 users in honors-level C&I class
   Assignment: Use the site in preparing a lesson
    plan (high school social studies)
                     __________
   Introduced to "aggregated metadata" concept
   Focus group interviews conducted
   Students’ papers examined
   Transaction logs analyzed


    Results of initial user testing
1. Users expected all links to point to digital objects
       Some records pointed to finding aids
       Some records pointed to collection’s web site
       Some records described analog objects

2. Users unable to make use of search results
       Simple searches produced 1000s of unranked results
       Advanced search (with limits) rarely used

3. Distinction between portal and data providers
  unimportant to users
      What does "online access" mean?

   To librarian & curator
   To student teacher


     Response to test results

   EAD-derived records segregated

   Analog only collections excluded

   Categories of resource types reduced to 3:
        Images and Video
        Text, Sheet Music, and Websites
        Museums and Archival Collections
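
Collapsing varied type values into the three categories could be done with a lookup like the one below. This is a sketch under assumptions: the keyword sets are hypothetical and not the project's actual mapping; only the three category names come from the slide.

```python
# Hypothetical keyword sets; only the three category names are from the slide.
CATEGORIES = {
    "Images and Video": {"image", "stillimage", "movingimage", "video"},
    "Text, Sheet Music, and Websites": {"text", "sheet music", "website"},
    "Museums and Archival Collections": {"physical object", "collection", "finding aid"},
}

def categorize(type_values):
    """Return the first category matching any of a record's type values."""
    for value in type_values:
        key = value.strip().lower()
        for category, keywords in CATEGORIES.items():
            if key in keywords:
                return category
    return None  # unmapped types are left for manual review

# Type values taken from the two coverlet records shown earlier.
cotton = categorize(["Image"])
woven = categorize("cultural; physical object; original".split(";"))
```

The two coverlet records, which describe similar objects, land in different categories because one describes the digital image and the other the physical artifact.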



      Revised interface

   Simple keyword &
    advanced search
    put on one page

   Clarify "online
    access"

   Natural language for
    Boolean operators

Revised search results

                              Link goes to finding
                               aid or collection
                               page? "Learn more."

                              Link displays object?
                               "View item."

                              Subj/Desc expanded


IMLS Digital Collections & Content

   Build a registry of all National Leadership Grant
    collections with digital content.
   Assist and guide NLG projects in making item-
    level metadata sharable using OAI.
   Build a repository and search & discovery tools
    for integrated access to the content of NLG
    collections (unique metadata schema?).
   Research best practices for sharing metadata
    about diverse digital content and for supporting
    the interests of diverse user communities.
http://imlsdcc.grainger.uiuc.edu/
    CIC OAI metadata harvesting

   Univ. of Illinois at UC will host an OAI-PMH
    metadata harvesting service for 10 CIC libraries
   Project Goals (3 year experimentation phase)
       Improve access to selected resources at CIC libraries
       Advertise these resources (internally & externally)
       Prepare member institutions for future grant-
        mandated OAI-based resource sharing
       Serve as a useful testbed for experimentation with
        OAI-PMH, development of metadata best practices,
        usability and user needs testing, etc.

     Using OAI-PMH to Aggregate Metadata
     Describing Cultural Heritage Resources

http://dli.grainger.uiuc.edu/Publications/TWCole/ALA2003OAI/



              Timothy W. Cole (t-cole3@uiuc.edu)
            University of Illinois at Urbana-Champaign

								