					  Findings from the Mellon
Metadata Harvesting Initiative

               Martin Halbert,
     Joanne Kaczmarek, and Kat Hagedorn
            Monday 18-Aug-2003
                ECDL 2003
                            Overview

• Highlights of the Mellon projects
• Findings regarding metadata harvesting
• Questions about the context of metadata and
  metadata harvesting
• Next steps, subsequent research projects




  ECDL 2003 – Trondheim, Norway    Mellon Metadata Initiative – Slide 2
Highlights of the Projects
    Andrew W. Mellon Foundation

• Mellon is a major U.S. private philanthropic
  foundation that has been involved with the OAI-PMH
  from the beginning
• Sought to foster projects exploring how the OAI-PMH
  could be used by libraries and other organizations
  supporting research to make metadata concerning
  scholarly collections more visible to users
• Funded seven projects in 2001 with a total of US $1.5M



                        Seven Projects

1.     University of Illinois at Urbana-Champaign
2.     The University of Michigan (OAIster)
3.     Emory University (MetaArchive)
4.     SOLINET / ASERL (AmericanSouth)
5.     The Research Libraries Group (RLG)
6.     University of Virginia
7.     (Woodrow Wilson International Center for Scholars
       at the Smithsonian)


              Highlights of Projects

• OAIster and UIUC Repository harvested millions of
  records and developed sophisticated search tools
• Emory and SOLINET MetaScholar projects harvested
  focused collections, enhanced existing OSS
  harvesting tools, formed teams of scholars and
  librarians to study the process and context of
  metadata harvesting for research portals
• Other projects examined internal uses of OAI-PMH
  for cultural scholarship


Findings Concerning
Metadata Harvesting
       Metadata Harvesting Findings:
       Slow Adoption of the OAI-PMH
• Most institutions with cultural materials collections
  had not yet implemented the protocol as of the
  2002-2003 period
• This is due to many reasons: lack of institutional
  priority, insufficient technical staff, little organizational
  understanding of the benefits of the protocol
• However, both Emory and Illinois found that
  centralized regional centers providing relatively
  modest OAI technical expertise to other libraries were
  very effective in fostering adoption of the protocol

    Metadata Harvesting Findings:
 Problems with Institutional Metadata
• Wide variations in implementation of Unqualified
  Dublin Core (UDC) descriptive metadata elements
• Duplication of records between collaborating
  institutions, difficult to de-dupe due to lack of unique
  inter-institutional identifiers
• Format incompatibilities/collisions, especially
  between Encoded Archival Descriptions (EAD) and
  UDC record perspectives
• Inconsistent access restrictions to content lead to
  confusion among users
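
The de-duplication problem above can be illustrated with a minimal Python sketch. It assumes harvested records have already been reduced to plain dictionaries with "identifier", "title", "creator", and "date" keys; that record layout and the function names are hypothetical, chosen only for illustration, and this is not the method any of the projects used. Because no shared identifier exists across institutions, the sketch falls back on comparing a normalized descriptive key, which will miss many true duplicates and can produce false matches.

```python
import re
import unicodedata
from collections import defaultdict


def match_key(record):
    """Build a crude comparison key from title, creator, and date.

    Case, punctuation, and accents are ignored so that trivially
    different transcriptions of the same description compare equal.
    """
    parts = []
    for field in ("title", "creator", "date"):
        value = record.get(field) or ""
        # Decompose accented characters, then drop the combining marks.
        value = unicodedata.normalize("NFKD", value)
        value = "".join(c for c in value if not unicodedata.combining(c))
        # Lowercase and collapse every run of non-alphanumerics to a space.
        value = re.sub(r"[^a-z0-9]+", " ", value.lower()).strip()
        parts.append(value)
    return "|".join(parts)


def group_duplicates(records):
    """Return lists of identifiers whose records share a match key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["identifier"])
    return [ids for ids in groups.values() if len(ids) > 1]
```

Exact-key matching of this kind is only a first pass; candidate groups would still need review, since two institutions can describe the same item quite differently.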

   Metadata Harvesting Findings:
 Problems with Inst. Metadata (cont.)
• No controlled vocabulary in effect for any UDC field,
  nor would this make sense for most fields
• Although universal systems such as US Library of
  Congress Subject Headings (LCSH) exist, they are
  not granular enough for most repositories
• No uniform mechanism in place to express dates or
  locations (coverage), which can mean many things in
  UDC, and no authority control for creator field
• 96% of institutional repositories using Eprints
  software do not use standard controlled vocabularies
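
To make the date problem concrete, here is a small Python sketch of one possible remediation step: pulling a year range out of free-text UDC date values. The function name and the restriction to years 1000-2099 are assumptions made for this illustration, not something the projects reported doing.

```python
import re

# A four-digit year between 1000 and 2099 that is not part of a longer number.
YEAR = re.compile(r"(?<!\d)(1[0-9]{3}|20[0-9]{2})(?!\d)")


def year_range(raw_date):
    """Return (earliest, latest) years found in a date string, or None."""
    years = [int(y) for y in YEAR.findall(raw_date)]
    if not years:
        return None
    return min(years), max(years)
```

Values such as "2003-08-18", "ca. 1861-1865", and "[1920?]" yield a usable range, while a value like "18th century" returns None and would have to be handled separately or left unnormalized.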

       Metadata Harvesting Findings:
       Need for Metadata Gardening
• The best way to make metadata effective cross-
  institutionally is to coordinate the entire life cycle of
  metadata production
• Uncoordinated harvesting is relatively easy to do, but
  the resulting metadata aggregation then suffers from
  all the problems previously described and needs
  remediation (which may be effectively impossible)
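
As an illustration of why uncoordinated harvesting is easy, the following Python sketch builds OAI-PMH ListRecords requests and reads one response page using only the standard library. The base URL is supplied by the caller and the function names are hypothetical; error handling, retries, and actual network access are left out. The sketch relies on two details of the protocol: `oai_dc` is the metadata prefix every repository must support, and a `resumptionToken` is sent as the only argument besides the verb.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build the URL for the first or a follow-up ListRecords request."""
    if resumption_token:
        # resumptionToken is an exclusive argument: no metadataPrefix here.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (records, next_token) for one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [t.text or "" for t in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": ident, "titles": titles})
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    # An absent or empty token means the list is complete.
    return records, (token.strip() or None)
```

A harvester would fetch `list_records_url(base)`, call `parse_page` on the response, and repeat with the returned token until it is None. Nothing in that loop addresses the quality of the records collected, which is the point of the slide.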




   Metadata Harvesting Findings:
 Need for Metadata Gardening (cont.)
• Coordinated gardening of metadata is the long-
  standing solution to this problem
• Examples include virtually any community of
  information users that has come up with consistent
  standards for the metadata it shares
• The problem is that new information communities are
  still forming, having been enabled by the OAI-PMH
• Mature information communities are mature precisely
  because they have well-understood standards and
  practice in using and sharing information

Findings Concerning
  Metadata Context
                  Metadata Context

• Metadata without a context is useless, much like
  encrypted information without the key
• Metadata is considered useful precisely because it is
  created in particular contexts by particular
  communities
• OAI-PMH requires only the UDC format (other formats are optional)
• UDC is some context, and is (probably?) better than
  nothing, but many groups inaccurately thought that it
  was enough context to build robust discovery
  systems around

          Metadata Context Findings:
             Recovering Context
• Different opinions among the projects over how to recover
  context for aggregated heterogeneous metadata
• OAIster made some efforts to normalize some UDC metadata
  fields after harvesting (UDC type field)
• Illinois developed a mechanism for displaying the original EAD
  context of records disaggregated from finding aid series
  information
• Emory/SOLINET AmericanSouth has a team of nationally
  renowned scholars studying how online scholarship can
  contextualize metadata and vice versa
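
Post-harvest normalization of the UDC type field might look something like the Python sketch below. The slides do not describe OAIster's actual rules, so this is only an assumed approach: free-text values are mapped onto the DCMI Type Vocabulary terms, and the synonym table is a hypothetical example that a real aggregator would need to build from the values it actually encounters.

```python
# Terms of the DCMI Type Vocabulary.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}
_CANONICAL = {t.lower(): t for t in DCMI_TYPES}

# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "article": "Text",
    "book": "Text",
    "photograph": "StillImage",
    "video": "MovingImage",
    "audio": "Sound",
}


def normalize_type(raw):
    """Map a free-text type value to a DCMI Type term, or "Unknown"."""
    # Remove all whitespace and case so "Still Image" matches "StillImage".
    key = "".join(raw.split()).lower()
    if key in _CANONICAL:
        return _CANONICAL[key]
    return SYNONYMS.get(key, "Unknown")
```

Keeping unmatched values as "Unknown" rather than guessing makes it easy to see which source values still need a mapping decision.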



       Metadata Context Findings:
      Harvesters vs. other Discovery
                Systems
• How do we understand harvesters vs. online
  catalogs, Google, and commercial databases?
• How do we articulate the difference to users?
• What information should we aggregate and make
  searchable? Metadata and crawled web content?
  Very different information realms need to be bridged
  through new federated search mechanisms




   Next Steps and
Subsequent Research
                Next Steps for Emory,
                Michigan, and Illinois
• All of these projects learned a great deal during the
  Mellon Metadata Harvesting Initiative that has
  informed their subsequent planning for new services
• All of these projects are in the process of being
  mainstreamed using various strategies
• All of these projects continue to grapple with
  metadata quality and context issues




                 Next Steps: Illinois

• Additional research is being undertaken on the integration of
  EAD and OAI
• Beginning a three-year collaboration with the research libraries
  of other Committee on Institutional Cooperation (CIC)
  institutions to study the potential of OAI-PMH to facilitate
  resource sharing
• NSF grant to develop digital libraries for scientific communities
  in connection with National Science Digital Library (NSDL)
• Institute for Museum and Library Services (IMLS) grant to
  develop an OAI-based registry of IMLS projects



              Next Steps: Michigan

• Working on further techniques for metadata
  remediation
   – De-duplication
   – Normalization of more UDC fields
   – Further tailoring of metadata for research purposes
• Exploring use of OAIster in connection with campus
  courseware initiatives




                 Next Steps: Emory

• Undertaking further modeling of scholarly portals
  based on metadata harvesting, with application to an
  international Irish Literature portal
• New grant from the Mellon Foundation to build on
  previous projects
   – Experiments in semantic clustering of metadata using
     support vector machines
   – Exploration of combining metadata harvesting and web
     crawling
   – Developing frameworks for federating loosely-coupled
     digital library components

                        Appreciation

• Enormous thanks go to the Andrew W. Mellon
  Foundation for advancing the understanding of
  metadata harvesting applications through these
  projects
• Mellon continues to be a driving force in the United
  States and internationally for research into digital
  library experiments benefiting scholarly
  communication



                             Contacts


• Martin Halbert (mhalber@emory.edu) 404-727-2204

• Kat Hagedorn (khage@umich.edu)

• Joanne Kaczmarek (jkaczmar@uiuc.edu)





				