Qualified Dublin Core Using RDF for Sci-Tech Journal Articles
DC-2001 International Conference on Dublin Core and Metadata Applications, October 22-26, 2001, National Institute of Informatics, Tokyo, Japan
http://dli.grainger.uiuc.edu/Publications/DC2001/
Thomas G. Habing (thabing@uiuc.edu) Timothy W. Cole (t-cole3@uiuc.edu) William H. Mischo (w-mischo@uiuc.edu) University of Illinois at Urbana-Champaign
History and Objectives of the Testbed
• Funded 1994-98 under DLI-I (NSF/NASA/DARPA).
• Continued 1998-2001 under CNRI's D-Lib Test Suite.
• Construct large-scale, multi-publisher, markup-based full-text journal testbed.
• Investigate processing, indexing, normalization, retrieval, rendering and linking.
• Study end-user searching behavior and needs.
Description of Testbed
• Testbed contains 65,000 articles from 50 journals.
• Received from publishers as SGML (various DTDs).
• Converted to well-formed XML.
• Content & support from AIP, APS, ASCE, IEE, ASM, ACM, Elsevier.
• Additional support from IEEE, NRL, NTT Learning Systems.
Usage of Metadata in Illinois Testbed
• Facilitate resource discovery across heterogeneous sources through normalization.
• Common, easily displayable search results.
• Add value to the original object: reference linking, links to alternate formats and A&I services.
• Data exchange, as with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
Metadata Extraction Process
• Metadata is derived from the full text using XSLT.
• One-to-one mappings:
– select="//titlegrp/title" maps to the corresponding Dublin Core element.
• Complex mappings:
– Tables of Contents, Literal Markup such as MathML.
• Advanced XSLT techniques:
– JavaScript functions are used for some formatting.
– The document(url) function is used to merge XML from other sources, such as CrossRef, into the metadata.
• See paper for sample XSLT code.
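A one-to-one mapping of the kind listed above can be sketched outside XSLT as well. The following Python fragment is only an illustration, not the testbed's stylesheet: the article fragment is invented, and the choice of dc:title as the target element is an assumption. Only the //titlegrp/title path comes from the slide.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical article fragment; only the titlegrp/title path is from the slide.
ARTICLE_XML = """
<article>
  <front>
    <titlegrp><title>A Title</title></titlegrp>
  </front>
</article>
"""


def extract_title(article_xml: str) -> ET.Element:
    """Map the first //titlegrp/title element to a dc:title element (assumed target)."""
    root = ET.fromstring(article_xml)
    source = root.find(".//titlegrp/title")
    target = ET.Element(f"{{{DC_NS}}}title")
    # itertext() flattens any inline markup inside the title to plain text.
    target.text = "".join(source.itertext()).strip() if source is not None else ""
    return target


dc_title = extract_title(ARTICLE_XML)
print(ET.tostring(dc_title, encoding="unicode"))
```

Flattening with itertext() is the simplest choice here; the complex mappings mentioned on the slide (literal markup such as MathML) would need to preserve child elements instead.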
Other uses of XSLT
• 'Dumb-down' to unqualified DC.
• Transform metadata to HTML for display.
• Generate RDF triples for use in an RDBMS.
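As a rough illustration of the last use, RDF triples can be held in a single three-column relational table. The sketch below uses Python's built-in sqlite3 as a stand-in RDBMS. The table layout, the subject URN and the sample values are all assumptions made for the example, not the testbed's actual schema.

```python
import sqlite3

DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical (subject, predicate, object) triples for one article.
triples = [
    ("urn:example:article-1", DC + "title", "A Title"),
    ("urn:example:article-1", DC + "creator", "Author, A. N."),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE triple ("
    "subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL)"
)
conn.executemany("INSERT INTO triple VALUES (?, ?, ?)", triples)
conn.commit()


def objects_for(subject: str, predicate: str) -> list:
    """Return every object recorded for a subject/predicate pair, in insert order."""
    rows = conn.execute(
        "SELECT object FROM triple WHERE subject = ? AND predicate = ? ORDER BY rowid",
        (subject, predicate),
    )
    return [row[0] for row in rows]


titles = objects_for("urn:example:article-1", DC + "title")
print(titles)
```

A single generic table keeps loading trivial, at the cost of self-joins for any query that spans several properties.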
'Dumb-down' XSLT
… …
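The idea behind dumbing down can be sketched as follows: each qualified or local property is replaced by the base DC element it refines, and anything with no known base element is dropped. This is a minimal Python sketch, not the stylesheet from the slide. The lookup table stands in for the subproperty statements an RDF Schema would supply, and the LOCAL namespace and its property names are hypothetical.

```python
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
LOCAL = "http://example.org/local-terms/"  # hypothetical local-extension namespace

# Stand-in for rdfs:subPropertyOf statements read from a schema.
SUB_PROPERTY_OF = {
    DCTERMS + "abstract": DC + "description",
    DCTERMS + "issued": DC + "date",
    LOCAL + "summary": DCTERMS + "abstract",  # hypothetical local refinement
}


def dumb_down(statements):
    """Reduce (property, value) pairs to unqualified DC, dropping unmapped properties."""
    simple = []
    for prop, value in statements:
        seen = set()
        # Follow refinement links until a base DC element or a dead end is reached.
        while not prop.startswith(DC) and prop in SUB_PROPERTY_OF and prop not in seen:
            seen.add(prop)
            prop = SUB_PROPERTY_OF[prop]
        if prop.startswith(DC):
            simple.append((prop, value))
    return simple


record = [
    (DC + "title", "A Title"),
    (DCTERMS + "issued", "2001"),
    (LOCAL + "summary", "Some abstract."),
    (LOCAL + "reviewStatus", "refereed"),  # hypothetical, no base element
]
simple = dumb_down(record)
print(simple)
```

Without the lookup table, the two mapped non-DC properties would be discarded along with the unmapped one, which is why a schema is needed for this step.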
Local Extensions to DCQ
• Qualified DC was not adequate for our needs.
• Various DC working groups provided some guidance.
• We extended DCQ in three areas:
– Citation-related extensions.
– Agent-related (creator) extensions.
– Type and encoding scheme extensions.
Citation-related Extensions
• A. Author. "A Title" Some Jrnl. … • genre=article&aulast=Author… • 1234-5678 Some Journal…
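The second form above has the shape of a URL query string built from key/value pairs. A minimal Python sketch of producing such a string is shown below. The genre and aulast keys and the sample values are taken from the slide, while pairing the ISSN with an issn key is an assumption made for illustration.

```python
from urllib.parse import urlencode


def citation_query(fields: dict) -> str:
    """Serialize citation fields as an escaped key=value query string."""
    return urlencode(fields)


query = citation_query(
    {
        "genre": "article",
        "aulast": "Author",
        "issn": "1234-5678",  # assumed key for the ISSN shown on the slide
    }
)
print(query)
```

urlencode also escapes spaces and reserved characters, so values such as multi-word titles can be passed in unmodified.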
Agent-related Extensions
• Based on DC Agent Qualifiers, Working Draft. 1999. • Author, A. N. Big University
Type and Encoding Extensions
• Extensions to DCMI Type Vocabulary. – – • Additional Encoding Schemes. – PACS, ACMCCS, ISSN, CODEN, ACM_JRNL_CODE.
Conclusions
• Using DCQ/RDF for sci-tech journal articles is viable.
• Steep learning curve for RDF.
• 'Dumbing-down' DCQ/RDF is complex:
– Non-DC tags cannot be ignored; an RDF Schema is required.
• DCQ is missing many properties and types required for complete serials descriptions.
• Utility of RDF remains uncertain.