D4: SKOS and HIVE—Enhancing
the Creation, Design and Flow of
Information
Speakers: Hollie White
Jane Greenberg
Coordinator: Alan Keely
Overview
• HIVE—Helping Interdisciplinary
Vocabulary Engineering
– Motivation—Dryad repository
• HIVE—Goals, status, and design
• A scenario
• HIVE for Law Library, repositories, etc.
• Challenges
– Technical and social
• Conclusion and questions
HIVE model
Automatic metadata generation (AMG) approach for integrating discipline CVs
Model addressing CV (controlled vocabulary) cost, interoperability, and usability
constraints (interdisciplinary environment)
Motivation
• Survey of 400 evolutionary biologists: 48% based on other data; 78% data not deposited
• Evolutionary biologists use published data more frequently than they are depositing it themselves!
• Ecology, Paleontology, Physiology, Systematics, Genomics, Population genetics…
Partner Journals
• American Society of Naturalists
– American Naturalist
• Ecological Society of America
– Ecology, Ecological Letters, Ecological Monographs, etc.
• European Society for Evolutionary Biology
– Journal of Evolutionary Biology
• Society for Integrative and Comparative Biology
– Integrative and Comparative Biology
• Society for Molecular Biology and Evolution
– Molecular Biology and Evolution
• Society for the Study of Evolution
– Evolution
• Society for Systematic Biology
– Systematic Biology
• Commercial journals
– Molecular Ecology
– Molecular Phylogenetics and Evolution
Vocabulary needs for Dryad
• Vocabulary analysis
– 600 keywords, Dryad partner journals
• Vocabularies: NBII Thesaurus, LCSH, the Getty’s TGN,
ERIC Thesaurus, Gene Ontology, ITIS (10 vocabularies)
• Facets: taxon, geographic name, time period, topic, research
method, genotype, phenotype…
• Results
431 topical terms, exact matches
– NBII Thesaurus, 25%; MeSH, 18%
531 terms (research method and taxon)
– LCSH, 22% found exact matches, 25% partial
• Conclusion: Need multiple vocabularies
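The kind of vocabulary analysis summarized above can be sketched in a few lines. This is a minimal illustration, not the method used in the Dryad study: the function names, the toy keyword list, the toy vocabularies, and the definition of a "partial" match (the keyword shares at least one word with some vocabulary label) are all assumptions made for the example.

```python
def normalize(term):
    """Lowercase and collapse whitespace so 'Wood  Pulp' equals 'wood pulp'."""
    return " ".join(term.lower().split())


def match_rates(keywords, vocabulary_labels):
    """Return the share of keywords with an exact or a partial match.

    Assumption: a 'partial' match means the keyword is not an exact match
    but shares at least one word with some label in the vocabulary.
    """
    labels = {normalize(label) for label in vocabulary_labels}
    label_words = {word for label in labels for word in label.split()}
    exact = partial = 0
    for keyword in keywords:
        norm = normalize(keyword)
        if norm in labels:
            exact += 1
        elif any(word in label_words for word in norm.split()):
            partial += 1
    total = len(keywords)
    if total == 0:
        return {"exact": 0.0, "partial": 0.0}
    return {"exact": exact / total, "partial": partial / total}


# Hypothetical sample data, for illustration only.
keywords = ["Wood pulp", "pulp mills", "phylogeny", "Sawdust"]
vocabularies = {
    "NBII (toy)": ["Wood pulp", "Sawdust", "Paper"],
    "LCSH (toy)": ["Phylogeny", "Wood", "Mills"],
}

rates = {name: match_rates(keywords, labels)
         for name, labels in vocabularies.items()}
for name, result in rates.items():
    print(name, result)
```

Running the same keyword list against each vocabulary separately shows how coverage differs from one vocabulary to the next, which is the observation behind the conclusion that multiple vocabularies are needed.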
Goals, status, and design
HIVE...as a solution
Address CV (controlled vocabulary) cost,
interoperability, and usability constraints
• COST: Expensive to create, maintain, and
use
• INTEROPERABILITY: Developed in silos
(structurally and intellectually)
• USABILITY: Interface design and
functionality limitations have been well
documented
Relevance to the law library community?
• Orphaned data (more of a Dryad issue)
• More important, interdisciplinary needs
• COST (create, maintain, and use)
• INTEROPERABILITY
• USABILITY
HIVE Goals
• Automatic metadata generation approach that dynamically integrates
discipline-specific controlled vocabularies encoded with the Simple
Knowledge Organisation System (SKOS)
• Provide efficient, affordable, interoperable, and user friendly access to
multiple vocabularies during metadata creation activities
• A model that can be replicated -> model and service
Three phases of HIVE:
1. Building HIVE
- Vocabulary Development
- Server preparation
- Primate Life Histories Working Group
- Wood Anatomy and Wood Density Working Group
2. Sharing HIVE
- Continuing education (empowering information professionals)
3. Evaluating HIVE
- Examining HIVE in Dryad
HIVE Partners
Vocabulary Partners
• Library of Congress: LCSH
• the Getty Research Institute (GRI): TGN (Thesaurus of Geographic Names)
• United States Geological Survey (USGS): NBII Thesaurus
• Agrovoc Thesaurus
Advisory Board
• Jim Balhoff, NESCent
• Libby Dechman, LCSH
• Mike Frame, USGS
• Alistair Miles, Ok
• William Moen, University of North Texas
• Eva Méndez Rodríguez, University Carlos III of Madrid
• Joseph Shubitowski, Getty Research Institute
• Ed Summers, LCSH
• Barbara Tillett, Library of Congress
• Kathy Wisser, Simmons
• Lisa Zolly, USGS
WORKSHOPS HOSTS: Columbia Univ.; Univ. of California, San Diego;
Univ. of North Texas; Universidad Carlos III de Madrid, Madrid, Spain
HIVE Construction
• HIVE stores millions of concepts from different
vocabularies, and makes them available on the Web
by a simple HTTP
– Vocabularies are imported into HIVE using SKOS/RDF
format
• HIVE is divided into two modules:
1. HIVE Core
– SKOS/RDF storage and management (SESAME/Elmo)
– SMART HIVE: Automatic Metadata Extraction and Topic
Detection (KEA++ and MAUI)
– Concept Retrieval (Lucene and MG4J)
2. HIVE Web
– Web user Interface (GWT—Google Web Toolkit)
– Machine oriented interface (SOAP and REST)
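To make the role of the Core module more concrete, here is a toy sketch of storing concepts from several vocabularies in one index and suggesting terms for a piece of text. It uses plain label matching only; it is not KEA++, MAUI, Lucene, or the HIVE API, and the class name `ConceptIndex`, its methods, and the `example.org` URI are hypothetical names invented for this illustration.

```python
import re
from collections import defaultdict


def _norm(label):
    """Reduce a label to lowercase word tokens joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", label.lower()))


class ConceptIndex:
    """Toy in-memory index of concepts drawn from several vocabularies."""

    def __init__(self):
        # normalized label -> list of (vocabulary, concept URI, preferred label)
        self._by_label = defaultdict(list)

    def add(self, vocabulary, uri, pref_label, alt_labels=()):
        """Register a concept under its preferred and alternative labels."""
        for label in (pref_label, *alt_labels):
            self._by_label[_norm(label)].append((vocabulary, uri, pref_label))

    def suggest(self, text, max_words=3):
        """Return concepts whose labels occur in the text, longest phrases first."""
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        found, seen = [], set()
        for size in range(max_words, 0, -1):
            for start in range(len(tokens) - size + 1):
                phrase = " ".join(tokens[start:start + size])
                for hit in self._by_label.get(phrase, []):
                    if hit not in seen:
                        seen.add(hit)
                        found.append(hit)
        return found


index = ConceptIndex()
index.add("NBII", "http://thesaurus.nbii.gov/nbii#Wood-pulp",
          "Wood pulp", ["Pulp (wood)"])
index.add("NBII", "http://thesaurus.nbii.gov/nbii#Sawdust", "Sawdust")
# Hypothetical URI: stands in for a concept from a second vocabulary.
index.add("LCSH (toy)", "http://example.org/lcsh#Wood", "Wood")

hits = index.suggest("Density of wood pulp and sawdust samples")
for vocabulary, uri, label in hits:
    print(vocabulary, label, uri)
```

A single lookup returns candidate terms from every loaded vocabulary, which is the behavior the slides describe for metadata creation. A real system would add ranking and machine learning on top of this kind of lookup.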
SKOS
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://thesaurus.nbii.gov/nbii#Wood-pulp">
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
    <skos:prefLabel>Wood pulp</skos:prefLabel>
    <skos:altLabel>Pulp (wood)</skos:altLabel>
    <skos:broader rdf:resource="http://thesaurus.nbii.gov/nbii#Wood"/>
    <skos:related rdf:resource="http://thesaurus.nbii.gov/nbii#Paper"/>
    <skos:related rdf:resource="http://thesaurus.nbii.gov/nbii#Paper-industry-wastes"/>
    <skos:related rdf:resource="http://thesaurus.nbii.gov/nbii#Pulp-mills"/>
    <skos:related rdf:resource="http://thesaurus.nbii.gov/nbii#Sawdust"/>
    <skos:inScheme rdf:resource="http://thesaurus.nbii.gov/nbii#"/>
    <skos:scopeNote>LSC Life Sciences</skos:scopeNote>
  </rdf:Description>
</rdf:RDF>
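A SKOS record like the one above can be read with Python's standard library. The sketch below is only an illustration: the function name `read_concepts` and the shape of the returned dictionary are assumptions, the embedded record is a shortened copy of the "Wood pulp" example, and the code handles only concepts written as `rdf:Description` elements. A full RDF library would be the safer choice for real vocabulary files.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Shortened copy of the "Wood pulp" record, with namespaces declared.
SKOS_XML = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <rdf:Description rdf:about="http://thesaurus.nbii.gov/nbii#Wood-pulp">
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
    <skos:prefLabel>Wood pulp</skos:prefLabel>
    <skos:altLabel>Pulp (wood)</skos:altLabel>
    <skos:broader rdf:resource="http://thesaurus.nbii.gov/nbii#Wood"/>
    <skos:related rdf:resource="http://thesaurus.nbii.gov/nbii#Paper"/>
    <skos:related rdf:resource="http://thesaurus.nbii.gov/nbii#Sawdust"/>
  </rdf:Description>
</rdf:RDF>"""


def read_concepts(xml_text):
    """Extract labels and relations from rdf:Description elements."""
    root = ET.fromstring(xml_text)
    concepts = []
    for desc in root.findall(f"{{{RDF}}}Description"):

        def texts(tag):
            # Literal values, e.g. the text inside skos:prefLabel.
            return [el.text for el in desc.findall(f"{{{SKOS}}}{tag}")]

        def links(tag):
            # URIs referenced through the rdf:resource attribute.
            return [el.get(f"{{{RDF}}}resource")
                    for el in desc.findall(f"{{{SKOS}}}{tag}")]

        pref = texts("prefLabel")
        concepts.append({
            "uri": desc.get(f"{{{RDF}}}about"),
            "prefLabel": pref[0] if pref else None,
            "altLabels": texts("altLabel"),
            "broader": links("broader"),
            "related": links("related"),
        })
    return concepts


concepts = read_concepts(SKOS_XML)
for concept in concepts:
    print(concept["prefLabel"], "->", concept["broader"])
```

The broader and related links are plain URIs, so a concept from one vocabulary can point at any other resource on the Web.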
A scenario
Meet Amy
• Amy Zanne is a botanist.
• Like every good scientist, she publishes.
• She deposits data in Dryad.
Law library/data repositories
• http://www.law.harvard.edu/library/research/databases/major.html
• http://www.digitalcurrent.com/legal_webhosting.aspx
Challenges
• Building vs. doing/analysis
– Source for HIVE generation, beyond abstracts
• Combining many vocabularies during the indexing/term
matching phase is difficult, time-consuming, and inefficient.
– NLP and machine learning offer promise
• Interoperability = dumbing down
– ontologies
• Proof of concept: illustrate the differences between HIVE
and other vocabulary registries (NCBO and OBO
Foundry)
• General large team logistics, and having people from
multiple disciplines (also the ++)
Conclusion
• Vocabularies will enrich Dryad data description,
and assist with access, use, reuse, etc…
• Nothing novel, but infrastructure is supportive,
finally…
• Dryad and HIVE are real-world applications using
Semantic Web technology
Links
• HIVE
– http://ils.unc.edu/mrc/hive/
• Metadata Research Center <MRC>
– http://www.ils.unc.edu/mrc/
• Dryad
– http://datadryad.org/
• National Evolutionary Synthesis Center (NESCent)
– http://www.nescent.org/index.php
The Dryad Data Repository