Managing Scientific Data

Holly Miller

MBLWHOI Library, Marine Biological Laboratory

MBLWHOI Library

Marine Biological Laboratory

Woods Hole Oceanographic Institution



•Boston University Marine Program

•MIT/WHOI Joint Program

•Sea Education Association

•NOAA National Marine Fisheries Service

•United States Geological Survey








Data

Infrastructure

• 65 Virtual Servers over 17 Physical Servers

• Combined totals:

–312GB RAM

–272 Processor cores

–196TB of storage










We’ve Got Data

What to do with it?

Unstructured Data Acquisition

Natural Language Processing









Structured Data Acquisition









Data Warehouse

XML, REST

Data Considerations

•Accessible



•Easy to find and retrieve



•Quality



•Analysis tools



•Visualization tools



•Potential for reuse


Three

Examples:

LigerCat

Cell Image Library

WHOAS repository

LigerCat

Search PubMed Database









20 million citations from the biomedical literature

Medical Subject Headings (MeSH) ≈ keywords




What can you search for?

• Concepts ‐ Alzheimer disease, vitamin D, mitochondria

• Author names ‐ Borisy GG

• Organism names ‐ Mus musculus

• Institutions ‐ Marine Biological Laboratory









LigerCat: Literature and Genomics Research Catalogue ligercat.ubio.org

Results are displayed in a ‘tag’ cloud
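A tag cloud of this kind can be sketched in a few lines. The example below is a minimal illustration, not LigerCat's actual implementation: it counts how many articles carry each MeSH term and scales font sizes linearly between two bounds. The function name, the point sizes, and the sample articles are all hypothetical.

```python
from collections import Counter


def tag_cloud_sizes(articles, min_pt=10, max_pt=36):
    """Map each term to a font size scaled linearly by article count.

    `articles` is a list of term lists, one list per article.
    The size bounds are illustrative defaults, not LigerCat's values.
    """
    # Count each term once per article, even if it is listed twice.
    counts = Counter(term for terms in articles for term in set(terms))
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {
        term: min_pt + (max_pt - min_pt) * (count - lo) / span
        for term, count in counts.items()
    }


# Hypothetical sample: MeSH terms attached to three search results.
articles = [
    ["Mitochondria", "Vitamin D"],
    ["Mitochondria", "Alzheimer Disease"],
    ["Mitochondria", "Vitamin D", "Mice"],
]
sizes = tag_cloud_sizes(articles)
for term, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{term}: {size:.0f}pt")
```

The most frequent term gets the largest size and the rarest gets the smallest; a real cloud would likely use a logarithmic scale when counts vary by orders of magnitude.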










Histogram of publication date
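As a rough sketch of how such a histogram could be computed, the example below bins publication years and fills in empty years so gaps stay visible. It assumes the years have already been extracted from the search results; the function name and the sample years are hypothetical.

```python
from collections import Counter


def publication_histogram(years):
    """Return (year, count) pairs for every year in the observed range."""
    if not years:
        return []
    counts = Counter(years)
    # Include years with zero articles so the time axis has no holes.
    return [(year, counts.get(year, 0)) for year in range(min(years), max(years) + 1)]


# Hypothetical publication years for a set of search results.
years = [2005, 2007, 2007, 2008, 2008, 2008]
histogram = publication_histogram(years)
for year, count in histogram:
    print(f"{year} {'#' * count}")
```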










MeSH clouds in EOL







• 1,360,665 total species processed

• 65,630 species returned PubMed articles

• 5,544,635 articles were analyzed










Cell Image Library




• Collaboration (American Society for Cell Biology, Harvard, and others)

• Resource for cell image data 

• Metadata added by experts

• Ontology terms used for annotation

Workflow

Woods Hole Data Repository for Data Supporting Published Articles

Archiving data associated with scientific journal articles









MBLWHOI Library mblwhoilibrary.org

Scientific Article


Workshops

Woods Hole, April 2009

Paris, April 2010

• Stakeholders included scientists, data managers and librarians

• Data must be discoverable, citable and available on the internet

• Resources, standards and workflows must be defined to support publisher and funding agency mandates

• Action item - Library develop process to


Metadata Schema

• Dublin Core alone not enough

• Also need Darwin Core and “Woods Hole Core”
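To illustrate why one schema is not enough, the sketch below combines Dublin Core elements (general description) with Darwin Core terms (biological and geographic detail) in a single record. The `dc:` and `dwc:` prefixes, the choice of required fields, the helper names, and all sample values are assumptions made for illustration. “Woods Hole Core” is left out because its fields are not specified here.

```python
# Dublin Core elements treated as mandatory in this sketch (an assumption).
REQUIRED_DC = ("dc:title", "dc:creator", "dc:date", "dc:identifier")

# Hypothetical record for a dataset deposited alongside an article.
record = {
    # Dublin Core: who, what, when
    "dc:title": "Example dataset supporting a published article",
    "dc:creator": "Doe, J.",
    "dc:date": "2011-01-01",
    "dc:identifier": "doi:10.1234/example",
    # Darwin Core: organism and location, which Dublin Core cannot express well
    "dwc:scientificName": "Mus musculus",
    "dwc:decimalLatitude": 41.5265,
    "dwc:decimalLongitude": -70.6731,
}


def missing_fields(rec, required=REQUIRED_DC):
    """List the required fields that are absent or empty in the record."""
    return [field for field in required if not rec.get(field)]


def group_by_schema(rec):
    """Split a record into per-schema dictionaries keyed by prefix."""
    grouped = {}
    for key, value in rec.items():
        prefix, _, name = key.partition(":")
        grouped.setdefault(prefix, {})[name] = value
    return grouped


print(missing_fields(record))
print(sorted(group_by_schema(record)))
```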










Persistent Identifiers

Digital Object Identifiers (DOIs)

• Library has existing relationship with CrossRef to assign DOIs

• Provides link from figure in article to data
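The link from a figure to its data works because a DOI resolves through a public resolver to wherever the data currently lives. The sketch below is a minimal illustration of turning a DOI into a resolvable link; the function name and the sample DOI are hypothetical, and the pattern is a simplified check rather than a full DOI syntax validator.

```python
import re

# Simplified shape check: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_url(doi):
    """Return the doi.org resolver URL for a DOI string."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {doi!r}")
    # Note: DOIs with unusual characters would also need percent-encoding.
    return "https://doi.org/" + doi


# Hypothetical DOI for a dataset behind a figure.
print(doi_url("doi:10.1234/example.dataset.1"))
```

Because the article cites the identifier rather than a server address, the repository can move the files without breaking the link.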






Cultural Shift

• Early discussions often ended with “my data is too complicated,” “the files are too large,” etc.



• Growing recognition that transparency is important, which means making data available



• Just do it!


Ongoing Challenges

•Researcher concerns regarding reuse and misuse of data

•Proprietary file types

•Responsibility for quality control of data

•Additional work for authors






Summary


• Scientific data is heterogeneous

• Data management is complicated










Data Considerations

•Accessible



•Easy to find and retrieve



•Quality



•Analysis tools



•Visualization tools



•Potential for reuse


Keys for Success

• Sustainability ‐ Always in development

• Infrastructure ‐ Strong foundation

• User Experience ‐ Easy and beautiful




Lessons Learned

• Modular, reusable architectures

• Robust, flexible infrastructures

• Data standards compliance

• Structured processes

• Clear communication




Thank you!

Lisa Raymond

Ann Devenish

Ryan Schenk

John Hufnagle

Anthony Goddard



Funding:

George F. Jewett Foundation

Ellison Medical Foundation

NIH

NLM


