XML-based Data Management Support for Biomedical Applications
Scott Oster, Stephen Langella, Shannon Hastings, Tahsin Kurc, Joel Saltz
Multiscale Computing Laboratory
Ohio State University Comprehensive Cancer Center & Department of Biomedical Informatics
www.multiscalecomputing.org
www.projectmobius.org
Large Scale Biomedical Image Analysis
State-of-the-art biomedical imaging studies make use of very large image datasets, potentially at multiple institutions. It is necessary to carry out feature-based analysis of both morphological and functional data and to link this information to clinical data.
Functional Imaging of Tumors: Use of static and dynamic image information to determine anatomic microstructure and to characterize physiological behavior.
Digitized Microscopy: Remote viewing and analysis of microscopy specimens.
One of the main challenges is that a typical study may involve thousands of images distributed across multiple sites. The large size of the image data (up to 20-50 GB per image) makes storing, querying, and sharing digitized microscopy images a significant challenge.

Mobius
• Mobius provides a set of generic grid services and protocols to support:
  • distributed creation, versioning, and management of data models;
  • on-demand creation of databases and federation of existing databases; and
  • querying of data in the Grid.
• Its design is motivated by the requirements of Grid-wide data access.

Global Model Exchange (GME)
• Publish, Version, Retrieve, and Query Schemas
• Schema Discovery
• Hierarchical service instances
• Each instance has an authority (except the root)
• Each instance is the authority for a set of namespaces

Federated Ad hoc Storage Service (Mako)
• Federated Framework for Managing Data
• Data indexed from GME-published schemas
• Management of Data: Store, Update, Retrieve, Delete, Query (via XPath)
• Provides an XML Realization of an underlying data resource
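The GME bullets describe hierarchical service instances in which every instance except the root has an authority, and each instance is the authority for a set of namespaces. The Python sketch below models that idea under loud assumptions: the class and method names, the namespace strings, and the "edu" node are hypothetical, schemas are plain strings, and lookup is a simple depth-first search from the root rather than whatever routing the real GME performs.

```python
class GME:
    """Hypothetical model of one GME service instance (not the Mobius API)."""

    def __init__(self, name, authority=None):
        self.name = name
        self.authority = authority  # parent instance; None only for the root
        self.children = []
        self.namespaces = set()     # namespaces this instance is the authority for
        self.schemas = {}           # namespace -> list of published schema versions
        if authority is not None:
            authority.children.append(self)

    def root(self):
        node = self
        while node.authority is not None:
            node = node.authority
        return node

    def publish(self, namespace, schema):
        """Publish a new schema version; only the namespace's authority may do so."""
        if namespace not in self.namespaces:
            raise PermissionError(f"{self.name} is not the authority for {namespace}")
        versions = self.schemas.setdefault(namespace, [])
        versions.append(schema)
        return len(versions)  # 1-based version number

    def find_authority(self, namespace):
        """Depth-first search of this subtree for the instance owning the namespace."""
        if namespace in self.namespaces:
            return self
        for child in self.children:
            found = child.find_authority(namespace)
            if found is not None:
                return found
        return None

    def retrieve(self, namespace, version=None):
        """Return a schema (latest by default) from whichever instance owns it."""
        owner = self.root().find_authority(namespace)
        if owner is None or namespace not in owner.schemas:
            raise KeyError(namespace)
        versions = owner.schemas[namespace]
        return versions[-1] if version is None else versions[version - 1]


# Hierarchy loosely following the "Sample GME Usage" figure; the "edu" node
# and all namespace strings are assumptions made for this example.
root = GME("root")
org, gov, edu = GME("org", root), GME("gov", root), GME("edu", root)
gridforum, nih, osu = GME("gridforum.org", org), GME("nih.gov", gov), GME("osu.edu", edu)
gridforum.namespaces.add("gridforum.org/sde")
nih.namespaces.add("nih.gov/datamodels")

gridforum.publish("gridforum.org/sde", "<xs:schema>v1</xs:schema>")
gridforum.publish("gridforum.org/sde", "<xs:schema>v2</xs:schema>")
nih.publish("nih.gov/datamodels", "<xs:schema>nih</xs:schema>")
```

In this sketch a client attached to any instance (here `osu`) can retrieve a schema without knowing which instance owns it, while publishing is restricted to the namespace's authority.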
[Figure: Sample GME Usage. A hierarchy of GME instances: root; org and gov; gridforum.org, osu.edu, and nih.gov. gridforum.org publishes the GGF SDE definitions, nih.gov publishes NIH standards, and OSU Grid Services receive the GGF grid protocol and NIH data models from the GME using the protocol.]
[Figure: Mako Architecture. A TCP listener and a GSI listener accept XML messages; supported interfaces include Get Service Data; storage back ends shown are the file system and the eXist XML DB.]
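Mako is described as a federated framework offering Store, Update, Retrieve, Delete, and Query (via XPath) over XML data. As a hedged illustration only, the sketch below uses Python's standard `xml.etree.ElementTree` to mimic one storage instance and a federation that queries several instances as one; `XMLStore`, `Federation`, and the document layout are hypothetical names, not Mako's interfaces, and ElementTree implements only a subset of XPath 1.0.

```python
import xml.etree.ElementTree as ET


class XMLStore:
    """Hypothetical in-memory stand-in for one Mako instance (not Mako's API)."""

    def __init__(self, name):
        self.name = name
        self._docs = {}      # document id -> parsed root Element
        self._next_id = 1

    def store(self, xml_text):
        doc_id = f"{self.name}:{self._next_id}"
        self._next_id += 1
        self._docs[doc_id] = ET.fromstring(xml_text)
        return doc_id

    def update(self, doc_id, xml_text):
        if doc_id not in self._docs:
            raise KeyError(doc_id)
        self._docs[doc_id] = ET.fromstring(xml_text)

    def retrieve(self, doc_id):
        return ET.tostring(self._docs[doc_id], encoding="unicode")

    def delete(self, doc_id):
        del self._docs[doc_id]

    def query(self, path):
        """Evaluate an ElementTree path (an XPath subset) against every document."""
        return [(doc_id, elem)
                for doc_id, root in self._docs.items()
                for elem in root.iterfind(path)]


class Federation:
    """Query several stores as if they were a single store."""

    def __init__(self, stores):
        self.stores = list(stores)

    def query(self, path):
        hits = []
        for store in self.stores:
            hits.extend(store.query(path))
        return hits


site_a, site_b = XMLStore("siteA"), XMLStore("siteB")
site_a.store('<study id="s1"><image modality="MRI"/><image modality="CT"/></study>')
site_b.store('<study id="s2"><image modality="MRI"/></study>')
hits = Federation([site_a, site_b]).query(".//image[@modality='MRI']")
```

Here `hits` contains one MRI image from each site, tagged with the document it came from. A production system would push the query to each back end (for example an XML database) instead of scanning parsed trees in memory.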
An Image Archival and Analysis System:
• Client front-end implementing the functionality to submit queries in a uniform way against distributed image databases.
• Support for an extensible metadata schema for images that can be used to represent 2D, 3D, and time-dependent images with optional application-specific metadata. … data types and image datasets conforming to a given schema can be … as custom databases at …
• Virtualized and Federated Data Access. Multiple image servers can be grouped to form a collective that can be queried as if it were a single, centralized server.
• Active Storage. Invocation of user-defined procedures is supported on ensembles of images in a distributed environment.

Synthesis of Information for Phenotype-Genotype Analyses
Genotype-phenotype correlation analysis can be used to identify polymorphisms in candidate genes that correlate with disease-related phenotypes and to help achieve a better understanding of complex diseases such as Coronary Artery Disease (CAD). Such analysis can involve integrating SNP, gene, and phenotypic data from public repositories and local datasets, BLAST searches, and phylogenetic analysis.
• Support for creation of materialized views (or local caches) of external data sources on storage clusters.
• Web spiders to download data from external data sources.
• Currently, spiders for GenBank, BLAST, and MatchMiner.
• Unified, extensible data models for SNP data, phylogenetic analysis output, strain and phenotype data from the Mouse Phenome Database, and BLAST and MatchMiner output.
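The phenotype-genotype work relies on materialized views (local caches) of external sources that web spiders populate. The sketch below shows one way such a cache could behave; the time-based refresh policy, the `MaterializedView` class, and `fake_genbank_spider` are assumptions for illustration, and a real spider would download and parse pages from GenBank, BLAST, or MatchMiner rather than return a canned record.

```python
import time
from typing import Callable, Dict, Tuple


class MaterializedView:
    """Hypothetical local cache of external records, refreshed by a spider.

    The max-age refresh policy is an assumption made for this example.
    """

    def __init__(self, spider: Callable[[str], str], max_age_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.spider = spider          # fetches one record from the external source
        self.max_age_s = max_age_s    # cached entries older than this are re-fetched
        self.clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}  # key -> (fetch time, record)
        self.fetches = 0

    def get(self, key: str) -> str:
        now = self.clock()
        cached = self._cache.get(key)
        if cached is not None and now - cached[0] < self.max_age_s:
            return cached[1]          # fresh enough: serve the local copy
        record = self.spider(key)     # missing or stale: go to the external source
        self.fetches += 1
        self._cache[key] = (now, record)
        return record


def fake_genbank_spider(accession: str) -> str:
    """Hypothetical spider returning a canned XML record."""
    return f'<sequence accession="{accession}"/>'


now = [0.0]  # manually advanced clock so the example is deterministic
view = MaterializedView(fake_genbank_spider, max_age_s=60.0, clock=lambda: now[0])
```

Repeated lookups within the age limit are answered from the local copy; only stale or missing entries trigger a new download, which keeps load off the public repositories.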
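[Figure: Information Warehouse / Mobius Grid Service. Components shown: GenBank, BLAST, and MatchMiner websites; a GenBank Scraper, a BLAST Runner, and a MatchMiner Scraper; three Mako instances; a lab researcher with a list of SNPs; and the information data warehouse, whose data is exposed as an XML grid data service. Annotations: data grid service framework; XML data service virtualization; XML view of relational data.]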
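The figure labels one component an "XML view of relational data", through which warehouse data is exposed as an XML grid data service. The sketch below shows the general idea with Python's standard `sqlite3` and `xml.etree.ElementTree` modules; it is not how XQuark Bridge works, and the `table_as_xml` helper and the `snp` table are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_as_xml(conn: sqlite3.Connection, table: str, row_tag: str = "row") -> ET.Element:
    """Hypothetical helper: publish every row of a relational table as XML.

    The table name is interpolated directly (SQL identifiers cannot be bound
    as parameters), so it must come from trusted code, never from user input.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [desc[0] for desc in cursor.description]
    root = ET.Element(table)
    for row in cursor:
        row_elem = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            ET.SubElement(row_elem, column).text = "" if value is None else str(value)
    return root


# Hypothetical warehouse table of SNPs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snp (rs_id TEXT, gene TEXT, position INTEGER)")
conn.executemany("INSERT INTO snp VALUES (?, ?, ?)",
                 [("rs0001", "GENE_A", 1200), ("rs0002", "GENE_B", 3400)])
snps = table_as_xml(conn, "snp")
# Relational rows can now be queried with the same path syntax as XML documents.
gene_a_rows = snps.findall("row[gene='GENE_A']")
```

Materializing the whole table is the simplest possible design; a real bridge would translate XML queries into SQL so that only the matching rows leave the database.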
Integration of data collected in basic research with clinical lab and outcome data is needed to translate basic biomedical research to a successful clinical …
• Linking to clinical outcome data in Enterprise data warehouses through Mobius and XQuark Bridge.

Dan Janies, Biomedical Informatics
Wolfgang Sadee, Pharmacology
Gustavo Leone, Human Cancer Genetics Program
Michael Knopp, Radiology
Tony Pan, Biomedical Informatics
Kun Huang, Biomedical Informatics