An Endless Frontier: Enable Biology Through Information Technology
• Biology 21, the life science research of the 21st Century, is rich, and will become exceptionally rich, in diverse, complex data, while focusing on system-level understanding that enables predictive capacity
– Bioinformatics/IT will be the language for understanding biology at a systems level, just as calculus/mathematics has been the language for understanding the physical sciences; a compelling partnership will be created at the frontier between IT and all of the biological sciences
• “Computing has changed biology forever; most biologists just don’t know it yet” – Michael Levitt
• “Computational Biology is as essential for the next quarter century as Molecular Biology was for the last” – William McGinness
21st Century BIO-Cyberinfrastructure
Changing How Science is Done
Providing the Tools to Swim in the Rapid Current of Data
“Computers” now KNOW about Biology
[Figure: research areas shown for CTC, NCSA, PSC and SDSC, 6/1/97 to 5/31/98: Biomolecular, Chemistry, Earth Sciences, Planetary Sciences, Astronomy, Particle Physics, Materials, Engineering, CFD, Industry]
The Complexity of Biosystems
[Figure: regions where computational modeling can be employed today vs. goals for coverage. Vertical axis: number scale (10^0, 10^3, 10^6 at each size level: Atoms, Biopolymers, Cells, Organisms), over a size scale from Angstroms to km. Horizontal axis: time scale (seconds), 10^-15 to 10^9, extending to geologic and evolutionary timescales. Processes shown: DNA replication, enzyme mechanisms, protein folding, cell signalling, organ function, ecosystems and epidemiology, evolutionary processes. Methods shown: ab initio quantum chemistry, first principles molecular dynamics, empirical force field molecular dynamics, homology-based protein modeling, electrostatic continuum models, discrete automata models, finite element models.]
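The time axis of this figure runs from 10^-15 s to 10^9 s, which is 24 orders of magnitude. The short Python sketch below makes that gap concrete for one method on the chart, empirical force field molecular dynamics, assuming a fixed integration time step of 1 fs (a typical all-atom value, assumed here and not taken from the figure). The helpers `orders_of_magnitude` and `steps_needed` are hypothetical names used only for illustration.

```python
import math
from fractions import Fraction

# Exponents of the figure's time axis, in seconds (10^-15 ... 10^9).
AXIS_MIN_EXP, AXIS_MAX_EXP = -15, 9

# Assumed MD integration step: 1 femtosecond (illustrative assumption).
DT_SECONDS = Fraction(1, 10**15)


def orders_of_magnitude(min_exp: int, max_exp: int) -> int:
    """Number of decades spanned by an axis from 10^min_exp to 10^max_exp."""
    return max_exp - min_exp


def steps_needed(sim_time_s: Fraction, dt_s: Fraction = DT_SECONDS) -> int:
    """Fixed-size integration steps required to cover sim_time_s.

    Exact rational arithmetic avoids floating-point rounding in the ratio.
    """
    return math.ceil(sim_time_s / dt_s)


if __name__ == "__main__":
    print("decades on the time axis:", orders_of_magnitude(AXIS_MIN_EXP, AXIS_MAX_EXP))
    targets = {
        "1 ns": Fraction(1, 10**9),
        "1 us": Fraction(1, 10**6),
        "1 ms": Fraction(1, 10**3),
        "1 s": Fraction(1),
    }
    for label, t in targets.items():
        print(f"{label:>5}: {steps_needed(t):.0e} steps")
```

With a fixed step, every additional three decades of simulated time costs a thousandfold more steps, which is one way to read the distance between today's coverage and the goals in the figure.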
Bioinformatics / IT for Life Science: Drinking from the Fire Hose in the Era of Data-rich, Genome-enabled Biology
Dynamic Form and Function: Characterizing Biological Mechanisms Across Multiple Scales
[Figure: biological scales from Genomes and Gene Products through Structure & Function and Pathways & Physiology to Populations & Evolution and Ecosystems, set against Scientific Challenges, Data Integration Challenges, Computational Challenges, and Algorithmic Challenges]
Crosscutting Themes Underlying Major IT Challenges for Genome-enabled Biological Science
• Specifying the Relationships of Molecular Sequence, Structure and Function
• Bridging Vast Scales of Time, Space and Biological Organization
• Understanding the Complexity of Living Systems
[Summary, NSF BIO computing workshop]
Exemplar Research Challenges in the Life Sciences that require BIO Cyberinfrastructure
1. Full genome-genome comparisons
2. Rapid assessment of polymorphic genetic variations
3. Complete construction of orthologous and paralogous groups of genes
4. Structure determination of large macromolecular assemblies/complexes
5. Dynamical simulation of realistic oligomeric systems
6. Rapid structural/topological clustering of proteins and protein families
7. Prediction of unknown molecular structures; protein folding
8. Computer simulation of membrane structure and dynamic function
9. Simulation of genetic networks, including the sensitivity of these pathways to component stoichiometry and kinetics
10. Integration of observations across scales of vastly different dimensions and organization to yield realistic ecological and environmental models for basic biology and societal needs
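Item 1, full genome-genome comparison, is often approached without alignment by comparing the sets of length-k substrings (k-mers) that two genomes share. The Python sketch below computes the exact Jaccard similarity of two sequences' k-mer sets. This is an illustrative technique choice, not a method named in the list; the function names and the default k = 21 (a common choice) are assumptions.

```python
def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct length-k substrings of seq (case-insensitive)."""
    if k <= 0:
        raise ValueError("k must be positive")
    s = seq.upper()
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def kmer_jaccard(a: str, b: str, k: int = 21) -> float:
    """Jaccard similarity |A & B| / |A | B| of the k-mer sets of a and b.

    Returns 0.0 when both sequences are shorter than k (empty union).
    """
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return len(ka & kb) / len(union)


if __name__ == "__main__":
    # Two short toy sequences differing at one position.
    print(kmer_jaccard("ACGTACGTGGA", "ACGTACGTGCA", k=4))
```

For whole genomes, production tools typically count canonical k-mers (a k-mer and its reverse complement treated as one) and estimate the Jaccard index from MinHash sketches rather than full sets; both refinements are omitted here for brevity.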
Model Exists: Architecture to Support a Biological Informatics Research Network
BIRN Phase I, 2001-2002
Form a National-Scale Data Grid and Federate Multi-scale NeuroImaging Data from Centers with High-Field MRI and Advanced 3D Microscopes
[Map: UCSD, Harvard, Caltech, UCLA, Duke; NIH Centers for Bio Imaging and Computational Biology & NCRR Research Centers; NSF NPACI with SDSC; Cal-(IT)²; “Deep Web”]
Integrating Cyberinfrastructure to Link:
• Advanced Imaging Instruments
• Data-Intensive Computing
• Multi-Scale Brain Databases
Test Beds for an NSF BIO Cyberinfrastructure
National-Scale Testbed in Cyberinfrastructure: Federating Multi-Scale, Multi-Modal NeuroImaging Data
Could Expand Readily; Examples: Plant & Microbe Genomes, Cellular Level, Tree of Life; also Indefinite Expansion to Many Laboratories
Model-based Integration of Multi-resolution Data: Development of a Cell Centered Database
[Diagram: BIRN IT system. Components: hi-throughput tomography, parallel computing resources for tomography, large-scale 3D EM reconstructions, correlated LM and EM, imaging databases, database federation, a spatial database of rat brain anatomy, and models (neuronal models, modeling cellular microdomains), spanning cells and tissues, cellular processes, cellular microdomains, and macromolecular distributions. Contributors: Maryann Martone, Amarnath Gupta, Bertram Ludaescher.]
BIRN Information Integration
Example query: What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1?
[Diagram: the query is answered through an Integrated View, specified by an Integrated View Definition and computed by a Mediator that reaches sources over the Web through Wrappers: protein localization, morphometry, neurotransmission, CaBP, and Expasy.]
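The mediator/wrapper pattern in this diagram can be sketched in a few lines of Python. Everything here is hypothetical: the `Wrapper` and `Mediator` classes, the source names, the field names, and the toy records stand in for real BIRN sources and are not taken from them. Homology is represented as a fraction, so "more than 70%" becomes `> 0.70`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


@dataclass
class Wrapper:
    """Hypothetical wrapper: presents one remote source as uniform dict records."""
    name: str
    fetch: Callable[[], Iterable[Record]]


class Mediator:
    """Hypothetical mediator: answers queries against an integrated view."""

    def __init__(self, wrappers: List[Wrapper]) -> None:
        self.sources = {w.name: w for w in wrappers}

    def _rows(self, source: str) -> List[Record]:
        return list(self.sources[source].fetch())

    def cerebellar_distribution(self, min_homology: float) -> List[Record]:
        """Integrated view: rat proteins above a homology threshold to human
        NCS-1, joined with their cerebellar localization records."""
        homologs = {
            r["protein"]
            for r in self._rows("homology")
            if r["species"] == "rat" and r["homology_to_human_NCS1"] > min_homology
        }
        return [
            r for r in self._rows("protein_localization")
            if r["protein"] in homologs and r["region"] == "cerebellum"
        ]


if __name__ == "__main__":
    # Toy placeholder records, not real data.
    homology = Wrapper("homology", lambda: [
        {"protein": "protein_A", "species": "rat", "homology_to_human_NCS1": 0.85},
        {"protein": "protein_B", "species": "rat", "homology_to_human_NCS1": 0.40},
    ])
    localization = Wrapper("protein_localization", lambda: [
        {"protein": "protein_A", "region": "cerebellum"},
        {"protein": "protein_A", "region": "hippocampus"},
        {"protein": "protein_B", "region": "cerebellum"},
    ])
    mediator = Mediator([homology, localization])
    print(mediator.cerebellar_distribution(min_homology=0.70))
```

Keeping source-specific access inside each wrapper means another laboratory database can join the federation by supplying one more wrapper, without changing the integrated view.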
PDB & Genome-enabled Biology – Using Structure to Understand Function
Accelerated Drug Development; Individualized Medicine; Productive, Healthy Citizens
Environmental Remediation; Biofuels, Biocatalysts; Improved Agriculture
DNA Sequence Implies Structure Implies Function
DNA Sequence Provides Protein Sequence
CA Synchrotron Facilities Provide 3-D Protein Structure
Basis for 21st Century Medicine, Sustainable Development: Enhanced U.S. Competitiveness, Environmental Quality
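The step from DNA sequence to protein sequence is mechanical once the genetic code is fixed. Below is a minimal Python sketch using the standard genetic code (NCBI translation table 1), assuming a coding sequence that is in frame from its first base; `translate` is an illustrative helper, not part of any named library.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in T, C, A, G order,
# first position varying slowest; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate an in-frame coding sequence into one-letter amino acids.

    RNA input (U instead of T) is accepted; trailing bases that do not form
    a full codon are ignored. With to_stop=True, translation ends at the
    first stop codon, which is not included in the result.
    """
    seq = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for ambiguous codons
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # MAIVMGR
```

Mitochondrial and some microbial genomes use variant genetic codes, so a real pipeline would take the translation table as a parameter rather than hard-coding table 1.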
A Cyberinfrastructure for BIO is Needed to Extract Implicit Genome Information
PDB Status: Numbers and Complexity
(a) myoglobin (b) hemoglobin (c) lysozyme (d) transfer RNA (e) antibodies (f) viruses (g) actin (h) the nucleosome (i) myosin (j) ribosome
David Goodsell, TSRI
A Cyberinfrastructure for 21st Century Biological Sciences
• All BIO research endeavors will require this Foundation: open access, opportunities.
• BIO should define the Architecture and initiate construction, at Highest Priority.
• Create Test beds modeled on BIRN and GriPhyN; describe LTER and NEON options.
• BIO scientists will extend the framework in exciting ways beyond expectation.
Obvious Opportunities: Early BIO Test Bed Options
• Interconnecting Extant Database Activities
• Evo-Devo/Devo-Evo; Networks; Phylogeny
• Systems Biology
• Plant Genomes
• Microbial Genome Sequencing
• Ecology of Infectious Diseases
• Tree of Life
• LTER
• Biocomplexity in the Environment
• NEON
• Frontiers in Integrative Biological Research
• Research Coordination Networks