PLUS: DOCK THIS: IN SILICO DRUG DESIGN FEEDS DRUG DEVELOPMENT

BIOMEDICAL COMPUTATION REVIEW
Summer 2007, Volume 3, Issue 3, ISSN 1557-3192
www.biomedicalcomputationreview.org

Contents

FEATURES
Imaging Collections: How They’re Stacking Up, BY MEREDITH ALEXANDER KUNZ
Dock This: In Silico Drug Design Feeds Drug Development, BY KRISTIN COBB, PhD

DEPARTMENTS
FROM THE EDITOR: THE ACTIVE TRANSPORT OF IDEAS, BY DAVID PAIK, PhD
NEWSBYTES, BY KATHARINE MILLER, LOUISA DALTON, AND MATTHEW BUSSE, PhD
• Aquaporin Simulations De-Bunk Gas Exchange Assumptions
• Parkinson’s Culprit Modeled
• Clustering Without Limits
• Computer Vision That Mimics Human Vision
• Nature vs. Nurture In Silico
• Simulating Populations With Complex Diseases
SIMBIOS NEWS: IN THE (PROTEIN) LOOP, BY KATHARINE MILLER
UNDER THE HOOD: MUTUAL INFORMATION, BY CHIH-WEN KAN AND MIA K. MARKEY, PhD
PUTTING HEADS TOGETHER: CONFERENCES/SYMPOSIA
SEEING SCIENCE: REMODELING WITH CURVATURE

COVER ART BY SARA L. MALLOURE OF AFFILIATED DESIGN

Executive Editor: David Paik, PhD
Managing Editor: Katharine Miller
Science Writers: Katharine Miller; Louisa Dalton; Matthew Busse, PhD; Meredith Alexander Kunz; Kristin Cobb, PhD
Community Contributors: David Paik, PhD; Mia Markey, PhD
Layout and Design: Affiliated Design
Printing: Advanced Printing
Editorial Advisory Board: Russ Altman, MD, PhD; Brian Athey, PhD; Andrea Califano, PhD; Valerie Daggett, PhD; Scott Delp, PhD; Eric Jakobsson, PhD; Ron Kikinis, MD; Isaac Kohane, MD, PhD; Paul Mitiguy, PhD; Mark Musen, MD, PhD; Tamar Schlick, PhD; Jeanette Schmidt, PhD; Michael Sherman; Arthur Toga, PhD; Shoshana Wodak, PhD; John C. Wooley, PhD

For general inquiries, subscriptions, or letters to the editor, visit our website at www.biomedicalcomputationreview.org

Office: Biomedical Computation Review, Stanford University, 318 Campus Drive, Clark Center Room S231, Stanford, CA 94305-5444

Biomedical Computation Review is published quarterly by Simbios National Center for Biomedical Computing and supported by the National Institutes of Health through the NIH Roadmap for Medical Research Grant U54 GM072970. Information on the National Centers for Biomedical Computing can be obtained from http://nihroadmap.nih.gov/bioinformatics. The NIH program and science officers for Simbios are: Peter Lyster, PhD (NIGMS); Jennie Larkin, PhD (NHLBI); Jennifer Couch, PhD (NCI); Semahat Demir, PhD (NSF); Peter Highnam, PhD (NCRR); Jerry Li, MD, PhD (NIGMS); Richard Morris, PhD (NIAID); Grace Peng, PhD (NIBIB); David Thomassen, PhD (DOE); Ronald J. White, PhD (NASA/USRA); Jane Ye, PhD (NLM); Yuan Liu, PhD (NINDS).

From the Editor

The Active Transport of Ideas
BY DAVID PAIK, PhD

How ideas spread gets at the very fabric of scholarly research and has been studied from many different angles. Many studies examine person-to-person connectivity in social networks. Within a social network, the average path length between any two people is a key concept. By asking participants in Omaha or Wichita to mail chain letters that would get closer to selected recipients in Boston, Milgram’s classic 1967 small world experiment demonstrated the six degrees of separation concept. Movie buffs have created a board game using this concept called the Six Degrees of Kevin Bacon and those interested in mathematical genealogy have adopted Erdös Numbers linking researchers by co-authorship to the prolific mathematician Paul Erdös.

However, a small world is not necessarily a robust world. In addition to path lengths, the connectedness between different parts of the social network is an important measure. A recent Journal of the American Medical Informatics Association paper by Bradley Malin, PhD, and Kathleen Carley, PhD, examines the connection between editorial boards of medical informatics and bioinformatics journals to describe the fragility of links between these two sister fields.

There are also many ways to examine the spread of ideas more broadly. The Rogers theory of diffusion of innovation states that depending on when they adopt new ideas, people form a bell curve as either innovators, early adopters, early majority, late majority or laggards and that the innovation penetration forms an S curve over time. The five stages are awareness of the innovation, persuasion of the value of the innovation, decision to adopt the innovation, implementation of the innovation and confirmation of the value of the innovation. Although broadly meant to describe the cultural spread of ideas and technology, it applies well in the narrower context of academic research. While the last four stages are well covered by traditional research activities, it is the initial stage of becoming aware of new ideas from far afield that is often the rate limiting factor and the least formalized in research.

As a great believer in the power of cross fertilization, I think that diffusion is too passive a metaphor; I prefer instead to think in terms of the active transport of ideas and places where I can search out sources that facilitate long range transport.

I’ve recently found inspiration for orthogonal thinking from several unconventional sources. The TED (Technology, Entertainment, Design) Conference features a diverse set of inspiring speakers and is podcasted on the web. Edge Foundation is a web-based publication that includes the World Question Center annually featuring a grand yet simple question asked of numerous notable scientists. On the more focused topic of biomedical computation, the NIH Biomedical Computing Interest Group hosts webcast seminars, book clubs, tutorials and brainstorming events.

Although things are changing, academia is still hampered by the inertia of traditional boundaries between disciplines that form unintentional energy barriers against the diffusion of ideas. Just as a retreat or a sabbatical can provide a refreshing perspective, a foray into some areas that may seem off topic can also provide a little dose of hybrid vigor to one’s work.

DETAILS
Technology, Entertainment, Design (TED) Conferences: http://www.ted.com
Edge Foundation: http://www.edge.org
NIH Biomedical Computing Interest Group: http://www.nih-bcig.org

NewsBytes

Aquaporin Simulations De-Bunk Gas Exchange Assumptions

Biologists have long taken gas exchange for granted, assuming that gases simply seep through the cell’s lipid membrane. Since 1998, however, evidence has been building that gases might also be exchanged through pores created by specialized proteins.

Now molecular dynamics simulations of aquaporins have weighed in on the question. The result: “It’s now well established that these proteins can conduct gas molecules,” says Emad Tajkhorshid, PhD, co-author of the work and assistant professor of biochemistry, pharmacology and biophysics at the University of Illinois at Urbana-Champaign. But, he says, some uncertainty remains: “Whether or not it’s important in the human body, that’s the controversial part.” The work was published in the March 2007 issue of the Journal of Structural Biology.

Fifteen to twenty years ago, scientists believed that water permeation through lipid bilayers was enough for water transport into and out of cells. Gradually, though, researchers realized that some cells need to control water permeability, and other cells have lipid bilayers that aren’t very permeable to water. Aquaporins, it turned out, carry water in and out in a controllable fashion. “I think the same might be true for gas permeability,” says Tajkhorshid. “Gas permeability of a lipid bilayer is like an open free highway where everything can go through. With a protein, you can have a gating mechanism and some regulation.”

One of Tajkhorshid’s collaborators, Walter Boron, MD, PhD, professor of cellular and molecular physiology at Yale University, has been working on gas exchange experimentally for about ten years. To him, aquaporins are a likely suspect for gas conduction because they exist in places where oxygen must go in and carbon dioxide must come out. For example, they are plentiful in cells that line the lung, in red blood cells, and in astrocytes (cells at the blood-brain barrier). But it’s very hard to measure small changes in oxygen concentration at the surface of a membrane experimentally.

So Tajkhorshid’s team pitched in with molecular dynamics simulations. Aquaporins occur in groups of four (tetramers), with four pores that conduct water (one through each aquaporin molecule) and one central pore where the molecules meet. The latter, until now, had no known function.

When simulated using two complementary methods (explicit sampling with full gas permeation and implicit ligand sampling), the team found both oxygen and carbon dioxide were exchanged through that central pore. Carbon dioxide was also transmitted through the four water pores, while oxygen passed through those pores only rarely. The research also found, however, that a plain lipid bilayer conducts two and a half times as much gas as one embedded with aquaporin tetramers. “The question is whether this pathway is significant and makes any difference in terms of total permeability of the membrane,” says Tajkhorshid.

The researchers hypothesize that, as with water permeability, aquaporins may be physiologically relevant to gas exchange when cells have dense, rigid lipid bilayers or when aquaporins occupy a major fraction of the membrane. Tajkhorshid plans to introduce point mutations inside the central pore and manipulate the behavior of a gating loop to see how that changes the conducting properties of the central pore. Meanwhile, Boron’s group is looking for a system in which gas conduction through aquaporins is a major pathway. Says Tajkhorshid: “Even if it’s 30 percent of total gas permeability, it becomes physiologically relevant because then you can control it.”

According to Nazih Nakhoul, PhD, research associate professor in biochemistry at Tulane University, “This idea of gas transport through membrane proteins is really gaining support. It’s interesting to see molecular dynamics simulations confirm some of the earliest findings.”
By Katharine Miller

Figure: Simulations of the aquaporin tetramer found that carbon dioxide and oxygen are exchanged through the central pore, a site of previously unknown function. Image courtesy of Emad Tajkhorshid, a faculty associate of the NIH Resource for Macromolecular Modeling and Bioinformatics, and his UIUC colleagues Klaus Schulten, Yi Wang, and Jordi Cohen.

Parkinson’s Culprit Modeled

Under a microscope, the curious protein clumps that dot the brains of Parkinson’s patients stick out like the culprits they are. But no one has yet caught the protein, alpha-synuclein, in the act of causing disease. Now, investigators report in an April 2007 issue of FEBS Journal that they’re getting closer: they’ve modeled alpha-synuclein’s early aggregation and offered a detailed mechanism for its participation in neuron death.

“This is not just the first computational model of alpha-synuclein,” says Igor Tsigelny, PhD, an author of the paper and a computational biologist at the San Diego Supercomputer Center. “Up to now, there was no molecular concept of the aggregation going on.”

In the brain cells of Parkinson’s patients, alpha-synuclein first starts to cluster as a proto-fibril. It then forms fibril chains, and finally ends up in the dense clumps of fibrils called Lewy bodies. Some researchers have suggested in the past few years that alpha-synuclein knocks off neurons right at the beginning of aggregation, long before it can be detected as a Lewy body. Biochemical and structural evidence hints that when a few alpha-synuclein molecules first self-assemble into proto-fibrils, they can form pore-like ring structures. These may interact with the cell membrane and allow ions to enter the cell. The entrance of ions such as Ca²⁺ could lead to neuron death.

The computer model created by Tsigelny and his colleagues at the University of California, San Diego, supports this theory, providing detailed dynamics of alpha-synuclein hexamers and pentamers and their interaction with the cell membrane. What’s more, the model shows that another synuclein in the cell, beta-synuclein, blocks alpha-synuclein’s ring-making, suggesting at least one avenue for future inhibitory drug development.

Modeling such a complex aggregation wasn’t simple. Alpha-synuclein is a large protein (140 amino acids), and to model its hexamer interacting with the cell membrane required juggling around a million atoms, Tsigelny says.

Yet more than the size of alpha-synuclein, what made it difficult to model was its lack of structure. Alpha-synuclein is an intrinsically unstructured protein, one without a distinct three-dimensional shape. Most proteins consistently fold into a favored shape to do their jobs, a form that can be crystallized, imaged, and pored over. But unstructured proteins flop this way and that, even while performing their specific tasks, making them very difficult to pin down and study.

“We were not scared by an unstable protein,” Tsigelny states. And he and his coworkers developed an unusual “all-dynamic” approach to modeling the protein. None of the conformations are final: they are all considered intermediate and each may last only as long as half of a nanosecond. Nevertheless, Tsigelny says, even such fleeting intermediates may aggregate. The pore-like aggregates, they found, are far more stable than single molecules of alpha-synuclein.

Having this model “is one step forward,” says Hilal Lashuel, PhD, professor at the Swiss Federal Institute of Technology in Lausanne, Switzerland. The UCSD model provides a structural basis for testing the hypothesis that alpha-synuclein forms toxic pores, he adds. But Lashuel also cautions that only biochemical and in vivo studies can prove whether alpha-synuclein pokes holes in neurons. “Isolating the toxic species is really the most difficult question we are dealing with. You have to catch it in the act.”
By Louisa Dalton

Figure: Alpha-synuclein poses as a pentamer, pore-like, on the surface of a cell membrane. Courtesy of Igor Tsigelny.

Clustering Without Limits

Starting in preschool we all learn how to get organized. Typically, we start with pre-determined categories (dolls, trains, blocks); pre-set ideas about what belongs in each category (Barbie: doll; Thomas the Tank Engine: train) and a fixed number of bins to put things in.

But what if you started with none of those initial limitations? Could you still group the toys? It turns out that, in a computer, such sorting is not only possible, but extremely efficient. Using a
Using a complicated to Frey and Dueck use affinity propagation novel algorithm called affinity propaga- tion, researchers at the University of derive, it’s quite to cluster data around “exemplars”— data points that best represent their Toronto found that they can not only cluster lots of different kinds of data simple to implement compatriots. In this graphic, after start- ing with an equal chance of serving as an appropriately, but do it better and faster than other methods. The work was and to get an exemplar, candidates for that job have already emerged (red dots). Each data published in the February 16 issue of Science. intuitive feel for it,” point sends messages to each candidate exemplar conveying how well it repre- “Almost all existing techniques work on a hypothesis refinement basis: they says Brendan Frey. sents the blue point compared to other candidate exemplars. And candidate start off with a set of assumed groups exemplars send messages conveying and iteratively refine them,” says their availability to serve as an exemplar Brendan Frey, PhD, associate professor for particular data points. of electrical and computer engineering The task sounds mind-boggling: There at the University of Toronto, co-author are a huge number of possible groupings. of the paper. “To our knowledge, ours is But affinity propagation handles that says Dueck. Indeed the algorithm is so the first algorithm to consider all possi- problem by sending messages between generic that Frey and Dueck used it to ble groupings at once.” data points—pair-wise—so as to maximize analyze gene expression data, facial the net similarity in images, and airline routes, while other each group. “Each mes- researchers have found applications in sage encapsulates or basketball statistics, the stock market and summarizes a whole dis- computer vision. 
And many tasks in com- tribution of possible putational biology require a computer to groupings for one of the organize the data before using it to make data points,” says predictions. Delbert Dueck, a PhD “Part of the attraction of the algo- candidate in Frey’s lab. rithm is that, although it was complicat- “No one has done that ed to derive, it’s quite simple to imple- before.” ment and to get an intuitive feel for it,” Affinity propagation says Frey. There are basically only two is based on an algo- equations to it. “Sometimes we’ll give a rithm called belief prop- talk and get emails from people who’ve agation, which has been implemented it the day after,” he says. around in various incar- When the researchers looked at how nations for many years. well the algorithm performed compared But, say the authors, it’s to other clustering methods they found an approach that has it remarkably efficient. “A problem our never been applied to algorithm could solve in about five min- If asked to cluster facial images, a standard clustering method clustering. “Certainly utes on one computer would take other (k-means clustering) would take up to a million years on a sin- not to generic clustering methods up to one million years to solve gle computer to achieve the accuracy achieved by affinity prop- of any type of data,” on that same computer,” says Frey. agation after five minutes. 4 BIOMEDICAL COMPUTATION REVIEW Summer 2007 www.biomedicalcomputationreview.org NewsBytes Tim Hughes, PhD, of the Center for lished out of the lab run by Tomaso was able to classify pictures of a busy Cellular and Biomolecular Research at Poggio, PhD, at MIT’s McGovern street scene as well as other leading the University of Toronto, is considering Institute for Brain Research. mathematics-based computer vision sys- using affinity propagation in his For decades, scientists have struggled tems, as described in the March 2007 research. 
“It seems like it would do best to create computer programs that can rec- issue of IEEE Transactions on Pattern when things really do form independent ognize visual objects as well as humans Analysis and Machine Intelligence. groups, and when the data are can. Some computer systems excel at rec- Serre’s team then built a more com- fairly sparse, so most of the correlation ognizing one particular object, but none plex system, consisting of many S and C matrix can be dropped in early are anywhere close to recognizing the wide layers designed to closely match the flow cycles,” he says. “I think it will work well range of objects observed by the human of information in a human brain during with exon-profiling data or brain. Visual the first 100-200 genome-tiling data, where there is also a recognition is milliseconds of constraint that the groups complicated by “We’ve built a model perception. This have to correspond to regions near each two conflicting enhanced system other on the chromosome.” goals: a program to be as close as performed as well —By Katharine Miller must be specific as humans on a enough to discrim- possible to what is rapid object recog- inate between nition task: distin- Computer Vision that different objects, such as a person known about the guishing animals from non-animals Mimics Human Vision Our brains can recognize most of the or a car, yet flexi- ble enough to rec- human visual when images were flashed in front of things we pass on an evening stroll: ognize the same humans and com- Cars, buildings, trees, and people all reg- type of object in system,” says puters. The work ister even at a great distance or from an different sizes, appeared in the odd angle. Now, a new computer vision poses, and light- Thomas Serre. April 2007 issue of program can do the same thing. It suc- ing. the Proceedings of cessfully rivals the human ability to rap- To achieve these goals, Serre and col- the National Academy of Sciences. 
The idly recognize objects in a complex pic- leagues used data recorded from real computer system even made errors simi- ture because it mimics how information neurons in the visual system to program lar to the errors made by humans, sug- flows during the initial stages of visual two fundamentally different kinds of vir- gesting that the model recapitulates the perception. tual neurons called S (simple) and C early processes of the human visual sys- “We’ve built a model to be as close as (complex) units. S units recognize specif- tem. possible to what is known about the ic features of an image; C units monitor The model will be used as a tool by human visual system,” explains Thomas a range of S units in one area and allow neuroscientists to better understand the Serre, PhD, a postdoctoral associate in for variation in position and size. human visual system, and also has prac- the Center for Biological and The researchers were surprised to tical applications for surveillance, driv- Computational learning at MIT and find that a simple system, consisting of ing assistance, and autonomous robot- lead author of two papers recently pub- four alternating layers of S and C units, ics. According to Poggio, the team’s next When presented with a real-world street scene (left), Serre’s computer vision system successfully recog- nized pedestrians, cars, buildings, trees, sky, and the street (right). Although not pictured, the model also successfully identified bicycles. Note the error in this example: the model mistakenly classified a street sign as a pedestrian. Graphic cour- tesy of Stanley Bileschi, PhD, McGovern Institute for Brain Research at MIT. www.biomedicalcomputationreview.org Summer 2007 BIOMEDICAL COMPUTATION REVIEW 5 NewsBytes goal is to extend the model to include the “back projections” from other parts of the brain that allow feedback process- ing of visual information after 200 mil- liseconds. 
“This is the first demonstration that a purely bottom up approach to visual object recognition, inspired by recordings from the neurons in the brain, is effective as a practical computer vision system,” says Terry Sejnowski, PhD, head of the Computational Neurobiology Lab at the Salk Institute. “There is much more work to do, both to improve its performance, and also to use it to better understand how our own visual system works.”
By Matthew Busse, PhD

Nature Versus Nurture In Silico

Every generation, a few nonconformists crop up in tissue cultures of genetically identical cells. The question is: are the wayward simply born that way, or did something in the environment affect them? “You have these two possibilities: intrinsic or extrinsic, nature or nurture,” says Andras Paldi, PhD, a biologist at Genethon in France.

Now, Paldi and his colleagues have modeled such cultured cells to determine whether extrinsic or intrinsic influences play a key role in the spontaneous emergence of phenotypic variation. It turns out that for spatial patterns beyond randomness to arise, there has to be some effect of sensing neighboring cells; that is, extrinsic factors must play a role. And the extrinsic model resembles results seen in real cells. The work appears in April in PLoS One.

Figure: Agent-based computer models predict the pattern (left) produced when genetically identical cells have an inherent probability of changing (from green to red and vice versa), and the pattern (right) produced when cells are triggered to change by an extrinsic factor, such as cell density. Top images represent exponential growth; bottom are at equilibrium. Courtesy of Andras Paldi.

Paldi’s work was motivated in part by the open question among stem cell biologists of what triggers a stem cell to differentiate. Why, in the same warm spot, getting the same rich media, do some cells differentiate and others stay stem cells? It is commonly assumed that this is because the decision to differentiate is intrinsic, that is, purely random.

To test that assumption, Paldi’s group started by designing two simple, multi-agent based models of a tissue culture plate. In each model, all cells act independently and can switch between two cell types: A or B. In the “extrinsic” model, A cells turn into B cells when it gets crowded, and back to A cells when they have more space. In the “intrinsic” model, each cell has fixed probabilities of switching from A to B and back again.

When the scientists ran the models, they found each produces a stable, heterogeneous population, yet they differ in the cell patterns. The intrinsic model predicts lone A cells distributed evenly throughout a largely B population. Extrinsic predicts that the A cells will cluster. The result held even though the cells were allowed to migrate.

This pattern difference allowed the researchers to compare their computational simulation with real cells. Using a muscle cell line that can switch between two distinct phenotypes, a stem-cell like progenitor state and a differentiated state, they found that the cell pattern mostly resembles that of the extrinsic model. Many of the rare, stem-cell like cells cluster; a few are solitary.

What’s important here, Paldi says, is that they find environment playing a role, and a significant one. In the case of stem (progenitor) cells, it means neighbor cells can affect the differentiation process. “The stem cell nature is not an intrinsic property of the cell,” he says. “It is a property of the whole cell population.” Paldi further believes the work supports the effort to find a way of converting adult, differentiated cells into stem cells (and avoid the need for harvesting embryonic stem cells), a possibility that has not just scientific, but social and political implications as well.

Christa Muller-Sieburg, PhD, however, disputes that scientific conclusion. “The idea that mature cells can turn into stem cells is very attractive to many modelers but has little support through experimental data,” says the professor at the Sidney Kimmel Cancer Center.

Sui Huang, MD, PhD, at Children’s Hospital Boston, would have liked to see Paldi’s group perturb the cell line or the culture to confirm their model. But both he and Muller-Sieburg believe the study addressed an important question, that of heterogeneity of a genetically identical population of cells. And, says Huang, it certainly “contributes to the discussion in the community.”
By Louisa Dalton

Simulating Populations with Complex Diseases

Diabetes, breast cancer, multiple sclerosis, Alzheimer’s disease. All are associated with several genes’ alleles interacting in complex ways with one another and the environment. Now, using a computationally intensive method known as forward-time simulation of human populations, researchers are hoping to gain a better understanding of how such complex diseases become established.

“In a real population you just see people with the disease,” says Marek Kimmel, PhD, professor of statistics at Rice University and co-author of the work. “You don’t see who in the population has the disease genes because people carrying these genes do not necessarily become diseased.” But in the model population, he says, “you see both.” And the researchers’ approach allows them to simulate a very complicated scenario, including changes in types of selection pressure.

“This lets us evaluate how well statistical genetics tests determine what genes are responsible for the symptoms of a disease and how frequently those genes appear in the population.” That’s a non-trivial exercise, he says, because it has been impossible, until now, to compare the many existing gene-mapping methods head-to-head. The work was published in PLoS Genetics in March 2007.

Figure: illustration labeled Cancer, Multiple Sclerosis, and Diabetes.

Before now, the most commonly used approach to simulating diseases in human populations, called the “coalescent” method, worked by coalescing backward in time to a most-recent common ancestor. But it’s extremely difficult to take selection into account using the coalescent method, says co-author Bo Peng, PhD, a postdoctoral fellow at the University of Texas MD Anderson Cancer Center. Moreover, that approach gets too complicated if more than one disease gene is involved. So Peng and his colleagues turned to forward-time simulation, an approach that’s been around for about one hundred years.

But that technique is not without its problems. When a population evolves forward in time, there are simply too many possible outcomes. Most notably, when you introduce a disease allele, it can rapidly be eliminated and replaced with new alleles. So Peng came up with a trick: He pre-sets desired disease allele frequencies in the current generation, extrapolates them backward, and starts the simulation from there. As Kimmel puts it, “We are restricting potential variability in one aspect of the present in order to produce a simulation that resembles something close to the actual variability that exists now.”

The simulation uses a scripting language called simuPOP, a general-purpose forward-time simulation environment based on Python. The software is freely available at http://simupop.sourceforge.net, under a GPL license.

When Peng and his colleagues used their method to compare several gene mapping techniques, they found that certain methods worked better for loci that were located distantly from one another, and other methods were more effective when loci were close together. Overall, though, says Kimmel, “We’re mildly pessimistic” about current gene mapping approaches. “When the number of loci involved in complex disease is greater than two, the methods rapidly lose their power.”

Until recently, gene mapping for complex diseases has been disappointing, he says. Loci identified in such efforts have later turned out to be statistical artifacts. “Our modeling could figure out if this is inevitable,” he says, and help guide people toward more effective approaches.

David Balding, PhD, a professor of statistical genetics at Imperial College in London, does similar work using forward-time simulations of large genomic regions. He has become pessimistic about the method’s usefulness for understanding complex diseases because no one really knows what kind of selection is going on. Nevertheless, he says, this work can be useful for studying selection itself. “People tend to look at selection one allele at a time,” he says, “But forward-time simulation lets us do it with complex interactions.”
By Katharine Miller

Imaging Collections: How They’re Stacking Up
BY MEREDITH ALEXANDER KUNZ

In the beginning there was the Visible Human. It broke new ground by gathering some 2,000 serial images from a death row inmate’s cadaver, and was the first time researchers had sectioned a single human being and gotten it right. But the project broke new ground in another way as well. As the first large, publicly-available image collection, it proved that “If you build it, they will come,” according to project director Michael Ackerman, PhD, of the National Library of Medicine (NLM).

The Visible Human was initially envisioned as a tool for teaching anatomy.
But soon after the database launched in 1994, use agreements started pouring in from scientists who wanted to create 3-D images to test for radiation absorption or design artificial hips and knees, and from artists illustrating anatomical injuries in court cases, to name just a few of the dozens of projects based on the Visible Human data.

Despite the suggestion that such large image collections could inspire new types of research, the Visible Human Project remained the only public imaging database available for many years. During that time, large public databases in other fields, most notably genomics and proteomics, created whole new realms of research. Today, unlike genetic sequence data, which are centralized in GenBank, and protein structures, which reside in the Protein Data Bank (PDB), imaging data still lack a central repository. But an increasing number of people are hoping to create image collections from thousands of people, and not just one prisoner in Texas.

Figure caption: This section through the Visible Human Male’s thorax shows his heart (with muscular left ventricle), lungs, spinal column, major vessels, and musculature. Image courtesy Michael Ackerman, Visible Human Project, National Library of Medicine.

The question is whether the shift from examining images one at a time to looking at them in large groups will not only lead to better research of the type already done today, but will create something fundamentally different. Just as the field of genetics transformed into genomics when biologists moved from looking at individual genes and diseases to examining the whole genome, so too imaging could see a shift.
A field that has traditionally studied narrowly defined problems using small collections gleaned from physician-collaborators could find itself faced with huge collections and the potential to reveal new correlations between diseases, genes, and anatomy. As in genomics, it will be possible to look at variation both within and between diseases like never before.

Before this transformation can happen, though, a leap of faith is required: Researchers must share their images now in hopes of greater rewards later. That’s one of the current challenges researchers are tackling. There are others as well: Researchers must find ways to increase computer storage capacity; create a common language for describing images; develop standards for “metadata” that will explain where an image comes from and what it shows; find ways to map images from different individuals onto an agreed-upon “model”; and improve existing ways to analyze and interpret images consistently. They also must make images available remotely, so that physicians in rural areas will have access to large comparative collections.

As these barriers fall and imaging collections become more readily available, imaging researchers will be able to do what genomics researchers do all the time: look at human systems in their entirety rather than in pieces.

But before we get ahead of ourselves, let’s review the challenges.

BUILDING AND SHARING THE COLLECTION

Creating image data is easier than ever. Imaging capacity has increased by leaps and bounds. X-ray technology, developed in the 1890s, was followed by incrementally stronger imaging methods, from ultrasound (widely available in the 1970s), to positron emission tomography or PET (1970s), to computerized axial tomography or CT scans (1970s), to magnetic resonance imaging or MRI (early 1980s) and functional MRI (early 1990s). New techniques are still appearing.

And with major improvements in data storage and networking, scientists do not worry as much about amassing bigger data sets. Big disks are relatively cheap (researchers might pay around $3,500 for a terabyte of storage) and the capacity of computer networks to transmit large images is ever improving. Fred Prior, PhD, of Washington University School of Medicine in St. Louis, recently purchased space to store new research images he expects will be generated during the next three years at the Electronic Radiology Laboratory, which he directs. His team’s new Network Attached Storage system from BlueArc can hold 102 terabytes, with an option to expand to 500 terabytes or, with an upgrade, to 4,000 terabytes (4 petabytes), a number once unthinkable. And that does not even include clinical imaging, another huge figure.

Even with such imaging, storage, and computing power in hand, a question remains: how to motivate other researchers to share their images?

Scientists feel a sense of proprietary ownership over the images they have collected. While patients can perhaps stake the greatest claim to the images, most images are technically “owned” by the institution where they were made, and specialists carrying out imaging projects feel they should be the first to reap the benefits of the information the images contain, rather than having to share the data.

“Science is highly competitive. Scientists want to get the first publication, to gain funding, and get academic promotions,” says Arthur Toga, PhD, head of the Laboratory of Neuro Imaging (LONI), at the University of California, Los Angeles.

Indeed, in 2000, a spat erupted in the brain imaging world when Michael Gazzaniga, PhD, director of the National fMRI Data Center, wrote to fMRI specialists who had contributed to the Journal of Cognitive Neuroscience, telling them they would be required to share their experimental data with the center if they wished to publish in journals including Science and the Journal of Neuroscience. Researchers immediately raised objections, sending a letter to the center’s financial backers and 14 journals. Releasing their images, they argued, “impinges on the rights authors should have on the publication of findings stemming from their own work.” The center decided to establish a “data hold” for a period of time, to allow authors to profit from their images first.

Maryann Martone, PhD, has run up against some of the same issues. As co-director of the National Center for Microscopy and Imaging Research (NCMIR) at the University of California, San Diego, she has led the creation of the Cell Centered Database (CCDB), one of the first Internet databases for cell-level structural data. She also coordinates a project supported by the Biomedical Informatics Research Network (BIRN) that investigates mouse models of human neurological disease.

Figure caption: Researchers have shared abundant images in the Cell Centered Database. Here, a screenshot shows the types of images and movies available. Image courtesy Skip Cynar, National Center for Microscopy and Imaging Research, University of California, San Diego.

“These resources were created with the idea that people were going to populate them from the community, but neuroscientists who do complicated imaging studies are not that happy about having data out there before they can mine it,” she says. Because NCMIR is a “technology development center” funded by the NIH, she says, it has a mission “to serve a large collaborative community.” So she decided to begin with her own center’s data and hope that others would follow: “We do imaging that is unique. I figured, if we just took all the data around here and made it available, that would be helpful.” It was: the project was one of the first web databases devoted to electron tomography when it launched in 2002. Since then, it has continued to give access to complex cellular and subcellular data from light and electron microscopy. Meanwhile, Martone and colleagues are still thinking about the best ways to encourage other research groups to share their data with the site.

As so often happens in the world of science, it is funders, in particular big government-sponsored efforts, who are beginning to change the rules of the game. One project aiming to put its arms around as many images as possible is caBIG™. Launched in 2004 by the National Cancer Institute (NCI), it embraces 50 cancer centers and 30 other organizations. caBIG™ is an attempt to bring together the huge amounts of data gathered and tools created in NCI-funded cancer clinical trials. It aims to take an “open source” approach, creating an environment of sharing information in the work it funds. According to some, this is the wave of the future.

“Increasingly, the NIH is requiring that people share data,” says Daniel Rubin, MD, MS, a clinical assistant professor and research scientist at Stanford University Medical Center. Clinical trial information, for instance, is becoming more readily available, Rubin says. He points to the American College of Radiology Imaging Network (ACRIN) as an example of this trend. This NCI-funded group hosts an imaging database that houses a large archive of clinical trial imaging data in cancer fields.

Toga thinks that it is ultimately in a scientist’s self-interest to share. Lots of data is needed if scientists want to identify subtle differences between images, he says. “You can’t possibly collect it on your own.” What helps, he says, is when a couple of folks get together and say, “I’ll share mine if you share yours,” which is becoming more common.

METADATA: CAPTURING THE CONTEXT

One cooperative project in which Toga has been involved is the NIH-sponsored Alzheimer’s Disease Neuroimaging Initiative (ADNI), which encompasses 60 different sites that are sharing image data on the disease. But if a researcher looks at an ADNI image without knowing whether the patient has a disease or not, or without access to the person’s age or gender, or the drugs he or she has been taking, it becomes much less useful.

One of the most important parts of collecting large amounts of imaging data is also to capture each image’s back story: the context in which it was made and the condition of the patient at the

Figure caption: Brain imaging studies are expanding into ever-larger populations. This enables digital atlases to be developed that synthesize brain data across vast numbers of subjects. Mathematical algorithms can exploit the data in these population-based atlases to detect pathology in an individual or patient group, to detect group features of anatomy not apparent in an individual, and to uncover powerful linkages between structure and demographic or genetic parameters. In this image, researchers from UCLA’s Laboratory of Neuro Imaging (LONI) have used composite tensor mapping to show how Alzheimer’s patients’ brains exhibit loss of gray matter. Courtesy of Dr.
Arthur W. Toga, Laboratory of Neuro Imaging, UCLA.

time. For images, efforts to create a framework for recording such information, known as metadata, currently lag behind efforts in other realms (e.g., the “MIAME” standards for microarray data). But work is now underway to improve the situation.

Some metadata, such as a patient’s name, home address, and identifying features, must be removed before images enter a large database. The process of “de-identification of protected health information” follows federal privacy regulations.

But other useful information needs to be incorporated into image collections. Before image metadata can make sense, though, more standardization needs to be introduced into the field, many say. Radiologists have a long tradition of looking at images with expert eyes and dictating a free-flowing analysis, which becomes a text report that often uses terms in unique ways. That makes it difficult for other scientists or doctors to understand the image’s context and content in a uniform way.

Attempts to collect and codify metadata are already well underway. One of caBIG™’s initiatives in its In-vivo Imaging Workspace is called “vocabularies and common data elements,” an effort to standardize terminology in cancer analysis. Rubin, one of the group’s co-leads, reports that they are trying to structure radiology imaging findings, to establish controlled terminologies for radiology, and to associate specific metadata about patients with each image gathered.

Indeed, such efforts do not end with cancer research, but could sweep across all aspects of radiology. Rubin is also involved with a project called RadLex, which is being created to offer a uniform lexicon for radiologists. RadLex plans to unify radiology term standards and to make the new terminology freely available on the Internet. Rubin sees these attempts to create a common vocabulary as the first steps in making metadata meaningful and useful for researchers and clinicians alike.

COMPARING IMAGES: SNAPSHOTS AND SCALES

The race to create useful imaging collections faces another hurdle: how can multiple images be compared in a way that makes sense? Each human’s body parts are shaped differently, with variations that range from slight to immense. On top of that, describing shape is notoriously difficult. Though shape has been explored by the scientific community since the time of the Greeks, we still have no quantitative parameters for defining the shapes of “normal” human organs, let alone those suffering from disease. In addition, images are affected by the exact place and time they are taken, and the precise method used to take them. All this serves to undermine any straightforward database of imaging data.

“Image data is a snapshot of one instance of a thing at one time under certain conditions. It’s not a ground truth like a gene sequence,” says Martone.

If all images can be standardized in the way they are conducted (that is, the types of equipment used, the kinds of patients included, and the disease(s) being examined), comparison becomes easier. That is part of the success of ADNI, according to Toga: its research sites are required to follow strict protocols for their equipment and image acquisition.

Imaging specialists have also come to rely on the best available scientific means of shape comparison, and they incorporate this material into their collections. One example is in neuroimaging, where pictures of the brain are often linked to coordinate systems. Like a road map, these identify what parts are found where with reference to a grid or common starting point. For example, Talairach coordinates measure distances from a specific spot in the brain, the anterior commissure.

However, researchers find fault with existing coordinate systems because they fail to accommodate variation in large populations. While they may serve well for a single human or animal, they are not as helpful when scientists aim to “warp” many individuals onto a common model to illustrate the workings of a disease, for example. As a result, some recent brain atlases have developed their own, mathematically complex methods for mapping variability in big groups onto a single framework.

In human brain mapping, researchers have found novel ways of dealing with natural variation between human brains. Toga reports that the 15-year-old International Consortium for Brain Mapping (ICBM) describes the brain in a probabilistic sense. For example, the atlas might try to tell viewers that there is an 80 percent likelihood that the basal ganglia are in a particular location that has been set out by coordinates.

Another means of handling variation is evident in the Allen Brain Atlas, an extensive mapping of the mouse brain’s gene expression created by the Allen Institute for Brain Science in Seattle. The team behind this atlas created its own coordinate system to ensure extra accuracy.

The ABA is a union of neuroscience, genetics, and informatics. To map gene expression onto the 3-D mouse brain model, a team of neuroanatomists drew all the regions of the brain, and then “we lofted those regions onto a 3-D model of the brain using informatics algorithms,” says Michael Hawrylycz, PhD, director of informatics at the Allen Institute for Brain Science. Using high-level computations, an image of gene expression was then mapped onto the reference atlas’s coordinates, creating pictures that form the database. ABA scientists chose one mouse to be the reference model, and the rest of the mouse data was warped to fit into the spatial framework of that single animal’s brain. “We wanted a mouse that was held under exactly the same conditions that we were going to run the genes under,” Hawrylycz says.

Figure caption: The Allen Brain Atlas produced this 3-D reconstruction showing normal expression of mannosidase 1a in the adult mouse brain viewed from the front left. The translucent forms represent the left half of the brain and reflect the underlying standard anatomical reference framework to which the gene expression data was registered. Each colored sphere reflects expression of the Man1a gene in a 100 μm³ area. The size of each sphere corresponds to expression density, and the color reflects expression level. The large red arc indicates that this gene is turned on strongly in the hippocampus, a part of the brain known to be involved in learning and memory. The image was generated from the Allen Brain Atlas (www.brain-map.org) using the 3D visualization tool, Brain Explorer. Courtesy of the Allen Institute for Brain Science.

Another vexing challenge for image comparison is the issue of scale. Martone points to the problems confronted by brain researchers when they try to see the workings of a disease on multiple scales in a large set of images taken using different technologies.

“We go from MRIs, to optical microscopy, to electron microscopy, then to X-ray crystallography,” she says. “Every time you traverse scales, there are gaps. Every time you switch techniques, you lose continuity.” Even the contrast mechanisms are different, so one scale may show fluorescence while another is grayscale, disorienting researchers. It’s like being confronted with a GPS tracking image of a moving vehicle one minute, and a Polaroid photo of the vehicle’s front wheel the next.

To combat confusion, Martone’s team is trying to create new coordinate and reference systems that ease the transition among scales when studying neurons in the brain. She cites a new software project that attempts to correlate microscopy with “feature-based matching systems” that describe the attributes of such cells in a uniform way.

ANALYZING IMAGES IN THREE DIMENSIONS

Those who set out to compare images are also getting help from advances in image analysis software, a field that has advanced rapidly in the past few years. Ron Kikinis, MD, professor of radiology at Harvard Medical School, has helped lead the way. He and colleagues developed the “3D Slicer” image analysis software, initially a joint, open-source effort between the Surgical Planning Lab at Brigham and Women’s Hospital, where Kikinis is founding director, and the Artificial Intelligence Lab at MIT. Created to help visualize medical image data in 3-D, it has been used with success in fields as far flung as astronomy and geology.

Although Slicer was conceived as an interactive tool for processing single images, it is also useful for researchers working with large sets of images, Kikinis says. “Now people are beginning to build informatics frameworks to hold and manage images; and soon people will shift focus to how to process those images,” Kikinis explains. “With all the progress in image acquisition, you still need to turn data into medically relevant information, and that requires image analysis,” he says. The current version of Slicer is interoperable with BIRN’s informatics frameworks and is also linked directly to the National Cancer Imaging Archive (NCIA), a large repository of cancer trial images, as a recommended viewer for its images. Slicer can be used to review image sets for prototyping and results for quality assurance. For example, before processing hundreds of images, it’s wise to test your algorithms and procedures with a handful first. That’s where Slicer’s interoperability with large databases can be used as a tool that offers essential functionality.

Another fundamental tool available to image users is the Insight Toolkit (ITK), which Ackerman of NLM says took some three years to develop. Based on GE’s Visualization Toolkit (VTK), ITK’s algorithm allows a user to identify a body part (for instance, the heart) and then ask the tool to draw a line around everything that looks like heart tissue. “Up until now, you’d have to do that by hand,” says Ackerman. The tool saves users’ time and is constantly being updated, making it ever more efficient.

Other complementary efforts are working to ensure that researchers in distant labs can create their own image analysis applications on a lab workstation. Fred Prior has worked with other researchers to oversee creation of the Extensible Imaging Platform, or XIP. “The idea is that there are lots of commercial workstations that are optimized for clinical reading, lots of research packages like Slicer, and great toolkits like ITK that give you functionality, but what’s missing is a way to build custom applications for these tools,” says Prior.

XIP will give users a “rapid development environment,” he says, enabling researchers to do image processing more easily. XIP’s initial targets are cancer researchers already working in the grid, but its potential is much greater.

“We’re hoping we’ll see a cottage industry building new applications in this XIP framework to do things like virtual colonoscopy and radiation therapy analysis,” Prior says. The “slick part,” in Prior’s words, is that such applications could be run through the grid and offered to other researchers remotely through the platform, creating a whole new level of sharing.

In quite a different application of image analysis, some researchers are homing in on new ways to help scientists and doctors find the images they need using tools that analyze an image’s content rather than its metadata. Known as content-based image retrieval, these programs also strive to overcome errors caused when inaccurate text-based keywords lead to mismatches in retrieving images, write Paul Miki Willy and Karl-Heinz Küfer, PhD, of the German Fraunhofer Institut Techno- und Wirtschaftsmathematik in a 2004 paper. Content-based programs attempt to index images according to visual features such as color, texture, and shape. Ultimately, some hope that these systems might allow a physician to click on an image of a cancer in a particular patient and ask a database to show similar images for comparison. So far, this technology has not yet reached a wide audience; some believe more work is needed to ensure accuracy in such searches.

Figure caption: The Cell-Centered Database, a project of the National Center for Microscopy and Imaging Research, brings together data from different experiments so that multi-scaled views can be created, helping scientists to study how higher order structures, such as cellular networks, are assembled out of finer building blocks, such as dendritic architectures. This montage shows seven orders of magnitude of scale from centimeters to nanometers. A slice through a centimeter-sized mouse brain was obtained by making a mosaic from thousands of multiphoton microscopic images. Then fluorescence microscopy was used to isolate a spiny neuron (first sub-panel). Correlating cell structures identified under the light microscope for subsequent examination under the electron microscope permitted biologists to visually reconstruct the three-dimensional structure of dendritic structures with nanometer resolution. The second and third sub-panels portray electron tomographic reconstructions of an unbranched spiny dendrite from cerebellum and its nanometer-sized synaptic complex (from hippocampus). Image courtesy Skip Cynar, National Center for Microscopy and Imaging Research, University of California, San Diego.

ACCESSING IMAGE DATABASES: CONNECTING TO THE GRID

All these image collections will do little good if no one can access them remotely. Researchers at the crossroads of biomedicine and computational science are tackling that problem now.

One promising answer is to create “federated databases”: groups of unique imaging collections that are linked together by a sort of “grid,” and that are accessible remotely via a seamless user interface that makes the data sets resemble one single virtual database. Joel Saltz, MD, PhD, professor and chair of the department of biomedical informatics at Ohio State University, leads a group that develops technologies that can enable “grid” access for large image collections to create such federated systems. His group has developed middleware to support complex distributed applications. It attempts to stitch together different bodies of images, making them available and searchable.

“The overall goal of the effort is to develop an infrastructure to connect multiple databases, to allow people to discover what images are out there, and to analyze both remote and local imagery and to integrate image data with information from molecular studies, clinical studies, and pathology specimens,” Saltz says. The National Cancer Institute caBIG™ project has incorporated the Ohio State group’s software in the caGrid software package. This was first distributed in December and, Saltz says, quite a number of funded efforts have begun to incorporate it. Furthest along in the process of opening up an image database to many users with Saltz’s help is the National Cancer Imaging Archive.

Figure caption: Slicer3 image analysis software is an integral part of the brain atlas created by the Surgical Planning Laboratory and the Psychiatry Neuroimaging Laboratory (PNL) at Brigham & Women’s Hospital in Boston. This three-dimensional digitized atlas of the human brain is used for surgical planning, model-driven segmentation, research, and teaching. As this screenshot illustrates, Slicer3 enables users to outline and manipulate specific regions of the brain in three dimensions based on multi-modal volumetric input data including specialized MRI methods. An additional goal of this brain atlas is that it can be used as a template for automatically segmenting regions of interest in large new MR data sets. Image courtesy of Ron Kikinis, Surgical Planning Laboratory, Brigham & Women’s Hospital.

These new systems may not be open to just any member of the public; at least some will require registration and credentials. But the incentive to participate is high. Researchers and physicians who gain access will be able to communicate with each other in new ways that could make a big difference to patients. A major benefit for those linking their images to a grid is the possibility of “central review,” says Saltz. In central review, radiologists remotely read an image and provide their feedback via software that allows a user to capture mark-ups, pointers, and comments. For instance, a radiologist in Omaha might send out a CT scan of a patient’s lung via Saltz’s software to radiologists around the world as well as to computer-aided diagnosis algorithms available at supercomputers in research centers. She might hear back from radiologists in Mumbai, Tokyo, and Chicago, and from computers at a handful of universities, possibly discovering lung nodules she had missed.

Figure caption: This screenshot from the Saltz lab’s gridIMAGE application shows how radiologists in remote locations can review and mark up images from multiple collections. A radiologist accesses the interconnected or “federated” imaging databases through a single interface and can submit a review request to other participating physicians who use the same database. The reviewers can add marks and comments and then submit their marked-up results to a central result server, which transmits them to the radiologist who made the request. This application is based on the Saltz lab’s In Vivo Imaging Middleware. Image courtesy Joel Saltz, Ohio State University.

APPLICATIONS: WILL THEY COME?

If researchers overcome the barriers described above, the question then will be whether it will prove worthwhile. Will innovative applications follow? In other words, if you build it, will they come?

Early indications are that they will. For some physicians, the near-term possibility of central review alone will make federated imaging databases worth the effort.

For neuroscientists, gaining insights into the brain’s workings and connections requires large numbers of fine-grained images. In the past, scientists had done studies of specific parts of the brain, but few had tried to discover the overall structure of the brain. Large neuroimaging projects such as the ABA are attempting to change that. Indeed, some hope to one day map every single neuron in the human brain, creating a data set of upwards of 1 million petabytes. This “connectome” promises to be the image-based Human Genome Project of brain researchers. Its success will rely on computer-assisted image acquisition and analysis to map the structure of the nervous system, says Jeff Lichtman, MD, PhD, professor of molecular and cellular biology at Harvard.

In clinical trials for cancer treatments, image collections help in evaluating a drug’s effectiveness, says Carl Jaffe, MD, diagnostic imaging branch chief for the cancer imaging program in the division of cancer treatment and diagnosis at NCI. The promise of using image collections to speed drug development is already beckoning. “The regulatory authorities are more willing to accept regression of a tumor as a sign of a drug’s effectiveness…and imaging is the pivotal marker for this,” he says. A large database of reference images helps to balance “reader artifacts” (that is, errors in radiologists’ assessments) and to substantiate that a tumor has indeed changed size in an important way, he explains. Researchers could use a central review-style process to verify their reading of an image. “An image database allows you to go back to a larger community of observers and confirm whether or not something seems to be supportable.”

For researchers studying rare diseases, the goal is to find others to compare against and to increase understanding remotely. For example, says Jaffe, in the old days, a researcher hoping to test a drug for a rare disease such as retinoblastoma, a cancer of the retina with an incidence of only 430 cases per year, would have to request MRI films from around the country to try to prove that his trial worked on a range of patients. But some films would come back too dark, some too light, and some without the right metadata. If all the data and images could be collected digitally in an online database, the researcher would more quickly understand the drug’s impact. “What you want is an electronic, common pool of data and metadata,” Jaffe says.

Surgeons and other physicians could also benefit from such systems as Rubin’s efforts to use large groups of images to inform a doctor of how to diagnose and treat a patient. Using Rubin’s decision support software, physicians can select from a series of structured annotations of an image and upload the image data. Then a computer program tells them the likelihood of disease. “We want to give radiologists a tool to help them decide when to biopsy based on what they see,” he says. While it is partly based on the knowledge of expert radiologists, this type of technology will work even better when a large number of images are available to inform the program, hence the need for large databases filled with rich stores of metadata.

Increasing numbers of researchers on the biomolecular scale are also using imaging in their research, including scientists like Martone and the people who utilize the ABA and other such atlases. For example, labs are using the ABA to investigate risk factors for multiple sclerosis and to identify genetic hotspots associated with memory performance. And new databases at the cellular level are popping up, including the Open Microscopy Environment, a large public database focused on microscopy imaging data.

THE NEW NEW THING

Imaging is just one of many bioscience fields moving towards more and better information sharing and collecting. While the field faces its own hurdles (the difficulties of comparing images, for example), it falls within a larger trend of making data available and breaking down the silos of single organ or disease-focused work that for so long dominated the sciences. It’s the same impulse that inspired the release of the genome and the dawn of genomics, and could cause a similarly radical shift in how people use image data.

The next generation of applications will reveal whether the rise of large imaging collections will create a new science, just as genetics spawned genomics. Ultimately, it might be possible to cross-compare between imaging and genomics. That’s already happening in brain research projects such as the Allen Brain Atlas, but the trend could spread throughout the body. And as in genomics, the shift could generate an entire new field of research in which scientists could build an entire career.

If the Visible Human is any proof, simply building large, accessible collections of images will attract scientific curiosity and will launch a wealth of useful applications we cannot even imagine today.

DOCK THIS: In Silico Drug Design Feeds Drug Development
BY KRISTIN COBB, PHD

Once upon a time, not long ago, HIV/AIDS was a scourge, killing anyone who contracted the deadly virus. Now, many people are living with the disease, which they control with drugs initially developed in the 1980s and early 1990s using an approach called computer-aided drug design: the use of computer models to find, build, or optimize drug leads. Armed with information about the 3-D structure of HIV protease, an enzyme essential to the HIV reproductive cycle, computational researchers designed molecules in silico to precisely fit the shape of the enzyme’s active site, as though fitting a key to a lock. The resulting drugs, potent inhibitors of HIV protease and the HIV life cycle, were brought to market in record time and revolutionized the treatment of HIV/AIDS.
Around the same time, another anti-viral (Relenza, which treats influenza and was a forerunner to Tamiflu) was also designed using these methods. These HIV and flu drugs are among the best-known success stories of computer-aided drug design (see “Early Examples: Anti-Viral Drugs” for both stories).

Since those early successes, computer modeling has become an integral part of drug discovery. “Almost everything that has recently moved forward from big pharmaceutical companies to market has involved some sort of collaboration with computational chemistry. It’s like asking, were there chemists involved? Of course there were. It is part of the process,” says Tara Mirzadegan, PhD, head of the computer-aided drug design group at Johnson & Johnson.

Quite often, computers play a role without making the big splash they did with Relenza and the protease inhibitors. That’s probably because no drug is created solely in silico; the computer is just one of many tools in this process. But as algorithms evolve, computing power explodes, and scientists solve a greater number of 3-D protein structures, computer-aided design has the potential to dramatically cut the cost and time of drug discovery. How? By narrowing down the field of compounds that might help treat a particular disease; by assembling novel drug molecules to disrupt specific disease pathways; and by providing new attack routes against traditionally difficult drug targets. Computers are also increasingly playing a role in optimizing drug leads for bioavailability and safety.
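One way to picture the first of these, narrowing down the field of compounds, is as a ranking problem: give every compound in a library a score and keep only the best few for laboratory testing. The short Python sketch below is only an illustration under stated assumptions. The compound names, the scores, and the `shortlist` function are invented here, and the convention that a lower score means a better predicted fit is an assumption; real programs derive their scores from 3-D structures.

```python
from heapq import nsmallest


def shortlist(scores, keep):
    """Return the names of the `keep` best-scoring compounds.

    Assumption for this sketch: a lower score means a better predicted fit.
    """
    best = nsmallest(keep, scores.items(), key=lambda item: item[1])
    return [name for name, _ in best]


# Hypothetical library: names and scores are made up for illustration.
library = {
    "cmpd_A": -7.2,
    "cmpd_B": -9.1,
    "cmpd_C": -5.4,
    "cmpd_D": -8.3,
}

print(shortlist(library, keep=2))  # the two best-scoring names
```

The same pattern applies to much larger libraries, since `heapq.nsmallest` picks out the best few entries without fully sorting the whole collection.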
Despite the over-hype of computers as the saviors of drug development companies, many still expect this process to bear important fruit. Computer-aided drug design played a critical role in the design of several drugs that are now in late preclinical or early clinical development. Only time will tell which of these, if any, will emerge as drug success stories.

Docked Drug. This 3-dimensional computer graphic shows a candidate drug (a JAK2 inhibitor) docked in the active site of its target protein (JAK2). JAK2 protein is implicated in various myeloproliferative disorders (diseases that produce excess bone marrow cells, such as chronic myelogenous leukemia, or CML) estimated to affect 80,000-100,000 people in the U.S. Courtesy of SGX Pharmaceuticals, Inc.

EARLY EXAMPLES: ANTI-VIRAL DRUGS

Relenza and the HIV protease inhibitors stand out as the two classic examples of computer-aided drug design.

Relenza was developed through a collaboration of Australian scientists, including Jose N. Varghese, PhD, head of structural biology at CSIRO Molecular and Health Technologies. In 1983, Varghese and his colleagues used X-ray crystallography to solve the 3-D structure of the enzyme neuraminidase, one of two potential protein targets on the surface of flu. Neuraminidase plays a critical role in the flu life cycle: after the virus replicates within a host cell, neuraminidase releases the newly formed viral progeny by cleaving a bond between the viral surface protein hemagglutinin and a sugar on the host cell surface, sialic acid.

A series of structural experiments revealed important insights. The active site of the enzyme was highly conserved in all strains of flu, both human and animal; the virus routinely escaped antibody recognition by mutating around the periphery of the active site but never changing the active site itself.

“Because it was so highly conserved, it seemed clear to us that it must have a very important function,” Varghese says. “So, clearly if one made a molecule that went in there and blocked that site, it would be pretty effective.”

A synthetic analog of sialic acid was known to inhibit neuraminidase, but without sufficient potency. Using the crystal structure of neuraminidase bound with this analog, the researchers set out to design a better inhibitor in silico. Computer predictions revealed that a particular guanidinium-for-oxygen substitution would give tight binding. Synthesis of this compound, Relenza, turned out to be tricky, but eventually succeeded.

“It bound in nanomolar binding, so it was very tight, and it certainly blocked the virus replication right down to its tracks,” Varghese says.

Relenza was licensed to GlaxoSmithKline Inc. in 1990 and approved by the FDA in 1999. Following their lead (and capitalizing on a patent oversight, according to Varghese), Gilead Sciences developed the better-known neuraminidase inhibitor, Tamiflu (marketed by Roche). Both drugs may be important in the fight against bird flu, Varghese says.

Development of the HIV protease inhibitors lagged behind that of the neuraminidase inhibitors by several years, but the former won FDA approval sooner (in the mid-1990s) because of the pressing medical need.

Dale Kempf, PhD, who is now a distinguished research fellow in Global Pharmaceutical Research and Development at Abbott, was involved in Abbott’s development of ritonavir (brand name Norvir), which started in late 1987.

“It’s one of the first examples of the application of genomics for drug design,” he says. When the HIV genome was sequenced and published in the mid-1980s, several groups recognized characteristic sequences suggestive of a protease enzyme. Interestingly, the gene encoded only half a protein, which led Kempf and others to realize that the protease must be composed of a dimer: two identical halves that come together to form one active site. This provided a key structural insight even before X-ray crystal structures of the protease were available: the active site had to have a particular type of symmetry, known as C2 or two-fold symmetry (rotation 180 degrees around a central axis yields the identical structure).

Kempf’s group used that insight to create a computer model of the protease active site and to design possible inhibitors in silico by starting with a known substrate, chopping off half of the substrate, and rotating the remaining half by 180 degrees.

“And when we went into the lab and made those compounds, they turned out to be very potent inhibitors,” Kempf says.

Using a combination of the X-ray crystal structures of HIV protease (which had since become available) and computer graphics, they modified these compounds in silico to visualize how certain substitutions would improve characteristics like bioavailability. The first compound with sufficient oral bioavailability, ritonavir, was synthesized in 1991.

In 1996, the FDA approved ritonavir in record time (72 days). The total development time, about eight years, was roughly half that of a typical drug, due both to the structure-based approach and to the FDA’s accelerated review. Several other HIV protease inhibitors emerged around the same time, including saquinavir (Roche) and nelfinavir (developed by Agouron, now a subsidiary of Pfizer). These drugs helped to revolutionize the treatment of HIV.

VIRTUAL SCREENING

How it works: In the ideal situation, the 3-D structure of the target molecule (usually an enzyme or receptor) is known, allowing scientists to directly visualize drug-target interactions in silico. Structure-based methods have evolved in two directions since Relenza and the HIV protease inhibitors: virtual screening and fragment-based design.

In virtual screening, the 3-D structure of a target is screened against libraries of potentially active small molecules. The computer “docks” each compound, or ligand, into the target’s active site and scores its geometric and electrostatic fit.

Considerable progress has been made in docking programs in the last two decades, but scientists agree that the problem is complex and that they have yet to find a perfect solution. To start with, the ligand and protein target are often pictured as a rigid lock and key, but in fact they are dynamic, moving objects that continually change shape and adjust their shapes in response to each other.

“Imagine taking a fluffy ball and trying to mold it to optimally fit some kind of a binding site. There are just way too many configurations,” says Dimitris K. Agrafiotis, PhD, vice president of informatics at Johnson & Johnson Pharmaceutical Research & Development. “Small molecules, unless they’re very small, tend to be very flexible. They flop around a lot. They can assume a multitude of conformations in 3-D.” If a molecule has five rotatable bonds, then each bond can rotate at many different angles, creating a lot of freedom to take on unique conformations.

Most docking programs now account for the flexibility of the ligand by sampling its many conformations and docking each one, but adequately accounting for the flexibility of the target protein is a much more challenging problem. Adding protein flexibility exponentially increases computing demands.

“The state of the art today is coming up with sensible simplifications that make the problem computationally tractable but still meaningful,” Agrafiotis says.

Besides the flexibility of the protein, many docking programs do not adequately account for the influence of water, which surrounds all molecules in living systems. “The mathematical models for defining water and how it shapes itself around the receptor and the drug molecule are still pretty unclear,” says Kent Stewart, PhD, a research fellow in structural biology at Abbott.

In addition, the algorithms estimate binding energies using classical Newtonian physics, rather than quantum physics, which also reduces accuracy. “You can calculate the binding energies from some sort of Newtonian point of view, treating atoms as sort of balls attached to springs. Or you can treat it from a quantum mechanical point of view. Now the quantum mechanical calculations, as you can imagine, are horrendous,” says Jose N. Varghese, PhD, head of structural biology at CSIRO Molecular and Health Technologies. “At this stage, it is a computational challenge.”

Methods of scoring how well a small molecule fits a protein’s active site also must trade off between speed and accuracy. “The scoring function that we use has many shortcuts and approximations,” says Mirzadegan. Her group will virtually dock the company’s one million proprietary compounds (which it has purchased or developed over the years) against a given target, and pick the highest ranked 10,000 for biological testing. “We cannot afford docking one compound per day. That would be one million days. So we have to do it in a matter of seconds or sub-seconds.”

But increased computing power can help boost the speed of virtual screening without compromising accuracy. In 2000, for instance, Arthur J. Olson, PhD, professor of molecular biology and director of the Molecular Graphics Laboratory at The Scripps Research Institute, started the FightAIDS@Home project, which uses internet-based grid computing (as was popularized by the SETI@Home project) to do virtual screening for new anti-HIV drugs.

“If most people who have computers use only about five percent of the CPU cycles, and the rest of the cycles are just idle, how much wasted or available computing is there?” Olson asks. “It turns out to be an amazing number.”

His grid computing project makes use of that idle computer time and helps evaluate drugs for dealing with HIV proteins’ habit of rapidly mutating to escape drug pressures. Fortunately, the 3-D structures have been solved for many of the mutant HIV proteins. With the help of about 500,000 volunteer computers, Olson used AutoDock (a popular docking program that was developed in his lab) to screen 2000 small molecules against several hundred different HIV protease mutants. The program took six months to run; he estimates that on the Scripps supercomputer, with 300 processors running, it would have taken 50 years.

Anti-Cancer Key. An anti-cancer drug compound, nutlin, bound to the cancer-causing protein MDM2. Courtesy of RMC Biosciences, Inc.

Besides identifying several drug leads, which are now in testing, Olson recognizes an even more important payoff: “When you do such massive dockings, you actually are collecting more than just an answer; you’re collecting a lot of statistics.” Such data could, for example, be used to identify a subset of mutants that represent a spanning set: one that captures all unique interactions with the ligands screened. “Doing docking on only this subset of mutants would free up computer time for screening larger libraries, using more dynamic representations of the protein targets, or using more accurate scoring functions,” he says.

The Folding@Home project at Stanford also uses grid computing for drug design. Led by Vijay S. Pande, PhD, associate professor of chemistry and of structural biology, Folding@Home focuses on simulating protein folding and misfolding, but “as our work matures, we have been looking into the next steps involved in computational drug design,” Pande says. Using distributed computing, his group has devised new, more accurate algorithms for docking and for calculating ligand-protein binding energies. These algorithms are being used in the design of several new drugs, including new inhibitors of the cytokine-cytokine receptor interaction (involved in cancer); novel chaperone inhibitors (also involved in cancer); and novel antibiotics that target the bacterial ribosome.

“Distributed computing is a key aspect to this, as it allows us to do calculations otherwise impossible,” Pande says.

FRAGMENT-BASED DESIGN

Fragment-based design. Drug companies, such as SGX Pharmaceuticals, screen hundreds of fragments in their fragment libraries and identify hits that serve as the building blocks for novel drug candidates. Knowledge of the binding mode of each fragment to its target is combined with advanced computational tools to produce “engineered” drug leads. For example, in this series, a hit is first identified through crystallographic screening (yellow); then chemical groups (red and pink) are added to the bound fragment to increase its binding affinity. Courtesy of SGX Pharmaceuticals, Inc.

Fragment-based methods take a “Lego” approach to drug design. In a lab, scientists create chemical libraries of small compounds, or fragments (perhaps one-third the size of a typical drug), that are easily linked together. They then screen the libraries for binding activity experimentally, using high-throughput X-ray crystallography (or NMR or mass spectrometry); when a fragment binds to the target, the crystallography provides an exact 3-D picture of the bound fragment in the active site. Next, with the help of computer modeling, fragments are turned into potent drug leads by adding new chemical groups to the initial core fragment or by stitching together several fragments that bind to different points in the active site.

“I think this approach is showing quite good promise,” Varghese says. “In fact, with the advent of these modern synchrotrons, scientists can do this fairly quickly, and a lot of pharmaceutical companies are moving in this direction.”

The approach offers a combinatorial advantage: “Instead of having a database of say four million compounds that a really large company would have, you take compounds that are say one-third of the size, and explore them combinatorically. If you explored ten fragments in three different positions, you’d actually explore 1000 combinations. So with a database of something like 400 compounds, you can explore a chemical space that is in the several millions,” says Sir Tom Blundell, FRS, FMedSci, professor and chair of biochemistry at the University of Cambridge. In 1999, Blundell co-founded Astex Therapeutics to do fragment-based methods; the company is now testing a kinase inhibitor (a type of cancer drug) in clinical trials.

“The experiment is really one of using crystallography to do your screening. So you’ve pushed the crystallography technology to the point where you can do it so rapidly that it becomes effective to use as a screening tool,” says Siegfried Reich, PhD, vice president of drug discovery at SGX Pharmaceuticals, another company that uses fragment-based methods. (Reich previously helped develop the HIV protease inhibitor nelfinavir at Agouron.) When it was founded in 1999, SGX was named Structural Genomix and its aim was to use high-throughput X-ray crystallography to solve a record number of protein structures. But this was not sustainable as a business model. So, in 2000, the company changed its name to SGX Pharmaceuticals and put its crystallography power to use in drug discovery.

One of their lead candidates is a new inhibitor of BCR-ABL, a perpetually active kinase enzyme involved in chronic myelogenous leukemia, or CML. The BCR-ABL inhibitor Gleevec has had enormous success in treating CML patients, but 20 percent are resistant to Gleevec. So scientists at SGX cloned, expressed, purified, and crystallized the Gleevec-resistant protein. Then they screened their fragment library against the wild type and mutant versions of BCR-ABL to find compounds active against both. The fragment hit that eventually led to their lead candidate started with a low binding affinity of just 10 micromolar (i.e., a fairly high concentration of compound was required to bind at least half the protein).

This is where the medicinal chemists and structural biologists sit down with the computational chemists, Reich says. Computational chemists virtually build new compounds by adding chemical groups to the starting fragment. For example, they might try linking all the different simple alkyl amines to one of the fragment’s “chemical handles” (sites on the fragment that easily bind to other chemical groups), Reich explains. The computer calculates the binding affinity for each iteration, until it finds one with tight binding. Specialized versions of docking programs are used to calculate the binding affinities. But because you already know exactly how the fragment binds, you start with more information than in virtual screening.

By elaborating their initial lead in this way, SGX got their first hit down to nanomolar potency (i.e., very little of the compound was required in order to bind the protein) in about three months. “That gives you a flavor for how fast this can go,” Reich says.

TRICKY TARGETS

Tricky Target. This computer model of a bacterial cell membrane helped scientists at Polymedix design new antibiotics that mimic the action of the defensin proteins (natural proteins in the body that kill bacteria by puncturing their membranes). Courtesy of Polymedix.

Docking algorithms and fragment-based methods work well on soluble enzymes that are easily crystallized and contain well-defined pockets where ligands can bind, but many diseases instead involve membrane-bound receptors or protein-protein interactions.

Membrane-bound receptors transmit signals from outside to inside the cell. Because the proteins are embedded in the membrane, they cannot easily be crystallized and it is difficult to solve their structures. For example, 25 percent of the top 100 drugs on the market today target G-protein coupled receptors, including the dopamine and serotonin receptors in the brain, but the structure of only one mammalian G-protein coupled receptor is known.

When structural information is unavailable, computational chemists use ligand-based methods to hunt for new drug leads. They superimpose a set of ligands with known activity against the target and compare their structural and chemical features. A common pattern, called a pharmacophore, emerges: key functional groups (such as hydrogen bond donors, electrostatic charges, and hydrophobic patches) must be in certain positions. This fingerprint is then used to virtually screen libraries for novel compounds with similar patterns. Ligand-based methods pre-date the structure-based methods and have helped develop many drugs, including drugs to treat high blood pressure, pain, and depression.

Protein-protein interactions occur via surfaces that are often featureless and shallow, and binding affinities can be quite large, so it’s hard for small molecules to disrupt these interactions, says Arthur Olson of Scripps Research Institute. You have to find or design drugs that can bind to multiple footholds, or hot spots, on the protein surface, which is challenging, he says. “I think that this is an area that is really still in its infancy.”

But some progress is being made. Kent Stewart of Abbott Labs hopes to control BCL-2, a protein that is overexpressed in certain cancers. It blocks apoptosis (programmed cell death) and thus keeps cancer cells alive. Compared to HIV protease, Stewart says, which has an actual cave you can dock a molecule into, on BCL-2, “there’s no such thing as a cave; it’s a very flat and open surface, so it’s hard to get molecules that actually stick.” So, using a fragment-based approach, scientists at Abbott linked together two fragments that bind to the BCL-2 protein surface, resulting in a potent compound that can disrupt the protein-protein interaction. The compound is now in late preclinical development.

Cancer Interference. The oncogenic protein BCL-2 helps keep cancer cells alive via a protein-protein interaction. This BCL-2 inhibitor, developed at Abbott using a fragment-based approach, binds to the BCL-2 protein surface and disrupts the protein-protein interaction. The compound is in late preclinical development. Courtesy of Abbott.

Some companies have made these difficult targets their niche area. For example, Polymedix’s mission is to develop drugs against membrane-bound targets, protein-protein interactions, and membrane-protein interactions, using a suite of computational tools specifically developed for these aims (by professors William DeGrado, PhD, and Michael Klein, PhD, of the University of Pennsylvania).

Polymedix is working on a new line of antibiotics that mimic the action of defensins, natural proteins found in the body that kill bacteria.

“They work similarly to a needle or a corkscrew going into a balloon. They directly attack and perforate the bacterial cell membrane,” says Nicholas Landekic, MBA, President, CEO, and co-founder of Polymedix. Because they do not target bacterial proteins, which can easily evolve to escape drug pressures, defensin-like drugs should not engender bacterial resistance, he says.

Scientists at Polymedix built a computational model of a defensin protein inserted into a bacterial cell membrane (a peptide-membrane interaction). Then they virtually transformed the defensin protein into a drug-sized compound. By swapping amino acid groups for chemically analogous small molecule groups, they shrank the protein while preserving its chemical interactions (electrostatics, lipophilicity, etc.) within the membrane.

The result: drug leads one-tenth the size of the defensins, but about 100-fold more potent and 1000-fold more selective. “So we’ve been able to improve on nature,” Landekic says. The compounds are now being tested in animal studies. “We’ve spent less than 14 million dollars to date since starting Polymedix, so in terms of an efficiency and efficacy rate, I think that’s pretty good,” he adds.

MAKING CHEMICALS INTO DRUGS

Computer-aided methods can identify drug leads with potent activity against a target, but these compounds are far from being drugs. Drugs must also be bioavailable and safe. Safety problems derail many drugs late in development, so identifying potential safety snags early on could save considerable time and money.

HIV Protease Inhibitor. The second-generation HIV protease inhibitor, Kaletra, was developed at Abbott. Here Kaletra is shown bound to the active site of HIV protease. Courtesy of Abbott.

“How well can we evaluate bioavailability and toxicity in silico? It’s pretty blunt and not a very popular answer: we don’t do very well,” Stewart says. “The biological mechanisms underlying bioavailability and toxicity are complex. So the mathematical models in those areas are still in their infancy.”

Olson agrees: we are a long way from being able to simulate a drug’s effect on the entire human body. “When you’re talking about toxicity, it’s much easier to give a compound to a rat than it is to dock against all possible proteins that are in the rat, even today,” he says. “But someday, you might be able to do that. We’re certainly creeping up on that.”

Computers do play a role today, however. Drugs must meet properties that fall under the ADME acronym: be Absorbed by the body, Distributed to the target tissues, and not Metabolized or Excreted too quickly. Software programs check molecules for key features (known as “Lipinski’s Rule of Five”) that are associated with favorable ADME profiles, such as having five or fewer hydrogen bond donors and a molecular weight below 500.

With enough computing power, scientists can also virtually screen a candidate compound against a large panel of proteins from the body, to make sure the compound will not cross-react with other enzymes or receptors to cause side effects.

To ensure that molecules identified in the computer will have real-world value, computational scientists benefit from working closely with medicinal chemists during lead identification and optimization.

“Medicinal chemists would tell you that there’s lots of intuition involved, so it’s not all computational,” says Hans Wolters, PhD, associate director of informatics at XDx, Inc. For example, he says that as computer scientists became more involved in making drugs, the molecular weight of candidate compounds began to creep up precipitously, to sizes that would not be easily absorbed by the human body. Medicinal chemists help recognize this type of problem early in the process.

DEBATING THE IMPACT

In the past two decades, although computer-aided drug design has become an integral part of drug discovery, some remain skeptical as to whether these methods are delivering on their promise. The productivity of the pharmaceutical industry has actually declined in the past decade (the FDA approved 58 drugs from 2002 to 2004 compared with 110 from 1994 to 1996, according to the Tufts Center for Drug Development). Though this is likely due to many factors (in particular, tightening safety standards and the enormous cost and time of clinical trials), the trend has left some wondering whether large investments in technology, including computer-aided drug design, are paying significant dividends.

Many modeling programs are unreliable, and they are not making a big difference in the real world, cautions Anthony Nicholls, President and CEO of OpenEye Scientific Software, which develops software for computer-aided drug design. “It’s all done on faith. It’s all done on the idea that ‘oh, we’re using computers, so it must be better,’” he says. “I think a lot of people are fooling themselves.” He believes that, for the field to progress, the current software needs to be more closely scrutinized, using prospective studies that directly compare the impact of computer-aided methods with more traditional drug design approaches.

Other scientists agree that the algorithms are still being refined, but have a more optimistic outlook. They say that progress is steady and that computer-aided design is already having an impact. Klaus Klumpp, PhD, an associate director at Roche (who was involved in the development of the HIV protease inhibitor saquinavir), points to a suite of emerging drugs for hepatitis C virus (HCV) as a case in point.

HCV was discovered in 1989 and the virus was difficult to grow, so structural information for HCV polymerase and HCV protease became available relatively late, in the mid-to-late 1990s. By this time, computer-aided drug design was well integrated into big pharmaceutical companies. Several companies quickly identified binding sites and designed inhibitors, many of which are now in early clinical trials. “It is expected to completely change the treatment paradigm for HCV infected patients,” Klumpp says.

Richard Casey, PhD, founder and chief scientific officer of RMC Biosciences, Inc., has also witnessed the dramatic effect that computers can have on drug design. His company provides computer-aided drug design services for small and mid-size pharmaceutical companies, which often lack in-house teams.

Recently, he made 3-D models and performed in silico docking studies for a mid-size pharmaceutical company that had identified active lead compounds but had no understanding of how they were binding the target, an RNA synthetase.

“When they saw this for the first time, it was the ‘aha’ effect: So that’s why this compound has high activity and this compound does not. It was a real eye-opener for them,” Casey says.

“I think in the next seven to ten years, with the computational power that’s coming on line here pretty soon and the steady development in algorithms, computer-aided design is going to make a huge difference.” ■

SIMBIOS NEWS: In the (Protein) Loop
BY KATHARINE MILLER

In the gaps between the tight coils and flattened sheets that comprise most protein structures, flexible loops wave and bend.
When crystallized, these loops can appear fuzzy in an electron density map, like moving objects captured in a still photograph. Often, loops may have an important role in a protein’s function, but because they are so mobile, their structure and dynamics can be hard to study.

To better understand how protein loops move, Simbios researchers have created LoopTK, a toolkit that samples and visualizes many conformations of a loop, and provides various algorithms to manipulate and analyze loop structures. “We want to find answers that are distributed over all the motion space,” says Jean-Claude Latombe, PhD, a roboticist and professor of computer science at Stanford University whose team developed the software. LoopTK is now available for download on the SimTK.org web site.

The Latombe group’s seed sampling algorithm successfully defines the motion space for loops surrounded by empty space (as shown here) as well as for loops that are more constrained by the surrounding protein structure (not shown). In this picture, the red dots show the positions of the middle C atom of the loop in many sampled conformations, but for clarity only a small number of these conformations are displayed in their entirety. Courtesy, Jean-Claude Latombe and Peggy Yao.

Latombe and his colleagues set out to place protein loops so that they correctly connect up with the protein’s coils and sheets while avoiding atomic clashes in the loop and between the loop and the rest of the protein. “Solving both constraints simultaneously is the hard part,” says Latombe. “That’s what we do with LoopTK. And we can do it very fast. We can sample many conformations very quickly.”

LoopTK relies on two techniques: seed sampling and deformation sampling. The seed sampling algorithm starts with nothing but the amino acid sequence of the protein. It then tries to place the loop in the full range of possible solutions. When several correct placements are found, the deformation sampling algorithm is used to deform the loop slightly without breaking the ends and without creating collisions among the atoms. “The two techniques are very complementary,” says Latombe. “One gives you a global picture of the entire molecule in space, and the other allows you to explore specific regions of the motion space in more detail.”

Latombe’s group is working with others on two applications of LoopTK. With the part of the Joint Center for Structural Genomics located at the Stanford Linear Accelerator Center, they are interpreting fuzzy electron density maps created from X-ray crystallography. “One would like to know the full range of loop conformations that could fit into this fuzziness,” says Latombe. The resulting loop positions could then be submitted to the Protein Data Bank. “Biologists need to be aware of the flexibility of the loop and the uncertainty in the conformation,” says Latombe. LoopTK can provide a sense of which conformations are more likely: a characterization of the distribution of possible conformations.

In a second project, LoopTK is being used for functional homology research. Russ Altman, PhD, chair of Stanford’s bioengineering department, and his group are trying to extract structural knowledge based on partial knowledge about a protein’s function. For example, if a protein X is known to bind to protein Y, LoopTK might help to infer possible conformations of the loop that are consistent with such binding.

“There might be dozens or more applications for this tool,” says Latombe. “What we hope is that by putting it on the web site other people will explore those possibilities.” ■

DETAILS

LoopTK, a C++ based object-oriented toolkit, models the kinematics of a protein chain and provides methods to explore its motion space. In LoopTK, a protein chain is modeled as a robot manipulator with bonds acting as links and the dihedral degrees of freedom acting as joints.

LoopTK is now available for download at https://simtk.org/home/looptk. An application programming interface (API) lets users embed LoopTK in their application software.

LoopTK will be presented at the 7th Workshop on Algorithms in Bioinformatics in Philadelphia on September 8-9, 2007 (http://www.wabi07.org/).

Simbios is a National Center for Biomedical Computing located at Stanford University.

UNDER THE HOOD: MUTUAL INFORMATION
BY CHIH-WEN KAN AND MIA K. MARKEY, PhD

Mutual information (MI) is defined in information theory as a measure of the dependencies between two random variables. There are many biomedical applications in which it is beneficial to quantify the information content using a measure such as MI. In classification problems, MI is used as a dependence measure to select features such that they are dissimilar from each other in order to reduce feature redundancy. MI can also be used in database retrieval. The MI is calculated between a query item and every entry in the database in order to identify the entry in the database that is most similar to the query item. In image processing, it is also used extensively as a similarity measure for image registration and for combining multiple images to build 3D models. We will use the application domain of medical image registration to illustrate the utility of MI.

The mutual information of random variables A and B is defined as

$$I(A,B) = \sum_{a,b} p(a,b) \log \frac{p(a,b)}{p(a)\,p(b)}$$

where p(a,b) is the joint probability distribution function of A and B, and p(a) and p(b) are the marginal probability distribution functions of A and B, respectively.

Thus, in the context of medical image registration, MI measures the distance between the joint distribution of the images’ gray values, p(a,b), and the distribution p(a)p(b) that would hold if the two images were independent of each other. It is a measure of the dependence between the two images. The mutual information I(A,B) is also the reduction in the uncertainty of A due to the knowledge of B. When the two images match so closely that B fully determines A, the remaining uncertainty is minimal and the reduction of uncertainty is maximized.

In medical imaging, it is often necessary to compare images of a patient that are acquired at different times or by different modalities. For example, images may be taken pre- and post-operatively in order to assess the success of a surgery. To facilitate the interpretation of such sets of images, registration (the process of aligning multiple images) is necessary. The goal of registration is to identify a transformation that maps each point in one image to the corresponding point in the other image.

One approach to image registration is based on defining landmarks or fiducial points in the images. By determining how to align those landmarks, one can determine how to transform one image to match the other. However, manual definition of landmarks is time-consuming, may be difficult even for an experienced observer, and suffers from intra- and inter-reader variability.

Another approach to image registration is to determine a transformation based on a measure of the similarity of the images, such as MI. Since larger MI corresponds to more similarity of the two images, MI is maximized in registration algorithms.

In image registration, the goal is to determine a transformation of one image such that the MI between the transformed image and the reference image is maximized. Different types of transformations may be considered based on the application. The simplest class of transformations only permits rotations and translations. In medical imaging, a wider variety of scaling and shape changes are often needed, including nonlinear transformations that allow for non-uniform changes across the image. An optimization algorithm is applied to dynamically search among transformations for the one with maximal MI.

MI has been shown to be especially valuable for registering multi-modality images. For example, computed tomography (CT), positron emission tomography (PET), and magnetic resonance imaging (MRI) images of the same patient provide complementary information. Registration based on MI enables a healthcare provider to directly correlate the data from such different imaging techniques. MI has also shown promise for registering time series images. A series of images over time is often used to evaluate tissue function in addition to structure. ■

DETAILS

Chih-Wen Kan is a graduate student in The University of Texas Department of Biomedical Engineering. She works on developing diagnostic decision support systems in Dr. Mia Markey’s Biomedical Informatics Lab (http://bmil.bme.utexas.edu/).

PUTTING HEADS TOGETHER: CONFERENCES/SYMPOSIA

The 6th Annual International Conference on Computational Systems Bioinformatics (CSB2007), coordinated by the Life Sciences Society

WHAT: This conference is designed for any scientist interested in the interaction of biology and computing who wants to gain fast access to current research results; network with other life scientists; and listen to and meet scientific stars. CSB2007 will continue to be a five-day single-track conference featuring 10 half-day tutorials, 30 refereed papers plus keynote speakers, 150 posters and five full-day workshops. Special events for the evenings are being planned.
WHEN: August 13-17, 2007
WHERE: University of California, San Diego
MORE INFO: http://lifesciencessociety.org/CSB2007/index07.html

The Pacific Symposium on Biocomputing (PSB) 2008

WHAT: The Pacific Symposium on Biocomputing (PSB) 2008 is an international, multidisciplinary conference for the presentation and discussion of current research in the theory and application of computational methods in problems of biological significance. PSB is a forum for the presentation of work in databases, algorithms, interfaces, visualization, modeling, and other computational methods, as applied to biological problems, with emphasis on applications in data-rich areas of molecular biology. Papers and presentations are rigorously peer reviewed and are published in an archival proceedings volume.
WHEN: January 4-8, 2008
WHERE: The Fairmont Orchid on the Big Island of Hawaii
DEADLINES: Call for Papers: July 16, 2007; poster abstract submissions: Nov. 9, 2007.
MORE INFO: http://psb.stanford.edu/
OF NOTE: This year, Simbios will be holding a special session at PSB: Multiscale Modeling and Simulation: from Molecules to Cells to Organisms

Stanford’s Bio-X Symposium: Life in Motion

WHAT: Bio-X, Stanford’s interdisciplinary life sciences initiative, hosts a major symposium each year. This year Bio-X has teamed up with Simbios (Stanford’s National NIH Center for Physics-based Simulation of Biological Structures) to hold a symposium entitled “Life in Motion.” The goal of this symposium is to educate students and scientists from different disciplines about the exciting uses of simulations driven by the laws of physics and mechanics across a range of scales, from molecules to organisms. The talks will be presented by a series of experts and innovators from around the world. Confirmed speakers are: Sylvia Blemker; Joachim Frank; Robert Full; Jessica Hodgins; John Hutchinson; Roger Kamm; Mimi Koehl; Vijay Pande; Klaus Schulten; Demetri Terzopoulos.
WHEN: October 25, 2007
WHERE: James Clark Center Auditorium, Stanford University
MORE INFO: simtk.org/home/lifeinmotion

WHY “PUTTING HEADS TOGETHER”?

This magazine strives to build connections among diverse researchers, all of whose work touches on biomedical computation. Because these highlighted conferences & symposia do the same thing, we are giving them a well-deserved spot in these pages. If you have a favorite conference you’d like to see appear in this magazine, let us know: editor@biomedicalcomputationreview.org.

SEEING SCIENCE: REMODELING BY CURVATURE
BY KATHARINE MILLER

Whenever a cell needs to get rid of waste, transport materials, sort proteins, or build new organelles, membranes remodel themselves. Often that means forming small enclosed compartments called vesicles. Now researchers have gained a better understanding of that process using coarse-grained computer simulations. The work was published in the May 24, 2007 issue of Nature.

Researchers knew that specialized proteins are involved in triggering membranes to remodel themselves, but experimental and theoretical research could not explain how they do it. Because the energy required for major remodeling projects is greater than the energy used to bind the specialized proteins to the membrane (or to each other), some suspected that membrane curves themselves could carry the necessary energy.

Using coarse-grained simulations, Kurt Kremer, PhD, Markus Deserno, PhD, and their colleagues at the Max Planck Institute for Polymer Research in Mainz, Germany, showed that curvature-mediated attraction can indeed explain how membranes refashion themselves. Once a membrane starts to bend, proteins embedded in that membrane begin to cluster and draw the membrane into a curved shape, not unlike a vesicle.

The coarse-grained membrane simulation starts with a flat membrane containing 46,080 lipids and 36 large hemispherical “caps” (shown in pink) representing membrane proteins. Over the course of roughly one millisecond, the proteins begin to aggregate and form a large vesicle. The final image shows a cross-section of the vesicle in order to reveal the protein caps within. Courtesy of Kurt Kremer and Markus Deserno.
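
For readers who want to experiment with the mutual information measure described in the Under the Hood column, here is a minimal Python sketch. It rests on several assumptions that are not in the column: each image is a flat list of gray values, the probabilities p(a,b), p(a), and p(b) are estimated by simply counting gray values at corresponding pixel positions, the only transformation considered is a circular shift, and the search is exhaustive rather than driven by an optimization algorithm. The function names mutual_information and best_shift are hypothetical and do not come from the column or from any library.

```python
import math
from collections import Counter


def mutual_information(gray_a, gray_b):
    """Estimate I(A,B) in nats from two equal-length lists of gray values.

    p(a,b), p(a) and p(b) are estimated by counting how often each gray
    value (or pair of gray values) occurs at corresponding positions.
    """
    if len(gray_a) != len(gray_b) or len(gray_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(gray_a)
    joint = Counter(zip(gray_a, gray_b))  # counts behind p(a,b)
    marg_a = Counter(gray_a)              # counts behind p(a)
    marg_b = Counter(gray_b)              # counts behind p(b)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p(a,b) / (p(a) p(b)) simplifies to count * n / (count_a * count_b)
        mi += p_ab * math.log(count * n / (marg_a[a] * marg_b[b]))
    return mi


def circular_shift(values, shift):
    """Shift a list to the right by `shift` positions, wrapping around."""
    shift %= len(values)
    if shift == 0:
        return list(values)
    return list(values[-shift:]) + list(values[:-shift])


def best_shift(reference, moving, max_shift):
    """Hypothetical helper: return the shift of `moving` that maximizes MI.

    Stands in for the optimization step; it tries every shift in
    [-max_shift, max_shift] instead of searching intelligently.
    """
    candidates = range(-max_shift, max_shift + 1)
    return max(
        candidates,
        key=lambda s: mutual_information(reference, circular_shift(moving, s)),
    )


if __name__ == "__main__":
    # Toy "multi-modality" pair: same structure, different gray values,
    # with the moving image offset by two pixels.
    reference = [0, 0, 5, 9, 5, 0, 0, 0]
    moving = [7, 7, 7, 7, 3, 1, 3, 7]
    print(best_shift(reference, moving, max_shift=3))
```

In the toy example at the bottom, the moving image uses different gray values than the reference, much as two imaging modalities might, and is offset by two pixels. Because MI depends only on how consistently gray values co-occur, and not on the values themselves, the search selects a shift of -2, which undoes the offset. Real registration code would work on 2D or 3D images, bin the gray values into a histogram, and replace the exhaustive loop with an optimizer.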