Grid Added Value to Address Malaria

V. Breton1, N. Jacq1+2 and M. Hofmann3

1 Laboratoire de Physique Corpusculaire, Université Blaise Pascal/IN2P3-CNRS UMR 6533, France
2 Communication & Systèmes, CS-SI, France
3 Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Department of Bioinformatics, Germany

Breton,Jacq@clermont.in2p3.fr, firstname.lastname@example.org

Abstract

Through this paper, we call for a distributed, internet-based collaboration to address one of the worst plagues of our present world, malaria. The spirit is a non-proprietary peer-production of information-embedding goods. And we propose to use the grid technology to enable such a world wide “open source” like collaboration. The first step towards this vision has been achieved during the summer on the EGEE grid infrastructure where 46 million ligands were docked for a total amount of 80 CPU years in 6 weeks in the quest for new drugs.

1. Introduction

The number of cases and deaths from malaria increases in many parts of the world. There are about 300 to 500 million new infections, 1 to 3 million new deaths and a 1 to 4% loss of gross domestic product (at least $12 billion) annually in Africa caused by malaria.

The main causes for the comeback of malaria are that the most widely used drug against malaria, chloroquine, has been rendered useless by drug resistance in much of the world and that anopheles mosquitoes, the disease vector, have become resistant to some of the insecticides used to control the mosquito population.

Genomics research has opened new ways to find novel drugs to cure malaria, vaccines to prevent malaria, insecticides to kill infectious mosquitoes and strategies to prevent development of infectious sporozoites in the mosquito. These studies require more and more in silico biology, from the first steps of gene annotation via target identification to the modelling of pathways and the identification of proteins mediating the pathogenic potential of the parasite. Grid computing supports all of these steps and, moreover, can also contribute significantly to the monitoring of ground studies to control malaria and to the clinical tests in plagued areas.

It is well understood that the grid itself cannot drive drug development, but it can function as the catalyst that brings the actors (biochemists, physicians, etc.) together and pushes them ahead in the same direction.

2. Genomics research areas on malaria

To increase the chances of developing new and better drugs and vaccines, it is very important to know the sequence of the genomes of the parasites that cause malaria as well as the specifics of gene and protein expression at different stages in the life cycle and under pressure from different drugs.

The availability of genomic information for Anopheles gambiae, the major vector in Africa, and Plasmodium spec. enables us now to work out new approaches to interfere with the mechanism of the development of infectious sporozoites in anopheles mosquitoes and to reduce contact between infectious mosquitoes and humans. Much effort is also being devoted to biochemical and molecular studies of resistance mechanisms and to mapping of resistance genes.

2.1. Development of vaccines

In the development of vaccines, the value of knowing the sequences of certain genes and/or peptides is undeniable. It is known that susceptibility to Plasmodium can be entirely prevented for 9 months by immunizing with radiation attenuated sporozoites, and that children who live to 3 to 10 years of age in areas endemic for malaria rarely if ever develop severe malaria and die. If protective immunity requires induction of immune responses against 100 or even 500 different target proteins, genomics research offers the most immediate way to identify those targets and to begin the process of developing vaccines based on them.

However, progress has been painfully slow, and the few months duration of protection from the best currently available vaccine is inferior to what can be achieved with insecticide-treated nets or house spraying. When better vaccines eventually emerge, they should probably be used in conjunction with vector control to avoid their effects being hidden in areas of intense transmission. Whether vaccines and vector control synergize will have to be tested in the field and will not be predictable from molecular properties.

2.2. Human susceptibility to malaria

Knowledge of the human genome offers unprecedented potential for understanding who is and is not susceptible to dying from malaria, and who can benefit most from a particular type of vaccine. Having the human and parasite blueprint, and the computational biology capacity that goes with it, may allow developing a simple diagnostic that would identify at birth who is most at risk of dying from malaria. A single nucleotide polymorphism in the gene encoding the beta chain of human haemoglobin is associated with a 90% decrease in the chance of dying from a Plasmodium falciparum infection. Genomics research allows exploring whether other SNPs or SNP complexes in the human, anopheles and parasite genomes could impact the infection and the disease.

2.3. Transgenesis

There is also interest in transgenesis as a way to generate strains of mosquito that cannot transmit malaria. However, without extremely reliable systems for driving the transgenes into wild vector populations, possession of a non-transmitter strain would be of no practical use.

A more feasible and acceptable way of using transgenesis may be to improve the sterile insect technique, which would only require release of non-biting males.

3. Rationale for a grid to address malaria

For both vector control and chemotherapy, knowing the gene sequences of Anopheles and Plasmodium species should lead to discovery of targets against which new insecticides or antimalarial drugs can be produced. Genomics research is crucial to attacking malaria, but this research must be strongly coupled to ground studies in order to guide disease controllers, especially those working in countries with annual health budgets of less than $10 per person. Evaluation of the impact of new drugs and new vaccines requires careful monitoring of clinical tests, especially in areas of high malaria transmission where it is important to distinguish recurrence of parasites, due to recrudescence of incompletely cured infections, from reinfection due to new mosquito bites. Disease controllers are also faced with the necessity to monitor goal-oriented field work over the long term.

The grid technology provides the collaborative IT environment to enable the coupling between molecular biology research and goal-oriented field work. It proposes a new paradigm for the collection and analysis of distributed information where data no longer need to be centralized in one single repository. On a grid, data can be stored anywhere and still be transparently accessed by any authorized user. The computing resources of a grid are also shared and can be mobilised on demand so as to enable very large scale genomics comparative analysis and virtual screening.

The motivating perspective is to enhance the ability of both pharmaceutical industry and academic research institutions to share diverse, complex and distributed information on a given disease for collaborative exploration and mutual benefit. Leading international pharmaceutical groups such as Pfizer, Sanofi-Aventis and Novartis are already involved in activities of the United Nations / WHO to combat diseases of the poor. The goal here is to lower the barrier to such substantive interactions in order to produce cheaper drugs and insecticides to address diseases affecting third world development and to increase the return on investment for new drugs in the developed countries.

4. Grid added value to address malaria

4.1. In silico molecular biology

Genomics research areas on malaria were discussed previously and include:

- Search for targets on Anopheles genome and proteome for new insecticides
- Search for targets on Plasmodium genomes and proteomes for new drugs (e.g. inhibitors of parasite development)
- Identification of target proteins on Plasmodium proteomes to induce sustainable immune responses
- Study of human susceptibility to malaria: identification of SNPs on human and Plasmodium genomes related to sensitivity
- Study of Plasmodium and Anopheles genomics to prevent development of infectious sporozoites in anopheles mosquitoes
- Study of mechanisms for drug resistance, including distributed monitoring of epidemiological parameters
- Transgenesis to generate strains of mosquitoes that cannot transmit malaria

These studies require extensive computing and storage resources.

4.2. In silico drug discovery

Virtual screening is about selecting in silico the best candidate drugs acting on a given target protein. Screening can be done in vitro using real chemical compounds, but this is a very expensive undertaking. If it could be done in silico in a reliable way, one could reduce the number of molecules requiring in vitro and then in vivo testing from a few millions to a few hundreds. Advances in combinatorial chemistry have paved the way for synthesizing millions of different chemical compounds. Thus there are millions of chemical compounds available in pharmaceutical laboratories and also in a very limited number of publicly accessible databases.

A coordinated public effort on in silico drug screening would require setting up a “public” collection of virtual compounds from existing compound databases and compound catalogues (PDB Ligand Chemistry, KEGG-Ligand, compound catalogues, PubChem …). This public compound repository should also comprise all information known about biological activities of the compounds in this repository. Moreover, information on possible strategies for the synthesis of these compounds should be available to speed up the in vitro testing of potential candidate drugs after in silico screening.

In silico drug discovery should foster collaboration between public and private laboratories. It should also have an important societal impact by lowering the barrier to develop new drugs for rare and neglected diseases.

4.3. Federation of databases for clinical tests in plagued areas

Grid technology opens new perspectives for preparation and follow-up of medical missions in developing countries as well as support to local medical centres in terms of teleconsulting, telediagnosis, patient follow-up and e-learning. In every hospital, patients are recorded in databases which are federated in a network. Such a federation of databases allows medical data to be kept distributed in the hospitals behind firewalls. Views of the data are granted according to individual access rights through secured networks.

Such a federation of databases can be used to monitor ground studies in developing countries. The data stored can be used for epidemiology purposes as well as clinical tests.

In testing antimalarial drugs in areas of high malaria transmission, it is important to distinguish recurrence of parasites, due to recrudescence of incompletely cured infections, from reinfection due to new mosquito bites. Molecular matching of the parasite clones in the same individuals before and after treatment is a way of doing this.

When better vaccines eventually emerge, they should probably be used in conjunction with vector control to avoid their effects being hidden in areas of intense transmission. Whether vaccines and vector control synergize will have to be tested in the field and will not be predictable from molecular properties.

4.4. Knowledge space for disease information

The goal is to make all relevant information on the disease available to all interested parties. The concept of a knowledge space is to organize the information so that it can be reached in a few clicks. This concept is already successfully used internally by pharmaceutical laboratories to store knowledge. The grid allows building a distributed knowledge space so that each participant is able to keep the information he owns on his local computer. A set of grid services is proposed to make information easily consultable for the different potential customers (physicians, researchers, etc.).

These services would take advantage of the developments in the area of semantic text analysis and text mining for extraction of information in biology and genome research. Moreover, a structured knowledge space would produce grid services for indexing of distributed data resources and thus improve navigation through knowledge and retrieval of relevant information.

4.5. Monitoring of fieldwork to control malaria

Monitoring tools to control malaria in plagued areas include reducing contacts between infected mosquitoes and humans by filling in breeding sites, larviciding, spraying houses with insecticides, insecticide-impregnated bed nets, and house screening. Effective application of these measures would lead to reduced morbidity and mortality from malaria. However, given the extremely high transmission rate of Plasmodium falciparum, especially where Anopheles gambiae mosquitoes predominate, the impact of these tools can be limited by the capacity of mosquitoes to develop resistance and by the requirement for maintenance of the interventions for many years.

Long term monitoring of these tools clearly would benefit from technologies allowing a better collection and analysis of distributed data in developing countries, as is the case with grids.

Indeed, internet is now accessible worldwide, so the idea is to organize the collection of information around regional centres acting as local repositories. These repositories would be federated thanks to the grid technology at a world level for a general overview of the vector control work at the level of international agencies or non-profit organizations.

5. Technology requirements for a malaria grid

The goal of a grid for malaria would be to handle all aspects of the fight against malaria:

- Search of new drug targets through post-genomics requiring data management and computing
- Massive docking to search for new drugs requiring high performance computing and data storage
- Handling of clinical tests and patient data requiring data storage and management
- Overseeing the distribution of the existing drugs requiring data storage and management
- Overseeing of vector control requiring data storage and management

The grid should gather:

- Biomedical laboratories searching for vaccines, working on the genomes of the parasite and/or the parasite vector and/or the human genome
- Drug designers to identify new drugs
- Healthcare centres involved in clinical tests
- Healthcare centres collecting patient information
- Structures involved in distributing existing treatments (healthcare administrations, non profit organizations…) and enforcing vector control
- IT technology developers and computing centres

The grid should provide the following services:

- Large computing and storage resources for genomics research and virtual docking
- Access to relevant genomes, regularly updated data bases and publications
- Knowledge space with genomics and medical information (epidemiology, status of clinical tests, drug resistances, etc.)
- Collaboration environment for the participating partners. No one entity can have an impact on all R&D aspects involved in addressing one disease as complex as malaria. The grid would act as a virtual laboratory of the different actors.
- Federation of regional data bases for monitoring of vector control and clinical tests

6. An open source approach to drug discovery

The proposed grid to address malaria borrows the “open source” approach that has proven so successful in software development. Open source approaches have emerged in biotechnology already. The international effort to sequence the human genome, for instance, resembled an open source initiative. It placed all the resulting data into the public domain rather than allow any participant to patent any of the results.

The open source research on malaria could be organized as follows: a web site would allow chemists and biologists to volunteer their expertise on certain areas of the disease. They would examine and annotate shared databases and perform experiments. The results would be fully transparent and discussed in chat rooms. The research would be initially mainly computational, based on resources provided by grid infrastructures, and not carried out in “wet” laboratories.

The difference between this proposal and earlier open source approaches in biomedical research is that scientists would collaborate on the data and not only on software.

The final development of drug candidates could be awarded to a laboratory based on competitive bids. The drug itself would go to public domain, for generic manufacturers to produce. This would achieve the goal of getting new medicines to those who need them at the lowest possible price. This model is currently supported by various programs of the United Nations and involves several commercial organisations, including large pharma companies.

Open source research could also open the area of non-patentable compounds and drugs whose patents have expired. These receive very little attention from researchers because there would be no way to protect and so profit from any discovery that was made about their effectiveness. Lots of potentially useful drugs could be sitting under researchers' noses.

7. WISDOM, first step towards grid-enabled virtual screening

As discussed previously, in silico drug discovery is one of the most promising strategies to speed up the drug development process. High throughput virtual screening allows screening millions of compounds rapidly, reliably and in a cost effective way.

Grids like EGEE are ideally suited for the first docking step where docking probabilities are computed for millions of ligands. This was demonstrated during the summer of 2005 by the WISDOM initiative on malaria, where 46 million ligands were docked for a total amount of 80 CPU years in 6 weeks.

7.1. EGEE - an enabling grid for eScience

The EGEE project (Enabling Grid for E-sciencE) brings together experts from over 27 countries with the common aim of building on recent advances in Grid technology and developing a service Grid infrastructure which is available to scientists 24 hours-a-day. The project aims to provide researchers in academia and industry with access to major computing resources, independent of their geographic location. The EGEE infrastructure is now a production grid with a large number of applications installed and used on the available resources. The infrastructure involves more than 180 sites spread in Europe, America and Asia.

The WISDOM application was deployed within the framework of the biomedical Virtual Organization (VO). Figure 1 gives an overview of the geographical distribution of the resource nodes available for biomedical applications, which scale up to 3000 CPUs and 21 TB disk space.

Figure 1: Geographical distribution of EGEE biomedical resources in August 2005. The resources are spread over 15 countries.

7.2. The WISDOM initiative

WISDOM stands for World-wide In Silico Docking On Malaria. Its goal was to propose new inhibitors for a family of proteins produced by Plasmodium falciparum using in silico virtual docking on a grid infrastructure. The chosen target is involved in the haemoglobin metabolism, which is one of the key metabolic processes in the survival of the parasite. There are several proteases involved in human haemoglobin degradation inside the food vacuole of the parasite inside the erythrocytes. Plasmepsin, the aspartic protease of Plasmodium, is responsible for the initial cleavage of human haemoglobin, which is later followed by the action of other proteases. There are ten different plasmepsins coded by ten different genes in Plasmodium falciparum (Plm I, II, IV, V, VI, VII, VIII, IX, X and HAP). High levels of sequence homology are observed between different plasmepsins (65-70%). At the same time, they share only 35% sequence homology with their nearest human aspartic protease, Cathepsin D4. This and the presence of accurate X-ray crystallographic data make plasmepsins ideal targets for rational drug design against malaria.

Two docking software packages were used in WISDOM: one is FlexX, a commercial software made graciously available by BioSolveIT for a limited time, and the other is Autodock, a software which is open-source for academic laboratories and which uses a different docking method. Chemical compounds were obtained from the ZINC database. A subset of the ZINC database, the ChemBridge database (~500000 compounds), was docked using FlexX and Autodock. Another drug-like subset (~500000 compounds) of the ZINC database was docked using FlexX.

7.3. WISDOM - results and perspectives

During 6 weeks, 72751 jobs were launched for a total of 80 CPU years, producing 1 TB of data. A record number of 1643 jobs ran in parallel on 58 grid nodes, each representing from a few to more than 1000 processors.

First analysis of WISDOM results at Fraunhofer Institute (SCAI) has selected the 10,000 most promising compounds out of 1 million candidates using docking scores. The best scoring ligands include both known inhibitors and new ones. Most importantly, besides well established chemical core structures that share features with the already known plasmepsin inhibitors, new chemical entities were discovered. This is a first demonstration of the relevance of the in silico approach.

Before in vitro experimentation, the best compounds will be re-ranked using molecular modelling within the framework of the BioInfoGrid European grid project. In parallel, a new massive deployment of in silico docking is foreseen on the EGEE infrastructure with new targets in the fall of 2006.

8. Conclusion

Through this paper, we call for a distributed, internet-based collaboration to address one of the worst plagues of our present world, malaria. The spirit is a non-proprietary peer-production of information-embedding goods. And we propose to use the grid technology to enable such a world wide “open source” like collaboration. The first step towards this vision has been achieved during the summer on the EGEE grid infrastructure where 46 million ligands were docked for a total amount of 80 CPU years in 6 weeks in the quest for new drugs. Further development of a grid-enabled virtual screening pipeline is underway in several European projects.

9. Acknowledgements

Many ideas expressed in this document were inspired by the literature cited in the references as well as by discussions with H. Bilofsky, C. Jones, M. Peitsch, T. Schwede and R. Ziegler. EGEE is a project funded by the European Union under contract INFSO-RI-508833.

10. References

R. Elderman et al, J. Infect. Dis. 168, 1066 (1993)

C.F. Curtis and S.L. Hoffman, Science 290, 1508-1509 (2000)

An open-source shot in the arm? The Economist Technology Quarterly, June 12th 2004

Fabrizio Gagliardi, Bob Jones, François Grey, Marc-Elian Bégin, Matti Heikkurinen, "Building an infrastructure for scientific Grid computing: status and goals of the EGEE project". Philosophical Transactions: Mathematical, Physical and Engineering Sciences, Volume 363, Number 1833, August 15, 2005, Pages 1729-1742, DOI:10.1098/rsta.2005.1603

N. Jacq, J. Salzemann, Y. Legré, M. Reichstadt, F. Jacq, E. Medernach, M. Zimmermann, A. Maas, M. Sridhar, K. Vinodkusam, J. Montagnat, H. Schwichtenberg, M. Hofmann and V. Breton, In silico docking on grid infrastructures: the case of WISDOM, submitted to FGCS, 2006

M. Rarey, B. Kramer, T. Lengauer, G. Klebe, Predicting Receptor-Ligand interactions by an incremental construction algorithm, J. Mol. Biol. 261 (1996) 470-489.

G.M. Morris, D.S. Goodsell, R.S. Halliday, R. Huey, W.E. Hart, R.K. Belew, A.J. Olson, Automated Docking Using a Lamarckian Genetic Algorithm and Empirical Binding Free Energy Function, J. Computational Chemistry, 19 (1998) 1639-1662.

Irwin, Shoichet, J. Chem. Inf. Model. 45(1) (2005) 177-82.

H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne, The Protein Data Bank, Nucleic Acids Research, 28 (2000) 235-242.

Kanehisa, M., Goto, S., Hattori, M., Aoki-Kinoshita, K.F., Itoh, M., Kawashima, S., Katayama, T., Araki, M., and Hirakawa, M.; From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. 34, D354-357 (2006)

Kuntz, I. D.; Blaney, J. M.; Oatley, S. J.; Langridge, R.; Ferrin, T. E. A Geometric Approach to Macromolecule-Ligand Interactions. J. Mol. Biol. 1982, 161, 269-288

BioSolveIT Homepage: http://www.biosolveit.de

Florence Jacq, Frank Bacin, Nonfounikoun Meda, Denise Donnarieix, Jean Salzemann, Vincent Vayssiere, Michel Renaud, François Traore, Gertrude Meda, Rigobert Nikiema and Vincent Breton, Towards grid-enabled telemedicine in Africa, submitted to IST Africa 2006 conference

M. C. Peitsch et al., Informatics and knowledge management at the Novartis Institutes for BioMedical Research, SCIP-online, 46 (2004) 1-4.

R.W. Spencer, Highthroughput virtual screening of historic collections on the file size, biological targets, and file diversity, Biotechnol. Bioeng 61 (1998) 61-67.

WISDOM Homepage: http://wisdom.eu-egee.fr

J. Weisner, R. Ortmann, H. Jomaa, M. Schlitzer, New Antimalarial drugs, Angew. Chem. Int. 42 (2003) 5274-529.

S. E. Francis, D. J. Jr. Sullivan, D.E. Goldberg, Hemoglobin metabolism in the malaria parasite plasmodium falciparum, Annu. Rev. Microbiol. 51 (1997), 97-123.

G. H. Coombs, D.E. Goldberg, M. Klemba, C. Berry, J. Kay, J.C. Mottram, Aspartic proteases of plasmodium falciparum and other protozoa as drug targets, Trends parasitol. 17 (2001), 532-537.

A.M. Silva, A.Y. Lee, S.V. Gulnik, P. Majer, J. Collins, T.N. Bhat, P.J. Collins, R.E. Cachau, K.E. Luker, I.Y. Gluzman, S.E. Francis, A. Oksman, D.E. Goldberg, J.W. Erickson, Structure and inhibition of plasmepsin II, A haemoglobin degrading enzyme from Plasmodium falciparum, Proc. Natl. Acad. Sci. USA 93 (1996) 10034-10039.

PubChem, http://pubchem.ncbi.nlm.nih.gov/