

                   Overview of Chemical Informatics and Cyberinfrastructure Collaboratory

                   Aug 16 2006
                   Geoffrey Fox

       Computer Science, Informatics, Physics
         Pervasive Technology Laboratories
      Indiana University Bloomington IN 47401
   Local teams, successful prototypes, and international collaboration have been
    set up in 3 initial major focus areas:
     • Chemical Informatics Cyberinfrastructure/Grids with services,
       workflows and demonstration uses building on success in other
       applications (LEAD) and showing distributed integration of
       academic and commercial tools
     • Computational Chemistry Cyberinfrastructure/Grids with
       simulation, databases and TeraGrid use
     • Education with courses and degrees
   A review of activities suggests that we also formalize work in two further areas:
     • Chemical Informatics Research: model applicability
     • Interfacing with the User: a bench-chemist-friendly portal

                       Current Status
   Web site
   Wiki chosen to support project as a shared editable web space
   Building a Collaboratory involving PubChem: a global information system,
    accessible anywhere and at any time, that enhances PubChem with
    distributed tools (clustering, simulation, annotation, etc.) and data
   Adopted Taverna as the workflow system because it is popular in
    bioinformatics, but we will evaluate other systems such as GPEL from LEAD
   Preparing a large set of runs (OSCAR3, CDK, Mopac) on the local
    23-teraflop Big Red supercomputer
   Initial results discussed at conferences/workshops/papers
     • Gordon Conferences, ACS, SDSC tutorial
   First new Cheminformatics courses offered
   Advisory board set up and met
   Videoconference meetings with Peter Murray-Rust and his group
    at Cambridge roughly every 2-3 weeks
   Good or potentially good interactions with NIH DTP, Scripps, Lilly
    and Michigan ECCR
              CICC Senior Personnel
   Geoffrey C. Fox                    Peter T. Cherbas
   Mu-Hyun (Mookie) Baik              Mehmet M. Dalkilic
   Dennis B. Gannon                   Charles H. Davis
   Marlon Pierce                      A. Keith Dunker
   Beth A. Plale                      Kelsey M. Forsythe
   Gary D. Wiggins                    Kevin E. Gilbert
   David J. Wild                      John C. Huffman
   Yuqing (Melanie) Wu                Malika Mahoui
                                      Daniel J. Mindiola
                                      Santiago D. Schnell
                                      William Scott
                                      Craig A. Stewart
                                      David R. Williams

   From Biology, Chemistry, Computer Science, and Informatics
   at IU Bloomington and IUPUI (Indianapolis)

              CICC Advisory Board
   Alan D. Palkowitz (Eli Lilly)
   Chris Peterson (Kalypsys)
   David Spellmeyer (IBM)
   Dimitris K. Agrafiotis (Johnson & Johnson)
   Horst Hemmerle (Eli Lilly)
   James M. Caruthers (Purdue University)
   Jeremy G. Frey (University of Southampton)
   Joel Saltz (Ohio State University/University of Maryland/Johns
    Hopkins University)
   John M. Barnard (Digital Chemistry)
   John Reynders (Eli Lilly)
   Peter Murray-Rust (University of Cambridge)
   Peter Willett (University of Sheffield)
   Thompson Doman (Eli Lilly)
   Val Gillet (University of Sheffield)

   Industry and academia; met October 2005, will meet this fall
  CICC: Chemical Informatics and Cyberinfrastructure Collaboratory
  Funded by the National Institutes of Health

                         CICC Combines Grid Computing with Chemical Informatics

    Large Scale Computing Challenges
 Chemical informatics is a non-traditional area of high performance computing, but
 many new, challenging problems can be investigated in it.

    Science and Cyberinfrastructure
 CICC is an NIH-funded project to support the chemical informatics needs of High
 Throughput Cancer Screening Centers. The NIH is creating a deluge of publicly
 available data on potential new drugs.

 [Figure: processing pipeline with stages NIH PubMed database, OSCAR text analysis,
 cluster grouping, toxicity filtering, docking, initial 3D structure, molecular
 mechanics calculations, quantum mechanics calculations, database, IU's Varuna
 database, and parallel rendering]

 Chemical informatics text analysis programs can process hundreds of thousands of
 abstracts of online journal articles to extract the chemical signatures of drugs.

 OSCAR-mined molecular signatures can be clustered, filtered for toxicity, and docked
 onto larger proteins. These are classic “pleasingly parallel” tasks. Top-ranking
 docked molecules can be further examined for drug potential.

 Big Red (and the TeraGrid) will also enable us to perform time-consuming, multi-step
 quantum chemistry calculations on all of PubMed. Results go back to public databases
 that are freely accessible to the scientific community.

 CICC supports the NIH mission by combining state-of-the-art chemical informatics
 techniques with
   • World-class high performance computing
   • National-scale computing resources (TeraGrid)
   • Internet-standard web services
   • International activities for service orchestration
   • Open distributed computing infrastructure for scientists worldwide
                        Indiana University Department of Chemistry, School of Informatics, and Pervasive Technology Laboratories
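The “pleasingly parallel” pattern described above, in which each molecule is filtered and docked independently of every other, can be sketched in Python as follows. This is a hypothetical illustration, not CICC code: the molecule records, the `toxicity_ok` rule, and the `dock_score` formula are invented stand-ins for tools such as ToxTree and OpenEye docking, and a local thread pool stands in for a cluster or Grid scheduler such as Big Red or the TeraGrid.

```python
"""Minimal sketch of a "pleasingly parallel" screening pass.

Assumptions (hypothetical, not CICC code): each molecule is a dict with a name
and a few precomputed properties; toxicity_ok() and dock_score() are toy
stand-ins for real tools such as ToxTree or a docking program.
"""
from concurrent.futures import ThreadPoolExecutor


def toxicity_ok(mol):
    # Hypothetical filter: reject molecules flagged with a toxic group.
    return not mol["toxic_group"]


def dock_score(mol):
    # Hypothetical score (lower is better); a real run would call a docking code.
    return -mol["contacts"] + 0.01 * mol["weight"]


def screen_one(mol):
    # Each molecule is processed independently, so no task waits on another.
    if not toxicity_ok(mol):
        return None
    return (dock_score(mol), mol["name"])


def screen(molecules, workers=4, top=2):
    # A thread pool keeps the sketch portable; CPU-bound tools would instead
    # run as separate processes or batch jobs on a cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = [r for r in pool.map(screen_one, molecules) if r is not None]
    results.sort()  # best (lowest) docking scores first
    return [name for _, name in results[:top]]


if __name__ == "__main__":
    library = [
        {"name": "mol-A", "toxic_group": False, "contacts": 12, "weight": 310.0},
        {"name": "mol-B", "toxic_group": True, "contacts": 20, "weight": 280.0},
        {"name": "mol-C", "toxic_group": False, "contacts": 15, "weight": 450.0},
        {"name": "mol-D", "toxic_group": False, "contacts": 9, "weight": 200.0},
    ]
    print(screen(library))  # prints ['mol-C', 'mol-A']
```

Because `screen_one` shares no state between molecules, the same function could be farmed out as independent jobs; only the final ranking step needs all the results together.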
       CICC Prototype Web Services

 Basic cheminformatics
   Molecular weights
   Molecular formulae
   Tanimoto similarity
   2D structure diagrams
   Molecular descriptors
   3D structures
   InChI generation/search

 Application-based services
   Compare (NIH)
   Toxicity predictions (ToxTree)
   Literature extraction (OSCAR3)
   Clustering (BCI Toolkit)
   Docking, filtering, ... (OpenEye)
   Varuna simulation (Varuna: environment for molecular modeling, Baik, IU)

 Key Ideas
   Add value to PubChem with additional distributed services and databases
   Wrapping existing code in web services is not difficult
   Provide “core” (CDK) services and exemplars of typical tools
   Provide access to key databases via a web service interface
   Provide access to major Compute Grids

 Next steps?
   Define WSDL interfaces to enable global production of compatible Web services; refine CML
   Look at Pipeline Pilot
   Extend Computational Chemistry (Varuna) Services
   Routine TeraGrid and Big Red use
   Ready to try “Prototype Production” on OSCAR3, CDK, Mopac
   Develop more training material
   Link to screening center via Scripps
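The key idea that wrapping existing code in web services is not difficult can be illustrated with a minimal sketch that exposes a molecular-weight function, one of the basic services listed above, over HTTP using only Python's standard library. The `/mw` path, the JSON field names, the port, and the small atomic-weight table are assumptions made for illustration; the next steps above call for WSDL-defined interfaces, which this plain HTTP/JSON sketch does not attempt.

```python
"""Sketch: wrapping an existing function as a small HTTP/JSON web service.

Assumptions (hypothetical): the /mw path, JSON field names, and port are
illustrative only; this is not a WSDL/SOAP service.
"""
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Conventional IUPAC atomic weights for a few common elements (assumed subset).
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
                  "P": 30.974, "S": 32.06, "Cl": 35.45}


def molecular_weight(formula):
    """The "existing code": weight of a flat formula such as 'C2H6O'.

    Brackets, charges, and hydrates are not supported in this sketch.
    """
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    # Reject input that the element/count pattern does not cover completely.
    if not tokens or "".join(e + n for e, n in tokens) != formula:
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for element, count in tokens:
        if element not in ATOMIC_WEIGHTS:
            raise ValueError(f"no weight for element {element!r}")
        total += ATOMIC_WEIGHTS[element] * int(count or 1)
    return round(total, 3)


class MWHandler(BaseHTTPRequestHandler):
    """Thin service layer: parse the request, call the function, return JSON."""

    def do_GET(self):
        url = urlparse(self.path)
        formula = parse_qs(url.query).get("formula", [""])[0]
        if url.path != "/mw":
            status, body = 404, {"error": "unknown path"}
        else:
            try:
                status, body = 200, {"formula": formula,
                                     "molecular_weight": molecular_weight(formula)}
            except ValueError as exc:
                status, body = 400, {"error": str(exc)}
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__":
    # Blocks forever; try http://localhost:8000/mw?formula=C2H6O
    HTTPServer(("localhost", 8000), MWHandler).serve_forever()
```

Requesting `http://localhost:8000/mw?formula=C2H6O` should return `{"formula": "C2H6O", "molecular_weight": 46.069}`. The wrapped function itself is unchanged; the handler only translates between HTTP and a Python call.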

 [Figure: ChemBioGrid architecture, showing the Researcher, Chemical Experiments,
 a Simulation Service (FORTRAN code), a DB Service (queries, clustering, curation,
 etc.), a Reaction DB, a QM Database, public databases (PubChem, PDB, NCI, etc.),
 TeraGrid supercomputers, and Condor “flocks”]
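The clustering offered by the DB service, together with the Tanimoto similarity and clustering services listed earlier, can be sketched as follows. For binary fingerprints the Tanimoto coefficient is T(A, B) = |A ∩ B| / (|A| + |B| - |A ∩ B|). In this sketch, fingerprints are Python sets of “on” bit positions with made-up values, the 0.6 threshold is an assumption, and the grouping uses a simple leader (sphere-exclusion-style) algorithm chosen for illustration; it is not the BCI Toolkit's method.

```python
"""Sketch: Tanimoto similarity on binary fingerprints plus leader clustering.

Assumptions (hypothetical): fingerprints are sets of "on" bit positions, and
the 0.6 threshold is illustrative; a real service would compute fingerprints
with a toolkit such as CDK and might use a different clustering method.
"""


def tanimoto(a, b):
    """Tanimoto coefficient |A & B| / (|A| + |B| - |A & B|) for two bit sets."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def leader_cluster(fingerprints, threshold=0.6):
    """Assign each molecule to the first leader it is similar enough to.

    fingerprints: dict mapping molecule name -> set of on-bits; insertion
    order is the processing order. Returns dict leader -> list of members.
    """
    clusters = {}
    for name, fp in fingerprints.items():
        for leader in clusters:
            if tanimoto(fp, fingerprints[leader]) >= threshold:
                clusters[leader].append(name)
                break
        else:
            clusters[name] = [name]  # no close leader: start a new cluster
    return clusters


if __name__ == "__main__":
    fps = {
        "mol-1": {1, 4, 7, 9, 12},
        "mol-2": {1, 4, 7, 9, 13},
        "mol-3": {2, 5, 8, 20},
        "mol-4": {2, 5, 8, 20, 21},
    }
    print(round(tanimoto(fps["mol-1"], fps["mol-2"]), 3))  # prints 0.667
    print(leader_cluster(fps))
    # prints {'mol-1': ['mol-1', 'mol-2'], 'mol-3': ['mol-3', 'mol-4']}
```

Leader clustering needs only one pass over the data, which suits large libraries, but its result depends on input order; that trade-off is one reason toolkits offer several clustering methods.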
