JChem Base chemical database



                                Szilárd Dóránt




May, 2005
                                                                   Slide 2


                                                        Contents



                      Introduction          Structure cache
                      Structural overview   Standardization
                      Compatibility         Search options
                      Administration        JSP example
                      JChem tables          API examples
                      Fingerprints          Performance
                      Structural search     Future plans




Jchem Base chemical database — May 2005
                                                                     Slide 3


                                                Introduction


               JChem Base provides high-performance, Java-based
               tools for the storage, search, and retrieval of
               chemical structures and associated data.


               These components can be integrated into web-
               based or standalone applications in association
               with other ChemAxon tools.


                                                                                                Slide 4


                                                          Structural overview

                      Layers, from top to bottom:

                          • Application, or web application (JSP) accessed
                            from a web browser
                          • JChem Base API: chemical logic and structure cache
                          • JDBC driver: standard interface to the RDBMS
                          • RDBMS (e.g. Oracle, MySQL, etc.): storage and security

                                                                               Slide 5


                                          Compatibility and integration

                    File formats:
                            • SMILES
                            • MDL molfile (v2000 and v3000)
                            • MDL SDF
                            • RXN
                            • RDF
                            • MRV

                    Database engines:
                            • Oracle
                            • MySQL
                            • MS SQL Server
                            • PostgreSQL
                            • MS Access
                            • DB2
                            • etc.

                    Integration:
                            • 100% Java
                            • extensive API
                            • JChem Cartridge for Oracle

                    Operating systems:
                            • Windows
                            • Linux
                            • Mac OS X
                            • Solaris
                            • etc.


                                                             Slide 6


                          Administration with JChemManager




                 User interface for
                         • creating tables
                         • import
                         • export
                         • deleting rows
                         • dropping tables

                 Most functions are also available
                 from the command line.


                                                                          Slide 7


                                             The property table

                  The property table stores information about JChem
                  structure tables, including:
                      • Fingerprint parameters
                      • Custom standardization rules
                      • Recent changes (to optimize cache updates)
                      • Other table options and information
                      • Database-related licence keys

                  More than one property table can be used; each
                  property table represents a particular JChem
                  environment.


                                                                                                          Slide 8


                                          The structure of JChem tables

               Column name       Explanation
               cd_id             unique numeric identifier in the table
               cd_structure      the imported structure in its original format, unmodified
                                 except for the removal of data fields
               cd_smiles         the standardized structure in ChemAxon Extended SMILES
                                 (cxsmiles) format, used by the search process
               cd_formula        the formula of the standardized structure
               cd_molweight      the molecular weight of the standardized structure
               cd_hash           hash code used for duplicate filtering (PERFECT search)
               cd_flags          row-specific options, e.g. overriding the chiral flag
               cd_timestamp      the date and time the row was inserted
               cd_fp…            fingerprint columns
               [user fields]     custom data fields added by the user



                                                                                                     Slide 9


                                          Chemical Hashed Fingerprints

              • Chemical Hashed Fingerprints encode structural
                patterns in bit strings
              • If structure A is a substructure of structure B, every
                bit that is set in A’s fingerprint is also set in B’s
                fingerprint:

                \[ A \mathbin{\&} B = A \]

              • Tanimoto similarity of hashed fingerprints can be
                used for diversity analysis and similarity search:

                \[ T_{sim}(X, Y) = \frac{\mathrm{BitCount}(X \mathbin{\&} Y)}{\mathrm{BitCount}(X) + \mathrm{BitCount}(Y) - \mathrm{BitCount}(X \mathbin{\&} Y)} \]



                                                                                             Slide 10


                                              Structural search in database

                      A two-stage method keeps searches fast:

                      1. Rapid pre-screening reduces the number of
                           possible hit candidates
                               - Chemical Hashed Fingerprints are used for
                                 substructure and superstructure searches
                               - A hash code is used for duplicate filtering
                                 (usually during compound registration)

                      2. A graph search algorithm determines the final
                           hit list
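
The screen-then-verify pattern can be illustrated without any chemistry. In the toy sketch below, strings stand in for structures, a one-bit-per-character fingerprint stands in for the Chemical Hashed Fingerprint, and a substring test stands in for the graph search. All of these stand-ins, and the class name TwoStageSearchSketch, are assumptions made for illustration; this is not how JChem implements either stage.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy two-stage search: cheap bit screen first, exact check only on survivors. */
final class TwoStageSearchSketch {

    /** Toy fingerprint: one bit per character, folded into 128 positions. */
    static BitSet fingerprint(String s) {
        BitSet fp = new BitSet(128);
        for (char c : s.toCharArray()) {
            fp.set(c % 128);
        }
        return fp;
    }

    /** Stage 1: may let false positives through, but never rejects a true hit. */
    static boolean passesScreen(BitSet query, BitSet target) {
        BitSet common = (BitSet) query.clone();
        common.and(target);
        return common.equals(query);
    }

    /** Stage 2 stand-in: a substring test takes the place of the graph search. */
    static boolean exactMatch(String query, String target) {
        return target.contains(query);
    }

    /** Returns the row indices of all rows that contain the query. */
    static List<Integer> search(String query, List<String> rows, List<BitSet> fingerprints) {
        BitSet queryFp = fingerprint(query);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            if (!passesScreen(queryFp, fingerprints.get(i))) {
                continue; // ruled out cheaply, the expensive check is skipped
            }
            if (exactMatch(query, rows.get(i))) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("abcdef", "fedcba", "xyz", "zabz");
        List<BitSet> fingerprints = new ArrayList<>();
        for (String row : rows) {
            fingerprints.add(fingerprint(row)); // precomputed once, like a fingerprint column
        }
        System.out.println(search("abc", rows, fingerprints));
    }
}
```

In this toy data the row "fedcba" passes the screen because it contains the same characters as the query, and only the exact check removes it; that is the role the graph search plays for the real candidates.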


                                                                                 Slide 11


                                                      Structure Cache

              • Contains fingerprints for screening and ChemAxon Extended
                SMILES for atom-by-atom search (ABAS)
              • Instant access to the structures for the search process
              • Reduced load on the database server
              • Incremental update ensures minimum overhead after changes
                in the table
              • Small memory footprint due to
                      – SMILES compression
                      – Optimized storage technique
              • Approximately 100 MB of memory needed for 1 million typical
                drug-like structures (using 512-bit fingerprints)
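
A rough split of the quoted memory figure can be worked out from the fingerprint length alone. The sketch below assumes 1 MB = 10^6 bytes and assumes the cache holds nothing but fingerprints and compressed SMILES; neither assumption comes from the slides, so the per-structure remainder is only an order-of-magnitude estimate. The class name is made up for illustration.

```java
/** Back-of-the-envelope split of the quoted cache size; all figures are estimates. */
final class CacheMemoryEstimate {

    /** Bytes needed to hold one fingerprint of the given bit length per structure. */
    static long fingerprintBytes(long structures, int fingerprintBits) {
        return structures * (fingerprintBits / 8);
    }

    /** Average bytes per structure left once the fingerprints are subtracted. */
    static long remainingBytesPerStructure(long totalBytes, long structures, int fingerprintBits) {
        return (totalBytes - fingerprintBytes(structures, fingerprintBits)) / structures;
    }

    public static void main(String[] args) {
        long structures = 1_000_000L;
        long totalBytes = 100_000_000L; // "approximately 100 MB", with 1 MB = 10^6 bytes (assumption)
        int fingerprintBits = 512;

        System.out.println("Fingerprint bytes: " + fingerprintBytes(structures, fingerprintBits));
        System.out.println("Remaining bytes per structure: "
                + remainingBytesPerStructure(totalBytes, structures, fingerprintBits));
    }
}
```

Under these assumptions the fingerprints account for about 64 MB, leaving a few tens of bytes per structure for the compressed SMILES and any bookkeeping.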



                                                                 Slide 12


                                              Standardization

              • Default standardization
                includes:
                      – Hydrogen removal
                      – Aromatization
              • Custom standardization
                can be specified for
                each table by supplying
                an XML configuration file
                at table creation or in the
                "Regenerate" dialog of
                JChemManager (jcman)


                                                                     Slide 13


                                   Custom Standardization Example




                     [Figure: example structure before and after custom standardization]




                                                                         Slide 14


                                          Database search options


                         • Maximum search time / number of hits
                         • SQL SELECT statement for pre-filtering
                         • Ordering of results
                         • Result table
                         • Inverse hit list
                         • Chemical Terms filter constraint




                                                                          Slide 15


                                                JSP example application

              •     Open source, customizable
              •     Features:
                      – Substructure, Superstructure,
                        Exact and Similarity search
                      – Molecular Descriptor similarity
                        search with descriptor coloring
                      – Substructure hit alignment and
                        coloring, inverse hit list
                      – Chemical Terms filter
                      – Import / Export
                      – Export of hits
                      – Insert / Modify / Delete
                        structures



                                                                                     Slide 16


                              API example : connecting to a database


            import java.sql.Connection;

            ConnectionHandler ch = new chemaxon.jchem.db.ConnectionHandler();
            ch.setDriver("oracle.jdbc.driver.OracleDriver");
            ch.setUrl("jdbc:oracle:thin:@localhost:1521:mydb");
            ch.setPropertyTable("JChemProperties");
            ch.setLoginName("scott");
            ch.setPassword("tiger");
            ch.connect();
            // the java.sql.Connection object is available if needed:
            Connection con = ch.getConnection();
            // …
            // closing the connection:
            ch.close();




                                                                                                         Slide 17


                                          API example : database import

                     // ch is a connected ConnectionHandler (see the connection example)
                     Importer importer = new chemaxon.jchem.db.Importer();
                     importer.setConnectionHandler(ch);
                     importer.setInput("sample.sdf");
                     // importer.setInput(is);    // alternatively, an input stream can be specified
                     importer.setTableName("SCOTT.STRUCTURES");
                     importer.setHaltOnError(false);
                     importer.setDuplicateImportAllowed(false);    // filters out duplicates

                     // specifying table field - SDFile field pairs:
                     String fieldPairs = "DB_Field1=SDF_Field1; DB_Field2=SDF_Field2";
                     importer.setFieldConnections(fieldPairs);
                     int importedCount = importer.importMols();
                     System.out.println("Imported " + importedCount + " structures");




                                                                                            Slide 18


                                          API example : database export


                     import java.io.FileOutputStream;
                     import java.io.OutputStream;

                     // ch is a connected ConnectionHandler (see the connection example)
                     Exporter exporter = new chemaxon.jchem.db.Exporter();
                     exporter.setConnectionHandler(ch);

                     exporter.setTableName("structures");
                     // data fields to be exported with the structure:
                     exporter.setFieldList("cd_id cd_formula name comments");
                     String fileName = "output.sdf";
                     OutputStream os = new FileOutputStream(fileName);
                     exporter.setOutputStream(os);
                     exporter.setFormat("sdf");
                     int exportedCount = exporter.writeAll();
                     os.close();    // release the output file once the export is done
                     System.out.println("Exported " + exportedCount + " structures");




                                                                                                          Slide 19


                                          API example : database search


                  JChemSearch searcher = new chemaxon.jchem.db.JChemSearch();
                  searcher.setConnectionHandler(ch);
                  searcher.setSearchType(JChemSearch.SUBSTRUCTURE);
                  searcher.setQueryStructure("c1ccccc1");
                  searcher.setStructureTable("SCOTT.STRUCTURES");
                  // a query that returns cd_id values can be used for prefiltering
                  // (cd_id is qualified because both tables have a cd_id column):
                  searcher.setFilterQuery(
                              "SELECT structures.cd_id FROM structures, biodata WHERE "
                              + "structures.cd_id = biodata.cd_id AND biodata.toxicity < 0.3");
                  searcher.setWaitingForResult(true);       // otherwise runs in a separate thread
                  searcher.setStructureCaching(true);       // caching speeds up the search
                  searcher.run();
                  // getting the results as cd_id values:
                  int[] results = searcher.getResults();



                                                                                                Slide 20


                              API example : inserting a structure

            // ConnectionHandler, mode, table name and data field names:
            UpdateHandler uh = new chemaxon.jchem.db.UpdateHandler(
                        ch, UpdateHandler.INSERT, "structures", "comment, stock");
            uh.setValueForFixColumns("c1ccccc1"); // the structure
            // specifying data field values:
            uh.setStructureValueForAdditionalColumn(1, "some text");
            uh.setStructureValueForAdditionalColumn(2, Double.valueOf(8.5));
            uh.setDuplicateFiltering(true); // filtering duplicate structures
            int id = uh.execute(true); // getting back the cd_id of the inserted structure
            if (id > 0) {
                 System.out.println("Inserted, cd_id value : " + id);
            } else {
                 // a negative value is the negated cd_id of the existing duplicate
                 System.out.println("Already exists with cd_id value : " + (-id));
            }
            // storing update information; the database connection remains open:
            uh.close();


                                                                                                              Slide 21


                                                              Performance (1)
          Compound registration:

              Number of      Elapsed time,               Elapsed time,
              compounds      duplicates not checked      duplicates checked
                 10,000      32 s                        45 s
                100,000      4 min 11 s                  6 min 20 s
                200,000      8 min 17 s                  12 min 26 s

          Substructure search in a table of 3 million compounds
          (the query structure drawings are not reproduced):

              Query      Number of hits      Search time (s)
              1                      12                  0.1
              2                     936                  0.9
              3                       0                  1.2
              4                  49,740                 10.7

          Server parameters: Windows XP; 1 CPU: Intel P4 3.0GHz; 2GB RAM; Oracle 9i
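
The registration timings can be turned into throughput figures, which makes the cost of duplicate checking easier to compare across table sizes. The sketch below only divides the numbers quoted on this slide; the class name and layout are illustrative, and no new measurements are implied.

```java
import java.util.Locale;

/** Converts the registration timings quoted on the slide into compounds per second. */
final class RegistrationThroughput {

    /** Compounds registered per second for a run of the given duration. */
    static double perSecond(int compounds, int minutes, int seconds) {
        return compounds / (double) (minutes * 60 + seconds);
    }

    public static void main(String[] args) {
        // {compounds, min, s without duplicate check, min, s with duplicate check}
        int[][] runs = {
            {10_000, 0, 32, 0, 45},
            {100_000, 4, 11, 6, 20},
            {200_000, 8, 17, 12, 26},
        };
        for (int[] r : runs) {
            System.out.println(String.format(Locale.ROOT,
                    "%,d compounds: %.0f/s unchecked, %.0f/s with duplicate check",
                    r[0], perSecond(r[0], r[1], r[2]), perSecond(r[0], r[3], r[4])));
        }
    }
}
```

On the quoted numbers, registration runs at a few hundred compounds per second, and the rate does not fall as the table grows from 10,000 to 200,000 rows.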
                                                                                           Slide 22


                                                             Performance (2)
          Similarity search, Tanimoto > 0.8
          (the query structure drawings are not reproduced):

              Query      Number of hits      Search time (s)
              1                      24                  1.5
              2                     156                  1.3
              3                     336                  1.3

          Server parameters: Windows XP; 1 CPU: Intel P4 3.0GHz; 2GB RAM; Oracle 9i




                                                                                     Slide 23


                                                            Future plans

                        • Additional layer: JChem Server (later also as grid)
                        • Structural keys as optional extension to current
                          fingerprints
                        • Tables for storing query structures
                        • Tables for storing general (Markush) structures
                        • Partial clean option for hit alignment
                        • Installer
                        • etc.



                                                                     Slide 24


                                                      Summary



                      ChemAxon’s JChem Base toolkit
                      provides sophisticated methods to deal
                      with chemical structures and associated
                      data.


                      The use of fingerprints and the structure
                      cache provides high search performance.



                                                                           Slide 25


                                                             Links

                 • JChem home page:
                         – www.jchem.com
                 • Live demos:
                         – www.jchem.com/examples
                 • API documentation:
                         – www.jchem.com/doc/api
                 • Brochure:
                         – www.chemaxon.com/brochures/JChemBase.pdf



                                                  Slide 26


              Thank you for your attention




                         Máramaros köz 3/a
                         Budapest, 1037
                         Hungary

                         info@chemaxon.com
                         www.chemaxon.com

