					                                                 Slide 1




Optimized Virtual Screening

  Miklós Vargyas, Zsuzsanna Szabó, György Pirok, Ferenc Csizmadia
  ChemAxon Ltd.

  Matthias Steger, Modest von Korff
  AXOVAN AG, Allschwil, Switzerland
  (Axovan is now Actelion.)
                                                     Slide 2

Drug research
Is it searching for a needle in a haystack?




  corporate database              structures found
                                                       Slide 3

Drug research
Find something similar to a fistful of needles




query structures (known actives)    corporate database (targets)    structures found (virtual hits)
                                                                      Slide 4

Molecular similarity
What is it?
Two compounds are similar when their chemical, pharmacological or biological
properties match.
The more features two molecules have in common, the higher their similarity.

Chemical




Pharmacophore
                                                                                      Slide 5

Molecular similarity
How to calculate it?
Quantitative assessment of similarity/dissimilarity of structures
• needs a numerically tractable form
• molecular descriptors, fingerprints, structural keys

Sequences/vectors of bits or numeric values that can be compared by
distance functions and similarity metrics.



$$E(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad T(x, y) = \frac{B(x \,\&\, y)}{B(x) + B(y) - B(x \,\&\, y)}$$
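A minimal sketch of the two measures, assuming that B(x) counts the bits set in x, that & is the bitwise AND, and that E is the usual Euclidean distance with a square root. Bit vectors are held as Python integers; the function names and sample values are illustrative only.

```python
import math


def popcount(bits: int) -> int:
    """B(x): number of bits set in the fingerprint."""
    return bin(bits).count("1")


def tanimoto(x: int, y: int) -> float:
    """T(x, y) = B(x & y) / (B(x) + B(y) - B(x & y))."""
    common = popcount(x & y)
    union = popcount(x) + popcount(y) - common
    # Two empty fingerprints are treated as identical (an assumption).
    return common / union if union else 1.0


def euclidean(x, y) -> float:
    """E(x, y): Euclidean distance between two numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


x = int("1011010000", 2)
y = int("1001000100", 2)
print(tanimoto(x, y))                          # 2 common bits, 5 in the union
print(euclidean([0, 2, 7, 1], [1, 6, 0, 4]))   # square root of 75
```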
                                                                        Slide 6

Molecular descriptors
Example 1: chemical fingerprint

hashed binary fingerprint
• encodes topological properties of the chemical graph: connectivity,
  edge label (bond type), node label (atom type)
• allows the comparison of two molecules with respect to their
  chemical structure


Construction

1. find all walks of 0, 1, …, n steps in the chemical graph
2. generate a bit array for each walk with a given number of bits set
3. merge the bit arrays with a logical OR operation
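The three construction steps can be sketched as follows. This is only an illustration, not the generator used by the authors: it assumes walks never revisit an atom, uses SHA-256 as a stand-in hash, and the names `walks`, `bit_array` and `fingerprint` are hypothetical.

```python
import hashlib


def walks(atoms, bonds, max_steps):
    """Step 1: yield label strings of all walks with 0..max_steps bonds."""
    neighbours = {i: [] for i in range(len(atoms))}
    for (a, b), bond_type in bonds.items():
        neighbours[a].append((b, bond_type))
        neighbours[b].append((a, bond_type))

    def extend(path, label):
        yield label
        if len(path) - 1 < max_steps:
            for nxt, bond_type in neighbours[path[-1]]:
                if nxt not in path:  # assumption: no atom is visited twice
                    yield from extend(path + [nxt], label + bond_type + atoms[nxt])

    for start in range(len(atoms)):
        yield from extend([start], atoms[start])


def bit_array(label, length, bits_per_walk):
    """Step 2: hash a walk to at most bits_per_walk set bits."""
    digest = hashlib.sha256(label.encode()).digest()
    mask = 0
    for k in range(bits_per_walk):
        mask |= 1 << (digest[k] % length)
    return mask


def fingerprint(atoms, bonds, max_steps=3, length=64, bits_per_walk=2):
    """Step 3: merge the bit arrays of all walks with logical OR."""
    fp = 0
    for label in walks(atoms, bonds, max_steps):
        fp |= bit_array(label, length, bits_per_walk)
    return fp


# Heavy atoms of ethanol: C-C-O
atoms = ["C", "C", "O"]
bonds = {(0, 1): "-", (1, 2): "-"}
print(format(fingerprint(atoms, bonds), "064b"))
```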
                                                         Slide 7

Molecular descriptors
Example 1: chemical fingerprint
Example
CH3 – CH2 – OH

walks from the first carbon atom

             length walk                 bit array
             0       C                   1010000000
             1       C–H                 0001010000
             1       C–C                 0001000100
             2       C–C–H               0001000010
             2       C–C–O               0100010000
             3       C–C–O–H             0000011000

merge bit arrays for the first carbon atom: 1111011110
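The merge step can be checked directly on the bit arrays listed above. The snippet only ORs the six arrays from the table; the dictionary keys are labels for readability.

```python
# Bit arrays of the walks starting from the first carbon atom (from the table).
bit_arrays = {
    "C": "1010000000",
    "C-H": "0001010000",
    "C-C": "0001000100",
    "C-C-H": "0001000010",
    "C-C-O": "0100010000",
    "C-C-O-H": "0000011000",
}

merged = 0
for bits in bit_arrays.values():
    merged |= int(bits, 2)  # logical OR of all bit arrays

print(format(merged, "010b"))  # 1111011110
```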
                                                                      Slide 8

Molecular descriptors
Example 1: chemical fingerprint




   0100010100010100010000000001101010011010100000010100000000100000




   0100010100010100010000000001101010011010100000000100000000100000
                                                                      Slide 9

Molecular descriptors
Example 2: pharmacophore fingerprint

• encodes pharmacophore properties of molecules as frequency
  counts of pharmacophore point pairs at a given topological distance
• allows the comparison of two molecules with respect to their
  pharmacophore



Construction

1. assign a pharmacophore point type to each atom
2. calculate the length of the shortest path between each pair of atoms
3. assign a histogram to every pharmacophore point pair and count
   the frequency of the pair with respect to its distance
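A rough sketch of these three steps on a toy graph. The point types are supplied by hand here (a real implementation would perceive them from the structure), distances come from a breadth-first search, and the function names and the six-bond distance limit are assumptions for illustration.

```python
from collections import Counter, deque
from itertools import combinations


def shortest_paths(adjacency, start):
    """Topological distance (number of bonds) from start to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nxt in adjacency[atom]:
            if nxt not in dist:
                dist[nxt] = dist[atom] + 1
                queue.append(nxt)
    return dist


def pharmacophore_fingerprint(types, adjacency, max_distance=6):
    """Count pharmacophore point pairs per topological distance."""
    dist = {atom: shortest_paths(adjacency, atom) for atom in adjacency}
    counts = Counter()
    for a, b in combinations(sorted(types), 2):
        d = dist[a].get(b)
        if d is not None and 1 <= d <= max_distance:
            # Order the pair as HH, HD, HA, DD, DA, AA.
            pair = "".join(sorted(types[a] + types[b], reverse=True))
            counts[(pair, d)] += 1
    return counts


# Toy chain of five atoms 0-1-2-3-4; atoms without a type are omitted.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
types = {0: "D", 2: "H", 4: "A"}  # donor, hydrophobic, acceptor
print(pharmacophore_fingerprint(types, adjacency))
```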
                                                                                                                                                       Slide 10

Molecular descriptors
Example 2: pharmacophore fingerprint

[Figure: two pharmacophore fingerprint histograms. Atoms are colored by pharmacophore point type: acceptor, donor, hydrophobic, none. Vertical axis: 0 to 12; horizontal axis: pharmacophore point pairs AA, DA, DD, HA, HD, HH, each at topological distances 1 to 6.]
                                                                                      Slide 11

Virtual screening using fingerprints
Individual query structure

[Figure: the query structure is encoded as a query fingerprint; its proximity to the target fingerprint of each target structure is calculated, and the targets close to the query are the hits.]
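A small sketch of screening with one query fingerprint. It assumes that proximity is measured as Tanimoto dissimilarity (1 − T) and that a target becomes a hit when its dissimilarity does not exceed a threshold; the threshold, the target names and the 8-bit fingerprints are made up for the example.

```python
def tanimoto_dissimilarity(x: int, y: int) -> float:
    """1 - Tanimoto coefficient of two binary fingerprints."""
    common = bin(x & y).count("1")
    union = bin(x | y).count("1")
    return 1.0 - (common / union if union else 1.0)


def screen(query: int, targets: dict, threshold: float):
    """Return (dissimilarity, name) of every target within the threshold."""
    hits = []
    for name, fp in targets.items():
        d = tanimoto_dissimilarity(query, fp)
        if d <= threshold:
            hits.append((d, name))
    return sorted(hits)  # closest targets first


query = 0b11110000
targets = {"t1": 0b11110000, "t2": 0b11100001, "t3": 0b00001111}
for d, name in screen(query, targets, threshold=0.5):
    print(name, round(d, 2))
```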
                                                                                       Slide 12

Virtual screening using fingerprints
Multiple query structures
[Figure: the fingerprints of several query structures are combined into a single hypothesis fingerprint; its proximity to the target fingerprint of each target structure is calculated, and the targets close to the hypothesis are the hits.]
                                                                       Slide 13

Hypothesis fingerprints
Advantages
• allows faster operation
• captures the features common to the individual actives

Hypothesis types

Active 1   0     2    7     1   0    1    6     4       0    0     9      0
Active 2   1     6    0     4   3    3    1     2       2    0     5      1
Active 3   2     4    4     1   0    2    5     3       4    3     4      5

Minimum    0     2    0     1   0    1    1     2       0    0     4      0
Average    1     4   3.67   2   1    2    4     3       2   1.33   6      2
Median     1.5   4   5.5    1   0    2    5     3       3    0     5      3
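The three hypothesis types can be recomputed from the three actives in the table. The median rule used here is inferred from the numbers, not stated on the slide: a feature absent (zero) in at least half of the actives gets 0, otherwise the median is taken over the non-zero counts. With that assumption the Minimum and Median rows are reproduced exactly; the plain arithmetic mean reproduces the Average row except for the tenth column, where it gives 1 while the slide shows 1.33.

```python
from statistics import mean, median

actives = [
    [0, 2, 7, 1, 0, 1, 6, 4, 0, 0, 9, 0],
    [1, 6, 0, 4, 3, 3, 1, 2, 2, 0, 5, 1],
    [2, 4, 4, 1, 0, 2, 5, 3, 4, 3, 4, 5],
]


def minimum_hypothesis(rows):
    return [min(column) for column in zip(*rows)]


def average_hypothesis(rows):
    return [round(mean(column), 2) for column in zip(*rows)]


def median_hypothesis(rows):
    """Median with special treatment of absent features (inferred rule)."""
    result = []
    for column in zip(*rows):
        present = [value for value in column if value != 0]
        if 2 * len(present) <= len(column):
            result.append(0)  # feature missing from at least half the actives
        else:
            result.append(median(present))
    return result


print(minimum_hypothesis(actives))
print(average_hypothesis(actives))
print(median_hypothesis(actives))
```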
                                                                   Slide 14

Hypothesis fingerprints
Hypothesis   Advantages                              Disadvantages
Minimum      • strict conditions for hits if         • false results with asymmetric metrics
               actives are fairly similar            • misses common features of highly diverse sets
                                                     • very sensitive to one missing feature
Average      • captures common features of more      • less selective if actives are very similar
               diverse active sets
Median       • captures common features of more      • less selective if actives are very similar
               diverse active sets
             • specific treatment of the absence
               of a feature
             • less sensitive to outliers
                                                                             Slide 15

Does this work?

Active set      Size    Pharmacophore fingerprint    Chemical fingerprint
                        Tanimoto     Euclidean       Tanimoto     Euclidean
5-HT3             12       20.14         12.55         776.19        461.44
ACE               89        1.99          1.42           3.71          1.74
Angiotensin2      10       22.80         27.81         183.45        173.91
Beta2             50        3.59          1.52           7.52          2.65
D2                13       61.25         27.64         302.52        155.61
delta             20      109.53         11.66         114.48         56.22
Ftp               35       50.92         46.88         571.50        575.16
mGluR1            18       70.47          5.59         347.72        130.14
NPY-5            139        1.09          1.00           1.46          1.44
Thrombin           8        2.46          2.56           3.71          1.67
                                    Slide 16

Then why do we need optimization?
Too many hits
                                    Slide 17

Then why do we need optimization?
Inconsistent dissimilarity values


[Figure: three structures with pairwise dissimilarity values 0.57, 0.47 and 0.55.]
                                                                                                                        Slide 18


 What can be optimized?
 Parameterized metrics
$$D_{\text{Tanimoto}}^{\text{scaled, asymmetric}}(x, y) = 1 - \frac{\sum_i s_i \min(x_i, y_i)}{\alpha \left( \sum_i x_i - \sum_i s_i \min(x_i, y_i) \right) + (1 - \alpha) \left( \sum_i y_i - \sum_i s_i \min(x_i, y_i) \right) + \sum_i s_i \min(x_i, y_i)}$$

  $\alpha \in [0, 1]$    asymmetry factor
  $s_i \in \mathbb{N}$   scaling factor

$$D_{\text{Euclidean}}^{\text{weighted, asymmetric}}(x, y) = \sqrt{\sum_{x_i > y_i} \alpha \, w_i (x_i - y_i)^2 + \sum_{x_i \le y_i} (1 - \alpha) \, w_i (x_i - y_i)^2}$$

  $\alpha \in [0, 1]$    asymmetry factor
  $w_i \in [0, 1]$       weights
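A hedged sketch of the two parameterized metrics. It assumes the Tanimoto form is 1 minus the scaled common part divided by an α-weighted excess of x, a (1 − α)-weighted excess of y and the common part, and that the Euclidean form applies α to the terms with x_i > y_i and 1 − α to the rest under a square root. The split between the two conditions, and the function names, are my reading of the slide rather than a confirmed definition.

```python
import math


def scaled_asymmetric_tanimoto(x, y, s, alpha):
    """Scaled, asymmetric Tanimoto dissimilarity of two count vectors."""
    common = sum(si * min(xi, yi) for xi, yi, si in zip(x, y, s))
    denominator = (alpha * (sum(x) - common)
                   + (1 - alpha) * (sum(y) - common)
                   + common)
    # Two empty vectors are treated as identical (an assumption).
    return 1.0 - common / denominator if denominator else 0.0


def weighted_asymmetric_euclidean(x, y, w, alpha):
    """Weighted, asymmetric Euclidean dissimilarity of two count vectors."""
    total = 0.0
    for xi, yi, wi in zip(x, y, w):
        factor = alpha if xi > yi else 1 - alpha
        total += factor * wi * (xi - yi) ** 2
    return math.sqrt(total)


x, y = [1, 1, 1, 0], [1, 0, 0, 0]
ones = [1, 1, 1, 1]
# With alpha far from 0.5 the order of the arguments matters.
print(scaled_asymmetric_tanimoto(x, y, ones, alpha=0.9))
print(scaled_asymmetric_tanimoto(y, x, ones, alpha=0.9))
print(weighted_asymmetric_euclidean([3, 0], [1, 2], [1, 0.25], alpha=1.0))
print(weighted_asymmetric_euclidean([1, 2], [3, 0], [1, 0.25], alpha=1.0))
```

The asymmetry factor lets features missing from the target be penalized differently from features missing from the query, which is why swapping the arguments changes the result.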
                                                        Slide 19


Optimization of metrics
Step 1 optimize parameters for maximum enrichment
Step 2 validate metrics over an independent test set



[Figure: the selected targets are split into a training set and a test set; the known actives are split into a training set, a query set and a test set.]
                                                             Slide 20

Optimization of metrics
Step 1 optimize parameters for maximum enrichment




[Figure: the query fingerprint built from the query set is screened against the training set, yielding target hits and active hits.]
                                              Slide 21

Optimization of metrics
One step of the algorithm

[Figure: variables v1, v2, v3, …, vi, …, vn, each shown with its possible values. Legend: potential variable value, temporarily fixed value, final value, running variable value.]
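The legend suggests a variable-by-variable search: one variable runs over its potential values while the others stay temporarily fixed. The sketch below is one possible reading of that picture, not the authors' algorithm. The objective is a toy function standing in for the enrichment measured on the training set, and `coordinate_search`, `candidates` and `sweeps` are hypothetical names.

```python
def coordinate_search(objective, candidates, start, sweeps=3):
    """Maximize objective by varying one variable at a time."""
    values = list(start)
    best = objective(values)
    for _ in range(sweeps):
        improved = False
        for i, options in enumerate(candidates):
            for option in options:  # running value of variable i
                trial = values[:i] + [option] + values[i + 1:]
                score = objective(trial)
                if score > best:
                    # Keep the better value; the other variables stay fixed.
                    best, values, improved = score, trial, True
        if not improved:
            break  # a full sweep without improvement: values are final
    return values, best


def toy_objective(v):
    """Stand-in for enrichment; its maximum is at (0.3, 0.7)."""
    return -((v[0] - 0.3) ** 2 + (v[1] - 0.7) ** 2)


grid = [k / 10 for k in range(11)]
values, best = coordinate_search(toy_objective, [grid, grid], start=[0.0, 0.0])
print(values, best)
```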
                                                                     Slide 22

Optimization of metrics
Step 2 validate metrics over an independent test set




[Figure: the query fingerprint built from the query set is screened against the test set, yielding target hits and active hits.]
                                         Slide 23

Results
Similar structures get closer


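[Figure: pairwise dissimilarity values of the three structures, 0.57, 0.47 and 0.55 before optimization and 0.20, 0.06 and 0.28 after optimization.]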
                                                                      Slide 24

Results
Hit set size reduction
Active set: 18 mGlu-R1 antagonists
Target set: 10000 randomly selected drug-like structures + 7 spikes

Metric                                        Enrichment   Test hits   Random hits
Tanimoto    Basic                                  70.47        5.43        172.00
Tanimoto    Scaled                                  7.63        6.00       1101.71
Tanimoto    Asymmetric                             99.36        5.29        106.00
Tanimoto    Scaled Asymmetric                      11.94        5.86        731.14
Euclidean   Basic                                   5.59        5.43       1456.57
Euclidean   Normalized                             11.33        5.14        791.29
Euclidean   Asymmetric Normalized                  18.58        4.71        368.71
Euclidean   Weighted Normalized                   296.30        4.14         27.57
Euclidean   Weighted Asymmetric Normalized        281.30        3.43         17.00
                                                                 Slide 25

Results
Improvement by optimization

     Active set   size   Euclidean   Optimized    Improvement
                                                      ratio
   5-HT3            12       12.55      239.24           49.26
   ACE              89        1.42         6.50           4.64
   Angiotensin2     10       27.81        85.45          11.15
   Beta2            50        1.52        24.70          17.42
   D2               13       27.64      123.25           11.19
   delta            20       11.66      243.57           69.11
   Ftp              35       46.88        71.54           5.35
   mGluR1           18        5.59      296.30           70.93
   NPY-5           139        1.00         3.22           3.25
   Thrombin          8        2.56         4.57           2.62
                                                                        Slide 26

Results
Active Hit Distribution

• offers a more intuitive way to evaluate the efficiency of screening
• based on sorting the random set hits and the known actives by
  dissimilarity value and counting the number of random set hits
  preceding each active in the sorted list

[Figure: list of sorted dissimilarity values (0.014, 0.015, 0.017, 0.020, 0.022, 0.023, 0.027, 0.041, 0.043) next to a plot of the number of virtual hits against the number of actives.]
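The counting described above can be sketched in a few lines. The dissimilarity values are taken from the figure, but which of them belong to known actives is not recoverable from the slide, so the split below is hypothetical.

```python
from bisect import bisect_left


def active_hit_distribution(active_scores, random_scores):
    """For each active, count the random set hits that precede it."""
    random_sorted = sorted(random_scores)
    return [bisect_left(random_sorted, score) for score in sorted(active_scores)]


# Hypothetical split of the dissimilarity values shown in the figure.
active_scores = [0.015, 0.022, 0.041]
random_scores = [0.014, 0.017, 0.020, 0.023, 0.027, 0.043]
print(active_hit_distribution(active_scores, random_scores))  # [1, 3, 5]
```

An ideal metric would place every active before all random set hits, giving a list of zeros.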
                                                                                                          Slide 27

Results
ACE (pharmacophore similarity)

[Figure: number of hits (logarithmic scale, 1 to 10000) against the number of actives among the hits (1 to 16). Series: Euclidean, Optimized Euclidean.]
                                                                                                    Slide 28

Results
NPY-5 (pharmacophore similarity)

[Figure: number of hits (logarithmic scale, 1 to 10000) against the number of active hits (1 to 49). Series: Tanimoto, Euclidean, Optimized, Ideal.]
                                                                                              Slide 29

Results
β2-adrenoceptor (pharmacophore similarity)

[Figure: number of hits (logarithmic scale, 1 to 10000) against the number of active hits (1 to 18). Series: Tanimoto, Euclidean, Optimized, Ideal.]
                                                                          Slide 30

Results
Structural or pharmacophore fingerprint?
      Active set     size     chemical      pharmacophore        diversity*
    5-HT3               12        692.21              239.24             0.30
    ACE                 89           4.29               6.50             0.56
    Angiotensin2        10        190.76               85.45             0.40
    Beta2               50          10.98              24.70             0.50
    D2                  13        358.10              123.25             0.30
    delta               20        249.40              243.57             0.32
    Ftp                 35        575.16               71.54             0.30
    mGluR1              18        350.86              296.30             0.37
    NPY-5              139           1.52               3.22             0.47
    Thrombin             8           3.59               4.57             0.46

* Average of (1 − Tanimoto coefficient) over all pairs of compounds in the active
set, based on the chemical fingerprint.
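The diversity measure in the footnote can be sketched as below, assuming binary chemical fingerprints held as integers. The three 4-bit fingerprints are toy values, and the function needs at least two fingerprints.

```python
from itertools import combinations


def tanimoto(x: int, y: int) -> float:
    """Tanimoto coefficient of two binary fingerprints."""
    common = bin(x & y).count("1")
    union = bin(x | y).count("1")
    return common / union if union else 1.0


def diversity(fingerprints) -> float:
    """Average of 1 - Tanimoto over all pairs of fingerprints."""
    pairs = list(combinations(fingerprints, 2))
    return sum(1.0 - tanimoto(x, y) for x, y in pairs) / len(pairs)


fingerprints = [0b1100, 0b1010, 0b1001]
print(round(diversity(fingerprints), 2))  # each pair shares 1 of 3 bits
```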
                   Slide 31

Results
Scaffold hopping
                                                              Slide 32

Acknowledgements


Contributors: Nóra Máté
              Szilárd Dóránt

             Bernard Przybylski (Axovan)


The research was supported by
                                (Axovan is now part of Actelion.)
                                                                     Slide 33

Bibliography
• J. Xu: GMA: A Generic Match Algorithm for Structural Homomorphism,
  Isomorphism, and Maximal Common Substructure Match and its
  Applications, J. Chem. Inf. Comput. Sci., 1996, 36, 1, 25-34.
• L. Xue, F. L. Stahura, J. W. Godden, J. Bajorath: Fingerprint Scaling
  Increases the Probability of Identifying Molecules with Similar Activity in
  Virtual Screening Calculations, J. Chem. Inf. Comput. Sci., 2001, 41, 3,
  746-753.
• G. Schneider, W. Neidhart, T. Giller, G. Schmid: 'Scaffold-Hopping' by
  Topological Pharmacophore Search: A Contribution to Virtual Screening,
  Angew. Chem. Int. Ed., 1999, 38, 19, 2894-2896.
• D. Horvath: High Throughput Conformational Sampling and Fuzzy
  Similarity Metrics: A Novel Approach to Similarity Searching and Focused
  Combinatorial Library Design and its Role in the Drug Discovery
  Laboratory; manuscript.
• J. Bajorath: Virtual screening in drug discovery: Methods, expectations
  and reality,
  http://www.currentdrugdiscovery.com/pdf/2002/3/BAJORATH.pdf

				