
									medicinal chemistry design
       challenges
chemically intelligent data mining
multiparameter optimisation for medicinal chemists
how to handle petabytes of data – Google Chemistry!
activity prediction


                         Dr Tony Wood
          VP, Head of Worldwide Medicinal Chemistry
           Pfizer Global Research and Development
                  anthony.wood@pfizer.com
the challenge for design?


              2002-2005 the primary causes of
              attrition were safety and
              pharmacology
                     in vivo toxicity
   results of an analysis of 349 studies on 315 compounds covering 90 targets
   at 985 doses with >10,000 organ evaluations in 4 species
   PK known for all cases with strong correlation between AUC and Cmax
   compound set has similar diversity to Pfizer file

 exposure bands for each compound (from high to low concentration):

    Toxic Exposures    exposures at or above CmaxLowTox
    Uncertain          exposures between CmaxHiCln and CmaxLowTox
    Clean Exposures    exposures at or below CmaxHiCln

    CmaxLowTox   Minimum exposure with observed toxicity. Set to an arbitrarily
                 high number if no toxic event is observed at any dose
    CmaxHiCln    Maximum exposure without observed toxicity. Set to zero if
                 toxicity was observed at all doses assessed
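The two bounds defined on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `(cmax, toxic)` input layout, the function names, the use of infinity for the "arbitrarily high number" and the rule for classifying a compound at a given threshold are hypothetical readings of the slide, not the original analysis code.

```python
import math

def exposure_bounds(observations):
    """Return (CmaxLowTox, CmaxHiCln) for one compound.

    observations: list of (cmax, toxic) pairs, one per dose (assumed layout).
    """
    toxic = [cmax for cmax, is_toxic in observations if is_toxic]
    clean = [cmax for cmax, is_toxic in observations if not is_toxic]
    # Minimum exposure with observed toxicity; "arbitrarily high" if none seen.
    cmax_low_tox = min(toxic) if toxic else math.inf
    # Maximum exposure without observed toxicity; zero if every dose was toxic.
    cmax_hi_cln = max(clean) if clean else 0.0
    return cmax_low_tox, cmax_hi_cln

def classify_at_threshold(observations, threshold):
    """Label a compound toxic / clean / uncertain at one exposure threshold.

    Assumed rule: toxic if toxicity was seen at or below the threshold,
    clean if a clean exposure was seen at or above it, otherwise uncertain.
    """
    cmax_low_tox, cmax_hi_cln = exposure_bounds(observations)
    if cmax_low_tox <= threshold:
        return "toxic"
    if cmax_hi_cln >= threshold:
        return "clean"
    return "uncertain"
```

For example, a compound that was clean at 1 and 5 uM and toxic at 20 uM would be "uncertain" at a 10 uM threshold, because no observation settles the question at that exposure.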
      toxicity threshold selection
 exposure thresholds were chosen to obtain a balance of toxicity/non-toxicity.
 set to 10 uM for the total-drug threshold: approx 40% of evaluations above
 threshold & 40% below.
 similar analysis for free drug levels gives a threshold of 1 uM.

 [figure: counts (0 to 300) of Clean, Uncertain and Toxic outcomes against
 Threshold (Total Drug): 100nM, 1uM, 10uM, 100uM, 1000uM]
      TPSA and clogP are key




the y-axis here is a generalized odds, i.e., the ratio of the probability of a compound
with a given parameter value being toxic to the probability of it not being toxic
                   toxicity odds
combining low TPSA and high cLogP exacerbates the risk
(numbers in parentheses indicate number of outcomes in database)
 holds for both free-drug and total-drug thresholds

                ratio of toxic to non-toxic outcomes

Total-Drug   TPSA>75     TPSA<75     Free-Drug    TPSA>75     TPSA<75

 ClogP<3     0.39 (57)   1.08 (27)    ClogP<3     0.38 (44)    0.5 (27)

 ClogP>3     0.41 (38)   2.4 (85)     ClogP>3     0.81 (29)   2.59 (61)
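
As a reading aid for the table, the odds can be turned into probabilities and compared between cells. The sketch below uses only the total-drug values from the table; the dictionary layout and function names are hypothetical.

```python
# Toxic : non-toxic odds from the total-drug half of the table.
total_drug_odds = {
    ("ClogP<3", "TPSA>75"): 0.39,
    ("ClogP<3", "TPSA<75"): 1.08,
    ("ClogP>3", "TPSA>75"): 0.41,
    ("ClogP>3", "TPSA<75"): 2.4,
}

def odds_to_probability(odds):
    """Convert odds p/(1-p) back to the probability p."""
    return odds / (1.0 + odds)

def odds_ratio(cell, reference):
    """How many times higher the toxicity odds are in `cell` than in `reference`."""
    return total_drug_odds[cell] / total_drug_odds[reference]

risky = ("ClogP>3", "TPSA<75")
safest = ("ClogP<3", "TPSA>75")
print(round(odds_to_probability(total_drug_odds[risky]), 2))
print(round(odds_ratio(risky, safest), 1))
```

Odds of 2.4 correspond to a probability of about 0.71, and the low-TPSA/high-ClogP cell carries roughly six times the odds of the high-TPSA/low-ClogP cell.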
toxicity and promiscuity

                ratio of promiscuous to non-
                promiscuous compounds

                        TPSA>75      TPSA<75

             ClogP<3    0.25 (25)    0.80 (18)

             ClogP>3    0.44 (13)    6.25 (29)

               promiscuity defined as >50% activity in >2 BioPrint assays
               out of a set of 48 (selected for data coverage only)
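
The promiscuity definition can be written as a one-line rule. This is a minimal sketch: the function name, parameter names and the representation of a compound as a list of percent-activity values are assumptions for illustration.

```python
def is_promiscuous(percent_activities, activity_cutoff=50.0, min_hits=2):
    """True if more than `min_hits` assays show activity above the cutoff.

    percent_activities: one percent-activity value per assay in the panel
    (48 assays on the slide; the list length is not enforced here).
    """
    hits = sum(1 for activity in percent_activities if activity > activity_cutoff)
    return hits > min_hits
```

Both comparisons are strict, following the ">50%" and ">2" wording on the slide.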
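       clogP and organ toxicity
 does a good cell viability profile increase the probability of a compound
 being a CNS CAN w/o organ tox in the clogP risky group (clogP > 3)?

 [figure: pie charts of THLE Cv bin (x < 25 uM, 25 < x < 100 uM, x > 100 uM)
 for the attrition CNS CANs set, split by organ tox or not
   ClogP > 3, Organ Tox (n = 20):     15%, 25%, 60%
   ClogP > 3, No Organ Tox (n = 23):  39%, 39%, 22%
   ClogP < 3, Organ Tox (n = 2):      50%, 50%
   ClogP < 3, No Organ Tox (n = 22):  5%, 14%, 82%
 the assignment of percentages to Cv bins is not recoverable from the text]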
                      DEREK
“a place to store toxicological knowledge”
knowledge-based expert system

broad range of toxicity endpoints covered

identifies structural alerts

provides literature-based rationale for prediction

qualitative or semi-quantitative predictions

now has an API for integration into 3rd party software products
                      what’s in DEREK?
main strengths are mutagenicity, chromosome damage, carcinogenicity and skin sensitization
some recent efforts in hepatotoxicity and teratogenicity

[figure: bar chart of No. of active alerts (0 to 100) per Endpoint. Endpoints:
 alpha-2-mu-Globulin nephropathy, Anaphylaxis, Bladder urothelial hyperplasia,
 Carcinogenicity, Cardiotoxicity, Cerebral oedema, Chloracne,
 Cholinesterase inhibition, Chromosome damage,
 Cumulative effect on white cell count, Cyanide-type effects,
 Developmental toxicity, Genotoxicity, Hepatotoxicity, HERG channel inhibition,
 High acute toxicity, Irritation (of the eye),
 Irritation (of the gastrointestinal tract),
 Irritation (of the respiratory tract), Irritation (of the skin), Lachrymation,
 Methaemoglobinaemia, Mutagenicity, Nephrotoxicity, Neurotoxicity,
 Occupational asthma, Ocular toxicity, Oestrogenicity,
 Peroxisome proliferation, Phospholipidosis, Photoallergenicity,
 Photocarcinogenicity, Photogenotoxicity, Photo-induced chromosome damage,
 Photomutagenicity, Phototoxicity, Pulmonary toxicity,
 Respiratory sensitisation, Skin sensitisation, Teratogenicity,
 Testicular toxicity, Thyroid toxicity,
 Uncoupler of oxidative phosphorylation]
          challenge #1
these relationships were determined using a small, well-characterised data set
much more data lies in non-curated data sets with no structure keys
we need chemically intelligent data mining to derive knowledge, including SAR,
from this resource
              properties of CNS drugs
 medians for Drugs and CANs

 property                  Drugs    CANs
 LE                        0.52     0.47
 LLE (lipE)                6.2      6.3
 ClogP                     2.9      3.4
 ClogD (LOGD_7.4)          1.8      2.3
 TPSA                      47       53
 MW                        305      360
 HBD (HBDON)               1.0      1.0
 pKa (BASIC1PKA)           8.4      8.4

 [figure: distributions of each property for Drugs and CANs; plot annotations
 "90% ≥ 0.36" (LE panel) and "95% Range"]
               CNS MPO summary
 for design (prospective, accurate and constant)
 increasing CNS MPO enhances the probability of candidate survival and
 alignment of in-vitro

 parameter   attributes   desirability plot (y-axis 0 to 1)
 ClogP       C, P, S      x-axis cLogP, -2 to 6
 TPSA        P, S         x-axis PSA, 0 to 120
 ClogD       C, S         x-axis LOGD7.4, -2 to 5
 MW          P            x-axis MW, 100 to 600
 HBD         P            x-axis HBDONORCNT, 0 to 5
 pKa         P, S         x-axis pKa (1), 2 to 10

 P = Permeability (including efflux)
 C = Clearance
 S = Safety (including high risk space)

 ↑CNS MPO increases the probability of successfully aligning attributes

 [figure: binned CNS MPO Desirability for Drugs and CANs; chart labels as
 extracted: 124, 75, 26%, 74%, 56%, 44%, 83%]
          challenge #2
design now rests on a probabilistic basis, using complex MPO relationships
we need transparent methods for multiparameter optimisation that are easy
to construct and easy to understand
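
One transparent way to build such a score is to sum simple piecewise-linear desirability functions, one per property. The sketch below is an illustration only: the inflection points are assumptions chosen to resemble the shapes of the desirability plots on the CNS MPO slide, not values read from it, and the function names are hypothetical.

```python
def ramp_down(x, full, zero):
    """Desirability 1 at or below `full`, 0 at or above `zero`, linear between."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)

def hump(x, zero_lo, full_lo, full_hi, zero_hi):
    """Desirability 1 inside [full_lo, full_hi], falling linearly to 0 outside."""
    if x <= zero_lo or x >= zero_hi:
        return 0.0
    if x < full_lo:
        return (x - zero_lo) / (full_lo - zero_lo)
    if x <= full_hi:
        return 1.0
    return (zero_hi - x) / (zero_hi - full_hi)

def cns_mpo(clogp, clogd, mw, tpsa, hbd, pka):
    """Sum six desirabilities (range 0 to 6). All thresholds are assumptions."""
    parts = {
        "ClogP": ramp_down(clogp, 3.0, 5.0),
        "ClogD": ramp_down(clogd, 2.0, 4.0),
        "MW": ramp_down(mw, 360.0, 500.0),
        "TPSA": hump(tpsa, 20.0, 40.0, 90.0, 120.0),
        "HBD": ramp_down(hbd, 0.5, 3.5),
        "pKa": ramp_down(pka, 8.0, 10.0),
    }
    return sum(parts.values()), parts

# Example input: the median "Drugs" property values quoted earlier in the deck.
total, parts = cns_mpo(clogp=2.9, clogd=1.8, mw=305, tpsa=47, hbd=1.0, pka=8.4)
```

Returning the per-property contributions alongside the total keeps the score transparent: a chemist can see which property is costing a compound points.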
      chemoinformatic predictions
 data sources merged into a unified db:
    Pfizer in house
    Inpharmatica StARLITe
    Thomson IDDB
    Cerep BioPrint

 unified db
    4.8 M structures
    275k active compounds
    600k activities (IC50, etc)
    3k targets
    800 human targets

 [figure: target network (node : target, edge : compound) with clusters for
 Serine proteases, Cysteine proteases, Aspartyl proteases, Metalloproteases,
 Kinases, Ion Channels, Phosphodiesterases, Aminergic GPCRs, Peptide GPCRs,
 GPCRs (others: classes A, B & C), Nuclear hormone receptors, Enzymes
 (hydrolases, transferases, oxidoreductases & others) and Miscellaneous]
            Bayesian learning
                         Rev Thomas Bayes, ca 1702 - 1761

 [diagram: data set (assay data), split into “good” (actives) and “bad”
 (inactives) -> fingerprint bits ~ substructures -> Bayesian model]

 fingerprints are calculated for each molecule
 check how often each fingerprint bit is observed, and how often in a “good”
 compound
 assign a weighting factor taking into account both activity ratio and
 sample size
 for instance: a “good”/total ratio of 90/100 is statistically more relevant
 than 9/10
 model distinguishes “good” from “bad”
 predict the likelihood that a molecule is “good”
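
The weighting step can be sketched as follows. The slides do not give a formula, so this example assumes a Laplacian-corrected estimate of the kind commonly used with fingerprint-based naive Bayes models; fingerprints are represented as plain Python sets of integer bits, and all names are hypothetical.

```python
import math
from collections import Counter

def bit_weight(n_active_with_bit, n_total_with_bit, base_rate):
    """Laplacian-corrected log weight for one fingerprint bit (assumed form).

    The +1 terms pull sparsely sampled bits toward a weight of zero, so a
    90/100 ratio earns a larger weight than 9/10.
    """
    return math.log((n_active_with_bit + 1.0) / (n_total_with_bit * base_rate + 1.0))

def train(fingerprints, labels):
    """Build {bit: weight} from sets of bits and True/False activity labels."""
    base_rate = sum(labels) / len(labels)
    total, active = Counter(), Counter()
    for fingerprint, is_active in zip(fingerprints, labels):
        for bit in fingerprint:
            total[bit] += 1
            if is_active:
                active[bit] += 1
    return {bit: bit_weight(active[bit], total[bit], base_rate) for bit in total}

def score(fingerprint, weights):
    """Sum the weights of the bits present; unseen bits contribute nothing."""
    return sum(weights.get(bit, 0.0) for bit in fingerprint)

# With a 10% base rate, the better-sampled bit gets the larger weight.
print(bit_weight(90, 100, 0.1) > bit_weight(9, 10, 0.1))
```

A higher score means the molecule shares more features with the “good” class; turning the score into a calibrated probability would need a separate step that is not shown here.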
          mining large data sets (HTS)
 [figure: scatter plot of LE HTA+ (y-axis, 0.05 to 0.45) against LE HTA
 (x-axis, 0.05 to 0.45); points are confirmed measurements colored by
 Bayesian score (red: high confidence, blue: low confidence)]

 HTA+ > HTA: false positive HTA+ or false negative HTA predictions
             (all false negative HTA)
 HTA > HTA+: false positive HTA or false negative HTA+ predictions
             (red: false HTA+ negative, blue: false HTA positive)
                      predicting promiscuity
  training data: 238k actives ( 10 uM), human target, mw < 1000,
  pass reactivity filter, 10 actives / target
  90% / 214k -> FCFP_6 -> 698 models -> Bayesian score
  3870 compounds with 10,806 predictions
                 searching virtual space
BIG LEAP: searching the Pfizer liquid and virtual compound collections

 Pfizer global virtual library: ~ 10^12 compounds (real: 0.000025%)
 liquid screening file
 5000 libraries, 2.5M compounds
 1.2M singletons

 derive Bayesian model that distinguishes library 1 from 2, from 3, etc
 predict 16 libraries to which compound could belong
 search only these libraries, in real and virtual compound space
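
The library-selection step can be sketched as scoring a query compound against one model per library and keeping the best 16. This is a minimal illustration: it assumes each library model is a `{fingerprint bit: weight}` dictionary, and every name below is hypothetical.

```python
import heapq

def score(fingerprint, weights):
    """Sum the weights of the bits present in the fingerprint."""
    return sum(weights.get(bit, 0.0) for bit in fingerprint)

def candidate_libraries(fingerprint, library_models, k=16):
    """Return the names of the k libraries whose models score the query highest."""
    scored = ((score(fingerprint, weights), name)
              for name, weights in library_models.items())
    return [name for _, name in heapq.nlargest(k, scored)]
```

Only the returned libraries would then be enumerated and searched, which is what keeps a search over a very large virtual space tractable.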
                              BIG LEAP
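 [figure: grid of products formed from Acids (monomers A1, A2) and Amines
 (monomers B1, B2, B4); synthesized compounds are yellow squares, with
 training-set compounds marked “X”, virtual compounds marked “?”, “O” and
 “*”, and two regions labelled 1 and 2]
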
model is built from synthesized compounds (yellow squares)
nearly all fingerprint features of any virtual compound (square marked with “?”) are shared by at
least one compound from the training set (squares marked with “X”)
virtual products in area 1 share at least one monomer with a compound from the training set; for
compound “O”, the new monomer B2 is very close to the previously used B1
compounds from area 2 can be considered outside the scope of the model because they have few
fingerprint features in common with the existing products, as shown for compound “*”, where
monomers A2 and B4 are unlike previously used monomers
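
The scope argument above amounts to asking what fraction of a virtual compound's fingerprint features has been seen in the training set. A minimal sketch, assuming fingerprints are sets of integer bits; the 0.9 cutoff and the function names are hypothetical.

```python
def feature_coverage(fingerprint, training_fingerprints):
    """Fraction of the query's bits found in at least one training compound."""
    query = set(fingerprint)
    if not query:
        return 0.0
    known = set().union(*training_fingerprints)
    return len(query & known) / len(query)

def in_model_scope(fingerprint, training_fingerprints, min_coverage=0.9):
    """Treat a compound as in scope if most of its features are already known."""
    return feature_coverage(fingerprint, training_fingerprints) >= min_coverage
```

A compound like “?” would score close to 1.0, while a compound like “*”, built from two unfamiliar monomers, would score low and fall outside the model's scope.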
          a new series for PRA




 [figure: four structures each labelled "ex-PR library" (one also labelled
 "acidic") and one structure labelled "new"]
                     CCT services
  a framework for computational scientists to publish services (protocols,
  models) that can be immediately leveraged by project teams
  a knowledge repository for Computational Scientists to capture and
  share their best practices




when protocols are published they are automatically wrapped as a new
PLP component
ligand idea generators
uncharted chemical space
           challenge #3
we are not short of idea generators!
easy to construct vast virtual libraries
we need ways of rapidly scoring and
searching petabytes of data
                   HERG binding model
   training set 98,155 compounds (80%)
   validation set 19,577 compounds (20%)
   test set 9,241 compounds

   training: Kappa: 0.61, Concordance 80%
   training: Sensitivity 81%, Selectivity 80%

   test: Kappa: 0.46, Concordance 74%
   test: Sensitivity 75%, Selectivity 74%
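
For reference, the four statistics quoted above can all be derived from a 2x2 confusion matrix. The sketch below assumes that the slide's "Selectivity" means specificity (the true-negative rate); the function name is hypothetical and the counts used in the example are made up.

```python
def binary_metrics(tp, fn, fp, tn):
    """Concordance, sensitivity, specificity and Cohen's kappa from counts."""
    n = tp + fn + fp + tn
    concordance = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # fraction of actives predicted active
    specificity = tn / (tn + fp)   # fraction of inactives predicted inactive
    # Agreement expected by chance, from the row and column totals.
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (concordance - p_chance) / (1.0 - p_chance)
    return {
        "concordance": concordance,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }

# Made-up balanced example: 80% concordance corresponds to a kappa near 0.6.
metrics = binary_metrics(tp=40, fn=10, fp=10, tn=40)
```

Kappa corrects concordance for chance agreement, which is why it drops faster than concordance between the training and test sets.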

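 [figure: histogram of compound counts (0 to 2000) for Inactives and Actives
 against model score (x-axis, -80 to 40; negative side labelled "No
 Dofetilide", positive side labelled "Dofetilide"); bands from left to right
 are annotated >95%, >85%, >70%, >60%, >85%, >95%, with a central
 “Grey zone”, uncertain prediction]

 prediction is checked against activity of at least 3 nearest neighbours to
 generate additional confidence measure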
               HLM stability model
      statistical fingerprint-based model (FCFP-6, Scitegic)
      unstable well predicted, stable not




 [figure: Experiment (Stable to Unstable) against Prediction (Stable to
 Unstable), points coloured by HLM stability experiment: Stable,
 Moderately stable, Unstable]
                             V1a: design for stability
 [figure: In-vitro Clearance (y-axis, 0 to 100; high = short t1/2,
 low = long t1/2) against CLOGP (x-axis, -1.5 to 2.5) for Series 1 and
 Series 2]

 most compounds stable within this cLogP range
 stable but likely to be poorly absorbed orally
 synthetic effort weighted by desirability
  LiPE and LE are quality indicators
 [figure: two structures, with the changes between them labelled: linker
 replacement, aryl switch, side chain deletion/replacement, needed]

                     first compound     second compound
  V1a Ki             28 nM              780 nM
  MW                 441                334
  cLogP              5.5                1.0
  t1/2 HLM           6 mins             120 mins
  LE                 0.31               0.32
  LiPE               1.9                4.8

  (HLM = Human Liver Microsomes)



 LE (Ligand Efficiency) = -1.4 log (IC50) / n Heavy Atoms
     “how efficient each heavy atom is”; class dependent, 0.3 to 0.5

 LiPE (Lipophilic Efficiency) = -log (IC50) - cLogP
     “how efficient each lipophilic fragment is”
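
Both indices are simple to compute once potency is expressed in molar units. A minimal sketch of the two definitions on this slide; the function names are hypothetical, and using the V1a Ki values in place of IC50 in the example is an assumption.

```python
import math

def pic50(ic50_molar):
    """-log10 of the IC50 expressed in mol/L."""
    return -math.log10(ic50_molar)

def ligand_efficiency(ic50_molar, n_heavy_atoms):
    """LE = -1.4 log(IC50) / number of heavy atoms."""
    return 1.4 * pic50(ic50_molar) / n_heavy_atoms

def lipophilic_efficiency(ic50_molar, clogp):
    """LiPE = -log(IC50) - cLogP."""
    return pic50(ic50_molar) - clogp

# The two V1a compounds from this slide, with Ki standing in for IC50.
lipe_first = lipophilic_efficiency(28e-9, clogp=5.5)
lipe_second = lipophilic_efficiency(780e-9, clogp=1.0)
```

The results need not match the LiPE values printed on the slide exactly, because the slide does not say which potency values were used. LE also needs a heavy-atom count, which the slide does not give.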
using LiPE to view SAR
                 LiPE = -log (IC50) - cLogP

 [figure: pIC50 (y-axis: 5, 6 = 1 uM, 7 = 100 nM, 8 = 10 nM) against cLogP
 (x-axis, 1 to 5), with lines of constant LiPE = 2, 3, 4, 5 and 6]
             predicting activity

 [figure: plot of experimental affinity, ΔG(expt) (-10 to 0 kcal/mol), versus
 calculated enthalpy, Δ<U+W> (-40 to 0 kcal/mol)]

 for reference:
   2 kcals = 26-fold off
   4.2 kcals = 1000-fold off


     "Improving Accuracy in Protein-Ligand Affinity Calculations"
        Paper #104, ACS meeting in Philadelphia (Aug 2004)
  Michael K. Gilson, Center for Advanced Research in Biotechnology,
                             Rockville, MD
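
The reference conversions quoted on this slide follow from the relation between a free-energy error and the resulting fold error in affinity, exp(ΔΔG / RT). The slide does not state a temperature; the sketch below assumes 310 K, which reproduces the 26-fold figure closely and gives a value close to the rounded 1000-fold figure. The function name is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal / (mol K)

def fold_error(ddg_kcal_per_mol, temperature_k=310.0):
    """Fold error in binding affinity caused by a free-energy error."""
    return math.exp(abs(ddg_kcal_per_mol) / (R_KCAL_PER_MOL_K * temperature_k))

print(round(fold_error(2.0)))   # error for 2 kcal/mol
print(round(fold_error(4.2)))   # error for 4.2 kcal/mol
```

Because the dependence is exponential, errors of a few kcal/mol in a calculated energy are enough to scramble the rank order of compounds.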
the source of the problem?
          ΔG° = Δ<U + W> - TΔS°config

          magnitudes shown under the three terms, left to right:
          -5 to -10 kcals;  15 to -25 kcal;  15 to -25 kcal

we usually focus on the interactions Δ<U+W>
   potential energy: force field (CHARMM, AMBER, etc.)
      van der Waals
      Coulombic
      Hydrogen-bonding
   solvation: surface area term
      Hydrophobicity/organophilicity
   solvation: generalized Born/Poisson-Boltzmann
      Desolvation of polar groups, Coulomb screening

we always neglect TΔSconfig
   flexibility, entropy terms: sampling/Sum over energy wells
      Preorganization/Strain
      Entropy losses on binding (rotational, translational, conformational)

we count on cancellation of errors within series, or other corrections, which
leads to scattered data.
          challenge #4
we are not short of idea generators!
easy to construct vast virtual libraries
we need more accurate activity prediction
to allow filtering and selection
knowledge management
data access tools
learning culture
           web2 technologies
build project teams around Sharepoint/OneNote
implement an RSS strategy around Newsgator
create a literature knowledge sharing culture
use wiki-type technology to share knowledge
Pfizerpedia
        thanks to
BSA
  David Price
  Simon Bailey
  Julian Blagg
  Nigel Greene

activity prediction
  Marcel de Groot
  Martin Edwards
  Alex Alex
  Jeff Howe
  Ben Burke

CNS MPO
  Patrick Verhoest
  Travis Wager
  Anabella Villalobos
  Spiros Liras

VLS
  Giai Paolini
  Willem Van Hoorn
  Enoch Huang
  Jeff Howe

web 2
  Jerry Lanfear

								