Smart Virtual Screen Protocol

Pharmaceutical case study

Creating a Smart Virtual Screening Protocol, Part I: Preparing the Target Protein

by Teresa Lyons, Luke Fisher, Shikha Varma, and Deqi Chen

This paper outlines a series of steps taken to prepare the protein dihydrofolate reductase (DHFR) for virtual high throughput screening (vHTS). Several atomic structures of DHFR, with and without bound ligands, are already known. The process described here therefore explains how to choose the best structure for the docking study by asking questions that help eliminate inappropriate structures. The final cut is made with an eye on the ultimate question: Will the protein score well against inhibitors?

Industry Sector: Pharmaceutical
Organization: Accelrys
Key Products: Discovery Studio® (DS) Gene; DS ViewerPro; DS Modeling 1.1 and 1.2 SBD (including DS CHARMm®, DS LigandFit, and DS LigandScore); Insight II®

Introduction

Virtual high throughput screening (vHTS) can be an excellent way to save time and resources in the search for new leads. The process involves screening a library of compounds against a protein structure. Hence, the very first step in any vHTS project is to prepare the protein structure for the screening process. When more than one representative structure is available for the target protein, one must determine whether any structure is appropriate for vHTS and, if so, which one. Where computational resources are unlimited, every target structure can be screened, which makes it possible to account for changes in the binding pocket using consensus, ensemble, or best-scoring criteria. However, this kind of comprehensive screening is usually not feasible, and narrowing the target list is typically necessary. This paper outlines a process for choosing the best protein structure for vHTS and preparing that structure to get the most out of the screening process.

DHFR catalyses the reduction of dihydrofolate to tetrahydrofolic acid, an important step in DNA synthesis.
DHFRs are the targets of numerous existing drugs, ranging from antibiotics to antivirals to anticancer agents. DHFR was used for this study because results from a high throughput screen of a diverse library had recently been published [1]. A team of Accelrys scientists sought to reproduce the results of this screen using computational methods. From this work, they developed a model for predicting the results of a subsequent screen of a different library against the same protein [2]. The results of this vHTS effort are published in two other case studies [3, 4].

Accelrys Corporate Headquarters: 10188 Telesis Court, Suite 100, San Diego, CA 92121, United States. Tel: +1 858 799 5000
Accelrys European Headquarters: 334 Cambridge Science Park, Cambridge, CB4 0WN, UK. Tel: +44 1223 228500
Accelrys Asia Headquarters: Nishi-shimbashi TS Bldg 11F, Nishi-shimbashi 3-3-1, Minato-ku, Tokyo, 105-0003, Japan. Tel: +81 3 3578 3860

Methods

Searching the Protein Data Bank (PDB) and Eliminating the Obvious

The PDB website was searched using both ‘DHFR’ and ‘dihydrofolate reductase’ as text queries. Several parameters (source, sequence, resolution, and ligand, or HETATM, information) can be obtained directly from the PDB search results using the ‘Create a Tabular Report’ option in the Query Result Browser. The information can be downloaded as a comma-delimited text file and opened in a sortable spreadsheet. The search results were merged and culled by generating reports directly from the Research Collaboratory for Structural Bioinformatics (RCSB) result list and eliminating structures from the wrong organism, with the wrong sequence, with a resolution worse than 2.0 Angstroms, or with no DHF-like inhibitor. The amino acid sequences had to be compared in a separate program. Importing the sequences into DS Gene made it easy to identify site-directed mutations that might not be mentioned in the PDB title.
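
The culling step can also be scripted instead of done in a spreadsheet. The following is a minimal sketch in Python, assuming the tabular report has been saved as comma-delimited text. The column names (pdb_id, source, resolution, ligands), the semicolon-separated ligand field, the T00x identifiers, the COF cofactor label, and all values are invented for illustration; a real report will use different headers. The inhibitor codes are the ones listed in Table 1, and the sequence check is left out because the study did it in a separate program.

```python
import csv
import io

# Toy report: IDs, column names, and values are hypothetical, not real PDB data.
REPORT = """pdb_id,source,resolution,ligands
T001,Escherichia coli,1.60,FOL;COF
T002,Escherichia coli,2.40,MTX
T003,Homo sapiens,1.80,MTX
T004,Escherichia coli,1.90,COF
T005,Escherichia coli,1.85,
"""

# Three-letter inhibitor codes taken from Table 1 of this paper.
DHF_LIKE = {"DZF", "FOL", "DDF", "FFO", "MTX"}


def filter_report(text, organism="Escherichia coli", max_resolution=2.0):
    """Return the IDs that pass the organism, resolution, and ligand filters."""
    kept = []
    for row in csv.DictReader(io.StringIO(text)):
        ligands = {code for code in row["ligands"].split(";") if code}
        if row["source"] != organism:
            continue  # wrong organism
        if float(row["resolution"]) > max_resolution:
            continue  # resolution worse than the cutoff
        if not ligands & DHF_LIKE:
            continue  # no DHF-like inhibitor present
        kept.append(row["pdb_id"])
    return kept


print(filter_report(REPORT))
```

The cutoff is applied as "2.0 Angstroms or better" here, which is an assumption chosen because the final candidate set in Table 1 includes a 2.0 Angstrom structure.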

Checking for Completeness of the Ligands

Verifying ligand completeness required downloading the PDB files from RCSB. The structures were loaded into DS ViewerPro and the ligands and cofactors were isolated. DS ViewerPro has a data table that gives a breakdown of chain, monomer, and atom information, including atomic composition. The FORMUL entry in the PDB file header was compared with the atomic composition of the ligands provided in the data table.

Preparing Proteins and Ligands for Simulation

The PDB structures were loaded into DS Modeling 1.1 with the clean options turned on. These options fix incomplete sidechains, correct unconventional atom names, and add hydrogens to the protein part of the structures. Bond orders were manually adjusted on the ligand and formal charges were added, where appropriate. The resulting atomic makeup and charge of the ligands were checked against the FORMUL entry from the PDB file header. Once hydrogens were added to both the protein and ligand parts of the structure, the remaining heteroatoms (typically water and some salts) were removed. The atom types were assigned using the CFF forcefield in DS Modeling 1.1. Then the DS CHARMm module of DS Modeling 1.1 was used to minimize the hydrogen atom positions, while holding the heavy atoms with a harmonic restraint. A distance-dependent dielectric was used to approximate solvent effects. The calculation included 200 steps of steepest descent minimization, followed by 200 steps of Adopted-Basis Newton-Raphson minimization.

Preparing Files for Cross-Scoring

The structures of the DHFRs were aligned by identifying common amino acids at the protein-ligand interface and superimposing those residues. This was done in Insight II by creating subsets of residues within 2.5 Angstroms of the ligand, listing the subsets, and looking for common residues. The same procedure could have been followed using DS Modeling 1.1.
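
The search for common interface residues amounts to a set intersection, which can be sketched as follows. This is an illustration under stated assumptions, not the Insight II procedure: each structure is represented as a list of (residue label, coordinates) pairs plus a list of ligand atom coordinates, and the residue labels R1 to R3 and every coordinate are invented. Only the 2.5 Angstrom cutoff comes from the text.

```python
import math


def interface_residues(protein_atoms, ligand_atoms, cutoff=2.5):
    """Residues with at least one atom within `cutoff` Angstroms of the ligand."""
    close = set()
    for residue, xyz in protein_atoms:
        if any(math.dist(xyz, lig_xyz) <= cutoff for lig_xyz in ligand_atoms):
            close.add(residue)
    return close


def common_interface(structures, cutoff=2.5):
    """Residues that sit at the interface in every structure."""
    subsets = [interface_residues(p, l, cutoff) for p, l in structures]
    return set.intersection(*subsets)


# Hypothetical toy structures: (protein atoms, ligand atoms).
structure_a = (
    [("R1", (0.0, 0.0, 0.0)), ("R2", (2.0, 0.0, 0.0)), ("R3", (9.0, 0.0, 0.0))],
    [(1.0, 0.0, 0.0)],
)
structure_b = (
    [("R1", (0.0, 1.0, 0.0)), ("R2", (6.0, 0.0, 0.0)), ("R3", (1.0, 2.0, 0.0))],
    [(1.0, 0.0, 0.0)],
)

print(common_interface([structure_a, structure_b]))
```

In the toy data, R2 is near the ligand only in the first structure and R3 only in the second, so R1 is the one residue common to both subsets.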
Once identified, the binding pocket residues were superimposed by heavy atom. The ligands were then extracted from the proteins. The ligands and proteins were saved in separate files, but in the same relative coordinate frame. The NADPH cofactor, where present, was preserved as part of the protein structure.

Cross-Scoring of the Final Protein Set of vHTS Candidate Structures

Each protein was docked with each ligand using DS LigandFit. The original ligand orientations were only rigid-body minimized against the protein; no Monte Carlo search was done against the binding pocket. The resulting complex was scored using DS LigandScore and the following scoring functions: Ludi 3, PMF, Jain, PLP1, PLP2, and LigScore2. Both DS LigandFit and DS LigandScore are available in DS Modeling 1.2 SBD. The scoring information was tabulated, and the average and standard deviation of the scores were computed for each ligand, for each scoring function. If a protein-ligand complex had a score greater than the average plus one standard deviation, it contributed 1 point to the score for that DHFR structure. A score between a half and one standard deviation above the average contributed 0.5 point.

Results and Discussion

Filtering Hits from the RCSB PDB

The first thing a researcher needs to do before deciding whether a vHTS or docking project is viable is to confirm that there is a structure for the protein target of interest. Some labs are fortunate enough to have internal structure determination efforts and might have an unpublished structure of their protein, perhaps even with a ligand already bound. However, the majority of researchers rely on public databases, such as the PDB [5] and PSI-BLAST [6], to find a structure for their protein or the closest related protein whose structure is known. In some cases, there is no structure available for the protein of interest. If there is a related protein that has been structurally characterized, a homology model may be generated based on the known structure. If there is no sufficiently homologous template, the computational effort may shift to a ligand-based design protocol where no structural information for the protein is required.

In the case of DHFR, there are many structures of the protein already publicly available. In fact, a search of the PDB using the text ‘dihydrofolate reductase’ yields 115 structures and a search using ‘DHFR’ yields 96 structures (See Methods, Searching the PDB and Eliminating the Obvious). When these two result sets are merged, there are a total of 120 DHFR structures deposited in the public domain.

NOTE: A researcher who had the correct sequence of DHFR at the outset would probably have skipped the first two questions below and used PSI-BLAST to search the PDB sequence database. However, a search using default search parameters yields only seven structures, not the 35 that actually exist for E. coli DHFR. So, while NCBI's PSI-BLAST is useful for identifying homologous proteins, it is not the best tool for doing a comprehensive search of the PDB for structures matching a particular amino acid sequence.

Where the potential set of structures is large, it helps to eliminate structures by asking several simple questions:

1) Which are from the correct organism?
   • In this case, 45 of the 120 are from E. coli, which is the organism of interest for this study.
2) Which have the correct sequence?
   • 35 of the 45 have the correct sequence.
3) Which structures are high-resolution (better than 2 Angstroms)?
   • 18 of the 35 are high resolution.
4) Which structures contain ligands?
   • All 18 contain large heteroatoms, but only 14 have a ligand in the DHF pocket.
5) Which of the complexed structures have complete ligands?
   • The 14 PDB files contain a total of 19 DHFR chains (five have A and B monomers modeled into the crystallographic asymmetric unit).
Of these, 12 structures representing 10 PDBs have complete heteroatom groups. Two structures have both the NADPH cofactor and an inhibitor; one has a truncated form of NADPH (2'-monophosphoadenosine-5'-diphosphate) in the cofactor pocket and an inhibitor. The other nine DHFR structures contain only an inhibitor.

In proteins that do not contain a ligand, the binding pocket tends to be slightly collapsed (smaller) and the sidechains of residues in the pocket may not be pointed in directions suitable for binding a compound. In some cases, dramatic changes occur in the structure of a protein upon binding to an inhibitor. The presence of a ligand assures that the binding pocket is ‘primed,’ i.e., the orientations of the sidechains are favorable for interacting with a ligand. Because programs designed for vHTS typically treat proteins as rigid objects and, at best, use a soft van der Waals potential to imply flexibility, the openness of the pocket is crucial. It is therefore important to take advantage of a protein in a protein-ligand complex, when it is available, rather than settling for a high-resolution apo-protein.

At this point, it was helpful to know that DHFR binds two large molecules: NADPH, which is a cofactor, and dihydrofolate or a DHF-like inhibitor. Because this vHTS effort was focused on finding an inhibitor, and an inhibitor will most likely bind in the DHF pocket, structures containing only an NADPH molecule with nothing in the DHF pocket were discarded, bringing the set of structures down to 14 after question (4), above.

NOTE: The ligand record in the RCSB (PDB) report is not always complete. Specifically, two of the 18 structures, 1JOL and 1JOM, came up with no large heteroatom molecule in the report. However, both PDB files actually did contain an inhibitor and were retained moving forward. While these kinds of database curation mistakes are probably rare, double-checking is certainly warranted.
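
One way to double-check the ligand record is to read the hetero groups straight from the coordinate file rather than from the report. The sketch below counts HETATM atoms per group using the fixed-column PDB layout (residue name in columns 18-20, chain ID in column 22, residue number in columns 23-26) and flags groups with fewer atoms than expected. It is not the DS ViewerPro workflow used in the study: the toy lines are generated in the code, hold only the fields that are parsed, and the expected count of 3 is an invented stand-in for the heavy-atom count that would be derived from the FORMUL entry.

```python
from collections import Counter


def het_groups(pdb_text, skip=("HOH",)):
    """Count HETATM atoms per (residue name, chain, residue number) group."""
    counts = Counter()
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        res_name = line[17:20].strip()
        if res_name in skip:
            continue  # ignore waters
        counts[(res_name, line[21], int(line[22:26]))] += 1
    return counts


def incomplete_groups(counts, expected_atoms):
    """Groups with fewer atoms than expected for their residue name."""
    return sorted(g for g, n in counts.items()
                  if g[0] in expected_atoms and n < expected_atoms[g[0]])


def toy_line(serial, atom, res, chain, seq):
    # Builds only the leading fixed-width fields of a HETATM record.
    return f"HETATM{serial:>5} {atom:<4} {res:>3} {chain}{seq:>4}"


# Hypothetical file fragment: chain B is missing one ligand atom.
TOY_PDB = "\n".join([
    toy_line(1, "N1", "MTX", "A", 161),
    toy_line(2, "C2", "MTX", "A", 161),
    toy_line(3, "N3", "MTX", "A", 161),
    toy_line(4, "N1", "MTX", "B", 161),
    toy_line(5, "C2", "MTX", "B", 161),
    toy_line(6, "O", "HOH", "A", 201),
])

counts = het_groups(TOY_PDB)
print(incomplete_groups(counts, {"MTX": 3}))  # toy expected count, not real
```

A group that is absent from the report but present in this listing is the kind of discrepancy the NOTE describes.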

The next step in the process is to examine the structures for completeness (See Methods, Checking for Completeness of the Ligands). Sometimes the whole ligand is not modeled because the experimental information cannot specifically place some atoms in space. In this case, the ligand is in the PDB file, but it may be missing a few atoms. Often the B chain of an A/B set is the less resolved structure. This was indeed the case for DHFR, where three of the B chains had to be discarded for heteroatom incompleteness. The final good candidate set is shown in Table 1.

PDB ID    Inhibitor   NADPH (y/n)   Resolution (Angstroms)
1DYH_A    DZF         n             1.9
1DYH_B    DZF         n             1.9
1DYI_A    FOL         n             1.9
1DYJ_A    DDF         n             1.85
1JOL_A    FFO         n             1.96
1JOM      FFO         n             1.9
1RA2      FOL         y             1.6
1RA8      FOL         y (ATR)       1.8
1RG7      MTX         n             2.0
1RX2      FOL         y             1.8
3DRC_A    MTX         n             1.9
3DRC_B    MTX         n             1.9

Inhibitor key:
DZF: 5-deazafolic acid
FOL: folic acid
DDF: 5,10-dideazafolate
FFO: 5-formyl-6-hydrofolic acid
MTX: methotrexate
ATR: 2'-monophosphoadenosine-5'-diphosphate

Table 1: The final good candidate set of high resolution E. coli wild-type DHFRs with inhibitor bound.

Choosing a Structure from a Good Candidate Set

Twelve is still a rather large number of structures to use for a vHTS experiment. There are many ways one can choose the final one or two structures used for screening. In this case, for instance, NADPH can be seen in the crystal structure to contribute to the binding of the inhibitor. We also might want to choose the structure with the highest resolution. Using these two criteria, one could pick 1RA2 right away from the set, as it is the higher-resolution structure of the two that contain both NADPH and an inhibitor. However, the method used for this project focused on choosing the structure that was specifically going to give the highest score for good inhibitors during the vHTS. Hence, the ‘best’ structure should give a high score for correctly docked known inhibitors.
To this end, the structures were all used in a cross-scoring experiment.

Preparing the Proteins for Docking

The goals of this stage of the protein preparation are to:

1) Prepare the proteins and ligands for simulation and vHTS
2) Structurally align the complexes so that cross-scoring may be done

Many of the protein structures had several incomplete residue sidechains. None of the incomplete residues were within eight Angstroms of the bound inhibitor. These sidechains were completed and hydrogens were added to the protein (See Methods, Preparing Proteins and Ligands for Simulation).

When read from a PDB file, the ligand and cofactor have only single bonds. These bond orders were corrected manually. Formal charges were assigned where appropriate and, finally, hydrogens were added to the ligand and cofactor. The PDB file header's FORMUL field is very useful for this, as it contains information about both the number of hydrogens and the formal charge on the heteroatom.

Most programs do not optimize hydrogen positions when adding hydrogens, so steric clashes of methyl hydrogens and inappropriate OH orientations are common. It is therefore critical that the interactions implied by the heavy atom positions be faithfully portrayed by the hydrogen positions, i.e., that steric clashes involving hydrogens be eliminated and hydrogen bonds be preserved. To this end, the hydrogen atom positions were minimized after their addition.

Next, the proteins needed to be aligned with each other (See Methods, Preparing Files for Cross-Scoring). This could easily be done by a structure alignment based on sequence (since all of these structures have the same protein sequence). To get a more rigorous overlay of the binding pocket, however, just the residues around the binding pocket were superimposed. Seven residues were consistently at the interface in all the structures and their heavy atoms were used for the superimposition.
This gives a common orientation for both the ligands and the DHFR structures. For the two DHFRs containing a cofactor, the NADPH was kept as part of the protein structure. For 3DRC, the A and B polypeptide chains are identical, so the set of protein structures was culled to 11 (all inhibitor structures were kept). The inhibitors were saved separately from the proteins using the same reference coordinate frames. This provided the starting files for the cross-scoring experiment.

Cross-Scoring of Ligands to DHFRs

The goal of this step is to determine which of the 11 remaining DHFR structures are most permissive to the known inhibitors and will, therefore, most likely give a high score for potential lead molecules in the vHTS to follow.

The simplest way to do this is to score each ligand against all 11 DHFRs in the relative orientations obtained from the structural alignment just described. However, since slight changes in the position of the ligand in the binding pocket may result in large changes in scoring (especially in scoring functions where small steric clashes are penalized), a more thorough approach allows the ligand to move rigidly from its starting position before scoring. Each DHFR structure was screened against each crystallographic inhibitor using a rigid-body minimization of the inhibitor to optimize the protein-ligand contacts (See Methods, Cross-Scoring of the Final Protein Set of vHTS Candidate Structures). Next, the complexes were scored with several scoring functions: Ludi 3, PMF, Jain, PLP1, PLP2, and LigScore2.

The result of the cross-scoring was a matrix of how each inhibitor scores against each DHFR structure for several scoring functions. To get a consensus of how permissive each protein was across all inhibitors, the average and standard deviation were calculated for each scoring function, for each ligand.
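
The point scheme given in Methods (1 point above the average plus one standard deviation, 0.5 point between a half and one standard deviation above the average, nothing for a protein with its own ligand) can be sketched for a single scoring function as follows. Several details are assumptions rather than statements from the study: the sample standard deviation is used, the native complex is included when computing the average, higher scores are treated as better, and the protein names, ligand names, and scores are toy values.

```python
from statistics import mean, stdev


def award_points(scores, native):
    """Points per protein for one scoring function.

    scores: {ligand: {protein: score}}, higher score assumed better.
    native: {ligand: protein it was crystallized with}.
    """
    proteins = sorted({p for row in scores.values() for p in row})
    points = {p: 0.0 for p in proteins}
    for ligand, row in scores.items():
        values = list(row.values())
        avg, sd = mean(values), stdev(values)  # per ligand, across proteins
        for protein, score in row.items():
            if native.get(ligand) == protein:
                continue  # no points for a protein with its original ligand
            if score > avg + sd:
                points[protein] += 1.0
            elif score > avg + 0.5 * sd:
                points[protein] += 0.5
    return points


# Hypothetical toy matrix for one scoring function.
toy_scores = {
    "La": {"P1": 5, "P2": 1, "P3": 1, "P4": 1},
    "Lb": {"P1": 8, "P2": 2, "P3": 8, "P4": 2},
    "Lc": {"P1": 1, "P2": 1, "P3": 1, "P4": 5},
}
toy_native = {"La": "P1", "Lb": "P2", "Lc": "P3"}

print(award_points(toy_scores, toy_native))
```

Summing the per-function results over all six scoring functions would give a total per protein. With 12 ligands and the native complex excluded, the ceiling is 11 points per function and 66 in total for most proteins.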
One point was given to a protein for which the ligand/protein complex scored better than the average plus one standard deviation. Half a point was assigned for a complex that scored between a half and one standard deviation over the average. No points were given for the protein in complex with its original ligand. Table 2 shows the results of this scoring method.

Structure   Ludi 3   -PMF   Jain   -PLP1   -PLP2   LigScore2   Total Scores
3DRC_A      1.0      0.0    6.0    0.5     1.0     0.5         9.0
1DYH_A      0.0      0.0    0.0    2.0     2.0     3.5         7.5
1DYH_B      0.5      0.0    0.0    1.0     0.5     2.5         4.5
1DYI        0.0      0.0    0.0    1.5     1.0     4.0         6.5
1DYJ        0.0      0.0    0.0    1.0     1.0     4.0         6.0
1JOL        0.0      0.0    0.5    2.0     2.0     5.5         10.0
1JOM        0.0      0.5    0.0    0.0     0.0     0.0         0.5
1RA2        11.0     9.0    8.0    4.0     5.0     0.5         37.5
1RA8        2.5      6.0    1.5    0.5     0.5     2.5         13.5
1RG7        1.0      5.0    2.0    2.0     2.0     1.0         13.0
1RX2        11.0     9.0    9.5    1.5     3.5     0.5         35.0

Table 2: Points awarded each DHFR structure based on cross-scoring against known inhibitors.

The dataset contained 11 protein structures and 12 ligand structures. So, the highest potential score for any given protein for a scoring function was 11 points (except for 3DRC, where the high was 10 points). Across all six scoring functions, the highest potential total was 66 (or 60 for 3DRC). The DHFRs that scored the highest, by far, were 1RA2 and 1RX2, with 37.5 and 35 total points, respectively. All the other proteins scored a total of 13.5 points or less. This large difference in score between DHFRs with a cofactor bound and those with no cofactor strongly suggests that NADPH contributes to inhibitor binding. Since 1RA2 has the slightly higher score and is a higher-resolution structure, it was chosen for the vHTS experiment (Figure 1).

The preparation of the compound library and virtual screening of DHFR is detailed in two other application notes:
• "Creating a Smart Virtual Screening Protocol II: Recursive Partitioning for Sequential Screening" by Varma et al.
and
• "Creating a Smart Virtual Screening Protocol III: Using Structure-Based Drug Design to Enhance an In Silico Workflow" by Fisher et al.

Conclusions

This paper uses DHFR as an example to outline a logical method of choosing a protein structure for vHTS and preparing it for docking. In instances where there are several representative structures available, the set can be narrowed down based on criteria that range from the trivial to the specific. At the most basic level, the protein structure must be from the organism of interest and have the correct sequence. Ideally, it should also be primed to accept a ligand and be of high resolution. From a set of 120 publicly available DHFR structures, twelve were good potential candidates. The final cut was made by cross-scoring each protein against the other proteins’ inhibitors in their known binding conformations. The results indicated that having the NADPH cofactor in the binding pocket contributes substantially to scoring (and, by extension, binding).

References

1) Zolli-Juran M., Cechetto J.D., Hartlen R., et al., ‘High Throughput Screening Identifies Novel Inhibitors of Escherichia coli Dihydrofolate Reductase that are Competitive with Dihydrofolate,’ Bioorg. Med. Chem. Lett., 2003, 13, 2493-2496.
2) Varma S., et al., American Chemical Society, Southwest Regional Meeting, September 29th to October 2nd, 2004, Abstract 8160.
3) Varma S., Fisher L., Lyons T., Chen D., ‘Creating a Smart Virtual Screening Protocol II: Recursive Partitioning for Sequential Screening,’ available at http://www.accelrys.com/cases/smart_virtual_screening_.pdf.
4) Fisher L., Varma S., Lyons T., Chen D., ‘Smart Virtual Screening Protocol III: Using Structure Based Drug Design to Enhance an In Silico Workflow,’ available at http://www.accelrys.com/cases/appindex.html#rational.
5) The Research Collaboratory for Structural Bioinformatics’ Protein Data Bank, available at http://nist.rcsb.org/pdb/index.html.
6) The National Center for Biotechnology Information’s Position-Specific Iterative (PSI) BLAST, available at http://www.ncbi.nlm.nih.gov/blast/Blast.cgi.

Figure 1: A schematic view of the 1RA2 (PDB ID) structure of E. coli DHFR, with cofactor and inhibitor bound. The large pocket where the cofactor and inhibitor sit is shown as a transparent gray surface. NADPH is rendered as sticks. Folate, the inhibitor, is rendered in ball-and-stick mode.
