Homology Modeling Workshop
GHIKLSYTVNEQNLKPERFFYTSAVAIL
Outline:
• Introduction to protein structure & databases

• Structure prediction approaches
  – Ab-initio
  – Threading
  – Homology modeling

• Hands-on
    From Sequence to Structure
Protein structure is hierarchic:
•   Primary – sequence of covalently attached amino acids
•   Secondary – local 3D patterns (helices, sheets, loops)
•   Tertiary – overall 3D fold
•   Quaternary – two or more protein chains
  From Sequence to Structure
• All information about the native structure of a protein is
  encoded in the amino acid sequence + its native solution
  environment.

• An astronomical number of conformations is possible, yet each
  protein quickly adopts only one or a few native folds (Levinthal's paradox)

• Protein folding is driven by various forces:
   – Ionic forces
   – Hydrogen bonds
   – The hydrophobic effect
   – ...
     Protein 3D Structures
A protein's structure has a critical effect on its function:

   1. Binding pockets (example: PDB ID 1nw7)

   2. Areas of specific chemical/electrical properties

   3. Importance of the global fold for function
Motivation to Acquire a Structure
• Identifying active and binding sites

• Characterization of the protein's mechanism
  (catalysis & interactions)

• Searching for ligands of a given binding site

• Understanding the molecular basis of diseases

• Designing mutants

• Drug design

• And more...
      Determining Structure
• NMR

• X-ray diffraction

• Electron Microscopy
Why predict protein structure if we
  can use experimental tools to
           determine it?
• Experimental methods are slow and expensive

• Some structures could not be solved experimentally

• A single representative structure of a family can suffice to
  model the structures of all sequences in that family
Protein databases
            Protein Sequence
          & Structure Databases
        Some of the available databases:

• RCSB- the Protein Data Bank- all deposited structures

• UniProt- main sequence database
   – SwissProt
   – TrEMBL

• NCBI- many databases, including sequence and structure databases

• PDBsum- combines structural & sequence data
   UniProt- Protein Sequence
            Database
• UniProt is a collaboration between the
  European Bioinformatics Institute (EBI), the
  Swiss Institute of Bioinformatics (SIB) and the
  Protein Information Resource (PIR).

• In 2002, the three institutes decided to pool
  their resources and expertise and formed the
  UniProt Consortium.
       UniProt- Protein Sequence
                Database
• The world's most comprehensive catalog of information on
  proteins

• Sequence, function & more…

• Composed mainly of two databases:

   – SwissProt – 516,081 entries – high-quality annotation, non-redundant
     & cross-referenced to many other databases.

   – TrEMBL – 10,618,387 entries – computer translation of the
     genetic information from the EMBL Nucleotide Sequence
     Database; many proteins are poorly annotated, since
     only automatic annotation is generated
         Protein Data Bank (PDB)
• The PDB archive contains information about experimentally
determined structures of proteins, nucleic acids, and complex
assemblies.

• The structures in the archive range from tiny proteins and bits
of DNA to complex molecular machines like the ribosome.

• There are currently 57013 structures deposited in the PDB.
However, removing redundant sequences (e.g. at 90% sequence identity)
reduces the number of structures to 19988…

• Each structure receives a unique four-character ID
Protein Data Bank (PDB)
   http://www.rcsb.org/pdb/home/home.do

   Example: PDB ID 3mht. The structure page offers:
   • Download structure
   • Display structure
   • The paper describing the structure
   • Data concerning the structure - resolution, R-value…
Protein Data Bank (PDB)
   [Figure: PDB statistics by year]
                       PDBsum
• A database providing an overview of all biological
  macromolecular structures in the PDB

• Linked to UniProt: find the sequence accession of a
  known PDB ID

• Detailed description of many structure properties, e.g.:
  – EC number
  – Chains & ligands and their interactions
  – Clefts
  – Secondary structure
  – FASTA sequence of the structure
  – …
                  PDBsum
         http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/
• Search by PDB ID, by free text, or by sequence

• Useful tabs: UniProt accession; chains & ligands

• Protein tab: secondary structure (from the PDB)
More Sequences Than Structures

• Discrepancy between the number of known sequences and
  solved structures:

             5,047,807 UniRef90 entries vs.
           25,566 non-redundant (90%) structures

Computational methods are needed to
      obtain more structures
Structure prediction
    approaches
 Structure Prediction Approaches
1. Homology (Comparative) Modeling
Based on sequence similarity with a protein for
which a structure has been solved.

2. Threading (Fold Recognition)
Requires that the query adopt a fold similar to a known structure,
even without detectable sequence similarity

3. Ab-initio fold prediction
Not based on similarity to a sequence/structure
                      Ab-initio
Structure prediction from “first principles”:

    Given only the sequence, try to predict the structure
            based on physico-chemical properties
                (energy, hydrophobicity etc.)

•   When all else fails: can work for novel folds

•   Success shows that we understand the folding process
               The Force Field
                    (energy function)
    A group of mathematical expressions describing the
            potential energy of a molecular system

•   Each expression describes a different type of physico-chemical
    interaction between atoms in the system:

    •   Covalent bonds (bonded terms)

    •   Non-bonded terms:
        –   Van der Waals forces
        –   Hydrogen bonds
        –   Charges
        –   Hydrophobic effects
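To make the idea concrete, here is a minimal sketch of two of the non-bonded terms listed above: a Lennard-Jones term for van der Waals forces and a Coulomb term for charges, summed over all atom pairs. Real force fields (e.g. CHARMM or AMBER) add bonded terms, cutoffs and carefully fitted parameters; the `epsilon`, `sigma`, charges and the unit system used here are illustrative assumptions.

```python
import itertools
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2); the unit system is an assumption
COULOMB_K = 332.06


def lennard_jones(r, epsilon=0.2, sigma=3.4):
    """Van der Waals term: 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q_i, q_j, dielectric=1.0):
    """Electrostatic (charge) term between two point charges."""
    return COULOMB_K * q_i * q_j / (dielectric * r)


def nonbonded_energy(coords, charges):
    """Sum the two non-bonded terms over all atom pairs (no cutoffs or exclusions)."""
    total = 0.0
    for i, j in itertools.combinations(range(len(coords)), 2):
        r = math.dist(coords[i], coords[j])
        total += lennard_jones(r) + coulomb(r, charges[i], charges[j])
    return total


# Toy system: two opposite partial charges 4 Angstrom apart
print(nonbonded_energy([(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)], [0.5, -0.5]))
```

The Lennard-Jones term is zero at r = sigma and reaches its minimum, -epsilon, at r = 2^(1/6)·sigma.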
Approaches to Ab-initio Prediction
               1. Molecular Dynamics
• Simulates the forces that govern the protein in water.
• Since proteins fold spontaneously, a long enough simulation should,
  in principle, reach the native protein structure.

Problems:
• Thousands of atoms
• Huge number of time steps to reach the folded protein,
  so it is feasible only for very small proteins
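At its core, molecular dynamics is a loop that repeatedly computes forces and advances positions and velocities by a small time step. A minimal sketch, assuming a single particle on a 1-D spring in place of a real force field and using the velocity Verlet integrator (a common choice, not necessarily the one any particular MD package uses):

```python
def harmonic_force(x, k=1.0):
    """Force of a 1-D spring, F = -k * x (toy stand-in for a force field)."""
    return -k * x


def velocity_verlet(x, v, force, mass=1.0, dt=0.01, n_steps=1000):
    """Integrate Newton's equations of motion with the velocity Verlet scheme."""
    a = force(x) / mass
    trajectory = [x]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # advance the position
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # advance the velocity
        a = a_new
        trajectory.append(x)
    return x, v, trajectory


x, v, traj = velocity_verlet(x=1.0, v=0.0, force=harmonic_force)
# Total energy (kinetic + potential, k = m = 1) should stay close to its initial 0.5
energy = 0.5 * v * v + 0.5 * x * x
print(len(traj), round(energy, 4))
```

A protein simulation runs the same loop for thousands of atoms with femtosecond time steps, which is why reaching folding time scales takes an enormous number of steps.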
Approaches to Ab-initio Prediction
                2. Minimal Energy

    Assumption: the folded form is the minimal energy
                 conformation of a protein


 Main principles:
 • Define an energy function.
 • Search for the 3D conformation that minimizes the energy.
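A minimal sketch of the "define an energy, then search" recipe, assuming a made-up one-dimensional energy landscape with a local and a global minimum, and using Metropolis Monte Carlo with simulated annealing as the search strategy (one common choice; real methods search over thousands of coordinates or torsion angles):

```python
import math
import random


def energy(x: float) -> float:
    """Toy 1-D energy landscape: local minimum near x = +0.96, global near x = -1.04."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def anneal(e_func, x0, n_steps=5000, t_start=1.0, t_end=0.01, step=0.5, seed=0):
    """Metropolis Monte Carlo with a geometric cooling schedule; returns the best state seen."""
    rng = random.Random(seed)
    x, e = x0, e_func(x0)
    best_x, best_e = x, e
    cooling = (t_end / t_start) ** (1.0 / n_steps)
    t = t_start
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)   # random trial move
        e_new = e_func(x_new)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling
    return best_x, best_e


# Start in the local minimum near x = +1; the search should escape it
best_x, best_e = anneal(energy, x0=1.0)
print(round(best_x, 2), round(best_e, 3))
```

The high starting temperature lets the search cross the energy barrier; a pure downhill minimizer started at x = +1 would stay trapped in the local minimum.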
                        Ab-initio
• Current methods (e.g. Rosetta) primarily utilize the
  fact that although we are far from observing all
  protein folds, we probably have seen nearly all sub-
  structures:

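• A library of known sub-structures
 (fragments shorter than 10 residues) is created.

• A range of possible conformations for
  each fragment of the query protein is selected.

• The selected fragments are assembled into full-length
  structures, which are scored with an energy function.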



                         Moult J. Philos. Trans. R. Soc. B. 361:453–458 (2006)
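A minimal sketch of the fragment-library idea, assuming a toy library of (sequence, conformation) pairs and plain sequence identity as the similarity score; `pick_fragments` and the library contents are hypothetical. Rosetta itself picks fragments with profile- and secondary-structure-based scores and then assembles them by Monte Carlo; neither step is reproduced here.

```python
from typing import Dict, List, Tuple


def windows(seq: str, k: int) -> List[Tuple[int, str]]:
    """All overlapping windows of length k in the query, with their start positions."""
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


def identity(a: str, b: str) -> int:
    """Number of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b))


def pick_fragments(query: str, library: List[Tuple[str, str]],
                   k: int = 9, top_n: int = 2) -> Dict[int, List[str]]:
    """For each query window, keep the conformations of the top_n most similar fragments.

    library holds (fragment_sequence, conformation) pairs; fragments whose
    length differs from k are ignored.
    """
    candidates = [(seq, conf) for seq, conf in library if len(seq) == k]
    picks = {}
    for start, window in windows(query, k):
        ranked = sorted(candidates, key=lambda item: identity(window, item[0]),
                        reverse=True)
        picks[start] = [conf for _, conf in ranked[:top_n]]
    return picks


# Hypothetical 3-residue library; conformations given as secondary-structure strings
library = [("ACD", "HHH"), ("CDE", "EEE"), ("XYZ", "LLL")]
print(pick_fragments("ACDEFG", library, k=3, top_n=1))
```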
Ab-initio - Example
[Figure from Moult J. Philos. Trans. R. Soc. B. 361:453–458 (2006)]
        Fold Recognition (Threading):
       Sequence to structure matching
 Given a sequence and a library of folds, thread the sequence
    through each fold and take the one with the highest score.
• The method fails if the new protein does not belong to any fold in
  the library.

• The threading score is computed from known physico-chemical
  properties & statistics of amino acids.

• In practice, fold recognition methods are often mixtures
  of sequence matching and threading.
      Structure Prediction Approaches
              Threading: example
Input:
1. A sequence, with residues classified as H-bond donor,
   H-bond acceptor, glycine or hydrophobic
2. A library of folds of known proteins

Each fold in the library receives a raw score S and a Z-score, e.g.:
      S=-2, Z=-1      S=5, Z=1.5      S=20, Z=5
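The S and Z values above can be illustrated with a minimal gapless threading sketch. Assumptions: each fold is reduced to a string of buried (B) / exposed (E) positions; the hypothetical `pair_score` rewards hydrophobic residues in buried positions and polar residues in exposed ones; and Z is computed against shuffled versions of the query, which is one common way to normalize threading scores (the slide does not say how its Z values were obtained). Real threading also allows gaps.

```python
import random
import statistics

HYDROPHOBIC = set("AVILMFWC")


def pair_score(residue: str, env: str) -> float:
    """Hypothetical environment preference: +1 if hydrophobic<->buried match, else -1."""
    buried = env == "B"
    hydrophobic = residue in HYDROPHOBIC
    return 1.0 if buried == hydrophobic else -1.0


def thread_score(seq: str, fold_env: str) -> float:
    """Gapless threading: score each residue in the environment of its fold position."""
    return sum(pair_score(r, e) for r, e in zip(seq, fold_env))


def z_score(seq: str, fold_env: str, n_shuffles: int = 200, seed: int = 0) -> float:
    """How many standard deviations the real score lies above shuffled-sequence scores."""
    rng = random.Random(seed)
    real = thread_score(seq, fold_env)
    shuffled_scores = []
    for _ in range(n_shuffles):
        letters = list(seq)
        rng.shuffle(letters)
        shuffled_scores.append(thread_score("".join(letters), fold_env))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores) or 1.0   # guard against zero spread
    return (real - mean) / sd


# Toy fold: four buried positions followed by four exposed ones
print(thread_score("LIVFKDES", "BBBBEEEE"), round(z_score("LIVFKDES", "BBBBEEEE"), 2))
```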
            Fold recognition (threading)
       Find the best fold for a protein sequence:

  Query:                  MAHFPGFGQSLLFGYPVYVFGD...
  Folds in the library:   1)  ...   56)  ...   n)
  Scores:                -10  ...  -123  ...  20.5
  Output:                 potential fold

We need a scoring (energy) function to distinguish the native
structure from misfolded structures.

Ideally, each misfolded structure should have an energy
higher than the native energy, i.e. $E_{\text{misfolded}} - E_{\text{native}} > 0$
                      Fold recognition: FFAS03

 • The FFAS03 server provides an interface to the third
 generation of the profile-profile alignment and fold recognition
 algorithm FFAS.

 • Profile-profile alignments use information present in
 sequences of homologous proteins to amplify the sequence
 conservation pattern defining the family.

 • The result: detection of remote homologies beyond the reach
 of other sequence comparison methods.



Jaroszewski, L., Rychlewski, L., Li, Z., Li, W. & Godzik, A. (2005) FFAS03: a server for profile-profile sequence
alignments. Nucl. Acids Res. 33, W284-W288
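A minimal sketch of the profile-profile idea, assuming two toy MSAs and a plain dot product of amino acid frequency vectors as the column-column score. This is not FFAS03's actual scoring function; it only shows how whole columns, rather than single residues, are compared. A real tool would also align the two profiles with gaps, e.g. by dynamic programming.

```python
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(msa: List[str]) -> List[Dict[str, float]]:
    """Amino acid frequencies for each alignment column (gaps ignored)."""
    profile = []
    for column in zip(*msa):
        counts = Counter(c for c in column if c in AMINO_ACIDS)
        total = sum(counts.values()) or 1
        profile.append({aa: counts[aa] / total for aa in AMINO_ACIDS})
    return profile


def column_score(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Similarity of two profile columns: dot product of their frequency vectors."""
    return sum(p[aa] * q[aa] for aa in AMINO_ACIDS)


def ungapped_profile_score(prof_a, prof_b) -> float:
    """Score two equal-length profiles aligned column to column (no gaps)."""
    return sum(column_score(p, q) for p, q in zip(prof_a, prof_b))


# Two toy "families" of two sequences each
query_profile = column_profile(["AC", "AC"])
template_profile = column_profile(["AC", "AD"])
print(ungapped_profile_score(query_profile, template_profile))
```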
                       Fold recognition: HHPRED

                 Profiles are based on Hidden Markov Models:
         [Figure: hidden Markov model with transition probabilities
          between states; states emit amino acids]

Söding J. (2005) Protein homology detection by HMM-HMM comparison. Bioinformatics 21, 951-960.
                  Fold recognition: HHPRED

  • Profile Hidden Markov Models (HMMs) are similar to sequence
  profiles, but in addition to the amino acid frequencies they
  contain information about the frequency of inserts and deletions.

  • Using profile HMMs in place of simple sequence profiles should
  therefore further improve sensitivity.

  • The first method to employ HMM-HMM comparison, based on a
  novel statistical method.

  • Using HMMs on both the query and the database side greatly
  enhances sensitivity and selectivity over sequence-profile
  based methods.
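What an HMM adds over a plain profile is easiest to see in code. A minimal sketch, assuming a toy two-state HMM whose states emit "h" (hydrophobic) or "p" (polar) symbols with made-up probabilities: the forward algorithm sums over all hidden state paths to give the probability of an observed sequence. Profile HMMs such as those used by HHpred have match, insert and delete states for every alignment column and emit the 20 amino acids; HMM-HMM comparison itself is not shown.

```python
from typing import List

# Toy two-state HMM; all probabilities are made up for illustration
STATES = ["buried", "exposed"]
START = {"buried": 0.5, "exposed": 0.5}
TRANS = {"buried": {"buried": 0.7, "exposed": 0.3},
         "exposed": {"buried": 0.4, "exposed": 0.6}}
# Emitted symbols: "h" = hydrophobic residue, "p" = polar residue
EMIT = {"buried": {"h": 0.9, "p": 0.1},
        "exposed": {"h": 0.2, "p": 0.8}}


def forward(obs: List[str], states=STATES, start=START, trans=TRANS, emit=EMIT) -> float:
    """P(observed sequence | HMM), summed over all hidden state paths."""
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {s: emit[s][symbol] * sum(alpha[r] * trans[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())


print(forward(list("hhpp")))
```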

Söding J. (2005) Protein homology detection by HMM-HMM comparison. Bioinformatics 21, 951-960.
    I-TASSER- Hybrid Approach

• In a recent large-scale blind experiment, I-TASSER
generated the best 3D structure predictions among
all automated servers.

• Based on secondary-structure-based threading
and an iterative implementation of the Threading
ASSEmbly Refinement (TASSER) program.
I-TASSER
Homology Modeling
           Homology Modeling –
               Basic Idea
1.   A protein structure is defined by
     its amino acid sequence.

2.   Closely related sequences adopt
     highly similar structures, distantly
     related sequences may still fold
     into similar structures.

3.   Three-dimensional structure of
     proteins from the same family is
     more conserved than their
     primary sequences.
     Example: triosephosphate isomerases,
     44.7% sequence identity, RMSD 0.95 Å
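RMSD values like the one above are computed after optimally superimposing the two structures, usually on the Cα atoms of residues matched by the alignment. A minimal NumPy sketch of the Kabsch superposition, assuming the two (N, 3) coordinate arrays are already paired residue by residue:

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition (Kabsch)."""
    p = p - p.mean(axis=0)                  # center both sets on the origin
    q = q - q.mean(axis=0)
    h = p.T @ q                             # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a rotation, not a reflection
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_rotated = p @ rotation.T
    return float(np.sqrt(((p_rotated - q) ** 2).sum(axis=1).mean()))


# Usage: a random "structure" and a rotated + translated copy of it
rng = np.random.default_rng(0)
coords = rng.normal(size=(10, 3))
theta = 0.7
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
moved = coords @ rot_z.T + np.array([1.0, -2.0, 3.0])
print(kabsch_rmsd(coords, moved))  # close to 0: same structure, different pose
```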
  Homology modeling requires handling
        structures & sequences

• Query- only the protein sequence is available- usually found
  at the UniProt database

• Template- after identification, both structural and
  sequence-related data should be found- UniProt (or NCBI databases),
  RCSB and PDBsum
                           Homology modeling-
                           widespread technique
   Query protein sequence
      -> Identify a homologous protein (structural template)
      -> Align query & template protein sequences
      -> Build model
      -> Evaluate model
   (e.g. Fiser et al., 2004; Petrey et al., 2005; Zhang, 2008)
               General Scheme
1.   Searching for structures related to the query sequence

2.   Selecting templates

3.   Aligning query sequence with template structures

4.   Building a model for the query using information from
     the template structures (Modeller)

5.   Evaluating the model
                     Fiser A et al. Methods in Enzymology 374: 461-491(2004)
    1. Searching For Structures
•   Sequence search against the PDB sequences



•   Sequence-profile search



•   Threading: sequence-structure fitness function
     1. Searching For Structures
If a BLAST search against the PDB fails to recognize adequate
templates, turn to fold recognition (threading) servers:

• FFAS03- http://ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl

• HHPRED- http://toolkit.tuebingen.mpg.de/hhpred

• HMAP (available through the PUDGE pipeline)-
http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:PUDGE

• I-TASSER- http://zhang.bioinformatics.ku.edu/I-TASSER/

These servers not only find candidate templates, but also suggest a
pairwise alignment and in some cases even construct the 3D
model.
         2. Selecting Templates
      How to select the right template?
•   Higher sequence similarity - %ID

•   Close subfamily - phylogenetic tree
    [Figure: phylogenetic tree of Seq. 1 to Seq. 6]

•   “Environment” similarity - solvent, pH, ligands,
    quaternary interactions

•   The quality of the experimentally determined
    structure

•   Purpose of modeling - e.g. protein-ligand model vs.
    geometry of active site
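The first criterion (%ID) can be applied automatically, together with statistical significance and coverage of the query. A minimal sketch, assuming BLAST was run against the PDB with tabular output (`-outfmt 6`, default 12 columns) and that the thresholds (E-value ≤ 1e-5, at least 60% query coverage) suit your case; the hits below are hypothetical. The other criteria (subfamily, environment, experimental quality, purpose) still need human judgment.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    subject: str
    pident: float     # percent identity over the aligned region
    coverage: float   # fraction of the query covered by the alignment
    evalue: float


def parse_tabular(text: str, query_length: int) -> List[Hit]:
    """Parse BLAST tabular output (-outfmt 6, default 12 columns)."""
    hits = []
    for line in text.strip().splitlines():
        f = line.split("\t")
        qstart, qend = int(f[6]), int(f[7])
        hits.append(Hit(subject=f[1], pident=float(f[2]),
                        coverage=(qend - qstart + 1) / query_length,
                        evalue=float(f[10])))
    return hits


def rank_templates(hits: List[Hit], max_evalue: float = 1e-5,
                   min_coverage: float = 0.6) -> List[Hit]:
    """Keep significant hits covering most of the query, highest %ID first."""
    kept = [h for h in hits if h.evalue <= max_evalue and h.coverage >= min_coverage]
    return sorted(kept, key=lambda h: (h.pident, h.coverage), reverse=True)


# Hypothetical BLAST hits for a 273-residue query
blast_out = "\n".join([
    "query\ttmplA\t45.0\t200\t105\t3\t5\t204\t1\t200\t1e-40\t150",
    "query\ttmplB\t60.0\t80\t32\t0\t10\t89\t1\t80\t1e-20\t90",
    "query\ttmplC\t30.0\t250\t170\t5\t1\t250\t1\t250\t0.5\t20",
])
for hit in rank_templates(parse_tabular(blast_out, query_length=273)):
    print(hit.subject, hit.pident, round(hit.coverage, 2))
```

Here tmplB is dropped despite its higher %ID because it covers less than a third of the query, and tmplC because its E-value is not significant.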
          2. Selecting Templates
             More than one template

•   Two ways to combine multiple templates:

    –   Global model – templates aligned to different domains of
        the target, with little overlap between them


    –   Local model – templates aligned to the same part of the
        target
      2. Selecting Templates
         More than one template

The more the merrier -
multiple structures with
the same fold:
        2. Selecting Templates
                 Trial and error
•   Generate a model for each candidate
    template and/or their combination.


•   Evaluate the models by an energy or
    any other scoring function.
    (will be discussed later…)
         3. Aligning query and
          template sequences


• All comparative modeling programs depend on a
  target-template alignment.

• When the sequence similarity between the template
  and target proteins is high, simple pairwise alignments
  are usually fine (e.g. Needleman-Wunsch global
  alignment).

• Gaps or low/medium sequence similarity indicate that
  we should improve the alignment...
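A minimal sketch of Needleman-Wunsch global alignment, assuming a simple scoring scheme (match +1, mismatch -1, linear gap penalty -2) instead of the substitution matrices (e.g. BLOSUM62) and affine gap penalties that protein alignment tools normally use:

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("HEAGAWGHEE", "PAWHEAE"))
```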
          3. Aligning query and
           template sequences
                    Guidelines:
1.   Create a multiple sequence alignment and extract the
     template-query pairwise alignment. Pairwise alignments
     alone are not enough!
     •   Visual inspection of alignments is difficult to teach;
         it is a matter of experience…

2.   Use secondary structure information to improve the
     pairwise alignment - avoid gaps in these regions!

3.   Use previous biochemical and structural data
           3. Aligning query and
            template sequences
                 Tips for MSA building
• Where? (to find homologues)
   • Structural templates- search against the PDB
   • Sequence homologues- search against SwissProt or
   UniProt (recommended!)- usually using BLAST


• How many?
   • As many as possible, as long as the MSA looks good
   (next week…)
           3. Aligning query and
            template sequences
                 Tips for MSA building
• How long? (length of homologues)
   • Fragments- short homologues (less than 50-60% of the
   query's length) = bad alignment
   • Ensure your sequences contain the wanted domain(s)
   • N/C termini tend to vary in length between homologues
• How close? (distance from the query sequence)
   • All too close- no new information
   • Too many too far- bad alignment
   • Ensure that you have a balanced collection!
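These rules of thumb translate into simple filters. A minimal sketch, assuming each homologue comes with its percent identity to the query and its length; `Homologue` and the cut-offs (at least 60% of the query length, 35 to 95% identity) are illustrative assumptions, not fixed rules:

```python
from typing import List, NamedTuple


class Homologue(NamedTuple):
    name: str
    identity: float   # percent identity to the query
    length: int       # number of residues


def filter_homologues(hits: List[Homologue], query_length: int,
                      min_length_frac: float = 0.6,
                      min_id: float = 35.0, max_id: float = 95.0) -> List[Homologue]:
    """Drop fragments, near-duplicates of the query and very distant sequences."""
    kept = []
    for h in hits:
        if h.length < min_length_frac * query_length:
            continue   # fragment: too short relative to the query
        if h.identity > max_id:
            continue   # too close: adds little new information
        if h.identity < min_id:
            continue   # too far: likely to degrade the alignment
        kept.append(h)
    return kept


# Hypothetical homologues of a 273-residue query
hits = [Homologue("a", 50.0, 270), Homologue("b", 99.0, 273),
        Homologue("c", 20.0, 260), Homologue("d", 60.0, 100)]
print([h.name for h in filter_homologues(hits, query_length=273)])
```

Checking the spread of identities among the kept sequences (not only their count) helps with the "balanced collection" advice.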
           3. Aligning query and
            template sequences
                Tips for MSA building
• From whom? (which species the sequence belongs to)
   • Don't care, all homologues are welcome
   • Orthologues/paralogues may be helpful
   • Sequences from distant/close species provide different
   types of information


• Which alignment method?
   • The best today are MUSCLE, T-Coffee and MAFFT. All
   available at
        3. Aligning query and
         template sequences
            Tips for MSA building
• Most importantly, make sure that both the query
and the selected template are included in the MSA.


• Sequences that are more distant than the template
do not need to be included in the alignment.
            3. Aligning query and
             template sequences
         Query-template alignment
      via a profile-to-profile approach:
1. Construct an MSA for the query; it serves as a profile capturing
the properties of the protein family.

2. Align the profile to profiles of all proteins of the PDB, using,
e.g., FFAS03 or HHpred.

3. Compare pairwise alignments constructed via the different
methods – hope to get a consensus prediction…
        3. Aligning query and
         template sequences
Different levels of similarity between the template & query
        call for different computational approaches:
                  4. Building a model
     Once you have an improved pairwise
  alignment between your query & template




           Use Modeller to build your model!


A. Sali & T.L. Blundell. Comparative protein modelling by satisfaction of spatial
restraints. J. Mol. Biol. 234, 779-815, 1993.
              4. Building a model

                     Modeller
    • Generation and refinement of models
    • Using satisfaction of spatial restraints
    • Can perform additional tasks:
      – de novo modeling of loops
      – Optimization of models, using an objective
        function
      – Multiple alignment
      – Comparison of protein structures
                4. Building a model

                         Modeller

• Spatial features, such as distances, hydrogen bonds and
  dihedral angles, are transferred from the templates to
  the target.

• Thus, a number of spatial restraints on the target
  structure are obtained.

• The 3D model is obtained by satisfying all the
  restraints as well as possible.
                  4. Building a model

                            Modeller
• Distance and dihedral angle restraints on the target are
 calculated from its alignment with template.

• Restraints were also obtained from a statistical analysis of
  the relationships within a large database of pairs of homologous
  structures.

• Various correlations were obtained, e.g. correlations between
  Cα-Cα distances in homologous structures. These relationships
  can be used directly as spatial restraints.

• Restraints and CHARMM energy terms are then combined into an
  objective function, which is optimized in 3D space.
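A toy version of "satisfaction of spatial restraints" helps to show what the optimization does. Assumptions: only distance restraints, each expressed as the negative log of a Gaussian (a harmonic penalty) around a target distance taken from a "template", and plain gradient descent as the optimizer. Modeller's real objective function (many restraint types as probability density functions, plus CHARMM terms) and its optimizers are considerably more elaborate.

```python
import numpy as np


def restraint_objective(x, restraints, sigma=0.5):
    """Sum of harmonic distance-restraint violations, and its gradient.

    Each restraint (i, j, d0) contributes (d - d0)**2 / (2 * sigma**2), i.e. the
    negative log of a Gaussian around the target distance d0 (up to a constant).
    """
    value = 0.0
    grad = np.zeros_like(x)
    for i, j, d0 in restraints:
        diff = x[i] - x[j]
        d = np.linalg.norm(diff)          # assumes atoms i and j do not coincide
        value += (d - d0) ** 2 / (2 * sigma ** 2)
        g = (d - d0) / sigma ** 2 * diff / d
        grad[i] += g
        grad[j] -= g
    return value, grad


def satisfy_restraints(x0, restraints, step=0.05, n_iter=2000):
    """Plain gradient descent on the restraint objective."""
    x = x0.astype(float).copy()
    for _ in range(n_iter):
        _, grad = restraint_objective(x, restraints)
        x -= step * grad
    return x, restraint_objective(x, restraints)[0]


# "Template" geometry: target distances between 4 atoms taken from a reference
reference = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0],
                      [0.0, 1.5, 0.0], [0.0, 0.0, 1.5]])
restraints = [(i, j, float(np.linalg.norm(reference[i] - reference[j])))
              for i in range(4) for j in range(i + 1, 4)]
start = reference + np.random.default_rng(1).normal(scale=0.2, size=reference.shape)
model, final_value = satisfy_restraints(start, restraints)
print(round(final_value, 6))
```

Distance restraints alone fix the model only up to rotation, translation and mirror image; dihedral-angle restraints and energy terms are what pin down the correct handedness in a real model.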
         5. Model Evaluation
• The accuracy of the model depends on its
  sequence identity with the template:
         5. Model Evaluation
    The model can be assessed at two levels:

•   Global- reliability of the model as a whole.
    *Useful when several models are generated and
    one should be chosen as the best one.
    *When different models were based on different
    templates, may help choose the best one.

•   Local- assessing the reliability of the different
    regions, even specific residues, of the model.
    *Useful to detect local mistakes, which often
    originate from alignment errors.
          5. Model Evaluation
        Examples of assessment approaches:

1. Assessment of the model's stereochemistry

2. Prediction of unreliable regions of the model -
   “pseudo energy” profile: peaks indicate errors

3. Consistency with experimental observations

4. Consistency with evolutionary conservation rates
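Peaks in a per-residue pseudo-energy profile are easier to spot after smoothing. A minimal sketch, assuming you already have one pseudo-energy value per residue from an assessment tool (the values below are made up) and that positive smoothed values mark suspicious regions; the window size and the threshold are both assumptions:

```python
from typing import List


def smooth(profile: List[float], window: int = 5) -> List[float]:
    """Sliding-window average; the window is truncated at the chain ends."""
    half = window // 2
    out = []
    for i in range(len(profile)):
        segment = profile[max(0, i - half): i + half + 1]
        out.append(sum(segment) / len(segment))
    return out


def flag_unreliable(profile: List[float], threshold: float = 0.0,
                    window: int = 5) -> List[int]:
    """1-based residue numbers whose smoothed pseudo-energy exceeds the threshold."""
    return [i + 1 for i, e in enumerate(smooth(profile, window)) if e > threshold]


# Made-up profile: a high-energy stretch at residues 11-15 in a 25-residue model
profile = [-1.0] * 10 + [3.0] * 5 + [-1.0] * 10
print(flag_unreliable(profile))
```

Smoothing spreads each peak over its neighbours, so the flagged stretch is slightly wider than the raw peak; such regions are natural places to re-check the alignment.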
Summary:
5 Basic Steps
Hands-on
            The Query Protein
Name: Dihydrodipicolinate reductase

Enzyme reaction:




Molecular process: Lysine biosynthesis (early stages)

Organism: E. coli

Sequence length: 273 aa
1. Searching For Structures
    1. Searching For Structures

                 Get your sequence
>DAPB_ECOLI
MHDANIRVAIAGAGGRMGRQLIQAALALEGVQLGAALEREGSSLLGSDAGELAGAG
KTGVTVQSSLDAVKDDFDVFIDFTRPEGTLNHLAFCRQHGKGMVIGTTGFDEAGKQ
AIRDAAADIAIVFAANFSVGVNVMLKLLEKAAKVMGDYTDIEIIEAHHRHKVDAPSGTA
LAMGEAIAHALDKDLKDCAVYSREGHTGERVPGTIGFATVRAGDIVGEHTAMFADIGE
RLEITHKASSRMTFANGAVRSALWLSGKESGLFDMRDVLDLNNL

               http://www.uniprot.org/
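A quick sanity check confirms that the FASTA text parses and contains the 273 residues listed for DAPB_ECOLI. The sketch below assumes the sequence is pasted in as a string rather than read from the downloaded file, and `read_fasta` is a small helper written for this example:

```python
def read_fasta(text: str) -> dict:
    """Parse FASTA text into {name: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]     # first word of the header line
            records[name] = ""
        elif line and name is not None:
            records[name] += line
    return records


FASTA_TEXT = """>DAPB_ECOLI
MHDANIRVAIAGAGGRMGRQLIQAALALEGVQLGAALEREGSSLLGSDAGELAGAG
KTGVTVQSSLDAVKDDFDVFIDFTRPEGTLNHLAFCRQHGKGMVIGTTGFDEAGKQ
AIRDAAADIAIVFAANFSVGVNVMLKLLEKAAKVMGDYTDIEIIEAHHRHKVDAPSGTA
LAMGEAIAHALDKDLKDCAVYSREGHTGERVPGTIGFATVRAGDIVGEHTAMFADIGE
RLEITHKASSRMTFANGAVRSALWLSGKESGLFDMRDVLDLNNL
"""

seq = read_fasta(FASTA_TEXT)["DAPB_ECOLI"]
valid = set("ACDEFGHIKLMNPQRSTVWY")     # the 20 standard amino acids
print(len(seq), set(seq) <= valid)
```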
   1. Searching For Structures
Find templates with significant homology:

• BLAST against the sequences in the PDB


Find also more distant templates, using a profile-to-profile
approach:

  • FFAS03 server
  • HHPRED server
1. Searching For Structures
         Blast against the PDB
      http://www.ncbi.nlm.nih.gov/BLAST/

   1. Paste the sequence
   2. Select the PDB database
   3. Run BLAST
1. Searching For Structures
            Use fold recognition - FFAS03
   1. Paste the sequence
   2. Select the PDB database
   3. Run
     1. Searching For Structures
                 Use fold recognition - HHPRED
                   http://toolkit.tuebingen.mpg.de/hhpred
   1. Paste the sequence
   2. Select the PDB database
   3. Run
2. Selecting templates
       Blast against the PDB
   http://www.ncbi.nlm.nih.gov/BLAST/

   • The hits include the real structure of our protein and the
     closest homologous structure.
   • The selected template: 1VM6, chain A
    2. Selecting templates
            Use fold recognition - FFAS03




http://ffas.ljcrf.edu/ffas-cgi/cgi/get_mu.pl?ses=&qdb=public&tdb=PDB0408&type=re&key=221830166.3750.0000000
2. Selecting templates
  Use fold recognition - FFAS03




Scores below -9.5 are significant
 2. Selecting templates
         Use fold recognition - HHPRED




http://toolkit.tuebingen.mpg.de/hhpred/histograms/8455009
2. Selecting templates
  Use fold recognition - HHPRED
2. Selecting templates
        Who is our template?
   PDB ID 1VM6 is UniProt entry “DAPB_THEMA”
   www.ebi.ac.uk/thornton-srv/databases/pdbsum
3. Alignment
   http://consurftest.tau.ac.il/

   No model yet… We will use ConSurf to collect homologues and
   build an MSA.

   Settings on the ConSurf form:
   • Set to max - 500
   • Alignment method
   • Redundancy
   • Min. identity
   • Database: SwissProt/UniProt/UniRef90/NR
   • Job name and email

3. Alignment
   The results include the PSI-BLAST result, the filtered sequences
   and the MSA. Download the MSA file (right-click on the link).
               Easiest Using BioEdit
• http://www.mbio.ncsu.edu/BioEdit/BioEdit.html

• Easy-to-use sequence alignment editor

• View and manipulate alignments of up to 20,000 sequences.

• Four modes of manual alignment: select and slide, dynamic grab
and drag, gap insert and delete by mouse click, and on-screen
typing which behaves like a text editor.

• Reads and writes GenBank, FASTA, Phylip 3.2, Phylip 4, and
NBRF/PIR formats. Also reads GCG and Clustal formats.
                Easiest Using BioEdit
• Find a specific sequence: “Edit -> Search -> Find in Titles”

• Erase/add sequences: “Edit -> cut/paste/delete sequence”

• “Sequence Identity Matrix” under “Alignment” -
   useful for a rough evaluation of distances within the alignment.

• After taking out sequences, “Minimize Alignment” under
  “Alignment” removes unnecessary gaps.

• Save an image using:
  “File -> Graphic View” and then “Edit -> Copy page as BITMAP”
                    3. Alignment
        Extract query-template pairwise alignment

1. Open BioEdit: Start -> Phylogeny -> BioEdit

2. Open the alignment: File -> Open -> “query.aln”

3. Select the template:
          Edit -> Search -> Find in Titles -> “DAPB_THEMA”

4. Add the query to the template selection: Ctrl + “query”

5. Invert the selection: Edit -> Invert Title Selection

6. Delete the other sequences: Edit -> Cut Sequence(s)

7. Minimize gaps: Alignment -> Minimize Alignment

8. Save the pairwise alignment (query and DAPB_THEMA):
   File -> Save As (Fasta format) -> “DAPB_ECOLI_1VM6.fas”

Save in “fasta” format!
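The BioEdit steps above can also be scripted. A minimal sketch, assuming the MSA has been saved in aligned FASTA format and that the query row is named "query" (if the downloaded file is in another format, Biopython's `Bio.AlignIO.convert` can convert it first); the toy MSA below is hypothetical. Dropping the columns that are gaps in both remaining rows mirrors the effect of "Minimize Alignment".

```python
def read_fasta(text: str) -> dict:
    """Parse (aligned) FASTA text into {name: sequence}, keeping gap characters."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif line and name is not None:
            records[name] += line
    return records


def extract_pair(msa: dict, query: str, template: str) -> dict:
    """Keep two rows of an MSA and drop the columns that are gaps in both rows."""
    q, t = msa[query], msa[template]
    if len(q) != len(t):
        raise ValueError("rows of an MSA must have equal length")
    kept = [(a, b) for a, b in zip(q, t) if not (a == "-" and b == "-")]
    return {query: "".join(a for a, _ in kept), template: "".join(b for _, b in kept)}


def to_fasta(records: dict, width: int = 60) -> str:
    """Write {name: sequence} as FASTA with fixed-width sequence lines."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy MSA standing in for the ConSurf alignment (hypothetical sequences)
msa_text = """>query
MK-V--L
>DAPB_THEMA
MKAV--I
>other
M-AVGGL
"""
pair = extract_pair(read_fasta(msa_text), "query", "DAPB_THEMA")
print(to_fasta(pair))
```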
     3. Alignment
  Use fold recognition - FFAS03
  Scores below -9.5 are significant
  http://ffas.ljcrf.edu/ffas-cgi/cgi/get_mu.pl?ses=&qdb=public&tdb=PDB0408&type=re&key=221830166.3750.0000000

     3. Alignment
  Use fold recognition - HHPRED
  http://toolkit.tuebingen.mpg.de/hhpred/histograms/8455009
                   3. Alignment
       Inspect query-template pairwise alignment
• Generally speaking, in this step we would compare the
  pairwise alignments computed by the three approaches:
   • MSA-derived
   • FFAS03
   • HHPRED

• We don't have the time/patience for that now…

• Thus, we will now edit the pairwise alignment extracted from the MSA:
  Modeller requires a specific (PIR) format, which we have to adjust manually
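Writing the PIR file by hand is error-prone, especially the residue ranges in the header lines. A minimal sketch that formats one Modeller PIR entry from a gapped sequence, assuming residue numbering starts at `first_residue` (1 by default) without numbering gaps or insertion codes; `pir_entry` is a hypothetical helper, and the toy sequences are made up. For the template, check that the start and end actually match the residues present in the PDB file.

```python
def pir_entry(code: str, aligned_seq: str, kind: str, chain: str = "A",
              first_residue: int = 1, width: int = 60) -> str:
    """Format one Modeller PIR entry from an aligned (gapped) sequence.

    kind is "sequence" for the query and "structureX" for an X-ray template.
    The header has 10 colon-separated fields; the last four (name, source,
    resolution, R-factor) are left empty here.
    """
    n_residues = len(aligned_seq.replace("-", ""))
    last_residue = first_residue + n_residues - 1
    header = f"{kind}:{code}:{first_residue}:{chain}:{last_residue}:{chain}::::"
    body = aligned_seq + "*"            # '*' terminates the sequence
    lines = [f">P1;{code}", header]
    lines += [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join(lines)


# Toy pairwise alignment (hypothetical sequences)
alignment = "\n\n".join([
    pir_entry("DAPB_ECOLI", "MK-VL", "sequence"),
    pir_entry("1VM6", "MKAVI", "structureX"),
])
print(alignment)
```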
                       3. Alignment
            Edit query-template pairwise alignment
Modeller reads the alignment in PIR format. Edit the pairwise alignment to look like this:
  – The query entry name (DAPB_ECOLI) will also be the name of the modeled PDB file.
  – Each header line gives the start residue, its chain, the end residue and its chain.
  – The template entry is named after the template's PDB file (rename DAPB_THEMA to 1VM6).

>P1;DAPB_ECOLI
sequence:DAPB_ECOLI:1:A:273:A::::
MHDANIRVAIAGAGGRMGRQLIQAALALEGVQLGAALEREGSSLLGSDAGELAGAGKTGV
TVQSSLDAVKDDFDVFIDFTRPEGTLNHLAFCRQHGKGMVIGTTGFDEAGKQAIRDAAAD
IAIVFAANFSVGVNVMLKLLEKAAKVMGDYTDIEIIEAHHRHKVDAPSGTALAMGEAIAH
ALDKDLKDCAVYSREGHTGERVPGTIGFATVRAGDIVGEHTAMFADIGERLEITHKASSR
MTFANGAVRSALWLSGKESGLFDMRDVLDLNNL*

>P1;1VM6
structureX:1VM6:1:A:212:A::::
-----MKYGIVGYSGRMGQEIQKVFSE-KGHELVLKVDV---------------------
---NGVEEL-DSPDVVIDFSSPEALPKTVDLCKKYRAGLVLGTTALKEEHLQMLRELSKE
VPVVQAYNFSIGINVLKRFLSELVKVLE-DWDVEIVETHHRFKKDAPSGTAILLESAL--
------------------GK----SVPIHSLRVGGVPGDHVVVFGNIGETIEIKHRAISR
TVFAIGALKAAEFLVGKDPGMYSFEEVI-----*

                       Save as “dapb_ecoli_1vm6.pir”
4. Model Building
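A script for Modeller - copy it into a text file:
from modeller import *                 # core Modeller classes (environ, log, ...)
from modeller.automodel import *       # automodel: comparative modeling protocol

log.verbose()                          # write detailed messages to the log file
env = environ()                        # create a new Modeller environment
env.io.atom_files_directory = ['.']    # look for the template PDB file here

a = automodel(env,
              alnfile='dapb_ecoli_1vm6.pir',  # the PIR alignment
              knowns='1VM6',                  # template code in the PIR file
              sequence='DAPB_ECOLI')          # query code in the PIR file
a.starting_model = 1                   # build models 1 through 1,
a.ending_model = 1                     # i.e. a single model
a.make()                               # run the modeling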
4. Model Building
 Get the template structure: 1vm6 chain A
   http://www.rcsb.org/pdb/home/home.do

   1. Paste the template's PDB ID “1VM6”
   2. Download the structure

   Save as: “1VM6.pdb”

   Notice: case sensitive!
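The downloaded file contains every chain of 1VM6, while the alignment refers only to chain A. Modeller selects the chain through the residue range in the PIR header, so this step is optional; still, a chain-A-only file is easier to inspect. A minimal sketch using the fixed-column PDB format (chain identifier in column 22), assuming a standard PDB-format file rather than mmCIF; `extract_chain` and the output file name are hypothetical:

```python
def extract_chain(pdb_text: str, chain_id: str = "A") -> str:
    """Keep the ATOM, HETATM and TER records of one chain from PDB-format text."""
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        # Column 22 (index 21) holds the chain identifier in the PDB format
        if record in ("ATOM", "HETATM", "TER") and len(line) > 21 and line[21] == chain_id:
            kept.append(line)
    kept.append("END")
    return "\n".join(kept) + "\n"


# Usage (assumes 1VM6.pdb was downloaded as described above):
# with open("1VM6.pdb") as fh:
#     chain_a = extract_chain(fh.read(), "A")
# with open("1VM6_chainA.pdb", "w") as fh:
#     fh.write(chain_a)
```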
          4. Model Building
                Running Modeller:

1. Put the PDB file, the PIR alignment and the Modeller
    script in one directory, e.g. c:\test
2. Desktop -> Modeller
3. “cd c:\test”
4. “mod9v7 [modeller script name]”
5. Check that the run completed successfully
6. Output files:
   • Model, e.g. “DAPB_ECOLI.B99990001.pdb”
   • Log file - very important - reports the problems of
       the run
   • Other, less important, files

7. Open PyMOL and look at your model…

8. Evaluate it - tomorrow!
             4. Model Building
         Edit query-template pairwise alignment

Watch out! Modeller can fail owing to:

1. Non-matching start and end points of the template
   at the PIR alignment and PDB template file

2. Small discrepancies between the sequence of the
  template in the PDB file and in the PIR alignment… you may
  have to edit the alignment manually a little…

This, and more, will be reported in the log file.
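Both failure causes above can be caught before running Modeller by comparing the template row of the PIR file (gaps removed) with the sequence actually present in the PDB file. A minimal sketch, assuming a PDB-format template file, one CA atom per residue (alternate locations other than "A" are skipped), and only the 20 standard residue names (anything else, e.g. a modified residue, is reported as X); the helper names are hypothetical:

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def pdb_sequence(pdb_text: str, chain_id: str = "A") -> str:
    """One-letter sequence from the CA atoms of one chain (fixed PDB columns)."""
    seq = []
    for line in pdb_text.splitlines():
        if (line.startswith("ATOM") and len(line) > 21
                and line[12:16].strip() == "CA"
                and line[21] == chain_id and line[16] in (" ", "A")):
            # Columns 18-20 hold the residue name; unknown names become "X"
            seq.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(seq)


def first_mismatch(pir_row: str, structure_seq: str):
    """0-based index of the first residue where the PIR row and the PDB differ, else None."""
    pir_seq = pir_row.replace("-", "").rstrip("*")
    for i, (a, b) in enumerate(zip(pir_seq, structure_seq)):
        if a != b:
            return i
    if len(pir_seq) != len(structure_seq):
        return min(len(pir_seq), len(structure_seq))   # one sequence is longer
    return None


# Usage (assumes 1VM6.pdb and the PIR template row are available):
# with open("1VM6.pdb") as fh:
#     print(first_mismatch(template_row, pdb_sequence(fh.read(), "A")))
```

A result of None means the PIR row and the structure agree; otherwise the index points to where the alignment (or the residue range in the PIR header) needs attention.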