					      Biomolecular Modeling:
         building a 3D protein
     structure from its sequence
          Prof Shoba Ranganathan
   Dept. of Chemistry and Biomolecular Sciences,
      Macquarie University, Sydney, Australia &
Dept of Biochemistry, Yong Loo Lin School of Medicine
           National University of Singapore
               (shoba@bic.nus.edu.sg)
               Why protein structure?
   In the factory of the living cell, proteins
    are the workers, performing a variety of
    tasks
     Each protein adopts a particular folding
      pattern that determines its function
        The 3D structure of a protein brings
         into close proximity residues that are
         far apart in the amino acid sequence
          How does a protein fold?
   Most newly synthesized proteins fold
    without assistance!
     Ribonuclease A: denatured protein
      could refold and recover its activity (C.
      Anfinsen, 1966)
        “Structure implies function”

           The amino acid sequence encodes
            the protein’s structural information
1. Understanding Protein Structure
2. A Quick Overview of Sequence Analysis
3. Finding a Structural Homologue
4. Template Selection
5. Aligning the Query Sequence to
   Template Structure(s)
6. Building the Model
                                 The basics
   Proteins are linear heteropolymers: one or more
    polypeptide chains
      Repeat units: 20 amino acid residues

         Chain lengths range from a few tens to thousands of residues

            Three-dimensional shapes (“folds”)
             adopted vary enormously
               Experimental methods: X-ray

                crystallography, electron microscopy
                and NMR (nuclear magnetic
                resonance)
            The (L-)amino acid
[Figure: an L-amino acid. The amino group (N, +), Cα and the carboxylate group (C, O, O-) form the backbone; the side chain R (= H, CH3, …) is attached to Cα.]
The peptide bond
Coplanar atoms
               Levels of protein structure
   Zeroth: amino acid composition
   Primary
       This is simply the order of covalent linkages
        along the polypeptide chain, i.e. the sequence
        itself
                Levels of protein structure
   Secondary
       Local organization of the protein backbone: α-helix,
        β-strand (which assemble into β-sheets),
        turn and interconnecting loop
   Ramachandran / phi-psi plot
[Figure: Ramachandran plot of ψ (vertical axis) against φ (horizontal axis), marking the β-sheet, right-handed α-helix and left-handed α-helix regions]
Levels of protein structure
          Tertiary
              packing of secondary
               structure elements into
               a compact spatial unit
              “Fold” or domain – this
               is the level to which
               structure prediction is
               currently possible
Levels of protein structure
          Quaternary
              Assembly of homo- or
               heteromeric protein
               chains
              Usually the functional
               unit of a protein,
               especially for enzymes
                  Structural classes

All-α (helical)      All-β (sheet)
                         Structural classes

α/β (parallel β-sheet)    α+β (antiparallel β-sheet)
                 Structural information

   Protein Data Bank: maintained by the
    Research Collaboratory for Structural
    Bioinformatics
     http://www.rcsb.org/pdb

     > 45,744 structures of proteins

     Also contains structures of DNA,
      carbohydrates, protein-DNA complexes
      and numerous small ligand molecules.
                          The PDB data
   Text files
   Each entry is identified by a unique 4-character
    code, e.g. 1emg
   1emg entry
      Header information

      Atomic coordinates in Å (1 ångström = 1.0e-10 m)
                                PDB Header details
            identifies the molecule, any modifications, date
             of release of PDB entry
HEADER        GREENFLUORESCENT PROTEIN                12-NOV-98   1EMG
TITLE         GREEN FLUORESCENT PROTEIN (65-67 REPLACED BY CRO, S65T
TITLE        2 SUBSTITUTION, Q80R)
COMPND        MOL_ID: 1;
COMPND       2 MOLECULE: GREEN FLUORESCENT PROTEIN;
COMPND       3 CHAIN: A;
COMPND       4 ENGINEERED: YES;
COMPND       5 MUTATION: 65 - 67 REPLACED BY CRO, S65T SUBSTITUTION, Q80R
COMPND       6 SUBSTITUTION;
COMPND       7 BIOLOGICAL_UNIT: MONOMER

            organism, keywords, method
            Authors, reference, resolution if X-ray structure
            Sequence, x-reference to sequence databases
                                           The data itself
       Coordinates for each heavy (non-hydrogen) atom
        from the first residue to the last
ATOM       1   N    SER   A   2   29.089    9.397   51.904   1.00   81.75
ATOM       2   CA   SER   A   2   27.883   10.162   52.185   1.00   79.71
ATOM       3   C    SER   A   2   26.659    9.634   51.463   1.00   82.64
ATOM       4   O    SER   A   2   26.718    8.686   50.686   1.00   81.02
ATOM       5   CB   SER   A   2   28.039   11.660   51.932   1.00   75.59
ATOM       6   OG   SER   A   2   27.582   12.038   50.639   1.00   43.28
-------
ATOM    1737   CD1 ILE A 229      39.535   21.584   52.346   1.00 41.62
TER     1738       ILE A 229


       Any ligands (records starting with HETATM) follow the
        biomacromolecule
       Oxygen atoms of water molecules (also HETATM) come at the end
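The ATOM records shown above can be read programmatically. The following is a minimal sketch, assuming the whitespace-separated layout printed on this slide; real PDB files use fixed columns, where neighbouring fields can run together, so a column-based parser is safer for production use. The names `Atom` and `parse_atom_line` are hypothetical and not part of any PDB toolkit.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int       # atom serial number
    name: str         # atom name, e.g. CA
    res_name: str     # residue name, e.g. SER
    chain: str        # chain identifier
    res_seq: int      # residue sequence number
    x: float          # coordinates in angstroms
    y: float
    z: float
    occupancy: float
    b_factor: float   # temperature factor


def parse_atom_line(line: str) -> Atom:
    """Parse one whitespace-separated ATOM/HETATM line as printed on the slide."""
    fields = line.split()
    if len(fields) < 11 or fields[0] not in ("ATOM", "HETATM"):
        raise ValueError(f"not an ATOM/HETATM record: {line!r}")
    return Atom(
        serial=int(fields[1]),
        name=fields[2],
        res_name=fields[3],
        chain=fields[4],
        res_seq=int(fields[5]),
        x=float(fields[6]),
        y=float(fields[7]),
        z=float(fields[8]),
        occupancy=float(fields[9]),
        b_factor=float(fields[10]),
    )


# The C-alpha atom of Ser-2 from the 1emg excerpt
ca = parse_atom_line(
    "ATOM       2   CA   SER   A   2   27.883   10.162   52.185   1.00   79.71"
)
print(ca.name, ca.res_name, ca.res_seq, ca.x, ca.y, ca.z)
```

Collecting only the records whose atom name is CA gives the Cα trace used for structure overlays.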
                         Structural Families
   SCOP - Structural Classification Of
    Proteins
       http://scop.mrc-lmb.cam.ac.uk/scop
   FSSP – Family of Structurally Similar
    Proteins
       http://www.ebi.ac.uk/dali/fssp/
   CATH – Class, Architecture, Topology,
    Homology
       http://www.biochem.ucl.ac.uk/bsm/cath
         Structure comparison facts
   Proteins adopt a limited number of
    topologies.
      Homologous sequences show very
       similar structures, with strong
       conservation in secondary structural
       elements: variations in non-conserved
       regions.
         In the absence of sequence

          homology, some folds are preferred
          by vastly different sequences.
            Structure comparison facts
   The “active site” (a collection of functionally
    critical residues) is remarkably conserved,
    even when the protein fold is different.
      Structural models (especially those based

       on homology) provide insights into
       possible function for new proteins.
         Implications for

            protein engineering

            ligand/drug design,

            function assignment of genomic data.
              Visualizing PDB information
   RASMOL: most popular, available for all platforms
    (Sayle et al, 2005)
    http://www.bernstein-plus-sons.com/software/rasmol

   DeepView Swiss-PDBViewer: from Swiss-Prot
    (Guex & Peitsch, 1997)
    http://tw.expasy.org/spdbv/

   Chemscape Chime Plug-in: for PC and Mac
    http://www.mdli.com/products/framework/chemscape

   PyMOL: Very good, available for all platforms
    (DeLano, W.L. The PyMOL Molecular Graphics System, 2002)
    http://pymol.sourceforge.net
    RASMOL views - SH2 domain
All-atom model         Space-filling model
        Atom colors:   NOCS
                    RASMOL views – 1sha

     Cα trace                        Ribbon




Rainbow coloring: N to C   Coloring: by structural units
                      Homologous folds
   Hemoglobin and
    erythrocruorin: 31%
    sequence identity
                        Analogous folds
   Hemoglobin and
    phycocyanin: 9%
    sequence identity
        Surface Properties
Cro repressor –
  DNA complex
 Basic residues
  in blue
 Acidic residues

  in red
Mapping Functional Regions
             Immunoglobulin λ
               light chain - dimer
              Hydrophobic
               residues in
               magenta
              Hydrophilic and
               charged residues
               in cyan
1. Understanding Protein Structure
2. A Quick Overview of Sequence Analysis
3. Finding a Structural Homologue
4. Template Selection
5. Aligning the Query Sequence to
   Template Structure(s)
6. Building the Model
        Siblings and Cousins
   Siblings or homologues: sequences with at
    least 30% sequence identity over an
    alignment length of at least 125 residues and
    conservation of function.
   Cousins or paralogues: < 30% identity but
    with conservation of function
   Both show structural conservation
   Homologues located using a database search
    tool such as BLAST (free webserver):
        http://www.ncbi.nlm.nih.gov/BLAST
   Paralogues require a more sensitive method
    such as PSI-BLAST
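The sibling/cousin rule above can be sketched in code. This is an illustration only: it assumes that identity is counted over alignment columns where neither sequence has a gap (other conventions exist), and the function names are hypothetical. The 30% and 125-residue thresholds are the ones given on this slide.

```python
from typing import Tuple


def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> Tuple[float, int]:
    """Return (% identity, number of aligned columns) for two aligned rows."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue
    aligned = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not aligned:
        return 0.0, 0
    identical = sum(a == b for a, b in aligned)
    return 100.0 * identical / len(aligned), len(aligned)


def classify_relationship(seq_a: str, seq_b: str, function_conserved: bool) -> str:
    """Apply the slide's thresholds: sibling, cousin, or unclassified."""
    identity, length = percent_identity(seq_a, seq_b)
    if not function_conserved:
        return "unclassified"
    if identity >= 30.0 and length >= 125:
        return "sibling"
    if identity < 30.0:
        return "cousin"
    return "unclassified"  # high identity but over too short an alignment


print(percent_identity("ACDE-G", "ACQEFG"))
```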
        Multiple Sequence Alignment
Finding the best way to match the residues of
related sequences
 Identical residues must be lined up

 The rest should be arranged, based on
       observed substitution in protein families
       chemical similarity
       charge similarity
   Where it is impossible to get the residues to
    line up, the biological concept of
    insertion/deletion is invoked: the ‘gap’ in
    alignments
                       MSA Methods
   CLUSTALW / CLUSTALX (Thompson et al, 1997):
    freely available for all platforms and one of the best
    alignment programs
       http://www-igbmc.u-strasbg.fr/BioInfo/ClustalX/Top.html
   MAXHOM (Sander & Schneider, 1991): alignment
    based on maximum homology; available via the
    PredictProtein webserver, free for academics
       http://cubic.bioc.columbia.edu/predictprotein/
   MALIGN (Johnson et al, 1994): freely available UNIX
    program, based on the structural alignment of
    protein families
       http://www.abo.fi/fak/mnf/bkf/research/johnson/software.html
             Alignment Checks
   Conservation of functionally important
    residues: e.g. the catalytic triad (Asp-Ser-His),
    which is essential for serine
    proteinase activity
   Line up of structurally important residues:
    e.g. cysteines forming disulfide bonds
   Overall, maximizing the alignment of “like”
    residues
   Completely conserved residues usually
    indicate some conserved structural or
    functional role, especially buried charges
    Sequence Motifs & Patterns
   From the analysis of the alignment of
    protein families
   Conserved sequence features, usually
    associated with a specific function
   PROSITE (Hulo et al, 2006) database for
    protein “signature” patterns:
       http://www.expasy.ch/prosite
        Aligned Sequence Families
   From alignments of homologous
    sequences:
       PRINTS
       PRODOM:
        http://www.toulouse.inra.fr/prodom.html
   From Hidden Markov Model based
    methods:
       PFAM: http://www.sanger.ac.uk/Pfam
             Protein Domains
   Most proteins are composed of structural
    subunits called domains
   A domain is a compact unit of protein
    structure, usually associated with a function.
   It is usually a “fold” - in the case of
    monomeric soluble proteins.
   A domain comprises normally only one
    protein chain: rare examples involving 2
    chains are known.
   Domains can be shared between different
    proteins: like a LEGO block
           Protein Architectures
   Beads-on-a-string: sequential location:
    tyrosine-protein kinase receptor TIE-1
    (immunoglobulin, EGF, fibronectin type-3 and
    protein kinase).


   Domain insertions: “plugged-in” - pyruvate
    kinase (1pyk)

   SMART: smart.embl-heidelberg.de
    Simple Modular Architecture Retrieval Tool
      Dissection into Domains
   A sequence longer than about 125 residues should
    be routinely checked to see how many
    domains are present.
   Conserved Domain Architecture Retrieval
    Tool (CDART) uses information in Pfam and
    SMART to assign domains along a sequence
   E.g. NP_002917 shows similarity to G-protein
    regulators:
1. Understanding Protein Structure
2. A Quick Overview of Sequence Analysis
3. Finding a Structural Homologue
4. Template Selection
5. Aligning the Query Sequence to
   Template Structure(s)
6. Building the Model
            Structural Homologues
   BLASTP vs. PDB database or PSI-BLAST:
    look for 4-character PDB ID
       E < 0.005
   Domain coverage: at least 60% coverage
    is recommended
   Gaps: we don’t want them. Choose
    between:
    few gaps and reasonable similarity scores or
     lots of gaps and high similarity scores?
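The E-value and coverage criteria above can be applied as a simple filter over search hits. The sketch below assumes the hits have already been parsed into a small record; the `Hit` class, its field names and the example identifiers are invented for illustration and do not come from any BLAST library. Coverage is taken here as the fraction of the query (or query domain) spanned by the hit, which is one possible reading of "domain coverage".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    pdb_id: str        # placeholder identifier of the template
    e_value: float     # expectation value reported by the search
    query_start: int   # 1-based, inclusive
    query_end: int     # 1-based, inclusive


def coverage(hit: Hit, query_length: int) -> float:
    """Fraction of the query spanned by the hit."""
    return (hit.query_end - hit.query_start + 1) / query_length


def select_templates(hits: List[Hit], query_length: int,
                     max_e: float = 0.005, min_coverage: float = 0.60) -> List[Hit]:
    """Keep hits with E < max_e and coverage >= min_coverage, best E-value first."""
    kept = [h for h in hits
            if h.e_value < max_e and coverage(h, query_length) >= min_coverage]
    return sorted(kept, key=lambda h: h.e_value)


# Made-up hits for a 125-residue query
hits = [
    Hit("tmpA", 1e-30, 5, 120),
    Hit("tmpB", 1e-4, 1, 60),     # significant, but covers under 60%
    Hit("tmpC", 0.5, 1, 125),     # full coverage, but E-value too high
    Hit("tmpD", 1e-10, 10, 110),
]
selected = select_templates(hits, query_length=125)
print([h.pdb_id for h in selected])
```

The trade-off between few gaps and high similarity is not captured by this filter and still needs inspection of the alignments themselves.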
         Small Proteins: Disulfide bonds
        BLAST-type methods may not locate
         homologues, if Conserved Domain search is
         not turned on.
  gnl|Pfam|pfam00095, wap, WAP-type (Whey Acidic Protein)
 'four-disulfide core'.
     CD-Length = 46 residues, 100.0% aligned
     Score = 43.9 bits (102), Expect = 1e-06

Q:49 KAGFCPWNLLQMISSTGPCPMKIECSSDRECSGNMKCCNVDCVMTCTPP 97
D: 1 KPGVCPWVSISE---AGQCLELNPCQSDEECPGNKKCCPGSCGMSCLTP 46

        Are the Cys residues conserved?
        Gaps: where are they on the structure?
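Both questions can be answered directly from the two aligned rows above. The following sketch uses the query (Q) and domain (D) rows exactly as printed; the function names are hypothetical, and column numbers are 0-based alignment columns rather than residue numbers.

```python
from typing import List, Tuple

QUERY = "KAGFCPWNLLQMISSTGPCPMKIECSSDRECSGNMKCCNVDCVMTCTPP"
DOMAIN = "KPGVCPWVSISE---AGQCLELNPCQSDEECPGNKKCCPGSCGMSCLTP"


def cysteine_columns(query: str, domain: str) -> Tuple[List[int], List[int]]:
    """Return (columns with Cys in both rows, columns with Cys in only one row)."""
    if len(query) != len(domain):
        raise ValueError("aligned rows must have the same length")
    conserved = [i for i, (q, d) in enumerate(zip(query, domain))
                 if q == "C" and d == "C"]
    unmatched = [i for i, (q, d) in enumerate(zip(query, domain))
                 if (q == "C") != (d == "C")]
    return conserved, unmatched


def gap_columns(aligned_row: str, gap: str = "-") -> List[int]:
    """Return the alignment columns that hold a gap in this row."""
    return [i for i, c in enumerate(aligned_row) if c == gap]


conserved, unmatched = cysteine_columns(QUERY, DOMAIN)
print("conserved Cys columns:", conserved)
print("unmatched Cys columns:", unmatched)
print("gap columns in domain row:", gap_columns(DOMAIN))
```

Whether the gap falls in a loop or inside a secondary structure element still has to be checked on the template structure.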
       Metal-binding domains
C2H2 Zinc Finger
 2 Cys & 2 His binding to
  Zinc
 Not detected even by CD-
  search in BLAST
 Detected by Pfam &
  SMART
 Sequence Pattern:

#-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C]
    Structure Prediction Methods
   Secondary Structure Prediction: identify local
    structural elements such as helices, strands
    and loops.
   > 75% accuracy achievable
      PredictProtein or PHD

        http://cubic.bioc.columbia.edu/pp/
      PSIPRED

        http://bioinf.cs.ucl.ac.uk/psipred/
      SSPro

        http://promoter.ics.uci.edu/BRNN-PRED/
             Folds from Secondary
             Structure Predictions
   Assembling SSEs into folds is a combinatorial
    problem
   Current methods depend on available
    structural data for mapping predictions:
      FORESST

        http://abs.cit.nih.gov/foresst/foresst.html
      TOPITS from the PHD server

        http://cubic.bioc.columbia.edu/pp
    Tertiary Structure Prediction
   Fold recognition/Threading: < 20% identity
    typically
   Best results obtained by combining several
    database search and knowledge-based tools:
   3D-PSSM
        http://www.sbg.bio.ic.ac.uk/~3dpssm/
   FUGUE
        http://www-cryst.bioc.cam.ac.uk/fugue/
1. Understanding Protein Structure
2. A Quick Overview of Sequence Analysis
3. Finding a Structural Homologue
4. Template Selection
5. Aligning the Query Sequence to
   Template Structure(s)
6. Building the Model
                One or many templates?
            Sequence similarity: extract template
             sequences and align with query: select
             the most similar structure
            Completeness: Missing data?
REMARK   465   MISSING RESIDUES
REMARK   465   THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE
REMARK   465   EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN
REMARK   465   IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)
REMARK   465
REMARK   465   M RES   C SSSEQI
REMARK   465     MET   A 1
REMARK   465     THR   A 230
REMARK   470   M RES   CSSEQI ATOMS
REMARK   470     GLU   A 5 OE2
REMARK   470     GLU   A 6 CG CD OE1 OE2
REMARK   470     GLU   A 17 OE1
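Completeness can be checked automatically by collecting the residues listed under REMARK 465. This is a minimal sketch that assumes the simple layout shown above (residue name, chain, sequence number); it ignores the optional model-number and insertion-code columns, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def missing_residues(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Collect (residue name, chain, sequence number) from REMARK 465 data lines."""
    missing = []
    for line in lines:
        fields = line.split()
        # Data lines in this layout have exactly: REMARK 465 RES CHAIN SEQ
        if len(fields) != 5 or fields[:2] != ["REMARK", "465"]:
            continue
        res_name, chain, seq = fields[2:]
        try:
            missing.append((res_name, chain, int(seq)))
        except ValueError:
            continue  # explanatory text, not a residue entry
    return missing


header = [
    "REMARK   465   MISSING RESIDUES",
    "REMARK   465   THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE",
    "REMARK   465",
    "REMARK   465   M RES   C SSSEQI",
    "REMARK   465     MET   A 1",
    "REMARK   465     THR   A 230",
    "REMARK   470     GLU   A 5 OE2",
]
print(missing_residues(header))
```

REMARK 470 (missing atoms) could be handled the same way, with the atom names taken from the remaining fields.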
           One or many templates?
   X-ray or NMR?:
       X-ray structure with the lowest resolution value (i.e. the best resolved)
       X-ray and then NMR
       NMR: average over the ensemble
   One or many?:
       Structure alignment of Ca atoms
       If 2 templates are very close, keep only one
       Keep templates that provide new information
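The "keep only one of two very close templates" rule can be sketched with a Cα RMSD. The code below assumes the templates have already been structurally superposed and trimmed to the same set of equivalent Cα positions; the superposition itself is not shown. The 1.0 Å cutoff and the function names are illustrative assumptions, not values from the slides.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD in angstroms between equivalent, already superposed C-alpha atoms."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
                  for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(squared / len(a))


def prune_templates(templates: Dict[str, Sequence[Coord]],
                    cutoff: float = 1.0) -> List[str]:
    """Greedily keep templates that differ from all kept ones by more than cutoff.

    Templates are visited in insertion order, so list the preferred
    template (e.g. the best-resolved structure) first.
    """
    kept: List[str] = []
    for name, coords in templates.items():
        if all(ca_rmsd(coords, templates[k]) > cutoff for k in kept):
            kept.append(name)
    return kept


# Toy coordinates: t2 is nearly identical to t1, t3 is clearly different
toy = {
    "t1": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "t2": [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)],
    "t3": [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)],
}
print(prune_templates(toy))
```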
              Many templates
   Sequence alignment from structure
    comparison of templates (SSA) can be
    different from a simple sequence
    alignment (SA).
   For model building,
    1. align templates structurally
    2. extract the corresponding SSA
1. Understanding Protein Structure
2. A Quick Overview of Sequence Analysis
3. Finding a Structural Homologue
4. Template Selection
5. Aligning the Query Sequence to the
   Template Structure(s)
6. Building the Model
         Query - Template Alignment
   >40% identity: any alignment method is OK
   Below this, checks are essential.
       Collect close sequence homologues (about 10)
        and align to query to get MSA (multiple sequence
        alignment)
       Collect several structural templates (at least 5)
        and align them using structure comparison
        methods: extract the SSA (structural sequence
        alignment)
       Align MSA to SSA using profile alignment
       Extract query and selected template(s) from the
        final alignment – QTA.
                   QTA Checks
   Residue conservation checks
       Functional regions
       Patterns/motifs conserved?
   Indels
       Combine gaps separated by few residues
   Editing the alignment
       Move gaps from secondary structures to
        loops
       Within loops, move gaps to loop ends, i.e.
        turnaround point of backbone
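The "combine gaps separated by few residues" edit can be illustrated on a single aligned row. This sketch shifts the short stretch of residues to the left so that the two gap runs become one; the direction of the shift and the `max_sep` value of 2 are arbitrary choices made for illustration. Moving residues changes which template residues they pair with, so the edited alignment still needs the conservation checks listed above.

```python
import re


def combine_close_gaps(aligned_row: str, max_sep: int = 2) -> str:
    """Merge gap runs separated by at most max_sep residues in one aligned row."""
    if max_sep < 1:
        raise ValueError("max_sep must be at least 1")
    # gap run, then 1..max_sep residues, then another gap run
    pattern = re.compile(r"(-+)([^-]{1,%d})(-+)" % max_sep)
    previous = None
    while previous != aligned_row:
        previous = aligned_row
        # Move the short residue stretch in front of the combined gap
        aligned_row = pattern.sub(
            lambda m: m.group(2) + m.group(1) + m.group(3),
            aligned_row,
            count=1,
        )
    return aligned_row


print(combine_close_gaps("AB--CD-EF"))
```

The row length is unchanged, so the other rows of the alignment do not need to be re-padded.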
Visual Inspection of Indels
                      2-residue
                       deletion from
                       sequence
                       alignment
                      End-of-loop
                       2-residue
                       deletion
1. Understanding Protein Structure
2. A Quick Overview of Sequence Analysis
3. Finding a Structural Homologue
4. Template Selection
5. Aligning the Query Sequence to
   Template Structure(s)
6. Building the Model
        Input for Model Building

   Query sequence
   Template structure
     Template sequence

   Query-template sequence alignment
         Methods Available
1.   WHATIF (Vriend G, 1990)
      High quality models where template is
       available
        Indels not modelled
      Side chain rotamers
      In silico mutations
      In silico disulfide bond creation
         Methods Available
2. SWISS-MODEL (Schwede et al, 2003)
   Automatic modeling mode with multiple
    templates
   Query + template input
   High Homology situations
   DeepView for input file creation
            Methods Available
3. MODELLER (Sali & Blundell, 1993)
     High quality models
     Sequence alignment
     Structure analysis/alignment
     Multiple templates
     Multiple chains
     Ligand/cofactor present


4. ESyPred3D (uses MODELLER)
     QTAs from several methods & neural networks
     http://www.fundp.ac.be/urbm/bioinfo/esypred/
             Methods Available
5. ICM (Abagyan et al, 1994)
    High quality models
    Loop modelling
      Multiple templates not possible
    Sequence/Structure alignment/analysis
    Ab initio peptide modeling
    Secondary structure prediction


6. Geno3D (Combet et al, 2002)
      Automated modelling
      Distance geometry used for loops
      http://geno
           Methods Available
7.   3D-JIGSAW (Bates et al, 2001)
        Automatic modeling mode
        Interactive user mode to select templates
        Multiple templates
        Multidomain protein modeling
         Methods Available
8. CPH-MODELS (Lund et al, 1997)
   Fully automated
   FASTA search for templates
   Not validated
     Automatic or Manual Mode?
   Automatic: High homology

   Manual
       Medium/Low homology
       Template from structure prediction
       Multiple templates
       Multiple chains
       Ligand present
       How good is the model?
Structural Quality Analysis
   PROCHECK (Laskowski et al, 1993)
   WHATIF (Vriend G, 1990)
   ERRAT (Colovos & Yeates, 1993)
     Improving ill-defined regions
   Iterative model building
       Rebuild or anneal bad regions
       Check/edit alignment and rebuild
   Molecular dynamics and/or Monte Carlo
    simulations
       Compute intensive
       Input files need to be set up
       Optional
      Molecular Modeling Protocol
   Resources required
     The query sequence

     Personal computer with internet
      connectivity
     RASMOL/DeepView for PDB structure
      visualization
     CLUSTALX sequence alignment software

     Access to a UNIX workstation

     MODELLER/ICM – UNIX software

     WHATIF – UNIX/PC software

     PROCHECK – UNIX or Windows software
    MM Protocol – Input Files
Minimum requirement
 Query sequence

 Template structure

   Template sequence

 Query-Template alignment
         Ex 1. High Homology Case
   Human SOX9 WT - homologous to
    SRY (PDB: 1HRY) - 49% identity
     S9WT:           ..AGAACAATGG..     highest
    SOXCORE:         ..GCAACAATCT..      least
   Mutants (campomelic dysplasia):
       F12L: No DNA binding
       H65Y: Minimal binding
       P70R: altered specificity; no SOXCORE
       A19V: near WT but normal binding
           1. SOX9 Models

   WT & P70R
    models built
   Cα overlay:
    WT-SRY ~ 0.72 Å
   J. Biol. Chem. 274
    (1999) 24023
[Figure: SOX9 WT & P70R models, based on SRY]
      Ex 1. SOX9 + DNA Models
[Figure: DNA complexes of SRY, SOX9-WT and SOX9-P70R]
   Observed disease-linked
    mutations mapped
   Other residues in DNA-
    binding groove
    determined
Ex 2. Low Homology Situation
   Pigments from reef-building corals:
    similar to Pocilloporin
   fluoresce under UV and visible
    radiation
   similar to the Green Fluorescent
    Protein - GFP (19.6% identity)
   contain ‘QYG’ instead of ‘SYG’ in
    GFP, as proposed fluorophore
2. Alignment of POC4 & GFP
               2. POC4 Model
   Barrel ends open
   C-ter not included
   β-sheet OK
   ‘QYG’ fits the site!
   26 residues within
    5Å of QYG (only
    19 in GFP)
   Increased thermal
    stability
   UV protection
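A count such as "26 residues within 5 Å of QYG" can be obtained from model coordinates with a simple distance search. The sketch below assumes atoms are given as (residue number, x, y, z) tuples and that a residue counts as a neighbour if any of its atoms lies within the cutoff of any atom of the site; both the data layout and the function name are assumptions made for illustration. The coordinates in the example are toy values, not taken from the POC4 or GFP models.

```python
import math
from typing import List, Sequence, Set, Tuple

AtomRec = Tuple[int, float, float, float]  # (residue number, x, y, z)


def residues_within(atoms: Sequence[AtomRec], site_residues: Set[int],
                    cutoff: float = 5.0) -> List[int]:
    """Residue numbers with at least one atom within cutoff of any site atom."""
    site_atoms = [a for a in atoms if a[0] in site_residues]
    near = set()
    for res, x, y, z in atoms:
        if res in site_residues:
            continue  # the site itself is not counted as a neighbour
        for _, sx, sy, sz in site_atoms:
            if math.dist((x, y, z), (sx, sy, sz)) <= cutoff:
                near.add(res)
                break
    return sorted(near)


# Toy example: a three-residue site (65-67) and three other residues
toy_atoms = [
    (65, 0.0, 0.0, 0.0),
    (66, 1.0, 0.0, 0.0),
    (67, 2.0, 0.0, 0.0),
    (10, 4.0, 0.0, 0.0),   # 2.0 A from residue 67
    (11, 8.0, 0.0, 0.0),   # 6.0 A from residue 67
    (12, 0.0, 4.9, 0.0),   # 4.9 A from residue 65
]
neighbours = residues_within(toy_atoms, {65, 66, 67})
print(len(neighbours), neighbours)
```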
         Ex 3. Small Disulfide-bonded
        Protein: Complement Factor H
   20 tandem homologous units =
    SCRs (short consensus repeats)
    or “sushi” regions
   Each SCR is ~ 60 aa:
      conserved Y, P, G
      2 disulfide bridges: 1-3 & 2-4
      Linkers of 3-8 aa
   Heparin binding SCRs: 7 (high
    affinity) & 20
   Previous SCRs required for
    activity: minimum constructs are
    fH67 and fH18-20
[Figure: an SCR domain, with the N-ter, C-ter, cysteines C1-C4 and the HV loop labelled]
             3. Sequence Alignment of Close
                 Functional Homologues
               Site A                      Site d Site B      Site c

hfH     385 C L R K C Y F P Y L E N G Y N Q N H G R K F V Q G K S I D V A
fHR-3   83   C L RKC Y F P Y L E NG YNQN YGRK F VQGN S T E V A
bfH     292 C L R Q C I F N Y L E N G H N Q H R E E K Y L Q G E T V R V H
mfH     385 C V R K C V F H Y V E N G D S A Y W E K V Y V Q G Q S L K V Q
Consensus     * : * : *   *   * : * * *     .         .   : : * * : :     *


hfH     416 C H P G Y A L P K A - Q T T V T C M E N G W S P T P R C I R 444
fHR-3   114 C H P G Y G L P K V R Q T T V T C T E N G W S P T P R C I R 143
bfH     323 C Y E G Y S L Q N D - Q N T M T C T E S G W S P P P R C I R 351
mfH     416 C Y N G Y S L Q N G - Q D T M T C T E N G W S P P P K C I R 444
Consensus     * :   * * . *   :     *     * : * *   * . * * * * . * : * * *
    3. Templates for fH SCRs 6-7
   hfH SCRs 15&16 (fH1516; PDB ID:
    1HFH)
   Vaccinia virus complement control
    protein domains 3&4 (vcp34; PDB ID:
    1VVC)
[Figure: structures of hfH15 and vcp3]

   Orientations differ considerably
   Vcp34 28% identical to hfH67 compared
    to hfH1516 (25%) !
3. Query-Templates alignment
3. hfH67 model

[Figure: hfH67 and hfH1516 models, with sialic acid and a heparin disaccharide repeat]
            3. Locating residues for
             mutation from model
[Figure: SCRs 6&7 and SCRs 15&16, with residues Arg-387, Lys-388, His-402, Arg-404, Lys-405 and Lys-410 labelled]




 Pacific Symposium on Biocomputing 2000, 5:155
        Ex 4: Protein Engineering
      Thermolysin-like
       protease unstable at
       high temperatures
       (> 40 °C unlike trypsin)
      Homology Model built
      G8 & N60 suited for
       disulfide bond
      Double Mutant
       functional at 92.5 °C
J Mansfeld et al. “Extreme Stabilization of a Thermolysin-like Protease by
an Engineered Disulfide Bond” J. Biol. Chem. 1997 272: 11152-11156.
Ex 5. Multiple chains: Human
Hand, Foot & Mouth Disease
Virus capsid
                 2000 outbreak of
                 HFMD in Singapore:
                 thousands of
                 children affected – 4
                 deaths (The Lancet,
                 2000, 356, 1338)
                Major etiological
                 agent: EV71
                 (enterovirus group)
                Neurological
                 complications
           5. EV71 genome structure
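[Figure: EVG71 genome map, 5’ to 3’: VPg, AUG, capsid region (VP4, VP2, VP3, VP1), replication region (2A, 2B, 2C, 3A,B, 3C, 3D), UAG, PolyA; the positions of the pan-enterovirus primers and the EV71 specific primers are marked]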

       95/94% (RNA) homology coxsackievirus A16/B3
       Only 1% difference between neurovirulent and
        non-neurovirulent isolates
       Most variations in non-capsid regions
       Within capsid regions, VP1 shows maximum
        variability relative to other EVs
       Differences in capsid region 1: VP1 & VP2, 2: VP3
         5. Picornaviridae
Icosahedral
Capsid
             5. Template hunt

BLASTP against PDB sequences
 VP1: 3 templates

   1BEV 38.7% (bovine enterovirus)

   1EAH 36.5% (poliovirus type 2 strain Lansing)

   1FPN 38.0% (human rhinovirus serotype 2)

 VP2: 1BEV 56.7%

 VP3: 1BEV 54.9%

 VP4: 1BEV* 50.0%
     5. Fixing the VP1 Alignment

   Structural alignment of templates: using
    VAST (Gibrat, Madej, & Bryant, 1996)
   Extract corresponding sequence
    alignment
   Match HFMDV VP1 to aligned
    templates using profile alignment in
    CLUSTALW
                          5. VP1 alignment to templates
[Legend: 3,10-helices, α-helices, β-strands]
 VP1
1BEV1   14    Q   A   A   G   A   L   V   A   G   T   S   T   S   T   H   S   V   A   T   D   S   T   P   A   L   Q   A   A   E   T   G   A   T   S   T   A   RD ESM              I   E   T   R   T   I   V   P   T   H   G   I H     E T     S   V   E   S   F   F   G   R   S   S   L   V   G   M
1EAH1   24    -   -   -   A   N   N   L   P   D   T   Q   S   S   G   P   A   H   S   -   K   E   T   P   A   L   T   A   V   E   T   G   A   T   N   P   L   VP S D T            V   Q   T   R   H   V   I   Q   K   R   T   RS      ES      T   V   E   S   F   F   A   R   G   A   C   V   A   I
1FPN1   15    -   -   -   -   L   V   V   P   N   I   N   S   S   N   P   T   T   S   -   N   S   A   P   A   L   D   A   A   E   T   G   H   T   S   S   V   QP EDV              I   E   T   R   Y   V   Q   T   S   Q   T   RD      EM      S   L   E   S   F   L   G   R   S   G   C   I   H   E
EV711   23    A   L   P   A   P   T   G   Q   N   T   Q   V   S   S   H   R   L   D   T   G   E   V   P   A   L   Q   A   A   E   I   G   A   S   S   N   T   S D ESM             I   E   T   R   C   V   L   N   S   H   S   TA      E T     T   L   D   S   F   F   S   R   A   G   L   V   G   E
                                              .       .       *                               .   .   *   *   *       *   .   *       *       :   .               . .             :   :   *   *       :           .   :               *       :   :   :   *   *   :   .   *   .   .       :

1BEV1   84    P   L   L   A   T   -   -   -   -   -   -   G   T   S   I   T   HWR         I   DF R E          FV      QL      R   A   KM      SW      F   T   Y   M   R   F   DV      E   F   T   I   I   A   T   S   S   -   T   G   Q   N   V   T   T   E   Q   H   T   T   Y Q     V   MY      V
1EAH1   90    I   E   V   D   N   D   -   -   -   -   -   S   K   L   F   S   V WK        I   TY K D          TV      QL      R   R   KL      E F     F   T   Y   S   R   F   DM      E   F   T   F   V   V   T   S   N   Y   T   D   A   N   N   G   H   A   L   N   Q   V   Y Q     I   MY      I
1FPN1   80    S   K   L   E   V   T   L   A   N   Y   N   K   E   N   F   T   V WA        I   NL Q E          MA      Q I     R   R   K F     E L     F   T   Y   T   R   F   DS      E   I   T   L   V   P   C   I   S   A   L   -   -   -   S   Q   D   I   G   H   I   T   MQ      Y   MY      V
EV711   93    I   D   L   P   L   E   -   G   T   T   N   P   N   G   Y   A   NWD         I   D I TG          Y A     QM      R   R   KV      E L     F   T   Y   M   R   F   DA      E   F   T   F   V   A   C   T   P   -   -   -   -   -   T   G   E   V   V   P   Q   L   L Q     Y   M F     V
                                                                      :   :     *         *                     .     * :     *       * .     .       *   *   *       *   *   *       *   :   *   :   :                                                                         *         * :     :

1BEV1   147   P   P   G   A   P   V   P   S   NQD         SF      QWQ         S   G   C   N   P   S   V   FAD         T   D   G   P   P   A   Q   F   S   V   P   FMS         S   A   N   A   Y   S TV        Y   D   G   Y   A   R   F   M   -   -   -   D   T   -   -   -   DP D R          Y   G
1EAH1   161   P   P   G   A   P   I   P   G   K WN        DY      TWQ         T   S   S   N   P   S   V   FY T        Y   G   A   P   P   A   R   I   S   V   P   Y V G       I   A   N   A   Y   S H F       Y   D   G   F   A   K   V   P   L   A   G   Q   A   S   T   E   GDS L           Y   G
1FPN1   147   P   P   G   A   P   V   P   N   S RD        DY      AWQ         S   G   T   N   A   S   V   FWQ         H   G   Q   A   Y   P   R   F   S   L   P   F L S       V   A   S   A   Y   Y MF        Y   D   G   Y   D   E   -   -   -   -   -   -   -   -   -   -   QDQN            Y   G
EV711   157   P   P   G   A   P   K   P   E   S R E       SL      AWQ         T   A   T   N   P   S   V   FV K        L   T   D   P   P   A   Q   V   S   V   P   F MS        P   A   S   A   Y   QW F        Y   D   G   Y   P   T   F   G   -   -   -   E   H   K   Q   E   K D L E         Y   G
              *   *   *   *   *       *       .   :       .        * *        :   .       *   .   *   *   *                       .       .   :   .   *   :   *   : : .           *   .   *   *       .       *   *   *   :                                                                   *   *

1BEV1   211   I   LP S N          F L G       FM      Y   F   R   T   L   E   D   -   -   -   A   A   H   Q   V   R   F   R   I   Y   A   K I     K   H   T   S   CW I        P   R   A   PR      Q   A   P   Y   K   K   R   Y   N   L   V   F   S   -   -   G   -   D   S   D   R   I   C   S   N
1EAH1   231   A   A S L N         D FG        S L     A   V   R   V   V   N   D   H   N   P   T   K   L   T   S   K   I   R   V   Y   M   KP      K   H   V   R   V WC        P   R   P   PR      A   V   P   Y   Y   G   P   -   G   V   D   Y   K   -   -   D   -   G   L   A   P   -   L   P   G
1FPN1   207   T   A N TN          N MG        S L     C   S   R   I   V   T   E   K   H   I   H   K   V   H   I   M   T   R   I   Y   H   KA      K   H   V   K   AWC         P   R   P   PR      A   L   E   Y   T   R   A   H   R   T   N   F   K   I   E   D   R   S   I   Q   T   A   I   V   T
EV711   224   A   CP N N          MMG         T F     S   V   R   T   V   G   S   S   -   K   S   K   Y   P   L   V   V   R   I   Y   M   RM      K   H   V   R   AW I        P   R   P   MR      N   Q   N   Y   L   F   K   A   N   P   N   Y   A   -   -   G   N   S   I   K   P   T   G   T   S
                        *           : *         :             *       :       .                                           *   :   *       :       *   *   .         *         *   *   .    *                  *                               :               .       .

1BEV1   275   R   A   S   L   T   S   Y   281
1EAH1   296   K   -   G   L   T   T   Y   301
1FPN1   277   R   P   I   I   T   T   A   283
EV711   291   R   T   A   I   T   T   -   296
              :           :   *   :
[Legend: Pocket-factor binding residues]
    5. Model building steps
   Build all 4 capsid proteins (VP1-VP4)
    together to ensure 3D fit
   Use 1BEV alone for VP2-VP4
   For VP1: use aligned 1BEV, 1EAH,
    1FPN
   Check model
    5. Round 1: VP1,VP2,VP3,VP4
   Clip hanging
    ends
   Re-position
    problem loops:
    adjust gaps in
    alignment
   Build again
       5. Round 2: Pentamer Check

   Loops look OK
   Build pentamer
   Publish….
   Oops: clash in
    pentamer
    assembly. Go
    back
       5. Close encounters of the 3rd Kind
   Build only VP3
    pentamer
   N-terminus of
    each VP3
    hydrogen-
    bonded
   Also, in BEV,
    Asp-Lys ion
    pair
   First 25 aa
    overlay very well
5. Fourth foray: build with 5 VP3s
                   Only first 50aa of the
                    other 4 VP3s
                    included
                   Model resulted in
                    knots due to
                    insufficient refinement
                    cycles
                   However, VP3
                    pentameric region OK
5. Fifth and final attempt
5. Canyon Pit and Antigenic Sites
                      Poliovirus
                        sites




                         Neurovirulent
                         Polio (mouse)
                         Cardiovirus
  5. Putative antigenic sites




VP1
                      VP2
                5. HFMDV Conclusions
   Unique surface loops identified for
      Immunodiagnostic assays

      Vaccine design

      Antibodies being generated

   Canyon pit: depth is similar to BEV
   Mapping the antigenic regions of other related
    enteroviruses on the HFMDV surface: specific VP1
    and VP2 sites buried
   Sunita Singh, Vincent T. K. Chow, C. L. Poh, M. C.
    Phoon: Dept. of Microbiology, NUS
   Applied Bioinformatics, Vol 1, issue 1, 43-52: invited
    research article

				