							                         BCB 444/544


                           Lecture 23

     Protein Tertiary Structure
              Prediction


                          #23_Oct15



BCB 444/544 F07 ISU Dobbs #23 - Protein Tertiary Structure Prediction   10/15/07   1
          Required Reading
             (before lecture)

 Mon Oct 15 - Lecture 23
       Protein Tertiary Structure Prediction
      • Chp 15 - pp 214 - 230
Wed Oct 17 & Thurs Oct 18 - Lecture 24 & Lab 8                                  (Terribilini)
       RNA Structure/Function & RNA Structure Prediction
      • Chp 16 - pp 231 - 242

Fri Oct 19 - Lecture 25

       Gene Prediction
      • Chp 8 - pp 97 - 112




           New Reading & Homework Assignment

ALL: HomeWork #4         (emailed & posted online Sat AM)
        Due: Mon Oct 22 by 5 PM (not Fri Oct 19)

Read:
 Ginalski et al.(2005) Practical Lessons from Protein Structure
 Prediction, Nucleic Acids Res. 33:1874-91.
 http://nar.oxfordjournals.org/cgi/content/full/33/6/1874
 (PDF posted on website)

• Although somewhat dated, this paper provides a nice overview of
  protein structure prediction methods and evaluation of predicted
  structures.

• Your assignment is to write a summary of this paper - for details
 see HW#4 posted online & sent by email on Sat Oct 13
          Seminars Last Week

Dr. Klaus Schulten (Univ of Illinois) - Baker Center Seminar
 The Computational Microscope
                            2:10 PM in E164 Lagomarcino
 http://www.bioinformatics.iastate.edu/seminars/abstracts/2007_2008/Klaus_Schulten_Seminar.pdf
  • Check out links on Schulten's website (videos, etc)
       • http://www.ks.uiuc.edu/~kschulte/


  • Great seminar - amazing simulations of dynamics in proteins and
    large macromolecular assemblies
  • Very computationally intensive - very impressive demonstration
    of power of computation to produce insights not attainable using
    only experimental approaches



        Seminars this Week

BCB List of URLs for Seminars related to Bioinformatics:
    http://www.bcb.iastate.edu/seminars/index.html


• Oct 18 Thur - BBMB Seminar 4:10 in 1414 MBB
  • Sachdev Sidhu (Genentech) Phage peptide and antibody
    libraries in protein engineering and ligand selection

• Oct 19 Fri - BCB Faculty Seminar 2:10 in 102 ScI
  • Lyric Bartholomay (Ent, ISU) TBA




           Protein Sequence & Structure: Analysis

• Diamond STING Millennium - Many useful structure analysis
 tools, including Protein Dossier
 http://trantor.bioc.columbia.edu/SMS/

• SwissProt (UniProt)
 Protein knowledgebase
 http://us.expasy.org/sprot

• InterPro
 Sequence analysis tools
 http://www.ebi.ac.uk/interpro




        Chp 14 - Secondary Structure Prediction


SECTION V           STRUCTURAL BIOINFORMATICS

Xiong: Chp 14
      Protein Secondary Structure Prediction

  • √Secondary Structure Prediction for Globular Proteins
  • √Secondary Structure Prediction for Transmembrane Proteins
  • √Coiled-Coil Prediction




Where to Find "Actual" Secondary Structure?
          In the PDB




          How Does Predicted Secondary Structure
             Compare with Actual? (An example)


 Actual - Calculated from PDB coordinates by DSSP or author:
  DSSP
 Author




Query     MAATAAEAVASGSGEPREEAGALGPAWDESQLRSYSFPTRPIPRLSQSDPRAEELI
GOR V     CCCCHHHHHHHHCCHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHH
FDM       CCCCCCCCCCCCCCCCCEECCCCCCCCCHHHCCCCCCEECCCCCCCCCCHHHHH
CDM       CCCCHHHHHHCCCCCCCEECCCCCCCCCHHHCCCCCCEECCCCCCCCCCHHHH

Predicted - Using 3 methods (from CDM server, Jernigan Group, ISU)




     Chp 15 - Tertiary Structure Prediction


SECTION V            STRUCTURAL BIOINFORMATICS

Xiong: Chp 15
      Protein Tertiary Structure Prediction

    •   Methods
    •   Homology Modeling
    •   Threading and Fold Recognition
    •   Ab Initio Protein Structural Prediction
    •   CASP


         Structural Genomics - Status & Goal

    ~ 20,000 "traditional" genes in human genome
          (recall, this is fewer than earlier estimate of 30,000)
    ~ 2,000 proteins in a typical cell
    > 4.9 million sequences in UniProt (Oct 2007)
    > 46,000 protein structures in the PDB (Oct 2007)

     Experimental determination of protein structure lags
      far behind sequence determination!
Goal: Determine structures of "all" protein folds in
    nature, using combination of experimental structure
    determination methods (X-ray crystallography, NMR,
    mass spectrometry) & structure prediction


     Structural Genomics Project


TargetDB: Database of Structural Genomics Targets
                http://targetdb.pdb.org




   Database of Theoretical Structures?

PMDB: Protein Model Database
          http://mi.caspur.it/PMDB/help.php

also, via NAR's Molecular Biology Database Collection
http://www.oxfordjournals.org/nar/database/summary/855


Theoretical structural models (predicted) are no longer accepted
by the PDB (since 10/15/06); but, it is possible to search for
models deposited earlier:
http://www.rcsb.org/pdb/search/searchModels.do




        Protein Structure Prediction
             or Protein Folding Problem

"Major unsolved problem in molecular biology"

In cells:         spontaneous
                  assisted by enzymes
                  assisted by chaperones

In vitro:         many proteins can fold to their "native"
                  states spontaneously & without assistance
                     but, many do not!



Deciphering the Protein Folding Code


                               • Protein Structure Prediction or
                               • Protein Folding Problem
                                     Given the amino acid sequence of
                                     a protein, predict its
                                     3-dimensional structure (fold)
                               • Inverse Folding Problem
                                     Given a protein fold, identify
                                     every amino acid sequence that
                                     can adopt its 3-dimensional
                                     structure



         Protein Structure Prediction

Structure is largely determined by sequence
            BUT:
  •   Similar sequences can assume different structures
  •   Dissimilar sequences can assume similar structures
  •   Many proteins are multi-functional

  2 Major Protein Folding Problems:
     1- Determine folding pathway
     2- Predict tertiary structure from sequence
            Both still largely unsolved problems



           Steps in Protein Folding

1- "Collapse"- driving force is burial of
   hydrophobic aa’s       (fast - msecs)
2- Molten globule - helices & sheets
   form, but "loose" (slow - secs)
3- "Final" native folded state -
   compaction & rearrangement of
   2° structures




Native state?
- assumed to be lowest free energy
- may be an ensemble of structures


      Protein Dynamics

• Protein in native state is NOT static
• Function of many proteins requires conformational
      changes, sometimes large, sometimes small
• Globular proteins are inherently "unstable"
       (NOT evolved for maximum stability)
• Energy difference between native and denatured
      state is very small (5-15 kcal/mol)
       (this is equivalent to ~ 2 H-bonds!)
• Folding involves changes in both entropy & enthalpy



   Difficulty of Tertiary Structure Prediction

Folding or tertiary structure prediction problem can
be formulated as a search for minimum energy
conformation

• Search space is defined by psi/phi angles of
  backbone and side-chain rotamers
• Search space is enormous even for small proteins!
• Number of local minima increases exponentially
  with number of residues

 Computationally it is an exceedingly difficult problem!
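
A rough sense of how fast the search space grows can be had from a back-of-the-envelope count. The sketch below assumes each residue independently adopts one of a small number of discrete backbone states (3 per residue) and that conformations could be sampled at 10^13 per second; both numbers are illustrative assumptions, not values from the lecture.

```python
def conformation_count(n_residues, states_per_residue=3):
    """Number of conformations if every residue independently takes one of k states."""
    return states_per_residue ** n_residues


def exhaustive_search_years(n_residues, states_per_residue=3, rate_per_second=1e13):
    """Years needed to visit every conformation once at the assumed sampling rate."""
    seconds = conformation_count(n_residues, states_per_residue) / rate_per_second
    return seconds / (365.25 * 24 * 3600)


for n in (10, 50, 100):
    print(n, f"{conformation_count(n):.3e}", f"{exhaustive_search_years(n):.3e}")
```

Even with these generous assumptions, exhaustive enumeration is hopeless for a 100-residue protein, which is why practical methods rely on templates or heuristic sampling.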




       Tertiary Structure Prediction Methods
2 (or 3) Major Methods:
1. Comparative Modeling:
     • Homology Modeling (easiest!)
     • Threading and Fold Recognition (harder)
2. Ab Initio Protein Structural Prediction (really hard)




    Comparative Modeling?


Comparative modeling - term is sometimes used
 interchangeably with homology modeling, but also
 sometimes used to mean both:
         • homology modeling
         • threading/fold recognition




           Ab Initio Prediction

1. Develop energy function
       • bond energy
       • bond angle energy
       • dihedral angle energy
       • van der Waals energy
       • electrostatic energy
2. Calculate structure by minimizing energy function
       • usually Molecular Dynamics (MD) or Monte Carlo (MC)
   Ab initio prediction - impractical for most real (long) proteins
    •   Computationally? very expensive
    •   Accuracy? Usually poor for all except short peptides
                    (but much improvement recently!)

    Provides both folding pathway & folded structure
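
The "minimize an energy function by Monte Carlo" step can be illustrated with a toy example. The sketch below is not a real force field: it assumes a hypothetical energy with only a 3-fold dihedral term per angle, and uses a basic Metropolis acceptance rule with arbitrary step size and temperature.

```python
import math
import random


def toy_energy(angles):
    """Toy dihedral-only energy (arbitrary units): one 3-fold torsional term per angle."""
    return sum(1.0 + math.cos(3.0 * a) for a in angles)


def metropolis_minimize(energy, angles, steps=5000, step_size=0.3, kT=0.1, seed=0):
    """Metropolis Monte Carlo; returns the lowest-energy state visited."""
    rng = random.Random(seed)
    current = list(angles)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for _ in range(steps):
        trial = list(current)
        i = rng.randrange(len(trial))              # perturb one randomly chosen angle
        trial[i] += rng.uniform(-step_size, step_size)
        e_trial = energy(trial)
        delta = e_trial - e_current
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


start = [0.0] * 5                                  # every angle starts at an energy maximum
best_angles, best_e = metropolis_minimize(toy_energy, start)
print(toy_energy(start), best_e)
```

A real ab initio calculation would sum all five energy terms listed above over every atom, which is where the computational expense comes from.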
         Comparative Modeling


Two types:
   1) Homology modeling
   2) Threading (fold recognition)

Both rely on availability of experimentally determined
   structures that are "homologous" or at least
   structurally very similar to target



                  Provide folded structure only


          Homology Modeling

1.   Identify homologous protein sequences (PSI-BLAST)
2.   Among available structures (in PDB), choose one with closest
     sequence to target as template
               (can combine steps 1 & 2 by using PDB-BLAST)
3.   Build model by placing target sequence residues in
     corresponding positions on homologous structure & refine by
     "tweaking" modeled structure (energy minimization)

    Homology modeling - works "well"
     •    Computationally? "relatively" inexpensive
     •    Accuracy? higher sequence identity → better model

           Requires ~30% sequence identity with sequence for
               which structure is known
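
Template selection hinges on sequence identity, so a small helper for computing it from a pairwise alignment may be useful. This is a minimal sketch: the aligned strings are made up, and the choice to count identity over gap-free columns only is an assumption (other denominators are also in common use). The 30% cutoff is the rule of thumb quoted above.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def usable_as_template(identity, threshold=30.0):
    """Apply the ~30% sequence identity rule of thumb."""
    return identity >= threshold


# Hypothetical target/template alignment ('-' marks a gap).
target = "MKT-AYIAKQR"
template = "MKTLAY-ARQR"
identity = percent_identity(target, template)
print(round(identity, 1), usable_as_template(identity))
```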


          Threading - Fold Recognition
         Identify “best” fit between target sequence & template structure

1.   Develop energy function
2.   Develop template library
3.   Align target sequence with each template in library & score
4.   Identify top scoring template (1D to 3D alignment)
5.   Refine structure as in homology modeling
    Threading - works "sometimes"
     •   Computationally? Can be expensive or cheap, depends on
         energy function & whether "all atom" or "backbone only"
         threading is used
     •   Accuracy? in theory, should not depend on sequence identity
         (should depend on quality of template library & "luck")
        Usually, higher sequence identity to protein of known
         structure → better model


        Threading: the Motivation

• Basic premise:
        The number of unique structural folds in nature is
        fairly small (probably 2000-3000)
• Statistics from Protein Data Bank (>46,000 structures)

        Prior to Structural Genomics Project, 90% of "new"
        structures submitted to PDB were similar to existing
        folds in PDB - suggesting that almost all folds in
        nature have been identified

• Thus, chances for a protein to have a native-like structural fold
  in PDB are quite good

  • Note: Proteins with similar structural folds could be either
             homologs or analogs

           Steps in Threading

 Target sequence:      ALKKGF…HFDTSE

 Structure templates:  [figure]




      1. Align target sequence with template structures
            in fold library (usually from the PDB)
      2. Calculate energy score to evaluate "goodness of fit"
         between target sequence & template structure
      3. Rank models based on energy scores
          Threading Goal - & Issues

Find “correct” sequence-structure alignment of a target sequence with
its native-like fold in template library (usually derived from PDB)

    • Structure database - must be "complete"
         • Can't build a good model if there is no good template in library!

    • Sequence-structure alignment algorithm:
         • Bad alignment → Bad score!
    • Energy function or Scoring Scheme:
         • Must distinguish correct sequence-fold alignment from
            incorrect sequence-fold alignments
         • Must distinguish “correct” fold from close decoys
    • Prediction reliability assessment - How to determine whether
             predicted structure is correct? (or even close?)


     Threading: Template database

• Build a database of structural templates
      e.g., ASTRAL domain library derived from the PDB




Sometimes, supplement with additional decoys
 e.g., generated using ab initio approach such as Rosetta (Baker)




      Threading: Energy function

• Two main methods (& combinations of these)

    • Structural profile (environmental)
          physicochemical properties of amino acids


    • Contact potential (statistical)
        based on contact statistics from PDB
                 famous one: Miyazawa & Jernigan (ISU)




         Protein Threading: Typical energy function


 Typical energy function terms:
   • What is "probability" that two specific residues are in contact?
   • How well does a specific residue fit structural environment?
   • Alignment gap penalty?

                           Total energy: Ep + Es + Eg

           Goal: Find a sequence-structure alignment that
                      minimizes energy function



      A Local Example:    Rapid Threading Approach for
                 Protein Structure Prediction

                                                  Kai-Ming Ho, Physics
                                                       Haibo Cao
                                                       Yungok Ihm
                                                       Zhong Gao
                                                       James Morris
                                                       Cai-zhuang Wang

                                                  Drena Dobbs, GDCB
                                                       Jae-Hyung Lee
                                                       Michael Terribilini
                                                       Jeff Sander

Cao H, Ihm Y, Wang CZ, Morris JR, Su M, Dobbs D, Ho KM (2004)
Three-dimensional threading approach to protein structure recognition
Polymer 45:687-697
         Motivations for & Assumptions of
              Ho Threading Algorithm
Goal: Develop a threading algorithm that:
     • Is simple & rapid enough to be used in high throughput
       applications
     • Is relatively "insensitive" to sequence similarity between
       target protein sequence & sequence of template structure
          (to enhance detection of remote homologs & structures that are
          similar due to convergent evolution)
     • Can be used to answer questions such as:
           What are predicted structures of all "unassigned" ORFs in
           Arabidopsis?
           Does Arabidopsis have a protein with structure similar to
           mammalian Tumor Necrosis Factor (TNF)?
Assumptions:
     • Native state of a protein is lowest free energy state
     • Hydrophobic interactions drive protein folding

             Simplify: Template structure representation

              Template structure        C ($N \times N$ contact matrix; rows and columns indexed $1, \dots, i, \dots, j, \dots, N$)

               $C_{ij} = 1$, if $r_{ij} < 6.5$ Å     (contact)
               $C_{ij} = 0$, otherwise               (non-contact, or a neighbor in sequence)
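
The contact-matrix definition translates directly into code. The sketch below is illustrative only: the coordinates are invented, the slide does not say which atom defines the inter-residue distance r_ij, and excluding immediate sequence neighbors via `min_separation=2` is one reading of the "neighbor in sequence" note.

```python
import math


def contact_matrix(coords, cutoff=6.5, min_separation=2):
    """Return an N x N list of lists with 1 for residue pairs closer than `cutoff` (in Å).

    Pairs with |i - j| < min_separation (a residue with itself or with its
    immediate sequence neighbor) are left at 0.
    """
    n = len(coords)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                c[i][j] = c[j][i] = 1          # the matrix is symmetric
    return c


# Hypothetical coordinates (Å) for a 4-residue chain bent into a square.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
C = contact_matrix(coords)
for row in C:
    print(row)
```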
Yungok Ihm
      Simplify: Target Sequence Representation

• Miyazawa-Jernigan (MJ) model: inter-residue contact
  energy M(i,j) is a quasi-chemical approximation based on pairwise
  contact statistics extracted from known protein structures
  in the PDB: symmetric 20 × 20 matrix = 210 unique values ("letters")

• Li-Tang-Wingreen (LTW): factorize the MJ interaction
  matrix to reduce the number of parameters associated with
  amino acids from 210 to 20 q values

• Hydrophobic-Polar (HP): represent amino acids as either
  hydrophobic (H) or polar (P); Dill et al demonstrated the utility of
  this simple binary alphabet representation: 2 values

   Compare results with 210 vs 20 vs 2 letter representations
     How low can we go?

                Simplify: Energy Function

   • Interaction “counts” only if two hydrophobic amino acid
     residues are in contact
   • At residue level, pair-wise hydrophobic interaction is
     dominant:
                                    $E = \sum_{i,j} C_{ij} U_{ij}$

                                    $C_{ij}$ : contact matrix
                                    $U_{ij} = U(\text{residue } i, \text{residue } j)$

                                             MJ:                $U = U_{ij}$
                                             LTW:               $U = Q_i Q_j$
                                             HP:                $U = \{1, 0\}$
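
As a rough illustration of the sum over contacts, the sketch below evaluates the energy for the HP and LTW choices of U. Several things here are assumptions: the set of residues treated as hydrophobic, the tiny 4-residue contact matrix, counting each pair once rather than twice, and the sign convention (the raw sum is returned; whether a larger or a negated value counts as "better" is not fixed here). The four Q values are the ones quoted for A, V, F and E on the summary slide.

```python
HYDROPHOBIC = set("AVLIMFWC")        # assumed HP classification, for illustration only

Q_LTW = {"A": 0.7997, "V": 0.9897, "F": 1.1197, "E": 0.6497}   # values quoted in the slides


def u_hp(a, b):
    """HP model: a contact counts only if both residues are hydrophobic."""
    return 1 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0


def u_ltw(a, b):
    """LTW model: pair term is the product of the two single-residue Q values."""
    return Q_LTW[a] * Q_LTW[b]


def contact_energy(sequence, contacts, pair_term):
    """Sum C_ij * U_ij over residue pairs i < j (each pair counted once)."""
    n = len(sequence)
    return sum(contacts[i][j] * pair_term(sequence[i], sequence[j])
               for i in range(n) for j in range(i + 1, n))


# Hypothetical 4-residue template with contacts (0,2), (0,3) and (1,3).
contacts = [[0, 0, 1, 1],
            [0, 0, 0, 1],
            [1, 0, 0, 0],
            [1, 1, 0, 0]]
sequence = "AVFE"
e_hp = contact_energy(sequence, contacts, u_hp)
e_ltw = contact_energy(sequence, contacts, u_ltw)
print(e_hp, e_ltw)
```

Swapping `pair_term` is all it takes to compare alphabets, which mirrors the 210 vs 20 vs 2 "letter" comparison in the slides; a full MJ version would simply look up U in the 20 × 20 table.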


Yungok Ihm
             Energy calculation: Contact energy

Miyazawa-Jernigan (MJ) matrix M: statistical potential, 210 parameters
(excerpt, lower triangle)

              C       M       F       I       L
         C
         M   0.46
         F   0.54   -0.20
         I   0.49   -0.01    0.06
         L   0.57    0.01    0.03   -0.08
         V   0.52    0.18    0.10   -0.01   -0.04
         W



                                  ~
Li-Tang-Wingreen (LTW):           Mij  C 2{( qi   )( qj   )   }
      20 parameters
                                                            qi      ~ solubility
                                                            Qi      ~ hydrophobicity
                                                            C       contact matrix

                   Ec   (QiCijQj   Cij )
                           N

Contact Energy:                                                 Qi   qi  
                          ij 1                      with
                                                                  0.6797 ,   0.2604
                                              
Yungok Ihm
             Summary of Ho Threading Procedure
     Template Structure        Contact Matrix ($N \times N$)
                               $C_{ij} = 1$, if $r_{ij} < 6.5$ Å
                               $C_{ij} = 0$, otherwise (non-contact, or a neighbor in sequence)




     Sequence                  Sequence Vector

        AVFMRIHNDIVYNDIANTTQ   $S = (Q_A, Q_V, Q_F, \dots, Q_E) = (0.7997, 0.9897, 1.1197, \dots, 0.6497)$


                               Contact Energy
     Scoring Function
                                Ec   QiCijQj  
                                        N


Yungok Ihm                             ij 1
 Can complexity be further reduced?
            Consider simplifying structure representation, too

                          ALKKGF…HFDTSE




        Sequence – Structure            (1D – 3D problem)


        Sequence – Contact Matrix       (1D – 2D problem)


        Sequence – 1D Profile           (1D – 1D problem)

Haibo Cao
              Examine eigenvectors of contact matrix


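  Hydrophobic contacts:

  $\tilde{T} C T = \tilde{T} \left( \sum_{i=1}^{N} \lambda_i V_i \tilde{V}_i \right) T = \sum_{i=1}^{N} \lambda_i (\tilde{V}_i T)^2 \approx \lambda_1 (\tilde{V}_1 T)^2$

  $r_i = \dfrac{\lambda_i (\tilde{V}_i T)^2}{\sum_{k=1}^{N} \lambda_k (\tilde{V}_k T)^2}$

 $C$ : contact matrix
 $\lambda_i$ : i-th eigenvalue of $C$
 $V_i$ : i-th eigenvector
 $V_1$ : eigenvector with largest eigenvalue
 $T$ : protein sequence of the template structure
 $r_i$ : fraction of hydrophobic contacts from i-th eigenvector
 (a tilde denotes the transpose)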
Haibo Cao
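
The idea of summarizing a contact matrix by its dominant eigenvector can be tried on a toy case. The sketch below uses plain power iteration in pure Python; the 4 × 4 contact matrix and the 0/1 hydrophobicity vector T are hypothetical, and the shift by the identity matrix is an implementation choice made here so that the iteration converges even when the matrix has an eigenvalue of equal magnitude and opposite sign.

```python
import math


def mat_vec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dominant_eigenpair(c, iterations=500):
    """Power iteration on (C + I); returns (largest eigenvalue of C, unit eigenvector)."""
    n = len(c)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [cw + vi for cw, vi in zip(mat_vec(c, v), v)]   # apply (C + I) to v
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    lam = dot(v, mat_vec(c, v))                              # Rayleigh quotient for C itself
    return lam, v


# Hypothetical contact matrix with contacts (0,2), (0,3) and (1,3).
C = [[0, 0, 1, 1],
     [0, 0, 0, 1],
     [1, 0, 0, 0],
     [1, 1, 0, 0]]
T = [1, 1, 1, 0]                      # assumed 0/1 hydrophobicity vector of the template sequence

lam1, v1 = dominant_eigenpair(C)
total = dot(T, mat_vec(C, T))         # T~ C T: contacts between hydrophobic residues (each pair counted twice)
r1 = lam1 * dot(v1, T) ** 2 / total   # share attributed to the dominant eigenvector
print(lam1, v1, r1)
```

Because a contact matrix has a zero diagonal, its eigenvalues sum to zero, so some terms in the expansion are negative and r_1 can come out above 1 for small examples like this one.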
    Represent contact matrix by its dominant
    eigenvector (1D profile)




   • First eigenvector (with highest eigenvalue) dominates the overlap
     between sequence and structure
   • Higher ranking (rank > 4) eigenvectors are “sequence blind”


Haibo Cao
               Threading Alignment Step - now fast!
                    Align target sequence vector (1D) with
                    eigenvector profile of template structure (1D)

                             1D Profile: $P = V_1$




                 Maximize the overlap between the
                  sequence (S) and the profile (P):
                   $S \cdot P$, allowing gaps
                                                                           New profile P  CP
                         Calculate contact energy
                          using the alignment:           Ec
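
A 1D-to-1D alignment of this kind can be done with ordinary dynamic programming. The sketch below is a generic global alignment that maximizes the summed products of aligned sequence and profile entries; it is not the published implementation, and the single uniform gap penalty and the example vectors are assumptions (the lecture describes position-dependent gap and size penalties).

```python
def align_overlap(seq_vec, profile, gap_penalty=0.5):
    """Global alignment maximizing sum(S_i * P_j) over aligned pairs minus gap penalties.

    Returns (best score, list of aligned (sequence index, profile index) pairs).
    """
    n, m = len(seq_vec), len(profile)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -gap_penalty * i
    for j in range(1, m + 1):
        score[0][j] = -gap_penalty * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + seq_vec[i - 1] * profile[j - 1],  # align i with j
                score[i - 1][j] - gap_penalty,                           # skip a sequence position
                score[i][j - 1] - gap_penalty,                           # skip a profile position
            )
    # Trace back from the corner to recover which positions were paired.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + seq_vec[i - 1] * profile[j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] - gap_penalty:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Hypothetical sequence vector (per-residue Q values) and template profile.
S = [1.0, 0.2, 1.0]
P = [1.0, 1.0]
best, pairs = align_overlap(S, P, gap_penalty=0.1)
print(best, pairs)
```

Working in one dimension keeps the cost at O(N × M) per template, which is what makes screening a whole fold library fast.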

Cao et al., Polymer 45 (2004)
               Parameters for alignment?

 • Gap penalty:
    Insertion/deletion in helices or
    strands is strongly penalized; smaller
    penalties for in/dels in loops
                                                                ALKKGFG…HFDTSE

 Gap penalties apply to alignment score
   only, not to energy calculation

                                                                                     Loop
 • Size     penalty:
     If a target residue and aligned
    template residue differ in radius by
    > 0.5Å and if residue is involved in
    > 2 contacts, alignment is penalized                                           Helix
 Size penalties apply to alignment score
   only, not to energy calculation



Yungok Ihm
                How to incorporate secondary structure?
       • Predict secondary structure of target sequence
                  (PSIPRED, PROF, JPRED, SAM, GOR V)

            N+ = total number of matches between predicted
                  & actual secondary structure of template
            N- = total number of mismatches
            Ns = total number of residues selected in alignment

           “Global fitness” :               f = 1 + (N+ - N-) / Ns


                             Emod = f * Ethreading
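
The fitness factor is simple enough to compute by hand, but a short sketch makes the bookkeeping explicit. It assumes that the two secondary-structure strings cover exactly the aligned (selected) positions and that every position is either a match or a mismatch; the example strings and energy value are invented.

```python
def global_fitness(predicted_ss, template_ss):
    """f = 1 + (N+ - N-) / Ns for two equal-length secondary-structure strings."""
    if len(predicted_ss) != len(template_ss) or not predicted_ss:
        raise ValueError("strings must be non-empty and of equal length")
    n_s = len(predicted_ss)
    n_plus = sum(1 for p, t in zip(predicted_ss, template_ss) if p == t)
    n_minus = n_s - n_plus
    return 1 + (n_plus - n_minus) / n_s


def modified_energy(e_threading, predicted_ss, template_ss):
    """Emod = f * Ethreading."""
    return global_fitness(predicted_ss, template_ss) * e_threading


# Hypothetical predicted (target) vs actual (template) states over 8 aligned positions.
f = global_fitness("HHHHCCEE", "HHHCCCEE")
print(f, modified_energy(-10.0, "HHHHCCEE", "HHHCCCEE"))
```

Under these assumptions f runs from 0 (no positions agree) to 2 (all agree), so a threading score is scaled up when predicted and actual secondary structure are consistent.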



Yungok Ihm
    How much better is this “fit” than random?


     Eshuffled : Shuffled Sequence vs Structure

             Erelative = Emod - Eshuffled

     Emod      : E score modified to reflect fit with predicted 2° structure
     Eshuffled : Avg E score for same sequence shuffled (randomized) many times




Yungok Ihm
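
The shuffled-sequence baseline can be sketched as follows. Here `energy_fn` stands in for whatever scoring procedure is being evaluated (in the lecture, the modified threading energy against a fixed template); the toy scoring function, the number of shuffles and the fixed random seed are all assumptions made for the example.

```python
import random


def relative_energy(sequence, energy_fn, n_shuffles=100, seed=0):
    """Score of the real sequence minus the average score of shuffled copies."""
    rng = random.Random(seed)
    e_real = energy_fn(sequence)
    residues = list(sequence)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(residues)                  # same composition, randomized order
        total += energy_fn("".join(residues))
    return e_real - total / n_shuffles


def toy_energy(seq):
    """Hypothetical stand-in score: -1 for every adjacent pair of hydrophobic residues."""
    return -float(sum(1 for a, b in zip(seq, seq[1:]) if a in "AVLIMF" and b in "AVLIMF"))


print(relative_energy("AAAAEEEE", toy_energy))
```

Shuffling preserves amino acid composition, so the difference isolates how much of the score depends on residue order rather than on composition alone.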
   Performance Evaluation? "Blind Test"


CASP5 Competition (CASP7 is most recent)
(Critical Assessment of Protein Structure Prediction)


        Given: Amino acid sequence
        Goal: Predict 3-D structure
                   (before experimental results published)




            Typical Results:               (well, actually, our BEST Results):
                  HO = #1-Ranked CASP5 Prediction for this Target

                                                               Predicted Structure
         • Target 174
         • PDB ID = 1MG7



                    T174_1
                                                                  Actual Structure



                     T174_2




Cao, Ihm, Wang, Dobbs, Ho
              Overall Performance in CASP5 Contest
                          ~8th out of 180 (M. Levitt, Stanford)
FR: Fold Recognition (targets manually assessed by Nick Grishin)
---------------------------------------------------------------------------
Rank   Z-Score   Ngood   Npred   NgNW   NpNW   Group-name
  1     24.26     9.00   12.00     9     12    Ginalski
  2     21.64     7.00   12.00     7     12    Skolnick Kolinski
  3     19.55     8.00   12.50     9     14    Baker
  4     16.88     6.00   10.00     6     10    BIOINFO.PL
  5     15.25     7.00    7.00     7      7    Shortle
  6     14.56     6.50   11.50     7     13    BAKER-ROBETTA
  7     13.49     4.00   11.00     4     11    Brooks
  8     11.34     3.00    6.00     3      6    Ho-Kai-Ming
  9     10.45     3.00    5.50     3      6    Jones-NewFold
---------------------------------------------------------------------------

NgNW - number of good predictions without weighting for multiple models
NpNW - number of total predictions without weighting for multiple models




         CASP - Check it out!

Critical Assessment of Protein Structure Prediction
 http://predictioncenter.gc.ucdavis.edu/

  • CASP7 contest - 2006:
    • http://www.predictioncenter.org/casp7/Casp7.html
    • Provides assessment of automated servers for protein
             structure prediction (LiveBench, CAFASP, EVA)
                 & URLs for them
  • Related contests & resources:
        • Protein Function Prediction (part of CASP)
       • CAPRI = Critical Assessment of Predicted Interactions
    • New: CASPM = CASP for M = Mutant proteins
       • Predict effects of small (point) mutations, e.g., SNPs

        Another Convenient List of Links for
             Protein Prediction Servers

http://en.wikipedia.org/wiki/List_of_protein_structure_prediction_software




     Chp 13 - Protein Structure Visualization,
          Comparison & Classification

SECTION V            STRUCTURAL BIOINFORMATICS

Xiong: Chp 13
      Protein Structure Visualization, Comparison &
      Classification

        • Protein Structural Visualization
        • Protein Structure Comparison
        • Protein Structure Classification




        Protein Structure Comparison Methods
3 Basic Approaches for Aligning Structures
                     (see Xiong textbook for details)
       1. Intermolecular
       2. Intramolecular
       3. Combined
   But, very active research area - many recent new methods

3 Popular Methods:
   •     DALI = Distance Matrix Alignment of Structures (Holm)
          • FSSP Database
   •     SSAP = Sequential Structure Alignment Program (Orengo)
          • CATH Database
   •     CE = Combinatorial Extension (Bourne)
          • VAST at NCBI
URLS:       http://en.wikipedia.org/wiki/Structural_alignment_software


Another local example: Combining Structure Prediction,
      Machine Learning & "Real" (wet-lab) Experiments to
      Investigate the Lentiviral Rev Protein:
                  A Step Toward New HIV Therapies

                                         Susan Carpenter
                                                (Washington State Univ)
                                                 Wendy Sparks
                                                 Yvonne Wannemuehler
                                         Drena Dobbs, GDCB
                                                 Jae-Hyung Lee
                                                 Michael Terribilini
                                         Kai-Ming Ho, Physics
                                                 Yungok Ihm
                                                 Haibo Cao
                                                 Cai-zhuang Wang
                                         Gloria Culver, BBMB
                                                 Laura Dutca


                Macromolecular interactions mediated by
                  Rev protein in lentiviruses (HIV & EIAV)

[Figure: schematic of Rev function in the lentiviral life cycle.
 Labels: Provirus, pre-mRNA, Spliceosome, Tat, Rev, poly(A) RNAs (AAAA).
 Early: Regulatory Proteins. Late: Structural Proteins, Progeny RNA.
 Rev interactions shown:
   • NUCLEAR IMPORT (protein-protein)
   • RNA BINDING (protein-RNA)
   • MULTIMERIZATION (protein-protein)
   • NUCLEAR EXPORT (protein-protein)
 Credit: Susan Carpenter]
    Rev is essential for lentiviral replication

• Rev is a small nucleocytoplasmic shuttling protein
       (HIV Rev 116 aa; EIAV Rev 165 aa)
• Recognizes a specific binding site on viral RNA:
       Rev Responsive Element (RRE)
• Interacts with CRM1 to export incompletely spliced
  viral RNAs from the nucleus to the cytoplasm
• Specific domains of Rev mediate nuclear localization,
  RNA binding, and nuclear export
• Critical role of Rev in lentiviral replication makes it
  an attractive target for antiviral (AIDS) therapy


   Problem: no high resolution Rev structure!
     not even for HIV Rev, despite intense effort ($$)


• Why??
  • Rev aggregates at concentrations needed for NMR or
    X-ray crystallography
• What about insights from sequence comparisons?
  • "undetectable" sequence similarity among Revs from
    different lentiviruses (e.g., EIAV vs HIV <10%)
• But:
  • We know that lentiviral Rev proteins are functionally
    "homologous" - even in highly diverse lentiviruses
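
A figure such as "<10%" comes from counting matching positions in a pairwise alignment. The sketch below shows one common way to compute percent identity, assuming the two sequences are already aligned and that gapped columns are excluded from the denominator (other conventions exist). The toy sequences are hypothetical, not real Rev sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Hypothetical toy alignment (not real Rev sequences).
seq1 = "MKT-AYIAKQR"
seq2 = "MRTLAY-SKHR"
print(f"{percent_identity(seq1, seq2):.1f}% identity")
```

At very low identity, a score like this cannot distinguish true homologs from unrelated proteins, which is why the lecture turns to structure-based comparison.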




      Hypothesis: Rev proteins from diverse lentiviruses
      share structural features critical for function


Approach:
• Computationally model structures of lentiviral Rev proteins
   - using a structural threading algorithm (with Ho et al)
• Predict critical residues for RNA binding and protein interaction
   - using machine learning algorithms (with Honavar et al)
• Test model and predictions
   - using genetic/biochemical approaches (with Carpenter & Culver)
   - using biophysical approaches (with Andreotti & Yu groups)


       Initially: focus on EIAV Rev & RRE


               Functional domains: EIAV vs HIV Rev

[Figure: linear domain maps.
 EIAV Rev (exon 1 and exon 2 labeled; positions 1, 31, and 165 marked):
   NES, RBM, Folding?, NLS; motifs RRDRW, ERLE, KRRRK.
 HIV-1 Rev (positions 1 and 116 marked):
   NLS/RBM (RQARRNRRRRWR), NES.]

 NES - Nuclear Export Signal
 NLS - Nuclear Localization Signal
 RBM - putative RNA Binding Motif
                      Predicted EIAV Rev Structure

[Figure: predicted 3-D structure of EIAV Rev. Credit: Yungok Ihm]
                 Comparison of Predicted Rev Structures

[Figure: predicted structures in five panels: EIAV, FIV, HIV, SIV Dimer, HIV Dimer. Credit: Yungok Ihm]
              Predicted vs Experimental Structure of
                   N-terminal region of HIV Rev

[Figure: three panels.
   A. Predicted structure, HIV Rev N-terminus
   B. NMR structure, HIV Rev N-terminal peptide (Battiste & Williamson)
   C. Overlay: alignment of predicted & NMR structures
 Credit: Yungok Ihm]
               Location of functional residues in EIAV Rev?

[Figure: predicted EIAV Rev structure with annotated residues. Credit: Yungok Ihm]
   • NES, Leu36, 45, 49: on surface, consistent with role in nuclear export
   • Leu95 & Leu109: buried in core, critical hydrophobic contacts for fold?
   • Putative RBM
            Mutate hydrophobic residues predicted to be
            critical for helical packing in core
          L65           vs          L95 & L109

       Single mutants: Leu to Ala
                       Leu to Asp
       Double mutants: Leu to Ala

    Mutation                Rationale                               Predicted effect
    Single Ala (L -> A)                                             Negligible effect on Rev activity
    Single Asp (L -> D)     Insert charged aa in hydrophobic core   Dramatic change in Rev activity?
    Double Ala (LL -> AA)                                           Reduction in Rev activity?

[Figure: predicted structure with L65, L95, and L109 labeled. Credit: Yungok Ihm]
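
The mutant design above can be sketched in code. This minimal example applies point mutations written in the usual "L65A" notation to the EIAV Rev fragment (residues 61-100) that appears on the sequence slide in this lecture. The helper name `apply_mutations` is hypothetical. L109 lies outside the fragment, so only L65 and L95 are shown.

```python
import re

# EIAV Rev residues 61-100, as printed on the sequence slide in this lecture.
FRAGMENT = "ARRHLGPGPTQHTPSRRDRWIREQILQAEVLQERLEWRIR"
FRAGMENT_START = 61  # residue number of the first character

def apply_mutations(seq, start, mutations):
    """Apply point mutations such as 'L65A' to a numbered sequence fragment."""
    residues = list(seq)
    for mut in mutations:
        m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mut)
        if m is None:
            raise ValueError(f"cannot parse mutation {mut!r}")
        wild, pos, new = m.group(1), int(m.group(2)), m.group(3)
        idx = pos - start
        if not 0 <= idx < len(residues):
            raise ValueError(f"{mut}: position outside fragment")
        # Guard against numbering mistakes: the wild-type residue must match.
        if residues[idx] != wild:
            raise ValueError(f"{mut}: expected {wild}, found {residues[idx]}")
        residues[idx] = new
    return "".join(residues)

single_ala = apply_mutations(FRAGMENT, FRAGMENT_START, ["L65A"])
single_asp = apply_mutations(FRAGMENT, FRAGMENT_START, ["L95D"])
double_ala = apply_mutations(FRAGMENT, FRAGMENT_START, ["L65A", "L95A"])

for name, s in [("WT", FRAGMENT), ("L65A", single_ala),
                ("L95D", single_asp), ("L65A/L95A", double_ala)]:
    print(f"{name:10s} {s}")
```

Checking the wild-type residue before substituting is a cheap way to catch off-by-one errors in residue numbering.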
          Functional Analysis of Rev Structural
               Mutants in vivo (CAT assay)

[Figure: bar chart, "Activity of Rev Structural Mutants" (axis ticks 50, 100, 150).
   Controls: Sham, pcDNA3, RI
   Single Mutations: L65 (A, D), L95 (A, D), L109 (A, D)
   Double Mutations: L65A/L95A, L65A/L109A, L95A/L109A
 Credit: Wendy Sparks]
         Functional domains: EIAV vs HIV Rev

[Figure: linear domain maps, color-coded (Red - RNA interaction; Green - Protein interaction).
 EIAV Rev: NES, RBM, Folding?, NLS; motifs RRDRW, ERLE, KRRRK.
 HIV-1 Rev (positions 1 and 116 marked): NLS/RBM (RQARRNRRRRWR), NES.]

 NES - Nuclear Export Signal
 NLS - Nuclear Localization Signal
 RBM - putative RNA Binding Motif
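                Putative RNA-binding Motifs & Predicted
                RNA-binding Residues Mapped onto
                Predicted EIAV Rev Structure

[Figure: predicted EIAV Rev structure with the ERLE, KRRRK, and RRDRW motifs labeled. Credit: Yungok Ihm]

Sequence with predicted RNA-binding residues marked "+":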
  61                71                   81                91
  ARRHLGPGPT QHTPSRRDRW IREQILQAEV LQERLEWRIR …
  ++ +++++++ ++++++++++                     +                              +

                 121                131               141                151            161
                 HFREDQRGDF SAWGDYQQAQ ERRWGEQSSP RVLRPGDSKRRRKHL
                                    + ++++                   ++ +++ +++++++++++++++

Michael Terribilini
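
The positions of the three putative motifs can be read off the numbered sequence above. A minimal sketch, using only the two fragments printed on this slide and their starting residue numbers (61 and 121); the function name `locate_motifs` is hypothetical.

```python
# Sequence fragments keyed by the residue number of their first character,
# as printed on this slide (spaces between blocks of ten removed).
FRAGMENTS = {
    61: "ARRHLGPGPTQHTPSRRDRWIREQILQAEVLQERLEWRIR",
    121: "HFREDQRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDSKRRRKHL",
}
MOTIFS = ["RRDRW", "ERLE", "KRRRK"]

def locate_motifs(fragments, motifs):
    """Return {motif: [(first_residue, last_residue), ...]}, 1-based numbering."""
    hits = {motif: [] for motif in motifs}
    for start, seq in fragments.items():
        for motif in motifs:
            idx = seq.find(motif)
            while idx != -1:
                first = start + idx
                hits[motif].append((first, first + len(motif) - 1))
                idx = seq.find(motif, idx + 1)  # also catch overlapping repeats
    return hits

for motif, spans in locate_motifs(FRAGMENTS, MOTIFS).items():
    print(motif, spans)
```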
             Express & purify MBP-ERev deletion mutants

[Figure: EIAV Rev domain map (positions 1, 31, 57, 125, 146, 165; NES, RBM, Folding?, NLS)
 above the MBP-ERev fusion constructs:
   1-165, 31-165, 31-145, 57-165, 57-145, 57-124, 125-165, 146-165
 Gel lanes: Marker, MBP, 1-165, 31-165, 31-145, 57-165, 57-145, 57-124, 125-165, 146-165
 Marker bands: 60, 42, 30, 22
 Credit: Jae-Hyung Lee]
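
Deletion mapping asks which constructs still contain a candidate motif. The sketch below tabulates that for the constructs listed on this slide. The motif coordinates (RRDRW 76-80, ERLE 93-96, KRRRK 159-163) are an assumption: they are read off the numbered sequence shown on another slide in this lecture, not stated here.

```python
# Construct boundaries (first, last residue), as listed on this slide.
CONSTRUCTS = [(1, 165), (31, 165), (31, 145), (57, 165),
              (57, 145), (57, 124), (125, 165), (146, 165)]

# Assumed motif positions, derived from the numbered EIAV Rev sequence
# shown elsewhere in this lecture.
MOTIF_SPANS = {"RRDRW": (76, 80), "ERLE": (93, 96), "KRRRK": (159, 163)}

def motifs_retained(construct, motif_spans):
    """List the motifs that lie entirely within a construct's residue range."""
    lo, hi = construct
    return [name for name, (first, last) in motif_spans.items()
            if lo <= first and last <= hi]

for construct in CONSTRUCTS:
    label = f"{construct[0]}-{construct[1]}"
    kept = ", ".join(motifs_retained(construct, MOTIF_SPANS)) or "none"
    print(f"MBP-ERev {label:8s} {kept}")
```

Comparing the RNA-binding activity of constructs that differ by one motif (for example 57-165 vs 57-145) is what lets the experiments assign binding to a particular region.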
             MBP-ERev binds specifically to RRE in vitro

[Figure: two gel panels using 32P-labeled RRE ("Undigested 32P-RRE" marked).
   UV crosslinking: sense and antisense probes; lanes BSA, MBP, 1-165, 31-165
   Competition: lanes No protein, No cold RRE, Cold RRE
 Credit: Jae-Hyung Lee]
    EIAV Rev: Binding Predictions vs Experiments

PREDICTED:
[Figure: predicted structure with the RRDRW and KRRRK motifs labeled. Predicted protein-binding and RNA-binding residues are marked "+" beneath the sequence.]
       41         51        61        71         81        91
GPLESDQWCRVLRQSLPEEKISSQTCI ARRHLGPGPTQHTPSRRDRWIREQILQAEVLQERLEWRI
++++++++      ++            +++++++++++++++
                              ++++++++++++++++
                                                    131       141       151       161
                                          QRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDSKRRRKHL
                                          ++++++++++             ++    +++   ++++++
                                                 +             ++++++++++++++++++++

VALIDATED:
[Figure: gel lanes labeled MBP, WT, 31-165, 31-145, 57-165, 145-165, above a domain map (positions 31, 57, 125, 145, 165; NES, RBM, FOLD?, NLS/RBM; motifs RRDRW, ERLE, KRRRK). Credit: Jae-Hyung Lee]

Lee et al (2006) J Virol 80:3844
Terribilini et al (2006) PSB 11:415
              Roles of Putative RNA Binding Motifs?

[Figure: EIAV Rev domain map (positions 1, 31, 57, 124, 146, 165; NES, two RBD regions, NLS) with motifs RRDRW, ERLE, KRRRK. Credit: Jae-Hyung Lee]

Motif mutants:
   • RRDRW -> AADAA
   • ERLE  -> AALA
   • ERLE  -> ERDE
   • KRRRK -> KAAAK
  Rev RNA Binding Motifs: Predicted vs Experiment

PREDICTED:
[Figure: predicted structure and numbered sequence with predicted protein-binding and RNA-binding residues marked "+", as on the slide "EIAV Rev: Binding Predictions vs Experiments".]

VALIDATED:
[Figure: gel lanes labeled WT and KAAAK, above a domain map (positions 31, 57, 125, 145, 165; NES, RBM, FOLD?, NLS/RBM) with motif mutants:
   RRDRW -> AADAA
   ERLE  -> AALA
   ERLE  -> ERDE
   KRRRK -> KAAAK
 Credit: Jae-Hyung Lee]
             Summary: Predictions vs Experiments

[Figure: predicted structure with the ERLE, KRRRK, and RRDRW motifs labeled, above a domain map (positions 31, 57, 125, 145, 165; NES, RBM, FOLD, NLS/RBM; motifs RRDRW, ERLE, KRRRK).]

       41         51        61        71         81        91
GPLESDQWCRVLRQSLPEEKISSQTCI ARRHLGPGPTQHTPSRRDRWIREQILQAEVLQERLEWRI
++++++++      ++            +++++++++++++++
                              ++++++++++++++++
                                                    131       141       151       161
                                          QRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDSKRRRKHL
                                          ++++++++++             ++    +++   ++++++
                                                 +             ++++++++++++++++++++

Lee et al (2006) J Virol 80:3844
Terribilini et al (2006) PSB 11:415
                 Conclusions & Future Directions

  Combination of computational & wet lab approaches revealed that:
     • EIAV Rev has a bipartite RNA binding domain
     • Two Arg-rich RBMs are critical
           • RRDRW in central region     (but not ERLE)
           • KRRRK at C-terminus, overlapping the NLS
  • Based on computational modeling, the RBMs are in close proximity
    within the 3-D structure of the protein
  • Lentiviral Rev proteins & their cognate RRE binding sites may be
    more similar in structure than has been appreciated
 Future:
     Computational: Use Rev-RRE model system to discover
                    "predictive rules" for protein-RNA recognition
     Experimental:  Determine the structure of the Rev-RRE complex!

Lee et al (2006) J Virol 80:3844
Terribilini et al (2006) PSB 11:415
      Building “Designer” Zinc Finger DNA-binding Proteins
                J Sander, P Zaback, F Fu, J Townsend, R Winfrey
               D Wright, K Joung, L Miller, D Dobbs, D Voytas




Wright et al (2006) Nature Protocols
Sander et al (2007) Nucleic Acids Res
        Chp 16 - RNA Structure Prediction


SECTION V            STRUCTURAL BIOINFORMATICS

Xiong: Chp 16 RNA Structure Prediction (Terribilini)

    •   Introduction
    •   Types of RNA Structures
    •   RNA Secondary Structure Prediction Methods
    •   Ab Initio Approach
    •   Comparative Approach
    •   Performance Evaluation



