									Protein Structure Prediction

                     Samantha Chui
                      Oct. 26, 2004
Central Dogma of Biology




DNA sequence --(transcription & translation)--> Protein sequence --(folding)--> Protein structure




   Question: Given a protein sequence, to what
    conformation will it fold?
How does nature do it?
   Hydrophobicity vs.
    hydrophilicity
   Van der Waals
    interaction
   Electrostatic
    interaction
   Hydrogen bonds
   Disulfide bonds
Current Approaches
   Experimental Methods
       X-ray crystallography
       NMR spectroscopy
   Computational Methods
       Homology modeling
            Similar sequences fold into similar structures
       Threading
            Dissimilar sequences may fold into similar structures
       Ab initio
            No similarity assumptions
            Conformational search
   Assembly of sub-structural units
   [Figure: known structures, fragment library, protein sequence, predicted structure, …]
“Small Libraries of Protein Fragments Model
Native Protein Structures Accurately”
Rachel Kolodny, Patrice Koehl, Leonidas Guibas, and Michael Levitt, 2002




     Goal: Find finite set of protein fragments that
      can be used to construct accurate discrete
      conformations for any protein

1. Generate fragments from known proteins
2. Cluster fragments to identify common
  structural motifs
3. Test library accuracy on proteins not in the
  initial set
    Datasets of protein fragments
       200 unique protein domains from Protein Data Bank
        (PDB)
           36,397 residues
       Four sets of backbone fragments
           4-, 5-, 6-, and 7-residue-long fragments
       Divide each protein domain into consecutive
        fragments beginning at random initial position
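
The splitting step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_into_fragments` is hypothetical, and it assumes the random initial position is drawn from the first `frag_len` residues, with any leftover residues at either end left unused.

```python
import random

def split_into_fragments(n_residues, frag_len, rng):
    """Return (start, end) index pairs of consecutive, non-overlapping fragments.

    Assumption: the first fragment starts at a random offset in [0, frag_len),
    so residues before the offset and any short tail are discarded.
    """
    offset = rng.randrange(frag_len)
    return [(s, s + frag_len)
            for s in range(offset, n_residues - frag_len + 1, frag_len)]

rng = random.Random(0)
frags = split_into_fragments(n_residues=23, frag_len=5, rng=rng)
```

Each pair indexes a half-open slice of the domain's Cα coordinates, so fragments from the same domain never share residues.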




Fragment structural similarity
   Coordinate root-mean-square (cRMS)
    deviation of Cα atoms



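   cRMS(A,B) = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} d_i^2 }
       d_i = distance between the i-th pair of matched atoms,
        N = number of atom pairs
       One-to-one mapping between atoms in structure A
        and structure B
       Translate and rotate to find the best alignment
       Equals 0 if the structures superimpose perfectly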
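
A minimal sketch of the cRMS computation, assuming NumPy is available and using the Kabsch algorithm for the "translate and rotate" step. The slides do not say which superposition method was used, so Kabsch is an assumption here, and the function name `crms` is illustrative.

```python
import numpy as np

def crms(a, b):
    """cRMS between two (N, 3) coordinate arrays after optimal superposition."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Translate both structures so their centroids sit at the origin.
    a0 = a - a.mean(axis=0)
    b0 = b - b.mean(axis=0)
    # Kabsch: the SVD of the covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(a0.T @ b0)
    # Flip one axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    diff = a0 @ rot - b0
    return float(np.sqrt((diff ** 2).sum() / len(a)))
```

Rows of `a` and `b` must already be in corresponding order, which is the one-to-one atom mapping the formula requires.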
Pruning and clustering
   Outliers have large cRMS deviation from all
    other fragments
       Discard according to some fragment-length
        specific threshold
   k-means simulated annealing clustering
       Repeatedly run k-means clustering, merge nearby
        clusters and split dispersed clusters
       Scoring function: total variance = \sum_{x} (x - \mu)^2,
        where \mu is the centroid of the cluster containing x
       Less sensitive to initial choice of cluster centers
        than k-means
Compiling the libraries
   Select cluster centroids as library entries
       Minimum sum of cRMS deviations from all the
        other cluster fragments
       Form representative set of protein fragments
   Library contents highly dependent upon
    clustering procedure
       For each set of fragments, start with 50 random
        seeds and choose library with minimal total
        variance score
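
The "many seeds, keep the minimum total variance" selection can be sketched as below. This is a deliberately simplified stand-in: it uses plain Lloyd k-means on Euclidean points, not the simulated-annealing variant with merge and split moves, and not cRMS distances between fragments. The names `kmeans` and `best_of_seeds` are illustrative.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd k-means; returns (centers, total_variance)."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    # Total variance: squared distance of every point to its nearest center.
    total_var = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, total_var

def best_of_seeds(points, k, n_seeds=50):
    """Run k-means from n_seeds random starts and keep the lowest-variance run."""
    runs = [kmeans(points, k, random.Random(seed)) for seed in range(n_seeds)]
    return min(runs, key=lambda run: run[1])

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, score = best_of_seeds(points, k=2)
```

Repeating the run from different seeds matters because each run can settle in a different local minimum of the total variance.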
Evaluating quality of a library
   Local-fit
       How well library fits local conformation of
        all proteins in test set.
   Global-fit
       How well library fits global three-
        dimensional conformation of all proteins in
        test set
Local-fit method
   Protein structures broken into set of all
    overlapping fragments of length f
   Find for each protein fragment the most
    similar fragment in the library (cRMS)
   Score = Average cRMS value over all
    fragments in all proteins in the test set
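
The local-fit score can be sketched as follows. The distance function is passed in as a parameter. In the method described above it would be cRMS after superposition, but the toy example uses a plain RMS difference on one-dimensional values (`toy_rms`, a hypothetical stand-in) so that the block stays self-contained.

```python
def local_fit_score(proteins, library, frag_len, dist):
    """Average distance from each overlapping fragment to its best library match."""
    best = []
    for coords in proteins:
        # Slide a window of frag_len residues along the chain, one residue at a time.
        for start in range(len(coords) - frag_len + 1):
            frag = coords[start:start + frag_len]
            best.append(min(dist(frag, entry) for entry in library))
    return sum(best) / len(best)

def toy_rms(a, b):
    # Stand-in for cRMS: no superposition, 1-D "coordinates" only.
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

proteins = [[0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
library = [[0.0, 1.0], [0.0, 0.0]]
score = local_fit_score(proteins, library, frag_len=2, dist=toy_rms)
```

A protein of length L contributes L - f + 1 overlapping fragments, so longer proteins weigh more heavily in the average.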
Local-fit results
Global-fit method
   Concatenate best local-fit library fragments
    just found
   Determine fragment’s orientation by
    superimposing its first three Cα atoms onto
    last three Cα atoms of preceding fragment
Global-fit method
   Number of possible sequences of fragments
    exponential in protein’s length
   Greedy algorithm finds good rather than best
    global-fit approximation
       Start at N terminus, approximate increasingly
        larger segments of the protein
       Concatenate library fragment which will yield
        structure of minimal cRMS deviation from
        corresponding segment
       Deterministic, linear time
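
A minimal sketch of the greedy construction, assuming NumPy and a Kabsch superposition for placing each fragment on the last three Cα atoms of the chain. Function names are illustrative. For simplicity this sketch recomputes the cRMS of the whole chain at every step, so it does not achieve the linear running time stated above.

```python
import numpy as np

def superpose(mobile, src, dst):
    """Apply to `mobile` the rigid transform that best maps `src` onto `dst`."""
    sc, dc = src.mean(axis=0), dst.mean(axis=0)
    u, _, vt = np.linalg.svd((src - sc).T @ (dst - dc))
    d = np.sign(np.linalg.det(u @ vt))  # guard against reflections
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    return (mobile - sc) @ rot + dc

def crms(a, b):
    diff = superpose(a, a, b) - b
    return float(np.sqrt((diff ** 2).sum() / len(a)))

def greedy_global_fit(target, library, overlap=3):
    """Grow a chain from the N terminus, one library fragment at a time."""
    frag_len = len(library[0])
    # Start with the fragment closest to the first frag_len residues.
    chain = min(library, key=lambda frag: crms(frag, target[:frag_len]))
    while len(chain) + frag_len - overlap <= len(target):
        best_score, best_chain = None, None
        for frag in library:
            # Orient the fragment by its first atoms on the chain's last atoms.
            placed = superpose(frag, frag[:overlap], chain[-overlap:])
            cand = np.vstack([chain, placed[overlap:]])
            score = crms(cand, target[:len(cand)])
            if best_score is None or score < best_score:
                best_score, best_chain = score, cand
        chain = best_chain
    return chain, crms(chain, target[:len(chain)])

target = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.5, 1.0, 0.0],
                   [1.0, 2.0, 0.8], [0.0, 2.2, 1.5], [-0.8, 1.5, 2.2],
                   [-1.0, 0.5, 3.0], [-0.3, -0.4, 3.6], [0.8, -0.6, 4.3]])
# Toy library: the target's own fragments, shifted to arbitrary positions.
library = [target[0:5].copy(), target[2:7] + 5.0, target[4:9] - 3.0]
chain, score = greedy_global_fit(target, library)
```

Because each choice is fixed before the next fragment is considered, an early pick that looks best locally can force a worse fit later, which is why the result is good rather than optimal.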
Global-fit results
   Library size      Fragment length   States/residue   Global-fit cRMS
   100 fragments     5 residues        10               0.91 Å
   20 fragments      5 residues        4.47             1.85 Å
   50 fragments      7 residues        2.66             2.78 Å
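
The states/residue figures are consistent with taking the library size to the power 1/(f - 3), where f is the fragment length and 3 is the number of overlapping Cα atoms used to orient each fragment, so each fragment adds f - 3 new residues. The slides do not state this formula, so it is an inference from the numbers. A quick check:

```python
def states_per_residue(library_size, frag_len, overlap=3):
    """Effective number of choices per residue (assumed formula, see lead-in)."""
    return library_size ** (1.0 / (frag_len - overlap))

rows = [(size, length, round(states_per_residue(size, length), 2))
        for size, length in [(100, 5), (20, 5), (50, 7)]]
for row in rows:
    print(row)
```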
   Assembly of sub-structural units
   [Figure: known structures, fragment library, protein sequence, predicted structure, …]
“Protein structure prediction via combinatorial
assembly of sub-structural units”
Yuval Inbar, Hadar Benyamini, Ruth Nussinov, and Haim J. Wolfson, 2003
CombDock
   Input: structural units (SUs) with known 3D
    conformations
   SUs considered rigid bodies
       rotated and translated with respect to each other
   Goal: predict overall structure
   Constraints
       Penetration: avoid steric clashes
       Backbone: restriction on maximum distance
        between consecutive SUs
All pairs docking
   N(N-1)/2 pairs of SUs
   Calculate candidate transformations according to
    matching complementary local features on surface of
    SUs
       Apply transformation on 2nd SU of pair
   Keep K best for each
   Clustering to ensure all K transformations yield
    significantly different complexes
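
The all-pairs step can be sketched as below. The `dock` callback is hypothetical and stands in for the surface-feature matching. The sketch assumes that a higher score is better, and it omits the clustering that keeps the K retained transformations distinct.

```python
from itertools import combinations

def all_pairs_docking(units, dock, k):
    """For every unordered pair of SUs, keep the k best candidate transformations."""
    kept = {}
    for i, j in combinations(range(len(units)), 2):
        candidates = dock(units[i], units[j])  # list of (score, transformation)
        kept[(i, j)] = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    return kept

def toy_dock(a, b):
    # Placeholder: five fake candidates with scores 0..4.
    return [(float(s), f"{a}-{b}#{s}") for s in range(5)]

result = all_pairs_docking(["A", "B", "C", "D"], toy_dock, k=3)
```

With N = 4 units this produces N(N-1)/2 = 6 pairs, each holding K = 3 transformations.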
Combinatorial assembly
   Multigraph representation
       Vertices = SUs
       Edges = transformations between two SUs
            K parallel edges between any two vertices
             [Figure: vertices i, j, k joined by parallel edges 1, 2, …, K;
              the transformation between i and k is induced by the
              transformations (ij, jk)]

   Final protein conformation = spanning tree
       N SUs, one connected component, no cycles
Combinatorial Assembly
   N^{N-2} K^{N-1} different spanning trees
       Not all spanning trees are valid complexes
       Use a heuristic algorithm
   Two subtrees adjacent iff there exists an
    index i so that vertex i is in one subtree and
    i+1 is in the other
   Sequential tree: recursive definition
       One vertex
       Tree with edge that connects two adjacent
        sequential trees
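
Both the count and the adjacency test are easy to make concrete. In this sketch, subtrees are represented only by their vertex index sets, and the function names are illustrative. The count combines Cayley's formula (N^(N-2) labeled trees on N vertices) with K choices for each of the N-1 edges, and is meant for N ≥ 2.

```python
def count_spanning_trees(n, k):
    """Number of spanning trees of the multigraph with n vertices and k parallel edges."""
    return n ** (n - 2) * k ** (n - 1)

def are_adjacent(a, b):
    """True if some index i lies in one vertex set and i + 1 in the other."""
    return any(i + 1 in b for i in a) or any(i + 1 in a for i in b)

total = count_spanning_trees(5, 10)   # 5 SUs, 10 transformations per pair
adj = are_adjacent({1, 2}, {3})       # 2 and 3 are consecutive
not_adj = are_adjacent({1, 2}, {4})   # no consecutive indices across the sets
```

If the two subtrees are disjoint, it follows by induction from the recursive definition that every sequential tree covers a run of consecutive SU indices.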
Combinatorial Assembly
   Hierarchical algorithm of N stages
       ith stage: generate sequential trees with i vertices
       Construct trees by connecting adjacent sequential
        trees of smaller sizes generated earlier
       Keep D best sequential trees at each step
            Discard trees which do not meet backbone and
             penetration constraints
            Score = sum of scores of transformations
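
A minimal sketch of the hierarchical stage loop, under several assumptions: each kept tree is stored with the range of consecutive vertices it spans, a higher score is better, and the backbone and penetration checks are collapsed into one hypothetical `is_valid` predicate that accepts everything by default.

```python
def assemble(n, transformations, d, is_valid=lambda edges: True):
    """Return up to d best trees over vertices 0..n-1 as (score, edges) pairs.

    transformations[(u, v)] with u < v is a list of (score, label) candidates.
    """
    # trees[(lo, hi)]: best trees spanning vertices lo..hi inclusive.
    trees = {(v, v): [(0.0, ())] for v in range(n)}
    for size in range(2, n + 1):                 # stage: trees with `size` vertices
        for lo in range(n - size + 1):
            hi = lo + size - 1
            found = {}
            for mid in range(lo, hi):            # left = lo..mid, right = mid+1..hi
                for ls, le in trees[(lo, mid)]:
                    for rs, re in trees[(mid + 1, hi)]:
                        for u in range(lo, mid + 1):
                            for v in range(mid + 1, hi + 1):
                                for ts, label in transformations.get((u, v), []):
                                    edges = le + re + ((u, v, label),)
                                    if is_valid(edges):
                                        # Same edge set reached via another split: keep once.
                                        found[frozenset(edges)] = (ls + rs + ts, edges)
            ranked = sorted(found.values(), key=lambda t: t[0], reverse=True)
            trees[(lo, hi)] = ranked[:d]         # keep the D best at each step
    return trees[(0, n - 1)]

transformations = {
    (0, 1): [(1.0, "a"), (0.5, "b")],
    (1, 2): [(2.0, "c")],
    (0, 2): [(0.2, "d")],
}
result = assemble(3, transformations, d=2)
```

Keeping only D trees per step bounds the work at each stage, at the cost of possibly discarding a partial tree that would have led to the best complete one.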
Combinatorial Assembly
CombDock Results
Conclusion
   [Figure: known structures, fragment library, protein sequence, predicted structure, …]
   Experimental Methods
       X-ray crystallography
       NMR spectroscopy
   Computational Methods
       Homology modeling
            Similar sequences fold into similar structures
       Threading
            Dissimilar sequences may fold into similar structures
       Ab initio
            No similarity assumptions
            Conformational search
References
   Kolodny et al. (2002), “Small libraries of
    protein fragments model native protein
    structures accurately”
   Inbar et al. (2003), “Protein structure
    prediction via combinatorial assembly of
    sub-structural units”

								