									Predicting Protein Folding Pathways
       Zaki, Nadimpally, Bardhan, and Bystroff

       Data Mining in Bioinformatics

       Presented by: Michelle Cavallo
             I.B. PhD Student
         Advisor: Dr. R. Narayanan
• Problem:
   – Identify a time-ordered sequence of folding events that make up
     a structured protein folding pathway

• Solution:
   – Novel “unfolding” approach for predicting the folding pathway
       • Apply graph-based methods on a weighted secondary structure
         graph of a protein to predict the sequence of unfolding events
       • Reverse the event sequence to see the folding pathway

• Experiments:
   – Successful predictions for proteins with partially known folding
     pathways

• Proteins fold
  spontaneously and
  reproducibly in an
  aqueous solution

• Structure is determined
  by sequence

• Function is determined by
  structure

Hemoglobin: a globular protein
            Protein Problems
• Two major protein problems for bioinformatics:

  – The Structure Prediction Problem
     • Determine the 3D (3º) structure from the linear amino acid sequence

  – The Pathway Prediction Problem
     • Given an amino acid sequence and its 3D structure,
       determine the folding pathway that leads from the linear
       structure to the 3º structure

• Major focus has been on structure prediction
          Structure Prediction
• Traditional approaches to structure prediction have
  focused on:
   – Evolutionary homology
   – Fold recognition (goodness-of-fit score for sequence-structure
     alignments)
   – Ab initio simulations (conformational search for the lowest
     energy state)

• Conformational search space is huge
   – Proteins fold in milliseconds—a structured folding pathway must
     play an important role in this conformational search
   – Experimental evidence does indicate that certain events always
     occur early in the folding process and certain others always
     occur later
      Towards Pathway-based
        Structure Prediction
• To make pathway-based approaches to
  structure prediction a reality, plausible
  protein folding pathways need to be
  predicted.

• The ability to predict folding pathways
  can greatly enhance structure
  prediction methods.
   Studying Folding Pathways
• One approach to studying folding pathways is to identify folding
  possibilities in an unfolded protein.
    – This is infeasible—there are too many possibilities.

• The approach used in this study is to start with a folded protein in its
  final state and learn how to “unfold” the protein.

• The reversed unfolding sequence could then be a plausible protein
  folding pathway.

• The solution: Use minimum cuts on weighted graphs to
  determine a plausible sequence of unfolding steps.
        Protein Contact Maps
• A protein contact map represents the distance between
  every two residues of a 3D protein structure in a 2D matrix
   – Represented in a symmetrical, square Boolean matrix of
     pairwise interresidue contacts

• “A contact map for a protein with N residues is an N x N
  binary matrix C whose element C (i, j) = 1 if residues i
  and j are in contact and C (i, j) = 0 otherwise”

• Protein contact maps can be created using different
  tools, e.g. BioPython, Structer
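
The binary definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes one 3D point per residue (for example, an alpha-carbon position) and a 7.0 angstrom cutoff, neither of which is stated on the slides. The names `contact_map`, `toy_coords`, and `toy_map` are hypothetical, and the coordinates are made up rather than taken from the PDB.

```python
import math

def contact_map(coords, threshold=7.0):
    """Build an N x N binary contact map from one 3D point per residue.

    C[i][j] = 1 if residues i and j are within `threshold` angstroms,
    0 otherwise. The 7.0 default is an assumed, illustrative cutoff.
    The diagonal is left at 0 (a residue is not paired with itself).
    """
    n = len(coords)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                # Contacts are mutual, so the matrix is symmetric.
                C[i][j] = C[j][i] = 1
    return C

# Toy input: four hypothetical residue positions (not real protein data).
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.8, 0.0)]
toy_map = contact_map(toy_coords)
for row in toy_map:
    print(row)
```

With these toy points, residues 0 and 2 are 7.6 angstroms apart and therefore not in contact, while every other pair is.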
            Protein Contact Maps

Figure 7.2 shows the 3º structure and contact map for IgG-binding protein from PDB
  Graphs and Minimum Cuts
• A protein can be represented as a weighted
  secondary structure element graph (WSG)

  – Vertices = the SSEs that make up the protein
     • β-strands represented as triangles
     • α-helices represented as circles

  – Edges denote proximity relationships between SSEs
     • Edges weighted by strength of interactions between SSEs

• Edge construction and weights are determined
  from the contact map
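
One way to derive such a graph from a contact map is sketched below. It assumes that each SSE is given as an inclusive residue index range and that an edge weight is simply the number of residue contacts between two SSEs; the slides say only that weights reflect interaction strength, so that weighting is an assumption. The names `build_wsg`, `toy_C`, and `toy_sses` are hypothetical.

```python
def build_wsg(C, sses):
    """Build a weighted SSE graph as a dict of dicts.

    C    : symmetric binary contact map (list of lists).
    sses : dict mapping an SSE name to its (start, end) residue
           indices, inclusive.
    Edge weight = number of contacts between the two SSEs (assumed).
    """
    names = list(sses)
    graph = {name: {} for name in names}
    for a_idx, a in enumerate(names):
        a_start, a_end = sses[a]
        for b in names[a_idx + 1:]:
            b_start, b_end = sses[b]
            weight = sum(
                C[i][j]
                for i in range(a_start, a_end + 1)
                for j in range(b_start, b_end + 1)
            )
            if weight > 0:
                # Only SSEs that actually touch get an edge.
                graph[a][b] = weight
                graph[b][a] = weight
    return graph

# Toy 6-residue contact map and three hypothetical SSEs.
toy_C = [
    [0, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 0],
]
toy_sses = {"b1": (0, 1), "a1": (2, 3), "b2": (4, 5)}
toy_wsg = build_wsg(toy_C, toy_sses)
print(toy_wsg)
```

In this toy case b1 and b2 share two contacts, b1 and a1 share one, and a1 and b2 share none, so no edge is created between them.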
Graphs and Minimum Cuts
   Solution/Approach Outline
• Approach to predicting a folding pathway using
  the idea of “unfolding”

• “Use a graph representation of a protein, where
  a vertex denotes a 2º structure and an edge
  denotes the interactions between the two SSEs
  (2º structure elements).”

• Unfold the protein through a series of mincuts
         Unfolding via Mincuts
• Unfold one piece at a time, each time choosing
  the cut which will have the least impact on the
  remaining structure

• The sequence can then be reversed to identify
  plausible pathways for protein folding

• This series of mincuts predicts the most likely
  sequence of unfolding events
      Unfolding via Mincuts
• A mincut is the set of edges that partitions a
  WSG into two components with the smallest total
  edge weight (weakest interaction) between them

• The Stoer-Wagner (SW) deterministic
  polynomial-time mincut algorithm was used
  since it is simple and fast.

• “The SW algorithm works iteratively by merging
  the vertices until only one unmerged vertex
  remains”
      Unfolding via Mincuts
• “SW starts with an arbitrary vertex and
  adds the most highly connected vertex to
  the current set”

• This process is repeated until all vertices
  have been added in order of decreasing
  attraction to the first
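
The merging procedure described above can be sketched in plain Python as follows. This is a straightforward, unoptimized rendering of the standard Stoer-Wagner algorithm, not the code used in the study. It assumes a connected, undirected graph stored as a dict of dicts with positive weights; `stoer_wagner` and `toy_graph` are hypothetical names, and the toy weights are invented.

```python
def stoer_wagner(graph):
    """Return (cut_weight, one_side) for a global minimum cut.

    graph: dict vertex -> dict neighbor -> weight (undirected,
    connected, at least two vertices).
    """
    # Working copy; each super-vertex remembers what was merged into it.
    g = {u: dict(nbrs) for u, nbrs in graph.items()}
    members = {u: [u] for u in g}
    best_weight, best_side = float("inf"), None

    while len(g) > 1:
        # One phase: start anywhere, then repeatedly add the vertex
        # most strongly connected to the set built so far.
        start = next(iter(g))
        order = [start]
        conn = {v: g[start].get(v, 0) for v in g if v != start}
        while conn:
            nxt = max(conn, key=conn.get)
            cut_of_phase = conn.pop(nxt)
            order.append(nxt)
            for v, w in g[nxt].items():
                if v in conn:
                    conn[v] += w
        s, t = order[-2], order[-1]
        # The last vertex added, cut off from everything else, is a
        # candidate minimum cut.
        if cut_of_phase < best_weight:
            best_weight, best_side = cut_of_phase, list(members[t])
        # Merge t into s, summing parallel edges.
        for v, w in g[t].items():
            if v != s:
                g[s][v] = g[s].get(v, 0) + w
                g[v][s] = g[s][v]
            del g[v][t]
        members[s].extend(members[t])
        del g[t]
        del members[t]
    return best_weight, set(best_side)

# Toy weighted graph with four hypothetical SSE vertices.
toy_graph = {
    "a": {"b": 3, "c": 1},
    "b": {"a": 3, "c": 1},
    "c": {"a": 1, "b": 1, "d": 4},
    "d": {"c": 4},
}
cut_weight, side = stoer_wagner(toy_graph)
print(cut_weight, sorted(side))
```

For this toy graph the lightest cut separates {a, b} from {c, d} by removing the two weight-1 edges.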
              Unfolding via Mincuts

“An unfolding event is a set of edges that form a mincut in the WSG for a protein.”
       The UNFOLD Algorithm
• Determine mincut for initial WSG

• Break ties arbitrarily

• Delete edges forming this cut from WSG
   – This yields two new connected subgraphs

• Recursively process each subgraph to yield a sequence
  of mincuts corresponding to the unfolding events

• Reverse this sequence to obtain predicted folding
  pathway
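
The recursive steps above can be sketched as follows. To keep the example short and self-contained, it swaps in an exhaustive search for the minimum cut instead of Stoer-Wagner, which is only practical for toy graphs. The depth-first ordering of events is also an assumption, since the slides do not specify how cuts from the two subgraphs are interleaved. All names here (`brute_force_mincut`, `induced`, `unfold`, `toy_graph`) are hypothetical.

```python
from itertools import combinations

def brute_force_mincut(graph):
    """Exhaustive stand-in for Stoer-Wagner (exponential; toy graphs only).

    Returns (weight, side) for the lightest split of the vertices into
    two non-empty sides. Ties are broken by enumeration order.
    """
    vertices = sorted(graph)
    first, rest = vertices[0], vertices[1:]
    best = None
    for k in range(len(rest)):  # the side holding `first` never takes everything
        for extra in combinations(rest, k):
            side = {first, *extra}
            weight = sum(
                w for u in side for v, w in graph[u].items() if v not in side
            )
            if best is None or weight < best[0]:
                best = (weight, side)
    return best

def induced(graph, keep):
    """Subgraph on `keep`; edges leaving `keep` (the cut) are dropped."""
    return {u: {v: w for v, w in graph[u].items() if v in keep} for u in keep}

def unfold(graph):
    """Return unfolding events as (side_a, side_b, cut_weight) tuples."""
    if len(graph) < 2:
        return []
    weight, side = brute_force_mincut(graph)
    other = set(graph) - side
    events = [(frozenset(side), frozenset(other), weight)]
    events += unfold(induced(graph, side))
    events += unfold(induced(graph, other))
    return events

# Toy weighted graph with four hypothetical SSE vertices.
toy_graph = {
    "a": {"b": 3, "c": 1},
    "b": {"a": 3, "c": 1},
    "c": {"a": 1, "b": 1, "d": 4},
    "d": {"c": 4},
}
unfolding = unfold(toy_graph)
folding = list(reversed(unfolding))  # reversed unfolding = predicted folding order
for a, b, w in folding:
    print(sorted(a), "+", sorted(b), "cut weight", w)
```

A graph with n vertices always yields n - 1 events, because each cut splits one group in two until only single SSEs remain.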
        The UNFOLD Algorithm

• The sequence of mincuts
  can be visualized as
  a tree

• Nodes represent sets of
  vertices (graphs)
  produced by mincuts

• Children of a node
  represent partitions
  resulting from the mincut
• “Allowance should be made for several folding
  events to take place simultaneously.

• However, there may be intermediate stages that
  must happen before higher order folding can
  take place.”

• “The results should not be taken to imply a
  strict folding timeline, but rather as a way to
  understand major events that are mandatory
  in the folding pathway.”
• No one has determined a complete protein
  folding pathway

• However, there is evidence supporting
  intermediate pathway stages for several
  well-studied proteins

• Proteins with known intermediate pathway
  stages were analyzed with UNFOLD
   Detailed Test Case: 4DFR
• Dihydrofolate Reductase (PDB ID: 4DFR)

• Involved in nucleotide metabolism

• Has an adenine binding domain which is formed
  (folded) early on in the folding pathway
  – An α1 and β2 interaction

• 4DFR has four α-helices and eight β-strands.
 4DFR Detailed Test Case Continued
• Shown below are the WSG, unfolding sequence, and a series of
  intermediate stages in the folding pathway

• “According to the mincut-based UNFOLD algorithm, the vertex set
  {β2α2β3β1} lies on the folding pathway in agreement with the
  experimental results.”
4DFR Detailed Test Case Continued

     Predicted folding sequence for 4DFR
 Pathways for Other Proteins
• Several other proteins with known protein folding
  pathway intermediate stages were UNFOLDed

  – Bovine Pancreas Trypsin Inhibitor, Chymotrypsin
    Inhibitor 2, Human Procarboxypeptidase A2, Cell
    Cycle Protein p13suc1, β-lactoglobulin, Interleukin-1β,
    Protein Acylphosphatase, Twitchin Ig Superfamily
    Domain Protein, Myoglobin and leghemoglobin

• UNFOLD results reflected experimental results

• A repeat mincut approach (UNFOLD
  algorithm) can be used for automated
  prediction of protein folding pathways
       Future Perspectives
• Plan to test UNFOLD on the entire
  collection of proteins in the PDB

  – Want to study proteins from the same family
    to look for prediction of consistent pathways

    • Similarities and dissimilarities are both of interest
• “UNFOLD arbitrarily picks only one mincut out of perhaps
  several mincuts that have the same capacity”
   – Constructing all possible pathways might provide stronger
     evidence of intermediate states

• “All native interactions are considered energetically
  equivalent, and thus larger stabilizing interactions are not
   – Simplified model based on topology
       • Folding mechanism inferred from native structure alone
   – May be ok, because investigations indicate folding mechanisms
     are largely determined by topology
      Biology Perspectives
• “The ability to predict folding pathways
  can greatly enhance structure
  prediction methods”

• We want to predict structures to assign
  putative functions to novel genes!
Biology Perspectives Continued
• It is very difficult to determine a protein
  structure in the lab
  – X-ray crystallography
     • Technique is difficult to perform
     • Results are difficult to interpret

• We would like to have fast, easy methods
  for predicting structure in silico.
Biology Perspectives Continued
• Protein folding pathway prediction is of particular
  interest in prion research

• Prions = misfolded proteins which cause
  transmissible spongiform encephalopathy
   – Creutzfeldt-Jakob Disease
   – Gerstmann-Sträussler-Scheinker Syndrome (GSS)
   – Fatal Familial Insomnia (FFI)
   – Kuru
