
    A too simple model for protein folding
          Ethan Bolker
Mathematics and Computer Science
         UMass Boston

         Clark University
          April 14, 2004
             Preliminaries
• Problem source: biology teaching need
• Analysis mixes biology, CS, mathematics
  (= applied mathematics)
• Ongoing help from Bogdan Calota
• See www.cs.umb.edu/~eb/folding
             How life works
•   DNA (gene) makes RNA
•   RNA makes polypeptide
•   Polypeptide folds into protein
•   Proteins interact (biochemistry)
•   Cells … organisms … communities …
•   Natural selection makes gene mix evolve
  Virtual teaching laboratories
• For Brian White (Biology, UMass Boston)
• Virtual Genetics Laboratory (VGL)
  – Mendelian genetics
  – http://intro.bio.umb.edu/VGL/index.htm
  – Science, April 16, 2004
• GenExplorer
  – the central dogma
  – www.cs.umb.edu/genex/
• Watch this space …
          Polypeptide → protein
• Polypeptide: sequence of amino acids
    chemical (biological) activity depends on
    three dimensional configuration (folding)
• Protein: polypeptide folded into active shape
• Given the sequence, what’s the shape?
  – Wet lab
     • lots of chemistry
     • x-ray crystallography
     • (newer tools)
  – Virtual lab
     • compute shape from chemical principles
     • need supercomputer or grid
          folding@home
www.stanford.edu/group/pandegroup/folding/
       For beginning biologists
• Problem: give students hands-on experience
  showing how sequence determines shape
• Solution: very simple model
  – amino acid = disk in the plane, hydrophobic index
    $h_i$ expresses wish to avoid wet environment
  – fold polypeptide on hex grid to minimize energy
      $\text{energy} = \sum_{i \in \text{acids}} (\#\,\text{exposed edges of acid } i)\, h_i$
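A minimal sketch of the energy model, not the applet's code. It assumes axial coordinates (q, r) for the hexagonal cells and assumes an edge counts as exposed when the cell across it holds no acid (chain neighbors included). The names `NEIGHBORS`, `exposed_edges` and `energy` are hypothetical.

```python
# Hypothetical sketch: hexagonal cells in axial coordinates (q, r).
# Each cell shares one edge with each of these six neighboring cells.
NEIGHBORS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]

def exposed_edges(cell, occupied):
    """Count the edges of a cell that do not touch another acid (assumed rule)."""
    q, r = cell
    return sum((q + dq, r + dr) not in occupied for dq, dr in NEIGHBORS)

def energy(cells, h):
    """Sum over acids of (# exposed edges) * hydrophobic index."""
    occupied = set(cells)
    return sum(exposed_edges(c, occupied) * hi for c, hi in zip(cells, h))

# Three acids laid out straight, then folded into a triangle.
h = [1.0, -1.0, 1.0]
straight = [(0, 0), (1, 0), (2, 0)]
triangle = [(0, 0), (1, 0), (0, 1)]
print(energy(straight, h), energy(triangle, h))
```

Folding the chain into a triangle hides one more edge of each hydrophobic end acid, so the triangle has the lower energy in this toy case.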
                       folding@umb
51882 possible configurations
(5279 modulo dihedral group symmetry)
minimum energy -131.17
minimum occurs once

topology
0: [2, 7]
1: [ ]
2: [0, 7]
3: [ ]
4: [ ]
5: [ ]
6: [ ]
7: [0, 2]
                  folding@umb




51882 possible configurations (5279 modulo dihedral group symmetry)
minimum energy -13.161
minimum occurs twice (second - obvious - answer has same topology)
          Brute force search
• Try all nonintersecting walks of length n
  on plane grid of hexagons:
   1, 6, 30, 138, 618, 2730, 11946, 51882,
   224130, 964134, 4133166, …
• Sequence # A001334 in the
  Online Encyclopedia of Integer Sequences
  www.research.att.com/~njas/sequences/
• No closed form expression
• Growth rate obviously $O(5^n)$ (at most 5 choices per step), actual ≈ $4.25^n$
• To count foldings, divide by 12 (symmetry)
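The walk counts can be reproduced with a short recursive enumeration. This is a sketch under the assumption that acids sit in hexagonal cells with six neighbors each, in axial coordinates; `count_walks` is a hypothetical name, not the applet's code.

```python
# Hypothetical sketch: count nonintersecting walks on the grid of hexagons.
NEIGHBORS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]

def count_walks(steps, cell=(0, 0), visited=None):
    """Count self-avoiding walks with `steps` steps starting at `cell`."""
    if visited is None:
        visited = {cell}
    if steps == 0:
        return 1
    total = 0
    for dq, dr in NEIGHBORS:
        nxt = (cell[0] + dq, cell[1] + dr)
        if nxt not in visited:
            visited.add(nxt)          # extend the walk
            total += count_walks(steps - 1, nxt, visited)
            visited.remove(nxt)       # backtrack
    return total

counts = [count_walks(n) for n in range(8)]
print(counts)
```

A chain of 8 acids takes 7 steps, so the last entry is the configuration count for the 8-acid examples.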
   A (random) chain of length 17
• Five of the 11 minimum energy foldings
• All 11 show the same 8-acid cool ring and hot core
• Essentially the same topology
• 12 hour computation
     Open questions (statistical)
• How many minima?
• What is the energy distribution
  – for one polypeptide, over all foldings?
  – of minima, over all polypeptides of fixed length?
• Do all minima for a polypeptide have the same topology?
  (several possible definitions for topology)
• Do approximate minima have same topology?
  (several possible definitions for approximate)
   Which amino acid universe?
Random polypeptides – acids chosen
• $h_i$ uniformly distributed in [-1,1]
• $h_i$ = (1,-1) with probability (p, 1-p)
• from (Ala, Arg, … , Tyr, Val) with
   – measured hydrophobic indices
   – measured probabilities of occurrence

        the natural universe
              Digression
• How do you interpolate visually between
  red and green?
• in RGB space, white is halfway
• in HSB space, yellow is halfway
• Application uses cubic interpolation to
  adjust contrast near the midpoint
             Cubic interpolation
// Map a range of hydrophobic indices h to a continuum of
// colors between RED and GREEN in HSB space.
//
// First map h linearly to x between 0.0 and 1.0 so that we
// can form convex combinations. To get better visual effect
// replace x by
//             f(x)     = ax^3 + bx^2 + cx
//             color(x) = f(x)*RED + (1-f(x))*GREEN
// f(0) = 0 means color(0) = GREEN. Then find a, b and c so that
// f(1) = 1, f(1/2) = 1/2 and f'(1/2) = k (to be determined). Then
// color(1) = RED and color(1/2) = 1/2 (RED+GREEN) = YELLOW.
//
// When k = 1, f(x) = x is linear, not cubic (check the algebra).
// That works well for the natural table. But for the virtual table it
// provides too little contrast near the center. k= ½ flattens out the
// cubic at its inflection point there and seems to be just about right.
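Solving the three conditions in the comment gives a = 4(1 - k), b = -6(1 - k), c = 3 - 2k. The sketch below checks that algebra and applies it. It is written in Python rather than the application's own language, uses the standard `colorsys` module (HSV, the same model as HSB) as a stand-in, and the function names and the `h_min`/`h_max` parameters are hypothetical.

```python
import colorsys

def cubic_coefficients(k):
    """Solve f(1) = 1, f(1/2) = 1/2, f'(1/2) = k for f(x) = a x^3 + b x^2 + c x."""
    a = 4.0 * (1.0 - k)
    b = -6.0 * (1.0 - k)
    c = 3.0 - 2.0 * k
    return a, b, c

def contrast(x, k):
    """Evaluate the cubic f at x in [0, 1]."""
    a, b, c = cubic_coefficients(k)
    return a * x**3 + b * x**2 + c * x

def color(h, h_min, h_max, k=0.5):
    """Return an (r, g, b) triple in [0, 1] for hydrophobic index h."""
    x = (h - h_min) / (h_max - h_min)   # linear map to [0, 1]
    t = contrast(x, k)                  # weight of RED
    hue = (1.0 - t) * (1.0 / 3.0)       # RED has hue 0, GREEN has hue 1/3
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)

print(cubic_coefficients(0.5))          # coefficients of 2x^3 - 3x^2 + 2x
print(color(0.0, -1.0, 1.0))            # midpoint of the range: yellow
```

With k = 1 the coefficients collapse to f(x) = x, matching the remark in the comment.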
    Open questions (biological)
• Nature isn’t random: naturally occurring
  polypeptides are not a random selection
  from the natural universe
• Which shapes can occur as the minimum
  energy configurations of polypeptides?
  – which are beautiful? (polypeptide tangrams)
  – which are interesting? (designer drugs)
    (I like cool rings, Brian White likes hot cores)
          Folding algorithms
• Conjecture: the minimum energy folding problem is NP-hard
• Look for an approximate algorithm
  – polynomial time
  – close to true minimum with high probability
  – not stochastic
• Conjecture: no local algorithm will do
       Incremental Folding
int lookahead
int step ≤ lookahead
while there are acids to place
     explore all positions for the next lookahead acids
            that minimize the energy of the configuration so far
     place the first step of those lookahead acids
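The loop above can be sketched as runnable code. This is an illustration under stated assumptions, not the applet's implementation: cells are in axial hex coordinates, an edge is exposed when the cell across it is empty, ties go to the first candidate found, and a trapped chain simply raises an error. All names are hypothetical.

```python
NEIGHBORS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]

def energy(cells, h):
    """Energy of the acids placed so far: sum of (# exposed edges) * h_i."""
    occupied = set(cells)
    total = 0.0
    for (q, r), hi in zip(cells, h):
        exposed = sum((q + dq, r + dr) not in occupied for dq, dr in NEIGHBORS)
        total += exposed * hi
    return total

def extensions(cells, count):
    """Yield every nonintersecting way to append `count` more cells."""
    if count == 0:
        yield list(cells)
        return
    q, r = cells[-1]
    for dq, dr in NEIGHBORS:
        nxt = (q + dq, r + dr)
        if nxt not in cells:
            yield from extensions(cells + [nxt], count - 1)

def incremental_fold(h, lookahead, step):
    """Place acids `step` at a time, looking `lookahead` acids ahead."""
    if not 1 <= step <= lookahead:
        raise ValueError("need 1 <= step <= lookahead")
    cells = [(0, 0)]
    while len(cells) < len(h):
        depth = min(lookahead, len(h) - len(cells))
        candidates = list(extensions(cells, depth))
        if not candidates:
            raise RuntimeError("chain is trapped")
        best = min(candidates, key=lambda c: energy(c, h))
        keep = min(step, depth)
        cells = best[:len(cells) + keep]   # commit only the first `step` acids
    return cells, energy(cells, h)

h = [0.9, -0.5, 0.7, -0.8, 0.3, 0.6, -0.2]
print(incremental_fold(h, lookahead=3, step=1))
```

Setting `lookahead` and `step` to the chain length makes the first pass enumerate every folding, which is the brute force case.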
         Incremental Folding
• lookahead = step = 1 is greedy
• lookahead = step = n is brute force

• time = $O\left(\frac{n}{\text{step}} \cdot 4.x^{\text{lookahead}}\right)$

• linear in n, but exponential in lookahead
50 acids, randomly chosen from natural universe, seed 2255, lookahead 8

step   minimum energy   time (seconds)
  1       -352.38            139
  4       -338.42             29
  5       -351.54             27
  7       -343.98             15
brute force folding for one random chain of length 17
         incremental: step sensitivity
[Chart: energy (axis from 0 to -18) versus step (1 to 7) at lookahead 7, with a series labeled brute force]
         incremental: lookahead sensitivity
[Chart: energy (axis from 0 to -18) versus lookahead at step 5, with a series labeled brute force]
          Incremental Folding
• Topology highly sensitive to step
• Energy not monotone with step or lookahead
• Can always be fooled
                      ●●●




• May be realistic biologically
• Suffices for teaching goal
            More geometry
• Square grid folding is faster: $O(2.x^{\text{lookahead}})$
  instead of $O(4.x^{\text{lookahead}})$
• But not nearly as pretty
             Folding in space
• Cubic grid has same folding complexity as hex
  grid in plane since each cell has six neighbors
• 3D analogue of hex grid is spherical close
  packing
  – oranges at the market
  – layers of hexagonally close packed planes
  – cell is a rhombic dodecahedron
  – each sphere has 12 neighbors
  – folding complexity $O(10.x^n)$
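The neighbor counts can be checked directly. The sketch below assumes the usual integer model of sphere close packing (the face-centered cubic lattice, whose nearest-neighbor offsets have two coordinates equal to ±1 and one equal to 0) and counts short nonintersecting walks on both 3D grids. The names are hypothetical.

```python
from itertools import product

OFFSETS = list(product((-1, 0, 1), repeat=3))
CUBIC = [v for v in OFFSETS if sum(map(abs, v)) == 1]   # 6 face neighbors
FCC = [v for v in OFFSETS if sum(map(abs, v)) == 2]     # 12 touching spheres

def count_walks(steps, neighbors, cell=(0, 0, 0), visited=None):
    """Count self-avoiding walks with `steps` steps on the given grid."""
    if visited is None:
        visited = {cell}
    if steps == 0:
        return 1
    total = 0
    for d in neighbors:
        nxt = (cell[0] + d[0], cell[1] + d[1], cell[2] + d[2])
        if nxt not in visited:
            visited.add(nxt)
            total += count_walks(steps - 1, neighbors, nxt, visited)
            visited.remove(nxt)
    return total

print(len(CUBIC), len(FCC))
print([count_walks(n, CUBIC) for n in range(4)])
print([count_walks(n, FCC) for n in range(4)])
```

The close-packed counts grow by a factor of more than 10 per step even at these short lengths, which is why brute force runs out of reach so quickly in space.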
Packing spheres
     H. Steinhaus
Mathematical Snapshots
            Foldings in space




                   left      right
energy             15.6      37.8
time (seconds)     0         18
chains explored    8185      752057
                  Summary
•   The customer is satisfied
•   You can play with the applet
•   The software needs work
•   All the interesting questions are still open

								