CSE 182 Biological Data Analysis

					Protein sequencing and Mass Spectrometry
  Sample Preparation




Enzymatic digestion (trypsin) + fractionation
   Single Stage MS




Mass spectrometry




LC-MS: 1 MS spectrum / second
                                  Tandem MS




Secondary Fragmentation

         Ionized parent peptide
              The peptide backbone

             The peptide backbone breaks to form
             fragments with characteristic masses.



N-terminus   H...-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH   C-terminus
             AA residue i-1, AA residue i, AA residue i+1
                           Ionization



                             H+
N-terminus   H...-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH   C-terminus
             AA residue i-1, AA residue i, AA residue i+1




                       Ionized parent peptide
              Fragment ion generation



                             H+
N-terminus   H...-HN-CH(R_{i-1})-CO     NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH   C-terminus
             AA residue i-1, AA residue i, AA residue i+1




                       Ionized peptide fragment
                Tandem MS for Peptide ID
          88     145    292   405     534     663    778     907   1020   1166   b ions
           S      G      F     L       E       E      D       E      L      K
         1166   1080   1022   875     762     633    504     389    260    147   y ions


[Figure: tandem mass spectrum of the peptide; x-axis m/z (250 to 1000), y-axis % intensity (0 to 100); the [M+2H]2+ peak is marked.]
                             Peak Assignment
          88     145     292      405     534    663        778       907   1020      1166     b ions
           S      G       F        L       E      E          D         E      L         K
         1166   1080    1022      875     762    633        504       389    260       147     y ions

Peak assignment implies sequence (residue tag) reconstruction!

[Figure: the same spectrum with peaks labeled y2, b3, y3, b4, y4, b5, y5, b6, y6, b7, y7, b8, b9, y8, y9 and [M+2H]2+; x-axis m/z (250 to 1000), y-axis % intensity (0 to 100).]
     Database Searching for peptide ID



• For every peptide from a database
   – Generate a hypothetical spectrum
   – Compute a correlation between the hypothetical and
     experimental spectra
   – Choose the best-scoring peptide
• Database searching is very powerful and is the de
  facto standard for MS.
   – Sequest, Mascot, and many others
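
The search loop above can be sketched in a few lines. This is only an illustration under stated assumptions, not how Sequest or Mascot score: it uses nominal integer residue masses, the b/y offsets listed on the "Ion types, and offsets" slide (b = P+1, y = S+19), and a simple shared-peak count in place of a correlation. The function names and the three-peptide database are hypothetical.

```python
# Nominal (integer) residue masses; an assumption made for readability.
RESIDUE_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101,
                "C": 103, "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128,
                "K": 128, "E": 129, "M": 131, "H": 137, "F": 147, "R": 156,
                "Y": 163, "W": 186}

def hypothetical_spectrum(peptide):
    """Return the set of singly charged b- and y-ion positions for a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    peaks, prefix = set(), 0
    for m in masses[:-1]:                 # every internal cleavage site
        prefix += m
        peaks.add(prefix + 1)             # b-ion = prefix residue mass + 1
        peaks.add(total - prefix + 19)    # y-ion = suffix residue mass + 19
    return peaks

def shared_peak_count(observed, peptide, tol=0.5):
    """Count observed peaks that lie within tol of some hypothetical peak."""
    theo = hypothetical_spectrum(peptide)
    return sum(any(abs(o - t) <= tol for t in theo) for o in observed)

def best_match(observed, database):
    """Return the database peptide with the highest shared-peak count."""
    return max(database, key=lambda pep: shared_peak_count(observed, pep))

observed = [88, 145, 147, 274, 276, 333]
print(best_match(observed, ["SGEK", "GSEK", "KEGS"]))
```

A real scorer would also weigh peak intensities and penalize unexplained peaks; the shared-peak count is just the simplest stand-in for "compute a correlation".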
         Spectra: the real story


• Noise Peaks
• Ions, not prefixes & suffixes
• Mass to charge ratio, and not mass
   – Multiply charged ions
• Isotope patterns, not single peaks
Peptide fragmentation possibilities
            (ion types)


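[Figure: backbone segment -HN-CH(R_i)-CO-NH-CH(R'_{i+1})(R''_{i+1})- with cleavage sites labeled a_i, b_i, c_i, b_{i+1}, d_{i+1} (N-terminal fragments) and x_{n-i}, y_{n-i}, z_{n-i}, v_{n-i}, w_{n-i}, y_{n-i-1} (C-terminal fragments); the diagram groups the ion types into low energy fragments and high energy fragments.]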
          Ion types, and offsets


•   P = prefix residue mass
•   S = Suffix residue mass
•   b-ions = P+1
•   y-ions = S+19
•   a-ions = P-27
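
As a worked example of these offsets, the sketch below computes the b, y and a ladders for the peptide SGFLEEDELK from the earlier slides. It assumes nominal integer residue masses, so a value can differ by 1 from the rounded numbers printed on the slide (for instance the largest y-ion). The function name `ion_ladders` is hypothetical.

```python
# Nominal (integer) residue masses for the residues used here; an assumption.
RESIDUE_MASS = {"S": 87, "G": 57, "F": 147, "L": 113,
                "E": 129, "D": 115, "K": 128}

def ion_ladders(peptide):
    """Return (b, y, a) ion positions, one entry per internal cleavage site."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    b, y, a = [], [], []
    prefix = 0
    for m in masses[:-1]:
        prefix += m                      # P = prefix residue mass
        suffix = total - prefix          # S = suffix residue mass
        b.append(prefix + 1)             # b-ion = P + 1
        y.append(suffix + 19)            # y-ion = S + 19
        a.append(prefix - 27)            # a-ion = P - 27
    return b, y, a

b, y, a = ion_ladders("SGFLEEDELK")
print("b:", b)
print("y:", y)
print("a:", a)
```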
            Mass-Charge ratio


• The X-axis is (M+Z)/Z
  – Z=1 implies that peak is at M+1
  – Z=2 implies that peak is at (M+2)/2
     • M=1000, Z=2, peak position is at 501
  – Suppose you see a peak at 501. Is the mass 500, or is it
    1000?
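
The ambiguity in the last bullet can be made concrete with a small sketch. It uses only the relation from this slide, peak position = (M+Z)/Z; the function names and the set of charge states tried are assumptions.

```python
def peak_position(mass, charge):
    """Position on the m/z axis of an ion with the given mass and charge."""
    return (mass + charge) / charge

def candidate_masses(mz, charges=(1, 2, 3)):
    """Invert (M + Z) / Z = mz for each charge state Z that we consider."""
    return {z: mz * z - z for z in charges}

print(peak_position(1000, 2))            # the slide's example: M=1000, Z=2
print(candidate_masses(501, (1, 2)))     # masses consistent with a peak at 501
```

Without extra evidence, such as the spacing of the isotope pattern, both masses returned for the peak at 501 remain possible.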
         Spectral Graph


• Each prefix residue mass (PRM) corresponds to a node.
• Two nodes are connected by an edge if the mass difference is a residue mass.
• A path in the graph is a de novo interpretation of the spectrum.

[Figure: nodes 87 and 144 joined by an edge labeled G.]
                                  Spectral Graph
•   Each peak, when assigned to a prefix/suffix ion type generates a unique
    prefix residue mass.
•   Spectral graph:
     –   Each node u defines a putative prefix residue mass M(u).
     –   (u,v) in E if M(v)-M(u) is the residue mass of an a.a. (tag) or 0.
     –   Paths in the spectral graph correspond to an interpretation.




[Figure: spectral graph on a mass axis (ticks at 100, 200, 300) with nodes at 0, 87, 144, 146, 273, 275, 332, 401 and edges labeled S, G, E, K.]
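
A minimal sketch of the spectral graph construction, using the node masses from the figure. It assumes nominal integer residue masses, treats L/I and K/Q as indistinguishable, and omits the zero-mass edges mentioned above. The function name `spectral_graph` is hypothetical.

```python
from itertools import combinations

# Nominal (integer) residue masses; isobaric residues share one label.
RESIDUE_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101,
                "C": 103, "L/I": 113, "N": 114, "D": 115, "K/Q": 128,
                "E": 129, "M": 131, "H": 137, "F": 147, "R": 156,
                "Y": 163, "W": 186}
MASS_TO_RESIDUE = {mass: label for label, mass in RESIDUE_MASS.items()}

def spectral_graph(prms):
    """Map each edge (u, v), u < v, to the residue whose mass equals v - u."""
    edges = {}
    for u, v in combinations(sorted(prms), 2):
        label = MASS_TO_RESIDUE.get(v - u)
        if label is not None:
            edges[(u, v)] = label
    return edges

graph = spectral_graph([0, 87, 144, 146, 273, 275, 332, 401])
for (u, v), label in sorted(graph.items()):
    print(u, "->", v, label)
```

The path 0, 87, 144, 273, 401 reads S, G, E, K/Q. The graph also contains edges that are not on that path, which is why a scoring and path-selection step is still needed.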
              Re-defining de novo interpretation
•   Find a subset of nodes in spectral graph s.t.
     –   0, M are included
     –   Each peak contributes at most one node (interpretation)(*)
     –   Each adjacent pair (when sorted by mass) is connected by an edge (valid residue
         mass)
     –   An appropriate objective function (ex: the number of peaks interpreted) is
         maximized




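[Figure: spectral graph on a mass axis (ticks at 100, 200, 300) with nodes at 0, 87, 144, 146, 273, 275, 332, 401 and edges labeled S, G, E, K; an inset highlights the edge G between nodes 87 and 144.]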
                           Two problems

• Too many nodes.
  – Only a small fraction correspond to b/y ions (leading to true
    PRMs) (learning problem)
  – Even if the b/y ions were correctly predicted, each peak generates
    multiple possibilities, only one of which is correct. We need to find a
    path that uses each peak only once (algorithmic problem).
  – In general, the forbidden pairs problem is NP-hard




[Figure: spectral graph on a mass axis (ticks at 100, 200, 300) with nodes at 0, 87, 144, 146, 273, 275, 332, 401 and edges labeled S, G, E, K.]
                       However, ...


• The b,y ions have a special non-interleaving
  property
• Consider pairs (b1,y1), (b2,y2)
   – If (b1 < b2), then y1 > y2
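
One way to see this property: each peak yields two candidate PRMs, one if it is read as a b-ion and one if it is read as a y-ion, and these two nodes form a forbidden pair. The sketch below derives the pairs from the offsets b = P+1 and y = S+19 and checks that they are nested. It assumes M is the total residue mass of the peptide (401 for the S, G, E, K example); the function names are hypothetical.

```python
def candidate_prms(peak, parent_mass):
    """PRMs implied by reading one peak as a b-ion and as a y-ion."""
    as_b = peak - 1                       # b = P + 1, so P = peak - 1
    as_y = parent_mass - (peak - 19)      # y = S + 19 and P = M - S
    return as_b, as_y

def forbidden_pairs(peaks, parent_mass):
    """Sorted list of (low, high) node pairs that cannot both be used."""
    pairs = {tuple(sorted(candidate_prms(p, parent_mass))) for p in peaks}
    return sorted(pairs)

def is_nested(pairs):
    """True if increasing low nodes go with strictly decreasing high nodes."""
    highs = [high for _, high in sorted(pairs)]
    return all(x > y for x, y in zip(highs, highs[1:]))

pairs = forbidden_pairs([88, 145, 147, 274, 276, 333], 401)
print(pairs)
print(is_nested(pairs))
```

Under these offsets the two nodes of every pair sum to M + 18, which is why the pairs nest around a common midpoint.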
              Non-Intersecting Forbidden pairs




[Figure: mass axis with nodes at 0, 87, 100, 200, 300, 332, 400 and edges labeled S, G, E, K.]
•   If we consider only b,y ions, ‘forbidden’ node pairs are non-intersecting.
•   The de novo problem can then be solved efficiently using a dynamic programming
    technique.
     The forbidden pairs method


• There may be many paths that avoid forbidden
  pairs.
• We choose a path that maximizes an objective
  function,
   – EX: the number of peaks interpreted
     The forbidden pairs method


• Sort the PRMs according to increasing mass values.
• For each node u, f(u) denotes the node that forms a forbidden pair with u
• Let m(u) denote the mass value of the PRM.




[Figure: mass axis with nodes at 0, 87, 100, 200, 300, 332, 400; a node u and its forbidden partner f(u) are marked.]
            D.P. for forbidden pairs

• Consider all pairs u,v
   – m[u] <= M/2, m[v] >M/2
• Define S(u,v) as the best score of a pair of paths 0->u and v->M that
  uses no forbidden pair
• Is it sufficient to compute S(u,v) for all u,v?




[Figure: mass axis with nodes at 0, 87, 100, 200, 300, 332, 400; nodes u and v are marked.]
         D.P. for forbidden pairs


 • Note that the best interpretation is given by


                    \[ \max_{(u,v) \in E} S(u,v) \]



   0   87   100          200       300   332   400


                       u         v
                 D.P. for forbidden pairs

•   Note that we have one of two cases.
    1.       Either u < f(v) (and f(u) > v)
    2.       Or, u > f(v) (and f(u) < v)
•   Case 1.
    –        Extend u, do not touch f(v)


    \[ S(u,v) = \max_{u' :\, (u,u') \in E,\; u' \neq f(v)} S(u,u') + 1 \]

[Figure: mass axis with ticks at 0, 100, 200, 300, 400; nodes u, v and f(u) are marked.]
              The complete algorithm

for all u  /* increasing mass values from 0 to M/2 */
    for all v  /* decreasing mass values from M to M/2 */
        if (u > f[v])
            S[u,v] = max { S[w,v] + 1 : (w,u) in E, w != f(v) }
        else if (u < f[v])
            S[u,v] = max { S[u,w] + 1 : (v,w) in E, w != f(u) }
        if ((u,v) in E)
            /* maxI is the score of the best interpretation */
            maxI = max {maxI, S[u,v]}
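
The pseudocode above can be turned into a small runnable sketch. Several choices here are assumptions rather than part of the slides: nominal integer residue masses; f(u) = pair_sum - u with pair_sum = M + 18 (which follows from b = P+1 and y = S+19); a score that counts nodes other than 0 and M; and storing the partial paths alongside each score so the interpretation can be read back. The name `best_interpretation` is hypothetical.

```python
# Nominal (integer) residue masses; L/I and K/Q are isobaric at this precision.
RESIDUE_MASSES = {57, 71, 87, 97, 99, 101, 103, 113, 114, 115,
                  128, 129, 131, 137, 147, 156, 163, 186}

def best_interpretation(prms, pair_sum):
    """Return (score, path) for the best path 0 -> M with no forbidden pair.

    prms must contain 0 and the total mass M; f(x) = pair_sum - x.
    """
    nodes = sorted(prms)
    total = nodes[-1]

    def f(x):
        return pair_sum - x

    def is_edge(a, b):
        return (b - a) in RESIDUE_MASSES

    left = [x for x in nodes if x <= pair_sum / 2]             # increasing
    right = [x for x in reversed(nodes) if x > pair_sum / 2]   # decreasing

    # S[(u, v)] = (score, path 0..u, path v..M); only reachable states are kept.
    S = {(0, total): (0, (0,), (total,))}
    best = (float("-inf"), ())
    for u in left:
        for v in right:
            if (u, v) not in S:
                cands = []
                if u > f(v):
                    # u was added last: extend the prefix path by an edge w -> u.
                    for w in left:
                        if w < u and w != f(v) and is_edge(w, u) and (w, v) in S:
                            score, lp, rp = S[(w, v)]
                            cands.append((score + 1, lp + (u,), rp))
                elif u < f(v):
                    # v was added last: extend the suffix path by an edge v -> w.
                    for w in right:
                        if w > v and w != f(u) and is_edge(v, w) and (u, w) in S:
                            score, lp, rp = S[(u, w)]
                            cands.append((score + 1, lp, (v,) + rp))
                if not cands:
                    continue          # unreachable state, or u == f(v)
                S[(u, v)] = max(cands)
            score, lp, rp = S[(u, v)]
            if is_edge(u, v) and score > best[0]:
                best = (score, lp + rp)   # join the two halves across (u, v)
    return best

print(best_interpretation([0, 87, 144, 146, 273, 275, 332, 401], 419))
```

On the running example the best path is 0, 87, 144, 273, 401, which reads S, G, E, K/Q and never uses both nodes of a forbidden pair.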