Mass Spectrometry - PowerPoint 1 by liuqingyan


									L529 - Presentation


  - Yogita Mantri
  - Arvind Gopu
Introduction – What is Proteomics?

    “The identification, characterization and
   quantification of all proteins involved in a
particular pathway, organelle, cell, tissue, organ
 or organism that can be studied in concert to
provide accurate and comprehensive data about
                  that system.”
    Central lesson from eukaryotic genome projects

   Evolutionary complexity is not primarily determined by increasing
    the number of genes, but by increasing variation on the level of
    the synthesized proteins.
   This is achieved by generating MULTIPLE proteins from only
    ONE gene e.g. by
     different combinations of exons by alternative splicing

     post-translational protein processing (e.g. cleavage of pro-
     post-translational protein modifications (e.g. acetylation,
     modified central dogma: DNA --> RNA --> protein(s)

     It is therefore important to perform analyses at the level of gene
       PRODUCTS
   Key advantage of proteomics
       Researchers work at the level of gene products, i.e. with
        genes that are actually expressed to give a detectable
        PRODUCT, not merely "expressed" in the sense of producing
        a detectable mRNA, which leaves open whether a gene
        product exists at all.
   Key limitation of proteomics
       Usually, only a fraction of the proteins synthesized can
        be detected in a proteomics experiment, whereas the
        expression of ALL genes can be monitored in a whole-
        genome array experiment.
   Key prerequisite of proteomics
       A genome sequence for the investigated organism or at
        least a collection of many cDNA sequences is required.
Experimental Background

   Mass Spectrometry
                   What is Mass Spec?

   Analytical tool measuring molecular weight (MW) of
   Only picomolar concentrations required
   Within an accuracy of 0.01% of total weight of
    sample and within 5 ppm for small organic
   For an Mr of 40 kDa, 0.01% corresponds to a 4 Da error
   This means it can detect amino acid substitutions /
    post-translational modifications
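
The accuracy figures above translate directly into absolute mass errors. A minimal sketch of that arithmetic follows; the function names and the 500 Da small molecule are illustrative assumptions, not taken from the slides.

```python
def mass_error_da(mass_da: float, accuracy_percent: float) -> float:
    """Absolute mass error in Da for a relative accuracy given in percent."""
    return mass_da * accuracy_percent / 100.0


def ppm_error_da(mass_da: float, ppm: float) -> float:
    """Absolute mass error in Da for a relative accuracy given in ppm."""
    return mass_da * ppm / 1e6


# 0.01% accuracy on a 40 kDa protein: about 4 Da
print(mass_error_da(40_000.0, 0.01))

# 5 ppm accuracy on a hypothetical 500 Da small molecule: about 0.0025 Da
print(ppm_error_da(500.0, 5.0))
```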
            What sort of info is returned?

   Structural information can be generated
   Particularly using tandem mass spectrometers
   Fragment sample & analyse products
   Useful for peptide & oligonucleotide sequencing
   Plus identification of individual compounds in
    complex mixtures
    How does a Mass Spectrometer work?

   3 fundamental parts: the ionisation source, the
    analyser, the detector
   Samples easier to manipulate if ionised
   Separation in analyser according to mass-to-charge
    ratios (m/z)
   Detection of separated ions and their relative
   Signals sent to data system and formatted in a m/z
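
As a rough illustration of what the analyser separates on, the sketch below computes m/z for one molecule at several charge states. It assumes positive-mode ionisation by protonation; the 2000 Da neutral mass and the function name are hypothetical.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of a positive ion formed by adding `charge` protons to a neutral molecule."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


# The same hypothetical 2000 Da peptide appears at a different m/z per charge state
for z in (1, 2, 3):
    print(z, round(mz(2000.0, z), 4))
```

The higher the charge, the lower the m/z at which the same molecule is observed.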
                  Simplified Schematic

   The analyser, detector and ionisation source are under high
    vacuum to allow unhindered movement of ions
   Operation is under complete data system control
Schematic of a typical TOF-MS/MS
          Sample Introduction
             & Ionisation
   Direct into ionisation source or via
    chromatography for component
    separation (HPLC, GC, capillary
   Ionisation can be in positive mode (for
    proteins) or in negative mode (for
    saccharides and oligonucleotides)
              Ionisation methods
   Atmospheric Pressure Chemical Ionisation (APCI)
   Chemical Ionisation (CI)
   Electron Impact (EI)
   Electrospray Ionisation (ESI)
   Fast Atom Bombardment (FAB)
   Field Desorption / Field Ionisation (FD/FI)
   Matrix Assisted Laser Desorption Ionisation
    (MALDI) (Clemmer Group)
   Thermospray Ionisation
          Detection & Recording of Ions

   Detector monitors ion current, amplifies it and
    then transmits signal to data system
   Common detectors: photomultiplier, electron
    multiplier, micro-channel plate
Mass spectrometry is a very powerful method to
analyse the structure of organic compounds, but
suffers from 3 major limitations:

Compounds cannot be characterised without clean

The technique cannot provide sensitive and selective
analysis of complex mixtures

For big molecules like peptides, spectra are very complex
and very difficult to interpret
 Tandem MS or MS/MS has 2 mass spectrometers in series.

The first mass spectrometer (MS1) is used to SELECT, from the
primary ions, those of a particular m/z value, which then pass into
the Fragmentation Region. The ion selected by MS1 is the
parent ion; it can be a molecular ion or an ion resulting from a
primary fragmentation. DISSOCIATION occurs in the fragmentation
region. The daughter ions are analysed in the second
spectrometer (MS2). In effect, MS1 can be viewed as an ion
source for MS2.

[Schematic: MS1 -> fragmentation region -> MS2]
                  Peptide Sequencing
   Peptides of 2.5 kDa or less give best data
   Protein sample often taken from 2-D gels and digested
   A protein digest can be analysed as entire mix
   Initial MS spectrum showing Mr of all components in digest
    (peptide map) may be enough for a database search and
   Peptides fragmented along the amino acid backbone in tandem
    mass spectrometry
   Some peptides generate enough info for full sequence, others
    only generate partial sequences of 4-5 amino acids
   Often this “tag” sequence is sufficient for database identification
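
Fragmentation along the backbone can be illustrated by listing the singly charged b- and y-ion m/z values of a peptide. This is a minimal sketch using standard monoisotopic residue masses for a subset of amino acids; the peptide "PEPTK" and the function name are hypothetical.

```python
# Monoisotopic residue masses in Da (subset of the 20 standard amino acids)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
    "R": 156.10111,
}
WATER = 18.010565   # Da
PROTON = 1.007276   # Da


def fragment_ions(peptide):
    """Return (b_ions, y_ions): singly charged m/z values for each backbone cleavage."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)          # N-terminal fragment
        y_ions.append(sum(masses[i:]) + WATER + PROTON)  # C-terminal fragment
    return b_ions, y_ions


b, y = fragment_ions("PEPTK")  # hypothetical peptide
for i, (bi, yi) in enumerate(zip(b, y), start=1):
    print(f"cleavage {i}: b = {bi:.4f}, y = {yi:.4f}")
```

Consecutive b ions differ by exactly one residue mass, which is what allows a partial sequence "tag" to be read from the spectrum.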
Data Analysis
Common Data Analysis - Pipeline
          Issue #1 (Relatively Minor?)

   Diverse set of Mass Spectrometers…
       More flexibility BUT ...
       Different data formats
       Limited Data analysis possible
       Exchange of RAW datasets and creation of public
        repositories for the data/software? Not easy if not
           Work Around for Issue #1?

   To get around this problem
       Convert to ASCII text - speed and loss of precision can be
        an issue
       Other formats specific to this field
       A lot of XML-based file formats seem to be in circulation
       Of course, using an XML format (for example) gives rise to
        an additional level of complexity -- parsers, formatters, etc.
       It does add flexibility between data formats
       Indexing techniques used to speed up access
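
The loss of precision mentioned for ASCII conversion can be shown with a small round-trip test. This is only a sketch: the m/z readings and the choice of three decimal places are assumptions made for illustration.

```python
import struct

mz_values = [445.120025, 1001.507276, 1999.999999]  # hypothetical m/z readings

# Binary storage: 8-byte IEEE 754 doubles round-trip exactly
binary = struct.pack(f"<{len(mz_values)}d", *mz_values)
from_binary = list(struct.unpack(f"<{len(mz_values)}d", binary))

# ASCII storage with a fixed number of decimals: readable, but lossy
text = " ".join(f"{v:.3f}" for v in mz_values)
from_text = [float(token) for token in text.split()]

print("binary round-trip exact:", from_binary == mz_values)
print("ASCII text:", text)
print("largest ASCII round-trip error:",
      max(abs(a - b) for a, b in zip(mz_values, from_text)))
```

Writing more decimals reduces the error but increases file size and parsing time, which is the trade-off noted above.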
              Issue #2 (Much bigger!)

   Data Size
   Higher Dimensionality
   The combination is even deadlier!
       More detail in a minute … Before that …
   The LC/MSMS spectrum data looks like this:
     LC    Drift   TOF         Intensity
       i.e., 3-D + Intensity
                Issue #2 (Continued…)

   As a first step in data analysis:
       Find peaks in the LC/MSMS data
           "Peaks" is something of a misnomer.
           "Center of mass" (or something like that) is a better term.
           This illustrates the inherent non-uniformity of terminology
            within proteomics circles
           Easier said than done, as we found out!

   Let us start with the simpler case of finding peaks in 2-D
    data – a little more complicated than 1-D …
Peak Finding – 2-D data
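
One simple way to find peaks in gridded 2-D data is to look for cells that exceed all eight of their neighbours. The sketch below assumes the data has already been binned onto a regular grid; the grid values, the threshold and the function name are hypothetical, and this is not necessarily the method shown on the original slide.

```python
def find_peaks_2d(grid, threshold=0.0):
    """Return (row, col, intensity) for cells strictly greater than all 8 neighbours."""
    peaks = []
    n_rows = len(grid)
    n_cols = len(grid[0]) if grid else 0
    for r in range(n_rows):
        for c in range(n_cols):
            value = grid[r][c]
            if value <= threshold:
                continue  # ignore background cells
            neighbours = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(n_rows, r + 2))
                for cc in range(max(0, c - 1), min(n_cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c, value))
    return peaks


grid = [
    [0, 1, 0, 0],
    [1, 5, 1, 0],
    [0, 1, 0, 2],
    [0, 0, 2, 9],
]
print(find_peaks_2d(grid))  # [(1, 1, 5), (3, 3, 9)]
```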
    Peak Finding - Higher Dimensions?
   As mentioned earlier data is of the form:
       LC         Drift        TOF        Intensity
       i.e., 3-D + Intensity

   Add the huge data size to this, and it becomes clear how
    difficult the problem is
               Some Possible Solutions

   Solutions we thought about:
     Find peaks using a brute force approach

           Not computationally feasible in terms of time and
       Squeeze 3-D data into 2-D, find peaks and then work
           This is the algorithm implemented by Frank - one of the
            IU Chemistry folks
       Use existing implementations of graph functions
        available in packages (For example: LEDA) to
        preprocess data and then find peaks on smaller data
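
The idea of squeezing 3-D data into 2-D can be sketched by summing intensity over one axis. Collapsing the drift axis is an assumption made here for illustration; the slides do not say which axis was collapsed, and this is not a reconstruction of the algorithm mentioned above.

```python
from collections import defaultdict


def project_out_drift(records):
    """Collapse (lc, drift, tof, intensity) records to {(lc, tof): summed intensity}."""
    plane = defaultdict(float)
    for lc, _drift, tof, intensity in records:
        plane[(lc, tof)] += intensity
    return dict(plane)


# Hypothetical records in the LC / Drift / TOF / Intensity layout
records = [
    (1, 10, 500, 3.0),
    (1, 11, 500, 4.0),
    (1, 10, 501, 1.0),
    (2, 10, 500, 2.0),
]
print(project_out_drift(records))
```

Peaks found in the reduced plane would then have to be traced back to the drift values that contributed to them.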
          Our Peak Finding Algorithm

   Used the LEDA package for C++
   Specifically, made use of its O(n log n) implementation
    of the Delaunay Triangulation neighbor-finding
    algorithm in 3-D space
   Once neighbors were found, ran a brute-force
    peak-finding step
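
The two-step structure (find neighbors, then pick peaks by brute force) can be sketched in a few lines. Note the substitution: this sketch uses a brute-force fixed-radius neighbor search as a stand-in for the Delaunay neighbors, so it is not the LEDA-based implementation. The coordinates, intensities and radius are hypothetical.

```python
import math


def radius_neighbours(points, radius):
    """Brute-force O(n^2) neighbour lists; a stand-in for Delaunay neighbours."""
    neighbours = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= radius:
                neighbours[i].append(j)
                neighbours[j].append(i)
    return neighbours


def find_peaks(intensities, neighbours):
    """Indices whose intensity is strictly greater than that of every neighbour."""
    return [
        i for i, nbrs in neighbours.items()
        if nbrs and all(intensities[i] > intensities[j] for j in nbrs)
    ]


# Hypothetical (LC, drift, TOF) coordinates with one intensity per point
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (5, 5, 5), (5, 6, 5), (9, 9, 9)]
intensities = [10.0, 3.0, 4.0, 2.0, 8.0, 1.0]

nbrs = radius_neighbours(points, radius=1.5)
print(find_peaks(intensities, nbrs))  # [0, 4]
```

Isolated points with no neighbors are skipped here; whether they should count as peaks is a design choice the slides do not address.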
       How good were our results?
   More details? Take a look at our summer
    presentation at Chemistry
   Sample of the data … What does it look like?
               Peptide Assignment

   Find sequence of amino acids that can
    generate the list of masses seen in the
    tandem MS scan.
   Many different strategies:
       Searching MS/MS spectra against a sequence
        database (Sequest, Mascot, etc)
       De novo sequencing (no database!)
       Hybrid
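
The database-search strategy can be illustrated with a toy example that ranks candidate peptides by how many observed peaks match their theoretical fragment ions. Shared-peak counting is far simpler than the scoring used by Sequest or Mascot and is not their algorithm; the three-peptide database, the tolerance and the function names are hypothetical.

```python
# Monoisotopic residue masses in Da (small subset, enough for this toy database)
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
                "L": 113.08406, "K": 128.09496, "E": 129.04259}
WATER, PROTON = 18.010565, 1.007276


def theoretical_ions(peptide):
    """Singly charged b and y ions, interleaved as b1, y, b2, y, ..."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y ion
    return ions


def shared_peak_count(observed, peptide, tol=0.02):
    """Number of observed peaks within `tol` Da of any theoretical ion."""
    ions = theoretical_ions(peptide)
    return sum(any(abs(o - t) <= tol for t in ions) for o in observed)


def best_match(observed, database, tol=0.02):
    """Candidate peptide with the highest shared-peak count."""
    return max(database, key=lambda pep: shared_peak_count(observed, pep, tol))


database = ["GASK", "LEAK", "VSGK"]        # hypothetical sequence database
observed = theoretical_ions("LEAK")[::2]   # simulate a spectrum with only b ions
print(best_match(observed, database))      # LEAK
```

De novo sequencing would instead try to read the sequence from mass differences between peaks, without consulting a database.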
           Scoring Peptide Sequences

   Multiple search engines are available
       Sequest and Mascot
   They use different scoring algorithms
   Search outputs are not comparable
   Search outputs usually require expert
    validation …
    An example of a scoring system: SCOPE

   Probabilistic model for scoring tandem MS
    spectra against a peptide database
   Two stage model
   Uses dynamic programming
   Incorporates fragment ion probabilities, noisy
    spectra and instrument measurement error
   Details:
        /17/suppl_1/S13.pdf (Scoring Spectra section)
             Peptide Validation

   Validate peptide assignments made during
    the database search step.
   Obviously, the method used should be
    standardized and independent of the
    experimental and computational methods
                     Manual Validation

   Filtering by database search scores
       Problems:
           Filtering criteria vary among researchers
           Error rates are unknown
           Possible only on very small datasets
            Model Based Validation?

   Empirical Statistical Model to estimate
    accuracy …
       Anal. Chem 2002, 74, 5383 – 5392
   Employs Expectation Maximization and
    Machine Learning techniques
   Learns to distinguish correct from
    incorrect database search results
    Model Based Validation – EM algorithm

   Each peptide assignment evaluated w.r.t. all
    other assignments including incorrect ones
   Denote correct and incorrect assignments as
    (+) and (-); Scores as x_1, x_2 … x_s
                                 P(x_1, x_2 … x_s | +) * P(+)
       P(+ | x_1, x_2 … x_s) = ----------------------------------
                                 ∑ P(x_1, x_2 … x_s | i) * P(i)
       (sum over i = + and -)
Model Based Validation – EM algorithm
(Continued …)
   Replace search scores with discriminant
    function F
                     P(F | +) * P(+)
       P(+ | F) = ---------------------
                    ∑ P(F | i) * P(i)
   A number of probabilistic parameters are considered
   The distributions were ultimately approximated by
    Gaussian and Gamma distributions
   (More details are outside the scope of this presentation; please refer to the paper)
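
The Bayes step for P(+ | F) can be sketched numerically. This sketch assumes a Gaussian for correct assignments and a Gamma for incorrect ones, with scores shifted so that F > 0. All parameter values below are hypothetical; in the EM procedure they would be re-estimated from the data on each iteration rather than fixed.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Normal density, used here for P(F | +)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def gamma_pdf(x, shape, scale):
    """Gamma density for x > 0, used here for P(F | -)."""
    return (x ** (shape - 1) * math.exp(-x / scale)) / (math.gamma(shape) * scale ** shape)


def posterior_correct(f, prior_correct, pos_params, neg_params):
    """P(+ | F) = P(F | +) P(+) / sum over i of P(F | i) P(i)."""
    if f <= 0:
        raise ValueError("this sketch assumes scores shifted so that F > 0")
    numerator = gaussian_pdf(f, *pos_params) * prior_correct
    denominator = numerator + gamma_pdf(f, *neg_params) * (1.0 - prior_correct)
    return numerator / denominator


# Hypothetical parameters: (mean, sd) for correct, (shape, scale) for incorrect
POS = (5.0, 1.5)
NEG = (2.0, 1.0)
PRIOR = 0.3  # hypothetical fraction of correct assignments

for score in (1.0, 4.0, 8.0):
    print(score, round(posterior_correct(score, PRIOR, POS, NEG), 3))
```

A low discriminant score yields a probability near 0 and a high score a probability near 1, which is what makes the output usable as a filter with a known error rate.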
    Example of Automated Validation

   An example: ProteinProphet
   Compute probabilities that peptides assigned to
    MS/MS spectra are correct
   Learns distributions of search scores and peptide
    properties among correct and incorrect results
   The computed probabilities are claimed to be a true
    measure of the confidence!
   Combines probabilities of peptides assigned to
    MS/MS spectra to compute probability that
    corresponding proteins are present in the sample
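
One simple way to combine peptide probabilities into a protein probability is to ask how likely it is that at least one peptide assignment is correct. The sketch assumes the peptide identifications are independent; the actual tool may apply further adjustments that are not covered here, and the input values are hypothetical.

```python
def protein_probability(peptide_probs):
    """P(protein present) = 1 - product of (1 - p_i), assuming independent evidence."""
    prob_all_wrong = 1.0
    for p in peptide_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        prob_all_wrong *= 1.0 - p
    return 1.0 - prob_all_wrong


# Three hypothetical peptides assigned to the same protein: about 0.96
print(protein_probability([0.9, 0.5, 0.2]))
```

Several moderately confident peptides can therefore support a protein more strongly than any one of them does alone.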

   Assign a biological meaning to the output of
    the pipeline
Current Issues and Challenges
             After Proteomics…..
Functional Genomics

                 Slide adapted from
        Limitations of Proteomics

Experimental limitations:
    Large-scale protein analysis difficult because:
    -Proteins are fragile
    -They can exist in multiple isoforms
    -There is no protein equivalent of PCR for
    amplification of a small sample
Data Analysis Limitations:
 -Data contains a lot of noise that is difficult
 to separate from actual signal. This results
 in wastage of computing resources on
 searching for unlikely spectra.
 -Database searches for matching spectra
 only give scores, leaving manual
 intervention necessary for eliminating false
Biomedical limitations
-In practice, it is very difficult to trace the complete
   progression of a disease.
-Hence, using proteomics for monitoring the
   biochemistry of a disease is like using a photo
   camera to record a football match.
-Case of breast cancer research:
References and Further Reading
   Explains the whole process nicely -- article
   Mascot Home page -- help section
   Presentation about MS MS data
   Some info about drug discovery/economic issues n such:
   Paper on interpreting MSMS data
   How to estimate correctness of MS MS prediction -- EM !!!


   Others:

   Delaunay Triangulation:

   SCOPE paper -- screen PDF
Internet sites
    (Dr Alison E. Ashcroft at Leeds)
 (The American Society for Mass Spectroscopy)
 (Base Peak)
    Mass Spec tools

Internet sites :

Ionization Methods
Further Reading
1. For MALDI beginner:
2. For MALDI lab user:

3. For MALDI tutorial:

4. Ionization Methods 1:
5. Ionization Methods 2:
SELDI Web sites:
• Molecular Analytical Systems (MAS).
• Manufacturers of ProteinChip(R)
