An Introduction to Proteomics



 The PROTEin complement of the genOME.




              Judson Hervey
              UT-ORNL GST
             Graduate Student
             whervey@utk.edu
        What is Proteomics?
• Defined as “the analysis of the entire
  protein complement in a given cell, tissue,
  or organism.”
• Proteomics “also assesses activities,
  modifications, localization, and interactions
  of proteins in complexes.”
• Proteomes differ intrinsically across species
  and growth conditions.
           Alternative View

• Definition by Mike Tyers (U Toronto):
• Lumps everything “post-genomic” together,
  alluding to proteomics as “protein chemistry
  on an unprecedented, high-throughput scale.”
• Whichever definition you accept, consider
  proteomics the “next step” in modern biology.
                   Importance of Proteins:
• they serve as catalysts that maintain metabolic processes in the cell,
• they serve as structural elements both within and outside the cell,
• they are signals secreted by one cell or deposited in the extracellular
  matrix that are recognized by other cells,
• they are receptors that convey information about the extracellular
  milieu to the cell,
• they serve as intracellular signaling components that mediate the
  effects of receptors,
• they are key components of the machinery that determines which
  genes are expressed and whether mRNAs are translated into
  proteins,
• they manipulate DNA and RNA through processes such as DNA
  replication, DNA recombination, and RNA splicing or editing.
http://www-users.med.cornell.edu/~jawagne/proteins_&_purification.html
  But what about the Genome?
• What does having the genome of an organism
  give us?
   – A great diagram, or “blueprint,” of the genes
     within an organism.
   – Think of the genome as code that needs to be
     compiled into functional units.
   – The genome gets “compiled” into the proteome
     via the central dogma of molecular biology.
   – Proteomic strategies use information from the
     genome to infer protein function.
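To make the “compiler” analogy concrete, here is a minimal Python sketch of the two steps of the central dogma for a coding sequence: transcription (DNA to mRNA) and translation (mRNA to protein) using the standard genetic code. The function names are hypothetical, and the sketch deliberately ignores introns, splicing, and post-translational modifications, which are exactly what make the real proteome harder to predict than this.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order
# (first base varies slowest, third base fastest); "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a DNA coding strand (T is replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon: the chain is released
            break
        protein.append(aa)
    return "".join(protein)


print(translate(transcribe("ATGGCCATTGTAATGGGCCGCTGA")))  # MAIVMGR
```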
                Experimental Platforms
Tyers and Mann, pg 194




      “Systems biology is an approach to studying complex biological systems made possible
      through technological breakthroughs such as the human genome project. …systems biology
      simultaneously studies the complex interaction of many levels of biological information to
      understand how they work together.”         http://www.systemsbiology.org/
      Challenges facing Proteomic Technologies


• Limited/variable sample material
• Sample degradation (occurs rapidly, even during sample
  preparation)
• Vast dynamic range required
• Post-translational modifications (often skew results)
• Specificity among tissue, developmental, and temporal
  stages
• Perturbations by environmental conditions (disease,
  drugs)
• By comparison, researchers have deemed sequencing the
  genome “easy,” because PCR helped overcome many of
  these issues in genomics.
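The contrast with PCR can be made concrete. In the idealized case each PCR cycle doubles the DNA template, so a handful of molecules becomes billions after about 30 cycles; proteins have no comparable amplification step, so limited sample material and a wide dynamic range must be handled as they are. A minimal Python sketch; the function names and the example abundances are hypothetical, and real PCR efficiency is below the idealized 1.0.

```python
import math


def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after PCR, where each cycle multiplies the template by (1 + efficiency).

    efficiency = 1.0 is the idealized case of perfect doubling every cycle.
    """
    return initial_copies * (1.0 + efficiency) ** cycles


def dynamic_range_orders(highest: float, lowest: float) -> float:
    """Orders of magnitude between the most and least abundant analyte."""
    return math.log10(highest / lowest)


# Idealized: 10 template molecules after 30 perfect cycles.
print(pcr_copies(10, 30))  # 10737418240.0

# Hypothetical abundances (arbitrary units) of two proteins in one sample.
print(dynamic_range_orders(1e9, 1e2))
```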
The Peptide Bond



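Each peptide bond forms by condensation: the carboxyl group of one amino acid joins the amino group of the next and one water molecule is released. That is why peptide masses are usually computed from residue masses (amino acid minus H2O) plus one water for the free N- and C-termini. A minimal Python sketch using standard monoisotopic residue masses in daltons; the function name is hypothetical, and modified residues are not handled.

```python
# Monoisotopic residue masses in daltons (free amino acid minus one H2O).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # monoisotopic mass of H2O


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide.

    n residues are joined by n - 1 peptide bonds, each releasing one H2O;
    summing residue masses and adding back a single H2O accounts for that.
    """
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


print(round(peptide_mass("GA"), 5))  # 146.06913
```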
          Protein Structure







Figure 3-35. Three levels of organization of a
protein. (Alberts - Molecular Biology of the Cell)
Amino Acid Properties



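The figure for this slide did not survive, but one widely used way to put a number on side-chain properties is the Kyte-Doolittle hydropathy scale (positive values are hydrophobic, negative values hydrophilic). A minimal Python sketch computing the average hydropathy (GRAVY score) of a sequence; the function name is hypothetical, and this is only one of several published property scales.

```python
# Kyte-Doolittle hydropathy index for each amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return sum(values) / len(values)


print(gravy("IVL") > 0)   # True: hydrophobic stretch
print(gravy("DEKR") < 0)  # True: charged, hydrophilic stretch
```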
    Pillar Proteomic Technologies

• Amino Acid Composition
• Array-based Proteomics
• 2D PAGE
• Mass Spectrometry
• Structural Proteomics
• Informatics (and the challenges facing the
  Human Proteome Project)
  Amino Acid Composition (Edman)

• Pioneering method of obtaining
  information from proteins.
• Cumbersome and tedious by today’s
  standards.
• Requires the use of foul-smelling
  β-mercaptoethanol.
• Not high-throughput, so amino acid
  composition analysis is no longer the
  most widely used technique.
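Classical amino acid analysis hydrolyzes the protein and quantifies the released amino acids, so its output is a composition rather than a sequence. The computational counterpart for a known sequence is a simple tally, sketched here in Python (the function name is hypothetical). The sketch is idealized: acid hydrolysis itself does not distinguish Asn from Asp or Gln from Glu, which a plain count ignores.

```python
from collections import Counter


def composition_mol_percent(sequence: str) -> dict[str, float]:
    """Mole percent of each amino acid in a protein sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in sorted(counts.items())}


print(composition_mol_percent("GAGAS"))  # {'A': 40.0, 'G': 40.0, 'S': 20.0}
```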
   Protein Sequencing
step 1, fragmenting into peptides
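The slide does not name the fragmentation method; a common choice, assumed here, is digestion with trypsin, which cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). A minimal Python sketch of that idealized rule with a hypothetical function name; real digests also contain missed cleavages.

```python
def tryptic_digest(protein: str) -> list[str]:
    """Split a protein on the C-terminal side of K or R, except before P."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):  # trailing peptide that does not end in K/R
        peptides.append(protein[start:])
    return peptides


print(tryptic_digest("MAKPGRSTKLE"))  # ['MAKPGR', 'STK', 'LE']
```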
Protein Sequencing
step 2, sequencing the peptides by Edman degradation.




The released PTH-amino acids are separated by HPLC and detected by absorbance at 269 nm.
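Edman degradation reads a peptide from its N-terminus one residue per cycle: the terminal residue is derivatized, cleaved off, and identified by HPLC, and the shortened peptide enters the next cycle. Because no cycle is perfectly efficient, the fraction of chains still “in phase” falls geometrically, which is one reason the method is slow and limited to fairly short reads. A minimal Python sketch; the function name and the default 0.95 per-cycle (repetitive) yield are illustrative assumptions, not measured values.

```python
from typing import Iterator


def edman_cycles(peptide: str, repetitive_yield: float = 0.95) -> Iterator[tuple[int, str, float]]:
    """Yield (cycle, residue identified, fraction of chains still in phase).

    Each cycle removes and identifies the current N-terminal residue;
    repetitive_yield is the assumed efficiency of every cycle.
    """
    for cycle, residue in enumerate(peptide, start=1):
        # Chains that came through all earlier cycles intact.
        in_phase = repetitive_yield ** (cycle - 1)
        yield cycle, residue, in_phase


for cycle, residue, fraction in edman_cycles("GAVLK"):
    print(cycle, residue, round(fraction, 3))
```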
     Array-based Proteomics
• Employ two-hybrid assays
• Use GFP, FRET, and GST
   – GFP = green fluorescent protein
   – FRET = fluorescence resonance energy
     transfer
   – GST = glutathione S-transferase, a
     well-characterized protein used as a
     marker protein.
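FRET is useful for detecting interactions because energy transfer falls off steeply with distance. Its efficiency E depends on the donor-acceptor distance r and the Förster radius R0 (the distance at which E = 0.5) as E = 1 / (1 + (r/R0)^6). A minimal Python sketch; the 5 nm R0 in the example is an illustrative assumption, since R0 depends on the particular fluorophore pair.

```python
def fret_efficiency(distance_nm: float, forster_radius_nm: float) -> float:
    """Energy-transfer efficiency E = 1 / (1 + (r / R0)**6)."""
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


R0 = 5.0  # assumed Förster radius in nm (depends on the donor/acceptor pair)
for r in (2.5, 5.0, 10.0):
    print(r, round(fret_efficiency(r, R0), 3))
```

Doubling the distance beyond R0 drops the efficiency to a few percent, which is why a FRET signal implies the two labeled proteins are in close contact.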
      Array-based Proteomics

• Offer a high-throughput technique for
  proteome analysis.
• These small plates can hold many
  different samples at once.
• Research is ongoing at ORNL to interface
  array methodologies with mass spectrometry.
Two-Hybrid Assay




                   Figure 12-35. Griffiths et al., Modern Genetic Analysis.
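In a two-hybrid assay (classically in yeast), a “bait” protein is fused to the DNA-binding domain of a transcription factor and a “prey” protein to its activation domain; the reporter gene is expressed only when bait and prey interact and bring the two domains together. Array-based screens test many bait/prey pairs in parallel. The Python sketch below models only that readout logic; the protein names and the interaction set are hypothetical.

```python
from itertools import product

# Hypothetical interacting pairs; frozenset makes the check order-insensitive.
TRUE_INTERACTIONS = {frozenset(("ProtA", "ProtB")), frozenset(("ProtC", "ProtD"))}


def reporter_active(bait: str, prey: str) -> bool:
    """Reporter is expressed only if bait and prey interact."""
    return frozenset((bait, prey)) in TRUE_INTERACTIONS


def screen(baits: list[str], preys: list[str]) -> list[tuple[str, str]]:
    """Test every bait against every prey, as in an arrayed matrix screen."""
    return [(b, p) for b, p in product(baits, preys) if reporter_active(b, p)]


print(screen(["ProtA", "ProtC"], ["ProtB", "ProtD", "ProtE"]))
# [('ProtA', 'ProtB'), ('ProtC', 'ProtD')]
```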
                2D PAGE
• 2-D gel electrophoresis is a
  multi-step procedure that can be used to
  separate hundreds to thousands of
  proteins with extremely high resolution.
• It separates proteins first by isoelectric
  point (pI) using an immobilized pH gradient
  (first dimension: isoelectric focusing) and
  then by molecular weight (MW) in the second
  dimension (SDS-PAGE).
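The first dimension works because each protein has a pH at which its net charge is zero (its pI) and it stops migrating. That pH can be estimated from the sequence with the Henderson-Hasselbalch equation and a root search. A minimal Python sketch; the pKa values are one approximate set among several in use (different tools give different pIs), and the function names are hypothetical.

```python
# Approximate pKa values for ionizable groups (one of several published sets).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 0.0
    for group in ["Nterm", "Cterm", *sequence.upper()]:
        if group in PKA_POSITIVE:    # protonated form carries +1
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[group]))
        elif group in PKA_NEGATIVE:  # deprotonated form carries -1
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[group] - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisect for the pH (0-14) where the net charge is zero.

    Net charge decreases monotonically with pH, so bisection converges.
    """
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


print(round(isoelectric_point("GKKRG"), 2), round(isoelectric_point("GDDEG"), 2))
```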
                 2D PAGE

• The 2-D gel electrophoresis process consists
  of these steps:
• Sample preparation
• First dimension: isoelectric focusing
• Second dimension: gel electrophoresis
• Staining
• Image analysis via software
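Image-analysis software must first locate spots on the stained gel. A deliberately simple Python sketch treats a spot as a pixel that is above a threshold and brighter than all eight neighbors; the function name, the threshold, and the toy image are hypothetical, and real packages also subtract background, model spot shape, and match spots across gels. Under this rule two overlapping spots can merge into a single maximum, which mirrors one of the drawbacks of the technique.

```python
def find_spots(image: list[list[float]], threshold: float) -> list[tuple[int, int]]:
    """Return (row, col) of pixels above threshold and brighter than all neighbors."""
    rows, cols = len(image), len(image[0])
    spots = []
    for r in range(rows):
        for c in range(cols):
            value = image[r][c]
            if value <= threshold:
                continue
            # Collect the up-to-8 surrounding pixels, clipped at the image edges.
            neighbors = [
                image[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbors):
                spots.append((r, c))
    return spots


gel = [
    [0, 1, 0, 0, 0],
    [1, 9, 1, 0, 0],
    [0, 1, 0, 2, 1],
    [0, 0, 2, 7, 2],
    [0, 0, 1, 2, 0],
]
print(find_spots(gel, threshold=5))  # [(1, 1), (3, 3)]
```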
2D PAGE of human (Hs) plasma








http://us.expasy.org/ch2dothergifs/publi/elc.gif
        Drawbacks of 2D PAGE
• Results are difficult to reproduce
  reliably.
• Spots often overlap, making identifications
  difficult.
• More of “an art” than “a science.”
• Slow and tedious.
• The process contains many “open” phases
  where contamination is possible.
       Structural Proteomics
• Pioneering work by Baumeister et al. is
  under way that can significantly reduce
  the painstaking labor involved in
  crystallizing proteins.
• Current techniques are not considered
  “high throughput” within the structural
  realm.
• Novel solutions combine current
  technologies, such as NMR and X-ray
  crystallography (XRC).
                       Informatics
• Significant improvements are needed in:
   – Data presentation standards and formatting
   – Software infrastructure
       • ISB (Institute for Systems Biology) has created many
         powerful software packages that interpret data from
         different techniques.
• EBI and HUPO have joined forces to promote uniform data storage
  and analysis:
   – http://psidev.sourceforge.net
• Over the past four years, the proteomics community has become
  slightly “less proprietary.” Ron Beavis of the University of
  Manitoba has developed X! Tandem, an open-source search
  algorithm, as an alternative to SEQUEST.
• Developing novel analysis software, and strategies that help
  biologists manage the data, are two fronts I see as
  opportunities for people with a CS background.
             Clinical Proteomics
• This area of proteomics focuses on accelerating
  drug development through the systematic
  identification of potential drug targets.
• How could this be accomplished?
• Hopefully, in the coming years we will have more
  specific information than raw genes, which will
  make the complex differential equations that
  model these systems much simpler.
            Mass Spectrometry
• Mass spectrometry is another tool to analyze
  the proteome.
• In general, a mass spectrometer consists of:
   – Ion source
   – Mass analyzer
   – Detector
• Mass spectrometers measure the mass-to-charge
  (m/z) ratios of ions.
• From these measurements, masses are determined,
  proteins are identified, and further analyses are
  performed.
   “Mass Spec” Analyses can be run in Tandem


• MS/MS refers to two stages of mass analysis
  performed “in tandem”: ions selected in the
  first stage are fragmented, and the fragments
  are analyzed in the second.
• Among other things, MS/MS allows the
  determination of sequence information,
  usually in the form of peptides (small parts
  of a protein).
• Algorithms use this information to identify a
  protein from the masses of its constituent
  peptides and their fragments.
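During MS/MS, peptides fragment mainly along the backbone amide bonds, giving N-terminal b ions and C-terminal y ions; within each series neighboring ions differ by exactly one residue mass, and that ladder is what search algorithms compare against. A minimal Python sketch for singly charged b and y ions of an unmodified peptide, using standard monoisotopic masses; the function name is hypothetical.

```python
# Monoisotopic residue masses (Da) and constants.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def by_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Singly charged m/z values for b1..b(n-1) and y1..y(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide.upper()]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        # b ion: first i residues plus the added proton.
        b_ions.append(sum(masses[:i]) + PROTON)
        # y ion: last i residues plus one H2O (C-terminus) plus the proton.
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


b, y = by_ions("GASP")
print([round(m, 4) for m in b])
print([round(m, 4) for m in y])
```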
              If you are lost…
• Consider an example: determining a person’s
  weight without them knowing.
• If we have a backpack that we know weighs 10
  pounds, we could have them put it on.
• Then, walk the subject over a scale hidden in the
  floor.
• The person’s weight is the scale reading minus the
  weight of the backpack.
         In a similar manner:

• Mass spectrometers determine the
  mass-to-charge ratio (m/z) of the analyte.
• By knowing the charge state of the
  analyte, which comes from the protons
  added to it (the backpack in the example),
  the mass can be calculated after
  deconvolution of the spectrum.
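In the analogy, each added proton is a backpack of known weight (about 1.00728 Da), and the instrument reports mass divided by charge. Knowing the charge state z, the neutral mass M follows from m/z = (M + z × 1.00728) / z. The charge itself can be read from the isotope envelope, whose peaks are spaced by about 1.00336 Da (the 13C minus 12C mass difference) divided by z. A minimal Python sketch with hypothetical function names:

```python
PROTON = 1.00728        # mass of a proton, Da
ISOTOPE_STEP = 1.00336  # 13C minus 12C mass difference, Da


def mz_from_mass(neutral_mass: float, charge: int) -> float:
    """m/z of an [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


def mass_from_mz(mz: float, charge: int) -> float:
    """Neutral mass: multiply by z, then subtract the z added protons (the backpacks)."""
    return mz * charge - charge * PROTON


def charge_from_isotope_spacing(spacing: float) -> int:
    """Charge state from the m/z spacing between adjacent isotope peaks."""
    return round(ISOTOPE_STEP / spacing)


mz = mz_from_mass(1000.0, 2)
z = charge_from_isotope_spacing(0.5017)
print(mz, z, mass_from_mz(mz, z))
```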
LCQ Mass Spectrometer
  Example MS/MS Spectrum




This spectrum shows the fragmentation of a peptide; a search
algorithm uses the fragment ions to determine the peptide’s
sequence.
Typical MS experiment:
     Algorithmic approaches to “tag” identification


• Peptide sequence tags (Mann): extract an
  unambiguous sequence tag for identification.
• Cross-correlation (Eng et al., SEQUEST): comparison
  between observed and theoretically generated spectra.
• Probability-based matching (Perkins, and the proprietary
  Mascot by Matrix Science): takes into account the
  statistical significance of fragment-ion matches.
• Which one of these approaches would you employ?
  (Hint: discussion question.)
• Could dynamic programming (DP) be employed to search for
  post-translational modifications in future designs?
• Could PTMs be accounted for in advance to speed up
  the search?
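The peptide-sequence-tag idea can be sketched directly: within one fragment-ion series, the gap between neighboring peaks equals the mass of one residue, so reading the gaps spells out a short stretch of sequence that narrows the database search. A minimal Python sketch; the function name, the 0.02 Da tolerance, and the example peaks (b ions computed for a hypothetical peptide GASPK) are assumptions, and real spectra also require deciding which peaks belong to the same series.

```python
# Monoisotopic residue masses (Da). Leucine and isoleucine have identical
# masses, so "L" here stands for either one.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_sequence_tag(peaks: list[float], tolerance: float = 0.02) -> str:
    """Read residues from mass gaps between consecutive peaks of one ion series.

    peaks must be sorted by m/z; a gap matching no residue is marked "?".
    """
    tag = []
    for lower, upper in zip(peaks, peaks[1:]):
        gap = upper - lower
        match = next(
            (aa for aa, mass in RESIDUE_MASS.items() if abs(gap - mass) <= tolerance),
            "?",
        )
        tag.append(match)
    return "".join(tag)


# b2, b3, b4 ions (singly charged) of the hypothetical peptide GASPK.
peaks = [129.06585, 216.09788, 313.15064]
print(read_sequence_tag(peaks))  # SP
```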
De novo algorithms
Sequence tagging Algorithms
+/- Sequence Tagging Algorithms
Other Proteomic Tools FYR
                         My $0.02:
• Proteomics is undoubtedly a critical component of systems biology;
  however:
   – The lack of hypothesis-driven experiments isn’t necessarily
     “good” for science. Discovery-based science should be guided
     by hypotheses, IMO.
   – Along these lines, as with the HGP, what do you do when it
     comes to the literature: just publish the whole thing?
       • What to do with all of this information is another
         stumbling block.
   – Proteomics needs its “own PCR,” or “miracle” tool, to increase
     throughput.
       • A new technology, or an instrument that combines other
         approaches, would be useful, especially in structural
         proteomics, quantification, and sample reproduction.
              References

• Nature Insight: Proteomics. Nature 422:
  191-237.
• Zhu, H. et al. Proteomics. Annual Review
  of Biochemistry 72: 783-812.
• Griffiths et al. Modern Genetic Analysis.
  Online: http://ncbi.nih.gov