Proteomics Data analysis by malj



       Sanjoy Dey
iTRAQ labeling
Isobaric labeling
Case Study 1
• Goal:

  – Identify a biosignature of alveolar epithelial cell repair and
    migration.

  – This can shed light on lung injury.

  – Relate molecular networks and pathways to physiological
    states.

  – Identify molecular targets for drug design.
Case Study 1
• Experimental design:

   – Isolated alveolar epithelial cells, comparing lung injury and recovery.

   – Hyperoxia exposure during the first three time points.

   – Recovery in room air during the later time points.

   – iTRAQ labeling with respect to a control without any hyperoxia
     exposure.

   – Two sets of proteins: soluble and membrane.

   – Three replicate sets for each of the runs.
Case Study 1

Column-group labels on the slide: Run1, Run2, Run3. Rows are proteins.

Name     24 Hr   48 Hr   60 Hr   12 R   36 R   96 R   7 Day R
Msn      1.24    1.48    1.08    0.94   1.58   1.26   1.47
Alcam    1.23    1.61    1.19    1.05   1.21   1.25   1.11
Cd9      1.26    1.33    1.29    1.10   1.56   1.16   1.24
Actg1    0.94    0.89    1.07    1.26   0.98   0.91   0.82
Actg2    0.98    1.26    1.13    1.10   1.34   1.02   1.23
Actg3    1.04    0.91    1.07    0.98   1.40   0.80   0.77




     - 740 proteins were identified, with 293 in the soluble and 609 in the insoluble fraction.
     - Proteins passed a false discovery rate of 1% with a fold-change threshold (>1.2)
       and an associated P-value of at most 0.05 by ProteinPilot 3.
     - 74 proteins passed for the soluble and 294 for the insoluble fraction.
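
The threshold filter described above can be sketched as follows. This is only an illustration, not the ProteinPilot procedure: the `ProteinCall` type and `passes_filter` function are hypothetical names, the p-values in the usage example are made up, and treating the fold-change threshold symmetrically (above 1.2 or below 1/1.2) is an assumption the slides do not state.

```python
from typing import NamedTuple


class ProteinCall(NamedTuple):
    name: str
    fold_change: float  # iTRAQ ratio versus the unexposed control
    p_value: float


def passes_filter(call, fc_threshold=1.2, alpha=0.05):
    """Keep a protein whose ratio is beyond the threshold and whose p-value is <= alpha.

    Assumption: down-regulation counts too, so ratios below 1/fc_threshold pass.
    """
    changed = (call.fold_change > fc_threshold
               or call.fold_change < 1.0 / fc_threshold)
    return changed and call.p_value <= alpha


# Ratios taken from the table above; the p-values are invented for illustration.
calls = [
    ProteinCall("Msn", 1.58, 0.01),
    ProteinCall("Actg1", 0.98, 0.01),
    ProteinCall("Actg3", 0.77, 0.03),
    ProteinCall("Cd9", 1.56, 0.20),
]
kept = [c.name for c in calls if passes_filter(c)]
print(kept)
```

Only proteins that clear both the fold-change and the p-value criterion are kept; a large ratio with a weak p-value is dropped.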
       Data quality evaluation




• Computed the Hamming distance/similarity between runs for the
  72 soluble proteins.
• Binarized the data at fold change > 1.
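
A minimal sketch of this comparison, assuming the similarity is the fraction of proteins whose binarized call (fold change above 1 or not) agrees between two runs. The function names are hypothetical, and the `run2` values are invented for illustration; `run1` reuses the 24 Hr column of the fold-change table.

```python
def binarize(ratios, threshold=1.0):
    """Map each fold change to 1 if it exceeds the threshold, else 0."""
    return [1 if r > threshold else 0 for r in ratios]


def hamming_similarity(a, b):
    """Fraction of positions at which two equal-length binary vectors agree."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if not a:
        raise ValueError("vectors must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


run1 = [1.24, 1.23, 1.26, 0.94, 0.98, 1.04]  # 24 Hr column of the table
run2 = [1.10, 0.95, 1.30, 0.90, 1.05, 1.20]  # hypothetical second run

similarity = hamming_similarity(binarize(run1), binarize(run2))
print(round(similarity, 2))
```

The Hamming distance is the complementary count of disagreeing positions, so a similarity of 1.0 means both runs call every protein the same way.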
                   Data quality evaluation
            Run1 & Run2   Run1 & Run3   Run2 & Run3
1           0.76          0.57          0.62
2           0.70          0.80          0.70
3           0.71          0.70          0.83
4           0.80          0.49          0.71
5           0.77          0.81          0.97
6           0.62          0.60          0.22
7           0.02          0.59          0.46




    • Data agree between the runs qualitatively but not quantitatively.

    • Different environmental effects may have led to different experimental
      bias.
     Some groups of biomarkers


[Figure with two panels: Panel A and Panel B]

Figure: A cluster of 25 proteins included ten proteins with greater than 1.5-fold change at
36 hours of recovery (Panel A). These proteins participate in different cellular processes
(Panel B).
Case Study 2: Enhancing Prostate Cancer Diagnosis
• Current approach: the current diagnosis technique using
  Prostate Specific Antigen (PSA) has several limitations:
  – Lack of specificity
  – Cancer missed in biopsy-negative patients
  – Inability to distinguish aggressive and latent prostate cancer
• Our approach: finding proteomic biomarkers for prostate cancer
  which can be used for improving the specificity of the current
  diagnosis technique.
  – Can take advantage of the field effect of the tumor
  – Can detect malignancy-associated changes in normal cells
  – Can discover novel proteins that are altered by cancer and/or
    related to Gleason grade
• Frozen tissue blocks of 7 prostates were collected.

• Identified four tissue areas of interest:
  1. Cancer (Ca)
  2. Benign close to cancer (BN)
  3. Benign distant to cancer (BD)
  4. Benign prostatic hyperplasia (BPH)
Mass Spec analysis
• We performed 2D liquid chromatography.
• 8-plex iTRAQ labeling scheme.
• Peptide identification through tandem mass spectrometry (MS/MS).
• ProteinPilot 3 software was used for protein quantification with
  p-value < 0.05 (FDR corrected).

iTRAQ La… (slide table; heading and columns are cut off at the right edge in the source)

Run   113      114      115      116      1…
1     13-CaP   8-CaP    8-BN     8-BD     13-BN
2     16-CaP   16-BN    16-BD    16-BPH   17-BD
3     34-CaP   34-BN    34-BD    34-BPH   36-BD
4     10-CaP   10-BD    10-BPH   10-BN    33-CaP
Methods
Challenges:
−   Somewhat different proteins are identified in different iTRAQ runs
−   Contains relative abundances rather than absolute values
−   Many pairs of comparison, e.g., BPH vs. Ca, BPH vs. BN, etc.
−   Extremely low sample size (n<<p)
−   Missing values for most of the proteins

Method:
•   Biomarker discovery was conducted for each pair of the four regions
•   Data was normalized to reduce the sample variation
•   Both parametric (one-sided t-test) and non-parametric (signed-rank)
    hypothesis tests
•   Multiple hypothesis testing was corrected for using FDR with p-value < 0.05
•   Missing value imputation using the K-nearest neighbor impute algorithm
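
Two of the method steps above, missing-value imputation and FDR correction, can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline: the slides do not say which FDR procedure or which distance was used, so the Benjamini-Hochberg procedure and a root-mean-square distance over shared columns are assumed here. `knn_impute`, `row_distance`, and `benjamini_hochberg` are hypothetical names, and the example data are invented.

```python
import math


def row_distance(a, b):
    """Root-mean-square difference over the columns observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(matrix, k=2):
    """Fill each None with the column mean over the k nearest rows (proteins)."""
    imputed = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours must have column j observed.
            candidates = sorted(
                (row_distance(row, other), other[j])
                for n, other in enumerate(matrix)
                if n != i and other[j] is not None
            )
            nearest = [v for d, v in candidates[:k] if d < math.inf]
            if nearest:  # leave the gap if no usable neighbour exists
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented example: rows are proteins, columns are samples, None is missing.
ratios = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 5.0],
    [9.0, 9.0, 100.0],
]
filled = knn_impute(ratios, k=2)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q < 0.05 for q in adjusted]
print(filled[0], adjusted, significant)
```

With n much smaller than p, imputing across proteins (rows) rather than across samples is the usual choice, since there are far more proteins than samples to serve as neighbours.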
Results

[Figure labels: Ca, BN, BD, BPH]

Cardiovascular System Development and Function, Organism Development, Tissue Morphology
 Summary on iTRAQ data analysis
• Data quality is not great.

• Variability among different runs.

• Some proteins are inherently abundant (Wang et al., Proteomics 2009).

• Extremely low sample size.

• Statistical power is low.

• Finding interaction between proteins is hard.

• Contains only relative abundance rather than absolute abundance.

• Prior knowledge about the pathway from other sources can be
  incorporated.
Acknowledgement
• Michael Wilson
• Chris Wendt
• Pratik D. Jagtap
• LeeAnn Higgins
• Lorraine Anderson
• Maneesh Bhargava
• Trisha L. Becker
• Gaurav Pandey

References
• Oberg, A. et al. Statistical design of quantitative mass spectrometry-based
  proteomic experiments. J Proteome Res 2009.
• Roy, P. et al. Protein mass spectra data analysis for clinical biomarker
  discovery: a global review. Briefings in Bioinformatics 2010.
• de Jong, E.P. et al. Quantitative proteomics reveals myosin and actin as
  promising saliva biomarkers for distinguishing pre-malignant and malignant
  oral lesions. PLoS ONE 2010.
• Hill, E.G. et al. A statistical model for iTRAQ data analysis.
  J Proteome Res 2008.
• Liu, J. et al. Bayesian mass spectra peak alignment from mass charge ratios.
  Cancer Informatics 2008.
• Barla, A. et al. Machine learning methods for predictive proteomics.
  Briefings in Bioinformatics 2008.
• Wang et al. Generally detected proteins in comparative proteomics - A matter
  of cellular stress response? Proteomics 2009.
Questions/Comments




     Thanks!

								