
Integrative Omics for Cancer Biology
Xiang Zhang, PhD

Department of Chemistry
Center for Regulatory and Environmental Analytical Metabolomics
University of Louisville, Louisville, KY 40292

xiang.zhang@louisville.edu
Systems Biology

    is a field of biology that aims at a systems-level understanding of
    biological processes, in which many parts are connected to one
    another and work together. It attempts to create predictive
    models of cells, organs, biochemical processes and complete
    organisms.

      •Integrative systems biology
             Extracting biological knowledge from
             the ‘omics through integration


      •Predictive systems biology
             Predicting the future behavior of a biosystem using
             ‘omics knowledge, e.g. in-silico
             biosystems




 Davidov, E.; Clish, C. B.; Oresic, M.; Zhang, X.; et al. Omics: A Journal of Integrative Biology 2004, 8, 267-288.
 Clish, C. B.; Davidov, E.; Oresic, M.; Zhang, X.; et al. Omics: A Journal of Integrative Biology 2004, 8, 3-13.
Omics Space




              Differential omics is the beginning of Systems Biology

              [Figure: omics space spanning the molecule, cell, tissue, and organism levels, …]
Differential Proteomics &
Metabolomics
   1. Differential proteomics and metabolomics are the qualitative and
      quantitative comparison of the proteome and metabolome under
      different conditions, aimed at unraveling complex biological
      processes.

   2. They can be used to study any phenomenon that may
      change the proteome and/or metabolome of a living system,
      including:
              •Cancer biomarker discovery
              •Nano-medicine
              •Preventative medicine
              •Environment
              •Food and nutrition
Biomarker Discovery Is a Major
Research Field of Differential Omics
   Biomarkers are naturally occurring biomolecules useful
   for measuring the prognosis and/or progress of diseases
   and therapies.

    •These substances may normally be present in small amounts
     in the blood or other tissues.
    •When the amounts of these substances change, they may
     indicate disease.
    •Valid biomarkers should
         •demonstrate drug activity sooner
         •facilitate clinical trial design by defining patient populations
         •optimize dosing for safety and efficacy
         •be sensitive and easy to assay to speed drug development
What Types of Change Are
Expected?
   [Figure: types of protein change. Protein structure unchanged:
   concentration change. Protein structure changed: degradation,
   post-translational modification, sequence change (mutation).]

   •Sensing structural change is a major element of comparative
    proteomics.
   •Most metabolomics work focuses on concentration change only.
Challenges in Proteomics

  •Sample complexity
        •About 25,000 protein-coding genes are present in humans. The
         IPI human database (v3.25) has 67,250 entries, which could
         generate about 10^6 to 10^8 peptides.
        •More than one hundred post-translational modifications
         (PTMs) can occur in a proteome.
  •Large differences in protein concentration
        •10^7 to 10^8 in human cells, and at least 10^12 in human plasma
        •The dynamic range of an LC-MS system is about 10^4 to 10^6
  •The 12 most abundant proteins constitute approximately 95% of the
   total protein mass of plasma/serum:
        Albumin, IgG, Fibrinogen, Transferrin, IgA, IgM,
        Haptoglobin, alpha 2-Macroglobulin, alpha 1-Acid
        Glycoprotein, alpha 1-Antitrypsin and HDL (Apo A-I &
        Apo A-II).
  •Dynamic system, large subject variation

  [Figure: body fluid profiling as a biomarker platform. Concentration
  scale g/ml, ng/ml, pg/ml; generic sample prep. for high-concentration
  compounds, focused sample prep. for low-concentration compounds.]
Challenges in Metabolomics

 •Metabolites have a wide range of molecular
 weights and large variations in concentration.

 •The metabolome is much more dynamic
 than the proteome and genome, which makes the
 metabolome more time sensitive.

 •Metabolites can be either polar or nonpolar,
 as well as organic or inorganic molecules.
 This makes chemical separation a key
 step in metabolomics.

 •Metabolites have diverse chemical structures
 (e.g., cholesta-3,5-diene), which makes
 identification by MS extremely challenging.
Differential Omics
biomarker discovery



          Diseased              Healthy

     A      A   A     A    A    A    A    A
     B      B   B     B    B    B    B    B
     C      C   C     C    C    C    C    C
     D      D   D     D    D    D    D    D
     …      …   …     …    …    …    …    …
     Z      Z   Z     Z    Z    Z    Z    Z
     S1    S2   S3    S4   S5   S6   S7   S8
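A minimal sketch of how each molecule (row) in such a matrix could be screened: the Python below computes a log2 fold change and a Welch t statistic between the diseased (S1 to S4) and healthy (S5 to S8) groups. The intensities and the helper names `welch_t` and `screen` are hypothetical; a real analysis would also need p-values, multiple-testing correction, and normalized data.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic for two independent groups with unequal variances."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    return (mean(x) - mean(y)) / math.sqrt(vx + vy)

def screen(matrix, diseased_cols, healthy_cols):
    """Return {molecule: (log2 fold change, Welch t)} for diseased vs. healthy."""
    results = {}
    for molecule, row in matrix.items():
        d = [row[c] for c in diseased_cols]
        h = [row[c] for c in healthy_cols]
        results[molecule] = (math.log2(mean(d) / mean(h)), welch_t(d, h))
    return results

# Hypothetical intensities for molecules A and B across samples S1..S8
matrix = {
    "A": [100, 110, 95, 105, 50, 55, 48, 52],
    "B": [80, 82, 78, 81, 79, 80, 83, 77],
}
res = screen(matrix, range(0, 4), range(4, 8))
for molecule, (lfc, t) in res.items():
    print(f"{molecule}: log2FC={lfc:.2f}, t={t:.2f}")
```

Molecule A doubles in the diseased group with a large t statistic, while B barely changes, which is the kind of contrast that flags a candidate biomarker.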
Informatics Platform


   [Figure: informatics platform. Components: LIMS holding experiment
   information; sample, experiment design, experiment execution, raw data;
   spectrum deconvolution, quality control, peak alignment, transformation,
   normalization; significance test yielding regulated peaks; pattern
   recognition yielding cluster loadings; regulated molecules, molecular
   identification by targeted tandem MS, molecular validation, unidentified
   molecules; correlation, molecular interaction networks, protein function,
   pathway modeling; knowledge assembling; data re-examination.]
Roadmap


     Systems Biology        Differential omics

          1.   Experimental design
          2.   Molecular identification
          3.   Data preprocessing
          4.   Statistical significance test
          5.   Pattern recognition
          6.   Molecular networks
MDLC Platforms

   •      MudPIT, i.e., SCX followed by RP chromatography
          •     The proteome is split into 10-20X more
                fractions
          •     There is carry-over between fractions
          •     LC fractions generally are still too complex
                for MS

   •      Affinity Selection
          •     Avidin selection of Cys-containing peptides
          •     Cu-IMAC for His-containing peptides
          •     Ga-IMAC for phosphorylated peptides
          •     Lectins for glycosylated peptides

   [Figure: MDLC workflow; labels: sample, APR, AP, digestion, SCX,
   fractions F1, F2, …, RPC-MS]

 Qiu, R.; Zhang, X.; Regnier, F. E. J. Chromatogr. B 2007, 845, 143-150.
 Wang, S.; Zhang, X.; Regnier, F. E. J. Chromatogr. A 2002, 949, 153-162.
 Regnier, F. E.; Amini, A.; Chakraborty, A.; Geng, M.; Ji, J.; Sioma, C.; Wang, S.; Zhang, X. LC/GC 2001, 19(2), 200-213.
 Geng, M.; Zhang, X.; Bina, M.; Regnier, F. E. J. Chromatogr. B 2001, 752, 293-306.
In-Gel Stable Isotope Labeling
a single gel based platform

   •Avoids gel-to-gel variability
   •Labels only K-containing peptides
   •Accurate quantification

  Asara, J. M.; Zhang, X.; Zheng, B.; Christofk, H. H.; Wu, N.; Cantley, L. C. Nature Protocols 2006, 1, 46-51.
  Asara, J. M.; Zhang, X.; Zheng, B.; Christofk, H. H.; Wu, N.; Cantley, L. C. J. Proteome Res. 2006, 5, 155-163.
  Ji, J.; Chakraborty, A.; Geng, M.; Zhang, X.; Amini, A.; Bina, M.; Regnier, F. E. J. Chromatogr. A 2000, 745, 197-210.
Roadmap



     Systems Biology      Differential omics

          1. Experimental design
          2. Molecular identification
                protein identification
                metabolite identification
          3. Data preprocessing
          4. Statistical significance test
          5. Pattern recognition
          6. Molecular networks
Protein Identification
database searching
  The database searching approach uses a protein database to
  find the peptide whose theoretically predicted spectrum best
  matches the experimental data.

  [Figure: protein → peptide → mass → matched peptide]
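The first steps of this approach can be sketched in Python under simplifying assumptions: proteins are digested in silico with trypsin (cleave after K or R unless followed by P), and candidate peptides are kept when their monoisotopic mass matches the observed precursor mass within a ppm tolerance. Real search engines such as Sequest or Mascot additionally score the predicted fragment spectrum against the MS/MS data; the database, protein ID, and the 10 ppm default are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def tryptic_digest(protein):
    """In-silico trypsin digestion: cleave after K or R unless followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR" and (i + 1 == len(protein) or protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def match_precursor(observed_mass, database, tol_ppm=10.0):
    """Return (protein id, peptide) pairs whose theoretical mass is within tol_ppm."""
    hits = []
    for protein_id, sequence in database.items():
        for pep in tryptic_digest(sequence):
            theoretical = peptide_mass(pep)
            if abs(observed_mass - theoretical) / theoretical * 1e6 <= tol_ppm:
                hits.append((protein_id, pep))
    return hits

# Hypothetical one-protein database
database = {"P1": "AKPRGK"}
print(tryptic_digest("AKPRGK"))
print(match_precursor(peptide_mass("GK"), database))
```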
Protein Identification
database searching


   More than 20 algorithms have been developed, including:

     •Sequest
     •Spectrum Mill
     •Mascot
     •X! Tandem
     •OMSSA

   1. Only about 20% of tandem MS spectra yield a confident peptide
      identification.
   2. Fewer than 50% of peptides are identified by all algorithms.

   Zhang, X.; Oh, C.; Riley, C. P.; Buck, C. Current Proteomics 2007, 4, 121-130.
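The agreement between algorithms can be summarized by comparing the peptides identified by any algorithm with those identified by all of them. The sketch below assumes each algorithm's output has already been reduced to a set of peptide sequences; the identifications are hypothetical and do not come from the cited study.

```python
def overlap_summary(results):
    """Summarize agreement between peptide identification algorithms.

    results maps an algorithm name to the set of peptides it identified.
    """
    sets = list(results.values())
    union = set().union(*sets)          # identified by at least one algorithm
    shared = set.intersection(*sets)    # identified by every algorithm
    return {
        "identified_by_any": len(union),
        "identified_by_all": len(shared),
        "fraction_by_all": len(shared) / len(union) if union else 0.0,
    }

# Hypothetical peptide identifications from three search engines
results = {
    "Mascot": {"LSPLGEEMR", "THLAPYSDELR", "QGLLPVLESFK"},
    "Sequest": {"LSPLGEEMR", "THLAPYSDELR", "VQPYLDDFQKK"},
    "X! Tandem": {"LSPLGEEMR", "DLATVYVDVLK"},
}
summary = overlap_summary(results)
print(summary)
```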
Protein Identification
de novo sequencing


 De novo sequencing reconstructs the partial or complete
 sequence of a peptide directly from its MS/MS spectrum.

 The performance of de novo methods is limited by low mass
 accuracy, mass equivalence (e.g., Leu and Ile have identical
 residue masses), and incomplete fragmentation.


 Pevtsov, S.; Fedulova, I.; Mirzaei, H.; Buck, C.; Zhang, X. Journal of Proteome Research 2006, 5, 3018-3028.
 Fedulova, I.; Ouyang, Z.; Buck, C.; Zhang, X. The Open Spectroscopy Journal 2007, 1, 1-8.
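The core idea of reading a sequence from a fragment-ion ladder can be sketched in Python: each mass gap between consecutive ladder peaks is matched to the residue mass(es) it fits within a tolerance. This is a simplified illustration, not a published de novo algorithm; it assumes a clean, complete b-ion ladder (starting at the proton mass as a virtual first peak) and uses only a subset of residues. The peak list is hypothetical. Widening the tolerance shows how low mass accuracy makes near-isobaric K and Q indistinguishable, while I and L are never separable by mass.

```python
# Monoisotopic residue masses (Da) for a subset of amino acids (illustrative)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259,
}

def read_ladder(peaks, tol=0.02):
    """Call the residue(s) matching each mass gap between consecutive ladder peaks."""
    peaks = sorted(peaks)
    calls = []
    for low, high in zip(peaks, peaks[1:]):
        gap = high - low
        matches = sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - gap) <= tol)
        calls.append("/".join(matches) if matches else "?")  # "?" = no residue fits
    return calls

# Hypothetical b-ion ladder for G-L-K (first value = proton mass)
ladder = [1.00728, 58.02874, 171.11280, 299.20776]
print(read_ladder(ladder))            # high mass accuracy
print(read_ladder(ladder, tol=0.05))  # lower mass accuracy
```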
Incorporating Peptide Separation
Information for Protein Identification
structure of pattern classifier

   [Figure: structure of the pattern classifier. Peptide sequences
   (VSFLSALEEYTKK, LSPLGEEMR, DYVSQFEGSALGKQLNLK, DSGRDYVSQFEGSALGK,
   AKPALEDLRQGLLPVLESFK, DLATVYVDVLKDSGR, THLAPYSDELR,
   QGLLPVLESFKVSFLSALEEYTK, VQPYLDDFQKK, QGLLPVLESFK) pass through
   feature extraction (features 1 to N) into a neural network with
   input, hidden, and output layers; output classes: flow-through,
   partition, elution.]

      Oh, C.; Zak, S. H.; Mirzaei, H.; Regnier, F. E.; Zhang, X. Bioinformatics 2007, 23, 114-118.
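A minimal sketch of such a classifier's forward pass, assuming one hidden layer with sigmoid units and the three output classes shown in the figure. The four sequence features (scaled length and hydrophobic, basic, and acidic fractions), the hydrophobic residue set, and the example weights are all hypothetical placeholders; they are not the features or trained parameters of the published model. Parameter names mirror the slide's whji (input to hidden weights), wokj (hidden to output weights), and the hidden/output thresholds.

```python
import math

CLASSES = ("flow-through", "partition", "elution")
HYDROPHOBIC = set("AILMFVWY")  # assumed hydrophobic residue set

def extract_features(peptide):
    """Hypothetical sequence features; not the features of the published model."""
    n = len(peptide)
    return [
        n / 50.0,                                      # scaled length
        sum(aa in HYDROPHOBIC for aa in peptide) / n,  # hydrophobic fraction
        sum(aa in "KRH" for aa in peptide) / n,        # basic fraction
        sum(aa in "DE" for aa in peptide) / n,         # acidic fraction
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_h, th_h, w_o, th_o):
    """One-hidden-layer feed-forward pass.

    w_h[j][i]: input i -> hidden j weight;  th_h[j]: hidden threshold
    w_o[k][j]: hidden j -> output k weight; th_o[k]: output threshold
    """
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) - t)
              for row, t in zip(w_h, th_h)]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)) - t)
            for row, t in zip(w_o, th_o)]

def classify(peptide, params):
    """Return the class whose output unit is largest (first one on ties)."""
    outputs = forward(extract_features(peptide), *params)
    return CLASSES[max(range(len(outputs)), key=outputs.__getitem__)]

# Hypothetical parameters: 4 inputs, 2 hidden units, 3 outputs
params = (
    [[0.5, -1.0, 2.0, -2.0], [1.0, 1.0, -1.0, 0.5]],  # w_h
    [0.0, 0.0],                                        # th_h
    [[1.0, -1.0], [-0.5, 0.5], [0.2, 1.2]],            # w_o
    [0.0, 0.0, 0.0],                                   # th_o
)
print(classify("LSPLGEEMR", params))
```

In practice the weights and thresholds would be learned from peptides with known chromatographic behavior rather than set by hand.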
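Training the ANNs with a Genetic
Algorithm

    [Figure: genetic algorithm training loop. Initial candidate solutions
    (network parameters whji, wokj, thj, tok) are encoded into an initial
    population of chromosomes, which is iterated through selection,
    crossover, and mutation; the best chromosome is decoded into the
    optimal solution (whji, wokj, thj, tok).]
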
   Oh, C.; Zak, S. H.; Mirzaei, H.; Regnier, F. E.; Zhang, X. Bioinformatics 2007, 23, 114-118.
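A generic real-valued genetic algorithm following the loop in the figure (encoding, selection, crossover, mutation, best chromosome) can be sketched as below. Each chromosome would be the flattened network parameters (whji, wokj, thj, tok); here a toy fitness (closeness to a hypothetical target vector) stands in for the classifier's training performance. The operator choices (tournament selection of size 3, single-point crossover, Gaussian mutation, and elitism) and all default settings are assumptions, not the published configuration.

```python
import random

def genetic_algorithm(fitness, n_genes, pop_size=30, generations=100,
                      mutation_rate=0.1, sigma=0.3, seed=0):
    """Maximize fitness(chromosome) over real-valued chromosomes of length n_genes."""
    rng = random.Random(seed)
    # Encoding: each chromosome is a flat list of real-valued genes
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        next_gen = [ranked[0]]  # elitism: the best chromosome survives unchanged
        while len(next_gen) < pop_size:
            # Selection: best of three randomly drawn chromosomes (tournament)
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            # Crossover: single cut point, head from p1 and tail from p2
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = p1[:cut] + p2[cut:]
            # Mutation: perturb each gene with probability mutation_rate
            child = [g + rng.gauss(0.0, sigma) if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, history

# Toy fitness: negative squared distance to a hypothetical target parameter vector
target = [0.5, -0.2, 0.8]

def toy_fitness(chromosome):
    return -sum((g - t) ** 2 for g, t in zip(chromosome, target))

best, history = genetic_algorithm(toy_fitness, n_genes=3)
print(history[0], history[-1], toy_fitness(best))
```

Elitism guarantees that the best fitness never decreases from one generation to the next, which keeps the sketch easy to reason about.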
Protein Identification Using Multiple
Algorithms and Predicted Peptide
Separation in HPLC
PIUMA architecture
   [Figure: PIUMA architecture. Raw LC/MS/MS data (mzData or mzXML
   format) are processed into MS/MS data, which follow three routes:
   (1) database searching (Mascot, Sequest, X! Tandem) producing a
   peptide list; (2) unmatched spectra sent to an unknown-modification
   search; (3) unmatched spectra sent to de novo sequencing (Lutefisk,
   novoHMM, Peaks), combined by consensus into a peptide list. Machine
   learning and chromatography-modeling-based validation yield a
   protein list and a report. Color legend: existing algorithms,
   algorithms to be developed, method descriptions.]

 Oh, C.; Zak, S. H.; Mirzaei, H.; Regnier, F. E.; Zhang, X. Bioinformatics 2007, 23, 114-118.
 Zhang, X.; Oh, C.; Riley, C. P.; Buck, C. Current Proteomics 2007, 4, 121-130.
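Two of the ideas in this architecture, combining several algorithms by consensus and validating peptides with predicted chromatographic behavior, can be illustrated with a short sketch. It assumes a simple voting rule (a peptide is kept if at least `min_votes` algorithms report it) and a retention-time tolerance in minutes; both thresholds and all retention times are hypothetical, and PIUMA's actual consensus and chromatography-modeling steps may differ.

```python
from collections import Counter

def consensus_peptides(peptide_lists, min_votes=2):
    """Keep peptides reported by at least min_votes algorithms."""
    votes = Counter(p for peptides in peptide_lists for p in set(peptides))
    return {p for p, n in votes.items() if n >= min_votes}

def validate_by_retention(peptides, observed_rt, predicted_rt, tol_min=2.0):
    """Keep peptides whose observed retention time (min) is within tol_min of the prediction."""
    return {p for p in peptides
            if p in observed_rt and p in predicted_rt
            and abs(observed_rt[p] - predicted_rt[p]) <= tol_min}

# Hypothetical peptide lists from three algorithms
lists = [
    ["LSPLGEEMR", "THLAPYSDELR"],
    ["LSPLGEEMR", "QGLLPVLESFK"],
    ["LSPLGEEMR", "THLAPYSDELR", "VQPYLDDFQKK"],
]
consensus = consensus_peptides(lists)

# Hypothetical observed and model-predicted retention times (min)
observed = {"LSPLGEEMR": 21.4, "THLAPYSDELR": 35.0}
predicted = {"LSPLGEEMR": 22.0, "THLAPYSDELR": 28.5}
validated = validate_by_retention(consensus, observed, predicted)
print(sorted(consensus), sorted(validated))
```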
Roadmap

     Systems Biology      Differential omics

          1. Experimental design
          2. Molecular identification
          3. Data preprocessing
                Spectrum deconvolution
                Quality control
                Alignment
                Normalization
          4. Statistical significance test
          5. Pattern recognition
          6. Molecular networks
Spectrum Deconvolution
GISTool, single sample analysis


  1. To distinguish signals arising from real analytes from
     signals arising from contaminants or instrument noise
  2. To reduce data dimensionality, which benefits downstream
     statistical analysis

  Functionality      •Smoothing and centralization
                     •Peak cluster detection
                     •Charge recognition
                     •De-isotoping
                     •Peak identification at LC level
                     •Doublet recognition
                     •Doublet quantification
GISTool Algorithm
Deconvoluting MS spectra

   [Figure: single sample analysis of two overlapping isotope clusters.
   A +3 peptide (monoisotopic m/z 748.6354; peaks at 748.64, 748.97,
   749.29, 749.62, 749.97) and a +2 peptide (monoisotopic m/z 748.9694;
   peaks at 748.97, 749.47, 749.97, 750.50) are deconvoluted; axes:
   m/z vs. intensity (%).]

    Zhang, X.; Hines, W.; Adamec, J.; Asara, J.; Naylor, S.; Regnier, F. E. J. Am. Soc. Mass
         Spectrom. 2005, 16, 1181-1191.
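Charge recognition rests on the fact that isotope peaks of an ion with charge z are spaced by about 1.00335/z in m/z. The sketch below uses the peak m/z values labeled in this slide's spectrum and a simple heuristic of our own, not GISTool's published algorithm: for each candidate charge, count consecutive isotope peaks starting at a candidate monoisotopic peak, reject the charge if a peak also sits one spacing below (the candidate would then be an inner isotope), and break ties toward the lower charge. The 0.03 m/z tolerance and the maximum charge of 4 are assumptions.

```python
ISOTOPE_SPACING = 1.00335  # Da; approximate 13C-12C mass difference
PROTON = 1.007276          # Da

def has_peak(peaks, mz, tol):
    return any(abs(p - mz) <= tol for p in peaks)

def isotope_run(peaks, mono, z, tol):
    """Count consecutive isotope peaks starting at `mono` for charge z.

    Returns 0 if a peak sits one isotope step below `mono`, i.e. `mono`
    looks like an inner isotope of a cluster at this charge.
    """
    step = ISOTOPE_SPACING / z
    if has_peak(peaks, mono - step, tol):
        return 0
    n = 1
    while has_peak(peaks, mono + n * step, tol):
        n += 1
    return n

def assign_charge(peaks, mono, max_charge=4, tol=0.03):
    """Pick the charge with the longest isotope run; ties go to the lower charge."""
    runs = {z: isotope_run(peaks, mono, z, tol) for z in range(1, max_charge + 1)}
    best = max(runs, key=lambda z: (runs[z], -z))
    return best, runs[best]

def neutral_mass(mz, z):
    """Neutral monoisotopic mass from a protonated ion [M + zH]^z+."""
    return z * (mz - PROTON)

# Peak m/z values of the overlapping +3 and +2 clusters in this slide's spectrum
PEAKS = [748.64, 748.97, 749.29, 749.47, 749.62, 749.97, 750.50]
for mono in (748.64, 748.97):
    z, run = assign_charge(PEAKS, mono)
    print(mono, "charge", z, "isotopes", run, "mass", round(neutral_mass(mono, z), 3))
```

The check for a peak one spacing below is what keeps 748.97 from being read as the start of a +3 cluster, since it is the second isotope of the +3 peptide.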
Quality Assessment / Control

    •Biological sample QA/C
          •Protein assay

    •Experimental data QA/C
          •2D K-S test
          •Percentile of detected peaks
          •Percentile of aligned peaks
          •Retention time variance vs. retention time
          •m/z variance vs. retention time
          •Frequency distribution of RT & m/z variance

    [Figure: D value by sample ID (samples 1 to 10); retention time
    variation (%) vs. retention time (min)]

 Zhang, X.; Asara, J. M.; Adamec, J.; Ouzzani, M.; Elmagarmid, A. K. Bioinformatics 2005, 21, 4054-4059.
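The D value used for quality control is a Kolmogorov-Smirnov statistic: the largest distance between two empirical cumulative distribution functions. The slide's method is a 2D K-S test; the sketch below swaps in the simpler one-dimensional two-sample version on a single variable (for example, peak intensities) and compares each sample with all other samples pooled. The pooling scheme, the 0.3 threshold, and the data are hypothetical.

```python
from bisect import bisect_right

def ks_d(a, b):
    """Two-sample Kolmogorov-Smirnov D: max distance between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # the supremum is attained at one of the data points
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

def flag_outlier_samples(samples, threshold=0.3):
    """Compare each sample with all other samples pooled; flag those whose D exceeds threshold."""
    flagged = {}
    for name, values in samples.items():
        pooled = [v for other, vals in samples.items() if other != name for v in vals]
        d = ks_d(values, pooled)
        flagged[name] = (d, d > threshold)
    return flagged

# Hypothetical data: four similar samples and one shifted sample
samples = {f"s{i}": [1, 2, 3, 4, 5] for i in range(1, 5)}
samples["s5"] = [10, 11, 12, 13, 14]
for name, (d, bad) in flag_outlier_samples(samples).items():
    print(name, round(d, 2), "FLAG" if bad else "ok")
```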
Data Alignment

   To recognize peaks of the same molecule occurring in different
   samples from the thousands of peaks detected during the
   course of an experiment.


   1. MS to MS data alignment
         •Referenced alignment
         •Blind alignment
         •Quality depends on the peak detection information


   2. MS to MS/MS data alignment
         •Depends on experimental design
LC-MS Data Alignment
XAlign software for proteomics & metabolomics data

          •Detecting the median sample

                  $$M_j = \frac{\sum_i I_{i,j} M_{i,j}}{\sum_i I_{i,j}}, \qquad T_j = \frac{\sum_i I_{i,j} T_{i,j}}{\sum_i I_{i,j}}$$

                  $$D_i = \sum_{j=1}^{s} \left| T_{i,j} - \mu_j \right|$$

          •Aligning samples to the median sample

   [Figure: retention time difference (min) vs. retention time (min);
   intensity of aligned peaks, sample 2 vs. sample 1 (log scale), linear
   fit y = 1.3636x + 16.511, R² = 0.9475]

 Zhang, X.; Asara, J. M.; Adamec, J.; Ouzzani, M.; Elmagarmid, A. K. Bioinformatics 2005, 21, 4054-4059.
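A minimal sketch of median-sample detection from the formulas on this slide, under stated assumptions: rows are samples i, columns are peaks j already matched across samples, I and T hold peak intensities and retention times, μ_j is taken to be the intensity-weighted retention time T_j, and the sample with the smallest D_i is used as the reference. The linear retention-time correction that follows is a hypothetical stand-in for "aligning samples to the median sample", not XAlign's actual procedure.

```python
def weighted_rt(T, I):
    """Intensity-weighted retention time T_j of each peak j across samples i."""
    n_samples, n_peaks = len(T), len(T[0])
    return [sum(I[i][j] * T[i][j] for i in range(n_samples)) /
            sum(I[i][j] for i in range(n_samples)) for j in range(n_peaks)]

def median_sample(T, I):
    """Index of the sample with the smallest total deviation D_i, plus all D_i."""
    mu = weighted_rt(T, I)
    D = [sum(abs(T[i][j] - mu[j]) for j in range(len(mu))) for i in range(len(T))]
    return min(range(len(D)), key=D.__getitem__), D

def linear_rt_correction(t_sample, t_ref):
    """Least-squares fit t_ref ~ a * t_sample + b; returns a function correcting RTs."""
    n = len(t_sample)
    mx, my = sum(t_sample) / n, sum(t_ref) / n
    sxx = sum((x - mx) ** 2 for x in t_sample)
    sxy = sum((x - mx) * (y - my) for x, y in zip(t_sample, t_ref))
    a = sxy / sxx
    b = my - a * mx
    return lambda t: a * t + b

# Hypothetical retention times (min) of three matched peaks in three samples
T = [[10.0, 20.0, 30.0], [10.5, 20.5, 30.5], [10.2, 20.2, 30.2]]
I = [[1.0, 1.0, 1.0] for _ in T]  # equal intensities for simplicity
ref, D = median_sample(T, I)
correct = linear_rt_correction(T[1], T[ref])
print(ref, [round(d, 3) for d in D], round(correct(20.5), 3))
```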
          Chromatogram of Serum Analyzed on GCGC/TOF-MS
GCxGC-MS Data Alignment
metabolite component of human serum




                                              •Four dimension
                                              •1535 peaks have
                                              been detected
GCxGC/TOF-MS Data Alignment
MSort software for metabolomics




   Criteria for alignment
   •1st-dimension retention time
   •2nd-dimension retention time
   •Spectral correlation

   Features
   •Peak entry merging
   •Cont. exclusion

 Oh, C.; Huang, X.; Buck, C.; Regnier, F. E.; Zhang, X. J. Chromatogr. A 2008, 1179, 205-215.
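The three alignment criteria on this slide can be illustrated with a pairwise test: two GCxGC peaks are treated as the same compound when both retention times fall within tolerances and their mass spectra are similar. The sketch uses cosine similarity of spectra stored as {m/z: intensity} dictionaries; the tolerances (in seconds), the 0.9 similarity cutoff, and the peak data are hypothetical, and MSort's actual scoring, peak entry merging, and exclusion steps are not reproduced.

```python
import math

def cosine(spec_a, spec_b):
    """Cosine similarity of two mass spectra given as {m/z: intensity} dicts."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def is_same_compound(peak_a, peak_b, rt1_tol=10.0, rt2_tol=0.1, min_similarity=0.9):
    """Match two GCxGC peaks by both retention times and spectral similarity."""
    return (abs(peak_a["rt1"] - peak_b["rt1"]) <= rt1_tol
            and abs(peak_a["rt2"] - peak_b["rt2"]) <= rt2_tol
            and cosine(peak_a["spectrum"], peak_b["spectrum"]) >= min_similarity)

# Hypothetical peaks from two samples (retention times in seconds)
p1 = {"rt1": 1200.0, "rt2": 1.50, "spectrum": {73: 100, 147: 40, 233: 15}}
p2 = {"rt1": 1205.0, "rt2": 1.53, "spectrum": {73: 95, 147: 42, 233: 12}}
p3 = dict(p2, rt2=2.10)  # same spectrum, different 2nd-dimension retention time
print(is_same_compound(p1, p2), is_same_compound(p1, p3))
```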
Analysis Results of MAlign
53 standard acids

[Figure: left panel, the number of rows in the alignment table vs. the number of peak entries in a row of the alignment table; right panel, peak area (×10^5) vs. the number of peak entries in a row of the alignment table (1 to 16 entries in both panels).]



1. 8 [OA + FA] samples and 8 [AA + FA] samples
2. Derivatization reagent: N-methyl-N-(tert-butyldimethylsilyl)trifluoroacetamide (MTBSTFA)


  Oh, C.; Huang, X.; Buck, C.; Regnier, F. E. and Zhang, X. J. Chromatogr. A. 2008, 1179, 205-215
Normalization




  To reduce concentration effects and experimental variance so that
  the data are comparable.

  [Figure: intensity vs. peak index]

  Methods
      1.   Log linear model: $x_{ij} = a_i \, r_j \, e_{ij}$
      2.   Reference sample normalization
      3.   Auto-scaling
      4.   Constant mean / trimmed constant mean
      5.   Constant median / trimmed constant median
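Methods 2 to 5 rescale or standardize the data and can be sketched in a few lines of Python. The sketch assumes rows are samples and columns are aligned peaks with positive intensities; the median-of-ratios rule used for reference-sample normalization and the default 10% trim are illustrative choices, not details taken from the slide.

```python
import statistics

# Rows are samples, columns are aligned peaks (all intensities > 0).


def constant_median(samples):
    """Scale each sample so its median equals the median of all sample medians."""
    medians = [statistics.median(s) for s in samples]
    target = statistics.median(medians)
    return [[x * target / m for x in s] for s, m in zip(samples, medians)]


def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    v = sorted(values)
    k = int(len(v) * trim)
    kept = v[k:len(v) - k]
    return sum(kept) / len(kept)


def constant_trimmed_mean(samples, trim=0.1):
    """Scale each sample so its trimmed mean equals the average trimmed mean."""
    means = [trimmed_mean(s, trim) for s in samples]
    target = statistics.mean(means)
    return [[x * target / m for x in s] for s, m in zip(samples, means)]


def reference_sample(samples, ref_index=0):
    """Scale each sample by the median of its peak-wise ratios to a reference sample."""
    ref = samples[ref_index]
    out = []
    for s in samples:
        factor = statistics.median([r / x for r, x in zip(ref, s)])
        out.append([x * factor for x in s])
    return out


def auto_scale(samples):
    """For each peak: subtract its mean across samples, divide by its standard deviation."""
    scaled_columns = []
    for column in zip(*samples):
        mu = statistics.mean(column)
        sd = statistics.stdev(column)  # needs >= 2 samples and non-constant values
        scaled_columns.append([(x - mu) / sd for x in column])
    return [list(row) for row in zip(*scaled_columns)]
```

Setting `trim=0.0` turns the trimmed variant into plain constant-mean normalization.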
CV Distribution of Peak Intensities
human serum sample



[Figure: CV distribution of peak intensities before normalization (top) and after normalization (bottom). Left panels: frequency vs. CV; right panels ("Intensity Variation"): relative peak number (%) vs. CV. Annotated values: 20.7% before normalization, 17.3% after normalization.]

Log linear model:

$$x_{ij} = a_i \, r_j \, e_{ij}$$

$$\log(x_{ij}) = \log(a_i) + \log(r_j) + \log(e_{ij})$$
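The log linear model can be fitted with a short sketch. It assumes that i indexes peaks and j indexes samples, so that a_i is a peak-specific abundance, r_j a sample-specific factor (for example, overall concentration) and e_ij a multiplicative error. Fitting by least squares on the log scale with the constraint that the log r_j sum to zero is also an assumption of this sketch, not necessarily how the figures above were produced.

```python
import math
import statistics


def log_linear_normalize(x):
    """Normalize with the log linear model x_ij = a_i * r_j * e_ij.

    x[i][j] is the intensity of peak i in sample j (all values > 0, no gaps).
    On the log scale the model is additive, so the least-squares estimate of
    log r_j (constrained to sum to zero) is the mean of column j minus the
    grand mean. Dividing by r_j removes the sample effect.
    """
    logs = [[math.log(v) for v in row] for row in x]
    n_peaks, n_samples = len(logs), len(logs[0])
    grand_mean = sum(map(sum, logs)) / (n_peaks * n_samples)
    log_r = [
        sum(logs[i][j] for i in range(n_peaks)) / n_peaks - grand_mean
        for j in range(n_samples)
    ]
    r = [math.exp(v) for v in log_r]
    normalized = [[v / r_j for v, r_j in zip(row, r)] for row in x]
    return normalized, r


def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def median_cv(x):
    """Median CV over all peaks (rows), one number to compare before/after."""
    return statistics.median([cv(row) for row in x])
```

Comparing `median_cv(x)` with `median_cv(normalized)` gives a one-number summary of the kind of shift the CV histograms show.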
Roadmap


     Systems Biology       Differential omics

          1.   Experimental design
          2.   Molecular identification
          3.   Data preprocessing
          4.   Statistical significance test
          5.   Pattern recognition
          6.   Molecular networks
Statistical Significance Tests

   To find individual peaks that differ significantly between
   groups.

   Methods
   1. Pairwise t-test (difference in means?)
   2. Mann-Whitney U test (difference in medians?)
   3. Kolmogorov-Smirnov test (difference in distributions?)
   4. Kruskal-Wallis analysis of variance (rank-based)
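As a small illustration of method 2, the sketch below applies an exact Mann-Whitney U test to every peak. Computing the exact p-value by enumerating all splits of the pooled values is a choice made here so the code needs only the Python standard library; it is practical only for small groups, and with 4 samples per group the smallest attainable two-sided p-value is 2/70 ≈ 0.029. The function names are hypothetical.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic for x: number of (x_i, y_j) pairs with x_i > y_j, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_mann_whitney_p(x, y):
    """Two-sided exact p-value by enumerating every split of the pooled data."""
    pooled = list(x) + list(y)
    n, m = len(x), len(y)
    center = n * m / 2.0  # expected U under the null hypothesis
    observed = abs(mann_whitney_u(x, y) - center)
    extreme = total = 0
    for idx in combinations(range(n + m), n):
        chosen = set(idx)
        gx = [pooled[k] for k in idx]
        gy = [pooled[k] for k in range(n + m) if k not in chosen]
        total += 1
        if abs(mann_whitney_u(gx, gy) - center) >= observed - 1e-12:
            extreme += 1
    return extreme / total


def per_peak_p_values(group_a, group_b):
    """group_a[s][p]: intensity of peak p in sample s. Returns one p-value per peak."""
    n_peaks = len(group_a[0])
    return [exact_mann_whitney_p([s[p] for s in group_a], [s[p] for s in group_b])
            for p in range(n_peaks)]
```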
Statistical Significance Tests
metabolome of great blue heron fertilized eggs
contaminated by PCBs (polychlorinated biphenyls)

[Figure: volcano plot of p (-log) vs. fold change (log), where fold change = I_c / I_n. Down-regulated peaks lie to the left and up-regulated peaks to the right. Blue line: p = 0.05; dashed line: log fold change = 0 (fold change = 1).]
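A volcano plot like the one above combines a fold change and a p-value for every peak. The sketch assumes log2 for the fold-change axis and log10 for the p-value axis (the slide does not state the bases), uses p < 0.05 to match the blue line, and by default treats any nonzero log fold change as a direction, matching the dashed line at log fold change = 0. `volcano_points` is a hypothetical helper.

```python
import math


def volcano_points(contaminated, normal, p_values, alpha=0.05, min_log2_fc=0.0):
    """Return (log2 fold change, -log10 p, label) for each peak.

    contaminated[k], normal[k]: mean intensity of peak k in each group (I_c, I_n).
    p_values[k]: p-value of peak k from a significance test.
    """
    points = []
    for i_c, i_n, p in zip(contaminated, normal, p_values):
        log_fc = math.log2(i_c / i_n)   # fold change = I_c / I_n on a log scale
        neg_log_p = -math.log10(p)      # height on the volcano plot
        if p < alpha and log_fc > min_log2_fc:
            label = "up"
        elif p < alpha and log_fc < -min_log2_fc:
            label = "down"
        else:
            label = "n.s."
        points.append((log_fc, neg_log_p, label))
    return points
```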
Roadmap

     Systems Biology        Differential omics

          1.   Experimental design
          2.   Molecular identification
          3.   Data preprocessing
          4.   Statistical significance test
          5.   Pattern recognition
          6.   Molecular networks
Clustering or Classification


    The patterns found by these methods provide a first step toward a
    better understanding of the underlying biology.


     Unsupervised Methods
            Principal component analysis (PCA)
            Clustering objects on subsets of attributes (COSA)
     Supervised Methods
            Linear discriminant analysis (LDA)
            Support vector machine (SVM)
            Artificial neural network (ANN)
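Of the unsupervised methods listed, PCA is the easiest to sketch. To stay within the standard library, the code below computes only the first principal component by power iteration on the sample covariance matrix; that shortcut is an assumption of this sketch, and a real analysis would normally use a linear-algebra library and inspect several components.

```python
import math
import random


def pca_first_component(data, iterations=200, seed=0):
    """Return (loading vector, scores) of the first principal component.

    data[s][p]: value of peak p in sample s.
    """
    n, d = len(data), len(data[0])
    means = [sum(row[p] for row in data) / n for p in range(d)]
    centered = [[row[p] - means[p] for p in range(d)] for row in data]
    # Sample covariance matrix of the peaks (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # random positive start vector
    for _ in range(iterations):
        # Power iteration: repeatedly multiply by the covariance and renormalize.
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    scores = [sum(r[p] * v[p] for p in range(d)) for r in centered]
    return v, scores
```

The sign of a principal component is arbitrary, so only the separation of the scores (here, whether samples fall on the same side of zero) is meaningful.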
Cross Species Comparison




  27 of the 28 control humans and all 8 control rats cluster into one group
  11 of the 14 diseased humans and all diseased rats cluster into a second group
Differential Metabolomics of
Human Blood
breast cancer samples vs. control samples
Roadmap

     Systems Biology        Differential omics
          1.   Experimental design
          2.   Molecular identification
          3.   Data preprocessing
          4.   Statistical significance test
          5.   Pattern recognition
          6.   Molecular networks
                   correlation network
                   interaction network
                   regulation network
                   pathway analysis
Molecular Correlation Analysis
pairwise correlation of proteins and metabolites



          Diseased                        Healthy

     A     A    A    A               A    A    A     A
     B     B    B    B               B    B    B     B
     C     C    C    C               C    C    C     C
     D     D    D    D               D    D    D     D
     …     …    …    …               …    …    …     …
     Z     Z    Z    Z               Z    Z    Z     Z
     S1    S2   S3   S4              S5   S6   S7    S8
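The correlation analysis laid out in the table above can be sketched as follows: compute the Pearson correlation of every pair of molecules separately within the diseased samples (S1 to S4) and the healthy samples (S5 to S8), then compare. Pearson correlation and the simple difference of coefficients are assumptions of this sketch; molecules with constant values in a group would need to be filtered out first, since their correlation is undefined.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(profiles):
    """profiles: {molecule: [value in each sample of one group]} -> {(m1, m2): r}."""
    return {(a, b): pearson(profiles[a], profiles[b])
            for a, b in combinations(sorted(profiles), 2)}


def correlation_changes(diseased, healthy):
    """Correlation in diseased samples minus correlation in healthy samples, per pair."""
    r_diseased = pairwise_correlations(diseased)
    r_healthy = pairwise_correlations(healthy)
    return {pair: r_diseased[pair] - r_healthy[pair] for pair in r_diseased}
```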
Molecular Correlation Network
an example of drug effect on disease state

    •Reveals important relationships among the various components
    •Complementary to abundance-level information
    •Provides information about the biochemical processes underlying the
     disease or drug response

[Figure: correlation network linking lipids (LCMS), NMR (DE), diffusion and NMR (CPMG) metabolite signals, peptides, proteins and clinical measurements; nodes include ApoE_1, ApoA1_6, HDL, GLUC, lactate, valine and many PC, TG, CE and LPC lipid species. Legend: positive correlation; negative correlation; higher in treated group; lower in treated group.]
  Clish, C. B.; Davidov, E.; Oresic, M.; Plasterer, T.; Lavine, G.; Londo, T. R.; Meys, M.; Snell, P.; Stochaj, W.; Adourian, A.;
  Zhang, X.; Morel, N.; Neumann, E.; Verheij, E.; Vogels, J. T. W. E.; Havekes, L. M.; Afeyan, N.; Regnier, F. E.; Greef, J.;
  Naylor, S. Omics: A Journal of Integrative Biology 2004, 8, 3-13.
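A correlation network like the one in the figure can be built by keeping strongly correlated pairs as signed edges and marking each node by its direction of change in the treated group. The 0.8 threshold, the input format and the function names are hypothetical choices for this sketch.

```python
def correlation_network(correlations, threshold=0.8):
    """Keep molecule pairs with |r| >= threshold as signed edges.

    correlations: {(molecule_a, molecule_b): Pearson r}.
    Returns an adjacency dict {molecule: {neighbour: "+" or "-"}}.
    """
    graph = {}
    for (a, b), r in correlations.items():
        if abs(r) < threshold:
            continue
        sign = "+" if r > 0 else "-"  # positive or negative correlation edge
        graph.setdefault(a, {})[b] = sign
        graph.setdefault(b, {})[a] = sign
    return graph


def annotate_nodes(graph, treated_mean, control_mean):
    """Label each node as higher, lower or unchanged in the treated group."""
    labels = {}
    for m in graph:
        if treated_mean[m] > control_mean[m]:
            labels[m] = "higher"
        elif treated_mean[m] < control_mean[m]:
            labels[m] = "lower"
        else:
            labels[m] = "unchanged"
    return labels
```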
SysNet: Interactive Visual Data
Mining of Molecular Correlation
Network                                                                                                 a)
 An interactive integration and
 visualization environment for
 molecular correlation of ‘omics data.
       •Integrating molecular expression
       information generated on different ‘omics platforms

       •Visualizing molecular correlations
       interactively

       •Enabling visualization and analysis of
       time-course data

       •Automatically organizing molecules by
       their time-course expression patterns
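The slide does not say how SysNet computes its correlation edges. As a minimal sketch, assuming Pearson correlation across samples with an absolute-value cutoff, a molecular correlation network over proteins and metabolites could be built like this. The function names, the 0.8 threshold and the toy profiles are hypothetical and are not taken from SysNet or from the study data.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def correlation_network(profiles, threshold=0.8):
    """Return edges (a, b, r) for every molecule pair with |r| >= threshold.

    profiles maps a molecule name to its abundances across the same samples;
    proteins and metabolites from different 'omics platforms can be mixed.
    """
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, round(r, 3)))
    return edges


# Hypothetical toy profiles over four samples.
profiles = {
    "protein_A": [1.0, 2.0, 3.0, 4.0],
    "metabolite_B": [2.0, 4.0, 6.0, 8.0],
    "metabolite_C": [4.0, 3.0, 2.0, 1.0],
    "protein_D": [1.0, 3.0, 1.0, 3.0],
}
edges = correlation_network(profiles, threshold=0.8)
print(edges)
```

Here protein_D correlates only weakly (|r| ≈ 0.45) with the others, so it stays unconnected, while the other three molecules form a fully connected, mixed-sign triangle.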

 Zhang, M.; Ouyang, Q.; Stephenson, A.; Salt, D.; Kane, M. D.; Burgner, J.; Buck, C. and Zhang, X.
 Accepted by BMC Systems Biology.
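The slide states that SysNet organizes molecules by their time-course expression patterns but does not describe the rule. One simple approach, shown here purely as an assumed stand-in, encodes each time course as a string of up (U), down (D) or steady (S) steps and groups molecules that share the same string. The 10% tolerance, the function names and the toy series are all hypothetical.

```python
from collections import defaultdict


def trend_signature(series, tol=0.1):
    """Encode a time course as U (up), D (down) or S (steady) per interval.

    A step counts as a change only if its relative change exceeds tol;
    when the previous value is 0 the absolute difference is used instead.
    """
    sig = []
    for prev, cur in zip(series, series[1:]):
        change = (cur - prev) / abs(prev) if prev else cur - prev
        if change > tol:
            sig.append("U")
        elif change < -tol:
            sig.append("D")
        else:
            sig.append("S")
    return "".join(sig)


def group_by_pattern(time_courses, tol=0.1):
    """Group molecule names that share the same trend signature."""
    groups = defaultdict(list)
    for name in sorted(time_courses):
        groups[trend_signature(time_courses[name], tol)].append(name)
    return dict(groups)


# Hypothetical abundances at four time points.
time_courses = {
    "m1": [1.0, 2.0, 2.05, 1.0],
    "m2": [5.0, 9.0, 9.2, 4.0],
    "m3": [3.0, 3.0, 1.0, 1.0],
}
groups = group_by_pattern(time_courses)
print(groups)
```

Because the signature uses relative change, m1 and m2 fall into the same "USD" group despite very different absolute abundances, which is usually what is wanted when comparing proteins and metabolites on different scales.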
Biomarker Verification


          Wet-lab verification
             AQUA (absolute quantification with isotope-labeled peptide standards)
             MRM (multiple reaction monitoring)
             Antibody

          In-silico verification
             lineage tracing
             pathway analysis
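In an AQUA experiment a known amount of a stable-isotope-labeled ("heavy") copy of the target peptide is spiked into the sample, and the endogenous ("light") amount follows from the light-to-heavy peak-area ratio. The sketch below applies that ratio to summed MRM transition areas. Summing the transitions, the function name and the example areas are assumptions made for illustration, not values from this work.

```python
def aqua_amount(light_areas, heavy_areas, spiked_fmol):
    """Estimate endogenous peptide amount (fmol) from an AQUA/MRM measurement.

    light_areas: peak areas of the endogenous peptide's MRM transitions
    heavy_areas: peak areas of the matching transitions of the labeled standard
    spiked_fmol: known amount of labeled standard added to the sample
    """
    light = sum(light_areas)
    heavy = sum(heavy_areas)
    if heavy <= 0:
        raise ValueError("heavy-standard peak area must be positive")
    # Light and heavy forms ionize alike, so the area ratio equals the amount ratio.
    return light / heavy * spiked_fmol


# Hypothetical areas for three transitions, with 50 fmol of standard spiked in.
amount = aqua_amount([1200.0, 800.0, 400.0], [600.0, 400.0, 200.0], 50.0)
print(amount)  # 100.0
```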
Automated Lineage Tracing

 •Identifies the connections between a
 program's input and output data

 •Traces fine-grained lineage
 through run-time analysis

 •Builds on dynamic slicing
 techniques used in debugging

 •Applicable to arbitrary
 functions

 [Figure: lineage tracing applied to analysis software]

 Zhang, M.; Zhang, X.; Zhang, X. and Prabhakar, S. 33rd International
 Conference on Very Large Data Bases (VLDB 2007), 2007.
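The slide describes run-time lineage tracing built on dynamic slicing. As a greatly simplified sketch, and not the published system, each input value can be wrapped so that it carries the set of input record ids it depends on, with every arithmetic operation propagating the union of its operands' sets. This tracks data dependences only. Full dynamic slicing also records control dependences, which this sketch ignores: a comparison returns a plain bool. The Traced class and the score_peaks analysis step are hypothetical.

```python
import operator


class Traced:
    """A runtime value tagged with the set of input record ids it depends on."""

    def __init__(self, value, lineage=()):
        self.value = value
        self.lineage = frozenset(lineage)

    def _combine(self, other, op):
        # Data dependence: the result depends on every operand's inputs.
        if isinstance(other, Traced):
            return Traced(op(self.value, other.value), self.lineage | other.lineage)
        return Traced(op(self.value, other), self.lineage)

    def __add__(self, other):
        return self._combine(other, operator.add)

    def __radd__(self, other):
        # Needed for sum(), which starts from the plain int 0.
        return self._combine(other, lambda a, b: b + a)

    def __mul__(self, other):
        return self._combine(other, operator.mul)

    def __truediv__(self, other):
        return self._combine(other, operator.truediv)

    def __gt__(self, other):
        # Control dependence is NOT recorded: the comparison yields a plain bool.
        other_value = other.value if isinstance(other, Traced) else other
        return self.value > other_value


def trace_inputs(records):
    """Wrap each input value so that record i carries lineage {i}."""
    return [Traced(v, {i}) for i, v in enumerate(records)]


def score_peaks(intensities, threshold=100.0):
    """Hypothetical analysis step: mean intensity of peaks above a threshold."""
    kept = [x for x in intensities if x > threshold]
    return sum(kept) / len(kept)


records = [250.0, 40.0, 180.0, 90.0]
result = score_peaks(trace_inputs(records))
print(result.value, sorted(result.lineage))  # 215.0 [0, 2]
```

The analysis function itself is unchanged; only its inputs are wrapped. This mirrors the slide's point that run-time tracing can be applied to an arbitrary function rather than only to relational operators.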
Summary

  • The informatics platform developed in my group can analyze
    protein and metabolite profiling data to distinguish disease from
    normal samples for biomarker discovery
  • Groups identified by clustering analysis reflected the phenotypic
    categories (cancer vs. control samples, animal vs. human
    subjects, etc.) with a high degree of accuracy
  • SysNet integrates ‘omics data into a single interactive visual
    data-mining environment, enabling biologists to perform the
    data mining themselves
  • Lineage tracing is an efficient and effective approach to
    in-silico biomarker verification and is expected to significantly
    reduce the false discovery rate (FDR) of biomarker discovery
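The summary reports that clusters matched the phenotypic categories but does not name the clustering algorithm. The sketch below substitutes plain k-means, an assumed stand-in, and scores agreement with known labels using cluster purity, the fraction of samples that fall in their cluster's majority class. The two-feature sample profiles and labels are hypothetical, not data from the study.

```python
import math
from collections import Counter


def kmeans(points, k, iters=100):
    """Plain k-means; returns a cluster label for each point.

    Initialises with the first k points (a simplification; k-means++
    or random restarts are more usual in practice).
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def purity(labels, classes):
    """Fraction of samples that belong to their cluster's majority class."""
    total = 0
    for c in set(labels):
        members = [cls for lab, cls in zip(labels, classes) if lab == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(labels)


# Hypothetical two-feature profiles for six samples.
points = [(5.0, 5.2), (1.0, 0.9), (5.3, 4.8), (0.8, 1.1), (4.9, 5.1), (1.2, 1.0)]
classes = ["cancer", "control", "cancer", "control", "cancer", "control"]
labels = kmeans(points, k=2)
print(labels, purity(labels, classes))  # [0, 1, 0, 1, 0, 1] 1.0
```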
Acknowledgements



    Irina Fedulova      Dr. John Burger       Dr. David Clemmer
    Dr. Hamid Mirzaei   Dr. Michael D. Kane   Dr. John Asara
    Dr. Cheolhwan Oh    Dr. Fred E. Regnier   Dr. Mu Wang
    Sergey E. Pevtsov   Dr. David Salt        Dr. Jake Chen
    Ouyang Qi           Dr. Mohammad Sulma    Dr. Steve Valentine
    Alan Stephenson     Dr. Daniel Raftery    Dr. Steve Naylor
    Mingwu Zhang        Dr. Sunil Prabhakar
Postdoc Positions
 Posting Title:   Industrial Postdoctoral Fellow - Bioinformatician
 Work Location:   University of Louisville, KY
 Job Type:        Full time
 Starting Date:   Position immediately available

 Job Description: Predictive Physiology and Medicine (PPM) Inc. is an exciting
 health and life sciences company based in Bloomington, Indiana focused on
 developing analytical systems for the individualized health and wellness industry.
 We have an immediate opening for a postdoctoral fellow. The successful
 candidate will develop bioinformatics systems for mass spectrometry based
 quantitative proteomics and metabolomics.
 Requirements: The position requires a bioinformatician with a strong
 computational background. Priority will be given to candidates with a PhD in
 bioinformatics, computer science, statistics, engineering, or computational physics.
 The successful candidate should have a strong understanding of statistics and
 pattern recognition. Programming skills in MATLAB, Microsoft .NET, or Java
 are required. Experience in analyzing biological data is not
 required; however, interest in multidisciplinary research is a must.

								
To top