Integrative Omics for Cancer Biology

Xiang Zhang, PhD



Department of Chemistry

Center for Regulatory and Environmental Analytical Metabolomics

University of Louisville, Louisville, KY 40292



xiang.zhang@louisville.edu

Systems Biology

Systems biology is a field of biology that aims at a systems-level understanding of biological processes, in which many connected parts work together. It attempts to create predictive models of cells, organs, biochemical processes and complete organisms.



•Integrative systems biology
Extracting biological knowledge from the ‘omics through integration

•Predictive systems biology
Predicting the future of a biosystem using ‘omics knowledge, e.g. in-silico biosystems









Davidov, E.; Clish, C. B.; Oresic, M.; Zhang, X; et al. Omics: A Journal of Integrative Biology. 2004, 8, 267--288.

Clish, C. B.; Davidov, E.; Oresic, M.; Zhang, X; et al. Omics: A Journal of Integrative Biology. 2004, 8, 3--13.

Omics Space









Differential omics is the beginning of Systems Biology

Figure labels: molecule, cell, tissue, organism



Differential Proteomics & Metabolomics

1. Differential proteomics and metabolomics are the qualitative and quantitative comparison of the proteome and metabolome under different conditions, with the aim of unraveling complex biological processes.

2. They can be used to study any phenomenon that may change the proteome and/or metabolome of a living system.
  • Cancer biomarker discovery
  • Nano-medicine
  • Environment
  • Food and nutrition

Figure labels: NIH; preventative medicine

Biomarker Discovery Is a Major Research Field of Differential Omics

Biomarkers are naturally occurring biomolecules useful for measuring the prognosis and/or progress of diseases and therapies.

• These substances may be normally present in small amounts in the blood or other tissues
• When the amounts of these substances change, they may indicate disease
• Valid biomarkers should
  • demonstrate drug activity sooner
  • facilitate clinical trial design by defining patient populations
  • optimize dosing for safety and efficacy
  • be sensitive and easy to assay to speed drug development

What Types of Change Are Expected?

•Sensing structural change is a major element of comparative proteomics

•Most metabolomics work focuses on concentration change only.

Challenges in Proteomics

• Sample complexity
  • About 25,000 protein-coding genes are present in the human genome. The IPI human database (v3.25) has 67,250 entries, which could generate about 10⁶ to 10⁸ peptides
  • More than one hundred post-translational modifications (PTMs) can occur in a proteome
• Large protein concentration difference
  • 10⁷ to 10⁸ in human cells, and at least 10¹² in human plasma
  • The dynamic range of an LC-MS system is about 10⁴ to 10⁶
  • The top 12 high-abundance proteins constitute approximately 95% of the total protein mass of plasma/serum
    • Albumin, IgG, Fibrinogen, Transferrin, IgA, IgM, Haptoglobin, alpha 2-Macroglobulin, alpha 1-Acid Glycoprotein, alpha 1-Antitrypsin and HDL (Apo A-I & Apo A-II)
• Dynamic system, large subject variation

Challenges in Metabolomics



•Metabolites have a wide range of molecular weights and large variations in concentration

•The metabolome is much more dynamic than the proteome and genome, which makes the metabolome more time sensitive

•Metabolites can be either polar or nonpolar, as well as organic or inorganic molecules. This makes chemical separation a key step in metabolomics

•Metabolites have chemical structures, which makes identification using MS an extreme challenge

Figure label: cholesta-3,5-diene

Differential Omics

biomarker discovery







Diseased Healthy



A A A A A A A A

B B B B B B B B

C C C C C C C C

D D D D D D D D

… … … … … … … …

Z Z Z Z Z Z Z Z

S1 S2 S3 S4 S5 S6 S7 S8

Informatics Platform

Roadmap





Systems Biology Differential omics



1. Experimental design

2. Molecular identification

3. Data preprocessing

4. Statistical significance test

5. Pattern recognition

6. Molecular networks

MDLC Platforms

• MudPIT, i.e. SCX followed by RP
• The proteome is split into 10-20X more fractions
• There is carry-over between fractions
• LC fractions generally still are too complex for MS

• Affinity Selection
• Avidin selection of Cys-containing peptides
• Cu-IMAC for His-containing peptides
• Ga-IMAC for phosphorylated peptides
• Lectins for glycosylated peptides

Figure (workflow diagram) labels: Sample, APR, AP, Digestion, SCX, F1, F2, RPC-MS

Qiu, R.; Zhang, X. and Regnier, F. E. J. Chromatogr. B. 2007, 845, 143-150.

Wang, S.; Zhang, X.; and Regnier, F. E. J. Chromatogr. A 2002, 949, 153-162.

Regnier, F. E.; Amini, A.; Chakraborty, A.; Geng, M.; Ji, J.; Sioma, C.; Wang, S.; and Zhang, X. LC/GC 2001, 19(2), 200-213.

Geng, M.; Zhang, X.; Bina, M.; and Regnier, F. E. J. Chromatogr. B 2001, 752, 293-306.

In-Gel Stable Isotope Labeling

a sample gel based platform





•Avoiding gel-to-gel variability

•Only labeling K-containing peptides

•Accurate quantification














Asara, J. M.; Zhang, X.; Zheng, B.; Christofk, H. H.; Wu, N.; Cantley, L. C. Nature Protocols, 2006, 1, 46-51.

Asara, J. M.; Zhang, X.; Zheng, B.; Christofk, H. H.; Wu, N.; Cantley, L. C. J. Proteome Res., 2006, 5, 155-163.

Ji, J.; Chakraborty, A.; Geng M.; Zhang, X.; Amini, A.; Bina, M.; and Regnier, F. E. J. Chromatogr. A 2000, 745, 197-210.

Roadmap







Systems Biology Differential omics



1. Experimental design

2. Molecular identification

protein identification

metabolite identification

3. Data preprocessing

4. Statistical significance test

5. Pattern recognition

6. Molecular networks

Protein Identification

database searching

The database searching approach uses a protein database to find a peptide for which a theoretically predicted spectrum best matches experimental data.

Figure labels: Protein, Peptide, Mass, matched peptide
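The protein, peptide, mass and match chain above can be sketched in a few lines. This is only a simplified illustration, not any of the search engines named in this talk: it matches on peptide mass alone, whereas real engines score the full fragment spectrum. The function names, the tolerance and the toy sequences are assumptions made for the example; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Da; one water is added per peptide.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(protein):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        last = i == len(protein) - 1
        if last or (aa in "KR" and protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def search(database, observed_mass, tol_da=0.01):
    """Return (protein name, peptide) pairs whose mass fits the observation."""
    hits = []
    for name, sequence in database.items():
        for peptide in tryptic_digest(sequence):
            if abs(peptide_mass(peptide) - observed_mass) <= tol_da:
                hits.append((name, peptide))
    return hits


# Toy database with made-up sequences (hypothetical, not real proteins).
database = {"protein_1": "AAKGGRPLLK", "protein_2": "MKWVTFISLLR"}
hits = search(database, observed_mass=288.1797)
```

With a tight tolerance only one peptide of the toy database fits; widening `tol_da` quickly produces ambiguous hits, which is one reason spectrum-level scoring is needed in practice.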

Protein Identification

database searching





More than 20 algorithms have been developed.

• Sequest
• Spectrum Mill
• Mascot
• X! Tandem
• OMSSA

1. About 20% of tandem MS spectra could provide confident peptide identification
2. < 50% of peptides can be identified by all algorithms









Zhang, X.; Oh, C.; Riley, C. P.; Buck, C. Current Proteomics 2007, 4, 121-130.

Protein Identification

de novo sequencing





De novo sequencing reconstructs the partial or complete sequence of a peptide directly from its MS/MS spectrum.

The performance of de novo methods is limited by low mass accuracy, mass equivalence, and incomplete fragmentation.
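The three limitations can be seen in a toy ladder-reading sketch. It assumes an idealized, complete series of singly charged b-ions that starts at the proton mass; the function name and tolerances are illustrative and do not come from the cited papers.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(fragment_mz, tol=0.02):
    """Translate gaps between consecutive fragment peaks into residues.

    Ambiguous gaps are reported as 'X/Y'; gaps that match no single
    residue (for example a missing fragment) are reported as '?'.
    """
    calls = []
    for low, high in zip(fragment_mz, fragment_mz[1:]):
        gap = high - low
        candidates = sorted(
            aa for aa, mass in RESIDUE_MASS.items() if abs(mass - gap) <= tol
        )
        calls.append("/".join(candidates) if candidates else "?")
    return calls


# Idealized b-ion ladder for the made-up peptide G-L-K, starting at the proton mass.
ladder = [1.00728, 58.02874, 171.11280, 299.20776]
accurate = read_ladder(ladder, tol=0.02)
inaccurate = read_ladder(ladder, tol=0.05)
incomplete = read_ladder([1.00728, 171.11280, 299.20776])
```

Leucine and isoleucine have identical masses, so the second call is always ambiguous (mass equivalence). K and Q differ by only about 0.036 Da, so they merge once the tolerance is loosened (low mass accuracy). Dropping one peak leaves a gap that no single residue explains (incomplete fragmentation).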





Pevtsov, S.; Fedulova, I.; Mirzaei, H.; Buck, C.; Zhang, X. Journal of Proteome Research. 2006, 5, 3018-3028.

Fedulova, I.; Ouyang, Z.; Buck, C.; Zhang, X. The Open Spectroscopy Journal 2007, 1, 1-8.

Incorporating Peptide Separation

Information for Protein Identification

structure of pattern classifier









Oh, C.; Zak, S. H.; Mirzaei, H.; Regnier, F. E.; Zhang, X. Bioinformatics 2007, 23, 114-118.

Training the ANNs with a Genetic Algorithm









Oh, C.; Zak, S. H.; Mirzaei, H.; Regnier, F. E.; Zhang, X. Bioinformatics 2007, 23, 114-118.

Protein Identification Using Multiple

Algorithms and Predicted Peptide

Separation in HPLC

PIUMA architecture









Oh, C.; Zak, S. H.; Mirzaei, H.; Regnier, F. E. and Zhang, X. Bioinformatics, 2007, 23, 114-118.

Zhang, X.; Oh, C.; Riley, C. P.; Buck, C. Current Proteomics 2007, 4, 121-130.

Roadmap



Systems Biology Differential omics



1. Experimental design

2. Molecular identification

3. Data preprocessing

Spectrum deconvolution

Quality control

Alignment

Normalization

4. Statistical significance test

5. Pattern recognition

6. Molecular networks

Spectrum Deconvolution

GISTool, single sample analysis





1. To differentiate signals arising from the real analytes as opposed to signals arising from contaminants or instrument noise
2. To reduce data dimensionality, which will benefit downstream statistical analysis.

Functionality
•Smoothing and centralization
•Peak cluster detection
•Charge recognition
•De-isotope
•Peak identification at LC level
•Doublet recognition
•Doublet quantification
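As an illustration of the charge recognition and de-isotoping steps listed above, here is a minimal sketch based on isotope peak spacing. It is not the GISTool implementation; the function names, tolerance and example peak list are assumptions. The idea is that neighbouring isotope peaks of an ion with charge z are separated by roughly 1.00335/z in m/z.

```python
NEUTRON = 1.00335  # approximate 13C - 12C mass difference in Da
PROTON = 1.00728   # proton mass in Da


def infer_charge(cluster_mz, max_charge=4, tol=0.01):
    """Guess the charge of an isotope cluster from its mean peak spacing."""
    if len(cluster_mz) < 2:
        return None  # a single peak carries no spacing information
    spacings = [b - a for a, b in zip(cluster_mz, cluster_mz[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    for z in range(1, max_charge + 1):
        if abs(mean_spacing - NEUTRON / z) <= tol:
            return z
    return None


def neutral_mass(mono_mz, z):
    """Collapse a cluster to one value: the neutral monoisotopic mass."""
    return (mono_mz - PROTON) * z


# Constructed example cluster (the isotope peaks are computed, not measured).
cluster = [748.6354, 748.9699, 749.3043]
charge = infer_charge(cluster)
mass = neutral_mass(cluster[0], charge)
```

Reducing each cluster to a single (mass, charge) entry is also what lowers the data dimensionality before the statistical steps.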

GISTool Algorithm

Deconvoluting MS spectra









[Figure: deconvoluted MS spectrum from a single sample analysis; peak labels 748.6354 (3+) and 748.9694 (2+)]

Zhang, X.; Hines, W.; Adamec, J.; Asara, J.; Naylor, S.; and Regnier, F. E. J. Am. Soc. Mass Spectrom. 2005, 16, 1181-1191.

Quality Assessment / Control



• Biological Sample QA/C
  • protein assay

• Experimental Data QA/C
  • 2D K-S test
  • Percentile of detected peaks
  • Percentile of aligned peaks
  • Retention time variance vs. retention time
  • m/z variance vs. retention time
  • Frequency distribution of RT & m/z variance

Zhang, X.; Asara, J. M.; Adamec, J.; Ouzzani, M.; and Elmagarmid, A. K. Bioinformatics, 2005, 21, 4054-4059.

Data Alignment



To recognize peaks of the same molecule occurring in different samples from the thousands of peaks detected during the course of an experiment.

1. MS to MS data alignment
•Referenced alignment
•Blind alignment
•Quality depends on the information from peak detection

2. MS to MS/MS data alignment
•Depends on experimental design

LC-MS Data Alignment

XAlign software for proteomics & metabolomics data

•Detecting median sample

$M_j = \frac{\sum_i I_{i,j} M_{i,j}}{\sum_i I_{i,j}}$

$T_j = \frac{\sum_i I_{i,j} T_{i,j}}{\sum_i I_{i,j}}$

$D_i = \sum_{j=1}^{s} |T_{i,j} - \mu_j|$

•Aligning samples to the median sample
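A minimal sketch of the median-sample step follows. It rests on two assumptions that the slide does not state explicitly: that the reference value µ_j is the intensity-weighted mean retention time T_j of peak j, and that the median sample is the one with the smallest total deviation D_i. Function and variable names are illustrative, not taken from XAlign.

```python
def weighted_mean(values, weights):
    """Intensity-weighted mean, as in the expressions for M_j and T_j."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def median_sample(intensity, rt):
    """intensity[i][j] and rt[i][j] belong to sample i, peak j.

    Returns (index of the sample with the smallest D_i, list of all D_i).
    Assumes mu_j equals the weighted mean retention time T_j.
    """
    n_samples, n_peaks = len(rt), len(rt[0])
    t_mean = [
        weighted_mean(
            [rt[i][j] for i in range(n_samples)],
            [intensity[i][j] for i in range(n_samples)],
        )
        for j in range(n_peaks)
    ]
    d = [
        sum(abs(rt[i][j] - t_mean[j]) for j in range(n_peaks))
        for i in range(n_samples)
    ]
    return d.index(min(d)), d


# Three samples, two peaks; retention times in minutes (made-up numbers).
rt = [[10.0, 20.0], [10.2, 20.2], [10.6, 20.6]]
intensity = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
reference, deviations = median_sample(intensity, rt)
```

In this example the second sample sits closest to the weighted means, so it would serve as the reference to which the other samples are aligned.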









Zhang, X.; Asara, J. M.; Adamec, J.; Ouzzani, M.; and Elmagarmid, A. K. Bioinformatics, 2005, 21, 4054-4059.

Chromatogram of Serum Analyzed on GCxGC/TOF-MS

GCxGC-MS Data Alignment

metabolite component of human serum









•Four dimensions
•1535 peaks were detected

GCxGC/TOF-MS Data Alignment

MSort software for metabolomics









Criteria for alignment
•1st-dimension retention time
•2nd-dimension retention time
•Spectral correlation

Features
•Peak entry merging
•cont. exclusion
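The three alignment criteria can be combined into a simple pairwise check, sketched below. This is an illustration only, not the MSort algorithm: the dictionary layout, the tolerances and the correlation threshold are all assumed values, and spectra are taken to be intensity vectors on a shared m/z grid.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def same_compound(p, q, tol_rt1=5.0, tol_rt2=0.05, min_corr=0.9):
    """Two peaks match only if all three criteria agree."""
    return (
        abs(p["rt1"] - q["rt1"]) <= tol_rt1
        and abs(p["rt2"] - q["rt2"]) <= tol_rt2
        and pearson(p["spectrum"], q["spectrum"]) >= min_corr
    )


# Made-up peaks: retention times in seconds, spectra as intensity vectors.
peak_a = {"rt1": 600.0, "rt2": 1.20, "spectrum": [0, 100, 20, 5, 60]}
peak_b = {"rt1": 602.0, "rt2": 1.22, "spectrum": [0, 95, 25, 4, 58]}
peak_c = {"rt1": 601.0, "rt2": 1.21, "spectrum": [80, 5, 0, 90, 2]}

match_ab = same_compound(peak_a, peak_b)
match_ac = same_compound(peak_a, peak_c)
```

Peaks a and c elute at nearly the same position in both dimensions but have very different spectra, so the spectral criterion keeps them apart.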









Oh, C.; Huang, X.; Buck, C.; Regnier, F. E. and Zhang, X. J. Chromatogr. A. 2008, 1179, 205-215

Analysis Results of MAlign

53 standard acids









1. 8 [OA + FA] samples and 8 [AA + FA] samples

2. derivatization reagent: (N-Methyl-N-t-butyldimethylsilyl)-trifluoroacetamide (MTBSTFA)





Oh, C.; Huang, X.; Buck, C.; Regnier, F. E. and Zhang, X. J. Chromatogr. A. 2008, 1179, 205-215

Normalization





To reduce concentration effect and experimental variance to make the data comparable.

Methods

1. Log linear model: $x_{ij} = a_i \cdot r_j \cdot e_{ij}$

2. Reference sample normalization

3. Auto-scaling

4. Constant mean / trimmed constant mean

5. Constant median / trimmed constant median
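Two of the listed methods are sketched below in plain Python. The reading of the log linear model is an assumption: index i is taken to be the peak and j the run, so that $a_i$ is a peak effect, $r_j$ a run effect and $e_{ij}$ the residual error. The run effect is estimated from column means on the log scale, which is one simple way to fit such a model and not necessarily the one used in this work. Function names and data are illustrative.

```python
from math import exp, log
from statistics import median


def remove_run_effect(x):
    """Log linear model: estimate log(r_j) per run and divide it out.

    x[i][j] > 0 is the intensity of peak i in run j (index roles assumed).
    """
    logs = [[log(v) for v in row] for row in x]
    n_peaks, n_runs = len(logs), len(logs[0])
    grand = sum(sum(row) for row in logs) / (n_peaks * n_runs)
    run_effect = [
        sum(row[j] for row in logs) / n_peaks - grand for j in range(n_runs)
    ]
    return [[exp(row[j] - run_effect[j]) for j in range(n_runs)] for row in logs]


def constant_median(x):
    """Rescale every run so that all runs share the same median intensity."""
    n_runs = len(x[0])
    medians = [median(row[j] for row in x) for j in range(n_runs)]
    target = sum(medians) / n_runs
    return [[row[j] * target / medians[j] for j in range(n_runs)] for row in x]


# Made-up data: three peaks in two runs; run 2 is uniformly twice as intense.
x = [[100.0, 200.0], [10.0, 20.0], [40.0, 80.0]]
by_model = remove_run_effect(x)
by_median = constant_median(x)
```

Both approaches remove the twofold difference between the runs in this example; they differ in how robust they are when only some peaks change.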

CV Distribution of Peak Intensities

human serum sample









20.7%



Log linear model:

$x_{ij} = a_i \cdot r_j \cdot e_{ij}$

$\log(x_{ij}) = \log(a_i) + \log(r_j) + \log(e_{ij})$







17.3%

Roadmap





Systems Biology Differential omics



1. Experimental design

2. Molecular identification

3. Data preprocessing

4. Statistical significance test

5. Pattern recognition

6. Molecular networks

Statistical Significance Tests



To find individual peaks for which there are significant

differences between groups.



Methods

1. Pair-wise t-test (diff. mean?)

2. Mann-Whitney U test (diff. median?)

3. Kolmogorov-Smirnov test (diff. population?)

4. Kruskal-Wallis analysis of variance
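As a sketch of how one such test is applied to a single peak, here is a two-sided Mann-Whitney U test written with the standard library only. It uses the normal approximation without tie or continuity correction, so it is a simplification; if SciPy is available, `scipy.stats` offers `ttest_ind`, `mannwhitneyu`, `ks_2samp` and `kruskal` for the four methods listed. The intensities below are invented.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction).

    Returns (U statistic for group a, approximate p-value).
    """
    u = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    n1, n2 = len(a), len(b)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p


# Invented log-intensities of one peak in two groups of six samples.
diseased = [8.1, 7.9, 8.4, 8.8, 9.0, 8.6]
healthy = [6.9, 7.2, 7.0, 7.5, 6.8, 7.3]
u_stat, p_value = mann_whitney_u(diseased, healthy)
```

In a real study the test is repeated for every aligned peak, so the resulting p-values need a multiple-testing correction before peaks are called significant.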

Statistical Significance Tests

metabolome of great blue heron fertilized eggs

contaminated by PCBs







PCBs: polychlorinated biphenyls









down-regulated up-regulated fold change = I_c / I_n

blue line: p=0.05

dashed line: fold change = 0

Roadmap



Systems Biology Differential omics



1. Experimental design

2. Molecular identification

3. Data preprocessing

4. Statistical significance test

5. Pattern recognition

6. Molecular networks

Clustering or Classification





The resulting patterns provide a first glimpse of improved understanding of the underlying biology.

Unsupervised Methods
Principal component analysis (PCA)
Clustering objects on subsets of attributes (COSA)

Supervised Methods
Linear discriminant analysis (LDA)
Support vector machine (SVM)
Artificial neural network (ANN)
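To make the unsupervised case concrete, the sketch below finds the first principal component of a small data set by power iteration on the covariance matrix. It is a teaching example under simplifying assumptions (one component only, a fixed number of iterations, non-degenerate data); in practice a linear algebra library would be used. All names and numbers are illustrative.

```python
from math import sqrt


def first_principal_component(rows, iterations=200):
    """Return (unit direction, per-sample scores) of the first PC.

    rows[i][k] is variable k (e.g. a peak intensity) in sample i.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - means[k] for k in range(d)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    v = [1.0] * d
    for _ in range(iterations):
        # Repeated multiplication converges to the dominant eigenvector.
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[k] * v[k] for k in range(d)) for c in centered]
    return v, scores


# Six made-up samples with two correlated variables, forming two groups.
samples = [
    [1.0, 1.1], [2.0, 1.9], [3.0, 3.2],
    [8.0, 8.1], [9.0, 8.8], [10.0, 10.1],
]
direction, scores = first_principal_component(samples)
```

No group labels are passed in, yet the scores of the first three samples and the last three fall on opposite sides of zero, which is the kind of grouping an unsupervised method can reveal.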

Cross Species Comparison









27 of the 28 control humans and all 8 control rats cluster into one group

11 of the 14 diseased humans and all diseased rats cluster into a second group

Differential Metabolomics of

Human Blood

breast cancer samples vs. control samples


Roadmap



Systems Biology Differential omics

1. Experimental design

2. Protein identification

3. Data preprocessing

4. Statistical significance test

5. Pattern recognition

6. Molecular networks

correlation network

interaction network

regulation network

pathway analysis

Molecular Correlation Analysis

pairwise correlation of proteins and metabolites







Diseased Healthy



A A A A A A A A

B B B B B B B B

C C C C C C C C

D D D D D D D D

… … … … … … … …

Z Z Z Z Z Z Z Z

S1 S2 S3 S4 S5 S6 S7 S8

Molecular Correlation Network

an example of drug effect on disease state





•Reveals important relationships among the various components

•Complementary to abundance level information

•Provides information about the biochemical processes underlying the disease or drug response
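A correlation network of this kind can be sketched as follows: compute the pairwise Pearson correlation of molecular profiles across samples and keep the strongly correlated pairs as edges. The threshold of 0.8, the function names and the profiles are assumptions for illustration; this is not the method of the cited paper.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_network(profiles, threshold=0.8):
    """Return edges (molecule_1, molecule_2, r) with |r| >= threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, round(r, 3)))
    return edges


# Made-up abundance profiles of four molecules across samples S1..S8.
profiles = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8],
    "B": [2, 4, 6, 8, 10, 12, 14, 16],
    "C": [8, 7, 6, 5, 4, 3, 2, 1],
    "D": [3, 1, 4, 1, 5, 9, 2, 6],
}
edges = correlation_network(profiles)
```

Molecules A, B and C end up connected (C through negative correlations), while D stays isolated. Building separate networks for the diseased and healthy groups and comparing them is one way such relationships add to plain abundance changes.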









Clish, C. B.; Davidov, E.; Oresic, M.; Plasterer, T.; Lavine, G.; Londo, T. R.; Meys, M.; Snell, P.; Stochaj, W.; Adourian, A.;

Zhang, X.; Morel, N.; Neumann, E.; Verheij, E.; Vogels, J, T.W.E.; Havekes, L. M.; Afeyan, N.; Regnier, F. E.; Greef, J.;

Naylor, S. Omics: A Journal of Integrative Biology 2004, 8, 3--13.

SysNet: Interactive Visual Data

Mining of Molecular Correlation

Network

An interactive integration and visualization environment for molecular correlation of ‘omics data.

•Integrating molecular expression information generated in different ‘omics

•Visualizing molecular correlation in interactive mode

•Enabling time course data visualization and analysis

•Automatically organizing molecules based on their expression pattern in time course.



Zhang, M.; Ouyang, Q.; Stephenson, A.; Salt, D.; Kane, D. M.; Burgner J.; Buck, C. and Zhang, X. BMC Systems Biology, accepted.

Biomarker Verification

• Wet-lab verification
  • AQUA
  • MRM
  • Antibody

• In-silico verification
  • tracing lineage
  • pathway analysis

Automated Lineage Tracing



•Interested in identifying the connections between input and output data for a program

•Tracing of fine-grained lineage

Figure label: Analysis Software









Lineage Tracing

through run-time analysis



•Developed based on dynamic slicing techniques used in debugging

•Applicable to any arbitrary function
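The idea of fine-grained lineage can be illustrated with a toy wrapper type that carries, alongside each value, the set of input items it was derived from. This is only an analogy: the cited work traces lineage through run-time analysis of the program itself, whereas this sketch tracks data flow explicitly and ignores control dependences. The class and function names are hypothetical.

```python
class Traced:
    """A number that remembers which input items it was derived from."""

    def __init__(self, value, lineage):
        self.value = value
        self.lineage = frozenset(lineage)

    def _combine(self, other, op):
        # Merge lineage when both operands are traced; constants add none.
        if isinstance(other, Traced):
            return Traced(op(self.value, other.value), self.lineage | other.lineage)
        return Traced(op(self.value, other), self.lineage)

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    def __truediv__(self, other):
        return self._combine(other, lambda a, b: a / b)


def mean_above(inputs, cutoff):
    """Example analysis step: mean of all inputs above a cutoff.

    Assumes at least one input passes the cutoff.
    """
    kept = [x for x in inputs if x.value > cutoff]
    total = kept[0]
    for x in kept[1:]:
        total = total + x
    return total / len(kept)


# Three made-up peak intensities, each tagged with its own identifier.
inputs = [Traced(v, {f"peak{i}"}) for i, v in enumerate([5.0, 1.0, 7.0])]
result = mean_above(inputs, cutoff=2.0)
```

The result records that only `peak0` and `peak2` contributed to the output, which is the kind of input-to-output connection that in-silico verification of a candidate biomarker relies on.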



Zhang, M.; Zhang, X.; Zhang, X. and Prabhakar, S. 33rd International

Conference on Very Large Data Bases (VLDB 2007), 2007.

Summary



• The informatics platform developed in my group can be used to analyze protein and metabolite profiling data to differentiate disease and normal samples for biomarker discovery
• Groups identified using clustering analysis reflected the phenotypic categories of cancer and control samples, the animal and human subjects, etc. with a high degree of accuracy
• SysNet uses an interactive visual data mining approach to integrate omics data into a single environment, which enables biologists to perform data mining
• Lineage tracing technology is an efficient and effective approach for in-silico biomarker verification. This technique will significantly reduce the false discovery rate (FDR) of biomarker discovery

Acknowledgements







Irina Fedulova Dr. John Burger Dr. David Clemmer

Dr. Hamid Mirzaei Dr. Michael D. Kane Dr. John Asara

Dr. Cheolhwan Oh Dr. Fred E. Regnier Dr. Mu Wang

Sergey E. Pevtsov Dr. David Salt Dr. Jake Chen

Ouyang Qi Dr. Mohammad Sulma Dr. Steve Valentine

Alan Stephenson Dr. Daniel Raftery Dr. Steve Naylor

Mingwu Zhang Dr. Sunil Prabhakar

Postdoc Positions

Posting Title: Industrial Postdoctoral Fellow - Bioinformatician

Work Location: University of Louisville, KY

Job Type: Full time

Starting Date: Position immediately available



Job Description: Predictive Physiology and Medicine (PPM) Inc. is an exciting health and life sciences company based in Bloomington, Indiana focused on developing analytical systems for the individualized health and wellness industry. We have an immediate opening for a postdoctoral fellow. The successful candidate will develop bioinformatics systems for mass spectrometry based quantitative proteomics and metabolomics.

Requirements: The position requires a bioinformatician with a strong computational background. Priority will be given to candidates with a PhD in bioinformatics, computer science, statistics, engineering, or computational physics. The successful candidate should have a strong understanding of statistics and pattern recognition. Programming skills in Matlab, Microsoft .NET, or Java are required. Experience in analyzing biological data is not required; however, interest in multidisciplinary research is a must.

