Knowledge Discovery in Biomedicine
Limsoon Wong Institute for Infocomm Research
Plan
• Knowledge discovery in brief
• Eg 1: Optimizing treatment of childhood ALL
• Eg 2: Predicting survival of patients with DLBC lymphoma
• Concluding remarks
Copyright © 2004 by Limsoon Wong
Knowledge Discovery in Brief
What is Knowledge Discovery?
Jonathan’s blocks
Jessica’s blocks
Whose block is this?
Jonathan’s rules: Blue or Circle
Jessica’s rules: All the rest
What is Knowledge Discovery?
Question: Can you explain how?
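The block rules quoted on the previous slide (Jonathan: blue or circle; Jessica: all the rest) can be written down as a tiny classifier. This is only a sketch: the function name `owner` and the string encoding of colour and shape are assumptions made for illustration.

```python
def owner(colour: str, shape: str) -> str:
    """Assign a block to its owner using the two rules from the slide."""
    # Jonathan's rule: the block is blue OR it is a circle.
    if colour.lower() == "blue" or shape.lower() == "circle":
        return "Jonathan"
    # Jessica's rule: all the rest.
    return "Jessica"


# Hypothetical blocks, encoded as (colour, shape).
blocks = [("blue", "square"), ("red", "circle"), ("red", "square")]
owners = [owner(colour, shape) for colour, shape in blocks]
print(owners)
```

Knowledge discovery is the reverse problem: given only labelled blocks, recover a rule like this one.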
Steps of Knowledge Discovery
• Training data gathering
• Feature generation
– k-grams, colour, texture, domain know-how, ...
• Feature selection
– Entropy, χ², CFS, t-test, domain know-how, ...
• Feature integration
– SVM, ANN, PCL, CART, C4.5, kNN, ...
Some classifiers/learning methods
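The steps above can be sketched end to end on toy data. The sketch below picks one option from each list, a t-test for feature selection and kNN for feature integration; the function names, the toy matrix `X`, and the labels `y` are hypothetical and are not taken from the talk.

```python
import math
from statistics import mean, variance


def t_statistic(a, b):
    """Welch's t-statistic between two lists of values."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def select_features(X, y, k):
    """Feature selection: indices of the k features with the largest |t|."""
    labels = sorted(set(y))  # assumes exactly two classes
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == labels[0]]
        b = [row[j] for row, lab in zip(X, y) if lab == labels[1]]
        scores.append((abs(t_statistic(a, b)), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])


def knn_predict(X, y, query, features, k=3):
    """Feature integration: majority vote among the k nearest training rows."""
    def dist(row):
        return math.sqrt(sum((row[j] - query[j]) ** 2 for j in features))

    nearest = sorted(range(len(X)), key=lambda i: dist(X[i]))[:k]
    votes = [y[i] for i in nearest]
    return max(set(votes), key=votes.count)


# Toy training data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [3.0, 4.5], [3.1, 3.5], [2.9, 4.2]]
y = ["A", "A", "A", "B", "B", "B"]

chosen = select_features(X, y, k=1)
prediction = knn_predict(X, y, query=[2.8, 3.0], features=chosen, k=3)
print(chosen, prediction)
```

Any of the other listed selectors or classifiers could be swapped in; the point is only the order of the steps.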
Knowledge Discovery for Optimizing Treatment of Childhood ALL
Image credit: Yeoh et al, 2002
Childhood ALL
• Major subtypes: T-ALL, E2A-PBX, TEL-AML, BCR-ABL, MLL genome rearrangements, Hyperdiploid>50
• Different subtypes respond differently to the same Tx
• Over-intensive Tx
– Development of secondary cancers
– Reduction of IQ
• Under-intensive Tx
– Relapse
• The subtypes look similar
• Conventional diagnosis
– Immunophenotyping
– Cytogenetics
– Molecular diagnostics
• Unavailable in most ASEAN countries
Single-Test Platform of Microarray & Knowledge Discovery
training data collection
feature integration
Image credit: Affymetrix
Copyright © 2004 by Jinyan Li and Limsoon Wong
Impact
Conventional Tx:
• intermediate intensity to all
• 10% suffer relapse
• 50% suffer side effects
• costs US$150m/yr
Our optimized Tx:
• high intensity to 10%
• intermediate intensity to 40%
• low intensity to 50%
• costs US$100m/yr
• High cure rate of 80%
• Fewer relapses
• Fewer side effects
• Save US$51.6m/yr
Knowledge Discovery for Predicting Survival of Patients with DLBC Lymphoma
Image credit: Rosenwald et al, 2002
Diffuse Large B-Cell Lymphoma
• DLBC lymphoma is the most common type of lymphoma in adults
• Can be cured by anthracycline-based chemotherapy in 35 to 40 percent of patients
• DLBC lymphoma comprises several diseases that differ in responsiveness to chemotherapy
• Intl Prognostic Index (IPI)
– age, “Eastern Cooperative Oncology Group” Performance status, tumor stage, lactate dehydrogenase level, sites of extranodal disease, ...
• Not good for stratifying DLBC lymphoma patients for therapeutic trials
• Use gene-expression profiles to predict outcome of chemotherapy?
Knowledge Discovery from Gene Expression of “Extreme” Samples
240 samples, 7399 genes
“extreme” sample selection
47 short-term survivors, 26 long-term survivors
80 samples
knowledge discovery from gene expression
84 genes
T is long-term if S(T) < 0.3
T is short-term if S(T) > 0.7
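The threshold rule on the risk score S(T) can be sketched as a small function. How S(T) itself is computed is not given on the slide, so the score is simply an input here; the function name and the "intermediate" label for scores between the two thresholds are assumptions.

```python
def survival_class(risk_score: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map a risk score S(T) to a survival class using the slide's thresholds."""
    if risk_score < low:
        return "long-term"
    if risk_score > high:
        return "short-term"
    # The slide does not name this band; "intermediate" is an assumed label.
    return "intermediate"


# Hypothetical risk scores for three test samples.
classes = [survival_class(s) for s in (0.12, 0.55, 0.91)]
print(classes)
```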
Kaplan-Meier Plot for 80 Test Cases
p-value of log-rank test: < 0.0001
Risk score thresholds: 0.7, 0.5, 0.3
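For readers unfamiliar with Kaplan-Meier plots, the curve for one group of patients can be computed as below. This is a minimal sketch of the standard product-limit estimator on made-up follow-up data, not the analysis behind the plot; the log-rank test is not implemented. Note that "survival" here is the estimated survival probability over time, which is distinct from the risk score S(T) used on the previous slide.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival probability.

    times  : follow-up time of each patient
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical follow-up data (years) for five patients.
curve = kaplan_meier(times=[1, 2, 2, 3, 4], events=[1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Plotting one such curve per risk group gives a figure of the kind summarized on this slide.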
Improvement Over IPI
(A) IPI low, p-value = 0.0063
(B) IPI intermediate, p-value = 0.0003
Merit of “Extreme” Samples
(A) Without sample selection (p = 0.38)
(B) With sample selection (p = 0.009)
Without training sample selection, there is no clear difference in the overall survival of the 80 samples in the validation group of the DLBCL study
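A hedged sketch of what "extreme" training sample selection could look like: keep only patients who died early and patients who were followed for a long time, and drop the ambiguous middle. The cutoffs of 1 and 8 years, the record layout, and the function name are illustrative assumptions; the slides do not state the criteria used.

```python
def select_extreme(samples, short_cutoff=1.0, long_cutoff=8.0):
    """Split training samples into short-term and long-term survivors.

    samples: list of (sample_id, follow_up_years, died) tuples.
    The cutoffs are assumed values for illustration only.
    """
    # Short-term survivors: death observed before the short cutoff.
    short_term = [sid for sid, years, died in samples
                  if died and years < short_cutoff]
    # Long-term survivors: followed beyond the long cutoff.
    long_term = [sid for sid, years, died in samples
                 if years > long_cutoff]
    return short_term, long_term


# Hypothetical training records.
training = [
    ("s1", 0.4, True),    # died early        -> short-term
    ("s2", 9.5, False),   # alive at 9.5 yrs  -> long-term
    ("s3", 3.0, True),    # neither extreme   -> dropped
    ("s4", 0.8, False),   # censored early    -> dropped
    ("s5", 10.2, True),   # died after 8 yrs  -> long-term
]
short_term, long_term = select_extreme(training)
print(short_term, long_term)
```

Samples that fall in neither list are left out of training, which is the idea the two panels compare.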
Knowledge Discovery for A Few Other Biomedical Applications
Predict Epitopes, Find Vaccine Targets
• Vaccines are often the only solution for viral diseases
• Finding & developing effective vaccine targets (epitopes) is a slow and expensive process
• Develop systems to recognize protein peptides that bind MHC molecules
• Develop systems to recognize hot spots in viral antigens
Recognize Functional Sites, Help Scientists
• Effective recognition of initiation, control, & termination of biological processes is crucial to speeding up & focusing scientific expts
• Data mining of bio seqs to find rules to recognize & understand functional sites
Dragon’s 10x reduction of TSS recognition false positives
Understand Proteins, Fight Diseases
• Understanding function & role of a protein needs organised info on interaction pathways
• Such info is often reported in scientific papers but is seldom found in structured dbs
• Knowledge extraction system to process free text
– extract protein names
– extract interactions
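The two extraction steps can be illustrated with a deliberately naive pattern matcher. This is a toy sketch, not the system described in the talk: the protein lexicon, the verb list, and the example sentence are all assumptions chosen for illustration, and real free text needs far more robust name recognition.

```python
import re

# Hypothetical lexicon and interaction verbs (assumptions for illustration).
PROTEINS = {"p53", "MDM2", "BRCA1", "RAD51"}
VERBS = r"(binds|interacts with|phosphorylates|inhibits|activates)"


def extract_interactions(text):
    """Return (protein, verb, protein) triples found in free text."""
    # Longest names first so that longer names win over shorter prefixes.
    names = "|".join(sorted(map(re.escape, PROTEINS), key=len, reverse=True))
    pattern = re.compile(rf"\b({names})\b\s+{VERBS}\s+\b({names})\b")
    return [(m.group(1), m.group(2), m.group(3))
            for m in pattern.finditer(text)]


sentence = "MDM2 binds p53, and BRCA1 interacts with RAD51 during repair."
triples = extract_interactions(sentence)
print(triples)
```

The extracted triples are the kind of structured record that could then be loaded into a database.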
Benefits of Bioinformatics
• To the patient:
– Better drug, better treatment
• To the pharma:
– Save time, save cost, make more $
• To the scientist:
– Better science
References
• A. Yeoh et al, “Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling”, Cancer Cell, 1:133-143, 2002
• A. Rosenwald et al, “The use of molecular profiling to predict survival after chemotherapy for diffuse large B-cell lymphoma”, NEJM, 346:1937-1947, 2002
• H. Liu et al, “Selection of patient samples and genes for outcome prediction”, Proc. CSB2004, pages 382-392
Any Questions?
To be presented 10/10/04, 8.30--10.00am Raffles Convention Centre NHG-IBM Symposium