IPAM #3: Childhood Sarcoma Classification by Gene Expression Profiles
Timothy J. Triche CHLA/USC
Clinical Classification of Childhood Cancer
• Historical: Morphologic diagnosis + clinical data => risk group, protocol eligibility, treatment (eg, group-based treatment)
• Current: Combined (morphology, immunophenotype, genomic defect) => patient-specific group-based treatment
• Future: Patient-specific therapy, based on multi-genic phenotype?
Osteosarcoma
• Five histologic types, no prognostic value
• Weak prognostic features: site, size, age
• No specific, predictive genetic abnormality (RB, p53)
• Clinical stage only significant prognostic indicator at presentation
Osteosarcoma Prognosis
• Pre-resection chemotherapy => major increase in survival
• Improved survival limited to patients with ≥95% tumor kill
• Patients w/ metastases can be salvaged
But many exceptions occur:
– Responders who metastasize & die
– Non-responders who survive
– Metastatic patients who survive after resection of mets
Thus, predicting outcome & tailoring therapy remains a major problem
Osteosarcoma: Response to Chemo
Before After
Osteosarcoma Survival
• Surgery only: <10%
• Metastases, no surgery: 0%
• Metastases, surgery: ~20%
• Single-agent chemotherapy: <20%
• Conventional chemotherapy: ~44%
• Up-front chemotherapy: ~65%
• Responders: ~80%
• Non-responders: <40%
Multi-gene Analysis by Microarrays
• Single gene abnormalities, even when present, are inadequate alone to:
– Establish a diagnosis
– Identify an individual patient's risk profile
– Predict clinical course
– Predict response to therapy
– Predict outcome
• Increasing evidence suggests gene expression profiles may favorably address these issues
Gene Expression Analyses
• Scatter Analyses
– 1X1
– Groups
• Outlier Gene Analyses
– Up & down regulated from mean
– Identity
• Cluster Analyses
– All genes
– Various methods
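The outlier gene analysis listed above can be sketched as follows. This is a minimal illustration, not the analysis used in the study: it assumes "up & down regulated from mean" means a per-gene z-score against the mean across samples, and the function name `outlier_genes`, the cutoff `z_cut`, and the toy values are all hypothetical.

```python
from statistics import mean, stdev

def outlier_genes(expr, sample, z_cut=1.5):
    """Return (up, down) gene lists for one sample.

    expr   : dict mapping gene name -> list of expression values, one per sample
    sample : index of the sample to examine
    z_cut  : assumed cutoff, in standard deviations from the gene's mean
    """
    up, down = [], []
    for gene, values in expr.items():
        mu, sd = mean(values), stdev(values)
        if sd == 0:
            continue  # gene does not vary across samples; no outlier call
        z = (values[sample] - mu) / sd
        if z >= z_cut:
            up.append(gene)
        elif z <= -z_cut:
            down.append(gene)
    return up, down

# Hypothetical toy values, not data from the study.
toy = {
    "geneA": [10.0, 11.0, 9.0, 10.0, 30.0],
    "geneB": [50.0, 52.0, 49.0, 51.0, 5.0],
    "geneC": [7.0, 8.0, 7.5, 8.5, 8.0],
}
up, down = outlier_genes(toy, sample=4)
print("up:", up, "down:", down)
```

With only a handful of samples the attainable z-score is bounded, so the cutoff has to be chosen with the sample count in mind.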
Specimen Handling
A) Cut pilot section of OCT embedded frozen tissue
B) Cut ~12 frozen sections
C) Extract RNA (<5 µg total RNA)
D) Synthesis of double-stranded cDNA
E) In-vitro transcription w/ biotinylated nucleotides
F) Size confirmation of cRNA transcripts
G) Fragmentation of cRNA
[Figure labels: tumor; non-tumor; pure tumor (dissection of tumor tissue when possible); 500 bp]
Osteosarcoma: Gross Appearance
Histopathology of Osteosarcoma
Gene Expression: Osteosarcoma
Pilot data
6= primary tumor, 1993
11= first metastasis, 1996
9= second metastasis, 1998 (died 1999)
Met 1 vs. met 2: little similarity
Primary vs. 1st Metastasis
Primary 1st Pulmonary Met
Primary vs. Metastatic Osteosarcoma
Ribosomal Protein L30 Osteonectin Ribosomal Protein L37A TF SL1 Thymosin IMP E16 Pinch Protein CPT1 Cyclin A Tat-SF1 CAMP PK RII subunit NGF beta Tyrosine Phosphatase PIGA, A Uncoupling Protein 3 PSA Ribosomal Protein L32 PSG11
Differential Gene Expression:
Osteonectin lost in metastasis
[Chart: Primary vs. Metastasis; expression axis from -50000 to 250000]
Primary vs. “Metastasis”
Primary, 1993, pre-Rx
Tibia lesion, 1998, pre-Rx
Gene Expression Data Clustering
Multiple methods work
Pattern Recognition
No process knowledge
Data Postulated Patterns
Millions of possible patterns
Generate possible patterns:
• Scenario analysis
• Non-numeric simulations
• Computational linguistics
• Neural networks
• Linear/non-linear optimization methodology
Discovery of patterns buried in massive dataset
Iterative Process
Pattern Tested
Neural net uses data to optimize pattern
New Postulated Patterns
Pattern recognition
New rules developed
Optimized Set of Patterns
Limited set of probable patterns
Agglomerative vs. Optimizing Hierarchical Clustering
• Both build a tree of clusters, with data points as leaves, & “nearby” data points as siblings.
• Agglomerative method repeatedly finds closest pair and irreversibly groups them. Bottom-up. Binary tree.
• Optimization methods reconsider assignments based on other assignments and their effects on cluster means & variances.
• Minimize sum of squared distances.
– Distance measure matters.
– Relate to statistical noise models, co-regulation models & likelihood of fit.
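The contrast above can be illustrated with a small sketch. It is not the Mimir implementation: it assumes squared Euclidean distance, uses centroid linkage for the agglomerative step, and uses plain k-means as the example of an optimizing method. All function names and the toy points are hypothetical.

```python
from itertools import combinations

def sq_dist(a, b):
    """Squared Euclidean distance (the assumed distance measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def agglomerate(points):
    """Bottom-up: repeatedly merge the two clusters with the closest centroids.
    Returns a binary tree of point indices; a merge is never revisited."""
    clusters = [(i, [p]) for i, p in enumerate(points)]  # (tree, members)
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: sq_dist(centroid(clusters[ij[0]][1]),
                                          centroid(clusters[ij[1]][1])))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

def kmeans(points, means, iters=20):
    """Optimizing: reassign every point to its nearest mean, then update the
    means, until assignments stop changing. Each pass cannot increase the
    sum of squared distances."""
    means, labels = list(means), None
    for _ in range(iters):
        new = [min(range(len(means)), key=lambda k: sq_dist(p, means[k]))
               for p in points]
        if new == labels:
            break
        labels = new
        for k in range(len(means)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                means[k] = centroid(members)
    return labels, means

# Hypothetical toy points, not expression data.
pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
tree = agglomerate(pts)
labels, final_means = kmeans(pts, [(0.0, 0.0), (5.0, 5.0)])
print(tree, labels, final_means)
```

The agglomerative tree is fixed once a pair is merged, whereas the k-means loop is free to move a point to another cluster on any later pass.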
Agglomerative vs. Optimizing Hierarchical Clustering, cont.
• Optimize means, variances, and cluster memberships.
• Currently we optimize top-down, by levels
• Expectation Maximization: soft memberships. K-means: hard.
• Optimize tree topology (fanout) by CV
• SOM also optimizes at one level, and requires low-dimensional grid embedding of cluster means.
• Alternative to data-cluster distances: cliques of low data-data distances. Also has EM-like stat mech algorithms.
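The difference between soft (EM) and hard (K-means) memberships can be shown with a deliberately simplified sketch. It assumes a one-dimensional, equal-weight Gaussian mixture with a fixed shared variance, so only the means are updated; the slides describe optimizing variances as well, which is omitted here for brevity. The function names and the toy values are hypothetical.

```python
import math

def soft_memberships(xs, mus, var):
    """E-step: each point gets a probability of belonging to each cluster."""
    resp = []
    for x in xs:
        p = [math.exp(-(x - m) ** 2 / (2 * var)) for m in mus]
        total = sum(p)
        resp.append([pk / total for pk in p])
    return resp

def em_means(xs, mus, var=1.0, iters=50):
    """Alternate E-steps and M-steps; the M-step here updates means only."""
    mus = list(mus)
    for _ in range(iters):
        resp = soft_memberships(xs, mus, var)
        for k in range(len(mus)):
            nk = sum(r[k] for r in resp)  # effective cluster size
            mus[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
    return soft_memberships(xs, mus, var), mus

def hard_memberships(xs, mus):
    """K-means style: each point belongs entirely to its nearest mean."""
    return [min(range(len(mus)), key=lambda k: (x - mus[k]) ** 2) for x in xs]

# Hypothetical toy values; the last point sits between the two groups.
xs = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 2.5]
resp, mus = em_means(xs, [0.0, 6.0])
hard = hard_memberships(xs, mus)
print("soft membership of 2.5:", resp[-1])
print("hard memberships:", hard)
```

The in-between point keeps a fractional membership in both clusters under EM, while the hard assignment places it wholly in the nearer one.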
Mimir User Interface
Courtesy of Eric Mjolsness, JPL
Data Flow for Sarcoma Analysis
gene clustering
sample clustering
classifiers
data
labels
scoring
Pilot Study of Sarcomas
17 cases of osteosarcoma and rhabdomyosarcoma
6800 GeneChip analysis
6800 genes yield 14 gene clusters
Reduced mean space yields 4 sample clusters
OS OS
OS, OS ERMS
OS, ARMS 1° ERMS x 4
OS x 3 ARMS met X 3
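One way to read "reduced mean space" is that each sample is re-described by the mean expression of each gene cluster (14 values instead of 6800), and samples are then clustered on those values. The sketch below shows only that reduction step under this assumption; `reduce_to_cluster_means` and the toy data are hypothetical and do not reproduce the pilot study.

```python
def reduce_to_cluster_means(expr, gene_cluster):
    """Describe each sample by the mean expression of every gene cluster.

    expr         : dict gene -> list of expression values, one per sample
    gene_cluster : dict gene -> cluster id (output of a gene clustering step)
    Returns one row per sample, with one mean per cluster (sorted by id).
    """
    ids = sorted(set(gene_cluster.values()))
    n_samples = len(next(iter(expr.values())))
    reduced = []
    for s in range(n_samples):
        row = []
        for c in ids:
            vals = [expr[g][s] for g in expr if gene_cluster[g] == c]
            row.append(sum(vals) / len(vals))
        reduced.append(row)
    return reduced

# Hypothetical toy data: 4 genes in 2 gene clusters, 3 samples.
expr = {
    "g1": [1.0, 2.0, 9.0],
    "g2": [3.0, 4.0, 11.0],
    "g3": [10.0, 10.0, 0.0],
    "g4": [12.0, 14.0, 2.0],
}
gene_cluster = {"g1": 0, "g2": 0, "g3": 1, "g4": 1}
reduced = reduce_to_cluster_means(expr, gene_cluster)
print(reduced)
```

Sample clustering would then run on the rows of `reduced`, where the first two toy samples lie close together and the third is far from both.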
Expandable Tree of Variables Characterizing a Tissue Sample
All variables
Subject Conditions Genes
Outcomes
Clinical
Demographics
Clinical response
Metastasis
Survival
Treatment Pathology
Age, sex, etc.
EM (Expectation Maximization) Gene Clustering
Sarcoma Dataset: 45 cases of RMS (Alv + Emb) & Osteosarcoma (R + NR)
[Figure labels: A, B, C, F, G, D, J, K]
[Legend: POOR, INTERMEDIATE, FAVORABLE]
Working hypothesis:
Gene expression profiling can detect prognostic distinctions among sarcomas independently of conventional clinical or diagnostic criteria
Future Directions
• Analyze larger data set (institutional, COG) to test hypothesis
• Expand to all sarcomas (RMS, non-RMS, OS, ESFTs)
• Identify biologically important genes
• Creation of custom “sarcoma” arrays using oligomers representing these genes
• Long term studies of COG sarcoma patients using arrays in context with current clinical & biology studies
All osteosarcomas
Osteosarcoma vs RMS Genes
Log (Osteo)
5.68 194.82 6.50 1952.82 7.31 2240.65 6.01 162.35 6.66 5.90 5.75 1225.65 6.43 5.64 8.63 12771.76 8.25 12913.76 6.18 8.77 19340.53 7.09 6.36 6.62 732.18 1659.82 907.00 8.58 10916.54 7.72 4836.68 8.00 6595.57 1.83 1.84 1124.24 8.01 7048.57 6.29 1756.00 1.90 2.25 5.63 2011.11 2.33 1166.59 890.82 8.40 9229.32 7.48 3556.00 10.75 77271.57 2.33 2.36 2.39 2058.88 629.94 8.93 13677.36 7.93 9770.93 7.71 3923.54 2.44 2.49 2.56 8.28 8099.00 2.71 9.77 31139.89 2.79 8.87 16832.71 2.83
Mean (Osteo)
Log (Rhabdo)
7.96
Mean (Rhabdo)
6357.93
Mean log Rhabdo
2.83 2.83
GENE DESCRIPTION
TNNT1 Troponin T1, skeletal, slow
MEST Mesoderm specific transcript (mouse) homolog
IGF2 Insulin-like growth factor 2 (somatomedin A)
FGFR4 Fibroblast growth factor receptor 4
IGF2 Insulin-like growth factor 2 (somatomedin A)
Adrenal-Specific Protein Pg2
Steroid receptor coactivator (SRC-1) mRNA
GB DEF = DNA for cellular retinol binding protein (CRBP) exons 3 and 4
RBP1 Cellular retinol-binding protein
Insulin-Like Growth Factor 2
PTN Pleiotrophin (heparin binding growth factor 8, neurite growth-promoting factor 1)
Muscle acetylcholine receptor alpha-subunit
MMP2 Matrix metalloproteinase 2 (gelatinase A; collagenase type IV)
CCND2 Cyclin D2
TNNI1 Troponin I, skeletal, slow
MYL1 Myosin light chain (alkali)
Proposed COG Study of All Sarcomas
Acknowledgements
• CHLA:
– Deb Schofield
– Jingsong Zhang
• Caltech:
– Barbara Wold
– Chris Hart
• USC:
– Jonathan Buckley
– Kim Siegmund
• JPL:
– Eric Mjolsness
– Tobias Mann
– Joe Roden
– Ben Bornstein
• NCCF:
– Mark Krailo
• UBC:
– Poul Sorensen