caArray Database in the cancer Biomedical Informatics Grid, caBIG

Shared by: Shame Ona
Posted: 2/11/2009
National Cancer Institute

Project Goals:
• MIAME compliant data annotation
• Data sharing
• Provide microarray data for analytical applications via the MAGE-OM API
• Data Integration across different domains in caBIG/caGRID

The NCI Center for Bioinformatics: Who are we?
• The Center for Bioinformatics is the NCI’s strategic and tactical arm for research information management
• We collaborate with both intramural and extramural groups
• Production, service-oriented organization
• Mission to integrate and harmonize disparate research data

Motivation for the caBIG program
• Overwhelming volume of data
• Multitude of sources
• Each part of the health community speaks its own scientific “dialect” (e.g. lab values, genetic profile, clinical data)
• Lack of consensus on common standards and terms
• Lack of coordination across, and collaboration within, the cancer research enterprise

Cancer Biomedical Informatics Grid™ (caBIG™)
• Common, widely distributed infrastructure permits research community to focus on innovation
• Shared vocabulary, data elements, data models facilitate information exchange
• Collection of interoperable applications developed to common standards
• Biomedical research data will be available for mining and integration

Current caBIG™ community
• NCI-designated Cancer Centers (50)
  – Academic Centers (integrated into broader biomedical infrastructure)
  – Stand-alone (community leaders)
  – Community outreach
• NCI Divisions
• NIH NECTAR grantees
• Government
• Industry
• International Groups
  – Standards development organizations
  – U.K.’s National Cancer Research Institute
• ~800 active participants

Four Domain Workspaces and two Cross Cutting Workspaces have been launched

DOMAIN WORKSPACE 1: Clinical Trial Management Systems
  Addresses the need for consistent, open and comprehensive tools for clinical trials management.
DOMAIN WORKSPACE 2: Integrative Cancer Research
  Provides tools and systems to enable integration and sharing of information.
DOMAIN WORKSPACE 3: Tissue Banks & Pathology Tools
  Provides for the integration, development, and implementation of tissue and pathology tools.
DOMAIN WORKSPACE 4: Imaging
  Provides for the sharing and analysis of in vivo imaging data.
CROSS CUTTING WORKSPACE 1: Vocabularies & Common Data Elements
  Responsible for evaluating, developing, and integrating systems for vocabulary and ontology content, standards, and software systems for content delivery.
CROSS CUTTING WORKSPACE 2: Architecture
  Developing architectural standards and architecture necessary for other workspaces.

caArray Data Portal
http://carraydb.nci.nih.gov
• Number of raw data files in the database: 699 (565 .cel, 134 .gpr)
• Number of public experiments in the database: 35
• Registered users: 306
• Number of downloads from the download center: 460
• Average unique users per month: 282
• Average time spent per session: 16 minutes

caArray: Compliance with Standardization Efforts
• MIAME
  – Minimum Information About a Microarray Experiment
  – 1.1 Draft 6 (April 1, 2002)
  – http://www.mged.org/Workgroups/MIAME/miame_1.1.html
• MAGE-ML
  – MicroArray and GeneExpression Object Model and Markup Language
  – 1.1 (October 2003)
  – http://www.omg.org/docs/formal/03-10-01.pdf
• MGED Ontology
  – Microarray Gene Expression Data Ontology
  – 1.1.8 (April 2004)
  – http://mged.sourceforge.net/ontologies/MGEDontology.php

[Slide footer: caAMEL- caArray Integration, October 2006]

caBIG Compatibility Guidelines

caArray Workflow

MGED Ontology (MO) and caArray
Now: MO stored in caArray
Future: caArray integration with the Enterprise vocabulary services (EVS)
[Diagram labels: caArray, caCORE Server, caCORE API, LexBIG API, Mediator, EVS]
Two Categories:
1) Terms provided by MO, e.g. BioMaterialLabel
2) MO references existing ontologies, e.g. DiseaseState

LabelCompound
Compounds that are used for labeling extracts.

Ontology Entry
External (to the MGED ontology) controlled vocabulary or ontology that can be referred to, such as ICD-9 or Gene Ontology.
Used in classes:
– CellLine
– CellType
– ClinicalHistory
– ClinicalTreatment
– Compound
– DevelopmentalStage
– DiseaseStaging
– DiseaseState
– FamilyMember
– GeographicLocation
– Histology
– Organism
– OrganismPart
– Phenotype
– SequenceOntologyBioSequenceType
– StrainOrLine
– TargetedCellType
– TestType
– TumorGrading

BioMaterialCharacteristics, Disease State
The name of the pathology diagnosed in the organism from which the biomaterial was derived. The disease state is normal if no disease has been diagnosed.

DiseaseStateDatabase

Ontology Entry in caArray

BioMaterialCharacteristics in caArray

caArray Data Portal & Data Analysis Tools
1. Data Portal: submission of original, raw data files with associated experiment and sample information (MIAME annotations).
2. Data analysis and visualization tools:
   caBIG tools:
   1. caWorkbench - Columbia
   2. GenePattern - MIT/Broad
   3. DWD - UNC Lineberger
   4. GenePattern - MIT/Broad
   5. VISDA - Georgetown
   6. Cancer Molecular Pages - Burnham
   7. Function Express - Wash U Siteman
   8. webGenome - RTI
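
The Ontology Entry and Disease State slides above describe an annotation as a category/value pair that may point to a vocabulary outside the MGED Ontology, such as ICD-9. A minimal sketch of that idea follows. It is not the MAGE-OM or caArray API: the class names ExternalReference and OntologyEntry as written here, the helper disease_state, the diagnosis "glioblastoma", and the accession "EXAMPLE-CODE" are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical illustration only: these classes are NOT the MAGE-OM or caArray API.


@dataclass(frozen=True)
class ExternalReference:
    """Pointer into a vocabulary outside the MGED Ontology (e.g. ICD-9)."""
    database: str   # name of the external vocabulary
    accession: str  # term identifier; a placeholder value in this sketch


@dataclass(frozen=True)
class OntologyEntry:
    """A category/value pair, optionally backed by an external vocabulary."""
    category: str
    value: str
    reference: Optional[ExternalReference] = None


def disease_state(diagnosis: Optional[str],
                  reference: Optional[ExternalReference] = None) -> OntologyEntry:
    """Build a DiseaseState entry; the value is 'normal' when nothing was diagnosed."""
    if diagnosis is None:
        # Per the slide: the disease state is normal if no disease has been diagnosed.
        return OntologyEntry(category="DiseaseState", value="normal")
    return OntologyEntry(category="DiseaseState", value=diagnosis, reference=reference)


# Usage: one undiagnosed sample and one sample with an externally referenced term.
healthy = disease_state(None)
tumor = disease_state("glioblastoma", ExternalReference("ICD-9", "EXAMPLE-CODE"))
print(healthy.category, healthy.value)
print(tumor.value, tumor.reference.database)
```

Keeping the external reference optional mirrors the two categories on the MGED Ontology slide: terms that MO provides itself need no pointer, while terms such as DiseaseState can defer to an existing vocabulary.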
caArray Configuration (federated)
[Diagram labels: caArray schema, caARRAY EJB, MAGE-OM API, Security, Java app, MAGE-ML, caWorkbench, caBIO, caDSR / EVS, NCICB, caGRID (future)]

geWorkbench
www.geworkbench.org
Columbia University, NY: Andrea Califano, Aris Floratos, Manjunath Kustagi
Key Features:
• Gene expression data
  – Support for several formats
  – Seamless integration with caArray
  – Rich collection of visualizers
  – Filtering, normalization, analysis components
  – Reconstruction and visualization of molecular interaction networks
• Sequence analysis
  – Sequence homology (BLAST, HMM, Smith-Waterman)
  – Promoter analysis
  – Motif discovery
  – Synteny
• Annotations
  – Integration with CGAP gene annotations and BioCarta pathways
  – Mapping to GO categories
• Development platform
  – Open source, Java-based
  – Component architecture, facilitating customization

GenePattern: A platform for integrative genomics
Ted Liefeld, Broad Inst.
Environments: Module Repository, Pipeline Environment, Graphical Environment, Programming Environments
Input: MAGE caArray, file, or URL
65+ Modules: Microarray Analysis, Proteomics, Clustering, Marker selection, SNP/Copy #, Data visualization
Task Integrator: add your own analysis methods to GenePattern
Pipeline steps shown in the slide diagram:
– MAGE-ML ImportViewer: import MAGE-ML data
– extract breast samples
– GeneNeighbors: compute nearest neighbors of cyclin D1 in breast cells
– SelectFeaturesColumns: extract ovary samples
– SelectFeaturesRows: get expression data for breast neighbors in ovary cells
– ConvertToMAGEML: write out MAGE XML file
Other diagram labels: Bicluster, PreprocessDataset, Heat Map, Prediction Results, GenePattern MAGE data file

Programming Environments (R script excerpt; several lines are cut off at the slide edge):

    # source("D:/CGP2003/GenePattern_modules/Golub_et_al_1999.R", echo = TRUE)
    # GenePattern
    #
    # Molecular Classification of Cancer: Class Prediction by Gene Expression
    #
    # Summary: This R/GenePattern script implements the supervised prediction metho
    # in Golub et al 1999, Science 286:531-537 (1999).

    # Load and set up GenePattern commands and server
    source("http://wilkins.wi.mit.edu:7070/gp/GenePattern.R", echo = FALSE, print.ev
    server <- SOAPServer("http://wilkins.wi.mit.edu", "/axis/servlet/AxisServlet", 7
    source(paste("http://", server@host, ":", server@port, "/gp/getAllTaskWrappers.j

    # Neighborhood analysis
    MS.out <- MarkerSelection("data.filename" = "http://www-genome.wi.mit.edu/mpr/pu
        "class.filename" =“”
        "pred.results.file" = "pred.results",
        "data.results.file" = "data.results",
        "num.permutations" = "25",
    file.show(MS.out$pred.results)
    file.show(MS.out$data.results.gct)
    data <- read.table(MS.out$pred.results, header=T, sep="\t", skip=14)

caGrid 0.5.4
Current stable, supported grid environment for caBIG
Advertise, Query, and Discover services and data

caGrid 0.5 Test Bed
Current caGrid 0.5 Nodes
NCICB
1. GUMS
2. CAMS
3. GME
4. caBIO
5. caArray
6. caMOD
7. SNP500Cancer
Duke
8. rProteomics
Pittsburgh Medical Center
9. caTIES
Lombardi Georgetown
10. PIR: proteomics inf. resource
[Diagram labels: caBIG, NCI, Index Service, GUMS, CAMS, GME, caBIO, caArray, Pittsburgh, caTIES, Duke, rProteomics, Georgetown, PIR]

http://cagrid-browser.nci.nih.gov
caGrid Development team:
Ohio State University
NCICB
SAIC, Oracle, Panther Informatics, TerpSys, BAH

Discovery utilizing caBIG™: “Idealized” ICR Data Flow
Integrated Cancer Research Tools
[Diagram: data flow across the ICR tools. Tool labels: FunctionExpress, Promoter DB, GenePattern, caArray, Pathways Tool, caTIES, caTissue, TrAPSS, Proteomics LIMS, Q5, PIR. Step labels: Clinical Trials; Tumor Samples (400 brain tumor tissue samples acquired); Pathology reports; Discrete and manual annotation on tissues; Clinical Annotation Modules; Gene expression profiling; Gene annotation; Analysis; Annotation; Identify recurring promoter elements; Identify up-regulated genes in specific pathways; Mutation identification; Potential Drug Targets and Biomarkers]

NCICB Download
http://ncicb.nci.nih.gov/download
GForge site: http://gforge.nci.nih.gov

Training and Support
ncicb@pop.nci.nih.gov

Acknowledgements
caArray team:
• Mervi Heiskanen (NCI)
• Anand Basu (NCI)
• Xiaopeng Bian (NCI)
• Rob Daly (5am Solutions)
• Eric Tavela (5am Solutions)
• Leslie Power (5am Solutions)
• Steven Matyas (5am Solutions)
• Jerry Eads (NTVI)
Contact: heiskame@mail.nih.gov
caBIG Developers & Adopters
EVS team: Gilberto Fragoso (NCI), Frank Hartel (NCI)
Training & Support: Don Swan (TerpSys)
