caBIG® Essentials:
An Introduction to the cancer
Biomedical Informatics Grid

                   January 2011
Table of Contents

• Section 1: Introduction to caBIG®

• Section 2: caBIG® Community

• Section 3: caBIG® Capabilities

• Section 4: caBIG® Infrastructure

• Section 5: caBIG® Data Sharing and Security Framework

• Section 6: Deploying caBIG®

• Section 7: Working Toward Interoperability

• Section 8: Accessing caBIG® Support
From Bench to Bedside

Personalized Medicine
• Applies genomic,
  molecular, and clinical
  data to target the
  delivery of health care.
• Allows treatments to be
  tailored based on
  genetic makeup.
• Focuses on variations,
  to better match
  patients with the
  therapy most effective
  for them.
Transforming Scientific Research

• Biomedicine and cancer research are
  moving towards an enhanced
  understanding of the molecular basis of
  disease

• Information technology, informatics and
  adoption of standards are accelerating the
  collection, analysis, integration and
  dissemination of data associated with
  cancer research and care

• caBIG® is an information network enabling
  the cancer community (researchers,
  clinicians, and patients) to share data and
  knowledge to accelerate discovery and
  improve outcomes.
caBIG® Vision and Goals

caBIG® Vision
A virtual network of interconnected data, individuals, and organizations
that redefines how research is conducted, care is provided, and
patients/participants interact with the biomedical research enterprise.

caBIG® Goals
  • Connect the cancer research community through a shareable,
    interoperable infrastructure
  • Deploy and extend standard rules and a common language to more
    easily share information
  • Build or adapt tools for collecting, analyzing, integrating, and
    disseminating information associated with cancer research and care
caBIG® Core Principles

• Open Access – caBIG® is open to all,
  enabling widespread access to tools,
  data, and infrastructure

• Open Development – Planning,
  testing, validation, and deployment of
  caBIG® tools and infrastructure are
  open to the entire research community

• Open Source – The underlying
  software code of caBIG® tools is
  available for use and modification

• Federation – Resources can be
  controlled locally, or integrated across
  multiple sites
caBIG® Strategy
•   Community
     • Establish an open community of participants from a wide spectrum of
       disciplines, geographies, types of institutions, etc.
     • Facilitate the work of others who are building capabilities in life sciences
       and clinical research
     • Adopt a federated model to allow local control of sharing and
       partnerships and to support individual labs and institutions
•   Content
     • Facilitate access to rich primary data
     • Leverage existing academic and commercial software, wherever
       possible, to avoid unnecessary development time and expense
     • Invest primarily in open source tools that the community does not have
•   Connectivity
     • Recognize legacy IT systems to avoid “rip and replace” costs
     • Wherever feasible, make disparate applications compatible, enabling
       “plug-and-play” data sharing through standards-based, interoperable
       infrastructure
How Connecting with caBIG® Benefits You

• Benefits of caBIG® for cancer research organizations:

 • Manage and use local and publicly accessible biomedical data
   for research
 • Connect and streamline research workflows
 • Increase accuracy of analytical processes
 • Standardize and streamline data collection
 • Perform complex analysis across data sets
 • Share data appropriately

• caBIG® addresses a critical problem facing basic and clinical
  researchers today: large volumes of data and increased
  needs for collaboration that require new approaches for data
  collection, management, and analysis.
caBIG® Supports Individual and Institutional Needs

Analytical tools are linked with molecular analysis tools, biospecimens,
and clinical trials data within a research unit.
caBIG® Supports Individual and Institutional Needs

[Figure: research units and research centers (analytical tools, microarray data, biospecimens, clinical trials) connected through shared caBIG® infrastructure: Dorian and GTS security services, Index Service advertisement/discovery, common data vocabularies and ontologies, GME schema management, workflow management service, and federated query service]

These tools are supported with infrastructure, shared vocabularies, and workflow tools.
The caBIG® Community
The caBIG® Community:
Sponsor and Workspaces

• caBIG® is sponsored by the National Cancer Institute (NCI) and
  is administered by the National Cancer Institute's Center for
  Biomedical Informatics and Information Technology (NCI CBIIT).

• caBIG® priorities are set by the NCI and by groups called
  workspaces, composed of caBIG® community members
  • Domain workspaces focus on specific areas of cancer research
    and oversee the development of domain-focused software tools.
  • Cross-cutting and Strategic workspaces support these groups by
    developing standards, infrastructure, policies, and documentation.

  Workspaces conduct regular open teleconferences and face-to-face
   meetings – visit the caBIG® Community Web site to learn more.
caBIG® Workspaces Bring Together Communities

The caBIG® Workspaces are the thematic areas and virtual environments
where caBIG® activities are grouped and prioritized. Anyone can help
shape the future of caBIG® by attending calls and meetings.

Gathering Stakeholders:
• NCI
• caBIG® Program Staff
• Developers/Adopters
• Subject Matter Experts
• Patient Advocates
• Industry and Government Stakeholders
• Support Representatives

Conducting Many Activities:
• Bring together, within each domain, funded and volunteer community
  members engaged in development and adoption
• Serve as key operational units of caBIG® and as representatives of the
  larger community
• Open to all who wish to participate
• Convened through regularly scheduled teleconferences, webinars, and
  face-to-face meetings
• All products openly and publicly shared through websites, forums, wikis,
  and listservs
caBIG® Workspaces
Domain-level:
  • Clinical Trials Management Systems Workspace (CTMS)
  • Integrative Cancer Research Workspace
  • In Vivo Imaging Workspace (IMAG)
  • Tissue Banks & Pathology Tools Workspace (TBPT)

Strategic-level:
  • Documentation and Training Workspace (D&T)
  • Data Sharing & Intellectual Capital Workspace

Cross-cutting:
  • caBIG® Vocabularies and Common Data Elements Workspace (VCDE)
  • caBIG® Architecture Workspace (ARCH)
Domain Workspace Charters/Focus

• Clinical Trials Management Systems Workspace:
 • Specification, implementation and integration of tools for clinical data
• Integrative Cancer Research Workspace:
 • Extension and implementation of tools to enable researchers to integrate
   and share data collected from a variety of heterogeneous sources,
   including de-identified patient clinical data and information from high-
   throughput research techniques
• Tissue Banks and Pathology Tools Workspace:
 • Integration and implementation of tools for collection and maintenance of
   tissue and pathology data
• In Vivo Imaging Workspace:
 • Development of tools and methods based on standardization and
   interoperability to share and optimize imaging information
Cross-Cutting Workspaces: Charter/Focus

• Vocabularies and Common Data Elements Workspace:
 • Evaluate/integrate systems for vocabulary and ontology content
   development, and software systems for content delivery
 • Develop standards for representation of ontologies and vocabularies

• Architecture Workspace:
 • Bring new architectural developments and standards to caBIG®
 • Act as interface to IT communities developing novel architectures
 • Develop white papers on new architectural opportunities
 • Provide architectural expertise and solutions
Cross-Cutting Workspaces: Charter/Focus, cont'd

• Data Sharing and Intellectual Capital Workspace:
  • Development of white papers on means of protecting intellectual capital
    while allowing for the exchange of scientific information among cancer researchers
  • Suggesting software specifications for inclusion in systems providing
    federated data to the community
  • Evaluation of effects of Federal privacy regulations
  • Structuring discussions on how Institutional Review Boards (IRBs) and
    Patient Consent can impact the aggregation, storage and analysis of data
    from clinical trials

• Documentation and Training Workspace:
  • Creation and dissemination of documentation and training materials
  • Development of educational resources for those deploying caBIG®
The Enterprise Support Network (ESN)
Extends caBIG® Community and Support
• Knowledge Centers serve
  as a key support structure for an expanding community employing caBIG®
  tools, standards, and infrastructure in a specific domain. Knowledge Center
  staff can provide expert guidance to end users, IT staff and senior decision
  makers implementing caBIG® tools and infrastructure.

• Support Service Providers (SSPs) provide comprehensive,
  client-specific technical support, including:
  • Help Desk Support
  • Adaptation and Enhancement of caBIG®-Compatible Software
  • Deployment Support for caBIG® Software Applications
  • Documentation and Training Materials and Services
caBIG® Supports the NCI Cancer Community

caBIG® participation:
• 56 NCI-designated
  Cancer Centers
• 30 NCI Community
  Cancer Centers
• 2,300+ participants from
  more than 700 institutions
• 149 “nodes” connected
• 19 licensed Support
  Service Providers (SSPs)
• 70+ caBIG® applications

[Map of caBIG® participation: NCI-designated Cancer Centers (green squares) and NCCCP sites (orange circles)]
Partial List of NCI Programs Added since caBIG® Launch
• Mouse Models in Human Cancer (MMHCC)
• Integrated Cancer Biology Program (ICBP)
• Interagency Oncology Taskforce
• Clinical Proteomic Technologies for Cancer Initiative
• Nanotechnology Alliance
• Specialized Programs of Research Excellence (SPOREs)
• Strategic Partnering to Evaluate Cancer Signatures (SPECS)
• Cancer Genetic Markers of Susceptibility (CGEMS)
• Glioblastoma Multiforme Diagnostic Initiative (REMBRANDT)
• Clinical Trials Working Group
• Translational Research Working Group
• The Cancer Genome Atlas (TCGA)
• Therapeutically Applicable Research to Generate Effective Treatments
  (TARGET) Initiative
• Office of Biorepositories and Biospecimen Research (caHUB)
• NCI Community Cancer Centers Program
International Collaborations

• 16 countries are engaged with and/or using caBIG® tools and
  technologies, including:

  •   United Kingdom
  •   India
  •   China
  •   Mexico, Brazil, Uruguay, Argentina, Chile
  •   Czech Republic
  •   The Netherlands
  •   Germany
  •   Finland
  •   Jordan
  •   Pakistan
  •   Australia
  •   New Zealand
caBIG® Capabilities:
     Clinical Trials
Clinical Trials Management:
The Research Landscape & Needs

           • A vast amount of clinical data is still generated using
             paper-based methods and remains unavailable to
             support research.
           • Adult patient enrollment in clinical trials is relatively low
             for many reasons, leading to insufficient sample sizes
             and a corresponding lack of research data.

           • While hospital registries maintain data, they may lack
             real time data capture, clinically meaningful
             information, treatment detail, and data on recurrence.

           • Informatics solutions must reflect regulatory and legal
             requirements – balancing the public good of research
             against personal privacy and protection
Clinical Trials Management:
The Research Landscape & Needs

• In the current landscape, organizational and data “disconnects” slow the
  time to translate research findings into safe and effective products.

• Clinical data from disparate sources are difficult to integrate, making it
  hard to track patients across sites and time.

• Researchers need tools that:
  • Track clinical trial registrations
  • Facilitate automatic capture of clinical laboratory data
  • Manage reports describing adverse events during clinical trials
  • Facilitate data sharing within an institution or across a multi-site trial
  • Support rapid integration of evolving science, such as patient-reported
    outcomes, and of emerging research areas such as epigenetics
Clinical Trials Management
         Patterns of Usage:
         • Complete end-to-end clinical trials management capability
         • Specific capability to fill a gap in existing capabilities
         • Linkage tools (Integration Hub, caGrid, data standards) to
           connect existing in-house or commercial applications
         • Specifications to build or modify in-house developed
           applications for interoperability
         caBIG® Capabilities:
         •   Collect clinical trials data - C3D
         •   Enroll, register, and track clinical trial participants across multiple sites - C3PR
         •   Create and manage clinical trial participant schedules and activities – PSC
         •   Store and browse clinical laboratory data; share with other systems - Lab Viewer
         •   Collect and report adverse events - caAERS
         •   Connect systems and support clinical trials workflow integration; provide SOA-based
             interoperability - Integration Hub
         •   Interoperate / Share data with 3rd party CDMS systems - Clinical Connector
         •   Exchange of clinical trials data across multiple systems - CTODS
         •   Provide common data elements and case report forms for standardization and reusability - CRF
         •   Help investigators comply with Federal registration requirements – FIREBIRD
         •   Clinical Trials Portfolio Management - CTRP
caBIG® Clinical Trials (CTMS) Tools
• caBIG® Central Clinical Participant Registry (C3PR): eligibility verified,
  patients registered to studies
• Patient Study Calendar (PSC): tracks patient schedules throughout the study
• Lab Viewer: identifies labs and loads them into the CDMS and/or AE system
• Clinical data sources (e.g., clinical chemistry labs)
• Electronic Data Capture (EDC) (C3D or others)
• Clinical Connector
• caBIG® Adverse Event Reporting System (caAERS): identifies and tracks
  adverse events and any associated schedule changes
• caBIG® Integration Hub connects these components
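The PSC role described above (tracking each participant's schedule, with adverse events potentially forcing schedule changes) can be illustrated with a minimal sketch. It assumes a protocol expressed as hypothetical day offsets from enrollment and a simple rule that a delay shifts the affected visit and all later ones; the `Visit` type and both functions are illustrative only and are not taken from PSC.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Visit:
    name: str
    day_offset: int  # days after enrollment, per a hypothetical protocol definition


def build_calendar(enrollment: date, visits: list[Visit]) -> dict[str, date]:
    """Map each protocol visit to a calendar date for one participant."""
    return {v.name: enrollment + timedelta(days=v.day_offset) for v in visits}


def delay_remaining(calendar: dict[str, date], from_visit: str, days: int) -> dict[str, date]:
    """Shift `from_visit` and every later visit by `days` (e.g. a hold after an adverse event)."""
    cutoff = calendar[from_visit]
    return {
        name: (d + timedelta(days=days) if d >= cutoff else d)
        for name, d in calendar.items()
    }


# Hypothetical 21-day cycles; a one-week hold is applied starting at cycle 2.
protocol = [Visit("baseline", 0), Visit("cycle 2", 21), Visit("cycle 3", 42)]
cal = build_calendar(date(2011, 1, 3), protocol)
cal = delay_remaining(cal, "cycle 2", 7)
for name, d in cal.items():
    print(name, d.isoformat())
```

Recomputing dates from a single cutoff keeps earlier, already completed visits fixed while moving everything downstream together, which is the behavior a schedule change after an adverse event would typically need.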
Monolithic to Modular: Enterprise Software
Development Best Practice
• Deployable as standalone components, or in any permutation
  as an interoperable “suite”
• Where suitable proprietary and other non-caBIG® developed
  software components are available:
 • Can be adapted to work as part of the caBIG® infrastructure
 • Processes for:
       • Adaptation to caBIG® compatibility
       • Verification
• Where the community has identified an unmet need:
 • caBIG® develops components
 • Makes them freely available
 • Non-viral, open-source license
Standard Case Report Form Modules:
Building a Consensus

• Per Data Category and Module (e.g., Patient Data - Demography)
• Automatically retrievable for rendering via Cancer Data
  Standards Repository (caDSR)
• Industry harmonization via Clinical Data Acquisition
  Standards Harmonization (CDASH)
   • Initiative of the Clinical Data Interchange Standards Consortium
     (CDISC), commissioned by the Food and Drug Administration (FDA)
• EDC / CDMS systems that can currently consume caBIG®'s
  standard CRF modules include:
   • C3D (Oracle / NCI)
   • Medidata, Oncore, Velos (commercial)
   • REDCap (NIH Clinical and Translational
     Science Awards; CTSA)
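Standard CRF modules are built from common data elements (CDEs) in caDSR, each with a defined data type and, often, a list of permissible values. The sketch below illustrates why that helps: a form built from shared CDE definitions can be validated the same way in any system that consumes it. The `CommonDataElement` structure, the `CDE-000x` identifiers, and the values are hypothetical placeholders, not records retrieved from caDSR, and the code does not call any caDSR API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class CommonDataElement:
    public_id: str   # hypothetical identifier, not a real caDSR public ID
    name: str
    datatype: type
    permissible_values: frozenset = field(default_factory=frozenset)  # empty means unconstrained

    def validate(self, value) -> Optional[str]:
        """Return an error message, or None if the value conforms to this CDE."""
        if not isinstance(value, self.datatype):
            return f"{self.name}: expected {self.datatype.__name__}"
        if self.permissible_values and value not in self.permissible_values:
            return f"{self.name}: {value!r} is not a permissible value"
        return None


# A hypothetical "Patient Data - Demography" module assembled from shared CDEs.
DEMOGRAPHY_MODULE = [
    CommonDataElement("CDE-0001", "Sex", str, frozenset({"Male", "Female", "Unknown"})),
    CommonDataElement("CDE-0002", "Birth Year", int),
]


def validate_form(module: list[CommonDataElement], answers: dict) -> list[str]:
    """Check one completed form against every CDE in the module."""
    errors = []
    for cde in module:
        if cde.name not in answers:
            errors.append(f"{cde.name}: missing")
            continue
        err = cde.validate(answers[cde.name])
        if err:
            errors.append(err)
    return errors


print(validate_form(DEMOGRAPHY_MODULE, {"Sex": "F", "Birth Year": 1950}))
```

Because every consuming EDC/CDMS system would apply the same definitions, an answer such as "F" is rejected consistently instead of being stored as a site-specific variant of "Female".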
NCI Clinical Trials Reporting Program (CTRP)
• Enables any organization, including NCI, to manage its portfolio
  (monitoring accrual, identifying gaps / duplicative studies,
  prioritizing trials); facilitates community reporting
   • Provides a clinical trials management tool for Centers
• NCI abstracts data into system
   • Enables conduct of NCI trials in-house with reduced data curation
   • Reusable for Clinical reporting; XML Data file
     generated by CTRP is provided back to site, can be uploaded to, eliminating dual entry for required elements
• Next: generate NCI Cancer Center (Core Grant) reports
• Commercial clinical trials systems, Cancer Center in-house
  systems being re-engineered to seamlessly send data to CTRP
• Currently 38 NCI-designated Cancer Centers registering trials
  with CTRP
Data Standards: Engage, reuse, seek consensus

• BRIDG (Biomedical Research Integrated Domain Group) model: a domain
  model of the semantics of protocol-driven research
• Unprecedented collaborative effort between NCI/caBIG®, the Food and
  Drug Administration, and the two leading international standards
  bodies in biomedical data standards: Clinical Data Interchange
  Standards Consortium (CDISC) and Health Level Seven (HL7)
• Real results seen almost immediately for users of NCI apps
   • caBIG® clinical application developers had already been required
     to “preharmonize” with BRIDG
   • Allowed clinical research software applications to interoperate quickly:
     a working demo within two months
Looking to the Future:
Harnessing Clinical Care Data

• Several capabilities, needs and trends are converging:
  • Vast and growing amount of molecular information
  • Ability to aggregate and process clinical information on an
    unprecedented scale
  • Unsustainable cost of new drug development
  • $44+ billion U.S. investment in Electronic Health Records

• Opportunity: harness these trends to make clinical data available
  for research use
  • Single source: driving clinical care data into research
  • Potential to deliver very large, molecularly definable cohorts
  • National-scale patient-trial matching
caBIG® Clinical Information Suite
• NCI has collaborated with the American Society of Clinical
  Oncology (ASCO) and other professional societies to develop
  specifications for an “oncology-extended Electronic Health Record”
  • Project: Clinical Oncology Requirements for the Electronic Health
    Record (CORE)

• NCI is developing a series of software modules (collectively
  called the caBIG® Clinical Information Suite), based on the
  CORE specifications and the expressed needs of the NCI
  • Will use the caBIG® investment in a scalable, interoperable
    infrastructure as the “bridge” between clinical care and research
• Key to enablement of molecular medicine
caBIG® Capabilities:
    In Vivo Imaging
In Vivo Imaging:
The Research Landscape & Needs

             •   In Vivo Imaging supports both diagnosis and
                 the monitoring of the effectiveness of treatment,
                 using methods that are less invasive than many
                 alternatives.
             •   Imaging is a vital tool to support a variety of
                 different kinds of research studies.
             •   The transmittal of imaging data among
                 specialists and institutions remains
                 cumbersome, for both patients and health care providers.
             •   Shared standards across imaging tools allow
                 data to be more easily compared across
                 imaging events and between researchers.
In Vivo Imaging:
The Research Landscape & Needs

           Researchers need imaging tools that:
           • Enhance communications between different
             specialists, such as oncologists, radiologists,
             surgeons, and pathologists
           • Help evaluate and annotate images, especially to
             evaluate tumor burden and response to treatment
           • Allow the secure and easy sharing of images,
             image analysis, and visualization algorithms
           • Enhance collaboration and enable data sharing
             between physicians through telemedicine, without
             costly image duplication
           • Foster interdisciplinary research that integrates
             medical imaging techniques into basic, translational,
             and clinical cancer research studies
In Vivo Imaging

         Patterns of Usage:
         •   Local (or NCI-hosted) management of digital images
         •   Public hosting of digital images
         •   Both of the above together; image sharing can be enabled after
             publication embargo/grace period
         •   Commercial radiology providers embracing standards, integrating
             caBIG® capabilities into their platforms
         caBIG® Capabilities:
         •   Store, annotate, and share DICOM format medical images –
             National Biomedical Imaging Archive (NBIA)
         •   Capture radiologists' notes and share with colleagues using
             standards-based annotations – Annotation and Image Markup (AIM)
         •   Rapidly evaluate multiple image analysis algorithms – eXtensible
             Imaging Platform – XIP/ Algorithms Validation Tools – AVT
         •   Connect commercial PACS systems with other 3rd party image
             databases – Imaging Middleware/Virtual PACS
National Biomedical Imaging Archive (NBIA):
In Vivo Imaging Repository

 • Searchable online repository of
   in vivo images
 • Image formats supported
   include CT, MRI, and digital
   x-rays (DICOM standard)
 • Annotation files (PDF, image markup) and data provided
 • Federatable – can create virtual database across multiple
   instances of NBIA software
 • Supporting technologies:
    • AIM: structured annotation of images
    • XIP framework / AVT: rapid evaluation of multiple image
       analysis algorithms
    • Virtual PACS: enabling sharing of collections of images with
       commercial PACS systems
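The “federatable” design means a single query can be sent to several independently operated NBIA instances and the answers merged into one virtual view, while each site keeps control of its own data. The sketch below models that pattern with in-memory stand-ins; the `ImageArchive` class, its `search` method, and the record fields are hypothetical and do not represent NBIA's or caGrid's actual interfaces.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Series:
    site: str
    series_uid: str   # hypothetical UID values below
    modality: str     # DICOM modality code, e.g. "CT" or "MR"
    collection: str


class ImageArchive:
    """Hypothetical stand-in for one locally controlled archive instance."""

    def __init__(self, site: str, series: list[Series]):
        self.site = site
        self._series = series

    def search(self, modality: str) -> list[Series]:
        # Each site evaluates the query against its own holdings only.
        return [s for s in self._series if s.modality == modality]


def federated_search(archives: list[ImageArchive], modality: str) -> list[Series]:
    """Send the same query to every site in parallel and merge the answers."""
    with ThreadPoolExecutor() as pool:
        per_site = list(pool.map(lambda a: a.search(modality), archives))
    merged = [s for results in per_site for s in results]
    # De-duplicate on series UID in case a series is mirrored at several sites.
    unique = {s.series_uid: s for s in merged}
    return sorted(unique.values(), key=lambda s: (s.site, s.series_uid))


site_a = ImageArchive("A", [Series("A", "1.2.1", "CT", "demo-1"), Series("A", "1.2.2", "MR", "demo-2")])
site_b = ImageArchive("B", [Series("B", "1.3.1", "MR", "demo-3")])
for s in federated_search([site_a, site_b], "MR"):
    print(s.site, s.series_uid, s.collection)
```

Only query results cross site boundaries in this pattern; the holdings themselves stay with each archive, which matches the local-control aspect of federation.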
caBIG® Capabilities:
 Population Science
Population Sciences

• Population Sciences is concerned with:

 • Prevention, screening, and control of health risks

 • Post treatment effects and the impact of behavioral changes

 • Environmental interaction studies: drawing from both environmental
   information and molecular data

 • Exploring biomarkers that could indicate cancer progression
   potential among specific populations, such as the aging
PopSciGrid 1.0:

 •    Aggregate analysis of 14 datasets spanning 6 years (HINTS / NHIS tobacco
      tax data)
 •    Datasets located at multiple geographically dispersed sites
 •    Real-time access/analysis of public health and economic data
 •    Prospective geospatial analytics

Shaikh AR, Contractor N, Moser R, et al. PopSciGrid: Using cyberinfrastructure to enable data harmonization, collaboration, and advanced
computation of nationally representative data. American Public Health Association Annual Meeting; November 11, 2009; Philadelphia, PA.
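One way to aggregate datasets held at different sites without moving record-level data is to have each site return summary counts and pool them centrally. The sketch below shows that idea for a simple prevalence estimate. The dataset names and counts are made up for illustration, and the sketch ignores the survey weights that a real HINTS or NHIS analysis would require, so it should not be read as PopSciGrid's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    """Summary statistics a site shares instead of individual-level records."""
    dataset: str
    year: int
    respondents: int
    smokers: int


def pooled_prevalence(summaries: list[SiteSummary]) -> float:
    """Unweighted pooled prevalence: total cases over total respondents."""
    total = sum(s.respondents for s in summaries)
    if total == 0:
        raise ValueError("no respondents to pool")
    return sum(s.smokers for s in summaries) / total


def prevalence_by_year(summaries: list[SiteSummary]) -> dict[int, float]:
    """Pool within each year to build a simple trend across datasets."""
    years = sorted({s.year for s in summaries})
    return {y: pooled_prevalence([s for s in summaries if s.year == y]) for y in years}


# Hypothetical numbers, not real survey results.
data = [
    SiteSummary("survey-A", 2005, 1000, 200),
    SiteSummary("survey-B", 2005, 3000, 660),
    SiteSummary("survey-A", 2007, 1000, 180),
]
print(f"{pooled_prevalence(data):.3f}")
print(prevalence_by_year(data))
```

Pooling counts rather than averaging per-dataset percentages gives larger datasets proportionally more influence, which is the usual choice when the datasets measure the same quantity.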
PopSciGrid 2.0
(Coming online 2011)

• Application layer (e.g., enhanced disease modeling, dashboards, data widgets…)
• Grid cyberinfrastructure
• Common vocabularies (shared ontologies, common data elements)
• Data sources:
  • Public surveillance: BRFSS, HINTS, NHIS, Tax, US Census,...
  • Grantees: CECCRS, CPHHD, GEI, TREC, TTURCS
  • Clinical/health system: CRN, QCCC projects, Registries (SEER)
  • Community/contextual: 'community health labs', GIS (geo-spatial data),
    physical/built environment, real-time data capture
  • Biomedical: biological, genomic/proteomic
caBIG® Capabilities:
      Life Sciences
Scope of Life Sciences in caBIG®

• Bench-to-Bench
  • Rapid data exploration
  • Cross domain integration
  • Biomarker selection and qualification

• Biospecimen Management
  • Biospecimen inventory, tracking, annotation
  • Search and retrieval of biospecimen data to identify samples for
    translational research studies

• Bench-to-Bedside
  • Support for linking clinical outcomes with molecular findings
  • Hypothesis generation for new trials
Molecular Characterization:
The Research Landscape & Needs

                • Today’s research involves complex study designs
                  including the capture and refinement of clinical and
                  imaging data, and selection of samples for molecular
                  characterization.
                • High-throughput methods and sophisticated analysis
                  methods allow for the combination of proteomics,
                  gene expression, and other basic research data.
Researchers Need Tools That:
• Submit and annotate microarray data.
• Integrate data from multiple providers.
• Permit analysis and visualization of data.
• Support the interoperable data exchange and analysis required when
  interdisciplinary teams of investigators conduct sophisticated analyses.
Integrative Cancer Research Workspace

           Patterns of Usage:
           • Individual applications to address specific
             analysis needs
           • Access to NCI-hosted instances of applications
             online to manage data
           • “Software-as-a-service” from 3rd party hosting
           • Multiple applications used independently to provide
             comprehensive analytical capabilities
           • Linked applications to create custom analyses
           • “Mix-and-match” analysis capabilities from the
             growing NCI Enterprise Services portfolio
           • Functional specifications to develop, modify,
             or connect in-house applications
Key Capabilities Supporting Data Management

• caTissue
  • Biorepository tool for biospecimen inventory management, tracking, and annotation.
    Permits users to enter and retrieve data concerning the collection, storage, quality
    assurance, and distribution of biospecimens
  • 2.16+ million biospecimens available through caTissue; 1.56+ million
    biospecimens available for sharing via caGrid; 34 organizations in
    production usage
• caArray
  • Based on the MAGE standard, this tool supports the distributed management of
    microarray data and associated sample and experiment annotations
  • 42,900+ microarray experiments available for research use on caGrid
• caIntegrator
  • Allows researchers to set up custom, caBIG®-compatible web portals that bring
    together heterogeneous clinical, microarray and medical imaging data to enrich
    multidisciplinary research.
  • Supports data from Glioma Diagnostic Initiative, I-SPY, DC Lung Study, TCGA,
    and other large initiatives
• caBIO
  • Cancer Bioinformatics Infrastructure Objects is a robust resource for accessing
    biomedical annotations from curated data sources in an integrated view in support of
    knowledge discovery.
  • Data from more than 25 sources are made available on caGrid.
Key Capabilities Supporting Molecular Analysis
•   GenePattern - Broad Institute
     • Web-based genomic analysis platform that provides more than 125 tools for gene
        expression analysis, proteomics, SNP analysis and common data processing
        tasks, including the creation of multi-step analysis pipelines
     • Estimated 13,000+ users at over 2,200 institutions
•   geWorkbench - Columbia University
     • Open-source software platform for genomic data integration, with more than 40
        analysis and visualization tools for gene expression, sequences, protein
        structures, pathways, and other biomedical data.
     • Estimated 400-500 users
•   Bioconductor - Fred Hutchinson Cancer Research Center
     • Open-source, open-development platform for the analysis and comprehension of
        high-throughput genomic data. Bioconductor uses the R statistical programming
        language.
     • More than 380 analytical modules available
•   Cancer Genome Workbench
     • Hosts mutation, copy number, expression, and methylation data and provides
        tools for visualizing sample-level genomic and transcriptional alterations in various cancers
     • Includes data from TCGA, TARGET, COSMIC, GSK, NCI60 and LPG.
    caBIG® Portals for The Cancer Genome Atlas (TCGA)
    Data: CMA and caArray
• The Cancer Molecular Analysis Portal
  • Enables users to access, search,
    visualize, and integrate genomic data
    with corresponding clinical information
  • Helps find novel correlations between
    data and observations that would be
    difficult or impossible to find using
    conventional analytical tools
•   caArray
     • Provides web-based and caGrid
       access to raw and normalized TCGA
       microarray data
         •   Gene expression
         •   SNP
         •   Copy number variation
         •   Methylation
         •   microRNA
caBIG® Data Portals for TCGA Data:
Cancer Genome Workbench
• Viewers for integrated genomic data: genome, heatmap, landscape, protein, 3-D structure
• CGWB data (mutation, copy number, SNP, sample annotations) is available over caGrid;
  more than 40,000 unique visitors since 2006


[Figure: TCGA datasets available in CGWB. Number of samples per tumor type (GBM, OV, READ, LUSC, LUAD, COAD, BRCA, LAML, KIAP), broken down by data type: copy number, mRNA expression, miRNA expression, exome (mutation), RNAseq, and whole genome.]
TCGA Radiology System Overview

[Figure: system components are NBIA (MRI images available on the grid), Clear Canvas (structured interpretation by radiologists), the AIM Data Service (structured annotations available on the grid), and caIntegrator (integrated visualization and analysis).]

Preliminary Findings
• Gene expression correlations with imaging features can be
  identified; a number of genes are associated with multiple
  features.
• Genes involved in angiogenesis are associated with several
  imaging features.
• Survival of patients with greater thickness of enhancement
  was significantly shorter than that of patients with less.
Biospecimen Management:
The Research Landscape & Needs

           • For many researchers, biospecimens are among
             the rarest and most valuable resources in the
             research process, providing a bridge between
             emerging molecular information and clinical data.

           • Biospecimens can include a range of biological materials: tumor
             biopsies, bone marrow, blood, and others.

           • While biospecimens may be used primarily for
             diagnostic purposes or as a part of a treatment
             intervention, in some cases, patients may donate
             these resources for further research as well.
Biospecimen Management:
The Research Landscape & Needs
                   Researchers need tools that:

                   • Track the consent process when donating
                     biospecimens. This is critical, as it drives
                     how these resources may or may not be
                     used for future research.

                   • Implement best practices for biospecimen
                     management to extend the usefulness and
                     quality of these resources.

                   • Access libraries of well-characterized and
                     clinically annotated biospecimens

                   • Inventory and track the storage, distribution,
                     and quality assurance of their own biospecimens
Tissue Banks & Pathology Tools Workspace

           Patterns of Usage:
           • One central biorepository to manage multiple
              biospecimen collections
           • Separate local instances of biorepository for each
           • Biorepository to securely share samples among
              multiple collaborating organizations
           • caBIG® biorepository as a back end to existing
              commercial or legacy biospecimen management
              systems in-house

           caBIG® Capabilities:
           • Collect, store, annotate, aliquot, search, and track
              distribution of many types of biospecimens --
              caTissue Suite
Access caBIG® Capabilities and Tools

Resources

• Access the full list of caBIG® tools

• Access the caBIG® Workspaces
caBIG® Infrastructure
caBIG® Infrastructure: Key Elements
• Enhance interoperability: Extend existing interoperability paradigms
  (Web Services, REST, etc.) to support data semantics. This capability
  is called a semantic Services Oriented Architecture (sSOA).
• Provide for:
  • Discovery
  • Common Semantics
  • Workflow
  • Federated security infrastructure
  • Toolkits that support these capabilities
• Reuse of existing community capabilities
  • Software
  • Vocabularies, Terminologies and Ontologies
  • Standards
• Ensure that the caBIG® Infrastructure provides “just enough”
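The core idea of a semantic SOA is that data elements carry shared concept identifiers instead of relying only on local field names. A few lines of Python can sketch this. The example is a minimal illustration under stated assumptions, not caGrid code. The field names and concept codes (`CONCEPT_AGE` and so on) are hypothetical placeholders for identifiers drawn from a shared vocabulary such as the NCI Thesaurus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """A locally named field annotated with a shared concept code (hypothetical)."""
    local_name: str
    concept_code: str


# Two services expose the same kinds of data under different local names.
# The concept codes are placeholders, NOT real NCI Thesaurus codes.
SERVICE_A = [DataElement("pt_age", "CONCEPT_AGE"),
             DataElement("dx", "CONCEPT_DIAGNOSIS")]
SERVICE_B = [DataElement("ageAtEnrollment", "CONCEPT_AGE"),
             DataElement("tumorType", "CONCEPT_DIAGNOSIS"),
             DataElement("site", "CONCEPT_SITE")]


def align_by_semantics(left, right):
    """Pair fields from two services that share a concept code, ignoring local names."""
    right_by_code = {e.concept_code: e for e in right}
    return {
        e.local_name: right_by_code[e.concept_code].local_name
        for e in left
        if e.concept_code in right_by_code
    }


print(align_by_semantics(SERVICE_A, SERVICE_B))
# {'pt_age': 'ageAtEnrollment', 'dx': 'tumorType'}
```

The matching uses only the annotations. Two services that never agreed on field names can still be queried together, because both point at the same shared concepts.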
Usage Patterns for caBIG® Systems

• Service Specifications: caBIG® services include layered service
  specifications (conceptual model, platform independent model and platform
  specific model) for organizations wishing to build or adapt products that can
  interoperate with other local or national services.

• Reference Implementations: caBIG® provides reference implementations of
  applications and services that can be used to support the local development
  and deployment of derivative systems

• Community Deployed Systems: caBIG® systems are deployed at
  institutions to support local biomedical informatics needs while allowing for
  integration with other local systems or with partners outside of an institution.

• caBIG® Hosted Systems: caBIG® maintains hosted versions of its services
  (either standalone or part of larger systems depending on the nature of the
  service) that are accessible to appropriately authorized entities
Architecture Workspace

Patterns of Usage:
• Provide interoperability between systems to support query and aggregation
   across multiple data repositories, both within and between institutions
• Enable cross-system integration of data to support translational research
• Provide workflow capabilities so that researchers can create unique analysis workflows
• Develop specifications that can be used to build or modify in-house developed
   applications for interoperability

caBIG® Capabilities:
• caGrid, the caBIG® interoperability infrastructure
• GAARDS, the caBIG® federated security environment
• Interoperability Standards to enable easier integration

                    caBIG® Architecture Workspace (ARCH)
Vocabulary and Common Data Elements Workspace
Patterns of Usage:
• Provide interoperability between systems to support query and aggregation
   across multiple data repositories, both within and between institutions
• Enable cross-system integration of data to support translational research
• Provide workflow capabilities so that researchers can create unique analysis workflows
• Develop specifications that can be used to build or modify in-house developed
   applications for interoperability
• Provide data standards to support interoperability and reuse

caBIG® Capabilities:
• Standard Vocabularies including the NCI Thesaurus, SNOMED-CT, CTC-AE,
   Hawaii Nutritional Ontology, etc.
• Standard Data Elements defined by the community
• Interoperability Standards to enable easier integration

          caBIG® Vocabularies and Common Data Elements Workspace
What is caGrid?

• caGrid provides the core software infrastructure and tooling needed to share
  well-defined data and resources. Its development is supervised by the caBIG®
  Architecture Workspace.

• caGrid tools create the G in caBIG®: connecting interoperable services and
  data sources to the Grid is key to achieving interoperability.

• “Getting a service on the Grid” means creating a service which meets
  interoperability standards, and then connecting that service or data source to
  a Grid (internal or external). Aspects of the standards include interfaces,
  methods, and security. Services are advertised on an index service, allowing
  others to find them.
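Advertising and discovery through an index service can be pictured as a registry. Services publish metadata to it, and clients query it. The sketch below is a toy in-memory model that assumes hypothetical names (`IndexService`, `advertise`, `find`) and placeholder URLs. The real caGrid index service is a networked component with its own interfaces, which this does not reproduce.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceAdvertisement:
    name: str
    url: str  # placeholder endpoint
    concepts: set[str] = field(default_factory=set)  # semantic annotations of exposed data


class IndexService:
    """Toy registry: services advertise themselves, clients discover them by concept."""

    def __init__(self) -> None:
        self._ads: dict[str, ServiceAdvertisement] = {}

    def advertise(self, ad: ServiceAdvertisement) -> None:
        # Re-advertising under the same URL replaces the earlier entry.
        self._ads[ad.url] = ad

    def find(self, concept: str) -> list[str]:
        """Return names of advertised services whose data is annotated with `concept`."""
        return sorted(ad.name for ad in self._ads.values() if concept in ad.concepts)


index = IndexService()
index.advertise(ServiceAdvertisement("MicroarrayData", "https://example.org/svc/a", {"GENE_EXPRESSION"}))
index.advertise(ServiceAdvertisement("TissueBank", "https://example.org/svc/b", {"SPECIMEN"}))
print(index.find("GENE_EXPRESSION"))  # ['MicroarrayData']
```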
caGrid: Support for Infrastructure Requirements

[Figure: caGrid components. Data Standards: lexEVS Service, caDSR Service, GME Service. Security: Authentication Service, Dorian Service, Grid Grouper, Delegation Service, CLM. Also shown: FQP Service, Workflow Engine, and the underlying Toolkits and Transport.]
GAARDS: caGrid Security Infrastructure
     Authentication service provides an interface to local credentialing capabilities.
     Interfaces to Dorian through signed SAML assertions. Reference
     implementations for LDAP, database and Shibboleth

     Dorian manages X.509 certificates and associated proxies. Accepts signed
     SAML assertions from authentication services and provides common invocation
     credentials for services.

     Grid Trust Service (GTS) validates that signed SAML assertions, X.509
     certificates (and associated PKI artifacts) and host certificates are from trusted
     authorities.

    Grid Grouper provides the necessary capabilities to manage virtual
    organizations with associated roles. Integrates directly with systems, or can
    integrate via caBIG® Common Security Module (CSM)

    Delegation service provides the ability to delegate user credentials for the
    purpose of implementing workflows. Used frequently in the Clinical Trials Suite
    to support work in multiple systems.
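A simplified model can illustrate the Grid Trust Service's role: checking that a credential was issued by an authority on a trusted list. This sketch assumes hypothetical names (`Certificate`, `TrustList`, `is_trusted`). It models only issuer membership and validity dates. It performs no cryptographic signature verification, which any real PKI check, including GTS, requires.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Certificate:
    subject: str
    issuer: str
    not_before: datetime
    not_after: datetime


class TrustList:
    """Simplified stand-in for a trust service: a set of trusted issuing authorities."""

    def __init__(self, trusted_issuers: set[str]) -> None:
        self.trusted_issuers = set(trusted_issuers)

    def is_trusted(self, cert: Certificate, now: datetime) -> bool:
        # Accept only if the issuer is trusted and the certificate is currently valid.
        # NOTE: a real check must also verify the issuer's signature over the certificate.
        return cert.issuer in self.trusted_issuers and cert.not_before <= now <= cert.not_after


utc = timezone.utc
gts = TrustList({"CN=Example Grid CA"})
cert = Certificate("CN=Jane Researcher", "CN=Example Grid CA",
                   datetime(2010, 1, 1, tzinfo=utc), datetime(2012, 1, 1, tzinfo=utc))
print(gts.is_trusted(cert, datetime(2011, 1, 15, tzinfo=utc)))  # True
print(gts.is_trusted(cert, datetime(2013, 1, 15, tzinfo=utc)))  # False (expired)
```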
caGrid in use outside of caBIG®

• A variety of organizations are adopting or adapting caGrid to
  build interoperability infrastructures. Examples include:
   • National Cancer Research Institute (NCRI) UK: Using caGrid
     as the basis for their ONIX infrastructure
   • NHLBI CardioVascular Research Grid (CVRG): NCI caGrid
     and CVRG have begun cross-indexing services to allow access
     to capabilities available on both Grids
   • Clinical Translational Science Awards (CTSA): Ohio State
     University implementing caGrid for CTSA awardees
   • Centre for the Development of Advanced Computing
     (CDAC), Ministry of Information Technology, Government of
     India: Implementing caGrid to bring Indian research
     organizations into the community of cancer research.
Common Semantics: Vocabularies Hosted by caBIG®
• General Purpose
   • NCI Thesaurus
   • NCI Metathesaurus
   • Biomed GT
   • UMLS Semantic Network
• Life Sciences
   • Gene Ontology
   • HUGO
   • MGED Ontology
   • NanoParticle Ontology
   • OBI
   • Zebrafish
• Clinical Sciences
   • CTC-AE
   • ICD-10
   • ICD-9-CM
   • LOINC
   • MedDRA
   • NDF-RT
   • PDQ
   • RadLex
   • SNOMED-CT
Common Semantics: Vocabulary Content and Usage

[Figure: Cumulative NCI Thesaurus concepts by year, 2005 to 2010.]

caBIG® Vocabulary Partners
•   National Library of Medicine
•   US FDA
•   NIH Institutes (NICHD, NIDCR, NINDS)
•   Mayo Clinic
•   Clinical Data Interchange Standards Consortium
•   Health Level 7
•   National Center for Biomedical Ontologies (NCBO)

•   With the introduction of newer browsers in 2009, usage has grown to over
    1,000,000 accesses per month in 2010.
•   Terminology servers have been stable at 1 to 2 million accesses per month
    for the last 2 to 3 years.
Migrating to the Cloud
• The cross-cutting workspaces are continuing to support
  community-driven infrastructural needs. Currently there are two major initiatives:
  • Semantics 2.0 to enhance support for dynamic semantics and on-site
  • Platform/Security/Tools 2.0 to support advanced analytical workflows,
     update technology, and enhance support for high-performance computing
     while continuing to reduce the barrier to entry to caBIG®.
• Ultimately, the future of caBIG® infrastructure is in the cloud, leveraging
  new, commoditized computing models such as Software as a Service
  (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS).
• caBIG® investments in semantics (ontologies, common data elements)
  and distributed computing are enablers for institutions that wish to
  migrate applications to the cloud or use advanced analytic capabilities
  provided by cloud infrastructure.
• Initial experiments with cloud suggest that caBIG® technology adapts
  well to a variety of private and public cloud environments
For More on the Grid

 • caGrid Knowledge Center – Main page

 • caGrid Portal – The Web interface to caGrid, allowing browsing for
   available services and inspecting service details

 • caGrid Project Web Sites – Primary development site; caGrid product Web site
caBIG® Data Sharing and
    Security Framework
Benefits of Data Sharing

• One of the primary goals of caBIG® is to enable and promote
  data sharing. This is important because:

 • The large volumes of research data created by high throughput
   genomics and proteomics can best be harvested by teams.
 • Translational and personalized medicine promise great benefits.
   Realizing these benefits requires that collaboration occur both
   within disciplines and between disciplines, harnessing the broadest
   possible knowledge and skill bases.
 • Data sharing raises the visibility of individual studies and data
   collections, and opens avenues for data dissemination.
 • Grants from NIH exceeding $500,000 require a plan for data sharing.
Barriers to Sharing Data

• Historically, there have been a number of legal, cultural, and
  technical barriers to data sharing:

 • Varying obligations under federal and state privacy and security
   laws and standards
 • Varying local requirements and oversight of human subjects
   research by ethical review boards
 • Traditional academic reward structures driven by individual
   success, rather than team-based work
 • Researcher, institutional, and sponsor concerns about the
   protection of intellectual property
 • Patient safety concerns related to premature access to unvalidated
 • Public perceptions regarding privacy, security, and confidentiality of
   electronic health data
Data Sharing and Security Framework (DSSF)

• Designed to facilitate appropriate data sharing among
  organizations by addressing legal, regulatory, policy, ethical,
  proprietary, contractual, and socio-cultural barriers.

• Offers resources to address the potential and perceived
  restrictions on data sharing:
   •   Tools to evaluate the sensitivity of data
   •   Tools to expedite data sharing arrangements between organizations
   •   Guidance documents and white papers to inform institutions on data sharing

• Offers technology infrastructure to facilitate appropriate access
  to data:
   •   Policies to assure that organizations adhere to security standards
   •   Tools to implement federated authentication and authorization
caBIG® Data Sharing and
Security Framework (DSSF)
The Data Sharing and Security Framework includes:

• Resources to address the legal and cultural barriers to data sharing:

  • Tools to evaluate the sensitivity of data
  • Tools to expedite the execution of data sharing agreements between
    organizations (guidelines for data sharing plans)
  • Tools to capture patient consent to share data (model informed consent)
  • Other model documents and white papers to inform institutions on data
    sharing issues

• Technology infrastructure to ensure secure data exchanges:

  • Tools to implement federated authentication and authorization
  • Policies to assure that organizations adhere to security standards
    (Grid Host Agreement)
Resources to address the legal and cultural barriers
Goal of the DSSF: Selecting Access Controls

• The appropriate mechanism for sharing data depends in part on the
  nature of the data to be shared:

  • Does it contain personal health information?
  • Does it contain proprietary information?
  • Does it relate to a closed study?

• Depending on the answers to these and other questions, it may be
  appropriate to share data:

  • Publicly, without access controls;
  • Via a standard data sharing agreement between organizations; or
  • Via a customized data sharing agreement between organizations

• caBIG® developed the DSSF to help organizations determine the access
  controls needed for any given set of data.
How the Data Sensitivity
Evaluation Framework is Used
• For the data being considered, assess sensitivity for each
  principal element:

   • Economic/Proprietary Value (to Researcher/Institution)
   • Privacy Considerations
   • IRB/Ethical Restrictions
   • Sponsor Restrictions or Requirements

• Assign a low (green), medium (yellow) or high (orange)
  sensitivity rating to the data

• Use sensitivity analysis to determine (1) security/data access
  controls; and (2) the possible type of data sharing agreements

   The institution providing the data makes this determination!
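The rating steps above can be made concrete with a small helper. This is a sketch under stated assumptions, not part of the DSSF. It takes the overall rating to be the highest rating of any principal element. It also maps low, medium, and high ratings onto the three sharing options listed under "Selecting Access Controls"; that mapping is an assumption. The names `Sensitivity`, `overall_rating`, and `MECHANISM` are hypothetical. The institution providing the data still makes the actual determination.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    LOW = 1     # green
    MEDIUM = 2  # yellow
    HIGH = 3    # orange


# The four principal elements assessed by the framework.
ELEMENTS = (
    "economic/proprietary value",
    "privacy considerations",
    "IRB/ethical restrictions",
    "sponsor restrictions or requirements",
)

# ASSUMED mapping from overall rating to a sharing mechanism (illustrative only).
MECHANISM = {
    Sensitivity.LOW: "share publicly, without access controls",
    Sensitivity.MEDIUM: "standard data sharing agreement",
    Sensitivity.HIGH: "customized data sharing agreement",
}


def overall_rating(ratings: dict[str, Sensitivity]) -> Sensitivity:
    """Assumes the overall rating is the highest rating of any principal element."""
    missing = [e for e in ELEMENTS if e not in ratings]
    if missing:
        raise ValueError(f"unrated elements: {missing}")
    return max(ratings[e] for e in ELEMENTS)


ratings = {
    "economic/proprietary value": Sensitivity.LOW,
    "privacy considerations": Sensitivity.MEDIUM,
    "IRB/ethical restrictions": Sensitivity.LOW,
    "sponsor restrictions or requirements": Sensitivity.LOW,
}
level = overall_rating(ratings)
print(level.name, "->", MECHANISM[level])
# MEDIUM -> standard data sharing agreement
```

Taking the maximum is a deliberately conservative choice: a single highly sensitive element is enough to require stricter controls for the whole data set.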
  Guidelines for Data Sharing Plans

• On the basis of applying the DSSF, an organization may determine that a
  data sharing plan is required. caBIG® supplies guidelines for developing
  such plans.

• The guidelines provide a framework for organizing information about the
  data to be shared, and the designated mechanism for sharing. This includes:

  • Background about the project that will use caBIG® infrastructure
  • Issues that drive legal/regulatory determinations such as summary of data
    elements, intended recipients, mechanisms for data sharing (access controls,
    agreements), timing, objectives of the project, and who may have interest in
    the data
  • Information about institutional units that must approve data sharing plans
  • Open-ended questions regarding additional anticipated challenges
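Because the guidelines organize a plan into named sections, a team tracking several plans might represent each one as a structured record. The sketch below is hypothetical. The class and field names are illustrative renderings of the sections listed above, not a caBIG® schema. The completeness check simply flags sections that are still empty.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DataSharingPlan:
    """Hypothetical record mirroring the sections named in the guidelines."""
    background: str = ""
    data_elements: list[str] = field(default_factory=list)
    intended_recipients: list[str] = field(default_factory=list)
    access_controls: str = ""
    agreements: str = ""
    timing: str = ""
    objectives: str = ""
    interested_parties: list[str] = field(default_factory=list)
    approving_units: list[str] = field(default_factory=list)
    anticipated_challenges: str = ""

    def missing_sections(self) -> list[str]:
        """Names of sections that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


plan = DataSharingPlan(
    background="Multi-site tissue microarray study",
    data_elements=["specimen type", "diagnosis"],
    intended_recipients=["partner cancer centers"],
    access_controls="authenticated users in an approved group",
    agreements="standard data sharing agreement",
)
print(plan.missing_sections())
# ['timing', 'objectives', 'interested_parties', 'approving_units', 'anticipated_challenges']
```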
 Model Informed Consent/Authorization

In some instances, patient consent must also be obtained for data sharing.
  caBIG® supplies model language for such consents.

• Model language about caBIG® to use in pre-existing consent/authorization forms
  • Provides basic language institutions can use within their own documents to
    facilitate data sharing; facilitates adoption of caBIG® language in other models

• Standardized choices for research participants related to specimen use
  and/or data sharing
  • Standardize choices in authorization/consent forms; facilitate adherence to
    patient/participant choices

• Model informed consent and HIPAA authorization document – “the whole
  package” (disclosure and options)
  • Assist institutions and smaller provider-based participants in drafting informed
    consent/authorization forms compliant with Common Rule, FDA and HIPAA
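Standardized choices help because they can be recorded in a consistent form and checked before specimens or data are used. The following Python sketch assumes three hypothetical choice categories (the `Use` members). The actual categories come from the model consent language an institution adopts. Treating an unrecorded choice as a refusal is a conservative design assumption.

```python
from enum import Enum


class Use(Enum):
    """Hypothetical categories of secondary use a participant may agree to."""
    SPECIMEN_RESEARCH = "use of specimens in future research"
    DATA_SHARING = "sharing of data with other institutions"
    RECONTACT = "being contacted about future studies"


def permitted(choices: dict[Use, bool], requested: set[Use]) -> bool:
    """Allow a request only if the participant agreed to every requested use.

    Uses with no recorded choice are treated as refusals.
    """
    return all(choices.get(use, False) for use in requested)


participant = {Use.SPECIMEN_RESEARCH: True, Use.DATA_SHARING: False}
print(permitted(participant, {Use.SPECIMEN_RESEARCH}))                    # True
print(permitted(participant, {Use.SPECIMEN_RESEARCH, Use.DATA_SHARING}))  # False
```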
Technology infrastructure to ensure secure data exchanges
Security Basics:
Authentication and Authorization

• In addition to data sharing plans and consents, a technology
  infrastructure is needed to ensure secure data exchange.

• The fundamental security issues are:

  • Authentication / identity management: How can an organization
    sharing data be sure that a user is who he claims to be?

  • Authorization / privilege management: Once a user’s identity is
    assured, how can an organization sharing data be sure that he will be
    allowed access to only the agreed-upon data?
•   Authentication is the process of verifying that a user is who he claims to be.
    Typically, a user presents credentials to an entity that is responsible for authenticating.
    If the credentials are accepted, the entity issues a “certificate.” Credentials can
    include: Something you know (a password); Something you are (a fingerprint); or
    Something you have (a hardware key).

•   Different data sharing scenarios require different levels of assurance about
    authentication. The level of assurance about authentication can be increased by: (1)
    Using factors that are not easily shared (a fingerprint, a one-time password, a
    hardware key); and (2) Requiring multiple credentials.

•   caBIG® will use the levels of assurance specified in the Federal E-
    Authentication Initiative

     •   Level 1 – e.g., no identity vetting is performed
     •   Level 2 – e.g., one credential is required
     •   Level 3 – e.g., multiple credentials are required
     •   Level 4 – e.g., multiple credentials are required and one must be a
         hardware token
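The example criteria above can be expressed as a small function. This sketch covers only the slide's simplified examples. The actual E-Authentication levels (OMB M-04-04, with technical guidance in NIST SP 800-63) carry far more detailed requirements. The credential labels used here (`"password"`, `"fingerprint"`, `"hardware_token"`) are hypothetical.

```python
def assurance_level(identity_vetted: bool, credentials: list[str]) -> int:
    """Map an authentication event to a level using the simplified examples above.

    `credentials` lists the credential types presented (hypothetical labels).
    """
    kinds = set(credentials)  # count distinct credential types, not repeats
    if not identity_vetted or not kinds:
        return 1  # no identity vetting performed
    if len(kinds) == 1:
        return 2  # one credential
    if "hardware_token" in kinds:
        return 4  # multiple credentials, one of them a hardware token
    return 3      # multiple credentials


print(assurance_level(False, ["password"]))                   # 1
print(assurance_level(True, ["password"]))                    # 2
print(assurance_level(True, ["password", "fingerprint"]))     # 3
print(assurance_level(True, ["password", "hardware_token"]))  # 4
```

Counting distinct credential types reflects the point above that assurance rises with multiple, hard-to-share factors. Presenting the same password twice should not raise the level.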
Authorization is the process of determining if an authenticated person
qualifies to conduct certain activities in cyberspace. The process of
authorizing a user is:

 1. The user wishing to access data sends his authentication certificate to the
    organization that owns the data.
 2. The organization that will be sharing data sends this authentication certificate on to a
    “source of authority”, which will determine whether that user qualifies for access to
    the data.
 3. The source of authority uses the authentication certificate to check whether it
    “knows” this user.
 4. If it does, the source of authority then checks whether the identified person qualifies
    for access to the data.
 5. The source of authority sends its determination to the organization that will be
    sharing the data.
 6. Based on this determination, the organization either allows or forbids data access.

 The institution that is the steward of the data authorizes who can access
    the data. The source of authority is simply the mechanism that sends
    the authentication credentials back and forth.
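A toy model can trace the six steps. Everything here is hypothetical: the class names, the group-membership criterion, and the certificate subjects. Certificates are reduced to subject strings. The point is the division of labor. The data steward defines the access criterion and makes the final allow/forbid decision. The source of authority only reports whether it knows the user and whether the user meets that criterion.

```python
from dataclasses import dataclass, field


@dataclass
class SourceOfAuthority:
    """Hypothetical registry of known users and their group memberships."""
    memberships: dict[str, set[str]] = field(default_factory=dict)

    def determine(self, certificate_subject: str, required_group: str) -> bool:
        # Step 3: does the source of authority "know" this user?
        groups = self.memberships.get(certificate_subject)
        if groups is None:
            return False
        # Step 4: does the identified user meet the steward's criterion?
        return required_group in groups


@dataclass
class DataSteward:
    """The organization that owns the data and decides who may access it."""
    required_group: str
    authority: SourceOfAuthority

    def request_access(self, certificate_subject: str) -> str:
        # Steps 1-2: receive the user's certificate and forward it to the source of authority.
        determination = self.authority.determine(certificate_subject, self.required_group)
        # Steps 5-6: act on the returned determination.
        return "allow" if determination else "forbid"


soa = SourceOfAuthority({"CN=Jane Researcher": {"breast-cancer-study"}})
steward = DataSteward("breast-cancer-study", soa)
print(steward.request_access("CN=Jane Researcher"))  # allow
print(steward.request_access("CN=Unknown User"))     # forbid
```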
Federated Security, the Trust Fabric

• Past data sharing has been implemented via custom bilateral or
  multilateral communication, but this approach is not scalable.

• caGrid is built to support federated authentication and authorization
  whereby organizations can accept authentication and authorization
  performed at other organizations.

  • Technical components of the caGrid support for federated authentication and
    authorization are described in the lesson on caGrid.

• Organizations hosting a grid node must sign the Grid Host Agreement, a
  set of policies to assure that organizations adhere to security standards.

• The network of mutual trust relationships among organizations regarding
  authentication and authorization is sometimes called the “Grid Trust Fabric.”
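Simple arithmetic shows why custom bilateral arrangements do not scale. Suppose every pair of n organizations negotiated its own arrangement: that takes n(n-1)/2 arrangements. Under a federated model, each organization instead makes one commitment to the shared trust fabric (for caGrid hosts, the Grid Host Agreement), giving n. Treating federation as exactly one commitment per organization is a simplifying assumption. The sketch compares agreement counts only and does not model any software.

```python
def bilateral_agreements(n: int) -> int:
    """Pairwise arrangements needed if every organization trusts every other directly."""
    return n * (n - 1) // 2


def federated_agreements(n: int) -> int:
    """Assumed: one commitment per organization to a shared trust fabric."""
    return n


for n in (5, 50, 500):
    print(n, bilateral_agreements(n), federated_agreements(n))
# 5 10 5
# 50 1225 50
# 500 124750 500
```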
Learning More About the DSSF

Resource                                                         Link

caBIG® Data Sharing and Intellectual Capital Workspace           s/DSIC_SLWG/

caBIG® Data Sharing and Intellectual Capital Knowledge Center    https://cabig-
                                                                 C_Tools
Deploying caBIG®
caBIG® Scope of Impact

• Deploying caBIG® is an
  enterprise-level project – not just
  an IT or research initiative.

• Connecting involves:

  • Decisions impacting groups and users across the organization.

  • Integration of new workflows, data requirements, and perhaps new systems.

  • New data sharing complexities and opportunities.

• The caBIG® deployment project team needs to include people from all
  areas of the organization, with diverse
What Does Successful caBIG® Deployment Mean?
Key Steps in caBIG® Deployment

1.   Identify Key Stakeholders
2.   Conduct Self-Assessment
3.   Determine Goals
4.   Create Implementation Plans
5.   Implement and Assess
Mapping Specific Actions

Assessment: Where are we now?
• Develop understanding of our current bioinformatics landscape and strategy
• Identify key staff at our Center to support a caBIG® Self-Assessment
• Conduct our Center's Self-Assessment using the template provided by caBIG®

caBIG® Goals: Where do we want to be?
• Define Center caBIG® Goals and obtain sign-off from Senior Leadership
• Explore target data and analytic resources to be shared
• Explore resource sharing agreements and required policies

Implementation Plan: How will we get there?
• Develop our Center's caBIG® Implementation Plan with sign-off from
  Senior Leadership
• Identify tasks, timelines, performance metrics and resources
• Work with the caBIG® team to identify support resources

Implementation: Getting There!
• Acquire resources required for caBIG® implementation
• Deploy caBIG® at our Center
• Deploy functional caGrid node
• Adopt and/or adapt standards and/or applications
• Apply the Data Sharing and Security Framework
Key Stakeholder Groups

Engaging stakeholders upfront helps identify shared goals,
revealing possible benefits for many different groups.

• Senior Leadership - Provide vision and direction for the caBIG®
  deployment. Help align caBIG® deployment goals with overall
  organization goals.

• Domain Leads & Researcher End Users - Primary end users of caBIG®
  tools and standards. Identify the use cases and gaps that lead to
  specific deployment goals, and help make adopt/adapt decisions.

• Technologists - Deploy caBIG® infrastructure and tools. Harmonize
  local vocabularies, standards, and data models with the caBIG®
  program. Connect caBIG® support resources with local needs.

• Policy/Regulatory Stakeholders - Key influencers include regulatory
  and legal experts, and policy makers who establish standards for
  data sharing within the institution.
Conduct a Self-Assessment

• Your key stakeholders can assist you in answering important
  questions to help establish goals. The caBIG® Deployment Self-
  Assessment (PDF) is a tool to help gather this information.

• Starting point thoughts:

 – It is vital to understand what you already have (tools and
   infrastructure) in identifying what you need. This self-assessment
   will also help “preview” technical and organizational readiness.

 – Establishing conceptual support and buy-in at senior levels is
   critical, but it is the stakeholders with the domain and technical
   skills who will ultimately translate that commitment into the tactical
   action needed. You need both to be successful.
  Key Questions Will Help
  Define Goals and Direction

• What is our overarching biomedical informatics strategy? What are the
  key needs and drivers propelling that strategy?

• How do we want to improve our biomedical research data collection,
  management, and analysis activities?

• What data might we want or need to share, and with whom?

• What biomedical informatics capabilities and tools do we have?

• Where are our gaps, and what do we most need? What local policies do
  we need to understand and adhere to?

• What activities do we wish to take on? How will we do it? What
  resources will we need? Who will be involved? What are our desired
Establish Goals and Create Plans

• Establishing goals helps you determine where you are going; the
  Implementation Plan proposes how to get there.

• Here are links to presentations and templates that will help you
  create these planning tools. They were created to support
  formal Cancer Center Deployment efforts in 2007-2008; however,
  they are good models for anyone hoping to deploy caBIG®.

  Goals Documents
 • Overview Presentation
 • Goals Template
  Implementation Planning
 • Overview Presentation
 • Implementation Template
caBIG® and Interoperability

Options for leveraging caBIG® for data interoperability.

 • Adopt existing caBIG® software tools

 • Use caBIG® Application Programming Interfaces and Software
   Development Kits to modify or develop tools to be interoperable

 • Work with a vendor to adapt existing software to be interoperable with
   caBIG® or to enable connection to other tools

 • Implement caGrid technology to connect multiple databases or make data
   from diverse legacy systems interoperable

 • Use the 75+ biomedical terminologies and vocabularies maintained by
   caBIG® to manage and annotate data

 • Select re-usable software modules from NCI to develop customized
   applications that are interoperable with other tools and technology
Strategic Success Factors

• Center Deployment Leads have identified a number of critical
  success factors in deploying caBIG®
• Consider these areas in developing a strategic plan for your
  deployment effort
• Many insights are organizational, not technical
Establish Leadership Support

• Many Deployment Leads agree that Senior Leadership support is
  a critical factor in success

• The more closely aligned the organization's Deployment Lead is
  with Senior Leadership, the more successful he or she is likely to be.

• Leadership is also needed at a project level – project champions
  with tangible needs or problems provide focus and clarity amidst
Staffing: Get the Right People at the
Right Time for the Right Need

• Deployment Leads cannot do this alone – accessing staff
  resources is a critical success factor.

• Includes information technologists as well as Institutional Review
  Board (IRB) liaisons, information managers, Legal Counsel, Tech
  Transfer, Privacy Officers, and IT Security.

• Having access to the right people when you are ready in the
  project timeline, and being able to flex access to staff as hurdles
  are encountered and removed, will speed work.
Facilitate Alignment:
Initiatives, Tools, and People

• Identifying other initiatives with complementary goals can help
  identify potential points of value, and the opportunity for
  resource sharing

• The most potential for value lies in the connections and
  integration points between tools and across infrastructure:
  individual tools are useful, integrated sets increase value

• Find other Deployment Leads and connect with them to share
  approaches, get support, and facilitate collaboration.

• Realizing and maximizing the true benefits of interoperability
  takes time and investment – deploying caBIG® is part of a larger
  biomedical informatics strategy.

• As capabilities continue to evolve, updates and refreshes will be
  needed; deployment is a long-term commitment.

• Plan a generous amount of time for:
  • Tool evaluation, selection, and installation
  • Integration with existing tools and infrastructure
  • Data migration
  • User training
caBIG® Offers Support
Along the Way

• A range of caBIG® support options provides expert guidance to those
  selecting and implementing caBIG® tools and infrastructure.
• Support options are covered in more detail in Section 8, Accessing
  caBIG® Support.

Support Resources

• NCI Web Resources to access tools, training and education, and
  discussion forums
• Product Representatives introduce tools to a group and guide
  interested parties through the system
• Knowledge Centers offer domain and tools expertise
• Support Service Providers for customized client support
• Application Support for tool troubleshooting and user accounts
Working Toward Interoperability
What is caBIG® Interoperability?

• A key goal for caBIG® is to use standards to ensure
  interoperability among caBIG® tools – so that data can be
  exchanged and understood between systems.
             The Interoperability Spectrum

[Figure: The Interoperability Spectrum. The scope of sharing widens from a
single researcher or lab to local colleagues, university colleagues, a
community of standards users, broad communities of researchers, any
researcher, and any authorized user. Enabling approaches move from
proprietary data formats and one-off connections (open formats, parsers),
through web portals, data standards, uniform data models (ontologies), grid
infrastructure, and the semantic web, to sSOA and platform-independent
access supporting multi-disciplinary data analysis. caBIG® capabilities
placed along the spectrum include industry standards (HL7), data
management, data portals (TCGA, CMA), caGrid, caIntegrator2, the Clinical
Trials Suite, and Enterprise Services.]
When Interoperability Is Important

• caBIG® capabilities that are adopted are already designed to be
  interoperable.

• You need to work towards compatibility if:

  • You want to share data (a “data service”) with others via the Grid
  • You want to share data analysis software (an “analytical service”) with others
    via the Grid
  • You want to integrate a caBIG® tool into your organization’s workflow or
    existing tool base, and exchange data between tools and the Grid
  • You want to extend or modify an existing open source caBIG® tool to meet
    your needs
Elements of Interoperability

• There are four areas of compatibility; an application must meet
  guidelines in all four areas to be interoperable with caBIG®

• Semantic Interoperability
  • Information Models
  • Vocabularies and Ontologies
  • Common Data Elements (CDEs)

• Syntactic Interoperability
  • Programming and Messaging Interfaces
    (e.g., Application Programming Interfaces)
How the Elements Work Together

1. caBIG® compatibility begins with an Information (or Data)
   Model, which represents the interfaces and relationships of an
   application's data.

2. The information model is then annotated with Controlled
   Vocabularies to establish shared meaning across model
   components (called “semantic integration”).

3. This annotated information model is then converted into
   Common Data Elements (CDEs) that provide the structure (or
   format) for the data.

4. The information model is also used to generate the
   Application Programming Interface (API): the mechanism by
   which data are exchanged.
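The four steps above can be illustrated with a toy Python sketch. This is not caCORE, SIW, or UML Loader code: the class names, the placeholder concept codes ("CONCEPT-A" and so on), and the simplified data-element record are hypothetical stand-ins. The sketch only shows how a model, once annotated with vocabulary concepts, can yield structured data elements and accessor names for an API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelAttribute:
    """One attribute of a class in the information model (step 1)."""
    name: str
    datatype: str
    concept_code: Optional[str] = None  # set during semantic annotation (step 2)


@dataclass
class ModelClass:
    name: str
    attributes: List[ModelAttribute] = field(default_factory=list)
    concept_code: Optional[str] = None


def annotate(model: ModelClass, vocabulary: Dict[str, str]) -> None:
    """Step 2: attach a vocabulary concept to the class and each attribute.
    A KeyError means a model element has no agreed meaning yet."""
    model.concept_code = vocabulary[model.name]
    for attr in model.attributes:
        attr.concept_code = vocabulary[attr.name]


def to_data_elements(model: ModelClass) -> List[Dict[str, str]]:
    """Step 3: derive one simplified, CDE-like record per annotated attribute."""
    if model.concept_code is None:
        raise ValueError(f"{model.name} is not annotated")
    elements = []
    for attr in model.attributes:
        if attr.concept_code is None:
            raise ValueError(f"{model.name}.{attr.name} is not annotated")
        elements.append({
            "name": f"{model.name} {attr.name}",
            "object_concept": model.concept_code,
            "property_concept": attr.concept_code,
            "datatype": attr.datatype,
        })
    return elements


def accessor_names(model: ModelClass) -> List[str]:
    """Step 4, greatly simplified: derive getter names for a generated API."""
    return ["get" + a.name[:1].upper() + a.name[1:] for a in model.attributes]


# Usage with placeholder concept codes (not real vocabulary identifiers)
specimen = ModelClass("Specimen", [ModelAttribute("collectionDate", "Date"),
                                   ModelAttribute("anatomicSite", "String")])
annotate(specimen, {"Specimen": "CONCEPT-A", "collectionDate": "CONCEPT-B",
                    "anatomicSite": "CONCEPT-C"})
elements = to_data_elements(specimen)
getters = accessor_names(specimen)
```

The ordering mirrors the slide: data elements and accessors are only derived after every model element carries a concept, which is why both functions refuse unannotated input.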
The Five Steps to Compatibility
• There are five steps in working towards compatibility:

        1. Creating an Information Model in a modeling tool
        2. Performing Semantic Integration (Vocabularies) using the Semantic Integration
           Workbench (SIW)
        3. Transforming the Information Model into Metadata (Common Data Elements)
           using the UML Loader
        4. Generating Code and Messaging Interfaces (APIs) using the caCORE SDK Code
           Generator
        5. Generating a caGrid Interface using “Introduce”

[Figure: the five steps as a flow. Create an Information Model in a
modeling tool (Information Model) → perform Semantic Integration using the
SIW (Vocabularies) → transform the model into metadata using the UML Loader
(CDEs) → generate code and messaging interfaces using the caCORE SDK Code
Generator (APIs) → generate a caGrid interface using “Introduce”.]
Variables Impacting Development Time

• Past experience has shown that six key variables impact the effort and
  time required to work towards interoperability:

  • Existing familiarity with tools: Access to a development team that has knowledge
    and skills related to UML and CDE models and tools, Enterprise Architect, and NCI
    tools such as the Semantic Integration Workbench (SIW), caAdapter, and Introduce
    will help speed development.
  • UML modeling skills: Team understanding and facility with classes, attributes, and
    data types; cardinality; UML structures; understanding of inheritances and
    associations; and camel case naming conventions will speed development time.
  • Projected number of classes and attributes: The projected number of
    classes/attributes for the tool in question will impact development time.
  • Technology/Infrastructure: The following technical environment and infrastructure
    need to be in place: Windows, Java WebStart, Internet Explorer 6.0, and the
    Enterprise Architect software tool.
  • Access to Domain Expertise: Access to the appropriate domain specialists to
    support the data modeling effort will facilitate and speed development.
  • Time Availability: Having the development team available to spend concentrated
    time on the project will help speed efforts.
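The UML modeling skills above include camel case naming. As a minimal sketch, assuming the common UML convention that attribute names use lower camel case (for example, birthDate), the hypothetical helpers below convert legacy, underscore-style column names and check a name against that convention. They are illustrative only and are not part of any caBIG® tool.

```python
import re


def to_lower_camel_case(name: str) -> str:
    """Convert a legacy name such as 'PATIENT_BIRTH_DATE' to 'patientBirthDate'."""
    # Split on any run of non-alphanumeric characters (underscores, hyphens, spaces)
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", name) if p]
    if not parts:
        return ""
    first, *rest = parts
    return first.lower() + "".join(p[:1].upper() + p[1:].lower() for p in rest)


def is_lower_camel_case(name: str) -> bool:
    """Simple check: starts with a lowercase letter, then only letters and digits."""
    return re.fullmatch(r"[a-z][a-zA-Z0-9]*", name) is not None


converted = [to_lower_camel_case(n) for n in ["PATIENT_BIRTH_DATE", "spec-no", "Site"]]
```

A check like this is deliberately loose; it catches separators and leading capitals but cannot tell whether word boundaries were chosen sensibly, which still takes a modeler's judgment.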
Learning More About Interoperability

Current Version of Compatibility Guidelines -
The caCORE UML browser ( )
     is a great resource when you are adapting a legacy tool and want to:
       • Identify the CDEs that make up the database schema of an existing
          caBIG® tool
       • Determine degree of overlap between the data fields/elements of a caBIG®
          tool with a legacy tool
       • Conduct a mapping exercise to connect a legacy database to a caBIG® tool
Vocabulary Knowledge Center -

caCORE and Infrastructure Wiki -
caCORE Curriculum -
Design Pattern Overview

•   The following slides outline SIX different approaches for adapting a
    tool to be caBIG®-compatible. These are conceptual models –
    foundational strategies – for shaping an adaptation roadmap. This is
    neither an exhaustive nor mandated list – it simply provides examples
    that we believe can facilitate adaptation based on experience to date.

    Understanding Design Patterns

    The six approaches offered here are based on the concept of design patterns, which
    Wikipedia defines as “a general reusable solution to a commonly occurring problem in
    software design.”

    Design patterns are essentially templates: standardized conceptual solutions that can
    be applied to many different situations. Based on caBIG® experiences, there are six
    standard ways to approach the software design problems presented by the adapt path
    identified. These are presented in the following slides.
    1: Wrapper

•    Use Scenario: There is a desire to continue using an existing legacy
     tool, without adopting one from caBIG®. In this case, a new caBIG® API
     is generated to allow data exchange between the existing tool and
     caGrid.

•    Description: Involves using caCORE to create and semantically map a new
     caBIG®-compatible API to the existing API. The caBIG® API then connects
     to caGrid. This may or may not require a Unified Modeling Language (UML)
     model; it could be a field-to-field mapping activity.

     [Diagram: Existing Tool (User Interface (UI), Database) → caBIG®-Compatible
     API → caGrid]
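A minimal Python sketch of the wrapper idea, assuming a simple field-to-field mapping. LegacyTool, CompatibleWrapper, and the attribute names in FIELD_MAP are hypothetical; in practice the target names would come from the caBIG®-compatible model and its CDEs, and the wrapper would be exposed to caGrid rather than called directly.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical field-to-field mapping: legacy column name -> model attribute name
FIELD_MAP: Dict[str, str] = {
    "PT_ID": "patientIdentifier",
    "DOB": "birthDate",
    "SEX_CD": "administrativeGenderCode",
}


class LegacyTool:
    """Stand-in for the existing tool's own API, keyed by legacy column names."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows

    def fetch_all(self) -> List[Row]:
        return list(self._rows)


class CompatibleWrapper:
    """Presents legacy data under the shared model's names without changing
    the legacy tool. Columns absent from the mapping are not exposed."""

    def __init__(self, legacy: LegacyTool, field_map: Dict[str, str]) -> None:
        self._legacy = legacy
        self._field_map = field_map

    def query(self) -> List[Row]:
        return [
            {self._field_map[col]: value
             for col, value in row.items() if col in self._field_map}
            for row in self._legacy.fetch_all()
        ]


legacy = LegacyTool([{"PT_ID": "123", "DOB": "1970-01-01",
                      "SEX_CD": "F", "INTERNAL_FLAG": "x"}])
records = CompatibleWrapper(legacy, FIELD_MAP).query()
```

Because the legacy tool is only read through its existing interface, its database and user interface keep working unchanged, which is the main appeal of this pattern.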
    2: Direct Data Access
    (Interim Solution)

•    Use Scenario: There is a desire or need to retain the legacy database,
     but also willingness to adopt a caBIG® tool for its user interface and
     to facilitate connection to caGrid.

•    Description: Data “live” in the legacy database; the caBIG® tool is
     adapted to query that database (this can be dynamic or on demand).

•    NOTE: This may be an effective transition strategy but is not
     recommended as a long-term alternative. It may be an interim step
     leading into the following two patterns.

     [Diagram: caBIG® Tool (UI, Database, API) or Existing Tool (UI,
     Database, API)]
    3: Message Broker

•    Use Scenario: The existing tool already uses standardized messages,
     such as HL7, to facilitate dynamic information exchange between diverse
     tools. If tools already send messages (e.g., HL7), this may be the most
     effective way to transmit data from existing tools to caGrid.

•    Description: A “messaging hub,” such as caXchange, brokers the transfer
     of messages from the existing tools, feeds them into a caBIG® tool, and
     then out to caGrid. The same concept can be used with other file
     formats, such as CSV or tab-delimited ASCII. caAdapter facilitates
     mapping into caBIG® standard elements.

     [Diagram: Existing Tools (UI, Database, API) → Message Broker Hub →
     caBIG® Tool (UI, Database, API) → caGrid]
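A minimal Python sketch of a message hub for delimited files, assuming CSV and tab-delimited ASCII inputs. MessageHub and its methods are hypothetical and do not model caXchange or caAdapter; HL7 parsing and mapping into caBIG® standard elements are out of scope. The sketch shows the broker's job: translate each source format once, then deliver every translated record to all subscribers, such as a caBIG® tool that serves the data to caGrid.

```python
import csv
import io
from typing import Callable, Dict, List

Record = Dict[str, str]


def parse_delimited(text: str, delimiter: str) -> List[Record]:
    """Parse delimited ASCII with a header row into one dict per data row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text), delimiter=delimiter)]


class MessageHub:
    """Hypothetical broker: one translator per source format, many subscribers."""

    def __init__(self) -> None:
        self._translators: Dict[str, Callable[[str], List[Record]]] = {}
        self._subscribers: List[Callable[[Record], None]] = []

    def register_format(self, fmt: str, translator: Callable[[str], List[Record]]) -> None:
        self._translators[fmt] = translator

    def subscribe(self, handler: Callable[[Record], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, fmt: str, payload: str) -> int:
        """Translate a payload and deliver each record to every subscriber."""
        records = self._translators[fmt](payload)  # KeyError for unregistered formats
        for record in records:
            for handler in self._subscribers:
                handler(record)
        return len(records)


hub = MessageHub()
hub.register_format("csv", lambda text: parse_delimited(text, ","))
hub.register_format("tsv", lambda text: parse_delimited(text, "\t"))

received: List[Record] = []  # stands in for the caBIG® tool's intake
hub.subscribe(received.append)
count_csv = hub.publish("csv", "specimenId,site\nS1,breast\n")
count_tsv = hub.publish("tsv", "specimenId\tsite\nS2\tlung\n")
```

Adding a new source format means registering one more translator; neither the senders nor the receiving tool has to change.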
    4: Data Warehouse

•    Use Scenario: There is a desire or need to retain the legacy database
     and user interface, but there is a willingness to use a caBIG® tool to
     connect to caGrid.

•    Description: Data “live” in the legacy database; Extract, Transform,
     and Load (ETL) techniques are used to periodically import data into the
     caBIG® tool to serve to caGrid. ETL techniques are designed to ensure
     that data are semantically compatible.

     [Diagram: Existing Tool (UI, Database, API) and caBIG® Tool (UI,
     Database, API) connected through a Warehouse by SQL and ETL scripts]
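The ETL cycle described above can be sketched in Python with SQLite standing in for both the legacy database and the caBIG® tool's database. The table names, columns, and the GENDER_MAP value mapping are hypothetical; the point is the separation into extract, transform, and load steps, with loads written as upserts so that a periodically scheduled run does not duplicate data.

```python
import sqlite3

# Hypothetical value mapping applied during Transform so that legacy codes
# match the target tool's permissible values; unknown codes become "Unknown".
GENDER_MAP = {"M": "Male", "F": "Female", "U": "Unknown"}


def run_etl(legacy: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """One ETL pass; safe to schedule repeatedly because loads are upserts."""
    # Extract: read raw rows from the legacy database
    rows = legacy.execute("SELECT pt_id, sex_cd FROM patients").fetchall()
    # Transform: normalize identifiers and map coded values
    transformed = [(str(pt_id), GENDER_MAP.get(sex_cd, "Unknown")) for pt_id, sex_cd in rows]
    # Load: upsert into the target; the connection context manager commits
    with target:
        target.executemany(
            "INSERT OR REPLACE INTO participant (identifier, gender) VALUES (?, ?)",
            transformed,
        )
    return len(transformed)


legacy = sqlite3.connect(":memory:")
legacy.execute("CREATE TABLE patients (pt_id INTEGER, sex_cd TEXT)")
legacy.executemany("INSERT INTO patients VALUES (?, ?)", [(1, "F"), (2, "M"), (3, "X")])

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE participant (identifier TEXT PRIMARY KEY, gender TEXT)")

run_etl(legacy, target)
loaded = run_etl(legacy, target)  # a second scheduled run does not duplicate rows
rows = target.execute("SELECT identifier, gender FROM participant ORDER BY identifier").fetchall()
```

Keeping the transform step separate makes it the single place where semantic alignment rules live, which is what lets the legacy database stay untouched.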
    5: Clone and Own

•    Use Scenario: There is a desire or need to retain the legacy database
     and user interface, and willingness to use a caBIG® tool API to connect
     to caGrid.

•    Description: An existing caBIG®-compatible API is used and “copied”
     into the existing tool. This involves mapping the legacy schema to the
     caBIG® API. Reusing existing caBIG® CDEs is key to this effort. If there
     are modifications, changes, or upgrades, the center must incorporate the
     changes in the API.

•    Either use the caBIG® application API directly, or develop a duplicate
     API that operates exactly the same way in the application.

     [Diagram: Existing Tool (Database) → caBIG® API → caGrid]
    6: Generate an API

•    Use Scenario: There is a desire to retain the legacy database, and no
     existing caBIG® API is appropriate to map from the legacy database to
     caGrid. Instead, caBIG® metadata descriptions and CDEs are used to
     generate a new API that rests on the legacy tool.

•    Description: This alternative involves using the caCORE Software
     Development Kit's (SDK) shared vocabularies and CDEs to construct a
     brand-new caBIG®-compatible API that is mapped to the tables in the
     existing legacy database. In short, use the caCORE SDK to generate an
     API from existing caBIG® metadata descriptions and link it to the
     legacy database.

     [Diagram: Existing Tool (Database) → generated API → caGrid]
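The generate-an-API pattern can be sketched as code produced from metadata rather than written by hand. In this minimal Python sketch, METADATA is a hypothetical stand-in for caBIG® metadata descriptions (it is not the caCORE SDK's format), SQLite stands in for the legacy database, and generate_api builds one query function per model class that maps model attribute names onto legacy columns. Table and column names come only from the trusted metadata; query values are always passed as SQL parameters.

```python
import sqlite3
from typing import Any, Callable, Dict, List

Query = Callable[..., List[Dict[str, Any]]]

# Hypothetical metadata: model class -> legacy table and attribute -> column
METADATA: Dict[str, Dict[str, Any]] = {
    "Specimen": {
        "table": "SPEC_TBL",
        "attributes": {"identifier": "SPEC_NO", "anatomicSite": "SITE_TXT"},
    }
}


def _make_query(conn: sqlite3.Connection, table: str, attributes: Dict[str, str]) -> Query:
    # Alias each legacy column to its model attribute name
    select_list = ", ".join(f"{col} AS {attr}" for attr, col in attributes.items())

    def query(**criteria: Any) -> List[Dict[str, Any]]:
        unknown = set(criteria) - set(attributes)
        if unknown:
            raise KeyError(f"unknown attributes: {sorted(unknown)}")
        sql = f"SELECT {select_list} FROM {table}"
        if criteria:
            sql += " WHERE " + " AND ".join(f"{attributes[a]} = ?" for a in criteria)
        cursor = conn.execute(sql, list(criteria.values()))
        names = [d[0] for d in cursor.description]
        return [dict(zip(names, row)) for row in cursor.fetchall()]

    return query


def generate_api(conn: sqlite3.Connection, metadata: Dict[str, Dict[str, Any]]) -> Dict[str, Query]:
    """Build one query function per model class described in the metadata."""
    return {name: _make_query(conn, spec["table"], spec["attributes"])
            for name, spec in metadata.items()}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SPEC_TBL (SPEC_NO TEXT, SITE_TXT TEXT)")
conn.executemany("INSERT INTO SPEC_TBL VALUES (?, ?)", [("S1", "breast"), ("S2", "lung")])

api = generate_api(conn, METADATA)
lung = api["Specimen"](anatomicSite="lung")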
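```

Because the API is derived from the metadata, a schema change on the legacy side means editing the column mapping, not rewriting query code.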
Variables Affecting Adapt Options

Three variables will help determine the investment required to adapt an existing
tool – as well as the possible approach that may be most appropriate.

Characteristics of Existing Tool - The more visibility you have
about the tool, the more options there are for adaptation.

Staffing, Size, Skills, & Budget - A team with both domain and
technical experts will greatly facilitate the process – technical skill sets include
data modeling and curation, annotation, familiarity with data model reuse,
metadata reuse, semantic integration, Java programming, and database
development skills.

Technology Environment - The degree of control over the
infrastructure and environment may affect the choice of the best adapt path.
Accessing caBIG® Support
Content Overview

•   Overview of caBIG® Customer Support Map
•   Web Resources
•   Product Representatives
•   Workspaces
•   Knowledge Centers
•   caBIG® Support Service Providers
•   Applications Support
•   Understanding the Boundaries: Examples
caBIG® Customer Support Map

[Figure: support resources mapped across three phases of deployment:
Exploration; Selection and Use of Tools, Infrastructure, Standards, Policy
and/or Guidelines; and Institutional Adoption. Resources shown include NCI
Web resources (caBIG® Website, Getting Connected, caBIG® Essentials, Tools
Inventory, Training), Product Representatives, Workspaces, Knowledge
Centers, Application Support, and Support Service Providers (via a
Customer/Provider Agreement).]

The Customer Support diagram illustrates the different support resources
available from caBIG® at different phases of caBIG® deployment, from
exploration to the selection and use of caBIG® resources to institutional
adoption. Support often begins with information available through the
caBIG® websites; as exploration deepens, the caBIG® Product
Representatives, the caBIG® Knowledge Centers, and Workspaces offer a range
of resources and opportunities to participate in the caBIG® community.
NCICB Applications Support is available as a supplemental resource.
Organizations wishing to receive customized caBIG®-related services may
wish to establish an agreement with a caBIG® Support Service Provider.
Web Resources: caBIG®
Community Website is Central Source
• Community Website
  • Getting Connected
  • caBIG® Essentials
  • Tools Landing Pages
  • Training Portal

• Examples of Web Resources:
  • The caBIG® Deployment
    Self-Assessment is a self-guided
    tool to help you identify technical
    and organizational readiness to deploy caBIG®.
    The automated report can be downloaded for later use.
  • Tool Landing Pages offer information about each caBIG® tool with links to
    installation files, demos, documentation, and training.
Product Representatives

CBIIT Product Representatives can provide your organization an overview tour
of specific caBIG® tools in the life sciences and clinical sciences domains. Send
request to

Use Product Reps as a resource in your caBIG® exploration process!
    caBIG® Workspaces

The caBIG® Workspaces are the thematic areas and virtual environments
where caBIG® activities are grouped and prioritized. Anyone can help
shape the future of caBIG® by attending calls and meetings.

Gathering Stakeholders:

• NCI
• caBIG® Program Staff
• Developers/Adopters
• Subject Matter Experts
• Patient Advocates
• Industry and Government Stakeholders
• Support Representatives

Conducting Many Activities:

• Bring together, within each domain, funded and volunteer community
  members engaged in development and adoption
• Serve as key operational units of caBIG® and as representatives of the
  larger community
• Open to all who wish to participate
• Convened through regularly scheduled teleconferences, webinars, and
  face-to-face meetings
• All products openly and publicly shared through website, forums, wikis,
  and listservs
caBIG® Workspaces

  Domain-level:
  • Clinical Trials Management Systems Workspace (CTMS)
  • Integrative Cancer Research Workspace
  • In Vivo Imaging Workspace (IMAG)
  • Tissue Banks & Pathology Tools Workspace (TBPT)

  Strategic-level:
  • Documentation and Training Workspace (D&T)
  • Data Sharing & Intellectual Capital Workspace

  Cross-cutting:
  • caBIG® Vocabularies and Common Data Elements Workspace (VCDE)
  • caBIG® Architecture Workspace (ARCH)
 caBIG® Knowledge Centers

• Established at institutions with demonstrated expertise in a specific
  area of focus or domain

• Web-based environment serving the education, outreach, training, and
  deployment needs of the caBIG® and broader cancer and biomedical research
  community

• Key services include:
  • Domain Expertise
  • Community Outreach
  • Web-Based Support for Tools
  • Repository of domain-specific tools, documentation, policies, standards

• Knowledge Center areas:
  • caGrid
  • Clinical Trials Management Systems
  • Data Sharing and Intellectual Capital
  • Molecular Analysis Tools
  • Tissue/Biospecimen Banking and Technology Tools
 caBIG® Knowledge Centers

• Clinical Trials Management Systems: Duke University Comprehensive
  Cancer Center, with Robert H. Lurie Comprehensive Cancer Center at
  Northwestern University, Cancer and Leukemia Group B – Information Systems
  (CALGB-IS), and SemanticBits
• Tissue/Biospecimen Banking and Technology Tools: Siteman Cancer
  Center, Washington University in St. Louis
• Molecular Analysis Tools: Columbia University Herbert Irving Comprehensive
  Cancer Center with The Broad Institute of MIT and Harvard
• Vocabulary: Mayo Clinic
• caGrid: The Ohio State University and The Ohio State Comprehensive Cancer
  Center, with Emory University, and the University of Chicago and the Argonne
  National Lab
• Data Sharing and Intellectual Capital: University of Michigan
 Support Offered by Knowledge Centers

Knowledge Center support consists of Web-based interaction allowing for
education, outreach, training, and deployment needs through the use of:

    •   Wiki-Based Resources and Knowledge Base
    •   Web Forums (Electronic Bulletin Boards)
    •   Bug and Feature Request Tracking System
    •   Repository of Knowledge Center Development Code, Documentation,
        Policies, and Standards

Visitors can browse the resources across Knowledge Centers without an account.
  To edit wiki pages, add forum postings, report defects and bug requests, and
   download tool code, you need to register for an account. A user’s login and
                password allows access to all Knowledge Centers.
Accessing the Knowledge Centers
caBIG® Support Service Providers

• caBIG® Support Service Providers (SSPs) are independent organizations
  approved by NCI as meeting specific criteria for performance in areas
  important to the caBIG® community.

• SSPs provide client-specific caBIG® support in specific service areas under
  negotiated client-provider business arrangements.

• Licensed SSPs hold a limited license to NCI’s caBIG® program trademarks,
  adequate to identify the applicant as a caBIG® SSP, and to market and
  communicate the Provider’s services.

• The caBIG® Support Service Provider Announcement is not a procurement
  and no funds are associated with a license. Designation as a Provider does
  not constitute endorsement of the Provider’s business by NCI.
SSP License Categories

• caBIG® Support Service Provider licenses have been negotiated in four
  service categories:
  • Help Desk Support
  • Adaptation and Enhancement of caBIG® -Compatible Applications
  • Deployment Support for caBIG® Software Applications
  • Documentation and Training Materials and Services

• Receiving the SSP designation requires an in-depth application and
  evaluation process. Applicants for each service category are evaluated by an
  independent team using objective criteria including technical capabilities,
  staffing and scalability, geographic coverage (when applicable), and domain
  expertise in biomedicine.

• See the full list of currently licensed Service Providers at:
 NCI-CBIIT Applications Support

• The NCI-CBIIT Applications Support team provides e-mail and phone support
  for NCI tools and websites. Write to them at

• Telephone support is available Monday to Friday, 8 am – 8 pm Eastern Time,
  excluding government holidays, at the following numbers: 301-451-4384 or toll
  free: 888-478-4423.

• Write to Applications Support if you have difficulty with any of the caBIG®
  websites, NCI tools, or if you just aren’t sure where to go!
Understanding the Boundaries:
Side-by-Side Examples

Workspaces:
• Set priorities within the domain; define potential needs for competitive
  requests for proposals (RFPs)
• Define and prioritize minor and major enhancements required for existing
  software
• Establish an environment for strategic discussions related to the domain

Knowledge Centers:
• Gather requirements from the community, and bring these to the workspace
  for consideration; foster open development across the community
• Develop and document minor enhancements to existing software based upon
  workspace priorities
• Provide a stable, publicly available repository for caBIG® technology and
  information base: versioned source code, documentation, domain expertise,
  forums and responses, domain-relevant training/train the trainer
 Understanding the Boundaries:
 Side-by-Side Examples

Knowledge Centers:
• Support is primarily web-based
• Minor enhancements to existing software based upon workspace priorities
• Foster open development across the community
• Serve as custodian of the caBIG® software code base
• Maintain and update tool outreach materials, documentation, and training
  materials

Service Providers:
• Support is defined by client-specific agreements/contracts
• Speculative development based on workspace and end-user needs
• Customize applications to meet individual end-user needs
• Create “shrink wrap” full-featured applications, which may use caBIG®
  tools as a base
• Customize documentation and training for specific clients/end users based
  on their needs
Links to Support Resources

• Support Information:

• Write to a Product Representative:

• Knowledge Centers:

• Support Service Providers:

• Training Portal:
For More About caBIG®

Resource                                       Link

The caBIG® Web site is the best centralized
resource for general caBIG® information

To send in a specific question about caBIG®,
write to this e-mail

Read postings to the “caBIG® Announce
Listserv” where community news and notices     big_announce.html
are posted

Contact NCICB Applications Support for
technical help with NCICB applications and
