The NCI Semantic Infrastructure and caBIG® Compatibility


How the Pieces Fit Together


A Product of the Documentation and Training Workspace
April 2010
Topic, Audience & Prerequisites

•   Topic Statement: This presentation provides an overview of the caBIG®
    Semantic Infrastructure, caCORE framework, and Compatibility Criteria. It
    specifically illustrates how these three components of the caBIG® program
    work together to support the development of interoperable systems.

•   Target Audience: This material is designed for technical audiences that
    intend to use the caCORE framework to build a caBIG®-compatible system.
    It introduces high-level concepts only, and it is not a project planning guide or
    hands-on tutorial.


•   Prerequisites: This training assumes moderate technical knowledge and
    prior familiarity with caBIG®. For example, to fully engage with this
    material, audiences should understand the meaning and use of the terms
    interoperability, UML model, and API.
NCI Semantic Infrastructure
and caBIG Compatibility

• NCI Semantic Infrastructure
   • Background
   • caCORE
   • Getting Support

• caBIG® Compatibility
   • Overview
   • Relationship to the Semantic Infrastructure

• Resources
NCI Semantic Infrastructure

• The NCI semantic infrastructure supports the development of
  interoperable cancer information management systems

• This infrastructure provides the basis for semantic
  interoperability
   • Information exchanged between two systems can be used meaningfully
     only if the two systems share a common meaning of the data
   • Requires the use of common terminologies and metadata


• The semantic infrastructure includes:
   •   Terminology services and browsers
   •   Metadata repository and browsers
   •   Administration and curation tools
   •   Software Development Kit (SDK)
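
A minimal sketch of the semantic interoperability idea described above: two systems name their fields differently, but each annotates its fields with identifiers from a shared terminology, so a record can be re-keyed through the shared concepts. All field names and concept codes below are invented placeholders for illustration, not real NCI Thesaurus codes, and the `translate` helper is hypothetical rather than part of any caCORE tool.

```python
# Hypothetical illustration: field names and concept codes are invented.
# Each system annotates its local field names with shared concept identifiers.
SYSTEM_A_ANNOTATIONS = {"dx": "CONCEPT:0001", "pt_sex": "CONCEPT:0002"}
SYSTEM_B_ANNOTATIONS = {"diagnosis": "CONCEPT:0001", "gender": "CONCEPT:0002"}


def translate(record, source_annotations, target_annotations):
    """Re-key a record from source field names to target field names.

    Only fields whose shared concept is known to both systems are carried
    over; fields without a common meaning are dropped.
    """
    concept_to_target = {code: field for field, code in target_annotations.items()}
    translated = {}
    for field, value in record.items():
        concept = source_annotations.get(field)
        if concept in concept_to_target:
            translated[concept_to_target[concept]] = value
    return translated


record_a = {"dx": "melanoma", "pt_sex": "F", "local_note": "n/a"}
record_b = translate(record_a, SYSTEM_A_ANNOTATIONS, SYSTEM_B_ANNOTATIONS)
print(record_b)
```

The unannotated field `local_note` is not transferred, which mirrors the point that exchanged data is meaningful only where both systems share a common definition.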
NCI Semantic Infrastructure
and caCORE

•     The NCI semantic infrastructure is collectively known as caCORE
       •   caCORE = Cancer Common Ontologic Representation Environment
       •   Built on formal techniques and standards
       •   Provides a framework for the creation, storage, and utilization of metadata


•     caCORE is a robust set of tools and resources to support the
      development of caBIG-compatible systems
       •   NCI offers comprehensive training for caCORE tools


      Create an Information Model → Perform Semantic Annotation →
      Transform the Model into Metadata → Generate Programming Interfaces →
      Generate a caGrid Interface



•     Each step in the development of a caBIG-compatible system is
      supported by one or more components of caCORE
Key Components of caCORE

•   Enterprise Vocabulary Services (EVS)
     •   Controlled terminologies
     •   Systems for hosting and managing terminologies


•   Cancer Data Standards Repository (caDSR)
     •   Systems for hosting and managing metadata


•   Software Development Kit (SDK)
     •   A model-driven software engineering tool for creating
         caBIG-compatible systems that can be easily integrated
         with caGrid


•   Key components are shown on the following slides
     •   Related tools for curating and browsing the data exist but
         are not described here
Enterprise Vocabulary Services (EVS)
Controlled Terminologies

• NCI Enterprise Vocabulary Services (EVS) provides controlled
  vocabularies, which form the semantic base for caBIG

• Development of and support for controlled terminologies
   • NCI Thesaurus (NCIt) - A cancer-focused terminology that is used for the
     semantic annotation of UML models in caBIG

   • NCI Metathesaurus (NCIm) - A mapping of concepts across multiple
     vocabularies

   • Biomedical Grid Terminology (BiomedGT) - A project to develop an
     open, federated ontology for translational research; BiomedGT is based
     on the NCI Thesaurus
Enterprise Vocabulary Services (EVS)
Software Tooling

•   LexEVS
    •   A collection of programmable interfaces that allow users to access controlled
        terminologies hosted by NCI Enterprise Vocabulary Services (EVS)
    •   LexEVS is built on LexGrid, a standardized system for terminology storage, and it
        provides tooling for loading and distributing vocabulary content


•   NCI Protégé
    •   A collection of Protégé plug-ins that provide a customized editing environment
        tailored to the needs of NCI terminologies
    •   Used by NCI curators to edit the content of BiomedGT and the NCI Thesaurus


•   BiomedGT Wiki
    •   A collaborative terminology authoring platform based on Semantic MediaWiki
    •   Provides terminology users and subject matter experts the ability to browse and
        discuss terminology content and structure


•   NCI Terminology Browser and NCI BioPortal
    •   Web-based terminology browsers
Cancer Data Standards Repository
caDSR

•   The caDSR is a centralized repository for data elements
     •   Based on the ISO/IEC 11179 standard for metadata registries


•   Data Elements
     •   Structured as defined in ISO/IEC 11179
     •   Defined by the classes and attributes in a UML model
     •   Annotated using terms from controlled vocabularies (e.g., NCI Thesaurus),
         which provide common meanings for each object in a system


•   Related Tools
     •   CDE Browser allows users to search the caDSR by data element
     •   UML Model Browser allows users to search the caDSR from the perspective of
         the underlying UML models that were used to generate data elements
     •   Semantic Integration Workbench (SIW) is used to map concepts from
         terminologies in EVS to classes and attributes in the UML model
     •   Other software tools exist for metadata creation and curation
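
The ISO/IEC 11179 structure mentioned above can be sketched roughly as follows. In that standard, a data element pairs a data element concept (an object class plus a property) with a value domain. The class names, the `describe` and `accepts` helpers, and the `EX-` concept codes are assumptions made for this illustration; they are not the caDSR API or real terminology codes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Concept:
    code: str  # placeholder identifier, standing in for a terminology code
    name: str


@dataclass(frozen=True)
class DataElementConcept:
    object_class: Concept  # typically derived from a UML class
    prop: Concept          # typically derived from a UML attribute


@dataclass(frozen=True)
class ValueDomain:
    name: str
    datatype: str
    permissible_values: Tuple[str, ...] = ()

    def accepts(self, value: str) -> bool:
        # An empty tuple means the domain is not enumerated.
        return not self.permissible_values or value in self.permissible_values


@dataclass(frozen=True)
class DataElement:
    concept: DataElementConcept
    value_domain: ValueDomain

    def describe(self) -> str:
        dec = self.concept
        return f"{dec.object_class.name} {dec.prop.name} ({self.value_domain.name})"


patient = Concept("EX-0001", "Patient")
sex = Concept("EX-0002", "Sex")
sex_codes = ValueDomain("Sex Code", "string", ("F", "M", "U"))
patient_sex = DataElement(DataElementConcept(patient, sex), sex_codes)
print(patient_sex.describe())
```

Keeping the meaning (object class and property) separate from the representation (value domain) is what lets a data element be reused across models that share the same concept.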
Software Development Kit (SDK)

•   A framework for data management and application development
     •   Creates, compiles, and runs caCORE-like software
     •   Builds systems for a collaborative research environment
     •   Based on the principles of Model Driven Architecture (MDA)


•   Includes a code generator
     •   Creates application programming interfaces (APIs) from a UML model
     •   Creates Java and web services APIs
     •   Provides uniform access to the underlying data stores
     •   SDK-generated systems can be easily integrated with caGrid
     •   Contains security modules


•   Related Tools
     •   caAdapter is used to perform object-relational mapping
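
To illustrate the model-driven idea behind a code generator, here is a toy sketch that turns a small model description into class source code. It is not the caCORE SDK, which works from UML models and produces Java and web services APIs; the `MODEL` dictionary format and the `generate_class_source` function are hypothetical and exist only to show the principle that the API is derived from the model rather than written by hand.

```python
# Toy model: class name -> {attribute name: type name}. Format is invented.
MODEL = {"Patient": {"identifier": "str", "birth_year": "int"}}


def generate_class_source(class_name, attributes):
    """Return Python source for a class with one constructor argument per attribute."""
    params = "".join(f", {name}: {type_name}" for name, type_name in attributes.items())
    lines = [f"class {class_name}:", f"    def __init__(self{params}):"]
    if not attributes:
        lines.append("        pass")
    for name in attributes:
        lines.append(f"        self.{name} = {name}")
    return "\n".join(lines) + "\n"


source = generate_class_source("Patient", MODEL["Patient"])
print(source)

# Load the generated source and use the resulting class.
namespace = {}
exec(source, namespace)
patient = namespace["Patient"]("P-001", 1970)
print(patient.identifier, patient.birth_year)
```

Because the class is regenerated whenever the model changes, the model stays the single source of truth for the object structure.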
NCI Semantic Infrastructure
Use Case Examples

• Developers of caBIG®-compatible systems
   •   To semantically annotate UML models using controlled terminologies
   •   To create and maintain metadata that describes the objects in a system
   •   To search existing metadata for data elements that could be reused
   •   To generate APIs based on a UML model


• Users of caBIG-compatible systems
   • To search terminologies for concepts related to a domain
   • To search for data elements that represent "things" in a system, and the
     semantic annotations that define them


• These use cases will be revisited in the following section
NCI Semantic Infrastructure
Summary

• The NCI semantic infrastructure supports the development of
  interoperable cancer information management systems

• The NCI semantic infrastructure is based on standards
   • Systems are modeled in the Unified Modeling Language (UML)
   • Objects in systems are semantically annotated using common
     terminologies (EVS)
   • Annotated objects are represented as structured metadata (ISO/IEC
     11179)


• Using the caCORE framework, a system is generated from a
  conceptual model that is represented by structured metadata,
  which are defined by concepts from common terminologies
NCI Semantic Infrastructure
Summary

• Key components of the caCORE framework include:
   • Enterprise Vocabulary Services (EVS)
      • Provides controlled vocabularies, which are used to annotate
        metadata
      • Tools for terminology authoring, editing, and services
   • Cancer Data Standards Repository (caDSR)
      • Metadata repository for data elements generated from UML models
      • Data elements are semantically annotated using terms from EVS
   • caCORE Software Development Kit (SDK)
      • Code generator for services based on UML models
      • Objects in the API are registered and defined in the caDSR
NCI Semantic Infrastructure
Getting Support

•   NCI CBIIT Applications Support and the Vocabulary Knowledge Center
    work together to provide support for the NCI semantic infrastructure

•   NCI CBIIT Applications Support
     •   ncicb@pop.nci.nih.gov
     •   Supports caCORE tooling, including the terminology browsers, SIW, caDSR, CDE
         and UML Model browsers, metadata creation/curation tools, and the caCORE SDK


•   Vocabulary Knowledge Center
     •   https://cabig-kc.nci.nih.gov/Vocab/KC
     •   https://cabig-kc.nci.nih.gov/Vocab/forums/
     •   vocabkc@mayo.edu
     •   Supports LexEVS (terminology server and APIs), BiomedGT Wiki/LexWiki (platform
         for collaborative terminology authoring), and NCI-Protégé (terminology editor)
     •   Serves as point of contact for NCI BioPortal and NCI terminology content
         (NCI Thesaurus, NCI Metathesaurus, and CTCAE)
caBIG Compatibility Guidelines
Compatibility and Interoperability


• caBIG® provides guidelines for creating and adopting software
  systems that are syntactically and semantically interoperable

• Interoperability can be defined as the ability of a system to
  access and use the parts of another system

    • Syntactic interoperability
       • Requirements include programmatic access to data and tools, not just
         interactive access from end-user interfaces

    • Semantic interoperability
       • Guidelines include requirements for the creation of metadata (data
         elements) that are annotated with terms from controlled vocabularies,
         which provide common meanings for the objects in a system
caBIG Compatibility Guidelines
Areas of Interoperability


• Semantic Interoperability

    • Information Models
    • Common Data Elements (CDEs)
    • Vocabularies and Ontologies


• Syntactic Interoperability

    • Programming and Messaging Interfaces


• An application must meet the guidelines specified in all four areas
  to be certified "caBIG Compatible"

[Figure: the four areas of interoperability: Information Models, CDEs,
Vocabularies, and APIs]
caBIG Compatibility Guidelines
Areas of Interoperability

•   Information Models
    •   Object-oriented model of the interfaces of a system
    •   Annotated with terms from controlled vocabularies
    •   Translated into Common Data Elements
•   Common Data Elements
    •   Metadata descriptions that define and describe data such that remote users
        can understand what the data represent
    •   Annotated using terms from controlled vocabularies
•   Vocabularies and Ontologies
    •   Contain agreed-upon concepts, terms, and definitions
    •   Used to define metadata (e.g., CDEs) and data (e.g., Permissible Values)
•   Programming and Messaging Interfaces
    •   Allow systems to access resources from other systems
    •   Based on the information model
    •   Syntax of interfaces is defined by agreed-upon standards
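
The rule that an application must meet the guidelines in all four areas can be pictured as a simple checklist. This is only a sketch of the all-areas-required logic: the `review` dictionary and both helper functions are hypothetical, and the actual compatibility review involves far more than a set of yes/no flags.

```python
REQUIRED_AREAS = (
    "Information Models",
    "Common Data Elements",
    "Vocabularies and Ontologies",
    "Programming and Messaging Interfaces",
)


def unmet_areas(review):
    """Return the required areas that the review does not mark as satisfied."""
    return [area for area in REQUIRED_AREAS if not review.get(area, False)]


def meets_all_areas(review):
    """True only when every one of the four areas is satisfied."""
    return not unmet_areas(review)


# Hypothetical review result: three areas satisfied, one not yet assessed.
review = {
    "Information Models": True,
    "Common Data Elements": True,
    "Vocabularies and Ontologies": True,
}
print(meets_all_areas(review))
print(unmet_areas(review))
```

An area that is missing from the review is treated as unmet, so a partial assessment never passes by omission.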
caCORE Supports Compatibility

•     Many of the requirements specified by the caBIG compatibility
      guidelines are satisfied in full or in part when the NCI semantic
      infrastructure and the caCORE development process are used

                            caCORE Development Process

Step                                   Primary Area of Interoperability

Create an Information Model            Information Models
Perform Semantic Integration           Vocabularies
Transform the Model into Metadata      CDEs
Generate Programming Interfaces        APIs
Generate a caGrid Interface            APIs


•     The following slides illustrate some examples of how the caCORE
      tooling supports common software development activities and facilitates
      the generation of a caBIG-compatible system
caCORE Supports Compatibility
Examples

Activity                        Tooling Used               Compatibility Criteria

Search existing models for      UML Model Browser          Components of UML models
components to reuse                                        are reused when appropriate;
                                                           caBIG UML modeling best
                                                           practices are followed

Search NCI Thesaurus for        Terminology Browser        Concepts assigned to classes
terms to use for annotation                                and attributes must be
                                                           synonymous with the UML
                                                           definition (semantic
                                                           annotation is accurate);
                                                           concepts are from a public
                                                           controlled terminology

Semantically annotate model     Semantic Integration       (same criteria as the
using terms from NCIt           Workbench (SIW)            previous row)

Submit suggestions for new      BiomedGT Wiki              (same criteria as the
terms to be added to NCIt                                  previous row)
caCORE Supports Compatibility
Examples

Activity                        Tooling Used               Compatibility Criteria

Search for existing data        CDE Browser                Existing data elements are
elements to reuse                                          reused when appropriate

Reuse an existing data          Semantic Integration       (same criterion as the
element or value domain         Workbench (SIW)            previous row)

Create and register data        caDSR                      Data elements are
elements in the caDSR                                      constructed according to
                                                           ISO/IEC 11179 and are
                                                           registered in the caDSR
caCORE Supports Compatibility
Examples


Activity                     Tooling Used           Compatibility Criteria

Generate APIs from the UML   SDK Code Generator     API provides access to data in the
model                                               form of objects that are defined by
                                                    data elements and that correspond
                                                    to the UML model
Compatibility and the
Semantic Infrastructure
       Examples of Tool Usage During the caCORE Development Process

[Figure: tools used across the caCORE development process steps (Create an
Information Model, Perform Semantic Integration, Transform the Model into
Metadata, Generate Programming Interfaces, Generate a caGrid Interface).
Tools shown: UML Model Browser, Terminology Browser, BiomedGT Wiki,
Semantic Integration Workbench (SIW), CDE Browser, SDK Code Generator]
Resources

•   caBIG Training Portal
     •   https://cabig.nci.nih.gov/training
•   NCICB Applications Support
     •   ncicb@pop.nci.nih.gov
•   Vocabulary Knowledge Center (VKC)
     •   https://cabig-kc.nci.nih.gov/Vocab/KC
     •   https://cabig-kc.nci.nih.gov/Vocab/forums/
•   caCORE
     •   https://cabig.nci.nih.gov/tools/concepts/caCORE_overview
     •   https://wiki.nci.nih.gov/display/caCORE
•   LexEVS
     •   https://cabig-kc.nci.nih.gov/Vocab/KC/index.php/LexBig_and_LexEVS
•   caDSR and Related Tools
     •   https://cabig.nci.nih.gov/concepts/caDSR/
     •   https://wiki.nci.nih.gov/display/caDSR

				