Slide 1 - Download NIF Ontologies - Neuroscience Information Framework

NIFSTD - A Comprehensive Ontology for Neuroscience
Fahim T. Imam, Sarah M. Maynard, Stephen D. Larson, Maryann E. Martone, Amarnath Gupta, Jeffery S. Grethe
Neuroscience Information Framework, University of California, San Diego

As a core component of the Neuroscience Information Framework (NIF) project, the NIF Standard (NIFSTD) was envisioned as a set of modular ontologies that provide a comprehensive collection of terminologies to describe neuroscience-relevant data and resources. We report here on the structure, design principles and current state of NIFSTD. NIFSTD is a critical constituent of the NIF project, enabling an effective concept-based search mechanism against a diverse collection of neuroscience resources. The overall ontology has been assembled in a form that promotes reuse of standard ontologies in the biomedical domain and allows easy extension and modification over the course of its evolution.

STRUCTURE AND DESIGN PRINCIPLES

NIFSTD is constructed according to best practices closely followed by the Open Biomedical Ontologies (OBO) community. It was built in a modular fashion, with each module covering a distinct, orthogonal neuroscience-relevant domain. The modules are listed in Table 1. NIFSTD avoids duplication of effort by conforming to standards that promote reuse. The modules are standardized to the same upper-level ontologies: the Basic Formal Ontology (BFO), the OBO Relations Ontology (OBO-RO), and the Ontology of Phenotypic Qualities (PATO). Through the use of these foundational and generic ontologies, each module is represented in a standardized manner. This approach not only follows the powerful modularization ontology design pattern, but can also be more easily extended to provide highly nuanced representations that meet the needs of emerging neuroscientific research domains.

| Domain | External sources | Import or adapt to OWL | NIFSTD Module | Unique classes (as of July 2009) | Comment |
|---|---|---|---|---|---|
| Organism taxonomy | NCBI Taxonomy, GBIF, ITIS, IMSR, Jackson Labs mouse catalog | Adapt | BiomaterialEntities/NIF- | 781 | Specifically the taxonomy of model organisms in common use by neuroscientists |
| Molecules | IUPHAR ion channels and receptors, Sequence Ontology (SO); pending: NCBI, NCBI Entrez Protein, NCBI RefSeq, NCBI Homologene; NIDA drug lists, PDSP Ki, ChEBI, and Protein Ontology | Adapt IUPHAR; import SO | gy/BiomaterialEntities/NIF-Molecule.owl | 4,198 | Tested OWL representation techniques on this limited number of molecules (∼750). See below for more detail on how molecules in general are to be addressed in NIFSTD |
| Sub-cellular anatomy | Sub-cellular Anatomy Ontology (SAO) | Import | BiomaterialEntities/NIF-Subcellular.owl | 385 | Contains cell parts and subcellular structures from SAO-CORE (referencing the Gene Ontology Cellular Component taxonomy) and more nerve-cell-specific structures needed to characterize ultrastructural studies of the nervous system |
| Cell | CCDB, NeuronDB, terminologies; pending: OBO Cell Ontology | Adapt | BiomaterialEntities/NIF-Cell.owl | 277 | |
| Gross Anatomy | NeuroNames extended by including terms from BIRN, SumsDB, etc. | Adapt | gy/BiomaterialEntities/NI | 1,483 | Multi-scale representation of nervous system macroscopic anatomy |
| Nervous system function | Sensory, Behavior, Cognition terms from NIF, BIRN, MeSH, and UMLS | Adapt | gy/Function/NIF-Function.owl | 149 | |
| Nervous system dysfunction | Nervous system disease from MeSH, NINDS terminology; pending: OMIM | Adapt | gy/Dysfunction/NIF-Dysfunction.owl | 342 | |
| Phenotypic qualities | PATO | Import | gy/backend/BIRNLex-OBO-UBO.owl | 2112 | Imported as part of the OBO Foundry core |
| Investigation: reagents | Overlaps with molecules above, especially RefSeq for mRNA, ChEBI, Sequence Ontology; pending: Protein Ontology | | gy/DigitalEntities/NIF-Investigation.owl | n.a. | |
| Investigation: instruments | | Import | DigitalEntities/NIF- | 641 + 58 | BIRNLex-Investigation imports a BIRNLex-OBI-Proxy file being assembled in parallel with the Ontology for Biomedical Investigations (OBI). This proxy will be replaced by OBI itself once there is a full production release of OBI |
| Investigation: protocols and plans | Biomaterial transformations, assays, data collection, data transformation | Import | DigitalEntities/NIF-Investigation.owl | (Included in 641 above) | Same as above, i.e., ultimately derived from OBI |
| Investigation: resource type | NIF, OBI, IATR/NITRC, NCBC Resourceome ontology (BRO) | Mostly adapt, except for OBI | NIF/DigitalEntities/NIF- | 62 | Will ultimately be an inferred hierarchy based on NITRC, Resourceome, OBI, and NIF |

Table 1: Domains covered by NIFSTD, along with the vocabularies imported from external sources and the corresponding NIFSTD OWL module.

Representation Language. The NIFSTD ontology is expressed in the Web Ontology Language (OWL). Using OWL to represent the NIFSTD semantic framework makes it possible to assemble and edit the ontology with current OWL and RDF tools, and provides a means to support rich semantic mining capabilities in NIF in the future. NIFSTD holds to the OWL Description Logic (OWL-DL) dialect to ensure computational decidability and to support automated reasoning with common DL reasoners such as Pellet and FaCT++.

Re-use of Available Distilled Knowledge Sources. Wherever possible, existing terminologies and ontologies were reused to cover domains required by the neuroscience community (Table 1). These community vocabularies were culled from a variety of sources, ranging from fully structured ontologies to loosely structured controlled vocabularies. Table 1 highlights the source ontologies that were either imported directly or adapted into different NIFSTD modules. Also indicated in Table 1 are whether the source was in OWL or needed to be adapted, the number of unique classes (concepts) under each domain/subdomain, and any comments about the import.

Distinct, Orthogonal Concept Domains. Each of the OWL modules in NIFSTD covers a conceptually orthogonal or distinct domain (Table 1). Orthogonality is one of the primary OBO Foundry principles and is critical to ensuring maximal re-usability of the ontology. The modularity helps minimize dependencies and ensures re-use by enabling users to accept only those domains they need for annotating. If an ontology contains one or more domains overlapping with an existing module, the files must be mapped extensively to specify semantic equivalencies, which creates an added dependency and curatorial burden.

Single Inheritance. Each class within the NIFSTD modules follows the single inheritance principle. This keeps the classes univocal and avoids ambiguities. However, classes with multiple parents can be derived via automated classification on defined classes, i.e., asserted classes with logical necessary and sufficient conditions.

Unique Concept Identifiers and Supported Annotations. Each entity in NIFSTD is identified by a unique identifier and is accompanied by a variety of supporting annotations such as a preferred label, definition, synonymous terms, links to equivalent terms in other terminologies, and other lexical variants (Table 3). These properties were developed largely through the import of similar properties from the Dublin Core Metadata and the Simple Knowledge Organization System (SKOS). Our policy on NIFSTD class identifiers is as follows.

• If a module was imported from an OBO Foundry ontology that uses BFO as its foundational ontology, the class names (i.e., identifiers) remain unchanged. As many modules were imported directly from BIRNLex, and BIRNLex follows the OBO Foundry principles, the prefix birnlex_XXXX is frequently used.
• Any extensions added by NIF to an imported ontology are identified by the nifext prefix (NIF extension). If an imported ontology was not OBO compliant, e.g., used a string as a class name, was not in OWL or had to be refactored according to BFO, NIF assigns its own class name, and the mapping to the source concept is maintained through annotation properties, e.g., NeuroNamesID: 342.
• The identifiers for new classes in NIFSTD are prefixed by nlx (NeuroLex) followed by an extension that indicates the core module, e.g., nlx_cell_xxxx and nlx_mol_xxxx represent class identifiers for the Cell and Molecule modules respectively.

Following semantic web practice, NIFSTD uses complete Uniform Resource Identifiers (URIs) to maintain the identity of a given entity. In the case of a class in NIFSTD, the complete URI is the URI for the OWL module where it resides along with the specific ID (or local name in XML) for the class within that file, e.g., http:// is the URI for middle cerebellar peduncle.

Class Definitions. OBO Foundry practice requires that all concepts receive clear and specific human-readable definitions structured in Aristotelian form: "A is a B which has C", e.g., "the globus pallidus is a brain region which is found within the basilar region of the vertebrate telencephalon." Without definitions, there is no way to guide the annotation choices made by curators, which leads to terms being used in unanticipated ways that confound concept-based data federation. As is quite common even with well-utilized terminologies, not all terms in NIFSTD have definitions at this time. The curation_status annotation property tracks entities that are still lacking final definitions; this property is updated as definitions are added (uncurated) and finalized (curated).

Lexical Variants. NIFSTD includes a variety of accepted synonymous terms to identify a distinct concept. These terms serve as an aid to annotators and help when using the ontology to index a large text corpus, which often employs a variety of synonyms to identify a specific concept. Lexical variants also include alternative spellings and antiquated terms no longer in common use.

Mapping Existing Equivalent Concepts. In addition to synonymous terms, external identifiers are included from one or more external sources where equivalent concepts exist, e.g., UMLS CUIs, NCBI Taxonomy IDs, or NeuroNames IDs. This inter-terminology mapping helps to enable automatic data federation and querying against existing data sets already annotated with such IDs.

Representation of Concept Relations. NIFSTD uses the OBO-RO to specify relationships between entities that are unambiguous, distinct, and constrained. Concepts across domains are related to one another through a set of specific object properties specified in the OBO-RO, such as located in, contains, inheres in, participates in, etc. These relational properties mostly exist as inverse pairs, e.g., part of and has part (see below for more detail on relations). Use of the OBO-RO serves both to separate the representation of different types of relations (e.g., "is a" vs. "part of") and to limit the proliferation of relation types. The former requirement is critical to enabling maximal algorithmic parseability of relations. For instance, it has been documented that the computational power of the Gene Ontology is limited by the fact that it mixes the depiction of "is a" and "part of" relations in a single hierarchical graph (Smith et al. 2003). At the same time, it is equally vital that the number of relations not be overly expansive, as each relation brings with it a computational burden: the computer code required to interpret the meaning of that relation.

Bridge Files and Object Properties. In order to maintain the orthogonal nature of the ontology domain modules, the cross-domain relations are specified in separate ontology bridge files rather than incorporated into the individual modules. In this way, the main domain files (e.g., anatomy, cell type, disease, etc.) remain independent of one another. Using these bridge files, the dependencies need only be introduced by those applications that require them, such as the NIF system, which requires a description of the anatomical location of nerve cell types. These relations currently reside in the NIF Cell module, but they are being moved to separate files, called "bridge files" (see "Results" section for explanation), so that other applications which seek to use the underlying nerve cell domain ontology, but do not necessarily intend to import those relations, can do so. Bridge files can also choose either to import the referenced domain ontologies in their entirety or to take a more minimal approach and simply declare the classes they need to reference.

Importing a New Ontology. The process of importing a new vocabulary into NIFSTD varies depending upon its state (Table 1) as follows:

• If a vocabulary already uses OWL, the OBO-RO and the BFO and is orthogonal to existing modules, the import simply involves adding an owl:imports statement to the main ontology file (nif.owl).
• If an existing orthogonal ontology is in OWL but does not use the same foundational ontologies as NIFSTD, then an ontology bridge file is constructed declaring the deep-level semantic equivalencies such as foundational objects and processes. Relations are drawn from the OBO-RO as needed.
• If the external terminology is organized but has not been represented in OWL, or does not use the same foundation as NIFSTD, then the terminology is adapted to OWL/RDF in the context of the NIFSTD foundational layer ontologies.

Viewing the NIFSTD Vocabularies. The NIFSTD vocabularies are available as OWL files, which may be viewed using Protégé or similar ontology tools. However, these tools generally require a fair amount of expertise to use. To provide a more human-friendly viewing environment, NIFSTD is also available through NCBO BioPortal, which supports searching for specific terms, browsing the overall ontology concept tree, selecting specific concepts to display in the graph viewer, and viewing associated concept properties. Within NIF, NIFSTD is served through an ontology management system called OntoQuest. OntoQuest generates an OWL-compliant relational schema and supports operations for navigating, path finding, hierarchy exploration, and term searching in ontological graphs.

NIFSTD and NeuroLex Wiki. When constructing NIFSTD, we strive to balance the involvement of the neuroscience community, for domain expertise, with that of the knowledge engineering community, for ontology expertise. The wiki version of NIFSTD, NeuroLex, has been developed as an easy entry point for the broader community to access, annotate, edit and enhance the core NIFSTD lexicon. The peer-reviewed contributions in the MediaWiki are later implemented in the NIFSTD OWL modules on a regular basis. We envision the NeuroLex wiki as the main entry point to NIFSTD contents for general users and domain experts to view, annotate and contribute to the overall lexicon.

NIFSTD Development Workflow. The current NIFSTD development/curation workflow includes the tasks shown in the numbered rectangular boxes in Figure 3:

1. Add/Edit NeuroLex Terms/Categories: This step involves the various NIF users and groups who are interested in adding, updating, enhancing, or annotating the current NIF vocabularies through NeuroLex. The NeuroLex wiki serves as the main entry point and collaborative interface for implementing changes in the NIFSTD ontology.
2. Bulk Upload of Terms: Depending on the number and nature of the terms (e.g., adding a large new sub-tree to an existing NIFSTD class, or new classes with known parents for a specific NIF module), terms can be uploaded in bulk when adding them by hand would require creating too many categories/pages in the NeuroLex wiki. These requests can be made through a spreadsheet containing the terms with known parents and annotations.
3. Identify Valid Contribution: This step involves identifying the contributions from the previous steps that are valid according to the NIF domain experts. Every contribution in NeuroLex requires this step before it is implemented in the actual NIFSTD ontology. Valid contributions are identified based on criteria such as relevance to neuroscience research, source, consistency, and appropriateness of the hierarchy. For newly added categories, this step makes sure that the terms are actually new and not synonyms or duplicates of existing NIF concepts.
4. Update NIFSTD (testing): This step involves updating the actual NIFSTD OWL files or creating new OWL files in the testing environment based on the updated contents from the previous steps.
5. Testing in OntoQuest: After each significant update to the OWL files, the NIFSTD OWL implementation goes to OntoQuest testing on the staging server for feedback.
6. Testing in BioPortal: After each significant update to the OWL files, the NIFSTD OWL implementation is tested in the BioPortal staging environment for feedback.
7. Keep persistent links to older versions: After positive feedback from Steps 5 and 6, we archive the links to the old OWL files and post the links to the project wiki.

Figure 3: NIFSTD Development/Curation Workflow

Tasks 8-13 involve updating the NIFSTD production version, updating the NIFSTD project wiki page with release notes covering version-specific major changes and additions of new content in NIFSTD, updating the OntoQuest and BioPortal production versions, and updating the Textpresso repository of vocabularies with the newly added terms in NIFSTD.

Conclusion

Currently covering more than 20,000 concepts, including both classes and synonyms, NIFSTD continues to evolve, incorporating new modules and content and implementing more detailed and useful cross-domain relations that follow ontology development best practices.
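
The identifier policy described under "Unique Concept Identifiers and Supported Annotations" can be illustrated with a minimal Python sketch. It is not NIF code. Several details are assumptions made only for illustration: the underscore after the nifext prefix, the "#" separator between a module URI and a class ID, the example.org module URI, and the numeric IDs.

```python
from typing import Optional

# Prefix conventions taken from the identifier policy in the text.
# The trailing underscore on "nifext_" is an assumption for this sketch.
PREFIX_ORIGINS = [
    ("nlx_", "new NIFSTD class (NeuroLex)"),
    ("nifext_", "NIF extension to an imported ontology"),
    ("birnlex_", "class retained from BIRNLex"),
]


def classify_identifier(local_id: str) -> str:
    """Describe where a class identifier comes from, based on its prefix."""
    for prefix, origin in PREFIX_ORIGINS:
        if local_id.startswith(prefix):
            return origin
    return "identifier kept from the source ontology or assigned by NIF"


def nlx_module(local_id: str) -> Optional[str]:
    """Return the module extension of an nlx identifier, e.g. 'cell' or 'mol'."""
    parts = local_id.split("_")
    if len(parts) >= 3 and parts[0] == "nlx":
        return parts[1]
    return None


def class_uri(module_uri: str, local_id: str) -> str:
    """Combine a module URI and a local ID into a complete class URI.

    Joining with '#' is an assumption; the text only says the complete URI is
    the module URI "along with" the local name.
    """
    return f"{module_uri}#{local_id}"


# Hypothetical module URI and IDs, used only to exercise the functions.
print(classify_identifier("nlx_cell_0001"))
print(nlx_module("nlx_mol_0002"))
print(class_uri("http://example.org/NIF-Module.owl", "birnlex_0000"))
```

Keeping the prefix table as data rather than branching logic means a new module extension or prefix can be added without touching the functions.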

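
The three cases listed under "Importing a New Ontology" can be read as a small decision procedure, sketched below in Python. This is a hedged illustration, not a NIF tool. The `Vocabulary` type, its field names, and the returned labels are hypothetical. The source lists a differing foundation under both the bridge-file case and the adaptation case; this sketch assumes a bridge file whenever the vocabulary is already in OWL. The handling of non-orthogonal vocabularies is not one of the three cases. It follows the remark under "Distinct, Orthogonal Concept Domains" that overlapping domains must be mapped extensively.

```python
from dataclasses import dataclass


@dataclass
class Vocabulary:
    """Hypothetical description of a candidate vocabulary."""
    name: str
    in_owl: bool                 # already represented in OWL
    uses_bfo_and_obo_ro: bool    # shares the NIFSTD foundational ontologies
    orthogonal: bool             # does not overlap an existing module


def import_strategy(v: Vocabulary) -> str:
    """Pick an import route following the three cases in the text."""
    if not v.in_owl:
        # Case 3: organized terminology not yet in OWL is adapted to OWL/RDF.
        return "adapt-to-owl"
    if not v.orthogonal:
        # Assumption: overlap requires mapping of semantic equivalencies first.
        return "map-overlapping-domains"
    if v.uses_bfo_and_obo_ro:
        # Case 1: add an owl:imports statement to the main file (nif.owl).
        return "owl-imports"
    # Case 2: OWL and orthogonal, but built on a different foundation.
    return "bridge-file"


candidates = [
    Vocabulary("vocab-a", in_owl=True, uses_bfo_and_obo_ro=True, orthogonal=True),
    Vocabulary("vocab-b", in_owl=True, uses_bfo_and_obo_ro=False, orthogonal=True),
    Vocabulary("vocab-c", in_owl=False, uses_bfo_and_obo_ro=False, orthogonal=True),
]
for vocab in candidates:
    print(vocab.name, "->", import_strategy(vocab))
```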