Developing an Ontology-driven Data Collection Application
Daniela Bourges-Waldegg, PhDᵃ, Ted Bashorᵃ, H. Robert Frost, MSᵃ, Lucy Hadden, PhDᵃ, Melissa Haendel, PhDᵇ, Carlo Torniai, PhDᵇ, Douglas MacFadden, MSᵃ
ᵃHarvard Medical School, Boston, MA
ᵇOregon Health & Science University, Portland, OR
Abstract
The eagle-i Consortium (www.eagle-i.org/home) is building a federated network of biomedical research resources aimed at facilitating resource discovery and sharing. The cornerstone of the eagle-i system is an ontology that captures biomedical domain knowledge and drives the eagle-i software. We discuss the challenges of building a fully ontology-driven data collection tool and the technical approach that led to the successful deployment of a tool now in use by eagle-i staff and by pilot laboratory users.

Introduction
The eagle-i Consortium is building a federated network of biomedical research resources, including biological specimens, human studies, instruments, organisms and viruses, reagents, research opportunities, services and software. The primary goal of the eagle-i system is to facilitate resource discovery and sharing, so emphasis is placed on capturing rich data semantics and the links between resources that enable powerful search mechanisms. An overarching component of the system is a domain ontology that expresses these semantics and relationships in OWL.
The eagle-i architecture comprises software components deployed at the institutional level and a central search application that communicates with them via a federated query network. At the core of each institutional deployment is a repository built atop an RDF store. The repository provides services both for institutional data collection applications (such as basic creation, update and deletion operations) and for the central search application (such as metadata harvesting for index building).
The eagle-i ontology drives both the data collection and the search user interfaces, and it is used to structure and validate data in the repository and to index resources. This design choice allows applications to adapt seamlessly as the ontology evolves.

The eagle-i data collection tool
The eagle-i data collection tool is a web application for entering research resource descriptions. It produces granular, well-structured resources that include textual fields, annotations with ontology concepts and links to other resources in the system. The tool acts as a front end to an eagle-i repository and natively produces data in an ontology-conformant RDF format. In addition to its core resource creation functionality, it provides a workspace for updating and managing resources, with standard navigation and filtering capabilities. It also enforces a data curation workflow in which a resource passes through several data quality checks before being published in an eagle-i repository.

Ontology-centric design
The data collection tool dynamically creates the form for a resource type by inspecting the corresponding ontology class. For each ontology property defining the class, the tool generates one or more input fields with validations. The kind of input field generated and its restrictions are derived from OWL constructs. For example, datatype properties produce text fields and object properties produce controlled input fields. Further refinements of input fields are captured in an application ontology. Examples of such refinements include displaying an ontology subtype browser (populated with terms from a branch of the ontology) or an instance list (populated with resource instances from the repository). The contents of an instance list are obtained by issuing type-based queries to the repository, and the type to query for is likewise obtained from the ontology.
In addition to its forms, the data collection tool bases its navigational elements, such as menu items, resource type lists and property filters, on information contained in the eagle-i ontology and in the application ontology.

Challenges of developing an ontology-driven application
Modeling gap. The eagle-i ontology is a domain ontology: its modeling point of view is that of biology. An application, on the other hand, requires a model that specifies how to manipulate domain constructs and that drives the user interface. Domain ontology developers need to be able to model knowledge freely as it evolves, while application developers need to ensure robust software that does not break when the ontology changes.
Complexity. The eagle-i ontology is built to be interoperable and imports numerous terms from a variety of ontologies. While this is a desirable characteristic of an open ontology, it presents a challenge to web client software, where large amounts of memory and processing cycles are not available for ontology processing.
Our application ontology bridges the gap between the domain, concept-oriented model and the application, UI-oriented model. It annotates eagle-i ontology classes and properties with application-specific information and restrictions. It is maintained separately from the eagle-i ontology to ensure modularity.
The application ontology is used to build an in-memory model of the eagle-i ontology, restricted to the classes and properties that are necessary to drive applications (in essence, a valid closure of the domain ontology plus its annotations). This data model is shared by the different eagle-i applications and allows them to view the eagle-i ontology through the same common lens. An acceptance test alerts ontology developers to changes in the domain ontology that the application model does not support.

Ongoing work
The eagle-i data collection tool is in use by eagle-i staff and by pilot laboratory users. It is the primary means of entering and curating data in the eagle-i repositories. We are refining its design in response to user feedback and to insights gained during development. In particular, we are moving towards broader and more diversified use of application ontologies to improve the usability of the tool while preserving its ontology-driven nature. Indeed, a key factor in the development of the eagle-i system as a whole has been that the eagle-i ontology can evolve rapidly while applications follow without major coding changes.

NIH/NCRR ARRA award #U24RR029825
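The curation workflow mentioned under "The eagle-i data collection tool" can be pictured as a small state machine in which each transition is gated by data quality checks. The Python sketch below is illustrative only. The state names (DRAFT, CURATION, PUBLISHED), the allowed transitions, the `requires` helper and the gating checks are all hypothetical, because the abstract does not specify the actual workflow states or checks.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Set


class State(Enum):
    # Hypothetical workflow states; the source does not name them.
    DRAFT = "draft"
    CURATION = "curation"
    PUBLISHED = "published"


# Hypothetical allowed transitions between states.
TRANSITIONS: Dict[State, Set[State]] = {
    State.DRAFT: {State.CURATION},
    State.CURATION: {State.DRAFT, State.PUBLISHED},
    State.PUBLISHED: {State.CURATION},
}

# A check inspects a resource's fields and returns a list of problems.
Check = Callable[[Dict[str, str]], List[str]]


def requires(name: str) -> Check:
    """Build a check that fails when the named field is missing or empty."""
    def check(fields: Dict[str, str]) -> List[str]:
        return [] if fields.get(name) else [f"missing {name}"]
    return check


# Checks that gate entry into each target state (hypothetical).
GATES: Dict[State, List[Check]] = {
    State.CURATION: [requires("label")],
    State.PUBLISHED: [requires("label"), requires("type")],
}


@dataclass
class Resource:
    fields: Dict[str, str]
    state: State = State.DRAFT


def transition(resource: Resource, target: State) -> List[str]:
    """Move the resource to target if the move is allowed and all gating checks pass.

    Returns the problems found; the state changes only when that list is empty.
    """
    if target not in TRANSITIONS[resource.state]:
        raise ValueError(f"cannot go from {resource.state.value} to {target.value}")
    problems = [p for check in GATES.get(target, []) for p in check(resource.fields)]
    if not problems:
        resource.state = target
    return problems
```

For example, a resource with a label but no type can enter curation, but it stays there until a type is filled in. Keeping the checks in a table rather than hard-coding them is one way such rules could later be driven by configuration or by an application ontology.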
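The form-generation scheme described under "Ontology-centric design" amounts to a mapping from OWL property kinds, plus optional application-ontology hints, to input widgets. This minimal sketch is not the eagle-i implementation. `PropertyDef`, `FieldSpec`, the `ui_hint` values `"subtype_browser"` and `"instance_list"`, the validation patterns and the example.org IRIs are hypothetical stand-ins. Each property yields exactly one field here, whereas the tool may generate several.

```python
from dataclasses import dataclass
from typing import List, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical validation patterns keyed by the XSD range of a datatype property.
PATTERNS = {
    XSD + "integer": r"^-?\d+$",
    XSD + "anyURI": r"^https?://\S+$",
}


@dataclass(frozen=True)
class PropertyDef:
    """One property defining an ontology class, as read from OWL."""
    iri: str
    kind: str  # "datatype" (owl:DatatypeProperty) or "object" (owl:ObjectProperty)
    range_iri: str
    ui_hint: Optional[str] = None  # hypothetical application-ontology annotation


@dataclass(frozen=True)
class FieldSpec:
    property_iri: str
    widget: str
    validation: Optional[str] = None  # regex applied to text fields, if any
    query: Optional[str] = None       # SPARQL query that populates an instance list


def instance_query(range_iri: str) -> str:
    """Type-based query listing repository instances of the property's range class."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        f"SELECT ?instance ?label WHERE {{ ?instance a <{range_iri}> ; rdfs:label ?label . }}"
    )


def field_for(prop: PropertyDef) -> FieldSpec:
    """Map an OWL property (and its optional UI hint) to an input field."""
    if prop.kind == "datatype":
        return FieldSpec(prop.iri, "text", validation=PATTERNS.get(prop.range_iri))
    if prop.kind == "object":
        if prop.ui_hint == "subtype_browser":
            # Browser over the ontology branch rooted at the range class.
            return FieldSpec(prop.iri, "subtype_browser")
        if prop.ui_hint == "instance_list":
            return FieldSpec(prop.iri, "instance_list", query=instance_query(prop.range_iri))
        return FieldSpec(prop.iri, "controlled_term")
    raise ValueError(f"unsupported property kind: {prop.kind}")


def build_form(props: List[PropertyDef]) -> List[FieldSpec]:
    """Generate one field per property of the class, in the order given."""
    return [field_for(p) for p in props]
```

Because the widget choice depends only on the property kind and the hint, adding a property to the ontology class adds a field to the form without any change to this code. That is the behavior the ontology-centric design aims for.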
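The restricted in-memory model and the acceptance test described under "Challenges of developing an ontology-driven application" can be approximated with plain dictionaries. In this hypothetical sketch:

- The domain ontology is reduced to a map from each class to its properties' (kind, range) pairs.
- The application ontology is reduced to UI hints keyed by (class, property).
- The "closure" simply follows object-property ranges from the annotated classes.

A real implementation would read OWL and handle imports, subclassing and restrictions.

```python
from typing import Dict, List, Set, Tuple

# Simplified domain ontology: class IRI -> {property IRI: (kind, range IRI)},
# where kind is "datatype" or "object".
Domain = Dict[str, Dict[str, Tuple[str, str]]]
# Simplified application ontology: (class IRI, property IRI) -> UI hint.
Annotations = Dict[Tuple[str, str], str]

# Hypothetical hints that only make sense on object properties.
OBJECT_HINTS = {"subtype_browser", "instance_list"}


def closure(domain: Domain, roots: Set[str]) -> Set[str]:
    """Classes reachable from the root classes through object-property ranges."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        cls = stack.pop()
        if cls in seen or cls not in domain:
            continue  # already visited, or not defined in the domain model
        seen.add(cls)
        for kind, rng in domain[cls].values():
            if kind == "object":
                stack.append(rng)
    return seen


def acceptance_problems(domain: Domain, annotations: Annotations) -> List[str]:
    """Report domain-ontology changes that the application annotations cannot support."""
    problems: List[str] = []
    for (cls, prop), hint in sorted(annotations.items()):
        if cls not in domain:
            problems.append(f"annotated class missing: {cls}")
        elif prop not in domain[cls]:
            problems.append(f"annotated property missing: {cls} {prop}")
        elif hint in OBJECT_HINTS and domain[cls][prop][0] != "object":
            problems.append(f"{hint} requires an object property: {prop}")
    return problems
```

Running a check like `acceptance_problems` whenever the domain ontology changes, for instance in a build job, is one way to surface unsupported changes to ontology developers before they reach the applications.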