Transforming Knowledge to Integrate Software Applications in Clinical Research

    Ravi D. Shankar1, Martin J. O’Connor1, Samson W. Tu1, David B. Parrish2, Amar K. Das1
     1 Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA
     2 The Immune Tolerance Network, Pittsburgh, PA

        Abstract. Personnel involved with conducting complex clinical research employ
        a myriad of software applications that meet the demands of managing the
        research. In the domain of clinical trials that are performed as part of research,
        software applications vary widely in complexity, are generally autonomous, use
        very distinct trial knowledge representations, and generate enormous amounts of
        data during the course of the research. Integration of the varied applications to
        share the domain semantics becomes vital to improve the efficiencies of trial
        data collection and to ensure the quality of collected data. We have built Epoch,
        a knowledge-base framework to support the management of clinical trials. We
        developed a set of ontologies that serves as a central knowledge resource of
        clinical trial knowledge. We present knowledge transformation methods that we
        have developed to extract trial-specific configurations for each of the myriad
        trial-management and data-analytic applications that are used in a research
        enterprise. We have adapted our methods for the Immune Tolerance Network, an
        international collaboration of scientists and clinicians studying immune-
        mediated diseases. Our initiative uses semantic technologies to provide a
        consistent basis for software applications to generate and analyze clinical
        research data.

1       The need for consistent, sharable domain semantics

The past several years have witnessed the nature of biomedical research change
rapidly. We have seen the advent of high-throughput genomic technologies,
generation of huge data sets, and the need for people and computers to engage
synergistically in undertaking large-scale experiments. For example, clinical trials—
controlled, systematic studies that are the gold standard for deciding whether new or
unproven medical interventions are efficacious and safe—involve, in particular, a
myriad of software applications for planning the study, collecting the data, and
analyzing the results. Operational plan builders, study site management tools,
participant and specimen tracking applications, and statistical software packages are
examples of applications that are needed to meet the functional requirements of the
clinical research enterprise. These applications vary widely in complexity, and are
generally autonomous. They use very distinct forms for encoding trial domain
semantics, which are typically in proprietary information representations and often
generated manually based on information derived from documents, spreadsheets, and
emails. The ad hoc interpretation of the domain semantics can lead to variances in
how the trial plan is understood, which in turn generates inconsistencies in how the
trial is executed. As a result, considerable time and effort may be needed in
validating the collected data before data analysis can begin and before the findings
from the study are known. There is a pressing demand to ensure that the applications
that research personnel use for study management are based on timely, consistent
domain semantics.

We have addressed this challenge by developing methods that support semantic
integration of existing software applications, including databases. In particular, we
use a set of ontologies to provide a central knowledge resource of clinical trial
knowledge, and we have built an ontology-driven architecture that supports consistent
semantics among trial management applications. The ontologies define the
vocabulary and semantics necessary for formal representation of the clinical trial
domain that is relevant to these applications. They provide formal consistency and
correctness in specifying a clinical trial and in acquiring and analyzing its data. In this
paper, we present a set of knowledge transformation methods that facilitate extracting
trial-specific configurations from the ontologies for a myriad of trial-management

2    A knowledge-based architecture using Semantic Web

The core knowledge of a specific clinical trial is in that trial’s protocol document as
authored by clinical investigators and other research personnel. The document
contains the reason for undertaking the trial, the number of participants that will be in
the trial and the recruitment process, the sites where the trial will be conducted, the
trial drug that the participants will take, the medical tests that the participants will
undergo, the data that will be collected, and the statistical analyses that will be
performed on the data. We have developed a knowledge-based architecture that stores
the protocol information in a formal knowledge base and enables the dissemination of
the clinical trial knowledge to software applications for use. Ontologies play an
essential role in enabling domain specialists to describe formally the concepts in their
domain and the relationships among these concepts. Central to our architecture is
Epoch1, a suite of ontologies that formally represents protocol entities relevant to the
clinical trials management applications that we are supporting—schedule of trial
activities, specimen collection and processing workflow, organizational detail
including the clinical sites and clinical personnel, and temporal and non-temporal
constraints found in these entities. The Epoch ontologies thus provide a common
nomenclature and semantics required to support an integrated and consistent clinical
trials management.

We have developed our knowledge-based framework using different semantic
technologies (Figure 1). We built the Epoch ontologies in OWL, a W3C-standard
ontology language for use in the Semantic Web, where machines can provide enhanced
services by reasoning with facts
and definitions expressed in OWL. The primitive entities in an OWL ontology are
classes, properties and individuals. Classes are interpreted as sets of objects that
represent specific instances or individuals in the domain of discourse. Properties are
binary relationships that link individuals. We have built hierarchies of classes
representing the concepts in the clinical trial domain. SWRL, the Semantic Web Rule
Language, is a rule language published as a W3C Member Submission that can be used to
express rules in terms of
OWL concepts and to reason about OWL individuals. We have used SWRL to
specify constraints in our ontologies. We also use SWRL extensively to specify
model-model mappings in our semantic integration methods. Protégé-OWL2 is a
software package that supports the specification and
maintenance of OWL knowledge bases. Protégé-OWL has several software plug-ins
including an OWL editor and a SWRL editor, and also provides an OWL API to
programmatically create OWL entities and query for them. We used Protégé’s OWL
editor to create descriptions of the classes and properties in the Epoch ontologies. We
have built TrialWiz, a clinical trial authoring application that trialists can use to
encode specific clinical trials.
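
The OWL primitives described above can be illustrated with a minimal Python sketch. It is a toy stand-in and does not use OWL or the Protégé-OWL API; the class names (Visit, Specimen), property names (collectsSpecimen, hasStudyDay), and individuals are hypothetical and are not taken from the Epoch ontologies.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Toy stand-in for an OWL knowledge base (not the Protege-OWL API)."""
    # class name -> set of individual names (a class is interpreted as a set)
    classes: dict = field(default_factory=dict)
    # (subject, property, value) triples (a property is a binary relation)
    triples: set = field(default_factory=set)

    def add_individual(self, cls, name):
        self.classes.setdefault(cls, set()).add(name)

    def assert_property(self, subject, prop, value):
        self.triples.add((subject, prop, value))

    def individuals_of(self, cls):
        return sorted(self.classes.get(cls, set()))

    def values(self, subject, prop):
        return sorted(v for s, p, v in self.triples if s == subject and p == prop)


def build_example_kb():
    """Encode a tiny, made-up fragment of a trial schedule."""
    kb = KnowledgeBase()
    kb.add_individual("Visit", "screening_visit")
    kb.add_individual("Visit", "week4_visit")
    kb.add_individual("Specimen", "whole_blood")
    kb.assert_property("week4_visit", "collectsSpecimen", "whole_blood")
    kb.assert_property("week4_visit", "hasStudyDay", 28)
    return kb
```

Querying individuals_of("Visit") returns the members of the class, and values("week4_visit", "collectsSpecimen") follows a property from one individual to another.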

3    Methods for semantic integration through transformation of
     clinical trial knowledge

We have developed semantic integration methods that enable uniform access of
clinical trial semantics stored in the Epoch knowledge base by trial management
applications. These applications consume shared semantics of the clinical trials, but
do not necessarily have similar representation formalisms. They are built by different
software vendors using proprietary information representations and with different
application interface requirements. We have developed four broad-based methods to
transform knowledge modeled using Semantic Web standards and technologies to the
various specification formats used by clinical trial applications: (1) OWL ontology to
XML document transformation; (2) OWL ontology to relational data transformation;
(3) OWL ontology to OWL ontology transformation; and (4) OWL ontology to Java
objects transformation. The first three methods target applications that were built
independently of our knowledge-base architecture but that still need to leverage the
encoded knowledge stored in the Epoch knowledge base.

1. Transformation to Configure XML Documents. Clinical trial applications may
require initial configuration information to be in custom XML formats. The
configuration information is largely contained in the Epoch knowledge base. Thus
there is a need to transform the necessary entities of the Epoch knowledge base into
the target XML format in order to generate the configuration XML documents. In our
research group, we have developed XMLMapper3, a Semantic Web tool to transform
parts of an OWL ontology to an XML document. The tool uses an OWL XML
ontology that contains classes that represent standard XML constructs, such as
document, element, attributes, and namespaces. Instances of these classes are used to
represent both the structure and content of an XML document. Based on the schema
of the application’s XML configuration document, we create a set of SWRL rules that
map the Epoch ontology to the OWL XML ontology. When the rules are executed
against a specific clinical trial encoding in the Epoch knowledge base, they create
OWL individuals of the classes of the OWL XML ontology that XMLMapper can then use
to generate the application’s configuration document in XML format (Figure 2).
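
The two-stage shape of this method (map to an intermediate representation of XML constructs, then serialize) can be sketched in Python. This is an illustration under stated assumptions, not XMLMapper itself: a plain function stands in for the SWRL mapping rules, a small dataclass stands in for individuals of the OWL XML ontology, and the element and attribute names (TrialConfiguration, Visit, id, name, day) are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class XMLElementIndividual:
    """Stand-in for an individual of an 'element' class in an OWL XML ontology."""
    name: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def map_trial_to_xml_individuals(trial):
    """Plays the role of the mapping rules: trial encoding -> XML-construct individuals."""
    root = XMLElementIndividual("TrialConfiguration", {"id": trial["id"]})
    for visit in trial["visits"]:
        root.children.append(
            XMLElementIndividual(
                "Visit", {"name": visit["name"], "day": str(visit["day"])}
            )
        )
    return root


def serialize(individual):
    """Plays the role of the serializer: XML-construct individuals -> XML elements."""
    element = ET.Element(individual.name, individual.attributes)
    for child in individual.children:
        element.append(serialize(child))
    return element


def generate_configuration(trial):
    """Run both stages and return the configuration document as a string."""
    root = serialize(map_trial_to_xml_individuals(trial))
    return ET.tostring(root, encoding="unicode")
```

Keeping the mapping separate from serialization means that supporting a new target format only requires a new mapping; the serializer is unchanged.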

2. Transformation to Configure Relational Tables. Trial specifications of clinical
trial applications may be stored in relational tables in a database. Thus there is a need
to export portions of the Epoch clinical trial knowledge base to a database. We have
devised two different ways of performing knowledge transformation to relational
tables. In the first method (Figure 3), we have created an XML schema that
corresponds to the schema of the target relational tables. Similar to the method
described in the previous section, we create a set of SWRL rules that map the Epoch
ontology to the XML schema. Using these rules, we can generate XML documents
that contain clinical trial plan specifications. There are widely available tools that
parse an XML document and, based on mappings between XML schema and
relational table schema, populate the target tables with the contents of the XML
document. We have used one such tool that is bundled with Microsoft’s SQL Server
to populate relational tables with clinical trial specifications found in the XML
documents that we generated. In the second method, we employ an OWL-to-relational
mapping tool called DataMaster4 that creates a knowledge-level description of a
relational schema in the form of an OWL schema ontology. Essentially, the tool
creates an OWL class for each relational table and OWL properties for each column
in a relational table. Thus, OWL individuals of a class in the schema ontology map to
rows of the corresponding relational table. Using DataMaster, we create a schema
ontology corresponding to the application’s trial configuration relational tables. We
then build a set of SWRL rules that map Epoch clinical trial ontology to the schema
ontology. When transforming clinical trial specifics from the Epoch knowledge base
to the relational tables, we first run the SWRL rules against the knowledge base to
create OWL individuals of the classes in the schema ontology. From the content of
the generated OWL individuals of a particular OWL class in the schema ontology, we
insert rows into the relational table that corresponds to that OWL class.
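
The second method can be sketched as follows. The sketch assumes an in-memory SQLite database in place of the application's actual database, a dictionary in place of the schema ontology that DataMaster would produce, and a plain function in place of the SWRL mapping rules; the table and column names (visit, visit_name, study_day) are hypothetical.

```python
import sqlite3

# Hypothetical schema ontology: one "class" per table, one "property" per column.
SCHEMA_ONTOLOGY = {"visit": ["visit_name", "study_day"]}


def map_to_schema_individuals(trial):
    """Stands in for the mapping rules: trial visits -> schema-ontology individuals."""
    return {
        "visit": [
            {"visit_name": v["name"], "study_day": v["day"]} for v in trial["visits"]
        ]
    }


def insert_individuals(connection, individuals):
    """Insert one row per individual into the table that matches its class."""
    for table, columns in SCHEMA_ONTOLOGY.items():
        placeholders = ", ".join("?" for _ in columns)
        sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
        rows = [tuple(ind[c] for c in columns) for ind in individuals.get(table, [])]
        connection.executemany(sql, rows)
    connection.commit()


def export_trial(trial):
    """Create the target table, run the mapping, and load the resulting rows."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE visit (visit_name TEXT, study_day INTEGER)")
    insert_individuals(connection, map_to_schema_individuals(trial))
    return connection
```

Because table and column names come from the schema description rather than from the trial data, only the values are passed as query parameters.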

3. Transformation to Configure OWL Ontology and Other Domain Models. There
are other ongoing projects, similar to our efforts with Epoch, to formally model the
domain semantics of clinical trials research. These models are constructed using
different languages such as UML, XML and OWL. In the area of cancer research and
drug development, for example, large communities of software developers are
creating clinical trial applications tailored to these domain models. Clinical trial
organizations using the Epoch knowledge-base architecture may also want to use
these software applications. In order to support this intention, we need to transform
the Epoch knowledge base to a non-Epoch knowledge base that can be then consumed
by the non-Epoch-compliant applications. In order to circumvent representation
language mismatches between, say, OWL and UML, we create an OWL version of
portions of the non-Epoch model relevant to the software application that we intend to
use. Based on our analysis of the semantic alignment of the Epoch and non-Epoch
models, we build a set of SWRL rules that map the two models at the levels of classes
and properties. Using this mapping, a clinical trial encoding can be transformed from
the Epoch knowledge base to the non-Epoch knowledge base. We then build another
set of SWRL rules that map the non-Epoch OWL model to the software application’s
configuration format. Using these rules we can generate a configuration file for a
specific clinical trial in the non-Epoch OWL knowledge base. Thus, we can use the
two sets of mapping rules in sequence to transform the clinical trial knowledge in the
Epoch knowledge base to a non-Epoch configuration specification via an ontology-to-
ontology mapping (Figure 4).
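
Chaining the two rule sets can be sketched in Python. The alignment tables and all class and property names here (Visit, ScheduledActivity, hasName, label, hasStudyDay, plannedDay) are hypothetical and only illustrate the idea of a class- and property-level mapping followed by a model-to-configuration mapping; dictionaries and functions stand in for the OWL models and SWRL rules.

```python
# Hypothetical class- and property-level alignment between the two models.
CLASS_MAP = {"Visit": "ScheduledActivity"}
PROPERTY_MAP = {"hasName": "label", "hasStudyDay": "plannedDay"}


def epoch_to_target(individuals):
    """First rule set: re-express source individuals in the target model's terms."""
    translated = []
    for ind in individuals:
        if ind["class"] not in CLASS_MAP:
            continue  # concept with no counterpart in the target model
        props = {
            PROPERTY_MAP[p]: v
            for p, v in ind["properties"].items()
            if p in PROPERTY_MAP
        }
        translated.append({"class": CLASS_MAP[ind["class"]], "properties": props})
    return translated


def target_to_configuration(individuals):
    """Second rule set: target-model individuals -> application configuration entries."""
    return [{"type": ind["class"], **ind["properties"]} for ind in individuals]


def transform(individuals):
    """Apply the two mappings in sequence."""
    return target_to_configuration(epoch_to_target(individuals))
```

Source concepts without a counterpart in the target model are dropped in this sketch; a real alignment would need an explicit decision for each such concept.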

4. Transformation to Java Objects. With the creation of our knowledge-based
architecture, we provide a software development environment for building new
clinical trial applications. For example, TrialWiz, a clinical trial-authoring tool, has
been built to populate parts of the Epoch knowledge base, and PVT, a participant-
tracking application, has been developed to use encoded protocol knowledge to verify
constraints among visits. The newer web-based or desktop applications access the
Epoch knowledge base using a set of knowledge base services developed with the
Java programming language technology. Tight alignment of the logical semantics of
the Epoch ontologies (in OWL) and the implementation space (in Java) of the clinical
trial applications requires transformation of clinical trial knowledge in the Epoch
knowledge base to corresponding Java objects. Knowledge base services that are
layered over the Java objects will then be consistent with the clinical trial domain
semantics captured in the Epoch ontologies. We use Protégé-OWL’s facility to
generate Java code, i.e. a Java class for each OWL class in the Epoch ontology and
Java methods of the Java class for OWL properties with the domain of that OWL
class. The code generation facility also creates factory methods to manage the loading
and saving of Epoch clinical trial knowledge bases. Thus, opening a specific clinical
trial knowledge base automatically transforms the ontological content into Java
objects, i.e., OWL individuals of the Epoch ontology classes are transformed to
instances of the corresponding Java classes. The knowledge-base services can then
serve the content of Epoch knowledge base via the Java objects to a requesting
clinical trial application (Figure 5).
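
The pattern of one generated class per ontology class, a factory that loads the knowledge base, and services layered over the resulting objects can be sketched as follows. The sketch is written in Python rather than Java and is not the code that Protégé-OWL generates; the classes (Visit, Protocol), the property names (hasName, hasStudyDay, hasTitle), and the dictionary used as a knowledge base are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """Counterpart of a hypothetical ontology class 'Visit'."""
    name: str
    study_day: int


@dataclass
class Protocol:
    """Counterpart of a hypothetical ontology class 'Protocol'."""
    title: str
    visits: list


class EpochFactory:
    """Loads a knowledge base (here a plain dict) and materializes typed objects."""

    def __init__(self, knowledge_base):
        self._kb = knowledge_base

    def get_protocol(self):
        visits = [Visit(v["hasName"], v["hasStudyDay"]) for v in self._kb["visits"]]
        return Protocol(self._kb["hasTitle"], visits)


def visits_in_order(protocol):
    """A 'knowledge base service' layered over the typed objects."""
    return [v.name for v in sorted(protocol.visits, key=lambda v: v.study_day)]
```

Applications call services such as visits_in_order and never touch the underlying representation, which is what keeps them consistent with the ontology when it changes.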

4    Use of semantic integration methods at The Immune Tolerance
     Network: a Case Study

We have tested and validated our methods within the knowledge-based architecture at
the Immune Tolerance Network (ITN) 5, an NIH-funded collaboration of scientists
and clinicians developing new therapies for immune-mediated disorders through
clinical trials and integrated mechanistic (biological) studies. The ITN is involved in
planning, developing and conducting clinical trials in autoimmune diseases, islet,
kidney and liver transplantation, allergy and asthma, and operates a dozen core
facilities that conduct bioassay services. At the ITN, the successful conduct of a
clinical trial depends upon the interaction of professionals working for various entities
including the ITN, contract research organizations, clinical study sites, and core
laboratories. Several groups within these entities collaborate in facilitating the
specification and implementation of the trials and related biological assay studies
(Figure 6). The lifecycle management of a complex clinical trial typically involves
multiple applications facilitating activities such as trial design specification, clinical
sites management, laboratory management, and participants tracking. These disparate
applications are banded together into a clinical trial management system. The
information generated by these applications along with data from loosely controlled
sources such as spreadsheets, documents and email messages are then assembled to
determine the operational state of the clinical trial. The lack of common nomenclature
among the different sources of the tracking information and the unreliable nature of
the data generation can lead to significant operational and maintenance challenges.
We adapted the Epoch knowledge base framework and the methods that we described
in the previous section to support the semantic integration of four different clinical
trial applications.

•   TrialWiz. The Protocol Group generates a document that includes the trial design,
    the implementation protocol, the measured outcomes, and other relevant clinical
    knowledge. We built TrialWiz, an authoring application for ITN clinical trials that
    the protocol group can use to populate the Epoch knowledge base with the
    encoding of a clinical trial.

•   ImmunoTrak. ITN has contracted with Cimarron Software, Inc., to build a
    specimen workflow system called
    ImmunoTrak based on Cimarron’s Laboratory Workflow Systems product.
    Clinical trial personnel at the sites will use the system to log participants’ visits,
    specimen collection, shipping and receiving of bar-coded specimen containers,
    etc. ImmunoTrak can be configured using an XML specification that includes the
    participant visit flow, the specimen container specification, list of participants,
    list of clinical and laboratory sites, and specimen workflow.

•   DataViz. Legacy data reporting applications of ITN, such as DataViz, present data
    collected during the conduct of a clinical trial in the context of the trial’s plan.
    The trial specifications that these applications use are stored in relational tables in
    a database.

•   Patient Study Calendar. The BRIDG5 project is a collaborative project between
    the National Cancer Institute, the Clinical Data Interchange Standards
    Consortium, and HL7 to develop a comprehensive domain analysis model that
    describes the fundamental semantics of clinical trials research. BRIDG-compliant
    applications such as Patient Study Calendar, Adverse Event Reporting and
    Participant Registry Management are being developed to support clinical trial
    management. The Immune Tolerance Network intends to use BRIDG-compliant
    applications to support ITN activities; that is, ITN clinical trials encoded as Epoch
    knowledge bases will drive BRIDG-compliant applications. In order to achieve
    this, we need to transform the Epoch knowledge base to a BRIDG knowledge
    base that can be then consumed by the BRIDG-compliant applications. We
    demonstrated such a knowledge transformation with the Patient Study Calendar
    (PSC) application. The BRIDG model is specified in UML, and the
    implementation model of PSC is based on a domain analysis model which has
    been harmonized with the BRIDG model. PSC can be configured with specific
    clinical trial knowledge using an XML specification whose schema is BRIDG-

The four clinical trial software applications—TrialWiz, ImmunoTrak, DataViz, and
Patient Study Calendar—use different internal formalisms to represent relevant
clinical trial semantics, and we were able to successfully generate their configuration
specifications from the contents of the Epoch knowledge base. We thus demonstrate
that common clinical trial semantics and our transformation methods can semantically
integrate disparate applications into a knowledge-base architecture for the clinical
research enterprise.

5    Discussion

In this paper, we present novel methods to transform encoded trial knowledge into
specifications that can tailor a host of trial applications. Even though the applications
use proprietary formalisms, we can model their shared domain semantics in a central
knowledge base built using Semantic Web standards and technologies. Our
knowledge-driven transformation methods allow various applications for clinical trial
management to have common, consistent, and up-to-date specifications of a clinical
trial protocol. The data generated by these applications are semantically integrated
because of shared specifications. For example, a researcher monitoring the
progress of a clinical trial can determine the number of samples of blood collected
and analyzed for participants enrolled at one study site, because each of the
applications capturing such information is configured to the same trial design with
common terms and workflow knowledge.

We are expanding our knowledge-based architecture with ontology-database mapping
methods that can integrate domain knowledge with ITN’s data repository and can
allow ad hoc querying that no single clinical trial application can provide. The
repository is a relational database system that stores data related to the
implementation and execution of clinical trials. The types of data include participant
enrollment data, specimen shipping and receiving logs, participant visits and activities
records, and clinical assessment and assay results. We are using DataMaster to map
the relational tables to concepts in Epoch’s virtual trial data ontology. We extended
our existing query engine to interact with the mapping software to dynamically
retrieve relational data into the OWL environment, thus enabling knowledge-driven
querying of clinical trial data6.

Our research efforts on a knowledge-based framework for clinical trial management
demonstrate novel applications of Semantic Web standards and technologies. We
used OWL to specify the ontologies, and SWRL rules written in terms of concepts in
these ontologies to express any constraints. With the knowledge transformation
methods where we generate XML renditions of the knowledge base or where we map
the Epoch ontologies to other clinical trial models, it is not easy to export the
semantics of the constraints. We are currently working on a declarative rules
framework7 wherein constraints are specified using high level constructs in the
constraints expression ontology. The constructs and their attributes can then be
“assembled” as SWRL rules at a later implementation stage. Then the knowledge
transformation methods can use the constructs to effectively share the semantics of
the rules among software applications.
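
The idea of keeping a constraint as a high-level construct and assembling rule text from it later can be sketched in Python. The construct format ("max_gap" with earlier, later, and max_days fields) and the class and property names (Visit, hasName, hasStudyDay, ConstraintViolation) are hypothetical and are not taken from the constraints expression ontology; swrlb:subtract and swrlb:greaterThan are SWRL built-ins.

```python
def assemble_swrl(construct):
    """Assemble a 'maximum gap between two visits' construct into SWRL-style rule text."""
    if construct["kind"] != "max_gap":
        raise ValueError(f"unsupported construct: {construct['kind']}")
    earlier = construct["earlier"]
    later = construct["later"]
    max_days = construct["max_days"]
    body = [
        "Visit(?a)",
        f'hasName(?a, "{earlier}")',
        "hasStudyDay(?a, ?da)",
        "Visit(?b)",
        f'hasName(?b, "{later}")',
        "hasStudyDay(?b, ?db)",
        "swrlb:subtract(?gap, ?db, ?da)",  # ?gap = ?db - ?da
        f"swrlb:greaterThan(?gap, {max_days})",
    ]
    # The rule fires when the gap exceeds the allowed maximum.
    return " ^ ".join(body) + " -> ConstraintViolation(?b)"


def describe(construct):
    """Render the same construct for a consumer that cannot execute rules."""
    return (
        f"{construct['later']} must occur within "
        f"{construct['max_days']} days after {construct['earlier']}"
    )
```

Because both renderings come from the same construct, an application that cannot run SWRL can still receive the constraint in a form it understands.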

There are several efforts8,9 from the Semantic Web community that propose similar
ontology-based architectures to integrate distributed information resources. In this
paper, we show that semantic integration methods built using the Semantic Web
approach have applicability in the modular, distributed and ad hoc architectures used
within the clinical research enterprise. Using our approach, scientists and developers
can integrate existing software applications, including databases, at a semantic level
so as to improve clarity, consistency and correctness in specifying clinical studies,
and in acquiring and analyzing their data.

Acknowledgements. This work was supported in part by the Immune Tolerance
Network, which is funded by the National Institutes of Health under Grant NO1-AI-


1. Shankar, R.D., Martins, S. B., O'Connor, M. J., Parrish, D. B., Das, A.K. An
   Ontology-based Architecture for Integration of Clinical Trials Management
   Applications. AMIA Annual Symposium, Chicago, IL. Published (2007)
2. Knublauch, H. Fergerson, R.W., Noy, N.F. and Musen, M.A. The Protégé OWL
   Plugin: An Open Development Environment for Semantic Web applications Proc
   Third ISWC (ISWC 2004), Hiroshima, Japan, 229-243 (2004)
3. XMLMapper (2008). SWRLTab XMLMapper Tool. Retrieved April 28, 2008
4. DataMaster (2008). DataMaster. Retrieved April 28, 2008 from
5. Rotrosen, D., Matthews, J.B., Bluestone, J.A. The Immune Tolerance Network: a
   New Paradigm for Developing Tolerance-Inducing Therapies. J Allergy Clinical
   Immunology, Jul;110(1):17-23 (2002)
6. O'Connor, M.J., Shankar, R.D., Tu, S.W., Nyulas, C, Parrish, D.B., Musen, M.A.,
   Das, A.K. Using Semantic Web Technologies for Knowledge-Driven Querying of
   Biomedical Data. 11th Conference on Artificial Intelligence in Medicine (AIME
   07), Amsterdam, Netherlands. (2007)
7. Shankar, R.D., Martins, S.B., O’Connor, M. J., Parrish, D.B., Das, A. K. An
   Ontological approach to representing and reasoning with temporal constraints in
   clinical trial protocols. Proceedings of the International Conference on Health
   Informatics. (2008)
8. Vdovjak, R., Houben, G. RDF based architecture for semantic integration of
   heterogeneous information sources. International Workshop on Information
   Integration on the Web. (2001)
9. Zhang, L., Gu, J. Ontology based semantic mapping architecture. Proceedings of
   the Fourth International Conference on Machine Learning and Cybernetics (2005).

  Figure 1 The Epoch knowledge-based architecture, which integrates relational
data and software applications using the shared semantics of the clinical trial
domain, which is modeled in an OWL knowledge base
  Figure 2 A method to transform knowledge in the Epoch knowledge base to
an XML document that can configure a clinical trial application to a specific

  Figure 3 A method to transform knowledge in the Epoch knowledge base to
an application configuration stored in relational tables
  Figure 4 A method to transform knowledge in the Epoch knowledge base to
another domain model, which is then converted to an XML configuration

   Figure 5 A method to transform knowledge in the Epoch knowledge base
to Java objects, which can be accessed by applications via knowledge base
   Figure 6 Interactions between study personnel and the software applications
they use to author and manage clinical trials (CRO = Clinical Research
