									PROREC Italia                                        Progetto MobiDis                                                     HL7 Italia

     An Italian registry for clinical data sets
              Angelo Rossi Mori, Istituto Tecnologie Biomediche, CNR
                                       Co-chair of HL7 Templates SIG
                                         President of PROREC Italia
                             with the assistance of Eulogos S.p.A., Roma

                                                          2003-03-31 v06

     We present here a phased process to create an Italian registry for clinical data sets.
     A prototype of the registry, accessible through the web, will be implemented within
     the MobiDis project, and an initial population is being arranged through the Italian
     PROREC Centre and the Italian HL7 affiliate.
     In the current experimental phase, we focus on the collection of packages from
     heterogeneous sources, on the characterization of the different kinds of data sets,
     and on their systematic representation. We want to explore the functionalities of the
     registry and the opportunities offered to developers of applications, to organizations
     developing and maintaining the data sets, and to healthcare professionals using them.
     We provide a conceptual model for the simplified prototype of the registry.
     Our activity could be extended to the other PROREC Centres in Europe (in
     cooperation with the EUROREC Institute) and could facilitate the input to the
     Templates registry under development by HL7.


1.    What we mean by Clinical Data Set
2.    Usages of the definitions of clinical data sets
3.    Responsible organizations and packages
4.    Phases of the project for the implementation of the registry
5.    Co-existence of different languages within the registry
6.    Names of the components of data sets
7.    The conceptual model of the simplified prototype of the registry
8.    Collaborations and grants
9.    Acronyms

    1. What we mean by Clinical Data Set
A clinical data set is a set of entries that are stored, shared or presented as a unit within clinical
applications, messages and electronic health records.
An entry may be in turn a data set.

Examples of data sets are:
   - a battery of laboratory tests; [1] [2]
   - the set of details to be observed to describe a finding, a health issue or a procedure
      (e.g., for a finding: localisation, onset, alleviating and exacerbating factors, frequency,
      etc.); [3]
   - a checklist of parameters to be observed to systematically describe a signal or an
      image; [4]
   - a series of observations or questions that are needed to calculate a score, a scale or a
      questionnaire (e.g. the Apgar score);
   - the sections and subsections suggested to systematically organize a particular type of
      clinical document (e.g. a discharge summary or a surgical report). [5]

[1] See for example the LOINC nomenclature.
[2] Some attributes referring to all the entries may be factorised (for example, the information on the
sample, or the date of the test, or the kind of device, or the reimbursement fee). Therefore there is a need
for predefined algorithms to assign the factorised information to each entry in the data set in a
systematic way. Those algorithms may be defined for classes of homogeneous data sets (e.g. for all data
sets where the device type is factorised).
[3] This kind of "terminological" data set usually refers to a unique attribute in the conceptual model.
Often data are aggregated into a compositional code, and there are many ways to precoordinate and
postcoordinate several atomic concepts into one or more complex concepts, with the consequent problem
of finding a canonical form and comparing expressions coming from different systems.
[4] See for example the DICOM templates in supplement 16 (context groups) for the production of
Structured Reports.
[5] Sections are different from the previous artefacts, because they are "organizers", i.e. elements used by
healthcare operators to organize information, but not intended to create a computer-processable context
for the entries that are considered under them. In other words, each entry located under an organizer
should carry its own explicit, complete and unambiguous meaning even when it is extracted from its
original context (e.g. a family history item should carry an explicit context modifier so that it remains fully
defined when it is taken out of its original section).
In the previous cases, the set of mandatory or optional entries that must be considered as
constituents is fully predefined; on the contrary, the content of organizers typically cannot be controlled a
priori, but actual instances are decided each time by the healthcare professional.

Clinical data sets are the most relevant types of Templates in HL7 and Archetypes in OpenEHR.
In particular, HL7 templates are defined as any constraint applied to any balloted HL7 artefact;
analogously, OpenEHR archetypes are a very general constraining construct, complementary to
the OpenEHR architectural model.
The current approach to standardization in HL7 and CEN/TC251 (and OpenEHR) separates two
kinds of agreements:
   - on one side the "structural" standards (RIM, CDA, revision of ENV 13606), balloted by
      the standard developing organizations and independent from the detailed clinical
      content;
   - on the other side the description of the detailed clinical content (including clinical data
      sets), which is managed and registered under the responsibility of external organizations,
      according to procedures defined by the standard developing organizations.

Therefore, according to this approach, there are three layers of agreements:
   1. the first layer provides widely diffused ICT standards, not specific to healthcare. They
       include the first 6 levels in ISO-OSI, the security standards, the basic e-mail standards,
       the XML family standards by W3C, etc. These standards assure a technical
       interoperability across applications, at the basis of most e-government action plans.
   2. the second layer provides standards conceived for the healthcare world. They realize the
       cooperation between the healthcare applications, according to the 7th level of ISO-OSI.
       The most relevant organizations in this field are HL7, ISO/TC215 and CEN/TC251,
       together with DICOM for the imaging field.
   3. the third layer provides agreements on contents (a kind of "soft standards"). They are
       developed and maintained by organizations external to standardization bodies, e.g.
       institutional agencies and medical societies. Most of these agreements, namely the
       clinical data sets, allow healthcare operators to perform specific tasks, cooperating
       with each other through the ICT support.

We stress the fact that the second layer is devoted to the interoperability among applications (i.e.
exporting, importing, and sharing structured data), while the third layer is more oriented towards
human tasks and knowledge management and coupling, i.e. application designers may refer to
registered data sets to facilitate cooperation between healthcare operators and their decision
making activities.

A clinical data set is conceived for a particular user community within a predefined operational
context, implicitly or explicitly declared by the external organizations which are responsible for
the data set.
Data sets are developed and maintained by the external organizations according to their own
quality procedures, and are stored in a registry according to rules provided by the
standardization bodies.
The registry provides the definitions of data sets and their components (and possibly
documentation material about the intentions and limitations of their usage) and assures a
reasonable stability, but does not assume any responsibility for content, which remains under the
control of the external organizations.

   2. Usages of the definitions of clinical data sets
The definitions of data sets may be used for a variety of functions, e.g.
   - by end-users: to facilitate a systematic input of clinical data (check list), within predefined
       contexts; to assure the availability of the clinical data that are useful to each actor for
       each step of the care process, within shared diagnostic and therapeutic pathways;
   - by developers of clinical applications: to define appropriate algorithms to correctly
       process the information contained in a particular class of data sets, e.g. the global
       attributes that are factorised and globally refer to the data set as a whole (see also the
       previous note on batteries of laboratory tests);
   - by the external organizations: to facilitate convergence of data sets developed by
       different organizations.
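
As an illustration of the developer use case above, the following Python sketch expands attributes that are factorised at the level of the data set (for example the sample or the date of a laboratory battery) onto each entry. The dictionary layout, the key names "shared" and "entries", and the rule that an entry-level value overrides the shared one are assumptions made for illustration; the registry does not prescribe them.

```python
from typing import Any, Dict, List


def expand_factorised(data_set: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Copy attributes stated once for the whole data set onto each entry.

    Assumption: a value given on an entry overrides the shared value.
    """
    shared = data_set.get("shared", {})
    return [{**shared, **entry} for entry in data_set.get("entries", [])]


# Hypothetical laboratory battery: sample and date are stated once.
battery = {
    "name": "example battery",
    "shared": {"sample": "serum", "date": "2003-03-31"},
    "entries": [
        {"test": "glucose"},
        {"test": "sodium", "sample": "plasma"},  # entry-level override
    ],
}

expanded = expand_factorised(battery)
for entry in expanded:
    print(entry)
```

A different class of data sets could need a different rule (for instance, forbidding overrides), which is why such algorithms are better defined per class of homogeneous data sets.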

More generally, registration of data sets will promote the diffusion of EPR systems that are
stable, uniform, tested, and organized around data structures that are agreed within scientific
societies. Moreover, the consequent reduction of local customisations implies a reduction of
the cost of implementations and more coherence in the collected data.

   3. Responsible organizations and packages
Clinical data sets are typically defined by scientific societies, for each specialty or, better, for each
health issue faced within a particular clinical pathway.
Another source of data sets may be the regional and national agencies, mainly for information
flows related to public health, e.g. notification of immunizations or surveillance of infectious
diseases.
The standardization bodies provide the uniform toolkit for cooperation across applications,
whereas the responsible organizations are responsible for the detailed content to be exchanged
(i.e. for the content of the paper-based or electronic forms or messages to be exchanged).

The vendors of EPR systems, by adapting each time the data elements to the needs expressed
by different customers, may act as mediators among users that belong to different healthcare
facilities, i.e. they gradually produce data sets that are increasingly generalized.
After an appropriate quality check and validation by independent clinical teams (user groups or
scientific societies), these data sets could be included in the registry.

Data sets may be organized in packages, devoted to well defined health issues within particular
user communities and for predefined work contexts (for example built around a health issue in a
network for shared care or around a particular workflow within a clinical pathway).

Each package is described by one or more files.
These files should describe the major features of the data set as a whole and of each element
of the data set, as well as the detailed structure of the data set, i.e. the relations among the data
elements.
That description may be provided as narrative or according to a structured format defined by the
responsible organization.
The structured description of the elements of the data set may be suitable for import into
particular applications for automatic processing.
The files may contain the instruction manuals for application designers and/or for healthcare
professionals, the utilization notes about each element of the data set, and the intended optimal
context of use (i.e. the clinical condition of the patient, the context of the healthcare facility, the
kind of cooperation among professionals, etc.).

Ideally, in future, each registered package should correspond in addition to a web page
intended for developers and end-users (created by the registry managers or supplied by the
responsible organization according to instructions provided by the registry), where a path to
explore and use the files is defined.

In order to assure the alignment of information and the availability of the files (especially if they
are the source to produce an indexed database behind the registry), an "official" copy of each
release of these files should be stored in the registry and made accessible through the web site
of the registry.

   4. Phases of the project for the implementation of the registry
Even if there are several implementations of metadata registries worldwide (also in the
healthcare sector), the potential of a registry of clinical data sets, as envisaged in this
document, has not yet been explored, nor have the limits to the harmonization activities across
the different responsible organizations.
We decided therefore to start with the creation of a prototype, and to continue then with gradual
improvements, according to the experience gained with each step of the project and the
resources that will be made available.

The phases that could bring the registry from the prototype to its full functionality may be the
following ones:
1. preliminary collection and registration of existing material, as files that describe packages
     produced by medical societies, institutional agencies, or EPR vendors, with some
     description of the responsible organizations, perhaps incomplete. Particular attention is
     being given to the collection of institutional datasets, produced by national and regional
     agencies, e.g. the definition of record formats for the mandatory information flows of clinical
2. analysis, characterization, and comparison of the content of the files (in particular of the
     ones expressed in a structured, processable format), working out the common properties of
     the files that belong to different packages for the next phases of their systematic
     presentation and to produce future documentation guidelines for the responsible
     organizations;
3. comparison of the content of the packages, characterization of the potential features of the
     files, support to working groups to validate content by medical societies and to systematize,
     complete and harmonize the content of the packages; creation of coding schemes to
     describe the packages (health issues addressed, formats of structured files, languages,
     clinical specialties, etc.);
4. production of a web interface to upload, maintain and query the files and the descriptive
     information about the packages;
5. extraction of the data elements that are either presented in a structured format, or that are
     systematically processable by semiautomatic mark-up of the data sets; construction of an
     index of the data elements contained in each package;
6. creation of a canonized structured representation for each data element, independent from
     language. This activity includes
          a. the mapping of data elements to the Reference Information Model of HL7,
          b. the mapping of data elements to the LOINC entries,
          c. the translation in English of all the names of data elements and of data sets,
          d. the ontological analysis of all the names of data elements,
          e. the construction of an ontology-based Reference Terminological Model (RTM),
          f. the construction of a thesaurus of ontological descriptors (conceptual atoms),
          g. the compositional representation of data elements according to the RTM;
7. structured representation of the data sets within the packages already registered and
     knowledge acquisition about new packages, according to a specific formalism for each kind
     of data set (see examples). Typology of data sets will be experimentally defined, using the
     existing packages as source (and the past experience in HL7);

8. cooperation with the HL7 Templates SIG to represent and register data sets according to
   the universal formalism that will be defined by HL7.
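
Phase 5 can be pictured with a minimal Python sketch that builds a cumulative index from data element names to the packages containing them. The package names, the element names and the normalization rule (lower case, collapsed whitespace) are hypothetical; matching names across packages in a reliable way would rather depend on the canonical representation of phase 6.

```python
from collections import defaultdict
from typing import Dict, List, Set


def normalize(name: str) -> str:
    """Naive normalization (an assumption): lower case, single spaces."""
    return " ".join(name.lower().split())


def build_index(packages: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each normalized data element name to the packages that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for package_name, element_names in packages.items():
        for name in element_names:
            index[normalize(name)].add(package_name)
    # Sort both the element names and the package lists for stable output.
    return {name: sorted(index[name]) for name in sorted(index)}


# Hypothetical packages with their extracted data element names.
packages = {
    "package A": ["Body weight", "Onset"],
    "package B": ["body  weight", "Frequency"],
}

index = build_index(packages)
for name, found_in in index.items():
    print(name, "->", found_in)
```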

We have already collected many data sets, mainly in English, within the context of the WIDENET
project, and we expect to proceed to phases 1, 2 and 3 before the summer and to reach a
prototype populated with several additional data sets provided by the members of PROREC
Italia and by their partners.
By the end of 2003 we expect to revise the conceptual model of the registry and to build and
test the web interface. We will start to produce the cumulative index of data elements, using first
the files presenting the data sets in a structured format and the files where it is easier to mark-
up and extract the data elements.
In parallel we can work on phase 6, to produce a draft RTM during spring 2004.
To complete phase 6 and to activate the subsequent phases we need further resources, not yet
available but likely to come if we meet the above goals.

   5. Co-existence of different languages within the registry
The project involves language issues, because we expect to collect several Italian data sets, but
also many data sets in English and perhaps a few in French or German. We are currently not
planning to use other languages, because we limit our project strictly to our main targets for
the Italian registry.
Nevertheless we could envisage offering a service to the other PROREC Centres in Europe, and
in that case we would need a more complete management of languages.
In any case we anticipate that, within the Italian project, names of data elements, atomic
concepts (descriptors) and the ontology (the RTM) will be expressed in English.

   6. Names of the components of data sets
In practice, for each data element registered as a component of a data set (and for the names of
data sets), we need to consider at least the following forms (some of them may coincide or be
temporarily absent):
    - name in the original language
    - name in Italian
    - name in English
    - structured canonical representation, independent from language, i.e. based on the
        ontology developed by the project (a compositional representation made of a semantic
        network using the conceptual atoms provided by our thesaurus)
    - representation by external coding schemes (e.g. LOINC)
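
The forms listed above could be held in a small record such as the following Python sketch. The class name, the field names, the language codes and the example name are assumptions made for illustration; the canonical representation is kept as an opaque placeholder, because its formalism is still to be defined by the project.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ComponentNames:
    """Forms of the name of a component; any of them may be temporarily absent."""
    original: str
    original_language: str               # language code, e.g. "it" or "en" (assumed convention)
    italian: Optional[str] = None
    english: Optional[str] = None
    canonical: Optional[str] = None      # placeholder for the language-independent form
    external_codes: Dict[str, str] = field(default_factory=dict)  # coding scheme -> code

    def __post_init__(self) -> None:
        # Some forms may coincide: reuse the original name when its language matches.
        if self.original_language == "it" and self.italian is None:
            self.italian = self.original
        if self.original_language == "en" and self.english is None:
            self.english = self.original

    def missing_forms(self) -> List[str]:
        """Return the forms that still have to be supplied."""
        missing = [f for f in ("italian", "english", "canonical") if getattr(self, f) is None]
        if not self.external_codes:
            missing.append("external_codes")
        return missing


# Hypothetical data element collected from an Italian package.
names = ComponentNames(original="peso corporeo", original_language="it")
print(names.missing_forms())  # ['english', 'canonical', 'external_codes']
```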

   7. The conceptual model of the simplified prototype of the registry
Here we describe the conceptual model for the prototype of the registry that we will use for the
first phases of the project.

The conceptual model is extremely simplified and limited to essential information. Afterwards it
may be mapped to the conceptual model of the HL7 templates SIG, based on the standard
"ebXML registry" on the representation of metadata registries.
During the development of our prototype, we will explore the ebXML standard, in view of its
subsequent implementation.

An organization is responsible for one or more packages.
For each registered package, the organization identifies one or more contact persons.
A package is described by one or more files in electronic format (we do not intend to consider
paper-based documentation).
A package includes one or more "components" (i.e. a data set or a data element). [6]
A component includes zero or more components. [7]
A component is mapped to an attribute of the RIM (Reference Information Model) of HL7
version 3 and/or to a field of a message in HL7 version 2. [8]
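
The recursive part of the model can be sketched in Python as follows. A component that contains no components is a data element, and one that contains at least one is a data set, so the kind is derived from the content rather than stored. The class and field names are assumptions, and the two mapping fields are left empty in the example because no actual mapping is given here.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Component:
    name: str
    components: List["Component"] = field(default_factory=list)
    rim_attribute: Optional[str] = None  # mapping to an attribute of the HL7 v3 RIM, if known
    v2_field: Optional[str] = None       # mapping to a field of an HL7 v2 message, if known

    @property
    def kind(self) -> str:
        """Derived, not stored: it may change across releases of a package."""
        return "data set" if self.components else "data element"

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "Component"]]:
        """Yield this component and all nested components, depth first."""
        yield depth, self
        for child in self.components:
            yield from child.walk(depth + 1)


# The Apgar score as a data set made of its five observations.
apgar = Component("Apgar score", [
    Component("heart rate"),
    Component("respiratory effort"),
    Component("muscle tone"),
    Component("reflex irritability"),
    Component("skin colour"),
])

for depth, component in apgar.walk():
    print("  " * depth + f"{component.name} ({component.kind})")
```

Deriving the kind from the content means that adding a sub-component in a later release turns a data element into a data set without any other change to the record.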

An organization is characterized by a name and a short description.
A contact person is characterized by name, address, tasks with respect to the package
(information to users, representative for harmonization efforts, responsible for updating the
registry, etc).
A package is characterized by name, short description, intended community of clinical users,
intended purpose, keywords, intended realm, intended health facilities, URL with a description
of a roadmap to browse and use the files.
A file is characterized by name, URL (within the registry) where it can be downloaded, kind of
file (synthetic description, introduction, user manual, detailed description of data sets, etc.), ID
for the structured format (none, A, B, etc.), language. [9]
A component is characterized by names (see above), language, kind of element (organizer,
entry, …), free text description of the vocabulary domain allowed in the context of the

[6] In order to represent the recursive structure of data sets, we consider here a generic "component". The
same item may be represented as a data element in one package and as a data set in another one. We
expect that a component is represented coherently in all its occurrences within the same package, but
we are not going to enforce this as a constraint.
[7] A component which contains zero components is called a "data element". A component that contains
one or more components is called a "data set". A component may appear as a data element in one
package, and as a data set in another one. Across different releases of a package, a component can
contain a different number of components, and thus a data element can be transformed into a data set or
vice versa. This apparently arbitrary behaviour depends on legitimate decisions of the responsible
organization about representation of data.
[8] This mapping makes it easier to build the related ontology, and anticipates the usage of the registry
within HL7.
[9] One or more files, within the package, will contain the components, in a predefined format or as free text.
[10] The management and registration of coding schemes (i.e. of the domains of allowed coded values for
each coded attribute) is the topic of an intense parallel work in HL7.
This sector was widely explored during the last few years and thus HL7 is able to offer robust solutions for
it. The registration of coding schemes in HL7 version 3, for the different attributes and for the different
realms, is already well advanced.
We feel that it is not appropriate to include in our prototype for the metadata registry also the registration
of the coded domains in a structured format. If the registration of these domains is recognized as
relevant for the goals of our metadata registry (in particular to deal with the vocabulary domains to be
used in Italy), then we will use the proper mechanism set up by HL7 for vocabulary domains.

An attribute of the RIM is characterized by name, class, description and data type.
A field in a message of HL7 version 2 is characterized by name, segment, description and data
type.

Of course we need to manage the "administrative" information about packages and
components, e.g. about authors, date of creation and update, versions, etc.
For each possible structured format to describe the data sets that we encounter in the
systematic files, an appropriate table provides an ID and a narrative description.
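
The relations among organizations, contact persons, packages and files, together with the table of structured formats, can be sketched in Python as follows. All class names, field names, URLs and the descriptions of formats "A" and "B" are hypothetical, and only a subset of the attributes listed in this section is shown; the sketch only illustrates how the cardinalities stated above ("one or more") and the format IDs could be checked.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Table of structured formats: ID -> narrative description (hypothetical entries).
STRUCTURED_FORMATS: Dict[str, str] = {
    "none": "narrative only, no structured format",
    "A": "hypothetical structured format A",
    "B": "hypothetical structured format B",
}


@dataclass
class RegistryFile:
    name: str
    url: str        # location within the registry
    kind: str       # e.g. "user manual", "detailed description of data sets"
    format_id: str  # key of STRUCTURED_FORMATS
    language: str


@dataclass
class ContactPerson:
    name: str
    address: str
    tasks: List[str]


@dataclass
class Package:
    name: str
    description: str
    contacts: List[ContactPerson]
    files: List[RegistryFile]


@dataclass
class Organization:
    name: str
    description: str
    packages: List[Package] = field(default_factory=list)


def validate(package: Package) -> List[str]:
    """Return the problems found in a package (an empty list means none)."""
    problems: List[str] = []
    if not package.contacts:
        problems.append("package needs at least one contact person")
    if not package.files:
        problems.append("package needs at least one file")
    for f in package.files:
        if f.format_id not in STRUCTURED_FORMATS:
            problems.append(f"file {f.name!r} uses unknown format ID {f.format_id!r}")
    return problems


manual = RegistryFile("manual.txt", "https://registry.example.org/manual.txt",
                      "user manual", "none", "en")
package = Package("example package", "hypothetical package",
                  [ContactPerson("A. Contact", "address", ["information to users"])],
                  [manual])
society = Organization("example society", "hypothetical organization", [package])
print(validate(package))
```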

   8. Collaborations and grants
The activities of CNR-ITB and Eulogos S.p.A. are partially supported by a grant for the
MobiDis project by the Italian Ministry of Research.
The activities to form and coordinate the required user groups within the medical societies will
be allocated voluntarily within the context of HL7 Italia and PROREC Italia, in cooperation with
the OSIRIS project (co-financed by the Italian Ministry of Health).

Our metadata registry will provide an experimental basis to assist in the definition of the
standard on the Templates Registry in HL7. The information of our registry, adequately
processed, could facilitate the input of the packages into the HL7 registry.

Our registry may be used by other PROREC Centres to register their respective clinical data
sets, and may lead to a harmonization activity at the European level, under the control of the
European Institute EUROREC and under the responsibility of the European medical societies.

   9. Acronyms
CDA          Clinical Document Architecture (in HL7)
CEN          Comitato Europeo di Normazione, European Committee for Standardization
HL7          Health Level 7
MIR          Modello Informativo di Riferimento (ontologia), see Reference Terminological
             Model (RTM)
OSIRIS       Osservatorio Inter-Regionale sull'ICT in Sanità
PROREC       PROmotion of health RECords
RIM          Reference Information Model (in HL7)
RTM          Reference Terminological Model, see also MIR
SIG          Special Interest Group (in HL7)

