The Role of Terminologies and Ontologies in the Context

					R T U New York State
         Center of Excellence in
         Bioinformatics & Life Sciences



      The Role of Terminologies and
      Ontologies in the Context of the
         Electronic Health Record

                   Dagstuhl, May 23rd, 2006

              Werner Ceusters, MD
               Ontology Research Group
 Center of Excellence in Bioinformatics & Life Sciences
                 SUNY at Buffalo, NY

                Electronic Health Records
• ISO/TS 18308:2003
  – Electronic Health Record (EHR):
     • A repository of information regarding the health of a subject of care, in
       computer processable form.
  – EHR system:
     • the set of components that form the mechanism by which electronic
       health records are created, used, stored, and retrieved. It includes people,
       data, rules and procedures, processing and storage devices, and
       communication and support facilities.
• More common meaning of EHR system:
  – only the “software being executed”

                  A replacement for




           This                 and      that




[Screenshot: typical EHR screen. Source: www.comchart.com]
 Current US GOV eHealth goals & strategies
   • Goal 1: Inform Clinical Practice:
      – S1. Provide incentives for EHR adoption.
      – S2. Reduce risk of EHR investment.
      – S3. Promote EHR diffusion in rural and underserved areas.
   • Goal 2: Interconnect Clinicians.
      – S1. Regional collaborations.
      – S2. Develop a national health information network.
      – S3. Coordinate federal health information systems.
   • Goal 3: Personalize Care.
      – S1. Encourage use of Personal Health Records.
      – S2. Enhance informed consumer choice.
      – S3. Promote use of telehealth systems.
   • Goal 4: Improve Population Health.
      – S1. Unify public health surveillance architectures.
      – S2. Streamline quality and health status monitoring.
      – S3. Accelerate research and dissemination of evidence.

          Functions to be supported (HL7)
• Direct Care
  – functions that enable hands-on delivery of health care and offer
    clinical decision support.
• Care Support
  – functions that are not used for direct care of patients, but assist
    with the administrative, financial, research, public health, and
    quality monitoring aspects of an EHR-S
• Information Infrastructure
  – functions that provide the framework for proper operation of all
    Direct Care and Supportive functions.
                                HL7 EHR System Functional Model. Draft May 2006

               Direct Care Functions
• DC.1 Care Management
  – ordering medications
  – creating clinical documentation
• DC.2 Clinical Decision Support
  – alerting the provider that immunizations are due or
    drug interactions are indicated.
• DC.3 Operations Management and Communication
  – ???

               Care support functions
• S.1 Clinical Support
• S.2 Measurement, Analysis, Research and
      Reports
• S.3 Administrative and Financial
  – verifying insurance eligibility
  – reporting encounter data to public health systems

        Information Infrastructure Functions
• Information Infrastructure
  –   I.1    Security
  –   I.2    Health Record Information and Management
  –   I.3    Identity, Registry, & Directory Services
  –   I.4    Terminology Standards & Services
  –   I.5    Standards-based Interoperability
  –   I.6    Business Rules Management
  –   I.7    Workflow Management

                     ‘Terminology’
1) The discipline of terminology management
  –   homonymous with terminology
  –   synonymous with terminology work (used in ISO
      704)
2) The set of designations used in the special
   language of a subject field, such as the
   terminology of chemistry
  –   Used in both the singular and plural
  –   Used with an article in the singular: a terminology

  Fundamental Activities of Terminology Work
• Identifying ‘concepts’ and ‘concept relations’;
   – Analyzing and modeling concept systems on the basis of
     identified concepts and concept relations;
   – Establishing representations of concept systems through concept
     diagrams;
   – Crafting concept-oriented definitions;
   – Attributing designations (predominantly terms) to each concept
     in one or more languages; and,
   – Recording and presenting terminological data, principally in
     terminological entries stored in print and electronic media
     (terminography).

This is not the right approach to ontology!

Reason for our rejection: The terminological View
• Objects
     • perceived or conceived, concrete or abstract
     • abstracted or conceptualized into concepts
• Concepts
     • depict or correspond to a set of objects based on a defined set of
       characteristics
     • represented or expressed in language by designations or by definitions
     • organized into concept systems
• Designations
     • represented as terms, names (appellations) or symbols
     • designate or represent a concept
     • attributed to a concept by consensus within a special language
       community

Terminology is a tool for dealing with language, not one for
representing reality.

                 Peirce, Ogden & Richards, …
• Unit of Thinking (Concept), also called Unit of Thought or Unit of
  Knowledge: ~ Universal ???
• Designation (Symbol, Sign, Term, Formula etc.)
• Referent (Concrete Object, Real Thing, Conceived Object):
  Universal / Particular

Success of concept-based view in healthcare IT

                  Concept ‘dog’



   Chien
    Dog
   Hond
   Hund
     …

                Why terminologies ?
• As such ?
  – Fixing/stabilizing the language within a domain and a
    linguistic community;
  – Unambiguous communication.
• In relation to EHRs ?
  – Semantic Indexing;
  – Information exchange and linking between
    heterogeneous systems;
  – Terminologies as basis for coding and classification
    systems

          Some systems and their purpose
• Remuneration
  – ICD9/10-CM in the US, used by insurers and Medicare, for diseases
  – Current Procedural Terminology (CPT) for surgical procedures
• Public Health Reporting
  – ICD9/10
• Clinical Recording
  – Read 1-3, SNOMED-CT, ICPC
• Indexing publications
  – MeSH (MedLine/PubMed), EMTree (EMBASE)
• Support for applications and decision support
  – GALEN, FMA

           ‘Traditional’ semantic indexing
• Statement:
  – ‘Joe Smith has a fracture of the left tibia’
• Becomes indexed as:
  – #12   M-2xg41   A-2t68
  – M-2xg41 code in SnowMeat with terms:
      – fracture, fractures, fracture NOS, broken, ...
  – A-2t68 ibidem associated with:
    – left tibia, left tibia NEC, ...
  – Additional terms through
      – hierarchy: bone, bones, os, ...
      – associations: lower leg, limb, body part, ...
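
The indexing step on this slide can be sketched in a few lines of Python. The codes and term lists are the ones shown above. The dictionary layout, the patient registry, the function name `index_statement` and the plain substring match are assumptions made for illustration only, not the behaviour of any real coding system.

```python
# Term lists per code, as listed on the slide (the codes are fictional).
TERMS = {
    "M-2xg41": ["fracture", "fractures", "fracture NOS", "broken"],
    "A-2t68": ["left tibia", "left tibia NEC"],
}
# Hypothetical patient registry: name -> patient identifier.
PATIENTS = {"Joe Smith": "#12"}


def index_statement(statement: str) -> list[str]:
    """Return the patient id plus every code whose terms occur in the statement."""
    text = statement.lower()
    found = [pid for name, pid in PATIENTS.items() if name.lower() in text]
    for code, terms in TERMS.items():
        if any(term.lower() in text for term in terms):
            found.append(code)
    return found


print(index_statement("Joe Smith has a fracture of the left tibia"))
```

In this sketch the index holds three codes side by side. What the statement says about how the patient, the fracture and the tibia relate to one another is not part of the index.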

                            Classification: ICD
• ...
• Chapter II: Neoplasms (C00-D48)
• Chapter III: Diseases of the Blood and Blood-forming organs and certain
  disorders involving the immune mechanism (D50-D89)
   – Excludes: auto-immune disease (systemic) NOS (M35.9)
   – ....
   – Nutritional Anemias (D50-D53)
      • D50 Iron deficiency anaemia
         – Includes: ...
         – D50.0 Iron deficiency anaemia secondary to blood loss (chronic)
            » Excludes: ...
         – D50.1 ...
      • D51 Vit B12 deficiency anaemia
   – Haemolytic Anemias (D55-D59)
   – ...
• Chapter IV: ...

             Coding versus classification
• Coding:
  – Annotate terms in the EHR with codes from a coding
    system
     •  synonyms, translations, hierarchies
• Classification:
  – Assign patients exhibiting certain features to a
    predefined class
     •  purpose oriented, culture dependent
• Frequently mixed up !



Fractured nose  = ???  Fracture of nose

         Coding / classification confusion
• “patient with fractured nose” = “patient with fracture of nose”

• But it does not follow that
  “fractured nose” = “fracture of nose” !

            Classification: culture dependent
Dyirbal classification of objects in the universe:
• Bayi: men, kangaroos, possums, bats, most snakes, most fishes,
  some birds, most insects, the moon, storms, rainbows, boomerangs,
  some spears, etc.
• Balan: women, anything connected with water or fire, bandicoots,
  dogs, platypus, echidna, some snakes, some fishes, most birds,
  fireflies, scorpions, crickets, the stars, shields, some spears, some
  trees, etc.
• Balam: all edible fruit and the plants that bear them, tubers, ferns,
  honey, cigarettes, wine, cake.
• Bala: parts of the body, meat, bees, wind, yamsticks, some spears,
  most trees, grass, mud, stones, noises, language, etc.
                                       Lakoff 1987. Women, fire and dangerous things

Categories derived through analysis of the structure of the language
used by these people.
Language is NOT a trustworthy basis for (realist) ontology development.

         The “exploding bicycle” (J. Rogers)
• 10 things to hit…
   – Pedestrian / cycle / motorbike / car / HGV / train / unpowered
      vehicle / a tree / other
• 5 roles for the injured…
   – Driving / passenger / cyclist / getting in / other
• 5 activities when injured…
   – resting / at work / sporting / at leisure / other
• 2 contexts…
   – In traffic / not in traffic
 V12.24 Pedal cyclist injured in collision with two- or three-
  wheeled motor vehicle, unspecified pedal cyclist, nontraffic
  accident, while resting, sleeping, eating or engaging in other vital
  activities
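
The arithmetic behind the "explosion" can be made explicit. The axis sizes below are the counts stated on the slide; the assumption, made for illustration, is that a pre-coordinated classification assigns one code to every combination of the four axes.

```python
from math import prod

# Axis sizes as stated on the slide.
axes = {
    "things to hit": 10,
    "roles for the injured": 5,
    "activities when injured": 5,
    "contexts": 2,
}

# Under the stated assumption, one code is needed per combination.
combinations = prod(axes.values())
print(combinations)  # 500
```

Under that assumption, adding a single new value to any axis adds a whole slice of codes rather than a single one.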

         Border’s classification of Medicine
• Medicine
  – Mental health
  – Internal medicine
      • Endocrinology
           – Oversized endocrinology
      • Gastro-enterology
      • ...
  – Pediatrics
  – ...
  – Oversized medicine

       Ambitious claims have been made …
• The Unified Medical Language System (UMLS) is
  designed to “facilitate the development of
  computer systems that behave as if they
  ‘understand’ the meaning of the language of
  biomedicine and health”.

     UMLS fact sheet, updated 7 May 2004
    (http://www.nlm.nih.gov/pubs/factsheets/umls.html).

       MeSH: Medical Subject Headings

     MeSH: typing myocardial infarction


 Hierarchical

MeSH: Different context, different meaning ?




                                ???
                                ???
                                ???

           MeSH Tree Structures - 2004
• Body Regions [A01]
  – Extremities [A01.378]
     • Lower Extremity [A01.378.610]
        – Buttocks [A01.378.610.100]
        – Foot [A01.378.610.250]
            » Ankle [A01.378.610.250.149]
            » Forefoot, Human [A01.378.610.250.300] +
            » Heel [A01.378.610.250.510]
        – Hip [A01.378.610.400]
        – Knee [A01.378.610.450]
        – Leg [A01.378.610.500]
        – Thigh [A01.378.610.750]

The most abundant sort of mistakes if used as an ontology!
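
A short sketch shows what happens when the tree numbers above are read mechanically as a hierarchy. The tree numbers and headings are copied from the slide; the function name `ancestors` and the rule "drop the last dotted segment to reach the parent" are illustrative assumptions about how a program might consume them.

```python
# Tree numbers copied from the slide (MeSH Tree Structures, 2004).
TREE = {
    "A01": "Body Regions",
    "A01.378": "Extremities",
    "A01.378.610": "Lower Extremity",
    "A01.378.610.250": "Foot",
    "A01.378.610.250.149": "Ankle",
}
BY_NAME = {name: number for number, name in TREE.items()}


def ancestors(name: str) -> list[str]:
    """Walk up the hierarchy by repeatedly dropping the last dotted segment."""
    number = BY_NAME[name]
    chain = []
    while "." in number:
        number = number.rsplit(".", 1)[0]
        chain.append(TREE[number])
    return chain


print(ancestors("Ankle"))
```

A program that treats every parent link as subsumption would conclude from this chain that an ankle is a kind of foot, because nothing in the tree number says what kind of relation each link stands for.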

             Intermediate conclusion (1)
• Concept-based terminology (and standardisation
  thereof) is there as a mechanism to improve
  understanding of messages by humans.
• It is NOT the right device
  – to explain why reality is what it is, how it is organised,
    etc., (although it is needed to allow communication),
  – to reason about reality,
  – to make machines understand what is real,
  – to integrate across different views, languages,
    conceptualisations, ...

                            Why not ?
• Does not take care of universals and particulars
  appropriately
• Concepts do not necessarily correspond to something that
  (will) exist(ed)
   – Sorcerer, unicorn, leprechaun, ...
• Definitions set the conditions under which terms may be
  used, and may not be abused as conditions an entity must
  satisfy to be what it is
• Language can make strings of words look as if they were
  terms
   – “Middle lobe of left lung”

 Ok, then Description Logics and OWL will save us ... ?
Description logics:
• A decidable fragment of FOL
• A propositional modal logic
• A classes and properties (concepts and roles) oriented KR
  language
• Subsumption and satisfiability (consistency) are the key
  inferences
• Most DLs are supersets of ALC
   – Boolean operators on concepts
   – Existential and Universal quantifiers
• OWL-DL is a large superset (SHOIN):
   – Property hierarchies & Transitive roles (SH)
   – Inverse (I)
   – Nominals (O) (hasValue and oneOf)

                SNOMED and DL
   SNOMED-RT (2000)




                                    SNOMED-CT (2003)




         DLs don’t guarantee that you get parthood right!

                   NCI Thesaurus
• a biomedical thesaurus created
  specifically to meet the needs of the
  National Cancer Institute.
• semantically modeled cancer-related
  terminology built using description logics

        NCI Thesaurus Root concepts


             Anatomic Structure, Anatomic System, or Anatomic Substance ?
             Or ? Does the NCI not know to which category any item
             classified there belongs ?
             Substance ? If yes, why is gene product not subsumed by it ?
             If no, why are drugs and chemicals not subsumed by it ?

          Definition of “cancer gene”




         Terminologies and ontologies
                for EHR use:
           the quest for principles

   Requirements for clinical vocabularies (1)
• Domain completeness: coverage of all possible
  terms that lie within a vocabulary’s domain
• Non-vagueness: the term should represent the
  concept behind it as closely as possible
• Non-ambiguity: the same term cannot refer to
  more than one concept
• Non-redundancy: each concept must be
  represented by one unique identifier
                                          (Cimino, 1989)
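
Two of these requirements, non-ambiguity and non-redundancy, lend themselves to a mechanical check. The sketch below assumes a vocabulary stored as (identifier, concept, term) triples; the toy entries, that layout and both function names are hypothetical and are not taken from any real vocabulary.

```python
from collections import defaultdict

# Hypothetical toy vocabulary: (identifier, concept, term) triples.
ENTRIES = [
    ("C1", "common cold", "cold"),
    ("C2", "low temperature", "cold"),
    ("C3", "myocardial infarction", "heart attack"),
    ("C4", "myocardial infarction", "MI"),
]


def ambiguous_terms(entries):
    """Terms that refer to more than one concept (violates non-ambiguity)."""
    concepts = defaultdict(set)
    for _, concept, term in entries:
        concepts[term].add(concept)
    return sorted(t for t, c in concepts.items() if len(c) > 1)


def redundant_concepts(entries):
    """Concepts carried by more than one identifier (violates non-redundancy)."""
    identifiers = defaultdict(set)
    for identifier, concept, _ in entries:
        identifiers[concept].add(identifier)
    return sorted(c for c, i in identifiers.items() if len(i) > 1)


print(ambiguous_terms(ENTRIES))
print(redundant_concepts(ENTRIES))
```

Domain completeness and non-vagueness are not covered by this sketch, since neither can be decided from the vocabulary table alone.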

    Requirements for clinical vocabularies (2)
• Synonymy: multiple ways for expressing a word
  (or concept) must be allowed
• Multiple classification: concepts must be allowed
  to be classified in multiple hierarchies
• Consistency of view: concepts must have the
  same relationships in all views
• Explicit relationships: all relationships (e.g. class,
  synonymy,…) must be explicitly labelled.

               The Desiderata Revisited
• Concept orientation - what is the alternative?
• Concept permanence and graceful evolution - version
  control
• Formal definitions - add to knowledge vs. recognize
  change
• Reject NEC - store what the patient has and classify later
• Multiple granularities - patient level vs. reuse
• Representing context - the implicit meaning in the EMR
  design
                           Cimino 2003, Rome Ontology Workshop (pushed by Smith)

  New desiderata for biomedical terminologies
• Provide identifiers for meanings we want to apply
  to the patient
• Make sure the semantics are universally
  understood, separate from linguistics
• Make sure that, as our understanding changes,
  original meaning is not forgotten
• Provide a bridge between what we record and how
  we reason
                    Cimino 2003, Rome Ontology Workshop (pushed by Smith)

     Desiderata for Controlled Medical Data
  I - Capture what is known about the patient
 II - No information loss
III - No false implications
IV - Support retrieval
 V - Support reuse
VI - Support aggregation
VII - Support inference
                     Cimino 2003, Rome Ontology Workshop (pushed by Smith)

 Take off of ontology in biomedical informatics
• Concept/terminology-based systems make implicit
  knowledge explicit
• Ontologies aim to push explicitness further:
  – reasoning by machines
     • Classification
     • Prediction
     • Triggering of alerts
                     A practical example
• At <timestamp> lab reports <procedure> with id <ID>
  and value <value> for <patient>
• At <timestamp> <clinician> interprets <ID> as
  indicating <condition> for <patient>
• At <timestamp> <clinician> orders pharmacy item
  <formulary item> with order id <ID> for <patient>
• At <timestamp> pharmacy delivers <inventory item>
  with inventory id <ID> for order id <ID> for <patient>
• At <timestamp> decision support system suggests
  <condition> for <patient>
                                     Cimino 2003, Rome Ontology Workshop

However !
• Is this a procedure or the documentation of a procedure ?
• Is this condition really a patient condition or just an idea ?
• How are these related ?

               The dispute between …
• “Practical engineers”:
  – If it works for our purposes, it is ok
• Good philosophers:
  – If it works always, it is ok, and
  – It can only always work if it represents the relevant
    portion of reality faithfully.

  Ontology desiderata (C. Goble) for engineers
• Precision: formal, unambiguous, high fidelity
• Explicitness: clarity, commitment, reuse
• Flexibility: expressivity, evolution
• Systematic: control, quality, clarity

      Ontology description space (C. Goble)
• Coverage: upper, domain general, domain specific
• Knowledge representation languages and models: words, OO, frames,
  logics
• Inference mechanisms: classification, coherency
• Expressivity: taxonomy, relationships, axioms

         But not to forget: change management
     The reasons for changes in ontologies AND health
      records should be explicitly motivated, possibilities
      being
1. changes in the underlying reality (does the appearance or
   disappearance of an entry relate to the appearance or disappearance
   of entities or of relationships among entities in reality?);
2. changes in our scientific understanding;
3. reassessments of what is considered to be relevant for inclusion;
4. corrections of encoding mistakes introduced during ontology
   curation or data entry
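
One way to enforce "explicitly motivated" changes is to make the reason a mandatory, typed field of every change record. The sketch below is an assumption-laden illustration: the class names, field names and enum member names are hypothetical, and only the four reasons themselves come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class ChangeReason(Enum):
    """The four possible motivations listed on the slide."""
    REALITY_CHANGED = 1         # change in the underlying reality
    UNDERSTANDING_CHANGED = 2   # change in our scientific understanding
    RELEVANCE_REASSESSED = 3    # reassessment of relevance for inclusion
    MISTAKE_CORRECTED = 4       # correction of an encoding mistake


@dataclass(frozen=True)
class Change:
    """A hypothetical change record for an ontology or health record entry."""
    entry: str
    action: str
    reason: ChangeReason

    def __post_init__(self):
        # Reject records whose reason is missing or not one of the four.
        if not isinstance(self.reason, ChangeReason):
            raise TypeError("every change needs an explicit ChangeReason")


change = Change("entry-42", "removed", ChangeReason.MISTAKE_CORRECTED)
print(change.reason.name)
```

With a record like this, the removal of an entry because of a data entry mistake can be told apart from a removal that reflects something no longer existing in reality.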

                          Conclusions
• Main role of:
   – Terminologies: standardise language use
   – Ontologies: represent what is generic in reality
   – EHR: document what is specifically related to particulars
     (patients directly, (sub)populations indirectly)
• Role of terminologies in the context of the EHR:
   – Make the documentation intelligible to humans other than those
     who entered the data
• Role of ontologies in the context of the EHR:
   – Ensure that the regimentation imposed by the EHR system does
     not interfere with the re-usability of the data for a variety of
     purposes, other than patient documentation.