The Neighborhood Auditing Tool

     The Neighborhood Auditing Tool –
     Past, Present and Future

     James Geller, Michael Halper, Yehoshua Perl, C. Paul Morrey, Chris Ochs

     Structural Analysis of Ontologies Center
     New Jersey Institute of Technology
     Newark, NJ

       The Past:
       Goals of an Auditor’s Tool for the UMLS
       Principles of Auditing with Neighborhoods
       The Idea of a Hybrid Display
       The Present:
       Neighborhood Auditing Tool (NAT) Features for the UMLS
       The new NAT Website
       The Near Future:
       Adaptation of NAT to SNOMED
       Tools for SNOMED abstraction display
       The Farther Future:
       Relationship-Centric UMLS auditing
       Guiding the Auditor on What to Audit
       Managing Auditors and Workflows
                   Research Paper
    C.P. Morrey, J. Geller, M. Halper, Y. Perl. The
      Neighborhood Auditing Tool: A hybrid interface
      for auditing the UMLS. J Biomedical Informatics,
      42(3):468-89, June 2009. (Part of a Special
      Issue edited by our group on Auditing).

                     Auditing the UMLS

         About 156 source vocabularies
         It is natural that inconsistencies will appear
         Over 2.2 million concepts and 9.9 million
         Two level structure consisting of the
          Semantic Network and the Metathesaurus
         133 Semantic Types in the Semantic
          Network organized as two trees

    *UMLS Metathesaurus version 2010AA
             Some of our Work on Auditing
       H. Gu, Y. Perl, J. Geller, M. Halper, L. Liu, and J.J. Cimino. Representing
        the UMLS as an Object-oriented Database: Modeling Issues and
        Advantages. J Am Med Inform Assoc, 7(1):66-80, 2000.
       J. Geller, H. Gu, Y. Perl, and M. Halper. Semantic refinement and error
        correction in large terminological knowledge bases. Data & Knowledge
        Engineering, 45(1):1-32, 2003.
       J.J. Cimino, H. Min, and Y. Perl. Consistency across the hierarchies of the
        UMLS Semantic Network and Metathesaurus. J Biomed Inform, 36(6):450-
        461, 2003.
       H. Gu, Y. Perl, G. Elhanan, H. Min, L. Zhang, Y. Peng. Auditing concept
        categorizations in the UMLS. Artif Intell Med, 31(1):29-44, 2004.
       Y. Chen, Y. Perl, J. Geller, and J.J. Cimino. Analysis of a study of the users,
        uses, and future agenda of the UMLS. J Am Med Inform Assoc, 14(2):221-
        231, 2007.
       J. Geller, C. P. Morrey, J. Xu, M. Halper, G. Elhanan, Y. Perl, G. Hripcsak,
        (2009). Comparing Inconsistent Relationship Configurations Indicating
        UMLS Errors, In L. Ohno-Machado, V. L. Patel, D. Aronsky (Ed.),
        Proceedings of the American Medical Informatics Association, (pp. 193-
        197). San Francisco, CA. Omnipress.
        Previous Work on Auditing (cont’d)
       H. Gu, G. Hripcsak, Y. Chen, C.P. Morrey, G. Elhanan, J.J. Cimino, J.
        Geller, and Y. Perl. Evaluation of a UMLS auditing process of semantic type
        assignments. In J.M. Teich, J. Suermondt, and G. Hripcsak, editors, Proc
        AMIA Symp, pages 294-298, Chicago IL, Nov. 2007.
       Y. Chen, H. Gu, Y. Perl, J. Geller, M. Halper. Structural group auditing of a
        UMLS semantic type's extent. J Biomed Inform. 2009 Feb;42(1):41-52.
       L. Chen, C.P. Morrey, H. Gu, M. Halper, Y. Perl. Modeling multi-typed
        structurally viewed chemicals with the UMLS Refined Semantic Network. J
        Am Med Inform Assoc, 16(1):116-31, 2009.
       Y. Chen, H. Gu, Y. Perl, J. Geller. Structural group-based auditing of
        missing hierarchical relationships in UMLS. J Biomed Inform. 2009
       Y. Chen, H. Gu, Y. Perl, M. Halper, and J. Xu, Expanding the extent of a
        UMLS Semantic Type via Group Neighborhood Auditing. J Am Med Inform
        Assoc, Accepted for publication.
       K. C. Huang, J. Geller, G. Elhanan, Y. Perl and M. Halper, Auditing
        SNOMED Integration into the UMLS for Duplicate Concepts. Accepted to
        AMIA 2010.

         Ancient Past – Before the NAT:
          Provide Info as Paper Form
   CPT: C1081844 Antonospora locustae
   STY: T004 + T009 (Fungus + Invertebrate)
   SYN: Antonospora locustae | Nosema locustae
   PAR: Antonospora{STY: Invertebrate}

   Data shown for this concept is from the UMLS Metathesaurus version 2006AC
           Auditing Results also Paper Form
    (C1081844) Antonospora locustae
    STY: Fungus + Invertebrate

       No errors
       Semantic Type Error: Fungus
       Semantic Type Error: Invertebrate
       Add Semantic Type______________________
       Ambiguity
       Other error_____________________________
       Comments _____________________________

         Goals of an Auditor’s Tool for the UMLS

       Display relevant information to the auditor.
       Do not overwhelm the auditor with too much information.
       Help the auditor focus on areas most likely to
        contain errors.
        –   Algorithms suggest likely erroneous concepts
        –   Concepts are reviewed in a neighborhood display

                Or as I Like to Say it

        “Give them [UMLS Auditors] what they want.”
        “Give them all what they want.”
        “Give them only what they want.”
        (At least this is what we want.)
        But how?
        As a diagram?
        As indented text?

      What Makes a Diagram Wonderful?

    You can follow parent/child paths with your eyes.
    You can get a feeling for everything a concept is
     connected to with one look.
    You can see multiple parents and multiple paths
     with one look.
    You can see global features (short and bushy
     versus tall and sparse, or (gasp!) tall and bushy).
    But is every diagram wonderful?
    Let us look at more from the ancient past.

     What makes Indented Text Wonderful?

        Think of something as simple as a Microsoft
         file list in an Explorer window.
        Indentation expresses parenthood compactly
         and elegantly.
        There are no lines crossing, no lines at all.
        You don’t need a layout algorithm.
        There is a linear order in which to study text.
        But … see under “what makes a diagram
         wonderful.” All that is missing.
                  We got a Problem

        Diagrams are wonderful – as long as they fit
         on one screen.
        Indented text is wonderful – as long as there
         are no or very few multiple parents.
        But the UMLS does not fit onto one screen
         and there are many cases of multiple parents.

           The Idea of a Hybrid Display

        Keep the best features of text and the best
         features of diagrams.
        Auditing is organized around a “concept of
         interest,” the focus concept.
        Maintain relative positions between the focus
         concept and its children, parents, etc.
        Eliminate clutter of arrows.

             Auditing with Neighborhoods of a
                      Focus Concept
        Several years of experience: Auditing is to a
         large degree a “local” activity.
        It happens (mostly) in the Neighborhood of
         the focus concept an auditor is interested in.
        Concepts have two kinds of knowledge
         –   Textual Knowledge Elements: Preferred term,
             CUI, synonyms, LUI, definition, sources, semantic
         –   Contextual Knowledge Elements: Neighbors

              Types of Neighborhoods

        Focus concept: The concept presently being audited
        Immediate Neighborhood: The set of concepts
         reachable from the focus concept by following one
         relationship (up, down, lateral, etc.)
        Extended neighborhood: Includes parents of parents
         (grandparents), children of children (grandchildren)
         and siblings. No lateral chains.
        Up-Extended and Down-Extended Neighborhoods
         (add only grandparents or only grandchildren)
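
The neighborhood definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the NAT's implementation: the `ConceptGraph` class, its method names and the toy concept names are all assumptions made for the example. It also assumes that siblings belong only to the full extended neighborhood, which the slide does not state explicitly.

```python
from collections import defaultdict


class ConceptGraph:
    """Toy terminology graph (hypothetical; not the NAT's data model)."""

    def __init__(self):
        self.parents = defaultdict(set)   # concept -> its parents
        self.children = defaultdict(set)  # concept -> its children
        self.lateral = defaultdict(set)   # concept -> laterally related concepts

    def add_child(self, parent, child):
        self.parents[child].add(parent)
        self.children[parent].add(child)

    def add_lateral(self, source, target):
        self.lateral[source].add(target)

    @staticmethod
    def _step(index, concepts):
        """Follow one relationship from every concept in `concepts`."""
        found = set()
        for concept in concepts:
            found |= index.get(concept, set())
        return found

    def immediate(self, focus):
        """Concepts reachable by following exactly one relationship."""
        return (self._step(self.parents, [focus])
                | self._step(self.children, [focus])
                | self._step(self.lateral, [focus]))

    def grandparents(self, focus):
        return self._step(self.parents, self.parents.get(focus, set()))

    def grandchildren(self, focus):
        return self._step(self.children, self.children.get(focus, set()))

    def siblings(self, focus):
        return self._step(self.children, self.parents.get(focus, set())) - {focus}

    def up_extended(self, focus):
        return (self.immediate(focus) | self.grandparents(focus)) - {focus}

    def down_extended(self, focus):
        return (self.immediate(focus) | self.grandchildren(focus)) - {focus}

    def extended(self, focus):
        """Immediate neighborhood plus grandparents, grandchildren and
        siblings. Lateral relationships are never chained."""
        return (self.immediate(focus) | self.grandparents(focus)
                | self.grandchildren(focus) | self.siblings(focus)) - {focus}


# Toy data with made-up concept names.
g = ConceptGraph()
g.add_child("grandparent", "parent")
g.add_child("parent", "focus")
g.add_child("parent", "sibling")
g.add_child("focus", "child")
g.add_child("child", "grandchild")
g.add_lateral("focus", "related")
g.add_lateral("related", "far away")  # a lateral chain; must stay excluded

print(sorted(g.immediate("focus")))
print(sorted(g.extended("focus")))
```

Keeping parents, children and lateral relationships in separate indexes makes each neighborhood a union of one-step or two-step lookups, which matches how the slide defines them.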

          References about Neighborhoods

    M.S. Tuttle, D.D. Sherertz, N.E. Olson, M.S. Erlbaum, W.D. Sperzel,
     and L.F. Fuller, et al. Using META-1, the first version of the UMLS
     Metathesaurus. In Proc 14th Annu Symp Comput Appl Med Care,
     pages 131-135, Washington, D.C., 1990.

    S.J. Nelson, M.S. Tuttle, W.G. Cole, D.D. Sherertz, W. D. Sperzel,
     M.S. Erlbaum, L.L. Fuller, N.E. Olson, From meaning to term:
     semantic locality in the UMLS Metathesaurus. In Proc Annu Symp
     Comput Appl Med Care, pages 209-213, Washington, D.C., 1991.

     Immediate Neighborhood as Diagram

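     [Figure: immediate neighborhood drawn as a diagram. Node labels that
      survive extraction: Microsporidia <protozoa>; Microsporidia,
      Unclassified; Dictyocoela; Edhazardia; Fibrillanosema; Oligosporidium;
      "Cellular aspects of" (label truncated); Pathogenicity Aspects.]
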
                      Extended Neighborhood as Diagram
     [Figure: extended neighborhood drawn as a diagram, with regions marked
      GRANDPARENTS, FOCUS CONCEPT, SIB and GRANDCHILDREN, and one node
      flagged "Erroneous concept".
      Focus concept: Microsporidia, Unclassified.
      Labels above the focus concept: PHYLUM MICROSPORA; fungus; Protozoa;
      Microsporidia <protozoa>.
      Labels beside the focus concept: Microsporea; "Cellular aspects of"
      (label truncated); Microbiological; Pathogenicity Aspects.
      Labels directly below the focus concept: Dictyocoela; Edhazardia;
      Fibrillanosema; Oligosporidium.
      Labels in the grandchildren region: Edhazardia aedis; Oligosporidium
      occidentalis; Fibrillanosema crangonycis; Dictyocoela berillonum;
      Kabatana takedai; Microsporidium seriolae; Microsporidium 57864;
      Microsporidium prosopium; Dictyocoela sp.L11; Dictyocoela cavimanum;
      Microsporidium africanum; Microsporidium cypselurus; Dictyocoela
      muelleri; Microsporidium ceylonensis; Dictyocoela dehayesum;
      Dictyocoela grammarellum; Dictyocoela duebenum.]

         A Hybrid Diagram/Form Display of a

     [Figure: hybrid display with panels labeled Synonyms, Focus Concept
      and Relationships.]

              Desirable Information Beyond

        Concept definition for Focus Concept
        Sources for concepts and relationships
        Assigned Semantic Types of concepts
        Definitions of relevant Semantic Types
        Global view of the Semantic Network
         –   Indented (better for wide branches)
         –   Graphical (better for almost everything else)

                     The Present
               NAT: Serving the Auditor
        The Neighborhood Auditing Tool has been
         implemented to fully support display of
         neighborhoods. It is Web-based.
        Navigation: Neighboring concepts are an
         easy (double)click away.
        Additional features listed above have been
        YouTube training videos for beginners
        Redesigned Home Page:

     Demonstration of NAT Features

        Neighborhood
        Grandparents and grandchildren
        Synonyms
        Relationships: Concept, Sibling, Term
        Focus concept definition
        Sources: Concepts, Relationships
        Display CUIs
        Semantic Type display
        Semantic Type definition
        Semantic Network (indented)
        Semantic Network (diagram)
        Navigation
        Search (full, partial)
        Viewing History
        Choice of release
        Choice of sources
        Offline version


     The Present and Future (in Acronyms)
        Present:
        Release of the NAT with Level 0 (and SNOMED)
        BLUESNO = Biomedical Layout Utility Engine for SNOmed [Abstractions]
        BLUESNO-3D
        Future:
        SNET = SNomed Enhancement Tool
        CRAM-NAT = C-NAT (Concept or Current NAT) +
                   R-NAT (Rel’ship-Centric NAT) +
                   Audit Set Builder +
                   Management of Audits

             Present: Release of the NAT
             with Level 0 (and SNOMED)

        On the Web site there are three releases:
        Public NAT with UMLS Level 0 (unrestricted)
         terminologies; for everybody
        Public NAT + SNOMED for users with a
         SNOMED license.
        NAT with complete UMLS (requires a

         BLUESNO Biomedical Layout Utility
         Engine for SNOmed [Abstractions]
        Based on years of research on building
         abstraction networks of Areas and Partial
         Areas for the SNOMED by hand.
        Definition: An area contains all concepts with
         the same set of relationships (attrib./roles).
        Definition: A root of an area is a concept that
         has no parents within that area.
        Definition: A partial area contains a root and
         all its descendants within one area. (This is a
         simplified case. Assumes no overlap.)
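
The three definitions above translate directly into code. The following is a minimal sketch of the simplified, non-overlapping case, not BLUESNO's actual algorithm. All function names are assumptions for the example. The area name and the two root names are taken from the diagram element shown in this deck, while the parent links and the "placeholder" concepts are invented so that the partial-area sizes come out as 3 and 2.

```python
from collections import defaultdict


def invert(parents):
    """Turn a child -> parents mapping into a parent -> children mapping."""
    children = defaultdict(set)
    for child, its_parents in parents.items():
        for parent in its_parents:
            children[parent].add(child)
    return dict(children)


def build_areas(relationships):
    """Group concepts by their exact set of relationships (one area per set)."""
    areas = defaultdict(set)
    for concept, rels in relationships.items():
        areas[frozenset(rels)].add(concept)
    return dict(areas)


def area_roots(area, parents):
    """A root is a concept with no parents inside its own area."""
    return {c for c in area if not (parents.get(c, set()) & area)}


def partial_area(root, area, children):
    """The root plus all of its descendants that stay inside the area."""
    members, stack = {root}, [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, set()):
            if child in area and child not in members:
                members.add(child)
                stack.append(child)
    return members


# Toy data. The hierarchy below is hypothetical, not real SNOMED content.
RELS = frozenset({"Specimen Substance", "Specimen Procedure"})
relationships = {
    "Specimen": set(),  # hypothetical parent that lies outside the area
    "Drainage Fluid Sample": set(RELS),
    "drainage placeholder 1": set(RELS),
    "drainage placeholder 2": set(RELS),
    "Pus Swab": set(RELS),
    "pus placeholder 1": set(RELS),
}
parents = {
    "Drainage Fluid Sample": {"Specimen"},
    "Pus Swab": {"Specimen"},
    "drainage placeholder 1": {"Drainage Fluid Sample"},
    "drainage placeholder 2": {"drainage placeholder 1"},
    "pus placeholder 1": {"Pus Swab"},
}

children = invert(parents)
areas = build_areas(relationships)
area = areas[RELS]
roots = area_roots(area, parents)
sizes = {root: len(partial_area(root, area, children)) for root in roots}
print(sizes)
```

When a concept has parents in two different partial areas, this sketch would count it in both; the slide explicitly sets that overlapping case aside.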
     [Figure: one element of a BLUESNO Area diagram, with callouts.
      Area box:      {Specimen Substance, Specimen Procedure} (9)
      Partial Areas: Drainage Fluid Sample (3), Pus Swab (2)
      Callouts:
       - Area: the relationships that are common to all concepts in this
         Area and its included Partial Areas (= Area name)
       - Number of Partial Areas in Area
       - Two Partial Areas
       - Root concepts of Partial Areas (= Partial Area name)
       - Number of concepts in Partial Area
       - Other concepts in Partial Area (usually not shown)]
         What does BLUESNO do, and why?
        BLUESNO is a tool that automates the layout
         of Area/Partial area diagrams.
        Area/Partial area diagrams are excellent
         abstractions of large SNOMED hierarchies.
        Areas/Partial areas have been found to
         support auditing of large terminologies, esp.
        Layout of these Areas and Partial areas in a
         diagram by hand requires days even for
         intermediate size hierarchies.
        Couldn’t auditing the SNOMED be as much
         fun as playing a video game?
        One step in this direction:
        BLUESNO-3D allows navigation of the
         Area/P-area structure as if it is “suspended in
        Navigation consists of “flying” into and away
         from this structure.
        Height of boxes indicates number of
                  Future Work:
         SNET: SNomed Enhancement Tool

        A tool similar to the NAT.
        However, optimized for the SNOMED.
        Native SNOMED files, not RRF files of UMLS
        SNOMED Semantic Tags instead of UMLS
         Semantic Types
        SNOMED root structure including overall root
        Defined Attributes versus Qualifying
        Stated and Inferred View
        SNOMED IDs, not UMLS CUIs, etc.
          A lot of Competition in SNOMED
                   Browsing Tools

        Rogers J, Bodenreider O. SNOMED
         CT: browsing the browsers. Paper accepted
         for presentation at: KR-MED
         2008. Representing and sharing knowledge
         using SNOMED; 2008 May 31-Jun 2;

          CRAM-NAT (Concept+Relationship
         NAT+Audit Set Builder+Management)
        Treating relationships as first class citizens
         and as focus objects of auditing.
        Neighborhoods will be designed around
         relationships not concepts.
        3 Tab Design with navigation between tabs.
         –   Concept Tab
         –   Hierarchical Relationship Tab
         –   Lateral Relationship Tab
        4th Tab for Audit Set Builder
                Audit Set Builder (ASB)
        The current NAT is “passive.”
        The auditor decides what concepts to audit.
         –   By Intuition
         –   By using one of the algorithms we have
             developed in the past (and plan to develop in the
        Audit Set Builder: Suggests to the user which
         concepts to audit.
        ASB helps identify “suspicious concepts.”

     What are Suspicious Concepts?

        Certain structural features of a terminology
         indicate that a concept is suspicious.
        Many of these features can be recognized
        Example: Extents of small intersection types

     Extents of Small Intersection Types

    Definition: Extent = Set of concepts assigned
     one specific semantic type
    Definition: Intersection Type = A combination of
     UMLS semantic types for which the UMLS
     contains at least one concept that has been
     assigned this combination.
    Definition: Small Intersection Type = An
     intersection type with a small extent (≤ 6

     Past Research Results

    Concepts in Small Intersection Types are more
     likely to be erroneous (or inconsistent with other
     concepts) than concepts from a random sample.
    Concepts in Small Intersection Types can be
     found algorithmically and proposed to an Auditor
     to work on.
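
As a rough illustration of how such concepts can be found algorithmically, the sketch below groups concepts by their exact combination of semantic types and keeps the combinations with small extents. It is an assumption-laden example, not the published algorithm: the function names are invented, an intersection type is taken to mean a combination of at least two semantic types, and the threshold of 6 is read as a number of concepts. Only C1081844 with Fungus + Invertebrate comes from this deck; every other identifier and type name is made up.

```python
from collections import defaultdict


def intersection_type_extents(assignments):
    """Map each combination of two or more semantic types to its extent,
    i.e. the set of concepts assigned exactly that combination."""
    extents = defaultdict(set)
    for concept, semantic_types in assignments.items():
        if len(semantic_types) >= 2:
            extents[frozenset(semantic_types)].add(concept)
    return dict(extents)


def small_intersection_types(assignments, max_extent=6):
    """Keep only intersection types whose extent has at most max_extent
    concepts (assumed reading of the "<= 6" threshold on the slide)."""
    extents = intersection_type_extents(assignments)
    return {types: extent for types, extent in extents.items()
            if len(extent) <= max_extent}


# Toy assignments: one concept from the deck plus hypothetical ones.
assignments = {"C1081844": {"Fungus", "Invertebrate"}}
assignments["HYP100"] = {"Fungus"}  # single-typed, so not an intersection
for i in range(7):                  # a combination with 7 concepts: not small
    assignments[f"HYP{i:03d}"] = {"Type A", "Type B"}

small = small_intersection_types(assignments)
audit_set = sorted(set().union(*small.values()))
print(audit_set)
```

The resulting `audit_set` is the kind of list that could be handed to an auditor to work on.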

     Back to the Audit Set Builder
        The Audit Set Builder will contain the “Small
         Intersection Type” algorithm and will display
         concepts in Small Intersection Type Extents
         to auditors to work on.
        Other algorithms will be “plugged in.”
        Goal: An architecture in which newly
         developed algorithms are easy to plug in.
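
One way to picture such a plug-in architecture is a registry of named algorithms that all share one signature. The sketch below is purely hypothetical: the `AuditSetBuilder` class, its methods and the stand-in `multi_typed` algorithm are invented for illustration and say nothing about how the planned tool will be built.

```python
from typing import Callable, Dict, Set

# Assumed input shape: concept identifier -> set of semantic type names.
Assignments = Dict[str, Set[str]]
# Assumed plug-in contract: take the assignments, return suspicious concepts.
Algorithm = Callable[[Assignments], Set[str]]


class AuditSetBuilder:
    """Hypothetical registry of algorithms that suggest concepts to audit."""

    def __init__(self) -> None:
        self._algorithms: Dict[str, Algorithm] = {}

    def register(self, name: str, algorithm: Algorithm) -> None:
        """Plug in a new algorithm under a unique name."""
        if name in self._algorithms:
            raise ValueError(f"algorithm already registered: {name}")
        self._algorithms[name] = algorithm

    def build(self, assignments: Assignments) -> Dict[str, Set[str]]:
        """Run every registered algorithm and collect its suggestions."""
        return {name: algorithm(assignments)
                for name, algorithm in self._algorithms.items()}


def multi_typed(assignments: Assignments) -> Set[str]:
    """Stand-in algorithm: flag concepts with more than one semantic type."""
    return {c for c, types in assignments.items() if len(types) > 1}


builder = AuditSetBuilder()
builder.register("multi_typed", multi_typed)
result = builder.build({
    "C1081844": {"Fungus", "Invertebrate"},
    "HYP001": {"Fungus"},  # hypothetical identifier
})
print(result)
```

Because every algorithm is just a callable with the same signature, adding a new one needs a single `register` call and no change to the builder itself.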

              Farther into the Future:
           Managing Audits in CRAM-NAT
        Recording auditor recommendations
        Facilitate team auditing where several
         auditors review the same sample.
        Management of auditors by a Master Auditor
        Report generation functionalities
        Work flow of audit process using a team of

     Audit Example: Inconsistent

     J. Geller, C. P. Morrey, J. Xu, M. Halper, G.
     Elhanan, Y. Perl, G. Hripcsak, (2009).
     Comparing Inconsistent Relationship
     Configurations Indicating UMLS Errors, In L.
     Ohno-Machado, V. L. Patel, D. Aronsky (Ed.),
     Proceedings of the American Medical
     Informatics Association, (pp. 193-197). San
     Francisco, CA. Omnipress.

     Example 1 Consistent Configuration:
      Parent has Same Semantic Type as Child

     [Figure: a parent concept and its child concept, both assigned the
      same semantic type.]

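      Example 2 Consistent Configuration:
         Parent is Assigned a More General
           Semantic Type than the Child

     [Figure: parent concept X is assigned Semantic-Type of X and child
      concept Y is assigned Semantic-Type of Y. In the Semantic Network,
      Semantic-Type of X lies above Semantic-Type of Y, with "….."
      marking intermediate levels.]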

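           Example of Lack of Ancestry:
        No Hierarchical Relationship Between
           Corresponding Semantic Types

     [Figure: parent concept X is assigned Semantic-Type of X and child
      concept Y is assigned Semantic-Type of Y. The two semantic types sit
      on separate branches ("….." marks omitted levels), so neither is an
      ancestor of the other.]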

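     Example 1 of Semantic Type Inversion:
     Parent is Assigned a More Specific Semantic
                  Type than the Child

     [Figure: parent concept X is assigned Semantic-Type of X and child
      concept Y is assigned Semantic-Type of Y, where Semantic-Type of X is
      more specific than Semantic-Type of Y.]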

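     Example 2 of Semantic Type Inversion:
       Parent is Assigned a Much More Specific
            Semantic Type than the Child

     [Figure: parent concept X is assigned Semantic-Type of X and child
      concept Y is assigned Semantic-Type of Y. In the Semantic Network,
      Semantic-Type of Y lies above Semantic-Type of X, with "….."
      marking intermediate levels.]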
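
The four configurations above can be told apart by comparing the semantic type of the parent concept with that of the child concept. The sketch below is a simplified illustration and not the method of the cited paper: it assumes each concept has a single semantic type, that the Semantic Network is given as a child-to-parent mapping over trees, and it uses invented type names and an invented `classify` function.

```python
def ancestors(semantic_type, parent_of):
    """All semantic types above `semantic_type`, nearest first.
    Assumes `parent_of` describes trees (no cycles)."""
    found = []
    while semantic_type in parent_of:
        semantic_type = parent_of[semantic_type]
        found.append(semantic_type)
    return found


def classify(parent_type, child_type, parent_of):
    """Name the configuration formed by a parent concept's semantic type
    and its child concept's semantic type."""
    if parent_type == child_type:
        return "consistent: same semantic type"
    if parent_type in ancestors(child_type, parent_of):
        return "consistent: parent more general"
    if child_type in ancestors(parent_type, parent_of):
        return "semantic type inversion"
    return "lack of ancestry"


# Hypothetical Semantic Network fragment: two small trees.
parent_of = {
    "Mid": "General",
    "Specific": "Mid",
    "Unrelated": "Other root",
}

for pair in [("Mid", "Mid"), ("General", "Specific"),
             ("Specific", "Mid"), ("Specific", "General"),
             ("Mid", "Unrelated")]:
    print(pair, "->", classify(pair[0], pair[1], parent_of))
```

The two inversion examples differ only in how many levels separate the two types; the position of the child's type in `ancestors(parent_type, parent_of)` would give that distance.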

     84 Wrong Instances in Sample of 100 with Semantic Inversion

     Description                          #      %
     Too General Semantic Type of        58    69%
     Too Specific Semantic Type of       16    19%
     Wrong Parent-Child Relationship      7     8%
     Ambiguous Child                      3     4%
     TOTAL                               84   100%

     Preliminary Evaluation Study with NAT

        Compare paper-based auditing and NAT-
         based auditing.
        Counterbalanced groups.
        Recall improves with NAT use. Auditors
         seem willing to investigate more concepts.
        Precision stays the same. Auditors’ mental
         process does not improve.
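
For readers unfamiliar with the two measures, the sketch below shows how recall can rise while precision stays flat. The numbers are invented purely to illustrate the definitions and are not results from the study; the function name and the concept labels are likewise assumptions.

```python
def precision_recall(reported, true_errors):
    """Precision: share of reported concepts that are real errors.
    Recall: share of real errors that were reported."""
    reported, true_errors = set(reported), set(true_errors)
    hits = reported & true_errors
    precision = len(hits) / len(reported) if reported else 0.0
    recall = len(hits) / len(true_errors) if true_errors else 0.0
    return precision, recall


# Invented data: 8 real errors (c1..c8); x1..x4 are false alarms.
true_errors = {f"c{i}" for i in range(1, 9)}
paper_reported = {"c1", "c2", "x1", "x2"}
nat_reported = {"c1", "c2", "c3", "c4", "x1", "x2", "x3", "x4"}

paper = precision_recall(paper_reported, true_errors)
nat = precision_recall(nat_reported, true_errors)
print("paper:", paper)
print("NAT:  ", nat)
```

In this made-up case the second auditor investigates twice as many concepts with the same hit rate, so recall doubles while precision is unchanged, which is the pattern the slide describes.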

                Conclusions on Past

        Preliminary study showed that people are
         more successful finding errors with NAT than
         with paper sources. 
        Recall improved with the NAT; precision did not.
        NAT seems to nicely complement use of the

               Conclusions on Present

        On SNOMED
         –   BLUESNO, BLUESNO-3D
        On Audit Support for the UMLS
         –   NAT Releases

                 Conclusions on Future

        There is a lot of exciting work to be done.
        On SNOMED
         –   SNET
        On Audit Support for the UMLS
         –   Relationship auditing, Audit Set Builder, audit by
             teams, audit workflows, reporting, management of
             audit teams

     Contact Information:
