SemMed in Pathema User’s Guide - Version 2.0

Table of Contents

1.0…Introduction
2.0…Navigation
3.0…SemMed Search Window
4.0…SemMed Graph Window
4.1……Viewer Pane
4.1.1……Nodes and Edges
4.2……Tool Pane
4.2.1……Information Tab
4.2.1.1………Concept Information
4.2.1.1.a………..Internal Data Fields
4.2.1.1.b………..External Data Fields
4.2.1.2………Relationship Information
4.2.1.2.a………..Citations
4.2.2……Search Tab
5.0…SemMed Word Cloud Window


Appendix 1…Semantic Types


1.0 Introduction

“SemMed in Pathema” is a modified implementation of the Semantic Medline search and
visualization tool (SemMed), a research and development project of the Cognitive
Science Branch, Lister Hill National Center for Biomedical Communications, U.S.
National Library of Medicine (NLM). SemMed summarizes MEDLINE citations
returned by a PubMed search through advanced natural language processing, and
provides a content-rich visual display interface for utilizing the search results. The
SemMed project at NLM is headed by Dr. Thomas Rindflesch, and is fully described in
Kilicoglu, et al. 2008.

At this time, Pathema is providing only pre-processed views of pre-computed SemMed
searches, rather than access to the full tool. This should help most users to take
immediate advantage of the benefits that the tool offers without extensive training or
assistance with interpretation of the results. It is envisioned that a future release will
contain the full version of SemMed to enable pathogen researchers to perform their own
searches. In the interim, the pre-computed and pre-arranged network diagrams should
provide significant added value to the user community. Those wishing to have access to
the full version of SemMed at NLM should feel free to contact Dr. Rindflesch directly
(trindflesch@mail.nih.gov).
Bug reports and help requests should be sent to clostridium@jcvi.org.

2.0 Navigation

The SemMed in Pathema tool can be found under the “Searches” tab on each clade’s
main page. Selecting the tool will launch a new SemMed search window.

Figure 1. Top view




3.0 SemMed Search Window

The SemMed search window is the main gateway to the pre-generated data graphs for the
Pathema organisms. Selecting the data involves three sequential steps, which can be
seen in Figure 2. The first step is to select a “scientific category” from the first
dropdown. A scientific category is a perspective on the data that places the full corpus
of information in a specific scientific context. There are four scientific categories,
corresponding to areas of general scientific interest: treatment of disease, diagnosis,
substance interactions, and pharmacogenomics. Once the scientific category has been
selected, the pre-computed semantic data files available for it will appear as options in
the category subtype dropdown; selecting one of these files is the second step. Files are
named in the form of “subject” + “semantic relationship” (eg, “clostridium toxin causes”).
Please note that not all subjects will be available in all four categories (eg,
“clostridium toxin causes” does not have a graph in the diagnosis perspective). The third
step is to select a format for viewing the data: either graph (Section 4) or word cloud
(Section 5). Pressing SUBMIT launches a SemMed application window containing that file in
the selected format. All fields can be returned to their original state using the “reset”
button. Note that all semantic graphs relate to the clade of record in the header (eg,
Clostridium).

Figure 2. Selection of data to view in the search window

Please also note that instructions, publications, limitations, and contact information are
available in “Documentation and Contact Information” (Figure 2).

4.0 SemMed Graph Window

The SemMed graph window contains two separate panes: a viewer pane and a tool pane.
The viewer pane is the area in which the semantic connections diagram (“graph”) is
displayed. The tool pane contains several tabs that help the user navigate through the
information in the viewer pane. Both can be seen in Figure 3.

Figure 3. SemMed application window




4.1 Viewer Pane

The viewer pane contains the graph of the semantic relationships between terms within
an analysis. The graph consists of “nodes” and “edges”, which are covered in Section
4.1.1.

4.1.1 Nodes and Edges

The nodes and edges that comprise a semantic graph contain all the necessary
information to derive conclusions about the relationships between these objects.
Selecting a node or edge with the cursor will populate the appropriate region of the
information tab (see Section 4.2.1).
Nodes are represented as balls, each with a data label. One node is a “central node”, and
forms the basis for the analysis. In Figure 4, the central node is “Toxin” (ie, clostridium
toxin), and the graph represents the corpus of knowledge relating to that subject in that
perspective (see Section 3.0). Nodes have been arranged in these pre-computed graphs
to help with ease of viewing by the community. Please keep in mind that relationships
between non-central nodes are still valid in the overall context of the investigation.

Edges are the lines between the nodes. When more than one edge type appears on one
graph, the edge types are shown in different colors and are labeled in the relation labels
tab (see Section 4.2.3). The first data release has both single-relation-type graphs, in
which the relation is part of the graph title (eg, “clostridium toxin causes”), and
multi-relationship-type graphs. For the latter, the nodes are color coded; please see
“relation” in Section 4.2.1.2.

Figure 4. Introduction to nodes and edges




Note also that the SemMed graph nodes, as seen in Figure 4, are color-coded into 15
high-level groups representing biomedical categories, as can be seen in Figure 5.

Figure 5. Color-coding of nodes

4.2 Tool Pane

The tool pane contains several functions, presented as separate tabs, each of which
provides specific information and data relevant to the interpretation of the graph.

4.2.1 Information Tab

This tab in the tool pane provides information pertaining to the nodes and edges. When a
node is selected, the “Concept Information” area is populated, while the
“Relationship Information” area is populated when an edge is selected. Note that the
exact image in Figure 6, with both areas populated, will never appear in the tool; the
figure is provided to show both types of information at once. In all cases, only one
of the two will be visible at a given time. Concept Information and Relationship
Information fields are described in Sections 4.2.1.1 and 4.2.1.2, respectively.

Figure 6. Detailed view of the information tab




4.2.1.1 Concept Information Fields

The concept information area is populated with two broad types of information: external
data fields, in the form of “buttons” (UMLS, GHR, OMIM, and ENTREZ), and internal
data fields (Concept, CUI, Semantic type, and Number of predictions). The buttons, if
active, will pop up another window with the external data source results. Please note that
not all nodes will have entries for each of the four buttons: those with information will
be black, while those without will be gray.

Descriptions of the internal data fields can be found in Section 4.2.1.1.a. Descriptions of
the external data fields, along with examples, are provided in Section 4.2.1.1.b.

4.2.1.1.a Internal Data Fields

Concept – This is the label on the node selected.

CUI – The Metathesaurus concept unique identifier (CUI) is an 8-character identifier,
beginning with the letter "C" and followed by 7 digits, used in the Unified Medical
Language System (UMLS) of the National Library of Medicine (NLM). Each concept is assigned
such a CUI. The CUI has no intrinsic meaning but remains constant through time and
across versions. An example of a CUI and its meaning: C0006055 = botulinum toxin.
More information can be found in Campbell et al. 1998 (PMID: 9760390).
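
The CUI format described above (the letter "C" followed by 7 digits) can be checked
mechanically, for example when copying identifiers out of the information tab. The
following is a minimal Python sketch; the helper name is_valid_cui is hypothetical and is
not part of SemMed or of any UMLS software. It checks only the form of the identifier,
not whether the concept exists in the Metathesaurus.

```python
import re

# Pattern taken from the description above: "C" followed by exactly 7 digits.
CUI_PATTERN = re.compile(r"C[0-9]{7}")


def is_valid_cui(text: str) -> bool:
    """Return True if text has the 8-character CUI form (hypothetical helper)."""
    return CUI_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["C0006055", "C12345", "c0006055"]:
        print(candidate, is_valid_cui(candidate))
```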

Semantic types – Semantic types are high-level category descriptors of language that
provide a consistent categorization of all concepts represented in the National Library of
Medicine's Unified Medical Language System’s “Metathesaurus”. The Metathesaurus is a
very large, multi-purpose, and multi-lingual vocabulary database that contains
information about biomedical and health-related concepts, their various names, and the
relationships among them. Some typical examples that SemMed in Pathema users will
see include “aapp” (amino acid, peptide, or protein), “gngm” (gene or genome), and
“dsyn” (disease or syndrome). A full listing can be seen in Appendix 1.
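
For readers who process exported SemMed data outside the tool, the abbreviations can be
expanded with a simple lookup table. The sketch below is only an illustration: the names
SEMANTIC_TYPES and describe_semantic_type are hypothetical, and the table holds just the
three examples mentioned above, with identifiers copied from Appendix 1.

```python
# Three entries copied from Appendix 1; the full appendix has many more rows.
SEMANTIC_TYPES = {
    "aapp": ("T116", "Amino Acid, Peptide, or Protein"),
    "gngm": ("T028", "Gene or Genome"),
    "dsyn": ("T047", "Disease or Syndrome"),
}


def describe_semantic_type(abbreviation: str) -> str:
    """Expand an abbreviation to 'abbr (TUI): Full Name' (hypothetical helper)."""
    key = abbreviation.strip().lower()
    if key not in SEMANTIC_TYPES:
        return f"{key}: not in this excerpt of Appendix 1"
    tui, full_name = SEMANTIC_TYPES[key]
    return f"{key} ({tui}): {full_name}"


if __name__ == "__main__":
    print(describe_semantic_type("dsyn"))
```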

Number of predictions – This is the number of MEDLINE records supporting a given
node.

4.2.1.1.b External Data Fields

UMLS

The purpose of the National Library of Medicine's Unified Medical Language System®
(“UMLS”) is to facilitate the development of computer systems that behave as if they
understood the language of biomedicine and health. To that end, NLM produces and
distributes the UMLS Knowledge Sources (databases) and associated software tools
(programs) for use by system developers in building or enhancing electronic information
systems that create, process, retrieve, integrate, and/or aggregate biomedical and health
data and information, as well as in informatics research.

More information at - http://www.nlm.nih.gov/research/umls/about_umls.html
PMID: 3544826

Figure 7. UMLS results for “buffers”




OMIM

Online Mendelian Inheritance in Man® (“OMIM”) is a comprehensive, authoritative, and
timely compendium of human genes and genetic phenotypes. The full-text, referenced
overviews in OMIM contain information on all known Mendelian disorders and over
12,000 genes. OMIM focuses on the relationship between phenotype and genotype.

More information at - http://www.ncbi.nlm.nih.gov/sites/entrez?db=omim
PMID: 8423603

Figure 8. OMIM results for “Diabetes”

ENTREZ

Entrez Gene is NCBI's database for gene-specific information. It does not include all
known or predicted genes, but instead focuses on the genomes that have been completely
sequenced, that have an active research community to contribute gene-specific
information, or that are scheduled for intense sequence analysis. The content of Entrez
Gene represents the result of curation and automated integration of data from NCBI's
Reference Sequence project (“RefSeq”), from collaborating model organism databases,
and from many other databases available from NCBI. Records are assigned unique, stable
and tracked integers as identifiers. The content includes nomenclature, map location,
gene products and their attributes, markers, phenotypes, and links to citations, sequences,
variation details, maps, expression, homologs, protein domains, and external databases.

More information at – http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene
PMID: 17148475

Figure 9. Entrez results for “Rho”




GHR

The Genetics Home Reference (“GHR”) is an information resource developed as part of
the National Library of Medicine's consumer health initiatives. The GHR's guiding
principle is to make the health implications of the Human Genome Project accessible to
the public, and as such it provides consumer-friendly information about the effects of
genetic variations on human health.

More information at - http://ghr.nlm.nih.gov/
PMID: 14728441

Figure 10. GHR results for “hypertensive disease”




4.2.1.2 Relationship Information Fields

This portion of the information tab allows users to extract the MEDLINE references of
interest from a semantic graph. This part of the information tab is populated when an
edge is selected from the semantic graph. An example of this can be seen in Figure 11.

Figure 11. Relationship information from a selected edge

Note that there are several fields in the relationship information section. The nouns in the
“subject” and “object” fields are related to each other in the form defined by the
“relation” field, forming a “semantic triad”. In the example in Figure 11, clostridium
toxin (SUBJECT, seen as “toxin”) stimulates PLA2G4A protein (OBJECT). There are 2
MEDLINE citations, containing 3 semantic predictions, supporting this semantic triad.
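
The semantic triad and its supporting counts can be pictured as a small record. The
Python sketch below is an illustration only: the class name SemanticTriad and its field
names are hypothetical and do not describe how SemMed stores its data. The values are
those given for the Figure 11 example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticTriad:
    """One edge of the graph: subject, relation, object (hypothetical layout)."""

    subject: str
    relation: str
    obj: str
    citation_count: int    # MEDLINE citations supporting the triad
    prediction_count: int  # semantic predictions found in those citations

    def sentence(self) -> str:
        """Read the triad as a short statement."""
        return f"{self.subject} {self.relation} {self.obj}"


# Values from the Figure 11 example described in the text.
triad = SemanticTriad("toxin", "stimulates", "PLA2G4A protein",
                      citation_count=2, prediction_count=3)

if __name__ == "__main__":
    print(triad.sentence())
```

One citation can contain more than one prediction, which is why the two counts are kept
as separate fields.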

To retrieve the MEDLINE abstracts that support a given semantic triad, the user presses
the “Citations” button within the relationship information. This action opens a window
containing the abstract and a PMID hyperlink. The window should highlight the phrase or
phrases that support the semantic triad of interest. An example can be seen in Figure 12.
The scroll bar on the left of the inserted window allows the user to navigate between
abstracts, and the new window can be closed using the button at the bottom. An especially
long abstract, one that cannot be displayed in its entirety within a window of this size,
causes a second scrollbar to appear that pans within the abstract (not shown). Note that
Figure 12 represents the retrieval of results from the edge selected in Figure 11.

Figure 12. Retrieving the reference information for a given edge (semantic triad)




4.2.2 Search Tab

This functionality lets the user search for subjects of particular interest within a
graph. The search function uses a text string, which the user enters, to interrogate the
graph. Use of this functionality can be seen in Figure 13, where a search is conducted for
“infantile”. The node returned by the search appears in the box under the search button,
and can be seen marked by the red arrow. Navigating to that node is accomplished by
selecting the title (text) of the returned node.

Figure 13. Using the search function
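
This guide does not state how the search string is matched against node labels. As a
rough illustration only, the Python sketch below assumes a case-insensitive substring
match; the function name search_nodes is hypothetical, and the label
“Botulism, Infantile” is a made-up example rather than a label read from Figure 13.

```python
def search_nodes(labels, query):
    """Return the node labels that contain the query, ignoring case.

    The matching rule is an assumption, not documented SemMed behavior.
    """
    needle = query.strip().lower()
    if not needle:
        return []
    return [label for label in labels if needle in label.lower()]


# Illustrative labels; "Botulism, Infantile" is a made-up example node.
EXAMPLE_LABELS = ["Toxin", "virulence factors", "food poisoning",
                  "Botulism, Infantile"]

if __name__ == "__main__":
    print(search_nodes(EXAMPLE_LABELS, "infantile"))
```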




5.0 SemMed Word Cloud Window

The newest version of the SemMed in Pathema application has a second type of interface,
a word cloud, which is not available in the NLM version of SemMed. An example of
this new interface can be found in Figure 14. We anticipate that this will increase the
data richness of the interface, enabling more effective analyses. Note that in the word
cloud views provided, each phrase is part of a semantic condensate with the central node
of the graph (see Figure 4). This means that second-order edges (ie, those between
nodes where neither is the central node) are not currently preserved (eg, the edge between
“virulence factors” and “food poisoning” at the top of Figure 4). This may be addressed
in future releases. Note also in Figure 14 that page and data descriptions are available.
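
The effect of dropping second-order edges can be sketched in a few lines of Python. This
is an illustration under stated assumptions, not the application's code: the function
name first_order_edges is hypothetical, and the relation labels in the example edges are
placeholders (only the node names come from this guide).

```python
def first_order_edges(edges, central):
    """Keep only the edges that touch the central node.

    Each edge is a (subject, relation, object) tuple.
    """
    return [edge for edge in edges if central in (edge[0], edge[2])]


# Example edges; the relation labels are assumptions, not read from Figure 4.
EXAMPLE_EDGES = [
    ("Toxin", "causes", "food poisoning"),
    ("Toxin", "causes", "extravasation"),
    ("virulence factors", "related_to", "food poisoning"),  # second-order edge
]

if __name__ == "__main__":
    for edge in first_order_edges(EXAMPLE_EDGES, "Toxin"):
        print(edge)
```

In this example the edge between “virulence factors” and “food poisoning” is dropped,
because neither of its nodes is the central node.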

Figure 14. Word cloud interface

Word clouds, or tag clouds, are a common approach on the Web to navigating complex data
associations. A word cloud is a visual representation of text items in which each word
or phrase has a weight, displayed in the form of font size and color, that corresponds to
frequency or another quantifiable property. In the case of the SemMed in Pathema word
cloud, the size and darkness of the word or phrase are proportional to the number of
semantic condensates available. A study of users of this technology found that the
alphabetization feature commonly employed helps users find the information they
require quickly and easily, and that users access the cloud by scanning rather than by
reading each word (http://www2007.org/htmlposters/poster988/). As such, the SemMed
in Pathema word cloud interface also presents the words and phrases in alphabetical
order. Finally, note that the clade (eg, Clostridium), category (eg, pharmacogenomics),
and subtype of semantic condensate data (eg, clostridium toxin causes) selected are all
displayed in the interface (Figure 14).
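
The weighting described above can be sketched as follows. This is a minimal Python
illustration, not the application's code: the function name word_cloud_entries, the point
sizes, the gray scale, and the linear scaling are all assumptions, and the counts in the
example are invented. Counts are assumed to be positive.

```python
def word_cloud_entries(counts, min_pt=10, max_pt=32):
    """Return (phrase, font_size_pt, gray_level) tuples in alphabetical order.

    counts maps each phrase to its number of semantic condensates (> 0).
    gray_level runs from 0 (black) upward; larger counts give darker text.
    """
    if not counts:
        return []
    top = max(counts.values())
    entries = []
    for phrase in sorted(counts, key=str.lower):
        weight = counts[phrase] / top              # 0 < weight <= 1
        size = round(min_pt + weight * (max_pt - min_pt))
        gray = round(200 * (1 - weight))           # most frequent phrase is black
        entries.append((phrase, size, gray))
    return entries


if __name__ == "__main__":
    # Invented counts, for illustration only.
    example = {"food poisoning": 4, "extravasation": 1, "virulence factors": 2}
    for entry in word_cloud_entries(example):
        print(entry)
```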

Normally, the words or phrases that comprise a word cloud are hyperlinks, in the form of
inline HTML elements, which permit the user to further explore the underlying data. The
main benefit of using word clouds is to draw attention to the strongest associations
between concepts using a flat rather than hierarchical structure to the data. Words and
phrases in the SemMed in Pathema word cloud interface are hyperlinked as well; selecting a
word or phrase of interest will produce a pop-up window containing the MEDLINE results
(Figure 15). Note that this window is distinct from that in Figure 12, which was produced
from a SemMed graph window, but retains the features of highlighting the semantic
condensate in the abstract text and possessing a PMID hyperlink. Note also that this
window contains data from both node and edge portions of the Information Tab (see Section
4.2.1 and Figure 6). Finally, the “relationship source” and “relationship target” are the
two node names in the displayed semantic condensate (eg, toxin causes extravasation),
which derive from the data file (eg, clostridium toxin causes) selected in the search
window and the word or phrase selected in the word cloud (eg, extravasation).

Figure 15. Selection of Medline data underlying “extravasation” term in the word cloud

Appendix 1: SEMANTIC TYPES

Abbreviation   Unique Identifier (TUI)   Full Name
aapp           T116                      Amino Acid, Peptide, or Protein
acab           T020                      Acquired Abnormality
acty           T052                      Activity
aggp           T100                      Age Group
alga           T003                      Alga
amas           T087                      Amino Acid Sequence
amph           T011                      Amphibian
anab           T190                      Anatomical Abnormality
anim           T008                      Animal
anst           T017                      Anatomical Structure
antb           T195                      Antibiotic
arch           T194                      Archaeon
bacs           T123                      Biologically Active Substance
bact           T007                      Bacterium
bdsu           T031                      Body Substance
bdsy           T022                      Body System
bhvr           T053                      Behavior
biof           T038                      Biologic Function
bird           T012                      Bird
blor           T029                      Body Location or Region
bmod           T091                      Biomedical Occupation or Discipline
bodm           T122                      Biomedical or Dental Material
bpoc           T023                      Body Part, Organ, or Organ Component
bsoj           T030                      Body Space or Junction
carb           T118                      Carbohydrate
celc           T026                      Cell Component
celf           T043                      Cell Function
cell           T025                      Cell
cgab           T019                      Congenital Abnormality
chem           T103                      Chemical
chvf           T120                      Chemical Viewed Functionally
chvs           T104                      Chemical Viewed Structurally
clas           T185                      Classification
clna           T201                      Clinical Attribute
clnd           T200                      Clinical Drug
cnce           T077                      Conceptual Entity
comd           T049                      Cell or Molecular Dysfunction
crbs           T088                      Carbohydrate Sequence
diap           T060                      Diagnostic Procedure
dora           T056                      Daily or Recreational Activity
dsyn           T047                      Disease or Syndrome
edac           T065                      Educational Activity
eehu           T069                      Environmental Effect of Humans
eico           T111                      Eicosanoid
elii           T196                      Element, Ion, or Isotope
emod           T050                      Experimental Model of Disease
emst           T018                      Embryonic Structure
enty           T071                      Entity
enzy           T126                      Enzyme
evnt           T051                      Event
famg           T099                      Family Group
ffas           T021                      Fully Formed Anatomical Structure
fish           T013                      Fish
fndg           T033                      Finding
fngs           T004                      Fungus
food           T168                      Food
ftcn           T169                      Functional Concept
genf           T045                      Genetic Function
geoa           T083                      Geographic Area
gngm           T028                      Gene or Genome
gora           T064                      Governmental or Regulatory Activity
grpa           T102                      Group Attribute
grup           T096                      Group
hcpp           T068                      Human-caused Phenomenon or Process
hcro           T093                      Health Care Related Organization
hlca           T058                      Health Care Activity
hops           T131                      Hazardous or Poisonous Substance
horm           T125                      Hormone
humn           T016                      Human
idcn           T078                      Idea or Concept
imft           T129                      Immunologic Factor
inbe           T055                      Individual Behavior
inch           T197                      Inorganic Chemical
inpo           T037                      Injury or Poisoning
inpr           T170                      Intellectual Product
invt           T009                      Invertebrate
irda           T130                      Indicator, Reagent, or Diagnostic Aid
lang           T171                      Language
lbpr           T059                      Laboratory Procedure
lbtr           T034                      Laboratory or Test Result
lipd           T119                      Lipid
mamm           T015                      Mammal
mbrt           T063                      Molecular Biology Research Technique
mcha           T066                      Machine Activity
medd           T074                      Medical Device
menp           T041                      Mental Process
mnob           T073                      Manufactured Object
mobd           T048                      Mental or Behavioral Dysfunction
moft           T044                      Molecular Function
mosq           T085                      Molecular Sequence
neop           T191                      Neoplastic Process
nnon           T114                      Nucleic Acid, Nucleoside, or Nucleotide
npop           T070                      Natural Phenomenon or Process
nsba           T124                      Neuroreactive Substance or Biogenic Amine
nusq           T086                      Nucleotide Sequence
ocac           T057                      Occupational Activity
ocdi           T090                      Occupation or Discipline
opco           T115                      Organophosphorus Compound
orch           T109                      Organic Chemical
orga           T032                      Organism Attribute
orgf           T040                      Organism Function
orgm           T001                      Organism
orgt           T092                      Organization
ortf           T042                      Organ or Tissue Function
patf           T046                      Pathologic Function
phob           T072                      Physical Object
phpr           T067                      Phenomenon or Process
phsf           T039                      Physiologic Function
phsu           T121                      Pharmacologic Substance
plnt           T002                      Plant
podg           T101                      Patient or Disabled Group
popg           T098                      Population Group
prog           T097                      Professional or Occupational Group
pros           T094                      Professional Society
qlco           T080                      Qualitative Concept
qnco           T081                      Quantitative Concept
rcpt           T192                      Receptor
rept           T014                      Reptile
resa           T062                      Research Activity
resd           T075                      Research Device
rich           T006                      Rickettsia or Chlamydia
rnlw           T089                      Regulation or Law
sbst           T167                      Substance
shro           T095                      Self-help or Relief Organization
socb           T054                      Social Behavior
sosy           T184                      Sign or Symptom
spco           T082                      Spatial Concept
strd           T110                      Steroid
tisu           T024                      Tissue
tmco           T079                      Temporal Concept
topp           T061                      Therapeutic or Preventive Procedure
virs           T005                      Virus
vita           T127                      Vitamin
vtbt           T010                      Vertebrate

								