Recognizing the Electronic Medical Record Data from Unstructured Medical Data Using Visual Text Mining Techniques
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 9, No. 6, June 2011
Recognizing the Electronic Medical Record Data from Unstructured Medical Data Using Visual Text Mining Techniques

Prof. Hussain Bushinak, Faculty of Medicine, Ain Shams University, Cairo, Egypt
Dr. Sayed AbdelGaber, Faculty of Computers and Information, Helwan University, Cairo, Egypt
Mr. Fahad Kamal AlSharif, College of Computer Science, Modern Academy, Cairo, Egypt
Abstract: Computer systems and communication technologies have established a strong and influential presence in the different fields of medicine. The cornerstone of a functional medical information system is the Electronic Health Record (EHR) management system. EHR implementation and adoption face different barriers that slow down deployment in different organizations. This research focuses on resolving the most common barriers: data entry, unstructured clinical data, and modification of the physician workflow. It proposes a solution that uses text mining and natural language processing techniques. The solution was tested and verified in four real-world clinical organizations, where it achieved a correctness and precision of 91.88%.

Keywords: Electronic Health Record, Text Mining, Unstructured Medical Data, Medical Data Entry, Health Information Technology.

I. INTRODUCTION

The paper-based medical record is woefully inadequate for meeting the needs of modern medicine. It arose in the 19th century as a highly personalized "lab notebook" that clinicians could use to record their observations and plans so that they could be reminded of pertinent details when they next saw that same patient. There were no bureaucratic requirements, no assumptions that the record would be used to support communication among varied providers of care, and remarkably few data or test results to fill up the record's pages. The record that met the needs of clinicians a century ago has struggled mightily to adjust over the decades to accommodate new requirements as health care and medicine have changed, which led to the emergence of Health Information Technology (HIT) [1].

HIT allows comprehensive management of medical knowledge and its secure exchange among health care consumers and providers. Broad use of HIT will:

1. Help to eliminate the manual tasks of extracting data from charts or filling out specialized datasheets.
2. Help to derive data directly from the electronic record, making research-data collection a by-product of routine clinical record keeping.
3. Help to move from a paper-based health care system to secure electronic medical records, which will save lives and reduce health care costs.
4. Help in early detection of infectious disease through advanced data collection, fusion, and processing techniques, which would be at the forefront in spotting the emergence of new diseases and crucial to tracking the spread of known diseases [2].

II. ELECTRONIC HEALTH RECORD: DEFINITION AND MODELS

The EHR is defined as a longitudinal electronic record of patients' health information generated by one or more encounters in any care delivery setting. This information includes, but is not limited to, patient demographics, progress notes, examination details such as symptoms and findings, medications, vital signs, past medical history, immunizations, laboratory data, and radiology reports. The EHR automates and streamlines the clinician's workflow. It can generate a complete record of a clinical patient encounter and support other activities directly or indirectly related to care via interfaces, including evidence-based decision support, quality management, and outcomes reporting. The EHR is thus a repository of patient data in digital form, stored and exchanged securely and accessible by multiple authorized users. [2][3][4]

Many EHR architectural models are in use around the world. The two most popular EHR models are:

1. Central Repository Model

The center of this EHR model is the repository, which is fed by the existing applications in different care locations such as hospitals, clinics, and family physician practices. The feed from these applications is messaging based on pre-agreed standards. The messaging needs to be based on well-defined standards, for
25 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
example the HL7 Reference Information Model (RIM), for which XML could be used as the recommended Implementation Technology Specification (ITS). [5]

Figure 1. EHR Central Repository Model

The event-driven messages that need to be sent to and stored in the repository will essentially be event-based summaries, as shown in figure (2). The event-based summaries stored in the repository can be queried and retrieved by different clinicians who are treating the patients in different scenarios and clinical settings. Retrieval and access of data from the repository is subject to establishing that the clinicians legitimately access the data for treating only those patients who are in their care. Retrieval is done through messaging, either synchronous or asynchronous, depending on the urgency, complexity, and importance of the data being retrieved. [5]

Figure 2. EHR Message Events

2. Managed Services Model

The managed services model is based on hosting applications for different care providers and care settings in a data center run by a consortium, which may consist of a group of infrastructure providers, system integrators, and application providers. The hosted applications can be used to provide an effective EHR by building a common repository using a shared database, or by providing a common user interface to all hosted applications and extracting data from these systems using a portal whose authentication and authorization mechanism can also be controlled at the data center level, as shown in figure 3. [5]

Figure 3. Shared Services Model

III. BARRIERS OF THE ELECTRONIC HEALTH RECORD IMPLEMENTATION

Implementation of EHR faces different barriers, and these barriers vary from one environment to another. Hereafter, the focus is on the general barriers that exist in most EHR implementation attempts. These barriers are:

1. Financial Barriers

Financial barriers are divided into the following points:
- High costs: These costs are divided into two main parts, initial cost and ongoing cost. [6]
- Under-developed business case: This barrier arises because EHR returns on investment are uncertain, because financial benefits are achieved only in the long run, and because the main objective and benefit of EHR is to provide a high-quality medical service for citizens. [6]

2. Technological Barriers

Technological barriers are divided into four points: [7]
- Inadequate technical support
- Inadequate data exchange
- Security and privacy
- Lack of standards

3. Physicians' Attitudinal and Behavioral Barriers in Data Entry

Many health information system projects fail due to attitudes, behaviors, barriers in data entry, and lack of systematic consideration of human-centered computing issues such as usability, workflow, organizational change, and process reengineering. There are two major factors that
lead to sluggish performance of an EHR system: the complexity of the Graphical User Interface (GUI) and the system response time. This forces clinicians to see fewer patients and have longer workdays, largely because of the extra time needed to use the system. [8]

In 2004, Lisa Pizziferri and others concluded that the benefits of an EHR system can be achieved, and accepted by physicians, only if the physicians do not need to sacrifice their time with patients or other activities during clinic sessions. Physicians recognize the quality improvements achieved by EHRs, but their time should be saved by decreasing the time required for data entry in EHR systems. [9]

4. Organizational Change Barriers

This category contains several points:
- Design of and alignment with workflow and office integration: According to the American Academy of Family Physicians survey results (American Academy of Family Physicians 2004), 54.2 percent of the 5000 respondents reported that they are worried about slower workflow and lower productivity. [10]
- Migration from paper-based systems
- Staff training

5. The Format of Clinical Data Stored in EHR Systems

Generally speaking, stored data takes two main forms: structured data and unstructured data.
- Structured data: Structured data has a relational data model and enforces composition from atomic data types. It is managed by technology that allows querying and reporting against predetermined data types and understood relationships, such as patient demographics and laboratory tests. [11]
- Unstructured data: Unstructured data consists of any data stored in an unstructured format at an atomic level. That is, in unstructured content there is no conceptual definition and no data type definition; in textual documents, a word is simply a word. [11]

Unstructured data consists of two basic categories:
- Bitmap objects: Inherently non-language based, such as X-rays, radiology images, and video or audio files.
- Textual objects: Based on a written or printed language, such as clinical reports, nursing notes, and examination sheets. [11]

Using unstructured data for storing clinical data has the following limitations:
- The data is not consumable at a semantic level without a compatible interface or application.
- No technology can gain insight into the context of the information unless it can actually read it.

6. Barriers of Using Unstructured Data in the Electronic Health Record

Aggregation of information across all the records in a large repository could bring benefits for clinical research. When physicians work with structured data, they can receive alerts about drugs that interact badly, which enables them to enhance the treatment process and avoid medication errors; this cannot be done with unstructured data [12].

IV. SURVEYING THE SOLUTIONS OF EHR DATA ENTRY BARRIERS

In October 2010, Ergin Soysal, Ilyas Cicekli, and Nazife Baykal designed and developed an ontology-based information extraction system for radiological reports. [15] The main goal of this technique is to extract and convert the available information in free-text Turkish radiology reports into a structured information model using manually created extraction rules and a domain ontology. The technique extracts data from radiological reports, which are free text written by physicians, and inserts it as structured data into the EHR. [13]

However, this technique has the following drawbacks:
- It concentrates mainly on abdominal radiology reports.
- It does not use a huge and trusted repository of medical expressions, which may reduce the quality of the information extraction process. Consequently, wrong clinical information may be recorded.

In September 2010, Adam Wright, Elizabeth S. Chen, and Francine L. Maloney developed a technique for identifying associations between medications, laboratory results, and problems. They developed a
knowledge base of associations between problems and medications or laboratory results in an automated fashion. It was based on two data mining techniques: frequent item set mining and association rule mining. This technique was able to identify a large number of clinically accurate associations. A high proportion of high-scoring associations were adjudged clinically accurate when evaluated against the gold standard (89.2% for medications with the best-performing statistic, chi square, and 55.6% for laboratory results using interest) [14]. However, this technique has the following drawbacks:
- The researchers assumed that the patients' data was structured.
- Building the knowledge base concentrated only on patients' problems, medications, and laboratory results, which means that other data, such as the patient's history, diagnosis, and procedures, is not taken into account.
- Data entry is done through a traditional GUI, so this solution did not enhance the physician workflow.

In September 2010, a system for handling misspellings in drug information system queries was developed by Christian Senger, Jens Kaltschmidt, Simon P.W. Schmitt, Markus G. Pruszydlo, and Walter E. Haefeli. This system attempted to solve the problem of drug data entry in a Drug Information System (DIS). The researchers evaluated correctly spelled and misspelled drug names from all queries of the University Hospital of Heidelberg. The results indicated that DIS search engines should be equipped with error-tolerant search capabilities. Auto-completion lists might expedite searches but might fail regularly due to the high frequency of typographic errors already in the initial letters. The system improved DIS data entry by using spelling correction tools to make the drug information understandable and available, but it concentrated only on the DIS, without examination, history, and procedure data [16].

In August 2010, a technique was developed by Yong-gang Cao, James J. Cimino, John Ely, and Hong Yu for automated identification of diseases and diagnoses in clinical records. This technique presents an approach for prototyping a diagnosis classifier based on a popular computational linguistics platform [18]. It has the following limitations:
- It focuses only on extracting disease keywords and ignores other important parts such as operations, symptoms, and findings.
- It does not use spelling correction.
- There is no clear structured data model to store the data extracted from the clinical report.
- It does not use a huge and trusted data source for medical expressions, such as the Unified Medical Language System (UMLS).

In July 2010, another technique, for automatically extracting information needs from complex clinical questions, was developed by Yong-gang Cao, James J. Cimino, John Ely, and Hong Yu. They built a fully automated system, AskHERMES, to help clinicians extract and articulate multimedia information from literature to answer their ad hoc clinical questions. This system automatically retrieves, extracts, and integrates information from the literature and other information resources and attempts to formulate this information as answers to ad hoc medical questions posted by clinicians, all within a time frame that meets their demands [17]. This technique succeeds in clinical question answering and in identifying the category of the question, but with respect to EHR system adoption it faces the following limitations:
- It extracts the clinical information to identify the question category, not to store this information in the EHR repository.
- It works only on question answering, not on the data entry process.
- It does not enhance the physician workflow during the examination process.

Although the previous techniques attempted to solve the EHR data entry barrier, they have the following limitations:
- These techniques concentrate on specific parts of the data, such as diseases, and leave out the rest.
- The medical expression repository used does not contain all the expressions or the semantic relations between them.
- Some of these techniques store the EHR data as free text (unstructured data form).
- The physician workflow is modified, which in turn leads to more physical and mental effort and reduces the physician's productivity.

V. BRIDGING THE UNSTRUCTURED DATA TO STRUCTURED EHR

The suggested idea is to convert unstructured free-text clinical data to structured EHR data without modifying the workflow of physicians or adding any physical or mental effort for them. Figure (4) shows the algorithm of the suggested technique.
Figure 4 Objective Technique Steps

Step 1: Optical Character Recognition (OCR)
The physician writes his/her diagnoses as usual on a pen pad, paper, or tablet PC. If the clinical report was written on paper, it needs to be scanned. The clinical report data is stored as an image of free-hand text that can be processed. This image is passed through an OCR tool to convert it to machine-encoded text. The details of this step are represented in figure (5).

Figure 5 OCR and Handwriting input and output

Step 2: Spelling Corrector
Machine-encoded text may include spelling errors, which may yield wrong information during the extraction process. So, all misspelled words are corrected before moving to the next step. This step requires a medical dictionary that contains most medical expressions in their different forms, such as verbs, adjectives, and nouns. Figure (6) represents the details of this step.

Figure 6 Spell Check input and output

Step 3: Text Mining with Natural Language Processing Techniques
In this step, the resulting data is cleaned and partitioned into statements to be classified and coded. Using text mining and NLP, all medical data is classified and coded in the form of multiple statements, and unwanted words are removed. This step consists of: [19]
- Text preprocessing,
- Part of speech tagging,
- Statements segmentation,
- Noun phrase extraction.
Each of these components is described in the following.
1. Text preprocessing: Also called tokenization or text normalization, it includes the following steps: [19]
- Throw away unwanted stuff (e.g., unwanted brackets and tags).
- Word boundaries: white space and punctuation.
- Stemming (Lemmatization): This is optional. English words like 'look' can be inflected with morphological suffixes to produce 'looks, looking, looked'. They share the same stem 'look'. Often (but not always) it is beneficial to map all inflected forms into the stem. This is a complex process since there can be many exceptional cases (e.g., department vs. depart, be vs. were). The most commonly used stemmer is the Porter Stemmer. However, there are many others.
- Stop word removal: the most frequent words often do not carry much meaning.
- Capitalization, case folding: often it is convenient to lower case every character.
2. Part of speech tagging: A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns a part of speech, such as noun, verb, or adjective, to each word (and other token). [19]
3. Statements segmentation: This part divides the clinical text into several statements. [19]
4. Noun phrase extraction: In this part, all noun
phrases are extracted and the complex noun
phrase is decomposed into smaller noun phrases.
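The four Step 3 tasks can be illustrated with a small sketch. It is not the authors' implementation: the stop-word list is a tiny hypothetical sample, the function names are invented for illustration, stemming (which the text marks as optional) is omitted, and the noun phrase extractor assumes its input is already a Penn-style bracketed parse produced by some external tagger and parser.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "was", "of", "with", "there", "for", "from", "since"}


def segment_statements(text):
    """Split free text into statements at sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def preprocess(statement):
    """Drop brackets and tags, case-fold, tokenize, and remove stop words."""
    cleaned = re.sub(r"<[^>]+>|[\[\](){}]", " ", statement)  # unwanted tags and brackets
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", cleaned.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def noun_phrases(tree):
    """Return the text of every NP node in a Penn-style bracketed parse string."""
    tokens = re.findall(r"\(|\)|[^\s()]+", tree)
    phrases = []
    stack = []  # one [label, words] entry per currently open bracket
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            stack.append([tokens[i + 1], []])  # the label follows the open bracket
            i += 2
            continue
        if tok == ")":
            label, words = stack.pop()
            if label == "NP":
                phrases.append(" ".join(words))
            if stack:
                stack[-1][1].extend(words)  # pass the words up to the parent node
        else:
            stack[-1][1].append(tok)  # a leaf word
        i += 1
    return phrases
```

Because nested NP nodes are reported separately, a complex noun phrase such as "Plain X-ray of the abdomen" is returned together with its smaller parts "Plain X-ray" and "the abdomen", which mirrors the decomposition described in item 4. The sentence splitter is naive and would break on abbreviations that end in a period.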
Figure 7 Text mining and NLP tasks

Step 4: Unified Medical Language System (UMLS) Coding
To identify the clinical information, a huge repository of clinical expressions is needed, from which the matched clinical expressions can be extracted. The UMLS is used for this purpose. The UMLS is a compendium of many controlled vocabularies in the biomedical sciences, created in 1986. It provides a mapping structure among these vocabularies and allows translation among the various terminology systems. It may be viewed as a comprehensive thesaurus and ontology of biomedical concepts. [20]

UMLS consists of the following components: [20]
- Metathesaurus: the core database of the UMLS, a collection of concepts and terms from the various controlled vocabularies and their relationships.
- Semantic Network: a set of categories and relationships that are used to classify and relate the entries in the Metathesaurus.
- Specialist Lexicon: a database of lexicographic information to be used in natural language processing.
- A number of supporting software tools.

Morphologically analyzed words are compared to the UMLS entries to find the best matched expression according to its morphological position. Each noun phrase that matches a clinical expression entry in the UMLS is stored as a pair containing the noun phrase and its UMLS clinical code.

Figure 8 UMLS expressions coding

The pseudo code of the UMLS coding algorithm can be:

For each Statement S in Statements   // in physician sheet
Begin
    For each noun-phrase N in S
    Begin
        If N exists in UMLS then
            Extract N and C   // where C is the UMLS code
            Put N with C as pair <N, C>
        End if
    End
End

Step 5: Classify EHR Components
The suggested technique is applied to the physician's examination sheet. The examination sheet contains the following classes:
- History
- Examination
- Diagnosis
- Procedure
Each part is treated as a class, and all coded clinical data produced by the previous steps is classified into one of these classes.

The first step in the classification process is building a collective set of features that is typically called a dictionary. The UMLS clinical expressions in dictionary form are the basis for creating a spreadsheet of numeric data corresponding to the previously defined classes.

TABLE (1): CLASSES DICTIONARY
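The UMLS coding loop given in pseudo code for Step 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the `TOY_UMLS` dictionary stands in for a real Metathesaurus lookup, its codes are made-up placeholders rather than real UMLS identifiers, and exact matching after case folding replaces the morphological matching described in the text.

```python
# Hypothetical stand-in for a UMLS Metathesaurus lookup; the codes below are
# made-up placeholders, not real UMLS concept identifiers.
TOY_UMLS = {
    "enlarged prostate": "CODE-0001",
    "nocturnal enuresis": "CODE-0002",
    "abdomen": "CODE-0003",
}


def normalize(phrase):
    """Case-fold and collapse whitespace so lookups ignore both."""
    return " ".join(phrase.lower().split())


def code_statements(statements, lexicon):
    """Pair every noun phrase found in the lexicon with its code, per statement."""
    coded = []
    for noun_phrases in statements:            # For each Statement S in Statements
        pairs = []
        for phrase in noun_phrases:            # For each noun-phrase N in S
            code = lexicon.get(normalize(phrase))
            if code is not None:               # If N exists in UMLS
                pairs.append((phrase, code))   # Put N with C as pair <N, C>
        coded.append(pairs)
    return coded
```

Each statement is passed in as the list of noun phrases extracted from it in Step 3, and phrases with no lexicon entry are simply skipped, as in the pseudo code.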
Each row defines a class and each column represents a UMLS code. Each cell in the spreadsheet holds a measurement of the feature corresponding to the column for the class corresponding to the row. The dictionary of words covers all the possibilities, and its size determines the number of columns. All cell values range between zero and one, depending on whether the words were encountered in the class or not. The form of the classes' dictionary is shown in table (1).

The second step is measuring the similarity between the extracted expressions and the defined classes, then classifying each expression into the most similar class. The cosine algorithm was selected to calculate the similarity between the extracted clinical phrases and the predefined classes. The steps of the cosine similarity algorithm are:
- Compute the similarity of the new clinical phrase to all classes in the dictionary.
- Select the class that is most similar to the new clinical phrase.
- The class which occurs most frequently is the similar one.

Figure 9: Computing similarity scores for New Clinical Phrase

For cosine similarity, only positive words shared by the compared phrases are considered. Frequency of word occurrence is also valued. The clinical phrase is compared with each class by the following equations: [21]

$$\mathrm{Norm}(P) = \sqrt{\sum_{j} w_{P}(j)^{2}}$$

$$\mathrm{Cosine}(P_1, P_2) = \frac{\sum_{j} w_{P_1}(j)\, w_{P_2}(j)}{\mathrm{Norm}(P_1)\,\mathrm{Norm}(P_2)}$$

where $w_{P_i}(j)$ is the weight of word $j$ of the phrase in class $i$.

The cosine similarity of two classes ranges from 0 to 1. The angle between two term frequency vectors cannot be greater than 90°; consequently, when the cosine value is close to 1, the clinical phrase is more similar to the compared class.

Step 6: Storing Data in the EHR Repository
The classified clinical phrase is stored in its class inside the EHR database with its matched UMLS code. For example, a physician wrote the following:

There is enlarged prostate with tender base of the bladder.

This statement contains two findings. The statement is compared with each class, and the cosine scores for the statement against each defined class are calculated according to the previous equations. The class with the highest score wins. The data is stored in the winning class with its UMLS codes as pairs inside the EHR repository:
< enlarged prostate, Finding>
< tender base of the bladder, Finding>
The EHR is thereby put in a structured form suitable for analysis and data mining operations, or as a resource for a decision support system.

VI. THE EXPERIMENTAL STUDY

The aim of the experiment is to prove the success of the suggested technique in real-world cases. The hypotheses of this experiment are:
- The physician has little experience in using computers.
- The physician's handwriting is readable.
- The medical abbreviations used are standard.
- The experiment is applied during the examination session.

The equipment required to implement the experiment is:
- An electronic pen pad.
- A laptop or personal computer.
- Windows Vista or later
- SQL Server 2008
- Microsoft Office 2007 or later (for applying OCR on the pen pad)
- .NET Framework 4
- UMLS database system
- Medical dictionary (for spelling correction)

The implementation of the experimental study goes through the following steps:

Step 1: At the nurse's office, the patient's demographic data is recorded using the following screen.
Figure 10: EHR demographics form

Step 2: The physician uses the pen pad to write the diagnosis.

Figure 11: Pen pad to Computer Form

The physician is free to erase, add, or modify any part of his/her diagnosis. This step lets him/her work as usual without any additional effort. The data is recorded directly on the computer, which helps the physician retrieve it easily, either in its original form or as structured data.

Step 3: After the physician finishes handwriting, he/she presses the OCR button to convert the diagnosis from image form to machine-encoded text, as shown in the following figure:

Figure 12: Applying OCR on the diagnosis sheet

Step 4: After the OCR is done, the system checks and corrects the spelling errors in the examination data according to the installed medical dictionary, through an interaction session with the physician.

Figure 13: Applying spell check on the examination text

Step 5: After the spelling correction is done, the physician presses the "insert into EHR" button to convert the diagnosis data from unstructured to structured form. Conversion is done through the following steps:
- Text preprocessing: All brackets, unwanted stuff, and word boundaries are removed.
- Parts of speech tagging: Assigning parts of speech to each word.
- Statements segmentation: Examination text is split into multiple statements.
- Phrase tagging: Each phrase is tagged with the suitable code to identify all phrases contained in the diagnosis sheet.
The output of this step is the examination words with their parts of speech; this output has the following format:

(TOP (S (NP (DT A) (ADJP (NP (CD 15) (NNS years)) (JJ old)) (JJ female) (NN patient)) (VP (VBZ complains) (PP (IN from) (NP (JJ nocturnal) (NN enuresis))) (PP (IN since) (NP (NN birth)))) (. .)))
(TOP (S (NP (NP (JJ Plain) (NN X-ray)) (PP (IN of) (NP (DT the) (NN abdomen)))) (VP (VBD was) (ADJP (JJ free))) (. .)))
(TOP (S (NP (JJ Abdominal) (NN ultra) (NN sonography)) (VP (VBD was) (ADJP (JJ free))) (. .)))
(TOP (S (NP (PRP he)) (VP (VBZ has) (NP (NP (NNP Enuresis)) (SBAR (S (NP (DT The) (NN patient)) (VP (MD should) (VP (VB receive))))) (: :) (NP (NP (NNP R1) (NNP Uipam) (NN tablet)) (NP (NP (CD one) (NN tablet)) (NP (RB twice) (RB daily)) (PP (IN for) (NP (CD three) (NNS months))))))) (. .)))
(TOP (S (PP (IN R2) (NP (NNP Dipripam) (CD 20) (NN mg) (NN capsule))) (NP (NP (CD one) (NN tablet)) (NP (RB twice) (RB daily)) (PP (IN for) (NP (CD three) (NNS months)))) (. .)))
(TOP (S (NP (DT R3) (NNP Depavit) (NNP B12) (NN ampule)) (. .)))

Figure 14: Output of Text mining technique

Noun Phrase Extraction:
All noun phrases are extracted, and compound noun phrases are divided into smaller noun phrases, such as the following:
o A 15 years old female patient
o 15 years
o Nocturnal enuresis since birth
o Birth
o Plain X-ray of the abdomen
o Plain X-ray
o The abdomen
o Abdominal ultra sonography
o Enuresis
o The patient
o R1 Uipam tablet
o One tablet twice daily for three months
o One tablet
o Twice daily
o Three months
o Dipripam 20 mg capsule
o One tablet twice daily for three months
o One tablet
o Twice daily
o Three months
o R3 Depavit B12 ampule

Step 7: All noun phrases are coded with UMLS codes. The output of this step is represented in table (2).

TABLE (2): NOUN PHRASES WITH THEIR UMLS CODES.

Each statement gets a score according to the UMLS codes and the classes' dictionary declared in table (1). Table (3) shows the statements and their scores.

TABLE (3): STATEMENTS' SCORE.

Step 8: According to the scores shown in table (3), the statements are classified into their classes. The predefined classes are:
- History
- Examination
- Diagnosis
- Procedure
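The classification in Step 8 relies on the cosine similarity between a statement and each class of the dictionary. A minimal sketch of that scoring is given below. The class dictionary, its codes (`C1` to `C6`), and its weights are hypothetical placeholders chosen only for illustration; the actual dictionary is the one in table (1), and the function names are invented here.

```python
import math
from collections import Counter

# Hypothetical class dictionary: for each class, a weight per UMLS code.
# Codes and weights are placeholders, not values from the paper.
CLASS_DICTIONARY = {
    "History": {"C1": 1.0, "C2": 0.5},
    "Examination": {"C3": 1.0, "C4": 1.0},
    "Diagnosis": {"C2": 1.0, "C5": 1.0},
    "Procedure": {"C6": 1.0},
}


def norm(weights):
    """Euclidean length of a weight vector stored as {code: weight}."""
    return math.sqrt(sum(w * w for w in weights.values()))


def cosine(p1, p2):
    """Cosine similarity; only codes shared by both vectors contribute."""
    denom = norm(p1) * norm(p2)
    if denom == 0:
        return 0.0
    shared = set(p1) & set(p2)
    return sum(p1[j] * p2[j] for j in shared) / denom


def classify(codes, dictionary):
    """Score a statement (a list of UMLS codes) against every class."""
    vector = dict(Counter(codes))  # code frequency serves as the weight
    scores = {name: cosine(vector, weights) for name, weights in dictionary.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

With this toy dictionary, a statement coded as `["C3", "C4"]` scores highest against Examination, and one coded as `["C2"]` scores higher against Diagnosis than against History because the shared code carries more weight there. A statement with no recognized codes scores zero everywhere, so a real system would need an explicit rule for that case.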
The classifier uses the cosine similarity algorithm to classify each statement according to the class dictionary. Table (4) shows the score of each statement relative to the nearest class.

TABLE (4): COS SIMILARITY SCORES FOR EACH CLASS.

Step 9: After determining the winning class for each statement, each noun phrase with its UMLS code is saved inside the EHR in the winning class as a paired tag. Table (5) shows this format.

TABLE (5): DATA INSERTED INSIDE THE EHR

Step 10: The extracted information is compared with the physician's manual results to determine the precision of the suggested technique.

VII. RESULTS DISCUSSION

The experimental study was conducted in four medical departments. In each department, 10 diagnosis sheets were tested. The tested departments are:
- Surgical Oncology
- Surgery Urology
- Cardiology
- General Surgery

Table (6) shows the overall precision percentage in each tested department.

TABLE (6): RESULTS OF THE EXPERIMENTAL STUDY.

Department          Overall Precision
Surgical Oncology   92.96%
Surgery Urology     91.55%
Cardiology          92.33%
General Surgery     88.61%
Overall precision   91.36%

Some factors affect the results, such as the quality of the physician's handwriting. The effect of this factor is clear in the result of experiment four (General Surgery), which has the lowest precision percentage (88.61%). A high-precision OCR tool can minimize the effect of this factor, but it may be expensive. The results indicate that the suggested technique succeeds with a high percentage in a real-world experiment, which means that it can be applied in real life in the future.

VIII. CONCLUSION

The suggested technique succeeded in working as a bridge between unstructured and structured medical data. The medical data is stored inside the EHR system in its right position without any additional physical or mental effort by the physician, which in turn satisfies the main objective of this research.

REFERENCES

[1] Institute of Medicine, "Review of the Adoption and Implementation of Health IT Standards by the DHHS Office of the National Coordinator for Health Information Technology", http://www.iom.edu/Activities/Workforce/HealthITStandards.aspx

[2] Richard Dick, Elaine B. Steen, and Don Detmer, "The Computer Based Patient Record: An Essential Technology for Health Care", National Academy Press, 1997.

[3] See HIMSS web page for the consensus definition of an electronic health record. http://www.himss.org/ASP/topics_ehr.asp.

[4] J.H. van Bemmel and M.A. Musen, "Handbook of Medical Informatics", Springer, 1997.
[5] K. Ananda Mohan, "National Electronic Health Record Models", Tata Consultancy Services (TCS), 2004.

[6] R. H. Miller and Ida Sim, "Physicians' Use of Electronic Medical Records: Barriers and Solutions", Health Affairs, 2004.

[7] Waegemann, "EHR vs. CPR vs. EMR", Healthcare Informatics, 2003.

[8] Himali Saitwal, Xuan Feng, Muhammad Walji, Vimla Patel, Jiajie Zhang, "Assessing performance of an Electronic Health Record (EHR) using Cognitive Task Analysis", Elsevierhealth, 2010.

[9] Lisa Pizziferri, Anne F. Kittler, Lynn A. Volk, Melissa M. Honour, Sameer Gupta, Samuel Wang, Tiffany Wang, Margaret Lippincott, Qi Li and David W. Bates, "Primary care physician time utilization before and after implementation of an electronic health record: A time-motion study", Elsevierhealth, 2004.

[10] American Academy of Family Physicians, "Family Practice Management Monitor", AAFP pushes for affordable EMR system, 2004.

[11] Oleh Hrycko, "Electronic Discovery in Canada: Best Practices and Guidelines", CCH, 2007.

[12] Angus Roberts, Robert Gaizauskas, Mark Hepple, George Demetriou, Yikun Guo, Ian Roberts, Andrea Setzer, "Building a semantically annotated corpus of clinical texts", Elsevierhealth, 2009.

[13] Hanna M. Seidling, Marilyn D. Paterno, Walter E. Haefeli, David W. Bates, "Coded entry versus free-text and alert overrides: What you get depends on how you ask", Elsevierhealth, 2010.

[14] Adam Wright, Elizabeth S. Chen and Francine L. Maloney, "An automated technique for identifying associations between medications, laboratory results and problems", Elsevierhealth, 2010.

[15] Ergin Soysal, Ilyas Cicekli, Nazife Baykal, "An ontology based information extraction system for radiological reports", Elsevierhealth, 2010.

[16] Christian Senger, Jens Kaltschmidt, Simon P.W. Schmitt, Markus G. Pruszydlo, Walter E. Haefeli, "Misspellings in drug information system queries: Characteristics of drug name spelling errors and strategies for their prevention", Elsevierhealth, 2010.

[17] Yong-gang Cao, James J. Cimino, John Ely, Hong Yu, "Automatically extracting information needs from complex clinical questions", Elsevierhealth, 2010.

[18] Dina Demner-Fushman, James G. Mork, Sonya E. Shooshan, Alan R. Aronson, "UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text", Elsevierhealth, 2009.

[19] Malgorzata Marciniak, Agnieszka Mykowiecka, "Aspects of Natural Language Processing", Springer, 2009.

[20] Catherine R. Selden, Betsy L. Humphreys, "Unified Medical Language System: Current Bibliographies in Medicine", National Institute of Health, 1990.

[21] Jiawei Han, Micheline Kamber, "Data mining: concepts and techniques", Diana Cerra, 2006.