J. Software Engineering & Applications, 2009, 2: 150-159
doi:10.4236/jsea.2009.23022 Published Online October 2009 (http://www.SciRP.org/journal/jsea)
Data Mining in Biomedicine: Current Applications and
Further Directions for Research
S. L. TING¹, C. C. SHUM², S. K. KWOK¹, A. H. C. TSANG¹, W. B. LEE¹
¹Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hong Kong, China; ²Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China.
Received January 16th, 2009; revised June 18th, 2009; accepted June 24th, 2009.
Data mining is the process of finding patterns, associations or relationships among data by applying analytical techniques that involve building a model; the results of this process become useful information or knowledge. Advances in medical devices and database management systems have produced a huge number of databases in biomedicine. Establishing a methodology for knowledge discovery and management of these large amounts of heterogeneous data has become a major research priority. This paper introduces some basic data mining techniques, namely unsupervised learning and supervised learning, and reviews the application of data mining in biomedicine. Applications of multimedia mining, including text, image, video and web mining, are discussed. The key issues faced by computing professionals, medical doctors and clinicians are highlighted. We also outline some foreseeable future developments in the field. Although extracting useful information from raw biomedical data is a challenging task, data mining remains a promising and rich field for scientific research.
Keywords: Data Mining, Biomedicine
1. Introduction
With the tremendous improvement in computer speed and the decreasing cost of data storage, huge volumes of data are being created. However, data itself has no value; it becomes useful only when it is turned into information. The field of data mining was born to generate meaningful information, or knowledge, from databases. The field is about two decades old. Early pioneers such as U. Fayyad, H. Mannila, G. Piatetsky-Shapiro, G. Djorgovski, W. Frawley, P. Smith and others found that traditional statistical techniques were not adequate to handle massive amounts of data. They recognized the need for better, faster and cheaper ways to deal with the dramatic increase in the amount of data.
Nowadays, besides the huge number of databases being created and accumulated at a dramatic speed, data is no longer restricted to numeric or character values, especially in biomedicine. Advanced medical devices and database management systems enable the integration of different types of high-dimensional multimedia data (e.g. text, image, audio and video) under the same umbrella. Establishing a methodology for knowledge discovery and management of large amounts of heterogeneous data has therefore become a major priority of research. Various techniques are used in different areas of biomedicine, including genomics, proteomics, medical diagnosis, effective drug design and the pharmaceutical industry.
In this paper, we first give a brief outline of what data mining is, its role in the knowledge discovery process, and the basic principles of some commonly used data mining techniques. Next, we present the results of our investigation into applications of data mining in biomedicine, covering biology, medicine, pharmacy and health care. Lastly, we discuss some difficulties of data mining in biomedicine and possible directions for future development.
2. What is Data Mining?
Data mining (DM) is the process of finding patterns, associations or relationships among data using different analytical techniques that involve the creation of a model; the results become useful information or knowledge. DM can also be expressed as:
Nontrivial extraction of implicit, previously unknown, and potentially useful information from data; and
Making sense of large amounts of mostly unsuper-