Effective Classification Algorithms to Predict the Accuracy of Tuberculosis - A Machine Learning Approach
Tuberculosis is a disease caused by mycobacterium which can affect virtually all organs, not sparing even the relatively inaccessible sites. India has the world’s highest burden of tuberculosis (TB) with million estimated incident cases per year. Studies suggest that active tuberculosis accelerates the progression of Human Immunodeficiency Virus (HIV) infection. Tuberculosis is much more likely to be a fatal disease among HIV-infected persons than among persons without HIV infection. Diagnosis of pulmonary tuberculosis has always been a problem. Classification of medical data is an important task in the prediction of any disease, and it helps doctors in their diagnostic decisions. In this paper we propose a machine learning approach to compare the performance of both basic learning classifiers and ensembles of classifiers on tuberculosis data. The classification models were trained using real data collected from a city hospital. The trained models were then used to predict tuberculosis in two categories: Pulmonary Tuberculosis (PTB) and Retroviral PTB (RPTB), i.e. TB along with Acquired Immune Deficiency Syndrome (AIDS). The prediction accuracy of the classifiers was evaluated using 10-fold cross-validation, and the results were compared to find the best prediction accuracy. The results indicate that Support Vector Machine (SVM) performs best among the basic learning classifiers and Random Forest among the ensembles, each with an accuracy of 99.14%. Other measures such as specificity, sensitivity, F-measure and ROC area have also been used in the comparison.

(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 9, No. 7, July 2011
Asha.T, Dept. of Info. Science & Engg., Bangalore Institute of Technology, Bangalore, INDIA
S. Natarajan, Dept. of Info. Science & Engg., P.E.S. Institute of Technology, Bangalore, INDIA
K.N.B. Murthy, Dept. of Info. Science & Engg., P.E.S. Institute of Technology, Bangalore, INDIA
Keywords: Machine learning; Tuberculosis; Classification; PTB; Retroviral PTB

I. INTRODUCTION
There is an explosive growth of biomedical data, ranging from those collected in pharmaceutical studies and cancer therapy investigations to those identified in genomics and proteomics research. The rapid progress in data mining research has led to the development of efficient and scalable methods to discover knowledge from these data. Medical data mining is an active research area within data mining, since medical databases have accumulated large quantities of information about patients and their clinical conditions. Relationships and patterns hidden in this data can provide new medical knowledge, as has been shown in a number of medical data mining applications.

Data classification using knowledge obtained from known historical data has been one of the most intensively studied subjects in statistics, decision science and computer science. Data mining techniques have been applied to medical services in several areas, including prediction of the effectiveness of surgical procedures, medical tests and medication, and the discovery of relationships among clinical and diagnosis data. To help clinicians diagnose the type of disease, computerized data mining and decision support tools are used; these help clinicians process the large amount of data available from previously solved cases and suggest a probable diagnosis based on the values of several important attributes. There have been numerous comparisons of the different classification and prediction methods, and the matter remains a research topic. No single method has been found to be superior to all others on all data sets.

India has the world’s highest burden of tuberculosis (TB) with million estimated incident cases per year. It also ranks [20] among the countries with the world’s highest HIV burden, with an estimated 2.3 million persons living with HIV/AIDS. Tuberculosis is much more likely to be a fatal disease among HIV-infected persons than among persons without HIV infection. It is a disease caused by mycobacterium which can affect virtually all organs, not sparing even the relatively inaccessible sites. The microorganisms usually enter the body by inhalation through the lungs. They spread from the initial location in the lungs to other parts of the body via the blood stream. They present a diagnostic dilemma even for physicians with a great deal of experience in this disease.

II. RELATED WORK
Orhan Er and Temurtas [1] present a study on tuberculosis diagnosis carried out with the help of Multilayer Neural Networks (MLNNs). For this purpose, an MLNN with two hidden layers and a genetic algorithm as the training algorithm was used. A data mining approach was adopted to classify genotypes of Mycobacterium tuberculosis using the C4.5 algorithm [2]. Rethabile Khutlang et al. present methods for the
http://sites.google.com/site/ijcsis/
ISSN 1947-5500
automated identification of Mycobacterium tuberculosis in images of Ziehl–Neelsen (ZN) stained sputum smears obtained using a bright-field microscope. They segment candidate bacillus objects using a combination of two-class pixel classifiers [3].

Sejong Yoon and Saejoon Kim [4] propose a mutual information-based Support Vector Machine Recursive Feature Elimination (SVM-RFE) as a classification method with feature selection. Diagnosis of breast cancer using different classification techniques was carried out in [5,6,7,8]. A new constrained-syntax genetic programming algorithm [9] was developed to discover classification rules for diagnosing certain pathologies. Kwokleung Chan et al. [10] used several machine learning and traditional classifiers in the classification of glaucoma and compared their performance using ROC. Various classification algorithms based on statistical and neural network methods were presented and tested for quantitative tissue characterization of diffuse liver disease from ultrasound images [11], and classifiers were compared for sleep apnea [18]. Ranjit Abraham et al. [19] propose a new feature selection algorithm, CHI-WSS, to improve the classification accuracy of Naïve Bayes on medical datasets.

Minou Rabiei et al. [12] use tree-based ensemble classifiers for the diagnosis of excess water production. Their results demonstrate the applicability of this technique to the successful diagnosis of water production problems. Hongqi Li, Haifeng Guo et al. [13] present a comprehensive comparative study on petroleum exploration and production using five feature selection methods (expert judgment, CFS, LVF, Relief-F and SVM-RFE) and fourteen algorithms from five distinct kinds of classification methods: decision tree, artificial neural network, support vector machine (SVM), Bayesian network and ensemble learning.

The paper “Mining Several Data Bases with an Ensemble of Classifiers” [14] analyzes two types of conflicts, one created by data inconsistency within the area of intersection of the databases, and the other created when the meta method selects different data mining methods with inconsistent competence maps for the objects of the intersected part and their combinations, and suggests ways to handle them. Reference [15] studies medical data classification methods, comparing decision tree and system reconstruction analysis as applied to heart disease medical data mining. Under most circumstances, single classifiers such as neural networks, support vector machines and decision trees perform worse than combinations of classifiers. To further enhance performance, a combination of these methods in a multi-level combination scheme was proposed that improves efficiency [16]. Reference [17] demonstrates the use of abductive network classifier committees trained on different features for improving classification accuracy in medical diagnosis.

III. DATA SOURCE
The medical dataset we are classifying includes 700 real records of patients suffering from TB, obtained from a city hospital. The entire dataset is stored in one file. Each record holds the most relevant information about one patient. The doctor's initial queries about symptoms and some required test details of the patients have been taken as the main attributes. In total there are 11 attributes (symptoms) and one class attribute. The attributes considered for each patient are age, chronic cough (weeks), loss of weight, intermittent fever (days), night sweats, sputum, blood cough, chest pain, HIV, radiographic findings, wheezing and class.

Table I shows the names of the 12 attributes considered along with their Data Types (DT). Type N indicates numerical and C indicates categorical.

Table I. List of Attributes and their Data Types
No   Name                    DT
1    Age                     N
2    Chroniccough(weeks)     N
3    WeightLoss              C
4    Intermittentfever       N
5    Nightsweats             C
6    Bloodcough              C
7    Chestpain               C
8    HIV                     C
9    Radiographicfindings    C
10   Sputum                  C
11   Wheezing                C
12   Class                   C

IV. CLASSIFICATION ALGORITHMS
SVM (SMO)
The original SVM algorithm was invented by Vladimir Vapnik. The standard SVM takes a set of input data and predicts, for each given input, which of two possible classes the input is a member of, which makes the SVM a non-probabilistic binary linear classifier.
A support vector machine constructs a hyperplane or set of hyperplanes in a high or infinite dimensional space, which can be used for classification, regression or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class (the so-called functional margin), since in general the larger the margin the lower the generalization error of the classifier.

K-Nearest Neighbors (IBK)
The k-nearest neighbors algorithm (k-NN) [22] is a method for classifying objects based on closest training examples in the
feature space. k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common amongst its k nearest neighbors (k is a positive integer, typically small).

Naive Bayesian Classifier (Naive Bayes)
This is a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions [23]. In probability theory, Bayes' theorem shows how one conditional probability (such as the probability of a hypothesis given observed evidence) depends on its inverse (in this case, the probability of that evidence given the hypothesis). In more technical terms, the theorem expresses the posterior probability (i.e. after evidence E is observed) of a hypothesis H in terms of the prior probabilities of H and E and the probability of E given H, that is, P(H|E) = P(E|H) P(H) / P(E). It implies that evidence has a stronger confirming effect if it was more unlikely before being observed.

C4.5 Decision Tree (J48 in Weka)
C4.5, developed by Quinlan, is perhaps the most popular tree classifier [21]. A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. The Weka classifier package has its own version of C4.5 known as J48, an optimized implementation of C4.5 revision 8.

Bagging (bagging)
Bagging (Bootstrap aggregating) was proposed by Leo Breiman in 1994 to improve classification by combining classifications of randomly generated training sets. The concept of bagging (voting for classification, averaging for regression-type problems with continuous dependent variables of interest) applies to the area of predictive data mining to combine the predicted classifications from multiple models, or from the same type of model for different learning data. It generates multiple training sets by sampling with replacement from the available training data and assigns a vote to each classification.

Adaboost (AdaBoost M1)
AdaBoost is an algorithm for constructing a “strong” classifier as a linear combination of simple “weak” classifiers. Instead of resampling, each training sample carries a weight that determines its probability of being selected for a training set. The final classification is based on a weighted vote of the weak classifiers. AdaBoost is sensitive to noisy data and outliers. However, in some problems it can be less susceptible to overfitting than most learning algorithms.

Random forest (or random forests)
The algorithm for inducing a random forest was developed by Leo Breiman [25]. The term came from random decision forests, first proposed by Tin Kam Ho of Bell Labs in 1995. It is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. It is a popular algorithm which builds a randomized decision tree in each iteration of the bagging algorithm and often produces excellent predictors.

V. EXPERIMENTAL SETUP
The open source tool Weka was used in different phases of the experiment. Weka is a collection of state-of-the-art machine learning algorithms [26] for a wide range of data mining tasks such as data preprocessing, attribute selection, clustering, and classification. Weka has been used in prior research both in the field of clinical data mining and in bioinformatics.

Weka has four main graphical user interfaces (GUIs), of which the principal ones are the Explorer and the Experimenter. Our experiment has been run under both the Explorer and the Experimenter GUI of Weka. In the Explorer we can flip back and forth between the results we have obtained, evaluate the models that have been built on different datasets, and visualize graphically both the models and the datasets themselves, including any classification errors the models make. The Experimenter, on the other hand, allows us to automate the process by making it easy to run classifiers and filters with different parameter settings on a corpus of datasets, collect performance statistics, and perform significance tests. Advanced users can employ the Experimenter to distribute the computing load across multiple machines using Java remote method invocation.

A. Cross-Validation
Cross-validation with 10 folds has been used for evaluating the classifier models. Cross-Validation (CV) is the standard data mining method for evaluating the performance of classification algorithms, mainly the error rate of a learning technique. In CV a dataset is partitioned into n folds, where each fold is used for testing and the remainder for training. The procedure of testing and training is repeated n times so that each partition or fold is used once for testing. The standard way of predicting the error rate of a learning technique given a single, fixed sample of data is to use stratified 10-fold cross-validation. Stratification means making sure that, when sampling is done, each class is properly represented in both the training and the test datasets. This is achieved by sampling randomly within each class when forming the n folds.

In a stratified 10-fold cross-validation the data is divided randomly into 10 parts in which each class is represented in approximately the same proportions as in the full dataset. Each part is held out in turn and the learning scheme is trained on the remaining nine-tenths; then its error rate is calculated on the holdout set. The learning procedure is executed a total of 10 times on different training sets, and finally the 10 error rates are averaged to yield an overall error estimate. When seeking an accurate error estimate, it is standard procedure to repeat the CV process 10 times. This means invoking the learning algorithm 100 times. Given two models M1 and M2 with different accuracies tested on different instances of a data set, to say which model is best we need to measure the confidence level of each and perform significance tests.
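The stratified 10-fold procedure described in this section can be sketched in plain Python. This is only an illustration of the partitioning and averaging steps under stated assumptions, not Weka's implementation: the function names `stratified_folds`, `cross_validated_error` and `train_majority` are hypothetical, the toy labels are not the paper's records, and the majority-class baseline merely stands in for a real classifier.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split record indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's shuffled records round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def cross_validated_error(records, labels, train_fn, k=10, seed=0):
    """Average the k per-fold error rates.

    train_fn(train_records, train_labels) must return a predict(record) callable.
    Assumes the largest class has at least k records, so no fold is empty.
    """
    folds = stratified_folds(labels, k, seed)
    error_rates = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        predict = train_fn([records[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        wrong = sum(predict(records[i]) != labels[i] for i in held_out)
        error_rates.append(wrong / len(held_out))
    return sum(error_rates) / k


def train_majority(train_records, train_labels):
    """Toy baseline: always predict the most frequent training label."""
    majority = max(set(train_labels), key=train_labels.count)
    return lambda record: majority


# Toy labels only; these are not the paper's records.
labels = ["PTB"] * 45 + ["RPTB"] * 25
records = list(range(len(labels)))
folds = stratified_folds(labels)
error = cross_validated_error(records, labels, train_majority)
```

Repeating the call with ten different `seed` values and averaging the results would correspond to the repeated 10 x 10-fold scheme mentioned above.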
VI. PERFORMANCE MEASURES
Supervised Machine Learning (ML) has several ways of
evaluating the performance of learning algorithms and the
classifiers they produce. Measures of the quality of
classification are built from a confusion matrix which records
correctly and incorrectly recognized examples for each class.
Table II presents a confusion matrix for binary classification,
where TP are true positive, FP false positive, FN false
negative, and TN true negative counts.
Figure 1. Comparison of average F-measure and ROC area

Table II. Confusion matrix
                           Predicted Label
                           Positive               Negative
Known Label   Positive     True Positive (TP)     False Negative (FN)
              Negative     False Positive (FP)    True Negative (TN)
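The four counts in Table II are the inputs to every measure used in this paper. A minimal Python sketch using the standard definitions of these measures follows; the function name `binary_measures` and the example counts are hypothetical and chosen only to illustrate the arithmetic.

```python
def binary_measures(tp, fp, fn, tn):
    """Return the standard confusion-matrix measures as fractions in [0, 1]."""
    recall = tp / (tp + fn)        # true positive rate / sensitivity
    precision = tp / (tp + fp)
    return {
        "sensitivity": recall,
        "fpr": fp / (fp + tn),
        "specificity": tn / (tn + fp),
        "precision": precision,
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "f_measure": 2 * recall * precision / (recall + precision),
    }


# Hypothetical counts, not taken from the paper's experiments.
m = binary_measures(tp=90, fp=5, fn=10, tn=95)
```

Note that the false positive rate and the specificity always sum to 1, since both are computed over the same negative-labeled instances.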
The different measures used with the confusion matrix are as follows. True positive rate (TPR), also called recall or sensitivity, is the percentage of positive labeled instances that were predicted as positive, given as TP / (TP + FN). False positive rate (FPR) is the percentage of negative labeled instances that were predicted as positive, given as FP / (TN + FP). Precision is the percentage of positive predictions that are correct, given as TP / (TP + FP). Specificity is the percentage of negative labeled instances that were predicted as negative, given as TN / (TN + FP). Accuracy is the percentage of predictions that are correct, given as (TP + TN) / (TP + TN + FP + FN). F-measure is the harmonic mean of precision and recall, given as (2 x Recall x Precision) / (Recall + Precision).

VII. RESULTS AND DISCUSSIONS
Results show that certain algorithms demonstrate superior detection performance compared to others. Table III lists the evaluation measures obtained for the various classification algorithms. These measures are the main criteria for choosing the best algorithm for the given category in bioinformatics. Among single classifiers, SVM (99.14%) and C4.5 decision trees (99%) give the best prediction accuracy; among ensembles, Random Forest (99.14%) is considered the best.

Other measures such as F-measure and ROC area of the above classifiers are compared graphically in Figure 1, which displays the average F-measure and ROC area over both classes. The prediction accuracy of these classifiers is shown in Figure 2.

Figure 2. Comparing the prediction accuracy of all classifiers

Conclusions
Tuberculosis is an important health concern as it is also associated with AIDS. Retrospective studies of tuberculosis suggest that active tuberculosis accelerates the progression of HIV infection. Recently, intelligent methods such as Artificial Neural Networks (ANN) have been intensively used for classification tasks. In this article we have proposed data mining approaches to classify tuberculosis using both basic and ensemble classifiers. Finally, two models for algorithm selection are proposed with great promise for performance improvement. Among the algorithms evaluated, SVM and Random Forest proved to be the best methods.

Acknowledgment
Our thanks to KIMS Hospital, Bangalore for providing the valuable real tuberculosis data, and to principal Dr. Sudharshan for giving permission to collect data from the hospital.
Table III. Performance comparison of various classifiers

                                                                                Disease category (class)
Classifier category          Classifier model            Measure                PTB       RPTB
Basic learning classifiers   SVM (SMO)                   TPR / Sensitivity      98.9%     99.6%
                                                         FPR                    0.004     0.011
                                                         Specificity            99.6%     98.9%
                                                         Prediction accuracy    99.14%
                             K-NN (IBK)                  TPR / Sensitivity      99.1%     96.9%
                                                         FPR                    0.03      0.008
                                                         Specificity            96.9%     99.1%
                                                         Prediction accuracy    98.4%
                             Naive Bayes                 TPR / Sensitivity      96.4%     96.5%
                                                         FPR                    0.035     0.037
                                                         Specificity            96.5%     96.4%
                                                         Prediction accuracy    96.4%
                             C4.5 Decision Trees (J48)   TPR / Sensitivity      98.5%     100%
                                                         FPR                    0         0.015
                                                         Specificity            100%      98.5%
                                                         Prediction accuracy    99%
Ensemble classifiers         Bagging                     TPR / Sensitivity      98.5%     99.6%
                                                         FPR                    0.004     0.015
                                                         Specificity            99.6%     98.5%
                                                         Prediction accuracy    98.85%
                             Adaboost (AdaboostM1)       TPR / Sensitivity      98.5%     100%
                                                         FPR                    0         0.015
                                                         Specificity            100%      98.5%
                                                         Prediction accuracy    99%
                             Random Forest               TPR / Sensitivity      98.9%     99.6%
                                                         FPR                    0.004     0.011
                                                         Specificity            99.6%     98.9%
                                                         Prediction accuracy    99.14%
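In Table III the specificity reported for one class equals the sensitivity reported for the other, which is what one expects in a two-class problem when each class is treated in turn as the positive one. The following Python sketch shows why; the function name `per_class_measures` and the counts are hypothetical and are not the paper's confusion matrices.

```python
def per_class_measures(matrix, classes):
    """matrix[i][j] = number of records of true class i predicted as class j."""
    total = sum(sum(row) for row in matrix)
    report = {}
    for c, name in enumerate(classes):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                  # class c records predicted as another class
        fp = sum(row[c] for row in matrix) - tp   # other records predicted as class c
        tn = total - tp - fn - fp
        report[name] = {
            "tpr": tp / (tp + fn),
            "fpr": fp / (fp + tn),
            "specificity": tn / (tn + fp),
        }
    return report


# Hypothetical counts: rows are true PTB and true RPTB, columns are predictions.
counts = [[95, 5],
          [2, 48]]
report = per_class_measures(counts, ["PTB", "RPTB"])
```

With only two classes, the true negatives of PTB are exactly the true positives of RPTB, so the PTB specificity and the RPTB sensitivity are the same ratio.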
REFERENCES
[1] Orhan Er, Feyzullah Temurtas and A.C. Tantrikulu, “Tuberculosis disease diagnosis using Artificial Neural Networks”, Journal of Medical Systems, Springer, DOI 10.1007/s10916-008-9241-x, online, 2008.
[2] M. Sebban, I. Mokrousov, N. Rastogi and C. Sola, “A data-mining approach to spacer oligonucleotide typing of Mycobacterium tuberculosis”, Bioinformatics, Oxford University Press, Vol. 18, issue 2, pp. 235-243, 2002.
[3] Rethabile Khutlang, Sriram Krishnan, Ronald Dendere, Andrew Whitelaw, Konstantinos Veropoulos, Genevieve Learmonth and Tania S. Douglas, “Classification of Mycobacterium tuberculosis in Images of ZN-Stained Sputum Smears”, IEEE Transactions on Information Technology in Biomedicine, Vol. 14, No. 4, July 2010.
[4] Sejong Yoon and Saejoon Kim, “Mutual information-based SVM-RFE for diagnostic Classification of digitized mammograms”, Pattern Recognition Letters, Elsevier, volume 30, issue 16, pp. 1489–1495, December 2009.
[5] Nicandro Cruz-Ramírez, Hector-Gabriel Acosta-Mesa, Humberto Carrillo-Calvet and Rocío-Erandi Barrientos-Martínez, “Discovering interobserver variability in the cytodiagnosis of breast cancer using decision trees and Bayesian networks”, Applied Soft Computing, Elsevier, volume 9, issue 4, pp. 1331–1342, September 2009.
[6] Liyang Wei, Yongyi Yang and Robert M. Nishikawa, “Microcalcification classification assisted by content-based image retrieval for breast cancer diagnosis”, Pattern Recognition, Elsevier, volume 42, issue 6, pp. 1126–1132, June 2009.
[7] Abdelghani Bellaachia and Erhan Guven, “Predicting breast cancer survivability using Data Mining Techniques”, Artificial Intelligence in Medicine, Elsevier, Volume 34, Issue 2, pp. 113-127, June 2005.
[8] Maria-Luiza Antonie, Osmar R. Zaïane and Alexandru Coman, “Application of data mining techniques for medical image classification”, in Proceedings of Second International Workshop on Multimedia Data Mining (MDM/KDD’2001) in conjunction with Seventh ACM SIGKDD, pp. 94-101, 2000.
[9] Celia C. Bojarczuk, Heitor S. Lopes and Alex A. Freitas, “Data Mining with Constrained-Syntax Genetic Programming: Applications in Medical Data Set”, Artificial Intelligence in Medicine, Elsevier, volume 30, issue 1, pp. 27-48, 2004.
[10] Kwokleung Chan, Te-Won Lee, Pamela A. Sample, Michael H. Goldbaum, Robert N. Weinreb and Terrence J. Sejnowski, “Comparison of Machine Learning and Traditional Classifiers in Glaucoma Diagnosis”, IEEE Transactions on Biomedical Engineering, volume 49, No. 9, September 2002.
[11] Yasser M. Kadah, Aly A. Farag, Jacek M. Zurada, Ahmed M. Badawi and Abou-Bakr M. Youssef, “Classification algorithms for Quantitative Tissue Characterization of diffuse liver disease from ultrasound images”, IEEE Transactions on Medical Imaging, volume 15, No. 4, August 1996.
[12] Minou Rabiei and Ritu Gupta, “Excess Water Production Diagnosis in Oil Fields using Ensemble Classifiers”, in proc. of International Conference on Computational Intelligence and Software Engineering, IEEE, pp. 1-4, 2009.
[13] Hongqi Li, Haifeng Guo, Haimin Guo and Zhaoxu Meng, “Data Mining Techniques for Complex Formation Evaluation in Petroleum Exploration and Production: A Comparison of Feature Selection and Classification Methods”, in proc. 2008 IEEE Pacific-Asia Workshop on Computational Intelligence and Industrial Application, volume 01, pp. 37-43, 2008.
[14] Seppo Puuronen, Vagan Terziyan and Alexander Logvinovsky, “Mining several data bases with an Ensemble of classifiers”, in proc. 10th International Conference on Database and Expert Systems Applications, Vol. 1677, pp. 882–891, 1999.
[15] Tzung-I Tang, Gang Zheng, Yalou Huang and Guangfu Shu, “A comparative study of medical data classification methods based on Decision Tree and System Reconstruction Analysis”, IEMS, Vol. 4, issue 1, pp. 102-108, June 2005.
[16] Tsirogiannis, G.L., Frossyniotis, D., Stoitsis, J., Golemati, S., Stafylopatis, A. and Nikita, K.S., “Classification of medical data with a robust multi-level combination scheme”, in proc. 2004 IEEE International Joint Conference on Neural Networks, volume 3, pp. 2483-2487, 25-29 July 2004.
[17] R.E. Abdel-Aal, “Improved classification of medical data using abductive network committees trained on different feature subsets”, Computer Methods and Programs in Biomedicine, volume 80, Issue 2, pp. 141-153, 2005.
[18] Kemal Polat, Sebnem Yosunkaya and Salih Guines, “Comparison of different classifier algorithms on the Automated Detection of Obstructive Sleep Apnea Syndrome”, Journal of Medical Systems, volume 32, Issue 3, pp. 9129-9, June 2008.
[19] Ranjit Abraham, Jay B. Simha and Iyengar S.S., “Medical datamining with a new algorithm for Feature Selection and Naïve Bayesian classifier”, proceedings of 10th International Conference on Information Technology, IEEE, pp. 44-49, 2007.
[20] HIV Sentinel Surveillance and HIV Estimation, 2006. New Delhi, India: National AIDS Control Organization, Ministry of Health and Family Welfare, Government of India. http://www.nacoonline.org/Quick_Links/HIV_Data/ Accessed 06 February, 2008.
[21] J.R. Quinlan, “Induction of Decision Trees”, Machine Learning 1, Kluwer Academic Publishers, Boston, pp. 81-106, 1986.
[22] Thomas M. Cover and Peter E. Hart, “Nearest neighbor pattern classification”, IEEE Transactions on Information Theory, volume 13, issue 1, pp. 21-27, 1967.
[23] Irina Rish, “An empirical study of the naïve Bayes classifier”, IJCAI 2001, workshop on empirical methods in artificial intelligence, 2001 (available online).
[24] J. R. Quinlan, “Bagging, boosting, and C4.5”, in AAAI/IAAI: Proceedings of the 13th National Conference on Artificial Intelligence and 8th Innovative Applications of Artificial Intelligence Conference, Portland, Oregon, AAAI Press / The MIT Press, Vol. 1, pp. 725-730, 1996.
[25] Leo Breiman, “Random Forests”, Machine Learning 45 (1): 5–32, doi:10.1023/A:1010933404324, 2001.
[26] Weka – Data Mining Machine Learning Software, http://www.cs.waikato.ac.nz/ml/.
[27] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2006.
[28] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Second Edition, Morgan Kaufmann Publishers, 2005.

AUTHORS PROFILE
Mrs. Asha.T obtained her Bachelors and Masters in Engg. from Bangalore University, Karnataka, India. She is pursuing her research leading to a Ph.D. at Visveswaraya Technological University under the guidance of Dr. S. Natarajan and Dr. K.N.B. Murthy. She has over 16 years of teaching experience and is currently working as Assistant Professor in the Dept. of Information Science & Engg., B.I.T., Karnataka, India. Her research interests are in Data Mining, Medical Applications, Pattern Recognition, and Artificial Intelligence.

Dr. S. Natarajan holds a Ph.D. (Remote Sensing) from JNTU Hyderabad, India. His experience spans 33 years in R&D and 10 years in teaching. He worked in the Defence Research and Development Laboratory (DRDL), Hyderabad, India for five years and later worked for twenty-eight years in the National Remote Sensing Agency, Hyderabad, India. He has over 50 publications in peer-reviewed conferences and journals. His areas of interest are Soft Computing, Data Mining and Geographical Information Systems.

Dr. K. N. B. Murthy holds a Bachelors in Engineering from the University of Mysore, a Masters from IISc, Bangalore and a Ph.D. from IIT, Chennai, India. He has over 30 years of experience in teaching, training, industry, administration, and research. He has authored over 60 papers in national and international journals and conferences, is a peer reviewer for journal and conference papers of national and international repute, and has authored a book. He is a member of several academic committees: Executive Council, Academic Senate, University Publication Committee, BOE & BOS, Local Inquiry Committee of VTU, Governing Body Member of BITES, and Founding Member of the Creativity and Innovation Platform of Karnataka. Currently he is the Principal & Director of P.E.S. Institute of Technology, Bangalore, India. His research interests include Parallel Computing, Computer Networks and Artificial Intelligence.