Effective Classification Algorithms to Predict the Accuracy of Tuberculosis - A Machine Learning Approach
Tuberculosis is a disease caused by mycobacterium which can affect virtually all organs, not sparing even the relatively inaccessible sites. India has the world's highest burden of tuberculosis (TB), with millions of estimated incident cases per year. Studies suggest that active tuberculosis accelerates the progression of Human Immunodeficiency Virus (HIV) infection. Tuberculosis is much more likely to be fatal among HIV-infected persons than among persons without HIV infection. Diagnosis of pulmonary tuberculosis has always been a problem. Classification of medical data is an important task in the prediction of any disease, and it can even help doctors in their diagnosis decisions. In this paper we propose a machine learning approach to compare the performance of basic learning classifiers and ensembles of classifiers on tuberculosis data. The classification models were trained on real data collected from a city hospital. The trained models were then used to predict tuberculosis in two categories: Pulmonary Tuberculosis (PTB) and Retroviral PTB (RPTB), i.e. TB along with Acquired Immune Deficiency Syndrome (AIDS). The prediction accuracy of the classifiers was evaluated using 10-fold cross-validation, and the results were compared to obtain the best prediction accuracy. The results indicate that the Support Vector Machine (SVM) performs best among the basic learning classifiers and Random Forest best among the ensembles, each with an accuracy of 99.14%. Various other measures such as specificity, sensitivity, F-measure and ROC area have been used in the comparison.
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 7, July 2011

Asha. T, Dept. of Info. Science & Engg., Bangalore Institute of Technology, Bangalore, INDIA
S. Natarajan, Dept. of Info. Science & Engg., P.E.S. Institute of Technology, Bangalore, INDIA
K.N.B. Murthy, Dept. of Info. Science & Engg., P.E.S. Institute of Technology, Bangalore, INDIA

Keywords: Machine learning; Tuberculosis; Classification; PTB; Retroviral PTB

I. INTRODUCTION

There is an explosive growth of bio-medical data, ranging from data collected in pharmaceutical studies and cancer therapy investigations to data identified in genomics and proteomics research. The rapid progress of data mining research has led to the development of efficient and scalable methods to discover knowledge from these data. Medical data mining is an active research area within data mining, since medical databases have accumulated large quantities of information about patients and their clinical conditions. Relationships and patterns hidden in these data can provide new medical knowledge, as has been proved in a number of medical data mining applications.

Data classification using knowledge obtained from known historical data has been one of the most intensively studied subjects in statistics, decision science and computer science. Data mining techniques have been applied to medical services in several areas, including prediction of the effectiveness of surgical procedures, medical tests and medication, and the discovery of relationships among clinical and diagnosis data. To help clinicians diagnose the type of disease, computerized data mining and decision support tools are used; these can process the huge amount of data available from previously solved cases and suggest a probable diagnosis based on the values of several important attributes. There have been numerous comparisons of the different classification and prediction methods, and the matter remains a research topic. No single method has been found to be superior over all others for all data sets.

India has the world's highest burden of tuberculosis (TB), with millions of estimated incident cases per year. It also has one of the world's highest HIV burdens, with an estimated 2.3 million persons living with HIV/AIDS. Tuberculosis is much more likely to be fatal among HIV-infected persons than among persons without HIV infection. It is a disease caused by mycobacterium which can affect virtually all organs, not sparing even the relatively inaccessible sites. The microorganisms usually enter the body by inhalation through the lungs. They spread from the initial location in the lungs to other parts of the body via the blood stream. They present a diagnostic dilemma even for physicians with a great deal of experience in this disease.

II. RELATED WORK

Orhan Er and Temurtas present a study on tuberculosis diagnosis carried out with the help of Multilayer Neural Networks (MLNNs); an MLNN with two hidden layers, trained with a genetic algorithm, was used for this purpose. A data mining approach was adopted to classify genotypes of Mycobacterium tuberculosis using the C4.5 algorithm. Rethabile Khutlang et al. present methods for the automated identification of Mycobacterium tuberculosis in images of Ziehl-Neelsen (ZN) stained sputum smears obtained using a bright-field microscope; they segment candidate bacillus objects using a combination of two-class pixel classifiers. Sejong Yoon and Saejoon Kim propose a mutual information-based Support Vector Machine Recursive Feature Elimination (SVM-RFE) as a classification method with feature selection. Diagnosis of breast cancer using different classification techniques has been carried out [5,6,7,8]. A new constrained-syntax genetic programming algorithm was developed to discover classification rules for diagnosing certain pathologies. Kwokleung Chan et al. used several machine learning and traditional classifiers for the classification of glaucoma disease and compared their performance using ROC analysis. Various classification algorithms based on statistical and neural network methods have been presented and tested for quantitative tissue characterization of diffuse liver disease from ultrasound images, and classifiers have been compared for the detection of sleep apnea. Ranjit Abraham et al. propose a new feature selection algorithm, CHI-WSS, to improve the classification accuracy of Naive Bayes on medical datasets. Minou Rabiei et al. use tree-based ensemble classifiers for the diagnosis of excess water production; their results demonstrate the applicability of this technique to the successful diagnosis of water production problems. Hongqi Li, Haifeng Guo et al. present a comprehensive comparative study on petroleum exploration and production using five feature selection methods (expert judgment, CFS, LVF, Relief-F and SVM-RFE) and fourteen algorithms from five distinct kinds of classification methods (decision trees, artificial neural networks, support vector machines (SVM), Bayesian networks and ensemble learning). The paper "Mining Several Data Bases with an Ensemble of Classifiers" analyzes two types of conflicts, one created by data inconsistency within the intersection of the databases, and the other created when the meta-method selects different data mining methods with inconsistent competence maps for the objects of the intersected part; it suggests ways to handle these conflicts and their combinations. Another referenced paper studies medical data classification methods, comparing decision trees and system reconstruction analysis as applied to heart disease medical data mining. Under most circumstances single classifiers, such as neural networks, support vector machines and decision trees, exhibit inferior performance; to further enhance performance, a multi-level combination scheme of these methods was proposed that improves efficiency. A further paper demonstrates the use of abductive network classifier committees trained on different features for improving classification accuracy in medical diagnosis.

III. DATA SOURCE

The medical dataset we are classifying includes 700 real records of patients suffering from TB, obtained from a city hospital. The entire dataset is put in one file with many records. Each record corresponds to the most relevant information of one patient. The doctor's initial queries on symptoms, together with some required test details of the patients, have been considered as the main attributes. In total there are 11 attributes (symptoms) and one class attribute. The symptoms of each patient, namely age, chronic cough (weeks), loss of weight, intermittent fever (days), night sweats, sputum, blood cough, chest pain, HIV, radiographic findings and wheezing, together with the class, are the attributes considered. Table I shows the names of the 12 attributes along with their data types (DT); type N indicates numerical and C categorical.

Table I. List of Attributes and their Datatypes

No   Name                        DT
1    Age                         N
2    Chronic cough (weeks)       N
3    Weight loss                 C
4    Intermittent fever (days)   N
5    Night sweats                C
6    Blood cough                 C
7    Chest pain                  C
8    HIV                         C
9    Radiographic findings       C
10   Sputum                      C
11   Wheezing                    C
12   Class                       C

IV. CLASSIFICATION ALGORITHMS

SVM (SMO)
The original SVM algorithm was invented by Vladimir Vapnik. The standard SVM takes a set of input data and predicts, for each given input, which of two possible classes the input is a member of, which makes the SVM a non-probabilistic binary linear classifier. A support vector machine constructs a hyperplane or a set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class (the so-called functional margin), since in general the larger the margin, the lower the generalization error of the classifier.
K-Nearest Neighbors (IBK)
The k-nearest neighbors algorithm (k-NN) is a method for classifying objects based on the closest training examples in the feature space. k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. Here an object is classified by a majority vote of its neighbors, the object being assigned to the class most common amongst its k nearest neighbors (k is a positive integer, typically small).

Naive Bayesian Classifier (Naive Bayes)
The naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions. In probability theory, Bayes' theorem shows how one conditional probability (such as the probability of a hypothesis given observed evidence) depends on its inverse (in this case, the probability of that evidence given the hypothesis). In more technical terms, the theorem expresses the posterior probability (i.e. after evidence E is observed) of a hypothesis H in terms of the prior probabilities of H and E, and the probability of E given H. It implies that evidence has a stronger confirming effect if it was more unlikely before being observed.
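As a worked illustration of the posterior computation described above, the sketch below applies Bayes' theorem with the naive independence assumption to two hypothetical binary symptoms; every probability is invented for illustration and is not an estimate from the paper's dataset.

```python
# Naive Bayes posterior for a two-class problem with two binary features:
# P(H | e1, e2) is proportional to P(H) * P(e1 | H) * P(e2 | H).
# All probabilities below are hypothetical.
priors = {"PTB": 0.6, "RPTB": 0.4}
likelihoods = {
    # P(symptom present | class)
    "PTB":  {"night_sweats": 0.7, "hiv_positive": 0.1},
    "RPTB": {"night_sweats": 0.6, "hiv_positive": 0.9},
}

def posterior(evidence):
    # evidence: dict mapping symptom -> True/False (present/absent)
    scores = {}
    for cls, prior in priors.items():
        p = prior
        for symptom, present in evidence.items():
            like = likelihoods[cls][symptom]
            p *= like if present else (1.0 - like)
        scores[cls] = p
    total = sum(scores.values())
    return {cls: p / total for cls, p in scores.items()}  # normalize

post = posterior({"night_sweats": True, "hiv_positive": True})
print(post)
```

With these invented numbers the posterior shifts strongly toward RPTB: night sweats are common under both classes and barely move the estimate, while HIV positivity is rare under PTB, echoing the remark above that evidence which was unlikely beforehand has a stronger confirming effect.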
C4.5 Decision Tree (J48 in Weka)
Perhaps the most popular tree classifier is the C4.5 algorithm, developed by Quinlan. A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs and utility. The Weka classifier package has its own version of C4.5 known as J48, an optimized implementation of C4.5 revision 8.

Bagging (Bagging)
Bagging (Bootstrap aggregating) was proposed by Leo Breiman in 1994 to improve classification by combining the classifications of randomly generated training sets. The concept of bagging (voting for classification, averaging for regression-type problems with continuous dependent variables of interest) applies to the area of predictive data mining: it combines the predicted classifications from multiple models, or from the same type of model trained on different learning data. It is a technique that generates multiple training sets by sampling with replacement from the available training data and assigns a vote to each classification.

AdaBoost (AdaBoostM1)
AdaBoost is an algorithm for constructing a "strong" classifier as a linear combination of "simple" "weak" classifiers. Instead of resampling, each training sample uses a weight to determine its probability of being selected for a training set. The final classification is based on a weighted vote of the weak classifiers. AdaBoost is sensitive to noisy data and outliers; however, in some problems it can be less susceptible to the overfitting problem than most learning algorithms.

Random Forest
The algorithm for inducing a random forest was developed by Leo Breiman; the term came from "random decision forests", first proposed by Tin Kam Ho of Bell Labs in 1995. It is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. It is a popular algorithm which builds a randomized decision tree in each iteration of the bagging algorithm and often produces excellent predictors.

V. EXPERIMENTAL SETUP

The open source tool Weka was used in different phases of the experiment. Weka is a collection of state-of-the-art machine learning algorithms for a wide range of data mining tasks such as data preprocessing, attribute selection, clustering and classification. Weka has been used in prior research both in the field of clinical data mining and in bioinformatics.

Weka has four main graphical user interfaces (GUIs), of which the principal ones are the Explorer and the Experimenter. Our experiment has been tried under both the Explorer and Experimenter GUIs of Weka. In the Explorer we can flip back and forth between the results we have obtained, evaluate the models that have been built on different datasets, and visualize graphically both the models and the datasets themselves, including any classification errors the models make. The Experimenter, on the other hand, allows us to automate the process by making it easy to run classifiers and filters with different parameter settings on a corpus of datasets, collect performance statistics, and perform significance tests. Advanced users can employ the Experimenter to distribute the computing load across multiple machines using Java Remote Method Invocation.

A. Cross-Validation

Cross-validation with 10 folds has been used for evaluating the classifier models. Cross-Validation (CV) is the standard data mining method for evaluating the performance of classification algorithms, mainly to evaluate the error rate of a learning technique. In CV a dataset is partitioned into n folds, where each fold is used for testing while the remainder is used for training. The procedure of testing and training is repeated n times so that each partition or fold is used once for testing. The standard way of predicting the error rate of a learning technique given a single, fixed sample of data is to use stratified 10-fold cross-validation. Stratification implies making sure that, when sampling is done, each class is properly represented in both the training and the test datasets; this is achieved by randomly sampling the dataset when making the n-fold partitions.

In a stratified 10-fold cross-validation the data is divided randomly into 10 parts, in each of which the class is represented in approximately the same proportions as in the full dataset. Each part is held out in turn and the learning scheme is trained on the remaining nine-tenths; its error rate is then calculated on the holdout set. The learning procedure is executed a total of 10 times on different training sets, and finally the 10 error rates are averaged to yield an overall error estimate. When seeking an accurate error estimate, it is standard procedure to repeat the CV process 10 times, which means invoking the learning algorithm 100 times. Given two models M1 and M2 with different accuracies tested on different instances of a data set, to say which model is best we need to measure the confidence level of each and perform significance tests.
VI. PERFORMANCE MEASURES

Supervised Machine Learning (ML) has several ways of evaluating the performance of learning algorithms and the classifiers they produce. Measures of the quality of classification are built from a confusion matrix, which records correctly and incorrectly recognized examples for each class. Table II presents a confusion matrix for binary classification, where TP, FP, FN and TN are the true positive, false positive, false negative and true negative counts.

Table II. Confusion matrix

                   Predicted Positive     Predicted Negative
Known Positive     True Positive (TP)     False Negative (FN)
Known Negative     False Positive (FP)    True Negative (TN)

The different measures used with the confusion matrix are: the true positive rate (TPR), also called recall or sensitivity, is the percentage of positive labeled instances that were predicted as positive, given as TP / (TP + FN). The false positive rate (FPR) is the percentage of negative labeled instances that were predicted as positive, given as FP / (TN + FP). Precision is the percentage of positive predictions that are correct, given as TP / (TP + FP). Specificity is the percentage of negative labeled instances that were predicted as negative, given as TN / (TN + FP). Accuracy is the percentage of predictions that are correct, given as (TP + TN) / (TP + TN + FP + FN). The F-measure is the harmonic mean of precision and recall, given as 2 x Recall x Precision / (Recall + Precision).

VII. RESULTS AND DISCUSSIONS

Results show that certain algorithms demonstrate superior detection performance compared to others. Table III lists the evaluation measures used for the various classification algorithms to predict the best accuracy. These measures are the most important criteria for considering a classifier as the best algorithm for the given category in bioinformatics. The prediction accuracies of SVM and C4.5 decision trees among the single classifiers, and of Random Forest among the ensembles, are considered to be the best. Other measures, such as the F-measure and ROC area of the above classifiers, are graphically compared in Figure 1, which displays the average F-measure and ROC area over both classes. The prediction accuracy of these classifiers is shown in Figure 2.

Figure 1. Comparison of average F-measure and ROC area
Figure 2. Comparing the prediction accuracy of all classifiers

Table III. Performance comparison of various classifiers

Classifier category         Classifier model           Measure               PTB      RPTB
Basic learning classifiers  SVM (SMO)                  TPR/Sensitivity       98.9%    99.6%
                                                       FPR                   0.004    0.011
                                                       Specificity           99.6%    98.9%
                                                       Prediction Accuracy   99.14%
                            K-NN (IBK)                 TPR/Sensitivity       99.1%    96.9%
                                                       FPR                   0.03     0.008
                                                       Specificity           96.9%    99.1%
                                                       Prediction Accuracy   98.4%
                            Naive Bayes                TPR/Sensitivity       96.4%    96.5%
                                                       FPR                   0.035    0.037
                                                       Specificity           96.5%    96.4%
                                                       Prediction Accuracy   96.4%
                            C4.5 Decision Trees (J48)  TPR/Sensitivity       98.5%    100%
                                                       FPR                   0        0.015
                                                       Specificity           100%     98.5%
                                                       Prediction Accuracy   99%
Ensemble classifiers        Bagging                    TPR/Sensitivity       98.5%    99.6%
                                                       FPR                   0.004    0.015
                                                       Specificity           99.6%    98.5%
                                                       Prediction Accuracy   98.85%
                            AdaBoost (AdaBoostM1)      TPR/Sensitivity       98.5%    100%
                                                       FPR                   0        0.015
                                                       Specificity           100%     98.5%
                                                       Prediction Accuracy   99%
                            Random Forest              TPR/Sensitivity       98.9%    99.6%
                                                       FPR                   0.004    0.011
                                                       Specificity           99.6%    98.9%
                                                       Prediction Accuracy   99.14%

CONCLUSIONS

Tuberculosis is an important health concern, as it is also associated with AIDS. Retrospective studies of tuberculosis suggest that active tuberculosis accelerates the progression of HIV infection. Recently, intelligent methods such as Artificial Neural Networks (ANNs) have been used intensively for classification tasks. In this article we have proposed data mining approaches to classify tuberculosis using both basic and ensemble classifiers. Finally, two models for algorithm selection are proposed with great promise for performance improvement. Among the algorithms evaluated, SVM and Random Forest proved to be the best methods.

ACKNOWLEDGMENT

Our thanks to KIMS Hospital, Bangalore, for providing the valuable real tuberculosis data, and to the Principal, Dr. Sudharshan, for giving permission to collect data from the hospital.

REFERENCES

Rethabile Khutlang, Sriram Krishnan, Ronald Dendere, Andrew Whitelaw, Konstantinos Veropoulos, Genevieve Learmonth, and
Tania S. Douglas, "Classification of Mycobacterium tuberculosis in Images of ZN-Stained Sputum Smears", IEEE Transactions on Information Technology in Biomedicine, Vol. 14, No. 4, July 2010.
Orhan Er, Feyzullah Temurtas and A.C. Tantrikulu, "Tuberculosis disease diagnosis using Artificial Neural Networks", Journal of Medical Systems, Springer, DOI 10.1007/s10916-008-9241-x, online, 2008.
M. Sebban, I. Mokrousov, N. Rastogi and C. Sola, "A data-mining approach to spacer oligonucleotide typing of Mycobacterium tuberculosis", Bioinformatics, Oxford University Press, Vol. 18, Issue 2, pp. 235-243, 2002.
Sejong Yoon and Saejoon Kim, "Mutual information-based SVM-RFE for diagnostic classification of digitized mammograms", Pattern Recognition Letters, Elsevier, Volume 30, Issue 16, pp. 1489-1495, December 2009.
Nicandro Cruz-Ramirez, Hector-Gabriel Acosta-Mesa, Humberto Carrillo-Calvet and Rocio-Erandi Barrientos-Martinez, "Discovering interobserver variability in the cytodiagnosis of breast cancer using decision trees and Bayesian networks", Applied Soft Computing, Elsevier, Volume 9, Issue 4, pp. 1331-1342, September 2009.
Liyang Wei, Yongyi Yang and Robert M. Nishikawa, "Microcalcification classification assisted by content-based image retrieval for breast cancer diagnosis", Pattern Recognition, Elsevier, Volume 42, Issue 6, pp. 1126-1132, June 2009.
Abdelghani Bellaachia and Erhan Guven, "Predicting breast cancer survivability using Data Mining Techniques", Artificial Intelligence in Medicine, Elsevier, Volume 34, Issue 2, pp. 113-127, June 2005.
Maria-Luiza Antonie, Osmar R. Zaiane and Alexandru Coman, "Application of data mining techniques for medical image classification", in Proceedings of the Second International Workshop on Multimedia Data Mining (MDM/KDD'2001) in conjunction with the Seventh ACM SIGKDD, pp. 94-101, 2001.
Celia C. Bojarczuk, Heitor S. Lopes and Alex A. Freitas, "Data Mining with Constrained-Syntax Genetic Programming: Applications in Medical Data Sets", Artificial Intelligence in Medicine, Elsevier, Volume 30, Issue 1, pp. 27-48, 2004.
Kwokleung Chan, Te-Won Lee, Pamela A. Sample, Michael H. Goldbaum, Robert N. Weinreb and Terrence J. Sejnowski, "Comparison of Machine Learning and Traditional Classifiers in Glaucoma Diagnosis", IEEE Transactions on Biomedical Engineering, Volume 49, No. 9, September 2002.
Yasser M. Kadah, Aly A. Farag, Jacek M. Zurada, Ahmed M. Badawi and Abou-Bakr M. Youssef, "Classification algorithms for quantitative tissue characterization of diffuse liver disease from ultrasound images", IEEE Transactions on Medical Imaging, Volume 15, No. 4, August 1996.
Minou Rabiei and Ritu Gupta, "Excess Water Production Diagnosis in Oil Fields using Ensemble Classifiers", in Proc. International Conference on Computational Intelligence and Software Engineering, IEEE, pp. 1-4, 2009.
Hongqi Li, Haifeng Guo, Haimin Guo and Zhaoxu Meng, "Data Mining Techniques for Complex Formation Evaluation in Petroleum Exploration and Production: A Comparison of Feature Selection and Classification Methods", in Proc. 2008 IEEE Pacific-Asia Workshop on Computational Intelligence and Industrial Application, Volume 01, pp. 37-43, 2008.
Seppo Puuronen, Vagan Terziyan and Alexander Logvinovsky, "Mining several data bases with an ensemble of classifiers", in Proc. 10th International Conference on Database and Expert Systems Applications, Vol. 1677, pp. 882-891, 1999.
Tzung-I Tang, Gang Zheng, Yalou Huang and Guangfu Shu, "A comparative study of medical data classification methods based on Decision Tree and System Reconstruction Analysis", IEMS, Vol. 4, Issue 1, pp. 102-108, June 2005.
G.L. Tsirogiannis, D. Frossyniotis, J. Stoitsis, S. Golemati, A. Stafylopatis and K.S. Nikita, "Classification of medical data with a robust multi-level combination scheme", in Proc. 2004 IEEE International Joint Conference on Neural Networks, Volume 3, pp. 2483-2487, 25-29 July 2004.
R.E. Abdel-Aal, "Improved classification of medical data using abductive network committees trained on different feature subsets", Computer Methods and Programs in Biomedicine, Volume 80, Issue 2, pp. 141-153, 2005.
Kemal Polat, Sebnem Yosunkaya and Salih Gunes, "Comparison of different classifier algorithms on the Automated Detection of Obstructive Sleep Apnea Syndrome", Journal of Medical Systems, Volume 32, Issue 3, June 2008.
Ranjit Abraham, Jay B. Simha and S.S. Iyengar, "Medical datamining with a new algorithm for Feature Selection and Naive Bayesian classifier", in Proceedings of the 10th International Conference on Information Technology, IEEE, pp. 44-49, 2007.
HIV Sentinel Surveillance and HIV Estimation, 2006. New Delhi, India: National AIDS Control Organization, Ministry of Health and Family Welfare, Government of India. http://www.nacoonline.org/Quick_Links/HIV_Data/, accessed 06 February 2008.
Irina Rish, "An empirical study of the naive Bayes classifier", IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, 2001 (available online).
J.R. Quinlan, "Induction of Decision Trees", Machine Learning 1, Kluwer Academic Publishers, Boston, pp. 81-106, 1986.
Thomas M. Cover and Peter E. Hart, "Nearest neighbor pattern classification", IEEE Transactions on Information Theory, Volume 13, Issue 1, pp. 21-27, 1967.
J.R. Quinlan, "Bagging, boosting, and C4.5", in AAAI/IAAI: Proceedings of the 13th National Conference on Artificial Intelligence and 8th Innovative Applications of Artificial Intelligence Conference, Portland, Oregon, AAAI Press / The MIT Press, Vol. 1, pp. 725-730, 1996.
Leo Breiman, "Random Forests", Machine Learning 45(1), pp. 5-32, doi:10.1023/A:1010933404324, 2001.
Weka - Data Mining Machine Learning Software, http://www.cs.waikato.ac.nz/ml/.
J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2006.
I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Second Edition, Morgan Kaufmann Publishers, 2005.

AUTHORS PROFILE

Mrs. Asha T. obtained her Bachelors and Masters in Engineering from Bangalore University, Karnataka, India. She is pursuing her research leading to a Ph.D. at Visvesvaraya Technological University under the guidance of Dr. S. Natarajan and Dr. K.N.B. Murthy. She has over 16 years of teaching experience and is currently working as Assistant Professor in the Dept. of Information Science & Engg., B.I.T., Karnataka, India. Her research interests are in Data Mining, Medical Applications, Pattern Recognition and Artificial Intelligence.

Dr. S. Natarajan holds a Ph.D. (Remote Sensing) from JNTU, Hyderabad, India. His experience spans 33 years in R&D and 10 years in teaching. He worked in the Defence Research and Development Laboratory (DRDL), Hyderabad, India for five years and later worked for twenty-eight years in the National Remote Sensing Agency, Hyderabad, India. He has over 50 publications in peer-reviewed conferences and journals. His areas of interest are Soft Computing, Data Mining and Geographical Information Systems.

Dr. K.N.B. Murthy holds a Bachelors in Engineering from the University of Mysore, a Masters from IISc, Bangalore, and a Ph.D. from IIT, Chennai, India. He has over 30 years of experience in teaching, training, industry, administration and research. He has authored over 60 papers in national and international journals and conferences, is a peer reviewer for journal and conference papers of national and international repute, and has authored a book. He is a member of several academic committees: Executive Council, Academic Senate, University Publication Committee, BOE & BOS, Local Inquiry Committee of VTU, Governing Body Member of BITES, and Founding Member of the Creativity and Innovation Platform of Karnataka. Currently he is the Principal & Director of P.E.S. Institute of Technology, Bangalore, India. His research interests include Parallel Computing, Computer Networks and Artificial Intelligence.