
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 7, October 2010
http://sites.google.com/site/ijcsis/ ISSN 1947-5500

Supervised Learning Approach for Predicting the Presence of Seizure in Human Brain

Sivagami P, Sujitha V
M.Phil Research Scholar
PSGR Krishnammal College for Women
Coimbatore, India
sivagamithiru@gmail.com, vsujitha1987@gmail.com

Vijaya MS
Associate Professor and Head
GRG School of Applied Computer Technology
PSGR Krishnammal College for Women
Coimbatore, India
msvijaya@grgsact.com

Abstract: Seizure is a synchronous neuronal activity in the brain. It is a physical change in behavior that occurs after an episode of abnormal electrical activity in the brain. Normally two diagnostic tests, namely Electroencephalogram (EEG) and Magnetic Resonance Imaging (MRI), are used to diagnose the presence of seizure. The sensitivity of the human eye in interpreting large numbers of images decreases with an increasing number of cases. Hence, it is essential to automate the accurate prediction of seizure in patients. In this paper supervised learning approaches have been employed to model the prediction task, and the experiments show a high prediction accuracy of about 94%.

Keywords: Seizure; Support vector machine; K-NN; Naïve Bayes; J48

I. INTRODUCTION

Seizure is defined as a transient symptom of "abnormal excessive or synchronous neuronal activity in the brain". Seizures can cause involuntary changes in body movement or function, sensation, awareness, or behavior. A seizure is an abnormal, unregulated electrical discharge that occurs within the brain's cortical grey matter and transiently interrupts normal brain function [1]. The kind of seizure is determined from the physiological characteristics of the seizure and the abnormality in the brain. Seizure is broadly classified into absence seizure, simple partial, complex partial and general seizure. Absence seizure is a brief episode of staring. It usually begins in childhood between ages 4 and 14. Simple partial seizure affects only a small region of the brain, often the hippocampus. Complex partial seizure usually starts in a small area of the temporal lobe or frontal lobe of the brain. General seizure affects the entire brain.

The diagnostic techniques normally employed for patients are Computed Tomography (CT), Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET). MRI is a valuable tool that is widely used in the clinical and surgical environment for seizure identification because of its superior soft tissue differentiation, high spatial resolution and contrast. Magnetic Resonance Images are examined by radiologists, who identify the presence of seizure by visual interpretation of the films.

Machine learning is a technique which can discover previously unknown regularities and trends in diverse datasets [2]. Today machine learning provides several indispensable tools for intelligent data analysis. Machine learning technology is currently well suited for analyzing medical data, and empirical results reveal that machine learning systems are highly efficient and could significantly reduce the computational complexities.

Yong Fan developed a method for diagnosis of brain abnormality using both structural and functional MRI images [3]. Christian E. Elger and Klaus Lehnertz developed a method for seizure prediction by non-linear time series analysis of brain electrical activity [4]. J. W. Wheless, L. J. Willmore, J. I. Breier, M. Kataki, J. R. Smith and D. W. King provide a comparison of Magnetoencephalography, MRI, and V-EEG in patients evaluated for epilepsy surgery [5]. William D.S. Killgore, Guila Glosser, Daniel J. Casasanto, Jacqueline A. French, David C. Alsop and John A. Detre provide complementary information for predicting post-operative seizure control [6].

The motivation behind the research reported in this paper is to predict the presence of seizure in the human brain. Machine learning techniques are employed here to model the seizure prediction problem as a classification task, in order to help physicians predict the presence of seizure accurately. In this paper supervised learning algorithms are used for the automated prediction of the type of seizure.

II. PROPOSED METHODOLOGY

The proposed methodology models seizure prediction as a classification task and provides a convenient solution by using supervised classification algorithms. Descriptive features of the MRI image such as energy, entropy, mean, standard deviation, contrast and homogeneity of the grey scale image have been extracted and used for training. The model is trained using the training datasets and the trained model is built. Finally the trained model is used to predict the type of seizure. The proposed model is shown in Figure 1.

[Figure 1. The Proposed model. Blocks shown: Feature Extraction, Training, Trained Model, Prediction.]

A. Image Acquisition

A magnetic resonance imaging (MRI) scan of the patient's brain is a noninvasive method to create detailed pictures of the brain and surrounding nerve tissues. MRI uses powerful magnets and radio waves. The MRI scanner contains the magnet. The magnetic field produced by an MRI is about 10 thousand times greater than the earth's. The magnetic field forces hydrogen atoms in the body to line up in a certain way. When radio waves are sent toward the lined-up hydrogen atoms, they bounce back and a computer records the signal. Different types of tissues send back different signals.

The MRI dataset, consisting of MRI scan images of 350 patients of five types, namely Normal, Absence Seizure, Simple Partial Seizure, Complex Partial Seizure and General Seizure, is taken into consideration.

B. Feature Extraction

The purpose of feature extraction is to reduce the original data set by measuring certain properties or features that distinguish one input pattern from another. A brain MRI slice is given as an input. Various features based on statistics, the grey level co-occurrence matrix and the grey level run-length matrix are extracted from the MRI. The extracted features provide the characteristics of the input type to the classifier by describing the relevant properties of the image in a feature space.

The statistical features based on image intensity are mean, variance, skewness and kurtosis. The grey level co-occurrence matrix (GLCM) features Contrast, Homogeneity, Correlation, Energy and Entropy, and the grey level run length matrix (GLRLM) features Short run emphasis, Long run emphasis, Grey level distribution, Run-length distribution, Run percentage, Low grey level run emphasis and High grey level run emphasis, are used to investigate their adequacy for discriminating the presence of seizure. Table I shows the features of MRI of a human brain.

TABLE I. FEATURES OF MRI

Statistical    Grey Level Co-occurrence Matrix    Grey Level Run Length Matrix
Mean           Contrast                           Short run emphasis
Variance       Homogeneity                        Long run emphasis
Skewness       Correlation                        Grey level distribution
Kurtosis       Energy                             Run length distribution
               Entropy                            Run percentage
                                                  Low grey level run emphasis
                                                  High grey level run emphasis

1) Grey Level Co-occurrence Matrix (GLCM)

The GLCM is defined as a tabulation of how often different combinations of pixel brightness values (grey levels) occur in an image. The texture filter functions provide a statistical view of texture based on the image histogram. This function provides useful information about the texture of an image but does not provide information about shape, i.e., the spatial relationships of pixels in an image.

The features corresponding to GLCM statistics and their descriptions are:

• Contrast - Measures the local variations in the grey-level co-occurrence matrix.
• Homogeneity - Measures the closeness of the distribution of elements in the GLCM to the GLCM diagonal.
• Correlation - Measures the joint probability occurrence of the specified pixel pairs.
• Energy - Provides the sum of squared elements in the GLCM. Also known as uniformity or the angular second moment.
• Entropy - Statistical measure of randomness.

2) Grey Level Run Length Matrix (GLRLM)

The GLRLM is based on computing the number of grey-level runs of various lengths. A grey-level run is a set of consecutive and collinear pixel points having the same grey level value. The length of the run is the number of pixel points in the run [7]. Seven features are extracted from this matrix.

C. Supervised Classification Algorithms

Supervised learning is a machine learning technique for deducing a function from training data. The training data consist of pairs of input objects and desired outputs. When the output of the function is a class label of the input object, the task is called classification. The task of the supervised learner is to predict the value of the function for any valid input object after having seen a number of training examples, i.e. pairs of input and target output. The supervised classification techniques, namely support vector machine, decision tree induction, Naive Bayes and K-NN, are employed in seizure prediction modeling.

1) Support Vector Machine

The machine is presented with a set of training examples $(x_i, y_i)$, where the $x_i$ are the real world data instances and the $y_i$ are the labels indicating which class each instance belongs to. For the two class pattern recognition problem, $y_i = +1$ or $y_i = -1$. A training example $(x_i, y_i)$ is called positive if $y_i = +1$ and negative otherwise [6]. SVMs construct a hyperplane that separates two classes and tries to achieve maximum separation between the classes. Separating the classes with a large margin minimizes a bound on the expected generalization error.

The simplest model of SVM, called the Maximal Margin classifier, constructs a linear separator (an optimal hyperplane) given by $w^T x - \gamma = 0$ between two classes of examples. The free parameters are a vector of weights $w$, which is orthogonal to the hyperplane, and a threshold value $\gamma$. These parameters are obtained by solving the following optimization problem using Lagrangian duality.

$$\min_{w,\gamma} \ \tfrac{1}{2}\|w\|^2 \quad \text{subject to} \quad D_{ii}(w^T x_i - \gamma) \ge 1, \quad i = 1, \dots, l \qquad (1)$$

where $D_{ii}$ corresponds to the class labels +1 and -1. The instances with non-null weights are called support vectors. In the presence of outliers and wrongly classified training examples it may be useful to allow some training errors in order to avoid overfitting. A vector of slack variables $\xi_i$ that measure the amount of violation of the constraints is introduced, and the resulting optimization problem is referred to as the soft margin formulation. In this formulation the contributions of margin maximization and training errors to the objective function can be balanced through the regularization parameter C. The following decision rule is used to predict the class of a new instance with minimum error.

$$f(x) = \operatorname{sgn}[w^T x - \gamma] \qquad (2)$$

The advantage of the dual formulation is that it permits efficient learning of non-linear SVM separators by introducing kernel functions. Technically, a kernel function calculates a dot product between two vectors that have been (non-linearly) mapped into a high dimensional feature space [8]. Since there is no need to perform this mapping explicitly, the training is still feasible although the dimension of the real feature space can be very high or even infinite. The parameters are obtained by solving the following non-linear SVM formulation (in matrix form),

$$\min_{u} \ L_D(u) = \tfrac{1}{2} u^T Q u - e^T u \quad \text{subject to} \quad d^T u = 0, \quad 0 \le u \le Ce \qquad (3)$$

where $K$ is the kernel matrix and $Q = DKD$.

The kernel function $K(A, A^T)$ (polynomial or Gaussian) is used to construct a hyperplane in the feature space, which separates two classes linearly, by performing computations in the input space.

$$f(x) = \operatorname{sgn}(K(x, x_i^T)\,u - \gamma) \qquad (4)$$

where $u$ denotes the Lagrangian multipliers. In general, larger margins lower the generalization error of the classifier.

2) Naïve Bayes

Naïve Bayes is one of the simplest probabilistic classifiers. The model constructed by this algorithm is a set of probabilities. Each member of this set corresponds to the probability that a specific feature $f_i$ appears in the instances of class $c$, i.e., $P(f_i \mid c)$. These probabilities are estimated by counting the frequency of each feature value in the instances of a class in the training set. Given a new instance, the classifier estimates the probability that the instance belongs to a specific class, based on the product of the individual conditional probabilities for the feature values in the instance. The exact calculation uses Bayes' theorem, and this is the reason why the algorithm is called a Bayes classifier.

3) K-NN

K-nearest neighbor algorithms are only slightly more complex. The k nearest neighbors of the new instance are retrieved, and whichever class is predominant amongst them is given as the new instance's classification. K-nearest neighbor is a supervised learning algorithm where a new query instance is classified based on the majority category of its K nearest neighbors [9]. The purpose of this algorithm is to classify a new object based on attributes and training samples. The classifier does not fit any model and is based only on memory.

4) J48 Decision Tree Induction

The J48 algorithm is an implementation of the C4.5 decision tree learner. This implementation produces decision tree models. The algorithm uses the greedy technique to induce decision trees for classification [10]. A decision-tree model is built by analyzing training data, and the model is used to classify unseen data. J48 generates decision trees, the nodes of which evaluate the existence or significance of individual features.

III. EXPERIMENTAL SETUP

The seizure data analysis and prediction have been carried out using WEKA and SVMlight for machine learning. WEKA is a collection of machine learning algorithms for data mining tasks [11]. SVMlight provides extensive support for the whole process of experiment, including preparing the input data, evaluating learning schemes statistically and visualizing the input data and the result of learning.

The dataset is trained using SVM with the most commonly used kernels (linear, polynomial and RBF), with different parameter settings for d, gamma and the regularization parameter C. The parameters d and gamma are associated with the polynomial kernel and the RBF kernel respectively. The Image Processing Toolbox of Matlab has been used for MRI feature extraction. The datasets are grouped into five broad classes, namely Normal, Absence Seizure, Simple Partial Seizure, Complex Partial Seizure and General Seizure, to facilitate their use in experimentally determining the presence of seizure in MRI. The seizure dataset has 17 attributes, 350 instances and, as indicated above, 5 classes. Supervised classification algorithms such as support vector machine, decision tree induction, Naïve Bayes and K-NN are applied for training. Support vector machine learning is implemented using SVMlight. Decision tree induction, Naïve Bayes and K-NN are implemented using WEKA. The performance of the trained models has been evaluated using 10 fold cross validation and their results are compared.

IV. RESULTS

The results of the experiments are summarized in Table II. Prediction accuracy and learning time are the parameters considered for performance evaluation. Prediction accuracy is the ratio of the number of correctly classified instances to the total number of instances. Learning time is the time taken to build the model on the dataset.

A. Classification using SVM

The performance of the three kinds of SVMs, with linear, polynomial and RBF kernels, is evaluated based on the prediction accuracy, and the results are shown in Table II.

TABLE II. SVM - LINEAR, POLYNOMIAL, RBF KERNELS (prediction accuracy, %)

SVM Kernel        Parameter    C=1     C=2     C=3     C=4
Linear            -            74      76      72      79
Polynomial (d)    d=1          79      82      86      74
Polynomial (d)    d=2          81.2    80      84      75
RBF (g)           g=0.5        92      93      95      94
RBF (g)           g=1          94      92      97      95

Table III shows the average performance of the SVM based classification model in terms of predictive accuracy, and the same is depicted in Figure 2.

TABLE III. AVERAGE PERFORMANCE OF THREE MODELS

Kernel Type    Prediction Accuracy (%)
Linear         75
Polynomial     80
RBF            94

[Figure 2. Comparing Prediction Accuracy of SVM Kernels. Bar chart of Accuracy (%): Linear 75, Polynomial 80, RBF 94.]

The predictive accuracy shown by SVM with RBF kernel with parameters C=3 and g=2 is higher than that of the linear and polynomial kernels.

B. Classification using WEKA

The results of the experiments are summarized in Tables IV and V.

TABLE IV. PREDICTIVE PERFORMANCE

Classifiers    Learning Time (secs)    Correctly classified instances    Incorrectly classified instances    Prediction accuracy (%)
Naïve Bayes    0.03                    272                               68                                  80
K-NN           0.02                    276                               64                                  81.17
J48            0.09                    293                               47                                  86.17

TABLE V. COMPARISON OF ESTIMATES

Evaluation criteria            Naïve Bayes    K-NN       J48
Kappa Statistic                0.7468         0.7614     0.8235
Mean Absolute Error            -              -          -
Root Mean Squared Error        0.2716         0.266      0.2284
Relative Absolute Error        26.1099        24.2978    21.2592
Root Relative Squared Error    68.428         67.0142    57.549

The performances of the three models are illustrated in Figures 3 and 4.

[Figure 3. Comparing Prediction Accuracy. Bar chart of Accuracy (%): Naïve Bayes 80, K-NN 81.17, J48 86.17.]

[Figure 4. Comparing Learning Time. Bar chart of Learning Time (secs): Naïve Bayes 0.03, K-NN 0.02, J48 0.09.]

Both the time taken to build the model and the prediction accuracy are higher for J48 than for the other two algorithms in the WEKA environment.

V. CONCLUSION

This paper describes the modeling of the seizure prediction task as classification and the implementation of the trained model using supervised learning techniques, namely Support vector machine, Decision tree induction, Naive Bayes and K-NN. The performance of the trained models is evaluated using 10 fold cross validation based on prediction accuracy and learning time, and the results are compared. It is observed that the seizure prediction model shows a high predictive accuracy of about 94%. As far as seizure prediction is concerned, the predictive accuracy plays a greater role than the learning time in determining the performance of the model. The comparative results indicate that the support vector machine yields better performance than the other supervised classification algorithms. Due to the wide variability in the dataset, machine learning techniques are more effective than the statistical approach in improving the predictive accuracy.

ACKNOWLEDGMENT

The authors would like to thank the Management and Acura Scan Centre, Coimbatore for providing the MRI data.

REFERENCES

[1] Robin Cook, "Seizure", Berkley Pub Group, 2004.
[2] Karpagavalli S, Jamuna KS, and Vijaya MS, "Machine Learning Approach for pre operative anaesthetic risk Prediction", International Journal of Recent Trends in Engineering, Vol. 1, No. 2, May 2009.
[3] Yong Fan, "Multivariate examination of brain abnormality using both structural and functional MRI", Neuroimaging, Elsevier, vol. 36, issue 4, pp. 1189-1199, 2007.
[4] Christian E. Elger, Klaus Lehnertz, "Seizure prediction by non-linear time series analysis of brain electrical activity", European Journal of Neuroscience, Vol. 10, Issue 2, pp. 786-789, February 1998.
[5] J. W. Wheless, L. J. Willmore, J. I. Breier, M. Kataki, J. R. Smith, D. W. King, "A Comparison of Magnetoencephalography, MRI, and V-EEG in Patients Evaluated for Epilepsy Surgery", Epilepsia, Vol. 40, Issue 7, pp. 931-941, July 1999.
[6] William D.S. Killgore, Guila Glosser, Daniel J. Casasanto, Jacqueline A. French, David C. Alsop, John A. Detre, "Functional MRI and the Wada test provide complementary information for predicting post-operative seizure control", Seizure, pp. 450-455, Dec 1999.
[7] Galloway M., "Texture analysis using grey level runs lengths", Comp Graph Image Process, pp. 72-179, 1975.
[8] Nello Cristianini and John Shawe-Taylor, "An Introduction to Support Vector Machines and other kernel-based learning methods", Cambridge University Press, 2000.
[9] Teknomo, Kardi, K-Nearest Neighbors Tutorial.
[10] M. Chen, A. X. Zheng, J. Lloyd, M. I. Jordan, and E. Brewer, "Failure diagnosis using decision trees", In Proc. IEEE ICAC, 2004.
[11] Ian H. Witten, Eibe Frank, Len Trigg, Mark Hall, Geoffrey Holmes, Sally Jo Cunningham, "Weka: Practical Machine Learning Tools and Techniques with Java Implementations", 1999.
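
Section IV defines prediction accuracy as the ratio of correctly classified instances to the total number of instances. As a minimal sketch of that arithmetic, the Python snippet below recomputes the accuracy column of Table IV from its correct and incorrect counts. The function and variable names are illustrative assumptions and are not part of the paper. The snippet takes the total as the sum of the two counts listed in the table, which comes to 340 for each classifier.

```python
# Counts copied from Table IV: (correctly classified, incorrectly classified).
# The dictionary name and layout are illustrative, not taken from the paper.
TABLE_IV_COUNTS = {
    "Naive Bayes": (272, 68),
    "K-NN": (276, 64),
    "J48": (293, 47),
}


def prediction_accuracy(correct, incorrect):
    """Return accuracy in percent: correct / (correct + incorrect) * 100."""
    total = correct + incorrect
    if total == 0:
        raise ValueError("no instances to evaluate")
    return 100.0 * correct / total


if __name__ == "__main__":
    for name, (correct, incorrect) in TABLE_IV_COUNTS.items():
        acc = prediction_accuracy(correct, incorrect)
        print(f"{name}: {acc:.4f}% of {correct + incorrect} instances")
```

Under this assumption the computed ratios agree with the reported 80, 81.17 and 86.17 to within 0.01 percentage points, which suggests the published figures were truncated rather than rounded.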

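The GLCM features listed in Section II.B can be illustrated with a small, dependency-free sketch. The paper used the Matlab Image Processing Toolbox and does not state the pixel offset, the number of grey levels or the exact entropy definition. The choices below are therefore assumptions made only for illustration: a horizontal right-hand neighbour offset, a non-symmetric matrix, and entropy computed over the normalized GLCM. Correlation is omitted for brevity, and all function names are hypothetical.

```python
import math


def glcm_horizontal(image, levels):
    """Normalized co-occurrence matrix for (pixel, right-hand neighbour) pairs.

    `image` is a list of rows of integer grey levels in range(levels).
    The horizontal offset is an assumption; the paper does not state one.
    """
    counts = [[0] * levels for _ in range(levels)]
    for row in image:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
    total = sum(sum(r) for r in counts)
    if total == 0:
        raise ValueError("image has no horizontal pixel pairs")
    return [[c / total for c in r] for r in counts]


def glcm_features(p):
    """Contrast, homogeneity, energy and entropy of a normalized GLCM."""
    contrast = homogeneity = energy = entropy = 0.0
    for i, row in enumerate(p):
        for j, pij in enumerate(row):
            contrast += (i - j) ** 2 * pij           # local grey-level variation
            homogeneity += pij / (1 + abs(i - j))    # closeness to the diagonal
            energy += pij ** 2                       # sum of squared elements
            if pij > 0:
                entropy -= pij * math.log2(pij)      # randomness of the GLCM
    return {
        "contrast": contrast,
        "homogeneity": homogeneity,
        "energy": energy,
        "entropy": entropy,
    }


if __name__ == "__main__":
    tiny = [[0, 0, 1, 1]]  # toy 1x4 image with two grey levels
    print(glcm_features(glcm_horizontal(tiny, levels=2)))
```

For the toy row `[0, 0, 1, 1]` the three pixel pairs (0,0), (0,1) and (1,1) each get probability 1/3, so contrast and energy are both 1/3 and homogeneity is 5/6. A real MRI slice would first need to be quantized to a fixed number of grey levels before being passed to `glcm_horizontal`.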