
International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Web Site: www.ijettcs.org  Email: editor@ijettcs.org, editorijettcs@gmail.com
Volume 1, Issue 3, September-October 2012  ISSN 2278-6856

Using Intelligent Techniques for Breast Cancer Classification

Hesham Arafat 1, Sherif Barakat 2 and Amal F. Goweda 3

1 Mansoura University, Faculty of Engineering, Department of Computer Engineering and Systems
2 Mansoura University, Faculty of Computers and Information, Department of Information Systems
3 Mansoura University, Faculty of Computers and Information, Department of Information Systems

Abstract: Attribute reduction is an important issue in rough set theory, and fast, effective approximate algorithms for generating a set of discriminatory features need to be investigated. The main objective of this paper is to investigate a strategy that combines Rough Set Theory (RST) with Particle Swarm Optimization (PSO). Rough set theory has been recognized as one of the powerful tools for medical feature selection. The complementary technique is Particle Swarm Optimization (PSO), a subfield of swarm intelligence that studies the emergent collective intelligence of groups of simple agents, inspired by social behavior observed in nature, such as bird flocks and fish schools, where individuals with limited capabilities are able to achieve intelligent solutions to complex problems. PSO is widely used and rapidly developed because it is easy to implement and has few parameters to tune. The hybrid approach embodies an adaptive feature selection procedure that dynamically accounts for the relevance and dependence of the features. The selected feature subsets are used to generate decision rules for the breast cancer classification task, differentiating benign cases from malignant cases by assigning classes to objects. The proposed hybrid approach can help improve classification accuracy and find more robust features that improve classifier performance.

Keywords: Breast cancer, Rough Sets, Feature Selection, Particle Swarm Optimization (PSO), Classification.

1. INTRODUCTION

Breast cancer occurs when cells become abnormal and divide without control or order, forming a cancerous growth that begins in the tissues of the breast. Breast cancer has become the most common cancer among women [1]. The most effective way to reduce breast cancer deaths is to detect the disease earlier; early detection and accurate diagnosis of the tumor are extremely vital. Early detection allows physicians to differentiate benign breast tumors from malignant ones without resorting to surgical biopsy. It also offers accurate, timely analysis of a patient's particular type of cancer and the available treatment options.

Extensive research has been carried out on automating the critical diagnosis procedure, and various machine learning algorithms have been developed to aid physicians in optimizing the decision task effectively. Rough set theory offers a novel approach to managing uncertainty that has been used for the discovery of data dependencies, the assessment of feature importance, the discovery of patterns in sample data, feature space dimensionality reduction, and the classification of objects. While rough sets on their own provide a powerful technique, they are often combined with other computational intelligence techniques such as neural networks, fuzzy sets, genetic algorithms, Bayesian approaches, swarm optimization and support vector machines.

Particle Swarm Optimization is a relatively new evolutionary computation technique in which each potential solution is seen as a particle flying through the problem space with a certain velocity. Particle swarms find optimal regions of a complex search space through the interaction of the individuals in the population. PSO is attractive for feature selection in that particle swarms discover good feature combinations as they fly within the subset space. Compared with other evolutionary techniques, PSO requires only primitive and simple mathematical operators. In this research, rough sets are applied to improve feature selection and data reduction, and Particle Swarm Optimization (PSO) is used to optimize the rough set feature reduction in order to classify breast cancer tumors effectively as either malignant or benign.

This paper is organized as follows. Section 2 briefly reviews recent related work in the area of cancer classification using intelligent techniques. Section 3 presents the vision of the proposed hybrid technique. Section 4 discusses the basic concepts of rough set theory and the basic idea of Particle Swarm Optimization, and reports the experimental results. Section 5 concludes the research.

2. RELATED WORK

Besides the introduction given above, a literature review is presented on Particle Swarm Optimization and rough sets, following their journey from inception to their implementation in various problems.

The use of machine learning and data mining techniques [2] has revolutionized the whole process of breast cancer diagnosis and prognosis. Data mining methods, including decision trees, Support Vector Machines (SVM), Genetic Algorithms (GAs) / Evolutionary Programming (EP), fuzzy sets, neural networks and rough sets, have been applied to enhance breast cancer diagnosis and prognosis.

Data mining classification techniques for breast cancer diagnosis are discussed in [3]. A. Soltani Sarvestani, A. A. Safavi, N. M. Parandeh and M. Salehi provided a comparison of the capabilities of various neural networks, such as the Multilayer Perceptron (MLP), Self-Organizing Map (SOM), Radial Basis Function (RBF) network and Probabilistic Neural Network (PNN), used to classify WBC and NHBCD data. The performance of these neural network structures was investigated for the breast cancer diagnosis problem. RBF and PNN proved to be the best classifiers on the training set, but PNN gave the best classification accuracy when the test set is considered. This work showed that statistical neural networks can be used effectively for breast cancer diagnosis: by applying several neural network structures, a diagnostic system was constructed that performed quite well.

In [4], Wei-Pin Chang and Der-Ming Liou showed that a genetic algorithm model yielded better results than other data mining models for the analysis of breast cancer patient data, in terms of the overall accuracy of patient classification and the expressiveness and complexity of the classification rule. An artificial neural network, a decision tree, logistic regression and a genetic algorithm were used for the comparative studies, with accuracy and positive predictive value as the evaluation indicators. The WBC database was used for the data analysis, followed by 10-fold cross-validation. The results showed that the genetic algorithm described in the study was able to produce accurate results in the classification of breast cancer data, and that the classification rule identified was more acceptable and comprehensible.

In [5], K. Rajiv Gandhi, Marcus Karnan and S. Kannan constructed classification rules using the Particle Swarm Optimization algorithm for breast cancer datasets. In this study, to cope with the heavy computational effort, feature subset selection was used as a pre-processing step, learning fuzzy rule bases with a GA implementing the Pittsburgh approach; this produced a smaller fuzzy rule base system with higher accuracy. The datasets resulting from feature selection were then classified using the particle swarm optimization algorithm, and the rules developed defined the underlying attributes effectively.

Das et al. [6] hybridized rough set theory with the Particle Swarm Optimization (PSO) algorithm. The hybrid rough-PSO technique was used for grouping the pixels of an image in its intensity space. The authors treated image segmentation as a clustering problem: each cluster is modeled with a rough set, and PSO is employed to tune the threshold and the relative importance of the upper and lower approximations of the rough sets.

Another approach that uses rough sets with PSO was proposed by Wang et al. in [7]. The authors applied rough sets to predict the degree of malignancy in brain glioma. The feature subsets selected using rough sets with PSO are used to generate decision rules for the classification task. A rough set attribute reduction algorithm employing a PSO-based search method is proposed and compared with other rough set reduction algorithms. Experimental results show that the reducts found by the proposed algorithm are more efficient and can generate decision rules with better classification performance. Moreover, the decision rules induced by the rough set rule induction algorithm reveal regular and interpretable patterns in the relations between glioma MRI features and the degree of malignancy, which are helpful for medical experts.

In this section, different data mining techniques, including rough sets and evolutionary algorithms, applied to the feature selection problem, classification, data analysis and clustering have been surveyed. No attempt has been made to cover all the approaches currently existing in the literature; our intention has been to highlight the role played by data mining techniques, and especially rough set theory combined with evolutionary algorithms.

Table 1 presents a comparison between supervised learning approaches used for the classification task, which concentrates on predicting the value of the decision class of an object, among a predefined set of class values, given the values of some attributes of the object. A comparison between learning algorithms with more details can be found in [8].

Table 1: A comparison between supervised learning approaches

Decision Tree Induction — Strengths: decision trees generate rules which are effective and simple; they indicate which fields are the important ones. Weaknesses: decision trees are less appropriate for predicting continuous attribute values; they can be expensive to train.

Bayesian Classification — Strengths: easy to implement; good results obtained in most cases. Weaknesses: loss of accuracy when dependencies exist among variables, because of the independence assumptions; error in numeric predictions.

Neural Network — Strengths: high tolerance to noisy data; well suited to continuous-valued inputs and outputs; the algorithms are parallel. Weaknesses: long training time; requires a number of parameters; poor interpretability.

Support Vector Machines — Strengths: training is relatively easy; non-traditional data can be used as input to an SVM instead of feature vectors. Weaknesses: the need for a good kernel function.

Genetic Algorithms — Strengths: parallelism that allows evaluating many schemata at once; genetic algorithms perform well in problems for which the fitness landscape is complex. Weaknesses: the language used to specify candidate solutions must be robust; the problem of how to write the fitness function must be carefully considered so that higher fitness is attainable.

To evaluate which model is best for the classification task, some dimensions of comparison are taken into account, as follows: cross-validation; speed of model application; classification accuracy; total cost/benefit; noise tolerance.

3. ROUGH SET AND PSO BASED FEATURE SELECTION

3.1 Problem Definition

Feature selection aims to determine a minimal feature subset from a problem domain while retaining a suitably high accuracy in representing the original features. The significance of feature selection can be viewed in two facets. The first facet is to filter out noise and remove redundant and irrelevant features; according to Jensen and Shen [9], feature selection is compulsory due to the abundance of noisy, irrelevant or misleading features in a dataset. In the second facet, feature selection can be implemented as an optimization procedure that searches for an optimal subset of features that better satisfies a desired measure [10]. Thus, a rough set feature selection algorithm based on a search method called Particle Swarm Optimization (PSO) is proposed to select feature subsets that describe the decisions as efficiently as the original whole feature set, discarding the redundant features and leading to better prediction accuracy. After selecting the features that influence the decision concepts, they are employed within a decision rule generation process, creating descriptive rules for the classification task.

3.2 Objective

The problem target is finding an optimal minimal feature subset from a problem domain, removing those features considered irrelevant that increase the complexity of the learning process and decrease the accuracy of the induced knowledge. This results not only in improving the speed of data manipulation but also in improving the classification rate by reducing the influence of noise. The aim is to build a concise model of the distribution of class labels in terms of predictor features.
Then the resulting classifier is used to assign class labels to the testing instances, in which the values of the predictor features are known but the value of the class label is unknown.

Figure 1 shows the feature selection procedure adopted in this study. The structure is composed of two important phases. The feature selection tier deploys Particle Swarm Optimization for further refinement, recommending only the significant features. The classification tier exploits the optimized features to extract classification rules used to differentiate benign cases from malignant cases.

In processing medical data, the advantages of choosing an optimal subset of features are as follows [11]:
• Reducing the dimensionality of the attributes reduces the complexity of the problem and allows researchers to focus more clearly on the relevant attributes.
• Simplifying the data description may help physicians make a prompt diagnosis.
• Having fewer features means that less data need to be collected; collecting data is never an easy job in medical applications, because it is time-consuming and costly.

Figure 1: Rough-PSO feature selection process

The idea of PSO reducts for the optimal feature selection problem can be shown [12] in Figure 2:

(1)  ∀i: Xi ← randomPosition(); Vi ← randomVelocity()
(2)  fit ← bestFit(X); globalbest ← fit; pbest ← bestPos(X)
(3)  ∀i: Pi ← Xi
(4)  while (stopping criterion not met)
(5)    for i = 1, …, S                     // for each particle
(6)      if (fitness(i) > fit)             // local best
(7)        fit ← fitness(i)
(8)        pbest ← Xi
(9)      if (fitness(i) > globalbest)      // global best
(10)       globalbest ← fitness(i)
(11)       gbest ← Xi; R ← getReduct(Xi)   // convert to reduct
(12)     updateVelocity(); updatePosition()

Figure 2: An algorithm to compute reducts

Initially, a population of particles is constructed with random positions and velocities in S dimensions of the problem space. For each particle, the fitness function is evaluated. If the current particle's fitness is better than pbest, this particle becomes the current best, and its position and fitness are stored. Next, the current particle's fitness is compared with the population's overall previous best. If the current value is better than gbest, gbest is set to the current particle's position and the global best fitness is updated. This position represents the best feature subset encountered so far, and is therefore converted and stored in R. The velocity and position of the particle are then updated according to Equation (13) and Equation (14). The process loops until a stopping criterion is met, usually a sufficiently good fitness or a maximum number of iterations (generations). The chosen subsets are then employed within a decision rule generation process, creating descriptive rules for the classification task.

The motivation behind this study is to provide a practical tool for optimizing the feature selection problem, the number of reducts found, and the classification accuracy when applied to the classification of complex, real-world datasets. PSO has a strong search capability in the problem space and can efficiently find minimal reducts, so combining the two techniques with domain intelligence leads to better knowledge.

4. TECHNIQUES USED IN THE STUDY

The structure described in Figure 1 involves two techniques: rough sets and PSO.

4.1 Rough Set Theory

Rough set theory [13] is a mathematical approach for handling vagueness and uncertainty in data analysis. Objects may be indiscernible due to the limited available information. A rough set is characterized by a pair of precise concepts, called the lower and upper approximations, which are generated using object indiscernibility. The most important issues are the reduction of attributes and the generation of decision rules.
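The lower and upper approximations just mentioned (and formalized in Section 4.1.1 below) can be illustrated with a short sketch; the toy decision table here is invented for illustration and is not part of the experimental data:

```python
def partition(table, attrs):
    """Equivalence classes of IND(attrs): objects grouped by having
    identical values on every attribute in attrs."""
    blocks = {}
    for obj, row in table.items():
        blocks.setdefault(tuple(row[a] for a in attrs), set()).add(obj)
    return list(blocks.values())

def approximations(table, attrs, target):
    """Lower and upper approximations of the object set `target`."""
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= target:      # block certainly inside the concept
            lower |= block
        if block & target:       # block possibly inside the concept
            upper |= block
    return lower, upper

# Toy decision table: conditions a, b and decision d (hypothetical values).
table = {
    1: {'a': 0, 'b': 0, 'd': 'benign'},
    2: {'a': 0, 'b': 0, 'd': 'malignant'},   # indiscernible from object 1
    3: {'a': 1, 'b': 0, 'd': 'malignant'},
    4: {'a': 1, 'b': 1, 'd': 'malignant'},
}
malignant = {x for x in table if table[x]['d'] == 'malignant'}
low, up = approximations(table, ['a', 'b'], malignant)
print(low, up)   # lower {3, 4}; upper {1, 2, 3, 4}: objects 1, 2 are boundary cases
```

Objects 1 and 2 agree on every condition attribute yet carry different decisions, so they can only belong to the upper approximation; this is exactly the inconsistency that rough sets tolerate without any prior probabilistic model.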
The rough set approach seems to be of fundamental importance to AI and the cognitive sciences, especially in the areas of machine learning, knowledge acquisition, decision analysis, knowledge discovery from databases, expert systems, inductive reasoning and pattern recognition.

Irrelevant features, uncertainties and missing values often exist in medical data such as breast cancer data, so the analysis of medical data often requires dealing with incomplete and inconsistent data. This distinguishes rough sets from other intelligent techniques such as neural networks, decision trees and fuzzy theory, which are mainly based on prior hypotheses (e.g. knowledge about dependencies, probability distributions, or a large number of experiments). Rough set theory [13] can deal with uncertainty and incompleteness in data analysis: the attribute reduction algorithm removes redundant information or features and selects a feature subset that has the same discernibility as the original set of features. The medical goal is to identify subsets of the most important attributes influencing the treatment of patients, and rough set rule induction algorithms generate decision rules that help medical experts analyze and understand the dimensions of the problem.

The main advantage of rough set theory in data analysis is that it does not need any preliminary or additional information about the data, such as probability distributions in statistics or grades of membership and possibility values in fuzzy set theory. Rough sets offer many advantages that have attracted researchers:
• They provide efficient algorithms for finding hidden patterns in data, most of which are suited to parallel processing.
• They evaluate the significance of data.
• They generate sets of decision rules from data which are concise and valuable.
• They find minimal sets of data (data reduction).
• They need no membership functions or prior parameter settings, owing to their simplicity.

Rough sets have been a useful tool for medical applications. Hassanien [14] reported an application of rough sets to breast cancer data that generated rules with 98% accurate results. Tsumoto [15] proposed a rough set algorithm to generate diagnostic rules based on the hierarchical structure of differential medical diagnosis; experimental evaluation showed that the induced rules represent experts' decision processes. Komorowski and Ohrn [16] used a rough set approach to identify a patient group in need of a scintigraphic scan for subsequent modeling. In [15], a rough set classification algorithm exhibited higher classification accuracy than decision tree algorithms such as ID3 and C4.5, and the generated rules are more understandable than those produced by decision tree methods.

4.1.1 Basic Rough Set Concepts

Let I = (U, A ∪ {d}) be an information system [13], where U is the universe, a non-empty finite set of objects; A is a non-empty finite set of condition attributes; and d is the decision attribute (such a table is also called a decision table). For every a ∈ A there is a corresponding function fa : U → Va, where Va is the set of values of a. For any P ⊆ A there is an associated equivalence relation:

IND(P) = {(x, y) ∈ U × U | ∀a ∈ P, fa(x) = fa(y)}    (1)

The partition of U generated by IND(P) is denoted U/P. If (x, y) ∈ IND(P), then x and y are indiscernible with respect to P. The equivalence classes of the P-indiscernibility relation are denoted [x]P. Let X ⊆ U. The P-lower approximation P̲X and the P-upper approximation P̄X of the set X are defined as:

P̲X = {x ∈ U | [x]P ⊆ X}    (2)
P̄X = {x ∈ U | [x]P ∩ X ≠ ∅}    (3)

Let P, Q ⊆ A induce equivalence relations over U. The positive, negative and boundary regions are defined as:

POSP(Q) = ∪ { P̲X : X ∈ U/Q }    (4)
NEGP(Q) = U − ∪ { P̄X : X ∈ U/Q }    (5)
BNDP(Q) = ∪ { P̄X : X ∈ U/Q } − ∪ { P̲X : X ∈ U/Q }    (6)

The positive region POSP(Q) of the partition U/Q with respect to P is the set of all objects of U that can be certainly classified into blocks of the partition U/Q by means of P.

Functional dependence: for a given A = (U, A) and P, Q ⊆ A, P → Q denotes the functional dependence of Q on P in A, which holds if and only if IND(P) ⊆ IND(Q). Dependencies to a degree are also considered in [13]: Q depends on P to a degree k (0 ≤ k ≤ 1), denoted P ⇒k Q, where

k = γP(Q) = card(POSP(Q)) / card(U)    (7)

If k = 1, Q depends totally on P; if 0 < k < 1, Q depends partially on P; and if k = 0, Q does not depend on P. When P is a set of condition attributes and Q is the decision, γP(Q) is the quality of classification [13], [17].

The goal of attribute reduction is to remove redundant attributes so that the reduced set provides the same quality of classification as the original. The set of all reducts is defined as [18]:

Red = {R ⊆ C | γR(D) = γC(D), and ∀B ⊂ R, γB(D) ≠ γC(D)}    (8)

A dataset may have many attribute reducts. The set of all minimal reducts is:

Redmin = {R ∈ Red | ∀R′ ∈ Red, card(R) ≤ card(R′)}    (9)

The intersection of all reducts is called the core, the elements of which are those attributes that cannot be eliminated.
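Equations (7)–(9) translate directly into a brute-force check: a subset R of the condition attributes is a reduct when it preserves the dependency degree γ of the full set and no proper subset of R does. A small sketch follows; the toy table is invented for illustration, and a real reduct search would use heuristics such as those in Section 4.1.3 rather than enumerating subsets:

```python
from itertools import combinations

def gamma(rows, attrs, dec='d'):
    """Dependency degree gamma_P(Q) = card(POS_P(Q)) / card(U) (Equation 7)."""
    blocks = {}
    for i, r in enumerate(rows):
        blocks.setdefault(tuple(r[a] for a in attrs), []).append(i)
    # An equivalence block lies in the positive region iff it is pure in d.
    pos = sum(len(b) for b in blocks.values()
              if len({rows[i][dec] for i in b}) == 1)
    return pos / len(rows)

def is_reduct(rows, subset, full):
    """Reduct test following Equation (8): same gamma as the full set,
    and no proper subset achieves it."""
    if gamma(rows, subset) != gamma(rows, full):
        return False
    return all(gamma(rows, list(s)) != gamma(rows, full)
               for k in range(len(subset))
               for s in combinations(subset, k))

# Toy decision table: attribute a alone already determines d; b is redundant.
rows = [{'a': 0, 'b': 0, 'd': 0},
        {'a': 1, 'b': 0, 'd': 1},
        {'a': 1, 'b': 1, 'd': 1}]
print(gamma(rows, ['a', 'b']))                   # 1.0: the data are consistent
print(is_reduct(rows, ['a'], ['a', 'b']))        # True
print(is_reduct(rows, ['a', 'b'], ['a', 'b']))   # False: not minimal
```

Here {a} preserves γ = 1 while the full set {a, b} fails the minimality clause of Equation (8), so {a} is the reduct; the core is the intersection of all such reducts.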
The core is defined as [13], [18]:

Core(C) = ∩ Red    (10)

4.1.2 Decision Rules

An expression c: (a = v), where a ∈ A and v ∈ Va, is an elementary condition (atomic formula) of a decision rule, which can be checked for any x ∈ U. An elementary condition c can be interpreted as a mapping c : U → {true, false}. A conjunction C of q elementary conditions is denoted C = c1 ∧ c2 ∧ … ∧ cq. The cover of a conjunction C, denoted [C] (or ‖C‖A), is the subset of examples that satisfy the conditions represented by C, as shown in [17]: [C] = {x ∈ U : C(x) = true}, called the support descriptor. If K is a concept, the positive cover [C]⁺ = [C] ∩ K denotes the set of positive examples covered by C.

A decision rule r for A is any expression of the form Φ → (d = v), where Φ = c1 ∧ c2 ∧ … ∧ cq is a conjunction with a non-empty cover and v ∈ Vd, Vd being the set of values of d. The set of attribute-value pairs occurring on the left-hand side of the rule r is the condition part, Pred(r), and the right-hand side is the decision part, Succ(r). An object u ∈ U is matched by a decision rule Φ → (d = v) if and only if u supports both the condition part and the decision part of the rule; if u is matched by Φ → (d = v), we say that the rule classifies u to decision class v. The number of objects matched by a decision rule Φ → (d = v), denoted Match(r), is equal to card(‖Φ‖A); the support of the rule, card(‖Φ ∧ (d = v)‖A), is the number of objects supporting the decision rule. As in [19], the accuracy and coverage of a decision rule Φ → (d = v) are defined as:

accuracyA(r) = card(‖Φ ∧ (d = v)‖A) / card(‖Φ‖A)    (11)
coverageA(r) = card(‖Φ ∧ (d = v)‖A) / card(‖(d = v)‖A)    (12)

4.1.3 Rough Set Feature Selection

Rough sets for feature selection [20] are valuable, as the selected feature subset can generate more general decision rules and better classification quality for new samples. Because exhaustive reduct search is costly, heuristic or approximation algorithms have to be considered. K. Y. Hu [21] computes the significance of an attribute using heuristic ideas from discernibility matrices and proposes a heuristic reduction algorithm (DISMAR). X. Hu [22] gives a rough set reduction algorithm using a positive region-based attribute significance measure as a heuristic (POSAR). G. Y. Wang [23] develops a conditional information entropy reduction algorithm (CEAR).

4.2 Particle Swarm Optimization (PSO)

Particle swarm optimization (PSO) is an evolutionary computation technique developed by Kennedy and Eberhart [24], [25]. The original idea was to graphically simulate the choreography of a bird flock. Shi introduced the concept of inertia weight into the particle swarm optimizer to produce the standard PSO algorithm [26]. The concept of particle swarms has become very popular as an efficient search and optimization technique. PSO [27], [30] does not require any gradient information about the function to be optimized, uses only primitive mathematical operators, and is conceptually very simple. Since its advent in 1995, PSO has attracted the attention of researchers all over the world, resulting in a huge number of variants of the basic algorithm and many parameter automation strategies.

An analysis of the advantages of the basic particle swarm optimization algorithm is given in [28]:
• PSO is based on swarm intelligence; it can be applied to both scientific research and engineering use.
• PSO has no overlapping or mutation calculation. The search is carried out through the velocity of the particles: over the development of several generations, only the most optimistic particle transmits information to the other particles, so the search is very fast.
• The calculation in PSO is very simple. Compared with other evolutionary computations, it offers strong optimization ability and can be completed easily.
• PSO adopts a real-number code, decided directly by the solution; the number of dimensions is equal to the length of the solution.

PSO is initialized with a population of particles. Each particle is treated as a point in an S-dimensional space. The i-th particle is represented as Xi = (xi1, xi2, …, xiS). The best previous position (pbest, the position giving the best fitness value) of any particle is Pi = (pi1, pi2, …, piS). The index of the global best particle is denoted gbest. The velocity of particle i is Vi = (vi1, vi2, …, viS). The particles are manipulated according to the following equations:

vid = w·vid + c1·rand()·(pid − xid) + c2·Rand()·(pgd − xid)    (13)
xid = xid + vid    (14)

where w is the inertia weight; a suitable selection of the inertia weight provides a balance between global and local exploration and thus requires fewer iterations on average to find the optimum. If a time-varying inertia weight is employed, better performance can be expected [29]. The acceleration constants c1 and c2 in Equation (13) represent the weighting of the stochastic acceleration terms that pull each particle toward the pbest and gbest positions. Low values allow particles to roam far from target regions before being tugged back, while high values result in abrupt movement toward, or past, target regions. rand() and Rand() are two random functions in the range [0, 1]. Particle velocities on each dimension are limited to a maximum velocity Vmax. If Vmax is too small, particles may not explore sufficiently beyond locally good regions; if Vmax is too high, particles might fly past good solutions.

The first part of Equation (13) gives the "flying" particles memory capability and the ability to explore new areas of the search space. The second part is the "cognition" part, which represents the private thinking of the particle itself. The third part is the "social" part, which represents the collaboration among the particles. Equation (13) is used to update the particle's velocity; the particle then flies toward a new position according to Equation (14). The performance of each particle is measured according to a pre-defined fitness function.

The process for implementing the PSO algorithm is as follows [7]:

(1)  procedure PSO
(2)  repeat
(3)    for i = 1 to number of individuals do
(4)      if G(xi) > G(pi) then        // G() evaluates goodness
(5)        for d = 1 to dimensions do
(6)          pid = xid                // pid is the best state found so far
(7)        end for
(8)      end if
(9)      g = i                        // arbitrary
(10)     for j = indexes of neighbors do
(11)       if G(pj) > G(pg) then
(12)         g = j                    // g is the index of the best performer in the neighborhood
(13)       end if
(14)     end for
(15)     for d = 1 to number of dimensions do
(16)       vid(t) = f(xid(t − 1), vid(t − 1), pid, pgd)    // update velocity
(17)       vid ∈ (−Vmax, +Vmax)
(18)       xid(t) = f(vid(t), xid(t − 1))                  // update position
(19)     end for
(20)   end for
(21) until stopping criteria
(22) end procedure

Figure 3: Standard Particle Swarm Optimization (PSO)

Definitions and variables used in Figure 3:
• t is the current time step; t − 1 is the previous time step.
• xid(t) is the current state (position) at site d of individual i.
• vid(t) is the current velocity at site d of individual i.
• ±Vmax is the upper/lower bound placed on vid.
• pid is individual i's best state (position) found so far at site d.
• pgd is the neighborhood best state found so far at site d.

4.3 Problem Description and Basic Experimentation Setup

The breast cancer UCI dataset [31] was obtained from the University of Wisconsin Hospitals, Madison, from Dr. William H. Wolberg. We perform experimentation on the dataset summarized in Table 2.

Table 2: Data used in the experiments
Name: Wisconsin Breast Cancer Diagnostic. Instances: 699 cases. Attributes: 11. Class distribution: benign 458 cases (65.5%), malignant 241 cases (34.5%). Validation: training 80%, testing 140 cases.

The data comprise nine conditional features and one decision feature; the first attribute, which holds the sample code number, is removed as shown later. The data were processed in the WEKA software; more information about WEKA can be found in [32].

Steps to be implemented:

Step 1: Remove the sample code number from the data (it has no effect on the data), using the removal filter.
Step 2: Discretize the dataset from numeric to nominal using the NumericToNominal filter, an instance filter that converts a range of numeric attributes in the dataset into nominal attributes.
Step 3: Replace missing values of nominal and numeric attributes with the modes and means from the training data, using the ReplaceMissingValues filter.
Step 4: To find the reducts, apply the supervised attribute selection filter RSARSubsetEval (Rough Set Attribute Reduction), an implementation of the QuickReduct algorithm of rough set attribute reduction, with PSOSearch as the search method, which explores
As the search method we use PSOsearch, which explores the attribute space using the Particle Swarm Optimization (PSO) algorithm described in [33], with the parameters shown in Figure 4 and Table 3.

Table 3: PSO parameters

Individual weight: 0.34 | Inertia weight: 0.33 | Social weight: 0.33 | Iterations: 20

Figure 4: Preprocess implementation

This stage was important because Rough Set filtering was used to eliminate the unimportant and redundant features (first phase in Figure 1) and to reduce the number of iterations that PSO has to perform in finding an optimum feature subset.

Step 5: We used several classification techniques, as shown in Table 4, to classify the data. The number of decision rules and the classification accuracy are also shown.

Table 4: Comparison of classification results using various classification techniques (140 test cases; TP rate, FP rate, precision and recall are class-frequency-weighted averages; confusion matrix rows are a = benign (class 2) and b = malignant (class 4))

Technique      | Pop. size | Attributes | Correctly classified | Incorrectly classified                  | TP rate | FP rate | Precision | Recall | Confusion matrix
Naive Bayes    | 20        | 10         | 135 (96.4286%)       | 5 (3.5714%)                             | 0.964   | 0.038   | 0.965     | 0.964  | 87 3 / 2 48
Naive Bayes    | 10        | 9          | 136 (97.1429%)       | 4 (2.8571%)                             | 0.971   | 0.025   | 0.972     | 0.971  | 87 3 / 1 49
Naive Bayes    | 5         | 8          | 136 (97.1429%)       | 4 (2.8571%)                             | 0.971   | 0.025   | 0.972     | 0.971  | 87 3 / 1 49
Decision Table | 20        | 10         | 129 (92.1429%)       | 11 (7.8571%)                            | 0.921   | 0.106   | 0.921     | 0.921  | 86 4 / 7 43
Decision Table | 10        | 9          | 129 (92.1429%)       | 11 (7.8571%)                            | 0.921   | 0.106   | 0.921     | 0.921  | 86 4 / 7 43
Decision Table | 5         | 8          | 129 (92.1429%)       | 11 (7.8571%)                            | 0.921   | 0.106   | 0.921     | 0.921  | 86 4 / 7 43
Prism          | 20        | 10         | 122 (87.1429%)       | 9 (6.4286%), 9 unclassified (6.4286%)   | 0.931   | 0.104   | 0.932     | 0.931  | 82 2 / 7 40
Prism          | 10        | 9          | 119 (85.0000%)       | 10 (7.1429%), 11 unclassified (7.8571%) | 0.922   | 0.126   | 0.927     | 0.922  | 81 1 / 9 38
Prism          | 5         | 8          | 120 (85.7143%)       | 9 (6.4286%), 11 unclassified (7.8571%)  | 0.930   | 0.126   | 0.937     | 0.930  | 83 0 / 9 37
J48            | 20        | 10         | 133 (95.0000%)       | 7 (5.0000%)                             | 0.950   | 0.054   | 0.950     | 0.950  | 86 4 / 3 47
J48            | 10        | 9          | 133 (95.0000%)       | 7 (5.0000%)                             | 0.950   | 0.054   | 0.950     | 0.950  | 86 4 / 3 47
J48            | 5         | 8          | 133 (95.0000%)       | 7 (5.0000%)                             | 0.950   | 0.072   | 0.950     | 0.950  | 88 2 / 5 45
JRip           | 20        | 10         | 130 (92.8571%)       | 10 (7.1429%)                            | 0.929   | 0.084   | 0.929     | 0.929  | 85 5 / 5 45
JRip           | 10        | 9          | 135 (96.4286%)       | 5 (3.5714%)                             | 0.964   | 0.055   | 0.965     | 0.964  | 89 1 / 4 46
JRip           | 5         | 8          | 130 (92.8571%)       | 10 (7.1429%)                            | 0.929   | 0.093   | 0.928     | 0.929  | 86 4 / 6 44

JRip was run with k = 2 optimizations and 3 folds. WEKA also reported model sizes and build times for the rule and tree learners (e.g., 10 rules and about 0.08 seconds for Decision Table, 13 rules for JRip, 29 leaves and a tree size of 32 for J48).

From the results, we conclude that increasing the number of particles/individuals above 20 does not bring any relevant improvement in the algorithm's performance. Increasing or decreasing the number of iterations likewise has no influence on performance; the experiments show the ideal result at 20 iterations. For all of the classification algorithms, the best result in the three cases above is obtained with a minimum feature subset. This supports our view that the best results are obtained with a minimum feature subset. Finally, the evaluation results show that using Naïve Bayes with a population size of 5 and 20 iterations obtains the best result among the other methods.
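The weighted-average columns in Table 4 can be recomputed directly from the confusion matrices. The sketch below is a minimal re-implementation of that averaging (not WEKA's code) and reproduces the Naïve Bayes row for population size 10 and 9 attributes:

```python
def weighted_metrics(cm):
    """Accuracy plus class-frequency-weighted TP rate, FP rate and precision
    from a square confusion matrix cm[actual][predicted]."""
    n = len(cm)
    total = sum(map(sum, cm))
    acc = sum(cm[c][c] for c in range(n)) / total
    tp_rate = fp_rate = precision = 0.0
    for c in range(n):
        actual = sum(cm[c])                          # instances truly in class c
        predicted = sum(cm[r][c] for r in range(n))  # instances labelled c
        tp = cm[c][c]
        w = actual / total                           # class weight
        tp_rate += w * tp / actual
        fp_rate += w * (predicted - tp) / (total - actual)
        precision += w * (tp / predicted if predicted else 0.0)
    return acc, tp_rate, fp_rate, precision

# Naive Bayes, population size 10, 9 attributes (Table 4):
acc, tpr, fpr, prec = weighted_metrics([[87, 3], [1, 49]])
print(round(acc, 4), round(tpr, 3), round(fpr, 3), round(prec, 3))
# 0.9714 0.971 0.025 0.972
```

The weighted recall always equals the weighted TP rate, which is why those two columns agree in every row of Table 4.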
The Naïve Bayes classifier shows that classification with a minimum of features (only 8 attributes) achieves the highest result, with 136 correctly classified instances. It achieved the same result when it used 9 attributes and a population size of 10, which shows that the best result comes from minimum feature selection. The Decision Table classifier gives 129 correctly classified instances with the minimum feature subset; only 8 attributes give the same result as 10 or 9 attributes. The Prism classifier gives the worst results: it achieves 122 correctly classified instances with ten attributes, and the number of correctly classified instances drops to 120 when 8 attributes and a population size of 5 are used. We can therefore say that decreasing the number of attributes is counterproductive for the Prism classifier and lowers its classification accuracy. The J48 classifier achieves its best result, 133 correctly classified instances, with 9 attributes and a population size of 10. JRip achieves its best result, 135 correctly classified instances, with 9 attributes and a population size of 10. The best results in terms of classification accuracy and feature reduction were obtained by Naïve Bayes, followed by JRip.

5. Future Plans

Blending with other intelligent optimization algorithms [28]: the blending process combines the advantages of PSO with the advantages of other intelligent optimization algorithms to create a compound algorithm of practical value. For example, the particle swarm optimization algorithm can be improved by the simulated annealing (SA) approach, or combined with genetic algorithms, ant colony algorithms, fuzzy methods and so on.

The application area of the algorithm: at present, most PSO research addresses coordinate systems. There is less research on applying the PSO algorithm in non-coordinate systems, scattered systems and compound optimization systems.

6. CONCLUSION

Medical diagnosis is an intricate task that needs to be carried out precisely and efficiently, and its automation would be highly beneficial. Clinical decisions are often made based on a doctor's intuition and experience. Data mining techniques have the potential to generate a knowledge-rich environment which can help to significantly improve the quality of clinical decisions. Rough set theory supplies essential tools for knowledge analysis. It provides algorithms for knowledge reduction, concept approximation, decision rule induction and object classification. The methods of rough set theory rest on indiscernibility and related notions, in particular notions related to rough inclusions. All constructs needed in implementing rough set based algorithms can be derived from data tables with no need for a priori estimates or preliminary assumptions. Combining rough sets with other intelligence techniques provides a more effective approach. We have illustrated that rough sets can be successfully combined with particle swarm optimization, a recent heuristic optimization method based on swarm intelligence. It is very simple, easily implemented and needs few parameters, which has made it widely developed and applied for feature extraction tasks.

References

[1] R. Roselin, K. Thangavel, and C. Velayutham, "Fuzzy-Rough Feature Selection for Mammogram Classification", Journal of Electronic Science and Technology, Vol. 9, No. 2, June 2011.
[2] J.R. Quinlan, "Induction of Decision Trees", Machine Learning, Vol. 1, pp. 81-106, 1986.
[3] Sarvestan Soltani A., Safavi A. A., Parandeh M. N. and Salehi M., "Predicting Breast Cancer Survivability Using Data Mining Techniques", Software Technology and Engineering (ICSTE), 2nd International Conference, Vol. 2, pp. 227-231, 2010.
[4] Chang Pin Wei and Liou Ming Der, "Comparison of Three Data Mining Techniques with Genetic Algorithm in Analysis of Breast Cancer Data". Available: http://www.ym.edu.tw/~dmliou/Paper/compar_threedata.pdf
[5] Gandhi Rajiv K., Karnan Marcus and Kannan S., "Classification Rule Construction Using Particle Swarm Optimization Algorithm for Breast Cancer Datasets", Signal Acquisition and Processing (ICSAP), International Conference, pp. 233-237, 2010.
[6] S. Das, A. Abraham, S.K. Sarkar, "A Hybrid Rough Set-Particle Swarm Algorithm for Image Pixel Classification", Proc. of the Sixth Int. Conf. on Hybrid Intelligent Systems, pp. 26-32, 2006.
[7] Matthew Settles, "An Introduction to Particle Swarm Optimization", November 2007.
[8] S. B. Kotsiantis, "Supervised Machine Learning: A Review of Classification Techniques", Informatica, Vol. 31, pp. 249-268, 2007.
[9] Jensen, R. and Shen, Q., "Fuzzy-Rough Data Reduction with Ant Colony Optimization", Journal of Fuzzy Sets and Systems, Vol. 149, pp. 5-20, 2005.
[10] Monteiro, S., Uto, T.K., Kosugi, Y., Kobayashi, N., Watanabe, E. and Kameyama, K., "Feature Extraction of Hyperspectral Data for Under Spilled Blood Visualization Using Particle Swarm Optimization", International Journal of Bioelectromagnetism, Vol. 7, No. 1, pp. 232-235, 2005.
[11] Yan Wang, Lizhuang Ma, "Feature Selection for Medical Dataset Using Rough Set Theory", Proceedings of the 3rd WSEAS International Conference on Computer Engineering and Applications.
[12] Jensen, R., Shen, Q. and Tuson, A., "Finding Rough Set Reducts with SAT", Proceedings of the 10th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, LNAI 3641, pp. 194-203, 2005.
[13] Z. Pawlak, "Rough Sets: Theoretical Aspects of Reasoning About Data", Kluwer Academic Publishers, Dordrecht, 1991.
[14] A.E. Hassanien, "Rough Set Approach for Attribute Reduction and Rule Generation: A Case of Patients with Suspected Breast Cancer", Journal of the American Society for Information Science and Technology, Vol. 55, No. 11, pp. 954-962, 2004.
[15] S. Tsumoto, "Mining Diagnostic Rules from Clinical Databases Using Rough Sets and Medical Diagnostic Model", Information Sciences, Vol. 162, pp. 65-80, 2004.
[16] J. Komorowski, A. Ohrn, "Modeling Prognostic Power of Cardiac Tests Using Rough Sets", Artificial Intelligence in Medicine, Vol. 15, pp. 167-191, 1999.
[17] Z. Pawlak, "Rough Set Approach to Knowledge-Based Decision Support", European Journal of Operational Research, Vol. 99, pp. 48-57, 1997.
[18] Xiangyang Wang, Jie Yang, Xiaolong Teng, Weijun Xia, Richard Jensen, "Feature Selection Based on Rough Sets and Particle Swarm Optimization".
[19] A. Skowron, C. Rauszer, "The Discernibility Matrices and Functions in Information Systems", In: R.W. Swiniarski (Ed.), Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, Kluwer Academic Publishers, Dordrecht, pp. 311-362, 1992.
[20] R.W. Swiniarski, A. Skowron, "Rough Set Methods in Feature Selection and Recognition", Pattern Recognition Letters, Vol. 24, pp. 833-849, 2003.
[21] K.Y. Hu, Y.C. Lu, C.Y. Shi, "Feature Ranking in Rough Sets", AI Communications, Vol. 16, No. 1, pp. 41-50, 2003.
[22] X. Hu, "Knowledge Discovery in Databases: An Attribute-Oriented Rough Set Approach", Ph.D. thesis, Regina University, 1995.
[23] G.Y. Wang, J. Zhao, J.J. An, Y. Wu, "Theoretical Study on Attribute Reduction of Rough Set Theory: Comparison of Algebra and Information Views", Proceedings of the Third IEEE International Conference on Cognitive Informatics (ICCI'04), 2004.
[24] J. Kennedy, R. Eberhart, "Particle Swarm Optimization", Proc. IEEE Int. Conf. on Neural Networks, Perth, pp. 1942-1948, 1995.
[25] J. Kennedy, R.C. Eberhart, "A New Optimizer Using Particle Swarm Theory", Sixth International Symposium on Micro Machine and Human Science, Nagoya, pp. 39-43, 1995.
[26] Y. Shi, R. Eberhart, "A Modified Particle Swarm Optimizer", Proc. IEEE Int. Conf. on Evolutionary Computation, Anchorage, AK, USA, pp. 69-73, 1998.
[27] J. Kennedy, "Small Worlds and Mega-Minds: Effects of Neighborhood Topology on Particle Swarm Performance", Proceedings of the 1999 Congress on Evolutionary Computation, IEEE Press, Vol. 3, pp. 1931-1938, 1999.
[28] Qinghai Bai, "Analysis of Particle Swarm Optimization Algorithm", Computer and Information Science, Vol. 3, No. 1, 2010.
[29] Y. Shi, R.C. Eberhart, "Parameter Selection in Particle Swarm Optimization", Evolutionary Programming VII: Proc. EP98, Springer-Verlag, New York, pp. 591-600, 1998.
[30] R.C. Eberhart, Y. Shi, "Particle Swarm Optimization: Developments, Applications and Resources", Proc. IEEE Int. Conf. on Evolutionary Computation, Seoul, pp. 81-86, 2001.
[31] http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data
[32] http://www.cs.waikato.ac.nz/ml/weka/
[33] Moraglio, A., Di Chio, C. and Poli, R., "Geometric Particle Swarm Optimization", EuroGP, LNCS 4445, pp. 125-135, 2007.
