(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 2, February 2011
http://sites.google.com/site/ijcsis/ ISSN 1947-5500

AN IMPROVED MULTIPERCEPTRON NEURAL NETWORK MODEL TO CLASSIFY SOFTWARE DEFECTS

M.V.P. Chandra Sekhara Rao, Aparna Chaparala
Department of CSE, R.V.R. & J.C. College of Engineering, Guntur, India

Dr. B. Raveendra Babu
Director (Operations), Delta Technologies (P) Ltd., Hyderabad, India

Dr. A. Damodaram
JNTU, CSE Department, JNTU College of Engineering, Kukatpally, Hyderabad, India

Abstract: Predicting software defects in modules not only helps in maintaining legacy systems but also supports the software development process and ensures higher reliability. Its advantages include better planning of project resources and a smaller budget. Research to date has used statistical methods and machine learning techniques that are generic in nature. Depending on legacy software systems to meet current, demanding requirements is a major challenge for any IT administrator, as is estimating the cost of maintaining them. This paper proposes a modification of the existing multilayer perceptron neural network, a popular supervised classification algorithm, to predict defects in a given module from the available software metrics.

Keywords: Legacy software, Software metrics, Software reliability, Classification, Multilayer Perceptron Neural network, Fault-proneness.

I. INTRODUCTION

Software reliability and software quality assurance are two major areas of software engineering that ensure high-quality software. Both concepts come into play throughout the development and maintenance process. The major activities used are performance analysis, functional tests, and quantification of time and budget, along with measurement of metrics. In addition, code reviews, key personnel assignment and automatic test-case generation are other strategies applied to reach high reliability.

Software quality can be viewed from different perspectives including time, budget and mean time to failure. Alpha and beta testing help to improve the quality of software but do not ensure zero defects, and they are very expensive if not planned properly.

Software quality modeling becomes an important criterion to ensure that the software not only meets the desired quality but also stays within time and budget lines. Defect prediction based on quantifiable metrics, though controversial, has been used successfully to predict defects in modules. Defect prediction models have independent variables captured in the form of product and process metrics, and one dependent variable that indicates whether or not the module is likely to contain a fault. Typically, researchers have used product metrics extensively to predict faults in modules. The independent variables used for defect prediction can be parameters captured in previous projects, which are available in the configuration management system, or can be computed from the current project.

Predicting module defects also finds application in legacy systems that cannot be replaced through the practice of application retirement. Defect prediction provides a cost-effective process to enhance them.

The previous work carried out by the authors investigates the KC1 dataset for defect classification using decision tree induction and Bayesian networks. Various pre-processing techniques were also investigated. The results obtained are tabulated in Tables I and II.

TABLE I. CLASSIFICATION ACCURACY ON KC1 DATASET

KC1 Dataset                     Correctly classified %   Incorrectly classified %   Mean absolute error
Random tree                     81.86                    18.14                      0.1924
CART                            84.91                    15.09                      0.2095
Bayesian logistic regression    86.03                    13.97                      0.1397

TABLE II. CLASSIFICATION ACCURACY AFTER PREPROCESSING IN KC1 DATA SET

                        % Correctly classified   % Incorrectly classified
Random Tree             94.5531                  5.4469
Logistic regression     95.6704                  4.3296
CART                    96.7877                  3.2123

In this paper, the efficacy of neural networks for defect prediction is verified using both the available model and our proposed model.

This paper is organized into the following sections. Section II describes software metrics, Section III describes data mining techniques for classification, Section IV gives an introduction to the neural networks used, Section V describes the dataset used in the work, and Section VI presents the improved neural network technique and the output obtained. The last section analyses and concludes the paper.

II. SOFTWARE METRICS

Software metrics are collected at various phases of the software development process. These metrics contain information about the software and can be used to predict software quality in the early stages of the software life cycle.

Software reliability engineering is one of the most important aspects of software quality. Recent studies show that software metrics can be used in software module fault-proneness prediction. A software module has a series of metrics, some of which are related to fault-proneness. Multiple research works on software quality prediction using the relationship between software metrics and a module's fault-proneness have been done in the last decades. Several techniques have been proposed to classify modules in order to identify fault-prone ones.

III. DATA MINING TECHNIQUES

Data Mining (DM) aims to establish something new from the facts recorded in databases. Originally, data mining was a statistician's term for overusing data to draw illegitimate inferences. Today, DM is the use of powerful tools to sift out important or significant traits that were previously unknown from databases or data warehouses.

Software is prone to errors and bugs. Software testing assesses the quality of computer software and verifies whether it complies with the software specification and customer needs. There are two ways to find errors in software testing: manual and automated. Manual debugging is labour intensive and costly, while automated debugging can classify and locate software defects automatically. Data mining based software debugging is becoming more and more accepted, and it can significantly reduce the labour cost of software debugging.

Data mining extracts useful information and knowledge from huge amounts of data. DM methods can be applied to the data generated in every stage of the software life cycle, such as design, development, testing, deployment and maintenance, to extract potential errors in the software.

IV. NEURAL NETWORKS

Neural networks consist of multiple layers of computational units, usually interconnected in a feed-forward way. Each neuron in one layer has directed connections to the neurons of the subsequent layer. In many applications the units of these networks apply a sigmoid function as an activation function.

The feed-forward neural network was the first and arguably simplest type of artificial neural network devised. Because the majority of faults are found in a few modules, there is a need to investigate the modules that are affected more severely than others and to carry out proper maintenance on time, especially for critical applications (Ebru Ardil et al., 2009).

Algorithms based on neural networks have many applications in knowledge engineering. In data mining, the following neural network architectures are used: multilayered feed-forward neural networks and Kohonen's self-organizing maps.
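The feed-forward computation described in this section can be illustrated with a minimal sketch. It assumes fully connected layers with a bias per neuron and a sigmoid activation; the function names, the 2-3-1 layout and all weight values are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """One fully connected layer: each neuron takes a weighted sum of all
    inputs plus its bias, then applies the sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def feed_forward(inputs, layers):
    """Pass the inputs through the layers in order; the output of one
    layer is the input of the next."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Hypothetical 2-3-1 network: 2 inputs, 3 hidden neurons, 1 output neuron.
layers = [
    ([[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.2, 0.4]], [0.05]),
]
output = feed_forward([1.0, 0.5], layers)
print(output)
```

Because every connection points from one layer to the next, a single pass over the layers is enough to compute the output; no iteration or feedback is involved.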
A) Multilayered feed forward neural networks

Multilayered feed-forward neural networks (ANNs) are non-parametric regression methods, which approximate the underlying functionality in data by minimizing a loss function. The common loss function used for training an ANN is the quadratic error function. ANNs are adapted by supervised learning, with the database records forming the training set. During training, specified items of the data records are presented as the input of the neural network, and its weights are changed in such a way that its output approximates the values in the data set. After the learning process finishes, the learned knowledge is represented by the values of the neural network weights. For training, the back-propagation of error algorithm is often used.

Fig. 1. Multilayer Neural Network (inputs x1 ... xi ... xn in the input layer, weights w1j ... wij ... wnj into hidden-layer outputs Oj, and weights wjk into output-layer outputs Ok)

B) Kohonen's self-organizing maps

Kohonen's self-organizing maps (SOMs) have become a promising technique in cluster analysis. They are adapted by unsupervised learning. In data mining, cluster techniques based on Kohonen's self-organizing maps have the following advantages over standard statistical methods.

DM typically deals with high-dimensional data. A record in a database typically consists of a large number of items. The data do not have a regular multivariate distribution, so traditional statistical methods have their limitations and are not effective. SOMs work with high-dimensional data efficiently.

Kohonen's self-organizing maps provide a means of visualizing multivariate data, because two clusters of similar members activate output neurons that lie a small distance apart in the output layer. In other words, neurons that share a topological resemblance will be sensitive to inputs that are similar. No other cluster analysis algorithm has this property. A SOM is a dynamic system that learns abstract structure in a high-dimensional input space using a low-dimensional space for representation.

V. DATA SET

Data from NASA's Metric Data Program (MDP) data repository is used. The KC1 dataset used contains LOC measures, cyclomatic complexity, base Halstead measures and derived Halstead measures from various software modules.

The attributes used in this work are described briefly below.

LOC_BLANK - The number of blank lines in a module.
LOC_CODE_AND_COMMENT - The number of lines which contain both code & comment in a module.
LOC_COMMENTS - The number of lines of comments in a module.
CYCLOMATIC_COMPLEXITY - The cyclomatic complexity of a module.
DESIGN_COMPLEXITY - The design complexity of a module.
ESSENTIAL_COMPLEXITY - The essential complexity of a module.
LOC_EXECUTABLE - The number of lines of executable code for a module (not blank or comment).
HALSTEAD_CONTENT - The Halstead length content of a module.
HALSTEAD_DIFFICULTY - The Halstead difficulty metric of a module.
HALSTEAD_EFFORT - The Halstead effort metric of a module.
HALSTEAD_ERROR_EST - The Halstead error estimate metric of a module.
HALSTEAD_LENGTH - The Halstead length metric of a module.
HALSTEAD_LEVEL - The Halstead level metric of a module.
HALSTEAD_PROG_TIME - The Halstead programming time metric of a module.
HALSTEAD_VOLUME - The Halstead volume metric of a module.
NUM_OPERANDS - The number of operands contained in a module.
NUM_OPERATORS - The number of operators contained in a module.
NUM_UNIQUE_OPERANDS - The number of unique operands contained in a module.
NUM_UNIQUE_OPERATORS - The number of unique operators contained in a module.
LOC_TOTAL - The total number of lines for a given module.
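The derived Halstead attributes in the list above are functions of the four base counts (operators, operands, unique operators, unique operands). The sketch below uses the textbook definitions from Halstead's Elements of Software Science. Whether the MDP repository computes its columns with exactly these constants (18 for programming time, 3000 for the error estimate) is an assumption, and the function name `halstead` and the sample counts are hypothetical.

```python
import math


def halstead(num_operators, num_operands,
             num_unique_operators, num_unique_operands):
    """Derive Halstead measures from the four base counts.

    Textbook definitions are assumed; the dataset's own derivation
    may differ in constants or rounding.
    """
    if num_unique_operands == 0:
        raise ValueError("difficulty is undefined without operands")
    vocabulary = num_unique_operators + num_unique_operands
    length = num_operators + num_operands
    volume = length * math.log2(vocabulary)
    difficulty = (num_unique_operators / 2.0) * (num_operands / num_unique_operands)
    effort = difficulty * volume
    return {
        "HALSTEAD_LENGTH": length,
        "HALSTEAD_VOLUME": volume,
        "HALSTEAD_DIFFICULTY": difficulty,
        "HALSTEAD_LEVEL": 1.0 / difficulty,
        "HALSTEAD_EFFORT": effort,
        "HALSTEAD_CONTENT": volume / difficulty,
        "HALSTEAD_PROG_TIME": effort / 18.0,     # seconds, assumed constant 18
        "HALSTEAD_ERROR_EST": volume / 3000.0,   # assumed constant 3000
    }


# Hypothetical module: 20 operators and 12 operands, 4 unique of each.
measures = halstead(num_operators=20, num_operands=12,
                    num_unique_operators=4, num_unique_operands=4)
for name, value in measures.items():
    print(name, value)
```

Since most of these attributes are deterministic functions of the same four counts, they are strongly correlated with one another, which is worth keeping in mind when they are all fed to a classifier as separate inputs.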
VI. PROPOSED METHODOLOGY & EXPERIMENTAL INVESTIGATION

The Multilayer Perceptron is an example of a supervised learning artificial neural network that is used extensively for the solution of a number of different problems, including classification, pattern recognition and interpolation. The algorithm for perceptron learning is based on the back-propagation rule. The hidden layer typically uses either a sigmoid or a tanh function. The algorithm for the multilayer perceptron neural network is given below.

i. Present input and desired output
Present input $Y_p = y_0, y_1, y_2, \ldots, y_{n-1}$ and target output $C_p = c_0, c_1, \ldots, c_{m-1}$, where n is the number of input nodes and m is the number of output nodes.

ii. Calculate the actual output
Each layer calculates the following:
$o_{pj} = f\left[w_{0j} y_0 + w_{1j} y_1 + \cdots + w_{n-1,j} y_{n-1}\right]$
This is then passed to the next layer as an input. The final layer outputs the values $o_{pj}$.

iii. Adapt weights
Starting from the output layer, we now work backwards.
$w_{ij}(t+1) = w_{ij}(t) + \eta\,\delta_{pj}\,o_{pj}$, where $\eta$ is a gain term and $\delta_{pj}$ is an error term for pattern p on node j.
For output units
$\delta_{pj} = k\,o_{pj}(1 - o_{pj})(t - o_{pj})$, where t is the target output for node j.
For hidden units
$\delta_{pj} = k\,o_{pj}(1 - o_{pj})\left[\delta_{p0} w_{j0} + \delta_{p1} w_{j1} + \cdots + \delta_{pk} w_{jk}\right]$
where the sum is over the k nodes in the layer above node j.

In this paper, a fuzzy bell hidden layer is proposed, which uses a bell-shaped curve of the input y and the weight w as its fuzzy membership function in the hidden layer. The L2 criterion is used to compute the cost function. The error passed to the supervised learning procedure is the squared Euclidean distance between the network's output and the desired response.

65 percent of the data was used as the training set and the remainder as the test set. The classification accuracy obtained on the KC1 dataset is 98.2%.

The proposed fuzzy-based neural model classified better than Random Tree by 14.66%, CART by 11.41% and Bayesian logistic regression by 10.50%. However, the proposed method needs to be evaluated with other datasets to better test its consistency.

The results obtained by the proposed methodology improve on the regular multilayer perceptron model with a sigmoidal hidden function by 3.92%. Figure 2 displays the accuracy obtained by the various classification methods.

FIG. 2. CLASSIFICATION ACCURACY ON KC1 DATA SET (bar chart, accuracy axis from 70 to 100; methods shown: Random Tree, CART, Logistic Regression, their variants with preprocessing, MLP NN and the Proposed NN Model)

CONCLUSION

In this paper, it has been observed that the proposed Bell fuzzy based neural network model performs better than the existing neural network model and the other classification algorithms on the KC1 dataset. These results indicate that the Bell fuzzy function used in the multilayer perceptron neural network improves the classification accuracy of software defect prediction.
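As an illustration of the training rule in Section VI, the sketch below performs back-propagation steps on a tiny network whose hidden layer uses a bell-shaped activation. Several things here are assumptions rather than the authors' method: the bell curve is taken to be 1 / (1 + z^2) with z the weighted sum of the inputs, the output unit is a sigmoid, the constant k is set to 1, and the weights, sample and gain term `eta` are hypothetical. For the bell hidden units, the sigmoid factor o(1 - o) in the hidden error term is replaced by the derivative of the bell curve.

```python
import math


def bell(z):
    """Assumed bell-shaped membership function, peaking at 1 when z = 0."""
    return 1.0 / (1.0 + z * z)


def bell_grad(z):
    """Derivative of bell(z) with respect to z."""
    return -2.0 * z / (1.0 + z * z) ** 2


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(y, w_hidden, w_out):
    """Return hidden pre-activations, hidden outputs and the network output."""
    z_hidden = [sum(w * x for w, x in zip(row, y)) for row in w_hidden]
    h = [bell(z) for z in z_hidden]
    o = sigmoid(sum(w * a for w, a in zip(w_out, h)))
    return z_hidden, h, o


def train_step(y, target, w_hidden, w_out, eta=0.5):
    """One gradient step on the squared error for a single pattern."""
    z_hidden, h, o = forward(y, w_hidden, w_out)
    # Error term of the sigmoid output unit (k assumed to be 1).
    delta_out = o * (1.0 - o) * (target - o)
    # Hidden error terms: output error propagated back through the old
    # output weights, scaled by the slope of the bell curve.
    delta_hidden = [bell_grad(z) * delta_out * w
                    for z, w in zip(z_hidden, w_out)]
    new_w_out = [w + eta * delta_out * a for w, a in zip(w_out, h)]
    new_w_hidden = [[w + eta * d * x for w, x in zip(row, y)]
                    for row, d in zip(w_hidden, delta_hidden)]
    return new_w_hidden, new_w_out


def squared_error(y, target, w_hidden, w_out):
    return (target - forward(y, w_hidden, w_out)[2]) ** 2


# Hypothetical 3-2-1 network and a single training pattern.
w_hidden = [[0.2, -0.1, 0.4], [-0.3, 0.5, 0.1]]
w_out = [0.3, -0.2]
y, target = [0.5, -1.0, 0.25], 1.0

error_before = squared_error(y, target, w_hidden, w_out)
for _ in range(100):
    w_hidden, w_out = train_step(y, target, w_hidden, w_out)
error_after = squared_error(y, target, w_hidden, w_out)
print(error_before, error_after)
```

Unlike a sigmoid, the bell curve is not monotonic: a hidden unit responds most strongly when its weighted input is near zero and falls off on both sides, so each hidden unit behaves like a soft membership test rather than a threshold.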
REFERENCES

[1] C. Yilmaz, C. Catal, O. Kalipsiz, & A. Porter, "Distributed Quality Assurance", Proc. 2nd Turkish National Symposium on Software Engineering, Ankara, Turkey, 2005, 189-198.
[2] T. M. Khoshgoftaar & N. Seliya, "An Empirical Study of Predicting Software Faults with Case Based Reasoning", Software Quality Journal, 14, 2006, 85-111.
[3] M.V.P. Chandra Sekhara Rao, Dr. B. Raveendra Babu, Dr. A. Damodaram and B. Madhusudhanan, "Business Intelligence Model Using Data Mining Techniques for Code optimization in legacy systems".
[4] M.V.P. Chandra Sekhara Rao, Dr. B. Raveendra Babu, Dr. A. Damodaram and Ch. Aparna, "Severity Based Code optimization: A Data Mining Approach", International Journal of Computer Science and Engineering (IJCSE), Vol. 02, No. 05, 2010, 1754-1757.
[5] N. Nagappan, T. Ball, B. Murphy, "Using Historical Data and Product Metrics for Early Estimation of Software Failures", In Proc. ISSRE 2006, Raleigh, NC, 2006.
[6] Stefan Lessmann, "Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings", IEEE Transactions on Software Engineering, 34(4), 2008, pp. 485-496.
[7] M.H. Halstead, Elements of Software Science, Elsevier, 1977.
[8] NASA Metrics Data Repository, available: www.mdp.ivv.nasa.gov
[9] J. Han, M. Kamber, "Data Mining: Concepts and Techniques", Harcourt India Private Limited, 2001.
[10] H. Lu, R. Setiono, H. Liu, "Effective Data Mining Using Neural Network", IEEE Transactions on Knowledge and Data Engineering, 8(6), 1996, 957-961.
[11] B.M. Wilamowski, "Neural Network Architectures and Learning Algorithms", IEEE Industrial Electronics Magazine, Vol. 3, Issue 4, pp. 56-63, 2009.

ABOUT THE AUTHORS

1 M.V.P. Chandra Sekhara Rao is an Associate Professor in the Department of Computer Science and Engineering at R.V.R. & J.C. College of Engineering, Chowdavaram, Guntur. He has 15 years of teaching experience. He completed his B.E. and M.Tech in Computer Science & Engineering. He is doing research in the area of data mining and is presently pursuing a Ph.D. from J.N.T.U., Hyderabad. He has published 3 papers in international journals and presented one paper at an international conference.

2 Aparna Chaparala is an Associate Professor in the Department of Computer Science and Engineering at R.V.R. & J.C. College of Engineering, Chowdavaram, Guntur. She has 9 years of teaching experience. She completed her M.Tech in Computer Science & Engineering. She is doing research in the area of data mining and is presently pursuing a Ph.D. from J.N.T.U., Hyderabad. She has published 3 papers in international journals.

3 Dr. B. Raveendra Babu obtained his Masters in Computer Science and Engineering from Anna University, Chennai. He received his Ph.D. in Applied Mathematics at S.V. University, Tirupati. He is currently leading a team as Director (Operations), M/s. Delta Technologies (P) Ltd., Madhapur, Hyderabad. He has 26 years of teaching experience and more than 25 international and national publications to his credit. His research areas of interest include VLDB, image processing, pattern analysis and wavelets.

4 Dr. A. Damodaram received his B.Tech (CSE) and M.Tech (CSE) from JNTU, Hyderabad, and did his Ph.D. in the image processing area at JNTU, Hyderabad. He has been serving JNTU since 1989. He is a Professor in the Department of C.S.E. and has worked as Director, Vice-Principal, JNTU-UGC-Academic Staff College. He is presently working as Director for Distance Education Learning, JNTU, Hyderabad. He has published more than 30 research publications in various national and international conferences, proceedings and journals.