
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 9, December 2010
http://sites.google.com/site/ijcsis/ ISSN 1947-5500

Extracting Membership Functions Using ACS Method via Multiple Minimum Supports

Ehsan Vejdani Mahmoudi
Islamic Azad University, Mashhad Branch, Young Researchers Club, Mashhad, Iran
e.vejdani@mshdiau.ac.ir

Masood Niazi Torshiz
Department of Computer Engineering, Islamic Azad University - Mashhad Branch, Mashhad, Iran
niazi@mshdiau.ac.ir

Mehrdad Jalali
Department of Computer Engineering, Islamic Azad University - Mashhad Branch, Mashhad, Iran
jalali@mshdiau.ac.ir

Abstract: Ant Colony Systems (ACS) have been successfully applied to many optimization problems in recent years. However, only a few works have applied the ACS method to data mining. This paper addresses this gap by proposing an ACS-based algorithm to extract membership functions in fuzzy data mining. The membership functions are encoded into binary bits and then given to the ACS method to discover an optimal set of membership functions. This approach allows a comprehensive search to be carried out automatically, since the proposed model does not require any user-specified minimum support threshold. We evaluated the approach experimentally, and the results show a significant improvement in the extracted membership functions.

Keywords: fuzzy data mining; multiple minimum supports; association rule; membership functions; ant colony system.

I. INTRODUCTION

Recently, fuzzy set theory has been used more and more frequently in intelligent systems because of its simplicity and its similarity to human reasoning [1]. For fuzzy data mining, Hong and Kuo proposed a mining approach that integrates fuzzy-set concepts with the Apriori mining algorithm [2].

Ant Colony Optimization (ACO) is a branch of a larger field referred to as Swarm Intelligence (SI). SI is the property of a system whereby the collective behaviors of simple agents interacting locally with their environment cause coherent functional global patterns to emerge [3]. It is the behavioral simulation of social insects such as bees, ants, wasps and termites. Scientists simulate these insects for many reasons; optimizing systems and learning about self-organization are two of them. More specifically, ACO simulates the collective foraging habits of ants: ants venture out for food and bring what they find back to the nest. Ants have poor vision and poor communication skills, and a single ant has a low probability of survival. However, a large group, or swarm, of ants can collectively perform complex tasks with proven effectiveness, such as gathering food, sorting corpses or dividing labor [4]. ACO algorithms are therefore heuristic approaches inspired by the behavior of social insects. Ants deposit chemical trails called "pheromone" on the ground to communicate with one another, and by following the pheromone they can find the shortest path between a source and a destination. Recently, Ant Colony Systems (ACS) have been successfully applied to several difficult NP-hard problems, such as the quadratic assignment problem [5], communication strategies [6], production sequencing [7], the job-shop scheduling problem (JSP) [8], the traveling salesman problem [9], [10], and vehicle routing problems (VRP) [11].

Fuzzy mining algorithms typically first use membership functions to transform each quantitative value into a fuzzy set in linguistic terms, and then apply a fuzzy mining process to find fuzzy association rules. Since items have different characteristics, different minimum supports should be specified for different items. Han, Wang, Lu, and Tzvetkov [12] have pointed out that setting the minimum support is quite subtle, which can hinder the widespread application of these algorithms. Our own experience in mining transaction databases also shows that this setting is by no means easy. Therefore, we propose a method that computes the minimum support of each item in the database from the item's own features. Because the model does not require a user-specified minimum support threshold, the approach supports effective and efficient global search as well as system automation.

Numerical experiments on the proposed algorithm are also performed to show its effectiveness. The remainder of the paper is organized as follows. Section II presents an ACS-based mining framework. The proposed algorithm based on this framework is described in Section III. Numerical simulations are shown in Section IV. Conclusions are given in Section V.

II. THE ACS-BASED FUZZY MINING FRAMEWORK

In this section, the ACS-based fuzzy mining framework [13] is shown in Fig. 1, where each item has its own membership function set. These membership function sets are fed into the ant colony system to search for the final proper sets. When the termination condition is reached, the best membership function set (the one with the highest fitness value) is used to mine fuzzy association rules from a database.

The proposed framework modifies the ACS-based Framework for Fuzzy Data Mining in [13] and is divided into two phases. The first phase searches for an appropriate set of membership functions for the items by the ACS mining algorithm.
Having found the solutions in the first phase, we use the best membership functions for fuzzy data mining in the second phase.

The ACS algorithm plays an important role in extracting the membership functions. Parpinelli et al. proposed Ant-Miner to discover classification rules [14]. They worked on categorical attributes and discrete values, and showed that the ACS algorithm performs well at handling discrete values in a solution space. In this work, we treat the parameters of the membership functions as discrete values and use the ACS algorithm to find them. We transform the extraction of membership functions into a route-search problem, in which a route represents a possible set of membership functions. Artificial ants, that is, virtual ants used to solve this problem, can then find a nearly optimal solution.

Figure 1. The ACS-based framework for fuzzy data mining.

III. THE ACS-BASED FUZZY DATA MINING ALGORITHM

A. Initializations

Since encoding the membership functions of all items together would result in a long code, we encode the membership functions of each item into a separate binary code, using the coding algorithm presented in [13]. Furthermore, we use the state transition rule, the pheromone updating rule, the local updating rule and the global updating rule defined in [15].

In this work, each item has a set of isosceles-triangular membership functions, which stand for linguistic terms such as low, middle and high. Transforming quantitative values into linguistic terms requires a feasible population for the database; therefore, we need to initialize and update a population during the evolution process. We use the fitness function proposed by Chen et al. [16] to obtain a good set of membership functions.

B. ACS-based fuzzy data mining algorithm

Although the algorithms in [15] and [16] use one constant minimum support for all items, we apply a separately determined minimum support to each item. In real-world applications, such as the transactional data of chain stores, items occur in different quantities. Hence, using a different minimum support for each item when extracting membership functions is an efficient idea. Unlike previous approaches, in which the user specifies the minimum support, the new approach obtains the minimum supports by preprocessing all items: the minimum support of each item is set automatically to a value corresponding to the quantity of that item. We therefore compute the minimum support of each item from its characteristics in the database. Important criteria for this computation include the number of times each item occurs in the database and the sum of its values in the database. For example, suppose item A occurs 10 times in the database with a sum of values of 20, while item B occurs 2 times with a sum of values of 20. Clearly, item A is more valuable than item B in the mining process, so we compute the minimum support of item B such that this item cannot satisfy it. Based on these considerations, we propose (1):

min_Sup(I) = ∑ … ∗ … ∗ …   (1)

where $I = \{i_1, i_2, \ldots, i_m\}$ is the set of items, $D = \{t_1, t_2, \ldots, t_n\}$ is the set of transactions, N is the total number of transactions, T is the number of times the item occurs in the database, $S_i$ is the sum of the item's values in database D, and P is a constant in the interval [0, 1].

In addition, we adopt the parameters defined in [13]: the number of artificial ants, the minimum pheromone ratio of an ant, the evaporation ratio of pheromone, the local updating ratio, and the global updating ratio. The proposed ACS-based algorithm for mining membership functions and fuzzy association rules is given as follows.

INPUT:
a) quantitative transaction data;
b) a set of m items, each with l predefined linguistic terms;
c) a maximum number of iterations G;
d) a constant P in the interval [0, 1].
OUTPUT: An appropriate set of membership functions for all items in fuzzy data mining.

step 1) Let p = 1, where p keeps the identity number of the item being processed.
step 2) Let the multi-stage graph for the fuzzy mining problem be (V, E), where V is the set of nodes and E is the set of edges. Denote the j-th node in the i-th stage as $v_{i,j}$, and the edge from $v_{i,j}$ to $v_{i+1,k}$ as $e_{i,j,k}$. Initially set the pheromone on every edge to 0.5.
step 3) Let the initial generation g = 1.
step 4) Build the complete route of each artificial ant by the following sub-steps:
a) Select the edges from start to end according to the state transition rule.
b) Update the pheromone on the edges passed through according to the local updating rule.
step 5) Evaluate the fitness value of the solution (membership functions) obtained by each artificial ant by the following sub-steps:
a) For each transaction datum $t_i$, $i = 1$ to $n$, transform its quantitative value $v_i^{(p)}$ for item $I_p$ into a fuzzy set $f_i^{(p)}$ according to the membership functions obtained by the ant, as in (2):

$$f_i^{(p)} = \frac{f_{p1}^{(i)}}{R_{p1}} + \frac{f_{p2}^{(i)}}{R_{p2}} + \cdots + \frac{f_{pk}^{(i)}}{R_{pk}} + \cdots + \frac{f_{pl}^{(i)}}{R_{pl}} \qquad (2)$$

where $R_{pk}$ is the k-th fuzzy region (linguistic term) of item $I_p$, $f_{pk}^{(i)}$ is the fuzzy membership value of $v_i^{(p)}$ in that region, and l is the number of fuzzy membership functions.
b) Calculate the scalar cardinality of each fuzzy region $R_{pk}$ over the transactions as in (3):

$$count_{pk} = \sum_{i=1}^{n} f_{pk}^{(i)} \qquad (3)$$

where $f_{pk}^{(i)}$ is the fuzzy membership value of region $R_{pk}$ for the i-th datum.
c) Check for each $R_{pk}$ whether its $count_{pk}/n$ is larger than or equal to the minimum support threshold $\mathrm{min\_Sup}(I_p)$. If $R_{pk}$ satisfies this condition, put it in the set of large 1-itemsets ($L_1$).
d) Calculate the fitness value of the solution from the ant by dividing the number of large itemsets in $L_1$ by the suitability, as in (4):

$$\text{fitness} = \frac{|L_1|}{\text{suitability}} \qquad (4)$$

step 6) Once all the artificial ants have completed their routes, the one holding the highest fitness value is used to update the pheromone according to the global updating rule.
step 7) If the generation g is equal to G, output the current best set of membership functions of item $I_p$ for fuzzy data mining; otherwise, set g = g + 1 and go to Step 4.
step 8) If p ≠ m, set p = p + 1 and go to Step 2 for the next item; otherwise, stop the algorithm.

The final set of membership functions output in Step 7 and the large 1-itemsets obtained are then used to mine fuzzy association rules from the given database.
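Section II casts the search as a route through a multi-stage graph, and Section III.A encodes each item's membership functions as a binary code following [13]. That coding algorithm is not restated in this paper, so the Python sketch below uses a hypothetical scheme only to show how one route (a bit string) can be read as a set of isosceles-triangle parameters: each linguistic term gets one bit field for its center and one for its half-span, scaled by the assumed parameters `lo` and `step`.

```python
def decode_route(bits, n_terms, bits_per_param, lo, step):
    """Decode a binary route into (center, half_span) pairs, one per linguistic term.

    Hypothetical coding scheme for illustration, NOT the coding algorithm of [13]:
    each term uses two consecutive fields of `bits_per_param` bits. Field value v maps
    to lo + v * step for the center and to (v + 1) * step for the half-span, so the
    half-span is never zero.
    """
    expected = n_terms * 2 * bits_per_param
    if len(bits) != expected:
        raise ValueError(f"expected {expected} bits, got {len(bits)}")
    params = []
    for term in range(n_terms):
        fields = []
        for field in range(2):  # field 0: center, field 1: half-span
            start = (2 * term + field) * bits_per_param
            value = 0
            for b in bits[start:start + bits_per_param]:
                value = value * 2 + b  # read the field as an unsigned binary number
            fields.append(value)
        params.append((lo + fields[0] * step, (fields[1] + 1) * step))
    return params


# A 12-bit route for three terms (e.g. low, middle, high) with 2 bits per field.
route = (0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1)
print(decode_route(route, n_terms=3, bits_per_param=2, lo=0.0, step=2.0))
# [(0.0, 4.0), (2.0, 6.0), (6.0, 4.0)]
```

Each stage of the graph then corresponds to one bit, so the number of stages grows with the number of terms and the resolution chosen for each field.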
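Step 5a turns each quantity into a fuzzy set over the item's linguistic terms, as in (2). A minimal sketch, assuming each isosceles-triangular membership function is described by a center and a half-span (a parameterization chosen here for illustration); the three terms and their values are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triangle:
    """Isosceles triangular membership function (assumed form: center and half-span)."""
    center: float
    half_span: float

    def membership(self, x: float) -> float:
        # 1.0 at the center, falling linearly to 0.0 at center +/- half_span.
        return max(0.0, 1.0 - abs(x - self.center) / self.half_span)


def fuzzify(value: float, terms: dict[str, Triangle]) -> dict[str, float]:
    """Return the fuzzy set of Eq. (2) as {region name: membership value}.

    Regions with zero membership are kept so every term appears in the result.
    """
    return {name: tri.membership(value) for name, tri in terms.items()}


# Hypothetical membership functions for one item.
item_terms = {
    "low": Triangle(center=1.0, half_span=4.0),
    "middle": Triangle(center=6.0, half_span=5.0),
    "high": Triangle(center=11.0, half_span=5.0),
}

print(fuzzify(3.0, item_terms))  # {'low': 0.5, 'middle': 0.4, 'high': 0.0}
```

A quantity of 3 thus belongs partly to "low" and partly to "middle", which is exactly the kind of overlap that lets one transaction contribute to several fuzzy regions in (3).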
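The minimum support of each item in (1) is built from the number of transactions N and, for each item, its occurrence count T and quantity sum $S_i$. The sketch below computes only these named inputs and does not attempt to reproduce how (1) combines them with P. The data reproduce the A/B example of Section III.B (A occurs 10 times with a sum of 20, B occurs twice with a sum of 20); representing each transaction as a dictionary from item to quantity is an assumption.

```python
from collections import defaultdict


def item_statistics(transactions):
    """Return N and, per item, T (occurrence count) and S (sum of quantities).

    `transactions` is a list of dicts mapping item name -> purchased quantity.
    """
    occurrences = defaultdict(int)
    totals = defaultdict(float)
    for t in transactions:
        for item, qty in t.items():
            if qty > 0:  # only count transactions in which the item actually appears
                occurrences[item] += 1
                totals[item] += qty
    return len(transactions), dict(occurrences), dict(totals)


# Item A appears in all 10 transactions with quantity 2; item B appears twice with quantity 10.
transactions = [{"A": 2} for _ in range(10)]
transactions[0]["B"] = 10
transactions[1]["B"] = 10

n, t_counts, s_sums = item_statistics(transactions)
print(n, t_counts, s_sums)  # 10 {'A': 10, 'B': 2} {'A': 20.0, 'B': 20.0}
```

Both items have the same quantity sum, so only the occurrence count separates them, which is why the text treats the number of occurrences and the sum of values as separate criteria.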
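Steps 5b to 5d compute the scalar cardinality of each fuzzy region (3), keep the regions whose support reaches the item's minimum support as large 1-itemsets, and divide their number by the suitability (4). A minimal sketch, assuming the suitability of Chen et al. [16] is computed elsewhere and passed in as a positive number; the function name and the fuzzy sets are illustrative:

```python
def evaluate_fitness(memberships, min_sup, suitability):
    """Steps 5b-5d for one item and one ant's membership functions.

    memberships: one dict per transaction mapping region name -> membership value
                 (the fuzzy sets of Eq. (2)).
    min_sup:     minimum support of this item, e.g. min_Sup(I_p) from Eq. (1).
    suitability: suitability of the membership functions (Chen et al. [16]),
                 assumed to be computed elsewhere and > 0.
    Returns (fitness, large_1_itemsets).
    """
    n = len(memberships)
    counts = {}
    for fuzzy_set in memberships:
        for region, value in fuzzy_set.items():
            counts[region] = counts.get(region, 0.0) + value  # Eq. (3): scalar cardinality
    # Step 5c: keep regions whose count / n reaches the item's minimum support.
    large_1 = [region for region, count in counts.items() if count / n >= min_sup]
    return len(large_1) / suitability, large_1  # Eq. (4)


# Hypothetical fuzzy sets of one item over four transactions.
memberships = [
    {"low": 0.5, "middle": 0.4, "high": 0.0},
    {"low": 0.0, "middle": 1.0, "high": 0.0},
    {"low": 0.0, "middle": 0.2, "high": 0.8},
    {"low": 1.0, "middle": 0.0, "high": 0.0},
]
fitness, large_1 = evaluate_fitness(memberships, min_sup=0.3, suitability=2.0)
print(fitness, large_1)  # 1.0 ['low', 'middle']
```

Here "low" (1.5 / 4 = 0.375) and "middle" (1.6 / 4 = 0.4) pass the threshold while "high" (0.8 / 4 = 0.2) does not. With per-item minimum supports, only `min_sup` changes from item to item; the rest of the evaluation is unchanged.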
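Steps 2 to 7 can be sketched for a single item as follows. The paper takes the state transition, local and global updating rules from [15] without restating them, so this sketch assumes the textbook ACS forms of Dorigo and Gambardella [10]: a pseudo-random proportional choice with an exploitation probability `q0` (a hypothetical parameter, not among those listed in Section IV) and no heuristic term, local updating toward the initial pheromone, and a global update that deposits the best ant's fitness. The defaults `n_ants=10`, `local_ratio=0.1` and `global_ratio=0.9` follow Section IV, and `tau0=0.5` follows Step 2; the toy fitness is only a stand-in for (4).

```python
import random


def run_acs(fitness, n_bits, n_ants=10, generations=50, tau0=0.5,
            q0=0.9, local_ratio=0.1, global_ratio=0.9, seed=0):
    """Search one item's binary route with textbook ACS rules (an assumption, see text).

    The graph has n_bits stages; from node j of stage i an ant moves to node 0 or 1
    of stage i + 1, so a complete route is a bit string. The start node is treated
    as node 0 of stage 0. Returns (best_route, best_fitness) over all generations.
    """
    rng = random.Random(seed)
    # Step 2: tau[i][j][k] is the pheromone on the edge from node j (stage i) to node k.
    tau = [[[tau0, tau0], [tau0, tau0]] for _ in range(n_bits)]
    best_route, best_fit = None, float("-inf")

    for _ in range(generations):  # Steps 3 and 7: iterate over generations
        gen_route, gen_fit = None, float("-inf")
        for _ in range(n_ants):  # Step 4: each ant builds a complete route
            node, route = 0, []
            for i in range(n_bits):
                w0, w1 = tau[i][node]
                if rng.random() < q0:  # exploit the edge with more pheromone
                    nxt = 0 if w0 >= w1 else 1
                else:  # explore with probability proportional to pheromone
                    nxt = 0 if rng.random() * (w0 + w1) < w0 else 1
                # Step 4b: local updating rule pulls the used edge back toward tau0
                tau[i][node][nxt] = (1 - local_ratio) * tau[i][node][nxt] + local_ratio * tau0
                route.append(nxt)
                node = nxt
            fit = fitness(tuple(route))  # Step 5: evaluate this ant's solution
            if fit > gen_fit:
                gen_route, gen_fit = tuple(route), fit

        # Step 6: global updating rule on the edges of this generation's best route
        node = 0
        for i, nxt in enumerate(gen_route):
            tau[i][node][nxt] = (1 - global_ratio) * tau[i][node][nxt] + global_ratio * gen_fit
            node = nxt
        if gen_fit > best_fit:
            best_route, best_fit = gen_route, gen_fit

    return best_route, best_fit


# Stand-in fitness: fraction of bits matching a hypothetical target code.
target = (1, 0, 1, 1, 0, 0, 1, 0)


def toy_fitness(route):
    return sum(b == t for b, t in zip(route, target)) / len(target)


best_route, best_fit = run_acs(toy_fitness, n_bits=len(target), generations=30)
print(best_route, best_fit)
```

Step 8 would call `run_acs` once per item. Depositing the fitness value itself in the global update is a design choice of this sketch: better membership-function sets leave proportionally more pheromone, but the scale of (4) then directly affects how quickly the colony converges.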
IV. NUMERICAL SIMULATION

We evaluated our approach experimentally to assess the performance of the proposed algorithm. The experiments were implemented in C/C++ on a computer with an Intel Core(TM) 2 Duo 2.66 GHz processor and 4 GB of main memory, running the Microsoft Windows 7 operating system. We used two datasets: the dataset of [13], with a total of 64 items and 10,000 transactions, and a real dataset called FOODMART from an anonymous chain store [17]. The FOODMART dataset contains quantitative transactions about the products sold in the chain store, with 21,556 transactions and 1,600 items in total. The initial number of ants was set to 10. The parameters of the ACS algorithm were set as follows: the initial pheromone ratio was 0.05, the minimum pheromone of ants was 0.2, the evaporation ratio was 0.9, the local updating ratio was 0.1 and the global updating ratio was 0.9. The minimum support was set to 0.0015 for the FOODMART dataset and to 0.04 for the dataset of [13]. The constant P in (1) was set to 0.05 for the FOODMART dataset and to 0.02 for the dataset of [13]. The average fitness values of the artificial ants over different numbers of generations for the two datasets are shown in Fig. 2 and Fig. 3.

Figure 2. Average fitness values versus number of generations (0 to 4000) with dataset [13], for ACS with constant minimum support and ACS with multiple minimum supports.

Figure 3. Average fitness values versus number of generations (0 to 4000) with the FOODMART dataset, for ACS with constant minimum support and ACS with multiple minimum supports.

Fig. 2 and Fig. 3 show that the average fitness values of our approach are higher, by an offset, than those of the previous approach, and that they become stable within fewer generations. In addition, to compare our model with the existing one, which uses a static constant minimum support, Fig. 4 shows a smaller range of generations. Our model achieved its best fitness at 300 generations, whereas the existing one reached its best fitness at 500 generations.

Figure 4. Average fitness values versus number of generations (0 to 1000) with dataset [13] (smaller scale).

Figure 5. Average fitness values versus number of generations (0 to 1000) with the FOODMART dataset (smaller scale).

As shown in Fig. 5, on the FOODMART dataset the ACS algorithm with multiple minimum supports performs much better than the ACS algorithm with a constant minimum support, since it achieves higher average fitness values. The FOODMART dataset contains a very large number of items, which makes optimizing the membership functions difficult for the artificial ants. The ACS algorithm with multiple minimum supports nevertheless passed this test easily and extracted membership functions with high average fitness values.

The numbers of large 1-itemsets over different numbers of generations are shown in Fig. 6. The curve of the existing method stabilized after about 3,000 generations, while the curve of our approach remained constant after 1,000 generations. Moreover, our approach yields clearly more large 1-itemsets.

Figure 6. Number of large 1-itemsets versus number of generations (0 to 4000) with dataset [13].

Fig. 7 illustrates the numbers of large 1-itemsets over different numbers of generations for the FOODMART dataset. The proposed ACS algorithm increased the number of large 1-itemsets between 50 and 500 generations and stabilized after about 500 generations, whereas the existing method showed no change as the number of generations increased, since it cannot cope with the FOODMART dataset, which has a large number of items.

Figure 7. Number of large 1-itemsets versus number of generations (0 to 4000) with the FOODMART dataset.

Fig. 8 and Fig. 9 show the execution time of the ACS algorithms for different numbers of generations. In both graphs, the execution time increases with the number of generations. Our approach has the same execution time as the existing one for small numbers of generations, and a slightly higher execution time for large numbers of generations.

Figure 8. Execution time (seconds) of the ACS mining algorithms versus number of generations with dataset [13].

Figure 9. Execution time (seconds) of the ACS mining algorithms versus number of generations with the FOODMART dataset.

Next, we examined the efficiency of the ACS algorithms with a scalability test on the two datasets. The number of generations was kept constant at 500 in all runs. The average fitness values of the artificial ants for different sizes of dataset [13] are shown in Fig. 10. As the size of the dataset increases, more accurate membership functions are extracted, because the artificial ants can learn more and find proper solutions, whereas the existing algorithm shows no change as the dataset size increases. Fig. 11, obtained on the FOODMART dataset, shows the same behavior as Fig. 10.

Figure 10. Average fitness values versus size of dataset [13] (0% to 100%).

Figure 11. Average fitness values versus size of the FOODMART dataset (0% to 100%).

The numbers of large 1-itemsets of the artificial ants for different dataset sizes are shown in Fig. 12 and Fig. 13. As the size of the dataset increases, the number of large 1-itemsets increases as well. At first, the existing algorithm had high values; however, the ACS with a constant minimum support then remained steady.

Figure 12. Number of large 1-itemsets versus size of dataset [13] (0% to 100%).

Figure 13. Number of large 1-itemsets versus size of the FOODMART dataset (0% to 100%).

Fig. 14 and Fig. 15 show the execution time of the ACS algorithms for different dataset sizes. As can be observed in both figures, the execution times of the two algorithms are nearly equal. This indicates that the proposed algorithm improves efficiency without increasing the execution time, which encourages us to employ it for extracting membership functions.

Figure 14. Execution time (seconds) of the ACS mining algorithms versus size of dataset [13].

Figure 15. Execution time (seconds) of the ACS mining algorithms versus size of the FOODMART dataset.

V. CONCLUSIONS

In this paper, we investigated the issues of applying the ACS algorithm to extract membership functions for fuzzy data mining and proposed an algorithm to address them. The approach delivers two benefits: the use of multiple minimum supports, and system automation. The computational results indicate that our work can serve as an alternative for effective association rule mining. The most significant difference between our algorithm and earlier ACS algorithms for extracting membership functions is its independence from a user-specified minimum support threshold. The experimental results of this new approach encourage us to improve the system further and to apply this strategy in real-world applications.

REFERENCES

[1] Kandel, Fuzzy Expert Systems. CRC Press, 1992, pp. 8-19.
[2] T. P. Hong, et al., "Trade-off between time complexity and number of rules for fuzzy mining from quantitative data," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 9, pp. 587-604, 2001.
[3] E. Bonabeau, et al., Swarm Intelligence: From Natural to Artificial Systems, 1999. Available: http://link.library.utoronto.ca/eir/EIRdetail.cfm?Resources__ID=432271&T=F
[4] W. B. Tao, et al., "Object segmentation using ant colony optimization algorithm and fuzzy entropy," Pattern Recognition Letters, vol. 28, pp. 788-796, May 2007.
[5] V. Maniezzo and A. Colorni, "The ant system applied to the quadratic assignment problem," IEEE Transactions on Knowledge and Data Engineering, vol. 11, pp. 769-778, Sep-Oct 1999.
[6] M. Dorigo, et al., "Ant system: Optimization by a colony of cooperating agents," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 26, pp. 29-41, Feb 1996.
[7] P. R. McMullen, "An ant colony optimization approach to addressing a JIT sequencing problem with multiple objectives," Artificial Intelligence in Engineering, vol. 15, pp. 309-317, Jul 2001.
[8] Colorni, et al., "Ant system for job-shop scheduling," Belgian Journal of Operations Research, Statistics and Computer Science, vol. 34, pp. 39-53, 1994.
[9] S. C. Chu, et al., "Ant colony system with communication strategies," Information Sciences, vol. 167, pp. 63-76, Dec 2004.
[10] M. Dorigo and L. M. Gambardella, "Ant colony system: A cooperative learning approach to the traveling salesman problem," IEEE Transactions on Evolutionary Computation, vol. 1, pp. 53-66, 1997.
[11] A. Wade and S. Salhi, "An ant system algorithm for the mixed vehicle routing problem with backhauls," in Metaheuristics: Computer Decision-Making, Norwell, MA: Kluwer, pp. 699-719, 2004.
[12] J. Han, et al., "Mining top-k frequent closed patterns without minimum support," in Proceedings of the 2002 IEEE International Conference on Data Mining, pp. 211-218, 2002.
[13] T. P. Hong, et al., "An ACS-based framework for fuzzy data mining," Expert Systems with Applications, vol. 36, pp. 11844-11852, Nov 2009.
[14] R. S. Parpinelli, et al., "An ant colony based system for data mining: Application to medical data," in The Genetic and Evolutionary Computation Conference, pp. 791-798, 2001.
[15] T. P. Hong, et al., "Extracting membership functions in fuzzy data mining by ant colony systems," in Proceedings of the 2008 International Conference on Machine Learning and Cybernetics, vols. 1-7, pp. 3979-3984, 2008.
[16] C. H. Chen, et al., "Cluster-based evaluation in fuzzy-genetic data mining," IEEE Transactions on Fuzzy Systems, vol. 16, pp. 249-262, Feb 2008.
[17] Microsoft Corporation, "Example Database FoodMart of Microsoft Analysis Services."
