(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 8, November 2010

Feed Forward Neural Network Algorithm for Frequent Patterns Mining

Amit Bhagat(1), Department of Computer Applications
Dr. Sanjay Sharma(2), Associate Prof., Deptt. of Computer Applications
Dr. K. R. Pardasani(3), Professor, Deptt. of Mathematics
Maulana Azad National Institute of Technology, Bhopal (M.P.) 462051, India
(1) am.bhagat@gmail.com, (2) ssharma66@rediffmail.com, (3) kamalrajp@rediffmail.com

Abstract: Association rule mining is used to find relationships among items in large data sets. Frequent patterns mining is an important aspect of association rule mining. In this paper, an efficient algorithm named Apriori-Feed Forward (AFF), based on the Apriori algorithm and the Feed Forward Neural Network, is presented to mine frequent patterns. The Apriori algorithm scans the database many times to generate frequent itemsets, whereas the Apriori-Feed Forward (AFF) algorithm scans the database only once. Computational results show that the Apriori-Feed Forward (AFF) algorithm performs much faster than the Apriori algorithm.

Keywords: Association rule mining, dataset scan, frequent itemsets, Neural Network.

I. INTRODUCTION

Data mining has recently attracted considerable attention from database practitioners and researchers because it has been applied to many fields such as market strategy, financial forecasts and decision support [1]. Many algorithms have been proposed to obtain useful and invaluable information from huge databases [2]. One of the most important of these is mining association rules, which was first introduced in [3, 4]. Association rule mining has many important applications in our lives. An association rule is of the form X => Y, and each rule has two measurements: support and confidence. The association rule mining problem is to find rules that satisfy user-specified minimum support and minimum confidence. It mainly includes two steps: first, find all frequent patterns; second, generate association rules from those frequent patterns.
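For concreteness, the two measurements can be stated explicitly. These are the standard definitions, which the paper assumes rather than spells out:

    support(X => Y)    = P(X ∪ Y)
    confidence(X => Y) = support(X ∪ Y) / support(X), an estimate of P(Y | X).

For example, over the five transactions of Fig. 3(a) below, the rule {I1} => {I2} has support 3/5 (three of the five transactions contain both items) and confidence 3/4 (I1 occurs in four transactions, {I1, I2} in three).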
Many algorithms for mining association rules from transaction databases have been proposed [5, 6, 7] since the Apriori algorithm was first presented. However, most of them were based on the Apriori algorithm, which generates and tests candidate itemsets iteratively. This may scan the database many times, so the computational cost is high. In order to overcome the disadvantages of the Apriori algorithm and efficiently mine association rules without generating candidate itemsets, many authors have developed improved algorithms and obtained promising results [9, 10, 11, 12, 13].

Recently, there has been growing interest in developing techniques for mining association patterns without a support constraint or with variable supports [14, 15, 16]. Association rule mining among rare items is also discussed in [17, 18]. So far, very few papers discuss how to combine the Apriori algorithm and a Neural Network to mine association rules. In this paper, an efficient algorithm named Apriori-Feed Forward (AFF), based on the Apriori algorithm and a Feed Forward Neural Network, is proposed; this algorithm efficiently combines the advantages of the Apriori algorithm and the structure of a Neural Network. Computational results verify the good performance of the Apriori-Feed Forward (AFF) algorithm. The organization of this paper is as follows. In Section II, we briefly review the Apriori method and the Feed Forward Neural Network method. Section III proposes an efficient Apriori-Feed Forward (AFF) algorithm based on Apriori and the Feed Forward structure. Experimental results are presented in Section IV. Section V gives the conclusions.

II. CLASSICAL MINING ALGORITHM AND NEURAL NETWORK

A. Apriori Algorithm

In [4], Agrawal first proposed an algorithm called Apriori for the problem of mining association rules. The Apriori algorithm is a bottom-up, breadth-first approach: the frequent itemsets are extended one item at a time. Its main idea is to generate the k-th candidate itemsets from the (k-1)-th frequent itemsets and to find the k-th frequent itemsets from the k-th candidate itemsets. The algorithm terminates when the frequent itemsets cannot be extended any more. But it has to generate a large number of candidate itemsets, and it scans the data set as many times as the length of the longest frequent itemset. The Apriori algorithm can be written in pseudocode as follows.

Procedure Apriori
Input: data set D, minimum support minsup
Output: frequent itemsets L
(1)  L1 = find_frequent_1_itemsets(D);
(2)  for (k = 2; Lk-1 ≠ Ø; k++)
(3)  {
(4)      Ck = apriori_gen(Lk-1, minsup);
(5)      for each transaction t ∈ D
(6)      {
(7)          Ct = subset(Ck, t);
(8)          for each candidate c ∈ Ct
(9)              c.count++;
(10)     }
(11)     Lk = {c ∈ Ck | c.count >= minsup};
(12) }
(13) return L = L1 ∪ L2 ∪ ... ∪ Ln;

In the above pseudocode, Ck denotes the k-th candidate itemsets and Lk the k-th frequent itemsets.
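To make the level-wise loop concrete, the following is a minimal, self-contained Python sketch of the same procedure. It is an illustration under our own naming (apriori, with minsup taken as an absolute count), not the authors' implementation; the join and prune steps are the textbook realization of apriori_gen.

from itertools import combinations

def apriori(transactions, minsup):
    # L1: count single items in one pass over the data
    counts = {}
    for t in transactions:
        for item in t:
            s = frozenset([item])
            counts[s] = counts.get(s, 0) + 1
    Lk = {s for s, c in counts.items() if c >= minsup}
    L = set(Lk)
    k = 2
    while Lk:
        # join step: unions of frequent (k-1)-itemsets that have size k
        Ck = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # prune step: every (k-1)-subset of a candidate must be frequent
        Ck = {c for c in Ck
              if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
        # one full database scan per level to count candidate supports
        counts = {c: 0 for c in Ck}
        for t in transactions:
            ts = set(t)
            for c in Ck:
                if c <= ts:
                    counts[c] += 1
        Lk = {c for c, n in counts.items() if n >= minsup}
        L |= Lk
        k += 1
    return L

Run on the toy transactions of Fig. 3(a) below with minsup = 2, this returns, among others, {I1, I2} and {I1, I2, I3}. Note how the while loop re-reads every transaction once per level; this is exactly the repeated database scanning that the AFF algorithm of Section III is designed to avoid.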
B. Neural Network

A neural network [19, 20] is a parallel processing network, developed by simulating the intuitive image thinking of humans on the basis of research into biological neural networks, according to the features of biological neurons, and by simplifying, summarizing and refining them. It uses the idea of non-linear mapping, the method of parallel processing and the structure of the neural network itself to express the associated knowledge of input and output. Initially, the application of neural networks in data mining was not viewed optimistically, the main reasons being that neural networks suffer from complex structure, poor interpretability and long training times. But their advantages, such as high tolerance of noisy data and low error rates, together with the continuous advance and optimization of network training algorithms, and especially of network pruning and rule extraction algorithms, have made the application of neural networks in data mining increasingly favored by the majority of users.

C. Neural Network Method in Data Mining

There are seven common methods and techniques of data mining [21, 22, 23]: statistical analysis, rough sets, covering positive and rejecting inverse cases, formula discovery, fuzzy methods, and visualization technology. Here, we focus on the neural network method. The neural network method is used for classification, clustering, feature mining, prediction and pattern recognition. It imitates the neuron structure of animals and is based on the M-P model and the Hebb learning rule, so in essence it is a distributed matrix structure. Through training on the data being mined, the neural network method gradually calculates (by repeated iteration or cumulative calculation) the weights of the neural network's connections. Neural network models can be broadly divided into the following three types:

(1) Feed-forward networks: with the perceptron back-propagation model and the function network as representatives, mainly used in areas such as prediction and pattern recognition;
(2) Feedback networks: with the Hopfield discrete and continuous models as representatives, mainly used for associative memory and optimization calculation;
(3) Self-organizing networks: with the adaptive resonance theory (ART) model and the Kohonen model as representatives, mainly used for cluster analysis.

D. Feedforward Neural Network

Feedforward neural networks (FF networks) are the most popular and most widely used models in many practical applications. They are known by many different names, such as "multi-layer perceptrons."

Figure 2(a) illustrates a one-hidden-layer FF network with inputs x1, ..., xn and output y^. Each arrow in the figure symbolizes a parameter in the network. The network is divided into layers. The input layer consists of just the inputs to the network. Then follows a hidden layer, which consists of any number of neurons, or hidden units, placed in parallel. Each neuron performs a weighted summation of the inputs, which then passes through a nonlinear activation function σ, also called the neuron function.

[Figure 2(a): A feedforward network with one hidden layer and one output.]

Mathematically, the functionality of a hidden neuron is described by

    σ( Σ_{j=1..n} w_j x_j + b ),

where the weights {w_j, b} are symbolized by the arrows feeding into the neuron. The network output is formed by another weighted summation of the outputs of the neurons in the hidden layer. This summation on the output is called the output layer. In Figure 2(a) there is only one output in the output layer, since it is a single-output problem. Generally, the number of output neurons equals the number of outputs of the approximation problem. The neurons in the hidden layer of the network in Figure 2(a) are similar in structure to those of the perceptron, with the exception that their activation functions can be any differentiable function. The output of this network is given by

    y^ = g(θ, x) = Σ_{i=1..nh} w2_i · σ( Σ_{j=1..n} w1_{i,j} x_j + b1_i ) + b2,

where n is the number of inputs and nh is the number of neurons in the hidden layer. The variables {w1_{i,j}, b1_i, w2_i, b2} are the parameters of the network model, represented collectively by the parameter vector θ. In general, the neural network model is represented by the compact notation g(θ, x) whenever the exact structure of the neural network is not necessary in the context of the discussion. Note that the sizes of the input and output layers are defined by the number of inputs and outputs of the network; therefore, only the number of hidden neurons has to be specified when the network is defined. The network in Figure 2(a) is sometimes referred to as a three-layer network, counting input, hidden, and output layers. However, since no processing takes place in the input layer, it is also sometimes called a two-layer network. To avoid confusion, this network is called a one-hidden-layer FF network throughout this paper.

In training the network, its parameters are adjusted incrementally until the training data satisfy the desired mapping as well as possible; that is, until y^(θ) matches the desired output y as closely as possible, up to a maximum number of iterations.

The nonlinear activation function in the neuron is usually chosen to be a smooth step function. The default is the standard sigmoid

    σ(x) = 1 / (1 + e^(-x)).
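The formula for g(θ, x) translates directly into code. Below is a minimal Python sketch of the forward pass of a one-hidden-layer FF network; the function names and the example weights are ours, chosen only to illustrate the equation above, not taken from the paper.

import math

def sigmoid(z):
    # standard logistic sigmoid: 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

def ff_output(x, W1, b1, w2, b2):
    # hidden layer: one weighted sum + sigmoid per hidden neuron
    hidden = [sigmoid(sum(w * xj for w, xj in zip(Wi, x)) + bi)
              for Wi, bi in zip(W1, b1)]
    # output layer: a weighted sum of the hidden activations
    return sum(w * h for w, h in zip(w2, hidden)) + b2

# Example: n = 2 inputs, nh = 3 hidden neurons, one output.
y = ff_output([0.5, -1.0],
              W1=[[0.1, 0.2], [-0.3, 0.4], [0.5, -0.6]],
              b1=[0.0, 0.1, -0.1],
              w2=[1.0, -1.0, 0.5],
              b2=0.2)

Only nh (here 3) needs to be chosen; the shapes of W1 and w2 follow from the numbers of inputs and outputs, as noted above.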
III. APRIORI-FEEDFORWARD ALGORITHM (AFF)

In this section, a new algorithm based on Apriori and the Feedforward Neural Network structure is presented, called the Apriori-Feedforward (AFF) algorithm. Fig. 3(a) shows the database structure, in which there are different Item IDs and the sets of items purchased against each Item ID.

Item_ID   Items
001       I1, I2, I3
002       I1, I3, I4
003       I2, I4
004       I1, I2
005       I1, I2, I3, I5
...       ...

Fig. 3(a): Item ID and Item List of Database

[Fig. 3(b): Data Structure of Nodes in FeedForward Neural Network — first-layer nodes I1, I2, I3 carry frequencies f1, f2, f3 and feed second-layer pair nodes I1I2 and I1I3, which carry frequencies f12 and f13.]

The Apriori-Feedforward algorithm mainly includes two steps. First, a neural network model is prepared according to the maximum number of items present in the dataset. Then the first transaction of the data set is scanned to find the frequent 1-itemsets, and the neural network is updated for frequent 2-itemsets, frequent 3-itemsets, and so on. The data set is scanned only once to build all frequent combinations of itemsets. While the frequent 2-itemsets, frequent 3-itemsets, ... are updated, pruning is done at the same time to avoid redundant itemsets. At last, the built neural network is mined by the Apriori-FeedForward algorithm. The detailed Apriori-FeedForward algorithm is as follows.

Procedure Create_Model
Input: data set D, minimum support minsup
Output: itemset model Ck
(1)  procedure Create_Model(n)
(2)  for (i = 1; i <= n; i++)
(3)      for each itemset l1 ∈ Lk-1
(4)          for each itemset l2 ∈ Lk-1
(5)              if (l1[1] = l2[1]) and (l1[2] = l2[2]) and ... and (l1[n] = l2[n])
(6)              then
(7)                  C = l1 x l2
(8)                  if already_generated(l1 x l2) then
(9)                      delete C
(10)                 else add C to Ck
(11) FNN(Ck)

Procedure FNN(Ck)
Input: itemset model Ck
Output: frequent itemsets L
(1)  procedure FNN(Ck)
(2)  n = recordcount(Dataset)
(3)  for (i = 1; i < n; i++)
(4)  {
(5)      L1 = get_first_transaction(Dataset)
(6)      upf = update_frequency(l1 x l2)
(7)      if (upf >= min_sup)
(8)          print(l1 x l2)
(9)  }

In this algorithm a complete feedforward neural network is prepared according to the maximum number of items present in the dataset. The first layer of the network holds the frequent 1-itemsets, the second layer the frequent 2-itemsets, and so on, until a final layer is reached which is a single node comprising all items present in the dataset. Every (n+1)-th layer is the combination of item n with all other items present at that layer; these layers are generated by calculating the factorial of (n+1) items. A single-scan counting step in this spirit is sketched below.
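The paper gives FNN only as pseudocode, so the following Python sketch is our reading of the single-scan idea rather than the authors' code: each transaction is read exactly once, and every itemset combination occurring inside it increments the frequency stored at the corresponding network node (the f1, f12, ... of Fig. 3(b)). All names are illustrative. The exhaustive per-transaction enumeration also makes the trade-off visible: memory for node counts in exchange for a single pass over the database.

from itertools import combinations
from collections import Counter

def single_scan_counts(transactions, max_len=None):
    # one pass over the data; tally every combination inside each transaction
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        top = max_len if max_len is not None else len(items)
        for k in range(1, top + 1):
            for combo in combinations(items, k):
                counts[combo] += 1  # node frequency f_combo
    return counts

def frequent_nodes(counts, n_records, minsup):
    # keep nodes whose relative support reaches the threshold (e.g. 0.30)
    return {c: n for c, n in counts.items() if n >= minsup * n_records}

# Example on the transactions of Fig. 3(a):
counts = single_scan_counts([
    {'I1', 'I2', 'I3'}, {'I1', 'I3', 'I4'}, {'I2', 'I4'},
    {'I1', 'I2'}, {'I1', 'I2', 'I3', 'I5'},
])
print(frequent_nodes(counts, n_records=5, minsup=0.4))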
IV. EXPERIMENTAL RESULTS

The content of our test data set is the frequently purchased items of a supermarket. There are 7 to 12 different items and 10,000 to 50,000 records in the data set. In order to verify the performance of the Apriori-FeedForward algorithm, we compare Apriori-Feed Forward with Apriori. The algorithms were run on a computer with an i7 processor at 1.60 GHz and 4 GB of memory. The program was developed in NetBeans 6.8. The computational results of the two algorithms are reported in Table 1; a clearer comparison of the two algorithms is given in Fig. 4(a) and Fig. 4(b).

Table 1 / Fig. 4(a): The running time of the two algorithms

Min. Supp   Apriori    AFF
30%         15000 ms   1762 ms
25%         15000 ms   1545 ms
20%         15000 ms   1529 ms
15%         19000 ms   1682 ms
10%         20000 ms   1634 ms
5%          30000 ms   1625 ms

[Fig. 4(b): chart comparing the running times of Apriori and AFF.]

From Fig. 4(a) and 4(b) we can make the following two statements. First, the Apriori-FeedForward algorithm works much faster than Apriori: it uses a different method, FNN, to calculate the support of candidate itemsets, and it needs only a single scan of the database. Second, it consumes less memory than Apriori because it does not need to traverse the database again and again.

V. CONCLUSION AND FUTURE WORK

In this paper, we have proposed the Apriori-FeedForward algorithm. This method builds a Feed Forward Neural Network model and scans the database only once to generate frequent patterns. Future work is to further improve the Apriori-Feed Forward algorithm and to test it on more and larger datasets.

REFERENCES

[1] M.S. Chen, J. Han, P.S. Yu, "Data mining: an overview from a database perspective", IEEE Transactions on Knowledge and Data Engineering, 1996, 8, pp. 866-883.
[2] J. Han, M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, San Francisco, CA, USA, 2001.
[3] R. Agrawal, T. Imielinski, A. Swami, "Mining association rules between sets of items in large databases", in: Proceedings of the ACM SIGMOD Conference, 1993, 5, pp. 207-216.
[4] R. Agrawal, R. Srikant, "Fast algorithms for mining association rules", in: Proceedings of the 20th Very Large Data Bases Conference (VLDB'94), Santiago de Chile, Chile, 1994, pp. 487-499.
[5] R. Agrawal, R. Srikant, Q. Vu, "Mining association rules with item constraints", in: The Third International Conference on Knowledge Discovery in Databases and Data Mining, Newport Beach, California, 1997, pp. 67-73.
[6] J. Han, Y. Fu, "Discovery of multiple-level association rules from large databases", in: The Twenty-first International Conference on Very Large Data Bases, Zurich, Switzerland, 1995, pp. 420-431.
[7] T. Fukuda, Y. Morimoto, S. Morishita, T. Tokuyama, "Mining optimized association rules for numeric attributes", in: The ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 1996, pp. 182-191.
[8] J.S. Park, M.S. Chen, P.S. Yu, "Using a hash-based method with transaction trimming for mining association rules", IEEE Transactions on Knowledge and Data Engineering, 1997, 9(5), pp. 812-825.
[9] J. Han, J. Pei, Y. Yin, "Mining frequent patterns without candidate generation", in: Proceedings of the ACM SIGMOD International Conference on Management of Data, 2000, pp. 1-12.
[10] J. Han, J. Wang, Y. Lu, P. Tzvetkov, "Mining top-k frequent closed patterns without minimum support", in: Proceedings of the International Conference on Data Mining, 2002, 12, pp. 211-218.
[11] G. Liu, H. Lu, J.X. Yu, W. Wei, X. Xiao, "AFOPT: an efficient implementation of pattern growth approach", in: IEEE ICDM Workshop on Frequent Itemset Mining Implementations, CEUR Workshop Proceedings, 2003, 80.
[12] J. Wang, J. Han, J. Pei, "CLOSET+: searching for the best strategies for mining frequent closed itemsets", in: Proceedings of the International Conference on Knowledge Discovery and Data Mining, 2003, 8, pp. 236-245.
[13] Tzung-Pei Hong, Chun-Wei Lin, Yu-Lung Wu, "Incrementally fast updated frequent pattern trees", Expert Systems with Applications, 2008, 34, pp. 2424-2435.
[14] K. Wang, Y. He, D. Cheung, Y. Chin, "Mining confident rules without support requirement", in: Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), 2001, pp. 89-96.
[15] H. Xiong, P. Tan, V. Kumar, "Mining strong affinity association patterns in data sets with skewed support distribution", in: Proceedings of the Third IEEE International Conference on Data Mining (ICDM), 2003, pp. 387-394.
[16] Ya-Han Hu, Yen-Liang Chen, "Mining association rules with multiple minimum supports: a new mining algorithm and a support tuning mechanism", Decision Support Systems, 2006, 42, pp. 1-24.
[17] J. Ding, "Efficient association rule mining among infrequent items", Ph.D. Thesis, University of Illinois at Chicago, 2005.
[18] Ling Zhou, Stephen Yau, "Efficient association rule mining among both frequent and infrequent items", Computers and Mathematics with Applications, 2007, 54, pp. 737-749.
[19] J.A. Anderson, Introduction to Neural Networks, Cambridge, MA: MIT Press, 1995.
[20] M.M. Van Hulle, Faithful Representations and Topographic Maps: From Distortion- to Information-Based Self-Organization, New York: Wiley, 2000.
[21] L. Cristofor, D. Simovici, "Generating an informative cover for association rules", in: Proceedings of the IEEE International Conference on Data Mining, 2002.
[22] Y. Yuan, T. Huang, "A matrix algorithm for mining association rules", Lecture Notes in Computer Science, Vol. 3644, Sep. 2005, pp. 370-379.
[23] S. Kotsiantis, D. Kanellopoulos, "Association rules mining: a recent overview", GESTS International Transactions on Computer Science and Engineering, Vol. 32(1), 2006, pp. 71-82.
