(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 3, March 2011
http://sites.google.com/site/ijcsis/ ISSN 1947-5500

Mining Fuzzy Cyclic Patterns

F. A. Mazarbhuiya
Department of Computer Science, College of Computer Science, King Khalid University, Abha, Kingdom of Saudi Arabia
e-mail: fokrul_2005@yahoo.com

M. A. Khaleel
Department of Computer Science, College of Computer Science, King Khalid University, Abha, Kingdom of Saudi Arabia
e-mail: khaleel_dm@yahoo.com

P. R. Khan
Department of Computer Science, College of Computer Science, King Khalid University, Abha, Kingdom of Saudi Arabia
e-mail: drpervaizrkhan@yahoo.com

Abstract: The problem of mining temporal association rules from a temporal dataset is to find associations between items that hold within certain time intervals but not throughout the dataset. This involves finding sets that are frequent in certain time intervals and then finding association rules among the items present in those frequent sets. In fuzzy temporal datasets, where the time of a transaction is imprecise, we may find sets of items that are frequent in certain fuzzy time intervals. We call these fuzzy locally frequent sets and the corresponding association rules fuzzy local association rules. The algorithm discussed in [5] finds all fuzzy locally frequent itemsets, where each frequent itemset is associated with a list of fuzzy time intervals in which it is frequent. The list of fuzzy time intervals may exhibit some interesting properties, e.g. the itemsets may be cyclic in nature. In this paper we propose a method of finding such cyclic frequent itemsets.

Keywords: Fuzzy time-stamp, Fuzzy time interval, Fuzzy temporal datasets, Fuzzy locally frequent sets, Fuzzy distance, Variance of a fuzzy interval

I. INTRODUCTION

The problem of mining association rules was initially defined in [10] by R. Agrawal et al. for application in large supermarkets. Large supermarkets hold large collections of records of daily sales. Analyzing the buying patterns of the buyers helps in taking typical business decisions such as what to put on sale, how to arrange the materials on the shelves, how to plan for future purchases, etc.

Mining for association rules between items in temporal databases has been described as an important data mining problem. Transaction data are normally temporal. The market basket transaction is an example of this type.

Mining fuzzy temporal datasets is also an interesting data mining problem. In [5], the authors proposed a method of finding fuzzy locally frequent sets from such datasets. The algorithm discussed in [5] extracts all frequent itemsets, where each frequent itemset is associated with a list of fuzzy time intervals. The list of fuzzy time intervals associated with a frequent itemset can be used to find some interesting results. In this paper we propose to study the cyclic nature of the list of time intervals associated with a frequent itemset and devise a method to extract all frequent itemsets which are cyclic. We call such frequent itemsets fuzzy cyclic frequent itemsets, as the time intervals are fuzzy in nature. In this case, as the variance of a fuzzy interval is invariant with respect to translation, it can be used to define the similarity between the fuzzy time intervals associated with a frequent itemset. Similarly, the fuzzy distance between any two consecutive fuzzy time intervals associated with a frequent itemset can be used to find the fuzzy time gaps between the fuzzy time intervals, and its variance can be used to define similarity among the fuzzy time gaps associated with the frequent itemset.

In section II we give a brief discussion of the works related to our work. In section III we describe the definitions, terms and notations used in this paper. In section IV we give the algorithm proposed in this paper for mining fuzzy cyclic frequent sets. We conclude with conclusions and lines for future work in section V. In the last section we give the references.

II. RELATED WORKS

The problem of discovery of association rules was first formulated by Agrawal et al. in 1993. Given a set I of items and a large collection D of transactions involving the items, the problem is to find relationships among the items, i.e. the presence of various items in the transactions. A transaction t is said to support an item if that item is present in t. A transaction t is said to support an itemset if t supports each of the items present in the itemset. An association rule is an expression of the form X ⇒ Y where X and Y are subsets of the itemset I. The rule holds with confidence τ if τ% of the transactions in D that support X also support Y. The rule has support σ if σ% of the transactions support X ∪ Y. A method for the discovery of association rules was given in [9], which is known as the Apriori algorithm. This was then followed by subsequent refinements, generalizations, extensions and improvements.

Temporal data mining is now an important extension of conventional data mining and has recently attracted more people to work in this area. By taking into account the time aspect, more interesting patterns that are time dependent can be extracted. There are mainly two broad directions of temporal data mining [6]. One concerns the discovery of causal relationships among temporally oriented events. Ordered events form sequences, and the cause of an event always occurs before it. The other concerns the discovery of similar patterns within the same time sequence or among different time sequences. The underlying problem is to find frequent sequential patterns in the temporal databases. The name sequence mining is normally used for the underlying problem. In [7] the problem of recognizing frequent episodes in an event sequence is discussed, where an episode is defined as a collection of events that occur during time intervals of a specific size.

The association rule discovery process has also been extended to incorporate temporal aspects. In temporal association rules, each rule has associated with it a time interval in which the rule holds. The associated problems are to find valid time periods during which association rules hold, to discover possible periodicities that association rules have, and to discover association rules with temporal features. In [8], [14], [15] and [16], the problem of temporal data mining is addressed and techniques and algorithms have been developed for it. In [8] an algorithm for the discovery of temporal association rules is described. In [2], two algorithms are proposed for the discovery of temporal rules that display regular cyclic variations, where the time interval is specified by the user to divide the data into disjoint segments like months, weeks, days etc. Similar works were done in [4] and [17], incorporating multiple granularities of time intervals (e.g. first working day of every month), from which both cyclic and user defined calendar patterns can be obtained. In [1], a method of finding locally and periodically frequent sets and periodic association rules is discussed, which is an improvement over other methods in the sense that it dynamically extracts all the rules along with the intervals where the rules hold. In [18] and [19], fuzzy calendric data mining and fuzzy temporal data mining are discussed, where user specified ill-defined fuzzy temporal and calendric patterns are extracted from temporal data.

In [5], the authors propose a method of extracting fuzzy locally frequent sets from fuzzy temporal datasets. The algorithm discussed in [5] extracts all frequent itemsets, with each itemset associated with a list of fuzzy time intervals where the itemset is frequent. The list of fuzzy time intervals associated with a frequent itemset may exhibit some interesting properties, e.g. the sizes of the fuzzy time intervals may be almost the same and the time gaps between consecutive fuzzy time intervals may also be almost the same. We call such a frequent itemset a fuzzy cyclic frequent itemset. So the study is basically an intra-itemset study.

III. PROBLEM DEFINITION

A. Some Definitions related to Fuzzy sets

Let E be the universe of discourse. A fuzzy set A in E is characterized by a membership function A(x) lying in [0, 1]. A(x) for x ∈ E represents the grade of membership of x in A. Thus a fuzzy set A is defined as

A = {(x, A(x)), x ∈ E}

A fuzzy set A is said to be normal if A(x) = 1 for at least one x ∈ E.

In general, a generalized fuzzy number A is described as any fuzzy subset of the real line R whose membership function A(x) satisfies the following conditions:

(1) A(x) is a continuous mapping from R to the closed interval [0, 1]
(2) A(x) = 0, −∞ < x ≤ c
(3) A(x) = L(x) is strictly increasing on [c, a]
(4) A(x) = w, a ≤ x ≤ b
(5) A(x) = R(x) is strictly decreasing on [b, d]
(6) A(x) = 0, d ≤ x < ∞,

where 0 < w ≤ 1 and a, b, c and d are real numbers. This type of generalized fuzzy number is denoted by A = (c, a, b, d; w)LR. When w = 1, the above generalized fuzzy number is a fuzzy interval and is denoted by A = (c, a, b, d)LR. When L(x) and R(x) are straight lines, A is a trapezoidal fuzzy number and is denoted by (c, a, b, d). If a = b, the above trapezoidal number is a triangular fuzzy number denoted by (c, a, d).

The α-level of the fuzzy number [t1−a, t1, t1+a] is the closed interval [t1+(α−1)·a, t1+(1−α)·a]. Similarly, the α-level of the fuzzy interval [t1−a, t1, t2, t2+a] is the closed interval [t1+(α−1)·a, t2+(1−α)·a].

Chen and Hsieh [11, 12, 13] proposed the graded mean integration representation for a generalized fuzzy number as follows. Suppose $L^{-1}$ and $R^{-1}$ are the inverse functions of the functions L and R respectively, and the graded mean h-level value of the generalized fuzzy number A = (c, a, b, d; w)LR is $h[L^{-1}(h) + R^{-1}(h)]/2$. Then the graded mean integration representation of the generalized fuzzy number, based on the integral value of the graded mean h-levels, is

$$P(A) = \frac{\int_0^w h \left( \frac{L^{-1}(h) + R^{-1}(h)}{2} \right) dh}{\int_0^w h \, dh}$$

where h is between 0 and w, 0 < w ≤ 1.

B. Fuzzy distance

Chen and Wang [13] proposed the fuzzy distance between any two trapezoidal fuzzy numbers as follows. Let A = (a1, a2, a3, a4) and B = (b1, b2, b3, b4) be two trapezoidal fuzzy numbers whose graded mean integration representations are P(A) and P(B) respectively. Assume

si = (ai − P(A) + bi − P(B))/2, i = 1, 2, 3, 4;
ci = P(A) − P(B) + si, i = 1, 2, 3, 4;

then the fuzzy distance of A, B is C = (c1, c2, c3, c4). Obviously the fuzzy distance between two trapezoidal numbers is also a trapezoidal number.

C. Possibilistic variance of a fuzzy number

Let F be a family of fuzzy numbers and A be a fuzzy number belonging to F. Let Aγ = [a1(γ), a2(γ)], γ ∈ [0, 1], be a γ-level of A. Carlsson and Fuller [3] defined the possibilistic variance of the fuzzy number A ∈ F as

$$\mathrm{Var}(A) = \frac{1}{2} \int_0^1 \gamma \, (a_2(\gamma) - a_1(\gamma))^2 \, d\gamma$$

Before proceeding further we review the theorem given by Carlsson and Fuller [3].

D. Theorem: The variance of a fuzzy number is invariant to shifting.

Proof: Let A ∈ F be a fuzzy number and let θ be a real number. If A is shifted by the value θ, then we get a fuzzy number, denoted by B, satisfying the property B(x) = A(x−θ) for all x ∈ R. Then from the relationship

Bγ = [a1(γ)+θ, a2(γ)+θ]

we find

$$\mathrm{Var}(B) = \frac{1}{2} \int_0^1 \gamma \, ((a_2(\gamma) + \theta) - (a_1(\gamma) + \theta))^2 \, d\gamma = \frac{1}{2} \int_0^1 \gamma \, (a_2(\gamma) - a_1(\gamma))^2 \, d\gamma = \mathrm{Var}(A)$$

E. Almost equal fuzzy intervals

Given two fuzzy intervals A and B, we say A is almost equal to B, or B is almost equal to A, if the variances of A and B are equal up to a small variation, say λ%, i.e. var(A) lies within λ% of var(B) and var(B) lies within λ% of var(A):

|var(A) − var(B)| ≤ λ% of var(B)
|var(B) − var(A)| ≤ λ% of var(A)

where λ is specified by the user.

IV. PROPOSED ALGORITHM

A. Extraction of fuzzy cyclic patterns

One way to extract these sets is to find the fuzzy distance between any two consecutive fuzzy time intervals of the same frequent set. If the fuzzy distances (time gaps) between consecutive fuzzy time intervals are found to be almost equal and the fuzzy time intervals themselves are also found to be almost equal (the definition of almost equal fuzzy time intervals is given in Definition E of section III), then we call these frequent sets fuzzy cyclic frequent sets.

To find out this type of cyclic nature for each frequent itemset we proceed as follows. If the first fuzzy time interval is almost equal to the second fuzzy time interval, then we see whether the fuzzy distance (time gap) between the first and the second fuzzy time intervals is almost equal to the fuzzy distance (time gap) between the second and the third fuzzy time intervals. If it is, then we take the average of the variances of the first two fuzzy distances (time gaps) and see whether it is almost equal to the variance of the fuzzy distance (time gap) between the third and the fourth fuzzy time intervals. Likewise, if the average of the variances of the first two fuzzy time intervals is almost equal to the variance of the third interval we proceed further, otherwise we stop. In general, if the average of the variances of the first (n−1) fuzzy time intervals is almost equal to the variance of the n-th fuzzy time interval, and the average of the variances of the first (n−2) fuzzy distances (time gaps) is almost equal to the variance of the (n−1)-th time gap, then the average of the variances of the first n fuzzy time intervals is compared with the variance of the (n+1)-th fuzzy time interval, and that of the first (n−1) time gaps is compared with the variance of the n-th time gap. This way we can extract fuzzy cyclic patterns if such patterns exist. We describe below the algorithm for extracting such cyclic itemsets.

Algorithm for extracting fuzzy cyclic frequent item sets

    for each fuzzy locally frequent item set iset do
    {
        t1 ← first fuzzy time interval for iset
        v1 ← var(t1)
        t2 ← second fuzzy time interval for iset
        v2 ← var(t2)
        if not almostequal(v1, v2, sgma) then
        {
            report that iset is not fuzzy cyclic in nature
            continue    /* go for the next frequent item set */
        }
        n ← 1                       /* number of fuzzy time gaps seen so far */
        ftg1 ← fuzzydist(t1, t2)
        avv ← var(ftg1)             /* running average of the gap variances */
        avgvar ← (v1 + v2)/2        /* running average of the interval variances */
        flag ← 0
        while not end of fuzzy interval list for iset do
        {
            tint ← current fuzzy time interval
            ftg ← fuzzydist(t2, tint)
            vg ← var(ftg)
            if almostequal(vg, avv, sgma) then
                avv ← (n*avv + vg)/(n+1)
            else
                { flag ← 1; break; }
            vt ← var(tint)
            if almostequal(vt, avgvar, sgma) then
                avgvar ← ((n+1)*avgvar + vt)/(n+2)
            else
                { flag ← 1; break; }
            t2 ← tint
            n ← n+1
        }
        if (flag == 1)
            report that iset is not fuzzy cyclic in nature
        else
            report that iset is fuzzy cyclic in nature
    }

Here the function var() returns the fuzzy variance value of a fuzzy time interval or fuzzy time gap associated with a fuzzy locally frequent set; fuzzydist() returns the fuzzy distance between two consecutive fuzzy time intervals associated with a fuzzy locally frequent set, i.e. the fuzzy time gap; vg is the fuzzy variance value of the fuzzy time gap between two consecutive fuzzy time intervals; avv is the average of the variances of the fuzzy time gaps seen so far; avgvar is the average of the fuzzy variances of the fuzzy time intervals seen so far; and sgma is the threshold value up to which variation of the fuzzy variance values is permitted.

V. CONCLUSIONS AND LINES FOR FUTURE WORKS

An algorithm for finding fuzzy cyclic frequent sets from fuzzy temporal data is given in this paper. The algorithm of [5] gives all fuzzy locally frequent itemsets, where each frequent itemset is associated with a list of fuzzy time intervals in which it is frequent. Our algorithm takes as input the result of the algorithm of [5]. As the variance of a fuzzy interval is invariant with respect to shifting, we can consider two fuzzy intervals having the same variance value but lying in two different parts of the time domain as equal. Using this notion we define the equality of the fuzzy time intervals associated with a fuzzy locally frequent itemset. Our algorithm supplies all frequent sets which are fuzzy cyclic.

We may have some patterns where the fuzzy time gaps are almost equal but the fuzzy time intervals of frequency are not equal even in the approximate sense. We may also have some patterns where the fuzzy time gaps are not equal but the durations of the time intervals are almost equal. The above algorithm can be modified accordingly to find such patterns. In future we would like to devise methods to find various kinds of patterns which may exist in fuzzy temporal datasets.

References

[1] A. K. Mahanta, F. A. Mazarbhuiya and H. K. Baruah; Finding Locally and Periodically Frequent Sets and Periodic Association Rules, Proceeding of 1st Int'l Conf on Pattern Recognition and Machine Intelligence (PreMI'05), LNCS 3776, 576-582, 2005.
[2] B. Ozden, S. Ramaswamy and A. Silberschatz; Cyclic Association Rules, Proc. of the 14th Int'l Conference on Data Engineering, USA, 412-421, 1998.
[3] C. Carlsson and R. Fuller; On Possibilistic Mean Value and Variance of Fuzzy Numbers, Fuzzy Sets and Systems 122 (2001), pp. 315-326.
[4] G. Zimbrao, J. Moreira de Souza, V. Teixeira de Almeida and W. Araujo da Silva; An Algorithm to Discover Calendar-based Temporal Association Rules with Item's Lifespan Restriction, Proc. of the 8th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (2002) Canada, 2nd Workshop on Temporal Data Mining, v. 8, 701-706, 2002.
[5] F. A. Mazarbhuiya and Ekramul Hamid; Finding Fuzzy Locally Frequent itemsets, International Journal of Computer Science and Information Security, ISSN: 1947-5500, LJS Publisher and IJCSIS Press, USA, 2011.
[6] J. F. Roddick and M. Spiliopoulou; A Bibliography of Temporal, Spatial and Spatio-Temporal Data Mining Research; ACM SIGKDD, June 1999.
[7] H. Mannila, H. Toivonen and I. Verkamo; Discovering frequent episodes in sequences; KDD'95; AAAI, 210-215, August 1995.
[8] J. M. Ale and G. H. Rossi; An approach to discovering temporal association rules; Proceedings of the 2000 ACM Symposium on Applied Computing, March 2000.
[9] R. Agrawal and R. Srikant; Fast algorithms for mining association rules, Proceedings of the 20th International Conference on Very Large Databases (VLDB '94), Santiago, Chile, June 1994.
[10] R. Agrawal, T. Imielinski and A. Swami; Mining association rules between sets of items in large databases; Proceedings of the ACM SIGMOD '93, Washington, USA, May 1993.
[11] S. H. Chen and C. H. Hsieh; Graded mean integration representation of generalised fuzzy number, Proc. of Conference of Taiwan Fuzzy System Association, Taiwan, 1998.
[12] S. H. Chen and C. H. Hsieh; Graded mean integration representation of generalised fuzzy number, Journal of the Chinese Fuzzy System Association, Vol. 5, No. 2, pp. 1-7, Taiwan, 1999.
[13] S. H. Chen and C. H. Hsieh; Representation, Ranking, Distance, Similarity of L-R type fuzzy number and Application, Australia Journal of Intelligent Information Processing Systems, Vol. 6, No. 4, pp. 217-229, Australia.
[14] X. Chen and I. Petrounias; A framework for Temporal Data Mining; Proceedings of the 9th International Conference on Databases and Expert Systems Applications, DEXA '98, Vienna, Austria. Springer-Verlag, Berlin; Lecture Notes in Computer Science 1460, 796-805, 1998.
[15] X. Chen and I. Petrounias; Language support for Temporal Data Mining; Proceedings of 2nd European Symposium on Principles of Data Mining and Knowledge Discovery, PKDD '98, Springer Verlag, Berlin, 282-290, 1998.
[16] X. Chen, I. Petrounias and H. Heathfield; Discovering temporal Association rules in temporal databases; Proceedings of IADT'98 (International Workshop on Issues and Applications of Database Technology), 312-319, 1998.
[17] Y. Li, P. Ning, X. S. Wang and S. Jajodia; Discovering Calendar-based Temporal Association Rules, In Proc. of the 8th Int'l Symposium on Temporal Representation and Reasoning, 2001.
[18] W. J. Lee and S. J. Lee; Discovery of Fuzzy Temporal Association Rules, IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics, Vol. 34, No. 6, 2330-2341, Dec 2004.
[19] W. J. Lee and S. J. Lee; Fuzzy Calendar Algebra and Its Applications to Data Mining, Proceedings of the 11th International Symposium on Temporal Representation and Reasoning (TIME'04), IEEE, 2004.

AUTHOR'S PROFILE

Fokrul Alom Mazarbhuiya received his B.Sc. degree in Mathematics from Assam University, India and his M.Sc. degree in Mathematics from Aligarh Muslim University, India. After this he obtained his Ph.D. degree in Computer Science from Gauhati University, India. Since 2008 he has been serving as an Assistant Professor in the College of Computer Science, King Khalid University, Abha, Kingdom of Saudi Arabia. His research interests include Data Mining, Information Security, Fuzzy Mathematics and Fuzzy Logic.

Mohammed Abdul Khaleel received his B.Sc. degree in Mathematics and his M.C.A. degree from Osmania University, India. After that he worked at Global Suhaimi Company, Dammam, Saudi Arabia, as a Senior Software Developer. Since 2008 he has been serving as a Lecturer at the College of Computer Science, King Khalid University, Abha, Kingdom of Saudi Arabia. His research interests include Data Mining and Software Engineering.

Pervaiz Rabbani Khan received his B.Sc., B.Sc. (Hons) and M.Sc. in Physics from Punjab University, Lahore, Pakistan. After this he completed an M.Sc. in Computer Science at Newcastle upon Tyne, U.K., and then obtained his Ph.D. from the same university. Since 2001 he has been working as an Assistant Professor in the College of Computer Science, King Khalid University, Abha, Kingdom of Saudi Arabia. His research interests include Simulation and Modeling, and Fuzzy Logic.
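
As an illustration of the graded mean integration representation and the fuzzy distance defined in section III, the following Python sketch restricts attention to trapezoidal fuzzy numbers (c, a, b, d) with w = 1. For straight-line sides the integral ratio for P(A) reduces to (c + 2a + 2b + d)/6. The names `Trapezoid`, `graded_mean` and `fuzzy_distance` are illustrative assumptions and do not come from the paper; the distance follows the formula exactly as written in section III.B, so the result is signed and depends on operand order.

```python
from typing import Tuple

# A trapezoidal fuzzy number (c, a, b, d) with c <= a <= b <= d and height w = 1.
# The type alias and function names are hypothetical, chosen for this sketch.
Trapezoid = Tuple[float, float, float, float]


def graded_mean(t: Trapezoid) -> float:
    """Graded mean integration representation P(A) of a trapezoid.

    With linear sides, L^{-1}(h) = c + h(a - c) and R^{-1}(h) = d - h(d - b),
    so the ratio of integrals reduces to (c + 2a + 2b + d) / 6.
    """
    c, a, b, d = t
    return (c + 2 * a + 2 * b + d) / 6.0


def fuzzy_distance(x: Trapezoid, y: Trapezoid) -> Trapezoid:
    """Fuzzy distance C = (c1, c2, c3, c4) as written in section III.B.

    s_i = (x_i - P(X) + y_i - P(Y)) / 2 and c_i = P(X) - P(Y) + s_i.
    Swapping the operands flips the sign of the P(X) - P(Y) term.
    """
    px, py = graded_mean(x), graded_mean(y)
    s = [(xi - px + yi - py) / 2.0 for xi, yi in zip(x, y)]
    c1, c2, c3, c4 = (px - py + si for si in s)
    return (c1, c2, c3, c4)


if __name__ == "__main__":
    early = (1.0, 2.0, 3.0, 4.0)
    late = (6.0, 7.0, 8.0, 9.0)
    print(graded_mean(early))           # 2.5
    print(fuzzy_distance(late, early))  # (3.5, 4.5, 5.5, 6.5)
```

In this example the two intervals have the same shape, so the distance is again a trapezoid of that shape centred on the difference of the graded means, 7.5 − 2.5 = 5.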
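
The possibilistic variance of section III.C and the shift invariance of Theorem D can be checked numerically for trapezoidal fuzzy numbers. The sketch below assumes a trapezoid (c, a, b, d) with w = 1, whose γ-level is [c + γ(a − c), d − γ(d − b)]. It evaluates the integral by a simple midpoint rule and compares it with the closed form W²/24 + WK/12 + K²/8, where W = d − c is the support width and K = b − a is the core width; this closed form is derived here from the definition and is not stated in the paper. All function names are hypothetical.

```python
from typing import Tuple

Trapezoid = Tuple[float, float, float, float]  # (c, a, b, d), height w = 1


def gamma_cut(t: Trapezoid, gamma: float) -> Tuple[float, float]:
    """Return the gamma-level [a1(gamma), a2(gamma)] of a trapezoid."""
    c, a, b, d = t
    return (c + gamma * (a - c), d - gamma * (d - b))


def variance_numeric(t: Trapezoid, steps: int = 10000) -> float:
    """Midpoint-rule estimate of (1/2) * int_0^1 gamma * (a2 - a1)^2 d(gamma)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        g = (k + 0.5) * h
        lo, hi = gamma_cut(t, g)
        total += g * (hi - lo) ** 2
    return 0.5 * total * h


def variance_closed(t: Trapezoid) -> float:
    """Closed form for a trapezoid: W^2/24 + W*K/12 + K^2/8."""
    c, a, b, d = t
    support, core = d - c, b - a
    return support ** 2 / 24.0 + support * core / 12.0 + core ** 2 / 8.0


def shift(t: Trapezoid, theta: float) -> Trapezoid:
    """Translate a trapezoid by theta along the time axis."""
    c, a, b, d = t
    return (c + theta, a + theta, b + theta, d + theta)


if __name__ == "__main__":
    t = (0.0, 1.0, 2.0, 3.0)
    print(variance_closed(t))               # 0.75
    print(variance_closed(shift(t, 100.0)))  # 0.75, unchanged by the shift
    print(abs(variance_numeric(t) - variance_closed(t)) < 1e-6)  # True
```

Because the variance depends only on the widths of the γ-levels, two intervals of the same shape in different parts of the time domain receive the same value, which is the property the paper relies on when comparing fuzzy time intervals.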
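
The procedure of section IV can be sketched in Python for a single itemset whose fuzzy time intervals are trapezoids. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names are hypothetical, "almost equal" is read as requiring the difference of the two variances to be within sgma% of the smaller one, each new gap variance is compared with the running average of earlier gap variances as the prose of section IV describes, and a list with fewer than two intervals is treated as not cyclic.

```python
from typing import List, Tuple

Trapezoid = Tuple[float, float, float, float]  # (c, a, b, d), height w = 1


def graded_mean(t: Trapezoid) -> float:
    c, a, b, d = t
    return (c + 2 * a + 2 * b + d) / 6.0


def fuzzy_distance(x: Trapezoid, y: Trapezoid) -> Trapezoid:
    # c_i = P(X) - P(Y) + (x_i - P(X) + y_i - P(Y)) / 2, as in section III.B.
    px, py = graded_mean(x), graded_mean(y)
    c1, c2, c3, c4 = (px - py + (xi - px + yi - py) / 2.0 for xi, yi in zip(x, y))
    return (c1, c2, c3, c4)


def variance(t: Trapezoid) -> float:
    # Possibilistic variance of a trapezoid in closed form (derived, see lead-in).
    support, core = t[3] - t[0], t[2] - t[1]
    return support ** 2 / 24.0 + support * core / 12.0 + core ** 2 / 8.0


def almost_equal(v1: float, v2: float, sgma: float) -> bool:
    # Assumed reading of Definition E: within sgma percent of both values.
    return abs(v1 - v2) <= (sgma / 100.0) * min(v1, v2)


def is_fuzzy_cyclic(intervals: List[Trapezoid], sgma: float) -> bool:
    """Return True if the interval list passes the cyclicity test of section IV."""
    if len(intervals) < 2:  # assumption: at least two intervals are needed
        return False
    t1, t2 = intervals[0], intervals[1]
    v1, v2 = variance(t1), variance(t2)
    if not almost_equal(v1, v2, sgma):
        return False
    n = 1                                      # number of gaps seen so far
    avv = variance(fuzzy_distance(t1, t2))     # running average of gap variances
    avgvar = (v1 + v2) / 2.0                   # running average of interval variances
    prev = t2
    for tint in intervals[2:]:
        vg = variance(fuzzy_distance(prev, tint))
        if not almost_equal(vg, avv, sgma):
            return False
        avv = (n * avv + vg) / (n + 1)
        vt = variance(tint)
        if not almost_equal(vt, avgvar, sgma):
            return False
        avgvar = ((n + 1) * avgvar + vt) / (n + 2)
        prev = tint
        n += 1
    return True


if __name__ == "__main__":
    regular = [(0, 1, 2, 3), (10, 11, 12, 13), (20, 21, 22, 23)]
    widening = [(0, 1, 2, 3), (10, 11, 12, 13), (20, 21, 22, 30)]
    print(is_fuzzy_cyclic(regular, 5.0))   # True
    print(is_fuzzy_cyclic(widening, 5.0))  # False
```

One point worth noting when using this test: with the distance formula of section III.B, each c_i equals a constant plus (x_i + y_i)/2, so the variance of a fuzzy time gap depends only on the shapes of the two neighbouring intervals and not on how far apart they lie. A check on the location of the gaps, for example by comparing their graded means, would be a separate addition that the paper does not describe.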