					                                                                (IJCSIS) International Journal of Computer Science and Information Security,
                                                                                                                  Vol. 9, No. 3, March 2011


                               Mining Fuzzy Cyclic Patterns
              F. A. Mazarbhuiya                           M. A. Khaleel                                P. R. Khan
         Department of Computer Science           Department of Computer Science              Department of Computer Science
         College of Computer Science              College of Computer Science                 College of Computer Science
         King Khalid University                   King Khalid University                      King Khalid University
         Abha, Kingdom of Saudi Arabia            Abha, Kingdom of Saudi Arabia               Abha, Kingdom of Saudi Arabia
         e-mail: fokrul_2005@yahoo.com           e-mail: khaleel_dm@yahoo.com                 e-mail: drpervaizrkhan@yahoo.com

Abstract: The problem of mining temporal association rules from a temporal dataset is to find associations between items that hold within certain time intervals but not throughout the dataset. This involves finding sets that are frequent in certain time intervals and then finding association rules among the items present in those frequent sets. In fuzzy temporal datasets, as the time of transaction is imprecise, we may find sets of items that are frequent in certain fuzzy time intervals. We call these fuzzy locally frequent sets and the corresponding association rules fuzzy local association rules. The algorithm discussed in [5] finds all fuzzy locally frequent itemsets, where each frequent itemset is associated with a list of fuzzy time intervals in which it is frequent. The list of fuzzy time intervals may exhibit some interesting properties, e.g. the itemsets may be cyclic in nature. In this paper we propose a method of finding such cyclic frequent itemsets.

Keywords: Fuzzy time-stamp, Fuzzy time interval, Fuzzy temporal datasets, Fuzzy locally frequent sets, Fuzzy distance, Variance of a fuzzy interval

I. INTRODUCTION

The problem of mining association rules was initially defined in [10] by R. Agrawal et al. for application in large supermarkets. Large supermarkets have large collections of records of daily sales. Analyzing the buying patterns of the buyers helps in taking typical business decisions such as what to put on sale, how to arrange the materials on the shelves, how to plan for future purchases, etc.

Mining for association rules between items in temporal databases has been described as an important data mining problem. Transaction data are normally temporal. The market basket transaction is an example of this type.

Mining fuzzy temporal datasets is also an interesting data mining problem. In [5], the authors proposed a method of finding fuzzy locally frequent sets from such datasets. The algorithm discussed in [5] extracts all frequent itemsets, where each frequent itemset is associated with a list of fuzzy time intervals. The list of fuzzy time intervals associated with a frequent itemset can be used to find some interesting results. In this paper we study the cyclic nature of the list of time intervals associated with a frequent itemset and devise a method to extract all frequent itemsets which are cyclic. We call such a frequent itemset a fuzzy cyclic frequent itemset, as the time intervals are fuzzy in nature. In this case, as the variance of a fuzzy interval is invariant with respect to translation, it can be used to define the similarity between the fuzzy time intervals associated with a frequent itemset. Similarly, the fuzzy distance between any two consecutive fuzzy time intervals associated with a frequent itemset can be used to find the fuzzy time gaps between the fuzzy time intervals, and its variance can be used to define similarity among the fuzzy time gaps associated with the frequent itemset.

In Section II we give a brief discussion of the works related to ours. In Section III we describe the definitions, terms and notations used in this paper. In Section IV we give the algorithm proposed in this paper for mining fuzzy cyclic frequent sets. We conclude with conclusions and lines for future work in Section V. The last section lists the references.

II. RELATED WORKS

The problem of discovery of association rules was first formulated by Agrawal et al. in 1993 [10]. Given a set I of items and a large collection D of transactions involving the items, the problem is to find relationships among the items, i.e. the presence of various items in the transactions. A transaction t is said to support an item if that item is present in t. A transaction t is said to support an itemset if t supports each of the items present in the itemset. An association rule is an expression of the form X ⇒ Y where X and Y are subsets of the item set I. The rule holds with confidence τ if τ% of the transactions in D that support X also support Y. The rule has support σ if σ% of the transactions in D support X ∪ Y. A method for the discovery of association rules was given in [9], which is known as the Apriori algorithm. This was followed by subsequent refinements, generalizations, extensions and improvements.

Temporal data mining is now an important extension of conventional data mining and has recently attracted more people to work in this area. By taking the time aspect into account, more interesting patterns that are time dependent can be extracted. There are mainly two broad directions of temporal data mining [6]. One concerns the discovery of causal relationships among temporally oriented events. Ordered events form sequences and the cause of an event always occurs before it. The other concerns the discovery of similar patterns within the same time sequence or among different time sequences. The underlying problem is to find frequent sequential patterns in temporal databases. The name sequence mining is normally used for the underlying problem. In [7] the




                                                                       95                                http://sites.google.com/site/ijcsis/
                                                                                                         ISSN 1947-5500

problem of recognizing frequent episodes in an event sequence is discussed, where an episode is defined as a collection of events that occur during time intervals of a specific size.

The association rule discovery process has also been extended to incorporate temporal aspects. In temporal association rules each rule has associated with it a time interval in which the rule holds. The associated problems are to find valid time periods during which association rules hold, to discover possible periodicities that association rules have, and to discover association rules with temporal features. In [8], [14], [15] and [16], the problem of temporal data mining is addressed and techniques and algorithms have been developed for it. In [8] an algorithm for the discovery of temporal association rules is described. In [2], two algorithms are proposed for the discovery of temporal rules that display regular cyclic variations, where the time interval is specified by the user to divide the data into disjoint segments like months, weeks, days etc. Similar works were done in [4] and [17], incorporating multiple granularities of time intervals (e.g. first working day of every month) from which both cyclic and user defined calendar patterns can be obtained. In [1], a method of finding locally and periodically frequent sets and periodic association rules is discussed, which is an improvement over other methods in the sense that it dynamically extracts all the rules along with the intervals where the rules hold. In [18] and [19], fuzzy calendric data mining and fuzzy temporal data mining are discussed, where user specified ill-defined fuzzy temporal and calendric patterns are extracted from temporal data.

In [5], the authors propose a method of extracting fuzzy locally frequent sets from fuzzy temporal datasets. The algorithm discussed in [5] extracts all frequent itemsets, with each itemset associated with a list of fuzzy time intervals where the itemset is frequent. The list of fuzzy time intervals associated with a frequent itemset may exhibit some interesting properties, e.g. the sizes of the fuzzy time intervals may be almost the same and the time gaps between consecutive fuzzy time intervals may also be almost the same. We call such a frequent itemset a fuzzy cyclic frequent itemset. So the study is basically an intra-itemset study.

III. PROBLEM DEFINITION

A. Some definitions related to fuzzy sets

Let E be the universe of discourse. A fuzzy set A in E is characterized by a membership function A(x) taking values in [0, 1]. A(x) for x ∈ E represents the grade of membership of x in A. Thus a fuzzy set A is defined as

         A = {(x, A(x)), x ∈ E}

In general, a generalized fuzzy number A is described as any fuzzy subset of the real line R whose membership function A(x) satisfies the following conditions:
    (1) A(x) is a continuous mapping from R to the closed interval [0, 1]
    (2) A(x) = 0, -∞ < x ≤ c
    (3) A(x) = L(x) is strictly increasing on [c, a]
    (4) A(x) = w, a ≤ x ≤ b
    (5) A(x) = R(x) is strictly decreasing on [b, d]
    (6) A(x) = 0, d ≤ x < ∞,
where 0 < w ≤ 1 and a, b, c and d are real numbers. This type of generalized fuzzy number is denoted by A = (c, a, b, d; w)LR. When w = 1, the above generalized fuzzy number is a fuzzy interval and is denoted by A = (c, a, b, d)LR. When L(x) and R(x) are straight lines, A is a trapezoidal fuzzy number and is denoted by (c, a, b, d). If a = b, the above trapezoidal number is a triangular fuzzy number denoted by (c, a, d).

The α-level of the triangular fuzzy number [t1-a, t1, t1+a] is the closed interval [t1+(α-1).a, t1+(1-α).a]. Similarly, the α-level of the fuzzy interval [t1-a, t1, t2, t2+a] is the closed interval [t1+(α-1).a, t2+(1-α).a].

Chen and Hsieh [11, 12, 13] proposed the graded mean integration representation for a generalized fuzzy number as follows. Suppose L^{-1} and R^{-1} are the inverse functions of L and R respectively, and the graded mean h-level value of the generalized fuzzy number A = (c, a, b, d; w)LR is h[L^{-1}(h) + R^{-1}(h)]/2. Then the graded mean integration representation of the generalized fuzzy number, based on the integral value of the graded mean h-levels, is

\[ P(A) = \frac{\int_0^w h \, \frac{L^{-1}(h) + R^{-1}(h)}{2} \, dh}{\int_0^w h \, dh} \]

where h is between 0 and w, 0 < w ≤ 1.

B. Fuzzy distance

Chen and Wang [13] proposed the fuzzy distance between any two trapezoidal fuzzy numbers as follows. Let A = (a1, a2, a3, a4) and B = (b1, b2, b3, b4) be two trapezoidal fuzzy numbers, and let their graded mean integration representations be P(A) and P(B) respectively. Assume

\[ s_i = \frac{a_i - P(A) + b_i - P(B)}{2}, \quad i = 1, 2, 3, 4; \]

\[ c_i = P(A) - P(B) + s_i, \quad i = 1, 2, 3, 4; \]

then the fuzzy distance of A and B is C = (c1, c2, c3, c4). Obviously the fuzzy distance between two trapezoidal numbers is also a trapezoidal number.
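The definitions of the α-level, the graded mean integration representation and the fuzzy distance can be made concrete with a small sketch. It assumes trapezoidal fuzzy numbers with height w = 1 and straight sides, for which the integral defining P(A) reduces to (c + 2a + 2b + d)/6. The class name Trapezoid and the function names are illustrative only and do not come from the paper or from any library.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Trapezoid:
    """Trapezoidal fuzzy number (c, a, b, d), c <= a <= b <= d, height w = 1 (assumed)."""
    c: float
    a: float
    b: float
    d: float

    def alpha_cut(self, alpha: float) -> Tuple[float, float]:
        """Closed interval of points with membership at least alpha, 0 < alpha <= 1."""
        left = self.c + alpha * (self.a - self.c)
        right = self.d - alpha * (self.d - self.b)
        return (left, right)

    def graded_mean(self) -> float:
        """P(A) for straight L and R with w = 1: (c + 2a + 2b + d) / 6."""
        return (self.c + 2 * self.a + 2 * self.b + self.d) / 6


def fuzzy_distance(x: Trapezoid, y: Trapezoid) -> Trapezoid:
    """Fuzzy distance C with s_i = (x_i - P(X) + y_i - P(Y)) / 2, c_i = P(X) - P(Y) + s_i."""
    px, py = x.graded_mean(), y.graded_mean()
    xs = (x.c, x.a, x.b, x.d)
    ys = (y.c, y.a, y.b, y.d)
    cs = [px - py + (xi - px + yi - py) / 2 for xi, yi in zip(xs, ys)]
    return Trapezoid(*cs)
```

For the fuzzy interval [t1-a, t1, t2, t2+a] with t1 = 10, t2 = 14 and a = 2, that is Trapezoid(8, 10, 14, 16), alpha_cut(0.5) gives (9, 15), which agrees with [t1+(α-1).a, t2+(1-α).a]. Since the formula for c_i uses the signed difference P(A) - P(B) as written in the text, the order of the two arguments matters in this sketch.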
A fuzzy set A is said to be normal if A(x) = 1 for at least one x ∈ E.

C. Possibilistic variance of a fuzzy number

Let F be a family of fuzzy numbers and A be a fuzzy number belonging to F. Let A_γ = [a1(γ), a2(γ)], γ ∈ [0, 1], be a γ-level of A. Carlsson and Fuller [3] defined the possibilistic variance of a fuzzy number A ∈ F as

\[ \mathrm{Var}(A) = \frac{1}{2} \int_0^1 \gamma \, \bigl(a_2(\gamma) - a_1(\gamma)\bigr)^2 \, d\gamma \]

Before proceeding further we review the theorem given by Carlsson and Fuller [3].

D. Theorem: The variance of a fuzzy number is invariant to shifting.

Proof: Let A ∈ F be a fuzzy number and let θ be a real number. If A is shifted by the value θ, then we get a fuzzy number, denoted by B, satisfying the property B(x) = A(x-θ) for all x ∈ R. Then from the relationship

\[ B_\gamma = [a_1(\gamma) + \theta, \; a_2(\gamma) + \theta] \]

we find

\[ \mathrm{Var}(B) = \frac{1}{2} \int_0^1 \gamma \, \bigl((a_2(\gamma) + \theta) - (a_1(\gamma) + \theta)\bigr)^2 \, d\gamma = \frac{1}{2} \int_0^1 \gamma \, \bigl(a_2(\gamma) - a_1(\gamma)\bigr)^2 \, d\gamma = \mathrm{Var}(A) \]

E. Almost equal fuzzy intervals

Given two fuzzy intervals A and B, we say A is almost equal to B, or B is almost equal to A, if the variances of A and B are equal up to a small variation, say λ%, i.e.

           |var(A) - var(B)| ≤ λ% of var(B)
           |var(B) - var(A)| ≤ λ% of var(A)

where λ is specified by the user.

IV. PROPOSED ALGORITHM

A. Extraction of fuzzy cyclic patterns

One way to extract these sets is to find the fuzzy distance between any two consecutive fuzzy time intervals of the same frequent set. If the fuzzy distances (time gaps) between consecutive fuzzy time intervals are found to be almost equal and the fuzzy time intervals themselves are also found to be almost equal (the definition of almost equal fuzzy time intervals is given in Definition E of Section III), then we call these frequent sets fuzzy cyclic frequent sets. To find out such cyclic nature for each frequent item set we proceed as follows. If the first fuzzy time interval is almost equal to the second fuzzy time interval, then we see whether the fuzzy distance (time gap) between the first and the second fuzzy time intervals is almost equal to the fuzzy distance (time gap) between the second and third fuzzy time intervals. If it is, then we take the average of the variances of the first two fuzzy distances (time gaps) and see whether it is almost equal to the variance of the fuzzy distance (time gap) between the third and the fourth fuzzy time intervals. If the average of the variances of the first two fuzzy time intervals is almost equal to the variance of the third interval we proceed further; otherwise we stop. In general, if the average of the variances of the first (n-1) fuzzy time intervals is almost equal to the variance of the n-th fuzzy time interval, and the average of the variances of the first (n-2) fuzzy distances (time gaps) is almost equal to the variance of the (n-1)-th time gap, then the average of the variances of the first n fuzzy time intervals is compared with the variance of the (n+1)-th fuzzy time interval and that of the first (n-1) time gaps is compared with the variance of the n-th time gap. This way we can extract fuzzy cyclic patterns if such patterns exist. We describe below the algorithm for extracting fuzzy cyclic frequent item sets.

Algorithm for extracting fuzzy cyclic frequent item sets

For each fuzzy locally frequent item set iset do
{
    t1 ← first fuzzy time interval for iset
    v1 ← var(t1)
    t2 ← second fuzzy time interval for iset
    v2 ← var(t2)
    if not almostequal(v1, v2, sgma) then
    {
        report that iset is not fuzzy cyclic in nature
        continue    /* go for the next frequent item set */
    }
    n = 1
    ftg1 ← fuzzydist(t1, t2)
    v(ftg1) ← var(fuzzydist(t1, t2))
    avgvar ← (v1 + v2)/2
    flag = 0
    while not end of fuzzy interval list for iset do
    {
        tint ← current fuzzy time interval
        ftg ← fuzzydist(tint, t2)
        v(ftg) ← var(fuzzydist(tint, t2))
        if almostequal(v(ftg), v(ftg1), sgma) then
            avv(ftg) ← (n*v(ftg1) + v(ftg))/(n+1)
        else
            { flag = 1; break; }
        var ← var(tint)
        if almostequal(var, avgvar, sgma) then
            avgvar ← ((n+1)*avgvar + var)/(n+2)
        else
            { flag = 1; break; }
        t2 ← tint;
        n ← n+1;
    }
    if (flag == 1)
        report that iset is not fuzzy cyclic in nature
    else
        report that iset is fuzzy cyclic in nature
}

Here the function var() returns the fuzzy variance of a fuzzy time interval associated with a fuzzy locally frequent set; fuzzydist() returns the fuzzy distance between two consecutive fuzzy time intervals associated with a fuzzy locally frequent set, i.e. the fuzzy time gap; v() returns the fuzzy variance of the fuzzy time gap between two consecutive fuzzy time intervals associated with a fuzzy locally frequent itemset; avv() returns the average of the variances of the fuzzy time gaps; avgvar holds the average of the fuzzy variances of the fuzzy time intervals associated with a fuzzy locally frequent set; and sgma is the threshold value up to which variation of the fuzzy variance values is permitted, corresponding to the λ of Definition E of Section III.

V. CONCLUSIONS AND LINES FOR FUTURE WORKS

An algorithm for finding fuzzy cyclic frequent sets from fuzzy temporal data is given in this paper. The algorithm of [5] gives all fuzzy locally frequent itemsets, where each frequent itemset is associated with a list of fuzzy time intervals in which it is frequent. Our algorithm takes as input the result of the algorithm of [5]. As the variance of a fuzzy interval is invariant with respect to shifting, we can consider two fuzzy intervals that have the same variance but lie in two different parts of the time domain as equal. Using this notion we define the equality of the fuzzy time intervals associated with a fuzzy locally frequent itemset. Our algorithm supplies all frequent sets which are fuzzy cyclic.

We may have some patterns where the fuzzy time gaps are almost equal but the fuzzy time intervals of frequency are not equal even in the approximate sense. We may also have some patterns where the fuzzy time gaps are not equal but the durations of the time intervals are almost equal. The above algorithm can be modified accordingly to find such patterns. In future we would like to devise methods to find various kinds of patterns which may exist in fuzzy temporal datasets.

References

[1]  A. K. Mahanta, F. A. Mazarbhuiya and H. K. Baruah; Finding Locally and Periodically Frequent Sets and Periodic Association Rules, Proceeding of 1st Int’l Conf on Pattern Recognition and Machine Intelligence (PreMI’05), LNCS 3776, 576-582, 2005.
[2]  B. Ozden, S. Ramaswamy and A. Silberschatz; Cyclic Association Rules, Proc. of the 14th Int’l Conference on Data Engineering, USA, 412-421, 1998.
[3]  C. Carlsson and R. Fuller; On Possibilistic Mean Value and Variance of Fuzzy Numbers, Fuzzy Sets and Systems 122 (2001), pp. 315-326.
[4]  G. Zimbrao, J. Moreira de Souza, V. Teixeira de Almeida and W. Araujo da Silva; An Algorithm to Discover Calendar-based Temporal Association Rules with Item’s Lifespan Restriction, Proc. of the 8th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (2002) Canada, 2nd Workshop on Temporal Data Mining, v. 8, 701-706, 2002.
[5]  F. A. Mazarbhuiya and Ekramul Hamid; Finding Fuzzy Locally Frequent itemsets, International Journal of Computer Science and Information Security, ISSN: 1947-5500, LJS Publisher and IJCSIS Press, USA, 2011.
[6]  J. F. Roddick and M. Spiliopoulou; A Bibliography of Temporal, Spatial and Spatio-Temporal Data Mining Research; ACM SIGKDD, June 1999.
[7]  H. Mannila, H. Toivonen and I. Verkamo; Discovering frequent episodes in sequences; KDD’95; AAAI, 210-215, August 1995.
[8]  J. M. Ale and G. H. Rossi; An approach to discovering temporal association rules; Proceedings of the 2000 ACM Symposium on Applied Computing, March 2000.
[9]  R. Agrawal and R. Srikant; Fast algorithms for mining association rules, Proceedings of the 20th International Conference on Very Large Databases (VLDB ’94), Santiago, Chile, June 1994.
[10] R. Agrawal, T. Imielinski and A. Swami; Mining association rules between sets of items in large databases; Proceedings of the ACM SIGMOD ’93, Washington, USA, May 1993.
[11] S. H. Chen and C. H. Hsieh; Graded mean integration representation of generalised fuzzy number, Proc. of Conference of Taiwan Fuzzy System Association, Taiwan, 1998.
[12] S. H. Chen and C. H. Hsieh; Graded mean integration representation of generalised fuzzy number, Journal of the Chinese Fuzzy System Association, Vol. 5, No. 2, pp. 1-7, Taiwan, 1999.
[13] S. H. Chen and C. H. Hsieh; Representation, Ranking, Distance, Similarity of L-R type fuzzy number and Application, Australian Journal of Intelligent Information Processing Systems, Vol. 6, No. 4, pp. 217-229, Australia.
[14] X. Chen and I. Petrounias; A framework for Temporal Data Mining; Proceedings of the 9th International Conference on Databases and Expert Systems Applications, DEXA ’98, Vienna, Austria. Springer-Verlag, Berlin; Lecture Notes in Computer Science 1460, 796-805, 1998.
[15] X. Chen and I. Petrounias; Language support for Temporal Data Mining; Proceedings of 2nd European Symposium on Principles of Data Mining and Knowledge Discovery, PKDD ’98, Springer Verlag, Berlin, 282-290, 1998.
[16] X. Chen, I. Petrounias and H. Heathfield; Discovering temporal Association rules in temporal databases; Proceedings of IADT’98 (International Workshop on Issues and Applications of Database Technology), 312-319, 1998.
[17] Y. Li, P. Ning, X. S. Wang and S. Jajodia; Discovering Calendar-based Temporal Association Rules, In Proc. of the 8th Int’l Symposium on Temporal Representation and Reasoning, 2001.
[18] W. J. Lee and S. J. Lee; Discovery of Fuzzy Temporal Association Rules, IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics, Vol. 34, No. 6, 2330-2341, Dec 2004.
[19] W. J. Lee and S. J. Lee; Fuzzy Calendar Algebra and Its Applications to Data Mining, Proceedings of the 11th International Symposium on Temporal Representation and Reasoning (TIME’04), IEEE, 2004.

AUTHOR’S PROFILE

Fokrul Alom Mazarbhuiya received his B.Sc. degree in Mathematics from Assam University, India and his M.Sc. degree in Mathematics from Aligarh Muslim University, India. After this he obtained his Ph.D. degree in Computer Science from Gauhati University, India. Since 2008 he has been serving as an Assistant Professor in the College of Computer Science, King Khalid University, Abha, Kingdom of Saudi Arabia. His research interests include Data Mining, Information Security, Fuzzy Mathematics and Fuzzy Logic.





Mohammed Abdul Khaleel received his B.Sc. degree in Mathematics from Osmania University, India and his M.C.A. degree from Osmania University, India. After that he worked at Global Suhaimi Company, Dammam, Saudi Arabia as a Senior Software Developer. Since 2008 he has been serving as a Lecturer at the College of Computer Science, King Khalid University, Abha, Kingdom of Saudi Arabia. His research interests include Data Mining and Software Engineering.

Pervaiz Rabbani Khan received his B.Sc., B.Sc. (Hons) and M.Sc. in Physics from Punjab University, Lahore, Pakistan. After this he completed an M.Sc. in Computer Science at Newcastle upon Tyne, U.K., and then obtained his Ph.D. from the same university. Since 2001 he has been working as an Assistant Professor in the College of Computer Science, King Khalid University, Abha, Kingdom of Saudi Arabia. His research interests include Simulation and Modeling and Fuzzy Logic.
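As a supplement to Section IV, the extraction procedure can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: intervals are taken to be trapezoidal tuples (c, a, b, d) with height 1, the variance uses the closed form that the Carlsson and Fuller integral takes for such trapezoids, almost_equal is one reading of Definition E, and the gap test follows the running-average description in the prose of Section IV rather than the comparison with the first gap in the pseudocode. All function names are hypothetical.

```python
from typing import Sequence, Tuple

Trap = Tuple[float, float, float, float]  # trapezoid (c, a, b, d), c <= a <= b <= d


def graded_mean(t: Trap) -> float:
    """P(A) for a trapezoid with straight sides and height 1 (assumed)."""
    c, a, b, d = t
    return (c + 2 * a + 2 * b + d) / 6


def fuzzy_dist(x: Trap, y: Trap) -> Trap:
    """Fuzzy distance of Section III-B: c_i = P(X) - P(Y) + (x_i - P(X) + y_i - P(Y)) / 2."""
    px, py = graded_mean(x), graded_mean(y)
    c1, c2, c3, c4 = (px - py + (xi - px + yi - py) / 2 for xi, yi in zip(x, y))
    return (c1, c2, c3, c4)


def variance(t: Trap) -> float:
    """(1/2) * integral over [0, 1] of g * (a2(g) - a1(g))**2 dg, in closed form."""
    c, a, b, d = t
    width = d - c                # length of the level set at g = 0
    shrink = (a - c) + (d - b)   # amount by which it narrows between g = 0 and g = 1
    return width ** 2 / 4 - width * shrink / 3 + shrink ** 2 / 8


def almost_equal(u: float, v: float, sgma: float) -> bool:
    """Assumed reading of Definition E: u and v differ by at most sgma percent of either."""
    return abs(u - v) <= sgma / 100 * min(abs(u), abs(v))


def is_fuzzy_cyclic(intervals: Sequence[Trap], sgma: float) -> bool:
    """Check the interval list of one fuzzy locally frequent item set."""
    if len(intervals) < 3:
        return False  # assumption: at least two time gaps are needed for a comparison
    variances = [variance(t) for t in intervals]
    gap_vars = [variance(fuzzy_dist(cur, prev))
                for prev, cur in zip(intervals, intervals[1:])]
    if not almost_equal(variances[0], variances[1], sgma):
        return False
    avg_var = (variances[0] + variances[1]) / 2
    avg_gap = gap_vars[0]
    for n in range(2, len(intervals)):
        # n intervals and n - 1 gaps have been accepted so far
        if not almost_equal(gap_vars[n - 1], avg_gap, sgma):
            return False
        if not almost_equal(variances[n], avg_var, sgma):
            return False
        avg_gap = ((n - 1) * avg_gap + gap_vars[n - 1]) / n
        avg_var = (n * avg_var + variances[n]) / (n + 1)
    return True
```

With sgma = 5, the list (8, 10, 14, 16), (18, 20, 24, 26), (28, 30, 34, 36), (38, 40, 44, 46) is reported as fuzzy cyclic, while replacing the third interval by the much wider (28, 30, 50, 52) makes the check fail.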



