Post-Mining of Association Rules: Techniques for Effective Knowledge Extraction

Chapter I

Association Rules: An Overview

Paul D. McNicholas, University of Guelph, Guelph, Ontario, Canada
Yanchang Zhao, University of Technology, Sydney, Australia

ABSTRACT
Association rules present one of the most versatile techniques for the analysis of binary data, with applications in areas as diverse as retail, bioinformatics, and sociology. In this chapter, the origin of association rules is discussed along with the functions by which association rules are traditionally characterised. Following the formal definition of an association rule, these functions – support, confidence and lift – are defined and various methods of rule generation are presented, spanning 15 years of development. There is some discussion about negations and negative association rules and an analogy between association rules and 2×2 tables is outlined. Pruning methods are discussed, followed by an overview of measures of interestingness. Finally, the post-mining stage of the association rule paradigm is put in the context of the preceding stages of the mining process.

INTRODUCTION
In general, association rules present an efficient method of analysing very large binary, or discretized, data sets. One common application is to discover relationships between binary variables in transaction databases, and this type of analysis is called a ‘market basket analysis’. While association rules have been used to analyse non-binary data, such analyses typically involve the data being coded as binary before proceeding. Association rules present one of the most versatile methods of analysing large binary datasets; recent applications have ranged from the detection of bio-terrorist attacks (Fienberg and Shmueli, 2005) and the analysis of gene expression data (Carmona-Saez et al., 2006), to the analysis of Irish third-level education applications (McNicholas, 2007).


There are two or three steps involved in a typical association rule analysis, as shown in Box 1. This book focuses on the third step, post-mining, and the purpose of this chapter is to set the scene for this focus. The chapter begins with a look back towards the foundations of thought on the association of attributes; the idea of an association rule is then introduced, followed by discussion of rule generation. Finally, there is a broad review of pruning and interestingness.

Box 1. Steps involved in a typical association rule analysis

Coding of data as binary (if data is not binary)
    ↓
Rule generation
    ↓
Post-mining

Background

Although formally introduced towards the end of the twentieth century (Agrawal et al., 1993), many of the ideas behind association rules can be seen in the literature over a century earlier. Yule (1903) wrote about associations between attributes and, in doing so, built upon the earlier writings of De Morgan (1847), Boole (1847, 1854) and Jevons (1890). Yule (1903) raised many of the issues that are now central to association rule analysis. In fact, Yule (1903) built the idea of not possessing an attribute into his paradigm from the outset, as an important concept. Yet, it was several years after the introduction of association rules before such issues were seriously considered, and it is still the case that the absence of items from transactions is often ignored in analyses. We will revisit this issue later in this chapter, when discussing negative association rules and negations.

ASSOCIATION RULES

Definitions

Given a non-empty set I, an association rule is a statement of the form A ⇒ B, where A, B ⊂ I such that A ≠ ∅, B ≠ ∅, and A ∩ B = ∅. The set A is called the antecedent of the rule, the set B is called the consequent of the rule, and we shall call I the master itemset. Association rules are generated over a large set of transactions, denoted by T. An association rule can be deemed interesting if the items involved occur together often and there are suggestions that one of the sets might in some sense lead to the presence of the other set. Association rules are traditionally characterised as interesting, or not, based on mathematical notions called ‘support’, ‘confidence’ and ‘lift’. Although there are now a multitude of measures of interestingness available to the analyst, many of them are still based on these three functions. The notation P(A) is used to represent the proportion of times that the set A appears in the transaction set T. Similarly, P(A, B) represents the proportion of times that the sets A and B coincide in transactions in T. It is also necessary to define




P(B | A) = P(A, B) / P(A),

which is the proportion of those transactions containing the set A that also contain the set B. Table 1 contains the definitions of the functions by which association rules are traditionally characterised. The lift, or as it is sometimes called, the ‘interest’, of an association rule can be viewed as a measure of the distance between P(B | A) and P(B) or, equivalently, as a function giving the extent to which A and B are dependent on one another. The support of the antecedent part of the rule, P(A), is sometimes called the ‘coverage’ of the rule, and that of the consequent part, P(B), is referred to as the ‘expected confidence’ because it is the value that the confidence would take if the antecedent and consequent were independent.

Table 1. Traditional functions of association rules.

Function      Definition
Support       s(A ⇒ B) = P(A, B)
Confidence    c(A ⇒ B) = P(B | A)
Lift          L(A ⇒ B) = c(A ⇒ B)/P(B)
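To make these functions concrete, consider the following minimal Python sketch (illustrative code, not from the chapter; the transaction data and function names are invented), which computes the support, confidence and lift of a candidate rule over a list of transactions:

def proportion(itemset, transactions):
    # P(itemset): proportion of transactions containing every item in itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_measures(A, B, transactions):
    # s(A => B) = P(A, B); c(A => B) = P(B | A); L(A => B) = c(A => B)/P(B)
    support = proportion(A | B, transactions)
    confidence = support / proportion(A, transactions)
    lift = confidence / proportion(B, transactions)
    return support, confidence, lift

T = [frozenset(t) for t in (["bread", "butter"], ["bread", "milk"],
                            ["bread", "butter", "milk"], ["milk"])]
print(rule_measures(frozenset({"bread"}), frozenset({"butter"}), T))
# (0.5, 0.666..., 1.333...): s = 2/4, c = (2/4)/(3/4) = 2/3, L = (2/3)/(2/4) = 4/3

Here ‘bread’ appears in three of the four transactions, so the rule {bread} ⇒ {butter} has coverage 3/4 and expected confidence 2/4.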

Negations and Negative Association Rules

In many applications, it is not only the presence of items in the antecedent and the consequent parts of an association rule that may be of interest. Consideration, in many cases, should be given to the relationship between the absence of items from the antecedent part and the presence or absence of items from the consequent part. Further, the presence of items in the antecedent part can be related to the absence of items from the consequent part; for example, {margarine} ⇒ {not butter}, which might be referred to as a ‘replacement rule’.

One way to incorporate the absence of items into the association rule mining paradigm is to consider rules of the form A ⇒ ¬B (Savasere et al., 1998). Another is to think in terms of negations. Suppose A ⊂ I; then write ¬A to denote the absence, or negation, of the item or items in A from a transaction. Considering A as a binary {0, 1} variable, the presence of the items in A is equivalent to A = 1, while the negation ¬A is equivalent to A = 0. The concept of considering association rules involving negations, or ‘negative implications’, is due to Silverstein et al. (1998).

The Relationship to 2×2 Tables

Table 2, which is a cross-tabulation of A versus B, can be regarded as a representation of the supports of all of the rules that can be formed with antecedent A (A = 1) or ¬A (A = 0) and consequent B (B = 1) or ¬B (B = 0). The odds ratio associated with Table 2 is given by ad/bc. This odds ratio can be used to partition the eight possibly interesting association rules that can be formed from A, B and their respective negations: if the odds ratio is greater than 1, then only rules involving A and B or ¬A and ¬B may be of interest, while if the odds ratio is less than 1, then only rules involving exactly one of ¬A and ¬B may be of interest.

Table 2. A cross-tabulation of A versus B.

            A = 1    A = 0
B = 1         a        b
B = 0         c        d
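As a small illustration of this partition, the following sketch (invented counts, not from the chapter) computes the odds ratio from the cells of Table 2 and reports which family of rules may be worth examining:

def odds_ratio_partition(a, b, c, d):
    # Cells as in Table 2: a = (A=1, B=1), b = (A=0, B=1),
    #                      c = (A=1, B=0), d = (A=0, B=0)
    odds_ratio = (a * d) / (b * c)
    if odds_ratio > 1:
        family = "rules involving A and B or ¬A and ¬B"
    elif odds_ratio < 1:
        family = "rules involving exactly one of ¬A and ¬B"
    else:
        family = "none: A and B appear independent"
    return odds_ratio, family

print(odds_ratio_partition(a=50, b=10, c=10, d=30))
# (15.0, 'rules involving A and B or ¬A and ¬B')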

RULE GENERATION

Introduction
Since their formal introduction (Agrawal et al., 1993), most of the published work on association rules has focused on fast and effective techniques for rule generation. In the original formulation of the association rule mining problem, mining was performed by generating rules with support and confidence above predefined thresholds.



In general, the generation of association rules can be broken into two steps: frequent itemset counting and rule generation. A frequent itemset is an itemset with support no less than some pre-specified threshold. The first step counts the frequencies of itemsets and finds the frequent ones, and the second step generates high-confidence rules from the frequent itemsets. The first step is of exponential complexity in that, given m items, there are 2^m possible candidate itemsets. The lattice of itemsets is shown in Figure 1, where the dashed line shows the boundary between frequent itemsets and infrequent ones: the itemsets above the line are frequent, while those below it are infrequent. Due to the huge search space of itemsets, many researchers have worked on frequent itemset counting, and more than two dozen publications and implementations on the topic can be found at the FIMI (Frequent Itemset Mining Implementations) Repository (http://fimi.cs.helsinki.fi/).

The Apriori Algorithm

Due to the explosion in the number of candidate itemsets as the number of items increases, much research work has focused on efficient search strategies for finding frequent itemsets. The apriori algorithm was introduced by Agrawal and Srikant (1994). To improve the efficiency of the search, the algorithm exploits the downward-closure property of support (also called the anti-monotonicity property); that is, if an itemset is infrequent, then all of its supersets are also infrequent. Put another way, if an itemset is frequent, then all of its subsets are frequent. This property is often employed to traverse the search space efficiently. For example, in the item lattice shown in Figure 1, if {ABC} is a frequent itemset, then {AB}, {AC}, {BC}, {A}, {B} and {C} are all frequent itemsets. If {AD} is infrequent, then {ABD}, {ACD} and {ABCD} are infrequent. The algorithm itself is summarised in Figure 2.

Figure 2. The apriori algorithm

k ← 1; L1 ← frequent items;
Repeat until Lk is empty:
    Generate candidate (k+1)-itemsets Ck+1 from Lk;
    Lk+1 ← {X | X ∈ Ck+1 and supp(X) ≥ minsupp};
    k ← k + 1;
End loop;
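As a concrete rendering of the pseudocode in Figure 2, here is a minimal Python sketch of the apriori frequent-itemset loop (illustrative code, not the implementation of Agrawal and Srikant; candidates are generated by joining pairs of frequent k-itemsets and pruned using the downward-closure property):

from collections import defaultdict
from itertools import combinations

def apriori(transactions, minsupp):
    # transactions: iterable of item collections; minsupp: minimum count
    transactions = [frozenset(t) for t in transactions]
    counts = defaultdict(int)
    for t in transactions:                       # L1: frequent single items
        for item in t:
            counts[frozenset([item])] += 1
    frequent = {s: n for s, n in counts.items() if n >= minsupp}
    all_frequent = dict(frequent)
    k = 1
    while frequent:
        # Join step: candidate (k+1)-itemsets from pairs of frequent k-itemsets
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k + 1}
        # Prune step (downward closure): every k-subset must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k))}
        counts = defaultdict(int)                # count candidate supports
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: n for s, n in counts.items() if n >= minsupp}
        all_frequent.update(frequent)
        k += 1
    return all_frequent                          # itemset -> support count

print(apriori([{"a", "b"}, {"a", "c"}, {"a", "b", "c"}], minsupp=2))
# {a}: 3, {b}: 2, {c}: 2, {a, b}: 2, {a, c}: 2; {b, c} is infrequent,
# so {a, b, c} is never even counted

High-confidence rules can then be read off in the second step by splitting each frequent itemset into antecedent and consequent and computing the confidence from the stored counts.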

Figure 1. An example of an item lattice (in the original figure, a dashed line separates the frequent itemsets, above, from the infrequent ones, below)

                     Ø
      A         B         C         D
   AB    AC    AD    BC    BD    CD
      ABC    ABD    ACD    BCD
                   ABCD


Alternative Algorithms

To reduce the number of candidate itemsets, the ideas of discovering only closed itemsets or maximal frequent itemsets were proposed. A closed itemset is an itemset without any immediate superset having the same support (Pasquier et al., 1999; Zaki and Hsiao, 2005), and an itemset is a maximal frequent itemset if none of its immediate supersets are frequent (Gouda and Zaki, 2001; Xiao et al., 2003). For example, {ABC} and {BD} are two maximal frequent itemsets in the itemset lattice in Figure 1.
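Under the assumption that support counts for the itemsets of interest are held in a dictionary keyed by frozenset, these two definitions translate into the following minimal sketch (illustrative code, not from the cited papers):

def immediate_supersets(itemset, items):
    # All supersets of itemset containing exactly one extra item
    return [itemset | {x} for x in items if x not in itemset]

def is_closed(itemset, support, items):
    # Closed: no immediate superset has the same support
    return all(support.get(s, 0) < support[itemset]
               for s in immediate_supersets(itemset, items))

def is_maximal(itemset, support, items, minsupp):
    # Maximal frequent: frequent itself, with no frequent immediate superset
    return (support[itemset] >= minsupp and
            all(support.get(s, 0) < minsupp
                for s in immediate_supersets(itemset, items)))

Since the support of a superset can never exceed that of the itemset itself, the strict inequality in is_closed suffices.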

Alternatives to the apriori algorithm have been proposed that count itemsets in different ways, such as ‘dynamic itemset counting’ (Brin et al., 1997). However, recent improvements in the implementation of the apriori algorithm by Borgelt and Kruse (2002) and Borgelt (2003) have made it very efficient. Zaki et al. (1997) introduced the ECLAT algorithm, which was later implemented efficiently by Borgelt (2003). The ECLAT algorithm is a viable alternative to the apriori algorithm that is based on equivalence class clustering using some predefined support threshold. For searching the item lattice shown in Figure 1, there are two different approaches: breadth-first search and depth-first search. A typical example of a breadth-first search is the apriori algorithm; depth-first traversal of the itemset lattice was proposed by Burdick et al. (2001). Han et al. (2004) designed a frequent-pattern tree (FP-tree) structure as a compressed representation of the database, and proposed the FP-growth algorithm for mining the complete set of frequent patterns by pattern fragment growth.

Throughout the literature, there have been alternatives to support-based rule generation. The idea of discovering association rules based on conditional independencies was introduced by Castelo et al. (2001) in the form of the MAMBO algorithm. Pasquier et al. (1999) showed that it is not necessary to use support pruning in order to find all interesting association rules and that it is sufficient to mine the closed frequent itemsets; Pei et al. (2000) and Zaki and Hsiao (2005) proposed efficient algorithms for this purpose, called CLOSET and CHARM respectively. The problem of finding efficient parallel routines for association rule generation has gone through several iterations. Aggarwal and Yu (1998) present an excellent review of work carried out up to that time, encompassing the contributions of Mueller (1995), Park et al. (1995) and Agrawal and Shafer (1996), amongst others. More recent work on parallel techniques for association rule generation includes that of Zaïane et al. (2001) and Holt and Chung (2007). A detailed introduction to a variety of association rule mining algorithms is given by Ceglar and Roddick (2006).

PRUNING

Pruning and Visualisation
Hofmann and Wilhelm (2001) discuss methods of visualising association rules, amongst which are doubledecker plots and two-key plots. Doubledecker plots are mosaic plots (Hartigan and Kleiner, 1981) that can be used to visualise the support and confidence of different combinations of presence and absence of the consequent part and the items in the antecedent part of a rule. Doubledecker plots can be used to implicitly find interesting rules involving negations, but are of limited use for this purpose when the consequent is not a singleton. A two-key plot is a scatter plot of confidence versus support, where each point lies on a line through the origin and each such line represents rules with a common antecedent. Two-key plots can be used to perform visual pruning, since all rules with support of at least s and confidence of at least c will lie in a rectangle with corners (s, 1), (1, 1), (1, c) and (s, c).
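A two-key plot is straightforward to sketch with matplotlib (illustrative code and invented rule data; in a two-key plot proper, the points are coloured by the order of the rule):

import matplotlib.pyplot as plt

# Each rule: (support, confidence, order = number of items in the rule)
rules = [(0.40, 0.80, 2), (0.25, 0.62, 2), (0.20, 0.50, 3),
         (0.15, 0.75, 3), (0.10, 0.40, 4), (0.30, 0.90, 2)]
s_min, c_min = 0.15, 0.55  # illustrative thresholds

supports, confidences, orders = zip(*rules)
plt.scatter(supports, confidences, c=orders, cmap="viridis")
plt.colorbar(label="order of rule")
# Rules surviving the thresholds lie in the rectangle with corners
# (s_min, 1), (1, 1), (1, c_min) and (s_min, c_min)
plt.gca().add_patch(plt.Rectangle((s_min, c_min), 1 - s_min, 1 - c_min,
                                  fill=False, linestyle="dashed"))
plt.xlabel("support")
plt.ylabel("confidence")
plt.xlim(0, 1)
plt.ylim(0, 1)
plt.show()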



The Three-Step Approach
There have been some statistically based pruning methods proposed in the association rules literature. One such method was given by Bruzzese and Davino (2001), who presented three tests. Initially, association rules are mined with certain support and confidence thresholds; then the three statistical tests are employed, and an association rule is only retained if it ‘passes’ all three. These tests can be carried out using the software PISSARRO (Keller and Schlögl, 2002). By comparing the confidence of a rule to its expected confidence, the first test prunes rules that have high confidence merely due to the presence of frequent items in the consequent part of the rule. The second test involves performing a chi-squared test on each set of rules with the same antecedent part. The third test evaluates the strength of the association between the antecedent and the consequent of a rule by comparing the support of the rule with the support of its antecedent part. This pruning approach will, in many cases, be excessive; the third test in particular can cause the pruning of rules that one may wish to keep. Furthermore, the procedure can greatly slow down the post-mining stage of the process.
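As a rough sketch of the flavour of the first and third tests (a simplified reading of the prose above, not the PISSARRO implementation; the margin and ratio parameters are invented):

def first_test(confidence, expected_confidence, margin=0.05):
    # Retain only rules whose confidence clearly exceeds the expected
    # confidence P(B), so that a frequent consequent alone cannot carry a rule
    return confidence > expected_confidence + margin

def third_test(rule_support, antecedent_support, ratio=0.5):
    # Retain only rules whose support is a substantial fraction of the
    # support of their antecedent part
    return rule_support >= ratio * antecedent_support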

Pruning Methods Based on the Chi-Squared Test

Although the relationship that we have outlined between association rules and 2×2 tables might make pruning methods based on the chi-squared test seem attractive, they will suffer from the usual sample-size-related problems attached to chi-squared tests. An excellent critique of pruning methods based on the chi-squared test is given by Wilhelm (2002).
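For reference, the chi-squared test of independence on the Table 2 cross-tabulation can be carried out with scipy (illustrative counts, not from the chapter). The sketch below makes the sample-size problem visible: multiplying every cell by ten multiplies the statistic by ten, even though the strength of the association is unchanged.

from scipy.stats import chi2_contingency

table = [[50, 10],   # B = 1: cells a, b of Table 2
         [10, 30]]   # B = 0: cells c, d
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(chi2, p)                                   # about 34.0, very small p
chi2_big, p_big, _, _ = chi2_contingency([[500, 100], [100, 300]],
                                         correction=False)
print(chi2_big)                                  # about 340: ten times larger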

INTERESTINGNESS

Background

Apart from the staples of support, confidence and lift, a whole host of other functions have been introduced to measure the interestingness of an association rule. Interestingness is viewed by some as an alternative to pruning: rules can be ranked by interestingness and the pruning stage can then be omitted. However, interestingness can also be used before pruning or to break ties during pruning. Measures of rule interestingness can be considered as falling into one of two categories: subjective and objective (Silberschatz and Tuzhilin, 1996; Freitas, 1998). Subjective measures of interestingness focus on finding interesting patterns by matching against a given set of user beliefs, while objective measures assess interestingness in terms of the probabilities involved. In this section, a selection of objective measures of interestingness is given and briefly discussed. More extensive reviews of measures of interestingness are given elsewhere in this book.

Selected Measures of Interestingness
The lift of an association rule, which was defined in Table 1, is often used as a measure of interestingness. One notable feature of lift is that it is symmetric, in that the lift of the rule A ⇒ B is the same as the lift of the rule B ⇒ A. A rule with a lift value of 1 means that the antecedent and consequent parts of the rule are independent. However, lift is not symmetric about 1 and it is not clear how far lift needs to be from 1 before a rule can be considered interesting. In an attempt to solve this latter problem, McNicholas et al. (2008) introduced the standardised lift of an association rule. Their standardised lift gives a measure of the interestingness of an association rule by standardising the value of the lift to give a number in the interval [0, 1]. This standardisation takes into account the support of the antecedent, the support of the consequent, the support of the rule and any support or confidence thresholds that may have been used in the mining process.

The ‘conviction’ (Brin et al., 1997) of an association rule A ⇒ B, which is given by

Con(A ⇒ B) = P(A)P(¬B) / P(A, ¬B),

presents an alternative to some of the more traditional measures of interestingness such as confidence and lift. Conviction takes the value 1 when A and B are independent, and it is undefined when the rule A ⇒ B always holds. Gray and Orlowska (1998) define the interestingness of an association rule A ⇒ B as

Int(A ⇒ B; K, M) = [(P(A, B)/(P(A)P(B)))^K − 1] (P(A)P(B))^M.

This is the distance of the Kth power of the lift from 1, multiplied by the magnitude of P(A)P(B) raised to the Mth power. Since lift is symmetric, so is Gray and Orlowska's measure of interestingness.

Other Measures of Interestingness

A whole host of other measures of interestingness exist. More than twenty measures of interestingness are reviewed by Tan et al. (2002); these include the φ-coefficient, kappa, mutual information, the J-measure and the Gini index. The unexpected confidence interestingness and isolated interestingness of an association rule were introduced by Dong and Li (1998) by considering the unexpectedness of the rule in terms of other association rules in its neighbourhood. In addition, any-confidence, all-confidence and bond were given by Omiecinski (2003), and utility was used by Chan et al. (2003). More recently, Weiß (2008) introduced the notions of α-precision and σ-precision.
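To make these two measures concrete, the following sketch (illustrative code and invented probabilities, not from the chapter) computes conviction and Gray and Orlowska's interestingness directly from the rule probabilities:

def conviction(p_a, p_b, p_ab):
    # Con(A => B) = P(A)P(not B) / P(A, not B)
    p_a_not_b = p_a - p_ab
    if p_a_not_b == 0:
        return float("inf")   # the rule A => B always holds: undefined
    return p_a * (1 - p_b) / p_a_not_b

def gray_orlowska(p_a, p_b, p_ab, K=2, M=1):
    # Int(A => B; K, M) = ((P(A,B)/(P(A)P(B)))^K - 1) * (P(A)P(B))^M
    lift = p_ab / (p_a * p_b)
    return (lift ** K - 1) * (p_a * p_b) ** M

print(conviction(0.5, 0.4, 0.3))     # 0.5 * 0.6 / 0.2 = 1.5
print(gray_orlowska(0.5, 0.4, 0.3))  # (1.5**2 - 1) * 0.2 = 0.25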

POST-MINING

Pruning and interestingness, as discussed previously in this chapter, are employed in the post-mining stage of the association rule mining paradigm. However, there are a host of other techniques used in the post-mining stage that do not naturally fall under either of these headings; some of these are in the area of redundancy removal. There are often a huge number of association rules to contend with in the post-mining stage, and it can be very difficult for the user to identify which ones are interesting. Therefore, it is important to remove insignificant rules, prune redundancy and carry out further post-mining on the discovered rules (Liu and Hsu, 1996; Liu et al., 1999). Liu et al. (1999) proposed a technique for pruning and summarising discovered association rules by first removing insignificant associations and then forming direction-setting rules, which provide a sort of summary of the discovered association rules. Lent et al. (1997) proposed clustering the discovered association rules. A host of techniques for the post-mining of association rules will be introduced in the following chapters of this book.
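As a simple illustration of redundancy removal (one common convention, not the specific method of Liu et al.), a rule can be dropped when a more general rule, one with a smaller antecedent and the same consequent, achieves at least the same confidence:

def remove_redundant(rules):
    # rules: list of (antecedent, consequent, confidence) with frozenset parts
    kept = []
    for A, B, conf in rules:
        redundant = any(A2 < A and B2 == B and conf2 >= conf
                        for A2, B2, conf2 in rules)
        if not redundant:
            kept.append((A, B, conf))
    return kept

rules = [(frozenset({"bread"}), frozenset({"butter"}), 0.80),
         (frozenset({"bread", "milk"}), frozenset({"butter"}), 0.75)]
print(remove_redundant(rules))
# Only {bread} => {butter} survives; the longer rule adds nothing (invented data)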

SUMMARY
Whilst there has been much work on improving the algorithmic efficiency of association rule generation, relatively little has been done to investigate the in-depth implications of association rule analyses. This chapter has presented some background and an overview of developments in the area; the remainder of this book focuses on the post-mining stage of association rule mining.

REFERENCES
Aggarwal, C. C., & Yu, P. S. (1998). Mining large itemsets for association rules. Bulletin of the Technical Committee, IEEE Computer Society, 21(1), 23-31.

Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining association rules between sets of items in very large databases. In Proceedings of the ACM SIGMOD Conference on Management of Data, Washington, USA (pp. 207-216).

Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases, Santiago, Chile (pp. 478-499).

Agrawal, R., & Shafer, J. (1996). Parallel mining of association rules: Design, implementation, and experience. Technical Report RJ10004, IBM Almaden Research Center, San Jose, California, USA.

Boole, G. (1847). The Mathematical Analysis of Logic. Cambridge: Macmillan, Barclay and Macmillan.

Boole, G. (1854). An Investigation of the Laws of Thought. Cambridge: Macmillan and Co.

Borgelt, C. (2003). Efficient implementations of apriori and eclat. In Workshop on Frequent Itemset Mining Implementations, Melbourne, FL.

Borgelt, C., & Kruse, R. (2002). Induction of association rules: Apriori implementation. In 15th COMPSTAT Conference, Berlin, Germany.

Brin, S., Motwani, R., Ullman, J. D., & Tsur, S. (1997). Dynamic itemset counting and implication rules for market basket data. In SIGMOD '97: Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, Tucson, Arizona, USA (pp. 255-264).

Bruzzese, D., & Davino, C. (2001). Statistical pruning of discovered association rules. Computational Statistics, 16(3), 387-398.

Burdick, D., Calimlim, M., & Gehrke, J. (2001). MAFIA: A maximal frequent itemset algorithm for transactional databases. In Proceedings of the 17th International Conference on Data Engineering, IEEE Computer Society (pp. 443-452).

Carmona-Saez, P., Chagoyen, M., Rodriguez, A., Trelles, O., Carazo, J. M., & Pascual-Montano, A. (2006). Integrated analysis of gene expression by association rules discovery. BMC Bioinformatics, 7(54).

Castelo, R., Feelders, A., & Siebes, A. (2001). MAMBO: Discovering association rules based on conditional independencies. In Proceedings of the 4th International Symposium on Intelligent Data Analysis, Cascais, Portugal.

Ceglar, A., & Roddick, J. F. (2006). Association mining. ACM Computing Surveys, 38(2), 5.

Chan, R., Yang, Q., & Shen, Y.-D. (2003). Mining high utility itemsets. In ICDM 2003: Third IEEE International Conference on Data Mining, Melbourne, Florida, USA (pp. 19-26).

De Morgan, A. (1847). Formal Logic: or, The Calculus of Inference, Necessary and Probable. London: Taylor and Walton.

Dong, G., & Li, J. (1998). Interestingness of discovered association rules in terms of neighborhood-based unexpectedness. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, Melbourne, Australia (pp. 72-86).

Fienberg, S. E., & Shmueli, G. (2005). Statistical issues and challenges associated with rapid detection of bio-terrorist attacks. Statistics in Medicine, 24(4), 513-529.

Freitas, A. A. (1998). On objective measures of rule surprisingness. In PKDD '98: Proceedings of the Second European Symposium on Principles of Data Mining and Knowledge Discovery, Nantes, France: Springer-Verlag (pp. 1-9).


Gouda, K., & Zaki, M. J. (2001). Efficiently mining maximal frequent itemsets. In ICDM '01: Proceedings of the 2001 IEEE International Conference on Data Mining, San Jose, California, USA (pp. 163-170).

Gray, B., & Orlowska, M. (1998). CCAIIA: Clustering categorical attributes into interesting association rules. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, Melbourne, Australia (pp. 132-143).

Han, J., Pei, J., Yin, Y., & Mao, R. (2004). Mining frequent patterns without candidate generation. Data Mining and Knowledge Discovery, 8, 53-87.

Hartigan, J. A., & Kleiner, B. (1981). Mosaics for contingency tables. In W. F. Eddy (Ed.), 13th Symposium on the Interface Between Computer Science and Statistics. New York: Springer-Verlag.

Hofmann, H., & Wilhelm, A. (2001). Visual comparison of association rules. Computational Statistics, 16(3), 399-415.

Holt, J. D., & Chung, S. M. (2007). Parallel mining of association rules from text databases. The Journal of Supercomputing, 39(3), 273-299.

Jevons, W. S. (1890). Pure Logic and Other Minor Works. London: Macmillan.

Keller, R., & Schlögl, A. (2002). PISSARRO: Picturing interactively statistically sensible association rules reporting overviews. Department of Computer Oriented Statistics and Data Analysis, Augsburg University, Germany.

Lent, B., Swami, A. N., & Widom, J. (1997). Clustering association rules. In Proceedings of the Thirteenth International Conference on Data Engineering, Birmingham, UK: IEEE Computer Society (pp. 220-231).

Liu, B., & Hsu, W. (1996). Post-analysis of learned rules. In AAAI/IAAI, Vol. 1 (pp. 828-834).

Liu, B., Hsu, W., & Ma, Y. (1999). Pruning and summarizing the discovered associations. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), ACM Press (pp. 125-134).

McNicholas, P. D. (2007). Association rule analysis of CAO data (with discussion). Journal of the Statistical and Social Inquiry Society of Ireland, 36, 44-83.

McNicholas, P. D., Murphy, T. B., & O'Regan, M. (2008). Standardising the lift of an association rule. Computational Statistics and Data Analysis, 52(10), 4712-4721.

Mueller, A. (1995). Fast sequential and parallel methods for association rule mining: A comparison. Technical Report CS-TR-3515, Department of Computer Science, University of Maryland, College Park, MD.

Omiecinski, E. R. (2003). Alternative interest measures for mining associations in databases. IEEE Transactions on Knowledge and Data Engineering, 15, 57-69.

Park, J. S., Chen, M. S., & Yu, P. S. (1995). Efficient parallel data mining of association rules. In Fourth International Conference on Information and Knowledge Management, Baltimore, Maryland (pp. 31-36).

Pasquier, N., Bastide, Y., Taouil, R., & Lakhal, L. (1999). Discovering frequent closed itemsets for association rules. In Proceedings of the 7th International Conference on Database Theory, Lecture Notes in Computer Science 1540, Springer (pp. 398-416).

Pei, J., Han, J., & Mao, R. (2000). CLOSET: An efficient algorithm for mining frequent closed itemsets. In Proceedings of the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.


Savasere, A., Omiecinski, E., & Navathe, S. B. (1998). Mining for strong negative associations in a large database of customer transactions. In Proceedings of the 14th International Conference on Data Engineering, Washington, DC, USA (pp. 494-502).

Silberschatz, A., & Tuzhilin, A. (1996). What makes patterns interesting in knowledge discovery systems. IEEE Transactions on Knowledge and Data Engineering, 8, 970-974.

Silverstein, C., Brin, S., & Motwani, R. (1998). Beyond market baskets: Generalizing association rules to dependence rules. Data Mining and Knowledge Discovery, 2(1), 39-68.

Tan, P., Kumar, V., & Srivastava, J. (2002). Selecting the right interestingness measure for association patterns. In KDD '02: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press (pp. 32-41).

Weiß, C. H. (2008). Statistical mining of interesting association rules. Statistics and Computing, 18(2), 185-194.

Wilhelm, A. (2002). Statistical tests for pruning association rules. In Computational Statistics: 15th Symposium, Berlin, Germany.

Xiao, Y., Yao, J., Li, Z., & Dunham, M. H. (2003). Efficient data mining for maximal frequent subtrees. In ICDM '03: Proceedings of the Third IEEE International Conference on Data Mining, IEEE Computer Society (p. 379).

Yule, G. U. (1903). Notes on the theory of association of attributes in statistics. Biometrika, 2(2), 121-134.

Zaki, M. J., Parthasarathy, S., Ogihara, M., & Li, W. (1997). New algorithms for fast discovery of association rules. Technical Report 651, Computer Science Department, University of Rochester, Rochester, NY.

Zaki, M. J., & Hsiao, C. (2005). Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Transactions on Knowledge and Data Engineering, 17, 462-478.

Zaïane, O. R., El-Hajj, M., & Lu, P. (2001). Fast parallel association rule mining without candidacy generation. In First IEEE International Conference on Data Mining (pp. 665-668).
