
Mining Association Rules with Constraints
Wei Ning Joon Wong
COSC 6412 Presentation

Outline
- Introduction
- Summary of Approach
- Algorithm CAP
- Performance Analysis
- Conclusion
- References

Introduction
- Recall mining association rules: association rule mining finds interesting association or correlation relationships among a large set of data items.

Some problems we met while mining association rules
- Overwhelming?
- Not what you want?
- Waiting too long?
- Lack of focus

Introduction (cont.)
- Example at Walmart: suppose a manager wants to find which shoes are the most popular in winter.

Mining frequent itemsets vs. mining association rules
- Mining frequent itemsets is the core of mining association rules: once the frequent itemsets are found, the rules can be derived from them directly.

Constrained Mining
- A naive solution: first find all frequent sets, and then test them for constraint satisfaction.
- Our approach: analyze the properties of constraints comprehensively, and push them as deeply as possible inside the frequent pattern computation.

Frequent Itemsets & Constraints
Given a transaction database TDB (min_sup = 2):

TID  Transaction
10   a, b, c
20   b, c, d, f
30   a, c

- Frequent itemset: a subset of items that frequently appears in transactions, e.g. {a, c}
- Constraint: a predicate over itemsets

Item  Value
a     40
b     10
c     -20
d     10
e     -30

- C(I): sum(I) > 50
- C(abd) = true

Mining Frequent Itemsets With Constraints
- Given: a transaction database TDB, a support threshold min_sup, and a constraint C
- Find the complete set of frequent itemsets satisfying the constraint
- Use constraints to express the user's focus and to improve both effectiveness and efficiency

Classification of Constraints
We have the following classification of constraints:
- Anti-monotone
- Monotone
- Succinct
- Convertible
  - Convertible anti-monotone
  - Convertible monotone
  - Strongly convertible
- Inconvertible

Anti-Monotone
- Definition 1 (Anti-Monotone): A 1-var constraint C is anti-monotone if, for all sets S, S′: S ⊇ S′ and S satisfies C ⇒ S′ satisfies C.
- Simply put, when an itemset S violates the constraint, so does any of its supersets.

Is min(S) ≥ v anti-monotone?
- S = {5, 10, 14}, v = 7, constraint min(S) ≥ 7
- {5} violates it.
- Supersets of {5}: {5, 10}, {5, 14}, {5, 10, 14}
- So do {5, 10}, {5, 14}, and {5, 10, 14}.
- So min(S) ≥ v is anti-monotone.

Succinct
Definition 2 (Succinct):
- I ⊆ Item is a succinct set if it can be expressed as σ_p(Item) for some selection predicate p.
- SP ⊆ 2^Item is a succinct powerset if there is a fixed number of succinct sets Item1, …, Itemk ⊆ Item such that SP can be expressed in terms of the strict powersets of Item1, …, Itemk, using union and minus.
- Finally, a 1-var constraint C is succinct provided SAT_C(Item) is a succinct powerset.

Succinct
- General idea: we can enumerate all and only those sets that are guaranteed to satisfy the constraint.
- If a constraint is succinct, we can directly generate precisely the sets that satisfy it.
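As a minimal sketch of the two properties above, the following Python checks the min(S) ≥ 7 example by brute force and then generates the satisfying sets directly from a selection predicate. The helper names (nonempty_subsets, is_anti_monotone_on, succinct_generate) are hypothetical and chosen only for illustration; the brute-force check is only practical for tiny item sets such as {5, 10, 14}.

```python
from itertools import combinations


def nonempty_subsets(items):
    """Yield every non-empty subset of `items` as a frozenset."""
    items = sorted(items)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def min_at_least(v):
    """Build the constraint min(S) >= v over sets of numbers."""
    return lambda s: min(s) >= v


def is_anti_monotone_on(universe, constraint):
    """Brute-force check over one small universe (illustrative only):
    every non-empty subset of a satisfying set must also satisfy."""
    for s in nonempty_subsets(universe):
        if constraint(s):
            for sub in nonempty_subsets(s):
                if not constraint(sub):
                    return False
    return True


def succinct_generate(universe, predicate):
    """Select the items that pass `predicate`, then enumerate the
    non-empty subsets (strict powerset) of the selected items."""
    selected = {x for x in universe if predicate(x)}
    return set(nonempty_subsets(selected))


universe = {5, 10, 14}
c = min_at_least(7)

# {5} violates min(S) >= 7, and so does every superset of {5}.
violators = [s for s in nonempty_subsets(universe) if 5 in s]
all_supersets_violate = all(not c(s) for s in violators)

# Anti-monotone on this universe; min(S) <= 7 is not.
am = is_anti_monotone_on(universe, c)
am_other = is_anti_monotone_on(universe, lambda s: min(s) <= 7)

# Succinctness: generate exactly the satisfying sets without testing each one.
generated = succinct_generate(universe, lambda x: x >= 7)
satisfying = {s for s in nonempty_subsets(universe) if c(s)}

print(am, am_other, generated == satisfying)
```

The generate-only route is the point of succinctness: the satisfying sets are produced from the selected items {10, 14} without ever testing a set that contains 5.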
Succinct example
- Itemsets containing a or b
- Itemsets containing some item with value more than 30

Succinct example
- C1: Item.Price θ 100, for a comparison θ
- Item1 = σ_{Item.Price θ 100}(Item) = {a, b}
- 2^Item1 = {{a}, {b}, {a, b}} (strict powerset)
- SAT_C1 = {{a}, {b}, {a, b}} = 2^Item1
- So C1 is succinct.

Convertible
- Convert tough constraints into anti-monotone or monotone constraints by properly ordering the items.

Convertible
- Definition: R is an order of items.
- Convertible anti-monotone: whenever an itemset X satisfies the constraint, so does every prefix of X w.r.t. R.

Convertible example
Constraint C: avg(X) ≥ 25

Item  Value      Item  Value (sorted)
a     40         a     40
b     0          f     30
c     -20        g     20
d     10         d     10
e     -30        b     0
f     30         h     -10
g     20         c     -20
h     -10        e     -30

- Order items in value-descending order: <a, f, g, d, b, h, c, e>
- Itemset afd satisfies C.
- So do its prefixes a and af.
- Thus, C becomes anti-monotone.

Constraints: A General Picture

Constraint                     Anti-monotone  Monotone     Succinct
v ∈ S                          no             yes          yes
S ⊇ V                          no             yes          yes
S ⊆ V                          yes            no           yes
min(S) ≤ v                     no             yes          yes
min(S) ≥ v                     yes            no           yes
max(S) ≤ v                     yes            no           yes
max(S) ≥ v                     no             yes          yes
count(S) ≤ v                   yes            no           weakly
count(S) ≥ v                   no             yes          weakly
sum(S) ≤ v (∀a ∈ S, a ≥ 0)     yes            no           no
sum(S) ≥ v (∀a ∈ S, a ≥ 0)     no             yes          no
range(S) ≤ v                   yes            no           no
range(S) ≥ v                   no             yes          no
avg(S) θ v, θ ∈ {=, ≤, ≥}      convertible    convertible  no

Optional: Proof that min(S) ≥ v is anti-monotone
- According to the table, min(S) ≥ v is both anti-monotone and succinct.
- I only prove anti-monotonicity here due to time limitations.
- Something special…

Constraint Classification
(Diagram of constraint classes: Monotone, Antimonotone, Succinct, Strongly convertible, Convertible anti-monotone, Convertible monotone, Inconvertible)

Summary of Approach
Recapitulation:
- Basic idea of mining frequent itemsets with constraints.
- Introduction of several important classes of constraints.

Algorithms
There are many algorithms for constraint-based association rule mining:
- Algorithm Direct
- Algorithm MultiJoins & Reorder
- Algorithm Apriori†
- Algorithm Hybrid(m)
- Algorithm CAP (main focus)

Design of Algorithm
- Sound: an algorithm is sound provided it only finds frequent sets that satisfy the given constraints.
- Complete: an algorithm is complete provided all frequent sets satisfying the given constraints are found.

Algorithm Apriori†
Main idea: use the Apriori algorithm to get the frequent itemsets, then apply the constraints to the itemsets found.
- Step 1) Apriori with Cfreq
- Step 2) Apply C – Cfreq to get the final Ans

Algorithm Apriori† (Pseudocode)
1. C1 consists of sets of size 1; k = 1; Ans = ∅;
2. While (Ck not empty) {
   2.1 conduct db scan to form Lk from Ck;
   2.2 form Ck+1 from Lk based on Cfreq; k++; }
3. For each set S in some Lk: add S to Ans if S satisfies (C – Cfreq).

The Apriori† Algorithm: An Example

Database TDB
Tid  Items
10   A, C, D
20   B, C, E
30   A, B, C, E
40   B, E

C1 (1st scan)          L1
Itemset  sup           Itemset  sup
{A}      2             {A}      2
{B}      3             {B}      3
{C}      3             {C}      3
{D}      1             {E}      3
{E}      3

C2 (2nd scan)          L2
Itemset  sup           Itemset  sup
{A, B}   1             {A, C}   2
{A, C}   2             {B, C}   2
{A, E}   1             {B, E}   3
{B, C}   2             {C, E}   2
{B, E}   3
{C, E}   2

C3                     L3 (3rd scan)
Itemset                Itemset    sup
{B, C, E}              {B, C, E}  2

The Apriori† Algorithm: An Example (cont.)
Constraint: {A, C, E} ⊇ T.Item

L1: {A}: 2, {B}: 3, {C}: 3, {E}: 3
L2: {A, C}: 2, {B, C}: 2, {B, E}: 3, {C, E}: 2
L3: {B, C, E}: 2
Ans: {A}, {C}, {E}, {A, C}, {C, E}

Algorithm CAP
- Succinct and anti-monotone. Strategy I: replace C1 in the Apriori algorithm by C1^C (the size-1 candidates that satisfy C).
- Anti-monotone but non-succinct. Strategy II: define Ck as in the Apriori algorithm. Drop a set S ∈ Ck from counting if S fails C, i.e., constraint satisfaction is tested before counting is done.

Algorithm CAP (cont.)
- Succinct but non-anti-monotone. Strategy III: too complicated.
  To be discussed later…
- Non-succinct and non-anti-monotone. Strategy IV: induce any weaker constraint C1 from C. Depending on whether C1 is anti-monotone and/or succinct, use one of the Strategies I-III above for the generation of frequent sets.

Algorithm CAP (Pseudocode)
Here Csam, Cam, Csuc, and Cnone denote the constraints that are succinct and anti-monotone, anti-monotone only, succinct only, and neither, respectively.
1. If Csam ∪ Csuc ∪ Cnone is non-empty, prepare C1 as indicated in Strategies I, III, and IV; k = 1;
2. If Csuc is non-empty {
   2.1 conduct db scan to form L1 as indicated in Strategy III;
   2.2 form C2 as indicated in Strategy III; k = 2; }
3. While (Ck not empty) {
   3.1 conduct db scan to form Lk from Ck;
   3.2 form Ck+1 from Lk based on Strategy III if Csuc is non-empty, and Strategy II for constraints in Cam; k++; }
4. If Cnone is empty, Ans = ∪ Lk. Otherwise, for each set S in some Lk, add S to Ans iff S satisfies Cnone.

The Algorithm CAP: An Example
Constraints: {A, C, E} ⊇ T.Item and min support count = 2
Question: which strategy should we apply?

Database TDB
Tid  Items
10   A, C, D
20   B, C, E
30   A, B, C, E
40   B, E

The Algorithm CAP: An Example (cont.)
Apply Strategy I!

C1 (1st scan)          L1
Itemset  sup           Itemset  sup
{A}      2             {A}      2
{C}      3             {C}      3
{E}      3             {E}      3

C2 (2nd scan)          L2
Itemset  sup           Itemset  sup
{A, C}   2             {A, C}   2
{A, E}   1             {C, E}   2
{C, E}   2

C3: {} (empty, because {A, E} was pruned earlier)
Ans: {A}, {C}, {E}, {A, C}, {C, E}

Case 3: Succinct but not anti-monotone (revisited)
- Items: {1} {2} {3} {4} {5} {6} {7} {8} {9} {10}
- Constraint: min(S) < 5, satisfied by {1} {2} {3} {4}
- Apriori on these: {1} {2} {3} {4}, then {1,2} {2,3} … {3,4}, …, up to {1,2,3,4}
- Some possible frequent sets may be lost, e.g. {1,8} and {1,2,10}.
(Information extracted from a past presentation.)

Case 3: Succinct but not anti-monotone (continued)
- Algorithm Direct. Idea: play it safe. Generate C^c_{k+1} from L^c_k × F, where F is the set of all frequent items.
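The Direct idea above (extend each set in L^c_k with one item from F) can be sketched in Python and compared with a plain join on the constrained sets alone. This is only an illustration of the candidate-generation step under stated assumptions: all ten items are assumed frequent, the join is a simplified pairwise union instead of the prefix-based Apriori join, and the function names are hypothetical. Support counting is omitted.

```python
from itertools import combinations


def apriori_join(level):
    """Simplified Apriori-style join: unions of two k-sets that
    form a (k+1)-set (no prefix ordering, no subset pruning)."""
    level = list(level)
    if not level:
        return set()
    k = len(level[0])
    return {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}


def direct_candidates(lc_k, frequent_items):
    """Direct-style generation: extend each constrained frequent k-set
    by one frequent item that it does not already contain."""
    return {s | {f} for s in lc_k for f in frequent_items if f not in s}


# Assumption for illustration: items 1..10 are all frequent.
F = set(range(1, 11))


def constraint(s):
    """The slide's constraint: min(S) < 5."""
    return min(s) < 5


# L^c_1: frequent 1-sets that satisfy the constraint, i.e. {1}, {2}, {3}, {4}.
lc_1 = {frozenset({i}) for i in F if constraint({i})}

joined = apriori_join(lc_1)          # pairs drawn from {1, 2, 3, 4} only
direct = direct_candidates(lc_1, F)  # also pairs such as {1, 8}

print(frozenset({1, 8}) in joined, frozenset({1, 8}) in direct)
```

Joining only the constrained 1-sets never produces {1, 8}, because {8} fails min(S) < 5 and is absent from L^c_1; extending by every item in F keeps such candidates, at the cost of a larger candidate set.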
Also listed for this case: Algorithm MultiJoins and Algorithm Reorder.

Performance Analysis (Specification)
- Programs written in C
- Transactional databases generated using a program from IBM Almaden Research Center
- 100,000 records, domain of 1,000 items
- Page size 4 KB
- SPARC-10 environment

Performance Analysis (Terminology)
- Speedup: the ratio of the execution times of two algorithms.
- Item selectivity: x% selectivity means that x% of the items satisfy the constraint.
- Support threshold: a low support threshold means more frequent sets to process.

Performance Analysis
- Note: support threshold set at 0.5%.
- For 10% selectivity, CAP runs 80 times faster than Apriori†.
- For 30% selectivity, the speedup is about 10 times.

Performance Analysis
- Note: item selectivity fixed at 30%.
- As the support threshold goes up, the number of frequent itemsets goes down and Apriori† improves.
- CAP is still at least 8 times faster.

Performance Analysis

Support  L1       L2      L3       L4      L5     L6     L7     L8
0.2%     174/582  79/969  29/1140  8/1250  1/934  0/451  0/132  0/20
0.6%     98/313   1/12    0/1      0       0      0      0      0

- Each entry is of the form a/b: a is the number of frequent sets satisfying the constraint, and b is the total number of frequent sets.
- For L4 with a support of 0.2%, Apriori† finds 1250 frequent sets, of which only 8 satisfy the constraint and are found by CAP.

Conclusion
- The ideas of anti-monotonicity, succinctness, and convertibility are introduced in the paper.
- Sound, complete, and efficient algorithms are introduced for constraint-based association rule mining.

References
- R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. KDD'97.
- R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained associations rules. SIGMOD'98.
- J. Pei and J. Han. Can we push more constraints into frequent pattern mining? KDD'00.
