Outline: Association Rules
- Transactions and frequent itemsets
- Frequent itemset subset property
- Association rule analysis

Transactions Example (Transaction Database: Example 1)
Instances = transactions. Items: A = milk, B = bread, C = cereal, D = sugar, E = eggs.

TID  Products                     Coded items
1    MILK, BREAD, EGGS            A, B, E
2    BREAD, SUGAR                 B, D
3    BREAD, CEREAL                B, C
4    MILK, BREAD, SUGAR           A, B, D
5    MILK, CEREAL                 A, C
6    BREAD, CEREAL                B, C
7    MILK, CEREAL                 A, C
8    MILK, BREAD, CEREAL, EGGS    A, B, C, E
9    MILK, BREAD, CEREAL          A, B, C

Transaction Database: Example 2 (attributes converted to binary flags)

TID  Products    A  B  C  D  E
1    A, B, E     1  1  0  0  1
2    B, D        0  1  0  1  0
3    B, C        0  1  1  0  0
4    A, B, D     1  1  0  1  0
5    A, C        1  0  1  0  0
6    B, C        0  1  1  0  0
7    A, C        1  0  1  0  0
8    A, B, C, E  1  1  1  0  1
9    A, B, C     1  1  1  0  0

Definitions
- Item: an attribute=value pair, or simply a value. Attributes are usually converted to binary flags, one per value; e.g. product="A" is written as "A".
- Itemset I: a subset of the possible items. Example: I = {A,B,E} (order is unimportant).
- Transaction: a pair (TID, itemset), where TID is the transaction ID.

Support and Frequent Itemsets
- Support (coverage) of an itemset: sup(I) = the number of transactions t that support (i.e. contain) I.
- In the example database: sup({A,B,E}) = 2 (TIDs 1 and 8) and sup({B,C}) = 4 (TIDs 3, 6, 8 and 9).
- A frequent itemset I is one with at least the minimum support count: sup(I) >= minsup.

Subset Property
Every subset of a frequent set is frequent!
Q: Why is it so?
A: Example: suppose {A,B} is frequent. Every transaction that contains {A,B} contains both A and B, so sup({A}) >= sup({A,B}) >= minsup, and likewise for B. Hence both A and B must also be frequent. A similar argument holds for larger itemsets.
Almost all association rule algorithms are based on this subset property.

Association Rules
- Association rule R: Itemset1 => Itemset2
- Itemset1 and Itemset2 are disjoint, and Itemset2 is non-empty.
- Meaning: if a transaction includes Itemset1, then it also has Itemset2.
- Examples: A,B => E,C and A => B,C

From Frequent Itemsets to Association Rules
Q: Given the frequent set {A,B,E}, what are the possible association rules?
A: A => B, E
   A, B => E
   A, E => B
   B => A, E
   B, E => A
   E => A, B
   __ => A, B, E (empty rule), or true => A, B, E

Classification vs Association Rules
Classification rules                 Association rules
Focus on one target field            Many target fields
Specify class in all cases           Applicable in some cases
Measures: accuracy                   Measures: support (coverage), confidence (accuracy)

Rule Support and Confidence
Suppose R: I => J is an association rule.
- sup(R) = sup(I ∪ J) is the support count (or coverage) of R, i.e. the support of the itemset I ∪ J (I and J together).
- conf(R) = sup(R) / sup(I) is the confidence (or accuracy) of R, i.e. the fraction of transactions with I that also have J.
- Association rules that reach both the minimum support and the minimum confidence are sometimes called "strong" rules.

Association Rules Example
Q: Given the frequent set {A,B,E}, which association rules have minsup = 2 and minconf = 50%? (Transactions as in Example 1.)
Every candidate rule has sup(R) = sup({A,B,E}) = 2 >= minsup, so only the confidence decides.

Qualify:
A, B => E     : conf = 2/4 = 50%
A, E => B     : conf = 2/2 = 100%
B, E => A     : conf = 2/2 = 100%
E => A, B     : conf = 2/2 = 100%

Don't qualify:
A => B, E     : conf = 2/6 = 33% < 50%
B => A, E     : conf = 2/7 = 29% < 50%
__ => A, B, E : conf = 2/9 = 22% < 50%

Find Strong Association Rules
A rule has the parameters minsup and minconf: it must satisfy sup(R) >= minsup and conf(R) >= minconf.
Problem: find all association rules with the given minsup and minconf.
First, find all frequent itemsets.

Finding Frequent Itemsets
Start by finding one-item sets (easy).
Q: How?
A: Simply count the frequencies of all items.

Finding Itemsets: Next Level
Apriori algorithm (Agrawal & Srikant)
Idea: use one-item sets to generate two-item sets, two-item sets to generate three-item sets, ...
- If (A B) is a frequent itemset, then (A) and (B) have to be frequent itemsets as well!
- In general: if X is a frequent k-itemset, then all (k-1)-item subsets of X are also frequent.
⇒ Compute k-itemsets by merging (k-1)-itemsets.

Another Example
Given: five frequent three-item sets
(A B C), (A B D), (A C D), (A C E), (B C D)
Lexicographic order improves efficiency.
Candidate four-item sets:
(A B C D)  Q: OK?  A: Yes, because all of its 3-item subsets are frequent.
(A C D E)  Q: OK?  A: No, because (C D E) is not frequent.

Generating Association Rules from an Itemset
Two-stage process:
1. Determine the frequent itemsets, e.g. with the Apriori algorithm.
2. For each frequent itemset I, for each non-empty subset J of I, determine the association rule of the form I - J => J.
Main idea used in both stages: the subset property.

Example: Generating Rules
Frequent itemset from the golf data (support 4):
Humidity = Normal, Windy = False, Play = Yes
Seven potential rules (confidence):
If Humidity = Normal and Windy = False then Play = Yes            4/4
If Humidity = Normal and Play = Yes then Windy = False            4/6
If Windy = False and Play = Yes then Humidity = Normal            4/6
If Humidity = Normal then Windy = False and Play = Yes            4/7
If Windy = False then Humidity = Normal and Play = Yes            4/8
If Play = Yes then Humidity = Normal and Windy = False            4/9
If True then Humidity = Normal and Windy = False and Play = Yes   4/14

Rules for the Weather Data
Rules with support > 1 and confidence = 100%:

#   Association rule                                 Sup.  Conf.
1   Humidity=Normal Windy=False ⇒ Play=Yes           4     100%
2   Temperature=Cool ⇒ Humidity=Normal               4     100%
3   Outlook=Overcast ⇒ Play=Yes                      4     100%
4   Temperature=Cool Play=Yes ⇒ Humidity=Normal      3     100%
... ...                                              ...   ...
58  Outlook=Sunny Temperature=Hot ⇒ Humidity=High    2     100%

In total: 3 rules with support four, 5 with support three, and 50 with support two.

Weka Associations
File: weather.nominal.arff
MinSupport: 0.2

Weka Associations: Output
[Figure: Weka association-rule output]

Summary
- Frequent itemsets
- Association rules
- Subset property
- Apriori algorithm
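The binary-flag encoding in Transaction Database Example 2 can be produced mechanically from the coded transactions. The following is a minimal Python sketch; the names `TRANSACTIONS`, `ITEMS` and `to_binary_flags` are illustrative assumptions, not part of the original slides.

```python
# Hypothetical encoding of the example transaction database
# (A = milk, B = bread, C = cereal, D = sugar, E = eggs), keyed by TID.
TRANSACTIONS = {
    1: {"A", "B", "E"},
    2: {"B", "D"},
    3: {"B", "C"},
    4: {"A", "B", "D"},
    5: {"A", "C"},
    6: {"B", "C"},
    7: {"A", "C"},
    8: {"A", "B", "C", "E"},
    9: {"A", "B", "C"},
}
ITEMS = ["A", "B", "C", "D", "E"]


def to_binary_flags(transactions, items):
    """Map each TID to one 0/1 flag per item (1 = item present)."""
    return {
        tid: [1 if item in itemset else 0 for item in items]
        for tid, itemset in transactions.items()
    }


flags = to_binary_flags(TRANSACTIONS, ITEMS)
print("TID", *ITEMS)
for tid in sorted(flags):
    print(tid, *flags[tid])
```

Each printed row should match the corresponding row of the binary-flag table; the fixed `ITEMS` list determines the column order.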
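The support definition and the subset property can be checked directly on the example database. This is a sketch under the assumption that items are single letters, so a string such as "ABE" can stand for the itemset {A,B,E}; the helper names `support` and `is_frequent` are hypothetical.

```python
from itertools import combinations

# Hypothetical encoding of the example transactions (TIDs 1-9 in order);
# frozenset("ABE") is the itemset {A, B, E} because items are single letters.
TRANSACTIONS = [frozenset(t) for t in
                ["ABE", "BD", "BC", "ABD", "AC", "BC", "AC", "ABCE", "ABC"]]


def support(itemset, transactions):
    """sup(I): number of transactions that contain every item in I."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def is_frequent(itemset, transactions, minsup):
    """An itemset is frequent if sup(I) >= minsup."""
    return support(itemset, transactions) >= minsup


print(support("ABE", TRANSACTIONS), support("BC", TRANSACTIONS))  # prints 2 4

# Subset property: a subset can never have lower support than its superset,
# so every proper subset of the frequent itemset {A,B,E} is frequent too.
minsup = 2
for size in range(1, 3):
    for subset in combinations("ABE", size):
        assert support(subset, TRANSACTIONS) >= support("ABE", TRANSACTIONS)
        assert is_frequent(subset, TRANSACTIONS, minsup)
```

The empty itemset is contained in every transaction, so `support("", TRANSACTIONS)` equals the number of transactions; this is the denominator used for the "true => ..." rule.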
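The rule-generation step (for each non-empty subset J of I, form I - J => J) and the minsup/minconf filter can be sketched as follows, reproducing the {A,B,E} example with minsup = 2 and minconf = 50%. The function names and the tuple layout (antecedent, consequent, support, confidence) are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical encoding of the example transactions (single-letter items).
TRANSACTIONS = [frozenset(t) for t in
                ["ABE", "BD", "BC", "ABD", "AC", "BC", "AC", "ABCE", "ABC"]]


def support(itemset, transactions):
    """sup(I): number of transactions containing every item of I."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def rules_from_itemset(itemset, transactions):
    """Yield (I - J, J, sup, conf) for every non-empty subset J of I.

    The antecedent I - J may be empty ("true"); its support is then the
    total number of transactions. Assumes sup(I) > 0, as for any
    frequent itemset, so no antecedent has zero support.
    """
    items = frozenset(itemset)
    sup_rule = support(items, transactions)
    for size in range(1, len(items) + 1):
        for consequent in combinations(sorted(items), size):
            j = frozenset(consequent)
            antecedent = items - j
            conf = sup_rule / support(antecedent, transactions)
            yield antecedent, j, sup_rule, conf


def strong_rules(itemset, transactions, minsup, minconf):
    """Keep rules with sup(R) >= minsup and conf(R) >= minconf."""
    return [r for r in rules_from_itemset(itemset, transactions)
            if r[2] >= minsup and r[3] >= minconf]


def fmt(items):
    """Render an itemset as 'A, B'; the empty antecedent is shown as '__'."""
    return ", ".join(sorted(items)) or "__"


for ante, cons, sup, conf in rules_from_itemset("ABE", TRANSACTIONS):
    ok = sup >= 2 and conf >= 0.5
    print(f"{fmt(ante)} => {fmt(cons)}: conf = {conf:.0%}",
          "qualifies" if ok else "does not qualify")
```

Because every rule from one itemset shares the same support, the support test only matters when the itemset itself is below minsup; confidence does the real filtering here.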
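The level-wise idea behind Apriori (count one-item sets, then build k-itemsets by merging (k-1)-itemsets and pruning with the subset property) can be sketched as below. This is an illustrative implementation under stated assumptions, not a reference version of the Agrawal & Srikant algorithm: itemsets are kept as lexicographically sorted tuples, and `apriori_gen` and `apriori` are hypothetical names. It reproduces the "Another Example" slide and the frequent itemsets of the example database with minsup = 2.

```python
from itertools import combinations

# Hypothetical encoding of the example transactions (single-letter items).
TRANSACTIONS = [frozenset(t) for t in
                ["ABE", "BD", "BC", "ABD", "AC", "BC", "AC", "ABCE", "ABC"]]


def apriori_gen(prev_frequent):
    """Candidate k-itemsets from frequent (k-1)-itemsets (sorted tuples).

    Join: merge two itemsets that agree on all but their last item.
    Prune: drop a candidate if any (k-1)-item subset is not frequent
    (the subset property).
    """
    prev = sorted(prev_frequent)
    prev_set = set(prev)
    candidates = []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)  # stays sorted because a < b
                if all(s in prev_set
                       for s in combinations(cand, len(cand) - 1)):
                    candidates.append(cand)
    return candidates


def apriori(transactions, minsup):
    """Return {itemset: support count} for all frequent itemsets."""
    counts = {}
    for t in transactions:  # level 1: count single items
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    level = {s: c for s, c in counts.items() if c >= minsup}
    result = dict(level)
    while level:
        candidates = apriori_gen(level)
        level = {}
        for cand in candidates:  # count each candidate by scanning transactions
            c = sum(1 for t in transactions if set(cand) <= t)
            if c >= minsup:
                level[cand] = c
        result.update(level)
    return result


three_sets = [("A", "B", "C"), ("A", "B", "D"), ("A", "C", "D"),
              ("A", "C", "E"), ("B", "C", "D")]
print(apriori_gen(three_sets))  # [('A', 'B', 'C', 'D')]

frequent = apriori(TRANSACTIONS, minsup=2)
for itemset in sorted(frequent, key=lambda s: (len(s), s)):
    print("".join(itemset), frequent[itemset])
```

Keeping the tuples sorted is what makes the prefix-based join produce each candidate only once, which is the efficiency gain the slide attributes to lexicographic order. The candidate (A C D E) is never emitted because its subsets (A D E) and (C D E) are missing.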
