# Association Rule by suchenfz

```
Association Rule.
Association rule mining
• It is an important data mining model, studied extensively by the database and data mining community.
• Assume all data are categorical.
• It was initially used for market basket analysis, to find how the items purchased by customers are related.
Transaction data: supermarket data
    t2: {apple, eggs, salt, yogurt}
    …
    tn: {biscuit, eggs, milk}
• Concepts:
  - An item: an item/article in a basket
  - I: the set of all items sold in the store
  - A transaction: the items purchased in a basket; it may have a TID (transaction ID)
  - A transactional dataset: a set of transactions
The model: rules
• A transaction t contains X, a set of items (itemset) in I, if X ⊆ t.
• An association rule is an implication of the form:
    X → Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅

• An itemset is a set of items.
  - E.g., X = {milk, bread, cereal} is an itemset.
• A k-itemset is an itemset with k items.
  - E.g., {milk, bread, cereal} is a 3-itemset.
Rule strength measures
• Support: The rule holds with support sup in T (the transaction data set) if sup% of transactions contain X ∪ Y.
    sup = Pr(X ∪ Y) = count(X ∪ Y) / n
    where count(X ∪ Y) is the number of transactions containing X ∪ Y and n is the total number of transactions in T.
• Confidence: The rule holds in T with confidence conf if conf% of transactions that contain X also contain Y.
    conf = Pr(Y | X) = sup(X ∪ Y) / sup(X)
• An association rule is a pattern stating that when X occurs, Y occurs with a certain probability.
Goal of Association Rule Mining.
• Find all rules that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf).
An Example.
• Transaction data:
    t1: Beef, Chicken, Milk
    t2: Beef, Cheese
    t3: Cheese, Boots
    t4: Beef, Chicken, Cheese
    t5: Beef, Chicken, Clothes, Cheese, Milk
    t6: Chicken, Clothes, Milk
    t7: Chicken, Milk, Clothes
• Assume:
    minsup = 30%
    minconf = 80%
• An example frequent itemset:
    {Chicken, Clothes, Milk}   [sup = 3/7]
• Association rules from the itemset:
    Clothes → Milk, Chicken    [sup = 3/7, conf = 3/3]
    …
    Clothes, Chicken → Milk    [sup = 3/7, conf = 3/3]
Data set.
• This data set comes from the retail industry.
• It contains the items of each transaction, identified by transaction IDs.
• Each row represents a single transaction, i.e., the items bought by a single customer.
• For example, a row listing several items means the customer bought those items together in a single transaction.
Objective.
• The main objective is to find buying patterns in this large database.
• Discovering such association rules can help develop marketing strategies by giving insight into which items customers frequently purchase together.
• The following parameters were used:
    Minsup = 0.08 (8%)
    Minconf = 0.40 (40%)
    Mincorr = 0.30
Analysis.
[Table: frequent itemsets with their support values]
• From the table, Fluid milk has the highest frequency (support), followed by vegetables, Eggs, etc.
• This means these three are the items customers buy most often.
• The fifth rule (Eggs → Fluid milk) has the highest confidence value, 58.83424%, which means about 59% of customers who buy Eggs also buy Fluid milk.
• Similarly, 54% of customers who buy Tomatoes also buy Salad vegetables.
• Likewise, 52% of customers who buy Bread sandwiches also buy Fluid milk.
Rule Graph.
• The rule graph shows all the association rules graphically, summarizing the results in a single snapshot.
• In this graph, the sizes and colors of the circles indicate the support values of the Body (antecedent X) and the Head (consequent Y) of each association rule.
• The thickness of each line indicates the confidence value (the conditional probability of Head given Body, Pr(Y | X)) of the respective association rule.
• The sizes and colors of the circles in the center, above the "Implies" label, indicate the joint support (the co-occurrence) of the Body and Head of each association rule.
• In the graphical summary, the strongest support values were found for Fluid milk together with Bananas, Bread sandwiches, and Eggs.
• The graph also shows that the rule linking Eggs and Fluid milk has the highest confidence value (its line is the thickest).
3D Rule Graph.
• This graph is a 3D version of the earlier rule graph.
• It confirms that the rule linking Eggs and Fluid milk has the highest confidence of all the rules.
Conclusion.
• According to the rules, Fluid milk, Bananas, Bread sandwiches, Eggs, Salad vegetables, Grapes, and Fruit juice are frequently put into their baskets by customers.
• The rules also suggest that more than 50% of customers … sandwiches.
• All of this information can be used to build better marketing strategies.
• For example, the retailer can place frequently bought items close to each other in the supermarket so that customers can find them all easily.
• New products related to these items can also be placed nearby to attract customers.
Thank You.
Krishnendu Kundu
(Statistician)
StatSoft India.
Email- kkundu@statsoftindia.com
Mobile - +919873119520

```
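The support and confidence definitions on the "Rule strength measures" slide can be checked against the seven-transaction example with a short Python sketch. The transactions and the two rules are copied from the "An Example." slide; the helper names `support` and `confidence` are illustrative assumptions, not part of the original deck or of STATISTICA.

```python
from fractions import Fraction

# Transactions t1..t7 copied from the "An Example." slide
TRANSACTIONS = [
    {"Beef", "Chicken", "Milk"},                        # t1
    {"Beef", "Cheese"},                                 # t2
    {"Cheese", "Boots"},                                # t3
    {"Beef", "Chicken", "Cheese"},                      # t4
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},   # t5
    {"Chicken", "Clothes", "Milk"},                     # t6
    {"Chicken", "Milk", "Clothes"},                     # t7
]


def support(itemset, transactions):
    """sup(X) = (number of transactions containing every item of X) / (number of transactions)."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(lhs, rhs, transactions):
    """conf(X -> Y) = sup(X u Y) / sup(X); returns 0 when X never occurs."""
    sup_x = support(lhs, transactions)
    if sup_x == 0:
        return Fraction(0)
    return support(set(lhs) | set(rhs), transactions) / sup_x


if __name__ == "__main__":
    # The two rules listed on the example slide
    for lhs, rhs in [({"Clothes"}, {"Milk", "Chicken"}),
                     ({"Clothes", "Chicken"}, {"Milk"})]:
        sup = support(lhs | rhs, TRANSACTIONS)
        conf = confidence(lhs, rhs, TRANSACTIONS)
        print(sorted(lhs), "->", sorted(rhs), f"[sup = {sup}, conf = {conf}]")
```

Both rules come out with sup = 3/7 and conf = 1 (that is, 3/3), matching the slide. `Fraction` keeps the ratios exact instead of rounding 3/7 to a decimal.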
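The goal stated on the slides, finding all rules that meet minsup and minconf, is commonly solved with the Apriori algorithm: grow frequent itemsets level by level, then split each frequent itemset into candidate rules. The slides do not say which algorithm STATISTICA uses, so the following is only a plain Apriori sketch, run on the seven example transactions with minsup = 30% and minconf = 80%. The function names `apriori` and `generate_rules` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Transactions t1..t7 copied from the "An Example." slide
TRANSACTIONS = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Milk", "Clothes"},
]


def apriori(transactions, minsup):
    """Return {frozenset(itemset): support} for every itemset with support >= minsup."""
    n = len(transactions)

    def sup(itemset):
        return Fraction(sum(1 for t in transactions if itemset <= t), n)

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items if sup(frozenset([i])) >= minsup}
    frequent = {s: sup(s) for s in level}
    k = 2
    while level:
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: keep a candidate only if all of its (k-1)-subsets are frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        # Count: keep the candidates that meet minsup
        level = {c for c in candidates if sup(c) >= minsup}
        frequent.update({c: sup(c) for c in level})
        k += 1
    return frequent


def generate_rules(frequent, minconf):
    """Yield (X, Y, sup, conf) for each rule X -> Y with conf >= minconf."""
    for itemset, sup_xy in frequent.items():
        # Split each frequent itemset into a non-empty X and Y = itemset - X
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                x = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so x is a key
                conf = sup_xy / frequent[x]
                if conf >= minconf:
                    yield x, itemset - x, sup_xy, conf


if __name__ == "__main__":
    frequent = apriori(TRANSACTIONS, minsup=Fraction(30, 100))      # minsup = 30%
    for x, y, sup, conf in generate_rules(frequent, minconf=Fraction(80, 100)):  # minconf = 80%
        print(sorted(x), "->", sorted(y), f"[sup = {sup}, conf = {conf}]")
```

With these thresholds the sketch finds 11 frequent itemsets, the largest being {Chicken, Clothes, Milk} with support 3/7 as on the slide, and 7 rules, including Clothes → Milk, Chicken and Clothes, Chicken → Milk. Exact fractions are used because Chicken → Milk has confidence exactly 4/5, right on the 80% threshold, and fractions keep that comparison exact.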