Association Rule

      Association rule mining
• It is an important data mining model studied
  extensively by the database and data mining
  community.
• Assume all data are categorical.
• Initially used for Market Basket Analysis to
  find how items purchased by customers are
  related.
Transaction data: supermarket data
• Market basket transactions:
    t1: {bread, cheese, milk}
    t2: {apple, eggs, salt, yogurt}
    …              …
    tn: {biscuit, eggs, milk}
• Concepts:
  • An item: an item/article in a basket
  • I: the set of all items sold in the store
  • A transaction: items purchased in a basket; it
    may have TID (transaction ID)
  • A transactional dataset: a set of transactions
            The model: rules
• A transaction t contains X, a set of items
  (itemset) in I, if X ⊆ t.
• An association rule is an implication of the
  form:
     X → Y, where X, Y ⊂ I, and X ∩ Y = ∅

• An itemset is a set of items.
  • E.g., X = {milk, bread, cereal} is an itemset.
• A k-itemset is an itemset with k items.
  • E.g., {milk, bread, cereal} is a 3-itemset.
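The definitions above map directly onto set operations. A minimal sketch in Python, assuming items are plain strings; the item universe I below and the helper names contains and is_valid_rule are illustrative only, and requiring X and Y to be non-empty is an added assumption:

```python
# Items are modelled as strings; an itemset is a frozenset of items.
I = frozenset({"bread", "cheese", "milk", "cereal", "eggs"})  # hypothetical item universe
t = frozenset({"bread", "cheese", "milk"})                     # transaction t1 from the slides
X = frozenset({"milk", "bread"})                               # a 2-itemset


def contains(transaction, itemset):
    """Return True if the transaction contains the itemset, i.e. X ⊆ t."""
    return itemset <= transaction


def is_valid_rule(body, head, items):
    """X → Y is well formed when X and Y are subsets of I and X ∩ Y = ∅.

    Non-emptiness of X and Y is an extra assumption of this sketch.
    """
    return (bool(body) and bool(head)
            and body <= items and head <= items
            and body.isdisjoint(head))


print(contains(t, X))  # True
print(len(X))          # 2, so X is a 2-itemset
```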
     Rule strength measures
• Support: The rule holds with support sup in T (the
  transaction data set) if sup% of transactions
  contain X ∪ Y.
  • sup = Pr(X ∪ Y) = count(X ∪ Y) / total count.
• Confidence: The rule holds in T with confidence
  conf if conf% of transactions that contain X also
  contain Y.
  • conf = Pr(Y | X) = support(X ∪ Y) / support(X).
• An association rule is a pattern that states that when X
  occurs, Y occurs with a certain probability.
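Both measures are simple counts over the transactions. A minimal sketch, assuming transactions are Python sets; the function names are illustrative, and the three transactions are the ones listed on the transaction data slide (the elided ones are left out):

```python
from fractions import Fraction


def support_count(transactions, itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(transactions, x, y=frozenset()):
    """sup = count(X ∪ Y) / total count."""
    return Fraction(support_count(transactions, x | y), len(transactions))


def confidence(transactions, x, y):
    """conf = count(X ∪ Y) / count(X); undefined when X never occurs."""
    body = support_count(transactions, x)
    if body == 0:
        raise ValueError("X does not occur in any transaction")
    return Fraction(support_count(transactions, x | y), body)


transactions = [
    {"bread", "cheese", "milk"},
    {"apple", "eggs", "salt", "yogurt"},
    {"biscuit", "eggs", "milk"},
]

print(support(transactions, {"eggs"}, {"milk"}))     # 1/3
print(confidence(transactions, {"eggs"}, {"milk"}))  # 1/2
```

Exact fractions are used so that the results can be compared with hand counts without rounding.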
 Goal of Association Rule Mining.
• Find all rules that satisfy the user-specified
  minimum support (minsup) and
  minimum confidence (minconf).
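The goal can be illustrated by a brute-force search that tries every itemset and every way of splitting it into X and Y. This is only a sketch for small data, not the algorithm used in the analysis: it is exponential in the number of items, the function name mine_rules is illustrative, and the four toy transactions are hypothetical.

```python
from itertools import combinations


def mine_rules(transactions, minsup, minconf):
    """Return (X, Y, support, confidence) for every rule meeting both thresholds."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    rules = []
    for k in range(2, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            joint = count(itemset)
            # Skip itemsets that never occur or are below minsup.
            if joint == 0 or joint / n < minsup:
                continue
            # Split the frequent itemset into body X and head Y.
            for r in range(1, k):
                for body in combinations(combo, r):
                    x = frozenset(body)
                    y = itemset - x
                    conf = joint / count(x)
                    if conf >= minconf:
                        rules.append((x, y, joint / n, conf))
    return rules


# Hypothetical toy data, not taken from the slides.
toy = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
for x, y, sup, conf in mine_rules(toy, minsup=0.5, minconf=0.8):
    print(sorted(x), "->", sorted(y), sup, conf)  # ['b'] -> ['a'] 0.5 1.0
```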
                              An Example.
• Transaction data:
    t1: Beef, Chicken, Milk
    t2: Beef, Cheese
    t3: Cheese, Boots
    t4: Beef, Chicken, Cheese
    t5: Beef, Chicken, Clothes, Cheese, Milk
    t6: Chicken, Clothes, Milk
    t7: Chicken, Milk, Clothes
• Assume:
    minsup = 30%
    minconf = 80%
• An example frequent itemset:
    {Chicken, Clothes, Milk}   [sup = 3/7]
• Association rules from the itemset:
    Clothes → Milk, Chicken    [sup = 3/7, conf = 3/3]
    …
    Clothes, Chicken → Milk    [sup = 3/7, conf = 3/3]
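The figures in this example can be checked by counting. A short sketch using the seven transactions above; the helper names count and rule_stats are illustrative, and the counts are returned unreduced so they can be compared with the 3/7 and 3/3 on the slide:

```python
transactions = [
    {"Beef", "Chicken", "Milk"},                       # t1
    {"Beef", "Cheese"},                                # t2
    {"Cheese", "Boots"},                               # t3
    {"Beef", "Chicken", "Cheese"},                     # t4
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},  # t5
    {"Chicken", "Clothes", "Milk"},                    # t6
    {"Chicken", "Milk", "Clothes"},                    # t7
]


def count(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def rule_stats(body, head):
    """Return ((joint count, total), (joint count, body count)) for body → head."""
    joint = count(body | head)
    return (joint, len(transactions)), (joint, count(body))


print(rule_stats({"Clothes"}, {"Milk", "Chicken"}))  # ((3, 7), (3, 3))
print(rule_stats({"Clothes", "Chicken"}, {"Milk"}))  # ((3, 7), (3, 3))
```

Both rules have support 3/7 (about 43%, above minsup = 30%) and confidence 3/3 (100%, above minconf = 80%).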
                 Data set.
• This data set comes from the retail industry.
• It records each transaction together with its
  transaction ID.
• Each row represents a single transaction, i.e.,
  the purchases of a single customer.
• For example, a row such as
  {Bread sandwich, Milk, Egg, Butter} means that the
  customer bought those items in a single
  transaction.
               Objective.
• Our main objective is to find buying patterns
  in this large database.
• Discovering such association rules can help
  develop marketing strategies by showing
  which items customers frequently
  purchase together.
• We used the following parameters:
    minsup = 0.08
    minconf = 0.40
    mincorr = 0.30
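How the three thresholds act together can be sketched as a filter on candidate rules. The slides do not define the correlation measure, so the sketch assumes the cosine-style value support(X ∪ Y) / sqrt(support(X) · support(Y)); the software used for the analysis may define it differently. The function name passes and the support values passed to it are hypothetical.

```python
from math import sqrt

# Thresholds stated on the slide.
MINSUP, MINCONF, MINCORR = 0.08, 0.40, 0.30


def passes(sup_body, sup_head, sup_joint):
    """Return True if a rule Body → Head meets all three thresholds.

    The correlation formula is an ASSUMPTION of this sketch; the source
    does not say how mincorr is evaluated.
    """
    conf = sup_joint / sup_body
    corr = sup_joint / sqrt(sup_body * sup_head)  # assumed definition
    return sup_joint >= MINSUP and conf >= MINCONF and corr >= MINCORR


# Hypothetical support values, not results from the data set.
print(passes(0.20, 0.30, 0.10))  # True
print(passes(0.20, 0.30, 0.05))  # False: joint support below minsup
print(passes(0.50, 0.50, 0.10))  # False: confidence 0.2 below minconf
```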
Analysis.
• The spreadsheet shows the frequent
  itemsets with their support values.
• From the table it is clear that Fluid milk
  has the highest frequency, followed
  by Bananas, Salad vegetables, Eggs, etc.
• This means that these are the items
  customers most often put into their baskets.
• The fifth rule has the highest confidence value, 58.83424%, which means
  that nearly 59% of customers who buy Eggs also buy Fluid milk.
• Similarly, 54% of customers who buy Tomatoes also buy Salad
  vegetables.
• In the same way, 52% of customers who buy Bread sandwiches also
  buy Fluid milk.
                Rule Graph.
• The rule graph represents all of the association rules
  graphically, which helps us understand the
  whole result in a single snapshot.
• In this graph, the support values for the Body and
  Head portions of each association rule are
  indicated by the sizes and colors of the circles.
• The thickness of each line indicates the
  confidence value (conditional probability of Head
  given Body) for the respective association rule.
• The sizes and colors of the circles in the center,
  above the Implies label, indicate the joint support
  (for the co-occurrences) of the Body
  and Head components of the respective
  association rules.
• In the graphical summary, the strongest support values were found for
  Fluid milk associated with Bananas, Bread sandwiches, and Eggs.
• The graph also makes clear that the rule linking Eggs and Fluid milk has the
  highest confidence value (its line is the thickest).
                      3D Rule Graph.
• The graph above is the 3D version of the earlier graph.
• It again shows that the rule linking Eggs and Fluid milk has the
  highest confidence value of all the rules.
                  Conclusion.
• According to the rules, Fluid milk, Bananas, Bread
  sandwiches, Eggs, Salad vegetables, Grapes, and Fruit juice
  are the items customers most frequently put into their
  baskets.
• The rules also suggest that more than 50% of customers
  who buy Eggs or Bread sandwiches also buy Fluid
  milk.
• All of this information can be used to improve
  marketing strategies.
• For example, a retailer can place the frequently bought
  items close to each other in the supermarket so that
  customers can find them easily.
• New products related to these items can also be
  placed nearby to attract customers.
         Thank You.
    Krishnendu Kundu
       (Statistician)
      StatSoft India.
Email- kkundu@statsoftindia.com
Mobile - +919873119520

				