Tutorial on Association Rule Mining by BhushanNakod

									Outline              Quick Review        Apriori Algorithm   FP-Growth Algorithm             Project




                       Tutorial on Association Rule Mining

                                           Yang Yang
                                    yang.yang@itee.uq.edu.au

                                        DKE Group, 78-625


                                        August 5, 2011




Y.Yang                                                                             DKE Group, 78-625
INFS4203/7203 Tutorial 1




Outline


          1 Quick Review

          2 Apriori Algorithm


          3 FP-Growth Algorithm

          4 Project







Motivations


Why Association Rule Mining?

              Beer and Nappies—A Data Mining Urban Legend
                  - Fathers stopping off at Wal-Mart to buy nappies for their
                    babies would also buy beer.
              Google Autocomplete
                  - Google’s algorithm predicts and displays search queries based
                    on other users’ search activities.







Motivations


More Applications
              Recommendation
                  - Amazon Book Recommendation
                  - Tag Recommendation, e.g. YouTube, Flickr


Quick Review



               What are Association Rules?
                  - Rules of the form X ⇒ Y that capture frequent patterns,
                    behaviors, and correlations among items or objects.
               What are Association Rules used for?
                  - Predicting what is likely to happen in the future.
               Where are Association Rules mined?
                  - Transactional databases, relational databases, etc.






Notation Highlights

               Items $I = \{i_1, i_2, \dots, i_m\}$, Transactions $D = \{t_1, t_2, \dots, t_n\}$
               k-itemset $X$, an itemset containing k items
               Support
                      - $P(X) = \frac{\text{number of transactions containing } X}{\text{total number of transactions}}$
                      - How often X appears in the transactions of D.
                      - X is frequent if $P(X) \ge \mathit{min\_sup}$.
               Association Rule $X \Rightarrow Y$
               Confidence
                      - $P(Y \mid X) = \frac{P(X, Y)}{P(X)} = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)}$
                      - How likely Y is to occur when X occurs.
                      - The rule is strong if $P(X \cup Y) \ge \mathit{min\_sup} \wedge P(Y \mid X) \ge \mathit{min\_conf}$.
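
The two measures above can be computed directly from a list of transactions. The following is a minimal sketch, not code from the tutorial: the function names and the four-transaction database are hypothetical and chosen only for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support(X ∪ Y) / Support(X) for the rule X => Y."""
    x = frozenset(antecedent)
    y = frozenset(consequent)
    return support(x | y, transactions) / support(x, transactions)


# Hypothetical database, for illustration only.
D = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "beer"},
    {"bread", "milk", "beer"},
    {"milk"},
)]

print(support({"bread"}, D))               # 3 of 4 transactions -> 0.75
print(confidence({"bread"}, {"milk"}, D))  # 0.5 / 0.75, about 0.67
```

With min_sup = 0.5 and min_conf = 0.75, {bread} would be frequent here, but the rule bread ⇒ milk would not be strong.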





Apriori Algorithm


              Apriori Property: All nonempty subsets of a frequent itemset
              must also be frequent.
              An iterative approach where (k − 1)-itemsets are used to
              explore k-itemsets.
              Candidate Generation
                  - Join and Prune
              Test
                  - Compare candidate’s support with threshold








Pseudo Code
          Input : D, transactional database, and min_sup, user-specified
                   support threshold;
          Output: L, all frequent itemsets in D
          L1 = find_frequent_1_itemsets(D);
          for (k = 2; Lk−1 ≠ ∅; k++) do
              Ck = apriori_gen(Lk−1);
              foreach transaction t in D do
                  Ct = subset(Ck, t);    // candidates contained in t
                  foreach candidate c ∈ Ct do
                      c.count++;
                  end
              end
              Lk = {c ∈ Ck | c.count / |D| ≥ min_sup};
          end
          return L = ∪k Lk;
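
The level-wise loop above could be written in Python roughly as follows. This is a sketch under stated assumptions rather than a reference implementation: min_sup is taken as a fraction of the database, candidates are produced by a simple set-union join, and the sample database is hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {itemset: support} for all frequent itemsets (level-wise search)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # One database scan: count every candidate, keep those meeting min_sup.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_sup}

    items = {frozenset([i]) for t in transactions for i in t}
    level = frequent(items)            # L1
    result = dict(level)
    k = 2
    while level:                       # stop once L(k-1) is empty
        prev = set(level)
        # Join: unions of two (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = frequent(candidates)   # Lk
        result.update(level)
        k += 1
    return result


# Hypothetical database, for illustration only.
D = [{"x", "y", "z"}, {"x", "y"}, {"x", "z"}, {"y"}]
for itemset, sup in apriori(D, 0.5).items():
    print(sorted(itemset), sup)
```

Each pass of the while loop corresponds to one iteration of the for loop in the pseudocode, including one full scan of the database.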




Candidate Generation — Join and Prune



              Self-Join Lk−1
                  - Two (k − 1)-itemsets that differ in exactly one item (that is,
                    share k − 2 items) are joined into a candidate k-itemset.
              Prune
                  - Candidates with any infrequent (k − 1)-subset are removed,
                    by the Apriori property.
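
A possible sketch of the join and prune steps follows. It assumes the common ordered-prefix form of the join (two sorted (k − 1)-itemsets are joined when their first k − 2 items agree), which avoids producing the same candidate twice; the slides only require that the two itemsets differ in one item. The function name mirrors the pseudocode, and the input L2 is hypothetical.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Build candidate k-itemsets from the frequent (k-1)-itemsets."""
    prev = sorted(tuple(sorted(s)) for s in prev_frequent)
    prev_set = set(prev)
    candidates = []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            # Join: same first k-2 items, different last item.
            if a[:-1] != b[:-1]:
                continue
            cand = a + (b[-1],)
            # Prune: discard the candidate if any (k-1)-subset is not frequent.
            if all(sub in prev_set for sub in combinations(cand, k - 1)):
                candidates.append(cand)
    return candidates


# Hypothetical frequent 2-itemsets, for illustration only.
L2 = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"c", "d"}]
print(apriori_gen(L2, 3))
```

Here only {a, b, c} survives: it is the only joinable pair result whose 2-item subsets {a, b}, {a, c} and {b, c} are all in L2.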








Generating Rules from Frequent Itemsets




              For each frequent itemset ℓ, generate all nonempty proper
              subsets of ℓ;
              For each such subset s of ℓ, return the rule
              “s ⇒ (ℓ − s)” if confidence(s ⇒ (ℓ − s)) ≥ min_conf.
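
The rule-generation step might look like the sketch below. It assumes a dictionary mapping every frequent itemset to its support is already available (by the Apriori property, every subset of a frequent itemset is in that dictionary too). The function name and the support values are hypothetical.

```python
from itertools import combinations


def generate_rules(supports, min_conf):
    """supports: dict mapping each frequent itemset (frozenset) to its support."""
    rules = []
    for itemset, sup in supports.items():
        for r in range(1, len(itemset)):          # nonempty proper subsets only
            for s in combinations(sorted(itemset), r):
                lhs = frozenset(s)
                conf = sup / supports[lhs]        # Support(l) / Support(s)
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


# Hypothetical supports, for illustration only.
supports = {
    frozenset({"x"}): 0.8,
    frozenset({"y"}): 0.5,
    frozenset({"x", "y"}): 0.4,
}
for lhs, rhs, conf in generate_rules(supports, 0.75):
    print(sorted(lhs), "=>", sorted(rhs), conf)
```

Only y ⇒ x passes the threshold here (0.4 / 0.5 = 0.8); x ⇒ y has confidence 0.4 / 0.8 = 0.5.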








A toy example

              Use the Apriori algorithm to find the strong association rules
              (min_sup = 0.5 and min_conf = 0.75) in the following
              database.
                                    TID Transaction
                                      1   {a, b, c, d}
                                      2   {a, c, d}
                                      3   {a, b, c}
                                      4   {b, c, d}
                                      5   {a, b, c}
                                      6   {a, b, c}
                                      7   {c, d, e}
                                      8   {a, c}
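
To check a hand-worked answer for this database, a brute-force enumeration is enough at this size. The sketch below is not Apriori: it simply tests every possible itemset and every split of each frequent itemset, which is only feasible because there are five items. Variable names are illustrative.

```python
from itertools import combinations

D = [set("abcd"), set("acd"), set("abc"), set("bcd"),
     set("abc"), set("abc"), set("cde"), set("ac")]
MIN_SUP, MIN_CONF = 0.5, 0.75


def support(itemset):
    """Fraction of transactions in D containing every item of `itemset`."""
    return sum(1 for t in D if set(itemset) <= t) / len(D)


items = sorted(set().union(*D))
# Every nonempty itemset over the five items whose support meets min_sup.
frequent = [frozenset(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if support(c) >= MIN_SUP]

# Every rule s => (l - s) from a frequent itemset l that meets min_conf.
rules = []
for l in frequent:
    for r in range(1, len(l)):
        for s in combinations(sorted(l), r):
            conf = support(l) / support(s)
            if conf >= MIN_CONF:
                rules.append((frozenset(s), l - frozenset(s), conf))

for itemset in frequent:
    print(sorted(itemset), support(itemset))
for lhs, rhs, conf in rules:
    print(sorted(lhs), "=>", sorted(rhs), round(conf, 2))
```

The largest frequent itemset it reports is {a, b, c}, with support 4/8.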




Issues with Apriori


              Pros
                  - The basic idea is straightforward and easy to understand.
                  - Efficient on small-scale datasets.
              Cons
                  - Candidate generation cannot be avoided.
                  - The database must be scanned once per level to test the candidates.
              Can we design a method that mines the complete set of
              frequent itemsets without candidate generation and without
              repeated database scans?







Frequent-Pattern Growth


              Constructing Frequent-Pattern Tree (FP-Tree)
                  - Compress the database into a tree to avoid scanning it again
                    and again.
                  - Retain itemset association information.
              Mining FP-Tree (Divide-and-Conquer)
                  - Build a conditional FP-Tree for each frequent item.
                  - Mine each conditional FP-Tree recursively and concatenate the
                    result with its frequent item.








FP-Tree Construction

              Scan the database once to find the frequent items F. Sort F in
              descending order of support count to obtain the list L.
              Create the root of the FP-Tree and label it null.
              Scan the database again. For each transaction, select the
              frequent items and sort them in L-order, then insert them as a
              path from the root: follow existing nodes while the path shares
              a prefix with the tree, incrementing their counts, and create a
              new branch for the remaining items.
              An item header table links all nodes of the same item through
              a chain of node-links.
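
The two-scan construction can be sketched as below. This is an illustrative sketch with several assumptions: the class and function names are hypothetical, the threshold is an absolute count (min_count) rather than a fraction, ties in support count are broken alphabetically, and the header table stores each item's nodes in a Python list instead of a linked chain.

```python
from collections import Counter, defaultdict


class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    # Scan 1: support counts, then the L-order of the frequent items.
    counts = Counter(item for t in transactions for item in set(t))
    order = [i for i, c in sorted(counts.items(), key=lambda p: (-p[1], p[0]))
             if c >= min_count]
    rank = {item: r for r, item in enumerate(order)}

    root = FPNode(None, None)        # the null root
    header = defaultdict(list)       # item -> all tree nodes carrying that item
    # Scan 2: insert each transaction's frequent items in L-order.
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:        # no shared prefix here: start a new branch
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1         # count this transaction along the path
            node = child
    return root, header, order


# Hypothetical database, for illustration only.
D = [["x", "y"], ["x", "z"], ["x", "y", "z"], ["w"]]
root, header, order = build_fp_tree(D, min_count=2)
print(order)
print({item: [n.count for n in nodes] for item, nodes in header.items()})
```

In this small example the item z ends up on two different branches, which is exactly the situation the header table is there to handle.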






Mining FP-Tree

              Recursive Algorithm
                 1   For each frequent 1-itemset (suffix pattern) in the header table,
                     find its conditional pattern base (the set of prefix paths
                     co-occurring with it). Each item on a prefix path takes the
                     count of the suffix node at the end of that path.
                 2   Based on the conditional pattern base, construct the
                     corresponding conditional FP-Tree, and then recursively
                     perform the mining algorithm on this tree.
                 3   Concatenate the suffix pattern with the frequent patterns
                     generated from the conditional FP-Tree.
              Stop Condition
                  - The tree contains a single path P.
                  - Return every combination of the items in P, each
                    concatenated with the current suffix pattern.
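
The recursion above might be sketched as follows. Several things here are assumptions made for brevity: names are hypothetical, the threshold is an absolute count, each conditional pattern base is passed around as a list of (path, count) pairs, and the single-path shortcut is left out, so the recursion simply ends when a conditional tree is empty. The output should be the same as with the shortcut, only with more recursive calls.

```python
from collections import defaultdict


class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}


def build_tree(weighted_paths, min_count):
    """Build an FP-tree from (items, count) pairs; return header table and counts."""
    counts = defaultdict(int)
    for items, n in weighted_paths:
        for item in items:
            counts[item] += n
    keep = {i for i, c in counts.items() if c >= min_count}
    root, header = Node(None, None), defaultdict(list)
    for items, n in weighted_paths:
        node = root
        # Frequent items only, in descending-count order (ties broken by name).
        for item in sorted((i for i in items if i in keep),
                           key=lambda i: (-counts[i], i)):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += n
    return header, counts


def fp_growth(weighted_paths, min_count, suffix=frozenset()):
    """Yield (itemset, count) for every frequent itemset that extends `suffix`."""
    header, counts = build_tree(weighted_paths, min_count)
    for item, nodes in header.items():
        pattern = suffix | {item}          # step 3: concatenate with the suffix
        yield pattern, counts[item]
        # Step 1: conditional pattern base, one prefix path per node of `item`,
        # weighted by that node's count.
        base = []
        for node in nodes:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Step 2: build and mine the conditional tree recursively.
        yield from fp_growth(base, min_count, pattern)


# The tutorial's eight-transaction database; min_sup = 0.5 means a count of 4.
D = ["abcd", "acd", "abc", "bcd", "abc", "abc", "cde", "ac"]
for itemset, count in fp_growth([(list(t), 1) for t in D], min_count=4):
    print(sorted(itemset), count)
```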





The toy example again...

              Use the FP-Growth algorithm to find the rules with
              min_sup = 0.5 and min_conf = 0.75 in the following database.
                                 TID Transaction
                                   1    {a, b, c, d}
                                   2    {a, c, d}
                                   3    {a, b, c}
                                   4    {b, c, d}
                                   5    {a, b, c}
                                   6    {a, b, c}
                                   7    {c, d, e}
                                   8    {a, c}





The toy example again...

              L-order (c, a, b, d): infrequent items removed, remaining
              items sorted by descending support count
                                    TID        Transaction
                                     1         {c, a, b, d}
                                     2         {c, a, d}
                                     3         {c, a, b}
                                     4         {c, b, d}
                                     5         {c, a, b}
                                     6         {c, a, b}
                                     7         {c, d}
                                     8         {c, a}
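
The reordered table above can be reproduced with a few lines. This is a small sketch with illustrative variable names; it assumes ties in support count would be broken alphabetically, which does not matter here because the four frequent items have distinct counts.

```python
from collections import Counter

D = ["abcd", "acd", "abc", "bcd", "abc", "abc", "cde", "ac"]
MIN_SUP = 0.5

# First scan: support count of every item.
counts = Counter(item for t in D for item in set(t))
# L: frequent items by descending support count.
L = [i for i, c in sorted(counts.items(), key=lambda p: (-p[1], p[0]))
     if c / len(D) >= MIN_SUP]
rank = {item: r for r, item in enumerate(L)}

# Drop infrequent items and sort the rest of each transaction in L-order.
ordered = [sorted((i for i in t if i in rank), key=rank.get) for t in D]

print(L)          # ['c', 'a', 'b', 'd']
for tid, t in enumerate(ordered, start=1):
    print(tid, t)
```

The support counts behind this order are c: 8, a: 6, b: 5, d: 4; item e occurs once and is dropped.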






Project Example — Mining Flickr and Tag
Recommendation

              Objective: Recommend tags for Flickr users when they upload
              images.
              Data Collection
                  - Use Flickr API to collect image tag dataset.
              Data Preprocessing (Text Mining)
                  - Meaningless words, misspelling, etc.
              Mine user tagging behaviors/patterns
                  - Association Rule Mining (Apriori or FP-Tree?)
                  - Efficiency (Clustering?)
              Tag Recommendation
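
One way the final recommendation step could work is sketched below. This is only an assumption about how mined rules might be applied, not the project's prescribed method: a rule fires when all of its antecedent tags have been entered, and each suggested tag is scored by the best confidence among the rules that suggest it. The function name and the rules are hypothetical and were not mined from real Flickr data.

```python
def recommend_tags(entered_tags, rules, top_n=3):
    """Rank tags implied by rules whose antecedent is contained in the entered tags."""
    entered = set(entered_tags)
    scores = {}
    for antecedent, consequent, conf in rules:
        if set(antecedent) <= entered:                 # the rule fires
            for tag in set(consequent) - entered:      # skip tags already present
                scores[tag] = max(scores.get(tag, 0.0), conf)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_n]


# Hypothetical rules as (antecedent, consequent, confidence).
rules = [
    ({"beach"}, {"sea"}, 0.9),
    ({"beach", "sunset"}, {"sky"}, 0.8),
    ({"city"}, {"night"}, 0.7),
]
print(recommend_tags(["beach", "sunset"], rules))   # ['sea', 'sky']
```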
