# Homework3 Association Rule Mining by umf86597

```
Homework 3: Association Rule Mining
Release date: October 28, 2003
Due date: November 6, 2003

1. Describe the differences between association rule mining and classification in terms of data
used and results of learning. (10 points)

(a) Classification algorithms need the class attribute, while association rule mining
algorithms do not. In fact, association rule algorithms can treat the class
attribute as just another attribute. (5 points)
(b) Rules from classification algorithms have only the class attribute on the RHS,
while in association rule algorithms the class attribute can appear on either
side of the rule, and other attributes can also appear on the RHS. (5 points)

2. The following contingency table summarizes supermarket transaction data, where hotdogs
refers to the transactions containing hot dogs, ¬hotdogs refers to the transactions that do
not contain hot dogs, hamburgers refers to the transactions containing hamburgers, and
¬hamburgers refers to the transactions that do not contain hamburgers. (10 points)

               hotdogs   ¬hotdogs    row
hamburgers        2000        500   2500
¬hamburgers       1000       1500   2500
col               3000       2000   5000

Suppose the association rule (hotdogs ⇒ hamburgers) is mined. Given a minimum support
threshold of 25% and a minimum confidence threshold of 50%, is this association rule strong?
Answer: Support(hotdogs ⇒ hamburgers) = P(hotdogs AND hamburgers) = 2000/5000 = 40%.
Confidence(hotdogs ⇒ hamburgers) = P(hamburgers | hotdogs)
= P(hotdogs AND hamburgers)/P(hotdogs) = 0.4/0.6 = 66.67%. (5 points)
The support (40%) and confidence (66.67%) of the rule both exceed their minimum
thresholds (25% and 50%, respectively), so this association rule is strong. (5 points)

3. Use the hash-based technique, introduced in class, to create candidate 3-itemsets (C3) for
the following transactions. Use hash index = 7, and min supp = 50%. (10 points)

TID     items
T100    {1, 3, 4}
T200    {2, 3, 5}
T300    {1, 2, 3, 5}
T400    {2, 5}
Answer: Let the order of the items be 1, 2, 3, 4, 5. The hashing function h3 is
h3(x, y, z) = ((order of x)*10^2 + (order of y)*10 + (order of z)) mod 7.
h3(1, 2, 3) = ((1)*100 + (2)*10 + (3)) mod 7 = 123 mod 7 = bin 4.
h3(1, 2, 4) = ((1)*100 + (2)*10 + (4)) mod 7 = 124 mod 7 = bin 5.
h3(1, 2, 5) = ((1)*100 + (2)*10 + (5)) mod 7 = 125 mod 7 = bin 6.
h3(1, 3, 4) = ((1)*100 + (3)*10 + (4)) mod 7 = 134 mod 7 = bin 1.
h3(1, 3, 5) = ((1)*100 + (3)*10 + (5)) mod 7 = 135 mod 7 = bin 2.
h3(1, 4, 5) = ((1)*100 + (4)*10 + (5)) mod 7 = 145 mod 7 = bin 5.
h3(2, 3, 4) = ((2)*100 + (3)*10 + (4)) mod 7 = 234 mod 7 = bin 3.
h3(2, 3, 5) = ((2)*100 + (3)*10 + (5)) mod 7 = 235 mod 7 = bin 4.
h3(2, 4, 5) = ((2)*100 + (4)*10 + (5)) mod 7 = 245 mod 7 = bin 0.
h3(3, 4, 5) = ((3)*100 + (4)*10 + (5)) mod 7 = 345 mod 7 = bin 2.

           bin#0   bin#1   bin#2   bin#3   bin#4   bin#5   bin#6
itemsets   {245}   {134}   {135}   {234}   {235}   {124}   {125}
                           {345}           {123}   {145}
counts       0       1       1       0       3       0       1     (5 points)

Since min supp is 50% and there are 4 transactions, the min supp count is 2. The only bin
whose count is at least 2 is bin#4, so C3 is [{1,2,3}, {2,3,5}]. (5 points)

4. A database has 4 transactions as shown. Let min sup = 60% and min conf = 80%. (In
text, Pg 272). (10 points)

TID     items bought
T100    {K, A, D, B}
T200    {D, A, C, E, B}
T300    {C, A, B, E}
T400    {B, A, D}

a) Find ALL frequent itemsets for the database using the Apriori algorithm. (5 points)
Answer: Since min sup = 60% and there are 4 transactions, the min sup count is 3 (0.6 * 4 = 2.4, rounded up).
The frequent itemsets of various sizes are as follows:

1-itemsets       2-itemsets        3-itemsets
{A} 4 times     {A,B} 4 times    {A,B,D} 3 times
{B} 4 times     {A,D} 3 times
{D} 3 times     {B,D} 3 times

The set of all frequent itemsets is { {A}, {B}, {D}, {A,B}, {A,D}, {B,D}, {A,B,D} }.
b) List all strong association rules (with their support and confidence) matching the
following metarule, where X is a variable representing the customers, and item_i denotes
the variables representing the items (e.g., "A", "B", etc.). (5 points)