# Mining Association Rules with Constraints


Mining Association Rules with Constraints

Wei Ning
Joon Wong
COSC 6412 Presentation

1
Outline
   Introduction
   Summary of Approach
   Algorithm CAP
   Performance Analysis
   Conclusion
   References

2
Introduction
   Recall mining association rules

   Association rule mining finds interesting
associations or correlations among items
in a large data set.

4
Some problems met while
mining association rules
   Overwhelming number of rules
   Results that are not what the user wants
   Long running times
   Lack of focus

5
Introduction (cont.)
   Example: Walmart
   Suppose a manager wants to find the
most popular shoes in winter.

6
Outline
   Introduction
   Summary of Approach
   Algorithm CAP
   Performance Analysis
   Conclusion
   References

7
Mining frequent itemsets vs.
Mining association rules
   Mining frequent itemsets is essentially the
same problem as mining association rules:
finding the frequent itemsets is the hard
part, and the rules follow directly from them.

8
Constrained Mining
   A naive solution
   First find all frequent sets, and then test
them for constraint satisfaction
   Our approach:
   Analyze the properties of constraints
comprehensively
   Push them as deeply as possible inside
the frequent-pattern computation.
9
Frequent Itemsets &
Constraints
   Given a transaction database TDB (min_sup = 2)

   TID    Transaction
   10     a, b, c
   20     b, c, d, f
   30     a, c

   Frequent itemset: a subset of items that
frequently appears in transactions, e.g. {a, c}

   Item values:
   Item   Value
   a        40
   b        10
   c       -20
   d        10
   e       -30

   Constraint: a predicate over itemsets,
e.g. C(I): sum(I) > 50
   C(abd) = true  (40 + 10 + 10 = 60 > 50)
10
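As a minimal sketch (not part of the original deck), the slide's constraint can be evaluated as a plain predicate over itemsets, using the item values shown above:

```python
# Item values from the slide.
value = {"a": 40, "b": 10, "c": -20, "d": 10, "e": -30}

def C(itemset):
    """Constraint C(I): sum(I) > 50."""
    return sum(value[i] for i in itemset) > 50

# C(abd): 40 + 10 + 10 = 60 > 50, so the constraint holds.
print(C({"a", "b", "d"}))  # -> True
```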
Mining Frequent Itemsets With
Constraints
   Given
   A transaction database TDB
   A support threshold min_sup
   A constraint C
   Find the complete set of frequent itemsets
satisfying the constraint
   Use constraint to
   Express user’s focus
   Improve both effectiveness and efficiency

11
Classification of Constraints
   We have the following classification of
constraints
   Anti-monotone
   Monotone
   Succinct
   Convertible
   Convertible anti-monotone
   Convertible monotone
   Strongly convertible
   Inconvertible
12
Anti-Monotone
   Definition 1 (Anti-Monotone): A 1-var
constraint C is anti-monotone if for all
sets S, S':
S ⊇ S' & S satisfies C ⇒ S' satisfies C.
   Simply, when an itemset S violates
the constraint, so does any of its
supersets.
13
Is min(S) ≥ v anti-monotone?
   S = {5, 10, 14}, v = 7
   Constraint: min(S) ≥ 7
   {5} violates it (min = 5 < 7).
   Supersets of {5}: {5, 10}, {5, 14}, {5, 10, 14}
   Each of them still contains 5, so each violates
the constraint too.
   ⇒ min(S) ≥ v is anti-monotone
14
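A quick brute-force check of this property on the slide's numbers (a sketch added for illustration; the constraint and the set {5, 10, 14} are from the slide):

```python
from itertools import combinations

items = [5, 10, 14]
v = 7

def satisfies(S):
    """Constraint: min(S) >= v."""
    return min(S) >= v

# Enumerate all non-empty subsets of the items.
subsets = [frozenset(c) for r in range(1, len(items) + 1)
           for c in combinations(items, r)]

# Anti-monotonicity: whenever S violates the constraint,
# every superset of S violates it too.
for S in subsets:
    if not satisfies(S):
        assert all(not satisfies(T) for T in subsets if T >= S)
```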
Succinct
   Definition 2 (Succinct)
   I ⊆ Item is a succinct set if it can be expressed as
σ_p(Item) for some selection predicate p.
   SP ⊆ 2^Item is a succinct powerset if there is a fixed
number of succinct sets Item1, …, Itemk ⊆ Item
such that SP can be expressed in terms of the
strict powersets of Item1, …, Itemk, using union
and minus.
   Finally, a 1-var constraint C is succinct provided
SAT_C(Item) is a succinct powerset.

15
Succinct
   General idea: we can enumerate all and
only those sets that are guaranteed to
satisfy the constraint.
   If a constraint is succinct, we can
directly generate precisely the sets that
satisfy it.

16
Succinct example
   Itemsets containing a or b

   Itemsets containing some item with value
greater than 30
17
Succinct example
   C1 ≡ Item.Price ≥ 100
   Item1 = σ_{Item.Price ≥ 100}(Item) = {a, b}
   2^{Item1} (non-empty subsets) = {{a}, {b}, {a, b}}
   SAT_{C1} = {{a}, {b}, {a, b}}
   SAT_{C1} = 2^{Item1}
   ⇒ C1 is succinct

18
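The point of succinctness is that the satisfying itemsets are generated directly from a selection on items, with no generate-and-test loop. A sketch with hypothetical prices chosen so that σ_{Price ≥ 100}(Item) = {a, b}, as on the slide:

```python
from itertools import combinations

# Hypothetical item prices; only a and b cost at least 100.
price = {"a": 120, "b": 150, "c": 40, "d": 80}

# Item1 = sigma_{Item.Price >= 100}(Item): a plain selection on items.
item1 = sorted(i for i, p in price.items() if p >= 100)

# SAT_C1(Item) = the non-empty subsets of Item1, generated directly:
# {a}, {b}, {a, b}.
sat_c1 = [set(c) for r in range(1, len(item1) + 1)
          for c in combinations(item1, r)]
```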
Convertible
   Convert tough constraints into anti-
monotone or monotone constraints by
properly ordering the items

19
Convertible
   Definition:
   Let R be an order on items.
   Convertible anti-monotone:
   Itemset X satisfies the constraint ⇒ so does
every prefix of X w.r.t. R

20
Convertible example
   Constraint C: avg(X) ≥ 25

   Item values (original order / value-descending order):
   Item  Value      Item  Value
   a      40        a      40
   b       0        f      30
   c     -20        g      20
   d      10        d      10
   e     -30        b       0
   f      30        h     -10
   g      20        c     -20
   h     -10        e     -30

   Order items in value-descending order:
<a, f, g, d, b, h, c, e>
   Itemset afd satisfies C
(avg = (40 + 30 + 10) / 3 ≈ 26.7 ≥ 25)
   So do its prefixes a and af
   Thus C becomes anti-monotone!

21
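The prefix property can be checked directly on the slide's numbers (a sketch added for illustration; the constraint avg(X) ≥ 25 and the item values are from the slide):

```python
# Item values from the slide.
value = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30,
         "f": 30, "g": 20, "h": -10}

# Order R: items sorted by descending value -> <a, f, g, d, b, h, c, e>.
R = sorted(value, key=lambda i: value[i], reverse=True)

def avg_ok(itemset):
    """Constraint C: avg(X) >= 25."""
    return sum(value[i] for i in itemset) / len(itemset) >= 25

# afd satisfies C, and so does every prefix of it w.r.t. R
# (a: avg 40, af: avg 35, afd: avg ~26.7).
X = ["a", "f", "d"]  # written in R-order
assert avg_ok(X)
assert all(avg_ok(X[:k]) for k in range(1, len(X)))
```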
Constraints — A General
Picture
Constraint                         Anti-monotone   Monotone      Succinct
v ∈ S                              no              yes           yes
S ⊇ V                              no              yes           yes
S ⊆ V                              yes             no            yes
min(S) ≤ v                         no              yes           yes
min(S) ≥ v                         yes             no            yes
max(S) ≤ v                         yes             no            yes
max(S) ≥ v                         no              yes           yes
count(S) ≤ v                       yes             no            weakly
count(S) ≥ v                       no              yes           weakly
sum(S) ≤ v (∀a ∈ S, a ≥ 0)         yes             no            no
sum(S) ≥ v (∀a ∈ S, a ≥ 0)         no              yes           no
range(S) ≤ v                       yes             no            no
range(S) ≥ v                       no              yes           no
avg(S) θ v, θ ∈ {=, ≤, ≥}          convertible     convertible   no
22
Optional proof that min(S) ≥ v is
anti-monotone
   According to the table, min(S) ≥ v is
both anti-monotone and succinct.
   We prove only anti-monotonicity here due
to time limitations.
   If S ⊇ S', then min(S) ≤ min(S'); so
min(S) ≥ v implies min(S') ≥ v.

   Something special…

23
Constraint Classification

   [Diagram: nested classes of constraints.
Anti-monotone and monotone constraints each
overlap the succinct constraints; convertible
anti-monotone and convertible monotone
constraints overlap, with strongly convertible
constraints in their intersection; the remaining
constraints are inconvertible.]
24
Summary of Approach
Recapitulation
   Basic idea of mining frequent
itemsets with constraints.
   Several important constraint classes
introduced.

25
Outline
   Introduction
   Summary of Approach
   Algorithm CAP
   Performance Analysis
   Conclusion
   References

26
Algorithms
   There are many algorithms for
constraint-based association rule
mining:
   Algorithm   Direct
   Algorithm   MultiJoins & Reorder
   Algorithm   Apriori†
   Algorithm   Hybrid(m)
   Algorithm   CAP (main focus)

27
Design of Algorithm
   Sound
   An algorithm is sound provided it only finds
frequent sets that satisfy the given
constraints.

   Complete
   An algorithm is complete provided all
frequent sets satisfying the given
constraints are found.

28
Algorithm Apriori†
   Main idea: use the Apriori algorithm to
find the frequent itemsets, then apply the
constraints to the itemsets found.

   Step 1) Run Apriori with C_freq (the support constraint)
   Step 2) Apply C − C_freq to get the final Ans

29
Algorithm Apriori† (Pseudocode)
1. C1 consists of sets of size 1; k = 1; Ans = ∅;
2. While (Ck not empty) {
   2.1 conduct db scan to form Lk from Ck;
   2.2 form Ck+1 from Lk based on C_freq; k++; }
3. For each set S in some Lk:
      add S to Ans if S satisfies (C − C_freq).

30
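A runnable sketch of the pseudocode above (function and variable names are my own; the database is the one used in the example slides): Apriori runs with the support constraint only, and the remaining constraint is applied as a post-filter.

```python
from itertools import combinations

def apriori_dagger(tdb, min_sup, constraint):
    """Sketch of Apriori†: run plain Apriori with C_freq only,
    then filter the frequent sets with the remaining constraint."""
    def support(s):
        return sum(1 for t in tdb if s <= t)

    items = sorted({i for t in tdb for i in t})
    Ck = [frozenset([i]) for i in items]          # C1: all 1-itemsets
    frequent = []
    while Ck:
        Lk = [s for s in Ck if support(s) >= min_sup]   # db scan
        frequent += Lk
        # Candidate generation: join Lk with itself, then prune any
        # candidate that has an infrequent (k-1)-subset.
        joined = {a | b for a in Lk for b in Lk if len(a | b) == len(a) + 1}
        Ck = [c for c in joined
              if all(frozenset(s) in Lk for s in combinations(c, len(c) - 1))]
    # Step 3: apply C - C_freq as a post-filter.
    return [s for s in frequent if constraint(s)]

tdb = [frozenset("ACD"), frozenset("BCE"), frozenset("ABCE"), frozenset("BE")]
ans = apriori_dagger(tdb, 2, lambda s: s <= frozenset("ACE"))
# ans holds {A}, {C}, {E}, {A, C}, {C, E}
```

Note that all nine frequent sets (including {B}, {B, C}, {B, E}, {B, C, E}) are still counted before the constraint is applied; this is exactly the inefficiency CAP avoids.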
The Apriori† Algorithm — An Example

Database TDB (min support count = 2)
   Tid   Items
   10    A, C, D
   20    B, C, E
   30    A, B, C, E
   40    B, E

1st scan: C1 → L1
   C1: {A} 2, {B} 3, {C} 3, {D} 1, {E} 3
   L1: {A} 2, {B} 3, {C} 3, {E} 3   ({D} dropped, sup < 2)

2nd scan: C2 → L2
   C2: {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}
   Counts: {A, B} 1, {A, C} 2, {A, E} 1, {B, C} 2, {B, E} 3, {C, E} 2
   L2: {A, C} 2, {B, C} 2, {B, E} 3, {C, E} 2

3rd scan: C3 → L3
   C3: {B, C, E}
   L3: {B, C, E} 2
The Apriori† Algorithm — An Example
(cont.)

Constraint: S ⊆ {A, C, E}

Frequent sets found by Apriori:
   L1: {A} 2, {B} 3, {C} 3, {E} 3
   L2: {A, C} 2, {B, C} 2, {B, E} 3, {C, E} 2
   L3: {B, C, E} 2

Apply the constraint as a post-filter:
   Ans = {A}, {C}, {E}, {A, C}, {C, E}
Algorithm CAP
   Succinct and anti-monotone
   Strategy I: Replace C1 in the Apriori algorithm by
C1^C, the set of 1-itemsets satisfying C.

   Anti-monotone but non-succinct
   Strategy II: Define Ck as in the Apriori algorithm.
Drop a set S ∈ Ck from counting if S fails C, i.e.,
constraint satisfaction is tested before counting is
done.

33
Algorithm CAP (cont.)
   Succinct but non-anti-monotone
   Strategy III: Too complicated. To be discussed
later…

   Non-succinct & non-anti-monotone
   Strategy IV: Induce any weaker constraint C′ from
C. Depending on whether C′ is anti-monotone
and/or succinct, use one of Strategies I–III
above for the generation of frequent sets.

34
Algorithm CAP (Pseudocode)
1 if Csam  Csuc  Cnone is non-empty, prepare C1 as indicated in
Strategies I, III, and IV; k = 1;
2 if Csuc is non-empty {
2.1 conduct db scan to form L1 as indicated in Strategy III;
2.2 form C2 as indicated in Strategy III; k = 2;}
3 while (Ck not empty) {
3.1 conduct db scan to form Lk from Ck;
3.2 form Ck+1 from Lk based on Strategy III if Csuc is non-empty,
and Strategy II for constraints in Cam;}
4. if Cnone is empty, Ans = ULk. Otherwise, for each set S in some
Lk, add S to Ans iff S satisfies Cnone.

35
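For the succinct anti-monotone case (Strategy I), the change to Apriori is small: seed C1 only with items that satisfy the constraint, so violating sets are never generated or counted. A sketch under that assumption (names are my own), using the example database from the next slide:

```python
from itertools import combinations

def cap_strategy1(tdb, min_sup, allowed):
    """Sketch of CAP Strategy I for the succinct anti-monotone
    constraint "S is a subset of `allowed`": restrict C1 to the
    allowed items, then proceed as in Apriori."""
    def support(s):
        return sum(1 for t in tdb if s <= t)

    Ck = [frozenset([i]) for i in sorted(allowed)]   # pruned C1
    ans = []
    while Ck:
        Lk = [s for s in Ck if support(s) >= min_sup]
        ans += Lk
        joined = {a | b for a in Lk for b in Lk if len(a | b) == len(a) + 1}
        Ck = [c for c in joined
              if all(frozenset(s) in Lk for s in combinations(c, len(c) - 1))]
    return ans

tdb = [frozenset("ACD"), frozenset("BCE"), frozenset("ABCE"), frozenset("BE")]
ans = cap_strategy1(tdb, 2, set("ACE"))
# Every candidate counted already satisfies the constraint,
# so no post-filtering step is needed.
```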
The Algorithm CAP — An Example

Constraints: S ⊆ {A, C, E} and min support count = 2
Question: which strategy should we apply?

Database TDB
Tid    Items
10     A, C, D
20     B, C, E
30    A, B, C, E
40       B, E
The Algorithm CAP — An Example
(cont.)

Apply Strategy I!  (S ⊆ {A, C, E} is succinct and anti-monotone)

Database TDB
   Tid   Items
   10    A, C, D
   20    B, C, E
   30    A, B, C, E
   40    B, E

C1 (only items in {A, C, E}): {A}, {C}, {E}
1st scan → L1: {A} 2, {C} 3, {E} 3

C2: {A, C}, {A, E}, {C, E}
2nd scan → L2: {A, C} 2, {C, E} 2   ({A, E} has support 1 and is pruned)

C3 = {}  because {A, E} is pruned earlier

Ans: {A}, {C}, {E}, {A, C}, {C, E}
Case 3: Succinct but not anti-
monotone. Revisit…

   Frequent items: {1} {2} {3} {4} {5} {6} {7} {8} {9} {10};
constraint: min(S) < 5

   If Apriori is seeded only with the items
satisfying the constraint, {1} {2} {3} {4},
candidate generation yields only
{1,2} {2,3} … {3,4} … {1,2,3,4}.

   Some possible frequent sets may be
lost, e.g. {1,8} and {1,2,10}: they satisfy
min(S) < 5 but contain items outside {1, 2, 3, 4}.

**Information extracted from past presentation.
38
Case 3: Succinct but not anti-
monotone. Continued…
   Algorithm Direct
   Idea: play it safe. Generate C^c_{k+1} using
L^c_k × F, where F is the set of all frequent
items.

   Algorithm MultiJoins
   Algorithm Reorder

39
Outline
   Introduction
   Summary of Approach
   Algorithm CAP
   Performance Analysis
   Conclusion
   References

40
Performance Analysis
(Specification)
   Programs written in C
   Generate transactional databases using
Center
   100,000 records, domain of 1,000 items
   Page size 4KB
   SPARC-10 environment

41
Performance Analysis
(Terminology)
   Speedup
   Ratio of execution times between two
algorithms.

   Item selectivity
   x% of the items satisfy the constraints.

   Support threshold
   A low support threshold means more frequent
sets to process.

42
Performance Analysis
   Note: support threshold
set at 0.5%.

   For 10% selectivity,
CAP runs 80 times
faster than Apriori†!

   For 30% selectivity, the
speedup is smaller, but CAP
is still several times faster.

43
Performance Analysis
   Note: item selectivity
fixed at 30%.

   As the support threshold goes
up, the number of frequent
itemsets goes down, and Apriori†
improves.

   CAP is still at least 8 times
faster.

44
Performance Analysis
Support    L1        L2       L3        L4       L5      L6      L7      L8

0.2%       174/582   79/969   29/1140   8/1250   1/934   0/451   0/132   0/20

0.6%       98/313    1/12     0/1       0        0       0       0       0

   Each entry is of the form a/b:
     a is the number of frequent sets satisfying the constraint.
     b is the total number of frequent sets.
   For L4 with support 0.2%, Apriori† finds 1250 frequent
sets, of which only the 8 satisfying the constraint are
processed by CAP.

45
Conclusion
   The ideas of anti-monotonicity,
succinctness, and convertibility are
introduced in the paper.

   Sound, complete, and efficient
algorithms are introduced for
constraint-based association rule mining.

46
Reference
   R. Srikant, Q. Vu, and R. Agrawal. Mining association
rules with item constraints. KDD '97.

   R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang.
Exploratory mining and pruning optimizations of
constrained association rules. SIGMOD '98.

   J. Pei and J. Han. Can we push more constraints into
frequent pattern mining? KDD '00.

47
