# Evaluating Rules Using Rough Sets Theory

Jiye Li, University of Waterloo (j27li@uwaterloo.ca)

Nick Cercone, York University (ncercone@yorku.ca)
## Motivation

•   Rule discovery produces too many rules
•   How can we find the important ones?
•   A new use of rough sets
– From attribute selection to rule discovery

KDM Workshop, Waterloo, Oct 30-31, 2006
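## Decision Table

T = (U, C, D), where U is the set of objects, C is the set of condition attributes, and D is the set of decision attributes.

Attribute 1 to Attribute n are the condition attributes; the last column holds the decision attributes.

| Attribute 1 | Attribute 2 | … | Attribute n | Decision Attributes |
|---|---|---|---|---|
| … | … | … | … | … |
| … | … | … | … | … |
| … | … | … | … | … |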
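
The decision table above maps naturally onto a list of records. The Python sketch below is illustrative only: the four objects and the attribute names `make`, `trans` and `mileage` are hypothetical, not taken from the workshop data. It partitions the objects into indiscernibility classes, i.e. groups of objects that share the same values on a chosen attribute subset; reducts and cores are defined in terms of these classes.

```python
from collections import defaultdict

# Hypothetical toy decision table T = (U, C, D); each dict is one object in U.
U = [
    {"make": "USA",   "trans": "Auto",   "mileage": "Medium"},
    {"make": "USA",   "trans": "Manual", "mileage": "High"},
    {"make": "Japan", "trans": "Manual", "mileage": "High"},
    {"make": "USA",   "trans": "Auto",   "mileage": "Medium"},
]
C = ["make", "trans"]  # condition attributes
D = ["mileage"]        # decision attributes


def indiscernibility_classes(table, attrs):
    """Group object indices that agree on every attribute in `attrs`."""
    classes = defaultdict(list)
    for i, row in enumerate(table):
        classes[tuple(row[a] for a in attrs)].append(i)
    return sorted(classes.values())


print(indiscernibility_classes(U, C))         # [[0, 3], [1], [2]]
print(indiscernibility_classes(U, D))         # [[0, 3], [1, 2]]
print(indiscernibility_classes(U, ["make"]))  # [[0, 1, 3], [2]]
```

Every class induced by C lies inside a single decision class, so C determines the decision in this toy table; `make` alone does not, since it merges objects 0, 1 and 3, whose mileage differs.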

## Example

Artificial data set (table of condition attributes and one decision attribute):

• 14 instances
• No missing attribute values
• 8 condition attributes
• 1 decision attribute

## Rule Set Generated

(Figure: generated association rules that decide the mileage of different cars.)

Question: how can we tell which rules are more interesting, or more important?

## Rough Sets

• Proposed by Pawlak in the 1980s
• Reduct
– A minimal set of condition attributes that is sufficient to describe the decision attributes
• Core
– The intersection of all the reducts
– Core generation (Hu et al. 2003 [4])
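
The reduct and core definitions above can be checked by exhaustive search on a tiny table. This sketch assumes the standard rough-set criterion: a reduct is a minimal subset R of the condition attributes C whose positive region POS_R(D) equals POS_C(D). The table and the attribute names `a`, `b`, `c`, `d` are hypothetical; `c` is a copy of `b`, so the two are interchangeable and only `a` is indispensable. The search grows exponentially with the number of attributes, which is one reason tools such as ROSETTA provide heuristic reduct generators.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy table: condition attributes a, b, c and decision d.
# Column c duplicates column b, so b and c are interchangeable.
ROWS = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 0},
    {"a": 1, "b": 0, "c": 0, "d": 0},
    {"a": 1, "b": 1, "c": 1, "d": 1},
]
C = ("a", "b", "c")
D = ("d",)


def positive_region(rows, attrs, decision):
    """Objects whose indiscernibility class (w.r.t. attrs) has a single decision value."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[x] for x in attrs)].append(i)
    pos = set()
    for members in classes.values():
        if len({tuple(rows[i][x] for x in decision) for i in members}) == 1:
            pos.update(members)
    return pos


def all_reducts(rows, cond, decision):
    """Exhaustive search for minimal subsets R of cond with POS_R(D) == POS_C(D)."""
    target = positive_region(rows, cond, decision)
    reducts = []
    for size in range(1, len(cond) + 1):
        for subset in combinations(cond, size):
            # Supersets of an already-found reduct cannot be minimal.
            if any(r <= set(subset) for r in reducts):
                continue
            if positive_region(rows, subset, decision) == target:
                reducts.append(set(subset))
    return reducts


reducts = all_reducts(ROWS, C, D)
core = set.intersection(*reducts) if reducts else set()
print([sorted(r) for r in reducts])  # [['a', 'b'], ['a', 'c']]
print(sorted(core))                  # ['a']
```

Subsets are tried in order of increasing size, so any subset that survives the superset check and preserves the positive region is minimal. Here {a, b} and {a, c} are the reducts, and their intersection {a} is the core.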

## Reduct and Core

• The genetic algorithm in ROSETTA generates 4 reducts
• Core attributes: make_model, trans

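## Reduct and Rule Generation

Given the reduct {make_model, compress, power, trans}:

| make_model | … | compress | … | power | … | trans | mileage |
|---|---|---|---|---|---|---|---|
| USA | … | High | … | Medium | … | Auto | Medium |
| USA | … | High | … | Low | … | Manual | High |
| Japan | … | Medium | … | Low | … | Manual | High |
| … | … | … | … | … | … | … | … |

Sample rule generated from this reduct:

Auto Trans => Medium Mileage

is more interesting than

Medium Displacement, High Compression, Medium Weight => Medium Mileage

The second rule needs three conditions, two of which (displacement and weight) are on attributes outside the reduct.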
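
One way to read rules off a reduct is sketched below: restrict the table to the reduct attributes, then, for each object, keep the shortest attribute-value combinations on which every matching object has the same decision (a simple form of value reduction). This is an illustrative assumption, not necessarily how the rules on these slides were produced, and the helper name `minimal_rules` is hypothetical. It uses only the three rows visible in the table above, with the elided columns dropped.

```python
from itertools import combinations

# The three visible rows of the reduct table above (elided "…" columns dropped).
REDUCT = ("make_model", "compress", "power", "trans")
ROWS = [
    {"make_model": "USA",   "compress": "High",   "power": "Medium", "trans": "Auto",   "mileage": "Medium"},
    {"make_model": "USA",   "compress": "High",   "power": "Low",    "trans": "Manual", "mileage": "High"},
    {"make_model": "Japan", "compress": "Medium", "power": "Low",    "trans": "Manual", "mileage": "High"},
]


def minimal_rules(rows, attrs, decision):
    """For each object, collect the shortest consistent condition sets over attrs."""
    rules = set()
    for row in rows:
        for size in range(1, len(attrs) + 1):
            found = []
            for subset in combinations(attrs, size):
                cond = tuple((a, row[a]) for a in subset)
                matches = [r for r in rows if all(r[a] == v for a, v in cond)]
                # Consistent: every object matching the conditions has this row's decision.
                if all(r[decision] == row[decision] for r in matches):
                    found.append((cond, row[decision]))
            if found:
                rules.update(found)
                break  # keep only the shortest rules for this object
    return rules


for cond, dec in sorted(minimal_rules(ROWS, REDUCT, "mileage")):
    lhs = ", ".join(f"{a}={v}" for a, v in cond)
    print(f"{lhs} => mileage={dec}")
```

On these three rows the output includes `trans=Auto => mileage=Medium`, the slide's sample rule, while `make_model=USA` is rejected because it matches objects with different mileage. With so few objects the sketch also yields rules such as `power=Low => mileage=High` that the full table may not support, so the output should not be read as the actual rule set.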
## New Decision Table!

• Decision rules
– All the rules generated from the original data set
• Question: can we consider “rules” as “condition attributes”?

## Re-construct the Decision Table: Consider Rules as Attributes

(Figure: the original decision table.)

Sample Rule 0:
USACar, Medium Displacement, Medium Weight => Medium Mileage

Rule 0 becomes column 0 of the new table (rows are objects): A[0, 0] = 1, A[1, 0] = 1, A[2, 0] = 0, …

(Figure: the new decision table.)
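
Treating rules as attributes can be sketched as below. The sketch assumes that A[i, j] = 1 when object i satisfies every condition of rule j; only the left-hand side is checked, and whether the original method also requires the rule's decision to match is not settled by the slide. The objects, the rules and the helper name `rules_as_attributes` are hypothetical; they were chosen so that column 0 reproduces the pattern A[0, 0] = 1, A[1, 0] = 1, A[2, 0] = 0 shown above.

```python
# Hypothetical objects: condition values plus the original decision attribute.
OBJECTS = [
    {"make": "USA",   "trans": "Auto",   "mileage": "Medium"},
    {"make": "USA",   "trans": "Manual", "mileage": "High"},
    {"make": "Japan", "trans": "Manual", "mileage": "High"},
]
# Hypothetical rules: (conditions as attribute -> value, predicted decision).
RULES = [
    ({"make": "USA"}, "Medium"),
    ({"trans": "Manual"}, "High"),
    ({"make": "Japan"}, "High"),
]


def rules_as_attributes(objects, rules, decision="mileage"):
    """New decision table: A[i][j] = 1 if object i satisfies all conditions of rule j."""
    table = []
    for obj in objects:
        # The rule's predicted decision is ignored here; only its conditions are tested.
        bits = [int(all(obj.get(a) == v for a, v in cond.items())) for cond, _ in rules]
        table.append((bits, obj[decision]))  # keep the original decision attribute
    return table


for i, (bits, d) in enumerate(rules_as_attributes(OBJECTS, RULES)):
    print(i, bits, d)
```

This prints `0 [1, 0, 0] Medium`, `1 [1, 1, 0] High` and `2 [0, 1, 1] High`: each rule is now a binary condition attribute, and the original decision column is carried over unchanged, so ordinary reduct generation can run on the result.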

## Reduct Rule

• Reduct rule set
– A reduct generated from the new decision table is called a reduct rule set; the rules it contains are reduct rules.
• Why reduct rules are important
– A reduct of the new decision table keeps all the important attributes, and in this table the attributes are rules, so the reduct identifies the important rules.

## Experiment – Car Data Set

• 19 rules serve as attributes in the new decision table
• Reducts are generated with Johnson's algorithm in ROSETTA
• Reduct rules for the car data set:
– Japan Car => High Mileage
– Compression High, Trans Manual => High Mileage
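
Johnson's algorithm, the reduct heuristic named on this slide, is a greedy method over the discernibility matrix: for every pair of objects with different decisions, collect the attributes on which they differ, then repeatedly choose the attribute that appears in the most remaining entries and drop the entries it covers. The plain-Python sketch below follows that description; it is not ROSETTA's code, and the tie-breaking rule (smallest column index) is an assumption. The 0/1 table, in which each column stands for one rule of the new decision table, and the rule names `r0` to `r3` are hypothetical.

```python
from itertools import combinations

# Hypothetical new decision table: columns are rules r0..r3 (1 = rule applies), plus a decision.
RULE_NAMES = ["r0", "r1", "r2", "r3"]
TABLE = [
    ([1, 0, 0, 0], "Medium"),
    ([0, 1, 0, 0], "High"),
    ([1, 0, 1, 1], "High"),
    ([0, 1, 1, 0], "Medium"),
]


def discernibility_sets(table):
    """For each pair of objects with different decisions, the columns where they differ."""
    sets = []
    for (x, dx), (y, dy) in combinations(table, 2):
        if dx != dy:
            diff = frozenset(j for j, (u, v) in enumerate(zip(x, y)) if u != v)
            if diff:  # an empty set means the pair cannot be discerned at all
                sets.append(diff)
    return sets


def johnson_reduct(table):
    """Greedy: repeatedly take the column found in the most remaining discernibility sets."""
    remaining = discernibility_sets(table)
    chosen = []
    while remaining:
        counts = {}
        for s in remaining:
            for j in s:
                counts[j] = counts.get(j, 0) + 1
        # Most frequent column; ties broken by the smaller column index.
        best = min(counts, key=lambda j: (-counts[j], j))
        chosen.append(best)
        remaining = [s for s in remaining if best not in s]
    return sorted(chosen)


print([RULE_NAMES[j] for j in johnson_reduct(TABLE)])  # ['r0', 'r2']
```

The selected columns, r0 and r2, would be the reduct rules here: together they separate every Medium object from every High object, and neither does so alone. Greedy selection is fast, but in general it does not guarantee a minimal reduct.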

## Experiment – Car Data Set

Evaluation by the rule importance measure [7]:

| Rule | Rule Importance |
|---|---|
| Japan Car => High Mileage | 100% |
| Compression High, Trans Manual => High Mileage | 75% |
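
The slides do not define the rule importance measure. The sketch below assumes the definition used in [7]: generate a rule set from each reduct, then score each rule by the fraction of those rule sets that contain it. Under that assumption, 4 reducts (the number the genetic algorithm found on the Reduct and Core slide) can only yield multiples of 25%, consistent with the values in the table above. The rule sets and the names `rule_A`, `rule_B` and `rule_C` are placeholders, not the car-data rules.

```python
from collections import Counter

# Placeholder rule sets, one per reduct (4 reducts assumed).
RULE_SETS = [
    {"rule_A", "rule_B"},
    {"rule_A", "rule_B"},
    {"rule_A", "rule_B", "rule_C"},
    {"rule_A"},
]


def rule_importance(rule_sets):
    """Fraction of the reduct-based rule sets that contain each rule."""
    counts = Counter(rule for rules in rule_sets for rule in rules)
    return {rule: n / len(rule_sets) for rule, n in counts.items()}


for rule, score in sorted(rule_importance(RULE_SETS).items(), key=lambda kv: -kv[1]):
    print(f"{score:.0%}  {rule}")
```

This prints `100%  rule_A`, `75%  rule_B` and `25%  rule_C`: a rule that appears in every reduct-based rule set scores 100%, and one found in a single set scores 25%.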

## Experiment – Car Data Set

Comparison: rule importance for all the rules. Reduct rules are more important: the two reduct rules head the list!

| Rule | Rule Importance |
|---|---|
| Japan Car => High Mileage | 100% |
| Compression High, Trans Manual => High Mileage | 75% |
| … | … |
| USACar, Power High => Medium Mileage | 25% |
| 4 Door => Medium Mileage | 25% |

## Concluding Remarks

• A method of ranking rules by treating rules as condition attributes

• Automatic and effective

• Reduct rules are more important

• Applications
– Personalization systems, medical diagnosis
## Selected References

1. Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer, Netherlands (1991).
2. Klemettinen, M., Mannila, H., Ronkainen, P., Toivonen, H., Verkamo, A.I.: Finding Interesting Rules from Large Sets of Discovered Association Rules. In: Proc. CIKM '94 (1994) 401-407.
3. Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules. In: Proc. 20th VLDB, Santiago de Chile, Chile. Morgan Kaufmann (1994) 487-499.
4. Hu, X., Lin, T., Han, J.: A New Rough Sets Model Based on Database Systems. Fundamenta Informaticae 59(2-3) (2004) 135-152.
5. Øhrn, A.: Discernibility and Rough Sets in Medicine: Tools and Applications. PhD Thesis, Department of Computer and Information Science, Norwegian University of Science and Technology (NTNU), Trondheim, Norway. NTNU report 1999:133, IDI report 1999:14, ISBN 82-7984-014-1, 239 pages (1999).
6. Li, J., Cercone, N.: Empirical Analysis on the Geriatric Care Data Set Using Rough Sets Theory. Technical Report CS-2005-05, School of Computer Science, University of Waterloo (2005).
7. Li, J., Cercone, N.: A Rough Set Based Model to Rank the Importance of Association Rules. In: The Tenth International Conference on Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing (RSFDGrC 2005), University of Regina, Canada, August 31-September 3, 2005. To appear.
