Evaluating Rules Using Rough Sets Theory

Jiye Li, University of Waterloo, Waterloo, Canada, j27li@uwaterloo.ca
Nick Cercone, York University, Toronto, Canada, ncercone@yorku.ca

                    Motivation

•   Rule Discovery
•   Too many rules
•   How to find important rules?
•   New Usage of Rough Sets
     – From Attribute Selection to Rule Discovery




Oct 30 ~ 31, 2006, KDM Workshop, Waterloo, Canada

                        Decision Table
T=(U,C,D). U is the set of objects.
C is the set of condition attributes.
D is the set of decision attributes.
     Condition Attributes                               Decision Attributes

     Attribute 1    Attribute 2    …    Attribute n     Decision Attributes
     …              …              …    …               …
     …              …              …    …               …
     …              …              …    …               …
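
As a minimal sketch, a decision table T = (U, C, D) can be held in a small Python structure. The class name `DecisionTable` and its two helper methods are illustrative and not from the talk. The three rows are the ones shown under "Reduct and Rule Generation", restricted to the columns named there.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DecisionTable:
    """T = (U, C, D): objects, condition attributes, decision attributes."""

    objects: list[dict[str, str]]  # U, one dict per object
    conditions: list[str]          # C
    decisions: list[str]           # D

    def condition_values(self, i: int) -> tuple[str, ...]:
        """Values of the condition attributes for object i."""
        return tuple(self.objects[i][a] for a in self.conditions)

    def decision_values(self, i: int) -> tuple[str, ...]:
        """Values of the decision attributes for object i."""
        return tuple(self.objects[i][a] for a in self.decisions)


# Rows taken from the "Reduct and Rule Generation" slide; elided columns are omitted.
table = DecisionTable(
    objects=[
        {"make_model": "USA", "compress": "High", "power": "Medium",
         "trans": "Auto", "mileage": "Medium"},
        {"make_model": "USA", "compress": "High", "power": "Low",
         "trans": "Manual", "mileage": "High"},
        {"make_model": "Japan", "compress": "Medium", "power": "Low",
         "trans": "Manual", "mileage": "High"},
    ],
    conditions=["make_model", "compress", "power", "trans"],
    decisions=["mileage"],
)

print(table.condition_values(0), table.decision_values(0))
```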

                      Example

Artificial Data Set
• 14 instances
• No missing attribute values
• 8 condition attributes
• 1 decision attribute

[Table: the data set, with the condition attributes and the decision attribute as columns]



                    Rule Set Generated

Generated association rules to decide the mileage of different cars.

Question: How can we tell which rules are more interesting, more important?



         Think about: Rough Sets Theory

• Proposed by Pawlak in the 1980s
• Reduct
     – A minimal set of condition attributes that is sufficient to describe
       the decision attributes
• Core
     – Intersection of all the reducts
     – Core generation (Hu et al. 2003 [4])
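
The two definitions above can be sketched with a brute-force search. This is an illustration only: it is exponential in the number of attributes, and it is neither the core-generation method of Hu et al. [4] nor what ROSETTA does. The toy table with attributes `a`, `b`, `c` and decision `d` is hypothetical, and the function names are made up for this example. The sketch assumes the full table is consistent (no two objects agree on all condition attributes but differ on the decision).

```python
from itertools import combinations


def is_consistent(rows, attrs, decision):
    """True if objects that agree on `attrs` never disagree on `decision`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, row[decision]) != row[decision]:
            return False
    return True


def find_reducts(rows, conditions, decision):
    """All minimal attribute subsets that still determine the decision."""
    reducts = []
    for size in range(1, len(conditions) + 1):
        for subset in combinations(conditions, size):
            candidate = set(subset)
            # Skip supersets of a reduct already found: they are not minimal.
            if any(r <= candidate for r in reducts):
                continue
            if is_consistent(rows, subset, decision):
                reducts.append(candidate)
    return reducts


def core(reducts):
    """Core = intersection of all the reducts."""
    return set.intersection(*reducts) if reducts else set()


# Hypothetical toy table, not the car data set from the talk.
rows = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 1, "b": 0, "c": 0, "d": 1},
    {"a": 0, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 1, "d": 0},
]

reducts = find_reducts(rows, ["a", "b", "c"], "d")
print([sorted(r) for r in reducts])  # [['a', 'b'], ['a', 'c']]
print(sorted(core(reducts)))         # ['a']
```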



                    Reduct and Core

  • Genetic Algorithm in ROSETTA generates
    4 reducts
  • Core Attributes: make_model, trans




          Reduct and Rule Generation
Given a reduct = {make_model, compress, power, trans}

 make_model   …   compress   …   power    …   trans    mileage
 USA          …   High       …   Medium   …   Auto     Medium
 USA          …   High       …   Low      …   Manual   High
 Japan        …   Medium     …   Low      …   Manual   High
 …            …   …          …   …        …   …        …

Sample rule generated based on this reduct:
 Auto Trans => Medium Mileage
is more interesting than
 Medium Displacement, High Compression, Medium Weight => Medium Mileage
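
One way to read the comparison above is that the first rule's antecedent uses only attributes from the reduct, while the second also uses attributes outside it. The check below is a sketch under that reading. The helper name and the attribute identifiers `displacement` and `weight` are assumptions, since the slide names those attributes only in prose.

```python
def uses_only_reduct_attributes(antecedent, reduct):
    """True if every attribute in the rule's antecedent belongs to the reduct."""
    return set(antecedent) <= set(reduct)


reduct = {"make_model", "compress", "power", "trans"}

# Antecedents of the two sample rules, written as attribute -> value.
rule_a = {"trans": "Auto"}
rule_b = {"displacement": "Medium", "compress": "High", "weight": "Medium"}

print(uses_only_reduct_attributes(rule_a, reduct))  # True
print(uses_only_reduct_attributes(rule_b, reduct))  # False
```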
                    New Decision Table!
• Decision Rules
   – All the rules generated from the original
     data set
• Question: Can we consider “rules” as “condition attributes”?




          Re-construct Decision Table:
          Consider Rules as Attributes
[Figure: Original Decision Table]

Sample Rule 0:
USACar, Medium Displacement, Medium Weight => Medium Mileage
A[0, 0] = 1, A[1, 0] = 1, A[2, 0] = 0, …

[Figure: New Decision Table]
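
The construction can be sketched as follows, assuming A[i, j] = 1 when rule j applies to record i and 0 otherwise. The slides do not state the matching criterion, so the sketch assumes a rule applies when its antecedent and its consequent both appear in the record, with a `require_consequent` flag to match on the antecedent alone. The records, the attribute identifiers, and the function names are hypothetical. They are chosen so that column 0 reproduces the values shown for Sample Rule 0.

```python
def rule_applies(record, antecedent, consequent, require_consequent=True):
    """Assumed criterion: the rule's items all appear in the record."""
    matches = all(record.get(a) == v for a, v in antecedent.items())
    if require_consequent:
        matches = matches and all(
            record.get(a) == v for a, v in consequent.items()
        )
    return matches


def build_rule_table(records, rules, decision, require_consequent=True):
    """One row per record: a 0/1 entry per rule, plus the original decision."""
    new_table = []
    for record in records:
        row = [
            int(rule_applies(record, ant, cons, require_consequent))
            for ant, cons in rules
        ]
        new_table.append((row, record[decision]))
    return new_table


# Hypothetical records; the real car data set is not reproduced here.
records = [
    {"make_model": "USA", "displacement": "Medium", "weight": "Medium",
     "mileage": "Medium"},
    {"make_model": "USA", "displacement": "Medium", "weight": "Medium",
     "mileage": "Medium"},
    {"make_model": "Japan", "displacement": "Small", "weight": "Light",
     "mileage": "High"},
]

# Each rule is (antecedent, consequent); rule 0 is Sample Rule 0.
rules = [
    ({"make_model": "USA", "displacement": "Medium", "weight": "Medium"},
     {"mileage": "Medium"}),
    ({"make_model": "Japan"}, {"mileage": "High"}),
]

new_table = build_rule_table(records, rules, "mileage")
for row, decision_value in new_table:
    print(row, decision_value)
```

In the resulting table the rules play the role of condition attributes and the original decision attribute is kept unchanged.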




                     Reduct Rule
• Reduct Rule Set
     – A reduct generated from the new decision table is called a
       Reduct Rule Set. The rules it contains are Reduct Rules.
• Why Reduct Rules are important
     – A reduct of the new decision table contains the important
       attributes. Each attribute in that table represents a rule,
       so the reduct identifies the important rules.


            Experiment – Car Data Set
• 19 rules as attributes in the new decision table
• Johnson’s reduct generation from ROSETTA
• Reduct Rules for the Car Data Set
     – Japan Car => High Mileage
     – Compression High, Trans Manual => High Mileage
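
Johnson's heuristic is commonly described as a greedy search: repeatedly pick the attribute that discerns the most remaining pairs of objects with different decisions. The sketch below follows that textbook description and is not ROSETTA's implementation. The small 0/1 table of rule columns and the function names are hypothetical. Ties are broken by the order of `conditions`, which is an arbitrary choice made here.

```python
from collections import Counter
from itertools import combinations


def discernibility_sets(rows, conditions, decision):
    """For each pair with different decisions, the attributes that differ."""
    sets = []
    for r1, r2 in combinations(rows, 2):
        if r1[decision] != r2[decision]:
            diff = frozenset(a for a in conditions if r1[a] != r2[a])
            if diff:
                sets.append(diff)
    return sets


def johnson_reduct(rows, conditions, decision):
    """Greedy approximation of a single reduct."""
    remaining = discernibility_sets(rows, conditions, decision)
    chosen = []
    while remaining:
        counts = Counter(a for s in remaining for a in s)
        # Most frequent attribute; ties go to the earliest in `conditions`.
        best = max(conditions, key=lambda a: counts[a])
        chosen.append(best)
        remaining = [s for s in remaining if best not in s]
    return chosen


# Hypothetical new decision table: one 0/1 column per rule.
rows = [
    {"rule0": 1, "rule1": 0, "rule2": 1, "mileage": "High"},
    {"rule0": 1, "rule1": 1, "rule2": 0, "mileage": "High"},
    {"rule0": 0, "rule1": 1, "rule2": 1, "mileage": "Medium"},
    {"rule0": 0, "rule1": 0, "rule2": 1, "mileage": "Medium"},
]

reduct_rules = johnson_reduct(rows, ["rule0", "rule1", "rule2"], "mileage")
print(reduct_rules)  # ['rule0']
```

Under this reading, the rules whose columns end up in the result are the Reduct Rules.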


        Experiment – Car Data Set

Evaluation by Rule Importance Measure [7]

Rules                                             Rule Importance
Japan Car => High Mileage                         100%
Compression High, Trans Manual => High Mileage    75%
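
The slides do not define the Rule Importance Measure. The sketch below assumes the definition in [7] is the fraction of reducts whose generated rule set contains the rule. With the 4 reducts reported earlier, that gives 4/4 = 100%, 3/4 = 75%, and 1/4 = 25%. The assignment of rules to the four rule sets below is invented for illustration, and only the resulting percentages match the slides.

```python
from collections import Counter


def rule_importance(rule_sets):
    """Fraction of rule sets (one per reduct) in which each rule appears."""
    counts = Counter(rule for rules in rule_sets for rule in set(rules))
    n = len(rule_sets)
    return {rule: counts[rule] / n for rule in counts}


japan = "Japan Car => High Mileage"
manual = "Compression High, Trans Manual => High Mileage"
four_door = "4 Door => Medium Mileage"

# Hypothetical membership: one rule set per reduct, four reducts in total.
rule_sets = [
    [japan, manual, four_door],
    [japan, manual],
    [japan, manual],
    [japan],
]

importance = rule_importance(rule_sets)
for rule, value in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{rule}: {value:.0%}")
```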


         Experiment – Car Data Set
Comparing: Rule Importance for All the Rules
Reduct Rules are more important!

Rules                                             Rule Importance
Japan Car => High Mileage                         100%
Compression High, Trans Manual => High Mileage    75%
…                                                 …
USACar, Power High => Medium Mileage              25%
4 Door => Medium Mileage                          25%

                    Concluding Remarks

• A method of ranking rules by considering
  rules as condition attributes

• Automatic and effective

• Reduct Rules are more important

• Applications
     – Personalization systems, medical diagnosis
                    Selected References
1.     Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer,
       Netherlands, 1991.
2.     Klemettinen, M., Mannila, H., Ronkainen, R., Toivonen, H., Verkamo, A.I.: Finding
       Interesting Rules from Large Sets of Discovered Association Rules. CIKM '94,
       401-407.
3.     Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules. Proc. of
       20th VLDB, Santiago de Chile, Chile, Morgan Kaufmann, 1994, 487-499.
4.     Hu, X., Lin, T., Han, J.: A New Rough Sets Model Based on Database Systems.
       Fundamenta Informaticae 59(2-3), 2004, 135-152.
5.     Ohrn, A.: Discernibility and Rough Sets in Medicine: Tools and
       Applications. PhD Thesis, Department of Computer and Information Science,
       Norwegian University of Science and Technology, Trondheim, Norway, NTNU
       report 1999:133, IDI report 1999:14, ISBN 82-7984-014-1, 239 pages, 1999.
6.     Li, J., Cercone, N.: Empirical Analysis on the Geriatric Care Data Set Using
       Rough Sets Theory. Technical Report CS-2005-05, School of Computer Science,
       University of Waterloo, 2005.
7.     Li, J., Cercone, N.: A Rough Set Based Model to Rank the Importance of
       Association Rules. The Tenth International Conference on Rough Sets, Fuzzy
       Sets, Data Mining, and Granular Computing (RSFDGrC 2005), August 31 to
       September 3, 2005, University of Regina, Canada. To appear.

