

Use the 1R Algorithm to Generate a Set of Classification Rules

The instructor can introduce the 1R algorithm [4] by first working through an
example that shows how to generate a set of rules for classifying its instances.
Once the students understand how the method works, the instructor can state the
1R algorithm in words rather than in pseudo-code. Students tend to understand
the algorithm better this way and can then apply it to other problems.

The following example is taken from one of the data sets distributed with WEKA
[2, 3]. Suppose the eye doctors at a clinic want to determine which type of
contact lenses (soft, hard, or none) a patient should be fitted with; the three
types are then the three classes. The factors that may affect a doctor's
decision are the patient's age, spectacle prescription, astigmatism, and tear
production rate. These four factors, together with the type of contact lenses,
are the five attributes of each instance in this example. Age has three nominal
values: young, pre-presbyopic, and presbyopic. Spectacle prescription has two
nominal values, myope and hypermetrope. Astigmatism has two nominal (binary)
values, yes and no. Tear production rate has two nominal values, reduced and
normal. Type of contact lenses has three nominal values: soft, hard, and none.
The data set provides the following 24 instances. Based on these instances, one
of the four factors (attributes) will be used to generate a set of rules for
classifying the type of contact lenses to fit.

age              spectacle-pres   astigmatism   tear-rate   contact-lenses

young            myope            no            reduced     none
young            myope            no            normal      soft
young            myope            yes           reduced     none
young            myope            yes           normal      hard
young            hypermetrope     no            reduced     none
young            hypermetrope     no            normal      soft
young            hypermetrope     yes           reduced     none
young            hypermetrope     yes           normal      hard
pre-presbyopic   myope            no            reduced     none
pre-presbyopic   myope            no            normal      soft
pre-presbyopic   myope            yes           reduced     none
pre-presbyopic   myope            yes           normal      hard
pre-presbyopic   hypermetrope     no            reduced     none
pre-presbyopic   hypermetrope     no            normal      soft
pre-presbyopic   hypermetrope     yes           reduced     none
pre-presbyopic   hypermetrope     yes           normal      none
presbyopic       myope            no            reduced     none
presbyopic       myope            no            normal      none
presbyopic       myope            yes           reduced     none
presbyopic       myope            yes           normal      hard
presbyopic       hypermetrope     no            reduced     none
presbyopic       hypermetrope     no            normal      soft
presbyopic       hypermetrope     yes           reduced     none
presbyopic       hypermetrope     yes           normal      none
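
For readers who want to experiment with the data, the following is a minimal
Python sketch of one way to write the 24 instances down. It relies on the
observation that the table lists every combination of the four attribute values
in a fixed order, so only the class column has to be typed in. The variable
names and the one-letter class codes are illustrative assumptions, not part of
the WEKA data set.

```python
from collections import Counter
from itertools import product

# Attribute values in the order in which they cycle through the table.
AGES = ["young", "pre-presbyopic", "presbyopic"]
SPECTACLE = ["myope", "hypermetrope"]
ASTIGMATISM = ["no", "yes"]
TEAR_RATE = ["reduced", "normal"]

# One class letter per row, top to bottom: n = none, s = soft, h = hard.
CLASS_CODES = "nsnhnsnh" "nsnhnsnn" "nnnhnsnn"
CLASS_NAMES = {"n": "none", "s": "soft", "h": "hard"}

# product() varies the last attribute fastest, which matches the row order.
combos = product(AGES, SPECTACLE, ASTIGMATISM, TEAR_RATE)
instances = [combo + (CLASS_NAMES[code],) for combo, code in zip(combos, CLASS_CODES)]

class_counts = Counter(row[-1] for row in instances)
print(len(instances))      # 24
print(dict(class_counts))  # {'none': 15, 'soft': 5, 'hard': 4}
```

Each instance is a plain tuple whose last element is the class, which keeps the
counting steps of the algorithm short.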

Start with attribute age. For each attribute value, find the class (soft, hard,
or none) that occurs most frequently and assign that class to the value. For
example, among the eight instances with value young, class none occurs four
times, and soft and hard each occur twice. Thus, the rule for young is class
none. This rule makes 4 errors out of 8: of the eight instances with value
young, four do not belong to class none. Similarly, the rule for pre-presbyopic
is none, with 3 errors out of 8, and the rule for presbyopic is none, with 2
errors out of 8. These three rules form the set of rules for attribute age, and
the error rate for age is (4 + 3 + 2)/24 = 9/24. Repeat the procedure to
generate a set of rules for each of the remaining three attributes and calculate
the corresponding error rates. The results can be summarized as follows.

Attribute            Rules                        Errors         Error Rate

age                  young->none                  4/8            9/24
                     pre-presbyopic->none         3/8
                     presbyopic->none             2/8

spectacle-pres       myope->none                  5/12           9/24
                     hypermetrope->none           4/12

astigmatism          yes->none                    4/12           9/24
                     no->none                     5/12

tear-rate            normal->soft                 7/12           7/24
                     reduced->none                0/12
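
The tallies behind the age rows of the table can be checked with a few lines of
Python. This is only a sketch: the helper name `rule_for` and the compact
encoding of the data are assumptions made for illustration. The encoding
rebuilds the 24 instances from the fact that they list every combination of the
four attribute values in a fixed order.

```python
from collections import Counter
from itertools import product

VALUES = [
    ["young", "pre-presbyopic", "presbyopic"],  # age
    ["myope", "hypermetrope"],                  # spectacle prescription
    ["no", "yes"],                              # astigmatism
    ["reduced", "normal"],                      # tear production rate
]
CLASS_NAMES = {"n": "none", "s": "soft", "h": "hard"}
CODES = "nsnhnsnh" "nsnhnsnn" "nnnhnsnn"  # classes of the 24 rows, top to bottom
instances = [combo + (CLASS_NAMES[c],) for combo, c in zip(product(*VALUES), CODES)]


def rule_for(column, value):
    """Return (majority class, errors, instance count) for one attribute value."""
    counts = Counter(row[-1] for row in instances if row[column] == value)
    majority, hits = counts.most_common(1)[0]
    total = sum(counts.values())
    return majority, total - hits, total


AGE = 0  # column index of attribute age
age_errors = 0
for value in VALUES[AGE]:
    majority, errors, total = rule_for(AGE, value)
    age_errors += errors
    print(f"{value}->{majority}  {errors}/{total}")
print(f"error rate for age: {age_errors}/{len(instances)}")
# young->none  4/8
# pre-presbyopic->none  3/8
# presbyopic->none  2/8
# error rate for age: 9/24
```

Changing `AGE` to 1, 2, or 3 gives the corresponding rows for the other three
attributes.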

The last step is to choose the set of rules for the attribute that has the
minimum error rate. Since tear production rate has the minimum error rate, 7/24,
in the above example, the set of rules for fitting contact lenses is: if a
patient has a normal tear production rate, the patient should be fitted with
soft contact lenses; if a patient has a reduced tear production rate, the
patient should not be fitted with any contact lenses.

After the students fully understand how the procedure works on the contact
lenses fitting example, the instructor can summarize the method, which is called
the 1R algorithm. Repeat the following procedure for each attribute: for each
attribute value, generate a rule by assigning to the value the class that occurs
most frequently, and count the classification errors the rule makes; then
calculate the error rate of the attribute (total errors / total number of
instances). After the error rate of each attribute is obtained, choose the set
of rules that corresponds to the attribute with the minimum error rate.
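
The procedure described in words above can be sketched as a single Python
function. This is an illustrative sketch rather than a reference implementation:
the function name `one_r` and the row layout (a tuple of attribute values
followed by the class) are assumptions, and ties are resolved by first
occurrence instead of at random so that the output is repeatable. The contact
lenses data is rebuilt from the fact that the 24 instances list every
combination of attribute values in a fixed order.

```python
from collections import Counter, defaultdict
from itertools import product


def one_r(rows, attribute_names):
    """Return (best attribute, its rules, its total errors).

    Each row is a tuple of attribute values followed by the class.
    Ties are resolved by first occurrence here, not at random.
    """
    best = None
    for column, name in enumerate(attribute_names):
        # Count how often each class occurs for each value of this attribute.
        tallies = defaultdict(Counter)
        for row in rows:
            tallies[row[column]][row[-1]] += 1
        # One rule per value: predict the most frequent class.
        rules = {value: counts.most_common(1)[0][0] for value, counts in tallies.items()}
        errors = sum(row[-1] != rules[row[column]] for row in rows)
        if best is None or errors < best[2]:
            best = (name, rules, errors)
    return best


VALUES = [
    ["young", "pre-presbyopic", "presbyopic"],
    ["myope", "hypermetrope"],
    ["no", "yes"],
    ["reduced", "normal"],
]
CLASS_NAMES = {"n": "none", "s": "soft", "h": "hard"}
CODES = "nsnhnsnh" "nsnhnsnn" "nnnhnsnn"  # classes of the 24 rows, top to bottom
instances = [combo + (CLASS_NAMES[c],) for combo, c in zip(product(*VALUES), CODES)]

names = ["age", "spectacle-pres", "astigmatism", "tear-rate"]
attribute, rules, errors = one_r(instances, names)
print(attribute, rules, f"{errors}/{len(instances)}")
# tear-rate {'reduced': 'none', 'normal': 'soft'} 7/24
```

Classifying a new patient then amounts to looking up the patient's value of the
chosen attribute in `rules`.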

Observant students may raise the following questions. When generating a rule for
an attribute value, which class should be chosen if two or more classes are tied
for the most occurrences? And which set of rules should be chosen if two or more
attributes have the same minimum error rate? The 1R algorithm breaks such ties
by simply choosing one at random. Buddhinath and Derry [1] suggest a simple
enhancement of the 1R algorithm that avoids or reduces random selection when
ties occur. (This discussion is optional: the instructor can use it as further
reading, a project, or a research topic for the students, or skip it.)

If two (or more) classes are tied for the most occurrences within an attribute
value, select from among them the global majority class (the class that occurs
most often over all instances). If two (or more) attributes have the same
minimum error rate, choose the set of rules whose attribute has the highest Net
Class Weight (NCW). The NCW of an attribute A is calculated as follows: for each
class in the data set, divide the number of instances of the class that are
correctly classified by the set of rules for A by the total number of instances
of the class in the data set; then add up these ratios over all classes. For
example, the NCW of attribute tear production rate in the contact lenses example
is 5/5 (soft) + 12/15 (none) + 0/4 (hard) = 27/15 = 1.8.
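
The NCW arithmetic can be reproduced with exact fractions. The sketch below
follows the definition given above; the function name `net_class_weight` and the
data encoding are assumptions made for illustration, and the global-majority
tie-break is not implemented because no value in this data set has tied classes.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from itertools import product


def net_class_weight(rows, column):
    """Sum over classes of (correctly classified instances) / (instances of the class)."""
    tallies = defaultdict(Counter)
    for row in rows:
        tallies[row[column]][row[-1]] += 1
    # Rules for this attribute: most frequent class per value.
    rules = {value: counts.most_common(1)[0][0] for value, counts in tallies.items()}
    class_totals = Counter(row[-1] for row in rows)
    correct = Counter(row[-1] for row in rows if rules[row[column]] == row[-1])
    return sum(Fraction(correct[cls], class_totals[cls]) for cls in class_totals)


VALUES = [
    ["young", "pre-presbyopic", "presbyopic"],
    ["myope", "hypermetrope"],
    ["no", "yes"],
    ["reduced", "normal"],
]
CLASS_NAMES = {"n": "none", "s": "soft", "h": "hard"}
CODES = "nsnhnsnh" "nsnhnsnn" "nnnhnsnn"  # classes of the 24 rows, top to bottom
instances = [combo + (CLASS_NAMES[c],) for combo, c in zip(product(*VALUES), CODES)]

TEAR_RATE = 3  # column index of tear production rate
ncw = net_class_weight(instances, TEAR_RATE)
print(ncw, float(ncw))  # 9/5 1.8
```

The result 9/5 is 27/15 in lowest terms. For the other three attributes every
rule predicts none, so only the none class contributes and the NCW is 15/15 = 1.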

Exercise: Some children have sleep problems that can affect their growth and
behavior. To determine whether a child has a sleep problem, the following three
factors are considered: bedtime (early, normal, or late), waking in the morning
(early, normal, or late), and mood during the daytime (normal or irritable).
Based on the following 18 instances, apply the 1R algorithm to generate a set of
rules to classify children's sleep problems (yes or no).

       Bedtime        waking        mood           sleep problem

1      early          early         normal         yes
2      early          early         irritable      yes
3      early          normal        normal         no
4      early          normal        irritable      no
5      early          late          normal         no
6      early          late          irritable      yes
7      normal         early         normal         yes
8      normal         early         irritable      yes
9      normal         normal        normal         no
10     normal         normal        irritable      no
11     normal         late          normal         no
12     normal         late          irritable      yes
13     late           early         normal         yes
14     late           early         irritable      yes
15     late           normal        normal         no
16     late           normal        irritable      yes
17     late           late          normal         yes
18     late           late          irritable      yes


[1] G. Buddhinath and D. Derry. University of Melbourne, Melbourne.

[2] J. Cendrowska. PRISM: An Algorithm for Inducing Modular Rules.
International Journal of Man-Machine Studies, 27, 349-370, 1987.


[4] I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and
Techniques. Morgan Kaufmann Publishers.
