Use the 1R Algorithm to Generate a Set of Classification Rules

The instructor can introduce the 1R algorithm by first working through an example that shows how to generate a set of rules to classify the instances in the example. Once the students understand how the method works, the instructor then states the 1R algorithm in words rather than in pseudocode. Students should understand the algorithm better this way and be able to apply it to other problems.

The following example is taken from one of the data sets of WEKA [2, 3]. Suppose the eye doctors at a clinic want to determine which type of contact lenses (soft, hard, or none) each patient should be fitted with. These three types are the classes. The factors that may affect a doctor’s decision are the patient’s age, spectacle prescription, astigmatism, and tear production rate. These four factors, together with the type of contact lenses, are the five attributes of each instance. All five attributes are nominal, with the following values:

  age: young, pre-presbyopic, presbyopic
  spectacle prescription: myope, hypermetrope
  astigmatism (binary): yes, no
  tear production rate: reduced, normal
  type of contact lenses: soft, hard, none

Based on the 24 instances in the data set, one of the four factors (attributes) will be used to generate a set of rules for classifying the type of contact lenses to fit. The 24 instances are as follows.

  age             spectacle-pres  astigmatism  tear-rate  lenses
  young           myope           no           reduced    none
  young           myope           no           normal     soft
  young           myope           yes          reduced    none
  young           myope           yes          normal     hard
  young           hypermetrope    no           reduced    none
  young           hypermetrope    no           normal     soft
  young           hypermetrope    yes          reduced    none
  young           hypermetrope    yes          normal     hard
  pre-presbyopic  myope           no           reduced    none
  pre-presbyopic  myope           no           normal     soft
  pre-presbyopic  myope           yes          reduced    none
  pre-presbyopic  myope           yes          normal     hard
  pre-presbyopic  hypermetrope    no           reduced    none
  pre-presbyopic  hypermetrope    no           normal     soft
  pre-presbyopic  hypermetrope    yes          reduced    none
  pre-presbyopic  hypermetrope    yes          normal     none
  presbyopic      myope           no           reduced    none
  presbyopic      myope           no           normal     none
  presbyopic      myope           yes          reduced    none
  presbyopic      myope           yes          normal     hard
  presbyopic      hypermetrope    no           reduced    none
  presbyopic      hypermetrope    no           normal     soft
  presbyopic      hypermetrope    yes          reduced    none
  presbyopic      hypermetrope    yes          normal     none

Start with the attribute age. For each attribute value, find the class (soft, hard, or none) that occurs most frequently and assign that class to the value. For example, among the eight instances with the value young, class none occurs four times, and soft and hard each occur twice. Thus, the rule for young is class none. This rule makes 4 errors out of 8: of the eight instances with the value young, the four whose class is not none are misclassified. Similarly, the rule for pre-presbyopic is none, with 3 errors out of 8, and the rule for presbyopic is none, with 2 errors out of 8. These three rules form the set of rules for the attribute age, and the error rate for age is (4 + 3 + 2)/24 = 9/24. Repeat this procedure to generate a set of rules for each of the remaining three attributes and calculate the corresponding error rates. The process can be summarized as follows.

  Attribute       Rules                 Errors  Error Rate
  age             young->none           4/8     9/24
                  pre-presbyopic->none  3/8
                  presbyopic->none      2/8
  spectacle-pres  myope->none           5/12    9/24
                  hypermetrope->none    4/12
  astigmatism     yes->none             4/12    9/24
                  no->none              5/12
  tear-rate       normal->soft          7/12    7/24
                  reduced->none         0/12

The last step is to choose the set of rules for the attribute with the minimum error rate. Since tear production rate has the minimum error rate, 7/24, in the above example, the set of rules for fitting contact lenses is: if a patient has a normal tear production rate, the patient should be fitted with soft contact lenses; if a patient has a reduced tear production rate, the patient should not be fitted with any contact lenses.

After the students fully understand how the procedure works on the contact lenses example, the instructor can summarize the method, which is called the 1R algorithm.

Repeat the following procedure for each attribute:
  For each attribute value, generate a rule by assigning the most frequent class to this value, and then count the classification errors made by the rule;
  Calculate the error rate of the attribute (total errors / total number of instances).
After the error rate of each attribute is obtained, choose the set of rules for the attribute with the minimum error rate.

Observant students may have the following questions. When generating a rule for an attribute value, if two or more classes are tied for most frequent (and therefore give the same number of errors), which class should be chosen? Also, which set of rules should be chosen if two or more attributes have the same minimum error rate? The 1R algorithm simply breaks such ties at random. Buddhinath and Derry [1] suggest a simple enhancement of the 1R algorithm that avoids or reduces random selection when ties occur. (This discussion is optional: the instructor can use it as further reading, a project, or a research topic for the students, or skip it.)
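
The 1R procedure summarized above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. The names DATA, ATTRIBUTES, rules_for and one_r are hypothetical, and the 24 instances are rebuilt in the order in which the text lists them. As an assumption made here for reproducibility, a tie is resolved by keeping the first candidate encountered instead of choosing at random.

```python
from collections import Counter, defaultdict
from itertools import product

ATTRIBUTES = ["age", "spectacle-pres", "astigmatism", "tear-rate"]

# The 24 instances are a full cross of the attribute values, listed with
# the last attribute varying fastest (same order as in the text).
VALUES = product(
    ["young", "pre-presbyopic", "presbyopic"],
    ["myope", "hypermetrope"],
    ["no", "yes"],
    ["reduced", "normal"],
)
CLASSES = (
    "none soft none hard  none soft none hard "
    "none soft none hard  none soft none none "
    "none none none hard  none soft none none"
).split()
DATA = [dict(zip(ATTRIBUTES, row), lenses=cls) for row, cls in zip(VALUES, CLASSES)]


def rules_for(data, attribute, target):
    """Return ({value: most frequent class}, total errors) for one attribute."""
    counts = defaultdict(Counter)
    for row in data:
        counts[row[attribute]][row[target]] += 1
    rules, errors = {}, 0
    for value, class_counts in counts.items():
        # On a tie, most_common keeps the class seen first (assumption:
        # plain 1R would choose among the tied classes at random).
        best_class, best_count = class_counts.most_common(1)[0]
        rules[value] = best_class
        errors += sum(class_counts.values()) - best_count
    return rules, errors


def one_r(data, attributes, target):
    """Return (attribute, rules, errors) for the attribute with fewest errors."""
    results = [(a, *rules_for(data, a, target)) for a in attributes]
    # min keeps the first attribute on a tie (again instead of a random pick).
    return min(results, key=lambda r: r[2])


best_attribute, best_rules, best_errors = one_r(DATA, ATTRIBUTES, "lenses")
print(best_attribute, best_rules, f"{best_errors}/{len(DATA)}")
```

On the contact lenses data this sketch selects tear-rate with the rules reduced->none and normal->soft and 7 errors out of 24, in agreement with the summary table.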

The enhancement consists of two tie-breaking rules.

If two (or more) classes are tied for most frequent within an attribute value, select among them the global majority class (the class that occurs most often over all instances).

If two (or more) attributes have the same minimum error rate, choose the set of rules whose attribute has the highest Net Class Weight (NCW). The NCW of an attribute A is calculated as follows: for each class in the data set, divide the number of instances of this class that are correctly classified by the set of rules for A by the total number of instances of this class in the data set; after this ratio is obtained for each class, add all the ratios. For example, the NCW of the attribute tear production rate in the contact lenses example is 5/5 (soft) + 12/15 (none) + 0/4 (hard) = 27/15.

Exercise: Some children have sleep problems that can affect their growth and behavior. To determine whether a child has a sleep problem, the following three factors are considered: bedtime (early, normal, or late), waking in the morning (early, normal, or late), and mood during the daytime (normal or irritable). Based on the following 18 instances, apply the 1R algorithm to generate a set of rules to classify children’s sleep problems (yes or no).

      bedtime  waking  mood       sleep problem
   1  early    early   normal     yes
   2  early    early   irritable  yes
   3  early    normal  normal     no
   4  early    normal  irritable  no
   5  early    late    normal     no
   6  early    late    irritable  yes
   7  normal   early   normal     yes
   8  normal   early   irritable  yes
   9  normal   normal  normal     no
  10  normal   normal  irritable  no
  11  normal   late    normal     no
  12  normal   late    irritable  yes
  13  late     early   normal     yes
  14  late     early   irritable  yes
  15  late     normal  normal     no
  16  late     normal  irritable  yes
  17  late     late    normal     yes
  18  late     late    irritable  yes

References
[1] G. Buddhinath and D. Derry. University of Melbourne, Melbourne, Australia.
[2] J. Cendrowska. PRISM: An Algorithm for Inducing Modular Rules. International Journal of Man-Machine Studies, 27, 349-370, 1987.
[3] http://www.cs.waikato.ac.nz/ml/weka/.
[4] I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers.
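
As a supplement to the optional discussion of the enhancement, the two tie-breaking rules can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the authors' implementation. The names DATA, pick_class and net_class_weight are hypothetical, the contact lenses instances are rebuilt in the order in which the text lists them, and exact fractions are used only so that the result can be compared directly with the worked NCW value.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

ATTRIBUTES = ["age", "spectacle-pres", "astigmatism", "tear-rate"]

# The 24 contact lenses instances, last attribute varying fastest.
VALUES = product(
    ["young", "pre-presbyopic", "presbyopic"],
    ["myope", "hypermetrope"],
    ["no", "yes"],
    ["reduced", "normal"],
)
CLASSES = (
    "none soft none hard  none soft none hard "
    "none soft none hard  none soft none none "
    "none none none hard  none soft none none"
).split()
DATA = [dict(zip(ATTRIBUTES, row), lenses=cls) for row, cls in zip(VALUES, CLASSES)]


def pick_class(class_counts, global_counts):
    """Most frequent class for one attribute value; a tie goes to the class
    that is more frequent over all instances (the global majority class)."""
    # If the classes are still tied globally, max keeps the first one seen.
    return max(class_counts, key=lambda c: (class_counts[c], global_counts[c]))


def net_class_weight(data, attribute, rules, target="lenses"):
    """Sum over classes of (correctly classified / total) for that class."""
    totals = Counter(row[target] for row in data)
    correct = Counter(
        row[target] for row in data if rules[row[attribute]] == row[target]
    )
    return sum(Fraction(correct[c], totals[c]) for c in totals)


# Rule set for tear production rate, as derived in the worked example.
tear_rules = {"normal": "soft", "reduced": "none"}
ncw_tear = net_class_weight(DATA, "tear-rate", tear_rules)
print(ncw_tear)  # Fraction in lowest terms; equal to 27/15
```

Among several attributes with the same minimum error rate, the enhancement would then keep the one whose rule set gives the largest value of net_class_weight.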