Artificial Intelligence - an agent approach
COMP4700 & COMP6424
Learning
Eric McCreath
The Australian National University
Semester 2 2003

Outline
Introduction (18.1)
A computational approach to induction (18.2)
General learning approaches (19.1)
Decision Trees (18.3)
Inductive Logic Programming (19.5)

Introduction
Learning involves moving from a finite set of observations about an 'object' or 'concept' to a conclusion that describes the object or concept.
Machine learning is an area of AI that designs and studies computer programs that undertake learning. This is done at both empirical and theoretical levels.

Example
Could you write a computer program that would learn the next number in the sequence given below?
2, 4, 6, 8, 10, ...
Or could you write a computer program that would learn the set that the following numbers are from?
5, 7, 3, 2, 19, 11, 17, ...

Inference
There are three types of inference: deduction, abduction, and induction.

Deduction
With deduction, the premises provide definite reasons for the conclusion. This makes any knowledge gained in the conclusion both certain and justifiable.
Given: Man(fred). Man(X) => Mortal(X).
Inferred: Mortal(fred).

Abduction
Abduction is an inference approach where you know the 'process' and the 'conclusions' and attempt to find the 'cause'.
Given: Mortal(fred). Man(X) => Mortal(X).
Inferred: Man(fred).

Induction
Induction refers to any kind of inference in which we move from a finite set of observations about an 'object' or 'concept' to a conclusion that is a general description of the object or concept.
Given: Man(fred). Mortal(fred).
Inferred: Man(X) => Mortal(X).

Hume noted that an inductive conclusion is not justifiable, as the conclusion goes beyond the set of observations. Several attempts have been made to overcome this problem:
An inductive argument for induction (Black).
Science does not rest on induction; rather, it is a process of disproving conjectures - a completely deductive process.
(Popper)
Another approach is to view inductive conclusions as 'probable' rather than 'certain'.

The idea of induction was embraced by the rapidly expanding field of AI, resulting in the development of numerous practical learning systems. Researchers in theoretical computer science also became interested in the mathematical foundations of induction, resulting in the field of Computational Learning Theory.

Computational Approach to Induction
Consider the universe of objects, U. Elements of U are referred to as instances.
A concept is any subset of U.
Let C be a concept. Then an element of C is a positive example of C, and an element of U - C is a negative example.
A concept may be infinite in size.
An intensional description of a concept is a finite 'description' that characterizes the concept (e.g. a decision procedure, decision tree, neural network, grammar, logic program, computer program, ...).

Now let us assume that 'nature' chooses an underlying reality in the form of a concept, also referred to as the target concept.
A learner is a computational agent that attempts to find an intensional description of the target concept.
To assist the learner in its task, nature provides the learner with data about the target concept. This data is usually a finite set of labeled instances from U; the label is 1 if the instance is positive (0 if negative). These are examples of the target concept.

The hypothesis space is essentially a sequence of intensional descriptions: h1, h2, h3, ....
The task of the learner is to find a hypothesis hi for the target concept based on the data presented to it about the concept.
The conditions under which a learner is said to learn a target concept are referred to as the criterion of success.

The golf example.
outlook : sunny, overcast, rain.
temperature : integer.
humidity : integer.
windy : boolean.
U = outlook x temperature x humidity x windy
The concept would be the subset of U containing all the instances in which someone will play golf.
Examples provided to the learner:
<rain,65,70,true>, 0
<overcast,64,65,true>, 1
<sunny,72,95,false>, 0
<sunny,69,70,false>, 1

The learner must induce a hypothesis from the hypothesis space.
h1 = 'It is sunny.'
h2 = 'It is raining.'
h3 = 'Humidity less than 90.'
h4 = 'Humidity less than 90 and it is not raining.'
h5 = 'The temperature is over 60 and it is not windy.'
In this case our learner may choose h4, as it explains all four examples.
Generally several hypotheses will fit the examples, so the learner needs a way to select between them. Ockham's razor is often employed; it states that the simplest theory is often the correct one.

Types of learning
Different types of learning include:
Supervised - The learner is provided with an entire labeled data set. The learner must then induce a hypothesis which represents the target concept.
Unsupervised - Examples are provided without class labels. The learner induces clusters that partition the data.
Reinforcement - The learner learns which actions to take in a particular situation. Good outcomes are positively reinforced.
Incremental - The learner updates an old hypothesis as new data is presented.

Hypothesis Space
An appropriate selection for the hypothesis space is critical for the success of a machine learning approach.
Clearly the target concept must be representable within the hypothesis space.
The larger the hypothesis space, the slower the search.
Hypothesis spaces must be structured and searched efficiently for a learning approach to be practical.

To tame the huge (generally infinite) hypothesis space, a bias may be placed over this space. There are basically three types of bias: language bias, search bias, and validation bias.
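
Returning to the golf example, the claim that h4 explains all the examples can be checked mechanically. The following is a minimal sketch in Python and is not part of the original slides: the names Instance, EXAMPLES, HYPOTHESES and explains_all are hypothetical, and each hypothesis is encoded as a predicate over an instance under the assumption that 'over 60' means temperature > 60 and 'less than 90' means humidity < 90.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class Instance(NamedTuple):
    """One element of U = outlook x temperature x humidity x windy."""
    outlook: str        # 'sunny', 'overcast' or 'rain'
    temperature: int
    humidity: int
    windy: bool


# The four labeled examples from the golf slide (1 = positive, 0 = negative).
EXAMPLES: List[Tuple[Instance, int]] = [
    (Instance("rain", 65, 70, True), 0),
    (Instance("overcast", 64, 65, True), 1),
    (Instance("sunny", 72, 95, False), 0),
    (Instance("sunny", 69, 70, False), 1),
]

Hypothesis = Callable[[Instance], bool]

# h1..h5 from the slide, written as predicates (an assumed encoding).
HYPOTHESES: Dict[str, Hypothesis] = {
    "h1": lambda x: x.outlook == "sunny",
    "h2": lambda x: x.outlook == "rain",
    "h3": lambda x: x.humidity < 90,
    "h4": lambda x: x.humidity < 90 and x.outlook != "rain",
    "h5": lambda x: x.temperature > 60 and not x.windy,
}


def explains_all(h: Hypothesis, examples: List[Tuple[Instance, int]]) -> bool:
    """True if h predicts the given label for every example."""
    return all(h(x) == bool(label) for x, label in examples)


surviving = [name for name, h in HYPOTHESES.items() if explains_all(h, EXAMPLES)]
print(surviving)  # ['h4']
```

Of the five candidate hypotheses, only h4 agrees with every label; with a richer hypothesis space several candidates would typically survive, which is where a preference such as Ockham's razor comes in.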
Problems for Learning
A variety of problems arise: noise, over-fitting, insufficient data, large amounts of data, selecting a hypothesis language, and searching the hypothesis space.

Consistent and Complete
A hypothesis h is said to be consistent with respect to a set of examples E if its extension does not contain any negative examples in E.
A hypothesis h is said to be complete with respect to a set of examples E if its extension contains all the positive examples of E.
A hypothesis h is said to be correct with respect to E if it is both consistent and complete with respect to E.

Generalization
Suppose a hypothesis h is modified to h'. If the extension of h is increased, that is:
ext(h) ⊂ ext(h')
then this modification is known as a generalization step.

Specialization
Suppose a hypothesis h is modified to h'. If the extension of h is decreased, that is:
ext(h') ⊂ ext(h)
then this modification is known as a specialization step.

Current-Best Learning
One approach to learning is to maintain a single hypothesis h which is both consistent and complete with respect to the examples seen so far. As new examples arrive, the hypothesis is either generalized or specialized such that h remains correct with respect to all the examples.
This approach is known as current-best learning.

In some cases no generalization or specialization of h is correct with respect to all the examples. This will cause the algorithm to fail.

Problems with current-best learning include:
It will not always lead to the simplest solution.
In some cases it may lead to no solution.
It is expensive to consider all the examples for every modification.
It is difficult to find a good heuristic for determining the specialization and generalization steps.

Least-Commitment Search
Another approach to learning is to maintain a list of all possible hypotheses V. Each example e in the training set E is examined in turn.
All hypotheses in V that are not correct with respect to e are removed from V. Once this process has finished, all the hypotheses in V will be correct with respect to E. The set of possible hypotheses V is the version space.
This learning approach is known as least-commitment search or version space learning.

Least-Commitment Learning
The advantage of this approach is that each example only needs to be considered once.

One obvious difficulty with this approach is that the hypothesis space is generally enormous, if not infinite!
One solution to this problem is to have a generalization/specialization partial ordering on the hypothesis space. The version space may then be maintained by keeping track of its boundary sets. This requires:
a most general boundary (the G-set), and
a most specific boundary (the S-set).

The main disadvantages of the approach are:
If the examples contain noise or insufficient attributes then the version space will always collapse.
The size of the G-set and the S-set may grow uncontrollably.
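
The least-commitment search described above can be sketched by brute-force enumeration over a small, finite hypothesis space. This is an illustrative sketch only, not the slides' own algorithm: it reuses the five hypotheses and four examples from the golf example, the function names (consistent, complete, correct, version_space) are hypothetical, and it deliberately does not implement the G-set/S-set boundary representation needed for large or infinite spaces.

```python
from typing import Callable, Dict, List, Tuple

Instance = Tuple[str, int, int, bool]    # (outlook, temperature, humidity, windy)
Example = Tuple[Instance, int]           # (instance, label), label 1 or 0
Hypothesis = Callable[[Instance], bool]  # True means the instance is in ext(h)


def consistent(h: Hypothesis, examples: List[Example]) -> bool:
    """ext(h) contains no negative example."""
    return not any(h(x) for x, label in examples if label == 0)


def complete(h: Hypothesis, examples: List[Example]) -> bool:
    """ext(h) contains every positive example."""
    return all(h(x) for x, label in examples if label == 1)


def correct(h: Hypothesis, examples: List[Example]) -> bool:
    """Both consistent and complete."""
    return consistent(h, examples) and complete(h, examples)


def version_space(hypotheses: Dict[str, Hypothesis], examples: List[Example]):
    """Examine each example once, dropping hypotheses it rules out."""
    v = dict(hypotheses)   # start with every hypothesis
    trace = []             # names remaining after each example
    for e in examples:
        v = {name: h for name, h in v.items() if correct(h, [e])}
        trace.append(sorted(v))
    return v, trace


# Golf data and hypotheses h1..h5 (predicate encoding is an assumption).
examples: List[Example] = [
    (("rain", 65, 70, True), 0),
    (("overcast", 64, 65, True), 1),
    (("sunny", 72, 95, False), 0),
    (("sunny", 69, 70, False), 1),
]
hypotheses: Dict[str, Hypothesis] = {
    "h1": lambda x: x[0] == "sunny",
    "h2": lambda x: x[0] == "rain",
    "h3": lambda x: x[2] < 90,
    "h4": lambda x: x[2] < 90 and x[0] != "rain",
    "h5": lambda x: x[1] > 60 and not x[3],
}

v, trace = version_space(hypotheses, examples)
print(trace)  # [['h1', 'h4', 'h5'], ['h4'], ['h4'], ['h4']]
```

Each example is used exactly once, and every hypothesis left in v is correct with respect to all four examples. If a noisy example had ruled out h4 as well, v would be empty, which is the collapse mentioned among the disadvantages.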