VIEWS: 8 PAGES: 20 POSTED ON: 6/1/2010 Public Domain
Machine Learning 2

Basic definitions:
• concept: often described implicitly (e.g. „good politician“) using examples, i.e. training data
• hypothesis: an attempt to describe the concept explicitly
  – concept and hypothesis are each expressed in a corresponding language
  – the hypothesis is verified using testing data
• background knowledge: provides information about the context (properties of the environment)
• learning algorithm: searches the space of hypotheses for a consistent and complete hypothesis; the space is restricted by introducing bias

Goal of inductive ML
Suggest a hypothesis characterizing a concept (= a set of objects) in a given domain, where the concept is described only implicitly through a limited set of classified examples E+ and E-.
The hypothesis:
• has to cover E+ while avoiding E-
• has to be applicable to objects that belong to neither E+ nor E-.

Basic notions
• $\Omega$: the domain of the concept K, i.e. $K \subseteq \Omega$.
• E: a set of training examples, complemented by a classification, i.e. a function $cl: E \to \{\text{yes}, \text{no}\}$.
• E+ denotes all elements of E classified as yes; E- denotes those classified as no.
• E+ and E- form a disjoint cover of the set E.

Example 1
„Computer game“: is there a quick way to distinguish a friendly robot from the others?
[Figure: friendly robots and unfriendly robots]

Concept Language and Background Knowledge
• Examples of concept languages:
  – a set of real or idealised examples, expressed in the object language, that represent each of the concepts learned (Nearest Neighbour)
  – attribute-value pairs (propositional logic)
  – relational concepts (first-order logic)
• One can extend the concept language with user-defined concepts or background knowledge (BK).
• BK plays an important role in Inductive Logic Programming (ILP).
• The use of certain BK predicates may be a necessary condition for learning the right hypothesis.
• Redundant or irrelevant BK slows down learning.
Example 1: hypothesis and its testing
H1 in the form of a decision tree:
if neck(r)
  = bow     then „friendly“
  = nothing then if head_shape(r) = triangle then „friendly“
                 else „unfriendly“
  = tie     then if body_shape(r) = square then „unfriendly“
                 else if head_shape(r) = circle then „friendly“
                 else „unfriendly“

Test examples as classified by H1:
Head shape   Smiling face   Neck      Body shape   Holding   Friendly
circle       no             tie       circle       sword     yes
triangle     yes            nothing   square       nothing   yes

Example 1: hypothesis and its testing
H2 using the binary relation equal „=“:
if head_shape(r) = body_shape(r) then „friendly“ else „unfriendly“

Test examples as classified by H2:
Head shape   Smiling face   Neck      Body shape   Holding   Friendly
circle       no             tie       circle       sword     yes
triangle     yes            nothing   square       nothing   no

H1 and H2 both classify the training set correctly, but their classifications differ on the test set.

Hypothesis: an attempt at a formal description
Both the examples and the hypothesis have to be specified in a language. A hypothesis has the form of a formula $\varphi(X)$ with a single free variable X.
The extension of a hypothesis $\varphi(X)$ with respect to the domain $\Omega$ is the set of all elements of $\Omega$ that satisfy $\varphi$, i.e.
$\mathrm{Ext}_\Omega(\varphi) = \{\, o \in \Omega : \varphi(o) \text{ holds} \,\}$

Properties of a hypothesis
• a hypothesis is complete iff it covers all positive examples, i.e. $E^+ \subseteq \mathrm{Ext}_\Omega(\varphi)$
• a hypothesis is consistent iff it covers no negative examples, i.e. $\mathrm{Ext}_\Omega(\varphi) \cap E^- = \emptyset$
• a hypothesis is correct iff it is complete and consistent

How many correct hypotheses can be designed for a fixed training set E?
• Fact: the number of possible concepts is much larger than the number of possible hypotheses (formulas).
• Consequence: most concepts cannot be characterized by a corresponding hypothesis, so we have to accept hypotheses that are only „approximately correct“.
• Uniqueness of an „approximately correct“ hypothesis cannot be ensured.
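The two hypotheses and the completeness/consistency definitions above can be tried out in a few lines of Python. This is only a sketch: robots are represented as dictionaries, the two test robots are taken from the tables in the slides, and the small training set `E_PLUS` / `E_MINUS` is hypothetical (the original training robots were shown only as pictures). Treating an unknown neck value as „unfriendly“ in `h1` is also an assumption.

```python
def h1(r):
    """H1: decision tree over neck, head_shape and body_shape."""
    if r["neck"] == "bow":
        return True
    if r["neck"] == "nothing":
        return r["head_shape"] == "triangle"
    if r["neck"] == "tie":
        if r["body_shape"] == "square":
            return False
        return r["head_shape"] == "circle"
    return False  # assumption: any other neck value counts as unfriendly


def h2(r):
    """H2: friendly iff head shape equals body shape."""
    return r["head_shape"] == r["body_shape"]


def extension(h, domain):
    """All objects of the domain that satisfy hypothesis h."""
    return [o for o in domain if h(o)]


def is_complete(h, positives):
    """Complete: every positive example is covered."""
    return all(h(o) for o in positives)


def is_consistent(h, negatives):
    """Consistent: no negative example is covered."""
    return not any(h(o) for o in negatives)


def is_correct(h, positives, negatives):
    return is_complete(h, positives) and is_consistent(h, negatives)


# Hypothetical training set (not from the slides).
E_PLUS = [
    {"neck": "bow", "head_shape": "square", "body_shape": "square"},
    {"neck": "tie", "head_shape": "circle", "body_shape": "circle"},
]
E_MINUS = [
    {"neck": "tie", "head_shape": "circle", "body_shape": "square"},
    {"neck": "nothing", "head_shape": "square", "body_shape": "triangle"},
]

# The two test robots from the tables in the slides.
TEST_ROBOTS = [
    {"head_shape": "circle", "smiling": "no", "neck": "tie",
     "body_shape": "circle", "holding": "sword"},
    {"head_shape": "triangle", "smiling": "yes", "neck": "nothing",
     "body_shape": "square", "holding": "nothing"},
]

for name, h in (("H1", h1), ("H2", h2)):
    print(name, "correct on training set:", is_correct(h, E_PLUS, E_MINUS))
    print(name, "test predictions:", [h(r) for r in TEST_ROBOTS])
```

Both hypotheses are correct on this training set, yet they disagree on the second test robot, which is the situation the slides describe.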
Choice of a hypothesis and Ockham's razor
William of Ockham recommends a way to compare hypotheses:
• „Entia non sunt multiplicanda praeter necessitatem“ (entities should not be multiplied beyond necessity)
• Einstein: „… the language should not be simpler than necessary.“

Machine Learning Biases
• The concept/hypothesis language specifies the language bias, which limits the set of all concepts/hypotheses that can be expressed/considered/learned.
• The preference bias allows us to decide between two hypotheses (even if they both classify the training data equally well).
• The search bias defines the order in which hypotheses will be considered.
  – Important if one does not search the whole hypothesis space.

Preference Bias, Search Bias & Version Space
• Hypotheses are partially ordered.
• The version space approach searches for the subset of hypotheses that have zero training error.
[Figure: version space between the most general concept and the most specific concept, with positive (+) and negative (-) examples]

Types of learning
• skill refinement (swimming, biking, ...)
• knowledge acquisition:
  – Rote Learning (chess, checkers): the aim is to find an appropriate heuristic function evaluating the current state of the game, e.g. the MIN-MAX approach
  – Case-Based Reasoning: past experience is stored in a database. To solve a new problem, the system searches the database for the closest (most similar) case; its solution is then modified for the current problem
  – Advice Taking: learning to „interpret“ or „operationalize“ abstract advice, i.e. searching for „applicability conditions“
  – Induction, Difference Analysis: candidate-elimination or version space approach, decision tree induction, etc.
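The version-space idea mentioned above (the subset of hypotheses with zero training error, partially ordered by generality) can be illustrated by brute-force enumeration. The sketch below is not the candidate-elimination algorithm; it simply lists every hypothesis in a tiny, assumed hypothesis language: conjunctions over two robot attributes, where each attribute is either fixed to a value or left as a wildcard `?`. The two training examples are hypothetical.

```python
from itertools import product

WILDCARD = "?"

# Assumed language bias: conjunctions over these two attributes only.
ATTRS = {
    "has_tie": ["no", "yes"],
    "holding": ["sword", "balloon", "flag"],
}


def covers(h, example):
    """A conjunctive hypothesis covers an example if every fixed value matches."""
    return all(v == WILDCARD or example[a] == v for a, v in h.items())


def version_space(examples):
    """All hypotheses of the language with zero error on (example, label) pairs."""
    names = list(ATTRS)
    choices = [[WILDCARD] + ATTRS[a] for a in names]
    result = []
    for combo in product(*choices):
        h = dict(zip(names, combo))
        if all(covers(h, ex) == label for ex, label in examples):
            result.append(h)
    return result


def more_general_or_equal(g, h):
    """Partial order on hypotheses: g covers everything that h covers."""
    return all(g[a] == WILDCARD or g[a] == h[a] for a in g)


def general_boundary(vs):
    """Members of vs with no strictly more general member of vs above them."""
    return [h for h in vs
            if not any(g != h and more_general_or_equal(g, h) for g in vs)]


def specific_boundary(vs):
    """Members of vs with no strictly more specific member of vs below them."""
    return [h for h in vs
            if not any(g != h and more_general_or_equal(h, g) for g in vs)]


# Hypothetical training data: one positive and one negative example.
TRAINING = [
    ({"has_tie": "yes", "holding": "balloon"}, True),
    ({"has_tie": "no", "holding": "sword"}, False),
]

VS = version_space(TRAINING)
print("version space:", VS)
print("most general:", general_boundary(VS))
print("most specific:", specific_boundary(VS))
```

With more attributes the enumeration grows exponentially, which is why a search bias matters whenever the whole hypothesis space cannot be searched.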
Decision tree induction
Given: training examples uniformly described by a single set of attributes and classified into a small set of classes (most often into 2 classes: positive vs. negative examples)
Find: a decision tree that can classify new examples
Simple example: robots described by 5 discrete attributes and classified into 2 classes (friendly, unfriendly)
• Is_smiling {no, yes}
• Holding {sword, balloon, flag}
• Has_tie {no, yes}
• Head_shape {round, square, octagon}
• Body_shape {round, square, octagon}

Class        Is_smiling   Holding   Has_tie   Head_shape   Body_shape
friendly     yes          balloon   yes       square       square
friendly     yes          flag      yes       octagon      octagon
unfriendly   yes          sword     yes       round        octagon
unfriendly   yes          sword     no        square       octagon
unfriendly   no           sword     no        octagon      round
unfriendly   no           flag      no        round        octagon

TDIDT: Top-Down Induction of Decision Trees
given: S ... the set of classified examples
goal: design a decision tree DT ensuring the same classification as S
1. The root is denoted by S.
2. Find the "best" attribute at to be used for splitting the current set S.
3. Split the set S into the subsets S1, S2, ..., Sn according to the value of at (all examples in the subset Si have the same value at = vi). Each subset denotes a node of the DT.
4. For each Si do: if all examples in Si belong to the same class, create a leaf labelled with that class; otherwise go to step 1 with S = Si.

TDIDT: How to choose the "best" attribute?
Minimize the (Shannon) entropy
$H(S_i) = -p_i^{+} \log p_i^{+} - p_i^{-} \log p_i^{-}$
where $p_i^{+}$ is the probability that a random example in $S_i$ is positive (and $p_i^{-}$ that it is negative), estimated by the relative frequency.
Let the attribute at split S into the subsets S1, S2, ..., Sn. The entropy of this system is defined as
$H(S, at) = \sum_{i=1}^{n} P(S_i)\, H(S_i)$
where $P(S_i)$ is the probability of the event $S_i$, approximated by the relative size $|S_i| / |S|$.
Choose the attribute at with the minimal $H(S, at)$.

Learning to fly simulator F16 [Samuel, 95]
Design an automatic controller for the F16 for the following complex task:
1. Take off and climb to an altitude of 2000 feet.
2. Fly 32000 feet north.
3. Turn right 330°.
4. When 42000 feet from the starting point (in the N-S direction), turn left and head towards the starting point; the turn is finished when the course is between 140° and 180°.
5. Adjust the flight direction so that it is parallel to the landing course; tolerance 5° for flight direction and 10° for wing twist with respect to the horizon.
6. Descend and move towards the start of the landing path.
7. Land.
Training data: 3 skilled pilots performed the assigned mission, each 30 times. Each flight is described by 1000 vectors (a total of 90000 training examples) characterizing:
• the position and state of the plane
• the pilot's control action

Learning to fly simulator F16 [Samuel, 95]
Position and state:
• on_ground boolean: is the plane on the ground?
• g_limit boolean: acceleration limit exceeded?
• wing_stall (is the plane stable?), twist (integer: 0°-360°, wings with respect to the horizon)
• elevation (angle of the body with respect to the horizon), azimuth, roll_speed (wing deflection), elevation_speed, azimuth_speed, airspeed, climbspeed, E/W distance, N/S distance, fuel (weight of current supply)
Control:
• rollers and elevator: position of horizontal/vertical deflection
• thrust integer: 0-100%, force
• flaps integer: 0°, 10° or 20°, wing twist
Each of the 7 phases calls for a specific type of control. The training data are divided into 7 disjoint sets, which are used to design specific decision trees (independently for each task phase and each control action). Control is provided by 7 × 4 = 28 decision trees.
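Returning to the TDIDT attribute-selection step described earlier: the entropy criterion can be checked on the six-robot table from the decision tree induction slide. The sketch below computes H(S, at) for each of the five attributes and picks the one with the smallest value. The logarithm base is not stated in the slides; base 2 is assumed here. The function names are illustrative only.

```python
from collections import Counter, defaultdict
from math import log2

ATTRIBUTES = ["is_smiling", "holding", "has_tie", "head_shape", "body_shape"]

# The six robots from the table: (class, then the five attribute values).
ROWS = [
    ("friendly", "yes", "balloon", "yes", "square", "square"),
    ("friendly", "yes", "flag", "yes", "octagon", "octagon"),
    ("unfriendly", "yes", "sword", "yes", "round", "octagon"),
    ("unfriendly", "yes", "sword", "no", "square", "octagon"),
    ("unfriendly", "no", "sword", "no", "octagon", "round"),
    ("unfriendly", "no", "flag", "no", "round", "octagon"),
]
ROBOTS = [dict(zip(["class"] + ATTRIBUTES, row)) for row in ROWS]


def entropy(examples):
    """Shannon entropy of the class distribution (base 2 is an assumption)."""
    total = len(examples)
    counts = Counter(ex["class"] for ex in examples)
    # p * log2(1/p) equals -p * log2(p); classes with zero count never appear.
    return sum((c / total) * log2(total / c) for c in counts.values())


def split_entropy(examples, attribute):
    """H(S, at): entropy of the subsets S_i weighted by |S_i| / |S|."""
    subsets = defaultdict(list)
    for ex in examples:
        subsets[ex[attribute]].append(ex)
    total = len(examples)
    return sum(len(s) / total * entropy(s) for s in subsets.values())


def best_attribute(examples, attributes):
    """The attribute with the minimal H(S, at)."""
    return min(attributes, key=lambda a: split_entropy(examples, a))


for a in ATTRIBUTES:
    print(f"H(S, {a}) = {split_entropy(ROBOTS, a):.4f}")
print("best attribute:", best_attribute(ROBOTS, ATTRIBUTES))
```

On this table the split on `holding` leaves only the two `flag` robots mixed, so it has the lowest weighted entropy and would be chosen for the root.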
Tasks addressed by ML applications
• Classification/prediction
  – diagnosis (troubleshooting motor pumps, medicine, ..., SKICAT: astronomical cataloguing)
  – execution/control (GASOIL: separation of hydrocarbons)
  – configuration/design (Siemens: equipment configuration, Boeing)
  – language understanding
  – vision and speech
  – planning and scheduling
• Why? A significant speed-up of development and maintenance
  – 180 man-years to develop the expert system XCON with 8000 rules; 30 man-years needed for maintenance
  – 1 man-year to develop BP GASOIL (ML-based) with 2800 rules; 0.1 man-years needed for maintenance