Machine learning 1 by keara


Machine Learning
Basic definitions:
• concept: often described implicitly ("good politician") using
examples, i.e. training data
• hypothesis: an attempt to describe the concept explicitly
    – concept and hypothesis are each expressed in a corresponding language
    – the hypothesis is verified using testing data
• background knowledge provides information about the context
(properties of the environment)
• the learning algorithm searches the space of hypotheses to find
a consistent and complete hypothesis; the space is restricted by
introducing bias
Machine Learning 2                                                       1
Goal of inductive ML
Suggest a hypothesis characterizing a concept in a given domain
(= the set of objects in this domain), where the concept is
implicitly described through a limited set of classified
examples E+ and E-.
The hypothesis:
• has to cover E+ while avoiding E-
• has to be applicable to objects which belong to neither E+ nor E-.
Basic notions
• Ω: the domain of the concept K, i.e. K ⊆ Ω.
• E: a set of training examples, complemented by a classification, i.e. a
  function cl: E --> {yes, no}.
• E+ denotes all elements of E classified as yes; E- denotes those classified as no.
• E+ and E- are a disjoint cover of the set E.
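
The notions above can be illustrated with a minimal Python sketch. The toy set E (integers) and the classification cl ("even number") are hypothetical and only serve to show how E+ and E- arise from cl and why they form a disjoint cover of E.

```python
from typing import Callable, Hashable, Iterable, Set, Tuple


def split_examples(
    E: Iterable[Hashable], cl: Callable[[Hashable], str]
) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Split the training set E into (E_plus, E_minus) using the classification cl."""
    E = set(E)
    E_plus = {e for e in E if cl(e) == "yes"}
    E_minus = {e for e in E if cl(e) == "no"}
    return E_plus, E_minus


# Hypothetical toy data: the examples are integers, the concept is "even number".
E = {1, 2, 3, 4, 5, 6}


def cl(e: int) -> str:
    return "yes" if e % 2 == 0 else "no"


E_plus, E_minus = split_examples(E, cl)

# Disjoint cover: the two sets do not overlap and together give back E.
is_disjoint = E_plus.isdisjoint(E_minus)
is_cover = (E_plus | E_minus) == E
```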

Example 1 "computer game": Is there a quick way to distinguish a friendly
robot from the others?

[Figure: two groups of robots, labelled "Friendly r." and "Unfriendly r."]

Concept Language and Background Knowledge
• Examples of concept language:
    – a set of real or idealised examples, expressed in the object language, that
      represent each of the concepts learned (Nearest Neighbour)
    – attribute-value pairs (propositional logic)
    – relational concepts (first-order logic)

• One can extend the concept language with user-defined
  concepts or background knowledge (BK).
    – BK plays an important role in Inductive Logic Programming (ILP).
    – The use of certain BK predicates may be a necessary condition for
      learning the right hypothesis.
    – Redundant or irrelevant BK slows down the learning.

Example 1: hypothesis and its testing
Head shape    Smiling face  Neck          Body shape    Holding       Friendly
circle        no            tie           circle        sword         yes
triangle      yes           nothing       square        nothing       yes

H1 in the form of a decision tree:
if neck(r) = bow     then "friendly"
           = nothing then
                     if head_shape(r) = triangle then "friendly"
                                                 else "unfriendly"
           = tie     then
                     if body_shape(r) = square then "unfriendly"
                     else if head_shape(r) = circle then "friendly"
                                                    else "unfriendly"
Example 1: hypothesis and its testing (continued)

H2 using the binary relation equal "=":

if head_shape(r) = body_shape(r) then "friendly"
                                 else "unfriendly"

Head shape    Smiling face  Neck          Body shape    Holding       Friendly
circle        no            tie           circle        sword         yes
triangle      yes           nothing       square        nothing       no

H1 and H2 both classify the training set correctly, but their
classifications differ on the test set: the second test robot is friendly
according to H1 and unfriendly according to H2.
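
As a rough check, the two hypotheses can be written as small Python functions and applied to the two test robots from the tables. The dictionary keys and the ValueError for neck values that H1 does not mention are assumptions made for this sketch; the training set itself is not reproduced in the text, so only the test robots are used.

```python
def h1(r):
    """Decision tree H1; r maps attribute names to values."""
    neck = r["neck"]
    if neck == "bow":
        return "friendly"
    if neck == "nothing":
        return "friendly" if r["head_shape"] == "triangle" else "unfriendly"
    if neck == "tie":
        if r["body_shape"] == "square":
            return "unfriendly"
        return "friendly" if r["head_shape"] == "circle" else "unfriendly"
    # Assumption: H1 as given only has branches for bow / nothing / tie.
    raise ValueError(f"H1 has no branch for neck={neck!r}")


def h2(r):
    """Hypothesis H2: friendly iff head shape equals body shape."""
    return "friendly" if r["head_shape"] == r["body_shape"] else "unfriendly"


# The two test robots from the tables (attribute names chosen for this sketch).
TEST_ROBOTS = [
    {"head_shape": "circle", "smiling": "no", "neck": "tie",
     "body_shape": "circle", "holding": "sword"},
    {"head_shape": "triangle", "smiling": "yes", "neck": "nothing",
     "body_shape": "square", "holding": "nothing"},
]

h1_labels = [h1(r) for r in TEST_ROBOTS]
h2_labels = [h2(r) for r in TEST_ROBOTS]

# Indices of the test robots on which the two hypotheses disagree.
disagreements = [i for i, (a, b) in enumerate(zip(h1_labels, h2_labels)) if a != b]
```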
Hypothesis: an attempt at a formal description

Both the examples and the hypothesis have to be specified in a
  language. A hypothesis has the form of a formula φ(X) with
  a single free variable X.
Let us define the extension Ext_φ of a hypothesis φ(X) wrt. the
  domain Ω as the set of all elements of Ω which meet the
  condition φ, i.e. Ext_φ = {o ∈ Ω : φ(o) holds}.

Properties of a hypothesis
• hypothesis φ is complete iff E+ ⊆ Ext_φ
• φ is consistent iff it covers no negative examples, i.e.
  Ext_φ ∩ E- = ∅
• φ is correct iff it is complete and consistent
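
The definitions above translate almost literally into set operations. The following Python sketch uses a hypothetical finite domain (the integers 1 to 10) and hypothetical example sets; a hypothesis is modelled as a Boolean function on domain objects.

```python
def extension(phi, domain):
    """All objects of the domain that satisfy the hypothesis phi."""
    return {o for o in domain if phi(o)}


def is_complete(phi, domain, e_plus):
    """Complete: every positive example lies in the extension."""
    return e_plus <= extension(phi, domain)


def is_consistent(phi, domain, e_minus):
    """Consistent: the extension contains no negative example."""
    return extension(phi, domain).isdisjoint(e_minus)


def is_correct(phi, domain, e_plus, e_minus):
    """Correct: complete and consistent."""
    return is_complete(phi, domain, e_plus) and is_consistent(phi, domain, e_minus)


# Hypothetical toy setting, not taken from the slides.
DOMAIN = set(range(1, 11))
E_PLUS = {2, 4, 8}
E_MINUS = {3, 5}


def phi_even(o):
    return o % 2 == 0


def phi_power_of_two(o):
    return o in {1, 2, 4, 8}


def phi_greater_than_one(o):
    return o > 1


def phi_small(o):
    return o < 5


results = {
    "even": is_correct(phi_even, DOMAIN, E_PLUS, E_MINUS),
    "power_of_two": is_correct(phi_power_of_two, DOMAIN, E_PLUS, E_MINUS),
    "greater_than_one": is_correct(phi_greater_than_one, DOMAIN, E_PLUS, E_MINUS),
    "small": is_correct(phi_small, DOMAIN, E_PLUS, E_MINUS),
}
```

In this toy setting two different hypotheses ("even" and "power_of_two") are both correct for the same training data, while "greater_than_one" is complete but not consistent and "small" is neither.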
How many correct hypotheses can be designed for a fixed training set E?
• Fact: the number of possible concepts is much larger than the number of
  possible hypotheses (formulas).
• Consequence: most concepts cannot be characterized by a corresponding
  hypothesis, so we have to accept hypotheses that are "approximately correct".
• Uniqueness of an "approximately correct" hypothesis
  cannot be ensured.

Choice of a hypothesis and Ockham's razor

William of Ockham recommends a way to compare hypotheses:
• "Entia non sunt multiplicanda praeter necessitatem."
• Einstein: "… the language should not be simpler than necessary."

Machine Learning Biases
• The concept/hypothesis language specifies the
  language bias, which limits the set of all
  concepts/hypotheses that can be expressed.
• The preference bias allows us to decide between
  two hypotheses (even if they both classify the
  training data equally well).
• The search bias defines the order in which
  hypotheses will be considered.
     – Important if one does not search the whole hypothesis space.

   Preference Bias, Search Bias & Version Space

Hypotheses are partially ordered.
Version space approach: searches for the subset of hypotheses that have zero
  training error.
[Figure: positive (+) and negative (_) examples, with the boundaries marked "most gen. concept" and "most spec. concept"]
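
A brute-force sketch of the version space idea in Python, under stated assumptions: hypotheses are conjunctions of attribute values with "?" as a wildcard, and the two attributes and two training examples are hypothetical. Enumerating the whole space is feasible only for tiny languages like this one.

```python
from itertools import product

# Hypothetical attribute language for this sketch.
ATTRS = {"has_tie": ["no", "yes"], "holding": ["sword", "balloon"]}


def covers(h, x):
    """A conjunctive hypothesis covers x if every non-wildcard value matches."""
    return all(v == "?" or v == x[a] for a, v in h.items())


def version_space(examples):
    """All hypotheses with zero training error on (object, is_positive) pairs."""
    names = list(ATTRS)
    space = []
    for values in product(*[ATTRS[a] + ["?"] for a in names]):
        h = dict(zip(names, values))
        if all(covers(h, x) == label for x, label in examples):
            space.append(h)
    return space


def more_general_or_equal(h1, h2):
    """Partial order: h1 is at least as general as h2."""
    return all(h1[a] == "?" or h1[a] == h2[a] for a in h1)


EXAMPLES = [
    ({"has_tie": "yes", "holding": "balloon"}, True),   # positive example
    ({"has_tie": "no", "holding": "sword"}, False),     # negative example
]

vs = version_space(EXAMPLES)
most_specific = {"has_tie": "yes", "holding": "balloon"}
general_a = {"has_tie": "yes", "holding": "?"}
general_b = {"has_tie": "?", "holding": "balloon"}
```

Here the version space contains three hypotheses: one most specific hypothesis and two more general ones that are not comparable with each other, which is why the ordering is only partial.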

Types of learning
• skill refinement (swimming, biking, ...)

• knowledge acquisition
    – Rote Learning (chess, checkers): the aim is to find an appropriate
      heuristic function evaluating the current state of the game, e.g. the
      MIN-MAX approach
    – Case-Based Reasoning: past experience is stored in a database. To
      solve a new problem, the system searches the DB to find "the closest
      (the most similar) case"; its solution is modified for the current problem
    – Advice Taking: learning to "interpret" or "operationalize"
      abstract advice, i.e. a search for "applicability conditions"
• Induction. Difference Analysis: candidate-elimination or version
  space approach, decision tree induction etc.

Decision tree induction
Given: training examples uniformly described by a single set
  of attributes and classified into a small set of
  classes (most often into 2 classes: positive vs. negative)
Find: a decision tree that makes it possible to classify new examples

Simple example: robots described by 5 discrete attributes and classified
   into 2 classes (friendly, unfriendly)
• Is_smiling {no, yes}
• Holding {sword, balloon, flag}
• Has_tie {no, yes}
• Head_shape {round, square, octagon}
• Body_shape {round, square, octagon}

Class.      Is_smiling  Holding  Has_tie  Head_shape  Body_shape
friendly    yes         balloon  yes      square      square
friendly    yes         flag     yes      octagon     octagon
unfriendly  yes         sword    yes      round       octagon
unfriendly  yes         sword    no       square      octagon
unfriendly  no          sword    no       octagon     round
unfriendly  no          flag     no       round       octagon

TDIDT: Top-Down Induction of Decision Trees

given: S ... the set of classified examples
goal: design a decision tree DT ensuring the same classification
    as S
1. The root is denoted by S
2. Find the "best" attribute at to be used for splitting the
    current set S
3. Split the set S into the subsets S1, S2, ..., Sn wrt. value of at
    (all examples in the subset Si have the same value at = vi ).
    This set denotes a node of the DT
4. For each Si do:
        If all examples in Si belong to the same class  or
        then create a leaf with the same label,
        else go to 1 with S = Si
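
The recursive procedure above can be sketched in Python as follows. This is an illustration, not a reference implementation: the attribute chooser is passed in as a parameter, the demo uses a deliberately naive placeholder chooser (first remaining attribute), and the majority-class leaf used when no attributes remain is an assumption added for this sketch. The data are the six robots from the table.

```python
from collections import Counter


def tdidt(examples, attributes, choose):
    """examples: list of (attribute_dict, class_label). Returns a label or a nested dict."""
    labels = [c for _, c in examples]
    if len(set(labels)) == 1:            # all examples share one class: leaf
        return labels[0]
    if not attributes:                   # assumption: fall back to the majority class
        return Counter(labels).most_common(1)[0][0]
    at = choose(examples, attributes)    # step 2: pick the "best" attribute
    subsets = {}                         # step 3: split S by the value of at
    for x, c in examples:
        subsets.setdefault(x[at], []).append((x, c))
    rest = [a for a in attributes if a != at]
    # step 4: recurse into every subset
    return {at: {v: tdidt(s, rest, choose) for v, s in subsets.items()}}


def classify(tree, x):
    """Follow the branches of the tree until a leaf label is reached."""
    while isinstance(tree, dict):
        at = next(iter(tree))
        tree = tree[at][x[at]]
    return tree


ATTRS = ["has_tie", "holding", "is_smiling", "head_shape", "body_shape"]
ROWS = [  # class, is_smiling, holding, has_tie, head_shape, body_shape
    ("friendly", "yes", "balloon", "yes", "square", "square"),
    ("friendly", "yes", "flag", "yes", "octagon", "octagon"),
    ("unfriendly", "yes", "sword", "yes", "round", "octagon"),
    ("unfriendly", "yes", "sword", "no", "square", "octagon"),
    ("unfriendly", "no", "sword", "no", "octagon", "round"),
    ("unfriendly", "no", "flag", "no", "round", "octagon"),
]
KEYS = ["is_smiling", "holding", "has_tie", "head_shape", "body_shape"]
ROBOTS = [(dict(zip(KEYS, row[1:])), row[0]) for row in ROWS]


def first_attribute(examples, attributes):
    """Placeholder chooser for the demo: take the first remaining attribute."""
    return attributes[0]


tree = tdidt(ROBOTS, ATTRS, first_attribute)
reproduces_training_labels = all(classify(tree, x) == c for x, c in ROBOTS)
```

With this placeholder chooser the tree first splits on has_tie and then on holding; a different chooser will in general produce a different tree that still reproduces the classification of S.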
   TDIDT: How to choose the "best" attribute?

minimize the entropy (Shannon)
H(Si) = - pi+ log pi+ - pi- log pi-

pi+ = the probability that a random example in Si is positive,
estimated by the relative frequency (pi- analogously for negative examples)

Let the attribute at split S into the subsets S1, S2, ..., Sn. The
     entropy of this system is defined as
                     H(S, at) = Σ_{i=1}^{n} P(Si) H(Si)
where P(Si) is the probability of the event Si, approximated by the relative
     size |Si| / |S|

Choose at with the minimal H(S, at)
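
A small Python sketch of this selection criterion, applied to the six robots from the table. The base of the logarithm is not stated in the text; base 2 is assumed here, and the convention 0 · log 0 = 0 is handled implicitly because only classes that actually occur in a subset are counted.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """H(Si) for the class labels of one subset (log base 2 is an assumption)."""
    n = len(labels)
    return sum(-(k / n) * log2(k / n) for k in Counter(labels).values())


def split_entropy(examples, at):
    """H(S, at): entropy of the subsets induced by attribute at, weighted by |Si| / |S|."""
    subsets = {}
    for x, c in examples:
        subsets.setdefault(x[at], []).append(c)
    n = len(examples)
    return sum(len(s) / n * entropy(s) for s in subsets.values())


KEYS = ["is_smiling", "holding", "has_tie", "head_shape", "body_shape"]
ROWS = [  # class, is_smiling, holding, has_tie, head_shape, body_shape
    ("friendly", "yes", "balloon", "yes", "square", "square"),
    ("friendly", "yes", "flag", "yes", "octagon", "octagon"),
    ("unfriendly", "yes", "sword", "yes", "round", "octagon"),
    ("unfriendly", "yes", "sword", "no", "square", "octagon"),
    ("unfriendly", "no", "sword", "no", "octagon", "round"),
    ("unfriendly", "no", "flag", "no", "round", "octagon"),
]
ROBOTS = [(dict(zip(KEYS, row[1:])), row[0]) for row in ROWS]

scores = {at: split_entropy(ROBOTS, at) for at in KEYS}
best = min(scores, key=scores.get)

for at, h in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{at:12s} H(S, at) = {h:.4f}")
```

On this data the attribute holding gives the smallest H(S, at): the balloon and sword subsets are pure, and only the two flag examples remain mixed.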
Learning to fly an F16 simulator [Samuel, 95]
Design an automatic controller for the F16 for the following complex task:
1. Take off and climb to a height of 2000 feet
2. Fly 32000 feet north
3. Turn right 330°
4. When 42000 feet from the starting point (in the N-S direction), turn left and head
   towards the starting point; the turn is finished when the course is between
   140° and 180°.
5. Adjust the flight direction so that it is parallel to the landing course, with a tolerance of 5°
   for the flight direction and 10° for the wing twist wrt. the horizon
6. Decrease the height and move towards the start of the landing path
7. Land
Training data: 3 skilled pilots performed the assigned mission, each 30 times.
Each flight is described by 1000 vectors (a total of 90000 training
   examples) characterizing:
   • the position and state of the plane
   • the pilot's control action

Learning to fly an F16 simulator [Samuel, 95] (continued)
Position and state
• on_ground        boolean: is the plane on the ground?
• g_limit          boolean: acceleration limit exceeded?
• wing_stall (is the plane stable?), twist (integer: 0°-360°, wings wrt. the horizon)
• elevation (angle of the body wrt. the horizon), azimuth, roll_speed (wings deflection),
   elevation_speed, azimuth_speed, airspeed, climbspeed, E/W distance, N/S
   distance, fuel (weight of current supply)
• rollers and elevator: position of horizontal/vertical deflection
• thrust           integer: 0-100%, force
• flaps            integer: 0°, 10° or 20°, wing twist

Each of the 7 phases calls for a specific type of control.
The training data are divided into 7 disjoint sets, which are used to design specific
   decision trees (independently for each task phase and each control action).
   Control is ensured by 7 * 4 decision trees.
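
One way to picture the resulting controller is a table of trees indexed by (phase, control action). The Python sketch below is purely illustrative: reading the four control actions as rollers, elevator, thrust and flaps is an assumption based on the attribute list, and the "trees" are placeholder functions, not learned decision trees.

```python
PHASES = range(1, 8)                                   # the 7 phases of the task
CONTROLS = ["rollers", "elevator", "thrust", "flaps"]  # assumed 4 control actions


def placeholder_tree(state):
    """Stand-in for a learned decision tree: always returns a neutral setting."""
    return 0


# One independently built tree per (phase, control action) pair.
TREES = {(phase, control): placeholder_tree for phase in PHASES for control in CONTROLS}


def control_step(phase, state):
    """Query the four trees of the current phase to get one setting per control."""
    return {control: TREES[(phase, control)](state) for control in CONTROLS}


n_trees = len(TREES)
n_training_vectors = 3 * 30 * 1000                     # pilots * flights * vectors per flight
settings = control_step(3, {"airspeed": 250, "on_ground": False})
```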
Tasks addressed by ML applications
• Classification/prediction
   – diagnosis (troubleshooting motor pumps, medicine, ..., SKICAT:
     astronomical cataloguing)
   – execution/control (GASOIL: separation of hydrocarbons)
   – configuration/design (Siemens: equipment configuration, Boeing)
   – language understanding
   – vision and speech
   – planning and scheduling
• Why? Significant speed-up of development and maintenance
     – 180 man-years to develop ES XCON with 8000 rules, 30 m-y needed for
       maintenance
     – 1 man-year to develop BP GASOIL (ML-based) with 2800 rules, 0.1 m-y
       needed for maintenance
