					Artificial Intelligence Methods
G52AIM
University of Nottingham
Malaysia campus

Andrzej Bargiela    2007/2008
   Genetic Programming (GP)
   Automatic programming
   Program synthesis or
   Program induction

     One of the central challenges of computer science is:
       To get a computer to do what needs to be done,
                without telling it how to do it.
In essence, this is the beginning of computer programs that program
                             themselves.

            Genetic programming is the application of
          evolutionary theory to computer programming.
Introduction:
What is a Computer Program?
• A computer program is an entity that receives inputs, performs
  computations, and produces outputs.
• Computer programs can:
    • perform basic arithmetic and conditional computations on variables of
      various types (including integer, floating-point, and Boolean
      variables),
    • perform iterations and recursions,
    • store intermediate results in memory,
    • organize groups of operations into reusable subroutines,
    • pass information to subroutines in the form of dummy variables
      (formal parameters),
    • receive information from subroutines in the form of return values,
      and
    • organize subroutines and a main program into a hierarchy.



2007/2008                    G52AIM Artificial Intelligence Methods
Introduction: Genetic Programming
(a branch of genetic algorithms)


• Genetic programming addresses this challenge by providing a
  method for automatically creating a working computer program
  from a high-level statement of the problem.

   Genetic programming is a domain-independent method that
    genetically breeds a population of computer programs to solve a
    problem.
   Genetic programming iteratively transforms a population of
    computer programs into a new generation of programs by
    applying analogs of naturally occurring genetic operations.
   The genetic operations include crossover (recombination),
    mutation, reproduction, and architecture-altering operations.




Introduction: GP Quick Overview
• Developed: USA in the 1990s
• Early names: J. Koza
• Typically applied to:
    • machine learning tasks (prediction, classification…)
• Attributed features:
    • competes with neural nets and the like
    • needs huge populations (thousands)
    • slow
• Special:
    • non-linear chromosomes: trees, graphs
    • mutation possible but not necessary (disputed!)

GP Technical Summary Tableau

Representation        Tree structures
Recombination         Exchange of subtrees
Mutation              Random change in trees
Parent selection      Fitness proportionate
Survivor selection    Generational replacement
Starting Point for GP
   A run of genetic programming is a competitive search
    among a diverse population of programs composed
    of the available functions and terminals.
   Genetic programming starts from a high-level
    statement of the requirements of a problem and
    attempts to produce a computer program that solves
    the problem.
   The human user communicates the high-level
    statement of the problem to the genetic
    programming system by performing five well-defined
    preparatory steps.

To Specify the GP Ingredient




5 Preparatory Steps of
Genetic Programming
(1) the set of terminals (e.g., the independent variables of the
      problem, zero-argument functions, and random constants) for
      each branch of the to-be-evolved program,

(2) the set of primitive functions for each branch of the to-be-
      evolved program,

(3) the fitness measure (for explicitly or implicitly measuring the
      fitness of individuals in the population),

(4) certain parameters for controlling the run, and

(5) the termination criterion and method for designating the
      result of the run.

Function Set & Terminal Set
(The important components: the alphabet from which programs are built)


• Identifying the function set and terminal set for a
  particular problem (or category of problems) is usually a
  straightforward process.
For some problems:
• The function set may consist of merely the arithmetic functions
  of addition, subtraction, multiplication, and division, as well as a
  conditional branching operator.
• The terminal set may consist of the program’s external inputs
  (independent variables) and numerical constants.
• Such a function set and terminal set is useful for a wide variety
  of problems, because it corresponds to the basic operations found
  in virtually every general-purpose digital computer.
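
The arithmetic function set and terminal set described on this slide could be written down roughly as follows. This is only a sketch: the names `FUNCTION_SET`, `TERMINAL_SET`, `protected_div` and `if_positive`, the variables `x` and `y`, and the chosen constants are all assumptions made for illustration. Protected division (returning 1.0 on a zero divisor) is a common GP convention, not something the slides specify.

```python
import operator


def protected_div(a, b):
    """Division that returns 1.0 on a zero divisor, so every evolved program stays executable."""
    return a / b if b != 0 else 1.0


def if_positive(cond, then_value, else_value):
    """Conditional branching operator: choose a value depending on the sign of cond."""
    return then_value if cond > 0 else else_value


# Function set: name -> (callable, number of arguments)
FUNCTION_SET = {
    "add": (operator.add, 2),
    "sub": (operator.sub, 2),
    "mul": (operator.mul, 2),
    "div": (protected_div, 2),
    "if_pos": (if_positive, 3),
}

# Terminal set: the program's external inputs plus a few numerical constants
TERMINAL_SET = ["x", "y", 1.0, 2.0]
```

Storing the arity next to each function lets a tree generator know how many children each function node needs.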



T&F and Fitness Measure
   The first two preparatory steps (Set of
    Functions and Terminals) define the
    search space
   whereas the fitness measure implicitly
    specifies the search’s desired goal.




Ways for Measuring Fitness
The fitness of a program may be measured in terms of:
1.   the amount of error between its output
     and the desired output,
2.   the amount of time (fuel, money, etc.) required to
     bring a system to a desired target state,
3.   the accuracy of the program in recognizing patterns
     or classifying objects into classes,
4.   the payoff that a game-playing program produces,
     or
5.   the compliance of a complex structure (such as an
     antenna, circuit, or controller) with user-specified
     design criteria.
6.   More…
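
As an illustration of the first measure in the list (error between actual and desired output), a minimal sketch might look like this. The target behaviour `x * x + 1`, the sample points, and the names `error_fitness`, `perfect` and `rough` are assumptions chosen for the example.

```python
def error_fitness(program, cases):
    """Sum of absolute errors over all fitness cases (lower is better)."""
    return sum(abs(program(x) - target) for x, target in cases)


# Hypothetical desired behaviour: y = x^2 + 1, sampled at five points
cases = [(x, x * x + 1) for x in range(-2, 3)]


def perfect(x):
    return x * x + 1


def rough(x):
    return x * x  # always off by exactly 1


print(error_fitness(perfect, cases))  # 0
print(error_fitness(rough, cases))    # 5
```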

T&F: Examples

   Arithmetic formula:   $2 \cdot \pi + \left( (x + 3) - \frac{y}{5 + 1} \right)$

   Logical formula:      $(x \wedge \mathrm{true}) \rightarrow ((x \vee y) \vee (z \leftrightarrow (x \wedge y)))$

   Program:              i = 1;
                         while (i < 20)
                         {
                             i = i + 1
                         }

   Trees are a universal form of representation.

T&F:
Tree based representation

   $2 \cdot \pi + \left( (x + 3) - \frac{y}{5 + 1} \right)$

   [Figure: parse tree of the arithmetic formula]




T&F:
Tree based representation

   $(x \wedge \mathrm{true}) \rightarrow ((x \vee y) \vee (z \leftrightarrow (x \wedge y)))$

   [Figure: parse tree of the logical formula]




T&F:
Tree based representation

   i = 1;
   while (i < 20)
   {
       i = i + 1
   }

   [Figure: parse tree of the program]




Credit Scoring: Problem
• Bank wants to distinguish good from bad loan applicants
• Model needed that matches historical data

ID      No of children   Salary   Marital status   OK?
ID-1    2                45000    Married          0
ID-2    0                30000    Single           1
ID-3    1                40000    Divorced         1
…
Credit Scoring:
Rule Generation
A possible model (NOC = number of children, S = salary):
  IF (NOC = 2) AND (S > 80000) THEN good ELSE bad
In general:
  IF formula THEN good ELSE bad
• The only unknown is the right formula, hence
  our search space (phenotypes) is the set of formulas.
• Natural fitness of a formula:
  the percentage of well-classified cases of the model it stands for.
• Natural representation of formulas (genotypes): parse trees.



T&F: Tree Rep. of a Rule
    IF (NOC = 2) AND (S > 80000) THEN good ELSE bad

Tree representation:

                 AND
               /     \
              =       >
             / \     / \
          NOC   2   S   80000
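
To make the link between the rule tree and its fitness concrete, the sketch below encodes the formula (NOC = 2) AND (S > 80000) as a nested tuple and scores it on the three historical rows shown in the credit scoring table. It assumes that OK? = 1 means a good applicant; the names `DATA`, `RULE`, `evaluate` and `fitness` are illustrative.

```python
# Historical rows: (number of children, salary, marital status, OK?)
DATA = [
    (2, 45000, "Married", 0),
    (0, 30000, "Single", 1),
    (1, 40000, "Divorced", 1),
]

# Parse tree of the formula (NOC = 2) AND (S > 80000)
RULE = ("AND", ("=", "NOC", 2), (">", "S", 80000))


def evaluate(node, row):
    """Evaluate a formula tree against one applicant (row maps variable names to values)."""
    if isinstance(node, tuple):
        op, left, right = node
        a, b = evaluate(left, row), evaluate(right, row)
        if op == "AND":
            return a and b
        if op == "=":
            return a == b
        if op == ">":
            return a > b
        raise ValueError(f"unknown operator {op!r}")
    if isinstance(node, str):
        return row[node]
    return node


def fitness(rule, data):
    """Percentage of cases classified correctly by IF rule THEN good ELSE bad."""
    correct = 0
    for noc, salary, _status, ok in data:
        predicted = 1 if evaluate(rule, {"NOC": noc, "S": salary}) else 0
        correct += predicted == ok
    return 100.0 * correct / len(data)


print(fitness(RULE, DATA))
```

On these three rows the example rule labels every applicant as bad, so it only matches the first row; a GP run would search for formulas that score higher.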

GP for Specific Problems
For many other problems, the
 ingredients include
 specialized functions and
 terminals
such as:

Programming a Mopping Robot
• If the goal is to get genetic programming to
  automatically program a robot to mop the
  entire floor of an obstacle-laden room,
  the human user must tell genetic
  programming what the robot is capable of
  doing.
• For example, the robot may be capable of executing
  functions such as moving, turning, and swishing the mop.
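
A toy version of such a problem-specific function set is sketched below. Everything here is assumed for illustration: the grid `ROOM`, the `Robot` class, the primitive names `move`, `turn` and `mop`, and the fitness (number of distinct cells mopped). For brevity the program is a flat list of primitive names rather than a tree.

```python
# Hypothetical room: '.' is free floor, '#' is an obstacle
ROOM = ["....",
        ".#..",
        "...."]
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # east, south, west, north as (row, col) steps


class Robot:
    def __init__(self):
        self.row, self.col, self.heading = 0, 0, 0  # start top-left, facing east
        self.mopped = set()

    def move(self):
        """Step forward unless a wall or obstacle blocks the way."""
        dr, dc = HEADINGS[self.heading]
        r, c = self.row + dr, self.col + dc
        if 0 <= r < len(ROOM) and 0 <= c < len(ROOM[0]) and ROOM[r][c] != "#":
            self.row, self.col = r, c

    def turn(self):
        """Turn 90 degrees clockwise."""
        self.heading = (self.heading + 1) % 4

    def mop(self):
        """Mop the cell the robot is standing on."""
        self.mopped.add((self.row, self.col))


def fitness(program):
    """Run a sequence of primitive names and count the distinct cells mopped."""
    robot = Robot()
    for name in program:
        getattr(robot, name)()
    return len(robot.mopped)


program = ["mop", "move", "mop", "move", "mop", "turn", "move", "mop"]
print(fitness(program))  # 4
```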

Synthesizing an Analog Electrical Circuit
   The function set may enable genetic
    programming to construct circuits from
    components such as transistors, capacitors,
    and resistors.
   Once the human user has identified the
    primitive ingredients for a problem of circuit
    synthesis, the function set and terminal set
    can be used to automatically synthesize an
    amplifier, computational circuit, active filter,
    voltage reference circuit, or any other circuit
    composed of these ingredients.

Synthesizing an Amplifier
• If the goal is to get genetic programming to
  automatically synthesize an amplifier, then
  the fitness function is the mechanism for
  telling genetic programming to synthesize a
  circuit that amplifies an incoming signal (as
  opposed to, say, a circuit that suppresses the
  low frequencies of an incoming signal or a
  circuit that computes the square root of the
  incoming signal).
Example: Driving a Car
• Consider a program that drives a car.
• There is no single ideal solution to driving a car.
• Some solutions drive safely at the expense of time, while others drive fast at a
  high safety risk.
• Therefore, driving a car consists of making compromises between speed and safety,
  as well as many other variables.
• In this case genetic programming will find a solution that attempts to
  balance a large number of variables as efficiently as possible.

• The program will find one solution for a smooth concrete highway, while it will
  find a totally different solution for a rough unpaved road.

• Generally speaking, genetic programming works best for several types of problems;
  problems like this one, with no single ideal solution, are one such type.



Control (Administrative) Parameters
(The fourth preparatory step)


   The fourth preparatory step entails specifying the
    control parameters for the run.
   The most important control parameter is the
    population size.

   Other control parameters include the probabilities of
    performing the genetic operations, the maximum size
    for programs, and other details of the run.




Initialisation
• Maximum initial depth of trees $D_{max}$ is set
• Full method (each branch has depth $= D_{max}$):
    • nodes at depth $d < D_{max}$ randomly chosen from function set $F$
    • nodes at depth $d = D_{max}$ randomly chosen from terminal set $T$
• Grow method (each branch has depth $\leq D_{max}$):
    • nodes at depth $d < D_{max}$ randomly chosen from $F \cup T$
    • nodes at depth $d = D_{max}$ randomly chosen from $T$
• Common GP initialisation: ramped half-and-half, where
  grow & full method each deliver half of initial population
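
The full and grow methods, and their combination, can be sketched as below. The function set, terminal set and all function names are assumptions for illustration. Varying the depth limit from 1 up to the maximum (the "ramped" part) follows common GP practice; the slide itself only states that each method delivers half of the population.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2}   # hypothetical function set F: name -> arity
TERMINALS = ["x", "y", 1]              # hypothetical terminal set T


def full(depth, d_max):
    """Full method: functions above d_max, terminals exactly at d_max."""
    if depth == d_max:
        return random.choice(TERMINALS)
    name = random.choice(list(FUNCTIONS))
    return (name,) + tuple(full(depth + 1, d_max) for _ in range(FUNCTIONS[name]))


def grow(depth, d_max):
    """Grow method: choose from F union T above d_max, terminals at d_max."""
    if depth == d_max:
        return random.choice(TERMINALS)
    pool = [("f", n) for n in FUNCTIONS] + [("t", t) for t in TERMINALS]
    kind, item = random.choice(pool)
    if kind == "t":
        return item
    return (item,) + tuple(grow(depth + 1, d_max) for _ in range(FUNCTIONS[item]))


def depth_of(tree):
    """Depth of a tree; a lone terminal has depth 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth_of(child) for child in tree[1:])


def ramped_half_and_half(pop_size, d_max):
    """Alternate full and grow while cycling the depth limit through 1..d_max."""
    population = []
    for i in range(pop_size):
        limit = 1 + (i // 2) % d_max
        method = full if i % 2 == 0 else grow
        population.append(method(0, limit))
    return population
```

Mixing the two methods over a range of depth limits gives an initial population with varied tree shapes and sizes.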


Termination
(The fifth preparatory step)


• The fifth preparatory step is to specify the termination criterion
  and the method of designating the result of the run.
   The termination criterion may include a maximum
    number of generations to be run as well as a
    problem-specific success predicate.
   In practice, one may manually monitor and manually
    terminate the run when the values of fitness for
    numerous successive best-of-generation individuals
    appear to have reached a plateau.
   The single best-so-far individual is then harvested
    and designated as the result of the run.
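
The criteria on this slide could be combined into a single check along the following lines. This is a sketch under assumptions: fitness is treated as an error (lower is better), and the function name, parameter names and default values are all illustrative. The plateau test automates what the slide describes as manual monitoring.

```python
def should_terminate(generation, best_fitness_history, max_generations=50,
                     target_error=0.0, plateau_length=10):
    """Return True when the run should stop; fitness is an error, so lower is better."""
    # Maximum number of generations reached
    if generation >= max_generations:
        return True
    # Problem-specific success predicate: the best individual is good enough
    if best_fitness_history and best_fitness_history[-1] <= target_error:
        return True
    # Plateau: best-of-generation fitness unchanged for plateau_length generations
    recent = best_fitness_history[-plateau_length:]
    if len(recent) == plateau_length and max(recent) - min(recent) < 1e-9:
        return True
    return False
```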

4 Steps for Running GP
(to solve a problem)


   1) Generate an initial population of random compositions of the
   functions and terminals of the problem (computer programs).
   2) Execute each program in the population and assign it a fitness value
   according to how well it solves the problem.
   3) Create a new population of computer programs:
         i) Copy the best existing programs.
         ii) Create new computer programs by mutation.
         iii) Create new computer programs by crossover (recombination).
      Steps 2 and 3 are repeated until the termination criterion is met.
   4) The best computer program that appeared in any generation, the
   best-so-far solution, is designated as the result of genetic programming
   [Koza 1992].
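
The four steps can be put together in a small symbolic-regression sketch. Several things here are assumptions rather than the method of the slides: the target function x² + x, the parameter values, the node limit that keeps trees small, and the use of tournament selection in place of fitness-proportionate selection. All names are illustrative.

```python
import operator
import random

FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMINALS = ["x", 1.0]
CASES = [(x, x * x + x) for x in range(-3, 4)]   # hypothetical target: x^2 + x


def random_tree(depth=3):
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    return (random.choice(list(FUNCTIONS)), random_tree(depth - 1), random_tree(depth - 1))


def run(tree, x):
    if isinstance(tree, tuple):
        return FUNCTIONS[tree[0]](run(tree[1], x), run(tree[2], x))
    return x if tree == "x" else tree


def error(tree):
    """Fitness: sum of absolute errors over the cases (lower is better)."""
    return sum(abs(run(tree, x) - y) for x, y in CASES)


def subtrees(tree, path=()):
    """Yield the path (tuple of child indices) to every node."""
    yield path
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from subtrees(tree[i], path + (i,))


def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]


def crossover(a, b):
    """Replace a random subtree of a with a random subtree of b."""
    donor = get(b, random.choice(list(subtrees(b))))
    return replace(a, random.choice(list(subtrees(a))), donor)


def mutate(tree):
    """Replace a random subtree with a freshly generated one."""
    return replace(tree, random.choice(list(subtrees(tree))), random_tree(2))


def tournament(population, k=3):
    return min(random.sample(population, k), key=error)


def genetic_programming(pop_size=100, generations=20, max_nodes=60):
    population = [random_tree() for _ in range(pop_size)]        # step 1
    best = min(population, key=error)
    for _ in range(generations):
        ranked = sorted(population, key=error)                   # step 2
        if error(ranked[0]) < error(best):
            best = ranked[0]
        if error(best) == 0:
            break
        new_population = ranked[: pop_size // 10]                # step 3 i: copy the best
        while len(new_population) < pop_size:
            if random.random() < 0.1:
                child = mutate(tournament(population))           # step 3 ii
            else:
                child = crossover(tournament(population), tournament(population))  # step 3 iii
            if sum(1 for _ in subtrees(child)) <= max_nodes:
                new_population.append(child)
        population = new_population
    return min(population + [best], key=error)                   # step 4: best-so-far
```

Calling `genetic_programming()` returns the best tree found; because the run is stochastic, the result differs between runs unless the random seed is fixed.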




Running Genetic Programming
   After the human user has performed the preparatory
    steps for a problem, the run of genetic programming
    can be launched.
   Once the run is launched, a series of well-defined,
    problem-independent executional steps (that is, the
    flowchart of genetic programming) is executed.
   Important Note: Genetic programming is problem-
    independent in the sense that the flowchart
    specifying the basic sequence of executional steps is
    not modified for each new run or each new problem.


Flowchart (Executional Steps) of
Genetic Programming

    There is usually no discretionary human intervention
     or interaction during a run of genetic programming
     (although a human user may exercise judgment as
     to whether to terminate a run).
    The flowchart shows the genetic operations of
1.   crossover (the flowchart shows the two-offspring version of the crossover operation),
2.   reproduction, and
3.   mutation as well as
4.   the architecture-altering operations.


[Figure: flowchart showing the executional steps of a run of genetic programming.]




Acknowledgements

Most of the lecture slides are adapted from the same module taught at the
Nottingham UK campus by Dr. Graham Kendall, Dr. Andrew Parkes and Dr. Rong Qu.
