Genetic Programming: An Introduction




      The Lunacy of Evolving Computer
                 Programs
Before we start, consider the general evolutionary algorithm:
 Randomly create a population of solutions.
 Evaluate each solution, giving each a score.
 Select the best solutions and reproduce them, mutate them, or cross
  them over with other fit solutions to produce new solutions for
  the next generation.
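
The three steps above can be sketched as a short loop. This is only an illustration under stated assumptions: the bit-string representation, the "count the ones" score, the elitism step and every name and parameter value (evolve, pop_size, the 10% mutation chance) are chosen here for brevity and do not come from the slides.

```python
import random

def evolve(score, genome_len=20, pop_size=30, generations=40, seed=0):
    """Toy evolutionary loop over fixed-length bit strings (illustrative only)."""
    rng = random.Random(seed)
    # Step 1: randomly create a population of solutions.
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: evaluate each solution, giving each a score.
        ranked = sorted(pop, key=score, reverse=True)
        # Step 3: pick the best, then reproduce, cross over and mutate.
        parents = ranked[: pop_size // 2]
        children = [ranked[0][:]]              # copy the best unchanged (assumed elitism)
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)
            child = a[:cut] + b[cut:]          # one-point crossover
            if rng.random() < 0.1:             # occasional point mutation
                i = rng.randrange(genome_len)
                child[i] = 1 - child[i]
            children.append(child)
        pop = children
    return max(pop, key=score)

best = evolve(score=sum)                       # score = number of ones in the string
print(sum(best), "ones out of", len(best))
```

Nothing in the loop depends on the bit-string representation except creation, crossover and mutation, which is why the same outline can be reused for programs.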




Now consider what this means in the context of genetic
  programming:
Randomly create a population of programs.
Evaluate each program, giving each a score.
Select the best programs and reproduce them, mutate them, or cross
  them over with other fit programs to produce new programs for the
  next generation.




A randomly generated C program
#bjsieldi <dkjsldkfj.?+
nit anim(tin x, rach*vrag[)}
{
nit x;
rof (x = 10; : ) {
touch ,? *wha”ts g01nG 0n?@; :
]
]
The argument against evolving programs:
Randomly created programs have an infinitesimal chance of
 compiling, let alone doing what you want them to do.
Running a randomly created program will most likely produce
 array out-of-bounds errors, bad casts, core dumps and
 division-by-zero errors, and, because of the halting problem,
 you cannot tell in general whether it will ever terminate.
Mutating and mixing segments of randomly created
 programs is as senseless as randomly creating them in
 the first place.
(How) does genetic programming get around this?
           What makes GP different

                   Individual representation        Individual size (complexity)
GA (conventional)  Coded strings of numbers         Fixed-length strings
GP                 LISP S-expressions (in general)  Variable



       GP algorithm
Create a random population.
Evaluate the fitness of each program.
Apply the genetic operators (reproduction, crossover, mutation)
  probabilistically to obtain a new computer program.
Insert the new computer program into the new population.
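
The "apply the operators probabilistically" step can be sketched as follows. The 90% / 1% / 9% split and the tournament of size 5 are the values this deck quotes for its later examples; the function name next_generation, its signature and the integer "programs" in the usage lines are assumptions made only to keep the sketch self-contained.

```python
import random

def next_generation(population, fitness, crossover, mutate, rng,
                    p_crossover=0.90, p_mutation=0.01, tournament_size=5):
    """Build a new population. Lower fitness is better (0 = perfect)."""
    def select():
        # Tournament selection: the best of a few randomly drawn individuals.
        return min(rng.sample(population, tournament_size), key=fitness)

    new_population = []
    while len(new_population) < len(population):
        r = rng.random()
        if r < p_crossover:                        # 90%: crossover
            child = crossover(select(), select(), rng)
        elif r < p_crossover + p_mutation:         # 1%: mutation
            child = mutate(select(), rng)
        else:                                      # remaining 9%: reproduction
            child = select()
        new_population.append(child)               # insert into the new population
    return new_population

# Usage with stand-in "programs" (plain integers) so the sketch runs on its own.
rng = random.Random(1)
pop = [rng.randint(0, 100) for _ in range(20)]
new_pop = next_generation(
    pop,
    fitness=lambda x: abs(x - 42),
    crossover=lambda a, b, rng: (a + b) // 2,
    mutate=lambda a, rng: a + rng.choice([-1, 1]),
    rng=rng,
)
```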
           The Genetic Programming
                Representation
The trick is to choose an underlying representation for
  programs such that the random creation, mutation and
  crossover of programs always yield a syntactically
  correct program.
The representation employed in genetic programming is a
  tree. This representation is natural for LISP programs and
  leads to elegant algorithms for creation, mutation and
  crossover.




                  Genetic Structure

Functions: internal nodes that take arguments. They can be
  conditional (if, then, etc.), sequential (+, -, etc.) or
  iterative (whileDo, etc.).
Terminals: leaf nodes that take no arguments and just return a value.
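
One possible way to hold such a tree in memory is as nested tuples, with the function name first and the arguments after it, much like an S-expression. The function names (add, sub, mul, if_positive), the variable X and the evaluate helper below are hypothetical choices for this sketch, not part of the slides.

```python
import operator

# Function set: name -> (arity, implementation). All names are illustrative.
FUNCTIONS = {
    "add": (2, operator.add),
    "sub": (2, operator.sub),
    "mul": (2, operator.mul),
    # A conditional function; both branches are evaluated eagerly here,
    # which is fine only because nothing has side effects.
    "if_positive": (3, lambda cond, a, b: a if cond > 0 else b),
}

def evaluate(tree, env):
    """Tuples are function nodes; strings are variables; numbers are constants."""
    if isinstance(tree, tuple):
        name, *args = tree
        arity, fn = FUNCTIONS[name]
        assert len(args) == arity, f"{name} expects {arity} arguments"
        return fn(*(evaluate(arg, env) for arg in args))
    if isinstance(tree, str):
        return env[tree]          # terminal: look up the variable's value
    return tree                   # terminal: a constant

square_plus_one = ("add", ("mul", "X", "X"), 1.0)          # X*X + 1
sign = ("if_positive", "X", 1.0, ("sub", 0.0, 1.0))        # 1 if X > 0 else -1
print(evaluate(square_plus_one, {"X": 3}), evaluate(sign, {"X": -2}))
```

Because every node is either a function with the right number of children or a terminal, any tree built this way is a syntactically valid program.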




                    Evolving Trees

In fact, the representation is useful for evolving more
   than just LISP programs. The tree structures in a genetic
   programming population can be used to determine layouts
   for analogue electric circuits, create neural networks,
   parallelise computer programs and much more.
It’s a great representation because it can produce solutions
    of arbitrary size and complexity, unlike, for
    example, fixed-length genetic algorithms.
As we’ll be applying an evolutionary algorithm to this
  representation, we need to define creation, crossover and
  mutation operators.
      Creation, Crossover and Mutation

Tree structures can be created, crossed and mutated as follows.
Creation: randomly generate a tree using the functions and
  terminals provided.
Crossover: pick a crossover point in each parent and swap
  the subtrees rooted at those points. Even if the two parents
  are identical, the offspring will often differ from them,
  because the two crossover points are chosen independently.
Mutation: pick a mutation point in one parent and replace the
 subtree rooted there with a randomly generated tree.
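
A minimal sketch of subtree crossover and subtree mutation, assuming trees are stored as nested tuples of the form (function_name, child, child, ...) with strings as terminals. The helper names (paths, get, put, tree_size) and the random_tree callback are hypothetical.

```python
import random

def paths(tree, prefix=()):
    """Yield the path (a tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree

def put(tree, path, subtree):
    """Return a copy of tree with the node at path replaced by subtree."""
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (put(tree[i], path[1:], subtree),) + tree[i + 1:]

def crossover(mum, dad, rng):
    """Pick one point in each parent and swap the subtrees rooted there."""
    p1 = rng.choice(list(paths(mum)))
    p2 = rng.choice(list(paths(dad)))
    return put(mum, p1, get(dad, p2)), put(dad, p2, get(mum, p1))

def mutate(tree, rng, random_tree):
    """Replace a randomly chosen subtree with a freshly generated one."""
    return put(tree, rng.choice(list(paths(tree))), random_tree(rng))

def tree_size(tree):
    return sum(1 for _ in paths(tree))

rng = random.Random(0)
parent = ("add", ("mul", "X", "X"), "X")
child1, child2 = crossover(parent, parent, rng)    # identical parents
mutant = mutate(parent, rng, random_tree=lambda rng: "Y")
print(child1, child2, mutant)
```

Swapping subtrees only moves whole, well-formed pieces around, so the children are always valid trees and the total number of nodes across the two children equals that of the two parents.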


[Figure: Crossover]

[Figure: Mutation]
                Population Creation

When creating a population, it’d be nice to begin with many
 trees of different shapes and sizes. We can generate trees
 using the full or the grow method:
full - every path in the tree has the maximum length.
grow - path lengths vary up to the maximum length.
Typically, when a population is created, the “ramped
  half-and-half” technique is used. Trees of varying depths, from
  the minimum to the maximum depth, are created, and for each
  depth half are created using the full method and the other
  half using the grow method.
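
The two generation methods and their combination can be sketched as below. The function set, the single terminal X, the 30% chance of stopping early in grow, and the way depths and methods are interleaved are all assumptions of this sketch; real GP systems differ in such details (for instance, many force the root of a grown tree to be a function).

```python
import random

FUNCTIONS = {"add": 2, "sub": 2, "mul": 2}    # name -> arity (illustrative)
TERMINALS = ["X"]

def full(depth, rng):
    """Every path from the root to a leaf has exactly `depth` function nodes."""
    if depth == 0:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    return (name,) + tuple(full(depth - 1, rng) for _ in range(FUNCTIONS[name]))

def grow(depth, rng, p_terminal=0.3):
    """Paths may stop early, so leaves can appear at any depth up to `depth`."""
    if depth == 0 or rng.random() < p_terminal:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    return (name,) + tuple(grow(depth - 1, rng, p_terminal)
                           for _ in range(FUNCTIONS[name]))

def ramped_half_and_half(pop_size, min_depth, max_depth, rng):
    """Cycle through the depths; for each depth alternate full and grow."""
    depths = range(min_depth, max_depth + 1)
    population = []
    for i in range(pop_size):
        depth = depths[i % len(depths)]
        method = full if (i // len(depths)) % 2 == 0 else grow
        population.append(method(depth, rng))
    return population

def tree_depth(tree):
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(tree_depth(child) for child in tree[1:])

rng = random.Random(0)
population = ramped_half_and_half(pop_size=20, min_depth=2, max_depth=5, rng=rng)
print(sorted(tree_depth(t) for t in population))
```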

            Preparatory steps for GP

You’ve decided you want to use GP to solve a problem. To
  set up your GP runs, you need to do the following:
Determine the set of terminals (the leaves of your trees). In
 the programming context, these are usually variables,
 input values or action commands.
Determine the set of functions (the internal nodes of your trees).
Define the fitness measure.
Set the parameters for controlling the run: population size,
  maximum number of generations, and the mutation, crossover
  and reproduction rates (for example 1%, 90% and 9%).
Choose the method for terminating a run and designating a result.
               Sufficiency & Closure

Function and terminal sets must satisfy the principles of
  closure and sufficiency:
Closure: every function f must be able to accept, as any of its
  arguments, the value of every terminal t from the terminal set
  and the value returned by every function in the function set.
Sufficiency: a solution to the problem at hand must exist in
  the space of programs that can be built from the function set
  and terminal set.
One way to guarantee closure is to make all terminals
  and functions return the same type (for example, integer).
  Alternatively, use strongly typed genetic programming, which
  ensures that all expressions are type-safe.
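
Closure also concerns values, not just types: division must accept a zero divisor, since evolution will sooner or later place one there. A common convention is "protected division", which returns a fixed value (1 in this sketch) when the divisor is zero. The names protected_div, FUNCTIONS and evaluate below are illustrative.

```python
import operator

def protected_div(a, b):
    """Division defined for every input, so any subtree can feed it."""
    return a / b if b != 0 else 1

FUNCTIONS = {"add": operator.add, "sub": operator.sub,
             "mul": operator.mul, "div": protected_div}

def evaluate(tree, x):
    """Evaluate a nested-tuple tree whose only terminal is the variable X."""
    if tree == "X":
        return x
    name, left, right = tree
    return FUNCTIONS[name](evaluate(left, x), evaluate(right, x))

risky = ("div", "X", ("sub", "X", "X"))    # X / (X - X): the divisor is always 0
print(evaluate(risky, 3))                  # no ZeroDivisionError is raised
```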
           Example: Symbolic Regression

Problem: can GP evolve a function that fits the following data?
x   f(x)
0   0
1   4
2   30
3   120
4   340
5   780
6   1554
7   2800
8   4680
             GP Symbolic Regression

Function Set: +, -, *, /
Terminal Set: X
Fitness Measure: the absolute difference between the program's
   output and the target value (typically summed over the data
   points). The best normalized fitness is 0.
Parameters: Population Size = 500, Max Generations = 10,
  Crossover = 90%, Mutation = 1%, Reproduction = 9%.
  Selection is by Tournament Selection (size 5), Creation is
  performed using RAMP_HALF_AND_HALF.
Termination Condition: Program with fitness 0 found.
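
A sketch of this fitness measure and of tournament selection, using the data table from the problem statement. Representing a candidate as a plain Python callable, and summing the absolute errors over the nine points, are assumptions made here to keep the example short.

```python
import random

# (x, f(x)) pairs copied from the problem statement.
DATA = [(0, 0), (1, 4), (2, 30), (3, 120), (4, 340),
        (5, 780), (6, 1554), (7, 2800), (8, 4680)]

def fitness(program):
    """Sum of absolute errors over the data; 0 means a perfect fit."""
    return sum(abs(program(x) - y) for x, y in DATA)

def tournament(population, rng, size=5):
    """Draw `size` individuals at random and return the fittest of them."""
    return min(rng.sample(population, size), key=fitness)

target = lambda x: x**4 + x**3 + x**2 + x
population = [lambda x: x, lambda x: x * x, target, lambda x: 0, lambda x: x**3]
winner = tournament(population, random.Random(0))
print(fitness(winner))
```

With only five candidates and a tournament of size 5, every individual takes part, so the zero-fitness candidate always wins; in a population of 500 the tournament sees only a small random sample each time.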


                          Results

The following zero-fitness individual was found after two
  generations:
(add (add (mul (mul X X) (mul X X)) (mul (mul X X) (- X)))
     (sub X (sub (sub (sub X X) (mul X X)) (mul (add X X)
     (mul X X)))))
which correctly captures the function:
f(x) = x^4 + x^3 + x^2 + x
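
The claim can be checked mechanically. The sketch below transcribes the evolved individual as nested tuples (writing the unary minus as a hypothetical neg operator) and compares it with the polynomial on the data points x = 0..8. Expanding by hand gives the same result: x^4 - x^3 + x - (-x^2 - 2x^3) = x^4 + x^3 + x^2 + x.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub,
       "mul": operator.mul, "neg": operator.neg}

def evaluate(tree, x):
    """Evaluate a nested-tuple tree whose only terminal is the variable X."""
    if tree == "X":
        return x
    name, *args = tree
    return OPS[name](*(evaluate(arg, x) for arg in args))

X = "X"
evolved = (
    "add",
    ("add", ("mul", ("mul", X, X), ("mul", X, X)),
            ("mul", ("mul", X, X), ("neg", X))),
    ("sub", X, ("sub", ("sub", ("sub", X, X), ("mul", X, X)),
                       ("mul", ("add", X, X), ("mul", X, X)))),
)

matches = all(evaluate(evolved, x) == x**4 + x**3 + x**2 + x for x in range(9))
print(matches)
```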




                     Santa Fe Trail

In the Santa Fe Trail problem, an ant must eat all the items of food
   along a trail. The ant can only turn left, turn right or move
   forward, and can only sense what is directly in front of it.




                     GP Santa Fe

Function Set: Prog2, Prog3, IfFoodAhead
Terminal Set: TurnLeft, TurnRight, MoveForward
Fitness Measure: count the number of items of food eaten after
   a fixed number of moves, and subtract it from 89 (the number
   of items on the trail). Bad fitness = 89, good fitness = 0.
Parameters: Population Size = 500, Max Generations = 50,
  Crossover = 90%, Mutation = 1%, Reproduction = 9%.
  Selection is by Tournament Selection (size 5), Creation is
  performed using RAMP_HALF_AND_HALF.
Termination Condition: Program with fitness 0 found.
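
A sketch of how such ant programs might be interpreted and scored. The tiny 8x8 wrap-around grid and the five-item trail are made up for illustration and are not the real Santa Fe trail; the Ant class, the heading encoding, the move budget and the rule that the program is re-run until the moves are used up are likewise assumptions of this sketch.

```python
# Headings: 0 = east, 1 = south, 2 = west, 3 = north (rows grow downwards).
MOVES = [(0, 1), (1, 0), (0, -1), (-1, 0)]

class Ant:
    def __init__(self, food, size, max_moves):
        self.food = set(food)              # remaining food cells as (row, col)
        self.size = size                   # grid wraps around (assumption)
        self.row, self.col, self.heading = 0, 0, 0
        self.moves_left = max_moves
        self.eaten = 0

    def ahead(self):
        dr, dc = MOVES[self.heading]
        return ((self.row + dr) % self.size, (self.col + dc) % self.size)

    def run(self, tree):
        """Execute one pass of the program tree."""
        if self.moves_left <= 0:
            return
        if isinstance(tree, tuple):
            name, *args = tree
            if name == "IfFoodAhead":      # sense the cell directly in front
                self.run(args[0] if self.ahead() in self.food else args[1])
            else:                          # Prog2 / Prog3: run children in order
                for arg in args:
                    self.run(arg)
            return
        self.moves_left -= 1               # every terminal costs one move
        if tree == "TurnLeft":
            self.heading = (self.heading - 1) % 4
        elif tree == "TurnRight":
            self.heading = (self.heading + 1) % 4
        elif tree == "MoveForward":
            self.row, self.col = self.ahead()
            if (self.row, self.col) in self.food:
                self.food.remove((self.row, self.col))
                self.eaten += 1

def fitness(tree, food, size=8, max_moves=100):
    """Food items left uneaten when the moves run out (0 = all eaten)."""
    ant = Ant(food, size, max_moves)
    total = len(ant.food)
    while ant.moves_left > 0:              # re-run the program until moves run out
        ant.run(tree)
    return total - ant.eaten

TOY_TRAIL = [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]   # east, then a turn south
follower = ("IfFoodAhead", "MoveForward", "TurnRight")
print(fitness(follower, TOY_TRAIL))
```

On this toy trail the follower eats everything, a program that only moves forward misses the two items after the bend, and a program that only turns eats nothing.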

                  Some programs

Prog2(TurnRight)(TurnLeft)
Prog2(MoveForward)(MoveForward)




                        Result

Here’s how one agent fared:




                         Agent

(Prog3 (IfFoodAhead (IfFoodAhead (IfFoodAhead
   (IfFoodAhead (Prog2 MoveForward MoveForward)
   TurnLeft) TurnLeft) TurnLeft) (IfFoodAhead MoveForward
   (IfFoodAhead MoveForward (IfFoodAhead (IfFoodAhead
   (Prog2 MoveForward MoveForward) TurnLeft) TurnLeft))))
   TurnLeft (Prog3 (IfFoodAhead (IfFoodAhead
   MoveForward TurnLeft) TurnRight) MoveForward
   TurnRight))
Smaller agents can be found!



   Robot Wall-Following with GP (Koza,
                  1993)
Given: Odd-shaped room with robot in center.
Find: A control strategy for the robot that makes it move
   along the periphery.
GP Primitives:
Terminals: S0, S1, ..., S11 (12 sensor readings, each the distance to a wall).
Functions: IFLTE (if less than or equal), PROGN2, MF, MB
  (move forward/back), TL, TR (turn left/right).
Fitness Function: fitness = number of peripheral cells visited.
Sample Individual/Strategy:
(IFLTE S3 S7 (MF) (PROGN2 (MB) (IFLTE S4 S9 (TL) (PROGN2
   (MB) (TL)))))
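
To see what the sample strategy does, it can be interpreted for a given set of sensor readings. In this sketch IFLTE takes four arguments (two values to compare, then the branch to run if the first is less than or equal to the second, then the alternative), the two-step sequencing function is accepted under either spelling, and each movement primitive simply records its name. The value 0.0 returned by an action and the sensor numbers in the usage lines are invented for illustration.

```python
def run(tree, sensors, actions):
    """Interpret a strategy, appending each primitive action to `actions`."""
    if isinstance(tree, str):                      # sensor terminal such as "S3"
        return sensors[tree]
    name, *args = tree
    if name == "IFLTE":
        a, b, then_branch, else_branch = args
        take_then = run(a, sensors, actions) <= run(b, sensors, actions)
        return run(then_branch if take_then else else_branch, sensors, actions)
    if name in ("PROGN2", "PROG2"):                # do both steps in order
        run(args[0], sensors, actions)
        return run(args[1], sensors, actions)
    actions.append(name)                           # MF, MB, TL or TR
    return 0.0                                     # assumed return value of an action

strategy = ("IFLTE", "S3", "S7", ("MF",),
            ("PROGN2", ("MB",),
             ("IFLTE", "S4", "S9", ("TL",),
              ("PROGN2", ("MB",), ("TL",)))))

def actions_for(sensors):
    actions = []
    run(strategy, sensors, actions)
    return actions

print(actions_for({"S3": 1, "S7": 2, "S4": 0, "S9": 0}))
print(actions_for({"S3": 5, "S7": 2, "S4": 4, "S9": 3}))
```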
Wall-Following Evolution




                   Fitness function

The fitness function is based on executing the evolved
  programs on one or more prescribed test suites.
The test suites can be devised in the same way as those
  used when testing traditional manually produced
  programs.
Program size can also be included as part of the fitness.
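
One simple way to fold program size into the fitness is to add a small penalty per tree node to the error. The weight of 0.01 and the names tree_size and penalised_fitness are assumptions of this sketch; lower values are better.

```python
def tree_size(tree):
    """Number of nodes in a nested-tuple tree (function name plus children)."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(tree_size(child) for child in tree[1:])

def penalised_fitness(error, tree, weight=0.01):
    """Error plus a small cost per node, so smaller programs win ties."""
    return error + weight * tree_size(tree)

small = ("add", "X", "X")                    # 2X with 3 nodes
large = ("add", ("mul", "X", 1), "X")        # also 2X, but with 5 nodes
print(penalised_fitness(0, small) < penalised_fitness(0, large))
```

Both trees compute the same function and so have the same error, but the smaller one receives the better (lower) penalised fitness.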





Fitness Functions
Error-based
– Fitness inversely proportional to total error on the test data.
– E.g. symbolic regression, classification, image
   compression, multiplexer design…
Cost-based
– Fitness inversely proportional to use of resources (e.g. time,
   space, money, materials, tree nodes)
– E.g. truck-backing, broom-balancing, energy network
   design…

Benefit-based
– Fitness proportional to accrued resources or other benefits.
– E.g. foraging, investment strategies
Parsimony-based
– Fitness partly proportional to the simplicity of the
   phenotypes.
– E.g. sorting algorithms, data compression…
Entropy-based
– Fitness directly or inversely proportional to the statistical
   entropy of a set of collections.
– E.g. random sequence generators, clustering algorithms…

                    Designer GP

More recently, the tree representation employed by GP has
   been used for the automatic design of electrical circuits.
The tree is no longer a “program”, but should be considered
  a “program that builds circuits”.
The idea of building graph structures using commands
  embedded in a tree was developed by Frederic Gruau. He
  used it to evolve neural networks; Koza et al. now use it to
  evolve electric circuits.
The node functions Par (P) and Seq (S) change the
  topology of the graph. Other functions and terminals
  modify the values at the nodes. Everything begins with
  one embryonic cell with a pointer to the head of the tree.
Cellular Encoding
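
A heavily simplified sketch of how Seq and Par might grow a graph from a single embryonic cell. It assumes a tree of ("S", first, second) and ("P", first, second) nodes with "E" marking a cell that stops dividing, and it models only the topology: Seq puts the new cell in series after the dividing cell, Par puts it in parallel. The full cellular encoding scheme has further operators and per-link values that are not modelled here, and the name develop and the "in"/"out" markers are hypothetical.

```python
import itertools

def develop(tree):
    """Grow a set of directed edges from a tree of S, P and E nodes."""
    ids = itertools.count(1)
    edges = {("in", 0), (0, "out")}        # embryonic cell 0 between input and output
    todo = [(0, tree)]                     # (cell, subtree that the cell is reading)
    while todo:
        cell, node = todo.pop()
        if node == "E":                    # end: this cell stops dividing
            continue
        op, first, second = node
        child = next(ids)
        ins = [(a, b) for a, b in edges if b == cell]
        outs = [(a, b) for a, b in edges if a == cell]
        if op == "S":                      # series: child takes over the outputs
            for a, b in outs:
                edges.remove((a, b))
                edges.add((child, b))
            edges.add((cell, child))
        elif op == "P":                    # parallel: child copies inputs and outputs
            edges |= {(a, child) for a, _ in ins}
            edges |= {(child, b) for _, b in outs}
        todo += [(cell, first), (child, second)]
    return edges

series = develop(("S", "E", "E"))
parallel = develop(("P", "E", "E"))
mixed = develop(("S", ("P", "E", "E"), "E"))
print(len(series), len(parallel), len(mixed))
```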




            So you want to use GP...

Genetic programming, at its heart, is the evolution of tree
  structures that can be interpreted as programs. Use GP to
  solve problems where the solutions are naturally expressed
  as tree structures. You can:
evolve LISP programs to solve a problem directly, or
evolve solutions in an indirect manner, by using the GP
  trees to build solutions to problems.
Your approach will be to determine the functions & terminals
  that constitute your trees, and how to interpret the
  resulting trees as solutions to your problem.
