Brief introduction to genetic algorithms and genetic programming

A.E. Eiben
Free University Amsterdam
Genetic algorithm(s)

• Developed: USA in the 1970s
• Early names: J. Holland, K. DeJong, D. Goldberg
• Typically applied to:
    - discrete optimization
• Attributed features:
    - not too fast
    - good solver for combinatorial problems
• Special:
    - many variants, e.g., reproduction models, operators
    - formerly: the GA; nowadays: a GA, GAs



Representation

Phenotype space ↔ Genotype space = {0,1}^L
• Encoding (representation): maps a phenotype to a genotype, e.g. a bit string such as 10010001
• Decoding (inverse representation): maps a genotype back to a phenotype
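
As an illustration (not part of the original slides), a minimal Python sketch of the encoding/decoding idea, assuming integer phenotypes and L-bit genotypes; the helper names are mine:

L = 8  # genotype length: the genotype space is {0,1}^L

def encode(x: int) -> str:
    # Encoding (representation): integer phenotype -> L-bit genotype
    return format(x, f"0{L}b")

def decode(bits: str) -> int:
    # Decoding (inverse representation): genotype -> integer phenotype
    return int(bits, 2)

assert decode(encode(145)) == 145   # 145 <-> '10010001', one of the strings above
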
GA: crossover (1)

• Crossover is used with probability pc
• 1-point crossover:
    - Choose a random point on the two parents (same for both)
    - Split parents at this crossover point
    - Create children by exchanging tails
• n-point crossover:
    - Choose n random crossover points
    - Split along those points
    - Glue parts, alternating between parents
• uniform crossover:
    - Assign 'heads' to one parent, 'tails' to the other
    - Flip a coin for each gene of the first child
    - Make an inverse copy of the gene for the second child

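
A minimal Python sketch (mine, not from the slides) of the three crossover variants for bit-string parents of equal length:

import random

def one_point(p1, p2):
    # Choose one random point (same for both), exchange tails
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def n_point(p1, p2, n):
    # Split along n random points, glue parts alternating between parents
    c1, c2 = p1, p2
    for cut in sorted(random.sample(range(1, len(p1)), n)):
        c1, c2 = c1[:cut] + c2[cut:], c2[:cut] + c1[cut:]
    return c1, c2

def uniform(p1, p2):
    # Flip a coin per gene for child 1; child 2 gets the inverse copy
    c1 = c2 = ""
    for g1, g2 in zip(p1, p2):
        if random.random() < 0.5:   # 'heads'
            c1, c2 = c1 + g1, c2 + g2
        else:                       # 'tails'
            c1, c2 = c1 + g2, c2 + g1
    return c1, c2
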
GA: crossover (2)

[figure illustrating crossover on two bit-string parents]
GA: mutation

Mutation:
• Alter each gene independently with probability pm

• Relatively large chance of not being mutated
  (exercise: L = 100, pm = 1/L; see the worked answer below)

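
A worked answer to the exercise (my calculation): with pm = 1/L, the chance that a whole individual escapes mutation is

$$(1 - p_m)^L = \left(1 - \frac{1}{100}\right)^{100} \approx e^{-1} \approx 0.37,$$

so roughly 37% of offspring are not mutated at all.
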
Crossover OR mutation?

• Decades-long debate: which one is better / necessary / the main operator

• Answer (at least, rather wide agreement):
    - it depends on the problem, but
    - in general, it is good to have both
    - each has a different role
    - a mutation-only EA is possible; a crossover-only EA would not work




Crossover OR mutation? (cont'd)

Exploration: discovering promising areas in the search space, i.e. gaining information on the problem

Exploitation: optimising within a promising area, i.e. using information

There is co-operation AND competition between them

Crossover is explorative: it makes a big jump to an area somewhere "in between" two (parent) areas

Mutation is exploitative: it creates small random deviations, thereby staying near (i.e., in the area of) the parent

Crossover OR mutation? (cont'd)

• Only crossover can combine information from two parents

• Only mutation can introduce new information (alleles)

• Crossover does not change the allele frequencies of the population
  (thought experiment: 50% 0s on the first bit in the population, ?% after performing n crossovers)

• To hit the optimum you often need a 'lucky' mutation.


Selection

• Main idea: better individuals get a higher chance
• Secondary idea: chances proportional to fitness
• Implementation: roulette wheel technique
    - Assign to each individual a part of the roulette wheel
      ("unfair": size proportional to its fitness)
    - Spin the wheel n times to select n individuals (fair)

Example wheel:
    fitness(A) = 3  →  3/6 = 50%
    fitness(B) = 1  →  1/6 ≈ 17%
    fitness(C) = 2  →  2/6 ≈ 33%


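
A minimal Python sketch (illustrative, not from the slides) of the roulette wheel; random.choices with weights implements the fitness-proportional spin directly:

import random

def roulette_wheel(population, fitnesses, n):
    # Spin the (unfair) wheel n times: selection probability ~ fitness
    return random.choices(population, weights=fitnesses, k=n)

# The slide's wheel: shares should approach 50% / 17% / 33%
picked = roulette_wheel(["A", "B", "C"], [3, 1, 2], n=6000)
print({x: round(picked.count(x) / 6000, 2) for x in "ABC"})
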
Selection (cont'd)

Fitness proportional selection (FPS):

Expected number of times individual i is selected for mating: f_i / f̄
(its fitness divided by the population's average fitness)

Disadvantages:
• Outstanding individuals take over the entire population very quickly → danger of premature convergence.
• Low selection pressure when fitness values are near each other.
• Behaves differently on transposed versions of the same function (i.e. when a constant is added to all fitness values).

Selection (cont'd)

Tournament selection:
• Pick k individuals randomly, without replacement
• Select the best of these k by comparing their fitness values
• k is called the size of the tournament
• Selection is repeated as many times as necessary




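
A minimal Python sketch (mine) of tournament selection as described:

import random

def tournament(population, fitness, k):
    # Pick k individuals without replacement, return the fittest of them
    return max(random.sample(population, k), key=fitness)

# Repeated as many times as necessary, e.g. to fill a mating pool:
# pool = [tournament(pop, fitness, k=2) for _ in range(len(pop))]
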
Generational GA reproduction cycle

1. Select parents for the mating pool
   (size of mating pool = population size)
2. Shuffle the mating pool
3. For each consecutive pair apply crossover with probability pc
4. For each newborn apply mutation (bit-flip with probability pm)
5. Replace the whole population by the resulting mating pool



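
Steps 1-5 in one compact Python sketch (mine; fitness-proportional selection, 1-point crossover, and bit-flip mutation are inlined so the sketch is self-contained):

import random

def generation(pop, fitness, pc=0.7, pm=0.01):
    # 1. Select the mating pool (fitness-proportional), same size as pop
    pool = random.choices(pop, weights=[fitness(x) for x in pop], k=len(pop))
    random.shuffle(pool)                          # 2. shuffle the mating pool
    next_pop = []
    for p1, p2 in zip(pool[::2], pool[1::2]):     # 3. crossover per consecutive pair
        if random.random() < pc:
            cut = random.randrange(1, len(p1))
            p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
        for child in (p1, p2):                    # 4. bit-flip mutation with pm
            next_pop.append("".join(
                b if random.random() >= pm else "10"[int(b)] for b in child))
    return next_pop                               # 5. replace the whole population
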
Generational GA reproduction cycle (cont'd)

Generation t    Mating pool    Generation t+1
string1         string2        child1(2,4)
string2         string4        mut(child2(2,4))
string3         string2        string2
string4         string1        mut(string1)
…               …              …

Notes:
• Offspring can be: a clone, a pure mutant, a pure crossover child, or a mutated crossover child
• Generational replacement: the whole population is deleted/replaced
• To be discussed: is there no survival of the fittest here?

An example after Goldberg '89 (1)

• Simple problem: max x² over {0, 1, …, 31}
• GA approach:
    - Representation: binary code, e.g. 01101 → 13
    - Population size: 4
    - 1-point xover, no mutation (just an example!)
    - Roulette wheel selection
    - Random initialisation
• One generational cycle, performed by hand, is shown


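
A sketch (mine) of this exact setup in Python, so the hand-worked cycle can be replayed; population of 4, 5-bit strings, roulette wheel selection, 1-point crossover, no mutation:

import random

def fitness(bits):
    return int(bits, 2) ** 2                     # max x^2 over {0,...,31}

pop = [format(random.randrange(32), "05b") for _ in range(4)]  # random init
for _ in range(10):                              # a few generational cycles
    pool = random.choices(pop, weights=[fitness(x) for x in pop], k=4)
    random.shuffle(pool)
    pop = []
    for p1, p2 in zip(pool[::2], pool[1::2]):    # 1-point xover, no mutation
        cut = random.randrange(1, 5)
        pop += [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
print(max(pop, key=fitness))
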
An example after Goldberg '89 (2)

[table of the hand-worked generational cycle omitted]
An example after Goldberg '89 (3)

[table of the hand-worked generational cycle omitted]
The simple GA

• Has been the subject of many (early) studies
• Is often used as a benchmark for novel GAs
• Shows many shortcomings, e.g.
    - Representation is too restrictive
    - Mutation & crossover are only applicable to bit-string & integer representations
    - Selection mechanism is sensitive to converging populations with close fitness values
    - Generational population model can be improved with explicit survivor selection



Genetic programming

• Developed: USA in the 1990s
• Early names: J. Koza
• Typically applied to:
    - machine learning tasks
• Attributed features:
    - competes with neural nets and the like
    - slow
    - needs huge populations (thousands)
• Special:
    - non-linear chromosomes: trees, graphs
    - mutation possible but not necessary (disputed!)


Motivation

Why introduce yet another EA?
    Because fixed-length linear representations are too rigid

Reasons:
• Elements of a search space may vary in length
• A linear representation may be too 'unnatural'
• A complex variable hierarchy cannot be (easily) mapped onto linear structures

Example search space:
• Graphs without restriction on size and structure

Credit score example (1)

• Given: a lot of historical data on:
    - customer profile and
    - creditworthiness index (good/bad).

• Needed: a model that classifies good customers
  (to be used for evaluating loan applicants).

[table: data description for customer profiles]




Credit score example (2)

• A possible model for classification (NOC and S are attributes from the customer profile, e.g. number of children and salary):
    IF (NOC = 2) AND (S > 80000) THEN good

• In general: IF formula THEN good.
• Need to find the right formula.
• Natural representation of formulas: parse trees

            AND
           /   \
          =     >
         / \   / \
       NOC  2 S  80000

• Natural fitness of models: percentage of correctly classified cases.
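
A minimal Python sketch (mine) of this parse-tree representation as nested tuples, plus an evaluator; the attribute names follow the slide's formula:

tree = ("AND", ("=", "NOC", 2), (">", "S", 80000))  # IF (NOC=2) AND (S>80000) THEN good

def evaluate(node, profile):
    # Leaves are attribute names (looked up in the profile) or constants
    if not isinstance(node, tuple):
        return profile[node] if isinstance(node, str) else node
    op, lhs, rhs = node
    l, r = evaluate(lhs, profile), evaluate(rhs, profile)
    if op == "AND": return l and r
    if op == "=":   return l == r
    if op == ">":   return l > r

print(evaluate(tree, {"NOC": 2, "S": 90000}))   # True -> classified 'good'

Fitness of a model would then be the fraction of historical cases it classifies correctly.
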
GP: representation

• Problem domain: modelling (forecasting, regression, classification, data mining, robot control).

• Fitness: the performance on a given (training) data set, e.g. the nr. of hits/matches/good predictions

• Representation: implied by the problem domain, i.e. individual = model = parse tree
    - parse trees sometimes viewed as LISP expressions → GP = evolving computer programs
    - parse trees sometimes viewed as just-another-genotype → GP = a GA sub-dialect
GP: mutation

• Replace a randomly chosen subtree by a randomly generated (sub)tree




GP: crossover

• Exchange randomly selected subtrees between the parents




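
A Python sketch (illustrative only) of both tree operators on the nested-tuple trees used above; subtrees are addressed by index paths, and mutate takes the random-tree generator as a parameter (e.g. the full/grow methods sketched after the "Generating random trees" slide):

import random

def paths(node, path=()):
    # Enumerate positions of all subtrees (children sit at indices 1 and 2)
    yield path
    if isinstance(node, tuple):
        for i, child in enumerate(node[1:], start=1):
            yield from paths(child, path + (i,))

def subtree(node, path):
    for i in path:
        node = node[i]
    return node

def replace(node, path, new):
    # Return a copy of `node` with the subtree at `path` replaced by `new`
    if not path:
        return new
    i = path[0]
    return node[:i] + (replace(node[i], path[1:], new),) + node[i + 1:]

def mutate(tree, random_tree):
    # GP mutation: swap a randomly chosen subtree for a freshly generated one
    return replace(tree, random.choice(list(paths(tree))), random_tree())

def crossover(t1, t2):
    # GP crossover: exchange randomly selected subtrees between the parents
    p1, p2 = random.choice(list(paths(t1))), random.choice(list(paths(t2)))
    return replace(t1, p1, subtree(t2, p2)), replace(t2, p2, subtree(t1, p1))
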
GP: selection

• Standard GA selection is usual

• Sometimes overselection is used to increase efficiency:
    - rank the population by fitness and divide it into two groups:
        group 1: best c% of the population
        group 2: other (100 − c)%
    - when executing selection:
        80% of selection operations choose from group 1
        20% from group 2
    - for pop. size = 1000, 2000, 4000, 8000 the portion c is
      c = 32%, 16%, 8%, 4%
    - the percentages come from a rule of thumb

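
A sketch (mine) of overselection; the 80/20 split and the value of c follow the slide:

import random

def overselect(pop, fitness, c=0.32):
    # 80% of picks come from the best c fraction, 20% from the rest
    ranked = sorted(pop, key=fitness, reverse=True)
    cut = max(1, int(c * len(ranked)))
    group1, group2 = ranked[:cut], ranked[cut:] or ranked[:cut]
    return random.choice(group1 if random.random() < 0.8 else group2)
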
Generating random trees

Given a:
• function set F and a
• terminal set T,
both satisfying the closure property.

Trees are randomly generated by:
• Full method:
    - each branch has depth Dmax (pre-specified),
    - nodes at depth < Dmax are from F
    - nodes at depth = Dmax are from T
• Grow method:
    - maximum branch depth is Dmax (pre-specified)
• Ramped half-and-half:
  for each D ≤ Dmax an equal nr. of trees,
    - half of them with the full method
    - half of them with the grow method
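
A minimal Python sketch (mine) of the three initialisation methods, over an assumed arithmetic function/terminal set:

import random

F = ["+", "-", "*"]        # function set (assumed, for illustration)
T = ["x", 0, 1]            # terminal set (assumed)

def full(depth):
    # Full method: functions down to depth Dmax, terminals exactly there
    if depth == 0:
        return random.choice(T)
    return (random.choice(F), full(depth - 1), full(depth - 1))

def grow(depth):
    # Grow method: terminals may appear early, so Dmax is only an upper bound
    if depth == 0 or random.random() < len(T) / (len(T) + len(F)):
        return random.choice(T)
    return (random.choice(F), grow(depth - 1), grow(depth - 1))

def ramped_half_and_half(n, dmax):
    # For each D <= Dmax an equal number of trees, half full, half grow
    return [(full if i % 2 else grow)(d)
            for d in range(1, dmax + 1)
            for i in range(max(1, n // dmax))]
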
Standard parameters in GP (1)

Qualitative variables
• Initialisation: ramped half-and-half.
• Fitness: adjusted fitness is used.
• Selection:
    - fitness proportionate,
    - an elitist strategy is not used,
    - over-selection is used for populations of M ≥ 1000.

Over-selection for population size = 1000:
• rank the population by fitness and divide it into two groups:
  group 1: best c = 32% of the population, group 2: the other 68%
• 80% of selection operations choose from group 1, 20% from group 2
• for pop. size = 2000, 4000, 8000 the portion c is c = 16%, 8%, 4%
• motivation: to increase efficiency; the percentages come from a rule of thumb



Standard parameters in GP (2)

Major numerical parameters
• Population size M = 500.
• Maximum number of generations G = 51.

Minor numerical parameters
• Probability pm of mutation = 0% !!!
• Probability pr of reproduction = 10%
• Probability pc of crossover = 90%
• Probability pip of choosing internal points for xover = 90%
• Maximum size Di for initial random S-expressions = 6
• Maximum size Dc for S-expressions during the run = 17

… and some "exotic" options usually set to 0 (e.g. permutation, editing, encapsulation, decimation)

Simple symbolic regression (1)

Given a number of sample points in ℝ²:
    (x1, y1), (x2, y2), … , (xn, yn)
Find a one-dimensional numerical function f(x) such that:
    ∀ i ∈ {1, …, n} : f(xi) = yi
In the present test, 20 sample points are generated by:
    x⁴ + x³ + x² + x




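
A sketch (mine) of the fitness setup this implies: 20 samples from the target polynomial (the sampling interval [-1, 1] is an assumption) and an error-based score for a candidate f:

def target(x):
    return x**4 + x**3 + x**2 + x

xs = [-1 + 2 * i / 19 for i in range(20)]        # 20 sample points
samples = [(x, target(x)) for x in xs]

def raw_error(f):
    # Sum of absolute errors over the samples; 0 means f(xi) = yi everywhere
    return sum(abs(f(x) - y) for x, y in samples)

print(raw_error(lambda x: x**4 + x**3 + x**2 + x))   # 0.0 for a perfect solution
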
Simple symbolic regression (2)

[table: specification of the GP for the symbolic regression problem]



Simple symbolic regression (3)

[figure: graphical representation of an individual, and the benchmark formula x⁴ + x³ + x² + x]
A.E. Eiben, GAs and GP        34          EvoNet Summer School 2002
Simple symbolic regression (5)

Best individual, representing a perfect solution:
    X + (X • (X + (X • (X • (X + (cos(X − X) − (X − X))))))) =
    X + (X • (X + (X • (X • (X + 1))))) =
    X + X⁴ + X³ + X²

[figure: the corresponding parse tree]

								