Genetic Algorithms
                    Overview
• Genetic Algorithms: a gentle introduction
  – What are GAs?
  – How do they work? Why?
  – Critical issues


• Use in Data Mining
  – GAs and statistics
  – decile performance maximization
  – multi-objective models
        Natural Genetics to AI

• Computational models inspired by
  biological evolution
  – survival of the fittest
  – reproduction through cross-breeding
              Genetic Algorithms
• Population-based search (parallel)
  – simultaneous search from multiple points in the search space
  – useful in complex, unstructured search spaces
    (less prone to getting trapped in local optima)

     Population members: potential solutions


• A population of solutions evolves from one
  generation to the next
             Genetic Algorithms
• Search objective
   – Fitness score for population members
     (fitness function)

• Survival of the fittest
   – selection
• Generating new solutions
   – “Mating” and reproduction of individuals
      (crossover, mutation)
                Basic Operation
Generation t      After selection     Generation t+1 (after crossover, mutation)
String1 (f1)      String1             Offspring1 (1,4)
String2 (f2)      String2             Offspring2 (1,4)
String3 (f3)      String2             Offspring3 (2,7)
String4 (f4)      String4             Offspring4 (2,7)
  ...               ...                 ...
StringN (fN)      Stringx             OffspringN (x,y)
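The generation-to-generation cycle above can be sketched as a short loop. This is a minimal sketch, not the author's implementation: it assumes bit-string individuals, non-negative fitness, roulette-wheel (fitness-proportionate) selection, one-point crossover and bit-flip mutation. The function name `run_ga`, its parameter defaults, and the "OneMax" test fitness (count the 1s) are all hypothetical choices for illustration.

```python
import random


def run_ga(fitness, n_bits, pop_size=20, generations=50,
           p_cross=0.7, p_mut=0.01, seed=0):
    """Minimal generational GA over bit strings (lists of 0/1 ints).

    Assumes fitness(ind) >= 0. Returns the best individual of the final
    generation and a history of (best, average) fitness per generation.
    Parameter defaults are illustrative assumptions, not recommendations.
    """
    rng = random.Random(seed)  # one random stream per run
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        history.append((max(scores), sum(scores) / pop_size))
        # Selection: N spins of a fitness-proportionate roulette wheel.
        # A tiny epsilon keeps the weights valid when all scores are zero.
        pool = rng.choices(pop, weights=[s + 1e-9 for s in scores], k=pop_size)
        # Recombination: pair up selected strings, one-point crossover.
        offspring = []
        for i in range(0, pop_size, 2):
            a, b = pool[i], pool[(i + 1) % pop_size]
            if rng.random() < p_cross:
                cut = rng.randint(1, n_bits - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            offspring.extend([a, b])
        # Mutation: flip each bit independently with small probability.
        # (Builds new lists, so shared references from selection are harmless.)
        pop = [[1 - g if rng.random() < p_mut else g for g in ind]
               for ind in offspring[:pop_size]]
    return max(pop, key=fitness), history


best, history = run_ga(fitness=sum, n_bits=20)  # hypothetical "OneMax" fitness
print(sum(best), history[-1])
```

The `history` list holds exactly the two curves (best and average fitness per generation) that a typical GA run is usually plotted with.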
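          GAs: Parallel Search
[Figure: fitness landscape showing population members (X) at several points in the search space, contrasted with a single hill climber]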
           GAs: Basic Principles
• Representation of individuals
  – String of parameters (genes) : chromosome
    e.g., optimize a function F(p,q,r,s,t)
        Each population member: p q r s t


  – genotype and phenotype
          Binary representation?
• Population members as bit strings
     F(p,q,r,s,t) as:

                 10011010110110011010
                 (consecutive bit fields encode p, q, r, s, t)

   – early theory in terms of binary strings (schema theorem)
   – unnecessary perversity?
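Decoding a binary chromosome back into parameter values can be sketched as below. This is a minimal sketch under stated assumptions: the slide does not give field widths or parameter ranges, so equal-width fields and a common range `[lo, hi]` are hypothetical; `decode` is an illustrative name.

```python
def decode(bits, n_params, lo=0.0, hi=1.0):
    """Split a bit string into equal-width fields and map each to [lo, hi].

    Equal field widths, the field order (p, q, r, s, t) and the shared
    range [lo, hi] are assumptions made for illustration.
    """
    width = len(bits) // n_params
    if width * n_params != len(bits):
        raise ValueError("bit string length must be a multiple of n_params")
    max_int = 2 ** width - 1
    values = []
    for k in range(n_params):
        field = bits[k * width:(k + 1) * width]
        # Interpret the field as an unsigned integer, then rescale linearly.
        values.append(lo + (hi - lo) * int(field, 2) / max_int)
    return values


# 20 bits, 5 parameters -> 4 bits each: 1001 1010 1101 1001 1010
print(decode("10011010110110011010", 5, lo=0.0, hi=15.0))
```

With a range of [0, 15] and 4-bit fields, each parameter decodes to its plain integer value (9, 10, 13, 9, 10), which makes the mapping easy to check by hand.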
             GAs: Basic Principles
• Survival of the fittest (Fitness function)

   – numerical “figure of merit”/utility measure of an individual
   – tradeoff among multiple evaluation criteria
   – efficient evaluation
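One simple way to turn several evaluation criteria into a single figure of merit is a weighted sum; caching addresses the "efficient evaluation" point, since GA populations often contain duplicate individuals. This is a minimal sketch assuming a weighted-sum tradeoff (one of several possible formulations) and hashable (tuple) chromosomes; `make_fitness` and the example criteria are hypothetical.

```python
from functools import lru_cache


def make_fitness(criteria, weights):
    """Build a cached fitness function from several criteria.

    criteria: functions mapping a chromosome (a tuple) to a number.
    weights:  relative importance of each criterion; chosen by the modeller
              (an assumption here). Negative weights act as penalties.
    """
    if len(criteria) != len(weights):
        raise ValueError("need one weight per criterion")

    @lru_cache(maxsize=None)  # duplicates in the population are evaluated once
    def fitness(chromosome):
        return sum(w * c(chromosome) for c, w in zip(criteria, weights))

    return fitness


# Hypothetical criteria: reward the number of 1s, lightly penalize length.
f = make_fitness([sum, len], [1.0, -0.1])
print(f((1, 0, 1, 1)))
```

A weighted sum hides the tradeoff inside fixed weights; changing the weights changes which solutions the GA favors, so they deserve the same care as the criteria themselves.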
         GAs: Basic Principles
• Iterative search
   – population evolves over generations


• Convergence
   – progression towards uniformity in population
   – premature convergence?
    (local optima)
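"Progression towards uniformity" can be monitored directly. This minimal sketch measures, for each gene position, the share held by the most common allele and averages over positions; the measure and the 0.95 threshold are assumptions for illustration, not taken from the slides. High uniformity while the best fitness is still poor is a typical sign of premature convergence.

```python
from collections import Counter


def allele_uniformity(population):
    """Mean, over gene positions, of the share held by the most common allele.

    1.0 means every individual is identical (fully converged); values near
    1 / alphabet_size mean the population is still diverse.
    """
    n = len(population)
    length = len(population[0])
    total = 0.0
    for pos in range(length):
        counts = Counter(ind[pos] for ind in population)
        total += counts.most_common(1)[0][1] / n
    return total / length


def converged(population, threshold=0.95):
    # The threshold is a hypothetical stopping rule.
    return allele_uniformity(population) >= threshold


print(allele_uniformity(["0110", "0110", "0111", "1110"]))
```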
          Typical GA Run
[Figure: best and average population fitness plotted against generations]
         Operators: Selection
• Fitness proportionate selection (f_i / f_avg, where f_avg is the average fitness)
• f_i / f_avg = expected number of reproductive trials for individual i
                          Selection
• Roulette-wheel selection
  (stochastic sampling with replacement)
    – wheel spaced in proportion to
      fitness values
    – N (pop size) spins of the wheel


• Stochastic universal sampling
   – N equally spaced pins on wheel
   – single turn of the wheel
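Both wheel variants can be sketched with a cumulative-fitness array and binary search. This is a minimal sketch assuming non-negative fitness values with a positive total; the function names are illustrative. Roulette-wheel selection spins N times independently, while stochastic universal sampling spins once and reads off N equally spaced pins, so each individual receives either the floor or the ceiling of its expected count f_i / f_avg.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def expected_trials(fitnesses):
    """f_i / f_avg: expected number of copies of individual i."""
    avg = sum(fitnesses) / len(fitnesses)
    return [f / avg for f in fitnesses]


def _slot(cumulative, point):
    # Index of the wheel slot containing `point`; clamp guards float round-off.
    return min(bisect_right(cumulative, point), len(cumulative) - 1)


def roulette_wheel(fitnesses, n, rng):
    """N independent spins; slot widths proportional to fitness."""
    cumulative = list(accumulate(fitnesses))
    total = cumulative[-1]
    return [_slot(cumulative, rng.random() * total) for _ in range(n)]


def stochastic_universal_sampling(fitnesses, n, rng):
    """A single spin of a wheel carrying n equally spaced pins."""
    cumulative = list(accumulate(fitnesses))
    step = cumulative[-1] / n
    start = rng.random() * step
    return [_slot(cumulative, start + k * step) for k in range(n)]


rng = random.Random(42)
fit = [1.0, 2.0, 3.0, 4.0]
print(expected_trials(fit))
print(roulette_wheel(fit, 4, rng))
print(stochastic_universal_sampling(fit, 4, rng))
```

With roulette-wheel selection the best individual can, by chance, receive no copies at all in a generation; SUS removes that sampling noise without changing the expected counts.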
                       Selection
• Premature convergence
• Fitness scaling
       f' = f - (2*f_avg - f_max)   (scaled best = 2 * scaled average)

•   Ranked fitness
•   Elitism
•   Steady-state selection
•   Demetic grouping
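Three of the remedies listed above can be sketched in a few lines. The linear scaling follows the slide's formula: after shifting by 2*f_avg - f_max, the scaled maximum is exactly twice the scaled average, which caps the best individual at about two expected copies. Scaled values of weak individuals can go negative and then need clamping or another fix before selection. The ranking scheme shown is one common linear-ranking formula with a selection-pressure parameter in [1, 2]; it is an assumption, since the slide does not specify one. All function names are illustrative.

```python
def linear_scale(fitnesses):
    """Slide formula: f' = f - (2*f_avg - f_max), so max(f') = 2 * avg(f')."""
    avg = sum(fitnesses) / len(fitnesses)
    shift = 2 * avg - max(fitnesses)
    return [f - shift for f in fitnesses]


def linear_rank_weights(fitnesses, pressure=1.5):
    """Selection weights from rank only (worst -> 2 - pressure, best -> pressure).

    Weights average to 1, so they can be used like f_i / f_avg.
    """
    n = len(fitnesses)
    if n == 1:
        return [1.0]
    order = sorted(range(n), key=lambda i: fitnesses[i])  # worst first
    weights = [0.0] * n
    for rank, i in enumerate(order):
        weights[i] = 2 - pressure + 2 * (pressure - 1) * rank / (n - 1)
    return weights


def elites(population, fitnesses, k=1):
    """Elitism: the k best individuals, copied unchanged into the next generation."""
    ranked = sorted(zip(fitnesses, range(len(population))), reverse=True)
    return [population[i] for _, i in ranked[:k]]


print(linear_scale([1.0, 2.0, 3.0, 10.0]))
print(linear_rank_weights([10, 30, 20]))
print(elites(["a", "b", "c"], [1, 3, 2], k=2))
```

Ranking ignores the size of fitness differences, so a single "super-individual" cannot take over the mating pool the way it can under raw proportionate selection.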
         Operators: Crossover
  Parent 1:    axpsqvqbtpihd
  Parent 2:    qzxxaycgbtphw
  (genes swapped at positions 2, 5, 7 and 11)
  Offspring 1: azpsavcbtpphd
  Offspring 2: qxxxqyqgbtihw
            (Uniform crossover)
• combining good building blocks
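Uniform crossover decides independently, for every position, whether the two parents swap genes. In this minimal sketch, a boolean mask marks the swapped positions; `apply_mask` reproduces the slide's example exactly when the mask marks positions 2, 5, 7 and 11. The names and the default swap probability of 0.5 are assumptions for illustration.

```python
import random


def apply_mask(p1, p2, mask):
    """Swap genes wherever mask is True; keep them wherever it is False."""
    c1 = "".join(b if m else a for a, b, m in zip(p1, p2, mask))
    c2 = "".join(a if m else b for a, b, m in zip(p1, p2, mask))
    return c1, c2


def uniform_crossover(p1, p2, rng, swap_prob=0.5):
    """Uniform crossover: each position is swapped independently."""
    mask = [rng.random() < swap_prob for _ in p1]
    return apply_mask(p1, p2, mask)


# Reproduce the slide: swaps at (1-based) positions 2, 5, 7, 11.
mask = [i in (1, 4, 6, 10) for i in range(13)]
print(apply_mask("axpsqvqbtpihd", "qzxxaycgbtphw", mask))
print(uniform_crossover("axpsqvqbtpihd", "qzxxaycgbtphw", random.Random(0)))
```

Because every position is an independent crossover site, uniform crossover can combine building blocks regardless of how far apart they sit on the string, unlike one-point crossover.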
        Operators: Mutation

• alters each gene with small probability
     Before: x1yx0y0yy0x yxy
     After:  x1yx0y1yy0x xxy
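Per-gene mutation can be sketched as follows: every gene is visited, and with probability `p_mut` it is replaced by a different symbol from the alphabet (for a binary alphabet this is a bit flip). This is a minimal sketch; the function name, the single shared alphabet and the uniform choice among replacement symbols are assumptions. The slide's example appears to keep 0/1 positions and x/y positions in separate alphabets, which would need a per-position alphabet instead.

```python
import random


def mutate(chromosome, alphabet, p_mut, rng):
    """Replace each gene, with probability p_mut, by a different symbol."""
    out = []
    for gene in chromosome:
        if rng.random() < p_mut:
            # Choose uniformly among the other symbols of the alphabet.
            out.append(rng.choice([a for a in alphabet if a != gene]))
        else:
            out.append(gene)
    return "".join(out)


rng = random.Random(0)
print(mutate("0101", "01", 1.0, rng))            # every bit flips
print(mutate("x1yx0y0yy0x", "xy01", 0.05, rng))  # usually unchanged
```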
       Non-Binary Representations
• Integer, real-number, order-based, rules, ...

• Binary or Real-valued?
     for real-parameter problems, real-valued representations tend
      to give faster, more consistent, more accurate results


• High-level representation
   – intuitive, can utilize specialized operators
   – effective search over complex spaces
      Real-valued representation
Parent1:     3.45 0.56 6.78 0.976 2.5
Parent2:     0.98 1.06 4.20 0.34 1.8

Offspring1: 3.22 0.56 6.78 0.65 2.12
Offspring2: 1.43 1.06 4.20 0.41 1.93
            (Arithmetic crossover)
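Arithmetic crossover replaces gene values with weighted averages of the parents' values. In the slide's example, genes 2 and 3 are copied unchanged and genes 1, 4 and 5 are blended. The exact weights used there are not given, so this minimal sketch assumes a single weight `lam` applied to a chosen subset of genes: child1 = lam*p1 + (1 - lam)*p2 and child2 = (1 - lam)*p1 + lam*p2. The function name and parameters are illustrative.

```python
def arithmetic_crossover(p1, p2, lam, genes=None):
    """Blend the selected genes of two real-valued parents.

    lam in [0, 1] is an assumed blending weight; genes=None blends every
    position. Unselected genes are copied from the corresponding parent.
    """
    idx = range(len(p1)) if genes is None else genes
    c1, c2 = list(p1), list(p2)
    for i in idx:
        c1[i] = lam * p1[i] + (1 - lam) * p2[i]
        c2[i] = (1 - lam) * p1[i] + lam * p2[i]
    return c1, c2


parent1 = [3.45, 0.56, 6.78, 0.976, 2.5]
parent2 = [0.98, 1.06, 4.20, 0.34, 1.8]
# Blend genes 1, 4 and 5 (0-based indices 0, 3, 4), as in the slide's pattern.
print(arithmetic_crossover(parent1, parent2, 0.7, genes=[0, 3, 4]))
```

Each blended child gene lies between the parents' values, and the two children's values sum to the parents' sum, so the pair is centered on the same point as the parents.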
       High-level representation
Parent1:    {(1.2 x1 3.4) (5.8 x2 6.0) (0.2 x7 0.61)}
Parent2:    {(2.3 x6 4.1) (3.6 x2 5.1) (5.1 x4 5.61)
             (0.3 x3 1.1) (2.2 x9 2.7)}

Offspring1: {(1.2 x1 3.4) (2.2 x9 2.7) (5.1 x4 5.61)}
Offspring2: {(2.3 x6 4.1) [(3.6 x2 5.1) (5.8 x2 6.0)]
             (0.3 x3 1.1) (0.2 x7 0.61)}
      High-level representation
• Generalize/Specialize
   {(0.3 x3 1.1) (2.2 x9 2.7)}
        →  {(0.3 x3 1.1) (2.2 x9 2.7) (5.1 x4 6.2)}

   {(0.3 x3 1.1) (2.2 x9 2.7)}
        →  {(0.45 x3 0.9) (1.9 x9 2.9)}
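To make the rule representation concrete, this minimal sketch reads each triple (low xi high) as the condition low <= xi <= high and a rule as the AND of its conditions. That reading is an assumption, since the slides do not define the notation. The bracketed alternative in Offspring2, which may denote a disjunction, is not modeled. Under this reading, adding a condition specializes a rule (fewer records match), and widening an interval generalizes it. All function names are hypothetical.

```python
# A condition is (low, var, high); a rule is a list of conditions read as AND.


def matches(rule, record):
    """True if the record satisfies every interval condition of the rule."""
    return all(low <= record[var] <= high for low, var, high in rule)


def specialize(rule, condition):
    """Adding a condition can only shrink the set of matching records."""
    return rule + [condition]


def widen(rule, var, delta):
    """Generalize: widen the interval on `var` by `delta` on each side."""
    return [(low - delta, v, high + delta) if v == var else (low, v, high)
            for low, v, high in rule]


rule = [(0.3, "x3", 1.1), (2.2, "x9", 2.7)]
record = {"x3": 0.5, "x9": 2.5, "x4": 7.0}
print(matches(rule, record))                                  # True
narrower = specialize(rule, (5.1, "x4", 6.2))
print(matches(narrower, record))                              # False
print(matches(widen(narrower, "x4", 1.0), record))            # True
```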
Tree-structured representation (GP)
• Automated learning of programs (originally)
  – parse tree expressions
• Non-linear interaction terms
• Function set: internal nodes
  {+, -, *, /, log}
• Terminal set: leaf nodes
  {constants, variables}

  Example: (x/5) * log(y) = (x log(y))/5

        *
        ├── /
        │   ├── x
        │   └── 5
        └── log
            └── y
        Tree-structured representation
• Representing complex patterns

        if
        ├── AND
        │   ├── <
        │   │   ├── y
        │   │   └── 7
        │   └── >
        │       ├── x
        │       └── 2
        ├── 0
        └── +
            ├── *
            │   ├── x
            │   └── 2
            └── y

   If (y<7) and (x>2)
   then 0
   else 2x+y
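Both example trees can be evaluated by a small recursive interpreter. This is a minimal sketch with an assumed encoding: internal nodes are tuples (operator, child, ...), and leaves are numbers (constants) or strings (variables). The protected division and protected log, which return a fixed value instead of raising an error, are a common GP convention but an assumption here. The `if` node evaluates only the branch it selects.

```python
import math

# Function set (internal nodes). Protected "/" and "log" are assumptions.
FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else 1.0,
    "log": lambda a: math.log(abs(a)) if a != 0 else 0.0,
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
    "and": lambda a, b: a and b,
}


def evaluate(node, env):
    """Evaluate a parse tree given variable bindings in `env`."""
    if isinstance(node, str):        # terminal: variable
        return env[node]
    if not isinstance(node, tuple):  # terminal: constant
        return node
    op, *children = node
    if op == "if":                   # evaluate only the chosen branch
        cond, then_branch, else_branch = children
        return evaluate(then_branch if evaluate(cond, env) else else_branch, env)
    return FUNCTIONS[op](*(evaluate(c, env) for c in children))


expr = ("*", ("/", "x", 5), ("log", "y"))              # (x/5) * log(y)
rule = ("if", ("and", ("<", "y", 7), (">", "x", 2)),   # if (y<7) and (x>2)
        0,                                             # then 0
        ("+", ("*", "x", 2), "y"))                     # else 2x+y
print(evaluate(expr, {"x": 10.0, "y": math.e}))
print(evaluate(rule, {"x": 3, "y": 5}), evaluate(rule, {"x": 1, "y": 5}))
```

In GP, crossover and mutation operate on subtrees of this structure, so every subtree must evaluate to a value that its parent node accepts. The protected operators help guarantee this.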
           Genetic search: Issues
• Coding scheme, fitness function critical
    – the “art” in GA design!
    – General mechanism so robust that, within reasonable margins,
      parameter settings are not critical.

• Representation to match problem, domain
    – utilizing domain knowledge
        • problem-specific crossover, mutation, selection

• Flexibility in fitness function formulation
    – modeling business objectives
           Genetic search: Issues
• Stochastic search
   – initial populations, probabilistic operators
   – multiple runs with different random streams

   – Initializing population with known solutions
   – seeding initial population with solutions from multiple,
     independent runs
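Seeding and independent runs can be sketched as two small helpers. This is a minimal sketch; the names are hypothetical. `random_individual(rng)` and `run_once(rng)` stand in for whatever individual generator and complete GA run a project uses. Each run gets its own `random.Random` stream with a distinct seed, so runs are independent yet reproducible, and the best solutions they return can seed a later run.

```python
import random


def initial_population(pop_size, random_individual, seeds=(), rng=None):
    """Start from known solutions (seeds), then fill up with random individuals."""
    rng = rng or random.Random()
    pop = [list(s) for s in seeds][:pop_size]
    while len(pop) < pop_size:
        pop.append(random_individual(rng))
    return pop


def independent_runs(run_once, n_runs, base_seed=0):
    """Run n_runs searches on separate random streams; return each run's best."""
    return [run_once(random.Random(base_seed + k)) for k in range(n_runs)]


# Hypothetical stand-ins: an 8-bit random individual doubles as a trivial "run".
rand_ind = lambda rng: [rng.randint(0, 1) for _ in range(8)]
bests = independent_runs(rand_ind, n_runs=3, base_seed=5)
pop = initial_population(10, rand_ind, seeds=bests, rng=random.Random(1))
print(len(pop), pop[:3] == bests)
```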
        Genetic search: Issues
• Guarantees optimality?
  – But...

• GAs and traditional techniques
  – especially useful where traditional approaches fail
  – in conjunction with traditional techniques

• Parallelizable for large data
  – multi-processor, networked machines
                    Using GAs ?

•   When to use a GA?
•   GA and traditional techniques
•   How long does it take?
•   Will it perform better?
                    Using GAs
•   population size
•   mutation, crossover rates
•   how many generations
•   multiple runs
            Is it a “black-box”?




• Data characteristics
• Fitness function
• GA parameters
        GA Application Examples
• Function optimizers
   – difficult, discontinuous, multi-modal, noisy functions
• Combinatorial optimization
   – layout of VLSI circuits, factory scheduling, traveling
     salesman problem
• Design and Control
   – bridge structures, neural networks, communication networks
     design; control of chemical plants, pipelines
       GA Application Examples
• Machine learning
    – classification rules, economic modeling, scheduling strategies

Portfolio design, optimized trading models, direct
marketing models, sequencing of TV advertisements,
adaptive agents, data mining, etc.

				
posted:7/14/2011
language:English
pages:31