CC282
Genetic Algorithm
Lecture 6 slides for CC282 Machine Learning, R. Palaniappan, 2008 1
Lecture 06 – Outline
• Introduction
• GA terminology
• GA basic description
• Encoding of chromosomes
• Selection operator in GA
• Crossover and mutation operators in GA
• Applications
– Evolving ANN
– Genetic Programming
• Toy example
• Advantages and disadvantage of GA
Genetic Algorithm (GA) - Introduction
• GA is a part of evolutionary computation
• GA is inspired by Darwin’s theory of evolution -
problems are solved by an evolutionary process
resulting in the survival of the fittest
• Evolutionary computation (EC) was introduced in the 1960s by Rechenberg
• J. Holland invented GA in the 70s
• J. Koza used GA to evolve programs (GP) in 1992
Genetic Algorithm (GA) - Terminology
Living organisms consist of cells. Cells contain DNA
carrying the genetic material of the organism defining its
traits
• Chromosomes are strings of DNA and serve as a model for the whole organism
(genetic material)
• Genes - blocks of DNA of which the chromosomes consist. It can be said that each
gene encodes a trait or feature
• Alleles are possible values for a trait (i.e. the gene)
• Genome - a complete set of genetic material (i.e. all chromosomes), this is called a
population in GA
• Crossover is the operation when genes from parents combine to form a whole
new chromosome during reproduction producing offspring
• Mutation is when some elements of the genetic material are changed (normally
through a random procedure)
• Fitness of an organism is measured by its degree of success/failure in survival
Hypothesis/search space - revisited
• Each point is a possible solution
and has a fitness value
• Fitness measures how good the
solution is
• Fitness in this case is opposite to
error measure
• GA searches for the best/optimal
solution, though there is no
guarantee that it will find it
• GA finds a solution in an
evolutionary manner
• Other similar methods are hill
climbing, tabu search, simulated
annealing
GA – Basic description
Steps in brief:
• GA begins with an initial population, i.e. a set of solutions/chromosomes
• Fitness of each chromosome is computed
• Selection operators are applied that favour fitter chromosomes
• Crossover - chromosomes recombine to produce offspring, with the hope that by
recombination of parents, the offspring produced may be fitter than the parents
• Mutation operator is applied
• Assess the fitness of the new population – stop if the optimal solution is
achieved or if the maximum generation number is reached
• Else, repeat to next generation with selection, crossover, mutation operators
[Flowchart: START → Randomly generate an initial population → Evaluate fitness of
each individual → Terminate? If yes: STOP. If no: Select individuals to mate →
Generate offspring by crossover with probability, Pc → Generate offspring by
mutation with probability, Pm → Replace old population with new one → Evaluate
fitness of each individual]
The GA algorithm
GA(Fitness, Fitness_threshold, max_generation, popsize, Pc, Pm)
Fitness: A function that assigns an evaluation score, given a hypothesis
Fitness_threshold: A threshold specifying the termination criterion
Max_generation: The maximum generation number to terminate GA
popsize: The size of the population
Pc: Crossover probability, i.e. the fraction of the population to be replaced by crossover operator at each
generation
Pm: Mutation probability, i.e. the fraction of the population to be replaced by mutation operator at each
generation
• Initialise population: P ← Generate popsize random hypotheses
• Evaluate: for each h in P, compute Fitness(h)
• While [max_h Fitness(h)] < Fitness_threshold and generation < max_generation
1. Selection: Select popsize members of P (with replacement) to add to Pnext
2. Crossover: Pairs of hypotheses are randomly selected using Pc. For each pair,
<h1,h2>, produce two offspring by applying the crossover operator. Add all
offspring to Pnext
3. Mutate: Invert a randomly selected bit in random members of Pnext using
probability Pm
4. Update: P ← Pnext
5. Evaluate: for each h in P, compute Fitness(h)
• Return the hypothesis from P that has the highest fitness
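The algorithm above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the lecturer's implementation: hypotheses are lists of bits, fitness values are assumed non-negative, selection is fitness proportionate, crossover is single point and overwrites the chosen parents in Pnext, and mutation flips each bit independently with probability Pm. The names `ga`, `n_bits` and `seed` are made up for this sketch.

```python
import random

def ga(fitness, fitness_threshold, max_generation, popsize, pc, pm, n_bits, seed=0):
    """Simple bit-string GA (illustrative sketch; requires n_bits >= 2)."""
    rng = random.Random(seed)
    # Initialise population with popsize random hypotheses.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(popsize)]
    scores = [fitness(h) for h in pop]
    generation = 0
    while max(scores) < fitness_threshold and generation < max_generation:
        # 1. Selection: sample popsize members with replacement,
        #    weighted by fitness (assumed non-negative).
        weights = [s + 1e-9 for s in scores]
        nxt = [list(h) for h in rng.choices(pop, weights=weights, k=popsize)]
        # 2. Crossover: Pc * popsize / 2 disjoint pairs, single point.
        n_pairs = int(pc * popsize / 2)
        idx = rng.sample(range(popsize), 2 * n_pairs)
        for a, b in zip(idx[::2], idx[1::2]):
            cut = rng.randint(1, n_bits - 1)
            nxt[a][cut:], nxt[b][cut:] = nxt[b][cut:], nxt[a][cut:]
        # 3. Mutation: flip each bit with probability Pm (assumed variant).
        for h in nxt:
            for i in range(n_bits):
                if rng.random() < pm:
                    h[i] ^= 1
        # 4. Update and 5. Evaluate.
        pop = nxt
        scores = [fitness(h) for h in pop]
        generation += 1
    best = max(range(popsize), key=scores.__getitem__)
    return pop[best], scores[best], generation

# Example: maximise the number of 1 bits in a 16-bit string.
best, score, generations = ga(sum, 16, 50, 20, 0.5, 0.01, 16)
```

The loop stops as soon as either the fitness threshold is met or the generation limit is reached, which is the termination rule described in the brief steps.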
GA – Some preliminary design questions
• Encoding
– GA operates on the coding of parameters rather than the parameter
itself
– These parameters are called chromosomes and are a string of values
which represent potential solutions to the given problem
– The encoding could be binary, decimal or continuous – which to use?
• Constraints - Are there any constraints on the gene values?
• Fitness – How to obtain the fitness for each chromosome?
• Selection - How to select candidate chromosomes?
• The other two operators - How to perform Crossover and
Mutation?
Chromosomes – binary representation
• Chromosomes are mostly represented by a string of bits
• Each bit/group of bits represents some characteristic/attribute/feature
• Values of each feature are checked
– represent each feature with enough bits to cover all possible values
• Recall the play-tennis example:
• Wind : {strong, weak} can be represented by two bits
• Example:
• Wind = strong: {10}, Wind = weak: {01}, Wind = strong or weak: {11}
• Outlook: {cloudy, rainy, sunny} can be represented by three bits
e.g. Outlook = cloudy or rainy is represented as 110
• So, for a rule such as (Outlook = cloudy ∨ rainy) ∧ (Wind = strong), the
chromosome representation is 11010
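As a small illustration of this one-bit-per-value encoding, the sketch below builds the chromosome for a rule. The function names and the ordering of values within each attribute are assumptions chosen to match the slide's example.

```python
# Value order per attribute (assumed to match the slide: one bit per value).
OUTLOOK = ["cloudy", "rainy", "sunny"]
WIND = ["strong", "weak"]

def encode(allowed, values):
    """Return one bit per possible value: 1 if the rule allows it, else 0."""
    return "".join("1" if v in allowed else "0" for v in values)

def encode_rule(outlook, wind):
    """Concatenate the Outlook bits and the Wind bits into one chromosome."""
    return encode(outlook, OUTLOOK) + encode(wind, WIND)

print(encode_rule({"cloudy", "rainy"}, {"strong"}))  # 11010
```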
Binary and decimal coding chromosomes
• Let us consider a more general situation
• Assume we have three variables, x, y and z
• Decimal coding is simply the integer values for genes, eg: x=35, y=191, z=5
• Binary coding – the genes are coded in binary form
• Let us assume that these variables can take integer values from 0 to 255
• So, we need 8 bits for each variable (i.e. gene)
• If x =35, y=191, z=5, we have
– x=00100011, y=10111111, z=00000101
– And the chromosome 001000111011111100000101
• But why go through the hassle of representing integers using binary
coding?
– Answer (see Exercise 6, question 4)
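The binary coding of integer genes can be reproduced with a few lines of Python. This is only a sketch; `encode_genes` and `decode_genes` are hypothetical helper names, and 8 bits per gene is the assumption taken from the slide.

```python
def encode_genes(values, bits=8):
    """Concatenate fixed-width binary codes of the integer genes."""
    for v in values:
        if not 0 <= v < 2 ** bits:
            raise ValueError(f"{v} does not fit in {bits} bits")
    return "".join(format(v, f"0{bits}b") for v in values)

def decode_genes(chromosome, bits=8):
    """Split the chromosome into genes and convert each back to an integer."""
    return [int(chromosome[i:i + bits], 2) for i in range(0, len(chromosome), bits)]

chromosome = encode_genes([35, 191, 5])
print(chromosome)                # 001000111011111100000101
print(decode_genes(chromosome))  # [35, 191, 5]
```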
Continuous coding chromosomes
• But what if we want genes to represent continuous values eg: x=0.67, y=1.56,
z=3.45
• Solution: use binary chromosome with approximation or use continuous valued
chromosomes
• We will not cover continuous valued chromosomes in this course
– As they require special type of GA operators
• Binary chromosome with approximation eg: x=0.145 (assume 8 bits per gene)
– Use the general equation: $x_{continuous} = \dfrac{x_{decimal} - x_{min}}{x_{max} - x_{min}}$,
so that $x_{decimal} = \mathrm{round}\big(x_{continuous}\,(x_{max} - x_{min}) + x_{min}\big)$
– With 8 bits, xmax=255 and xmin=0
– 0.145*255=36.975, round this to 37, so x =00100101
– So, x=00100101 is an approximation of x=0.145
– More bits will improve the approximation but computation becomes time consuming
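The approximation can be checked with a short sketch. It assumes the continuous value lies in [0, 1] (the `lo` and `hi` parameters, which are not in the slides, generalise this) and that an n-bit gene covers the integer codes 0 to 2^n - 1.

```python
def encode_real(x, bits=8, lo=0.0, hi=1.0):
    """Quantise x in [lo, hi] to the nearest of 2**bits integer levels."""
    levels = 2 ** bits - 1
    code = round((x - lo) / (hi - lo) * levels)
    return format(code, f"0{bits}b")

def decode_real(gene, lo=0.0, hi=1.0):
    """Map a binary gene back to its approximate continuous value."""
    levels = 2 ** len(gene) - 1
    return lo + int(gene, 2) / levels * (hi - lo)

gene = encode_real(0.145)
print(gene)               # 00100101
print(decode_real(gene))  # close to 0.145, off by less than half a level
```

Using more bits shrinks the quantisation step (1/255 here), at the cost of longer chromosomes.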
Fitness function and gene constraints – an example
• Let us consider a linear programming problem, which arises naturally in production
planning:
• Suppose a particular Ford plant can build Escorts at the rate of one per minute,
Explorers at the rate of one every 2 minutes, and Lincoln Navigators at the rate of
one every 3 minutes. The vehicles get 8, 5, and 4 miles per litre, respectively, and
Parliament mandates that the average fuel economy of vehicles produced be at
least 6 miles per litre. Ford loses £1000 on each Escort, but makes a profit of £5000
on each Explorer and £15,000 on each Navigator. What is the maximum profit this
Ford plant can make in one 8-hour day?
• The fitness function here is the cost function, i.e. the profit Ford can make by
building x Escorts, y Explorers, and z Navigators
• And we want to maximize it
• The fitness function is f=-1000x+5000y+15000z
Gene constraints
• Using the same example in the previous slide:
• The constraints arise from the production times and Parliament mandate
on fuel economy
• There are 480 minutes in an 8-hour day, and so the production times for
the vehicles lead to the following limit:
x + 2y + 3z ≤ 480
• The average fuel economy restriction can be written:
8x + 5y + 4z ≥ 6(x + y + z), which simplifies to 2x − y − 2z ≥ 0
• There is an additional implicit constraint that the variables are all
non-negative:
x, y, z ≥ 0
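One way to combine the fitness function with these constraints is sketched below. Giving infeasible chromosomes the worst possible score is an assumption made for this illustration; the slides do not say how constraint violations should be handled, and a penalty term would be another option.

```python
def profit(x, y, z):
    """Daily profit in pounds for x Escorts, y Explorers, z Navigators."""
    return -1000 * x + 5000 * y + 15000 * z

def is_feasible(x, y, z):
    """Check production time, fuel economy and non-negativity constraints."""
    return (
        x + 2 * y + 3 * z <= 480       # minutes available in an 8-hour day
        and 2 * x - y - 2 * z >= 0     # average of at least 6 miles per litre
        and min(x, y, z) >= 0
    )

def fitness(x, y, z):
    """Profit for feasible plans; infeasible plans score minus infinity (assumed)."""
    return profit(x, y, z) if is_feasible(x, y, z) else float("-inf")

print(fitness(100, 0, 100))  # feasible plan
print(fitness(0, 0, 160))    # violates the fuel economy constraint
```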
Selection
• Selection (aka reproduction) operator is applied many times
to produce a mating pool of the new population
• There are a number of ways to do selection to ensure that the
members of the population are drawn with the correct
probability
– Roulette wheel (fitness proportionate) selection
– Tournament selection
– Steady-state selection
– Rank selection
– Elitism
Roulette wheel (fitness proportionate) selection
• Chromosomes are selected according to their proportionate fitness
$\mathrm{Fitness}_{proportionate}(h_i) = \dfrac{\mathrm{Fitness}(h_i)}{\sum_{j=1}^{popsize} \mathrm{Fitness}(h_j)}$
• The higher their fitness, the more chances they have to be selected
• Sampling can be viewed as playing a game of roulette where the pocket sizes are
proportional to the probability of selecting a particular individual
• Each new member of the population is drawn independently when the roulette
wheel is spun randomly
• In a computer, this spin is done using a randomly generated number in [0,1]
• But the best (so far) found solution may be lost, eg: Pnext={B,B,C}
Example:
fitness_chromosome A = 6.0 → 180°
fitness_chromosome B = 4.0 → 120°
fitness_chromosome C = 2.0 → 60°
Random number generated is 0.29 (about 104.4°), so chromosome A is selected; repeat
this process two more times to obtain three chromosomes for Pnext
Since there is the possibility of A,A,A for Pnext, this could result in ‘overcrowding’
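A roulette wheel spin can be sketched by walking along the cumulative fitness until the random number is passed. The function names are hypothetical, and the pockets are assumed to be laid out in population order starting from 0.

```python
import random
from itertools import accumulate

def roulette_pick(fitnesses, r):
    """Return the index whose pocket contains r, where r is in [0, 1)."""
    total = sum(fitnesses)
    for i, cumulative in enumerate(accumulate(fitnesses)):
        if r * total < cumulative:
            return i
    return len(fitnesses) - 1  # guard against rounding at the very end

def roulette_select(population, fitnesses, k, rng=random):
    """Draw k members independently, with replacement."""
    return [population[roulette_pick(fitnesses, rng.random())] for _ in range(k)]

# Slide example: A = 6.0, B = 4.0, C = 2.0 and a spin of 0.29 lands on A.
print("ABC"[roulette_pick([6.0, 4.0, 2.0], 0.29)])
print(roulette_select("ABC", [6.0, 4.0, 2.0], 3))
```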
Selection (ctd)
• Tournament selection
– Pick a few chromosomes (say, popsize/4 chromosomes) at random from the
population
– From these few, select the fittest one (i.e. with highest fitness), return the
rest to the population and repeat the process popsize times
– This method can retain some good chromosomes while giving chance for
other weaker chromosomes to take part in mating
• Steady-state selection
– A few good (with high fitness) chromosomes are selected to replace the few
bad (with low fitness) chromosomes
– The rest of population (the in-between fitness ones) are selected by other
methods or all are selected to remain in Pnext
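Tournament selection as described above might look like the following sketch. The tournament size `k` and the function name are assumptions; contestants are drawn without replacement within one tournament, and the population is assumed to be a list.

```python
import random

def tournament_select(population, fitness, k, rng=random):
    """Run len(population) tournaments of size k and keep each winner."""
    winners = []
    for _ in range(len(population)):
        contestants = rng.sample(population, k)   # pick a few at random
        winners.append(max(contestants, key=fitness))
    return winners

population = list(range(20))          # toy chromosomes: fitness = value
mating_pool = tournament_select(population, lambda h: h, k=5)
print(mating_pool)
```

Smaller values of `k` give weaker chromosomes more chance to win a tournament; `k` equal to the population size always returns the single best chromosome.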
Selection (ctd)
• Rank selection
– The other selection methods will have problems if the fitness
differs a lot
– For example, if the best chromosome fitness is 90% of the total fitness,
then using roulette wheel, the other chromosomes will have very
few chances to be selected
– Rank selection first ranks the population and then every
chromosome receives fitness from this ranking (i.e. probability of
selection is proportional to rank)
– The worst will have fitness 1, second worst 2 etc and the best will
have fitness N (number of chromosomes in population)
– Then, using these new fitness values, roulette wheel selection
method is performed
(Figure from http://cs.felk.cvut.cz/~xobitko/ga/selection.html)
– Using this, all the chromosomes have a fair chance to be selected
– But this method can lead to slower convergence, because the
best chromosomes do not differ so much from other ones
• Elitism
– First, copies the best chromosome (or a few best chromosomes) to new population
– The rest is done using any other selection method, normally roulette wheel
– Can very rapidly increase performance of GA, as it prevents losing the best found solution
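The rank-based selection probabilities can be computed as in this sketch (a hypothetical helper, with ties broken by position in the list). The example uses one chromosome that holds 90% of the total raw fitness.

```python
def rank_probabilities(fitnesses):
    """Selection probability proportional to rank: worst = 1, best = N."""
    n = len(fitnesses)
    order = sorted(range(n), key=lambda i: fitnesses[i])  # worst first
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    total = n * (n + 1) // 2          # 1 + 2 + ... + N
    return [r / total for r in ranks]

# Raw fitness would give the first chromosome a 90% share of the wheel;
# with ranks its share drops to 4/10.
print(rank_probabilities([90, 5, 3, 2]))
```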
Crossover
• Even though reproduction increases the percentage of better fitness
chromosomes, the procedure is considerably sterile; it cannot create new
and better chromosomes
• This function is left over to crossover and to a lesser but critical extent, to
mutation
• Crossover process simulates the exchange of genetic material that occurs
during biological reproduction
• In this process pairs in the breeding population are mated randomly with a
crossover rate, Pc
• Typically, an offspring inherits the features its parents have in common, and
can also combine features in which the two parents differ
• Popular crossover techniques: one point, two point and uniform crossover
Crossover (ctd)
• First, randomly select a pair of parents (i.e. two chromosomes)
• Perform crossover (swapping of bits) to obtain offspring, repeat this
process Pc*popsize/2 times with the used parent chromosomes not
included
• Example: if Pc=0.5 and popsize=20, then do crossover 5 times
• Single point and two-point crossover:
[Figure: single point crossover and two point crossover, with the crossover points marked]
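Single point and two point crossover on bit strings can be sketched as below. The function names and the explicit cut positions are choices made for this illustration; in a GA the cut points would be drawn at random.

```python
def one_point(p1, p2, cut):
    """Swap everything after position cut."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def two_point(p1, p2, a, b):
    """Swap the segment between positions a and b."""
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

def n_crossovers(pc, popsize):
    """Number of parent pairs to cross: Pc * popsize / 2."""
    return int(pc * popsize / 2)

print(one_point("11111111", "00000000", 3))     # ('11100000', '00011111')
print(two_point("11111111", "00000000", 2, 5))  # ('11000111', '00111000')
print(n_crossovers(0.5, 20))                    # 5
```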
Crossover (ctd)
• The uniform crossover scheme works as follows
• A randomly generated bit string called the crossover mask
generalises the process
• A bit value of 1 in this bit string indicates that corresponding
bits in the parents are to be exchanged while a 0 bit indicates
no bit interchange
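A minimal sketch of uniform crossover with an explicit mask follows. The mask is passed in as a bit string here so the result is reproducible; in practice it would be generated randomly.

```python
def uniform_crossover(p1, p2, mask):
    """Exchange the parents' bits wherever the mask holds a 1."""
    child1 = "".join(b if m == "1" else a for a, b, m in zip(p1, p2, mask))
    child2 = "".join(a if m == "1" else b for a, b, m in zip(p1, p2, mask))
    return child1, child2

print(uniform_crossover("11111111", "00000000", "10100101"))
# ('01011010', '10100101')
```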
Mutation
• Mutation consists of making small alterations to the values of one or more genes in
a chromosome
• Mutation randomly perturbs the population’s characteristics, and prevents
evolutionary dead ends
• Most mutations are damaging rather than beneficial and hence mutation rate must
be low to avoid the destruction of species
• It works by randomly selecting a bit with a certain mutation rate in the string and
reversing its value
• Mutation is applied to the randomly chosen bit in a chromosome chosen randomly
• If Pm is 0.01, with a popsize of 20 with 18 bits each, then the mutation is repeated
for 0.01 x 18 x 20 =3.6 ≈4 times
[Figure: mutation example (for a randomly chosen bit in a randomly chosen chromosome)]
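Bit-flip mutation with the mutation count computed as on this slide can be sketched as follows. The helper names are hypothetical, and the same bit may be picked more than once, which is an assumption the slides do not address.

```python
import random

def flip_bit(chromosome, i):
    """Reverse the value of bit i in a bit string."""
    flipped = "1" if chromosome[i] == "0" else "0"
    return chromosome[:i] + flipped + chromosome[i + 1:]

def mutate_population(population, pm, rng=random):
    """Flip round(Pm * bits * popsize) randomly chosen bits."""
    n_bits = len(population[0])
    n_mutations = round(pm * n_bits * len(population))
    mutated = list(population)
    for _ in range(n_mutations):
        j = rng.randrange(len(mutated))            # random chromosome
        mutated[j] = flip_bit(mutated[j], rng.randrange(n_bits))  # random bit
    return mutated

population = ["0" * 18] * 20
print(round(0.01 * 18 * 20))   # 4 mutations per generation
print(mutate_population(population, 0.01))
```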
Applications
• The possible applications of genetic algorithm are
immense
• Any problem that has a large search domain could be
suitably tackled by GA
• We shall explore (very briefly) the use of GA to
evolve neural network weights and to evolve
functions/programs in genetic programming
• We’ll also look at a simple toy example
Evolving NN weights using GA – a simple example
• GA has been used successfully to evolve NN weights
• GA is suitable for evolving the weights of a neural network –
standard learning techniques such as backpropagation may
take thousands of iterations to converge
• But GA could (given the appropriate direction) evolve suitable
weights within a hundred or so iterations
• Example
• Obtain the weights for perceptron unit for learning the OR
function (we saw this in the previous lecture)
• But rather than using backpropagation to update the weights, we
can use GA
[Figure: a simple artificial neuron model, with bias input x0 = 1 (weight w0),
inputs x1 and x2 (weights w1 and w2), weighted sum z and output y]
Evolving NN weights using GA – a simple example
1. Initial parameters
– Fitness function: 1/MSE of desired to actual output, GA will maximise this
fitness function
$\text{Fitness function} = \dfrac{1}{MSE} = \dfrac{1}{\frac{1}{4}\sum_{i=1}^{4}(y_i - d_i)^2}$
– Coding, binary approximation: w1, w2 and w0 weights, say with each 6 bits, so
chromosome length is 18
– Popsize=20, i.e. 20 chromosomes, initially generated randomly
– Pc=0.5, Pm=0.01
– MSE_limit=0.1, so, fitness_threshold=10; max_generation=100
2. Gene constraints, w1, w2 and w0 in the range [-1,1]
3. Apply selection (say, tournament selection), crossover (say one point) and
mutation to produce a new population
4. Repeat step 3 until convergence to an acceptable solution
(fitness>fitness_threshold or generation>max_generation)
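The fitness evaluation for this setup could be sketched as follows. Several details are assumptions made for the illustration: the 18-bit chromosome is laid out as w1, w2, w0 with 6 bits each, the neuron uses a step activation (output 1 when the weighted sum is positive), and a perfect chromosome (MSE of 0) is given infinite fitness to avoid dividing by zero.

```python
BITS = 6
# (x1, x2) -> desired output d for the OR function
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

def decode(gene, lo=-1.0, hi=1.0):
    """Map a binary gene to a weight in [lo, hi]."""
    return lo + int(gene, 2) / (2 ** len(gene) - 1) * (hi - lo)

def fitness(chromosome):
    """Return 1/MSE of the perceptron encoded by an 18-bit chromosome."""
    w1, w2, w0 = (decode(chromosome[i:i + BITS]) for i in range(0, 3 * BITS, BITS))
    squared_errors = [
        ((1 if w0 + w1 * x1 + w2 * x2 > 0 else 0) - d) ** 2
        for (x1, x2), d in OR_DATA
    ]
    mse = sum(squared_errors) / len(OR_DATA)
    return float("inf") if mse == 0 else 1.0 / mse

print(fitness("111111" + "111111" + "010000"))  # w1 = w2 = 1, w0 about -0.49
print(fitness("000000" * 3))                    # all weights -1
```

A function like this can be handed to any GA loop that maximises fitness over 18-bit strings.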
Genetic programming (GP) – An example
• In programming languages such as LISP, the mathematical notation is not written
in standard notation, but in prefix notation
– Examples:
+ 1 2 : 1+2
* + 1 2 2 : (1+2)*2
* + - 2 1 4 9 : ((2-1)+4)*9
– Notice the difference between the left-hand side and the right? Apart from the order
being different, there is no use of parentheses
– The prefix method makes life a lot easier for programmers and compilers alike, because
order precedence is not an issue
• You can build expression trees out of these strings that then can be easily
evaluated. For example, the trees for the previous three expressions are:
[Figure: expression trees for the three expressions above]
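To make the link between prefix strings and expression trees concrete, here is a small sketch that parses a prefix string into a nested tuple and evaluates it. It assumes space-separated tokens and binary operators only; the function names are made up for this example.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def parse(tokens):
    """Consume tokens from an iterator and build (op, left, right) trees."""
    token = next(tokens)
    if token in OPS:
        left = parse(tokens)
        right = parse(tokens)
        return (token, left, right)
    return float(token)

def evaluate(tree):
    """Evaluate an expression tree recursively."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left), evaluate(right))
    return tree

def eval_prefix(text):
    return evaluate(parse(iter(text.split())))

print(eval_prefix("+ 1 2"))          # 3.0
print(eval_prefix("* + 1 2 2"))      # 6.0
print(eval_prefix("* + - 2 1 4 9"))  # 45.0
```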
Genetic programming (GP) –An example (ctd)
• Having numerical data and primitive functions, but no expression to
conjoin the data with the primitive functions, a genetic algorithm
can be used to evolve an expression tree to create a very close fit to
the data
• By “splicing” and “grafting” the trees and evaluating the resulting
expression with the data and testing it to the primitive functions,
the fitness function can return how close the expression is
• The limitations of genetic programming lie in the huge search space
the GA has to search - an infinite number of equations
• Therefore, normally before running a GA to search for an equation,
the user tells the program which primitive functions to search
under
Genetic programming (GP) – An example (ctd)
• Assume we have data like the following and we wish to obtain the function
that maps z using x and y
x y z
0.1 0.5 0.81
0.3 0.4 0.99
0.6 0.2 1.31
. . .
. . .
. . .
0.4 0.5 1.20
• Assume the only available primitive functions are sin, +, sqr, sqrt
• GP will splice and graft the trees using these primitive functions with the
fitness function to minimise prediction error of z using x and y data as
above
Genetic programming (GP) – example (ctd)
• Crossover example in GP: [Figure: crossover example]
• Mutation randomly changes the primitive function
• The actual function is $z = \sin(x) + \sqrt{x^2 + y}$
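A GP fitness evaluation for this example might be sketched as below. Expression trees are nested tuples, the primitive set is assumed to be sin, +, sqr and sqrt, and only the four data rows printed on the slide are used. The target tree reads the formula as z = sin(x) + sqrt(x^2 + y), which is an interpretation that agrees with those rows to about two decimal places.

```python
import math

# (x, y, z) rows shown on the slide
DATA = [(0.1, 0.5, 0.81), (0.3, 0.4, 0.99), (0.6, 0.2, 1.31), (0.4, 0.5, 1.20)]

UNARY = {"sin": math.sin, "sqr": lambda v: v * v, "sqrt": math.sqrt}

def evaluate(tree, x, y):
    """Evaluate a tree built from 'x', 'y', '+', and the unary primitives."""
    if tree == "x":
        return x
    if tree == "y":
        return y
    op, *args = tree
    values = [evaluate(a, x, y) for a in args]
    return values[0] + values[1] if op == "+" else UNARY[op](values[0])

def mse(tree):
    """Mean squared prediction error of z over the data (lower is fitter)."""
    return sum((evaluate(tree, x, y) - z) ** 2 for x, y, z in DATA) / len(DATA)

TARGET = ("+", ("sin", "x"), ("sqrt", ("+", ("sqr", "x"), "y")))
print(mse(TARGET))        # small: the tree fits the data closely
print(mse(("sin", "x")))  # much larger: a poor candidate
```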
Toy example
• Consider: a + 2b + 3c + 4d = 30, where a, b, c, d are positive integers
• Use GA to find a, b, c and d
– Assume decimal coding is used
– Choose say 5 random initial solution sets (i.e. popsize=5) forming the
initial population with the constraint 1 ≤ a, b, c, d ≤ 30
Chromosome (a, b, c, d)
1 (1, 28, 15, 3)
2 (14, 9, 2, 4)
3 (13, 5, 7, 3)
4 (23, 8, 16, 19)
5 (9, 13, 5, 2)
Example (ctd)
• Calculate the fitness value for each chromosome, i.e. calculate the absolute
difference of each expression to 30, take inverse, this will be our fitness value
• Eg: Chromosome 1, expression=1+2*28+3*15+4*3=114
Chromosome Absolute diff Fitness value
1 |114-30|=84 1/84
2 |54-30|=24 1/24
3 |56-30|=26 1/26
4 |163-30|=133 1/133
5 |58-30|=28 1/28
– Since absolute differences that are lower are closer to the desired answer (30), these values are more
desirable
– So, take the inverse of the absolute difference as fitness value
– Now, GA will try to maximise the fitness value
– In order to create a system where chromosomes with more desirable fitness values are more likely to be
chosen as parents, we have to do selection
– Assume we use the roulette wheel (fitness proportionate) method
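The fitness calculation for the initial population can be reproduced with a short sketch. Returning infinite fitness for an exact solution is an assumption added here to avoid a division by zero.

```python
POPULATION = [(1, 28, 15, 3), (14, 9, 2, 4), (13, 5, 7, 3), (23, 8, 16, 19), (9, 13, 5, 2)]

def expression(chromosome):
    a, b, c, d = chromosome
    return a + 2 * b + 3 * c + 4 * d

def abs_diff(chromosome, target=30):
    """Absolute difference between the expression value and the target."""
    return abs(expression(chromosome) - target)

def fitness(chromosome):
    """Inverse of the absolute difference (infinite for an exact solution)."""
    diff = abs_diff(chromosome)
    return float("inf") if diff == 0 else 1 / diff

for i, chromosome in enumerate(POPULATION, start=1):
    print(i, expression(chromosome), abs_diff(chromosome), fitness(chromosome))
```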
Example (ctd)
• Calculate the fitness proportion (likelihood) for each chromosome to be
picked/selected as parent. e.g. take the sum of the all fitness values (0.135266),
and calculate the percentages from there
• Use $\mathrm{Fitness}_{proportionate}(h_i) = \dfrac{\mathrm{Fitness}(h_i)}{\sum_{j=1}^{popsize} \mathrm{Fitness}(h_j)}$
Chromosome Fitness proportion
1 (1/84)/0.135266 = 8.80%
2 (1/24)/0.135266 = 30.8%
3 (1/26)/0.135266 = 28.4%
4 (1/133)/0.135266 = 5.56%
5 (1/28)/0.135266 = 26.4%
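The proportions in the table can be checked with a few lines. The absolute differences are taken from the toy example; the function name is made up for this sketch.

```python
DIFFS = [84, 24, 26, 133, 28]   # absolute differences of chromosomes 1 to 5

def fitness_proportions(diffs):
    """Return the total fitness and each chromosome's share of it."""
    fitnesses = [1 / d for d in diffs]
    total = sum(fitnesses)
    return total, [f / total for f in fitnesses]

total, proportions = fitness_proportions(DIFFS)
print(round(total, 6))
print([round(100 * p, 2) for p in proportions])
```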
Example (ctd)
• Spin the roulette wheel for 5 times
• Assume the result was
Chromosome   Chromosome selected after spinning roulette wheel
1 1
2 2
3 5
4 5
5 3
• Since chromosome 4 had a poor fitness, its chances of survival were
slim and it died out in the selection process
Example (ctd)
• Do crossover, say single point
• The offspring of each of these parents contains the genetic information of both
father and mother
For example;
• If a father has the solution set a1, b1, c1, d1, and a mother has the solution set a2,
b2, c2, d2, then there are three possible pairs of crossover offspring (| =
crossover point):
Father chromosome Mother chromosome Offspring
a1 | b1, c1, d1 a2 | b2, c2, d2 a1, b2, c2, d2 or a2, b1, c1, d1
a1, b1 | c1, d1 a2, b2 | c2, d2 a1, b1, c2, d2 or a2, b2, c1, d1
a1, b1, c1 | d1 a2, b2, c2 | d2 a1, b1, c1, d2 or a2, b2, c2, d1
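The three possible offspring pairs can be enumerated programmatically, as in this sketch for tuple chromosomes (the function name is hypothetical).

```python
def single_point_offspring(father, mother):
    """List both offspring for every possible single crossover point."""
    pairs = []
    for cut in range(1, len(father)):
        pairs.append((father[:cut] + mother[cut:], mother[:cut] + father[cut:]))
    return pairs

father = ("a1", "b1", "c1", "d1")
mother = ("a2", "b2", "c2", "d2")
for first, second in single_point_offspring(father, mother):
    print(first, "or", second)
```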
Example (ctd)
• Assume that through random parent selections, we have the following
parent chromosomes
• Applying crossover to our example to produce one offspring for each pair of
parents (assuming the crossover points are chosen randomly):
• Note: normally, there would be two offspring from parents but for simplicity
of discussion, assume only one offspring is produced here
Father chromosome Mother Chromosome Offspring
(13 | 5, 7, 3) (1 | 28, 15, 3) (13, 28, 15, 3)
(9, 13 | 5, 2) (14, 9 | 2, 4) (9, 13, 2, 4)
(13, 5, 7 | 3) (9, 13, 5 | 2) (13, 5, 7, 2)
(14 | 9, 2, 4) (9 | 13, 5, 2) (14, 13, 5, 2)
(13, 5 | 7, 3) (9, 13 | 5, 2) (13, 5, 5, 2)
Example (ctd)
• Apply mutation to a randomly chosen chromosome, say gene a in
chromosome 1
• Mutation here would replace the randomly selected gene value with a random
value in the allowed range
(13, 28, 15, 3) → (8, 28, 15, 3)
• Recalculate the fitness value for the offspring representing the new
generation:
Offspring chromosome Absolute difference Fitness value
(8, 28, 15, 3) |121-30|=91 1/91
(9, 13, 2, 4) |57-30|=27 1/27
(13, 5, 7, 2) |52-30|=22 1/22
(14, 13, 5, 2) |63-30|=33 1/33
(13, 5, 5, 2) |46-30|=16 1/16
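The fitness of the new generation, and its average compared with the initial population, can be recomputed with a sketch like the one below. The helper names are made up, and the code assumes no chromosome is an exact solution (difference of 0).

```python
INITIAL = [(1, 28, 15, 3), (14, 9, 2, 4), (13, 5, 7, 3), (23, 8, 16, 19), (9, 13, 5, 2)]
OFFSPRING = [(8, 28, 15, 3), (9, 13, 2, 4), (13, 5, 7, 2), (14, 13, 5, 2), (13, 5, 5, 2)]

def abs_diff(chromosome, target=30):
    a, b, c, d = chromosome
    return abs(a + 2 * b + 3 * c + 4 * d - target)

def mean_fitness(population):
    """Average of 1/|difference| over the population (differences assumed non-zero)."""
    return sum(1 / abs_diff(c) for c in population) / len(population)

print([abs_diff(c) for c in OFFSPRING])
print(mean_fitness(INITIAL), mean_fitness(OFFSPRING))
```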
Example - Commentary
• The average fitness value for the offspring chromosomes was about 0.037,
while the average fitness value for the initial (parent) population was about 0.027
• Progressing at this rate, one chromosome should eventually reach a very
high fitness value (i.e. when the absolute difference is close to 0), that is when
an optimal solution is found
• If you simulate this yourself, you may actually get a fitness
average that is lower in some generations, but in the long run, the
fitness levels will increase
• For systems where the population is larger (say 50, instead of 5), the
fitness levels should approach the desired level more steadily and
stably, i.e. nearly every generation will have better solutions than previous
ones
GA strengths and weaknesses
Advantages
• Often achieves good results
• In most cases, fitness function can be designed easily to fit the hypothesis
(solution)
• Can be easily hybridised with many other ML algorithms to yield improved
results
• There are no hard and fast rules, many users use variations freely in their
applications
Disadvantage
• There is no guarantee that GA converges to the optimal solution
– Because of incomplete searches
– Because of hypothesis crowding, i.e. most chromosomes become similar and
the fitness is high but not best and GA can’t progress further due to lack of
variety
Lecture 6: Study guide
At the end of this section, you should be able to
• Define chromosome, gene, allele, crossover, mutation, fitness function
• Describe how GA works using a flowchart or an algorithm
• Explain how chromosomes and hypotheses are represented in GA, i.e.
coding in GA
• Estimate the fitness function of a given population
• Describe chromosome selection mechanisms
• Perform crossover between two chromosomes using single point, two point
and uniform (mask) crossover
• Perform mutation
• Explain how GA can be used to evolve NN weights
• State the main advantages and disadvantage of GA