CC282 Machine Learning – Lecture 6: Genetic Algorithms (R. Palaniappan, 2008)

Lecture 06 – Outline
• Introduction
• GA terminology
• GA basic description
• Encoding of chromosomes
• Selection operator in GA
• Crossover and mutation operators in GA
• Applications
  – Evolving ANN
  – Genetic programming
• Toy example
• Advantages and disadvantages of GA

Genetic Algorithm (GA) – Introduction
• GA is a part of evolutionary computation (EC)
• GA is inspired by Darwin's theory of evolution: problems are solved by an evolutionary process resulting in the survival of the fittest
• EC was introduced in the 1960s by Rechenberg
• J. Holland invented GA in the 1970s
• J. Koza used GA to evolve programs (genetic programming, GP) in 1992

Genetic Algorithm (GA) – Terminology
• Living organisms consist of cells. Cells contain DNA, the genetic material of the organism that defines its traits
• Chromosomes are strings of DNA and serve as a model for the whole organism (genetic material)
• Genes are blocks of DNA of which the chromosomes consist; each gene can be said to encode a trait or feature
• Alleles are the possible values for a trait (i.e. for a gene)
• Genome is a complete set of genetic material (i.e. all chromosomes); in GA this is called a population
• Crossover is the operation in which genes from two parents combine during reproduction to form a whole new chromosome, producing offspring
• Mutation is when some element of the genetic material is changed, normally through a random procedure
• Fitness of an organism is measured by its degree of success or failure in survival

Hypothesis/search space – revisited
• Each point in the space is a possible solution and has a fitness value
• Fitness measures how good the solution is; fitness here is the opposite of an error measure
• GA searches for the best/optimal solution, though there is no guarantee that it will find it
• GA finds a solution in an evolutionary manner
• Other similar search methods are hill climbing, tabu search and simulated annealing

GA – Basic description
Steps in brief:
• GA begins with an initial population, i.e. a set of solutions/chromosomes
• The fitness of each chromosome is computed
• Selection operators are applied that favour fitter chromosomes
• Crossover: chromosomes recombine to produce offspring, with the hope that by recombining parents the offspring may be fitter than the parents
• The mutation operator is applied
• Assess the fitness of the new population; stop if the optimal solution is achieved or if the maximum generation number is reached
• Else, replace the old population with the new one and repeat for the next generation with the selection, crossover and mutation operators
• (Flowchart: START → randomly generate an initial population → evaluate fitness of each individual → terminate? yes: STOP; no: select individuals to mate → generate offspring by crossover with probability Pc → generate offspring by mutation with probability Pm → replace old population with new one → evaluate fitness again)

The GA algorithm
GA(Fitness, Fitness_threshold, max_generation, popsize, Pc, Pm)
• Fitness: a function that assigns an evaluation score to a given hypothesis
• Fitness_threshold: a threshold specifying the termination criterion
• max_generation: the maximum generation number at which GA terminates
• popsize: the size of the population
• Pc: crossover probability, i.e. the fraction of the population to be replaced by the crossover operator at each generation
• Pm: mutation probability, i.e. the fraction of the population to be replaced by the mutation operator at each generation

• Initialise population: P ← generate popsize random hypotheses
• Evaluate: for each h in P, compute Fitness(h)
• While [max_h Fitness(h)] < Fitness_threshold and generation < max_generation:
  1. Selection: select popsize members of P (with replacement) to add to Pnext
  2. Crossover: pairs of hypotheses are randomly selected using Pc; for each pair <h1, h2>, produce two offspring by applying the crossover operator and add all offspring to Pnext
  3. Mutate: invert a randomly selected bit in random members of Pnext with probability Pm
  4. Update: P ← Pnext
  5. Evaluate: for each h in P, compute Fitness(h)
• Return the hypothesis from P that has the highest fitness
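As a rough illustration of the loop above, here is a minimal Python sketch for bitstring hypotheses. It assumes non-negative fitness scores and uses simple placeholder operators (fitness-proportionate selection, single-point crossover, bit-flip mutation) that are described in more detail on the later slides; it is not the lecture's reference implementation.

```python
import random

def genetic_algorithm(fitness, fitness_threshold, max_generation,
                      popsize, pc, pm, n_bits):
    """Minimal sketch of the GA loop above for bitstring hypotheses."""
    def select(pop, scores):                      # fitness-proportionate pick
        return list(random.choices(pop, weights=scores, k=1)[0])

    def crossover(h1, h2):                        # single-point crossover
        point = random.randint(1, n_bits - 1)
        return h1[:point] + h2[point:], h2[:point] + h1[point:]

    def mutate(h):                                # flip one randomly chosen bit
        i = random.randrange(n_bits)
        h[i] = 1 - h[i]

    # Initialise population with popsize random bitstrings
    population = [[random.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(popsize)]
    for _generation in range(max_generation):
        scores = [fitness(h) for h in population]
        if max(scores) >= fitness_threshold:      # termination criterion reached
            break
        # Selection: popsize draws with replacement, fitter members drawn more often
        p_next = [select(population, scores) for _ in range(popsize)]
        # Crossover: recombine Pc*popsize/2 randomly chosen pairs
        for _ in range(int(pc * popsize / 2)):
            i, j = random.sample(range(popsize), 2)
            p_next[i], p_next[j] = crossover(p_next[i], p_next[j])
        # Mutation: flip a bit in Pm*popsize*n_bits randomly chosen members
        for _ in range(int(pm * popsize * n_bits)):
            mutate(random.choice(p_next))
        population = p_next
    scores = [fitness(h) for h in population]
    return population[scores.index(max(scores))]  # fittest hypothesis found
```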
GA – Some preliminary design questions
• Encoding
  – GA operates on a coding of the parameters rather than the parameters themselves
  – These coded strings are called chromosomes; each is a string of values representing a potential solution to the given problem
  – The encoding could be binary, decimal or continuous – which to use?
• Constraints: are there any constraints on the gene values?
• Fitness: how to obtain the fitness of each chromosome?
• Selection: how to select candidate chromosomes?
• The other two operators: how to perform crossover and mutation?

Chromosomes – binary representation
• Chromosomes are mostly represented by a string of bits
• Each bit (or group of bits) represents some characteristic/attribute/feature
• The values of each feature are checked, and each feature is represented with enough bits to cover all its possible values
• Recall the play-tennis example:
  – Wind: {strong, weak} can be represented by two bits, e.g. Wind = strong → 10, Wind = weak → 01, Wind = strong or weak → 11
  – Outlook: {cloudy, rainy, sunny} can be represented by three bits, e.g. Outlook = cloudy or rainy → 110
• So a rule such as (Outlook = cloudy or rainy) AND (Wind = strong) has the chromosome representation 11010

Binary and decimal coding chromosomes
• Let us consider a more general situation: assume we have three variables x, y and z
• Decimal coding simply uses the integer values as genes, e.g. x = 35, y = 191, z = 5
• Binary coding: the genes are coded in binary form
• Assume these variables can take integer values from 0 to 255, so we need 8 bits for each variable (i.e. gene)
• If x = 35, y = 191, z = 5, we have
  – x = 00100011, y = 10111111, z = 00000101
  – and the chromosome 001000111011111100000101
• But why go through the hassle of representing integers using binary coding? (See Exercise 6, question 4)
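As a small illustration of the coding above, the sketch below packs three integer genes in the range 0–255 into one 24-bit binary chromosome and unpacks it again; the function names are mine, not from the slides.

```python
def encode(x, y, z, bits=8):
    """Concatenate three integer genes (0..255) into one binary chromosome string."""
    return ''.join(format(v, '0{}b'.format(bits)) for v in (x, y, z))

def decode(chromosome, bits=8):
    """Split the chromosome back into its integer genes."""
    return tuple(int(chromosome[i:i + bits], 2)
                 for i in range(0, len(chromosome), bits))

print(encode(35, 191, 5))                    # 001000111011111100000101
print(decode('001000111011111100000101'))   # (35, 191, 5)
```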
Continuous coding chromosomes
• But what if we want genes to represent continuous values, e.g. x = 0.67, y = 1.56, z = 3.45?
• Solution: use a binary chromosome with approximation, or use continuous-valued chromosomes
• We will not cover continuous-valued chromosomes in this course, as they require special types of GA operators
• Binary chromosome with approximation, e.g. x = 0.145 (assume 8 bits per gene):
  – Use the general equations: x_continuous = (x_decimal − x_min) / (x_max − x_min) and x_decimal = round(x_continuous · (x_max − x_min) + x_min)
  – With 8 bits, x_max = 255 and x_min = 0
  – 0.145 × 255 = 36.975, rounded to 37, so x = 00100101
  – So x = 00100101 is an approximation of x = 0.145
  – More bits improve the approximation, but computation becomes more time consuming

Fitness function and gene constraints – an example
• Let us consider a linear programming problem, which arises naturally in production planning:
• Suppose a particular Ford plant can build Escorts at the rate of one per minute, Explorers at the rate of one every 2 minutes, and Lincoln Navigators at the rate of one every 3 minutes. The vehicles get 8, 5 and 4 miles per litre respectively, and Parliament mandates that the average fuel economy of the vehicles produced be at least 6 miles per litre. Ford loses £1000 on each Escort, but makes a profit of £5000 on each Explorer and £15,000 on each Navigator. What is the maximum profit this Ford plant can make in one 8-hour day?
• The fitness function here is the cost function, i.e. the profit Ford can make by building x Escorts, y Explorers and z Navigators, and we want to maximise it
• The fitness function is f = −1000x + 5000y + 15000z

Gene constraints
• Using the same example as the previous slide, the constraints arise from the production times and the Parliament mandate on fuel economy
• There are 480 minutes in an 8-hour day, so the production times for the vehicles lead to the limit x + 2y + 3z ≤ 480
• The average fuel economy restriction can be written 8x + 5y + 4z ≥ 6(x + y + z), which simplifies to 2x − y − 2z ≥ 0
• There is an additional implicit constraint that the variables are all non-negative: x, y, z ≥ 0
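One common way to combine the fitness function with the gene constraints above is to subtract a large penalty from the fitness of any chromosome that violates a constraint. The sketch below does that for the Ford example; the penalty value is an illustrative choice of mine, not something prescribed by the slides.

```python
def ford_fitness(x, y, z):
    """Profit for building x Escorts, y Explorers, z Navigators in one 8-hour day,
    with infeasible production plans pushed towards low fitness by a penalty."""
    penalty = 0.0
    if x + 2 * y + 3 * z > 480:      # production time limit (480 minutes per day)
        penalty += 1e7
    if 2 * x - y - 2 * z < 0:        # fuel-economy mandate: 8x + 5y + 4z >= 6(x + y + z)
        penalty += 1e7
    if min(x, y, z) < 0:             # non-negativity
        penalty += 1e7
    return -1000 * x + 5000 * y + 15000 * z - penalty

print(ford_fitness(120, 0, 120))     # a feasible plan: profit 1,680,000
print(ford_fitness(0, 0, 160))       # all Navigators violate the fuel mandate: penalised
```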
Selection
• The selection (aka reproduction) operator is applied many times to produce a mating pool for the new population
• There are a number of ways to do selection, to ensure that members of the population are drawn with the correct probability:
  – Roulette wheel (fitness-proportionate) selection
  – Tournament selection
  – Steady-state selection
  – Rank selection
  – Elitism

Roulette wheel (fitness-proportionate) selection
• Chromosomes are selected according to their proportionate fitness:
  Fitness_proportionate(h_i) = Fitness(h_i) / Σ_{j=1..popsize} Fitness(h_j)
• The higher their fitness, the more chances they have of being selected
• Sampling can be viewed as playing a game of roulette where the pocket sizes are proportional to the fitness of each individual
• Each new member of the population is drawn independently when the roulette wheel is spun randomly; in a computer, this spin is done using a randomly generated number in [0, 1]
• Example: fitness of chromosome A = 6.0 (a 180° pocket), chromosome B = 4.0 (120°), chromosome C = 2.0 (60°). A random number of 0.29 is generated (about 104.4°), so chromosome A is selected; repeat this process two more times to obtain three chromosomes for Pnext
• Since there is the possibility of {A, A, A} for Pnext, this could result in 'overcrowding'
• But the best solution found so far may also be lost, e.g. Pnext = {B, B, C}

Selection (ctd)
• Tournament selection
  – Pick a few chromosomes (say, popsize/4 chromosomes) at random from the population
  – From these few, select the fittest one (i.e. the one with the highest fitness), return the rest, and repeat the process popsize times
  – This method can retain some good chromosomes while giving weaker chromosomes a chance to take part in mating
• Steady-state selection
  – A few good (high-fitness) chromosomes are selected to replace a few bad (low-fitness) chromosomes
  – The rest of the population (the in-between-fitness ones) are selected by other methods, or all remain in Pnext

Selection (ctd)
• Rank selection
  – The other selection methods have problems if the fitness values differ a lot
  – For example, if the best chromosome's fitness is 90% of the total, then under roulette wheel selection the other chromosomes have very little chance of being selected
  – Rank selection first ranks the population; every chromosome then receives a fitness from this ranking (i.e. the probability of selection is proportional to rank)
  – The worst gets fitness 1, the second worst 2, and so on; the best gets fitness N (the number of chromosomes in the population)
  – Roulette wheel selection is then performed using these new fitness values
  – With this, all chromosomes have a fair chance of being selected
  – But this method can lead to slower convergence, because the best chromosomes do not differ much from the others
  (Figure from http://cs.felk.cvut.cz/~xobitko/ga/selection.html)
• Elitism
  – First copy the best chromosome (or a few of the best chromosomes) into the new population
  – The rest is done using any other selection method, normally roulette wheel
  – Can very rapidly increase the performance of GA, as it prevents losing the best solution found
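The two selection operators used most in this lecture, roulette wheel and tournament selection, can be sketched as follows; the function names are mine, and this is an illustration rather than the lecture's code.

```python
import random

def roulette_select(population, fitnesses):
    """Fitness-proportionate (roulette wheel) selection of one chromosome."""
    total = sum(fitnesses)
    spin = random.random() * total        # the 'spin': a random point on the wheel
    running = 0.0
    for chromosome, fit in zip(population, fitnesses):
        running += fit                    # pocket sizes proportional to fitness
        if spin <= running:
            return chromosome
    return population[-1]                 # guard against floating-point round-off

def tournament_select(population, fitnesses, k):
    """Pick k chromosomes at random and return the fittest of them."""
    contestants = random.sample(range(len(population)), k)
    best = max(contestants, key=lambda i: fitnesses[i])
    return population[best]

# Slide example: fitnesses 6.0, 4.0, 2.0 give selection probabilities
# of about 0.5, 0.33 and 0.17 respectively.
pop = ['A', 'B', 'C']
print(roulette_select(pop, [6.0, 4.0, 2.0]))
```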
Crossover
• Even though reproduction (selection) increases the percentage of fitter chromosomes, the procedure is essentially sterile: it cannot create new and better chromosomes
• That job is left to crossover and, to a lesser but critical extent, to mutation
• The crossover process simulates the exchange of genetic material that occurs during biological reproduction
• In this process, pairs in the breeding population are mated randomly with a crossover rate Pc
• Typical crossover properties include that an offspring inherits the common features of the parents, along with the ability of the offspring to inherit two completely different features
• Popular crossover techniques: one-point, two-point and uniform crossover

Crossover (ctd)
• First, randomly select a pair of parents (i.e. two chromosomes)
• Perform crossover (swapping of bits) to obtain offspring; repeat this process Pc × popsize/2 times, with used parent chromosomes not included again
• Example: if Pc = 0.5 and popsize = 20, then do crossover 5 times
• Single-point crossover swaps the bits after one randomly chosen crossover point; two-point crossover swaps the bits between two crossover points
  (Figure: single-point and two-point crossover, with the crossover points marked)

Crossover (ctd)
• The uniform crossover scheme works as follows
• A randomly generated bit string called the crossover mask generalises the process
• A bit value of 1 in the mask indicates that the corresponding bits in the parents are to be exchanged, while a 0 bit indicates no interchange

Mutation
• Mutation consists of making small alterations to the values of one or more genes in a chromosome
• Mutation randomly perturbs the population's characteristics and prevents evolutionary dead ends
• Most mutations are damaging rather than beneficial, so the mutation rate must be low to avoid the destruction of the species
• It works by randomly selecting a bit in the string with a certain mutation rate and reversing its value
• Mutation is applied to a randomly chosen bit in a randomly chosen chromosome
• If Pm is 0.01, with a popsize of 20 and 18 bits per chromosome, then mutation is repeated 0.01 × 18 × 20 = 3.6 ≈ 4 times
  (Figure: mutation example, flipping a randomly chosen bit in a randomly chosen chromosome)
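For bitstring chromosomes, the crossover and mutation operators just described can be sketched as below (single-point crossover, uniform crossover with a random mask, and bit-flip mutation); again a rough sketch, not the lecture's reference code.

```python
import random

def one_point_crossover(parent1, parent2):
    """Swap the tails of two equal-length bit lists after a random crossover point."""
    point = random.randint(1, len(parent1) - 1)
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

def uniform_crossover(parent1, parent2):
    """Exchange bits wherever a randomly generated crossover mask holds a 1."""
    mask = [random.randint(0, 1) for _ in parent1]
    child1 = [b2 if m else b1 for b1, b2, m in zip(parent1, parent2, mask)]
    child2 = [b1 if m else b2 for b1, b2, m in zip(parent1, parent2, mask)]
    return child1, child2

def mutate(chromosome):
    """Flip one randomly chosen bit (applied about Pm * popsize * n_bits times per generation)."""
    i = random.randrange(len(chromosome))
    chromosome[i] = 1 - chromosome[i]

a, b = [1, 1, 0, 1, 0, 1], [0, 0, 1, 0, 1, 1]
print(one_point_crossover(a, b))
```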
Applications
• The possible applications of genetic algorithms are immense
• Any problem with a large search domain could be suitably tackled by GA
• We shall explore, very briefly, the use of GA to evolve neural network weights and to evolve functions/programs in genetic programming
• We'll also look at a simple toy example

Evolving NN weights using GA – a simple example
• GA has been used successfully to evolve NN weights
• GA is suitable for evolving the weights of a neural network: standard learning techniques such as backpropagation can take thousands upon thousands of iterations to converge
• But GA could, given the appropriate direction, evolve suitable weights within a hundred or so iterations
• Example: obtain the weights for a perceptron unit learning the OR function (seen in the previous lecture), but rather than using backpropagation to update the weights, use GA
  (Figure: a simple artificial neuron model with inputs x0 = 1, x1, x2, weights w0, w1, w2, summation z and output y)

Evolving NN weights using GA – a simple example (ctd)
1. Initial parameters
  – Fitness function: 1/MSE of the desired to actual outputs; GA will maximise this fitness function:
    Fitness = 1 / MSE = 1 / ( (1/4) Σ_{i=1..4} (y_i − d_i)² )
  – Coding, binary approximation: the weights w1, w2 and w0, say with 6 bits each, so the chromosome length is 18
  – popsize = 20, i.e. 20 chromosomes, initially generated randomly
  – Pc = 0.5, Pm = 0.01
  – MSE_limit = 0.1, so fitness_threshold = 10; max_generation = 100
2. Gene constraints: w1, w2 and w0 in the range [−1, 1]
3. Apply selection (say, tournament selection), crossover (say, one-point) and mutation to produce a new population
4. Repeat step 3 until convergence to an acceptable solution (fitness > fitness_threshold or generation > max_generation)
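To make the perceptron example concrete, here is one possible fitness function following the slide's parameters (three 6-bit genes decoded to weights in [−1, 1], fitness = 1/MSE over the four OR patterns). The decoding scheme and the threshold unit below are my own assumptions about details the slides leave open.

```python
import random

def fitness_or_perceptron(chromosome, bits_per_gene=6):
    """Decode an 18-bit chromosome into (w1, w2, w0) and score a perceptron on OR."""
    genes = [chromosome[i:i + bits_per_gene]
             for i in range(0, 3 * bits_per_gene, bits_per_gene)]
    # Map each gene from 0..2^bits - 1 onto the constrained range [-1, 1]
    w1, w2, w0 = (-1.0 + 2.0 * int(''.join(map(str, g)), 2) / (2 ** bits_per_gene - 1)
                  for g in genes)
    patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # OR truth table
    mse = 0.0
    for (x1, x2), desired in patterns:
        y = 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0   # threshold unit output
        mse += (y - desired) ** 2 / 4.0
    return 1.0 / mse if mse > 0 else float('inf')    # fitness = 1 / MSE

chrom = [random.randint(0, 1) for _ in range(18)]
print(fitness_or_perceptron(chrom))
```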
Genetic programming (GP) – an example
• In programming languages such as LISP, mathematical expressions are not written in standard notation but in prefix notation
  – Examples: + 1 2 means 1+2; * + 1 2 2 means (1+2)*2; * + - 2 1 4 9 means ((2-1)+4)*9
  – Notice the difference between the left-hand side and the right: apart from the order being different, there is no use of parentheses
  – The prefix method makes life a lot easier for programmers and compilers alike, because order precedence is not an issue
• You can build expression trees out of these strings, which can then be easily evaluated (for example, the trees for the three expressions above)

Genetic programming (GP) – an example (ctd)
• Given numerical data and primitive functions, but no expression joining the data with the primitive functions, a genetic algorithm can be used to evolve an expression tree that fits the data very closely
• By "splicing" and "grafting" the trees and evaluating the resulting expression on the data, the fitness function can return how close the expression is
• The limitation of genetic programming lies in the huge search space the GA has to search: an infinite number of equations
• Therefore, before running a GA to search for an equation, the user normally tells the program which primitive functions to search under

Genetic programming (GP) – an example (ctd)
• Assume we have data like the following and we wish to obtain the function that maps x and y to z:

  x      y      z
  0.1    0.5    0.81
  0.3    0.4    0.99
  0.6    0.2    1.31
  ...    ...    ...
  0.4    0.5    1.20

• Assume the only available primitive functions are sin, +, sqr and sqrt
• GP will splice and graft trees built from these primitive functions, with the fitness function set to minimise the prediction error of z from the x and y data above

Genetic programming (GP) – an example (ctd)
• Crossover in GP swaps randomly chosen subtrees between two parent expression trees (see the crossover example figure)
• Mutation randomly changes a primitive function in the tree
• The actual function here is z = sin(x) + sqrt(x² + y)
  (Figure: crossover example on expression trees)

Toy example
• Consider a + 2b + 3c + 4d = 30, where a, b, c, d are positive integers
• Use GA to find a, b, c and d
  – Assume decimal coding is used
  – Choose, say, 5 random initial solution sets (i.e. popsize = 5) forming the initial population, with the constraint 1 ≤ a, b, c, d ≤ 30

  Chromosome   (a, b, c, d)
  1            (1, 28, 15, 3)
  2            (14, 9, 2, 4)
  3            (13, 5, 7, 3)
  4            (23, 8, 16, 19)
  5            (9, 13, 5, 2)

Example (ctd)
• Calculate the fitness value for each chromosome: compute the absolute difference of each expression from 30 and take the inverse; this is our fitness value
• E.g. chromosome 1: expression = 1 + 2×28 + 3×15 + 4×3 = 114

  Chromosome   Absolute difference   Fitness value
  1            |114 − 30| = 84       1/84
  2            |54 − 30| = 24        1/24
  3            |56 − 30| = 26        1/26
  4            |163 − 30| = 133      1/133
  5            |58 − 30| = 28        1/28

  – Since expressions whose values are closer to the desired answer (30) give lower absolute differences, lower differences are more desirable
  – So take the inverse of the absolute difference as the fitness value; GA will then try to maximise fitness
  – To create a system where chromosomes with more desirable fitness values are more likely to be chosen as parents, we perform selection; assume we use the roulette wheel (fitness-proportionate) method

Example (ctd)
• Calculate the fitness proportion (likelihood) of each chromosome being selected as a parent: take the sum of all the fitness values (0.135266) and calculate the percentages from there
• Use Fitness_proportionate(h_i) = Fitness(h_i) / Σ_{j=1..popsize} Fitness(h_j)

  Chromosome   Fitness proportion
  1            (1/84) / 0.135266 = 8.80%
  2            (1/24) / 0.135266 = 30.8%
  3            (1/26) / 0.135266 = 28.4%
  4            (1/133) / 0.135266 = 5.56%
  5            (1/28) / 0.135266 = 26.4%

Example (ctd)
• Spin the roulette wheel 5 times; assume the result was:

  Spin   Chromosome selected after spinning the roulette wheel
  1      1
  2      2
  3      5
  4      5
  5      3

• Since chromosome 4 had a poor fitness, its chances of survival were slim and it died out in the selection process

Example (ctd)
• Do crossover, say single-point
• The offspring of each pair of parents contains the genetic information of both father and mother
• For example, if a father has the solution set a1, b1, c1, d1 and a mother has the solution set a2, b2, c2, d2, then there are three possible pairs of crossed-over offspring (| = crossover point):

  Father chromosome   Mother chromosome   Offspring
  a1 | b1, c1, d1     a2 | b2, c2, d2     (a1, b2, c2, d2) or (a2, b1, c1, d1)
  a1, b1 | c1, d1     a2, b2 | c2, d2     (a1, b1, c2, d2) or (a2, b2, c1, d1)
  a1, b1, c1 | d1     a2, b2, c2 | d2     (a1, b1, c1, d2) or (a2, b2, c2, d1)

Example (ctd)
• Assume that, through random parent selections, we have the following parent chromosomes
• Apply crossover to produce one offspring for each pair of parents (assuming the crossover points are chosen randomly)
• Note: normally there would be two offspring from each pair of parents, but for simplicity of discussion assume only one offspring is produced here

  Father chromosome   Mother chromosome   Offspring
  (13 | 5, 7, 3)      (1 | 28, 15, 3)     (13, 28, 15, 3)
  (9, 13 | 5, 2)      (14, 9 | 2, 4)      (9, 13, 2, 4)
  (13, 5, 7 | 3)      (9, 13, 5 | 2)      (13, 5, 7, 2)
  (14 | 9, 2, 4)      (9 | 13, 5, 2)      (14, 13, 5, 2)
  (13, 5 | 7, 3)      (9, 13 | 5, 2)      (13, 5, 5, 2)

Example (ctd)
• Apply mutation to a randomly chosen gene in a randomly chosen chromosome, say gene a in chromosome 1; mutation here changes the selected gene to a random value from 0 to 30:
  (13, 28, 15, 3) → (8, 28, 15, 3)
• Recalculate the fitness values for the offspring, which represent the new generation:

  Offspring chromosome   Absolute difference   Fitness value
  (8, 28, 15, 3)         |121 − 30| = 91       1/91
  (9, 13, 2, 4)          |57 − 30| = 27        1/27
  (13, 5, 7, 2)          |52 − 30| = 22        1/22
  (14, 13, 5, 2)         |63 − 30| = 33        1/33
  (13, 5, 5, 2)          |46 − 30| = 16        1/16

Example – Commentary
• The average fitness value of the offspring chromosomes is about 0.037, while the average fitness value of the parent chromosomes was about 0.027
• Progressing at this rate, one chromosome should eventually reach a very high fitness value (i.e. when the absolute difference is close to 0), which is when an optimal solution is found
• If you try and simulate this yourself, you may actually get a lower average fitness on some generations, but in the long run the fitness levels will increase
• For systems with a larger population (say 50 instead of 5), the fitness level should approach the desired level more steadily and stably, i.e. nearly every generation will have better solutions than the previous ones
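The whole toy example can be reproduced with a short script. The sketch below follows the slides' scheme (fitness = 1/|a + 2b + 3c + 4d − 30|, roulette wheel selection, single-point crossover on the 4-gene tuples, and mutation that redraws one gene in 0–30); the loop structure and parameter values are my own illustrative choices.

```python
import random

TARGET = 30

def fitness(chrom):
    """Inverse absolute difference between a + 2b + 3c + 4d and 30."""
    a, b, c, d = chrom
    diff = abs(a + 2 * b + 3 * c + 4 * d - TARGET)
    return float('inf') if diff == 0 else 1.0 / diff

def evolve(popsize=5, generations=50):
    population = [[random.randint(1, 30) for _ in range(4)] for _ in range(popsize)]
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        if max(scores) == float('inf'):                    # exact solution found
            break
        # Roulette wheel selection of the mating pool (with replacement)
        population = [list(c) for c in
                      random.choices(population, weights=scores, k=popsize)]
        # Single-point crossover on consecutive pairs of parents
        for i in range(0, popsize - 1, 2):
            point = random.randint(1, 3)
            tail_i, tail_j = population[i][point:], population[i + 1][point:]
            population[i][point:], population[i + 1][point:] = tail_j, tail_i
        # Mutation: redraw one randomly chosen gene of one chromosome
        victim = random.choice(population)
        victim[random.randrange(4)] = random.randint(0, 30)
    return max(population, key=fitness)

print(evolve())   # e.g. [2, 4, 4, 2] satisfies 2 + 8 + 12 + 8 = 30
```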
GA strengths and weaknesses
Advantages
• Often achieves good results
• In most cases, the fitness function can easily be designed to fit the hypothesis (solution)
• Can easily be hybridised with many other ML algorithms to yield improved results
• There are no hard and fast rules; many users freely use variations in their applications
Disadvantages
• There is no guarantee that GA converges to the optimal solution
  – Because of incomplete searches
  – Because of hypothesis crowding, i.e. most chromosomes become similar, the fitness is high but not the best, and GA cannot progress further due to the lack of variety

Lecture 6: Study guide
At the end of this section, you should be able to:
• Define chromosome, gene, allele, crossover, mutation and fitness function
• Describe how GA works using a flowchart or an algorithm
• Explain how chromosomes and hypotheses are represented in GA, i.e. coding in GA
• Estimate the fitness function of a given population
• Describe chromosome selection mechanisms
• Perform crossover between two chromosomes using single-point, two-point and uniform masks
• Perform mutation
• Explain how GA can be used to evolve NN weights
• State the main advantages and disadvantages of GA
