Shared by: huanghengdong
Posted: 12/3/2011 · Language: English · Pages: 57
Genetic Algorithms



An Evolutionary Approach to Problem Solving

Illusions of Design

- Living things in nature look as if they were designed by a skilled engineer/designer.





- Evolution theory: living things are too exquisitely designed to be “random”.
If not random, then what is the non-random process?

Evolution

A process of cumulative selection:
- Individuals with new traits arise through:
  - Mutations
  - Sexual reproduction
- Individuals with better traits are more likely to survive and more likely to pass those traits on to their descendants.

Key Principles of Evolution

http://www.pbs.org/wgbh/evolution/library/11/2/e_s_4.html

- Variety: a population of individuals with different sets of traits
- Through reproduction: the importance of sexual versus asexual reproduction (http://www.pbs.org/wgbh/evolution/library/01/5/l_015_03.html)
- Random mutations
- Evaluation & selection (through constraints in the environment)
- Reproduction: transfer of traits to offspring

From Evolution of Life in Nature to Evolution of Solutions

Can evolutionary principles be used to develop solutions to problems?

Can these principles be used to do data mining?

Computational Analogy of Evolution
- What is the equivalent of an individual (chromosome)?
- What is the equivalent of an individual having “good” genetic material (a good set of traits)?
- What is the equivalent of a population of individuals?
- What is the equivalent of reproduction?
- What is the equivalent of survival of the fittest?

Computational Analogy of Evolution
What is an individual? A collection of traits.
Example: an individual has the following traits:
- Color: Black or White
- Speed: Fast, Medium, Slow or Very Slow
- Intelligence: Very Smart, Smart, Somewhat Smart, Medium Smart, Dumb or Very Dumb

GA – Generic Strategy

- Start with a “population” of solutions (individuals)
- Repeat:
  - Choose 2 solutions from the population
  - With a certain probability, apply a crossover to create child1 and child2
  - With a certain probability, mutate child1 and child2
  - Update the population (discard bad solutions)
- Until the stopping criteria have been met
- Output the best solution(s) in the population
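The generic strategy above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method used in these slides: the names `run_ga`, `init`, `onemax`, `one_point` and `flip_one_bit` are hypothetical, the stopping criterion is assumed to be a fixed number of iterations, and “discard bad solutions” is implemented by simply keeping the `pop_size` fittest individuals. A toy “OneMax” problem (maximize the number of 1 bits) is used only to make the sketch runnable.

```python
import random
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def run_ga(
    init: Callable[[random.Random], T],
    fitness: Callable[[T], float],
    crossover: Callable[[T, T, random.Random], Tuple[T, T]],
    mutate: Callable[[T, random.Random], T],
    pop_size: int = 20,
    iterations: int = 200,
    p_crossover: float = 0.8,
    p_mutation: float = 0.1,
    seed: int = 0,
) -> Tuple[T, List[float]]:
    """Generic GA loop. Returns the best individual and the best fitness after each iteration."""
    rng = random.Random(seed)
    population: List[T] = [init(rng) for _ in range(pop_size)]
    history = [max(fitness(ind) for ind in population)]
    for _ in range(iterations):  # stopping criterion: a fixed number of iterations
        parent1, parent2 = rng.sample(population, 2)  # choose 2 solutions
        if rng.random() < p_crossover:
            child1, child2 = crossover(parent1, parent2, rng)
        else:
            child1, child2 = parent1, parent2
        if rng.random() < p_mutation:
            child1 = mutate(child1, rng)
        if rng.random() < p_mutation:
            child2 = mutate(child2, rng)
        # Update the population: keep the pop_size fittest, discarding the rest.
        population = sorted(population + [child1, child2], key=fitness, reverse=True)[:pop_size]
        history.append(fitness(population[0]))
    return population[0], history


# Toy problem ("OneMax"): maximize the number of 1s in a 16-bit string.
N_BITS = 16


def init(rng):
    return [rng.randint(0, 1) for _ in range(N_BITS)]


def onemax(bits):
    return sum(bits)


def one_point(a, b, rng):
    point = rng.randint(1, N_BITS - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def flip_one_bit(bits, rng):
    i = rng.randrange(N_BITS)
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]  # returns a new list


best, history = run_ga(init, onemax, one_point, flip_one_bit)
print(onemax(best), history[0], history[-1])
```

Because the update step keeps the old population alongside the children before truncating, the best fitness can never decrease from one iteration to the next; a stochastic update (as used later for the TSP) trades that guarantee for more diversity.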


Representing Solutions
- Typically, a coding system is used to encode each trait of an individual, e.g., binary coding.
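As an illustration of binary coding for the Color/Speed/Intelligence example, the sketch below gives each trait a fixed-width bit field just wide enough for its values (1, 2 and 3 bits). The field widths, the value order and the names `TRAITS`, `encode` and `decode` are assumptions made for this example only.

```python
# Hypothetical trait table for the example individual; list order defines the codes.
TRAITS = {
    "Color": ["Black", "White"],
    "Speed": ["Fast", "Medium", "Slow", "Very Slow"],
    "Intelligence": ["Very Smart", "Smart", "Somewhat Smart",
                     "Medium Smart", "Dumb", "Very Dumb"],
}


def bits_needed(n_values: int) -> int:
    """Smallest number of bits that can index n_values distinct values."""
    return max(1, (n_values - 1).bit_length())


def encode(individual: dict) -> str:
    """Concatenate a fixed-width binary index for each trait's value."""
    return "".join(
        format(values.index(individual[trait]), f"0{bits_needed(len(values))}b")
        for trait, values in TRAITS.items()
    )


def decode(bits: str) -> dict:
    """Inverse of encode; rejects bit patterns that index no valid value."""
    individual, pos = {}, 0
    for trait, values in TRAITS.items():
        width = bits_needed(len(values))
        index = int(bits[pos:pos + width], 2)
        if index >= len(values):
            raise ValueError(f"invalid code for {trait}: {bits[pos:pos + width]}")
        individual[trait] = values[index]
        pos += width
    return individual


ind = {"Color": "White", "Speed": "Medium", "Intelligence": "Dumb"}
print(encode(ind))  # 101100
```

Three bits allow 8 codes but Intelligence has only 6 values, so some bit strings (e.g. `011111`) decode to no valid individual; crossover and mutation applied to raw bits have to handle such cases.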

The Fitness Function: Evaluation of Individual Solutions
- A “fitness function” assigns each individual a score that measures how good a solution it is.

Process of Evaluating the Fitness

What a Population of Solutions Looks Like

Reproduction: Crossover
Crossover combines the parents’ traits to create two children. There are many crossover operators:
1. Randomly choose a position (the crossover point) and swap the contents before that position.
Parents (crossover point after the first trait):
White Medium Dumb
Black Slow Smart
Children:
White Slow Smart
Black Medium Dumb
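Operator 1 can be sketched as a small Python function. This is an illustrative sketch; the name `one_point_crossover` and its `point`/`rng` parameters are hypothetical. Passing `point=1` reproduces the example above.

```python
import random


def one_point_crossover(parent1, parent2, point=None, rng=random):
    """Cut both parents at the same point and exchange the pieces.

    Swapping the parts before the point or the parts after it yields the
    same pair of children, just in a different order."""
    if point is None:
        point = rng.randint(1, len(parent1) - 1)  # cut strictly inside the list
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2


print(one_point_crossover(["White", "Medium", "Dumb"], ["Black", "Slow", "Smart"], point=1))
# (['White', 'Slow', 'Smart'], ['Black', 'Medium', 'Dumb'])
```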

Reproduction: Crossover
2. Randomly choose a set of traits (genes) and swap those.
- Example: swap color and intelligence
Parents:
White Medium Dumb
Black Slow Smart
Children:
Black Medium Smart
White Slow Dumb
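Operator 2 can be sketched the same way. In this hypothetical `gene_set_crossover`, the set of swapped positions is either given explicitly or, as an assumption, drawn by picking each position independently with probability 0.5 (often called uniform crossover).

```python
import random


def gene_set_crossover(parent1, parent2, genes=None, rng=random):
    """Exchange the values of the chosen gene positions between two parents."""
    if genes is None:
        # Assumption: each position is picked for exchange with probability 0.5.
        genes = {i for i in range(len(parent1)) if rng.random() < 0.5}
    pairs = list(enumerate(zip(parent1, parent2)))
    child1 = [b if i in genes else a for i, (a, b) in pairs]
    child2 = [a if i in genes else b for i, (a, b) in pairs]
    return child1, child2


# Swap color (position 0) and intelligence (position 2), as in the example above.
print(gene_set_crossover(["White", "Medium", "Dumb"], ["Black", "Slow", "Smart"], genes={0, 2}))
# (['Black', 'Medium', 'Smart'], ['White', 'Slow', 'Dumb'])
```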

Reproduction: Mutation
The purpose of mutation is to take a single solution and introduce a random “shock” (change) to it, creating a new solution.
Implementation: randomly choose a trait for each child and randomly change its value to another valid value.
Black Medium Smart → Black Fast Smart
White Slow Dumb → White Slow Very Smart
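The mutation described above, sketched in Python: pick one trait at random and replace its value with a different valid value. The value lists repeat the trait definitions given earlier; the names `TRAIT_VALUES` and `mutate` are hypothetical.

```python
import random

# Valid values per trait position (Color, Speed, Intelligence), as listed earlier.
TRAIT_VALUES = [
    ["Black", "White"],
    ["Fast", "Medium", "Slow", "Very Slow"],
    ["Very Smart", "Smart", "Somewhat Smart", "Medium Smart", "Dumb", "Very Dumb"],
]


def mutate(individual, rng=random):
    """Pick one trait at random and replace its value with a different valid value."""
    mutant = list(individual)  # copy, so the parent is left unchanged
    i = rng.randrange(len(mutant))
    mutant[i] = rng.choice([v for v in TRAIT_VALUES[i] if v != mutant[i]])
    return mutant


print(mutate(["Black", "Medium", "Smart"], random.Random(7)))
```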

What is the equivalent of survival of the fittest?
Simply give solutions with better fitness a higher probability of being chosen for reproduction.
Sample with replacement: solutions with higher fitness will be selected more often.
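Fitness-proportional sampling with replacement (often called roulette-wheel selection) maps directly onto Python's `random.choices` with a `weights` argument. The function name `roulette_select` is hypothetical. The sketch assumes non-negative fitness values; if fitness is negative (as with the negated distances in the TSP example later), it has to be transformed first, e.g. by using 1/distance as the weight, which is an assumption and not something the slides specify.

```python
import random
from collections import Counter


def roulette_select(population, fitnesses, k, rng=random):
    """Draw k parents with replacement; P(pick i) = fitness_i / sum(fitnesses).

    Assumes fitness values are non-negative and not all zero."""
    return rng.choices(population, weights=fitnesses, k=k)


picks = roulette_select(["A", "B", "C"], [1.0, 3.0, 6.0], k=1000, rng=random.Random(42))
print(Counter(picks))  # random; in expectation about 100 A, 300 B and 600 C
```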

Population
What will the composition of the population be in future generations?

Example: The Traveling Salesperson Problem
- First, look at the Traveling Salesman Problem
- Then see how the same principles can be applied to:
  - Extracting rules from data to understand which customers are more likely to be responsive
  - Extracting technical trading rules
  - Optimizing service schedules

The Traveling Salesman Problem
- A traveling salesman has to visit a set of n cities.
- Find the order in which the salesman visits the cities so as to minimize the total distance.
- Variants of this problem are found in several domains.
- As n gets very large, exhaustive search becomes impossible because the number of possible orders grows combinatorially.
- Heuristic methods are needed to find good solutions, even if these are not guaranteed to be the “best”.
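To make “combinatorial” concrete, the short sketch below counts distinct tours, assuming the salesman returns to the starting city and distances are symmetric (both hold for the five-city example in these slides). The function name `distinct_tours` is hypothetical.

```python
from math import factorial


def distinct_tours(n: int) -> int:
    """Distinct closed tours through n cities when distances are symmetric:
    fix the starting city ((n-1)! orders of the rest), then identify each
    tour with its reverse (divide by 2)."""
    if n < 3:
        return 1
    return factorial(n - 1) // 2


for n in (5, 10, 20):
    print(n, distinct_tours(n))
# 5 12
# 10 181440
# 20 60822550204416000
```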

The Traveling Salesman Problem



Consider a TSP with 5 cities to visit: London, Oxford, Cambridge, Brighton, and Bath.
What is the best path?
[Figure: map showing London, Cambridge, Oxford, Brighton and Bath]

The Traveling Salesman Problem



Genetic Algorithm Solution
Step 1: An individual is a “candidate solution”, i.e., a path.
Examples of candidate solutions for the TSP:
- London, Oxford, Cambridge, Brighton, Bath
- London, Bath, Oxford, Brighton, Cambridge
- Brighton, London, Cambridge, Bath, Oxford
- …

The Traveling Salesman Problem



Coding Scheme for Candidate Solutions
Each position holds the order in the sequence of one city:
London | Oxford | Cambridge | Brighton | Bath
- Example:
  - London → Oxford → Cambridge → Brighton → Bath is coded as 1 2 3 4 5
  - Oxford → London → Cambridge → Bath → Brighton is coded as 2 1 3 5 4
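The coding scheme can be sketched as a pair of conversion functions. The names `CITIES`, `encode` and `decode` are hypothetical; gene i holds the visiting position of the i-th city in the fixed order London, Oxford, Cambridge, Brighton, Bath.

```python
CITIES = ["London", "Oxford", "Cambridge", "Brighton", "Bath"]  # fixed gene order


def encode(tour):
    """Gene i holds the (1-based) position at which CITIES[i] is visited."""
    return [tour.index(city) + 1 for city in CITIES]


def decode(code):
    """Inverse of encode: put each city at the position its gene specifies."""
    tour = [None] * len(CITIES)
    for city, position in zip(CITIES, code):
        tour[position - 1] = city
    return tour


print(encode(["Oxford", "London", "Cambridge", "Bath", "Brighton"]))  # [2, 1, 3, 5, 4]
print(decode([1, 5, 3, 2, 4]))  # ['London', 'Brighton', 'Cambridge', 'Bath', 'Oxford']
```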

The Traveling Salesman Problem



Step 2: Fitness Function
How “good” are the following solutions?
- London, Oxford, Cambridge, Brighton, Bath
- London, Bath, Oxford, Brighton, Cambridge
- Brighton, London, Cambridge, Bath, Oxford

The Traveling Salesman Problem



Fitness in the TSP: Distance Table
Distance (in miles)
            London  Oxford  Cambridge  Brighton  Bath
London           0     350         50       280   470
Oxford                   0        130       270   310
Cambridge                           0       210   340
Brighton                                      0   220
Bath                                                0
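A sketch of the TSP fitness using the table above, under the assumption that a tour returns to its starting city and that fitness is the negative total distance. That interpretation reproduces, for example, the -990 listed later for Bath, Oxford, Cambridge, London, Brighton. Because 5 cities give only 12 distinct tours, exhaustive search is still feasible here and gives the true optimum to compare the GA against. All names are hypothetical.

```python
from itertools import permutations

# Upper triangle of the distance table (miles); distances are symmetric.
DIST = {
    ("London", "Oxford"): 350, ("London", "Cambridge"): 50,
    ("London", "Brighton"): 280, ("London", "Bath"): 470,
    ("Oxford", "Cambridge"): 130, ("Oxford", "Brighton"): 270,
    ("Oxford", "Bath"): 310, ("Cambridge", "Brighton"): 210,
    ("Cambridge", "Bath"): 340, ("Brighton", "Bath"): 220,
}


def distance(a, b):
    return DIST[(a, b)] if (a, b) in DIST else DIST[(b, a)]


def tour_length(tour):
    """Total distance of the closed tour, including the return to the start."""
    return sum(distance(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))


def fitness(tour):
    return -tour_length(tour)  # shorter tours are fitter


def brute_force(cities):
    """Exhaustive search: fix the first city and try every order of the others."""
    first, rest = cities[0], cities[1:]
    return min(([first, *perm] for perm in permutations(rest)), key=tour_length)


print(fitness(["London", "Oxford", "Cambridge", "Brighton", "Bath"]))  # -1380
best = brute_force(["London", "Oxford", "Cambridge", "Brighton", "Bath"])
print(best, tour_length(best))  # ['London', 'Cambridge', 'Oxford', 'Bath', 'Brighton'] 990
```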

The Traveling Salesman Problem

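Creating New Solutions
Crossover operation: randomly choose a position and cross over the contents before that position.
Applied directly to city codes, this can produce invalid tours: combining the first two positions of 1 2 3 4 5 with the rest of 1 5 3 2 4 gives 1 2 3 2 4, which puts two cities in second place and none in fifth.
Crossover operation for the TSP: part of the first parent is copied, and the rest is taken in the same order as in the second parent.
Parent 1: London → Oxford → Cambridge → Brighton → Bath, coded 1 2 3 4 5
Parent 2: London → Brighton → Cambridge → Bath → Oxford, coded 1 5 3 2 4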

Reproduction: The Traveling Salesman Problem
Parent 1: London → Oxford → Cambridge → Brighton → Bath (1 2 3 4 5)
Parent 2: London → Brighton → Cambridge → Bath → Oxford (1 5 3 2 4)
Crossover point: after the second city
Child: London → Oxford → Brighton → Cambridge → Bath (1 2 4 3 5)
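The TSP crossover (copy part of the first parent, take the rest in the second parent's order) can be sketched on city lists as below; with the crossover point after the second city it reproduces the child above. The name `tsp_crossover` is hypothetical. This prefix-based operator resembles the well-known order crossover (OX), which copies a middle segment instead of a prefix.

```python
import random


def tsp_crossover(parent1, parent2, point=None, rng=random):
    """Copy parent1's first `point` cities, then append the remaining cities in
    the order in which they appear in parent2, so the child is always a valid tour."""
    if point is None:
        point = rng.randint(1, len(parent1) - 1)
    head = parent1[:point]
    return head + [city for city in parent2 if city not in head]


p1 = ["London", "Oxford", "Cambridge", "Brighton", "Bath"]
p2 = ["London", "Brighton", "Cambridge", "Bath", "Oxford"]
print(tsp_crossover(p1, p2, point=2))
# ['London', 'Oxford', 'Brighton', 'Cambridge', 'Bath']
```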

Reproduction: The Traveling Salesman Problem
Parent 1: London → Oxford → Cambridge → Brighton → Bath (1 2 3 4 5)
Parent 2: London → Brighton → Cambridge → Bath → Oxford (1 5 3 2 4)
Child: London → Cambridge → Brighton → Bath → Oxford (1 5 2 3 4)

Reproduction: The Traveling Salesman Problem
- Exchange the cities in the third and fourth places
Parent 1: London → Oxford → Cambridge → Brighton → Bath
Parent 2: London → Brighton → Cambridge → Bath → Oxford
London → Oxford → Cambridge → Brighton → Bath becomes London → Oxford → Brighton → Cambridge → Bath

Reproduction
Mutation in the TSP
- Randomly changing one gene won’t work, because it can produce an invalid tour:
London | Oxford | Cambridge | Brighton | Bath
1 2 3 4 5 → 1 2 4 4 5 (two cities in fourth place, none in third)
- Design mutations around the “swap” concept:
1 2 3 4 5 → 1 2 5 4 3
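A swap mutation, sketched in Python: exchanging two genes of a permutation always yields another permutation, so the tour stays valid. The name `swap_mutation` and its 0-based `i`/`j` parameters are hypothetical; `i=2, j=4` reproduces the swap shown above.

```python
import random


def swap_mutation(code, i=None, j=None, rng=random):
    """Exchange the genes at positions i and j (0-based); the result stays a permutation."""
    if i is None or j is None:
        i, j = rng.sample(range(len(code)), 2)  # two distinct positions
    mutant = list(code)
    mutant[i], mutant[j] = mutant[j], mutant[i]
    return mutant


print(swap_mutation([1, 2, 3, 4, 5], 2, 4))  # [1, 2, 5, 4, 3]
```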

GA for TSP
- For the TSP we have:
  - A solution representation
  - A fitness evaluation function
  - Crossover operations on parents
  - Mutation on a single solution
- Start with a population of solutions, and let them evolve

Next Step: Initialize a Population
Decision parameter: population size.
Let us choose 5 solutions for our population and initialize it randomly. The number in parentheses is each solution's fitness: the negative of the total distance of the closed tour, including the return to the starting city.
- London, Oxford, Cambridge, Brighton, Bath (-1380)
- Oxford, London, Cambridge, Bath, Brighton (-1230)
- Cambridge, Oxford, Brighton, Bath, London (-1140)
- Bath, London, Brighton, Cambridge, Oxford (-1400)
- Bath, Oxford, Cambridge, London, Brighton (-990)

Evolving Solutions for TSP

- Repeat (until the stopping criteria have been met):
  - Choose 2 solutions from the population
  - With a certain “crossover probability” (say, 0.8), apply a crossover operator to create child1 and child2
  - With a certain “mutation probability” (say, 0.1), mutate child1 and child2
  - Place the resulting 2 children in the population
  - Selection: decide which 5 solutions survive; each individual's probability of surviving is proportional to its fitness
- After stopping, output the best chromosome in the population as the solution
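Putting the pieces together, the loop above can be sketched end to end for the five-city example. Assumptions, not taken from the slides: the stopping criterion is a fixed iteration count; since fitness is negative, survival uses 1/tour length as a positive weight; and the 5 survivors are drawn without replacement, so no tour is picked twice in one round. All names are hypothetical.

```python
import random

CITIES = ["London", "Oxford", "Cambridge", "Brighton", "Bath"]
DIST = {
    ("London", "Oxford"): 350, ("London", "Cambridge"): 50,
    ("London", "Brighton"): 280, ("London", "Bath"): 470,
    ("Oxford", "Cambridge"): 130, ("Oxford", "Brighton"): 270,
    ("Oxford", "Bath"): 310, ("Cambridge", "Brighton"): 210,
    ("Cambridge", "Bath"): 340, ("Brighton", "Bath"): 220,
}


def tour_length(tour):
    """Length of the closed tour (returns to the start city)."""
    legs = zip(tour, tour[1:] + tour[:1])
    return sum(DIST[(a, b)] if (a, b) in DIST else DIST[(b, a)] for a, b in legs)


def crossover(p1, p2, rng):
    """Copy a prefix of one parent; fill in the rest in the other parent's order."""
    point = rng.randint(1, len(p1) - 1)

    def child(a, b):
        head = a[:point]
        return head + [c for c in b if c not in head]

    return child(p1, p2), child(p2, p1)


def mutate(tour, rng):
    """Swap two randomly chosen cities."""
    i, j = rng.sample(range(len(tour)), 2)
    tour = list(tour)
    tour[i], tour[j] = tour[j], tour[i]
    return tour


def survive(population, k, rng):
    """Pick k survivors without replacement, each draw with probability
    proportional to 1 / tour length (an assumed positive stand-in for fitness)."""
    pool, survivors = list(population), []
    for _ in range(k):
        weights = [1.0 / tour_length(t) for t in pool]
        survivors.append(pool.pop(rng.choices(range(len(pool)), weights=weights)[0]))
    return survivors


def evolve(pop_size=5, iterations=200, p_cross=0.8, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    population = [rng.sample(CITIES, len(CITIES)) for _ in range(pop_size)]
    for _ in range(iterations):  # stopping criterion: fixed iteration count
        p1, p2 = rng.sample(population, 2)
        if rng.random() < p_cross:
            c1, c2 = crossover(p1, p2, rng)
        else:
            c1, c2 = list(p1), list(p2)
        if rng.random() < p_mut:
            c1 = mutate(c1, rng)
        if rng.random() < p_mut:
            c2 = mutate(c2, rng)
        population = survive(population + [c1, c2], pop_size, rng)
    return min(population, key=tour_length)  # best chromosome = shortest tour


best = evolve()
print(best, tour_length(best))
```

Because survival is stochastic, a good tour can occasionally be lost; a common variation (not described in the slides) is to always carry the current best tour into the next round.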

Overview of the Selection Process



Over time, through the various operators, solutions mate and pass their traits on to their offspring.
Children with “better” traits are more likely to survive.
The weak solutions gradually disappear from the population.
Good solutions come to predominate in the population.

Building Solutions through Evolution

http://www.pbs.org/saf/1103/video/watchonline.htm

GA – Advantages:

- Not engineered by hand: enables finding surprising solutions to problems
- Quickly and reliably solves problems that are hard to tackle by traditional means
- Implicit parallelism makes GAs a very efficient optimization algorithm
- A great property is the ability to find approximate solutions to combinatorially explosive problems

GA - Disadvantages

- A heuristic: GAs may find only near-optimal solutions.
- Further difficulties: choosing a suitable representation technique, and making the right decisions regarding the selection method and the genetic operator probabilities.

Learning Classification Rules with GA
A classifier can be represented as a set of rules of the form IF (set of conditions) THEN (Class):
[Figure: decision tree splitting on Employed (No/Yes), then Balance (50K), then Age (45), with No/Yes leaves]
- IF (Employed=No) Then (Class=No)
- IF (Employed=Yes AND Balance<50K) Then (Class=Yes)
- IF (Employed=Yes AND Balance>=50K AND Age>=45) Then (Class=Yes)

GA for Learning Classification Rules from Data
Representation of Rules
- All rules represent the class Yes
- A rule is a conjunction of conditions: each position is a trait and its value
Marital Status | Has a Job? | Age>40
Married | Yes | No
- In addition to all valid values, each attribute can take an empty condition (*), which matches any value
Marital Status | Has a Job? | Age>40
Married | * | No

Reproduction: Crossover
Parents:
Marital Status | Has a Job? | Age>40
Married | Yes | No
Divorced | No | Yes
Children:
Marital Status | Has a Job? | Age>40
Married | No | Yes
Divorced | Yes | No

Reproduction: Mutation
Random changes in attribute values:
Marital Status | Has a Job? | Age>40
Before: Married | Yes | No
After: Divorced | No | No

Fitness
- Support
- Confidence
- Lift
- Support*Lift
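The candidate fitness measures can be computed for a rule “IF conditions THEN class = Yes” using the usual association-rule definitions: support = P(conditions and Yes), confidence = P(Yes | conditions), lift = confidence / P(Yes). The sketch below uses exact fractions; the records, labels and function names are hypothetical, invented only for illustration.

```python
from fractions import Fraction


def rule_applies(rule, record):
    """'*' is the empty condition and matches any value."""
    return all(c == "*" or c == v for c, v in zip(rule, record))


def rule_fitness(rule, records, labels, target="Yes"):
    """Support, confidence and lift of 'IF rule THEN class = target'."""
    n = len(records)
    covered = [lab for rec, lab in zip(records, labels) if rule_applies(rule, rec)]
    hits = covered.count(target)
    support = Fraction(hits, n)  # P(conditions and target)
    confidence = Fraction(hits, len(covered)) if covered else Fraction(0)  # P(target | conditions)
    base_rate = Fraction(labels.count(target), n)  # P(target)
    lift = confidence / base_rate if base_rate else Fraction(0)
    return {"support": support, "confidence": confidence,
            "lift": lift, "support*lift": support * lift}


# Hypothetical records: (Marital Status, Has a Job?, Age>40) and class labels.
records = [("Married", "Yes", "No"), ("Married", "Yes", "Yes"), ("Single", "Yes", "No"),
           ("Married", "No", "No"), ("Divorced", "Yes", "Yes"), ("Single", "No", "Yes"),
           ("Married", "Yes", "No"), ("Divorced", "No", "No")]
labels = ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No"]

scores = rule_fitness(("Married", "Yes", "*"), records, labels)
print({name: str(value) for name, value in scores.items()})
# {'support': '1/4', 'confidence': '2/3', 'lift': '16/9', 'support*lift': '4/9'}
```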

Selection
- Problem with the regular selection mechanism: a few high-fitness rules tend to take over the population
  - We want to develop a variety of rules, including rules that cover minority groups
- Solution: “segmented” elections
Each example votes for one of the rules that apply to it. For example:
- Assume our population of rules includes:
  - (Marital status: Married, Has_a_job: Yes, Age>40?: Yes)
  - Single, Empty, Yes
  - Empty, Yes, Empty
  - Divorced, Empty, No
- The example (Single, Yes, Yes) can vote for either:
  - Single, Empty, Yes
  - Empty, Yes, Empty
- The probability of voting for each of these rules is proportional to the rule's fitness

Selection: The Survival of the Fittest
- Each rule gets a score: the proportion of the examples it applies to that voted for it
- Example:
  - Single, Empty, Yes – 30%
  - Empty, Yes, Empty – 70%
- Each rule competes only with rules that apply to the same set of examples.
- This allows a form of niching: rules that apply to small subsets of examples can survive.
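The segmented election can be sketched as follows: every example votes for one applicable rule, chosen with probability proportional to rule fitness, and each rule's score is the share of its applicable examples that voted for it. The rules follow the example above (with `*` for Empty); the fitness values, examples and function names are hypothetical.

```python
import random


def rule_applies(rule, example):
    """'*' is the empty condition and matches any value."""
    return all(c == "*" or c == v for c, v in zip(rule, example))


def segmented_election(rules, fitness, examples, rng=random):
    """Each example votes for one applicable rule, chosen with probability
    proportional to rule fitness. A rule's score is the share of the examples
    it applies to that voted for it (0.0 if it applies to none)."""
    votes = {r: 0 for r in rules}
    applicable = {r: 0 for r in rules}
    for ex in examples:
        candidates = [r for r in rules if rule_applies(r, ex)]
        if not candidates:
            continue
        for r in candidates:
            applicable[r] += 1
        winner = rng.choices(candidates, weights=[fitness[r] for r in candidates])[0]
        votes[winner] += 1
    return {r: votes[r] / applicable[r] if applicable[r] else 0.0 for r in rules}


rules = [("Married", "Yes", "Yes"), ("Single", "*", "Yes"), ("*", "Yes", "*"), ("Divorced", "*", "No")]
fitness = {rules[0]: 0.5, rules[1]: 0.3, rules[2]: 0.7, rules[3]: 0.4}  # hypothetical
examples = [("Single", "Yes", "Yes"), ("Single", "No", "Yes"),
            ("Married", "Yes", "No"), ("Divorced", "No", "No")]
scores = segmented_election(rules, fitness, examples, random.Random(0))
print(scores)
```

Here (Divorced, *, No) is the only rule that applies to (Divorced, No, No), so it scores 1.0 regardless of its fitness: a narrow rule survives because it does not compete with the broader ones.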

Another Rule Representation: Tree
Example: IF ((Age > 45) And (Balance > 50)) And (Employed = Yes)
And
  And
    Age > 45
    Balance > 50
  Employed = Yes
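A rule tree like this one can be represented as nested tuples and evaluated recursively. The tuple layout, the `OPS` table and the assumption that Balance is measured in thousands are illustrative choices, not part of the slides.

```python
import operator

# Comparison operators allowed at the leaves of a rule tree.
OPS = {">": operator.gt, "=": operator.eq}


def evaluate(tree, record):
    """A rule tree is ("and", left, right) or (op, attribute, constant)."""
    if tree[0] == "and":
        return evaluate(tree[1], record) and evaluate(tree[2], record)
    op, attribute, constant = tree
    return OPS[op](record[attribute], constant)


rule = ("and",
        ("and", (">", "Age", 45), (">", "Balance", 50)),
        ("=", "Employed", "Yes"))
print(evaluate(rule, {"Age": 52, "Balance": 80, "Employed": "Yes"}))  # True
print(evaluate(rule, {"Age": 30, "Balance": 80, "Employed": "Yes"}))  # False
```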

Crossover Operations
[Figure: subtree crossover between the rule tree And(And(Age > 45, Balance > 50), Employed = Yes) and the rule tree And(Age > 45, Employed = No)]
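Subtree crossover on rule trees can be sketched by addressing every node with a path of child indices and exchanging one randomly chosen subtree of each parent. The nested-tuple layout and all function names are hypothetical; `swap_subtrees` is exposed separately so a specific exchange can be reproduced.

```python
import random

# A rule tree is a nested tuple: ("and", left, right) or (op, attribute, constant).


def subtree_paths(tree, path=()):
    """List the path (tuple of child indices) of every node in the tree."""
    paths = [path]
    if tree[0] == "and":
        for i in (1, 2):
            paths += subtree_paths(tree[i], path + (i,))
    return paths


def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    parts = list(tree)
    parts[i] = replace_subtree(tree[i], path[1:], new)
    return tuple(parts)


def swap_subtrees(p1, path1, p2, path2):
    """Exchange the subtree of p1 at path1 with the subtree of p2 at path2."""
    return (replace_subtree(p1, path1, get_subtree(p2, path2)),
            replace_subtree(p2, path2, get_subtree(p1, path1)))


def subtree_crossover(p1, p2, rng=random):
    return swap_subtrees(p1, rng.choice(subtree_paths(p1)), p2, rng.choice(subtree_paths(p2)))


P1 = ("and", ("and", (">", "Age", 45), (">", "Balance", 50)), ("=", "Employed", "Yes"))
P2 = ("and", (">", "Age", 45), ("=", "Employed", "No"))
print(swap_subtrees(P1, (2,), P2, (2,)))  # exchange the two "Employed" conditions
```

A subtree mutation can reuse the same helpers: pick a path with `subtree_paths` and call `replace_subtree` with a freshly generated random subtree.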

Mutation: Randomly Changing a Subtree
[Figure: a rule tree built from And, Age > 45, Balance 50 and Employed = Yes, in which one subtree has been changed at random]

Learn Trading Rules from Data
Individual: a “buy” / “sell” rule, specifying the conditions for buying or selling.
Representation: tree
Example of a “buy” rule:
[Figure: tree with two “>” comparisons over High at T-2, High at T-2, Low at T-4 and Low at T-3]

Fitness Function

- Return over a period of time (Jonsson et al.)
- Adjusted to a benchmark strategy
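One way to sketch a benchmark-adjusted return: the rule holds the asset from day t to t+1 whenever its condition is true on day t, and the fitness is its return minus the buy-and-hold return over the same period. The signal convention, the absence of transaction costs and the name `excess_return` are assumptions for illustration, not the cited authors' exact definition.

```python
import math


def excess_return(prices, signals):
    """Return of a rule that holds the asset from day t to t+1 whenever
    signals[t] is True, minus the buy-and-hold return over the same period.
    Ignores transaction costs."""
    daily = [prices[t + 1] / prices[t] for t in range(len(prices) - 1)]
    strategy = math.prod(g for g, s in zip(daily, signals) if s) - 1
    benchmark = prices[-1] / prices[0] - 1
    return strategy - benchmark


print(round(excess_return([100, 110, 99, 108.9], [True, False, True]), 4))  # 0.121
```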

Crossover
[Figure: crossover between two “and” rule trees of “>” comparisons over High at T-1, High at T-2, Low at T-4 and Low at T-3]

Maintaining Diversity
- Problem with the regular selection mechanism:
  - We want to develop a variety of rules that cover different scenarios, particularly atypical ones
- Solution: niching
  - The fitness function is adjusted to accommodate diversity
  - Example: returns divided by the number of periods in which the rule's conditions apply

Performance of GA for Trading Rules
- Some studies reported failure (Allen F., Karjalainen R. (1993)), while others reported success over benchmarks such as buy & hold strategies.
- Main argument: the Efficient Market Hypothesis and non-predictability
- Technical rules have been shown to outperform

GA: A Methodology
- GA is a methodology, not a solution
- It provides tools for engineering a system that generates solutions
- Need to formulate the right questions
- Good engineering:
  - Risk of overfitting
  - Design a good fitness function
  - Incorporate the important factors the rules should capture

Topics for Final Exam
- The data mining tasks (regression, classification, etc.)
- Predictive vs. prescriptive
- Data-driven vs. model/theory-driven
- DBMS, data warehouse & OLAP
- Modeling: problem formulation, model building, evaluation
- Classification: decision trees
- Association rules (representation, understanding the measures)
- K-nearest neighbor classification
- Personalization with collaborative filtering
- Clustering (k-means clustering)
- Text mining
- Representation of documents in tabular format for data mining tasks (association rules, classification)
- Information retrieval
- Measures (precision, recall)
- Possible applications
- Genetic algorithms:
  - The principles of GA
  - How to design solutions to problems with GA


