# Evolutionary Computation

```
Evolutionary Computation SS 2001
Prof. Petros Koumoutsakos
Assistants: Sibylle Mueller, Dr. Nic Schraudolph

Project 4
It's a nice summer day in Augsburg, Germany, and you foolishly decide to visit all 127 beer gardens of the town. Given that you're bound to consume a lot of beer on this tour, you should try to minimize the total distance you have to walk. This is an instance of the so-called Travelling Salesperson Problem (TSP): given the coordinates of the beer gardens (which we'll send you by email), you want to find the shortest possible tour that starts at one beer garden, visits each of the others exactly once, then returns to the starting point. (Formally, a tour is a permutation of the integers 1 to 127.) Information about the TSP is available, e.g., from http://www.densis.fee.unicamp.br/~moscato/TSPBIB_home.html

Your assignment is to use reinforcement learning (RL), and in particular the learning and action selection techniques you have studied for the n-armed bandit, as a tool for optimizing this TSP instance. For example, you could maintain a 127x127 matrix of expected rewards for using any given edge (between two beer gardens) as part of a solution tour. These values would be updated based upon the actual tour lengths of the solutions you try during the optimization process. Conversely, the generation of new solutions could be biased towards using edges that have a high expected reward.

You are free to use any optimization technique; possibilities include the evolutionary algorithms we covered in class and the ant algorithms we will discuss in the next lecture. These optimization techniques typically involve some random elements (e.g., where to mutate) which, using action selection techniques such as ε-greedy or softmax, you can bias towards preferentially generating solutions that RL finds promising. For example, you could use RL to pick the cutting points for the "2-opt" operator. Feel free to deviate from these recipes and experiment on your own!
The bottom line is:

1) You should use reinforcement learning to bias a TSP optimization algorithm towards generating good solutions, and experiment with various learning and action selection schemes.

2) The team that finds the shortest tour of Augsburg's beer gardens wins a beer each at the bQm!

3) Please hand in your source code, the shortest tour found, and the length of the shortest tour found by email to Sibylle Mueller (muellers@inf.ethz.ch), using the title ‘Class EC, Project 4, Team <your.team.number>’. In the body of your mail, please indicate the names of the team members.

4) Also hand in a hardcopy of the following pieces, including the names of the team members and the team number, on the due date:
   a) Printout of your source code
   b) A discussion of your implementation choices and results, including graphs of tour length vs. function evaluations for the various learning and action selection schemes you tried, and a plot of the shortest tour you found.

Due: Monday, July 2, 11:15 a.m. in IFW A 34

```
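
The edge-reward idea from the handout can be sketched roughly as follows. This is only one possible reading, not the course's reference solution: the function names (`tour_length`, `build_tour`, `update_rewards`, `optimize`), the parameters `epsilon` and `alpha`, and the choice of the negative tour length as the reward are all assumptions made for illustration. Cities are indexed from 0 rather than 1, and tours are built greedily edge by edge instead of through an evolutionary or ant algorithm.

```python
import math
import random


def tour_length(tour, coords):
    """Length of the closed tour, including the edge back to the start."""
    n = len(tour)
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
        for i in range(n)
    )


def build_tour(q, epsilon, rng):
    """Build a tour city by city, picking each next edge epsilon-greedily."""
    n = len(q)
    current = rng.randrange(n)
    tour = [current]
    unvisited = set(range(n)) - {current}
    while unvisited:
        candidates = sorted(unvisited)
        if rng.random() < epsilon:
            nxt = rng.choice(candidates)  # explore: random unvisited city
        else:
            # exploit: edge with the highest estimated reward
            nxt = max(candidates, key=lambda j: q[current][j])
        tour.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    return tour


def update_rewards(q, tour, reward, alpha):
    """Move the estimate of every edge used in the tour towards the reward."""
    n = len(tour)
    for i in range(n):
        a, b = tour[i], tour[(i + 1) % n]
        q[a][b] += alpha * (reward - q[a][b])
        q[b][a] = q[a][b]  # distances are symmetric, so keep q symmetric


def optimize(coords, iterations=2000, epsilon=0.1, alpha=0.1, seed=0):
    """Repeatedly sample tours and learn edge values from their lengths."""
    rng = random.Random(seed)
    n = len(coords)
    # Assumption: estimates start at 0 while all rewards are negative, so
    # untried edges look attractive (optimistic initial values).
    q = [[0.0] * n for _ in range(n)]
    best_tour, best_len = None, float("inf")
    for _ in range(iterations):
        tour = build_tour(q, epsilon, rng)
        length = tour_length(tour, coords)
        update_rewards(q, tour, -length, alpha)  # shorter tour = higher reward
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len
```

With the emailed coordinates loaded into a list of `(x, y)` pairs, a call such as `optimize(coords, iterations=5000, epsilon=0.05)` would return the best tour seen and its length. Swapping the ε-greedy choice in `build_tour` for a softmax over `q[current]` is one way to compare action selection schemes, as the assignment asks.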