Contents
1. Genetic Algorithm
   o Theory
   o GA and TSP
2. Base implementation, Template class GA<> and GA Selection classes
3. Genome of Travel
4. TSP Application
   o GA thread
   o UI interface
5. Environment
6. Reference
Disclaimer
I am not a GA guru and I have no degree in the field, so this article should not be used as a GA book or a GA tutorial. It contains no mathematics, logic, or algebra about GAs. It is only a programmer's view of Genetic Algorithms and only an example of GA coding. Use it carefully! Any comments and criticism are highly appreciated.
Reproduce (& Children Mutate)
Those chromosomes with a higher fitness value are more likely to reproduce offspring (which can mutate after reproduction). The offspring is a product of the father and mother, whose composition consists of a combination of genes from them (this process is known as "crossing over").
Next Generation
If the new generation contains a solution that produces an output that is close enough or equal to the desired answer then the problem has been solved. If this is not the case, then the new generation will go through the same process as their parents did. This will continue until a solution is reached."
In my opinion, a GA is easy to understand and easy to implement in C++. Robustness is its main advantage and its main disadvantage at the same time. Even if you implement some features incorrectly, the GA will continue to run and sooner or later will solve the problem (or settle into a local optimum). Such a feature can definitely cause trouble during debugging and tuning. Another interesting problem is choosing the best algorithms from the existing plethora of crossover, mutation, and gene representation methods. See the References section for some useful links on GA theory.
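The reproduce / mutate / next-generation cycle described above can be sketched in a few lines. This is only an illustration, not the code of this article: it assumes a toy problem (maximize the number of 1 genes in a 0/1 chromosome), and the names Chromosome, fitness, and evolve are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

using Chromosome = std::vector<int>;   // hypothetical: a string of 0/1 genes

// Toy fitness (an assumption for illustration): number of genes equal to 1.
int fitness(const Chromosome& c) {
    return static_cast<int>(std::count(c.begin(), c.end(), 1));
}

// Evolves until some chromosome reaches `target` fitness or the generation
// budget runs out. Returns the best fitness of the last evaluated generation.
int evolve(std::size_t pop_size, std::size_t length, int target,
           int max_generations, unsigned seed) {
    assert(pop_size >= 2 && length >= 2);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> bit(0, 1);
    std::uniform_int_distribution<std::size_t> pick(0, pop_size - 1);
    std::uniform_int_distribution<std::size_t> cut(1, length - 1);
    std::uniform_int_distribution<std::size_t> pos(0, length - 1);

    std::vector<Chromosome> pop(pop_size, Chromosome(length));
    for (auto& c : pop) for (auto& g : c) g = bit(rng);

    auto better = [](const Chromosome& a, const Chromosome& b) {
        return fitness(a) > fitness(b);
    };
    // Fitter chromosomes are more likely to reproduce (tournament of two).
    auto parent = [&]() -> const Chromosome& {
        const Chromosome& a = pop[pick(rng)];
        const Chromosome& b = pop[pick(rng)];
        return better(a, b) ? a : b;
    };

    int best = 0;
    for (int gen = 0; gen < max_generations; ++gen) {
        std::sort(pop.begin(), pop.end(), better);
        best = fitness(pop.front());
        if (best >= target) break;                 // solution reached

        std::vector<Chromosome> next;
        next.push_back(pop.front());               // keep the best one
        while (next.size() < pop_size) {
            const Chromosome& father = parent();
            const Chromosome& mother = parent();
            // Crossing over: head of the father, tail of the mother.
            auto k = static_cast<std::ptrdiff_t>(cut(rng));
            Chromosome child(father.begin(), father.begin() + k);
            child.insert(child.end(), mother.begin() + k, mother.end());
            // Children mutate with probability 1/length: flip one gene.
            if (pos(rng) == 0) child[pos(rng)] ^= 1;
            next.push_back(std::move(child));
        }
        pop = std::move(next);
    }
    return best;
}
```

A call such as evolve(30, 16, 16, 200, 42u) runs at most 200 generations of 30 chromosomes with 16 genes each. The TSP version needs different crossover and mutation operators, because a tour cannot contain the same city twice.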
For the gene representation, I used a sequential representation in which the cities are listed in the order they are visited. This is a common way to encode a TSP genome.
Example: [9 3 4 0 1 2 5 7 6 8]
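With this representation, the cost of a gene is the length of the closed tour. A minimal sketch, assuming cities are 2D points with Euclidean distances; City, tour_length, and is_permutation_of_cities are names made up for illustration and are not taken from the article's source.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct City { double x, y; };   // hypothetical coordinate type

// Length of the closed tour that visits the cities in the order given by
// `tour` and then returns to the first city.
double tour_length(const std::vector<City>& cities, const std::vector<int>& tour) {
    assert(tour.size() == cities.size());
    double total = 0.0;
    for (std::size_t i = 0; i < tour.size(); ++i) {
        const City& a = cities[tour[i]];
        const City& b = cities[tour[(i + 1) % tour.size()]];
        total += std::hypot(b.x - a.x, b.y - a.y);
    }
    return total;
}

// A genome is valid only if every city index appears exactly once.
bool is_permutation_of_cities(const std::vector<int>& tour) {
    std::vector<bool> seen(tour.size(), false);
    for (int c : tour) {
        if (c < 0 || static_cast<std::size_t>(c) >= tour.size() || seen[c])
            return false;
        seen[c] = true;
    }
    return true;
}
```

The validity check matters because crossover and mutation must always produce a permutation; a plain bit-level operator would not.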
For the crossover operation, after several tests and some research, I selected the Greedy Crossover by J. Grefenstette. To quote Sushil J. Louis: "Greedy crossover selects the first city of one parent, compares the cities leaving that city in both parents, and chooses the closer one to extend the tour. If one city has already appeared in the tour, we choose the other city. If both cities have already appeared, we randomly select a nonselected city." In my experience it is a very effective method.
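The quoted rule translates almost line by line into code. The following is a sketch under assumptions, not the implementation from this article: distances come from a precomputed matrix dist[a][b], and the names Tour, DistMatrix, successor, and greedy_crossover are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Tour = std::vector<int>;
using DistMatrix = std::vector<std::vector<double>>;   // dist[a][b]

// City that follows `city` in `tour`, wrapping around at the end.
int successor(const Tour& tour, int city) {
    auto it = std::find(tour.begin(), tour.end(), city);
    assert(it != tour.end());
    ++it;
    return it == tour.end() ? tour.front() : *it;
}

Tour greedy_crossover(const Tour& p1, const Tour& p2,
                      const DistMatrix& dist, std::mt19937& rng) {
    assert(!p1.empty() && p1.size() == p2.size());
    const std::size_t n = p1.size();
    std::vector<bool> used(n, false);
    Tour child;
    child.reserve(n);

    int current = p1.front();          // first city of one parent
    child.push_back(current);
    used[current] = true;

    while (child.size() < n) {
        int a = successor(p1, current);    // city leaving `current` in parent 1
        int b = successor(p2, current);    // city leaving `current` in parent 2
        int next;
        if (!used[a] && !used[b]) {
            next = dist[current][a] <= dist[current][b] ? a : b;   // closer one
        } else if (!used[a]) {
            next = a;
        } else if (!used[b]) {
            next = b;
        } else {
            // Both candidates already appeared: pick a random unvisited city.
            std::vector<int> free_cities;
            for (std::size_t c = 0; c < n; ++c)
                if (!used[c]) free_cities.push_back(static_cast<int>(c));
            std::uniform_int_distribution<std::size_t> pick(0, free_cities.size() - 1);
            next = free_cities[pick(rng)];
        }
        child.push_back(next);
        used[next] = true;
        current = next;
    }
    return child;
}
```

The linear search in successor makes this sketch quadratic in the number of cities; a position lookup table per parent would remove that cost.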
Mutation
We can't flip a gene's bits as traditional mutation does. Instead we must swap the order of cities in a path.
Example:
Before mutation: [0 1 2 3 4 5 6]
After mutation:  [0 1 3 2 4 5 6]
There are many ways to do such a swap. The easiest is a random swap. Unfortunately, this strategy cannot reach an optimum quickly, but it can prevent convergence to a local optimum. Additionally I used a greedy-swap mutation. To quote Sushil J. Louis once more: "The basic idea of greedy-swap is to randomly select two cities from one chromosome and swap them if the new (swapped) tour length is shorter than the old one."
While browsing the web I discovered the GA research "A fast TSP solver using a genetic algorithm" by Hiroaki Sengoku and Ikuo Yoshihara. It is the fastest algorithm I have ever seen. Unfortunately, it is a Java implementation without any source. I could find only one PDF document describing the algorithm: arob98.pdf. After reading and studying it, I used the "Mutation by 2opt" idea in my code. This method is based on the same idea as greedy-swap mutation but is more expensive and more effective. Adding it greatly improved the speed of my program on small and medium sets (up to 200 cities). However, for big sets (1000 cities and more) this heuristic is very slow.
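Both ideas fit in a short sketch. This is my illustration of greedy-swap and of a textbook 2-opt pass, not the code from the PDF or from this article; it assumes a symmetric distance matrix, and Tour, DistMatrix, tour_cost, greedy_swap, and two_opt are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Tour = std::vector<int>;
using DistMatrix = std::vector<std::vector<double>>;   // assumed symmetric

double tour_cost(const Tour& t, const DistMatrix& d) {
    double total = 0.0;
    for (std::size_t i = 0; i < t.size(); ++i)
        total += d[t[i]][t[(i + 1) % t.size()]];
    return total;
}

// Greedy-swap: swap two random cities and keep the swap only if the tour
// became shorter. Returns true when the swap was kept.
bool greedy_swap(Tour& t, const DistMatrix& d, std::mt19937& rng) {
    assert(t.size() >= 2);
    std::uniform_int_distribution<std::size_t> pick(0, t.size() - 1);
    std::size_t i = pick(rng), j = pick(rng);
    double before = tour_cost(t, d);
    std::swap(t[i], t[j]);
    if (tour_cost(t, d) < before) return true;
    std::swap(t[i], t[j]);   // undo: the swapped tour was not shorter
    return false;
}

// 2-opt: reverse the segment between two edges whenever the two new edges
// are shorter than the two old ones; repeat until nothing improves.
void two_opt(Tour& t, const DistMatrix& d) {
    const std::size_t n = t.size();
    if (n < 4) return;
    bool improved = true;
    while (improved) {
        improved = false;
        for (std::size_t i = 0; i + 2 < n; ++i) {
            for (std::size_t j = i + 2; j < n; ++j) {
                if (i == 0 && j == n - 1) continue;   // the two edges share t[0]
                int a = t[i], b = t[i + 1];
                int c = t[j], e = t[(j + 1) % n];
                double delta = d[a][c] + d[b][e] - d[a][b] - d[c][e];
                if (delta < -1e-12) {
                    auto first = t.begin() + static_cast<std::ptrdiff_t>(i + 1);
                    auto last = t.begin() + static_cast<std::ptrdiff_t>(j + 1);
                    std::reverse(first, last);
                    improved = true;
                }
            }
        }
    }
}
```

Greedy-swap recomputes the whole tour length for one candidate move, while 2-opt scans all pairs of edges on every pass. That quadratic scan is consistent with the slowdown I saw on large city sets.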
Selection
I implemented three selection methods: roulette rank, roulette cost, and tournament. Of course I used elitism too.
Roulette Wheel Selection
Definition from Marek Obitko's site: "Cost Selection: Parents are selected according to their fitness. The better the chromosomes are, the more chances to be selected they have. Imagine a roulette wheel where are placed all chromosomes in the population, every has its place big accordingly to its fitness function.
Rank Selection: The previous selection will have problems when the fitnesses differ very much. For example, if the best chromosome fitness is 90% of all the roulette wheel then the other chromosomes will have very few chances to be selected. Rank selection first ranks the population and then every chromosome receives fitness from this ranking. The worst will have fitness 1, second worst 2 etc. and the best will have fitness N (number of chromosomes in population)."
Tournament Selection and Elitism
Definition from W. B. Langdon, University College, London: "A mechanism for choosing individuals from a population. A group (typically between 2 and 7 individuals) are selected at random from the population and the best (normally only one, but possibly more) is chosen. An elitist genetic algorithm is one that always retains in the population the best individual found so far. Tournament Selection is naturally elitist."
In my opinion, and after testing, roulette rank and tournament selection are slightly faster for the TSP case. For other problems, other selection algorithms may be best.
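Rank and tournament selection can be sketched as follows. This is a hedged illustration rather than the selection classes of this article: it assumes the population is sorted best-first and that a lower cost is better; select_roulette_rank and select_tournament are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Rank selection over a population of size n that is sorted best-first.
// The worst gene gets weight 1, the second worst 2, and the best gets n.
// Returns the index of the selected gene.
std::size_t select_roulette_rank(std::size_t n, std::mt19937& rng) {
    assert(n > 0);
    std::vector<double> weights(n);
    for (std::size_t i = 0; i < n; ++i)
        weights[i] = static_cast<double>(n - i);
    std::discrete_distribution<std::size_t> wheel(weights.begin(), weights.end());
    return wheel(rng);
}

// Tournament selection: draw k genes at random (with replacement) and
// return the index of the one with the lowest cost.
std::size_t select_tournament(const std::vector<double>& cost, std::size_t k,
                              std::mt19937& rng) {
    assert(!cost.empty() && k > 0);
    std::uniform_int_distribution<std::size_t> pick(0, cost.size() - 1);
    std::size_t best = pick(rng);
    for (std::size_t i = 1; i < k; ++i) {
        std::size_t c = pick(rng);
        if (cost[c] < cost[best]) best = c;
    }
    return best;
}
```

Roulette cost selection would use the same wheel with weights derived from the fitness itself instead of the rank, which is where one dominant chromosome can crowd out the rest.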
Co-evolutions. Migrations
I didn't find many documents about these methods on the web. The basic idea is to let several populations evolve at the same time. The description found on the Generation5.org site:
"Genetic algorithms are neat, but they do come with their own set of problems. One big problem is that genetic algorithms have a tendency to get stuck at local optima. In other words, they will find a reasonable solution, but not the best solution. There are several methods that have been devised to counter this problem, and the one we will look at is coevolution."
At the same time, the co-evolution idea makes use of the SMP capability of WinNT/2k machines with multiple CPUs. We can easily run several GAs in separate threads without any penalty. To exchange data between the different GAs, we can migrate the best genes between populations.
The template parameter Traits must define typedefs for the Gene class, the Random class, the Population container class, and the Thread Synchronize class. The template parameter Selection must provide the GA selection algorithm. Currently there are three such classes: selection_tournament<>, selection_roulette_cost<>, selection_roulette_rank<>.
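To show how such a traits-plus-policy design fits together, here is a small sketch. It is not the real GA<> class: the typedef names (Gene, Random, Population, Sync), the operator() signature of the selection policy, and the classes ToyGene, ToyTraits, pick_better_of_two, and ToyGA are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <random>
#include <vector>

struct ToyGene { double cost; };   // hypothetical gene: only a cost

// Hypothetical traits bundle; the member names are assumptions.
struct ToyTraits {
    typedef ToyGene              Gene;
    typedef std::mt19937         Random;
    typedef std::vector<ToyGene> Population;
    typedef std::mutex           Sync;
};

// Hypothetical selection policy: returns the index of the chosen parent.
template <class Traits>
struct pick_better_of_two {
    std::size_t operator()(const typename Traits::Population& pop,
                           typename Traits::Random& rng) const {
        assert(!pop.empty());
        std::uniform_int_distribution<std::size_t> pick(0, pop.size() - 1);
        std::size_t a = pick(rng), b = pick(rng);
        return pop[a].cost <= pop[b].cost ? a : b;
    }
};

// Skeleton of a policy-based GA class; only the plumbing is shown.
template <class Traits, class Selection>
class ToyGA {
public:
    explicit ToyGA(unsigned seed) : rng_(seed) {}
    typename Traits::Population& population() { return pop_; }
    std::size_t select_parent() {
        std::lock_guard<typename Traits::Sync> guard(sync_);
        return selection_(pop_, rng_);
    }
private:
    typename Traits::Population pop_;
    typename Traits::Random     rng_;
    typename Traits::Sync       sync_;
    Selection                   selection_;
};
```

The benefit of this layout is that swapping the selection algorithm or the container is a change to one template argument, with no virtual calls in the inner loop.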
GA<>
interface
init - initializes the population
update - computes fitness values
find_best - finds the gene with the best fitness
epoch - makes the next population (selection, crossover, etc)
recombine - makes the selection, produces new genes, removes non-elite parents and removes twin genes
mutate - attempts to mutate genes
migration - exchanges the best genes between populations
sort - orders the genes in the population by fitness value (moves the best to the beginning of the population)
begin - returns an iterator at the first (best) gene of the population
end - returns an iterator that points just beyond the end of the population
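The migration step could look roughly like the sketch below. It is an assumption-laden illustration, not the member function of GA<>: each population is kept sorted best-first and guarded by its own mutex, and Gene, Island, and migrate are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

struct Gene { std::vector<int> tour; double cost; };   // hypothetical layout

struct Island {
    std::vector<Gene> population;   // kept sorted, lowest cost first
    std::mutex lock;                // guards population across threads
};

// Copies the `count` best genes of `from` over the `count` worst genes of
// `to`, then restores the best-first order of `to`.
void migrate(Island& from, Island& to, std::size_t count) {
    assert(&from != &to);
    // Lock both islands together to avoid deadlock when two threads
    // migrate in opposite directions at the same time.
    std::lock(from.lock, to.lock);
    std::lock_guard<std::mutex> g1(from.lock, std::adopt_lock);
    std::lock_guard<std::mutex> g2(to.lock, std::adopt_lock);

    count = std::min(count, std::min(from.population.size(), to.population.size()));
    std::copy_n(from.population.begin(), count,
                to.population.end() - static_cast<std::ptrdiff_t>(count));
    std::sort(to.population.begin(), to.population.end(),
              [](const Gene& a, const Gene& b) { return a.cost < b.cost; });
}
```

Calling migrate around a ring of islands (0 to 1, 1 to 2, and so on) after every few epochs is one simple schedule; the number of migrants corresponds to the Migration setting of the application.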
Genome of Travel
class TSPData
The travel context holds the travel data, auxiliary data, and methods for the crossover operation.
struct TSPBase
The base gene class has a thread-specific memory pool for memory optimization. While computing, the GA creates and destroys many dynamic gene objects, and the default memory allocation routine is very inefficient for this. Instead I used a special memory pool class, which pre-allocates a large block of memory from the process heap and then caches freed blocks.
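The idea of such a pool can be shown with a minimal fixed-size block allocator. This is a sketch of the general technique, not the pool class used in this article; BlockPool and its member names are hypothetical, and it is deliberately not thread-safe because each thread is assumed to own its own pool.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Fixed-size block pool: takes memory from the heap in large chunks and
// recycles freed blocks through a free list instead of returning them.
class BlockPool {
public:
    BlockPool(std::size_t block_size, std::size_t blocks_per_chunk)
        : block_size_(round_up(block_size)),
          per_chunk_(blocks_per_chunk),
          free_(nullptr) {
        assert(blocks_per_chunk > 0);
    }
    ~BlockPool() { for (void* c : chunks_) ::operator delete(c); }
    BlockPool(const BlockPool&) = delete;
    BlockPool& operator=(const BlockPool&) = delete;

    void* allocate() {
        if (!free_) grow();          // no cached block: get a new chunk
        Node* n = free_;
        free_ = n->next;
        return n;
    }
    void deallocate(void* p) {
        // The freed block itself stores the link to the next free block.
        free_ = new (p) Node{free_};
    }
    std::size_t chunk_count() const { return chunks_.size(); }

private:
    struct Node { Node* next; };

    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(Node)) n = sizeof(Node);
        return (n + a - 1) / a * a;
    }
    void grow() {
        char* chunk = static_cast<char*>(::operator new(block_size_ * per_chunk_));
        chunks_.push_back(chunk);
        for (std::size_t i = 0; i < per_chunk_; ++i)
            deallocate(chunk + i * block_size_);
    }

    std::size_t block_size_;
    std::size_t per_chunk_;
    Node* free_;
    std::vector<void*> chunks_;
};
```

A gene class would typically route its operator new and operator delete to such a pool, so that creating a child in every crossover costs a pointer pop rather than a heap call.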
class TSPGene<> : TSPBase
The gene implementation. Every gene holds a salesman's path (travel) and the fitness value of that travel. Of course, the lower the cost of the travel, the better the fitness of the gene. The class has several constructors and methods for mutation and for computing heuristics. The default constructor creates a gene with a random travel; the crossover operation uses another constructor:
TSPGene* gnp = new TSPGene(parent1, parent2);
Co-evolution field: the number of co-evolutions (separate threads), from 1 to 16. The default value is the number of CPUs * 2.
Population field: the population size per co-evolution, from 10 to 1000.
Elite field: the elite size in the population, from 0 to the population size.
Migration field: the number of migrated genes, from 0 to the population size.
Heuristics field: the number of best genes improved via the heuristics method in every epoch. Note: with large populations this can make the calculation extremely slow.
Crossover field: the probability of crossover, from 0 to 100.
Mutation field: the probability of mutation, from 0 to 100.
Selection combo: the selection method (roulette rank, roulette cost, or tournament).
Remove Twins checkbox: enables a natural selection algorithm that eliminates similar genes to avoid premature convergence.
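One possible reading of the Remove Twins option is sketched below. It assumes that "twins" means genes with exactly the same tour, which is narrower than "similar genes"; Gene and remove_twins are hypothetical names and this is not the code behind the checkbox.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Gene { std::vector<int> tour; double cost; };   // hypothetical layout

// Sorts the population best-first and erases exact duplicate tours.
// Returns how many genes were removed.
std::size_t remove_twins(std::vector<Gene>& pop) {
    std::sort(pop.begin(), pop.end(),
              [](const Gene& a, const Gene& b) -> bool {
                  if (a.cost != b.cost) return a.cost < b.cost;
                  return a.tour < b.tour;   // tie-break keeps equal tours adjacent
              });
    auto same_tour = [](const Gene& a, const Gene& b) { return a.tour == b.tour; };
    auto last = std::unique(pop.begin(), pop.end(), same_tour);
    std::size_t removed = static_cast<std::size_t>(pop.end() - last);
    pop.erase(last, pop.end());
    // Postcondition: no two neighbouring genes share a tour any more.
    assert(std::adjacent_find(pop.begin(), pop.end(), same_tour) == pop.end());
    return removed;
}
```

After the duplicates are erased, the population is smaller than the configured size, so the freed slots would have to be refilled with new genes (for example random tours or fresh children) before the next epoch.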
Environment
I used VC++ 6.0 SP5, Win2k SP2, and MS Platform SDK April 2001. I tested on Win2k SP2 IE 6.0, Win2k SP1 IE 5.0, Win ME IE 5.5, and Win 98 SE IE 5.0.
References
1. S. Hsiung and J. Matthews, Generation 5 - Genetic Algorithms and Genetic Programming
2. Marek Obitko, Introduction to Genetic Algorithms
3. Hiroaki Sengoku and Ikuo Yoshihara, A fast TSP solver using a genetic algorithm
4. Sushil J. Louis and Rilun Tang,
   o Interactive Genetic Algorithms for the Traveling Salesman Problem
   o Genetic Algorithms with Memory for Traveling Salesman Problems
   o Augmenting Genetic Algorithms with Memory to Solve Traveling Salesman Problems
5. Sergey Isaev, Genetic Algorithm
6. W. B. Langdon, Genetic Programming and Data Structures
7. Solving Traveling Salesman Problem