Vukan Turkulov
University of Novi Sad
Trg Dositeja Obradovica 6
Novi Sad, Serbia
vukant@gmail.com
ABSTRACT
Engineers and scientists often stumble upon optimisation
problems in their work. There is a need to find the most
aerodynamic windshield shape, the shortest path between
two points, the most cost-effective fuel structure, or the
most efficient way to route internet traffic. Many of those
problems have proven to be impossible to solve by means
of traditional optimisation techniques, encouraging people
to conduct research in developing new methods of optimisation. In the second half of the 20th century, a family of
optimisation methods inspired by the principles of evolution
was developed. One representative of that family is the genetic algorithm, which is widely used today in various applications and different fields. The aim of this paper is to give
a brief introduction to the genetic algorithm, with a large
focus on the idea behind its development. Also, an overview of the algorithm's usage in different scenarios shall be presented, providing the reader with a good starting point for further research on their own.
1. INTRODUCTION
2. PROBLEM DEFINITION
In order to understand what advantages evolutionary algorithms bring over the traditional ones, we shall first examine how traditional algorithms work. A good example of such algorithms is the hill climbing optimisation technique. It is widely used, simple to understand, and representative of a larger class of similar techniques.
Assume that we are searching a single-argument function
f (x) for its maximum value. Starting from an initial solution
x, each iteration will make adjustments to x and determine
whether the new solution is better compared to the last one.
If it is true, new iterations will continue to change the value
of x in the same direction, until no further improvements
can be made.
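The iteration described above can be sketched in a few lines. This is a minimal illustration only; the function and parameter names are ours, and the fixed step size is an arbitrary choice:

```python
def hill_climb(f, x, step=0.01, max_iters=10_000):
    """Greedy hill climbing: move x in whichever direction
    improves f(x), stopping once no small step helps."""
    for _ in range(max_iters):
        if f(x + step) > f(x):
            x += step
        elif f(x - step) > f(x):
            x -= step
        else:
            break  # stuck at a (possibly only local) optimum
    return x
```

For a function with a single peak, such as f(x) = -(x - 2)^2, this reliably climbs to the maximum; the next paragraphs show why it fails on functions with several peaks.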
While this technique has proven to be effective in finding
local optimums, it struggles with finding global optimums.
Consider applying hill climbing to the function shown in figure 1. Depending on the choice of the initial x, the algorithm may get stuck at x1 or x3 and yield them as the
final results of the optimisation. For example, the algorithm
evaluates x1 . Afterwards, solutions close to x1 are evaluated. Since x1 is the best solution in its proximity, any small
changes to x1 worsen the result, so the algorithm assumes
that the result cannot be improved anymore.

3. RELATED WORK
Alan Turing was amongst the first people to propose developing programs that would mimic the principles of evolution,
as described in [22]. Nils Aall Barricelli conducted the first
simulations of evolution in 1954. Some of his works are described in [3]. Various research on the topic of artificial evolution was done in the following decades, resulting in different algorithms. John Holland is said to be the
pioneer of the genetic algorithm, conducting research on its
development and publishing an influential book Adaptation
in Natural and Artificial Systems in 1975 [11]. [7] and [9] provide a good overview of the subject.
Since then, much research has been conducted on the genetic algorithm. Here we will give a few examples of such
research. [4] and [8] evaluate the algorithm's performance depending on various parameters, namely the pseudo-random
number generator being used, and different mutation types.
[1] and [19] discuss the development of appropriate termination criteria for the algorithm. They focus on defining proper
upper bounds for the number of algorithm iterations, while
keeping a certain level of confidence in the algorithm's convergence. [14] and [17] describe how algorithm performance
can be improved by approximation of the fitness functions,
while [13] and [18] discuss performance improvements based
on parallelization of the algorithm. Lastly, there is a great
number of papers discussing the genetic algorithm's application to specific problems. A few examples are: [23] uses the
algorithm to develop desired nanostructures; [24] solves the problem of finding mixed-metallic clusters with the lowest energy surfaces; [16] designs control system parameters using
the genetic algorithm; [5] produces the optimal timetable for
a high school. The latter two are briefly described in section 6.

4. THE GENETIC ALGORITHM
Living organisms reproduce, creating children - a new generation of that species. Children have a mix of their parents' characteristics, a feature commonly called
crossover in genetic algorithm literature. However, they also usually possess a few deviations, which introduce some minor changes compared to their parents. These changes are random, and they can have either a negative impact on the species' ability to survive, or a positive one.
Such changes are called mutations. If the mutations have
a positive impact on the organism, it has a higher chance
to survive and reproduce, carrying on the mutation to the
future generations. However, if the mutations are negative
- the organism is more likely to die, thus not reproducing
and not carrying the mutation to its children. The process
of eliminating the least fit organisms, and keeping the fittest
ones is called selection.
Although it is not immediately obvious, the process that
makes natural selection the most different from hill climbing is mutation. Such completely random elements were
not present in the traditional optimisation methods. In hill
climbing, we always moved our solution towards the better
ones, with no randomness involved. If we introduced a similar random element into the technique, it would sometimes allow us to move away from local optimums, instead of getting stuck in them.
Let us now ponder on how to implement the algorithm based
on natural selection. In order to make it easier for us to understand the process, we shall work on an example function.
Assume that we are trying to find the number x for which
the following function has the maximum value:
f(x) = sin²(x) / x
Firstly, we must model the organisms. Evolution optimises organisms, while we want to optimise our solutions for
a problem. Thus, the organisms shall be represented by our
solutions. If we think about our function f (x) that we are
trying to optimise, an organism would be a specific value
for the input argument x. A population of organisms would
be a set of various values for x. For our example, we can
make an initial population consisting of 10 random sample
values ranging from -10 to 10. We will label the population
as set P:
P = {3.27, 1.76, 7.2, 4.6, 8.82, 8.67, 0.66, 0.36, 3.12, 8.2}
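An initial population like P can be sampled at random. The sketch below is ours, not part of the original algorithm description; the range and rounding mirror the example above:

```python
import random

LOWER, UPPER = -10, 10   # search range from the example
POP_SIZE = 10            # kept small here for simplicity

def random_population(size=POP_SIZE):
    # each organism is simply a candidate value for x
    return [round(random.uniform(LOWER, UPPER), 2) for _ in range(size)]
```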
Note that we have chosen the population size to be 10 for the sake of simplicity. In reality, the population size is usually bigger, and represents an algorithm parameter. Moreover, the selection of the initial population has an impact on the algorithm's behaviour.
We should now find a way to implement such behaviour in
our algorithm. However, this is not a trivial task, and many
different crossover methods exist. The simplest one is to
observe solutions as strings of certain values. The string
solution as a whole would then mimic the DNA, while the
smaller elements that build up the string would mimic nucleotides. If our solutions are integers, for example, we can take their binary representation as a sequence of ones and zeroes. The process of performing a crossover consists of constructing a new string (the child string) by using a combination of the parents' strings. Again, there are a few ways to do this. The simplest one is called single point crossover. We randomly choose a point inside the child string. Up to that point, we construct the child string by copying one parent's string, and after that point we copy the other parent's string. Two other common crossover techniques, two point crossover and uniform crossover, are described in section 4.2.
The main phases of the genetic algorithm are selection, crossover and mutation. We are now going to describe them in the following sections 4.1, 4.2 and 4.3.
4.1 Selection
As we have mentioned, the environment performs the selection of the organisms by eliminating the least fit ones
and keeping the fittest. In the algorithm, the function that
we are to optimise determines which solutions are fit, and
which are not. Such a function is often called the evaluation
function. For our example, the actual function f (x) is the
evaluation function. The results of our evaluation function
on the population set P are displayed in table 1.
Since we are interested in finding the maximum of f (x),
we shall treat higher results as good, and lower results as
bad. The actual selection is done by selecting a number of
best solutions. The number of best solutions to be selected
is an algorithm parameter. In our simple example we could
take the best 4 results, which are values 1.76, 0.36, 4.60 and
7.20 for the input argument x.
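The selection step for our example can be sketched as follows. This is a minimal illustration with names of our choosing; which values survive depends on the evaluation results in table 1:

```python
import math

def f(x):
    # the evaluation (fitness) function from the running example
    return math.sin(x) ** 2 / x

def select(population, keep=4):
    # keep the solutions with the highest evaluation results
    return sorted(population, key=f, reverse=True)[:keep]

P = [3.27, 1.76, 7.2, 4.6, 8.82, 8.67, 0.66, 0.36, 3.12, 8.2]
best = select(P)
```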
4.2 Crossover
Before implementing crossover into the algorithm, we shall
give a brief description of crossover in living organisms. Most
organisms store biological information in a molecule called
DNA (deoxyribonucleic acid). The DNA molecule itself is
a sequence of simpler molecules called nucleotides, which
come in four types. Since the chemistry involved is not of
importance for this explanation, we shall simply abbreviate
those four types with A, G, C and T. Briefly speaking, the
sequence in which those four nucleotides build up the DNA
determines the structure of the living organism. When two
parents reproduce, their child has a combination of their
DNA molecules. Note that in nature, the number of parents required to produce children is not the same for all living organisms. However, implementations of the genetic algorithm usually assume that two parents reproduce, although variations with a different number of parents exist. Two other common crossover techniques are:
- Two point crossover: instead of using one point for intersecting the parents, we use two points, resulting in three segments. The first and third segments are copied from one parent, and the second segment is copied from the other one.
- Uniform crossover: for each element in the child's string, we randomly pick the parent from which to copy it.
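These crossover operations can be sketched on bit strings. The function names below are ours, and the parents are assumed to be equal-length lists of 0/1 values:

```python
import random

def single_point_crossover(p1, p2):
    # copy p1 up to a random point, then p2 after it
    point = random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:]

def two_point_crossover(p1, p2):
    # first and third segments from p1, middle segment from p2
    a, b = sorted(random.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:]

def uniform_crossover(p1, p2):
    # each element is copied from a randomly chosen parent
    return [random.choice(pair) for pair in zip(p1, p2)]
```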
If our solutions cannot be represented as integers, we must
find another way to represent the DNA sequence and perform the crossover. The solution representation method is
called encoding. Sometimes our solutions are not integers,
but real values. Such solutions can be encoded by using their
binary format, whether it is a fixed point representation, or
a floating point one. We can also interpret real numbers
as strings of digits, as we usually do in everyday life. [12]
describes and evaluates floating point representation techniques in more detail. Often, the solutions are not integer
or real values, but some more complex data sets. A good
example would be a timetable schedule, a problem that is
commonly solved by a genetic algorithm, as demonstrated in
[5], [15] and [26]. Unfortunately, since such data sets vastly
differ amongst themselves, there is no general encoding or
crossover technique that covers all of them. However, there
is a great number of scholarly articles describing genetic algorithm approaches to various specific practical problems,
which could be a good starting point for researchers and
developers.
A combination of multiple crossover techniques is sometimes
used. However, it has been shown that certain techniques
work well combined with each other, while others do not,
depending on the characteristics of the particular problem
being solved. Article [6] proposes a method for combining different crossovers, which evaluates their effectiveness and adjusts the crossover techniques as the
algorithm moves on to new iterations. This allows the algorithm to find the best crossover techniques for the particular
problem being solved.
4.3 Mutation
Similarly to crossover, mutation techniques also depend on
the encoding techniques used. If the solutions are encoded
as strings of ones and zeroes, mutations can be performed
by changing some zeroes to ones, and vice-versa. If the
solutions are encoded as complex data sets, changing some
elements in those data sets could act as a mutation.
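For a binary encoding, the bit-flip mutation described above might look like the following sketch (the function name and rate parameter are ours):

```python
import random

def mutate(bits, rate):
    # flip each bit independently with probability `rate`
    return [b ^ 1 if random.random() < rate else b for b in bits]
```

With rate = 0 the child is left untouched, and with rate = 1.0 every bit is flipped.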
Initialize population;
Evaluate solutions;
while solutions do not meet specified criteria do
    Perform selection;
    Perform crossover;
    Perform mutation;
    Evaluate solutions;
    if max number of iterations reached then
        terminate;
    end
end
Algorithm 1: The genetic algorithm
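Putting the phases together, the loop of Algorithm 1 can be sketched in runnable form. This is only an illustration, not the paper's implementation: it assumes a real-valued encoding with an arithmetic (blend) crossover and Gaussian mutation instead of the binary operators described earlier, and all parameter values are arbitrary:

```python
import math
import random

def f(x):
    # evaluation function from the running example (0 taken at x = 0)
    return math.sin(x) ** 2 / x if x != 0 else 0.0

def genetic_algorithm(pop_size=20, keep=8, mutation_rate=0.1,
                      generations=200, lo=-10.0, hi=10.0):
    population = [random.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        # selection: keep the fittest solutions
        parents = sorted(population, key=f, reverse=True)[:keep]
        # crossover: blend two randomly chosen parents
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            w = random.random()
            children.append(w * a + (1.0 - w) * b)
        # mutation: occasional small Gaussian perturbation
        children = [c + random.gauss(0.0, 0.5)
                    if random.random() < mutation_rate else c
                    for c in children]
        population = parents + children
    return max(population, key=f)
```

Because the best solutions are copied unchanged into each new generation, the best fitness found never decreases over the iterations.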
5. ALGORITHM BEHAVIOUR
In this chapter, we will discuss the effectiveness of the genetic algorithm depending on the input parameters and the characteristics of the problem being solved, as well as the algorithm's computational performance.

5.1 Parameters
The mutation rate determines how much mutations should affect the new generation's chromosomes. If the mutation rate is 0, children will have no mutations; if the mutation rate is 0.5, half of the children's chromosomes will mutate; and if the mutation rate is 1.0, the whole chromosome will mutate. Generally speaking, decreasing the mutation
rate increases the speed at which the algorithm converges
towards an optimum. However, it also increases the chance
that the algorithm will become stuck in local optimums, as
the mutation is the process that adds the randomness factor
required to avoid such a problem. Increasing the mutation
rate makes the algorithm more robust at avoiding local optimums. However it decreases the convergence rate of the
algorithm, as high mutation rates will result in almost random searches, and will result in good solutions being lost
due to mutations.
The crossover rate determines how often children are created by performing crossover on a number of parents. The
remaining children are pure copies of their parents; in that case, the best solutions are usually the ones that are copied to the new generation. Decreasing the crossover rate results in good solutions persisting through algorithm iterations, but also decreases the chance that even better solutions will be made by crossing over good parents. Increasing
the crossover rate increases the chance that new, better solutions will emerge, but could also result in losing very good
solutions once they are found.
Population size determines the number of solutions present
in each iteration of the algorithm. As the number of solutions grows, the chances of getting better results rise. Unlike
with the previous two parameters, increasing the population
size does not add any qualitative drawbacks to the algorithm
behaviour. The only drawback is the technical one, as having more solutions increases the computation time required
to perform each iteration. Since the genetic algorithm is
usually applied to solving complex problems, evaluating solutions is often computationally demanding.

5.2 Performance
As we have mentioned, the bottleneck of the genetic algorithm is the calculation of the evaluation function. As
computers become more powerful, problems that are being
optimized grow more complex.
One common solution to the problem is parallelization. Evaluating a population is highly parallelizable, as the
evaluation of any single individual solution is independent
from evaluations of other solutions. For example, if our population size is 32, we could perform the evaluation of each
solution on a separate processing core, provided that we have
32 cores available. [18] describes and evaluates a hybrid technique, where the algorithm is programmed using the message-passing interface (MPI) to run on a large distributed system, while a shared-memory model, OpenMP, is used for parallelization inside each individual node. The algorithm could also be parallelised at the hardware level. [13] proposes a field-programmable gate array (FPGA) implementation and compares the results to the software-based parallelization.
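As a small illustration of how naturally the evaluation step parallelises, here is a sketch using Python's standard multiprocessing module. The function names are ours and this is not the approach of [18] or [13], merely the same independence property they exploit:

```python
import math
from multiprocessing import Pool

def f(x):
    # stand-in for an expensive evaluation function
    return math.sin(x) ** 2 / x if x != 0 else 0.0

def evaluate_parallel(population, workers=4):
    # solutions are independent, so the population can be
    # split across worker processes and evaluated in parallel
    with Pool(processes=workers) as pool:
        return pool.map(f, population)
```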
Another solution is to avoid calculating complex evaluation
functions, using function approximations instead. For example, [17] proposes an approximation method based on neural
networks. [14] combines traditional approximation methods
with fuzzy clustering techniques in order to get improved
performance compared to using traditional approximation
methods alone.
6. APPLICATIONS
Some of the common applications of the genetic algorithm are: artificial creativity, bioinformatics, ciphering and code-breaking, computer architecture design, control engineering, water resource systems design, economics, mechanical engineering, neural networks, timetabling, the traveling salesman problem and vehicle routing. These are some
of the fields mentioned in [25], which should be consulted
for a more comprehensive list.
We shall present two examples of the genetic algorithm's applications. Both of them are brief explanations of published research, with the credit belonging to their respective
authors. It is worth noting that some new concepts will be
introduced in these examples, showing how small algorithm
alterations can improve its performance.
One approach is to create a fitness function which penalizes certain solution characteristics. For example, the algorithm parameters could be configured to adjust the importance of the
schedule being balanced over the week, the importance of
the number of teaching hours per day for any single teacher
or the importance of not having holes in the teachers' and students' schedules. As a result, the user can configure the
algorithm to create schedules favoring certain characteristics. It is worth noting that the fitness function was not
calculated directly. An objective function is evaluated first
for all the solutions, and the fitness function is evaluated
afterwards using the objective function results for all the
solutions inside the population.
Four different mutations have been implemented, with mutation rates 0.01, 0.01, 0.30 and 0.80. As we can observe,
the mutation rates differ vastly, showing that such parameters are heavily dependent on the type of the mutations
used. For example, some mutations can be very destructive
to solutions. Having a high mutation rate for such mutations would result in a much more randomized search. The
crossover rate has been set to 1.0, preserving no parents between different generations.
One last thing worth mentioning is the use of local searches
to aid the genetic algorithm. Since the genetic algorithm is
good at avoiding local optimums and not so good at converging to the single best one, a local search is often performed
after the genetic algorithm has yielded its best result. This
allows the local search to converge to the single best solution
with the genetic algorithm having already skipped over the local optimums. In this research, however, instead of performing the local search after the genetic algorithm has finished, a local search is performed in each iteration of the genetic algorithm. While this might seem computationally intensive, it allows the genetic algorithm to converge
faster, and it provides more feasible solutions at each iteration, reducing the number of calls to the filter algorithm.
Results of the paper indicate that the genetic algorithm is
superior for solving this type of problem in comparison to hand-made timetables and to several other heuristic optimisation methods.
7. SUMMARY
8. REFERENCES
[1] H. Aytug and G. J. Koehler. Stopping criteria for finite length genetic algorithms. INFORMS Journal on Computing, 8(2):183–191, 1996.
[2] S. Baluja and R. Caruana. Removing the genetics from the standard genetic algorithm. Pages 38–46. Morgan Kaufmann Publishers, 1995.
[3] N. Barricelli. Numerical testing of evolution theories. Acta Biotheoretica, 16(1-2):69–98, 1962.
[4] E. Cantú-Paz. On random numbers and the performance of genetic algorithms. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO '02, pages 311–318, San Francisco, CA, USA, 2002. Morgan Kaufmann Publishers Inc.
[5] A. Colorni, M. Dorigo, and V. Maniezzo. A genetic algorithm to solve the timetable problem, 1993.
[6] C. Contreras-Bolton and V. Parada. Automatic combination of operators in a genetic algorithm to solve the traveling salesman problem. PLoS ONE, 10(9):1–25, 2015.
[7] K. A. De Jong. An Analysis of the Behavior of a Class of Genetic Adaptive Systems. PhD thesis, Ann Arbor, MI, USA, 1975. AAI7609381.
[8] I. D. Falco, A. D. Cioppa, and E. Tarantino. Mutation-based genetic algorithm: performance evaluation. Applied Soft Computing, 1(4):285–299, 2002.
[9] D. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Artificial Intelligence. Addison-Wesley Publishing Company, 1989.